# Understanding FLARE Federated Job Structure


## What is an NVFlare Job?

In NVIDIA FLARE, a job is a unit of work that defines the specific tasks to be executed during a federated learning process. It encapsulates all necessary configurations, scripts, and resources needed to run an FL task, such as training, validation, or evaluation, across multiple participants in a federated system.

A job may have many apps. Each app consists of code specific to a site (a client site or the server site) together with its configuration.

In this section, we will take a look at the Job structure as well as the Job API (aka job construction API).


## Job creation API

NVIDIA FLARE provides a Python API, the [Job API](https://nvflare.readthedocs.io/en/main/programming_guide/fed_job_api.html), that makes it easy to create and configure a job.

Let's take a closer look at the Job API.

The overall job creation pattern consists of the following steps:

1. **Create a Fed Job**

```python
from nvflare.job_config.api import FedJob

class MyJob(FedJob):
    ...

job = MyJob()
```
You can use the base [`FedJob`](https://github.com/NVIDIA/NVFlare/blob/main/nvflare/job_config/api.py#L162) class directly, or subclass it to create a custom job class.


2. **Server-side configuration**

We then create a server-side algorithm or workflow (called a Controller, for instance the FedAvg Controller) and add it to the server of the Fed Job.

```python
from nvflare.app_common.workflows.fedavg import FedAvg

server_side_algorithm = FedAvg(...)  # create the server-side controller
job.to_server(server_side_algorithm)
```

3. **Client-side configuration**

On the client side, we can leverage the [`ScriptRunner` class](https://github.com/NVIDIA/NVFlare/blob/main/nvflare/job_config/script_runner.py#L313), which runs a client-side training script directly. We assign the client code to each site using the `job.to()` function.

```python
from nvflare.job_config.script_runner import ScriptRunner

# wrap the client-side training script
client_side_algorithm = ScriptRunner(script=train_script, script_args="")

# assign it to each client site
job.to(client_side_algorithm, site_name)
...
```

Note that you can use the `job.to()` function to send any component to the server or to the clients, if needed.

4. **Export the job**

We can use `job.export_job` to export the job to a local folder, for instance:
```python
job.export_job("/tmp/nvflare/jobs/job_config")
```
The exported job can later be run using FLARE's runtime, such as the [FL Simulator](https://nvflare.readthedocs.io/en/main/user_guide/nvflare_cli/fl_simulator.html).

We can also run the job using FL Simulator directly:

```python
job.simulator_run("/tmp/nvflare/jobs/workdir") 
```
This creates the job and then runs it with the simulator.

Let's run the example script provided in [code/fl_job_config.py](code/fl_job_config.py) to export a job:

In [None]:
! cd code && python3 fl_job_config.py

Now that we have created the job configuration, let's take a closer look.

## Job structure

In [None]:
! tree /tmp/nvflare/jobs/workdir/fedavg -L 2

The job, named `fedavg`, is a folder in which each subfolder represents one app at one site.

* **`meta.json`** is a file that gives meta information related to the applications deployed to each site (server and clients). 

```json
{
    "name": "fedavg",
    "resource_spec": {},
    "min_clients": 1,
    "deploy_map": {
        "app_server": [
            "server"
        ],
        "app_site-1": [
            "site-1"
        ],
        "app_site-2": [
            "site-2"
        ],
        "app_site-3": [
            "site-3"
        ],
        "app_site-4": [
            "site-4"
        ],
        "app_site-5": [
            "site-5"
        ]
    }
}
```

* **"app_server"**: the name of the server app

* **"app_site-n"**: the name of the client app for site `n`

* Each app folder consists of:
   * **config**: a directory containing the site-specific configuration

   * **custom**: a directory storing the custom code for the specific site

These names can be modified using custom configurations. By default, the Job API uses the conventions above.
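
To make the `deploy_map` concrete, here is a minimal sketch that inverts it into a site-to-app lookup. It uses only the Python standard library and an inline, shortened copy of the `meta.json` shown above. The helper name `site_to_app` is hypothetical and is not part of the Job API.

```python
import json

# Inline, shortened copy of the meta.json shown above, so the sketch is self-contained.
META_JSON = """
{
    "name": "fedavg",
    "resource_spec": {},
    "min_clients": 1,
    "deploy_map": {
        "app_server": ["server"],
        "app_site-1": ["site-1"],
        "app_site-2": ["site-2"]
    }
}
"""


def site_to_app(meta: dict) -> dict:
    """Invert deploy_map (app -> sites) into a site -> app lookup (hypothetical helper)."""
    lookup = {}
    for app_name, sites in meta["deploy_map"].items():
        for site in sites:
            lookup[site] = app_name
    return lookup


meta = json.loads(META_JSON)
for site, app in sorted(site_to_app(meta).items()):
    print(f"{site} runs {app}")
```

Reading the map this way shows that each app folder is deployed only to the sites listed under its name.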

A simplified job structure can also be used when the code and configuration are the same for all clients:

```shell
/tmp/nvflare/jobs/job_config/fedavg
├── app_server
│   ├── config
│   │   └── config_fed_server.json
│   └── custom
│       └── src
│           └── network.py
├── app_client
│   ├── config
│   │   └── config_fed_client.json
│   └── custom
│       ├── network.py
│       └── src
│           └── client.py
└── meta.json
```

In this case, `meta.json` needs to be:

```json
{
    "name": "fedavg",
    "resource_spec": {},
    "min_clients": 1,
    "deploy_map": {
        "app_server": [
            "server"
        ],
        "app_client": [
            "site-1", "site-2", "site-3", "site-4", "site-5" 
        ]
    }
}
```


If the code and configuration are the same for all sites (server and clients), and we don't mind deploying all code to all sites, we can simplify the job structure further:
```bash
/tmp/nvflare/jobs/job_config/fedavg
├── app
│   ├── config
│   │   ├── config_fed_client.json
│   │   └── config_fed_server.json
│   └── custom
│       └── src
│           ├── network.py
│           └── client.py
└── meta.json
```

In this case, `meta.json` needs to be: 

```json
{
    "name": "fedavg",
    "resource_spec": {},
    "min_clients": 1,
    "deploy_map": {
        "app": ["@ALL"]
    }
}
```
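
The sketch below illustrates how the `"@ALL"` placeholder can be read. It assumes, following the description above, that `"@ALL"` stands for every site, server included. The function `expand_deploy_map` and the site list are hypothetical and only illustrate the idea; this is not FLARE's own resolution logic.

```python
def expand_deploy_map(deploy_map: dict, all_sites: list) -> dict:
    """Replace the "@ALL" placeholder with an explicit list of sites.

    Assumption: "@ALL" means "every site, server included".
    This is an illustration, not FLARE's own resolver.
    """
    expanded = {}
    for app_name, sites in deploy_map.items():
        if "@ALL" in sites:
            expanded[app_name] = list(all_sites)
        else:
            expanded[app_name] = list(sites)
    return expanded


# Hypothetical site list for the example.
sites = ["server", "site-1", "site-2"]
print(expand_deploy_map({"app": ["@ALL"]}, sites))
```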

## Job configuration in-depth

We have covered a lot of ground so far. You could stop here and move on to the next chapter of the training materials.

If you would like to understand in more depth how NVIDIA FLARE works, read on: this section walks through the job configuration files.


In [None]:
! tree /tmp/nvflare/jobs/workdir/fedavg/*/config

At each site, there is a job configuration file:


* ```config_fed_client.json``` or
* ```config_fed_server.json```

### Server configuration

In [None]:
! cat /tmp/nvflare/jobs/workdir/fedavg/app_server/config/config_fed_server.json

The server configuration is a JSON file describing the workflows. In our case, we defined one workflow, which has a controller using our defined FedAvg class.


>Note: Each component in the configuration follows this pattern:
```
    id: <unique id>,
    path: <class_path>,
    args: {
        class constructor arguments
    }
```
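
To illustrate how an `id` / `path` / `args` entry can map to a live object, here is a minimal sketch, assuming the class is imported from `path` and `args` are passed as keyword arguments to its constructor. The function `build_component` is hypothetical, and a standard-library class stands in for a FLARE component; FLARE's real component builder may work differently.

```python
import importlib


def build_component(spec: dict):
    """Instantiate the class named by spec["path"], passing spec["args"] as keyword arguments."""
    module_name, _, class_name = spec["path"].rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**spec.get("args", {}))


# A standard-library class stands in for a FLARE component here.
spec = {
    "id": "three_quarters",
    "path": "fractions.Fraction",
    "args": {"numerator": 3, "denominator": 4},
}
component = build_component(spec)
print(type(component).__name__, component)
```

Because the class is named by its import path, swapping in a different component only requires editing the configuration, not the code that loads it.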


### Client configurations

Let's look at the configuration of the site-1 client:

In [None]:
! cat /tmp/nvflare/jobs/workdir/fedavg/app_site-1/config/config_fed_client.json

The client configuration is similar: it defines an array of "executors". The built-in `PTInProcessClientAPIExecutor` is used, which takes the training script `client.py` and its arguments as input.


```
"executor": {
    "path": "nvflare.app_opt.pt.in_process_client_api_executor.PTInProcessClientAPIExecutor",
    "args": {
        "task_script_path": "src/client.py",
        "task_script_args": "--learning_rate 0.01 --batch_size 12"
    }
}
```
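
The `task_script_args` string is handed to the training script as its command-line arguments. As a minimal sketch, assuming the script reads them with `argparse` (the actual `client.py` is not shown here and may differ), the parsing could look like this. The function `parse_task_args` and the default values are hypothetical.

```python
import argparse
import shlex


def parse_task_args(arg_string: str) -> argparse.Namespace:
    """Parse a task_script_args-style string the way a training script might."""
    parser = argparse.ArgumentParser()
    # Hypothetical arguments, mirroring the flags in the configuration above.
    parser.add_argument("--learning_rate", type=float, default=0.001)
    parser.add_argument("--batch_size", type=int, default=32)
    # shlex.split turns the single string into an argv-style list.
    return parser.parse_args(shlex.split(arg_string))


args = parse_task_args("--learning_rate 0.01 --batch_size 12")
print(args.learning_rate, args.batch_size)
```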


The default job configuration format is JSON, but pyhocon and YAML are also supported. Please refer to the [config file documentation](https://nvflare.readthedocs.io/en/2.4/user_guide/configurations.html) for details.


## Simulator CLI

With this job configuration in place, we can run the simulator directly from the command line. Here is the syntax, which we will then use to run our previous job:



In [None]:
! nvflare simulator --help

In [None]:
! nvflare simulator /tmp/nvflare/jobs/workdir/fedavg/ -w /tmp/nvflare/jobs/workdir/fedavg/workspace -n 5 -t 5
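
If you prefer to launch the simulator from a Python script, the same command can be assembled as an argument list. The sketch below only builds and prints the command; it does not run it. The helper `simulator_command` is hypothetical, and the parameter names reflect our reading of `-w` as the workspace, `-n` as the number of clients, and `-t` as the number of threads. Check `nvflare simulator --help` for the authoritative definitions.

```python
import shlex


def simulator_command(job_folder: str, workspace: str, n_clients: int, threads: int) -> list:
    """Build the argument list for `nvflare simulator` (does not run it)."""
    return [
        "nvflare", "simulator", job_folder,
        "-w", workspace,
        "-n", str(n_clients),
        "-t", str(threads),
    ]


cmd = simulator_command(
    "/tmp/nvflare/jobs/workdir/fedavg/",
    "/tmp/nvflare/jobs/workdir/fedavg/workspace",
    n_clients=5,
    threads=5,
)
print(shlex.join(cmd))
# To launch it, pass cmd to subprocess.run(cmd, check=True); this requires nvflare to be installed.
```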

We hope you now have a good understanding of NVIDIA FLARE jobs.

Next, let's learn about FLARE's logging configuration: [Logging Configuration](../01.7_logging/logging.ipynb).