# Understanding FLARE federated learning Job structure


## What is an NVFlare Job?


An NVFlare Job is a job configuration used within the NVIDIA FLARE framework.

In NVFlare, a job is a unit of work that defines the specific tasks to be executed during a federated learning process. It encapsulates all necessary configurations, scripts, and resources needed to run an FL task, such as training, validation, or evaluation, across multiple participants in a federated system.

A job may have many apps. Each app consists of code specific to one site (a client site or the server site) together with its configuration.


In this section, we will take a look at the Job structure as well as the Job API (also known as the job construction API).

## Job creation API

NVFlare defines a Python API that makes it easier to create a job. Let's take a closer look at the Job API.


```python

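from src.network import SimpleNetwork

# Assumption: FedAvg is the custom controller class defined for this tutorial;
# adjust this import to wherever that class lives in your project.
from src.fedavg import FedAvg

from nvflare.job_config.api import FedJob
from nvflare.job_config.script_runner import ScriptRunner

if __name__ == "__main__":
    n_clients = 5
    num_rounds = 2
    job_name = "fedavg"
    train_script = "src/client.py"

    # Create the job
    job = FedJob(name=job_name, min_clients=n_clients)

    # Server side: the FedAvg controller drives the training rounds
    controller = FedAvg(
        stop_cond="accuracy > 25",
        save_filename="global_model.pt",
        initial_model=SimpleNetwork(),
        num_clients=n_clients,
        num_rounds=num_rounds,
    )

    job.to_server(controller)

    # Client side: every site runs the same training script
    for i in range(n_clients):
        executor = ScriptRunner(script=train_script, script_args="")
        job.to(executor, f"site-{i + 1}")

    # Create the job and run it with the simulator
    job.simulator_run("/tmp/nvflare/jobs/workdir")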


```

### Server Side

We create a FedJob, then create a FedAvg algorithm (called a Controller, details later) and add it to the server of the FedJob.


### Client Side

On the client side, we use `client.py` together with the `ScriptRunner` class, which can run `client.py` directly. We assign the client code to each site, so all sites run the same code with the same training parameters.

```
executor = ScriptRunner(script=train_script, script_args="")
job.to(executor, f"site-{i + 1}")
```

Finally, we run the simulator with this:

```
    job.simulator_run("/tmp/nvflare/jobs/workdir")
```


The overall Job creation pattern is like this:

* Create a Fed Job.
* Create the server-side algorithm (the controller) and add it to the server.
* Create the client-side executor and assign it to each client site.

```
class MyJob(BaseFedJob):
    pass

job = MyJob()

server_side_algorithm = FedAvg(...)  # which we call the controller
job.to_server(server_side_algorithm)

client_side_algorithm = ScriptRunner(script=train_script, script_args="")
# assign it to a client site
job.to(client_side_algorithm, site_name)
```



The Job API creates the job for you. The call

```
job.simulator_run("/tmp/nvflare/jobs/workdir")
```

first creates the job and then runs it with the simulator.

To generate the job configuration without running the job, we can use

```
job.export_job("/tmp/nvflare/jobs/job_config")
```

This code is located in [fl_job_config.py](code/fl_job_config.py).



In [None]:
%cd code

In [None]:
! python3 fl_job_config.py


Now that we have created the job configuration, let's take a closer look.

## Job structure

In [None]:

! tree /tmp/nvflare/jobs/job_config/fedavg

The job, named "fedavg", is a folder structure in which each subfolder represents one app at one site.

* **"app_server"**: the name of the server app

* **"app_site-n"**: the name of the client app for site n

* Each app folder consists of
   * **config**: directory that contains the site-specific configuration

   * **custom**: directory that stores the custom code for the specific site

These names can be changed if you edit the configuration manually. By default, the Job API uses the conventions above.
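To make the folder convention concrete, here is a minimal sketch that walks an exported job folder and lists each app with its config files. The helper `list_apps` is hypothetical (it is not part of NVFlare), and it simply assumes that any subfolder containing a `config` directory is an app.

```python
from pathlib import Path


def list_apps(job_dir):
    """Return {app_name: sorted config file names} for each app folder in job_dir."""
    apps = {}
    for entry in sorted(Path(job_dir).iterdir()):
        config_dir = entry / "config"
        # Assumption: a sub-folder that contains a "config" directory is an app.
        if entry.is_dir() and config_dir.is_dir():
            apps[entry.name] = sorted(p.name for p in config_dir.iterdir())
    return apps


# Only inspect the exported job if it exists on this machine.
job_dir = Path("/tmp/nvflare/jobs/job_config/fedavg")
if job_dir.is_dir():
    for app_name, config_files in list_apps(job_dir).items():
        print(app_name, config_files)
```

Files at the top level of the job folder, such as `meta.json`, are skipped because they are not directories.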


* meta.json gives additional information about each app's deployment.

```
{
    "name": "fedavg",
    "resource_spec": {},
    "min_clients": 1,
    "deploy_map": {
        "app_server": [
            "server"
        ],
        "app_site-1": [
            "site-1"
        ],
        "app_site-2": [
            "site-2"
        ],
        "app_site-3": [
            "site-3"
        ],
        "app_site-4": [
            "site-4"
        ],
        "app_site-5": [
            "site-5"
        ]
    }
}
```
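The `deploy_map` above follows a regular pattern: one server app, plus one app per client site. As an illustration only, the sketch below rebuilds the same dictionary in plain Python. The helper `build_meta` is hypothetical; in practice the Job API writes meta.json for you.

```python
import json


def build_meta(job_name, n_clients, min_clients=1):
    """Build a meta.json-style dict with one app per site (hypothetical helper)."""
    deploy_map = {"app_server": ["server"]}
    for i in range(1, n_clients + 1):
        # Each client app is deployed to exactly one site.
        deploy_map[f"app_site-{i}"] = [f"site-{i}"]
    return {
        "name": job_name,
        "resource_spec": {},
        "min_clients": min_clients,
        "deploy_map": deploy_map,
    }


meta = build_meta("fedavg", n_clients=5)
print(json.dumps(meta, indent=4))
```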

A simplified job structure can also be used when the client code and configuration are the same for all sites:

```
/tmp/nvflare/jobs/job_config/fedavg
├── app_server
│   ├── config
│   │   └── config_fed_server.json
│   └── custom
│       └── src
│           └── network.py
├── app_client
│   ├── config
│   │   └── config_fed_client.json
│   └── custom
│       ├── network.py
│       └── src
│           └── client.py
└── meta.json


```

In this case, meta.json needs to be:


```
{
    "name": "fedavg",
    "resource_spec": {},
    "min_clients": 1,
    "deploy_map": {
        "app_server": [
            "server"
        ],
        "app_client": [
            "site-1", "site-2", "site-3", "site-4", "site-5" 
        ]
    }
}
```


If we don't mind deploying all code to all sites, the job structure can be simplified further into a single app:

```
/tmp/nvflare/jobs/job_config/fedavg
├── app
│   ├── config
│   │   ├── config_fed_client.json
│   │   └── config_fed_server.json
│   └── custom
│       └── src
│           ├── client.py
│           └── network.py
└── meta.json


```

With a single app, meta.json needs to be:


```
{
    "name": "fedavg",
    "resource_spec": {},
    "min_clients": 1,
    "deploy_map": {
        "app": ["@ALL"]
    }
}
```
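One way to read the three `deploy_map` variants is as a lookup from site name to the apps deployed there. The sketch below illustrates that reading. The helper `apps_for_site` is hypothetical and is not NVFlare's actual resolution logic; it only assumes that `"@ALL"` matches every site, including the server.

```python
def apps_for_site(deploy_map, site):
    """Return the apps whose target list names `site` or contains "@ALL"."""
    return [
        app
        for app, targets in deploy_map.items()
        if site in targets or "@ALL" in targets
    ]


# Two-app layout: one server app, one app shared by all clients.
per_role = {
    "app_server": ["server"],
    "app_client": ["site-1", "site-2", "site-3", "site-4", "site-5"],
}
# Single-app layout: the same app goes everywhere.
shared = {"app": ["@ALL"]}

print(apps_for_site(per_role, "site-1"))
print(apps_for_site(per_role, "server"))
print(apps_for_site(shared, "site-4"))
```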



## Job Configuration

We have covered a lot of ground so far. You could stop here and move on to the next chapter of the training materials.

But if you would like to further understand how NVIDIA FLARE works, you might want to go through this section: Job Configuration.


In [None]:
%cd code

In [None]:
! tree /tmp/nvflare/jobs/workdir/fedavg/


At each site, there is a job configuration file:


* ```config_fed_client.json``` or
* ```config_fed_server.json```

These files hold the job configuration for that site.

### Server Configuration

In [None]:
! cat /tmp/nvflare/jobs/workdir/fedavg/app_server/config/config_fed_server.json

The server configuration is a JSON file that describes the workflows. In our case, we defined one workflow, which has a controller that uses our FedAvg class.


>Note: The configuration follows this pattern:
```
    id: <unique id>,
    path: <class_path>,
    args: {
        class constructor arguments
    }
```
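The `path` plus `args` pattern maps naturally onto "import the class, then call its constructor with keyword arguments". The sketch below shows that idea with a hypothetical `build_component` helper; it is not NVFlare's actual loader, and a standard-library class stands in for a real component so the example runs without NVFlare installed.

```python
import importlib


def build_component(config):
    """Instantiate the class named by config["path"] with config["args"] as keyword arguments."""
    module_name, _, class_name = config["path"].rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**config.get("args", {}))


# Stand-in component: datetime.timedelta plays the role of a controller class.
config = {
    "id": "timeout",
    "path": "datetime.timedelta",
    "args": {"hours": 2},
}
component = build_component(config)
print(type(component).__name__, component)
```

Because the configuration only stores a class path and constructor arguments, swapping one component for another is a matter of editing the JSON rather than the code.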


### Client Configurations

Let's look at the site-1 client's configuration.

In [None]:
! cat /tmp/nvflare/jobs/workdir/fedavg/app_site-1/config/config_fed_client.json

The client configuration is similar. It defines an array of "executors"; here the built-in ```PTInProcessClientAPIExecutor``` is used,
which takes the training script client.py and its corresponding arguments as input.


```
"executor": {
    "path": "nvflare.app_opt.pt.in_process_client_api_executor.PTInProcessClientAPIExecutor",
    "args": {
        "task_script_path": "src/client.py",
        "task_script_args": "--learning_rate 0.01 --batch_size 12"
    }
}
```
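The `task_script_args` value is a plain command-line string handed to the training script. As a hedged illustration of how a script such as `client.py` might consume it, the sketch below parses the same string with `argparse`. The function name `parse_script_args` and the default values are assumptions made for this example, not taken from the tutorial's `client.py`.

```python
import argparse
import shlex


def parse_script_args(task_script_args):
    """Parse a task_script_args string the way a training script could."""
    parser = argparse.ArgumentParser()
    # The defaults below are illustrative assumptions.
    parser.add_argument("--learning_rate", type=float, default=0.001)
    parser.add_argument("--batch_size", type=int, default=4)
    # shlex.split turns the single string into an argv-style list.
    return parser.parse_args(shlex.split(task_script_args))


args = parse_script_args("--learning_rate 0.01 --batch_size 12")
print(args.learning_rate, args.batch_size)
```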


The default job configuration format is JSON, but you can also use pyhocon or YAML. Please refer to the [config file documentation](https://nvflare.readthedocs.io/en/2.4/user_guide/configurations.html) for details.


## Simulator CLI

With this job configuration, you can run the simulator directly from the command line. Here is the syntax; we will use it to run our previous job.



In [None]:
! nvflare simulator --help

In [None]:
!nvflare simulator  /tmp/nvflare/jobs/workdir/fedavg/  -w /tmp/nvflare/jobs/workdir/fedavg/workspace -n 5 -t 5 
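If you prefer to launch the simulator from a Python script rather than a notebook cell, the command above can be assembled as an argument list. This is a minimal sketch: `simulator_command` is a hypothetical helper, and it only builds and prints the command using the `-w`, `-n`, and `-t` options shown above.

```python
import shlex


def simulator_command(job_folder, workspace, n_clients, n_threads):
    """Build the argument list for the `nvflare simulator` CLI call (hypothetical helper)."""
    return [
        "nvflare", "simulator", job_folder,
        "-w", workspace,       # simulator workspace folder
        "-n", str(n_clients),  # number of clients
        "-t", str(n_threads),  # number of threads
    ]


cmd = simulator_command(
    "/tmp/nvflare/jobs/workdir/fedavg/",
    "/tmp/nvflare/jobs/workdir/fedavg/workspace",
    n_clients=5,
    n_threads=5,
)
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would then execute it, provided NVFlare is installed in the active environment.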

We hope you now have a good understanding of how to work with NVIDIA FLARE jobs. Let's move on to the other chapters.