# FedAvg Algorithm with SAG (Scatter & Gather) workflow
<a id = "title"></a>

In this example, we will demonstrate the SAG workflow with FedAvg using the CIFAR10 dataset.

Both the job lifecycle and the training workflow are controlled on the server side; we will simply use the existing SAG controller available in NVFLARE.

For client-side training code, we will leverage the new DL to FL Client API.

First, let's look at the FedAvg Algorithm and SAG Workflow.


## Scatter and Gather (SAG)

FLARE's Scatter and Gather workflow is similar to the Message Passing Interface (MPI)'s MPI Broadcast + MPI Gather. [MPI](https://en.wikipedia.org/wiki/Message_Passing_Interface) is a standardized and portable message-passing interface designed to function on parallel computing architectures. MPI includes [collective communication routines](https://mpitutorial.com/tutorials/mpi-broadcast-and-collective-communication/) such as MPI Broadcast, MPI Scatter, and MPI Gather.

<img src="mpi_scatter.png" alt="scatter" width=25% height=20% /><img src="mpi_gather.png" alt="gather" width=25% height=20% />
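
To make the broadcast-then-gather pattern concrete, here is a minimal sketch in plain Python. It is not NVFLARE's SAG controller: `scatter_and_gather` and `make_client` are hypothetical names, and each "client" is just a local function that adds an offset to the broadcast value.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter_and_gather(task, clients):
    """Send the same task to every client, then collect all results in order."""
    with ThreadPoolExecutor(max_workers=len(clients)) as pool:
        # Broadcast step: every client receives the same task payload.
        futures = [pool.submit(client, task) for client in clients]
        # Gather step: wait for each client and collect its result.
        return [future.result() for future in futures]


def make_client(offset):
    # Hypothetical client: "trains" by adding its own offset to the broadcast value.
    def run_task(task):
        return {"round": task["round"], "value": task["value"] + offset}

    return run_task


clients = [make_client(1), make_client(2)]
results = scatter_and_gather({"round": 0, "value": 10}, clients)
print(results)
```

In a real deployment the clients are remote sites and the gathered results are model updates, but the control flow on the server follows the same two steps.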



## FedAvg with SAG
We use the [SAG workflow](https://nvflare.readthedocs.io/en/main/programming_guide/controllers/scatter_and_gather_workflow.html) to implement the FedAvg algorithm. The figure below shows one round of training in this workflow.

<img src="fed_avg_one_round.png" alt="FedAvg" width=35% height=30% />

<a id = "sag"></a>
<img src="fed_avg.png" alt="FedAvg" width=50% height=45% /> <img src="sag.png" alt="Scatter and Gather" width=40% height=40% />

The FedAvg aggregation is done on the server side; it is weighted by the number of training steps on each client.
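
The step-weighted average can be sketched in a few lines of plain Python. This is an illustration of the idea only, not NVFLARE's aggregator: the function name `fedavg` and the use of lists of floats as parameters are assumptions made for the example. Each parameter of the global model is computed as the sum of the client values multiplied by that client's step count, divided by the total number of steps.

```python
def fedavg(client_updates):
    """Average parameter dicts, weighting each client by its training steps.

    client_updates: list of (params, num_steps) pairs, where params maps a
    parameter name to a list of floats (a stand-in for a tensor).
    """
    total_steps = sum(num_steps for _, num_steps in client_updates)
    if total_steps <= 0:
        raise ValueError("total number of training steps must be positive")

    first_params = client_updates[0][0]
    averaged = {}
    for name, values in first_params.items():
        averaged[name] = [
            sum(params[name][i] * num_steps for params, num_steps in client_updates)
            / total_steps
            for i in range(len(values))
        ]
    return averaged


# Two hypothetical clients: the second ran three times as many steps.
updates = [
    ({"w": [1.0, 2.0]}, 100),
    ({"w": [3.0, 6.0]}, 300),
]
global_params = fedavg(updates)
print(global_params)
```

Because the second client contributes 300 of the 400 total steps, the result sits three quarters of the way toward its parameters.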
 
## Convert training code to federated learning training code
<a id = "code"></a>
We will use the original [Training a Classifier](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) example
in PyTorch as the code base. The cleaned-up code (comments removed, etc.) can be found [here](../code/dl/train.py).


With the NVFLARE DL to FL Client API, we can transform the existing PyTorch classifier training code into federated classifier training code with only a few lines of code changes. The already converted code can be found **[here](../code/fl/train.py)**.

For a detailed discussion of how to convert training code into federated learning training code using the Client API, you can also check out the examples and code [here](https://github.com/NVIDIA/NVFlare/blob/main/examples/hello-world/ml-to-fl/README.md).

The key changes are the following steps: 

```
    #  import nvflare client API
    import nvflare.client as flare

    #  initializes NVFlare client API
    flare.init()

    # gets FLModel from NVFlare
    input_model = flare.receive()

    # loads model from NVFlare
    net.load_state_dict(input_model.params)

    # evaluate on received model
    accuracy = evaluate(input_model.params)
    
    # construct trained FL model
    output_model = flare.FLModel(
        params=net.cpu().state_dict(),
        metrics={"accuracy": accuracy},
        meta={"NUM_STEPS_CURRENT_ROUND": steps},
    )
    
    # send model back to NVFlare
    flare.send(output_model)
```
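
The snippet above passes `steps` in the `NUM_STEPS_CURRENT_ROUND` metadata without showing where the value comes from. One plausible way to compute it, assuming a step is one optimizer update per batch and the last partial batch is kept, is sketched below. The helper name `steps_in_round` is hypothetical, and the numbers (50,000 CIFAR10 training images, batch size 4, 2 epochs) mirror the PyTorch tutorial rather than the converted script.

```python
import math


def steps_in_round(num_samples, batch_size, epochs):
    # Batches per epoch when the last partial batch is kept (drop_last=False).
    batches_per_epoch = math.ceil(num_samples / batch_size)
    return epochs * batches_per_epoch


# Assumed values: CIFAR10 training set size, tutorial batch size and epochs.
steps = steps_in_round(num_samples=50000, batch_size=4, epochs=2)
print(steps)
```

With a PyTorch `DataLoader`, the same quantity would typically be `epochs * len(trainloader)`.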

If you are using PyTorch Lightning, the changes are even smaller: a one-line import, a one-line change applied to the trainer, and one line for global model evaluation. See [cifar10_lightning_examples](https://github.com/NVIDIA/NVFlare/blob/main/examples/hello-world/ml-to-fl/pt/cifar10_lightning_fl.py).

## Prepare Data
<a id = "data"></a>

Let's get the data first. Following the CIFAR10 instructions, we can download the data with the following script.


In [None]:
CIFAR10_ROOT = "/tmp/nvflare/data/cifar10"

! python ../data/download.py

## Job Folder and Configurations
<a id = "job"></a>
 
Now we need to set up the configurations for the server and clients and construct the job folder that NVFLARE needs to run. We can do this using the NVFLARE job CLI. The [Job CLI tutorials](https://github.com/NVIDIA/NVFlare/blob/main/examples/tutorials/job_cli.ipynb) cover all the details; for now, you can just use the following commands to find out which job templates are available.

We need to set the job templates directory so the job CLI commands can find the job templates. If you have already set `NVFLARE_HOME` to `<NVFLARE git clone directory>`, then you can skip the following step.



In [None]:
! nvflare config -jt ../../../../../job_templates

In [None]:
! nvflare job list_templates

* Create job folder and initial configs

The template **'sag_pt'** seems to fit our needs: SAG with PyTorch, using the client API. Let's create a job folder with this template initially without specifying the code location, just to see what needs to be changed.


In [None]:
! nvflare job create -j /tmp/nvflare/jobs/cifar10_sag_pt -w sag_pt

Let's also look at the server and client configurations.

In [None]:
!cat /tmp/nvflare/jobs/cifar10_sag_pt/app/config/config_fed_server.conf

In [None]:
!cat /tmp/nvflare/jobs/cifar10_sag_pt/app/config/config_fed_client.conf

* Create a job folder with all the configurations.

Let's change `num_rounds` to 5 in `config_fed_server.conf`, `app_script` to `train.py` in `config_fed_client.conf`, and `min_clients` to 2 in `meta.conf`. We also want to change the arguments for `train.py`: `dataset_path=CIFAR10_ROOT`, `batch_size=6`, `num_workers=2`. Note that the `dataset_path` is not actually changed; we just want to show you that it could be changed.


In [None]:
! nvflare job create -j /tmp/nvflare/jobs/cifar10_sag_pt -w sag_pt \
-f meta.conf min_clients=2 \
-f config_fed_client.conf app_script=train.py app_config="--batch_size 6 --dataset_path {CIFAR10_ROOT} --num_workers 2" \
-f config_fed_server.conf num_rounds=5 \
-sd ../code/fl \
-force
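
The `app_config` value is a string of command-line arguments intended for `train.py`. The sketch below shows how a training script could parse such a string with `argparse`. It is an assumption about the script's shape, not a copy of the real `train.py`: the function name `parse_train_args` and the default values are illustrative.

```python
import argparse
import shlex


def parse_train_args(argv):
    # Argument names match the app_config string; defaults are illustrative.
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset_path", type=str, default="/tmp/nvflare/data/cifar10")
    parser.add_argument("--batch_size", type=int, default=4)
    parser.add_argument("--num_workers", type=int, default=1)
    return parser.parse_args(argv)


# The same argument string used in the job create command, with CIFAR10_ROOT expanded.
app_config = "--batch_size 6 --dataset_path /tmp/nvflare/data/cifar10 --num_workers 2"
args = parse_train_args(shlex.split(app_config))
print(args.batch_size, args.dataset_path, args.num_workers)
```

Keeping the training options as command-line arguments means the job configuration can change them without touching the training code.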

OK, we are ready to run the job. Let's look at the job folder first; use `ls -al` if you don't have `tree` installed.

In [None]:
! tree /tmp/nvflare/jobs/cifar10_sag_pt  

## Run Job
We can use the simulator to run the job directly.


In [None]:
! nvflare simulator /tmp/nvflare/jobs/cifar10_sag_pt  -w /tmp/nvflare/jobs/cifar10_sag_pt_workspace -t 2 -n 2 

The job runs in simulator mode. When the command completes, we are done with the training.