# FedAvg with SAG workflow with MLflow tracking

In this example, we demonstrate the FedAvg SAG workflow on the CIFAR10 dataset with MLflow tracking.

We will show how to add tracking capabilities, specifically MLflow, to the previous example [FedAvg with SAG workflow](../sag/sag.ipynb#title).

For an overview of Federated Averaging and SAG, see the section from the previous example: [Understanding FedAvg and SAG](../sag/sag.ipynb#sag)

## Experiment tracking

In any machine learning or deep learning scenario, the goal is to obtain the best model after training.
An important part of that is monitoring convergence by keeping track of different metrics and losses as training proceeds.
There are many tracking tools available, for example TensorBoard, MLflow, and Weights and Biases.
NVFlare can integrate with these tools and send the metrics and losses from all client sites back to the server site.
You can then monitor the progress of the whole federated learning job by interacting with the NVFlare server machine.

## Training code changes

You only need to import `MLflowWriter`, create an instance of it, and call the methods it provides.
For example:

```

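import nvflare.client as flare
from nvflare.client.tracking import MLflowWriter

# Initialize the NVFlare Client API
flare.init()
# Create the writer used to log metrics from the training script
mlflow = MLflowWriter()

# Log a single metric value under the key "loss"
mlflow.log_metric("loss", 0.2)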

```
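
To illustrate how such calls might sit inside a training loop, here is a minimal sketch. It is not taken from `train_with_mlflow.py`: `RecordingWriter` and `train` are hypothetical names introduced only for this illustration, and the optional `step` argument is assumed to behave like the one in MLflow's own `log_metric`. The stand-in writer just records calls in memory so the snippet runs without NVFlare installed; in a real training script you would pass an `MLflowWriter()` created after `flare.init()` instead.

```python
from typing import List, Optional, Tuple


class RecordingWriter:
    """Hypothetical stand-in that mimics a log_metric(key, value, step) call.

    It only stores the calls in memory; it is NOT part of NVFlare.
    """

    def __init__(self) -> None:
        self.records: List[Tuple[str, float, Optional[int]]] = []

    def log_metric(self, key: str, value: float, step: Optional[int] = None) -> None:
        self.records.append((key, value, step))


def train(writer, batch_losses_per_epoch: List[List[float]]) -> None:
    """Log the mean loss of each epoch, using the epoch index as the step."""
    for epoch, batch_losses in enumerate(batch_losses_per_epoch):
        mean_loss = sum(batch_losses) / len(batch_losses)
        # Assumption: the real writer accepts the same (key, value, step) arguments.
        writer.log_metric("loss", mean_loss, step=epoch)


# Made-up per-batch losses for two epochs, purely for illustration.
writer = RecordingWriter()
train(writer, [[1.0, 0.5], [0.5, 0.25]])
```

Logging one aggregated value per epoch, rather than one per batch, keeps the number of metric messages sent to the server small; whether that granularity is appropriate depends on your training setup.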


## Job Configuration

To configure experiment and metrics tracking, we need to add the following components to `config_fed_client.conf`:

1. A `MetricRelay` component, so that the metrics are sent to the server for gathering
2. Another `CellPipe` component for "metrics_exchange", which is needed by the `MetricRelay` component
3. An `ExternalConfigurator` component, so that the Client API can be initialized with the required information

Since the client side sends metrics and losses to the server side, the server side needs to receive this information. We therefore need to add the following component to `config_fed_server.conf`:

1. `MLflowReceiver`

You can configure `tracking_uri` and the arguments `experiment_name`, `run_name`, `experiment_tags`, and `run_tags` of `MLflowReceiver`. We want to save the results locally, so we set `tracking_uri` to the empty string `""`.


Let's use the Job CLI to create the job from the `sag_pt_mlflow` template:

In [None]:
! nvflare config -jt ../../../../../job_templates

In [None]:
! nvflare job create -j /tmp/nvflare/jobs/cifar10_sag_pt_mlflow -w sag_pt_mlflow \
-f meta.conf min_clients=2 \
-f config_fed_client.conf app_script=train_with_mlflow.py app_config="--batch_size 6 --dataset_path /tmp/nvflare/data/cifar10 --num_workers 2" \
-f config_fed_server.conf num_rounds=5 experiment_name="nvflare-sag-pt-experiment" run_name="nvflare-sag-pt-with-mlflow" tracking_uri=\"\" \
-sd ../code/fl \
-force

We can take a look at the server and client configurations and make any changes as desired:

In [None]:
! cat /tmp/nvflare/jobs/cifar10_sag_pt_mlflow/app/config/config_fed_server.conf

In [None]:
! cat /tmp/nvflare/jobs/cifar10_sag_pt_mlflow/app/config/config_fed_client.conf

## Prepare Data

Make sure the CIFAR10 dataset is downloaded with the following script:

In [None]:
! python ../data/download.py --dataset_path /tmp/nvflare/data/cifar10

## Run the Job

Now we can run the job with the simulator:

In [None]:
! nvflare simulator /tmp/nvflare/jobs/cifar10_sag_pt_mlflow -w /tmp/nvflare/cifar10_sag_pt_mlflow -t 2 -n 2 

## Check the results

After the experiment is finished, you can view the results in one of the following ways.

Please refer to MLflow documentation for more information.

If `tracking_uri` is specified, you can go directly to that URI to view the results.

If `tracking_uri` is not specified, the results are saved in `/tmp/nvflare/cifar10_sag_pt_mlflow/mlruns/`.

You can then run the command `mlflow ui --port 5000` inside the directory `/tmp/nvflare/cifar10_sag_pt_mlflow/`.

You should then see something similar to the following screenshot:


<img src=mlflow.png width=95% height=95% />
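
If you prefer to inspect the locally saved results without the UI, the following is a minimal sketch that reads metric files straight from an `mlruns` directory using only the standard library. It assumes MLflow's local file-store layout, `<mlruns>/<experiment_id>/<run_id>/metrics/<metric_name>`, where each line holds `timestamp value step`. That layout is an MLflow implementation detail that may change between versions, and `read_mlruns_metrics` is a hypothetical helper name, not part of NVFlare or MLflow.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def read_mlruns_metrics(mlruns_dir: str) -> Dict[str, List[Tuple[int, float]]]:
    """Collect sorted (step, value) pairs per "<run_id>/<metric_name>".

    Assumes the local file-store layout described above; this is a sketch,
    not an official MLflow API.
    """
    results: Dict[str, List[Tuple[int, float]]] = {}
    root = Path(mlruns_dir)
    for metrics_dir in root.glob("*/*/metrics"):
        run_id = metrics_dir.parent.name
        # Metric names containing "/" are stored as nested directories.
        for metric_file in metrics_dir.rglob("*"):
            if not metric_file.is_file():
                continue
            name = metric_file.relative_to(metrics_dir).as_posix()
            points: List[Tuple[int, float]] = []
            for line in metric_file.read_text().splitlines():
                parts = line.split()
                if len(parts) < 2:
                    continue  # skip blank or malformed lines
                step = int(parts[2]) if len(parts) > 2 else 0
                points.append((step, float(parts[1])))
            results[f"{run_id}/{name}"] = sorted(points)
    return results
```

For example, `read_mlruns_metrics("/tmp/nvflare/cifar10_sag_pt_mlflow/mlruns")` would return a dictionary keyed by run ID and metric name, provided the results were written to that location as described above. For anything beyond a quick look, the MLflow UI or the official MLflow client API is the more robust choice.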

