# FedAvg with MLflow tracking

In this example, we demonstrate FedAvg on the CIFAR10 dataset with MLflow tracking.

We will show how to add tracking capabilities to the previous example, [FedAvg with SAG workflow](../sag/sag.ipynb#title). Specifically, we will show how to add MLflow tracking.

For an overview of Federated Averaging and SAG, see the section from the previous example: [Understanding FedAvg and SAG](../sag/sag.ipynb#sag).

## Experiment tracking

In any machine learning or deep learning scenario, the goal is to obtain the best model after training.
An important part of that is monitoring convergence by keeping track of the metrics and losses as training proceeds.
There are many tracking tools available, for example TensorBoard, MLflow, and Weights and Biases.
NVFlare can integrate with these tools to send the metrics and losses from every client site back to the server site.
You can then monitor the whole federated learning process by interacting with the NVFlare server machine.

## Training code changes

You only need to import `MLflowWriter`, create an instance of it, and call the methods it provides.
For example:

```
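import nvflare.client as flare
from nvflare.client.tracking import MLflowWriter

# Initialize the NVFlare Client API before creating the writer
flare.init()
mlflow = MLflowWriter()

# Log a value under the metric key "loss"
mlflow.log_metric("loss", 0.2)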
```
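In a real training script, the writer is usually called from inside the training loop. The following is a minimal sketch of that pattern, not the code from `train_with_mlflow.py`. It assumes `log_metric` accepts an optional `step` argument, and it uses a hypothetical `RecordingWriter` stand-in (not part of NVFlare) so the snippet runs without a federated job. The helper `log_running_loss` is also an illustrative name.

```python
# Sketch: log the mean training loss every `log_every` batches.
# `writer` is any object with a log_metric(key, value, step) method;
# in the training script this would be the MLflowWriter instance.


class RecordingWriter:
    """Hypothetical stand-in that records calls instead of sending them."""

    def __init__(self):
        self.records = []

    def log_metric(self, key, value, step=None):
        self.records.append((key, value, step))


def log_running_loss(writer, batch_losses, log_every=2, epoch=0):
    """Log the mean loss of each consecutive group of `log_every` batches."""
    running_loss = 0.0
    n_batches = len(batch_losses)
    for i, loss in enumerate(batch_losses):
        running_loss += loss
        if (i + 1) % log_every == 0:
            # Use a step that keeps increasing across epochs
            global_step = epoch * n_batches + i
            writer.log_metric("loss", running_loss / log_every, step=global_step)
            running_loss = 0.0


writer = RecordingWriter()
log_running_loss(writer, [1.0, 0.5, 0.5, 0.25], log_every=2)
print(writer.records)
```

Passing the writer in as an argument keeps the loop independent of the tracking backend, so the same function could be called with the `MLflowWriter` created above.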

## Prepare Data

Make sure the CIFAR10 dataset is downloaded with the following script:

In [None]:
! python ../data/download.py --dataset_path /tmp/nvflare/data/cifar10

## Job Configuration

To configure experiment and metrics tracking, we need to add the following components to `config_fed_client.conf`:

1. A `MetricRelay` component, so that the metrics are sent to the server for gathering
2. Another `CellPipe` component for "metrics_exchange", which the `MetricRelay` component needs
3. An `ExternalConfigurator` component, so that the Client API can be initialized with the required information

Since the client side sends metrics and losses to the server side, the server side needs to receive this information. We need to add the following component to `config_fed_server.conf`:

1. `MLflowReceiver`

You can configure `tracking_uri` and the arguments `experiment_name`, `run_name`, `experiment_tags`, and `run_tags` of `MLflowReceiver`. To save results locally, leave `tracking_uri` empty (`""`) or, as in the job below, point it at a local `file://` path.


Let's first copy the required files over:

In [None]:
! cp ../code/fl/train_with_mlflow.py train_with_mlflow.py
! cp ../code/fl/net.py net.py

We can use the Job API to easily create a job and run it in the simulator:

In [None]:
from net import Net

from nvflare.app_opt.pt.job_config.fed_avg import FedAvgJob
from nvflare.app_opt.tracking.mlflow.mlflow_receiver import MLflowReceiver
from nvflare.job_config.script_runner import ScriptRunner

if __name__ == "__main__":
    n_clients = 2
    num_rounds = 5
    train_script = "train_with_mlflow.py"

    job = FedAvgJob(
        name="cifar10_fedavg",
        n_clients=n_clients,
        num_rounds=num_rounds,
        initial_model=Net()
    )
    
    job.to(
        MLflowReceiver(
            tracking_uri="file:///{WORKSPACE}/{JOB_ID}/mlruns",
            kw_args={
                "experiment_name": "nvflare-sag-pt-experiment",
                "run_name": "nvflare-sag-pt-with-mlflow"
            }
        ),
        "server"
    )

    # Add clients
    for i in range(n_clients):
        runner = ScriptRunner(
            script=train_script, script_args="--batch_size 6 --num_workers 2"
        )
        job.to(runner, f"site-{i+1}")

    job.export_job("/tmp/nvflare/jobs")
    job.simulator_run("/tmp/nvflare/jobs/workdir", gpu="0")


## Run Job

The previous cell exports the job config and executes the job in the NVFlare simulator.

If you want to run in a production system, you will need to submit the exported job folder to the NVFlare system.


## Check the results

After the experiment is finished, you can view the results in one of the following ways.

Please refer to the MLflow documentation for more information.

If `tracking_uri` is specified, you can go directly to the `tracking_uri` to view the results.

If `tracking_uri` is not specified, the results will be saved in `/tmp/nvflare/jobs/workdir/server/simulate_job/mlruns/`.

You can then run the MLflow command `mlflow ui --port 5000` inside the directory `/tmp/nvflare/jobs/workdir/server/simulate_job`.

You should then see something similar to the following screenshot:


<img src=mlflow.png width=95% height=95% />



In [None]:
!mlflow ui --port 5000 --backend-store-uri /tmp/nvflare/jobs/workdir/server/simulate_job/mlruns/


Make sure you stop the above cell when you are done reviewing the MLflow results.
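If you prefer to inspect the logged values without the UI, you can read them straight from the `mlruns` directory. The sketch below uses only the standard library and assumes MLflow's local file store layout, `mlruns/<experiment_id>/<run_id>/metrics/<metric_name>`, with one `timestamp value step` record per line. That layout is an assumption about MLflow's file backend and may differ between versions. The helper names and the metric name `loss` are illustrative.

```python
from pathlib import Path


def read_metric_history(metric_file):
    """Parse one metric file into a sorted list of (step, value) pairs."""
    history = []
    for line in Path(metric_file).read_text().splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        value = float(parts[1])
        # Fall back to step 0 if a line carries no step column
        step = int(parts[2]) if len(parts) > 2 else 0
        history.append((step, value))
    return sorted(history)


def collect_metrics(mlruns_dir, metric_name="loss"):
    """Return {run_id: [(step, value), ...]} for runs that logged metric_name."""
    results = {}
    for metric_file in Path(mlruns_dir).glob(f"*/*/metrics/{metric_name}"):
        run_id = metric_file.parent.parent.name
        results[run_id] = read_metric_history(metric_file)
    return results
```

For this example, you could call `collect_metrics("/tmp/nvflare/jobs/workdir/server/simulate_job/mlruns")` after the job finishes. Replace `loss` with whichever metric key your training script logs.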

Next we will look at the [sag_he](../sag_he/sag_he.ipynb) example, which demonstrates how to enable homomorphic encryption using the POC -he mode.