# Hello PyTorch with MLflow

In this example, we demonstrate how to use MLflow with the example code from hello-pt.



Example of using [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html) to train an image classifier using federated averaging ([FedAvg](https://arxiv.org/abs/1602.05629)) and [PyTorch](https://pytorch.org/) as the deep learning training framework.

This example also highlights the capability to stream experiment tracking data from the clients to the server and on to MLflow.

> **_NOTE:_** This example uses the [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset and will load its data within the trainer code.


### 1. Examples overview

There are two examples under `hello-pt-mlflow`:

* hello-pt-mlflow-job

  This example demonstrates how to use the MLflow API syntax to log metrics, parameters, text, and tags
  (`log_metric(s)`, `log_param(s)`, `log_text`, `set_tag(s)`). The experiment tracking logs are streamed to the
  federated server and handled by `MLFlowReceiver`, which then delivers them to the MLflow tracking server
  at the configured tracking URI.

  The example also shows that, with the same syntax, you can display the same tracking logs in TensorBoard
  instead, without any code changes.

* hello-pt-tb-mlflow-job

  In this example, the client uses the TensorBoard `SummaryWriter` API syntax (`add_scalar`, `add_scalars`).
  The same log events can be displayed in MLflow without changing the code. No MLflow tracking server is used.
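
The reason neither example needs client code changes can be sketched as follows, assuming a hypothetical `RecordingWriter` class (it is not part of NVIDIA FLARE): whichever syntax the client uses, each call is reduced to the same backend-neutral record, and only the server-side receiver decides whether that record ends up in MLflow or TensorBoard. The record layout and the `main_tag/key` naming used for `add_scalars` are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# (kind, key, value, step): an assumed, backend-neutral record layout
Record = Tuple[str, str, Any, int]


@dataclass
class RecordingWriter:
    """Hypothetical writer: NOT the NVIDIA FLARE sender, just an illustration."""

    records: List[Record] = field(default_factory=list)

    def _emit(self, kind: str, key: str, value: Any, step: int = 0) -> None:
        # Every API call becomes one backend-neutral record.
        self.records.append((kind, key, value, step))

    # --- MLflow-style syntax ---
    def log_metric(self, key: str, value: float, step: int = 0) -> None:
        self._emit("metric", key, value, step)

    def log_metrics(self, metrics: Dict[str, float], step: int = 0) -> None:
        for key, value in metrics.items():
            self.log_metric(key, value, step)

    def log_param(self, key: str, value: Any) -> None:
        self._emit("param", key, value)

    def set_tag(self, key: str, value: str) -> None:
        self._emit("tag", key, value)

    # --- TensorBoard-style syntax ---
    def add_scalar(self, tag: str, scalar_value: float, global_step: int = 0) -> None:
        self._emit("metric", tag, scalar_value, global_step)

    def add_scalars(self, main_tag: str, tag_scalar_dict: Dict[str, float], global_step: int = 0) -> None:
        # Assumed naming scheme for grouped scalars: "<main_tag>/<key>".
        for key, value in tag_scalar_dict.items():
            self._emit("metric", f"{main_tag}/{key}", value, global_step)


# Usage: both syntaxes yield identical records for the receiver.
mlflow_style = RecordingWriter()
mlflow_style.log_metric("train_loss", 0.42, step=1)

tb_style = RecordingWriter()
tb_style.add_scalar("train_loss", 0.42, global_step=1)

assert mlflow_style.records == tb_style.records
```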
    
    

    



### 2. Install NVIDIA FLARE

Follow the [Installation](https://nvflare.readthedocs.io/en/main/getting_started.html#installation) instructions.
Install additional requirements:


```bash
pip3 install torch torchvision tensorboard mlflow
```

### 3. hello-pt-mlflow-job
#### 3.1 Configuration

**Client Configuration**

```json
{
  "components": [
    {
      "id": "pt_learner",
      "path": "pt_learner.PTLearner",
      "args": {
        "lr": 0.01,
        "epochs": 5,
        "analytic_sender_id": "mlflow_sender"
      }
    },
    {
      "id": "mlflow_sender",
      "path": "nvflare.app_opt.tracking.mlflow.mlflow_sender.MLFlowSender",
      "args": {"event_type": "analytix_log_stats"}
    },
    {
      "id": "event_to_fed",
      "name": "ConvertToFedEvent",
      "args": {"events_to_convert": ["analytix_log_stats"], "fed_event_prefix": "fed."}
    }
  ]
}
```
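To use the MLflow API syntax, the learner must be registered with `MLFlowSender`: its `analytic_sender_id` argument refers to the `mlflow_sender` component. The `ConvertToFedEvent` component then converts the local `analytix_log_stats` events into federated events (prefixed with `fed.`) so that they are streamed to the server.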
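
As a quick sanity check, the following sketch parses the client configuration above with the standard `json` module and verifies the wiring just described: the learner's `analytic_sender_id` must match a component `id`, and that sender's `event_type` must be listed in `events_to_convert`. The function name `check_sender_wiring` and the `pt_learner` default are illustrative assumptions, not part of NVIDIA FLARE.

```python
import json
from typing import List

# Trimmed copy of the client configuration shown above.
CLIENT_CONFIG = """
{
  "components": [
    {"id": "pt_learner", "path": "pt_learner.PTLearner",
     "args": {"lr": 0.01, "epochs": 5, "analytic_sender_id": "mlflow_sender"}},
    {"id": "mlflow_sender",
     "path": "nvflare.app_opt.tracking.mlflow.mlflow_sender.MLFlowSender",
     "args": {"event_type": "analytix_log_stats"}},
    {"id": "event_to_fed", "name": "ConvertToFedEvent",
     "args": {"events_to_convert": ["analytix_log_stats"], "fed_event_prefix": "fed."}}
  ]
}
"""


def check_sender_wiring(config: dict, learner_id: str = "pt_learner") -> List[str]:
    """Return a list of wiring problems (empty means the wiring looks consistent)."""
    components = {c["id"]: c for c in config.get("components", [])}
    learner = components.get(learner_id)
    if learner is None:
        return [f"no component with id {learner_id!r}"]

    sender_id = learner.get("args", {}).get("analytic_sender_id")
    sender = components.get(sender_id)
    if sender is None:
        return [f"analytic_sender_id {sender_id!r} does not match any component id"]

    # Collect event types forwarded to the server. Only converters declared
    # by "name", as in this README, are recognized by this sketch.
    converted = set()
    for comp in components.values():
        if comp.get("name") == "ConvertToFedEvent":
            converted.update(comp.get("args", {}).get("events_to_convert", []))

    event_type = sender.get("args", {}).get("event_type")
    if event_type not in converted:
        return [f"event {event_type!r} is not converted to a federated event"]
    return []


print(check_sender_wiring(json.loads(CLIENT_CONFIG)))  # prints []
```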


**Server Configuration**

In addition to the usual training configuration, we need to add the following component to handle
the streamed events.

If an MLflow tracking server is used, specify its tracking URI in the `tracking_uri` argument.
If no MLflow tracking server is used, omit the tracking URI from the arguments.

**With tracking server**

```json
{
  "id": "mlflow_receiver",
  "path": "nvflare.app_opt.tracking.mlflow.mlflow_receiver.MLFlowReceiver",
  "args": {
    "kwargs": {"experiment_name": "hello-pt-experiments"},
    "artifact_location": "artifacts",
    "tracking_uri": "http://<tracking_server_host>:5000"
  }
}
```

**Without tracking server**

```json
{
  "id": "mlflow_receiver",
  "path": "nvflare.app_opt.tracking.mlflow.mlflow_receiver.MLFlowReceiver",
  "args": {
    "kwargs": {"experiment_name": "hello-pt-experiments"},
    "artifact_location": "artifacts"
  }
}
```
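
If you generate the server configuration programmatically, the rule above (add `tracking_uri` only when a tracking server exists) can be captured in a small helper. This is a sketch using only the standard library; the function name `mlflow_receiver_component` is hypothetical, and the argument values simply mirror the two snippets above.

```python
import json
from typing import Optional


def mlflow_receiver_component(experiment_name: str, tracking_uri: Optional[str] = None) -> dict:
    """Build the MLFlowReceiver component entry for the server configuration."""
    args = {
        "kwargs": {"experiment_name": experiment_name},
        "artifact_location": "artifacts",
    }
    if tracking_uri:
        # Only set tracking_uri when an MLflow tracking server is actually available.
        args["tracking_uri"] = tracking_uri
    return {
        "id": "mlflow_receiver",
        "path": "nvflare.app_opt.tracking.mlflow.mlflow_receiver.MLFlowReceiver",
        "args": args,
    }


# Without a tracking server
print(json.dumps(mlflow_receiver_component("hello-pt-experiments"), indent=2))
# With a tracking server (replace the host with your own)
print(json.dumps(mlflow_receiver_component("hello-pt-experiments", "http://localhost:5000"), indent=2))
```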

#### 3.2 MLflow Tracking Server

An MLflow tracking server can be set up and deployed separately. For example, an Azure ML workspace already has an MLflow tracking server set up; all you need is its tracking URI.

In this example, we set up a simple tracking server backed by a SQLite database:

```
mlflow server --backend-store-uri=sqlite:///mlrunsdb15.db  --host localhost --port 5000

```
Then open http://localhost:5000 in a browser to monitor the experiments while the job runs.
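
Because the job streams to the tracking server only once the server is up, it can help to wait for it before launching the simulator. The sketch below polls the tracking URL with the standard library until any HTTP response comes back. The function name, the timeout values, and the choice to treat any HTTP status as "up" are assumptions, and proxies are disabled because the server in this example runs on `localhost`.

```python
import time
import urllib.error
import urllib.request

# Bypass any HTTP proxy from the environment: the server here runs on localhost.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_tracking_server(url: str = "http://localhost:5000",
                             timeout: float = 30.0,
                             interval: float = 1.0) -> bool:
    """Return True once `url` answers with any HTTP response, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with _OPENER.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # An HTTP error status still means a server is listening.
            return True
        except (urllib.error.URLError, OSError):
            pass  # not reachable yet; retry until the deadline
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, call `wait_for_tracking_server("http://localhost:5000", timeout=60)` before starting the job, and only proceed when it returns `True`.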
 
 

#### 3.3 Run the experiment

Use the NVIDIA FLARE simulator to run the example, assuming `NVFLARE_HOME` is set and points to your GitHub clone of the NVFLARE code base.


```bash
cd $NVFLARE_HOME

nvflare simulator -w /tmp/nvflare/ -n 2 -t 2 examples/hello-pt-mlflow/hello-pt-mlflow-job
```
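
If you prefer to drive the run from Python (for example, from a notebook cell), the same command can be assembled and launched with `subprocess`. In the command above, `-w` sets the simulator workspace, `-n` the number of simulated clients, and `-t` the number of threads used to run them. The helper names below are hypothetical, and the sketch assumes `nvflare` is on `PATH` and `NVFLARE_HOME` is set.

```python
import os
import subprocess
from typing import List


def simulator_cmd(job_dir: str, workspace: str = "/tmp/nvflare/",
                  n_clients: int = 2, threads: int = 2) -> List[str]:
    """Assemble the `nvflare simulator` command used in this README."""
    return ["nvflare", "simulator", "-w", workspace,
            "-n", str(n_clients), "-t", str(threads), job_dir]


def run_simulator(job_name: str) -> int:
    """Run a job under examples/hello-pt-mlflow from $NVFLARE_HOME; return the exit code."""
    job_dir = os.path.join("examples", "hello-pt-mlflow", job_name)
    cmd = simulator_cmd(job_dir)
    return subprocess.run(cmd, cwd=os.environ["NVFLARE_HOME"], check=False).returncode


# Usage (requires NVFLARE installed and NVFLARE_HOME set):
# run_simulator("hello-pt-mlflow-job")
```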

#### 3.4 Running experiments without a tracking server or tracking URI

If no tracking server is available, omit the tracking URI from the receiver configuration:


```json
{
  "id": "mlflow_receiver",
  "path": "nvflare.app_opt.tracking.mlflow.mlflow_receiver.MLFlowReceiver",
  "args": {
    "kwargs": {"experiment_name": "hello-pt-experiments"},
    "artifact_location": "artifacts"
  }
}
```
Then run the experiment as before. Meanwhile, you can track its progress by pointing the MLflow UI at the `mlruns` directory in the simulator workspace (note that the workspace is `/tmp/nvflare`).

Run the following command from a terminal (it does not work when run from a notebook):
```
 mlflow ui --backend-store-uri=/tmp/nvflare/mlruns
 
[2023-01-05 15:30:38 -0800] [71735] [INFO] Starting gunicorn 20.1.0
[2023-01-05 15:30:38 -0800] [71735] [INFO] Listening at: http://127.0.0.1:5000 (71735)
[2023-01-05 15:30:38 -0800] [71735] [INFO] Using worker: sync
[2023-01-05 15:30:38 -0800] [71737] [INFO] Booting worker with pid: 71737
[2023-01-05 15:30:38 -0800] [71738] [INFO] Booting worker with pid: 71738
[2023-01-05 15:30:38 -0800] [71739] [INFO] Booting worker with pid: 71739
[2023-01-05 15:30:38 -0800] [71740] [INFO] Booting worker with pid: 71740

```

Then open http://127.0.0.1:5000 in a browser to check the results.
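
One likely reason the command does not work from a notebook is that `mlflow ui` runs in the foreground and blocks the cell until it exits. A possible workaround, sketched below under that assumption, is to start it as a background process with `subprocess.Popen` and stop it when you are done. The helper names are hypothetical; the store path and port mirror the command above.

```python
import subprocess
from typing import List


def mlflow_ui_cmd(store: str = "/tmp/nvflare/mlruns", port: int = 5000) -> List[str]:
    """Assemble the `mlflow ui` command shown above."""
    return ["mlflow", "ui", f"--backend-store-uri={store}", "--port", str(port)]


def start_mlflow_ui(store: str = "/tmp/nvflare/mlruns", port: int = 5000) -> subprocess.Popen:
    """Start the MLflow UI in the background so the calling cell is not blocked."""
    return subprocess.Popen(mlflow_ui_cmd(store, port),
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)


# Usage:
# ui = start_mlflow_ui()   # then browse to http://127.0.0.1:5000
# ui.terminate()           # stop the UI when finished
```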


#### 3.5 TensorBoard Receiver

In this example, we use `log_params()`, `log_text()`, `log_metrics()`, and `set_tags()` in various places in the code.
You should be able to see them in the MLflow UI at http://localhost:5000.

What happens if we replace the MLflow receiver with the TensorBoard receiver?

**Server Config**

Replace the following component
```
  {
      "id": "mlflow_receiver",
      "path": "nvflare.app_opt.tracking.mlflow.mlflow_receiver.MLFlowReceiver",
      "args": {
        "kwargs": {"experiment_name": "hello-pt-experiments"},
        "artifact_location": "artifacts",
        "tracking_uri" : "http://<tracking_server_host>:5000"
      }
    }
```
with
```json
{
  "id": "tb_analytics_receiver",
  "name": "TBAnalyticsReceiver",
  "args": {"events": ["fed.analytix_log_stats"]}
}
```
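
The swap can also be scripted. The following sketch takes a server configuration as a dictionary and replaces the component whose `id` is `mlflow_receiver` with the TensorBoard receiver entry above, leaving all other components untouched. The helper name `replace_component` is hypothetical, and loading and saving the server configuration file is left to you.

```python
import copy
from typing import Any, Dict

TB_RECEIVER = {
    "id": "tb_analytics_receiver",
    "name": "TBAnalyticsReceiver",
    "args": {"events": ["fed.analytix_log_stats"]},
}


def replace_component(config: Dict[str, Any], old_id: str,
                      new_component: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with component `old_id` replaced by `new_component`."""
    updated = copy.deepcopy(config)
    components = updated.get("components", [])
    for i, comp in enumerate(components):
        if comp.get("id") == old_id:
            components[i] = copy.deepcopy(new_component)
            return updated
    raise KeyError(f"no component with id {old_id!r}")


server_config = {
    "components": [
        {
            "id": "mlflow_receiver",
            "path": "nvflare.app_opt.tracking.mlflow.mlflow_receiver.MLFlowReceiver",
            "args": {"kwargs": {"experiment_name": "hello-pt-experiments"},
                     "artifact_location": "artifacts"},
        }
    ]
}
new_config = replace_component(server_config, "mlflow_receiver", TB_RECEIVER)
print([c["id"] for c in new_config["components"]])  # ['tb_analytics_receiver']
```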


Re-run the example without any code changes, then launch TensorBoard to view the results:

```
tensorboard --logdir=/tmp/nvflare/simulate_job/tb_events
```


### 4. hello-pt-tb-mlflow-job

This example is the same as hello-pt-tb, except that we add one more component to the server configuration:
**MLFlowReceiver**. It is added on top of the TensorBoard receiver that is already in place.

In other words, there are two receivers for the same tracking data written with the TensorBoard summary writer. The client code is unchanged.
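
Conceptually, each receiver independently subscribes to the same stream of federated analytics events, so adding a second receiver does not affect the first. The sketch below models that fan-out with a hypothetical in-memory `EventBus` and two stand-in receivers; it illustrates the idea only and is not NVIDIA FLARE's implementation. The event name `fed.analytix_log_stats` comes from the TensorBoard receiver configuration in section 3.5; that the MLflow receiver listens to the same event by default is an assumption here.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class EventBus:
    """Hypothetical in-memory event bus; NOT part of NVIDIA FLARE."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def fire(self, event_type: str, data: dict) -> None:
        # Every subscriber receives the same event data.
        for handler in self._handlers[event_type]:
            handler(data)


tb_received: List[dict] = []
mlflow_received: List[dict] = []

bus = EventBus()
bus.subscribe("fed.analytix_log_stats", tb_received.append)      # stand-in for the TensorBoard receiver
bus.subscribe("fed.analytix_log_stats", mlflow_received.append)  # stand-in for the MLflow receiver

bus.fire("fed.analytix_log_stats", {"key": "train_loss", "value": 0.42, "step": 1})
print(len(tb_received), len(mlflow_received))  # 1 1
```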

#### 4.1 Configuration

**Server Configuration**

```json
{
  "id": "mlflow_receiver",
  "path": "nvflare.app_opt.tracking.mlflow.mlflow_receiver.MLFlowReceiver",
  "args": {
    "kwargs": {"experiment_name": "hello-pt-experiments"},
    "artifact_location": "artifacts"
  }
}
```

#### 4.2 Run the experiment

Use the NVIDIA FLARE simulator to run the example, assuming `NVFLARE_HOME` is set and points to your GitHub clone of the NVFLARE code base.

```bash
cd $NVFLARE_HOME

nvflare simulator -w /tmp/nvflare/ -n 2 -t 2 examples/hello-pt-mlflow/hello-pt-tb-mlflow-job
```


#### 4.3 View Results

**Tensorboard View**

From a terminal:
```
tensorboard --logdir=/tmp/nvflare/simulate_job/tb_events
TensorFlow installation not found - running with reduced feature set.

NOTE: Using experimental fast data loading logic. To disable, pass
    "--load_fast=false" and report issues on GitHub. More details:
    https://github.com/tensorflow/tensorboard/issues/4784

Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.11.0 at http://localhost:6007/ (Press CTRL+C to quit)

```
Then open http://localhost:6007/ in a browser.

**MLFlow View**

From a terminal:

```
mlflow ui --backend-store-uri=/tmp/nvflare/mlruns
```

Then open http://localhost:5000/ in a browser.
