# Hello World Examples

Before NVIDIA FLARE 2.5, the Hello World Examples were presented in a notebook, with instructions on setting up a POC environment and then using the FLARE API to submit and follow each job. While this can still be done, the new Job API makes it simpler to get started.

The Job API makes it easy to create jobs programmatically and to run them with the FLARE Simulator.

Previously, the jobs were in a jobs folder in each example. Since each job is now created by the Job API, you just have to follow the Python script that creates the job in each example. At the end of each script is a line that runs the job with a specified workspace directory:

```
job.simulator_run("/tmp/nvflare/jobs/workdir")
```

Simply executing the script (for example `python fedavg_script_runner_hello-numpy.py`) will create and run the job with the FLARE Simulator.

## Running with POC mode and FLARE API

To run the examples in FLARE in POC mode using the [FLARE API](../tutorials/flare_api.ipynb) as before, export the job to a location of your choice and then submit the job from there:

```
job.export_job("/tmp/nvflare/exported/job/hello-fedavg-numpy")
```

The following instructions in this notebook guide you through the steps to set up FLARE in POC mode and then submit jobs with the FLARE API, assuming that you have run the `export_job()` command after creating the job with the script in each example's folder. In each `submit_job(job_folder)` command below, make sure that the job folder you are submitting matches the location you exported that job to.
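
One way to catch a mismatched path early is to check the exported folder before calling `submit_job()`. The following is a minimal sketch: `require_job_folder` is a hypothetical helper, not part of NVFlare, and it only verifies that the directory exists and is not empty.

```python
import os


def require_job_folder(job_folder: str) -> str:
    """Return the absolute job folder path, or raise if it is missing or empty.

    Hypothetical helper (not part of NVFlare): it does not validate the
    contents of the exported job, only that there is something to submit.
    """
    path = os.path.abspath(job_folder)
    if not os.path.isdir(path):
        raise FileNotFoundError(f"Exported job folder not found: {path}")
    if not os.listdir(path):
        raise ValueError(f"Exported job folder is empty: {path}")
    return path
```

It could then wrap the path passed to the session, for example `sess.submit_job(require_job_folder("/tmp/nvflare/exported/job/hello-fedavg-numpy"))`.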

Each example below is self-contained. You can start from any example, but you must run through the 3 steps of each example in sequence.

## Prerequisites
Before you can run the examples here, the following preparation work must be done:

1. Install a virtual environment by following the instructions in [README.md](https://github.com/NVIDIA/NVFlare/tree/main/examples)
2. Install Jupyter Lab and install a new kernel for the virtualenv called `nvflare_example`
3. Install NVFlare following this [notebook](../nvflare_setup.ipynb)
4. Start NVFlare in POC mode following this [notebook](../tutorials/setup_poc.ipynb). All the examples in this notebook require 2 clients to run.

## Hello World start scripts

To help you get started quickly, we prepared a set of scripts that start and stop NVFLARE in POC mode and capture the steps described above. These scripts must be run **from a terminal**.

In the terminal, you can use the following convenience script to create and enter a ``nvflare_example`` venv:

```../set_env.sh```

Then, making sure you are in the ``nvflare_example`` venv, run the ``./hw_pre_start.sh`` script to install NVFlare, provision, and start the FL system in POC mode.

If you encounter errors, **do not repeatedly run ./hw_pre_start.sh**. Instead, try shutting down the NVFLARE system using:

```
./hw_post_cleanup.sh 
```

You can check if the system is shut down cleanly with:

```
ps -eaf | grep nvflare
```

If you see output like the following, then the NVFLARE processes are still running:

```
510535    1932  1 18:54 pts/1    00:00:03 python3 -u -m nvflare.private.fed.app.client.client_train -m /tmp/workspace/example_project/prod_00/site-1/startup/.. -s fed_client.json --set secure_train=true uid=site-1 org=nvidia config_folder=config
510539    1932  1 18:54 pts/1    00:00:03 python3 -u -m nvflare.private.fed.app.client.client_train -m /tmp/workspace/example_project/prod_00/site-2/startup/.. -s fed_client.json --set secure_train=true uid=site-2 org=nvidia config_folder=config
510543    1932  1 18:54 pts/1    00:00:04 python3 -u -m nvflare.private.fed.app.server.server_train -m /tmp/workspace/example_project/prod_00/localhost/startup/.. -s fed_server.json --set secure_train=true org=nvidia config_folder=config
```

Make sure these processes have exited cleanly before you try to start NVFLARE again. You can use ``nvflare poc stop``, and kill the processes if needed.
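
The same check can be scripted. The sketch below filters `ps` output for the server and client module names that appear in the sample output above; `find_nvflare_processes` and `nvflare_still_running` are hypothetical helpers, and the module prefix is taken from that sample rather than from any NVFlare guarantee.

```python
import re
import subprocess

# Module prefix as it appears in the sample `ps` output above (an assumption).
_NVFLARE_MODULE = re.compile(r"-m (nvflare\.private\.fed\.app\.\S+)")


def find_nvflare_processes(ps_output: str) -> list:
    """Return the NVFlare module name found on each matching process line."""
    modules = []
    for line in ps_output.splitlines():
        match = _NVFLARE_MODULE.search(line)
        if match:
            modules.append(match.group(1))
    return modules


def nvflare_still_running() -> bool:
    """Run `ps -eaf` and report whether any NVFlare process is listed."""
    result = subprocess.run(["ps", "-eaf"], capture_output=True, text=True, check=True)
    return bool(find_nvflare_processes(result.stdout))
```

Unlike `ps -eaf | grep nvflare`, this does not match the `grep` command itself, because it looks for the `-m nvflare...` module argument.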



## Check FL System Status

If the NVFLARE system is up and running, we are ready to check the FL system status and run jobs.

**Warning**: this step will fail if the FL system is not running.


In [None]:
import os
import time

from nvflare.fuel.flare_api.flare_api import new_secure_session
from nvflare.fuel.flare_api.flare_api import NoConnection 

workspace = "/tmp/nvflare/poc"
default_poc_prepared_dir = os.path.join(workspace, "example_project/prod_00")
admin_dir = os.path.join(default_poc_prepared_dir, "admin@nvidia.com")

# The following try/except is usually not needed. It is here to handle "Run all cells" and notebook automation.
# In the "Run all cells" case, JupyterLab seems to try to connect to the server before it starts (even though
# execution is supposed to be sequential), which results in a connection timeout. We use try/except to capture
# this scenario, since extra sleep time alone doesn't seem to help.

try:
    sess = new_secure_session("admin@nvidia.com", admin_dir, timeout=5)
except NoConnection:
    time.sleep(10)

    
flare_not_ready = True
while flare_not_ready: 
    print("trying to connect to server")
    try:
        sess = new_secure_session("admin@nvidia.com", admin_dir)
    except NoConnection:
        print("CANNOT CONNECT AFTER 10 SECONDS")
        continue

    sys_info = sess.get_system_info()

    print(f"Server info:\n{sys_info.server_info}")
    print("\nClient info")
    for client in sys_info.client_info:
        print(client)
    flare_not_ready = len(sys_info.client_info) < 2
        
    time.sleep(2)
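
The loop above retries indefinitely. If you would rather give up after a fixed amount of time, a bounded wait is one option. This is a minimal sketch using only the standard library; `wait_until` is a hypothetical helper and the default timeout and interval are arbitrary.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Call check() every `interval` seconds until it returns True or `timeout` elapses.

    Returns True if check() succeeded, False if the deadline passed first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Here `check` could wrap the body of the loop: open a session, return whether `len(sess.get_system_info().client_info) >= 2`, and return `False` when `NoConnection` is raised.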


### Utilities

**Monitoring Job**

You can use a custom function to format the output for the `monitor_job()` command. Here is one example function to display the job information:

In [None]:
import json
from nvflare.fuel.flare_api.flare_api import Session

def status_monitor_cb(
        session: Session, job_id: str, job_meta, *cb_args, **cb_kwargs
    ) -> bool:
    if job_meta["status"] == "RUNNING":
        if cb_kwargs["cb_run_counter"]["count"] < 3 or cb_kwargs["cb_run_counter"]["count"]%15 == 0:
            print(job_meta)            
        else:
            # avoid printing job_meta repeatedly to save space on the screen and not overwhelm the user
            print(".", end="")
    else:
        print("\n" + str(job_meta))
    
    cb_kwargs["cb_run_counter"]["count"] += 1
    return True


def format_json(data: dict):
    # Helper function to format and print the output of list_jobs()
    print(json.dumps(data, sort_keys=True, indent=4, separators=(",", ": ")))
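
The callback's return value matters as well. To our understanding of the FLARE API, `monitor_job()` keeps polling while the callback returns `True` and stops when it returns `False`; treat that as an assumption and check the API documentation for your version. Under that assumption, a callback that stops monitoring after a maximum number of polls might look like this (`limited_monitor_cb` and the `max_polls` keyword are hypothetical names):

```python
def limited_monitor_cb(session, job_id: str, job_meta: dict, *cb_args, **cb_kwargs) -> bool:
    """Stop monitoring once the poll counter reaches `max_polls`.

    Expects cb_run_counter={"count": 0} and, optionally, max_polls=<int>
    to be passed through as keyword arguments.
    """
    counter = cb_kwargs["cb_run_counter"]
    max_polls = cb_kwargs.get("max_polls", 100)
    counter["count"] += 1
    if counter["count"] >= max_polls:
        print(f"\nStopped monitoring job {job_id} after {counter['count']} polls")
        return False  # assumed to end monitor_job()
    return True
```

It would be passed the same way as `status_monitor_cb`, for example `sess.monitor_job(job_id, cb=limited_monitor_cb, cb_run_counter={"count": 0}, max_polls=50)`. Stopping the monitor does not stop the job itself.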


## Hello FedAvg NumPy

The `hello-world/hello-fedavg-numpy` example showcases the FedAvg workflow. Refer to the [documentation](https://nvflare.readthedocs.io/en/main/examples/hello_fedavg_numpy.html) for details on the example.

### 1. Submit job using FLARE API

Start a FLARE API session and submit the `hello-fedavg-numpy` job that you exported:

In [None]:
import os
from nvflare.fuel.flare_api.flare_api import new_secure_session

poc_workspace = "/tmp/nvflare/poc"
poc_prepared = os.path.join(poc_workspace, "example_project/prod_00")
admin_dir = os.path.join(poc_prepared, "admin@nvidia.com")
sess = new_secure_session("admin@nvidia.com", startup_kit_location=admin_dir)

job_folder = os.path.join(os.getcwd(), "/tmp/nvflare/exported/job/hello-fedavg-numpy")
job_id = sess.submit_job(job_folder)

print(f"Job is running with ID {job_id}")
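
Note that the `os.getcwd()` prefix has no effect in the cell above: when a later argument to `os.path.join` is an absolute path, the earlier components are discarded. The sketch below shows this with `posixpath` (the POSIX implementation behind `os.path`) and a made-up working directory, so the result does not depend on where it is run.

```python
import posixpath

cwd = "/home/user/NVFlare/examples/hello-world"  # hypothetical working directory
exported = "/tmp/nvflare/exported/job/hello-fedavg-numpy"

# The absolute second argument replaces everything before it.
job_folder = posixpath.join(cwd, exported)
print(job_folder)  # /tmp/nvflare/exported/job/hello-fedavg-numpy
```

In other words, the job is always submitted from the exported location, regardless of the notebook's working directory.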

### 2. Wait for the job

The command `monitor_job()` will wait for the job until it is done.

In [None]:
sess.monitor_job(job_id, cb=status_monitor_cb, cb_run_counter={"count":0})

You can get additional information about the jobs on the server with `list_jobs()`, and the information for a specific job by passing its ``job_id`` to `get_job_meta()`.

In [None]:
list_jobs_output_detailed = sess.list_jobs(detailed=True)
format_json(list_jobs_output_detailed)  # format_json() prints the result itself and returns None

In [None]:
sess.get_job_meta(job_id)

### 3. Get the result


In [None]:
import numpy as np
result = sess.download_job_result(job_id)
array = np.load(result + "/workspace/models/server.npy")
print(array)

#### Clean up result directory

In [None]:
rm -r {result}
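
The `rm -r {result}` cell relies on IPython passing the command to a Unix shell. If you run these steps as a plain Python script, or on a platform without `rm`, the standard library offers an equivalent. A minimal sketch, where `remove_result_dir` is a hypothetical helper:

```python
import os
import shutil


def remove_result_dir(result: str) -> bool:
    """Delete the downloaded result directory; return True if something was removed."""
    if os.path.isdir(result):
        shutil.rmtree(result)
        return True
    return False
```

Calling `remove_result_dir(result)` a second time is harmless and simply returns `False`.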

## Hello Cross-Site Validation

The `hello-world/hello-cross-val` example demonstrates how to perform cross-site validation after training.

Please refer to the [documentation](https://nvflare.readthedocs.io/en/main/examples/hello_cross_val.html) for the details.

### 1. Submit job using FLARE API

Start a FLARE API session and submit the `hello-cross-val` job:

In [None]:
import os
from nvflare.fuel.flare_api.flare_api import new_secure_session

poc_workspace = "/tmp/nvflare/poc"
poc_prepared = os.path.join(poc_workspace, "example_project/prod_00")
admin_dir = os.path.join(poc_prepared, "admin@nvidia.com")
sess = new_secure_session("admin@nvidia.com", admin_dir)

job_folder = os.path.join(os.getcwd(), "/tmp/nvflare/exported/job/hello-cross-val")
job_id = sess.submit_job(job_folder)

print(f"Job is running with ID {job_id}")

### 2. Wait for the job

In [None]:
sess.get_job_meta(job_id)

In [None]:
sess.monitor_job(job_id, cb=status_monitor_cb, cb_run_counter={"count":0})

### 3. Get the result

In [None]:
import json
import pprint

result = sess.download_job_result(job_id)
with open(result + "/workspace/cross_site_val/cross_val_results.json", "r") as f:
  cross_val_result = json.load(f)

pp = pprint.PrettyPrinter(indent=2)
pp.pprint(cross_val_result)
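
For a more compact view than the pretty-printed dictionary, the nested result can be flattened into rows. This sketch assumes the JSON is laid out as `{data_site: {model_owner: {metric: value}}}`; that layout is an assumption, so compare it with your own `cross_val_results.json` first. The function name and the sample values are hypothetical.

```python
def flatten_cross_val(results: dict) -> list:
    """Flatten {data_site: {model_owner: {metric: value}}} into sorted row tuples."""
    rows = []
    for data_site, by_model in results.items():
        for model_owner, metrics in by_model.items():
            for metric, value in metrics.items():
                rows.append((data_site, model_owner, metric, value))
    return sorted(rows)


# Made-up values in the assumed layout; real results will differ.
example = {
    "site-1": {"server": {"accuracy": 0.5}, "site-2": {"accuracy": 0.4}},
    "site-2": {"server": {"accuracy": 0.6}},
}
for row in flatten_cross_val(example):
    print(row)
```

With real data you would call `flatten_cross_val(cross_val_result)` on the dictionary loaded in the cell above.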

#### Clean up result directory

In [None]:
rm -r {result}

## Hello Cyclic Weight Transfer

The `hello-world/hello-cyclic` example uses the CyclicController workflow to implement [Cyclic Weight Transfer](https://pubmed.ncbi.nlm.nih.gov/29617797/) with TensorFlow as the deep learning training framework.

To use this example, TensorFlow must be installed using the example's `requirements.txt`:

    pip install -r hello-world/hello-cyclic/requirements.txt

This example needs access to the [MNIST dataset](http://yann.lecun.com/exdb/mnist/).


In [None]:
! pwd

In [None]:
%pip install -r ../hello-world/hello-cyclic/requirements.txt    


### 1. Submit job using FLARE API

Start a FLARE API session and submit the `hello-cyclic` job:

In [None]:
import os
from nvflare.fuel.flare_api.flare_api import new_secure_session

poc_workspace = "/tmp/nvflare/poc"
poc_prepared = os.path.join(poc_workspace, "example_project/prod_00")
admin_dir = os.path.join(poc_prepared, "admin@nvidia.com")
sess = new_secure_session("admin@nvidia.com", admin_dir)

job_folder = os.path.join(os.getcwd(), "/tmp/nvflare/exported/job/location/hello-cyclic")
job_id = sess.submit_job(job_folder)
print(f"Job is running with ID {job_id}")

### 2. Wait for the job

In [None]:
sess.monitor_job(job_id)

### 3. Get the result

In [None]:
import pprint
from nvflare.app_opt.tf.utils import flat_layer_weights_dict
from tensorflow.keras import layers, models

result = sess.download_job_result(job_id)

class Net(models.Sequential):
    def __init__(self, input_shape=(None, 28, 28)):
        super().__init__()
        self._input_shape = input_shape
        self.add(layers.Flatten())
        self.add(layers.Dense(128, activation="relu"))
        self.add(layers.Dropout(0.2))
        self.add(layers.Dense(10))

model = Net()
model.build(input_shape=(None, 28, 28))
model.load_weights(result + "/workspace/app_server/tf_model.weights.h5")
model.summary()

layer_weights_dict = flat_layer_weights_dict({layer.name: layer.get_weights() for layer in model.layers})

pp = pprint.PrettyPrinter(indent=4)
pp.pprint(layer_weights_dict)

#### Clean up result directory

In [None]:
rm -r {result}

## Hello PyTorch

The `hello-world/hello-pt` example demonstrates how to use NVFlare with the popular deep learning framework PyTorch.

Refer to the [documentation](https://nvflare.readthedocs.io/en/main/examples/hello_pt.html) for details.

To use this example, PyTorch must be installed using the example's `requirements.txt`:

    pip install -r hello-world/hello-pt/requirements.txt

This example also needs access to the CIFAR10 dataset.


In [None]:
%pip install -r ../hello-world/hello-pt/requirements.txt    

### 1. Submit job using FLARE API

Start a FLARE API session and submit the `hello-pt` job:

In [None]:
import os
from nvflare.fuel.flare_api.flare_api import new_secure_session

poc_workspace = "/tmp/nvflare/poc"
poc_prepared = os.path.join(poc_workspace, "example_project/prod_00")
admin_dir = os.path.join(poc_prepared, "admin@nvidia.com")
sess = new_secure_session("admin@nvidia.com", admin_dir)

job_folder = os.path.join(os.getcwd(), "/tmp/nvflare/exported/job/hello-pt")
job_id = sess.submit_job(job_folder)

print(f"Job is running with ID {job_id}")

### 2. Wait for the job

In [None]:
sess.monitor_job(job_id)

### 3. Get the result

In [None]:
import os
import pprint
import torch

print("this will take a bit of time")
result = sess.download_job_result(job_id)
model_path = os.path.join(result, "workspace/app_server/FL_global_model.pt")

model = torch.load(model_path)

pp = pprint.PrettyPrinter(indent=4)
pp.pprint(model)

#### Clean up result directory

In [None]:
rm -r {result}

## Hello TensorFlow

The `examples/hello-world/hello-tf` example demonstrates how to use NVFlare with the popular deep learning framework TensorFlow.

Refer to the [documentation](https://nvflare.readthedocs.io/en/main/examples/hello_tf.html) for details.

To use this example, TensorFlow must be installed using the example's `requirements.txt`:

    python3 -m pip install -r hello-tf/requirements.txt

This example also needs access to the [MNIST dataset](http://yann.lecun.com/exdb/mnist/).

In [None]:
%pip install -r hello-tf/requirements.txt

#### Running TensorFlow on a local host with GPU

Before running the TensorFlow job, be aware of how it runs here. We are running 1 server and 2 sites on one local machine, which means three processes are involved in this federated training.
If the local host has a GPU, you might hit an out-of-memory (OOM) error because of the way TensorFlow consumes GPU memory. By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to `CUDA_VISIBLE_DEVICES`) visible to the process. With multiple processes, some of them will run out of memory. To keep multiple processes from grabbing all GPU memory, use the options described in [Limiting GPU memory growth](https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth).

In our case, we prefer that each process allocates only a subset of the available memory, or grows its memory usage only as needed. TensorFlow provides two methods to control this, as described in the link above.

In this example, we explicitly set the environment variable `TF_FORCE_GPU_ALLOW_GROWTH` to `true` at the very beginning of the trainer.py file, which runs in the clients and allocates GPU memory for training. With the environment variable set, TensorFlow will not grab the entire GPU memory, which avoids GPU OOM errors when running POC on a local host.

Note that setting `TF_FORCE_GPU_ALLOW_GROWTH` inside this notebook has no effect, because the POC clients have already started and their environment variables were fixed at start time.
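
As an illustration of the environment-variable approach, the top of a training script might look like the sketch below. It is not copied from the example's trainer.py; the only thing it relies on is that the variable is set before TensorFlow is imported in the process that trains.

```python
import os

# Set this before `import tensorflow` in the training (client) process.
# Setting it in a process that has already initialized TensorFlow,
# or in this notebook, does not affect clients that are already running.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

print(os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])  # true

# import tensorflow as tf  # would follow here in a real training script
```

The TensorFlow guide linked above also describes a programmatic alternative, `tf.config.experimental.set_memory_growth`, which must likewise be called before the GPUs are initialized.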


### 1. Submit job using FLARE API

Start a FLARE API session and submit the `hello-tf` job.

This time, we also tail the server log.

In [None]:
import os
from nvflare.fuel.flare_api.flare_api import new_secure_session

poc_workspace = "/tmp/nvflare/poc"
poc_prepared = os.path.join(poc_workspace, "example_project/prod_00")
admin_dir = os.path.join(poc_prepared, "admin@nvidia.com")
sess = new_secure_session("admin@nvidia.com", admin_dir)

job_folder = os.path.join(os.getcwd(), "/tmp/nvflare/exported/job/hello-tf")
job_id = sess.submit_job(job_folder)                          
print(f"Job is running with ID {job_id}")

In [None]:
! tail -100 /tmp/nvflare/poc/example_project/prod_00/server/log.txt

In [None]:
import json

list_jobs_output = sess.list_jobs()
format_json(list_jobs_output)  # format_json() prints the result itself and returns None


### 2. Wait for the job

In [None]:
sess.monitor_job(job_id)

### 3. Get the result

In [None]:
import pprint
from nvflare.app_opt.tf.utils import flat_layer_weights_dict
from tensorflow.keras import layers, models

result = sess.download_job_result(job_id)

class Net(models.Sequential):
    def __init__(self, input_shape=(None, 28, 28)):
        super().__init__()
        self._input_shape = input_shape
        self.add(layers.Flatten())
        self.add(layers.Dense(128, activation="relu"))
        self.add(layers.Dropout(0.2))
        self.add(layers.Dense(10))

model = Net()
model.build(input_shape=(None, 28, 28))
model.load_weights(result + "/workspace/app_server/tf_model.weights.h5")
model.summary()

layer_weights_dict = flat_layer_weights_dict({layer.name: layer.get_weights() for layer in model.layers})

pp = pprint.PrettyPrinter(indent=4)
pp.pprint(layer_weights_dict)

#### Clean up result directory

In [None]:
rm -r {result}

## Cleanup
We need to shut down the NVFLARE system and clean up the POC workspace. You can either change the cells below from markdown cells to code cells and run them, or simply execute the following from a **terminal**:

```./hw_post_cleanup.sh```

The shutdown script essentially does the following:


```! nvflare poc stop```

```! nvflare poc clean```
