# Hello World Examples

In this notebook, we will walk you through some Hello World examples in `NVFlare/examples/hello-world` to help you get familiar with the basic workflow of NVIDIA FLARE.

We will run the examples in FLARE's POC mode using the [FLARE API](../tutorials/flare_api.ipynb). You can also run these examples in the [FLARE simulator](../tutorials/flare_simulator.ipynb).

Each example below is self-contained. You can start from any example, but you must run through the 3 steps of each example in sequence.

## Prerequisites
Before you can run the examples here, the following preparation work must be done:

1. Install a virtualenv following the instructions in [README.md](https://github.com/NVIDIA/NVFlare/tree/dev/examples)
2. Install Jupyter Lab and install a new kernel for the virtualenv called `nvflare_example`
3. Install NVFlare following this [notebook](../nvflare_setup.ipynb)
4. Start NVFlare in POC mode following this [notebook](../tutorials/setup_poc.ipynb). All the examples in this notebook require 2 clients to run.
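
Before moving on, it can help to confirm that the notebook kernel actually sees the packages installed in the steps above. The following is a minimal sketch, not part of NVFlare: `missing_packages` is a hypothetical helper, and the package names passed to it are only the ones the first examples in this notebook import.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be imported in this kernel."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means both packages are visible to the current kernel.
print(missing_packages(["nvflare", "numpy"]))
```

If the list is not empty, the notebook is probably running on a kernel other than `nvflare_example`.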



## Hello World start scripts

To help you get started quickly, we provide scripts that start and stop NVFLARE in POC mode. They capture the steps described in the documentation above, but you must run them **from a terminal**.

Once you are in the terminal, make sure you are in the ```nvflare_example``` venv. You can set this up with

```../set_env.sh```

Then you can run

```./hw_pre_start.sh```


to start the FL system in POC mode. If the script runs successfully, you should see output like the following:

```
  < ...skip output ...>

2023-03-31 20:17:55,769 - FederatedClient - INFO - Got engine after 0.5007579326629639 seconds
2023-03-31 20:17:55,769 - FederatedClient - INFO - Got the new primary SP: grpc://localhost:8002

trying to connect to server
Server info:
status: stopped, start_time: Fri Mar 31 20:17:47 2023

Client info
site-1(last_connect_time: Fri Mar 31 20:18:02 2023)
site-2(last_connect_time: Fri Mar 31 20:18:05 2023)
ready to go
 
```

If you see **```ready to go```**, you can go back to the notebook and run the jobs. 

If you get errors, **avoid running ./hw_pre_start.sh repeatedly**. First try to shut down the NVFLARE system using

```
  ./hw_post_cleanup.sh 
```

You can then check whether the NVFLARE system has shut down cleanly:

```
     ps -eaf | grep nvflare
     
```
If you see output like the following, NVFLARE processes are still running:

```

510535    1932  1 18:54 pts/1    00:00:03 python3 -u -m nvflare.private.fed.app.client.client_train -m /tmp/workspace/example_project/prod_00/site-1/startup/.. -s fed_client.json --set secure_train=true uid=site-1 org=nvidia config_folder=config
510539    1932  1 18:54 pts/1    00:00:03 python3 -u -m nvflare.private.fed.app.client.client_train -m /tmp/workspace/example_project/prod_00/site-2/startup/.. -s fed_client.json --set secure_train=true uid=site-2 org=nvidia config_folder=config
510543    1932  1 18:54 pts/1    00:00:04 python3 -u -m nvflare.private.fed.app.server.server_train -m /tmp/workspace/example_project/prod_00/localhost/startup/.. -s fed_server.json --set secure_train=true org=nvidia config_folder=config

```
Make sure these processes are gone before you try to start NVFLARE again. Kill them if needed.
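
The same check can be done from Python. The sketch below filters `ps -eaf` output for the module prefix that appears in the listing above. The helper names are hypothetical, and the marker string is taken from that listing, not from any documented NVFlare interface, so treat it as an assumption.

```python
import subprocess

# Module prefix seen in the process listing above (assumed to be stable).
MARKER = "nvflare.private.fed.app"


def nvflare_processes(ps_output: str) -> list:
    """Return the lines of `ps -eaf` output that mention an NVFlare server or client."""
    return [line for line in ps_output.splitlines() if MARKER in line]


def running_nvflare_processes() -> list:
    """Run `ps -eaf` on a Unix-like system and keep only the NVFlare lines."""
    result = subprocess.run(["ps", "-eaf"], capture_output=True, text=True, check=True)
    return nvflare_processes(result.stdout)


# Demo on canned text so the filter can be tried without a running system.
sample = (
    "510535 1932 1 18:54 pts/1 00:00:03 python3 -u -m nvflare.private.fed.app.client.client_train\n"
    "510600 1932 0 18:55 pts/1 00:00:00 bash\n"
)
print(len(nvflare_processes(sample)))
```

Calling `running_nvflare_processes()` after `./hw_post_cleanup.sh` should return an empty list once everything has stopped.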



## Check FL System Status

If the NVFLARE system is up and running, then we are ready to check FL system status and run jobs.  

**Warning**: this step will fail if the FL system is not running.


In [None]:
import os
import time

from nvflare.fuel.flare_api.flare_api import new_insecure_session
from nvflare.fuel.flare_api.flare_api import NoConnection 

workspace = "/tmp/nvflare/poc"
admin_dir = os.path.join(workspace, "admin")

# the following try/except is usually not needed, we need it here to handle the case when you "Run all cells" or use notebook automation. 
# in the "Run all cells" case, JupyterLab seems to try to connect to the server before it starts (even though the execution is supposed to be sequential),
# which will result in a connection timeout. We use try/except to capture the scenario since extra sleep time doesn't seem to help.

try:
    sess = new_insecure_session(admin_dir, timeout=5)
except NoConnection:
    time.sleep(10)

    
flare_not_ready = True
while flare_not_ready: 
    print("trying to connect to server")
    sess = new_insecure_session(admin_dir)
    sys_info = sess.get_system_info()

    print(f"Server info:\n{sys_info.server_info}")
    print("\nClient info")
    for client in sys_info.client_info:
        print(client)
    flare_not_ready = len( sys_info.client_info) < 2
        
    time.sleep(2)


### Utilities

**Monitoring Job**

You can choose your own monitoring output. Here is one callback function that displays the job information:
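
The callback in the next cell throttles its output while the job is running: it prints the full job metadata on the first three invocations and on every 15th one, and a dot otherwise. The sketch below isolates that rule so it is easy to see which invocations print in full. `should_print_full` is a hypothetical name used only here.

```python
def should_print_full(count: int, first: int = 3, every: int = 15) -> bool:
    """True on the first `first` invocations and on every `every`-th invocation."""
    return count < first or count % every == 0


# Invocation counters (starting at 0) that print the full job metadata.
print([n for n in range(31) if should_print_full(n)])  # [0, 1, 2, 15, 30]
```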

In [None]:
import json
from nvflare.fuel.flare_api.flare_api import Session

def status_monitor_cb(
        session: Session, job_id: str, job_meta, *cb_args, **cb_kwargs
    ) -> bool:
    if job_meta["status"] == "RUNNING":
        if cb_kwargs["cb_run_counter"]["count"] < 3 or cb_kwargs["cb_run_counter"]["count"]%15 == 0:
            print(job_meta)            
        else: 
            print(".", end="")
    else:
        print("\n" + str(job_meta))
    
    cb_kwargs["cb_run_counter"]["count"] += 1
    return True



def format_json(data: dict):
    print(json.dumps(data, sort_keys=True, indent=4, separators=(',', ': ')))


## Hello Scatter and Gather

The example job in `hello-world/hello-numpy-sag/jobs/hello-numpy-sag` demonstrates the scatter and gather workflow. See the [documentation](https://nvflare.readthedocs.io/en/2.3/examples/hello_scatter_and_gather.html#hello-scatter-and-gather) for the details of the example.

### 1. Submit job using FLARE API

Start a FLARE API session and submit the `hello-numpy-sag` job.
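
The job folder in the next cell is built from the current working directory, so submission fails if the notebook was started from somewhere else. A minimal pre-flight check might look like the sketch below. `check_job_folder` is a hypothetical helper, and it assumes that a job folder has a `meta.json` at its top level.

```python
import os
import tempfile


def check_job_folder(job_folder: str) -> list:
    """Return a list of problems; an empty list means the folder looks submittable."""
    problems = []
    if not os.path.isdir(job_folder):
        problems.append(f"not a directory: {job_folder}")
    elif not os.path.isfile(os.path.join(job_folder, "meta.json")):
        # Assumption: job folders carry a meta.json at the top level.
        problems.append(f"no meta.json in {job_folder}")
    return problems


# Demo on a throwaway directory instead of a real job folder.
with tempfile.TemporaryDirectory() as tmp:
    before = check_job_folder(tmp)  # meta.json is missing
    with open(os.path.join(tmp, "meta.json"), "w") as f:
        f.write("{}")
    after = check_job_folder(tmp)  # the check now passes
print(before, after)
```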

In [None]:
import os
from nvflare.fuel.flare_api.flare_api import new_insecure_session

poc_workspace = "/tmp/nvflare/poc"
admin_dir = os.path.join(poc_workspace, "admin")
sess = new_insecure_session(admin_dir)

job_folder = os.path.join(os.getcwd(), "hello-numpy-sag/jobs/hello-numpy-sag")
job_id = sess.submit_job(job_folder)

print(f"Job is running with ID {job_id}")

### 2. Wait for the job

The `monitor_job()` call waits until the job is done.
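
Conceptually, waiting for a job is a polling loop with an optional timeout. The sketch below illustrates that idea only. It is not how `monitor_job()` is implemented, `wait_for` is a hypothetical helper, and the status source is a stand-in iterator, not a FLARE session. Only the `RUNNING` status is taken from this notebook; `DONE` is made up.

```python
import time


def wait_for(get_status, is_done, poll_interval=2.0, timeout=None):
    """Poll get_status() until is_done(status) is true and return that status.

    Raises TimeoutError if `timeout` seconds elapse first.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        status = get_status()
        if is_done(status):
            return status
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in status source: RUNNING twice, then a made-up terminal status.
fake_statuses = iter(["RUNNING", "RUNNING", "DONE"])
final = wait_for(lambda: next(fake_statuses), lambda s: s != "RUNNING", poll_interval=0.01)
print(final)
```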

In [None]:
list_jobs_output_detailed = sess.list_jobs(detailed=True)
# format_json() prints the formatted JSON itself and returns None, so call it directly
format_json(list_jobs_output_detailed)

In [None]:
sess.get_job_meta(job_id)

In [None]:
sess.monitor_job(job_id, cb=status_monitor_cb, cb_run_counter={"count":0})

### 3. Get the result
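
The next cell expects the model at `workspace/models/server.npy` inside the downloaded result directory. If a file is not where you expect it, listing the directory tree is a quick way to find it. The sketch below uses a hypothetical helper, `list_files`, and runs on a throwaway tree that mimics that layout.

```python
import os
import tempfile


def list_files(root: str) -> list:
    """Return every file under `root` as a sorted list of paths relative to `root`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)


# Demo on a temporary tree standing in for the downloaded result directory.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "workspace", "models"))
    open(os.path.join(tmp, "workspace", "models", "server.npy"), "wb").close()
    demo = list_files(tmp)
print(demo)
```

With a real job, `list_files(result)` can be called right after `download_job_result()`.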


In [None]:
import numpy as np
result = sess.download_job_result(job_id)
array = np.load(result + "/workspace/models/server.npy")
print(array)

#### Clean up result directory
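
The cell below removes the directory with the shell command `rm -r`. If you prefer to stay in Python, for example on a platform without `rm`, `shutil.rmtree` does the same job. This is a minimal sketch: `remove_result_dir` is a hypothetical helper, and a temporary directory stands in for `result`.

```python
import os
import shutil
import tempfile


def remove_result_dir(path: str) -> bool:
    """Delete `path` recursively; return True if a directory was removed."""
    if not os.path.isdir(path):
        return False
    shutil.rmtree(path)
    return True


# Demo on a temporary directory standing in for `result`.
demo_dir = tempfile.mkdtemp()
removed = remove_result_dir(demo_dir)
print(removed, os.path.exists(demo_dir))  # True False
```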

In [None]:
rm -r {result}

## Hello Cross-Site Validation

The example job in `hello-world/hello-numpy-cross-val/jobs/hello-numpy-cross-val` demonstrates how to perform cross-site validation after training.

Please refer to the [documentation](https://nvflare.readthedocs.io/en/2.3/examples/hello_cross_val.html) for the details.

### 1. Submit job using FLARE API

Start a FLARE API session and submit the `hello-numpy-cross-val` job.

In [None]:
import os
from nvflare.fuel.flare_api.flare_api import new_insecure_session

poc_workspace = "/tmp/nvflare/poc"
admin_dir = os.path.join(poc_workspace, "admin")
sess = new_insecure_session(admin_dir)

job_folder = os.path.join(os.getcwd(), "hello-numpy-cross-val/jobs/hello-numpy-cross-val")
job_id = sess.submit_job(job_folder)

print(f"Job is running with ID {job_id}")

### 2. Wait for the job

In [None]:
sess.get_job_meta(job_id)

In [None]:
sess.monitor_job(job_id, cb=status_monitor_cb, cb_run_counter={"count":0})

### 3. Get the result

In [None]:
import json
import pprint

result = sess.download_job_result(job_id)
with open(result + "/workspace/cross_site_val/cross_val_results.json", "r") as f:
  cross_val_result = json.load(f)

pp = pprint.PrettyPrinter(indent=2)
pp.pprint(cross_val_result)

#### Clean up result directory

In [None]:
rm -r {result}

## Hello Cyclic Weight Transfer

This example uses the CyclicController workflow to implement [Cyclic Weight Transfer](https://pubmed.ncbi.nlm.nih.gov/29617797/) with TensorFlow as the deep learning training framework. The job is `hello-world/hello-cyclic/jobs/hello-cyclic`.

To use this example, TensorFlow must be installed using the `requirements.txt`:

    pip install -r hello-world/hello-cyclic/requirements.txt
    
This example needs access to the [MNIST dataset](http://yann.lecun.com/exdb/mnist/).


In [None]:
! pwd

In [None]:
%pip install -r ../hello-world/hello-cyclic/requirements.txt    


### 1. Submit job using FLARE API

Start a FLARE API session and submit the hello-cyclic job.

In [None]:
import os
from nvflare.fuel.flare_api.flare_api import new_insecure_session

poc_workspace = "/tmp/nvflare/poc"
admin_dir = os.path.join(poc_workspace, "admin")
sess = new_insecure_session(admin_dir)

job_folder = os.path.join(os.getcwd(), "hello-cyclic/jobs/hello-cyclic")
job_id = sess.submit_job(job_folder)
print(f"Job is running with ID {job_id}")

### 2. Wait for the job

In [None]:
sess.monitor_job(job_id)

### 3. Get the result

In [None]:
from nvflare.fuel.utils import fobs
from nvflare.app_common.decomposers import common_decomposers
import pprint

# This example stores NumPy arrays in FOBS format. Decomposers for NumPy are not registered automatically.
common_decomposers.register()

result = sess.download_job_result(job_id)
with open(result + "/workspace/app_server/tf2weights.fobs", "rb") as f:
    data = f.read()  # named `data` to avoid shadowing the built-in `bytes`

weights = fobs.loads(data)

pp = pprint.PrettyPrinter(indent=4)
pp.pprint(weights)

#### Clean up result directory

In [None]:
rm -r {result}

## Hello PyTorch

This example demonstrates how to use NVFlare with the popular deep learning framework PyTorch. The job is `hello-world/hello-pt/jobs/hello-pt`.

Refer to the [documentation](https://nvflare.readthedocs.io/en/2.3/examples/hello_pt.html) for details.

To use this example, PyTorch must be installed using the `requirements.txt`,

    pip install -r hello-world/hello-pt/requirements.txt
    
This example also needs access to the CIFAR10 dataset.


In [None]:
%pip install -r ../hello-world/hello-pt/requirements.txt    

### 1. Submit job using FLARE API

Start a FLARE API session and submit the hello-pt job.

In [None]:
import os
from nvflare.fuel.flare_api.flare_api import new_insecure_session

poc_workspace = "/tmp/nvflare/poc"
admin_dir = os.path.join(poc_workspace, "admin")
sess = new_insecure_session(admin_dir)

job_folder = os.path.join(os.getcwd(), "hello-pt/jobs/hello-pt")
job_id = sess.submit_job(job_folder)

print(f"Job is running with ID {job_id}")

### 2. Wait for the job

In [None]:
sess.monitor_job(job_id)

### 3. Get the result

In [None]:
import os
import pprint
import torch

print("this will take a bit of time")
result = sess.download_job_result(job_id)
model_path = os.path.join(result, "workspace/app_server/FL_global_model.pt")

model = torch.load(model_path)

pp = pprint.PrettyPrinter(indent=4)
pp.pprint(model)

#### Clean up result directory

In [None]:
rm -r {result}

## Hello TensorFlow 2

This example demonstrates how to use NVFlare with the popular deep learning framework TensorFlow 2. The job is `examples/hello-world/hello-tf2/jobs/hello-tf2`.

Refer to the [documentation](https://nvflare.readthedocs.io/en/2.3/examples/hello_tf2.html) for details.

To use this example, TensorFlow must be installed using the `requirements.txt`:

    python3 -m pip install -r hello-tf2/requirements.txt
    
This example also needs access to the [MNIST dataset](http://yann.lecun.com/exdb/mnist/).

In [None]:
%pip install -r hello-tf2/requirements.txt

#### Running TensorFlow on a local host with GPU 

Before running the TensorFlow job, be aware of how the job is run here. 
We are running 1 server and 2 sites on one local machine, so three processes are involved in this federated training. 
If the local host has a GPU, you might hit an out-of-memory (OOM) error because of the way TensorFlow consumes GPU memory. By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process. With multiple processes, some of them will run out of memory. To avoid multiple processes grabbing all GPU memory in TF, use the options described in [Limiting GPU memory growth](https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth). 

In our case, we prefer that each process allocates only a subset of the available memory, or grows its memory usage only as needed. TensorFlow provides two methods to control this. 
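
When giving each process a fixed subset of memory, the limit has to be small enough that all processes fit on the card together. The sketch below is back-of-the-envelope arithmetic only. `per_process_limit_mb` is a hypothetical helper, the 10% reserve is an arbitrary assumption and not a TensorFlow recommendation, and the 16 GB card is made up for the demo.

```python
def per_process_limit_mb(total_mb: int, processes: int, reserve_fraction: float = 0.1) -> int:
    """Split GPU memory evenly across processes, holding back a safety reserve."""
    if processes < 1:
        raise ValueError("processes must be at least 1")
    usable = total_mb * (1.0 - reserve_fraction)
    return int(usable // processes)


# A hypothetical 16 GB card shared by three processes that use the GPU.
print(per_process_limit_mb(16 * 1024, processes=3))  # 4915
```

A value computed this way could be passed as `memory_limit`, in MB, in a logical device configuration.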

The first method is to set the environment variable TF_FORCE_GPU_ALLOW_GROWTH to true, as in the `%env` cell below. This configuration is platform specific. 
The second method is to limit GPU memory in code, as in the cell after it.

In [None]:
%env TF_FORCE_GPU_ALLOW_GROWTH=true

In [None]:
import tensorflow as tf
gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
  try:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)

### 1. Submit job using FLARE API

Start a FLARE API session and submit the hello-tf2 job.

This time, we tail the server log

In [None]:
import os
from nvflare.fuel.flare_api.flare_api import new_insecure_session

poc_workspace = "/tmp/nvflare/poc"
admin_dir = os.path.join(poc_workspace, "admin")
sess = new_insecure_session(admin_dir)

job_folder = os.path.join(os.getcwd(), "hello-tf2/jobs/hello-tf2")
job_id = sess.submit_job(job_folder)                          
print(f"Job is running with ID {job_id}")

In [None]:
! tail -100 /tmp/nvflare/poc/server/log.txt

In [None]:
import json

list_jobs_output = sess.list_jobs()
# format_json() prints the formatted JSON itself and returns None, so call it directly
format_json(list_jobs_output)


### 2. Wait for the job

In [None]:
sess.monitor_job(job_id)

### 3. Get the result

In [None]:
from nvflare.fuel.utils import fobs
from nvflare.app_common.decomposers import common_decomposers
import pprint

common_decomposers.register()
result = sess.download_job_result(job_id)
with open(result + "/workspace/app_server/tf2weights.fobs", "rb") as f:
    data = f.read()  # named `data` to avoid shadowing the built-in `bytes`

weights = fobs.loads(data)

pp = pprint.PrettyPrinter(indent=4)
pp.pprint(weights)

#### Clean up result directory

In [None]:
rm -r {result}

## Cleanup
We need to shut down the NVFLARE system and clean up the POC workspace. This can be done with the following steps. 
You can change the cell from a markdown cell into a code cell, 
or you can simply execute the following from a **terminal**:

```hw_post_cleanup.sh```

The shutdown script basically does the following:


```! nvflare poc --stop```

```! nvflare poc --clean```
