# NVIDIA FLARE with Docker

This notebook shows how to use [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html) with [Docker](https://www.docker.com/).

Please make sure you set up a virtual environment and install JupyterLab following the [example root readme](../../README.md).

Also, make sure that you have cloned the [NVFlare](https://github.com/NVIDIA/NVFlare) repository so you have the source code for building Docker images with NVFlare.

## Building a Docker Image

### Building a Docker Image with a Dockerfile
In the folder containing this example, there is a Dockerfile:

In [None]:
cat Dockerfile

Note that this Dockerfile uses the [NVIDIA PyTorch](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) base container and then clones the NVFlare repository while updating and installing the basic dependencies and specified version of NVFlare.

To build a Docker image named `nvflare-pt-docker` with this Dockerfile, use the following command. If the base image still needs to be downloaded, consider running the command in a separate terminal instead: the notebook keeps appending the download progress to the cell output, which can use up a lot of memory.

In [None]:
! docker build -t nvflare-pt-docker . -f Dockerfile

You can check that the Docker image has been built and exists with:

In [None]:
! docker images

In this example Dockerfile, we are using the [NVIDIA PyTorch](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) base container. You can use other base containers if you need other dependencies.

### Dependencies of Base Images
Please note that there may be issues if the base image you are using ships dependencies whose versions conflict with NVIDIA FLARE. For example, the NVIDIA PyTorch base image also includes gRPC, so if its gRPC version is not compatible with the gRPC version required by your version of NVFlare, you may run into errors and be unable to connect the FL clients and server. In that case, you may need to use a newer or older base image, or specify a newer or older version of NVFlare.
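
One way to spot such a conflict early is to print the installed versions of the relevant packages from inside the container. The following is a minimal sketch, assuming the distributions are named `nvflare` and `grpcio`; `installed_version` is a hypothetical helper, not part of NVFlare, and it only reports versions without judging whether they are compatible.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Assumed distribution names; adjust if your image packages them differently.
for name in ("nvflare", "grpcio"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Comparing this output between a working image and a failing one can help narrow down which dependency changed.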

## Provisioning NVIDIA FLARE Project with Docker Image
If you provision a project with the `nvflare provision` command, you can configure a Docker image for the startup kits to automatically contain a `docker.sh` script for starting the specified Docker image.

### Setting Docker Image in Project.yml
In the `project.yml` configuration for provisioning an NVFlare project, under `nvflare.lighter.impl.static_file.StaticFileBuilder` is an arg for `docker_image` which is commented out in the default `project.yml`. If this line is uncommented and the name of the Docker image is placed here, the provisioning process will create a `docker.sh` script for each server and participant:

```
builders:
  - path: nvflare.lighter.impl.static_file.StaticFileBuilder
    args:
      # when docker_image is set to a docker image name, docker.sh will be generated on server/client/admin
      docker_image: nvflare-pt-docker
```

We are focusing on Docker in this notebook, so for more details on the provisioning process, see [Provisioning in NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/programming_guide/provisioning_system.html). There is a basic project.yml in the folder with this notebook that is almost the same as the default project.yml generated for non-HA mode but with `mylocalhost` configured for the server name. We also update the name of the project to example_docker_project and uncomment the `docker_image` arg for StaticFileBuilder and set the name of the Docker image to `nvflare-pt-docker` to match the image we just created.
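
Before provisioning, it can be worth confirming that the `docker_image` line is really uncommented. The sketch below does a plain text scan rather than a full YAML parse, so it assumes the entry sits on a single line as in the snippet above; `find_docker_image` is a hypothetical helper.

```python
import re
from pathlib import Path


def find_docker_image(yaml_text: str):
    """Return the value of the first uncommented docker_image entry, or None."""
    for line in yaml_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # a commented-out entry does not count
        match = re.match(r"docker_image:\s*(\S+)", stripped)
        if match:
            return match.group(1)
    return None


project_file = Path("project.yml")  # assumed to sit next to this notebook
if project_file.exists():
    print(find_docker_image(project_file.read_text()))
```

A result of `None` means that no uncommented `docker_image` entry was found.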

### Provision the Project
Take a look at this `project.yml` file and, if there is nothing you need to update, run the following `nvflare provision` command. You can change the server name to something other than `mylocalhost`, but whatever name you choose must be resolvable and reachable from the FL clients (more details in the troubleshooting section below).

In [None]:
! nvflare provision

## Starting Docker Containers and NVFlare
Inside each startup kit is a `docker.sh` script for starting the Docker image specified in the `project.yml` configuration.
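
To see which startup kits received a `docker.sh` script, you can list them from Python. This is a minimal sketch that assumes the workspace path used in the cells of this notebook; `find_docker_scripts` is a hypothetical helper.

```python
from pathlib import Path


def find_docker_scripts(prod_dir):
    """Map each startup kit name to its startup/docker.sh path, if present."""
    base = Path(prod_dir)
    if not base.is_dir():
        return {}
    scripts = {}
    for script in sorted(base.glob("*/startup/docker.sh")):
        # script.parent is "startup"; its parent is the kit directory
        scripts[script.parent.parent.name] = script
    return scripts


# Path taken from the cells in this notebook; adjust if you renamed the project.
for kit, script in find_docker_scripts("workspace/example_docker_project/prod_00").items():
    print(f"{kit}: {script}")
```

An empty listing suggests that provisioning has not run yet or that `docker_image` was still commented out.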

### Docker Run Command for the Server
If you kept the name of the server as the default of `mylocalhost`, the following cell will show the contents of the `docker.sh` script for the FL server.

In [None]:
! cat workspace/example_docker_project/prod_00/mylocalhost/startup/docker.sh

Note how the Docker image was set to the image we specified with the `DOCKER_IMAGE=nvflare-pt-docker` line.

`/workspace` is mapped to the directory that is the parent of the one containing this `docker.sh` script, i.e., `mylocalhost`. It is then used for the working directory with the `-w` option.

By default, `NETARG` is set to add the option `--net=host`, but if you do not want to use the host network, you can comment out the line setting the default value and uncomment the line to set it to map ports manually: `NETARG="-p 8003:8003 -p 8002:8002"`.

If you run the `docker.sh` script with the `-d` flag for detached mode, the `docker run` command that is executed will launch the Docker image in detached mode and automatically start the FL server. Otherwise, the container will start in interactive mode so you can manually start the server with the `start.sh` command in the startup directory by typing `./startup/start.sh`.

If you want to run the FL server in a terminal, you can start the Docker container with `docker run --rm -it --name=flserver -v $(pwd)/mylocalhost:/workspace/ -w /workspace/ --ipc=host --net=host nvflare-pt-docker /bin/bash` and then run `./startup/start.sh`.
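
To make the choice between host networking and manual port mapping explicit, the command can also be assembled programmatically. This is only a sketch built from the flags shown in this section; the generated `docker.sh` remains the reference, and `server_docker_cmd` is a hypothetical helper.

```python
import shlex
from pathlib import Path


def server_docker_cmd(kit_dir, image="nvflare-pt-docker", host_network=True):
    """Assemble the interactive `docker run` argument list for the FL server kit."""
    # Either share the host network or publish the two ports named in NETARG.
    net_args = ["--net=host"] if host_network else ["-p", "8003:8003", "-p", "8002:8002"]
    return [
        "docker", "run", "--rm", "-it", "--name=flserver",
        "-v", f"{Path(kit_dir).resolve()}:/workspace/",
        "-w", "/workspace/",
        "--ipc=host", *net_args,
        image, "/bin/bash",
    ]


# Print the command for copying into a terminal; it is not executed here.
print(shlex.join(server_docker_cmd("workspace/example_docker_project/prod_00/mylocalhost")))
```

Passing `host_network=False` swaps `--net=host` for the two `-p` mappings.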

Otherwise, you can run the server with the next cell. Since in a Jupyter notebook it can be challenging to execute scripts requiring interaction, we can use the `-d` flag to start the FL server automatically when running the Docker command in the next cell:

In [None]:
! ./workspace/example_docker_project/prod_00/mylocalhost/startup/docker.sh -d

To check that the Docker container has started, run:

In [None]:
! docker ps
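
If you would rather check for a specific container than read the full table, the names can be collected with the `--format` option of `docker ps`. A minimal sketch, assuming the `docker` CLI is on the `PATH`; both functions are hypothetical helpers.

```python
import subprocess


def parse_container_names(ps_output: str):
    """Split the output of `docker ps --format '{{.Names}}'` into a list of names."""
    return [line.strip() for line in ps_output.splitlines() if line.strip()]


def running_containers():
    """Return the names of running containers (requires the docker CLI)."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_container_names(result.stdout)
```

For example, `"flserver" in running_containers()` should be true once the server container from the previous cell is up.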

### Docker Run Command for Clients
The following cell will show the contents of the `docker.sh` script for the FL client site-1, the default name for the first client configured in the `project.yml` for this example. If you have changed the name of the FL clients, please replace site-1 with your FL client name below.

In [None]:
! cat workspace/example_docker_project/prod_00/site-1/startup/docker.sh

Much of this is the same as the `docker.sh` script for launching the Docker container for the FL server. The Docker image is set to the same image we specified in `project.yml`.

Again, `/workspace` is mapped to the directory that is the parent of the one containing this `docker.sh` script, and it is used for the working directory with the `-w` option.

By default, `NETARG` is set to add the option `--net=host`, but since clients do not need to open any ports this is not needed as long as the FL client is able to connect to the FL server.

The Docker script for FL clients has the additional variable `MY_DATA_DIR` set to `/home/flcient/data` by default which is mapped to `/data` in the container. You can set this to a custom value by running `export MY_DATA_DIR=$SOME_DIRECTORY` before `docker.sh`.
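
Note that in a notebook each `!` command runs in its own subshell, so an `export` in one cell does not carry over to a later cell that runs `docker.sh`. One possible workaround, sketched below, is to pass the variable through the environment of a subprocess; `docker_env` and `start_client` are hypothetical helpers, and the data path in the usage note is a placeholder.

```python
import os
import subprocess


def docker_env(data_dir, base_env=None):
    """Return a copy of the environment with MY_DATA_DIR set to data_dir."""
    env = dict(os.environ if base_env is None else base_env)
    env["MY_DATA_DIR"] = str(data_dir)
    return env


def start_client(docker_script, data_dir):
    """Run a client's docker.sh in detached mode with a custom data directory."""
    return subprocess.run([docker_script, "-d"], env=docker_env(data_dir), check=True)
```

A call such as `start_client("./workspace/example_docker_project/prod_00/site-1/startup/docker.sh", "/path/to/data")` would then start site-1 with that directory mapped to `/data`.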

The `GPU2USE` variable is also added to keep track of the option for what GPUs to use for the container. Uncomment a line setting the value of `GPU2USE` to use GPUs: `--gpus=all` to use all available GPUs, `--gpus=2` to use two GPUs, and `--gpus="device=0,2"` for specifying specific GPUs where the numbers after `device=` are the GPU IDs.

> **Note:** In order to use the `--gpus` flag, you may need to ensure that you have the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) installed.
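
The three forms of the `--gpus` option can be summarized in a small helper. This sketch only reproduces the option text as written in this section, for pasting into the `GPU2USE=` line of `docker.sh`; `gpus_option` is a hypothetical helper and does not query the available hardware.

```python
def gpus_option(selection=None):
    """Translate a GPU selection into the option text described for GPU2USE."""
    if selection is None:
        return ""                      # leave GPU2USE empty: no GPUs
    if selection == "all":
        return "--gpus=all"            # every available GPU
    if isinstance(selection, int):
        return f"--gpus={selection}"   # this many GPUs
    # otherwise an iterable of GPU IDs, e.g. [0, 2]
    ids = ",".join(str(gpu_id) for gpu_id in selection)
    return f'--gpus="device={ids}"'


for choice in (None, "all", 2, [0, 2]):
    print(repr(choice), "->", gpus_option(choice))
```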

If you run the `docker.sh` script with the `-d` flag for detached mode, the `docker run` command that is executed will launch the Docker image in detached mode and automatically start the FL client. Otherwise, the container will start in interactive mode so you can manually start the client with the `start.sh` command in the startup directory by typing `./startup/start.sh`.

If you want to run an FL client in a terminal, you can start the Docker container with `docker run --rm -it --name=site-1 $GPU2USE -u $(id -u):$(id -g) -v /etc/passwd:/etc/passwd -v /etc/group:/etc/group -v $(pwd)/site-1:/workspace/ -v /tmp/nvflare/flclient/data:/data/:ro -w /workspace/ --ipc=host nvflare-pt-docker /bin/bash` and then run `./startup/start.sh`.

Otherwise, you can start a client in the next cell. Since it can be challenging to execute scripts requiring interaction in a Jupyter notebook, we use the `-d` flag to start the FL client automatically when running the Docker command in the next cell. To start a second client for site-2, you could copy the cell and change the path to use the `docker.sh` for site-2, or run it from a terminal:

In [None]:
! ./workspace/example_docker_project/prod_00/site-1/startup/docker.sh -d

To check that the Docker container has started, run:

In [None]:
! docker ps

If you want to see the logs of the container, you can run:

In [None]:
! docker logs site-1

If the Docker container stops and exits some time after you start it with an NVFlare client, the client may be unable to connect to the FL server. See the section below on troubleshooting connections.

### Docker Run Command for FLARE Console
There is also a `docker.sh` script for each project admin provisioned in `project.yml`. However, the FLARE Console needs no dependencies beyond an installation of nvflare, so you usually do not need to run it in a Docker container.

If you would like to, you can run the FLARE Console to connect to the FL server and check the status, submit jobs, or perform any other FL commands.

With the PyTorch dependencies in the Docker containers, you can follow the instructions to export and then submit the [Hello PyTorch Example](https://nvflare.readthedocs.io/en/main/examples/hello_pt_job_api.html) with the [FLARE Console](https://nvflare.readthedocs.io/en/main/real_world_fl/operation.rst) or the [FLARE API](../../tutorials/flare_api.ipynb).

### Troubleshooting Connections and Configuring /etc/hosts
To make sure that your FL server is reachable, assuming your server is running in the Docker container with `NETARG="--net=host"`, you may need to add an entry to `/etc/hosts` (or, if you are not running your FL server on the local host, make sure that the server name is resolvable through DNS):
```
127.0.0.1	 mylocalhost
```

If you are trying to run this on Mac, you may have to take additional steps to make the FL server reachable. For example, if you use Colima, you may need to figure out a way to add the following into the /etc/hosts of your container for your FL client even if you are using the script with `NETARG="--net=host"`: 
```
192.168.5.2	 mylocalhost
```
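
To check name resolution and reachability from the machine or container where a client runs, a small probe can help. This is a minimal sketch using only the standard library; `resolve` and `port_open` are hypothetical helpers, and ports 8002 and 8003 are assumed from the `NETARG` port-mapping line shown earlier.

```python
import socket


def resolve(hostname: str):
    """Return the IPv4 address a name resolves to, or None if it does not resolve."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `resolve("mylocalhost")` returning `None` points to a missing `/etc/hosts` or DNS entry, while a resolved name with `port_open("mylocalhost", 8002)` returning `False` suggests that the server is not running or that the port is not reachable.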

## Stopping the Docker Containers
After you are done and ready to stop your running Docker containers, you can use the following cells:

In [None]:
! docker stop flserver

In [None]:
! docker stop site-1

In [None]:
! docker stop site-2

If you started a Docker container for the FLARE Console with the default name and want to stop it:

In [None]:
! docker stop fladmin

To check that all your Docker containers have stopped, the following should display no more running containers:

In [None]:
! docker ps