# Provision and Run with Docker

Provisioning and deploying with Docker containers can be a convenient way to ensure a uniform OS and software environment across client and server systems. Docker deployment is an alternative to the bare-metal deployment described in the previous sections.

Before starting, make sure Docker is installed and set up on every participant's system (server and clients).

> **Note**: you will need to install a supported container runtime and the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) to enable support for GPUs.

In this notebook, we will walk you through the following steps for containerized provisioning and deployment:
- Building a Docker image
- Provisioning a project with the Docker image
- Starting the server, clients and admin


# Build a Docker Image

Before starting a containerized provisioning and deployment, we must build a Docker image with NVIDIA FLARE and the other runtime dependencies the project needs. You can write the Dockerfile however you like, as long as the image includes all dependencies. Here is an [example Dockerfile](code/Dockerfile) that you can use:

In [None]:
!cat code/Dockerfile

Note that this Dockerfile uses the [NGC `pytorch:24.07-py3` base image](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch), clones the NVIDIA FLARE repository and installs the latest version of FLARE. 

> **Note**: you can customize the base image and the specific version of NVIDIA FLARE to install based on the requirements of the project. However, use the same FLARE version for provisioning the project and for building the Docker image; otherwise, runtime errors might occur.



Run the following command to build the Docker image used for provisioning and deployment later. You can use any name for the image; here we use `nvflare-pt-docker`.

> **Note**: it's recommended to run the command in a separate terminal instead of inside this notebook, because the build output keeps appending to the notebook cell as the build progresses and may use too much memory.


In [None]:
!docker build -t nvflare-pt-docker . -f code/Dockerfile

Once the build is complete, you can verify that the Docker image has been built with:

In [None]:
!docker images | grep nvflare-pt-docker
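Note that `grep` also matches any other image whose name merely contains `nvflare-pt-docker`. As an alternative, here is a minimal sketch of a hypothetical helper, `image_exists` (not part of NVIDIA FLARE), that asks Docker directly whether an image with that exact name exists. It relies on `docker image inspect` exiting with a non-zero status when the image is not found.

```bash
# Hypothetical helper (not part of NVIDIA FLARE): succeeds only if an image
# with exactly this name (and optional tag) exists in the local image store.
image_exists() {
  local image="$1"
  # `docker image inspect` exits non-zero for unknown images; discard its JSON output.
  docker image inspect "$image" >/dev/null 2>&1
}
```

For example, `image_exists nvflare-pt-docker && echo "image is ready"` prints the message only when the build succeeded on this machine.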

> **Note**: the same Docker image must be built on the system of every participant that will run in a Docker container.

# Provisioning a Project with the Docker Image

To provision a project with the Docker image, we perform the same steps as described previously in [Provision Using `nvflare provision` CLI](../04.1_provision_via_cli/provision_via_cli.ipynb). The only change needed is to make sure that the server and clients use the Docker image `nvflare-pt-docker` when they start, which is done by modifying the project configuration file.

Take a look at the example configuration file in [`code/project.yml`](code/project.yml). You will notice that the only difference between this configuration file and [the one in "Provision Using `nvflare provision` CLI"](../04.1_provision_via_cli/code/project.yml) is a newly added argument under the `nvflare.lighter.impl.static_file.StaticFileBuilder` builder:
```yaml
builders:
  - path: nvflare.lighter.impl.static_file.StaticFileBuilder
    args:
      # when docker_image is set to a docker image name, docker.sh will be generated on server/client/admin
      docker_image: nvflare-pt-docker
```

By doing this, the provisioning process will create a `docker.sh` script for all participants of the project.

Now let's go ahead and provision a project:

In [None]:
!nvflare provision -p ./code/project.yml
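To confirm that provisioning generated a `docker.sh` for every participant, you can list the scripts under the provisioning output folder. This is a minimal sketch that assumes the output location used in this notebook (`workspace/example_project/prod_00`); `list_docker_scripts` is a hypothetical helper name.

```bash
# Hypothetical helper: print every startup/docker.sh generated under a
# provisioning output folder, one path per line, sorted for stable output.
list_docker_scripts() {
  local prod_dir="${1:-workspace/example_project/prod_00}"
  find "$prod_dir" -path '*/startup/docker.sh' -type f | sort
}
```

Running `list_docker_scripts` with no argument should print one path per participant (the server, each client, and the admin).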

# Starting the Participants with Docker

After provisioning succeeds, you will find a new `docker.sh` script in the startup folder of the server, each client, and the admin.

### Starting the Server

The content of the server side `docker.sh` (located at [`workspace/example_project/prod_00/localhost/startup/docker.sh`](workspace/example_project/prod_00/localhost/startup/docker.sh) after provisioning) should look like this:


```bash
#!/usr/bin/env bash
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
# docker run script for FL server
# to use host network, use line below
NETARG="--net=host"
# or to expose specific ports, use line below
#NETARG="-p 8003:8003 -p 8002:8002"
DOCKER_IMAGE=nvflare-pt-docker
echo "Starting docker with $DOCKER_IMAGE"
svr_name="${SVR_NAME:-flserver}"
mode="${1:-r}"
if [ $mode = "-d" ]
then
  docker run -d --rm --name=$svr_name -v $DIR/..:/workspace/ -w /workspace \
  --ipc=host $NETARG $DOCKER_IMAGE /bin/bash -c \
  "python -u -m nvflare.private.fed.app.server.server_train -m /workspace -s fed_server.json --set secure_train=true config_folder=config org=nvidia"
else
  docker run --rm -it --name=$svr_name -v $DIR/..:/workspace/ -w /workspace/ --ipc=host $NETARG $DOCKER_IMAGE /bin/bash
fi
```

You can see that this script is just executing `docker run` of the image `nvflare-pt-docker` with multiple options. 

By default, the `--net=host` option is set so that the container uses the host network. If this is not desired, comment out that line and uncomment the line that maps the ports explicitly: `NETARG="-p 8003:8003 -p 8002:8002"`.

If you run `docker.sh` with the `-d` flag, the script starts the container in detached mode and automatically starts the server. Otherwise, the container starts in interactive mode, and you can start the server manually from the workspace directory by typing `./startup/start.sh`.
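The mode selection at the end of the script relies on bash default-value expansion: `${1:-r}` expands to the first argument, or to `r` when no argument is given, and any value other than `-d` falls through to the interactive branch (the client script's `${1:--r}` behaves the same way). The stand-alone sketch below reproduces only that decision logic, so you can see which branch an invocation takes without starting a container; the function name `docker_sh_mode` is hypothetical.

```bash
# Hypothetical sketch of the branch selection in the generated docker.sh:
# "-d" selects detached mode; no argument (default "r") or any other value
# selects the interactive shell.
docker_sh_mode() {
  local mode="${1:-r}"
  if [ "$mode" = "-d" ]; then
    echo "detached"      # container runs in the background and starts the server
  else
    echo "interactive"   # container opens /bin/bash; run ./startup/start.sh yourself
  fi
}
```

The sketch quotes `"$mode"` in the test, which is the safer idiom if an argument ever contains spaces; the generated script compares it unquoted.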

Let's start the server by executing the following command in a separate terminal:

```bash
./workspace/example_project/prod_00/localhost/startup/docker.sh -d
```

This runs a detached container and starts the server. Verify that the container is running with:

In [None]:
!docker ps

You can check the server's log with:

In [None]:
!docker logs flserver
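Right after `docker.sh -d` returns, the container may still be starting. Here is a minimal polling sketch, assuming the default container name `flserver` (from `SVR_NAME` in the server script). It uses `docker ps --filter` plus `--format '{{.Names}}'`, then an exact-match `grep -Fx`, because the `name` filter also matches partial names. `wait_for_container` and its timeout parameter are hypothetical.

```bash
# Hypothetical helper: poll until a container with exactly this name is running,
# or give up after a timeout in seconds. Returns 0 on success, 1 on timeout.
wait_for_container() {
  local name="$1" timeout="${2:-60}" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # The name filter matches substrings, so confirm an exact match with grep -Fx.
    if docker ps --filter "name=${name}" --format '{{.Names}}' | grep -Fxq "$name"; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "container '$name' not running after ${timeout}s" >&2
  return 1
}
```

For example, `wait_for_container flserver 60 && docker logs --tail 50 flserver` shows the last 50 log lines once the server container is up; use `docker logs -f flserver` to follow the log live.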

### Starting the Clients

The client side `docker.sh` script is similar to the server side script. 

```bash
#!/usr/bin/env bash
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
# docker run script for FL client
# local data directory
: ${MY_DATA_DIR:="/home/flclient/data"}
# ...

NETARG="--net=host"
# FL clients do not need to open ports, so the following line is not needed.
#NETARG="-p 443:443 -p 8003:8003"
DOCKER_IMAGE=nvflare-pt-docker
echo "Starting docker with $DOCKER_IMAGE"
mode="${1:--r}"
if [ $mode = "-d" ]
then
  docker run -d --rm --name=site-1 $GPU2USE -u $(id -u):$(id -g) \
  -v /etc/passwd:/etc/passwd -v /etc/group:/etc/group -v $DIR/..:/workspace/ \
  -v $MY_DATA_DIR:/data/:ro -w /workspace/ --ipc=host $NETARG $DOCKER_IMAGE \
  /bin/bash -c "python -u -m nvflare.private.fed.app.client.client_train -m /workspace -s fed_client.json --set uid=site-1 secure_train=true config_folder=config org=nvidia"
else
  docker run --rm -it --name=site-1 $GPU2USE -u $(id -u):$(id -g) \
  -v /etc/passwd:/etc/passwd -v /etc/group:/etc/group -v $DIR/..:/workspace/ \
  -v $MY_DATA_DIR:/data/:ro -w /workspace/ --ipc=host $NETARG $DOCKER_IMAGE /bin/bash
fi
```

The Docker script for FL clients adds the variable `MY_DATA_DIR`, which defaults to `/home/flclient/data` and is mounted read-only at `/data` in the container. To use a different directory, run `export MY_DATA_DIR=$SOME_DIRECTORY` before running `docker.sh`.
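The line `: ${MY_DATA_DIR:="/home/flclient/data"}` uses the `:=` expansion, which assigns the default only when the variable is unset or empty. The leading `:` is the shell's no-op command, so the line has no other effect. The sketch below demonstrates that behavior in isolation; `show_data_dir` and the path `/mnt/site-1-data` are hypothetical, and the sketch quotes the expansion, which has the same effect while avoiding word splitting.

```bash
# Hypothetical demo of the ":=" default assignment used by the client docker.sh.
show_data_dir() {
  # ":" is a no-op; the expansion assigns the default only if MY_DATA_DIR is unset or empty.
  : "${MY_DATA_DIR:=/home/flclient/data}"
  echo "$MY_DATA_DIR"
}
```

Because the script reads the variable from the environment, a one-off prefix also works, for example `MY_DATA_DIR=/mnt/site-1-data ./workspace/example_project/prod_00/site-1/startup/docker.sh -d`.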

The script also uses a `GPU2USE` variable to hold the GPU option passed to `docker run`. To give the container GPU access, edit the line that sets `GPU2USE` to one of the following values:
- `--gpus=all` to use all available GPUs
- `--gpus=2` to use two GPUs
- `--gpus="device=0,2"` to select specific GPUs, where the numbers after `device=` are the GPU IDs

> **Note:** In order to use the `--gpus` flag, you need to ensure that you have the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) installed.
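Because the script expands `$GPU2USE` unquoted, the value is split on whitespace into separate `docker run` arguments, and any quotes inside the value are passed through literally. That matters for the device list: Docker's documentation shows a multi-device list wrapped in double quotes (`--gpus '"device=1,2"'`), so the value below is single-quoted to preserve the inner double quotes. This is a minimal sketch assuming you set the variable this way in the client `docker.sh`; the helper `print_args` is hypothetical and only shows what Docker would receive.

```bash
# Hypothetical helper: print each argument on its own line, bracketed, to show
# exactly how word splitting turns $GPU2USE into docker run arguments.
print_args() {
  local arg
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Single quotes keep the inner double quotes in the value.
GPU2USE='--gpus="device=0,2"'
# Unquoted on purpose, mirroring how docker.sh uses the variable.
print_args $GPU2USE   # prints: [--gpus="device=0,2"]
```

If `GPU2USE` is empty, the unquoted expansion produces no argument at all, so the container simply starts without GPU access.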


Let's start both clients in detached mode by executing the following commands in separate terminals:

```bash
# Start the client site-1
./workspace/example_project/prod_00/site-1/startup/docker.sh -d

# Start the client site-2
./workspace/example_project/prod_00/site-2/startup/docker.sh -d
```
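Because `-d` returns as soon as each container is launched, you can also start the clients from a single terminal with a loop. This is a minimal sketch that assumes the provisioning output folder used in this notebook; `start_sites` is a hypothetical helper.

```bash
# Hypothetical helper: launch the docker.sh of each named site in detached mode.
# Usage: start_sites <prod_dir> <site> [<site> ...]
start_sites() {
  local prod_dir="$1" site script
  shift
  for site in "$@"; do
    script="$prod_dir/$site/startup/docker.sh"
    if [ ! -x "$script" ]; then
      echo "missing or non-executable: $script" >&2
      return 1
    fi
    # Stop at the first site whose launch fails.
    "$script" -d || return 1
  done
}
```

For example: `start_sites ./workspace/example_project/prod_00 site-1 site-2`.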

We can verify that both clients have started with:

In [None]:
!docker ps

You can check the log of `site-1` with:

In [None]:
!docker logs site-1

and the log of `site-2` with:

In [None]:
!docker logs site-2
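To review the most recent log lines from the server and both clients in one go, you can loop over the container names. This sketch assumes the container names used above (`flserver`, `site-1`, `site-2`); `--tail` limits output to the last N lines. `show_recent_logs` is a hypothetical helper.

```bash
# Hypothetical helper: print the last N log lines of each named container,
# with a header so the outputs are easy to tell apart.
show_recent_logs() {
  local lines="$1" name
  shift
  for name in "$@"; do
    echo "===== $name ====="
    # docker logs replays the container's stderr on stderr; merge it into stdout.
    docker logs --tail "$lines" "$name" 2>&1
  done
}
```

For example: `show_recent_logs 20 flserver site-1 site-2`.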

### Starting the Admin

Although there is a `docker.sh` script in the admin's startup folder, you do not need to run the FLARE Console in a Docker container: the admin is a user who may connect from anywhere and needs nothing beyond an `nvflare` installation to run the FLARE Console.

Let's start the FLARE Console as the admin user by executing the following command in a separate terminal:
```bash
./workspace/example_project/prod_00/admin\@nvidia.com/startup/fl_admin.sh 
```

Enter the admin's email address as defined in the project configuration file: `admin@nvidia.com`. Then use the sub-command `check_status server` to make sure that the server and clients have all successfully started. You should see an output similar to the following:

```
Engine status: stopped
---------------------
| JOB_ID | APP NAME |
---------------------
---------------------
Registered clients: 2 
-----------------------------------------------------------------------------------------------------
| CLIENT | FQCN   | FQSN   | LEAF | TOKEN                                | LAST CONNECT TIME        |
-----------------------------------------------------------------------------------------------------
| site-1 | site-1 | site-1 | True | 61bbdd09-b08d-4838-ab65-43b0d0fd022f | Wed Feb 19 22:48:17 2025 |
| site-2 | site-2 | site-2 | True | 8c390da1-723e-4c26-88d9-a0673c3c9d53 | Wed Feb 19 22:48:14 2025 |
-----------------------------------------------------------------------------------------------------
Done [4425 usecs] 2025-02-19 23:48:24.404288
```

### Submit And Run An Application 

With the PyTorch dependencies in the Docker containers, you can follow the instructions to export and then submit the [Hello PyTorch Example](https://nvflare.readthedocs.io/en/main/examples/hello_pt_job_api.html).


### Troubleshooting Connections and Configuring /etc/hosts
To make sure your FL server is reachable when it runs in a Docker container with `NETARG="--net=host"`, you may need to add the following entry to `/etc/hosts`. (If the FL server is not running on the local host, make sure its hostname resolves through DNS instead.)
```
127.0.0.1	 localhost
```

If you are running this on macOS, you may need additional steps to make the FL server reachable. For example, if you use Colima, you may need to add the following entry to the `/etc/hosts` file inside the FL client's container, even when using `NETARG="--net=host"`:
```
192.168.5.2	 localhost
```
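Before editing a hosts file, you can check whether the entry is already present. This is a minimal sketch using `awk`; `has_hosts_entry` is a hypothetical helper, and the file path is a parameter so you can point it at a copy instead of the live `/etc/hosts`.

```bash
# Hypothetical helper: succeed if the hosts file maps IP to NAME.
# Usage: has_hosts_entry <ip> <name> [<hosts_file>]
has_hosts_entry() {
  local ip="$1" name="$2" file="${3:-/etc/hosts}"
  awk -v ip="$ip" -v name="$name" '
    $1 == ip {                                  # commented-out lines never match: their first field starts with "#"
      for (i = 2; i <= NF; i++) {
        if (substr($i, 1, 1) == "#") break      # ignore a trailing comment
        if ($i == name) found = 1
      }
    }
    END { exit !found }                         # exit 0 if found, 1 otherwise
  ' "$file"
}
```

For example, `has_hosts_entry 127.0.0.1 localhost || echo "127.0.0.1 localhost" | sudo tee -a /etc/hosts` appends the entry only when it is missing.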

### Stopping the Docker Containers

When you are done, stop the running Docker containers with the following cells:

In [None]:
!docker stop flserver

In [None]:
!docker stop site-1

In [None]:
!docker stop site-2
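`docker stop` accepts several container names at once, so the three cells above can be collapsed into one command. Because the containers were started with `--rm`, stopping them also removes them. This sketch assumes the container names used in this notebook; `stop_flare_containers` is a hypothetical helper.

```bash
# Hypothetical helper: stop the FL server and client containers in one call.
# With no arguments it falls back to the names used in this notebook.
# Containers started with --rm are removed automatically once stopped.
stop_flare_containers() {
  if [ "$#" -eq 0 ]; then
    set -- flserver site-1 site-2
  fi
  docker stop "$@"
}
```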

To check that all your Docker containers have stopped, the following should display no more running containers:

In [None]:
!docker ps

**That's it, we have learned how to provision and run an FL system with Docker!**

# What's Next

Next, we will explore cloud deployment options, starting with [deployment in an AWS environment](../04.5_deployment_in_aws/deployment_in_aws.ipynb).