# Simulating Real-World Deployment with NVIDIA FLARE

## Introduction

In Part 1 of this course, we used the NVIDIA FLARE simulator to run federated learning experiments without worrying about system deployment. While the simulator is excellent for algorithm development and testing, real-world federated learning deployments involve multiple physical sites, each with its own infrastructure and security requirements.

In this section, we'll bridge the gap between simulation and real-world deployment by using NVIDIA FLARE's Proof of Concept (POC) mode. This mode allows us to simulate a multi-site deployment on a single machine, providing a realistic preview of how federated learning works in production environments.

### Learning Objectives
By the end of this section, you will be able to:
- Understand the difference between simulation and real-world deployment
- Set up a Proof of Concept (POC) environment to simulate multi-site deployment
- Configure a federated learning project with custom sites and users
- Start and manage a federated learning system in POC mode

## Real-World Deployment vs. Simulation

Let's first understand the key differences between the simulator we used previously and a real-world deployment:

| Aspect | Simulator | Real-World Deployment |
|--------|-----------|----------------------|
| Infrastructure | Single process with multiple threads | Multiple machines across different organizations |
| Setup | Simple Python API or CLI command | Multi-step process including provisioning and distribution |
| Security | Minimal (for development) | Comprehensive (certificates, encryption, authentication) |
| Communication | In-memory | Network-based (gRPC, HTTP, etc.) |
| Lifecycle | Starts and stops with your script | Long-running services waiting for jobs |

## The Real-World Deployment Process

In a production environment, deploying a federated learning system typically involves these steps:

1. **Provisioning**: Creating secure software packages (startup kits) for each participant
2. **Distribution**: Securely transferring these packages to each participating site
3. **Startup**: Each site starts its FLARE client or server using the startup kit
4. **Operation**: The system runs continuously, processing jobs as they're submitted

The Proof of Concept (POC) mode allows us to simulate this entire process on a single machine, making it an excellent learning tool before moving to a distributed deployment.


Now let's compare POC mode with a real-world deployment:

| Aspect | POC Mode | Real-World Deployment |
|--------|----------|----------------------|
| Infrastructure | Server and clients run as separate processes on the same machine (localhost) | Server and clients run on multiple machines across different organizations |
| Setup | The `nvflare poc` command uses provisioning to create startup kits, but everything runs on one machine, so no distribution is needed | Provisioning followed by distribution of the startup kits |
| Security | Minimal (for development) | Comprehensive (certificates, encryption, authentication) |
| Communication | Network-based (gRPC, HTTP, etc.) | Network-based (gRPC, HTTP, etc.) |
| Lifecycle | Long-running services waiting for jobs | Long-running services waiting for jobs |



## Understanding the POC Command

NVIDIA FLARE provides a dedicated command-line interface for creating and managing POC deployments. The [`nvflare poc` command](https://nvflare.readthedocs.io/en/main/user_guide/nvflare_cli/poc_command.html) offers several subcommands for different aspects of POC management.

> **Note**: While you can run these commands in a notebook, it's generally better to use a terminal for POC setup to avoid potential inconsistencies in process management.

### Creating a Basic POC Environment

Let's start by creating a simple POC environment with five client sites. This simulates a federated learning scenario with five participating organizations.

The command to create this environment is:

In [2]:
! nvflare poc prepare -n 5

prepare poc at /tmp/nvflare/poc for 5 clients
provision at /tmp/nvflare/poc for 5 clients with /tmp/nvflare/poc/project.yml
INFO: Generated results can be found under /tmp/nvflare/poc/example_project/prod_00. 


### Understanding the Generated Structure

The command above creates a directory structure that simulates a multi-site deployment. Let's examine what was created:

```
/tmp/nvflare/poc/example_project/prod_00/
├── admin@nvidia.com/     # Admin user's startup kit
├── server/               # FL server startup kit
├── site-1/               # Client site 1 startup kit
├── site-2/               # Client site 2 startup kit
├── site-3/               # Client site 3 startup kit
├── site-4/               # Client site 4 startup kit
└── site-5/               # Client site 5 startup kit
```

Each directory contains a complete startup kit for that participant, including:

- **startup/**: Scripts and configurations to start the FLARE client/server
- **local/**: Directory for local data and outputs
- **transfer/**: Directory for secure file transfer between participants
- **Certificates and keys**: For secure communication (in a real deployment)
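
To check the layout described above, a short script can walk the `prod_00` directory and report any participant whose kit lacks one of the listed subdirectories. This is a minimal sketch and not part of NVIDIA FLARE: the function name `summarize_startup_kits` is hypothetical, and the expected subdirectory names are taken from the list above, so the contents may differ between FLARE versions.

```python
from pathlib import Path

# Subdirectories this section lists for each startup kit (assumption: taken
# from the text above, not from the NVIDIA FLARE API).
EXPECTED_SUBDIRS = ("startup", "local", "transfer")


def summarize_startup_kits(prod_dir):
    """Map each participant directory to the expected subdirectories it lacks."""
    summary = {}
    for kit in sorted(Path(prod_dir).iterdir()):
        if not kit.is_dir():
            continue  # skip stray files that sit next to the kits
        missing = [name for name in EXPECTED_SUBDIRS if not (kit / name).is_dir()]
        summary[kit.name] = missing
    return summary


if __name__ == "__main__":
    prod = Path("/tmp/nvflare/poc/example_project/prod_00")
    if prod.is_dir():
        for participant, missing in summarize_startup_kits(prod).items():
            print(participant, "OK" if not missing else f"missing: {missing}")
    else:
        print(f"{prod} not found; run `nvflare poc prepare` first")
```

An empty list for a participant means all three subdirectories were found. A non-empty list is only a hint to look at that kit, not proof that it is broken.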

### Key Components of the POC Environment

The POC environment includes several key components:

1. **Server**: The central coordinator for the federated learning system
2. **Clients**: The participants (sites) that contribute data and computation
3. **Admin**: The user who manages the system and submits jobs

In a real-world deployment, these components would run on separate machines across different organizations. In POC mode, they all run on your local machine as separate processes that communicate over the network on localhost.

## Creating a Realistic POC Environment

While the basic POC setup is useful, real-world federated learning projects often have custom site names, organizational structures, and user roles. NVIDIA FLARE allows you to define these using a project configuration file.

### The Project Configuration File

A project configuration file (typically `project.yml`) defines the structure of your federated learning project, including:

- Names and roles of participating sites
- User accounts and their permissions
- Security settings
- Other deployment parameters

Let's examine a sample project configuration:

In [None]:
! cat code/project.yml

### Understanding the Project Configuration

The project configuration above defines a healthcare-focused federated learning project with three main participants:

1. **general-hospital-server**: The central server operated by a nonprofit organization
2. **us_hospital**: A hospital in the United States participating as a client
3. **europe-hospital**: A hospital in Europe participating as a client

It also defines several user roles for each organization:

- **Project Admin** (`admin@nonprofit.org`): Has full control over the project
- **Site Admins** (`admin@hospital.org.us`, `admin@hospital.org.eu`): Manage their respective sites
- **Lead Members** (`lead@hospital.org.us`, `lead@hospital.org.eu`): Have elevated permissions at their sites
- **Regular Members** (`member@hospital.org.us`, `member@hospital.org.eu`): Have basic access to their sites

This structure reflects a realistic scenario where multiple organizations collaborate while maintaining their own internal hierarchy.
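
The participants described above can be mirrored in a small Python structure, which makes it easier to see which users belong to which organization. This is an illustrative sketch only: the participant names come from this section, but the `org` values, the role strings (`project_admin`, `org_admin`, `lead`, `member`), and both helper functions are assumptions for illustration and are not read from `code/project.yml`.

```python
from collections import Counter, defaultdict

# Hypothetical mirror of the participants described above. The `org` values
# and role strings are assumptions, not the contents of code/project.yml.
PARTICIPANTS = [
    {"name": "general-hospital-server", "type": "server", "org": "nonprofit"},
    {"name": "us_hospital", "type": "client", "org": "us_hospital"},
    {"name": "europe-hospital", "type": "client", "org": "europe-hospital"},
    {"name": "admin@nonprofit.org", "type": "admin", "org": "nonprofit", "role": "project_admin"},
    {"name": "admin@hospital.org.us", "type": "admin", "org": "us_hospital", "role": "org_admin"},
    {"name": "lead@hospital.org.us", "type": "admin", "org": "us_hospital", "role": "lead"},
    {"name": "member@hospital.org.us", "type": "admin", "org": "us_hospital", "role": "member"},
    {"name": "admin@hospital.org.eu", "type": "admin", "org": "europe-hospital", "role": "org_admin"},
    {"name": "lead@hospital.org.eu", "type": "admin", "org": "europe-hospital", "role": "lead"},
    {"name": "member@hospital.org.eu", "type": "admin", "org": "europe-hospital", "role": "member"},
]


def check_participants(participants):
    """Return a list of problems: duplicate names or a server count other than one."""
    problems = []
    counts = Counter(p["name"] for p in participants)
    problems += [f"duplicate name: {name}" for name, n in counts.items() if n > 1]
    servers = [p["name"] for p in participants if p["type"] == "server"]
    if len(servers) != 1:  # the project described here has a single server
        problems.append(f"expected exactly one server, found {len(servers)}")
    return problems


def users_by_org(participants):
    """Group user accounts as (name, role) pairs under their organization."""
    grouped = defaultdict(list)
    for p in participants:
        if p["type"] == "admin":
            grouped[p["org"]].append((p["name"], p["role"]))
    return dict(grouped)


if __name__ == "__main__":
    print(check_participants(PARTICIPANTS) or "no problems found")
    for org, users in users_by_org(PARTICIPANTS).items():
        print(org, users)
```

Grouping by organization shows the hierarchy at a glance: one project admin at the nonprofit, and an admin, a lead, and a member at each hospital.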

### Creating a Custom POC Environment

Now, let's create a POC environment based on this project configuration:

In [None]:
! echo 'y' | nvflare poc prepare -i code/project.yml

Let's examine the structure of our custom POC environment:

In [None]:
! tree /tmp/nvflare/poc/health_project/prod_00

Notice that the site names now match those defined in our project configuration file (`general-hospital-server`, `us_hospital`, `europe-hospital`) instead of the generic names (`server`, `site-1`, etc.).

Each user defined in the project configuration also has their own startup kit, allowing them to connect to the system with their specific permissions.

#### Simplified Custom Site Configuration

If you want to customize site names without creating a full project configuration file, you can use the `-c` option with the `poc prepare` command:

```bash
nvflare poc prepare -c hospital1 hospital2 research_center
```

This creates a POC environment with three client sites named `hospital1`, `hospital2`, and `research_center`, plus the default server.

#### Docker-Based Deployment

For more realistic simulation or actual production deployment, NVIDIA FLARE supports Docker-based deployment. This approach provides better isolation between components and more closely resembles a real-world deployment scenario.

To create a Docker-based POC environment, use the `-d` flag with the `poc prepare` command, followed by the name of the Docker image:

```bash
nvflare poc prepare -d "nvflare/nvflare"
```

This generates a `docker.sh` script for the server and for each client. The script pulls the Docker image and then starts the container in detached mode.

In [None]:
! echo 'y'| nvflare poc prepare -d 'nvflare/nvflare'
! tree /tmp/nvflare/poc/example_project/prod_00

### Starting the POC Environment

Once you've created your POC environment, you can start the federated learning system. In a real deployment, each site would start their own component. In POC mode, you can start all components with a single command:

```bash
nvflare poc start
```

This command starts the server and all client processes in the background. 

#### Starting Specific Components

You can also start specific components using the `-p` (participant) option:

```bash
# Start only the server
nvflare poc start -p server

# Start specific clients
nvflare poc start -p site-1 -p site-2
```

#### Excluding Components

Alternatively, you can start all components except specific ones using the `-ex` (exclude) option:

```bash
# Start all components except the admin console
nvflare poc start -ex admin@nvidia.com

# Start all components except specific clients
nvflare poc start -ex site-3 -ex site-4
```

This is particularly useful when you want to start most components but manage a few separately.
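
When these commands are driven from a notebook or a script, it can help to assemble the argument list in one place. The helper below is a hypothetical sketch, not part of NVIDIA FLARE. It uses only the `-p` and `-ex` options shown above and repeats the flag once per name, as the examples in this section do. Rejecting a mix of both options is a design choice of this sketch, not a documented rule of the CLI.

```python
import shlex


def poc_start_command(participants=(), exclude=()):
    """Build the argument list for `nvflare poc start` using -p or -ex."""
    if participants and exclude:
        # Design choice of this sketch: keep the two modes separate.
        raise ValueError("pass either participants or exclude, not both")
    cmd = ["nvflare", "poc", "start"]
    for name in participants:
        cmd += ["-p", name]  # start only the named participants
    for name in exclude:
        cmd += ["-ex", name]  # start everything except the named participants
    return cmd


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe to try.
    print(shlex.join(poc_start_command()))
    print(shlex.join(poc_start_command(participants=["server"])))
    print(shlex.join(poc_start_command(exclude=["admin@nvidia.com"])))
```

The returned list can be passed to `subprocess.run` once the POC environment has been prepared. Printing it first is a cheap way to confirm the command before anything starts.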

### Monitoring and Managing the POC Environment

You can monitor and manage your POC environment using several commands:

#### Checking Status

To check the status of all components:

```bash
nvflare poc status
```

This shows which components are running and their process IDs.

#### Stopping Components

To stop specific components:

```bash
# Stop a specific client
nvflare poc stop -p site-1

# Stop multiple components
nvflare poc stop -p site-1 -p site-2
```

To stop all components:

```bash
nvflare poc stop
```

#### Cleaning Up

To clean up the POC environment (removing all processes and temporary files):

```bash
nvflare poc clean
```

Let's stop and clean up our environment now:

In [None]:
! nvflare poc stop

In [None]:
! nvflare poc clean
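
The stop-then-clean sequence above can also be wrapped in a small function, for example at the end of a test script. This is a sketch under stated assumptions: `teardown_poc` is a hypothetical name, and it assumes the `nvflare` commands report failure through a non-zero exit code, which this section does not confirm. The `runner` parameter exists so the sequence can be exercised without starting any processes.

```python
import subprocess

# The two commands from this section, in the order they are run above.
TEARDOWN_STEPS = (
    ["nvflare", "poc", "stop"],
    ["nvflare", "poc", "clean"],
)


def teardown_poc(runner=subprocess.run):
    """Run stop, then clean. Return the first command that failed, or None."""
    for cmd in TEARDOWN_STEPS:
        result = runner(cmd)
        if result.returncode != 0:
            return cmd  # do not clean up if stopping did not succeed
    return None
```

Calling `teardown_poc()` with the default runner requires `nvflare` to be installed and on the `PATH`. Passing a stub runner lets you check the order of the commands first.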

## From POC to Production

The POC mode is designed to be as close as possible to a real deployment, with a few simplifications for local testing. When you're ready to move to a production environment, the main differences will be:

1. **Provisioning**: You'll use `nvflare provision` instead of `nvflare poc prepare`
2. **Distribution**: You'll need to securely distribute startup kits to each participant
3. **Network Configuration**: You'll need to configure firewalls and network settings for cross-organization communication
4. **Security**: You'll need to implement additional security measures appropriate for your deployment
5. **Resource Allocation**: You'll need to allocate appropriate computing resources for each component
6. **Monitoring**: You'll need to set up more comprehensive monitoring and alerting

We'll cover these topics in more detail in [Chapter 4](../../chapter-4_setup_federated_system/04.0_introduction/introduction.ipynb).

## Summary

In this section, we've learned how to:

- Create a POC environment to simulate a multi-site federated learning deployment
- Configure a custom project with specific sites and user roles
- Prepare a Docker-based POC environment for better isolation between components
- Start and manage the components of a federated learning system using various command options
- Understand the relationship between POC mode and real-world deployment

The POC mode provides a valuable bridge between simulation and production, allowing you to test your federated learning workflows in a realistic environment before deploying across multiple organizations.

In the next section, we'll explore different ways to [interact with the federated computing system](../03.3_interact_with_federated_computing_system/ways_to_interact_with_fl_system.ipynb).