# Getting Started with NVFlare (PyTorch)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NVIDIA/NVFlare/blob/main/examples/hello-world/hello-pt/hello-pt.ipynb)

NVFlare is an open-source framework that allows researchers and
data scientists to seamlessly move their machine learning and deep
learning workflows into a federated paradigm.

## Federated Averaging with NVFlare
Given the flexible controller and executor concepts, it is easy to implement different computing & communication patterns with NVFlare, such as [FedAvg](https://proceedings.mlr.press/v54/mcmahan17a?ref=https://githubhelp.com) and [cyclic weight transfer](https://academic.oup.com/jamia/article/25/8/945/4956468). 

The controller's `run()` routine is responsible for assigning tasks and processing task results from the Executors. 

### Server Code
In federated averaging, the server code is responsible for distributing the global model and aggregating model updates from clients. 

NVFlare includes an implementation of the [FedAvg](https://proceedings.mlr.press/v54/mcmahan17a?ref=https://githubhelp.com) algorithm.

The server implements these main steps:
1. FL server initializes an initial model.
2. For each round (global iteration):
    - FL server samples available clients.
    - FL server sends the global model to clients and waits for their updates.
    - FL server aggregates all the `results` and produces a new global model.
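
The steps above can be sketched in plain Python. This is only an illustration of the round structure and of sample-weighted averaging, not NVFlare's implementation: the names `aggregate` and `fed_avg` are hypothetical, the model is a dictionary of flat lists, and each client is a hypothetical callable that returns its updated weights together with its sample count.

```python
import random
from typing import Callable, Dict, List, Tuple

Weights = Dict[str, List[float]]  # parameter name -> flat list of values
Client = Callable[[Weights], Tuple[Weights, int]]  # returns (weights, n_samples)


def aggregate(results: List[Tuple[Weights, int]]) -> Weights:
    """Average client weights, weighting each client by its sample count."""
    total = sum(n for _, n in results)
    if total <= 0:
        raise ValueError("no training samples reported")
    first_weights = results[0][0]
    return {
        name: [
            sum(w[name][i] * n for w, n in results) / total
            for i in range(len(values))
        ]
        for name, values in first_weights.items()
    }


def fed_avg(
    initial_model: Weights,
    clients: List[Client],
    num_rounds: int,
    min_clients: int,
    seed: int = 0,
) -> Weights:
    """Run the sample / send / aggregate loop for a fixed number of rounds."""
    rng = random.Random(seed)
    global_model = initial_model
    for _ in range(num_rounds):
        sampled = rng.sample(clients, min_clients)  # sample available clients
        results = [client(global_model) for client in sampled]  # send and wait
        global_model = aggregate(results)  # produce the new global model
    return global_model
```

Weighting by sample count means a client that trained on three times as much data contributes three times as much to the average.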

In this example, we will directly use the default federated averaging algorithm provided by NVFlare utilizing the [FedAvgRecipe](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_opt.pt.recipes.fedavg.html#nvflare.app_opt.pt.recipes.fedavg.FedAvgRecipe) for PyTorch. 

There is no need to define custom server code for this example.

### Client Code 
We take a CIFAR-10 example directly from the [PyTorch website](https://github.com/pytorch/tutorials/blob/main/beginner_source/blitz/cifar10_tutorial.py) with some minor modifications: removing comments, moving the network to [src/net.py](src/net.py), and adding a main method and GPU support. The original code can be found at [cifar10_original.py](../../hello-world/ml-to-fl/pt/code/cifar10_original.py).

Now, we need to adapt this centralized training code to something that can run in a federated setting.

On the client side, the training workflow is as follows:
1. Receive the model from the FL server.
2. Perform local training on the received global model and/or evaluate the received global model for model selection.
3. Send the new model back to the FL server.

Using NVFlare's client API, we can easily adapt machine learning code that was written for centralized training and apply it in a federated scenario.
For a general use case, the Client API provides three essential methods:
- `init()`: Initializes NVFlare Client API environment.
- `receive()`: Receives model from the FL server.
- `send()`: Sends the model to the FL server.
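
The three calls typically form a loop like the minimal sketch below. To keep the block runnable without NVFlare installed, the Client API module is passed in as the `flare` argument. `run_client` and `train_fn` are hypothetical names, and the use of `is_running()`, `FLModel`, and the `params` attribute is an assumption about `nvflare.client` that is not stated in the text above.

```python
from typing import Any, Callable, Dict

Params = Dict[str, Any]


def run_client(flare: Any, train_fn: Callable[[Params], Params]) -> int:
    """Run the receive / train / send loop and return the rounds handled.

    `flare` is the Client API module (normally `nvflare.client`).
    `train_fn` is a hypothetical function that runs local training on the
    received parameters and returns the updated parameters.
    """
    flare.init()  # initialize the Client API environment
    rounds = 0
    while flare.is_running():  # assumed helper: True while the job is active
        input_model = flare.receive()  # global model from the FL server
        new_params = train_fn(input_model.params)  # local training
        flare.send(flare.FLModel(params=new_params))  # send the update back
        rounds += 1
    return rounds
```

With NVFlare installed, this would be called as `import nvflare.client as flare` followed by `run_client(flare, train_fn)`.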

## Run an NVFlare Job
We now have a FedAvg workflow to run on the FL server and a client training script that receives the global model, runs local training, and sends the results back to the FL server. We can put everything together using NVFlare's Job API.

#### 2. Define a FedJob Recipe
 
 

```python
import torch
from model import SimpleNetwork

from nvflare.app_opt.pt.recipes.fedavg import FedAvgRecipe
from nvflare.recipe import SimEnv

# from nvflare.recipe import add_experiment_tracking

n_clients = 2
num_rounds = 2

# Build the initial global model on the GPU if one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SimpleNetwork(rna_dim=19359, clinical_dim=13).to(device)

# FedAvg recipe: the server side is provided by NVFlare, clients run client.py.
recipe = FedAvgRecipe(
    name="hello-multimodal",
    min_clients=n_clients,
    num_rounds=num_rounds,
    initial_model=model,
    train_script="client.py",
)
```

#### 3. Add experiment tracking

```python
# add_experiment_tracking(recipe, tracking_type="tensorboard")
```

#### 4. Run Job
Here, we run the job in a simulation environment.

```python
# Run the recipe in a local simulation with n_clients simulated sites.
env = SimEnv(num_clients=n_clients)
run = recipe.execute(env)
print()
print("Job Status is:", run.get_status())
print("Result can be found in:", run.get_result())
print()
```

```
2026-01-08 12:11:29,553 - INFO - model selection weights control: {}
2026-01-08 12:11:30,147 - INFO - Tensorboard records can be found in /tmp/nvflare/simulation/hello-multimodal/server/simulate_job/tb_events you can view it using `tensorboard --logdir=/tmp/nvflare/simulation/hello-multimodal/server/simulate_job/tb_events`
2026-01-08 12:11:30,147 - INFO - Initializing ScatterAndGather workflow for Federated Averaging.
2026-01-08 12:11:30,147 - INFO - Both source_ckpt_file_full_name and ckpt_preload_path are not provided. Using the default model weights initialized on the persistor side.
2026-01-08 12:11:30,147 - INFO - Beginning ScatterAndGather training phase.
2026-01-08 12:11:30,148 - INFO - Round 0 started.
2026-01-08 12:11:33,794 - INFO - start task run() with full path: /tmp/nvflare/simulation/hello-multimodal/site-1/simulate_job/app_site-1/custom/client.py
2026-01-08 12:11:33,794 - INFO - start task run() with fu
```

#### 5. Visualize the Training
You can use TensorBoard to show the experiment tracking curves by running

```bash
tensorboard --bind_all --logdir /tmp/nvflare/simulation/hello-multimodal
```
in another terminal or directly show the training curves in the next notebook cell.

```python
# TensorBoard logs are written to server/simulate_job/tb_events in the job workspace.
# In an earlier run they were empty because of the 2000-data-point condition.
%load_ext tensorboard
%tensorboard --bind_all --logdir /tmp/nvflare/simulation/hello-multimodal
```
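
If TensorBoard shows no curves, it can help to check whether any non-empty event files were written at all. The sketch below is a hypothetical helper (`find_event_dirs` is not part of NVFlare or TensorBoard). It assumes the usual TensorBoard file naming, `events.out.tfevents.*`, and searches the whole workspace for such files.

```python
from pathlib import Path
from typing import List


def find_event_dirs(workspace: str) -> List[Path]:
    """Return directories under `workspace` that hold non-empty event files."""
    root = Path(workspace)
    dirs = {
        f.parent
        for f in root.rglob("events.out.tfevents.*")
        if f.is_file() and f.stat().st_size > 0  # skip empty event files
    }
    return sorted(dirs)
```

For this job it could be called as `find_event_dirs("/tmp/nvflare/simulation/hello-multimodal")`. An empty list would mean no metrics were logged.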
