Modified from the [NVFlare Hello PyTorch example](https://github.com/NVIDIA/NVFlare/tree/main/examples/hello-world/hello-pt)

## Run an NVFlare Job
We have now defined the FedAvg controller, which runs the federated compute workflow on the FL server, and the client training script, which receives the global model, runs local training, and sends the results back to the FL server. We can put everything together using NVFlare's Job API.
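
To make the aggregation step concrete, here is a minimal, framework-free sketch of the weighted averaging that FedAvg performs on the server. It is an illustration only, not NVFlare's implementation: the function name `fedavg`, the plain-list parameter format, and the sample counts are all assumptions made for this example.

```python
def fedavg(client_updates):
    """Average client parameters, weighting each client by its sample count.

    client_updates: list of (params, num_samples) pairs, where params maps a
    parameter name to a flat list of floats (a stand-in for real tensors).
    """
    total_samples = sum(num_samples for _, num_samples in client_updates)
    if total_samples <= 0:
        raise ValueError("at least one client must report a positive sample count")

    reference_params = client_updates[0][0]
    averaged = {}
    for name, values in reference_params.items():
        averaged[name] = [
            sum(params[name][i] * num_samples for params, num_samples in client_updates)
            / total_samples
            for i in range(len(values))
        ]
    return averaged


# Hypothetical updates from two sites with different amounts of local data.
updates = [
    ({"w": [1.0, 2.0]}, 100),
    ({"w": [3.0, 4.0]}, 300),
]
global_weights = fedavg(updates)
print(global_weights)
```

The site with more samples pulls the average toward its own weights, which is the main design choice behind FedAvg compared with a plain unweighted mean.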

#### 2. Define a FedJob Recipe
 
 

In [1]:
from server.model import FusionNet

from nvflare.app_opt.pt.recipes.fedavg import FedAvgRecipe
from nvflare.recipe import SimEnv
from nvflare.recipe import add_experiment_tracking
import torch

n_clients = 2
num_rounds = 4
# Prefer the GPU when one is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = FusionNet().to(device)

recipe = FedAvgRecipe(
    name="MEM_SAN_FedCollab", # MEM - Memphis, SAN - San Diego
    min_clients=n_clients,
    num_rounds=num_rounds,
    initial_model=model,
    train_script="multi-client-sim.py",
)
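
For orientation, the script passed as `train_script` is expected to follow the receive, train, send pattern of NVFlare's Client API. The sketch below shows only the shape of that loop and is not the contents of `multi-client-sim.py`. It assumes `flare` is the `nvflare.client` module, passed in as an argument so the loop can be exercised without a running job; `client_loop` and `local_train` are hypothetical names, and the `NUM_STEPS_CURRENT_ROUND` meta key follows the convention used in the Hello PyTorch example.

```python
def client_loop(flare, local_train):
    """Receive global weights, train locally, and send the update back.

    flare:       expected to be the `nvflare.client` module (assumption).
    local_train: hypothetical callable mapping the received parameters to
                 a pair (updated_params, num_steps).
    """
    flare.init()  # set up the client-side runtime for this job
    while flare.is_running():
        global_model = flare.receive()  # model object carrying the global weights
        updated_params, num_steps = local_train(global_model.params)
        flare.send(
            flare.FLModel(
                params=updated_params,
                # Assumed convention: step count reported to the aggregator.
                meta={"NUM_STEPS_CURRENT_ROUND": num_steps},
            )
        )
```

In a real training script this would be called as `client_loop(nvflare.client, my_train_fn)`, where `my_train_fn` loads the received weights into the model, trains on the site's local data, and returns the new `state_dict`.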

#### 3. Add experiment tracking

In [2]:
add_experiment_tracking(recipe, tracking_type="tensorboard")

#### 4. Run Job
Here, we run the job in a simulation environment.

In [3]:
env = SimEnv(num_clients=n_clients)
run = recipe.execute(env)
print()
print("Job Status is:", run.get_status())
print("Result can be found in :", run.get_result())
print()

2026-01-09 02:11:20,433 - INFO - model selection weights control: {}
2026-01-09 02:11:21,029 - INFO - Tensorboard records can be found in /tmp/nvflare/simulation/MEM_SAN_FedCollab/server/simulate_job/tb_events you can view it using `tensorboard --logdir=/tmp/nvflare/simulation/MEM_SAN_FedCollab/server/simulate_job/tb_events`
2026-01-09 02:11:21,029 - INFO - Initializing ScatterAndGather workflow for Federated Averaging.
2026-01-09 02:11:21,029 - INFO - Both source_ckpt_file_full_name and ckpt_preload_path are not provided. Using the default model weights initialized on the persistor side.
2026-01-09 02:11:21,030 - INFO - Beginning ScatterAndGather training phase.
2026-01-09 

Exception in thread Thread-2 (run):
Traceback (most recent call last):
  File "/Users/tyleryang/.local/share/uv/python/cpython-3.12.12-macos-aarch64-none/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
    self.run()
  File "/Users/tyleryang/.local/share/uv/python/cpython-3.12.12-macos-aarch64-none/lib/python3.12/threading.py", line 1012, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/tyleryang/Developer/CMU-NVIDIA-Hackathon/Knowledge_Structures_Multimodal/.venv/lib/python3.12/site-packages/nvflare/app_common/executors/task_script_runner.py", line 71, in run
    raise e
  File "/Users/tyleryang/Dev

2026-01-09 02:11:25,541 - INFO - Abort signal received. Exiting at round 0.

Note, get_status returns None in SimEnv. The simulation logs can be found at /tmp/nvflare/simulation/MEM_SAN_FedCollab
Job Status is: None
Result can be found in : /tmp/nvflare/simulation/MEM_SAN_FedCollab



#### 5. Visualize the Training
To view the experiment tracking curves with TensorBoard, run the following in another terminal:

```bash
tensorboard --bind_all --logdir /tmp/nvflare/simulation/MEM_SAN_FedCollab
```
Alternatively, display the training curves directly in the next notebook cell.

In [None]:
# Per Holger, the TensorBoard logs are written to server/simulate_job/tb_events.
# They were empty because of the 2000-data-point condition.
# Note: this logdir was previously left pointing at the hello-pt job.
%load_ext tensorboard
%tensorboard --bind_all --logdir /tmp/nvflare/simulation/MEM_SAN_FedCollab
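
If TensorBoard shows no curves, it can help to check whether any non-empty event files were actually written before looking further. The helper below is a small sketch using only the standard library. `find_event_files` is a hypothetical name, and the search relies on TensorBoard's usual `events.out.tfevents.*` file naming.

```python
from pathlib import Path


def find_event_files(log_root):
    """Return the non-empty TensorBoard event files found anywhere under log_root."""
    root = Path(log_root)
    if not root.is_dir():
        return []
    return sorted(
        path
        for path in root.rglob("events.out.tfevents.*")
        if path.is_file() and path.stat().st_size > 0
    )


event_files = find_event_files("/tmp/nvflare/simulation/MEM_SAN_FedCollab")
print(f"{len(event_files)} non-empty event file(s) found")
for path in event_files:
    print(path)
```

An empty result means the job wrote no metrics, which is consistent with a run that aborts before the first round completes.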