# Swarm Learning with Cross-Site Evaluation

In this chapter, we demonstrate the [Swarm Learning](https://www.nature.com/articles/s41586-021-03583-3) and **Client-Controlled Cross-Site Evaluation** workflows using the Client API and the CIFAR10 dataset.
Unlike traditional federated learning, which relies on a central server to aggregate model updates, Swarm Learning eliminates the need for a central aggregator. Each participating node trains the model locally and shares only the learned parameters (e.g., weights) with other nodes, which act directly as aggregators.

## Swarm Learning

<img src="figs/swarm_learning.png" alt="swarm ccwf" width=35% height=35% />

Swarm Learning is a decentralized variant of Federated Averaging; the key difference is that the server is not trusted with any sensitive information. The server is only responsible for job health and lifecycle management via the `SwarmServerController`, while the clients handle the training and aggregation logic via the `SwarmClientController`.
As in the `Client-Controlled Cyclic Workflow` described in the previous [chapter](../07.2.2_cyclic/cyclic_weight_transfer_example.ipynb), the server is not involved in the communication of weight updates. Instead, a **peer-to-peer** communication channel is used to implement swarm learning.

- `SwarmServerController`: manages swarm job lifecycle and configurations such as `aggr_clients` and `train_clients`
- `SwarmClientController`: sends `learn_task` to all training clients to invoke their executors for the `train` task each round, and sends the results to the designated `aggr_client` for aggregation.

Required tasks: `train`

See the full definitions of [SwarmServerController](https://github.com/NVIDIA/NVFlare/blob/main/nvflare/app_common/ccwf/swarm_server_ctl.py) and [SwarmClientController](https://github.com/NVIDIA/NVFlare/blob/main/nvflare/app_common/ccwf/swarm_client_ctl.py) for all available arguments.
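To make the round structure concrete, here is a minimal plain-Python sketch of one swarm round: every training client produces a local result, and one peer acting as the aggregation client computes a sample-weighted average. It does not use NVFlare; the names `weighted_average` and `swarm_round`, the weight layout, and the sample counts are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical weight layout: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def weighted_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average parameters, weighting each update by its number of training samples."""
    total = sum(n for _, n in updates)
    first_weights = updates[0][0]
    return {
        key: [
            sum(w[key][i] * n for w, n in updates) / total
            for i in range(len(first_weights[key]))
        ]
        for key in first_weights
    }


def swarm_round(local_results: Dict[str, Tuple[Weights, int]], aggr_client: str) -> Weights:
    """One round: all clients send their results to aggr_client, a peer rather than the server."""
    if aggr_client not in local_results:
        raise ValueError(f"{aggr_client} is not a participating client")
    return weighted_average(list(local_results.values()))


# Two sites with different amounts of local data (illustrative numbers).
results = {
    "site-1": ({"fc.weight": [1.0, 2.0]}, 100),
    "site-2": ({"fc.weight": [3.0, 6.0]}, 300),
}
global_weights = swarm_round(results, aggr_client="site-2")
print(global_weights)  # {'fc.weight': [2.5, 5.0]}
```

The site with more samples pulls the average toward its own parameters, and at no point does a central server see the weights.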



## Client-Controlled Cross-Site Evaluation

<img src="figs/client_controlled_cse.png" alt="cse ccwf" width=35% height=35% />

In client-controlled cross-site evaluation, rather than sending client models to the server for distribution, clients instead communicate directly with each other to share their models for validation.


- `CrossSiteEvalServerController`: manages evaluation workflow and configurations such as `evaluators` and `evaluatees`
- `CrossSiteEvalClientController`: sends an `eval` request to the evaluators. Each evaluator sends a `get_model` task to the evaluatees, the evaluatees return their models via `submit_model`, and the evaluators run `validate` on each model and send the results to the server.

Required tasks: `validate`, `submit_model`

See the full definition of [CrossSiteEvalClientController](https://github.com/NVIDIA/NVFlare/blob/main/nvflare/app_common/ccwf/cse_client_ctl.py) for all available arguments.
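The outcome of this exchange can be pictured as a matrix of metrics indexed by evaluator and evaluatee. The sketch below builds such a matrix in plain Python. It is not NVFlare code: `cross_site_eval`, `make_validator`, the toy one-parameter models, and the absolute-error metric are hypothetical stand-ins for each site's real model and validation data.

```python
from typing import Callable, Dict

# Hypothetical toy model: a single named parameter per site.
Model = Dict[str, int]


def make_validator(local_target: int) -> Callable[[Model], int]:
    """Build a validate function that scores a model against this site's private data."""
    def validate(model: Model) -> int:
        # Toy metric: absolute error against the local target (lower is better).
        return abs(model["w"] - local_target)
    return validate


def cross_site_eval(
    models: Dict[str, Model],
    validators: Dict[str, Callable[[Model], int]],
) -> Dict[str, Dict[str, int]]:
    """Return results[evaluator][evaluatee] = metric."""
    results: Dict[str, Dict[str, int]] = {}
    for evaluator, validate in validators.items():
        results[evaluator] = {}
        for evaluatee, model in models.items():
            # Mirrors the flow above: the evaluator obtains the evaluatee's
            # model directly from that peer, then validates it locally.
            results[evaluator][evaluatee] = validate(model)
    return results


models = {"site-1": {"w": 1}, "site-2": {"w": 4}}
validators = {"site-1": make_validator(0), "site-2": make_validator(5)}
matrix = cross_site_eval(models, validators)
for evaluator, row in matrix.items():
    print(evaluator, row)
```

Only the metrics in `matrix` would need to reach the server; the models themselves travel between peers.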

## Converting DL training code to FL training code
We will use the [Client API FL code](../train.py), a trainer converted from the original [Training a Classifier](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) example.
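Because the workflows in this chapter require the `train`, `validate`, and `submit_model` tasks, the training script has to respond to all three. The following plain-Python sketch shows one way such a dispatch could be organized. It deliberately avoids the NVFlare Client API: `LocalSite`, `handle`, and the placeholder training and metric logic are hypothetical and are not taken from `train.py`.

```python
from typing import Dict, Optional

Weights = Dict[str, float]


class LocalSite:
    """Hypothetical stand-in for a training script's state; not an NVFlare class."""

    def __init__(self) -> None:
        self.best_weights: Optional[Weights] = None

    def handle(self, task_name: str, weights: Optional[Weights] = None) -> dict:
        if task_name == "train":
            # Placeholder for real local training: shift every weight by a fixed step.
            trained = {name: value + 1.0 for name, value in weights.items()}
            self.best_weights = trained
            return {"weights": trained}
        if task_name == "validate":
            # Placeholder metric: the mean of the received weights.
            return {"metric": sum(weights.values()) / len(weights)}
        if task_name == "submit_model":
            # Return the locally stored model so a peer can evaluate it.
            if self.best_weights is None:
                raise RuntimeError("no local model has been trained yet")
            return {"weights": self.best_weights}
        raise ValueError(f"unsupported task: {task_name}")


site = LocalSite()
trained = site.handle("train", {"w": 0.5, "b": 1.5})
score = site.handle("validate", trained["weights"])
submitted = site.handle("submit_model")
print(trained, score, submitted)
```

The point of the sketch is the shape of the branching: training updates the local model, validation scores a received model, and model submission returns the local model without training.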



## Install requirements
If you haven't yet, install required packages.

In [None]:
!pip install --upgrade pip
!pip install -r ./requirements.txt

## Prepare Data

Make sure the CIFAR10 dataset is downloaded with the following script:

In [2]:
! python ../download_cifar10.py

## Run Job API

Let's use the Job API to create a CCWF Job.

We use the `add_swarm()` function to add our server_config, client_config, and cse_config.

First add the `SwarmServerConfig` with the number of rounds for the `SwarmServerController`.

We also add `CrossSiteEvalConfig` for the `CrossSiteEvalServerController`.

On the client side, we add the `SwarmClientConfig` for the `SwarmClientController`, which maps the `learn_task_name` to `train`, and the configuration for the `CrossSiteEvalClientController`, which uses the `validate` and `submit_model` tasks. These tasks are handled by the `ScriptRunner` with our `train.py` script. Additionally, the required components, including the persistor with the initial `Net()` model, the aggregator, and the shareable generator, are defined as client-side components.

Let's first copy the required files:

In [3]:
! cp ../train.py train.py
! cp ../net.py net.py

Then we can use the Job API to create a job and run it in the simulator:

In [None]:
from net import Net

from nvflare.apis.dxo import DataKind
from nvflare.app_common.aggregators.intime_accumulate_model_aggregator import InTimeAccumulateWeightedAggregator
from nvflare.app_common.ccwf.ccwf_job import CCWFJob, CrossSiteEvalConfig, SwarmClientConfig, SwarmServerConfig
from nvflare.app_common.ccwf.comps.simple_model_shareable_generator import SimpleModelShareableGenerator
from nvflare.app_opt.pt.file_model_persistor import PTFileModelPersistor
from nvflare.job_config.script_runner import ScriptRunner

n_clients = 2
num_rounds = 3
train_script = "train.py"

job = CCWFJob(name="swarm")
aggregator = InTimeAccumulateWeightedAggregator(expected_data_kind=DataKind.WEIGHTS)
job.add_swarm(
    server_config=SwarmServerConfig(num_rounds=num_rounds),
    client_config=SwarmClientConfig(
        executor=ScriptRunner(script=train_script),
        aggregator=aggregator,
        persistor=PTFileModelPersistor(model=Net()),
        shareable_generator=SimpleModelShareableGenerator(),
    ),
    cse_config=CrossSiteEvalConfig(eval_task_timeout=300),
)

job.export_job("/tmp/nvflare/jobs/job_config")
job.simulator_run("/tmp/nvflare/jobs/workdir", n_clients=n_clients, gpu="0")

The previous cell exports the job config and executes the job in the NVFlare simulator.
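To see what was exported, you can list the files under the job config directory. The helper below is a small standard-library sketch; `list_files` is a hypothetical name, and the exact files you will see depend on your NVFlare version.

```python
import os
from typing import List


def list_files(root: str) -> List[str]:
    """Return all file paths under root, relative to root and sorted."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            paths.append(os.path.relpath(full_path, root))
    return sorted(paths)


# Path taken from the export_job() call above.
job_dir = "/tmp/nvflare/jobs/job_config"
if os.path.isdir(job_dir):
    for path in list_files(job_dir):
        print(path)
else:
    print(f"{job_dir} not found; run the job cell first")
```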



## Summary

In this notebook, we explored two key concepts in advanced federated learning:

1. **Swarm Learning**:
   - A decentralized approach to federated learning that eliminates the need for a central aggregator
   - Uses `SwarmServerController` for job lifecycle management
   - Employs `SwarmClientController` for training and aggregation logic
   - Clients share parameters directly with each other instead of going through a central server
2. **Client-Controlled Cross-Site Evaluation**:
   - Enables direct client-to-client model sharing for validation
   - Uses `CrossSiteEvalServerController` for workflow management
   - Uses `CrossSiteEvalClientController` for evaluation coordination
   - Requires the `validate` and `submit_model` tasks

This approach provides enhanced privacy and security by keeping sensitive data on client devices while enabling effective model training and evaluation across multiple sites.

Next, we'll learn about [Split Learning](https://arxiv.org/abs/1810.06060), another alternative to standard federated learning, suitable for vertical data partitioning among sites. To enable real-world split learning, we start with a privacy-preserving way to find common case IDs between datasets from different sites, namely [Private Set Intersection](../07.2.4_split_learning/federated_private_set_intersection.ipynb).