# Cyclic Weight Transfer (CWT) with Cyclic Workflow

In this example, we will demonstrate the Cyclic workflow using the Client API with the CIFAR10 dataset. 

## Cyclic Workflow
<a id = "cyclic_workflow"></a>

[Cyclic Weight Transfer](https://pubmed.ncbi.nlm.nih.gov/29617797) (CWT) uses the server-controlled `CyclicController` to pass the model weights from one site to the next in a cyclic fashion. 
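The relay can be made concrete with a small, framework-free simulation of a few CWT rounds. This is a sketch under stated assumptions, not NVFlare code. The model is a plain list of floats. Each site's "training" is a hypothetical stand-in (`make_local_trainer`) that takes one gradient step toward a site-specific optimum. The only point is that each site starts from the weights produced by the previous site.

```python
from typing import Callable, Dict, List

Weights = List[float]
Trainer = Callable[[Weights], Weights]


def make_local_trainer(target: Weights, lr: float = 0.5) -> Trainer:
    """Hypothetical stand-in for local training: one gradient step on the
    quadratic loss 0.5 * ||w - target||^2, whose gradient is (w - target)."""
    def train(weights: Weights) -> Weights:
        return [w - lr * (w - t) for w, t in zip(weights, target)]
    return train


def cyclic_round(weights: Weights, order: List[str], trainers: Dict[str, Trainer]) -> Weights:
    """One CWT round: the model visits each site in `order`, one at a time,
    and each site starts from the weights returned by the previous site."""
    for site in order:
        weights = trainers[site](weights)
    return weights  # final result of the round, returned to the server


trainers = {
    "site-1": make_local_trainer([1.0, 0.0]),
    "site-2": make_local_trainer([0.0, 1.0]),
}
w = [0.0, 0.0]
for _ in range(3):  # three rounds
    w = cyclic_round(w, ["site-1", "site-2"], trainers)
print(w)  # → [0.328125, 0.65625]
```

Each site starts from its predecessor's output, so the result of a round leans toward the sites visited last. In this toy, that is `site-2`'s optimum `[0.0, 1.0]`, which illustrates why the site order matters.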

In the Cyclic workflow, sites train one at a time, and the server relays the updated model from each site to the next. The order of the sites can be fixed, random, or random without the same site training twice in a row. A round is finished once all sites in the defined order have completed training once, and the final result is returned to the server. This differs from the Scatter-and-Gather (SAG) or FedAvg workflows, in which all sites train simultaneously and the server aggregates their results at the end of each round.
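The helper below sketches one way to read the three ordering options. It is hypothetical and is not NVFlare's implementation. The mode strings are illustrative labels. The helper assumes that every site trains exactly once per round, so "the same site in a row" can only happen across a round boundary, when the last site of one round is also the first site of the next.

```python
import random
from typing import List, Optional


def round_order(sites: List[str], mode: str, prev_last: Optional[str] = None,
                rng: Optional[random.Random] = None) -> List[str]:
    """Return the relay order for one round; each (distinct) site appears once.

    mode: "fixed", "random", or "random_without_same_in_a_row" (hypothetical labels).
    prev_last: the site that trained last in the previous round, if any.
    """
    rng = rng if rng is not None else random.Random()
    order = list(sites)
    if mode == "fixed":
        return order
    if mode not in ("random", "random_without_same_in_a_row"):
        raise ValueError(f"unknown order mode: {mode!r}")
    rng.shuffle(order)  # a fresh random permutation every round
    if mode == "random_without_same_in_a_row" and len(order) > 1 and order[0] == prev_last:
        # Rotate so the previous round's last site does not train again immediately.
        order.append(order.pop(0))
    return order


rng = random.Random(0)
sites = ["site-1", "site-2", "site-3"]
last = None
for r in range(3):
    order = round_order(sites, "random_without_same_in_a_row", prev_last=last, rng=rng)
    print(f"round {r}: {order}")
    last = order[-1]
```

Rotating the shuffled list instead of reshuffling keeps the step deterministic. Because the sites are distinct, the new first site can never equal `prev_last`.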

## Converting DL training code to FL training code

We will use the [Client API FL code](../code/fl/train.py) trainer, converted from the original [Training a Classifier](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) PyTorch tutorial.

## Install requirements
Install required packages

```python
!pip install --upgrade pip
!pip install -r ./requirements.txt
```

## Prepare Data

```python
! python ../download_cifar10.py
```

## Job Configuration

The client configuration for the trainer with the Client API is standard. It uses a `ScriptRunner` that runs our `train.py` script, which handles the `train` task. This is the same configuration used in the Scatter-and-Gather (SAG) PyTorch workflow.

For the server configuration, we use the `Cyclic` controller (imported from `nvflare.app_common.workflows.cyclic`) for the workflow. We define arguments such as the number of clients and the number of rounds. The controller relays the model between sites through the `train` task that the clients support.


Let's first copy the required files:

```python
! cp ../train.py train.py
! cp ../net.py net.py
```

## Run Job API

Then we can use the Job API to create a job and run it in the simulator. We simulate two clients running CWT for three rounds.

```python
from net import Net
from nvflare import FedJob
from nvflare.app_common.workflows.cyclic import Cyclic
from nvflare.app_opt.pt.job_config.model import PTModel
from nvflare.job_config.script_runner import ScriptRunner


if __name__ == "__main__":
    n_clients = 2
    num_rounds = 3
    train_script = "train.py"

    job = FedJob(name="cyclic")

    # Define the controller workflow and send to server
    controller = Cyclic(
        num_clients=n_clients,
        num_rounds=num_rounds,
    )
    job.to(controller, "server")

    # Define the initial global model and send to server
    job.to(PTModel(Net()), "server")

    # Add clients
    for i in range(n_clients):
        runner = ScriptRunner(
            script=train_script,
            script_args="",
        )
        job.to(runner, f"site-{i+1}")

    job.export_job("/tmp/nvflare/jobs")
    job.simulator_run("/tmp/nvflare/jobs/workdir", gpu="0")
```


Ensure that `train_script` points to the Client API FL `train.py` code and that the initial global model passed to `PTModel` is `Net` from `net.py`.

The previous cell exports the job config to `/tmp/nvflare/jobs` and runs the job in the NVFlare simulator.


Next, we use a client-controlled version of the cyclic workflow.

# Cyclic Weight Transfer (CWT) with Client-Controlled Cyclic Workflow

In this example, we will demonstrate the Client-Controlled Cyclic Workflow using the Client API with the CIFAR10 dataset. 
This differs from the **Server-Controlled Cyclic Workflow** used above because the server is not involved in communicating sensitive information, such as model weights. This matters when the server is not trusted. Instead, NVFlare implements a **peer-to-peer** communication channel between clients for CWT.

## Client-Controlled Cyclic Workflow

<img src="figs/cyclic_ccwf.png" alt="cyclic ccwf" width=35% height=35% />

The `CyclicServerController` is responsible for managing the lifecycle of the job. It assigns the `cyclic_config` task to configure the clients and the `cyclic_start` task to begin the training workflow. The configuration includes picking the starting client and the result clients, and defining the cyclic order.

The `CyclicClientController` is responsible for the training logic once `cyclic_start` is sent. The *Cyclic Workflow* is algorithmically the same as in the server-controlled version. The main difference is that the model is now transferred between clients with secure, encrypted **peer-to-peer** messaging. Only the result clients, rather than the server, receive the final model.
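A toy simulation can illustrate this division of labor. The server produces only a plan (starting client, relay order, result clients), and the clients pass the weights among themselves. Everything here is hypothetical. `CyclicConfig`, `server_configure`, and `run_client_controlled` are illustrative names, not NVFlare classes. Networking and encryption are not modeled. The sketch also assumes that the starting client supplies the initial weights.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple

Weights = List[float]
Trainer = Callable[[Weights], Weights]


@dataclass
class CyclicConfig:
    """Hypothetical stand-in for the plan the server hands out with
    `cyclic_config`: it contains no model weights."""
    order: List[str]
    result_clients: List[str]
    num_rounds: int

    @property
    def starting_client(self) -> str:
        return self.order[0]


def server_configure(clients: List[str], num_rounds: int,
                     result_clients: Optional[List[str]] = None) -> CyclicConfig:
    # The server only decides who goes first, the relay order, and who gets the result.
    return CyclicConfig(order=list(clients),
                        result_clients=list(result_clients or clients),
                        num_rounds=num_rounds)


def run_client_controlled(cfg: CyclicConfig, initial: Weights,
                          trainers: Dict[str, Trainer]) -> Tuple[Dict[str, Weights], Set[str]]:
    """Simulate the client-side relay after `cyclic_start`. Returns the final
    model held by each result client and the set of parties that handled weights."""
    handled_by: Set[str] = set()
    weights = initial  # assumption: provided by the starting client
    for _ in range(cfg.num_rounds):
        for site in cfg.order:
            weights = trainers[site](weights)  # train, then hand off peer-to-peer
            handled_by.add(site)
    final = {site: list(weights) for site in cfg.result_clients}
    return final, handled_by


def shift_toward(target: Weights) -> Trainer:
    # Toy "training": move halfway toward this site's local optimum.
    return lambda w: [x + 0.5 * (t - x) for x, t in zip(w, target)]


cfg = server_configure(["site-1", "site-2"], num_rounds=3, result_clients=["site-2"])
final, handled_by = run_client_controlled(
    cfg, [0.0, 0.0],
    {"site-1": shift_toward([1.0, 0.0]), "site-2": shift_toward([0.0, 1.0])},
)
print(cfg.starting_client, sorted(final), sorted(handled_by))
# → site-1 ['site-2'] ['site-1', 'site-2']
```

The server never appears among the parties that handled weights. Only `site-2`, the single result client, ends up holding the final model.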

See the [docs](https://nvflare.readthedocs.io/en/main/programming_guide/controllers/client_controlled_workflows.html#cyclic-learning) for more information about the *Client-Controlled Cyclic Workflow*.

Again, we will be using the same [Client API FL code](../code/fl/train.py) trainer.


## Run Job API

Let's use the Job API to create a CCWF Job.

We use the `add_cyclic()` method to add our `server_config` and `client_config`.

First add the `CyclicServerConfig` for the `CyclicServerController` with our desired parameters.
Here we set the required number of rounds, and also increase the max status report interval to 300 seconds.

Next we add the `CyclicClientConfig` for the `CyclicClientController` that handles all `cyclic_*` tasks and maps the `learn_task_name` to the `train` task handled by the `ScriptRunner` with our `train.py` script. The `PTFileModelPersistor` with the initial `Net()` model and the `SimpleModelShareableGenerator` are also added as components in the `CyclicClientConfig`.

Then we can use the Job API to create the job and run it in the simulator:

```python
from net import Net

from nvflare.app_common.ccwf.ccwf_job import CCWFJob, CyclicClientConfig, CyclicServerConfig
from nvflare.app_common.ccwf.comps.simple_model_shareable_generator import SimpleModelShareableGenerator
from nvflare.app_opt.pt.file_model_persistor import PTFileModelPersistor
from nvflare.job_config.script_runner import ScriptRunner

n_clients = 2
num_rounds = 3
train_script = "train.py"

job = CCWFJob(name="cifar10_cyclic")

job.add_cyclic(
    server_config=CyclicServerConfig(num_rounds=num_rounds, max_status_report_interval=300),
    client_config=CyclicClientConfig(
        executor=ScriptRunner(script=train_script),
        persistor=PTFileModelPersistor(model=Net()),
        shareable_generator=SimpleModelShareableGenerator(),
    ),
)

job.export_job("/tmp/nvflare/jobs/job_config")
job.simulator_run("/tmp/nvflare/jobs/workdir", n_clients=n_clients, gpu="0")
```

Again, ensure that `train_script` points to the Client API FL `train.py` code and that the model passed to the persistor is `Net` from `net.py`.

## Summary

In this notebook, we demonstrated two implementations of Cyclic Weight Transfer (CWT) using NVFlare:

1. **Server-Controlled CWT Workflow**:
   - Uses a server-side cyclic controller (the `Cyclic` workflow in this example)
   - Server manages the training order and model distribution
   - Sites train sequentially, and the server relays the model weights from one site to the next
   - Server receives and persists the final model after each round

2. **Client-Controlled CWT Workflow**:
   - Uses `CyclicServerController` and `CyclicClientController`
   - Implements peer-to-peer communication for enhanced privacy
   - Server only manages the job lifecycle and configuration
   - Clients handle model transfer and training coordination
   - Supports encrypted model transfer between clients

Key Features:
- Both workflows follow the same algorithmic approach to CWT
- Support for fixed, random, or random (never the same site twice in a row) site ordering
- Integration with PyTorch models and training scripts
- Built-in support for model persistence and evaluation

The main difference between the two approaches is the level of server involvement in the training process, with the client-controlled version providing enhanced privacy through peer-to-peer communication.


Next, we will have a look at a [swarm learning](../07.2.3_swarm_learning/swarm_learning.ipynb) example, which also covers client-controlled cross-site evaluation workflows.