# Building a custom Controller in NVFlare

In this tutorial, we'll walk through how to build a custom controller in NVFlare that runs a Peer-to-Peer (P2P) distributed optimization algorithm. We'll go through the various components involved, including `DXO`, `Task`, `Shareable`, `Signal`, and `FLContext`, and how they fit into the overall flow of the `Controller`.

## Overview

In NVFlare, a `Controller` is a server-side component responsible for managing job execution and orchestrating tasks across clients.
In our P2P algorithm, the custom controller's main responsibilities are:

- Loading and broadcasting the network configuration to clients.
- Initiating and terminating the execution of the P2P distributed optimization algorithm.

We'll create a custom controller named `DistOptController` by subclassing the base `Controller` class provided by NVFlare.

> Notice that a `Controller` is a subclass of [`FLComponent`](https://nvflare.readthedocs.io/en/2.5.2/apidocs/nvflare.apis.fl_component.html#nvflare.apis.fl_component.FLComponent). In NVFlare, `FLComponent` is the base class of all FL components, including controllers, executors, responders, filters, aggregators, and many others (see [here](https://nvflare.readthedocs.io/en/2.5.2/programming_guide/fl_component.html) for more details). FLComponents can handle and fire events, and they provide various methods for logging.
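To make the event mechanism a little more concrete, here is a toy, framework-free sketch. It borrows only the idea that components receive events through a `handle_event(event_type, fl_ctx)` hook; `ToyComponent`, `ToyEngine`, the event name and the plain dict used as `fl_ctx` are hypothetical and do not reflect NVFlare's internals.

```python
# Toy illustration of the FLComponent event pattern (NOT NVFlare code).
# ToyComponent and ToyEngine are hypothetical; only the idea of a
# handle_event(event_type, fl_ctx) hook on every component is borrowed.
import logging
from typing import Any, Dict, List


class ToyComponent:
    def __init__(self, name: str) -> None:
        self.name = name
        self.logger = logging.getLogger(name)
        self.received: List[str] = []

    def handle_event(self, event_type: str, fl_ctx: Dict[str, Any]) -> None:
        # Record and log every event; a real component would react selectively.
        self.received.append(event_type)
        self.logger.info("%s received %s (job=%s)", self.name, event_type, fl_ctx.get("job_id"))


class ToyEngine:
    def __init__(self, components: List[ToyComponent]) -> None:
        self.components = components

    def fire_event(self, event_type: str, fl_ctx: Dict[str, Any]) -> None:
        # Deliver the event to every registered component, in registration order.
        for component in self.components:
            component.handle_event(event_type, fl_ctx)


controller = ToyComponent("controller")
aggregator = ToyComponent("aggregator")
engine = ToyEngine([controller, aggregator])
engine.fire_event("my_custom_event", {"job_id": "job-1"})
print(controller.received, aggregator.received)  # ['my_custom_event'] ['my_custom_event']
```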

## Implementing the Controller
As a subclass of `Controller`, our `DistOptController` must implement three methods:

- `start_controller`: Called at the beginning of the run.
- `control_flow`: Defines the main control flow of the controller (in this case, broadcasting the configuration and telling the clients to run the algorithm).
- `stop_controller`: Called at the end of the run.

Here's the basic structure we'll use:

```python
from nvflare.apis.fl_context import FLContext
from nvflare.apis.impl.controller import Controller
from nvflare.apis.signal import Signal

class DistOptController(Controller):

    def control_flow(self, abort_signal: Signal, fl_ctx: FLContext):
        # Broadcast configuration to clients and run the algorithm
        ...

    def start_controller(self, fl_ctx: FLContext):
        pass

    def stop_controller(self, fl_ctx: FLContext):
        pass
```

In this tutorial, we'll focus on the `control_flow` method, since that is where we implement the logic that sends configurations to clients and instructs them to run the algorithm. The `start_controller` and `stop_controller` methods remain empty here, but they could be used, respectively, to initialize resources and perform setup tasks, or to clean up resources and perform any finalization tasks.
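The order in which the three hooks are invoked can be pictured with the framework-free sketch below. `ToyController`, `ToySignal` and `run_job` are hypothetical stand-ins, not NVFlare classes, and the `try`/`finally` that always runs `stop_controller` is a choice made in this sketch rather than a documented NVFlare guarantee.

```python
# Hypothetical stand-ins (NOT NVFlare APIs) that mimic the controller lifecycle:
# start_controller -> control_flow -> stop_controller.
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ToySignal:
    triggered: bool = False


class ToyController:
    def __init__(self) -> None:
        self.calls: List[str] = []
        self.resources: Dict[str, Any] = {}

    def start_controller(self, fl_ctx: Dict[str, Any]) -> None:
        # Setup: acquire whatever the run needs (here, an in-memory log).
        self.calls.append("start_controller")
        self.resources["log"] = []

    def control_flow(self, abort_signal: ToySignal, fl_ctx: Dict[str, Any]) -> None:
        # Main logic; in DistOptController this is where tasks are sent.
        self.calls.append("control_flow")
        if not abort_signal.triggered:
            self.resources["log"].append("tasks sent")

    def stop_controller(self, fl_ctx: Dict[str, Any]) -> None:
        # Teardown: release what start_controller acquired.
        self.calls.append("stop_controller")
        self.resources.clear()


def run_job(controller: ToyController) -> None:
    # In this sketch, stop_controller runs even if control_flow raises.
    fl_ctx: Dict[str, Any] = {"job_id": "job-1"}
    controller.start_controller(fl_ctx)
    try:
        controller.control_flow(ToySignal(), fl_ctx)
    finally:
        controller.stop_controller(fl_ctx)


controller = ToyController()
run_job(controller)
print(controller.calls)  # ['start_controller', 'control_flow', 'stop_controller']
```

Keeping setup and teardown symmetric (whatever `start_controller` acquires, `stop_controller` releases) is what makes these two hooks useful even when, as in this tutorial, the real work lives entirely in `control_flow`.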

### Some key components
Before proceeding, let's look at some key components we'll use. We provide the basic details needed to understand the rest of the notebook; follow the provided links to learn more about each one.

- [`FLContext`](https://nvflare.readthedocs.io/en/2.5.2/apidocs/nvflare.apis.fl_context.html#nvflare.apis.fl_context.FLContext): The `FLContext` object carries all the execution context, which includes information about the current execution environment, such as run number, job ID, and other configurations. It's passed to many methods to provide context. More details can be found [here](https://nvflare.readthedocs.io/en/2.5.2/programming_guide/fl_context.html). We'll talk more about it when implementing the executors.
- [`Signal`](https://nvflare.readthedocs.io/en/2.5.2/apidocs/nvflare.apis.signal.html#nvflare.apis.signal.Signal): The `Signal` object provides a mechanism to signal events such as an abort request. Controllers and clients can check this signal to determine whether they should stop execution gracefully. More details on handling abort signals can be found [here](https://nvflare.readthedocs.io/en/2.5.2/best_practices.html#respect-the-abort-signal).
- [`Task`](https://nvflare.readthedocs.io/en/2.5.2/apidocs/nvflare.apis.controller_spec.html#nvflare.apis.controller_spec.Task): In NVFlare, a `Task` represents a unit of work assigned by the controller to clients. Each `Task` has a `name`, associated `data`, and other metadata. Tasks can be sent to specific clients or broadcast to all clients. They follow a specific lifecycle, as explained [here](https://nvflare.readthedocs.io/en/2.5.2/programming_guide/controllers/controllers.html#task-lifecycle).
- [`DXO`](https://nvflare.readthedocs.io/en/2.5.2/apidocs/nvflare.apis.dxo.html#nvflare.apis.dxo.DXO) (Data Exchange Object): The `DXO` is a standardized data structure in NVFlare for exchanging information between communicating parties in the distributed system. It encapsulates data along with metadata like data kind (see the [`DataKind`](https://nvflare.readthedocs.io/en/2.5.2/apidocs/nvflare.apis.dxo.html#nvflare.apis.dxo.DataKind) object), ensuring consistency.
- [`Shareable`](https://nvflare.readthedocs.io/en/2.5.2/apidocs/nvflare.apis.shareable.html#nvflare.apis.shareable.Shareable): A `Shareable` is a data structure used for communication between the different players in NVFlare. It wraps the data to be shared and can include additional metadata or headers. It is essentially a `dict` that can hold any keys and values; however, **values must be serializable**.
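The relationship between a `DXO` and a `Shareable`, and why serializability matters, can be illustrated without NVFlare. In the sketch below, `ToyDXO`, `ToyShareable` and the `Neighbor` dataclass are hypothetical stand-ins (the real classes store their fields differently), and `json.dumps` merely stands in for NVFlare's own serialization layer. It shows that a payload containing custom objects may fail to serialize, while the same data converted to plain dicts, as the controller later does with `n.__dict__`, goes through.

```python
# Toy stand-ins for DXO and Shareable (NOT the NVFlare classes): they only
# illustrate "data + kind + meta, wrapped in a dict-like container".
# json.dumps is a stand-in serializer; NVFlare uses its own mechanism.
import json
from dataclasses import dataclass, field
from typing import Any, Dict


class ToyShareable(dict):
    """A plain dict subclass, mirroring the idea that a Shareable is a dict."""


@dataclass
class ToyDXO:
    data_kind: str
    data: Dict[str, Any]
    meta: Dict[str, Any] = field(default_factory=dict)

    def to_shareable(self) -> ToyShareable:
        # Store everything under a single key; the real key layout may differ.
        s = ToyShareable()
        s["dxo"] = {"kind": self.data_kind, "data": self.data, "meta": self.meta}
        return s


@dataclass
class Neighbor:  # hypothetical mirror of the config's Neighbor type
    id: str
    weight: float


def is_serializable(obj: Any) -> bool:
    # True if the stand-in serializer accepts the object.
    try:
        json.dumps(obj)
        return True
    except TypeError:
        return False


neighbors = [Neighbor("site-2", 0.1)]
raw = ToyDXO("APP_DEFINED", {"neighbors": neighbors}).to_shareable()
plain = ToyDXO("APP_DEFINED", {"neighbors": [n.__dict__ for n in neighbors]}).to_shareable()
print(is_serializable(raw), is_serializable(plain))  # False True
```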

### Initializing the Controller
To initialize the controller, we override the `__init__` method to accept a `Config` object, which contains the network configuration and any extra parameters needed for the algorithm.

```python
from nvflare.apis.impl.controller import Controller
from nvflare.app_opt.p2p.types import Config

class DistOptController(Controller):
    def __init__(self, config: Config, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.config = config
```

An example of a valid configuration is shown below; we'll discuss it in more detail later on.
```python
from nvflare.app_opt.p2p.types import Config, Neighbor, Network, Node

Config(
    extra={"iterations": 100},
    network=Network(
        nodes=[
            Node(
                id='site-1',
                neighbors=[
                    Neighbor(id='site-2', weight=0.1),
                ]
            ),
            Node(
                id='site-2',
                neighbors=[
                    Neighbor(id='site-3', weight=0.1),
                ]
            ),
            Node(
                id='site-3',
                neighbors=[
                    Neighbor(id='site-1', weight=0.1),
                ]
            ),
        ]
    )
)
```
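To see what each client will actually receive, the sketch below rebuilds the example with local dataclasses that mirror the field names shown above (`Config`, `Network`, `Node`, `Neighbor`); they are hypothetical stand-ins so that the snippet runs without NVFlare. The hypothetical helper `per_node_payloads` computes the same `{"neighbors": [...]}` dictionary that the controller sends in each `"config"` task. Note that each site lists a single neighbor, so this example forms a directed ring: site-1 → site-2 → site-3 → site-1.

```python
# Hypothetical mirrors of Config/Network/Node/Neighbor, so this runs without
# NVFlare installed; the real types live in nvflare.app_opt.p2p.types.
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Neighbor:
    id: str
    weight: float


@dataclass
class Node:
    id: str
    neighbors: List[Neighbor] = field(default_factory=list)


@dataclass
class Network:
    nodes: List[Node] = field(default_factory=list)


@dataclass
class Config:
    network: Network
    extra: Dict[str, Any] = field(default_factory=dict)


def per_node_payloads(config: Config) -> Dict[str, Dict[str, Any]]:
    # Same payload shape the controller puts into each "config" task.
    return {
        node.id: {"neighbors": [n.__dict__ for n in node.neighbors]}
        for node in config.network.nodes
    }


config = Config(
    extra={"iterations": 100},
    network=Network(nodes=[
        Node("site-1", [Neighbor("site-2", 0.1)]),
        Node("site-2", [Neighbor("site-3", 0.1)]),
        Node("site-3", [Neighbor("site-1", 0.1)]),
    ]),
)
payloads = per_node_payloads(config)
print(payloads["site-1"])  # {'neighbors': [{'id': 'site-2', 'weight': 0.1}]}
```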

### Implementing the Control Flow

In the `control_flow` method, we'll perform two main steps:

1. Send the network configuration to the clients: Each client receives its specific configuration, such as neighbor information.

2. Run the algorithm: Instruct all clients to start executing the P2P optimization algorithm.

Let's first look at the implementation and then we'll break it down.


```python
from nvflare.apis.controller_spec import Task
from nvflare.apis.dxo import DXO, DataKind
from nvflare.apis.fl_context import FLContext
from nvflare.apis.impl.controller import Controller
from nvflare.apis.signal import Signal

class DistOptController(Controller):

    ...

    def control_flow(self, abort_signal: Signal, fl_ctx: FLContext):
        # 1. Send network config to each client
        for node in self.config.network.nodes:
            # Stop early if the job has been aborted
            if abort_signal.triggered:
                return

            # Prepare the data using DXO
            dxo = DXO(
                data_kind=DataKind.APP_DEFINED,
                data={"neighbors": [n.__dict__ for n in node.neighbors]},
            )
            shareable = dxo.to_shareable()

            # Create the task with name "config"
            task = Task(name="config", data=shareable)

            # Send the task to the specific client and wait for completion
            # (passing abort_signal lets the wait end early on abort)
            self.send_and_wait(
                task=task,
                targets=[node.id],
                fl_ctx=fl_ctx,
                abort_signal=abort_signal,
            )

        # 2. Instruct clients to run the algorithm
        targets = [node.id for node in self.config.network.nodes]

        # Prepare any extra parameters to send to the clients
        dxo = DXO(
            data_kind=DataKind.APP_DEFINED,
            data={key: value for key, value in self.config.extra.items()},
        )
        shareable = dxo.to_shareable()

        # Create the task with name "run_algorithm"
        task = Task(name="run_algorithm", data=shareable)

        # Broadcast the task to all clients and wait for all to respond
        self.broadcast_and_wait(
            task=task,
            targets=targets,
            min_responses=0,
            fl_ctx=fl_ctx,
            abort_signal=abort_signal,
        )
```
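One way to check the ordering of these calls without a running NVFlare deployment is to replace the messaging methods with recorders. The sketch below is hypothetical: `RecordingController`, `ToyTask` and `ToySignal` are stand-ins, and plain dicts replace `DXO`. It reproduces the structure of `control_flow` (one `send_and_wait` per node, then a single broadcast) and stops early when the toy abort flag is set, in the spirit of respecting the `Signal`.

```python
# Hypothetical recording harness (NOT NVFlare): messaging calls are recorded
# instead of sent, so the order of tasks in control_flow can be inspected.
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class ToyTask:
    name: str
    data: Dict[str, Any]


@dataclass
class ToySignal:
    triggered: bool = False


class RecordingController:
    def __init__(self, nodes: Dict[str, List[Dict[str, Any]]], extra: Dict[str, Any]) -> None:
        self.nodes = nodes  # node id -> list of neighbor dicts
        self.extra = extra
        self.log: List[Tuple[str, str, List[str]]] = []

    def send_and_wait(self, task: ToyTask, targets: List[str], fl_ctx: Any = None) -> None:
        self.log.append(("send_and_wait", task.name, targets))

    def broadcast_and_wait(
        self, task: ToyTask, targets: List[str], min_responses: int = 0, fl_ctx: Any = None
    ) -> None:
        self.log.append(("broadcast_and_wait", task.name, targets))

    def control_flow(self, abort_signal: ToySignal, fl_ctx: Any = None) -> None:
        # 1. One "config" task per node, each sent to that node only.
        for node_id, neighbors in self.nodes.items():
            if abort_signal.triggered:
                return
            payload = {"kind": "APP_DEFINED", "data": {"neighbors": neighbors}}
            self.send_and_wait(ToyTask("config", payload), targets=[node_id], fl_ctx=fl_ctx)
        if abort_signal.triggered:
            return
        # 2. A single "run_algorithm" task broadcast to every node.
        payload = {"kind": "APP_DEFINED", "data": dict(self.extra)}
        self.broadcast_and_wait(
            ToyTask("run_algorithm", payload), targets=list(self.nodes), min_responses=0, fl_ctx=fl_ctx
        )


ctrl = RecordingController(
    nodes={
        "site-1": [{"id": "site-2", "weight": 0.1}],
        "site-2": [{"id": "site-3", "weight": 0.1}],
        "site-3": [{"id": "site-1", "weight": 0.1}],
    },
    extra={"iterations": 100},
)
ctrl.control_flow(ToySignal())
for entry in ctrl.log:
    print(entry)
```

Running it logs three `"config"` sends, one per site, followed by a single `"run_algorithm"` broadcast to all three sites.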

#### Implementation break-down
1. Send the network configuration to the clients
    - For each node in the network configuration:
        - Prepare the data using DXO:
            - We create a `DXO` object with the neighbors' information. As discussed above, it encapsulates the data to be sent to the client. Here, we use `DataKind.APP_DEFINED` to indicate that the data is application-defined.
        - Convert the `DXO` to a `Shareable`:
            - We call `dxo.to_shareable()` to create a `Shareable` object from the `DXO`. This conversion is necessary because NVFlare's communication mechanism uses `Shareable` objects. As mentioned, the `Shareable` object wraps the data to be transferred between server and clients. 
        - Create a task with name `"config"`:
            - We create a `Task` named `"config"` with the `Shareable` data. The task name "config" identifies the task type, which the client will recognize and handle accordingly (we'll see that in the next section, when building the executors).
        - Send the task to the specific client and wait for completion:
            - Here we use the `send_and_wait` method to send the task to the target client (`node.id`) and wait for it to complete the task. This ensures synchronization before moving to the next step.
2. Run the algorithm
    - Prepare the list of target clients:
        - We collect all node IDs from the network configuration into the `targets` list.
    - Prepare any extra parameters:
        - We create a `DXO` with any extra parameters needed for the algorithm, stored in `self.config.extra`.
    - Create and broadcast the task:
        - We create a `Task` named `"run_algorithm"` with the `Shareable` data.
        - Broadcast and wait:
            - We use `broadcast_and_wait` to send the task to all target clients. As the name suggests, this method broadcasts the same task to all specified clients and waits for their responses. Here, we set `min_responses=0`, indicating that we wait for all clients to respond (i.e., to complete the algorithm) before proceeding.
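The `min_responses` behaviour can be modelled with a thread pool. This is a simplified sketch, not NVFlare's implementation: `toy_broadcast_and_wait` and `fake_client` are hypothetical, and timeouts and other NVFlare-side options are ignored. Following the description above, `min_responses=0` is interpreted as waiting for every target, while a positive value returns as soon as that many responses have arrived.

```python
# Toy model of the broadcast-and-wait pattern (NOT NVFlare's implementation).
# min_responses=0 is treated as "wait for every target", as described above.
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def toy_broadcast_and_wait(
    run_on_client: Callable[[str], str],
    targets: List[str],
    min_responses: int = 0,
) -> Dict[str, str]:
    if not targets:
        return {}
    needed = len(targets) if min_responses == 0 else min_responses
    results: Dict[str, str] = {}
    # One worker per target, so the same task runs on all "clients" concurrently.
    pool = ThreadPoolExecutor(max_workers=len(targets))
    try:
        futures = {pool.submit(run_on_client, t): t for t in targets}
        # Collect responses in completion order until enough have arrived.
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
            if len(results) >= needed:
                break
    finally:
        # Return without blocking on any stragglers.
        pool.shutdown(wait=False)
    return results


def fake_client(site: str) -> str:
    # Hypothetical client: each site takes a different time to finish.
    time.sleep({"site-1": 0.01, "site-2": 0.02, "site-3": 0.03}[site])
    return f"{site} done"


sites = ["site-1", "site-2", "site-3"]
print(sorted(toy_broadcast_and_wait(fake_client, sites)))  # all three sites
print(len(toy_broadcast_and_wait(fake_client, sites, min_responses=2)))  # 2
```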

## Conclusion
Our custom `DistOptController` is now ready. It effectively manages the distribution of the network configuration to clients and initiates the execution of the P2P algorithm across them.
The complete implementation of the `DistOptController` can be found in `nvflare/app_opt/p2p/controllers/dist_opt_controller.py`.