# NVIDIA FLARE's Federated Computing Platform


In this chapter, we give an overview of NVIDIA FLARE (NVFlare)'s core concepts and system architecture, explore different aspects of the NVFlare system, simulate a deployment locally, and see how to interact with the system.

## Federated Learning vs. Federated Computing

At its core, FLARE serves as a federated computing framework, with applications such as Federated Learning and Federated Analytics built upon this foundation. Notably, it is agnostic to datasets, workloads, and domains. In contrast to centralized data lake solutions that necessitate copying data to a central location, FLARE brings computing capabilities directly to distributed datasets. This approach ensures that data remains within the compute node, with only pre-approved, selected results shared among collaborators. Moreover, FLARE is system agnostic, offering easy integration with various data processing frameworks through the implementation of the FLARE client. This client facilitates deployment in sub-processes, Docker containers, Kubernetes pods, HPC, or specialized systems.





## Core Concepts

In NVIDIA FLARE (NVFlare), there are a few core concepts:

* Server-side component: Controller
* Client-side component: Executor
* Communication message: Shareable
* Filtering mechanism
* Building block: FLComponent
* Job

In Part 1, we only encountered Job; we will discuss the rest in this section.

### Controller

The Controller is the object that defines the logic for the clients to follow. The Controller API makes it possible to create any client coordination logic in a federated learning workflow.

In other words, the Controller defines the workflow, i.e., how the federated execution is carried out. For example, whether the execution runs in round-robin style or scatter-and-gather style is defined by the Controller.

In most cases, the Controller is executed on the FL server; some refer to this as the server strategy. The Controller can also be executed on the client side (referred to as a client-side controller). This can be used to define peer-to-peer workflows such as swarm learning.
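To make the two workflow styles concrete, here is a minimal conceptual sketch in plain Python. It does not use the NVFlare Controller API; the names `scatter_and_gather`, `round_robin`, and `make_client` are hypothetical, and each "client" is simply a function that receives a task and returns a result.

```python
# Conceptual sketch only: plain Python, not the NVFlare Controller API.
from typing import Callable, Dict, List

# A hypothetical "client" is a function that takes a task payload and returns a result.
Client = Callable[[dict], dict]


def scatter_and_gather(clients: Dict[str, Client], task: dict) -> List[dict]:
    """Send the same task to every client, then collect all results."""
    return [client(dict(task)) for client in clients.values()]


def round_robin(clients: Dict[str, Client], task: dict) -> dict:
    """Pass the task from one client to the next; each result feeds the next client."""
    current = dict(task)
    for client in clients.values():
        current = client(current)
    return current


def make_client(increment: int) -> Client:
    # Toy "local computation": add a site-specific increment to the value.
    def run(task: dict) -> dict:
        return {"value": task["value"] + increment}

    return run


clients = {"site-1": make_client(1), "site-2": make_client(2), "site-3": make_client(3)}

# Scatter & gather: every client starts from the same input, the server aggregates.
gathered = scatter_and_gather(clients, {"value": 0})
average = sum(r["value"] for r in gathered) / len(gathered)

# Round-robin: the result is relayed from client to client.
relayed = round_robin(clients, {"value": 0})

print(average)  # 2.0
print(relayed)  # {'value': 6}
```

The only difference between the two functions is the coordination logic, which is the part a Controller is responsible for.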


### Executor

The Executor is the object that defines the logic to execute on the client side. It handles the tasks defined by the Controller and sends the results back in response to the task requests.
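As a rough illustration of the idea, the sketch below shows a toy executor that dispatches on a task name and returns a result. It is not the NVFlare Executor API: the class `ToyExecutor`, the task names, and the return-code strings are assumptions made for this example.

```python
# Conceptual sketch only: a toy executor, not the NVFlare Executor API.
class ToyExecutor:
    """Handles named tasks and returns a result dict with a return code."""

    def execute(self, task_name: str, task_data: dict) -> dict:
        if task_name == "train":
            # Toy "training": shift every weight by a fixed step.
            weights = [w + 1.0 for w in task_data["weights"]]
            return {"return_code": "OK", "weights": weights}
        if task_name == "count":
            # Toy "analytics": report how many weights were received.
            return {"return_code": "OK", "count": len(task_data["weights"])}
        # Unknown tasks are reported back rather than silently ignored.
        return {"return_code": "TASK_UNKNOWN"}


executor = ToyExecutor()
train_result = executor.execute("train", {"weights": [1.0, 2.0]})
unknown_result = executor.execute("validate", {"weights": [1.0, 2.0]})

print(train_result)    # {'return_code': 'OK', 'weights': [2.0, 3.0]}
print(unknown_result)  # {'return_code': 'TASK_UNKNOWN'}
```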

The interaction between the Controller and Executor is shown in the picture below.



### Shareable 

A [Shareable](https://nvflare.readthedocs.io/en/main/programming_guide/shareable.html) object represents a communication between server and client. Technically a Shareable object is implemented as a Python dict. This dict contains two kinds of information:
* Header 
    * Peer Properties
    * Cookie 
    * Return Code
* Content

In other words, a Shareable is simply a dictionary that carries some metadata alongside its content.
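The header/content split can be pictured with a small stand-in class. This is a sketch under stated assumptions, not nvflare's `Shareable` implementation: `ToyShareable`, the reserved key `HEADERS_KEY`, and the header names used here are hypothetical.

```python
# Conceptual sketch only: a toy stand-in, not nvflare's Shareable class.
HEADERS_KEY = "__headers__"  # hypothetical reserved key for this sketch


class ToyShareable(dict):
    """A dict whose reserved key holds header metadata; all other keys are content."""

    def __init__(self):
        super().__init__()
        self[HEADERS_KEY] = {}

    def set_header(self, key, value):
        self[HEADERS_KEY][key] = value

    def get_header(self, key, default=None):
        return self[HEADERS_KEY].get(key, default)


msg = ToyShareable()
msg.set_header("return_code", "OK")               # header: status of the task
msg.set_header("peer_props", {"name": "site-1"})  # header: who sent the message
msg["weights"] = [0.1, 0.2]                       # content: the actual payload

# Content is everything except the reserved header entry.
content = {k: v for k, v in msg.items() if k != HEADERS_KEY}
print(content)  # {'weights': [0.1, 0.2]}
```

Because the object is still a dict, it can be serialized and passed around like any other dictionary.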


The Controller and Executor exchange Shareable objects:

<img src="controller_executor_no_filter.png" alt="Controller and executor" width="700" height="400">



### Filters

NVIDIA FLARE also introduces a filtering mechanism that allows users to limit inputs and outputs. Filters in NVIDIA FLARE are a way to transform the Shareable object exchanged between the communicating parties. A [Filter](https://nvflare.readthedocs.io/en/main/programming_guide/filters.html) can be used to apply additional processing to shareable data before it is sent to, or after it is received from, the peer.

<img src="controller_worker_flow.png" alt="Controller and executor with filters" width="700" height="400">
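The filtering idea can be sketched as a chain of functions applied to a message before it leaves a site. This example uses plain dicts and hypothetical helpers (`exclude_keys`, `round_values`, `apply_filters`); it is not the NVFlare Filter API.

```python
# Conceptual sketch only: toy filters over plain dicts, not the NVFlare Filter API.
from typing import Callable, List

Filter = Callable[[dict], dict]


def exclude_keys(*keys: str) -> Filter:
    """Build a filter that drops the named keys from the message."""

    def apply(message: dict) -> dict:
        return {k: v for k, v in message.items() if k not in keys}

    return apply


def round_values(ndigits: int) -> Filter:
    """Build a filter that rounds float values to limit the shared precision."""

    def apply(message: dict) -> dict:
        return {
            k: round(v, ndigits) if isinstance(v, float) else v
            for k, v in message.items()
        }

    return apply


def apply_filters(message: dict, filters: List[Filter]) -> dict:
    """Run the message through each filter in order; the input is not modified."""
    for f in filters:
        message = f(message)
    return message


outgoing = {"loss": 0.123456, "num_samples": 100, "patient_ids": [17, 42]}
filtered = apply_filters(outgoing, [exclude_keys("patient_ids"), round_values(2)])

print(filtered)  # {'loss': 0.12, 'num_samples': 100}
```

The same chain could be applied on the receiving side, which mirrors the "before sending or after receiving" placement described above.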




### FLComponent

NVIDIA FLARE is built with components. FLComponent is the base class and building block of all components. Controller, Executor, and Filter are all types of FLComponent.

The core property of FLComponent is event support. An FLComponent can fire and receive events, which makes FLARE an event-driven, pluggable system.
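A toy version of this fire-and-handle pattern is sketched below. The classes `ToyComponent` and `MetricsLogger`, the shared registry, and the event name `ROUND_DONE` are assumptions for illustration; the real FLComponent API differs.

```python
# Conceptual sketch only: a toy event mechanism, not the NVFlare FLComponent API.
class ToyComponent:
    """Base class: every component can fire events and receive events."""

    def __init__(self, name, registry):
        self.name = name
        self.registry = registry  # shared list of all components in the system
        self.seen = []
        registry.append(self)

    def fire_event(self, event_type, data=None):
        # Deliver the event to every registered component, including the sender.
        for component in self.registry:
            component.handle_event(event_type, data)

    def handle_event(self, event_type, data):
        # Default behavior: just record that the event happened.
        self.seen.append(event_type)


class MetricsLogger(ToyComponent):
    """A pluggable component that reacts only to the events it cares about."""

    def handle_event(self, event_type, data):
        if event_type == "ROUND_DONE":
            self.seen.append((event_type, data["round"]))


registry = []
trainer = ToyComponent("trainer", registry)
logger = MetricsLogger("logger", registry)

trainer.fire_event("ROUND_DONE", {"round": 1})

print(trainer.seen)  # ['ROUND_DONE']
print(logger.seen)   # [('ROUND_DONE', 1)]
```

Adding a new behavior means registering another component; the component that fires the event does not need to change.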


### Events

NVIDIA FLARE fires and manages events throughout the lifecycle of the system. There are two categories of event types: Local Event and Fed Event.

Both the client and the server have local events for their respective system activities. A client's local event can also be converted to a "Fed Event", which means the event will be propagated to and fired on the server side.
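The difference between the two categories can be sketched as follows. This is a toy model, not NVFlare's event system: `ToySite`, `fire_on_client`, the event names, and the `"fed."` prefix added on propagation are all assumptions of this example.

```python
# Conceptual sketch only: toy local vs. federated events, not NVFlare's event system.
class ToySite:
    """A server or client that records the events fired on it."""

    def __init__(self, name):
        self.name = name
        self.fired = []

    def fire_local(self, event_type):
        self.fired.append(event_type)


def fire_on_client(client, server, event_type, convert_to_fed=()):
    """Fire a local event on the client; propagate it to the server if selected."""
    client.fire_local(event_type)
    if event_type in convert_to_fed:
        # The "fed." prefix is an assumption of this sketch, used to mark
        # events that originated on another site.
        server.fire_local("fed." + event_type)


server = ToySite("server")
client = ToySite("site-1")

# A purely local event stays on the client.
fire_on_client(client, server, "local_step_done")

# An event selected for conversion is also fired on the server.
fire_on_client(client, server, "metrics_ready", convert_to_fed=("metrics_ready",))

print(client.fired)  # ['local_step_done', 'metrics_ready']
print(server.fired)  # ['fed.metrics_ready']
```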



## High-Level Concepts

Although understanding these core concepts enables FLARE users to build powerful federated computing algorithms, some data scientists prefer higher-level constructs.

NVFlare also introduces a few concepts to reduce the learning curve.

* FLModel -- a higher-level communication data structure


### FLModel

FLModel is a higher-level data structure designed for data scientists. It may not be general enough for all federated computing messaging, but it is well suited to federated learning applications.

We define a standard data structure, FLModel, that captures the common attributes needed for exchanging learning results. This is particularly useful when the NVFlare system needs to exchange learning information with external training scripts or systems. The external training script or system only needs to extract the required information from the received FLModel, run local training, and put the results in a new FLModel to be sent back.

Behind the scenes, FLModel is converted to and from Shareable.


**FLModel**

A standardized data structure for NVFlare to communicate with external systems.

**Parameters:**
* params_type – type of the parameters. It only describes “params”. If params_type is None, params must be None. If params is provided but params_type is not provided, then it will be treated as FULL.
* params – model parameters, for example: model weights for deep learning.

* optimizer_params – optimizer parameters. For many cases, the optimizer parameters don’t need to be transferred during FL training.

* metrics – evaluation metrics such as loss and scores.

* start_round – the starting FL round. A round is one round trip between client and server during training. None for inference.

* current_round – the current FL round. A round is one round trip between client and server during training. None for inference.

* total_rounds – total number of FL rounds. A round is one round trip between client and server during training. None for inference.

* meta – metadata dictionary used to contain any key-value pairs to facilitate the process.
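To illustrate how an external training script might use such a structure, here is a minimal sketch with a stand-in dataclass that mirrors the parameters listed above. `ToyFLModel` and `local_train` are hypothetical and are not NVFlare's `FLModel` class; the string `"FULL"` stands in for the FULL parameter type, and the dict conversion only hints at the kind of translation done when converting to and from Shareable.

```python
# Conceptual sketch only: a stand-in dataclass, not NVFlare's FLModel class.
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ToyFLModel:
    params_type: Optional[str] = None
    params: Optional[Dict[str, Any]] = None
    optimizer_params: Optional[Dict[str, Any]] = None
    metrics: Optional[Dict[str, float]] = None
    start_round: Optional[int] = None
    current_round: Optional[int] = None
    total_rounds: Optional[int] = None
    meta: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self):
        # Per the parameter description: params given without a params_type
        # are treated as FULL.
        if self.params is not None and self.params_type is None:
            self.params_type = "FULL"


def local_train(received: ToyFLModel) -> ToyFLModel:
    """What an external script does: read the model, train, return a new model."""
    # Toy "training": shift every parameter by a fixed step.
    new_params = {name: value + 1.0 for name, value in received.params.items()}
    return ToyFLModel(
        params=new_params,
        metrics={"loss": 0.5},  # made-up metric value for illustration
        current_round=received.current_round,
        total_rounds=received.total_rounds,
    )


global_model = ToyFLModel(params={"w": 1.0}, current_round=0, total_rounds=3)
result = local_train(global_model)

# A plain-dict view of the result, and the reverse conversion.
as_dict = asdict(result)
restored = ToyFLModel(**as_dict)

print(result.params_type)  # FULL
print(result.params)       # {'w': 2.0}
```

The training function only touches the fields it needs, which is the point of agreeing on a standard structure.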



Now that we have covered a few concepts, let's take a look at the [system architecture](../03.1_federated_computing_architecture/system_architecture.ipynb).

