# NVIDIA FLARE Overview

* Apache License 2.0 to catalyze FL research & development

* Designed for enterprise production

* Able to run on CPU, GPU, and Multi-GPU

* Enables cross-country, distributed, multi-party collaborative learning

* Production scalability with high availability and multi-task execution

* Framework, model, domain, and task agnostic

* Layered, pluggable, customizable federated compute architecture


# NVIDIA FLARE Architecture Overview

* Layered, pluggable open architecture

* Each layer’s components are composable and pluggable

* Network: Communication & Messaging layer

  * Drivers: gRPC, HTTP + WebSocket, TCP, any plugin driver

  * CellNet: logical end-to-end (cell to cell) network

  * Message: reliable streaming message

* Federated Computing Layer

  * Resource-based job scheduling, job monitoring, concurrent job lifecycle management, high-availability management

  * Plugin component management

  * Configuration management

  * Local event and federated event handling

* Federated Workflow

  * SAG, Cyclic, Cross-site Evaluation, Swarm Learning, Federated Analytics

* Federated Learning Algorithms

<img src="./flare_overview.png" alt="FLARE Architecture" width="700" height="400">


NVIDIA FLARE is built in layers, and each layer builds on the one below it. At the bottom is the network communication layer.

## FCI - Flare Communication Interface

FCI is a logical network framework that supports asynchronous, 2-way communication through multiple transports. It is:

* **Pluggable**: It has a pluggable architecture to support different messaging patterns (request-response, broadcast, pub/sub). It can also support different transports through drivers, like TCP, Pipe, HTTP/WS, gRPC.

* **Streamable**: Large binary data can be streamed in small chunks to minimize memory usage.

* **Full-duplex**: Both sides can send messages to each other without polling, if the transport supports it.

* **Multiplex**: Multiple conversations can be conducted over the same connection at the same time using stream IDs.

* **Asynchronous**: Messages can be sent and received asynchronously, for example with fire-and-forget sends or by listening for incoming messages.

* **One-way connection** for remote communications: All TCP-based connections can be initiated from clients so clients have no port exposed.

* **Supports IPC**: It can work with communications through pipes or sockets between processes.

* **Native heartbeats**: Heartbeats are supported by FCI to keep connections alive.
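
To make the **Streamable** and **Multiplex** properties above more concrete, here is a minimal, framework-free sketch. It is not FCI code: the `Frame` class and the `to_frames`, `interleave`, and `reassemble` helpers are hypothetical names invented for illustration, and the sketch assumes frames of a given stream arrive in order.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class Frame:
    """Hypothetical frame: one chunk of one logical stream."""
    stream_id: int
    seq: int
    last: bool
    payload: bytes


def to_frames(stream_id: int, data: bytes, chunk_size: int) -> Iterator[Frame]:
    """Split a payload into chunks of at most chunk_size bytes, tagged with a stream ID."""
    total = max(1, -(-len(data) // chunk_size))  # ceiling division; one frame for empty data
    for seq in range(total):
        chunk = data[seq * chunk_size:(seq + 1) * chunk_size]
        yield Frame(stream_id, seq, seq == total - 1, chunk)


def interleave(*streams: Iterator[Frame]) -> Iterator[Frame]:
    """Round-robin frames from several streams onto one simulated connection."""
    pending = list(streams)
    while pending:
        for stream in list(pending):
            try:
                yield next(stream)
            except StopIteration:
                pending.remove(stream)


def reassemble(frames: Iterator[Frame]) -> Dict[int, bytes]:
    """Rebuild each payload from interleaved frames, keyed by stream ID.

    Assumes in-order delivery within a stream, so `seq` is not consulted.
    """
    buffers: Dict[int, List[bytes]] = {}
    done: Dict[int, bytes] = {}
    for frame in frames:
        buffers.setdefault(frame.stream_id, []).append(frame.payload)
        if frame.last:
            done[frame.stream_id] = b"".join(buffers.pop(frame.stream_id))
    return done


model_bytes = bytes(range(256)) * 40      # 10,240 bytes standing in for model weights
metrics_bytes = b'{"accuracy": 0.91}'     # a small message sharing the same connection

wire = interleave(to_frames(1, model_bytes, 1024), to_frames(2, metrics_bytes, 1024))
received = reassemble(wire)
print({stream_id: len(data) for stream_id, data in received.items()})
```

Because the sender only ever holds one chunk at a time, memory use stays bounded by the chunk size, and the small metrics message does not have to wait for the large transfer to finish.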

From top to bottom, FCI has the following layers:

* **API Layer**: This is the API exposed to application developers, such as Communicator and CellNet.
* **Streamable Framed Message (SFM)**: This is the core of FCI and it provides abstraction on top of different communication protocols. It manages endpoints and connections.
* **Transport Drivers**: This layer is responsible for sending frames to other endpoints. It treats the frame as opaque bytes.

<img src="./fci.png" alt="FLARE Communication Interface" width="300" height="400">

## Federated Computing Architecture

There are two parent control processes with corresponding job processes on each site. This design allows multiple jobs to run concurrently.

<img src="./system_architecture.png" alt="FLARE System Architecture" width="700" height="400">


## Event-Based System

All of NVIDIA FLARE's components (subclasses of `FLComponent`) can handle and fire events through the runtime engine. As a result, users can write an `FLComponent` as a plugin that listens for events and adds customized logic at any layer.
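
The pattern can be illustrated with a small, framework-free sketch. The classes below (`ToyComponent`, `ToyEngine`, `Trainer`, `RoundLogger`) and the `"ROUND_DONE"` event name are hypothetical and are not part of NVFlare; only the method names `fire_event` and `handle_event` are chosen to mirror those on `FLComponent`.

```python
from typing import Any, Dict, List, Optional


class ToyComponent:
    """Hypothetical base class: every component can fire and handle events."""

    def __init__(self) -> None:
        self.engine: Optional["ToyEngine"] = None

    def fire_event(self, event_type: str, ctx: Dict[str, Any]) -> None:
        # Firing goes through the engine, which fans the event out.
        if self.engine is None:
            raise RuntimeError("component is not registered with an engine")
        self.engine.dispatch(event_type, ctx)

    def handle_event(self, event_type: str, ctx: Dict[str, Any]) -> None:
        # Default behavior: ignore the event. Subclasses override this.
        pass


class ToyEngine:
    """Hypothetical runtime engine that delivers events to all components."""

    def __init__(self) -> None:
        self.components: List[ToyComponent] = []

    def register(self, component: ToyComponent) -> None:
        component.engine = self
        self.components.append(component)

    def dispatch(self, event_type: str, ctx: Dict[str, Any]) -> None:
        for component in self.components:
            component.handle_event(event_type, ctx)


class Trainer(ToyComponent):
    """Fires an event after each (simulated) training round."""

    def run(self, rounds: int) -> None:
        for current_round in range(rounds):
            self.fire_event("ROUND_DONE", {"round": current_round})


class RoundLogger(ToyComponent):
    """A plugin that only listens; the Trainer does not know it exists."""

    def __init__(self) -> None:
        super().__init__()
        self.seen: List[int] = []

    def handle_event(self, event_type: str, ctx: Dict[str, Any]) -> None:
        if event_type == "ROUND_DONE":
            self.seen.append(ctx["round"])


engine = ToyEngine()
trainer, logger = Trainer(), RoundLogger()
engine.register(trainer)
engine.register(logger)
trainer.run(rounds=3)
print(logger.seen)  # [0, 1, 2]
```

The point of the design is decoupling: new behavior is added by registering another listener rather than by modifying the component that fires the event.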


## Federated Learning Framework

Building on these core concepts, we have built many federated learning workflows and algorithms, including FedAvg, FedOpt, FedProx, SCAFFOLD, cyclic, swarm learning, and split learning. Many examples can be found on the [website](https://nvidia.github.io/NVFlare/) and in its [tutorial categories](https://nvidia.github.io/NVFlare/catalog/).

## Enterprise Security and Privacy

We have many features that support enterprise security and privacy-enhancing technologies (PETs). Please refer to [Part-3 Security and Privacy](../../../part-3_security_and_privacy/part-3_introduction.ipynb).

## Simulations

We have built different tools for simulation, including a Python API and a CLI. You have seen the Job API and simulator CLI in [Chapter-1](../../../part-1_federated_learning_introduction/Chapter-1_running_federated_learning_applications/01.0_introduction/introduction.ipynb).

In [Section 3.2](../03.2_deployment_simulation/simulate_real_world_deployment.ipynb), we will also discuss how to simulate a deployment on a local machine.

## Setup and Deployment

Setting up the federated computing system is not a trivial task. We have built tools to make this process simpler. We will discuss this in [Chapter 4](../../chapter-4_setup_federated_system/04.0_introduction/introduction.ipynb).

## Different Types of FLARE APIs

At its core, FLARE uses controllers and executors to assign and execute tasks for each job. On top of this, we have the following APIs.

### Python APIs

* **Controller, Executor API**: These are the lower-level APIs that give full control and power for any type of federated computing.

* **ModelController and Client API**: This is a higher-level API based on the assumption that, for many machine learning and deep learning algorithms, we can use the `FLModel` data structure to capture the input and output.

  ```
  # Excerpt of the FLModel constructor signature (body elided).
  # ParamsType is an enum defined alongside FLModel (FULL, DIFF).
  from typing import Any, Dict, Optional, Union


  class FLModel:
      def __init__(
          self,
          params_type: Union[None, str, ParamsType] = None,
          params: Any = None,
          optimizer_params: Any = None,
          metrics: Optional[Dict] = None,
          start_round: Optional[int] = 0,
          current_round: Optional[int] = None,
          total_rounds: Optional[int] = None,
          meta: Optional[Dict] = None,
      ):
          ...
  ```

  This data structure essentially captures the model (parameter type (Full or Diff), model parameters (weights), and optimizer parameters), the metrics, and the metadata. This kind of data structure is understandable by most data scientists.

  On the server side, we have the ModelController, a controller that uses and consumes `FLModel`. On the client side, we have the Client API, which receives and sends model updates via `FLModel`. You have already seen this in previous chapters.


* **Job API**: The FLARE Job API is a way to generate job configurations. Although one can edit configuration files directly, one can also use the Job API to construct the needed components and generate the job configuration. The Job API can also call `job.simulator_run()`, which combines exporting the job configuration and running the simulator in one step.

* **Simulator API**: One can directly invoke the `simulator_run()` method to start a simulation in Python.

* **FLARE API**: The FLARE Python API is equivalent to the FLARE Console commands. Instead of interacting with the FL system via Console commands, we can perform most of the command functions via the FLARE API. These include connecting to the server, checking status, monitoring jobs, and submitting jobs.
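
To show why the `FLModel` structure described above is convenient for both sides, here is a minimal sketch of a server-side weighted average over client updates. It does not import NVFlare: `ToyFLModel` is a hypothetical stand-in that copies a subset of the `FLModel` fields, and the `"num_samples"` metadata key and the `weighted_average` helper are assumptions made for illustration, not the library's aggregation code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ToyFLModel:
    """Hypothetical stand-in carrying a subset of the FLModel fields."""
    params: Dict[str, List[float]]
    params_type: str = "FULL"
    metrics: Optional[Dict[str, float]] = None
    current_round: Optional[int] = None
    meta: Dict[str, Any] = field(default_factory=dict)


def weighted_average(updates: List[ToyFLModel], weight_key: str = "num_samples") -> ToyFLModel:
    """Average parameters across sites, weighting each site by meta[weight_key]."""
    total = sum(u.meta[weight_key] for u in updates)
    first = updates[0]
    averaged = {
        name: [
            sum(u.params[name][i] * u.meta[weight_key] for u in updates) / total
            for i in range(len(values))
        ]
        for name, values in first.params.items()
    }
    return ToyFLModel(
        params=averaged,
        current_round=first.current_round,
        meta={weight_key: total},
    )


# Two simulated client updates for the same round.
site_1 = ToyFLModel(params={"w": [1.0, 2.0]}, current_round=0, meta={"num_samples": 100})
site_2 = ToyFLModel(params={"w": [3.0, 6.0]}, current_round=0, meta={"num_samples": 300})

global_model = weighted_average([site_1, site_2])
print(global_model.params)  # {'w': [2.5, 5.0]}
```

Because parameters, metrics, and metadata travel together in one object, the aggregation logic needs no knowledge of the training framework that produced the weights.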


### Command Line Interface

FLARE has several CLIs under the `nvflare` command:

* `nvflare --version`: show the installed version

* `nvflare poc`: POC (proof of concept) command

* `nvflare preflight_check`: check the FL system setup to see whether anything is not working and why

* `nvflare provision`: provision tool

* `nvflare simulator`: simulator CLI

* `nvflare dashboard`: start the NVFLARE dashboard, a web UI that allows participants to distribute provisioned startup kits

* `nvflare authz_preview`: look at different user roles

* `nvflare job`: job command that allows users to create job configurations based on the job templates, list existing templates, and submit jobs to production and POC

* `nvflare config`: set up the default startup directory, POC workspace directory, and job template directory locally. This is useful for local development with job templates and POC.


## Configuration

NVFLARE supports several configuration formats: JSON, pyhocon, and YAML. You can see the details in [Configuration Files](https://nvflare.readthedocs.io/en/main/user_guide/configurations.html).

You can also leverage the existing [job templates](https://github.com/NVIDIA/NVFlare/tree/main/job_templates), a set of predefined configurations, and use the [job CLI](https://github.com/NVIDIA/NVFlare/blob/main/examples/tutorials/job_cli.ipynb) to customize them to your needs.

# Job Template

Job templates are a set of existing job configurations with a specified structure.

For example:
```

├── config_fed_client.conf
├── config_fed_server.conf
├── info.conf
├── info.md
└── meta.conf

```

Each job template consists of an "information card" (`info.conf`), a display card (`info.md`), and the job configuration files.

The configuration is defined in pyhocon format, so we can add comments and explain the details.

We can take a look at one example, `job_templates/sag_pt/config_fed_client.conf`:

```
! cat  ../../../../../../job_templates/sag_pt/config_fed_client.conf
```

With job templates, we can use CLI commands to view and modify a template while creating the job configuration.

You can find more details in the [job CLI tutorial](../../../../job_cli.ipynb).