# NVIDIA FLARE's Federated Computing Platform

## Introduction
Welcome to Chapter 3 of our self-paced training course on NVIDIA FLARE! In this chapter, we'll explore the core concepts and system architecture that make NVIDIA FLARE (NVFlare) a powerful platform for federated computing. We'll examine different aspects of the NVFlare system, learn how to simulate deployments, and discover various ways to interact with the system.

### Learning Objectives
By the end of this chapter, you will be able to:
- Distinguish between federated computing and federated learning, and explain where NVIDIA FLARE fits in
- Identify and explain the core components of NVIDIA FLARE's architecture
- Describe how these components interact to enable federated workflows

## Federated Learning vs. Federated Computing: Understanding the Distinction

Before diving into the technical details, let's clarify an important distinction:

**Federated Learning** is a specific application of distributed machine learning where models are trained across multiple decentralized devices or servers holding local data samples, without exchanging the data itself.

**Federated Computing** is the broader paradigm that enables computation to be performed across distributed systems while maintaining data privacy and security. Federated learning is just one application built on this foundation.

NVIDIA FLARE is a federated computing framework; applications such as Federated Learning and Federated Analytics are built on top of it. Key characteristics include:

- **Data Domain Agnostic**: Works with any type of dataset, workload, or domain
- **Computation at the Data**: Instead of centralizing data (as in data lake solutions), FLARE brings computing capabilities directly to where data resides
- **Privacy-Preserving**: Data remains within its original compute node, with only pre-approved, selected results shared among collaborators
- **System Agnostic**: Easily integrates with various data processing frameworks through the FLARE client
- **Deployment Flexibility**: Can be deployed in sub-processes, Docker containers, Kubernetes pods, HPC, or specialized systems
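
To make "computation at the data" concrete, here is a minimal sketch of a federated mean, a simple federated analytics task. It uses plain Python rather than the NVFlare API, and the site names, values, and function names (`local_summary`, `global_mean`) are assumptions made up for illustration. Each site reduces its raw records to a `(sum, count)` pair, and only those pairs reach the aggregator.

```python
from typing import List, Tuple

# Illustrative only: plain Python, not the NVFlare API.
# Site names and values below are made up for the example.


def local_summary(values: List[float]) -> Tuple[float, int]:
    """Runs where the data lives: reduce raw records to (sum, count)."""
    return sum(values), len(values)


def global_mean(summaries: List[Tuple[float, int]]) -> float:
    """Runs at the aggregator: sees per-site summaries, never raw records."""
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    return total / count


site_data = {
    "site-1": [2.0, 4.0],
    "site-2": [6.0, 8.0, 10.0],
}

# Each site computes its own summary; only the summaries are shared.
summaries = [local_summary(values) for values in site_data.values()]
mean = global_mean(summaries)
print(mean)
```

The aggregator recovers the same mean it would get from pooled data, yet no individual record leaves its site. In a real deployment, whether even these summaries may be shared is a policy decision for each data owner.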


## Core Concepts of NVIDIA FLARE

To understand how NVIDIA FLARE works, we need to familiarize ourselves with several key concepts that form the foundation of its architecture. Let's explore each one in detail:

### 1. Controller: The Orchestrator

**What is it?** The Controller defines the logic for clients to follow and orchestrates the entire federated workflow.

**Key responsibilities:**
- Determines how federated execution will be carried out
- Defines the coordination pattern (e.g., round-robin, scatter & gather)
- Manages the flow of tasks to clients
- Processes responses from clients

**Where does it run?** In most cases, the Controller runs on the FL server, where it is sometimes referred to as the "server strategy". It can also run on the client side (a "client-side controller") to enable peer-to-peer workflows such as swarm learning.

**Why is it important?** The Controller is the brain of the federated system, making decisions about which clients participate, what tasks they perform, and how results are aggregated.

### 2. Executor: The Worker

**What is it?** The Executor defines the logic that runs on the client side.

**Key responsibilities:**
- Receives tasks from the Controller
- Executes the requested operations (e.g., model training, evaluation)
- Sends results back to the Controller
- Manages local resources and data access

**Why is it important?** Executors perform the actual computational work in the federated system, operating on local data without sharing the raw data itself.

The relationship between Controllers and Executors forms the backbone of NVIDIA FLARE's federated computing architecture. Let's visualize this interaction:

**Controller-Executor Interaction:**

<img src="controller_executor_no_filter.png" alt="Controller and executor" width="700" height="400">
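
The same interaction can be sketched in a few lines of plain Python. This is a toy model of a scatter & gather round, not the NVFlare API: the class names (`ToyExecutor`, `ToyScatterAndGatherController`), the `"train"` task name, and the "training" rule (move the weight halfway toward the local mean) are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass
from typing import Dict, List

# Toy stand-ins for illustration only; these are NOT NVFlare classes.


@dataclass
class ToyExecutor:
    """Client-side worker: holds local data and runs tasks on it."""
    name: str
    local_data: List[float]

    def execute(self, task_name: str, task_data: Dict[str, float]) -> Dict[str, float]:
        if task_name != "train":
            raise ValueError(f"unknown task: {task_name}")
        # Stand-in for training: move the weight halfway toward the local mean.
        local_mean = sum(self.local_data) / len(self.local_data)
        weight = task_data["weight"]
        new_weight = weight + 0.5 * (local_mean - weight)
        # Only the result leaves the client, never local_data itself.
        return {"weight": new_weight, "num_samples": float(len(self.local_data))}


class ToyScatterAndGatherController:
    """Server-side orchestrator: scatter a task, gather and aggregate results."""

    def __init__(self, executors: List[ToyExecutor], num_rounds: int) -> None:
        self.executors = executors
        self.num_rounds = num_rounds

    def run(self, initial_weight: float) -> float:
        weight = initial_weight
        for _ in range(self.num_rounds):
            # Scatter: send the current weight to every client.
            results = [ex.execute("train", {"weight": weight}) for ex in self.executors]
            # Gather: aggregate with a sample-weighted average.
            total = sum(r["num_samples"] for r in results)
            weight = sum(r["weight"] * r["num_samples"] for r in results) / total
        return weight


executors = [ToyExecutor("site-1", [1.0, 3.0]), ToyExecutor("site-2", [6.0])]
controller = ToyScatterAndGatherController(executors, num_rounds=1)
final_weight = controller.run(initial_weight=0.0)
print(final_weight)
```

Swapping the loop body for a different coordination pattern, such as visiting clients one at a time in round-robin order, would change the Controller without touching the Executor.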



### 3. Shareable: The Communication Medium

**What is it?** A [Shareable](https://nvflare.readthedocs.io/en/main/programming_guide/shareable.html) is the standardized object that facilitates communication between the server and clients.

**Technical implementation:** At its core, a Shareable is a Python dictionary with two main sections:

**Header information:**
- **Peer Properties**: Information about the sender
- **Cookie**: Stateful information that can be passed back and forth
- **Return Code**: Status of the operation

**Content:**
- The actual payload data being transferred (e.g., model weights, evaluation metrics)

**Why is it important?** Shareables provide a standardized format for all communication in the federated system, making it easier to implement security measures, track provenance, and ensure compatibility.

**Example:** When a server sends a model to a client for training, the model parameters are packaged in a Shareable. When the client completes training, it packages the updated model in a Shareable to send back to the server.
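
The round trip in this example can be modeled with an ordinary dictionary. The sketch below is not the NVFlare `Shareable` class; the key names (`"header"`, `"peer_props"`, `"cookie"`, `"return_code"`, `"content"`) and the helper functions are assumptions that simply mirror the two sections described above.

```python
from typing import Any, Dict, Optional

# Plain-dict model of the header/content split.
# Key names are illustrative assumptions, not NVFlare's internal keys.


def make_toy_shareable(
    content: Dict[str, Any],
    peer_props: Optional[Dict[str, Any]] = None,
    cookie: Optional[Dict[str, Any]] = None,
    return_code: str = "OK",
) -> Dict[str, Any]:
    return {
        "header": {
            "peer_props": peer_props or {},  # information about the sender
            "cookie": cookie or {},          # state echoed back and forth
            "return_code": return_code,      # status of the operation
        },
        "content": dict(content),            # the actual payload
    }


def toy_client_train(task: Dict[str, Any]) -> Dict[str, Any]:
    """Pretend to train: shift each weight by 0.5, then reply in the same format."""
    updated = [w + 0.5 for w in task["content"]["weights"]]
    return make_toy_shareable(
        {"weights": updated},
        peer_props={"sender": "site-1"},
        cookie=task["header"]["cookie"],  # pass the cookie back unchanged
    )


task = make_toy_shareable(
    {"weights": [1.0, 2.0]},
    peer_props={"sender": "server"},
    cookie={"round": 3},
)
result = toy_client_train(task)
print(result["content"]["weights"], result["header"]["cookie"])
```

Because the cookie comes back unchanged, the server can match the reply to the round it belongs to without keeping that state in the payload.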


### 4. Filters: The Gatekeepers

**What are they?** [Filters](https://nvflare.readthedocs.io/en/main/programming_guide/filters.html) transform Shareable objects as they pass between communicating parties.

**Key capabilities:**
- Inspect and modify data before sending or after receiving
- Implement privacy-preserving techniques (e.g., differential privacy)
- Perform compression or encryption
- Validate data against security policies
- Log or monitor communication

**Why are they important?** Filters provide a modular way to add security, privacy, and efficiency features without changing the core communication logic.

**Example use cases:**
- Adding noise to model updates for differential privacy
- Compressing model weights to reduce bandwidth usage
- Encrypting sensitive information
- Validating that shared data complies with privacy policies

Here's how Filters fit into the Controller-Executor interaction:

<img src="controller_worker_flow.png" alt="Controller and executor with filters" width="700" height="400">

As shown in the diagram, Filters can be applied on both the sending and receiving sides, creating a flexible pipeline for data transformation.
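
A filter pipeline of this kind can be sketched as a list of functions applied in order. This is plain Python, not the NVFlare `Filter` class, and the helper names (`make_clip_filter`, `make_gaussian_noise_filter`, `apply_filters`) are assumptions. Note that clipping plus Gaussian noise only illustrates the shape of a privacy filter; a real differential privacy guarantee needs calibrated noise and privacy accounting.

```python
import random
from typing import Callable, Dict, List

# Toy filter chain for illustration; not the NVFlare Filter API.
ToyShareable = Dict[str, List[float]]
ToyFilter = Callable[[ToyShareable], ToyShareable]


def make_clip_filter(max_abs: float) -> ToyFilter:
    """Return a filter that clamps every weight into [-max_abs, max_abs]."""
    def clip(shareable: ToyShareable) -> ToyShareable:
        clipped = [max(-max_abs, min(max_abs, w)) for w in shareable["weights"]]
        return {**shareable, "weights": clipped}
    return clip


def make_gaussian_noise_filter(sigma: float, seed: int) -> ToyFilter:
    """Return a filter that adds zero-mean Gaussian noise to every weight."""
    rng = random.Random(seed)  # seeded so the example is repeatable

    def add_noise(shareable: ToyShareable) -> ToyShareable:
        noisy = [w + rng.gauss(0.0, sigma) for w in shareable["weights"]]
        return {**shareable, "weights": noisy}
    return add_noise


def apply_filters(shareable: ToyShareable, filters: List[ToyFilter]) -> ToyShareable:
    """Run the filters in order; each one receives the previous one's output."""
    for f in filters:
        shareable = f(shareable)
    return shareable


outgoing = {"weights": [5.0, -3.0, 0.5]}
clipped = apply_filters(outgoing, [make_clip_filter(1.0)])
noisy = apply_filters(
    outgoing,
    [make_clip_filter(1.0), make_gaussian_noise_filter(sigma=0.1, seed=0)],
)
print(clipped["weights"])
```

Each filter returns a new dictionary instead of modifying its input, so the sender's original object is untouched and filters can be reordered or removed independently.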

### 5. FLComponent: The Building Block

**What is it?** FLComponent is the fundamental building block of all components in NVIDIA FLARE.

**Key characteristics:**
- Base class for Controllers, Executors, Filters, and other system components
- Provides event support (can fire and receive events)
- Enables the event-driven, pluggable architecture of FLARE

**Why is it important?** The component-based design allows for modular system construction, where components can be mixed and matched to create different federated workflows.

### 6. FLContext: The Information Carrier

**What is it?** FLContext is a mechanism for passing data between FL components.

**Key capabilities:**
- Available to every method of all FLComponent types
- Allows components to access services provided by the infrastructure
- Enables data sharing between components, even across endpoints

**Technical implementation:** FLContext functions like a Python dictionary storing key-value pairs (called "properties" or "props"). Each property has two important attributes:
- **Visibility**: Whether the property is private (visible only to local components) or public (can also be shared with peers)
- **Stickiness**: Whether the property is sticky (it persists beyond the current context) or non-sticky (it is available only in the current context)

**Why is it important?** FLContext provides a standardized way for components to communicate and share state, making it easier to build complex workflows.
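
The two property attributes are easier to see in code. The sketch below is a toy model, not the NVFlare `FLContext` class. It assumes that visibility means private (local only) versus public (shareable with peers) and that stickiness means a property carries over into contexts created later; the class names and the `private`/`sticky` flags are illustrative choices.

```python
from typing import Any, Dict, Tuple

# Toy model of context properties; NOT the NVFlare FLContext implementation.


class ToyContextManager:
    """Creates contexts and remembers sticky props between them."""

    def __init__(self) -> None:
        self._sticky: Dict[str, Tuple[Any, bool]] = {}

    def new_context(self) -> "ToyContext":
        return ToyContext(self)


class ToyContext:
    def __init__(self, manager: ToyContextManager) -> None:
        self._manager = manager
        # Start with a copy of whatever earlier contexts marked as sticky.
        self._props: Dict[str, Tuple[Any, bool]] = dict(manager._sticky)

    def set_prop(self, key: str, value: Any, private: bool = True, sticky: bool = False) -> None:
        self._props[key] = (value, private)
        if sticky:
            self._manager._sticky[key] = (value, private)

    def get_prop(self, key: str, default: Any = None) -> Any:
        entry = self._props.get(key)
        return entry[0] if entry is not None else default

    def public_props(self) -> Dict[str, Any]:
        """Only public props would be eligible to travel to a peer."""
        return {k: v for k, (v, private) in self._props.items() if not private}


manager = ToyContextManager()

ctx1 = manager.new_context()
ctx1.set_prop("run_id", 7, private=False, sticky=True)     # public and sticky
ctx1.set_prop("scratch", "tmp", private=True, sticky=False)  # private, non-sticky

ctx2 = manager.new_context()
print(ctx2.get_prop("run_id"), ctx2.get_prop("scratch"))
```

In this model, the second context sees `run_id` because it was marked sticky, while `scratch` stays confined to the context that set it.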

### 7. Events: The Communication Mechanism

**What are they?** Events are signals that components can fire and respond to during the system's lifecycle.

**Types of events:**
- **Local Events**: Occur within a single system (client or server)
- **Fed Events**: Propagate from client to server

**Why are they important?** The event system enables loose coupling between components, allowing for more flexible and extensible system design.
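
Here is a minimal sketch of the loose coupling that local events provide. It is a toy publish/subscribe model in plain Python, not the NVFlare event system: `ToyEventBus`, `ToyComponent`, and the `"round_done"` event name are assumptions for illustration. The trainer fires an event without knowing which components, if any, are listening.

```python
from typing import Any, Dict, List, Optional

# Toy event system for illustration; not the NVFlare implementation.


class ToyEventBus:
    """Delivers every fired event to all registered components."""

    def __init__(self) -> None:
        self.components: List["ToyComponent"] = []

    def register(self, component: "ToyComponent") -> None:
        component.bus = self
        self.components.append(component)

    def fire(self, event_type: str, data: Dict[str, Any]) -> None:
        for component in self.components:
            component.handle_event(event_type, data)


class ToyComponent:
    """Base class: every component can fire events and receive them."""
    bus: Optional[ToyEventBus] = None

    def fire_event(self, event_type: str, data: Dict[str, Any]) -> None:
        if self.bus is None:
            raise RuntimeError("component is not registered with a bus")
        self.bus.fire(event_type, data)

    def handle_event(self, event_type: str, data: Dict[str, Any]) -> None:
        pass  # default: ignore events the component does not care about


class ToyTrainer(ToyComponent):
    def train_one_round(self) -> None:
        # Announce completion; the trainer does not know who is listening.
        self.fire_event("round_done", {"loss": 0.25})


class ToyMetricsLogger(ToyComponent):
    def __init__(self) -> None:
        self.losses: List[float] = []

    def handle_event(self, event_type: str, data: Dict[str, Any]) -> None:
        if event_type == "round_done":
            self.losses.append(data["loss"])


bus = ToyEventBus()
trainer, logger = ToyTrainer(), ToyMetricsLogger()
bus.register(trainer)
bus.register(logger)

trainer.train_one_round()
print(logger.losses)
```

Adding a second listener, for example one that saves a checkpoint on `"round_done"`, requires no change to the trainer.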

## Higher-Level Abstractions

While understanding the core concepts provides a solid foundation, NVIDIA FLARE also offers higher-level abstractions to simplify development for data scientists and ML practitioners.

### FLModel: A Data Scientist-Friendly Structure

**What is it?** FLModel is a standardized data structure designed specifically for federated learning applications.

**Key components:**
- **params_type**: Describes the type of parameters (e.g., FULL, DIFF)
- **params**: The actual model parameters (e.g., neural network weights)
- **optimizer_params**: Parameters for the optimization algorithm
- **metrics**: Evaluation metrics such as loss and scores
- **round information**: start_round, current_round, total_rounds
- **meta**: A metadata dictionary for any additional key-value pairs

**Behind the scenes:** FLModel is converted to and from Shareable objects automatically, abstracting away the lower-level details.

**Why is it important?** FLModel simplifies the interface between NVIDIA FLARE and external training systems, allowing data scientists to focus on their models rather than communication details.
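
The field list above maps naturally onto a dataclass. The sketch below is a simplified stand-in, not the real `FLModel` class: the default values, the conversion helpers (`to_toy_shareable`, `from_toy_shareable`), and `apply_update` are assumptions for illustration. It also shows why `params_type` matters, since FULL parameters replace the global model while DIFF parameters are added to it.

```python
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Any, Dict, Optional

# Simplified stand-in for illustration; NOT the real NVFlare FLModel.


class ToyParamsType(str, Enum):
    FULL = "FULL"  # params are complete model weights
    DIFF = "DIFF"  # params are differences relative to the global model


@dataclass
class ToyFLModel:
    params_type: ToyParamsType
    params: Dict[str, float]
    optimizer_params: Optional[Dict[str, Any]] = None
    metrics: Optional[Dict[str, float]] = None
    start_round: int = 0
    current_round: Optional[int] = None
    total_rounds: Optional[int] = None
    meta: Dict[str, Any] = field(default_factory=dict)


def to_toy_shareable(model: ToyFLModel) -> Dict[str, Any]:
    """Flatten the model into a dict-based message (toy format)."""
    content = asdict(model)
    content["params_type"] = model.params_type.value
    return {"header": {"return_code": "OK"}, "content": content}


def from_toy_shareable(shareable: Dict[str, Any]) -> ToyFLModel:
    """Rebuild the model from the toy message format."""
    content = dict(shareable["content"])
    content["params_type"] = ToyParamsType(content["params_type"])
    return ToyFLModel(**content)


def apply_update(global_params: Dict[str, float], update: ToyFLModel) -> Dict[str, float]:
    """FULL replaces the global params; DIFF is added to them key by key."""
    if update.params_type is ToyParamsType.FULL:
        return dict(update.params)
    return {k: global_params[k] + delta for k, delta in update.params.items()}


update = ToyFLModel(
    params_type=ToyParamsType.DIFF,
    params={"w": 0.5},
    metrics={"loss": 0.25},
    current_round=1,
    total_rounds=5,
)
restored = from_toy_shareable(to_toy_shareable(update))
new_global = apply_update({"w": 1.0}, restored)
print(new_global)
```

The training code only ever touches the dataclass; the conversion to and from the message format happens at the boundary, which is the separation of concerns this abstraction is meant to provide.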

## Putting It All Together

Now that we've explored the core concepts of NVIDIA FLARE's federated computing platform, let's summarize how they work together:

1. The **Controller** (typically on the server) orchestrates the federated workflow
2. It communicates with **Executors** (on clients) by sending and receiving **Shareable** objects
3. **Filters** can transform these Shareables to implement privacy, security, or efficiency features
4. All of these components are built on the **FLComponent** base class, enabling an event-driven architecture
5. Components communicate with each other through **FLContext** and **Events**
6. Higher-level abstractions like **FLModel** simplify development for specific use cases

In the next section, we'll explore the [system architecture](../03.1_federated_computing_architecture/system_architecture.ipynb) in more detail, seeing how these concepts are implemented in practice.

## Key Takeaways

- NVIDIA FLARE is a federated computing platform that brings computation to data, rather than centralizing data
- The Controller-Executor interaction forms the backbone of federated workflows
- Shareables standardize communication between components
- Filters provide a modular way to implement privacy, security, and efficiency features
- The component-based, event-driven architecture enables flexible system design
- Higher-level abstractions simplify development for specific use cases
