# FLEXible tutorial: Text classification using *Transformers*

FLEXible is a library for federating models. It offers tools to load and federate data, or to load already federated data, and tools to create a federated environment. The user must define the model and the *communication primitives* used to train the model in a federated environment. These primitives can be expressed as the following steps:
- initialization: Initialize the model in the server.
- deploy model: Deploy the model to the clients.
- training: Define the train function.
- collect the weights: Collect the weights of the clients' models to aggregate them later.
- aggregate the weights: Use an aggregation method to aggregate the collected weights.
- deploy model: Deploy the model with the updated weights to the clients.
- evaluate: Define the evaluate function.
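
To make the sequence of steps concrete, here is a minimal, library-free sketch of a few federated rounds on a toy one-parameter model. It does not use the FLEXible API: every name in it (`init_model`, `deploy`, `local_train`, `aggregate`, `evaluate`) is hypothetical and only mirrors the steps listed above.

```python
from copy import deepcopy

def init_model():
    # Initialization: the server owns a toy model with a single weight.
    return {"w": 0.0}

def deploy(server_model, n_clients):
    # Deployment: every client receives its own independent copy.
    return [deepcopy(server_model) for _ in range(n_clients)]

def local_train(model, data, lr=0.1, epochs=20):
    # Training: gradient descent on the mean squared error between w and the local data.
    for _ in range(epochs):
        grad = sum(2 * (model["w"] - x) for x in data) / len(data)
        model["w"] -= lr * grad
    return model

def aggregate(client_models):
    # Collect and aggregate: plain (unweighted) average of the client weights.
    return {"w": sum(m["w"] for m in client_models) / len(client_models)}

def evaluate(model, data):
    # Evaluation: mean squared error on held-out data.
    return sum((model["w"] - x) ** 2 for x in data) / len(data)

client_data = [[1.0, 1.2, 0.8], [3.0, 3.1, 2.9]]  # private data, never shared
server_model = init_model()
for round_idx in range(3):
    client_models = deploy(server_model, len(client_data))
    client_models = [local_train(m, d) for m, d in zip(client_models, client_data)]
    server_model = aggregate(client_models)  # redeployed at the start of the next round
    print(f"round {round_idx + 1}: w = {server_model['w']:.3f}")

print(f"held-out MSE: {evaluate(server_model, [2.0, 2.1, 1.9]):.4f}")
```

Only model weights cross the client boundary in this sketch; the raw lists in `client_data` are read exclusively by `local_train` on the owning client.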

In this notebook, we show how to implement these primitives and how to use FLEXible in order to federate a model using Hugging Face's *transformers* library. In this way, we will train a model using multiple clients, but without sharing any data between clients. We will follow this [tutorial](https://huggingface.co/docs/transformers/training) from the Hugging Face documentation for text classification.

## Setup

In [None]:
from copy import deepcopy
import numpy as np

from datasets.load import load_dataset
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

TRANSFORMER_MODEL = "distilbert-base-uncased"

## Download the IMDB dataset

The original tutorial uses the Yelp Reviews dataset, but FLEXible provides some Pluggable Datasets, which can be adapted directly to the FLEXible data structure. In this case, we will use the IMDB dataset, as it is commonly used as an example for text classification, and it is the one used in the FLEXible text classification tutorials.

In [None]:
imdb_splits = load_dataset('imdb', split=['train', 'test']) # Get the train and test splits from the Hugging Face Hub

We now show the structure of the dataset

And then select the train-test partition, so we can federate the train data between clients.

In [None]:
train_examples, test_examples = imdb_splits[0], imdb_splits[1]

# 1) From centralized data to federated data
Usually we would have to encapsulate our centralized dataset as numpy arrays in a Dataset, to split it for every federated client. As we have some Pluggable Datasets for *huggingface*, *torch* and *tensorflow*, we can directly create a configuration with a ``FedDatasetConfig`` object. In this case we want to split the data evenly between 2 clients, that is, an IID distribution.

To apply our config to the dataset, we use ``FedDataDistribution.from_config_with_huggingface_dataset``, which federates the dataset as configured. A more complete description of the configuration options that ``FedDatasetConfig`` offers to federate a dataset can be found in the documentation. It is also highly recommended to check whether the desired dataset is supported in the ``PluggableDatasets``, as it can then be loaded directly into FLEXible.
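
As a rough illustration of what an even split without replacement means, the sketch below shuffles sample indices and deals them out into disjoint, near-equal chunks. This is not FLEXible's implementation; `iid_split` is a hypothetical helper, and only the seed, the two client names, and the no-sharing requirement are taken from the configuration used in this notebook.

```python
import random

def iid_split(n_samples, client_names, seed=0):
    """Shuffle sample indices and deal them out in near-equal, disjoint chunks."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_clients = len(client_names)
    # Client i takes every n_clients-th shuffled index starting at position i,
    # so no index is ever assigned to two clients.
    return {name: sorted(indices[i::n_clients]) for i, name in enumerate(client_names)}

partition = iid_split(10, ["client1", "client2"])
for name, idx in partition.items():
    print(name, idx)
```

Because the indices are shuffled before being dealt out, each client receives a random sample of the whole dataset, which is what makes the distribution IID.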

In [None]:
from flex.data import FedDatasetConfig, FedDataDistribution

config = FedDatasetConfig(seed=0)
config.n_clients = 2
config.replacement = False # ensure that clients do not share any data
config.client_names = ['client1', 'client2'] # Optional
flex_dataset = FedDataDistribution.from_config_with_huggingface_dataset(data=train_examples, config=config, X_columns='text', label_columns='label')

# 2) Federating a model with FLEXible

Once we've federated the dataset, we have to create the FlexPool. The FlexPool class simulates a real-time scenario for federated learning, so it is in charge of the communications across the actors. The class FlexPool will assign to each actor a role (client, aggregator, server), so they can communicate during the training phase.

Please check the notebook about the actors (TODO: write the notebook on actors and their relationships) to learn more about the actors and their relationships in FLEXible.

To create a Pool of actors, we need to have a federated dataset, like we've just done, and the model to initialize in the server side, because the server will send the model to the clients so they can train the model. As we have the federated dataset (flex_dataset), we will now create the model.

In this case, we will use a pretrained model from the Hugging Face Hub, so we don't have to worry about coding it. We also consider a federated setup commonly known as a client-server architecture, where a server orchestrates the training of federated clients in every round.

In the following, we create a client server architecture and provide a function to initialize the server model.

In [None]:
from flex.model import FlexModel

from flex.pool.decorators import init_server_model
from flex.pool.decorators import deploy_server_model

In [None]:
@init_server_model
def define_model(*args):
    flex_model = FlexModel()
    tokenizer = AutoTokenizer.from_pretrained(TRANSFORMER_MODEL)
    model =  AutoModelForSequenceClassification.from_pretrained(TRANSFORMER_MODEL, num_labels=2)
    flex_model['model'] = model
    flex_model['tokenizer'] = tokenizer
    return flex_model

In [None]:
from flex.pool import FlexPool

flex_pool = FlexPool.client_server_architecture(fed_dataset=flex_dataset, init_func=define_model)

In [None]:
clients = flex_pool.clients
server = flex_pool.servers
aggregators = flex_pool.aggregators
print(f"Server node is identified by {server.actor_ids}")
print(f"Client nodes are identified by {clients.actor_ids}")
print(f"Aggregator nodes are identified by {aggregators.actor_ids}")

``@deploy_server_model`` is a decorator designed to copy the model from the server to the clients at each federated learning round. The function it decorates must have at least one argument, which is the FlexModel object that stores the model at the server.

In [None]:
import copy

@deploy_server_model
def copy_server_model_to_clients(server_flex_model: FlexModel):
    flex_model = FlexModel()
    for k, v in server_flex_model.items():
        flex_model[k] = copy.deepcopy(v)
    return flex_model

server.map(copy_server_model_to_clients, clients)
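
The deep copy in the deployment function is what keeps each client's model independent from the server's. The following sketch shows the difference with plain dictionaries standing in for a FlexModel; the nested structure and values are made up for illustration.

```python
import copy

# A stand-in for the server's FlexModel: a mapping holding nested, mutable objects.
server_model = {"model": {"weights": [0.1, 0.2]}, "tokenizer": "tok"}

shallow = dict(server_model)        # copies only the top-level mapping
deep = copy.deepcopy(server_model)  # copies the nested objects too

shallow["model"]["weights"][0] = 9.9  # leaks into the server model
deep["model"]["weights"][1] = 7.7     # stays local to the deep copy

print("server:", server_model["model"]["weights"])
print("deep copy:", deep["model"]["weights"])
```

With a shallow copy, a client training step would silently modify the server's weights; with a deep copy, the server model only changes when aggregated weights are explicitly set on it.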

Surprisingly, there is no decorator for the training process, as it can be implemented directly. As we are using PyTorch, we have to create the PyTorch dataset that will be fed into the model.

In [None]:
# Create custom class
from torch.utils.data import Dataset as TorchDataset

class IMDbDataset(TorchDataset):
    def __init__(self, encodings, labels) -> None:
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)

As we are using the *transformers* library, we could use the **Trainer** class to automate the training process. If you prefer to write your own training function for the model, you have to create your own training loop. The cell below uses a classic PyTorch training loop.

In [None]:
# Using Trainer or native PyTorch.

from flex.data import Dataset
from torch.utils.data import DataLoader
from torch.optim import AdamW


def tokenize_function(texts, tokenizer):
    return tokenizer(texts, padding="max_length", truncation=True)

def train_loop(model, train_dataloader, num_epochs, device, optimizer):
    model.train()  # from_pretrained returns the model in eval mode, so switch to training mode
    for _ in range(num_epochs):
        for batch in train_dataloader:
            optimizer.zero_grad()
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels'].to(device)
            outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
            loss = outputs[0]
            loss.backward()
            optimizer.step()

def train_native_pt(client_flex_model: FlexModel, client_data: Dataset, tokenize_func, train_loop_func):
    X_data = tokenize_func(client_data.X_data.tolist(), client_flex_model['tokenizer'])
    imdb_dataset = IMDbDataset(X_data, client_data.y_data)
    imdb_loader = DataLoader(imdb_dataset, shuffle=True, batch_size=16)
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
    # Move the model to the target device before building the optimizer,
    # so the model and the batches end up on the same device.
    model = client_flex_model['model'].to(device)
    optimizer = AdamW(model.parameters(), lr=5e-5)
    num_epochs = 3
    train_loop_func(model, imdb_loader, num_epochs, device, optimizer)

clients.map(train_native_pt, tokenize_func=tokenize_function, train_loop_func=train_loop)

Once the model is trained, we have to collect the weights from the clients so we can aggregate them. FLEXible provides a primitive to collect those weights for a neural network. You can use this function, or you can create your own. FLEXible also provides a function to set the aggregated weights for PyTorch models.

In [None]:
from flex.pool.primitives import collect_clients_weights_pt, set_aggregated_weights_pt

In [None]:
aggregators.map(collect_clients_weights_pt, clients)

For the aggregation it is possible to implement your own aggregation function with the *aggregate_weights* decorator, or we can use the aggregators that are already implemented in FLEXible, such as FedAvg.

In [None]:
from flex.pool.aggregators import fed_avg

aggregators.map(fed_avg)
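
For intuition, federated averaging combines the clients' parameters element by element. The sketch below uses plain Python lists in place of tensors and is not FLEXible's `fed_avg`; `fed_avg_sketch` and the parameter names are hypothetical. With no sample counts it computes a plain mean, and with sample counts it computes the sample-weighted mean from the original FedAvg formulation. Which of the two FLEXible's `fed_avg` applies is not stated in this notebook.

```python
def fed_avg_sketch(client_states, n_samples=None):
    """Average per-parameter values across clients, optionally weighted by sample count."""
    if n_samples is None:
        n_samples = [1] * len(client_states)  # equal weights give a plain mean
    total = sum(n_samples)
    averaged = {}
    for name in client_states[0]:
        n_values = len(client_states[0][name])
        averaged[name] = [
            sum(state[name][i] * n for state, n in zip(client_states, n_samples)) / total
            for i in range(n_values)
        ]
    return averaged

# Toy "state dicts" from two clients (values are made up).
client1 = {"classifier.weight": [1.0, 2.0], "classifier.bias": [0.0]}
client2 = {"classifier.weight": [3.0, 6.0], "classifier.bias": [1.0]}

print(fed_avg_sketch([client1, client2]))                    # plain mean
print(fed_avg_sketch([client1, client2], n_samples=[1, 3]))  # client2 counts three times as much
```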

After aggregating the weights, we use ``set_aggregated_weights_pt``, which sets the aggregated weights on the server model.

In [None]:
aggregators.map(set_aggregated_weights_pt, server)

After training the model and setting the weights onto the server model, we can just evaluate it using the *evaluate_server_model* decorator. This decorator is created to test the server model using external data.

In [None]:
from flex.pool import evaluate_server_model
import evaluate

def eval_loop(model, eval_dataloader, device, metric):
    model.to(device)  # keep the model on the same device as the batches
    model.eval()
    for batch in eval_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        with torch.no_grad():
            outputs = model(**batch)

        logits = outputs.logits
        predictions = torch.argmax(logits, dim=-1)
        metric.add_batch(predictions=predictions, references=batch["labels"])
    print(f"Results: {metric.compute()}")

@evaluate_server_model
def evaluate_global_model(server_flex_model: FlexModel, test_data=None, tokenize_func=None, eval_func=None):
    X_data = tokenize_func(test_data.X_data.tolist()[:100], server_flex_model['tokenizer']) # Using subset, for testing purposes.
    imdb_dataset = IMDbDataset(X_data, test_data.y_data[:100]) # Using subset, for testing purposes.
    metric = evaluate.load("accuracy")
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
    imdb_dataloader = DataLoader(imdb_dataset, batch_size=1, shuffle=False)
    eval_func(server_flex_model['model'], imdb_dataloader, device, metric)

test_data = Dataset.from_huggingface_dataset(test_examples, X_columns='text', label_columns='label')
server.map(evaluate_global_model, test_data=test_data, tokenize_func=tokenize_function, eval_func=eval_loop)
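
The accuracy reported by the evaluation loop comes down to taking the arg-max of each row of logits and comparing it with the reference label. A minimal sketch of that computation with plain lists follows; the logits and labels are made-up values, and `argmax` and `accuracy` are hypothetical helpers rather than part of the `evaluate` library.

```python
def argmax(values):
    # Index of the largest value, i.e. the predicted class for one example.
    return max(range(len(values)), key=values.__getitem__)

def accuracy(logits_batch, labels):
    # Fraction of examples whose predicted class matches the reference label.
    predictions = [argmax(logits) for logits in logits_batch]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)

logits = [[2.1, -0.3], [0.2, 1.5], [-1.0, 0.4], [0.9, 0.1]]  # one row per review, one column per class
labels = [0, 1, 0, 0]
print(accuracy(logits, labels))
```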

### Run the federated learning experiment for a few rounds

Now, we can summarize the steps provided above and run the federated experiment for multiple rounds:

In [None]:
def train_n_rounds(n_rounds, clients_per_round=2):
    pool = FlexPool.client_server_architecture(fed_dataset=flex_dataset, init_func=define_model)
    for i in range(n_rounds):
        print(f"\nRunning round: {i+1} of {n_rounds}")
        node_dropout = 1-(clients_per_round/len(pool.clients))
        selected_clients_pool = pool.clients.filter(node_dropout=node_dropout)
        selected_clients = selected_clients_pool.clients
        print(f"Selected clients for this round: {len(selected_clients)}")
        # Deploy the server model to the selected clients
        pool.servers.map(copy_server_model_to_clients, selected_clients)
        # Each selected client trains its model
        selected_clients.map(train_native_pt, tokenize_func=tokenize_function, train_loop_func=train_loop)
        # The aggregator collects weights from the selected clients and aggregates them
        pool.aggregators.map(collect_clients_weights_pt, selected_clients)
        # Apply FedAvg aggregator
        pool.aggregators.map(fed_avg)
        # The aggregator sends its aggregated weights to the server
        pool.aggregators.map(set_aggregated_weights_pt, pool.servers)
        pool.servers.map(evaluate_global_model, test_data=test_data, tokenize_func=tokenize_function, eval_func=eval_loop)

In [None]:
train_n_rounds(n_rounds=5)
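
The client sampling inside `train_n_rounds` relies on a small piece of arithmetic: `node_dropout` is the fraction of clients left out of a round. The sketch below reproduces that computation with the standard library. It assumes that a dropout of `d` keeps roughly `(1 - d) * n` of the `n` clients, which is an interpretation of the `filter` call and not something confirmed here; `select_clients` is a hypothetical helper.

```python
import random

def select_clients(client_ids, clients_per_round, seed=None):
    """Return the dropout fraction and a random subset of clients for one round."""
    # Fraction of clients left out of the round, as computed in train_n_rounds.
    node_dropout = 1 - clients_per_round / len(client_ids)
    # Assumption: a dropout of d keeps round((1 - d) * n) of the n clients.
    n_keep = round((1 - node_dropout) * len(client_ids))
    rng = random.Random(seed)
    return node_dropout, rng.sample(client_ids, n_keep)

client_ids = ["client1", "client2", "client3", "client4"]
node_dropout, selected = select_clients(client_ids, clients_per_round=2, seed=0)
print(node_dropout, selected)
```

With the two clients used in this notebook and `clients_per_round=2`, the dropout is zero and every client takes part in every round.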

### END
Congratulations, now you know how to train a model using FLEXible for multiple rounds using the *HuggingFace* ecosystem with PyTorch as the deep learning framework. Remember that it's important to first deploy/initialize the model on the clients, so you can run the rounds without problems!