# Example of using FLEXible to train an LSTM for Text Generation with a custom Dataset

## 1) Federate Dataset

In this first section we're going to federate our dataset. For this tutorial, we use the Reddit clean jokes dataset to train the network.

In [1]:
from copy import deepcopy
from collections import Counter
import pandas as pd
import numpy as np

import torch
from torch import nn, optim
from torch.utils.data import DataLoader, Dataset

from flex.data import FlexDataset, FlexDataObject, FlexDatasetConfig, FlexDataDistribution
from flex.pool import FlexPool

from utils import print_function, add_torch_dataset_to_client

First we're going to load our data from the CSV.

In [2]:
# Load the data
data = pd.read_csv('reddit-cleanjokes.csv')

In [3]:
data.head()

Unnamed: 0,ID,Joke
0,1,What did the bartender say to the jumper cable...
1,2,Don't you hate jokes about German sausage? The...
2,3,Two artists had an art contest... It ended in ...
3,4,Why did the chicken cross the playground? To g...
4,5,What gun do you use to hunt a moose? A moosecut!


In [4]:
X = data['Joke'].to_numpy()

Before federating the dataset, we need to create the vocabulary. The vocabulary will be global across clients, since we can assume that all clients use the same pre-trained embeddings, such as GloVe or FastText. To use those embeddings we will rely on the torchtext library.

In [5]:
import torchtext
from torchtext.data.utils import get_tokenizer
from torchtext.vocab.vectors import FastText
from torchtext.vocab import build_vocab_from_iterator

In [6]:
fasttext = FastText(language="en")

In [7]:
tokenizer = get_tokenizer('basic_english')
def yield_sentences(text):
    for sentence in text:
        yield tokenizer(sentence)

In [8]:
vocab = build_vocab_from_iterator(iterator=yield_sentences(X), min_freq=2)
vocab.append_token('UNK')
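
To make the effect of `min_freq=2` and the `UNK` fallback concrete, the sketch below rebuilds the same idea with only the standard library. It is not torchtext's implementation: `build_toy_vocab` and `encode` are hypothetical helper names, a plain whitespace split stands in for the `basic_english` tokenizer, and the id ordering is an arbitrary choice made for this example.

```python
from collections import Counter


def build_toy_vocab(sentences, min_freq=2, unk_token="UNK"):
    """Map every token seen at least `min_freq` times to an integer id."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    # Sort by descending frequency, then alphabetically, for a stable order
    kept = sorted((t for t, c in counts.items() if c >= min_freq),
                  key=lambda t: (-counts[t], t))
    stoi = {tok: idx for idx, tok in enumerate(kept)}
    stoi[unk_token] = len(stoi)  # appended last, like vocab.append_token('UNK')
    return stoi


def encode(sentence, stoi, unk_token="UNK"):
    """Replace each token by its id, falling back to the UNK id."""
    return [stoi.get(tok, stoi[unk_token]) for tok in sentence.lower().split()]


toy_stoi = build_toy_vocab(["why did the chicken cross", "why did the cow moo"])
toy_encoded = encode("why did the duck cross", toy_stoi)
```

Words that appear only once ("chicken", "cross", "cow", "moo") are dropped from the toy vocabulary, so both "duck" and "cross" end up encoded with the `UNK` id.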

After creating the vocab, we're going to load the FastText embeddings.

In [9]:
ret = fasttext.get_vecs_by_tokens(["Hi", "how", "are", "you"], lower_case_backup=True)
# ret = fasttext.get_vecs_by_tokens("Hi, how are you", lower_case_backup=True)
fasttext.vectors

tensor([[-2.3167e-02, -4.2483e-03, -1.0572e-01,  ...,  8.9398e-02,
         -1.5900e-02,  1.4866e-01],
        [-1.1112e-01, -1.3859e-03, -1.7780e-01,  ...,  6.3374e-02,
         -1.2161e-01,  3.9339e-02],
        [-6.5334e-02, -9.3031e-02, -1.7571e-02,  ...,  1.6642e-01,
         -1.3079e-01,  3.5397e-02],
        ...,
        [-1.3854e-02, -3.6073e-01,  2.3041e-01,  ...,  5.6179e-01,
          1.9784e+00, -2.2832e-01],
        [ 7.0447e-02, -1.4072e-01, -6.5624e-01,  ...,  9.5864e-02,
          3.6397e-01,  3.0378e-02],
        [-2.3073e-01,  6.0873e-02, -1.9777e-01,  ...,  5.2283e-01,
         -1.1123e-01,  2.1186e-01]])

Now, we create the Dataset class that each client will use to train the model.

In [10]:
class TextDataset(Dataset):
    def __init__(self, text, tokenizer, itos, stoi, sequence_len):
        text = ' '.join([sentence[0] for sentence in text])
        # self.text = ' '.join([tokenizer(sentence[0]) for sentence in text]))
        self.text = tokenizer(text)
        self.tokenizer = tokenizer
        self.vocab_itos = itos
        self.vocab_stoi = stoi
        self.sequence_length = sequence_len
        # self.words_indexes = [[stoi.get(word, stoi['UNK']) for word in sentence] for sentence in text]
        self.words_indexes = [stoi.get(word, stoi.get('UNK')) for word in self.text]

    def __len__(self):
        # return len(self.text)
        return len(self.words_indexes) - self.sequence_length

    def __getitem__(self, index):
        return (
            torch.tensor(self.words_indexes[index:index+self.sequence_length]),
            torch.tensor(self.words_indexes[index+1:index+self.sequence_length+1]),
        )
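
The indexing in `__len__` and `__getitem__` is easier to follow on a toy example. The sketch below reproduces the same slicing on a plain list of token ids; `sliding_windows` is a hypothetical helper written for this illustration, and the ids are made up.

```python
def sliding_windows(token_ids, sequence_length):
    """Return (input, target) pairs; the target is the input shifted by one."""
    pairs = []
    # Same count as TextDataset.__len__: len(words_indexes) - sequence_length
    for i in range(len(token_ids) - sequence_length):
        x = token_ids[i:i + sequence_length]
        y = token_ids[i + 1:i + sequence_length + 1]
        pairs.append((x, y))
    return pairs


example_ids = [10, 11, 12, 13, 14, 15]
pairs = sliding_windows(example_ids, sequence_length=4)
```

With six tokens and a window of four there are two samples, and each target window is the corresponding input window moved one token to the right, which is what lets the network learn to predict the next word.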

In [11]:
vocab.get_stoi().get('hearts', vocab.get_stoi().get('UNK'))

1884

--------------------------------------------------
Stop here until the bug in FlexDataDistribution.from_config() is solved.

Bug: The dataset can't be federated if the array has only one feature.

Temporary workaround: In flex_data_distribution, in the function __sample_with_weights, I added at line 177:
sub_features_indices = None # slice(
        #     None
        # )  # Default slice for features, it includes all the features

Remove those 3 lines to go back to normal.

--------------------------------------------------

In [12]:
cdata = FlexDataObject(X)
config = FlexDatasetConfig(seed=0)
config.n_clients = 2
config.replacement = False # ensure that clients do not share any data
config.client_names = ['client1', 'client2']
# config.weights = [0.2] * config.n_clients # each client has only 20% of its assigned class
config.weights = None
fld = FlexDataDistribution.from_config(cdata=cdata, config=config)
#fld = FlexDataDistribution.iid_distribution(cdata=cdata)
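
As an illustration of what `replacement = False` is meant to guarantee, the following sketch splits a list of samples between two clients so that no sample is shared. It is not the algorithm used by `FlexDataDistribution`: `split_without_replacement` is a hypothetical helper, and the near-equal share sizes are an assumption of this example.

```python
import random


def split_without_replacement(samples, client_names, seed=0):
    """Shuffle the samples and deal them out so no sample is shared."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    # Client k takes every len(client_names)-th shuffled index, starting at k
    return {
        name: [samples[i] for i in indices[k::len(client_names)]]
        for k, name in enumerate(client_names)
    }


toy_jokes = [f"joke {i}" for i in range(7)]
parts = split_without_replacement(toy_jokes, ["client1", "client2"])
```

Every joke ends up with exactly one client, which mirrors the setting where each participant only ever sees its own local data.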

To give each client a Dataset, we use the map function from the FlexDataset class. The map function attaches a Dataset to each client internally, so we can train the model for each client with its own data.

The map function receives a function to apply to each client, so we now create the function that we want to apply.

In [13]:
def add_torch_dataset_to_client_2(client, *args, **kwargs):
    """Function to create a dataset for each client. We keep the 
    X_data property as we don't want to change the raw text, but
    it should be changed for less memory usage.

    Args:
        client (FlexDataObject): Client to create a TextDataset

    Returns:
        FlexDataObject: Client with a TextDataset in its data
    """
    new_client = deepcopy(client)
    new_client_dataset = TextDataset(new_client.X_data, kwargs['tokenizer'], 
                                        kwargs['itos'], kwargs['stoi'], kwargs['sequence_len'])
    new_client.dataset = new_client_dataset

    return new_client

In [14]:
client = fld['client1']

In [15]:
list(client.X_data)[0][0]

'What do beef hearts smell like? Honey.'

In [16]:
new_client = add_torch_dataset_to_client_2(fld['client1'], tokenizer=tokenizer, itos=vocab.get_itos(), stoi=vocab.get_stoi(), sequence_len=4)

In [17]:
new_client.dataset

<__main__.TextDataset at 0x29009d520>

In [18]:
new_fld = fld.map(num_proc=2, func=add_torch_dataset_to_client, tokenizer=tokenizer,
                                                                itos=vocab.get_itos(),
                                                                stoi=vocab.get_stoi(),
                                                                sequence_len=4)

In [19]:
new_fld

{'client1': FlexDataObject(X_data=array([['What do beef hearts smell like? Honey.'],
       ["Why did Trump insist on Hillary Clinton as Secretary of state? He doesn't believe women should get above secretary"],
       ['Why did the Buddhist monk refuse Novocain? Because he wanted to transcend dental medication.'],
       ['What di you call a snowman in may? A puddle!'],
       ["My buddy said he'd give his right arm to be ambidextrous I can only admire such dedication."],
       ['Why did Woodrow Wilson take a long time to turn around? Because he could only make 14 point turns.'],
       ["What's an oven's favorite comedy routine? Deadpan."],
       ['I made half a cup of tea the other day... It was so nice I had two.'],
       ["What's a baker's biggest fear? Something going a-rye while they're raisin' bread."],
       ['What is invisible and smells like carrots? Bunny Farts.'],
       ['I was addicted to the hokey pokey but I turned myself around.'],
       ['Difference between a de

In [20]:
new_federated_dataset = FlexDataset({
    client: add_torch_dataset_to_client(fld[client], tokenizer=tokenizer, itos=vocab.get_itos(),
                                stoi=vocab.get_stoi(), sequence_len=100) 
    for client in fld
})

## 2) Create the architecture


Once we've federated our dataset, it's time to create the federated environment. In this case, we will use the FlexPool class to create the actors. The FlexPool class simulates a real-time scenario for federated learning, so we have to create each actor and its role during the creation and training of the model.

To initialize the pool of actors we need a federated dataset, in our case the *new_fld* variable we've created. We can use the constructor or one of the provided functions that create a fixed architecture. In this tutorial we will use a client-server architecture, so we will use the function client_server_architecture from the FlexPool class.

In [21]:
pool = FlexPool.client_server_architecture(fed_dataset=new_fld)

Now we have created a pool of actors that is composed of:
- Clients: The clients have the client-role and can access the data of the FlexDataset if they have the same ID.
- Server-aggregator: The client-server architecture adds a new actor that has the role of server, so it can orchestrate the training phase, and the role of aggregator, so it can aggregate the weights.

The pool of actors has some communication restrictions, as indicated in the documentation, so to make it easy to understand how the pool works, we can separate actors in different pools based on the role. In our case, we can get two subpools, one with the clients, and one with the server (that also acts as aggregator).

In [22]:
clients = pool.clients
server = pool.servers
# Let's take a look at the two pools we've just created.
print(f"Pool of clients: {clients._actors}")
print(f"Pool of server-aggregator: {server._actors}")

Pool of clients: {'client1': <FlexRole.client: 1>, 'client2': <FlexRole.client: 1>}
Pool of server-aggregator: {'server_5171426240': <FlexRole.server_aggregator: 6>}


As shown in the cell above, the server has two roles, server and aggregator, so it can act as an aggregator too.

Now that we have the pools with the actors, we can start the training phase.

## 3) Set up the training round

The training round has four steps:
- Initialize/Deploy model
- Train model
- Aggregate weights
- Evaluate model

These functions aren't available in the FlexPool class, as they will be different for each model, so the user must create the functions and apply them to the actors using the *map_procedure* function from FlexPool.

### 3.1 Init/deploy models

The first step of the training phase is to deploy the model across the clients that will train it. In this example we only have two clients, so we are going to use both of them to train the model.

Once we've federated our dataset, it's time to create the model to train. Here we will use a simple LSTM model.

The model will have an embedding layer built from its own vocab.

In [23]:
class Model(nn.Module):
    def __init__(self, n_vocab):
        super(Model, self).__init__()
        self.lstm_size = 128
        self.embedding_dim = 128
        self.num_layers = 3

        # n_vocab = len(dataset.uniq_words)
        self.embedding = nn.Embedding(
            num_embeddings=n_vocab,
            embedding_dim=self.embedding_dim,
        )
        self.lstm = nn.LSTM(
            input_size=self.embedding_dim,  # the LSTM consumes the embedding vectors
            hidden_size=self.lstm_size,
            num_layers=self.num_layers,
            dropout=0.2,
        )
        self.fc = nn.Linear(self.lstm_size, n_vocab)

    def forward(self, x, prev_state):
        embed = self.embedding(x)
        output, state = self.lstm(embed, prev_state)
        logits = self.fc(output)
        return logits, state

    def init_state(self, sequence_length):
        return (torch.zeros(self.num_layers, sequence_length, self.lstm_size),
                torch.zeros(self.num_layers, sequence_length, self.lstm_size))
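
It may look odd that `init_state` is sized with `sequence_length` rather than with the batch size. The reason is that `nn.LSTM` defaults to `batch_first=False`, so it reads dimension 0 of its input as time and dimension 1 as the batch, while a DataLoader yields tensors shaped `(batch_size, sequence_length)`. The shape walk-through below is a sketch with arbitrary toy sizes, not part of the tutorial's model.

```python
import torch
from torch import nn

batch_size, sequence_length, n_vocab = 8, 4, 50
embedding = nn.Embedding(num_embeddings=n_vocab, embedding_dim=16)
lstm = nn.LSTM(input_size=16, hidden_size=32, num_layers=3)

# A batch as a DataLoader would deliver it: one row per window
x = torch.randint(0, n_vocab, (batch_size, sequence_length))
embed = embedding(x)  # (batch_size, sequence_length, 16)

# With batch_first=False, dim 0 is read as time and dim 1 as the batch,
# so the state needs sequence_length in its middle dimension.
state = (torch.zeros(3, sequence_length, 32),
         torch.zeros(3, sequence_length, 32))
output, (h_n, c_n) = lstm(embed, state)
```

An alternative design would be to build the LSTM with `batch_first=True` and size the state with the batch size instead; the tutorial keeps the default layout.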

Once the model is defined, we have to define the function that will initialize the model for each client that participates in the training phase. After defining this function, we can use the function *map_procedure* from FlexPool to apply it from the server pool to the clients pool.

In [24]:
server._models = {serv: Model(n_vocab=len(vocab)) for serv in server._actors}

In [25]:
def initialize_model(server_model, clients_models, *args, **kwargs):
    for client_model in clients_models:
        clients_models[client_model] = deepcopy(server_model)

In [26]:
server.map_procedure(initialize_model, clients) # As we have only one server, we take the first model only

[None]

In [27]:
clients._models

defaultdict(None,
            {'client1': Model(
               (embedding): Embedding(1885, 128)
               (lstm): LSTM(128, 128, num_layers=3, dropout=0.2)
               (fc): Linear(in_features=128, out_features=1885, bias=True)
             ),
             'client2': Model(
               (embedding): Embedding(1885, 128)
               (lstm): LSTM(128, 128, num_layers=3, dropout=0.2)
               (fc): Linear(in_features=128, out_features=1885, bias=True)
             )})

Now we have initialized the model for each client, and we have to train it. To do so, we first prepare the train function.

In [32]:
def train(data, model, *args, **kwargs):
    # Create the torch dataset for the client
    client_dataset = TextDataset(data.X_data, kwargs['tokenizer'], 
                                kwargs['itos'], kwargs['stoi'], kwargs['sequence_len'])
    # Set the model to train
    model.train()
    # Create the DataLoader, loss function and optimizer
    dataloader = DataLoader(client_dataset, batch_size=kwargs['batch_size'])
    criterion = nn.CrossEntropyLoss() # kwargs['criterion']
    optimizer = optim.Adam(model.parameters(), lr=0.001) # kwargs['optimizer']
    # Loop the epochs to train the model
    for epoch in range(kwargs['epochs']):
        # Get the initial state, sized from the dataset built above
        state_h, state_c = model.init_state(client_dataset.sequence_length)
        # Iterate across the batches
        for batch, (x, y) in enumerate(dataloader):
            optimizer.zero_grad()
            y_pred, (state_h, state_c) = model(x, (state_h, state_c))
            loss = criterion(y_pred.transpose(1,2), y)

            state_h = state_h.detach()
            state_c = state_c.detach()

            loss.backward()
            optimizer.step()

            print({ 'epoch': epoch, 'batch': batch, 'loss': loss.item() })
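
The `transpose(1, 2)` in the loss computation is needed because `nn.CrossEntropyLoss` expects the class scores in dimension 1, that is, logits shaped `(N, C, d1)` against targets shaped `(N, d1)`. The sketch below uses random toy tensors (the sizes are arbitrary) to check that this gives the same value as flattening every predicted token into its own row.

```python
import torch
from torch import nn

torch.manual_seed(0)
n_windows, sequence_length, n_vocab = 5, 4, 7
y_pred = torch.randn(n_windows, sequence_length, n_vocab)  # model output layout
y = torch.randint(0, n_vocab, (n_windows, sequence_length))

criterion = nn.CrossEntropyLoss()
# Class scores must sit in dimension 1: (N, C, d1) against targets (N, d1)
loss_transposed = criterion(y_pred.transpose(1, 2), y)
# Equivalent view: one row of scores per predicted token
loss_flat = criterion(y_pred.reshape(-1, n_vocab), y.reshape(-1))
```

Both forms average the loss over every predicted token, so either can be used; the transposed form avoids reshaping the targets.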

In [33]:
clients.map_procedure(train, batch_size=256, epochs=10, tokenizer=tokenizer, itos=vocab.get_itos(), stoi=vocab.get_stoi(), sequence_len=4)

{'epoch': 0, 'batch': 0, 'loss': 6.27843713760376}
{'epoch': 0, 'batch': 1, 'loss': 5.87345027923584}
{'epoch': 0, 'batch': 2, 'loss': 5.7857160568237305}
{'epoch': 0, 'batch': 3, 'loss': 5.910329818725586}
{'epoch': 0, 'batch': 4, 'loss': 5.6100287437438965}
{'epoch': 0, 'batch': 5, 'loss': 5.764122486114502}
{'epoch': 0, 'batch': 6, 'loss': 5.625349521636963}
{'epoch': 0, 'batch': 7, 'loss': 5.492345333099365}
{'epoch': 0, 'batch': 8, 'loss': 5.446799278259277}
{'epoch': 0, 'batch': 9, 'loss': 5.46243143081665}
{'epoch': 0, 'batch': 10, 'loss': 5.273279190063477}
{'epoch': 0, 'batch': 11, 'loss': 5.608667373657227}
{'epoch': 0, 'batch': 12, 'loss': 5.499441146850586}
{'epoch': 0, 'batch': 13, 'loss': 5.620606899261475}
{'epoch': 0, 'batch': 14, 'loss': 5.681468486785889}
{'epoch': 0, 'batch': 15, 'loss': 5.697798728942871}
{'epoch': 0, 'batch': 16, 'loss': 5.394366264343262}
{'epoch': 0, 'batch': 17, 'loss': 5.517154216766357}
{'epoch': 0, 'batch': 18, 'loss': 5.5854058265686035}
{'e

[None, None]

Now we have trained all the models available for training, so it's time to aggregate them. We have to create a function and then apply it with *map_procedure* from FlexPool. In this case, the source pool will be the clients, and the destination pool the aggregator.

To get the weights of a model in PyTorch, we can use model.parameters(), or model.named_parameters() if we want the layers' names too.

In [None]:
def aggregate(orig_models, dst_model, *args, **kwargs):
    """Function that aggregates the weights

    Args:
        orig_models (nn.Module): Original trained models
        dst_model (nn.Module): Destination model (aggregator model)
    """
    pass
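
The body of `aggregate` is left empty in this notebook. As one possible direction, the sketch below averages the weights of several PyTorch models through their `state_dict`. It is an assumption-laden illustration, not FLEXible's aggregation: `average_state_dicts` is a hypothetical helper, it is not wired into `map_procedure`, it uses a plain unweighted mean (federated averaging usually weights each client by its number of samples), and the toy `nn.Linear` models stand in for the LSTM.

```python
from collections import OrderedDict

import torch
from torch import nn


def average_state_dicts(state_dicts):
    """Return the element-wise mean of several state_dicts with equal keys."""
    averaged = OrderedDict()
    for key in state_dicts[0]:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        averaged[key] = stacked.mean(dim=0)
    return averaged


# Two toy "client" models with constant weights, and one "aggregator" model
client_a, client_b, aggregator = nn.Linear(3, 2), nn.Linear(3, 2), nn.Linear(3, 2)
with torch.no_grad():
    for p in client_a.parameters():
        p.fill_(1.0)
    for p in client_b.parameters():
        p.fill_(3.0)

# Load the averaged weights into the aggregator model
aggregator.load_state_dict(
    average_state_dicts([client_a.state_dict(), client_b.state_dict()])
)
```

After loading, every parameter of the toy aggregator equals 2.0, the mean of the two clients' constant weights.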