
## Useful imports

In [0]:
import numpy as np
from sklearn.metrics import roc_auc_score

import torch
from torch import nn, optim
from torch.utils.data import TensorDataset, DataLoader

import warnings

warnings.filterwarnings("ignore")

In [0]:
inputs = np.load('data/inputs.npy')
labels = np.load('data/labels.npy')

In [11]:
print(inputs[:2])
print(labels[:2])

[[   0    0    0    0    0    0    0    0    0    0 4202 6580  241 4826
  7525 1431 3677 5707 8491 3171 7978 4060 6249 3616 4345  771 4445 7931
  1315 5303]
 [   0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0 4332 3155 7400  210
  4718 8098]]
[0 0]


In [0]:
VOCAB_SIZE = int(inputs.max()) + 1

In [0]:
# Training params
EPOCHS = 15
CLIP = 5 # gradient clipping - to avoid gradient explosion (frequent in RNNs)
lr = 0.1
BATCH_SIZE = 32

# Model params
EMBEDDING_DIM = 50
HIDDEN_DIM = 10
DROPOUT = 0.2

In [14]:
import syft as sy

In [0]:
labels = torch.tensor(labels)
inputs = torch.tensor(inputs)

# splitting training and test data
pct_test = 0.2

train_labels = labels[:-int(len(labels)*pct_test)]
train_inputs = inputs[:-int(len(labels)*pct_test)]

test_labels = labels[-int(len(labels)*pct_test):]
test_inputs = inputs[-int(len(labels)*pct_test):]

In [0]:
# Hook that extends the PyTorch library so that computations can run on pointers to tensors sent to other workers
hook = sy.TorchHook(torch)

# Creating 2 virtual workers
bob = sy.VirtualWorker(hook, id="bob")
anne = sy.VirtualWorker(hook, id="anne")

# threshold indexes for dataset split (one half for Bob, other half for Anne)
train_idx = int(len(train_labels)/2)
test_idx = int(len(test_labels)/2)

# Sending toy datasets to virtual workers
bob_train_dataset = sy.BaseDataset(train_inputs[:train_idx], train_labels[:train_idx]).send(bob)
anne_train_dataset = sy.BaseDataset(train_inputs[train_idx:], train_labels[train_idx:]).send(anne)
bob_test_dataset = sy.BaseDataset(test_inputs[:test_idx], test_labels[:test_idx]).send(bob)
anne_test_dataset = sy.BaseDataset(test_inputs[test_idx:], test_labels[test_idx:]).send(anne)

# Creating federated datasets, an extension of the PyTorch TensorDataset class
federated_train_dataset = sy.FederatedDataset([bob_train_dataset, anne_train_dataset])
federated_test_dataset = sy.FederatedDataset([bob_test_dataset, anne_test_dataset])

# Creating federated dataloaders, an extension of the PyTorch DataLoader class
federated_train_loader = sy.FederatedDataLoader(federated_train_dataset, shuffle=True, batch_size=BATCH_SIZE)
federated_test_loader = sy.FederatedDataLoader(federated_test_dataset, shuffle=False, batch_size=BATCH_SIZE)

### Creating simple GRU (1-layer) model with sigmoid activation for classification task

Unfortunately, the current version of PySyft does not yet support the RNN modules of PyTorch. However, I was able to handcraft a simple GRU network out of linear layers for this project.

As the focus of this notebook is the usage of PySyft, I will skip the construction of the model. If you are interested in how I built the model, you can check it out on [handcrafted_GRU.py](https://github.com/andrelmfarias/Private-AI/blob/master/Federated_Learning/handcrafted_GRU.py).

In [0]:
from handcrafted_GRU import GRU

In [0]:
# Initiating the model
model = GRU(vocab_size=VOCAB_SIZE, hidden_dim=HIDDEN_DIM, embedding_dim=EMBEDDING_DIM, dropout=DROPOUT)

### Training

For now, PySyft does not support optimizers with momentum. Therefore, we are going to stick with the classical [Stochastic Gradient Descent](https://pytorch.org/docs/stable/optim.html#torch.optim.SGD) optimizer.

As our task is a binary classification, we are going to use the [Binary Cross Entropy Loss](https://pytorch.org/docs/stable/nn.html#torch.nn.BCELoss).

In [0]:
# Defining loss and optimizer
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=lr)

For each epoch, we are going to compute the training and validation losses, as well as the [Area Under the ROC Curve](https://scikit-learn.org/stable/modules/model_evaluation.html#roc-metrics) score, because the target dataset is unbalanced (only 13% of labels are positive).
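To illustrate why AUC is a better fit than accuracy on such data, here is a minimal sketch in plain Python. The helper functions `accuracy` and `roc_auc` are hypothetical and written only for this illustration (the notebook itself relies on scikit-learn's `roc_auc_score`), and the toy labels simply mimic the 13% positive rate mentioned above.

```python
def accuracy(labels, preds):
    """Fraction of predictions that match the labels."""
    return sum(int(l == p) for l, p in zip(labels, preds)) / len(labels)


def roc_auc(labels, scores):
    """Pairwise definition of AUC: probability that a random positive
    gets a higher score than a random negative (ties count as 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data with 13% positives, as in the SMS dataset
labels = [1] * 13 + [0] * 87

# A useless "model" that always answers "not spam"
always_negative_preds = [0] * 100
constant_scores = [0.0] * 100

acc = accuracy(labels, always_negative_preds)
auc = roc_auc(labels, constant_scores)
print(f"accuracy = {acc:.2f}, AUC = {auc:.2f}")
```

The constant predictor looks good on accuracy only because negatives dominate, while its AUC stays at chance level, which is the behaviour we want from a metric on unbalanced labels.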

In [20]:
for e in range(EPOCHS):
    
    ######### Training ##########
    
    losses = []
    # Batch loop
    for inputs, labels in federated_train_loader:
        # Location of current batch
        worker = inputs.location
        # Initialize hidden state and send it to worker
        h = torch.Tensor(np.zeros((BATCH_SIZE, HIDDEN_DIM))).send(worker)
        # Send model to current worker
        model.send(worker)
        # Setting accumulated gradients to zero before backward step
        optimizer.zero_grad()
        # Output from the model
        output, _ = model(inputs, h)
        # Calculate the loss and perform backprop
        loss = criterion(output.squeeze(), labels.float())
        loss.backward()
        # Clipping the gradient to avoid explosion
        nn.utils.clip_grad_norm_(model.parameters(), CLIP)
        # Update the weights using the clipped gradients
        optimizer.step() 
        # Get the model back to the local worker
        model.get()
        losses.append(loss.get())
    
    ######## Evaluation ##########
    
    # Model in evaluation mode
    model.eval()

    with torch.no_grad():
        test_preds = []
        test_labels_list = []
        eval_losses = []

        for inputs, labels in federated_test_loader:
            # get current location
            worker = inputs.location
            # Initialize hidden state and send it to worker
            h = torch.Tensor(np.zeros((BATCH_SIZE, HIDDEN_DIM))).send(worker)    
            # Send model to worker
            model.send(worker)
            
            output, _ = model(inputs, h)
            loss = criterion(output.squeeze(), labels.float())
            eval_losses.append(loss.get())
            preds = output.squeeze().get()
            test_preds += list(preds.numpy())
            test_labels_list += list(labels.get().numpy().astype(int))
            # Get the model back to the local worker
            model.get()
        
        score = roc_auc_score(test_labels_list, test_preds)
    
    print("Epoch {}/{}...  \
    AUC: {:.3%}...  \
    Training loss: {:.5f}...  \
    Validation loss: {:.5f}".format(e+1, EPOCHS, score, sum(losses)/len(losses), sum(eval_losses)/len(eval_losses)))
    
    model.train()

Epoch 1/15...      AUC: 72.618%...      Training loss: 0.40979...      Validation loss: 0.35500
Epoch 2/15...      AUC: 79.190%...      Training loss: 0.35325...      Validation loss: 0.32535
Epoch 3/15...      AUC: 84.579%...      Training loss: 0.31621...      Validation loss: 0.28853
Epoch 4/15...      AUC: 89.720%...      Training loss: 0.28319...      Validation loss: 0.24455
Epoch 5/15...      AUC: 93.667%...      Training loss: 0.22717...      Validation loss: 0.19267
Epoch 6/15...      AUC: 95.531%...      Training loss: 0.17956...      Validation loss: 0.15372
Epoch 7/15...      AUC: 96.034%...      Training loss: 0.14125...      Validation loss: 0.13412
Epoch 8/15...      AUC: 96.384%...      Training loss: 0.11721...      Validation loss: 0.12582
Epoch 9/15...      AUC: 96.709%...      Training loss: 0.10556...      Validation loss: 0.11969
Epoch 10/15...      AUC: 96.954%...      Training loss: 0.08936...      Validation loss: 0.11111
Epoch 11/15...      AUC: 97.019%...    

## Conclusion

We can see that, with the PySyft library and its PyTorch extension, we can perform operations on tensor pointers much as we would with the regular PyTorch API (apart from some limitations that are still to be addressed).

Thanks to this, we were able to train a spam detector model without having any access to the remote, private data: for each batch, we sent the model to the worker holding that batch and got it back to the local machine before sending it to the worker of the next batch.
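The send / train / get-back cycle can be sketched without PySyft at all. The snippet below is only a toy simulation under stated assumptions: `ToyWorker` is a hypothetical class, the "model" is a single weight, and the data are made up so that both workers hold samples of the relation y = 3x. It is not how PySyft implements workers.

```python
class ToyWorker:
    """Hypothetical stand-in for a remote worker; its data never leaves the object."""

    def __init__(self, name, xs, ys):
        self.name = name
        self._xs = xs  # private inputs
        self._ys = ys  # private targets

    def train_step(self, weight, lr):
        """Receive the model (one weight), take one gradient step on the
        local data, and return the updated weight."""
        n = len(self._xs)
        # Gradient of the mean squared error for the model y = weight * x
        grad = sum(2 * (weight * x - y) * x for x, y in zip(self._xs, self._ys)) / n
        return weight - lr * grad


# Both workers hold private samples of the same relation y = 3x
bob = ToyWorker("bob", xs=[1.0, 2.0], ys=[3.0, 6.0])
anne = ToyWorker("anne", xs=[3.0, 4.0], ys=[9.0, 12.0])

weight = 0.0
for _ in range(50):
    for worker in (bob, anne):
        # "Send" the model, train remotely, "get" it back before moving on
        weight = worker.train_step(weight, lr=0.05)

print(round(weight, 4))
```

The local loop only ever handles the weight, never `xs` or `ys`, which mirrors how the training loop in this notebook only handles the model and pointers to the data.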

We can also see that federated training did not harm the performance of the model: both losses decreased at each epoch, as expected, and the final AUC score on the test data was above 97.5%.

There is, however, one limitation of this method: by getting the model back, we can still gain access to some private information.
Let's say Bob had only one SMS on his machine. When we get the model back, we can simply check which embeddings of the model changed, and we will know which tokens (words) the SMS contained.
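This leak is easy to reproduce with plain PyTorch. The following is a minimal sketch, assuming a toy model (an embedding layer, mean pooling and one linear layer) rather than the handcrafted GRU, a made-up vocabulary of 20 tokens, and a made-up SMS. With SGD and no momentum or weight decay, embedding rows that receive no gradient stay exactly as they were.

```python
import torch
from torch import nn

torch.manual_seed(0)

TOY_VOCAB_SIZE = 20  # hypothetical toy vocabulary
embedding = nn.Embedding(TOY_VOCAB_SIZE, 4)
classifier = nn.Linear(4, 1)
params = list(embedding.parameters()) + list(classifier.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)

# Snapshot of the embeddings before the model is "sent" to Bob
before = embedding.weight.detach().clone()

# Bob's single private SMS (token ids) and its label
sms_tokens = torch.tensor([[3, 7, 7, 12]])
label = torch.tensor([1.0])

# One training step on Bob's data
pooled = embedding(sms_tokens).mean(dim=1)            # shape (1, 4)
prob = torch.sigmoid(classifier(pooled)).squeeze(1)   # shape (1,)
loss = nn.BCELoss()(prob, label)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Back on the local machine: which embedding rows changed?
changed = (embedding.weight.detach() != before).any(dim=1)
leaked_tokens = changed.nonzero().flatten().tolist()
print(leaked_tokens)
```

The rows that changed are exactly the token ids that appeared in the SMS, so the content of the message (though not the word order or repetition counts) can be read off the returned model.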

To address this issue, there are two solutions: Differential Privacy and Secure Multi-Party Computation (SMPC). Differential Privacy is used to make sure the model does not give access to private information. SMPC, which is one kind of Encrypted Computation, in turn allows you to send the model privately, so that the remote workers that hold the data cannot see the weights you are using.
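To give a flavour of the idea behind SMPC, here is a minimal sketch of additive secret sharing, one of its common building blocks, in plain Python. The functions `share` and `reconstruct` and the modulus `Q` are illustrative choices made for this example, not PySyft's API. Real protocols also need fixed-point encoding for floating-point weights and extra machinery for multiplication.

```python
import random

Q = 2**61 - 1  # large prime modulus, chosen arbitrarily for this illustration


def share(secret, n_parties=3):
    """Split an integer into n random-looking shares that sum to it modulo Q."""
    shares = [random.randrange(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares


def reconstruct(shares):
    """Recombine all shares; any strict subset reveals nothing about the secret."""
    return sum(shares) % Q


# Two secret values, each split among three parties
a_shares = share(25)
b_shares = share(17)

# Each party adds the two shares it holds, without ever seeing 25 or 17
sum_shares = [(a + b) % Q for a, b in zip(a_shares, b_shares)]

print(reconstruct(sum_shares))
```

Each party only ever sees uniformly random numbers, yet the shares can be recombined into the sum of the two secrets once all parties cooperate.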

I will show how we can apply these techniques with PySyft in the next tutorial.

Et voilà!