
# Section: Securing Federated Learning

- Lesson 1: Trusted Aggregator
- Lesson 2: Intro to Additive Secret Sharing
- Lesson 3: Intro to Fixed Precision Encoding
- Lesson 4: Secret Sharing + Fixed Precision in PySyft
- Final Project: Federated Learning with Encrypted Gradient Aggregation

# Lesson: Federated Learning with a Trusted Aggregator

In the last section, we learned how to train a model on a distributed dataset using Federated Learning. In particular, the last project aggregated gradients directly from one data owner to another. 

That approach can be fine in some cases, but it would be even better to be able to choose a neutral third party to perform the aggregation.

As it turns out, we can use the same tools we used previously to accomplish this.
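
Before turning to PySyft, the aggregation step itself can be sketched in plain Python. This is only an illustration under simple assumptions: each model is a dict of parameter lists, and the names `average_models`, `adas_params`, and `bobs_params` are hypothetical, not part of the project code.

```python
def average_models(models):
    """Average a list of models, each given as a dict of name -> list of floats."""
    n = len(models)
    averaged = {}
    for name in models[0]:
        # element-wise mean of this parameter across all data owners
        columns = zip(*(m[name] for m in models))
        averaged[name] = [sum(col) / n for col in columns]
    return averaged

# hypothetical parameters after local training on each data owner
adas_params = {"weight": [0.25, 0.5], "bias": [1.0]}
bobs_params = {"weight": [0.75, 1.5], "bias": [3.0]}

# the trusted aggregator sees both models in the clear and returns their mean
global_params = average_models([adas_params, bobs_params])
print(global_params)
```

The point of the sketch is that the aggregator needs access to every owner's parameters, which is why it has to be trusted.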

# Project: Federated Learning with a Trusted Aggregator

In [0]:
import time
import datetime
import os
import sys

In [0]:
#!pip install syft

!pip install tf-encrypted
! URL="https://github.com/Berenice2018/PySyft-Bc.git" && FOLDER="PySyft-Bc" && if [ ! -d $FOLDER ]; then git clone -b master --single-branch $URL; else (cd $FOLDER && git pull $URL && cd ..); fi;

!cd PySyft-Bc; python setup.py install

module_path = os.path.abspath(os.path.join('./PySyft-Bc'))
if module_path not in sys.path:
     sys.path.append(module_path)
    
!pip install --upgrade --force-reinstall lz4
!pip install --upgrade --force-reinstall websocket
!pip install --upgrade --force-reinstall websockets
!pip install --upgrade --force-reinstall zstd

In [3]:
!pip install multiprocess



In [0]:
from multiprocess import Pool, TimeoutError, cpu_count

In [0]:
import numpy as np # linear algebra
#import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from torchvision import datasets, models, transforms
from torch.utils.data.sampler import SubsetRandomSampler

import syft as sy


In [65]:
hook = sy.TorchHook(torch)

ada = sy.VirtualWorker(hook, 'ada')
bob = sy.VirtualWorker(hook, 'bob')
secure_worker = sy.VirtualWorker(hook, 'secure_worker')

# create data owners
bob.add_workers([ada, secure_worker])
ada.add_workers([bob, secure_worker])
secure_worker.add_workers([ada, bob])


W0731 13:02:42.512239 140050521552768 hook.py:98] Torch was already hooked... skipping hooking process
W0731 13:02:42.514997 140050521552768 base.py:628] Worker ada already exists. Replacing old worker which could cause                     unexpected behavior
W0731 13:02:42.517075 140050521552768 base.py:628] Worker secure_worker already exists. Replacing old worker which could cause                     unexpected behavior
W0731 13:02:42.519141 140050521552768 base.py:628] Worker bob already exists. Replacing old worker which could cause                     unexpected behavior
W0731 13:02:42.522094 140050521552768 base.py:628] Worker secure_worker already exists. Replacing old worker which could cause                     unexpected behavior
W0731 13:02:42.524536 140050521552768 base.py:628] Worker ada already exists. Replacing old worker which could cause                     unexpected behavior
W0731 13:02:42.531519 140050521552768 base.py:628] Worker bob already exists. Replacing old 

<VirtualWorker id:secure_worker #objects:0>

In [66]:
ada.clear_objects()
bob.clear_objects()
secure_worker.clear_objects()

<VirtualWorker id:secure_worker #objects:0>

In [0]:
def get_datasets():
    print('creating loaders')
    # define the transform
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, ), (0.5, ))
    ])

    # load the datasets
    fulltrainset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True, train=True, transform=transform)
    testset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True, train=False, transform=transform)

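    # split the dataset
    ada_size = int(len(fulltrainset) * 0.5)
    bob_size = len(fulltrainset) - ada_size
    ada_set, bob_set = torch.utils.data.random_split(fulltrainset, [ada_size, bob_size])
    # NOTE: Subset.dataset is the full underlying dataset, so after the next two
    # lines ada_set and bob_set both refer to all of fulltrainset, not to their half.
    ada_set = ada_set.dataset
    bob_set = bob_set.dataset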
    
    return ada_set, bob_set, testset

In [31]:
# Create the data loaders, federated PySyft loader
#datasets.ImageFolder.federate = get_federated_dataset

adaset, bobset, validset = get_datasets()

adas_train_loader = torch.utils.data.DataLoader(adaset, batch_size=32, shuffle=True, num_workers=0)
bobs_train_loader = torch.utils.data.DataLoader(bobset, batch_size=32, shuffle=True, num_workers=0)

valid_loader = torch.utils.data.DataLoader(validset, batch_size=32, shuffle=True, num_workers=0)
 

creating loaders


In [0]:
# sanity check
print(adaset.data.__getitem__(0))

In [0]:
all_workers = [ada, bob]
adas_data_ptr = adaset.data.send(ada)
adas_target_ptr = adaset.targets.send(ada)


bobs_data_ptr = bobset.data.send(bob)
bobs_target_ptr = bobset.targets.send(bob)

In [0]:
#for idx, (data, target) in enumerate( adas_train_loader.federated_dataset['ada']):

# check that our trainloader returns a pointer to a batch, and that transformations are applied
#data, target = next(iter(adas_train_loader.federated_dataset['ada']))
#print(data)


In [63]:

print(f'objects of ada= {len(ada._objects)}, bob= {len(bob._objects)}, secure= {len(secure_worker._objects)}')


objects of ada= 2, bob= 2, secure= 0


In [0]:
# Helper functions for printing out training progress data
def print_epoch_start_stats(e_start, e_end, current_lr, current_vmin):

    print('*** Epoch [{}/{}]: Training with LR [{:.6f}], current VLoss Min [{:.4f}]'.format(
    e_start, e_end, current_lr, current_vmin))

def print_epoch_end_stats(train_loss, valid_loss, valid_acc, epoch_time):

    print('   Train loss: \t{:.6f}'.format(train_loss))
    print('   Valid loss: \t{:.6f}'.format(valid_loss))
    print('   Valid acc: \t{:.6f}'.format(valid_acc))
    print('*** Epoch completed in {:.0f}m {:.0f}s'.format(epoch_time // 60, epoch_time % 60))   

In [0]:
# helper functions
import datetime

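def get_time():
    # read the clock once so hour, minute, and second belong to the same instant
    now = datetime.datetime.today()
    # +2 is the fixed offset from the original code (presumably a local
    # time zone correction); wrap around so the hour stays in 0..23
    hour = (now.hour + 2) % 24
    return hour, now.minute, now.second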

def train_epoch(worker_name, model, data_ptr, target_ptr, 
                criterion, optimizer, train_on_gpu=False):
    # initialize variables to monitor training and validation loss
    train_loss = 0.0
    train_accuracy = 0.0
    correct = 0.0
    total = 0.0
    
    # clear the gradients of all optimized variables
    optimizer.zero_grad()        


    ## find the loss and update the model parameters accordingly
    output = model(data_ptr)
    loss = criterion(output, target_ptr)
    loss.backward()
    optimizer.step()

    current_loss = loss.get().data

    # get the loss per batch and accumulate
    train_loss += current_loss.item()

    # get the class, highest probability
    probabilities = torch.exp(output)
    _, top_class = probabilities.topk(1, dim=1)

    # check if the predicted class is correct
    # (compare against target_ptr; no variable named `target` exists in this function)
    equals = top_class == target_ptr.view(*top_class.shape)

    # average the matches on the worker, then fetch the resulting scalar
    train_accuracy += equals.float().mean().get().item()
    

    #print('worker {} train loss= {:.6f}, train acc= {:.6f}'
          #.format(worker_name, train_loss, train_accuracy))
    return train_loss, train_accuracy


def validate_epoch(model, dataloader, criterion, train_on_gpu=False):
    valid_loss = 0.0
    valid_accuracy = 0.0
    correct = 0.0
    total = 0.0
    
    with torch.no_grad():
        #for idx, (data, target) in enumerate(adas_train_loader.federated_dataset[worker_name]):
        for batch_idx, (data, target) in enumerate(dataloader):
            # move to GPU
            if train_on_gpu:
                data, target = data.cuda(), target.cuda()
            ## update the average validation loss
            output = model(data)
            loss = criterion(output,target)
            
            valid_loss += loss.item()

            ps = torch.exp(output)
            _ , top_class = ps.topk(1,dim = 1)
            #_, top_class = torch.max(ps, dim=1)
            equals = top_class == target.view(*top_class.shape) # shape is (batch size x 1)
            valid_accuracy += torch.mean(equals.type(torch.FloatTensor))
            
            #pred = output.argmax(1, keepdim=True) # get the index of the max log-probability 
            #correct += pred.eq(target.view_as(pred)).sum().item()

    return valid_loss, valid_accuracy

In [0]:
def train_my_model(n_epochs, workername, model, data_ptr, target_ptr,
                   optimizer, criterion, scheduler, use_cuda=False):    
    #valid_losses = []
    #train_losses = []
    #valid_accuracies = []
    
    
    # initialize tracker for minimum validation loss
    valid_loss_min = np.Inf 
    
    for epoch in range(n_epochs):
        
         # initialize variables to monitor training and validation loss
        training_loss = 0.0
        training_accuracy = 0.0
    
        #if scheduler is not None:
          #scheduler.step()
        
        ###################
        # train the model #
        #train_epoch(worker_name, model, dataloader, criterion, optimizer, train_on_gpu=False):
        model.train()
        training_loss, training_accuracy = train_epoch(
            workername, model, data_ptr, target_ptr, criterion, optimizer, use_cuda)
    
        
        ######################    
        # validate the model #
        model.eval()
        # validate_epoch returns (loss, accuracy); loaders[2] is the validation loader
        validation_loss, validation_accuracy = validate_epoch(
            model, loaders[2], criterion, use_cuda)
        
        #if scheduler is not None:
          #scheduler.step(validation_loss)
        
        ###### print training/validation statistics 
        # train_epoch runs a single step over the worker's whole dataset,
        # so training_loss is already the loss for this epoch
        #train_losses.append(training_loss)
        
        # average the validation metrics over the number of validation batches
        len_valid_loader = len(loaders[2])
        validation_loss = validation_loss/len_valid_loader
        #valid_losses.append(validation_loss)
        
        validation_accuracy = validation_accuracy/len_valid_loader
        #valid_accuracies.append(validation_accuracy)
        
        #hour, minute, second = get_time()
        print('Worker {}, Epoch: {} \tTrain. Loss: {:.6f} \tValid. Loss: {:.6f} \t Accur.: {:.10f}'.format(
                  workername,
                  epoch,
                  training_loss,
                  validation_loss,
                  validation_accuracy ))
      
    
    return model, training_loss, validation_loss, validation_accuracy

In [0]:

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(784, 512)
        self.fc2 = nn.Linear(512, 10)

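    def forward(self, x):
        print('input {}, {}'.format(x.size(), x.dim()))
        # flatten each image to 784 values; cast to float because dataset.data holds uint8 pixels
        x = x.view(-1, 784).float()
        print('after x.view {}, {}'.format(x.size(), x.dim()))
        x = self.fc1(x)
        print('fc1 {}, {}'.format(x.size(), x.dim()))
        x = F.relu(x)
        x = self.fc2(x)
        # x has shape (batch, 10), so dim=1 normalizes over the classes
        return F.log_softmax(x, dim=1)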


class Arguments():
    def __init__(self):
        self.batch_size = 64
        self.test_batch_size = 1000
        self.epochs = 2
        self.lr = 0.01
        self.momentum = 0.5
        self.no_cuda = False
        self.seed = 1
        self.log_interval = 10
        self.save_model = False

args = Arguments()

use_cuda = not args.no_cuda and torch.cuda.is_available()

torch.manual_seed(args.seed)

device = torch.device("cuda" if use_cuda else "cpu")

kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}




# instantiate the model
model = Model()

#scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience = 4)
criterion = nn.NLLLoss()


In [67]:
#test
ada.clear_objects()
bob.clear_objects()
secure_worker.clear_objects()


# instantiate the model
model = Model()
criterion = nn.NLLLoss()

# load the datasets
# define the transform
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, ), (0.5, ))
])

fulltrainset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True, train=False, transform=transform)

 # split the dataset
ada_size = int(len(fulltrainset)* 0.5)
bob_size = len(fulltrainset) - ada_size
ada_set, bob_set = torch.utils.data.random_split(fulltrainset, [ada_size, bob_size])
bobset = bob_set.dataset
adaset = ada_set.dataset
print(len(ada_set), len(bob_set))

# send to worker
bobs_data_ptr = bobset.data.send(bob)
bobs_target_ptr = bobset.targets.send(bob)

# send a copy of the current model to Ada and Bob so that each trains on its own dataset.
bobs_model = model.copy().send(bob)

# momentum is not supported by PySyft at the moment
bobs_optim = optim.SGD(params=bobs_model.parameters(), lr=0.01) 

## find the loss and update the model parameters accordingly
output = bobs_model(bobs_data_ptr)
bobs_loss = criterion(output, bobs_target_ptr)
bobs_loss.backward()
bobs_optim.step()

bobs_loss = bobs_loss.get()
print(bobs_loss)

5000 5000
input torch.Size([0]), 3
after x.view torch.Size([0]), 3


RuntimeError: ignored

In [0]:
print('Training started at ', get_time())
loaders = [adas_train_loader, bobs_train_loader, valid_loader]

avr_train_losses = []
avr_valid_losses = []
avr_valid_accuracies = []
  
for a_iter in range(2):
    # send a copy of the current model to Ada and Bob so that each trains on its own dataset.
    bobs_model = model.copy().send(bob)
    adas_model = model.copy().send(ada)
    
    # momentum is not supported by PySyft at the moment
    bobs_optim = optim.SGD(params=bobs_model.parameters(), lr=args.lr) 
    adas_optim = optim.SGD(params=adas_model.parameters(), lr=args.lr)
        
#def train_my_model(n_epochs, loaders, workername, model, optimizer, criterion, scheduler, use_cuda=False):    
    #start training of each model
    trained_ada, ada_trainloss, ada_validloss, ada_accuracy = train_my_model(
        args.epochs, 'ada', adas_model, adas_data_ptr, adas_target_ptr,
        adas_optim, criterion, None, use_cuda=False)
    
    trained_bob, bob_trainloss, bob_validloss, bob_accuracy = train_my_model(
        args.epochs, 'bob', bobs_model, bobs_data_ptr, bobs_target_ptr,
        bobs_optim, criterion, None, use_cuda=False)

    # append for plots
    avr_train_losses.append((ada_trainloss + bob_trainloss) *0.5)
    avr_valid_losses.append((ada_validloss + bob_validloss) *0.5)
    avr_valid_accuracies.append((ada_accuracy + bob_accuracy) * 0.5)
    
    
    # Let Ada and Bob send their model to the secure (trusted) server.
    adas_model.move(secure_worker)
    bobs_model.move(secure_worker)
    
    # average ada's and bob's trained models together on the secure worker,
    # then use the result to set the values for our global "model".
    with torch.no_grad():
        for name in ('fc1', 'fc2'):
            global_layer = getattr(model, name)
            ada_layer = getattr(adas_model, name)
            bob_layer = getattr(bobs_model, name)
            global_layer.weight.set_(
                ((ada_layer.weight.data + bob_layer.weight.data) / 2).get())
            global_layer.bias.set_(
                ((ada_layer.bias.data + bob_layer.bias.data) / 2).get())
      
    print('Finished iteration {}'.format(a_iter))
    
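##### visualize
# NOTE: plot_loss_acc and iterations are not defined anywhere in this notebook;
# they have to be defined before this line can run.
plot_loss_acc(iterations, avr_train_losses, avr_valid_losses, avr_valid_accuracies)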
  
  

# Lesson: Intro to Additive Secret Sharing

While having a trusted third party perform the aggregation is certainly nice, in an ideal setting we wouldn't have to trust anyone at all. This is where cryptography can provide an interesting alternative.

Specifically, we're going to be looking at a simple protocol for Secure Multi-Party Computation called Additive Secret Sharing. This protocol will allow multiple parties (of size 3 or more) to aggregate their gradients without the use of a trusted 3rd party to perform the aggregation. In other words, we can add 3 numbers together from 3 different people without anyone ever learning the inputs of any other actors.

Let's start by considering the number 5, which we'll put into a variable x.

In [0]:
x = 5

Let's say we wanted to SHARE the ownership of this number between two people, Alice and Bob. We could split this number into two shares, 2 and 3, and give one to Alice and one to Bob.

In [0]:
bob_x_share = 2
alice_x_share = 3

decrypted_x = bob_x_share + alice_x_share
decrypted_x

5

Note that neither Bob nor Alice knows the value of x. Each only knows the value of their own SHARE of x. Thus, the true value of x is hidden (i.e., encrypted).

The truly amazing thing, however, is that Alice and Bob can still compute using this value! They can perform arithmetic over the hidden value! Let's say Bob and Alice wanted to multiply this value by 2! If each of them multiplied their respective share by 2, then the hidden number between them is also multiplied! Check it out!

In [0]:
bob_x_share = 2 * 2
alice_x_share = 3 * 2

decrypted_x = bob_x_share + alice_x_share
decrypted_x

10

This even works for addition between two shared values!!

In [0]:
# encrypted "5"
bob_x_share = 2
alice_x_share = 3

# encrypted "7"
bob_y_share = 5
alice_y_share = 2

# encrypted 5 + 7
bob_z_share = bob_x_share + bob_y_share
alice_z_share = alice_x_share + alice_y_share

decrypted_z = bob_z_share + alice_z_share
decrypted_z

12

As you can see, we just added two numbers together while they were still encrypted!!!

One small tweak: since all our numbers are positive, each share reveals a little bit of information about the hidden value, namely that the value is at least as large as the share. Thus, if Bob has a share "3", he knows that the encrypted value is at least 3.

This would be quite bad, but it can be solved through a simple fix. Decryption happens by summing all the shares together modulo some constant Q. I.e.

In [0]:
x = 5

Q = 23740629843760239486723

bob_x_share = 23552870267 # <- a random number
alice_x_share = Q - bob_x_share + x
alice_x_share

23740629843736686616461

In [0]:
(bob_x_share + alice_x_share) % Q

5

So now, as you can see, both shares are wildly larger than the number being shared, meaning that individual shares no longer leak this information. However, all the properties we discussed earlier still hold! (addition, encryption, decryption, etc.)
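
To check that claim, here is a small sketch that repeats the earlier addition and multiplication examples with modular shares. The helper names `share_two` and `reveal` are made up for this illustration, and the random share is drawn with Python's `secrets` module rather than typed in by hand.

```python
import secrets

Q = 23740629843760239486723  # modulus from the cell above

def share_two(x):
    # Bob gets a uniformly random number, Alice gets whatever makes the sum x (mod Q)
    bob_share = secrets.randbelow(Q)
    alice_share = (x - bob_share) % Q
    return bob_share, alice_share

def reveal(bob_share, alice_share):
    return (bob_share + alice_share) % Q

bob_x, alice_x = share_two(5)
bob_y, alice_y = share_two(7)

# addition: each party adds its own two shares locally
sum_xy = reveal((bob_x + bob_y) % Q, (alice_x + alice_y) % Q)

# multiplication by a public constant: each party scales its own share
double_x = reveal((bob_x * 2) % Q, (alice_x * 2) % Q)

print(sum_xy, double_x)
```

Every local operation is reduced modulo Q, so the shares stay in the same range while the revealed results match the plain arithmetic.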

# Project: Build Methods for Encrypt, Decrypt, and Add 

In this project, you must take the lessons we learned in the last section and write general methods for encrypt, decrypt, and add. Store shares for a variable in a tuple like so.

In [0]:
x_share = (2,5,7)

Even though normally those shares would be distributed amongst several workers, you can store them in ordered tuples like this for now :)

In [0]:
# try this project here!
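
One possible solution is sketched below; treat it as a starting point, not the reference answer. The default of three shares and the use of `secrets.randbelow` for randomness are assumptions of this sketch, and Q is the modulus from the previous lesson.

```python
import secrets

Q = 23740629843760239486723  # same modulus as in the lesson above

def encrypt(x, n_shares=3):
    # draw all but one share at random, then pick the last one
    # so that the shares sum to x modulo Q
    shares = [secrets.randbelow(Q) for _ in range(n_shares - 1)]
    shares.append((x - sum(shares)) % Q)
    return tuple(shares)

def decrypt(shares):
    # the secret is the sum of all shares modulo Q
    return sum(shares) % Q

def add(a, b):
    # each share holder adds the two shares it owns; no communication needed
    return tuple((a_i + b_i) % Q for a_i, b_i in zip(a, b))

x_share = encrypt(5)
y_share = encrypt(10)
print(decrypt(x_share), decrypt(y_share), decrypt(add(x_share, y_share)))
```

Because `add` works position by position, it matches the tuple layout suggested above: position i always belongs to the same share holder.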

# Lesson: Intro to Fixed Precision Encoding

As you may remember, our goal is to aggregate gradients using this new Secret Sharing technique. However, the protocol we've just explored uses positive integers, and our neural network weights are NOT integers. Instead, they are decimals (floating point numbers).

Not a huge deal! We just need to use a fixed precision encoding, which lets us do computation over decimal numbers using integers!

In [0]:
BASE=10
PRECISION=4

In [0]:
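def encode(x):
    # scale to an integer first, then reduce modulo Q in exact integer arithmetic
    # (taking a float modulo the very large Q would lose precision for negative x)
    return int(round(x * (BASE ** PRECISION))) % Q

def decode(x):
    # values in the upper half of [0, Q) represent negative numbers
    return (x if x <= Q // 2 else x - Q) / BASE**PRECISION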

In [0]:
encode(3.5)

35000

In [0]:
decode(35000)

3.5
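
The `x - Q` branch in decode is what makes negative numbers work: a negative value wraps around to the upper half of the range 0..Q-1. The sketch below shows that round trip. It restates Q, BASE, and PRECISION so it runs on its own, and it assumes an encode that rounds to an integer before taking the modulus.

```python
Q = 23740629843760239486723
BASE = 10
PRECISION = 4

def encode(x):
    # scale to an integer first, then reduce modulo Q in exact integer arithmetic
    return int(round(x * BASE ** PRECISION)) % Q

def decode(x):
    # field elements in the upper half of [0, Q) stand for negative numbers
    return (x if x <= Q // 2 else x - Q) / BASE ** PRECISION

encoded = encode(-3.5)
print(encoded == Q - 35000)  # the negative value wraps around to just below Q
print(decode(encoded))
```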

In [0]:
x = encrypt(encode(5.5))
y = encrypt(encode(2.3))
z = add(x,y)
decode(decrypt(z))

7.8

# Lesson: Secret Sharing + Fixed Precision in PySyft

While writing things from scratch is certainly educational, PySyft makes a great deal of this much easier for us through its abstractions.

In [0]:
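# this lesson uses a worker named alice, which has not been created earlier in this notebook
alice = sy.VirtualWorker(hook, 'alice')

bob = bob.clear_objects()
alice = alice.clear_objects()
secure_worker = secure_worker.clear_objects()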

In [0]:
import torch as th  # the cells in this lesson refer to torch as th

x = th.tensor([1,2,3,4,5])

### Secret Sharing Using PySyft

We can share using the simple .share() method!

In [0]:
x = x.share(bob, alice, secure_worker)

In [0]:
bob._objects

{35498656553: tensor([  10235770278698899, 1401398179551373756, 2277280072169145491,
          636965538565031298,  913795591610271305])}

and as you can see, Bob now has one of the shares of x! Furthermore, we can still call addition in this state, and PySyft will automatically perform the remote execution for us!

In [0]:
y = x + x

In [0]:
y

(Wrapper)>[AdditiveSharingTensor]
	-> (Wrapper)>[PointerTensor | me:23637986557 -> bob:30254176063]
	-> (Wrapper)>[PointerTensor | me:18229131498 -> alice:75856222543]
	-> (Wrapper)>[PointerTensor | me:34301722959 -> secure_worker:75419815101]
	*crypto provider: me*

In [0]:
y.get()

tensor([ 2,  4,  6,  8, 10])

### Fixed Precision using PySyft

We can also convert a tensor to fixed precision using .fix_prec() (short for .fix_precision()), and convert it back using .float_prec().

In [0]:
x = th.tensor([0.1,0.2,0.3])

In [0]:
x

tensor([0.1000, 0.2000, 0.3000])

In [0]:
x = x.fix_prec()

In [0]:
x.child.child

tensor([100, 200, 300])

In [0]:
y = x + x

In [0]:
y = y.float_prec()
y

tensor([0.2000, 0.4000, 0.6000])
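
What happens under the hood can be sketched in plain Python. The output above (0.1 became 100) suggests three fractional digits in base 10; that precision is an assumption of this sketch, and `to_fixed` and `to_float` are hypothetical helper names, not PySyft functions.

```python
PRECISION_FRACTIONAL = 3  # assumption inferred from the output above (0.1 -> 100)

def to_fixed(values, precision=PRECISION_FRACTIONAL):
    # scale each float and round it to the nearest integer
    return [round(v * 10 ** precision) for v in values]

def to_float(values, precision=PRECISION_FRACTIONAL):
    # undo the scaling to get floats back
    return [v / 10 ** precision for v in values]

fixed = to_fixed([0.1, 0.2, 0.3])
doubled = [a + b for a, b in zip(fixed, fixed)]  # integer addition, like y = x + x
print(fixed)
print(to_float(doubled))
```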

### Shared Fixed Precision

And of course, we can combine the two!

In [0]:
x = th.tensor([0.1, 0.2, 0.3])

In [0]:
x = x.fix_prec().share(bob, alice, secure_worker)

In [0]:
y = x + x

In [0]:
y.get().float_prec()

tensor([0.2000, 0.4000, 0.6000])

Note that once the result is decrypted, everyone can see the model averages in the clear. Only the individual contributions stay hidden.
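
Putting the pieces together, the following pure-Python sketch mimics how three data owners could average their gradients with additive secret sharing and fixed precision encoding. It is an illustration only: the gradient values are hypothetical, `share` is a made-up helper, and the encode/decode pair assumes rounding before the modulus.

```python
import secrets

Q = 23740629843760239486723
BASE, PRECISION = 10, 4
SCALE = BASE ** PRECISION

def encode(x):
    return int(round(x * SCALE)) % Q

def decode(x):
    return (x if x <= Q // 2 else x - Q) / SCALE

def share(value, n_parties):
    # n_parties - 1 random shares plus one share that fixes the sum modulo Q
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % Q)
    return shares

# hypothetical scalar gradients held by three data owners
gradients = [0.5, -1.25, 2.0]
n = len(gradients)

# each owner splits its encoded gradient into n shares, one per party
all_shares = [share(encode(g), n) for g in gradients]

# party j only ever sees column j: one random-looking share from each owner
partial_sums = [sum(owner[j] for owner in all_shares) % Q for j in range(n)]

# combining the partial sums reveals the total, never an individual gradient
total = decode(sum(partial_sums) % Q)
average = total / n
print(total, average)
```

The decrypted total (and therefore the average) is visible to whoever combines the partial sums, while each individual gradient is only ever handled as random-looking shares.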

# Final Project: Federated Learning with Encrypted Gradient Aggregation

See the other notebook here: https://colab.research.google.com/drive/1hDbIS5s8hL6ISd5RTCvhnRPDadt565P0