# Section: Securing Federated Learning

- Lesson 1: Trusted Aggregator
- Lesson 2: Intro to Additive Secret Sharing
- Lesson 3: Intro to Fixed Precision Encoding
- Lesson 4: Secret Sharing + Fixed Precision in PySyft
- Final Project: Federated Learning with Encrypted Gradient Aggregation

# Lesson: Federated Learning with a Trusted Aggregator

In the last section, we learned how to train a model on a distributed dataset using Federated Learning. In particular, the last project aggregated gradients directly from one data owner to another. 

However, while sending gradients directly between data owners can be ideal in some cases, it would be even better to be able to choose a neutral third party to perform the aggregation:
* a neutral third party whose machine we can trust not to look at the gradients while performing the aggregation
* because we can choose anyone, the likelihood that we can find someone we trust is much higher

As it turns out, we can use the same tools we used previously to accomplish this.

# Project: Federated Learning with a Trusted Aggregator

When we aggregate our gradients from multiple workers, instead of bringing them to a central server or having the workers send gradients directly to each other, we're going to have everyone involved in the federated learning session send their trained models to a neutral third party (the secure worker), which averages them.
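Conceptually, the secure worker only has to average what it receives. The sketch below assumes each model's parameters arrive as a plain dict of flat float lists (a hypothetical format, not how PySyft stores them) and makes explicit that the aggregator sees every input in the clear, which is exactly the trust we place in it.

```python
from typing import Dict, List

# Hypothetical parameter format: parameter name -> flat list of floats
Params = Dict[str, List[float]]


def trusted_average(models: List[Params]) -> Params:
    """Average parameters element-wise, as a trusted aggregator would.

    The aggregator sees every model's parameters in the clear, which is
    exactly the trust assumption this lesson relies on.
    """
    if not models:
        raise ValueError("need at least one model to average")
    n = len(models)
    averaged: Params = {}
    for name in models[0]:
        # zip(*...) pairs up the same position across all models
        columns = zip(*(m[name] for m in models))
        averaged[name] = [sum(col) / n for col in columns]
    return averaged


bobs_params = {"weight": [0.5, 0.25], "bias": [0.0]}
alices_params = {"weight": [1.5, 0.75], "bias": [1.0]}
print(trusted_average([bobs_params, alices_params]))
# {'weight': [1.0, 0.5], 'bias': [0.5]}
```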

In [2]:
import syft as sy
import torch as th
from torch import nn, optim

In [3]:
# create the hook
hook = sy.TorchHook(th)

#### create workers

In [4]:
# data owners
bob = sy.VirtualWorker(hook, id='bob')
alice = sy.VirtualWorker(hook, id='alice')
# trusted third party
secure_worker = sy.VirtualWorker(hook, id='secure_worker')

In [5]:
# inform workers that other workers exist
bob.add_workers([alice, secure_worker])
alice.add_workers([bob, secure_worker])
secure_worker.add_workers([alice, bob])



<VirtualWorker id:secure_worker #tensors:0>

#### create dataset for simple linear model

In [6]:
# A Toy Dataset
data = th.tensor([[0,0],[0,1],[1,0],[1,1.]], requires_grad=True)
target = th.tensor([[0],[0],[1],[1.]], requires_grad=True) 

In [7]:
# get pointers to training data on each worker by
# sending some training data to bob and alice
# bob data
bobs_data = data[0:2].send(bob)
bobs_target = target[0:2].send(bob)
# alice data
alices_data = data[2:].send(alice)
alices_target = target[2:].send(alice)

#### initialize a linear model

In [8]:
# initialize toy model
model = nn.Linear(2,1)

#### federated learning

* we need two copies of the model, one sent to each worker, so that the locally trained copies can be averaged afterwards
* we need two optimizers: one for the parameters of bob's model and one for the parameters of alice's

In [9]:
for round_iter in range(10):

    # copy the model and send one copy to each data owner
    bobs_model = model.copy().send(bob)
    alices_model = model.copy().send(alice)

    # initialize the optimizers
    bobs_opt = optim.SGD(params=bobs_model.parameters(), lr=0.1)
    alices_opt = optim.SGD(params=alices_model.parameters(), lr=0.1)

    # train each copy locally for 10 iterations

    for i in range(10):

        # for bob
        # zero out the gradients
        bobs_model.zero_grad()
        # make prediction
        bobs_pred = bobs_model(bobs_data)
        # calculate the loss (sum of squared errors)
        bobs_loss = ((bobs_pred - bobs_target)**2).sum()
        # backpropagate
        bobs_loss.backward()
        # make an optimizer step to update the weights 
        bobs_opt.step()
        # get the loss
        bobs_loss = bobs_loss.get().data
        #print(bobs_loss)

        # for alice
        # zero out the gradients
        alices_model.zero_grad()
        # make prediction
        alices_pred = alices_model(alices_data)
        # calculate the loss (sum of squared errors)
        alices_loss = ((alices_pred - alices_target)**2).sum()
        # backpropagate
        alices_loss.backward()
        # make an optimizer step to update the weights 
        alices_opt.step()
        # get the loss
        alices_loss = alices_loss.get().data
        #print(alices_loss)

    # move both trained models to the secure worker
    # (move() is applied to every parameter of the model)
    alices_model.move(secure_worker)
    bobs_model.move(secure_worker)
    
    with th.no_grad():
        # average the models
        model.weight.set_(((alices_model.weight.data + bobs_model.weight.data) / 2).get())
        model.bias.set_(((alices_model.bias.data + bobs_model.bias.data) / 2).get())
        
    secure_worker.clear_objects()
    
    print("Bob:" + str(bobs_loss) + " Alice:" + str(alices_loss))

Bob:tensor(0.0383) Alice:tensor(0.0197)
Bob:tensor(0.0109) Alice:tensor(0.0017)
Bob:tensor(0.0042) Alice:tensor(8.0833e-05)
Bob:tensor(0.0021) Alice:tensor(8.5223e-06)
Bob:tensor(0.0012) Alice:tensor(5.1591e-05)
Bob:tensor(0.0008) Alice:tensor(6.8679e-05)
Bob:tensor(0.0005) Alice:tensor(6.5655e-05)
Bob:tensor(0.0004) Alice:tensor(5.5286e-05)
Bob:tensor(0.0003) Alice:tensor(4.3969e-05)
Bob:tensor(0.0002) Alice:tensor(3.4018e-05)


In [10]:
# the secure worker holds no objects after clear_objects()
secure_worker._objects

{}

In [11]:
bob._objects

{60587739820: tensor([[0., 0.],
         [0., 1.]], requires_grad=True), 4902868227: tensor([[0.],
         [0.]], requires_grad=True), 99165568907: tensor([[ 0.0124],
         [-0.0075]], grad_fn=<AddmmBackward>)}

In [12]:
alice._objects

{37005961846: tensor([[1., 0.],
         [1., 1.]], requires_grad=True), 25998039684: tensor([[1.],
         [1.]], requires_grad=True), 40224110750: tensor([[0.9954],
         [1.0036]], grad_fn=<AddmmBackward>)}

# Lesson: Intro to Additive Secret Sharing

While being able to have a trusted third party perform the aggregation is certainly nice, in an ideal setting we wouldn't have to trust anyone at all. This is where cryptography can provide an interesting alternative.

Specifically, we're going to look at a simple protocol for Secure Multi-Party Computation called Additive Secret Sharing. This protocol allows multiple parties (three or more) to aggregate their gradients without a trusted third party performing the aggregation. In other words, three different people can add their 3 numbers together without anyone ever learning the inputs of any other party.
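A minimal sketch of that three-party protocol in plain Python, assuming the modular form of additive sharing introduced later in this lesson (the same Q is reused). Each party splits its private input into one share per party and hands them out; each party then publishes only the sum of the shares it holds, and those partial sums add up to the total. The helper names make_shares and secure_sum are hypothetical, and secrets.randbelow is used because the random module is not intended for cryptography.

```python
import secrets
from typing import List

# Same modulus as used later in this lesson
Q = 23740629843760239486723


def make_shares(x: int, n_parties: int) -> List[int]:
    """Split x into n_parties additive shares modulo Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares sum to x (mod Q)
    shares.append((x - sum(shares)) % Q)
    return shares


def secure_sum(inputs: List[int]) -> int:
    """Sum private inputs without any party seeing another party's input."""
    n = len(inputs)
    # shares_held[j] collects the shares party j receives (one from each party)
    shares_held: List[List[int]] = [[] for _ in range(n)]
    for x in inputs:
        for j, share in enumerate(make_shares(x, n)):
            shares_held[j].append(share)
    # Each party publishes only the sum of the shares it holds
    partial_sums = [sum(held) % Q for held in shares_held]
    # Anyone can combine the published partial sums into the total
    return sum(partial_sums) % Q


print(secure_sum([5, 7, 11]))  # 23
```

With only two parties, each party could subtract its own input from the published total and recover the other's input, which is one reason the protocol is described for three or more parties.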

The key ideas:

* we aggregate the gradients while they remain in an encrypted state
* no individual party ever sees anyone else's gradients
* the values are under shared ownership/shared governance
* iterating (train locally, then aggregate securely) still minimizes the loss

Let's start by considering the number 5, which we'll put into a variable x.

In [65]:
x = 5

Let's say we wanted to SHARE the ownership of this number between two people, Alice and Bob. We could split this number into two shares, 2 and 3, and give one to Alice and one to Bob.

In [66]:
bob_x_share = 2
alice_x_share = 3

decrypted_x = bob_x_share + alice_x_share
decrypted_x

5

Note that neither Bob nor Alice knows the value of x. Each only knows the value of their own SHARE of x. Thus, the true value of x is hidden (i.e., encrypted).

The truly amazing thing, however, is that Alice and Bob can still compute using this value! They can perform arithmetic over the hidden value! Let's say Bob and Alice wanted to multiply this value by 2! If each of them multiplied their respective share by 2, then the hidden number between them is also multiplied! Check it out!

In [67]:
bob_x_share = 2 * 2
alice_x_share = 3 * 2

decrypted_x = bob_x_share + alice_x_share
decrypted_x

10

This even works for addition between two shared values!!

In [68]:
# encrypted "5"
bob_x_share = 2
alice_x_share = 3

# encrypted "7"
bob_y_share = 5
alice_y_share = 2

# encrypted 5 + 7
bob_z_share = bob_x_share + bob_y_share
alice_z_share = alice_x_share + alice_y_share

decrypted_z = bob_z_share + alice_z_share
decrypted_z

12

As you can see, we just added two numbers together while they were still encrypted!!!
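Why this works: decryption is just a sum, so the share-wise additions can be regrouped. Writing $x_B, x_A$ for Bob's and Alice's shares of $x$, and likewise for $y$ and $z$:

$$
\begin{aligned}
z_B + z_A &= (x_B + y_B) + (x_A + y_A) \\
          &= (x_B + x_A) + (y_B + y_A) \\
          &= x + y
\end{aligned}
$$

With the numbers above: $(2 + 5) + (3 + 2) = (2 + 3) + (5 + 2) = 5 + 7 = 12$. The same regrouping explains the earlier multiplication: $2x_B + 2x_A = 2(x_B + x_A) = 2x$.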

One small tweak - notice that since all our numbers are positive, it's possible for each share to reveal a little bit of information about the hidden value, namely, it's always greater than the share. Thus, if Bob has a share "3" then he knows that the encrypted value is at least 3.

This would be quite bad, but it can be solved with a simple fix: decryption sums all the shares together MODULO some large constant Q. I.e.

In [69]:
x = 5

Q = 23740629843760239486723

bob_x_share = 23552870267 # <- a random number
alice_x_share = Q - bob_x_share + x
alice_x_share

23740629843736686616461

In [70]:
(bob_x_share + alice_x_share) % Q

5

So now, as you can see, both shares are wildly larger than the number being shared, meaning that individual shares no longer leak this information. Yet all the properties we discussed earlier still hold (addition, encryption, decryption, etc.)!
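A quick check of that claim, as a sketch using the same Q: shares are drawn uniformly from [0, Q), and each party works only on its own share, reducing modulo Q. The share and reveal helpers are hypothetical names for this illustration.

```python
import secrets

Q = 23740629843760239486723


def share(x, n=2):
    # n - 1 uniformly random shares; the last one makes the total x (mod Q)
    shares = [secrets.randbelow(Q) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % Q)
    return shares


def reveal(shares):
    # decryption: add all shares and reduce modulo Q
    return sum(shares) % Q


x_shares = share(5)  # Bob holds x_shares[0], Alice holds x_shares[1]
y_shares = share(7)

# multiply by a public constant: each party scales only its own share
doubled = [(2 * s) % Q for s in x_shares]
# add two shared values: each party adds only its own two shares
summed = [(a + b) % Q for a, b in zip(x_shares, y_shares)]

print(reveal(doubled))  # 10
print(reveal(summed))   # 12
```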

# Project: Build Methods for Encrypt, Decrypt, and Add 

In this project, you must take the lessons we learned in the last section and write general methods for encrypt, decrypt, and add. Store shares for a variable in a tuple like so.

### define ```encrypt()```

In [19]:
x_share = (2,5,7)

Even though normally those shares would be distributed amongst several workers, you can store them in ordered tuples like this for now :)

```sum(shares) % Q = x```

In [72]:
import random

In [89]:
# input an integer x and the number of shares to create;
# the shares sum to x modulo Q, so sum(shares) % Q decrypts to x
def encrypt(x, n_shares=3):
    
    shares = list()

    # generate the first n_shares - 1 shares uniformly at random in [0, Q)
    # (random is fine for a demo; use the secrets module for real cryptography)
    for i in range(n_shares - 1):
        shares.append(random.randrange(Q))

    # choose the final share so that all shares sum to x (mod Q)
    final_share = (x - sum(shares)) % Q

    shares.append(final_share)

    return tuple(shares)

In [90]:
Q = 23740629843760239486723

In [91]:
encrypt(5, n_shares=10)

(21597179522291401762004,
 6416049415504118228326,
 14841024364321363973762,
 7947195663863917936334,
 21100714859057051088285,
 3741620568934529596861,
 22563181941837810698297,
 13571906918269134215161,
 21186138546649603116508,
 9478767261832506304805)

### define ```decrypt()```

In [94]:
def decrypt(shares):
    return sum(shares) % Q

In [95]:
decrypt(encrypt(5))

5

### define ```add()```

In [96]:
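# accepts two tuples of shares (one share per party) and adds them share-wise
def add(a, b):
    
    c = list()
    
    # both values must be split into the same number of shares
    assert len(a) == len(b)
    
    for i in range(len(a)):
        c.append((a[i] + b[i]) % Q)
        
    return tuple(c)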

In [97]:
decrypt(add(encrypt(5), encrypt(10)))

15

In [98]:
x = encrypt(5)
y = encrypt(10)

x, y

((7117559698689842068389, 4508699778082869664980, 12114370366987527753359),
 (15339271351164757262471, 7608963749108748285553, 792394743486733938709))

In [100]:
z = add(x,y)
z

(22456831049854599330860, 12117663527191617950533, 12906765110474261692068)

In [102]:
decrypt(z)

15

# Lesson: Intro to Fixed Precision Encoding

As you may remember, our goal is to aggregate gradients using this new Secret Sharing technique. However, the protocol we've just explored in the last section works over non-negative integers, while our neural network weights are NOT integers. Instead, our weights are decimals (floating point numbers).

Not a huge deal! We just need to use a fixed precision encoding, which lets us do computation over decimal numbers using integers!

* PRECISION: how many decimal places we want to keep when encoding our numbers
* BASE: 10 (normal decimal encoding) or 2 (binary)
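As a worked formula (writing $B$ for BASE and $P$ for PRECISION, and assuming $x$ has at most $P$ decimal places so that $x \cdot B^P$ is an integer), encoding scales by $B^P$ and wraps negative numbers into the top half of $[0, Q)$; decoding undoes both steps:

$$
\text{encode}(x) = \left(x \cdot B^{P}\right) \bmod Q,
\qquad
\text{decode}(v) =
\begin{cases}
v / B^{P} & \text{if } v \le Q/2 \\
(v - Q) / B^{P} & \text{otherwise}
\end{cases}
$$

For example, with $B = 10$ and $P = 4$, $3.5 \mapsto 35000$ and $-0.5 \mapsto Q - 5000$.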

In [108]:
BASE=10
# 4 decimal places
PRECISION=4
Q=23740629843760239486723

In [109]:
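def encode(x):
    # scale to PRECISION decimal places and round to the nearest integer,
    # then reduce mod Q so negative numbers wrap to the top of the range
    return round(x * BASE ** PRECISION) % Q

def decode(x):
    # values above Q // 2 represent negative numbers
    return (x if x <= Q // 2 else x - Q) / BASE ** PRECISION

In [105]:
# 3.5 means 35000
encode(3.5)

35000

In [106]:
decode(35000)

3.5

In [114]:
encode(-0.5), decode(encode(-0.5))

(23740629843760239481723, -0.5)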

In [115]:
decode(5000 + 5000)

1.0

In [110]:
x = encrypt(encode(5.5))
y = encrypt(encode(2.3))
z = add(x,y)
decode(decrypt(z))

7.8
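Putting both lessons together: the sketch below has three hypothetical data owners each encode and encrypt a gradient value, adds the encrypted values share-wise, and decrypts only the sum before dividing by the number of owners. It redefines the helpers from above so it runs on its own; encode rounds rather than truncates (an assumption chosen so values like 2.3 encode exactly), and secure_average is a hypothetical name.

```python
import secrets
from typing import List, Tuple

BASE = 10
PRECISION = 4
Q = 23740629843760239486723


def encode(x: float) -> int:
    # scale to PRECISION decimal places, round, then map into [0, Q)
    return round(x * BASE ** PRECISION) % Q


def decode(v: int) -> float:
    # values in the upper half of [0, Q) stand for negative numbers
    return (v if v <= Q // 2 else v - Q) / BASE ** PRECISION


def encrypt(x: int, n_shares: int = 3) -> Tuple[int, ...]:
    shares = [secrets.randbelow(Q) for _ in range(n_shares - 1)]
    shares.append((x - sum(shares)) % Q)
    return tuple(shares)


def decrypt(shares: Tuple[int, ...]) -> int:
    return sum(shares) % Q


def add(a: Tuple[int, ...], b: Tuple[int, ...]) -> Tuple[int, ...]:
    assert len(a) == len(b)
    return tuple((s + t) % Q for s, t in zip(a, b))


def secure_average(values: List[float]) -> float:
    """Average private floats; only the encrypted sum is ever decrypted."""
    encrypted = [encrypt(encode(v)) for v in values]
    total = encrypted[0]
    for e in encrypted[1:]:
        total = add(total, e)
    # divide in the clear, after decryption
    return decode(decrypt(total)) / len(values)


# hypothetical gradients for one weight, one value per data owner
print(secure_average([0.25, -0.5, 0.75]))
```

The division by the number of owners happens after decryption because shares are integers modulo Q and cannot simply be divided; this also means the average itself is visible in the clear.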

# Lesson: Secret Sharing + Fixed Precision in PySyft

While writing things from scratch is certainly educational, PySyft makes a great deal of this much easier for us through its abstractions.

In [117]:
bob = bob.clear_objects()
alice = alice.clear_objects()
secure_worker = secure_worker.clear_objects()

In [118]:
# create data
x = th.tensor([1,2,3,4,5])

### Secret Sharing Using PySyft

We can share using the simple .share() method!

In [119]:
# split the data into multiple different additive secret shares 
# then send those shares to bob, alice and secure_worker
x = x.share(bob, alice, secure_worker)
x

(Wrapper)>[AdditiveSharingTensor]
	-> (Wrapper)>[PointerTensor | me:59140615744 -> bob:17438411538]
	-> (Wrapper)>[PointerTensor | me:28503946172 -> alice:28493265064]
	-> (Wrapper)>[PointerTensor | me:38366978111 -> secure_worker:83251700870]
	*crypto provider: me*

In [120]:
bob._objects

{17438411538: tensor([4417300060625948145, 3784897679592191331, 4512047126824704739,
         3973372192880975151,  915133640708791685])}

As you can see, Bob now has one of the shares of x! Furthermore, we can still call addition in this state, and PySyft will automatically perform the remote execution for us!

In [121]:
y = x + x

In [122]:
y

(Wrapper)>[AdditiveSharingTensor]
	-> (Wrapper)>[PointerTensor | me:34836591538 -> bob:9053020198]
	-> (Wrapper)>[PointerTensor | me:93091533370 -> alice:64751503372]
	-> (Wrapper)>[PointerTensor | me:26606486632 -> secure_worker:27358104088]
	*crypto provider: me*

In [123]:
y.get()

tensor([ 2,  4,  6,  8, 10])
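Under the hood, the AdditiveSharingTensor above keeps one share per worker and applies + share by share. The toy class below is a hypothetical, purely local stand-in for that idea: it is not PySyft's AdditiveSharingTensor, does no networking, and reuses the Q from the earlier lesson rather than whatever field PySyft uses.

```python
import secrets
from typing import Dict, List

Q = 23740629843760239486723  # modulus from the earlier lesson (not PySyft's field)


class ToySharedVector:
    """Hypothetical local stand-in for an additively shared tensor."""

    def __init__(self, shares: Dict[str, List[int]]):
        # worker name -> that worker's share of every element
        self.shares = shares

    @classmethod
    def share(cls, values: List[int], *workers: str) -> "ToySharedVector":
        # every worker but the last gets uniformly random shares
        shares = {w: [secrets.randbelow(Q) for _ in values] for w in workers[:-1]}
        # the last worker's shares make each element sum to its value (mod Q)
        shares[workers[-1]] = [
            (v - sum(shares[w][i] for w in workers[:-1])) % Q
            for i, v in enumerate(values)
        ]
        return cls(shares)

    def __add__(self, other: "ToySharedVector") -> "ToySharedVector":
        # each worker adds its own two shares; nobody sees another worker's shares
        return ToySharedVector({
            w: [(a + b) % Q for a, b in zip(self.shares[w], other.shares[w])]
            for w in self.shares
        })

    def get(self) -> List[int]:
        # reconstruct: add the workers' shares element-wise and reduce modulo Q
        per_worker = list(self.shares.values())
        return [sum(column) % Q for column in zip(*per_worker)]


toy_x = ToySharedVector.share([1, 2, 3, 4, 5], "bob", "alice", "secure_worker")
toy_y = toy_x + toy_x
print(toy_y.get())  # [2, 4, 6, 8, 10]
```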

### Fixed Precision using PySyft

We can also convert a tensor to fixed precision using .fix_precision(), or its shorter alias .fix_prec() used below.

In [139]:
x = th.tensor([0.1,0.2,0.3])

In [140]:
x

tensor([0.1000, 0.2000, 0.3000])

In [141]:
x = x.fix_prec()

In [142]:
x

(Wrapper)>FixedPrecisionTensor>tensor([100, 200, 300])

In [143]:
x.child.child

tensor([100, 200, 300])

In [144]:
# the outer wrapper is a (hooked) torch tensor
type(x)

syft.frameworks.torch.tensors.interpreters.native.Tensor

In [145]:
# the FixedPrecisionTensor interprets the integers below as decimals
type(x.child)

syft.frameworks.torch.tensors.interpreters.precision.FixedPrecisionTensor

In [146]:
# the underlying tensor holding the actual integer encoding
type(x.child.child)

syft.frameworks.torch.tensors.interpreters.native.Tensor

In [132]:
x.float_prec()

tensor([0.1000, 0.2000, 0.3000])

In [148]:
y = x + x

In [149]:
y = y.float_prec()
y

tensor([0.2000, 0.4000, 0.6000])
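The integers [100, 200, 300] shown earlier suggest that .fix_prec() used base 10 with 3 fractional digits here. The plain-Python sketch below mimics that encoding under this assumption; fix_prec and float_prec are hypothetical list-based helpers, and rounding to the nearest integer is a choice of this sketch rather than a statement about how PySyft converts values.

```python
from typing import List

BASE = 10
PRECISION_FRACTIONAL = 3  # assumed from the [100, 200, 300] output above


def fix_prec(values: List[float]) -> List[int]:
    # scale each value by BASE**PRECISION_FRACTIONAL and round to an integer
    return [round(v * BASE ** PRECISION_FRACTIONAL) for v in values]


def float_prec(values: List[int]) -> List[float]:
    # divide the scale back out
    return [v / BASE ** PRECISION_FRACTIONAL for v in values]


encoded = fix_prec([0.1, 0.2, 0.3])
print(encoded)              # [100, 200, 300]

# adding encodings adds the underlying decimals (both share the same scale)
doubled = [a + b for a, b in zip(encoded, encoded)]
print(float_prec(doubled))  # [0.2, 0.4, 0.6]
```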

### Shared Fixed Precision

And of course, we can combine the two!

In [150]:
x = th.tensor([0.1, 0.2, 0.3])

In [151]:
x = x.fix_prec().share(bob, alice, secure_worker)

In [154]:
x

(Wrapper)>FixedPrecisionTensor>(Wrapper)>[AdditiveSharingTensor]
	-> (Wrapper)>[PointerTensor | me:44079632468 -> bob:25617235153]
	-> (Wrapper)>[PointerTensor | me:30051213728 -> alice:17812033504]
	-> (Wrapper)>[PointerTensor | me:15985362009 -> secure_worker:60132834401]
	*crypto provider: me*

In [152]:
y = x + x

In [153]:
y.get().float_prec()

tensor([0.2000, 0.4000, 0.6000])

Note that while the individual shares stay hidden, the result is decrypted by .get(), so whoever receives it can see the aggregate (e.g. the model averages) in the clear.

# Final Project: Federated Learning with Encrypted Gradient Aggregation

* build on the project with a trusted aggregator
* aggregate gradients using additive secret sharing and fixed precision encoding
* use three data owners per aggregation
* replace the trusted third party with this encryption protocol (additive secret sharing) so that no one has to share their own gradients with any other worker directly
* instead, each data owner encrypts its individual values
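A framework-free sketch of the aggregation step this project asks for, assuming three data owners whose linear-model parameters are given as plain dicts of flat float lists (a hypothetical format). Each owner encodes every parameter in fixed precision and splits it into one share per owner; each owner only keeps running sums of the shares it receives, and only the combination of all owners' sums is decoded. encrypted_average and the third owner's parameters are illustrative, not part of the original project.

```python
import secrets
from typing import Dict, List

BASE, PRECISION = 10, 4
Q = 23740629843760239486723
Params = Dict[str, List[float]]  # hypothetical format: name -> flat values


def encode(x: float) -> int:
    return round(x * BASE ** PRECISION) % Q


def decode(v: int) -> float:
    return (v if v <= Q // 2 else v - Q) / BASE ** PRECISION


def split(v: int, n: int) -> List[int]:
    shares = [secrets.randbelow(Q) for _ in range(n - 1)]
    shares.append((v - sum(shares)) % Q)
    return shares


def encrypted_average(models: List[Params]) -> Params:
    n = len(models)
    # held[j][name][k]: running sum of the shares owner j holds for element k
    held = [{name: [0] * len(vals) for name, vals in models[0].items()}
            for _ in range(n)]
    for model in models:
        for name, vals in model.items():
            for k, value in enumerate(vals):
                # the owner splits its encoded value and hands share j to owner j
                for j, share in enumerate(split(encode(value), n)):
                    held[j][name][k] = (held[j][name][k] + share) % Q
    # only the combination of all owners' running sums is ever decoded
    averaged: Params = {}
    for name, vals in models[0].items():
        averaged[name] = [
            decode(sum(held[j][name][k] for j in range(n)) % Q) / n
            for k in range(len(vals))
        ]
    return averaged


bobs_params = {"weight": [0.5, -0.25], "bias": [0.1]}
alices_params = {"weight": [0.25, 0.5], "bias": [-0.4]}
james_params = {"weight": [0.75, 0.5], "bias": [0.3]}  # hypothetical third owner
print(encrypted_average([bobs_params, alices_params, james_params]))
```

The averaged parameters are decoded in the clear, so whoever runs the final step still sees the model average, just not any individual owner's values.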

#### clear objects on workers

In [177]:
bob = bob.clear_objects()
alice = alice.clear_objects()
secure_worker = secure_worker.clear_objects()

#### create dataset for simple linear model

In [178]:
# A Toy Dataset
data = th.tensor([[0,0],[0,1],[1,0],[1,1.]], requires_grad=True)
target = th.tensor([[0],[0],[1],[1.]], requires_grad=True) 

In [179]:
# get pointers to training data on each worker by
# sending some training data to bob and alice
# bob data
bobs_data = data[0:2].send(bob)
bobs_target = target[0:2].send(bob)
# alice data
alices_data = data[2:].send(alice)
alices_target = target[2:].send(alice)

#### initialize a linear model

In [180]:
# initialize toy model
model = nn.Linear(2,1)

#### securing federated learning

* we need two copies of the model, one sent to each worker, so that the locally trained copies can be averaged afterwards
* we need two optimizers: one for the parameters of bob's model and one for the parameters of alice's

In [169]:
# copy the model and send it to alice
alices_model = model.copy().send(alice)

alices_opt = optim.SGD(params=alices_model.parameters(), lr=0.1)

# for alice
# zero out the gradients
alices_model.zero_grad()
# make prediction
alices_pred = alices_model(alices_data)
# calculate the loss (sum of squared errors)
alices_loss = ((alices_pred - alices_target)**2).sum()
# backpropagate
alices_loss.backward()
# make an optimizer step to update the weights 
alices_opt.step()
# get the loss
alices_loss = alices_loss.get().data

In [170]:
# copy the model and send it to bob
bobs_model = model.copy().send(bob)
# initialize the optimizers
bobs_opt = optim.SGD(params=bobs_model.parameters(), lr=0.1)

bobs_model.zero_grad()
# make prediction
bobs_pred = bobs_model(bobs_data)
# calculate the loss (sum of squared errors)
bobs_loss = ((bobs_pred - bobs_target)**2).sum()
# backpropagate
bobs_loss.backward()
# make an optimizer step to update the weights 
bobs_opt.step()
# get the loss
bobs_loss = bobs_loss.get().data

In [171]:
# encode each model in fixed precision and secret-share it between alice and bob
alices_model_info = alices_model.fix_precision().share(alice, bob, 
                                                       crypto_provider=secure_worker)
bobs_model_info = bobs_model.fix_precision().share(alice, bob, 
                                                   crypto_provider=secure_worker)
# bring the (still secret-shared) weights back as local handles
alices_weights = alices_model_info.weight.get()
bobs_weights = bobs_model_info.weight.get()

# add the encrypted weights first, then decrypt and decode only the sum
weights_sum = (alices_weights.child + bobs_weights.child).get().float_precision()

In [172]:
# same for the bias
alices_bias = alices_model_info.bias.get()
bobs_bias = bobs_model_info.bias.get()

bias_sum = (alices_bias.child + bobs_bias.child).get().float_precision()

In [176]:
with th.no_grad():
    # average the models: divide the decrypted sums by the number of data owners
    model.weight.set_(weights_sum / 2)
    model.bias.set_(bias_sum / 2)

In [181]:
for round_iter in range(10):

    # copy the model and send one copy to each data owner
    bobs_model = model.copy().send(bob)
    alices_model = model.copy().send(alice)

    # initialize the optimizers
    bobs_opt = optim.SGD(params=bobs_model.parameters(), lr=0.1)
    alices_opt = optim.SGD(params=alices_model.parameters(), lr=0.1)

    # train each copy locally for 10 iterations

    for i in range(10):

        # for bob
        # zero out the gradients
        bobs_model.zero_grad()
        # make prediction
        bobs_pred = bobs_model(bobs_data)
        # calculate the loss (sum of squared errors)
        bobs_loss = ((bobs_pred - bobs_target)**2).sum()
        # backpropagate
        bobs_loss.backward()
        # make an optimizer step to update the weights 
        bobs_opt.step()
        # get the loss
        bobs_loss = bobs_loss.get().data
        #print(bobs_loss)

        # for alice
        # zero out the gradients
        alices_model.zero_grad()
        # make prediction
        alices_pred = alices_model(alices_data)
        # calculate the loss (sum of squared errors)
        alices_loss = ((alices_pred - alices_target)**2).sum()
        # backpropagate
        alices_loss.backward()
        # make an optimizer step to update the weights 
        alices_opt.step()
        # get the loss
        alices_loss = alices_loss.get().data
        #print(alices_loss)


    # encode each model in fixed precision and secret-share it between alice and bob
    alices_model_info = alices_model.fix_precision().share(alice, bob, 
                                                           crypto_provider=secure_worker)
    bobs_model_info = bobs_model.fix_precision().share(alice, bob, 
                                                       crypto_provider=secure_worker)
    # bring the (still secret-shared) weights and biases back as local handles
    alices_weights = alices_model_info.weight.get()
    bobs_weights = bobs_model_info.weight.get()
    alices_bias = alices_model_info.bias.get()
    bobs_bias = bobs_model_info.bias.get()

    # add the encrypted values first, then decrypt and decode only the sums
    weights_sum = (alices_weights.child + bobs_weights.child).get().float_precision()
    bias_sum = (alices_bias.child + bobs_bias.child).get().float_precision()

    with th.no_grad():
        # average the models: divide the decrypted sums by the number of data owners
        model.weight.set_(weights_sum / 2)
        model.bias.set_(bias_sum / 2)
        
    secure_worker.clear_objects()
    
    print("Bob:" + str(bobs_loss) + " Alice:" + str(alices_loss))

Bob:tensor(0.0291) Alice:tensor(0.0121)
Bob:tensor(0.0084) Alice:tensor(0.0019)
Bob:tensor(0.0030) Alice:tensor(0.0002)
Bob:tensor(0.0013) Alice:tensor(1.3129e-06)
Bob:tensor(0.0007) Alice:tensor(1.4935e-05)
Bob:tensor(0.0004) Alice:tensor(2.8330e-05)
Bob:tensor(0.0003) Alice:tensor(3.0401e-05)
Bob:tensor(0.0002) Alice:tensor(2.8501e-05)
Bob:tensor(0.0001) Alice:tensor(2.3376e-05)
Bob:tensor(0.0001) Alice:tensor(1.8576e-05)
