# Section: Securing Federated Learning

- Lesson 1: Trusted Aggregator
- Lesson 2: Intro to Additive Secret Sharing
- Lesson 3: Intro to Fixed Precision Encoding
- Lesson 4: Secret Sharing + Fixed Precision in PySyft
- Final Project: Federated Learning with Encrypted Gradient Aggregation

# Lesson: Federated Learning with a Trusted Aggregator

In the last section, we learned how to train a model on a distributed dataset using Federated Learning. In particular, the last project aggregated gradients directly from one data owner to another. 

While exchanging updates directly can work in some cases, it would be even better to choose a neutral third party to perform the aggregation, so that the data owners never see each other's model updates.

As it turns out, we can use the same tools we used previously to accomplish this.
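As a library-free sketch of what the neutral aggregator computes in this lesson, assume each data owner hands over its locally trained parameters as a flat list of floats; the aggregator returns their element-wise mean. The function name `average_models` and the list-of-floats representation are illustrative assumptions, not PySyft API.

```python
from typing import List


def average_models(param_lists: List[List[float]]) -> List[float]:
    """Element-wise mean of several parameter vectors (plain model averaging)."""
    if not param_lists:
        raise ValueError("need at least one model")
    n_params = len(param_lists[0])
    if any(len(p) != n_params for p in param_lists):
        raise ValueError("all models must have the same number of parameters")
    n_models = len(param_lists)
    # zip(*...) groups the i-th parameter of every model together
    return [sum(values) / n_models for values in zip(*param_lists)]


# Hypothetical parameters [w1, w2, b] from Bob's and Alice's local models
bobs_params = [0.25, -0.5, 0.125]
alices_params = [0.75, 0.5, -0.125]
print(average_models([bobs_params, alices_params]))  # [0.5, 0.0, 0.0]
```

Note that the aggregator sees every individual parameter vector in the clear; that is exactly the trust assumption the later lessons try to remove.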

# Project: Federated Learning with a Trusted Aggregator

In [None]:
# try this project here!

In [None]:
import syft as sy
import torch as th
hook = sy.TorchHook(th)
from torch import nn, optim

In [None]:
bob = sy.VirtualWorker(hook, id='bob')
alice = sy.VirtualWorker(hook, id='alice')
secure_worker = sy.VirtualWorker(hook, id='secure_worker')

In [None]:
bob.add_workers([alice, secure_worker])
alice.add_workers([bob, secure_worker])
secure_worker.add_workers([alice, bob])

In [None]:
data = th.tensor([[0,0], [0,1], [1,0], [1.,1]], requires_grad=True)
target = th.tensor([[0], [0], [1], [1.]], requires_grad=True)

In [None]:
bobs_data = data[0:2].send(bob)
bobs_target = target[0:2].send(bob)

In [None]:
alices_data = data[2:4].send(alice)
alices_target = target[2:4].send(alice)

In [None]:
model = nn.Linear(2, 1)

In [None]:
bobs_model = model.copy().send(bob) 

In [None]:
alices_model = model.copy().send(alice)

In [None]:
bob._objects

In [None]:
alice._objects

In [None]:
bobs_opt = optim.SGD(params=bobs_model.parameters(), lr=0.1)

In [None]:
alices_opt = optim.SGD(params=alices_model.parameters(), lr=0.1)

In [None]:
bobs_opt.zero_grad()
bobs_pred = bobs_model(bobs_data)
bobs_loss = ((bobs_pred - bobs_target) ** 2).sum()
bobs_loss.backward()

bobs_opt.step()
bobs_loss = bobs_loss.get().data
bobs_loss

In [None]:
alices_opt.zero_grad()
alices_pred = alices_model(alices_data)
alices_loss = ((alices_pred - alices_target) ** 2).sum()
alices_loss.backward()

alices_opt.step()
alices_loss = alices_loss.get().data
alices_loss

In [None]:
for i in range(10):
    # One local SGD step on Bob's data
    bobs_opt.zero_grad()
    bobs_pred = bobs_model(bobs_data)
    bobs_loss = ((bobs_pred - bobs_target) ** 2).sum()
    bobs_loss.backward()
    bobs_opt.step()
    bobs_loss = bobs_loss.get().data

    # One local SGD step on Alice's data
    alices_opt.zero_grad()
    alices_pred = alices_model(alices_data)
    alices_loss = ((alices_pred - alices_target) ** 2).sum()
    alices_loss.backward()
    alices_opt.step()
    alices_loss = alices_loss.get().data

In [None]:
bobs_loss, alices_loss

In [None]:
alices_model.move(secure_worker)
bobs_model.move(secure_worker)

In [None]:
with th.no_grad():
    model.weight.set_(((alices_model.weight + bobs_model.weight) / 2).get())
    model.bias.set_(((alices_model.bias + bobs_model.bias) / 2).get())

In [None]:
for round_iter in range(10):
    # Send a fresh copy of the global model to each data owner
    bobs_model = model.copy().send(bob)
    alices_model = model.copy().send(alice)

    bobs_opt = optim.SGD(params=bobs_model.parameters(), lr=0.1)
    alices_opt = optim.SGD(params=alices_model.parameters(), lr=0.1)

    # Each data owner trains locally for 10 steps
    for i in range(10):
        bobs_opt.zero_grad()
        bobs_pred = bobs_model(bobs_data)
        bobs_loss = ((bobs_pred - bobs_target) ** 2).sum()
        bobs_loss.backward()
        bobs_opt.step()
        bobs_loss = bobs_loss.get().data

        alices_opt.zero_grad()
        alices_pred = alices_model(alices_data)
        alices_loss = ((alices_pred - alices_target) ** 2).sum()
        alices_loss.backward()
        alices_opt.step()
        alices_loss = alices_loss.get().data

    # Move both models to the trusted aggregator and average them there
    alices_model.move(secure_worker)
    bobs_model.move(secure_worker)
    with th.no_grad():
        model.weight.set_(((alices_model.weight + bobs_model.weight) / 2).get())
        model.bias.set_(((alices_model.bias + bobs_model.bias) / 2).get())

    print('Bob:' + str(bobs_loss) + ' Alice:' + str(alices_loss))

# Lesson: Intro to Additive Secret Sharing

While having a trusted third party perform the aggregation is certainly nice, in an ideal setting we wouldn't have to trust anyone at all. This is where cryptography can provide an interesting alternative.

Specifically, we're going to look at a simple protocol for Secure Multi-Party Computation called Additive Secret Sharing. This protocol allows multiple parties (three or more) to aggregate their gradients without a trusted third party performing the aggregation. (With only two parties, each could subtract its own input from the total and recover the other's.) In other words, three people can add their three numbers together without anyone ever learning anyone else's input.
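To make that claim concrete before the lesson builds up the mechanics step by step, here is a minimal simulation. It assumes three parties, a public modulus `Q` (the same value used later in this lesson), and Python's `secrets` module as the randomness source; the function names are illustrative. Each party splits its private value into additive shares and hands one share to each party; every party publishes only the sum of the shares it received, and adding those partial sums modulo `Q` reveals the total and nothing else.

```python
import secrets

Q = 23740629843760239486723  # public modulus (same value the lesson uses later)


def make_shares(value, n_parties, q=Q):
    """Split value into n_parties additive shares that sum to value modulo q."""
    shares = [secrets.randbelow(q) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % q)
    return shares


def secure_sum(private_values, q=Q):
    n = len(private_values)
    # received[j] collects the shares sent to party j, one from every party
    received = [[] for _ in range(n)]
    for value in private_values:
        for j, share in enumerate(make_shares(value, n, q)):
            received[j].append(share)
    # Each party publishes only the sum of what it received
    partial_sums = [sum(r) % q for r in received]
    return sum(partial_sums) % q


print(secure_sum([5, 7, 11]))  # 23
```

Every individual share is a uniformly random number, so no party learns another's input from the shares it holds; only the final total is revealed.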

Let's start by considering the number 5, which we'll put into a variable x.

In [None]:
x = 5

Let's say we wanted to SHARE the ownership of this number between two people, Alice and Bob. We could split this number into two shares, 2 and 3, and give one to Bob and one to Alice:

In [None]:
bob_x_share = 2
alice_x_share = 3

decrypted_x = bob_x_share + alice_x_share
decrypted_x

Note that neither Bob nor Alice knows the value of x. Each only knows the value of their own SHARE of x. Thus, the true value of x is hidden (i.e., encrypted).

The truly amazing thing, however, is that Alice and Bob can still compute using this value: they can perform arithmetic over the hidden value! Let's say Bob and Alice want to multiply this value by 2. If each of them multiplies their own share by 2, then the hidden number between them is also multiplied by 2. Check it out!

In [None]:
bob_x_share = 2 * 2
alice_x_share = 3 * 2

decrypted_x = bob_x_share + alice_x_share
decrypted_x

This even works for addition between two shared values!!

In [None]:
# encrypted "5"
bob_x_share = 2
alice_x_share = 3

# encrypted "7"
bob_y_share = 5
alice_y_share = 2

# encrypted 5 + 7
bob_z_share = bob_x_share + bob_y_share
alice_z_share = alice_x_share + alice_y_share

decrypted_z = bob_z_share + alice_z_share
decrypted_z

As you can see, we just added two numbers together while they were still encrypted!!!

One small tweak: since all our numbers are positive, each share reveals a little bit of information about the hidden value, namely that the value is at least as large as the share. For example, if Bob holds the share 3, then he knows that the encrypted value is at least 3.

This would be quite bad, but it can be solved with a simple fix: decryption sums all the shares together modulo some large constant Q, i.e.:

In [None]:
x = 5

Q = 23740629843760239486723

bob_x_share = 23552870267 # <- a random number
alice_x_share = Q - bob_x_share + x
alice_x_share

In [None]:
(bob_x_share + alice_x_share) % Q 

So now, as you can see, both shares are wildly larger than the number being shared, meaning that individual shares no longer leak this information. However, all the properties we discussed earlier still hold (addition, encryption, decryption, etc.)!
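Why do large random shares stop leaking information? With a toy modulus small enough to enumerate (`Q_TOY = 11`, an assumption of this sketch; the argument is the same for the lesson's large `Q`), we can check that for every secret `x`, Alice's share takes every value in `0..Q_TOY-1` exactly once as Bob's random share varies. Observing her share alone therefore says nothing about `x`.

```python
from collections import Counter

Q_TOY = 11  # toy modulus, small enough to enumerate every possible share


def alice_share(x, bob_share, q=Q_TOY):
    # Alice's share is whatever makes the two shares sum to x modulo q
    return (x - bob_share) % q


def alice_share_distribution(x, q=Q_TOY):
    # Bob's share is uniform over 0..q-1, so enumerate all of its values
    return Counter(alice_share(x, r, q) for r in range(q))


uniform = Counter({v: 1 for v in range(Q_TOY)})
print(all(alice_share_distribution(x) == uniform for x in range(Q_TOY)))  # True
```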

# Project: Build Methods for Encrypt, Decrypt, and Add 

In this project, you must take the lessons we learned in the last section and write general methods for encrypt, decrypt, and add. Store shares for a variable in a tuple like so.

In [None]:
x_share = (2,5,7)

Even though normally those shares would be distributed amongst several workers, you can store them in ordered tuples like this for now :)
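One way to approach the project, sketched under a few assumptions: shares live in a plain tuple, `Q` is the lesson's modulus, values above `Q // 2` represent negative numbers, and the names `encrypt_shares`, `decrypt_shares`, and `add_shares` are hypothetical (chosen so they don't clash with the cells below). Decryption here uses only the shares, never the plaintext. Python's `random` module is used to match the lesson; it is not cryptographically secure, and `secrets.randbelow` would be the safer choice in practice.

```python
import random

Q = 23740629843760239486723  # same modulus as in the lesson


def encrypt_shares(x, n_shares=3, q=Q):
    """Split integer x into n_shares additive shares modulo q."""
    shares = [random.randrange(q) for _ in range(n_shares - 1)]
    # The last share makes the total congruent to x modulo q
    shares.append((x - sum(shares)) % q)
    return tuple(shares)


def decrypt_shares(shares, q=Q):
    """Recombine shares; results above q // 2 are read as negative numbers."""
    value = sum(shares) % q
    return value if value <= q // 2 else value - q


def add_shares(a, b, q=Q):
    """Add two shared values; each party only adds the two shares it holds."""
    assert len(a) == len(b), "both values need the same number of shares"
    return tuple((sa + sb) % q for sa, sb in zip(a, b))


x = encrypt_shares(5)
y = encrypt_shares(-12)
print(decrypt_shares(add_shares(x, y)))  # -7
```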

In [None]:
# try this project here!

In [None]:
import random

In [None]:
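def encrypt(x, n_shares=3):
    # n_shares - 1 shares drawn uniformly at random from [0, Q)
    shares = [random.randrange(Q) for _ in range(n_shares - 1)]
    # The final share makes all shares sum to x modulo Q
    final_value = (Q - sum(shares) % Q + x) % Q
    shares.append(final_value)
    # x is returned only so later cells can check results; decryption ignores it
    return x, tuple(shares)

def decrypt(x, shares):
    # x is ignored: the value is recovered from the shares alone
    value = sum(shares) % Q
    # Values above Q // 2 represent negative numbers
    return value if value <= Q // 2 else value - Q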

In [None]:
encrypt(5, n_shares=3)

In [None]:
decrypt(*encrypt(5, n_shares=3))

In [None]:
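def encrypt(x, n_shares=3):
    shares = [random.randrange(Q) for _ in range(n_shares - 1)]
    # Reducing modulo Q keeps the final share in [0, Q) even when x is negative
    final_value = (x - sum(shares)) % Q
    shares.append(final_value)
    # x is returned only so later cells can check results; decryption ignores it
    return x, tuple(shares)

def decrypt(x, shares):
    # x is ignored: the value is recovered from the shares alone
    value = sum(shares) % Q
    # Values above Q // 2 represent negative numbers
    return value if value <= Q // 2 else value - Q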

In [None]:
decrypt(*encrypt(-5, n_shares=3))

In [None]:
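def add(a, b):
    # a and b are (plaintext, shares) pairs as returned by encrypt;
    # the plaintext sum is carried along only for checking results
    assert len(a[1]) == len(b[1]), "both values need the same number of shares"
    # Each party adds the two shares it holds; no communication is needed
    shares = tuple((sa + sb) % Q for sa, sb in zip(a[1], b[1]))
    return a[0] + b[0], shares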

In [None]:
decrypt(*add(encrypt(5), encrypt(10)))

In [None]:
x = encrypt(5)
x

In [None]:
y = encrypt(10)
y

In [None]:
z = add(x, y)
z

In [None]:
decrypt(*z)

In [None]:
decrypt(*encrypt(15))

# Lesson: Intro to Fixed Precision Encoding

As you may remember, our goal is to aggregate gradients using this new Secret Sharing technique. However, the protocol we've just explored works only with integers (modulo Q), and our neural network weights are NOT integers: they are decimals (floating-point numbers).

Not a huge deal! We just need to use a fixed precision encoding, which lets us do computation over decimal numbers using integers!
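Written out as formulas, the encoding below uses base $B$ (`BASE`), precision $P$ (`PRECISION`), and the modulus $Q$ from the previous lesson; rounding to the nearest integer is assumed here.

$$
\mathrm{encode}(x) = \operatorname{round}\!\left(x \cdot B^{P}\right) \bmod Q,
\qquad
\mathrm{decode}(v) =
\begin{cases}
v / B^{P}, & v \le Q/2,\\
(v - Q) / B^{P}, & v > Q/2.
\end{cases}
$$

With $B = 10$ and $P = 4$: $\mathrm{encode}(3.5) = 35000$, while $\mathrm{encode}(-0.5) = Q - 5000 > Q/2$, so decoding gives $(Q - 5000 - Q)/10^{4} = -0.5$. For $a$ and $b$ with at most $P$ decimal places, $\mathrm{decode}\big((\mathrm{encode}(a) + \mathrm{encode}(b)) \bmod Q\big) = a + b$ as long as $|a + b| \cdot B^{P} \le Q/2$, which is why addition of encoded values can be done directly on the integer shares.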

In [None]:
BASE = 10
PRECISION = 4

In [None]:
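def encode(x):
    # Round rather than truncate: int(0.57 * 10**4) would give 5699, not 5700
    return round(x * BASE ** PRECISION) % Q

def decode(x):
    # Values above Q // 2 represent negative numbers
    return (x if x <= Q // 2 else x - Q) / BASE ** PRECISION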

In [None]:
encode(3.5)

In [None]:
decode(35000)

In [None]:
Q

In [None]:
decode(encode(-0.5))

In [None]:
decode(5000 + 5000)

In [None]:
x = encrypt(encode(5.5))
y = encrypt(encode(2.3))
z = add(x, y)
decode(decrypt(*z))

# Lesson: Secret Sharing + Fixed Precision in PySyft

While writing things from scratch is certainly educational, PySyft makes a great deal of this much easier for us through its abstractions.

In [None]:
import torch as th
import syft as sy

hook = sy.TorchHook(th)

In [None]:
bob = sy.VirtualWorker(hook, id='bob')
alice = sy.VirtualWorker(hook, id='alice')
secure_worker = sy.VirtualWorker(hook, id='secure_worker')

In [None]:
bob = bob.clear_objects()
alice = alice.clear_objects()
secure_worker = secure_worker.clear_objects()

In [None]:
x = th.tensor([1,2,3,4,5])
x

### Secret Sharing Using PySyft

We can share using the simple .share() method!

In [None]:
x = x.share(bob, alice, secure_worker)
x

In [None]:
bob._objects

As you can see, Bob now holds one of the shares of x! Furthermore, we can still call addition in this state, and PySyft will automatically perform the remote execution for us!

In [None]:
y = x + x

In [None]:
y

In [None]:
y.get()

In [None]:
x.get()

### Fixed Precision using PySyft

We can also convert a tensor to fixed precision using .fix_precision(), or its shorthand .fix_prec(), which the cells below use:

In [None]:
x = th.tensor([0.1,0.2,0.3])

In [None]:
x

In [None]:
x = x.fix_prec()
x

In [None]:
x.child.child

In [None]:
y = x + x
y

In [None]:
y = y.float_prec()
y

### Shared Fixed Precision

And of course, we can combine the two!

In [None]:
x = th.tensor([0.1, 0.2, 0.3])
x

In [None]:
x = x.fix_prec().share(bob, alice, secure_worker)
x

In [None]:
y = x + x
y

In [None]:
y.get().float_prec()

In [None]:
x.get().float_prec()

In [None]:
x = th.tensor([0.1, 0.2, 0.3])
x = x.fix_prec().share(bob, alice, secure_worker)
y = th.tensor([0.2, 0.3, 0.4])
y = y.fix_prec().share(bob, alice, secure_worker)
z = x + y
x.get().float_prec(), y.get().float_prec(), z.get().float_prec()

In [None]:
x = th.tensor([0.1, 0.2, 0.3])
x = x.fix_prec(base=10, precision_fractional=4)
x

Keep in mind that secret sharing hides only the individual inputs: whoever calls .get() on the aggregated result sees the model average in the clear.

# Final Project: Federated Learning with Encrypted Gradient Aggregation

In [4]:
import syft as sy
import torch as th
hook = sy.TorchHook(th)
from torch import nn, optim
import numpy as np
import torch

In [5]:
bob = sy.VirtualWorker(hook, id='bob')
alice = sy.VirtualWorker(hook, id='alice')
server_1 = sy.VirtualWorker(hook, id='server_1')
server_2 = sy.VirtualWorker(hook, id='server_2')
server_3 = sy.VirtualWorker(hook, id='server_3')

In [6]:
bob.clear_objects()
alice.clear_objects()
server_1.clear_objects()
server_2.clear_objects()
server_3.clear_objects()

<VirtualWorker id:server_3 #objects:0>

In [7]:
bob.add_workers([alice, server_1, server_2, server_3])
alice.add_workers([bob, server_1, server_2, server_3])
server_1.add_workers([alice, bob, server_2, server_3])
server_2.add_workers([alice, bob, server_1, server_3])
server_3.add_workers([alice, bob, server_1, server_2])

<VirtualWorker id:server_3 #objects:0>

In [8]:
# Alternative: generate a synthetic dataset in memory (target = x1 + x2 + 1)
# instead of loading data.pt / target.pt in the next cell.
# data = np.random.rand(10000, 2)
# target = np.sum(data, axis=1) + 1
# data = th.tensor(data, dtype=th.float32, requires_grad=True)
# target = th.tensor(target, dtype=th.float32, requires_grad=True).reshape(len(data), -1)

In [9]:
data = torch.load('data.pt')
target = torch.load('target.pt')

data_train = data[0:6000]
target_train = target[0:6000]

data_val = data[6000:8000]
target_val = target[6000:8000]

data_test = data[8000:10000]
target_test = target[8000:10000]

bobs_data = data_train[0:3000].send(bob)
bobs_target = target_train[0:3000].send(bob)

alices_data = data_train[3000:6000].send(alice)
alices_target = target_train[3000:6000].send(alice)


In [None]:
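# Alternative: the 4-example toy dataset from the earlier lessons. Running this
# cell overwrites data, target, and the data sent to Bob and Alice above.
data = th.tensor([[0,0], [0,1], [1,0], [1.,1]], requires_grad=True)
target = th.tensor([[0], [0], [1], [1.]], requires_grad=True)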

bobs_data = data[0:2].send(bob)
bobs_target = target[0:2].send(bob)

alices_data = data[2:4].send(alice)
alices_target = target[2:4].send(alice)

In [None]:
# FedAvg algorithm -- model averaging
model = nn.Linear(2, 1)
loss_ModAvg = []
for round_iter in range(100):
    bobs_model = model.copy().send(bob) 
    alices_model = model.copy().send(alice)
    
    bobs_opt = optim.SGD(params=bobs_model.parameters(), lr=0.1)
    alices_opt = optim.SGD(params=alices_model.parameters(), lr=0.1)
    
    for i in range(10):
        bobs_opt.zero_grad()
        bobs_pred = bobs_model(bobs_data)
        bobs_loss = ((bobs_pred - bobs_target) ** 2).sum() / len(bobs_data)
        bobs_loss.backward()
                
        bobs_opt.step()                      
        bobs_loss = bobs_loss.get().data
        bobs_loss

        alices_opt.zero_grad()
        alices_pred = alices_model(alices_data)
        alices_loss = ((alices_pred - alices_target) ** 2).sum() / len(alices_data)
        alices_loss.backward()

        alices_opt.step()               
        alices_loss = alices_loss.get().data
        alices_loss    
      
    #with th.no_grad():
    bobs_model.get()
    alices_model.get()
    
    with th.no_grad():
        for param, alices_param, bobs_param in zip(model.parameters(), alices_model.parameters(), bobs_model.parameters()):
            param.set_((alices_param + bobs_param) / 2)
                    
    #secure_worker.clear_objects()
    
    print('Bob:' + str(bobs_loss) + ' Alice:' + str(alices_loss))    
    pred = model(data_train)
    loss = ((pred - target_train) ** 2).sum() / len(data_train)
    loss = loss.data
    loss_ModAvg.append(loss.item())

In [None]:
# FedAvg algorithm -- gradient averaging
model = nn.Linear(2, 1)
loss_GradAvg = []
for round_iter in range(10000):
    bobs_model = model.copy().send(bob) 
    alices_model = model.copy().send(alice)
    
    bobs_opt = optim.SGD(params=bobs_model.parameters(), lr=0.1)
    alices_opt = optim.SGD(params=alices_model.parameters(), lr=0.1)
    #opt = optim.SGD(params=model.parameters(), lr=0.1)
    
    #for i in range(10):
    bobs_opt.zero_grad()
    bobs_pred = bobs_model(bobs_data)
    bobs_loss = ((bobs_pred - bobs_target) ** 2).sum() / len(bobs_data)
    bobs_loss.backward()    

    #bobs_opt.step()
    bobs_loss = bobs_loss.get().data
    bobs_loss

    alices_opt.zero_grad()
    alices_pred = alices_model(alices_data)
    alices_loss = ((alices_pred - alices_target) ** 2).sum() / len(alices_data)
    alices_loss.backward()
    
    #alices_opt.step()
    alices_loss = alices_loss.get().data
    alices_loss
               
    #add noise sampled from laplace distribution 
    noise = np.random.laplace(0, 1, 1) / len(alices_data)
    noise = th.tensor(noise, dtype=torch.float32)
    alices_model.weight.grad_noise = alices_model.weight.grad + noise.copy().send(alice)
    alices_model.bias.grad_noise = alices_model.bias.grad + noise.copy().send(alice)
    noise = np.random.laplace(0, 1, 1) / len(bobs_data)
    noise = th.tensor(noise, dtype=torch.float32)
    bobs_model.weight.grad_noise = bobs_model.weight.grad + noise.copy().send(bob)
    bobs_model.bias.grad_noise = bobs_model.bias.grad + noise.copy().send(bob)    
    
    bobs_model.get()
    alices_model.get()  
    
    alices_model.weight.grad = alices_model.weight.grad_noise.get()
    alices_model.bias.grad = alices_model.bias.grad_noise.get()
    bobs_model.weight.grad = bobs_model.weight.grad_noise.get()
    bobs_model.bias.grad = bobs_model.bias.grad_noise.get()      
        
    for alices_param, bobs_param in zip(alices_model.parameters(), bobs_model.parameters()):
        # Secret-share both gradients across the three servers, add them while shared,
        # and reconstruct only the sum. Compute the average once and give it to both
        # models (recomputing it after overwriting alices_param.grad would be wrong).
        shared_sum = (
            alices_param.grad.fix_precision(base=10, precision_fractional=4).share(server_1, server_2, server_3)
            + bobs_param.grad.fix_precision(base=10, precision_fractional=4).share(server_1, server_2, server_3)
        )
        avg_grad = shared_sum.get().float_precision() / 2
        alices_param.grad = avg_grad
        bobs_param.grad = avg_grad.clone()


    bobs_opt.step()
    alices_opt.step()
    
    #secure_worker.clear_objects()
    
    print('Bob:' + str(bobs_loss) + ' Alice:' + str(alices_loss))
    model = bobs_model.copy()
    pred = model(data_train)
    loss = ((pred - target_train) ** 2).sum() / len(data_train)
    loss = loss.data
    loss_GradAvg.append(loss.item())
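The secure aggregation step in the cell above (fix precision, share across `server_1` to `server_3`, add, reconstruct, halve) can be sketched without PySyft. This is an illustrative re-implementation under stated assumptions, not how PySyft works internally: gradients are plain lists of floats, the modulus `Q`, base 10, and 4 fractional digits mirror the lesson, and the helper names (`fp_encode`, `fp_decode`, `split_into_shares`, `secure_average`) are hypothetical.

```python
import random

Q = 23740629843760239486723  # public modulus (same value as in the lesson)
SCALE = 10 ** 4              # base 10 with 4 fractional digits
N_SERVERS = 3                # stands in for server_1, server_2, server_3


def fp_encode(x):
    # Fixed-point encoding; negative numbers wrap around modulo Q
    return round(x * SCALE) % Q


def fp_decode(v):
    v %= Q
    return (v if v <= Q // 2 else v - Q) / SCALE


def split_into_shares(v, n=N_SERVERS):
    # n - 1 uniformly random shares plus one that fixes the total to v (mod Q)
    shares = [random.randrange(Q) for _ in range(n - 1)]
    shares.append((v - sum(shares)) % Q)
    return shares


def secure_average(gradients):
    """Average equal-length gradient vectors via additive shares held by N_SERVERS servers."""
    n_clients, dim = len(gradients), len(gradients[0])
    # server_totals[s][i]: the running total that server s holds for coordinate i
    server_totals = [[0] * dim for _ in range(N_SERVERS)]
    for grad in gradients:
        for i, g in enumerate(grad):
            for s, share in enumerate(split_into_shares(fp_encode(g))):
                server_totals[s][i] = (server_totals[s][i] + share) % Q
    # Only the sum over all clients is reconstructed, then averaged in plaintext
    return [fp_decode(sum(server_totals[s][i] for s in range(N_SERVERS))) / n_clients
            for i in range(dim)]


bobs_grad = [0.1234, -0.5]
alices_grad = [0.2, 0.25]
print(secure_average([bobs_grad, alices_grad]))  # approximately [0.1617, -0.125]
```

As with the PySyft version, whoever reconstructs the sum learns the averaged gradient in the clear; only the individual contributions stay hidden from each server.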

In [10]:
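# FedAvg algorithm -- gradient averaging with local differential privacy (LDP)
# epsilon = 1.0 for each of the weight and bias gradients (clipped and noised separately)
# Note: the noisy gradients are averaged in plaintext here; no secret sharing is used.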
model = nn.Linear(2, 1)
train_loss_GradAvg = []
val_loss_GradAvg = []
bobs_model_weight_grad = []
bobs_model_bias_grad = []
alices_model_weight_grad = []
alices_model_bias_grad = []
for round_iter in range(2000):
    bobs_model = model.copy().send(bob)
    alices_model = model.copy().send(alice)
    opt = optim.SGD(params=model.parameters(), lr=0.1 * 100)

    bobs_weight_grad = 0
    bobs_bias_grad = 0
    for example, target in zip(bobs_data, bobs_target):
        weight_grad = th.mm(th.mm(example.copy().get().view(1, 2), bobs_model.weight.copy().get().transpose(1, 0)) + bobs_model.bias.copy().get() - target.copy().get(), example.copy().get().view(1, 2)) * 2
        weight_grad = weight_grad.data
        weight_grad = weight_grad / max(1, weight_grad.norm(1) / 3.9219e-04)
        bias_grad = (th.mm(example.copy().get().view(1, 2), bobs_model.weight.copy().get().transpose(1, 0)) + bobs_model.bias.copy().get() - target.copy().get()) * 2
        bias_grad = bias_grad.data.view(-1)
        bias_grad = bias_grad / max(1, bias_grad.norm(1) / 4.4978e-04)
        bobs_weight_grad += weight_grad
        bobs_bias_grad += bias_grad

    alices_weight_grad = 0
    alices_bias_grad = 0
    for example, target in zip(alices_data, alices_target):
        weight_grad = th.mm(th.mm(example.copy().get().view(1, 2), alices_model.weight.copy().get().transpose(1, 0)) + alices_model.bias.copy().get() - target.copy().get(), example.copy().get().view(1, 2)) * 2
        weight_grad = weight_grad.data
        weight_grad = weight_grad / max(1, weight_grad.norm(1) / 3.9020e-04)
        bias_grad = (th.mm(example.copy().get().view(1, 2), alices_model.weight.copy().get().transpose(1, 0)) + alices_model.bias.copy().get() - target.copy().get()) * 2
        bias_grad = bias_grad.data.view(-1)
        bias_grad = bias_grad / max(1, bias_grad.norm(1) / 4.3543e-04)
        alices_weight_grad += weight_grad
        alices_bias_grad += bias_grad

    bobs_model.get()
    alices_model.get()

    bobs_model.weight.grad = (bobs_weight_grad + th.tensor(np.random.laplace(0, 3.9219e-04 / 1.0, 2).reshape(1, 2), dtype=torch.float32)) / len(bobs_data)
    bobs_model.bias.grad = (bobs_bias_grad + th.tensor(np.random.laplace(0, 4.4978e-04 / 1.0, 1), dtype=torch.float32)) / len(bobs_data)
    alices_model.weight.grad = (alices_weight_grad + th.tensor(np.random.laplace(0, 3.9020e-04 / 1.0, 2).reshape(1, 2), dtype=torch.float32)) / len(alices_data)
    alices_model.bias.grad = (alices_bias_grad + th.tensor(np.random.laplace(0, 4.3543e-04 / 1.0, 1), dtype=torch.float32)) / len(alices_data)
    bobs_model_weight_grad.append(bobs_model.weight.grad)
    bobs_model_bias_grad.append(bobs_model.bias.grad)
    alices_model_weight_grad.append(alices_model.weight.grad)
    alices_model_bias_grad.append(alices_model.bias.grad)
    
    for param, alices_param, bobs_param in zip(model.parameters(), alices_model.parameters(), bobs_model.parameters()):
        param.grad = (alices_param.grad + bobs_param.grad) / 2

    opt.step()

    pred = model(data_train)
    loss = ((pred - target_train) ** 2).sum() / len(data_train)
    train_loss = loss.data
    train_loss_GradAvg.append(round(train_loss.item(), 4))
    pred = model(data_val)
    loss = ((pred - target_val) ** 2).sum() / len(data_val)
    val_loss = loss.data
    val_loss_GradAvg.append(round(val_loss.item(), 4))
    print(str(round_iter+1) + ' train loss:' + str(train_loss) + ' val loss:' + str(val_loss))

1 train loss:tensor(3.6272) val loss:tensor(3.6226)
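Stripped of the PySyft plumbing, each party's update in the cell above follows the Laplace mechanism: clip every per-example gradient to L1 norm at most `C`, sum, add Laplace noise with scale `C / epsilon` (one example can change the clipped sum by at most `C` in L1 norm), then divide by the number of examples. The NumPy sketch below assumes exactly that; the function names and the seeded generator are illustrative.

```python
import numpy as np


def clip_l1(grad, c):
    """Scale grad down so its L1 norm is at most c; leave it unchanged if already within c."""
    return grad / max(1.0, np.abs(grad).sum() / c)


def private_mean_gradient(per_example_grads, c, epsilon, rng):
    """Clip each per-example gradient, sum, add Laplace(0, c / epsilon) noise, then average."""
    clipped_sum = np.sum([clip_l1(g, c) for g in per_example_grads], axis=0)
    # The L1 sensitivity of the clipped sum is c, so the noise scale is c / epsilon
    noise = rng.laplace(loc=0.0, scale=c / epsilon, size=clipped_sum.shape)
    return (clipped_sum + noise) / len(per_example_grads)


rng = np.random.default_rng(0)
grads = [np.array([0.3, -0.4]), np.array([2.0, 1.0]), np.array([-0.1, 0.05])]
print(private_mean_gradient(grads, c=1.0, epsilon=1.0, rng=rng))
```

In the cell above, the weight and bias gradients are clipped and noised separately, each with ε = 1.0, so under basic sequential composition one round costs ε = 2.0 per party, and the total grows with the number of rounds.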


In [None]:
bobs_model_weight_grad 

In [None]:
bobs_model_bias_grad

In [None]:
alices_model_weight_grad

In [None]:
alices_model_bias_grad

In [33]:
# FedAvg algorithm -- gradient averaging
# to get gradient norm bound C (L1 norm, sensitivity)
model = nn.Linear(2, 1)
criterion = nn.MSELoss()
train_loss_GradAvg = []
val_loss_GradAvg = []
bobs_weight_grad = []
bobs_bias_grad = []
alices_weight_grad = []
alices_bias_grad = []
for round_iter in range(1200):
    bobs_model = model.copy().send(bob)
    alices_model = model.copy().send(alice)

    bobs_opt = optim.SGD(params=bobs_model.parameters(), lr=0.1)
    alices_opt = optim.SGD(params=alices_model.parameters(), lr=0.1)
    #opt = optim.SGD(params=model.parameters(), lr=0.1)

    for example, target in zip(bobs_data, bobs_target):
        weight_grad = th.mm(th.mm(example.copy().get().view(1, 2), bobs_model.weight.copy().get().transpose(1, 0)) + bobs_model.bias.copy().get() - target.copy().get(), example.copy().get().view(1, 2)) * 2
        weight_grad = weight_grad.data
        print(weight_grad)
        output = model(example.copy().get())
        loss = criterion(output, target.copy().get())
        loss.backward()
        print(model.weight.grad.copy())
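        bias_grad = (th.mm(example.copy().get().view(1, 2), bobs_model.weight.copy().get().transpose(1, 0)) + bobs_model.bias.copy().get() - target.copy().get()) * 2
        bias_grad = bias_grad.data.view(-1)
        print(bias_grad)
        print(model.bias.grad.copy())
        # Debug check: stop after the first example of the first round, once the
        # hand-computed gradients have been compared with autograd's (output below).
        # Remove both `break` statements to collect per-example gradients for C.
        break
        bobs_weight_grad.append(weight_grad)
        bobs_bias_grad.append(bias_grad)
    break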
    #bobs_opt.step()
    bobs_opt.zero_grad()
    bobs_pred = bobs_model(bobs_data)
    bobs_loss = ((bobs_pred - bobs_target) ** 2).sum() / len(bobs_data)
    bobs_loss.backward()
    bobs_loss = bobs_loss.get().data
    bobs_loss

    for example, target in zip(alices_data, alices_target):
        weight_grad = th.mm(th.mm(example.copy().get().view(1, 2), alices_model.weight.copy().get().transpose(1, 0)) + alices_model.bias.copy().get() - target.copy().get(), example.copy().get().view(1, 2)) * 2
        weight_grad = weight_grad.data
        bias_grad = (th.mm(example.copy().get().view(1, 2), alices_model.weight.copy().get().transpose(1, 0)) + alices_model.bias.copy().get() - target.copy().get()) * 2
        bias_grad = bias_grad.data.view(-1)
        alices_weight_grad.append(weight_grad)
        alices_bias_grad.append(bias_grad)

    #alices_opt.step()
    alices_opt.zero_grad()
    alices_pred = alices_model(alices_data)
    alices_loss = ((alices_pred - alices_target) ** 2).sum() / len(alices_data)
    alices_loss.backward()
    alices_loss = alices_loss.get().data
    alices_loss

    # Retrieve both models so their gradients can be averaged locally (no noise is added in this cell)
    bobs_model.get()
    alices_model.get()

    for alices_param, bobs_param in zip(alices_model.parameters(), bobs_model.parameters()):
        t = (alices_param.grad + bobs_param.grad) / 2
        alices_param.grad = t
        bobs_param.grad = t

    bobs_opt.step()
    alices_opt.step()

    model = bobs_model.copy()
    pred = model(data_train)
    loss = ((pred - target_train) ** 2).sum() / len(data_train)
    train_loss = loss.data
    train_loss_GradAvg.append(round(train_loss.item(), 4))
    pred = model(data_val)
    loss = ((pred - target_val) ** 2).sum() / len(data_val)
    val_loss = loss.data
    val_loss_GradAvg.append(round(val_loss.item(), 4))
    print(str(round_iter+1) + ' Bob:' + str(bobs_loss) + ' Alice:' + str(alices_loss) + ' train loss:' + str(train_loss) + ' val loss:' + str(val_loss))

tensor([[-0.2091, -0.6363]])
tensor([[-0.2091, -0.6363]])
tensor([-2.0768])
tensor([-2.0768])


In [2]:
for param in alices_model.parameters():
    print(param, param.grad)

In [9]:
bobs_weight_grad_2 = bobs_weight_grad[:982*3000]
bobs_bias_grad_2 = bobs_bias_grad[:982*3000]
alices_weight_grad_2 = alices_weight_grad[:982*3000]
alices_bias_grad_2 = alices_bias_grad[:982*3000]


In [10]:
weight_norm = []
for tensor_2 in bobs_weight_grad_2:
    # L1 norm of one per-example gradient
    weight_norm.append(np.linalg.norm(tensor_2.numpy().ravel(), 1))
np.median(weight_norm)

0.00031698507

In [11]:
bias_norm = []
for tensor_2 in bobs_bias_grad_2:
    bias_norm.append(np.linalg.norm(tensor_2.numpy().ravel(), 1))
np.median(bias_norm)

0.0003633499

In [12]:
weight_norm = []
for tensor_2 in alices_weight_grad_2:
    weight_norm.append(np.linalg.norm(tensor_2.numpy().ravel(), 1))
np.median(weight_norm)

0.00031708734

In [13]:
bias_norm = []
for tensor_2 in alices_bias_grad_2:
    bias_norm.append(np.linalg.norm(tensor_2.numpy().ravel(), 1))
np.median(bias_norm)

0.0003528595

In [None]:
th.tensor([0.0001495711, 0.00016760826, 0.00014689856, 0.00016474724]) + th.tensor([4.4650624e-05, 5.1498413e-05, 4.4847096e-05, 5.1498413e-05]) + th.tensor([0.0001770143, 0.00019645691, 0.0001748463, 0.00019645691]) + th.tensor([0.00019131065, 0.00021219254, 0.00018900707, 0.00021266937]) + th.tensor([5.2198142e-05, 6.055832e-05, 5.2361345e-05, 5.9843063e-05]) + th.tensor([0.00016261633, 0.00018405914, 0.00016166244, 0.00018167496]) + th.tensor([0.00019012422, 0.00021290779, 0.0001968217, 0.0002193451]) + th.tensor([6.6191686e-05, 7.43866e-05, 6.824422e-05, 7.6293945e-05]) + th.tensor([0.00018854644, 0.00021314621, 0.00019632914, 0.00021982193]) + th.tensor([0.00021137806, 0.00023818016, 0.00021091083, 0.00023651123])

In [None]:
# single example gradient norm (C value)
bobs_weight_C = 0.00014336
bobs_bias_C = 0.0001611
alices_weight_C = 0.00014419
alices_bias_C = 0.00016189

In [None]:
# total gradient norm (C value, 5000 examples)
bobs_weight_C = 0.5076
bobs_bias_C = 0.2839
alices_weight_C = 0.5551
alices_bias_C = 0.2370

In [None]:
th.tensor([0.5076/5000, 0.2839/5000, 0.5551/5000, 0.2370/5000])

In [None]:
%matplotlib

In [None]:
import matplotlib.pyplot as plt

#plt.plot(loss_ModAvg, label='ModAvg')
plt.plot(loss_GradAvg[:], label='GradAvg')
plt.legend(loc='best')
plt.xlabel('# communications')
plt.ylabel('Loss')
plt.show()

In [None]:
loss_GradAvg[-1]

In [None]:
import torch as th
import syft as sy

hook = sy.TorchHook(th)
bob = sy.VirtualWorker(hook, id='bob')
alice = sy.VirtualWorker(hook, id='alice')
secure_worker = sy.VirtualWorker(hook, id='secure_worker')

data = th.tensor(
    [[0, 0], [1, 0], [0, 1], [1, 1]],
    dtype=th.float,
    requires_grad=True,
)
targets = th.tensor(
    [[0], [0], [1], [1]],
    dtype=th.float,
    requires_grad=True,
)

bob.clear_objects()
alice.clear_objects()
secure_worker.clear_objects()

bob_data = data[:2].send(bob)
bob_target = targets[:2].send(bob)

alice_data = data[2:].send(alice)
alice_target = targets[2:].send(alice)

model = th.nn.Linear(2,1)

for _ in range(10):
    bob_model = model.copy().send(bob)
    bob_optim = th.optim.SGD(params=bob_model.parameters(), lr = 0.1)

    alice_model = model.copy().send(alice)
    alice_optim = th.optim.SGD(params=alice_model.parameters(), lr = 0.1)

    for _ in range(10):
        bob_optim.zero_grad()
        bob_pred = bob_model(bob_data)
        bob_loss = ((bob_pred - bob_target)**2).sum()
        bob_loss.backward()
        bob_optim.step()

        alice_optim.zero_grad()
        alice_pred = alice_model(alice_data)
        alice_loss = ((alice_pred - alice_target)**2).sum()
        alice_loss.backward()
        alice_optim.step()

    # Share each model's weights among the three workers.
    # Note: .get() first brings the weights back to the local worker in plaintext,
    # so whoever runs this cell sees them before they are secret-shared.
    bob_weights = bob_model.weight.data.clone().get().fix_prec().share(bob, alice, secure_worker)
    bob_bias = bob_model.bias.data.clone().get().fix_prec().share(bob, alice, secure_worker)

    alice_weights = alice_model.weight.data.clone().get().fix_prec().share(bob, alice, secure_worker)
    alice_bias = alice_model.bias.data.clone().get().fix_prec().share(bob, alice, secure_worker)

    # Average the shared weights and update the main model
    model.weight.data = (bob_weights + alice_weights).get().float_prec() / 2
    model.bias.data = (bob_bias + alice_bias).get().float_prec() / 2

    model_pred = model(data)
    model_loss = ((model_pred - targets)**2).sum()
    print(f'Model Loss: {model_loss}')
    print(f'bob_loss: {bob_loss.get()} | alice_loss: {alice_loss.get()}')

