# Section: Securing Federated Learning

- Lesson 1: Trusted Aggregator
- Lesson 2: Intro to Additive Secret Sharing
- Lesson 3: Intro to Fixed Precision Encoding
- Lesson 4: Secret Sharing + Fixed Precision in PySyft
- Final Project: Federated Learning with Encrypted Gradient Aggregation

# Lesson: Federated Learning with a Trusted Aggregator

In the last section, we learned how to train a model on a distributed dataset using Federated Learning. In particular, the last project aggregated gradients directly from one data owner to another. 

While this is acceptable in some cases, it would be even better to choose a neutral third party to perform the aggregation, so that no data owner has to hand its model to another data owner.

As it turns out, we can use the same tools we used previously to accomplish this.
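As a rough illustration of the idea, here is a sketch in plain Python with no PySyft involved. The class `TrustedAggregator`, its methods, and the parameter values are hypothetical and exist only for this example. Each data owner submits its locally trained parameters, and the aggregator, the only party that sees all of them, returns the element-wise mean.

```python
# Minimal sketch, assuming each data owner has already trained a local model
# and is willing to reveal its parameters to one neutral party.
from typing import Dict, List


class TrustedAggregator:
    """Hypothetical neutral third party that averages model parameters."""

    def __init__(self) -> None:
        self._received: Dict[str, List[float]] = {}

    def submit(self, owner_id: str, params: List[float]) -> None:
        # The aggregator sees every owner's parameters in the clear.
        self._received[owner_id] = list(params)

    def average(self) -> List[float]:
        if not self._received:
            raise ValueError("no parameters submitted")
        m = len(self._received)
        columns = zip(*self._received.values())
        return [sum(col) / m for col in columns]


# Made-up parameters (weight_0, weight_1, bias) from three data owners.
local_params = {
    "w0": [1.0, 0.5, 0.25],
    "w1": [3.0, 1.5, 0.75],
    "w2": [2.0, 1.0, 0.5],
}

aggregator = TrustedAggregator()
for owner_id, params in local_params.items():
    aggregator.submit(owner_id, params)

global_params = aggregator.average()
print(global_params)  # [2.0, 1.0, 0.5]
```

The data owners never see each other's parameters, but they all have to trust the aggregator, which is the limitation the next lesson addresses.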

# Project: Federated Learning with a Trusted Aggregator

In [9]:
# try this project here!

In [10]:
import syft as sy
import torch as th



In [11]:
hook = sy.TorchHook(th)


In [12]:
from torch import nn,optim

In [13]:
# A Toy Dataset
data = th.tensor([[1.0,1.0],[3,1.0],[1,1.0],[0,1.0],[23.0,1.0],[6,1.0],[7,1.0],[15,1.0],[9.0,1.0]], requires_grad=True)


target = th.tensor([[2.],[6.0], [2.0], [0],[46.0],[12.0],[14.0],[30.0],[18.0]], requires_grad=True)
        


### Workers

In [14]:
m = 3 #number of workers

In [15]:
workers  =[sy.VirtualWorker(hook, id = "w"+ str(i)) for i in range(m)]
chunk_size = data.shape[0]//m



### Make Datasets

In [16]:
#make a mini-dataset, one per worker
datasets = [ [
              data[j*chunk_size : (j+1) * chunk_size  ] ,
              target[j*chunk_size : (j+1) * chunk_size] 
             ]
            
             for  j in range(m)
           ]

print("Datasets[0]", *datasets[0], sep= "\n")

#send datasets to workers
for k in range(m):
    
    datasets[k][0] = datasets[k][0].send(workers[k])
    
    datasets[k][1] = datasets[k][1].send(workers[k])

Datasets[0]
tensor([[1., 1.],
        [3., 1.],
        [1., 1.]], grad_fn=<SliceBackward>)
tensor([[2.],
        [6.],
        [2.]], grad_fn=<SliceBackward>)


In [17]:
import random
from random import shuffle

def train_secure_grads(m,datasets,workers, iterations=20):
    """
    m: int
        number of models/workers
        
    datasets: list of lists of tensors
        [ [data_1,targets_1],[data_2, targets_2], ...     ]
    
    workers: list of VirtualWorkers
    
    iterations: int 
        number of SGD steps to do.
        
    """

    # Create a model for each worker 
    models = [ nn.Linear(2,1) for _ in range(m)]
    
    
    # Global model
    global_model = nn.Linear(2,1)
    
    # Create optims 
    optims = []

    for i,t in enumerate(datasets):
        
        _data,_target  = t[0],t[1]
        
        
        # Data preparation
        
        # send model to the data
        models[i] = models[i].send(_data.location)
        
        #create optimizer for model i 
        #learning rate >= 0.1 makes training diverge on most models with little data.
        optim_i = optim.SGD(params=models[i].parameters(), lr=0.001)
        optims.extend( [optim_i] )
        
        for _ in range(iterations):            
            # do normal training
            optims[i].zero_grad()
            pred = models[i](_data)
            loss = ((pred - _target)**2).sum()
            loss.backward()
            
            optims[i].step()

            
            print("Loss: {} for model: {}".format( loss.clone().get().item(), i ))
        
        print("Params {} for model: {}".format(models[i].weight.clone().get() ,i))
        print()
    
    
    # Average models on selected worker
    
    #shuffle models. Select first to aggregate gradients
    shuffle(models)
    
    # Trusted aggregator:
    # the worker holding the first shuffled model collects and averages the parameters.
    sel_model = models[0]
    
    #move_model = lambda x: x.move(sel_worker)

    w,b = sel_model.weight.data ,sel_model.bias.data
    
    for model in models[1:]:
        # Move model to selected worker
        model.move(sel_model.location)
        
        w+= model.weight.data
        b+=  model.bias.data
    
    w = w/m
    b = b/m
    
    with th.no_grad():
    
        global_model.weight.set_(  w.get() 
                                  )
        global_model.bias.set_(    b.get() 
                                  )
    
    print("Params for model global model: {}".format(global_model.weight.clone()))

        
    return global_model


In [18]:
train_secure_grads(m = 3, datasets = datasets,workers = workers, iterations = 50  )

Loss: 58.597496032714844 for model: 0
Loss: 55.003089904785156 for model: 0
Loss: 51.63465881347656 for model: 0
Loss: 48.47798156738281 for model: 0
Loss: 45.51971435546875 for model: 0
Loss: 42.747371673583984 for model: 0
Loss: 40.14923858642578 for model: 0
Loss: 37.71435546875 for model: 0
Loss: 35.432437896728516 for model: 0
Loss: 33.293853759765625 for model: 0
Loss: 31.28957748413086 for model: 0
Loss: 29.411151885986328 for model: 0
Loss: 27.650657653808594 for model: 0
Loss: 26.00066375732422 for model: 0
Loss: 24.454212188720703 for model: 0
Loss: 23.004791259765625 for model: 0
Loss: 21.646284103393555 for model: 0
Loss: 20.37297248840332 for model: 0
Loss: 19.179485321044922 for model: 0
Loss: 18.06080436706543 for model: 0
Loss: 17.01221466064453 for model: 0
Loss: 16.029308319091797 for model: 0
Loss: 15.107946395874023 for model: 0
Loss: 14.244258880615234 for model: 0
Loss: 13.434610366821289 for model: 0
Loss: 12.675599098205566 for model: 0
Loss: 11.964044570922852 

Linear(in_features=2, out_features=1, bias=True)

# Lesson: Intro to Additive Secret Sharing

While being able to have a trusted third party perform the aggregation is certainly nice, in an ideal setting we wouldn't have to trust anyone at all. This is where cryptography can provide an interesting alternative.

Specifically, we're going to be looking at a simple protocol for Secure Multi-Party Computation called Additive Secret Sharing. This protocol will allow a group of 3 or more parties to aggregate their gradients without the use of a trusted third party to perform the aggregation. In other words, we can add 3 numbers together from 3 different people without anyone ever learning the inputs of any other actors.

Let's start by considering the number 5, which we'll put into a variable x.

In [19]:
x = 5

Let's say we wanted to SHARE the ownership of this number between two people, Alice and Bob. We could split this number into two shares, 2 and 3, and give one to Bob and one to Alice.

In [20]:
bob_x_share = 2
alice_x_share = 3

decrypted_x = bob_x_share + alice_x_share
decrypted_x

5

Note that neither Bob nor Alice knows the value of x. Each of them only knows the value of their own SHARE of x. Thus, the true value of x is hidden (i.e., encrypted).

The truly amazing thing, however, is that Alice and Bob can still compute using this value! They can perform arithmetic over the hidden value! Let's say Bob and Alice wanted to multiply this value by 2! If each of them multiplied their respective share by 2, then the hidden number between them is also multiplied! Check it out!

In [21]:
bob_x_share = 2 * 2
alice_x_share = 3 * 2

decrypted_x = bob_x_share + alice_x_share
decrypted_x

10

This even works for addition between two shared values!!

In [22]:
# encrypted "5"
bob_x_share = 2
alice_x_share = 3

# encrypted "7"
bob_y_share = 5
alice_y_share = 2

# encrypted 5 + 7
bob_z_share = bob_x_share + bob_y_share
alice_z_share = alice_x_share + alice_y_share

decrypted_z = bob_z_share + alice_z_share
decrypted_z

12

As you can see, we just added two numbers together while they were still encrypted!!!

One caveat: since all our numbers are positive, each share reveals a little bit of information about the hidden value, namely that the hidden value is always at least as large as the share. Thus, if Bob holds a share "3", he knows that the encrypted value is at least 3.

This would be quite bad, but it can be solved through a simple fix: decryption sums all the shares together modulo some constant Q, so one share can be a random number and the other is chosen to make the sum come out right. For example:

In [23]:
x = 5

Q = 23740629843760239486723

bob_x_share = 23552870267 # <- a random number
alice_x_share = Q - bob_x_share + x
alice_x_share

23740629843736686616461

In [24]:
(bob_x_share + alice_x_share) % Q

5

So now, as you can see, both shares are wildly larger than the number being shared, meaning that individual shares no longer leak this information. However, all the properties we discussed earlier still hold! (addition, encryption, decryption, etc.)
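To connect this back to the three-party case mentioned at the start of the lesson, here is a minimal sketch of a secure sum. The participant names, their inputs, and the helper `split` are hypothetical; the modulus is the one used above. Each party splits its input into three shares, sends one share to every participant, and then publishes only the sum of the shares it received.

```python
# Minimal sketch of a three-party secure sum with additive secret sharing.
import secrets

Q = 23740629843760239486723


def split(value, n_parties):
    """Split `value` into n_parties shares that sum to it modulo Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % Q)
    return shares


inputs = {"alice": 5, "bob": 7, "carol": 11}
parties = list(inputs)

# Each party splits its input and sends one share to every party (itself included).
received = {name: [] for name in parties}
for _owner, value in inputs.items():
    for recipient, share in zip(parties, split(value, len(parties))):
        received[recipient].append(share)

# Each party adds up the shares it holds and publishes only that partial sum.
partial_sums = {name: sum(shares) % Q for name, shares in received.items()}

total = sum(partial_sums.values()) % Q
print(total)  # 23
```

No participant ever holds all the shares of another participant's input, yet the published partial sums combine to the true total.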

# Project: Build Methods for Encrypt, Decrypt, and Add 

In this project, you must take the lessons we learned in the last section and write general methods for encrypt, decrypt, and add. Store shares for a variable in a tuple like so.

In [25]:
x_share = (2,5,7)

Even though normally those shares would be distributed amongst several workers, you can store them in ordered tuples like this for now :)

In [26]:
# try this project here!


In [27]:
import random 

def encrypt(value, n_shares):
    """
    value: int 
        
    n_shares: int 
        number of shares to split 
    """
    
    Q = 23740629843760239486723
    secure_values = ()
    
   
    for k in range(n_shares):
        
        if k < n_shares -1:
            y = random.randint(-Q,Q)
            
            secure_values+= (y,)
        else:
            y = Q 
            
            for i in range(k):
                y -=  secure_values[i]
            
            y = y % Q
                
            y+=value
            
                
            secure_values+= (y,)
        
    return secure_values
        


            
            

def decrypt(values, Q):
    """
    values: tuple of ints
        the shares of one hidden value
    
    Q: int 
        modulus used when the shares were created
    
    returns the sum of the shares modulo Q
    """
    return sum(values) % Q
    
def add(values_1, values_2, Q):
    assert len(values_1) == len(values_2), "Must have same length"
    
    t = ()
    
    for i in range(len(values_1)):
        t+= ( (values_1[i] + values_2[i]) % Q ,)
        
    return t
    
    
    
    

In [28]:
values_e  = encrypt(20,10)
values_e2  = encrypt(50,10)
print(values_e)
print(values_e2)

value = decrypt(values_e, Q)

value2 = decrypt(values_e2, Q)

print(value)

print(value2)

sum_values = add(values_e,values_e2,Q)

print(sum_values)


(-15400368266993006068197, -11775402909742148457434, 15311908170804368504784, 21787459108172405796441, 13629521662793987397254, -15435138088330808081638, 22499298251183564261166, 6780452511785644633328, 11176933272122249534673, 22647225819484460939812)
(20869319668123403329385, 15091977196243081444462, 6939932336302767932880, 8723022491147967045141, -5793512603149877549856, -22691390493855741069250, -2465736812456750803355, -17678304805536881371525, -23139789255253028199129, 20144482278435059241297)
20
50
(5468951401130397261188, 3316574286500932987028, 22251840507107136437664, 6769851755560133354859, 7836009059644109847398, 9354731105333929822558, 20033561438726813457811, 12842777550009002748526, 11777773860629460822267, 19051078254159280694386)
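For comparison, here is a more compact variant of the same three operations, followed by a quick property check. This is only a sketch: the names `share`, `reconstruct`, and `add_shares` are hypothetical, and it swaps `random.randint` for `secrets.randbelow`, because Python's `random` module is not designed for cryptographic use.

```python
# Hedged sketch: a compact variant of encrypt/decrypt/add plus a property check.
import secrets

Q = 23740629843760239486723


def share(value, n_shares, q=Q):
    """Return a tuple of n_shares integers in [0, q) that sum to value mod q."""
    if n_shares < 2:
        raise ValueError("need at least two shares")
    # secrets draws from the operating system's secure random source.
    shares = [secrets.randbelow(q) for _ in range(n_shares - 1)]
    shares.append((value - sum(shares)) % q)
    return tuple(shares)


def reconstruct(shares, q=Q):
    """Recover the hidden value by summing all shares modulo q."""
    return sum(shares) % q


def add_shares(shares_a, shares_b, q=Q):
    """Add two shared values share by share, without reconstructing either."""
    if len(shares_a) != len(shares_b):
        raise ValueError("both values must have the same number of shares")
    return tuple((a + b) % q for a, b in zip(shares_a, shares_b))


# Property check: sharing then reconstructing is the identity, and addition
# on shares matches addition on the hidden values.
for _ in range(100):
    a, b = secrets.randbelow(10**6), secrets.randbelow(10**6)
    sa, sb = share(a, 10), share(b, 10)
    assert reconstruct(sa) == a
    assert reconstruct(add_shares(sa, sb)) == (a + b) % Q

print(reconstruct(add_shares(share(20, 10), share(50, 10))))  # 70
```

Keeping every share in the range [0, Q) also means a share looks the same whatever value it hides.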


In [29]:
(1,2) + (1,1)

(1, 2, 1, 1)

# Lesson: Intro to Fixed Precision Encoding

As you may remember, our goal is to aggregate gradients using this new Secret Sharing technique. However, the protocol we've just explored in the last section works on integers, and our neural network weights are NOT integers. Instead, our weights are decimals (floating point numbers).

Not a huge deal! We just need to use a fixed precision encoding, which lets us do computation over decimal numbers using integers!

In [35]:
BASE=10
PRECISION=4

In [36]:
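def encode(x):
    # Scale and round to an integer first, then reduce modulo Q with integer
    # arithmetic so that negative values wrap around exactly.
    return int(round(x * (BASE ** PRECISION))) % Q

def decode(x):
    # Values in the upper half of the range represent negative numbers.
    return (x if x <= Q // 2 else x - Q) / BASE**PRECISION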

In [37]:
encode(3.5)

35000

In [38]:
decode(35000)

3.5
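Negative numbers deserve a closer look, since model weights are often negative. The sketch below is a self-contained illustration, assuming base 10 with 4 fractional digits as above. It performs the modulo on integers rather than on floats: Q is far larger than a float can represent exactly, so a floating point modulo would lose the low digits of a wrapped negative value.

```python
# Minimal sketch of fixed precision encoding with negative values.
Q = 23740629843760239486723
BASE = 10
PRECISION = 4
SCALE = BASE ** PRECISION


def encode(x):
    """Map a float to an integer in [0, Q); negatives land in the upper half."""
    return int(round(x * SCALE)) % Q


def decode(n):
    """Invert encode: values above Q // 2 are interpreted as negative."""
    signed = n if n <= Q // 2 else n - Q
    return signed / SCALE


a, b = encode(-3.5), encode(1.25)
print(a == Q - 35000)       # True: -3.5 wraps to the top of the range
print(decode((a + b) % Q))  # -2.25
```

Addition on the encoded integers, taken modulo Q, matches addition on the original decimals as long as the true result stays well inside the representable range.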

In [39]:
n_shares = 3
x = encrypt(encode(5.5), n_shares)
y = encrypt(encode(2.3), n_shares)
z = add(x, y, Q)
decode(decrypt(z, Q))

7.8

# Lesson: Secret Sharing + Fixed Precision in PySyft

While writing things from scratch is certainly educational, PySyft makes a great deal of this much easier for us through its abstractions.

In [40]:
# Create the three virtual workers that will hold the shares
bob = sy.VirtualWorker(hook, id="bob")
alice = sy.VirtualWorker(hook, id="alice")
secure_worker = sy.VirtualWorker(hook, id="secure_worker")

In [41]:
x = th.tensor([1,2,3,4,5])

### Secret Sharing Using PySyft

We can share using the simple .share() method!

In [42]:
x = x.share(bob, alice, secure_worker)

In [43]:
bob._objects

Bob's object store now holds one of the shares of x (the share values are random, so they differ on every run). Furthermore, we can still call addition in this state, and PySyft will automatically perform the remote execution for us!

In [44]:
y = x + x

In [45]:
y

In [46]:
y.get()

tensor([ 2,  4,  6,  8, 10])

### Fixed Precision using PySyft

We can also convert a tensor to fixed precision using .fix_precision()

In [103]:
x = th.tensor([0.1,0.2,0.3])

In [104]:
x

tensor([0.1000, 0.2000, 0.3000])

In [105]:
x = x.fix_prec()

In [106]:
x.child.child

tensor([100, 200, 300])

In [107]:
y = x + x

In [108]:
y = y.float_prec()
y

tensor([0.2000, 0.4000, 0.6000])

### Shared Fixed Precision

And of course, we can combine the two!

In [112]:
bob,alice,secure_worker = (sy.VirtualWorker(hook, id = x ) for x in ["bob","alice","secure_worker"] )

In [113]:
x = th.tensor([0.1, 0.2, 0.3])

In [114]:
x = x.fix_prec().share(bob, alice, secure_worker)

In [115]:
y = x + x

In [117]:
y,x

((Wrapper)>FixedPrecisionTensor>(Wrapper)>[AdditiveSharingTensor]
 	-> (Wrapper)>[PointerTensor | me:71611505858 -> bob:84124392813]
 	-> (Wrapper)>[PointerTensor | me:22254103362 -> alice:12350534841]
 	-> (Wrapper)>[PointerTensor | me:33595740445 -> secure_worker:53031209616]
 	*crypto provider: me*,
 (Wrapper)>FixedPrecisionTensor>(Wrapper)>[AdditiveSharingTensor]
 	-> (Wrapper)>[PointerTensor | me:44513523097 -> bob:90324360530]
 	-> (Wrapper)>[PointerTensor | me:64184315972 -> alice:16446497511]
 	-> (Wrapper)>[PointerTensor | me:44540902778 -> secure_worker:4763932519]
 	*crypto provider: me*)

In [118]:
y.get().float_prec()

tensor([0.2000, 0.4000, 0.6000])

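Note that once the result is decrypted with .get(), the aggregate is visible in the clear. When we average models this way, everyone can see the model average; only the individual contributions stay hidden.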
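One consequence is worth spelling out: the protocol hides the individual inputs, not the result. With only two participants, either one can subtract its own input from the revealed total and recover the other's input exactly, which is why aggregation calls for three or more parties. A small sketch in plain Python, with hypothetical names and values:

```python
# Hedged illustration: what one participant can infer from the revealed aggregate.
def what_alice_learns(inputs, me="alice"):
    """Return the sum of everyone else's inputs, as seen by one participant."""
    revealed_total = sum(inputs.values())  # the aggregate is public after decryption
    return revealed_total - inputs[me]


two_party = {"alice": 4, "bob": 9}
three_party = {"alice": 4, "bob": 9, "carol": 2}

# With two parties the remainder is exactly Bob's private input.
print(what_alice_learns(two_party))    # 9

# With three parties Alice only learns bob + carol, not either value.
print(what_alice_learns(three_party))  # 11
```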

In [13]:
w1,w2,w3 = (sy.VirtualWorker(hook, id = "w" + str(i)) for i in range(1,4) )

In [14]:
w1.clear_objects(),w2.clear_objects(),w3.clear_objects()

(<VirtualWorker id:w1 #objects:0>,
 <VirtualWorker id:w2 #objects:0>,
 <VirtualWorker id:w3 #objects:0>)

In [15]:
w1._objects,w2._objects,w3._objects

({}, {}, {})

In [16]:
a,b = th.tensor([1,1,1,1]), th.tensor([5,5,5,5])

In [17]:
a = a.share(w1,w2,w3)

In [18]:
b = b.share(w1,w2,w3)

In [20]:
c = a+b

In [21]:
c

(Wrapper)>[AdditiveSharingTensor]
	-> (Wrapper)>[PointerTensor | me:55259610816 -> w1:92321648997]
	-> (Wrapper)>[PointerTensor | me:79959464640 -> w2:30549673585]
	-> (Wrapper)>[PointerTensor | me:58414339208 -> w3:10270487504]
	*crypto provider: me*

# Final Project: Federated Learning with Encrypted Gradient Aggregation

In [3]:
import syft as sy
import torch as th



In [4]:
hook = sy.TorchHook(th)

In [5]:
from torch  import nn,optim

### Dataset 

In [22]:
# A Toy Dataset
data = th.tensor([[1.0,1.0],[3,1.0],[1,1.0],[0,1.0],[23.0,1.0],[6,1.0],[7,1.0],[15,1.0],[9.0,1.0]], requires_grad=True)


target = th.tensor([[2.],[6.0], [2.0], [0],[46.0],[12.0],[14.0],[30.0],[18.0]], requires_grad=True)

### Workers

In [23]:
m=  3 #number of workers

In [24]:
workers  =[sy.VirtualWorker(hook, id = "w"+ str(i)) for i in range(m)]
chunk_size = data.shape[0]//m

### Make Datasets


In [25]:
#make a mini-dataset, one per worker
datasets = [ [
              data[j*chunk_size : (j+1) * chunk_size  ] ,
              target[j*chunk_size : (j+1) * chunk_size] 
             ]
            
             for  j in range(m)
           ]

print("Datasets[0]", *datasets[0], sep= "\n")

#send datasets to workers
for k in range(m):
    
    datasets[k][0] = datasets[k][0].send(workers[k])
    
    datasets[k][1] = datasets[k][1].send(workers[k])

Datasets[0]
tensor([[1., 1.],
        [3., 1.],
        [1., 1.]], grad_fn=<SliceBackward>)
tensor([[2.],
        [6.],
        [2.]], grad_fn=<SliceBackward>)


In [47]:
import random
from random import shuffle

def train_additive_secret_sharing(m,datasets,workers, iterations=20):
    """
    m: int
        number of models/workers
        
    datasets: list of lists of tensors
        [ [data_1,targets_1],[data_2, targets_2], ...     ]
    
    workers: list of VirtualWorkers
    
    iterations: int 
        number of SGD steps to do.
        
    """

    # Create a model for each worker 
    models = [ nn.Linear(2,1) for _ in range(m)]
    
    
    # Global model
    global_model = nn.Linear(2,1)
    
    # Create optims 
    optims = []

    for i,t in enumerate(datasets):
        
        _data,_target  = t[0],t[1]
        
        
        # Data preparation
        
        # send model to the data
        models[i] = models[i].send(_data.location)
        
        #create optimizer for model i 
        #learning rate >= 0.1 makes training diverge on most models with little data.
        optim_i = optim.SGD(params=models[i].parameters(), lr=0.001)
        optims.extend( [optim_i] )
        
        for _ in range(iterations):            
            # do normal training
            optims[i].zero_grad()
            pred = models[i](_data)
            loss = ((pred - _target)**2).sum()
            loss.backward()
            
            optims[i].step()

            
            print("Loss: {} for model: {}".format( loss.clone().get().item(), i ))
        
        print("Params {} for model: {}".format(models[i].weight.clone().get() ,i))
        print()
    
    
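    # Average models using additive secret sharing.
   
    w,b = None,None
    
    for k,model in enumerate(models):
        
        # Encode each model's parameters as fixed precision integers and
        # split them into shares, one per worker. Note: for simplicity the
        # parameters are fetched locally before sharing; in a real deployment
        # each data owner would create the shares itself.
        w_k = model.weight.data.clone().get().fix_prec().share(*workers)
        b_k = model.bias.data.clone().get().fix_prec().share(*workers)
        
        # Add the shared parameters; no single worker sees the running sum.
        w = w_k if w is None else w + w_k
        b = b_k if b is None else b + b_k
    
    
    with th.no_grad():
        
        # Decrypt only the sum, decode it, and divide by m to get the average.
        global_model.weight.set_(  w.get().float_prec() / m
                                  )
        global_model.bias.set_(    b.get().float_prec() / m
                                  )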
    
    print("Params for model global model: {}".format(global_model.weight.clone()))

        
    return global_model









In [48]:
train_additive_secret_sharing(m = 3,datasets = datasets,workers = workers, iterations=20)

Loss: 41.730979919433594 for model: 0
Loss: 39.138465881347656 for model: 0
Loss: 36.70909881591797 for model: 0
Loss: 34.43257522583008 for model: 0
Loss: 32.29927062988281 for model: 0
Loss: 30.30017852783203 for model: 0
Loss: 28.42683982849121 for model: 0
Loss: 26.67133903503418 for model: 0
Loss: 25.02625274658203 for model: 0
Loss: 23.484628677368164 for model: 0
Loss: 22.039953231811523 for model: 0
Loss: 20.686120986938477 for model: 0
Loss: 19.417409896850586 for model: 0
Loss: 18.228464126586914 for model: 0
Loss: 17.114255905151367 for model: 0
Loss: 16.070079803466797 for model: 0
Loss: 15.091530799865723 for model: 0
Loss: 14.17447280883789 for model: 0
Loss: 13.315030097961426 for model: 0
Loss: 12.509580612182617 for model: 0
Params tensor([[0.8863, 0.0022]], requires_grad=True) for model: 0

Loss: 2675.000244140625 for model: 1
Loss: 49.47021484375 for model: 1
Loss: 0.918511152267456 for model: 1
Loss: 0.02064458839595318 for model: 1
Loss: 0.003995848353952169 for mo

Linear(in_features=2, out_features=1, bias=True)
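To make the aggregation step concrete without depending on PySyft, here is a minimal sketch in plain Python that combines fixed precision encoding with additive secret sharing. All helper names are hypothetical, and the weight vectors are made-up values, not results from this notebook. Each weight is encoded as an integer and split into one share per party; each party accumulates the shares it holds, and only the combined total is decoded and divided by the number of models.

```python
# Minimal sketch of encrypted model averaging, under the assumptions above.
import secrets

Q = 23740629843760239486723
SCALE = 10 ** 4  # fixed precision: 4 decimal digits


def encode(x):
    return int(round(x * SCALE)) % Q


def decode(n):
    return (n if n <= Q // 2 else n - Q) / SCALE


def share(value, n):
    shares = [secrets.randbelow(Q) for _ in range(n - 1)]
    return shares + [(value - sum(shares)) % Q]


def secure_average(weight_vectors):
    """Average equal-length weight vectors; only the sum is ever reconstructed."""
    m = len(weight_vectors)
    dim = len(weight_vectors[0])
    # held[p][d] accumulates the shares that party p holds for coordinate d.
    held = [[0] * dim for _ in range(m)]
    for vector in weight_vectors:
        for d, w in enumerate(vector):
            for p, s in enumerate(share(encode(w), m)):
                held[p][d] = (held[p][d] + s) % Q
    # Combining the per-party totals reveals the sum, never an individual vector.
    totals = [sum(held[p][d] for p in range(m)) % Q for d in range(dim)]
    return [decode(t) / m for t in totals]


local_weights = [[1.0, -0.5], [2.0, 0.25], [3.0, 1.0]]
print(secure_average(local_weights))  # [2.0, 0.25]
```

The division by m happens after decoding, on the revealed sum, so the shares themselves only ever need addition.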