<a href="https://colab.research.google.com/github/SarahSchnei/Deep-Learning-and-Differential-Privacy/blob/master/ToyFederatedLearning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Federated Learning using a dataset about toys

Federated Learning is the idea that one large global model can be trained and maintained by a group of smaller, more individualized models. An example of this is with a mobile phone's predictive text feature. Your mobile phone is continuously learning and updating it's predictive model as you interact with it, and it shares this updated model with the software engineers that built it. Everyone's updated models are aggregated into a single model and that model is distributed as a software update.

In [0]:
from torch import nn, optim
import torch as th
import syft as sy

In [0]:
##!pip install syft



In [0]:
###creating the toy dataset

data = th.tensor([[1.,1], [0,1], [1, 0], [0,0]], requires_grad=True)
target = th.tensor([[1.],[1], [0], [0]], requires_grad=True)

In [0]:
model = nn.Linear(2,1) ##two features in (data), one feature out (target)

In [0]:
def train(iterations=20):
  for iter in range(20):
    opt = optim.SGD(params=model.parameters(), lr=0.1)
    ### SGD means stochastic gradient descent, lr is the learning rate
    opt.zero_grad()
    ###making sure gradients are zero to start
    pred = model(data)
    loss = ((pred-target)**2).sum()
    loss.backward() ##updating the weights in our gradients
    opt.step()
    print(loss.data)
    
train()

tensor(1.3331)
tensor(0.8759)
tensor(0.5817)
tensor(0.3887)
tensor(0.2613)
tensor(0.1769)
tensor(0.1206)
tensor(0.0828)
tensor(0.0573)
tensor(0.0399)
tensor(0.0281)
tensor(0.0199)
tensor(0.0142)
tensor(0.0102)
tensor(0.0074)
tensor(0.0054)
tensor(0.0040)
tensor(0.0029)
tensor(0.0022)
tensor(0.0016)


In [0]:
hook = sy.TorchHook(th)

W0716 04:21:10.703423 140705261594496 hook.py:98] Torch was already hooked... skipping hooking process


In [0]:
###bringing in our two virtual machines or workers, bob and alice 
bob = sy.VirtualWorker(hook, id='bob')
alice = sy.VirtualWorker(hook, id='alice')

In [0]:
###sending our previously defined data to bob and alice
data_bob = data[0:2].send(bob)
target_bob = data[0:2].send(bob)
data_alice = data[0:2].send(alice)
target_alice = data[0:2].send(alice)

In [0]:
##Our whole data set is split evenly across bob and alice
datasets = [(data_bob, target_bob), (data_alice, target_alice)]

In [0]:
model = nn.Linear(2,1)
opt = optim.SGD(params=model.parameters(), lr=0.1)

In [0]:
_data, _target = datasets[0]

In [0]:
_data
## returns my id and bob's id

(Wrapper)>[PointerTensor | me:35295272140 -> bob:5507095491]

In [0]:
_data.location

<VirtualWorker id:bob #objects:6>

In [0]:
model = model.send(_data.location)
##sending the model to the data where it's located on bob

In [0]:
list(model.parameters())

[Parameter containing:
 Parameter>[PointerTensor | me:92446118489 -> bob:17904156271],
 Parameter containing:
 Parameter>[PointerTensor | me:78030646172 -> bob:31841827102]]

In [0]:
def train(iterations=20):
  
  model = nn.Linear(2,1)
  opt = optim.SGD(params=model.parameters(), lr=0.1) 
  
  for iter in range(iterations):
    
    for _data, _target in datasets:
      ##sending out model to where the data is (on bob)
      model = model.send(_data.location)

      opt.zero_grad()
      pred = model(_data)
      loss = ((pred - _target)**2).sum()
      loss.backward()
      opt.step()

      ##calling the smarter model that trained on bob back to the handle model
      model = model.get()
      
      print(loss.get())

In [0]:
train()

tensor(4.4111, requires_grad=True)
tensor(3.1595, requires_grad=True)
tensor(2.3085, requires_grad=True)
tensor(1.7298, requires_grad=True)
tensor(1.3362, requires_grad=True)
tensor(1.0686, requires_grad=True)
tensor(0.8867, requires_grad=True)
tensor(0.7629, requires_grad=True)
tensor(0.6788, requires_grad=True)
tensor(0.6216, requires_grad=True)
tensor(0.5827, requires_grad=True)
tensor(0.5562, requires_grad=True)
tensor(0.5382, requires_grad=True)
tensor(0.5260, requires_grad=True)
tensor(0.5177, requires_grad=True)
tensor(0.5120, requires_grad=True)
tensor(0.5082, requires_grad=True)
tensor(0.5056, requires_grad=True)
tensor(0.5038, requires_grad=True)
tensor(0.5026, requires_grad=True)
tensor(0.5017, requires_grad=True)
tensor(0.5012, requires_grad=True)
tensor(0.5008, requires_grad=True)
tensor(0.5005, requires_grad=True)
tensor(0.5004, requires_grad=True)
tensor(0.5003, requires_grad=True)
tensor(0.5002, requires_grad=True)
tensor(0.5001, requires_grad=True)
tensor(0.5001, requi

So in this exersise, we build a simple model, sent it to where the data was, and updated the model with the results of training on the data on bob. In the example of training the text prediction on a mobile phone, thousands of virtual workers like bob and alice would send back their updated models, and a system update would be implemented on the average of the update. This is thought to be the safer, more private way of training and updating models on involving thousands or even hundreds of thousands of users. The idea is that instead of bringing user data to the model (on a central server somewhere), the average of updated models allows each user their privacy while also allowing models to be updated.