
#Federated Learning
Federated Learning is a technique for training Deep Learning models on data to which you do not have access. Basically:

Instead of bringing all the data to one machine and training a model, we bring the model to the data, train it locally, and merely upload "model updates" to a central server.
In order to perform Federated Learning, we need to be able to use Deep Learning techniques on remote machines. This will require a new set of tools. Specifically, we will use an extension of PyTorch called PySyft.
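
To make the idea concrete, here is a minimal sketch of one way "model updates" could be combined, written in plain Python rather than PySyft. The helper names (`local_update`, `federated_round`), the one-parameter model y = w * x, and the plain averaging of the returned weights are illustrative assumptions, not part of the PySyft API.

```python
# Hypothetical sketch: each client fits y = w * x on its own data and only
# the updated weight (never the data) is returned to the server.

def local_update(w, data, lr=0.1, steps=20):
    """Run a few gradient-descent steps on one client's private data."""
    for _ in range(steps):
        # gradient of the mean squared error of w * x against y
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(w_global, client_datasets):
    """Send the global weight to every client and average what comes back."""
    updates = [local_update(w_global, data) for data in client_datasets]
    return sum(updates) / len(updates)

# Two clients whose private data both follow y = 3 * x
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5)],
]

w = 0.0
for _ in range(5):
    w = federated_round(w, clients)

print(round(w, 3))
```

The server in this sketch only ever sees the weights returned by `local_update`; the `(x, y)` pairs stay with the client that owns them.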

In [107]:
!pip install tf-encrypted

! URL="https://github.com/openmined/PySyft.git" && FOLDER="PySyft" && if [ ! -d $FOLDER ]; then git clone -b dev --single-branch $URL; else (cd $FOLDER && git pull $URL && cd ..); fi;

!cd PySyft; python setup.py install  > /dev/null

import os
import sys
module_path = os.path.abspath(os.path.join('./PySyft'))
if module_path not in sys.path:
    sys.path.append(module_path)
    
!pip install --upgrade --force-reinstall lz4
!pip install --upgrade --force-reinstall websocket
!pip install --upgrade --force-reinstall websockets
!pip install --upgrade --force-reinstall zstd

From https://github.com/openmined/PySyft
 * branch              HEAD       -> FETCH_HEAD
Already up to date.
zip_safe flag not set; analyzing archive contents...
Collecting lz4
  Using cached https://files.pythonhosted.org/packages/0a/c6/96bbb3525a63ebc53ea700cc7d37ab9045542d33b4d262d0f0408ad9bbf2/lz4-2.1.10-cp36-cp36m-manylinux1_x86_64.whl
ERROR: syft 0.1.21a1 has requirement msgpack>=0.6.1, but you'll have msgpack 0.5.6 which is incompatible.
Installing collected packages: lz4
  Found existing installation: lz4 2.1.10
    Uninstalling lz4-2.1.10:
      Successfully uninstalled lz4-2.1.10
Successfully installed lz4-2.1.10


Collecting websocket
Collecting greenlet (from websocket)
  Using cached https://files.pythonhosted.org/packages/bf/45/142141aa47e01a5779f0fa5a53b81f8379ce8f2b1cd13df7d2f1d751ae42/greenlet-0.4.15-cp36-cp36m-manylinux1_x86_64.whl
Collecting gevent (from websocket)
  Using cached https://files.pythonhosted.org/packages/f2/ca/5b5962361ed832847b6b2f9a2d0452c8c2f29a93baef850bb8ad067c7bf9/gevent-1.4.0-cp36-cp36m-manylinux1_x86_64.whl
Installing collected packages: greenlet, gevent, websocket
  Found existing installation: greenlet 0.4.15
    Uninstalling greenlet-0.4.15:
      Successfully uninstalled greenlet-0.4.15
  Found existing installation: gevent 1.4.0
    Uninstalling gevent-1.4.0:
      Successfully uninstalled gevent-1.4.0
  Found existing installation: websocket 0.2.1
    Uninstalling websocket-0.2.1:
      Successfully uninstalled websocket-0.2.1
Successfully installed gevent-1.4.0 greenlet-0.4.15 websocket-0.2.1


Collecting websockets
  Using cached https://files.pythonhosted.org/packages/43/71/8bfa882b9c502c36e5c9ef6732969533670d2b039cbf95a82ced8f762b80/websockets-7.0-cp36-cp36m-manylinux1_x86_64.whl
ERROR: syft 0.1.21a1 has requirement msgpack>=0.6.1, but you'll have msgpack 0.5.6 which is incompatible.
Installing collected packages: websockets
  Found existing installation: websockets 7.0
    Uninstalling websockets-7.0:
      Successfully uninstalled websockets-7.0
Successfully installed websockets-7.0


Collecting zstd
ERROR: syft 0.1.21a1 has requirement msgpack>=0.6.1, but you'll have msgpack 0.5.6 which is incompatible.
Installing collected packages: zstd
  Found existing installation: zstd 1.4.0.0
    Uninstalling zstd-1.4.0.0:
      Successfully uninstalled zstd-1.4.0.0
Successfully installed zstd-1.4.0.0


In [0]:
import torch as th

In [109]:
x = th.tensor([1,2,3,4])
x

tensor([1, 2, 3, 4])

In [110]:
y = x + x
y

tensor([2, 4, 6, 8])

In [0]:
import syft as sy

In [112]:
hook = sy.TorchHook(th)

W0704 21:47:12.558662 140695636375424 hook.py:98] Torch was already hooked... skipping hooking process


In [113]:
th.tensor([1,2,3,4])

tensor([1, 2, 3, 4])

#Basic Remote Execution in PySyft
The essence of Federated Learning is the ability to train models in parallel on a large number of machines. Thus, we need the ability to tell remote machines to execute the operations required for Deep Learning.

To do this, we work with pointers to tensors instead of the tensors themselves. We start by creating a "pretend" machine owned by a "pretend" person, Bob.

In [0]:
bob = sy.VirtualWorker(hook, id="bob")

In [115]:
# inspect the objects bob currently holds
bob._objects

{81998041404: tensor([1, 2, 3, 4, 5])}

In [116]:
# create tensor
x = th.tensor([1,2,3,4])
x

tensor([1, 2, 3, 4])

In [0]:
# send the tensor to bob; x now holds a pointer to the remote tensor
x = x.send(bob)

In [118]:
# bob's objects now include the tensor we sent
bob._objects

{17804070558: tensor([1, 2, 3, 4]), 81998041404: tensor([1, 2, 3, 4, 5])}

In [119]:
x.location

<VirtualWorker id:bob #objects:2>

In [120]:
x.id_at_location

17804070558

In [121]:
x.id

36753592694

In [122]:
x.owner

<VirtualWorker id:me #objects:0>

In [123]:
x

(Wrapper)>[PointerTensor | me:36753592694 -> bob:17804070558]

In [124]:
# get the tensor back from bob
x = x.get()
x

tensor([1, 2, 3, 4])

In [125]:
# the tensor we sent is no longer among bob's objects
bob._objects

{81998041404: tensor([1, 2, 3, 4, 5])}
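
The send/get round trip above can be pictured with a toy model in plain Python. This is only a sketch of the idea that a pointer records where the data lives and under which id. The names `ToyWorker`, `ToyPointer` and `send` are hypothetical and do not reflect how PySyft is implemented.

```python
import random

class ToyWorker:
    """Stands in for a remote machine: it just stores objects by id."""
    def __init__(self, name):
        self.name = name
        self.objects = {}

class ToyPointer:
    """Local handle that records where the real data lives."""
    def __init__(self, location, id_at_location):
        self.location = location
        self.id_at_location = id_at_location

    def get(self):
        # Move the data back: remove it from the worker and return it
        return self.location.objects.pop(self.id_at_location)

def send(data, worker):
    """Store data on the worker and return a pointer to it."""
    remote_id = random.randrange(10**10)
    worker.objects[remote_id] = data
    return ToyPointer(worker, remote_id)

toy_bob = ToyWorker("bob")
ptr = send([1, 2, 3, 4], toy_bob)
stored_remotely = len(toy_bob.objects)   # one object on the worker
values = ptr.get()                       # the data comes back ...
left_on_worker = len(toy_bob.objects)    # ... and the worker no longer has it
print(stored_remotely, values, left_on_worker)
```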

##Project: Playing with Remote Tensors
Create another VirtualWorker called alice, then .send() a tensor to TWO workers and .get() it back.

In [0]:
alice = sy.VirtualWorker(hook, id="alice")

In [0]:
x = th.tensor([1,2,3,4])

In [0]:
# send tensor to both workers
x_pointer = x.send(bob, alice)

In [129]:
# Multi-Pointer object
x_pointer

(Wrapper)>[MultiPointerTensor]
	-> (Wrapper)>[PointerTensor | me:30995196446 -> bob:67129709531]
	-> (Wrapper)>[PointerTensor | me:47323543206 -> alice:57091047014]

In [130]:
# Multi-Pointer's children
x_pointer.child.child

{'alice': (Wrapper)>[PointerTensor | me:47323543206 -> alice:57091047014],
 'bob': (Wrapper)>[PointerTensor | me:30995196446 -> bob:67129709531]}

In [131]:
bob._objects

{67129709531: tensor([1, 2, 3, 4]), 81998041404: tensor([1, 2, 3, 4, 5])}

In [132]:
alice._objects

{57091047014: tensor([1, 2, 3, 4])}

In [133]:
# get the tensors back from both workers
x_pointer.get()

[tensor([1, 2, 3, 4]), tensor([1, 2, 3, 4])]

In [134]:
bob._objects

{81998041404: tensor([1, 2, 3, 4, 5])}

In [135]:
alice._objects

{}

In [0]:
# chain tensor creation and sending it to workers
x = th.tensor([1,2,3,4]).send(bob, alice)

In [137]:
bob._objects

{20747919031: tensor([1, 2, 3, 4]), 81998041404: tensor([1, 2, 3, 4, 5])}

In [138]:
alice._objects

{94646269265: tensor([1, 2, 3, 4])}

In [139]:
# get the tensors back from both workers and sum them
x.get(sum_results=True)

tensor([2, 4, 6, 8])
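
As the output shows, `sum_results=True` yields the element-wise sum of the copies held by the workers instead of a list of tensors. A plain-Python picture of that reduction, meant as an illustration of the arithmetic only and not of PySyft's implementation:

```python
# Copies of the same tensor as they would come back from bob and alice
copies = [[1, 2, 3, 4], [1, 2, 3, 4]]

# Element-wise sum across the copies
summed = [sum(values) for values in zip(*copies)]
print(summed)
```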

##Lesson: Introducing Remote Arithmetic

In [0]:
# create two tensors and send both to the same worker so they can be added remotely
x = th.tensor([1,2,3,4]).send(bob)
y = th.tensor([2,2,2,2]).send(bob)

In [141]:
x

(Wrapper)>[PointerTensor | me:80234279436 -> bob:80378129223]

In [142]:
y

(Wrapper)>[PointerTensor | me:43881539793 -> bob:33542592074]

In [143]:
# adding two remote tensors returns a pointer to the remote result
z = x + y
z

(Wrapper)>[PointerTensor | me:37651134916 -> bob:76627089940]

In [144]:
z = z.get()
z

tensor([3, 4, 5, 6])

In [145]:
z = th.add(x,y)
z

(Wrapper)>[PointerTensor | me:41282028287 -> bob:82269945241]

In [146]:
z = z.get()
z

tensor([3, 4, 5, 6])

In [0]:
x = th.tensor([1.,2,3,4], requires_grad=True).send(bob)
y = th.tensor([2.,2,2,2], requires_grad=True).send(bob)

In [148]:
z = (x + y).sum()
z

(Wrapper)>[PointerTensor | me:40157947106 -> bob:67497578024]

In [149]:
z.backward()

(Wrapper)>[PointerTensor | me:30651849921 -> bob:51706651522]

In [150]:
x = x.get()
x

tensor([1., 2., 3., 4.], requires_grad=True)

In [151]:
x.grad

tensor([1., 1., 1., 1.])
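
The gradient of ones follows directly from how z is defined. Writing the computation out:

$$
z = \sum_{i=1}^{4} (x_i + y_i), \qquad \frac{\partial z}{\partial x_i} = 1 \quad \text{for every } i,
$$

so `x.grad` is `[1, 1, 1, 1]` regardless of the values stored in `x` and `y`.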

#Project: Learn a Simple Linear Model
Create a simple linear model and train it on the small dataset below, using only Variables and .backward() (no optimizers or nn.Modules). Both the data and the model should be located on the worker's machine.

In [0]:
input = th.tensor([[1.,1],[0,1,],[1,0],[0,0]], requires_grad=True).send(bob)
target = th.tensor([[1.],[1],[0],[0]], requires_grad=True).send(bob)

In [153]:
input

(Wrapper)>[PointerTensor | me:95531835635 -> bob:25990831963]

In [154]:
target

(Wrapper)>[PointerTensor | me:81329093958 -> bob:11885460063]

In [0]:
weights = th.tensor([[0.],[0.]], requires_grad=True).send(bob)

In [156]:
weights

(Wrapper)>[PointerTensor | me:45902744420 -> bob:44312842320]

In [157]:
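for i in range(10):
    # forward pass, executed on bob's machine
    pred = input.mm(weights)
    # sum-of-squares loss; still a pointer to a remote scalar
    loss = ((pred - target)**2).sum()
    # backpropagate remotely
    loss.backward()
    # gradient-descent step with learning rate 0.1, then reset the gradient
    weights.data.sub_(weights.grad * 0.1)
    weights.grad *= 0
    # fetch the loss value back to the local machine for printing
    print(loss.get().data)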

tensor(2.)
tensor(0.5600)
tensor(0.2432)
tensor(0.1372)
tensor(0.0849)
tensor(0.0538)
tensor(0.0344)
tensor(0.0220)
tensor(0.0141)
tensor(0.0090)
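
The printed losses can be checked by hand. The sketch below repeats the same update rule (summed squared error, learning rate 0.1, weights starting at zero) with plain Python lists, so it runs without PySyft or a remote worker. The variable names are chosen for this illustration only.

```python
# Same data, initial weights, learning rate and loss as the loop above,
# written with plain Python lists.
inputs = [[1.0, 1.0], [0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
targets = [1.0, 1.0, 0.0, 0.0]
weights = [0.0, 0.0]
lr = 0.1

losses = []
for _ in range(10):
    # forward pass: pred = inputs @ weights
    preds = [sum(x * w for x, w in zip(row, weights)) for row in inputs]
    errors = [p - t for p, t in zip(preds, targets)]
    losses.append(sum(e * e for e in errors))

    # gradient of the summed squared error: 2 * inputs^T @ errors
    grads = [2 * sum(e * row[j] for e, row in zip(errors, inputs))
             for j in range(len(weights))]

    # plain gradient-descent step
    weights = [w - lr * g for w, g in zip(weights, grads)]

print([round(loss, 4) for loss in losses])
```

The first gradient is `[-2, -4]`, which moves the weights to `[0.2, 0.4]` and explains the drop from 2.0 to 0.56 between the first two iterations.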


#Lesson: Garbage Collection and Common Errors

In [0]:
bob = bob.clear_objects()

In [159]:
bob._objects

{}

In [0]:
x = th.tensor([1,2,3,4,5]).send(bob)

In [161]:
bob._objects

{}

In [0]:
del x

In [163]:
bob._objects

{}

In [0]:
x = th.tensor([1,2,3,4,5]).send(bob)

In [165]:
bob._objects

{}

In [0]:
x = "asdf"

In [167]:
bob._objects

{}

In [0]:
x = th.tensor([1,2,3,4,5]).send(bob)

In [169]:
x

(Wrapper)>[PointerTensor | me:48704028804 -> bob:18087747578]

In [170]:
bob._objects

{}

In [0]:
x = "asdf"

In [172]:
bob._objects

{}

In [0]:
del x

In [174]:
bob._objects

{}

In [175]:
bob = bob.clear_objects()
bob._objects

{}

In [0]:
for i in range(1000):
    x = th.tensor([1,2,3,4,5]).send(bob)

In [177]:
bob._objects

{}
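
The recorded outputs in this section are empty at every step, so they do not show what happens to remote data when a pointer goes away. The following toy sketch in plain Python illustrates one mechanism a garbage-collection lesson like this is usually about: a pointer whose destructor also removes the remote copy. The classes are hypothetical, the sketch relies on CPython's reference counting, and it is not PySyft's implementation.

```python
class ToyWorker:
    """Stands in for a remote machine holding objects by id."""
    def __init__(self):
        self.objects = {}
        self._next_id = 0

    def store(self, data):
        self._next_id += 1
        self.objects[self._next_id] = data
        return self._next_id

class ToyPointer:
    """Local handle; the remote copy lives only as long as this object."""
    def __init__(self, worker, data):
        self.worker = worker
        self.remote_id = worker.store(data)

    def __del__(self):
        # When the pointer is garbage-collected, drop the remote copy too
        self.worker.objects.pop(self.remote_id, None)

toy_bob = ToyWorker()

x = ToyPointer(toy_bob, [1, 2, 3, 4, 5])
after_send = len(toy_bob.objects)

x = "asdf"                          # rebinding the name drops the last reference
after_rebind = len(toy_bob.objects)

for _ in range(1000):
    x = ToyPointer(toy_bob, [1, 2, 3, 4, 5])
after_loop = len(toy_bob.objects)   # only the copy behind the current x remains

print(after_send, after_rebind, after_loop)
```

Under this assumption, creating a thousand pointers in a loop does not leave a thousand objects on the worker, because each reassignment of `x` frees the previous one.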

In [0]:
x = th.tensor([1,2,3,4,5]).send(bob)
y = th.tensor([1,1,1,1,1])

In [204]:
# this will not work: x lives on bob's machine while y is a local tensor
z = x + y

TensorsNotCollocatedException: ignored

In [0]:
x = th.tensor([1,2,3,4,5]).send(bob)
y = th.tensor([1,1,1,1,1]).send(alice)

In [203]:
# this will not work either: x and y were sent to different workers
x + y

TensorsNotCollocatedException: ignored
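
Both failures come down to the same rule: an operation can only run where all of its operands live. A toy sketch of such a check in plain Python follows. `ToyPointer` and `NotCollocatedError` are hypothetical names for this illustration and are not PySyft classes.

```python
class NotCollocatedError(Exception):
    """Raised when operands do not live on the same (toy) worker."""

class ToyPointer:
    def __init__(self, location, data):
        self.location = location   # name of the worker holding the data
        self.data = data           # stands in for the remote tensor

    def __add__(self, other):
        # Local values and pointers to other workers are both rejected
        if not isinstance(other, ToyPointer) or other.location != self.location:
            raise NotCollocatedError("operands are not on the same worker")
        summed = [a + b for a, b in zip(self.data, other.data)]
        return ToyPointer(self.location, summed)

x = ToyPointer("bob", [1, 2, 3, 4, 5])
y_alice = ToyPointer("alice", [1, 1, 1, 1, 1])
y_bob = ToyPointer("bob", [1, 1, 1, 1, 1])

try:
    x + y_alice
    failed = False
except NotCollocatedError:
    failed = True

result = (x + y_bob).data
print(failed, result)
```

In the notebook itself, the corresponding fix is to send both operands to the same worker before combining them, as in the remote arithmetic lesson where `x` and `y` were both sent to bob.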