# Lesson: Introducing Federated Learning

Federated Learning is a technique for training Deep Learning models on data to which you do not have access. Basically:

Federated Learning: Instead of bringing all the data to one machine and training a model, we bring the model to the data, train it locally, and merely upload "model updates" to a central server.
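To make the idea concrete, here is a minimal sketch of one common way to combine model updates, a size-weighted average of locally trained weights. It uses only the Python standard library. The names `local_update` and `federated_average`, the toy linear model, and the three simulated clients are assumptions made for illustration, not part of PySyft.

```python
import random

def local_update(weights, data, lr=0.1, epochs=20):
    """Train y = w*x + b on one client's private data, starting from the global weights."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_average(client_weights, client_sizes):
    """Combine client weights into one model, weighting each client by its dataset size."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)

random.seed(0)

# Three hypothetical clients; each holds private samples of y = 3x + 1.
clients = []
for _ in range(3):
    xs = [random.uniform(-1, 1) for _ in range(20)]
    clients.append([(x, 3 * x + 1) for x in xs])

global_weights = (0.0, 0.0)
for _ in range(30):
    # Only the trained weights leave each client; the raw data never does.
    updates = [local_update(global_weights, data) for data in clients]
    global_weights = federated_average(updates, [len(d) for d in clients])

print(global_weights)
```

The server loop only ever sees the `(w, b)` pairs returned by each client, which is the "model updates" part of the definition above.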

Use Cases:

    - app companies (e.g. a text prediction app)
    - predictive maintenance (automobiles / industrial engines)
    - wearable medical devices
    - ad blockers / autocomplete in browsers (Firefox/Brave)
    
Challenge Description: data is distributed amongst sources but we cannot aggregate it because of:

    - privacy concerns: legal, user discomfort, competitive dynamics
    - engineering: the bandwidth/storage requirements of aggregating the larger dataset

# Lesson: Introducing / Installing PySyft

In order to perform Federated Learning, we need to be able to use Deep Learning techniques on remote machines. This will require a new set of tools. Specifically, we will use an extension of PyTorch called PySyft.

### Install PySyft

The easiest way to install the required libraries is with [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/overview.html). Create a new environment, then install the dependencies in that environment. In your terminal:

```bash
conda create -n pysyft python=3
conda activate pysyft # some older versions of conda require "source activate pysyft" instead.
conda install jupyter notebook
pip install syft
pip install numpy
```

If you see any errors relating to zstd, run the following (skip this step if everything above installed without errors):

```
pip install --upgrade --force-reinstall zstd
```

and then retry installing syft (`pip install syft`).

If you are using Windows, I suggest installing [Anaconda and using the Anaconda Prompt](https://docs.anaconda.com/anaconda/user-guide/getting-started/) to work from the command line. 

With this environment activated and in the repo directory, launch Jupyter Notebook:

```bash
jupyter notebook
```

and re-open this notebook on the new Jupyter server.

If any part of this doesn't work for you (or any of the tests fail), first check the [README](https://github.com/OpenMined/PySyft.git) for installation help, and then open a GitHub Issue or ping the #beginner channel in our Slack! [slack.openmined.org](http://slack.openmined.org/)
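Before asking for help, it can be useful to confirm which packages the active environment can actually see. The following is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the list of required names is taken from the install commands above.

```python
import importlib.util
import sys

def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

required = ["torch", "syft", "numpy"]
missing = missing_packages(required)

print("Python", sys.version.split()[0])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If a package shows up as missing even though you installed it, the notebook is probably running in a different environment than the one you activated.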

In [1]:
import torch as t

In [57]:
x = t.tensor([1,2,3,4,5])
x

tensor([1, 2, 3, 4, 5])

In [5]:
y = x + x
y

tensor([ 2,  4,  6,  8, 10])

In [8]:
import syft as sy

In [9]:
hook = sy.TorchHook(t)  # behind the scenes, this modifies a large part of the PyTorch API
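To get a feel for what "modifying the API behind the scenes" can mean, here is a toy sketch of attaching a new method to an existing class at runtime. `ToyTensor`, `toy_hook`, and `describe` are made-up names for illustration; this is not how PySyft's hook is implemented, only the general Python mechanism such a hook could rely on.

```python
class ToyTensor:
    """Stand-in for a tensor class (this is NOT torch.Tensor)."""

    def __init__(self, values):
        self.values = list(values)

def toy_hook(tensor_cls):
    """Attach a new method to an existing class at runtime."""
    def describe(self):
        return f"ToyTensor with {len(self.values)} values"

    # Assigning to the class makes the method available on every instance,
    # including instances created before the hook ran.
    tensor_cls.describe = describe

toy_x = ToyTensor([1, 2, 3])
had_method_before = hasattr(toy_x, "describe")

toy_hook(ToyTensor)
print(had_method_before, toy_x.describe())
```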



# Lesson: Basic Remote Execution in PySyft

## PySyft => Remote PyTorch

The essence of Federated Learning is the ability to train models in parallel on a large number of machines. Thus, we need the ability to tell remote machines to execute the operations required for Deep Learning.

So instead of using Torch tensors, we're now going to work with **pointers** to tensors. Let me show you what I mean. First, let's create a "pretend" machine owned by a "pretend" person - we'll call him Bob.

In [10]:
bob = sy.VirtualWorker(hook, id="bob")

In [11]:
bob._objects

{}

In [12]:
x = t.tensor([1,2,3,4,5])

In [58]:
x = x.send(bob)  # send the tensor to bob; x is now a pointer to it

In [14]:
bob._objects

{21992865125: tensor([1, 2, 3, 4, 5])}

But what is returned when we send it?
> ANSWER: A pointer to the remote object.<br>

A pointer is a kind of tensor and exposes the full tensor API, so it accepts commands just like a normal tensor would.<br>
Each command is serialised to a simple JSON or tuple format and sent to Bob (the worker); Bob then executes it on our behalf and returns a new pointer to the resulting object.
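The mechanism described above can be sketched in plain Python. This is a toy model under stated assumptions, not PySyft's implementation: `ToyWorker`, `ToyPointer`, `toy_send`, and the `("add", id, id)` message format are all invented for illustration, and "tensors" are plain lists.

```python
import itertools

_ids = itertools.count(1)  # simple id generator for stored objects

class ToyWorker:
    """A pretend remote machine that stores objects by id and runs commands."""

    def __init__(self, name):
        self.name = name
        self.objects = {}

    def store(self, value):
        obj_id = next(_ids)
        self.objects[obj_id] = value
        return obj_id

    def execute(self, message):
        # The message is a plain tuple: (operation, left id, right id).
        op, left_id, right_id = message
        if op != "add":
            raise ValueError(f"unsupported op: {op}")
        left, right = self.objects[left_id], self.objects[right_id]
        result = [a + b for a, b in zip(left, right)]
        # The result stays on the worker; only its id travels back.
        return self.store(result)

class ToyPointer:
    """Knows where an object lives and its id there, but holds no data."""

    def __init__(self, worker, id_at_location):
        self.location = worker
        self.id_at_location = id_at_location

    def __add__(self, other):
        if other.location is not self.location:
            raise ValueError("both operands must live on the same worker")
        message = ("add", self.id_at_location, other.id_at_location)
        new_id = self.location.execute(message)
        return ToyPointer(self.location, new_id)

    def get(self):
        # Move the object back to the caller and remove it from the worker.
        return self.location.objects.pop(self.id_at_location)

def toy_send(values, worker):
    return ToyPointer(worker, worker.store(list(values)))

toy_bob = ToyWorker("bob")
x_toy = toy_send([1, 2, 3, 4, 5], toy_bob)
y_toy = x_toy + x_toy      # executed on toy_bob, returns a new pointer
result = y_toy.get()       # brings the result back and removes it from toy_bob
print(result)
```

Note that `x_toy + x_toy` never touches the data locally: the pointer builds a message, the worker does the arithmetic, and we only receive another pointer.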

In [16]:
x.location #x is pointing to bob

<VirtualWorker id:bob #objects:1>

In [19]:
x.location == bob

True

In [21]:
x.id_at_location  # the id of the tensor on bob's machine

21992865125

In [23]:
x.id  # the pointer itself also has an id on our local worker

96473216663

In [24]:
x.owner

<VirtualWorker id:me #objects:0>

In [25]:
hook.local_worker

<VirtualWorker id:me #objects:0>

In [26]:
x

(Wrapper)>[PointerTensor | me:96473216663 -> bob:21992865125]

In [59]:
x = x.get()  # retrieve the tensor from bob
x

tensor([1, 2, 3, 4, 5])

In [67]:
bob._objects  # empty again, because .get() moved the tensor back to us

{}

As you can see, this allows us to do everything that PyTorch can normally do, but executed on arbitrary remote machines.

# Project: Playing with Remote Tensors

In this project, I want you to .send() and .get() a tensor to TWO workers by calling .send(bob,alice). This will first require the creation of another VirtualWorker called alice.

In [68]:
alice = sy.VirtualWorker(hook, id="alice")

In [76]:
x = t.tensor([1,2,3,4,5])

In [77]:
x_ptr = x.send(bob, alice)

In [78]:
x_ptr

(Wrapper)>[MultiPointerTensor]
	-> (Wrapper)>[PointerTensor | me:26935489499 -> bob:53596775586]
	-> (Wrapper)>[PointerTensor | me:99135673241 -> alice:27803641852]

A MultiPointerTensor is a pointer that points to multiple machines.<br>
It has an attribute called `child`, which is a dictionary mapping each worker's id to the pointer for that worker.

In [79]:
x_ptr.child.child

{'bob': (Wrapper)>[PointerTensor | me:26935489499 -> bob:53596775586],
 'alice': (Wrapper)>[PointerTensor | me:99135673241 -> alice:27803641852]}

In [80]:
x_ptr.get()  # returns one tensor per worker, so two here

[tensor([1, 2, 3, 4, 5]), tensor([1, 2, 3, 4, 5])]

In [81]:
x = t.tensor([1,2,3,4,5]).send(bob, alice)

In [82]:
x.get(sum_results=True)

tensor([ 2,  4,  6,  8, 10])
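The output above is consistent with `sum_results=True` adding the per-worker copies together element by element. As a plain-Python sketch of that reduction (the function name `sum_copies` is hypothetical, and lists stand in for tensors):

```python
def sum_copies(copies):
    """Element-wise sum of equally sized lists, one list per worker."""
    return [sum(column) for column in zip(*copies)]

# One copy of the tensor came back from each of the two workers.
copies = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]
summed = sum_copies(copies)
print(summed)
```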