# Tutorial: Private Deep Learning

Welcome to PySyft's introductory tutorial for privacy preserving deep learning. This series of notebooks is a step-by-step guide for you to get to know the new tools and techniques required for doing deep learning on secret data. 

## Why Take This Tutorial?

**1) A Competitive Career Advantage** - For the past 20 years, the digital revolution has made data more and more accessible in ever larger quantities as analog processes have become digitized. However, with new regulation such as [GDPR](https://eugdpr.org/), enterprises have less freedom in how they use - and more importantly how they analyze - personal information. **Bottom Line:** Data Scientists aren't going to have access to as much data with "old school" tools, but by learning the tools of Private Deep Learning, YOU can be ahead of this curve and have a competitive advantage in your career.

**2) Entrepreneurial Opportunities** - There are a whole host of problems in society that Deep Learning can solve, but many of the most important haven't been explored because it would require access to incredibly sensitive information about people (consider using Deep Learning to help people with mental or relationship issues!). Thus, learning Private Deep Learning unlocks a whole host of new startup opportunities for you which were not previously available to others without these toolsets.

**3) Social Good** - Deep Learning can be used to solve a wide variety of problems in the real world, but Deep Learning on *personal information* is Deep Learning about people, *for people*. Learning how to do Deep Learning on data you don't own represents more than a career or entrepreneurial opportunity, it is the opportunity to help solve some of the most personal and important problems in people's lives - and to do it at scale.


## What do I need to know?

**1) Deep Learning with PyTorch** - if you don't know it yet, just hop on over to [fast.ai](https://fast.ai), learn Deep Learning and PyTorch, and come back here.

## How do I get extra credit?

- Star PySyft on GitHub! - [https://github.com/OpenMined/PySyft](https://github.com/OpenMined/PySyft)
- Make a YouTube video teaching this notebook!


... ok ... let's do this!

# Part 0: Setup

To begin, you'll need to make sure you have the right things installed. To do so, head on over to PySyft's README and follow the setup instructions. The TL;DR for most folks is:

- Install Python 3.5 or higher
- Install PyTorch 0.3.1 (it MUST be this version)
- Clone PySyft (git clone https://github.com/OpenMined/PySyft.git)
- cd PySyft
- pip install -r requirements.txt
- python setup.py install
- python setup.py test
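
Before working through the list above, it can help to confirm that your interpreter meets the Python requirement. The snippet below is a minimal sketch using only the standard library; `python_is_supported` and `MINIMUM_PYTHON` are names made up for this illustration, and it does not check the PyTorch version.

```python
import sys

MINIMUM_PYTHON = (3, 5)  # taken from the setup list above


def python_is_supported(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


print("Python", sys.version.split()[0], "supported:", python_is_supported())
```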

If any part of this doesn't work for you (or any of the tests fail) - first check the [README](https://github.com/OpenMined/PySyft.git) for installation help and then open a GitHub issue or ping the #beginner channel in our Slack! [slack.openmined.org](http://slack.openmined.org/)

In [1]:
# Run this cell to see if things work
import syft as sy
hook = sy.TorchHook() # always run this when you import syft

sy.FloatTensor([1,2,3,4,5])


 1
 2
 3
 4
 5
[syft.core.frameworks.torch.tensor.FloatTensor of size 5]

If this cell executed, then you're off to the races! Let's do this!

# Part 1: The Basic Tools of Private Data Science

So - the first question you may be asking is - how in the world do we train a model on data we don't have access to?

Well, the answer is surprisingly simple. If you're used to working in PyTorch, then you're used to working with torch.Tensor objects like these!

In [2]:
import torch
x = torch.FloatTensor([1,2,3,4,5])
y = x + x
print(y[0])

2.0


Obviously, using these super fancy (and powerful!) tensors is important, but also requires you to have the data on your local machine. This is where our journey begins. 

# Section 1.1 - Sending Tensors to Bob's Machine

Instead of Tensors - we're now going to work with **pointers** to tensors. Let me show you what I mean. First, let's create a "pretend" machine owned by a "pretend" person - we'll call him Bob.

In [3]:
bob = sy.VirtualWorker(id="bob")

For all intents and purposes, Bob's machine is on another planet - perhaps on Mars! But, at the moment the machine is empty. Let's create some data so that we can send it to Bob and learn about pointers!

In [4]:
x = torch.FloatTensor([1,2,3,4,5])
y = torch.FloatTensor([1,1,1,1,1])

And now - let's send our tensors to Bob!!

In [5]:
x_ptr = x.send(bob)
y_ptr = y.send(bob)

In [6]:
x_ptr

FloatTensor[_PointerTensor - id:9588549167 owner:me loc:bob id@loc:22773152532]

BOOM! Now Bob has two tensors! Don't believe me? Have a look for yourself!

In [7]:
bob._objects

{22773152532: [_LocalTensor - id:22773152532 owner:bob],
 55142277162: [_LocalTensor - id:55142277162 owner:bob]}

Now notice something. When we print our pointer...

In [8]:
x_ptr

FloatTensor[_PointerTensor - id:9588549167 owner:me loc:bob id@loc:22773152532]

We see some useful metadata! First, the ID of our pointer is 9588549167, which makes sense... it was allocated a random ID. However, there are also a few other pieces of metadata.

- loc: bob
- id@loc: 22773152532
- owner: me

Hopefully the naming of these attributes is quite intuitive. "loc" is short for "location" and it is a reference to the location that the pointer is pointing to! See?

In [9]:
x_ptr.location

<syft.core.workers.virtual.VirtualWorker id:bob>

In [10]:
bob

<syft.core.workers.virtual.VirtualWorker id:bob>

In [11]:
bob == x_ptr.location

True

The "id@loc" attribute is similar. It tells us the ID that the Tensor object on Bob's machine has (the one that we're pointing to). See?

In [12]:
x_ptr.id_at_location

22773152532

In [13]:
bob._objects[x_ptr.id_at_location].parent


 1
 2
 3
 4
 5
[syft.core.frameworks.torch.tensor.FloatTensor of size 5]

Then we have the third attribute, "owner: me", which is very similar to ".location". However, instead of specifying where the pointer is pointing, it specifies the owner of the pointer itself, which is me. Fun fact: just like we had a VirtualWorker object for Bob, we (by default) always have one for us as well. This worker is automatically created when we "import syft".

In [14]:
me = sy.local_worker
me

<syft.core.workers.virtual.VirtualWorker id:me>

In [15]:
x_ptr.owner

<syft.core.workers.virtual.VirtualWorker id:me>

In [16]:
me == x_ptr.owner

True

And finally, just like we can call .send() on a tensor, we can call .get() on a pointer to a tensor to get it back!!!

In [17]:
x_ptr

FloatTensor[_PointerTensor - id:9588549167 owner:me loc:bob id@loc:22773152532]

In [18]:
x_ptr.get()


 1
 2
 3
 4
 5
[syft.core.frameworks.torch.tensor.FloatTensor of size 5]

In [19]:
y_ptr

FloatTensor[_PointerTensor - id:6250808360 owner:me loc:bob id@loc:55142277162]

In [20]:
y_ptr.get()


 1
 1
 1
 1
 1
[syft.core.frameworks.torch.tensor.FloatTensor of size 5]

In [21]:
bob._objects

{}

And as you can see... Bob no longer has the tensors!!! They've moved back to our machine!
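
To make the bookkeeping above concrete, here is a minimal toy sketch in plain Python of what sending and getting do to a worker's object store. `ToyWorker`, `ToyPointer` and `send` are hypothetical names invented for this illustration, and plain lists stand in for tensors. This is not PySyft's implementation; the real library does considerably more.

```python
import random


class ToyWorker:
    """A pretend machine that stores objects by ID (hypothetical, not PySyft's VirtualWorker)."""

    def __init__(self, worker_id):
        self.id = worker_id
        self._objects = {}  # maps object ID -> the stored data


class ToyPointer:
    """Holds only metadata about a remote object, never the data itself."""

    def __init__(self, owner, location, id_at_location):
        self.id = random.getrandbits(32)      # the pointer's own random ID
        self.owner = owner                    # the worker that holds this pointer
        self.location = location              # the worker that holds the data
        self.id_at_location = id_at_location  # the data's ID on that worker

    def get(self):
        # Remove the data from the remote worker and hand it back to the caller.
        return self.location._objects.pop(self.id_at_location)


def send(data, owner, location):
    """Store `data` on `location` and return a pointer owned by `owner`."""
    remote_id = random.getrandbits(32)
    location._objects[remote_id] = data
    return ToyPointer(owner, location, remote_id)


me = ToyWorker("me")
bob = ToyWorker("bob")

x_ptr = send([1, 2, 3, 4, 5], owner=me, location=bob)
print(x_ptr.location.id, x_ptr.owner.id)     # bob me
print(x_ptr.id_at_location in bob._objects)  # True

x = x_ptr.get()
print(x)             # [1, 2, 3, 4, 5]
print(bob._objects)  # {}
```

Note that in this sketch `get()` removes the entry from the worker's dictionary, which mirrors how Bob's `_objects` is empty after the `.get()` calls above.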

# Section 1.2 - Using Tensor Pointers

So, sending and receiving tensors from Bob is great, but this is hardly Deep Learning! We want to be able to perform tensor _operations_ on remote tensors. Fortunately, tensor pointers make this quite easy!! You can just use pointers like you would normal tensors!

In [22]:
x = sy.FloatTensor([1,2,3,4,5]).send(bob)
y = sy.FloatTensor([1,1,1,1,1]).send(bob)

In [23]:
z = x + y

In [24]:
z

FloatTensor[_PointerTensor - id:7444257226 owner:me loc:bob id@loc:8176688069]

And voila! As you can see, the operation executed on Bob's machine and then returned a pointer to the new result back to us! If we call .get() on the pointer, we will then receive the result back on our machine!

In [25]:
z.get()


 2
 3
 4
 5
 6
[syft.core.frameworks.torch.tensor.FloatTensor of size 5]
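
One way to picture what happens when two pointers are added is the toy sketch below, written in plain Python. The addition runs where the data lives, and only the ID of the result comes back, wrapped in a new pointer. All names here (`ToyWorker`, `ToyPointer`, `send`, `add`) are hypothetical and lists stand in for tensors; this is an illustration of the idea, not how PySyft is actually implemented.

```python
import itertools

_next_id = itertools.count(1)  # simple ID generator for this sketch


class ToyWorker:
    def __init__(self, worker_id):
        self.id = worker_id
        self._objects = {}

    def add(self, id_a, id_b):
        # Runs "on the worker": both operands are looked up in its own store.
        a, b = self._objects[id_a], self._objects[id_b]
        result_id = next(_next_id)
        self._objects[result_id] = [u + v for u, v in zip(a, b)]
        return result_id  # only the ID travels back, not the data


class ToyPointer:
    def __init__(self, location, id_at_location):
        self.location = location
        self.id_at_location = id_at_location

    def __add__(self, other):
        # Ask the remote worker to do the work, then point at the result.
        result_id = self.location.add(self.id_at_location, other.id_at_location)
        return ToyPointer(self.location, result_id)

    def get(self):
        return self.location._objects.pop(self.id_at_location)


def send(data, worker):
    obj_id = next(_next_id)
    worker._objects[obj_id] = data
    return ToyPointer(worker, obj_id)


bob = ToyWorker("bob")
x = send([1, 2, 3, 4, 5], bob)
y = send([1, 1, 1, 1, 1], bob)

z = x + y                  # z is another pointer; the sum lives on bob
print(len(bob._objects))   # 3

result = z.get()           # now the result moves to us
print(result)              # [2, 3, 4, 5, 6]
```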

### Torch Functions

This API has been extended to all of Torch's operations!!!

In [26]:
z = torch.add(x,y)
z

FloatTensor[_PointerTensor - id:8604781857 owner:me loc:bob id@loc:1040860252]

In [27]:
z.get()


 2
 3
 4
 5
 6
[syft.core.frameworks.torch.tensor.FloatTensor of size 5]

### Variables (including backpropagation!)

In [28]:
x = sy.Variable(sy.FloatTensor([1,2,3,4,5]), requires_grad=True).send(bob)
y = sy.Variable(sy.FloatTensor([1,1,1,1,1]), requires_grad=True).send(bob)

In [29]:
z = (x + y).sum()

In [30]:
z.backward()

In [31]:
x = x.get()

In [32]:
x

Variable containing:
 1
 2
 3
 4
 5
[syft.core.frameworks.torch.tensor.FloatTensor of size 5]

In [33]:
x.grad

Variable containing:
 1
 1
 1
 1
 1
[syft.core.frameworks.torch.tensor.FloatTensor of size 5]
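
The gradient printed above is all ones because z is the sum of every element of x + y, so increasing any single element of x by some amount increases z by the same amount. The plain-Python sketch below checks this with a finite-difference approximation. It uses neither PySyft nor Torch, and `numerical_grad` is a helper defined only for this illustration.

```python
def z(x, y):
    """Same computation as (x + y).sum(), on plain lists."""
    return sum(a + b for a, b in zip(x, y))


def numerical_grad(f, x, y, h=1e-6):
    """Approximate df/dx_i by nudging one element of x at a time."""
    base = f(x, y)
    grad = []
    for i in range(len(x)):
        bumped = list(x)
        bumped[i] += h
        grad.append((f(bumped, y) - base) / h)
    return grad


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 1.0, 1.0, 1.0, 1.0]

grad = numerical_grad(z, x, y)
print([round(g, 3) for g in grad])  # [1.0, 1.0, 1.0, 1.0, 1.0]
```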

So as you can see, the API is really quite flexible and capable of performing nearly any operation you would normally perform in Torch on *remote data*. This lays the groundwork for our more advanced privacy preserving protocols such as Federated Learning, Secure Multi-Party Computation, and Differential Privacy.

# Bonus Section - Common Errors

If you try to do an operation between two tensors which aren't on the same machine, you'll get an error that looks like this!!!

In [34]:
x = sy.FloatTensor([1,2,3,4,5]).send(bob)
y = sy.FloatTensor([1,1,1,1,1])

In [29]:
z = y + x

Exception: All tensors must be on the same machine. 

You just tried to call an operation between a tensor on your local machine and a tensor on remote worker 'bob'. Call .send("bob") on the local tensor (the one with ID:1729512448) or .get() on the remote tensor (the one with ID:3097104970)
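
The kind of check that produces an error like this can be sketched as follows. This is a toy illustration in plain Python, not PySyft's code: `ToyTensor`, `check_same_location` and the error wording are all hypothetical, and setting `location` by hand stands in for sending the tensor.

```python
class ToyTensor:
    def __init__(self, data, location="me"):
        self.data = data
        self.location = location  # name of the worker holding the data


def check_same_location(*tensors):
    """Return the shared location, or raise if the tensors live on different workers."""
    locations = {t.location for t in tensors}
    if len(locations) > 1:
        raise ValueError(
            "All tensors must be on the same machine, got: "
            + ", ".join(sorted(locations))
        )
    return locations.pop()


x = ToyTensor([1, 2, 3, 4, 5], location="bob")
y = ToyTensor([1, 1, 1, 1, 1])  # still local

try:
    check_same_location(y, x)
except ValueError as err:
    print(err)  # All tensors must be on the same machine, got: bob, me

y.location = "bob"  # stands in for sending y to bob
print(check_same_location(y, x))  # bob
```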

Or, alternatively, if you try to interact with pointers to tensors that no longer exist on a worker, you'll see an error like this!!!

In [30]:
x = sy.FloatTensor([1,2,3,4,5]).send(bob)

# delete all objects on bob
bob._objects = {}

y = x + x

Exception: Tensor "26632801091" not found on worker "bob"!!!

You just tried to interact with an object ID:26632801091 on worker bob which does not exist!!! Use .send() and .get() on all your tensors to make sure they're on the same machines.

If you think this tensor does exist, check the ._objects dictionary on the worker and see for yourself!!! The most common reason this error happens is because someone calls .get() on the object's pointer without realizing it (which deletes the remote object and sends it to the pointer). Check your code to make sure you haven't already called .get() on this pointer!!!
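
The stale-pointer situation described above can be reproduced with a small toy sketch in plain Python. `ToyWorker` and `fetch_and_remove` are hypothetical names for this illustration, and the ID 42 is arbitrary; the point is only that the first retrieval removes the object, so a second attempt with the same ID fails.

```python
class ToyWorker:
    def __init__(self, worker_id):
        self.id = worker_id
        self._objects = {}

    def fetch_and_remove(self, obj_id):
        # Mimics the "get" behaviour: the object leaves the worker when fetched.
        if obj_id not in self._objects:
            raise KeyError(
                'Tensor "{}" not found on worker "{}"'.format(obj_id, self.id)
            )
        return self._objects.pop(obj_id)


bob = ToyWorker("bob")
bob._objects[42] = [1, 2, 3, 4, 5]  # pretend we sent a tensor and kept its ID

first = bob.fetch_and_remove(42)    # like the first .get(): data moves back to us
print(first)                        # [1, 2, 3, 4, 5]
print(42 in bob._objects)           # False

try:
    bob.fetch_and_remove(42)        # like using the stale pointer a second time
except KeyError as err:
    print("Error:", err.args[0])    # Error: Tensor "42" not found on worker "bob"
```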