In [4]:
%%capture

# Step 1: Clone PySyft Library
! git clone --single-branch --branch 'syft_0.3.0' https://github.com/OpenMined/PySyft.git

# NOTICE: Step 2 runs installation calls specific to Colab.
# Don't run this code if you're running PySyft locally. Just
# pull down 

# Step 2: Setup Colab Environment
! cd PySyft && ./scripts/colab.sh

import sys
sys.path.append("/content/PySyft/src") # prevents needing restart

# Part 1: Connect to a Remote Duet Server

As the data scientist, we want to perform data science on data that is sitting in the Data Owner's Duet server (in their Notebook).

In order to do this, we must run the code that the Data Owner sends us, which importantly includes the ID of their duet session. This will create a direct connection from my notebook to the remote Duet server.

I will run their code below and follow the instructions it gives.

In [5]:
import syft as sy
sy.VERBOSE=False
duet = sy.join_duet('6e2fd3c1b35a9d433af3802cdc16227e')

🎤  🎸  ♪♪♪ joining duet ♫♫♫  🎻  🎹

♫♫♫ > DISCLAIMER: Duet is an experimental feature currently 
♫♫♫ > in alpha. Do not use this to protect real-world data.
♫♫♫ >
♫♫♫ > Punching through firewall to OpenGrid Network Node at network_url: 
♫♫♫ > http://ec2-18-216-8-163.us-east-2.compute.amazonaws.com:5000
♫♫♫ >
♫♫♫ > ...waiting for response from OpenGrid Network... DONE!

♫♫♫ > Duet Client ID: ab888d6fd4d7bf65b7de648c30798818

♫♫♫ > STEP 1: Send the Duet Client ID to your duet partner!

♫♫♫ > ...waiting for partner to connect...
♫♫♫ > ...using a running event loop...

♫♫♫ > CONNECTED!


# Part 2: Search for Available Data

As a data scientist, I want to answer questions using the data owner's data. In order to do this, I must first search for data relevant to my problem.

For now, since notebooks aren't designed to hold very large amounts of data, we can find the data we're interested in by printing the table of everything available in the duet store.

In [8]:
duet.store.pandas

Unnamed: 0,ID,Tags,Description
0,<UID:6d753d36-ee31-4c31-a689-c29ce1160267>,[#age_data],This is a list of people's ages. Let's keep it...
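
When a store holds more than a handful of entries, scanning the table by eye gets tedious. The sketch below shows one way to pick out entries by tag. It uses a plain list of dictionaries as a stand-in for the rows of the table above; `find_by_tag`, the row layout, and the second row are hypothetical and are not part of PySyft.

```python
# Hypothetical stand-in for the rows of the store table (not a PySyft object).
rows = [
    {
        "id": "6d753d36-ee31-4c31-a689-c29ce1160267",
        "tags": ["#age_data"],
        "description": "This is a list of people's ages.",
    },
    {
        # Made-up entry, added only so the filter has something to exclude.
        "id": "example-id-2",
        "tags": ["#height_data"],
        "description": "Made-up entry for illustration.",
    },
]


def find_by_tag(rows, tag):
    """Return the positions of all rows whose tag list contains `tag`."""
    return [i for i, row in enumerate(rows) if tag in row["tags"]]


matches = find_by_tag(rows, "#age_data")
print(matches)
```

A position returned this way could then be used to index the store, in the same way as `duet.store[0]` in the next cell.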


In [13]:
# Looks like there's some interesting age data. Let's get a pointer to it!

age_data_ptr = duet.store[0]

age_data_ptr

<syft.proxy.torch.TensorPointer at 0x7f89422f7dd8>

In [14]:
# Now I have a reference to a remote dataset!

# Part 3: Perform Analysis with PyTorch

Now we can perform analysis using data that is in the Data Owner's duet server! 

Let's use the age data to calculate the average age in the dataset!

In [15]:
average_age = age_data_ptr.mean()

In [16]:
# And now let's try to download our result!

average_age.get()

UnknownPrivateException: ignored

In [17]:
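# The first attempt failed, most likely because mean() is not defined for
# integer tensors in PyTorch, so cast to float before averaging.
average_age = age_data_ptr.float().mean()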

In [18]:
average_age.get()

AuthorizationException: ignored

In [20]:
average_age.request(name="Age Request",
                    reason="I am a data scientist and I want to know the average age of the people in your dataset.")

Request Message Id:<UID:362b91ed-a175-452e-a3e9-f12978338872>
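
The exchange above follows a pattern: operations on a pointer run where the data lives and return another pointer, and `.get()` only succeeds once the owner has approved a request. The toy model below mimics that pattern in plain Python to make the control flow concrete. `ToyOwner`, `ToyPointer`, and their methods are hypothetical illustrations and do not reflect how PySyft is implemented; the ages are made up.

```python
class ToyOwner:
    """Holds private values and tracks which ones may be downloaded."""

    def __init__(self):
        self.objects = {}    # object id -> private value
        self.approved = set()  # object ids cleared for download
        self.pending = []    # (object id, name, reason) requests
        self._next_id = 0

    def store(self, value):
        obj_id = self._next_id
        self._next_id += 1
        self.objects[obj_id] = value
        return ToyPointer(self, obj_id)

    def approve_all(self):
        # A real owner would inspect each request; this toy approves everything.
        for obj_id, _name, _reason in self.pending:
            self.approved.add(obj_id)
        self.pending.clear()


class ToyPointer:
    """A reference to a value that stays on the owner's side."""

    def __init__(self, owner, obj_id):
        self.owner = owner
        self.obj_id = obj_id

    def mean(self):
        # Computed on the owner's side; only a new pointer comes back.
        values = self.owner.objects[self.obj_id]
        return self.owner.store(sum(values) / len(values))

    def request(self, name, reason):
        self.owner.pending.append((self.obj_id, name, reason))

    def get(self):
        if self.obj_id not in self.owner.approved:
            raise PermissionError("owner has not approved this object")
        return self.owner.objects[self.obj_id]


owner = ToyOwner()
ages_ptr = owner.store([30, 40, 50])  # made-up ages
avg_ptr = ages_ptr.mean()

try:
    avg_ptr.get()
except PermissionError as err:
    print("denied:", err)

avg_ptr.request(name="Age Request", reason="average age only")
owner.approve_all()
print(avg_ptr.get())
```

Note that approval in this sketch applies to the requested result only: after the owner approves the average, `ages_ptr.get()` would still be refused, so the raw ages never leave the owner's side.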


# Part 4: Wait for Data Owner to Approve

Now we wait for the data owner to approve our request!

We can check the status of our request by running the following code:

In [21]:
duet.requests.pandas

In [22]:
# Once the request disappears from the list, we can download our result!
average_age.get()

tensor(36.)
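
Rather than re-running the cell by hand, the wait can be automated with a small retry loop. The helper below is a generic sketch: `get_when_approved` is a hypothetical name, and it assumes only that the download call raises an exception until approval is granted, as the earlier `.get()` attempts did. Which exception types to catch in a real session is left to the caller.

```python
import time


def get_when_approved(fetch, retries=10, delay=1.0, exceptions=(Exception,)):
    """Call `fetch()` until it succeeds or `retries` attempts are used up."""
    last_error = None
    for _ in range(retries):
        try:
            return fetch()
        except exceptions as err:
            last_error = err
            time.sleep(delay)  # give the owner time to respond
    raise TimeoutError(f"no approval after {retries} attempts") from last_error


# Demonstration with a fake fetch that is refused twice before succeeding.
attempts = {"count": 0}


def fake_fetch():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise PermissionError("not approved yet")
    return 36.0


result = get_when_approved(fake_fetch, retries=5, delay=0.0)
print(result, attempts["count"])
```

In this notebook the helper might be called as `get_when_approved(average_age.get)`, on the assumption that `.get()` keeps raising until the data owner approves the request.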