### Trade Demo

#### Goal: 
- Load the trade data for the country `Canada`
- Launch a domain node for Canada
- Log in to the domain node
- Format the `Canada` trade dataset and convert it to a NumPy array
- Convert the dataset to a private tensor
- Upload Canada's trade data to the domain node
- Create a Data Scientist User

In [1]:
%load_ext autoreload
%autoreload 2

import pandas as pd

canada = pd.read_csv("../../trade_demo/datasets/ca - feb 2021.csv")



### Step 1: Load the dataset

We have trade data reported by Canada for February 2021. The key columns are:

- Commodity Code: the official code of that type of good
- Reporter: the country claiming the import/export value
- Partner: the country being claimed about
- Trade Flow: the direction of the goods being reported about (imports, exports, etc)
- Trade Value (US$): the declared USD value of the good
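
As a minimal sketch, the key columns can be pulled out into a narrower frame for inspection. The two-row `sample` frame below is a hypothetical stand-in for `canada` (its values are copied from the preview of the real dataset), and `KEY_COLUMNS` is a name introduced only for this illustration.

```python
import pandas as pd

# Column names as listed above; the constant name is illustrative.
KEY_COLUMNS = [
    "Commodity Code",
    "Reporter",
    "Partner",
    "Trade Flow",
    "Trade Value (US$)",
]

# Hypothetical two-row stand-in for the `canada` DataFrame.
sample = pd.DataFrame(
    {
        "Year": [2021, 2021],
        "Trade Flow": ["Imports", "Imports"],
        "Reporter": ["Canada", "Canada"],
        "Partner": ["Egypt", "United Kingdom"],
        "Commodity Code": [18, 18],
        "Trade Value (US$)": [116604, 1495175],
    }
)

# Selecting with a list of names returns only those columns, in that order.
key_view = sample[KEY_COLUMNS]
print(key_view)
```

On the real dataset the same selection would be `canada[KEY_COLUMNS]`.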

Let's have a quick look at the top five rows of the dataset.

In [2]:
canada.head()

Unnamed: 0,Classification,Year,Period,Period Desc.,Aggregate Level,Is Leaf Code,Trade Flow Code,Trade Flow,Reporter Code,Reporter,...,Partner,Partner ISO,Commodity Code,Commodity,Qty Unit Code,Qty Unit,Qty,Netweight (kg),Trade Value (US$),Flag
0,HS,2021,202102,February 2021,4,0,1,Imports,124,Canada,...,"Other Asia, nes",,6117,"Clothing accessories; made up, knitted or croc...",0,,,,9285,0
1,HS,2021,202102,February 2021,2,0,1,Imports,124,Canada,...,Egypt,,18,Cocoa and cocoa preparations,0,,,0.0,116604,0
2,HS,2021,202102,February 2021,2,0,1,Imports,124,Canada,...,United Kingdom,,18,Cocoa and cocoa preparations,0,,,0.0,1495175,0
3,HS,2021,202102,February 2021,2,0,1,Imports,124,Canada,...,United Rep. of Tanzania,,18,Cocoa and cocoa preparations,0,,,0.0,2248,0
4,HS,2021,202102,February 2021,2,0,1,Imports,124,Canada,...,Singapore,,18,Cocoa and cocoa preparations,0,,,0.0,47840,0


### Step 2: Spin up the Domain Node (if you haven't already)

SKIP THIS STEP IF YOU'VE ALREADY RUN IT!!!

The main requirement of this demo is to perform analysis on Canada's trade dataset, so we need to spin up a domain node for Canada.

Assuming you have [Docker](https://www.docker.com/) installed and configured with >=8GB of RAM, navigate to PySyft/packages/hagrid and run the following commands in separate terminals (can be done at the same time):


```bash
# install hagrid cli tool
pip install -e .
```

```bash
hagrid launch Canada domain
```

<div class="alert alert-block alert-info">
    <b>Quick Tip:</b> Don't run this now, but later, when you want to stop these nodes, you can stop them with the command shown below, run from the PySyft/grid directory. Note that it deletes the database by default; add the flag "--keep_db=True" to keep the database around. Also note that simply killing the thread created by ./start is often insufficient to actually stop all nodes; run the ./stop script instead. To stop the node launched above (and delete its database), run:

```bash
hagrid land Canada
```
</div>

### Step 3: Log in to the Domain as the Admin User

In [46]:
import syft as sy

# Let's log in to the domain node
domain_node = sy.login(email="info@openmined.org", password="changethis", port=8081)

Connecting to http://localhost:8081... done! 	 Logging into adp... done!


In [4]:
canada.head()

Unnamed: 0,Classification,Year,Period,Period Desc.,Aggregate Level,Is Leaf Code,Trade Flow Code,Trade Flow,Reporter Code,Reporter,...,Partner,Partner ISO,Commodity Code,Commodity,Qty Unit Code,Qty Unit,Qty,Netweight (kg),Trade Value (US$),Flag
0,HS,2021,202102,February 2021,4,0,1,Imports,124,Canada,...,"Other Asia, nes",,6117,"Clothing accessories; made up, knitted or croc...",0,,,,9285,0
1,HS,2021,202102,February 2021,2,0,1,Imports,124,Canada,...,Egypt,,18,Cocoa and cocoa preparations,0,,,0.0,116604,0
2,HS,2021,202102,February 2021,2,0,1,Imports,124,Canada,...,United Kingdom,,18,Cocoa and cocoa preparations,0,,,0.0,1495175,0
3,HS,2021,202102,February 2021,2,0,1,Imports,124,Canada,...,United Rep. of Tanzania,,18,Cocoa and cocoa preparations,0,,,0.0,2248,0
4,HS,2021,202102,February 2021,2,0,1,Imports,124,Canada,...,Singapore,,18,Cocoa and cocoa preparations,0,,,0.0,47840,0


In [5]:
canada[canada["Partner"] == "Egypt"]

Unnamed: 0,Classification,Year,Period,Period Desc.,Aggregate Level,Is Leaf Code,Trade Flow Code,Trade Flow,Reporter Code,Reporter,...,Partner,Partner ISO,Commodity Code,Commodity,Qty Unit Code,Qty Unit,Qty,Netweight (kg),Trade Value (US$),Flag
1,HS,2021,202102,February 2021,2,0,1,Imports,124,Canada,...,Egypt,,18,Cocoa and cocoa preparations,0,,,0.0,116604,0
411,HS,2021,202102,February 2021,2,0,1,Imports,124,Canada,...,Egypt,,30,Pharmaceutical products,0,,,,972862,0
440,HS,2021,202102,February 2021,2,0,1,Imports,124,Canada,...,Egypt,,33,"Essential oils and resinoids; perfumery, cosme...",0,,,,66552,0
499,HS,2021,202102,February 2021,2,0,1,Imports,124,Canada,...,Egypt,,62,Apparel and clothing accessories; not knitted ...,0,,,,462646,0
668,HS,2021,202102,February 2021,2,0,1,Imports,124,Canada,...,Egypt,,40,Rubber and articles thereof,0,,,,95,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
222429,HS,2021,202102,February 2021,4,0,2,Exports,124,Canada,...,Egypt,,8462,Machine-tools; (including presses) for working...,0,,,1727.0,20096,0
222601,HS,2021,202102,February 2021,6,1,2,Exports,124,Canada,...,Egypt,,841440,Compressors; air compressors mounted on a whee...,0,,,132.0,1575,0
222668,HS,2021,202102,February 2021,6,1,2,Exports,124,Canada,...,Egypt,,842940,Tamping machines and road rollers; self-propelled,0,,,2928.0,19001,0
222796,HS,2021,202102,February 2021,6,1,2,Exports,124,Canada,...,Egypt,,846229,"Machine-tools; bending, folding, straightening...",0,,,1727.0,20096,0


In [6]:
# For simplicity, we will upload only the first 10000 rows of the dataset.
# Take an explicit copy so that later column assignments do not act on a slice view.
canada = canada[:10000].copy()

### Step 4: Format the dataset and convert it to a NumPy array

In [7]:
# To convert the whole dataset into a NumPy array,
# we first need to map the string values to integers.

In [9]:
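# Let's create a function that converts strings to integers.

import hashlib
from math import nan

# Lookup table: original string -> integer code (NaN maps to itself).
hash_db = {}
hash_db[nan] = nan


def convert_string(s, digits: int = 15):
    """Map a string to an integer derived from its SHA-256 digest.

    The digest is reduced modulo 10 ** digits, so the result has at most
    `digits` decimal digits. Non-string values are returned unchanged.
    """
    if isinstance(s, str):
        new_hash = int(hashlib.sha256(s.encode("utf-8")).hexdigest(), 16) % 10 ** digits
        hash_db[s] = new_hash  # remember the mapping for later lookup
        return new_hash
    return s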
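
The sketch below illustrates two properties of this kind of mapping, using a stripped-down copy of the function (named `hash_string` here so it does not shadow the notebook's version): the same string always yields the same integer, and the result always stays below `10 ** digits`. Because the SHA-256 digest is reduced modulo `10 ** digits`, the codes are not guaranteed to be collision-free in general. The `reverse_db` dictionary is an assumption added for illustration; it shows one way to get from a code back to the original string.

```python
import hashlib


def hash_string(s: str, digits: int = 15) -> int:
    """Reduce the SHA-256 digest of `s` to an integer below 10 ** digits."""
    return int(hashlib.sha256(s.encode("utf-8")).hexdigest(), 16) % 10 ** digits


# Hypothetical reverse lookup: integer code -> original string.
reverse_db = {}
for name in ["Egypt", "Singapore", "United Kingdom"]:
    reverse_db[hash_string(name)] = name

egypt_code = hash_string("Egypt")

# Hashing is deterministic: the same input gives the same code every time.
print(egypt_code == hash_string("Egypt"))

# The reverse table recovers the original label from the code.
print(reverse_db[egypt_code])
```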

In [9]:
# Let's collect the string/object type columns
string_cols = []
for col, dtype in canada.dtypes.items():
    if dtype in ['object', 'str']:
        string_cols.append(col)

# Convert string values to integers (only the string columns need mapping)
for col in string_cols:
    canada[col] = canada[col].map(convert_string)
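
An alternative way to find the columns that need encoding, sketched here on a hypothetical toy frame rather than the real dataset, is to treat every non-numeric column as text using `pandas.api.types.is_numeric_dtype`. The names `toy`, `text_cols`, and `encoded` are illustrative, and the encoder used (string length) is only a placeholder for the SHA-based function above.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical toy frame with two text columns and one numeric column.
toy = pd.DataFrame(
    {
        "Trade Flow": ["Imports", "Imports"],
        "Partner": ["Egypt", "Singapore"],
        "Trade Value (US$)": [116604, 47840],
    }
)

# Any column that is not numeric is treated as text to be encoded.
text_cols = [col for col in toy.columns if not is_numeric_dtype(toy[col])]

# Placeholder encoder (string length); it stands in for a real hash function.
encoded = toy.copy()
for col in text_cols:
    encoded[col] = encoded[col].map(len)

print(text_cols)
print(encoded.dtypes)
```

After the loop every column is numeric, which is the precondition for converting the frame to a single numeric array.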

In [10]:
# Let's check out the formatted dataset
canada.head()

Unnamed: 0,Classification,Year,Period,Period Desc.,Aggregate Level,Is Leaf Code,Trade Flow Code,Trade Flow,Reporter Code,Reporter,...,Partner,Partner ISO,Commodity Code,Commodity,Qty Unit Code,Qty Unit,Qty,Netweight (kg),Trade Value (US$),Flag
0,109781654799833,2021,202102,618044004745978,4,0,1,740486968595500,124,524003986429176,...,251096594821383,,20063876956541,660262300000000.0,0,,,,9285,0
1,109781654799833,2021,202102,618044004745978,2,0,1,740486968595500,124,524003986429176,...,382796968671830,,897295160791370,428856000000000.0,0,,,0.0,116604,0
2,109781654799833,2021,202102,618044004745978,2,0,1,740486968595500,124,524003986429176,...,37977140198169,,897295160791370,428856000000000.0,0,,,0.0,1495175,0
3,109781654799833,2021,202102,618044004745978,2,0,1,740486968595500,124,524003986429176,...,711459702489058,,897295160791370,428856000000000.0,0,,,0.0,2248,0
4,109781654799833,2021,202102,618044004745978,2,0,1,740486968595500,124,524003986429176,...,718934534792483,,897295160791370,428856000000000.0,0,,,0.0,47840,0


In [11]:
# Great! Now let's convert the whole dataset to a NumPy array.
np_dataset = canada.values

# Type cast to float values to prevent overflow
np_dataset = np_dataset.astype(float) 
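
One detail worth noting about the float cast: a 64-bit float represents every integer up to `2 ** 53` exactly, and the default `digits=15` keeps each hashed value below `10 ** 15`, which is under that limit. The check below is a small sketch of that arithmetic in plain Python; the 17-digit counterexample is chosen only for illustration.

```python
# Largest value a 15-digit hash can take.
max_hash = 10 ** 15 - 1

# float64 has a 53-bit significand, so integers up to 2 ** 53 are exact.
exact_limit = 2 ** 53

# The whole hash range fits inside the exactly representable integers.
fits = max_hash < exact_limit
print(fits)

# Converting to float and back loses nothing for a 15-digit value.
round_trip_ok = int(float(max_hash)) == max_hash
print(round_trip_ok)

# A 17-digit integer is beyond the exact range and gets rounded.
too_big = 10 ** 17 + 1
rounded = int(float(too_big)) != too_big
print(rounded)
```

This suggests that raising `digits` much beyond 15 would start to merge distinct codes once the array is cast to float.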

### Step 5: Converting the dataset to private tensors

In [12]:
from syft.core.adp.entity import Entity

In [14]:
# The 'Partner' column, i.e. the countries Canada trades with, is private,
# so let's create an entity for each partner listed.

entities = [Entity(name=partner) for partner in canada["Partner"]]

In [15]:
# Let's convert the whole dataset to a private tensor

private_dataset_tensor = (
    sy.Tensor(np_dataset)
    .private(0.01, 1e15, entity=Entity(name="Canada"))
    .tag("private_canada_trade_dataset")
)

In [15]:
private_dataset_tensor[:, 0]

Tensor(child=SingleEntityPhiTensor(entity=Canada, child=[1.09781655e+14 1.09781655e+14 1.09781655e+14 ... 1.09781655e+14
 1.09781655e+14 1.09781655e+14]))
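
Assuming the two positional arguments passed to `.private(0.01, 1e15, ...)` above are lower and upper bounds on the tensor's values (an assumption, not something this notebook states), it can be useful to check how the data actually sits relative to those bounds before uploading. The helper `bounds_report` and the small `toy` array are hypothetical and use only NumPy.

```python
import numpy as np


def bounds_report(values, lower, upper):
    """Summarise how the non-NaN entries of `values` sit relative to the bounds."""
    finite = values[~np.isnan(values)]  # boolean mask flattens to 1-D
    return {
        "min": float(finite.min()),
        "max": float(finite.max()),
        "below": int((finite < lower).sum()),
        "above": int((finite > upper).sum()),
    }


# Hypothetical toy array with a missing value and a zero entry.
toy = np.array(
    [
        [2021.0, 9285.0, np.nan],
        [2021.0, 116604.0, 0.0],
    ]
)

report = bounds_report(toy, lower=0.01, upper=1e15)
print(report)
```

On the real data the call would be `bounds_report(np_dataset, 0.01, 1e15)`; a non-zero `below` or `above` count would indicate entries outside the declared range.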

### Step 6: Upload Canada's trade data on the domain

In [None]:
# Awesome, now let's upload the dataset to the domain.
# For simplicity, we upload only the first 10000 rows of the dataset.

domain_node.load_dataset(
    assets={"feb2020": private_dataset_tensor},
    name="Canada Trade Data - First 10000 rows",
    description="""A collection of reports from Canada's statistics 
                    bureau about how much it thinks it imports and exports from other countries.""",
)

TypeError: __init__() got an unexpected keyword argument 'budget'

In [20]:
private_dataset_tensor.send(domain_node)

<syft.proxy.syft.core.tensor.tensor.TensorPointer at 0x144f08910>

Cool !!! The dataset was successfully uploaded onto the domain.

In [17]:
# Now, let's check datasets available on the domain.
domain_node.store.pandas

### Step 7: Create a Data Scientist User

Open http://localhost:8081, log in as the root user (username: info@openmined.org, password: changethis), and create a user with the following attributes:

- Name: Sheldon Cooper
- Email: sheldon@caltech.edu
- Password: bazinga

In [22]:
# Alternatively, you can create the same user from the notebook itself.
domain_node.users.create(
    **{
        "name": "Sheldon Cooper",
        "email": "sheldon@caltech.edu",
        "password": "bazinga",
    },
)

[2021-08-12T14:22:58.514201-0400][CRITICAL][logger]][44906] UnknownPrivateException has been triggered.


TypeError: __init__() got an unexpected keyword argument 'budget'

In [23]:
domain_node.users.create(
    **{
        "name": "Leonard Hodfstadder",
        "email": "leonard@caltech.edu",
        "password": "penny",
    },
)

```
Great !!! We were successfully able to create a new user.
Now, let's move to the Data Scientist notebook, to check out their experience.
```

### Step 8: Decline the request to download the entire dataset

In [47]:
# Let's check if there are any requests pending for approval.
domain_node.requests.pandas

Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5
0,"[#feb2020, __getitem__, __eq__, __len__]",,<UID: ec007686de6c4fc5831d216218f48118>,<UID: bb7ecac9e59f42ba969257aecbe2a378>,
1,"[#feb2020, __getitem__, __eq__, __len__]",,<UID: 2cf2fb86e6bf46569b1b166225d4f4fe>,<UID: 99055c3a28664ed58a30c77210cf282b>,
2,"[#feb2020, __getitem__, __eq__, __len__]",,<UID: 7f5bbd5c3601474bb25dbbf4676a1a3b>,<UID: 895899fc2810474484ae36e1a9069a1e>,
3,"[#feb2020, __getitem__, __eq__, __len__]",,<UID: c0372566818c47f8ad44b92199ada2fe>,<UID: e27c3b097123450394f4b0c824fe6e0e>,
4,[],Access whole dataset,<UID: 342fe9bbcd534fa792834cf5168d4d4c>,<UID: e83a115eed2a412f9a6cab99eab4f167>,<class 'syft.lib.python.Float'>
5,"[#feb2020, __getitem__, __eq__, __eq__]",,<UID: 34eabbdfaccf4696923a836afe060127>,<UID: 6626b43af56445729d2f03adc62a6161>,
6,[#feb2020],Access whole dataset,<UID: d9b32cacd7f84807ae2def49040e379e>,<UID: e8b9b611a06b40b9bb2e590f4dac56f1>,<class 'syft.core.tensor.tensor.Tensor'>


In [43]:
domain_node.requests[-1].accept()

In [None]:
# Looks like the DS wants to download the whole dataset. We cannot allow that.
# Let's select and deny this request.
domain_node.requests[0].deny()

### STOP: Return to Data Scientist - Canada.ipynb - STEP 3!!