# Trade Demo

**Summary:** In this demo, a data scientist wants to determine whether the amount of goods exported by a handful of nations (USA, Canada, Netherlands, United Kingdom, and Italy) matches the amount of goods those nations claim to have imported from each other. We want to return a list of commodities where the ratio of expected imports to exports is off by 10% or more. Importantly, the data scientist should be able to do this:

- without requiring any nation to disclose to anyone the amount of any particular good they have imported or exported (unless they're in violation)
- without needing a data compliance officer to manually accept any .get() requests.
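The 10% check described above can be sketched as a small helper. This is a minimal illustration, not part of the demo's code: the function name `is_mismatch`, the `tolerance` parameter, and the choice to measure how far the imports/exports ratio deviates from 1 are all assumptions.

```python
def is_mismatch(reported_imports: float, reported_exports: float, tolerance: float = 0.10) -> bool:
    """Return True when imports / exports deviates from 1 by `tolerance` or more."""
    if reported_exports == 0:
        # No exports reported: any non-zero import claim counts as a mismatch.
        return reported_imports != 0
    ratio = reported_imports / reported_exports
    return abs(ratio - 1.0) >= tolerance


print(is_mismatch(100.0, 100.0))  # identical claims
print(is_mismatch(100.0, 125.0))  # ratio of 0.8, off by 20%
```

In the demo itself this comparison has to happen without either side revealing its raw values, which is what the domain nodes in the following steps are for.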

In [1]:
import pandas as pd
schema = pd.read_csv('datasets/schema.csv')

canada = pd.read_csv('datasets/ca - feb 2021.csv')
italy =  pd.read_csv('datasets/it - feb 2021.csv')
netherlands =  pd.read_csv('datasets/nl - feb 2021.csv')
united_states = pd.read_csv('datasets/us - feb 2021.csv')



## Step 1: Load The Dataset

We have trade data from 4 countries, all of which have provided data from Feb 2021. The key columns are:

- Commodity Code: the official code of that type of good
- Reporter: the country claiming the import/export value
- Partner: the country being claimed about
- Trade Flow: the direction of the goods being reported about (imports, exports, etc)
- Trade Value (US$): the declared USD value of the good

For example, the row at index 2 in the output below specifies that Canada reports importing $1,495,175 USD of "Cocoa and cocoa preparations" from the United Kingdom.
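To make the reading of a single row concrete, the lookup can be sketched with pandas. The small frame below is hand-built from rows displayed by `canada.head()` in this step and keeps only the key columns, so it is an illustration and not the full dataset. The names `sample` and `uk_cocoa_value` are hypothetical.

```python
import pandas as pd

# Key columns only, copied from rows shown by canada.head()
sample = pd.DataFrame(
    {
        "Commodity Code": ["18", "18", "18"],
        "Reporter": ["Canada", "Canada", "Canada"],
        "Partner": ["Egypt", "United Kingdom", "Singapore"],
        "Trade Flow": ["Imports", "Imports", "Imports"],
        "Trade Value (US$)": [116604, 1495175, 47840],
    }
)

# What does Canada report importing from the United Kingdom under code 18?
mask = (
    (sample["Partner"] == "United Kingdom")
    & (sample["Commodity Code"] == "18")
    & (sample["Trade Flow"] == "Imports")
)
uk_cocoa_value = int(sample.loc[mask, "Trade Value (US$)"].iloc[0])
print(uk_cocoa_value)
```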

In [2]:
canada.head()

Unnamed: 0,Classification,Year,Period,Period Desc.,Aggregate Level,Is Leaf Code,Trade Flow Code,Trade Flow,Reporter Code,Reporter,...,Partner,Partner ISO,Commodity Code,Commodity,Qty Unit Code,Qty Unit,Qty,Netweight (kg),Trade Value (US$),Flag
0,HS,2021,202102,February 2021,4,0,1,Imports,124,Canada,...,"Other Asia, nes",,6117,"Clothing accessories; made up, knitted or croc...",0,,,,9285,0
1,HS,2021,202102,February 2021,2,0,1,Imports,124,Canada,...,Egypt,,18,Cocoa and cocoa preparations,0,,,0.0,116604,0
2,HS,2021,202102,February 2021,2,0,1,Imports,124,Canada,...,United Kingdom,,18,Cocoa and cocoa preparations,0,,,0.0,1495175,0
3,HS,2021,202102,February 2021,2,0,1,Imports,124,Canada,...,United Rep. of Tanzania,,18,Cocoa and cocoa preparations,0,,,0.0,2248,0
4,HS,2021,202102,February 2021,2,0,1,Imports,124,Canada,...,Singapore,,18,Cocoa and cocoa preparations,0,,,0.0,47840,0


## Step 2: Spin Up Domain Nodes

As the main requirement of this demo is that none of these countries should be sharing their data with anyone else, each country will get their own domain node to hold/protect their data while it's under study. So, we need to spin up 4 domain nodes.

Assuming you have [Docker](https://www.docker.com/) installed and configured with >=8GB of RAM, navigate to PySyft/packages/hagrid and run the following commands in separate terminals (can be done at the same time):


```bash
# install hagrid cli tool
pip install -r requirements.txt
pip install -e .
```

```bash
hagrid launch Canada --port=8081
```
```bash
hagrid launch United States --port=8082
```
```bash
hagrid launch Italy --port=8083
```
```bash
hagrid launch Netherlands --port=8084
```


Additionally, we'll need to set up a Network node, which will help us interact with the remote data:

```bash
hagrid launch United Nations --port 8085 --type network
```
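Before logging in, it can help to confirm that each node is accepting connections. The sketch below uses only the Python standard library and the ports chosen above; the helper name `port_is_open` and the `NODE_PORTS` mapping are hypothetical and not part of hagrid or PySyft. An open port only shows that something is listening, not that the node has finished starting.

```python
import socket

# Ports taken from the launch commands above
NODE_PORTS = {
    "Canada": 8081,
    "United States": 8082,
    "Italy": 8083,
    "Netherlands": 8084,
    "United Nations": 8085,
}


def port_is_open(port: int, host: str = "localhost", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections and timeouts alike
        return False


for name, port in NODE_PORTS.items():
    status = "up" if port_is_open(port) else "not reachable"
    print(f"{name} (port {port}): {status}")
```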

<div class="alert alert-block alert-info">
    <b>Quick Tip:</b> Don't run this now, but later, when you want to stop these nodes, run the same arguments with the "land" command in place of "launch". Note that these commands will delete the database by default; add the flag "--keep_db=True" to keep the database around. Also note that simply killing the thread created by ./start is often insufficient to actually stop all nodes, so run the ./stop script instead. To stop the nodes listed above (and delete their databases), run:

```bash
hagrid land Canada
```
```bash
hagrid land United States
```
```bash
hagrid land Italy
```
```bash
hagrid land Netherlands
```
```bash
hagrid land United Nations
```
</div>

## Step 3: Register domain nodes with the United Nations network!

You may, if you wish, reset the default ADMIN username/password using the user interface. For now, we'll leave them as they are to keep things easy and log in using the defaults set by PyGrid (info@openmined.org and changethis).

In [None]:
import syft as sy

# Login: (defaults to localhost if you don't specify a url)
ca = sy.login(email="info@openmined.org", password="changethis", port=8081)
usa = sy.login(email="info@openmined.org", password="changethis", port=8082)
it = sy.login(email="info@openmined.org", password="changethis", port=8083)
ne = sy.login(email="info@openmined.org", password="changethis", port=8084)

un = sy.login(email="info@openmined.org", password="changethis", port=8085)

```
Connecting to Canada... connected!         ...logging in as info@openmined.org... logged in!
Connecting to United States... connected!  ...logging in as info@openmined.org... logged in!
Connecting to Italy... connected!          ...logging in as info@openmined.org... logged in!
Connecting to Netherlands... connected!    ...logging in as info@openmined.org... logged in!
Connecting to United Nations... connected! ...logging in as info@openmined.org... logged in!
```

In [None]:
# STRETCH:
# ca.known_networks #prints a table of all known networks populated from a openmined hosted file (like a github url)

In [3]:
# Each domain needs to download and counter sign a network agreement
# The domain user can download the network agreement as follows:
un.network_agreement

United Nations Network Agreement: https://aws.s3.networkagreement.pdf


In [8]:
# Each domain admin applies to join the Network (so that users can find them!).
# This could also be done via the user interface! (CC: @Thiago)
# When applying to the network, the user is prompted to upload a countersigned network agreement.
# Once the user uploads the agreement, the application is submitted and the user is notified.

ca.apply_to_network(network=un, reason="This is Sue Grafton. Per our phone convo, we'd like to join your network.", name="Sue Grafton", email="sue@canada.ca")
usa.apply_to_network(network=un, reason="We were connected before, just need to re-establish with new node", name="John Doe",  email="john@usa.gov")
it.apply_to_network(network=un, reason="We heard great things about the UNGP and would like to participate", name="Suzy Song", email="suzy@it.it")
ne.apply_to_network(network=un, reason="We recently spun up a domain and would like to join the network.", name="Bill Gates", email="bill@ne.ne")

United Nations Network Agreement: https://aws.s3.networkagreement.pdf
Canada, you are required to counter sign and upload the Network Agreement below.


United Nations Network Agreement: https://aws.s3.networkagreement.pdf
USA, you are required to counter sign and upload the Network Agreement below.


United Nations Network Agreement: https://aws.s3.networkagreement.pdf
Italy, you are required to counter sign and upload the Network Agreement below.


United Nations Network Agreement: https://aws.s3.networkagreement.pdf
Netherlands, you are required to counter sign and upload the Network Agreement below.


```
Application submitted from Canada -> United Nations!
You'll get an email (sue@canada.ca) when your application has been processed!

Application submitted from United States -> United Nations!
You'll get an email (john@usa.gov) when your application has been processed!

Application submitted from Italy -> United Nations!
You'll get an email (suzy@it.it) when your application has been processed!

Application submitted from Netherlands -> United Nations!
You'll get an email (bill@ne.ne) when your application has been processed!
```

In [169]:
# United Nations admin checks network affiliation applications upon receiving 4 emails that 4 applications have been received!
# This could also be done via the user interface (CC: @Thiago)

un.subscription_requests

Unnamed: 0,Date,Domain,Status,Response Date,Name,Email,Reason
0,2021-06-29,Canada,PENDING,,Sue Grafton,sue@canada.ca,"This is Sue Grafton. Per our phone convo, we'd..."
1,2021-06-29,United States,PENDING,,John Doe,john@usa.gov,"""We were connected before, just need to re-est..."
2,2021-06-27,Italy,PENDING,,Suzy Song,suzy@it.it,"""We heard great things about the UNGP and woul..."
3,2021-06-23,Netherlands,PENDING,,Bill Gates,bill@ne.ne,"""We recently spun up a domain and would like t..."


In [None]:
# Then the UN accepts all the requests
# This could also be done via the user interface (CC: @Thiago)

un.subscription_requests[0].accept(notify_by_email=True)
un.subscription_requests[1].accept(notify_by_email=True)
un.subscription_requests[2].accept(notify_by_email=True)
un.subscription_requests[3].accept(notify_by_email=True)

```
Accepting request from Canada!        ... sending email notification to sue@canada.ca... sent!
Accepting request from United States! ... sending email notification to john@usa.gov... sent!
Accepting request from Italy!         ... sending email notification to suzy@it.it... sent!
Accepting request from Netherlands!   ... sending email notification to bill@ne.ne... sent!
```

In [165]:
# ... which we can check here ...

un.subscription_requests

Unnamed: 0,Date,Domain,Status,Response Date,Name,Email,Reason
0,2021-06-29,Canada,ACCEPT,2021-06-29,Sue Grafton,sue@canada.ca,"This is Sue Grafton. Per our phone convo, we'd..."
1,2021-06-29,United States,ACCEPT,2021-06-29,John Doe,john@usa.gov,"""We were connected before, just need to re-est..."
2,2021-06-27,Italy,ACCEPT,2021-06-29,Suzy Song,suzy@it.it,"""We heard great things about the UNGP and woul..."
3,2021-06-23,Netherlands,ACCEPT,2021-06-29,Bill Gates,bill@ne.ne,"""We recently spun up a domain and would like t..."


## Step 4: Each domain admin loads in their dataset

In [181]:
# Canada loads data

# Canada's data is a dataframe with >200K rows and 22 columns
canada[0:3]

Unnamed: 0,Classification,Year,Period,Period Desc.,Aggregate Level,Is Leaf Code,Trade Flow Code,Trade Flow,Reporter Code,Reporter,...,Partner,Partner ISO,Commodity Code,Commodity,Qty Unit Code,Qty Unit,Qty,Netweight (kg),Trade Value (US$),Flag
0,HS,2021,202102,February 2021,4,0,1,Imports,124,Canada,...,"Other Asia, nes",,6117,"Clothing accessories; made up, knitted or croc...",0,,,,9285,0
1,HS,2021,202102,February 2021,2,0,1,Imports,124,Canada,...,Egypt,,18,Cocoa and cocoa preparations,0,,,0.0,116604,0
2,HS,2021,202102,February 2021,2,0,1,Imports,124,Canada,...,United Kingdom,,18,Cocoa and cocoa preparations,0,,,0.0,1495175,0


In [None]:
from syft.types import OneHotChar

In [198]:
# for the automatic differential privacy we have to specify some schema information
# for non-numerical data we have to specify how it should be encoded (picking from a few options)
# for numerical data we have to specify a tuple with the possible range
column_schemas = {}
column_schemas['Classification'] = ('private', OneHotChar(encoding='ascii', max_len=5)) # encode each value as 1-hot representation ascii characters
column_schemas['Year'] = ('public', int)
column_schemas['Period'] = ("public", str)
column_schemas['Period Desc.'] = ("public", str)
column_schemas['Aggregate Level'] = ("private", {0, 2, 4, 6}) # fixed set of only these options
column_schemas['Is Leaf Code'] = ("private", bool)
column_schemas['Trade Flow'] = ("private", {"Imports", "Exports", "Re-exports", "Re-imports"})
column_schemas['Reporter Code'] = ("public", int)
column_schemas['Reporter'] = ("public", str)
column_schemas['Reporter ISO'] = ("public", str)
column_schemas['Partner'] = ("public", str)
column_schemas['Commodity'] = ('private', OneHotChar(encoding='ascii', max_len=500))
column_schemas['Trade Value'] = ("private", (0, 3000000000))

# (everything else will be assumed to be public and type will be inferred)
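To illustrate what the tuple and set entries in `column_schemas` constrain, here is a minimal sketch in plain pandas. It does not use the `syft` API; the helper `violations` and the sample values are hypothetical, and it only handles the two spec shapes shown above (a `(min, max)` tuple and a set of allowed values).

```python
import pandas as pd


def violations(series: pd.Series, spec) -> pd.Series:
    """Return the entries of `series` that fall outside `spec`.

    `spec` is either a (min, max) tuple for numeric data
    or a set of allowed values for categorical data.
    """
    if isinstance(spec, tuple):
        low, high = spec
        return series[(series < low) | (series > high)]
    if isinstance(spec, (set, frozenset)):
        return series[~series.isin(spec)]
    raise TypeError(f"unsupported spec: {spec!r}")


# Hypothetical sample values, for illustration only
trade_value = pd.Series([9285, 116604, 4_000_000_000])
aggregate_level = pd.Series([4, 2, 3])

print(violations(trade_value, (0, 3000000000)).tolist())
print(violations(aggregate_level, {0, 2, 4, 6}).tolist())
```

Running a check like this before declaring a range makes it easier to pick bounds that cover the real data.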

In [None]:
### Alternative way to define schema
from syft.types import (
    OneHotCharType,
    IntegerType,
    FloatType,
    CharType,
    ChoiceType,
    BooleanType,
    StructType,
    StructField,
)

"""
class StructField:
    def __init__(self, name: str, dtype: Any, private: bool = False, nullable: bool = False, description: str = ""):
        pass
"""

column_schemas = StructType(
    [
        StructField(
            "Classification",
            OneHotCharType(encoding="ascii", max_len=5),
            private=True,
            nullable=False,
        ),
        StructField("Year", IntegerType(), False),
        StructField("Period", CharType(), False),
        StructField("Period Desc.", CharType(), False),
        StructField("Aggregate Level", ChoiceType(choices={0, 2, 4, 6}), True),
        StructField("Is Leaf Code", BooleanType(), True),
        StructField(
            "Trade Flow",
            ChoiceType(choices={"Imports", "Exports", "Re-exports", "Re-imports"}),
            True,
        ),
        StructField("Reporter Code", IntegerType(), False),
        StructField("Reporter", CharType(), False),
        StructField("Reporter ISO", CharType(), False),
        StructField("Partner", CharType(), False),
        StructField("Commodity", OneHotCharType(encoding="ascii", max_len=500), True),
        StructField("Trade Value", FloatType(min_val=0, max_val=3000000000), True),
    ]
)

# (everything else will be assumed to be public and type will be inferred)

# dataset_ptr: a way to record the transformations applied to the dataset, store the final schema, and attach that schema to the dataset_ptr
# dataset_ptr.possible_schemas.filter() # Attaching the citations

In [207]:
schema[0:3] # descriptions for the columns

Unnamed: 0,Column,Description
0,Classification,Commodity Classification (HS= Harmonized System)
1,Year,4-digit year
2,Period,yyyymm


In [205]:
canada.column_descriptions = schema
canada.sample_data = canada[0:3] # we're approved to release this data in the clear as sample data (we could also hand-generate it)

In [None]:
# uid_column is the column the differential privacy engine uses to represent the "individual" whose information needs protecting
canada = canada.private(uid_column="Partner", column_schemas=column_schemas)

```
WARNING: when creating private tensor "Trade Value" data was found that is greater than the limit specified and will be truncated to 3000000000
```
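The truncation the warning refers to can be pictured with `pandas.Series.clip`. This is only an illustration of clamping values to the declared `(0, 3000000000)` range; whether the differential privacy engine clips in exactly this way is an assumption, and the sample values are made up.

```python
import pandas as pd

LOWER, UPPER = 0, 3000000000  # range declared for 'Trade Value'

# Made-up values: one below the range, one inside it, one above it
values = pd.Series([-5, 1495175, 3500000000])
clipped = values.clip(lower=LOWER, upper=UPPER)

print(clipped.tolist())
```

Values inside the range pass through unchanged, while anything outside it is replaced by the nearest bound.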

In [None]:
from syft.utils import Citation

# a few tags for searching (these can be anything and can be searched as key-value or just by string search)
metadata = {"country": "canada", "type": "trade", "origin": "un"}


# describe the data
description = "This dataset represents aggregated trade statistics as reported by Canada about what it believes was imported/exported to/from its country in Feb 2021."

ca.load_dataset(
    assets={"table": canada},
    description=description,
    website="https://unstats.un.org/home/",
    email="sue@canada.ca",
    phone="(901) 326-4464",
    citations=[
        Citation(
            title="",
            url="https://unstats.un.org/home/",
            author={"susan@canada.ca"},
            year=2015,
            journal={"Nature"},
        )
    ],
    **metadata,
)

## Note: Currently, there doesn't seem to be a good python package for citations.
# We can possibly create a custom citation class, refer to website for citation format: https://github.com/leonoverweel/bibtex-python-package-citations
# We can follow the one provided by Numpy.
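As a sketch of the custom citation class suggested in the note above, a small dataclass with a BibTeX renderer might look like this. The class name `SimpleCitation`, its fields, the `key` value, and the `to_bibtex` method are hypothetical; they are not the `syft.utils.Citation` API. The title is a placeholder invented for the example, and the other values reuse those from the `load_dataset` call.

```python
from dataclasses import dataclass


@dataclass
class SimpleCitation:
    key: str
    title: str
    author: str
    year: int
    url: str = ""
    journal: str = ""

    def to_bibtex(self) -> str:
        """Render the citation as a BibTeX @misc entry, skipping empty fields."""
        fields = {
            "title": self.title,
            "author": self.author,
            "year": str(self.year),
            "journal": self.journal,
            "url": self.url,
        }
        body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items() if v)
        return f"@misc{{{self.key},\n{body}\n}}"


citation = SimpleCitation(
    key="canada_trade_2021",  # hypothetical citation key
    title="Canada trade statistics, Feb 2021",  # placeholder title
    author="susan@canada.ca",
    year=2015,
    url="https://unstats.un.org/home/",
    journal="Nature",
)
print(citation.to_bibtex())
```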

# SCRAP CODE FOR DEMO

In [170]:
import pandas as pd

# Mock table of network subscription requests used for the demo output above
requests_data = {'Date': [pd.Timestamp("2021-06-29"),pd.Timestamp("2021-06-29"),pd.Timestamp("2021-06-27"),pd.Timestamp("2021-06-23")],
        'Domain': ['Canada','United States','Italy','Netherlands'],
        'Status': ['PENDING', 'PENDING', 'PENDING', 'PENDING'],
        'Response Date': [None, None, None, None],
        "Name": ['Sue Grafton', 'John Doe', 'Suzy Song', 'Bill Gates'],
        "Email": ['sue@canada.ca', 'john@usa.gov', 'suzy@it.it', 'bill@ne.ne'],
        'Reason': ["This is Sue Grafton. Per our phone convo, we'd like to join your network.",
                   '"We were connected before, just need to re-establish with new node"',
                   '"We heard great things about the UNGP and would like to participate"',
                   '"We recently spun up a domain and would like to join the network."'],
        }

df = pd.DataFrame(requests_data, columns = ['Date','Domain', 'Status', 'Response Date', 'Name', 'Email','Reason'])
df

Unnamed: 0,Date,Domain,Status,Response Date,Name,Email,Reason
0,2021-06-29,Canada,PENDING,,Sue Grafton,sue@canada.ca,"This is Sue Grafton. Per our phone convo, we'd..."
1,2021-06-29,United States,PENDING,,John Doe,john@usa.gov,"""We were connected before, just need to re-est..."
2,2021-06-27,Italy,PENDING,,Suzy Song,suzy@it.it,"""We heard great things about the UNGP and woul..."
3,2021-06-23,Netherlands,PENDING,,Bill Gates,bill@ne.ne,"""We recently spun up a domain and would like t..."


In [120]:
canada[0:5]

Unnamed: 0,Classification,Year,Period,Period Desc.,Aggregate Level,Is Leaf Code,Trade Flow Code,Trade Flow,Reporter Code,Reporter,...,Partner,Partner ISO,Commodity Code,Commodity,Qty Unit Code,Qty Unit,Qty,Netweight (kg),Trade Value (US$),Flag
0,HS,2021,202102,February 2021,4,0,1,Imports,124,Canada,...,"Other Asia, nes",,6117,"Clothing accessories; made up, knitted or croc...",0,,,,9285,0
1,HS,2021,202102,February 2021,2,0,1,Imports,124,Canada,...,Egypt,,18,Cocoa and cocoa preparations,0,,,0.0,116604,0
2,HS,2021,202102,February 2021,2,0,1,Imports,124,Canada,...,United Kingdom,,18,Cocoa and cocoa preparations,0,,,0.0,1495175,0
3,HS,2021,202102,February 2021,2,0,1,Imports,124,Canada,...,United Rep. of Tanzania,,18,Cocoa and cocoa preparations,0,,,0.0,2248,0
4,HS,2021,202102,February 2021,2,0,1,Imports,124,Canada,...,Singapore,,18,Cocoa and cocoa preparations,0,,,0.0,47840,0


In [30]:
# Non matching example
# Combine both conditions into one boolean mask instead of chaining filters
italy[(italy['Commodity Code'] == "18") & (italy['Partner'] == "Canada")]
canada[(canada['Commodity Code'] == "18") & (canada['Partner'] == "Italy")]



Unnamed: 0,Classification,Year,Period,Period Desc.,Aggregate Level,Is Leaf Code,Trade Flow Code,Trade Flow,Reporter Code,Reporter,...,Partner,Partner ISO,Commodity Code,Commodity,Qty Unit Code,Qty Unit,Qty,Netweight (kg),Trade Value (US$),Flag
32291,HS,2021,202102,February 2021,2,0,1,Imports,124,Canada,...,Italy,,18,Cocoa and cocoa preparations,0,,,,1010792,0
55735,HS,2021,202102,February 2021,2,0,2,Exports,124,Canada,...,Italy,,18,Cocoa and cocoa preparations,0,,,0.0,2063200,0


In [85]:
canada[(canada['Commodity Code'] == str(551090)) & (canada['Partner'] == "United States")]



Unnamed: 0,Classification,Year,Period,Period Desc.,Aggregate Level,Is Leaf Code,Trade Flow Code,Trade Flow,Reporter Code,Reporter,...,Partner,Partner ISO,Commodity Code,Commodity,Qty Unit Code,Qty Unit,Qty,Netweight (kg),Trade Value (US$),Flag


In [99]:
canada_commodities = set(canada[canada['Partner'] == 'United States of America']['Commodity Code'])
usa_commodities = set(united_states[united_states['Partner'] == 'Canada']['Commodity Code'])

In [111]:
# Compare what the US reports importing from Canada with what Canada reports exporting to the US
for i in canada_commodities.intersection(usa_commodities):
    # Combine the conditions into one boolean mask instead of chaining filters
    us_imports = united_states[
        (united_states['Commodity Code'] == str(i))
        & (united_states['Partner'] == "Canada")
        & (united_states['Trade Flow'] == "Imports")
    ]
    canada_exports = canada[
        (canada['Commodity Code'] == str(i))
        & (canada['Partner'] == 'United States of America')
        & (canada['Trade Flow'] == "Exports")
    ]

    if len(us_imports) > 0 and len(canada_exports) > 0:
        # Use the first matching row on each side
        us_thinks_it_imports_from_canada = int(us_imports['Trade Value (US$)'].iloc[0])
        canada_thinks_it_exports_to_us = int(canada_exports['Trade Value (US$)'].iloc[0])
        print(i, us_thinks_it_imports_from_canada, canada_thinks_it_exports_to_us, "\t" + str(us_imports['Commodity']).replace("\n"," ").split("  ")[2].split("Name")[0])



84 1660673548 1863126155 	Nuclear reactors, boilers, machinery and mecha... 
41 1046772 1830656 	Raw hides and skins (other than furskins) and ... 
26 65241464 63228517 	Ores, slag and ash 
8540 366408 393892 	Thermionic, cold cathode or photo-cathode valv... 
04 16463987 16482274 	Dairy produce; birds' eggs; natural honey; edi... 
52 1276643 2117212 	Cotton 
06 36414025 37175452 	Trees and other plants, live; bulbs, roots and... 
35 17486000 18238532 	Albuminoidal substances; modified starches; gl... 
84 1660673548 1863126155 	Nuclear reactors, boilers, machinery and mecha... 
50 6389 14027 	Silk 
051110 1787483 1782096 	Animal products; bovine semen 
2918 11562838 1139550 	Acids; carboxylic acid with additional oxygen ... 
53 58049 202663 	Vegetable textile fibres; paper yarn and woven... 
46 192282 3688299 	Manufactures of straw, esparto or other plaiti... 
33 111101126 125346608 	Essential oils and resinoids; perfumery, cosme... 
32 78485382 84396873 	Tanning or dyeing extracts; ta

KeyboardInterrupt: 
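The loop above filters both tables once per commodity. A vectorized alternative is to merge the two reports once and compute the ratio per commodity. The sketch below is an assumption about how that could look, not the demo's method: the column names `us_imports` and `ca_exports` are hypothetical, the 10% rule is read as "the ratio deviates from 1 by 0.10 or more", and the three rows are copied from the output above.

```python
import pandas as pd

# Values copied from the printed output above
us = pd.DataFrame(
    {"Commodity Code": ["26", "52", "051110"], "us_imports": [65241464, 1276643, 1787483]}
)
ca = pd.DataFrame(
    {"Commodity Code": ["26", "52", "051110"], "ca_exports": [63228517, 2117212, 1782096]}
)

# One row per commodity reported by both sides
merged = us.merge(ca, on="Commodity Code", how="inner")
merged["ratio"] = merged["us_imports"] / merged["ca_exports"]

# Flag commodities whose ratio is off by 10% or more
flagged = merged[(merged["ratio"] - 1).abs() >= 0.10]
print(flagged["Commodity Code"].tolist())
```

Of these three rows, only cotton (code 52) is flagged: the US reports importing about 60% of what Canada reports exporting.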

In [None]:
# canada[canada['Commodity Code'] == "18"][canada['Partner'] == "United States"]

In [None]:
## Network Agreement

#print("United Nations Network Agreement: https://aws.s3.networkagreement.pdf")

### Creating Widget / Buttons

In [3]:
# https://github.com/peteut/ipython-file-upload
# For reference: https://ipywidgets.readthedocs.io/en/latest/examples/Widget%20Events.html

import io
from IPython.display import display, HTML
import fileupload

def _upload(label="Browse"):

    _upload_widget = fileupload.FileUploadWidget(label=label)

    def _cb(change):
        # TODO: Write code to upload the document to s3 or store in syft server
        decoded = io.StringIO(change['owner'].data.decode('utf-8'))
        filename = change['owner'].filename
        print('Uploaded `{}` ({:.2f} kB)'.format(
            filename, len(decoded.read()) / 2 **10))

    _upload_widget.observe(_cb, names='data')
    display(_upload_widget)
    
# print("United Nations Network Agreement: https://aws.s3.networkagreement.pdf")
# _upload("Upload Data Deposit Agreement")
# _upload("Upload Network Agreement")

In [4]:
from IPython.display import display, HTML

upload_button = HTML('''
<!DOCTYPE html>
<html>

<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width">
  <title>JS Bin</title>
</head>

<body>
  <button style="color:white;border-radius:8px;background-color:#1589FF;display:inline-block;width:20%; height:110%;" onclick="document.getElementById('getFile').click()">Upload Agreement</button>
  <input type='file' id="getFile" style="display:none">
</body>

</html>
''')
# print("United Nations Network Agreement: https://aws.s3.networkagreement.pdf")
# print("Upload your Agreement here:")
# upload_button

In [11]:
# print("United Nations Network Agreement: https://aws.s3.networkagreement.pdf")
# print("Canada, you are required to counter sign and upload the Network Agreement below.")
# display(upload_button)
# print("United Nations Network Agreement: https://aws.s3.networkagreement.pdf")
# print("USA, you are required to counter sign and upload the Network Agreement below.")
# display(upload_button)
# print("United Nations Network Agreement: https://aws.s3.networkagreement.pdf")
# print("Italy, you are required to counter sign and upload the Network Agreement below.")
# display(upload_button)
# print("United Nations Network Agreement: https://aws.s3.networkagreement.pdf")
# print("Netherlands you are required to counter sign and upload the Network Agreement below.")
# display(upload_button)