
This notebook is divided into three parts:
- **Part I**: Data Scientist submits an application to join a network
- **Part II**: Network approves the application to join the network
- **Part III**: Data Scientist searches the network for relevant datasets

### User:  Data Scientist (Part I)

#### Goal:
- Search Available Networks
- Select one network
- Create an account on the network

#### Summary:
1. The user searches for and views all available networks.
2. The user selects a network.
3. The user downloads the contract associated with the network.
4. The user reads the contract and signs it offline.
5. The user registers on the selected network.
    - The user submits an application for account creation with the following details:
        - user details such as name and email
        - the signed contract
    - The user is notified about the application status via email.
6. Once the application is approved, the user can log in with the credentials received in the email.
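The application submitted in step 5 can be pictured as a small record that is checked before it is sent. The sketch below is only an illustration: `AccountApplication`, its fields, and the validation rules are assumptions made for this example and are not part of the `syft` API.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass(frozen=True)
class AccountApplication:
    """Hypothetical payload for an account-creation request."""

    name: str
    email: str
    signed_contract: Path  # path to the contract that was signed offline

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the application looks complete."""
        problems = []
        if not self.name.strip():
            problems.append("name is empty")
        if "@" not in self.email:
            problems.append("email looks invalid")
        if self.signed_contract.suffix.lower() != ".pdf":
            problems.append("signed contract should be a PDF")
        return problems


application = AccountApplication(
    name="Sheldon Cooper",
    email="sheldon@caltech.edu",
    signed_contract=Path("signed_data_access_agreement.pdf"),
)
print(application.validate())  # [] means nothing is missing
```

Validating on the client side before submission keeps obviously incomplete applications out of the network owner's review queue.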

In [29]:
from enum import Enum
class bcolors(Enum):
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKCYAN = '\033[96m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'

In [10]:
import pandas as pd
import syft as sy

# Let's search for all the available networks
sy.networks

Unnamed: 0,Name,Hosted Domains,Datasets,Description,Tags,Url
0,United Nations,4,5,The UN hosts data related to the commodity and...,"[Commodities, Health]",https://un.openmined.org


In [None]:
# Hmm... the UN network seems interesting, let's get a pointer to its client.
un = sy.networks[0]

In [141]:
# Let's try to log in to the network
sy.login(url=un.url, email="sheldon@caltech.edu", password="bazinga")


AuthenticationError:
	Incorrect email or password.
	If you're a new user, you need to create an account on the network using the following steps:
		client = sy.networks[0]  # select the network

		# Download the terms and condition of the network
		client.tnc(path="/path/to/store/termsandcondition/file/")
		# Read and sign the agreement and upload it during account creation

		client.create_account(
                    email='tonystark@marvel.com',
                    name="Tony Stark", 
                    tnc_path="/path/to/tnc.pdf"
                )
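A failed login like the one above could be surfaced through a dedicated exception that carries the sign-up hint. The following is a minimal sketch under stated assumptions: the `AuthenticationError` class, the `login` function, and the in-memory user table are hypothetical stand-ins and do not reflect how `syft` implements authentication.

```python
class AuthenticationError(Exception):
    """Hypothetical error raised when a login attempt fails."""


# Stand-in for the network's user table; a real network would store password hashes.
_REGISTERED_USERS = {"raj@ucla.edu": "secret"}

SIGNUP_HINT = (
    "Incorrect email or password. "
    "If you're a new user, you need to create an account on the network first."
)


def login(email: str, password: str) -> str:
    """Return a confirmation message on success, raise AuthenticationError otherwise."""
    if email not in _REGISTERED_USERS or _REGISTERED_USERS[email] != password:
        raise AuthenticationError(SIGNUP_HINT)
    return f"Logged in as {email}"


try:
    login("sheldon@caltech.edu", "bazinga")
except AuthenticationError as err:
    print(f"AuthenticationError: {err}")
```

Returning the same message for an unknown email and a wrong password avoids revealing which accounts exist on the network.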



In [7]:
# Okay, so we need an account with the network. Let's create one.
# First, let's download the data access agreement of the network.
# Every network hosts a data access agreement that the user must countersign and upload
# in order to access any dataset hosted on the network.

un.data_access_agreement

United Nations Data Access Agreement: https://aws.s3.dataaccessagreement.pdf


In [139]:
# Let's read and sign the data access agreement PDF.

In [5]:
# Alright, we have read and signed the data access agreement.
# Let's create an account on the network.
# During account creation, the user is prompted to upload the countersigned data access agreement.
# A link to the agreement is provided in case the user needs to download it again.
# Once the agreement is uploaded, an application for account creation is submitted to the network
# and the user is informed of this.

un.create_account(
    email="sheldon@caltech.edu",
    name="Sheldon Cooper",
)

United Nations Data Access Agreement: https://aws.s3.dataAccessAgreement.pdf
Sheldon, you are required to counter sign and upload the Data Access Agreement.


An application for account creation has been submitted to United Nations network! ❤️ 💯
You'll get an email (sheldon@caltech.edu) when your application has been processed!


In [None]:
# Great, we have successfully submitted a request to create an account on the network.

##### *Meanwhile on the Network Owner side*

### User: Network Owner (Part II)

#### Goal:
- Search for pending subscription requests
- For each request, download the terms and conditions and sign them offline
- Upload the terms and conditions and accept or decline the subscription request

#### Summary:
1. The user lists all the requests submitted for account creation.
2. The user selects one of the requests.
3. The user downloads the terms and conditions document attached to the request.
4. The user verifies and countersigns the document offline.
5. The user uploads the countersigned document and accepts or declines the request for account creation.
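The review steps above amount to a small state machine: a request starts as pending and can be accepted or declined exactly once. The sketch below illustrates that idea only. The `SignupRequest` class, its fields, and the review date are hypothetical and chosen for this example; the state names mirror the `State` column shown later in this notebook.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class State(Enum):
    PENDING = "Pending"
    ACCEPTED = "Accepted"
    DECLINED = "Declined"


@dataclass
class SignupRequest:
    """Hypothetical signup request as seen by the network owner."""

    name: str
    email: str
    submitted_on: date
    state: State = State.PENDING
    reviewed_on: Optional[date] = None
    message: Optional[str] = None

    def _review(self, new_state: State, on: date, message: Optional[str] = None) -> None:
        # A request may only be reviewed once.
        if self.state is not State.PENDING:
            raise ValueError(f"request already {self.state.value.lower()}")
        self.state, self.reviewed_on, self.message = new_state, on, message

    def accept(self, on: date) -> None:
        self._review(State.ACCEPTED, on)

    def decline(self, on: date, message: str) -> None:
        self._review(State.DECLINED, on, message)


request = SignupRequest("Sheldon Cooper", "sheldon@caltech.edu", date(2021, 7, 19))
request.accept(on=date(2021, 7, 20))  # illustrative review date
print(request.state.value)
```

Guarding the transition in one place means an already processed request cannot be silently overwritten by a second review.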

In [230]:
# Note: Now the user is the network owner.
# Let's connect to my network

un_client = sy.login(
    email="info@openmined.org", password="changethis", url="https://un.openmined.org"
)

Connecting to United Nations... connected!	Logging in as info@openmined.org... logged in!


In [196]:
# List all the signup requests submitted to the network
un_client.users.signup_requests

Unnamed: 0,Id,Name,Email,SubmittedOn,ApprovedOn,State
0,1fea0220d6344637a48666522174ca55,Sheldon Cooper,sheldon@caltech.edu,2021-07-19,,Pending
1,b47ce10f1101428a851f154acaee4cb1,Raj Koothrappali,raj@ucla.edu,2021-07-09,2021-07-11,Accepted
2,3b2ef59ea5be41239319a5656c7d87d7,Howard Wolowitz,howard@mit.edu,2021-07-10,2021-07-12,Declined


In [None]:
# There is one pending request for account creation. Let's get a pointer to it.
pending_signup_request = un_client.users.signup_requests[0]

In [10]:
# Let's download the data access agreement attached to the request so we can countersign it.
pending_signup_request.data_access_agreement

Data Access Agreement submitted by Sheldon Cooper: https://aws.s3.userSubmitteddataAccessAgreement.pdf


In [12]:
# Let's countersign the document offline.
# Now that we have signed the data access agreement, let's approve the request.
# During request approval, the network owner has to upload the countersigned data access agreement.
# Both the agreement submitted by the user and the one uploaded by the network owner have to be saved
# in the database, as they will come in handy in case of conflicts.
# Once the agreement is uploaded, the request is accepted and the user is informed.

# The requesting user receives an email with the login credentials.
pending_signup_request.accept(
    notify_by_email=True,
)

Data Access Agreement submitted by Sheldon Cooper: https://aws.s3.userSubmitteddataAccessAgreement.pdf
Upload the counter signed Data Access Agreement here.


Yay, a new user has been added to your network! ❤️


In [None]:
# Or, if we choose to decline the request,
# the requesting user receives an email with the reason specified by the network owner.
pending_signup_request.decline(
    notify_by_email=True,
    message="not all required fields are filled in the data access agreement document",
)

In [199]:
# Great! We have successfully accepted a request for account creation.

##### *Meanwhile on the Data Scientist side*
##### *1-2 days have passed... The application for account creation has been approved and the Data Scientist has received the login credentials by email.*

### User: Data Scientist (Part III)

#### Goal:
- Log in to the network
- Search for relevant datasets
- Select a dataset that is relevant to the user

#### Summary:
1. The user logs into the network.
2. The user searches the network dataset store for all the available datasets.
3. The user selects a dataset.
4. The user gets a pointer to the dataset.

In [227]:
import syft as sy

# Let's log in to the network using the credentials provided in the email (sent on application approval).
# Select the United Nations network
un = sy.networks[0]

# Log in to the network
un_network_client = sy.login(url=un.url, email="sheldon@caltech.edu", password="bazinga")

# Or, equivalently
un_network_client = un.login(email="sheldon@caltech.edu", password="bazinga")
# un_network_client.save_logins()

Connecting to United Nations... connected!	Logging in as sheldon@caltech.edu... logged in!


In [5]:
# Great! Now we have a pointer to the network.
# Let's search through the network dataset store for datasets
un_network_client.datasets

Unnamed: 0,Name,Tags,Description,Dtype,Id,Domain,Shape
0,breast_cancer,"[mri, breast cancer, dicoms]",Labelled image dataset of patients suffering d...,ImageClassificationDataset,56lkw24,WHO,"((25000, 300, 300), (25000))"
1,canada_trade_data,"[canada, trade, un, commodities]",This dataset represents aggregated trade stati...,DataFrame,f3s9h1m,Canada,"(25000, 22)"
2,netherlands_trade_data,"[netherlands, trade, commodities, export]",This dataset represents aggregated trade stati...,DataFrame,2kf3o5d,Netherlands,"(35000, 22)"
3,italy_trade_data,"[italy, trade, un, commodities, export, import]",This dataset represents aggregated trade stati...,DataFrame,42wk65l,Italy,"(30000, 22)"
4,us_trade_data,"[us, trade, un, commodities]",This dataset represents aggregated trade stati...,DataFrame,86pfgh1,United States,"(40000, 22)"


In [7]:
# Currently, I'm interested in datasets containing information related to commodities trade between countries

# Let's filter the results accordingly using tags
datasets = un_network_client.datasets
datasets[["trade" in tags for tags in datasets["Tags"]]]

# Alternatively, filter by name with a Django-style field lookup
un_network_client.datasets.filter(name__icontains="trade_data")

Unnamed: 0,Name,Tags,Description,Dtype,Id,Domain,Shape
1,canada_trade_data,"[canada, trade, un, commodities]",This dataset represents aggregated trade stati...,DataFrame,f3s9h1m,Canada,"(25000, 22)"
2,netherlands_trade_data,"[netherlands, trade, commodities, export]",This dataset represents aggregated trade stati...,DataFrame,2kf3o5d,Netherlands,"(35000, 22)"
3,italy_trade_data,"[italy, trade, un, commodities, export, import]",This dataset represents aggregated trade stati...,DataFrame,42wk65l,Italy,"(30000, 22)"
4,us_trade_data,"[us, trade, un, commodities]",This dataset represents aggregated trade stati...,DataFrame,86pfgh1,United States,"(40000, 22)"
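The `name__icontains` keyword follows the Django convention of `field__lookup`. How such a filter might be evaluated over the dataset table can be sketched in plain Python. This is an assumption-laden illustration: `filter_datasets`, the supported lookups, and the shortened dataset list are made up for this example and say nothing about how the network client actually implements filtering.

```python
from typing import Any, Dict, List

# Shortened copy of the dataset table shown above.
DATASETS = [
    {"Name": "breast_cancer", "Tags": ["mri", "breast cancer", "dicoms"], "Dtype": "ImageClassificationDataset"},
    {"Name": "canada_trade_data", "Tags": ["canada", "trade", "un", "commodities"], "Dtype": "DataFrame"},
    {"Name": "us_trade_data", "Tags": ["us", "trade", "un", "commodities"], "Dtype": "DataFrame"},
]


def _matches(row: Dict[str, Any], key: str, expected: Any) -> bool:
    # Split "name__icontains" into the field ("name") and the lookup ("icontains").
    field, _, lookup = key.partition("__")
    value = row[field.capitalize()]  # column names in the table are capitalised
    if lookup == "":
        return value == expected  # plain keyword means exact match
    if lookup == "icontains":
        return str(expected).lower() in str(value).lower()  # case-insensitive substring
    if lookup == "contains":
        return expected in value  # membership, e.g. a tag inside a tag list
    raise ValueError(f"unsupported lookup: {lookup!r}")


def filter_datasets(rows: List[Dict[str, Any]], **lookups: Any) -> List[Dict[str, Any]]:
    """Keep the rows that satisfy every lookup."""
    return [row for row in rows if all(_matches(row, k, v) for k, v in lookups.items())]


by_name = filter_datasets(DATASETS, name__icontains="TRADE_data")
by_tag = filter_datasets(DATASETS, tags__contains="trade", dtype="DataFrame")
print([row["Name"] for row in by_name])  # ['canada_trade_data', 'us_trade_data']
```

Combining several keyword arguments acts as a logical AND, which matches how the final cell of this notebook passes both `name__icontains` and `dtype`.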


In [8]:
# Let's check out one of the datasets
un_network_client.datasets.loc[1]

Name                                           canada_trade_data
Tags                            [canada, trade, un, commodities]
Description    This dataset represents aggregated trade stati...
Dtype                                                  DataFrame
Id                                                       f3s9h1m
Domain                                                    Canada
Shape                                                (25000, 22)
Name: 1, dtype: object

In [None]:
# Let's get a pointer to one of the datasets: the Canada trade data.
# We can do so by passing the index, the Id, or the name to the network client.

# Using the index
canada_dataset_ptr = un_network_client.datasets[1]

# Using the Id
canada_dataset_ptr = un_network_client.datasets["f3s9h1m"]

# Using the name
canada_dataset_ptr = un_network_client.datasets["canada_trade_data"]
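Accepting an index, an Id, or a name through the same square-bracket syntax can be done by dispatching on the key inside `__getitem__`. The class below is a hypothetical sketch of that pattern; `DatasetStore` and its record layout are assumptions for illustration and not the actual `syft` container.

```python
class DatasetStore:
    """Hypothetical container that resolves a dataset by position, Id, or name."""

    def __init__(self, records):
        self._records = list(records)

    def __getitem__(self, key):
        # Integers are treated as positions in the listing.
        if isinstance(key, int):
            return self._records[key]
        # Strings are matched against the Id first, then the name.
        for record in self._records:
            if key in (record["Id"], record["Name"]):
                return record
        raise KeyError(key)


store = DatasetStore(
    [
        {"Id": "56lkw24", "Name": "breast_cancer"},
        {"Id": "f3s9h1m", "Name": "canada_trade_data"},
    ]
)
print(store[1]["Name"], store["f3s9h1m"]["Name"], store["canada_trade_data"]["Id"])
```

All three lookups return the same record, so the caller can use whichever identifier is at hand.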

In [49]:
# Let's explore the dataset.

# Let's read the description associated with the dataset
print(canada_dataset_ptr.description)

This dataset represents aggregated trade statistics as reported by Canada about what it believes was imported/exported to/from its country in Feb 2021.


In [7]:
# Alternatively, we can get a pointer to one particular domain and list the datasets hosted by that domain.
# Let's demonstrate this by first listing all the domains attached to the network.

un_network_client.domains

Unnamed: 0,Id,Name,Description,Tags
0,bc4e2ff0bc9e43818deb6cba5a3fac75,Canada,This domain hosts datasets provided by the gov...,"[trade data, healthcare, commodities]"
1,387eedbbbe4c40cd812aaf9b84154d29,United States,This domain hosts datasets provided by the gov...,"[trade data, commodities]"
2,9a59cbcae92242638b375967bfe8a9ba,Italy,This domain hosts datasets provided by the gov...,"[trade data, commodities]"
3,aaf48ca2378e480ea771d1a89d216291,Netherlands,This domain hosts datasets provided by the gov...,[trade data]
4,2206860d0fee43519c381eacb70a4668,WHO,This domain hosts datasets provided by hospita...,[healthcare]


In [10]:
# Now, we can get a pointer to one of the domains and list the datasets hosted by that domain.
# Let's get a pointer to the Canada domain.

canada_domain = un_network_client.domains[0]  # Or un_network_client["bc4e2ff0"]

# Now we can simply list all the datasets hosted by this domain, using the `.datasets` attribute.
canada_domain.datasets

Unnamed: 0,Name,Tags,Description,Dtype,Id,Domain,Shape
1,canada_trade_data,"[canada, trade, un, commodities]",This dataset represents aggregated trade stati...,DataFrame,f3s9h1m,Canada,"(25000, 22)"
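The per-domain listing is the network-wide dataset table restricted to one value of the `Domain` column. A minimal sketch of that grouping follows, using the dataset names and domains from the table earlier in this notebook; the `datasets_by_domain` mapping is an illustration only and not how the network stores its catalogue.

```python
from collections import defaultdict

# (dataset name, hosting domain) pairs taken from the dataset listing above.
DATASETS = [
    ("breast_cancer", "WHO"),
    ("canada_trade_data", "Canada"),
    ("netherlands_trade_data", "Netherlands"),
    ("italy_trade_data", "Italy"),
    ("us_trade_data", "United States"),
]

# Group dataset names under the domain that hosts them.
datasets_by_domain = defaultdict(list)
for name, domain in DATASETS:
    datasets_by_domain[domain].append(name)

print(datasets_by_domain["Canada"])  # ['canada_trade_data']
```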


In [None]:
# Great, we can see there is one dataset currently hosted by the domain `Canada`.
# We can get a pointer to the dataset just as we did with the network client.

# Using the Id
canada_dataset_ptr = canada_domain.datasets["f3s9h1m"] 

# Using Name
canada_dataset_ptr = canada_domain.datasets["canada_trade_data"]

##### Awesome, we were successfully able to search across different datasets in the network and select one that is relevant to us. In the next notebook, we will learn how to ETL the selected dataset.

### Dummy Data Creation

#### Part I:

In [1]:
import pandas as pd
from enum import Enum

class bcolors(Enum):
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKCYAN = '\033[96m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
    
    
# print("An application for account creation has been submitted to United Nations network! ❤️ 💯")
# print(f"You'll get an email {bcolors.BOLD.value}(sheldon@caltech.edu){bcolors.ENDC.value} when your application has been processed!")


# Print available networks
available_networks = [
    {
        "Name": "United Nations",
        "Hosted Domains": 4,
        "Datasets": 5,
        "Description": "The UN hosts data related to the commodity and health sector.",
        "Tags": ["Commodities", "Health"],
        "Url": "https://un.openmined.org",
    }
]
# pd.DataFrame(available_networks)

# Authentication error
login_error = f"""
{bcolors.FAIL.value}AuthenticationError: {bcolors.ENDC.value}
\tIncorrect email or password. 
\tIf you're a new user, you need to create an account on the network using the following steps:{bcolors.OKBLUE.value}
\t\tclient = sy.networks[0]  # select the network

\t\t# Download the terms and condition of the network
\t\tclient.tnc(path="/path/to/store/termsandcondition/file/")
\t\t# Read and sign the agreement and upload it during account creation

\t\tclient.create_account(
                    email='tonystark@marvel.com',
                    name="Tony Stark", 
                    tnc_path="/path/to/tnc.pdf"
                )
"""
# print(login_error)

# Message on downloading T&C
tnc_message = f"Downloading Terms and Condition to path:`/home/ubuntu/Desktop/` \n{bcolors.OKGREEN.value}Download Completed"
# print(tnc_message)

# Submit request for new account creation
# We can use the emoji pypi package to render emojis.
new_account_request = f"""
An application for account creation has been submitted to United Nations network! ❤️ 💯
You'll get an email {bcolors.BOLD.value}(sheldon@caltech.edu){bcolors.ENDC.value} when your application has been processed!
"""
# print(new_account_request)

In [2]:
# https://github.com/peteut/ipython-file-upload
# For reference: https://ipywidgets.readthedocs.io/en/latest/examples/Widget%20Events.html

import io
from IPython.display import display, HTML
import fileupload


def _upload(label="Browse"):

    _upload_widget = fileupload.FileUploadWidget(label=label)

    def _cb(change):
        # TODO: Write code to upload the document to s3 or store in syft server
        # Measure the raw bytes: a signed agreement is usually a PDF, which is not valid UTF-8 text.
        data = change["owner"].data
        filename = change["owner"].filename
        print("Uploaded `{}` ({:.2f} kB)".format(filename, len(data) / 2 ** 10))

    _upload_widget.observe(_cb, names="data")
    display(_upload_widget)


upload_button = HTML(
    """
<!DOCTYPE html>
<html>

<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width">
  <title>JS Bin</title>
</head>

<body>
  <button style="color:white;border-radius:8px;background-color:#1589FF;display:inline-block;width:20%; height:110%;" onclick="document.getElementById('getFile').click()">Upload Agreement</button>
  <input type='file' id="getFile" style="display:none">
</body>

</html>
"""
)
# print("United Nations Network Agreement: https://aws.s3.networkagreement.pdf")
# print("Upload your Agreement here:")
# upload_button
# print("United Nations Network Agreement: https://aws.s3.networkagreement.pdf")
# _upload("Upload Data Deposit Agreement")
# _upload("Upload Network Agreement")


## Upload network agreement

# print("United Nations Data Access Agreement: https://aws.s3.dataAccessAgreement.pdf")
# print("Sheldon, you are required to counter sign and upload the Data Access Agreement.")
# display(upload_button)
# print("An application for account creation has been submitted to United Nations network! ❤️ 💯")
# print(f"You'll get an email {bcolors.BOLD.value}(sheldon@caltech.edu){bcolors.ENDC.value} when your application has been processed!")

#### Part II:

In [3]:
# Client connection
do_client_connection = f"Connecting to United Nations... connected!\tLogging in as {bcolors.BOLD.value}info@openmined.org{bcolors.ENDC.value}... logged in!"
#print(do_client_connection)

import uuid

# Dummy signup requests
signup_requests = [
    {
        "Id": uuid.uuid4().hex,
        "Name": "Sheldon Cooper",
        "Email": "sheldon@caltech.edu",
        "SubmittedOn": "2021-07-19",
        "ApprovedOn": None,
        "State": "Pending",
    },
    {
        "Id": uuid.uuid4().hex,
        "Name": "Raj Koothrappali",
        "Email": "raj@ucla.edu",
        "SubmittedOn": "2021-07-09",
        "ApprovedOn": "2021-07-11",
        "State": "Accepted",
    },
    {
        "Id": uuid.uuid4().hex,
        "Name": "Howard Wolowitz",
        "Email": "howard@mit.edu",
        "SubmittedOn": "2021-07-10",
        "ApprovedOn": "2021-07-12",
        "State": "Declined",
    },
]
# pd.DataFrame(signup_requests)

# print("Data Access Agreement submitted by Sheldon Cooper: https://aws.s3.dataAccessAgreement.pdf")


# print("Data Access Agreement submitted by Sheldon Cooper: https://aws.s3.userSubmitteddataAccessAgreement.pdf")
# print("Upload the counter signed Data Access Agreement here.")
# display(upload_button)
# print("Yay, a new user has been added to your network! ❤️")


#### Part III:

In [4]:
# Connection to network
ds_client_connection = f"Connecting to United Nations... connected!\tLogging in as {bcolors.BOLD.value}sheldon@caltech.edu{bcolors.ENDC.value}... logged in!"
# print(ds_client_connection)


import pandas as pd

dataset_store = [
    {
        "Name": "breast_cancer",
        "Tags": ["mri", "breast cancer", "dicoms"],
        "Description": "Labelled image dataset of patients suffering different types of breast cancer",
        "Dtype": "ImageClassificationDataset",
        "Id": "56lkw24",
        "Domain": "WHO",
        "Shape": "((25000, 300, 300), (25000))",
    },
    {
        "Name": "canada_trade_data",
        "Tags": ["canada", "trade", "un", "commodities"],
        "Description": "This dataset represents aggregated trade statistics as reported by Canada about what it believes was imported/exported to/from its country in Feb 2021.",
        "Dtype": "DataFrame",
        "Id": "f3s9h1m",
        "Domain": "Canada",
        "Shape": "(25000, 22)",
    },
    {
        "Name": "netherlands_trade_data",
        "Tags": ["netherlands", "trade", "commodities", "export"],
        "Description": "This dataset represents aggregated trade statistics as reported by Netherlands about what it believes was imported/exported to/from its country in Feb 2021.",
        "Dtype": "DataFrame",
        "Id": "2kf3o5d",
        "Domain": "Netherlands",
        "Shape": "(35000, 22)",
    },
    {
        "Name": "italy_trade_data",
        "Tags": ["italy", "trade", "un", "commodities", "export", "import"],
        "Description": "This dataset represents aggregated trade statistics as reported by Italy about what it believes was imported/exported to/from its country in Feb 2021.",
        "Dtype": "DataFrame",
        "Id": "42wk65l",
        "Domain": "Italy",
        "Shape": "(30000, 22)",
    },
    {
        "Name": "us_trade_data",
        "Tags": ["us", "trade", "un", "commodities"],
        "Description": "This dataset represents aggregated trade statistics as reported by United States about what it believes was imported/exported to/from its country in Feb 2021.",
        "Dtype": "DataFrame",
        "Id": "86pfgh1",
        "Domain": "United States",
        "Shape": "(40000, 22)",
    },
]

dataset_store = pd.DataFrame(dataset_store)
filtered_dataset = dataset_store[["trade" in tags for tags in dataset_store["Tags"]]]

domain_list = [
    {
        "Id": uuid.uuid4().hex,
        "Name": "Canada",
        "Description": "This domain hosts datasets provided by the government of Canada.",
        "Tags": ["trade data", "healthcare", "commodities"],
    },
    {
        "Id": uuid.uuid4().hex,
        "Name": "United States",
        "Description": "This domain hosts datasets provided by the government of United States.",
        "Tags": ["trade data", "commodities"],
    },
    {
        "Id": uuid.uuid4().hex,
        "Name": "Italy",
        "Description": "This domain hosts datasets provided by the government of Italy.",
        "Tags": ["trade data", "commodities"],
    },
    {
        "Id": uuid.uuid4().hex,
        "Name": "Netherlands",
        "Description": "This domain hosts datasets provided by the government of Netherlands.",
        "Tags": [
            "trade data",
        ],
    },
    {
        "Id": uuid.uuid4().hex,
        "Name": "WHO",
        "Description": "This domain hosts datasets provided by hospitals affiliated with WHO.",
        "Tags": [
            "healthcare",
        ],
    },
]

domain_list = pd.DataFrame(domain_list)

canada_domain_datasets = [
    {
        "Name": "canada_trade_data",
        "Tags": ["canada", "trade", "un", "commodities"],
        "Description": "This dataset represents aggregated trade statistics as reported by Canada about what it believes was imported/exported to/from its country in Feb 2021.",
        "Dtype": "DataFrame",
        "Id": "f3s9h1m",
        "Domain": "Canada",
        "Shape": "(25000, 22)",
    },
]

canada_domain_datasets = pd.DataFrame(canada_domain_datasets)

#### Exploring the Concept of Distributed Dataset

In [None]:
## How to create a distributed dataset
# Distributed dataset: a collection of datasets that are grouped together and hosted by the Data Scientist.
# The datasets in the collection might follow a different schema than the original datasets.
# Each dataset in the collection holds only a pointer to the original dataset, together with a schema
# and the information needed to convert the original data into the form specified by that schema.

ddataset_ptr = DistributedDataset(
    [canada_dataset_ptr, us_dataset_ptr, italy_dataset_ptr],
    email="sheldon@caltech.edu",
    **kwargs,
)

ddataset_ptr2 = DistributedDataset(
    [canada_dataset_ptr, us_dataset_ptr],
    email="howard@mit.edu",
    **kwargs,
)


ddataset = un_network_client.dist_datasets.filter(name__icontains="trade_data", dtype="DataFrame")
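To make the idea concrete, the sketch below models a distributed dataset as a list of pointers plus, for each pointer, a mapping from the shared schema to that dataset's own columns. Everything here is an assumption made for illustration: the `DatasetPointer` and `DistributedDataset` classes, their fields, and the column names are hypothetical and do not describe an existing `syft` API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatasetPointer:
    """Stand-in for a pointer to a remote dataset; only metadata lives here."""

    name: str
    columns: List[str]


@dataclass
class DistributedDataset:
    """Hypothetical collection of pointers that share one target schema."""

    owner_email: str
    schema: List[str]
    members: List[DatasetPointer] = field(default_factory=list)
    # Per member: target column -> source column in the original dataset.
    column_maps: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def add(self, pointer: DatasetPointer, column_map: Dict[str, str]) -> None:
        # Every schema column must map to a column the original dataset actually has.
        missing = [c for c in self.schema if column_map.get(c) not in pointer.columns]
        if missing:
            raise ValueError(f"{pointer.name} cannot provide: {missing}")
        self.members.append(pointer)
        self.column_maps[pointer.name] = dict(column_map)


# Column names below are made up for the example.
canada = DatasetPointer("canada_trade_data", ["Commodity", "Trade Value (US$)"])
us = DatasetPointer("us_trade_data", ["commodity_name", "value_usd"])

trade = DistributedDataset(owner_email="sheldon@caltech.edu", schema=["commodity", "value"])
trade.add(canada, {"commodity": "Commodity", "value": "Trade Value (US$)"})
trade.add(us, {"commodity": "commodity_name", "value": "value_usd"})
print([member.name for member in trade.members])
```

Only the mapping is stored, so the original data stays with its domain and the conversion can be applied wherever the data lives.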