### Select a dataset

To access a dataset, a user needs to log in to the domain that hosts it.
A user can log in to a domain in two ways:
- Directly, using the domain's URL (if it is known)
- By selecting the domain from a Network it belongs to

Overall, the flow for a user to access a dataset is:
- The user logs in to the domain. **[P0]**
- The user can list the datasets available on the domain. **[P0]**
- The user can filter datasets by properties such as Id, tags, and title. **[P1]**
- The user selects the desired dataset by its Id. **[P0]**
- The user can view the metadata related to the dataset. **[P0]**
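The P1 filtering step can be prototyped against a plain pandas listing before the client API exists. The sketch below is a hypothetical helper (`filter_datasets` is not a Syft function). It matches the Id by prefix, tags case-insensitively, and the title by case-insensitive substring. The column names mirror the mock listing in this notebook.

```python
import pandas as pd


def filter_datasets(df, dataset_id=None, tag=None, title=None):
    """Hypothetical filter over a dataset listing (not part of PySyft).

    dataset_id: exact or prefix match on the "Id" column
    tag:        dataset must carry this tag (case-insensitive)
    title:      case-insensitive substring match on the "Name" column
    """
    mask = pd.Series(True, index=df.index)
    if dataset_id is not None:
        mask &= df["Id"].str.startswith(dataset_id)
    if tag is not None:
        wanted = tag.lower()
        mask &= df["Tags"].apply(lambda tags: wanted in (t.lower() for t in tags))
    if title is not None:
        mask &= df["Name"].str.contains(title, case=False, regex=False)
    return df[mask]


listing = pd.DataFrame(
    [
        {"Id": "28736302eb9140c886a76dbe0ab13c05", "Name": "Canada Commodities Dataset",
         "Tags": ["Commodities", "Canada", "Trade"]},
        {"Id": "47a768058ab84e36ae5458dabe988b4c", "Name": "Pneumonia Dataset",
         "Tags": ["Health", "Pneumonia", "X-Ray"]},
    ]
)

print(filter_datasets(listing, tag="trade")["Name"].tolist())
```

Filters combine with AND, so passing several keyword arguments narrows the result further.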

In [None]:
import syft as sy

In [14]:
# Let's quickly list the available datasets
sy.datasets

| | Id | Name | Tags | Assets | Description | Domain | Network | Usage | Added On |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 766e127be473493483e6614a099e7db4 | Diabetes Dataset | [Health, Classification, Dicom] | ["Images"] -> Tensor; ["Labels"] -> Tensor | A large set of high-resolution retina images | California Healthcare Foundation | WHO | 102 | 09 Jan 2021, 09:33:09 |
| 1 | 28736302eb9140c886a76dbe0ab13c05 | Canada Commodities Dataset | [Commodities, Canada, Trade] | ["ca-feb2021"] -> DataFrame | Commodity Trade Dataset | Canada Domain | United Nations | 40 | 11 Mar 2021, 09:33:09 |
| 2 | 08941436cf7f44f99fbcbea6aa221932 | Italy Commodities Dataset | [Commodities, Italy, Trade] | ["it-feb2021"] -> DataFrame | Commodity Trade Dataset | Italy Domain | United Nations | 23 | 09 Mar 2021, 09:33:09 |
| 3 | 52f2ca8fbef241e6a415eac1a3843646 | Netherlands Commodities Dataset | [Commodities, Netherlands, Trade] | ["ne-feb2021"] -> DataFrame | Commodity Trade Dataset | Netherland Domain | United Nations | 20 | 12 Apr 2021, 09:33:09 |
| 4 | 47a768058ab84e36ae5458dabe988b4c | Pneumonia Dataset | [Health, Pneumonia, X-Ray] | ["X-Ray-Images"] -> Tensor; ["labels"] -> Tensor | Chest X-Ray images. All provided images are in... | RSNA | WHO | 334 | 09 Jan 2021, 09:33:09 |


In [12]:
# We want to access the `Canada Commodities Dataset` hosted on the `Canada Domain`

# If we know the URL of the domain, we can log in to it directly

ca_domain_client = sy.login(
    email="sheldon@caltech.edu", password="bazinga", url="https://ca.openmined.org"
)

# Otherwise, if the URL of the domain is unknown, we can select the domain via its Network

# Let's find the network that the domain belongs to
sy.networks.filter(name="United Nations")

| | Id | Name | Hosted Domains | Hosted Datasets | Description | Tags | Url |
|---|---|---|---|---|---|---|---|
| 0 | bfd98059c7f34b4f8be9156aa7c99201 | United Nations | 4 | 6 | The UN hosts data related to the commodity and... | [Commodities, Census] | https://un.openmined.org |


In [None]:
# Let's select the network

un_network = sy.networks[0]
# Or
un_network = sy.networks.filter(name="United Nations")[0]

# Let's select the `Canada Domain` from the UN network
ca_domain = un_network.domains.filter(name="Canada Domain")[0]

# Let's log in to the Canada domain
ca_domain_client = ca_domain.login(email="sheldon@caltech.edu", password="bazinga")
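Both login paths end at the same domain. The network route amounts to a lookup from a (network, domain name) pair to the domain's address. Below is a minimal sketch of that lookup. It assumes a hypothetical in-memory registry: `NETWORKS` and `resolve_domain_url` are illustrative names, not PySyft APIs, and the registry layout is an assumption. The URLs are the ones used above.

```python
# Hypothetical in-memory registry standing in for network metadata;
# this layout is an assumption, not the PySyft data model.
NETWORKS = {
    "United Nations": {
        "url": "https://un.openmined.org",
        "domains": {"Canada Domain": "https://ca.openmined.org"},
    },
}


def resolve_domain_url(network_name, domain_name, networks=NETWORKS):
    """Return a domain's URL by looking it up through the network that hosts it."""
    try:
        network = networks[network_name]
    except KeyError:
        raise LookupError(f"Network `{network_name}` is not known") from None
    try:
        return network["domains"][domain_name]
    except KeyError:
        raise LookupError(
            f"Domain `{domain_name}` is not hosted on `{network_name}`"
        ) from None


print(resolve_domain_url("United Nations", "Canada Domain"))
```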

In [15]:
# Great, now that we have a client to the domain, 
# let's list the available datasets on the domain

ca_domain_client.datasets

| | Id | Name | Tags | Assets | Description | Domain | Network | Usage | Added On |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 28736302eb9140c886a76dbe0ab13c05 | Canada Commodities Dataset | [Commodities, Canada, Trade] | ["ca-feb2021"] -> DataFrame | Commodity Trade Dataset | Canada Domain | United Nations | 40 | 11 Mar 2021, 09:33:09 |


In [18]:
# Let's select the commodity dataset.
# We can select a dataset either by its position (`index`) or by its `Id`
ca_commodity_data = ca_domain_client.datasets[0]

# Or via `Id`
ca_commodity_data = ca_domain_client.datasets["28736302eb9140c886a76dbe0ab13c05"]

In [34]:
# If the user enters an invalid Id
ca_commodity_data = ca_domain_client.datasets["f6fc3e2b9b4a2319dc7486a705565807"]


    DatasetDoesNotExistException:
        The dataset with Id `f6fc3e2b9b4a2319dc7486a705565807` doesn't exist on the domain.



In [18]:
# If the user enters an invalid index
ca_domain_client.datasets[1]


    IndexOutOfBoundException:
        Index `1` doesn't exist on the domain.
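The two selection modes and their errors can be captured in one `__getitem__`: an `int` is treated as a position and a `str` as a dataset Id. The sketch below is hypothetical (`DatasetListing` is not a Syft class). It reuses the exception names from the mock outputs and resets the index so that positions always start at 0, even when the listing was filtered from a larger table. Negative indices are rejected here; that is a design assumption.

```python
import pandas as pd


class DatasetDoesNotExistException(LookupError):
    """Raised when no dataset on the domain has the requested Id."""


class IndexOutOfBoundException(IndexError):
    """Raised when a positional index falls outside the listing."""


class DatasetListing:
    """Hypothetical container: int -> position, str -> dataset Id."""

    def __init__(self, df):
        # Reset the index so positions always run 0..n-1.
        self._df = df.reset_index(drop=True)

    def __getitem__(self, key):
        if isinstance(key, int):
            if not 0 <= key < len(self._df):
                raise IndexOutOfBoundException(f"Index `{key}` doesn't exist on the domain.")
            return self._df.iloc[key]
        matches = self._df[self._df["Id"] == key]
        if matches.empty:
            raise DatasetDoesNotExistException(
                f"The dataset with Id `{key}` doesn't exist on the domain."
            )
        return matches.iloc[0]


# A one-row listing carrying index label 1, as if filtered from a larger table.
datasets = DatasetListing(
    pd.DataFrame(
        [{"Id": "28736302eb9140c886a76dbe0ab13c05", "Name": "Canada Commodities Dataset"}],
        index=[1],
    )
)
print(datasets[0]["Name"])
```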



In [29]:
# Let's check the assets attached to the dataset
ca_commodity_data


Name: Canada Commodities Dataset
Description: Commodity Trade Dataset



| | Asset Key | Type | Shape |
|---|---|---|---|
| 0 | ["ca-feb2021"] | DataFrame | (40000, 7) |
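The asset summary can be derived from the asset objects themselves, with the Type and Shape columns coming from the class name and `.shape`. A minimal pandas sketch follows. `summarize_assets` is a hypothetical helper, and the placeholder frame only reproduces the reported shape, not the real asset's contents.

```python
import pandas as pd


def summarize_assets(assets):
    """Build an (Asset Key, Type, Shape) table from a dict of asset objects.

    Assumes every asset exposes a `.shape` attribute (e.g. a DataFrame or tensor).
    """
    rows = [
        {"Asset Key": key, "Type": type(obj).__name__, "Shape": tuple(obj.shape)}
        for key, obj in assets.items()
    ]
    return pd.DataFrame(rows, columns=["Asset Key", "Type", "Shape"])


# Placeholder frame with the shape reported above; its contents are not the real data.
assets = {"ca-feb2021": pd.DataFrame(0, index=range(40000), columns=range(7))}
summary = summarize_assets(assets)
print(summary)
```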


In [27]:
# Let's print the details of the dataset
print("Details of the dataset")
print(f"Name: {ca_commodity_data.name}")
print(f"Id: {ca_commodity_data.id}")
print(f"Tags: {ca_commodity_data.tags}")
print(f"Description: {ca_commodity_data.description}")

Details of the dataset
Name: Canada Commodities Dataset
Id: 28736302eb9140c886a76dbe0ab13c05
Tags: ['Commodities', 'Canada', 'Trade']
Description: Commodity Trade Dataset


In [37]:
# Let's access the asset in the dataset
commodity_dataset_ptr = ca_commodity_data["ca-feb2021"]

In [41]:
# If the user enters an asset key that doesn't exist in the dataset
ca_commodity_data["random-feb2021"]


    InvalidAssetKeyError:
        Asset with key `random-feb2021` does not exist.
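Asset access by key can fail in the same way a dictionary lookup does. The sketch below translates the `KeyError` into the mock's `InvalidAssetKeyError` with the same message. `Dataset` is a hypothetical wrapper, and the string value stands in for a real asset pointer. Subclassing `LookupError` instead of `KeyError` avoids the extra quotes that `str(KeyError(...))` adds around the message.

```python
class InvalidAssetKeyError(LookupError):
    """Raised when a dataset has no asset under the requested key."""


class Dataset:
    """Hypothetical dataset wrapper mapping asset keys to asset objects."""

    def __init__(self, name, assets):
        self.name = name
        self._assets = dict(assets)

    def __getitem__(self, key):
        try:
            return self._assets[key]
        except KeyError:
            raise InvalidAssetKeyError(f"Asset with key `{key}` does not exist.") from None


# "<asset pointer>" is a placeholder for whatever object the client would return.
ds = Dataset("Canada Commodities Dataset", {"ca-feb2021": "<asset pointer>"})
print(ds["ca-feb2021"])
```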



In [28]:
# Let's check if there is any metadata attached to the dataset
# Metadata is any additional public information related to the dataset
# We can list the metadata attached to the dataset.
ca_commodity_data.metadata

| | name | type |
|---|---|---|
| 0 | sample_data | DataFrame |
| 1 | partner_code_mapping | Dict |
| 2 | column_names | List |
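A (name, type) listing like the one above can be generated from a plain dict of metadata values. The type label is the Python class name with its first letter upper-cased, so `dict` becomes "Dict" and `DataFrame` is unchanged. A minimal sketch; `describe_metadata` is a hypothetical helper and the values are abbreviated from this notebook.

```python
import pandas as pd


def describe_metadata(metadata):
    """Summarize a metadata dict as a (name, type) table.

    Type labels are class names with the first letter upper-cased:
    dict -> "Dict", list -> "List", DataFrame -> "DataFrame".
    """
    rows = []
    for name, value in metadata.items():
        type_name = type(value).__name__
        rows.append({"name": name, "type": type_name[:1].upper() + type_name[1:]})
    return pd.DataFrame(rows, columns=["name", "type"])


metadata = {
    "sample_data": pd.DataFrame({"Partner Code": [752, 56]}),
    "partner_code_mapping": {818: "Egypt", 826: "United Kingdom"},
    "column_names": ["Trade Flow Code", "Partner Code", "Trade Value (US$)"],
}
print(describe_metadata(metadata))
```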


In [10]:
# Let's check the partner code mapping in the metadata
# We can access a metadata entry by its name (key)
partner_mapping = ca_commodity_data.metadata["partner_code_mapping"]
partner_mapping

{818: 'Egypt',
 826: 'United Kingdom',
 156: 'China',
 440: 'Lithuania',
 703: 'Slovakia'}
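The mapping can be used to label trade rows with partner names. The sample rows shown in the next cell use partner codes (752, 56, ...) that are not among these five entries, so unmatched codes need explicit handling. A pandas sketch follows. The `trades` frame and its values are toy data, and "Unknown" is an arbitrary fallback label.

```python
import pandas as pd

partner_mapping = {818: "Egypt", 826: "United Kingdom", 156: "China", 440: "Lithuania", 703: "Slovakia"}

# Toy rows: two codes present in the mapping and one (752) that is not.
trades = pd.DataFrame({"Partner Code": [156, 826, 752], "Trade Value (US$)": [100, 250, 20]})

# Series.map yields NaN for codes missing from the dict; label those explicitly.
trades["Partner"] = trades["Partner Code"].map(partner_mapping).fillna("Unknown")
print(trades)
```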

In [22]:
# Let's take a look at the sample data
sample_data = ca_commodity_data.metadata["sample_data"]
sample_data

| | Trade Flow Code | Partner Code | Trade Value (US$) |
|---|---|---|---|
| 7191 | 1 | 752 | 20 |
| 5239 | 1 | 56 | 3571 |
| 1233 | 1 | 251 | 201246 |
| 17040 | 1 | 144 | 28139 |
| 37574 | 1 | 0 | 43080 |
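Because the metadata also publishes `column_names`, a user can check that the sample matches the advertised schema before writing any queries. Below is a small pandas sketch using the first two sample rows shown above. The check itself is an illustration, not a described Syft feature.

```python
import pandas as pd

column_names = ["Trade Flow Code", "Partner Code", "Trade Value (US$)"]
sample_data = pd.DataFrame(
    {"Trade Flow Code": [1, 1], "Partner Code": [752, 56], "Trade Value (US$)": [20, 3571]},
    index=[7191, 5239],
)

# Columns advertised in metadata but absent from the sample, and vice versa.
missing = [c for c in column_names if c not in sample_data.columns]
extra = [c for c in sample_data.columns if c not in column_names]
print("missing:", missing, "extra:", extra)
```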


#### Dummy Datasets

In [1]:
import pandas as pd
from enum import Enum
import uuid
import datetime
import json


class bcolors(Enum):
    HEADER = "\033[95m"
    OKBLUE = "\033[94m"
    OKCYAN = "\033[96m"
    OKGREEN = "\033[92m"
    WARNING = "\033[93m"
    FAIL = "\033[91m"
    ENDC = "\033[0m"
    BOLD = "\033[1m"
    UNDERLINE = "\033[4m"
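The error cells below each repeat the same f-string template around a `bcolors` code. A small helper could build these banners and optionally drop the ANSI codes for plain-text output. `format_error` is a hypothetical name. The sketch redefines only the two `bcolors` members it needs so that it runs on its own.

```python
from enum import Enum


class bcolors(Enum):
    FAIL = "\033[91m"
    ENDC = "\033[0m"


def format_error(name, message, color=True):
    """Render an error banner in the same layout as the mock outputs."""
    label = f"{bcolors.FAIL.value}{name}{bcolors.ENDC.value}" if color else name
    return f"\n    {label}:\n        {message}\n"


print(format_error("InvalidAssetKeyError", "Asset with key `random-feb2021` does not exist."))
```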

In [2]:
DT_FORMAT = "%d %b %Y, %H:%M:%S"

all_datasets = [
    {
        "Id": uuid.uuid4().hex,
        "Name": "Diabetes Dataset",
        "Tags": ["Health", "Classification", "Dicom"],
        "Assets": '''["Images"] -> Tensor; ["Labels"] -> Tensor''',
        "Description": "A large set of high-resolution retina images",
        "Domain": "California Healthcare Foundation",
        "Network": "WHO",
        "Usage": 102,
        "Added On": datetime.datetime.now().replace(month=1).strftime(DT_FORMAT)
    },
    {
        "Id": uuid.uuid4().hex,
        "Name": "Canada Commodities Dataset",
        "Tags": ["Commodities", "Canada", "Trade"],
        "Assets": '''["ca-feb2021"] -> DataFrame''',
        "Description": "Commodity Trade Dataset",
        "Domain": "Canada Domain",
        "Network": "United Nations",
        "Usage": 40,
        "Added On": datetime.datetime.now().replace(month=3, day=11).strftime(DT_FORMAT)
    },
    {
        "Id": uuid.uuid4().hex,
        "Name": "Italy Commodities Dataset",
        "Tags": ["Commodities", "Italy", "Trade"],
        "Assets": '''["it-feb2021"] -> DataFrame''',
        "Description": "Commodity Trade Dataset",
        "Domain": "Italy Domain",
        "Network": "United Nations",
        "Usage": 23,
        "Added On": datetime.datetime.now().replace(month=3).strftime(DT_FORMAT)
    },
    {
        "Id": uuid.uuid4().hex,
        "Name": "Netherlands Commodities Dataset",
        "Tags": ["Commodities", "Netherlands", "Trade"],
        "Assets": '''["ne-feb2021"] -> DataFrame''',
        "Description": "Commodity Trade Dataset",
        "Domain": "Netherland Domain",
        "Network": "United Nations",
        "Usage": 20,
        "Added On": datetime.datetime.now().replace(month=4, day=12).strftime(DT_FORMAT)
    },
    {
        "Id": uuid.uuid4().hex,
        "Name": "Pneumonia Dataset",
        "Tags": ["Health", "Pneumonia", "X-Ray"],
        "Assets": '''["X-Ray-Images"] -> Tensor; ["labels"] -> Tensor''',
        "Description": "Chest X-Ray images. All provided images are in DICOM format.",
        "Domain": "RSNA",
        "Network": "WHO",
        "Usage": 334,
        "Added On": datetime.datetime.now().replace(month=1).strftime(DT_FORMAT)
    },
]

all_datasets_df = pd.DataFrame(all_datasets)
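Because `Added On` is stored as a formatted string, sorting the listing by date requires parsing it with the same `DT_FORMAT`. String order compares the day digits first, so it is not chronological in general (for example, "01 Dec 2021" sorts before "09 Jan 2021"). Below is a pandas sketch using three of the timestamps from the listing above. It assumes an English locale for `%b`.

```python
import pandas as pd

DT_FORMAT = "%d %b %Y, %H:%M:%S"

listing = pd.DataFrame(
    {
        "Name": ["Diabetes Dataset", "Canada Commodities Dataset", "Netherlands Commodities Dataset"],
        "Added On": ["09 Jan 2021, 09:33:09", "11 Mar 2021, 09:33:09", "12 Apr 2021, 09:33:09"],
    }
)

# Parse the display strings back into timestamps so they sort chronologically.
listing["Added On (parsed)"] = pd.to_datetime(listing["Added On"], format=DT_FORMAT)
newest_first = listing.sort_values("Added On (parsed)", ascending=False)
print(newest_first["Name"].tolist())
```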

In [3]:
filtered_network_via_name = [
    {
        "Id": f"{uuid.uuid4().hex}",
        "Name": "United Nations",
        "Hosted Domains": 4,
        "Hosted Datasets": 6,
        "Description": "The UN hosts data related to the commodity and Census data.",
        "Tags": ["Commodities", "Census"],
        "Url": "https://un.openmined.org",
    },
]
filtered_network_via_name = pd.DataFrame(filtered_network_via_name)

In [4]:
canada_domain_datasets_df = all_datasets_df[
    all_datasets_df["Domain"] == "Canada Domain"
].reset_index(drop=True)

In [5]:
error_on_invalid_dataset = f"""
    {bcolors.FAIL.value}DatasetDoesNotExistException{bcolors.ENDC.value}:
        The dataset with Id `f6fc3e2b9b4a2319dc7486a705565807` doesn't exist on the domain.
"""
print(error_on_invalid_dataset)


    DatasetDoesNotExistException:
        The dataset with Id `f6fc3e2b9b4a2319dc7486a705565807` doesn't exist on the domain.



In [6]:
canada_domain_datasets_df

| | Id | Name | Tags | Assets | Description | Domain | Network | Usage | Added On |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 28736302eb9140c886a76dbe0ab13c05 | Canada Commodities Dataset | [Commodities, Canada, Trade] | ["ca-feb2021"] -> DataFrame | Commodity Trade Dataset | Canada Domain | United Nations | 40 | 11 Mar 2021, 09:33:09 |


In [7]:
error_on_invalid_index_dataset = f"""
    {bcolors.FAIL.value}IndexOutOfBoundException{bcolors.ENDC.value}:
        Index `1` doesn't exist on the domain.
"""
print(error_on_invalid_index_dataset)


    IndexOutOfBoundException:
        Index `1` doesn't exist on the domain.



In [9]:
dataset_detail = [
    {
        "Asset Key": '["ca-feb2021"]',
        "Type": "DataFrame",
        "Shape": "(40000, 7)"
    },
]
print("""
Name: Canada Commodities Dataset
Description: Commodity Trade Dataset
""")
dataset_detail_df = pd.DataFrame(dataset_detail)
dataset_detail_df


Name: Canada Commodities Dataset
Description: Commodity Trade Dataset



| | Asset Key | Type | Shape |
|---|---|---|---|
| 0 | ["ca-feb2021"] | DataFrame | (40000, 7) |


In [10]:
error_on_invalid_asset_key = f"""
    {bcolors.FAIL.value}InvalidAssetKeyError{bcolors.ENDC.value}:
        Asset with key `random-feb2021` does not exist.
"""
print(error_on_invalid_asset_key)


    InvalidAssetKeyError:
        Asset with key `random-feb2021` does not exist.



In [11]:
metadata = [
    {"name": "sample_data", "type": "DataFrame"},
    {"name": "partner_code_mapping", "type": "Dict"},
    {"name": "column_names", "type": "List"},
]
metadata = pd.DataFrame(metadata)

In [12]:
sample_data = '{"Trade Flow Code":{"7191":1,"5239":1,"1233":1,"17040":1,"37574":1},"Partner Code":{"7191":752,"5239":56,"1233":251,"17040":144,"37574":0},"Trade Value (US$)":{"7191":20,"5239":3571,"1233":201246,"17040":28139,"37574":43080}}'
sample_data = pd.DataFrame.from_dict(json.loads(sample_data))

In [13]:
column_names = ['Trade Flow Code', 'Partner Code', 'Trade Value (US$)']

partner_mapping = {
    818: "Egypt",
    826: "United Kingdom",
    156: "China",
    440: "Lithuania",
    703: "Slovakia",
}