### Select a dataset

To access a dataset, a user must first log in to the domain that hosts it.
There are two ways to log in to a domain:
- Directly, if we know the URL of the domain
- By selecting the domain through a network

The overall flow for a user to access a dataset:
- The user logs in to the domain. **[P0]**
- The user lists the datasets available in the domain. **[P0]**
- The user filters datasets by properties such as Id, tags, and title. **[P1]**
- The user selects the desired dataset by its Id. **[P0]**
- The user views the metadata related to the dataset. **[P1]**
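
The filtering step in the flow above can be sketched in plain Python. This is only an illustration of the intended behaviour, not the syft API: the `DATASETS` list mirrors a few rows of the listing in this notebook, and `filter_datasets` with its `dataset_id`, `tag`, and `title` parameters is a hypothetical helper.

```python
# Hypothetical stand-in for the dataset listing; rows copied from this notebook.
DATASETS = [
    {
        "Id": "1ff01c60d40e4d57be2548718f62b6c3",
        "Name": "Diabetes Dataset",
        "Tags": ["Health", "Classification", "Dicom"],
    },
    {
        "Id": "708565507a6847cd9132a4b9b2e3cf6f",
        "Name": "Canada Commodities Dataset",
        "Tags": ["Commodities", "Canada", "Trade"],
    },
    {
        "Id": "fb82efa2c3da4564bd8f5bd6b8bbbef7",
        "Name": "Italy Commodities Dataset",
        "Tags": ["Commodities", "Italy", "Trade"],
    },
]


def filter_datasets(datasets, dataset_id=None, tag=None, title=None):
    """Return the datasets that match every criterion that was supplied."""
    matches = []
    for dataset in datasets:
        if dataset_id is not None and dataset["Id"] != dataset_id:
            continue
        # Tags are stored as lists, so test for membership.
        if tag is not None and tag not in dataset["Tags"]:
            continue
        # Case-insensitive substring match on the dataset name.
        if title is not None and title.lower() not in dataset["Name"].lower():
            continue
        matches.append(dataset)
    return matches


trade_datasets = filter_datasets(DATASETS, tag="Trade")
canada_datasets = filter_datasets(DATASETS, tag="Trade", title="canada")
```

Criteria are combined with a logical AND, so passing both `tag` and `title` narrows the result; whether the real filter should behave this way is an assumption.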

In [None]:
import syft as sy

In [8]:
# Let's quickly list the available datasets
sy.datasets

Unnamed: 0,Id,Name,Tags,Type,Description,Domain,Network,Usage,Added On
0,1ff01c60d40e4d57be2548718f62b6c3,Diabetes Dataset,"[Health, Classification, Dicom]",<class 'torch.Tensor'>,A large set of high-resolution retina images,California Healthcare Foundation,WHO,102,Jan 13 2021
1,708565507a6847cd9132a4b9b2e3cf6f,Canada Commodities Dataset,"[Commodities, Canada, Trade]",DataFrameDatasetPointer,Commodity Trade Dataset,Canada Domain,United Nations,40,Mar 11 2021
2,fb82efa2c3da4564bd8f5bd6b8bbbef7,Italy Commodities Dataset,"[Commodities, Italy, Trade]",DataFrameDatasetPointer,Commodity Trade Dataset,Italy Domain,United Nations,23,Mar 13 2021
3,89925bd0f07b4e4e85750fd7fbdcfa42,Netherlands Commodities Dataset,"[Commodities, Netherlands, Trade]",DataFrameDatasetPointer,Commodity Trade Dataset,Netherland Domain,United Nations,20,Apr 12 2021
4,28542d7ce1c142eeb3b8abbf8e2eef38,Pneumonia Dataset,"[Health, Pneumonia, X-Ray]",<class 'torch.Tensor'>,Chest X-Ray images. All provided images are in...,RSNA,WHO,334,Jan 13 2021


In [12]:
# We want to access the `Canada Commodities Dataset` from the `Canada Domain`

# If we know the URL of the domain, we can log in to the domain directly

ca_domain_client = sy.login(
    email="sheldon@caltech.edu", password="bazinga", url="https://ca.openmined.org"
)

# Or, if the URL of the domain is unknown, we can select the domain via its network

# Let's find the network to which the domain belongs
sy.networks.filter(name="United Nations")

Unnamed: 0,Id,Name,Hosted Domains,Hosted Datasets,Description,Tags,Url
0,bfd98059c7f34b4f8be9156aa7c99201,United Nations,4,6,The UN hosts data related to the commodity and...,"[Commodities, Census]",https://un.openmined.org


In [None]:
# Let's select the network
un_network = sy.networks.filter(name="United Nations")[0]

# Let's select the `Canada Domain` from the UN network
ca_domain = un_network.domains.filter(name="Canada Domain")[0]

# Let's log in to the Canada domain
ca_domain_client = ca_domain.login(email="sheldon@caltech.edu", password="bazinga")

In [17]:
# Great, now that we have a client to the domain, 
# let's list the available datasets on the domain

ca_domain_client.datasets

Unnamed: 0,Id,Name,Tags,Type,Description,Domain,Network,Usage,Added On
1,708565507a6847cd9132a4b9b2e3cf6f,Canada Commodities Dataset,"[Commodities, Canada, Trade]",DataFrameDatasetPointer,Commodity Trade Dataset,Canada Domain,United Nations,40,Mar 11 2021


In [18]:
# Let's select the commodity dataset.
# We can select a dataset either by its index or by its `Id`
ca_commodity_data_ptr = ca_domain_client.datasets[0]

# Or via `Id`
ca_commodity_data_ptr = ca_domain_client.datasets["708565507a6847cd9132a4b9b2e3cf6f"]

In [34]:
# If the user enters an invalid Id
ca_commodity_data_ptr = ca_domain_client.datasets["f6fc3e2b9b4a2319dc7486a705565807"]


    DatasetDoesNotExistException:
        The dataset with Id `f6fc3e2b9b4a2319dc7486a705565807` does not exist on the domain.
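
The lookup behaviour shown above (selection by index or by `Id`, and an error for an unknown `Id`) could be modelled roughly as follows. This is a minimal sketch under stated assumptions: `DatasetCollection` is a hypothetical container, and the exception class here is a local stand-in that only borrows the name from the error output; neither is taken from the syft codebase.

```python
class DatasetDoesNotExistException(LookupError):
    """Raised when no dataset with the requested Id exists (hypothetical)."""


class DatasetCollection:
    """Hypothetical container that supports lookup by position or by Id."""

    def __init__(self, datasets):
        self._datasets = list(datasets)

    def __getitem__(self, key):
        # Integer keys are treated as positions in the listing.
        if isinstance(key, int):
            return self._datasets[key]
        # Any other key is treated as a dataset Id.
        for dataset in self._datasets:
            if dataset["Id"] == key:
                return dataset
        raise DatasetDoesNotExistException(
            f"The dataset with Id `{key}` does not exist on the domain."
        )


datasets = DatasetCollection(
    [{"Id": "708565507a6847cd9132a4b9b2e3cf6f", "Name": "Canada Commodities Dataset"}]
)

by_index = datasets[0]
by_id = datasets["708565507a6847cd9132a4b9b2e3cf6f"]

try:
    datasets["f6fc3e2b9b4a2319dc7486a705565807"]
except DatasetDoesNotExistException as exc:
    error_message = str(exc)
```

Deriving the exception from `LookupError` keeps it catchable alongside `KeyError` and `IndexError`, which is a design choice of this sketch rather than something the notebook specifies.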



In [26]:
# Great! We now have a pointer to the dataset

print("Details of the dataset")
print(f"Name: {ca_commodity_data_ptr.name}")
print(f"Tags: {ca_commodity_data_ptr.tags}")
print(f"Type: {ca_commodity_data_ptr.type}")
print(f"Description: {ca_commodity_data_ptr.description}")

Details of the dataset
Name: Canada Commodities Dataset
Tags: ['Commodities', 'Canada', 'Trade']
Type: DataFrameDatasetPointer
Description: Commodity Trade Dataset


In [28]:
# Let's check the public shape of the dataset
ca_commodity_data_ptr.public_shape

(10000, 8)


In [None]:
# TODO: Need to confirm what kind of metadata will be uploaded by the domain
ca_commodity_data_ptr.metadata

#### Dummy Datasets

In [2]:
import pandas as pd
from enum import Enum
import uuid
import torch
import datetime


class bcolors(Enum):
    HEADER = "\033[95m"
    OKBLUE = "\033[94m"
    OKCYAN = "\033[96m"
    OKGREEN = "\033[92m"
    WARNING = "\033[93m"
    FAIL = "\033[91m"
    ENDC = "\033[0m"
    BOLD = "\033[1m"
    UNDERLINE = "\033[4m"

In [3]:
all_datasets = [
    {
        "Id": uuid.uuid4().hex,
        "Name": "Diabetes Dataset",
        "Tags": ["Health", "Classification", "Dicom"],
        "Type": torch.Tensor,
        "Description": "A large set of high-resolution retina images",
        "Domain": "California Healthcare Foundation",
        "Network": "WHO",
        "Usage": 102,
        "Added On": datetime.datetime.now().replace(month=1).strftime("%b %d %Y")
    },
    {
        "Id": uuid.uuid4().hex,
        "Name": "Canada Commodities Dataset",
        "Tags": ["Commodities", "Canada", "Trade"],
        "Type": "DataFrameDatasetPointer",
        "Description": "Commodity Trade Dataset",
        "Domain": "Canada Domain",
        "Network": "United Nations",
        "Usage": 40,
        "Added On": datetime.datetime.now().replace(month=3, day=11).strftime("%b %d %Y")
    },
    {
        "Id": uuid.uuid4().hex,
        "Name": "Italy Commodities Dataset",
        "Tags": ["Commodities", "Italy", "Trade"],
        "Type": "DataFrameDatasetPointer",
        "Description": "Commodity Trade Dataset",
        "Domain": "Italy Domain",
        "Network": "United Nations",
        "Usage": 23,
        "Added On": datetime.datetime.now().replace(month=3).strftime("%b %d %Y")
    },
    {
        "Id": uuid.uuid4().hex,
        "Name": "Netherlands Commodities Dataset",
        "Tags": ["Commodities", "Netherlands", "Trade"],
        "Type": "DataFrameDatasetPointer",
        "Description": "Commodity Trade Dataset",
        "Domain": "Netherland Domain",
        "Network": "United Nations",
        "Usage": 20,
        "Added On": datetime.datetime.now().replace(month=4, day=12).strftime("%b %d %Y")
    },
    {
        "Id": uuid.uuid4().hex,
        "Name": "Pneumonia Dataset",
        "Tags": ["Health", "Pneumonia", "X-Ray"],
        "Type": torch.Tensor,
        "Description": "Chest X-Ray images. All provided images are in DICOM format.",
        "Domain": "RSNA",
        "Network": "WHO",
        "Usage": 334,
        "Added On": datetime.datetime.now().replace(month=1).strftime("%b %d %Y")
    },
]

all_datasets_df = pd.DataFrame(all_datasets)

In [11]:
filtered_network_via_name = [
    {
        "Id": uuid.uuid4().hex,
        "Name": "United Nations",
        "Hosted Domains": 4,
        "Hosted Datasets": 6,
        "Description": "The UN hosts data related to the commodity and Census data.",
        "Tags": ["Commodities", "Census"],
        "Url": "https://un.openmined.org",
    },
]
filtered_network_via_name = pd.DataFrame(filtered_network_via_name)

In [16]:
canada_domain_datasets_df = all_datasets_df[all_datasets_df["Domain"] == "Canada Domain"]

In [33]:
error_on_invalid_dataset = f"""
    {bcolors.FAIL.value}DatasetDoesNotExistException{bcolors.ENDC.value}:
        The dataset with Id `f6fc3e2b9b4a2319dc7486a705565807` does not exist on the domain.
"""
print(error_on_invalid_dataset)


    DatasetDoesNotExistException:
        The dataset with Id `f6fc3e2b9b4a2319dc7486a705565807` does not exist on the domain.

