### Searching for a Dataset

A Dataset object represents a dataset uploaded to a domain. When a user searches for datasets, the following properties of each dataset are visible:

- Id (unique Id of the dataset)
- Domain (Name of the domain)
- Network (Name of the network)
- Type (Type of the dataset, e.g. DataFrame, Tensor, NumPy array)
- Name (Name of the dataset)
- Tags (List of tags)
- Description (Description of the dataset)
- Usage (Number of users who have used this dataset) **[P1]**
- Added On (Date on which the dataset was added to the domain) **[P1]**
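
The visible properties above could be held in a simple record type. The sketch below is a hypothetical `DatasetInfo` dataclass, not PySyft's actual Dataset class. The field names and types are assumptions, and the two fields marked **[P1]** default to `None` on the assumption that they may not be populated yet.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DatasetInfo:
    """Hypothetical record of the properties shown when searching for a dataset."""

    id: str                          # unique Id of the dataset
    domain: str                      # name of the domain hosting the dataset
    network: str                     # name of the network the domain belongs to
    data_type: str                   # e.g. "DataFrame", "Tensor", "NumPy array"
    name: str
    tags: List[str] = field(default_factory=list)
    description: str = ""
    usage: Optional[int] = None      # [P1] number of users who have used the dataset
    added_on: Optional[date] = None  # [P1] date the dataset was added to the domain


info = DatasetInfo(
    id="8ef9ff3af71f40a4b2b7b503c3ada68a",
    domain="California Healthcare Foundation",
    network="WHO",
    data_type="Tensor",
    name="Diabetes Dataset",
    tags=["Health", "Classification", "Dicom"],
    description="A large set of high-resolution retina images",
    usage=102,
    added_on=date(2021, 1, 13),
)
print(info.name, info.usage)  # Diabetes Dataset 102
```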

Users should be able to perform the following operations when searching for hosted datasets:
- List all available datasets
- Filter datasets by their properties. The properties that support filtering are:
  - Id
  - Domain
  - Network
  - Name
  - Tags
- Group datasets by Domain or Network.
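
Grouping is not demonstrated in the cells of this section. A minimal sketch of the intended behaviour, assuming each dataset is a plain dict keyed by the property names shown in the search results, buckets dataset names by the value of one property. `group_datasets` is a hypothetical helper, not part of `sy.datasets`.

```python
from collections import defaultdict
from typing import Dict, List


def group_datasets(datasets: List[dict], by: str) -> Dict[str, List[str]]:
    """Hypothetical helper: map each value of `by` ("Domain" or "Network")
    to the names of the datasets that share it, in first-seen order."""
    if by not in ("Domain", "Network"):
        raise ValueError("datasets can only be grouped by 'Domain' or 'Network'")
    groups: Dict[str, List[str]] = defaultdict(list)
    for dataset in datasets:
        groups[dataset[by]].append(dataset["Name"])
    return dict(groups)


datasets = [
    {"Name": "Diabetes Dataset", "Domain": "California Healthcare Foundation", "Network": "WHO"},
    {"Name": "Canada Commodities Dataset", "Domain": "Canada Domain", "Network": "United Nations"},
    {"Name": "Italy Commodities Dataset", "Domain": "Italy Domain", "Network": "United Nations"},
    {"Name": "Pneumonia Dataset", "Domain": "RSNA", "Network": "WHO"},
]

print(group_datasets(datasets, by="Network"))
# {'WHO': ['Diabetes Dataset', 'Pneumonia Dataset'], 'United Nations': ['Canada Commodities Dataset', 'Italy Commodities Dataset']}
```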

In [26]:
import syft as sy

# Let's list all the available datasets
sy.datasets

| | Id | Name | Tags | Type | Description | Domain | Network | Usage | Added On |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 8ef9ff3af71f40a4b2b7b503c3ada68a | Diabetes Dataset | [Health, Classification, Dicom] | `<class 'torch.Tensor'>` | A large set of high-resolution retina images | California Healthcare Foundation | WHO | 102 | Jan 13 2021 |
| 1 | e1226a91a0ed4a09a521975e28b5f04a | Canada Commodities Dataset | [Commodities, Canada, Trade] | `DataFrameDatasetPointer` | Commodity Trade Dataset | Canada Domain | United Nations | 40 | Mar 11 2021 |
| 2 | 4a74b45178b04352ad752f58710c9ded | Italy Commodities Dataset | [Commodities, Italy, Trade] | `DataFrameDatasetPointer` | Commodity Trade Dataset | Italy Domain | United Nations | 23 | Mar 13 2021 |
| 3 | 3723d1d5171d44859e5e3a187b90f6c6 | Netherlands Commodities Dataset | [Commodities, Netherlands, Trade] | `DataFrameDatasetPointer` | Commodity Trade Dataset | Netherland Domain | United Nations | 20 | Apr 12 2021 |
| 4 | 4c73506468b84a39a484ce96a786964c | Pneumonia Dataset | [Health, Pneumonia, X-Ray] | `<class 'torch.Tensor'>` | Chest X-Ray images. All provided images are in... | RSNA | WHO | 334 | Jan 13 2021 |


If the list of datasets is large, the user can narrow it down with the `.filter` operation.

A user can filter results with three kinds of lookup:
- `filter(property=value)`: exact match (equivalent to `property = value` in SQL)
- `filter(property__contains=value)`: case-insensitive substring match (equivalent to `property ILIKE '%value%'` in PostgreSQL)
- `filter(property__in=[value1, value2, value3])`: membership in a given iterable, usually a list or tuple (equivalent to `property IN (...)` in SQL)
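
The double-underscore `property__lookup=value` naming mirrors Django's ORM field lookups, although Django's own `contains` is case-sensitive (its case-insensitive form is `icontains`), whereas here `__contains` is specified as case-insensitive. A minimal pure-Python sketch of how one dataset record could be tested against such lookups is below. `matches` and the lower-case property keys are hypothetical, and two behaviours are assumptions: all lookups must hold (logical AND), and `__contains` on a list-valued property such as Tags matches if any item does.

```python
from typing import Any, Dict


def _contains(actual: Any, expected: str) -> bool:
    """Case-insensitive substring test; list values (e.g. tags) match if any item does."""
    if isinstance(actual, (list, tuple)):
        return any(_contains(item, expected) for item in actual)
    return expected.lower() in str(actual).lower()


def matches(record: Dict[str, Any], **lookups: Any) -> bool:
    """Hypothetical evaluator for filter(prop=..., prop__contains=..., prop__in=...)."""
    for key, expected in lookups.items():
        prop, _, op = key.partition("__")  # "name__contains" -> ("name", "__", "contains")
        actual = record[prop]
        if op == "":                       # exact match, like `=` in SQL
            ok = actual == expected
        elif op == "contains":             # like ILIKE '%value%'
            ok = _contains(actual, expected)
        elif op == "in":                   # like IN (...)
            ok = actual in expected
        else:
            raise ValueError(f"unsupported lookup: {op!r}")
        if not ok:
            return False                   # every lookup must hold (logical AND)
    return True


record = {"name": "Canada Commodities Dataset", "tags": ["Commodities", "Canada", "Trade"]}
print(matches(record, name__contains="commodities"))  # True
print(matches(record, name="Diabetes Dataset"))       # False
print(matches(record, tags__contains="trade"))        # True
```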

In [27]:
# For example, a user wants to find the dataset named `Diabetes Dataset`
sy.datasets.filter(name="Diabetes Dataset")

| | Id | Name | Tags | Type | Description | Domain | Network | Usage | Added On |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 8ef9ff3af71f40a4b2b7b503c3ada68a | Diabetes Dataset | [Health, Classification, Dicom] | `<class 'torch.Tensor'>` | A large set of high-resolution retina images | California Healthcare Foundation | WHO | 102 | Jan 13 2021 |


In [29]:
# Now suppose the user wants every dataset with "Commodities" in its name
sy.datasets.filter(name__contains="Commodities")

| | Id | Name | Tags | Type | Description | Domain | Network | Usage | Added On |
|---|---|---|---|---|---|---|---|---|---|
| 1 | e1226a91a0ed4a09a521975e28b5f04a | Canada Commodities Dataset | [Commodities, Canada, Trade] | `DataFrameDatasetPointer` | Commodity Trade Dataset | Canada Domain | United Nations | 40 | Mar 11 2021 |
| 2 | 4a74b45178b04352ad752f58710c9ded | Italy Commodities Dataset | [Commodities, Italy, Trade] | `DataFrameDatasetPointer` | Commodity Trade Dataset | Italy Domain | United Nations | 23 | Mar 13 2021 |
| 3 | 3723d1d5171d44859e5e3a187b90f6c6 | Netherlands Commodities Dataset | [Commodities, Netherlands, Trade] | `DataFrameDatasetPointer` | Commodity Trade Dataset | Netherland Domain | United Nations | 20 | Apr 12 2021 |


In [30]:
# Similarly, to keep only the datasets with the given names

names_list = ["Diabetes Dataset", "Canada Commodities Dataset"]

sy.datasets.filter(name__in=names_list)

| | Id | Name | Tags | Type | Description | Domain | Network | Usage | Added On |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 8ef9ff3af71f40a4b2b7b503c3ada68a | Diabetes Dataset | [Health, Classification, Dicom] | `<class 'torch.Tensor'>` | A large set of high-resolution retina images | California Healthcare Foundation | WHO | 102 | Jan 13 2021 |
| 1 | e1226a91a0ed4a09a521975e28b5f04a | Canada Commodities Dataset | [Commodities, Canada, Trade] | `DataFrameDatasetPointer` | Commodity Trade Dataset | Canada Domain | United Nations | 40 | Mar 11 2021 |


Similarly, a user can perform the filter operations described above on the following properties:
- Id
- Name
- Domain
- Network
- Tags

In [33]:
# If a user tries to access a dataset without being logged in to its domain
sy.datasets["8ef9ff3af71f40a4b2b7b503c3ada68a"]


    AccessDeniedException:
        You need to be logged in to the domain to access this dataset.
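
The cell above only prints a mocked error. A minimal sketch of the behaviour it describes is below, assuming a hypothetical `DatasetCatalog` that records which domains the user has logged in to and raises a hypothetical `AccessDeniedException` from `__getitem__` otherwise. Neither class is part of PySyft's documented API, and the returned string stands in for a real dataset pointer.

```python
from typing import Dict, Set


class AccessDeniedException(Exception):
    """Hypothetical error raised when the user is not logged in to a dataset's domain."""


class DatasetCatalog:
    """Hypothetical catalog: search metadata is public, item access requires a login."""

    def __init__(self, domains_by_id: Dict[str, str]) -> None:
        self._domains_by_id = domains_by_id  # dataset Id -> name of hosting domain
        self._logged_in: Set[str] = set()    # domains the user has logged in to

    def login(self, domain: str) -> None:
        # Stand-in for a real authentication step against the domain.
        self._logged_in.add(domain)

    def __getitem__(self, dataset_id: str) -> str:
        domain = self._domains_by_id[dataset_id]  # unknown Ids raise KeyError
        if domain not in self._logged_in:
            raise AccessDeniedException(
                "You need to be logged in to the domain to access this dataset."
            )
        return f"<pointer to {dataset_id} on {domain}>"  # placeholder for a real pointer


catalog = DatasetCatalog({"8ef9ff3af71f40a4b2b7b503c3ada68a": "California Healthcare Foundation"})
try:
    catalog["8ef9ff3af71f40a4b2b7b503c3ada68a"]
except AccessDeniedException as err:
    print(f"AccessDeniedException: {err}")

catalog.login("California Healthcare Foundation")
print(catalog["8ef9ff3af71f40a4b2b7b503c3ada68a"])
```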



### Dummy Data

In [4]:
import pandas as pd
from enum import Enum
import uuid
import torch
import datetime


# ANSI terminal escape codes, used below to colour the mocked error message
class bcolors(Enum):
    HEADER = "\033[95m"
    OKBLUE = "\033[94m"
    OKCYAN = "\033[96m"
    OKGREEN = "\033[92m"
    WARNING = "\033[93m"
    FAIL = "\033[91m"
    ENDC = "\033[0m"
    BOLD = "\033[1m"
    UNDERLINE = "\033[4m"

In [24]:
all_datasets = [
    {
        "Id": uuid.uuid4().hex,
        "Name": "Diabetes Dataset",
        "Tags": ["Health", "Classification", "Dicom"],
        "Type": torch.Tensor,
        "Description": "A large set of high-resolution retina images",
        "Domain": "California Healthcare Foundation",
        "Network": "WHO",
        "Usage": 102,
        "Added On": datetime.date(2021, 1, 13).strftime("%b %d %Y")
    },
    {
        "Id": uuid.uuid4().hex,
        "Name": "Canada Commodities Dataset",
        "Tags": ["Commodities", "Canada", "Trade"],
        "Type": "DataFrameDatasetPointer",
        "Description": "Commodity Trade Dataset",
        "Domain": "Canada Domain",
        "Network": "United Nations",
        "Usage": 40,
        "Added On": datetime.date(2021, 3, 11).strftime("%b %d %Y")
    },
    {
        "Id": uuid.uuid4().hex,
        "Name": "Italy Commodities Dataset",
        "Tags": ["Commodities", "Italy", "Trade"],
        "Type": "DataFrameDatasetPointer",
        "Description": "Commodity Trade Dataset",
        "Domain": "Italy Domain",
        "Network": "United Nations",
        "Usage": 23,
        "Added On": datetime.date(2021, 3, 13).strftime("%b %d %Y")
    },
    {
        "Id": uuid.uuid4().hex,
        "Name": "Netherlands Commodities Dataset",
        "Tags": ["Commodities", "Netherlands", "Trade"],
        "Type": "DataFrameDatasetPointer",
        "Description": "Commodity Trade Dataset",
        "Domain": "Netherland Domain",
        "Network": "United Nations",
        "Usage": 20,
        "Added On": datetime.date(2021, 4, 12).strftime("%b %d %Y")
    },
    {
        "Id": uuid.uuid4().hex,
        "Name": "Pneumonia Dataset",
        "Tags": ["Health", "Pneumonia", "X-Ray"],
        "Type": torch.Tensor,
        "Description": "Chest X-Ray images. All provided images are in DICOM format.",
        "Domain": "RSNA",
        "Network": "WHO",
        "Usage": 334,
        "Added On": datetime.date(2021, 1, 13).strftime("%b %d %Y")
    },
]

all_datasets_df = pd.DataFrame(all_datasets)
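
The `sy.datasets.filter(...)` outputs shown earlier are rendered from this dummy DataFrame. One way such a mock could translate the three lookups into pandas boolean masks is sketched below. `filter_df` is hypothetical, and mapping a lower-case keyword (`name`) to its column label (`Name`) with `str.title()` is an assumption that only holds for the single-word columns used here.

```python
import pandas as pd


def _cell_contains(cell, needle: str) -> bool:
    """Case-insensitive substring test; list cells (e.g. Tags) match if any item does."""
    if isinstance(cell, list):
        return any(needle in str(item).lower() for item in cell)
    return needle in str(cell).lower()


def filter_df(df: pd.DataFrame, **lookups) -> pd.DataFrame:
    """Hypothetical pandas-backed version of sy.datasets.filter(...); lookups are ANDed."""
    mask = pd.Series(True, index=df.index)
    for key, value in lookups.items():
        prop, _, op = key.partition("__")
        column = df[prop.title()]  # "name" -> "Name", "id" -> "Id" (assumed mapping)
        if op == "":               # exact match (scalar columns only)
            mask &= column == value
        elif op == "contains":     # case-insensitive substring match
            mask &= column.apply(_cell_contains, args=(value.lower(),)).astype(bool)
        elif op == "in":           # membership in a list or tuple
            mask &= column.isin(value)
        else:
            raise ValueError(f"unsupported lookup: {op!r}")
    return df[mask]


df = pd.DataFrame([
    {"Id": "a1", "Name": "Diabetes Dataset", "Tags": ["Health", "Dicom"], "Network": "WHO"},
    {"Id": "b2", "Name": "Canada Commodities Dataset", "Tags": ["Commodities", "Trade"], "Network": "United Nations"},
    {"Id": "c3", "Name": "Italy Commodities Dataset", "Tags": ["Commodities", "Trade"], "Network": "United Nations"},
])
print(filter_df(df, name__contains="commodities")["Name"].tolist())
# ['Canada Commodities Dataset', 'Italy Commodities Dataset']
print(filter_df(df, tags__contains="health", network="WHO")["Name"].tolist())
# ['Diabetes Dataset']
```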

In [31]:
error_on_access_dataset = f"""
    {bcolors.FAIL.value}AccessDeniedException{bcolors.ENDC.value}:
        You need to be logged in to the domain to access this dataset.
"""

print(error_on_access_dataset)


    AccessDeniedException:
        You need to be logged in to the domain to access this dataset.

