## Intro

These instructions are intended for data owners and data managers. This tutorial covers annotating data and uploading it to the domain node.

To continue, you should have received instructions from your IT specialist on how to access the domain node UI (URL, port, credentials, etc.). If not, check with the designated IT contact.

## Logging into the domain

In [None]:
import syft as sy

In [None]:
# the login credentials should have been provided by the IT specialist

try:
    domain_client = sy.login(
        port=8081,
        email="info@openmined.org",
        password="changethis",
    )
except Exception as e:
    # include the underlying error so the failure can be diagnosed
    print(f"Unable to login ({e}). Please check your domain is up with `!hagrid check localhost:8081`")
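
Hard-coded credentials in a notebook are easy to leak when the notebook is shared. As a minimal sketch, the login settings could instead be read from environment variables. The variable names `DOMAIN_PORT`, `DOMAIN_EMAIL` and `DOMAIN_PASSWORD` and the helper `load_credentials` are hypothetical and not part of PySyft.

```python
import os


def load_credentials(env=os.environ):
    """Collect login settings from environment variables (hypothetical names)."""
    required = ("DOMAIN_EMAIL", "DOMAIN_PASSWORD")
    missing = [key for key in required if key not in env]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        # fall back to the port used in this tutorial if none is set
        "port": int(env.get("DOMAIN_PORT", "8081")),
        "email": env["DOMAIN_EMAIL"],
        "password": env["DOMAIN_PASSWORD"],
    }
```

The resulting dictionary uses the same keyword names as the login call above, so it could be passed along as `sy.login(**load_credentials())`.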

## Prepare your dataset

Before uploading the dataset, make sure it contains the necessary data and in the right format.

PySyft supports pandas `DataFrame` objects, so make sure your dataset can be parsed as one using the [pandas](https://pandas.pydata.org) library.

For example, we will use a simple dataset consisting of n=4 data subjects, their age and their hourly income.

In [1]:
import pandas as pd

data = {
    "ID": ["011", "015", "022", "034"],
    "Age": [40, 39, 9, 8],
    "Hourly Income": [20, 25, 32, 18],
}

dataset = pd.DataFrame(data)
print(dataset.head())

    ID  Age  Hourly Income
0  011   40             20
1  015   39             25
2  022    9             32
3  034    8             18
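
If your data lives in a CSV file rather than in a Python dictionary, note that pandas would by default parse an identifier such as `011` as the integer `11`. The sketch below assumes a CSV with the same three columns as the example (the inline `csv_text` is a hypothetical stand-in for a file on disk) and keeps the `ID` column as text.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """ID,Age,Hourly Income
011,40,20
015,39,25
022,9,32
034,8,18
"""

# dtype={"ID": str} keeps the leading zeros of the identifiers
dataset_from_csv = pd.read_csv(io.StringIO(csv_text), dtype={"ID": str})

# Inspect the column types and count missing values before annotating
print(dataset_from_csv.dtypes)
print("missing values:", int(dataset_from_csv.isna().sum().sum()))
```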


## Annotate data

After preparing the dataset in the right format, the next step is to annotate it with **privacy-specific metadata**, which allows the PySyft library to protect the data subjects and to adjust how much visibility different data scientists have into any one of them.

Each feature needs an appropriate minimum and maximum value (`lower_bound` and `upper_bound`), which together represent the theoretical range of values that could be learned about that feature.
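
Because the bounds describe a theoretical range, every observed value should fall inside them. A quick sanity check before annotating might look like the sketch below; `check_bounds` is a hypothetical helper, not a PySyft function, and the values are the ones from the example dataset.

```python
def check_bounds(values, lower_bound, upper_bound):
    """Return the values that fall outside [lower_bound, upper_bound]."""
    if lower_bound > upper_bound:
        raise ValueError("lower_bound must not exceed upper_bound")
    return [v for v in values if not (lower_bound <= v <= upper_bound)]


ages = [40, 39, 9, 8]
hourly_income = [20, 25, 32, 18]

# An empty list means every observed value lies within the declared range
print(check_bounds(ages, lower_bound=0, upper_bound=120))
print(check_bounds(hourly_income, lower_bound=10, upper_bound=500))
```

If the helper returns any values, either the data contains errors or the declared range is too narrow for the feature.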

**If your project has a training set, a validation set and a test set, you must annotate each of them with metadata as described.**

In [None]:
# extracting the unique identifier for each data subject
data_subjects = sy.DataSubjectArray.from_objs(dataset["ID"])

# adding metadata for feature 'age'
age_data = sy.Tensor(dataset["Age"]).annotate_with_dp_metadata(
   lower_bound=0, upper_bound=120, data_subject=data_subjects
)

# adding metadata for feature 'Hourly Income'
hourly_income_data = sy.Tensor(dataset["Hourly Income"]).annotate_with_dp_metadata(
   lower_bound=10, upper_bound=500, data_subject=data_subjects
)

# ...this needs to be done for every feature, and for every dataset

## Upload the dataset

Once it's annotated, the dataset can be uploaded to the domain server to be used by data scientists.

Add details like `name` and `description` so that data scientists can more easily come across your dataset.

Your uploaded dataset consists of the annotated tensors; it is not the initial dataset.

In [None]:
domain_client.load_dataset(
   name="Age_Income_Dataset",
   assets={
      "Age_Data": age_data, # annotated age data
      "Hourly_Income": hourly_income_data # annotated income data
   },
   description="Our dataset contains..."
)

If your project has multiple datasets, you can upload them separately using the `load_dataset` function above. For example, if your project has `testing`, `training` and `validation` datasets, each of them can be uploaded separately, with its own assets, name and description.
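
Uploading several splits can be expressed as a loop over `load_dataset` calls. The sketch below is an illustration only: `upload_splits` is a hypothetical helper, `FakeClient` is a stand-in that records calls so the code runs without a domain node, and the plain lists are placeholders for the annotated tensors of each split.

```python
def upload_splits(client, splits, base_name, description):
    """Upload each split as its own dataset via client.load_dataset."""
    uploaded = []
    for split_name, assets in splits.items():
        name = f"{base_name}_{split_name}"
        client.load_dataset(
            name=name,
            assets=assets,
            description=f"{description} ({split_name} split)",
        )
        uploaded.append(name)
    return uploaded


class FakeClient:
    """Stand-in for domain_client; it only records what would be uploaded."""

    def __init__(self):
        self.calls = []

    def load_dataset(self, name, assets, description):
        self.calls.append((name, sorted(assets), description))


# Placeholder lists; in practice each asset is an annotated tensor
splits = {
    "training": {"Age_Data": [40, 39], "Hourly_Income": [20, 25]},
    "validation": {"Age_Data": [9], "Hourly_Income": [32]},
    "testing": {"Age_Data": [8], "Hourly_Income": [18]},
}

client = FakeClient()
names = upload_splits(client, splits, "Age_Income_Dataset", "Our dataset contains...")
print(names)
```

With a real domain, `domain_client` would take the place of `FakeClient()`, and each split would carry the tensors annotated for that split.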

## Closing thoughts

Congrats on uploading the dataset! Here's what you should have achieved in this tutorial:\
✅ transformed the dataset to have the right format, if needed\
✅ annotated the dataset\
✅ uploaded the dataset to the domain node