<h2><center>Welcome to HAgrid Quickstart</center></h2>
<p><center>Let's quickly understand the steps that we are going to perform now:</center></p>

<img src="./img/header.png" style="width: 100%; " />

### 1.  Deploying a Domain Server:
    We will deploy a domain server. It allows the Data Owner to manage the data and control the privacy guarantees of the subjects under study. It also manages the remote study of the data by a Data Scientist and acts as a gatekeeper for the Data Scientist's access to the data for computing and experimenting with results.
    
### 2. Uploading Private Dataset:
    As Data Owners, we now upload our private data to our domain server so that Data Scientists can perform remote data science on it.

### 3. Joining a Network:
    We now join a network for easy administration of the legal / technical details of our project. It provides services to a group of Data Owners and Data Scientists, such as dataset search and bulk project approval (legal / technical) to participate in a project. It also acts as a bridge between its members (Domains) and their subscribers (Data Scientists) and can provide access to a collection of domains at once.
  
### 4. Creating a User Account for a Data Scientist
    We now create an account for our data scientist. The credentials can then be shared with them so that they can perform computations or answer a specific question using one or more data owners' datasets without copying the private data.

### 5. Log in as the Data Scientist
    We now log in as the user created in the previous step to start performing Remote Data Science.

### 6. Perform Remote Data Science
    We now finally perform remote data science without copying the data.

Before anything else, let's make sure we have the correct version of Syft installed.
Run the cell below:

In [None]:
! pip show syft
! echo "\n✅ Step Complete\n"

To verify that Syft is set up, look for the "Version" field in the output above. If it matches the version you expected, or is >=0.7, you should be good to go!
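
The same check can also be done from Python instead of reading the `pip show` output by eye. The following is a minimal sketch, not part of the original notebook: `parse_version`, `version_at_least` and `check_syft` are hypothetical helper names, and the parsing only handles simple dotted numeric versions such as `0.7.0` (it is not a full version parser).

```python
import re
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn a dotted version such as '0.7.0' into a tuple of ints.

    Parsing stops at the first piece that carries a non-numeric suffix
    (for example the '0b1' in '0.7.0b1'), keeping only its leading digits.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.group() != piece:
            break
    return tuple(parts)


def version_at_least(installed: str, minimum: str) -> bool:
    """Return True if `installed` is at least `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so (0, 7) compares equal to (0, 7, 0).
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_syft(minimum: str = "0.7") -> str:
    """Describe whether the installed syft package meets the minimum version."""
    try:
        installed = metadata.version("syft")
    except metadata.PackageNotFoundError:
        return "syft is not installed"
    if version_at_least(installed, minimum):
        return f"syft {installed} is recent enough"
    return f"syft {installed} is older than {minimum}"


print(check_syft())
```

The minimum of `0.7` mirrors the threshold mentioned above; adjust it if your project expects a different version.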

### Time to deploy a Domain Server

<img src="./img/deploy_domain.jpg" style="width: 100%; margin:0;" />

Set DOMAIN_NAME to a name of your choice and run the cell below:

In [None]:
DOMAIN_NAME = "My Institution Name"
! hagrid launch {DOMAIN_NAME} to docker:80 --tag=latest --tail=false

Voilà, hopefully you did not run into any errors. Let's confirm that by running the cell below:

In [None]:
! hagrid check --wait --silent

! echo "\n✅ Step Complete\n"

Now it's time to upload a private dataset to our domain

<img src="./img/data.jpg" style="width: 100%; margin:0;" />

We now import Syft and the helper functions from utils.py

In [None]:
import syft as sy
from utils import *
! echo "\n✅ Syft is imported\n"

We now log in to the domain we created in the previous step with the default credentials

In [None]:
domain_client = sy.login(
    url=auto_detect_domain_host_ip(),
    email="info@openmined.org",
    password="changethis"
)
! echo "\n✅ Logged in to the domain\n"


We now fetch the dataset we want to upload to the Domain server for our Data Scientists

In [None]:
# edit MY_DATASET_URL then run this cell

MY_DATASET_URL = ""

dataset = download_dataset(MY_DATASET_URL)
# see footnotes for information about the dataset
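
Because `MY_DATASET_URL` starts out empty, it can help to check it before calling `download_dataset`. Below is a minimal sketch using only the standard library. `looks_like_url` is a hypothetical helper, not part of `utils.py`, and it only checks the shape of the string, not whether the file actually exists at that address.

```python
from urllib.parse import urlparse


def looks_like_url(url: str) -> bool:
    """Return True if `url` has an http(s) scheme and a host part."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Illustrative values only; replace with your own dataset URL.
for candidate in ["", "https://example.com/dataset.pkl"]:
    if looks_like_url(candidate):
        print(f"OK to download: {candidate}")
    else:
        print(f"Please set a valid URL (got {candidate!r})")
```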

We look at the first 5 entries to confirm our dataset has been fetched correctly

In [None]:
dataset.head()

Now we pre-process our input dataset

In [None]:
# run this cell
train, val, test = split_and_preprocess_dataset(data=dataset)

We now annotate the train data for Automatic Differential Privacy

In [None]:
# run this cell
data_subjects = DataSubjectList.from_series(train["patient_ids"])
train_image_data = sy.Tensor(train["images"]).annotated_with_dp_metadata(
    min_val=0, max_val=255, data_subjects=data_subjects
)
train_label_data = sy.Tensor(train["labels"]).annotated_with_dp_metadata(
    min_val=0, max_val=1, data_subjects=data_subjects
)

We now annotate the val data for Automatic Differential Privacy

In [None]:
data_subjects = DataSubjectList.from_series(val["patient_ids"])
val_image_data = sy.Tensor(val["images"]).annotated_with_dp_metadata(
    min_val=0, max_val=255, data_subjects=data_subjects
)
val_label_data = sy.Tensor(val["labels"]).annotated_with_dp_metadata(
    min_val=0, max_val=1, data_subjects=data_subjects
)

We now annotate the test data for Automatic Differential Privacy

In [None]:
data_subjects = DataSubjectList.from_series(test["patient_ids"])
test_image_data = sy.Tensor(test["images"]).annotated_with_dp_metadata(
    min_val=0, max_val=255, data_subjects=data_subjects
)
test_label_data = sy.Tensor(test["labels"]).annotated_with_dp_metadata(
    min_val=0, max_val=1, data_subjects=data_subjects
)
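
The annotations above declare `min_val` and `max_val` for each asset (0 to 255 for images, 0 to 1 for labels). Assuming these are meant to bound every value in the data, a quick sanity check before annotating can catch a mismatch early. The sketch below is not part of Syft or `utils.py`: `flatten`, `observed_range` and `within_bounds` are hypothetical helpers, and plain nested lists stand in for the real image and label data.

```python
def flatten(values):
    """Yield scalars from arbitrarily nested lists or tuples."""
    for value in values:
        if isinstance(value, (list, tuple)):
            yield from flatten(value)
        else:
            yield value


def observed_range(values):
    """Return (smallest, largest) scalar found in `values`."""
    flat = list(flatten(values))
    return min(flat), max(flat)


def within_bounds(values, min_val, max_val) -> bool:
    """Return True if every scalar lies in [min_val, max_val]."""
    low, high = observed_range(values)
    return min_val <= low and high <= max_val


# Toy stand-ins for image patches and labels (assumed shapes, not the real data).
toy_images = [[[0, 17], [255, 128]], [[3, 4], [5, 6]]]
toy_labels = [0, 1, 1, 0]

print(within_bounds(toy_images, min_val=0, max_val=255))
print(within_bounds(toy_labels, min_val=0, max_val=1))
```

If the data is held in NumPy arrays, comparing the array's `min()` and `max()` against the declared bounds gives the same check.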

Now that our dataset is ready, we upload it to our domain server

In [None]:
# run this cell
domain_client.load_dataset(
    name="BreastCancerDataset",
    assets={
        "train_images": train_image_data,
        "train_labels": train_label_data,
        "val_images": val_image_data,
        "val_labels": val_label_data,
        "test_images": test_image_data,
        "test_labels": test_label_data,
    },
    description="Invasive Ductal Carcinoma (IDC) is the most common subtype of all breast cancers. \
    The modified dataset consisted of 162 whole mount slide images of Breast Cancer (BCa) specimens scanned at 40x. \
    Patches of size 50 x 50 were extracted from the original image. Label 0 is non-IDC and label 1 is IDC."
)

We check the datasets on the domain to confirm that the upload succeeded

In [None]:
# run this cell
domain_client.datasets

<img src="./img/network.jpg" style="width: 100%; margin:0;" />

Browse the available networks

In [None]:
sy.networks

Join a network by name

In [None]:
# run this cell
NETWORK_NAME = ""
network_client = sy.networks[NETWORK_NAME]
domain_client.apply_to_network(network_client)

List the domains

In [None]:
# run this cell
network_client.domains

<img src="./img/user_account.jpg" style="width: 100%; margin:0;" />

Create a User

In [None]:
# run this cell
data_scientist_details = domain_client.create_user(
    name="Sam Carter",
    email="sam@stargate.net",
    password="changethis",
    budget=9999
)
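
The cell above uses the placeholder password `changethis`. For an account you actually hand out, a randomly generated password may be a safer choice. Below is a minimal sketch using Python's standard `secrets` module. `generate_password` is a hypothetical helper, not part of Syft or `utils.py`, and the length and alphabet are assumptions to adapt to your own policy.

```python
import secrets
import string


def generate_password(length: int = 16) -> str:
    """Return a random password made of ASCII letters and digits."""
    if length < 8:
        raise ValueError("length must be at least 8")
    alphabet = string.ascii_letters + string.digits
    # secrets.choice draws from a cryptographically secure random source.
    return "".join(secrets.choice(alphabet) for _ in range(length))


new_password = generate_password()
print(f"Generated a password of {len(new_password)} characters")
```

The generated string could then be passed as the `password` argument when creating the user.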

Print out the Data Scientist details

In [None]:
# run this cell then copy the output
submit_credentials(data_scientist_details)
print("Please give these details to the Data Scientist 👇🏽")
print(data_scientist_details)