# Data Owner - Domain Management

## Setup Variables

Before we start let's store some variables that will come in handy later in the notebook.

In [1]:
# Set the Ip Address of your Domain node in this variable.
DOMAIN_HOST_IP="20.25.54.155"

In [2]:
# Set the Ip Address of your Network node in this variable.
# The network node with IP `20.121.101.41` is managed and maintained by OpenMined.
NETWORK_IP="20.121.101.41"

In [3]:
# Set the email and password of your Domain node.
# We will be using the default email and password that got created during Domain creation.
# Please update the email and password below incase you update them.

ADMIN_EMAIL="info@openmined.org"
ADMIN_PASSWORD="changethis"

## Install Packages

Let's proceed to install the required packages for accessing the Domain via python.
We will be installing the `syft` library. Syft can be installed using the python `pip` package as follows:

```python
    pip install 'git+https://github.com/OpenMined/PySyft@dev#egg=syft&subdirectory=packages/syft

```
You may choose to skip this step if you already have the latest version of `syft` package installed in your environment.


**Note:** `syft` is currently supported with python versions **3.8 and above**. Older versions may work, however we have stopped testing and supporting them. If you want to further know on how you can install `syft` in a virtual environment based on your operating system, please refer to the [docs](https://openmined.github.io/PySyft/getting_started/index.html).

In [None]:
# Uncomment the line below to install syft
# perform this step if you're using google colab
#!pip install 'git+https://github.com/OpenMined/PySyft@dev#egg=syft&subdirectory=packages/syft

Let's check if syft is installed successfully.

In [5]:
import syft as sy
print(f"You're running syft version: {sy.__version__}")

You're running syft version: 0.7.0-beta.16


## Log into the Domain

Now that we have successfully installed `syft`, let's move on how one can log into a Domain node. There are two ways to log into your own node, as the Data Owner.

1. Using the PySyft library
2. Using the Web Interface

### Using the PySyft library

Let's use the `syft` library to login in to your domain and get an authenticated client to your Domain node.

To login into the your domain you will need the following credentials:
- url to the domain: Here the value in the `DOMAIN_HOST_IP` is the url to your domain.
- email address: We will use the default email (`ADMIN_EMAIL`) set on domain creation.
- password: We will use the default password (`ADMIN_PASSWORD`) set on domain creation.
- port number: Port number on which the domain server is provisioned (defaults to 80).

In [6]:
# Let's log into the domain using the credentials
try:
    domain_client = sy.login(url=DOMAIN_HOST_IP, email=ADMIN_EMAIL, password=ADMIN_PASSWORD)
except Exception as e:
    print("Unable to connect, did you set the `DOMAIN_HOST_IP` variable above?")
    raise e


Anyone can login as an admin to your node right now because your password is still the default PySyft username and password!!!

Connecting to 20.25.54.155... done! 	 Logging into goofy_lim... done!


### Using the Web Interface

We can access the domain node using a Web Interface via the Ip Address defined in the variable `DOMAIN_HOST_IP`.

One can simply access the UI by replacing `DOMAIN_HOST_IP` with your Domain IP Address in the url below.

```python
    http://<your_domain_host_ip>/login/
```

To login into the your domain you will need the following credentials:
- email address: We will use the default email (`info@openmined.org`) set on domain creation
- password: We will use the  default password (`changethis`) set on domain creation


On accessing the url defined above you should be able to see the login page as shown in the image below.

![Domain Login Page](img/pygrid_ui_login.png)

On a sucessful login you will be redirected the users page, where you can manage all the users that have signed up to your domain.

## Network Node

Our next step would be to connect to a Network Node. So, what is a Network Node?

A Network Node is a level of abstraction above a Domain node. It is a server which exists outside of any data owner's institution, providing services to the network of data owners and data scientists.

Therefore, a network node can be considered a collection of domains. A Network acts a bridge between between its members and subscribers. The members are **`Domains`** while subscribers are the **`end-users (e.g. Data Scientist)`** who explore and perform analysis on the datasets hosted by the members.

Thus, in short, a Network node provides a secure interface between its members and subscribers.

For the scope of this demonstration, *OpenMined* has created a Network node, to which we will register our Domain Node later in the notebook.

### Connect to a Network

Let's login into the Network node. The variable `NETWORK_IP` contains the URL/IP address to the Network node hosted by OpenMined.

Since we will be logging into the Network node as a Guest User, therefore we don't need to provide an email or password as part of the login. As a *GUEST USER*, our scope will be limited to only a few operations/functionalities.

**Note:** Network node is a fairly new concept and is under rapid development. New functionalities will be added to it soon.

In [7]:
# Logging to the network node
network_client = sy.login(url=NETWORK_IP)

Connecting to 20.121.101.41... done! 	 Logging into trusting_goodfellow... as GUEST...done!


On successful login, we will receive an authenticated client.

Now that we have an authenticated client to the network, let's list the available domains on this Network.

In [8]:
# List the available domains on this Network
network_client.domains

                                             

Unnamed: 0,host_or_ip,id,is_vpn,name
0,100.64.0.4,70cfc3985f35466bb4368601be94d1ef,1,gracious_smola
1,100.64.0.5,eb7f362d3cd54e0cb73445a13ea0d5e2,1,amazing_hotz


### Join the Network

As part of the next step, we will be joining the OpenMined network. Applying to a network will allow us to be listed as part of the Network.

Let's apply to the Network. When we apply to join a network, the Domain client connects to the Network node through a secured VPN protocol (if a protocol is not established, then it will try to establish one) and then sends a request to join the Network.

In [9]:
# Let's apply to the Network
domain_client.apply_to_network(network_client)

🔌 <DomainClient - goofy_lim: <UID: 5cc32c8194f9463fab1d8f25d7fab583>> successfully connected to the VPN: http://20.121.101.41:80/api/v1
Waiting to connect to VPN.
Connected to VPN
Application submitted.


On a successful request, our Domain is registered to the network node. Let's check this by listing the available domains on the network node.

In [10]:
# Listing the available domains on the Network
# to check if our Domain is present on it or not.
network_client.domains

                                             

Unnamed: 0,host_or_ip,id,is_vpn,name
0,100.64.0.4,70cfc3985f35466bb4368601be94d1ef,1,gracious_smola
1,100.64.0.5,eb7f362d3cd54e0cb73445a13ea0d5e2,1,amazing_hotz
2,100.64.0.6,5cc32c8194f9463fab1d8f25d7fab583,1,goofy_lim


We can also check if the Network node is connected to the Domain via VPN by calling`.vpn_status()` method on the `<domain_client>`. If the Network node is succesfully connected to the Domain via the VPN, then it should be present in the `peers` list (in the response returned by `.vpn_status()` method).

In [11]:
# Verify if domain is connected to the Network node via VPN.
domain_client.vpn_status()

{'status': 'ok',
 'connected': True,
 'host': {'ip': '100.64.0.6',
  'hostname': 'goofy_lim',
  'network': 'omnet',
  'os': 'linux',
  'connection_info': '-',
  'connection_status': 'n/a',
  'connection_type': 'n/a'},
 'peers': [{'ip': '100.64.0.5',
   'hostname': 'amazing_hotz',
   'network': 'omnet',
   'os': 'linux',
   'connection_info': 'offline',
   'connection_status': 'n/a',
   'connection_type': 'n/a'},
  {'ip': '100.64.0.3',
   'hostname': 'focused_howard',
   'network': 'omnet',
   'os': 'linux',
   'connection_info': 'offline',
   'connection_status': 'n/a',
   'connection_type': 'n/a'},
  {'ip': '100.64.0.4',
   'hostname': 'gracious_smola',
   'network': 'omnet',
   'os': 'linux',
   'connection_info': 'offline',
   'connection_status': 'n/a',
   'connection_type': 'n/a'},
  {'ip': '100.64.0.2',
   'hostname': 'node',
   'network': 'omnet',
   'os': 'linux',
   'connection_info': 'offline',
   'connection_status': 'n/a',
   'connection_type': 'n/a'},
  {'ip': '100.64.0.1'

Great !!! Now that we are part of the Network node, let's move on to upload the MedNIST dataset onto our Domain node.

## Create a Dataset

### MedNIST Dataset

We will be using the MedNIST dataset. The MedNIST dataset was gathered from several sets from TCIA, the RSNA Bone Age Challenge, and the NIH Chest X-ray dataset.

The dataset is kindly made available by Dr. Bradley J. Erickson M.D., Ph.D. (Department of Radiology, Mayo Clinic) under the Creative Commons CC BY-SA 4.0 license. If you use the MedNIST dataset, please acknowledge the source, e.g. https://colab.research.google.com/drive/1wy8XUSnNWlhDNazFdvGBHLfdkGvOHBKe#scrollTo=ZaHFhidyCBJa

Let's move on to download and extract the dataset.

The dataset has been stored pickle file. Lets download the dataset using the instructions below.

In [13]:
# Download the dataset
print("Downloading Dataset....")
!wget -q "https://media.githubusercontent.com/media/shubham3121/datasets/main/MedNIST/MedNIST.pkl"

Downloading Dataset....


Now, before we move forward, let's store some variables related to the dataset.

We require your participant number and the total participant count in the session to allocate you a unique subset of the MedNIST data.

In [14]:
# file path where the MedNIST.pkl is downloaded
FILE_PATH = "./MedNIST.pkl"

# Update the allotted participant number in this variable
MY_PARTICIPANT_NUMBER = 1

# Update the total participants count in this variable
TOTAL_PARTICIPANTS = 10

### Load the Dataset

Below are some helper methods, thatwe will require to load the dataset.

In [15]:
# Helper Methods

import os
import json
import pandas as pd
from PIL import Image
from enum import Enum
from collections import defaultdict
import numpy as np
from syft.core.adp.data_subject_list import DataSubjectList


def get_label_mapping():
    # the data uses the following mapping
    mapping = {
        "AbdomenCT": 0, 
        "BreastMRI": 1, 
        "CXR": 2, 
        "ChestCT": 3, 
        "Hand": 4, 
        "HeadCT": 5
    }
    return mapping

def load_data_as_df(file_path="./MedNIST.pkl"):
    df = pd.read_pickle(file_path)
    df.sort_values("patient_id", inplace=True, ignore_index=True)
    
    # Calculate start and end index based on your participant number
    batch_size = df.shape[0] // TOTAL_PARTICIPANTS
    start_idx = (MY_PARTICIPANT_NUMBER - 1) * batch_size
    end_idx = start_idx + batch_size
    
    # Slice the dataframe according
    df = df[start_idx: end_idx]
    
    # Get label mapping
    mapping = get_label_mapping()
    
    total_num = df.shape[0]
    print("Columns:", df.columns)
    print("Total Images:", total_num)
    print("Label Mapping", mapping)
    return df

def get_data_description(data):
    unique_label_cnt = data.label.nunique()
    lable_mapping = json.dumps(get_label_mapping())
    image_size = data.image[0].shape
    description = f"The MedNIST dataset was gathered from several sets from TCIA, the RSNA Bone Age Challenge, and the NIH Chest X-ray dataset. The dataset is kindly made available by Dr. Bradley J. Erickson M.D., Ph.D. (Department of Radiology, Mayo Clinic) under the Creative Commons CC BY-SA 4.0 license.\n"
    description += f"Label Count: {unique_label_cnt}\n"
    description += f"Label Mapping: {lable_mapping}\n"
    description += f"Image Dimensions: {image_size}\n"
    description += f"Total Images: {data.shape[0]}\n"
    return description

In [16]:
# Let's load the dataset as a dataframe
dataset_df = load_data_as_df(FILE_PATH)

Columns: Index(['patient_id', 'image', 'label'], dtype='object')
Total Images: 5895
Label Mapping {'AbdomenCT': 0, 'BreastMRI': 1, 'CXR': 2, 'ChestCT': 3, 'Hand': 4, 'HeadCT': 5}


In [17]:
# Let's get a peek of the dataset
dataset_df.head()

Unnamed: 0,patient_id,image,label
0,11000,"[[101, 101, 101, 101, 101, 101, 101, 101, 101,...",0
1,11002,"[[25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, ...",5
2,11002,"[[126, 126, 126, 126, 126, 126, 126, 126, 126,...",3
3,11004,"[[3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 3, 3, 3, 3, 3,...",4
4,11004,"[[101, 101, 101, 101, 101, 101, 101, 101, 101,...",0


Get the dataset description, that needs to be provided to the domain while uploading the dataset.

In [18]:
dataset_description = get_data_description(dataset_df)
print(dataset_description)

The MedNIST dataset was gathered from several sets from TCIA, the RSNA Bone Age Challenge, and the NIH Chest X-ray dataset. The dataset is kindly made available by Dr. Bradley J. Erickson M.D., Ph.D. (Department of Radiology, Mayo Clinic) under the Creative Commons CC BY-SA 4.0 license.
Label Count: 6
Label Mapping: {"AbdomenCT": 0, "BreastMRI": 1, "CXR": 2, "ChestCT": 3, "Hand": 4, "HeadCT": 5}
Image Dimensions: (64, 64)
Total Images: 5895



We can see that dataset description contains a brief info about the dataset and also a few meta information related to the dataset.

### Prepare Dataset for upload

Let's create data subjects list. Data Subjects are the individuals whose privacy we're trying to protect. Here the patients are the data subjects.

In [19]:
data_subjects = DataSubjectList.from_series(dataset_df['patient_id'])

Next we need to convert our image and label data to numpy array of type **int64**.

In [20]:
# Convert images to numpy int64 array
images = dataset_df['image']
images = np.dstack(images.values).astype(np.int64)
images = np.rollaxis(images,-1)

In [21]:
# Convert labels to numpy int64 array
labels = dataset_df['label'].to_numpy().astype("int64")

Next we will make your data private private with min, max and data subjects. The min and max are minimum and maximum value in the given data.

In [22]:
# converting images to private data
image_data = sy.Tensor(images).private(min_val=0, max_val=255, data_subjects=data_subjects)

In [23]:
# converting labels to private data
label_data = sy.Tensor(labels).private(min_val=0, max_val=5, data_subjects=data_subjects)

Finally, we will upload the images and labels to the domain.

In [24]:
# creating/uploading the dataset

# Name of the dataset
name = "MedNIST Data"

# upload the MedNIST data
domain_client.load_dataset(
    assets={"images": image_data, "labels": label_data},
    name=name,
    description=dataset_description,
    use_blob_storage=True
)

Loading dataset... uploading...🚀                                                                                                                                             

Uploading `images`: 100%|[32m█████████████████████████████████████████████[0m| 1/1 [00:03<00:00,  3.53s/it][0m
Uploading `labels`: 100%|[32m█████████████████████████████████████████████[0m| 1/1 [00:00<00:00,  1.41it/s][0m


Dataset is uploaded successfully !!! 🎉

Run `<your client variable>.datasets` to see your new dataset loaded into your machine!


Now let's check if the dataset we successfully uploaded

In [25]:
domain_client.datasets

Idx,Name,Description,Assets,Id
[0],MedNIST Data,"The MedNIST dataset was gathered from several sets from TCIA, the RSNA Bone Age Challenge, and the NIH Chest X-ray dataset. The dataset is kindly made available by Dr. Bradley J. Erickson M.D., Ph.D. (Department of Radiology, Mayo Clinic) under the Creative Commons CC BY-SA 4.0 license. Label Count: 6 Label Mapping: {""AbdomenCT"": 0, ""BreastMRI"": 1, ""CXR"": 2, ""ChestCT"": 3, ""Hand"": 4, ""HeadCT"": 5} Image Dimensions: (64, 64) Total Images: 5895","[""images""] -> [""labels""] ->",d4eb895b-1e24-4434-aa78-17fb08dd6693


## Create a Data Scientist Account

In [17]:
data_scientist_details = {
    "name": "Samantha Carter",
    "email": "sam@sg1.net",
    "password": "stargate",
    "budget": 9999,
}

In [18]:
domain_client.users.create(**data_scientist_details)

In [19]:
print("Please give these details to the data scientist:")
login_details = {}
login_details["url"] = HOST_IP
login_details["name"] = data_scientist_details["name"]
login_details["email"] = data_scientist_details["email"]
login_details["password"] = data_scientist_details["password"]
print(login_details)

Please give these details to the data scientist:
{'url': '52.157.8.193', 'name': 'Samantha Carter', 'email': 'sam@sg1.net', 'password': 'stargate'}
