### Playing with PETs

In the previous sessions we had a brief introduction to launching a domain/network node and running basic computations on private data.

In today's session we will look in more detail at what it takes to upload a dataset.

### Prerequisites

- conda or miniconda installed. We can refer to [this link](https://docs.anaconda.com/anaconda/install/) to install conda.
- Python 3.7 or higher. If conda is already installed but comes with an older Python version, we can create a new virtual environment as follows:
    - `conda create -n pysyft python=3.9`
    - `conda activate pysyft`
- Jupyter Notebook or JupyterLab.
    - To install Jupyter Notebook: `conda install jupyter` or `pip install notebook`
    - To install JupyterLab: `conda install -c conda-forge jupyterlab` or `pip install jupyterlab`
    
Please refer to the [PySyft documentation](https://openmined.github.io/PySyft/getting_started/index.html) for more detailed, OS-specific installation instructions.
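Before moving on, the prerequisites can be checked from inside Python. The snippet below is a minimal sketch and not part of PySyft: the helper names `meets_min_python` and `is_installed` are made up for this example, and it assumes the Jupyter packages are importable as `notebook` and `jupyterlab`.

```python
import importlib.util
import sys


def meets_min_python(version_info=sys.version_info, minimum=(3, 7)):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


print("Python 3.7+:", meets_min_python())
for name in ("notebook", "jupyterlab"):
    print(f"{name} installed:", is_installed(name))
```

Only one of the two Jupyter packages needs to be present, so a single `False` in the output is not necessarily a problem.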

### HAGRID CLI Tool

Hagrid is a command line tool used to deploy a Domain or Network node.

In [None]:
!pip install hagrid

In [2]:
!hagrid

HAGrid! 🧙 EDITABLE DEV MODE 🚨
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Dependency          ┃ Found ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ 📘 ansible-playbook │    ✅ │
│ 🐳 docker           │    ✅ │
│ 📁 git              │    ✅ │
└─────────────────────┴───────┘
Usage: hagrid [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  check       Check health of an IP address/addresses or a resource group
  clean       Restore some part of the hagrid installation or deployment...
  debug       Show HAGrid debug information
  land        Stop a running PyGrid domain/network node.
  launch      Start a new PyGrid domain/network node!
  quickstart  Launch a Syft + Jupyter Session with a Notebook URL / Path
  ssh         SSH into the IP address or a resourc

### Install syft

In [None]:
!pip install syft --pre

### Launch a Domain Node

We use the launch command to start our private data server. 

The launch command follows this pattern:
`hagrid launch {name of the node} {type of the node: domain/network} --tag=latest`.

`--tag=latest` means that we want to fetch the Docker images required by PySyft that are tagged `latest` from Docker Hub.
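To make the pattern concrete, here is a small sketch that assembles the command string from its parts. `build_launch_command` is a hypothetical helper written for this notebook, not part of HAGrid; it only builds the text of the command and does not run it.

```python
import shlex

NODE_TYPES = ("domain", "network")


def build_launch_command(name, node_type="domain", tag="latest"):
    """Assemble `hagrid launch <name> <type> --tag=<tag>` as a shell string."""
    if node_type not in NODE_TYPES:
        raise ValueError(f"node_type must be one of {NODE_TYPES}, got {node_type!r}")
    parts = ["hagrid", "launch", name, node_type, f"--tag={tag}"]
    # Quote each part so that unusual node names stay a single shell argument.
    return " ".join(shlex.quote(part) for part in parts)


print(build_launch_command("canada", "domain"))
# hagrid launch canada domain --tag=latest
```

The printed string can be pasted into a terminal, or into a notebook cell prefixed with `!`.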

There are several other options and flags that we can pass when launching a node. We can list them as follows.

In [2]:
!hagrid launch --help

Usage: hagrid launch [OPTIONS] [ARGS]...

  Start a new PyGrid domain/network node!

Options:
  --username TEXT                 Optional: the username for provisioning the
                                  remote host
  --key_path TEXT                 Optional: the path to the key file for
                                  provisioning the remote host
  --password TEXT                 Optional: the password for provisioning the
                                  remote host
  --repo TEXT                     Optional: repo to fetch source from
  --branch TEXT                   Optional: branch to monitor for updates
  --tail TEXT                     Optional: don't tail logs on launch
  --headless TEXT                 Optional: don't start the frontend container
  --cmd TEXT                      Optional: print the cmd without running it
  --jupyter                       Optional: enable Jupyter Notebooks
  --build TEXT                    Optional: enable or disable forc

To stop a running node or stack, we can use the `hagrid land` command.

To stop a particular domain/network node, we can specify the land command followed by the domain/network node name.

For example, if the node was launched as follows:

`hagrid launch canada domain --tag=latest`

then, we can stop this domain as follows:

`hagrid land canada domain`

Moreover, to stop all running domain/network nodes at once, we can run the command

`hagrid land all`

This kills all the running containers.
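The two forms of the land command can be captured in one small helper. This is a sketch only: `build_land_command` is a hypothetical name, and the function just returns the command text shown above without executing anything.

```python
import shlex


def build_land_command(name=None, node_type=None):
    """Return `hagrid land all`, or `hagrid land <name> [<type>]` for one node."""
    if name is None:
        parts = ["hagrid", "land", "all"]
    else:
        parts = ["hagrid", "land", name]
        if node_type is not None:
            parts.append(node_type)
    return " ".join(shlex.quote(part) for part in parts)


print(build_land_command("canada", "domain"))
print(build_land_command())
```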


### Ctop

Ctop is a handy command line tool to view running containers/services. 

Typing `ctop` on the command line shows the list of containers running on the system.
We can move between containers with the arrow keys and press `q` to exit the tool.


For installation details click on the following link:
https://github.com/bcicen/ctop
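To see whether `ctop` (and `docker`, which the HAGrid dependency table lists) is available before trying to use it, we can look the executables up on the `PATH`. The helper name `find_tools` is an assumption made for this sketch.

```python
import shutil


def find_tools(names=("docker", "ctop")):
    """Map each tool name to its location on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


for tool, path in find_tools().items():
    print(f"{tool}: {path or 'not found on PATH'}")
```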

### Log in to the Domain

In [81]:
import syft as sy

In [10]:
# Run the following cell to see the function signature / parameters that the login function takes.

sy.login?

In [82]:
domain_client = sy.login(url="localhost", port=8081, email="info@openmined.org", password="changethis")


Anyone can login as an admin to your node right now because your password is still the default PySyft username and password!!!

Connecting to localhost... done! 	 Logging into canada... done!
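The warning above appears because we logged in with the default credentials. One way to avoid hard-coding credentials in a notebook is to read them from environment variables, as in the sketch below. The variable names `DOMAIN_EMAIL` and `DOMAIN_PASSWORD` and the helper `login_kwargs` are hypothetical; the keyword names mirror the `sy.login` call used in this notebook.

```python
import os

# Defaults copied from the login call above; the environment variable names
# below are hypothetical and chosen only for this example.
DEFAULT_EMAIL = "info@openmined.org"
DEFAULT_PASSWORD = "changethis"


def login_kwargs(env=None, url="localhost", port=8081):
    """Build keyword arguments for sy.login, preferring environment values."""
    env = os.environ if env is None else env
    password = env.get("DOMAIN_PASSWORD", DEFAULT_PASSWORD)
    if password == DEFAULT_PASSWORD:
        print("Warning: still using the default password.")
    return {
        "url": url,
        "port": port,
        "email": env.get("DOMAIN_EMAIL", DEFAULT_EMAIL),
        "password": password,
    }


kwargs = login_kwargs()
# domain_client = sy.login(**kwargs)  # requires a running domain node
```

This only changes where the credentials come from; the password stored on the node itself still has to be changed separately.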


In [83]:
# View name
domain_client.name

'canada'

In [84]:
# List the datasets present on the domain
domain_client.datasets

In [85]:
# List the users registered on the domain node
domain_client.users

| | id | email | name | budget | verify_key | role | added_by | website | institution | daa_pdf | created_at | budget_spent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | info@openmined.org | Jane Doe | 5.55 | 82ff72f309716dafab2eef6d2dfcec9f784924aeb92848... | Owner | | | | | 2022-08-25 11:34:36.552475 | 5.55 |


In [86]:
# List variables stored in the domain store
domain_client.store

In [87]:
# To check if our current domain is connected to any other node via VPN
domain_client.vpn_status()

{'status': 'ok', 'connected': False, 'host': {}, 'peers': []}
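The returned dictionary can be turned into a short, readable message. The sketch below relies only on the keys visible in the output above (`status`, `connected`, `peers`); what the entries of `peers` look like is not shown here, so the code only counts them. `summarize_vpn_status` is a hypothetical helper.

```python
def summarize_vpn_status(status):
    """Return a one-line summary of a vpn_status()-style dictionary."""
    if status.get("status") != "ok":
        return "VPN status could not be determined"
    if not status.get("connected", False):
        return "Not connected to any network node"
    peers = status.get("peers", [])
    return f"Connected with {len(peers)} peer(s)"


# The dictionary printed by the cell above.
example = {"status": "ok", "connected": False, "host": {}, "peers": []}
print(summarize_vpn_status(example))
# Not connected to any network node
```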

### Load Dataset

In [88]:
# Let's say we have a list of students and their total marks

import numpy as np
import pandas as pd

classA = [
    {"StudentName": "Bob", "TotalMarks": 65},
    {"StudentName": "Alice", "TotalMarks": 75},
    {"StudentName": "Sheldon", "TotalMarks": 90},
    {"StudentName": "Leonard", "TotalMarks": 80},
    {"StudentName": "Amy", "TotalMarks": 95},
]

classA_df = pd.DataFrame(classA)

In [89]:
classA_df

| | StudentName | TotalMarks |
|---|---|---|
| 0 | Bob | 65 |
| 1 | Alice | 75 |
| 2 | Sheldon | 90 |
| 3 | Leonard | 80 |
| 4 | Amy | 95 |


In [90]:
# Create a Syft Tensor

marks_tensor = sy.Tensor(np.array(classA_df["TotalMarks"], dtype=np.int64))

In [91]:
marks_tensor, type(marks_tensor)

(Tensor(child=[65 75 90 80 95]), syft.core.tensor.tensor.Tensor)

In [8]:
# We need to add differential privacy (DP) metadata to marks_tensor.
# Let's check the signature of the method below.

marks_tensor.annotated_with_dp_metadata?

In [92]:
# Add DP metadata to the marks_tensor

dp_annotated_marks_tensor = marks_tensor.annotated_with_dp_metadata(
    min_val=0, max_val=100, data_subjects=classA_df["StudentName"].values
)

Tensor annotated with DP Metadata


In [93]:
dp_annotated_marks_tensor.child.data_subjects

array([DataSubjectArray: {'Bob'}, DataSubjectArray: {'Alice'},
       DataSubjectArray: {'Sheldon'}, DataSubjectArray: {'Leonard'},
       DataSubjectArray: {'Amy'}], dtype=object)

In [94]:
dp_annotated_marks_tensor.child.min_vals, dp_annotated_marks_tensor.child.max_vals

(<lazyrepeatarray data: [0] -> shape: (5,)>,
 <lazyrepeatarray data: [100] -> shape: (5,)>)
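The annotation attaches a lower bound, an upper bound and one data subject per value. Before annotating our own data, it can help to confirm that the inputs are consistent with each other. The check below is a plain-Python sketch with a hypothetical helper name, `check_dp_inputs`; it is not something this notebook shows PySyft doing for us.

```python
def check_dp_inputs(values, data_subjects, min_val, max_val):
    """Return a list of problems found; an empty list means the inputs look consistent."""
    problems = []
    if len(values) != len(data_subjects):
        problems.append(f"{len(values)} values but {len(data_subjects)} data subjects")
    for subject, value in zip(data_subjects, values):
        if not min_val <= value <= max_val:
            problems.append(f"{subject}: {value} is outside [{min_val}, {max_val}]")
    return problems


names = ["Bob", "Alice", "Sheldon", "Leonard", "Amy"]
marks = [65, 75, 90, 80, 95]
print(check_dp_inputs(marks, names, min_val=0, max_val=100))
# []
```

An empty list means every mark lies within the declared bounds and each mark has a matching student name.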

In [92]:
# Next we will upload the marks data to the domain node.
# Let's check which method we can use and what is its function signature.

domain_client.load_dataset?

In [95]:
# Upload marks data to the domain

domain_client.load_dataset(
    assets={"Mathematics": dp_annotated_marks_tensor},
    name="Class A Sem 1 Performance Card",
    description="Semeter 1 Performance of class A",
)

Loading dataset...Loading dataset... checking assets...Loading dataset... checking dataset name for uniqueness...Loading dataset... checking dataset name for uniqueness...                                                                                                                    Loading dataset... checking asset types...                              Loading dataset... uploading...🚀                        

Uploading `Mathematics`: 100%|[32m████████████████████████████████████████[0m| 1/1 [00:00<00:00,  4.79it/s][0m

Dataset is uploaded successfully !!! 🎉

Run `<your client variable>.datasets` to see your new dataset loaded into your machine!





In [96]:
# List available datasets

domain_client.datasets

| Idx | Name | Description | Assets | Id |
|---|---|---|---|---|
| [0] | Class A Sem 1 Performance Card | Semeter 1 Performance of class A | ["Mathematics"] -> | 572942ed-f3b1-4b86-81cd-c22ae79f9a23 |


In [97]:
# Select the dataset

domain_client.datasets[0]

Dataset: Class A Sem 1 Performance Card
Description: Semeter 1 Performance of class A



| Asset Key | Type | Shape |
|---|---|---|
| ["Mathematics"] | | (5,) |


In [100]:
# Select the dataset that holds the Mathematics marks tensor
classA_marks_dataset = domain_client.datasets[0]

In [101]:
# View the maths_marks tensor.
# Note that it shows synthetic data, not the real values.
maths_marks = classA_marks_dataset["Mathematics"]
maths_marks

array([36, 75, 53,  7, 84])

 (The data printed above is synthetic - it's an imitation of the real data.)
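To build intuition for why the printed values differ from the marks we uploaded, here is a toy stand-in that draws random integers within the declared bounds and with the same length as the real data. This is only an illustration with a hypothetical helper, `fake_marks`; it is not how PySyft generates its synthetic data, which this notebook does not describe.

```python
import random


def fake_marks(count, min_val, max_val, seed=None):
    """Draw `count` random integers in [min_val, max_val]; unrelated to the real data."""
    rng = random.Random(seed)
    return [rng.randint(min_val, max_val) for _ in range(count)]


real_marks = [65, 75, 90, 80, 95]
placeholder = fake_marks(len(real_marks), min_val=0, max_val=100, seed=0)
print(placeholder)
```

The placeholder has the same length and value range as the real marks but reveals nothing about any individual student.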

In [105]:
maths_marks.public_shape, maths_marks.public_dtype

((5,), 'int64')

In [106]:
maths_marks.id_at_location

<UID: 13574c2a804c4d0e8062447ce12c295f>

In [107]:
domain_client.store

| | ID | Tags | Description | object_type |
|---|---|---|---|---|
| 0 | `<UID: 13574c2a804c4d0e8062447ce12c295f>` | [#Mathematics] | | `<class 'syft.core.tensor.tensor.Tensor'>` |


### Now it's your turn! Go ahead...

- launch a domain node
- log into the domain
- upload a dataset