# Lesson 2: Remote Data Science Demo!

In the next 15 minutes, we'll give you a rapid run-through of remote data science. First, as a Data Owner, you'll prepare a dataset, deploy a domain node, and upload the dataset to that node.
Then, as a Data Scientist, you'll take this data and perform remote data science on it!

If any part of this demo doesn't seem clear right now, don't worry! We'll be walking you through each of these steps in more detail in subsequent lessons.

## Steps

Run the following commands in a Terminal:

### <b>1. Clone the PySyft repository, then fetch and check out the dev branch:</b>

```
git clone https://github.com/OpenMined/PySyft && cd PySyft
git fetch origin dev
git checkout dev
```

### <b>2. Launch <a href="https://www.docker.com/products/docker-desktop">Docker Desktop</a>, and ensure it has at least 8GB of RAM available.</b>


### <b>3. Create a Virtual Environment with Python 3.9</b>

```
conda create -n lab python=3.9
conda activate lab
```

### <b>4. Install & Launch HAGrid</b>

```
cd packages/hagrid && pip install -e .
hagrid launch <node_name>
```

### <b>5. Install Syft</b>

```
# Run from packages/hagrid, where step 4 left you
cd ../syft && pip install -e .
```

### <b>6. Login to the Domain!</b>


<i>Voila!</i> You now have a working domain node. To log in, go to <b>localhost:port_number</b> in your browser (the port number is 8081 by default).

The default username and password are as follows:

- Username: <b> info@openmined.org </b>
- Password: <b> changethis </b>
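
If the login page does not load, it can help to first check whether anything is listening on the port. Here is a minimal sketch using only the Python standard library. `is_port_open` is a hypothetical helper written for this lesson, not part of HAGrid or Syft, and 8081 is simply the default port mentioned above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8081 is the default domain node port used in this lesson.
    print("Domain node reachable:", is_port_open("localhost", 8081))
```

A `True` result only shows that some process accepts connections on that port; it does not confirm that the process is your domain node.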

### 7. Upload your Dataset to your Domain!

Open Python, and run the following:

```
import numpy as np
import syft as sy
from syft.core.adp.entity import Entity

# Simulate 10 test results (0 or 1), one per patient
raw_data = np.random.choice([0, 1], size=10).astype(np.int32)
dataset = {}

# Wrap each result in a private tensor tied to the patient it belongs to
for person_index, test_result in enumerate(raw_data):
    data_owner = Entity(name=f'Patient #{person_index}')
    dataset[person_index] = sy.Tensor(
        np.ones(1, dtype=np.int32) * test_result
    ).private(min_val=0, max_val=1, entities=data_owner)

# Log in as the Data Owner and upload the dataset to the domain node
domain_node = sy.login(email="info@openmined.org", password="changethis", port=8081)
domain_node.load_dataset(
    assets=dataset,
    name="COVID19 Test Results",
    description="Positive/Negative COVID19 Test results",
    metadata="No metadata",
)
```
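
To make the shape of the uploaded data concrete, here is a minimal sketch that mirrors the loop above using only the Python standard library, so it runs without Syft or a domain node. `PrivateRecord` and `build_dataset` are hypothetical stand-ins for `sy.Tensor(...).private(...)`. The bounds check is an assumption about what declaring `min_val` and `max_val` is meant to guarantee, not a description of Syft's internals.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivateRecord:
    """Hypothetical stand-in for one private, single-value tensor."""
    entity: str   # who the value belongs to
    value: int    # the test result (0 or 1)
    min_val: int  # declared lower bound
    max_val: int  # declared upper bound


def build_dataset(raw_data, min_val=0, max_val=1):
    """Map each person's index to a record, rejecting out-of-bounds values."""
    dataset = {}
    for person_index, test_result in enumerate(raw_data):
        if not min_val <= test_result <= max_val:
            raise ValueError(
                f"value {test_result} is outside [{min_val}, {max_val}]"
            )
        dataset[person_index] = PrivateRecord(
            entity=f"Patient #{person_index}",
            value=test_result,
            min_val=min_val,
            max_val=max_val,
        )
    return dataset


rng = random.Random(0)  # seeded so the sketch is repeatable
raw_data = [rng.choice([0, 1]) for _ in range(10)]
dataset = build_dataset(raw_data)
print(len(dataset), dataset[0].entity)
```

Keeping one record per patient is what lets each value be attributed to a single entity, which is the role `Entity` plays in the real code.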

```
domain_node.datasets
```

| Idx | Name | Description | Assets | Id |
|---|---|---|---|---|
| [0] | COVID19 Test Results | Positive/Negative COVID19 Test results | `["0"] -> Tensor ["1"] -> Tensor ["2"] -> Tensor ["3"] -> Tensor ["4"] -> Tensor ["5"] -> Tensor ["6"] -> Tensor ["7"] -> Tensor ["8"] -> Tensor ["9"] -> Tensor` | 3de3d35a-5c22-44e8-8aef-b24df4c2c07d |


### 8. Pretend to be a Customer
Create a Data Scientist account, and ask to use the dataset in this domain node!

```
# Still logged in as the Data Owner: create the Data Scientist's account
domain_node.users.create(
    **{
        "name": "Sheldon Cooper",
        "email": "sheldon@caltech.edu",
        "password": "bazinga",
        "budget": 100
    }
)
```

Let's quickly double check that the Data Scientist account was created properly, by checking the list of users:

```
domain_node.users
```

| | added_by | allocated_budget | budget | budget_spent | created_at | daa_pdf | email | id | institution | name | role | verify_key | website |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | `<syft.lib.python._SyNone object at 0x7feaa814d...` | 0.0 | 5.55 | 0.0 | 2021-11-01 05:24:06.991358 | `<syft.lib.python._SyNone object at 0x7feaa814d...` | info@openmined.org | 1 | `<syft.lib.python._SyNone object at 0x7feaa814d...` | Jane Doe | Owner | e07db5d214010770d9146551d4dc4f3daf1bb7d9c89e66... | `<syft.lib.python._SyNone object at 0x7feaa814d...` |
| 1 | Jane Doe | 0.0 | 100.0 | 0.0 | 2021-11-01 05:27:59.352133 | 1 | sheldon@caltech.edu | 2 | | Sheldon Cooper | Data Scientist | eff162ee3f57400fb11d204cdbfd625d58b229901fdb23... | |
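
The account above is created with a `budget` of 100, and the user list has a `budget_spent` column next to it. One way to picture these two numbers is as a ledger: each query costs something, and a query that would exceed the allowance is refused. The sketch below is a toy model of that idea only. `BudgetLedger` and the costs are hypothetical, and this lesson does not describe how Syft actually charges or enforces the budget.

```python
class BudgetLedger:
    """Toy privacy-budget tracker (hypothetical; not Syft's implementation)."""

    def __init__(self, budget: float):
        self.budget = budget
        self.spent = 0.0

    @property
    def remaining(self) -> float:
        return self.budget - self.spent

    def spend(self, cost: float) -> bool:
        """Record a query's cost; return False and change nothing if it does not fit."""
        if cost < 0:
            raise ValueError("cost must be non-negative")
        if cost > self.remaining:
            return False
        self.spent += cost
        return True


ledger = BudgetLedger(budget=100.0)  # same allowance as the account above
print(ledger.spend(30.0), ledger.remaining)  # fits within the budget
print(ledger.spend(80.0), ledger.remaining)  # would exceed it, so it is refused
```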


### 9. Log in to the Domain Node, as the Data Scientist

```
ds_node = sy.login(email="sheldon@caltech.edu", password="bazinga", port=8081)
```

```
Connecting to http://localhost:8081... done! 	 Logging into adp... done!
```


### 10. View the available datasets on the Node

```
ds_node.datasets
```

| Idx | Name | Description | Assets | Id |
|---|---|---|---|---|
| [0] | COVID19 Test Results | Positive/Negative COVID19 Test results | `["0"] -> Tensor ["1"] -> Tensor ["2"] -> Tensor ["3"] -> Tensor ["4"] -> Tensor ["5"] -> Tensor ["6"] -> Tensor ["7"] -> Tensor ["8"] -> Tensor ["9"] -> Tensor` | 3de3d35a-5c22-44e8-8aef-b24df4c2c07d |


```
# Let's get a pointer to the dataset
dataset = ds_node.datasets[0]
```

<b> Let's try to find out how many people in this dataset had COVID19... </b>

```
dataset
```

Dataset: COVID19 Test Results

Description: Positive/Negative COVID19 Test results

| Asset Key | Type | Shape |
|---|---|---|
| `["0"]` | Tensor | (1,) |
| `["1"]` | Tensor | (1,) |
| `["2"]` | Tensor | (1,) |
| `["3"]` | Tensor | (1,) |
| `["4"]` | Tensor | (1,) |
| `["5"]` | Tensor | (1,) |
| `["6"]` | Tensor | (1,) |
| `["7"]` | Tensor | (1,) |
| `["8"]` | Tensor | (1,) |
| `["9"]` | Tensor | (1,) |


Let's see whether we, as the Data Scientist working with this remote dataset, can inspect its contents by printing it.

```
print(dataset)
```

```
<syft.core.node.common.client_manager.dataset_api.Dataset object at 0x7fea9a251040>
```

Normally, if we had a Tensor object (such as the ones from PyTorch or TensorFlow), we'd be able to print it and look at its values. Here, however, printing only shows a Dataset object and its memory address. We hold a pointer to the dataset, while the underlying data stays on the domain node.
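
The printed line above has the shape of Python's default object representation, which is what you get when a class provides no custom string form. The sketch below reproduces that behaviour with a toy class. `RemoteDataset` is hypothetical and is not Syft's `Dataset`; it only illustrates a handle that carries metadata but none of the values.

```python
class RemoteDataset:
    """Hypothetical handle: holds only a description, never the remote values."""

    def __init__(self, name: str, asset_keys: list):
        self.name = name
        self.asset_keys = asset_keys


handle = RemoteDataset("COVID19 Test Results", [str(i) for i in range(10)])

# No __repr__ or __str__ is defined, so Python falls back to the default
# "<module.Class object at 0x...>" form.
print(handle)

# The metadata is available locally; the values are not part of the object.
print(handle.name, handle.asset_keys)
```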

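```
# Collect a pointer to each of the 10 assets in the dataset
results = [dataset[str(i)] for i in range(10)]
```

```
from time import sleep

total_cases = 0
for result in results:
    ptr = result.publish()    # ask the domain node to publish this result
    sleep(1)                  # give the node a moment to process the request
    total_cases += ptr.get()  # download the published value and add it up
```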

```
print(f'The total number of COVID19 cases is: {total_cases[0]}')
```

```
The total number of COVID19 cases is: 6.811603905014408
```
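
Notice that the total is not a whole number, even though every test result is 0 or 1. That is what you would expect if random noise is added to each value before it is released, which is the basic idea behind differentially private publishing. The sketch below illustrates the effect with the Python standard library only. `noisy_sum`, the sample values, and `sigma=0.5` are all hypothetical, and this is not the mechanism or noise scale that Syft's `publish()` uses.

```python
import random


def noisy_sum(values, sigma, rng):
    """Add independent Gaussian noise (std dev sigma) to each value, then sum."""
    return sum(float(v) + rng.gauss(0.0, sigma) for v in values)


# Hypothetical test results; the lesson's actual raw_data is random and not shown.
values = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
true_total = sum(values)

rng = random.Random(42)  # seeded so repeated runs give the same noisy total
noisy_total = noisy_sum(values, sigma=0.5, rng=rng)

print(f"true total:  {true_total}")
print(f"noisy total: {noisy_total}")
```

The noisy total lands near the true count without revealing any single patient's value exactly, which is the trade-off the Data Scientist accepts when working on data they cannot see.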


### 11. Shut down your Domain Node
Open a new terminal window with the virtual environment active:

```
conda activate lab
```

Run this command in your terminal:

```
hagrid land all
```

# <marquee> CONGRATULATIONS! </marquee>