# Medical Federated Learning Program: Data Science

Welcome!
At this point you have:
- Deployed a domain node using a `hagrid launch` command
- Annotated a dataset:

In [26]:
import syft as sy
import numpy as np

# Let's say this is our raw, private data:
data = np.array([1,2,3,4])

# These are the IDs of who it belongs to:
patients = np.array([901, 902, 903, 904])

# We add metadata
dataset = sy.Tensor(data).private(
    min_val=0, 
    max_val=5, 
    data_subjects=patients
)

- Uploaded this annotated dataset to the domain node

In [None]:
# We log on to a domain node:
domain_node = sy.login(email="info@openmined.org", password="changethis", port=8081)

domain_node.load_dataset(
    assets={"data": dataset},
    name="Example data",
    description="For illustrative purposes only"
)

- Connected this domain node to a Network node
- And created a Data Scientist account!

In [None]:
domain_node.create_user(
    name="Sam Carter",
    email="sam@stargate.net",
    password="changethis",
    budget=9999
)

This notebook demonstrates the basic building blocks that let a data scientist use this data while preserving the privacy, security, and integrity of what you uploaded to your domain node.

Now let's walk through how a data scientist would use this data and this infrastructure!

## Accessing data you don't have: Introducing Tensor Pointers

The raw, private data that you uploaded to your domain node never leaves your domain node.
How, then, do data scientists work with it? The answer is called a **Tensor Pointer**.
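
To make the idea concrete, here is a toy sketch in plain Python. It is not PySyft's implementation: `ToyDomain` and `ToyPointer` are hypothetical names invented for illustration. The point it shows is that a pointer carries only an identifier for the remote object, never the values themselves.

```python
import uuid


class ToyDomain:
    """Hypothetical stand-in for a domain node: it owns the private objects."""

    def __init__(self):
        self._store = {}  # object ID -> private object, never handed out directly

    def upload(self, obj):
        """Store obj privately and return a pointer that refers to it."""
        obj_id = uuid.uuid4().hex
        self._store[obj_id] = obj
        return ToyPointer(obj_id)

    def has(self, pointer):
        """Check whether this domain holds the object a pointer refers to."""
        return pointer.obj_id in self._store


class ToyPointer:
    """A handle to remote data: it carries an ID, not the data itself."""

    def __init__(self, obj_id):
        self.obj_id = obj_id

    def __repr__(self):
        return f"<ToyPointer -> {self.obj_id[:8]}>"


domain = ToyDomain()
ptr = domain.upload([1, 2, 3, 4])  # the raw values stay inside `domain`

print(ptr)  # shows only an ID; the values are not part of the pointer
```

In this sketch the data scientist would hold `ptr` only, while the list `[1, 2, 3, 4]` lives in the domain's private store.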

## Working with data you don't have: Remote Procedure Calls

So now we know that **Tensor Pointers** let you access data on another domain node without having to make a copy of it.
But this is only half the story; after all, we don't just want to access data, we want to *work with it!* How do we do that?

The answer is something called **Remote Procedure Calls**. Let's start with our tensor pointer:
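
As a rough illustration of the mechanism, and not of PySyft's actual API, the toy sketch below forwards arithmetic on a pointer to the node that holds the data. The names `ToyDomain`, `ToyPointer`, and `call` are hypothetical. In a real deployment the request would travel over the network; here everything runs in one process for simplicity.

```python
import itertools
import operator


class ToyDomain:
    """Hypothetical domain node that executes operations on data it holds."""

    def __init__(self):
        self._store = {}
        self._ids = itertools.count()

    def upload(self, values):
        obj_id = next(self._ids)
        self._store[obj_id] = list(values)
        return ToyPointer(self, obj_id)

    def call(self, op, obj_id, other):
        """The 'remote procedure': run op where the data lives, return a new pointer."""
        result = [op(v, other) for v in self._store[obj_id]]
        return self.upload(result)


class ToyPointer:
    """Holds an ID and forwards arithmetic to the domain instead of computing locally."""

    def __init__(self, domain, obj_id):
        self._domain = domain
        self.obj_id = obj_id

    def __add__(self, other):
        return self._domain.call(operator.add, self.obj_id, other)

    def __mul__(self, other):
        return self._domain.call(operator.mul, self.obj_id, other)


domain = ToyDomain()
ptr = domain.upload([1, 2, 3, 4])

# Each operation is a request to the domain; the result is again a pointer.
result_ptr = (ptr + 1) * 2
print(type(result_ptr).__name__, result_ptr.obj_id)
```

Note that every operation returns another pointer, so intermediate results also stay on the domain node.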

## Getting results you can see: Differential Privacy

To recap: we now know that **Tensor Pointers** give a data scientist the ability to work on data (using **Remote Procedure Calls**) without physically having it on their device!

Now let's say you've done your analysis. How do you actually get results? And how do we make sure that a data scientist who sees those results doesn't violate anyone's privacy?

The answer lies in something called **Differential Privacy**.
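
The sketch below shows one common way this can work, using the Laplace mechanism and a simple budget counter. It is a simplified illustration, not a description of how PySyft accounts for privacy: `ToyAccountant` and `publish_sum` are hypothetical names, and the code assumes each data subject contributes exactly one value within the declared `min_val` and `max_val` bounds.

```python
import random


class BudgetExceeded(Exception):
    """Raised when a query would spend more privacy budget than remains."""


class ToyAccountant:
    """Hypothetical tracker for a data scientist's remaining epsilon."""

    def __init__(self, budget):
        self.remaining = budget

    def spend(self, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise BudgetExceeded(f"need {epsilon}, have {self.remaining}")
        self.remaining -= epsilon


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def publish_sum(values, min_val, max_val, epsilon, accountant, rng):
    """Release a noisy sum of values, each assumed to lie in [min_val, max_val]."""
    accountant.spend(epsilon)
    # One person joining or leaving changes the sum by at most this much.
    sensitivity = max(abs(min_val), abs(max_val))
    clipped = [min(max(v, min_val), max_val) for v in values]
    return sum(clipped) + laplace_noise(sensitivity / epsilon, rng)


# Seeded for repeatability; a real system would need a secure noise source.
rng = random.Random(0)
accountant = ToyAccountant(budget=1.0)

noisy = publish_sum([1, 2, 3, 4], min_val=0, max_val=5,
                    epsilon=0.5, accountant=accountant, rng=rng)
print(noisy, accountant.remaining)
```

A smaller `epsilon` spends less budget but adds more noise, which is the basic trade-off a data scientist manages when requesting results.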

In [None]:
domain_node.privacy_budget

In [None]:
# .get()

In [None]:
# .publish()

## Finding data you don't have: Network Nodes



## Combining data from many nodes: Secure Multiparty Computation

Okay, so we used a Tensor Pointer, ran an experiment through its remote procedure calls, and got the result by spending some privacy budget.
That sounds great, but do you have to repeat this, one by one, for every domain node you want to work with? Or is there a way to use data from multiple domain nodes at once?

The answer is something called **Secure Multiparty Computation**.
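
One building block often used for this is additive secret sharing, sketched below in plain Python. This is a minimal illustration of the general technique, not PySyft's protocol: the function names, the choice of modulus, and the three-party setup are all assumptions made for the example.

```python
import secrets

PRIME = 2**61 - 1  # all arithmetic is done modulo this public prime


def share(secret, n_parties):
    """Split secret into n_parties random shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recombine shares; meaningful only when every share is present."""
    return sum(shares) % PRIME


# Two hypothetical domain nodes, each holding one private number.
secret_a, secret_b = 120, 45

shares_a = share(secret_a, n_parties=3)
shares_b = share(secret_b, n_parties=3)

# Party i adds the two shares it holds; no party ever sees secret_a or secret_b.
sum_shares = [(a + b) % PRIME for a, b in zip(shares_a, shares_b)]

total = reconstruct(sum_shares)
print(total)  # 165
```

Each individual share is a uniformly random number, so a party holding only its own shares learns nothing about either input; only the combined result is revealed.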

## Everything combined: PySyft illustration

