# Hello Syft

PySyft is a Python library containing a set of data serialization and remote code execution APIs that mimic existing popular data science tools while working interchangeably with existing popular data types. It enables data scientists to answer questions about sensitive or proprietary data in a secure and privacy-preserving way. The Python package for PySyft is called `syft`.

In this tutorial, we will cover the following workflows:

- Data Owner Workflow - Part 1
    - upload mock data
- Data Scientist Workflow - Part 1
    - write query against mock data
    - submit code for review on the data owner side
- Data Owner Workflow - Part 2
    - review code and approve
    - share the real result with the data scientist
- Data Scientist Workflow - Part 2
    - fetch the real result
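
Before touching any Syft API, here is a toy, plain-Python model of the four workflows above that makes the division of roles concrete. It is a sketch only, not how PySyft is implemented: `ToyServer`, `submit`, `approve`, and `fetch` are hypothetical names, and the ages are the same values used later in this tutorial. Python cannot actually hide `real` here; in the tutorial, the real data stays on the data owner's server.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Optional


# Hypothetical stand-in for a data owner's server (NOT Syft's API).
@dataclass
class ToyServer:
    real: list
    mock: list
    pending: dict = field(default_factory=dict)   # request name -> submitted function
    results: dict = field(default_factory=dict)   # request name -> released result

    # Data scientist, part 1: submit a function for review.
    def submit(self, name: str, fn: Callable[[list], float]) -> None:
        self.pending[name] = fn

    # Data owner, part 2: after review, run on the real data and release the result.
    def approve(self, name: str) -> None:
        fn = self.pending.pop(name)
        self.results[name] = fn(self.real)

    # Data scientist, part 2: fetch the released result (None until approved).
    def fetch(self, name: str) -> Optional[float]:
        return self.results.get(name)


server = ToyServer(real=[40, 39, 35, 60, 25], mock=[50, 49, 45, 70, 35])

# The data scientist develops the query against the mock data only.
print(mean(server.mock))             # 49.8

server.submit("get_mean_age", mean)
print(server.fetch("get_mean_age"))  # None: not approved yet

server.approve("get_mean_age")
print(server.fetch("get_mean_age"))  # 39.8
```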

For more detailed tutorials on each workflow, please refer to the `data-owner` and `data-scientist` tutorials.

## Install `syft`

In [None]:
SYFT_VERSION = ">=0.8.1b0,<0.9"
package_string = f'"syft{SYFT_VERSION}"'
# Uncomment the next line to install a matching version of syft
# %pip install {package_string} -f https://whls.blob.core.windows.net/unstable/index.html

In [None]:
import syft as sy
# Check that the installed syft version satisfies SYFT_VERSION
sy.requires(SYFT_VERSION)

## Launch a dummy server 

In this tutorial, for the sake of demonstration, we will use in-memory workers as dummy servers. For details on deploying your own server with `syft` and `hagrid`, please refer to the `quickstart` tutorials.

In [None]:
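# reset=True clears any state left over from previous runs
node = sy.orchestra.launch(name="hello-syft-usa-server", port=9082, reset=True)
# Log in as the default admin account (the data owner)
root_domain_client = node.login(email="info@openmined.org", password="changethis")
# Register an account for the external data scientist
root_domain_client.register(name="Jane Doe", email="janedoe@caltech.edu",
                            password="abc123", institution="Caltech", website="https://www.caltech.edu/")

# Log in as the data scientist
external_ds = node.login(email="janedoe@caltech.edu", password="abc123")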

## Data Owner - Part 1

### Upload Data to Domain

In [None]:
import pandas as pd

In [None]:
# Real (private) patient data held by the data owner
usa_data = {
    'Patient_ID': ['011', '015', '022', '034', '044'],
    'Age': [40, 39, 35, 60, 25]
}

Next, based on the original data, the data owner generates a synthetic (mock) version of the dataset. They can perturb the values with any amount of noise; in this example, the mock version replaces the patient IDs with placeholders and adds 10 to each age.
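
Instead of typing the mock values by hand, the data owner could derive them from the real data. The sketch below is a plain-Python illustration; `make_mock` and `AGE_OFFSET` are hypothetical names, not part of Syft. Note that a fixed offset is trivially reversible once someone knows it, so it only serves as an example here.

```python
# Real data held by the data owner (same values as `usa_data` above).
usa_data = {
    'Patient_ID': ['011', '015', '022', '034', '044'],
    'Age': [40, 39, 35, 60, 25],
}

AGE_OFFSET = 10  # hypothetical, illustrative perturbation


def make_mock(real: dict, age_offset: int) -> dict:
    """Replace IDs with sequential placeholders and shift every age by age_offset."""
    n = len(real['Patient_ID'])
    return {
        'Patient_ID': [str(i) for i in range(1, n + 1)],
        'Age': [age + age_offset for age in real['Age']],
    }


usa_mock_data = make_mock(usa_data, AGE_OFFSET)
print(usa_mock_data)
# {'Patient_ID': ['1', '2', '3', '4', '5'], 'Age': [50, 49, 45, 70, 35]}
```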

In [None]:
# Mock version: placeholder IDs, and every age shifted by +10
usa_mock_data = {
    'Patient_ID': ['1', '2', '3', '4', '5'],
    'Age': [50, 49, 45, 70, 35]
}

In [None]:
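# Pair the real data (private) with its mock version (visible to data scientists)
ages_asset = sy.Asset(name="ages", data=pd.DataFrame(usa_data),
                      mock=pd.DataFrame(usa_mock_data), mock_is_real=False)
dataset = sy.Dataset(name="usa-mock-data", asset_list=[ages_asset])
root_domain_client.upload_dataset(dataset)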

## Data Scientist - Part 1

### Load Mock Data

In [None]:
asset = external_ds.datasets[-1].assets["ages"]

### Write Query on Mock Data

In [None]:
# Compute the mean age on the mock data to check that the query works
mock_mean_age = asset.mock['Age'].mean()
print(mock_mean_age)
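
Because every mock age is the real age plus a constant, the mean computed on the mock data is offset from the real mean by exactly that constant. This lets the data scientist debug the query on mock data, but the mock result itself is not the answer. With real ages $x_i$, $n = 5$ and $c = 10$:

$$
\frac{1}{n}\sum_{i=1}^{n}(x_i + c) = \frac{1}{n}\sum_{i=1}^{n} x_i + c
\quad\Rightarrow\quad
\frac{249}{5} = 49.8 = \frac{199}{5} + 10 = 39.8 + 10
$$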

In [None]:
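# Input policy: the `df` argument is restricted to this exact asset.
# Output policy: the approved function may produce its output only once.
@sy.syft_function(input_policy=sy.ExactMatch(df=asset),
                  output_policy=sy.SingleExecutionExactOutput())
def get_mean_age(df):
    return df['Age'].mean()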

### Submit Code Request for Review

In [None]:
req = external_ds.code.request_code_execution(get_mean_age)
req

In [None]:
submitted_code = external_ds.code[0]
submitted_code

In [None]:
assert external_ds.code.get_all()

### Create and Submit Project

In [None]:
project = sy.Project(
    name="My data science code on USA cancer mock data",
    description="Hi, I would like to know the average age of cancer patients in your data.",
    members=[external_ds],
)
project

In [None]:
project.create_code_request(get_mean_age, external_ds)
project.start()

The code request has been submitted successfully!

## Data Owner - Part 2

### Get Requests

In [None]:
root_domain_client = node.login(email="info@openmined.org", password="changethis")
project = root_domain_client.projects[0]

In [None]:
project.requests

In [None]:
request = project.requests[0]

### Review Code and Policies

In [None]:
func = request.changes[0].link
func

In [None]:
print(func.code)
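
Reading `func.code` is the main safeguard at this step. For longer submissions, a short script using Python's standard `ast` module can list which calls and constant column names the code touches. This is a hypothetical review aid, not part of Syft, and no substitute for reading the code, since it only reports what appears syntactically. The `submitted_source` string is an assumed stand-in for the text printed above (requires Python 3.9+ for `ast.unparse`).

```python
import ast

# Hypothetical stand-in for the source printed by `print(func.code)`.
submitted_source = '''
def get_mean_age(df):
    return df['Age'].mean()
'''


def summarize_code(source: str) -> dict:
    """List the calls and constant subscript keys (e.g. column names) used in `source`."""
    tree = ast.parse(source)
    calls = [ast.unparse(n.func) for n in ast.walk(tree) if isinstance(n, ast.Call)]
    columns = [
        n.slice.value
        for n in ast.walk(tree)
        if isinstance(n, ast.Subscript) and isinstance(n.slice, ast.Constant)
    ]
    return {"calls": calls, "columns": columns}


print(summarize_code(submitted_source))
# {'calls': ["df['Age'].mean"], 'columns': ['Age']}
```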

### Execute Function on Real Data

In [None]:
# Get the submitted code as a plain callable; only run it after reviewing the code above
get_mean_age_user_function = func.unsafe_function

In [None]:
# Run the submitted function on the real (private) data
real_data = func.assets[0].data
real_result = get_mean_age_user_function(df=real_data)
print(real_result)

### Share the Real Result with the Data Scientist

In [None]:
# Approve the request and attach the computed result for the data scientist
result = request.accept_by_depositing_result(real_result)
print(result)
assert isinstance(result, sy.SyftSuccess)

## Data Scientist - Part 2

### Fetch Real Result

In [None]:
asset = external_ds.datasets[0].assets[0]
asset

In [None]:
external_ds.code[0].status

In [None]:
# Calling the approved function returns a pointer to the result, not the value itself
result_ptr = external_ds.code.get_mean_age(df=asset)

In [None]:
real_result = result_ptr.get_from(external_ds)
print(real_result)

**That's a success! The external data scientist learned the average age of breast cancer patients in a USA regional hospital without ever accessing, or even seeing, the real data.**

Once you are done with this tutorial, you can safely shut down the server as follows:

In [None]:
node.land()