# Hello Syft

PySyft is a Python library that provides a set of data serialization and remote code execution APIs. These APIs mimic popular data science tools and work interchangeably with popular data types. PySyft enables data scientists to answer questions about sensitive or proprietary data in a secure and privacy-preserving way. The Python package for PySyft is called `syft`.

In this tutorial, we will cover the following workflows:

- Data Owner Workflow - Part 1
    - upload mock data
- Data Scientist Workflow - Part 1
    - write query against mock data
    - submit code for review on the data owner side
- Data Owner Workflow - Part 2
    - review code and approve
    - share the real result with the data scientist
- Data Scientist Workflow - Part 2
    - fetch the real result

For more detail on each topic, please refer to the `data-owner` and `data-scientist` tutorials.

## Install `syft`

In [None]:
SYFT_VERSION = ">=0.8.2.b0,<0.9"
package_string = f'"syft{SYFT_VERSION}"'
# %pip install {package_string} -q

In [None]:
# syft absolute
import syft as sy

sy.requires(SYFT_VERSION)

## Launch a dummy server 

In this tutorial, for the sake of demonstration, we will use in-memory workers as dummy servers. Deploying a server of your own with `syft` is outside the scope of this tutorial.

In [None]:
server = sy.orchestra.launch(name="hello-syft-usa-server", port=9000, reset=True)
root_datasite_client = server.login(email="info@openmined.org", password="changethis")
root_datasite_client.register(
    name="Jane Doe",
    email="janedoe@caltech.edu",
    password="abc123",
    password_verify="abc123",
    institution="Caltech",
    website="https://www.caltech.edu/",
)

ds_client = server.login(email="janedoe@caltech.edu", password="abc123")

## Data Owner - Part 1

### Upload Data to Datasite

In [None]:
# third party
import pandas as pd

The first thing we do as a data owner is upload our dataset. Starting from the original data, the data owner generates a synthetic (fake) version of the dataset, called the mock. They can add any amount of noise to the fake values. In this example, the mock adds `+10` to each age and replaces the patient IDs with placeholders.

In [None]:
dataset = sy.Dataset(
    name="usa-mock-data",
    description="Dataset of ages",
    asset_list=[
        sy.Asset(
            name="ages",
            data=pd.DataFrame(
                {
                    "Patient_ID": ["011", "015", "022", "034", "044"],
                    "Age": [40, 39, 35, 60, 25],
                },
            ),
            mock=pd.DataFrame(
                {"Patient_ID": ["1", "2", "3", "4", "5"], "Age": [50, 49, 45, 70, 35]},
            ),
            mock_is_real=False,
        ),
    ],
)
root_datasite_client.upload_dataset(dataset)
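
The mock in the dataset above follows a simple rule: every age is shifted by `+10` and the patient IDs are replaced with sequential placeholders. As a minimal sketch, assuming a fixed offset is good enough for a demo, the mock columns could be derived from the real ones as shown below. The helper `make_mock_ages` and its `offset` parameter are hypothetical names for illustration and are not part of `syft`.

```python
# Hypothetical helper, not part of syft: derive mock ages from real ages.
real_ids = ["011", "015", "022", "034", "044"]
real_ages = [40, 39, 35, 60, 25]


def make_mock_ages(ages, offset=10):
    """Return a copy of `ages` with a fixed offset added to every value."""
    return [age + offset for age in ages]


# Replace the real identifiers with sequential placeholders.
mock_ids = [str(i) for i in range(1, len(real_ids) + 1)]
mock_ages = make_mock_ages(real_ages)

print(mock_ids)
print(mock_ages)
```

A fixed offset keeps the shape of the data, which is convenient for developing queries, but it should not be read as a formal privacy guarantee.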

## Data Scientist - Part 1

### Load Mock Data

The data scientist can access the assets uploaded by the data owner, along with the mock version of the data.

In [None]:
asset = ds_client.datasets[-1].assets["ages"]

In [None]:
asset

In [None]:
mock = asset.mock
mock

### Write Query on Mock Data

We can use the mock data to develop our query.

In [None]:
# compute the mean age on the mock data
mean_age = mock["Age"].mean()
print(mean_age)
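
Because the mock ages in this tutorial are the real ages shifted by `+10`, a mean computed on the mock is shifted by the same amount. The sketch below checks this with the standard library only, using the values from the dataset defined earlier. The data scientist never sees the real column; it appears here only because this tutorial plays both roles. This is an illustration and not part of the `syft` workflow.

```python
from statistics import mean

# Values copied from the dataset defined earlier in this tutorial.
real_ages = [40, 39, 35, 60, 25]
mock_ages = [50, 49, 45, 70, 35]

mock_mean = mean(mock_ages)
real_mean = mean(real_ages)

# The mean is linear, so a constant offset carries straight through.
offset = mock_mean - real_mean
print(mock_mean, real_mean, offset)
```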

When we are done, we wrap the code in a function decorated with a `syft_function` decorator. Here we use the simplest version, `syft_function_single_use`. Read more about `syft_function`s in the data scientist tutorials.

In [None]:
@sy.syft_function_single_use(df=asset)
def get_mean_age(df):
    return df["Age"].mean()

### Submit Code Request for Review

In [None]:
req = ds_client.code.request_code_execution(get_mean_age)
req

The code request has been submitted successfully!

## Data Owner - Part 2

### Get Requests

As a data owner, we can now view and approve the request.

In [None]:
root_datasite_client.requests

In [None]:
request = root_datasite_client.requests[0]

In [None]:
# collect a readable description of each change whose id appears
# in the request's current change state
str_changes = []
for change in request.changes:
    if change.id in request.current_change_state:
        str_change = (
            change.__repr_syft_nested__()
            if hasattr(change, "__repr_syft_nested__")
            else type(change)
        )
        str_changes.append(f"{str_change}. ")
str_changes = "\n".join(str_changes)

### Review Code and Policies

Before we approve, we want to inspect the code and the policies.

In [None]:
usercode = request.code

In [None]:
usercode

### Execute function on real data

Now that we have seen the code, we can run it on the real data.

In [None]:
get_mean_age_user_function = usercode.run

In [None]:
asset = usercode.assets[0]
real_result = get_mean_age_user_function(df=asset)
print(real_result)

### Approving the request

In [None]:
result = request.approve()
assert isinstance(result, sy.SyftSuccess)
result
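
The review flow in this part can be summarised as a small state machine: a request starts as pending, the data owner approves it, and only then can the data scientist obtain the real result. The toy class below sketches that idea in plain Python. `ToyRequest`, its `status` values, and its methods are hypothetical names chosen for illustration and do not reflect how `syft` implements requests.

```python
from statistics import mean


class ToyRequest:
    """Hypothetical stand-in for a code request; not syft's implementation."""

    def __init__(self, func, private_data):
        self.func = func
        self._private_data = private_data  # stays on the data owner's side
        self.status = "pending"

    def approve(self):
        """Data owner action: allow the function to run on the real data."""
        self.status = "approved"

    def get_result(self):
        """Data scientist action: only works once the request is approved."""
        if self.status != "approved":
            raise PermissionError("request has not been approved yet")
        return self.func(self._private_data)


toy_request = ToyRequest(mean, [40, 39, 35, 60, 25])

try:
    toy_request.get_result()
    blocked = False
except PermissionError:
    blocked = True  # fetching before approval is refused

toy_request.approve()
toy_result = toy_request.get_result()
print(blocked, toy_result)
```

The names `toy_request` and `toy_result` are deliberately different from the notebook's own `request` and `result`, so running the sketch does not overwrite them.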

## Data Scientist - Part 2

### Computing the Real Result

As a data scientist, we can now fetch the result.

In [None]:
asset = ds_client.datasets[0].assets[0]

In [None]:
ds_client.code[0].status

In [None]:
result_ptr = ds_client.code.get_mean_age(df=asset)

In [None]:
real_result = result_ptr.get()
print(real_result)

**That's a success! The external data scientist was able to learn the average age of breast cancer patients in a USA regional hospital without having to access, or even look at, the real data.**

## Final note: autocomplete

Earlier in this tutorial, we used services defined on the client, such as `ds_client.code.request_code_execution`. To discover the available services, like `ds_client.code`, and their methods, like `.request_code_execution()`, you can use autocomplete: for example, type `ds_client.code.<tab>` or `ds_client.services.<tab>`.

In [None]:
# autocompletion, but programmatic. To try it interactively, type ds_client.services.<tab> in a new cell instead
autocompleter = get_ipython().Completer
_, completions1 = autocompleter.complete(text="ds_client.code.")
_, completions2 = autocompleter.complete(text="ds_client.services.")
_, completions3 = autocompleter.complete(text="ds_client.api.services.")
_, completions4 = autocompleter.complete(text="ds_client.api.")

In [None]:
assert all(
    [
        "ds_client.code.get_all" in completions1,
        "ds_client.services.code" in completions2,
        "ds_client.api.services.code" in completions3,
        "ds_client.api.code" in completions4,
        "ds_client.api.parse_raw" not in completions4,  # no pydantic completions on api
    ],
)
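
Outside of an IPython session, where `get_ipython()` is not available, a similar listing can be obtained with Python's built-in `dir()`. The sketch below uses a hypothetical stand-in class, `ToyCodeService`, rather than a real `syft` client, so the method names it prints are illustrative only.

```python
class ToyCodeService:
    """Hypothetical stand-in for a client service; not a syft class."""

    def get_all(self):
        return []

    def request_code_execution(self, func):
        return f"requested: {func.__name__}"

    def _internal_helper(self):
        return None


def public_names(obj):
    """List attribute names that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


names = public_names(ToyCodeService())
print(names)
```

Filtering on the leading underscore is a common convention for hiding private helpers; it is a heuristic and not something `syft` is known to rely on.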

Once you are done with this tutorial, you can safely shut down the server as follows:

In [None]:
server.land()