# Hello Syft

PySyft is a Python library containing a set of data serialization and remote code execution APIs that mimic popular data science tools and work interchangeably with popular data types. It enables data scientists to ask questions of sensitive or proprietary data in a secure and privacy-preserving way. The Python package for PySyft is called `syft`.

In this tutorial, we will cover the following workflows:

- Data Owner Workflow - Part 1
    - upload mock data
- Data Scientist Workflow - Part 1
    - write query against mock data
    - submit code for review on the data owner side
- Data Owner Workflow - Part 2
    - review code and approve
    - share the real result with the data scientist
- Data Scientist Workflow - Part 2
    - fetch the real result

For more detailed tutorials on each subject, please refer to the `data-owner` and `data-scientist` tutorials.

## Install `syft`

In [1]:
SYFT_VERSION = ">=0.8.2.b0,<0.9"
package_string = f'"syft{SYFT_VERSION}"'
# %pip install {package_string} -f https://whls.blob.core.windows.net/unstable/index.html

In [2]:
import syft as sy
sy.requires(SYFT_VERSION)

✅ The installed version of syft==0.8.2b2 matches the requirement >=0.8.2b0 and the requirement <0.9


## Launch a dummy server 

In this tutorial, for the sake of demonstration, we will be using in-memory workers as dummy servers. For details of deploying a server on your own using `syft` and `hagrid`, please refer to the `quickstart` tutorials.

In [3]:
node = sy.orchestra.launch(name="hello-syft-usa-server", port=9000, reset=True)
root_domain_client = node.login(email="info@openmined.org", password="changethis")
root_domain_client.register(name="Jane Doe", email="janedoe@caltech.edu",
                            password="abc123", institution="Caltech", website="https://www.caltech.edu/")

ds_client = node.login(email="janedoe@caltech.edu", password="abc123")

Starting hello-syft-usa-server server on 0.0.0.0:9000


Waiting for server to start Done.


Logged into hello-syft-usa-server as <info@openmined.org>


Logged into hello-syft-usa-server as <janedoe@caltech.edu>


## Data Owner - Part 1

### Upload Data to Domain

In [4]:
import pandas as pd

The first thing we do as a data owner is upload our dataset. Based on the original data, the data owner generates a synthetic or fake version of the dataset, and can add any amount of noise to the fake values. In this example, the fake version adds `+10` to each age.
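
As a minimal sketch of that step, the mock ages can be derived from the real ones with a fixed shift. Plain Python lists stand in for the DataFrame columns here, and `make_mock_ages` and `AGE_OFFSET` are hypothetical names used only for illustration; they are not part of `syft`.

```python
# Illustrative only: plain lists stand in for the DataFrame columns used in this tutorial.
real_ages = [40, 39, 35, 60, 25]
AGE_OFFSET = 10  # the fixed shift applied to every age in the mock version


def make_mock_ages(ages, offset=AGE_OFFSET):
    """Return a shifted copy of the ages; the input list is left unchanged."""
    return [age + offset for age in ages]


mock_ages = make_mock_ages(real_ages)
print(mock_ages)
```

A constant offset is easy to undo once it is known, so it serves here only as a demonstration of how real and mock values can differ.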

In [5]:
dataset = sy.Dataset(name="usa-mock-data",
                     description="Dataset of ages",
                     asset_list=[
                         sy.Asset(name="ages",
                               data=pd.DataFrame(
                                   {
                                   'Patient_ID': ['011', '015', '022', '034', '044'],
                                   'Age': [40, 39, 35, 60, 25]
                                   }
                               ),
                               mock=pd.DataFrame(
                                   {
                                   'Patient_ID': ['1', '2', '3', '4', '5'],
                                   'Age': [50, 49, 45, 70, 35]
                                   }
                               ),
                               mock_is_real=False)
                     ]
                    )
root_domain_client.upload_dataset(dataset)

  0%|          | 0/1 [00:00<?, ?it/s]

Uploading: ages


100%|██████████| 1/1 [00:00<00:00,  5.58it/s]





## Data Scientist - Part 1

### Load Mock Data

The data scientist can access the `Assets` uploaded by the `Data Owner`, along with the mock version of the data.

In [6]:
asset = ds_client.datasets[-1].assets["ages"]

In [7]:
asset



In [8]:
mock = asset.mock
mock

|   | Patient_ID | Age |
|---|------------|-----|
| 0 | 1          | 50  |
| 1 | 2          | 49  |
| 2 | 3          | 45  |
| 3 | 4          | 70  |
| 4 | 5          | 35  |


### Write Query on Mock Data

We can use the mock data to develop our query.

In [9]:
age_mean = mock['Age'].mean()
print(age_mean)

49.8


When we are done, we wrap the code in a function decorated with a `syft_function` decorator, in this case the simplest version, `syft_function_single_use`. Read more about Syft functions in the data scientist tutorials.

In [10]:
@sy.syft_function_single_use(data=asset)
def get_mean_age(df):
    return df['Age'].mean()

Syft function 'get_mean_age' successfully created. To add a code request, please create a project using `project = syft.Project(...)`, then use command `project.create_code_request`.
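
The idea behind a single-use function can be pictured with a toy decorator. This is a sketch under stated assumptions, not PySyft's implementation: `toy_single_use` is a hypothetical name, and it models only two things that the name `syft_function_single_use` suggests, namely inputs fixed at decoration time and a single permitted execution.

```python
import functools


def toy_single_use(**bound_inputs):
    """Hypothetical stand-in, not part of syft: bind inputs and allow one run."""

    def decorator(func):
        used = False

        @functools.wraps(func)
        def wrapper():
            nonlocal used
            if used:
                raise RuntimeError(f"{func.__name__} has already been executed")
            used = True
            # The caller cannot swap in different inputs after decoration.
            return func(**bound_inputs)

        return wrapper

    return decorator


@toy_single_use(ages=[50, 49, 45, 70, 35])
def get_mean_age(ages):
    return sum(ages) / len(ages)


first_result = get_mean_age()

try:
    get_mean_age()
    second_call_blocked = False
except RuntimeError:
    second_call_blocked = True

print(first_result, second_call_blocked)
```

In the real workflow the inputs are assets held on the data owner's server and execution is gated by the owner's approval, which this toy version does not attempt to model.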


### Submit Code Request for Review

In [11]:
req = ds_client.code.request_code_execution(get_mean_age)
req

The code request is successfully submitted!

## Data Owner - Part 2

### Get Requests

As a data owner, we can now view and approve the request.

In [12]:
root_domain_client.requests

In [13]:
request = root_domain_client.requests[0]

In [14]:
# Collect a readable description of each change in the request's current state.
str_changes = []
for change in request.changes:
    if change.id in request.current_change_state:
        # Prefer the nested Syft repr when the change provides one.
        str_change = (
            change.__repr_syft_nested__()
            if hasattr(change, "__repr_syft_nested__")
            else type(change)
        )
        str_changes.append(f"{str_change}. ")
str_changes = "\n".join(str_changes)

### Review Code and Policies

Before we approve, we want to inspect the code and the policies.

In [15]:
usercode = request.code

In [16]:
usercode

```python
class UserCode
    id: str = 46d111b1bd4c45ab90a701787d93ed2e
    status.approved: str = False
    service_func_name: str = get_mean_age
    code:

@sy.syft_function_single_use(data=asset)
def get_mean_age(df):
    return df['Age'].mean()

```

### Execute function on real data

Now that we have reviewed the code, we can run it on the real data.

In [17]:
get_mean_age_user_function = usercode.unsafe_function



In [18]:
asset = usercode.assets[0]
real_result = get_mean_age_user_function(df=asset)
print(real_result)

39.8


### Share the real result with the Data Scientist

In [19]:
result = request.accept_by_depositing_result(real_result)
print(result)
assert isinstance(result, sy.SyftSuccess)

message='Request 76ad8952ac034ab18280f82e1caab4b8 changes applied'
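
The approve-by-depositing step can be pictured as a small state change on the request. The class below is a hypothetical toy model written for this explanation; `ToyRequest`, `deposit_result`, and `fetch_result` are not `syft` names, and the real request object carries far more state.

```python
class ToyRequest:
    """Hypothetical model of a code request; not a syft class."""

    def __init__(self, func_name):
        self.func_name = func_name
        self.status = "pending"
        self._result = None

    def deposit_result(self, result):
        """Owner side: approve the request and store the computed result."""
        self._result = result
        self.status = "approved"

    def fetch_result(self):
        """Scientist side: only available once the owner has approved."""
        if self.status != "approved":
            raise PermissionError(
                f"request for {self.func_name} is still {self.status}"
            )
        return self._result


toy_request = ToyRequest("get_mean_age")

# Before approval, the scientist cannot read anything.
try:
    toy_request.fetch_result()
    fetched_early = True
except PermissionError:
    fetched_early = False

# The owner runs the code on the real data and deposits only the result.
toy_request.deposit_result(39.8)
print(toy_request.fetch_result())
```

The point of the model is that the data scientist only ever receives the deposited value, never the data it was computed from.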


## Data Scientist - Part 2

### Fetch Real Result

As a data scientist, we can now fetch the result.

In [20]:
asset = ds_client.datasets[0].assets[0]

In [21]:
ds_client.code[0].status

{NodeView(node_name='hello-syft-usa-server', verify_key=e5567bb8185ebb274c17ce8b423e955ae9e0ae2904da4fd8a4bea227e68bb26d): <UserCodeStatus.EXECUTE: 'execute'>}

In [22]:
result_ptr = ds_client.code.get_mean_age(df=asset)

In [23]:
real_result = result_ptr.get()
print(real_result)

39.8
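
For readers following along, the two means seen in this tutorial are consistent with the `+10` offset used to build the mock data. The check below is only possible because the tutorial prints the real ages in the upload cell; a data scientist in a real deployment would not have `real_ages`. Plain lists and the standard library are used here as a stand-in for the DataFrames.

```python
import math
from statistics import mean

# Values copied from the dataset definition earlier in this tutorial.
mock_ages = [50, 49, 45, 70, 35]
real_ages = [40, 39, 35, 60, 25]

mock_mean = mean(mock_ages)
real_mean = mean(real_ages)

# Shifting every value by a constant shifts the mean by the same constant.
offset_recovered = mock_mean - real_mean
print(mock_mean, real_mean)
```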


**Success! The external data scientist was able to learn the average age of breast cancer patients in a USA regional hospital without accessing or even looking at the real data.**

Once you are done with this tutorial, you can safely shut down the server as follows:

In [24]:
node.land()

Stopping hello-syft-usa-server
