# Syft Functions

## Install

In [None]:
SYFT_VERSION = ">=0.9,<1.0.0"
package_string = f'"syft{SYFT_VERSION}"'
# %pip install {package_string} -q

In [None]:
# syft absolute
import syft as sy

sy.requires(SYFT_VERSION)

In [None]:
server = sy.orchestra.launch(
    name="syft-functions-example-datasite-1", port=7022, reset=True,
)

## Setup

Let's log in with our root user.

In [None]:
# syft absolute

admin_client = server.login(email="info@openmined.org", password="changethis")

Create a dummy dataset for experimenting

In [None]:
# third party
import numpy as np

dataset = sy.Dataset(
    name="my dataset",
    asset_list=[
        sy.Asset(name="my asset", data=np.array([1, 2, 3]), mock=np.array([1, 1, 1])),
    ],
)
admin_client.upload_dataset(dataset)

Create a new user to use as a data scientist account

In [None]:
admin_client.register(
    name="Jane Doe",
    email="jane@caltech.edu",
    password="abc123",
    password_verify="abc123",
    institution="Caltech",
    website="https://www.caltech.edu/",
)

In [None]:
guest_client = server.client.login(email="jane@caltech.edu", password="abc123")

## Defining a Syft Function

Let's say you want to compute the mean of some numbers remotely with PySyft. How do you do that? Pretty easy actually:

In [None]:
@sy.syft_function_single_use()
def func():
    # run some computation
    data = list(range(100))
    return sum(data) / 100

## Input Policies

That's great, but what if we want to run this function with some parameters? Maybe even some private data (why do remote data science without remote data?). This is where Input Policies come into play. Their purpose is to define the rules that the inputs of a Syft function must follow. At the moment we provide what we call an `ExactMatch` policy, which allows data scientists to specify a private asset they would like to use, just like this:

In [None]:
asset = guest_client.datasets[0].assets[0]

In [None]:
@sy.syft_function(
    input_policy=sy.ExactMatch(data=asset),
    output_policy=sy.SingleExecutionExactOutput(),
)
def mean(data):
    return sum(data) / len(data)
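
To make the exact-match idea concrete, here is a minimal sketch in plain Python. It is a toy model and not PySyft's implementation: the `ToyExactMatch` class and its `check` method are hypothetical names invented for illustration. The rule accepts a call only when the keyword arguments are exactly the ones named up front, bound to exactly the same objects.

```python
class ToyExactMatch:
    """Toy input rule (hypothetical, not the PySyft class)."""

    def __init__(self, **allowed):
        # Remember which object each keyword argument must refer to.
        self.allowed = allowed

    def check(self, **kwargs):
        # The argument names must match exactly...
        if kwargs.keys() != self.allowed.keys():
            return False
        # ...and each name must be bound to the very same object.
        return all(kwargs[name] is self.allowed[name] for name in kwargs)


approved_asset = [1, 1, 1]  # stand-in for the asset named in the rule
other_asset = [4, 5, 6]     # stand-in for some other asset

toy_input_rule = ToyExactMatch(data=approved_asset)
print(toy_input_rule.check(data=approved_asset))  # True
print(toy_input_rule.check(data=other_asset))     # False
print(toy_input_rule.check())                     # False
```

Comparing by identity (`is`) rather than by value is a deliberate choice in this sketch: two assets with equal contents would still count as different inputs.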

## Output Policies

You have probably noticed that in the last example we also specified the output policy. Its purpose is to control how the results of a given function are released and under what conditions. For example, if a data owner and a data scientist agree on the content of a function run on a datasite and on what private data it can be run on, their work might not be done yet. They might still negotiate how many times that function can be run, whether or not the data scientist can have access, or what happens before the output is released (maybe we add some noise, as in differential privacy). At the moment we have policies that allow data scientists to ask for a certain number of runs of a function, but the one you will find most often is `SingleExecutionExactOutput`, which asks for a single use of a function. We have used it so much that we came up with the `syft_function_single_use` decorator, which uses that output policy by default. What is also cool is that you can pass the inputs for the input policy directly to this decorator to get a shorter version, like this:

In [None]:
# same functionality as before, just faster to write


@sy.syft_function_single_use(data=asset)
def mean(data):  # noqa: F811
    return sum(data) / len(data)

We are working on extending the functionalities of these policies to truly accomplish the goals we have in mind for them. However, if you have a specific use case in mind and can't wait to use it in your remote data science pipeline, check the custom policies notebook that teaches you how to implement your own input and output policies (and also reuse other users' submitted policies)!
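
As a rough mental model of an output rule that limits the number of runs, here is a minimal sketch in plain Python. It is not PySyft's implementation: `ToyExecutionLimit`, its `run` method, and `toy_mean` are hypothetical names used only for illustration, and a real policy would be enforced on the datasite rather than in the caller's process.

```python
class ToyExecutionLimit:
    """Toy output rule (hypothetical): allow at most `limit` runs."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def run(self, func, *args, **kwargs):
        # Refuse once the agreed number of runs is used up.
        if self.count >= self.limit:
            raise PermissionError("execution limit reached")
        self.count += 1
        return func(*args, **kwargs)


def toy_mean(values):
    return sum(values) / len(values)


single_use = ToyExecutionLimit(limit=1)
first = single_use.run(toy_mean, [1, 2, 3])
print(first)  # 2.0

try:
    single_use.run(toy_mean, [1, 2, 3])
    second_allowed = True
except PermissionError:
    second_allowed = False
print(second_allowed)  # False
```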

## Testing it Locally

"Right, so we have defined a function for remote use, but can I run it locally?" you are probably asking.

Yeah, of course you can!

In [None]:
func()

"Sure, but what about functions on the assets? That can't work!"

YEAH IT CAN!!

In [None]:
mean(data=asset)

If you paid attention when we defined the dataset, you probably noticed that we specified both **the private data and the mock data** for the asset, **and this call runs on the mock data**. We use the mock data to test functions on the data scientist side. Mock data requires no special access or permissions, because it is public. It can be data that only matches the structure of the private data, or even synthetic data if the data owner provides it. Its main goal is to let data scientists test their functions locally before submitting a request, which filters out noisy requests in the process. If you would like to learn more about the data owner experience, please check out the notebooks under the tutorials section.
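
The difference between the two copies can be worked through in plain Python using the same numbers as the dataset above. This is only an illustration: `ToyAsset` and `toy_mean` are hypothetical stand-ins, not PySyft objects.

```python
from dataclasses import dataclass


@dataclass
class ToyAsset:
    """Hypothetical stand-in for an asset: private data plus a public mock."""

    data: list  # private values, only available on the data owner's side
    mock: list  # public values, safe to share with data scientists


def toy_mean(values):
    return sum(values) / len(values)


toy_asset = ToyAsset(data=[1, 2, 3], mock=[1, 1, 1])

# A local test run only ever sees the mock values.
local_result = toy_mean(toy_asset.mock)
print(local_result)  # 1.0

# The real answer comes from the private values.
private_result = toy_mean(toy_asset.data)
print(private_result)  # 2.0
```

The local result of `1.0` confirms that the function runs end to end, but says nothing about the private values, whose mean is `2.0`.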

## Submitting it for Approval

Now that we are sure our function works as intended on the mock data, we are ready to submit a request. The cleanest way to do that is to first create a project and attach your request there.

In [None]:
# Create a project
new_project = sy.Project(
    name="My Cool Project",
    # Adjacent string literals avoid embedding the indentation in the text.
    description=(
        "Hi, I want to calculate the mean of your private data, "
        "pretty please!"
    ),
    members=[guest_client],
)
new_project

Now let's add a code request to the project:

In [None]:
new_project.create_code_request(mean, guest_client)

Now we can start our project by simply running:

In [None]:
project = new_project.send()
project

## Checking Approval

Very cool, now let's run our function with private data!

In [None]:
guest_client.code.mean(data=asset)

Right! Our code was not approved, so we should wait for the review from the data owner. As we also deployed the datasite, we will do that quickly here, but for more details on what is happening check the data owner sections under tutorials:

In [None]:
request = admin_client.notifications[-1].link.requests[0]
request

In [None]:
request.code

Now that we have inspected the code, we can approve it:

In [None]:
request.approve()

## Executing your Function

Good, now we are finally ready to run the function on private data:

In [None]:
res = guest_client.code.mean(data=asset)
res

Notice that the result we see is still `1.0`, which looks like the result on the mock data. That is because it actually is! The object returned is an `ActionObject`, which here behaves like a pointer to the data on the datasite:

In [None]:
isinstance(res, sy.ActionObject)

In [None]:
type(res)

If the data owner decides not to approve the request, they call:

In [None]:
request.deny(reason="you cannot have access")

In [None]:
res_denied = guest_client.code.mean(data=asset)
res_denied

In that case our call returns a `SyftError`.
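
The review gate described in this section can be pictured with a small plain-Python model. This is a sketch under stated assumptions, not how PySyft implements requests: `ToyStatus`, `ToyCodeRequest`, and the error strings are all hypothetical, and the real client returns a `SyftError` object rather than a string.

```python
from enum import Enum


class ToyStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"


class ToyCodeRequest:
    """Hypothetical model of a code request awaiting review."""

    def __init__(self, func):
        self.func = func
        self.status = ToyStatus.PENDING
        self.reason = None

    def approve(self):
        self.status = ToyStatus.APPROVED

    def deny(self, reason):
        self.status = ToyStatus.DENIED
        self.reason = reason

    def call(self, *args, **kwargs):
        # Only approved code may run; otherwise report why not.
        if self.status is not ToyStatus.APPROVED:
            detail = self.reason or "awaiting review"
            return f"error: {self.status.value} ({detail})"
        return self.func(*args, **kwargs)


toy_request = ToyCodeRequest(lambda data: sum(data) / len(data))
before_review = toy_request.call(data=[1, 2, 3])

toy_request.approve()
after_approval = toy_request.call(data=[1, 2, 3])

toy_request.deny(reason="you cannot have access")
after_denial = toy_request.call(data=[1, 2, 3])

print(before_review)   # error: pending (awaiting review)
print(after_approval)  # 2.0
print(after_denial)    # error: denied (you cannot have access)
```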

## Downloading Results

To get the real data we need one more step:

In [None]:
real_res = res.get()
real_res

In [None]:
assert real_res == 2.0

We can check the type of the result to see it's real data:

In [None]:
type(real_res)