# Working with Private Datasets

## Install

In [None]:
SYFT_VERSION = ">=0.8.1b0,<0.9"
package_string = f'"syft{SYFT_VERSION}"'
# !pip install {package_string} -f https://whls.blob.core.windows.net/unstable/index.html

In [None]:
import syft as sy
sy.requires(SYFT_VERSION)

In [None]:
node = sy.orchestra.launch(name="pandas-test-domain-1", port=8083, reset=True)

## Setup

For the purpose of this tutorial, we create a very simple dataset that is owned by the root client.

In [None]:
root_client = node.login(email="info@openmined.org", password="changethis")

In [None]:
import numpy as np

In [None]:
dataset = sy.Dataset(
    name="my dataset",
    asset_list=[
        sy.Asset(
            name="my asset",
            data=np.array([1, 2, 3]),  # private data, stays on the server
            mock=np.array([1, 1, 1]),  # fake data with the same type and shape
        )
    ],
)
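Each asset pairs the private `data` with a public `mock` that should have the same type and characteristics (e.g. list size) but fake values. A data owner may want to sanity-check that pairing before uploading. The sketch below uses plain Python lists in place of NumPy arrays, and `check_mock_matches` is a hypothetical helper, not part of PySyft.

```python
def check_mock_matches(data, mock):
    """Hypothetical helper (not a PySyft API): check that a mock mirrors
    the real data's container type, length and element types."""
    if type(data) is not type(mock):
        return False
    if len(data) != len(mock):
        return False
    # Every element should have the same type as its counterpart
    return all(type(d) is type(m) for d, m in zip(data, mock))


real = [1, 2, 3]  # stands in for np.array([1, 2, 3])
mock = [1, 1, 1]  # stands in for np.array([1, 1, 1])

print(check_mock_matches(real, mock))        # True
print(check_mock_matches(real, [1.0, 1.0]))  # False: wrong length
```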

In [None]:
root_client.upload_dataset(dataset)

In [None]:
root_client.register(
    name="Jane Doe",
    email="jane@caltech.edu",
    password="abc123",
    institution="Caltech",
    website="https://www.caltech.edu/",
)

## Mocks

In [None]:
guest_client = node.client.login(email="jane@caltech.edu", password="abc123")

Let's inspect the datasets from the data scientist's perspective.

In [None]:
datasets = guest_client.api.services.dataset
datasets

Datasets contain assets; in our case there is only one asset.

In [None]:
asset = datasets[0].assets[0]
asset

When you get a reference to an asset as a data scientist using PySyft, you almost never get the real data. Instead, you will usually get a mock object: an object with the same type and characteristics (e.g. list size) as the real data, but filled with fake values. In PySyft, you can access mock objects in two ways. The first is the `Asset.mock_data` attribute:

In [None]:
mock = asset.mock_data

As we can see, the mock data is just a native library type (here a NumPy array), not a type created by PySyft.

In [None]:
type(mock), mock

We can use mock objects to write and test code locally, and then pass that same code to a `@syft_function` to execute it remotely on the real data. For example:

In [None]:
x = mock + 3
y = x ** 2

In [None]:
@sy.syft_function(input_policy=sy.ExactMatch(inp=asset),
                  output_policy=sy.SingleExecutionExactOutput())
def add_pow(inp):
    x = inp + 3
    y = x ** 2
    return y

We won't go deeper into the approval flow for executing this function here; for more, see the `syft function` tutorial.
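Because the mock has the same type as the real data, the body of `add_pow` can be debugged locally on the mock before it is submitted. The plain-Python sketch below mimics that workflow with lists; `add_pow_local` is a hypothetical stand-in for the decorated function, and its element-wise list comprehension replaces NumPy's vectorized `+` and `**`.

```python
def add_pow_local(inp):
    """Plain-Python stand-in for the body of add_pow: (x + 3) ** 2, element-wise."""
    return [(x + 3) ** 2 for x in inp]


mock = [1, 1, 1]  # public mock, safe to experiment with
real = [1, 2, 3]  # private data; only the data owner's server would see it

# Develop and check the logic against the mock first...
print(add_pow_local(mock))  # [16, 16, 16]
# ...for illustration only: what the server would compute on the real data
print(add_pow_local(real))  # [16, 25, 36]
```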

## Eager Execution

`@syft_function`s are useful, but they have two downsides:

- not every data owner wants to execute raw Python code
- you can only execute the code remotely once you get approval

The second way to access a reference to our asset is via `Asset.pointer`. `Pointer`s are objects that point to data on the server. They can hold mock data as well, but they rarely contain the real data. When you use a `Pointer` to do a computation, PySyft does the following:

- a) the computation is performed locally on the mock data
- b) the client sends an `Action` to the server, which causes the computation to be performed on the server
- c) we create a new `Pointer` as a result, which contains the locally created mock data and points to the result on the server

We call steps b) and c) side effects.
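Steps a) to c) can be pictured with a toy, in-memory model. `ToyServer` and `ToyPointer` below are hypothetical classes, not PySyft's implementation: here an "action" is just a Python callable plus an object id, and the "server" lives in the same process, whereas real PySyft sends a message to a remote node.

```python
import itertools


class ToyServer:
    """Hypothetical in-memory server: stores real values and applies actions."""

    def __init__(self):
        self.store = {}
        self._ids = itertools.count()

    def put(self, value):
        obj_id = next(self._ids)
        self.store[obj_id] = value
        return obj_id

    def execute(self, action):
        # (b) the server repeats the computation on the real data
        op, target_id = action
        return self.put(op(self.store[target_id]))


class ToyPointer:
    """Hypothetical pointer: holds a local mock plus the id of the real data."""

    def __init__(self, server, obj_id, mock):
        self.server = server
        self.obj_id = obj_id
        self.mock = mock

    def _apply(self, op):
        local_mock = op(self.mock)                             # (a) compute on the mock
        result_id = self.server.execute((op, self.obj_id))     # (b) send an action
        return ToyPointer(self.server, result_id, local_mock)  # (c) new pointer

    def sum(self):
        return self._apply(sum)


server = ToyServer()
pointer = ToyPointer(server, server.put([1, 2, 3]), mock=[1, 1, 1])
result = pointer.sum()
print(result.mock)                  # 3, computed locally on the mock
print(server.store[result.obj_id])  # 6, computed on the "server"
```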

In [None]:
pointer = asset.pointer

In [None]:
pointer

In [None]:
pointer.sum()

So the `.sum` method we just called performed steps a), b) and c) behind the scenes. The same happens for so-called dunder methods, which Python calls implicitly for operators such as `pointer + 1`. Under the hood, `pointer + 1` is syntactic sugar for `pointer.__add__(1)`, which allows the `Pointer` to intercept the call and create the side effects.

In [None]:
pointer2 = pointer + 1
pointer2

Notice also that to call `__add__` with `1` as an argument, the server needs to have `1` as well. Therefore, when we pass arguments to methods, Syft pointerizes them as a side effect before the action is executed on the server.
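Both mechanisms, intercepting `+` through `__add__` and uploading a plain argument before the action runs, can be sketched with a toy model. The classes and the `add` helper are hypothetical stand-ins, not PySyft code; the "server" is an in-process dictionary, and a public literal such as `1` simply serves as its own mock.

```python
import itertools


def add(a, b):
    """Element-wise addition of a list and a list or scalar (a tiny NumPy stand-in)."""
    if isinstance(b, list):
        return [x + y for x, y in zip(a, b)]
    return [x + b for x in a]


class ToyServer:
    """Hypothetical in-memory server mapping object ids to real values."""

    def __init__(self):
        self.store = {}
        self._ids = itertools.count()

    def put(self, value):
        obj_id = next(self._ids)
        self.store[obj_id] = value
        return obj_id


class ToyPointer:
    """Hypothetical pointer whose __add__ intercepts `pointer + other`."""

    def __init__(self, server, obj_id, mock):
        self.server = server
        self.obj_id = obj_id
        self.mock = mock

    def __add__(self, other):
        # Pointerize a plain argument: upload it so the server has a copy too
        if not isinstance(other, ToyPointer):
            other = ToyPointer(self.server, self.server.put(other), mock=other)
        local_mock = add(self.mock, other.mock)            # local step on the mocks
        real = add(self.server.store[self.obj_id],
                   self.server.store[other.obj_id])        # simulated server step
        return ToyPointer(self.server, self.server.put(real), local_mock)


server = ToyServer()
pointer = ToyPointer(server, server.put([1, 2, 3]), mock=[1, 1, 1])
pointer2 = pointer + 1                # Python calls pointer.__add__(1)
print(pointer2.mock)                  # [2, 2, 2]
print(server.store[pointer2.obj_id])  # [2, 3, 4]
print(len(server.store))              # 3: original data, uploaded 1, result
```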

This gives us a fairly complete picture of how to execute methods on pointers. Sometimes, however, we want to create objects from scratch rather than as the result of a method. In eager execution, this means creating a pointer on the server.

In [None]:
pointer3 = guest_client.api.lib.numpy.array([4,5,6])
pointer3

This also created a pointer. In this case we can see the real data (not a mock), because we own this data. The `client.api.lib.<path>` pattern works for both functions and classes. Moreover, we can combine the new pointer with the original one in the same way as before:

In [None]:
pointer3 = guest_client.api.lib.numpy.add(pointer, pointer3)
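One way to picture the `client.api.lib.<path>` pattern is a proxy object that records the dotted attribute path you type and dispatches the call when you invoke it. The sketch below is a hypothetical, local-only illustration of that idea using the standard library; `ToyLibProxy` and its registry are invented for this example and do not reflect how PySyft actually implements `api.lib`.

```python
import math


class ToyLibProxy:
    """Hypothetical proxy: attribute access builds a dotted path such as
    'math.sqrt'; calling it looks the path up in a local registry."""

    def __init__(self, registry, path=""):
        self._registry = registry
        self._path = path

    def __getattr__(self, name):
        # Only called for names not found normally, i.e. path segments
        new_path = f"{self._path}.{name}" if self._path else name
        return ToyLibProxy(self._registry, new_path)

    def __call__(self, *args, **kwargs):
        # A remote client would send the path and arguments to the server instead
        return self._registry[self._path](*args, **kwargs)


lib = ToyLibProxy({"math.sqrt": math.sqrt, "builtins.list": list})
print(lib.math.sqrt(16.0))       # 4.0, calling a function
print(lib.builtins.list("abc"))  # ['a', 'b', 'c'], calling a class
```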

For methods, functions and classes, we can use autocomplete and inline documentation. In a Jupyter notebook, press `Tab` to complete a name, or type the method name and the opening parenthesis and press `Shift+Tab` to see its signature, e.g. for `pointer.max()`.

**Place the cursor inside the `()` and press `Shift+Tab` to see the signature.**

In [None]:
pointer.max()

Note that the same works for `guest_client.api.lib.numpy.some_function`.
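Outside Jupyter, Python's standard `inspect` module gives similar signature information. The function below is a hypothetical stand-in for a method such as `pointer.max()`; whether a given PySyft pointer method exposes a useful signature this way is an assumption not established by this tutorial.

```python
import inspect


def max_sketch(axis=None, keepdims=False):
    """Hypothetical stand-in for a method such as pointer.max()."""
    return None


sig = inspect.signature(max_sketch)
print(sig)                         # (axis=None, keepdims=False)
print(list(sig.parameters))        # ['axis', 'keepdims']
print(inspect.getdoc(max_sketch))  # the docstring, as Shift+Tab would show it
```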

When we are done with our computations, we can request the real result instead of the mock, using the `Pointer.request()` method:

In [None]:
pointer3.request(guest_client)

The data owner can now approve this request:

In [None]:
root_client = node.login(email="info@openmined.org", password="changethis")

In [None]:
requests = root_client.api.services.request.get_all()
requests

In [None]:
requests[0].approve_with_client(root_client)

This allows the data scientist to download the result:

In [None]:
pointer3.get_from(guest_client)
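The request, approve and download cycle can be modelled as a small state machine in which the real result stays hidden until the data owner approves. The sketch below is a hypothetical, in-memory model: `ToyRequest` and its methods are illustrative names, not PySyft's request API. The value `[5, 7, 9]` is what `numpy.add` returns for `[1, 2, 3]` and `[4, 5, 6]`.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"


class ToyRequest:
    """Hypothetical request for the real value behind a pointer."""

    def __init__(self, requester, owner, real_value):
        self.requester = requester
        self.owner = owner
        self._real_value = real_value
        self.status = Status.PENDING

    def approve(self, approver):
        # Only the data owner may release the real data
        if approver != self.owner:
            raise PermissionError(f"{approver} is not the data owner")
        self.status = Status.APPROVED

    def get(self):
        # The requester can only download the result after approval
        if self.status is not Status.APPROVED:
            raise PermissionError("request is still pending approval")
        return self._real_value


req = ToyRequest("jane@caltech.edu", "info@openmined.org", real_value=[5, 7, 9])
try:
    req.get()
except PermissionError as err:
    print(err)  # request is still pending approval
req.approve("info@openmined.org")
print(req.get())  # [5, 7, 9]
```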

## Action Service

### Listing the Services

### Autocomplete Service Methods

### Viewing Method Signatures

## Simple Example

## Request the Result