In [1]:
import syft as sy

domain_client = sy.login(
    email="info@openmined.org",
    password="changethis",
    port=8081
)


Anyone can login as an admin to your node right now because your password is still the default PySyft username and password!!!

Connecting to localhost... done! 	 Logging into mystifying_sophia... done!


In [2]:
domain_client.datasets

| Idx | Name | Description | Assets | Id |
|-----|------|-------------|--------|----|
| [0] | BreastCancerDataset | Invasive Ductal Carcinoma (IDC) is the most common subtype of all breast cancers. The modified dataset consisted of 162 whole mount slide images of Breast Cancer (BCa) specimens scanned at 40x. Patches of size 50 x 50 were extracted from the original image. The labels 0 is non-IDC and 1 is IDC. | ["train_images"] -> ["train_labels"] -> | bc534f80-df34-4475-b385-96ed41d02d38 |
| [1] | BreastCancerDataset2 | Invasive Ductal Carcinoma (IDC) is the most common subtype of all breast cancers. The modified dataset consisted of 162 whole mount slide images of Breast Cancer (BCa) specimens scanned at 40x. Patches of size 50 x 50 were extracted from the original image. The labels 0 is non-IDC and 1 is IDC. | ["train_images"] -> ["train_labels"] -> | 31378d5d-98c3-431f-ad47-8c815b91f93f |


In [3]:
images = domain_client.datasets[1]["train_images"]
labels = domain_client.datasets[1]["train_labels"]

## User Flows for Remote Model Training

### I. Create Whole Model Remotely

1. User creates a remote model. This sends an empty model class to the domain and returns a model pointer.

```python
from syft.core.tensor import nn

model_ptr = nn.Model(to=domain)
```

2. All subsequent operations act on the model pointer, and each operation returns a new pointer.

```python
model_ptr.add(nn.Convolution(1, (3, 3), input_shape=(None, 1, 28, 28)))
model_ptr.add(nn.MaxPool((2, 2)))
model_ptr.add(nn.Convolution(2, (4, 4)))
model_ptr.add(nn.MaxPool((2, 2)))
model_ptr = model_ptr.compile(to=domain, loss=BinaryCrossEntropy(), optimizer=Adamax())
```

3. User calls the .fit method on the model pointer, passing the image and label pointers (X_train_ptr and y_train_ptr) as inputs.

```python
model_ptr.fit(X_train_ptr, y_train_ptr, max_iter=10, validation_split=0.1, batch_size=100)
```
4. User calls the .weights method to get a weights pointer, then publishes it and downloads the result.

```python
model_weights_ptr = model_ptr.weights()
published_weights = model_weights_ptr.publish()
public_weights = published_weights.get()
```

#### Pros and Cons
------------------
- One drawback is request overhead: every operation on model_ptr (adding a layer, compile, fit, etc.) sends a new request to the domain. On top of that, each request requires the model stored in the database behind the model pointer to be deserialized and serialized again, which adds further cost.
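
To make the request overhead concrete, here is a minimal sketch that counts round trips for this flow. `CountingPointer` is a hypothetical stand-in, not a PySyft class: it treats every method call as exactly one request to the domain, which is an assumption taken from the bullet above rather than a measurement.

```python
class CountingPointer:
    """Hypothetical stand-in for a model pointer; counts one request per call."""

    def __init__(self) -> None:
        # Creating the remote model is itself one request.
        self.requests = 1

    def __getattr__(self, name: str):
        # Only reached for attributes that do not exist, i.e. the "remote" methods.
        def remote_call(*args, **kwargs) -> "CountingPointer":
            self.requests += 1
            return self

        return remote_call


ptr = CountingPointer()
for layer in ["conv1", "pool1", "conv2", "pool2"]:
    ptr.add(layer)  # four requests, one per layer
ptr.compile()       # one request
ptr.fit()           # one request

flow_one_requests = ptr.requests  # 1 (create) + 4 (add) + 1 (compile) + 1 (fit)
```

Under the same assumption, building the layers locally would leave only the compile and fit requests.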

### II. Create Model locally and train remotely

1. User creates the model locally

```python
from syft.core.tensor import nn

model_net = nn.Model()
model_net.add(nn.Convolution(1, (3, 3), input_shape=(None, 1, 28, 28)))
model_net.add(nn.MaxPool((2, 2)))
model_net.add(nn.Convolution(2, (4, 4)))
model_net.add(nn.MaxPool((2, 2)))
```

2. User calls .compile to initialize the model. This sends the model to the domain and returns a model pointer.

```python
model_ptr = model_net.compile(to=domain, loss=BinaryCrossEntropy(), optimizer=Adamax())
```

3. User calls the .fit method on the model pointer, passing the image and label pointers (X_train_ptr and y_train_ptr) as inputs.
```python
model_ptr.fit(X_train_ptr, y_train_ptr, max_iter=10, validation_split=0.1, batch_size=100)
```

4. Monitor training progress. This prints information like {"loss": "", "epoch": "", .....}
```python
model_ptr.progress()
```


5. User calls the .weights method to get a weights pointer, then publishes it and downloads the result.

```python
model_weights_ptr = model_ptr.weights()
published_weights = model_weights_ptr.publish()
public_weights = published_weights.get()
```


#### Pros and Cons
-----------------
- The user creates the model locally, so changes to the model are faster. The model is only sent to the domain (returning a model pointer) when the user calls .compile, which is equivalent to initializing the model weights.
- We may need to track the progress of each remote epoch and convey it to the user during the .fit operation. One easy way is to save the details of each epoch in the DB and expose a progress endpoint that the Python client can poll at regular intervals.
- The .fit method hides all of the code complexity, so more niche customization will not be possible at first. We can extend this later: start with a high-level API like Keras and provide lower-level APIs afterwards.
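
The polling idea in the second bullet can be sketched without any PySyft calls. Everything here is hypothetical: `ProgressStore` stands in for the DB table, `record_epoch` for what the domain might write after each epoch, and `poll_progress` for the client-side loop that something like model_ptr.progress() could run against a progress endpoint.

```python
import time
from typing import Callable, List, Optional


class ProgressStore:
    """Hypothetical stand-in for the DB table holding per-epoch details."""

    def __init__(self, total_epochs: int) -> None:
        self.total_epochs = total_epochs
        self._records: List[dict] = []

    def record_epoch(self, epoch: int, loss: float) -> None:
        # Domain side: called once an epoch has finished.
        self._records.append({"epoch": epoch, "loss": loss})

    def latest(self) -> Optional[dict]:
        # What a progress endpoint would return to the client.
        return self._records[-1] if self._records else None

    def done(self) -> bool:
        return len(self._records) >= self.total_epochs


def poll_progress(
    store: ProgressStore,
    interval: float = 0.0,
    on_update: Callable[[dict], None] = print,
    max_polls: int = 1000,
) -> List[dict]:
    """Poll at regular intervals and report each new record once."""
    seen: List[dict] = []
    last_epoch = None
    for _ in range(max_polls):
        latest = store.latest()
        if latest is not None and latest["epoch"] != last_epoch:
            last_epoch = latest["epoch"]
            seen.append(latest)
            on_update(latest)
        if store.done():
            break
        time.sleep(interval)
    return seen
```

Because the client only reads the latest record, epochs that finish between two polls are skipped in the output; a real endpoint might return the full history instead.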

### III. Create Model locally and train remotely but only per epoch

1. User creates the model locally.

```python
from syft.core.tensor import nn

model_net = nn.Model()
model_net.add(nn.Convolution(1, (3, 3), input_shape=(None, 1, 28, 28)))
model_net.add(nn.MaxPool((2, 2)))
model_net.add(nn.Convolution(2, (4, 4)))
model_net.add(nn.MaxPool((2, 2)))
```

2. User calls .compile to initialize the model. This sends the model to the domain and returns a model pointer.

```python
model_ptr = model_net.compile(to=domain)
```

3. User calls the .step method on the model pointer while looping through batches of data. Each step call predicts, calculates the loss, backpropagates and updates the weights via the optimizer, but only for one batch. We keep track of the losses and predictions so that we can return them later.

```python

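batch_size = 32
rows = train_Y_ptr.public_shape[0]
epochs = 10

# Collect the per-step losses and prediction pointers so they can be returned later.
losses, preds = [], []

for epoch in range(epochs):
    # Integer division drops any trailing partial batch.
    for i in range(rows // batch_size):
        batch_begin = i * batch_size
        batch_end = batch_begin + batch_size
        x_batch_ptr = train_X_ptr[batch_begin:batch_end]
        y_batch_ptr = train_Y_ptr[batch_begin:batch_end]

        # One remote step: predict, compute the loss, backpropagate, update weights.
        step_loss, preds_ptr = model_ptr.step(
            x_batch_ptr,
            y_batch_ptr,
            validation_split=0.1,
            batch_size=batch_size
        )
        losses.append(step_loss)
        preds.append(preds_ptr)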
```
4. User calls the .weights method to get a weights pointer, then publishes it and downloads the result.

```python
model_weights_ptr = model_ptr.weights()
published_weights = model_weights_ptr.publish()
public_weights = published_weights.get()
```

#### Pros and Cons
-----------------
- The user has more control over how long they train the model.
- The user creates the model locally, so changes to the model are faster. The model is only sent to the domain (returning a model pointer) when the user calls .compile, which is equivalent to initializing the model weights.
- No need to track progress.
- Predictions are returned at each step, so the user can select which predictions to keep.
- Tracking total_preds and total_loss adds to storage.
- Each step adds networking overhead.
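
The networking cost of this flow grows with the number of step calls. The sketch below mirrors the index arithmetic of the training loop in plain Python; `batch_bounds` and `count_step_requests` are illustrative helper names, and the figure of one request per step is an assumption.

```python
from typing import Iterator, Tuple


def batch_bounds(rows: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (begin, end) slice bounds; a trailing partial batch is dropped."""
    for i in range(rows // batch_size):
        begin = i * batch_size
        yield begin, begin + batch_size


def count_step_requests(rows: int, batch_size: int, epochs: int) -> int:
    """Number of step calls, assuming each call is one request to the domain."""
    return epochs * (rows // batch_size)


bounds = list(batch_bounds(rows=100, batch_size=32))
requests = count_step_requests(rows=100, batch_size=32, epochs=10)
```

With 100 rows and a batch size of 32, the last 4 rows never reach the model, and 10 epochs cost 30 step calls. A larger batch size lowers the request count at the price of coarser control.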

### Making publish fast on model weights download: Dynamic Privacy Budget (PB) Tracking of Model Weights

- Instead of calculating the privacy budget (PB) and everything else at model_weights_ptr.publish(), i.e. at the very end, which would be computationally expensive, we can track the privacy budget at each step.
- We will keep track of the PB spent against the model weights at each backprop in the database.
- When the user calls model_weights_ptr.publish(), we can then simply deduct the PB and return the weights, which would be quick.

This is similar to adding items to your cart on Amazon. Think of the items as model weights and the PB as the cost of an item, and assume for now that we either buy all of the items or none of them. Adding an item to the cart is equivalent to running one round of forward and backward propagation and calculating the PB spent on that round against the model weights.
We may add x items to the cart (run the model for x iterations) and never check out (never call publish on the model weights), in which case the calculated PB is never spent. But if the user does check out, we can quickly compute the total because the cost was already calculated as each item was added. In model training terms, the PB is already calculated at each step, so the final publish step is fast.
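
The cart analogy can be sketched as a small ledger. `PrivacyBudgetLedger` and its methods are hypothetical names, not PySyft APIs, and summing the per-step costs is a simplifying assumption: a real privacy accountant may compose the spend of successive steps differently.

```python
from dataclasses import dataclass, field
from typing import Dict


class InsufficientBudgetError(Exception):
    """Raised when the pending spend exceeds the remaining budget."""


@dataclass
class PrivacyBudgetLedger:
    """Hypothetical per-user ledger with a running PB total per set of weights."""

    remaining_budget: float
    # The "cart": PB accumulated so far, keyed by model-weights id.
    pending: Dict[str, float] = field(default_factory=dict)

    def record_step(self, weights_id: str, pb_spent: float) -> None:
        # After each backprop, add the cost of that step to the cart.
        self.pending[weights_id] = self.pending.get(weights_id, 0.0) + pb_spent

    def publish(self, weights_id: str) -> float:
        # Checkout: the total is already known, so this is a lookup and a subtraction.
        total = self.pending.get(weights_id, 0.0)
        if total > self.remaining_budget:
            raise InsufficientBudgetError(
                f"need {total}, only {self.remaining_budget} left"
            )
        self.remaining_budget -= total
        del_key = self.pending.pop(weights_id, None)  # empty the cart
        return total


ledger = PrivacyBudgetLedger(remaining_budget=10.0)
for _ in range(3):
    ledger.record_step("weights-1", 1.5)  # three training steps
spent = ledger.publish("weights-1")
```

Until `publish` is called, `remaining_budget` stays untouched, which matches the case where the user trains for x iterations but never checks out.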
    