In [1]:
epochs = 25

# SplitNN for Vertically Partitioned Data

Recap: The previous tutorial looked at building a basic SplitNN, where a neural network was split into two segments on two separate hosts. But what if several institutions, each holding a different modality of data, want to collaborate?

<b>What is Vertically Partitioned Data?</b> Data is said to be vertically partitioned when several organizations own different attributes or modalities of information for the same set of entities.
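
As a minimal sketch of this definition, the toy table below treats rows as entities and columns as attributes. The patient count, the feature count, and the names `radiology_features` and `pathology_features` are hypothetical and chosen only for illustration.

```python
import torch

# Hypothetical toy dataset: 4 patients (rows) x 6 attributes (columns).
full_table = torch.arange(24, dtype=torch.float32).reshape(4, 6)

# Vertical partitioning: each organization keeps every row (entity)
# but only a subset of the columns (attributes).
radiology_features = full_table[:, :4]   # columns 0-3
pathology_features = full_table[:, 4:]   # columns 4-5

# Row i in both partitions describes the same patient, so joining the
# columns again recovers the original table.
rejoined = torch.cat([radiology_features, pathology_features], dim=1)

print(radiology_features.shape, pathology_features.shape)
print(torch.equal(rejoined, full_table))
```

The important property is row alignment: both parties must order (or key) their records by the same entities, otherwise the concatenated features would mix different patients.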

<b>Why use Partitioned Data?</b> Partitioning allows organizations holding different modalities of data to learn distributed models without sharing that data. Partitioning schemes are traditionally used to reduce the size of data by splitting it and distributing the parts to each client.
 
<b>Description</b> This configuration allows multiple clients holding different modalities of data to learn a distributed model without sharing data. As a concrete example, we walk through the case where radiology centers collaborate with pathology test centers and a server for disease diagnosis. Radiology centers holding imaging data train a partial model up to the cut layer. In the same way, the pathology test center holding patient test results trains a partial model up to its own cut layer. The outputs at the cut layer from both centers are then concatenated and sent to the disease diagnosis server, which trains the rest of the model. This process is repeated back and forth to complete the forward and backward propagations, training the distributed deep learning model without any party sharing its raw data. In this tutorial, we split a single flattened image into two segments to mimic different modalities of data; you can also split it into an arbitrary number of segments.

<img src="config_1.png" width="40%">
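
The forward and backward passes described above can be sketched in plain PyTorch, running in a single process and without PySyft. This is only an illustration of how gradients flow through the concatenated cut-layer outputs, not the distributed implementation used in this tutorial; the layer sizes, batch size, and variable names are assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Two client-side segments and one server-side segment (illustrative sizes).
client_1_model = nn.Sequential(nn.Linear(20, 16), nn.ReLU())
client_2_model = nn.Sequential(nn.Linear(20, 16), nn.ReLU())
server_model = nn.Sequential(nn.Linear(32, 10), nn.LogSoftmax(dim=1))

batch_size = 8
x1 = torch.randn(batch_size, 20)            # client 1's view of each sample
x2 = torch.randn(batch_size, 20)            # client 2's view of the same samples
y = torch.randint(0, 10, (batch_size,))     # labels, held by the server

# Forward: each client computes activations up to its own cut layer.
cut_1 = client_1_model(x1)
cut_2 = client_2_model(x2)

# The server concatenates the cut-layer outputs and finishes the forward pass.
server_input = torch.cat([cut_1, cut_2], dim=1)
log_probs = server_model(server_input)

# Backward: the loss gradient flows through the concatenation to both clients.
loss = nn.NLLLoss()(log_probs, y)
loss.backward()

print(log_probs.shape)
print(client_1_model[0].weight.grad is not None,
      client_2_model[0].weight.grad is not None)
```

In the distributed setting, only the cut-layer activations and their gradients cross the boundary between a client and the server; the raw inputs `x1` and `x2` stay with their owners.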

In this tutorial, we demonstrate the SplitNN architecture with 2 segments [[1](https://arxiv.org/abs/1812.00564)]. This time:

- <b>$Client_{1}$</b>
    - Has Model Segment 1
    - Has the handwritten images segment 1
- <b>$Client_{2}$</b>
    - Has Model Segment 1
    - Has the handwritten images segment 2
- <b>$Server$</b> 
    - Has Model Segment 2
    - Has the image labels
    
Author:
- Abbas Ismail - GitHub: [@abbas5253](https://github.com/abbas5253)

In [2]:
import torch
from torchvision import datasets, transforms
from torch import nn, optim
import syft as sy
import numpy as np

hook = sy.TorchHook(torch)

from distribute_data import Distribute_MNIST

In [3]:
# Data preprocessing
transform = transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize((0.5,), (0.5,)),
                              ])
trainset = datasets.MNIST('mnist', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

# create some workers
client_1 = sy.VirtualWorker(hook, id="client_1")
client_2 = sy.VirtualWorker(hook, id="client_2")

server = sy.VirtualWorker(hook, id="server")

data_owners = (client_1, client_2)
model_locations = [client_1, client_2, server]

# Split each image and send one part to client_1 and the other to client_2
mnist = Distribute_MNIST(data_owners=data_owners, data_loader=trainloader)

# List of dicts, one per batch, keyed by worker id:
# [{"client_1": pointer to client_1's image segment, "client_2": pointer to client_2's image segment}, ...]
data_pointers = mnist.data_pointer
labels = mnist.labels
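
To make the shape of `data_pointers` easier to picture, here is a local stand-in that builds the same list-of-dicts structure with ordinary tensors instead of remote pointers. The helper `split_batch` is hypothetical, and splitting along the last (width) dimension is an assumption; the actual slicing inside `Distribute_MNIST` is not shown in this notebook.

```python
import torch

def split_batch(images, owner_ids):
    """Split a batch of images into equal column blocks, one per owner.

    Hypothetical helper: returns a dict mapping owner id -> image segment.
    """
    chunks = torch.chunk(images, len(owner_ids), dim=-1)
    return dict(zip(owner_ids, chunks))

owner_ids = ["client_1", "client_2"]

# Three small random "image" batches of shape [batch, channel, height, width].
batches = [torch.randn(8, 1, 4, 6) for _ in range(3)]

# One dict per batch, keyed by owner id, mirroring the structure of data_pointers.
local_data = [split_batch(images, owner_ids) for images in batches]

print(len(local_data))
print(local_data[0]["client_1"].shape)  # torch.Size([8, 1, 4, 3])
```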

In [4]:
input_size = [14*14, 14*14]
hidden_sizes = {"client_1": [32, 64], "client_2": [32, 64], "server": [128, 64]}

# Create a model segment for each worker
models = {
    "client_1": nn.Sequential(
                nn.Linear(input_size[0], hidden_sizes["client_1"][0]),
                nn.ReLU(),
                nn.Linear(hidden_sizes["client_1"][0], hidden_sizes["client_1"][1]),
                nn.ReLU(),
    ),
    "client_2":  nn.Sequential(
                nn.Linear(input_size[1], hidden_sizes["client_2"][0]),
                nn.ReLU(),
                nn.Linear(hidden_sizes["client_2"][0], hidden_sizes["client_2"][1]),
                nn.ReLU(),
    ),
    "server": nn.Sequential(
                nn.Linear(hidden_sizes["server"][0], hidden_sizes["server"][1]),
                nn.ReLU(),
                nn.Linear(hidden_sizes["server"][1], 10),
                nn.LogSoftmax(dim=1)
    )
}



# Create optimisers for each segment and link to their segment
optimizers = [
    optim.SGD(models[location.id].parameters(), lr=0.05)
    for location in model_locations
]

# Send each model segment to the worker that owns it
for location in model_locations:
    models[location.id].send(location)
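
Because the server concatenates the clients' cut-layer outputs, the width of its first layer has to equal the sum of the clients' final layer widths (64 + 64 = 128 with the sizes above). A small sanity check along these lines, using the same `hidden_sizes` values, might look like the sketch below; the names `client_ids` and `cut_width` are introduced only for illustration.

```python
hidden_sizes = {"client_1": [32, 64], "client_2": [32, 64], "server": [128, 64]}
client_ids = ["client_1", "client_2"]

# Width of the concatenated cut-layer output: one block per client.
cut_width = sum(hidden_sizes[client_id][-1] for client_id in client_ids)

# The server's first Linear layer must accept exactly this many features.
assert cut_width == hidden_sizes["server"][0], (
    "server input width must match the concatenated client outputs"
)
print(cut_width)  # 128
```

Running a check like this before sending the models avoids a shape error surfacing only in the first training step on a remote worker.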

In [5]:
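def train(data_pointer, target, data_owners, models, optimizers, server):
    # Reset the gradients of every model segment
    for opt in optimizers:
        opt.zero_grad()

    # Each client runs its own segment on its own image part;
    # outputs are keyed by worker id
    client_output = {}
    remote_outputs = []

    for owner in data_owners:
        client_output[owner.id] = models[owner.id](data_pointer[owner.id].reshape([-1, 14*14]))
        # Move the cut-layer activations to the server
        remote_outputs.append(
            client_output[owner.id].move(server)
        )

    # Concatenate the clients' activations along the feature dimension
    server_input = torch.cat(remote_outputs, 1)
    pred = models["server"](server_input)

    criterion = nn.NLLLoss()
    # Note: this reshape assumes a full batch of 64 labels
    loss = criterion(pred, target.reshape(-1, 64)[0])

    # Backpropagate the loss
    loss.backward()

    for opt in optimizers:
        opt.step()

    # Retrieve the loss value from the server
    return loss.detach().get()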
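
The `target.reshape(-1, 64)[0]` call in `train` only works when a batch holds exactly 64 labels. With 60,000 MNIST training images and a batch size of 64, the final batch has 32 samples, which is presumably why the training loop skips it. The sketch below uses plain local tensors (not PySyft pointers) and assumes each batch of labels is a 1-D tensor of class indices; under that assumption, `reshape(-1)` would be a batch-size-agnostic alternative.

```python
import torch
from torch import nn

torch.manual_seed(0)
criterion = nn.NLLLoss()

losses = {}
for batch_size in (64, 32):  # a full batch and MNIST's smaller final batch
    log_probs = torch.log_softmax(torch.randn(batch_size, 10), dim=1)
    target = torch.randint(0, 10, (batch_size,))
    # reshape(-1) flattens the labels whatever the batch size is
    losses[batch_size] = criterion(log_probs, target.reshape(-1))

# The fixed-size reshape fails as soon as the batch is not a multiple of 64.
fixed_reshape_failed = False
try:
    torch.randint(0, 10, (32,)).reshape(-1, 64)
except RuntimeError:
    fixed_reshape_failed = True

print(sorted(losses.keys()), fixed_reshape_failed)
```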

In [6]:
for i in range(epochs):
    running_loss = 0
    # Skip the last batch, which may hold fewer than 64 samples
    for data_ptr, label in zip(data_pointers[:-1], labels[:-1]):
        label = label.send(server)
        loss = train(data_ptr, label, data_owners, models, optimizers, server)
        running_loss += loss

    print("Epoch {} - Training loss: {}".format(i, running_loss/len(trainloader)))

Epoch 0 - Training loss: 2.0991368293762207
Epoch 1 - Training loss: 1.551032304763794
Epoch 2 - Training loss: 1.2819230556488037
Epoch 3 - Training loss: 1.1513397693634033
Epoch 4 - Training loss: 1.0738531351089478
Epoch 5 - Training loss: 1.021932601928711
Epoch 6 - Training loss: 0.9837340116500854
Epoch 7 - Training loss: 0.9530386924743652
Epoch 8 - Training loss: 0.9278602004051208
Epoch 9 - Training loss: 0.9066053032875061
Epoch 10 - Training loss: 0.8880205750465393
Epoch 11 - Training loss: 0.8713458180427551
Epoch 12 - Training loss: 0.8560830354690552
Epoch 13 - Training loss: 0.8419126272201538
Epoch 14 - Training loss: 0.8285958170890808
Epoch 15 - Training loss: 0.8158738017082214
Epoch 16 - Training loss: 0.8035011291503906
Epoch 17 - Training loss: 0.7914702892303467
Epoch 18 - Training loss: 0.7797912955284119
Epoch 19 - Training loss: 0.7685437798500061
Epoch 20 - Training loss: 0.7577381134033203
Epoch 21 - Training loss: 0.7475399374961853
Epoch 22 - Training lo