# Part 10 - Secure Deep Learning Classification 

### Context 


Data is the driver behind Machine Learning. Companies, organisations or hospitals that create and collect data are able to build and train their own machine learning models in house. This allows them to offer these models as a service (MLaaS) to outside collaborators who don't have access to as much data but would like to use the model to gain insights into their own data. That data might be extremely sensitive (think of sending a picture of your skin to detect skin cancer) and can't be sent in the clear to a server. Conversely, the model is likely too critical from a business perspective to be sent directly to the client, who could potentially steal it.

In this context, one possible solution is to encrypt both the model and the data and to perform the machine learning prediction in a completely encrypted setting. Several encryption schemes allow computation over encrypted data, among them Secure Multi-Party Computation (SMPC), Homomorphic Encryption (FHE/SHE) and Functional Encryption (FE). We will focus here on Multi-Party Computation (introduced in Part 5), which consists of private additive sharing and relies on the crypto protocols SecureNN and SPDZ, the details of which are given [in this excellent blog post](https://mortendahl.github.io/2017/09/03/the-spdz-protocol-part1/).
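As a rough intuition for private additive sharing, here is a minimal sketch in plain Python. It is not PySyft code: the ring size `Q` and the helper names `share` and `reconstruct` are assumptions made for this illustration. A secret is split into random-looking shares that sum to it modulo `Q`, and additions can be carried out by each party locally on its own shares.

```python
import secrets

Q = 2**62  # ring size (an assumption for this sketch)

def share(secret, n_parties=2):
    """Split an integer into n additive shares modulo Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    last = (secret - sum(shares)) % Q
    return shares + [last]

def reconstruct(shares):
    """Sum the shares modulo Q and map the result back to a signed integer."""
    total = sum(shares) % Q
    return total - Q if total >= Q // 2 else total

x_shares = share(25)
y_shares = share(-7)

# Each party adds the two shares it holds; no communication is needed.
z_shares = [(a + b) % Q for a, b in zip(x_shares, y_shares)]

print(reconstruct(z_shares))  # 18
```

A single share is a uniformly random number and reveals nothing about the secret on its own; only the combination of all shares does.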

The exact setting in this tutorial is the following: consider that you are the server and you have some data. First, you define and train a model with this data. Second, you get in touch with a client who holds some data and would like to access your model to make predictions. The client encrypts this data by building private shares while you do the same with your model, and then you execute the private evaluation of your model. Finally, the result of the prediction is sent back encrypted to the client, so that the server (you) learns nothing about the client's data.


Author:
- Théo Ryffel - Twitter: [@theoryffel](https://twitter.com/theoryffel) · GitHub: [@LaRiffle](https://github.com/LaRiffle)

**Let's get started!**

### Imports and model specifications

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

Here are the PySyft imports. In particular, we define the workers `client`, `alice` and `bob`, along with a last one, the `crypto_provider`, which supplies all the crypto primitives we may need (see Part 5 for more details).

In [2]:
import syft as sy  # <-- NEW: import the PySyft library
hook = sy.TorchHook(torch)  # <-- NEW: hook PyTorch, i.e. add extra functionalities to support Federated Learning
client = sy.VirtualWorker(hook, id="client") # <-- NEW: define remote workers client, bob & alice
bob = sy.VirtualWorker(hook, id="bob")
alice = sy.VirtualWorker(hook, id="alice")
crypto_provider = sy.VirtualWorker(hook, id="crypto_provider")  # <-- NEW: and crypto_provider
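To give an intuition for why a separate `crypto_provider` is useful, here is a minimal sketch of SPDZ-style multiplication on additive shares using a Beaver triple. It is plain Python and not PySyft's implementation: the ring size `Q` and the helper names `share`, `reconstruct`, `beaver_triple` and `secure_mul` are assumptions made for illustration. The provider hands out shares of random values `a`, `b` and `c = a * b`, which let the two parties multiply their shared inputs while only opening masked values.

```python
import secrets

Q = 2**62  # ring size (an assumption for this sketch)

def share(secret):
    """Split an integer into two additive shares modulo Q."""
    r = secrets.randbelow(Q)
    return [r, (secret - r) % Q]

def reconstruct(shares):
    """Sum the shares modulo Q and map the result back to a signed integer."""
    total = sum(shares) % Q
    return total - Q if total >= Q // 2 else total

def beaver_triple():
    """Role of the crypto provider: shares of random a, b and of c = a * b."""
    a, b = secrets.randbelow(Q), secrets.randbelow(Q)
    return share(a), share(b), share(a * b % Q)

def secure_mul(x_sh, y_sh):
    """Multiply two shared values without revealing x or y."""
    a_sh, b_sh, c_sh = beaver_triple()
    # The parties open d = x - a and e = y - b; a and b act as one-time masks.
    d = sum(x - a for x, a in zip(x_sh, a_sh)) % Q
    e = sum(y - b for y, b in zip(y_sh, b_sh)) % Q
    # Each party computes c_i + d * b_i + e * a_i; one party also adds d * e.
    z_sh = [(c + d * b + e * a) % Q for a, b, c in zip(a_sh, b_sh, c_sh)]
    z_sh[0] = (z_sh[0] + d * e) % Q
    return z_sh

print(reconstruct(secure_mul(share(6), share(-7))))  # -42
```

The result is correct because `x * y = (d + a) * (e + b) = d*e + d*b + e*a + a*b`, and `a*b` is exactly the shared value `c`.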

We define the settings of the learning task:

In [3]:
class Arguments():
    def __init__(self):
        self.batch_size = 64
        self.test_batch_size = 10
        self.epochs = 10
        self.lr = 0.01
        self.momentum = 0.5
        self.log_interval = 100

args = Arguments()

### Data loading and sending to workers

In our setting, we assume that the server has access to some data to first train its model. Here, this is the MNIST training set.

In [4]:
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=args.batch_size, shuffle=True)

Second, there is a client who would like predictions from the server's model on some data of its own. This client encrypts its data by sharing it additively across two workers, `alice` and `bob`. Here we use the MNIST test set. In the code below the data is shared directly from the local worker; sending it to the client worker first is left as a TODO.

In [5]:
test_dataset = datasets.MNIST('../data', train=False, transform=transforms.Compose([
   transforms.ToTensor(),
   transforms.Normalize((0.1307,), (0.3081,))
]))

# The client, which owns the data, encrypts it through secret sharing
# TODO: send the data to the client worker first
client_data = sy.BaseDataset(
        test_dataset.data[:100], test_dataset.targets[:100]
    ).fix_precision().share(alice, bob, crypto_provider=crypto_provider)
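Additive sharing works on integers, which is why `fix_precision()` is called before `share(...)`. The sketch below shows the idea of a fixed-point encoding in plain Python. It assumes base 10 with 3 fractional digits and a ring of size `Q`; these values and the helper names `encode` and `decode` are assumptions for illustration and are not taken from the PySyft source.

```python
BASE = 10
PRECISION = 3              # number of fractional digits kept (assumed)
SCALE = BASE ** PRECISION  # 1000
Q = 2**62                  # ring size (an assumption for this sketch)

def encode(value):
    """Turn a float into an integer of the ring by scaling and rounding."""
    return int(round(value * SCALE)) % Q

def decode(encoded):
    """Map a ring element back to a signed float."""
    signed = encoded - Q if encoded >= Q // 2 else encoded
    return signed / SCALE

print(decode(encode(0.1307)))  # 0.131
print(decode(encode(-2.5)))    # -2.5
```

Anything beyond the kept fractional digits is rounded away, so the encrypted computation is an approximation of the float computation.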

In [6]:
client_data_batches = list(zip(
    torch.split(client_data.data, args.test_batch_size),
    torch.split(client_data.targets, args.test_batch_size)
))

### Feed Forward Neural Network specification
Here is the network specification used by the server.

In [7]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = x.view(-1, 784)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

### Launch the training
The training is done locally so this is pure local PyTorch training, nothing special here!

In [8]:
def train(args, model, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        output = F.log_softmax(output, dim=1)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * args.batch_size, len(train_loader) * args.batch_size,
                100. * batch_idx / len(train_loader), loss.item()))

In [9]:
model = Net()
optimizer = optim.SGD(model.parameters(), lr=args.lr)

for epoch in range(1, args.epochs + 1):
    train(args, model, train_loader, optimizer, epoch)




Our model is now trained and ready to be provided as a service.

## Secure evaluation

Now, as the server, we send the model to the workers holding the data. Because the model is sensitive information (you've spent time optimizing it!), you don't want to disclose its weights, so you secret-share the model just like we did with the dataset earlier.

In [10]:
model.eval()
model.fix_precision().share(alice, bob, crypto_provider=crypto_provider)

Net(
  (fc1): Linear(in_features=784, out_features=500, bias=True)
  (fc2): Linear(in_features=500, out_features=10, bias=True)
)
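Once both the weights and the inputs are fixed-precision integers, a linear layer has to account for the scaling factor: the product of two encoded numbers carries the factor twice and must be rescaled once. The sketch below shows this in plain Python on unencrypted integers. The scale of 1000 and the helper names `encode`, `decode` and `fixed_dot` are assumptions for illustration; in a secret-shared setting the rescaling is a protocol step of its own, which is not shown here.

```python
SCALE = 10 ** 3  # assumed fixed-point scale (3 decimal digits)

def encode(value):
    """Float -> scaled integer."""
    return int(round(value * SCALE))

def decode(encoded):
    """Scaled integer -> float."""
    return encoded / SCALE

def fixed_dot(weights, inputs, bias):
    """One output neuron, w . x + b, computed on scaled integers."""
    acc = sum(w * x for w, x in zip(weights, inputs))  # carries SCALE ** 2
    acc = acc // SCALE  # floor division removes the extra factor of SCALE
    return acc + bias   # the bias already carries a single SCALE

weights = [encode(0.5), encode(-1.25)]
inputs = [encode(2.0), encode(4.0)]
bias = encode(0.1)

print(decode(fixed_dot(weights, inputs, bias)))  # -3.9
```

The float computation gives `0.5 * 2.0 - 1.25 * 4.0 + 0.1 = -3.9`, which matches the decoded fixed-point result for these values.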

This test function performs the encrypted evaluation. The model weights, the data inputs, the prediction and the target used for scoring are encrypted!

However, the syntax is very similar to pure PyTorch testing of a model, isn't it nice?!

The only thing we decrypt on the server side is the final score at the end, to verify that the predictions were good on average.

In [11]:
def test(args, model, client_data_batches):
    print('Computing...')
    n_correct = 0
    n_total = 0
    with torch.no_grad():
        for data, target in client_data_batches:
            print('...')
            output = model(data)
            pred = output.argmax(dim=1)
            score = pred.eq(target.view_as(pred)).sum()
            n_correct += score
            n_total += args.test_batch_size

    n_correct = n_correct.get().float_precision().long().item()
    
    print('\nTest set: Accuracy: {}/{} ({:.0f}%)\n'.format(
        n_correct, n_total,
        100. * n_correct / n_total))
    #TODO: get back the model to local floats

In [12]:
test(args, model, client_data_batches)

Computing...
...
...
...
...
...
...
...
...
...
...

Test set: Accuracy: 72/100 (72%)



Et voilà! You have just done a completely secure prediction: the weights of the server's model have not leaked to the client, and the server has no information about either the input data or the classification output!

## Conclusion

You have seen here how easy it is to leverage your knowledge of PyTorch to perform secure Machine Learning that protects users' data, without having to be a crypto expert!

More on this topic will come soon, especially on private encrypted training of neural networks, when an organisation resorts to external sensitive data to train its own model.

If you enjoyed this and would like to join the movement toward privacy preserving, decentralized ownership of AI and the AI supply chain (data), you can do so in the following ways! 

### Star PySyft on GitHub

The easiest way to help our community is just by starring the repositories! This helps raise awareness of the cool tools we're building.

- [Star PySyft](https://github.com/OpenMined/PySyft)

### Pick our tutorials on GitHub!

We made really nice tutorials to get a better understanding of what Federated and Privacy-Preserving Learning should look like and how we are building the bricks for this to happen.

- [Check out the PySyft tutorials](https://github.com/OpenMined/PySyft/tree/master/examples/tutorials)


### Join our Slack!

The best way to keep up to date on the latest advancements is to join our community! 

- [Join slack.openmined.org](http://slack.openmined.org)

### Join a Code Project!

The best way to contribute to our community is to become a code contributor! If you want to start "one off" mini-projects, you can go to the PySyft GitHub Issues page and search for issues marked `Good First Issue`.

- [Good First Issue Tickets](https://github.com/OpenMined/PySyft/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22)

### Donate

If you don't have time to contribute to our codebase, but would still like to lend support, you can also become a Backer on our Open Collective. All donations go toward our web hosting and other community expenses such as hackathons and meetups!

- [Donate through OpenMined's Open Collective Page](https://opencollective.com/openmined)