# Encrypted Deep Learning PATE Framework

_The Private Aggregation of Teacher Ensembles_ (PATE) framework was designed to let unlabeled, non-sensitive data be labeled automatically through the combined opinion of multiple private teacher models [(Papernot et al. 2016)](https://arxiv.org/pdf/1610.05755.pdf). This framework ensures privacy for the teachers' data and models, but requires the student to share its unlabeled dataset for analysis. This may be unreasonable when the student's data also requires privacy, as in the case of a hospital labeling the X-rays of its patients, or a company that would have to share its product usage data with its competitors in order to extract information from it. This notebook proposes treating the aggregated opinion of the teacher models as an encrypted service, in which both the teachers' and the student's data are protected using multi-party computation and additive secret sharing.

```python
import numpy as np
import torch as th
import syft as sy
from torchvision import transforms, datasets
import os

hook = sy.TorchHook(th)
```

```
W0818 19:17:56.447226 4718720448 secure_random.py:26] Falling back to insecure randomness since the required custom op could not be found for the installed version of TensorFlow. Fix this by compiling custom ops. Missing file was '/anaconda3/envs/jupyter/lib/python3.7/site-packages/tf_encrypted/operations/secure_random/secure_random_module_tf_1.14.0.so'
W0818 19:17:56.590826 4718720448 deprecation_wrapper.py:119] From /anaconda3/envs/jupyter/lib/python3.7/site-packages/tf_encrypted/session.py:26: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
```

## Get the dataset

For this example, we're going to use a simple dataset, MNIST, since the objective of this demo is not to build a high-performance model but to demonstrate a privacy framework for multiple parties. The MNIST test set will play the role of our sensitive unlabeled data.

```python
train_transforms = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomAffine(15),
    transforms.RandomPerspective(),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
    transforms.Normalize((0.5,),(0.5,))
])

test_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,),(0.5,))
])

train_dataset = datasets.MNIST('./data/train', train=True, download=True, transform=train_transforms)
test_dataset = datasets.MNIST('./data/test', train=False, download=True, transform=test_transforms)
testloader = th.utils.data.DataLoader(test_dataset, shuffle=False, batch_size=len(test_dataset))
```

## Define the workers and distribute the data

We need to define both the teacher workers and a trusted party: a neutral worker not owned by any of the involved parties. This worker will be responsible for the encryption of the unlabeled data, as well as the encryption of the teacher models.

```python
num_teachers = 10
teachers = tuple(sy.VirtualWorker(id=str(i), hook=hook) for i in range(num_teachers))
secure_worker = sy.VirtualWorker(id="secure_worker", hook=hook)

# Split the dataset into equal-sized, disjoint partitions, one per teacher
# trainsets = tuple(th.utils.data.random_split(train_dataset, [len(train_dataset)//num_teachers]*num_teachers))
idxs = np.random.permutation(len(train_dataset))
split_size = len(train_dataset)//num_teachers

# Teacher i gets the i-th non-overlapping slice of the shuffled indices
trainsplits = tuple(
    (train_dataset.data[idxs[i*split_size:(i+1)*split_size]],
     train_dataset.targets[idxs[i*split_size:(i+1)*split_size]])
    for i in range(num_teachers))
```
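A quick way to convince ourselves that the partitions really are disjoint is to build them from non-overlapping slices of a single permutation and check the result. The sketch below uses plain NumPy; `disjoint_partitions` is a hypothetical helper written for illustration, not part of PySyft or torchvision.

```python
import numpy as np

def disjoint_partitions(n_samples, n_parts, seed=0):
    """Split range(n_samples) into n_parts equal-sized, non-overlapping index arrays.

    Leftover samples (n_samples % n_parts) are dropped, matching the
    integer division used for split_size in the notebook.
    """
    rng = np.random.default_rng(seed)
    idxs = rng.permutation(n_samples)
    split_size = n_samples // n_parts
    # Slice i covers positions [i*split_size, (i+1)*split_size) of the permutation
    return [idxs[i * split_size:(i + 1) * split_size] for i in range(n_parts)]

parts = disjoint_partitions(60000, 10)
# Every partition has the same size...
print([len(p) for p in parts])
# ...and no index appears in more than one partition
all_idxs = np.concatenate(parts)
print(len(np.unique(all_idxs)) == len(all_idxs))
```

Slicing with `idxs[i:i+split_size]` instead would start each window only one position after the previous one, so consecutive teachers would share almost all of their data.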

## Define the teachers' models

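Since we're going to use these models for encrypted computation later, we must make sure they are implemented accordingly. Because of this, we can't use `log_softmax` inside our model definition: it relies on exponentials and logarithms, which are not available for the encrypted tensors we'll be using. Instead, we'll apply it inside the training loop and use `argmax` on the raw outputs during inference.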
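The reason `argmax` can stand in for `log_softmax` at inference time is that log-softmax subtracts the same value (the log-sum-exp of the row) from every logit of a datapoint, so it never changes which class scores highest. A small NumPy check of this, with random numbers standing in for the output of `fc7`:

```python
import numpy as np

def log_softmax(logits):
    # Subtract the row-wise max for numerical stability; this does not change the result
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

rng = np.random.default_rng(42)
logits = rng.normal(size=(5, 10))  # 5 fake datapoints, 10 classes

# log_softmax shifts every entry of a row by the same constant,
# so the position of the largest entry is unchanged
print(np.array_equal(logits.argmax(axis=1), log_softmax(logits).argmax(axis=1)))  # → True
```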

```python
from torch import nn, optim
import torch.nn.functional as F

class TeacherModel(nn.Module):
    def __init__(self):
        super(TeacherModel, self).__init__()
        
        # Input shape is (1,28,28) => 784
        self.fc1 = nn.Linear(784, 512)
        
        self.fc2 = nn.Linear(512, 256)
        
        self.fc3 = nn.Linear(256,128)
        
        self.fc4 = nn.Linear(128, 64)
        
        self.fc5 = nn.Linear(64, 32)
        
        self.fc6 = nn.Linear(32, 16)
        
        self.fc7 = nn.Linear(16, 10)
        
    def forward(self, x):
        # reshape the data for fc layers
        x = x.view(-1, 28*28)
        
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = F.relu(self.fc5(x))
        x = F.relu(self.fc6(x))
        
        # Get the linear output. Classification is done outside the model.
        x = self.fc7(x)
        
        return x
```

## Train the teacher models on disjoint data

In a real-life situation, the teachers would train privately on their own machines. Here, we're going to train them all on our local machine.

```python
trainsets = tuple(
    sy.BaseDataset(
        trainsplits[i][0].copy(),
        trainsplits[i][1].copy(),
        transform=train_transforms)
    for i in range(num_teachers))

# Models are stored in a list since we have to reassign them later
teacher_models = [TeacherModel() for _ in range(num_teachers)]
```

```python
retrain = False

for i in range(num_teachers):
    if not os.path.exists(f'teacher_{i}_chkpt.pth') or retrain:
        epochs = 20
        trainloader = th.utils.data.DataLoader(trainsets[i], shuffle=True, batch_size=64)
        model = teacher_models[i]

        criterion = nn.NLLLoss()
        optimizer = optim.Adam(model.parameters(), lr=0.001)

        for e in range(epochs):
            running_loss = 0
            steps = 0
            for images, labels in trainloader:

                optimizer.zero_grad()

                # Use log_softmax for local classification
                log_ps = F.log_softmax(model(images), dim=1)
                loss = criterion(log_ps, labels)
                loss.backward()
                optimizer.step()

                running_loss += loss.item()
                steps += 1

                if steps % 20 == 0:
                    print(f'Teacher {i}/{num_teachers} | Epoch: {e}/{epochs} | Loss: {np.round(running_loss/steps, 3)}')

        # Save the trained weights once all epochs are done
        th.save(model.state_dict(), f'teacher_{i}_chkpt.pth')
```

## Simple PATE Demonstration

Now that we have the trained teacher models, we can label our unlabeled data (which for this demo is the MNIST test dataset) by combining the opinions of all the teachers.

### Reinitialize the models and move them to their designated teacher

Now that we're running inference, we can send the models to our workers and simulate a real-life scenario where the teachers keep their trained models on their own machines.

```python
# Models are stored in a list since we have to reassign them later
teacher_models = [TeacherModel() for _ in range(num_teachers)]

for i in range(num_teachers):
    chkpt_path = f'teacher_{i}_chkpt.pth'
    if os.path.exists(chkpt_path):
        state_dict = th.load(chkpt_path)
        teacher_models[i].load_state_dict(state_dict)

for i in range(num_teachers):
    teacher_models[i] = teacher_models[i].send(teachers[i])
```

### Get noisy opinions from each of the teachers

Now we send our unlabeled data to the teachers, so that they can generate predictions for each datapoint. Note two things here:
1. We must send our data to the teachers. This means our data is compromised.
2. We move each teacher's opinion to a secure worker. This ensures the student (our local machine) has no access to the raw teacher predictions, so the teachers' privacy is preserved.

To ensure that the released labels are differentially private, we add Laplace noise with a given epsilon when aggregating the opinions.

```python
opinions = None

unlabeled_data, labels = next(iter(testloader))

for i in range(num_teachers):
    unlabeled_data = unlabeled_data.send(teachers[i]) # send the data to the teacher

    # Get the teacher's opinion. The model returns raw logits; since exp is
    # monotonic, their top class is also the top class of the probabilities.
    logits = teacher_models[i](unlabeled_data)
    _, top_class = logits.topk(1, dim=1)

    top_class = top_class.move(secure_worker) # Move the teacher's opinion to a secure worker.

    if opinions is None:
        opinions = top_class
    else:
        opinions = th.cat((opinions, top_class), dim=1) # concatenate all opinions

    unlabeled_data = unlabeled_data.get() # retrieve the data
```

Now the `opinions` tensor is a matrix holding the label each teacher gave to each datapoint. To keep these individual opinions private, we count the votes for each datapoint and return the label with the highest number of votes. We also add Laplace noise to the vote counts, so that the labeling process satisfies (epsilon, delta)-differential privacy.

<img src="https://miro.medium.com/max/700/1*BgnTR1pSBcJNNbuJxnShTQ.png">
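Outside of PySyft, the same aggregation fits in a few lines of NumPy. The sketch below (the name `noisy_argmax_np` is ours) mirrors the idea behind the `noisy_argmax` function in the next cell: count the votes per class, add Laplace noise with scale `1/epsilon` to every count, and take the argmax. A single teacher changing its vote moves at most two counts by one each, and the noise is what masks such a change; strong consensus survives it, while for datapoints where teachers disagree the noise decides.

```python
import numpy as np

def noisy_argmax_np(votes, epsilon, num_classes=10, rng=None):
    """Noisy-max aggregation of teacher votes.

    votes: integer array of shape (num_datapoints, num_teachers), each entry a class index.
    Returns one label per datapoint: argmax of the vote histogram plus Laplace(1/epsilon) noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Histogram of votes per datapoint: shape (num_datapoints, num_classes)
    counts = np.stack([np.bincount(row, minlength=num_classes) for row in votes])
    # Laplace noise with scale beta = 1/epsilon, one sample per class count
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    return np.argmax(counts + noise, axis=1)

# Three datapoints, ten teacher votes each
votes = np.array([
    [7] * 9 + [1],                   # strong consensus on 7
    [2] * 6 + [3] * 4,               # weaker consensus on 2
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],  # no consensus at all
])
rng = np.random.default_rng(0)
print(noisy_argmax_np(votes, epsilon=1.0, rng=rng))
```

Lowering `epsilon` widens the noise, so the second and third datapoints flip labels more often than the first.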

```python
def noisy_argmax(x, epsilon=0.1):
    # x is a pointer to a (num_datapoints, num_teachers) tensor of votes

    # First get the vote count for each datapoint.
    count = th.stack([th.bincount(x_i, minlength=10).long() for x_i in th.unbind(x, dim=0)], dim=0)

    # Add Laplacian noise with scale beta = 1/epsilon to the vote count,
    # sending it to the worker that holds the counts.
    beta = 1 / epsilon
    noise = th.from_numpy(np.random.laplace(0, beta, count.shape))

    n_labels = count.double() + noise.send(count.location)

    # Then get the index of the highest noisy vote count
    n_labels = th.argmax(n_labels, dim=1)
    return n_labels
```


```python
noisy_labels = noisy_argmax(opinions, epsilon=1).get()
```

```python
noisy_labels
```

```
tensor([7, 2, 1,  ..., 4, 5, 6])
```

### Check the accuracy of the noisy aggregated opinions

Since our data is already labeled, we can check how much the noisy aggregation affected the accuracy of the predictions. Feel free to change the epsilon value in the `noisy_argmax()` call to see how the accuracy changes.

```python
equals = labels == noisy_labels
accuracy = th.mean(equals.float())
print(f"Noisy Argmax Accuracy: {int(accuracy*100)}%")
```

```
Noisy Argmax Accuracy: 91%
```


Now we can use these labels to train our student model without leaking private information from either the teachers' datasets or their models.

## Making the PATE Framework bidirectionally private

If you look at the code above, the student is required to send its data to the teachers. This implies that the student's data can't hold private information. This is also pointed out by Papernot et al. in the paper [_Semi-Supervised Knowledge Transfer for Deep Learning from Private Training Data_](https://arxiv.org/pdf/1610.05755.pdf), where it is stated that _"using auxiliary, unlabeled non-sensitive data, a student model is trained on the aggregate output of the ensemble, such that the student learns to accurately mimic the ensemble"_. This is a strong assumption: if the teachers' datasets require privacy, the student's unlabeled dataset most likely has the same requirement. Following the hospital example, the student's dataset would also hold personal and private information, which cannot legally be disclosed to any other hospital.

<img src="https://miro.medium.com/max/700/1*v76ZMnxkLo4RpdstQ-KGYw.png">

Diagram of the current scenario. The student's data must be shared with all the teachers, and therefore no privacy is guaranteed.

In order to ensure privacy for both the teachers and the student, we can modify the framework so that it behaves as an encrypted service. This means using additive secret sharing to encrypt both the teacher models and the unlabeled dataset. By doing so, we're able to generate predictions on the dataset without revealing the raw student data to any of the teachers.
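Additive secret sharing is what lets the teachers compute on data they cannot read. A minimal sketch in plain Python, assuming a public modulus `Q` chosen here for illustration (not necessarily the value PySyft uses internally): a value is split into random shares that only reveal it when all of them are added together, and additions can be performed share by share without ever reconstructing the inputs.

```python
import random

Q = 2**62  # public modulus (illustrative); shares are uniformly random integers in [0, Q)

def share(secret, n_parties):
    """Split an integer into n additive shares that sum to the secret mod Q."""
    shares = [random.randrange(Q) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares sum to the secret modulo Q
    shares.append((secret - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    return sum(shares) % Q

a_shares = share(25, 2)  # e.g. one share for alice, one for bob
b_shares = share(17, 2)

# Each party adds the two shares it holds locally, without seeing 25 or 17
sum_shares = [(a + b) % Q for a, b in zip(a_shares, b_shares)]
print(reconstruct(sum_shares))  # → 42
```

Any single share is a uniformly random number, so on its own it says nothing about the secret.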

### Reinitialize the models and move them to their designated teacher

Let's get back to the original situation: we have pre-trained models on our teachers' machines.

```python
# Models are stored in a list since we have to reassign them later
teacher_models = [TeacherModel() for _ in range(num_teachers)]

for i in range(num_teachers):
    chkpt_path = f'teacher_{i}_chkpt.pth'
    if os.path.exists(chkpt_path):
        state_dict = th.load(chkpt_path)
        teacher_models[i].load_state_dict(state_dict)

for i in range(num_teachers):
    teacher_models[i] = teacher_models[i].send(teachers[i])
```

### Federate the teacher models

We first need to federate the teachers' models across multiple workers. Even though this could be achieved using any of the workers already created, we're going to add `alice` and `bob` for clarity.

Also, since Syft's encryption protocol is based on SecureNN, the multi-party computation must be done between 3 workers: one crypto provider and 2 workers holding the shares of each tensor.
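In the usual setup, the crypto provider does not hold shares of the inputs; its job is to hand out correlated randomness that the share holders consume. For multiplication, the textbook form of this is a Beaver triple. The sketch below is a plain-Python illustration of that idea with two share holders and a dealer, assuming a public modulus `Q` chosen for illustration; it is not PySyft's implementation, which additionally relies on SecureNN's helper protocols for comparisons such as ReLU and argmax.

```python
import random

Q = 2**62  # public modulus (illustrative)

def share(x, n=2):
    s = [random.randrange(Q) for _ in range(n - 1)]
    s.append((x - sum(s)) % Q)
    return s

def reconstruct(shares):
    return sum(shares) % Q

def crypto_provider_triple():
    """The crypto provider samples random a, b and deals shares of a, b and c = a*b."""
    a, b = random.randrange(Q), random.randrange(Q)
    return share(a), share(b), share(a * b % Q)

def beaver_mul(x_sh, y_sh):
    """Multiply two 2-party shared values using one Beaver triple."""
    a_sh, b_sh, c_sh = crypto_provider_triple()
    # The parties open d = x - a and e = y - b; both are masked by the random a, b
    d = reconstruct([(x - a) % Q for x, a in zip(x_sh, a_sh)])
    e = reconstruct([(y - b) % Q for y, b in zip(y_sh, b_sh)])
    # Since x*y = c + d*b + e*a + d*e, each party computes its share locally
    z_sh = [(c + d * b + e * a) % Q for c, a, b in zip(c_sh, a_sh, b_sh)]
    # The public term d*e is added by exactly one party
    z_sh[0] = (z_sh[0] + d * e) % Q
    return z_sh

x_sh, y_sh = share(6), share(7)  # shares held by alice (index 0) and bob (index 1)
print(reconstruct(beaver_mul(x_sh, y_sh)))  # → 42
```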

In order to encrypt each model, we must first convert its parameters to fixed precision and then split them using additive secret sharing.
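Secret sharing works on integers, which is why the parameters first go through `fix_precision()`. A minimal sketch of fixed-point encoding, assuming base 10 with 3 fractional digits and a modulus `Q` (all illustrative values): floats become scaled integers, negative numbers wrap around the modulus, and every product carries the scale twice, so it has to be truncated once. Here the truncation is done on the reconstructed value for simplicity; doing it directly on secret shares requires additional care that this sketch skips.

```python
BASE = 10
PRECISION = 3              # number of fractional decimal digits kept
SCALE = BASE ** PRECISION
Q = 2**62                  # public modulus (illustrative)

def encode(x):
    """Float -> integer in [0, Q); negative numbers wrap around the modulus."""
    return int(round(x * SCALE)) % Q

def signed(v):
    """Interpret values in the upper half of [0, Q) as negative numbers."""
    return v - Q if v > Q // 2 else v

def decode(v):
    return signed(v) / SCALE

def truncate(v):
    # Remove the extra factor of SCALE introduced by a multiplication
    return (signed(v) // SCALE) % Q

w, x = encode(-0.25), encode(1.5)
print(decode((w + x) % Q))            # addition works directly → 1.25
print(decode(truncate((w * x) % Q)))  # multiplication needs truncation → -0.375
```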


```python
alice = sy.VirtualWorker(id='alice', hook=hook)
bob = sy.VirtualWorker(id='bob', hook=hook)
```

As of PySyft 0.1.23a1, there seems to be a bug when calling `fix_precision()` on remote models, so the conversion has to be done manually, parameter by parameter.

```python
for i in range(num_teachers):
    model = teacher_models[i]
    # Need to do fix_precision manually because of a current bug
    for p in model.parameters():
        # p is a pointer to the parameter on the teacher's worker. Convert its
        # data to fixed precision and secret-share it between alice and bob.
        p.data = p.data.fix_precision().share(alice, bob, crypto_provider=secure_worker)

    # Expected way to encrypt the model
    # teacher_models[i] = model.fix_precision().share(alice, bob, crypto_provider=secure_worker)
```

Now if we inspect the parameters of any of these models, we'll see that they're shared between `alice` and `bob`, with `secure_worker` acting as the crypto provider. Each model still lives on its teacher's worker, so to inspect it we first have to retrieve a copy with `copy().get()`.

```python
list(teacher_models[0].copy().get().parameters())
```

```
[Parameter containing:
 Parameter>FixedPrecisionTensor>[AdditiveSharingTensor]
 	-> [PointerTensor | me:56940163585 -> alice:90056800002]
 	-> [PointerTensor | me:10223246370 -> bob:67538150503]
 	*crypto provider: secure_worker*, Parameter containing:
 Parameter>FixedPrecisionTensor>[AdditiveSharingTensor]
 	-> [PointerTensor | me:31363885691 -> alice:76340740066]
 	-> [PointerTensor | me:36125014073 -> bob:13067822833]
 	*crypto provider: secure_worker*, Parameter containing:
 Parameter>FixedPrecisionTensor>[AdditiveSharingTensor]
 	-> [PointerTensor | me:24764596241 -> alice:48250344215]
 	-> [PointerTensor | me:74301155642 -> bob:33908985386]
 	*crypto provider: secure_worker*, Parameter containing:
 Parameter>FixedPrecisionTensor>[AdditiveSharingTensor]
 	-> [PointerTensor | me:22056814223 -> alice:52481508602]
 	-> [PointerTensor | me:57345607035 -> bob:87143053927]
 	*crypto provider: secure_worker*, Parameter containing:
 Parameter>FixedPrecisionTensor>[AdditiveSharingTensor]
```

### Encrypt the unlabeled dataset and obtain the teachers opinions

Ideally, we would just encrypt the whole dataset and follow the same procedure as before to obtain the predictions. Sadly, encrypted computation is slow and computationally expensive, so we must work with batches. The idea is to take a small batch from our dataset, encrypt it and send it to every teacher for analysis. The resulting predictions are sent to the `secure_worker` to be decrypted. Once they're decrypted, the `noisy_argmax` mechanism is applied, and the resulting batch of labels is sent to the student. Lastly, the student concatenates the batches to obtain all the labels.
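Because the aggregation is computed independently for each datapoint, splitting the dataset into batches changes the cost profile but not the labels. A plain NumPy sketch (no encryption; `aggregate_in_batches` and `plurality` are hypothetical helpers) that checks this for a noise-free plurality vote:

```python
import numpy as np

def aggregate_in_batches(votes, batch_size, aggregate):
    """Apply `aggregate` to consecutive batches of rows and concatenate the results."""
    labels = []
    for start in range(0, len(votes), batch_size):
        batch = votes[start:start + batch_size]
        labels.append(aggregate(batch))
    return np.concatenate(labels)

def plurality(batch, num_classes=10):
    # Noise-free vote count + argmax, so results can be compared exactly
    counts = np.stack([np.bincount(row, minlength=num_classes) for row in batch])
    return counts.argmax(axis=1)

rng = np.random.default_rng(0)
votes = rng.integers(0, 10, size=(1000, 10))  # 1000 datapoints, 10 teacher votes each

batched = aggregate_in_batches(votes, batch_size=100, aggregate=plurality)
print(np.array_equal(batched, plurality(votes)))  # → True
```

With Laplace noise added, each batch draws fresh noise, so the batched labels are identically distributed to, rather than identical with, those of a single pass.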

```python
batch_size = 100
testloader = th.utils.data.DataLoader(test_dataset, shuffle=False, batch_size=batch_size)

opinions = None

for i, (images, _) in enumerate(testloader):

    # Create a buffer with dimensions (num_teachers, batch_size).
    # This buffer will later be transposed.
    batch_opinions = th.zeros((num_teachers, len(images)))

    # Encrypt the buffer
    batch_opinions = batch_opinions.fix_prec().share(alice, bob, crypto_provider=secure_worker)

    # Encrypt the images
    images = images.fix_prec().share(alice, bob, crypto_provider=secure_worker)

    # Get opinions on the batch from every teacher
    for t in range(num_teachers):
        print(f"Batch {i}/{len(test_dataset)//batch_size} | Getting predictions on teacher {t}", end="\r")
        # Work on a copy of the model: with this PySyft version, running the
        # shared model directly fails for reasons we haven't pinned down
        model = teacher_models[t].copy()

        # Send the encrypted data to the teacher. Privacy is still preserved
        images = images.send(teachers[t])

        # Get teacher's opinion
        output = model(images)
        pred = output.argmax(dim=1)

        # We have our predictions, let's retrieve the data
        images = images.get()

        # Let's now store our predictions inside our buffer

        # First move the buffer to the required worker
        batch_opinions = batch_opinions.send(teachers[t])

        # Assign the value at the worker index
        batch_opinions[t] = pred

        # We have stored the predictions, let's retrieve the buffer.
        # The buffer is still encrypted
        batch_opinions = batch_opinions.get()

    # By now we have all the teachers' opinions stored in our buffer

    # We're no longer indexing by teacher, let's transpose the buffer
    batch_opinions.transpose_(1, 0)

    # In order to apply noisy_argmax, we must decrypt the buffer.
    # Let's first send it to secure_worker, so we can't see the raw predictions
    batch_opinions = batch_opinions.send(secure_worker)

    # Now let's decrypt it remotely with remote_get
    batch_opinions = batch_opinions.remote_get()

    # We must convert the buffer back to float precision. Currently,
    # float_prec() on pointers is not supported, so it must be done
    # manually.
    batch_opinions = batch_opinions.owner.send_command(
        batch_opinions.location,
        ("float_prec", batch_opinions, (), {})
    ).wrap()

    # Now, to use noisy_argmax, our tensor must be of type LongTensor
    batch_opinions = batch_opinions.long()

    # Finally, let's get our noisy labels with noisy_argmax
    noisy_labels = noisy_argmax(batch_opinions, 1)

    # These labels are already differentially private, so we can see them now.
    noisy_labels = noisy_labels.get()

    # Concatenate all opinions
    if opinions is None:
        opinions = noisy_labels
    else:
        opinions = th.cat((opinions, noisy_labels))
```

```python
print(opinions)

noisy_labels = opinions
```

```
tensor([7, 2, 1,  ..., 4, 5, 6])
```


### Check the accuracy of the noisy aggregated opinions

Now let's compare the accuracy of the original PATE Framework with the encrypted one.

```python
equals = labels == noisy_labels
accuracy = th.mean(equals.float())
print(f"Noisy Argmax Accuracy: {int(accuracy*100)}%")
```

```
Noisy Argmax Accuracy: 90%
```


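As we can see, the accuracy is practically unchanged (90% versus 91% for the unencrypted version; the Laplace noise is resampled on every run, so small run-to-run differences are expected anyway). By encrypting, we trade computation time for privacy guarantees on the student's data.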

## Conclusion

PATE is an amazing framework that gives good results while achieving strong privacy guarantees. However, in situations where no public (non-sensitive) data is available, conventional PATE may be infeasible. When this happens, it is possible to add a layer of encryption that allows data to be processed without compromising its privacy. Still, the added complexity and time required make this approach impractical for everyday scenarios. Additionally, since the student model is trained on private data, it remains vulnerable to attacks that may reveal information about that dataset. Because of this, conventional PATE should be preferred when the privacy of the student's dataset is not relevant or necessary.