# Lesson: Differential Privacy for Deep Learning

So in the last lessons you may have been wondering - what does all of this have to do with Deep Learning? Well, these same techniques we were just studying form the core primitives for how Differential Privacy provides guarantees in the context of Deep Learning. 

Previously, we defined perfect privacy as "a query to a database returns the same value even if we remove any person from the database", and used this intuition in the description of epsilon/delta. In the context of deep learning we have a similar standard.

Training a model on a dataset should return the same model even if we remove any person from the dataset.
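
In the epsilon/delta notation from the earlier lessons, this standard can be written as follows. The symbols are introduced here only for illustration: $M$ is the (randomized) training procedure, $D$ and $D'$ are any two datasets that differ in one person's data, and $S$ is any set of possible output models.

$$
\Pr[M(D) \in S] \le e^{\epsilon} \, \Pr[M(D') \in S] + \delta
$$

Perfect privacy corresponds to $\epsilon = 0$ and $\delta = 0$: the distribution over trained models is then identical with or without that person.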

Thus, we've replaced "querying a database" with "training a model on a dataset". In essence, the training process is a kind of query. However, one should note that this adds two points of complexity which database queries did not have:

    1. do we always know where "people" are referenced in the dataset?
    2. neural models rarely train to the same output model, even on identical data

The answer to (1) is to treat each training example as a single, separate person. Strictly speaking, this is often overly zealous, as some training examples have no relevance to people, while others may reference several people or only part of one person's data (consider an image with multiple people in it). Thus, localizing exactly where "people" are referenced, and thus how much your model would change if people were removed, is challenging.

The answer to (2) is also an open problem - but several interesting proposals have been made. We're going to focus on one of the most popular proposals, PATE (Private Aggregation of Teacher Ensembles).

## An Example Scenario: A Health Neural Network

First we're going to consider a scenario - you work for a hospital and you have a large collection of images about your patients. However, you don't know what's in them. You would like to use these images to develop a neural network which can automatically classify them, however since your images aren't labeled, they aren't sufficient to train a classifier. 

However, being a cunning strategist, you realize that you can reach out to 10 partner hospitals which DO have annotated data. It is your hope to train your new classifier on their datasets so that you can automatically label your own. While these hospitals are interested in helping, they have privacy concerns regarding information about their patients. Thus, you will use the following technique to train a classifier which protects the privacy of patients in the other hospitals.

- 1) You'll ask each of the 10 hospitals to train a model on their own datasets (All of which have the same kinds of labels)
- 2) You'll then use each of the 10 partner models to predict on your local dataset, generating 10 labels for each of your datapoints
- 3) Then, for each local data point (now with 10 labels), you will perform a DP query to generate the final label. This query is a "max" function, where "max" is the most frequent label across the 10 labels. We will need to add Laplacian noise to make this Differentially Private to a certain epsilon/delta constraint.
- 4) Finally, we will retrain a new model on our local dataset which now has labels. This will be our final "DP" model.
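
For orientation, the four steps can be sketched end to end with toy stand-ins. Everything here is an assumption made for illustration: the Gaussian-blob data, the nearest-centroid "model" standing in for a neural network, the helper names (`sample`, `fit_centroids`, `predict`) and the sizes. It is a minimal sketch of the workflow, not a definitive PATE implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
num_teachers, num_labels, dim = 10, 3, 2  # hypothetical toy sizes

# Toy data: each class is a Gaussian blob around its own center (assumption)
centers = rng.normal(0, 5, size=(num_labels, dim))

def sample(n):
    # Balanced labels, so every class is present in every dataset
    y = np.arange(n) % num_labels
    x = centers[y] + rng.normal(0, 1, size=(n, dim))
    return x, y

def fit_centroids(x, y):
    # "Training": one centroid per class (stand-in for a neural network)
    return np.stack([x[y == k].mean(axis=0) for k in range(num_labels)])

def predict(centroids, x):
    # Nearest-centroid prediction
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

# Step 1: each partner hospital trains on its own private, labeled data
teachers = [fit_centroids(*sample(300)) for _ in range(num_teachers)]

# Step 2: every teacher labels our local, unlabeled data
local_x, local_y_hidden = sample(500)  # hidden labels are used only for the final check
teacher_preds = np.stack([predict(t, local_x) for t in teachers], axis=1)

# Step 3: noisy max over the vote counts of each local example
epsilon = 0.1
counts = np.stack([np.bincount(row, minlength=num_labels) for row in teacher_preds])
noisy_counts = counts + rng.laplace(0, 1 / epsilon, size=counts.shape)
student_labels = noisy_counts.argmax(axis=1)

# Step 4: train the final model on the local data with the noisy labels
student = fit_centroids(local_x, student_labels)
print("student accuracy on local data:",
      (predict(student, local_x) == local_y_hidden).mean())
```

Only the noisy labels from step 3 cross the boundary between the partner hospitals and the final model, which is why the rest of this lesson concentrates on that step.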

So, let's walk through these steps. I will assume you're already familiar with how to train/predict a deep neural network, so we'll skip steps 1 and 2 and work with example data. We'll focus instead on step 3, namely how to perform the DP query for each example using toy data.

So, let's say we have 10,000 training examples, and we've got 10 labels for each example (from our 10 "teacher models" which were trained directly on private data). Each label is chosen from a set of 10 possible labels (categories) for each image.

In [1]:
import numpy as np

In [2]:
numTeachers = 10 #10 partner Hospitals
numExamples = 10000 #Size of our DataSet
numLabels = 10 #number of labels for our classifier; each prediction is 1 of these 10

# Synthetic Dataset

In [3]:
preds = (np.random.rand(numTeachers, numExamples) * numLabels).astype(int).transpose(1,0) #Fake Predictions

In [4]:
#Just to get a feel for how np.random.rand works: uniform samples in [0, 1), scaled by 10 and truncated to ints
x = np.random.rand(1,2)
y = x * 10
z = y.astype(int)
print(x,y,z)

[[0.84784545 0.04762735]] [[8.47845447 0.47627352]] [[8 0]]


In [5]:
preds[0] #predictions from all 10 teachers for the first example

array([2, 6, 4, 3, 1, 4, 8, 0, 8, 5])

In [6]:
preds.shape

(10000, 10)

In [7]:
anImage = preds[0]

In [8]:
anImage

array([2, 6, 4, 3, 1, 4, 8, 0, 8, 5])

Now we're going to take the above vector and transform it into a single label. We'll simply take the prediction that occurs most often.

In [9]:
labelCounts = np.bincount(anImage, minlength=numLabels)

In [10]:
np.argmax(labelCounts)

4

In [11]:
labelCounts
#Remember, we want the label with the most votes, not the largest label value, which is why we used bincount (on a tie, argmax returns the lowest label index)

array([1, 1, 1, 1, 2, 1, 1, 0, 2, 0])

But this is not Differentially Private, so we are going to add noise.

In [12]:
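epsilon = 0.1
beta = 1 / epsilon #scale of the Laplace noise

for i in range(len(labelCounts)):
    #draw a scalar sample; labelCounts is an integer array, so the noisy count is truncated to an int on assignment
    labelCounts[i] += np.random.laplace(0, beta)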

In [13]:
labelCounts

array([ -6, -10,  18,  43,  12, -14,  10,  -7,  -1,  12])

In [14]:
np.argmax(labelCounts) #the noise changed the winning label here; that is expected, and the model trained on these labels has to tolerate some label noise

3
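
To see how the choice of epsilon affects the released label, the sketch below estimates how often the noisy max agrees with the plain max for one fixed vote vector. The vote counts, the helper name `noisy_max_agreement` and the epsilon values are hypothetical, chosen only for illustration.

```python
import numpy as np

def noisy_max_agreement(counts, epsilon, trials=2000, seed=0):
    """Fraction of trials in which the noisy max equals the plain max."""
    rng = np.random.default_rng(seed)
    beta = 1 / epsilon  # same noise scale as in the cell above
    noisy = counts + rng.laplace(0, beta, size=(trials, len(counts)))
    return float(np.mean(noisy.argmax(axis=1) == counts.argmax()))

# Hypothetical votes: 7 of the 10 teachers pick label 3
counts = np.array([1, 0, 0, 7, 0, 1, 0, 0, 1, 0])

for eps in (0.1, 1.0, 5.0):
    print(eps, noisy_max_agreement(counts, eps))
```

A smaller epsilon means a larger noise scale, so the released label matches the teachers' majority less often. That is the accuracy cost paid for the stronger privacy guarantee.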

In [15]:
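epsilon = 0.1
beta = 1 / epsilon #scale of the Laplace noise

newLabels = []
for anImage in preds:
    #count the votes for each label on this example
    labelCounts = np.bincount(anImage, minlength=numLabels)

    #add Laplace noise to every count (the integer array truncates the noisy value on assignment)
    for i in range(len(labelCounts)):
        labelCounts[i] += np.random.laplace(0, beta)

    #the noisy max is the label we release for this example
    newLabel = np.argmax(labelCounts)
    newLabels.append(newLabel)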

In [16]:
len(newLabels)

10000
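
The same noisy-max labeling can also be written without Python loops. This is a minimal sketch under a few assumptions: the function name `noisy_max_labels` is made up for illustration, it uses NumPy's `default_rng` generator, and it keeps the noisy counts as floats instead of truncating them to integers as the loop does.

```python
import numpy as np

def noisy_max_labels(preds, num_labels, epsilon, rng=None):
    """preds: int array of shape (num_examples, num_teachers)."""
    rng = np.random.default_rng() if rng is None else rng
    num_examples, num_teachers = preds.shape

    # Vote counts per example: one row per example, one column per label
    counts = np.zeros((num_examples, num_labels))
    rows = np.repeat(np.arange(num_examples), num_teachers)
    np.add.at(counts, (rows, preds.ravel()), 1)  # one vote per (example, label) pair

    # Laplace noise with scale 1 / epsilon on every count, then the max
    counts += rng.laplace(0, 1 / epsilon, size=counts.shape)
    return counts.argmax(axis=1)

rng = np.random.default_rng(0)
toy_preds = rng.integers(0, 10, size=(1000, 10))  # fake predictions, as in the cells above
toy_labels = noisy_max_labels(toy_preds, num_labels=10, epsilon=0.1, rng=rng)
print(toy_labels.shape)
```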

In [17]:
from syft.frameworks.torch.differential_privacy import pate

In [19]:
numTeachers, numExamples, numLabels = (100,100,10)
preds = (np.random.rand(numTeachers, numExamples) * numLabels).astype(int)#fake preds
indices = (np.random.rand(numExamples) * numLabels).astype(int)#stand-in for the aggregated (noisy-max) labels; random here

In [23]:
data_dep_eps, data_indep_eps = pate.perform_analysis(teacher_preds=preds, indices=indices, noise_eps=0.1,delta=1e-5)

In [25]:
print("Data Independent epsilon : ",data_indep_eps) #Slightly higher
print("Data Dependent epsilon : ",data_dep_eps)
#The two values are almost identical because the teachers barely agree with each other, which is not surprising since we generated the predictions at random

Data Independent epsilon :  11.756462732485115
Data Dependent epsilon :  11.756462732485105


### Let's play around

In [26]:
preds.shape

(100, 100)

In [27]:
preds[:,0:5] *= 0 #force all teachers to predict label 0 for the first 5 examples

In [31]:
data_dep_eps, data_indep_eps = pate.perform_analysis(teacher_preds=preds, indices=indices, noise_eps=0.1,delta=1e-5)
print("Data Independent epsilon : ",data_indep_eps)
print("Data Dependent epsilon : ",data_dep_eps)
#With this agreement the data-dependent epsilon is only about 7.87, as opposed to 11.76

Data Independent epsilon :  11.756462732485115
Data Dependent epsilon :  7.867987172744541


In [32]:
preds[:,0:50] *= 0 #now force all teachers to agree on the first 50 examples

In [34]:
data_dep_eps, data_indep_eps = pate.perform_analysis(teacher_preds=preds, indices=indices, noise_eps=0.1,delta=1e-5)
print("Data Independent epsilon : ",data_indep_eps)
print("Data Dependent epsilon : ",data_dep_eps)
#Significantly lower privacy leakage

Data Independent epsilon :  11.756462732485115
Data Dependent epsilon :  1.52655213289881


The more the predictions agree with each other, the tighter (smaller) the data-dependent epsilon becomes, while the data-independent epsilon stays the same.
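
To make "agreement" concrete, the sketch below measures, for each example, the fraction of teachers that voted for the most popular label, before and after forcing agreement on half of the examples. The helper name `teacher_agreement` and the toy array are assumptions for illustration, and the snippet does not call `pate.perform_analysis`.

```python
import numpy as np

def teacher_agreement(preds, num_labels):
    """preds: int array of shape (num_teachers, num_examples)."""
    num_teachers = preds.shape[0]
    # Largest vote count for each example (one column per example)
    top_votes = np.array([np.bincount(col, minlength=num_labels).max()
                          for col in preds.T])
    return top_votes / num_teachers

rng = np.random.default_rng(0)
toy_preds = rng.integers(0, 10, size=(100, 100))  # random, mostly disagreeing teachers

before = teacher_agreement(toy_preds, 10).mean()
toy_preds[:, 0:50] = 0  # all teachers now agree on the first 50 examples
after = teacher_agreement(toy_preds, 10).mean()
print("mean agreement before:", before, "after:", after)
```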