PATE analysis is a formal set of mechanisms that looks at the teachers' predicted labels before any noise is added and computes the epsilon budget spent on the queries.

In [1]:
import numpy as np

In [3]:
# Here's how we processed the predictions for one of our images without differential privacy
# (that is, without adding noise to the number of times each label was predicted by the different models)
labels = np.array([9, 9, 3, 6, 9, 9, 9, 9, 8, 2])
counts = np.bincount(labels, minlength=10)
query_result = np.argmax(counts)
query_result

9
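
For comparison, here is a minimal sketch of the same query with noise added to the vote counts before taking the argmax. The epsilon value, the Laplace scale of 1 / epsilon, and the names rng, beta, noisy_counts and noisy_result are assumptions made for illustration; they are not taken from the cells above.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

labels = np.array([9, 9, 3, 6, 9, 9, 9, 9, 8, 2])
counts = np.bincount(labels, minlength=10)

epsilon = 0.1         # assumed noise parameter for this single query
beta = 1.0 / epsilon  # assumed Laplace scale

# Add independent Laplace noise to every label count, then take the argmax
noisy_counts = counts + rng.laplace(loc=0.0, scale=beta, size=counts.shape)
noisy_result = int(np.argmax(noisy_counts))
print(noisy_result)
```

With a scale this large relative to the counts, the noisy result can differ from the non-private answer of 9. That is the price paid for privacy on a single query.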

In [None]:
# Now we use the PATE analysis framework provided in the PySyft library to compute two values: the data-dependent epsilon,
# which is lower the more the teacher models agree, and the data-independent epsilon, the upper bound that applies
# when the models don't agree

from syft.frameworks.torch.differential_privacy import pate

# First we generate a synthetic dataset with random synthetic predictions, so the teachers disagree as much as possible.
# This should give almost equal data-dependent and data-independent epsilons

num_teachers, num_labels, num_examples = (100, 10, 100)
preds = (np.random.rand(num_teachers, num_examples)*num_labels).astype(int) # fake predictions
indices = (np.random.rand(num_examples)*num_labels).astype(int) # true answers

data_dep_eps, data_indep_eps = pate.perform_analysis(teacher_preds=preds, indices=indices, noise_eps=0.1, delta=1e-5)

assert data_dep_eps < data_indep_eps
print("Data Dependent Epsilon:"+str(data_dep_eps))
print("Data Independent Epsilon:"+str(data_indep_eps))

In [None]:
# In the output above, the two values are almost equal.
# Now we make the teachers agree more: every teacher predicts label 0 for the first 50 examples
preds[:, 0:50] *= 0

data_dep_eps, data_indep_eps = pate.perform_analysis(teacher_preds=preds, indices=indices, noise_eps=0.1, delta=1e-5, moments=20)
print("Data Dependent Epsilon:"+str(data_dep_eps))
print("Data Independent Epsilon:"+str(data_indep_eps))

Now that the teachers fully agree on half of the examples, we get a much lower value for the data-dependent epsilon.
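
To see how much agreement the zeroing step introduces, the sketch below measures, per example, the fraction of teachers that voted for the most popular label, before and after the change. It uses only NumPy and does not call PySyft. The helper top_vote_fraction is a hypothetical name introduced here, and the predictions are regenerated with a seeded generator, so they will not match the arrays above.

```python
import numpy as np

rng = np.random.default_rng(0)
num_teachers, num_labels, num_examples = 100, 10, 100
preds = rng.integers(0, num_labels, size=(num_teachers, num_examples))  # fake predictions

def top_vote_fraction(preds, num_labels):
    """Per example, the fraction of teachers voting for the most popular label."""
    num_teachers, num_examples = preds.shape
    fractions = np.empty(num_examples)
    for i in range(num_examples):
        counts = np.bincount(preds[:, i], minlength=num_labels)
        fractions[i] = counts.max() / num_teachers
    return fractions

before = top_vote_fraction(preds, num_labels)
preds[:, 0:50] *= 0  # same step as above: all teachers predict label 0
after = top_vote_fraction(preds, num_labels)

print("Mean agreement before:", before.mean())
print("Mean agreement after, first 50 examples:", after[:50].mean())
print("Mean agreement after, last 50 examples:", after[50:].mean())
```

The first 50 examples go to full agreement while the remaining 50 are unchanged, which is the kind of consensus the data-dependent epsilon rewards.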