# This notebook teaches you how to calibrate noise to privacy budgets using `autodp.privacy_calibrator`

The hallmark of differentially private algorithm design is the ability to **calibrate noise to privacy** requirements.
`autodp.privacy_calibrator` provides a suite of modern tools that allow us to achieve this goal. 

Below are a few examples of how to use these tools.

### 1. Three ways to calibrate the Gaussian mechanism
Suppose we want to run the Gaussian mechanism under a prescribed privacy budget $(\epsilon,\delta)$. How much noise should we add?

In [1]:
from autodp import privacy_calibrator

### Classic calibration
Let's first try the simplest approach --- the classical Gaussian mechanism --- which simply sets
$$\sigma = \frac{\sqrt{2\log(1.25/\delta)}}{\epsilon}$$

In [2]:
eps = 0.1
delta = 1e-8
params = privacy_calibrator.classical_gaussian_mech(eps,delta)
# Assume the L2 sensitivity is 1
print(f'eps,delta = ({eps},{delta}) ==> Noise level sigma=',params['sigma'])

eps,delta = (0.1,1e-08) ==> Noise level sigma= 61.063613216491824
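As a sanity check, the closed-form expression above can be evaluated directly. This is a minimal sketch using only the standard library; `classical_sigma` is a hypothetical helper name (not part of `autodp`), and the `sensitivity` argument is an assumption that simply rescales the formula for L2 sensitivity other than 1.

```python
import math

def classical_sigma(eps, delta, sensitivity=1.0):
    """Classical Gaussian calibration: sigma = sensitivity * sqrt(2 log(1.25/delta)) / eps."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps

# Same budget as the cell above (L2 sensitivity 1)
sigma = classical_sigma(0.1, 1e-8)
print(sigma)  # about 61.06, in line with the value printed above
```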


### RDP-based calibration
The classical Gaussian mechanism, however, should only be used for $0<\epsilon \leq 1$. The RDP-based Gaussian mechanism (RDP: Rényi differential privacy) removes this restriction.

In [3]:
eps = 2
delta = 1e-8
params = privacy_calibrator.gaussian_mech(eps,delta)
print(f'eps,delta = ({eps},{delta}) ==> Noise level sigma=',params['sigma'])

eps,delta = (2,1e-08) ==> Noise level sigma= 3.0348542587703093
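To see what an RDP-based calibration involves, here is a minimal sketch under stated assumptions. It uses the Gaussian RDP curve $\epsilon(\alpha)=\alpha/(2\sigma^2)$ (for sensitivity 1) and the standard conversion $\epsilon = \min_{\alpha>1}\,\epsilon(\alpha) + \log(1/\delta)/(\alpha-1)$. All function names are hypothetical. The value printed by `autodp` above is somewhat smaller than what this sketch returns, so the library presumably uses a tighter conversion; that is an inference, not something this notebook states.

```python
import math

def rdp_gaussian(alpha, sigma, sensitivity=1.0):
    """Renyi DP of the Gaussian mechanism at order alpha."""
    return alpha * sensitivity ** 2 / (2.0 * sigma ** 2)

def eps_from_rdp(sigma, delta, alphas):
    """Standard RDP -> (eps, delta) conversion, minimized over a grid of orders."""
    return min(rdp_gaussian(a, sigma) + math.log(1.0 / delta) / (a - 1.0)
               for a in alphas)

def rdp_sigma(eps, delta, sensitivity=1.0):
    """Closed-form sigma obtained by optimizing the conversion over continuous alpha > 1."""
    L = math.log(1.0 / delta)
    return sensitivity * (math.sqrt(2.0 * L + 2.0 * eps) + math.sqrt(2.0 * L)) / (2.0 * eps)

sigma = rdp_sigma(2.0, 1e-8)
alphas = [1.0 + 0.01 * i for i in range(1, 10000)]  # orders in (1, 101)
print(sigma, eps_from_rdp(sigma, 1e-8, alphas))     # the second number should be close to 2
```

Unlike the classical formula, nothing here requires $\epsilon \leq 1$.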


**Homework**: try RDP-based calibration for $\epsilon <1$ and compare with the classical calibration. 

### Analytical calibration.

This is the optimal calibration; it dominates both the classical and the RDP-based approaches.

See: Balle and Wang (ICML 2018) https://arxiv.org/pdf/1805.06530.pdf

In [4]:
eps = 0.1
delta = 1e-8
params = privacy_calibrator.ana_gaussian_mech(eps,delta)
# Assume the L2 sensitivity is 1
print(f'eps,delta = ({eps},{delta}) ==> Noise level sigma=',params['sigma'])

# Note that the classical Gaussian mechanism does not work with eps > 1.
# The analytical calibration has no such restriction, so we can use it here as well
eps = 2
delta = 1e-8
params = privacy_calibrator.ana_gaussian_mech(eps,delta)
print(f'eps,delta = ({eps},{delta}) ==> Noise level sigma=',params['sigma'])

eps,delta = (0.1,1e-08) ==> Noise level sigma= 45.93737105307458
eps,delta = (2,1e-08) ==> Noise level sigma= 2.6529219041434144


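Note that the analytical Gaussian mechanism always improves over both the classical calibration and the RDP-based calibration. In the runs above, it gives $\sigma \approx 45.9$ versus $61.1$ (classical) at $\epsilon=0.1$, and $\sigma \approx 2.65$ versus $3.03$ (RDP-based) at $\epsilon=2$.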
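For readers who want to see the idea behind the analytical calibration, here is a minimal sketch. It assumes the exact characterization from Balle and Wang: a Gaussian mechanism with sensitivity $\Delta$ is $(\epsilon,\delta)$-DP if and only if $\Phi\left(\frac{\Delta}{2\sigma}-\frac{\epsilon\sigma}{\Delta}\right) - e^{\epsilon}\,\Phi\left(-\frac{\Delta}{2\sigma}-\frac{\epsilon\sigma}{\Delta}\right) \leq \delta$. The helper names and the plain bisection are illustrative choices, not the `autodp` implementation.

```python
import math

def Phi(x):
    """Standard normal CDF, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def gaussian_delta(sigma, eps, sensitivity=1.0):
    """Smallest delta for which the Gaussian mechanism with this sigma is (eps, delta)-DP."""
    a = sensitivity / (2.0 * sigma)
    b = eps * sigma / sensitivity
    return Phi(a - b) - math.exp(eps) * Phi(-a - b)

def analytic_sigma(eps, delta, sensitivity=1.0, iters=200):
    """Bisection for the smallest sigma whose exact delta is at most the target."""
    lo, hi = 1e-6, 1.0
    while gaussian_delta(hi, eps, sensitivity) > delta:  # grow the bracket until feasible
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gaussian_delta(mid, eps, sensitivity) > delta:
            lo = mid   # too little noise
        else:
            hi = mid   # feasible, try less noise
    return hi

for eps in (0.1, 2.0):
    print(eps, analytic_sigma(eps, 1e-8))
```

If the library implements the same characterization, these values should land close to the ones printed above, and in any case below the classical $\sigma$.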

### 2. $(\epsilon,\delta)$-DP calibration for pure DP mechanisms.
Usually, the Laplace mechanism and randomized response are $\epsilon$-DP algorithms, but what if we want to calibrate them for $(\epsilon,\delta)$-DP?

In [5]:
eps = 1
delta = 0
params = privacy_calibrator.laplace_mech(eps,delta)

# Assume the L1 sensitivity is 1
print(f'eps,delta = ({eps},{delta}) ==> Noise level b=',params['b'])

eps = 1
delta = 1e-6
params = privacy_calibrator.laplace_mech(eps,delta)
print(f'eps,delta = ({eps},{delta}) ==> Noise level b=',params['b'])

eps = 1
delta = 0.5
params = privacy_calibrator.laplace_mech(eps,delta)
print(f'eps,delta = ({eps},{delta}) ==> Noise level b=',params['b'])


eps,delta = (1,0) ==> Noise level b= 1.0
eps,delta = (1,1e-06) ==> Noise level b= 1.0
eps,delta = (1,0.5) ==> Noise level b= 0.9999996965588319


Note that the noise levels are identical between $\delta=0$ and $\delta=10^{-6}$. As $\delta$ gets larger, we can add less noise, although for a single release the saving is tiny even at $\delta=0.5$ ($b \approx 0.9999997$). Clearly $\delta=0.5$ is not regarded as an acceptable privacy level, but when we compose multiple such Laplace mechanisms, the ranges of $\delta$ where we get a benefit become more reasonable.

### 3. Calibrating the noise for multiple rounds.

Suppose I know that I will be running $k$ rounds of the Laplace mechanism. How should I calibrate the noise level for each one?

In [6]:

# Let k be the number of rounds

klist = [2,4,8,16,32,64,128,256,512,1024,2048]
b_list=[]
b_list0=[]

delta = 1e-6
eps = 1
for k in klist:
    params0 = privacy_calibrator.laplace_mech(eps,0,k=k)
    params = privacy_calibrator.laplace_mech(eps,delta,k=k)
    b_list0.append(params0['b'])
    b_list.append(params['b'])
    print(f'eps,delta = ({eps},0) over {k} rounds ==> Noise level b=',params0['b'])
    print(f'eps,delta = ({eps},{delta}) over {k} rounds ==> Noise level b=',params['b'])


eps,delta = (1,0) over 2 rounds ==> Noise level b= 2.0
eps,delta = (1,1e-06) over 2 rounds ==> Noise level b= 2.0
eps,delta = (1,0) over 4 rounds ==> Noise level b= 4.0
eps,delta = (1,1e-06) over 4 rounds ==> Noise level b= 4.0
eps,delta = (1,0) over 8 rounds ==> Noise level b= 8.0
eps,delta = (1,1e-06) over 8 rounds ==> Noise level b= 8.0
eps,delta = (1,0) over 16 rounds ==> Noise level b= 16.0
eps,delta = (1,1e-06) over 16 rounds ==> Noise level b= 16.0
eps,delta = (1,0) over 32 rounds ==> Noise level b= 32.0
eps,delta = (1,1e-06) over 32 rounds ==> Noise level b= 27.254774615541674
eps,delta = (1,0) over 64 rounds ==> Noise level b= 64.0
eps,delta = (1,1e-06) over 64 rounds ==> Noise level b= 40.35369243930351
eps,delta = (1,0) over 128 rounds ==> Noise level b= 128.0
eps,delta = (1,1e-06) over 128 rounds ==> Noise level b= 58.24937511027943
eps,delta = (1,0) over 256 rounds ==> Noise level b= 256.0
eps,delta = (1,1e-06) over 256 rounds ==> Noise level b= 83.20083600159232
...

### Let's plot it on a log-log scale so we can see the scaling.

In [7]:
import matplotlib.pyplot as plt
#%matplotlib inline 


plt.figure(num=1, figsize=(8, 6), dpi=80, facecolor='w', edgecolor='k')
plt.loglog(klist, b_list,'-o')
plt.loglog(klist, b_list0,'--x')
plt.grid()
plt.xlabel(r'Number of rounds $k$')
plt.ylabel(r'Calibrated Noise level $b$')
plt.legend([r'$\epsilon=1,\delta=1e-6$',r'$\epsilon=1,\delta = 0$'])
plt.show()

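[Figure: log-log plot of the calibrated noise level $b$ against the number of rounds $k$, for $\epsilon=1,\delta=10^{-6}$ and for $\epsilon=1,\delta=0$]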

As we can see, as the number of rounds gets large, having a non-zero $\delta$ allows us to get away with adding only $O(\sqrt{k})$ noise rather than $O(k)$ noise. This is what we call strong composition. Calculating strong composition and calibrating the noise to privacy for a sequence of mechanisms by hand is a pain. However, everything becomes extremely easy with `autodp.privacy_calibrator`.
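The scaling can also be read off numerically from the values printed earlier. The sketch below hard-codes the noise levels reported for $k=32,64,128,256$ and computes the local slope on the log-log scale between consecutive points; `local_slopes` is just an illustrative helper. A slope of 1 means linear growth in $k$, and a slope of 0.5 means $\sqrt{k}$ growth.

```python
import math

ks = [32, 64, 128, 256]
# Noise levels copied from the printed output above
b_delta = [27.254774615541674, 40.35369243930351, 58.24937511027943, 83.20083600159232]
b_pure = [32.0, 64.0, 128.0, 256.0]

def local_slopes(xs, ys):
    """Slope of log(y) against log(x) between consecutive points."""
    return [math.log(ys[i + 1] / ys[i]) / math.log(xs[i + 1] / xs[i])
            for i in range(len(xs) - 1)]

print(local_slopes(ks, b_pure))   # slope 1 everywhere: O(k) noise when delta = 0
print(local_slopes(ks, b_delta))  # slopes between 0.5 and 0.6, decreasing toward 0.5
```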

### 4. Calibrating the noise of subsampled mechanisms
Subsampling a dataset before applying differentially private mechanisms has the known effect of amplifying privacy. Let $\gamma$ be the proportion of the data points that we randomly select in the subsampling step, before applying the base mechanism (Gaussian or Laplace). Then the question becomes: how can we calibrate the noise level ($\sigma$ or $b$) given $\epsilon,\delta,\gamma$?

Here is how you can do it with `autodp.privacy_calibrator`.


In [8]:
# Specify the input
eps = 1
delta = 1e-6
gamma = 0.01

# First, apply subsampling lemma to calibrate the basic privacy needed
eps0,delta0 = privacy_calibrator.subsample_epsdelta_inverse(eps,delta,gamma)
# Then we can get the amount of noise needed from the base mechanism

print((eps0,delta0))
params = privacy_calibrator.gaussian_mech(eps0,delta0)
params2 = privacy_calibrator.laplace_mech(eps0,delta0)

print(f'Gaussian: eps,delta,gamma = ({eps},{delta},{gamma}) ==> Noise level sigma=',params['sigma'])
print(f'Laplace: eps,delta,gamma = ({eps},{delta},{gamma}) ==> Noise level b=',params2['b'])
print('The corresponding variances of the two approaches are: ',(params['sigma']**2, 2 * params2['b']**2))

(5.152297938244442, 9.999999999999999e-05)
Gaussian: eps,delta,gamma = (1,1e-06,0.01) ==> Noise level sigma= 0.8330131727674832
Laplace: eps,delta,gamma = (1,1e-06,0.01) ==> Noise level b= 0.19408815483615705
The corresponding variances of the two approaches are:  (0.6939109460041488, 0.07534042369540814)


The result above is interesting: adding Laplace noise produces nearly an order of magnitude smaller variance than adding Gaussian noise (about 0.075 versus 0.69)!
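The pair $(\epsilon_0,\delta_0)$ printed above is consistent with the standard amplification-by-subsampling bound, $\epsilon = \log\left(1+\gamma(e^{\epsilon_0}-1)\right)$ and $\delta = \gamma\delta_0$, solved for the base mechanism's budget. Here is a minimal sketch of that bound and its inverse. The function names are hypothetical stand-ins, and whether `autodp` uses exactly this form internally is an assumption.

```python
import math

def subsample_epsdelta_sketch(eps0, delta0, gamma):
    """Privacy of a (eps0, delta0)-DP mechanism run on a gamma-subsample."""
    return math.log(1.0 + gamma * math.expm1(eps0)), gamma * delta0

def subsample_epsdelta_inverse_sketch(eps, delta, gamma):
    """Budget the base mechanism may spend so the subsampled one is (eps, delta)-DP."""
    return math.log(1.0 + math.expm1(eps) / gamma), delta / gamma

eps0, delta0 = subsample_epsdelta_inverse_sketch(1.0, 1e-6, 0.01)
print(eps0, delta0)   # about 5.1523 and 1e-4, as printed above
print(1.0 / eps0)     # about 0.1941, the Laplace scale printed above
```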


### 5. Calibrating the noise for multiple rounds of subsampled mechanisms

Sometimes we know in advance that we will apply our mechanisms for a fixed number of rounds, and we would like to calibrate the noise so that we achieve an overall prescribed privacy budget. This is traditionally a challenging task because there are several internal parameters to optimize over.

With `autodp`, we can simply call, e.g., `privacy_calibrator.gaussian_mech` with two additional arguments, `prob` (the subsampling probability) and `k` (the number of rounds).

This approach is most powerful when $k$ is large, in which case it is not only simpler but also often requires more than an order of magnitude less noise.

See details in 
- Wang, Balle, Kasiviswanathan (2018): Subsampled Rényi Differential Privacy and Analytical Moments Accountant.


In [9]:
# Specify the input
eps = 1.0
delta = 1e-6
gamma = 0.001

# Using RDP
params = privacy_calibrator.gaussian_mech(eps,delta,prob=gamma,k=10000)
print(f'eps,delta,gamma = ({eps},{delta},{gamma}) ==> Noise level sigma=',params['sigma'])




eps,delta,gamma = (1.0,1e-06,0.001) ==> Noise level sigma= 1.2912729645090977


Let's check what kind of privacy guarantee the classical advanced composition will give us.

In [10]:

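from autodp import rdp_bank, dp_bank, dp_acct
import numpy as np

k = 10000

def find_eps_classic(logdelta0):
    # Uses `params`, `gamma` and `delta` from the previous cell
    acct = dp_acct.DP_acct()
    # Per-round epsilon of the Gaussian mechanism at delta0 = exp(logdelta0)
    func = lambda x: rdp_bank.RDP_gaussian(params, x)
    eps0 = dp_bank.get_eps_rdp(func, np.exp(logdelta0))
    # Amplify the per-round guarantee by subsampling
    eps1, delta1 = privacy_calibrator.subsample_epsdelta(eps0, np.exp(logdelta0), gamma)
    # Compose the k rounds with the (eps, delta)-DP accountant
    for i in range(k):
        acct.update_DPlosses(eps1, delta1)

    # Overall epsilon at the target delta
    return acct.get_eps(delta)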

eps_alt = find_eps_classic(np.log(delta/11.0))

# 11.0 is a very carefully chosen number. One can 
print('The same mechanism with advanced composition only gives epsilon = ', eps_alt)

The same mechanism with advanced composition only gives epsilon =  74.9998135868238
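For reference, the textbook advanced composition bound says that $k$ rounds of an $(\epsilon,\delta)$-DP mechanism are $(\epsilon', k\delta+\delta')$-DP with $\epsilon' = \sqrt{2k\log(1/\delta')}\,\epsilon + k\epsilon(e^{\epsilon}-1)$. The sketch below evaluates that formula next to basic composition. It is only an illustration with hypothetical helper names and made-up per-round budgets; it is not necessarily the rule that `dp_acct.DP_acct` applies, and it does not reproduce the number printed above.

```python
import math

def basic_composition(eps, delta, k):
    """k-fold basic composition: budgets add up linearly."""
    return k * eps, k * delta

def advanced_composition(eps, delta, k, delta_slack):
    """Textbook advanced composition with an extra failure probability delta_slack."""
    eps_total = (math.sqrt(2.0 * k * math.log(1.0 / delta_slack)) * eps
                 + k * eps * math.expm1(eps))
    return eps_total, k * delta + delta_slack

# Illustrative per-round budget (an assumption, not taken from the cell above)
print(basic_composition(0.01, 0.0, 10000))
print(advanced_composition(0.01, 0.0, 10000, 1e-6))
```

The first term grows like $\sqrt{k}$, which is where the saving over basic composition comes from when the per-round $\epsilon$ is small.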
