# Towards Evaluating The Differential Privacy of a Function

Intuitively, we want to be able to query our database and evaluate whether or not the result of the query is leaking "private" information. As mentioned previously, this is about evaluating whether the output of a query changes when we remove someone from the database. Specifically, we want to evaluate the *maximum* amount the query changes when someone is removed (maximum over all possible people who could be removed). So, in order to evaluate how much privacy is leaked, we're going to iterate over each person in the database and measure the difference in the output of the query relative to when we query the entire database. 
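
As a rough formalization of the quantity described above (the notation $D_{-i}$ for "database $D$ with row $i$ removed" is an assumption introduced here, not something defined in this notebook), the value we are about to measure for a query $f$ is:

$$
\text{sensitivity}(f, D) = \max_{i} \left| f(D) - f(D_{-i}) \right|
$$

Note that this is measured on one particular database $D$. The formal definition used in the differential privacy literature additionally takes the maximum over all possible databases, so what we compute below is an empirical estimate.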

For the sake of argument, let's make our first "database query" a simple sum. In other words, we're going to count the number of 1s in the database.

In [1]:
import torch

In [2]:
def getParallelDB(db , removeIndex):
    # Return a copy of db with the entry at removeIndex left out
    return(torch.cat((db[:removeIndex] , db[removeIndex+1:])))

In [3]:
def getParallelDatabases(db):
    parallelDatabases = []
    for i in range(len(db)):
        pdb = getParallelDB(db , i)
        parallelDatabases.append(pdb)
    
    return parallelDatabases

In [4]:
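def get_createDB_and_parallels(numOfEntries):
    # Random binary database: each entry is 1 with probability 0.5
    db = torch.rand(numOfEntries)>0.5
    # One parallel database per entry, each with that entry removed
    pdbs = getParallelDatabases(db)
    return(db , pdbs)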

In [6]:
db , pdbs = get_createDB_and_parallels(20)#database with 20 entries

In [14]:
#db

In [9]:
#pdbs

In [10]:
def query(db):
    return db.sum()

In [13]:
query(db)#Looks like there are 12 1's in database

tensor(12)

In [16]:
query(pdbs[0]) , query(pdbs[1]) , query(pdbs[2]) , query(pdbs[3]) , query(pdbs[4])

(tensor(11), tensor(11), tensor(12), tensor(11), tensor(12))

So we can see that the output of the query changes when we remove people from the database.<br>
Let's find the maximum amount by which this query changes when we remove someone from the database.

In [17]:
full_db_result = query(db)

In [20]:
# Let's iterate over every parallel database
maxDistance = 0
for pdb in pdbs:
    pdb_result = query(pdb)
    
    # how much it differs from the full-database result
    db_distance = torch.abs(full_db_result - pdb_result)# absolute difference, i.e. the L1 distance (squaring instead would give the squared L2 distance)
    
    if db_distance > maxDistance:
        maxDistance = db_distance

In [22]:
sensitivity = maxDistance

In [23]:
sensitivity

tensor(1)

In [24]:
# Sum has consistent sensitivity

# Project - Evaluating the Privacy of a Function

In the last section, we measured the difference between each parallel database's query result and the query result for the entire database, and then took the maximum of those differences (which was 1). This value is called "sensitivity", and it corresponds to the function we chose for the query. Namely, for a database of 0s and 1s like ours, the "sum" query has a sensitivity of exactly 1 (as long as at least one entry is 1). However, we can calculate sensitivity for other functions as well.

Let's try to calculate sensitivity for the "mean" function.

In [29]:
def sensitivity(query , numOfEntries = 1000):
    
    db, pdbs = get_createDB_and_parallels(numOfEntries) 
    full_db_result = query(db)
    # Let's iterate over every parallel database
    maxDistance = 0
    for pdb in pdbs:
        pdb_result = query(pdb)

        # how much it differs from the full-database result
        db_distance = torch.abs(full_db_result - pdb_result)# absolute difference, i.e. the L1 distance (squaring instead would give the squared L2 distance)

        if db_distance > maxDistance:
            maxDistance = db_distance
    return maxDistance

In [30]:
def query(db):
    return db.sum()

In [31]:
sensitivity(query) # same as above

tensor(1)

In [36]:
# Let's change the query function to a mean
def query(db):
    return db.float().mean()

In [37]:
sensitivity(query)

tensor(0.0005)

In [38]:
# this is roughly the average value (0.5) divided by numOfEntries
0.5 / 1000

0.0005

Wow! That sensitivity is WAY lower. Note the intuition here. "Sensitivity" measures how sensitive the output of the query is to a person being removed from the database. For a simple sum over 0s and 1s, this is 1, but for the mean, removing a person changes the result of the query by at most roughly 1 divided by the size of the database (here about 0.5 / 1000), which is much smaller. Thus, "mean" is a VASTLY less "sensitive" function (query) than "sum".
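
To make the "roughly 1 divided by the size of the database" intuition concrete: removing entry `x_i` from a database of `n` entries with mean `m` changes the mean by exactly `|x_i - m| / (n - 1)`. The following is a minimal plain-Python sketch (lists instead of PyTorch tensors, and the helper names `mean` and `empirical_sensitivity` are hypothetical, introduced only for this illustration) that checks this formula against a brute-force measurement on a fixed database.

```python
def mean(values):
    # Arithmetic mean of a list of numbers
    return sum(values) / len(values)


def empirical_sensitivity(query, db):
    # Largest change in the query output when any single entry is removed
    full_result = query(db)
    return max(
        abs(full_result - query(db[:i] + db[i + 1:]))
        for i in range(len(db))
    )


# Fixed database of 1000 entries, half of them 1 (assumed for illustration)
db = [1, 0] * 500

sens = empirical_sensitivity(mean, db)

# Closed form: removing x_i shifts the mean by |x_i - m| / (n - 1)
predicted = max(abs(x - mean(db)) for x in db) / (len(db) - 1)

print(sens, predicted)
```

Both values should agree up to floating-point error, and both are close to the `0.5 / 1000` estimate used above.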

# Project: Calculate L1 Sensitivity For Threshold

In this first project, I want you to calculate the sensitivity for the "threshold" function. 

- First compute the sum over the database (i.e. sum(db)) and return whether that sum is greater than a certain threshold.
- Then, I want you to create databases of size 10 and threshold of 5 and calculate the sensitivity of the function. 
- Finally, re-initialize the database 10 times and calculate the sensitivity each time.

In [99]:
def query(db, threshold=5):
    return (db.sum() > threshold).float()

In [40]:
size = 10

In [117]:
db, pdbs = get_createDB_and_parallels(size)
db.sum() , query(db)

# Re-run this cell to sample a new database each time

(tensor(5), tensor(0.))

In [118]:
bool(query(db))

False

In [132]:
for i in range(10):
    
    sens = sensitivity(query , numOfEntries=10)
    print(sens)
    

0
0
0
0
0
tensor(1.)
0
tensor(1.)
0
0
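
The mix of 0s and 1s in the output above has a simple explanation: removing one person can only flip the threshold query when the full-database sum sits exactly one above the threshold. The sketch below illustrates this with plain Python lists rather than tensors; `threshold_query` and `empirical_sensitivity` are hypothetical helper names, and the three hand-built databases are assumptions chosen to show each case.

```python
def threshold_query(db, threshold=5):
    # 1.0 if the number of 1s exceeds the threshold, else 0.0
    return 1.0 if sum(db) > threshold else 0.0


def empirical_sensitivity(query, db):
    # Largest change in the query output when any single entry is removed
    full_result = query(db)
    return max(
        abs(full_result - query(db[:i] + db[i + 1:]))
        for i in range(len(db))
    )


results = {}
for ones in (5, 6, 7):
    # Database of size 10 containing exactly `ones` entries equal to 1
    db = [1] * ones + [0] * (10 - ones)
    results[ones] = empirical_sensitivity(threshold_query, db)

print(results)
```

Only the database whose sum is 6 (one above the threshold of 5) yields a sensitivity of 1, because removing one of its 1s drops the sum to 5 and flips the query output. For the other two, no single removal crosses the threshold.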


# Lesson: A Basic Differencing Attack

Sadly none of the functions we've looked at so far are differentially private (despite them having varying levels of sensitivity). The most basic type of attack can be done as follows.

Let's say we wanted to figure out a specific person's value in the database. All we would have to do is query for the sum of the entire database and then the sum of the entire database without that person!

# Project: Perform a Differencing Attack on Row 10

In this project, I want you to construct a database and then demonstrate how you can use two different sum queries to expose the value of the person represented by row 10 in the database. (Note: since the code below indexes row 10 from zero, the database needs at least 11 rows.)

In [150]:
db , _ = get_createDB_and_parallels(100)

In [151]:
pdb = getParallelDB(db, removeIndex=10)

In [155]:
sum(db)

tensor(51, dtype=torch.uint8)

In [152]:
#What was the value we removed
db[10]

tensor(1, dtype=torch.uint8)

> Differencing Attack using sum query

In [153]:
sum(db) - sum(pdb)#should give exact value 

tensor(1, dtype=torch.uint8)
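
A mean query leaks the same information, provided the attacker also knows the size of each database. The following is a minimal plain-Python sketch of that idea, not a cell from the original notebook: the list-based database, the fixed seed, and the variable names are all assumptions made for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical database of 100 binary entries
db = [random.randint(0, 1) for _ in range(100)]

target = 10                          # row whose value we want to expose
pdb = db[:target] + db[target + 1:]  # same database without that row

mean_full = sum(db) / len(db)
mean_without = sum(pdb) / len(pdb)

# n * mean(db) is the full sum and (n - 1) * mean(pdb) is the sum without
# the target row, so their difference is the target's value
recovered = round(len(db) * mean_full - len(pdb) * mean_without)

print(recovered, db[target])
```

The rounding only removes floating-point error; the two means determine the removed value exactly.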

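> Differencing Attack using Threshold query

In [156]:
# note: newer PyTorch versions return bool tensors from comparisons and do not allow subtracting them;
# if this line raises an error, cast each side with .float() after the comparison
(sum(db).float() > 50) - (sum(pdb).float() > 50)

tensor(1, dtype=torch.uint8)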

As we develop differentially private techniques, we specifically want them to be immune to these kinds of attacks.