Here our goal is to find the maximum amount by which a query on the database we generated in the previous
project changes when a value (a person) is removed from the database.

In [10]:
# So first we create our database and parallel database generating function

import torch

def create_parallel_db(db, remove_index):
    return torch.cat((db[0:remove_index], db[remove_index+1:]))

def create_db_and_pdbs(num_entries):
    
    db = torch.rand(num_entries) > 0.5
    pdbs = list()
    for i in range(len(db)):
        pdbs.append(create_parallel_db(db, i))
        
    return db, pdbs

In [28]:
db, pdbs = create_db_and_pdbs(20)

In [29]:
# Now we write a function to run a sum query on the database
# Note: in a production setting we do not need to recompute the privacy or the sensitivity of a function every time,
# as these are fixed values that are known in advance

def sum_query(db):
    return db.sum()

In [30]:
full_db_result = sum_query(db)

In [31]:
sum_query(pdbs[6])

tensor(8)

Here the output of the query can change when we remove a person from the database. That is, the output of the query is conditioned directly on information from many people in this database. To understand the role of sensitivity, consider that if the sensitivity of a query were zero, the output of the query would not change regardless of whom we remove from the database.

In [32]:
# Now we are ready to compute the maximum amount by which this query changes when we remove someone from the database

max_distance = 0
for pdb in pdbs:
    pdb_result = sum_query(pdb)
    # Absolute difference between the two results; its maximum is the L1 sensitivity.
    # (L2 sensitivity is defined with the Euclidean norm of the difference instead.)
    db_distance = torch.abs(pdb_result - full_db_result)
    if db_distance > max_distance:
        max_distance = db_distance

In [33]:
max_distance

tensor(1)

As we can easily guess, the value changes by 1: all the data is binary and we are removing only one person,
so the query either does not change or changes by at most 1.
The max distance is also independent of the number of entries.
Moreover, the computations we are performing here are largely unnecessary, because we already know what kind of database this is, what function we are using as a query, and the minimum and maximum values in the database.
This exercise serves more as a teaching tool that lays the foundation for later concepts than as a way of arriving at a specific result. The max_distance here can also be called the empirical sensitivity or the L1 sensitivity.
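
The claim that the maximum distance does not depend on the number of entries can be checked with a small sketch. It uses plain Python lists instead of torch tensors so that it runs on its own, and the helper name `empirical_sensitivity` is a hypothetical one chosen for this illustration, not part of the notebook above.

```python
import random

def empirical_sensitivity(query, db):
    """Largest change in query(db) when any single entry is removed."""
    full_result = query(db)
    return max(abs(query(db[:i] + db[i + 1:]) - full_result)
               for i in range(len(db)))

random.seed(0)  # fixed seed so the run is repeatable
results = {}
for n in (5, 50, 500):
    # Binary database, generated the same way as above but as a list of bools
    db = [random.random() > 0.5 for _ in range(n)]
    results[n] = empirical_sensitivity(sum, db)
    print(n, results[n])
```

For a sum over binary values the result can never exceed 1, whatever the size. It can be 0, though, in the unlucky case where every entry is 0, which is one reason an empirical sensitivity is only a lower bound on the true one.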

In [36]:
# The next step is to write a function that takes a query function and the number of entries in the database as arguments
# and returns the empirical L1 sensitivity

def sensitivity(query, num_entries):
    db = torch.rand(num_entries)>0.5
    pdbs = list()
    for i in range(len(db)):
        pdbs.append(torch.cat((db[0:i], db[i+1:])))
    full_db_result = query(db)
    max_distance = 0
    for pdb in pdbs:
        pdb_result = query(pdb)
        db_distance = torch.abs(pdb_result - full_db_result)
        if db_distance > max_distance:
            max_distance = db_distance
    return max_distance

In [39]:
sensitivity(sum_query, 20) # should return 1

tensor(1)

In [40]:
# Now let's calculate the L1 sensitivity of a mean query on the database

def mean_query(db):
    return db.float().mean()

In [42]:
sensitivity(mean_query, 1000)

tensor(0.0005)
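
The value 0.0005 can be explained by a short calculation. If a binary database has n entries of which s are ones, removing a one changes the mean by (n - s) / (n (n - 1)) and removing a zero changes it by s / (n (n - 1)), so the largest change is max(s, n - s) / (n (n - 1)). The sketch below compares that expression with a brute-force computation. It uses plain Python lists rather than torch, and both helper names are hypothetical ones introduced for this illustration.

```python
def mean_query(db):
    return sum(db) / len(db)

def empirical_sensitivity(query, db):
    """Largest change in query(db) when any single entry is removed."""
    full_result = query(db)
    return max(abs(query(db[:i] + db[i + 1:]) - full_result)
               for i in range(len(db)))

def closed_form_mean_sensitivity(n, s):
    """Largest change of the mean for n binary entries containing s ones."""
    return max(s, n - s) / (n * (n - 1))

# A balanced database: 500 ones and 500 zeros
db = [1] * 500 + [0] * 500
brute_force = empirical_sensitivity(mean_query, db)
closed_form = closed_form_mean_sensitivity(len(db), sum(db))
print(brute_force, closed_form)
```

Both numbers come out close to 0.0005 for a roughly balanced database of 1000 entries. Unlike the sum query, the sensitivity of the mean shrinks as the database grows, roughly in proportion to 1/n.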

Note that the sensitivity we compute here by removing one value from the database is not really about removing a single value. We are supposed to remove all the information related to a person and then compute the sensitivity. This matters when dealing with real-world databases that hold multiple fields of information about a single individual.
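
As a rough illustration of removing a whole person rather than a single value, the sketch below stores the database as a list of (person_id, value) records and removes every record belonging to one person at a time. The record layout, the names, and the helper functions are all assumptions made for this example; they do not come from the notebook.

```python
def remove_person(records, person_id):
    """Drop every record that belongs to the given person."""
    return [r for r in records if r[0] != person_id]

def sum_values(records):
    return sum(value for _, value in records)

def person_level_sensitivity(query, records):
    """Largest change in the query when all of one person's records are removed."""
    full_result = query(records)
    people = {person_id for person_id, _ in records}
    return max(abs(query(remove_person(records, p)) - full_result)
               for p in people)

# Hypothetical data: alice contributes three records, bob and carol one each
records = [("alice", 1), ("alice", 1), ("alice", 1), ("bob", 1), ("carol", 0)]
print(person_level_sensitivity(sum_values, records))  # 3
```

Removing single rows would give a maximum change of 1 here, but removing alice entirely changes the sum by 3, so the per-person sensitivity is larger than the per-row one.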

Thus we are trying to find out how much the output of the query passed to the sensitivity function relies on information from each individual, or whether it is only an aggregation of information that multiple individuals contribute to.

It will become apparent that it is a lot easier to preserve privacy if the output of our function is information that multiple people contribute to.