# Project: Local Differential Privacy

As you can see, the basic sum query is not differentially private at all! In truth, differential privacy always requires a form of randomness added to the query. Let me show you what I mean.

### Randomized Response (Local Differential Privacy)

Let's say I have a group of people I wish to survey about a very taboo behavior which I think they will lie about (say, I want to know if they have ever committed a certain kind of crime). I'm not a policeman, I'm just trying to collect statistics to understand the higher level trend in society. So, how do we do this? One technique is to add randomness to each person's response by giving each person the following instructions (assuming I'm asking a simple yes/no question):

- Flip a coin 2 times.
- If the first coin flip is heads, answer honestly
- If the first coin flip is tails, answer according to the second coin flip (heads for yes, tails for no)!

Thus, each person is now protected with "plausible deniability". If they answer "Yes" to the question "have you committed X crime?", then it might becasue they actually did, or it might be becasue they are answering according to a random coin flip. Each person has a high degree of protection. Furthermore, we can recover the underlying statistics with some accuracy, as the "true statistics" are simply averaged with a 50% probability. Thus, if we collect a bunch of samples and it turns out that 60% of people answer yes, then we know that the TRUE distribution is actually centered around 70%, because 70% averaged wtih 50% (a coin flip) is 60% which is the result we obtained. 

However, it should be noted that, especially when we only have a few samples, this comes at the cost of accuracy. This tradeoff exists across all of Differential Privacy. The greater the privacy protection (plausible deniability) the less accurate the results. 

Let's implement this local DP for our database before!

In [1]:
import torch

In [2]:
def getParallelDB(db , removeIndex):
    return(torch.cat((db[:removeIndex] , db[removeIndex+1:])))

In [3]:
def getParallelDatabases(db):
    parallelDatabases = []
    for i in range(len(db)):
        pdb = getParallelDB(db , i)
        parallelDatabases.append(pdb)
    
    return parallelDatabases

In [4]:
def get_createDB_and_parallels(numOfEntries):
    db = torch.rand(numOfEntries)>0.5
    pdbs = getParallelDatabases(db)
    return(db , pdbs)

In [7]:
db, pdbs = get_createDB_and_parallels(100)

In [10]:
trueResult = torch.mean(db.float())

In [11]:
trueResult

tensor(0.5600)

We want to noise our dataset with LocalDifferential Privacy and it is all about adding noise to the data itself
# -
So adding noise to data means replacing some of these values random values 

In [14]:
firstCoinFlip = (torch.rand(len(db)) > 0.5).float()

In [15]:
firstCoinFlip

tensor([0., 0., 0., 0., 1., 1., 0., 0., 0., 1., 0., 0., 0., 0., 1., 1., 1., 1.,
        0., 1., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 1., 0., 1., 1.,
        0., 0., 0., 1., 0., 1., 0., 1., 0., 1., 1., 1., 0., 1., 1., 0., 0., 1.,
        1., 0., 1., 1., 0., 1., 0., 1., 1., 0., 0., 0., 0., 0., 1., 0., 1., 1.,
        0., 1., 0., 1., 1., 1., 0., 0., 1., 0., 1., 0., 0., 0., 0., 1., 1., 0.,
        1., 0., 0., 1., 1., 0., 1., 0., 1., 0.])

In [16]:
secondCoinFlip = (torch.rand(len(db)) > 0.5).float()

In [17]:
secondCoinFlip

tensor([1., 1., 0., 0., 0., 1., 1., 0., 1., 0., 1., 1., 0., 1., 1., 1., 0., 0.,
        0., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 1., 1., 0., 0., 0., 1.,
        1., 0., 0., 0., 0., 1., 0., 0., 1., 1., 0., 1., 0., 1., 1., 1., 0., 1.,
        0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0., 1.,
        0., 0., 0., 0., 1., 1., 0., 1., 1., 1., 1., 0., 0., 1., 1., 0., 0., 1.,
        1., 0., 0., 0., 0., 1., 1., 0., 0., 1.])

If 1st coinFlip is head(that means 1) we keep the values that are inthe database<br>
So we can Multiply the database with firstCoinFlip<br>
After we'll having 0 at the places tail occured<br>
Now I can do (1 - firstCoinFlip) to interchange ones and zeroes and after I can multiply it with secondCoinFlip

In [18]:
db.float() * firstCoinFlip

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0.,
        0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 0.,
        0., 0., 0., 1., 0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 1.,
        0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 1., 1.,
        0., 1., 0., 1., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 1., 0.,
        1., 0., 0., 1., 0., 0., 1., 0., 1., 0.])

In [20]:
(1 - firstCoinFlip) * secondCoinFlip

tensor([1., 1., 0., 0., 0., 0., 1., 0., 1., 0., 1., 1., 0., 1., 0., 0., 0., 0.,
        0., 0., 0., 1., 1., 1., 0., 1., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0.,
        1., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 1., 0., 0.,
        0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 1., 1., 0., 0., 1.,
        0., 0., 0., 0., 0., 1., 0., 0., 0., 1.])

In [27]:
augmentedDatabase = db.float() * firstCoinFlip + (1-firstCoinFlip) * secondCoinFlip

In [28]:
m = torch.mean(augmentedDatabase.float())

In [31]:
# (meanOfDatabase + actual)/2 = augemented
# DeSkew

In [33]:
augmentedResult = (m * 2) - 0.5
augmentedResult

tensor(0.5600)

# Lets Package

In [38]:
def query(db):
    
    trueResult = torch.mean(db.float())
    
    firstCoinFlip = (torch.rand(len(db)) > 0.5).float()
    secondCoinFlip = (torch.rand(len(db)) > 0.5).float()
    
    augmentedDatabase = db.float() * firstCoinFlip + ((1-firstCoinFlip) * secondCoinFlip)
    
    dbResult = torch.mean(augmentedDatabase.float()) * 2 - 0.5
    
    return dbResult , trueResult

In [39]:
query(db)

(tensor(0.5400), tensor(0.5600))

In [62]:
db, pdbs = get_createDB_and_parallels(10)
withNoise, withoutNoise = query(db)
print("With Noise {} And Without Noise {}".format(withNoise,withoutNoise))

With Noise 0.30000001192092896 And Without Noise 0.6000000238418579


In [63]:
db, pdbs = get_createDB_and_parallels(100)
withNoise, withoutNoise = query(db)
print("With Noise {} And Without Noise {}".format(withNoise,withoutNoise))

With Noise 0.5800000429153442 And Without Noise 0.5199999809265137


In [64]:
db, pdbs = get_createDB_and_parallels(1000)
withNoise, withoutNoise = query(db)
print("With Noise {} And Without Noise {}".format(withNoise,withoutNoise))

With Noise 0.4819999933242798 And Without Noise 0.5289999842643738


In [65]:
db, pdbs = get_createDB_and_parallels(10000)
withNoise, withoutNoise = query(db)
print("With Noise {} And Without Noise {}".format(withNoise,withoutNoise))

With Noise 0.49299997091293335 And Without Noise 0.5026999711990356


# Project: Varying Amounts of Noise

In this project, I want you to augment the randomized response query (the one we just wrote) to allow for varying amounts of randomness to be added. Specifically, I want you to bias the coin flip to be higher or lower and then run the same experiment. 

Note - this one is a bit tricker than you might expect. You need to both adjust the likelihood of the first coin flip AND the de-skewing at the end (where we create the "augmented_result" variable).

# ------------------------------------

In [80]:
trueResult = torch.mean(db.float())
    
firstCoinFlip = (torch.rand(len(db)) > noise).float() #Biased coin
secondCoinFlip = (torch.rand(len(db)) > 0.5).float()

augmentedDatabase = db.float() * firstCoinFlip + ((1-firstCoinFlip) * secondCoinFlip)

#There will change in DeSkewing
skResult = torch.mean(augmentedDatabase.float()) 

#return dbResult , trueResult

In [72]:
trueMean = 0.7
noiseMean = 0.5 #second coin flip


In [73]:
#Initially what we are doing is . . .
augmentedMean = (trueMean + noiseMean)/2
augmentedMean

0.6

In [76]:
noise = 0.5 #First coin
#Which actually means 50% of the time we use trueMean and 50% of the time we use noiseMean

In [78]:
augmentedMean = (trueMean*noise) + (noiseMean*(1 - noise))
augmentedMean

0.6

We can get trueMean from above equation by simple algebra

In [81]:
privateResult = (skResult/noise) - ((0.5*(1-noise)) / noise) 

In [82]:
privateResult

tensor(0.5006)

In [84]:
skResult

tensor(0.5003)

# ------------------------------------

In [86]:
def query(db, noise=0.2):
    
    trueResult = torch.mean(db.float())
    
    firstCoinFlip = (torch.rand(len(db)) > noise).float() #Biased coin
    secondCoinFlip = (torch.rand(len(db)) > 0.5).float()
    
    augmentedDatabase = db.float() * firstCoinFlip + ((1-firstCoinFlip) * secondCoinFlip)
    
    #There will change in DeSkewing
    skResult = torch.mean(augmentedDatabase.float()) 
    
    privateResult = (skResult/noise) - ((0.5*(1-noise))/noise)
    return privateResult , trueResult

In [87]:
db, pdbs = get_createDB_and_parallels(100)
withNoise, withoutNoise = query(db, noise=0.1)
print("With Noise {} And Without Noise {}".format(withNoise,withoutNoise))

With Noise 0.40000009536743164 And Without Noise 0.47999998927116394


In [88]:
db, pdbs = get_createDB_and_parallels(100)
withNoise, withoutNoise = query(db, noise=0.2)
print("With Noise {} And Without Noise {}".format(withNoise,withoutNoise))

With Noise 0.7999999523162842 And Without Noise 0.5699999928474426


In [89]:
db, pdbs = get_createDB_and_parallels(100)
withNoise, withoutNoise = query(db, noise=0.4)
print("With Noise {} And Without Noise {}".format(withNoise,withoutNoise))

With Noise 0.6749999523162842 And Without Noise 0.5799999833106995


In [90]:
db, pdbs = get_createDB_and_parallels(100)
withNoise, withoutNoise = query(db, noise=0.8)
print("With Noise {} And Without Noise {}".format(withNoise,withoutNoise))

With Noise 0.42500001192092896 And Without Noise 0.47999998927116394


In [91]:
db, pdbs = get_createDB_and_parallels(10000)
withNoise, withoutNoise = query(db, noise=0.2)
print("With Noise {} And Without Noise {}".format(withNoise,withoutNoise))

With Noise 0.48149991035461426 And Without Noise 0.5004000067710876
