## Improving the Implementation of Differential Privacy

### To open this notebook in Google Colab:
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FadyMorris/udacity-fb-private-ai-challenge/blob/master/contributions/differential_privacy_improved.ipynb)

```python
import torch
```

## Generate Parallel Databases

```python
def get_parallel_db(db, remove_index):
    # Return a copy of db with the entry at remove_index left out
    return torch.cat((db[:remove_index], db[remove_index + 1:]))
```

```python
def get_parallel_dbs(db):
    # Build one parallel database per entry, each missing exactly that entry
    parallel_dbs = list()

    for i in range(len(db)):
        pdb = get_parallel_db(db, i)
        parallel_dbs.append(pdb)

    return parallel_dbs
```

```python
def create_db_and_parallels(num_entries):
    # Random boolean database: each entry is True with probability 0.5
    db = torch.rand(num_entries) > 0.5
    pdbs = get_parallel_dbs(db)

    return db, pdbs
```
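A parallel database is a copy of `db` with exactly one entry removed, so a database with *n* entries has *n* parallel databases of *n - 1* entries each. The sketch below mirrors `get_parallel_db` and `get_parallel_dbs` with plain Python lists instead of PyTorch tensors; the list version and the small example database are illustrations only, not part of the original notebook:

```python
def get_parallel_db(db, remove_index):
    # Copy of db with the entry at remove_index left out (list slicing instead of torch.cat)
    return db[:remove_index] + db[remove_index + 1:]


def get_parallel_dbs(db):
    # One parallel database per entry of db
    return [get_parallel_db(db, i) for i in range(len(db))]


db = [True, False, True, True]
pdbs = get_parallel_dbs(db)
print(len(pdbs))  # 4
print(pdbs[0])    # [False, True, True]
print(pdbs[1])    # [True, True, True]
```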

# Project: Varying Amounts of Noise

In the original query (`query_original`), the line

`first_coin_flip = (torch.rand(len(db)) < noise).float()`  

sets each entry of `first_coin_flip` to 1 (heads, so the true value is kept) with probability `noise`. The resulting error is barely noticeable with `coin2_probability = 0.5`, because the generated databases have a true mean of about 0.5, where the original estimate happens to be correct. For other values of `coin2_probability` the error becomes large.

Noise is introduced when the first coin flip lands **tails** rather than **heads**. The probability of heads should therefore be the complement of `noise`:

$$P(\text{first coin flip} = \text{heads}) = 1 - \text{noise}$$
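A short expected-value calculation makes this concrete. The symbols here are introduced only for this derivation: $\mu$ is the true mean of `db`, $\varepsilon$ is `noise`, $q$ is `coin2_probability`, $h$ is the probability that the first coin lands heads, $s$ is `sk_result` and $\hat{\mu}$ is `private_result`. The de-skewing line is identical in both query functions:

$$\mathbb{E}[s] = h\,\mu + (1-h)\,q, \qquad \hat{\mu} = \left(\frac{s}{\varepsilon} - q\right)\frac{\varepsilon}{1-\varepsilon} = \frac{s - \varepsilon q}{1-\varepsilon}$$

With the corrected first coin, $h = 1-\varepsilon$, the estimate is unbiased:

$$\mathbb{E}[\hat{\mu}] = \frac{(1-\varepsilon)\mu + \varepsilon q - \varepsilon q}{1-\varepsilon} = \mu$$

With the original first coin, $h = \varepsilon$:

$$\mathbb{E}[\hat{\mu}] = \frac{\varepsilon\mu + (1-\varepsilon)q - \varepsilon q}{1-\varepsilon} = \frac{\varepsilon\mu + (1-2\varepsilon)q}{1-\varepsilon}$$

Setting this equal to $\mu$ gives $(1-2\varepsilon)(q-\mu) = 0$, so the original query is unbiased only when $\varepsilon = 0.5$ or $q = \mu$.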

I have modified this line in the function `query_improved` to:  
`first_coin_flip = (torch.rand(len(db)) < (1-noise)).float()`

I've also modified both `query_original` and `query_improved` to accept `coin2_probability` as a parameter, to test the effect of varying the second coin's bias on the final error.
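The effect of the second coin's bias can also be predicted without sampling. Because the de-skewing step is linear in `sk_result`, plugging in the expected value of `sk_result` gives the expected value of `private_result`. The helper below (`expected_private_result` is a hypothetical name, not part of the notebook) does this for both versions of the first coin, assuming the true mean is known:

```python
def expected_private_result(true_mean, noise, coin2_probability, improved):
    """Expected private_result of either query version (hypothetical helper)."""
    # Probability that the first coin lands heads and the true value is kept
    p_keep = (1 - noise) if improved else noise
    # Expected mean of the augmented database
    expected_sk = p_keep * true_mean + (1 - p_keep) * coin2_probability
    # De-skewing formula shared by query_original and query_improved
    return ((expected_sk / noise) - coin2_probability) * noise / (1 - noise)


true_mean, noise = 0.5, 0.2
for q in (0.3, 0.5, 0.7):
    est_original = expected_private_result(true_mean, noise, q, improved=False)
    est_improved = expected_private_result(true_mean, noise, q, improved=True)
    print(f"coin2_probability={q}: original={est_original:.4f}, improved={est_improved:.4f}")
# coin2_probability=0.3: original=0.3500, improved=0.5000
# coin2_probability=0.5: original=0.5000, improved=0.5000
# coin2_probability=0.7: original=0.6500, improved=0.5000
```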

## Defining Original and Improved Database Queries

```python
# Original query from the course (coin2_probability added as a parameter)
def query_original(db, noise=0.2, coin2_probability=0.5):

    true_result = torch.mean(db.float())

    # Bug: the true value is kept (heads) with probability `noise` instead of `1 - noise`
    first_coin_flip = (torch.rand(len(db)) < noise).float()
    second_coin_flip = (torch.rand(len(db)) < coin2_probability).float()

    # Heads: keep the true value; tails: use the second coin instead
    augmented_database = db.float() * first_coin_flip + (1 - first_coin_flip) * second_coin_flip

    sk_result = augmented_database.float().mean()

    # De-skew the noisy mean to estimate the true mean
    private_result = ((sk_result / noise) - coin2_probability) * noise / (1 - noise)

    return private_result, true_result
```

```python
# Query after correcting the first coin's probability
def query_improved(db, noise=0.2, coin2_probability=0.5):

    true_result = torch.mean(db.float())

    # The true value is kept (heads) with probability 1 - noise
    first_coin_flip = (torch.rand(len(db)) < (1-noise)).float()
    second_coin_flip = (torch.rand(len(db)) < coin2_probability).float()

    # Heads: keep the true value; tails: use the second coin instead
    augmented_database = db.float() * first_coin_flip + (1 - first_coin_flip) * second_coin_flip

    sk_result = augmented_database.float().mean()

    # De-skew: equivalent to (sk_result - noise * coin2_probability) / (1 - noise)
    private_result = ((sk_result / noise) - coin2_probability) * noise / (1 - noise)

    return private_result, true_result
```

## Running Tests on Both Queries

To test the modification, create a database (`db`) with `create_db_and_parallels` (the parallel databases are not used here) and run both queries on it. Run the code cell below several times while varying the `noise` and `coin2_probability` parameters, and compare the results (`difference` and percentage `error`).

```python
# Only the database itself is needed; the parallel databases are discarded
db, _ = create_db_and_parallels(10000)

noise = 0.2
coin2_probability = 0.7

# Original query: compare the private estimate with the true mean
private_result_original, true_result_original = query_original(db, noise=noise, coin2_probability=coin2_probability)
difference_original = true_result_original - private_result_original
error_original = difference_original / true_result_original * 100
print("With Noise(Original):" + str(private_result_original))
print("Without Noise(Original):" + str(true_result_original))
print("Difference(Original):" + str(difference_original))
print("Error(Original): %f %%" % error_original.item())


print("-"*50)

# Improved query on the same database
private_result_improved, true_result_improved = query_improved(db, noise=noise, coin2_probability=coin2_probability)
difference_improved = true_result_improved - private_result_improved
error_improved = difference_improved / true_result_improved * 100
print("With Noise(Improved Query):" + str(private_result_improved))
print("Without Noise(Improved Query):" + str(true_result_improved))
print("Difference(Improved Query):" + str(difference_improved))
print("Error(Improved Query): %f %%" % error_improved.item())
```


```text
With Noise(Original):tensor(0.6554)
Without Noise(Original):tensor(0.4919)
Difference(Original):tensor(-0.1635)
Error(Original): -33.233395 %
--------------------------------------------------
With Noise(Improved Query):tensor(0.4926)
Without Noise(Improved Query):tensor(0.4919)
Difference(Improved Query):tensor(-0.0007)
Error(Improved Query): -0.147388 %
```
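Each run uses fresh coin flips, so a single run mixes any systematic bias with sampling noise. Instead of re-running the cell by hand, the difference can be averaged over many runs. The sketch below is a plain-Python mirror of the two queries (no PyTorch) that draws a new random database for every run; the helper names `randomized_response_estimate` and `mean_difference`, the seed, and the default sizes are assumptions chosen for illustration:

```python
import random
import statistics


def randomized_response_estimate(db, noise, coin2_probability, improved, rng):
    """Plain-Python mirror of query_original / query_improved; returns private_result."""
    # Probability that the first coin lands heads and the true value is kept
    p_keep = (1 - noise) if improved else noise
    augmented = []
    for value in db:
        if rng.random() < p_keep:
            augmented.append(float(value))
        else:
            # Tails: answer with the second, biased coin instead
            augmented.append(float(rng.random() < coin2_probability))
    sk_result = sum(augmented) / len(augmented)
    # Same de-skewing formula used by both notebook queries
    return ((sk_result / noise) - coin2_probability) * noise / (1 - noise)


def mean_difference(improved, num_entries=1000, trials=200,
                    noise=0.2, coin2_probability=0.7, seed=0):
    """Average of true_result - private_result over many independent runs."""
    rng = random.Random(seed)
    differences = []
    for _ in range(trials):
        # Fresh random database for every run, each entry True with probability 0.5
        db = [rng.random() > 0.5 for _ in range(num_entries)]
        true_result = sum(db) / len(db)
        private_result = randomized_response_estimate(
            db, noise, coin2_probability, improved, rng)
        differences.append(true_result - private_result)
    return statistics.mean(differences)


print("original:", round(mean_difference(improved=False), 3))
print("improved:", round(mean_difference(improved=True), 3))
```

With these settings the original query's expected estimate for a true mean of 0.5 is (0.2 · 0.5 + 0.8 · 0.7 - 0.2 · 0.7) / 0.8 = 0.65, so its mean difference should settle near -0.15, while the improved query's should stay close to zero.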
