
## Lesson: Toy Differential Privacy - Simple Database Queries

In this section we're going to play around with Differential Privacy in the context of a database query. The database is going to be a VERY simple database with only one boolean column. Each row corresponds to a person, and each value records whether or not that person has a certain private attribute (such as whether they have a certain disease, or whether they are above/below a certain age). We are then going to learn how to tell whether a query over such a small database is differentially private or not and, more importantly, what techniques are at our disposal to ensure various levels of privacy.

### First We Create a Simple Database

Step one is to create our database. We're going to do this by initializing a random list of 1s and 0s (the entries in our database). Note that the number of entries directly corresponds to the number of people in our database.
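For readers without PyTorch at hand, here is a plain-Python sketch of the same idea. The helper name `make_db` and its `seed` parameter are hypothetical, added only so the result is reproducible. Each entry is 1 when a uniform draw exceeds 0.5, so roughly half of the people end up with the attribute.

```python
import random


def make_db(num_entries, seed=None):
    """Hypothetical helper: return a list of 0/1 entries, one per person.

    Mirrors `torch.rand(num_entries) > 0.5`: each entry is 1 with
    probability 0.5, independently of the others.
    """
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    return [int(rng.random() > 0.5) for _ in range(num_entries)]


toy_db = make_db(5000, seed=0)
print(len(toy_db))                  # 5000, one entry per person
print(sum(toy_db) / len(toy_db))    # fraction with the attribute, close to 0.5
```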

In [1]:
import torch

# the number of entries in our database
num_entries = 5000

db = torch.rand(num_entries) > 0.5
db

tensor([0, 1, 0,  ..., 1, 0, 1], dtype=torch.uint8)

## Project: Generate Parallel Databases

Key to the definition of differential privacy is the ability to ask the question "When querying a database, if I removed someone from the database, would the output of the query be any different?". Thus, in order to check this, we must construct what we term "parallel databases", which are simply databases with one entry removed.
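To make the question concrete, here is a plain-Python sketch that removes one person at a time and compares the output of a simple counting query on the full and the parallel database. The names `remove_person` and `count_query` are hypothetical, chosen for illustration.

```python
def remove_person(db, index):
    """Hypothetical helper: return a parallel database with entry `index` removed."""
    return db[:index] + db[index + 1:]


def count_query(db):
    """A simple query: how many people have the private attribute."""
    return sum(db)


toy_db = [1, 0, 1, 1, 0]
for i in range(len(toy_db)):
    pdb = remove_person(toy_db, i)
    # The output changes exactly when the removed person had the attribute.
    print(i, toy_db[i], count_query(toy_db) - count_query(pdb))
```

For this counting query the difference always equals the removed person's own value, so the output does reveal something about that person whenever they have the attribute.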

In this first project, I want you to create a list of every parallel database to the one currently contained in the "db" variable. Then, I want you to create a function which both:

- creates the initial database (db)
- creates all parallel databases

**In other words, we have to create 5000 other databases, each with exactly one value missing: that is, 5000 parallel databases of length 4999.**
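A quick size check under the same setup: with `num_entries = 5000` there are 5000 parallel databases, each of length 4999, so keeping them all in memory means storing 5000 × 4999 = 24,995,000 entries. The plain-Python sketch below computes this (the helper `parallel_db_sizes` is hypothetical) and cross-checks the shape on a small example list.

```python
def parallel_db_sizes(num_entries):
    """Return (number of parallel databases, length of each, total stored entries)."""
    count = num_entries          # one parallel database per person removed
    length = num_entries - 1     # each one is missing exactly one entry
    return count, length, count * length


print(parallel_db_sizes(5000))  # (5000, 4999, 24995000)

# Cross-check by actually building the parallel databases of a small list.
small_db = [0, 1, 1, 0, 1, 0]
small_pdbs = [small_db[:i] + small_db[i + 1:] for i in range(len(small_db))]
assert len(small_pdbs) == 6 and all(len(p) == 5 for p in small_pdbs)
```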

In [2]:
db1=torch.rand(num_entries)>0.5
db1

tensor([1, 0, 0,  ..., 0, 1, 0], dtype=torch.uint8)

In [3]:
db1[0:5]

tensor([1, 0, 0, 1, 0], dtype=torch.uint8)

In [0]:
def get_parallel_db(db, remove_index):
  # Return a copy of db with the entry at remove_index removed
  return torch.cat((db[0:remove_index], db[remove_index+1:]))

In [5]:
get_parallel_db(db1,0)[0:5]

tensor([0, 0, 1, 0, 0], dtype=torch.uint8)

In [6]:
get_parallel_db(db1,0).shape

torch.Size([4999])

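**If you pass an index that does not exist, no error is raised; instead the function returns the whole database. Slicing past the end of a tensor is clamped rather than rejected, so `db[0:remove_index]` is the whole database and `db[remove_index+1:]` is empty.**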
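One way to make this failure mode explicit is a bounds-checked wrapper. The sketch below uses plain Python lists rather than tensors, and the name `get_parallel_db_checked` is hypothetical. It also guards against negative indices: with `remove_index = -1`, `db[0:-1]` drops only the last entry while `db[0:]` is the whole database, so the concatenation would be almost twice as long as the original instead of one entry shorter.

```python
def get_parallel_db_checked(db, remove_index):
    """Like get_parallel_db, but reject indices that are out of range.

    Plain-list sketch; the slicing logic matches the torch version.
    """
    if not 0 <= remove_index < len(db):
        raise IndexError(
            f"remove_index {remove_index} out of range for database of length {len(db)}"
        )
    return db[:remove_index] + db[remove_index + 1:]


example = [1, 0, 0, 1, 0]
print(get_parallel_db_checked(example, 0))   # [0, 0, 1, 0]
try:
    get_parallel_db_checked(example, 5366)
except IndexError as err:
    print("rejected:", err)
```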

In [7]:
get_parallel_db(db1,5366).shape

torch.Size([5000])

In [8]:
db = torch.rand(num_entries) > 0.5  # threshold uniform [0, 1) values at 0.5 so each entry is 0 or 1 with equal probability
db

tensor([0, 1, 1,  ..., 0, 0, 0], dtype=torch.uint8)

In [9]:
get_parallel_db(db,52352)

tensor([0, 1, 1,  ..., 0, 0, 0], dtype=torch.uint8)

In [0]:
def get_parallel_dbs(db):
  # Build one parallel database per entry, each missing exactly that entry
  parallel_dbs = list()
  for i in range(len(db)):
    pdb = get_parallel_db(db, i)
    parallel_dbs.append(pdb)
  return parallel_dbs
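The same loop can also be written as a list comprehension. The plain-Python sketch below (the function name `get_parallel_dbs_list` is hypothetical) additionally checks the defining property: parallel database `i` is the original with entry `i`, and only that entry, removed.

```python
def get_parallel_dbs_list(db):
    """Return every parallel database of `db`, one per removed index."""
    return [db[:i] + db[i + 1:] for i in range(len(db))]


toy_db = [0, 1, 1, 0, 1]
toy_pdbs = get_parallel_dbs_list(toy_db)

for i, pdb in enumerate(toy_pdbs):
    # Re-inserting the removed entry must give back the original database.
    assert pdb[:i] + [toy_db[i]] + pdb[i:] == toy_db
print(len(toy_pdbs), [len(p) for p in toy_pdbs])  # 5 [4, 4, 4, 4, 4]
```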

In [11]:
pdbs=get_parallel_dbs(db)
pdbs[0:5]

[tensor([1, 1, 1,  ..., 0, 0, 0], dtype=torch.uint8),
 tensor([0, 1, 1,  ..., 0, 0, 0], dtype=torch.uint8),
 tensor([0, 1, 1,  ..., 0, 0, 0], dtype=torch.uint8),
 tensor([0, 1, 1,  ..., 0, 0, 0], dtype=torch.uint8),
 tensor([0, 1, 1,  ..., 0, 0, 0], dtype=torch.uint8)]

In [0]:
# Create a database together with all of its parallel databases
def create_db_and_parallels(num_entries):
  db=torch.rand(num_entries)>0.5
  pdbs=get_parallel_dbs(db)
  return db,pdbs
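Because `torch.rand` draws fresh random values on every call, each call to `create_db_and_parallels` returns a different database. If reproducible runs are wanted, one option is to seed the generator (in PyTorch, `torch.manual_seed` plays this role). The sketch below shows the idea in plain Python; the function name and its `seed` argument are hypothetical.

```python
import random


def create_db_and_parallels_seeded(num_entries, seed=None):
    """Hypothetical variant of create_db_and_parallels with a reproducible seed.

    Returns the database (a list of 0/1 values) and its parallel databases.
    """
    rng = random.Random(seed)
    db = [int(rng.random() > 0.5) for _ in range(num_entries)]
    pdbs = [db[:i] + db[i + 1:] for i in range(num_entries)]
    return db, pdbs


db_a, pdbs_a = create_db_and_parallels_seeded(20, seed=42)
db_b, pdbs_b = create_db_and_parallels_seeded(20, seed=42)
print(db_a == db_b, len(pdbs_a), len(pdbs_a[0]))  # True 20 19
```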

In [0]:
db,pdbs=create_db_and_parallels(20)

In [14]:
db

tensor([1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
       dtype=torch.uint8)

In [15]:
pdbs

[tensor([0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
        dtype=torch.uint8),
 tensor([1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
        dtype=torch.uint8),
 tensor([1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
        dtype=torch.uint8),
 tensor([1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
        dtype=torch.uint8),
 tensor([1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
        dtype=torch.uint8),
 tensor([1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
        dtype=torch.uint8),
 tensor([1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
        dtype=torch.uint8),
 tensor([1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
        dtype=torch.uint8),
 tensor([1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
        dtype=torch.uint8),
 tensor([1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
        dtype=torch.uint8),
 tensor([1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1,

**So far we have learned how to create a function that generates every parallel database for a given input database.**
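With every parallel database in hand, the question from the start of the project ("if I removed someone, would the output of the query be any different?") can be checked for all people at once. Here is a plain-Python sketch, assuming a simple sum query; the helper name `max_output_change` is hypothetical.

```python
import random


def max_output_change(db, query):
    """Largest absolute change in `query` output when any one person is removed."""
    full_result = query(db)
    changes = [abs(full_result - query(db[:i] + db[i + 1:])) for i in range(len(db))]
    return max(changes)


rng = random.Random(0)
toy_db = [int(rng.random() > 0.5) for _ in range(20)]
# For a sum over 0/1 entries, removing one person changes the result by at most 1.
print(max_output_change(toy_db, sum))
```

A nonzero result means the query's output does depend on whether some particular person is present in the database.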