## Introduction to PyDP
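The PyDP package is a Python wrapper for Google's Differential Privacy library. This notebook computes several aggregate statistics on the Boston housing dataset, first directly and then with differential privacy applied, so the two results can be compared.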
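
To give a feel for what "applying differential privacy" means for a simple query, here is a minimal sketch of the Laplace mechanism for a count, using only the standard library. It is an illustration under stated assumptions, not PyDP's implementation: the helper names `laplace_noise` and `noisy_count` are hypothetical, and a production library has to guard against floating-point weaknesses that this sketch ignores.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Add Laplace noise with scale 1/epsilon (a count has sensitivity 1)."""
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
for epsilon in (0.1, 0.8, 5.0):
    # Smaller epsilon means a larger noise scale, so a less accurate answer.
    print(f"epsilon={epsilon}: noisy count = {noisy_count(506, epsilon, rng):.2f}")
```

A smaller privacy budget (epsilon) gives stronger privacy but noisier answers, which is the trade-off to keep in mind when reading the private results in this notebook.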


In [None]:
! pip install python-dp



In [None]:
import statistics # for calculating mean without applying differential privacy
import pydp as dp 
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older version
from pydp.algorithms.laplacian import BoundedSum, BoundedMean, BoundedStandardDeviation, Count, Max, Min, Median

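The Boston housing dataset comes from the UCI Machine Learning Repository. The data was collected in 1978, and each of the 506 entries represents aggregate information about 14 features of homes from various suburbs of Boston. Thirteen of these are predictors; the fourteenth, the median home value in thousands of dollars, is the target and is stored as PRICE in the dataframe.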

The features can be summarized as follows:


* CRIM: per capita crime rate by town
* ZN: proportion of residential land zoned for lots over 25,000 sq. ft.
* INDUS: proportion of non-retail business acres per town
* CHAS: Charles River dummy variable (1 if tract bounds river; 0 otherwise)
* NOX: nitric oxides concentration (parts per 10 million)
* RM: average number of rooms per dwelling
* AGE: proportion of owner-occupied units built prior to 1940
* DIS: weighted distances to five Boston employment centres
* RAD: index of accessibility to radial highways
* TAX: full-value property-tax rate per $10,000
* PTRATIO: pupil-teacher ratio by town
* B: 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town
* LSTAT: percentage of the population of lower status

In [None]:

boston = load_boston()

# Build the dataframe with the feature names as column labels
data = pd.DataFrame(boston.data, columns=boston.feature_names)

# Add the target variable (median home value in $1000s) as PRICE
data["PRICE"] = boston.target
data.head()

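| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | PRICE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |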


## What is the minimum house price?

In [None]:
# calculates minimum price without applying differential privacy
def min_price():
    return data["PRICE"].min()

# calculates minimum price with differential privacy applied
def private_min_price(privacy_budget: float) -> float:
    x = Min(privacy_budget, lower_bound=0.1, upper_bound=90, dtype="float")
    return x.quick_result(list(data["PRICE"]))

print("Price in 1000s")
print("Minimum price (in 1000s): ", min_price())
print("Private minimum price (in 1000s): ", private_min_price(0.8))

Price in 1000s
Minimum price (in 1000s):  5.0
Private minimum price (in 1000s):  12.859622688382935


## What is the maximum house price?

In [None]:
# calculates maximum price without applying differential privacy
def max_price():
    return data["PRICE"].max()

# calculates maximum price with differential privacy applied
def private_max_price(privacy_budget: float) -> float:
    x = Max(privacy_budget, lower_bound=0.1, upper_bound=90, dtype="float")
    return x.quick_result(list(data["PRICE"]))

print("Price in 1000s")
print("Maximum price (in 1000s): ", max_price())
print("Private maximum price (in 1000s): ", private_max_price(0.8))

Price in 1000s
Maximum price (in 1000s):  50.0
Private maximum price (in 1000s):  36.40014109512706


## What is the mean house price?

In [None]:
# calculates mean price without applying differential privacy
def mean_price() -> float:
    return statistics.mean(list(data["PRICE"]))

# calculates mean price with differential privacy applied
def private_mean_price(privacy_budget: float) -> float:
    x = BoundedMean(privacy_budget, lower_bound=0.1, upper_bound=90, dtype="float")
    return x.quick_result(list(data["PRICE"]))

print("Mean: ", mean_price())
print("Private Mean: ", private_mean_price(0.8))

Mean:  22.532806324110673
Private Mean:  22.739571661158926
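
The `lower_bound` and `upper_bound` arguments matter because a private mean needs to limit how much any single record can influence the result. The sketch below shows one simplified textbook approach: clamp the values, add Laplace noise to the sum and to the count, then divide. It is an assumption-laden illustration, not the algorithm PyDP or Google's library uses for `BoundedMean`; the names `laplace_noise` and `sketch_private_mean` are hypothetical, and the even budget split is just one possible choice.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sketch_private_mean(values, epsilon, lower, upper, rng):
    """Simplified private mean: noisy clamped sum over noisy count."""
    # Clamping limits how much any single record can move the sum.
    clamped = [min(max(v, lower), upper) for v in values]
    half = epsilon / 2.0  # split the budget between the sum and the count
    # Adding or removing one clamped record changes the sum by at most this.
    sum_sensitivity = max(abs(lower), abs(upper))
    noisy_sum = sum(clamped) + laplace_noise(sum_sensitivity / half, rng)
    noisy_count = len(clamped) + laplace_noise(1.0 / half, rng)
    estimate = noisy_sum / max(noisy_count, 1.0)
    # The true mean lies within the bounds, so clamp the estimate as well.
    return min(max(estimate, lower), upper)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
prices = [5.0, 21.2, 50.0] * 200  # toy values, not the real dataset
print(sketch_private_mean(prices, epsilon=0.8, lower=0.1, upper=90.0, rng=rng))
```

Wider bounds raise the sensitivity of the sum and therefore the noise, so in this kind of scheme bounds that are much looser than the data cost accuracy.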


## What is the median house price?

In [None]:
def median_price() -> float:
    return data["PRICE"].median()


def private_median_price(privacy_budget: float) -> float:
    x = Median(privacy_budget, 0, 600, dtype="float")
    return x.quick_result(list(data["PRICE"]))


# Without DP
print("Median Price: ", median_price())

# With DP
print("Private median price: ", private_median_price(0.8))

Median Price:  21.2
Private median price:  21.355028651120243


## How many houses are there?

In [None]:
def count() -> int:
    return data.count()['PRICE']

In [None]:
# counts the entries with differential privacy applied
def private_count(privacy_budget: float) -> int:
    x = Count(privacy_budget, dtype="float")
    return x.quick_result(list(data["PRICE"]))

In [None]:
print("Total houses:\t" + str(count()))
print("Private total houses:\t" + str(private_count(1)))

Total houses:	506
Private total houses:	507
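
Each private query in this notebook is given its own privacy budget, and all of them run against the same PRICE column. Under basic sequential composition, the privacy loss of several epsilon-differentially-private releases on the same data adds up. The short sketch below tallies the budgets used above under that assumption; the dictionary and the `total_epsilon` helper are hypothetical names introduced for illustration, and tighter composition accounting exists that this does not cover.

```python
# Epsilon values passed to each private query in this notebook.
query_budgets = {
    "min": 0.8,
    "max": 0.8,
    "mean": 0.8,
    "median": 0.8,
    "count": 1.0,
}


def total_epsilon(budgets):
    """Sum per-query epsilons (basic sequential composition)."""
    return sum(budgets.values())


print(f"Total epsilon spent: {total_epsilon(query_budgets):.1f}")
```

With the values used here the total comes to 4.2, so the overall guarantee for the five releases together is weaker than the 0.8 or 1 attached to any single query.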
