
## 1.1 Introduction to PyDP
The PyDP package provides a Python API for [Google's Differential Privacy library](https://github.com/google/differential-privacy). This example uses the alpha 0.1 version of the package, which has the following limitations:


*   Supports Linux only (Windows support coming soon)
*   Supports Python 3.6 only (support for more versions coming soon)
*   Currently implements a single algorithm: a private mean computed with Laplace noise
*   Supports only integer and floating-point values



In [1]:
# Install the PyDP package
! pip install python-dp

Collecting python-dp

In [0]:
import pydp as dp # by convention our package is to be imported as dp (for Differential Privacy!)
import pandas as pd
import statistics # for calculating mean without applying differential privacy

In [5]:
# get carrots data from our public github repo
url = 'https://raw.githubusercontent.com/OpenMined/PyDP/dev/examples/animals_and_carrots.csv'
df = pd.read_csv(url,sep=",", names=["animal", "carrots_eaten"])
df.head()

Unnamed: 0,animal,carrots_eaten
0,Aardvark,1
1,Albatross,88
2,Alligator,35
3,Alpaca,99
4,Ant,69


First, compute the mean of all the entries in the usual way, without applying the DP library. This is the actual mean of all the records.

In [0]:
# calculates mean without applying differential privacy
def mean_carrots() -> float:
    return statistics.mean(list(df["carrots_eaten"]))

The private mean uses Google's Differential Privacy library to calculate the mean. To preserve privacy, it applies the Laplace mechanism, which adds random noise drawn from a Laplace distribution to the result.

The function takes one argument, `privacy_budget`.

In this example it is a number between 0 and 1 that sets the privacy threshold.

It measures the acceptable loss of privacy (with 0 meaning no loss is acceptable), so smaller values give stronger privacy and noisier results.

`dp.BoundedMean.result()` takes a list of integers or floats as input and returns the private mean as a float.
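
To make the Laplace mechanism more concrete, here is a minimal standalone sketch in plain Python. It is not PyDP's or Google's implementation: the function names, the explicit `lower` and `upper` bounds, and the choice to treat the number of records as public are all assumptions made for illustration. The idea is to clamp each value to a known range, compute the mean, and add Laplace noise whose scale is the sensitivity divided by the privacy budget.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent Exponential(1) draws follows Laplace(0, 1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def sketch_private_mean(values, epsilon, lower, upper, rng=None):
    """Illustrative bounded mean with Laplace noise (hypothetical helper, not the PyDP API)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    # Clamp every value so a single record has bounded influence on the mean.
    clamped = [min(max(v, lower), upper) for v in values]
    n = len(clamped)
    # Assumption: n is public, so replacing one record moves the mean by at most this much.
    sensitivity = (upper - lower) / n
    noisy_mean = sum(clamped) / n + laplace_noise(sensitivity / epsilon, rng)
    # Clamping the output is post-processing and does not weaken the guarantee.
    return min(max(noisy_mean, lower), upper)


values = [1, 88, 35, 99, 69]  # the five rows shown by df.head()
print(sketch_private_mean(values, epsilon=0.8, lower=0, upper=100))  # bounds are assumed
```

A production library also has to handle details this sketch ignores, such as secure noise generation and protecting the record count.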


In [0]:
# calculates mean applying differential privacy
def private_mean(privacy_budget: float) -> float:
    x = dp.BoundedMean(privacy_budget)
    return x.result(list(df["carrots_eaten"]))

As you can see, the private mean differs from the mean calculated with ordinary statistical methods.

This difference comes from the noise added to the result, which is what protects the individual records in the data set.

In [8]:
print("Mean: ", mean_carrots())
print("Private Mean: ", private_mean(0.8))

Mean:  53.01648351648352
Private Mean:  71.27272727272728
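
Because the noise is random, rerunning the cell above gives a different private mean each time, and a smaller privacy budget widens the spread. The following sketch illustrates that trade-off in plain Python. The helpers `laplace_sample` and `noise_spread` and the sensitivity value of 1.0 are hypothetical and not part of PyDP.

```python
import random
import statistics


def laplace_sample(scale: float, rng: random.Random) -> float:
    # The difference of two independent Exponential(1) draws follows Laplace(0, 1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noise_spread(epsilon: float, sensitivity: float, trials: int = 2000, seed: int = 0) -> float:
    """Empirical standard deviation of Laplace noise with scale sensitivity / epsilon."""
    rng = random.Random(seed)
    draws = [laplace_sample(sensitivity / epsilon, rng) for _ in range(trials)]
    return statistics.pstdev(draws)


sensitivity = 1.0  # assumed value, chosen only for illustration
for epsilon in (0.1, 0.8):
    # In theory, Laplace noise with scale b has standard deviation sqrt(2) * b.
    print(f"epsilon={epsilon}: noise std dev ~ {noise_spread(epsilon, sensitivity):.2f}")
```

The smaller budget produces the larger spread, which is the price paid for stronger privacy.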
