From 1c6cd43567d88dce778d50f0655a21362823cb9f Mon Sep 17 00:00:00 2001 From: Shivay Lamba Date: Mon, 31 Aug 2020 14:20:17 +0530 Subject: [PATCH 1/5] Removed caps from headers --- docs/introduction.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/introduction.rst b/docs/introduction.rst index 9fffab73..ffdd2555 100644 --- a/docs/introduction.rst +++ b/docs/introduction.rst @@ -1,5 +1,5 @@ .. class:: center -INTRODUCTION TO DIFFERENTIAL PRIVACY +Introduction To Differential Privacy ============ @@ -90,7 +90,7 @@ In Global differential privacy the random noise is applied at the global level i Image Credits: Google Images -FORMAL DEFINITION OF DIFFERENTIAL PRIVACY +Formal Definition Of Differential Privacy ============ In the book “`The Algorithmic Foundations of Differential Privacy `_” by Cynthia Dwork and Aaron Roth, Differential Privacy is formally defined as: @@ -107,7 +107,7 @@ The Epsilon *(ε)* and *Delta(δ)* parameters measure the threshold for leakage. Thus, when both Epsilon and Delta are 0, it is called Perfect Privacy. The values are set in such a way that privacy is maintained. This set of values is known as the Privacy Budget. -DIFFERENTIAL - PRIVACY IN REAL WORLD +Differential Privacy In Real World ============ Differential Privacy ensures the privacy of all sorts of data, which anyone can then use to draw insights that help them run their business. In the present world, Differentially Private Data Analysis is widely used, and it is implemented using various libraries.
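The Epsilon parameter and the Privacy Budget described above can be made concrete with a small sketch. This is not PyDP code, just a minimal stdlib illustration of the standard Laplace mechanism (the helper names are made up): a counting query has sensitivity 1, the noise scale is sensitivity / epsilon, so a smaller epsilon (a tighter privacy budget) buys stronger privacy at the cost of noisier answers.

```python
import math
import random

def laplace_sample(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) by inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Counting queries have sensitivity 1: adding or removing one
    person changes the count by at most 1, so Laplace(1/epsilon)
    noise yields an epsilon-differentially private answer."""
    return true_count + laplace_sample(1.0 / epsilon, rng)

rng = random.Random(7)
true_count = 1000
errs = {}
for epsilon in (0.1, 1.0):
    samples = [abs(noisy_count(true_count, epsilon, rng) - true_count)
               for _ in range(5000)]
    errs[epsilon] = sum(samples) / len(samples)
    # The mean absolute error of Laplace noise equals its scale,
    # 1/epsilon: roughly 10 for epsilon = 0.1, roughly 1 for epsilon = 1.0
    print(epsilon, round(errs[epsilon], 1))
```

Spending a smaller portion of the privacy budget per query (smaller epsilon) visibly degrades accuracy, which is exactly the trade-off the text describes.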
From c2ede546c90e158af3201a65018017676e7d5669 Mon Sep 17 00:00:00 2001 From: Shivay Lamba Date: Mon, 31 Aug 2020 14:31:06 +0530 Subject: [PATCH 2/5] readme.rst docs --- docs/introduction.rst | 2 +- docs/readme.rst | 139 ++++++++++++++++++++++++++++++------------ 2 files changed, 102 insertions(+), 39 deletions(-) diff --git a/docs/introduction.rst b/docs/introduction.rst index ffdd2555..1f593067 100644 --- a/docs/introduction.rst +++ b/docs/introduction.rst @@ -144,7 +144,7 @@ Differential Privacy is playing an important role in building Privacy-protected -FURTHER READING +Further Reading ============ * `Secure and Private AI Course on Udacity by Andrew Trask `_ diff --git a/docs/readme.rst b/docs/readme.rst index 6d627a07..64bd5c50 100644 --- a/docs/readme.rst +++ b/docs/readme.rst @@ -1,53 +1,116 @@ -Introduction -============ +| |Tests| |Version| |License| -PyDP is a Python wrapper for Google’s `Differential Privacy`_ project. -The library provides a set of ε-differentially private algorithms, which -can be used to produce aggregate statistics over numeric data sets -containing private or sensitive information. +Introduction to PyDP +==== -PyDP is part of the OpenMined community, come join the movement on -`Slack`_. +In today's data-driven world, more and more researchers and data +scientists use machine learning to create better models or more innovative +solutions for a better future. -Instructions -============ +These models often tend to handle sensitive or personal data, which +can cause privacy issues. For example, some AI models can memorize details about the data they've been trained on and could potentially leak these +details later on. -If you’d like to contribute to this project please read these -`guidelines`_. +To help measure sensitive data leakage and reduce the possibility of +it happening, there is a mathematical framework called differential +privacy. 
-Usage ------ +In 2020, OpenMined created a Python wrapper for Google's `Differential +Privacy `_ project +called PyDP. The library provides a set of ε-differentially private algorithms, +which can be used to produce aggregate statistics over numeric data sets containing +private or sensitive information. Therefore, with PyDP you can control the +privacy guarantee and accuracy of your model written in Python. -As part of the 0.1.1 dev release, we have added all functions required in carrots demo. +**Things to remember about PyDP:** +- :rocket: Features differentially private algorithms including: + BoundedMean, + BoundedSum, Max, Count Above, Percentile, Min, Median, etc. +- All the computation methods mentioned above use Laplace noise only + (other + noise mechanisms will be added soon! :smiley:) +- :fire: Currently supports Linux and macOS (Windows support coming + soon + :smiley:) +- :star: Use Python 3.x. -To install the package: ``pip install python-dp`` +Installation +------------ -:: +To install PyDP, use ``pip`` to install the +package from `PyPI `__: - import pydp as dp # imports the DP library +.. code:: bash - # To calculate the Bounded Mean - # epsilon is a number between 0 and 1 denoting privacy threshold - # It measures the acceptable loss of privacy (with 0 meaning no loss is acceptable) - # If both the lower and upper bounds are specified, - # x = dp.BoundedMean(epsilon: double, lower: int, upper: int) - x = dp.BoundedMean(0.6, 1, 10) + pip install python-dp - # If lower and upper bounds are not specified, - # DP library automatically calculates these bounds - # x = dp.BoundedMean(epsilon: double) - x = dp.BoundedMean(0.6) +(If you have ``pip3`` separately for Python 3.x, use ``pip3 install python-dp``.)
Future versions will support additional data types - # Refer to examples/carrots.py for an introduction - x.result(input_data: list) +Examples +-------- -Known issue: If the privacy budget (epsilon is too less), we get a -StatusOR error in the command line. While this needs to be raised as an -error, right now, it’s just displayed as an error in logs. +Refer to the `curated list `__ of tutorials and sample code to learn more about the PyDP library. -.. _Differential Privacy: https://github.com/google/differential-privacy -.. _Slack: http://slack.openmined.org/ -.. _guidelines: https://github.com/OpenMined/PyDP/blob/master/contributing.md \ No newline at end of file +You can also get started with `an introduction to +PyDP `__ (a Jupyter notebook) and `the carrots demo `__ (a Python file). + +Example: calculate the Bounded Mean + +.. code:: python + + # Import PyDP + import pydp as dp + # Import the Bounded Mean algorithm + from pydp.algorithms.laplacian import BoundedMean + + # Calculate the Bounded Mean + # Structure: `BoundedMean(epsilon: double, lower: int, upper: int)` + # `epsilon`: a Double, between 0 and 1, denoting the privacy threshold, + # measures the acceptable loss of privacy (with 0 meaning no loss is acceptable) + # `lower` and `upper`: Integers, representing lower and upper bounds, respectively + x = BoundedMean(0.6, 1, 10) + + # If the lower and upper bounds are not specified, + # PyDP automatically calculates these bounds + # x = BoundedMean(epsilon: double) + x = BoundedMean(0.6) + + # Calculate the result + # Currently supported data types are integers and floats + # Future versions will support additional data types + # (Refer to https://github.com/OpenMined/PyDP/blob/dev/examples/carrots.py) + x.quick_result(input_data: list) + +Learning Resources +------------------ + +Go to `resources `__ to learn more about differential privacy. 
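For intuition about what a bounded mean with Laplace noise does under the hood, here is a minimal pure-Python sketch. This illustrates the general technique, not PyDP's actual implementation, and the function name is hypothetical: values are clamped to ``[lower, upper]``, so one record can shift the mean by at most ``(upper - lower) / n``, and Laplace noise scaled to that sensitivity divided by epsilon is added to the true mean.

```python
import math
import random

def private_bounded_mean(data, epsilon, lower, upper, rng):
    """Sketch of an epsilon-DP bounded mean: clamp, average, add noise."""
    # Clamping bounds each record's influence on the mean
    clamped = [min(max(x, lower), upper) for x in data]
    true_mean = sum(clamped) / len(clamped)
    # One record can move the clamped mean by at most this much
    sensitivity = (upper - lower) / len(clamped)
    # Laplace(0, sensitivity / epsilon) noise via inverse-CDF sampling
    u = rng.random() - 0.5
    noise = -(sensitivity / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_mean + noise

rng = random.Random(0)
data = [1, 4, 7, 3, 9, 2, 8, 5, 6, 4] * 100  # 1,000 records, true mean 4.9
result = private_bounded_mean(data, epsilon=0.6, lower=1, upper=10, rng=rng)
print(result)  # close to 4.9: with 1,000 records the noise is small
```

Note how the bounds serve privacy, not just tidiness: without clamping, a single extreme record could move the mean arbitrarily far, and no finite noise scale would hide its presence.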
+ +Support and Community on Slack +------------------------------ + +If you have questions about the PyDP library, join `OpenMined's Slack `__ and check the **#lib\_pydp** channel. To follow the code source changes, join **#code\_dp\_python**. + +Contributing +------------ + +To contribute to the PyDP project, read the `guidelines `__. + +Pull requests are welcome. If you want to introduce major changes, +please open an issue first to discuss what you would like to change. + +Please make sure to update tests as appropriate. + +.. raw:: html + + + +License +------- + +`Apache License 2.0 `__ + +.. |Tests| image:: https://img.shields.io/github/workflow/status/OpenMined/PyDP/Tests +.. |Version| image:: https://img.shields.io/github/v/tag/OpenMined/PyDP?color=green&label=pypi +.. |License| image:: https://img.shields.io/github/license/OpenMined/PyDP \ No newline at end of file From 32caf2fe45d451d657f666049099468a5af5d525 Mon Sep 17 00:00:00 2001 From: Shivay Lamba Date: Mon, 31 Aug 2020 14:40:39 +0530 Subject: [PATCH 3/5] readme.rst docs --- docs/readme.rst | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/docs/readme.rst b/docs/readme.rst index 64bd5c50..4b6148d8 100644 --- a/docs/readme.rst +++ b/docs/readme.rst @@ -25,14 +25,11 @@ privacy guarantee and accuracy of your model written in Python. **Things to remember about PyDP:** - :rocket: Features differentially private algorithms including: - BoundedMean, - BoundedSum, Max, Count Above, Percentile, Min, Median, etc. + BoundedMean, BoundedSum, Max, Count Above, Percentile, Min, Median, etc. - All the computation methods mentioned above use Laplace noise only - (other - noise mechanisms will be added soon! :smiley:) + (other noise mechanisms will be added soon! :smiley:) - :fire: Currently supports Linux and macOS (Windows support coming - soon - :smiley:) + soon :smiley:) - :star: Use Python 3.x. 
Installation From 57fb9093a0b9bcaedd210372d68e5498f8a1c661 Mon Sep 17 00:00:00 2001 From: Shivay Lamba Date: Mon, 31 Aug 2020 14:48:13 +0530 Subject: [PATCH 4/5] test --- docs/readme.rst | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/docs/readme.rst b/docs/readme.rst index 4b6148d8..11c14227 100644 --- a/docs/readme.rst +++ b/docs/readme.rst @@ -24,12 +24,9 @@ privacy guarantee and accuracy of your model written in Python. **Things to remember about PyDP:** -- :rocket: Features differentially private algorithms including: - BoundedMean, BoundedSum, Max, Count Above, Percentile, Min, Median, etc. -- All the computation methods mentioned above use Laplace noise only - (other noise mechanisms will be added soon! :smiley:) -- :fire: Currently supports Linux and macOS (Windows support coming - soon :smiley:) +- :rocket: Features differentially private algorithms including: BoundedMean, BoundedSum, Max, Count Above, Percentile, Min, Median, etc. +- All the computation methods mentioned above use Laplace noise only (other noise mechanisms will be added soon! :smiley:) +- :fire: Currently supports Linux and macOS (Windows support coming soon :smiley:) - :star: Use Python 3.x. Installation From 7b0fabde0c5e8c0032438b97b105be18c784c224 Mon Sep 17 00:00:00 2001 From: Shivay Lamba Date: Tue, 1 Sep 2020 22:44:00 +0530 Subject: [PATCH 5/5] warning removed --- docs/introduction.rst | 21 ++++++++++----------- docs/readme.rst | 13 +++++++------ 2 files changed, 17 insertions(+), 17 deletions(-) diff --git a/docs/introduction.rst b/docs/introduction.rst index 1f593067..e08883c2 100644 --- a/docs/introduction.rst +++ b/docs/introduction.rst @@ -1,6 +1,5 @@ -.. class:: center Introduction To Differential Privacy -============ +==================================== Introduction @@ -9,7 +8,7 @@ Introduction The era we are living in is data driven; tons and tons of data are generated every second.
A lot of this data is being used to improve our own lifestyle - be it recommending the best series to watch after a tiring day at work, suggesting the best gifts to buy when it's our best friend's birthday, or keeping our birthday party photos sorted so that we can cherish them years later. All big companies use data to gain insights into their progress, which drives their business. Machine Learning has made our lives easier, but is it just about improving our lifestyle? This raises a question: can machine learning change the way we live? Can it improve our healthcare? Can ML be a friend to those who are lonely and have no one to talk to? The answer is both “Yes” and “No”. Machine Learning and Data -============ +========================= Machine Learning is both data and research driven. The more data there is, the better the research on a particular topic can be. However, not all data can be released for research: much of it contains private information which, once leaked, can be misused. For example, to tackle a particular medical problem we need a lot of medical health records. These records are private information, as nobody wants their medical records to be identifiable by anyone on the internet. Hence, these are real-world issues that need immediate solutions, but the researchers' hands are tied due to the unavailability of data. So, is there a solution? @@ -23,7 +22,7 @@ This is where “Differential Privacy” comes into the picture, a smarter way to maintain privacy while making the data available for research. Wait! Have we heard this term before? Maybe yes: in 2016, Apple announced at WWDC that it would be using Differential Privacy to maintain the privacy of its iOS users. Google uses the same technique to generate some of its data insights while maintaining the privacy of Chrome browser users. Let us now dive deep into the topic. .. raw:: html (Privacy Preserving AI (Andrew Trask) | MIT Deep Learning Series ) Why is Differential Privacy so important ? -============ +==========================================
An easy approach to maintain this kind of privacy is “Data Anonymization”, the process of removing personally identifiable information from a dataset. This approach, however, has its drawbacks: @@ -50,13 +49,13 @@ Despite the fact that the dataset was anonymized (no username or movie name was :align: center :figclass: align-center -They scraped the IMDB website, and through statistical analysis of these two datasets they were able to identify the movie names and also the individuals. Ten years down the line, they published yet another `paper `_ reviewing de-anonymization of datasets in the present world. There are other instances too where such attacks have led to the leakage of private information. +They scraped the IMDB website, and through statistical analysis of these two datasets they were able to identify the movie names and also the individuals. Ten years down the line, they published yet another `research paper `_ reviewing de-anonymization of datasets in the present world. There are other instances too where such attacks have led to the leakage of private information. Now that we have learnt how important “Differential Privacy” is, let us see how it is actually implemented. How is Differential Privacy implemented ? -============ +========================================= According to `Cynthia Dwork `_- *“Differential privacy” describes a promise, made by a data holder, or curator, to a data subject: “You will not be affected, adversely or otherwise, by allowing your data to be used in any study or analysis, no matter what other studies, data sets, or information sources, are available.”* @@ -68,7 +67,7 @@ These algorithms add random noise to the queries and to the database.
This is done in one of two ways: * Local Differential Privacy * Global Differential Privacy Local Differential Privacy ------ +-------------------------- In local differential privacy, the random noise is applied at the local level, i.e. when the data is sent to the data curator/aggregator. If the data is highly confidential, the data generators generally do not want to trust the curator and hence add noise to the dataset beforehand. This approach is adopted when the Data Curator cannot be completely trusted. @@ -80,7 +79,7 @@ In local differential privacy the random noise is applied at the start of the pr Image Credit: Google Images Global Differential Privacy ------ +--------------------------- In global differential privacy, the random noise is applied at the global level, i.e. when the answer to a query is returned to the user. This type of differential privacy is adopted when the data generators trust the data curator completely and leave it to the curator to decide how much noise to add to the results. This type of privacy is more accurate, as it involves less noise. .. figure:: https://user-images.githubusercontent.com/19529592/91381550-4ec2d400-e845-11ea-8f63-b7a3adb3fde8.png @@ -91,7 +90,7 @@ In Global differential privacy the random noise is applied at the global level i Image Credits: Google Images Formal Definition Of Differential Privacy -============ +========================================= In the book “`The Algorithmic Foundations of Differential Privacy `_” by Cynthia Dwork and Aaron Roth, Differential Privacy is formally defined as: .. glossary:: @@ -108,7 +107,7 @@ The Epsilon *(ε)* and *Delta(δ)* parameters measure the threshold for leakage. Thus, when both Epsilon and Delta are 0, it is called Perfect Privacy. The values are set in such a way that privacy is maintained. This set of values is known as the Privacy Budget.
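To connect the local model above with the formal definition, consider the classic randomized response technique, a textbook example of local differential privacy (an illustration only, not something PyDP exposes): each respondent flips a coin before answering a sensitive yes/no question, so the curator only ever receives noisy answers, yet the aggregate rate can still be recovered. With fair coins a truthful “yes” is reported with probability 3/4 and a false “yes” with probability 1/4, so the ratio of the two probabilities is 3 and the mechanism satisfies the definition with Epsilon = ln 3 and Delta = 0.

```python
import random

def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Flip a coin: heads, answer truthfully; tails, answer with a
    second coin flip. The noise is added locally, before the answer
    ever reaches the data curator."""
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5

def estimate_true_rate(noisy_answers) -> float:
    """Debias the aggregate: the observed "yes" rate p satisfies
    p = 0.5 * true_rate + 0.25, so true_rate = 2 * (p - 0.25)."""
    p = sum(noisy_answers) / len(noisy_answers)
    return 2 * (p - 0.25)

rng = random.Random(42)
# 10,000 respondents; 30% hold the sensitive attribute
truths = [rng.random() < 0.3 for _ in range(10_000)]
answers = [randomized_response(t, rng) for t in truths]
print(round(estimate_true_rate(answers), 2))  # close to 0.30
```

Any individual answer is deniable (it may have been a coin flip), yet the population-level statistic survives, which is precisely the promise the definition formalizes.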
Differential Privacy In Real World -============ +==================================== Differential Privacy ensures the privacy of all sorts of data, which anyone can then use to draw insights that help them run their business. In the present world, Differentially Private Data Analysis is widely used, and it is implemented using various libraries. @@ -145,7 +144,7 @@ Differential Privacy is playing an important role in building Privacy-protected Further Reading -============ +=============== * `Secure and Private AI Course on Udacity by Andrew Trask `_ diff --git a/docs/readme.rst b/docs/readme.rst index 11c14227..1a1b0941 100644 --- a/docs/readme.rst +++ b/docs/readme.rst @@ -1,7 +1,7 @@ | |Tests| |Version| |License| Introduction to PyDP -==== +==================== In today's data-driven world, more and more researchers and data scientists use machine learning to create better models or more innovative solutions for a better future. @@ -24,10 +24,12 @@ privacy guarantee and accuracy of your model written in Python. **Things to remember about PyDP:** -- :rocket: Features differentially private algorithms including: BoundedMean, BoundedSum, Max, Count Above, Percentile, Min, Median, etc. -- All the computation methods mentioned above use Laplace noise only (other noise mechanisms will be added soon! :smiley:) -- :fire: Currently supports Linux and macOS (Windows support coming soon :smiley:) -- :star: Use Python 3.x. +- :rocket: Features differentially private algorithms including: BoundedMean, BoundedSum, Max, Count Above, Percentile, Min, Median, etc. + + - All the computation methods mentioned above use Laplace noise only (other noise mechanisms will be added soon! :smiley:). + +- :fire: Currently supports Linux and macOS (Windows support coming soon :smiley:) +- :star: Use Python 3.x. Installation ------------ @@ -96,7 +98,6 @@ please open an issue first to discuss what you would like to change. Please make sure to update tests as appropriate. -.. raw:: html