What is this Python project?
Qrlew (/ˈkɝlu/) is an open-source library that rewrites SQL queries into privacy-preserving variants using Differential Privacy (DP).
Use Qrlew if you want to bring privacy guarantees to your SQL pipelines. It is:
What's the difference between this Python project and similar ones?
There are a few existing open-source libraries for differential privacy.
Some libraries focus on deep learning and DP-SGD, such as Opacus, TensorFlow Privacy, or Optax's DP-SGD. Qrlew has a very different goal: analytics and SQL.
GoogleDP is a library implementing many differentially private mechanisms in various languages (C++, Go and Java).
IBM's diffprivlib is also a rich library implementing a wide variety of DP primitives in Python, in particular DP versions of many classical machine learning algorithms.
These libraries provide the building blocks for experts to build DP algorithms. Qrlew takes a very different approach: it is a high-level tool designed to take queries written in SQL by a data practitioner with no expertise in privacy and to rewrite them into DP equivalents able to run on any SQL-enabled data store. Qrlew has implemented very few DP mechanisms to date, but it automates the whole process of rewriting a query, whereas these libraries offer a rich variety of DP mechanisms and give the user full control over how to use them.
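To make the contrast concrete, here is a minimal sketch of the kind of low-level building block a DP primitives library exposes: a Laplace-noised bounded sum. It is not taken from Qrlew or from any of the libraries above; the function names `laplace_noise` and `dp_bounded_sum` are hypothetical, it assumes each individual contributes exactly one value, and it ignores floating-point subtleties that production implementations have to handle.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    # expovariate(lambd) has mean 1 / lambd, so lambd = 1 / scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_bounded_sum(values, lower, upper, epsilon, rng=None):
    """Illustrative epsilon-DP sum (hypothetical helper, not a library API).

    Assumes each individual contributes exactly one value, so adding or
    removing one individual changes the clamped sum by at most
    max(|lower|, |upper|).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    rng = rng or random.Random()
    # Clamp every value into [lower, upper] so the sensitivity is bounded.
    clamped = [min(max(v, lower), upper) for v in values]
    sensitivity = max(abs(lower), abs(upper))
    return sum(clamped) + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 58, 130]  # 130 is clamped down to 100
    print(dp_bounded_sum(ages, lower=0, upper=100, epsilon=1.0))
```

With a primitives library, the analyst has to choose the bounds, the sensitivity, and the noise scale by hand, as above. The approach described for Qrlew is to derive this kind of computation automatically from the SQL query.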
Google built several higher-level tools on top of.
PrivacyOnBeam is a framework to run DP jobs written in Apache Beam with its Go SDK.
PipelineDP is a framework that lets analysts write Beam-like or Spark-like programs and have them run on Apache Spark or Apache Beam as a back-end. It focuses on the Beam and Spark ecosystem, while Qrlew tries to provide an SQL interface to the analyst and runs on SQL-enabled back-ends (including Spark, a variety of data warehouses, and more traditional databases).
ZetaSQL gives the user a way to write SQL-like queries and have them executed on tables using GoogleDB custom code, so it is not compatible with any SQL data store and supports relatively simple queries only.
OpenDP is a powerful Rust library with Python bindings. It offers many ways to build complex DP computations by composing basic elements. Nonetheless, it requires both expertise in privacy and learning a new API to describe a query. Also, the computations are handled by the Rust core, so it does not integrate easily with existing data stores and may not scale well either.
Tumult Analytics shares much of the nice composable design of OpenDP, but runs on Apache Spark, making it a scalable alternative to OpenDP. Still, it requires learning a specific API (close to that of Spark) and cannot leverage any SQL back-end.
SmartNoise SQL is a library that shares some of the design choices of Qrlew. An analyst can write SQL queries, but the scope of possible queries is relatively limited: no JOINs, no sub-queries, and no CTEs (WITH), all of which Qrlew supports. Also, it does not run the full computation in the DB, so the integration with existing systems may not be straightforward.
Other systems such as PINQ and Chorus are prototypes that do not seem to be actively maintained. Chorus shares many of the design goals of Qrlew, but requires post-processing outside of the DB, which can make the integration more complex on the data-owner side (as the computation happens in two distinct places).
Beyond that, Qrlew brings unique functionalities, such as:
WHERE x < b or WHERE x IN (1,2,3) to give hints to Qrlew;
Anyone who agrees with this pull request could submit an Approve review to it.