From 63f3bcbf50e6c44bdc3735d7356ec17a03d7197c Mon Sep 17 00:00:00 2001
From: Edon Gashi
Date: Mon, 23 May 2022 13:51:50 +0200
Subject: [PATCH 1/3] Remove notice from docs

---
 README.md              | 6 ------
 docs/admin_guide.md    | 6 ------
 docs/admin_tutorial.md | 6 ------
 docs/analyst_guide.md  | 7 -------
 4 files changed, 25 deletions(-)

diff --git a/README.md b/README.md
index ce6e12d9..3733b0ba 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,3 @@
-# Important notice
-
-This is a pre-release version of the extension and is not intended for general use yet.
-It may be unstable and documentation is limited.
-If you have any questions, please contact us at [hello@open-diffix.org](mailto:hello@open-diffix.org).
-
 # PG Diffix

 `pg_diffix` is a PostgreSQL extension for strong dynamic anonymization. It ensures that answers to simple SQL queries are anonymous. For more information, visit the [Open Diffix](https://www.open-diffix.org/) website.
diff --git a/docs/admin_guide.md b/docs/admin_guide.md
index 73fa0828..71614354 100644
--- a/docs/admin_guide.md
+++ b/docs/admin_guide.md
@@ -1,9 +1,3 @@
-# Important notice
-
-This is a pre-release version of the extension and is not intended for general use yet.
-It may be unstable and documentation is limited.
-If you have any questions, please contact us at [hello@open-diffix.org](mailto:hello@open-diffix.org).
-
 # Configuration

 This document provides detailed information about the configuration, behavior and recommended usage of `pg_diffix`.
diff --git a/docs/admin_tutorial.md b/docs/admin_tutorial.md
index dd8f6033..5ab51859 100644
--- a/docs/admin_tutorial.md
+++ b/docs/admin_tutorial.md
@@ -1,9 +1,3 @@
-# Important notice
-
-This is a pre-release version of the extension and is not intended for general use yet.
-It may be unstable and documentation is limited.
-If you have any questions, please contact us at [hello@open-diffix.org](mailto:hello@open-diffix.org).
-
 # Admin tutorial

 This document provides an example on how to install and configure `pg_diffix` to expose a simple dataset
diff --git a/docs/analyst_guide.md b/docs/analyst_guide.md
index 94c3a373..f27aa97d 100644
--- a/docs/analyst_guide.md
+++ b/docs/analyst_guide.md
@@ -1,9 +1,3 @@
-# Important notice
-
-This is a pre-release version of the extension and is not intended for general use yet.
-It may be unstable and documentation is limited.
-If you have any questions, please contact us at [hello@open-diffix.org](mailto:hello@open-diffix.org).
-
 # Analyst guide

 This document describes features and restrictions of `pg_diffix` for users with anonymized access to a database.
@@ -12,7 +6,6 @@ mechanisms that Diffix Elm uses to protect personal data.

 ## Table of Contents

-- [Important notice](#important-notice)
 - [Analyst guide](#analyst-guide)
 - [Table of Contents](#table-of-contents)
 - [Access levels](#access-levels)

From e95ed657bffee88f43e730c62b510ec06d178833 Mon Sep 17 00:00:00 2001
From: Edon Gashi
Date: Mon, 23 May 2022 14:09:20 +0200
Subject: [PATCH 2/3] Fix code blocks in docs

---
 README.md              | 42 +++++++++++++++++++++++--------
 docs/admin_guide.md    |  8 +++---
 docs/admin_tutorial.md | 56 ++++++++++++++++++++++++++++--------------
 3 files changed, 73 insertions(+), 33 deletions(-)

diff --git a/README.md b/README.md
index 3733b0ba..4711d7ee 100644
--- a/README.md
+++ b/README.md
@@ -28,7 +28,9 @@ every session start for restricted users. This can be accomplished by configurin

 For example, to automatically load the `pg_diffix` extension for all users connecting to a database, you can execute the following command:

-`ALTER DATABASE db_name SET session_preload_libraries TO 'pg_diffix';`
+```
+ALTER DATABASE db_name SET session_preload_libraries TO 'pg_diffix';
+```

 Once loaded, the extension logs information to `/var/log/postgresql/postgresql-13-main.log` or equivalent.
@@ -42,7 +44,9 @@ You might also need to remove the extension from the list of preloaded libraries

 For example, to reset the list of preloaded libraries for a database, you can execute the following command:

-`ALTER DATABASE db_name SET session_preload_libraries TO DEFAULT;`
+```
+ALTER DATABASE db_name SET session_preload_libraries TO DEFAULT;
+```

 ## Testing the extension

@@ -61,7 +65,10 @@ or if available, just make your usual PostgreSQL user a `SUPERUSER`.

 Or you can use the [PGXN Extension Build and Test Tools](https://github.com/pgxn/docker-pgxn-tools) Docker image:

-`docker run -it --rm --mount "type=bind,src=$(pwd),dst=/repo" pgxn/pgxn-tools sh -c 'cd /repo && apt update && apt install -y jq && pg-start 13 && pg-build-test'`.
+```
+docker run -it --rm --mount "type=bind,src=$(pwd),dst=/repo" pgxn/pgxn-tools sh -c \
+  'cd /repo && apt update && apt install -y jq && pg-start 13 && pg-build-test'
+```

 ## Docker images

@@ -76,15 +83,21 @@ The example below shows how to build the image and run a minimally configured co

 Build the image:

-`make image`
+```
+make image
+```

 Run the container in foreground and expose in port 10432:

-`docker run --rm --name pg_diffix -e POSTGRES_PASSWORD=postgres -p 10432:5432 pg_diffix`
+```
+docker run --rm --name pg_diffix -e POSTGRES_PASSWORD=postgres -p 10432:5432 pg_diffix
+```

 From another shell you can connect to the container via `psql`:

-`psql -h localhost -p 10432 -d postgres -U postgres`
+```
+psql -h localhost -p 10432 -d postgres -U postgres
+```

 For more advanced usage see the [official image reference](https://hub.docker.com/_/postgres).
@@ -102,16 +115,25 @@ Three users are created, all of them with password `demo`:

 Build the image:

-`make demo-image`
+```
+make demo-image
+```

 Run the container in foreground and expose in port 10432:

-`docker run --rm --name pg_diffix_demo -e POSTGRES_PASSWORD=postgres -e BANKING_PASSWORD=demo -p 10432:5432 pg_diffix_demo`
+```
+docker run --rm --name pg_diffix_demo -e POSTGRES_PASSWORD=postgres -e BANKING_PASSWORD=demo -p 10432:5432 pg_diffix_demo
+```

 Connect to the banking database (from another shell) for anonymized access:

-`psql -h localhost -p 10432 -d banking -U trusted_user`
+```
+psql -h localhost -p 10432 -d banking -U trusted_user
+```

 To keep the container running you can start it in detached mode and with a restart policy:

-`docker run -d --name pg_diffix_demo --restart unless-stopped -e POSTGRES_PASSWORD=postgres -e BANKING_PASSWORD=demo -p 10432:5432 pg_diffix_demo`
+```
+docker run -d --name pg_diffix_demo --restart unless-stopped \
+  -e POSTGRES_PASSWORD=postgres -e BANKING_PASSWORD=demo -p 10432:5432 pg_diffix_demo
+```
diff --git a/docs/admin_guide.md b/docs/admin_guide.md
index 71614354..c1648301 100644
--- a/docs/admin_guide.md
+++ b/docs/admin_guide.md
@@ -36,7 +36,7 @@ Trusted users have fewer SQL restrictions than untrusted users, and therefore ha

 For example, the command to assign the access level `anonymized_untrusted` to the role `public_access` is:

-```SQL
+```
 CALL diffix.mark_role('public_access', 'anonymized_untrusted');
 ```

@@ -69,12 +69,12 @@ __NOTE:__ if AID columns are not correctly labeled, the extension may fail to an

 The procedure `diffix.mark_personal(table_name, aid_columns...)` is used to label a table as personal and to label its AID columns. For example:

-```SQL
+```
 CALL diffix.mark_personal('employee_info', 'employee_id');
 ```
 labels the table `employee_info` as personal, and labels the `employee_id` column as an AID column.
-```SQL
+```
 CALL diffix.mark_personal('transactions', 'sender_acct', 'receiver_acct');
 ```
 labels the table `transactions` as personal, and labels the `sender_acct` and `receiver_acct` columns as AID columns.
@@ -186,7 +186,7 @@ Given that AIDs may not be perfect, some care must be taken in the selection of

 For example, imagine the following query in a table where `account_number` is the AID column:

-```sql
+```
 SELECT last_name, religion, count(*)
 FROM table
 GROUP BY last_name, religion
diff --git a/docs/admin_tutorial.md b/docs/admin_tutorial.md
index 5ab51859..42f39d64 100644
--- a/docs/admin_tutorial.md
+++ b/docs/admin_tutorial.md
@@ -8,52 +8,70 @@ containing a column named `id`, which uniquely identifies protected entities (th

 ## Installation

-1. Install the packages required for building the extension:
+1\. Install the packages required for building the extension:

-`sudo apt-get install make jq gcc postgresql-server-dev-14`
+```
+sudo apt-get install make jq gcc postgresql-server-dev-14
+```

-2. Install PGXN Client tools:
+2\. Install PGXN Client tools:

-`sudo apt-get install pgxnclient`
+```
+sudo apt-get install pgxnclient
+```

-3. Install the extension:
+3\. Install the extension:

-`sudo pgxn install pg_diffix`
+```
+sudo pgxn install pg_diffix
+```

 ## Activation

-1. Connect to the database as a superuser:
+1\. Connect to the database as a superuser:

-`sudo -u postgres psql test_db`
+```
+sudo -u postgres psql test_db
+```

-2. Activate the extension for the current database:
+2\. Activate the extension for the current database:

-`CREATE EXTENSION pg_diffix;`
+```
+CREATE EXTENSION pg_diffix;
+```

-3. Automatically load the extension for all users connecting to the database:
+3\. Automatically load the extension for all users connecting to the database:

-`ALTER DATABASE test_db SET session_preload_libraries TO 'pg_diffix';`
+```
+ALTER DATABASE test_db SET session_preload_libraries TO 'pg_diffix';
+```

 ## Configuration

-1. Label the test data as personal (requiring anonymization):
+1\. Label the test data as personal (requiring anonymization):

-`CALL diffix.mark_personal('test_table', 'id');`
+```
+CALL diffix.mark_personal('test_table', 'id');
+```

-2. Create an account for the analyst:
+2\. Create an account for the analyst:

-`CREATE USER analyst_role WITH PASSWORD 'some_password';`
+```
+CREATE USER analyst_role WITH PASSWORD 'some_password';
+```

-3. Give the analyst read-only access to the test database:
+3\. Give the analyst read-only access to the test database:

 ```
 GRANT CONNECT ON DATABASE test_db TO analyst_role;
 GRANT SELECT ON ALL TABLES IN SCHEMA public TO analyst_role;
 ```

-4. Label the analyst as restricted and trusted:
+4\. Label the analyst as restricted and trusted:

-`CALL diffix.mark_role('analyst_role', 'anonymized_trusted');`
+```
+CALL diffix.mark_role('analyst_role', 'anonymized_trusted');
+```

 __That's it!__ The analyst can now connect to the database and issue (only) anonymizing queries against the test dataset.

From 761b3f676e5f9e93846f030e11c055fc8917ce9f Mon Sep 17 00:00:00 2001
From: Edon Gashi
Date: Mon, 23 May 2022 14:22:10 +0200
Subject: [PATCH 3/3] Update cross-links in docs

---
 README.md           |  8 ++++++--
 docs/admin_guide.md | 12 +-----------
 2 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/README.md b/README.md
index 4711d7ee..eb827364 100644
--- a/README.md
+++ b/README.md
@@ -2,8 +2,12 @@

 `pg_diffix` is a PostgreSQL extension for strong dynamic anonymization. It ensures that answers to simple SQL queries are anonymous. For more information, visit the [Open Diffix](https://www.open-diffix.org/) website.

-Check out the [Admin Tutorial](docs/admin_tutorial.md) for an example on how to set up `pg_diffix`.
-See the [Admin Guide](docs/admin_guide.md) for details on configuring and using the extension.
+**For administrators:** Check out the [admin tutorial](docs/admin_tutorial.md) for an example on how to set up `pg_diffix`.
+See the [admin guide](docs/admin_guide.md) for details on configuring and using the extension.
+To install from source, see the [installation](#installation) section.
+
+**For analysts:** The [banking notebook](docs/banking.ipynb) provides example queries against a real dataset.
+The [analyst guide](docs/analyst_guide.md) describes the SQL features and limitations imposed by `pg_diffix`.

 ## Installation

diff --git a/docs/admin_guide.md b/docs/admin_guide.md
index c1648301..73bc4880 100644
--- a/docs/admin_guide.md
+++ b/docs/admin_guide.md
@@ -152,17 +152,7 @@ Default value is `*`. Any user can change this setting.

 ## Restricted features and extensions

-**TODO:** I think this kind of information is better put in the notebook tutorial? Or if you want it here it seems incomplete or something. Needs work...
-
-For users other than `direct`, various data and features built into PostgreSQL are restricted. Among others:
-
-1. Issue utility statements like `COPY` and `ALTER TABLE`, beside a few allowlisted ones, are not allowed.
-2. Some of the data in `pg_catalog` tables like `pg_user_functions` is not accessible.
-3. Selected subset of less frequently used PostgreSQL query features like `EXISTS` or `NULLIF` are disabled.
-4. Inheritance involving a personal table is not allowed.
-5. Some of the output of `EXPLAIN` for queries involving a personal table is censored.
-
-**NOTE** If any of the currently blocked features is necessary for your use case, open an issue and let us know.
+For a detailed description of supported SQL features and restrictions, see the [analyst guide](analyst_guide.md).

 Row level security (RLS) can be enabled and used on personal tables.
 It is advised that the active policies are vetted from the point of view of anonymity.