Merged
56 changes: 38 additions & 18 deletions README.md
@@ -1,15 +1,13 @@
# Important notice

This is a pre-release version of the extension and is not intended for general use yet.
It may be unstable and documentation is limited.
If you have any questions, please contact us at [hello@open-diffix.org](mailto:hello@open-diffix.org).

# PG Diffix

`pg_diffix` is a PostgreSQL extension for strong dynamic anonymization. It ensures that answers to simple SQL queries are anonymous. For more information, visit the [Open Diffix](https://www.open-diffix.org/) website.

Check out the [Admin Tutorial](docs/admin_tutorial.md) for an example of how to set up `pg_diffix`.
See the [Admin Guide](docs/admin_guide.md) for details on configuring and using the extension.
**For administrators:** Check out the [admin tutorial](docs/admin_tutorial.md) for an example of how to set up `pg_diffix`.
See the [admin guide](docs/admin_guide.md) for details on configuring and using the extension.
To install from source, see the [installation](#installation) section.

**For analysts:** The [banking notebook](docs/banking.ipynb) provides example queries against a real dataset.
The [analyst guide](docs/analyst_guide.md) describes the SQL features and limitations imposed by `pg_diffix`.

## Installation

@@ -34,7 +32,9 @@ every session start for restricted users. This can be accomplished by configurin
For example, to automatically load the `pg_diffix` extension for all users connecting to a database,
you can execute the following command:

`ALTER DATABASE db_name SET session_preload_libraries TO 'pg_diffix';`
```
ALTER DATABASE db_name SET session_preload_libraries TO 'pg_diffix';
```
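
One way to confirm that the setting took effect is to open a new session and read the value back, since per-database settings only apply to sessions started after the change. A minimal sketch, assuming `psql` is on the path and the current OS user can connect to the database; `show_preload` is a hypothetical helper name, not part of `pg_diffix`:

```shell
# Hypothetical helper (not part of pg_diffix): print session_preload_libraries
# as seen by a fresh session. $1 is the database name; host and user come from
# the usual psql defaults or PG* environment variables.
show_preload() {
  # -A: unaligned output, -t: tuples only, so only the bare value is printed
  psql -d "$1" -At -c 'SHOW session_preload_libraries;'
}

# Example: show_preload db_name
```

If the value printed does not include `pg_diffix`, the `ALTER DATABASE` command was probably run against a different database.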

Once loaded, the extension logs information to `/var/log/postgresql/postgresql-13-main.log` or equivalent.

@@ -48,7 +48,9 @@ You might also need to remove the extension from the list of preloaded libraries

For example, to reset the list of preloaded libraries for a database, you can execute the following command:

`ALTER DATABASE db_name SET session_preload_libraries TO DEFAULT;`
```
ALTER DATABASE db_name SET session_preload_libraries TO DEFAULT;
```

## Testing the extension

@@ -67,7 +69,10 @@ or if available, just make your usual PostgreSQL user a `SUPERUSER`.

Or you can use the [PGXN Extension Build and Test Tools](https://github.com/pgxn/docker-pgxn-tools) Docker image:

`docker run -it --rm --mount "type=bind,src=$(pwd),dst=/repo" pgxn/pgxn-tools sh -c 'cd /repo && apt update && apt install -y jq && pg-start 13 && pg-build-test'`.
```
docker run -it --rm --mount "type=bind,src=$(pwd),dst=/repo" pgxn/pgxn-tools sh -c \
'cd /repo && apt update && apt install -y jq && pg-start 13 && pg-build-test'
```

## Docker images

@@ -82,15 +87,21 @@ The example below shows how to build the image and run a minimally configured co

Build the image:

`make image`
```
make image
```

Run the container in the foreground and expose it on port 10432:

`docker run --rm --name pg_diffix -e POSTGRES_PASSWORD=postgres -p 10432:5432 pg_diffix`
```
docker run --rm --name pg_diffix -e POSTGRES_PASSWORD=postgres -p 10432:5432 pg_diffix
```

From another shell you can connect to the container via `psql`:

`psql -h localhost -p 10432 -d postgres -U postgres`
```
psql -h localhost -p 10432 -d postgres -U postgres
```
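
The container may need a few seconds before it accepts connections, so a script that starts it and connects immediately can fail. A minimal sketch of a wait loop, assuming the PostgreSQL client tool `pg_isready` is installed on the host; `wait_for_postgres` is a hypothetical helper name:

```shell
# Hypothetical helper: poll until the server accepts connections.
# $1 = host, $2 = port, $3 = maximum number of attempts (default 30).
# Returns 0 once pg_isready succeeds, 1 if every attempt fails.
wait_for_postgres() {
  host=$1
  port=$2
  tries=${3:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -q suppresses output; the exit status is 0 when the server is accepting connections
    if pg_isready -h "$host" -p "$port" -q; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example: wait_for_postgres localhost 10432 && psql -h localhost -p 10432 -d postgres -U postgres
```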

For more advanced usage see the [official image reference](https://hub.docker.com/_/postgres).

@@ -108,16 +119,25 @@ Three users are created, all of them with password `demo`:

Build the image:

`make demo-image`
```
make demo-image
```

Run the container in the foreground and expose it on port 10432:

`docker run --rm --name pg_diffix_demo -e POSTGRES_PASSWORD=postgres -e BANKING_PASSWORD=demo -p 10432:5432 pg_diffix_demo`
```
docker run --rm --name pg_diffix_demo -e POSTGRES_PASSWORD=postgres -e BANKING_PASSWORD=demo -p 10432:5432 pg_diffix_demo
```

Connect to the banking database (from another shell) for anonymized access:

`psql -h localhost -p 10432 -d banking -U trusted_user`
```
psql -h localhost -p 10432 -d banking -U trusted_user
```

To keep the container running you can start it in detached mode and with a restart policy:

`docker run -d --name pg_diffix_demo --restart unless-stopped -e POSTGRES_PASSWORD=postgres -e BANKING_PASSWORD=demo -p 10432:5432 pg_diffix_demo`
```
docker run -d --name pg_diffix_demo --restart unless-stopped \
-e POSTGRES_PASSWORD=postgres -e BANKING_PASSWORD=demo -p 10432:5432 pg_diffix_demo
```
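
When the container runs detached, its state is no longer visible in the terminal. A minimal sketch for checking it, assuming the container name `pg_diffix_demo` used above; `container_status` is a hypothetical helper name:

```shell
# Hypothetical helper: print the container state and its restart policy,
# separated by a space. $1 is the container name.
container_status() {
  docker inspect --format '{{.State.Status}} {{.HostConfig.RestartPolicy.Name}}' "$1"
}

# Example: container_status pg_diffix_demo
# Server output is still available with: docker logs pg_diffix_demo
```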
26 changes: 5 additions & 21 deletions docs/admin_guide.md
@@ -1,9 +1,3 @@
# Important notice

This is a pre-release version of the extension and is not intended for general use yet.
It may be unstable and documentation is limited.
If you have any questions, please contact us at [hello@open-diffix.org](mailto:hello@open-diffix.org).

# Configuration

This document provides detailed information about the configuration, behavior and recommended usage of `pg_diffix`.
@@ -42,7 +36,7 @@ Trusted users have fewer SQL restrictions than untrusted users, and therefore ha

For example, the command to assign the access level `anonymized_untrusted` to the role `public_access` is:

```
CALL diffix.mark_role('public_access', 'anonymized_untrusted');
```
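
When roles are assigned from a provisioning script, a mistyped access level is easy to miss. A minimal sketch of a guard that builds the statement only for the two anonymized levels named in this document's examples; `mark_role_sql` is a hypothetical helper, and the list of levels should be extended to match the access levels your installation actually uses:

```shell
# Hypothetical helper: print a CALL diffix.mark_role(...) statement.
# $1 = role name, $2 = access level.
# Assumption: only the two levels shown in this document's examples are accepted here.
# No SQL escaping is performed, so pass trusted identifiers only.
mark_role_sql() {
  case "$2" in
    anonymized_trusted|anonymized_untrusted) ;;
    *)
      echo "unknown access level: $2" >&2
      return 1
      ;;
  esac
  printf "CALL diffix.mark_role('%s', '%s');\n" "$1" "$2"
}

# Example: mark_role_sql public_access anonymized_untrusted | psql -d db_name
```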

Expand Down Expand Up @@ -75,12 +69,12 @@ __NOTE:__ if AID columns are not correctly labeled, the extension may fail to an
The procedure `diffix.mark_personal(table_name, aid_columns...)` is used to label a table as personal and
to label its AID columns. For example:

```
CALL diffix.mark_personal('employee_info', 'employee_id');
```
labels the table `employee_info` as personal, and labels the `employee_id` column as an AID column.

```
CALL diffix.mark_personal('transactions', 'sender_acct', 'receiver_acct');
```
labels the table `transactions` as personal, and labels the `sender_acct` and `receiver_acct` columns as AID columns.
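
Because `diffix.mark_personal` takes a variable number of AID columns, a labeling script may want to build the call from a list. A minimal sketch, where `mark_personal_sql` is a hypothetical helper that performs no SQL escaping and is only suitable for trusted table and column names:

```shell
# Hypothetical helper: print a CALL diffix.mark_personal(...) statement.
# $1 = table name, remaining arguments = AID column names.
mark_personal_sql() {
  table=$1
  shift
  args="'$table'"
  # Append each AID column as a further quoted argument
  for col in "$@"; do
    args="$args, '$col'"
  done
  printf 'CALL diffix.mark_personal(%s);\n' "$args"
}

# Example: mark_personal_sql transactions sender_acct receiver_acct | psql -d db_name
```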
Expand Down Expand Up @@ -158,17 +152,7 @@ Default value is `*`. Any user can change this setting.

## Restricted features and extensions

**TODO:** I think this kind of information is better put in the notebook tutorial? Or if you want it here it seems incomplete or something. Needs work...

For users other than `direct`, various data and features built into PostgreSQL are restricted. Among others:

1. Utility statements like `COPY` and `ALTER TABLE`, besides a few allowlisted ones, are not allowed.
2. Some of the data in `pg_catalog` tables like `pg_user_functions` is not accessible.
3. A selected subset of less frequently used PostgreSQL query features, like `EXISTS` or `NULLIF`, is disabled.
4. Inheritance involving a personal table is not allowed.
5. Some of the output of `EXPLAIN` for queries involving a personal table is censored.

**NOTE** If any of the currently blocked features is necessary for your use case, open an issue and let us know.
For a detailed description of supported SQL features and restrictions, see the [analyst guide](analyst_guide.md).

Row level security (RLS) can be enabled and used on personal tables.
It is advised that the active policies are vetted from the point of view of anonymity.
@@ -192,7 +176,7 @@ Given that AIDs may not be perfect, some care must be taken in the selection of

For example, imagine the following query in a table where `account_number` is the AID column:

```
SELECT last_name, religion, count(*)
FROM table
GROUP BY last_name, religion
62 changes: 37 additions & 25 deletions docs/admin_tutorial.md
@@ -1,9 +1,3 @@
# Important notice

This is a pre-release version of the extension and is not intended for general use yet.
It may be unstable and documentation is limited.
If you have any questions, please contact us at [hello@open-diffix.org](mailto:hello@open-diffix.org).

# Admin tutorial

This document provides an example of how to install and configure `pg_diffix` to expose a simple dataset
@@ -14,52 +8,70 @@ containing a column named `id`, which uniquely identifies protected entities (th

## Installation

1. Install the packages required for building the extension:
1\. Install the packages required for building the extension:

`sudo apt-get install make jq gcc postgresql-server-dev-14`
```
sudo apt-get install make jq gcc postgresql-server-dev-14
```

2. Install PGXN Client tools:
2\. Install PGXN Client tools:

`sudo apt-get install pgxnclient`
```
sudo apt-get install pgxnclient
```

3. Install the extension:
3\. Install the extension:

`sudo pgxn install pg_diffix`
```
sudo pgxn install pg_diffix
```
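
Before moving on to activation, it can help to check that PostgreSQL sees the installed extension files. A minimal sketch that queries the standard `pg_available_extensions` view, assuming `psql` can connect to the database with the current defaults; `extension_available` is a hypothetical helper name:

```shell
# Hypothetical helper: print the default version of pg_diffix known to the server,
# or nothing if the extension files are not installed for this server.
# $1 is the database to connect to.
extension_available() {
  psql -d "$1" -At -c "SELECT default_version FROM pg_available_extensions WHERE name = 'pg_diffix';"
}

# Example: extension_available test_db
```

An empty result usually suggests the extension was built against a different PostgreSQL version than the running server.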

## Activation

1. Connect to the database as a superuser:
1\. Connect to the database as a superuser:

`sudo -u postgres psql test_db`
```
sudo -u postgres psql test_db
```

2. Activate the extension for the current database:
2\. Activate the extension for the current database:

`CREATE EXTENSION pg_diffix;`
```
CREATE EXTENSION pg_diffix;
```

3. Automatically load the extension for all users connecting to the database:
3\. Automatically load the extension for all users connecting to the database:

`ALTER DATABASE test_db SET session_preload_libraries TO 'pg_diffix';`
```
ALTER DATABASE test_db SET session_preload_libraries TO 'pg_diffix';
```

## Configuration

1. Label the test data as personal (requiring anonymization):
1\. Label the test data as personal (requiring anonymization):

`CALL diffix.mark_personal('test_table', 'id');`
```
CALL diffix.mark_personal('test_table', 'id');
```

2. Create an account for the analyst:
2\. Create an account for the analyst:

`CREATE USER analyst_role WITH PASSWORD 'some_password';`
```
CREATE USER analyst_role WITH PASSWORD 'some_password';
```

3. Give the analyst read-only access to the test database:
3\. Give the analyst read-only access to the test database:

```
GRANT CONNECT ON DATABASE test_db TO analyst_role;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO analyst_role;
```

4. Label the analyst as restricted and trusted:
4\. Label the analyst as restricted and trusted:

`CALL diffix.mark_role('analyst_role', 'anonymized_trusted');`
```
CALL diffix.mark_role('analyst_role', 'anonymized_trusted');
```


__That's it!__ The analyst can now connect to the database and issue (only) anonymizing queries against the test dataset.
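
The activation and configuration steps above can also be collected into one script, which is convenient for repeatable test setups. A minimal sketch using the same names as this tutorial (`test_db`, `test_table`, `analyst_role`); `tutorial_sql` is a hypothetical helper, and the placeholder password should be replaced before any real use:

```shell
# Hypothetical helper: print the tutorial's SQL statements in order.
tutorial_sql() {
  cat <<'EOF'
CREATE EXTENSION pg_diffix;
ALTER DATABASE test_db SET session_preload_libraries TO 'pg_diffix';
CALL diffix.mark_personal('test_table', 'id');
CREATE USER analyst_role WITH PASSWORD 'some_password';
GRANT CONNECT ON DATABASE test_db TO analyst_role;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO analyst_role;
CALL diffix.mark_role('analyst_role', 'anonymized_trusted');
EOF
}

# Example: stop at the first failing statement instead of continuing
# tutorial_sql | sudo -u postgres psql -v ON_ERROR_STOP=1 test_db
```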
7 changes: 0 additions & 7 deletions docs/analyst_guide.md
@@ -1,9 +1,3 @@
# Important notice

This is a pre-release version of the extension and is not intended for general use yet.
It may be unstable and documentation is limited.
If you have any questions, please contact us at [hello@open-diffix.org](mailto:hello@open-diffix.org).

# Analyst guide

This document describes features and restrictions of `pg_diffix` for users with anonymized access to a database.
@@ -12,7 +6,6 @@ mechanisms that Diffix Elm uses to protect personal data.

## Table of Contents

- [Important notice](#important-notice)
- [Analyst guide](#analyst-guide)
- [Table of Contents](#table-of-contents)
- [Access levels](#access-levels)