From b302b8cecaf796a0bdb2d66d7983ad33024d56fe Mon Sep 17 00:00:00 2001 From: James Sadler Date: Thu, 10 Oct 2024 22:50:49 +1100 Subject: [PATCH 1/6] First pass at `ste_vec` docs for JSONB containment indexing --- README.md | 100 +++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 99 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 2bfe7114..cfb0f5c1 100644 --- a/README.md +++ b/README.md @@ -120,6 +120,7 @@ EQL provides specialized functions to interact with encrypted data: - **`cs_match_v1(val JSONB)`**: Enables basic full-text search. - **`cs_unique_v1(val JSONB)`**: Retrieves the unique index for enforcing uniqueness. - **`cs_ore_v1(val JSONB)`**: Retrieves the Order-Revealing Encryption index for range queries. +- **`cs_ste_vec_v1(val JSONB)`**: Retrieves the Structured Encryption Vector for containment queries. ### 3.3 Index functions @@ -149,6 +150,7 @@ Supported types: - big_int - boolean - date + - jsonb ###### match opts @@ -205,6 +207,92 @@ If you're using n-gram as a token filter, then a token that is already shorter t However, if that same short string only appears as a part of a larger token, then it will not match that record. In general, therefore, you should try to ensure that the string you search for is at least as long as the `tokenLength` of the index, except in the specific case where you know that there are shorter tokens to match, _and_ you are explicitly OK with not returning records that have that short string as part of a larger token. +###### ste_vec opts + +An ste_vec index on a encrypted JSONB column enables the use of Postgres's `@>` and `<@` containment operators. + +An ste_vec index requires one piece of configuration: the `prefix` (a string) which is functionally similar to a salt for the hashing process. 
+ +Within a dataset, encrypted columns indexed using an ste_vec that use different prefixes can never compare as equal and containment queries that manage to mix index terms from multiple columns will never return a positive result. This is by design. + +The index is generated from a JSONB document by first flattening the structure of the document such that a hash can be generated for each unique path prefix to a node. + +For a document like this: + +```json +{ + "account": { + "email": "alice@example.com", + "name": { + "first_name": "Alice", + "last_name": "McCrypto" + }, + "roles": [ + "admin", + "owner" + ] + } +} +``` + +Hashes would be produced from the following list of entries: + +```json +[ + [Obj, Key("account"), Obj, Key("email"), String("alice@example.com")], + [Obj, Key("account"), Obj, Key("name"), Obj, Key("first_name"), String("Alice")], + [Obj, Key("account"), Obj, Key("name"), Obj, Key("last_name"), String("McCrypto")], + [Obj, Key("account"), Obj, Key("roles"), Array, String("admin")], + [Obj, Key("account"), Obj, Key("roles"), Array, String("owner")], +] +``` + +Using the first entry to illustrate how an entry is converted to hashes: + +```json +[Obj, Key("account"), Obj, Key("email"), String("alice@example.com")] +``` + +The hashes would be generated for all prefixes of the full path to the leaf node. + +```json +[ + [Obj], + [Obj, Key("account")], + [Obj, Key("account"), Obj], + [Obj, Key("account"), Obj, Key("email")], + [Obj, Key("account"), Obj, Key("email"), String("alice@example.com")], + // (remaining leaf nodes omitted) +] +``` + +Query terms are processed in the same manner as the input document. 
+ +A query prior to encrypting & indexing looks like a structurally similar subset of the encrypted document, for example: + +```json +{ "account": { "email": "alice@example.com", "roles": "admin" }} +``` + +The expression `cs_ste_vec_v1(encrypted_account) @> cs_ste_vec_v1($query)` would match all records where the `encrypted_account` column contains a JSONB object with an "account" key containing an object with an "email" key where the value is the string "alice@example.com". + +When reduced to a prefix list, it would look like this: + +```json +[ + [Obj], + [Obj, Key("account")], + [Obj, Key("account"), Obj], + [Obj, Key("account"), Obj, Key("email")], + [Obj, Key("account"), Obj, Key("email"), String("alice@example.com")], + [Obj, Key("account"), Obj, Key("roles")], + [Obj, Key("account"), Obj, Key("roles"), Array], + [Obj, Key("account"), Obj, Key("roles"), Array, String("admin")] +] +``` + +This is then turned into an ste_vec of hashes, which can be queried directly against the index. + #### 3.3.2 cs_modify_index ```sql @@ -262,6 +350,15 @@ cs_ore_v1(val jsonb) Extracts an ore index from the `jsonb` value. Returns `null` if no ore index is present. +#### 3.3.5 cs_ste_vec_v1 + +```sql +cs_ste_vec_v1(val jsonb) +``` + +Extracts an ste_vec index from the `jsonb` value. +Returns `null` if no ste_vec index is present. + ### 3.4 Data Format Encrypted data is stored as `jsonb` with a specific schema: @@ -310,7 +407,8 @@ Cipherstash proxy handles the encoding, and EQL provides the functions. | c | Ciphertext | Ciphertext value. Encrypted by proxy. Required if kind is plaintext/pt or encrypting/et. | m.1 | Match index | Ciphertext index value. Encrypted by proxy. | o.1 | ORE index | Ciphertext index value. Encrypted by proxy. -| u.1 | Uniqueindex | Ciphertext index value. Encrypted by proxy. +| u.1 | Unique index | Ciphertext index value. Encrypted by proxy. +| sv.1 | STE vector index | Ciphertext index value. Encrypted by proxy. 
#### 3.4.1 Helper packages From 98692bb462e83e300e74ca05972191f8ff1ab53d Mon Sep 17 00:00:00 2001 From: James Sadler Date: Thu, 10 Oct 2024 23:08:51 +1100 Subject: [PATCH 2/6] Add doc for which JSON types are supported by ste_vec --- README.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/README.md b/README.md index cfb0f5c1..15d47b33 100644 --- a/README.md +++ b/README.md @@ -217,6 +217,14 @@ Within a dataset, encrypted columns indexed using an ste_vec that use different The index is generated from a JSONB document by first flattening the structure of the document such that a hash can be generated for each unique path prefix to a node. +The complete set of JSON types is supported by the indexer. Null values are ignored by the indexer. + +- Object `{ ... }` +- Array `[ ... ]` +- String `"abc"` +- Boolean `true` +- Number `123.45` + For a document like this: ```json From bb99384c2634b95980e1c80e2182ee32dc454399 Mon Sep 17 00:00:00 2001 From: James Sadler Date: Thu, 10 Oct 2024 23:13:59 +1100 Subject: [PATCH 3/6] Minor tweaks --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 15d47b33..e29314fc 100644 --- a/README.md +++ b/README.md @@ -37,6 +37,7 @@ A variety of searchable encryption techniques are available, including: - **Matching** - Equality or partial matches - **Ordering** - comparison operations using order revealing encryption - **Uniqueness** - enforcing unique constraints +- **Containment** - containment queries using structured encryption ### 1.1 What is encryption in use? @@ -138,8 +139,7 @@ cs_add_index(table_name text, column_name text, index_name text, cast_as text, o | column_name | Name of target column | Required | index_name | The index kind | Required. | cast_as | The PostgreSQL type decrypted data will be cast to | Optional. 
Defaults to `text` -| opts | Index options | Optional for `match` indexes (see below) - +| opts | Index options | Optional for `match` indexes, required for `ste_vec` indexes (see below) ###### cast_as From 171b77865ccb5c97dccc467dfaaee5f4ee82201f Mon Sep 17 00:00:00 2001 From: CJ Brewer Date: Thu, 10 Oct 2024 08:24:47 -0600 Subject: [PATCH 4/6] Update README.md Co-authored-by: Lindsay Holmwood --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index e29314fc..e7e66b89 100644 --- a/README.md +++ b/README.md @@ -209,7 +209,7 @@ In general, therefore, you should try to ensure that the string you search for i ###### ste_vec opts -An ste_vec index on a encrypted JSONB column enables the use of Postgres's `@>` and `<@` containment operators. +An ste_vec index on an encrypted JSONB column enables the use of PostgreSQL's `@>` and `<@` [containment operators](https://www.postgresql.org/docs/16/functions-json.html#FUNCTIONS-JSONB-OP-TABLE). An ste_vec index requires one piece of configuration: the `prefix` (a string) which is functionally similar to a salt for the hashing process. From 95924bc3ae7d589e4b97ae332bc70f9e7681854b Mon Sep 17 00:00:00 2001 From: CJ Brewer Date: Thu, 10 Oct 2024 08:24:58 -0600 Subject: [PATCH 5/6] Update README.md Co-authored-by: Lindsay Holmwood --- README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index e7e66b89..e06b7450 100644 --- a/README.md +++ b/README.md @@ -213,7 +213,9 @@ An ste_vec index on an encrypted JSONB column enables the use of PostgreSQL's `@> An ste_vec index requires one piece of configuration: the `prefix` (a string) which is functionally similar to a salt for the hashing process. -Within a dataset, encrypted columns indexed using an ste_vec that use different prefixes can never compare as equal and containment queries that manage to mix index terms from multiple columns will never return a positive result. This is by design. 
+Within a dataset, encrypted columns indexed using an ste_vec that use different prefixes can never compare as equal. +Containment queries that manage to mix index terms from multiple columns will never return a positive result. +This is by design. The index is generated from a JSONB document by first flattening the structure of the document such that a hash can be generated for each unique path prefix to a node. From ff847acae29afbc82054f6fba92fccb70793e88f Mon Sep 17 00:00:00 2001 From: CJ Brewer Date: Thu, 10 Oct 2024 08:25:11 -0600 Subject: [PATCH 6/6] Update README.md Co-authored-by: Lindsay Holmwood --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index e06b7450..3db8a3ea 100644 --- a/README.md +++ b/README.md @@ -219,7 +219,8 @@ This is by design. The index is generated from a JSONB document by first flattening the structure of the document such that a hash can be generated for each unique path prefix to a node. -The complete set of JSON types is supported by the indexer. Null values are ignored by the indexer. +The complete set of JSON types is supported by the indexer. +Null values are ignored by the indexer. - Object `{ ... }` - Array `[ ... ]`