Skip to content
Merged
4 changes: 4 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -215,5 +215,9 @@ eql--*.sql
# Generated SQLx migration (built from src/, never commit)
tests/sqlx/migrations/001_install_eql.sql

# Large generated test data files
tests/ste_vec_vast.sql
tests/ste_vec_*M.sql*

# Rust build artifacts (using sccache)
tests/sqlx/target/
86 changes: 85 additions & 1 deletion docs/reference/database-indexes.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ EQL supports PostgreSQL B-tree indexes on `eql_v2_encrypted` columns to improve
- [Query Patterns That Don't Use Indexes](#query-patterns-that-dont-use-indexes)
- [Index Limitations](#index-limitations)
- [Best Practices](#best-practices)
- [GIN Indexes for JSONB Containment](#gin-indexes-for-jsonb-containment)

---

Expand Down Expand Up @@ -217,7 +218,7 @@ B-tree indexes **only work** with:

They **do not work** with:
- `bf` (bloom_filter) - pattern matching
- `sv` (ste_vec) - JSONB containment
- Data with `sv` field (ste_vec) - JSONB containment uses GIN indexes instead (see [GIN Indexes](#gin-indexes-for-jsonb-containment))
- Data without any index terms

### 2. Index Creation Timing
Expand Down Expand Up @@ -326,6 +327,89 @@ DROP INDEX IF EXISTS idx_users_encrypted_email;

---

## GIN Indexes for JSONB Containment

While B-tree indexes don't support `ste_vec` (JSONB containment), you can use PostgreSQL GIN indexes for efficient containment queries on encrypted JSONB columns.

### When to Use GIN Indexes

Use GIN indexes when:
- You need to perform JSONB containment queries (`@>`, `<@`)
- The table has a significant number of rows (500+ recommended)
- Query performance on containment operations is important

### Creating a GIN Index

Create a GIN index using the `jsonb_array()` function, which extracts the encrypted JSONB as a native `jsonb[]` array:

```sql
CREATE INDEX idx_encrypted_jsonb_gin
ON table_name USING GIN (eql_v2.jsonb_array(encrypted_column));

ANALYZE table_name;
```

**Important:** Always run `ANALYZE` after creating the index so PostgreSQL's query planner has accurate statistics.

### Query Patterns for GIN Indexes

There are two approaches to write containment queries that use GIN indexes:

#### Approach 1: Using jsonb_array() Function

Convert both sides to `jsonb[]` and use the native containment operator:

```sql
SELECT * FROM table_name
WHERE eql_v2.jsonb_array(encrypted_column) @>
eql_v2.jsonb_array($1::eql_v2_encrypted);
```

#### Approach 2: Using Helper Function

Use the convenience function which handles the conversion internally:

```sql
SELECT * FROM table_name
WHERE eql_v2.jsonb_contains(encrypted_column, $1::eql_v2_encrypted);
```

Both approaches produce the same result and use the GIN index.

### Verifying Index Usage

Use `EXPLAIN` to verify the GIN index is being used:

```sql
EXPLAIN SELECT * FROM table_name
WHERE eql_v2.jsonb_array(encrypted_column) @>
eql_v2.jsonb_array($1::eql_v2_encrypted);
```

**Expected output:**
```
Bitmap Heap Scan on table_name
Recheck Cond: (jsonb_array(encrypted_column) @> jsonb_array(...))
-> Bitmap Index Scan on idx_encrypted_jsonb_gin
Index Cond: (jsonb_array(encrypted_column) @> jsonb_array(...))
```

If you see `Seq Scan`, ensure:
1. The index exists
2. `ANALYZE` has been run
3. The table has enough rows (PostgreSQL may choose sequential scan for very small tables)

### GIN vs B-tree Index Comparison

| Feature | B-tree Index | GIN Index |
|---------|-------------|-----------|
| **Use case** | Equality, range queries | JSONB containment |
| **Index terms** | `hm`, `b3`, `ob` | `sv` (via jsonb_array) |
| **Operators** | `=`, `<`, `>`, `<=`, `>=` | `@>`, `<@` |
| **Function** | Direct column reference | `eql_v2.jsonb_array()` |

---

## Troubleshooting

### Index Not Being Used
Expand Down
76 changes: 60 additions & 16 deletions docs/reference/index-config.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,14 +5,14 @@
> If you are using Protect.js, see the [Protect.js schema](https://github.com/cipherstash/protectjs/blob/main/docs/reference/schema.md).

The following functions allow you to configure indexes for encrypted columns.
All these functions modify the `eql_v2_configuration` table in your database, and are added during the EQL installation.
All these functions modify the `public.eql_v2_configuration` table in your database, and are added during the EQL installation.

> **IMPORTANT:** When you modify or add search configuration index, you must re-encrypt data that's already been stored in the database.
> The CipherStash encryption solution will encrypt the data based on the current state of the configuration.

### Configuring search (`eql_v2.add_search_config`)

Add an index to an encrypted column.
Add an index to an encrypted column. Returns the updated configuration as JSONB.

```sql
SELECT eql_v2.add_search_config(
Expand All @@ -31,6 +31,7 @@ SELECT eql_v2.add_search_config(
| `index_name` | The index kind | Required |
| `cast_as` | The PostgreSQL type decrypted data will be cast to | Optional. Defaults to `text` |
| `opts` | Index options | Optional for `match` indexes, required for `ste_vec` indexes (see below) |
| `migrating` | Skip auto-migration if true | Optional. Defaults to `false`. Set to `true` for batch operations |

#### Option (`cast_as`)

Expand Down Expand Up @@ -60,33 +61,33 @@ The default match index options are:
"tokenizer": {
"kind": "ngram",
"token_length": 3
}
"token_filters": {
"kind": "downcase"
}
},
"token_filters": [
{"kind": "downcase"}
]
}
```

- `tokenFilters`: a list of filters to apply to normalize tokens before indexing.
- `token_filters`: a list of filters to apply to normalize tokens before indexing.
- `tokenizer`: determines how input text is split into tokens.
- `m`: The size of the backing [bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) in bits. Defaults to `2048`.
- `bf`: The size of the backing [bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) in bits. Defaults to `2048`.
- `k`: The maximum number of bits set in the bloom filter per term. Defaults to `6`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When reading about Bloom filters, k is usually described as the number of hash functions (each of which will produce a hash that maps to one bit).

Not a hill I'll die on, a minor nit really.

Might not hurt to link to a Bloom filter calculator though, so that users can set the config to appropriate values.

e.g. https://di-mgt.com.au/bloom-calculator.html


**Token filters**

There are currently only two token filters available: `downcase` and `upcase`. These are used to normalise the text before indexing and are also applied to query terms. An empty array can also be passed to `tokenFilters` if no normalisation of terms is required.
The `downcase` token filter is available to normalise text before indexing and is also applied to query terms. An empty array can also be passed to `token_filters` if no normalisation of terms is required.

**Tokenizer**

There are two `tokenizer`s provided: `standard` and `ngram`.
`standard` simply splits text into tokens using this regular expression: `/[ ,;:!]/`.
`ngram` splits the text into n-grams and accepts a configuration object that allows you to specify the `tokenLength`.

**m** and **k**
**bf** and **k**

`k` and `m` are optional fields for configuring [bloom filters](https://en.wikipedia.org/wiki/Bloom_filter) that back full text search.
`k` and `bf` are optional fields for configuring [bloom filters](https://en.wikipedia.org/wiki/Bloom_filter) that back full text search.

`m` is the size of the bloom filter in bits. `filterSize` must be a power of 2 between `32` and `65536` and defaults to `2048`.
`bf` is the size of the bloom filter in bits. It must be a power of 2 between `32` and `65536` and defaults to `2048`.

`k` is the number of hash functions to use per term.
This determines the maximum number of bits that will be set in the bloom filter per term.
Expand All @@ -103,7 +104,9 @@ Try to ensure that the string you search for is at least as long as the `tokenLe

#### Options for ste_vec indexes (`opts`)

An ste_vec index on a encrypted JSONB column enables the use of PostgreSQL's `@>` and `<@` [containment operators](https://www.postgresql.org/docs/16/functions-json.html#FUNCTIONS-JSONB-OP-TABLE).
An ste_vec index on an encrypted JSONB column enables the use of PostgreSQL's `@>` and `<@` [containment operators](https://www.postgresql.org/docs/16/functions-json.html#FUNCTIONS-JSONB-OP-TABLE).

> **Note:** The `@>` and `<@` operators work directly on `eql_v2_encrypted` types, allowing simple query syntax like `encrypted_col @> search_term`.

An ste_vec index requires one piece of configuration: the `prefix` (a string) which is passed as an info string to a MAC (Message Authenticated Code).
This ensures that all of the encrypted values are unique to that prefix.
Expand Down Expand Up @@ -204,7 +207,7 @@ A query prior to encrypting and indexing looks like a structurally similar subse
}
```

The expression `cs_ste_vec_v2(encrypted_account) @> cs_ste_vec_v2($query)` would match all records where the `encrypted_account` column contains a JSONB object with an "account" key containing an object with an "email" key where the value is the string "alice@example.com".
The expression `encrypted_account @> $query` would match all records where the `encrypted_account` column contains a JSONB object with an "account" key containing an object with an "email" key where the value is the string "alice@example.com".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An example of an equivalent plaintext Postgres query would be illustrative.


When reduced to a prefix list, it would look like this:

Expand All @@ -224,9 +227,26 @@ When reduced to a prefix list, it would look like this:

Which is then turned into an ste_vec of hashes which can be directly queries against the index.

#### GIN indexing for ste_vec

For efficient containment queries on large tables, you can create a GIN index using the `eql_v2.jsonb_array()` function:

```sql
-- Create GIN index for containment queries
CREATE INDEX idx_encrypted_jsonb ON mytable USING GIN (eql_v2.jsonb_array(encrypted_col));

-- Query using containment (will use the GIN index)
SELECT * FROM mytable WHERE encrypted_col @> $1::eql_v2_encrypted;
```

The following helper functions are available for GIN-indexed containment queries:
- `eql_v2.jsonb_array(val)` - Extracts encrypted JSONB as an array for GIN indexing
- `eql_v2.jsonb_contains(a, b)` - GIN-indexable containment check (`a @> b`)
- `eql_v2.jsonb_contained_by(a, b)` - GIN-indexable "is contained by" check (`a <@ b`)
Comment on lines +243 to +245
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to review how the mapper handles the transformation and ensure that it invokes the correct EQL functions.


### Modifying an index (`eql_v2.modify_search_config`)

Modifies an existing index configuration.
Modifies an existing index configuration. Returns the updated configuration as JSONB.
Accepts the same parameters as `eql_v2.add_search_config`

```sql
Expand All @@ -240,9 +260,22 @@ SELECT eql_v2.modify_search_config(
);
```

**Example:**

```sql
-- Update match index options to increase bloom filter size
SELECT eql_v2.modify_search_config(
'users',
'email',
'match',
'text',
'{"bf": 4096, "k": 8}'::jsonb
);
```

### Removing an index (`eql_v2.remove_search_config`)

Removes an index configuration from the column.
Removes an index configuration from the column. Returns the updated configuration as JSONB.

```sql
SELECT eql_v2.remove_search_config(
Expand All @@ -253,6 +286,17 @@ SELECT eql_v2.remove_search_config(
);
```

**Example:**

```sql
-- Remove the match index from the email column
SELECT eql_v2.remove_search_config(
'users',
'email',
'match'
);
```

---

### Didn't find what you wanted?
Expand Down
44 changes: 43 additions & 1 deletion docs/reference/json-support.md
Original file line number Diff line number Diff line change
Expand Up @@ -102,6 +102,24 @@ WHERE jsonb_column @> '{"top":{"nested":["a"]}}';

**Note:** The `@>` operator checks if the left value contains the right value. The `<@` operator checks the reverse (if left is contained in right).

#### Indexed Containment Queries

For better performance on large tables, create a GIN index and use the `jsonb_array()` function:

```sql
-- Create GIN index
CREATE INDEX idx_encrypted_jsonb_gin
ON examples USING GIN (eql_v2.jsonb_array(encrypted_json));
ANALYZE examples;

-- Query using the GIN index
SELECT * FROM examples
WHERE eql_v2.jsonb_array(encrypted_json) @>
eql_v2.jsonb_array($1::eql_v2_encrypted);
```

See [GIN Indexes for JSONB Containment](./database-indexes.md#gin-indexes-for-jsonb-containment) for complete setup instructions.

### Field extraction (`jsonb_path_query`)

Extract fields from encrypted JSONB using selector hashes. Selectors are generated during encryption and identify specific JSON paths.
Expand Down Expand Up @@ -138,10 +156,15 @@ Use standard PostgreSQL JSON operators on encrypted columns:
-- Extract field by selector (returns eql_v2_encrypted)
SELECT encrypted_json->'selector_hash' FROM examples;

-- Extract field as text (returns ciphertext)
-- Extract field as text (returns encrypted value as text)
SELECT encrypted_json->>'selector_hash' FROM examples;

-- Extract array element by index (0-based, returns eql_v2_encrypted)
SELECT encrypted_array->0 FROM examples;
```

**Note:** The `->` operator supports integer array indexing (e.g., `encrypted_array->0`), but the `->>` operator does not. Use `->` to access array elements by index.

### Array operations

EQL supports array operations on encrypted JSONB arrays:
Expand Down Expand Up @@ -200,6 +223,9 @@ GROUP BY eql_v2.jsonb_path_query_first(encrypted_json, 'color_selector');

### Core Functions

- **`eql_v2.ste_vec(val jsonb) RETURNS eql_v2_encrypted[]`**
- Extracts the ste_vec index array from a JSONB payload

- **`eql_v2.ste_vec(val eql_v2_encrypted) RETURNS eql_v2_encrypted[]`**
- Extracts the ste_vec index array from an encrypted value

Expand Down Expand Up @@ -243,6 +269,22 @@ GROUP BY eql_v2.jsonb_path_query_first(encrypted_json, 'color_selector');
- **`eql_v2.selector(val eql_v2_encrypted) RETURNS text`**
- Extracts the selector hash from an encrypted value

### GIN-Indexable Functions

These functions enable efficient GIN-indexed containment queries. See [GIN Indexes for JSONB Containment](./database-indexes.md#gin-indexes-for-jsonb-containment) for index setup.

- **`eql_v2.jsonb_array(val eql_v2_encrypted) RETURNS jsonb[]`**
- Extracts encrypted JSONB as native PostgreSQL jsonb array for GIN indexing
- Create GIN indexes on this function for indexed containment queries

- **`eql_v2.jsonb_contains(a eql_v2_encrypted, b eql_v2_encrypted) RETURNS boolean`**
- GIN-indexed containment check: returns true if a contains b
- Alternative to `jsonb_array(a) @> jsonb_array(b)`

- **`eql_v2.jsonb_contained_by(a eql_v2_encrypted, b eql_v2_encrypted) RETURNS boolean`**
- GIN-indexed reverse containment: returns true if a is contained by b
- Alternative to `jsonb_array(a) <@ jsonb_array(b)`

### Aggregate Functions

- **`eql_v2.grouped_value(jsonb) RETURNS jsonb`**
Expand Down
Loading