fix: persist GSI queue to PostgreSQL for crash safety by LeeroyHannigan · Pull Request #128 · ExtendDB/extenddb

LeeroyHannigan · 2026-05-25T16:06:39Z

What

Replaces the in-memory VecDeque GSI propagation queue with a PostgreSQL-backed persistent queue (gsi_pending table). Pending GSI updates are now inserted within the same transaction as the base table write and processed by workers that claim ready rows using DELETE ... RETURNING with FOR UPDATE SKIP LOCKED.

Key changes:

New gsi_pending table in the data database (migration 002_gsi_pending.sql)
enqueue_gsi_pending() inserts inside the base write transaction, so there is no crash window between the write and the enqueue
Propagation delay is enforced by a ready_at timestamp, not by sleeping inside transactions
Workers never hold connections idle; they only touch rows whose delay has elapsed
Index metadata cached per table_id with 30s TTL to avoid repeated catalog queries
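
To make the ready_at behaviour above concrete, here is a minimal in-memory sketch of the queue semantics. It is not the PR's code: `PendingQueue`, `enqueue`, and `claim_ready` are hypothetical names, a `BTreeMap` stands in for the gsi_pending table, and time is modelled as a `Duration` offset rather than a PostgreSQL timestamp. It only illustrates that a row becomes claimable once its delay has elapsed and that claiming removes it, as DELETE ... RETURNING would.

```rust
use std::collections::BTreeMap;
use std::time::Duration;

/// Hypothetical in-memory stand-in for the gsi_pending table.
/// Maps row id -> ready_at, with time expressed as an offset from an arbitrary epoch.
#[derive(Default)]
pub struct PendingQueue {
    next_id: u64,
    rows: BTreeMap<u64, Duration>,
}

impl PendingQueue {
    /// Models the INSERT performed inside the base write transaction:
    /// the row is stored with ready_at = now + delay.
    pub fn enqueue(&mut self, now: Duration, delay: Duration) -> u64 {
        self.next_id += 1;
        self.rows.insert(self.next_id, now + delay);
        self.next_id
    }

    /// Models the claim step: removes and returns up to `limit` rows whose
    /// ready_at has passed, lowest id first (BTreeMap iterates in key order).
    pub fn claim_ready(&mut self, now: Duration, limit: usize) -> Vec<u64> {
        let ids: Vec<u64> = self
            .rows
            .iter()
            .filter(|(_, ready_at)| **ready_at <= now)
            .map(|(id, _)| *id)
            .take(limit)
            .collect();
        for id in &ids {
            self.rows.remove(id);
        }
        ids
    }
}
```

Because the delay lives in the data rather than in a sleeping worker, a worker that finds nothing ready can simply return and poll again later without holding anything open.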

Why

Closes #125

The previous in-memory queue lost all pending GSI updates on process crash or restart, causing permanent GSI inconsistency with no recovery path. The only workaround was re-touching every item in the base table, which at scale is effectively data loss.

Testing done

cargo fmt --all -- --check — clean
cargo clippy --all-targets -- -D warnings — clean
cargo test --workspace — 375 tests pass
Verified gsi_pending table schema created correctly via \d gsi_pending
Existing test_gsi_async.py validates propagation delay behavior end-to-end

Checklist

I have read CONTRIBUTING.md
All tests pass (cargo test --workspace)
Code is formatted (cargo fmt --check)
Clippy is clean (cargo clippy -- -W clippy::pedantic)
I have added or updated tests for new functionality
I have updated documentation if behavior changed
Breaking changes are noted below (if any)

Breaking changes

None. The gsi_pending table is created automatically via the data migration on first init or server startup. Existing deployments gain crash safety transparently.

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache License 2.0 and I agree to the Developer Certificate of
Origin (DCO). See CONTRIBUTING.md for details.

jcshepherd · 2026-05-26T22:33:06Z

+             SELECT id FROM gsi_pending \
+             WHERE ready_at <= NOW() \
+             ORDER BY id \
+             LIMIT $1 \


Is LIMIT applied before or after ORDER BY?

I did a bit of reading ... It looks like Postgres will sort the entire result set first, and then apply the limit. So if many rows have a "ready_at" in the past on a busy table, the sort could be expensive, though the index on ready_at should help mitigate that. My guess is that BATCH_SIZE is small enough to discourage the planner from believing that a full table scan would be cheaper than an index scan.
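
For reference, SQL applies WHERE, then ORDER BY, then LIMIT, so the query returns the lowest ready ids rather than an arbitrary batch that is sorted afterwards. The sketch below models both orders of evaluation on a plain slice to show how they differ. `Row`, `select_ready`, and `limit_then_sort` are hypothetical names, and the slice order stands in for whatever physical order the rows happen to be scanned in; it says nothing about how the PostgreSQL planner executes the query.

```rust
/// Hypothetical row shape: only the two columns the query touches.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Row {
    pub id: u64,
    pub ready_at: u64,
}

/// SQL semantics: filter (WHERE), then sort (ORDER BY id), then cut (LIMIT).
pub fn select_ready(rows: &[Row], now: u64, limit: usize) -> Vec<u64> {
    let mut ids: Vec<u64> = rows
        .iter()
        .filter(|r| r.ready_at <= now)
        .map(|r| r.id)
        .collect();
    ids.sort_unstable();
    ids.truncate(limit);
    ids
}

/// The order SQL does NOT use: cut first, then sort whatever was kept.
/// Shown only for contrast.
pub fn limit_then_sort(rows: &[Row], now: u64, limit: usize) -> Vec<u64> {
    let mut ids: Vec<u64> = rows
        .iter()
        .filter(|r| r.ready_at <= now)
        .map(|r| r.id)
        .take(limit)
        .collect();
    ids.sort_unstable();
    ids
}
```

With ready rows scanned in the order 9, 3, 7, 1 and a limit of 2, `select_ready` yields ids 1 and 3, while `limit_then_sort` yields 3 and 9. Only the first matches what ORDER BY id LIMIT $1 is specified to return.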

jcshepherd · 2026-05-26T22:44:33Z

-                        pk_hash(pk_text.as_ref()),
-                        &key_info.account_id,
-                        &key_info.table_name,
+                if has_async_indexes(&indexes, sys_delay) {


This could probably be called once at the beginning (right after indexes is populated).
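
A rough sketch of the hoisting suggested here, under heavy assumptions: the PR's real `has_async_indexes` signature and index type are not shown in this thread, so `IndexMeta`, the body of `has_async_indexes`, and `keys_to_enqueue` below are all hypothetical stand-ins. The only point is that the check depends on `indexes` and `sys_delay`, not on the individual item, so it can be evaluated once before the loop.

```rust
use std::time::Duration;

/// Hypothetical index descriptor; the real type in the PR is not shown.
pub struct IndexMeta {
    pub name: String,
    /// Per-index delay override; None means "use the system delay".
    pub delay: Option<Duration>,
}

/// Hypothetical stand-in for the PR's has_async_indexes: true if any index
/// has a non-zero effective propagation delay.
pub fn has_async_indexes(indexes: &[IndexMeta], sys_delay: Duration) -> bool {
    indexes
        .iter()
        .any(|ix| ix.delay.unwrap_or(sys_delay) > Duration::ZERO)
}

/// Returns the keys that would need a queue entry. The check runs once,
/// right after `indexes` is available, instead of once per key.
pub fn keys_to_enqueue<'a>(
    indexes: &[IndexMeta],
    sys_delay: Duration,
    keys: &[&'a str],
) -> Vec<&'a str> {
    let needs_queue = has_async_indexes(indexes, sys_delay);
    let mut out = Vec::new();
    for key in keys {
        if needs_queue {
            out.push(*key);
        }
    }
    out
}
```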

jcshepherd · 2026-05-26T22:47:59Z

+-- Inserted atomically within the base write transaction, consumed by
+-- background workers. Survives process crash/restart.
+
+CREATE TABLE IF NOT EXISTS gsi_pending (


I don't recall where it's buried, but somewhere there is a catalog version identifier that should be updated when the metadata/system table schema is updated, so that extenddb migrate will know that there is a migration to perform. I believe it's in storage-postgres somewhere.

jcshepherd

Couple initial questions/comments. Probably the one I'm most concerned with is the evaluation order of LIMIT and ORDER BY. If I were a gambler, I'd wager results are LIMITed before ORDER BY, which may not give you the ordering guarantees you want.

fix: persist GSI queue to PostgreSQL for crash safety

2d67591

Closes #125

jcshepherd reviewed May 26, 2026

LeeroyHannigan wants to merge 1 commit into main from fix/persistent-gsi-queue