Skip to content

fix: add index on GSI base key columns for efficient propagation deletes#127

Open
LeeroyHannigan wants to merge 1 commit into
mainfrom
fix/gsi-base-key-index
Open

fix: add index on GSI base key columns for efficient propagation deletes#127
LeeroyHannigan wants to merge 1 commit into
mainfrom
fix/gsi-base-key-index

Conversation

@LeeroyHannigan
Copy link
Copy Markdown
Collaborator

@LeeroyHannigan LeeroyHannigan commented May 25, 2026

What

Add a B-tree index on (base_pk, base_sk_*) to every GSI/LSI table, both at creation time and retroactively on server startup for existing tables.

Why

Closes #115, Closes #124

When an item with GSI keys is updated, ExtendDB deletes the old GSI row using WHERE base_pk = $1 AND base_sk_s = $2. The existing indexes on GSI tables lead with the GSI partition key (pk), so PostgreSQL cannot use them for this lookup, it falls back to a sequential scan of the entire GSI table. At 20GB this produces 730ms median latency; at 200GB it times out entirely.

Testing done

  • cargo fmt --all -- --check — clean
  • cargo clippy --all-targets -- -D warnings — clean
  • cargo test --workspace — 375 tests pass
  • Verified DDL path: created table with GSI, confirmed ddb_base_key_idx index on (base_pk, base_sk_s) via pg_indexes
  • Verified startup migration path: dropped the index manually, restarted server, confirmed ensure_gsi_base_key_indexes recreated it using CREATE INDEX CONCURRENTLY IF NOT EXISTS

Checklist

  • I have read CONTRIBUTING.md
  • All tests pass (cargo test --workspace)
  • Code is formatted (cargo fmt --check)
  • Clippy is clean (cargo clippy -- -W clippy::pedantic)
  • I have added or updated tests for new functionality
  • I have updated documentation if behavior changed
  • Breaking changes are noted below (if any)

Breaking changes

None. The change is purely additive, an extra index on existing columns. No schema, wire protocol, or behavioral changes.


By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache License 2.0 and I agree to the Developer Certificate of
Origin (DCO). See CONTRIBUTING.md for details.

@parsnips
Copy link
Copy Markdown

this resolves #115

Copy link
Copy Markdown
Collaborator

@jcshepherd jcshepherd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with the secondary index approach in ddl,rs. Happy to debate further if other folks have concerns or alternate proposals.

I actually think the auto-upgrade in lib.rs opens a can of worms. As a one-off for a 0.11 release: fine, whatever. I don't know that we want this lurking in the code forever though. And I suspect it's not going to be the last time we need to change something about the schema of a user-data table.

Should we be looking at a way to migrate user-data tables to new schema, just like we have for metadata tables? That way the user can pick when they want the migration to happen (because they perform it using the CLI), and also more easily see what is going to change.

Any thoughts trying that approach?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Major Performance degradation with 20GB dataset. Timeouts with 200GB dataset. [Bug] GSI missing index for efficient propagation

3 participants