fix: add index on GSI base key columns for efficient propagation deletes #127
LeeroyHannigan wants to merge 1 commit into
This resolves #115.
jcshepherd left a comment
I'm okay with the secondary index approach in ddl.rs. Happy to debate further if other folks have concerns or alternate proposals.
I actually think the auto-upgrade in lib.rs opens a can of worms. As a one-off for a 0.11 release: fine, whatever. I don't know that we want this lurking in the code forever though. And I suspect it's not going to be the last time we need to change something about the schema of a user-data table.
Should we be looking at a way to migrate user-data tables to new schema, just like we have for metadata tables? That way the user can pick when they want the migration to happen (because they perform it using the CLI), and also more easily see what is going to change.
Any thoughts on trying that approach?
What
Add a B-tree index on (base_pk, base_sk_*) to every GSI/LSI table, both at creation time and retroactively on server startup for existing tables.
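For illustration only, here is a minimal sketch of how the index DDL could be generated. It is not the code in this PR: the function names, the index naming scheme, and the `gsi_table` and `base_sk_cols` parameters are hypothetical. The one grounded assumption is that PostgreSQL's `CREATE INDEX IF NOT EXISTS` lets the same statement run at table creation and again on startup without failing.

```rust
/// Quote a PostgreSQL identifier, doubling any embedded double quotes.
fn quote_ident(ident: &str) -> String {
    format!("\"{}\"", ident.replace('"', "\"\""))
}

/// Build DDL for a B-tree index on a GSI/LSI table's base-key columns.
///
/// Hypothetical helper: `gsi_table` is the GSI/LSI table name and
/// `base_sk_cols` lists whichever `base_sk_*` columns that table has
/// (for example `base_sk_s`). The `_base_key_idx` suffix is illustrative.
fn base_key_index_ddl(gsi_table: &str, base_sk_cols: &[&str]) -> String {
    // base_pk leads so that `WHERE base_pk = $1 AND base_sk_s = $2`
    // matches a prefix of the index key.
    let mut cols: Vec<&str> = vec!["base_pk"];
    cols.extend_from_slice(base_sk_cols);

    let quoted: Vec<String> = cols.iter().map(|c| quote_ident(c)).collect();
    let index_name = format!("{}_base_key_idx", gsi_table);

    // IF NOT EXISTS makes the statement a no-op when the index is present.
    format!(
        "CREATE INDEX IF NOT EXISTS {} ON {} USING btree ({})",
        quote_ident(&index_name),
        quote_ident(gsi_table),
        quoted.join(", ")
    )
}
```

Calling `base_key_index_ddl("orders_gsi1", &["base_sk_s"])` with a made-up table name yields a single `CREATE INDEX IF NOT EXISTS` statement over `("base_pk", "base_sk_s")`.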
Why
Closes #115, Closes #124
When an item with GSI keys is updated, ExtendDB deletes the old GSI row using WHERE base_pk = $1 AND base_sk_s = $2. The existing indexes on GSI tables lead with the GSI partition key (pk), so PostgreSQL cannot use them for this lookup and falls back to a sequential scan of the entire GSI table. At 20GB this produces 730ms median latency; at 200GB it times out entirely.
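The leading-column problem can be shown with a toy model. This sketch uses Rust's `BTreeSet` as a stand-in for a B-tree index; it is not PostgreSQL's planner or ExtendDB code, and the row values and function names are made up. An index ordered by `(pk, base_pk, base_sk)` has to be walked in full to find a base key, while one ordered by `(base_pk, base_sk, pk)` can seek straight to the matching prefix.

```rust
use std::collections::BTreeSet;

/// A three-column index entry; column order depends on the index.
type Key = (String, String, String);

/// Build both orderings from toy rows given as (pk, base_pk, base_sk).
fn build_indexes(rows: &[(&str, &str, &str)]) -> (BTreeSet<Key>, BTreeSet<Key>) {
    let mut pk_led = BTreeSet::new();
    let mut base_led = BTreeSet::new();
    for (pk, bpk, bsk) in rows {
        pk_led.insert((pk.to_string(), bpk.to_string(), bsk.to_string()));
        base_led.insert((bpk.to_string(), bsk.to_string(), pk.to_string()));
    }
    (pk_led, base_led)
}

/// Index ordered by (pk, base_pk, base_sk): pk is unknown, so every
/// entry is examined. Returns (matches, entries_examined).
fn lookup_pk_led(index: &BTreeSet<Key>, base_pk: &str, base_sk: &str) -> (usize, usize) {
    let (mut matches, mut examined) = (0, 0);
    for (_pk, bpk, bsk) in index {
        examined += 1;
        if bpk == base_pk && bsk == base_sk {
            matches += 1;
        }
    }
    (matches, examined)
}

/// Index ordered by (base_pk, base_sk, pk): seek to the first entry with
/// the wanted prefix and stop at the first entry past it.
fn lookup_base_led(index: &BTreeSet<Key>, base_pk: &str, base_sk: &str) -> (usize, usize) {
    // The empty string sorts before every other string, so this is the
    // lower bound of the (base_pk, base_sk) prefix.
    let start = (base_pk.to_string(), base_sk.to_string(), String::new());
    let (mut matches, mut examined) = (0, 0);
    for (bpk, bsk, _pk) in index.range(start..) {
        examined += 1;
        if bpk != base_pk || bsk != base_sk {
            break;
        }
        matches += 1;
    }
    (matches, examined)
}
```

In the toy model, the work done by the pk-led lookup grows with the size of the index, while the base-led lookup touches only the matching entries plus one. That is the same shape as the sequential-scan behavior described above.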
Testing done
Checklist
Breaking changes
None. The change is purely additive: an extra index on existing columns. No schema, wire protocol, or behavioral changes.
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache License 2.0 and I agree to the Developer Certificate of
Origin (DCO). See CONTRIBUTING.md for details.