fix: Use consistent snapshot for ExportTableToPointInTime #80
ExportTableToPointInTime previously ran a paginated scan loop without a transaction, so items written or deleted between pages could alter the final export, and the resulting file was not a consistent snapshot of the table. With this change, the full table scan in ExportTableToPointInTime runs inside a REPEATABLE READ READ ONLY transaction, guaranteeing that every page reflects the table state at the time of the first read. Closes #57
jcshepherd left a comment:
My overarching concern with this is that on a large table (e.g. 1M or 1B items) with a lot of concurrent write activity, maintaining read isolation for a full table scan within a single transaction may consume a large amount of storage for item versions created after the transaction starts, and vacuum won't be able to reclaim that space until the transaction ends. On a large, hot table that an export is running on, this could mean many GBs of additional storage. It looks like the item count check happens outside the transaction (correct me if I'm wrong), so even with an export limit, the export still does a full table scan. A partial mitigation would be to move the item count limit check inside the transaction and end the operation early once the limit is reached. Is that feasible?
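The concern above can be put in rough numbers with a simplified model of a vacuum horizon, assuming the backing store is PostgreSQL (which the mention of vacuum suggests). In this sketch a dead row version is removable only if it died at or before the oldest snapshot that is still open. The tick counter, `removable`, and `retained_dead_versions` are hypothetical illustrations, not project code, and real PostgreSQL tracks transaction IDs rather than ticks.

```rust
/// Simplified vacuum horizon: a dead version is removable only if it became
/// dead at or before the oldest snapshot that is still open. Ticks are a
/// hypothetical logical clock, not PostgreSQL transaction IDs.
fn removable(dead_at: u64, oldest_open_snapshot: Option<u64>) -> bool {
    oldest_open_snapshot.map_or(true, |snapshot| dead_at <= snapshot)
}

/// Counts the dead versions a vacuum pass at tick `now` must keep, assuming
/// every tick from 1 to `now` commits `updates_per_tick` single-row updates
/// (each leaving one dead version behind) while an export snapshot taken at
/// tick `snapshot_at` is either still open or already released.
fn retained_dead_versions(
    snapshot_at: u64,
    now: u64,
    updates_per_tick: u64,
    snapshot_open: bool,
) -> u64 {
    // No open snapshot means no horizon holding dead versions back.
    let horizon = snapshot_open.then_some(snapshot_at);
    (1..=now)
        .filter(|&tick| !removable(tick, horizon))
        .map(|_| updates_per_tick)
        .sum()
}
```

With a snapshot taken at tick 10 and still open at tick 1010, at 1,000 updates per tick, `retained_dead_versions(10, 1010, 1000, true)` is 1,000,000 versions that vacuum must keep. If the scan instead ends at tick 110, a vacuum pass at that point must keep 100,000, and once the snapshot is released (`snapshot_open = false`) nothing is pinned. That shrinking window is what the early-exit mitigation buys.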
… count check inside snapshot transaction
Yeah, that's a good call. The latest commit moves the item count check into the on_page callback. Now, when the count limit is exceeded, the callback returns a validation error, which short-circuits scan_full_table_snapshot_impl without committing and releases the REPEATABLE READ snapshot so vacuum can resume reclaiming space.
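The early exit relies on error propagation: if the callback returns an error, the scan returns before committing, and the transaction is rolled back when its handle is dropped (assuming the handle rolls back on drop, as transaction types in crates such as sqlx do). Below is a minimal, std-only sketch of that control flow. The signatures of `scan_full_table_snapshot_impl` and `on_page`, the `ExportError` type, and the `SnapshotTxn` guard (which only records the SQL it would issue) are hypothetical stand-ins, not the project's actual code.

```rust
/// Hypothetical error type; the project's real validation error may differ.
#[derive(Debug, PartialEq)]
enum ExportError {
    Validation(String),
}

/// Stand-in for a database transaction handle that only records the SQL it
/// would issue. Dropping it without `commit` models a rollback, which is
/// what releases the snapshot.
struct SnapshotTxn<'a> {
    log: &'a mut Vec<&'static str>,
    committed: bool,
}

impl<'a> SnapshotTxn<'a> {
    fn begin(log: &'a mut Vec<&'static str>) -> Self {
        log.push("BEGIN ISOLATION LEVEL REPEATABLE READ, READ ONLY");
        Self { log, committed: false }
    }

    fn commit(mut self) {
        self.log.push("COMMIT");
        self.committed = true; // Drop runs next and sees the commit.
    }
}

impl Drop for SnapshotTxn<'_> {
    fn drop(&mut self) {
        if !self.committed {
            self.log.push("ROLLBACK");
        }
    }
}

/// Hypothetical shape of `scan_full_table_snapshot_impl`: each page goes to
/// `on_page`, and `?` returns early on the first error, so `commit` is never
/// reached and the guard rolls back when it is dropped.
fn scan_full_table_snapshot_impl<F>(
    pages: &[Vec<u32>],
    log: &mut Vec<&'static str>,
    mut on_page: F,
) -> Result<(), ExportError>
where
    F: FnMut(&[u32]) -> Result<(), ExportError>,
{
    let txn = SnapshotTxn::begin(log);
    for page in pages {
        on_page(page.as_slice())?;
    }
    txn.commit();
    Ok(())
}

/// Collects items, rejecting the export as soon as `max_items` is exceeded
/// instead of counting the whole table first.
fn export_with_limit(
    pages: &[Vec<u32>],
    max_items: usize,
    log: &mut Vec<&'static str>,
) -> Result<Vec<u32>, ExportError> {
    let mut items = Vec::new();
    scan_full_table_snapshot_impl(pages, log, |page| {
        if items.len() + page.len() > max_items {
            return Err(ExportError::Validation(format!(
                "export exceeds the limit of {max_items} items"
            )));
        }
        items.extend_from_slice(page);
        Ok(())
    })?;
    Ok(items)
}
```

With pages `[1, 2]`, `[3, 4]`, `[5, 6]` and a limit of 3, the callback rejects the second page, the third page is never read, and the log ends in `ROLLBACK` rather than `COMMIT`.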
What
With this change, the ExportTableToPointInTime full table scan runs inside a REPEATABLE READ READ ONLY transaction, guaranteeing that every page reflects the table state at the time of the first read.
Why
Closes #57
ExportTableToPointInTime previously ran a paginated scan loop without a transaction, so items written or deleted between pages could alter the final export. The resulting file was not a consistent snapshot of the table.
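The anomaly can be reproduced with a toy, in-memory multi-version table. Everything below (`Table`, `Version`, `scan_page`, `export`, and the commit counter standing in for transaction IDs) is hypothetical and only models MVCC-style visibility; it is not the project's code. A writer "moves" key `a` to `z` between the first and second pages, once with each page seeing the latest commits and once with every page pinned to the state at the first read.

```rust
use std::collections::BTreeMap;

/// One committed version of an item. `created` and `deleted` are values of a
/// hypothetical commit counter standing in for MVCC transaction IDs.
#[derive(Debug)]
struct Version {
    value: String,
    created: u64,
    deleted: Option<u64>,
}

/// Toy multi-version table: every key keeps all versions ever written.
#[derive(Default)]
struct Table {
    rows: BTreeMap<String, Vec<Version>>,
    clock: u64,
}

impl Table {
    /// Commits a write, superseding the current live version (if any).
    fn put(&mut self, key: &str, value: &str) {
        self.clock += 1;
        let now = self.clock;
        let versions = self.rows.entry(key.to_string()).or_default();
        if let Some(last) = versions.last_mut() {
            if last.deleted.is_none() {
                last.deleted = Some(now);
            }
        }
        versions.push(Version { value: value.to_string(), created: now, deleted: None });
    }

    /// Commits a delete of the current live version (if any).
    fn delete(&mut self, key: &str) {
        self.clock += 1;
        let now = self.clock;
        if let Some(last) = self.rows.get_mut(key).and_then(|v| v.last_mut()) {
            if last.deleted.is_none() {
                last.deleted = Some(now);
            }
        }
    }

    /// Returns up to `limit` live items with keys after `after`, as of commit `as_of`.
    fn scan_page(&self, after: Option<&str>, limit: usize, as_of: u64) -> Vec<(String, String)> {
        self.rows
            .iter()
            .filter(|(key, _)| after.map_or(true, |a| key.as_str() > a))
            .filter_map(|(key, versions)| {
                versions
                    .iter()
                    .find(|v| v.created <= as_of && v.deleted.map_or(true, |d| d > as_of))
                    .map(|v| (key.clone(), v.value.clone()))
            })
            .take(limit)
            .collect()
    }
}

/// Keyset-paginated export. With `snapshot`, every page is read as of the
/// commit counter at the first read (like one REPEATABLE READ transaction);
/// without it, each page sees the latest commits (like separate queries).
fn export(
    table: &mut Table,
    page_size: usize,
    snapshot: bool,
    mut between_pages: impl FnMut(&mut Table),
) -> Vec<(String, String)> {
    let snapshot_at = table.clock;
    let mut out = Vec::new();
    let mut cursor: Option<String> = None;
    loop {
        let as_of = if snapshot { snapshot_at } else { table.clock };
        let page = table.scan_page(cursor.as_deref(), page_size, as_of);
        let Some((last_key, _)) = page.last() else { break };
        cursor = Some(last_key.clone());
        out.extend(page);
        between_pages(&mut *table); // a concurrent writer commits here
    }
    out
}

/// Exports keys a..d with a page size of 2 while a writer moves `a` to `z`
/// after the first page; returns (keys without snapshot, keys with snapshot).
fn demo() -> (Vec<String>, Vec<String>) {
    let run = |snapshot: bool| {
        let mut table = Table::default();
        for key in ["a", "b", "c", "d"] {
            table.put(key, "1");
        }
        let mut moved = false;
        export(&mut table, 2, snapshot, |t: &mut Table| {
            if !moved {
                t.delete("a");
                t.put("z", "1");
                moved = true;
            }
        })
        .into_iter()
        .map(|(key, _)| key)
        .collect::<Vec<_>>()
    };
    (run(false), run(true))
}
```

Without a pinned snapshot the export contains `a`, `b`, `c`, `d`, and `z`, a combination that never existed in the table at any single point in time. With the snapshot it contains exactly `a`, `b`, `c`, and `d`, the state at the first read.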
Testing done
Checklist
- cargo test --workspace
- cargo fmt --check
- cargo clippy -- -W clippy::pedantic

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache License 2.0 and I agree to the Developer Certificate of Origin (DCO). See CONTRIBUTING.md for details.