refactor: improve execution pipeline and TPCC backends #313
Merged
- add LMDB storage backend
- restore scalar subqueries in WHERE to join-aware binding
- enforce scalar subquery cardinality at execution time
- return NULL for empty scalar subqueries and error on multi-row results
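The scalar-subquery rule in this commit summary (NULL for an empty result, an error for more than one row) can be sketched as below. This is a minimal illustration, not the code in this PR: `Value`, `ExecError`, and `scalar_subquery_value` are hypothetical names chosen for the example.

```rust
/// Hypothetical stand-in for a SQL value; not the project's actual type.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Int(i64),
}

/// Hypothetical error type used only in this sketch.
#[derive(Debug, PartialEq)]
enum ExecError {
    ScalarSubqueryMultipleRows,
}

/// Collapse the rows produced by a scalar subquery into a single value.
///
/// - zero rows  -> SQL NULL
/// - one row    -> that row's value
/// - more rows  -> error
///
/// At most two rows are pulled from the iterator, so a large subquery
/// result is not fully materialized just to detect the error case.
fn scalar_subquery_value<I>(rows: I) -> Result<Value, ExecError>
where
    I: IntoIterator<Item = Value>,
{
    let mut iter = rows.into_iter();

    // Empty result: a scalar subquery evaluates to NULL.
    let first = match iter.next() {
        None => return Ok(Value::Null),
        Some(v) => v,
    };

    // A second row violates the single-row requirement.
    if iter.next().is_some() {
        return Err(ExecError::ScalarSubqueryMultipleRows);
    }

    Ok(first)
}
```

Checking cardinality while pulling rows, rather than at bind time, matches the "at execution time" wording above: the number of rows is only known once the subquery actually runs.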
What problem does this PR solve?
This branch is carrying a fairly large refactor, and the old PR body was no longer describing the actual review surface.
At a high level, this PR improves three areas together:
Issue link:
What is changed and how it works?
This PR is best understood as three related tracks.
1. Execution / storage pipeline refactor
The main effect of this part is lower overhead and cleaner ownership boundaries across planning, execution, and storage.
2. Binder / expression / normalization improvements
- Adds a `same_column` helper on column refs and extends expression equality helpers so rules can safely ignore harmless column-ref slot differences.
The important reviewer takeaway here is that this is not just code motion: it makes normalization and rebinding materially less brittle for real query shapes, especially around correlated and scalar-subquery cases.
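To make the idea of slot-insensitive equality concrete, here is a small sketch. It assumes a column reference carries a table name, a column name, and a binder-assigned slot; those fields, the `Expr` enum, and `eq_ignoring_slots` are hypothetical and are not taken from this PR. Only the helper name `same_column` comes from the description above, and its real definition may compare different fields.

```rust
/// Hypothetical column reference; the field layout is an assumption.
#[derive(Debug, Clone)]
struct ColumnRef {
    table: Option<String>,
    name: String,
    /// Position assigned during binding; it can change when a plan is rebound.
    slot: usize,
}

impl ColumnRef {
    /// True when both refs name the same column, regardless of slot.
    fn same_column(&self, other: &ColumnRef) -> bool {
        self.table == other.table && self.name == other.name
    }
}

/// Tiny expression tree, just large enough to show the recursion.
#[derive(Debug, Clone)]
enum Expr {
    Column(ColumnRef),
    Literal(i64),
    Binary {
        op: char,
        left: Box<Expr>,
        right: Box<Expr>,
    },
}

/// Structural equality that treats column refs as equal when they
/// name the same column, even if their slots differ.
fn eq_ignoring_slots(a: &Expr, b: &Expr) -> bool {
    match (a, b) {
        (Expr::Column(x), Expr::Column(y)) => x.same_column(y),
        (Expr::Literal(x), Expr::Literal(y)) => x == y,
        (
            Expr::Binary { op: o1, left: l1, right: r1 },
            Expr::Binary { op: o2, left: l2, right: r2 },
        ) => o1 == o2 && eq_ignoring_slots(l1, l2) && eq_ignoring_slots(r1, r2),
        // Different node kinds are never equal.
        _ => false,
    }
}
```

With a comparison like this, a rewrite rule can recognize `t.id = 1` before and after rebinding as the same predicate even though the slot of `t.id` has moved.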
3. TPCC backend split, runner, and docs refresh
- [...] (`balanced` and `practical`) so the benchmark matrix can compare multiple SQLite operating modes directly.
- Adds `scripts/run_tpcc_matrix.sh` to run the performance matrix in one shot and write timestamped raw logs plus a summary file under `tpcc/results/<timestamp>/`.
- [...] `history.h_date` collision pattern so a single backend can be rerun from a fresh database without restarting the full matrix.
- Updates `README.md` and `tpcc/README.md` with the latest measured results and documents the benchmark runner and the duplicate-key caveat.
Latest 720s comparison currently documented in the branch:
- 53510 TpmC
- 32248 TpmC
- 36273 TpmC
- 35516 TpmC
Code changes
Check List
Tests
Manual test:
- `cargo build -p tpcc --release`
- `TPCC_DUPLICATE_RETRY=1 ./scripts/run_tpcc_matrix.sh`
Side effects
Note for reviewer
This PR is large, but the changes cluster fairly cleanly. A good review order is:
Representative files for each area:
- `src/execution/**`, `src/storage/**`, `src/db.rs`
- `src/binder/select.rs`, `src/expression/mod.rs`, `src/catalog/column.rs`, `src/optimizer/rule/normalization/**`, `src/optimizer/heuristic/**`
- `tpcc/src/backend/**`, `tpcc/src/main.rs`, `scripts/run_tpcc_matrix.sh`, `tpcc/README.md`, `README.md`