Pipelined 2PC with durability barrier #4728
Merged
aasoni merged 5 commits into jdetter/tpcc on Mar 30, 2026
Split the 2PC protocol into two consecutive rounds:

Round 1 (Memory): commit to in-memory datastore, release lock immediately.
Round 2 (Persist): persist to disk without holding locks.

Key changes:

Participant (B):
- Signal PREPARED immediately (no disk I/O before responding)
- On COMMIT: flush st_2pc_state marker (reducer inputs only) as PREPARE PERSIST, set durability barrier, then commit reducer changes to memory. The barrier defers the reducer's TxData from reaching disk.
- After PREPARE PERSIST is durable: signal PREPARED_TO_PERSIST
- After COMMIT_PERSIST: clear barrier (reducer changes flush as COMMIT PERSIST), wait for durability, delete marker.

Coordinator (A):
- Set durability barrier before commit (coordinator log deferred from disk)
- Send COMMIT without waiting for durability
- Wait for PREPARED_TO_PERSIST from all participants
- Clear barrier (coordinator log flushes), wait for durability
- Send COMMIT_PERSIST to participants

Durability barrier:
- Transactions at or below the barrier offset pass through to the durability worker; transactions above are accumulated in a pending list.
- Supports multiple concurrent barriers (BTreeSet of active offsets).
- clear_durability_barrier flushes pending up to the new minimum.
- abort_durability_barrier discards all pending (pipeline flush on abort).

New HTTP endpoints:
- POST /2pc/prepared-to-persist/:prepare_id (B notifies A)
- POST /2pc/commit-persist/:prepare_id (A tells B to finalize)

PreparedTransactions moved from ModuleHost to ReplicaContext so both actor code and HTTP handlers can access it. Added Round 2 channels (commit_persist_sender) and coordinator-side persist waiters.

Recovery uses Shub's approach: re-run reducer from stored inputs in st_2pc_state, query coordinator for decision.

TODOs:
- Trigger module restart on persistence abort to flush tainted in-memory state
- Retry limit for send_prepared_to_persist_to_coordinator
- Handle prepare_id mismatch in recovery (new vs original ID)
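The durability barrier described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `DurabilityGate`, `set_barrier`, `submit`, `clear_barrier`, and `abort_barrier` are hypothetical names, the `durable` vector stands in for the durability worker, and the sketch assumes transactions are submitted in increasing offset order.

```rust
use std::collections::BTreeSet;

type TxOffset = u64;

/// Hypothetical stand-in for the barrier logic; not the PR's actual types.
#[derive(Default)]
struct DurabilityGate {
    /// Active barrier offsets; the smallest one gates everything above it.
    barriers: BTreeSet<TxOffset>,
    /// Transactions held back from disk, in submission order.
    pending: Vec<(TxOffset, &'static str)>,
    /// Stand-in for the durability worker: what would reach disk.
    durable: Vec<(TxOffset, &'static str)>,
}

impl DurabilityGate {
    fn min_barrier(&self) -> Option<TxOffset> {
        self.barriers.iter().next().copied()
    }

    fn set_barrier(&mut self, offset: TxOffset) {
        self.barriers.insert(offset);
    }

    /// At or below the minimum barrier: pass through. Above it: defer.
    fn submit(&mut self, offset: TxOffset, tx: &'static str) {
        match self.min_barrier() {
            Some(min) if offset > min => self.pending.push((offset, tx)),
            _ => self.durable.push((offset, tx)),
        }
    }

    /// Remove one barrier, then flush pending up to the new minimum
    /// (or everything, if no barrier remains).
    fn clear_barrier(&mut self, offset: TxOffset) {
        self.barriers.remove(&offset);
        let limit = self.min_barrier();
        for (off, tx) in std::mem::take(&mut self.pending) {
            match limit {
                Some(min) if off > min => self.pending.push((off, tx)),
                _ => self.durable.push((off, tx)),
            }
        }
    }

    /// Assumption: abort removes the given barrier and drops all pending.
    fn abort_barrier(&mut self, offset: TxOffset) {
        self.barriers.remove(&offset);
        self.pending.clear();
    }

    fn durable_offsets(&self) -> Vec<TxOffset> {
        self.durable.iter().map(|(off, _)| *off).collect()
    }
}
```

With barriers at offsets 10 and 12, clearing 10 flushes pending transactions up to 12 and keeps later ones deferred; aborting drops whatever is still pending, so it never reaches the durable side.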
When Round 2 of pipelined 2PC aborts (on the participant or the coordinator), abort_durability_barrier discards all deferred transactions, then a panic triggers a module restart via the existing on_panic/defer_on_unwind mechanism. On next access, the module is re-created from the commitlog, which does not contain the tainted data.
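Why the restart removes the tainted state can be shown with a small model. This is a sketch under assumptions, not the PR's mechanism: `Tx`, `replay`, and `restart_after_abort` are hypothetical, the commitlog is modelled as a slice of key/value writes, and the panic-driven restart is replaced by an explicit rebuild.

```rust
use std::collections::BTreeMap;

/// Hypothetical transaction: a single key/value write.
#[derive(Clone, Debug)]
struct Tx {
    key: &'static str,
    value: i64,
}

/// Rebuild in-memory state from the durable log only.
fn replay(commitlog: &[Tx]) -> BTreeMap<&'static str, i64> {
    let mut state = BTreeMap::new();
    for tx in commitlog {
        state.insert(tx.key, tx.value);
    }
    state
}

/// On a Round 2 abort: drop every deferred transaction, then rebuild state
/// from the log. The deferred writes never reached the log, so the rebuilt
/// state cannot contain them.
fn restart_after_abort(
    commitlog: &[Tx],
    deferred: &mut Vec<Tx>,
) -> BTreeMap<&'static str, i64> {
    deferred.clear();
    replay(commitlog)
}
```

If the log holds `balance = 100` and a deferred transaction had set `balance = 40` in memory only, the rebuilt state reads 100 again.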
cloutiertyler commented on Mar 30, 2026
let auth_token = replica_ctx.call_reducer_auth_token.clone();
{
    let handle = tokio::runtime::Handle::current();
    block_on_scoped(&handle, send_prepared_to_persist_to_coordinator(
Shouldn't be blocking here, because I'm going to lock up the whole thread.
cloutiertyler commented on Mar 30, 2026
// Step 4: wait for coordinator's decision (B never aborts on its own).
let commit = Self::wait_for_2pc_decision(decision_rx, &prepare_id, coordinator_identity, &replica_ctx);
// Step 10: wait for COMMIT_PERSIST from coordinator.
let persist_commit = Self::wait_for_commit_persist(
Also should not be blocking.
cloutiertyler commented on Mar 30, 2026
// Without this, A could delete its coordinator log entry while B's commit
// is still in-memory; a B crash at that point would leave the tx uncommitted
// with no way to recover (A has already forgotten it committed).
// Step 12: wait for COMMIT PERSIST durability (offset N+1 fsynced).
This also should not be blocking.
cloutiertyler commented on Mar 30, 2026
// ═══ WRITE LOCK RELEASED ═══════════════════════════════
// ── Round 2: Persistence Commit ────────────────────────

// Step 8: wait for PREPARE PERSIST durability (offset N fsynced).
This should not be blocking.