compute: move MV sink persist I/O off Timely thread #35328
antiguru wants to merge 6 commits into MaterializeInc:main
teskje left a comment
The intention checks out to me. I'll note that for me this makes it noticeably harder to follow the logic and be sure that it's still correct, so we should only make this change if we have measured performance improvements. Aiui, the thinking is that Timely activations are expensive enough to justify the overhead of the additional tokio tasks and channels. It'd be great to have some numbers!
f1e5e14 to c5b5a72
Prepare shared infrastructure needed for converting async Timely operators to sync operators with Tokio tasks:

* Add `StartSignal::into_send_future()` to extract a Send-safe future
* Change `InternalCommandSender` to use `SyncActivator` (Send-safe) instead of `Activator` (thread-local)
* Add `+ Send` bounds to `persist_source` params (`listen_sleep`, `start_signal`) so callers can pass them to Tokio tasks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bugbot run
PR Summary: Medium Risk. Written by Cursor Bugbot for commit 4ceac22.
7d1eb5d to ecb3d35
Convert the MV sink's mint, write, and append operators from `AsyncOperatorBuilder` to `OperatorBuilderRc` (sync) with dedicated Tokio tasks for persist I/O.

Key changes:

* mint: Tokio task watches persist upper, operator manages capabilities
* write: Tokio task owns `WriteHandle` and `CorrectionBuffer`, operator sends batches via channel
* append: Tokio task performs `compare_and_append`, operator collects results
* `ArcActivator` coalesces redundant cross-thread activations
* `CorrectionLogger` tracks net batch/record/size counts and retracts outstanding logging state on drop
* `ChannelLogging` bridges `Correction` events from Tokio task to Timely thread for introspection logging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
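The coalescing idea behind `ArcActivator` can be illustrated with a small standalone sketch. This is not the type from the PR: `CoalescingActivator`, `acknowledge`, and the wake-up counter are hypothetical names, and the counter merely stands in for whatever actually wakes the Timely operator. The assumption is that a shared flag records whether an activation is already pending, so repeated requests from the I/O task collapse into one.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a cross-thread activator that coalesces wake-ups.
/// Clones share the same state, so an I/O task and an operator can each hold one.
#[derive(Clone)]
pub struct CoalescingActivator {
    /// True while an activation has been requested but not yet acknowledged.
    pending: Arc<AtomicBool>,
    /// Counts activations that were actually delivered (illustration only).
    wakeups: Arc<AtomicUsize>,
}

impl CoalescingActivator {
    pub fn new() -> Self {
        Self {
            pending: Arc::new(AtomicBool::new(false)),
            wakeups: Arc::new(AtomicUsize::new(0)),
        }
    }

    /// Called from the I/O side. Returns true only if this call delivered a
    /// wake-up; calls made while one is already pending are dropped.
    pub fn activate(&self) -> bool {
        if !self.pending.swap(true, Ordering::AcqRel) {
            self.wakeups.fetch_add(1, Ordering::Relaxed);
            true
        } else {
            false
        }
    }

    /// Called by the operator when it starts running, so that a later
    /// `activate` delivers a fresh wake-up.
    pub fn acknowledge(&self) {
        self.pending.store(false, Ordering::Release);
    }

    /// Number of wake-ups delivered so far.
    pub fn wakeups(&self) -> usize {
        self.wakeups.load(Ordering::Relaxed)
    }
}
```

In this sketch the operator would call `acknowledge` at the top of each scheduling pass and then drain its channels, so any message sent after the acknowledgement triggers a new activation rather than being lost.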
I think there's an additional issue that this may address: the async operators would not only schedule themselves often, but would also yield quite happily. I believe we figured this was the most likely candidate for "why Frank's catalog server became permanently wedged". The operators do need to be designed to handle overload gracefully, and not fall permanently behind.
+1 on this. It may be that this is a recognized issue in a tracker somewhere, but if not, it's definitely worth socializing what's up and why.
Add `enable_compute_sync_mv_sink` dyncfg flag (default: false) to select between the existing async Timely operator implementation and the new sync+Tokio implementation of the MV sink. This enables gradual rollout and quick rollback without redeployment.

The sync implementation is extracted into `materialized_view_v2.rs`. The async implementation is restored as the default code path, adapted to use `ChannelLogging`/`CorrectionLogger` instead of the removed `Logging` type. Dispatch happens at `persist_sink()` based on the flag value.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add the new dyncfg flag to `FlipFlagsAction` in parallel-workload and to `get_variable_system_parameters` in mzcompose for CI coverage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The `maybe_mint_batch_description` function was incorrectly rewritten instead of being copied from the sync commit. Restore the correct logic that uses `persist_frontier` as lower and `desired_frontier` as upper. Also add the missing `cap_set.try_downgrade` in the else branch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
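To make the lower/upper roles concrete, here is a deliberately simplified sketch. It is not the function from the PR: frontiers are modeled as plain `u64` values rather than antichains, `BatchDescription` and `maybe_mint` are hypothetical names, and the minting condition (persist frontier strictly behind the desired frontier) is an assumption. The capability downgrade in the else branch is not modeled.

```rust
/// Hypothetical, simplified batch description: the half-open range of times
/// `[lower, upper)` that a batch is expected to cover.
#[derive(Debug, PartialEq)]
pub struct BatchDescription {
    pub lower: u64,
    pub upper: u64,
}

/// Mint a description only when there is something left to write.
/// The persist frontier becomes the lower bound and the desired frontier the
/// upper bound, matching the roles named in the commit message.
/// Assumption: nothing is minted once persist has caught up.
pub fn maybe_mint(persist_frontier: u64, desired_frontier: u64) -> Option<BatchDescription> {
    if persist_frontier < desired_frontier {
        Some(BatchDescription {
            lower: persist_frontier,
            upper: desired_frontier,
        })
    } else {
        None
    }
}
```

Swapping the two arguments would produce descriptions with `lower > upper`, which is the kind of mistake the commit message describes restoring from.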

Convert the MV sink's mint, write, and append operators from `AsyncOperatorBuilder` to `OperatorBuilderRc` (sync) with dedicated Tokio tasks for persist I/O. This removes async overhead from the Timely scheduling loop while keeping persist latency isolated from dataflow scheduling.

Key changes:

* … `SyncActivator`; operator manages capabilities and batch description minting
* … `WriteHandle` and `CorrectionBuffer`; operator sends commands via channel; `ArcActivator` coalesces redundant cross-thread activations
* … `compare_and_append`; operator forwards descriptions, batches, and frontier updates
* `CorrectionLogger` tracks net batch/record/size counts and retracts outstanding logging state on drop, fixing stale introspection rows after dataflow cancellation
* `ChannelLogging` bridges `Correction` events from Tokio task to Timely thread for introspection logging
* … `abort_on_drop` for prompt cleanup on dataflow shutdown

Shares the infra commit (`compute/storage: add Send-safe infrastructure for sync Timely operators`) with #35450. Rebase before merging to deduplicate.
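The retract-on-drop behavior described for `CorrectionLogger` can be sketched with the standard library alone. Everything here is hypothetical: `NetLogger`, `LogEvent`, and `net_totals` are illustrative names, a `std::sync::mpsc` channel stands in for the bridge between the I/O task and the Timely thread, and only batch and record counts are tracked. The point is that the logger remembers its net contribution and emits the negation when dropped, so the receiving side nets out to zero.

```rust
use std::sync::mpsc::{Receiver, Sender};

/// Hypothetical logging event carrying signed deltas.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LogEvent {
    pub batches: i64,
    pub records: i64,
}

/// Hypothetical logger that tracks the net of everything it has reported.
pub struct NetLogger {
    tx: Sender<LogEvent>,
    net_batches: i64,
    net_records: i64,
}

impl NetLogger {
    pub fn new(tx: Sender<LogEvent>) -> Self {
        Self { tx, net_batches: 0, net_records: 0 }
    }

    /// Report a delta and fold it into the running net totals.
    pub fn report(&mut self, batches: i64, records: i64) {
        self.net_batches += batches;
        self.net_records += records;
        // A closed receiver means nobody is listening; ignore the error.
        let _ = self.tx.send(LogEvent { batches, records });
    }
}

impl Drop for NetLogger {
    /// Retract whatever is still outstanding so no stale state remains.
    fn drop(&mut self) {
        if self.net_batches != 0 || self.net_records != 0 {
            let _ = self.tx.send(LogEvent {
                batches: -self.net_batches,
                records: -self.net_records,
            });
        }
    }
}

/// Drain currently queued events and sum their deltas (receiver side).
pub fn net_totals(rx: &Receiver<LogEvent>) -> (i64, i64) {
    rx.try_iter()
        .fold((0, 0), |(b, r), e| (b + e.batches, r + e.records))
}
```

Putting the retraction in `Drop` means it also runs when the owning task is torn down early, which is the case the description calls out for cancelled dataflows.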
🤖 Generated with Claude Code