compute: make correction buffer reads slice-proportional by antiguru · Pull Request #36898 · MaterializeInc/materialize

antiguru · 2026-06-04T09:05:06Z

Motivation

MVs using the v2 correction buffer can stall during hydration when the input has to catch up through many distinct timestamps.
CorrectionV2::updates_before restructures the entire buffer on every call — Rc-wrapping every chunk, merging to the upper, rebuilding remainders, and re-merging chains for the chain invariant — so each read costs O(total buffered chunks) instead of O(drained slice), and a catch-up through T timestamps pays O(T²/chunk_capacity).
A micro-benchmark reproduces this: at 65536 timestamps (one day of 1s ticks at 16 updates/tick) a stepwise drain takes 13.5s per worker against CorrectionV1's 0.47s, with the ratio growing in T (~T^1.8).
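
The quadratic cost can be illustrated with a back-of-the-envelope model. This is a sketch under stated assumptions, not code from the PR: all T timestamps are assumed to be buffered before the first read, the function names are made up for illustration, and the chunk capacity is a free parameter rather than the real value.

```rust
/// Number of chunks needed to hold `updates` updates (rounded up).
fn chunks(updates: u64, chunk_capacity: u64) -> u64 {
    (updates + chunk_capacity - 1) / chunk_capacity
}

/// Hypothetical cost model: every read restructures the whole remaining buffer.
/// Read `i` still sees the updates of `timestamps - i` timestamps.
fn full_restructure_cost(timestamps: u64, updates_per_ts: u64, chunk_capacity: u64) -> u64 {
    (0..timestamps)
        .map(|i| chunks((timestamps - i) * updates_per_ts, chunk_capacity))
        .sum()
}

/// Hypothetical cost model: every read touches only the chunks of the drained timestamp.
fn slice_proportional_cost(timestamps: u64, updates_per_ts: u64, chunk_capacity: u64) -> u64 {
    (0..timestamps)
        .map(|_| chunks(updates_per_ts, chunk_capacity))
        .sum()
}
```

In this model, doubling the number of timestamps roughly quadruples the first cost and doubles the second.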

Description

Restructures CorrectionV2 so reads only do work proportional to the drained slice, keeping the public interface, metrics, introspection logging, and dyncfgs unchanged.
CorrectionV1 remains the fallback via enable_correction_v2. Rebased on top of #36577, so chunks use the columnar chunk region storage.

Times at or beyond the largest read upper live in a BucketChain of exponentially growing time-range buckets, each holding (time, data)-sorted chains under the chain invariant. Reads peel only the buckets below their upper; far-future updates (e.g. temporal-filter retractions) are never touched.
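
A toy model of the bucketing idea, to make the "reads peel only the buckets below their upper" claim concrete. ToyBucketChain, its fields, and its methods are hypothetical and much simpler than the real BucketChain: the base is fixed, buckets hold unsorted vectors instead of chains, and there is no restoration step.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for a bucket chain (not the timely-util type).
struct ToyBucketChain {
    /// Lower bound of bucket 0; no stored time is below it.
    base: u64,
    /// Bucket index -> (time, data) updates.
    /// Bucket k covers offsets [2^k - 1, 2^(k+1) - 1) from `base`.
    buckets: BTreeMap<u32, Vec<(u64, i64)>>,
    /// Number of stored updates inspected by `peel`, for illustration only.
    touched: usize,
}

impl ToyBucketChain {
    fn new(base: u64) -> Self {
        Self { base, buckets: BTreeMap::new(), touched: 0 }
    }

    /// Index of the bucket covering `time`: floor(log2(offset + 1)).
    /// Assumes `time - base < u64::MAX` so that `offset + 1` does not overflow.
    fn bucket_index(&self, time: u64) -> u32 {
        assert!(time >= self.base, "times below the base are not stored");
        let offset = time - self.base;
        63 - (offset + 1).leading_zeros()
    }

    /// Smallest time covered by bucket `index`.
    fn bucket_lower(&self, index: u32) -> u64 {
        self.base + (1u64 << index) - 1
    }

    fn insert(&mut self, time: u64, data: i64) {
        let index = self.bucket_index(time);
        self.buckets.entry(index).or_default().push((time, data));
    }

    /// Removes and returns all updates with `time < upper`, sorted by (time, data).
    /// Only buckets whose range starts below `upper` are inspected.
    fn peel(&mut self, upper: u64) -> Vec<(u64, i64)> {
        let indexes: Vec<u32> = self
            .buckets
            .keys()
            .copied()
            .take_while(|&k| self.bucket_lower(k) < upper)
            .collect();
        let mut out = Vec::new();
        for index in indexes {
            let bucket = self.buckets.remove(&index).expect("key was just listed");
            self.touched += bucket.len();
            // A bucket straddling `upper` is partitioned; the rest stays in place.
            let (below, rest): (Vec<_>, Vec<_>) =
                bucket.into_iter().partition(|(t, _)| *t < upper);
            out.extend(below);
            if !rest.is_empty() {
                self.buckets.insert(index, rest);
            }
        }
        out.sort();
        out
    }
}
```

With updates at times 0, 1, 2, 5 and 1000, peeling at upper 3 inspects only the three updates in the first two buckets; the update at time 1000 sits in a far bucket and is never looked at.
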
Chains split at the upper at chunk granularity (Chain::split_at_time), reusing whole chunks and copying at most one straddling chunk per chain.
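
A minimal sketch of splitting at chunk granularity, assuming a chain is a list of non-overlapping chunks sorted by time. The Chunk alias and split_chain_at_time are hypothetical names; the real Chain::split_at_time operates on the columnar chunk storage and may differ in detail.

```rust
/// Hypothetical chunk: updates sorted by (time, data).
type Chunk = Vec<(u64, i64)>;

/// Splits a time-sorted chain of chunks at `upper`.
/// Returns (chunks below `upper`, remaining chunks, number of chunks copied).
fn split_chain_at_time(chain: Vec<Chunk>, upper: u64) -> (Vec<Chunk>, Vec<Chunk>, usize) {
    let mut below = Vec::new();
    let mut rest = Vec::new();
    let mut copied = 0;
    for chunk in chain {
        if chunk.is_empty() {
            continue;
        }
        let first = chunk[0].0;
        let last = chunk[chunk.len() - 1].0;
        if last < upper {
            // Entirely below the upper: reuse the whole chunk.
            below.push(chunk);
        } else if first >= upper {
            // Entirely at or beyond the upper: reuse the whole chunk.
            rest.push(chunk);
        } else {
            // Straddling chunk: find the split point and copy both halves.
            let at = chunk.partition_point(|(t, _)| *t < upper);
            below.push(chunk[..at].to_vec());
            rest.push(chunk[at..].to_vec());
            copied += 1;
        }
    }
    (below, rest, copied)
}
```

Because the chunks of a chain are sorted and do not overlap in this sketch, at most one chunk can straddle the upper, so `copied` is 0 or 1.
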
Emitted-but-not-yet-cancelled updates live in a designated emitted chain outside any invariant, so reads never re-merge future chains. (We also evaluated a destructive take_before/insert_batch interface; it measured CPU-neutral because feedback cancellation forces the same merge either way, so the interface stays as-is.)
Times below the since are advanced with a hybrid strategy, decided by a bounded count of distinct stale times: few stale times — the steady state, since the previously emitted chain was written just before the since advanced past it — use the existing Cursor::advance_by run-splitting; many stale times (a since jump, e.g. a sink restarting with an old as-of) are collapsed in one sort-and-consolidate pass. The always-bulk variant regressed the ReplicaExpiration feature benchmark by 31%; with the hybrid it is 3.3% faster than the merge base.
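
The hybrid decision can be sketched as follows. Everything here is illustrative: Strategy, choose_strategy, advance_and_consolidate, and the limit parameter are hypothetical names, updates are plain (time, data, diff) tuples, and only the bulk path is implemented (the run-splitting path stands in for the existing Cursor::advance_by logic).

```rust
#[derive(Debug, PartialEq)]
enum Strategy {
    /// Few distinct stale times: advance run by run.
    RunSplitting,
    /// Many distinct stale times: collapse in one pass.
    BulkConsolidate,
}

/// Counts distinct times below `since` in a time-sorted slice.
/// The count is bounded: scanning stops as soon as it exceeds `limit`.
fn choose_strategy(updates: &[(u64, &'static str, i64)], since: u64, limit: usize) -> Strategy {
    let mut distinct = 0;
    let mut previous = None;
    for (time, _, _) in updates {
        if *time >= since {
            break;
        }
        if previous != Some(*time) {
            distinct += 1;
            previous = Some(*time);
            if distinct > limit {
                return Strategy::BulkConsolidate;
            }
        }
    }
    Strategy::RunSplitting
}

/// Bulk path: advance every time to at least `since`, sort by (time, data),
/// sum the diffs of equal (time, data) pairs, and drop zero diffs.
fn advance_and_consolidate(
    updates: Vec<(u64, &'static str, i64)>,
    since: u64,
) -> Vec<(u64, &'static str, i64)> {
    let mut advanced: Vec<_> = updates
        .into_iter()
        .map(|(t, d, r)| (t.max(since), d, r))
        .collect();
    advanced.sort_by(|a, b| (a.0, a.1).cmp(&(b.0, b.1)));
    let mut out: Vec<(u64, &'static str, i64)> = Vec::new();
    for (t, d, r) in advanced {
        match out.last_mut() {
            Some(last) if last.0 == t && last.1 == d => last.2 += r,
            _ => out.push((t, d, r)),
        }
    }
    out.retain(|u| u.2 != 0);
    out
}
```

Bounding the count keeps the decision itself cheap: in the steady state the scan ends after a handful of stale times, and after a since jump it ends as soon as the limit is exceeded.
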
BucketChain exposes bucket iterators for metrics reporting, and restore is invoked with bounded fuel per buffer operation; incomplete restoration is picked up by the next operation, so reads never stall the operator. Builds on the restore fast path from #36897 (timely-util: avoid rebuilding well-formed bucket chains in restore).
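
A toy illustration of fuel-bounded restoration. ToyChain and its restore method are hypothetical and only mimic the shape of the idea: the chain is reduced to a list of power-of-two bucket widths, and one unit of fuel pays for one split of the front bucket. The real restore signature and cost accounting may differ.

```rust
use std::collections::VecDeque;

/// Hypothetical chain described only by its bucket widths, nearest bucket first.
/// Widths are assumed to be powers of two.
struct ToyChain {
    widths: VecDeque<u64>,
}

impl ToyChain {
    /// Splits the front bucket in half until it has width 1, spending one unit
    /// of fuel per split. Returns `true` if the chain is fully restored and
    /// `false` if the fuel ran out first; a later call continues the work.
    fn restore(&mut self, fuel: &mut i64) -> bool {
        while self.widths.front().map_or(false, |w| *w > 1) {
            if *fuel <= 0 {
                return false;
            }
            let half = self.widths[0] / 2;
            self.widths[0] = half;
            self.widths.push_front(half);
            *fuel -= 1;
        }
        true
    }
}
```

Starting from widths [8, 16] with 2 units of fuel per call, the first call stops at [2, 2, 4, 16] and reports unfinished work; the second call reaches [1, 1, 2, 4, 16] and reports completion with fuel to spare.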

Benchmark results (stepwise drain with persist feedback, 16 updates per timestamp, medians, on top of #36577's columnar chunks):

T	v1	v2 before	v2 after
1024	5.2 ms	32 ms	~16 ms
4096	21 ms	104 ms	~64 ms
16384	85 ms	858 ms	270 ms

Scaling is now linear in T (was ~T^1.8); the since-jump scenario improves ~3× and inserts are unchanged. ("v2 before" measured pre-#36577; the columnar chunk storage adds a constant on this microbenchmark but does not change the asymptotics.)

Verification

New micro-benchmark (cargo bench -p mz-compute --features bench --bench correction) covering insert, stepwise drain with feedback, since jumps, and a temporal-filter pattern; the bench feature only gates pub visibility.
New tests assert emission equivalence with CorrectionV1 under a stepwise-drain-with-feedback workload, since-jump collapse onto the since, and that reads never observe times at or beyond their upper.
The ReplicaExpiration feature benchmark scenario was verified locally against the merge base (3.3% faster, less memory).

🤖 Generated with Claude Code

CorrectionV2's updates_before restructured the entire buffer on every call: it converted all chains to Rc-wrapped cursors, merged up to the upper, rebuilt the remainders, and re-merged chains to restore the chain invariant. Each read cost O(total buffered chunks) rather than O(drained slice), so an MV sink catching up through T distinct timestamps paid O(T^2 / chunk_capacity). At 65536 timestamps (a day of 1s ticks) a stepwise drain took 13.5s against CorrectionV1's 0.47s, matching observed hydration stalls.

Restructure the buffer so reads only touch the drained slice:

* A BucketChain partitions times at or beyond the largest read upper into buckets of exponentially growing time ranges, each holding chains maintained with the chain invariant. Reads peel only the buckets below their upper; far-future updates are left alone.
* Chains split at the upper at chunk granularity (Chain::split_at_time) reusing whole chunks; at most one straddling chunk per chain is copied. The Rc round trip through cursors is gone.
* Updates emitted by a read stay in a designated emitted chain, outside any invariant, until persist feedback cancels them. The previous read's emitted chain is merged with the newly drained updates, so future chains are never re-merged by reads.
* A since jump across many distinct buffered timestamps is collapsed with one sort-and-consolidate pass over the affected updates instead of merging one cursor run per distinct stale time. This removes Cursor's limit/overwrite_ts machinery (advance_by, skip_time, set_limit, split_at_time).

The public interface, metrics, introspection logging, and dyncfgs are unchanged; CorrectionV1 remains the fallback via enable_correction_v2.

BucketChain::restore gains a fast path that skips rebuilding the bucket map when the chain is already well-formed, and BucketChain exposes bucket iterators for metrics reporting.

A new micro-benchmark (cargo bench -p mz-compute --features bench --bench correction) drives both implementations through insert, stepwise-drain-with-feedback, and since-jump scenarios. The bench feature only gates pub visibility of the sink correction modules. Stepwise drain at 16384 timestamps drops from 858ms to 240ms and now scales linearly in the timestamp count (previously ~T^1.8); the since jump drops from 250ms to 87ms, below CorrectionV1's 104ms.

New tests assert emission equivalence with CorrectionV1 under a stepwise-drain-with-feedback workload, since-jump collapse, and the upper-not-beyond-since edge case.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

antiguru force-pushed the correction-v2-bucketed branch 6 times, most recently from 56b7242 to ac04452, on June 4, 2026 14:17

antiguru force-pushed the correction-v2-bucketed branch from ac04452 to feb4b09 on June 4, 2026 15:35

antiguru wants to merge 1 commit into MaterializeInc:main from antiguru:correction-v2-bucketed
