Skip to content

testdrive: Fix flaky savings check in expected_group_size_tuning#36791

Merged
def- merged 1 commit into
MaterializeInc:mainfrom
def-:fix-flaky-expected-group-size-tuning
May 30, 2026
Merged

testdrive: Fix flaky savings check in expected_group_size_tuning#36791
def- merged 1 commit into
MaterializeInc:mainfrom
def-:fix-flaky-expected-group-size-tuning

Conversation

@def-
Copy link
Copy Markdown
Contributor

@def- def- commented May 29, 2026

The `SELECT COUNT(savings > 0) = <num advice rows>` assertions in
expected_group_size_tuning.td were flaky, failing intermittently in
nightly runs with `expected [["6"]] got [["0"]]` (e.g.
https://buildkite.com/materialize/nightly/builds/16627, observed
repeatedly on main).

Root cause: `mz_expected_group_size_advice.savings` is `SUM` of the
arrangement heap sizes from `mz_arrangement_sizes.size`. That column is
fed by the `mz_arrangement_heap_size_raw` logging stream, which is
independent from the `mz_arrangement_records_raw` stream behind the
structural advice columns (levels/to_cut/hint) and is sampled
asynchronously and best-effort. As a result the structural columns can
already be correct while `size` -- and therefore `savings` -- is still
NULL for the advice arrangements. Under the heavy-load `--replicas=4
--slow` nightly variant the size accounting did not always converge
within the (already bumped, see MaterializeInc#35060) timeout, leaving all `savings`
NULL.

Note also that `COUNT(savings > 0)` counts rows where `savings > 0` is
non-NULL, i.e. it really only required `size` to be logged for every
advice arrangement, not that the saving was actually positive -- so the
assertion was both flaky and weaker than its comment implied.

Replace it with the deterministic, NULL-tolerant invariant that no
advice entry ever reports a non-positive saving
(`SELECT COUNT(*) ... WHERE savings <= 0` = 0). Rows whose size has not
been logged yet have NULL savings and are excluded by the predicate, so
the check no longer races the best-effort size logging, while still
catching a genuine regression that computed a zero/negative saving. The
exhaustive structural assertions above continue to cover the advice
logic (including the `cuts` CTE that produces `savings`).

Modifies test: test/testdrive/expected_group_size_tuning.td

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@def- def- requested a review from bosconi May 29, 2026 15:16
@def- def- enabled auto-merge (squash) May 30, 2026 00:51
Copy link
Copy Markdown
Member

@bosconi bosconi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@def- def- merged commit 1b9cf39 into MaterializeInc:main May 30, 2026
45 checks passed
@def- def- deleted the fix-flaky-expected-group-size-tuning branch May 30, 2026 01:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants