testdrive: Fix flaky savings check in expected_group_size_tuning by def- · Pull Request #36791 · MaterializeInc/materialize

def- · 2026-05-29T15:14:26Z

Flake seen in https://buildkite.com/materialize/nightly/builds/16627

Test run: https://buildkite.com/materialize/nightly/builds/16634

The `SELECT COUNT(savings > 0) = <num advice rows>` assertions in expected_group_size_tuning.td were flaky, failing intermittently in nightly runs with `expected [["6"]] got [["0"]]` (e.g. https://buildkite.com/materialize/nightly/builds/16627, observed repeatedly on main). Root cause: `mz_expected_group_size_advice.savings` is `SUM` of the arrangement heap sizes from `mz_arrangement_sizes.size`. That column is fed by the `mz_arrangement_heap_size_raw` logging stream, which is independent from the `mz_arrangement_records_raw` stream behind the structural advice columns (levels/to_cut/hint) and is sampled asynchronously and best-effort. As a result the structural columns can already be correct while `size` -- and therefore `savings` -- is still NULL for the advice arrangements. Under the heavy-load `--replicas=4 --slow` nightly variant the size accounting did not always converge within the (already bumped, see MaterializeInc#35060) timeout, leaving all `savings` NULL. Note also that `COUNT(savings > 0)` counts rows where `savings > 0` is non-NULL, i.e. it really only required `size` to be logged for every advice arrangement, not that the saving was actually positive -- so the assertion was both flaky and weaker than its comment implied. Replace it with the deterministic, NULL-tolerant invariant that no advice entry ever reports a non-positive saving (`SELECT COUNT(*) ... WHERE savings <= 0` = 0). Rows whose size has not been logged yet have NULL savings and are excluded by the predicate, so the check no longer races the best-effort size logging, while still catching a genuine regression that computed a zero/negative saving. The exhaustive structural assertions above continue to cover the advice logic (including the `cuts` CTE that produces `savings`). Modifies test: test/testdrive/expected_group_size_tuning.td Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

bosconi

Looks good.

def- requested a review from bosconi May 29, 2026 15:16

def- enabled auto-merge (squash) May 30, 2026 00:51

bosconi approved these changes May 30, 2026

View reviewed changes

def- merged commit 1b9cf39 into MaterializeInc:main May 30, 2026
45 checks passed

def- deleted the fix-flaky-expected-group-size-tuning branch May 30, 2026 01:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

testdrive: Fix flaky savings check in expected_group_size_tuning#36791

testdrive: Fix flaky savings check in expected_group_size_tuning#36791
def- merged 1 commit into
MaterializeInc:mainfrom
def-:fix-flaky-expected-group-size-tuning

def- commented May 29, 2026 •

edited

Loading

Uh oh!

bosconi left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

def- commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bosconi left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

def- commented May 29, 2026 •

edited

Loading