testdrive: Fix flaky savings check in expected_group_size_tuning #36791
Merged
def- merged 1 commit on May 30, 2026
Conversation
The `SELECT COUNT(savings > 0) = <num advice rows>` assertions in expected_group_size_tuning.td were flaky, failing intermittently in nightly runs with `expected [["6"]] got [["0"]]` (e.g. https://buildkite.com/materialize/nightly/builds/16627, observed repeatedly on main).

Root cause: `mz_expected_group_size_advice.savings` is the `SUM` of the arrangement heap sizes from `mz_arrangement_sizes.size`. That column is fed by the `mz_arrangement_heap_size_raw` logging stream, which is independent of the `mz_arrangement_records_raw` stream behind the structural advice columns (levels/to_cut/hint) and is sampled asynchronously and best-effort. As a result the structural columns can already be correct while `size` -- and therefore `savings` -- is still NULL for the advice arrangements. Under the heavy-load `--replicas=4 --slow` nightly variant the size accounting did not always converge within the (already bumped, see MaterializeInc#35060) timeout, leaving all `savings` NULL.

Note also that `COUNT(savings > 0)` counts every row where the expression `savings > 0` is non-NULL, whether it evaluates to true or false. It therefore only required `size` to be logged for every advice arrangement, not that the saving was actually positive -- so the assertion was both flaky and weaker than its comment implied.

Replace it with the deterministic, NULL-tolerant invariant that no advice entry ever reports a non-positive saving (`SELECT COUNT(*) ... WHERE savings <= 0` = 0). Rows whose size has not been logged yet have NULL savings and are excluded by the predicate, so the check no longer races the best-effort size logging, while still catching a genuine regression that computed a zero/negative saving. The exhaustive structural assertions above continue to cover the advice logic (including the `cuts` CTE that produces `savings`).

Modifies test: test/testdrive/expected_group_size_tuning.td

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
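The NULL behavior described above can be reproduced outside Materialize. The following is a minimal sketch using Python's built-in `sqlite3` module, assuming that SQLite's `COUNT(expr)` and `WHERE` NULL handling match the standard SQL semantics the description relies on. The `advice` table and the `run_checks` helper are hypothetical stand-ins, not the real `mz_expected_group_size_advice` relation or the actual testdrive queries.

```python
import sqlite3


def run_checks(savings_values):
    """Return (old_check, new_check) for a hypothetical advice table.

    old_check mirrors SELECT COUNT(savings > 0): counts non-NULL results.
    new_check mirrors SELECT COUNT(*) ... WHERE savings <= 0.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for the advice relation; only `savings` matters.
    conn.execute("CREATE TABLE advice (savings INTEGER)")
    conn.executemany(
        "INSERT INTO advice VALUES (?)", [(v,) for v in savings_values]
    )
    old_check = conn.execute(
        "SELECT COUNT(savings > 0) FROM advice"
    ).fetchone()[0]
    new_check = conn.execute(
        "SELECT COUNT(*) FROM advice WHERE savings <= 0"
    ).fetchone()[0]
    conn.close()
    return old_check, new_check


# Sizes not logged yet: every saving is NULL.
print(run_checks([None] * 6))               # (0, 0)
# Sizes logged and every saving positive.
print(run_checks([10, 20, 30, 40, 50, 60]))  # (6, 0)
# A regression producing zero and negative savings.
print(run_checks([10, 0, -5, 40, 50, 60]))   # (6, 2)
```

In this sketch the old check returns 0 while sizes are still unlogged (the flaky failure mode) and returns 6 even when two savings are non-positive, whereas the new check stays at 0 until a non-positive saving actually appears.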
Flake seen in https://buildkite.com/materialize/nightly/builds/16627
Test run: https://buildkite.com/materialize/nightly/builds/16634