
Repeat recently modified tests with different randomized settings #100385

Merged
alexey-milovidov merged 9 commits into master from repeat-newly-modified-tests on Mar 24, 2026

Conversation

@alexey-milovidov (Member) commented Mar 22, 2026

When randomized settings are enabled in CI, the functional test runner now repeats recently modified tests multiple times with different random settings. Tests are ranked by git modification time (newest first), and each test at rank r gets `int(sqrt(100 / r))` repetitions:

  • Rank 1 (newest): 10 runs
  • Rank 2: 7 runs
  • Rank 3-4: 5 runs
  • Rank 5-6: 4 runs
  • Rank 7-11: 3 runs
  • Rank 12-25: 2 runs
  • Rank 26+: 1 run (no extra repeats)

This gives the 25 most recently modified tests ~78 runs in total per suite (roughly 53 runs beyond the usual single run each), helping detect new flaky tests earlier, before they accumulate failures across many CI runs.
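The repetition schedule above can be sketched in a few lines. This is a minimal sketch of the formula, not the actual code in clickhouse-test; `repetitions_for_rank` is a hypothetical name:

```python
import math

def repetitions_for_rank(rank: int) -> int:
    # int(sqrt(100 / r)): rank 1 -> 10 runs, rank 25 -> 2, rank 26+ -> 1.
    return int(math.sqrt(100 / rank))

# Total runs for the 25 most recently modified tests in one suite.
total_runs = sum(repetitions_for_rank(r) for r in range(1, 26))
print(total_runs)  # 78
```

Only the top 25 ranks get more than one run, since sqrt(100 / 26) is already below 2.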

The feature is enabled via the `--repeat-newly-modified-tests` flag and is automatically activated in CI for all functional test jobs that use randomized settings (i.e., not the flaky check, not coverage, not Azure).

Changelog category (leave one):

  • Build/Testing/Packaging Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Repeat recently modified tests with different randomized settings.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

When randomized settings are enabled, the functional test runner can now
repeat recently modified tests multiple times with different random settings.
Tests are ranked by git modification time (newest first), and each test at
rank r gets int(sqrt(100 / r)) repetitions: the newest test runs 10 times,
the 25th most recent runs twice, older tests run once as before.

This helps detect flaky tests earlier, before they accumulate failures in CI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
clickhouse-gh Bot (Contributor) commented Mar 22, 2026

Workflow [PR], commit [1649fed]

AI Review

Summary

This PR adds --repeat-newly-modified-tests to functional CI runs with randomized settings and updates clickhouse-test to increase repetitions for higher-ranked tests. On current PR HEAD, I did not find new high-confidence correctness, safety, performance, or ClickHouse-rule violations beyond already-existing review threads.

Missing context

  • ⚠️ No CI reports/log links were provided in the review request, so this review is based on code and PR metadata only.

ClickHouse Rules

Items reviewed:

  • Deletion logging
  • Serialization versioning
  • Core-area scrutiny
  • No test removal
  • Experimental gate
  • No magic constants
  • Backward compatibility
  • SettingsChangesHistory.cpp
  • PR metadata quality
  • Safe rollout
  • Compilation time

Final Verdict

  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-build Pull request with build/testing/packaging improvement label Mar 22, 2026
Comment thread (Contributor), code context:

    os.environ["LLVM_PROFILE_FILE"] = f"ft-{batch_num}-%2m.profraw"

    if (
        not is_flaky_check

BugfixValidation forces GLOBAL_TAGS=no-random-settings (see below in this file), so random settings are effectively disabled for this job type. But this condition does not exclude BugfixValidation, so we still append --repeat-newly-modified-tests and then pay for git log scanning in clickhouse-test even though there are no randomized runs to diversify.

Could you also gate this by not is_bugfix_validation (or equivalently check for GLOBAL_TAGS / explicit random-settings enablement) to keep behavior consistent with the "only when randomized settings are enabled" intent?

`BugfixValidation` sets `GLOBAL_TAGS=no-random-settings` but does not
add `--no-random-settings` to `runner_options`, so the existing check
did not catch it. Gate explicitly on `not is_bugfix_validation`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
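The gate described in this thread can be sketched as follows. All parameter names here are hypothetical, mirroring the job types mentioned in the PR rather than the actual variables in the CI code:

```python
def should_repeat_newly_modified_tests(is_flaky_check: bool,
                                       is_coverage: bool,
                                       is_azure: bool,
                                       is_bugfix_validation: bool) -> bool:
    # Append --repeat-newly-modified-tests only when randomized settings
    # are actually in effect. BugfixValidation disables them through
    # GLOBAL_TAGS=no-random-settings rather than a runner option, so it
    # must be excluded explicitly.
    return not (is_flaky_check or is_coverage or is_azure
                or is_bugfix_validation)
```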
@alexey-milovidov alexey-milovidov requested a review from maxknv March 22, 2026 22:32
Comment thread tests/clickhouse-test Outdated
- Add `-n 1000` to `git log` to avoid timeouts in treeless clones
  (CI uses `--filter=tree:0` when unshallowing)
- Increase timeout from 30s to 60s
- Add diagnostic output for failures and per-test repetition counts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread tests/clickhouse-test
CI runs tests as a different user than the repo owner, causing
`git log` to fail with "detected dubious ownership in repository".
Add `git config --global --add safe.directory` before the `git log` call.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread tests/clickhouse-test Outdated
Replace the `git log`-based ranking with simple reverse name sorting.
Since test names have numeric prefixes, sorting by name descending
naturally puts the newest tests first, without needing git access.

This avoids the "dubious ownership" errors that prevented the feature
from working in CI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
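The name-based ranking described in this commit can be sketched like this. The helper name and sample test names are hypothetical; the point is only that a reverse lexicographic sort of numerically prefixed names approximates newest-first ordering without touching git:

```python
def rank_tests_newest_first(test_names: list[str]) -> list[str]:
    # Functional test files carry increasing numeric prefixes
    # (00001_..., 03960_..., 04039_...), so sorting by name in
    # descending order puts the newest tests first without git access.
    return sorted(test_names, reverse=True)

tests = ["00001_select_1", "03960_some_fix", "04039_variant_element"]
print(rank_tests_newest_first(tests)[0])  # 04039_variant_element
```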
Comment thread tests/clickhouse-test
@maxknv (Member)
maxknv commented Mar 23, 2026

Fail Rate by Test Number Range (last 30 days, amd_binary)

Range   Fails   Total   Fail Rate
00xxx   869     4.8M    0.018%
01xxx   2,377   7.0M    0.034%
02xxx   728     10.2M   0.007%
03xxx   2,673   10.9M   0.025%
04xxx   3,843   214K    1.794%

The 04xxx group fails at roughly 70–250× the rate of 02xxx/03xxx.

Key observations

  1. The pattern is stark: 04xxx tests are ~53–100× more flaky than older test ranges.
  2. Many 04xxx tests fail 100% of the time — e.g., 04007_wasm_mixed_types (100 runs, all FAIL),
    04036_refresh_every_depends_on_double_fire (99 runs, all FAIL), 04039_variant_element_shared_discriminators (50 runs, all FAIL).
    These aren't "flaky" — they're always failing, likely tests for features not yet enabled in the standard binary.
  3. The lower total run count for 04xxx (214K vs 10M+) reflects that these tests are newer and have fewer accumulated runs.
  4. Within 03xxx, the higher-numbered tests like 03835, 03914, 03927, 03960 also dominate the flaky list.

So yes — PR #100385's approach of adjusting repetition based on test number is well-supported by this data. Newer tests (higher
numbers) are dramatically more flaky, so giving them more retries/repetitions makes sense to reduce false CI failures.
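The ratios above can be reproduced directly from the table (counts copied from the comment, with the M/K totals expanded):

```python
counts = {  # range: (fails, total runs), from the table above
    "00xxx": (869, 4_800_000),
    "01xxx": (2_377, 7_000_000),
    "02xxx": (728, 10_200_000),
    "03xxx": (2_673, 10_900_000),
    "04xxx": (3_843, 214_000),
}
rates = {k: fails / total for k, (fails, total) in counts.items()}
for k in ("00xxx", "01xxx", "02xxx", "03xxx"):
    print(k, round(rates["04xxx"] / rates[k]))
# 04xxx vs 00xxx ~99x, vs 01xxx ~53x, vs 02xxx ~252x, vs 03xxx ~73x
```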

@maxknv (Member)

maxknv commented Mar 23, 2026

I'd suggest disabling retries also for args.test is not None (local run case)

alexey-milovidov and others added 4 commits March 24, 2026 01:00
When `args.test` is set (local run case), skip the repeat-newly-modified-tests
flag to avoid unnecessary repetitions of explicitly selected tests.

#100385 (comment)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When `--repeat-newly-modified-tests` duplicates tests in the parallel
list, multiple workers run the same test concurrently, all writing to
the same `.stdout` file. The `file_suffix` (PID-based) was only set
when `args.test_runs > 1`, but the new feature duplicates tests in
the list without increasing `test_runs`, causing `FileNotFoundError`
when one worker's cleanup removes the file while another calls
`replace_in_file`.

Fix by always using PID suffix in concurrent mode, and also excluding
targeted checks from `--repeat-newly-modified-tests` (targeted checks
already have their own rerun logic).

#100385

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
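The PID-suffix fix described in this commit can be sketched as below. This is a hypothetical helper illustrating the idea; the real clickhouse-test derives its output paths differently:

```python
import os

def stdout_path(test_name: str, out_dir: str, concurrent: bool) -> str:
    # In concurrent mode, always suffix per-test output files with the
    # worker PID so duplicated runs of the same test never collide on
    # the same .stdout file (previously the suffix was only applied
    # when test_runs > 1).
    suffix = f".{os.getpid()}" if concurrent else ""
    return os.path.join(out_dir, f"{test_name}{suffix}.stdout")
```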
@alexey-milovidov alexey-milovidov merged commit 2c40d32 into master Mar 24, 2026
151 of 152 checks passed
@alexey-milovidov alexey-milovidov deleted the repeat-newly-modified-tests branch March 24, 2026 15:22
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 24, 2026