Repeat recently modified tests with different randomized settings #100385
alexey-milovidov merged 9 commits into master
When randomized settings are enabled, the functional test runner can now repeat recently modified tests multiple times with different random settings. Tests are ranked by git modification time (newest first), and each test at rank r gets int(sqrt(100 / r)) repetitions: the newest test runs 10 times, the 25th most recent runs twice, older tests run once as before. This helps detect flaky tests earlier, before they accumulate failures in CI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
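The repetition rule above can be sketched as follows. This is a minimal illustration, not the code in `clickhouse-test`: the function name `repetitions_for_rank` is hypothetical, ranks are assumed to start at 1, and the `max(1, ...)` floor is an assumption that encodes "older tests run once as before" (the bare formula would give 0 for ranks above 100).

```python
import math


def repetitions_for_rank(rank: int) -> int:
    """Return how many times the test at 1-based `rank` runs (hypothetical helper)."""
    if rank < 1:
        raise ValueError("rank is 1-based")
    # Formula from the description: int(sqrt(100 / r)).
    # The floor of 1 is an assumption so that older tests still run once.
    return max(1, int(math.sqrt(100 / rank)))


if __name__ == "__main__":
    for rank in (1, 2, 4, 25, 26, 101):
        print(rank, repetitions_for_rank(rank))
```

Under these assumptions, ranks 1 through 25 account for 78 runs in total, and every rank from 26 onward runs once.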
Workflow [PR], commit [1649fed]. Summary: ✅
    os.environ["LLVM_PROFILE_FILE"] = f"ft-{batch_num}-%2m.profraw"

    if (
        not is_flaky_check
BugfixValidation forces GLOBAL_TAGS=no-random-settings (see below in this file), so random settings are effectively disabled for this job type. But this condition does not exclude BugfixValidation, so we still append --repeat-newly-modified-tests and then pay for git log scanning in clickhouse-test even though there are no randomized runs to diversify.
Could you also gate this by not is_bugfix_validation (or equivalently check for GLOBAL_TAGS / explicit random-settings enablement) to keep behavior consistent with the "only when randomized settings are enabled" intent?
`BugfixValidation` sets `GLOBAL_TAGS=no-random-settings` but does not add `--no-random-settings` to `runner_options`, so the existing check did not catch it. Gate explicitly on `not is_bugfix_validation`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
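The gating described in this commit might look roughly like the sketch below. The function name `should_repeat_newly_modified` is hypothetical, and `runner_options` is assumed to be a single string of command-line options; the real job script may structure this differently and applies further exclusions in later commits.

```python
def should_repeat_newly_modified(
    is_flaky_check: bool,
    is_bugfix_validation: bool,
    runner_options: str,
) -> bool:
    """Decide whether to pass --repeat-newly-modified-tests (hypothetical helper)."""
    return (
        not is_flaky_check
        # BugfixValidation disables random settings via GLOBAL_TAGS, which the
        # runner_options check below cannot see, so it is excluded explicitly.
        and not is_bugfix_validation
        and "--no-random-settings" not in runner_options
    )


if __name__ == "__main__":
    print(should_repeat_newly_modified(False, True, ""))
```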
- Add `-n 1000` to `git log` to avoid timeouts in treeless clones (CI uses `--filter=tree:0` when unshallowing)
- Increase timeout from 30s to 60s
- Add diagnostic output for failures and per-test repetition counts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CI runs tests as a different user than the repo owner, causing `git log` to fail with "detected dubious ownership in repository". Add `git config --global --add safe.directory` before the `git log` call. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the `git log`-based ranking with simple reverse name sorting. Since test names have numeric prefixes, sorting by name descending naturally puts the newest tests first, without needing git access. This avoids the "dubious ownership" errors that prevented the feature from working in CI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
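A minimal sketch of the name-based ranking described in this commit, assuming test names start with a zero-padded numeric prefix such as `04012_`. The function name `rank_by_name_descending` and the sample test names are hypothetical and are not taken from `clickhouse-test`.

```python
def rank_by_name_descending(test_names):
    """Map each test name to a 1-based rank, highest name first (hypothetical helper)."""
    # Zero-padded numeric prefixes make plain string order match numeric order,
    # so the highest-numbered (assumed newest) test gets rank 1.
    ordered = sorted(set(test_names), reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


if __name__ == "__main__":
    sample = ["02150_a.sql", "04012_c.sql", "03301_b.sh"]  # made-up names
    print(rank_by_name_descending(sample))
```

Because the ranking needs only the list of names, it works without repository access, which is the point of the change.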
Fail Rate by Test Number Range (last 30 days, amd_binary)
The 04xxx group fails at ~100x the rate of 02xxx/03xxx. Key observations
So yes, PR #100385's approach of adjusting repetition based on test number is well-supported by this data. Newer tests (higher
I'd suggest disabling retries also for
…lickHouse into repeat-newly-modified-tests
When `args.test` is set (local run case), skip the repeat-newly-modified-tests flag to avoid unnecessary repetitions of explicitly selected tests. #100385 (comment) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When `--repeat-newly-modified-tests` duplicates tests in the parallel list, multiple workers run the same test concurrently, all writing to the same `.stdout` file. The `file_suffix` (PID-based) was only set when `args.test_runs > 1`, but the new feature duplicates tests in the list without increasing `test_runs`, causing `FileNotFoundError` when one worker's cleanup removes the file while another calls `replace_in_file`. Fix by always using PID suffix in concurrent mode, and also excluding targeted checks from `--repeat-newly-modified-tests` (targeted checks already have their own rerun logic). #100385 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
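The fix described in this commit can be illustrated with a small sketch. The helper name `stdout_path` and the exact file naming are assumptions made for illustration; only the idea of a PID-based suffix in concurrent mode comes from the commit message.

```python
import os


def stdout_path(test_path: str, concurrent: bool) -> str:
    """Build the path of a test's .stdout file (hypothetical helper)."""
    # In concurrent mode, always add a per-process suffix so that two workers
    # running the same duplicated test never share an output file.
    suffix = f".{os.getpid()}" if concurrent else ""
    return f"{os.path.splitext(test_path)[0]}{suffix}.stdout"


if __name__ == "__main__":
    print(stdout_path("queries/04012_example.sql", concurrent=True))
```

With a per-process path, one worker's cleanup cannot remove a file that another worker is still reading or rewriting.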
When randomized settings are enabled in CI, the functional test runner now repeats recently modified tests multiple times with different random settings. Tests are ranked by git modification time (newest first), and each test at rank `r` gets `int(sqrt(100 / r))` repetitions.

This adds ~78 extra test runs per suite, helping detect new flaky tests earlier before they accumulate failures across many CI runs.

The feature is enabled via the `--repeat-newly-modified-tests` flag and is automatically activated in CI for all functional test jobs that use randomized settings (i.e., not flaky check, not coverage, not azure).

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Repeat recently modified tests with different randomized settings.
Documentation entry for user-facing changes