
Repeat recently modified tests with different randomized settings #100385

Merged
alexey-milovidov merged 9 commits into master from repeat-newly-modified-tests on Mar 24, 2026

Conversation

@alexey-milovidov (Member) commented Mar 22, 2026

When randomized settings are enabled in CI, the functional test runner now repeats recently modified tests multiple times with different random settings. Tests are ranked by git modification time (newest first), and each test at rank r gets `int(sqrt(100 / r))` repetitions:

  • Rank 1 (newest): 10 runs
  • Rank 2: 7 runs
  • Rank 3-4: 5 runs
  • Rank 5-6: 4 runs
  • Rank 7-11: 3 runs
  • Rank 12-25: 2 runs
  • Rank 26+: 1 run (no extra repeats)

This gives the 25 most recently modified tests ~78 runs in total per suite (roughly 53 runs beyond the usual single run each), helping detect new flaky tests earlier, before they accumulate failures across many CI runs.
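The repetition schedule above can be sketched in a few lines. This is a minimal sketch of the formula, not the actual code in clickhouse-test; `repetitions_for_rank` is a hypothetical name:

```python
import math

def repetitions_for_rank(rank: int) -> int:
    # int(sqrt(100 / r)): rank 1 -> 10 runs, rank 25 -> 2, rank 26+ -> 1.
    return int(math.sqrt(100 / rank))

# Total runs for the 25 most recently modified tests in one suite.
total_runs = sum(repetitions_for_rank(r) for r in range(1, 26))
print(total_runs)  # 78
```

Only the top 25 ranks get more than one run, since sqrt(100 / 26) is already below 2.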

The feature is enabled via the `--repeat-newly-modified-tests` flag and is automatically activated in CI for all functional test jobs that use randomized settings (i.e., not the flaky check, not coverage, not Azure).

Changelog category (leave one):

  • Build/Testing/Packaging Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Repeat recently modified tests with different randomized settings.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

When randomized settings are enabled, the functional test runner can now
repeat recently modified tests multiple times with different random settings.
Tests are ranked by git modification time (newest first), and each test at
rank r gets int(sqrt(100 / r)) repetitions: the newest test runs 10 times,
the 25th most recent runs twice, older tests run once as before.

This helps detect flaky tests earlier, before they accumulate failures in CI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
clickhouse-gh Bot (Contributor) commented Mar 22, 2026

Workflow [PR], commit [1649fed]

AI Review

Summary

This PR adds --repeat-newly-modified-tests to functional CI runs with randomized settings and updates clickhouse-test to increase repetitions for higher-ranked tests. On current PR HEAD, I did not find new high-confidence correctness, safety, performance, or ClickHouse-rule violations beyond already-existing review threads.

Missing context

  • ⚠️ No CI reports/log links were provided in the review request, so this review is based on code and PR metadata only.

ClickHouse Rules

Items reviewed:

  • Deletion logging
  • Serialization versioning
  • Core-area scrutiny
  • No test removal
  • Experimental gate
  • No magic constants
  • Backward compatibility
  • SettingsChangesHistory.cpp
  • PR metadata quality
  • Safe rollout
  • Compilation time

Final Verdict

  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-build Pull request with build/testing/packaging improvement label Mar 22, 2026
Comment thread (Contributor), code context:

    os.environ["LLVM_PROFILE_FILE"] = f"ft-{batch_num}-%2m.profraw"

    if (
        not is_flaky_check

BugfixValidation forces GLOBAL_TAGS=no-random-settings (see below in this file), so random settings are effectively disabled for this job type. But this condition does not exclude BugfixValidation, so we still append --repeat-newly-modified-tests and then pay for git log scanning in clickhouse-test even though there are no randomized runs to diversify.

Could you also gate this by not is_bugfix_validation (or equivalently check for GLOBAL_TAGS / explicit random-settings enablement) to keep behavior consistent with the "only when randomized settings are enabled" intent?

`BugfixValidation` sets `GLOBAL_TAGS=no-random-settings` but does not
add `--no-random-settings` to `runner_options`, so the existing check
did not catch it. Gate explicitly on `not is_bugfix_validation`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
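The gate described in this thread can be sketched as follows. All parameter names here are hypothetical, mirroring the job types mentioned in the PR rather than the actual variables in the CI code:

```python
def should_repeat_newly_modified_tests(is_flaky_check: bool,
                                       is_coverage: bool,
                                       is_azure: bool,
                                       is_bugfix_validation: bool) -> bool:
    # Append --repeat-newly-modified-tests only when randomized settings
    # are actually in effect. BugfixValidation disables them through
    # GLOBAL_TAGS=no-random-settings rather than a runner option, so it
    # must be excluded explicitly.
    return not (is_flaky_check or is_coverage or is_azure
                or is_bugfix_validation)
```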
@alexey-milovidov alexey-milovidov requested a review from maxknv March 22, 2026 22:32
Comment thread tests/clickhouse-test Outdated
- Add `-n 1000` to `git log` to avoid timeouts in treeless clones
  (CI uses `--filter=tree:0` when unshallowing)
- Increase timeout from 30s to 60s
- Add diagnostic output for failures and per-test repetition counts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread tests/clickhouse-test
CI runs tests as a different user than the repo owner, causing
`git log` to fail with "detected dubious ownership in repository".
Add `git config --global --add safe.directory` before the `git log` call.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread tests/clickhouse-test Outdated
Replace the `git log`-based ranking with simple reverse name sorting.
Since test names have numeric prefixes, sorting by name descending
naturally puts the newest tests first, without needing git access.

This avoids the "dubious ownership" errors that prevented the feature
from working in CI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
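The name-based ranking described in this commit can be sketched like this. The helper name and sample test names are hypothetical; the point is only that a reverse lexicographic sort of numerically prefixed names approximates newest-first ordering without touching git:

```python
def rank_tests_newest_first(test_names: list[str]) -> list[str]:
    # Functional test files carry increasing numeric prefixes
    # (00001_..., 03960_..., 04039_...), so sorting by name in
    # descending order puts the newest tests first without git access.
    return sorted(test_names, reverse=True)

tests = ["00001_select_1", "03960_some_fix", "04039_variant_element"]
print(rank_tests_newest_first(tests)[0])  # 04039_variant_element
```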
Comment thread tests/clickhouse-test
@maxknv (Member)
maxknv commented Mar 23, 2026

Fail Rate by Test Number Range (last 30 days, amd_binary)

Range   Fails   Total   Fail Rate
00xxx   869     4.8M    0.018%
01xxx   2,377   7.0M    0.034%
02xxx   728     10.2M   0.007%
03xxx   2,673   10.9M   0.025%
04xxx   3,843   214K    1.794%

The 04xxx group fails at roughly 70–250× the rate of 02xxx/03xxx.

Key observations

  1. The pattern is stark: 04xxx tests are ~53–100× more flaky than older test ranges.
  2. Many 04xxx tests fail 100% of the time — e.g., 04007_wasm_mixed_types (100 runs, all FAIL),
    04036_refresh_every_depends_on_double_fire (99 runs, all FAIL), 04039_variant_element_shared_discriminators (50 runs, all FAIL).
    These aren't "flaky" — they're always failing, likely tests for features not yet enabled in the standard binary.
  3. The lower total run count for 04xxx (214K vs 10M+) reflects that these tests are newer and have fewer accumulated runs.
  4. Within 03xxx, the higher-numbered tests like 03835, 03914, 03927, 03960 also dominate the flaky list.

So yes — PR #100385's approach of adjusting repetition based on test number is well-supported by this data. Newer tests (higher
numbers) are dramatically more flaky, so giving them more retries/repetitions makes sense to reduce false CI failures.
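The ratios above can be reproduced directly from the table (counts copied from the comment, with the M/K totals expanded):

```python
counts = {  # range: (fails, total runs), from the table above
    "00xxx": (869, 4_800_000),
    "01xxx": (2_377, 7_000_000),
    "02xxx": (728, 10_200_000),
    "03xxx": (2_673, 10_900_000),
    "04xxx": (3_843, 214_000),
}
rates = {k: fails / total for k, (fails, total) in counts.items()}
for k in ("00xxx", "01xxx", "02xxx", "03xxx"):
    print(k, round(rates["04xxx"] / rates[k]))
# 04xxx vs 00xxx ~99x, vs 01xxx ~53x, vs 02xxx ~252x, vs 03xxx ~73x
```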

@maxknv (Member)

maxknv commented Mar 23, 2026

I'd suggest disabling retries also for args.test is not None (local run case)

alexey-milovidov and others added 4 commits March 24, 2026 01:00
When `args.test` is set (local run case), skip the repeat-newly-modified-tests
flag to avoid unnecessary repetitions of explicitly selected tests.

#100385 (comment)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When `--repeat-newly-modified-tests` duplicates tests in the parallel
list, multiple workers run the same test concurrently, all writing to
the same `.stdout` file. The `file_suffix` (PID-based) was only set
when `args.test_runs > 1`, but the new feature duplicates tests in
the list without increasing `test_runs`, causing `FileNotFoundError`
when one worker's cleanup removes the file while another calls
`replace_in_file`.

Fix by always using PID suffix in concurrent mode, and also excluding
targeted checks from `--repeat-newly-modified-tests` (targeted checks
already have their own rerun logic).

#100385

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
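The PID-suffix fix described in this commit can be sketched as below. This is a hypothetical helper illustrating the idea; the real clickhouse-test derives its output paths differently:

```python
import os

def stdout_path(test_name: str, out_dir: str, concurrent: bool) -> str:
    # In concurrent mode, always suffix per-test output files with the
    # worker PID so duplicated runs of the same test never collide on
    # the same .stdout file (previously the suffix was only applied
    # when test_runs > 1).
    suffix = f".{os.getpid()}" if concurrent else ""
    return os.path.join(out_dir, f"{test_name}{suffix}.stdout")
```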
@alexey-milovidov alexey-milovidov merged commit 2c40d32 into master Mar 24, 2026
151 of 152 checks passed
@alexey-milovidov alexey-milovidov deleted the repeat-newly-modified-tests branch March 24, 2026 15:22
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 24, 2026