Fix flaky test_insert_select_from_cluster_with_partition_pruning #105525
The cluster INSERT path (distributedWriteIntoReplicatedMergeTreeOrDataLakeFromClusterStorage) sends the SELECT to a remote replica, which writes the part there and replicates it via ZooKeeper. The INSERT pipeline returns to the client as soon as the remote query completes, before the local replica on the coordinator has fetched the part. If the test runs SELECT immediately after INSERT (timing under ~10ms on slow CI shards), the part is still PreActive locally and the SELECT returns 0 rows.

The first INSERT in the test happened to pass because the SYSTEM FLUSH LOGS ON CLUSTER between INSERT and SELECT acted as an accidental barrier. The second INSERT lacked any synchronization, so it fails about 30% of the time on slower configs (arm_binary, ASan, MSan, db disk, llvm_coverage).

Add an explicit SYSTEM SYNC REPLICA after each INSERT to ensure the local replica has fetched any new parts before reading.

Related: ClickHouse#100752

Signed-off-by: Groene AI <270696204+groeneai@users.noreply.github.com>
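The synchronization pattern described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the actual test code: `RecordingNode`, `insert_and_sync`, and the table names `dst` and `src` are hypothetical, and the stub only records the SQL it is given instead of talking to a real cluster. It assumes the test node exposes a `query(sql)` method.

```python
class RecordingNode:
    """Hypothetical stand-in for a cluster node: records queries instead of running them."""

    def __init__(self):
        self.queries = []

    def query(self, sql):
        # A real node would execute the statement; here we only record it.
        self.queries.append(sql)
        return ""


def insert_and_sync(node, table, insert_sql):
    """Run an INSERT, then wait until the local replica has fetched new parts.

    Hypothetical helper: the explicit SYSTEM SYNC REPLICA is the barrier that
    the INSERT itself does not provide on the cluster write path.
    """
    node.query(insert_sql)
    node.query(f"SYSTEM SYNC REPLICA {table}")


node = RecordingNode()
# Table names are placeholders for illustration only.
insert_and_sync(node, "dst", "INSERT INTO dst SELECT * FROM src")
# Only after the sync is it safe to read from the local replica.
node.query("SELECT count() FROM dst")
```

The point of the ordering is that the SELECT is issued only after the sync statement returns, so the read no longer depends on how quickly the local replica happens to fetch the part.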
Pre-PR validation gate

a) Deterministic repro? Partial. The race is timing-dependent (interval between INSERT pipeline return and the start of
b) Root cause explained? Yes. The cluster INSERT path in
c) Fix matches root cause? Yes.
d) Test intent preserved? Yes. The fix adds synchronization without weakening any assertions or pinning settings:
e) Demonstrated in both directions? Partial. Without the fix: CIDB shows 30+ failures with
f) Fix is general? Yes. Both INSERTs in the test exhibit the same race; both are fixed. The "happens to work" first INSERT (with

Note: the same race is observable for user-written

Session: cron:clickhouse-ci-task-worker:20260521-124500
cc @scanhex12: could you review this? You wrote this test in #101299 / #101634; the second INSERT block is missing a
Workflow [PR], commit [e4d6870] Summary: ❌
AI Review
Summary: This PR makes
Final Verdict
Status: ✅ Approve
LLVM Coverage Report
Changed lines: No C/C++ source files changed; skipping uncovered code analysis. Newly covered by added/modified tests: 458 line(s), 14 function(s) across 133 file(s).
The cluster INSERT path (distributedWriteIntoReplicatedMergeTreeOrDataLakeFromClusterStorage, added in #101299) sends the SELECT to a remote replica, which writes the part there and replicates it via ZooKeeper. The INSERT pipeline returns as soon as the remote query completes, before the local replica on the coordinator has fetched the part. If SELECT runs immediately after, it sees 0 rows.

The first INSERT in the test happened to pass because SYSTEM FLUSH LOGS ON CLUSTER between INSERT and SELECT acted as an accidental barrier. The second INSERT had no synchronization and failed across many unrelated PRs (#100185, #101446, #101757, #102115, #103525, #105249, etc.) on slower configs (arm_binary, ASan, MSan, db disk, llvm_coverage), plus one master hit on 2026-05-12. Reported by @alexey-milovidov on #100752.

Fix: add SYSTEM SYNC REPLICA after each INSERT so the local replica has fetched any new parts before reading.

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix flaky test_insert_select_from_cluster_with_partition_pruning.

Documentation entry for user-facing changes