Fix wrong column comparison in FinishAggregatingInOrderAlgorithm #102299
Avogar merged 2 commits into ClickHouse:master
Conversation
Pre-PR Validation Gate (session: cron:clickhouse-ci-task-worker:20260409-214500)

a) Deterministic repro? ✅ Yes.

CREATE TABLE t (a String, b UInt32, c UInt32) ENGINE = MergeTree ORDER BY b;
-- Insert into 4+ parts, then:
SELECT count() FROM (SELECT a, b, count() FROM t GROUP BY a, b)
SETTINGS optimize_aggregation_in_order = 1, max_threads = 8;
-- Returns 28091 instead of 20000 (EVERY TIME, not intermittent)

b) Root cause explained? ✅
c) Fix matches root cause? ✅
d) New test added? ✅
e) Both directions demonstrated? ✅
f) Fix is general? ✅ The fix corrects the `sorting_columns` indexing itself, so it applies to any affected query shape, not just the repro.
cc @KochetovNicolai @nickitat — could you review this? It fixes a correctness bug in `FinishAggregatingInOrderAlgorithm`.
Workflow [PR], commit [db810f8]

AI Review Summary: ✅ This PR fixes a real correctness issue in ClickHouse.
Bugfix validation fix (commit 48d9b5d): The first CI run failed bugfix validation because the regression test didn't trigger the bug on the master binary. Two issues: (1) with `ORDER BY b` alone, the sort description had only one column, so the index mismatch could not occur; (2) background merges could collapse the parts into one before the SELECT, eliminating the multi-stream merge step.

Verified against the master release binary (same one bugfix validation downloads): the master binary reproduces the bug, the fixed binary returns the correct 20000 rows.
The flaky check failure is fixed in #102148; let's update the branch.
alexey-milovidov left a comment:

Looks good. Clean and clear.
@groeneai update with master branch, flaky checks should be fixed
In `FinishAggregatingInOrderAlgorithm::State::State()`, `sorting_columns` was built by iterating through the sort description and using `emplace_back`, creating a compact array indexed `0..desc.size()-1`. However, the `less()` comparison function uses `elem.column_number` (the column's position in the header) to index into `sorting_columns`. When the sort description column order differs from the header column order — which happens whenever GROUP BY lists a non-sort-key column before a sort-key column (e.g., `GROUP BY a, b` on a table `ORDER BY b`) — the wrong columns were compared. This corrupted the binary search in `upper_bound()`, producing incorrect merge group boundaries and causing duplicate rows in the aggregation result.

The fix changes `sorting_columns` to be indexed by header position (`resize` to `all_columns.size()`, assign at `column_number`) instead of by description iteration order (`emplace_back`). This ensures `less()` accesses the correct column for each sort description entry.

Reproduction: `GROUP BY a, b` on a table `ORDER BY b` with `optimize_aggregation_in_order=1` and multiple parallel streams. Before fix: 28091-37081 rows instead of 20000 (40-85% inflation). After fix: consistently 20000.
The bug requires a multi-column sort description where `column_number` values differ from iteration order. With `ORDER BY b` only, the sort description has just 1 column — no mismatch possible. Changed to `ORDER BY (b, a)` so the optimizer uses a 2-column sort description `[b (pos 1), a (pos 0)]`, triggering the swapped comparison bug.

Also add `SYSTEM STOP MERGES` to prevent background merges from reducing 4 parts to 1 before the SELECT, which would eliminate the multi-stream merge step needed to trigger the bug. This is critical for release builds where merges happen much faster.

Verified: master binary produces 50914 rows (bug triggered), fixed binary produces 20000 rows (10/10 passes).
d5cc13a to db810f8 (force-push)
@Avogar Done — rebased onto current master and updated the description.
LLVM Coverage Report
Changed lines: 80.00% (4/5)
Hi — this PR may need backporting to affected release branches. This is a P0 wrong-results bug (duplicate rows in aggregation). If it should be backported, consider adding the corresponding backport labels.
Fixes #102909
Changelog category:
Changelog entry:
Fix incorrect aggregation results (duplicate rows) when using `optimize_aggregation_in_order=1` with GROUP BY columns ordered differently from the table's sorting key.

Description
`FinishAggregatingInOrderAlgorithm` merges partially aggregated data from parallel in-order aggregation streams. It uses a sort description to compare rows across streams and determine merge group boundaries via `std::upper_bound`.

The bug: In `State::State()`, `sorting_columns` was built by iterating through the sort description and using `emplace_back`, creating a compact array indexed `0..desc.size()-1`. However, the `less()` comparison function uses `elem.column_number` (the column's position in the output header) to index into `sorting_columns`. When these two orderings differ, the wrong columns are compared.

This happens whenever the GROUP BY lists a non-sort-key column before a sort-key column. For example, `GROUP BY a, b` on a table with `ORDER BY b`:

- Header: `a` at position 0, `b` at position 1
- Sort description: `[b, a]` (sort-key column first)
- `sorting_columns`: `[b_col, a_col]` (indexed 0, 1)
- `less()` for `b` uses `column_number=1` → accesses `sorting_columns[1] = a_col` ← WRONG
- `less()` for `a` uses `column_number=0` → accesses `sorting_columns[0] = b_col` ← WRONG

The swapped comparison corrupts the `upper_bound` binary search (the data is sorted by `(b, a)` but compared by `(a, b)`), producing incorrect merge group boundaries. Rows with the same key end up in different merge groups, causing duplicate rows in the final result.

The fix: Change `sorting_columns` to be indexed by header position (`resize` to `all_columns.size()`, assign at `column_number`) instead of by description iteration order (`emplace_back`). This ensures `less()` accesses the correct column for each sort description entry.

Reproduction: `GROUP BY a, b` on a table `ORDER BY b` with `optimize_aggregation_in_order=1` and multiple parallel streams.

This also fixes the flaky test `00069_duplicate_aggregation_keys`, which uses `GROUP BY URL, EventDate, EventDate` on the `test.hits` table (sorted by `CounterID, EventDate, ...`).