Fix wrong column comparison in FinishAggregatingInOrderAlgorithm by groeneai · Pull Request #102299 · ClickHouse/ClickHouse

groeneai · 2026-04-09T22:09:22Z

Changelog category:

Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry:

Fix incorrect aggregation results (duplicate rows) when using optimize_aggregation_in_order=1 with GROUP BY columns ordered differently from the table's sorting key.

Description

FinishAggregatingInOrderAlgorithm merges partially aggregated data from parallel in-order aggregation streams. It uses a sort description to compare rows across streams and determine merge group boundaries via std::upper_bound.

The bug: In State::State(), sorting_columns was built by iterating through the sort description and using emplace_back, creating a compact array indexed 0..desc.size()-1. However, the less() comparison function uses elem.column_number (the column's position in the output header) to index into sorting_columns. When these two orderings differ, the wrong columns are compared.

This happens whenever the GROUP BY lists a non-sort-key column before a sort-key column. For example, GROUP BY a, b on a table with ORDER BY b:

Output header: a at position 0, b at position 1
Sort description: [b, a] (sort-key column first)
Old sorting_columns: [b_col, a_col] (indexed 0, 1)
less() for b uses column_number=1 → accesses sorting_columns[1] = a_col ← WRONG
less() for a uses column_number=0 → accesses sorting_columns[0] = b_col ← WRONG

The swapped comparison corrupts the upper_bound binary search (the data is sorted by (b, a) but compared by (a, b)), producing incorrect merge group boundaries. Rows with the same key end up in different merge groups, causing duplicate rows in the final result.

The fix: Change sorting_columns to be indexed by header position (resize to all_columns.size(), assign at column_number) instead of by description iteration order (emplace_back). This ensures less() accesses the correct column for each sort description entry.

Reproduction: GROUP BY a, b on a table ORDER BY b with optimize_aggregation_in_order=1 and multiple parallel streams:

Before fix: 28091–37081 rows instead of 20000 (40–85% inflation from duplicates)
After fix: consistently 20000

This also fixes the flaky test 00069_duplicate_aggregation_keys which uses GROUP BY URL, EventDate, EventDate on the test.hits table (sorted by CounterID, EventDate, ...).

groeneai · 2026-04-09T22:09:42Z

Pre-PR Validation Gate (session: cron:clickhouse-ci-task-worker:20260409-214500)

a) Deterministic repro? ✅ Yes.

CREATE TABLE t (a String, b UInt32, c UInt32) ENGINE = MergeTree ORDER BY b;
-- Insert into 4+ parts, then:
SELECT count() FROM (SELECT a, b, count() FROM t GROUP BY a, b)
SETTINGS optimize_aggregation_in_order = 1, max_threads = 8;
-- Returns 28091 instead of 20000 (EVERY TIME, not intermittent)

b) Root cause explained? ✅ State::State() builds sorting_columns via emplace_back (indexed 0..N-1), but less() uses elem.column_number (header position) to index into it. When sort description order ≠ header column order, the wrong columns are compared. Specifically: for GROUP BY a, b on ORDER BY b, the sort description is [b, a], header positions are a=0, b=1. sorting_columns[0]=b_col, [1]=a_col, but less() accesses [1] for b → gets a_col. The swapped comparison corrupts upper_bound on (b,a)-sorted data, splitting matching keys across merge groups.

c) Fix matches root cause? ✅ Changes sorting_columns from emplace_back (description-order indexed) to resize + assign at column_number (header-position indexed). This makes less() access the correct column.

d) New test added? ✅ 04095_aggregation_in_order_merge_wrong_column_order — creates a table ORDER BY b, groups by (a, b), verifies exactly 20000 groups with small block sizes that force multiple merge boundaries. Also tests the duplicate GROUP BY key variant.

e) Both directions demonstrated? ✅

Without fix: 28091–37081 rows (deterministic, every run)
With fix: 20000 rows (50/50 stress test passes, 0 mismatches vs baseline, 0 duplicates)

f) Fix is general? ✅ The fix corrects the sorting_columns indexing for ALL queries processed by FinishAggregatingInOrderAlgorithm, not just the specific test case. Any query where GROUP BY column order differs from sort key prefix order was affected. The fix is a 2-line change in the State constructor that ensures consistent indexing.

groeneai · 2026-04-09T22:10:09Z

cc @KochetovNicolai @nickitat — could you review this? It fixes a correctness bug in FinishAggregatingInOrderAlgorithm where sorting_columns was indexed by sort-description iteration order but less() indexed by header position, causing swapped column comparisons and duplicate rows when GROUP BY column order differs from the table's sorting key order.

clickhouse-gh · 2026-04-09T22:12:24Z

Workflow [PR], commit [db810f8]

Summary: ✅

AI Review

Summary

This PR fixes a real correctness issue in FinishAggregatingInOrderAlgorithm: sort columns were previously stored by description iteration order while less accessed them by header position, which could swap comparisons and split equal keys into different merge groups. The fix reindexes sorting_columns by header position, and the added regression test covers both GROUP BY a, b and duplicate-key variant behavior under multi-stream in-order aggregation. I did not find additional blocker/major issues.

ClickHouse Rules

Item	Status	Notes
Deletion logging	➖
Serialization versioning	➖
Core-area scrutiny	✅
No test removal	✅
Experimental gate	➖
No magic constants	✅
Backward compatibility	✅
`SettingsChangesHistory.cpp`	➖
PR metadata quality	✅
Safe rollout	✅
Compilation time	✅
No large/binary files	✅

Final Verdict

Status: ✅ Approve

groeneai · 2026-04-09T22:31:13Z

Bugfix validation fix (commit 48d9b5d):

The first CI run failed bugfix validation because the regression test didn't trigger the bug on the master binary. Two issues:

Single-column sort description: ORDER BY b produces a 1-column sort description for in-order aggregation. The bug requires a multi-column sort description where column_number values differ from iteration order. Changed to ORDER BY (b, a) — now the optimizer uses a 2-column sort description [b (pos 1), a (pos 0)], and the old emplace_back code puts them at sorting_columns[0]=b, [1]=a while less() accesses [1] expecting b but gets a → swapped comparisons.
Parts merged before query: Release builds merge 4 small parts into 1 before the SELECT runs, eliminating the multi-stream merge step. Added SYSTEM STOP MERGES to keep all 4 parts alive.

Verified against the master release binary (same one bugfix validation downloads):

Master (buggy): 50,914 rows instead of 20,000 (2.5× duplicates)
Fixed: 20,000 rows (10/10 passes)

Session: cron:clickhouse-ci-task-worker:20260409-221500

alexey-milovidov · 2026-04-10T02:27:30Z

The flaky check failure is fixed in #102148, let's update the branch.

alexey-milovidov

Looks good. Clean and clear.

Avogar · 2026-04-16T19:36:30Z

@groeneai update with master branch, flaky checks should be fixed

Avogar · 2026-04-16T19:37:35Z

@groeneai also update the description to link #102909

In FinishAggregatingInOrderAlgorithm::State::State(), sorting_columns was built by iterating through the sort description and using emplace_back, creating a compact array indexed 0..desc.size()-1. However, the less() comparison function uses elem.column_number (the column's position in the header) to index into sorting_columns. When the sort description column order differs from the header column order — which happens whenever GROUP BY lists a non-sort-key column before a sort-key column (e.g., GROUP BY a, b on a table ORDER BY b) — the wrong columns were compared. This corrupted the binary search in upper_bound(), producing incorrect merge group boundaries and causing duplicate rows in the aggregation result. The fix changes sorting_columns to be indexed by header position (resize to all_columns.size(), assign at column_number) instead of by description iteration order (emplace_back). This ensures less() accesses the correct column for each sort description entry. Reproduction: GROUP BY a, b on a table ORDER BY b with optimize_aggregation_in_order=1 and multiple parallel streams. Before fix: 28091-37081 rows instead of 20000 (40-85% inflation). After fix: consistently 20000.

The bug requires a multi-column sort description where column_number values differ from iteration order. With ORDER BY b only, the sort description has just 1 column — no mismatch possible. Changed to ORDER BY (b, a) so the optimizer uses a 2-column sort description [b (pos 1), a (pos 0)], triggering the swapped comparison bug. Also add SYSTEM STOP MERGES to prevent background merges from reducing 4 parts to 1 before the SELECT, which would eliminate the multi-stream merge step needed to trigger the bug. This is critical for release builds where merges happen much faster. Verified: master binary produces 50914 rows (bug triggered), fixed binary produces 20000 rows (10/10 passes).

groeneai · 2026-04-16T20:47:11Z

@Avogar Done — rebased onto current master and updated the description with Fixes #102909.

clickhouse-gh · 2026-04-16T23:39:14Z

LLVM Coverage Report

Metric	Baseline	Current	Δ
Lines	84.00%	84.00%	+0.00%
Functions	90.90%	90.90%	+0.00%
Branches	76.50%	76.50%	+0.00%

Changed lines: 80.00% (4/5) · Uncovered code

Full report · Diff report

clickgapai · 2026-04-17T12:11:49Z

Hi — this PR may need backporting to 26.3 (LTS), 26.2, 26.1, 25.8 (LTS), but no backport label was found.

Affected code: FinishAggregatingInOrderAlgorithm::State::State constructor — column indexing in sorting_columns — core code present in all supported branches.

Why: This is a P0 wrong-results bug (duplicate rows in aggregation). git log shows FinishAggregatingInOrderAlgorithm was introduced in commit 573edbc (2021-01-22), predating all supported branches. The bug triggers with optimize_aggregation_in_order = 1 (default OFF but commonly enabled) when GROUP BY column order differs from the table's ORDER BY, with multiple input streams.

If this should be backported, consider adding pr-must-backport or a version-specific label (e.g. v26.3-must-backport). Ignore this if backporting is not applicable.

alexey-milovidov added the can be tested Allows running workflows for external contributors label Apr 9, 2026

clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Apr 9, 2026

alexey-milovidov approved these changes Apr 10, 2026

View reviewed changes

alexey-milovidov self-assigned this Apr 10, 2026

Avogar mentioned this pull request Apr 16, 2026

Fix CASE with Dynamic expression returning ELSE for all rows #102684

Merged

1 task

groeneai added 2 commits April 16, 2026 20:46

groeneai force-pushed the fix/inorder-agg-column-order-mismatch branch from d5cc13a to db810f8 Compare April 16, 2026 20:46

Avogar added this pull request to the merge queue Apr 17, 2026

Merged via the queue into ClickHouse:master with commit b5f4a93 Apr 17, 2026
161 checks passed

robot-ch-test-poll3 added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 17, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix wrong column comparison in FinishAggregatingInOrderAlgorithm#102299

Fix wrong column comparison in FinishAggregatingInOrderAlgorithm#102299
Avogar merged 2 commits intoClickHouse:masterfrom
groeneai:fix/inorder-agg-column-order-mismatch

groeneai commented Apr 9, 2026 •

edited

Loading

Uh oh!

groeneai commented Apr 9, 2026

Uh oh!

groeneai commented Apr 9, 2026

Uh oh!

clickhouse-gh Bot commented Apr 9, 2026 •

edited

Loading

Uh oh!

groeneai commented Apr 9, 2026

Uh oh!

alexey-milovidov commented Apr 10, 2026

Uh oh!

alexey-milovidov left a comment

Uh oh!

Avogar commented Apr 16, 2026

Uh oh!

Avogar commented Apr 16, 2026 •

edited

Loading

Uh oh!

groeneai commented Apr 16, 2026

Uh oh!

clickhouse-gh Bot commented Apr 16, 2026

Uh oh!

Uh oh!

clickgapai commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

groeneai commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changelog category:

Changelog entry:

Description

Uh oh!

groeneai commented Apr 9, 2026

Pre-PR Validation Gate (session: cron:clickhouse-ci-task-worker:20260409-214500)

Uh oh!

groeneai commented Apr 9, 2026

Uh oh!

clickhouse-gh Bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Review

Summary

ClickHouse Rules

Final Verdict

Uh oh!

groeneai commented Apr 9, 2026

Uh oh!

alexey-milovidov commented Apr 10, 2026

Uh oh!

alexey-milovidov left a comment

Choose a reason for hiding this comment

Uh oh!

Avogar commented Apr 16, 2026

Uh oh!

Avogar commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

groeneai commented Apr 16, 2026

Uh oh!

clickhouse-gh Bot commented Apr 16, 2026

LLVM Coverage Report

Uh oh!

Uh oh!

clickgapai commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

groeneai commented Apr 9, 2026 •

edited

Loading

clickhouse-gh Bot commented Apr 9, 2026 •

edited

Loading

Avogar commented Apr 16, 2026 •

edited

Loading