Skip to content

Fix wrong column comparison in FinishAggregatingInOrderAlgorithm#102299

Merged
Avogar merged 2 commits intoClickHouse:masterfrom
groeneai:fix/inorder-agg-column-order-mismatch
Apr 17, 2026
Merged

Fix wrong column comparison in FinishAggregatingInOrderAlgorithm#102299
Avogar merged 2 commits intoClickHouse:masterfrom
groeneai:fix/inorder-agg-column-order-mismatch

Conversation

@groeneai
Copy link
Copy Markdown
Contributor

@groeneai groeneai commented Apr 9, 2026

Fixes #102909

Changelog category:

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry:

Fix incorrect aggregation results (duplicate rows) when using optimize_aggregation_in_order=1 with GROUP BY columns ordered differently from the table's sorting key.

Description

FinishAggregatingInOrderAlgorithm merges partially aggregated data from parallel in-order aggregation streams. It uses a sort description to compare rows across streams and determine merge group boundaries via std::upper_bound.

The bug: In State::State(), sorting_columns was built by iterating through the sort description and using emplace_back, creating a compact array indexed 0..desc.size()-1. However, the less() comparison function uses elem.column_number (the column's position in the output header) to index into sorting_columns. When these two orderings differ, the wrong columns are compared.

This happens whenever the GROUP BY lists a non-sort-key column before a sort-key column. For example, GROUP BY a, b on a table with ORDER BY b:

  • Output header: a at position 0, b at position 1
  • Sort description: [b, a] (sort-key column first)
  • Old sorting_columns: [b_col, a_col] (indexed 0, 1)
  • less() for b uses column_number=1 → accesses sorting_columns[1] = a_colWRONG
  • less() for a uses column_number=0 → accesses sorting_columns[0] = b_colWRONG

The swapped comparison corrupts the upper_bound binary search (the data is sorted by (b, a) but compared by (a, b)), producing incorrect merge group boundaries. Rows with the same key end up in different merge groups, causing duplicate rows in the final result.

The fix: Change sorting_columns to be indexed by header position (resize to all_columns.size(), assign at column_number) instead of by description iteration order (emplace_back). This ensures less() accesses the correct column for each sort description entry.

Reproduction: GROUP BY a, b on a table ORDER BY b with optimize_aggregation_in_order=1 and multiple parallel streams:

  • Before fix: 28091–37081 rows instead of 20000 (40–85% inflation from duplicates)
  • After fix: consistently 20000

This also fixes the flaky test 00069_duplicate_aggregation_keys which uses GROUP BY URL, EventDate, EventDate on the test.hits table (sorted by CounterID, EventDate, ...).

@groeneai
Copy link
Copy Markdown
Contributor Author

groeneai commented Apr 9, 2026

Pre-PR Validation Gate (session: cron:clickhouse-ci-task-worker:20260409-214500)

a) Deterministic repro? ✅ Yes.

CREATE TABLE t (a String, b UInt32, c UInt32) ENGINE = MergeTree ORDER BY b;
-- Insert into 4+ parts, then:
SELECT count() FROM (SELECT a, b, count() FROM t GROUP BY a, b)
SETTINGS optimize_aggregation_in_order = 1, max_threads = 8;
-- Returns 28091 instead of 20000 (EVERY TIME, not intermittent)

b) Root cause explained?State::State() builds sorting_columns via emplace_back (indexed 0..N-1), but less() uses elem.column_number (header position) to index into it. When sort description order ≠ header column order, the wrong columns are compared. Specifically: for GROUP BY a, b on ORDER BY b, the sort description is [b, a], header positions are a=0, b=1. sorting_columns[0]=b_col, [1]=a_col, but less() accesses [1] for b → gets a_col. The swapped comparison corrupts upper_bound on (b,a)-sorted data, splitting matching keys across merge groups.

c) Fix matches root cause? ✅ Changes sorting_columns from emplace_back (description-order indexed) to resize + assign at column_number (header-position indexed). This makes less() access the correct column.

d) New test added?04095_aggregation_in_order_merge_wrong_column_order — creates a table ORDER BY b, groups by (a, b), verifies exactly 20000 groups with small block sizes that force multiple merge boundaries. Also tests the duplicate GROUP BY key variant.

e) Both directions demonstrated?

  • Without fix: 28091–37081 rows (deterministic, every run)
  • With fix: 20000 rows (50/50 stress test passes, 0 mismatches vs baseline, 0 duplicates)

f) Fix is general? ✅ The fix corrects the sorting_columns indexing for ALL queries processed by FinishAggregatingInOrderAlgorithm, not just the specific test case. Any query where GROUP BY column order differs from sort key prefix order was affected. The fix is a 2-line change in the State constructor that ensures consistent indexing.

@groeneai
Copy link
Copy Markdown
Contributor Author

groeneai commented Apr 9, 2026

cc @KochetovNicolai @nickitat — could you review this? It fixes a correctness bug in FinishAggregatingInOrderAlgorithm where sorting_columns was indexed by sort-description iteration order but less() indexed by header position, causing swapped column comparisons and duplicate rows when GROUP BY column order differs from the table's sorting key order.

@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label Apr 9, 2026
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Apr 9, 2026

Workflow [PR], commit [db810f8]

Summary:


AI Review

Summary

This PR fixes a real correctness issue in FinishAggregatingInOrderAlgorithm: sort columns were previously stored by description iteration order while less accessed them by header position, which could swap comparisons and split equal keys into different merge groups. The fix reindexes sorting_columns by header position, and the added regression test covers both GROUP BY a, b and duplicate-key variant behavior under multi-stream in-order aggregation. I did not find additional blocker/major issues.

ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
No large/binary files
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Apr 9, 2026
@groeneai
Copy link
Copy Markdown
Contributor Author

groeneai commented Apr 9, 2026

Bugfix validation fix (commit 48d9b5d):

The first CI run failed bugfix validation because the regression test didn't trigger the bug on the master binary. Two issues:

  1. Single-column sort description: ORDER BY b produces a 1-column sort description for in-order aggregation. The bug requires a multi-column sort description where column_number values differ from iteration order. Changed to ORDER BY (b, a) — now the optimizer uses a 2-column sort description [b (pos 1), a (pos 0)], and the old emplace_back code puts them at sorting_columns[0]=b, [1]=a while less() accesses [1] expecting b but gets a → swapped comparisons.

  2. Parts merged before query: Release builds merge 4 small parts into 1 before the SELECT runs, eliminating the multi-stream merge step. Added SYSTEM STOP MERGES to keep all 4 parts alive.

Verified against the master release binary (same one bugfix validation downloads):

  • Master (buggy): 50,914 rows instead of 20,000 (2.5× duplicates)
  • Fixed: 20,000 rows (10/10 passes)

Session: cron:clickhouse-ci-task-worker:20260409-221500

@alexey-milovidov
Copy link
Copy Markdown
Member

The flaky check failure is fixed in #102148, let's update the branch.

Copy link
Copy Markdown
Member

@alexey-milovidov alexey-milovidov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Clean and clear.

@alexey-milovidov alexey-milovidov self-assigned this Apr 10, 2026
@Avogar
Copy link
Copy Markdown
Member

Avogar commented Apr 16, 2026

@groeneai update with master branch, flaky checks should be fixed

@Avogar
Copy link
Copy Markdown
Member

Avogar commented Apr 16, 2026

@groeneai also update the description to link #102909

In FinishAggregatingInOrderAlgorithm::State::State(), sorting_columns
was built by iterating through the sort description and using
emplace_back, creating a compact array indexed 0..desc.size()-1.
However, the less() comparison function uses elem.column_number (the
column's position in the header) to index into sorting_columns.

When the sort description column order differs from the header column
order — which happens whenever GROUP BY lists a non-sort-key column
before a sort-key column (e.g., GROUP BY a, b on a table ORDER BY b)
— the wrong columns were compared. This corrupted the binary search
in upper_bound(), producing incorrect merge group boundaries and
causing duplicate rows in the aggregation result.

The fix changes sorting_columns to be indexed by header position
(resize to all_columns.size(), assign at column_number) instead of
by description iteration order (emplace_back). This ensures less()
accesses the correct column for each sort description entry.

Reproduction: GROUP BY a, b on a table ORDER BY b with
optimize_aggregation_in_order=1 and multiple parallel streams.
Before fix: 28091-37081 rows instead of 20000 (40-85% inflation).
After fix: consistently 20000.
The bug requires a multi-column sort description where column_number
values differ from iteration order. With ORDER BY b only, the sort
description has just 1 column — no mismatch possible. Changed to
ORDER BY (b, a) so the optimizer uses a 2-column sort description
[b (pos 1), a (pos 0)], triggering the swapped comparison bug.

Also add SYSTEM STOP MERGES to prevent background merges from
reducing 4 parts to 1 before the SELECT, which would eliminate
the multi-stream merge step needed to trigger the bug. This is
critical for release builds where merges happen much faster.

Verified: master binary produces 50914 rows (bug triggered),
fixed binary produces 20000 rows (10/10 passes).
@groeneai groeneai force-pushed the fix/inorder-agg-column-order-mismatch branch from d5cc13a to db810f8 Compare April 16, 2026 20:46
@groeneai
Copy link
Copy Markdown
Contributor Author

@Avogar Done — rebased onto current master and updated the description with Fixes #102909.

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Apr 16, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.00% 84.00% +0.00%
Functions 90.90% 90.90% +0.00%
Branches 76.50% 76.50% +0.00%

Changed lines: 80.00% (4/5) · Uncovered code

Full report · Diff report

@Avogar Avogar added this pull request to the merge queue Apr 17, 2026
Merged via the queue into ClickHouse:master with commit b5f4a93 Apr 17, 2026
161 checks passed
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 17, 2026
@clickgapai
Copy link
Copy Markdown
Contributor

Hi — this PR may need backporting to 26.3 (LTS), 26.2, 26.1, 25.8 (LTS), but no backport label was found.

Affected code: FinishAggregatingInOrderAlgorithm::State::State constructor — column indexing in sorting_columns — core code present in all supported branches.

Why: This is a P0 wrong-results bug (duplicate rows in aggregation). git log shows FinishAggregatingInOrderAlgorithm was introduced in commit 573edbc (2021-01-22), predating all supported branches. The bug triggers with optimize_aggregation_in_order = 1 (default OFF but commonly enabled) when GROUP BY column order differs from the table's ORDER BY, with multiple input streams.

If this should be backported, consider adding pr-must-backport or a version-specific label (e.g. v26.3-must-backport). Ignore this if backporting is not applicable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky test: 00069_duplicate_aggregation_keys

5 participants