[Feature](agg) support bucketed agg operator #61495
BiteTheDDDDt wants to merge 5 commits into apache:master
Conversation

Thank you for your contribution to Apache Doris. Please clearly describe your PR:

run buildall
Pull request overview
This PR introduces a new “bucketed hash aggregation” optimization path that fuses local+global aggregation into a single operator for single-BE deployments, along with the required FE/BE plan and pipeline support.
Changes:
- Add a new Thrift plan node type + payload (`BUCKETED_AGGREGATION_NODE`/`TBucketedAggregationNode`) and wire it into `TPlanNode`.
- Add Nereids physical plan + translation + costing to generate and pick `PhysicalBucketedHashAggregate`.
- Implement BE pipeline sink/source operators and shared state to build per-instance bucketed hash tables and merge/output them without exchange.
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| gensrc/thrift/PlanNodes.thrift | Adds new plan node type and Thrift struct for bucketed aggregation |
| fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java | Adds session variables controlling the optimization and thresholds |
| fe/fe-core/src/main/java/org/apache/doris/planner/BucketedAggregationNode.java | Adds legacy planner node that serializes bucketed agg into Thrift |
| fe/fe-core/src/main/java/org/apache/doris/nereids/** | Adds new physical node, visitor hooks, properties, stats, cost model, and implementation rule |
| be/src/exec/pipeline/pipeline_fragment_context.cpp | Creates bucketed agg source/sink pipelines and registers shared state |
| be/src/exec/pipeline/dependency.{h,cpp} | Adds BucketedAggSharedState and cleanup/destroy support |
| be/src/exec/operator/operator.cpp | Registers new bucketed agg pipeline local states |
| be/src/exec/operator/bucketed_aggregation_* | Implements bucketed agg sink/source operators |
| be/src/exec/common/hash_table/hash_map_context.h | Adds reusable output buffer to hash method state |
| be/src/exec/common/agg_utils.h | Factors agg hash-table variants and adds BucketedAggDataVariants |
```java
+ "消除 Exchange 开销和序列化/反序列化成本。默认关闭。",
  "Whether to enable bucketed hash aggregation optimization. This optimization fuses two-phase "
+ "aggregation into a single operator on single-BE deployments, eliminating exchange overhead "
+ "and serialization/deserialization costs. Disabled by default."})
```
```java
TBucketedAggregationNode bucketedAggNode = new TBucketedAggregationNode();
bucketedAggNode.setGroupingExprs(groupingExprs);
bucketedAggNode.setAggregateFunctions(aggregateFunctions);
bucketedAggNode.setIntermediateTupleId(aggInfo.getOutputTupleId().asInt());
```
```thrift
struct TBucketedAggregationNode {
  1: optional list<Exprs.TExpr> grouping_exprs
  2: required list<Exprs.TExpr> aggregate_functions
  3: required Types.TTupleId intermediate_tuple_id
  4: required Types.TTupleId output_tuple_id
  5: required bool need_finalize
}
```
```java
    return ImmutableList.of();
}
// Only for single-BE deployments
int beNumber = Math.max(1, ctx.getEnv().getClusterInfo().getBackendsNumber(true));
```
```cpp
op = std::make_shared<BucketedAggSourceOperatorX>(pool, tnode, next_operator_id(), descs);
RETURN_IF_ERROR(cur_pipe->add_operator(op, _parallel_instances));
```
```cpp
for (int i = 0; i < _num_instances; i++) {
    auto sink_dep = std::make_shared<Dependency>(op->operator_id(), op->node_id(),
                                                 "BUCKETED_AGG_SINK_DEPENDENCY");
    sink_dep->set_shared_state(shared_state.get());
    shared_state->sink_deps.push_back(sink_dep);
}
shared_state->create_source_dependencies(_num_instances, op->operator_id(),
```
```java
if (expr instanceof SlotReference) {
    SlotReference slot = (SlotReference) expr;
    if (slot.getOriginalColumn().isPresent()) {
        groupByColumnNames.add(slot.getOriginalColumn().get().getName().toLowerCase());
    }
}
// All distribution columns must appear in the GROUP BY keys
for (Column column : distributionColumns) {
    if (!groupByColumnNames.contains(column.getName().toLowerCase())) {
```
BE Regression && UT Coverage Report: increment line coverage, increment coverage report

/review
Bucketed Hash Aggregation Review — Code Review Skill Checkpoint Analysis
Part 1.3 Critical Checkpoint Conclusions
Goal: Fuse two-phase (local + global) aggregation into a single operator for single-BE deployments using 256 per-bucket hash tables. The code achieves this goal structurally, but has critical correctness and safety issues.
Concurrency: Sink instances write to independent per_instance_data[task_idx] (safe). Source uses per-bucket CAS locking (safe). state_generation fence-based wakeup is correct. However, shared probe_expr_ctxs are called concurrently without per-instance cloning — see inline comment.
Memory Safety: get_reserve_mem_size() returns 0, completely bypassing pipeline memory reservation. With 256 hash tables × N instances, uncontrolled memory growth is possible. No spill path exists as fallback. No SCOPED_PEAK_MEM instrumentation. See inline comment.
Test Coverage: Zero tests — no regression tests, no BE unit tests, no FE unit tests for ~2700 lines of new production code across FE, BE, and Thrift.
Configuration: 3 new session variables added (enable_bucketed_hash_agg, etc.) — appropriately gated.
FE-BE Protocol: New TPlanNodeType::BUCKETED_AGGREGATION_NODE = 37 properly handled in both FE (BucketedAggregationNode) and BE (pipeline_fragment_context.cpp).
Incompatible Changes: None — new node type, feature-gated, no existing behavior modified.
Observability: Profile counters added for hash table size, memory, build/expr/compute/emplace timers. Adequate.
Other observations:

- The `simple_count` optimization uses `reinterpret_cast<UInt64&>` on `AggregateDataPtr` to store counters in pointer bits: correct but subtle; add a comment explaining the aliasing.
- `is_fixed_key` member added to `AggregatedDataVariants` but never read in the diff.
- `close()` properly releases held CAS locks (prevents deadlock on LIMIT-induced early termination): good.
- Source-side merge correctly handles the two-level fan-in from N instances × 256 buckets.
Summary: 1 Critical, 1 Significant, 1 Major gap

- Critical: Data race on shared `probe_expr_ctxs` (concurrent mutation of non-thread-safe `VExprContext`)
- Significant: Memory reservation bypassed (`get_reserve_mem_size` returns 0)
- Major gap: Zero test coverage for ~2700 lines of new code
```cpp
SCOPED_TIMER(_expr_timer);
for (size_t i = 0; i < key_size; ++i) {
    int result_column_id = -1;
    RETURN_IF_ERROR(shared_state.probe_expr_ctxs[i]->execute(block, &result_column_id));
```
Critical: Data race on shared `probe_expr_ctxs`

`shared_state.probe_expr_ctxs[i]->execute()` is called by ALL sink instances concurrently on the SAME shared `VExprContext` objects. `VExprContext::execute()` mutates `_last_result_column_id` (a non-atomic `int`) and potentially `FunctionContext` internal state.

The `probe_expr_ctxs` were cloned once in `call_once` (lines 88-94) into shared state, then used concurrently by all sink instances without per-instance cloning.

Note that `_aggregate_evaluators` are already correctly cloned per-instance (lines 113-115) with a comment explaining the same class of bug. The same treatment is needed here.

Fix: Each sink instance should clone its own `probe_expr_ctxs` in `open()` (similar to lines 113-115), and `_execute_with_serialized_key` should use the per-instance clones instead of `shared_state.probe_expr_ctxs`.
```cpp
    return DataDistribution(ExchangeType::NOOP);
}

size_t get_reserve_mem_size(RuntimeState* state, bool eos) override { return 0; }
```
Significant: Memory reservation completely bypassed

`get_reserve_mem_size()` returns 0, which disables the pipeline memory reservation protocol for this operator. For comparison, the existing `AggSinkOperatorX::get_reserve_mem_size()` returns `hash_table->estimate_memory(batch_size) + _memory_usage_last_executing`.

With 256 hash tables per instance × N pipeline instances, hash table resizes can cause massive uncontrolled memory growth with no back-pressure mechanism. There is also:

- No `SCOPED_PEAK_MEM` instrumentation (unlike the regular agg operator)
- No spill path as a fallback
- No `_memory_sufficient_dependency` wiring (per `be/src/exec/AGENTS.md` requirements)

Even if spill support is deferred, the reservation protocol should still provide accurate estimates so the scheduler can apply back-pressure before OOM.
Force-pushed from 5e132d9 to 5773305
FE UT Coverage Report: increment line coverage
Force-pushed from 15e1a5a to 142e1da
Cloud UT Coverage Report: increment line coverage, increment coverage report

BE UT Coverage Report: increment line coverage, increment coverage report
This pull request introduces a new bucketed hash aggregation operator for the pipeline engine, refactors aggregation data variant handling to support this new operator, and adds supporting infrastructure for efficient memory usage and operator registration. The main changes include new source and sink operator implementations for bucketed aggregation, a flexible and reusable aggregation data variant base, and various supporting improvements for memory management and code organization.
Bucketed Hash Aggregation Operator Implementation:
Added `bucketed_aggregation_sink_operator.h` and `bucketed_aggregation_source_operator.h`, implementing the sink and source operators for bucketed hash aggregation, including local state management, per-bucket hash tables, and pipelined merge logic. [1] [2]

Aggregation Data Variant Refactoring:
Refactored `agg_utils.h` to introduce a parameterized `AggMethodVariantsBase` and `AggDataVariantsBase`, supporting both traditional and bucketed aggregation with different string key hash map implementations. Added `BucketedAggDataVariants` and associated types for bucketed aggregation. [1] [2]

Performance and Memory Improvements:

These changes collectively enable efficient, parallel, and memory-aware bucketed hash aggregation in the pipeline engine, improving scalability and paving the way for further aggregation optimizations.