ORCA: add gp_orca_use_streaming_hashagg GUC to control streaming hashagg #1681

Merged
yjhjstz merged 1 commit into apache:main from yjhjstz:streaming_hashagg on Apr 21, 2026

Conversation

@yjhjstz (Member) commented Apr 17, 2026

TPC-H SF=10 Performance Change Summary

Overall Result

Total runtime: 274,201 ms → 245,672 ms (−10.4%, saved 28.5s)

  • 10 queries show plan changes
  • 1 regression (Q03), 9 improvements

Root Cause of Plan Changes

All plan diffs follow the same pattern: Streaming Partial HashAggregate → HashAggregate.

Per-Query Changes

| Query | Baseline | New | Change | Key Driver |
| --- | --- | --- | --- | --- |
| Q17 | 60,282 ms | 30,396 ms | −49.6% | Spill 1,375 MB → 697 MB; memory-wanted 47.6 GB → 3.8 GB |
| Q07 | 8,124 ms | 7,123 ms | −12.3% | / |
| Q08 | 7,628 ms | 7,250 ms | −4.9% | / |
| Q09 | 14,508 ms | 13,886 ms | −4.3% | / |
| Q05 | 7,756 ms | 7,448 ms | −4.0% | / |
| Q03 (regression) | 19,977 ms | 25,561 ms | +28.0% | Spill 7 MB → 471 MB |
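
For readers who want to re-derive the Change column, the figures appear to be plain relative differences between the new and baseline runtimes. The following is a minimal C++ sketch under that assumption; `PercentChange` and `RoundToTenth` are hypothetical helper names, not part of the PR, and the last digit of a table entry may differ depending on how the original numbers were rounded.

```cpp
#include <cassert>
#include <cmath>

// Relative runtime change in percent. Negative values mean the new run
// is faster than the baseline. (Hypothetical helper, for illustration.)
double PercentChange(double baseline_ms, double new_ms) {
    assert(baseline_ms > 0.0);
    return (new_ms - baseline_ms) / baseline_ms * 100.0;
}

// Round to one decimal place, matching the precision used in the table.
double RoundToTenth(double x) {
    return std::round(x * 10.0) / 10.0;
}
```

Applied to the total runtime, `RoundToTenth(PercentChange(274201, 245672))` reproduces the −10.4% reported in the summary.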

@my-ship-it (Contributor)

Given the motivation is observed plan changes on TPC-H, an explain-plan test toggling the GUC and asserting Streaming Partial HashAggregate → HashAggregate would lock in the behavior and prevent silent regressions. No regression test is added.

@my-ship-it (Contributor)

Commit message ends with hash…agg (literal ellipsis from the editor clipping). Should be rewritten as a clean title ≤72 chars, e.g. ORCA: add optimizer_use_streaming_hashagg GUC.

Comment thread src/backend/gpopt/config/CConfigParamMapping.cpp Outdated
Comment thread src/backend/gpopt/config/CConfigParamMapping.cpp
ORCA unconditionally sets stream_safe=true for all local HashAggs in
FLocalHashAggStreamSafe, so the existing gp_use_streaming_hashagg GUC
(which is only read by the Postgres planner path in cdbgroupingpaths.c)
has no effect when optimizer=on. There was no way to disable streaming
hash agg for ORCA plans.

Introduce optimizer_use_streaming_hashagg (default on) and wire it
through the standard CConfigParamMapping path: map it (negated) to a
new EopttraceDisableStreamingHashAgg traceflag, and check that
traceflag in FLocalHashAggStreamSafe. When the GUC is off, ORCA emits
a non-streaming Partial HashAggregate that spills to disk and fully
deduplicates.
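
The wiring described in the commit message can be pictured with a small self-contained model. This is only a sketch under stated assumptions: `TraceFlag`, `ConfigMapping`, `PackConfigParams`, and `LocalHashAggStreamSafe` are simplified, hypothetical stand-ins written for illustration, not the actual ORCA types or the code in CConfigParamMapping.cpp. It shows the one idea that matters: the GUC is mapped negated, so turning the GUC off sets a "disable" flag, and the stream-safe check returns false when that flag is set.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for ORCA's traceflag enumeration.
enum TraceFlag : std::size_t {
    kDisableStreamingHashAgg = 0,  // models EopttraceDisableStreamingHashAgg
    kTraceFlagCount
};

using TraceFlags = std::bitset<kTraceFlagCount>;

// One row of a GUC-to-traceflag mapping table (simplified model).
struct ConfigMapping {
    TraceFlag flag;   // traceflag controlled by the GUC
    const bool* guc;  // points at the GUC's current value
    bool negate;      // true: the flag is set when the GUC is off
};

// Translate the current GUC values into a set of traceflags.
TraceFlags PackConfigParams(const ConfigMapping* mappings, std::size_t n) {
    assert(mappings != nullptr || n == 0);
    TraceFlags flags;
    for (std::size_t i = 0; i < n; ++i) {
        bool value = *mappings[i].guc;
        if (mappings[i].negate) {
            value = !value;
        }
        flags.set(mappings[i].flag, value);
    }
    return flags;
}

// Models the check added to FLocalHashAggStreamSafe: a local HashAgg is
// stream-safe unless the disable flag is set.
bool LocalHashAggStreamSafe(const TraceFlags& flags) {
    return !flags.test(kDisableStreamingHashAgg);
}
```

In this model, a mapping row `{kDisableStreamingHashAgg, &guc, true}` with `guc == true` (the default) leaves the flag clear, so local HashAggs stay stream-safe. Setting `guc` to `false` sets the flag and the check returns false, which corresponds to the non-streaming Partial HashAggregate the commit message describes.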

@yjhjstz force-pushed the streaming_hashagg branch from 7d6f31e to aae5613 on April 20, 2026 13:33

@my-ship-it (Contributor) left a comment

LGTM, thanks

@yjhjstz merged commit c9100df into apache:main on Apr 21, 2026
43 checks passed
oppenheimer01 pushed a commit to oppenheimer01/cloudberrydb that referenced this pull request May 3, 2026
oppenheimer01 pushed a commit that referenced this pull request May 4, 2026