
Push limit to leaf stage by default for DISTINCT / no-aggregate GROUP BY #18598

Open: yashmayya wants to merge 1 commit into apache:master from yashmayya:default-leaf-limit-pushdown-distinct

@yashmayya (Contributor) commented May 27, 2026

For `SELECT DISTINCT col ... LIMIT n` and `GROUP BY col ... LIMIT n` queries without aggregate functions, the multi-stage engine currently ships every distinct group key from each server to the intermediate stage before applying the limit. This PR pushes the limit (and an `ORDER BY` on the group keys, if present) down to the leaf aggregate by default, so each server emits at most `n` groups.
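
To make the effect concrete, here is a minimal Java simulation of `SELECT DISTINCT col ... ORDER BY col LIMIT n` over three servers. It is not Pinot code; the class and method names are hypothetical, and each "server" is just a list of column values. With pushdown, each leaf keeps only its first `n` sorted distinct keys before the intermediate stage merges them and applies the final limit, so each server ships at most `n` keys while the final result matches the unpushed plan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical simulation (not Pinot code) of
//   SELECT DISTINCT col FROM t ORDER BY col LIMIT n
// evaluated with and without the limit pushed down to the leaf stage.
public class LeafLimitSketch {

  // Returns the first n keys of a sorted, de-duplicated set.
  static List<Integer> firstN(TreeSet<Integer> sortedKeys, int n) {
    List<Integer> out = new ArrayList<>();
    for (int key : sortedKeys) {
      if (out.size() == n) {
        break;
      }
      out.add(key);
    }
    return out;
  }

  // Leaf stage with pushdown: one server's distinct keys, sorted, trimmed to at most n.
  static List<Integer> leafTrim(List<Integer> serverRows, int n) {
    return firstN(new TreeSet<>(serverRows), n);
  }

  // Intermediate stage: union the shipped keys, sort them, and apply the final limit.
  static List<Integer> finalMerge(List<List<Integer>> shipped, int n) {
    TreeSet<Integer> merged = new TreeSet<>();
    for (List<Integer> rows : shipped) {
      merged.addAll(rows);
    }
    return firstN(merged, n);
  }

  // Old behavior: every distinct key from every server reaches the intermediate stage.
  static List<Integer> withoutPushdown(List<List<Integer>> servers, int n) {
    return finalMerge(servers, n);
  }

  // New behavior: each server ships at most n keys.
  static List<Integer> withPushdown(List<List<Integer>> servers, int n) {
    List<List<Integer>> shipped = new ArrayList<>();
    for (List<Integer> rows : servers) {
      shipped.add(leafTrim(rows, n));
    }
    return finalMerge(shipped, n);
  }

  public static void main(String[] args) {
    List<List<Integer>> servers = List.of(
        List.of(5, 3, 3, 9, 1),
        List.of(2, 8, 2, 7),
        List.of(6, 4, 1, 10));
    System.out.println(withoutPushdown(servers, 3)); // [1, 2, 3]
    System.out.println(withPushdown(servers, 3));    // [1, 2, 3]
  }
}
```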

This is safe for the no-aggregate case: each leaf produces complete group keys (there is no partial aggregation to finish), so leaf-level trimming is exact for ordered queries and returns a valid subset for unordered ones. For the ordered case, a key in the global top `n` is preceded by fewer than `n` keys overall, so it also survives the trim on every server that holds it. Queries with aggregate functions are unchanged; they remain gated behind the existing `is_enable_group_trim` hint/config. The pushdown is limited to aggregates with a single group set, so ROLLUP / CUBE / GROUPING SETS are excluded. Opt out per query with `/*+ aggOptions(is_enable_group_trim='false') */`.
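
The gating described above can be summarized as a small decision function. This is a hypothetical Java sketch, not Pinot's actual planner logic: the method name and parameters are illustrative, the precedence of an explicit hint over the config default for aggregate queries is an assumption, and treating multi-group-set queries as "not pushed down" is an assumption about how the exclusion is modeled.

```java
// Hypothetical sketch (not Pinot's planner code) of when the limit is pushed to the leaf aggregate.
// Names and parameters are illustrative assumptions.
public class LeafLimitGateSketch {

  // hasAggCalls:     the GROUP BY has aggregate functions (false for DISTINCT / key-only GROUP BY).
  // groupSetCount:   1 for plain GROUP BY / DISTINCT; more for ROLLUP / CUBE / GROUPING SETS.
  // groupTrimHint:   value of the is_enable_group_trim query hint, or null when the hint is absent.
  // groupTrimConfig: the existing config default that gates group trimming for aggregate queries.
  static boolean pushLimitToLeaf(
      boolean hasAggCalls, int groupSetCount, Boolean groupTrimHint, boolean groupTrimConfig) {
    if (hasAggCalls) {
      // Unchanged: aggregate queries stay behind the existing hint/config.
      // Assumption: an explicit hint takes precedence over the config default.
      return groupTrimHint != null ? groupTrimHint : groupTrimConfig;
    }
    if (groupSetCount != 1) {
      // Multiple group sets are excluded from the new default.
      // Assumption: modeled here as "not pushed down".
      return false;
    }
    // New default for DISTINCT / key-only GROUP BY: on unless the query opts out
    // with is_enable_group_trim='false'.
    return !Boolean.FALSE.equals(groupTrimHint);
  }

  public static void main(String[] args) {
    System.out.println(pushLimitToLeaf(false, 1, null, false));          // true
    System.out.println(pushLimitToLeaf(false, 1, Boolean.FALSE, false)); // false
    System.out.println(pushLimitToLeaf(true, 1, null, false));           // false
    System.out.println(pushLimitToLeaf(true, 1, Boolean.TRUE, false));   // true
  }
}
```

Using a nullable `Boolean` for the hint keeps "hint absent" distinct from "hint set to false", which is what lets the no-aggregate path default to on while still honoring an explicit opt-out.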

Note: for an unordered `... LIMIT` query (no `ORDER BY`), the specific rows returned may differ from before. Which rows are returned is already unspecified in SQL, and it was non-deterministic previously as well.

Covered by new query planner tests (DISTINCT / GROUP BY + LIMIT, ORDER BY on key, HAVING, OFFSET, the opt-out hint, and a multi-group-set negative case) and `GroupByOptionsTest` integration tests for paginated DISTINCT / GROUP BY.

@yashmayya yashmayya added the backward-incompat and multi-stage labels and removed the backward-incompat label May 27, 2026
@yashmayya yashmayya requested a review from Jackie-Jiang May 27, 2026 18:44
@yashmayya yashmayya added the enhancement label May 27, 2026
@codecov-commenter

codecov-commenter commented May 27, 2026

Codecov Report

❌ Patch coverage is 60.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.29%. Comparing base (c2561d4) to head (0d584fc).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...el/rules/PinotAggregateExchangeNodeInsertRule.java | 75.00% | 0 Missing and 1 partial ⚠️ |
| ...t/calcite/rel/rules/PinotLogicalAggregateRule.java | 0.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18598      +/-   ##
============================================
+ Coverage     56.82%   64.29%   +7.47%     
- Complexity        7     1137    +1130     
============================================
  Files          2567     3335     +768     
  Lines        149066   206012   +56946     
  Branches      24103    32134    +8031     
============================================
+ Hits          84700   132453   +47753     
- Misses        57178    62906    +5728     
- Partials       7188    10653    +3465     
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (?) |
| java-21 | 64.29% <60.00%> (+7.47%) ⬆️ |
| temurin | 64.29% <60.00%> (+7.47%) ⬆️ |
| unittests | 64.29% <60.00%> (+7.47%) ⬆️ |
| unittests1 | 56.78% <60.00%> (-0.03%) ⬇️ |
| unittests2 | 36.82% <0.00%> (?) |

Flags with carried forward coverage won't be shown.


@yashmayya yashmayya force-pushed the default-leaf-limit-pushdown-distinct branch from 896b567 to 3048529 Compare May 27, 2026 20:01

// Group-by WITHOUT aggregate functions (DISTINCT or `GROUP BY col` with no agg calls) can always push the
Contributor

I think we can push down the limit in the following scenarios:

  • Distinct (no aggregates)
  • Aggregates with an order-by that doesn't include any aggregates

I don't follow why we cannot push down the limit when there are multiple group keys.

Contributor Author

In the current PR, we push down the limit for DISTINCT and for GROUP BY without aggregates (with or without ORDER BY). We also push down the limit when there are multiple group keys; the condition here only excludes queries with multiple group sets (ROLLUP / CUBE / GROUPING SETS). Those are probably not supported today, but if they are supported in the future, this pushdown won't be logically valid for them.
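
The distinction in this reply is between group keys and grouping sets. Under standard SQL semantics, `GROUP BY a, b` is one grouping set with two keys, while `ROLLUP(a, b)` expands to three grouping sets and `CUBE(a, b)` to four. The following Java sketch is illustrative only (the class and method names are hypothetical, not Pinot or Calcite APIs); it enumerates those grouping sets and applies a single-group-set check of the kind the reply describes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: expands GROUP BY variants into the grouping sets they denote (standard SQL semantics).
public class GroupSetsSketch {

  // GROUP BY a, b: one grouping set containing every key.
  static List<List<String>> plainGroupBy(List<String> keys) {
    return List.of(List.copyOf(keys));
  }

  // ROLLUP(a, b, ...): every prefix of the key list, from longest down to the empty set.
  static List<List<String>> rollup(List<String> keys) {
    List<List<String>> sets = new ArrayList<>();
    for (int len = keys.size(); len >= 0; len--) {
      sets.add(List.copyOf(keys.subList(0, len)));
    }
    return sets;
  }

  // CUBE(a, b, ...): every subset of the keys (2^k grouping sets).
  static List<List<String>> cube(List<String> keys) {
    List<List<String>> sets = new ArrayList<>();
    int k = keys.size();
    for (int mask = (1 << k) - 1; mask >= 0; mask--) {
      List<String> set = new ArrayList<>();
      for (int i = 0; i < k; i++) {
        if ((mask & (1 << i)) != 0) {
          set.add(keys.get(i));
        }
      }
      sets.add(List.copyOf(set));
    }
    return sets;
  }

  // The condition discussed here looks at the number of grouping sets, not the number of keys.
  static boolean singleGroupSet(List<List<String>> groupSets) {
    return groupSets.size() == 1;
  }

  public static void main(String[] args) {
    List<String> keys = List.of("a", "b");
    System.out.println(plainGroupBy(keys)); // [[a, b]]
    System.out.println(rollup(keys));       // [[a, b], [a], []]
    System.out.println(cube(keys).size());  // 4
  }
}
```

So a multi-key `GROUP BY a, b` still passes the single-group-set check, and only the ROLLUP / CUBE / GROUPING SETS shapes are filtered out.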

@yashmayya yashmayya force-pushed the default-leaf-limit-pushdown-distinct branch from 3048529 to 0d584fc Compare May 28, 2026 01:09

Labels

enhancement, multi-stage

3 participants