Push limit to leaf stage by default for DISTINCT / no-aggregate GROUP BY #18598
yashmayya wants to merge 1 commit into
Codecov Report

Coverage Diff

| | master | #18598 | +/- |
|---|---|---|---|
| Coverage | 56.82% | 64.29% | +7.47% |
| Complexity | 7 | 1137 | +1130 |
| Files | 2567 | 3335 | +768 |
| Lines | 149066 | 206012 | +56946 |
| Branches | 24103 | 32134 | +8031 |
| Hits | 84700 | 132453 | +47753 |
| Misses | 57178 | 62906 | +5728 |
| Partials | 7188 | 10653 | +3465 |
896b567 to 3048529
// Group-by WITHOUT aggregate functions (DISTINCT or `GROUP BY col` with no agg calls) can always push the
I think we can push down the limit for the following scenarios:
- DISTINCT (no aggregates)
- Aggregates with ORDER BY, where the ORDER BY doesn't include aggregates
I don't follow why we cannot push down the limit when there are multiple group keys.
In the current PR we push down the limit for DISTINCT and for GROUP BY without aggregates (with or without ORDER BY). The multiple-group-keys case is pushed down too; the condition here only excludes queries with multiple group sets (ROLLUP / CUBE / GROUPING SETS). Those are probably not supported today, but if they are in the future, this push-down won't be logically valid for them.
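To make the condition concrete, here is a minimal sketch of the default push-down decision as described in this thread and the PR description. It is not the planner code from the PR: the class, method, and parameter names are hypothetical, and the separate hint-gated path for queries with aggregate functions is deliberately not modeled.

```java
public class GroupTrimEligibilitySketch {
    /**
     * Hypothetical predicate: does the new default leaf-stage limit push-down apply?
     *
     * @param aggCallCount  number of aggregate function calls (0 for DISTINCT / plain GROUP BY)
     * @param groupSetCount number of group sets (more than 1 would mean ROLLUP / CUBE / GROUPING SETS)
     * @param hasLimit      whether the query has a LIMIT
     * @param optedOut      whether the query carries is_enable_group_trim='false'
     */
    static boolean defaultLeafLimitApplies(int aggCallCount, int groupSetCount,
                                           boolean hasLimit, boolean optedOut) {
        // Multiple group KEYS are fine; only multiple group SETS are excluded.
        return hasLimit && aggCallCount == 0 && groupSetCount == 1 && !optedOut;
    }

    public static void main(String[] args) {
        // SELECT DISTINCT a, b ... LIMIT 10: two keys, one group set.
        System.out.println(defaultLeafLimitApplies(0, 1, true, false));
        // Same query with the opt-out hint.
        System.out.println(defaultLeafLimitApplies(0, 1, true, true));
        // A query with multiple group sets.
        System.out.println(defaultLeafLimitApplies(0, 3, true, false));
        // A query with an aggregate call: not covered by the new default.
        System.out.println(defaultLeafLimitApplies(1, 1, true, false));
    }
}
```

The number of group keys never appears in the predicate, which is the point of the reply above: only the number of group sets matters.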
3048529 to 0d584fc
For `SELECT DISTINCT col ... LIMIT n` and `GROUP BY col ... LIMIT n` without aggregate functions, the multi-stage engine currently ships every distinct group key from each server to the intermediate stage before applying the limit. This pushes the limit (and order-by-on-key) down to the leaf aggregate by default, so each server emits at most `limit` groups.

This is safe for the no-aggregate case: each leaf produces complete group keys (no partial aggregation), so leaf-level trimming is exact for ordered queries and a valid subset for unordered ones. Queries with aggregate functions are unchanged; they remain gated behind the existing `is_enable_group_trim` hint/config. Limited to a single group set, so `ROLLUP`/`CUBE`/`GROUPING SETS` are excluded. Opt out per query with `/*+ aggOptions(is_enable_group_trim='false') */`.

Note: for an unordered `... LIMIT` (no `ORDER BY`), the specific rows returned may differ from before; this is already unspecified in SQL and was non-deterministic previously.

Covered by new planner plan tests (DISTINCT/GROUP BY + LIMIT, ORDER BY on key, HAVING, OFFSET, opt-out hint, and a multi-group-set negative case) and `GroupByOptionsTest` integration tests for paginated DISTINCT/GROUP BY.
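The exactness argument for the ordered case can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the engine's implementation: the class `LeafLimitSketch`, its methods, and the two in-memory "servers" are hypothetical, keys are plain strings ordered ascending, and OFFSET is ignored. The idea it demonstrates: any key in the global top `limit` is also in the top `limit` of every leaf that holds it, so trimming at the leaf cannot drop a key the final result needs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class LeafLimitSketch {
    // De-duplicate the keys, order them ascending, and keep at most `limit` of them.
    static List<String> distinctTopKeys(List<String> keys, int limit) {
        List<String> out = new ArrayList<>();
        for (String key : new TreeSet<>(keys)) {
            if (out.size() == limit) {
                break;
            }
            out.add(key);
        }
        return out;
    }

    // Intermediate stage: union the leaf outputs, then de-duplicate, order, and limit.
    static List<String> mergeAndLimit(List<List<String>> leafOutputs, int limit) {
        List<String> merged = new ArrayList<>();
        for (List<String> leaf : leafOutputs) {
            merged.addAll(leaf);
        }
        return distinctTopKeys(merged, limit);
    }

    public static void main(String[] args) {
        // Hypothetical group-key column values on two servers.
        List<String> server1 = List.of("d", "a", "c", "a");
        List<String> server2 = List.of("b", "e", "b", "a");
        int limit = 2;

        // Without push-down: every key reaches the intermediate stage.
        List<String> baseline = mergeAndLimit(List.of(server1, server2), limit);

        // With push-down: each leaf emits at most `limit` keys.
        List<String> pushed = mergeAndLimit(
            List.of(distinctTopKeys(server1, limit), distinctTopKeys(server2, limit)), limit);

        System.out.println(baseline); // [a, b]
        System.out.println(pushed);   // [a, b]
    }
}
```

With an aggregate such as `COUNT(*)` the same trick would not be safe in general, because a leaf sees only partial values for each group; that is consistent with those queries staying behind the existing hint.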