Add support for defining custom window frame bounds for window functions #14249
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #14249 +/- ##
============================================
+ Coverage 61.75% 63.78% +2.03%
- Complexity 207 1536 +1329
============================================
Files 2436 2627 +191
Lines 133233 144844 +11611
Branches 20636 22187 +1551
============================================
+ Hits 82274 92385 +10111
- Misses 44911 45647 +736
- Partials 6048 6812 +764
| if (_partitionByOnly) { | ||
| return processPartitionOnlyRows(rows); |
I'd initially removed this optimization to reduce clutter since there are a lot of different cases being handled in the new function implementation. However, on second thoughts, the optimization to avoid key computation (among other things) for each row might be significant enough to be worth retaining?
Also, the optimization is still applied to windows without ORDER BY, since Calcite forces the window frame to be RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING for such windows (and we do avoid the per row key computation for RANGE/ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING). So the only other case is where the partition keys and order by keys are identical.
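As a rough illustration of why the `UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING` frame is cheap: every row in a partition sees the same frame, so an aggregate can be computed once per partition and reused for each row. The sketch below is not code from this patch; the class and method names (`UnboundedFrameSum`, `sumPerPartition`) and the array-based row layout are assumptions made purely for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of Pinot. */
final class UnboundedFrameSum {
  /**
   * Emulates SUM(value) OVER (PARTITION BY key), i.e. a frame that spans the
   * whole partition. keys[i] and values[i] together describe row i.
   */
  static long[] sumPerPartition(String[] keys, long[] values) {
    // First pass: one running total per partition key.
    Map<String, Long> totals = new HashMap<>();
    for (int i = 0; i < keys.length; i++) {
      totals.merge(keys[i], values[i], Long::sum);
    }
    // Second pass: every row in a partition gets the same result, so no
    // per-row frame or order-by key computation is needed.
    long[] result = new long[keys.length];
    for (int i = 0; i < keys.length; i++) {
      result[i] = totals.get(keys[i]);
    }
    return result;
  }

  public static void main(String[] args) {
    long[] out = sumPerPartition(new String[] {"a", "b", "a"}, new long[] {1, 10, 2});
    System.out.println(Arrays.toString(out)); // prints [3, 10, 3]
  }
}
```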
…indow.aggregate to org.apache.pinot.query.runtime.operator.window
| @Test | ||
| public void testWindowFunctionsWithCustomWindowFrame() { | ||
| String queryWithDefaultWindow = "SELECT col1, col2, RANK() OVER (PARTITION BY col1 ORDER BY col2) FROM a"; | ||
| _queryEnvironment.planQuery(queryWithDefaultWindow); |
Is this a complete test for a query? The expectation is that planning won't throw an exception?
Also, the same test contains queries that will throw a parse exception?
> Is this a complete test for a query?

The queries aren't actually executed, just validated, compiled and optimized.

> The expectation is that planning won't throw an exception?

Yes. I can change it to an assertion on QueryEnvironment.canCompileQuery to make that clearer, perhaps.

> Also, the same test contains queries that will throw a parse exception?

Yes.
| } | ||
|
|
||
| @Test | ||
| public void testWindowFunction() |
Is this test added to check that queries with these window functions execute?
Yeah, basically to verify that the end-to-end flow (query planning + execution with runtime operators) works without errors.
Superseded by #14273.
- `FIRST_VALUE` / `LAST_VALUE` assume that the window frame is always `ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING` even though the default window frame as per standard SQL is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`. Furthermore, support for defining the lower bound explicitly as `UNBOUNDED PRECEDING` / `CURRENT ROW` / `n FOLLOWING` / `n PRECEDING` and the upper bound as `UNBOUNDED FOLLOWING` / `CURRENT ROW` / `n FOLLOWING` / `n PRECEDING` does not exist.
- … `ROWS` window frames, and also adds support for `UNBOUNDED PRECEDING` / `CURRENT ROW` / `UNBOUNDED FOLLOWING` bounds for `RANGE` window frames. There are a ton of edge cases to be handled here but this patch attempts to add test cases to cover most of these scenarios.
- … `ROWS` and `RANGE` based window frame bounds, whereas Postgres also supports `GROUPS`.
- … `RANGE` based window frames, another important future enhancement is to optimize the performance of `ROWS` based window frames for aggregate window functions where both the lower and upper bounds are offset based / current row. Since the changes in this patch are built over the existing framework for window functions where a "merger" is used to merge values for aggregate window functions, it isn't possible to use a sliding window based algorithm to efficiently compute aggregates for windows. This will require more significant changes to the framework but is critical to ensure performant computations, especially for larger windows. Optimizations have been added in this patch to ensure that aggregation window functions over window frames with an `UNBOUNDED PRECEDING` lower bound or an `UNBOUNDED FOLLOWING` upper bound are computed efficiently.
- … (`SUM`, `COUNT`, `MIN`, `MAX` etc.) and `FIRST_VALUE` / `LAST_VALUE`. The other window functions currently supported by Pinot (`LAG`, `LEAD`, `RANK`, `DENSE_RANK`, `ROW_NUMBER`) don't support custom window frame bounds and Calcite ensures that during query planning.
- … `UNBOUNDED FOLLOWING` / upper bound isn't `UNBOUNDED PRECEDING`, lower bound isn't `UNBOUNDED FOLLOWING` if upper bound is `UNBOUNDED PRECEDING` and vice versa etc.
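
To make the offset-based `ROWS` frame semantics and the performance point above more concrete, here is a minimal Java sketch. It is not code from this patch and does not use Pinot's "merger" framework: the class and method names (`RowsFrameSum`, `sum`) are hypothetical, and it assumes a single partition that is already ordered, with non-negative offsets so that the frame always contains the current row. It uses prefix sums, one possible way to get each row's `SUM` in constant time regardless of frame size.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not part of Pinot. */
final class RowsFrameSum {
  /**
   * Emulates SUM(v) OVER (ORDER BY ... ROWS BETWEEN preceding PRECEDING AND
   * following FOLLOWING) for one partition whose rows are already ordered.
   * Both offsets are assumed to be non-negative.
   */
  static long[] sum(long[] values, int preceding, int following) {
    int n = values.length;
    // prefix[i] holds the sum of values[0..i-1].
    long[] prefix = new long[n + 1];
    for (int i = 0; i < n; i++) {
      prefix[i + 1] = prefix[i] + values[i];
    }
    long[] result = new long[n];
    for (int i = 0; i < n; i++) {
      // Clamp the frame to the partition boundaries; long math avoids
      // overflow when an offset is very large.
      int lo = (int) Math.max(0L, (long) i - preceding);
      int hi = (int) Math.min(n - 1L, (long) i + following);
      result[i] = prefix[hi + 1] - prefix[lo];
    }
    return result;
  }

  public static void main(String[] args) {
    // ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING over the values 1..5.
    long[] out = sum(new long[] {1, 2, 3, 4, 5}, 1, 1);
    System.out.println(Arrays.toString(out)); // prints [3, 6, 9, 12, 9]
  }
}
```

This trick only works for aggregates that can be "subtracted", such as `SUM` and `COUNT`. `MIN` and `MAX` would need a different structure (for example a monotonic deque) to get the same per-row cost.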