Custom window frame support extended to built-in window functions #4078
…ynnada-ai/arrow-datafusion into feature/builtin_window_running
alamb
left a comment
Thanks @mustafasrepo -- I don't have time to review this one today but I will put it on my queue for tomorrow
cc @jimexist
I again ran out of time today but it is on my list for first thing tomorrow
alamb
left a comment
Nice work @mustafasrepo and @ozankabak
I went through this code quite carefully and it is really nice and easy to follow 🏅
In case it is not obvious to others reading this PR, this PR adds support for the ROWS BETWEEN 1 PRECEDING and 2 FOLLOWING window clauses.
I tried it out using some simple data and postgres:
postgres=# create table foo as values (1,2), (3,4), (3,2), (2,1), (null, 0);
SELECT 5
postgres=# select first_value(column1) over (order by column2), last_value(column1) over (order by column2) from foo;
 first_value | last_value
-------------+------------
             |
             |          2
             |          3
             |          3
             |          3
(5 rows)
Before this change:
❯ select first_value(column1) over (order by column2 ROWS BETWEEN 1 PRECEDING and 2 FOLLOWING) from foo;
+--------------------------+
| FIRST_VALUE(foo.column1) |
+--------------------------+
|                          |
|                          |
|                          |
|                          |
|                          |
+--------------------------+
5 rows in set. Query took 0.003 seconds.
After this PR it gets the same answer as postgres 👍
❯ select first_value(column1) over (order by column2 ROWS BETWEEN 1 PRECEDING and 2 FOLLOWING) from foo;
+--------------------------+
| FIRST_VALUE(foo.column1) |
+--------------------------+
|                          |
|                          |
| 2                        |
| 1                        |
| 3                        |
+--------------------------+
        self.window_frame.clone()
    };
    let mut row_wise_results: Vec<ScalarValue> = vec![];
    for partition_range in &partition_points {
This reorganization is very nice and makes the code much easier to read. Very nice 👍
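For readers following along, here is a minimal sketch of the loop shape the snippet above hints at: iterate over partitions, compute a frame for each row, and push one result per row. It is not the PR's code. The function name `first_value_rows_frame`, its parameters, and the use of `Option<i64>` in place of `ScalarValue` are all assumptions made for illustration, and only a `ROWS BETWEEN n PRECEDING AND m FOLLOWING` frame is handled.

```rust
use std::ops::Range;

/// Hypothetical helper (not DataFusion's API): evaluates FIRST_VALUE over a
/// `ROWS BETWEEN preceding PRECEDING AND following FOLLOWING` frame.
/// `values` must already be sorted by the ORDER BY key, and
/// `partition_points` is assumed to list the partitions in order.
fn first_value_rows_frame(
    values: &[Option<i64>],
    partition_points: &[Range<usize>],
    preceding: usize,
    following: usize,
) -> Vec<Option<i64>> {
    let mut row_wise_results = Vec::with_capacity(values.len());
    for partition_range in partition_points {
        let partition = &values[partition_range.clone()];
        for idx in 0..partition.len() {
            // Clamp the frame to the partition boundaries.
            let start = idx.saturating_sub(preceding);
            let end = (idx + following + 1).min(partition.len());
            // The first row of the frame is the result; NULL stays NULL.
            row_wise_results.push(partition[start..end].first().copied().flatten());
        }
    }
    row_wise_results
}
```

Calling it with the `column1` values from the example above in `column2` order (`[None, Some(2), Some(1), Some(3), Some(3)]`), a single partition `0..5`, `preceding = 1` and `following = 2` reproduces the five rows shown in the "after" output.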
/// We use start and end bounds to calculate current row's starting and ending range.
/// This function supports different modes, but we currently do not support window calculation for GROUPS inside window frames.
fn calculate_range(
I find this logic to be very straightforward and easy to follow 👍
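As a rough illustration of what a range calculation like this might look like for the ROWS mode, here is a simplified sketch. The types `FrameUnits` and `Bound` and the function signature are hypothetical stand-ins, not DataFusion's actual definitions; the RANGE mode is deliberately left out, and GROUPS returns an error, mirroring the "not implemented" behaviour described in the doc comment.

```rust
/// Hypothetical, simplified frame units (stand-ins, not DataFusion's types).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum FrameUnits {
    Rows,
    Range,
    Groups,
}

/// Hypothetical frame bound; `None` means UNBOUNDED.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Bound {
    Preceding(Option<usize>),
    CurrentRow,
    Following(Option<usize>),
}

/// Returns the half-open row range `[start, end)` of the frame for row `idx`
/// in a partition of `len` rows.
fn calculate_range(
    units: FrameUnits,
    start_bound: Bound,
    end_bound: Bound,
    idx: usize,
    len: usize,
) -> Result<(usize, usize), String> {
    match units {
        FrameUnits::Rows => {
            let start = match start_bound {
                Bound::Preceding(None) => 0,
                Bound::Preceding(Some(n)) => idx.saturating_sub(n),
                Bound::CurrentRow => idx,
                Bound::Following(Some(n)) => (idx + n).min(len),
                Bound::Following(None) => {
                    return Err("frame start cannot be UNBOUNDED FOLLOWING".to_string())
                }
            };
            let end = match end_bound {
                Bound::Preceding(None) => {
                    return Err("frame end cannot be UNBOUNDED PRECEDING".to_string())
                }
                // Last row in the frame is `idx - n`, so the exclusive end is one past it.
                Bound::Preceding(Some(n)) => (idx + 1).saturating_sub(n),
                Bound::CurrentRow => idx + 1,
                Bound::Following(Some(n)) => (idx + n + 1).min(len),
                Bound::Following(None) => len,
            };
            // An inverted range is reported as an empty frame.
            Ok((start, end.max(start)))
        }
        // Value-based RANGE frames need the ORDER BY column; omitted here.
        FrameUnits::Range => Err("RANGE frames are omitted from this sketch".to_string()),
        FrameUnits::Groups => Err("Window frame for groups is not implemented".to_string()),
    }
}
```

With `Bound::Preceding(Some(1))` and `Bound::Following(Some(2))` on a five-row partition, row 0 gets the range `(0, 3)` and row 4 gets `(3, 5)`, so the frame is clipped at both partition edges.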
        Ok((start, end))
    }
    WindowFrameUnits::Groups => Err(DataFusionError::NotImplemented(
        "Window frame for groups is not implemented".to_string(),
👍
I'll plan to merge this PR tomorrow unless there are other comments raised.
Again, really nice work @mustafasrepo and @ozankabak -- thank you very much
Benchmark runs are scheduled for baseline = 6d00bd9 and contender = 238e179. 238e179 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
…ache#4078)
* refactor running window
* remove unnecessary changes
* implement suggested changes
* Minor refactors to improve readability
* Refactor according to reviews
* minor changes
* Remove unnecessary into/collect calls
* convert evaluate_inside_range result to ScalarValue
* Simplify evaluate function of BuiltInWindowExpr
Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>
Which issue does this PR close?
Closes #4076.
Rationale for this change
With this change, we can now run built-in window functions with custom window frames such as queries in the form
What changes are included in this PR?
Are there any user-facing changes?
N.A