[SPARK-34257][SQL] Improve performance for last_value over unbounded window frame #31356
Have you done any benchmarks? I think including benchmark results would be good.
ok to test
Kubernetes integration test starting
Kubernetes integration test starting
Kubernetes integration test status success
Test build #134581 has finished for PR 31356 at commit
Test build #134586 has finished for PR 31356 at commit
cc @cloud-fan
if (jumpToEnd) {
  var lastRow = EmptyRow
  while (iterator.hasNext) {
    lastRow = iterator.next()
Is there a better way to get the last element in this case than iterating over all of them? Maybe not, given the unusual nature of the data backing ExternalAppendOnlyUnsafeRowArray.
ExternalAppendOnlyUnsafeRowArray cannot get the last element directly.
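Because the array only exposes a forward iterator, reaching the last row means visiting every row once. The sketch below illustrates that pattern on a plain Scala Iterator; lastOf is a hypothetical helper, not a Spark API. The scan still touches every row (and may read spilled rows back), but it does no per-row expression evaluation.

```scala
// Hypothetical helper: returns the last element of a forward-only iterator.
// With no random access to the final position, every element is visited once.
def lastOf[T](it: Iterator[T]): Option[T] = {
  var last: Option[T] = None
  while (it.hasNext) {
    last = Some(it.next())
  }
  last
}

val lastOfThree = lastOf(Iterator(10, 20, 30)) // Some(30)
val lastOfNone  = lastOf(Iterator.empty[Int])  // None
```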
It's a bit of a waste to optimize for a single case.
I can't find any other functions like
    new UnboundedWindowFunctionFrame(target, processor)
  }

case ("AGGREGATE_LAST", _, UnboundedPreceding, UnboundedFollowing, _) =>
This means that we can't use one frame to evaluate last_value and other aggregate functions.
TBH I'm not sure if it's worth extending the framework for such a small optimization.
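A simplified model of the reviewer's concern: window functions whose grouping key matches share one frame, and therefore one pass over the partition. Giving last_value its own key (the "AGGREGATE_LAST" case above) splits it into a separate frame. The types and the three-part key below are hypothetical simplifications; the real key is a 5-tuple, as the case pattern shows, and includes the frame type and bound expressions.

```scala
// Hypothetical, simplified model of grouping window functions into frames.
sealed trait WindowFn
final case class Aggregate(name: String) extends WindowFn
case object LastValue extends WindowFn

// Functions with equal keys are evaluated by one shared frame.
def frameKey(fn: WindowFn, splitLastValue: Boolean): (String, String, String) =
  fn match {
    case LastValue if splitLastValue =>
      ("AGGREGATE_LAST", "UnboundedPreceding", "UnboundedFollowing")
    case _ =>
      ("AGGREGATE", "UnboundedPreceding", "UnboundedFollowing")
  }

val fns: Seq[WindowFn] = Seq(Aggregate("sum"), Aggregate("max"), LastValue)

// Without the special case, all three share a single frame.
val sharedFrames = fns.groupBy(frameKey(_, splitLastValue = false)).size // 1
// With it, last_value needs its own frame over the same rows.
val splitFrames = fns.groupBy(frameKey(_, splitLastValue = true)).size   // 2
```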
I see what you mean: it is not worthwhile to optimize last_value alone within the framework.
I agree. If we find other functions that behave like last_value in the future, we can do it then.
What changes were proposed in this pull request?
The current implementation of last_value over an unbounded window frame executes updateExpressions multiple times, once for every row in the frame. In fact, last_value only needs to execute updateExpressions once, on the last row.
Why are the changes needed?
To improve the performance of last_value over an unbounded window frame.
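To illustrate the difference this PR describes, the sketch below counts how often an aggregate "update" runs under the per-row approach versus jumping to the last row first. LastValueProcessor, perRow, and jumpToEnd are hypothetical stand-ins, not Spark classes, and the update counter is only a proxy for evaluating updateExpressions. It assumes nulls are not ignored: with IGNORE NULLS, last_value must return the last non-null value, so simply taking the final row would be wrong.

```scala
// Hypothetical stand-in for an aggregate processor evaluating last_value.
// "updates" counts calls, as a proxy for evaluating updateExpressions.
final class LastValueProcessor {
  var updates = 0
  var result: Option[Int] = None
  def update(row: Int): Unit = {
    updates += 1
    result = Some(row)
  }
}

// Per-row behavior: update once for every row in the unbounded frame.
def perRow(rows: Seq[Int]): LastValueProcessor = {
  val p = new LastValueProcessor
  val it = rows.iterator
  while (it.hasNext) p.update(it.next())
  p
}

// Jump-to-end sketch: scan to the last row, then update only once.
// Assumes nulls are NOT ignored (no IGNORE NULLS).
def jumpToEnd(rows: Seq[Int]): LastValueProcessor = {
  val p = new LastValueProcessor
  val it = rows.iterator
  var last: Option[Int] = None
  while (it.hasNext) last = Some(it.next())
  last.foreach(p.update)
  p
}

val rows = Seq(3, 1, 4, 1, 5)
val a = perRow(rows)    // a.result == Some(5), a.updates == 5
val b = jumpToEnd(rows) // b.result == Some(5), b.updates == 1
```

Both approaches produce the same result; only the number of update calls differs, which is where the expected savings would come from.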
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Jenkins test.