Skip to content

Conversation

@beliefer
Copy link
Contributor

What changes were proposed in this pull request?

The current implement of last_value over unbounded window frame will execute updateExpressions multiple times.
In fact, last_value only execute updateExpressions once on the last row.

Why are the changes needed?

Improve performance for last_value over unbounded window frame

Does this PR introduce any user-facing change?

'No'.

How was this patch tested?

Jenkins test.

@github-actions github-actions bot added the SQL label Jan 27, 2021
@AngersZhuuuu
Copy link
Contributor

Do you have do some benchmark? I think with a benchmark result is good.

@cloud-fan
Copy link
Contributor

ok to test

@SparkQA
Copy link

SparkQA commented Jan 28, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39167/

@SparkQA
Copy link

SparkQA commented Jan 28, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39174/

@SparkQA
Copy link

SparkQA commented Jan 28, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39174/

@SparkQA
Copy link

SparkQA commented Jan 28, 2021

Test build #134581 has finished for PR 31356 at commit 2da29d4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 28, 2021

Test build #134586 has finished for PR 31356 at commit a10115c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Copy link
Contributor Author

cc @cloud-fan

if (jumpToEnd) {
var lastRow = EmptyRow
while (iterator.hasNext) {
lastRow = iterator.next()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any better way to get the last element in this case than iterate over all of them? maybe not given the unusual nature of the dat backing ExternalAppendOnlyUnsafeRowArray

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExternalAppendOnlyUnsafeRowArray cannot get the last element directly.

@cloud-fan
Copy link
Contributor

cloud-fan commented Mar 1, 2021

It's a bit of waste to optimize for a single case last_value. Do we have other functions to get the n-th file from the end?

@beliefer
Copy link
Contributor Author

beliefer commented Mar 2, 2021

I can't find any other fuctions like last_value now. Yes, it's a bit waste.

new UnboundedWindowFunctionFrame(target, processor)
}

case ("AGGREGATE_LAST", _, UnboundedPreceding, UnboundedFollowing, _) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that we can't use one frame to evaluate last_value and other aggregate functions.

TBH I'm not sure if it's worth extending the framework for such a small optimization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean I understand. It is not worthwhile to optimize the last_value alone in the framework.
I also feel that way. If we can find other functions that are the same as last_value in the future, then we can do it.

@srowen srowen closed this Mar 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants