[SPARK-37099][SQL] Optimize the filter based on rank-like window function by reduce not required rows #38745
ping @zhengruifeng cc @cloud-fan
@@ -87,7 +88,8 @@ case class WindowExec(
     windowExpression: Seq[NamedExpression],
     partitionSpec: Seq[Expression],
     orderSpec: Seq[SortOrder],
-    child: SparkPlan)
+    child: SparkPlan,
+    groupLimitInfo: Option[(Int, Expression)] = None)
I think it is OK to add a physical node, but the amount of code is a little large, and the filtering and reduction of data occur a little late.
This PR has been replaced by #38799
What changes were proposed in this pull request?
Sometimes a filter condition compares a rank-like window function (e.g. row_number, rank, dense_rank) with a number. For example,
We can extract the limit value 5 for each window group and skip the remaining rows of that group in WindowExec. In short, it supports the following pattern:
For these three rank-like functions (row_number, rank, dense_rank), the rank of a key computed on a dataset is always <= the total number of rows in the whole dataset, so we can safely discard rows with rank > k anywhere.
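The reasoning above can be sketched in plain Scala, without Spark. This is a minimal illustration, not the code in this PR: `GroupLimitSketch`, `ranked`, `filterAfter` and `limitEarly` are hypothetical names, and the input is assumed to be the order keys of a single window group that is already sorted. Because each rank-like value never decreases along the sorted group, stopping at the first row whose rank exceeds the limit should give the same rows as computing every rank and filtering afterwards.

```scala
// Hypothetical sketch, not Spark's WindowExec: rank-like values within one
// window group whose order keys are already sorted.
object GroupLimitSketch {
  sealed trait RankLike
  case object RowNumber extends RankLike
  case object Rank extends RankLike
  case object DenseRank extends RankLike

  /** Lazily pairs each order key of a sorted group with its rank-like value. */
  def ranked(sortedKeys: Seq[Int], fn: RankLike): Iterator[(Int, Int)] = {
    var rowNumber = 0
    var rank = 0
    var denseRank = 0
    var prev: Option[Int] = None
    sortedKeys.iterator.map { key =>
      rowNumber += 1
      if (!prev.contains(key)) { // a new peer group (distinct order key) starts
        rank = rowNumber
        denseRank += 1
      }
      prev = Some(key)
      val value = fn match {
        case RowNumber => rowNumber
        case Rank      => rank
        case DenseRank => denseRank
      }
      (key, value)
    }
  }

  /** Baseline: evaluate the whole group, then apply the filter `rank <= limit`. */
  def filterAfter(sortedKeys: Seq[Int], fn: RankLike, limit: Int): List[(Int, Int)] =
    ranked(sortedKeys, fn).filter(_._2 <= limit).toList

  /** Early termination: stop reading the group at the first rank above `limit`. */
  def limitEarly(sortedKeys: Seq[Int], fn: RankLike, limit: Int): List[(Int, Int)] =
    ranked(sortedKeys, fn).takeWhile(_._2 <= limit).toList
}
```

For the sorted keys `Seq(10, 10, 20, 30, 30, 40)` and a limit of 2, both functions keep two rows for `RowNumber` and `Rank` and three rows for `DenseRank`; `limitEarly` simply never reads the rest of the group.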
This PR also takes over some functionality from #34367.
Why are the changes needed?
Improve the performance.
Micro Benchmark
TPC-DS data size: 2TB.
This improvement benefits TPC-DS q67 and causes no regression in the other test cases.
Does this PR introduce any user-facing change?
No. This only updates the internal implementation.
How was this patch tested?
New tests.