Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-24066][SQL]Add new optimization rule to eliminate unnecessary sort by exchanged adjacent Window expressions #22945

Closed
wants to merge 1 commit into from

Conversation

heary-cao
Copy link
Contributor

@heary-cao heary-cao commented Nov 5, 2018

What changes were proposed in this pull request?

Currently, when two adjacent window functions have the same partition and the same intersection of order,
There will be two sorted after shuffling, which is not necessary. This PR adds a new optimization rule to eliminate unnecessary sort by exchanged adjacent Window expressions.

For example:

val df = Seq(("a", "p1", 10.0, 20.0, 30.0), ("a", "p2", 20.0, 10.0, 40.0)).toDF("key", "value", "value1", "value2", "value3").select($"key", sum("value1").over(Window.partitionBy("key").orderBy("value")), max("value2").over(Window.partitionBy("key").orderBy("value", "value1")), avg("value3").over(Window.partitionBy("key").orderBy("value", "value1", "value2"))).queryExecution.executedPlan
println(df)

Before this PR:

*(5) Project key#16, sum(value1) OVER (PARTITION BY key ORDER BY value ASC NULLS FIRST unspecifiedframe$())#29, max(value2) OVER (PARTITION BY key ORDER BY value ASC NULLS FIRST, value1 ASC NULLS FIRST unspecifiedframe$())#30, avg(value3) OVER (PARTITION BY key ORDER BY value ASC NULLS FIRST, value1 ASC NULLS FIRST, value2 ASC NULLS FIRST unspecifiedframe$())#31
 +- Window max(value2#19) windowspecdefinition(key#16, value#17 ASC NULLS FIRST, value1#18 ASC NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), currentrow$())) AS max(value2) OVER (PARTITION BY key ORDER BY value ASC NULLS FIRST, value1 ASC NULLS FIRST unspecifiedframe$())#30, key#16, value#17 ASC NULLS FIRST, value1#18 ASC NULLS FIRST
    +- *(4) Project key#16, value1#18, value#17, value2#19, sum(value1) OVER (PARTITION BY key ORDER BY value ASC NULLS FIRST unspecifiedframe$())#29, avg(value3) OVER (PARTITION BY key ORDER BY value ASC NULLS FIRST, value1 ASC NULLS FIRST, value2 ASC NULLS FIRST unspecifiedframe$())#31
       +- Window avg(value3#20) windowspecdefinition(key#16, value#17 ASC NULLS FIRST, value1#18 ASC NULLS FIRST, value2#19 ASC NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), currentrow$())) AS avg(value3) OVER (PARTITION BY key ORDER BY value ASC NULLS FIRST, value1 ASC NULLS FIRST, value2 ASC NULLS FIRST unspecifiedframe$())#31, key#16, value#17 ASC NULLS FIRST, value1#18 ASC NULLS FIRST, value2#19 ASC NULLS FIRST
          +- *(3) Sort key#16 ASC NULLS FIRST, value#17 ASC NULLS FIRST, value1#18 ASC NULLS FIRST, value2#19 ASC NULLS FIRST, false, 0
             +- Window sum(value1#18) windowspecdefinition(key#16, value#17 ASC NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), currentrow$())) AS sum(value1) OVER (PARTITION BY key ORDER BY value ASC NULLS FIRST unspecifiedframe$())#29, key#16, value#17 ASC NULLS FIRST
                +- *(2) Sort key#16 ASC NULLS FIRST, value#17 ASC NULLS FIRST, false, 0
                   +- Exchange hashpartitioning(key#16, 5)
                      +- *(1) Project _1#5 AS key#16, _3#7 AS value1#18, _2#6 AS value#17, _4#8 AS value2#19, _5#9 AS value3#20
                         +- LocalTableScan _1#5, _2#6, _3#7, _4#8, _5#9

After this PR:

*(5) Project key#16, sum(value1) OVER (PARTITION BY key ORDER BY value ASC NULLS FIRST unspecifiedframe$())#29, max(value2) OVER (PARTITION BY key ORDER BY value ASC NULLS FIRST, value1 ASC NULLS FIRST unspecifiedframe$())#30, avg(value3) OVER (PARTITION BY key ORDER BY value ASC NULLS FIRST, value1 ASC NULLS FIRST, value2 ASC NULLS FIRST unspecifiedframe$())#31
 +- Window sum(value1#18) windowspecdefinition(key#16, value#17 ASC NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), currentrow$())) AS sum(value1) OVER (PARTITION BY key ORDER BY value ASC NULLS FIRST unspecifiedframe$())#29, key#16, value#17 ASC NULLS FIRST
    +- *(4) Project key#16, value1#18, value#17, avg(value3) OVER (PARTITION BY key ORDER BY value ASC NULLS FIRST, value1 ASC NULLS FIRST, value2 ASC NULLS FIRST unspecifiedframe$())#31, max(value2) OVER (PARTITION BY key ORDER BY value ASC NULLS FIRST, value1 ASC NULLS FIRST unspecifiedframe$())#30
       +- Window max(value2#19) windowspecdefinition(key#16, value#17 ASC NULLS FIRST, value1#18 ASC NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), currentrow$())) AS max(value2) OVER (PARTITION BY key ORDER BY value ASC NULLS FIRST, value1 ASC NULLS FIRST unspecifiedframe$())#30, key#16, value#17 ASC NULLS FIRST, value1#18 ASC NULLS FIRST
          +- *(3) Project key#16, value1#18, value#17, value2#19, avg(value3) OVER (PARTITION BY key ORDER BY value ASC NULLS FIRST, value1 ASC NULLS FIRST, value2 ASC NULLS FIRST unspecifiedframe$())#31
             +- Window avg(value3#20) windowspecdefinition(key#16, value#17 ASC NULLS FIRST, value1#18 ASC NULLS FIRST, value2#19 ASC NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), currentrow$())) AS avg(value3) OVER (PARTITION BY key ORDER BY value ASC NULLS FIRST, value1 ASC NULLS FIRST, value2 ASC NULLS FIRST unspecifiedframe$())#31, key#16, value#17 ASC NULLS FIRST, value1#18 ASC NULLS FIRST, value2#19 ASC NULLS FIRST
                +- *(2) Sort key#16 ASC NULLS FIRST, value#17 ASC NULLS FIRST, value1#18 ASC NULLS FIRST, value2#19 ASC NULLS FIRST, false, 0
                   +- Exchange hashpartitioning(key#16, 5)
                      +- *(1) Project _1#5 AS key#16, _3#7 AS value1#18, _2#6 AS value#17, _4#8 AS value2#19, _5#9 AS value3#20
                         +- LocalTableScan _1#5, _2#6, _3#7, _4#8, _5#9

How was this patch tested?

add new unit tested

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@heary-cao
Copy link
Contributor Author

heary-cao commented Nov 6, 2018

cc @cloud-fan @gatorsmile

@github-actions
Copy link

github-actions bot commented Jan 5, 2020

We're closing this PR because it hasn't been updated in a while.
This isn't a judgement on the merit of the PR in any way. It's just
a way of keeping the PR queue manageable.

If you'd like to revive this PR, please reopen it!

@github-actions github-actions bot added the Stale label Jan 5, 2020
@github-actions github-actions bot closed this Jan 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants