Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-44340][SQL] Define the computing logic through PartitionEvaluator API and use it in WindowGroupLimitExec #41899

Closed
wants to merge 3 commits into from

Conversation

beliefer
Copy link
Contributor

@beliefer beliefer commented Jul 8, 2023

What changes were proposed in this pull request?

WindowGroupLimitExec is updated to use the PartitionEvaluator API to do execution.

Why are the changes needed?

To define the computing logic and requires the caller side to explicitly list what needs to be serialized and sent to executors

Does this PR introduce any user-facing change?

'No'.
Just update the inner implementation.

How was this patch tested?

Test cases updated & running benchmark manually.

[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_311-b11 on Mac OS X 10.16
[info] Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
[info] Benchmark Top-K:                                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------------------------------------------------------
[info] ROW_NUMBER (PARTITION: , WindowGroupLimit: false)                        10622          11266         617          2.0         506.5       1.0X
[info] ROW_NUMBER (PARTITION: , WindowGroupLimit: true)                          1712           1744          19         12.2          81.6       6.2X
[info] ROW_NUMBER (PARTITION: PARTITION BY b, WindowGroupLimit: false)          23679          25107         NaN          0.9        1129.1       0.4X
[info] ROW_NUMBER (PARTITION: PARTITION BY b, WindowGroupLimit: true)            6381           6527          95          3.3         304.3       1.7X
[info] RANK (PARTITION: , WindowGroupLimit: false)                              11492          11631         106          1.8         548.0       0.9X
[info] RANK (PARTITION: , WindowGroupLimit: true)                                2675           2920         118          7.8         127.5       4.0X
[info] RANK (PARTITION: PARTITION BY b, WindowGroupLimit: false)                24208          24299          95          0.9        1154.3       0.4X
[info] RANK (PARTITION: PARTITION BY b, WindowGroupLimit: true)                  6347           6478          85          3.3         302.6       1.7X
[info] DENSE_RANK (PARTITION: , WindowGroupLimit: false)                        11288          11959         458          1.9         538.2       0.9X
[info] DENSE_RANK (PARTITION: , WindowGroupLimit: true)                          2684           2945         144          7.8         128.0       4.0X
[info] DENSE_RANK (PARTITION: PARTITION BY b, WindowGroupLimit: false)          24316          25130         711          0.9        1159.5       0.4X
[info] DENSE_RANK (PARTITION: PARTITION BY b, WindowGroupLimit: true)            6589           6925         383          3.2         314.2       1.6X

@github-actions github-actions bot added the SQL label Jul 8, 2023
@beliefer
Copy link
Contributor Author

ping @cloud-fan cc @vinodkc

@beliefer
Copy link
Contributor Author

cc @viirya

}

}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we create the function in the match and then call `new WindowGroupLimitPartitionEvaluator(f) outside the match expression?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@viirya
Copy link
Member

viirya commented Jul 11, 2023

The CI failure looks unrelated.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 6c02bd0 Jul 12, 2023
@beliefer
Copy link
Contributor Author

@cloud-fan @viirya @vinodkc Thank you!

} else {
child.execute().mapPartitionsInternal { iter =>
val evaluator = evaluatorFactory.createEvaluator()
evaluator.eval(0, iter)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beliefer, Can you please raise a follow-up PR to handle the partition index as this #42185

cloud-fan pushed a commit that referenced this pull request Jul 31, 2023
… correctly for WindowGroupLimitExec,WindowExec and WindowInPandasExec

### What changes were proposed in this pull request?
This is a followup of #41899 and #41939, to set the partition index correctly even if it's not used for now. It also contains a few code cleanup.

### Why are the changes needed?
future-proof

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
existing tests

Closes #42208 from beliefer/SPARK-44340_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request Jul 31, 2023
…indowGroupLimitExec

### What changes were proposed in this pull request?
This is a followup of #41899, to set the partition index correctly even if it's not used for now. It also contains a few code cleanup.

### Why are the changes needed?
future-proof

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
existing tests

Closes #42233 from beliefer/SPARK-44340_followup_3.5.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
ragnarok56 pushed a commit to ragnarok56/spark that referenced this pull request Mar 2, 2024
…tor API and use it in WindowGroupLimitExec

### What changes were proposed in this pull request?
`WindowGroupLimitExec` is updated to use the PartitionEvaluator API to do execution.

### Why are the changes needed?
To define the computing logic and requires the caller side to explicitly list what needs to be serialized and sent to executors

### Does this PR introduce _any_ user-facing change?
'No'.
Just update the inner implementation.

### How was this patch tested?
Test cases updated & running benchmark manually.

```
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_311-b11 on Mac OS X 10.16
[info] Intel(R) Core(TM) i7-9750H CPU  2.60GHz
[info] Benchmark Top-K:                                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------------------------------------------------------
[info] ROW_NUMBER (PARTITION: , WindowGroupLimit: false)                        10622          11266         617          2.0         506.5       1.0X
[info] ROW_NUMBER (PARTITION: , WindowGroupLimit: true)                          1712           1744          19         12.2          81.6       6.2X
[info] ROW_NUMBER (PARTITION: PARTITION BY b, WindowGroupLimit: false)          23679          25107         NaN          0.9        1129.1       0.4X
[info] ROW_NUMBER (PARTITION: PARTITION BY b, WindowGroupLimit: true)            6381           6527          95          3.3         304.3       1.7X
[info] RANK (PARTITION: , WindowGroupLimit: false)                              11492          11631         106          1.8         548.0       0.9X
[info] RANK (PARTITION: , WindowGroupLimit: true)                                2675           2920         118          7.8         127.5       4.0X
[info] RANK (PARTITION: PARTITION BY b, WindowGroupLimit: false)                24208          24299          95          0.9        1154.3       0.4X
[info] RANK (PARTITION: PARTITION BY b, WindowGroupLimit: true)                  6347           6478          85          3.3         302.6       1.7X
[info] DENSE_RANK (PARTITION: , WindowGroupLimit: false)                        11288          11959         458          1.9         538.2       0.9X
[info] DENSE_RANK (PARTITION: , WindowGroupLimit: true)                          2684           2945         144          7.8         128.0       4.0X
[info] DENSE_RANK (PARTITION: PARTITION BY b, WindowGroupLimit: false)          24316          25130         711          0.9        1159.5       0.4X
[info] DENSE_RANK (PARTITION: PARTITION BY b, WindowGroupLimit: true)            6589           6925         383          3.2         314.2       1.6X
```

Closes apache#41899 from beliefer/SPARK-44340.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
ragnarok56 pushed a commit to ragnarok56/spark that referenced this pull request Mar 2, 2024
… correctly for WindowGroupLimitExec,WindowExec and WindowInPandasExec

### What changes were proposed in this pull request?
This is a followup of apache#41899 and apache#41939, to set the partition index correctly even if it's not used for now. It also contains a few code cleanup.

### Why are the changes needed?
future-proof

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
existing tests

Closes apache#42208 from beliefer/SPARK-44340_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants