Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-38391][SPARK-38768][SQL][FOLLOWUP] Add comments for pushLimit and pushTopN of PushDownUtils #36092

Closed

Conversation

beliefer
Copy link
Contributor

@beliefer beliefer commented Apr 7, 2022

What changes were proposed in this pull request?

pushLimit and pushTopN of PushDownUtils returns tuple of boolean. It will be good to explain what the boolean value represents.

Why are the changes needed?

Make DS V2 API more friendly to developers.

Does this PR introduce any user-facing change?

'No'.
Just update comments.

How was this patch tested?

N/A

@github-actions github-actions bot added the SQL label Apr 7, 2022
@beliefer
Copy link
Contributor Author

beliefer commented Apr 9, 2022

ping @huaxingao cc @cloud-fan

*
* @return the tuple of Boolean. The first Boolean value represents whether to push down, and
* the second Boolean value represents whether to push down partially which means to
* keep the `Limit`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we be a little more clear and say something like
...the second Boolean value represents whether to push down partially, which means Spark will keep the Limit and do it again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

*
* @return the tuple of Boolean. The first Boolean value represents whether to push down, and
* the second Boolean value represents whether to push down partially which means to
* keep the `Sort` and `Limit`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as the above

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Merged to master/3.3.

dongjoon-hyun pushed a commit that referenced this pull request Apr 11, 2022
…` and `pushTopN` of `PushDownUtils`

### What changes were proposed in this pull request?
`pushLimit` and `pushTopN` of `PushDownUtils` returns tuple of boolean. It will be good to explain what the boolean value represents.

### Why are the changes needed?
Make DS V2 API more friendly to developers.

### Does this PR introduce _any_ user-facing change?
'No'.
Just update comments.

### How was this patch tested?
N/A

Closes #36092 from beliefer/SPARK-38391_SPARK-38768_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit c4397cb)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@beliefer
Copy link
Contributor Author

@dongjoon-hyun Thank you. @huaxingao Thank you for your review.

chenzhx pushed a commit to chenzhx/spark that referenced this pull request Jun 1, 2022
…` and `pushTopN` of `PushDownUtils`

### What changes were proposed in this pull request?
`pushLimit` and `pushTopN` of `PushDownUtils` returns tuple of boolean. It will be good to explain what the boolean value represents.

### Why are the changes needed?
Make DS V2 API more friendly to developers.

### Does this PR introduce _any_ user-facing change?
'No'.
Just update comments.

### How was this patch tested?
N/A

Closes apache#36092 from beliefer/SPARK-38391_SPARK-38768_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
chenzhx added a commit to Kyligence/spark that referenced this pull request Jun 2, 2022
… to data source (#474)

* [SPARK-38768][SQL] Remove `Limit` from plan if complete push down limit to data source

### What changes were proposed in this pull request?
Currently, Spark supports push down limit to data source.
If limit could pushed down and Data source only have one partition, DS V2 still do limit again.
This PR want remove `Limit` from plan if complete push down limit to data source.

### Why are the changes needed?
Improve performance.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
Tests updated.

Closes apache#36043 from beliefer/SPARK-38768.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-38391][SPARK-38768][SQL][FOLLOWUP] Add comments for `pushLimit` and `pushTopN` of `PushDownUtils`

### What changes were proposed in this pull request?
`pushLimit` and `pushTopN` of `PushDownUtils` returns tuple of boolean. It will be good to explain what the boolean value represents.

### Why are the changes needed?
Make DS V2 API more friendly to developers.

### Does this PR introduce _any_ user-facing change?
'No'.
Just update comments.

### How was this patch tested?
N/A

Closes apache#36092 from beliefer/SPARK-38391_SPARK-38768_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

* [SPARK-37960][SQL][FOLLOWUP] Make the testing CASE WHEN query more reasonable

### What changes were proposed in this pull request?
Some testing CASE WHEN queries are not carefully written and do not make sense. In the future, the optimizer may get smarter and get rid of the CASE WHEN completely, and then we loose test coverage.

This PR updates some CASE WHEN queries to make them more reasonable.

### Why are the changes needed?
future-proof test coverage.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
N/A

Closes apache#36125 from beliefer/SPARK-37960_followup3.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* update spark version

Co-authored-by: Jiaan Geng <beliefer@163.com>
leejaywei pushed a commit to Kyligence/spark that referenced this pull request Jul 14, 2022
… to data source (#474)

* [SPARK-38768][SQL] Remove `Limit` from plan if complete push down limit to data source

Currently, Spark supports push down limit to data source.
If limit could pushed down and Data source only have one partition, DS V2 still do limit again.
This PR want remove `Limit` from plan if complete push down limit to data source.

Improve performance.

'No'.
New feature.

Tests updated.

Closes apache#36043 from beliefer/SPARK-38768.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-38391][SPARK-38768][SQL][FOLLOWUP] Add comments for `pushLimit` and `pushTopN` of `PushDownUtils`

`pushLimit` and `pushTopN` of `PushDownUtils` returns tuple of boolean. It will be good to explain what the boolean value represents.

Make DS V2 API more friendly to developers.

'No'.
Just update comments.

N/A

Closes apache#36092 from beliefer/SPARK-38391_SPARK-38768_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

* [SPARK-37960][SQL][FOLLOWUP] Make the testing CASE WHEN query more reasonable

Some testing CASE WHEN queries are not carefully written and do not make sense. In the future, the optimizer may get smarter and get rid of the CASE WHEN completely, and then we loose test coverage.

This PR updates some CASE WHEN queries to make them more reasonable.

future-proof test coverage.

'No'.

N/A

Closes apache#36125 from beliefer/SPARK-37960_followup3.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* update spark version

Co-authored-by: Jiaan Geng <beliefer@163.com>
RolatZhang pushed a commit to Kyligence/spark that referenced this pull request Aug 29, 2023
… to data source (#474)

* [SPARK-38768][SQL] Remove `Limit` from plan if complete push down limit to data source

Currently, Spark supports push down limit to data source.
If limit could pushed down and Data source only have one partition, DS V2 still do limit again.
This PR want remove `Limit` from plan if complete push down limit to data source.

Improve performance.

'No'.
New feature.

Tests updated.

Closes apache#36043 from beliefer/SPARK-38768.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-38391][SPARK-38768][SQL][FOLLOWUP] Add comments for `pushLimit` and `pushTopN` of `PushDownUtils`

`pushLimit` and `pushTopN` of `PushDownUtils` returns tuple of boolean. It will be good to explain what the boolean value represents.

Make DS V2 API more friendly to developers.

'No'.
Just update comments.

N/A

Closes apache#36092 from beliefer/SPARK-38391_SPARK-38768_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

* [SPARK-37960][SQL][FOLLOWUP] Make the testing CASE WHEN query more reasonable

Some testing CASE WHEN queries are not carefully written and do not make sense. In the future, the optimizer may get smarter and get rid of the CASE WHEN completely, and then we loose test coverage.

This PR updates some CASE WHEN queries to make them more reasonable.

future-proof test coverage.

'No'.

N/A

Closes apache#36125 from beliefer/SPARK-37960_followup3.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* update spark version

Co-authored-by: Jiaan Geng <beliefer@163.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants