Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-38768][SQL] Remove Limit from plan if complete push down limit to data source #474

Merged

Conversation

chenzhx
Copy link

@chenzhx chenzhx commented Jun 1, 2022

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

beliefer and others added 4 commits June 1, 2022 21:00
…it to data source

### What changes were proposed in this pull request?
Currently, Spark supports push down limit to data source.
If limit could pushed down and Data source only have one partition, DS V2 still do limit again.
This PR want remove `Limit` from plan if complete push down limit to data source.

### Why are the changes needed?
Improve performance.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
Tests updated.

Closes apache#36043 from beliefer/SPARK-38768.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…` and `pushTopN` of `PushDownUtils`

### What changes were proposed in this pull request?
`pushLimit` and `pushTopN` of `PushDownUtils` returns tuple of boolean. It will be good to explain what the boolean value represents.

### Why are the changes needed?
Make DS V2 API more friendly to developers.

### Does this PR introduce _any_ user-facing change?
'No'.
Just update comments.

### How was this patch tested?
N/A

Closes apache#36092 from beliefer/SPARK-38391_SPARK-38768_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…asonable

### What changes were proposed in this pull request?
Some testing CASE WHEN queries are not carefully written and do not make sense. In the future, the optimizer may get smarter and get rid of the CASE WHEN completely, and then we loose test coverage.

This PR updates some CASE WHEN queries to make them more reasonable.

### Why are the changes needed?
future-proof test coverage.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
N/A

Closes apache#36125 from beliefer/SPARK-37960_followup3.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@chenzhx chenzhx merged commit 0553594 into Kyligence:kyspark-3.2.x-4.x Jun 2, 2022
leejaywei pushed a commit that referenced this pull request Jul 14, 2022
… to data source (#474)

* [SPARK-38768][SQL] Remove `Limit` from plan if complete push down limit to data source

Currently, Spark supports push down limit to data source.
If limit could pushed down and Data source only have one partition, DS V2 still do limit again.
This PR want remove `Limit` from plan if complete push down limit to data source.

Improve performance.

'No'.
New feature.

Tests updated.

Closes apache#36043 from beliefer/SPARK-38768.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-38391][SPARK-38768][SQL][FOLLOWUP] Add comments for `pushLimit` and `pushTopN` of `PushDownUtils`

`pushLimit` and `pushTopN` of `PushDownUtils` returns tuple of boolean. It will be good to explain what the boolean value represents.

Make DS V2 API more friendly to developers.

'No'.
Just update comments.

N/A

Closes apache#36092 from beliefer/SPARK-38391_SPARK-38768_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

* [SPARK-37960][SQL][FOLLOWUP] Make the testing CASE WHEN query more reasonable

Some testing CASE WHEN queries are not carefully written and do not make sense. In the future, the optimizer may get smarter and get rid of the CASE WHEN completely, and then we loose test coverage.

This PR updates some CASE WHEN queries to make them more reasonable.

future-proof test coverage.

'No'.

N/A

Closes apache#36125 from beliefer/SPARK-37960_followup3.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* update spark version

Co-authored-by: Jiaan Geng <beliefer@163.com>
RolatZhang pushed a commit that referenced this pull request Aug 29, 2023
… to data source (#474)

* [SPARK-38768][SQL] Remove `Limit` from plan if complete push down limit to data source

Currently, Spark supports push down limit to data source.
If limit could pushed down and Data source only have one partition, DS V2 still do limit again.
This PR want remove `Limit` from plan if complete push down limit to data source.

Improve performance.

'No'.
New feature.

Tests updated.

Closes apache#36043 from beliefer/SPARK-38768.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-38391][SPARK-38768][SQL][FOLLOWUP] Add comments for `pushLimit` and `pushTopN` of `PushDownUtils`

`pushLimit` and `pushTopN` of `PushDownUtils` returns tuple of boolean. It will be good to explain what the boolean value represents.

Make DS V2 API more friendly to developers.

'No'.
Just update comments.

N/A

Closes apache#36092 from beliefer/SPARK-38391_SPARK-38768_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

* [SPARK-37960][SQL][FOLLOWUP] Make the testing CASE WHEN query more reasonable

Some testing CASE WHEN queries are not carefully written and do not make sense. In the future, the optimizer may get smarter and get rid of the CASE WHEN completely, and then we loose test coverage.

This PR updates some CASE WHEN queries to make them more reasonable.

future-proof test coverage.

'No'.

N/A

Closes apache#36125 from beliefer/SPARK-37960_followup3.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* update spark version

Co-authored-by: Jiaan Geng <beliefer@163.com>
baibaichen pushed a commit that referenced this pull request Jun 18, 2024
### What changes were proposed in this pull request?
apache#41458 updated `numeric.sql.out` but not update `numeric.sql.out.java21`, this pr updated `numeric.sql.out.java21` for Java 21.

### Why are the changes needed?
Fix golden file for Java 21.

https://github.com/apache/spark/actions/runs/5362442727/jobs/9729315685

```
[info] - postgreSQL/numeric.sql *** FAILED *** (1 minute, 4 seconds)
[info]   postgreSQL/numeric.sql
[info]   Expected "...OLUMN_ARITY_MISMATCH[",
[info]     "sqlState" : "21S01",
[info]     "messageParameters" : {
[info]       "dataColumns" : "'id', 'id', 'val', 'val', '(val * val)'",
[info]       "reason" : "too many data columns",
[info]       "tableColumns" : "'id1', 'id2', 'result']",
[info]       "tableName" :...", but got "...OLUMN_ARITY_MISMATCH[.TOO_MANY_DATA_COLUMNS",
[info]     "sqlState" : "21S01",
[info]     "messageParameters" : {
[info]       "dataColumns" : "`id`, `id`, `val`, `val`, `(val * val)`",
[info]       "tableColumns" : "`id1`, `id2`, `result`]",
[info]       "tableName" :..." Result did not match for query #474
[info]   INSERT INTO num_result SELECT t1.id, t2.id, t1.val, t2.val, t1.val * t2.val
[info]       FROM num_data t1, num_data t2 (SQLQueryTestSuite.scala:848)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
[info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
[info]   at org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564)
[info]   at org.scalatest.Assertions.assertResult(Assertions.scala:847)
[info]   at org.scalatest.Assertions.assertResult$(Assertions.scala:842)
[info]   at org.scalatest.funsuite.AnyFunSuite.assertResult(AnyFunSuite.scala:1564)
[info]   at org.apache.spark.sql.SQLQueryTestSuite.$anonfun$readGoldenFileAndCompareResults$3(SQLQueryTestSuite.scala:848)
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- Manual checked using Java 21

Closes apache#41720 from LuciferYang/SPARK-43969-FOLLOWUP-2.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants