[SPARK-33372][SQL] Fix InSet bucket pruning #30279

Closed
wants to merge 1 commit into from

@wangyum (Member) commented Nov 6, 2020

What changes were proposed in this pull request?

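This PR fixes `InSet` bucket pruning. `InSet` values are not `Literal`s: when `OptimizeIn` converts an `In` whose list is longer than `spark.sql.optimizer.inSetConversionThreshold`, it evaluates each element and stores the raw values:

```scala
} else if (newList.length > SQLConf.get.optimizerInSetConversionThreshold) {
  // Each element is evaluated, so the HashSet holds plain values, not Literal expressions
  val hSet = newList.map(e => e.eval(EmptyRow))
  InSet(v, HashSet() ++ hSet)
```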
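To make the failure mode concrete, here is a simplified, dependency-free Scala sketch. The `Expr`, `Attr`, `In`, `InSet`, and `Literal` case classes and the `optimizeIn`, `bucketValuesBuggy`, and `bucketValuesFixed` functions are hypothetical stand-ins, not Spark's Catalyst API. The sketch also assumes that the pre-fix pruning rule only matched an `InSet` whose elements were all `Literal`s. With the default `spark.sql.optimizer.inSetConversionThreshold` of 10, an 11-element `IN` list is converted to `InSet`, and such a `Literal` check can then never succeed.

```scala
// Hypothetical, simplified model of Catalyst expressions (NOT Spark's real classes).
sealed trait Expr
final case class Literal(value: Any) extends Expr
final case class Attr(name: String) extends Expr
final case class In(child: Expr, list: Seq[Literal]) extends Expr
// As in Spark's InSet, the set holds already-evaluated values, not Literal expressions.
final case class InSet(child: Expr, hset: Set[Any]) extends Expr

// Mirrors the OptimizeIn excerpt: past the threshold, each literal is evaluated
// (for a Literal, evaluation just yields its value) and the raw values go into InSet.
def optimizeIn(in: In, threshold: Int): Expr =
  if (in.list.length > threshold) InSet(in.child, in.list.map(_.value).toSet)
  else in

// ASSUMED pre-fix pruning guard: requires Literal elements, which InSet never holds.
def bucketValuesBuggy(e: Expr, bucketCol: String): Option[Set[Any]] = e match {
  case InSet(Attr(name), hset) if hset.forall(_.isInstanceOf[Literal]) && name == bucketCol =>
    Some(hset.map(v => Literal(v).value))
  case _ => None
}

// Fixed guard: use the evaluated values directly.
def bucketValuesFixed(e: Expr, bucketCol: String): Option[Set[Any]] = e match {
  case InSet(Attr(name), hset) if name == bucketCol => Some(hset)
  case _ => None
}

val filter = In(Attr("a"), (1 to 11).map(i => Literal(i)))
val converted = optimizeIn(filter, threshold = 10)
println(bucketValuesBuggy(converted, "a")) // None: the Literal guard never matches
println(bucketValuesFixed(converted, "a").map(_.size)) // Some(11)
```

Because `optimizeIn` replaces every `Literal` with its raw value, `hset.forall(_.isInstanceOf[Literal])` is false for any non-empty set, so the buggy rule silently returns no bucket values.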

Why are the changes needed?

To fix a bug: bucket pruning did not take effect for filters that the optimizer had converted to `InSet`.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test and manual test:

```scala
spark.sql("select id as a, id as b from range(10000)").write.bucketBy(100, "a").saveAsTable("t")
spark.sql("select * from t where a in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)").show
```
Before this PR | After this PR (screenshots not preserved)
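With pruning working, the manual test above only needs the buckets that can hold one of the 11 `IN`-list values. The Scala sketch below illustrates that selection step. `bucketId` and `selectedBuckets` are hypothetical helpers, and `bucketId` uses Scala's `##` as a stand-in hash; Spark actually computes the bucket id from a Murmur3 hash of the bucket column taken modulo the number of buckets, so the specific bucket numbers here will not match a real table.

```scala
// Hypothetical helper. ASSUMPTION: `##` is a placeholder hash; Spark instead uses
// a Murmur3 hash of the bucket column, so real bucket ids will differ.
def bucketId(value: Any, numBuckets: Int): Int = Math.floorMod(value.##, numBuckets)

// Bucket pruning: only buckets that can contain one of the IN-list values
// need to be scanned; every other bucket file can be skipped.
def selectedBuckets(values: Set[Any], numBuckets: Int): Set[Int] =
  values.map(v => bucketId(v, numBuckets))

val inValues: Set[Any] = (1 to 11).toSet
val buckets = selectedBuckets(inValues, numBuckets = 100)
println(s"scan ${buckets.size} of 100 buckets") // scan 11 of 100 buckets
```

Since several values can hash to the same bucket, the selected set has at most one bucket per distinct value; if the rule returns nothing, no bucket can be skipped.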

@github-actions github-actions bot added the SQL label Nov 6, 2020
SparkQA commented Nov 6, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35337/

SparkQA commented Nov 6, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35337/

SparkQA commented Nov 6, 2020

Test build #130728 has finished for PR 30279 at commit 203025e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor):

thanks, merging to master/3.0!

@cloud-fan cloud-fan closed this in 69799c5 Nov 9, 2020
cloud-fan pushed a commit that referenced this pull request Nov 9, 2020
### What changes were proposed in this pull request?

This PR fixes `InSet` bucket pruning, because `InSet` values are not `Literal`s:
https://github.com/apache/spark/blob/cbd3fdea62dab73fc4a96702de8fd1f07722da66/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala#L253-L255

### Why are the changes needed?

Fix bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test and manual test:

```scala
spark.sql("select id as a, id as b from range(10000)").write.bucketBy(100, "a").saveAsTable("t")
spark.sql("select * from t where a in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)").show
```

Before this PR | After this PR
-- | --
![image](https://user-images.githubusercontent.com/5399861/98380788-fb120980-2083-11eb-8fae-4e21ad873e9b.png) | ![image](https://user-images.githubusercontent.com/5399861/98381095-5ba14680-2084-11eb-82ca-2d780c85305c.png)

Closes #30279 from wangyum/SPARK-33372.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 69799c5)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@wangyum wangyum deleted the SPARK-33372 branch November 9, 2020 09:01
@maropu (Member) commented Nov 9, 2020

late LGTM. Nice catch.

@dongjoon-hyun (Member) commented Nov 9, 2020

Hi, @cloud-fan, @wangyum, and @maropu.
Can we have this in branch-2.4 too, since the original code came from SPARK-23803 (Apache Spark 2.4.0)?

@cloud-fan (Contributor):

@wangyum can you open a PR for 2.4?

wangyum added a commit that referenced this pull request Nov 10, 2020
### What changes were proposed in this pull request?

This is a backport of #30279.

This PR fixes `InSet` bucket pruning, because `InSet` values are not `Literal`s:
https://github.com/apache/spark/blob/cbd3fdea62dab73fc4a96702de8fd1f07722da66/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala#L253-L255

### Why are the changes needed?

Fix bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #30308 from wangyum/SPARK-33372-2.4.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
5 participants