Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-27421][SQL][2.4] Fix filter for int column and value class java.lang.String when pruning partition column #30422

Closed
wants to merge 2 commits into from

Conversation

wangyum
Copy link
Member

@wangyum wangyum commented Nov 19, 2020

This pr backport #30380 to branch-2.4.

What changes were proposed in this pull request?

This pr fix filter for int column and value class java.lang.String when pruning partition column.

How to reproduce this issue:

spark.sql("CREATE table test (name STRING) partitioned by (id int) STORED AS PARQUET")
spark.sql("CREATE VIEW test_view as select cast(id as string) as id, name from test")
spark.sql("SELECT * FROM test_view WHERE id = '0'").explain
20/11/15 06:19:01 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_partitions_by_filter : db=default tbl=test
20/11/15 06:19:01 INFO MetaStoreDirectSql: Unable to push down SQL filter: Cannot push down filter for int column and value class java.lang.String
20/11/15 06:19:01 ERROR SparkSQLDriver: Failed in [SELECT * FROM test_view WHERE id = '0']
java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK
 at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828)
 at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getPartitionsByFilter$1(HiveClientImpl.scala:745)
 at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:294)
 at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
 at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
 at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:276)
 at org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:743)

Why are the changes needed?

Fix bug.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35929/

@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35929/

@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Test build #131325 has finished for PR 30422 at commit a97ce6a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Copy link
Member Author

wangyum commented Nov 19, 2020

retest this please.

@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35934/

@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35934/

@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Test build #131329 has finished for PR 30422 at commit a97ce6a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35940/

@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35940/

@HyukjinKwon
Copy link
Member

Hm, the test failures in Jenkins look related. But it passed in GitHub Actions ...:

[info] - 0.13: getPartitionsByFilter: chunk in ('ab', 'ba') and ((cast(ds as string)>'20170102') (37 milliseconds)

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Test build #131336 has finished for PR 30422 at commit 29ddfe9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35949/

@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35949/

@HyukjinKwon
Copy link
Member

Merged to branch-2.4.

HyukjinKwon pushed a commit that referenced this pull request Nov 19, 2020
…a.lang.String when pruning partition column

This pr backport #30380 to branch-2.4.

### What changes were proposed in this pull request?

This pr fix filter for int column and value class java.lang.String when pruning partition column.

How to reproduce this issue:
```scala
spark.sql("CREATE table test (name STRING) partitioned by (id int) STORED AS PARQUET")
spark.sql("CREATE VIEW test_view as select cast(id as string) as id, name from test")
spark.sql("SELECT * FROM test_view WHERE id = '0'").explain
```
```
20/11/15 06:19:01 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_partitions_by_filter : db=default tbl=test
20/11/15 06:19:01 INFO MetaStoreDirectSql: Unable to push down SQL filter: Cannot push down filter for int column and value class java.lang.String
20/11/15 06:19:01 ERROR SparkSQLDriver: Failed in [SELECT * FROM test_view WHERE id = '0']
java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK
 at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828)
 at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getPartitionsByFilter$1(HiveClientImpl.scala:745)
 at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:294)
 at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
 at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
 at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:276)
 at org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:743)
```

### Why are the changes needed?

Fix bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #30422 from wangyum/SPARK-27421-2.4.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
@wangyum wangyum closed this Nov 19, 2020
@wangyum wangyum deleted the SPARK-27421-2.4 branch November 19, 2020 13:19
@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Test build #131345 has finished for PR 30422 at commit 29ddfe9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants