[SPARK-27421][SQL][2.4] Fix filter for int column and value class java.lang.String when pruning partition column #30422

wangyum · 2020-11-19T06:34:21Z

This PR backports #30380 to branch-2.4.

What changes were proposed in this pull request?

This PR fixes the partition-pruning filter for an int partition column compared with a value of class java.lang.String.
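A simplified model of how a partition predicate is turned into a metastore filter string makes the change easier to see. The sketch below is a hypothetical, dependency-free Scala model, not Spark's HiveShim code: `Expr`, `Attr`, `Cast`, `PushDownSketch.convert` and the other names are illustrative only. It assumes the fix amounts to looking through a cast around a partition column only when both the column type and the cast's target type are integral. Under that assumption, `cast(id as string) = '0'` on an int column is no longer sent to the metastore as an int-column-versus-String comparison, and is instead left out of the metastore filter.

```scala
// Hypothetical, simplified model of metastore partition-filter push-down.
// NOT Spark's HiveShim code: every type and method here is illustrative.
sealed trait DataType
case object IntType extends DataType
case object LongType extends DataType
case object StringType extends DataType

sealed trait Expr
final case class Attr(name: String, dataType: DataType) extends Expr
final case class Cast(child: Expr, to: DataType) extends Expr
final case class IntLit(value: Int) extends Expr
final case class StrLit(value: String) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr
final case class And(left: Expr, right: Expr) extends Expr

object PushDownSketch {
  private def isIntegral(dt: DataType): Boolean = dt == IntType || dt == LongType

  // Look through a cast only if it goes from one integral type to another.
  // cast(intCol AS STRING) is deliberately NOT unwrapped (the assumed fix),
  // so its string comparison never reaches the metastore as "intCol = '0'".
  private def extractAttr(e: Expr): Option[Attr] = e match {
    case a: Attr => Some(a)
    case Cast(a: Attr, to) if isIntegral(a.dataType) && isIntegral(to) => Some(a)
    case _ => None
  }

  // Render a literal only when its kind matches the column's type.
  private def literalFor(attr: Attr, e: Expr): Option[String] = (attr.dataType, e) match {
    case (dt, IntLit(v)) if isIntegral(dt) => Some(v.toString)
    case (StringType, StrLit(v))           => Some("\"" + v + "\"")
    case _                                 => None
  }

  // Returns the metastore filter string, or None if nothing can be pushed down.
  // For AND, an unconvertible side is dropped; this only widens the set of
  // partitions fetched, assuming the caller re-applies the full predicate.
  def convert(e: Expr): Option[String] = e match {
    case EqualTo(lhs, rhs) =>
      for {
        a <- extractAttr(lhs)
        v <- literalFor(a, rhs)
      } yield s"${a.name} = $v"
    case And(l, r) =>
      val parts = convert(l).toList ++ convert(r).toList
      if (parts.isEmpty) None else Some(parts.mkString("(", " and ", ")"))
    case _ => None
  }
}

object PushDownDemo {
  def main(args: Array[String]): Unit = {
    // The view's predicate: cast(id AS STRING) = '0' on an int partition column.
    println(PushDownSketch.convert(EqualTo(Cast(Attr("id", IntType), StringType), StrLit("0"))))
    // prints: None
    println(PushDownSketch.convert(EqualTo(Cast(Attr("id", IntType), LongType), IntLit(0))))
    // prints: Some(id = 0)
  }
}
```

Under these assumptions, an int-to-bigint cast is still looked through and pushed down, while the int-to-string cast produced by the view keeps that predicate out of the metastore call entirely.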

How to reproduce this issue:

spark.sql("CREATE table test (name STRING) partitioned by (id int) STORED AS PARQUET")
spark.sql("CREATE VIEW test_view as select cast(id as string) as id, name from test")
spark.sql("SELECT * FROM test_view WHERE id = '0'").explain

20/11/15 06:19:01 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_partitions_by_filter : db=default tbl=test
20/11/15 06:19:01 INFO MetaStoreDirectSql: Unable to push down SQL filter: Cannot push down filter for int column and value class java.lang.String
20/11/15 06:19:01 ERROR SparkSQLDriver: Failed in [SELECT * FROM test_view WHERE id = '0']
java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK
 at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828)
 at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getPartitionsByFilter$1(HiveClientImpl.scala:745)
 at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:294)
 at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
 at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
 at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:276)
 at org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:743)

Why are the changes needed?

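Fix a bug: the query above fails with a RuntimeException because Spark sends the Hive metastore a partition filter that compares the int column id with a java.lang.String value, which the metastore cannot push down.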

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

SparkQA · 2020-11-19T07:00:25Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35929/

SparkQA · 2020-11-19T07:19:19Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35929/

SparkQA · 2020-11-19T08:05:02Z

Test build #131325 has finished for PR 30422 at commit a97ce6a.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

wangyum · 2020-11-19T08:09:15Z

retest this please.

SparkQA · 2020-11-19T08:40:18Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35934/

SparkQA · 2020-11-19T08:59:22Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35934/

SparkQA · 2020-11-19T09:40:07Z

Test build #131329 has finished for PR 30422 at commit a97ce6a.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-11-19T09:54:53Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35940/

SparkQA · 2020-11-19T10:13:11Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35940/

HyukjinKwon · 2020-11-19T10:47:27Z

Hm, the test failures in Jenkins look related, but the tests passed in GitHub Actions:

[info] - 0.13: getPartitionsByFilter: chunk in ('ab', 'ba') and ((cast(ds as string)>'20170102') (37 milliseconds)

HyukjinKwon · 2020-11-19T10:47:38Z

retest this please

SparkQA · 2020-11-19T11:48:50Z

Test build #131336 has finished for PR 30422 at commit 29ddfe9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-11-19T12:15:22Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35949/

SparkQA · 2020-11-19T12:25:49Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35949/

HyukjinKwon · 2020-11-19T13:14:02Z

Merged to branch-2.4.

Closes #30422 from wangyum/SPARK-27421-2.4.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>

SparkQA · 2020-11-19T13:35:57Z

Test build #131345 has finished for PR 30422 at commit 29ddfe9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

Backport SPARK-27421

a97ce6a

HyukjinKwon approved these changes Nov 19, 2020

Fix test

29ddfe9

wangyum closed this Nov 19, 2020

wangyum deleted the SPARK-27421-2.4 branch November 19, 2020 13:19
