[SPARK-33477][SQL] Hive Metastore support filter by date type #30408
wangyum wants to merge 4 commits into apache:master from wangyum:SPARK-33477
Conversation
Test build #131274 has finished for PR 30408 at commit

Kubernetes integration test starting

Kubernetes integration test status success
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
    (Literal(1) === a("intcol", IntegerType)) :: (Literal("a") === a("strcol", IntegerType)) :: Nil,
    "1 = intcol and \"a\" = strcol")

  filterTest("date filter",
Do we run these tests with different Hive versions?

Different Hive versions are tested by HivePartitionFilteringSuite:
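The snippet linked there is not preserved in this thread. As a hedged, minimal sketch of the version-parameterized pattern (the class names, version list, and test body below are assumptions, not the real Spark test code):

```scala
import org.scalatest.Suites
import org.scalatest.funsuite.AnyFunSuite

// One suite instance per Hive version; each instance would build a metastore
// client for that version and run the same partition-filter tests against it.
class PartitionFilteringForVersion(version: String) extends AnyFunSuite {
  test(s"date filter is pushed down (Hive $version)") {
    // build a HiveClient for `version` and assert on getPartitionsByFilter here
  }
}

// Aggregate suite that fans the same tests out over several Hive versions.
class AllHiveVersionsSuites extends Suites(
  Seq("0.13", "1.2", "2.3", "3.1").map(new PartitionFilteringForVersion(_)): _*)
```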
@wangyum can you resolve the conflicts? thanks!

# Conflicts:
#   sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
#   sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
#   sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HivePartitionFilteringSuite.scala
Last question about correctness: does Hive execute the partition predicate as a date comparison or a string comparison? The latter can be problematic.
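For context, a small standalone illustration (not from the PR) of why a string comparison could be problematic: zero-padded yyyy-MM-dd strings happen to order like dates, but other formats do not.

```scala
import java.time.LocalDate

object DateVsStringComparison extends App {
  // As dates, 2019-01-02 is before 2019-01-10.
  println(LocalDate.parse("2019-01-02").isBefore(LocalDate.parse("2019-01-10"))) // true

  // Zero-padded ISO strings happen to sort the same way...
  println("2019-01-02".compareTo("2019-01-10") < 0) // true

  // ...but non-padded strings do not, which is where a string comparison breaks.
  println("2019-1-2".compareTo("2019-1-10") < 0) // false
}
```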
Kubernetes integration test starting

Kubernetes integration test status success

Test build #131357 has finished for PR 30408 at commit
| "2019-01-01 = datecol and \"a\" = strcol") | ||
|
|
||
| filterTest("date filter with null", | ||
| (a("datecol", DateType) === Literal(null)) :: Nil, |
Not related to this PR, but we can push down the `col is null` predicate to Hive for this case.

Can we create an `attr` method to get the AttributeReference from the table, to follow the other tests?
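A hedged sketch of what such a helper could look like; passing the partition schema explicitly is an assumption here (the real suite would get it from the test table):

```scala
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.types.StructType

// Hypothetical helper: look the column up in the table's partition schema and
// build an AttributeReference with the matching name and data type.
def attr(partitionSchema: StructType)(name: String): AttributeReference = {
  val field = partitionSchema.fields
    .find(_.name == name)
    .getOrElse(sys.error(s"No partition column named $name"))
  AttributeReference(field.name, field.dataType)()
}
```

A test could then reference a partition column by name without repeating its data type in every filter expression.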
cloud-fan left a comment:
LGTM except one comment for the test.
  def unapply(values: Set[Any]): Option[Seq[String]] = {
    val extractables = values.toSeq.map(valueToLiteralString.lift)
    if (extractables.nonEmpty && extractables.forall(_.isDefined)) {
Why do we need `forall` here? Can InSet have mixed values: int and other types?
Otherwise this test will fail:

  filterTest("string filter with InSet predicate",
    (InSet(a("stringcol", StringType),
      Range(1, 3).map(d => UTF8String.fromString(d.toString)).toSet)) :: Nil,
    "(stringcol = \"1\" or stringcol = \"2\")")

  None.get
  java.util.NoSuchElementException: None.get
    at scala.None$.get(Option.scala:529)
    at scala.None$.get(Option.scala:527)
    at org.apache.spark.sql.hive.client.Shim_v0_13$ExtractableDateValues$1$.$anonfun$unapply$7(HiveShim.scala:720)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
    at scala.collection.Iterator.foreach(Iterator.scala:941)
    at scala.collection.Iterator.foreach$(Iterator.scala:941)
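For readers following along, a small standalone sketch (not the actual Shim_v0_13 code) of why the `forall` guard matters when the set contains values of mixed types:

```scala
// Lifting the partial function turns values it cannot convert into None, and the
// forall(_.isDefined) guard is what keeps the later .get calls from throwing the
// NoSuchElementException shown above.
object LiftForallExample extends App {
  val valueToLiteralString: PartialFunction[Any, String] = {
    case value: Int => value.toString
  }

  // e.g. an InSet whose values are not all of the extractor's expected type
  val mixed: Set[Any] = Set(1, "a")
  val extractables = mixed.toSeq.map(valueToLiteralString.lift)

  if (extractables.nonEmpty && extractables.forall(_.isDefined)) {
    println(extractables.map(_.get)) // safe: every value was convertible
  } else {
    println("skip: not all values are extractable") // this branch runs for `mixed`
  }
}
```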
Test build #131624 has finished for PR 30408 at commit

retest this please

Test build #131631 has finished for PR 30408 at commit
@shaneknapp Did you set : This issue should be fixed if we set
@wangyum How about asking it in the spark-dev thread so that Shane could notice it quickly? http://apache-spark-developers-list.1001551.n3.nabble.com/jenkins-downtime-tomorrow-evening-weekend-tt30405.html
retest this please

Test build #131689 has finished for PR 30408 at commit
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/FiltersSuite.scala
Test build #131740 has finished for PR 30408 at commit

retest this please.

Merged to master.

Test build #131751 has finished for PR 30408 at commit
  val partitions =
    for {
      date <- Seq("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04")
@wangyum Could you add more test cases to check the NULL handling cases? For example,
Please check https://spark.apache.org/docs/3.0.1/sql-ref-null-semantics.html#comp-operators

OK
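The reviewer's concrete example is not preserved above. As a purely hypothetical sketch of the kind of cases meant, written in the filterTest style used elsewhere in this suite (the expected filter strings are assumptions, not the PR's actual assertions):

```scala
// Hypothetical cases: per SQL null semantics, `datecol = NULL` never matches any
// row, and `datecol <=> NULL` is only true for a NULL partition value, so neither
// is expected to produce a metastore filter string here.
filterTest("date filter with null on the left",
  (Literal(null) === a("datecol", DateType)) :: Nil,
  "") // assumption: nothing is pushed down

filterTest("date filter with null-safe equality",
  (a("datecol", DateType) <=> Literal(null)) :: Nil,
  "") // assumption: nothing is pushed down
```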
What changes were proposed in this pull request?
Hive Metastore supports strings and integral types in filters. It could also support dates. Please see HIVE-5679 for more details.
This PR adds support for it.
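As a hedged illustration of the effect (the table and column names are made up; the converted filter shape follows the FiltersSuite expectation quoted above):

```scala
import org.apache.spark.sql.SparkSession

object DatePartitionPruningExample extends App {
  val spark = SparkSession.builder()
    .appName("date-partition-filter")
    .enableHiveSupport()
    .getOrCreate()

  // Assuming a Hive table `logs` partitioned by a DATE column `datecol`:
  // with this change the predicate below can be converted into a metastore
  // filter string shaped like "2019-01-01 = datecol" (see the test expectation
  // above), so the metastore returns only the matching partitions instead of
  // Spark listing all partitions and pruning them client-side.
  spark.sql("SELECT * FROM logs WHERE datecol = DATE '2019-01-01'").show()
}
```

Only partition listing against the Hive Metastore changes; query results stay the same.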
Why are the changes needed?
Improve query performance.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit test.