[SPARK-22109][SQL][BRANCH-2.2] Resolves type conflicts between strings and timestamps in partition column #19333

Closed

Commits on Sep 23, 2017

  1. [SPARK-22109][SQL] Resolves type conflicts between strings and timestamps in partition column
    
    This PR proposes to resolve type conflicts between strings and timestamps in partition column values.
    It looks like we need to set the timezone, because resolving the conflict requires a cast between strings and timestamps (a minimal expression-level sketch follows the before/after output below).
    
    ```scala
    val df = Seq((1, "2015-01-01 00:00:00"), (2, "2014-01-01 00:00:00"), (3, "blah")).toDF("i", "str")
    val path = "/tmp/test.parquet"
    df.write.format("parquet").partitionBy("str").save(path)
    spark.read.parquet(path).show()
    ```
    
    **Before**
    
    ```
    java.util.NoSuchElementException: None.get
      at scala.None$.get(Option.scala:347)
      at scala.None$.get(Option.scala:345)
      at org.apache.spark.sql.catalyst.expressions.TimeZoneAwareExpression$class.timeZone(datetimeExpressions.scala:46)
      at org.apache.spark.sql.catalyst.expressions.Cast.timeZone$lzycompute(Cast.scala:172)
      at org.apache.spark.sql.catalyst.expressions.Cast.timeZone(Cast.scala:172)
      at org.apache.spark.sql.catalyst.expressions.Cast$$anonfun$castToString$3$$anonfun$apply$16.apply(Cast.scala:208)
      at org.apache.spark.sql.catalyst.expressions.Cast$$anonfun$castToString$3$$anonfun$apply$16.apply(Cast.scala:208)
      at org.apache.spark.sql.catalyst.expressions.Cast.org$apache$spark$sql$catalyst$expressions$Cast$$buildCast(Cast.scala:201)
      at org.apache.spark.sql.catalyst.expressions.Cast$$anonfun$castToString$3.apply(Cast.scala:207)
      at org.apache.spark.sql.catalyst.expressions.Cast.nullSafeEval(Cast.scala:533)
      at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:331)
      at org.apache.spark.sql.execution.datasources.PartitioningUtils$$anonfun$org$apache$spark$sql$execution$datasources$PartitioningUtils$$resolveTypeConflicts$1.apply(PartitioningUtils.scala:481)
      at org.apache.spark.sql.execution.datasources.PartitioningUtils$$anonfun$org$apache$spark$sql$execution$datasources$PartitioningUtils$$resolveTypeConflicts$1.apply(PartitioningUtils.scala:480)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    ```
    
    **After**
    
    ```
    +---+-------------------+
    |  i|                str|
    +---+-------------------+
    |  2|2014-01-01 00:00:00|
    |  1|2015-01-01 00:00:00|
    |  3|               blah|
    +---+-------------------+
    ```
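
    For illustration only, here is a minimal sketch of the expression-level cast that the stack trace above points at. It uses Spark's internal Catalyst API (`Cast`, `Literal`) rather than the public DataFrame API, and it is not the PR's actual patch; the point is just that supplying a `timeZoneId` lets a timestamp-to-string cast evaluate instead of hitting `None.get` in `TimeZoneAwareExpression`.
    
    ```scala
    // Sketch against Spark 2.2 internals; not the PR's patch itself.
    import org.apache.spark.sql.catalyst.expressions.{Cast, Literal}
    import org.apache.spark.sql.types.{StringType, TimestampType}
    
    // 2015-01-01 00:00:00 UTC expressed as microseconds since the epoch,
    // which is how TimestampType values are represented internally.
    val ts = Literal(1420070400000000L, TimestampType)
    
    // Without a timezone, evaluating this cast fails with None.get;
    // with an explicit timeZoneId it evaluates to a UTF8String.
    val casted = Cast(ts, StringType, timeZoneId = Some("UTC")).eval()
    println(casted)  // expected: 2015-01-01 00:00:00
    ```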
    
    Unit tests were added in `ParquetPartitionDiscoverySuite`, along with manual tests; an illustrative end-to-end check is sketched below.
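
    The actual tests live in `ParquetPartitionDiscoverySuite`; the following is only an illustrative, self-contained check (the object name and temp-path handling are hypothetical) that exercises the same scenario through the public API and asserts that the conflicting partition values fall back to `StringType`.
    
    ```scala
    // Illustrative standalone check, not the suite's actual test code.
    import java.nio.file.Files
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.StringType
    
    object Spark22109Check {  // hypothetical name
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[*]").appName("SPARK-22109").getOrCreate()
        import spark.implicits._
    
        // Write into a fresh sub-directory so the default ErrorIfExists mode is happy.
        val path = Files.createTempDirectory("spark-22109-").resolve("part").toString
    
        // Partition column mixes timestamp-like strings with a plain string.
        Seq((1, "2015-01-01 00:00:00"), (2, "2014-01-01 00:00:00"), (3, "blah"))
          .toDF("i", "str")
          .write.format("parquet").partitionBy("str").save(path)
    
        val df = spark.read.parquet(path)
        // With the fix, the partition column type falls back to StringType
        // instead of failing while casting between timestamps and strings.
        assert(df.schema("str").dataType == StringType)
        assert(df.count() == 3)
    
        spark.stop()
      }
    }
    ```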
    
    Author: hyukjinkwon <gurwls223@gmail.com>
    
    Closes apache#19331 from HyukjinKwon/SPARK-22109.
    HyukjinKwon committed Sep 23, 2017
    Commit: 42fa83c