
[SPARK-17914][SQL] Fix parsing of timestamp strings with nanoseconds #18252

Closed

Conversation

aokolnychyi
Contributor

This PR contains a tiny change that fixes how Spark parses string literals into timestamps. Currently, some timestamps that contain nanoseconds are corrupted during the conversion from UTF8Strings into the internal timestamp representation.

Consider the following example:

```
spark.sql("SELECT cast('2015-01-02 00:00:00.000000001' as TIMESTAMP)").show(false)
+------------------------------------------------+
|CAST(2015-01-02 00:00:00.000000001 AS TIMESTAMP)|
+------------------------------------------------+
|2015-01-02 00:00:00.000001                      |
+------------------------------------------------+
```

The fix was verified with the existing tests, and a new test was added to cover the cases that did not work previously.
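To illustrate the intended behaviour, here is a minimal, self-contained sketch of the kind of truncation described above; the names `truncateToMicros`, `fraction`, and `digits` are illustrative only and are not Spark's actual internals:

```scala
// Minimal sketch, not the actual Spark code: reduce a parsed fractional-second
// value to microsecond precision instead of misreading the extra digits.
def truncateToMicros(fraction: Long, digits: Int): Long = {
  var value = fraction
  var d = digits
  // Pad to 6 digits when fewer were supplied (e.g. ".1" means 100000 microseconds).
  while (d < 6) { value *= 10; d += 1 }
  // Drop digits beyond microseconds; this is lossy for nanosecond-precision input.
  while (d > 6) { value /= 10; d -= 1 }
  value
}

// "000000001" parses to fraction = 1 with digits = 9, which should truncate to
// 0 microseconds rather than being misinterpreted as 1 microsecond.
println(truncateToMicros(1L, 9)) // 0
println(truncateToMicros(1L, 3)) // 1000 (".001" is one millisecond)
```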

@SparkQA

SparkQA commented Jun 9, 2017

Test build #77844 has finished for PR 18252 at commit 2f232a7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin
Member

ueshin commented Jun 9, 2017

Looks like DateTimeUtils.scala#L411-L414 serves the same purpose but isn't sufficient.
I guess we can remove those lines now.

@aokolnychyi
Contributor Author

@ueshin good point, thanks.

@SparkQA

SparkQA commented Jun 9, 2017

Test build #77857 has finished for PR 18252 at commit 4d057c9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```
@@ -399,13 +399,13 @@ object DateTimeUtils {
        digitsMilli += 1
      }

      if (!justTime && isInvalidDate(segments(0), segments(1), segments(2))) {
        return None
      while (digitsMilli > 6) {
```
Contributor

@wzhfy wzhfy Jun 9, 2017


Could we add a comment indicating that we are truncating the nanosecond part and that it's lossy?

Contributor Author


@wzhfy done

```
@@ -32,7 +32,7 @@ import org.apache.spark.unsafe.types.UTF8String
  * Helper functions for converting between internal and external date and time representations.
  * Dates are exposed externally as java.sql.Date and are represented internally as the number of
  * dates since the Unix epoch (1970-01-01). Timestamps are exposed externally as java.sql.Timestamp
- * and are stored internally as longs, which are capable of storing timestamps with 100 nanosecond
+ * and are stored internally as longs, which are capable of storing timestamps with microsecond
```
Contributor


100 ns is different from microseconds, isn't it?

Contributor Author


Sure, but the previous comment was no longer correct: it was introduced in an earlier commit, and the logic was changed afterwards, so the precision is now only up to microseconds.
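For context, a long holding microseconds since the epoch is what limits the precision. A rough, illustrative conversion from java.sql.Timestamp (a sketch, not Spark's actual DateTimeUtils code) looks like this:

```scala
import java.sql.Timestamp

// Sketch only: convert a java.sql.Timestamp to microseconds since the epoch.
// getTime already includes whole milliseconds, so only the sub-millisecond part
// of getNanos is added; anything finer than a microsecond is simply lost.
def toMicros(ts: Timestamp): Long =
  ts.getTime * 1000L + (ts.getNanos / 1000) % 1000

val ts = Timestamp.valueOf("2015-01-02 00:00:00.000000001")
println(toMicros(ts)) // the trailing nanosecond cannot be represented
```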

@SparkQA

SparkQA commented Jun 10, 2017

Test build #77870 has finished for PR 18252 at commit a498f83.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wzhfy
Contributor

wzhfy commented Jun 10, 2017

LGTM

@aokolnychyi
Contributor Author

@wzhfy @rxin @ueshin could someone please merge this?

asfgit pushed a commit that referenced this pull request Jun 12, 2017

Author: aokolnychyi <anton.okolnychyi@sap.com>

Closes #18252 from aokolnychyi/spark-17914.

(cherry picked from commit ca4e960)
Signed-off-by: Takuya UESHIN <ueshin@databricks.com>
@ueshin
Member

ueshin commented Jun 12, 2017

Thanks! Merging to master/2.2.

@asfgit asfgit closed this in ca4e960 Jun 12, 2017
dataknocker pushed a commit to dataknocker/spark that referenced this pull request Jun 16, 2017

Author: aokolnychyi <anton.okolnychyi@sap.com>

Closes apache#18252 from aokolnychyi/spark-17914.