
[SPARK-17914][SQL] Fix parsing of timestamp strings with nanoseconds #18252

Closed

Conversation

aokolnychyi
Contributor

This PR contains a tiny change that fixes how Spark parses string literals into timestamps. Currently, some timestamps that contain nanoseconds are corrupted during the conversion from UTF8Strings into the internal timestamp representation.

Consider the following example:

```
spark.sql("SELECT cast('2015-01-02 00:00:00.000000001' as TIMESTAMP)").show(false)
+------------------------------------------------+
|CAST(2015-01-02 00:00:00.000000001 AS TIMESTAMP)|
+------------------------------------------------+
|2015-01-02 00:00:00.000001                      |
+------------------------------------------------+
```

The fix was verified with the existing tests, and a new test was added to cover the cases that did not work previously.
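To illustrate the intended behaviour, here is a minimal, self-contained sketch of the kind of truncation described above; the names `truncateToMicros`, `fraction`, and `digits` are illustrative only and are not Spark's actual internals:

```scala
// Minimal sketch, not the actual Spark code: reduce a parsed fractional-second
// value to microsecond precision instead of misreading the extra digits.
def truncateToMicros(fraction: Long, digits: Int): Long = {
  var value = fraction
  var d = digits
  // Pad to 6 digits when fewer were supplied (e.g. ".1" means 100000 microseconds).
  while (d < 6) { value *= 10; d += 1 }
  // Drop digits beyond microseconds; this is lossy for nanosecond-precision input.
  while (d > 6) { value /= 10; d -= 1 }
  value
}

// "000000001" parses to fraction = 1 with digits = 9, which should truncate to
// 0 microseconds rather than being misinterpreted as 1 microsecond.
println(truncateToMicros(1L, 9)) // 0
println(truncateToMicros(1L, 3)) // 1000 (".001" is one millisecond)
```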

@SparkQA

SparkQA commented Jun 9, 2017

Test build #77844 has finished for PR 18252 at commit 2f232a7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin
Member

ueshin commented Jun 9, 2017

Looks like DateTimeUtils.scala#L411-L414 serves the same purpose but isn't sufficient.
I guess we can remove those lines now.

@aokolnychyi
Contributor Author

@ueshin good point, thanks.

@SparkQA

SparkQA commented Jun 9, 2017

Test build #77857 has finished for PR 18252 at commit 4d057c9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```
@@ -399,13 +399,13 @@ object DateTimeUtils {
        digitsMilli += 1
      }

      if (!justTime && isInvalidDate(segments(0), segments(1), segments(2))) {
        return None
      while (digitsMilli > 6) {
```
Contributor

@wzhfy wzhfy Jun 9, 2017


Could we add a comment indicating that we are truncating the nanosecond part and that it's lossy?

Contributor Author


@wzhfy done

```
@@ -32,7 +32,7 @@ import org.apache.spark.unsafe.types.UTF8String
  * Helper functions for converting between internal and external date and time representations.
  * Dates are exposed externally as java.sql.Date and are represented internally as the number of
  * dates since the Unix epoch (1970-01-01). Timestamps are exposed externally as java.sql.Timestamp
- * and are stored internally as longs, which are capable of storing timestamps with 100 nanosecond
+ * and are stored internally as longs, which are capable of storing timestamps with microsecond
```
Contributor


100 ns is different from microseconds, isn't it?

Contributor Author


Sure, but the previous comment was no longer correct: it was introduced in an earlier commit, and the logic was changed afterwards, so the precision is now only up to microseconds.
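For context, a long holding microseconds since the epoch is what limits the precision. A rough, illustrative conversion from java.sql.Timestamp (a sketch, not Spark's actual DateTimeUtils code) looks like this:

```scala
import java.sql.Timestamp

// Sketch only: convert a java.sql.Timestamp to microseconds since the epoch.
// getTime already includes whole milliseconds, so only the sub-millisecond part
// of getNanos is added; anything finer than a microsecond is simply lost.
def toMicros(ts: Timestamp): Long =
  ts.getTime * 1000L + (ts.getNanos / 1000) % 1000

val ts = Timestamp.valueOf("2015-01-02 00:00:00.000000001")
println(toMicros(ts)) // the trailing nanosecond cannot be represented
```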

@SparkQA

SparkQA commented Jun 10, 2017

Test build #77870 has finished for PR 18252 at commit a498f83.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wzhfy
Contributor

wzhfy commented Jun 10, 2017

LGTM

@aokolnychyi
Contributor Author

@wzhfy @rxin @ueshin could someone please merge this?

asfgit pushed a commit that referenced this pull request Jun 12, 2017

Author: aokolnychyi <anton.okolnychyi@sap.com>

Closes #18252 from aokolnychyi/spark-17914.

(cherry picked from commit ca4e960)
Signed-off-by: Takuya UESHIN <ueshin@databricks.com>
@ueshin
Member

ueshin commented Jun 12, 2017

Thanks! Merging to master/2.2.

@asfgit asfgit closed this in ca4e960 Jun 12, 2017
dataknocker pushed a commit to dataknocker/spark that referenced this pull request Jun 16, 2017

Author: aokolnychyi <anton.okolnychyi@sap.com>

Closes apache#18252 from aokolnychyi/spark-17914.