[SPARK-17914][SQL] Fix parsing of timestamp strings with nanoseconds #18252
Conversation
Test build #77844 has finished for PR 18252 at commit
Looks like DateTimeUtils.scala#L411-L414 is for the same purpose but not enough.
@ueshin good point, thanks.
Test build #77857 has finished for PR 18252 at commit
```diff
@@ -399,13 +399,13 @@ object DateTimeUtils {
       digitsMilli += 1
     }

+    while (digitsMilli > 6) {

     if (!justTime && isInvalidDate(segments(0), segments(1), segments(2))) {
       return None
```
Add a comment indicating that we are truncating the nanosecond part and that it's lossy?
@wzhfy done
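For context, the lossy truncation the reviewers discuss can be sketched roughly as follows. This is a hypothetical standalone helper with assumed names (`fraction`, `digits`), not the actual Spark code, which works on the parsed segments and the `digitsMilli` counter seen in the diff:

```scala
object TruncationSketch {
  // Sketch only: drop fractional-second digits beyond microsecond precision.
  // Spark stores timestamps as microsecond-resolution Longs, so a 9-digit
  // (nanosecond) fraction must lose its last three digits, hence "lossy".
  def truncateToMicros(fraction: Long, digits: Int): Long = {
    var value = fraction
    var d = digits
    while (d > 6) { // mirrors `while (digitsMilli > 6)` in the diff above
      value /= 10
      d -= 1
    }
    value
  }
}
```

For example, a 9-digit fraction of `123456789` is cut down to `123456` microseconds, and a lone trailing nanosecond (`1` with 9 digits) truncates to `0`.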
```diff
@@ -32,7 +32,7 @@ import org.apache.spark.unsafe.types.UTF8String
  * Helper functions for converting between internal and external date and time representations.
  * Dates are exposed externally as java.sql.Date and are represented internally as the number of
  * dates since the Unix epoch (1970-01-01). Timestamps are exposed externally as java.sql.Timestamp
- * and are stored internally as longs, which are capable of storing timestamps with 100 nanosecond
+ * and are stored internally as longs, which are capable of storing timestamps with microsecond
```
100 ns is different from micro, isn't it?
Test build #77870 has finished for PR 18252 at commit
LGTM
The PR contains a tiny change to fix the way Spark parses string literals into timestamps. Currently, some timestamps that contain nanoseconds are corrupted during the conversion from internal UTF8Strings into the internal representation of timestamps. Consider the following example:

```
spark.sql("SELECT cast('2015-01-02 00:00:00.000000001' as TIMESTAMP)").show(false)
+------------------------------------------------+
|CAST(2015-01-02 00:00:00.000000001 AS TIMESTAMP)|
+------------------------------------------------+
|2015-01-02 00:00:00.000001                      |
+------------------------------------------------+
```

The fix was tested with existing tests. Also, there is a new test to cover cases that did not work previously.

Author: aokolnychyi <anton.okolnychyi@sap.com>

Closes #18252 from aokolnychyi/spark-17914.

(cherry picked from commit ca4e960)
Signed-off-by: Takuya UESHIN <ueshin@databricks.com>
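To make the failure mode above concrete, here is a minimal sketch of parsing a fractional-second string into microseconds. The object and method names are assumed for illustration and are not Spark's API; the point is that a fraction longer than six digits must be truncated from the right, so one nanosecond (`"000000001"`) becomes 0 microseconds rather than the 1 microsecond shown in the buggy output:

```scala
object FractionSketch {
  // Hypothetical helper: convert the fractional-second part of a timestamp
  // literal (the digits after the '.') into microseconds.
  def fractionToMicros(frac: String): Long = {
    var value = 0L
    var digits = 0
    for (c <- frac if c.isDigit) {
      value = value * 10 + (c - '0')
      digits += 1
    }
    // Pad short fractions up to microsecond resolution, e.g. ".5" -> 500000 ...
    while (digits < 6) { value *= 10; digits += 1 }
    // ... and truncate longer (nanosecond-precision) fractions, which is lossy.
    while (digits > 6) { value /= 10; digits -= 1 }
    value
  }
}
```

Under this scheme `fractionToMicros("000000001")` yields 0 microseconds, matching the fixed behavior, while `fractionToMicros("5")` pads to 500000.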
Thanks! merging to master/2.2.