[FLINK-15062][orc] Orc reader should use java.sql.Timestamp to read for respecting time zone #10426
Conversation
xuefuz left a comment:
Thanks for the contribution. I left two minor comments for consideration.
```diff
- Timestamp timestamp = new Timestamp(millisecond);
- timestamp.setNanos(nanoOfSecond);
+ Timestamp timestamp = value instanceof LocalDateTime ?
+     Timestamp.valueOf((LocalDateTime) value) : (Timestamp) value;
```
Maybe it doesn't matter much, but I'm curious if we need to deal with both types.
java.sql.Timestamp is the default representation in the Hive world, while LocalDateTime is the default in the Flink world. Either way, supporting both is the correct thing to do.
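As a rough sketch of the conversion being discussed (the class and method names are illustrative, not the PR's exact code):

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;

public class TimestampNormalizer {
    // Accept either Flink's default LocalDateTime or Hive's default
    // java.sql.Timestamp, and normalize to java.sql.Timestamp for ORC.
    static Timestamp toSqlTimestamp(Object value) {
        return value instanceof LocalDateTime
                ? Timestamp.valueOf((LocalDateTime) value) // interpreted in the JVM's default time zone
                : (Timestamp) value;
    }
}
```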
```java
col2.nanos[i] = i;

Timestamp timestamp = Timestamp.valueOf(
        padZero(4, i + 1000) + "-01-01 00:00:00." + i);
```
The test would cover more cases if the values are more representative (rather than a lot of zeros for parts of the timestamp).
Actually it already tests dates from year 1000 up to 2023. But I can assign varied values to the other fields too.
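For illustration only (the loop bound and field formulas here are made up, and padZero is the test's own helper), more representative test data could vary every field instead of leaving most of them zero:

```java
import java.sql.Timestamp;

public class RepresentativeTimestamps {
    public static void main(String[] args) {
        // Hypothetical variant of the test data: vary month, day and
        // time-of-day as well as year and fractional seconds.
        for (int i = 0; i < 1024; i++) {
            Timestamp timestamp = Timestamp.valueOf(String.format(
                    "%04d-%02d-%02d %02d:%02d:%02d.%d",
                    1000 + i, i % 12 + 1, i % 28 + 1, i % 24, i % 60, i % 60, i));
            System.out.println(timestamp);
        }
    }
}
```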
```diff
- return SqlTimestamp.fromEpochMillis(
-     vector.time[index],
-     SqlTimestamp.isCompact(precision) ? 0 : vector.nanos[index] % 1_000_000);
+ Timestamp timestamp = new Timestamp(vector.time[index]);
```
What's the difference with the original one?
The original one directly uses the underlying long and int to construct SqlTimestamp, but Hive ORC uses java.sql.Timestamp to build the underlying data. You can think of it as:

```java
java.sql.Timestamp orcTimestamp;
SqlTimestamp.fromEpochMillis(orcTimestamp.getTime(), orcTimestamp.getNanos());
```

vs.

```java
SqlTimestamp.fromString(orcTimestamp.toString());
```
So the latter one will be influenced by the local time zone?
Yes
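For illustration, here is a small standalone demonstration (not code from the PR) of why java.sql.Timestamp is time-zone sensitive: the same epoch millis renders differently depending on the JVM's default zone.

```java
import java.sql.Timestamp;
import java.util.TimeZone;

public class TimestampZoneDemo {
    public static void main(String[] args) {
        long epochMillis = 0L; // 1970-01-01T00:00:00Z

        // toString() renders the instant in the JVM's default time zone.
        TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
        System.out.println(new Timestamp(epochMillis)); // 1970-01-01 00:00:00.0

        TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"));
        System.out.println(new Timestamp(epochMillis)); // 1970-01-01 08:00:00.0
    }
}
```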
KurtYoung left a comment:
LGTM, will merge this after Travis passes.
What is the purpose of the change
Hive ORC uses java.sql.Timestamp to read and write ORC files. By default, a timestamp is adjusted according to the time zone, which shifts the seconds value. Our vectorized ORC reader should therefore also use java.sql.Timestamp when reading, so that the time zone is respected.
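As a hedged sketch of the reading side (assuming Hive's TimestampColumnVector layout and Flink's SqlTimestamp.fromTimestamp; treat it as illustrative rather than the PR's literal code):

```java
import java.sql.Timestamp;

import org.apache.flink.table.dataformat.SqlTimestamp;
import org.apache.hadoop.hive.ql.exec.vector.TimestampColumnVector;

public class TimestampRead {
    // Rebuild java.sql.Timestamp from the ORC vector so the read path mirrors
    // the time-zone adjustment Hive applied when writing the file.
    static SqlTimestamp readTimestamp(TimestampColumnVector vector, int index) {
        Timestamp timestamp = new Timestamp(vector.time[index]);
        timestamp.setNanos(vector.nanos[index]);
        return SqlTimestamp.fromTimestamp(timestamp);
    }
}
```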
Brief change log
Verifying this change
OrcColumnarRowSplitReaderTest
Does this pull request potentially affect one of the following parts:
@Public(Evolving): no
Documentation