[C++][ORC] Fix timestamp type mapping between orc and arrow #34590
Comments
wgtmac
changed the title
[C++][ORC] Use TIMESTAMP_INSTANT instead of TIMESTAMP
[C++][ORC] Fix timestamp type mapping between orc and arrow
Mar 16, 2023
wgtmac
added a commit
to wgtmac/arrow
that referenced
this issue
Mar 16, 2023
wjones127
pushed a commit
that referenced
this issue
Mar 21, 2023
…#34591)

### Rationale for this change

Background: There was an effort to fix inconsistent timestamp types across different SQL-on-Hadoop engines: https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q

Apache ORC provides two timestamp types:

- TIMESTAMP: timestamp type without timezone; the timestamp value is stored in the writer timezone.
- TIMESTAMP_INSTANT: timestamp type with local timezone; the timestamp value is stored in the UTC timezone.

arrow::TimestampType has an optional timezone field:

- If timezone is provided, values are normalized in UTC.
- If timezone is missing, values can be in any timezone.

### What changes are included in this PR?

The type mapping is fixed as below:

- orc::TIMESTAMP <=> arrow::TimestampType w/o timezone
- orc::TIMESTAMP_INSTANT <=> arrow::TimestampType w/ timezone

### Are these changes tested?

Make sure all tests pass.

### Are there any user-facing changes?

No.

* Closes: #34590

Authored-by: Gang Wu <ustcwg@gmail.com>
Signed-off-by: Will Jones <willjones127@gmail.com>
rtpsw
pushed a commit
to rtpsw/arrow
that referenced
this issue
Mar 27, 2023
… arrow (apache#34591)
This was referenced Mar 25, 2024
Describe the enhancement requested
Background: There was an effort to fix inconsistent timestamp types across different SQL-on-Hadoop engines: https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q
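Apache ORC provides two timestamp types:

- TIMESTAMP: timestamp type without timezone; the timestamp value is stored in the writer timezone.
- TIMESTAMP_INSTANT: timestamp type with local timezone; the timestamp value is stored in the UTC timezone.

arrow::TimestampType has an optional timezone field (https://github.com/apache/arrow/blob/main/cpp/src/arrow/type.h#L1385):

- If timezone is provided, values are normalized in UTC.
- If timezone is missing, values can be in any timezone.

Therefore, the type mapping should be as below:

- orc::TIMESTAMP <=> arrow::TimestampType w/o timezone
- orc::TIMESTAMP_INSTANT <=> arrow::TimestampType w/ timezone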
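
The proposed mapping can be sketched as a pair of conversion functions. This is only an illustration, not the adapter code: `OrcTimestampKind`, `ArrowTimestamp`, `OrcToArrow`, and `ArrowToOrc` are hypothetical stand-ins for `orc::TypeKind` and `arrow::TimestampType`, and the choice of `"UTC"` as the timezone string attached to TIMESTAMP_INSTANT is an assumption. It needs C++17 for `std::optional`.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for the two ORC timestamp kinds discussed in this issue.
enum class OrcTimestampKind { TIMESTAMP, TIMESTAMP_INSTANT };

// Hypothetical stand-in for arrow::TimestampType: only the optional timezone
// field matters for this mapping.
struct ArrowTimestamp {
  std::optional<std::string> timezone;  // nullopt means "no timezone"
};

// ORC -> Arrow: TIMESTAMP has no timezone; TIMESTAMP_INSTANT is stored in UTC,
// so it maps to a timezone-aware type ("UTC" is assumed here).
ArrowTimestamp OrcToArrow(OrcTimestampKind kind) {
  switch (kind) {
    case OrcTimestampKind::TIMESTAMP:
      return ArrowTimestamp{std::nullopt};
    case OrcTimestampKind::TIMESTAMP_INSTANT:
      return ArrowTimestamp{std::string("UTC")};
  }
  return ArrowTimestamp{std::nullopt};  // not reached for valid enum values
}

// Arrow -> ORC: any timezone-aware timestamp holds UTC-normalized values, so it
// maps to TIMESTAMP_INSTANT; a timezone-less one maps to TIMESTAMP.
OrcTimestampKind ArrowToOrc(const ArrowTimestamp& type) {
  return type.timezone.has_value() ? OrcTimestampKind::TIMESTAMP_INSTANT
                                   : OrcTimestampKind::TIMESTAMP;
}
```

In this sketch the timezone's presence alone selects the ORC type, so the specific timezone name does not survive a round trip through ORC.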
Component(s)
C++