[C++][ORC] Fix timestamp type mapping between orc and arrow #34590

Closed
wgtmac opened this issue Mar 16, 2023 · 0 comments · Fixed by #34591

wgtmac commented Mar 16, 2023

Describe the enhancement requested

Background: There was an effort to fix inconsistent timestamp types across different SQL-on-Hadoop engines: https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q

Apache ORC provides two timestamp types:

  • TIMESTAMP: timestamp type without a timezone; the value is stored in the writer's timezone.
  • TIMESTAMP_INSTANT: timestamp type with local timezone; the value is stored in UTC.

arrow::TimestampType has an optional timezone field: https://github.com/apache/arrow/blob/main/cpp/src/arrow/type.h#L1385

  • If a timezone is provided, values are normalized to UTC.
  • If the timezone is missing, values have no associated timezone and may be in any timezone.
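To make the two Arrow conventions concrete, here is a minimal, self-contained C++ sketch (not Arrow or ORC code) of how the same wall-clock reading would be encoded as seconds since the Unix epoch under each one. It assumes a fixed UTC offset (no DST handling) and uses Howard Hinnant's public-domain `days_from_civil` algorithm; `NaiveSeconds` and `InstantSeconds` are hypothetical helper names used only for illustration.

```cpp
#include <cstdint>

// Days since 1970-01-01 for a proleptic Gregorian date
// (Howard Hinnant's days_from_civil algorithm).
constexpr int64_t DaysFromCivil(int64_t y, unsigned m, unsigned d) {
  y -= m <= 2 ? 1 : 0;
  const int64_t era = (y >= 0 ? y : y - 399) / 400;
  const unsigned yoe = static_cast<unsigned>(y - era * 400);             // [0, 399]
  const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
  return era * 146097 + static_cast<int64_t>(doe) - 719468;
}

// Hypothetical helper: encode wall-clock fields as if they were UTC.
// This models a timestamp *without* a timezone: the reading is kept as-is.
constexpr int64_t NaiveSeconds(int64_t y, unsigned m, unsigned d,
                               int64_t hh, int64_t mm, int64_t ss) {
  return DaysFromCivil(y, m, d) * 86400 + hh * 3600 + mm * 60 + ss;
}

// Hypothetical helper: a timestamp *with* a timezone stores the UTC instant,
// so the zone's UTC offset (assumed fixed here) is subtracted.
constexpr int64_t InstantSeconds(int64_t y, unsigned m, unsigned d,
                                 int64_t hh, int64_t mm, int64_t ss,
                                 int64_t utc_offset_seconds) {
  return NaiveSeconds(y, m, d, hh, mm, ss) - utc_offset_seconds;
}

// 2023-03-16 20:00:00 read off a clock at UTC+08:00.
static_assert(NaiveSeconds(2023, 3, 16, 20, 0, 0) == 1678996800,
              "naive value keeps the wall-clock reading (20:00)");
static_assert(InstantSeconds(2023, 3, 16, 20, 0, 0, 8 * 3600) == 1678968000,
              "instant is normalized to 12:00 UTC");
```

The two integers differ by exactly the UTC offset, which is why a reader must know which convention a column uses before it can interpret the stored value.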

Therefore, the type mapping should be as below:

  • orc::TIMESTAMP <=> arrow::TimestampType without a timezone
  • orc::TIMESTAMP_INSTANT <=> arrow::TimestampType with a timezone
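The mapping rule can be sketched as a pair of pure functions. This is a hypothetical illustration, not the actual Arrow ORC adapter: `OrcTimestampKind` and `ArrowTimestampSketch` are stand-ins for `orc::TypeKind` and `arrow::TimestampType`, and tagging ORC instants with `"UTC"` when reading is an assumption made here for concreteness (in real Arrow code the corresponding types would be built with `arrow::timestamp(unit)` or `arrow::timestamp(unit, timezone)`).

```cpp
#include <string_view>

// Hypothetical stand-in for the two ORC timestamp kinds.
enum class OrcTimestampKind { kTimestamp, kTimestampInstant };

// Hypothetical stand-in for arrow::TimestampType; only the timezone matters here.
struct ArrowTimestampSketch {
  std::string_view timezone;  // empty means "no timezone"
};

// Arrow -> ORC: decided solely by whether a timezone is present.
constexpr OrcTimestampKind ToOrc(ArrowTimestampSketch t) {
  return t.timezone.empty() ? OrcTimestampKind::kTimestamp
                            : OrcTimestampKind::kTimestampInstant;
}

// ORC -> Arrow: assumption - instants are tagged "UTC" on read.
constexpr ArrowTimestampSketch ToArrow(OrcTimestampKind k) {
  return k == OrcTimestampKind::kTimestamp ? ArrowTimestampSketch{""}
                                           : ArrowTimestampSketch{"UTC"};
}

static_assert(ToOrc(ArrowTimestampSketch{""}) == OrcTimestampKind::kTimestamp,
              "naive Arrow timestamp -> ORC TIMESTAMP");
static_assert(ToOrc(ArrowTimestampSketch{"America/New_York"}) ==
                  OrcTimestampKind::kTimestampInstant,
              "zoned Arrow timestamp -> ORC TIMESTAMP_INSTANT");
```

Under this sketch, the round trip preserves the kind (and therefore the instant), but a specific zone name such as `"America/New_York"` would come back as `"UTC"`, since ORC's TIMESTAMP_INSTANT does not carry the original zone name.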

Component(s)

C++

@wgtmac wgtmac changed the title [C++][ORC] Use TIMESTAMP_INSTANT instead of TIMESTAMP [C++][ORC] Fix timestamp type mapping between orc and arrow Mar 16, 2023
wgtmac added a commit to wgtmac/arrow that referenced this issue Mar 16, 2023
wgtmac added a commit to wgtmac/arrow that referenced this issue Mar 16, 2023
wjones127 pushed a commit that referenced this issue Mar 21, 2023
…#34591)

### Rationale for this change

Background: There was an effort to fix inconsistent timestamp types across different SQL-on-Hadoop engines: https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q

Apache ORC provides two timestamp types:

- TIMESTAMP: timestamp type without a timezone; the value is stored in the writer's timezone.
- TIMESTAMP_INSTANT: timestamp type with local timezone; the value is stored in UTC.

arrow::TimestampType has an optional timezone field:
- If a timezone is provided, values are normalized to UTC.
- If the timezone is missing, values have no associated timezone and may be in any timezone.

### What changes are included in this PR?

The type mapping is fixed as below:
- orc::TIMESTAMP <=> arrow::TimestampType without a timezone
- orc::TIMESTAMP_INSTANT <=> arrow::TimestampType with a timezone

### Are these changes tested?

Yes, all existing tests pass.

### Are there any user-facing changes?

No.
* Closes: #34590

Authored-by: Gang Wu <ustcwg@gmail.com>
Signed-off-by: Will Jones <willjones127@gmail.com>
@wjones127 wjones127 added this to the 12.0.0 milestone Mar 21, 2023
rtpsw pushed a commit to rtpsw/arrow that referenced this issue Mar 27, 2023