
HIVE-25104: Backward incompatible timestamp serialization in Parquet for certain timezones #2282

Merged: 12 commits merged into apache:master on Jun 4, 2021

Conversation

zabetak (Contributor) commented May 17, 2021

What changes were proposed in this pull request?

  1. Add new read/write config properties to control legacy zone conversions in Parquet.
  2. Deprecate the hive.parquet.timestamp.legacy.conversion.enabled property, since it is unclear whether it applies to conversion during read or during write.
  3. Exploit file metadata and the new property to choose between the new and old conversion rules.
  4. Update existing tests to remove usages of the now-deprecated hive.parquet.timestamp.legacy.conversion.enabled property.
  5. Simplify NanoTimeUtils#getTimestamp and NanoTimeUtils#getNanoTime by removing the 'skipConversion' parameter.
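The refactoring in point 5 can be sketched as follows. This is a hypothetical illustration, not Hive's actual NanoTimeUtils API: instead of threading a boolean 'skipConversion' flag through the call chain, callers pass the zone they want, and passing UTC makes the conversion a no-op, which is why the flag becomes redundant.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class TimestampConversionSketch {

    // Convert an epoch instant to wall-clock fields in the given zone.
    // Old style: getTimestamp(nt, skipConversion); skipConversion=true meant
    // "leave the value in UTC". New style: the caller passes ZoneOffset.UTC
    // explicitly, so no boolean is needed.
    static ZonedDateTime toWallClock(Instant instant, ZoneId zone) {
        return instant.atZone(zone);
    }

    public static void main(String[] args) {
        Instant ts = Instant.parse("2021-05-17T12:00:00Z");

        ZonedDateTime utc = toWallClock(ts, ZoneOffset.UTC);
        ZonedDateTime paris = toWallClock(ts, ZoneId.of("Europe/Paris"));

        System.out.println(utc.getHour());   // 12
        System.out.println(paris.getHour()); // 14 (Europe/Paris is UTC+2 in May)
    }
}
```

This mirrors the commit message further down: "The conversion can be skipped by passing an appropriate timezone so the boolean parameter is redundant."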

Why are the changes needed?

  1. Give end users the option to write backward-compatible timestamps in Parquet files, so that the files can be read correctly by older Hive versions.
  2. Improve the readability of the NanoTimeUtils APIs.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  1. Added a timestamp read/write compatibility test with Hive 2 Parquet APIs (TestParquetTimestampsHive2Compatibility):

     mvn test -Dtest=*Timestamp*

  2. Added a qtest writing timestamps in Parquet using legacy zone conversions (parquet_int96_legacy_compatibility_timestamp.q):

     cd itests/qtest
     mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile_regex=".*timestamp.*" -Dtest.output.overwrite

  3. Manual tests:
     • Write to an external Parquet table with the current Hive version, setting hive.parquet.timestamp.write.legacy.conversion.enabled=true.
     • Read from the external Parquet table with Hive 2 (commit 324f9fa).

@jcamachor jcamachor requested a review from klcopp May 26, 2021 14:49
@jcamachor (Contributor) commented:

@klcopp, could you take a look too? Thanks

…meter

The conversion can be skipped by passing an appropriate timezone, so
the boolean parameter is redundant and makes the code harder to
understand.

Adapt callers to pass the appropriate timezone (performing the
conversion or not) so that the old behavior is retained.
1. Add new read/write config properties to control legacy zone conversions in Parquet.
2. Deprecate the hive.parquet.timestamp.legacy.conversion.enabled property, since it
is unclear whether it applies to conversion during read or during write.
3. Exploit file metadata and property to choose between new/old conversion rules.
4. Update existing tests to remove usages of now deprecated
hive.parquet.timestamp.legacy.conversion.enabled property.
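Point 3 above (choosing conversion rules from file metadata plus the property) can be sketched roughly as below. This is a hypothetical illustration: the metadata key and fallback logic are illustrative, not Hive's exact identifiers or implementation.

```java
import java.util.Map;

public class ConversionRuleChooser {

    // If the Parquet footer records which conversion rules the writer used,
    // trust that record; otherwise fall back to the read-side config property.
    // The key name "writer.zone.conversion.legacy" is an assumption for this
    // sketch, not Hive's actual metadata key.
    static boolean useLegacyConversion(Map<String, String> fileMetadata,
                                       boolean readLegacyProperty) {
        String hint = fileMetadata.get("writer.zone.conversion.legacy");
        if (hint != null) {
            return Boolean.parseBoolean(hint);
        }
        return readLegacyProperty;
    }

    public static void main(String[] args) {
        // File metadata wins over the property when present.
        System.out.println(useLegacyConversion(
            Map.of("writer.zone.conversion.legacy", "true"), false)); // true
        // No metadata hint: the read-side property decides.
        System.out.println(useLegacyConversion(Map.of(), true)); // true
    }
}
```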
Use old hive.parquet.timestamp.legacy.conversion.enabled property to
control legacy conversion when reading timestamps from Parquet files.

When the user has explicitly set the legacy conversion flag using the
old property name, Hive needs to take it into account; otherwise data
may be corrupted.
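The precedence described in this commit message can be sketched as follows. This is a hypothetical illustration of the idea, not Hive's actual configuration API; the method and parameter names are assumptions.

```java
public class LegacyPropertyPrecedence {

    // An explicitly set value of the deprecated property
    // (hive.parquet.timestamp.legacy.conversion.enabled) wins over the
    // new read-side property; null means "the user never set the old flag".
    static boolean effectiveLegacyRead(Boolean deprecatedExplicit,
                                       boolean newReadProperty) {
        return deprecatedExplicit != null ? deprecatedExplicit : newReadProperty;
    }

    public static void main(String[] args) {
        // User explicitly set the old flag: honor it, ignoring the new one.
        System.out.println(effectiveLegacyRead(Boolean.TRUE, false));  // true
        // Old flag untouched: the new property decides.
        System.out.println(effectiveLegacyRead(null, false));          // false
    }
}
```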
… callers

The method was used only in two places and doesn't significantly improve
readability.
…dd-util

Code didn't compile after fd6e701
was merged to master.
@jcamachor jcamachor merged commit f1ff996 into apache:master Jun 4, 2021
ashish-kumar-sharma pushed a commit to ashish-kumar-sharma/hive that referenced this pull request Mar 31, 2022
…for certain timezones (Stamatis Zampetakis, reviewed by Jesus Camacho Rodriguez)

Closes apache#2282
amanraj2520 pushed a commit to amanraj2520/hive that referenced this pull request Jan 23, 2024
…for certain timezones (Stamatis Zampetakis, reviewed by Jesus Camacho Rodriguez)

Closes apache#2282