Commit
DRILL-4203: Fix date values written in parquet files created by Drill
Drill wrote non-standard dates into parquet files in all releases before 1.9.0. Drill itself read the values back correctly, but external tools such as Spark see corrupted values for every date written by Drill. This change corrects the Drill parquet writer so that it stores dates in the format given in the parquet specification.

To maintain compatibility with old files, the parquet reader now checks for the old format and automatically shifts the corrupted values into corrected ones. The test cases included here should ensure that all files produced by historical versions of Drill continue to return the same values they had in previous releases. For compatibility with external tools, any old files with corrupted dates can be re-written using the CREATE TABLE AS command (the writer now produces only specification-compliant values, even when the data is read from older corrupt files).

While the old behavior was a consistent shift into a range unlikely to be used in a modern database (over 10,000 years in the future), these are still valid date values. In case such values were written into files intentionally, and we cannot be certain from the metadata whether Drill produced the files, an option is included to turn off the auto-correction. Use of this option is assumed to be extremely unlikely, but it is included for completeness.

This patch was originally written against version 1.5.0; when rebasing, the corruption threshold was updated to 1.9.0.

Added regenerated binary files, updated metadata cache files accordingly.

One small fix in the ParquetGroupScan to accommodate changes in master that changed when metadata is read.

Tests for bugs revealed by the regression suite.

Fix drill version number in metadata file generation
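The reader-side auto-correction described in this message can be illustrated with a minimal sketch. This is not the code from the patch: the class name, method name, and constants below are hypothetical, and the shift magnitude (twice the Julian day number of the Unix epoch) and the year-5000 threshold are assumptions that are merely consistent with the "over 10,000 years in the future" description above. Parquet DATE values are taken to be days since 1970-01-01, as in the parquet specification.

```java
import java.time.LocalDate;

public class DateCorrectionSketch {
    // Assumption: Julian day number of 1970-01-01.
    static final int JULIAN_DAY_OF_UNIX_EPOCH = 2440588;

    // Assumption: the old writer shifted every date by this many days
    // (roughly 13,000 years into the future).
    static final int CORRUPT_DATE_SHIFT = 2 * JULIAN_DAY_OF_UNIX_EPOCH;

    // Assumption: stored values later than 5000-01-01 are treated as corrupt.
    static final int CORRUPTION_THRESHOLD =
            (int) LocalDate.of(5000, 1, 1).toEpochDay();

    /**
     * Returns the day count to expose to the query.
     * When autoCorrect is false, the stored value is passed through unchanged,
     * which mirrors the opt-out described in the commit message.
     */
    static int correctIfCorrupt(int storedDays, boolean autoCorrect) {
        if (autoCorrect && storedDays > CORRUPTION_THRESHOLD) {
            return storedDays - CORRUPT_DATE_SHIFT;
        }
        return storedDays;
    }

    public static void main(String[] args) {
        int good = (int) LocalDate.of(2016, 10, 1).toEpochDay();
        int corrupt = good + CORRUPT_DATE_SHIFT; // what an old file would hold

        // The shifted value is still a valid date, just far in the future.
        System.out.println(LocalDate.ofEpochDay(corrupt));
        // With auto-correction enabled the original date is recovered.
        System.out.println(LocalDate.ofEpochDay(correctIfCorrupt(corrupt, true)));
        // With auto-correction disabled the stored value is returned as-is.
        System.out.println(LocalDate.ofEpochDay(correctIfCorrupt(corrupt, false)));
    }
}
```

A pure value threshold like this cannot distinguish a corrupted date from an intentionally stored far-future date, which is why the message mentions both file metadata and an option to disable the correction.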
1 parent 2f4b5ef; commit ae34d5c
Showing 87 changed files with 1,607 additions and 87 deletions.