From ae34d5c30582de777db19360abf013bc50c8640b Mon Sep 17 00:00:00 2001
From: Jason Altekruse
Date: Thu, 31 Dec 2015 10:22:04 -0600
Subject: [PATCH] DRILL-4203: Fix date values written in parquet files created by Drill

Drill was writing non-standard dates into parquet files for all releases before 1.9.0. The values were read correctly by Drill, but external tools such as Spark reading the files would see corrupted values for all dates written by Drill. This change corrects the behavior of the Drill parquet writer to store dates in the format given in the parquet specification.

To maintain compatibility with old files, the parquet reader code has been updated to check for the old format and automatically shift the corrupted values into corrected ones. The test cases included here should ensure that all files produced by historical versions of Drill continue to return the same values they did in previous releases. For compatibility with external tools, any old files with corrupted dates can be re-written using the CREATE TABLE AS command (the writer now produces only specification-compliant values, even when reading out of older corrupt files).

While the old behavior was a consistent shift into a range unlikely to be used in a modern database (over 10,000 years in the future), these are still valid date values. For the case where such dates may have been written into files intentionally, and we cannot be certain from the metadata whether Drill produced the files, an option is included to turn off the auto-correction. Use of this option should be extremely rare, but it is included for completeness.

This patch was originally written against version 1.5.0; when rebasing, the corruption threshold was updated to 1.9.0. Added regenerated binary files and updated metadata cache files accordingly. One small fix in the ParquetGroupScan accommodates changes in master that altered when metadata is read. Tests cover bugs revealed by the regression suite.
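As context for the reader changes below, here is a minimal sketch of the arithmetic behind the corruption and its correction, assuming JULIAN_DAY_EPOC = 2440588 (the Julian day number of the Unix epoch day, as used by ParquetOutputRecordWriter); the class and method names are illustrative, not part of the patch:

    import org.joda.time.DateTimeUtils;

    // Illustrative sketch only; see ParquetOutputRecordWriter.java and
    // ParquetReaderUtility.java in this patch for the real logic.
    class Drill4203Arithmetic {
      static final int JULIAN_DAY_EPOC = 2440588; // assumed Julian day number of 1970-01-01

      // Parquet specification: a DATE is the number of days since the Unix epoch.
      static int specCompliantDays(long dateMillis) {
        return (int) (DateTimeUtils.toJulianDayNumber(dateMillis) - JULIAN_DAY_EPOC);
      }

      // Releases before 1.9.0 added the epoch offset instead of subtracting it, so every
      // stored value landed exactly 2 * JULIAN_DAY_EPOC days (over 13,000 years) too high.
      static int corruptDays(long dateMillis) {
        return (int) (DateTimeUtils.toJulianDayNumber(dateMillis) + JULIAN_DAY_EPOC);
      }

      // The correction applied on read, matching autoCorrectCorruptedDate() in this patch:
      static int autoCorrect(int corruptedDate) {
        return corruptedDate - 2 * JULIAN_DAY_EPOC;
      }
    }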
Fix drill version number in metadata file generation --- .../hive/HiveDrillNativeScanBatchCreator.java | 8 +- .../templates/ParquetOutputRecordWriter.java | 4 +- .../sql/handlers/RefreshMetadataHandler.java | 5 +- .../drill/exec/store/parquet/Metadata.java | 77 ++- .../store/parquet/ParquetFormatConfig.java | 22 +- .../store/parquet/ParquetFormatPlugin.java | 9 +- .../exec/store/parquet/ParquetGroupScan.java | 34 +- .../store/parquet/ParquetReaderUtility.java | 232 ++++++++ .../parquet/ParquetScanBatchCreator.java | 8 +- .../columnreaders/ColumnReaderFactory.java | 28 +- .../columnreaders/FixedByteAlignedReader.java | 65 ++- .../NullableFixedByteAlignedReaders.java | 63 +- .../columnreaders/ParquetRecordReader.java | 85 ++- .../parquet2/DrillParquetGroupConverter.java | 71 ++- .../store/parquet2/DrillParquetReader.java | 8 +- .../DrillParquetRecordMaterializer.java | 6 +- .../org/apache/drill/DrillTestWrapper.java | 4 +- .../TestCorruptParquetDateCorrection.java | 539 ++++++++++++++++++ .../dfs/TestFormatPluginOptionExtractor.java | 4 +- .../parquet/ParquetRecordReaderTest.java | 2 +- .../0_0_1.parquet | Bin 0 -> 257 bytes .../0_0_2.parquet | Bin 0 -> 257 bytes .../0_0_3.parquet | Bin 0 -> 257 bytes .../0_0_4.parquet | Bin 0 -> 257 bytes .../0_0_5.parquet | Bin 0 -> 257 bytes .../0_0_6.parquet | Bin 0 -> 257 bytes ....parquet.metadata_1_2.requires_replace.txt | 119 ++++ .../fewtypes_datepartition/0_0_1.parquet | Bin 0 -> 1226 bytes .../fewtypes_datepartition/0_0_10.parquet | Bin 0 -> 1258 bytes .../fewtypes_datepartition/0_0_11.parquet | Bin 0 -> 1238 bytes .../fewtypes_datepartition/0_0_12.parquet | Bin 0 -> 1258 bytes .../fewtypes_datepartition/0_0_13.parquet | Bin 0 -> 1226 bytes .../fewtypes_datepartition/0_0_14.parquet | Bin 0 -> 1201 bytes .../fewtypes_datepartition/0_0_15.parquet | Bin 0 -> 1216 bytes .../fewtypes_datepartition/0_0_16.parquet | Bin 0 -> 1253 bytes .../fewtypes_datepartition/0_0_17.parquet | Bin 0 -> 1231 bytes .../fewtypes_datepartition/0_0_18.parquet | Bin 0 -> 1216 bytes .../fewtypes_datepartition/0_0_19.parquet | Bin 0 -> 1186 bytes .../fewtypes_datepartition/0_0_2.parquet | Bin 0 -> 1268 bytes .../fewtypes_datepartition/0_0_20.parquet | Bin 0 -> 1228 bytes .../fewtypes_datepartition/0_0_21.parquet | Bin 0 -> 1231 bytes .../fewtypes_datepartition/0_0_3.parquet | Bin 0 -> 1278 bytes .../fewtypes_datepartition/0_0_4.parquet | Bin 0 -> 1242 bytes .../fewtypes_datepartition/0_0_5.parquet | Bin 0 -> 1335 bytes .../fewtypes_datepartition/0_0_6.parquet | Bin 0 -> 1222 bytes .../fewtypes_datepartition/0_0_7.parquet | Bin 0 -> 1273 bytes .../fewtypes_datepartition/0_0_8.parquet | Bin 0 -> 1263 bytes .../fewtypes_datepartition/0_0_9.parquet | Bin 0 -> 1268 bytes .../fewtypes_varcharpartition/0_0_1.parquet | Bin 0 -> 2128 bytes .../fewtypes_varcharpartition/0_0_10.parquet | Bin 0 -> 2086 bytes .../fewtypes_varcharpartition/0_0_11.parquet | Bin 0 -> 2121 bytes .../fewtypes_varcharpartition/0_0_12.parquet | Bin 0 -> 2114 bytes .../fewtypes_varcharpartition/0_0_13.parquet | Bin 0 -> 2128 bytes .../fewtypes_varcharpartition/0_0_14.parquet | Bin 0 -> 2068 bytes .../fewtypes_varcharpartition/0_0_15.parquet | Bin 0 -> 2054 bytes .../fewtypes_varcharpartition/0_0_16.parquet | Bin 0 -> 2114 bytes .../fewtypes_varcharpartition/0_0_17.parquet | Bin 0 -> 2135 bytes .../fewtypes_varcharpartition/0_0_18.parquet | Bin 0 -> 2223 bytes .../fewtypes_varcharpartition/0_0_19.parquet | Bin 0 -> 2072 bytes .../fewtypes_varcharpartition/0_0_2.parquet | Bin 0 -> 2107 bytes 
.../fewtypes_varcharpartition/0_0_20.parquet | Bin 0 -> 2012 bytes .../fewtypes_varcharpartition/0_0_21.parquet | Bin 0 -> 2033 bytes .../fewtypes_varcharpartition/0_0_3.parquet | Bin 0 -> 2068 bytes .../fewtypes_varcharpartition/0_0_4.parquet | Bin 0 -> 2075 bytes .../fewtypes_varcharpartition/0_0_5.parquet | Bin 0 -> 2075 bytes .../fewtypes_varcharpartition/0_0_6.parquet | Bin 0 -> 2091 bytes .../fewtypes_varcharpartition/0_0_7.parquet | Bin 0 -> 2063 bytes .../fewtypes_varcharpartition/0_0_8.parquet | Bin 0 -> 2142 bytes .../fewtypes_varcharpartition/0_0_9.parquet | Bin 0 -> 2054 bytes .../4203_corrected_dates.parquet | Bin 0 -> 278 bytes .../4203_corrupt_dates.parquet | Bin 0 -> 181 bytes .../4203_corrupted_dates_1.4.parquet | Bin 0 -> 278 bytes .../drill_0_6_currupt_dates_no_stats.parquet | Bin 0 -> 181 bytes ..._partitioned_metadata.requires_replace.txt | 301 ++++++++++ ...ull_date_cols_with_corruption_4203.parquet | Bin 0 -> 364 bytes .../0_0_1.parquet | Bin 0 -> 257 bytes .../0_0_2.parquet | Bin 0 -> 257 bytes .../0_0_3.parquet | Bin 0 -> 257 bytes .../0_0_4.parquet | Bin 0 -> 257 bytes .../0_0_5.parquet | Bin 0 -> 257 bytes .../0_0_6.parquet | Bin 0 -> 257 bytes .../0_0_1.parquet | Bin 0 -> 160 bytes .../0_0_2.parquet | Bin 0 -> 160 bytes .../0_0_3.parquet | Bin 0 -> 160 bytes .../0_0_4.parquet | Bin 0 -> 160 bytes .../0_0_5.parquet | Bin 0 -> 160 bytes .../0_0_6.parquet | Bin 0 -> 160 bytes 87 files changed, 1607 insertions(+), 87 deletions(-) create mode 100644 exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestCorruptParquetDateCorrection.java create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/1_9_0_partitioned_no_corruption/0_0_1.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/1_9_0_partitioned_no_corruption/0_0_2.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/1_9_0_partitioned_no_corruption/0_0_3.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/1_9_0_partitioned_no_corruption/0_0_4.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/1_9_0_partitioned_no_corruption/0_0_5.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/1_9_0_partitioned_no_corruption/0_0_6.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/drill.parquet.metadata_1_2.requires_replace.txt create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_1.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_10.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_11.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_12.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_13.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_14.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_15.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_16.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_17.parquet 
create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_18.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_19.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_2.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_20.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_21.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_3.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_4.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_5.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_6.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_7.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_8.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_9.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_1.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_10.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_11.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_12.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_13.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_14.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_15.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_16.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_17.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_18.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_19.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_2.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_20.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_21.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_3.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_4.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_5.parquet create mode 100644 
exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_6.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_7.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_8.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_9.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions/4203_corrected_dates.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions/4203_corrupt_dates.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions/4203_corrupted_dates_1.4.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions/drill_0_6_currupt_dates_no_stats.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/mixed_version_partitioned_metadata.requires_replace.txt create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/null_date_cols_with_corruption_4203.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203/0_0_1.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203/0_0_2.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203/0_0_3.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203/0_0_4.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203/0_0_5.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203/0_0_6.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203_1_2/0_0_1.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203_1_2/0_0_2.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203_1_2/0_0_3.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203_1_2/0_0_4.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203_1_2/0_0_5.parquet create mode 100644 exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203_1_2/0_0_6.parquet diff --git a/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java b/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java index a9575ba8443..1ded1532069 100644 --- a/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java +++ b/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java @@ -36,6 +36,7 @@ import org.apache.drill.exec.store.AbstractRecordReader; import org.apache.drill.exec.store.RecordReader; import org.apache.drill.exec.store.parquet.ParquetDirectByteBufferAllocator; +import 
org.apache.drill.exec.store.parquet.ParquetReaderUtility; import org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader; import org.apache.drill.exec.util.ImpersonationUtil; import org.apache.hadoop.fs.FileSystem; @@ -118,6 +119,10 @@ public ScanBatch getBatch(FragmentContext context, HiveDrillNativeParquetSubScan final List rowGroupNums = getRowGroupNumbersFromFileSplit(fileSplit, parquetMetadata); for(int rowGroupNum : rowGroupNums) { + // Drill has only ever written a single row group per file, only detect corruption + // in the first row group + ParquetReaderUtility.DateCorruptionStatus containsCorruptDates = + ParquetReaderUtility.detectCorruptDates(parquetMetadata, config.getColumns(), true); readers.add(new ParquetRecordReader( context, Path.getPathWithoutSchemeAndAuthority(finalPath).toString(), @@ -125,7 +130,8 @@ public ScanBatch getBatch(FragmentContext context, HiveDrillNativeParquetSubScan CodecFactory.createDirectCodecFactory(fs.getConf(), new ParquetDirectByteBufferAllocator(oContext.getAllocator()), 0), parquetMetadata, - newColumns) + newColumns, + containsCorruptDates) ); Map implicitValues = Maps.newLinkedHashMap(); diff --git a/exec/java-exec/src/main/codegen/templates/ParquetOutputRecordWriter.java b/exec/java-exec/src/main/codegen/templates/ParquetOutputRecordWriter.java index 74af3eac052..aac0f0c261a 100644 --- a/exec/java-exec/src/main/codegen/templates/ParquetOutputRecordWriter.java +++ b/exec/java-exec/src/main/codegen/templates/ParquetOutputRecordWriter.java @@ -156,12 +156,12 @@ public void writeField() throws IOException { <#elseif minor.class == "Date"> <#if mode.prefix == "Repeated" > reader.read(i, holder); - consumer.addInteger((int) (DateTimeUtils.toJulianDayNumber(holder.value) + JULIAN_DAY_EPOC)); + consumer.addInteger((int) (DateTimeUtils.toJulianDayNumber(holder.value) - JULIAN_DAY_EPOC)); <#else> consumer.startField(fieldName, fieldId); reader.read(holder); // convert from internal Drill date format to Julian Day centered around Unix Epoc - consumer.addInteger((int) (DateTimeUtils.toJulianDayNumber(holder.value) + JULIAN_DAY_EPOC)); + consumer.addInteger((int) (DateTimeUtils.toJulianDayNumber(holder.value) - JULIAN_DAY_EPOC)); consumer.endField(fieldName, fieldId); <#elseif diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/RefreshMetadataHandler.java b/exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/RefreshMetadataHandler.java index 7be46f06b1d..b36356ab7ef 100644 --- a/exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/RefreshMetadataHandler.java +++ b/exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/RefreshMetadataHandler.java @@ -110,7 +110,10 @@ public PhysicalPlan getPlan(SqlNode sqlNode) throws ValidationException, RelConv return notSupported(tableName); } - Metadata.createMeta(fs, selectionRoot); + if (!(formatConfig instanceof ParquetFormatConfig)) { + formatConfig = new ParquetFormatConfig(); + } + Metadata.createMeta(fs, selectionRoot, (ParquetFormatConfig) formatConfig); return direct(true, "Successfully updated metadata for table %s.", tableName); } catch(Exception e) { diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java index 86b860a8f5b..d6a739d0fff 100644 --- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java +++ 
b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java @@ -26,6 +26,8 @@ import java.util.concurrent.TimeUnit; import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.common.util.DrillVersionInfo; +import org.apache.drill.exec.store.AbstractRecordReader; import org.apache.drill.exec.store.TimedRunnable; import org.apache.drill.exec.store.dfs.DrillPathFilter; import org.apache.drill.exec.store.dfs.MetadataContext; @@ -80,6 +82,7 @@ public class Metadata { public static final String METADATA_DIRECTORIES_FILENAME = ".drill.parquet_metadata_directories"; private final FileSystem fs; + private final ParquetFormatConfig formatConfig; private ParquetTableMetadataBase parquetTableMetadata; private ParquetTableMetadataDirs parquetTableMetadataDirs; @@ -91,8 +94,8 @@ public class Metadata { * @param path * @throws IOException */ - public static void createMeta(FileSystem fs, String path) throws IOException { - Metadata metadata = new Metadata(fs); + public static void createMeta(FileSystem fs, String path, ParquetFormatConfig formatConfig) throws IOException { + Metadata metadata = new Metadata(fs, formatConfig); metadata.createMetaFilesRecursively(path); } @@ -104,9 +107,9 @@ public static void createMeta(FileSystem fs, String path) throws IOException { * @return * @throws IOException */ - public static ParquetTableMetadata_v2 getParquetTableMetadata(FileSystem fs, String path) + public static ParquetTableMetadata_v2 getParquetTableMetadata(FileSystem fs, String path, ParquetFormatConfig formatConfig) throws IOException { - Metadata metadata = new Metadata(fs); + Metadata metadata = new Metadata(fs, formatConfig); return metadata.getParquetTableMetadata(path); } @@ -119,8 +122,8 @@ public static ParquetTableMetadata_v2 getParquetTableMetadata(FileSystem fs, Str * @throws IOException */ public static ParquetTableMetadata_v2 getParquetTableMetadata(FileSystem fs, - List fileStatuses) throws IOException { - Metadata metadata = new Metadata(fs); + List fileStatuses, ParquetFormatConfig formatConfig) throws IOException { + Metadata metadata = new Metadata(fs, formatConfig); return metadata.getParquetTableMetadata(fileStatuses); } @@ -132,20 +135,21 @@ public static ParquetTableMetadata_v2 getParquetTableMetadata(FileSystem fs, * @return * @throws IOException */ - public static ParquetTableMetadataBase readBlockMeta(FileSystem fs, String path, MetadataContext metaContext) throws IOException { - Metadata metadata = new Metadata(fs); + public static ParquetTableMetadataBase readBlockMeta(FileSystem fs, String path, MetadataContext metaContext, ParquetFormatConfig formatConfig) throws IOException { + Metadata metadata = new Metadata(fs, formatConfig); metadata.readBlockMeta(path, false, metaContext); return metadata.parquetTableMetadata; } - public static ParquetTableMetadataDirs readMetadataDirs(FileSystem fs, String path, MetadataContext metaContext) throws IOException { - Metadata metadata = new Metadata(fs); + public static ParquetTableMetadataDirs readMetadataDirs(FileSystem fs, String path, MetadataContext metaContext, ParquetFormatConfig formatConfig) throws IOException { + Metadata metadata = new Metadata(fs, formatConfig); metadata.readBlockMeta(path, true, metaContext); return metadata.parquetTableMetadataDirs; } - private Metadata(FileSystem fs) { + private Metadata(FileSystem fs, ParquetFormatConfig formatConfig) { this.fs = ImpersonationUtil.createFileSystem(ImpersonationUtil.getProcessUserName(), fs.getConf()); + this.formatConfig = 
formatConfig; }

/**
@@ -345,6 +349,10 @@ private ParquetFileMetadata_v2 getParquetFileMetadata_v2(ParquetTableMetadata_v2
   List rowGroupMetadataList = Lists.newArrayList();
+  ArrayList<SchemaPath> ALL_COLS = new ArrayList<>();
+  ALL_COLS.add(AbstractRecordReader.STAR_COLUMN);
+  boolean autoCorrectCorruptDates = formatConfig.autoCorrectCorruptDates;
+  ParquetReaderUtility.DateCorruptionStatus containsCorruptDates = ParquetReaderUtility.detectCorruptDates(metadata, ALL_COLS, autoCorrectCorruptDates);
   for (BlockMetaData rowGroup : metadata.getBlocks()) {
     List columnMetadataList = Lists.newArrayList();
     long length = 0;
@@ -367,9 +375,13 @@ private ParquetFileMetadata_v2 getParquetFileMetadata_v2(ParquetTableMetadata_v2
       if (statsAvailable) {
         // Write stats only if minVal==maxVal. Also, we then store only maxVal
         Object mxValue = null;
-        if (stats.genericGetMax() != null && stats.genericGetMin() != null && stats.genericGetMax()
-            .equals(stats.genericGetMin())) {
+        if (stats.genericGetMax() != null && stats.genericGetMin() != null &&
+            stats.genericGetMax().equals(stats.genericGetMin())) {
           mxValue = stats.genericGetMax();
+          if (containsCorruptDates == ParquetReaderUtility.DateCorruptionStatus.META_SHOWS_CORRUPTION
+              && columnTypeMetadata.originalType == OriginalType.DATE) {
+            mxValue = ParquetReaderUtility.autoCorrectCorruptedDate((Integer) mxValue);
+          }
         }
         columnMetadata = new ColumnMetadata_v2(columnTypeMetadata.name, col.getType(), mxValue, stats.getNumNulls());
@@ -521,7 +533,6 @@ private void readBlockMeta(String path,
    * Check if the parquet metadata needs to be updated by comparing the modification time of the directories with
    * the modification time of the metadata file
    *
-   * @param tableMetadata
    * @param metaFilePath
    * @return
    * @throws IOException
@@ -585,6 +596,7 @@ public static abstract class ParquetTableMetadataBase {
     @JsonIgnore public abstract OriginalType getOriginalType(String[] columnName);
     @JsonIgnore public abstract ParquetTableMetadataBase clone();
+    @JsonIgnore public abstract String getDrillVersion();
   }

   public static abstract class ParquetFileMetadata {
@@ -618,6 +630,24 @@ public static abstract class ColumnMetadata {
     public abstract Object getMaxValue();

+    /**
+     * Set the max value recorded in the parquet metadata statistics.
+     *
+     * This object would be immutable, but due to DRILL-4203 we need to correct
+     * date values that had been corrupted by earlier versions of Drill.
+     * @param newMax the corrected maximum value
+     */
+    public abstract void setMax(Object newMax);
+
+    /**
+     * Set the min value recorded in the parquet metadata statistics.
+     *
+     * This object would be immutable, but due to DRILL-4203 we need to correct
+     * date values that had been corrupted by earlier versions of Drill.
+     * @param newMin the corrected minimum value
+     */
+    public abstract void setMin(Object newMin);
+
     public abstract PrimitiveTypeName getPrimitiveType();

     public abstract OriginalType getOriginalType();
@@ -681,6 +711,10 @@ public ParquetTableMetadata_v1(List files, List
     @JsonIgnore @Override public ParquetTableMetadataBase clone() {
       return new ParquetTableMetadata_v1(files, directories);
     }
+    @Override
+    public String getDrillVersion() {
+      return null;
+    }
   }

@@ -870,7 +904,6 @@ public void setMax(Object max) {
       return max;
     }
-
   }

   /**
@@ -885,9 +918,10 @@ public void setMax(Object max) {
     @JsonProperty public ConcurrentHashMap columnTypeInfo;
     @JsonProperty List files;
     @JsonProperty List directories;
+    @JsonProperty String drillVersion;

     public ParquetTableMetadata_v2() {
-      super();
+      this.drillVersion = DrillVersionInfo.getVersion();
     }

     public ParquetTableMetadata_v2(ParquetTableMetadataBase parquetTable,
@@ -895,6 +929,7 @@ public ParquetTableMetadata_v2(ParquetTableMetadataBase parquetTable,
       this.files = files;
       this.directories = directories;
       this.columnTypeInfo = ((ParquetTableMetadata_v2) parquetTable).columnTypeInfo;
+      this.drillVersion = DrillVersionInfo.getVersion();
     }

     public ParquetTableMetadata_v2(List files, List directories,
@@ -935,6 +970,11 @@ public ColumnTypeMetadata_v2 getColumnTypeInfo(String[] name) {
     @JsonIgnore @Override public ParquetTableMetadataBase clone() {
       return new ParquetTableMetadata_v2(files, directories, columnTypeInfo);
     }
+    @Override
+    public String getDrillVersion() {
+      return drillVersion;
+    }
+
   }

@@ -1141,6 +1181,11 @@ public boolean hasSingleValue() {
       return mxValue;
     }

+    @Override
+    public void setMin(Object newMin) {
+      // noop - min value not stored in this version of the metadata
+    }
+
     @Override public PrimitiveTypeName getPrimitiveType() {
       return null;
     }

diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatConfig.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatConfig.java
index 74a90c06dc2..9ba03df9753 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatConfig.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatConfig.java
@@ -17,6 +17,7 @@
  */
 package org.apache.drill.exec.store.parquet;

+import com.fasterxml.jackson.annotation.JsonProperty;
 import org.apache.drill.common.logical.FormatPluginConfig;

 import com.fasterxml.jackson.annotation.JsonTypeName;
@@ -24,14 +25,25 @@
 @JsonTypeName("parquet")
 public class ParquetFormatConfig implements FormatPluginConfig{

+  public boolean autoCorrectCorruptDates = true;
+
   @Override
-  public int hashCode() {
-    return 7;
+  public boolean equals(Object o) {
+    if (this == o) {
+      return true;
+    }
+    if (o == null || getClass() != o.getClass()) {
+      return false;
+    }
+
+    ParquetFormatConfig that = (ParquetFormatConfig) o;
+
+    return autoCorrectCorruptDates == that.autoCorrectCorruptDates;
+  }

   @Override
-  public boolean equals(Object obj) {
-    return obj instanceof ParquetFormatConfig;
+  public int hashCode() {
+    return (autoCorrectCorruptDates ?
1 : 0); } - } diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java index 1ab621bca8c..f17d414e986 100644 --- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java +++ b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java @@ -92,7 +92,7 @@ public ParquetFormatPlugin(String name, DrillbitContext context, Configuration f StoragePluginConfig storageConfig, ParquetFormatConfig formatConfig){ this.context = context; this.config = formatConfig; - this.formatMatcher = new ParquetFormatMatcher(this); + this.formatMatcher = new ParquetFormatMatcher(this, config); this.storageConfig = storageConfig; this.fsConf = fsConf; this.name = name == null ? DEFAULT_NAME : name; @@ -196,8 +196,11 @@ public FormatMatcher getMatcher() { private static class ParquetFormatMatcher extends BasicFormatMatcher{ - public ParquetFormatMatcher(ParquetFormatPlugin plugin) { + private final ParquetFormatConfig formatConfig; + + public ParquetFormatMatcher(ParquetFormatPlugin plugin, ParquetFormatConfig formatConfig) { super(plugin, PATTERNS, MAGIC_STRINGS); + this.formatConfig = formatConfig; } @Override @@ -218,7 +221,7 @@ public DrillTable isReadable(DrillFileSystem fs, FileSelection selection, // create a metadata context that will be used for the duration of the query for this table MetadataContext metaContext = new MetadataContext(); - ParquetTableMetadataDirs mDirs = Metadata.readMetadataDirs(fs, dirMetaPath.toString(), metaContext); + ParquetTableMetadataDirs mDirs = Metadata.readMetadataDirs(fs, dirMetaPath.toString(), metaContext, formatConfig); if (mDirs.getDirectories().size() > 0) { FileSelection dirSelection = FileSelection.createFromDirectories(mDirs.getDirectories(), selection, selection.getSelectionRoot() /* cacheFileRoot initially points to selectionRoot */); diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java index ec34e7a8980..649282b2962 100644 --- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java +++ b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java @@ -43,7 +43,6 @@ import org.apache.drill.exec.physical.base.ScanStats; import org.apache.drill.exec.physical.base.ScanStats.GroupScanProperty; import org.apache.drill.exec.proto.CoordinationProtos.DrillbitEndpoint; -import org.apache.drill.exec.store.ParquetOutputRecordWriter; import org.apache.drill.exec.store.StoragePluginRegistry; import org.apache.drill.exec.store.dfs.DrillFileSystem; import org.apache.drill.exec.store.dfs.DrillPathFilter; @@ -81,10 +80,10 @@ import org.apache.drill.exec.vector.ValueVector; import org.apache.hadoop.fs.FileStatus; import org.apache.hadoop.fs.Path; +import org.joda.time.DateTimeUtils; import org.apache.parquet.io.api.Binary; import org.apache.parquet.schema.OriginalType; import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName; -import org.joda.time.DateTimeUtils; import com.fasterxml.jackson.annotation.JacksonInject; import com.fasterxml.jackson.annotation.JsonCreator; @@ -394,6 +393,7 @@ public MajorType getTypeForColumn(SchemaPath schemaPath) { return columnTypeMap.get(schemaPath); } + // Map from file names to maps of column name to partition value mappings private Map> 
partitionValueMap = Maps.newHashMap(); public void populatePruningVector(ValueVector v, int index, SchemaPath column, String file) { @@ -479,7 +479,8 @@ public void populatePruningVector(ValueVector v, int index, SchemaPath column, S case DATE: { NullableDateVector dateVector = (NullableDateVector) v; Integer value = (Integer) partitionValueMap.get(f).get(column); - dateVector.getMutator().setSafe(index, DateTimeUtils.fromJulianDay(value - ParquetOutputRecordWriter.JULIAN_DAY_EPOC - 0.5)); + dateVector.getMutator().setSafe(index, DateTimeUtils.fromJulianDay( + value + ParquetReaderUtility.SHIFT_PARQUET_DAY_COUNT_TO_JULIAN_DAY)); return; } case TIME: { @@ -582,7 +583,10 @@ public long getRowCount() { // we only select the files that are part of selection (by setting fileSet appropriately) // get (and set internal field) the metadata for the directory by reading the metadata file - this.parquetTableMetadata = Metadata.readBlockMeta(fs, metaFilePath.toString(), selection.getMetaContext()); + this.parquetTableMetadata = Metadata.readBlockMeta(fs, metaFilePath.toString(), selection.getMetaContext(), formatConfig); + if (formatConfig.autoCorrectCorruptDates) { + ParquetReaderUtility.correctDatesInMetadataCache(this.parquetTableMetadata); + } List fileStatuses = selection.getStatuses(fs); if (fileSet == null) { @@ -616,7 +620,7 @@ public long getRowCount() { if (status.isDirectory()) { //TODO [DRILL-4496] read the metadata cache files in parallel final Path metaPath = new Path(status.getPath(), Metadata.METADATA_FILENAME); - final Metadata.ParquetTableMetadataBase metadata = Metadata.readBlockMeta(fs, metaPath.toString(), selection.getMetaContext()); + final Metadata.ParquetTableMetadataBase metadata = Metadata.readBlockMeta(fs, metaPath.toString(), selection.getMetaContext(), formatConfig); for (Metadata.ParquetFileMetadata file : metadata.getFiles()) { fileSet.add(file.getPath()); } @@ -664,9 +668,11 @@ private void init(MetadataContext metaContext) throws IOException { } if (metaPath != null && fs.exists(metaPath)) { usedMetadataCache = true; - parquetTableMetadata = Metadata.readBlockMeta(fs, metaPath.toString(), metaContext); + if (parquetTableMetadata == null) { + parquetTableMetadata = Metadata.readBlockMeta(fs, metaPath.toString(), metaContext, formatConfig); + } } else { - parquetTableMetadata = Metadata.getParquetTableMetadata(fs, p.toString()); + parquetTableMetadata = Metadata.getParquetTableMetadata(fs, p.toString(), formatConfig); } } else { Path p = Path.getPathWithoutSchemeAndAuthority(new Path(selectionRoot)); @@ -674,17 +680,25 @@ private void init(MetadataContext metaContext) throws IOException { if (fs.isDirectory(new Path(selectionRoot)) && fs.exists(metaPath)) { usedMetadataCache = true; if (parquetTableMetadata == null) { - parquetTableMetadata = Metadata.readBlockMeta(fs, metaPath.toString(), metaContext); + parquetTableMetadata = Metadata.readBlockMeta(fs, metaPath.toString(), metaContext, formatConfig); } if (fileSet != null) { - parquetTableMetadata = removeUnneededRowGroups(parquetTableMetadata); + if (parquetTableMetadata == null) { + parquetTableMetadata = removeUnneededRowGroups(Metadata.readBlockMeta(fs, metaPath.toString(), metaContext, formatConfig)); + } else { + parquetTableMetadata = removeUnneededRowGroups(parquetTableMetadata); + } + } else { + if (parquetTableMetadata == null) { + parquetTableMetadata = Metadata.readBlockMeta(fs, metaPath.toString(), metaContext, formatConfig); + } } } else { final List fileStatuses = Lists.newArrayList(); for 
(ReadEntryWithPath entry : entries) { getFiles(entry.getPath(), fileStatuses); } - parquetTableMetadata = Metadata.getParquetTableMetadata(fs, fileStatuses); + parquetTableMetadata = Metadata.getParquetTableMetadata(fs, fileStatuses, formatConfig); } } diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java index 2f56aa03785..9d0886f8809 100644 --- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java +++ b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java @@ -18,9 +18,32 @@ package org.apache.drill.exec.store.parquet; import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.common.expression.PathSegment; +import org.apache.drill.common.expression.SchemaPath; import org.apache.drill.exec.planner.physical.PlannerSettings; import org.apache.drill.exec.server.options.OptionManager; +import org.apache.drill.exec.store.AbstractRecordReader; +import org.apache.drill.exec.store.ParquetOutputRecordWriter; import org.apache.drill.exec.work.ExecErrorConstants; +import org.apache.parquet.SemanticVersion; +import org.apache.parquet.VersionParser; +import org.apache.parquet.column.ColumnDescriptor; +import org.apache.parquet.column.statistics.Statistics; +import org.apache.parquet.format.ConvertedType; +import org.apache.parquet.format.FileMetaData; +import org.apache.parquet.format.SchemaElement; +import org.apache.parquet.format.converter.ParquetMetadataConverter; +import org.apache.parquet.hadoop.ParquetFileWriter; +import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData; +import org.apache.parquet.hadoop.metadata.ColumnPath; +import org.apache.parquet.hadoop.metadata.ParquetMetadata; +import org.apache.parquet.schema.OriginalType; +import org.joda.time.Chronology; +import org.joda.time.DateTimeUtils; + +import java.util.HashMap; +import java.util.List; +import java.util.Map; /* * Utility class where we can capture common logic between the two parquet readers @@ -28,6 +51,29 @@ public class ParquetReaderUtility { private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ParquetReaderUtility.class); + // Note the negation symbol in the beginning + public static final double CORRECT_CORRUPT_DATE_SHIFT = -ParquetOutputRecordWriter.JULIAN_DAY_EPOC - 0.5; + public static final double SHIFT_PARQUET_DAY_COUNT_TO_JULIAN_DAY = ParquetOutputRecordWriter.JULIAN_DAY_EPOC - 0.5; + // The year 5000 is the threshold for auto-detecting date corruption. + // This balances two possible cases of bad auto-correction. External tools writing dates in the future will not + // be shifted unless they are past this threshold (and we cannot identify them as external files based on the metadata). + // On the other hand, historical dates written with Drill wouldn't risk being incorrectly shifted unless they were + // something like 10,000 years in the past. + private static final Chronology UTC = org.joda.time.chrono.ISOChronology.getInstanceUTC(); + public static final int DATE_CORRUPTION_THRESHOLD = + (int) (DateTimeUtils.toJulianDayNumber(UTC.getDateTimeMillis(5000, 1, 1, 0)) - ParquetOutputRecordWriter.JULIAN_DAY_EPOC); + + /** + * For most recently created parquet files, we can determine if we have corrupted dates (see DRILL-4203) + * based on the file metadata. 
For older files that lack statistics we must actually test the values + * in the data pages themselves to see if they are likely corrupt. + */ + public enum DateCorruptionStatus { + META_SHOWS_CORRUPTION, // metadata can determine if the values are definitely CORRUPT + META_SHOWS_NO_CORRUPTION, // metadata can determine if the values are definitely CORRECT + META_UNCLEAR_TEST_VALUES // not enough info in metadata, parquet reader must test individual values + } + public static void checkDecimalTypeEnabled(OptionManager options) { if (options.getOption(PlannerSettings.ENABLE_DECIMAL_DATA_TYPE_KEY).bool_val == false) { throw UserException.unsupportedError() @@ -45,4 +91,190 @@ public static int getIntFromLEBytes(byte[] input, int start) { } return out; } + + public static Map getColNameToSchemaElementMapping(ParquetMetadata footer) { + HashMap schemaElements = new HashMap<>(); + FileMetaData fileMetaData = new ParquetMetadataConverter().toParquetMetadata(ParquetFileWriter.CURRENT_VERSION, footer); + for (SchemaElement se : fileMetaData.getSchema()) { + schemaElements.put(se.getName(), se); + } + return schemaElements; + } + + public static int autoCorrectCorruptedDate(int corruptedDate) { + return (int) (corruptedDate - 2 * ParquetOutputRecordWriter.JULIAN_DAY_EPOC); + } + + public static void correctDatesInMetadataCache(Metadata.ParquetTableMetadataBase parquetTableMetadata) { + DateCorruptionStatus cacheFileContainsCorruptDates; + String drillVersionStr = parquetTableMetadata.getDrillVersion(); + if (drillVersionStr != null) { + try { + cacheFileContainsCorruptDates = ParquetReaderUtility.drillVersionHasCorruptedDates(drillVersionStr); + } catch (VersionParser.VersionParseException e) { + cacheFileContainsCorruptDates = DateCorruptionStatus.META_SHOWS_CORRUPTION; + } + } else { + cacheFileContainsCorruptDates = DateCorruptionStatus.META_SHOWS_CORRUPTION; + } + if (cacheFileContainsCorruptDates == DateCorruptionStatus.META_SHOWS_CORRUPTION) { + for (Metadata.ParquetFileMetadata file : parquetTableMetadata.getFiles()) { + // Drill has only ever written a single row group per file, only need to correct the statistics + // on the first row group + Metadata.RowGroupMetadata rowGroupMetadata = file.getRowGroups().get(0); + for (Metadata.ColumnMetadata columnMetadata : rowGroupMetadata.getColumns()) { + OriginalType originalType = columnMetadata.getOriginalType(); + if (originalType != null && originalType.equals(OriginalType.DATE) && + columnMetadata.hasSingleValue() && + (Integer) columnMetadata.getMaxValue() > ParquetReaderUtility.DATE_CORRUPTION_THRESHOLD) { + int newMinMax = ParquetReaderUtility.autoCorrectCorruptedDate((Integer)columnMetadata.getMaxValue()); + columnMetadata.setMax(newMinMax); + columnMetadata.setMin(newMinMax); + } + } + } + } + } + + /** + * Check for corrupted dates in a parquet file. 
See DRILL-4203
+   * @param footer
+   * @param columns
+   * @return the date corruption status of the file
+   */
+  public static DateCorruptionStatus detectCorruptDates(ParquetMetadata footer,
+      List<SchemaPath> columns,
+      boolean autoCorrectCorruptDates) {
+    // old drill files have "parquet-mr" as the created-by string and no drill version,
+    // so we need to check min/max values to see if they look corrupt
+    //  - option to disable this auto-correction based on the date values, in case users are storing these
+    //    dates intentionally
+
+    // migrated parquet files have 1.8.1 parquet-mr version with drill-r0 in the part of the name usually containing "SNAPSHOT"
+
+    // new parquet files 1.4+ have drill version number
+    //  - below 1.9.0 dates are corrupt
+    //  - this includes 1.9.0-SNAPSHOT
+
+    String drillVersion = footer.getFileMetaData().getKeyValueMetaData().get(ParquetRecordWriter.DRILL_VERSION_PROPERTY);
+    String createdBy = footer.getFileMetaData().getCreatedBy();
+    try {
+      if (drillVersion == null) {
+        // Possibly an old, un-migrated Drill file, check the column statistics to see if min/max values look corrupt
+        // only applies if there is a date column selected
+        if (createdBy.equals("parquet-mr")) {
+          // loop through parquet column metadata to find date columns, check for corrupt values
+          return checkForCorruptDateValuesInStatistics(footer, columns, autoCorrectCorruptDates);
+        } else {
+          // check the created-by string to see if it is a migrated Drill file
+          VersionParser.ParsedVersion parsedCreatedByVersion = VersionParser.parse(createdBy);
+          // check if this is a migrated Drill file, lacking a Drill version number, but with
+          // "drill" in the parquet created-by string
+          SemanticVersion semVer = parsedCreatedByVersion.getSemanticVersion();
+          if (semVer != null && semVer.major == 1 && semVer.minor == 8 && semVer.patch == 1
+              && (semVer.pre + "").contains("drill")) {
+            return DateCorruptionStatus.META_SHOWS_CORRUPTION;
+          } else {
+            // written by a tool that wasn't Drill, the dates are not corrupted
+            return DateCorruptionStatus.META_SHOWS_NO_CORRUPTION;
+          }
+        }
+      } else {
+        // this parser expects an application name before the semantic version, just prepending Drill
+        // we know from the property name "drill.version" that we wrote this
+        return drillVersionHasCorruptedDates(drillVersion);
+      }
+    } catch (VersionParser.VersionParseException e) {
+      // If we cannot parse either version string, assume corruption to be safe; the checks above
+      // cover all of the metadata values produced by historical versions of Drill
+      return DateCorruptionStatus.META_SHOWS_CORRUPTION;
+    }
+  }
+
+  public static DateCorruptionStatus drillVersionHasCorruptedDates(String drillVersion) throws VersionParser.VersionParseException {
+    VersionParser.ParsedVersion parsedDrillVersion = parseDrillVersion(drillVersion);
+    SemanticVersion semVer = parsedDrillVersion.getSemanticVersion();
+    if (semVer == null || semVer.compareTo(new SemanticVersion(1, 9, 0)) < 0) {
+      return DateCorruptionStatus.META_SHOWS_CORRUPTION;
+    } else {
+      return DateCorruptionStatus.META_SHOWS_NO_CORRUPTION;
+    }
+  }
+
+  public static VersionParser.ParsedVersion parseDrillVersion(String drillVersion) throws VersionParser.VersionParseException {
+    return VersionParser.parse("drill version " + drillVersion + " (build 1234)");
+  }
+
+  /**
+   * Detect corrupt date values by looking at the min/max values in the metadata.
+   *
+   * This should only be used when a file does not have enough metadata to determine if
+   * the data was written with an older version of Drill, or an external tool. Drill
+   * versions 1.3 and beyond should have enough metadata to confirm that the data was written
+   * by Drill.
+   *
+   * This method only checks the first Row Group, because Drill has only ever written
+   * a single Row Group per file.
+   *
+   * @param footer
+   * @param columns
+   * @param autoCorrectCorruptDates user setting to allow enabling/disabling of auto-correction
+   *                                of corrupt dates. There are some rare cases (storing dates thousands
+   *                                of years into the future, with tools other than Drill writing files)
+   *                                that would result in the date values being "corrected" into bad values.
+   * @return the detected corruption status
+   */
+  public static DateCorruptionStatus checkForCorruptDateValuesInStatistics(ParquetMetadata footer,
+      List<SchemaPath> columns,
+      boolean autoCorrectCorruptDates) {
+    // Users can turn off date correction in cases where we are detecting corruption based on date values
+    // that are unlikely to appear in common datasets. In this case report that no correction needs to happen
+    // during the file read
+    if (!autoCorrectCorruptDates) {
+      return DateCorruptionStatus.META_SHOWS_NO_CORRUPTION;
+    }
+    // Drill-produced files have only ever had a single row group; if this changes in the future it won't matter,
+    // as we will know from the Drill version written in the files that the dates are correct
+    int rowGroupIndex = 0;
+    Map<String, SchemaElement> schemaElements = ParquetReaderUtility.getColNameToSchemaElementMapping(footer);
+    findDateColWithStatsLoop: for (SchemaPath schemaPath : columns) {
+      List<ColumnDescriptor> parquetColumns = footer.getFileMetaData().getSchema().getColumns();
+      for (int i = 0; i < parquetColumns.size(); ++i) {
+        ColumnDescriptor column = parquetColumns.get(i);
+        // this reader only supports flat data; this is restricted in the ParquetScanBatchCreator
+        // creating a NameSegment makes sure we are using the standard code for comparing names,
+        // currently it is all case-insensitive
+        if (AbstractRecordReader.isStarQuery(columns) || new PathSegment.NameSegment(column.getPath()[0]).equals(schemaPath.getRootSegment())) {
+          int colIndex = -1;
+          ConvertedType convertedType = schemaElements.get(column.getPath()[0]).getConverted_type();
+          if (convertedType != null && convertedType.equals(ConvertedType.DATE)) {
+            List<ColumnChunkMetaData> colChunkList = footer.getBlocks().get(rowGroupIndex).getColumns();
+            for (int j = 0; j < colChunkList.size(); j++) {
+              if (colChunkList.get(j).getPath().equals(ColumnPath.get(column.getPath()))) {
+                colIndex = j;
+                break;
+              }
+            }
+          }
+          if (colIndex == -1) {
+            // column does not appear in this file, skip it
+            continue;
+          }
+          Statistics statistics = footer.getBlocks().get(rowGroupIndex).getColumns().get(colIndex).getStatistics();
+          Integer max = (Integer) statistics.genericGetMax();
+          if (statistics.hasNonNullValue()) {
+            if (max > ParquetReaderUtility.DATE_CORRUPTION_THRESHOLD) {
+              return DateCorruptionStatus.META_SHOWS_CORRUPTION;
+            }
+          } else {
+            // no statistics, go check the first page
+            return DateCorruptionStatus.META_UNCLEAR_TEST_VALUES;
+          }
+        }
+      }
+    }
+    return DateCorruptionStatus.META_SHOWS_NO_CORRUPTION;
+  }
 }
diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java
index 6c7bc41ddfe..8f7ace13234 100644
---
a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java +++ b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java @@ -104,6 +104,9 @@ public ScanBatch getBatch(FragmentContext context, ParquetRowGroupScan rowGroupS logger.trace("ParquetTrace,Read Footer,{},{},{},{},{},{},{}", "", e.getPath(), "", 0, 0, 0, timeToRead); footers.put(e.getPath(), footer ); } + boolean autoCorrectCorruptDates = rowGroupScan.formatConfig.autoCorrectCorruptDates; + ParquetReaderUtility.DateCorruptionStatus containsCorruptDates = ParquetReaderUtility.detectCorruptDates(footers.get(e.getPath()), rowGroupScan.getColumns(), + autoCorrectCorruptDates); if (!context.getOptions().getOption(ExecConstants.PARQUET_NEW_RECORD_READER).bool_val && !isComplex(footers.get(e.getPath()))) { readers.add( new ParquetRecordReader( @@ -112,12 +115,13 @@ public ScanBatch getBatch(FragmentContext context, ParquetRowGroupScan rowGroupS fs.getConf(), new ParquetDirectByteBufferAllocator(oContext.getAllocator()), 0), footers.get(e.getPath()), - rowGroupScan.getColumns() + rowGroupScan.getColumns(), + containsCorruptDates ) ); } else { ParquetMetadata footer = footers.get(e.getPath()); - readers.add(new DrillParquetReader(context, footer, e, columnExplorer.getTableColumns(), fs)); + readers.add(new DrillParquetReader(context, footer, e, columnExplorer.getTableColumns(), fs, containsCorruptDates)); } Map implicitValues = columnExplorer.populateImplicitColumns(e, rowGroupScan.getSelectionRoot()); diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ColumnReaderFactory.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ColumnReaderFactory.java index e38c51cd49f..ea65615307a 100644 --- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ColumnReaderFactory.java +++ b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ColumnReaderFactory.java @@ -95,7 +95,19 @@ static ColumnReader createFixedColumnReader(ParquetRecordReader recordReader, return new FixedByteAlignedReader.FixedBinaryReader(recordReader, allocateSize, descriptor, columnChunkMetaData, (VariableWidthVector) v, schemaElement); } } else if (columnChunkMetaData.getType() == PrimitiveType.PrimitiveTypeName.INT32 && convertedType == ConvertedType.DATE){ - return new FixedByteAlignedReader.DateReader(recordReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, (DateVector) v, schemaElement); + switch(recordReader.getDateCorruptionStatus()) { + case META_SHOWS_CORRUPTION: + return new FixedByteAlignedReader.CorruptDateReader(recordReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, (DateVector) v, schemaElement); + case META_SHOWS_NO_CORRUPTION: + return new FixedByteAlignedReader.DateReader(recordReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, (DateVector) v, schemaElement); + case META_UNCLEAR_TEST_VALUES: + return new FixedByteAlignedReader.CorruptionDetectingDateReader(recordReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, (DateVector) v, schemaElement); + default: + throw new ExecutionSetupException( + String.format("Issue setting up parquet reader for date type, " + + "unrecognized date corruption status %s. 
See DRILL-4203 for more info.", + recordReader.getDateCorruptionStatus())); + } } else{ if (columnChunkMetaData.getEncodings().contains(Encoding.PLAIN_DICTIONARY)) { switch (columnChunkMetaData.getType()) { @@ -144,7 +156,19 @@ static ColumnReader createFixedColumnReader(ParquetRecordReader recordReader, return new NullableBitReader(recordReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, (NullableBitVector) v, schemaElement); } else if (columnChunkMetaData.getType() == PrimitiveType.PrimitiveTypeName.INT32 && convertedType == ConvertedType.DATE){ - return new NullableFixedByteAlignedReaders.NullableDateReader(recordReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, (NullableDateVector) v, schemaElement); + switch(recordReader.getDateCorruptionStatus()) { + case META_SHOWS_CORRUPTION: + return new NullableFixedByteAlignedReaders.NullableCorruptDateReader(recordReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, (NullableDateVector)v, schemaElement); + case META_SHOWS_NO_CORRUPTION: + return new NullableFixedByteAlignedReaders.NullableDateReader(recordReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, (NullableDateVector) v, schemaElement); + case META_UNCLEAR_TEST_VALUES: + return new NullableFixedByteAlignedReaders.CorruptionDetectingNullableDateReader(recordReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, (NullableDateVector) v, schemaElement); + default: + throw new ExecutionSetupException( + String.format("Issue setting up parquet reader for date type, " + + "unrecognized date corruption status %s. See DRILL-4203 for more info.", + recordReader.getDateCorruptionStatus())); + } } else if (columnChunkMetaData.getType() == PrimitiveType.PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY) { if (convertedType == ConvertedType.DECIMAL) { int length = schemaElement.type_length; diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/FixedByteAlignedReader.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/FixedByteAlignedReader.java index d4b43d86c04..cccb06fbc3e 100644 --- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/FixedByteAlignedReader.java +++ b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/FixedByteAlignedReader.java @@ -22,7 +22,6 @@ import org.apache.drill.common.exceptions.ExecutionSetupException; import org.apache.drill.exec.expr.holders.Decimal28SparseHolder; import org.apache.drill.exec.expr.holders.Decimal38SparseHolder; -import org.apache.drill.exec.store.ParquetOutputRecordWriter; import org.apache.drill.exec.store.parquet.ParquetReaderUtility; import org.apache.drill.exec.util.DecimalUtility; import org.apache.drill.exec.vector.DateVector; @@ -119,9 +118,11 @@ public void writeData() { public static class DateReader extends ConvertedReader { + private final DateVector.Mutator mutator; DateReader(ParquetRecordReader parentReader, int allocateSize, ColumnDescriptor descriptor, ColumnChunkMetaData columnChunkMetaData, boolean fixedLength, DateVector v, SchemaElement schemaElement) throws ExecutionSetupException { super(parentReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, v, schemaElement); + mutator = v.getMutator(); } @Override @@ -133,7 +134,67 @@ void addNext(int start, int index) { intValue = readIntLittleEndian(bytebuf, start); } - valueVec.getMutator().set(index, DateTimeUtils.fromJulianDay(intValue - ParquetOutputRecordWriter.JULIAN_DAY_EPOC 
- 0.5)); + mutator.set(index, DateTimeUtils.fromJulianDay(intValue + ParquetReaderUtility.SHIFT_PARQUET_DAY_COUNT_TO_JULIAN_DAY)); + } + } + + /** + * Old versions of Drill were writing a non-standard format for date. See DRILL-4203 + */ + public static class CorruptDateReader extends ConvertedReader { + + private final DateVector.Mutator mutator; + + CorruptDateReader(ParquetRecordReader parentReader, int allocateSize, ColumnDescriptor descriptor, ColumnChunkMetaData columnChunkMetaData, + boolean fixedLength, DateVector v, SchemaElement schemaElement) throws ExecutionSetupException { + super(parentReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, v, schemaElement); + mutator = v.getMutator(); + } + + @Override + void addNext(int start, int index) { + int intValue; + if (usingDictionary) { + intValue = pageReader.dictionaryValueReader.readInteger(); + } else { + intValue = readIntLittleEndian(bytebuf, start); + } + + mutator.set(index, DateTimeUtils.fromJulianDay(intValue + ParquetReaderUtility.CORRECT_CORRUPT_DATE_SHIFT)); + } + + } + + /** + * Old versions of Drill were writing a non-standard format for date. See DRILL-4203 + *

+ * For files that lack enough metadata to determine if the dates are corrupt, we must just + * correct values when they look corrupt during this low level read. + */ + public static class CorruptionDetectingDateReader extends ConvertedReader { + + private final DateVector.Mutator mutator; + + CorruptionDetectingDateReader(ParquetRecordReader parentReader, int allocateSize, ColumnDescriptor descriptor, ColumnChunkMetaData columnChunkMetaData, + boolean fixedLength, DateVector v, SchemaElement schemaElement) throws ExecutionSetupException { + super(parentReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, v, schemaElement); + mutator = v.getMutator(); + } + + @Override + void addNext(int start, int index) { + int intValue; + if (usingDictionary) { + intValue = pageReader.dictionaryValueReader.readInteger(); + } else { + intValue = readIntLittleEndian(bytebuf, start); + } + + if (intValue > ParquetReaderUtility.DATE_CORRUPTION_THRESHOLD) { + mutator.set(index, DateTimeUtils.fromJulianDay(intValue + ParquetReaderUtility.CORRECT_CORRUPT_DATE_SHIFT)); + } else { + mutator.set(index, DateTimeUtils.fromJulianDay(intValue + ParquetReaderUtility.SHIFT_PARQUET_DAY_COUNT_TO_JULIAN_DAY)); + } } } diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableFixedByteAlignedReaders.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableFixedByteAlignedReaders.java index 800d4225c67..10e0c72e400 100644 --- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableFixedByteAlignedReaders.java +++ b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableFixedByteAlignedReaders.java @@ -23,7 +23,6 @@ import org.apache.drill.common.exceptions.ExecutionSetupException; import org.apache.drill.exec.expr.holders.NullableDecimal28SparseHolder; import org.apache.drill.exec.expr.holders.NullableDecimal38SparseHolder; -import org.apache.drill.exec.store.ParquetOutputRecordWriter; import org.apache.drill.exec.store.parquet.ParquetReaderUtility; import org.apache.drill.exec.util.DecimalUtility; import org.apache.drill.exec.vector.NullableBigIntVector; @@ -328,12 +327,72 @@ void addNext(int start, int index) { intValue = readIntLittleEndian(bytebuf, start); } - valueVec.getMutator().set(index, DateTimeUtils.fromJulianDay(intValue - ParquetOutputRecordWriter.JULIAN_DAY_EPOC - 0.5)); + valueVec.getMutator().set(index, DateTimeUtils.fromJulianDay(intValue + ParquetReaderUtility.SHIFT_PARQUET_DAY_COUNT_TO_JULIAN_DAY)); } } + /** + * Old versions of Drill were writing a non-standard format for date. 
See DRILL-4203 + */ + public static class NullableCorruptDateReader extends NullableConvertedReader { + + NullableCorruptDateReader(ParquetRecordReader parentReader, int allocateSize, ColumnDescriptor descriptor, ColumnChunkMetaData columnChunkMetaData, + boolean fixedLength, NullableDateVector v, SchemaElement schemaElement) throws ExecutionSetupException { + super(parentReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, v, schemaElement); + } + + @Override + void addNext(int start, int index) { + int intValue; + if (usingDictionary) { + intValue = pageReader.dictionaryValueReader.readInteger(); + } else { + intValue = readIntLittleEndian(bytebuf, start); + } + + valueVec.getMutator().set(index, DateTimeUtils.fromJulianDay(intValue + ParquetReaderUtility.CORRECT_CORRUPT_DATE_SHIFT)); + } + + } + + /** + * Old versions of Drill were writing a non-standard format for date. See DRILL-4203 + * + * For files that lack enough metadata to determine if the dates are corrupt, we must just + * correct values when they look corrupt during this low level read. + */ + public static class CorruptionDetectingNullableDateReader extends NullableConvertedReader { + + NullableDateVector dateVector; + + CorruptionDetectingNullableDateReader(ParquetRecordReader parentReader, int allocateSize, + ColumnDescriptor descriptor, ColumnChunkMetaData columnChunkMetaData, + boolean fixedLength, NullableDateVector v, SchemaElement schemaElement) + throws ExecutionSetupException { + super(parentReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, v, schemaElement); + dateVector = (NullableDateVector) v; + } + + @Override + void addNext(int start, int index) { + int intValue; + if (usingDictionary) { + intValue = pageReader.dictionaryValueReader.readInteger(); + } else { + intValue = readIntLittleEndian(bytebuf, start); + } + + if (intValue > ParquetReaderUtility.DATE_CORRUPTION_THRESHOLD) { + dateVector.getMutator().set(index, DateTimeUtils.fromJulianDay(intValue + ParquetReaderUtility.CORRECT_CORRUPT_DATE_SHIFT)); + } else { + dateVector.getMutator().set(index, DateTimeUtils.fromJulianDay(intValue + ParquetReaderUtility.SHIFT_PARQUET_DAY_COUNT_TO_JULIAN_DAY)); + } + } + } + public static class NullableDecimal28Reader extends NullableConvertedReader { + NullableDecimal28Reader(ParquetRecordReader parentReader, int allocateSize, ColumnDescriptor descriptor, ColumnChunkMetaData columnChunkMetaData, boolean fixedLength, NullableDecimal28SparseVector v, SchemaElement schemaElement) throws ExecutionSetupException { super(parentReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, v, schemaElement); diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetRecordReader.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetRecordReader.java index 23c0759c70c..99cf0f56065 100644 --- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetRecordReader.java +++ b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetRecordReader.java @@ -27,6 +27,7 @@ import com.google.common.collect.ImmutableList; import org.apache.drill.common.exceptions.DrillRuntimeException; import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.common.expression.PathSegment; import org.apache.drill.common.expression.SchemaPath; import org.apache.drill.common.types.TypeProtos; import org.apache.drill.common.types.TypeProtos.DataMode; @@ 
-40,13 +41,19 @@ import org.apache.drill.exec.record.MaterializedField; import org.apache.drill.exec.store.AbstractRecordReader; import org.apache.drill.exec.store.parquet.ParquetReaderStats; +import org.apache.drill.exec.store.parquet.ParquetReaderUtility; +import org.apache.drill.exec.store.parquet.ParquetRecordWriter; import org.apache.drill.exec.vector.AllocationHelper; import org.apache.drill.exec.vector.NullableIntVector; import org.apache.drill.exec.vector.ValueVector; import org.apache.drill.exec.vector.complex.RepeatedValueVector; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; +import org.apache.parquet.SemanticVersion; +import org.apache.parquet.VersionParser; import org.apache.parquet.column.ColumnDescriptor; +import org.apache.parquet.column.statistics.Statistics; +import org.apache.parquet.format.ConvertedType; import org.apache.parquet.format.FileMetaData; import org.apache.parquet.format.SchemaElement; import org.apache.parquet.format.converter.ParquetMetadataConverter; @@ -109,18 +116,21 @@ public class ParquetRecordReader extends AbstractRecordReader { int rowGroupIndex; long totalRecordsRead; private final FragmentContext fragmentContext; + ParquetReaderUtility.DateCorruptionStatus dateCorruptionStatus; public ParquetReaderStats parquetReaderStats = new ParquetReaderStats(); public ParquetRecordReader(FragmentContext fragmentContext, - String path, - int rowGroupIndex, - FileSystem fs, - CodecFactory codecFactory, - ParquetMetadata footer, - List columns) throws ExecutionSetupException { + String path, + int rowGroupIndex, + FileSystem fs, + CodecFactory codecFactory, + ParquetMetadata footer, + List columns, + ParquetReaderUtility.DateCorruptionStatus dateCorruptionStatus) + throws ExecutionSetupException { this(fragmentContext, DEFAULT_BATCH_LENGTH_IN_BITS, path, rowGroupIndex, fs, codecFactory, footer, - columns); + columns, dateCorruptionStatus); } public ParquetRecordReader( @@ -131,17 +141,29 @@ public ParquetRecordReader( FileSystem fs, CodecFactory codecFactory, ParquetMetadata footer, - List columns) throws ExecutionSetupException { + List columns, + ParquetReaderUtility.DateCorruptionStatus dateCorruptionStatus) throws ExecutionSetupException { this.hadoopPath = new Path(path); this.fileSystem = fs; this.codecFactory = codecFactory; this.rowGroupIndex = rowGroupIndex; this.batchSize = batchSize; this.footer = footer; + this.dateCorruptionStatus = dateCorruptionStatus; this.fragmentContext = fragmentContext; setColumns(columns); } + /** + * Indicates whether the old non-standard date format appears + * in this file, see DRILL-4203. + * + * @return the date corruption status of this file, used to decide whether + * date values need to be corrected as they are read + */ + public ParquetReaderUtility.DateCorruptionStatus getDateCorruptionStatus() { + return dateCorruptionStatus; + } + public CodecFactory getCodecFactory() { return codecFactory; } @@ -207,6 +229,31 @@ public OperatorContext getOperatorContext() { return operatorContext; } + /** + * Returns the data type length for a given {@link ColumnDescriptor} and its corresponding + * {@link SchemaElement}. Neither is enough information alone as the max + * repetition level (indicating if it is an array type) is in the ColumnDescriptor and + * the length of a fixed width field is stored at the schema level.
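The three-way status returned by getDateCorruptionStatus() is defined in ParquetReaderUtility, whose body falls outside the hunks quoted here. As a reading aid for the sketches below, the constants branched on throughout this patch suggest an enum of roughly this shape; the names are copied verbatim from the diff, while the comments are editorial gloss rather than the patch's own documentation:

```java
// Reading aid, not part of the patch: the corruption statuses the reader
// factories and converters switch on (defined in ParquetReaderUtility).
public enum DateCorruptionStatus {
  META_SHOWS_CORRUPTION,    // footer proves an old Drill writer produced the file: always shift dates
  META_SHOWS_NO_CORRUPTION, // footer proves the dates are clean: never shift
  META_UNCLEAR_TEST_VALUES  // metadata is inconclusive: test each value while reading
}
```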
+ * + * @param column + * @param se + * @return the length if fixed width, else -1 + */ + private int getDataTypeLength(ColumnDescriptor column, SchemaElement se) { + if (column.getType() != PrimitiveType.PrimitiveTypeName.BINARY) { + if (column.getMaxRepetitionLevel() > 0) { + return -1; + } + if (column.getType() == PrimitiveType.PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY) { + return se.getType_length() * 8; + } else { + return getTypeLengthInBits(column.getType()); + } + } else { + return -1; + } + } + @Override public void setup(OperatorContext operatorContext, OutputMutator output) throws ExecutionSetupException { this.operatorContext = operatorContext; @@ -233,16 +280,11 @@ public void setup(OperatorContext operatorContext, OutputMutator output) throws // TODO - figure out how to deal with this better once we add nested reading, note also look where this map is used below // store a map from column name to converted types if they are non-null - HashMap schemaElements = new HashMap<>(); - fileMetaData = new ParquetMetadataConverter().toParquetMetadata(ParquetFileWriter.CURRENT_VERSION, footer); - for (SchemaElement se : fileMetaData.getSchema()) { - schemaElements.put(se.getName(), se); - } + Map schemaElements = ParquetReaderUtility.getColNameToSchemaElementMapping(footer); // loop to add up the length of the fixed width columns and build the schema for (int i = 0; i < columns.size(); ++i) { column = columns.get(i); - logger.debug("name: " + fileMetaData.getSchema().get(i).name); SchemaElement se = schemaElements.get(column.getPath()[0]); MajorType mt = ParquetToDrillTypeConverter.toMajorType(column.getType(), se.getType_length(), getDataMode(column), se, fragmentContext.getOptions()); @@ -251,18 +293,11 @@ public void setup(OperatorContext operatorContext, OutputMutator output) throws continue; } columnsToScan++; - // sum the lengths of all of the fixed length fields - if (column.getType() != PrimitiveType.PrimitiveTypeName.BINARY) { - if (column.getMaxRepetitionLevel() > 0) { - allFieldsFixedLength = false; - } - if (column.getType() == PrimitiveType.PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY) { - bitWidthAllFixedFields += se.getType_length() * 8; - } else { - bitWidthAllFixedFields += getTypeLengthInBits(column.getType()); - } - } else { + int dataTypeLength = getDataTypeLength(column, se); + if (dataTypeLength == -1) { allFieldsFixedLength = false; + } else { + bitWidthAllFixedFields += dataTypeLength; } } // rowGroupOffset = footer.getBlocks().get(rowGroupIndex).getColumns().get(0).getFirstDataPageOffset(); diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetGroupConverter.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetGroupConverter.java index 5bc8ad227fa..32295b97b79 100644 --- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetGroupConverter.java +++ b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetGroupConverter.java @@ -25,6 +25,7 @@ import java.util.Iterator; import java.util.List; +import org.apache.drill.common.exceptions.DrillRuntimeException; import org.apache.drill.common.expression.PathSegment; import org.apache.drill.common.expression.SchemaPath; import org.apache.drill.exec.expr.holders.BigIntHolder; @@ -44,7 +45,6 @@ import org.apache.drill.exec.expr.holders.VarCharHolder; import org.apache.drill.exec.physical.impl.OutputMutator; import org.apache.drill.exec.server.options.OptionManager; -import 
org.apache.drill.exec.store.ParquetOutputRecordWriter; import org.apache.drill.exec.store.parquet.ParquetReaderUtility; import org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader; import org.apache.drill.exec.util.DecimalUtility; @@ -87,16 +87,23 @@ public class DrillParquetGroupConverter extends GroupConverter { private MapWriter mapWriter; private final OutputMutator mutator; private final OptionManager options; + // See DRILL-4203 + private final ParquetReaderUtility.DateCorruptionStatus containsCorruptedDates; - public DrillParquetGroupConverter(OutputMutator mutator, ComplexWriterImpl complexWriter, MessageType schema, Collection columns, OptionManager options) { - this(mutator, complexWriter.rootAsMap(), schema, columns, options); + public DrillParquetGroupConverter(OutputMutator mutator, ComplexWriterImpl complexWriter, MessageType schema, + Collection columns, OptionManager options, + ParquetReaderUtility.DateCorruptionStatus containsCorruptedDates) { + this(mutator, complexWriter.rootAsMap(), schema, columns, options, containsCorruptedDates); } // This function assumes that the fields in the schema parameter are in the same order as the fields in the columns parameter. The // columns parameter may have fields that are not present in the schema, though. - public DrillParquetGroupConverter(OutputMutator mutator, MapWriter mapWriter, GroupType schema, Collection columns, OptionManager options) { + public DrillParquetGroupConverter(OutputMutator mutator, MapWriter mapWriter, GroupType schema, + Collection columns, OptionManager options, + ParquetReaderUtility.DateCorruptionStatus containsCorruptedDates) { this.mapWriter = mapWriter; this.mutator = mutator; + this.containsCorruptedDates = containsCorruptedDates; converters = Lists.newArrayList(); this.options = options; @@ -144,10 +151,12 @@ public DrillParquetGroupConverter(OutputMutator mutator, MapWriter mapWriter, Gr c.add(s); } if (rep != Repetition.REPEATED) { - DrillParquetGroupConverter converter = new DrillParquetGroupConverter(mutator, mapWriter.map(name), type.asGroupType(), c, options); + DrillParquetGroupConverter converter = new DrillParquetGroupConverter( + mutator, mapWriter.map(name), type.asGroupType(), c, options, containsCorruptedDates); converters.add(converter); } else { - DrillParquetGroupConverter converter = new DrillParquetGroupConverter(mutator, mapWriter.list(name).map(), type.asGroupType(), c, options); + DrillParquetGroupConverter converter = new DrillParquetGroupConverter( + mutator, mapWriter.list(name).map(), type.asGroupType(), c, options, containsCorruptedDates); converters.add(converter); } } else { @@ -173,7 +182,19 @@ private PrimitiveConverter getConverterForType(String name, PrimitiveType type) } case DATE: { DateWriter writer = type.getRepetition() == Repetition.REPEATED ? mapWriter.list(name).date() : mapWriter.date(name); - return new DrillDateConverter(writer); + switch(containsCorruptedDates) { + case META_SHOWS_CORRUPTION: + return new DrillCorruptedDateConverter(writer); + case META_SHOWS_NO_CORRUPTION: + return new DrillDateConverter(writer); + case META_UNCLEAR_TEST_VALUES: + return new CorruptionDetectingDateConverter(writer); + default: + throw new DrillRuntimeException( + String.format("Issue setting up parquet reader for date type, " + + "unrecognized date corruption status %s. See DRILL-4203 for more info.", + containsCorruptedDates)); + } } case TIME_MILLIS: { TimeWriter writer = type.getRepetition() == Repetition.REPEATED ? 
mapWriter.list(name).time() : mapWriter.time(name); @@ -325,6 +346,40 @@ public void addInt(int value) { } } + public static class CorruptionDetectingDateConverter extends PrimitiveConverter { + private DateWriter writer; + private DateHolder holder = new DateHolder(); + + public CorruptionDetectingDateConverter(DateWriter writer) { + this.writer = writer; + } + + @Override + public void addInt(int value) { + if (value > ParquetReaderUtility.DATE_CORRUPTION_THRESHOLD) { + holder.value = DateTimeUtils.fromJulianDay(value + ParquetReaderUtility.CORRECT_CORRUPT_DATE_SHIFT); + } else { + holder.value = DateTimeUtils.fromJulianDay(value + ParquetReaderUtility.SHIFT_PARQUET_DAY_COUNT_TO_JULIAN_DAY); + } + writer.write(holder); + } + } + + public static class DrillCorruptedDateConverter extends PrimitiveConverter { + private DateWriter writer; + private DateHolder holder = new DateHolder(); + + public DrillCorruptedDateConverter(DateWriter writer) { + this.writer = writer; + } + + @Override + public void addInt(int value) { + holder.value = DateTimeUtils.fromJulianDay(value + ParquetReaderUtility.CORRECT_CORRUPT_DATE_SHIFT); + writer.write(holder); + } + } + public static class DrillDateConverter extends PrimitiveConverter { private DateWriter writer; private DateHolder holder = new DateHolder(); @@ -335,7 +390,7 @@ public DrillDateConverter(DateWriter writer) { @Override public void addInt(int value) { - holder.value = DateTimeUtils.fromJulianDay(value - ParquetOutputRecordWriter.JULIAN_DAY_EPOC - 0.5); + holder.value = DateTimeUtils.fromJulianDay(value + ParquetReaderUtility.SHIFT_PARQUET_DAY_COUNT_TO_JULIAN_DAY); writer.write(holder); } } diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetReader.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetReader.java index 224d6ebc1d7..68d3bbb2f4d 100644 --- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetReader.java +++ b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetReader.java @@ -42,6 +42,7 @@ import org.apache.drill.exec.store.AbstractRecordReader; import org.apache.drill.exec.store.dfs.DrillFileSystem; import org.apache.drill.exec.store.parquet.ParquetDirectByteBufferAllocator; +import org.apache.drill.exec.store.parquet.ParquetReaderUtility; import org.apache.drill.exec.store.parquet.RowGroupReadEntry; import org.apache.drill.exec.vector.AllocationHelper; import org.apache.drill.exec.vector.NullableIntVector; @@ -104,9 +105,12 @@ public class DrillParquetReader extends AbstractRecordReader { private List columnsNotFound=null; boolean noColumnsFound = false; // true if none of the columns in the projection list is found in the schema + // See DRILL-4203 + private final ParquetReaderUtility.DateCorruptionStatus containsCorruptedDates; public DrillParquetReader(FragmentContext fragmentContext, ParquetMetadata footer, RowGroupReadEntry entry, - List columns, DrillFileSystem fileSystem) { + List columns, DrillFileSystem fileSystem, ParquetReaderUtility.DateCorruptionStatus containsCorruptedDates) { + this.containsCorruptedDates = containsCorruptedDates; this.footer = footer; this.fileSystem = fileSystem; this.entry = entry; @@ -263,7 +267,7 @@ public void setup(OperatorContext context, OutputMutator output) throws Executio // Discard the columns not found in the schema when create DrillParquetRecordMaterializer, since they have been added to output already. 
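The corrected and corrupt decodings above differ only in the constant added before Joda's DateTimeUtils.fromJulianDay. A minimal, runnable sketch of the arithmetic, assuming constant values inferred from the patch rather than copied from ParquetReaderUtility (which is outside the quoted hunks): Joda places the Unix epoch at Julian day 2440587.5, and the old writer effectively added the rounded Julian day number 2440588 twice when encoding a date.

```java
import org.joda.time.DateTimeUtils;

// Sketch only: both shift constants below are assumptions inferred from this
// patch, not copied from ParquetReaderUtility.
public class DateShiftSketch {
  static final double SHIFT_PARQUET_DAY_COUNT_TO_JULIAN_DAY = 2440587.5; // assumed value
  static final double CORRECT_CORRUPT_DATE_SHIFT = -2440588.5;           // assumed value

  public static void main(String[] args) {
    int correct = 0;       // spec-compliant encoding of 1970-01-01: days since the Unix epoch
    int corrupt = 4881176; // the same date as old Drill wrote it; compare the min/max
                           // values in the 1.2 metadata cache file later in this patch
    // Both lines print 0, i.e. 1970-01-01T00:00:00Z in millis since the epoch.
    System.out.println(DateTimeUtils.fromJulianDay(correct + SHIFT_PARQUET_DAY_COUNT_TO_JULIAN_DAY));
    System.out.println(DateTimeUtils.fromJulianDay(corrupt + CORRECT_CORRUPT_DATE_SHIFT));
  }
}
```

Read naively as days since the epoch, 4881176 lands in the year 15334, which is why the CorruptionDetectingDateConverter above can separate the two encodings with a simple magnitude threshold.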
final Collection columns = columnsNotFound == null || columnsNotFound.size() == 0 ? getColumns(): CollectionUtils.subtract(getColumns(), columnsNotFound); recordMaterializer = new DrillParquetRecordMaterializer(output, writer, projection, columns, - fragmentContext.getOptions()); + fragmentContext.getOptions(), containsCorruptedDates); primitiveVectors = writer.getMapVector().getPrimitiveVectors(); recordReader = columnIO.getRecordReader(pageReadStore, recordMaterializer); } diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetRecordMaterializer.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetRecordMaterializer.java index 6b7edc44e01..2d778bd4a09 100644 --- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetRecordMaterializer.java +++ b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetRecordMaterializer.java @@ -20,6 +20,7 @@ import org.apache.drill.common.expression.SchemaPath; import org.apache.drill.exec.physical.impl.OutputMutator; import org.apache.drill.exec.server.options.OptionManager; +import org.apache.drill.exec.store.parquet.ParquetReaderUtility; import org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter; import org.apache.parquet.io.api.GroupConverter; @@ -35,9 +36,10 @@ public class DrillParquetRecordMaterializer extends RecordMaterializer { private ComplexWriter complexWriter; public DrillParquetRecordMaterializer(OutputMutator mutator, ComplexWriter complexWriter, MessageType schema, - Collection columns, OptionManager options) { + Collection columns, OptionManager options, + ParquetReaderUtility.DateCorruptionStatus containsCorruptedDates) { this.complexWriter = complexWriter; - root = new DrillParquetGroupConverter(mutator, complexWriter.rootAsMap(), schema, columns, options); + root = new DrillParquetGroupConverter(mutator, complexWriter.rootAsMap(), schema, columns, options, containsCorruptedDates); } public void setPosition(int position) { diff --git a/exec/java-exec/src/test/java/org/apache/drill/DrillTestWrapper.java b/exec/java-exec/src/test/java/org/apache/drill/DrillTestWrapper.java index 9df913931f2..7033be61dd9 100644 --- a/exec/java-exec/src/test/java/org/apache/drill/DrillTestWrapper.java +++ b/exec/java-exec/src/test/java/org/apache/drill/DrillTestWrapper.java @@ -732,8 +732,8 @@ private void compareResults(List> expectedRecords, List + * http://www.apache.org/licenses/LICENSE-2.0 + *

+ * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.physical.impl.writer; + +import org.apache.drill.PlanTestBase; +import org.apache.drill.TestBuilder; +import org.apache.drill.common.util.TestTools; +import org.apache.drill.exec.ExecConstants; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FSDataOutputStream; +import org.apache.hadoop.fs.FileStatus; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.joda.time.DateTime; +import org.junit.BeforeClass; +import org.junit.Test; + +import java.io.IOException; +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +/** + * Tests for compatibility when reading old parquet files after the date + * corruption issue was fixed in DRILL-4203. + * + * Drill was writing non-standard dates into parquet files for all releases + * before 1.9.0. The values have been read correctly by Drill, but + * external tools like Spark reading the files will see corrupted values for + * all dates that have been written by Drill. + * + * This change corrects the behavior of the Drill parquet writer to correctly + * store dates in the format given in the parquet specification. + * + * To maintain compatibility with old files, the parquet reader code has + * been updated to check for the old format and automatically shift the + * corrupted values into corrected ones. + * + * The test cases included here should ensure that all files produced by + * historical versions of Drill will continue to return the same values they + * had in previous releases. For compatibility with external tools, any old + * files with corrupted dates can be re-written using the CREATE TABLE AS + * command (as the writer will now only produce the specification-compliant + * values, even after reading out of older corrupt files). + * + * While the old behavior was a consistent shift into a range unlikely + * to be used in a modern database (over 10,000 years in the future), these are still + * valid date values. In the case where these may have been written into + * files intentionally, and we cannot be certain from the metadata if Drill + * produced the files, an option is included to turn off the auto-correction. + * Use of this option is assumed to be extremely unlikely, but it is included + * for completeness.
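The CREATE TABLE AS remedy described above is simple enough to show inline. A hypothetical example in the idiom of this test class, not a test from the patch, with a placeholder input path: because the reader now auto-corrects old values and the writer emits only specification-compliant ones, the rewritten copy reads correctly in external tools.

```java
  // Hypothetical illustration only; the input path is a placeholder.
  @Test
  public void exampleRewriteCorruptFilesWithCtas() throws Exception {
    test("use dfs.tmp");
    // Corrupt values are auto-corrected on read, and the fixed writer stores
    // proper days-since-epoch, so the new files are clean for Spark and others.
    test("create table rewritten_dates as select * from dfs.`/data/old_drill_output`");
  }
```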
+ */ +public class TestCorruptParquetDateCorrection extends PlanTestBase { + + // 4 files are in the directory: + // - one created with the fixed version of the reader, right before 1.9 + // - the code was changed to write the version number 1.9 (without snapshot) into the file + // - for compatibility all 1.9-SNAPSHOT files are read to correct the corrupt dates + // - one from an old version of Drill, before we put a proper created-by in the metadata + // - this is read properly by looking at a max value in the file statistics, to see that + // it is way off from a typical date value + // - this behavior can be turned off, but it is on by default + // - one from the 0.6 version of Drill, before files had min/max statistics + // - detecting corrupt values must be deferred to actual data page reading + // - one from 1.4, where there is a proper created-by, but the corruption is present + private static final String MIXED_CORRUPTED_AND_CORRECTED_DATES_PATH = + "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions"; + // partitioned with 1.4.0, date values are known to be corrupt + private static final String CORRUPTED_PARTITIONED_DATES_1_4_0_PATH = + "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203"; + // partitioned with 1.2.0, no certain metadata that these were written with Drill + // the values will be checked to see that they look corrupt and they will be corrected + // by default. Users can use the format plugin option autoCorrectCorruptDates to disable + // this behavior if they have foreign parquet files with valid rare date values that are + // in a similar range to Drill's corrupt values + private static final String CORRUPTED_PARTITIONED_DATES_1_2_PATH = + "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203_1_2"; + private static final String PARQUET_DATE_FILE_WITH_NULL_FILLED_COLS = + "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/null_date_cols_with_corruption_4203.parquet"; + private static final String CORRECTED_PARTITIONED_DATES_1_9_PATH = + "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/1_9_0_partitioned_no_corruption"; + private static final String VARCHAR_PARTITIONED = + "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition"; + private static final String DATE_PARTITIONED = + "[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition"; + + private static FileSystem fs; + private static Path path; + static String PARTITIONED_1_2_FOLDER = "partitioned_with_corruption_4203_1_2"; + static String MIXED_CORRUPTED_AND_CORRECTED_PARTITIONED_FOLDER = "mixed_partitioned"; + + @BeforeClass + public static void initFs() throws Exception { + Configuration conf = new Configuration(); + conf.set(FileSystem.FS_DEFAULT_NAME_KEY, "local"); + fs = FileSystem.get(conf); + path = new Path(getDfsTestTmpSchemaLocation()); + + // Move files into temp directory, rewrite the metadata cache file to contain the appropriate absolute + // path + copyDirectoryIntoTempSpace(CORRUPTED_PARTITIONED_DATES_1_2_PATH); + copyMetaDataCacheToTempReplacingInternalPaths("parquet/4203_corrupt_dates/drill.parquet.metadata_1_2.requires_replace.txt", + PARTITIONED_1_2_FOLDER); + copyDirectoryIntoTempSpace(CORRUPTED_PARTITIONED_DATES_1_2_PATH, MIXED_CORRUPTED_AND_CORRECTED_PARTITIONED_FOLDER); + copyDirectoryIntoTempSpace(CORRECTED_PARTITIONED_DATES_1_9_PATH,
MIXED_CORRUPTED_AND_CORRECTED_PARTITIONED_FOLDER); + copyDirectoryIntoTempSpace(CORRUPTED_PARTITIONED_DATES_1_4_0_PATH, MIXED_CORRUPTED_AND_CORRECTED_PARTITIONED_FOLDER); + } + + /** + * Test reading a directory full of partitioned parquet files with dates. These files have a Drill version + * number of 1.9.0 in their footers, so we can be certain they do not have corruption. The option to disable the + * correction is passed, but it will not change the result in the case where we are certain correction + * is NOT needed. For more info see DRILL-4203. + */ + @Test + public void testReadPartitionedOnCorrectedDates() throws Exception { + try { + for (String selection : new String[]{"*", "date_col"}) { + // for sanity, try reading all partitions without a filter + TestBuilder builder = testBuilder() + .sqlQuery("select " + selection + " from table(dfs.`" + CORRECTED_PARTITIONED_DATES_1_9_PATH + "`" + + "(type => 'parquet', autoCorrectCorruptDates => false))") + .unOrdered() + .baselineColumns("date_col"); + addDateBaselineVals(builder); + builder.go(); + + String query = "select " + selection + " from table(dfs.`" + CORRECTED_PARTITIONED_DATES_1_9_PATH + "` " + + "(type => 'parquet', autoCorrectCorruptDates => false))" + " where date_col = date '1970-01-01'"; + // verify that pruning is actually taking place + testPlanMatchingPatterns(query, new String[]{"numFiles=1"}, null); + + // read with a filter on the partition column + testBuilder() + .sqlQuery(query) + .unOrdered() + .baselineColumns("date_col") + .baselineValues(new DateTime(1970, 1, 1, 0, 0)) + .go(); + } + } finally { + test("alter session reset all"); + } + } + + @Test + public void testVarcharPartitionedReadWithCorruption() throws Exception { + testBuilder() + .sqlQuery("select date_col from " + + "dfs.`" + VARCHAR_PARTITIONED + "`" + + "where length(varchar_col) = 12") + .baselineColumns("date_col") + .unOrdered() + .baselineValues(new DateTime(2039, 4, 9, 0, 0)) + .baselineValues(new DateTime(1999, 1, 8, 0, 0)) + .go(); + } + + @Test + public void testDatePartitionedReadWithCorruption() throws Exception { + testBuilder() + .sqlQuery("select date_col from " + + "dfs.`" + DATE_PARTITIONED + "`" + + "where date_col = '1999-04-08'") + .baselineColumns("date_col") + .unOrdered() + .baselineValues(new DateTime(1999, 4, 8, 0, 0)) + .go(); + + String sql = "select date_col from dfs.`" + DATE_PARTITIONED + "` where date_col > '1999-04-08'"; + testPlanMatchingPatterns(sql, new String[]{"numFiles=6"}, null); + } + + /** + * Test reading a directory full of partitioned parquet files with dates. These files have a Drill version + * number of 1.4.0 in their footers, so we can be certain they are corrupt. The option to disable the + * correction is passed, but it will not change the result in the case where we are certain correction + * is needed. For more info see DRILL-4203.
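The javadoc above and its 1.4.0 counterpart just before this point pin down the version-driven half of the detection: a footer written by Drill before the fix is definitely corrupt, one from 1.9.0 onward is definitely clean, and only footers without a usable Drill version fall through to statistics or per-value checks. A simplified sketch of that decision; the real logic lives in ParquetReaderUtility and parses versions with parquet's VersionParser/SemanticVersion rather than a string compare, so everything below is an assumption about shape, not the patch's code:

```java
// Simplified sketch, not the patch's code: mapping footer metadata to a status
// (reusing the DateCorruptionStatus enum sketched earlier).
static DateCorruptionStatus statusFromFooter(String drillVersionInFooter, boolean autoCorrect) {
  if (drillVersionInFooter == null) {
    // Not certainly written by Drill: fall back to statistics and per-value
    // checks unless the user passed autoCorrectCorruptDates => false.
    return autoCorrect ? DateCorruptionStatus.META_UNCLEAR_TEST_VALUES
                       : DateCorruptionStatus.META_SHOWS_NO_CORRUPTION;
  }
  return drillVersionInFooter.compareTo("1.9.0") < 0   // crude stand-in for a semver comparison
      ? DateCorruptionStatus.META_SHOWS_CORRUPTION     // e.g. the 1.4.0 files used below
      : DateCorruptionStatus.META_SHOWS_NO_CORRUPTION; // e.g. the 1.9.0 files used above
}
```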
+ */ + @Test + public void testReadPartitionedOnCorruptedDates() throws Exception { + try { + for (String selection : new String[]{"*", "date_col"}) { + // for sanity, try reading all partitions without a filter + TestBuilder builder = testBuilder() + .sqlQuery("select " + selection + " from table(dfs.`" + CORRUPTED_PARTITIONED_DATES_1_4_0_PATH + "`" + + "(type => 'parquet', autoCorrectCorruptDates => false))") + .unOrdered() + .baselineColumns("date_col"); + addDateBaselineVals(builder); + builder.go(); + + String query = "select " + selection + " from table(dfs.`" + CORRUPTED_PARTITIONED_DATES_1_4_0_PATH + "` " + + "(type => 'parquet', autoCorrectCorruptDates => false))" + " where date_col = date '1970-01-01'"; + // verify that pruning is actually taking place + testPlanMatchingPatterns(query, new String[]{"numFiles=1"}, null); + + // read with a filter on the partition column + testBuilder() + .sqlQuery(query) + .unOrdered() + .baselineColumns("date_col") + .baselineValues(new DateTime(1970, 1, 1, 0, 0)) + .go(); + } + } finally { + test("alter session reset all"); + } + } + + @Test + public void testReadPartitionedOnCorruptedDates_UserDisabledCorrection() throws Exception { + try { + for (String selection : new String[]{"*", "date_col"}) { + // for sanity, try reading all partitions without a filter + TestBuilder builder = testBuilder() + .sqlQuery("select " + selection + " from table(dfs.`" + CORRUPTED_PARTITIONED_DATES_1_2_PATH + "`" + + "(type => 'parquet', autoCorrectCorruptDates => false))") + .unOrdered() + .baselineColumns("date_col"); + addCorruptedDateBaselineVals(builder); + builder.go(); + + String query = "select " + selection + " from table(dfs.`" + CORRUPTED_PARTITIONED_DATES_1_2_PATH + "` " + + "(type => 'parquet', autoCorrectCorruptDates => false))" + " where date_col = cast('15334-03-17' as date)"; + // verify that pruning is actually taking place + testPlanMatchingPatterns(query, new String[]{"numFiles=1"}, null); + + // read with a filter on the partition column + testBuilder() + .sqlQuery(query) + .unOrdered() + .baselineColumns("date_col") + .baselineValues(new DateTime(15334, 03, 17, 0, 0)) + .go(); + } + } finally { + test("alter session reset all"); + } + } + + @Test + public void testCorruptValDetectionDuringPruning() throws Exception { + try { + for (String selection : new String[]{"*", "date_col"}) { + // for sanity, try reading all partitions without a filter + TestBuilder builder = testBuilder() + .sqlQuery("select " + selection + " from dfs.`" + CORRUPTED_PARTITIONED_DATES_1_2_PATH + "`") + .unOrdered() + .baselineColumns("date_col"); + addDateBaselineVals(builder); + builder.go(); + + String query = "select " + selection + " from dfs.`" + CORRUPTED_PARTITIONED_DATES_1_2_PATH + "`" + + " where date_col = date '1970-01-01'"; + // verify that pruning is actually taking place + testPlanMatchingPatterns(query, new String[]{"numFiles=1"}, null); + + // read with a filter on the partition column + testBuilder() + .sqlQuery(query) + .unOrdered() + .baselineColumns("date_col") + .baselineValues(new DateTime(1970, 1, 1, 0, 0)) + .go(); + } + } finally { + test("alter session reset all"); + } + } + + /** + * Fixing some of the dates corrupted as part of DRILL-4203 requires + * actually looking at the values stored in the file. A column that actually + * stores date values must be located to check a value.
Just because we find one + * column where all the values are null does not mean we can safely avoid reading + * date columns with auto-correction; although null values do not need fixing, + * other columns may contain actual corrupt date values. + * + * This test checks the case where the first columns in the file are all null-filled + * and a later column must be found to identify that the file is corrupt. + */ + @Test + public void testReadCorruptDatesWithNullFilledColumns() throws Exception { + testBuilder() + .sqlQuery("select null_dates_1, null_dates_2, non_existent_field, date_col from dfs.`" + PARQUET_DATE_FILE_WITH_NULL_FILLED_COLS + "`") + .unOrdered() + .baselineColumns("null_dates_1", "null_dates_2", "non_existent_field", "date_col") + .baselineValues(null, null, null, new DateTime(1970, 1, 1, 0, 0)) + .baselineValues(null, null, null, new DateTime(1970, 1, 2, 0, 0)) + .baselineValues(null, null, null, new DateTime(1969, 12, 31, 0, 0)) + .baselineValues(null, null, null, new DateTime(1969, 12, 30, 0, 0)) + .baselineValues(null, null, null, new DateTime(1900, 1, 1, 0, 0)) + .baselineValues(null, null, null, new DateTime(2015, 1, 1, 0, 0)) + .go(); + } + + @Test + public void testUserOverrideDateCorrection() throws Exception { + // read once with the flat reader + readFilesWithUserDisabledAutoCorrection(); + + try { + test(String.format("alter session set %s = true", ExecConstants.PARQUET_NEW_RECORD_READER)); + // read all of the types with the complex reader + readFilesWithUserDisabledAutoCorrection(); + } finally { + test("alter session reset all"); + } + + } + + /** + * Test reading a directory full of parquet files with dates, some of which have corrupted values + * due to DRILL-4203. + * + * Tests reading the files with both the vectorized and complex parquet readers.
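The null-column subtlety described a few paragraphs up reduces to a scan over row-group statistics: an all-null date column proves nothing, so the check must keep going until it finds a date column that actually holds values. A simplified sketch of that scan; Statistics#getNumNulls and IntStatistics#getMax are real parquet-mr APIs, while the loop shape and the assumption that only DATE columns reach it are paraphrased from the behavior these tests require, not copied from the patch:

```java
import org.apache.drill.exec.store.parquet.ParquetReaderUtility;
import org.apache.parquet.column.statistics.IntStatistics;
import org.apache.parquet.column.statistics.Statistics;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;

// Simplified sketch, not the patch's code: find a date column with real values
// before judging corruption. Assumes every column passed in is a DATE column
// and reuses the DateCorruptionStatus enum sketched earlier.
class DateStatisticsCheck {
  static DateCorruptionStatus statusFromStatistics(BlockMetaData rowGroup) {
    for (ColumnChunkMetaData col : rowGroup.getColumns()) {
      Statistics stats = col.getStatistics();
      if (stats == null || stats.getNumNulls() == rowGroup.getRowCount()) {
        continue; // all-null column: proves nothing, keep looking
      }
      return ((IntStatistics) stats).getMax() > ParquetReaderUtility.DATE_CORRUPTION_THRESHOLD
          ? DateCorruptionStatus.META_SHOWS_CORRUPTION
          : DateCorruptionStatus.META_SHOWS_NO_CORRUPTION;
    }
    return DateCorruptionStatus.META_UNCLEAR_TEST_VALUES; // all date columns null: defer to page reads
  }
}
```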
+ * + * @throws Exception + */ + @Test + public void testReadMixedOldAndNewBothReaders() throws Exception { + // read once with the flat reader + readMixedCorruptedAndCorrectedDates(); + + try { + // read all of the types with the complex reader + test(String.format("alter session set %s = true", ExecConstants.PARQUET_NEW_RECORD_READER)); + readMixedCorruptedAndCorrectedDates(); + } finally { + test(String.format("alter session set %s = false", ExecConstants.PARQUET_NEW_RECORD_READER)); + } + } + + public void addDateBaselineVals(TestBuilder builder) { + builder + .baselineValues(new DateTime(1970, 1, 1, 0, 0)) + .baselineValues(new DateTime(1970, 1, 2, 0, 0)) + .baselineValues(new DateTime(1969, 12, 31, 0, 0)) + .baselineValues(new DateTime(1969, 12, 30, 0, 0)) + .baselineValues(new DateTime(1900, 1, 1, 0, 0)) + .baselineValues(new DateTime(2015, 1, 1, 0, 0)); + } + + /** + * These are the same values added in addDateBaselineVals, shifted as corrupt values + */ + public void addCorruptedDateBaselineVals(TestBuilder builder) { + builder + .baselineValues(new DateTime(15334, 03, 17, 0, 0)) + .baselineValues(new DateTime(15334, 03, 18, 0, 0)) + .baselineValues(new DateTime(15334, 03, 15, 0, 0)) + .baselineValues(new DateTime(15334, 03, 16, 0, 0)) + .baselineValues(new DateTime(15264, 03, 16, 0, 0)) + .baselineValues(new DateTime(15379, 03, 17, 0, 0)); + } + + public void readFilesWithUserDisabledAutoCorrection() throws Exception { + // ensure that selecting the date column explicitly or as part of a star still results + // in checking the file metadata for date columns (when we need to check the statistics + // for bad values) to set the flag that the values are corrupt + for (String selection : new String[] {"*", "date_col"}) { + TestBuilder builder = testBuilder() + .sqlQuery("select " + selection + " from table(dfs.`" + MIXED_CORRUPTED_AND_CORRECTED_DATES_PATH + "`" + + "(type => 'parquet', autoCorrectCorruptDates => false))") + .unOrdered() + .baselineColumns("date_col"); + addDateBaselineVals(builder); + addDateBaselineVals(builder); + addCorruptedDateBaselineVals(builder); + addCorruptedDateBaselineVals(builder); + builder.go(); + } + } + + private static String replaceWorkingPathInString(String orig) { + return orig.replaceAll(Pattern.quote("[WORKING_PATH]"), Matcher.quoteReplacement(TestTools.getWorkingPath())); + } + + private static void copyDirectoryIntoTempSpace(String resourcesDir) throws IOException { + copyDirectoryIntoTempSpace(resourcesDir, null); + } + + private static void copyDirectoryIntoTempSpace(String resourcesDir, String destinationSubDir) throws IOException { + Path destination = path; + if (destinationSubDir != null) { + destination = new Path(path, destinationSubDir); + } + fs.copyFromLocalFile( + new Path(replaceWorkingPathInString(resourcesDir)), + destination); + } + + /** + * Metadata cache files include full paths to the files that have been scanned. + * + * There is no way to generate a metadata cache file with absolute paths that + * will be guaranteed to be available on an arbitrary test machine. + * + * To enable testing older metadata cache files, they were generated manually + * using older Drill versions, and the absolute path up to the folder where + * the metadata cache file appeared was manually replaced with the string + * REPLACED_IN_TEST. Here the file is re-written into the given temporary + * location after the REPLACED_IN_TEST string has been replaced by the actual + * location generated during this run of the tests.
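Since the javadoc above leans on the REPLACED_IN_TEST metadata cache fixtures, it is worth noting how the raw values inside the 1.2 cache file (quoted in full later in this patch) line up with the corrupt encoding. A small, runnable cross-check: the six max values are copied from the cache file itself, while the double Julian-day offset of 2 * 2440588 is inferred from the patch rather than read out of ParquetReaderUtility.

```java
// Cross-check: the per-file max values recorded in the 1.2 metadata cache
// decode to the test's baseline dates once the corrupt shift is removed.
public class CacheValueSketch {
  public static void main(String[] args) {
    int[] cachedMaxValues = {4855609, 4881174, 4881175, 4881176, 4881177, 4897612};
    for (int v : cachedMaxValues) {
      // days since the Unix epoch: -25567, -2, -1, 0, 1, 16436,
      // i.e. 1900-01-01, 1969-12-30, 1969-12-31, 1970-01-01, 1970-01-02, 2015-01-01
      System.out.println(v - 2 * 2440588);
    }
  }
}
```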
+ * + * @param srcFileOnClassPath + * @param destFolderInTmp + * @throws IOException + */ + private static void copyMetaDataCacheToTempReplacingInternalPaths(String srcFileOnClassPath, String destFolderInTmp) throws IOException { + String metadataFileContents = getFile(srcFileOnClassPath); + Path newMetaCache = new Path(new Path(path, destFolderInTmp), ".drill.parquet_metadata"); + FSDataOutputStream outStream = fs.create(newMetaCache); + outStream.writeBytes(metadataFileContents.replace("REPLACED_IN_TEST", path.toString())); + outStream.close(); + } + + @Test + public void testReadOldMetadataCacheFile() throws Exception { + // for sanity, try reading all partitions without a filter + String query = "select date_col from dfs.`" + new Path(path, PARTITIONED_1_2_FOLDER) + "`"; + TestBuilder builder = testBuilder() + .sqlQuery(query) + .unOrdered() + .baselineColumns("date_col"); + addDateBaselineVals(builder); + builder.go(); + testPlanMatchingPatterns(query, new String[]{"usedMetadataFile=true"}, null); + } + + @Test + public void testReadOldMetadataCacheFileWithPruning() throws Exception { + String query = "select date_col from dfs.`" + new Path(path, PARTITIONED_1_2_FOLDER) + "`" + + " where date_col = date '1970-01-01'"; + // verify that pruning is actually taking place + testPlanMatchingPatterns(query, new String[]{"numFiles=1", "usedMetadataFile=true"}, null); + + // read with a filter on the partition column + testBuilder() + .sqlQuery(query) + .unOrdered() + .baselineColumns("date_col") + .baselineValues(new DateTime(1970, 1, 1, 0, 0)) + .go(); + } + + @Test + public void testReadOldMetadataCacheFileOverrideCorrection() throws Exception { + // for sanity, try reading all partitions without a filter + TestBuilder builder = testBuilder() + .sqlQuery("select date_col from table(dfs.`" + new Path(path, PARTITIONED_1_2_FOLDER) + "`" + + "(type => 'parquet', autoCorrectCorruptDates => false))") + .unOrdered() + .baselineColumns("date_col"); + addCorruptedDateBaselineVals(builder); + builder.go(); + + String query = "select date_col from table(dfs.`" + new Path(path, PARTITIONED_1_2_FOLDER) + "` " + + "(type => 'parquet', autoCorrectCorruptDates => false))" + " where date_col = cast('15334-03-17' as date)"; + // verify that pruning is actually taking place + testPlanMatchingPatterns(query, new String[]{"numFiles=1", "usedMetadataFile=true"}, null); + + // read with a filter on the partition column + testBuilder() + .sqlQuery(query) + .unOrdered() + .baselineColumns("date_col") + .baselineValues(new DateTime(15334, 03, 17, 0, 0)) + .go(); + } + + @Test + public void testReadNewMetadataCacheFileOverOldAndNewFiles() throws Exception { + String table = "dfs.`" + new Path(path, MIXED_CORRUPTED_AND_CORRECTED_PARTITIONED_FOLDER) + "`"; + copyMetaDataCacheToTempReplacingInternalPaths("parquet/4203_corrupt_dates/mixed_version_partitioned_metadata.requires_replace.txt", + MIXED_CORRUPTED_AND_CORRECTED_PARTITIONED_FOLDER); + // for sanity, try reading all partitions without a filter + TestBuilder builder = testBuilder() + .sqlQuery("select date_col from " + table) + .unOrdered() + .baselineColumns("date_col"); + addDateBaselineVals(builder); + addDateBaselineVals(builder); + addDateBaselineVals(builder); + builder.go(); + + String query = "select date_col from " + table + + " where date_col = date '1970-01-01'"; + // verify that pruning is actually taking place + testPlanMatchingPatterns(query, new String[]{"numFiles=3", "usedMetadataFile=true"}, null); + + // read with a filter on the partition column +
testBuilder() + .sqlQuery(query) + .unOrdered() + .baselineColumns("date_col") + .baselineValues(new DateTime(1970, 1, 1, 0, 0)) + .baselineValues(new DateTime(1970, 1, 1, 0, 0)) + .baselineValues(new DateTime(1970, 1, 1, 0, 0)) + .go(); + } + + /** + * Read a directory with parquet files where some have corrupted dates, see DRILL-4203. + * @throws Exception + */ + public void readMixedCorruptedAndCorrectedDates() throws Exception { + // ensure that selecting the date column explicitly or as part of a star still results + // in checking the file metadata for date columns (when we need to check the statistics + // for bad values) to set the flag that the values are corrupt + for (String selection : new String[] {"*", "date_col"}) { + TestBuilder builder = testBuilder() + .sqlQuery("select " + selection + " from dfs.`" + MIXED_CORRUPTED_AND_CORRECTED_DATES_PATH + "`") + .unOrdered() + .baselineColumns("date_col"); + for (int i = 0; i < 4; i++) { + addDateBaselineVals(builder); + } + builder.go(); + } + } + +} diff --git a/exec/java-exec/src/test/java/org/apache/drill/exec/store/dfs/TestFormatPluginOptionExtractor.java b/exec/java-exec/src/test/java/org/apache/drill/exec/store/dfs/TestFormatPluginOptionExtractor.java index cdeafae99be..c341295731e 100644 --- a/exec/java-exec/src/test/java/org/apache/drill/exec/store/dfs/TestFormatPluginOptionExtractor.java +++ b/exec/java-exec/src/test/java/org/apache/drill/exec/store/dfs/TestFormatPluginOptionExtractor.java @@ -53,9 +53,11 @@ public void test() { assertEquals(NamedFormatPluginConfig.class, d.pluginConfigClass); assertEquals("(type: String, name: String)", d.presentParams()); break; + case "parquet": + assertEquals(d.typeName, "(type: String, autoCorrectCorruptDates: boolean)", d.presentParams()); + break; case "json": case "sequencefile": - case "parquet": case "avro": assertEquals(d.typeName, "(type: String)", d.presentParams()); break; diff --git a/exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetRecordReaderTest.java b/exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetRecordReaderTest.java index 26ba316611c..51fa45c7561 100644 --- a/exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetRecordReaderTest.java +++ b/exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetRecordReaderTest.java @@ -639,7 +639,7 @@ public void testPerformance(@Injectable final DrillbitContext bitContext, for(int i = 0; i < 25; i++) { final ParquetRecordReader rr = new ParquetRecordReader(context, 256000, fileName, 0, fs, CodecFactory.createDirectCodecFactory(dfsConfig, new ParquetDirectByteBufferAllocator(allocator), 0), - f.getParquetMetadata(), columns); + f.getParquetMetadata(), columns, ParquetReaderUtility.DateCorruptionStatus.META_SHOWS_CORRUPTION); final TestOutputMutator mutator = new TestOutputMutator(allocator); rr.setup(null, mutator); final Stopwatch watch = Stopwatch.createStarted(); diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/1_9_0_partitioned_no_corruption/0_0_1.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/1_9_0_partitioned_no_corruption/0_0_1.parquet new file mode 100644 index 0000000000000000000000000000000000000000..7aa8e610028374eec93c8a1953677a3150639299 GIT binary patch literal 257 zcmZWk!Ab)$6pYVWg6Pc`63AhNwkUMNhRuc*ym<3idg#H6B)bt7R#)TJuknNYs~T;= zgL4>$nSo(ecWaFVFOV-0l9(cud~*4@+wGZ=gwq)$K0f-f5ybL-5{F?#AP95dv~^>6 zY=(Y*Mi>Yw}&d^EP&UuRcP2?}itCqScKm2w%Ut 
z__hsW{v5MImT6Tho&WapSY%gz+x2Z`y)7zdf-5R-ZP3p8r`nWdRXX2zQCbac6 K`AUyg>HIgWDnHx+ literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/1_9_0_partitioned_no_corruption/0_0_2.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/1_9_0_partitioned_no_corruption/0_0_2.parquet new file mode 100644 index 0000000000000000000000000000000000000000..8c43cee145649241e1085a3824749317415bdd0c GIT binary patch literal 257 zcmZWk%}T^D7>u9m5=3u)A%Prva2JJbXlNQL?8TeMWe*;_NYX|qRI9P;Gx}6^;}3Y) zISj+hz%ZNJtww?u$X5tS%#lhy`|sgQ2Ls6mA|yX=W7qfUBgFS^c;P2n-Kc`_?W>D# zyD;X@G5gCht!kz7pMD;T?8@)DzRj$+MdeIzMdhsx+FAcpo3gA*=NoV8(m2zEw!S8x J>Cq;g-vdboMjQYD literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/1_9_0_partitioned_no_corruption/0_0_3.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/1_9_0_partitioned_no_corruption/0_0_3.parquet new file mode 100644 index 0000000000000000000000000000000000000000..2d2415a4df34f47503dc22ab103b79dd2ef287c9 GIT binary patch literal 257 zcmZWk!Ab)$6pYVWg6Pc`63Af>wzSX<8#WtO=%qK0r3VjQB-xFyu(}$zexrY-X)Abe z4#O}rFwAzd(@5|J`3fP4DN@NN-w7vXB;oQF5}%(PYy`2opTuDp5eUKzIBne+o|>Vb zT@eOC3OT1T)haTj;x9#Dz#iD+oUZ>3Bp-;7{JM=@->c6MKf2+SUubow3c`19F23)> zn7_pAH_NoDmCld-JQmr)@4LRuthYtwOmIcztqs~)|6H50tV-t_Z|c%G(}cEOlCSh= Ho6i3LOt?lN literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/1_9_0_partitioned_no_corruption/0_0_4.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/1_9_0_partitioned_no_corruption/0_0_4.parquet new file mode 100644 index 0000000000000000000000000000000000000000..ff5ce24c7c3c27b5eb65a27785907fca0094e664 GIT binary patch literal 257 zcmWG=3^EjD5ET)X&=F+OhA&E5yTc{lGO2FDay|;5oH2$ zRGB0=QW8s2S?`iAl*xCKkpfi6+UZDHfUxCxCtl H0D2w(uz@l) literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/1_9_0_partitioned_no_corruption/0_0_5.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/1_9_0_partitioned_no_corruption/0_0_5.parquet new file mode 100644 index 0000000000000000000000000000000000000000..4a5d4fb20b4afb7db34c1bfdc61bd258e3ad6629 GIT binary patch literal 257 zcmZWkPfNov9R2O*96{X8gamTv!Ho^p(9kqg*kw15V+RjjBxy4!RI9P`Gy1WNQ}E#B z@&3W%?RN)Ff!6#jzmy_F_P`#OWczO*`3OStc^kXFS05q1cf$+6)ap(Zzz?r3zMaCD zKgaAh%e1PM&d>ck7TL<5y1vb<4cb|MtW8-~rSpw9b!nVwLR+ulGkw{o F^Ix`68#WtO=%L;`mL5EKkz_Z*!s=?=`i*{;X)Abe z4#T{efyZp`cbWpdrhG*)i76$LPs|43#0(Nh53*@AAV|Z?c zesKv5g%om0WU5s#CE_oo$dEm^IA_s+G=<{X7=gjo){Dn^|v*%9-Gb%3B+>v;L(vWm%QZH{R5xai$4veGA`d I(Ke0$0bB<(c>n+a literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/drill.parquet.metadata_1_2.requires_replace.txt b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/drill.parquet.metadata_1_2.requires_replace.txt new file mode 100644 index 00000000000..bfca095be39 --- /dev/null +++ b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/drill.parquet.metadata_1_2.requires_replace.txt @@ -0,0 +1,119 @@ +{ + "metadata_version" : "v1", + "files" : [ { + "path" : "REPLACED_IN_TEST/partitioned_with_corruption_4203_1_2/0_0_1.parquet", + "length" : 160, + "rowGroups" : [ { + "start" : 4, + "length" : 45, + "rowCount" : 1, + "hostAffinity" : { + "localhost" : 1.0 + }, + "columns" : [ { + "name" : "`date_col`", + "primitiveType" : "INT32", + "originalType" : "DATE", + "max" : 4855609, + "min" : 4855609, + "nulls" : 0 + } ] + } ] + }, { + "path" : 
"REPLACED_IN_TEST/partitioned_with_corruption_4203_1_2/0_0_2.parquet", + "length" : 160, + "rowGroups" : [ { + "start" : 4, + "length" : 45, + "rowCount" : 1, + "hostAffinity" : { + "localhost" : 1.0 + }, + "columns" : [ { + "name" : "`date_col`", + "primitiveType" : "INT32", + "originalType" : "DATE", + "max" : 4881174, + "min" : 4881174, + "nulls" : 0 + } ] + } ] + }, { + "path" : "REPLACED_IN_TEST/partitioned_with_corruption_4203_1_2/0_0_3.parquet", + "length" : 160, + "rowGroups" : [ { + "start" : 4, + "length" : 45, + "rowCount" : 1, + "hostAffinity" : { + "localhost" : 1.0 + }, + "columns" : [ { + "name" : "`date_col`", + "primitiveType" : "INT32", + "originalType" : "DATE", + "max" : 4881175, + "min" : 4881175, + "nulls" : 0 + } ] + } ] + }, { + "path" : "REPLACED_IN_TEST/partitioned_with_corruption_4203_1_2/0_0_4.parquet", + "length" : 160, + "rowGroups" : [ { + "start" : 4, + "length" : 45, + "rowCount" : 1, + "hostAffinity" : { + "localhost" : 1.0 + }, + "columns" : [ { + "name" : "`date_col`", + "primitiveType" : "INT32", + "originalType" : "DATE", + "max" : 4881176, + "min" : 4881176, + "nulls" : 0 + } ] + } ] + }, { + "path" : "REPLACED_IN_TEST/partitioned_with_corruption_4203_1_2/0_0_5.parquet", + "length" : 160, + "rowGroups" : [ { + "start" : 4, + "length" : 45, + "rowCount" : 1, + "hostAffinity" : { + "localhost" : 1.0 + }, + "columns" : [ { + "name" : "`date_col`", + "primitiveType" : "INT32", + "originalType" : "DATE", + "max" : 4881177, + "min" : 4881177, + "nulls" : 0 + } ] + } ] + }, { + "path" : "REPLACED_IN_TEST/partitioned_with_corruption_4203_1_2/0_0_6.parquet", + "length" : 160, + "rowGroups" : [ { + "start" : 4, + "length" : 45, + "rowCount" : 1, + "hostAffinity" : { + "localhost" : 1.0 + }, + "columns" : [ { + "name" : "`date_col`", + "primitiveType" : "INT32", + "originalType" : "DATE", + "max" : 4897612, + "min" : 4897612, + "nulls" : 0 + } ] + } ] + } ], + "directories" : [ ] +} diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_1.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_1.parquet new file mode 100644 index 0000000000000000000000000000000000000000..9890a054083d4c5da40bd6f662d7c350276e2d17 GIT binary patch literal 1226 zcmZWpKTq5+6nA{#&J8K55Vm|(hAwAOsU4__bVC*-kPt(KIxtnLa@-v`nrk408~OvRFUA_kQo6f64yV#|?@oq2U{f6(AA!J%o@x z86f0`(g5BUbj=APtx@_<25^g1IEjp-nR$`!L1uYYnI&w7$*SC!XB8%4vw@57`{WmR zI5DVdz+8dC=u4;P&i+P!v;S_J8=wfw>mA5axquVVR}LpQ&ayAZXTx-S#2u`dEAInf z)&b2?cCS`5wJZQXL(X}5Wqbc=I4VHTR&_6Pb!jTrXzq<~x$eU# z>WqC8H-7Bu*u^@*^-@HME?&Q)9rt<(#(kn{u>eX?y+-}ecGd-OtgvgO%iXsliZs`1 mLNWU+7WeHKWNwtsezzG$>%mD&V0VE?)U?`P}ulwxnjh%IZ2qxYGVhex+-*FK_ z@}z)}<8cSv-XcvS49{LYe_?d@sSZ$y` zee&j)Xh5O70c8RuAa{(O-RpO7?#uq} zah&>PQ>3{4v;iqfVb$T>pTD2OcHzv2u!#&CksqX(=T7CJ4E*eRP^x{_Xl zE;YBH)`D=^nO!i{o;2gk2F8Mh*%dJI_R5RY9bN$qE9n+Uws=tsh$p8jZCtP?bjmJj zp($w-%M9m^&R{-|NbuF7-z;|SgbS)(FlD04rGQ}BO7W(8_{$~!G-ZvmSv??b{x-$4 z>&a(zm|W?)K(gF5aFOI{g$U*&8nSC(7P_Y=%hX2q-NPvTGLEygQHpR4zG?W&{Ri|( B^%MXA literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_11.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_11.parquet new file mode 100644 index 0000000000000000000000000000000000000000..29839d2c00d7e07d057e0e3dbc47701d53c3a232 GIT binary patch literal 1238 zcmZuxJ#W)M7(QRF$>AO-LcPO59vH}}10WR%DT5_ciAoSqRWWoy0y$0Eq$G7QZdN3O zs;WD%c0ir7vqX&i1k|xJ;umz{y*vAC$6Zd(`|-Tr_SX9AHI6vt>;cCbP~Rg>gphd! 
zAVio_fVIh23T|}x>O`55XOVWNerBjf{`8{@k=2_JJfdDrAxyz)L+tOHZ*T8BUH52)(uOh-D45p^#W~1x%q$&P5pK<& zlQ@bJz5+rUd5zIQ@@^*@BThhad!s$8P-fZ>lYz3gnenH|Xkrw%4CSU$d`!a0p?15C zI0GJoD1&$!j7CDF(*E-O3AKq~(K$fUch s`y3gry$sFlh(KP}1332V-o?MVz-ZMy4C4<+gXG~PMyLi~G5pc~1NhnP_y7O^ literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_12.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_12.parquet new file mode 100644 index 0000000000000000000000000000000000000000..af0261cef7b4b6838f2aef868e286502c4d8c35d GIT binary patch literal 1258 zcmZWp%W4}j6qV$OqfW=6k4TI`i?(3VMO$o0p@mtbkVVNNn6`v;RmQQCnAjP2I`*RL zKpu3JUufy7pJ1}eDj(1W`W=NVdnL_i#xomn?$tS0_euwK-@NNE#3=KgF`@w#zVi?v zWL6FcIi5G@XiUqgwgw$0k%(Yg$>sDsAWFW6gKen$i@kHgG_8 z_)loUrx{8c$U=}jVY1M3qubf|@VdSB@)x?$aFU>ymS9PqNow zkD6mRt7KexPI5(dPgZeGKvD9rya7ep-r_~t4sSqaotz`n;$=-R&rG*)<63=+`8O;t zYeC7;rO;Hmc5J2MGwDJz{#9N(LDAJkYzka02}VsT`J39|ua;o=lE>vmT`)I&oBY}B z!bAq0_lQ8}~9VE}kF{1E4bm5o#506am AlK=n! literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_13.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_13.parquet new file mode 100644 index 0000000000000000000000000000000000000000..67eff64b716c463b5b4e8b32f6d0e6ff539fc488 GIT binary patch literal 1226 zcmZWpJ!lj`6rP=I*12^fDw$zU78|+2B7p=HjBH^*K~jXEl2&$;i+AQOdv5lwO@E~U z?IeYzRl2~DTCq)$LeSDGrG+5z&CJg1=C&EWdGCGm{=VIf)fdYYQ9^?SiWMLscW)tt z^vM7rH;@MK-lX%DFmiAI@Tc^1i&QuXjidhZr_XEfvOJ5IC9H@2Sv7*rHsu0Xj%?^bGVY%D*0wX*c!;Y(gZMVeb`Kr7Y_OisfJldN}kJHzfM z;T9=gut0!kob^^FKfXQVt~`rYB&@gc`2F{Tuj7*^oETKJVy-~J)T+a>Rx6&S>F5LX2`^Flin{3BPhgr!v68!9bEURj(k1 zn_=^dAR%!Ff%~VVro5_yBrkXgA zYf7eKj7x@@lVyxCFs2kM&Vf;scXg1q!gHWm-Ihp~csdi(KzDa#;)*?Bxj3DPDhjSZ z%|CW6zq%Ynf6Mm7u?xwKXiKEYN3$U%I(fN8+wRdUv^%A0aRC&+*haZ%+v^2ztguU@ u)17%CMVf0hp_q?YB+k4DWNwtset8U|`~D!#-VNLNVn0KFF?_Y~7yA#qa|=2E literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_14.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_14.parquet new file mode 100644 index 0000000000000000000000000000000000000000..7be1e7918b672ede4500efc7a25ca2c948e07c78 GIT binary patch literal 1201 zcmZWpJ#Q015Zzm!?ak#t$jYwP=n5LIQa}+ZifbYi2}J@01Q8%4bpFCeY%V#@`2msA zq@t##BchBVWTl`%bg5AO0~GuOWYcxT*@n(?HJv4_mz^KHZg5|&rcA=RfXVN{ z^Pft9MMi+#0;NM)MRRj!q%zl{-!7pu$ki=xNeZv(@m z>e`RLo<8KhT#a@K>)QW5J%90S{ArUDi;Q-=1q!CEiW<-3o~MdD& z_7HEUABwkwY>KE4yP%)Ga{-wq!=xC=d@t1Xq)2Bva8pxm$iT-UneD5x4x%COLg+He z50j}1f(>MF^)Sf?lO(q>0YfS`!6|x( zbe(pJ8D`t;i6C*!gta`b9d0S3o1ynI+N^FB%aI_35tNxMII#Z^T6-Hny$0SA6yzulgv8 zHe?gX$IiVD=#fXwh!Q=rd^>l@qb3}3$;;vrD1mu{@_F3}FM;FiTp~U0tQAqDxy}iS l`GCdZtQCW~t-sli7Q<*g*iZ6<`KY)z%h5j!A1VA<{Rhno@xK57 literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_15.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_15.parquet new file mode 100644 index 0000000000000000000000000000000000000000..89be46ded2abcf198d8d1f51c0f7a10d7d2e1ddc GIT binary patch literal 1216 zcmZWpKTq5+6nC6(=Z3VZ5Vm|()a@(=I)GH^gt8!YpbQndfuT}W<+$ULHb)@Mp-%k( z%=8;nO#J{1C)AONsRLd45FI-8*^ZrXS*)Mmd%yS3zvN(dxJ3~qG}xqA0TObLB82qG z03k1s2Jk+o>y0qtejVINKetGQlh8QYng9IQhs^SKJ0R0P&I_P0tM4fqanxPj-!Gc#Q;ZQi7Kc%-#~P`1iG^b;*)`Zw3kPZ~2~=F*nql^2 z85bBBOIj88z^K}Lyhz*OJH*QD+J>8>?EB2Tr;>L}vdM(i0E6;M952L7` 
z*onCEps*2ajWqdkC8R_bu3ylOdbt9lE-6|(0L8CfqJCsM=L2x8sB5Il-F8BXG}r1u lF$XLXx19(iZdA^GcNj(+{%KsC%_rrnS%Lmy_+H`P^&f?_>tz4{ literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_16.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_16.parquet new file mode 100644 index 0000000000000000000000000000000000000000..0a00b2e7c89712b26ffca9efa8e11dff2b40d930 GIT binary patch literal 1253 zcmZWpPiqrF6rb5_*U5GxDxG0d77s!O4=M>LsO%wvLak7W#$0uiCN`3EDC(e$ zv6DDX=@ulmkbh%xlzte-Q$%gB+`;6NafmQEh|-aWw>*_k(#cFYZYoMcIKHOQY%bI8 zAnF0ngDRusB$`Sm%t$ylPoiXa7$w?Az*iqn<0w<$g6$r}$AhW70R{^WGKgbY1u?4` zG=FdsKo>1UnnpXdn8kM49w%XCM|nS>wn{HFCm91a=KU~V(j3X>;!A99GS6VHT4K1X zw6*-)${G5d4CCAaMrnchB``91>knxZUILBiz0Hx5xT*!zQ{An-xMZJUw!nN<3rp)< zGF7Du+f*u!Nfw&3lwa6kVW$f)6S5IdqJ-oVHLb5Up!L#*^Mw;oH=9g8bkq03fxh)t u=SW?54P2zSW2lpQw1n&d@ literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_17.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_17.parquet new file mode 100644 index 0000000000000000000000000000000000000000..ed37e629ee18815f0cc77ec0aeae460731474c54 GIT binary patch literal 1231 zcmZWpL2J}N6rN0{>ugL#v@>KS6fZk?P-sy>A&82gR*=$`UNzg@Z3DZ_8ngWeUW$Lg zo9@X|&rAIqr9Z&4V2>VrGn2ebH<#g?H}9MGzBga8^ zus{dk^rl#K()e5U?4dpn5t(!zRR_(RU(Pn+X1$g-YuHTtc>61)>9tJLu-U+&{Ppdf zryHnH*+8`g${n%h^xWRsc(wa<{n6u{u!e@Tvfc(Mc-&<6hX+|Y9ppvWM(8jx)>L!! zoxcv5UQ0C^Hr1rrpD#b|{(T@x-LvA|e1@0JjYC<78xl zP*?}AA1C?#VUkxd4zc!dlqF>cKJ5K|Htmf}1$ayd@LragTZp8csPj|t2vl?l?_lqO zN-8(yd&T31FV$Wom`!)Y8AeYI)m~g1vqom7YVKhr90Qh`dr)h^sp^E*jA~BEIHAB; zFs(WVM%mugL1KsJKtt=+$hLU46p6s9yE1XZp2$L-Eya$B9ccNd4EgTkApMj99bM}{ zOP?)Ag4^=-JEGv(G6-JqwYmg~U%p5E)Ts6nIB4w}*>V@HND#|KR~Y6K8LNv{3`VcK bE`PTqCHMSEl7E^GiU;GI{G;&c!msQC(=Dkn5@p69dec$K(x+htC_GXD9N@(u6?v^kWq=%hP0*uo-r>c!;?#PZK6#vw^|)_fO7L z0~VPE>=h_?>AhOdt+k~W@0NSL-dkQlMOx^!Ad_2J6qwzA~S@&0tM5c){2L?r}qcjr?@kH=@)rk(l%VW zh~F4~DL-!HBSd|;h1qc18Kj$Rre$CHyOEA3<#4PGw=`uz8orh3c%s7YAQ}NLf-L=F zHytS>6qd%DyJ@j8NDFHtU~6rS^3-$?!adx~ce0VH08b4DlI6L|f*9_^ont{_n4*hF z$MsGz!)%qk79?rNQLM+**WtC{6eD9>Vm+z!SRu7jG4)&}w+2^g>Orj;frIL1XNAoc1jH1OQP(t$()g!w(FM;DkT_IiWJcudMT&D-c le997W9wZ=ft8(@yU>H3JCu#9@r(Z6P3v`9yw+jDV{{f;A?xFwy literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_19.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_19.parquet new file mode 100644 index 0000000000000000000000000000000000000000..ce72a9c7f16b38244e81f4b96c5b2a10aaa24448 GIT binary patch literal 1186 zcmZWpy-wUf5MJ;3%*HVYq1)A-Fpr!09^eHDKh!;C-c_E>dt3+k^JaR}cGevs}&1GHk^GQ{X3vRv&9li-Y z%gd}hIm=3$BanM>IxRAzAcTE5D(3lARe*Fv8k*_X59@kccA+=x+&a?bZiAR2&2=77%pI1B+aLv7 dTW7PofMK*5o@eF7d|bVqmFO>qUncxJ{ReaN-6;S7 literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_2.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_2.parquet new file mode 100644 index 0000000000000000000000000000000000000000..3b967e54b8659e2c2040fb5c99c8614ec58d471a GIT binary patch literal 1268 zcmZWpF>KR76uont#0h)P~6fj|*L6`-!jY23yr!NGOP%8;oO z0~14bB&G_ss3UX7&J3`zFtYIP&Ubd><@D~~d;k9b_w2ph7h4ojLfs8&NkDwhwh%&U zWq^?FiU6FpX{SUZe|LU;s|%_@l1_YmP|bWV?n7p==9vYo(mwoo05q}YGy$sxp!v_y z4_Salc>#J0lAH8aspi(+)}vQDo7?wadIglCwape}GRL4?ryC_@`_1TRIE%CQQ5w&Z 
zD9yY|MCR!{YeSw@a<#?V-}{d}TdaAD0#;j`{Qmdl(~~>99?>XoQEx$F(MG9Ij6IAc z2_sCc?((UfCP_v+KGWr6=K72Ei>NciXTUocPd150S2^om$LdefFG_pkh|O?^9dROU+5efF1Jvpm1w}= zwV>qe5>)y#+vv-WBjuklTVQ@>2UTZRBP!iR-KRvAQ@pDgz(pMfu;O#RX!z90-=}zb z)BU0W-D__bNL@E?`V=XxafwtuWC5Qy12De!KKs~Y7Fu_vVftYG^?=Y)fdcNc^a5Bs|D=- z_;a)N@Aj6H%rXUJE>IRIE7V-uTzUI+gX-Z&WV?d^AZ zyyPkFeCPwW2&%Sn(d#_tt~^aE(yX>}|NG^Glb4THIms;3O6CFu(|Vye2XRMF5!OI) zWAegH;y9&?;It6G(?3YxZ^r{fUGThif5!&oHR(iYSKhaLjVEb;r~`AFa!&?6rqS?F z!4?qpf#<_3-DDgMR1gx&;N5YQZ0|-%79$Xw?G56{RN#T{b>dNbpfbS1Awb%3tX@G3 zH-g4DK?3Nag-FBJPBFu5oxKtytcX%<1=Q8>LUW4IW<9YL=BmssHB&KhY$iJfn`+`f zt|^&{Bb+m`Ia$UL2F8?P#W^rcdF4Uc3eSONb(>qd#M7C8`ntQ)#1;DudnZn3LZjlM zn8jDu@`_7Qv`-HE>;NJxE4l3H^vAV;5}iKZqiy)O1`SW?T3iFgGyBL_ZM(e&j+J(9 v>2zoHfFjMcs!+^lEEH$;5QLs-o&Bmaj2?T3QSxEbO_zrW`itTFh5xbtfmj2l literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_21.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_21.parquet new file mode 100644 index 0000000000000000000000000000000000000000..c1ac138b5897a8bd3dfb66064102477576990c7d GIT binary patch literal 1231 zcmZWpPiqu06wjp7HI8E~(u6GJ;AMk{7Pg>NVGs(6rH4{f_M)h>-Cet~+o|Jp4t|7Q zir+vl^`eNk3QG@u1`nkdZ+?Ovh4m$wWTrcp@XLGe_x|NKv%UUgl_E-LxJ2s zyD#sKKToGqPAn=Kuvef6daKfNu)W%T`0ReW-G0g|Sko5n1Tah)0Ax-@g-o{cyf@hB zJmwB*Vz^9zXr1+GTHn4r;=VeIX%wta^Wx~&%7dS;*EzAMn8sd#f@!1DkmqsF(?K5c zC~i(R{4_~2x&(nO#IFqZvR6CF2vHwyL3jAZ8I+s!;;gUyt<0p1)8q?;r< z3u3q_nqMUmFhvWIrt6(zhS?_Dk|e6>Qg#IOO?YKE&FHd$>_nw7OQhE&C!VY1*5FD_ zJg7CJRC$O?hTW569AaS1h*q8fqiFB)B3*}PKy%V9k!kUCPSDUycWL9AowE<}bS|oD zxQb1N6W?X1K95fDmhH(CAIe+VmdKEg=LIFEeEE)U;Ny81ct+Rq0w_W89_3TJX)l1| yq+KFY?z|x=GF+z$&Ai7VdESUX=vM3OZ-ZfUJJ^rYcL)7!Wt^ga7(QJ1mHh`0^#KY1 literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_3.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_3.parquet new file mode 100644 index 0000000000000000000000000000000000000000..fbed0cf123c4098740b12a0b239a1831e4daba39 GIT binary patch literal 1278 zcmZuxzi-n(6h2?B$ze>@M!l0Gd0=Qxh+j&jB0%y|r~p+6MHF=eInB=|!L93NKtf`H zp;N`&Ap=u|m^(yW`4>9$AK<^>-JN}Q<1VM~-o5X;_uhSWbL0IQLyR)#A;St#&m{yQ zq*o3I5l05#bcwCx(&)eH>eH%BHAtmXFFhzKznyM@vRo5o35#jdmp`#fOSS+Cb z=j)Tds(=(F1*BV`co#a*2|&Kj9_Pr*y}jY6f7tH! 
zhI{?t#`fkbw-{1Vtho$Qh4f;hryqa3c8Oe*MkOpZ`hB=+`47ImaA}H?M$;`&Fe~S3 zwUBG+8C&plxHfx3qA-lv3T*1gtqeZLpLW9$Vgz<}YtSz&XN9WUcFxG)Z zO}QfjhjB38S7}wm9N;;i(upR)NCn|y8N59SqTPNFr7;SzTisC@Bo)}Ox7*=SYou-f zJ3xT8!ceIoj%!}+x1b($QAf09YG;^ZzQx}O>K804c05LOdagOeXz{Mt@iWh6NX=Bt zEOV0?gSl#E!CiAE7RNYaq;oQiV+@QrKZ{FXB;{Qnq^a-{Xf(BPhIEM+C676}yK56y z>dvgbDwI4i>hQh#SiUw2aR7}8wh6vceTeQ{Fp!ThQJ`Duk?bkE)oqJyJO{Ae7Z Lr4oGS@aOvv8d&#U literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_4.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_4.parquet new file mode 100644 index 0000000000000000000000000000000000000000..33a1989e1f85e11e3afb340411d52d377b0ee0ca GIT binary patch literal 1242 zcmZWpJ!{)Q7`~Gg_be-IDY-*#MJS~{N=QN3641~EJnZ z%i1App_6BmzfkCuUl8cpU(omNq|@12j_3V&-jBP7>+QeWp@F%451|%-rhjy#BBk1i^c+f|9fzw4jm>19Ou;Gnnm@;ov;~8Yk&!>Y0XQ zKiF*p+eoSJ^rjUbc(yo;odm4!6m0#x`g85>d!$6gPNfMHi#CnPF!nH(Nf=>jbruJ9 zmLxgdfW$8H?v2j!Pshm^Q5!6$KRVG45hjBuk43!es(g}GE4(6yq&lq@M5=XrxqZh17^Q4WGJdyQLeK?3CsFrV$!aHlX6q zZOu0yN7BEs8Rm04T$1yWQPf!TDN!?2kEyqwuff(U;^woaPo3gXs`u6_&zi7uDd-BR rc6Y!L2J}N6rRk+-7#iGr86WDFTxHzXxT*zDnVMXAU#M~OMCDj*><}IvSv-Q^(1)k zAjO;a;#vF&OHYE(lN1j6u=V80OlA0^Oi6_kKAR|3qs68FWo|8d13 zxs@rAy8ywoSvrG(97A1#0e6et*)ubV% z_B*!9CuulVf$Iv=7J>aV7*Ax{Ektd=*}yVP4uVJqp|A*EI|!1U*FloU2*j@LMRAaI zV1aM##?wJ0E5N)!fDGbTvLJ?C&;8Dc2dC&D;_BWhW|*zB=bZSpndVy_HC1})a1t?K zdwk0;hPtq%rhI1TN?Hb8YGy#Kxhdy|xIpAOX~ZE0#N5F1GeBhRRfW_Yo&k*Hb6!}g z#p8xYZFRbe;u8JIV4n?s-0nM3^24)g(6WlI0-m0R3ITiuJ@zY=9)I7ofw!~ zMy8t>82Jm((Ut8chKj$SDsFg#mqs-m}on(ZOMVcz7^MhJ9{3ig|mx z2}}!8pQAHxzT~bvi#eoOpX0}~0AGCmxx-0m6?2p>P%!nZ&K$%Yy+Q|G2DfHU-7HOW zx&clb@f+i#{M}xfAnJnW_r@a=kk_Oi=L31)4m6(R78PRS%@V90OxXuHp(9MSF{jG#y?6jg%^_EZySeN=O4e-NMEd`!#zjE?1&L!nTsd)HOW& zQ55YX`{V#1T9EOAqiataAtkzY^@wKLX#=KR61BJnieEfN_0DG0Yv34NSC%d}_d<#^ p*GNJ!w^=0SUIZ#Hh0gwX45R!0ah#o;4)TYS4E=-OPZ$1R{{ac7>WlyY literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_7.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_7.parquet new file mode 100644 index 0000000000000000000000000000000000000000..55ac2cf9c5993508de718f8da74366fad07b37ad GIT binary patch literal 1273 zcmZuxPixdb6rW_Xn=y^0*v^=gTzBX}g%*T81W_oZ7L+1b6c3Vgce|-sQ)9L_zrcO~ z??UN8dhjf34_^ELLg~lwBp!V;lbNKOy$rv3@BQAt`6c^LUThLXFmdk@Qve)z7Dfok zlLA7P%MIXNA#1fT{Pe5M-N@5OgyTRtnwh_U*@MjdEHiW14Ey_rwLOcUWf+Ie21Y;s zZ2uAsC{#9}u0Ww=z1Fk6zu9}(+uHUN5QL=-K1k^T&I7Vl<&RJPj%V?+oRY)ll)ss@Vmf38VI|lKiJh{Up)2V% z=u$HSYRxH_P0fm-_M{n86Bu)1X4k+d+q=0)-QhLRu+ngaWQ&)rfVgtH8ygqwJ{_^k zR;cBzf#jZBnp=MuLA@~OH-nv9;X-XMG|Hq`?SNp}Q1!BU1y^lY!JN|BStlS)`9js> z>-o<*FuziFg=D#@9S|hBS|@_pqamBxAxK}TpMJKfi8h>*DE}}S6!*tDT5iFI4nMvB DN(lc2 literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_8.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_8.parquet new file mode 100644 index 0000000000000000000000000000000000000000..1ba9706303652ae686a0c343df841bc40f55f56b GIT binary patch literal 1263 zcmZ`(L2uJA6t>f}H6cuBnyp02wHAkQV3iPuNLvL6G$B+mX@~=YbWOKvDcRC=2Yy14 zIPVB|egKDR;x?(jpuc9~*-q@N*)G=e^Y^{?J^M@gy${bQqJ+BJ)RF-SJiCn$QY(W9 z*{(>y>0`Q8fsr5Ed;hS2>PW&#P#)APuYT`CWw92OMOcUZI`1?+TdW04gmnQYqvh_k z6i`NmfN~3z4Z2y;-01Io^28#6ZLa}r*a2>bKCj^DIE}Meews{&qu#-*<2ZZs@pb0i zotCxyuM}$CNvqG-?H;`JY_S&eiLgFjw>$m*d3oLQNEsFLm0O@#v{liKv4^n)VT7sW 
zFW%Z&n&xy1fF0zu$KUcVM`?nn4LfHrKGq3wOomY&i}Q}F@FX8kRANI>9*M*>k0xgl zwuz_X~x_F#+r@!3K)gF+ZU-7u7E~L_nJsaylMp0Rny(Namjwi`uwU9N^={B z^PjfnR3AiA{$jSkfC%sL^*w1NX4woVQDV%i>hYIN7=O*>{IV5Lr+6Ln@Ot*k7R+8c u+eE6}4RDd-YL7_f3l{PlU>fJMd7D}2iE|cZ=d(C}I?0d&-+3Q?b^ihTJ^gk7 literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_9.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition/0_0_9.parquet new file mode 100644 index 0000000000000000000000000000000000000000..a1a27fa09446ee168daa33295086e88f6363b8b7 GIT binary patch literal 1268 zcmZWpzi-n(6uxuZAUyd_kP`H@7;a6LlGs^-KLfV#P{qvLP(tq z5VBoi0B?gf3t{Bf>sM{z_Y9KZ#Mh2;=HK6XkXf8rW&z7#FFwDlc(ypRFagUAjM|&M zpRxgsvIg`OC^u-k&~trn=ic+%-EQ}pmqSU~=!zkY8<<;kyTEMhKG~1P^LTI+?e0hK z-jBSJMdoQfYCxXV;PQxnChPY-Tb$X50+vVo)co;a^WWb)9?>WpQD1>#(Q2Vlj6IBH z5Js3r^IiQ|+uA+UqI+W15I<}jWhYQxRxkKY=DaFaop4u!wts(6x)r^;|mQ8t9( zC=I7G8Fn2}7kDmY879Z!L>fUY!nl4MCWoUi(KZ6MdNhf{tOEz`&LE!mC$a(v0Su%c z$1)3IR?BaF=fsC8I*7DP@6=)z>#@h21SPTZ1E1O|yfB<(^jX9Yf?TaRlI!A2Y$};G zm{Lm&wN~WIPpzDx_hcBS7BE(n%rAkFwRd%qro&615iQ^xsTR*GK6TY}S2ix$GnVr6 zN>EUC0V@8PZTQ8ljA6h1p{b4gtkp`7a=E6NfpNNwdPX%wUkAR(X?f~vF?pi0P&9cpmv;5c1a zD_Mcz3qXZu}GIa{Bz<``*3x?t9*Ru+=7rU?MLQkpZl! zQVAgxoeCf%$#FdTVy%ItgJuZ8M#y6 zo{ZLaM(^J5t2qnHQ+-;2JhQ?17JqvSk5q}Caf=k@Tl`r4{&?x{pZh9KP~2kj2^5jc zX9^VrRS;N^fCM7e&Ywtu?}wxcflZ_q-NW$B(Dx9L;Hzx8Z@GcGv2BJ9^*7}x9*6Eh zWVjVk7O7zpng?SRb{i2Hcrs*hf+N#oMocHa5)D9 zn9;d@I?;HPxHtgeC)!s&7Znk;%40M5 zFm=L}g8<0|*l+s3yV(i0e|+XQJCW?vf3Xu==}t6`oM7VmhmH1Xf3>}02d?L>1V*D} vOitaVIlXm$IYuGt+rcK?;0_xZq@e4Zf-kp7Rn*IXR<@A0%@ArG#>pLxi2qyXiqH+RTrnZC- z60ZRi(sU*O?GjlonBn2`PfU$WNpW**Wr}&v@4jz>XZEI^898tE?Z;Ei)YzMvG4hH8 z8ox@vMFJ_6C6KZJG05FQ&Xw(!({&y`HZy_)7FR3aB~<~YPVN;jdKd?LueUazO~-G1 zbB5#SC07S@(P+M)uOB`?F*Wvbh^8}gz99Y2i!U$lJX$w#Mx_O1EI?FJE+nTQQxUyX zU_@%|bVEzxctVzeSwUtc98X^L;|LKAxT+WKNdZ%1pH2d%U(tmW>76-$j?-k>tVjpYn>^9GG!p+HoMX*RApQSANWC4o4wA=OtK&t6X@^Q%LRQG2gD(~-#KDei;e(+yZD|Rm7za{%D6M44iKrYUvg|`Y z!G)f4Z14Sm9!%&h;4kpkgwC#{m18GAP;yyjXWyHhnSHapZofeg!KAcDL}p-3m5KkamFSwlsJWTI z>2UGC|hKECFfl4;G;?x1}v&MV8ae@2gF>#5ifAVN%YkpJHzhLs}nbT z^Wk-<-tA_w%3lb?x|6n9pJ5!mR3*A5^-*)PKEoJ)`?R?3syMZh`qC{>L{eDETM$%1 zU?2erL@dwWN?{O0WCws%q!zui=<^`(5s_dg+ujLx&|~aakxS33r5KMRZxlPWV#*Wh z7)RFVoWZscDFLqpQQUB9`OFDYq0YxsD;%6!Vd}%+do=U|E6G5Hz3c=N+h;d`3c!Kw zAYf6DB9^uCZyjq;MHS&P&z*>hs2nQqb==uds(!2yDTc2U#~8LU)Q>xvR5OdUOP>q; zCf){q)m(tP*0ih7#Eg;li8E#*FxF(OFM*Myce{~1!zIvgs@}|sQ@qM)q!d?otK*FQ zPU-1axegO|rf~g-#5t=E!YF?PWi9}sbC0g?amUFl@*2T$GTE#0@)vn1e@*53WkDly z@;ceW^V%;9P^zoiQOho!cv6lSb6VcBmqB3>Eu@{_G8vE`2M$HL5->-%B%C0@}e5YdCX1!%K kU8~-*P19{yraf#o4WngP_P}noji%KcxK8^S>L-)+4>b(J9RL6T literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_12.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_12.parquet new file mode 100644 index 0000000000000000000000000000000000000000..65f494c81a48c43b74c8100b4d9b83b0e928dbf3 GIT binary patch literal 2114 zcmeHH&2G~`5MDcOx|pN_l(ihmhX@g?NThO<9|b8$6gZShKvif2CuAoMvE2RjGP z2L~r^s4t4l#qGy=aIs*vTAcdh*Hh>&e52whTrKX;w9xr`Z+~0IF^YPyiVaT5 zQ^oPf9v(5XJBZ|fmjf?uII(@EguqmJb7F_xfgQ#=47wYAKd_?=RFKtPFm`+v0agGM z*a-sW1!;28D1JAw0acU{E{fcVtjXH0_R_@dl{Gax22r?qsyN1Qw7%JCCwrP8)>CsX ziAbUiBGg=hSWAm*p2`U$?vr4g%D`CKVe=9gQF@Dw6d7Iu4acjSAfDoR#vnOf-9pD1 zd!kLv^GrLv(rL^+oGHRXdOXa*7fJgnnP*CSX{(n+?&xC8Aeg6;%v2PAu|`L;xZ~z5 zYY;V>O)_>-`78^S$9tV19_|Xbh;zk;FlI|@n^(Y0_Fnt6uE}UqJ+i}(V|P>;h6rci 
zrs-eq`X1QriJo8I!+&!R`^i0&CT@7_1&5{TR%@$T>4l!}S3;|_>5M(US86&53T>UbpQYW literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_13.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_13.parquet new file mode 100644 index 0000000000000000000000000000000000000000..55b39fd5a5ef6a9fa41a2b8a2f9f459fa1d2fd02 GIT binary patch literal 2128 zcmeHHF>ljA6h6C7a|terP|k9Y1sRCdN^MCLCrTwRl|Y~fp{i6}k)1e=ky8iPDJw&# zP7F*8*^!tk*rJZi9Xm6?#=^+LyR&_Ed4-9AL7YaE%vyxaTVU2vmo>_%Ze z0d)3b{GBC`pg4hK3zYlhK_=(YUhB!hPSfbTQd7uCYt0fw;>W;TCz}~&<<0nLICH~y zPT#AFK;k*sDC z6$Difn3I46BGwkqrNH+?QU%8bQj6|!_a1vURDRaArhzvX#qBy~+3Q|@Rj5AEc8a3w&WajA5tplkF?))spA}6IexiNlLs1dYdUp|G3`T?rpcWhJovOTS2qFZqHnA sSnt~1X2&udtJ$%;U8iMr?NQHYw>xdi9@s{&ZCJ*@8TK~NK|EXk0DvORw*UYD literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_14.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_14.parquet new file mode 100644 index 0000000000000000000000000000000000000000..bbd122732c47b53d0035fc131a5e799b99f59bbe GIT binary patch literal 2068 zcmeHHJ#W)M7``}ea=4~KRqx~=4_RWR5~&jMArD!KK&m>RszWCvWXDe9;?%`)GVlvf z8JO5Rb>b&P9T@lp9Xj$0vV$0S@6JBE%|Ae0tmoc)pZDvY*GK!u9fAlZ`U|3PfK5X! zA%w)20z#_J3~>95>=fK^*`NGn4u(YB9h=$3dgRsQ5F)d08kxa$H#gj*`}*TICHn@>P})j%1d2i`g|bS>D2egF>ky^3 zJW%5(O2`fz8px;y=gIp~6e6O+Nppi!X<*mba}uB3H+12TlVBG;>?D@_GBDJPFjHmXU&V|ZpbshOM(HrQN*Jls?^Nd zcMF@)MFZiQteq%Sp@;OXg@DoZ-^3bu!FVnIk?^Ygv++=zc_=k3 zf#6=Nl3RQjPF>JX*0nk;>bQtSfU6CYV39n3McMAv2DJM?)z%6qTKXFKL(5)Qz>&hv zkw|x2F$oe}sR_=s>Cn2Z3_<3Z(&?`XRZvsAaN>`1KWR;4^hbfsWq+~94dB}*Xa2YW z>{s%iZoqNA0riC+&w}W@-r2MFIxQ~_!mt&$>rHnagkF8%4%&UE=R57bI~e#KXW)+Q aUboxtI_}8r+1;Mg8~L8Si(Zw*CjA31qoGp( literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_15.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_15.parquet new file mode 100644 index 0000000000000000000000000000000000000000..469e2239b160500eaec5dba7dc387f7476870aea GIT binary patch literal 2054 zcmeHHy=vP~6u$b2ee0;Dlw2VfQM_2V5C^;dK@?L6bVxIl;2}ey6v>utrLvVMvgy31o_nu!+gNsHE@cOA<`>QWKvjZNRkXmVnGs;h+JOo zsZkilWD5eT$SC@=_|qT^5K-Y)a{aL|P&Xbru}A$?jmP8IpK`+!PI*iXr?E3VW?|0| z(SWBx7B5;j0W*RgsBv@QM1zSFr8W$4#CF@~#;tixWW#|*J`SxZS&5^E5pmJ-xj z6R342XN+`Df^jATV@<2pEijVy?k`ewcndU~N;E^f#Vf@m8lUdo#u+3l$2dg3S;opKq81^TOE+ic!xuVAM55TX#SyB`=XZvY7J@I6~AJ z;^nRjCPADl^uU<5-m|U?J&<^+a{X69mr<>B>_jJXFRo7`^hbv8mHxXP)X%;&vGWJ@ z)8EK{sQ!MY{>s9OPW^CJY3|rN&H6C%gPL2wp(qdJ@AJ1OY}zJU(r7`AgEmc literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_16.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_16.parquet new file mode 100644 index 0000000000000000000000000000000000000000..46a7b59f99c7d381fd831bd2a48f45d638050ea0 GIT binary patch literal 2114 zcmeHH&1%~~5MIfOvsLBLpRABU2e%NBLTzL$8y9>?A%~JfFi8pNsYsTbh{~3$$UgL3 zXd8OUE41{~M=&|%lm}=7eU3toon1*Q*?xrPvV1c;-|Wonx7&Yt&?JapqC6)e16bFj z#|WWhRRJMMp$^!6KsF0u_*aSULMTTp9P8P}YUVHB_aHM})65jEh8=%;Q_>{5rePGW z7H}#at3RL!4QD7VATxpTlo*AYEB)r~>ldcgZZu8vP+K$^`Qc6lT3QH$EIO$bvdHgF zrhXiS-fXNDFyn7nGVt?&)#m=}KiSnJx~9!hxZ0fj``cvd{@u}VhSKIT6DT4n7s?U@ zO%Pa+fCM5|=dYwF3}dnhfm=vh_s8+uK^Pz+fh9Zs33t$K?AozM_qUWJ9>@MDacm@% z3U!>t_GrSw9wDLtPk}66G_wQd1f5Xl{h1vNPVFf3Ves7>2BDo+AcHNt;nWG(2_OM* 
zU?&V&7Nm+*z53I@I&`sxaFy3iL{(Jx)O`c@R%B`%=|oDx3(YZxqYjOuUar#|v2hu5 zfuF?3z>k^>aMqG=jdL+)WOL$-a}gLz8a8f#k(ReONM7L$(72Lwl$3Z`(utCEw=i+W z{w%0p1>>^RTT=9rnPj?_c&6etu|zZVgK;hOimEQ+B*E31PH<96ey6m&M(T)d8X?eCCA+Y7bRS-5XS7PJ>wRbx$f1E>#B&h<%u1=pL%h96d?@% zL!bU~ckh9{JwEfh_i*^XyoZDQJ=A7ibmoWSTC>q@H0y5U2SGhDYunD$58PVE>6mTX l@@%v1bUI$s?l{A))oQg{wli?7Zp*T*f#-Igp?>#_NL5uTfvQSGAP#7qIE}%ri|gdZ z3*-T~BUL@19yp`Y0~a0usp?~JLL8ahwbyBqH=vi*x3lxj&dh$Z`%e#Q1QARM8${&* zn}&uFLSoecA+5j+aJo!ZC*1I}FFAIM42ifqHj{(d$lpHiL1ebZkr|wI`}4+HFf_Kt zZVb*QF!=F%`zKEzL2&}f7AQJdo5(r0U$dL`*0zxVPFP%Lg~Ug|sgkV;&Wiv3-C^Gw zhEeeH_0Dde4u+mF&&Xpve6<3x=1^v<{Qj}HV`yxRt7LGt%3sf&<;QD(pBOknah1sy zC@RTLWUL@V5e-)0-PGdct`-JCL{?!_Mn=v%j^1_y9}x|9dB;1H26l{lG;-N_xgh*; zejx!v+Z*|(K zU{l0(_>nv-MeEdS`5{6|x9kWNKuon|ogEqQu&whkD@pSzImz~56 literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_18.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_18.parquet new file mode 100644 index 0000000000000000000000000000000000000000..8df4d0555b6349250c8925114c0451d1cdb19b39 GIT binary patch literal 2223 zcmeHHPiqrF6rbI!$ruw+=`0&qAqZI_G$A2>h!Ug)3(|v>w6q5gvYTw;!loNHn|KmD zc#z`Fd+{uOgwT^9^d!YYPvQsYhw#nJ?ri?peuC^_c>jLyz4^_1ySLq`5=1b`Y!aCX zSl5(W2qAu|AR#3~4dA^;Hs-?ckH4>E>eqzC!m%DXRx`U_cOf%9!^~7%4MXKGvZl~8 z45Q*|10TM3JgL+h{&19o?AOBo!vvab0~2sYOn|t`M2&vq zk6Rj!q>zb71&Bzmh-3hlfh@K^HC<+e!qj+YYWm$brXSfb*lzb- z&kQ?AgP!hr6U${4U|zt0Eze_FkSgc&+;;=(aEk)MIng_jRaxCtUmCc*GSkMPP81$K zcR7Zz)V^`pj)yvy7^lWe5|soEqSQ=+T1!)IoX9aE;z=M*WI!wpym1MLu)RehMTeIF z!_l0_5^wP=p_2^1-2&qb{Yg?kOU7BEy)^zyD$jJTh)na(#1j3I)ZdbEuC#BM`Wsxl zz-dw^n710gxgh5>Ny%B<0pkiF>G191u@KB$0Y5=Kbo!CKX=%Zo+ehu`Ox!xDHn}YN8zU;5}h3`uz-k1E;_Q#HQoUfK! zM}hShA^RkLaJTlHG4Q8TTsRd3bmX1#0oT945v G+`E5GgVbXH literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_19.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_19.parquet new file mode 100644 index 0000000000000000000000000000000000000000..dbc1d9f78e050c84fc28efa183c740a8f209160b GIT binary patch literal 2072 zcmeHHKX21O6hAvoxCT>IRnKxHBRmjOBoZYgjWT$t1OinQRUImjkdPfat&0;EH^~QJ zVCc@m)DHmPpz6+wI&?vTfekvcfGF?o?6Z@61JuQN@7?cr@6Y{Sw$`_s1QATsdqidi zHgshXA;iBLSV&Q608W?4N)CqW5B0QXKfY|h&GehxOwDHqyQR>6`@PNCw7?Vy$yvB;`#HQdD>jG@;wf;w7FI45ISxrY6qV z&$JiTX~9lZoENk8UJ<@2{x<`i;UU+!k>5 zd(ivI1jHy35Fdf^m}n`@#pC+R-ZR~3*6O-`s;nOw`F^(u8P~!f%R}-gl_mFP_Z&&u~YyXuZ(Qi~4h1-=u5_Hh}(^YVcqRQeUP(+eVX%YlQ5Ll3a z1R|D}2a@l30ojAV3R3Pkli=OJa}kl?l((D@+(6xUXa+X*R|;V~4xDjlxEoT6)Nme{ z<0%V!fJgy&1;}FibJJx;cq3}uotysP%=BX$23v0AdS;{`59j>Qn^`Wq0Uj0%*z!D< z1<7JbEq&Lp3RP4PE^+NdltuYaKGJY!Yeu!cN~AD6RUBhj@<{7ni z1Z|b+Y>e~9G0b!+@l5GwVu_}5pj}Fx)J&&fBxFY=I24k+C@+1r1Ep_FxVFfuBpgB&#Sw#E!sp)^3*+F&eqihBinf~Bz zw}EXQ@A>UEoc8nu5T<2WVO1q40#lFy4IXOM1!}{Ne`>T^G88C+qA)T9XonV(B_{&pC{bjO zeSi#ofKDAUcE|%5UiuOl@&ui-bnlUrC?$G=0$Hfz-F>|8j-RCI{$ZaWf{FHuNX)>d zA#Wpu_^N}2WQ{r?`i#^IY548*b;44G#nQ2v16)HQ&2rk$^H$A}>6)fd zvrORR$D3BS0$g!sr3A7qKz7J(fpc}b_sO6OOWZQ5&=~9>uTG3Yp0W&**pH0#$+Few z3S{#@new%9^v;m!no_1_ney+y?OX5ji+uxUR!TYB0z@K8fl37#6+SKKlt_*BdpVAx zgw$cvLdI5bntYu_AtEyDR5v&l4jRUullU~=(s({jf(3W%aL6`wTqMrojHT@&q5)2W zQ~Y@4gv<$rsq^W|iDxHHocS>Lp3K9@Ng1fHSG{QIhO7cyEI6;lPhRW}fS4D{Z%wp!2Qa4Jf{Lw8%_M3%{rqA@Z<`wQmMJ&4c)2UZ zB#2{$1{iXp+t!t0gRC=U>(`1dp|*PF#OF&t=`3ROM}jw#ew`i_&)zOE^GC&JpCC4p z|6B2gx#F8EKfVZ}(`LUr>GnHb9E4#f?ls%)G6=oq*d6yq&d_&yBX>OZ`_9;%PlkiR ZXyCXrcQ_dgo#D*)CNI$*{M6{1^ABdZls5nX literal 0 HcmV?d00001 diff --git 
a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_21.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_21.parquet new file mode 100644 index 0000000000000000000000000000000000000000..277d681a6bd499d97e9b6dfe185c8d5b633bce81 GIT binary patch literal 2033 zcmeHHzi-n(6h1pna!n0FQSam+4_RUbiB<{up$u6ngoG-AN(B`lBxENJF?H(JP4Wj+ zV(G-p%tpmnRVeDfg4jBt{tqzlC-CmhKD+UMfVy11d+&YkeRuac+nYO0f(R!1eWG%J zO+zaqgv3e#Ax&oi$lf7qsWSZY_al}@hD5v^n{guB`P=6$=*-rrGlSW(^TXFULt|@H z#$Z;!_}Ae14=x}^k$`v$lro_y&H3%-s|QaE@xzN(s(I+Z!%$7h-BfjP=FPml_j^VR zIAgV51V)s~YX9)%^r4}#HPX&tR{Ni4=P$mFK5iN~Mv?Y-3lxumaple9#6fA2;3HwTP$!iwI>I> ztc-{bJRQ2Y;oSCl5NyDLt8+UXjO{RvVTi4a{J@SBlwe;P1~bR!6X1m*fSn-VT@Y0( zX64kvCY+*%a7Ah-DplzgeQM!;#)a0dNibY$)J6B{OoUzfE|0SxM$m6_8uo#)Vojl}m9uB$WW$O|srRWWk=anbC1dg`(0#cht;mrZFKCmYd1Phx9c|Tt~2Vj aTdhvZb_Pzn*J|7CfjjKoL%p2%e*OWE*09+C literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_3.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_3.parquet new file mode 100644 index 0000000000000000000000000000000000000000..9d1f5f32c3225dc752156f77881fbe7ea927e29c GIT binary patch literal 2068 zcmeHHzi-n(6uvlaxC9pg>RAr5P$U$qm6|G1oG73SfIvbWs;Z!_$W9z$aN^)NS=lpE z2{zP$k*OO})EUIup#u_(OdT1JP@fL@-e{iNpZb zHTecYh<{Z;NLHu;yw}NEDh%HlAN-(xO&}JI_1Lj|^M{Yya5MczH&eJA_PcT*X)^sr zVH7SG(9gYl|LgDhxrSpD6%ZeRa*Nzf)tuk0J$iPxYBru~31pM4qFNA@*an}kR>Mtui9*6FL8?JH6 zDm9ElYcOJAHxW^Qr@$>vFtt2pgv8W%b!r8jz7@na47PmF^R1|Y0-Ot7e`0$q13Wkw zuW+ciOQJOPbt3cdRCA1Bt36}Cov1QFY^KIM zBQl9Gh*a|oZTb@>sM9;^e~>onW3kc}3Cg;R>|7plag`D23=X zl7|+(o&iS)J3&0%Nlqt-bA=`tv!%9;lUy5Q9xGk_nNuaSUKm-y>xmOq1_Amb!Pe5h z*yRSWZ4)!U+<f_34P4Kw1l7{IJ#oEmscAQ>4a;<_ kYQt_eoto9Odo8nGZ`3WjW1FqIX_+0T+uA@6vwV~O0Rj@$BLDyZ literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_4.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_4.parquet new file mode 100644 index 0000000000000000000000000000000000000000..e3e265a42be3960fbe08741eb6d146340026cfcf GIT binary patch literal 2075 zcmeHH&2G~`5MDcOSd1wMbuCBoA(vQ1YAQ$hAxIHYr6Q^xs;Zz|K!BY%G{}jI>*T;A z^iuH#a48oM;#Q=n2cCgL)eASCphu)IyKArA_zh4mtIsp@&Cl*P-rd-%5kxT2)`-ji zHVx$tLWrL_Af#y20N%S~B@>1}JoxpU`VE0tI5rbUKJ)Y6+mM-_ab^nhVIPmb$%aDD zIE=!41N$$Vqfhhs+`tKn8%REZVv=&EXLYw`Z$5iy+xAlwpm>_(6DTq%W*U+t zLy}mKgd`%DW?M=a1QA(*zzQ;M49C%{Uf?65z?bR{-v|SBb`KE^cp7B!!l~mkBM5^UZ%v)Bx9^0B4TJ6Gzz>|b0~Nm2elT%;Rsr4| z4A>0&i-s)rPRDtq|Y*WO^Crc*5ynOnOV&Ib{7+r8`ibdg0+P7?*47?e$u<9}a!L8rtPmcQW+*<(AvBn@+=X k?55jlc{Qiy4%&@+y;*nMp4({G8&0F=_1pK+W`Xb0KPt}01poj5 literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_5.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_5.parquet new file mode 100644 index 0000000000000000000000000000000000000000..e2fd4118120d79d9c2138a06a511d5d88cf41c40 GIT binary patch literal 2075 zcmeHH!EVz)5ZyR#*aR0L=vof4P%g2GM5;t_q98>GP$i&9s4A#eSjSFl$%%vOR48K;NcA4Lhh=*e{brf%YvD|~3**CeF!D86Q`(Gf8eUmT-iv=9( zKfk^yu>w+*6p$W)@_;av{fRK#5tL14IEm?G#>4I*q5)5XTfAsN z18#)G%y@f2qtP*qQX2+absU5=sh|eunj6lYfM;T_4+4GwnNO$PJ+<}8^2O0}6 zI`PA4qq8&E>9pL)4}wp<~ literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_6.parquet 
b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_6.parquet new file mode 100644 index 0000000000000000000000000000000000000000..a9df63337e09175ef6331797b189d84373da8eaf GIT binary patch literal 2091 zcmeHH&2G~`5MDcO*d#7RsB1Z3RS}|AK~3c-KQdAdRN_!7fuaJvB0F}9ky8i92^S7I zSKPRA=oN`m&QV_hiBq4T5?7vp*|oiPk~cuTtiJjAW@lz+ws&~aAc$Zh?-P*$tf^7~ zArzkqASB6D0oHqDJLiVy@AiLDyUG!B$69JwkNolS7$Va%iA>?TTkZP@+~}FOQMjJK z^RJH|{$UBEC`lj{f%1Ut?c8ZX{HGYsR6nn6a|G)fSOn}NAQ9_5&ovw<^?KD)ul zo2Yq#vA4_$uyZc!HN7giN2)~6q$Ud2Ycihx`1bo!K2&jvlA2NxC?YB5G7|(<5SWvI z1R_=!M^YFB5!nXEU8LUe&ZGDJz(+&^scd^^TtUs)u_Bk+cjee0N8T`2+>I%lR56aM z;fT52M??mm3{l*0YWYkF<3W{`sTKA=SYfKepeqghz)CVGgIsokiS4r+;7LJ&?I2)L zkRn#K>Q5bOP{l67Rh~N$6;U}>UhBBKHl6yZMx@w1R~%#5%0NHuX5*P5)-HV^@SAuW z_*Dx5?pm2w{X)zbX`eXbLIlRj=<3(NNYYzwB+u{~XgHnN48zGf_%VeSc0jou<8vpHV%6hAvoxF+U9RnKxH3q?q=iqupI`H+PoK{tp5L>(%Skid?eG_6w?H_61n zTo{=$F);EMpwN{q6GO#c(3O#mcX#&LN&W-W<^0~e_j_OWd+zS-*9an*s5?Yv0ycCd zhY;eY1`<+KYJl52WGm%{AN~0y(|cV=%pDu?#d_q?mkvaxXB3%=>uzuL&$6!2GjgNi zdI5*|uMdB+0%9pDAXWjgNwieW*>3ICgJ(MbFz1a@79ub+c;6-WQr?BZ(Xrb-^AK#y<^2bj<9_u)kqCR33ATr6NDwB{d@exY! zGIDYLLYB(F) zqf_R#iHHg~6{2|I)b^PX5>sPgYKOf;JB)1@Y#V()u%im{py%CS;`l5ByfzrH69g;@ z(&VC1{AOYUx+o)D6txpsleLca+{EoQMw$l(QMh}mIfihwzIo72D49rXrsiA{nZy`G zs<{NYRyb;&%LyW$lR%uyfLNicc?pQ9yyZcP3NHbMW0ocoFYzK{5S4egG;xN0sl7HY zGVO@MX(pR9Mc~qp!eH;U4-z=EmuS4C@Z7VkK`_sqJfi4!mW6ItNNrvLBp*FS^3I~w zE5Ha|ClXJ$$QcB2te^x#T3Xv&s{ukvfg0d;o-C!@@ZZJdr;F58Am)yZ_#zwm&13^2({B`+!mQim#Sd9m=r?ks zFk3+X_m{>WRzQrR0^%c33{pzftZr7V^>@!K%i7jcn9=6yD#S`0fc%J5Qsjl+?$GZa zcDlXcZhyG8z5Y_qnph~Zxd5Ru?%769KYV|sEA$&RN@2FqZ^LEDe*E>hj$;%x8Xtio zli5_Q64E6;V+o#4E?&G=!XOxtB{-}gJwMnVeQXClA_|;TchD0CdW}2I$fNfw1@4bW zgFSAT;jOirR^+$=ZhY*2H#3vgVFK6z-mCjxk)VYwp;I zW)s9_YF9t`BpAmsFsAfuUI8O2@8%#yg;zksv5XVMOT5e( zq`%HoYoE=_oSl|;8uJL}iinVY9u{CCX+I_NT(NIa_bnzK@Os7|n3t5itr)`f z3=CmP>*m?4LDcAtlIIr_IGcqD#QIJUPj`|x2;y9!6vk|6wt14bLHx1)wbQ&NqgD05 z2|pcqqw-#e=5w&;^v`#<7iN0Y)o`0luj(}2ZmV9aHENFAcI&NL-Kn>|PU|6hAMMvaw4Aw# literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_9.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition/0_0_9.parquet new file mode 100644 index 0000000000000000000000000000000000000000..d32fc3f08e65424d06d0c8716c7f361752c320f4 GIT binary patch literal 2054 zcmeHHJ#W)M7``~JxfmBAl(QUUp$yb2QWH7Khk_I#5Dd@(RRvW{AUkmygHs3B33J!3 zbmj-3P8}htx*+ujAn4p5(V6$|?6aHv1JvdG+d z=-a#Y87m+~Ndf5=DEG;OT+OAe#*4R)ZQFihWRQ_o?Gi*1$G|KTBga%mAN?=8BP01T zj`~mm4nJ6K;^5cub3>+U(gcOeP5gMJNk9LdJvDHOk|xqEP$Z({N+}?tz~=|kCsJkp zQjWqfCS};HA>)?67k}u70U|Q&RM+1X26~JKPVCY1HI4h@*dKGl9Zp%LhW*$XPng?1 zL^R-O5XFmTPQZ+im>O@-oTxu?qSS`LRvHGOlT@I>ULAx}H((jys=+CR zv4u_OVh!PnsGUfaxT06Z=j~QY!wdMtpNsK|Hnin9~ zfdqVevoO`NeOdTgB*dpYgq zAm^XRf}g(}Mt`rn)`{FpHJpAq@o2>)m?zF|P_%lq0fg@PU!8f;z8~&Y8})9zQ5!^l5Y!^Oy6#T>U{LM29lPzcJjZUkosQRVI_|LB aYBt+V$L+hVZnNdI`re@X2t6tCSM(1n?5y+v literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions/4203_corrected_dates.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions/4203_corrected_dates.parquet new file mode 100644 index 0000000000000000000000000000000000000000..cee02a2d4427629c28053d151860dec3ff256e2e GIT binary patch literal 278 zcmWG=3^EjD5Va9?&=KVUGT1~pWF%Nj92giRSQO{{|1ZV>1ac)zK$6)W#AXEI|3L5$ zh(QWKN=2C@bv#&#^7BhXnSdNsCJBy|#FEtb)s$rs8QksQ{v9YOf bVp4LFiG{IAqDgXUiiIY_NucKffFS|^hxSEI literal 0 HcmV?d00001 diff --git 
a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions/4203_corrupt_dates.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions/4203_corrupt_dates.parquet new file mode 100644 index 0000000000000000000000000000000000000000..64f7568e596df4f7214d374e9ab74cda7e03bc27 GIT binary patch literal 181 zcmWG=3^EjD5Va9?&=KVUGT1~pWF%O8c6%{Muvm(FF^Dk$fm{g_kYu)(sPX}~N0mu}BPFpUH9k2%N0o;`j7L&NQbvtK2B^%GgN;EF rAuGxx#v@iI*2JjBVS`~SP=bd+gR3C1sIWA(L^rpHAp_`&0HA{bwJ#ud literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions/4203_corrupted_dates_1.4.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions/4203_corrupted_dates_1.4.parquet new file mode 100644 index 0000000000000000000000000000000000000000..62429fea4c7f18359826d2458064f799283a0b4f GIT binary patch literal 278 zcmZXQ!Ait15Qbx{OAhgvYyv@g*+pR+8k&|03+hdZhh<+tk~Tu2T8&*Vf(Kt~pTM_r zRs;_Y{KE|Y2R}2Lo2_Apt=aX8Q5a^7D3RRl8zhp;d4o8DI-4-`<16vrAPJg-IOqTz zP=@8|Hi_dnF$~KB3(~nM+_&Sfpomkcq*@XMSUE#P{<#cu%0Kw$(UPqHrGi0`RN}OY z{V?da5MTT8Nt_sMb&kaL*~OP#n6k&1{(7m=wKmxgpT#^q_q%@R(#ltP>1=R$>8mOj cSNVt97DZV&-+EgY*4Z|6^#%HZGdJ*&Z^qp`_W%F@ literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions/drill_0_6_currupt_dates_no_stats.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions/drill_0_6_currupt_dates_no_stats.parquet new file mode 100644 index 0000000000000000000000000000000000000000..984074ef0ecfbe41e0489ce6791ce50ed72a73c4 GIT binary patch literal 181 zcmWG=3^EjD5Va9?&=KVUGT1~pWF%O8c6%{Muvm(FF^Dk$fm{g_kYu)(sPX}~N0mu}BPFpUH9k2%N0o;`j7L&NQbvtK2B^%Gje$cF rAuGxx#v@iI*2JjBVS`~SP=bd+gR3C1sIWA(L^rpHAp_`&0HA{bwIv{T literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/mixed_version_partitioned_metadata.requires_replace.txt b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/mixed_version_partitioned_metadata.requires_replace.txt new file mode 100644 index 00000000000..7fdb5b28ec1 --- /dev/null +++ b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/mixed_version_partitioned_metadata.requires_replace.txt @@ -0,0 +1,301 @@ +{ + "metadata_version" : "v2", + "columnTypeInfo" : { + "date_col" : { + "name" : [ "date_col" ], + "primitiveType" : "INT32", + "originalType" : "DATE" + } + }, + "files" : [ { + "path" : "REPLACED_IN_TEST/mixed_partitioned/1_9_0_partitioned_no_corruption/0_0_1.parquet", + "length" : 257, + "rowGroups" : [ { + "start" : 4, + "length" : 45, + "rowCount" : 1, + "hostAffinity" : { + "localhost" : 1.0 + }, + "columns" : [ { + "name" : [ "date_col" ], + "mxValue" : -25567, + "nulls" : 0 + } ] + } ] + }, { + "path" : "REPLACED_IN_TEST/mixed_partitioned/1_9_0_partitioned_no_corruption/0_0_2.parquet", + "length" : 257, + "rowGroups" : [ { + "start" : 4, + "length" : 45, + "rowCount" : 1, + "hostAffinity" : { + "localhost" : 1.0 + }, + "columns" : [ { + "name" : [ "date_col" ], + "mxValue" : -2, + "nulls" : 0 + } ] + } ] + }, { + "path" : "REPLACED_IN_TEST/mixed_partitioned/1_9_0_partitioned_no_corruption/0_0_3.parquet", + "length" : 257, + "rowGroups" : [ { + "start" : 4, + "length" : 45, + "rowCount" : 1, + "hostAffinity" : { + "localhost" : 1.0 + }, + "columns" : [ { + "name" : [ "date_col" ], + "mxValue" : -1, + "nulls" : 0 + } ] + } ] + }, { + "path" : "REPLACED_IN_TEST/mixed_partitioned/1_9_0_partitioned_no_corruption/0_0_4.parquet", + 
"length" : 257, + "rowGroups" : [ { + "start" : 4, + "length" : 45, + "rowCount" : 1, + "hostAffinity" : { + "localhost" : 1.0 + }, + "columns" : [ { + "name" : [ "date_col" ], + "mxValue" : 0, + "nulls" : 0 + } ] + } ] + }, { + "path" : "REPLACED_IN_TEST/mixed_partitioned/1_9_0_partitioned_no_corruption/0_0_5.parquet", + "length" : 257, + "rowGroups" : [ { + "start" : 4, + "length" : 45, + "rowCount" : 1, + "hostAffinity" : { + "localhost" : 1.0 + }, + "columns" : [ { + "name" : [ "date_col" ], + "mxValue" : 1, + "nulls" : 0 + } ] + } ] + }, { + "path" : "REPLACED_IN_TEST/mixed_partitioned/1_9_0_partitioned_no_corruption/0_0_6.parquet", + "length" : 257, + "rowGroups" : [ { + "start" : 4, + "length" : 45, + "rowCount" : 1, + "hostAffinity" : { + "localhost" : 1.0 + }, + "columns" : [ { + "name" : [ "date_col" ], + "mxValue" : 16436, + "nulls" : 0 + } ] + } ] + }, { + "path" : "REPLACED_IN_TEST/mixed_partitioned/partitioned_with_corruption_4203/0_0_1.parquet", + "length" : 257, + "rowGroups" : [ { + "start" : 4, + "length" : 45, + "rowCount" : 1, + "hostAffinity" : { + "localhost" : 1.0 + }, + "columns" : [ { + "name" : [ "date_col" ], + "mxValue" : -25567, + "nulls" : 0 + } ] + } ] + }, { + "path" : "REPLACED_IN_TEST/mixed_partitioned/partitioned_with_corruption_4203/0_0_2.parquet", + "length" : 257, + "rowGroups" : [ { + "start" : 4, + "length" : 45, + "rowCount" : 1, + "hostAffinity" : { + "localhost" : 1.0 + }, + "columns" : [ { + "name" : [ "date_col" ], + "mxValue" : -2, + "nulls" : 0 + } ] + } ] + }, { + "path" : "REPLACED_IN_TEST/mixed_partitioned/partitioned_with_corruption_4203/0_0_3.parquet", + "length" : 257, + "rowGroups" : [ { + "start" : 4, + "length" : 45, + "rowCount" : 1, + "hostAffinity" : { + "localhost" : 1.0 + }, + "columns" : [ { + "name" : [ "date_col" ], + "mxValue" : -1, + "nulls" : 0 + } ] + } ] + }, { + "path" : "REPLACED_IN_TEST/mixed_partitioned/partitioned_with_corruption_4203/0_0_4.parquet", + "length" : 257, + "rowGroups" : [ { + "start" : 4, + "length" : 45, + "rowCount" : 1, + "hostAffinity" : { + "localhost" : 1.0 + }, + "columns" : [ { + "name" : [ "date_col" ], + "mxValue" : 0, + "nulls" : 0 + } ] + } ] + }, { + "path" : "REPLACED_IN_TEST/mixed_partitioned/partitioned_with_corruption_4203/0_0_5.parquet", + "length" : 257, + "rowGroups" : [ { + "start" : 4, + "length" : 45, + "rowCount" : 1, + "hostAffinity" : { + "localhost" : 1.0 + }, + "columns" : [ { + "name" : [ "date_col" ], + "mxValue" : 1, + "nulls" : 0 + } ] + } ] + }, { + "path" : "REPLACED_IN_TEST/mixed_partitioned/partitioned_with_corruption_4203/0_0_6.parquet", + "length" : 257, + "rowGroups" : [ { + "start" : 4, + "length" : 45, + "rowCount" : 1, + "hostAffinity" : { + "localhost" : 1.0 + }, + "columns" : [ { + "name" : [ "date_col" ], + "mxValue" : 16436, + "nulls" : 0 + } ] + } ] + }, { + "path" : "REPLACED_IN_TEST/mixed_partitioned/0_0_1.parquet", + "length" : 160, + "rowGroups" : [ { + "start" : 4, + "length" : 45, + "rowCount" : 1, + "hostAffinity" : { + "localhost" : 1.0 + }, + "columns" : [ { + "name" : [ "date_col" ], + "mxValue" : -25567, + "nulls" : 0 + } ] + } ] + }, { + "path" : "REPLACED_IN_TEST/mixed_partitioned/0_0_2.parquet", + "length" : 160, + "rowGroups" : [ { + "start" : 4, + "length" : 45, + "rowCount" : 1, + "hostAffinity" : { + "localhost" : 1.0 + }, + "columns" : [ { + "name" : [ "date_col" ], + "mxValue" : -2, + "nulls" : 0 + } ] + } ] + }, { + "path" : "REPLACED_IN_TEST/mixed_partitioned/0_0_3.parquet", + "length" : 160, + "rowGroups" : [ { + "start" : 4, + 
"length" : 45, + "rowCount" : 1, + "hostAffinity" : { + "localhost" : 1.0 + }, + "columns" : [ { + "name" : [ "date_col" ], + "mxValue" : -1, + "nulls" : 0 + } ] + } ] + }, { + "path" : "REPLACED_IN_TEST/mixed_partitioned/0_0_4.parquet", + "length" : 160, + "rowGroups" : [ { + "start" : 4, + "length" : 45, + "rowCount" : 1, + "hostAffinity" : { + "localhost" : 1.0 + }, + "columns" : [ { + "name" : [ "date_col" ], + "mxValue" : 0, + "nulls" : 0 + } ] + } ] + }, { + "path" : "REPLACED_IN_TEST/mixed_partitioned/0_0_5.parquet", + "length" : 160, + "rowGroups" : [ { + "start" : 4, + "length" : 45, + "rowCount" : 1, + "hostAffinity" : { + "localhost" : 1.0 + }, + "columns" : [ { + "name" : [ "date_col" ], + "mxValue" : 1, + "nulls" : 0 + } ] + } ] + }, { + "path" : "REPLACED_IN_TEST/mixed_partitioned/0_0_6.parquet", + "length" : 160, + "rowGroups" : [ { + "start" : 4, + "length" : 45, + "rowCount" : 1, + "hostAffinity" : { + "localhost" : 1.0 + }, + "columns" : [ { + "name" : [ "date_col" ], + "mxValue" : 16436, + "nulls" : 0 + } ] + } ] + } ], + "directories" : [ "file:REPLACED_IN_TEST/mixed_partitioned/1_9_0_partitioned_no_corruption", "file:REPLACED_IN_TEST/mixed_partitioned/partitioned_with_corruption_4203" ], + "drillVersion" : "1.9.0" +} diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/null_date_cols_with_corruption_4203.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/null_date_cols_with_corruption_4203.parquet new file mode 100644 index 0000000000000000000000000000000000000000..c5c0b1af69d4080669e6a14f7c4dbb6470d50da6 GIT binary patch literal 364 zcmWG=3^EjD5akgS&;b$*qHLlZGG;ss3=C`{OhA&Efsmw)r~^W$1dGpZF9rz~OK~p- zF`#w^xe~A`_7c@z43a=B4#Z+W3{n76D#|43 zyFw$lLMN!g8iYc)ePBg(Voi){H8vPN7JJ1g#>1e&RghRzSeja*n_I;26c`Wzz;Fft Dh*v)E literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203/0_0_1.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203/0_0_1.parquet new file mode 100644 index 0000000000000000000000000000000000000000..31723cc225e8fcabc3c469d8bbfaf1b6aadd71f1 GIT binary patch literal 257 zcmWG=3^EjD5ET)X&=F+NoGcnuqczHjt5IoetwB4 z6Og0IB*BrASdto_oS&o0!yv{aDI+PP#vub#X3D|9CW(+0WfEf&3lob{HZ@L6N=`DdFg8gvNls0%&}29P J^icrN^8n<5H8KDI literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203/0_0_2.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203/0_0_2.parquet new file mode 100644 index 0000000000000000000000000000000000000000..0c558ed7b4eb5e941a17c3e3a142fc981b50b864 GIT binary patch literal 257 zcmWG=3^EjD5ET)X&=F+q!NoGcnuqczHjt5IoetwB4 z6Og0IB*BrASdto_oS&o0!yv{aDI+PP#vub#X3D|9CW(+0WfEf&3lob{HZ@L6N=`DdFg8gvNls0%&}29P J^icrN^8nJ$Hg^C3 literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203/0_0_3.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203/0_0_3.parquet new file mode 100644 index 0000000000000000000000000000000000000000..f069ddfdc1e1db1b702079d87741c88d84efb520 GIT binary patch literal 257 zcmWG=3^EjD5ET)X&=F+NoGcnuqczHjt5IoetwB4 z6Og0IB*BrASdto_oS&o0!yv{aDI+PP#vub#X3D|9CW(+0WfEf&3lob{HZ@L6N=`DdFg8gvNls0%&}29P J^icrN^8nU>Hhcg8 literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203/0_0_4.parquet 
b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203/0_0_4.parquet new file mode 100644 index 0000000000000000000000000000000000000000..2c0dd7b1f397da12915fabdcf3358a1ffaed908e GIT binary patch literal 257 zcmWG=3^EjD5ET)X&=F+q!NoGcnuqczHjt5IoetwB4 z6Og0IB*BrASdto_oS&o0!yv{aDI+PP#vub#X3D|9CW(+0WfEf&3lob{HZ@L6N=`DdFg8gvNls0%&}29P J^icrN^8ng1Hh};D literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203/0_0_5.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203/0_0_5.parquet new file mode 100644 index 0000000000000000000000000000000000000000..19a436b77f8a8a44ce525e2c277e7b17f210f71f GIT binary patch literal 257 zcmWG=3^EjD5ET)X&=F+NoGcnuqczHjt5IoetwB4 z6Og0IB*BrASdto_oS&o0!yv{aDI+PP#vub#X3D|9CW(+0WfEf&3lob{HZ@L6N=`DdFg8gvNls0%&}29P J^icrN^8nrCHiiHI literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203/0_0_6.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203/0_0_6.parquet new file mode 100644 index 0000000000000000000000000000000000000000..49020b53b762dc193efae4fa21f40b76997ba793 GIT binary patch literal 257 zcmWG=3^EjD5ET)X&=F+D^uCr7U=wWv5VKTm?yP|rlqfI%XtAhD>hG_^!Gw@3jhr(mdOp=YQIRnrL8}WM-CXm}r)iW?^D%Y-*gCl$>N@VQi9UlAM}ip~-Lp J=%WCj=K+PdIK=<} literal 0 HcmV?d00001 diff --git a/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203_1_2/0_0_1.parquet b/exec/java-exec/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203_1_2/0_0_1.parquet new file mode 100644 index 0000000000000000000000000000000000000000..ba99a37d7863b06cbcf4d8906544ec2be4b4e166 GIT binary patch literal 160 zcmWG=3^EjD5ET)X&=F+NoGcnuqczHjt5IoetwB4 z6Og0IB*BrASdto_oS&o0!yv{aDI+PP#vub#X3D|9CW(+0WfEf&3lob{q!NoGcnuqczHjt5IoetwB4 z6Og0IB*BrASdto_oS&o0!yv{aDI+PP#vub#X3D|9CW(+0WfEf&3lob{NoGcnuqczHjt5IoetwB4 z6Og0IB*BrASdto_oS&o0!yv{aDI+PP#vub#X3D|9CW(+0WfEf&3lob{q!NoGcnuqczHjt5IoetwB4 z6Og0IB*BrASdto_oS&o0!yv{aDI+PP#vub#X3D|9CW(+0WfEf&3lob{NoGcnuqczHjt5IoetwB4 z6Og0IB*BrASdto_oS&o0!yv{aDI+PP#vub#X3D|9CW(+0WfEf&3lob{#wL~|!h#?(lVF1u=01B@hMF0Q* literal 0 HcmV?d00001