NIFI-15568 Fix Timestamp Partitioning in PutIcebergRecord#10996
pvillard31 merged 2 commits into apache:main
final Types.TimestampType timestampType = (Types.TimestampType) fieldType;
if (timestampType.shouldAdjustToUTC()) {
    converter = dateTime -> DateTimeUtil.nanosFromTimestamptz((OffsetDateTime) dateTime);
} else {
    converter = dateTime -> DateTimeUtil.nanosFromTimestamp((LocalDateTime) dateTime);
}
Copy/paste I guess?
- final Types.TimestampType timestampType = (Types.TimestampType) fieldType;
- if (timestampType.shouldAdjustToUTC()) {
-     converter = dateTime -> DateTimeUtil.nanosFromTimestamptz((OffsetDateTime) dateTime);
- } else {
-     converter = dateTime -> DateTimeUtil.nanosFromTimestamp((LocalDateTime) dateTime);
- }
+ final Types.TimestampType timestampType = (Types.TimestampType) fieldType;
+ if (timestampType.shouldAdjustToUTC()) {
+     converter = dateTime -> DateTimeUtil.microsFromTimestamptz((OffsetDateTime) dateTime);
+ } else {
+     converter = dateTime -> DateTimeUtil.microsFromTimestamp((LocalDateTime) dateTime);
+ }
The existing test testWriteDataFilesPartitionedTimestamp does not catch this because it only verifies that dataFiles has length 1 and a recordCount() of 1; it never asserts on the actual partition values of the output DataFile.
Good catch, I will adjust and add some testing for this behavior.
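To illustrate why the unit matters and why a partition-value assertion would catch the mix-up, here is a minimal sketch that uses only java.time. The helper methods are stand-ins written for this example and are assumed to mirror the intent of the DateTimeUtil calls in the diff; they are not the Iceberg implementation. The dayOrdinalFromMicros helper is a hypothetical simplification of a day-based partition transform.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Illustrative stand-ins only; names and behavior are assumptions for this sketch.
final class TimestampUnitsSketch {
    private static final OffsetDateTime EPOCH =
            OffsetDateTime.of(1970, 1, 1, 0, 0, 0, 0, ZoneOffset.UTC);
    private static final long MICROS_PER_DAY = 86_400_000_000L;

    // Microseconds since the epoch for a timestamp without a zone
    static long microsFromTimestamp(final LocalDateTime dateTime) {
        return ChronoUnit.MICROS.between(EPOCH.toLocalDateTime(), dateTime);
    }

    // Nanoseconds since the epoch for a timestamp without a zone
    static long nanosFromTimestamp(final LocalDateTime dateTime) {
        return ChronoUnit.NANOS.between(EPOCH.toLocalDateTime(), dateTime);
    }

    // Microseconds since the epoch for a timestamp with an offset
    static long microsFromTimestamptz(final OffsetDateTime dateTime) {
        return ChronoUnit.MICROS.between(EPOCH, dateTime);
    }

    // Hypothetical day partition value derived from a microsecond timestamp
    static long dayOrdinalFromMicros(final long micros) {
        return Math.floorDiv(micros, MICROS_PER_DAY);
    }

    public static void main(final String[] args) {
        final LocalDateTime value = LocalDateTime.of(2024, 1, 1, 0, 0, 0);
        final long micros = microsFromTimestamp(value);
        final long nanos = nanosFromTimestamp(value);

        // The correct unit yields the expected day partition
        System.out.println("day from micros: " + dayOrdinalFromMicros(micros));
        // Feeding nanoseconds into a microsecond field inflates the value by 1000
        System.out.println("day from nanos:  " + dayOrdinalFromMicros(nanos));
    }
}
```

A test that only checks the number of data files and the record count passes in both cases, whereas an assertion on the derived partition value distinguishes them.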
private StructLike wrapped = null;

@SuppressWarnings("unchecked")
Do we want to keep those?
Yes, this is needed for casting the array of functional converters.
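As a minimal sketch of why the annotation is needed: Java does not allow creating an array of a parameterized type directly, so the usual approach is to create a raw array and cast it, which produces an unchecked warning. The class and method names below are hypothetical and chosen for this example; they are not taken from the pull request.

```java
import java.util.function.Function;

// Hypothetical holder for per-field converters; not the class from the pull request.
final class ConverterArraySketch {
    private final Function<Object, Object>[] converters;

    // "new Function<Object, Object>[n]" does not compile, so a raw array is
    // created and cast, which is what triggers the unchecked warning.
    @SuppressWarnings("unchecked")
    ConverterArraySketch(final int fieldCount) {
        converters = (Function<Object, Object>[]) new Function[fieldCount];
        for (int i = 0; i < fieldCount; i++) {
            // Default to passing the value through unchanged
            converters[i] = Function.identity();
        }
    }

    void setConverter(final int position, final Function<Object, Object> converter) {
        converters[position] = converter;
    }

    Object convert(final int position, final Object value) {
        return converters[position].apply(value);
    }

    public static void main(final String[] args) {
        final ConverterArraySketch sketch = new ConverterArraySketch(2);
        sketch.setConverter(1, value -> value.toString().length());
        System.out.println(sketch.convert(0, "unchanged"));
        System.out.println(sketch.convert(1, "abc"));
    }
}
```

Scoping the annotation to the smallest element that contains the cast keeps other unchecked warnings in the class visible.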
/**
 * Record Converter handles translating field values to types compatible with Apache Iceberg Records
 */
class RecordConverter {
We do not support nested records here. The PartitionKeyRecord handles nested structs via its STRUCT converter, but RecordConverter does not. This might be an acceptable limitation for now, but I wanted to point it out.
Thanks for noting this detail. Yes, the PartitionKeyRecord follows the pattern of the implementation from Apache Iceberg, but this RecordConverter is more narrowly implemented. The difference in capability is acceptable for now, but I will track it for subsequent improvement with more general support for nested structures.
- Added conditional conversion of partition key fields for ParquetIcebergWriter
- Added conditional conversion of java.sql types to java.time types for PutIcebergRecord
6269845 to 6686cca
Thanks for the review @pvillard31, I pushed a commit to use the
Summary
NIFI-15568 Corrects partition by Timestamp, Date, and Time fields in the PutIcebergRecord Processor and the ParquetIcebergWriter Controller Service.

Based on the initial approach in #10877, changes include selected conversion of fields with java.sql types to corresponding java.time types in the DelegatedRecord class. The conversion process evaluates the record schema to determine the presence of field types requiring conversion, avoiding unnecessary object creation.

Instead of adding the iceberg-data library, which brings in additional transitive dependencies, changes to the ParquetIcebergWriter include a PartitionKeyRecord for wrapping input Iceberg Record objects and returning primitive partition keys following the pattern of the Iceberg InternalRecordWrapper.

New unit tests verify the behavior of the DelegatedRecord with Timestamp, Date, and Time fields, and also verify partition key handling for ParquetIcebergWriter.

Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
Pull Request Tracking
NIFI-00000
NIFI-00000
Verified status
Pull Request Formatting
main branch
Verification
Please indicate the verification steps performed prior to pull request creation.
Build
./mvnw clean install -P contrib-check
Licensing
LICENSE and NOTICE files
Documentation
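
The java.sql to java.time conversion described in the Summary can be sketched roughly as follows. This is an illustration under stated assumptions, not the DelegatedRecord implementation: the FieldKind enum and both method names are hypothetical, and only the standard toLocalDateTime, toLocalDate, and toLocalTime methods are relied on. The schema check runs once so that records without Date, Time, or Timestamp fields skip conversion entirely.

```java
import java.sql.Date;
import java.sql.Time;
import java.sql.Timestamp;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; the real DelegatedRecord may be structured differently.
final class SqlTypeConversionSketch {
    // Simplified stand-in for record schema field types
    enum FieldKind { STRING, LONG, DATE, TIME, TIMESTAMP }

    // Evaluate the schema once to decide whether any field needs conversion
    static boolean conversionRequired(final List<FieldKind> schema) {
        for (final FieldKind kind : schema) {
            if (kind == FieldKind.DATE || kind == FieldKind.TIME || kind == FieldKind.TIMESTAMP) {
                return true;
            }
        }
        return false;
    }

    // Convert java.sql values to java.time equivalents; pass other values through
    static Object convert(final Object value) {
        if (value instanceof Timestamp) {
            return ((Timestamp) value).toLocalDateTime();
        } else if (value instanceof Date) {
            return ((Date) value).toLocalDate();
        } else if (value instanceof Time) {
            return ((Time) value).toLocalTime();
        }
        return value;
    }

    public static void main(final String[] args) {
        final List<FieldKind> schema = Arrays.asList(FieldKind.STRING, FieldKind.TIMESTAMP);
        final Object input = Timestamp.valueOf("2024-01-01 12:30:00");
        // Only convert when the schema indicates it is necessary
        final Object output = conversionRequired(schema) ? convert(input) : input;
        System.out.println(output.getClass().getName() + " " + output);
    }
}
```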