ParquetPartitioningStreamWriter does two things: it adds two columns (i.e. a transformation) and writes the dataframe partitioned (a special write). With #116 the two responsibilities can be separated: ParquetStreamWriter is enhanced to write partitioned output, so only the transformation is left for ParquetPartitioningStreamWriter.
Tasks
- Refactor ParquetPartitioningStreamWriter to a transformer and rename it
- Merge AbstractParquetStreamWriter with ParquetStreamWriter
How to migrate Hyperdrive-Trigger
- Replace `component.writer=za.co.absa.hyperdrive.ingestor.implementation.writer.parquet.ParquetPartitioningStreamWriter` with `component.transformer.id.2=add.date.version`, `component.transformer.class.add.date.version=za.co.absa.hyperdrive.ingestor.implementation.transformer.add.dateversion.AddDateVersionTransformer` and `component.writer=za.co.absa.hyperdrive.ingestor.implementation.writer.parquet.ParquetStreamWriter`
- Replace `writer.parquet.partitioning.report.date` with `transformer.add.date.version.report.date`
- Replace `writer.parquet.destination` with `transformer.add.date.version.destination=${writer.parquet.destination}`, `writer.parquet.partition.columns=hyperdrive_date, hyperdrive_version` and `writer.parquet.destination`
- Make sure there is no workflow using ParquetPartitioningStreamWriter and partition columns at the same time
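
Taken together, the replacements above amount to the following before/after configuration sketch. Property keys are the ones named in this issue; the report date value and the destination path are hypothetical examples:

```
# Before (old writer does both transformation and partitioned write)
component.writer=za.co.absa.hyperdrive.ingestor.implementation.writer.parquet.ParquetPartitioningStreamWriter
writer.parquet.partitioning.report.date=2020-01-01
writer.parquet.destination=/data/output

# After (transformation moved to AddDateVersionTransformer, writer only writes partitioned)
component.transformer.id.2=add.date.version
component.transformer.class.add.date.version=za.co.absa.hyperdrive.ingestor.implementation.transformer.add.dateversion.AddDateVersionTransformer
component.writer=za.co.absa.hyperdrive.ingestor.implementation.writer.parquet.ParquetStreamWriter
transformer.add.date.version.report.date=2020-01-01
transformer.add.date.version.destination=${writer.parquet.destination}
writer.parquet.partition.columns=hyperdrive_date, hyperdrive_version
writer.parquet.destination=/data/output
```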