
Add Trigger.ProcessingTime to Writers #84

@kevinwallimann

Description


Currently, all jobs are ingested with Trigger.Once, i.e. all data is written to a single Parquet file (per Kafka partition). Certain jobs may produce very large output files, leading to out-of-memory errors.

To prevent this, Trigger.ProcessingTime should be used.

New configuration property: writer.parquet.trigger
The expected value is the trigger interval in milliseconds.
If the value is not a number or the property is not present, all data should be ingested at once, as is the case now.

The change should be available for both ParquetStreamWriter and ParquetPartitioningStreamWriter.
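
The behaviour described above could be sketched as follows, assuming Spark's Structured Streaming API. The object and method names (TriggerResolver, resolveTrigger, applyTrigger) are illustrative, not part of the existing codebase; only the configuration key writer.parquet.trigger comes from this issue.

```scala
import scala.util.Try

import org.apache.spark.sql.Row
import org.apache.spark.sql.streaming.{DataStreamWriter, Trigger}

// Hypothetical helper; names are illustrative.
object TriggerResolver {
  // Configuration key taken from this issue.
  val TriggerKey = "writer.parquet.trigger"

  // Falls back to Trigger.Once when the property is missing or not a
  // number, preserving the current behaviour.
  def resolveTrigger(value: Option[String]): Trigger =
    value
      .flatMap(v => Try(v.trim.toLong).toOption)
      .map(Trigger.ProcessingTime(_))
      .getOrElse(Trigger.Once())

  // Would be applied the same way in ParquetStreamWriter and
  // ParquetPartitioningStreamWriter before starting the query.
  def applyTrigger(writer: DataStreamWriter[Row],
                   value: Option[String]): DataStreamWriter[Row] =
    writer.trigger(resolveTrigger(value))
}
```

Centralising the fallback in one helper keeps the two writers consistent: a malformed value silently degrades to the current Trigger.Once behaviour rather than failing the job.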
