Improve repartition buffering #4865

@crepererum

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
In #4820 @alamb and I discussed that the repartition node could have a slightly smarter buffering. This is a tracking issue for this.

Describe the solution you'd like
While the repartition node needs an unbounded buffer to prevent deadlocks, it doesn't need to buffer an unlimited amount of data in all cases. To be precise: if ALL output channels have data (i.e. are not empty), then the input workers can be paused. However, if at least one output channel is empty, we need to drive the input workers. In the worst case, a few channels will fill up with unbounded data while one channel stays empty forever. Realistically, this will not happen for any reasonable repartition configuration.
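The gating rule above could be sketched roughly as follows. This is only an illustration of the idea, not the actual repartition implementation: `OutputBuffers`, `push`, `pop`, and `should_drive_inputs` are hypothetical names, and plain in-memory queues stand in for the real async channels.

```rust
use std::collections::VecDeque;

/// Hypothetical model of the repartition output side: one unbounded
/// queue per output partition plus a count of queues that are empty.
struct OutputBuffers<T> {
    queues: Vec<VecDeque<T>>,
    /// Number of queues that currently hold no data.
    empty: usize,
}

impl<T> OutputBuffers<T> {
    /// Create `n` output queues; all of them start out empty.
    fn new(n: usize) -> Self {
        Self {
            queues: (0..n).map(|_| VecDeque::new()).collect(),
            empty: n,
        }
    }

    /// Pushing never blocks or fails (the buffer stays unbounded),
    /// which is what prevents deadlocks.
    fn push(&mut self, partition: usize, item: T) {
        let queue = &mut self.queues[partition];
        if queue.is_empty() {
            // This queue goes from empty to non-empty.
            self.empty -= 1;
        }
        queue.push_back(item);
    }

    /// Take the next item for an output partition, if there is one.
    fn pop(&mut self, partition: usize) -> Option<T> {
        let queue = &mut self.queues[partition];
        let item = queue.pop_front();
        if item.is_some() && queue.is_empty() {
            // This queue was just drained.
            self.empty += 1;
        }
        item
    }

    /// Input workers only need to run while at least one output queue
    /// is empty; once every queue has data they can be paused.
    fn should_drive_inputs(&self) -> bool {
        self.empty > 0
    }
}
```

In this sketch an input worker would check `should_drive_inputs()` before pulling its next batch and park itself when it returns `false`; a consumer that drains a queue flips the condition back and would need to wake the parked workers. The wake-up mechanism is left out here.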

Describe alternatives you've considered
Keeping the current state.

Additional context
-

Labels: enhancement (New feature or request)
