Is your feature request related to a problem or challenge?
RepartitionExec emits one small RecordBatch per (input batch × non-empty output partition), then coalesces them back to target size on the consumer side. The channel layer (memory accounting, sender gate, await suspensions) therefore does work proportional to num_output_partitions per input batch, even though each sub-batch only carries ~batch_size / num_output_partitions rows.
This becomes a real cost at high fanout. In datafusion-distributed, RepartitionExec is the backbone for network shuffles and is scaled to P * W partitions (P ≈ 12–24, W up to thousands), where per-batch channel overhead dominates.
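The scaling described above can be illustrated with a rough cost model. This is a minimal sketch, not code from DataFusion: `FanoutCost` and `fanout_cost` are hypothetical names, and `row % num_output_partitions` stands in for real hash partitioning under the assumption that keys are evenly distributed.

```rust
/// Illustrative per-input-batch cost of a repartition (hypothetical type,
/// not part of DataFusion).
#[derive(Debug, PartialEq)]
pub struct FanoutCost {
    /// Non-empty sub-batches pushed through the channel layer.
    pub sub_batches: usize,
    /// Average number of rows carried by each sub-batch.
    pub avg_rows_per_sub_batch: usize,
}

/// Distributes `batch_size` rows over `num_output_partitions` and reports
/// how many sub-batches result and how small they are.
///
/// Assumption: `row % num_output_partitions` approximates a uniform hash.
pub fn fanout_cost(batch_size: usize, num_output_partitions: usize) -> FanoutCost {
    assert!(num_output_partitions > 0, "need at least one output partition");

    // Count the rows routed to each output partition.
    let mut rows_per_partition = vec![0usize; num_output_partitions];
    for row in 0..batch_size {
        rows_per_partition[row % num_output_partitions] += 1;
    }

    // Only non-empty partitions produce a sub-batch (and channel work).
    let sub_batches = rows_per_partition.iter().filter(|&&n| n > 0).count();
    let avg_rows_per_sub_batch = if sub_batches == 0 {
        0
    } else {
        batch_size / sub_batches
    };

    FanoutCost {
        sub_batches,
        avg_rows_per_sub_batch,
    }
}
```

Under these assumptions, `fanout_cost(8192, 16)` gives 16 sub-batches of 512 rows each, while `fanout_cost(8192, 24_000)` gives 8192 sub-batches of a single row each: the channel layer is exercised once per row, and the consumer has to coalesce everything back to the target size.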
Additional context
A candidate implementation is in #22010.