The blocking operators (hash join build side, aggregation) are often planned on top of a RepartitionExec with a CoalesceBatchesExec in between. However, one of the first things these operators do is concatenate the freshly coalesced batches, so maybe there is no benefit from adding the CoalesceBatchesExec here?
I've prototyped an implementation here: #15479