[GLUTEN-10988][VL] Do not resize batches for sort-based/rss-sort shuffle #10991
marin-ma merged 5 commits into apache:main
Could you only remove this for the specific shuffle writer type?
@marin-ma Thank you. I have only removed support for |
case shuffle: ColumnarShuffleExchangeExec
    if shuffle.shuffleWriterType == HashShuffleWriterType &&
      VeloxConfig.get.veloxResizeBatchesShuffleInput =>
case ColumnarResizeableShuffleExchangeExec(shuffle) if resizeBatchesShuffleInputEnabled =>
It seems this change assumes that a given shuffle writer type needs both its shuffle input and its shuffle output resized together, but I don't think that's necessarily true. A shuffle type may need input resizing but not output resizing, or vice versa (although for the existing types the two settings happen to be the same).
Maybe we can generalize this by adding these flags to the ShuffleWriterType trait:
trait ShuffleWriterType {
  val name: String
  val requiresResizingShuffleInput: Boolean
  val requiresResizingShuffleOutput: Boolean
}
What do you think?
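For illustration, here is a minimal sketch of how per-type flags like these could be combined with the user-facing configs. Only `ShuffleWriterType`, `HashShuffleWriterType`, and the two flag names come from this thread; the other object names, the `name` strings, and the `ResizeDecision` helper are assumptions made for the example and may not match the actual Gluten code.

```scala
// Sketch only: apart from ShuffleWriterType, HashShuffleWriterType and the two
// flag names, every identifier here is hypothetical.
sealed trait ShuffleWriterType {
  val name: String
  val requiresResizingShuffleInput: Boolean
  val requiresResizingShuffleOutput: Boolean
}

// Hash shuffle: both the input and the output side may need resizing.
case object HashShuffleWriterType extends ShuffleWriterType {
  override val name: String = "hash" // placeholder name
  override val requiresResizingShuffleInput: Boolean = true
  override val requiresResizingShuffleOutput: Boolean = true
}

// Sort-based shuffle: no resizing on either side in this sketch.
case object SortShuffleWriterType extends ShuffleWriterType {
  override val name: String = "sort" // placeholder name
  override val requiresResizingShuffleInput: Boolean = false
  override val requiresResizingShuffleOutput: Boolean = false
}

// RSS sort-based shuffle: treated the same way as sort-based shuffle here.
case object RssSortShuffleWriterType extends ShuffleWriterType {
  override val name: String = "rss_sort" // placeholder name
  override val requiresResizingShuffleInput: Boolean = false
  override val requiresResizingShuffleOutput: Boolean = false
}

// Hypothetical helper: resize only when the config is enabled AND the
// writer type declares that it needs resizing on that side.
object ResizeDecision {
  def resizeInput(writerType: ShuffleWriterType, configEnabled: Boolean): Boolean =
    configEnabled && writerType.requiresResizingShuffleInput

  def resizeOutput(writerType: ShuffleWriterType, configEnabled: Boolean): Boolean =
    configEnabled && writerType.requiresResizingShuffleOutput
}
```

Keeping the two flags separate means a future writer type could set only one of them without the planner rule having to enumerate writer types.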
Thanks, sounds good. I’ll make that change.
@marin-ma Thank you for your suggestion, I have updated it. Also, do we need to change the default value of COLUMNAR_VELOX_RESIZE_BATCHES_SHUFFLE_OUTPUT to true?
https://github.com/apache/incubator-gluten/blob/95f271d7f683deb9d61fb019806d6d221d0fc6c9/backends-velox/src/main/scala/org/apache/gluten/config/VeloxConfig.scala#L279-L286
Run Gluten Clickhouse CI on x86
trait ShuffleWriterType {
  val name: String
  val requiresResizingShuffleInput: Boolean = true
  val requiresResizingShuffleOutput: Boolean = true
Please move these into HashShuffleWriterType
What changes are proposed in this pull request?
The sort-based shuffle reader already respects the batch size configuration, so we don't need to support resizeBatches.shuffleOutput for it.
sort:
https://github.com/apache/incubator-gluten/blob/2f7b138f24f16a249cea57217e50962d3b1a8ee4/cpp/velox/shuffle/VeloxShuffleReader.cc#L602
rss-sort:
https://github.com/apache/incubator-gluten/blob/2f7b138f24f16a249cea57217e50962d3b1a8ee4/cpp/velox/shuffle/VeloxShuffleReader.cc#L766
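As a rough illustration of why a separate output-resizing step adds nothing once the reader itself honors the batch size, consider the simplified sketch below. It does not mirror the C++ code in VeloxShuffleReader.cc; `BatchSizeSketch` and `readBatches` are hypothetical names, and rows are modeled as plain values.

```scala
// Sketch only: a toy "reader" that already emits batches capped at batchSize.
object BatchSizeSketch {
  // Regroup a stream of deserialized rows into batches of at most batchSize rows.
  def readBatches[A](rows: Iterator[A], batchSize: Int): Iterator[Seq[A]] = {
    require(batchSize > 0, "batchSize must be positive")
    rows.grouped(batchSize).map(_.toSeq)
  }
}
```

Every batch except possibly the last one already has exactly `batchSize` rows, so a downstream operator that resizes batches toward the same target would have nothing left to do.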
closes #10988
How was this patch tested?