[VL] Dictionary support in shuffle #8855

@FelixYBW

Description

One common performance issue in Spark is the shuffle. The size of the shuffle data can directly affect performance.
One advantage of columnar shuffle is that it can easily use dictionary-based encoding, which is expected to decrease the shuffle data size. The Velox pipeline already supports dictionary-encoded data, but currently all data is flattened after the shuffle. With dictionary support in shuffle, the next stage can keep using the dictionary-encoded data, which is expected to save memory as well.
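To illustrate why dictionary encoding is expected to shrink shuffle data, here is a minimal Python sketch. It is not Velox or Gluten code: the function names (`dictionary_encode`, `dictionary_decode`, `flat_size`, `encoded_size`) are hypothetical, and the size model (a 4-byte length prefix per string, 4-byte indices) is an assumption chosen for illustration, not the actual shuffle serialization format.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Return (dictionary, indices) with dictionary[indices[i]] == values[i]."""
    dictionary: List[str] = []
    positions: Dict[str, int] = {}
    indices: List[int] = []
    for value in values:
        if value not in positions:
            # First time we see this value: append it to the dictionary.
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def dictionary_decode(dictionary: List[str], indices: List[int]) -> List[str]:
    """Flatten a dictionary-encoded column back into plain values."""
    return [dictionary[i] for i in indices]


def flat_size(values: List[str]) -> int:
    """Assumed flat layout: 4-byte length prefix plus UTF-8 bytes per value."""
    return sum(4 + len(v.encode("utf-8")) for v in values)


def encoded_size(dictionary: List[str], indices: List[int]) -> int:
    """Assumed encoded layout: the flat dictionary plus one 4-byte index per row."""
    return flat_size(dictionary) + 4 * len(indices)


# A low-cardinality string column, the case where dictionaries help most.
column = ["shipped", "pending", "shipped", "shipped", "returned", "pending"] * 1000
dictionary, indices = dictionary_encode(column)

print("flat bytes:   ", flat_size(column))
print("encoded bytes:", encoded_size(dictionary, indices))
```

The saving depends on cardinality: when most values are distinct, the dictionary is nearly as large as the flat column and the indices are pure overhead, so a real implementation would likely need to decide per column or per batch whether to keep the dictionary.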

Metadata

Assignees

Labels

enhancement: New feature or request

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
