Join GitHub today
GitHub is home to over 40 million developers working together to host and review code, manage projects, and build software together.Sign up
[SPARK-29544] [SQL] optimize skewed partition based on data size #26434
What changes were proposed in this pull request?
Skew Join is common and can severely downgrade performance of queries, especially those with joins. This PR aim to optimization the skew join based on the runtime Map output statistics by adding "OptimizeSkewedPartitions" rule. And The details design doc is here. Currently we can support "Inner, Cross, LeftSemi, LeftAnti, LeftOuter, RightOuter" join type.
Why are the changes needed?
To optimize the skewed partition in runtime based on AQE
Does this PR introduce any user-facing change?
How was this patch tested?