-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-33472][SQL][2.4] Adjust RemoveRedundantSorts rule order #30437
Conversation
This PR switched the order for the rule `RemoveRedundantSorts` and `EnsureRequirements` so that `EnsureRequirements` will be invoked before `RemoveRedundantSorts` to avoid IllegalArgumentException when instantiating PartitioningCollection. `RemoveRedundantSorts` rule uses SparkPlan's `outputPartitioning` to check whether a sort node is redundant. Currently, it is added before `EnsureRequirements`. Since `PartitioningCollection` requires left and right partitioning to have the same number of partitions, which is not necessarily true before applying `EnsureRequirements`, the rule can fail with the following exception: ``` IllegalArgumentException: requirement failed: PartitioningCollection requires all of its partitionings have the same numPartitions. ``` No Unit test Closes apache#30373 from allisonwang-db/sort-follow-up. Authored-by: allisonwang-db <66282705+allisonwang-db@users.noreply.github.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit a03c540) Signed-off-by: allisonwang-db <66282705+allisonwang-db@users.noreply.github.com>
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #131394 has finished for PR 30437 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you, @allisonwang-db and @maropu .
Merged to branch-2.4.
Backport #30373 for branch-2.4. ### What changes were proposed in this pull request? This PR switched the order for the rule `RemoveRedundantSorts` and `EnsureRequirements` so that `EnsureRequirements` will be invoked before `RemoveRedundantSorts` to avoid IllegalArgumentException when instantiating PartitioningCollection. ### Why are the changes needed? `RemoveRedundantSorts` rule uses SparkPlan's `outputPartitioning` to check whether a sort node is redundant. Currently, it is added before `EnsureRequirements`. Since `PartitioningCollection` requires left and right partitioning to have the same number of partitions, which is not necessarily true before applying `EnsureRequirements`, the rule can fail with the following exception: ``` IllegalArgumentException: requirement failed: PartitioningCollection requires all of its partitionings have the same numPartitions. ``` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit test Closes #30437 from allisonwang-db/spark-33472-2.4. Authored-by: allisonwang-db <66282705+allisonwang-db@users.noreply.github.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Backport #30373 for branch-2.4.
What changes were proposed in this pull request?
This PR switched the order for the rule
RemoveRedundantSorts
andEnsureRequirements
so thatEnsureRequirements
will be invoked beforeRemoveRedundantSorts
to avoid IllegalArgumentException when instantiating PartitioningCollection.Why are the changes needed?
RemoveRedundantSorts
rule uses SparkPlan'soutputPartitioning
to check whether a sort node is redundant. Currently, it is added beforeEnsureRequirements
. SincePartitioningCollection
requires left and right partitioning to have the same number of partitions, which is not necessarily true before applyingEnsureRequirements
, the rule can fail with the following exception:Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit test