[SPARK-33472][SQL][2.4] Adjust RemoveRedundantSorts rule order #30437

Closed

allisonwang-db
Contributor

Backport #30373 for branch-2.4.

### What changes were proposed in this pull request?

This PR switches the order of the `RemoveRedundantSorts` and `EnsureRequirements` rules so that `EnsureRequirements` runs before `RemoveRedundantSorts`. This avoids an `IllegalArgumentException` when instantiating `PartitioningCollection`.

### Why are the changes needed?

The `RemoveRedundantSorts` rule uses a SparkPlan's `outputPartitioning` to check whether a sort node is redundant. Currently, the rule runs before `EnsureRequirements`. `PartitioningCollection` requires its left and right partitionings to have the same number of partitions, which is not guaranteed until `EnsureRequirements` has been applied, so the rule can fail with the following exception:

IllegalArgumentException: requirement failed: PartitioningCollection requires all of its partitionings have the same numPartitions.
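
The failure mode can be illustrated with a small, self-contained toy model. This is only a sketch: `Plan`, `Scan`, `Exchange`, `Join`, `Sort`, the `shufflePartitions` parameter, and the simplified `PartitioningCollection`, `ensureRequirements`, and `removeRedundantSorts` below are hypothetical stand-ins written for this example, not Spark's actual classes or rule implementations. The model assumes that `ensureRequirements` always inserts an exchange on both sides of a join, which is a simplification.

```scala
// Toy model only: none of these types are Spark's real classes.
object RuleOrderSketch {
  // Simplified physical plan nodes that carry just a partition count.
  sealed trait Plan
  final case class Scan(numPartitions: Int) extends Plan
  final case class Exchange(numPartitions: Int, child: Plan) extends Plan
  final case class Join(left: Plan, right: Plan) extends Plan
  final case class Sort(child: Plan) extends Plan

  // Mirrors the invariant quoted in the exception message above.
  final case class PartitioningCollection(partitionCounts: Seq[Int]) {
    require(
      partitionCounts.distinct.size == 1,
      "PartitioningCollection requires all of its partitionings have the same numPartitions.")
  }

  // A join combines the partitionings of both children, so this can throw
  // when the children have different partition counts.
  def outputPartitionCounts(plan: Plan): Seq[Int] = plan match {
    case Scan(n) => Seq(n)
    case Exchange(n, _) => Seq(n)
    case Sort(child) => outputPartitionCounts(child)
    case Join(l, r) =>
      PartitioningCollection(outputPartitionCounts(l) ++ outputPartitionCounts(r)).partitionCounts
  }

  // Simplified stand-in: shuffle both join inputs to the same partition count.
  def ensureRequirements(plan: Plan, shufflePartitions: Int): Plan = plan match {
    case Join(l, r) =>
      Join(
        Exchange(shufflePartitions, ensureRequirements(l, shufflePartitions)),
        Exchange(shufflePartitions, ensureRequirements(r, shufflePartitions)))
    case Sort(child) => Sort(ensureRequirements(child, shufflePartitions))
    case Exchange(n, child) => Exchange(n, ensureRequirements(child, shufflePartitions))
    case scan: Scan => scan
  }

  // Simplified stand-in: inspects the child's partitioning (which may throw)
  // and drops a sort only when its child is already a sort.
  def removeRedundantSorts(plan: Plan): Plan = plan match {
    case Sort(child) =>
      val cleaned = removeRedundantSorts(child)
      outputPartitionCounts(cleaned) // evaluated only to surface the invariant check
      cleaned match {
        case alreadySorted: Sort => alreadySorted
        case other => Sort(other)
      }
    case Join(l, r) => Join(removeRedundantSorts(l), removeRedundantSorts(r))
    case Exchange(n, child) => Exchange(n, removeRedundantSorts(child))
    case scan: Scan => scan
  }

  def main(args: Array[String]): Unit = {
    // A sort over a join whose inputs have 4 and 8 partitions.
    val plan = Sort(Join(Scan(4), Scan(8)))
    val oldOrder = scala.util.Try(ensureRequirements(removeRedundantSorts(plan), 200))
    val newOrder = scala.util.Try(removeRedundantSorts(ensureRequirements(plan, 200)))
    println(s"old order failed: ${oldOrder.isFailure}")
    println(s"new order failed: ${newOrder.isFailure}")
  }
}
```

In this model, running the sort-removal step first hits the `require` check because the join's children still report 4 and 8 partitions. Running `ensureRequirements` first aligns both sides to the same count, so the check passes.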

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Unit test

Closes apache#30373 from allisonwang-db/sort-follow-up.

Authored-by: allisonwang-db <66282705+allisonwang-db@users.noreply.github.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit a03c540)
Signed-off-by: allisonwang-db <66282705+allisonwang-db@users.noreply.github.com>

SparkQA commented Nov 20, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35998/

SparkQA commented Nov 20, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35998/

SparkQA commented Nov 20, 2020

Test build #131394 has finished for PR 30437 at commit dbc38d3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

dongjoon-hyun left a comment

+1, LGTM. Thank you, @allisonwang-db and @maropu.
Merged to branch-2.4.

dongjoon-hyun pushed a commit that referenced this pull request Nov 20, 2020

Closes #30437 from allisonwang-db/spark-33472-2.4.

Authored-by: allisonwang-db <66282705+allisonwang-db@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@allisonwang-db allisonwang-db deleted the spark-33472-2.4 branch January 19, 2024 01:21