Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-33472][SQL] Adjust RemoveRedundantSorts rule order #30373

Closed
wants to merge 3 commits into from

Conversation

allisonwang-db
Copy link
Contributor

@allisonwang-db allisonwang-db commented Nov 13, 2020

What changes were proposed in this pull request?

This PR switched the order for the rule RemoveRedundantSorts and EnsureRequirements so that EnsureRequirements will be invoked before RemoveRedundantSorts to avoid IllegalArgumentException when instantiating PartitioningCollection.

Why are the changes needed?

RemoveRedundantSorts rule uses SparkPlan's outputPartitioning to check whether a sort node is redundant. Currently, it is added before EnsureRequirements. Since PartitioningCollection requires left and right partitioning to have the same number of partitions, which is not necessarily true before applying EnsureRequirements, the rule can fail with the following exception:

IllegalArgumentException: requirement failed: PartitioningCollection requires all of its partitionings have the same numPartitions.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit test

@github-actions github-actions bot added the SQL label Nov 13, 2020
@SparkQA
Copy link

SparkQA commented Nov 14, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35688/

@SparkQA
Copy link

SparkQA commented Nov 14, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35688/

@SparkQA
Copy link

SparkQA commented Nov 14, 2020

Test build #131085 has finished for PR 30373 at commit e6b0bb3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

EnsureRequirements,
RemoveRedundantSorts,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you leave some comments here about why we need to put this rule after EnsureRequirements?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for @maropu 's comment.

@@ -135,6 +135,26 @@ abstract class RemoveRedundantSortsSuiteBase
}
}
}

test("shuffled join with different left and right side partition numbers") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could you add the prefix: SPARK-33183:

(0 to 100).toDF("key").createOrReplaceTempView("t2")

// left side partitioning: RangePartitioning(key ASC, 2)
// right side partitioning: UnknownPartitioning(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add assert to check if the query below has the output partitions above?

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@allisonwang-db . Please split this PR.

  1. Please create a new JIRA for RemoveRedundantSorts. This looks worth.
  2. Version update PR can be a follow-up.

@allisonwang-db allisonwang-db changed the title [SPARK-33183][SQL][FOLLOW-UP] Adjust RemoveRedundantSorts rule order and update config version [SPARK-33472][SQL] Adjust RemoveRedundantSorts rule order Nov 18, 2020
@SparkQA
Copy link

SparkQA commented Nov 18, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35844/

@SparkQA
Copy link

SparkQA commented Nov 18, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35844/

@dongjoon-hyun
Copy link
Member

cc @cloud-fan

@SparkQA
Copy link

SparkQA commented Nov 18, 2020

Test build #131240 has finished for PR 30373 at commit fa6050a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Nov 18, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35859/

@SparkQA
Copy link

SparkQA commented Nov 18, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35859/

@SparkQA
Copy link

SparkQA commented Nov 18, 2020

Test build #131256 has finished for PR 30373 at commit fa6050a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Nov 18, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35887/

@SparkQA
Copy link

SparkQA commented Nov 18, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35887/

@SparkQA
Copy link

SparkQA commented Nov 18, 2020

Test build #131282 has finished for PR 30373 at commit fa6050a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35923/

@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35923/

@SparkQA
Copy link

SparkQA commented Nov 19, 2020

Test build #131319 has finished for PR 30373 at commit 4e684df.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan closed this in a03c540 Nov 19, 2020
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan
Copy link
Contributor

@allisonwang-db can you create backport PRs for 2.4 and 3.0?

allisonwang-db added a commit to allisonwang-db/spark that referenced this pull request Nov 20, 2020
This PR switched the order for the rule `RemoveRedundantSorts` and `EnsureRequirements` so that `EnsureRequirements` will be invoked before `RemoveRedundantSorts` to avoid IllegalArgumentException when instantiating PartitioningCollection.

`RemoveRedundantSorts` rule uses SparkPlan's `outputPartitioning` to check whether a sort node is redundant. Currently, it is added before `EnsureRequirements`. Since `PartitioningCollection` requires left and right partitioning to have the same number of partitions, which is not necessarily true before applying `EnsureRequirements`, the rule can fail with the following exception:
```
IllegalArgumentException: requirement failed: PartitioningCollection requires all of its partitionings have the same numPartitions.
```

No

Unit test

Closes apache#30373 from allisonwang-db/sort-follow-up.

Authored-by: allisonwang-db <66282705+allisonwang-db@users.noreply.github.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit a03c540)
Signed-off-by: allisonwang-db <66282705+allisonwang-db@users.noreply.github.com>
allisonwang-db added a commit to allisonwang-db/spark that referenced this pull request Nov 20, 2020
This PR switched the order for the rule `RemoveRedundantSorts` and `EnsureRequirements` so that `EnsureRequirements` will be invoked before `RemoveRedundantSorts` to avoid IllegalArgumentException when instantiating PartitioningCollection.

`RemoveRedundantSorts` rule uses SparkPlan's `outputPartitioning` to check whether a sort node is redundant. Currently, it is added before `EnsureRequirements`. Since `PartitioningCollection` requires left and right partitioning to have the same number of partitions, which is not necessarily true before applying `EnsureRequirements`, the rule can fail with the following exception:
```
IllegalArgumentException: requirement failed: PartitioningCollection requires all of its partitionings have the same numPartitions.
```

No

Unit test

Closes apache#30373 from allisonwang-db/sort-follow-up.

Authored-by: allisonwang-db <66282705+allisonwang-db@users.noreply.github.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit a03c540)
Signed-off-by: allisonwang-db <66282705+allisonwang-db@users.noreply.github.com>
@dongjoon-hyun
Copy link
Member

Thank you, @allisonwang-db , @maropu , @cloud-fan !

dongjoon-hyun pushed a commit that referenced this pull request Nov 20, 2020
Backport #30373 for branch-3.0.

### What changes were proposed in this pull request?
This PR switched the order for the rule `RemoveRedundantSorts` and `EnsureRequirements` so that `EnsureRequirements` will be invoked before `RemoveRedundantSorts` to avoid IllegalArgumentException when instantiating PartitioningCollection.

### Why are the changes needed?
`RemoveRedundantSorts` rule uses SparkPlan's `outputPartitioning` to check whether a sort node is redundant. Currently, it is added before `EnsureRequirements`. Since `PartitioningCollection` requires left and right partitioning to have the same number of partitions, which is not necessarily true before applying `EnsureRequirements`, the rule can fail with the following exception:
```
IllegalArgumentException: requirement failed: PartitioningCollection requires all of its partitionings have the same numPartitions.
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test

Closes #30438 from allisonwang-db/spark-33472-3.0.

Authored-by: allisonwang-db <66282705+allisonwang-db@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request Nov 20, 2020
Backport #30373 for branch-2.4.

### What changes were proposed in this pull request?
This PR switched the order for the rule `RemoveRedundantSorts` and `EnsureRequirements` so that `EnsureRequirements` will be invoked before `RemoveRedundantSorts` to avoid IllegalArgumentException when instantiating PartitioningCollection.

### Why are the changes needed?
`RemoveRedundantSorts` rule uses SparkPlan's `outputPartitioning` to check whether a sort node is redundant. Currently, it is added before `EnsureRequirements`. Since `PartitioningCollection` requires left and right partitioning to have the same number of partitions, which is not necessarily true before applying `EnsureRequirements`, the rule can fail with the following exception:
```
IllegalArgumentException: requirement failed: PartitioningCollection requires all of its partitionings have the same numPartitions.
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test

Closes #30437 from allisonwang-db/spark-33472-2.4.

Authored-by: allisonwang-db <66282705+allisonwang-db@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
EnsureRequirements,
// `RemoveRedundantSorts` needs to be added before `EnsureRequirements` to guarantee the same
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before -> after ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I missed it. @allisonwang-db could you fix it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching it! Will create a fix.

dongjoon-hyun pushed a commit that referenced this pull request Dec 4, 2020
### What changes were proposed in this pull request?
This PR is a follow-up for #30373 that updates the comment for RemoveRedundantSorts in QueryExecution.

### Why are the changes needed?
To update an incorrect comment.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
N/A

Closes #30584 from allisonwang-db/spark-33472-followup.

Authored-by: allisonwang-db <66282705+allisonwang-db@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request Dec 4, 2020
### What changes were proposed in this pull request?
This PR is a follow-up for #30373 that updates the comment for RemoveRedundantSorts in QueryExecution.

### Why are the changes needed?
To update an incorrect comment.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
N/A

Closes #30584 from allisonwang-db/spark-33472-followup.

Authored-by: allisonwang-db <66282705+allisonwang-db@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 960d6af)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@allisonwang-db allisonwang-db deleted the sort-follow-up branch January 19, 2024 01:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants