Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-28863][SQL][FOLLOWUP][3.0] Make sure optimized plan will not be re-analyzed #30872

Closed
wants to merge 1 commit into from

Conversation

cloud-fan
Copy link
Contributor

backport #30777 to 3.0


What changes were proposed in this pull request?

It's a known issue that re-analyzing an optimized plan can lead to various issues. We made several attempts to avoid it from happening, but the current solution AlreadyOptimized is still not 100% safe, as people can inject catalyst rules to call analyzer directly.

This PR proposes a simpler and safer idea: we set the analyzed flag to true after optimization, and analyzer will skip processing plans whose analyzed flag is true.

Why are the changes needed?

make the code simpler and safer

Does this PR introduce any user-facing change?

no

How was this patch tested?

existing tests.

…analyzed

It's a known issue that re-analyzing an optimized plan can lead to various issues. We made several attempts to avoid it from happening, but the current solution `AlreadyOptimized` is still not 100% safe, as people can inject catalyst rules to call analyzer directly.

This PR proposes a simpler and safer idea: we set the `analyzed` flag to true after optimization, and analyzer will skip processing plans whose `analyzed` flag is true.

make the code simpler and safer

no

existing tests.

Closes apache#30777 from cloud-fan/ds.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit b4bea1a)
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
@cloud-fan cloud-fan changed the title [SPARK-28863][SQL][FOLLOWUP] Make sure optimized plan will not be re-analyzed [SPARK-28863][SQL][FOLLOWUP][3.0] Make sure optimized plan will not be re-analyzed Dec 21, 2020
@cloud-fan
Copy link
Contributor Author

cc @HyukjinKwon

@SparkQA
Copy link

SparkQA commented Dec 21, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37769/

@SparkQA
Copy link

SparkQA commented Dec 21, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37769/

@SparkQA
Copy link

SparkQA commented Dec 21, 2020

Test build #133170 has finished for PR 30872 at commit ccc1807.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Merged to branch-3.0.

HyukjinKwon pushed a commit that referenced this pull request Dec 22, 2020
…e re-analyzed

backport #30777 to 3.0

----------

### What changes were proposed in this pull request?

It's a known issue that re-analyzing an optimized plan can lead to various issues. We made several attempts to avoid it from happening, but the current solution `AlreadyOptimized` is still not 100% safe, as people can inject catalyst rules to call analyzer directly.

This PR proposes a simpler and safer idea: we set the `analyzed` flag to true after optimization, and analyzer will skip processing plans whose `analyzed` flag is true.

### Why are the changes needed?

make the code simpler and safer

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests.

Closes #30872 from cloud-fan/ds.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants