
[SPARK-30326][SQL] Raise exception if analyzer exceed max iterations #26977

Closed
wants to merge 11 commits

Conversation

Eric5553
Contributor

@Eric5553 Eric5553 commented Dec 22, 2019

What changes were proposed in this pull request?

Enhance the RuleExecutor strategy to take different actions when rule execution exceeds the max iterations, and raise an exception if the analyzer exceeds its max iterations.

Why are the changes needed?

Currently, both the analyzer and the optimizer only log a warning message if rule execution exceeds the max iterations, but they should behave differently. The analyzer should raise an exception to indicate that the plan is not fixed after the max iterations, while the optimizer should just log a warning and keep the current plan. This is more feasible after SPARK-30138 was introduced.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Add test in AnalysisSuite
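The proposed behavior can be sketched with a toy fixed-point executor (simplified types, not Spark's actual classes; `errorOnExceed` and `maxIterationsSetting` mirror the fields this PR adds to `FixedPoint`, and the `Int` "plan" and batch name are illustrative only):

```scala
// Toy sketch: a fixed-point strategy that carries per-executor failure behavior.
// The analyzer would use errorOnExceed = true, the optimizer false.
case class FixedPoint(
    maxIterations: Int,
    errorOnExceed: Boolean = false,      // throw instead of warn when the budget is exhausted
    maxIterationsSetting: String = null) // config key to suggest in the message, if any

def execute(start: Int, step: Int => Int, strategy: FixedPoint): Int = {
  var plan = start
  var iteration = 1
  var continue = true
  while (continue) {
    val next = step(plan)
    if (next == plan) {
      continue = false // fixed point reached, stop quietly
    } else if (iteration >= strategy.maxIterations) {
      val endingMsg =
        if (strategy.maxIterationsSetting == null) "."
        else s", please set '${strategy.maxIterationsSetting}' to a larger value."
      val message = s"Max iterations ($iteration) reached for batch Resolution$endingMsg"
      if (strategy.errorOnExceed) throw new RuntimeException(message)
      else Console.err.println(s"WARN: $message") // optimizer keeps the current plan
      plan = next
      continue = false
    } else {
      plan = next
      iteration += 1
    }
  }
  plan
}
```

With this shape, only executors that opt in via `errorOnExceed` fail hard; everyone else keeps the old warn-and-continue behavior.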

@Eric5553
Contributor Author

@cloud-fan @dongjoon-hyun @HyukjinKwon
Would you please help review, thanks!

@cloud-fan
Contributor

cloud-fan commented Dec 23, 2019

If the analyzer hits max iteration, and the plan is resolved, shall we log warning or fail?

@Eric5553
Contributor Author

@cloud-fan Good catch!

IMO, the config description ("The max number of iterations the analyzer runs.") implies we should iterate the plan up to maxIterations inclusively, with no error or warning at the last iteration. But the current code raises an exception or logs a warning at the max iteration even when the plan is already fixed. This applies to both the analyzer and the optimizer.

I tried to fix it in 015e97218b60da907e49ad9bebb208a276a13354 and added a corresponding test case in RuleExecutorSuite; the case fails (raises an exception) with the original code.

What's your opinion? Please correct me if I'm wrong. Thanks!
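The inclusive-iteration point above can be illustrated with a toy convergence check (`Int` "plans" and the helper name are hypothetical, not Spark code):

```scala
// With maxIterations = n, the executor should be allowed to run the batch n
// times, and only report non-convergence if the plan is still changing after
// the n-th run; a plan that stabilizes exactly on the last allowed run is fine.
def reachesFixedPoint(start: Int, step: Int => Int, maxIterations: Int): Boolean = {
  var plan = start
  var changed = true
  var i = 0
  while (changed && i < maxIterations) {
    val next = step(plan)
    changed = next != plan
    plan = next
    i += 1
  }
  !changed // true iff the plan converged within the iteration budget
}
```

For example, a plan that needs three runs to be confirmed stable converges with a budget of 3 but is (correctly) flagged with a budget of 2.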

@Eric5553
Contributor Author

@cloud-fan @maropu @viirya Would you please help review the latest change? Thanks!

@maropu
Member

maropu commented Dec 27, 2019

Analyzer should raise exception to indicates the plan is not fixed after max iterations

I'm currently not sure that this feature is useful... If a logical plan has unresolved nodes for that case, I think Spark should throw an analysis exception in CheckAnalysis. Do you mean that is not enough?

@gatorsmile
Member

@maropu I think the exception thrown by CheckAnalysis is confusing to end users. The unresolved plans are not caused by the user queries; instead, users need to increase the value of SQLConf.ANALYZER_MAX_ITERATIONS.
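For reference, the user-side fix would be to raise the analyzer's iteration budget via the key behind SQLConf.ANALYZER_MAX_ITERATIONS (config fragment, assuming a live SparkSession named `spark`; the value 200 is arbitrary):

```scala
// Hypothetical usage: raise the analyzer's iteration limit.
// "spark.sql.analyzer.maxIterations" is the key behind SQLConf.ANALYZER_MAX_ITERATIONS.
spark.conf.set("spark.sql.analyzer.maxIterations", "200")
```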

@gatorsmile
Member

If the analyzer hits max iteration, and the plan is resolved, shall we log warning or fail?

in this case, keep the existing behavior?

@Eric5553
Contributor Author

If the analyzer hits max iteration, and the plan is resolved, shall we log warning or fail?

in this case, keep the existing behavior?

Sure, I've reverted the code to keep existing behavior for this case. Thanks for review! @gatorsmile

@Eric5553
Contributor Author

Eric5553 commented Jan 3, 2020

@cloud-fan @maropu @viirya Would you please help take a look? Thanks!

@maropu
Member

maropu commented Jan 8, 2020

ok to test

@maropu
Member

maropu commented Jan 8, 2020

Looks fine now. I'll leave this to the other more qualified reviewers. @cloud-fan @viirya

@Eric5553
Contributor Author

Eric5553 commented Jan 8, 2020

@maropu Thanks for the detailed review!

@SparkQA

SparkQA commented Jan 8, 2020

Test build #116303 has finished for PR 26977 at commit ab82c41.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 8, 2020

Test build #116305 has finished for PR 26977 at commit 259d12d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Eric5553
Contributor Author

@cloud-fan @viirya @gatorsmile Would you please revisit the changes, thanks!

@gatorsmile
Member

retest this please

@gatorsmile
Member

LGTM except a comment about the message

@gatorsmile
Member

cc @cloud-fan @maryannxue

@SparkQA

SparkQA commented Feb 8, 2020

Test build #118055 has finished for PR 26977 at commit 259d12d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 8, 2020

Test build #118064 has finished for PR 26977 at commit c80773f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 9, 2020

Test build #118084 has finished for PR 26977 at commit 177adc8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val message = s"Max iterations (${iteration - 1}) reached for batch ${batch.name}"
if (Utils.isTesting) {
  val message = s"Max iterations (${iteration - 1}) reached for batch ${batch.name}, " +
    s"increasing the value of '${SQLConf.ANALYZER_MAX_ITERATIONS.key}'."
Member

Actually, this change is not right. The optimizer will also issue this message.

Contributor Author

Oops, sorry.
I updated the logic in cdc4879: extended the abstract class Strategy to support setting a different hint key for the Analyzer and the Optimizer.
Would you please take time to review? Thanks!

@gatorsmile
Member

add to whitelist

val message = batch.strategy.maxIterationsSetting match {
  case setting: String if setting != null =>
    s"Max iterations (${iteration - 1}) reached for batch ${batch.name}, " +
      s"increasing the value of '${setting}'."
Contributor

nit: please set '${setting}' to a larger value

@@ -155,8 +168,14 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] extends Logging {
if (iteration > batch.strategy.maxIterations) {
  // Only log if this is a rule that is supposed to run more than once.
  if (iteration != 2) {
    val message = s"Max iterations (${iteration - 1}) reached for batch ${batch.name}"
    if (Utils.isTesting) {
      val message = batch.strategy.maxIterationsSetting match {
Contributor

nit (the else-branch needs the s interpolator, and the in-scope name is batch.strategy.maxIterationsSetting):

val endingMsg = if (batch.strategy.maxIterationsSetting == null) {
  "."
} else {
  s", please set '${batch.strategy.maxIterationsSetting}' to a larger value"
}
val message = s"Max iterations (${iteration - 1}) reached for batch ${batch.name}$endingMsg"

val conf = new SQLConf().copy(SQLConf.ANALYZER_MAX_ITERATIONS -> maxIterations)
val testAnalyzer = new Analyzer(
  new SessionCatalog(new InMemoryCatalog, FunctionRegistry.builtin, conf),
  new SQLConf().copy(SQLConf.ANALYZER_MAX_ITERATIONS -> maxIterations))
Contributor

nit: just pass conf

@Eric5553
Contributor Author

@cloud-fan Thanks for the review! bff75bc was to address the comments.

@SparkQA

SparkQA commented Feb 10, 2020

Test build #118108 has finished for PR 26977 at commit cdc4879.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class FixedPoint(

@SparkQA

SparkQA commented Feb 10, 2020

Test build #118112 has finished for PR 26977 at commit cdc4879.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class FixedPoint(

@SparkQA

SparkQA commented Feb 10, 2020

Test build #118130 has finished for PR 26977 at commit bff75bc.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 10, 2020

Test build #118116 has finished for PR 26977 at commit bff75bc.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Eric5553
Contributor Author

retest this please

@SparkQA

SparkQA commented Feb 10, 2020

Test build #118141 has finished for PR 26977 at commit bff75bc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

This needs to be in 3.0. In Spark 2.4 the analyzer max iterations were controlled by SQLConf.OPTIMIZER_MAX_ITERATIONS. In 3.0 we added a new config for it, and it's possible that existing queries set SQLConf.OPTIMIZER_MAX_ITERATIONS to a large value and fail analysis after upgrading to Spark 3.0.

With this PR, they can get a clear error message to set the new config.

Thanks, merging to master/3.0!

@cloud-fan cloud-fan closed this in b2011a2 Feb 10, 2020
@Eric5553
Contributor Author

Thank you all !!

cloud-fan pushed a commit that referenced this pull request Feb 10, 2020
Closes #26977 from Eric5553/EnhanceMaxIterations.

Authored-by: Eric Wu <492960551@qq.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit b2011a2)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@Eric5553 Eric5553 deleted the EnhanceMaxIterations branch March 13, 2020 06:50
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020