[SPARK-30326][SQL] Raise exception if analyzer exceed max iterations #26977
Conversation
Force-pushed from fb95d12 to cfe5a5e.
@cloud-fan @dongjoon-hyun @HyukjinKwon
sql/core/src/test/scala/org/apache/spark/sql/AnalyzerRuleStrategySuite.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
If the analyzer hits max iterations and the plan is resolved, shall we log a warning or fail?
@cloud-fan Good catch! That is literally what the description says. I tried to fix it in 015e97218b60da907e49ad9bebb208a276a13354, and also added a corresponding test case in RuleExecutorSuite. That case fails (raises an exception) with the original code. What's your opinion? Please correct me if I'm wrong. Thanks!
@cloud-fan @maropu @viirya Would you please help review the latest change? Thanks!
I'm currently not sure that this feature is useful... If a logical plan has unresolved nodes in that case, I think Spark should throw an analysis exception in
@maropu I think the exception thrown by CheckAnalysis is confusing to end users. The unresolved plans are not caused by the user queries; instead, users need to increase the value of SQLConf.ANALYZER_MAX_ITERATIONS.
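For reference, raising that limit is a one-line config change. A sketch, assuming an active SparkSession named `spark`; `spark.sql.analyzer.maxIterations` is the key behind SQLConf.ANALYZER_MAX_ITERATIONS (introduced by SPARK-30138), but double-check the key for your Spark version:

```scala
// Sketch: bump the analyzer's max iterations when resolution needs more passes.
// Assumes an active SparkSession named `spark`.
spark.conf.set("spark.sql.analyzer.maxIterations", 200)
```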
In this case, shall we keep the existing behavior?
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
Sure, I've reverted the code to keep the existing behavior for this case. Thanks for the review! @gatorsmile
@cloud-fan @maropu @viirya Would you please help take a look? Thanks!
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisTest.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala
ok to test
Looks fine now. I'll leave this to other, more qualified reviewers. @cloud-fan @viirya
@maropu Thanks for the detailed review!
Test build #116303 has finished for PR 26977 at commit
Test build #116305 has finished for PR 26977 at commit
@cloud-fan @viirya @gatorsmile Would you please revisit the changes, thanks!
retest this please
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
LGTM except a comment about the message
Test build #118055 has finished for PR 26977 at commit
Force-pushed from 259d12d to c80773f.
Test build #118064 has finished for PR 26977 at commit
Test build #118084 has finished for PR 26977 at commit
val message = s"Max iterations (${iteration - 1}) reached for batch ${batch.name}"
if (Utils.isTesting) {
val message = s"Max iterations (${iteration - 1}) reached for batch ${batch.name}, " +
  s"increasing the value of '${SQLConf.ANALYZER_MAX_ITERATIONS.key}'."
Actually, this change is not right. The optimizer will also issue this message.
Oops, sorry.
I updated the logic in cdc4879: I extended the abstract class Strategy
to support setting a different hint config key in the Analyzer and the Optimizer.
Would you please take time to review? Thanks!
add to whitelist
val message = batch.strategy.maxIterationsSetting match {
  case setting: String if setting != null =>
    s"Max iterations (${iteration - 1}) reached for batch ${batch.name}, " +
      s"increasing the value of '${setting}'."
nit: please set '${setting}' to a larger value
@@ -155,8 +168,14 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] extends Logging {
if (iteration > batch.strategy.maxIterations) {
// Only log if this is a rule that is supposed to run more than once.
if (iteration != 2) {
val message = s"Max iterations (${iteration - 1}) reached for batch ${batch.name}"
if (Utils.isTesting) {
val message = batch.strategy.maxIterationsSetting match {
nit:
val endingMsg = if (batch.strategy.maxIterationsSetting == null) {
  "."
} else {
  s", please set '${batch.strategy.maxIterationsSetting}' to a larger value"
}
val message = s"Max iterations (${iteration - 1}) reached for batch ${batch.name}$endingMsg"
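The suggested message construction can be checked as a tiny self-contained sketch. The helper name and standalone shape here are mine; the real code lives inline inside RuleExecutor:

```scala
// Standalone sketch of the message construction suggested in the nit above.
// `maxIterationsSetting` is null when no config key applies (e.g. optimizer batches).
def maxIterMessage(iteration: Int, batchName: String, maxIterationsSetting: String): String = {
  val endingMsg =
    if (maxIterationsSetting == null) "."
    else s", please set '$maxIterationsSetting' to a larger value."
  s"Max iterations (${iteration - 1}) reached for batch $batchName$endingMsg"
}
```

With a null setting the message simply ends with a period; with a config key it tells the user which knob to turn.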
val conf = new SQLConf().copy(SQLConf.ANALYZER_MAX_ITERATIONS -> maxIterations)
val testAnalyzer = new Analyzer(
  new SessionCatalog(new InMemoryCatalog, FunctionRegistry.builtin, conf),
  new SQLConf().copy(SQLConf.ANALYZER_MAX_ITERATIONS -> maxIterations))
nit: just pass conf
@cloud-fan Thanks for the review! bff75bc was to address the comments.
Test build #118108 has finished for PR 26977 at commit
Test build #118112 has finished for PR 26977 at commit
Test build #118130 has finished for PR 26977 at commit
Test build #118116 has finished for PR 26977 at commit
retest this please
Test build #118141 has finished for PR 26977 at commit
This needs to be in 3.0. In Spark 2.4 the analyzer max iterations was controlled by a different config. With this PR, users get a clear error message telling them to set the new config. Thanks, merging to master/3.0!
Thank you all!!
Closes #26977 from Eric5553/EnhanceMaxIterations.
Authored-by: Eric Wu <492960551@qq.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit b2011a2)
What changes were proposed in this pull request?
Enhance the RuleExecutor strategy to take different actions when exceeding max iterations, and raise an exception if the analyzer exceeds max iterations.
Why are the changes needed?
Currently, both the analyzer and the optimizer just log a warning message if rule execution exceeds max iterations. They should behave differently: the analyzer should raise an exception to indicate that the plan is not fixed after max iterations, while the optimizer should just log a warning and keep the current plan. This is more feasible after SPARK-30138 was introduced.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Add test in AnalysisSuite
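The split in behavior described above can be sketched in a self-contained way. This is a simplified illustration, not Spark's actual code: `FixedPoint` here is a stand-in for the strategy in catalyst's RuleExecutor, and `runToFixedPoint` compresses the batch-execution loop:

```scala
// Illustrative stand-in for the PR's strategy: `errorOnExceed` is true for the
// analyzer (fail hard) and false for the optimizer (warn and keep the plan);
// `maxIterationsSetting` names the config to suggest in the message, if any.
case class FixedPoint(
    maxIterations: Int,
    errorOnExceed: Boolean = false,
    maxIterationsSetting: String = null)

// Run `step` until it reports no change or the iteration budget is exhausted.
def runToFixedPoint(strategy: FixedPoint, batchName: String)(step: () => Boolean): Unit = {
  var iteration = 1
  var changed = step()
  while (changed) {
    iteration += 1
    if (iteration > strategy.maxIterations) {
      val endingMsg =
        if (strategy.maxIterationsSetting == null) "."
        else s", please set '${strategy.maxIterationsSetting}' to a larger value."
      val message = s"Max iterations (${iteration - 1}) reached for batch $batchName$endingMsg"
      if (strategy.errorOnExceed) throw new RuntimeException(message) // analyzer-style: fail
      println(s"WARN: $message") // optimizer-style: warn and keep the current plan
      return
    }
    changed = step()
  }
}
```

With a step that always reports a change, an analyzer-style strategy throws with the config hint in the message, while an optimizer-style strategy only logs and returns normally.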