[SPARK-48346][SQL] Support for IF ELSE statements in SQL scripts#47442
[SPARK-48346][SQL] Support for IF ELSE statements in SQL scripts#47442davidm-db wants to merge 10 commits intoapache:masterfrom
Conversation
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala
Outdated
Show resolved
Hide resolved
|
@cloud-fan are there any leftover open questions here? |
...alyst/src/main/scala/org/apache/spark/sql/catalyst/parser/SqlScriptingLogicalOperators.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/SqlScriptingParserSuite.scala
Outdated
Show resolved
Hide resolved
...alyst/src/main/scala/org/apache/spark/sql/catalyst/parser/SqlScriptingLogicalOperators.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
can't we create a new testing SingleStatement that we can customize the boolean result? I don't see the point of StatementBooleanEvaluator.
There was a problem hiding this comment.
how would I do that exactly? that would require logic for if/else execution node to be re-implemented for testing as well? because in the if/else iterator, in case we are visiting the condition, we are invoking the evaluator to evaluate the condition to boolean. if we were not to have the evaluator in tests, we would need to change the iteration logic for the tests as well?
we could create a statement whose logical plan is some simple projection of true/false value and that would be fine for now, but it seems like it would be too complicated to extend that logic for the loops for example, where we need to return true few times before returning false - which is quite simple with RepEval evaluator introduced here.
There was a problem hiding this comment.
I don't think we should introduce this heavy concept of evaluator for test only. To test loop, we can create a statement that selects a session variable, and in the loop body we modify the session variable.
There was a problem hiding this comment.
I do think the idea of evaluator is nice, but please do it when we have a proper design for debugging/auditing/profiling/... that needs an evaluator.
There was a problem hiding this comment.
I'm not sure if we understand each other here - StatementBooleanEvaluator is NOT used only in tests, it is used in IfElseStatementExec to evaluate the statement in condition to true or false, based on the output format and values.
For unit tests in SqlScriptingExecutionNodeSuite this is overridden with RepEval (for now, more will be added with while loops). RepEval is a simple and a bit hardcoded implementation for the sake of the tests.
If I were to change the tests to not use the evaluator, I don't know how would I change the evaluator used in IfElseStatementExec and have the same logic applied (i.e. checking the statement output format and values). This evaluator approach seems like the cleanest one - we have removed any kind of abstractions and we will introduce it in the future if needed, but I don't see a way how to do this simpler now.
Any suggestions or further comments on the topic? Everything that I can come up with is to actually execute the statement and take first row/first column value from the collected DataFrame and check if it's true or false - but this is literally what evaluator is doing at the moment (with a few format checks as well).
If we decide to keep the evaluator, it also makes tests simpler - we simply create a dummy evaluator (RepEval) instead of adding logic to the while loops that adds statements that work with session variables etc. Evaluator implementation for tests is literally 5 lines whereas any other logic would require adjusting on the level of each separate test (i.e. creating session vars and adding statements that set their values based on the number of loop iterations that we want to achieve).
There was a problem hiding this comment.
It might make the test code simpler, but it complicates the abstraction. Can't we inline the code of StatementBooleanEvaluator into IfElseStatementExec, and use real queries as conditions in the test?
There was a problem hiding this comment.
what to do then when we introduce while (immediately after this PR) and other loops (later on)? condition evaluation will share the same logic, it doesn't seem nice to inline it in all the places? we could implement a function for evaluation in NonLeafStatementExec - it doesn't seem like it belongs there logically, but we can do that... or does something else make more sense to you?
There was a problem hiding this comment.
@cloud-fan I did what I said in last comment - created evaluateBooleanCondition in NonLeafStatementExec to be reused by other exec nodes that will need to evaluate conditions (switch/case and loops) and removed StatementBooleanEvaluator completely. Sorted the tests for now with dummy TestIfElseCondition - will figure out the similar approach for loops, but this seems fine I would say.
Can you please check if this is conforming to what you had in mind?
There was a problem hiding this comment.
why do we need TestIfElse? Just to change the declared type of conditionalBodies and elseBody?
There was a problem hiding this comment.
good catch - it's simply a leftover from before before we simplified tests a lot... I've actually removed all Test* implementations of non leaf statements - we need to override only leaf statements for testing purposes. Abstractions are done properly, so real implementations of execution nodes from SqlScriptingExecutionNode can handle dummy implementations of leaf statements in tests. Should be really clean now, I think...
sql/core/src/test/scala/org/apache/spark/sql/scripting/SqlScriptingInterpreterSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/scripting/SqlScriptingInterpreterSuite.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
to avoid side effect of tests, let's wrap the test code with withTable to drop the table at the end of test.
There was a problem hiding this comment.
I did wrap up tests with withTable("t") - though not all of them, only the tests that work with the table t actually.
didn't I do it properly?
439fda3 to
014a915
Compare
|
thanks, merging to master! |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Thank you for adding this, @davidm-db and @cloud-fan .
What changes were proposed in this pull request?
This PR proposes introduction of IF/ELSE statement to SQL scripting language.
To evaluate conditions in IF or ELSE IF clauses, introduction of boolean statement evaluator is required as well.
Changes summary:
ifElseStatementgrammar rulevisitIfElseStatementrule visitorIfElseStatementlogical operatorIfElseStatementExecexecution node:ConditionandBodytrueis foundStatementBooleanEvaluatorimplementation to evaluate conditionsDataFrameEvaluator:StatementBooleanEvaluatortrueif it is single row, single column of boolean type with valuetrueSqlScriptingInterpreter- add logic to transformIfElseStatementtoIfElseStatementExecWhy are the changes needed?
We are gradually introducing SQL Scripting to Spark, and IF/ELSE is one of the basic control flow constructs in the SQL language. For more details, check JIRA item.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
New tests are introduced to all of the three scripting test suites:
SqlScriptingParserSuite,SqlScriptingExecutionNodeSuiteandSqlScriptingInterpreterSuite.Was this patch authored or co-authored using generative AI tooling?
No.