Skip to content

[SPARK-48346][SQL] Support for IF ELSE statements in SQL scripts#47442

Closed
davidm-db wants to merge 10 commits intoapache:masterfrom
davidm-db:sql_scripting_if_else
Closed

[SPARK-48346][SQL] Support for IF ELSE statements in SQL scripts#47442
davidm-db wants to merge 10 commits intoapache:masterfrom
davidm-db:sql_scripting_if_else

Conversation

@davidm-db
Copy link
Contributor

@davidm-db davidm-db commented Jul 22, 2024

What changes were proposed in this pull request?

This PR proposes introduction of IF/ELSE statement to SQL scripting language.
To evaluate conditions in IF or ELSE IF clauses, introduction of boolean statement evaluator is required as well.

Changes summary:

  • Grammar/parser changes:
    • ifElseStatement grammar rule
    • visitIfElseStatement rule visitor
    • IfElseStatement logical operator
  • IfElseStatementExec execution node:
    • Internal states - Condition and Body
    • Iterator implementation - iterate over conditions until the one that evaluates to true is found
    • Use StatementBooleanEvaluator implementation to evaluate conditions
  • DataFrameEvaluator:
    • Implementation of StatementBooleanEvaluator
    • Evaluates results to true if it is single row, single column of boolean type with value true
  • SqlScriptingInterpreter - add logic to transform IfElseStatement to IfElseStatementExec

Why are the changes needed?

We are gradually introducing SQL Scripting to Spark, and IF/ELSE is one of the basic control flow constructs in the SQL language. For more details, check JIRA item.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New tests are introduced to all of the three scripting test suites: SqlScriptingParserSuite, SqlScriptingExecutionNodeSuite and SqlScriptingInterpreterSuite.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Jul 22, 2024
@davidm-db davidm-db changed the title [WIP][SPARK-48346][SQL] Support for IF ELSE statements in SQL scripts [SPARK-48346][SQL] Support for IF ELSE statements in SQL scripts Jul 23, 2024
@davidm-db
Copy link
Contributor Author

@cloud-fan are there any leftover open questions here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we create a new testing SingleStatement that we can customize the boolean result? I don't see the point of StatementBooleanEvaluator.

Copy link
Contributor Author

@davidm-db davidm-db Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how would I do that exactly? that would require logic for if/else execution node to be re-implemented for testing as well? because in the if/else iterator, in case we are visiting the condition, we are invoking the evaluator to evaluate the condition to boolean. if we were not to have the evaluator in tests, we would need to change the iteration logic for the tests as well?

we could create a statement whose logical plan is some simple projection of true/false value and that would be fine for now, but it seems like it would be too complicated to extend that logic for the loops for example, where we need to return true few times before returning false - which is quite simple with RepEval evaluator introduced here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should introduce this heavy concept of evaluator for test only. To test loop, we can create a statement that selects a session variable, and in the loop body we modify the session variable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think the idea of evaluator is nice, but please do it when we have a proper design for debugging/auditing/profiling/... that needs an evaluator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we understand each other here - StatementBooleanEvaluator is NOT used only in tests, it is used in IfElseStatementExec to evaluate the statement in condition to true or false, based on the output format and values.

For unit tests in SqlScriptingExecutionNodeSuite this is overridden with RepEval (for now, more will be added with while loops). RepEval is a simple and a bit hardcoded implementation for the sake of the tests.

If I were to change the tests to not use the evaluator, I don't know how would I change the evaluator used in IfElseStatementExec and have the same logic applied (i.e. checking the statement output format and values). This evaluator approach seems like the cleanest one - we have removed any kind of abstractions and we will introduce it in the future if needed, but I don't see a way how to do this simpler now.

Any suggestions or further comments on the topic? Everything that I can come up with is to actually execute the statement and take first row/first column value from the collected DataFrame and check if it's true or false - but this is literally what evaluator is doing at the moment (with a few format checks as well).

If we decide to keep the evaluator, it also makes tests simpler - we simply create a dummy evaluator (RepEval) instead of adding logic to the while loops that adds statements that work with session variables etc. Evaluator implementation for tests is literally 5 lines whereas any other logic would require adjusting on the level of each separate test (i.e. creating session vars and adding statements that set their values based on the number of loop iterations that we want to achieve).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make the test code simpler, but it complicates the abstraction. Can't we inline the code of StatementBooleanEvaluator into IfElseStatementExec, and use real queries as conditions in the test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what to do then when we introduce while (immediately after this PR) and other loops (later on)? condition evaluation will share the same logic, it doesn't seem nice to inline it in all the places? we could implement a function for evaluation in NonLeafStatementExec - it doesn't seem like it belongs there logically, but we can do that... or does something else make more sense to you?

Copy link
Contributor Author

@davidm-db davidm-db Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan I did what I said in last comment - created evaluateBooleanCondition in NonLeafStatementExec to be reused by other exec nodes that will need to evaluate conditions (switch/case and loops) and removed StatementBooleanEvaluator completely. Sorted the tests for now with dummy TestIfElseCondition - will figure out the similar approach for loops, but this seems fine I would say.

Can you please check if this is conforming to what you had in mind?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need TestIfElse? Just to change the declared type of conditionalBodies and elseBody?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch - it's simply a leftover from before before we simplified tests a lot... I've actually removed all Test* implementations of non leaf statements - we need to override only leaf statements for testing purposes. Abstractions are done properly, so real implementations of execution nodes from SqlScriptingExecutionNode can handle dummy implementations of leaf statements in tests. Should be really clean now, I think...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to avoid side effect of tests, let's wrap the test code with withTable to drop the table at the end of test.

Copy link
Contributor Author

@davidm-db davidm-db Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did wrap up tests with withTable("t") - though not all of them, only the tests that work with the table t actually.
didn't I do it properly?

Copy link
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general

@davidm-db davidm-db force-pushed the sql_scripting_if_else branch from 439fda3 to 014a915 Compare August 3, 2024 01:29
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in f99291a Aug 5, 2024
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding this, @davidm-db and @cloud-fan .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants