Skip to content

Conversation

@dusantism-db
Copy link
Contributor

What changes were proposed in this pull request?

In this PR, CASE statement is refactored. Existing CaseStatement is split into two - SimpleCaseStatement and SearchedCaseStatement. SearchedCaseStatement retains the old behavior, while for SimpleCaseStatement a new logical and execution node are added - SimpleCaseStatement and SimpleCaseStatementExec.

Previously, a simple case statement would evaluate the case variable again for every WHEN clause in the CASE. This is both inefficient, and could produce unexpected behavior if the evaluation has a side effect. SimpleCaseStatementExec now caches the result of the evaluation, and compares the WHEN conditions to the cached Literal.

Why are the changes needed?

Previous iteration of simple CASE evaluated the expression multiple times.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Tests were added to SqlScriptingParserSuite, SqlScriptingInterpreterSuite and SqlScriptingExecutionNodeSuite.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Feb 20, 2025
* @param elseBody Body that is executed if none of the conditions are met, i.e. ELSE branch.
*/
case class CaseStatement(
case class SearchedCaseStatement(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: update the description comment, to be equivalent of "SIMPLE variant" as below. Also, we might include a small examples to illustrate which is which, I always get confused...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

override def getTreeIterator: Iterator[CompoundStatementExec] = treeIterator

override def reset(): Unit = {
state = CaseState.Condition
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

caseVariableExec.reset() should be added as well? I guess it's not important, but it will reset the isExecuted flag if it's ever important for anything

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, added it.

Comment on lines 692 to 694
Origin(sqlText = Some(conditionText),
startIndex = Some(0),
stopIndex = Some(conditionText.length - 1)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of this, but won't it screw-up the line number in the error message for example? Shall we add some test cases to the SqlScriptingExecutionSuite to cover exceptions and similar?

Copy link
Contributor Author

@dusantism-db dusantism-db Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have tests for these exceptions, they are in SqlScriptingInterpreterSuite.
You're right it will screw up line numbers, perhaps it would be better to copy the Origin from the original caseVariable expression. This will keep the proper line number, however would just point to one part of the equality expression.

For example, if we have this script:

BEGIN
  CASE 1
    WHEN NULL THEN
      SELECT 41;
    ELSE
      SELECT 43;
  END CASE;
END

The error we would have if we kept the Origin from case variable expression:
{LINE:3} [BOOLEAN_STATEMENT_WITH_EMPTY_ROW] Boolean statement 1 is invalid. Expected single row with a value of the BOOLEAN type, but got an empty row. SQLSTATE: 21000

The error with the hacked origin:
[BOOLEAN_STATEMENT_WITH_EMPTY_ROW] Boolean statement (1 = NULL) is invalid. Expected single row with a value of the BOOLEAN type, but got an empty row. SQLSTATE: 21000

Not sure which is better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could expand the hack to copy the just line number from caseVariableExec? That way we would have both the line number and the proper equality expression.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's an easy fix (and it looks like it is) I would definitely go with this approach (copying the line number to the hacked origin). If not, I think it's better to keep the line number and then improve the messaging later on as a follow-up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the line propagation - d06ab73

Copy link
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if CI passes

Copy link
Contributor

@davidm-db davidm-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, minor comment left regarding the exception messaging.

@cloud-fan
Copy link
Contributor

thanks, merging to master/4.0!

@cloud-fan cloud-fan closed this in 48fc0fb Feb 24, 2025
cloud-fan pushed a commit that referenced this pull request Feb 24, 2025
…iable only once

### What changes were proposed in this pull request?
In this PR, CASE statement is refactored. Existing `CaseStatement` is split into two - `SimpleCaseStatement` and `SearchedCaseStatement`. `SearchedCaseStatement` retains the old behavior, while for `SimpleCaseStatement` a new logical and execution node are added - `SimpleCaseStatement` and `SimpleCaseStatementExec`.

Previously, a simple case statement would evaluate the case variable again for every WHEN clause in the CASE. This is both inefficient, and could produce unexpected behavior if the evaluation has a side effect. `SimpleCaseStatementExec` now caches the result of the evaluation, and compares the WHEN conditions to the cached `Literal`.

### Why are the changes needed?
Previous iteration of simple CASE evaluated the expression multiple times.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tests were added to `SqlScriptingParserSuite`, `SqlScriptingInterpreterSuite` and `SqlScriptingExecutionNodeSuite`.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #50027 from dusantism-db/scripting-case-improvements-v2.

Authored-by: Dušan Tišma <dusan.tisma@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 48fc0fb)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 14, 2025
…iable only once

### What changes were proposed in this pull request?
In this PR, CASE statement is refactored. Existing `CaseStatement` is split into two - `SimpleCaseStatement` and `SearchedCaseStatement`. `SearchedCaseStatement` retains the old behavior, while for `SimpleCaseStatement` a new logical and execution node are added - `SimpleCaseStatement` and `SimpleCaseStatementExec`.

Previously, a simple case statement would evaluate the case variable again for every WHEN clause in the CASE. This is both inefficient, and could produce unexpected behavior if the evaluation has a side effect. `SimpleCaseStatementExec` now caches the result of the evaluation, and compares the WHEN conditions to the cached `Literal`.

### Why are the changes needed?
Previous iteration of simple CASE evaluated the expression multiple times.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tests were added to `SqlScriptingParserSuite`, `SqlScriptingInterpreterSuite` and `SqlScriptingExecutionNodeSuite`.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#50027 from dusantism-db/scripting-case-improvements-v2.

Authored-by: Dušan Tišma <dusan.tisma@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 8c70b3f)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants