Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48197][SQL] Avoid assert error for invalid lambda function #46475

Closed
wants to merge 1 commit into from

Conversation

cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

ExpressionBuilder asserts all its input expressions to be resolved during lookup, which is not true as the analyzer rule ResolveFunctions can trigger function lookup even if the input expression contains unresolved lambda functions.

This PR updates that assert to check non-lambda inputs only, and fail earlier if the input contains lambda functions. In the future, if we use ExpressionBuilder to register higher-order functions, we can relax it.

Why are the changes needed?

better error message

Does this PR introduce any user-facing change?

no, only changes error message

How was this patch tested?

new test

Was this patch authored or co-authored using generative AI tooling?

no

@github-actions github-actions bot added the SQL label May 8, 2024
@dongjoon-hyun
Copy link
Member

Is the UT failure relevant, @cloud-fan ?

[info] *** 1 TEST FAILED ***
[error] Failed: Total 10583, Failed 1, Errors 0, Passed 10582, Ignored 29
[error] Failed tests:
[error] 	org.apache.spark.sql.execution.python.PythonStreamingDataSourceSuite

Copy link
Contributor

@allisonwang-db allisonwang-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

throw new AnalysisException(
errorClass = "INVALID_LAMBDA_FUNCTION_CALL.NON_HIGHER_ORDER_FUNCTION",
messageParameters = Map(
"class" -> builder.getClass.getCanonicalName))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious is there a way to use org.apache.spark.sql.catalyst.expressions.Ceil instead of
"class" : "org.apache.spark.sql.catalyst.expressions.CeilExpressionBuilder$"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, a builder can even create different expressions based on the input data types.

@cloud-fan
Copy link
Contributor Author

thanks for review, merging to master/3.5!

@cloud-fan cloud-fan closed this in 7e79e91 May 9, 2024
cloud-fan added a commit that referenced this pull request May 9, 2024
### What changes were proposed in this pull request?

`ExpressionBuilder` asserts all its input expressions to be resolved during lookup, which is not true as the analyzer rule `ResolveFunctions` can trigger function lookup even if the input expression contains unresolved lambda functions.

This PR updates that assert to check non-lambda inputs only, and fail earlier if the input contains lambda functions. In the future, if we use `ExpressionBuilder` to register higher-order functions, we can relax it.

### Why are the changes needed?

better error message

### Does this PR introduce _any_ user-facing change?

no, only changes error message

### How was this patch tested?

new test

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #46475 from cloud-fan/minor.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 7e79e91)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this seems to break branch-3.5. Could you take a look at this, @cloud-fan and @allisonwang-db ?

[info] - ansi/higher-order-functions.sql *** FAILED *** (782 milliseconds)
[info]   ansi/higher-order-functions.sql
[info]   Expected "...ORDER_FUNCTION",
[info]     "[sqlState" : "42K0D",
[info]     "]messageParameters" :...", but got "...ORDER_FUNCTION",
[info]     "[]messageParameters" :..." Result did not match for query #2
[info]   select ceil(x -> x) as v (SQLQueryTestSuite.scala:876)

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented May 9, 2024

It seems that there are four failures at Java 17 at least. Let me regenerate them.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun added a commit that referenced this pull request May 9, 2024
### What changes were proposed in this pull request?

This PR is a follow-up to regenerate golden files for branch-3.5
- #46475

### Why are the changes needed?

To recover branch-3.5 CI.
- https://github.com/apache/spark/actions/runs/9011670853/job/24786397001
```
[info] *** 4 TESTS FAILED ***
[error] Failed: Total 3036, Failed 4, Errors 0, Passed 3032, Ignored 3
[error] Failed tests:
[error] 	org.apache.spark.sql.SQLQueryTestSuite
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46514 from dongjoon-hyun/SPARK-48197.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
### What changes were proposed in this pull request?

`ExpressionBuilder` asserts all its input expressions to be resolved during lookup, which is not true as the analyzer rule `ResolveFunctions` can trigger function lookup even if the input expression contains unresolved lambda functions.

This PR updates that assert to check non-lambda inputs only, and fail earlier if the input contains lambda functions. In the future, if we use `ExpressionBuilder` to register higher-order functions, we can relax it.

### Why are the changes needed?

better error message

### Does this PR introduce _any_ user-facing change?

no, only changes error message

### How was this patch tested?

new test

### Was this patch authored or co-authored using generative AI tooling?

no

Closes apache#46475 from cloud-fan/minor.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants