[SPARK-37716][SQL] Improve error messages when a LateralJoin has non-deterministic expressions #34987
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #146492 has finished for PR 34987 at commit
cc @cloud-fan |
Test build #146494 has finished for PR 34987 at commit
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #146541 has finished for PR 34987 at commit
This PR doesn't touch PySpark at all, and the GitHub Actions failure is unrelated.
merging to master, thanks! |
What changes were proposed in this pull request?
This PR allows the LateralJoin node's lateral subquery field to host non-deterministic expressions when the outer relation can produce at most one row. It also improves the error messages when a lateral join contains non-deterministic expressions that are not currently supported.
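The rule described above can be sketched as a toy model. This is only an illustration under stated assumptions, not Spark's actual analyzer code: `Plan`, `AnalysisError`, `check_lateral_join`, and the error text are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    """Toy stand-in for a logical plan (hypothetical, not Spark's class)."""
    name: str
    deterministic: bool = True
    max_rows: Optional[int] = None  # None means "no known upper bound"


class AnalysisError(Exception):
    """Toy stand-in for an analysis-time failure (hypothetical)."""


def check_lateral_join(outer: Plan, subquery: Plan) -> None:
    """Reject a non-deterministic lateral subquery unless the outer
    relation is known to produce at most one row."""
    if subquery.deterministic:
        return  # deterministic subqueries are always accepted
    if outer.max_rows is not None and outer.max_rows <= 1:
        return  # special case: outer side yields at most one row
    # Hypothetical message; the real wording is defined by the PR itself.
    raise AnalysisError(
        f"non-deterministic lateral subquery '{subquery.name}' requires "
        f"an outer relation with at most one row, got '{outer.name}'"
    )
```

In this sketch, a non-deterministic subquery paired with `Plan("one_row", max_rows=1)` passes the check, while the same subquery paired with an unbounded outer plan raises `AnalysisError`.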
Why are the changes needed?
SPARK-37199 changed the definition of PlanExpression's deterministic field: both the children and the plan itself have to be deterministic for the plan expression to be deterministic. As a result, users can no longer use a lateral join with a non-deterministic lateral subquery. This PR improves the error messages and allows the special case in which the outer query produces at most one row.
Does this PR introduce any user-facing change?
Yes. Improve error messages:
Before:
After:
How was this patch tested?
SQL query tests.