[SPARK-37716][SQL] Improve error messages when a LateralJoin has non-deterministic expressions #34987

Closed


allisonwang-db
Contributor

@allisonwang-db allisonwang-db commented Dec 22, 2021

What changes were proposed in this pull request?

This PR allows the lateral subquery of a LateralJoin node to contain non-deterministic expressions when the outer relation produces at most one row. It also improves the error messages raised when a lateral join contains non-deterministic expressions that are not yet supported.

Why are the changes needed?

SPARK-37199 changed the definition of PlanExpression's deterministic field: a plan expression is now deterministic only if both its children and the plan itself are deterministic. As a result, users can no longer use lateral joins with non-deterministic lateral subqueries. This PR improves the error messages and allows a special case in which the outer query produces at most one row.
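
To make the rule above concrete, here is a minimal sketch in plain Python rather than Spark's Scala. The classes `Expr`, `Plan`, and `PlanExpr` are hypothetical stand-ins, not Spark's actual classes; they only model the idea that a plan expression is deterministic when both its children and its plan are deterministic.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Expr:
    """Hypothetical leaf expression, e.g. a column reference or rand()."""
    name: str
    deterministic: bool = True


@dataclass
class Plan:
    """Hypothetical plan: deterministic only if every expression in it is."""
    expressions: List[Expr] = field(default_factory=list)

    @property
    def deterministic(self) -> bool:
        return all(e.deterministic for e in self.expressions)


@dataclass
class PlanExpr:
    """Hypothetical plan expression (e.g. a lateral subquery)."""
    plan: Plan
    children: List[Expr] = field(default_factory=list)

    @property
    def deterministic(self) -> bool:
        # Both the children AND the plan itself must be deterministic.
        return all(c.deterministic for c in self.children) and self.plan.deterministic


# A subquery whose plan contains rand() is non-deterministic,
# even though it has no non-deterministic children.
rand_subquery = PlanExpr(plan=Plan([Expr("rand()", deterministic=False)]))
print(rand_subquery.deterministic)
```

Under this model, a lateral subquery that calls something like `rand()` is reported as non-deterministic, which is why such queries began to fail analysis.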

Does this PR introduce any user-facing change?

Yes. The error messages are improved.
Before:

org.apache.spark.sql.AnalysisException: nondeterministic expressions are only allowed in
Project, Filter, Aggregate or Window

After:

Non-deterministic lateral subqueries are not supported when joining with outer relations that produce more than one row
-- Or
Lateral join condition cannot be non-deterministic:
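
The behavior described in this PR can be sketched as a small validation function. This is an illustrative model in plain Python, not the analyzer code from the patch: the function `check_lateral_join`, the `AnalysisError` class, the `outer_max_rows` parameter, and the order of the two checks are all assumptions. Only the two message strings are taken from the description above.

```python
from typing import Optional

# Message texts copied from the PR description.
MULTI_ROW_MSG = (
    "Non-deterministic lateral subqueries are not supported when "
    "joining with outer relations that produce more than one row"
)
CONDITION_MSG = "Lateral join condition cannot be non-deterministic:"


class AnalysisError(Exception):
    """Hypothetical stand-in for Spark's AnalysisException."""


def check_lateral_join(
    subquery_deterministic: bool,
    outer_max_rows: Optional[int],
    condition_deterministic: bool = True,
) -> None:
    """Raise AnalysisError for unsupported non-deterministic lateral joins.

    outer_max_rows is a hypothetical upper bound on the outer relation's
    row count; None means the bound is unknown.
    """
    outer_single_row = outer_max_rows is not None and outer_max_rows <= 1
    # Special case: a non-deterministic subquery is allowed only when the
    # outer relation is known to produce at most one row.
    if not subquery_deterministic and not outer_single_row:
        raise AnalysisError(MULTI_ROW_MSG)
    if not condition_deterministic:
        raise AnalysisError(CONDITION_MSG)


# Allowed: non-deterministic subquery, outer relation has at most one row.
check_lateral_join(subquery_deterministic=False, outer_max_rows=1)
```

Treating an unknown row bound as "may produce more than one row" is a conservative choice made for this sketch.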

How was this patch tested?

SQL query tests.

@github-actions github-actions bot added the SQL label Dec 22, 2021
@SparkQA

SparkQA commented Dec 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50968/

@SparkQA

SparkQA commented Dec 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50970/

@SparkQA

SparkQA commented Dec 22, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50968/

@SparkQA

SparkQA commented Dec 22, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50970/

@SparkQA

SparkQA commented Dec 22, 2021

Test build #146492 has finished for PR 34987 at commit 58c3863.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@allisonwang-db
Contributor Author

cc @cloud-fan

@SparkQA

SparkQA commented Dec 23, 2021

Test build #146494 has finished for PR 34987 at commit 3c14c21.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@allisonwang-db allisonwang-db changed the title from "[SPARK-37716][SQL] Allow LateralJoin node to host non-deterministic expressions" to "[SPARK-37716][SQL] Improve error messages when a LateralJoin has non-deterministic expressions" Dec 23, 2021
@SparkQA

SparkQA commented Dec 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/51016/

@SparkQA

SparkQA commented Dec 24, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/51016/

@SparkQA

SparkQA commented Dec 24, 2021

Test build #146541 has finished for PR 34987 at commit 4822a61.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

This PR doesn't touch PySpark at all, and the GitHub Actions failure is unrelated:

starting mypy annotations test...
annotations failed mypy checks:

@cloud-fan
Contributor

merging to master, thanks!

@cloud-fan cloud-fan closed this in 67c39a0 Dec 27, 2021