[SPARK-43780][SQL] Support correlated references in join predicates for scalar and lateral subqueries #41301
Thanks! This looks good overall; I added some comments inline.
Resolved inline review comments on:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala
...lyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuerySuite.scala
sql/core/src/test/resources/sql-tests/results/join-lateral.sql.out
Looks good! @cloud-fan, can you also take a look?
Thanks for adding this! Can we mention in the PR title or comment that this currently only works for scalar and lateral subqueries?
Resolved inline review comments on:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
.../test/resources/sql-tests/results/subquery/scalar-subquery/scalar-subquery-predicate.sql.out
cc @cloud-fan
The failure is unrelated. Thanks, merging to master!
.internal()
.doc("Decorrelate scalar and lateral subqueries with correlated references in join " +
  "predicates. This configuration is only effective when " +
  "'${DECORRELATE_INNER_QUERY_ENABLED.key}' is true.")
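Missed string interpolation: `'${DECORRELATE_INNER_QUERY_ENABLED.key}'` sits in a plain (non-`s`) string literal, so it is not interpolated. Here is the PR with the fix: #42607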
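To illustrate the interpolation issue raised in this review comment, here is a minimal Scala sketch contrasting a plain string literal with an `s`-interpolated one. `InterpolationDemo`, `DecorrelateEnabled`, and the key value `example.decorrelate.enabled` are hypothetical stand-ins, not Spark's actual `SQLConf` entries.

```scala
object InterpolationDemo {
  // Hypothetical stand-in for a config entry such as
  // DECORRELATE_INNER_QUERY_ENABLED; the key value is made up.
  object DecorrelateEnabled {
    val key: String = "example.decorrelate.enabled"
  }

  // Plain string literal: `${...}` is kept verbatim, not evaluated.
  val plainDoc: String =
    "This configuration is only effective when " +
      "'${DecorrelateEnabled.key}' is true."

  // `s` interpolator: the expression inside `${...}` is evaluated.
  val interpolatedDoc: String =
    "This configuration is only effective when " +
      s"'${DecorrelateEnabled.key}' is true."

  def main(args: Array[String]): Unit = {
    println(plainDoc)        // ...when '${DecorrelateEnabled.key}' is true.
    println(interpolatedDoc) // ...when 'example.decorrelate.enabled' is true.
  }
}
```

Only the fragment containing `${...}` needs the `s` prefix; the other concatenated literals have nothing to interpolate.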
…or scalar and lateral subqueries ### What changes were proposed in this pull request? This PR adds support to subqueries that involve joins with correlated references in join predicates, e.g. ``` select * from t0 join lateral (select * from t1 join t2 on t1a = t2a and t1a = t0a); ``` (full example in https://issues.apache.org/jira/browse/SPARK-43780) Currently we only handle scalar and lateral subqueries. ### Why are the changes needed? This is a valid SQL that is not yet supported by Spark SQL. ### Does this PR introduce _any_ user-facing change? Yes, previously unsupported queries become supported. ### How was this patch tested? Query and unit tests Closes apache#41301 from agubichev/spark-43780-corr-predicate. Authored-by: Andrey Gubichev <andrey.gubichev@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
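This PR adds support for subqueries that contain joins with correlated references in their join predicates, e.g. `select * from t0 join lateral (select * from t1 join t2 on t1a = t2a and t1a = t0a);`, where `t0a` is a column of the outer table `t0`.
(Full example in https://issues.apache.org/jira/browse/SPARK-43780.)
Currently, only scalar and lateral subqueries are handled.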
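For a query such as `select * from t0 join lateral (select * from t1 join t2 on t1a = t2a and t1a = t0a)`, where both joins are inner joins, the correlated predicate `t1a = t0a` can be pulled out of the subquery and applied as the condition of an ordinary join with `t0`. The Scala sketch below models this on in-memory collections and checks that both forms return the same rows. The case classes, integer column types, sample data, and function names are hypothetical, and the sketch does not reflect how `DecorrelateInnerQuery` actually implements the rewrite.

```scala
// Toy model of decorrelating a lateral subquery whose inner join condition
// contains a correlated reference (t1a = t0a). Tables are in-memory Seqs of
// hypothetical case classes; this is not Spark's DecorrelateInnerQuery.
object LateralDecorrelationSketch {
  final case class T0(t0a: Int, t0b: Int)
  final case class T1(t1a: Int, t1b: Int)
  final case class T2(t2a: Int, t2b: Int)

  type Row = (T0, T1, T2)

  // Correlated evaluation: for every outer row o, run the subquery
  // `t1 join t2 on t1a = t2a and t1a = o.t0a` (nested-loop semantics).
  def correlated(t0: Seq[T0], t1: Seq[T1], t2: Seq[T2]): Seq[Row] =
    for {
      o <- t0
      a <- t1
      b <- t2
      if a.t1a == b.t2a && a.t1a == o.t0a
    } yield (o, a, b)

  // Decorrelated evaluation: run `t1 join t2 on t1a = t2a` once, without any
  // outer reference, then apply the pulled-up predicate t1a = t0a as the
  // condition of an ordinary inner join with t0.
  def decorrelated(t0: Seq[T0], t1: Seq[T1], t2: Seq[T2]): Seq[Row] = {
    val sub: Seq[(T1, T2)] =
      for {
        a <- t1
        b <- t2
        if a.t1a == b.t2a
      } yield (a, b)
    for {
      o <- t0
      (a, b) <- sub
      if a.t1a == o.t0a
    } yield (o, a, b)
  }

  // Multiset comparison: same rows with the same multiplicities, any order.
  def sameRows(x: Seq[Row], y: Seq[Row]): Boolean =
    x.diff(y).isEmpty && y.diff(x).isEmpty

  val sampleT0: Seq[T0] = Seq(T0(1, 10), T0(2, 20), T0(3, 30))
  val sampleT1: Seq[T1] = Seq(T1(1, 100), T1(2, 200), T1(2, 201))
  val sampleT2: Seq[T2] = Seq(T2(1, 1000), T2(2, 2000), T2(4, 4000))

  def main(args: Array[String]): Unit = {
    val c = correlated(sampleT0, sampleT1, sampleT2)
    val d = decorrelated(sampleT0, sampleT1, sampleT2)
    println(s"correlated:   $c")
    println(s"decorrelated: $d")
    println(s"same rows: ${sameRows(c, d)}")
  }
}
```

The equivalence shown here relies on both joins being inner joins; the sketch only covers that query shape.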
Why are the changes needed?
Such queries are valid SQL, but Spark SQL does not yet support them.
Does this PR introduce any user-facing change?
Yes, previously unsupported queries become supported.
How was this patch tested?
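SQL query tests (results in `join-lateral.sql.out` and `scalar-subquery-predicate.sql.out`) and unit tests (`DecorrelateInnerQuerySuite`).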