[SPARK-43780][SQL] Support correlated references in join predicates for scalar and lateral subqueries #41301

Closed
agubichev wants to merge 9 commits

@agubichev agubichev commented May 24, 2023

What changes were proposed in this pull request?

This PR adds support for subqueries that contain joins whose join predicates include correlated references, e.g.

select * from t0 join lateral (select * from t1 join t2 on t1a = t2a and t1a = t0a);

(full example in https://issues.apache.org/jira/browse/SPARK-43780)

Currently we only handle scalar and lateral subqueries.
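
To make the semantics of the query above concrete, here is a minimal pure-Python sketch. It is not Spark's decorrelation algorithm: the toy tables, the function names `lateral_join` and `decorrelated_join`, and the row-as-dict representation are all assumptions made for illustration. It only shows that, for an inner join with equality predicates, evaluating the subquery once per outer row gives the same rows as a plan without outer references.

```python
# Hypothetical toy data; column names follow the query above.
t0 = [{"t0a": 1}, {"t0a": 2}, {"t0a": 3}]
t1 = [{"t1a": 1}, {"t1a": 2}, {"t1a": 2}]
t2 = [{"t2a": 2}, {"t2a": 3}]


def lateral_join(t0, t1, t2):
    """Evaluate the lateral subquery once per outer row (nested-loop semantics)."""
    out = []
    for outer in t0:
        for r1 in t1:
            for r2 in t2:
                # Inner join predicate: t1a = t2a AND t1a = t0a.
                # The second conjunct is the correlated reference to the outer row.
                if r1["t1a"] == r2["t2a"] and r1["t1a"] == outer["t0a"]:
                    out.append({**outer, **r1, **r2})
    return out


def decorrelated_join(t0, t1, t2):
    """Illustrative equivalent without outer references (not Spark's actual rewrite)."""
    # Step 1: join t1 and t2 on the uncorrelated conjunct only.
    inner = [{**r1, **r2} for r1 in t1 for r2 in t2 if r1["t1a"] == r2["t2a"]]
    # Step 2: apply the formerly correlated conjunct as an ordinary join condition.
    return [
        {**outer, **row}
        for outer in t0
        for row in inner
        if row["t1a"] == outer["t0a"]
    ]


rows = lateral_join(t0, t1, t2)
for row in rows:
    print(row)
```

With this toy data only the outer row with `t0a = 2` finds matches, and both formulations return the same rows. Pulling the correlated conjunct out of the join this way is straightforward only for inner joins; outer joins need more care, which this sketch does not cover.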

Why are the changes needed?

This is valid SQL that Spark SQL does not yet support.

Does this PR introduce any user-facing change?

Yes, previously unsupported queries become supported.

How was this patch tested?

Query and unit tests

@github-actions github-actions bot added the SQL label May 24, 2023

@jchen5 jchen5 left a comment

Thanks! This looks good overall, added some comments inline.

@jchen5

jchen5 commented Jun 5, 2023

CC @cloud-fan @allisonwang-db

@agubichev agubichev requested a review from jchen5 July 18, 2023 16:41

@jchen5 jchen5 left a comment

Looks good! @cloud-fan can you also take a look?

@allisonwang-db allisonwang-db left a comment

Thanks for adding this! Can we mention in the PR title or comment that this currently only works for scalar and lateral subqueries?

@agubichev agubichev changed the title from "[SPARK-43780][SQL] Support correlated references in join predicates" to "[SPARK-43780][SQL] Support correlated references in join predicates for scalar and lateral subqueries" Aug 10, 2023
@allisonwang-db

cc @cloud-fan

@cloud-fan

The failure is unrelated, thanks, merging to master!

@cloud-fan cloud-fan closed this in 420e687 Aug 15, 2023
.internal()
.doc("Decorrelate scalar and lateral subqueries with correlated references in join " +
"predicates. This configuration is only effective when " +
"'${DECORRELATE_INNER_QUERY_ENABLED.key}' is true.")

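Missed string interpolation: the string literal containing `${DECORRELATE_INNER_QUERY_ENABLED.key}` has no `s` prefix, so the key is not substituted. Here is the PR with the fix: #42607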

valentinp17 pushed a commit to valentinp17/spark that referenced this pull request Aug 24, 2023

Closes apache#41301 from agubichev/spark-43780-corr-predicate.

Authored-by: Andrey Gubichev <andrey.gubichev@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
ragnarok56 pushed a commit to ragnarok56/spark that referenced this pull request Mar 2, 2024