[SPARK-39376][SQL] Hide duplicated columns in star expansion of subquery alias from NATURAL/USING JOIN #36763

Closed
wants to merge 1 commit into from

karenfeng
Contributor

### What changes were proposed in this pull request?

This is a follow-up to #31666, which introduced a bug: the qualified star expansion of a subquery alias over a NATURAL/USING join output duplicated columns.

### Why are the changes needed?

A star expansion should not output the duplicated columns that a NATURAL/USING join keeps hidden.

### Does this PR introduce _any_ user-facing change?

The query

```
val df1 = Seq((3, 8)).toDF("a", "b")
val df2 = Seq((8, 7)).toDF("b", "d")
val joinDF = df1.join(df2, "b")
joinDF.alias("r").select("r.*")
```

now outputs a single column `b` instead of two (duplicate) `b` columns.
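A minimal regression check for the example query, written as a standalone Scala program. It assumes `spark-sql` from a Spark build that already includes this fix is on the classpath; the names `Spark39376Check` and `starColumns` are hypothetical and are not part of Spark or of this PR's unit tests.

```
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, not part of Spark or this PR.
object Spark39376Check {
  // Returns the column names produced by expanding `r.*` over an aliased USING join.
  def starColumns(spark: SparkSession): Seq[String] = {
    import spark.implicits._
    val df1 = Seq((3, 8)).toDF("a", "b")
    val df2 = Seq((8, 7)).toDF("b", "d")
    val joinDF = df1.join(df2, "b") // USING join on the common column `b`
    joinDF.alias("r").select("r.*").columns.toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-39376-check")
      .getOrCreate()
    try {
      val cols = starColumns(spark)
      println(cols.mkString(", "))
      // With the fix, the join key `b` should appear exactly once.
      assert(cols.count(_ == "b") == 1, s"expected a single `b`, got: $cols")
    } finally spark.stop()
  }
}
```

The check only counts occurrences of `b` rather than pinning an exact column order, so it states no more than the behavior described in this PR.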

### How was this patch tested?

Unit tests.

Signed-off-by: Karen Feng <karen.feng@databricks.com>
github-actions bot added the SQL label Jun 3, 2022
cloud-fan pushed a commit that referenced this pull request Jun 6, 2022
…ery alias from NATURAL/USING JOIN

Closes #36763 from karenfeng/SPARK-39376.

Authored-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 18ca369)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan closed this in 18ca369 Jun 6, 2022
cloud-fan pushed a commit that referenced this pull request Jun 6, 2022
…ery alias from NATURAL/USING JOIN

Closes #36763 from karenfeng/SPARK-39376.

Authored-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Member

dongjoon-hyun left a comment

+1, LGTM.

sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
…ery alias from NATURAL/USING JOIN

Closes apache#36763 from karenfeng/SPARK-39376.

Authored-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
4 participants