
[SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN #31666

Closed
wants to merge 44 commits

Conversation

karenfeng
Contributor

@karenfeng karenfeng commented Feb 26, 2021

What changes were proposed in this pull request?

Adds the duplicated common columns as hidden columns to the Project used to rewrite NATURAL/USING JOINs.

Why are the changes needed?

Allows users to resolve either side of the NATURAL/USING JOIN's common keys.
Previously, the user could only resolve the following columns:

| Join type | Left key columns | Right key columns |
|-----------|------------------|-------------------|
| Inner     | Yes              | No                |
| Left      | Yes              | No                |
| Right     | No               | Yes               |
| Outer     | No               | No                |

Does this PR introduce any user-facing change?

Yes. The user can now symmetrically resolve the common columns from a NATURAL/USING JOIN.

How was this patch tested?

SQL-side tests. The behavior matches PostgreSQL and MySQL.
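As an illustration of the target semantics (using SQLite rather than Spark, purely as a sketch; the PR states the behavior matches PostgreSQL and MySQL), both sides of a USING join's key column become addressable:

```python
import sqlite3

# Illustration of the target semantics using SQLite (not Spark): after a
# USING join, the common key resolves unqualified and via either side.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE nt1(k TEXT, v1 INTEGER)")
cur.execute("CREATE TABLE nt2(k TEXT, v2 INTEGER)")
cur.executemany("INSERT INTO nt1 VALUES (?, ?)", [("one", 1), ("two", 2)])
cur.executemany("INSERT INTO nt2 VALUES (?, ?)", [("one", 10), ("three", 30)])

# Resolve the unqualified key, the left-side key, and the right-side key.
rows = cur.execute(
    "SELECT k, nt1.k, nt2.k FROM nt1 JOIN nt2 USING (k)"
).fetchall()
print(rows)
```

Before this PR, the qualified right-side reference (`nt2.k` here) was not resolvable in Spark for an inner or left USING join.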

Signed-off-by: Karen Feng <karen.feng@databricks.com>
… column

Signed-off-by: Karen Feng <karen.feng@databricks.com>
…4527

Signed-off-by: Karen Feng <karen.feng@databricks.com>
@github-actions github-actions bot added the SQL label Feb 26, 2021
@SparkQA

SparkQA commented Feb 27, 2021

Test build #135521 has finished for PR 31666 at commit 2c261bb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • sealed trait PartitionSpec extends LeafExpression with Unevaluable
  • trait V2PartitionCommand extends Command
  • case class TruncateTable(table: LogicalPlan) extends Command
  • case class TruncatePartition(
  • case class TruncatePartitionExec(

Signed-off-by: Karen Feng <karen.feng@databricks.com>
@SparkQA

SparkQA commented Feb 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40108/

@SparkQA

SparkQA commented Feb 27, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40108/

@SparkQA

SparkQA commented Feb 27, 2021

Test build #135527 has finished for PR 31666 at commit 80beda8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Signed-off-by: Karen Feng <karen.feng@databricks.com>
@SparkQA

SparkQA commented Feb 27, 2021

Test build #135531 has finished for PR 31666 at commit e1719d3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…4527

Signed-off-by: Karen Feng <karen.feng@databricks.com>
@SparkQA

SparkQA commented Mar 2, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40256/

@SparkQA

SparkQA commented Mar 2, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40256/

@SparkQA

SparkQA commented Mar 3, 2021

Test build #135674 has finished for PR 31666 at commit 6fa70ba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class JavaModelSelectionViaRandomHyperparametersExample
  • class GangliaSink(
  • case class Limits[T: Numeric](x: T, y: T)
  • abstract class Generator[T: Numeric]
  • class ParamRandomBuilder extends ParamGridBuilder
  • class ParamRandomBuilder(ParamGridBuilder):
  • case class Product(child: Expression)
  • case class AnalyzeTables(
  • case class AnalyzeTablesCommand(

@@ -94,6 +94,8 @@ trait AnalysisHelper extends QueryPlan[LogicalPlan] { self: LogicalPlan =>
rule.applyOrElse(afterRuleOnChildren, identity[LogicalPlan])
}
}
newNode.copyTagsFrom(this)
Contributor Author

This exists in transformUp, but not in resolveOperatorsUp - was the difference intentional or unintentional? Without the tags, the metadata cannot be resolved properly (isMetadataCol is always false).

Contributor

I think it's a mistake.

lazy val childMetadataOutput = plan.children.flatMap(_.metadataOutput)
plan.expressions.flatMap(_.collect {
case a: Attribute if a.isMetadataCol => a
case a: Attribute if childMetadataOutput.exists(_.exprId == a.exprId) =>
Contributor Author

This occurs in the case that a column is resolved below the level at which it becomes labeled as metadata. For the NATURAL/USING JOIN, this occurs when the column is resolved at the level of the root table - it is only labeled as hidden when it is used as a key column in the join.

@cloud-fan
Contributor

This fix applies in SQL but does not apply in Scala; this seems to be related to the metadata column framework in the DSv2 API.

This is still true? I think your previous bug fix PR solved it. We can add some tests to verify it (even if it fails, we need to show people the behavior of the Scala API)

@karenfeng
Contributor Author

This fix applies in SQL but does not apply in Scala; this seems to be related to the metadata column framework in the DSv2 API.

This is still true? I think your previous bug fix PR solved it. We can add some tests to verify it (even if it fails, we need to show people the behavior of the Scala API)

Whoops, I forgot to change the PR description. This no longer holds. Thanks for the catch @cloud-fan!

testData3.as("testData3"), usingColumns = Seq("a"), joinType = "fullouter")
val dfQuery = joinDf.select(
$"a", $"testData2.a", $"testData2.b", $"testData3.a", $"testData3.b")
val dfQuery2 = joinDf.select(
Contributor Author

These demonstrate that the behavior now works in Scala.

@@ -3370,54 +3435,6 @@ class Analyzer(override val catalogManager: CatalogManager)
}
}

private def commonNaturalJoinProcessing(
Contributor

why do we move this method? It creates a lot of code diff and makes it harder to review.

Contributor Author

I can move it back - I just wasn't sure why it lived outside of this class, given that it's not shared.


override def metadataOutput: Seq[Attribute] = {
child.metadataOutput ++
getTagValue(hiddenOutputTag).getOrElse(Seq.empty[Attribute])
Contributor

It's unfortunate that we need to use TreeNodeTag to store the extra information in Project, but I don't have a better idea without changing the Project constructor.

Contributor Author

We could make this more generic by adding this LogicalPlan's metadataOutput, but that would complicate how we can add these hidden columns in AddMetadataColumns.
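The tag-based side channel discussed here can be sketched in plain Python (not Spark's actual classes, just a model of the pattern in the diff above): a node exposes its child's metadata output plus any hidden attributes stashed in a `TreeNodeTag`-like store.

```python
# Minimal Python model (not Spark) of the pattern in the diff above:
# a Project-like node exposes its child's metadata output plus hidden
# attributes stashed in a side-channel tag, mirroring
# `child.metadataOutput ++ getTagValue(hiddenOutputTag).getOrElse(Seq.empty)`.
class Node:
    def __init__(self, child=None):
        self.child = child
        self.tags = {}  # analogous to TreeNodeTag storage

    def metadata_output(self):
        inherited = self.child.metadata_output() if self.child else []
        return inherited + self.tags.get("hidden_output", [])

leaf = Node()
leaf.tags["hidden_output"] = ["t2.key"]
project = Node(child=leaf)
project.tags["hidden_output"] = ["t1.key"]
print(project.metadata_output())  # ['t2.key', 't1.key']
```

This also shows why tags must be copied when a node is rewritten (the `copyTagsFrom` point raised earlier): a rewritten node with an empty `tags` dict would silently drop its hidden output.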

Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Karen Feng <karen.feng@databricks.com>
@SparkQA

SparkQA commented Mar 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40359/

@SparkQA

SparkQA commented Apr 13, 2021

Test build #137253 has finished for PR 31666 at commit 9e62d7d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -957,6 +946,36 @@ class Analyzer(override val catalogManager: CatalogManager)
}
}
}

private def getMetadataAttributes(plan: LogicalPlan): Seq[Attribute] = {
lazy val childMetadataOutput = plan.children.flatMap(_.metadataOutput)
Contributor

nit: we can avoid building a new Seq frequently. The check can be
plan.children.exists(c => c.metadataOutput.exists(_.exprId == a.exprId))

Contributor

The same applies to hasMetadataCol.
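The reviewer's nit, modeled in plain Python (not Spark): a short-circuiting exists-style membership check avoids materializing the flattened metadata-output list on every call.

```python
# Two equivalent membership checks over per-child metadata outputs.
# The exists-style version short-circuits and builds no intermediate list.
children = [["m1", "m2"], ["m3"]]

def has_metadata_col_flatmap(attr):
    flattened = [m for c in children for m in c]  # builds a new list each call
    return attr in flattened

def has_metadata_col_exists(attr):
    return any(attr in c for c in children)       # no intermediate list

assert has_metadata_col_flatmap("m3") == has_metadata_col_exists("m3")
```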


/**
* Hidden columns are a type of metadata column that are candidates during qualified
* star expansions. They are propagated through Projects that have hidden children output,
Contributor

The comment needs update again.

* star expansions. They are propagated through Projects that have hidden children output,
* so that nested hidden output is not lost.
*/
val HIDDEN_COL_ATTR_KEY = "__hidden_col"
Contributor

The semantic is clear now, let's refine the naming.

We only have metadata column, and metadata column can be included in qualified star if required. We can just add a new property to metadata columns to indicate it.

The property name can be __support_qualified_star, and the helper class can be

implicit class MetadataColumnHelper(attr: Attribute) {
  def isMetadataCol: Boolean ...
  def supportQualifiedStar: Boolean ...
  def markAsSupportQualifiedStar: Attribute ...
}

Contributor

@cloud-fan cloud-fan left a comment

LGTM except for some minor comments

Signed-off-by: Karen Feng <karen.feng@databricks.com>
…4527

Signed-off-by: Karen Feng <karen.feng@databricks.com>
@SparkQA

SparkQA commented Apr 13, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41881/

@SparkQA

SparkQA commented Apr 13, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41881/

@SparkQA

SparkQA commented Apr 13, 2021

Test build #137301 has finished for PR 31666 at commit 8f70c2d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class WriteToDataSourceV2(
  • case class WriteToMicroBatchDataSource(

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 816f6dd Apr 14, 2021
cloud-fan pushed a commit that referenced this pull request Jun 6, 2022
…ery alias from NATURAL/USING JOIN

### What changes were proposed in this pull request?

Follows up on #31666, which introduced a bug where the qualified star expansion of a subquery alias containing a NATURAL/USING JOIN output duplicated columns.

### Why are the changes needed?

Duplicated, hidden columns should not be output from a star expansion.

### Does this PR introduce _any_ user-facing change?

The query

```
val df1 = Seq((3, 8)).toDF("a", "b")
val df2 = Seq((8, 7)).toDF("b", "d")
val joinDF = df1.join(df2, "b")
joinDF.alias("r").select("r.*")
```

Now outputs a single column `b`, instead of two (duplicate) columns for `b`.
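The fixed star expansion matches what standard SQL engines do for USING joins; a small SQLite sketch of the same shape (not Spark, table names mirror the DataFrames above):

```python
import sqlite3

# SQLite sketch of the fixed behavior: star expansion over a USING join
# emits the common column once, not twice.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE df1(a INTEGER, b INTEGER)")
cur.execute("CREATE TABLE df2(b INTEGER, d INTEGER)")
cur.execute("INSERT INTO df1 VALUES (3, 8)")
cur.execute("INSERT INTO df2 VALUES (8, 7)")

cur.execute("SELECT * FROM df1 JOIN df2 USING (b)")
cols = [c[0] for c in cur.description]  # join column 'b' appears once
rows = cur.fetchall()
print(cols, rows)
```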

### How was this patch tested?

UTs

Closes #36763 from karenfeng/SPARK-39376.

Authored-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 18ca369)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request Jun 6, 2022
…ery alias from NATURAL/USING JOIN
cloud-fan pushed a commit that referenced this pull request Jun 6, 2022
…ery alias from NATURAL/USING JOIN


-- !query
SELECT k FROM nt1 inner join nt2 using (k)


SELECT t.* FROM (SELECT k FROM nt1 inner join nt2 using (k)) as t
That used to return the same results, but now it returns the results of the query on line 302.


I should add that I haven't tested this exact scenario, but I have run into a very similar issue when I attempted to switch to spark 3.2.0. I'll open a bug report when I get a chance.

Contributor

Have you tried #36763 ? The issue should have been fixed.


I did not know about that, thank you. Now I just have to wait for EMR to support Spark 3.2.2.

cloud-fan added a commit that referenced this pull request Sep 7, 2022
### What changes were proposed in this pull request?

This PR fixes a regression caused by #32017.

In #32017, we tried to be more conservative and decided to not propagate metadata columns in certain operators, including `Project`. However, the decision was made only considering SQL API, not DataFrame API. In fact, it's very common to chain `Project` operators in DataFrame, e.g. `df.withColumn(...).withColumn(...)...`, and it's very inconvenient if metadata columns are not propagated through `Project`.

This PR makes 2 changes:
1. Project should propagate metadata columns
2. SubqueryAlias should only propagate metadata columns if the child is a leaf node or also a SubqueryAlias

The second change is needed to still forbid weird queries like `SELECT m from (SELECT a from t)`, which is the main motivation of #32017 .

After propagating metadata columns, a problem from #31666 is exposed: the natural join metadata columns may confuse the analyzer and lead to wrong analyzed plan. For example, `SELECT t1.value FROM t1 LEFT JOIN t2 USING (key) ORDER BY key`, how shall we resolve `ORDER BY key`? It should be resolved to `t1.key` via the rule `ResolveMissingReferences`, which is in the output of the left join. However, if `Project` can propagate metadata columns, `ORDER BY key` will be resolved to `t2.key`.

To solve this problem, this PR only allows qualified access for metadata columns of natural join. This has no breaking change, as people can only do qualified access for natural join metadata columns before, in the `Project` right after `Join`. This actually enables more use cases, as people can now access natural join metadata columns in ORDER BY. I've added a test for it.
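The resolution question above can be illustrated outside Spark; in SQLite (used here purely as a sketch, not Spark's analyzer), an unqualified `key` in ORDER BY over a left USING join resolves to the left side's key, mirroring what `ResolveMissingReferences` picks:

```python
import sqlite3

# Illustration (SQLite, not Spark): in t1 LEFT JOIN t2 USING (key),
# an unqualified 'key' in ORDER BY resolves to the left side's key.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t1(key INTEGER, value TEXT)")
cur.execute("CREATE TABLE t2(key INTEGER, other TEXT)")
cur.executemany("INSERT INTO t1 VALUES (?, ?)", [(2, "b"), (1, "a")])
cur.execute("INSERT INTO t2 VALUES (1, 'x')")

rows = cur.execute(
    "SELECT t1.value FROM t1 LEFT JOIN t2 USING (key) ORDER BY key"
).fetchall()
print(rows)  # sorted by t1.key: 'a' (key=1) before 'b' (key=2)
```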

### Why are the changes needed?

fix a regression

### Does this PR introduce _any_ user-facing change?

For SQL API, there is no change, as a `SubqueryAlias` always comes with a `Project` or `Aggregate`, so we still don't propagate metadata columns through a SELECT group.

For DataFrame API, the behavior becomes more lenient. The only breaking case is an operator that can propagate metadata columns followed by a `SubqueryAlias`, e.g. `df.filter(...).as("t").select("t.metadata_col")`. But this is a weird use case and I don't think we should support it in the first place.

### How was this patch tested?

new tests

Closes #37758 from cloud-fan/metadata.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan added a commit that referenced this pull request Sep 7, 2022
cloud-fan added a commit to cloud-fan/spark that referenced this pull request Sep 7, 2022
cloud-fan added a commit that referenced this pull request Sep 7, 2022
backport #37758 to 3.2

Closes #37818 from cloud-fan/backport.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
…ery alias from NATURAL/USING JOIN
sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
backport apache#37758 to 3.2
sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
backport apache#37758 to 3.2