
[SPARK-34923][SQL] Metadata output should be empty for more plans #32017

Closed
wants to merge 9 commits

Conversation

Contributor

@karenfeng karenfeng commented Mar 31, 2021

What changes were proposed in this pull request?

Changes the metadata propagation framework.

Previously, most LogicalPlans propagated their children's metadataOutput. This did not make sense in cases where the LogicalPlan did not even propagate its children's output.

I set the metadata output to Nil for plans that do not propagate their children's output. Notably, Project and View no longer have metadata output.
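As a rough illustration of the rule (a toy sketch, not the actual Catalyst classes or the exact patch), a node that computes its own output list simply keeps the empty default for metadata output:

```scala
// Toy model, not Spark's actual classes: plans that do not pass through their
// child's output also hide the child's metadata columns.
trait PlanLike {
  def output: Seq[String]
  def metadataOutput: Seq[String] = Nil // empty unless a node opts in
}

// A leaf scan exposes its data columns plus whatever metadata columns its source defines.
case class ScanLike(cols: Seq[String], meta: Seq[String]) extends PlanLike {
  def output: Seq[String] = cols
  override def metadataOutput: Seq[String] = meta
}

// A Project-like node computes its own output list, so it inherits the empty default.
case class ProjectLike(projectList: Seq[String], child: PlanLike) extends PlanLike {
  def output: Seq[String] = projectList
}
```

In this toy model, `ProjectLike(Seq("a"), ScanLike(Seq("a", "b"), Seq("m"))).metadataOutput` is empty, which mirrors why the subquery example below no longer resolves `m`.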

Why are the changes needed?

Previously, SELECT m from (SELECT a from tb) would output m if it were a metadata column. This did not make sense.

Does this PR introduce any user-facing change?

Yes. Now, SELECT m from (SELECT a from tb) fails with an AnalysisException.
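For illustration, a minimal sketch assuming a SparkSession `spark` and a table `tb` whose data source exposes a metadata column `m` (names taken from the example above):

```scala
// Illustrative only; `tb` and `m` are the example table and metadata column above.
import org.apache.spark.sql.AnalysisException

spark.sql("SELECT m FROM tb")   // still works: the metadata column is resolved in the same SELECT

try {
  spark.sql("SELECT m FROM (SELECT a FROM tb)")
} catch {
  case e: AnalysisException =>
    // After this change the inner SELECT hides `m`, so analysis fails here.
    println(e.getMessage)
}
```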

How was this patch tested?

Added unit tests. I did not cover all cases, as they are fairly extensive. However, the new tests cover major cases (and an existing test already covers Join).

Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Karen Feng <karen.feng@databricks.com>

SparkQA commented Mar 31, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41367/

@github-actions github-actions bot added the SQL label Mar 31, 2021

SparkQA commented Apr 1, 2021

Test build #136784 has finished for PR 32017 at commit 6c3dde2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -107,6 +109,8 @@ case class Generate(
child: LogicalPlan)
extends UnaryNode {

val unrequiredSet: Set[Int] = unrequiredChildIndex.toSet
Contributor

This doesn't seem to be needed.

@@ -270,6 +279,8 @@ case class Union(
}
}

override def metadataOutput: Seq[Attribute] = children.flatMap(_.metadataOutput)
Contributor

Not sure about Union. It only propagates the output of the first child, and ResolveMissingReferences doesn't support it either.

Contributor

Same for Intersect and Except.

@@ -466,6 +488,8 @@ case class View(

override def output: Seq[Attribute] = child.output

override def metadataOutput: Seq[Attribute] = child.output
Contributor

For boundary nodes like View and SubqueryAlias, I think we should not propagate, as the query under them should not change during analysis after they are resolved. ResolveMissingReferences skips SubqueryAlias as well.

@cloud-fan
Contributor

The new behavior makes more sense to me. cc @rdblue @brkyvz @viirya

Signed-off-by: Karen Feng <karen.feng@databricks.com>
Member

@viirya viirya left a comment


Conceptually this makes sense. I'm just a bit worried that we might forget to propagate metadataOutput from the children when adding a new node. It doesn't seem to be a well-known field in the logical plan.

Not a big problem, however.


SparkQA commented Apr 1, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41379/


SparkQA commented Apr 1, 2021

Test build #136796 has finished for PR 32017 at commit ce3ac0e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Signed-off-by: Karen Feng <karen.feng@databricks.com>
override def metadataOutput: Seq[Attribute] = {
  val qualifierList = identifier.qualifier :+ alias
  child.metadataOutput.map(_.withQualifier(qualifierList))
}
Contributor

Why is the logic here removed? Won't this cause resolution failures when referencing a metadata column via an alias? Like SELECT s._file FROM (SELECT ...) s?

Contributor Author

@cloud-fan, should we support this case if it requires changing the query during analysis after being resolved?

Contributor

I think we should only expose the metadata column in a single SELECT group, e.g. SELECT _file FROM t, SELECT t1._file FROM t1 JOIN t2.

It's super weird if we can propagate the metadata column through SELECT groups, e.g. SELECT s._file FROM (SELECT a, b FROM t) s. The s is a subquery alias, and the subquery has a clear output list: a, b. It may surprise users if they can access s._file.

However, I do agree with @rdblue that a simple alias should be supported. For example, SELECT t1._file FROM t t1 JOIN t t2. SELECT s._file FROM (SELECT ...) s won't work anyway because Project can't propagate the metadata columns.

That said, let's propagate metadata columns in SubqueryAlias.
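Concretely, the intent is roughly the following (a sketch reusing the `_file` example column from this thread; it assumes `t` is backed by a source that exposes `_file`):

```scala
// Sketch of the intended scoping; `_file` is the example metadata column from this discussion.
spark.sql("SELECT _file FROM t")                          // OK: same SELECT group as the scan
spark.sql("SELECT t1._file FROM t t1 JOIN t t2")          // OK: a simple table alias still works
spark.sql("SELECT s._file FROM (SELECT a, b FROM t) s")   // fails: Project hides metadata columns
```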

@@ -31,7 +31,7 @@ import org.apache.spark.sql.types._
import org.apache.spark.util.random.RandomSampler

/**
* When planning take() or collect() operations, this special node that is inserted at the top of
* When planning take() or collect() operations, this special node is inserted at the top of
Contributor

Nit: including unrelated changes tends to cause git conflicts.

Contributor

rdblue commented Apr 1, 2021

I share @viirya's concern about losing metadataOutput. I think it is better to propagate by default than to drop by default, since most of the time the metadata output is not used.


SparkQA commented Apr 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41404/


SparkQA commented Apr 1, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41404/


SparkQA commented Apr 1, 2021

Test build #136824 has finished for PR 32017 at commit a2d72ef.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

Usually an allowlist is better than a denylist, as it's more conservative. But in this case, "conservative" means we may hit analysis errors. Let's use a denylist then: by default we propagate metadata columns, but some operators (e.g. Project, View) should override it.
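A minimal sketch of that denylist shape (again toy classes, not the actual Catalyst code): propagate by default, and have the offending operators override to `Nil`:

```scala
// Toy sketch of the denylist approach, not Spark's actual classes.
trait NodeLike {
  def children: Seq[NodeLike]
  // Default: pass metadata columns through from the children.
  def metadataOutput: Seq[String] = children.flatMap(_.metadataOutput)
}

case class LeafLike(meta: Seq[String]) extends NodeLike {
  def children: Seq[NodeLike] = Nil
  override def metadataOutput: Seq[String] = meta
}

case class FilterLike(child: NodeLike) extends NodeLike {
  def children: Seq[NodeLike] = Seq(child) // no override: metadata columns flow through
}

// Operators on the denylist (e.g. Project- or View-like nodes) hide metadata columns.
case class ProjectLike(child: NodeLike) extends NodeLike {
  def children: Seq[NodeLike] = Seq(child)
  override def metadataOutput: Seq[String] = Nil
}
```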

Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Karen Feng <karen.feng@databricks.com>
@karenfeng karenfeng changed the title [SPARK-34923][SQL] Metadata output should be empty by default [SPARK-34923][SQL] Metadata output should be empty fore more plans Apr 5, 2021

SparkQA commented Apr 5, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41494/

@karenfeng karenfeng changed the title [SPARK-34923][SQL] Metadata output should be empty fore more plans [SPARK-34923][SQL] Metadata output should be empty for more plans Apr 5, 2021
Signed-off-by: Karen Feng <karen.feng@databricks.com>

SparkQA commented Apr 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41497/


SparkQA commented Apr 5, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41497/


SparkQA commented Apr 5, 2021

Test build #136917 has finished for PR 32017 at commit 5a04a7e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Signed-off-by: Karen Feng <karen.feng@databricks.com>

SparkQA commented Apr 6, 2021

Test build #136920 has finished for PR 32017 at commit 73219d4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Apr 6, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41502/


SparkQA commented Apr 6, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41502/

Signed-off-by: Karen Feng <karen.feng@databricks.com>

SparkQA commented Apr 6, 2021

Test build #136925 has finished for PR 32017 at commit b1c0183.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Apr 6, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41510/


SparkQA commented Apr 6, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41510/

@cloud-fan
Contributor

thanks, merging to master/3.1!

@cloud-fan cloud-fan closed this in 3b634f6 Apr 6, 2021
cloud-fan pushed a commit that referenced this pull request Apr 6, 2021
Closes #32017 from karenfeng/spark-34923.

Authored-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 3b634f6)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

SparkQA commented Apr 6, 2021

Test build #136933 has finished for PR 32017 at commit e8e6e7d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
Closes apache#32017 from karenfeng/spark-34923.

Authored-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 3b634f6)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
fishcus pushed a commit to fishcus/spark that referenced this pull request Jan 12, 2022
Closes apache#32017 from karenfeng/spark-34923.

Authored-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 3b634f6)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan added a commit that referenced this pull request Sep 7, 2022
### What changes were proposed in this pull request?

This PR fixes a regression caused by #32017 .

In #32017 , we tried to be more conservative and decided to not propagate metadata columns in certain operators, including `Project`. However, the decision was made only considering SQL API, not DataFrame API. In fact, it's very common to chain `Project` operators in DataFrame, e.g. `df.withColumn(...).withColumn(...)...`, and it's very inconvenient if metadata columns are not propagated through `Project`.

This PR makes 2 changes:
1. Project should propagate metadata columns
2. SubqueryAlias should only propagate metadata columns if the child is a leaf node or also a SubqueryAlias

The second change is needed to still forbid weird queries like `SELECT m from (SELECT a from t)`, which is the main motivation of #32017 .

After propagating metadata columns, a problem from #31666 is exposed: the natural join metadata columns may confuse the analyzer and lead to a wrong analyzed plan. For example, in `SELECT t1.value FROM t1 LEFT JOIN t2 USING (key) ORDER BY key`, how should we resolve `ORDER BY key`? It should be resolved to `t1.key` (which is in the output of the left join) via the rule `ResolveMissingReferences`. However, if `Project` can propagate metadata columns, `ORDER BY key` will be resolved to `t2.key`.

To solve this problem, this PR only allows qualified access for metadata columns of natural join. This is not a breaking change, as people could previously only do qualified access for natural join metadata columns anyway, in the `Project` right after `Join`. This actually enables more use cases, as people can now access natural join metadata columns in ORDER BY. I've added a test for it.
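A hedged illustration of that rule, using the `t1`/`t2`/`key` example from the paragraph above (it assumes both tables have a `key` column and `t1` has a `value` column):

```scala
// Illustrative only, based on the example in the text above. After a USING (or NATURAL)
// join, the per-side key columns become metadata columns reachable only via a qualifier.
spark.sql("SELECT t1.value FROM t1 LEFT JOIN t2 USING (key) ORDER BY key")     // key resolves to t1.key
spark.sql("SELECT t1.value, t2.key FROM t1 LEFT JOIN t2 USING (key)")          // qualified access to the hidden t2.key
spark.sql("SELECT t1.value FROM t1 LEFT JOIN t2 USING (key) ORDER BY t2.key")  // newly possible: qualified access in ORDER BY
```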

### Why are the changes needed?

fix a regression

### Does this PR introduce _any_ user-facing change?

For SQL API, there is no change, as a `SubqueryAlias` always comes with a `Project` or `Aggregate`, so we still don't propagate metadata columns through a SELECT group.

For DataFrame API, the behavior becomes more lenient. The only breaking case is when an operator that can propagate metadata columns is followed by a `SubqueryAlias`, e.g. `df.filter(...).as("t").select("t.metadata_col")`. But this is a weird use case and I don't think we should support it in the first place.
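A sketch of the DataFrame-side behavior described above (`t` and `metadata_col` are placeholders for a source table and whichever metadata column that source exposes):

```scala
// Sketch only; `t` and `metadata_col` are placeholders.
import org.apache.spark.sql.functions.lit

val df = spark.read.table("t")

// Chained Projects now propagate metadata columns, so this keeps working:
df.withColumn("x", lit(1)).withColumn("y", lit(2)).select("metadata_col")

// The one breaking case called out above: a propagating operator followed by a
// SubqueryAlias no longer exposes the metadata column.
// df.filter("a > 0").as("t").select("t.metadata_col")   // now fails to resolve
```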

### How was this patch tested?

new tests

Closes #37758 from cloud-fan/metadata.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan added a commit that referenced this pull request Sep 7, 2022
Closes #37758 from cloud-fan/metadata.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 99ae1d9)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan added a commit to cloud-fan/spark that referenced this pull request Sep 7, 2022
Closes apache#37758 from cloud-fan/metadata.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 99ae1d9)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan added a commit that referenced this pull request Sep 7, 2022
backport #37758 to 3.2

Closes #37818 from cloud-fan/backport.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
backport apache#37758 to 3.2

Closes apache#37818 from cloud-fan/backport.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit d566017)
sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
backport apache#37758 to 3.2

Closes apache#37818 from cloud-fan/backport.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit d566017)