[SPARK-36086][SQL] CollapseProject project replace alias should use origin column name #33576

AngersZhuuuu · 2021-07-29T10:17:18Z

What changes were proposed in this pull request?

Without this patch, the added UT fails as below:

[info] - SHOW TABLES V2: SPARK-36086: CollapseProject project replace alias should use origin column name *** FAILED *** (4 seconds, 935 milliseconds)
[info]   java.lang.RuntimeException: After applying rule org.apache.spark.sql.catalyst.optimizer.CollapseProject in batch Operator Optimization before Inferring Filters, the structural integrity of the plan is broken.
[info]   at org.apache.spark.sql.errors.QueryExecutionErrors$.structuralIntegrityIsBrokenAfterApplyingRuleError(QueryExecutionErrors.scala:1217)
[info]   at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:229)
[info]   at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
[info]   at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
[info]   at scala.collection.immutable.List.foldLeft(List.scala:91)
[info]   at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
[info]   at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
[info]   at scala.collection.immutable.List.foreach(List.scala:431)
[info]   at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
[info]   at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
[info]   at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)

When CollapseProject replaces an alias, it should use the original column name.

Why are the changes needed?

Fix bug

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added UT


AngersZhuuuu · 2021-07-29T10:46:42Z

ping @cloud-fan @maropu @HyukjinKwon

SparkQA · 2021-07-29T11:31:22Z

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46342/

SparkQA · 2021-07-29T13:09:31Z

Test build #141829 has finished for PR 33576 at commit 1b35748.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-07-29T13:15:08Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46349/

SparkQA · 2021-07-29T13:22:36Z

Test build #141834 has finished for PR 33576 at commit 0f8f841.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-07-29T14:08:12Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46349/

AngersZhuuuu · 2021-07-29T17:11:51Z

sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q5.sf100/explain.txt

@@ -173,7 +173,7 @@ Input [5]: [s_store_id#23, sum#30, sum#31, sum#32, sum#33]
 Keys [1]: [s_store_id#23]
 Functions [4]: [sum(UnscaledValue(sales_price#8)), sum(UnscaledValue(return_amt#10)), sum(UnscaledValue(profit#9)), sum(UnscaledValue(net_loss#11))]
 Aggregate Attributes [4]: [sum(UnscaledValue(sales_price#8))#35, sum(UnscaledValue(return_amt#10))#36, sum(UnscaledValue(profit#9))#37, sum(UnscaledValue(net_loss#11))#38]
-Results [5]: [MakeDecimal(sum(UnscaledValue(sales_price#8))#35,17,2) AS sales#39, MakeDecimal(sum(UnscaledValue(return_amt#10))#36,17,2) AS RETURNS#40, CheckOverflow((promote_precision(cast(MakeDecimal(sum(UnscaledValue(profit#9))#37,17,2) as decimal(18,2))) - promote_precision(cast(MakeDecimal(sum(UnscaledValue(net_loss#11))#38,17,2) as decimal(18,2)))), DecimalType(18,2), true) AS profit#41, store channel AS channel#42, concat(store, s_store_id#23) AS id#43]
+Results [5]: [MakeDecimal(sum(UnscaledValue(sales_price#8))#35,17,2) AS sales#39, MakeDecimal(sum(UnscaledValue(return_amt#10))#36,17,2) AS returns#40, CheckOverflow((promote_precision(cast(MakeDecimal(sum(UnscaledValue(profit#9))#37,17,2) as decimal(18,2))) - promote_precision(cast(MakeDecimal(sum(UnscaledValue(net_loss#11))#38,17,2) as decimal(18,2)))), DecimalType(18,2), true) AS profit#41, store channel AS channel#42, concat(store, s_store_id#23) AS id#43]


Checking the plan, returns (rather than RETURNS) is more reasonable: when collapsing projects, using the top-level name is correct.
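To make the naming point above concrete, here is a minimal toy model of collapsing two adjacent projections. It is a sketch only and does not use Catalyst: the `Attribute`, `Alias`, and `collapse` names are hypothetical stand-ins, and it assumes the fix amounts to keeping the name of the referencing (top-level) attribute when the inner alias is substituted in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """Hypothetical stand-in for a column reference in the outer projection."""
    name: str
    expr_id: int


@dataclass(frozen=True)
class Alias:
    """Hypothetical stand-in for `child AS name#expr_id` in the inner projection."""
    child: str  # the aliased expression, kept as a plain string in this toy model
    name: str
    expr_id: int


def collapse(upper, lower):
    """Merge an outer projection (`upper`) with the inner one (`lower`).

    Each outer attribute that refers to an inner alias (matched by expr_id)
    is replaced by that alias, but the outer attribute's name is kept so the
    output column names of the merged projection do not change.
    """
    alias_by_id = {alias.expr_id: alias for alias in lower}
    merged = []
    for attr in upper:
        alias = alias_by_id.get(attr.expr_id)
        if alias is None:
            merged.append(attr)  # not produced by an inner alias: keep as-is
        else:
            # Assumption: keep the top-level name, not the inner alias name.
            merged.append(Alias(alias.child, attr.name, alias.expr_id))
    return merged


def output_names(exprs):
    """Column names a projection list would expose to its parent."""
    return [e.name for e in exprs]


# Inner projection names the column RETURNS; the outer one refers to it as returns.
lower = [Alias("sum(return_amt)", "RETURNS", 40)]
upper = [Attribute("returns", 40)]
collapsed = collapse(upper, lower)
```

In this model `output_names(collapsed)` equals `output_names(upper)`, so the merged projection exposes the same column names as before the collapse. Substituting the inner alias unchanged would instead expose RETURNS, which is the kind of name change the plan diff above removes.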

SparkQA · 2021-07-29T18:22:47Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46362/

SparkQA · 2021-07-29T19:14:17Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46362/

SparkQA · 2021-07-29T22:12:32Z

Test build #141851 has finished for PR 33576 at commit 96ddcb9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.


SparkQA · 2021-07-30T08:32:12Z

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46393/

SparkQA · 2021-07-30T10:30:13Z

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46398/

SparkQA · 2021-07-30T12:05:25Z

Test build #141884 has finished for PR 33576 at commit ac715f2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-07-30T14:12:11Z

Test build #141889 has finished for PR 33576 at commit 89c9695.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2021-08-02T16:08:11Z

thanks, merging to master/3.2!

Closes #33576 from AngersZhuuuu/SPARK-36086.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit f317395)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

cloud-fan · 2021-08-02T16:10:04Z

@AngersZhuuuu can you open backport PRs for 3.1/3.0? thanks!

Commits

1b35748 [SPARK-36086][SQL] CollapseProject project replace alias should use origin column name
0f8f841 update UT
96ddcb9 uodate
ac715f2 Update CollapseProjectSuite.scala
89c9695 Update DataSourceV2SQLSuite.scala

github-actions bot added the SQL label Jul 29, 2021

cloud-fan closed this in f317395 Aug 2, 2021
