[SPARK-31577][SQL] Fix case-sensitivity and forward name conflict problems when check name conflicts of CTE relations #28371

cloud-fan · 2020-04-27T13:48:29Z

What changes were proposed in this pull request?

This is a followup of #28318 that makes the code more readable by adding comments to explain the trick and by simplifying the code to use a boolean flag instead of 2 string sets.

This PR also fixes two problems:

1. The name check should consider case sensitivity.
2. Forward name conflicts like `with t as (with t2 as ...), t2 as ...` are not real conflicts, and we shouldn't fail.
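
The two fixes can be illustrated with a minimal, self-contained sketch. It is not the code in `CTESubstitution.scala`: the `Plan`, `Leaf`, `Node`, and `With` types, the two resolvers, and `findConflict` are hypothetical stand-ins that assume a CTE name only conflicts with names defined in an enclosing scope before it, compared through a resolver function.

```scala
object CteNameCheckSketch {
  // Toy plan model; these types are hypothetical stand-ins, not Spark's LogicalPlan.
  sealed trait Plan
  final case class Leaf(table: String) extends Plan
  final case class Node(children: Seq[Plan]) extends Plan
  // Models: WITH name1 AS (relation1), name2 AS (relation2) ... child
  final case class With(child: Plan, relations: Seq[(String, Plan)]) extends Plan

  // A resolver decides whether two identifiers refer to the same name.
  type Resolver = (String, String) => Boolean
  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  /** Returns the first CTE name that conflicts with an outer CTE name, if any. */
  def findConflict(
      plan: Plan,
      resolver: Resolver,
      outerNames: Seq[String] = Nil,
      startOfQuery: Boolean = true): Option[String] = plan match {
    case With(child, relations) =>
      var visible = outerNames
      var conflict: Option[String] = None
      relations.foreach { case (name, relation) =>
        if (conflict.isEmpty) {
          // Compare only against names from enclosing scopes, via the resolver.
          if (startOfQuery && outerNames.exists(resolver(_, name))) {
            conflict = Some(name)
          } else {
            // A relation body sees only the names defined before it,
            // so a later sibling with the same name is not a conflict.
            conflict = findConflict(relation, resolver, visible)
            visible = visible :+ name
          }
        }
      }
      conflict.orElse(findConflict(child, resolver, visible, startOfQuery = false))
    case Node(children) =>
      children
        .flatMap(c => findConflict(c, resolver, outerNames, startOfQuery = false))
        .headOption
    case Leaf(_) => None
  }
}
```

In this sketch, the forward case `with t as (with t2 as ...), t2 as ...` reports no conflict, because the inner `t2` is checked before the outer `t2` becomes visible. A shape like `with t as ..., t2 as (with T as ...)` reports `T` under `caseInsensitive` and nothing under `caseSensitive`.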

Why are the changes needed?

correct the behavior

Does this PR introduce any user-facing change?

Yes, this fixes the aforementioned behaviors.

How was this patch tested?

new tests

cloud-fan · 2020-04-27T13:50:05Z

sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out

 -- !query output
-org.apache.spark.sql.AnalysisException
-Name t is ambiguous in nested CTE. Please set spark.sql.legacy.ctePrecedencePolicy to CORRECTED so that name defined in inner CTE takes precedence. If set it to LEGACY, outer CTE definitions will take precedence. See more details in SPARK-28228.;


I tried this query with Spark 2.4.5 (after replacing `t(c) AS (SELECT 1)` with `t AS (SELECT 1 c)`, as Spark 2.4 doesn't support the `t(c)` syntax), and it fails in the parser. So we don't need to fail here.

cloud-fan · 2020-04-27T13:50:52Z

cc @peter-toth @xuanyuanking @dongjoon-hyun

peter-toth · 2020-04-27T15:50:45Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala

-        }.toSet ++ namesInSubqueries
-        assertNoNameConflictsInCTE(child, outerCTERelationNames, newNames)
-        w.innerChildren.foreach(assertNoNameConflictsInCTE(_, newNames, newNames))
+            assertNoNameConflictsInCTE(relation, newNames)


Do you think we could use relation.child here and drop the SubqueryAlias case below?

good idea! updated.

peter-toth · 2020-04-27T15:55:22Z

That flag is a nice trick so LGTM, I just left a minor note.
The other 2 fixes also look good.

dongjoon-hyun · 2020-04-27T16:10:38Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala

+        newNames ++= outerCTERelationNames
+        relations.foreach {
+          case (name, relation) =>
+            if (startOfQuery && outerCTERelationNames.exists(resolver(_, name))) {


My bad. I missed the case-sensitivity check.

dongjoon-hyun · 2020-04-27T16:55:19Z

Thank you so much, @cloud-fan .

SparkQA · 2020-04-27T23:04:46Z

Test build #121920 has finished for PR 28371 at commit 24a487d.

This patch fails from timeout after a configured wait of 400m.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2020-04-27T23:47:04Z

All Scala/Java/Python tests passed, but the build timed out during R testing.

I ran the R UT manually.

══ testthat results  ═══════════════════════════════════════════════════════════
[ OK: 13 | SKIPPED: 0 | WARNINGS: 0 | FAILED: 0 ]
✔ |  OK F W S | Context
✔ |  11       | binary functions [1.5 s]
✔ |   4       | functions on binary files [1.3 s]
✔ |   2       | broadcast variables [0.3 s]
✔ |   5       | functions in client.R
✔ |  46       | test functions in sparkR.R [4.4 s]
✔ |   2       | include R packages [0.2 s]
✔ |   2       | JVM API [0.1 s]
✔ |  75       | MLlib classification algorithms, except for tree-based algorithms [55.2 s]
✔ |  70       | MLlib clustering algorithms [23.8 s]
✔ |   6       | MLlib frequent pattern mining [1.8 s]
✔ |   8       | MLlib recommendation algorithms [4.6 s]
✔ | 136       | MLlib regression algorithms, except for tree-based algorithms [50.6 s]
✔ |   8       | MLlib statistics algorithms [0.4 s]
✔ |  94       | MLlib tree-based algorithms [40.7 s]
✔ |  29       | parallelize() and collect() [0.4 s]
✔ | 428       | basic RDD functions [15.0 s]
✔ |  39       | SerDe functionality [2.2 s]
✔ |  20       | partitionBy, groupByKey, reduceByKey etc. [2.1 s]
✔ |   4       | functions in sparkR.R
✔ |  16       | SparkSQL Arrow optimization [10.5 s]
✔ |   6       | test show SparkDataFrame when eager execution is enabled. [0.7 s]
✔ | 1177       | SparkSQL functions [106.5 s]
✔ |  42       | Structured Streaming [53.8 s]
✔ |  16       | tests RDD function take() [0.5 s]
✔ |  14       | the textFile() function [1.3 s]
✔ |  46       | functions in utils.R [0.3 s]
✔ |   0     1 | Windows-specific tests
────────────────────────────────────────────────────────────────────────────────
test_Windows.R:22: skip: sparkJars tag in SparkContext
Reason: This test is only for Windows, skipped
────────────────────────────────────────────────────────────────────────────────

══ Results ═════════════════════════════════════════════════════════════════════
Duration: 378.6 s

OK:       2306
Failed:   0
Warnings: 0
Skipped:  1

Thank you so much for this fix, @cloud-fan and @peter-toth .
Merged to master/3.0.

…blems when check name conflicts of CTE relations

### What changes were proposed in this pull request?

This is a followup of #28318, to make the code more readable, by adding some comments to explain the trick and simplify the code to use a boolean flag instead of 2 string sets.

This PR also fixes various problems:
1. the name check should consider case sensitivity
2. forward name conflicts like `with t as (with t2 as ...), t2 as ...` is not a real conflict and we shouldn't fail.

### Why are the changes needed?

correct the behavior

### Does this PR introduce any user-facing change?

yes, fix the fore-mentioned behaviors.

### How was this patch tested?

new tests

Closes #28371 from cloud-fan/followup.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 2f4f38b)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

xuanyuanking

Late LGTM, thanks for the fix.

probot-autolabeler bot added the SQL label Apr 27, 2020

cloud-fan changed the title ~~[SPARK-31535][SQL][FOLLOWUP] Simplify name conflict check in CTE resolution~~ [SPARK-31577][SQL] Fix various problems when check name conflicts of CTE relations Apr 27, 2020

simplify name conflict check in CTE resolution

c4d0a69

cloud-fan force-pushed the followup branch from dd6cec5 to c4d0a69 Compare April 27, 2020 14:03

peter-toth reviewed Apr 27, 2020

dongjoon-hyun reviewed Apr 27, 2020

address comment

24a487d

cloud-fan force-pushed the followup branch from 0d4de3f to 24a487d Compare April 27, 2020 16:19

dongjoon-hyun changed the title ~~[SPARK-31577][SQL] Fix various problems when check name conflicts of CTE relations~~ [SPARK-31577][SQL] Fix case-sensitivity and forward name conflict problems when check name conflicts of CTE relations Apr 27, 2020

dongjoon-hyun closed this in 2f4f38b Apr 27, 2020

xuanyuanking reviewed Apr 28, 2020
