[SPARK-31577][SQL] Fix case-sensitivity and forward name conflict problems when check name conflicts of CTE relations #28371

Closed

cloud-fan
Contributor

@cloud-fan cloud-fan commented Apr 27, 2020

What changes were proposed in this pull request?

This is a follow-up to #28318 that makes the code more readable by adding comments to explain the trick and by simplifying the code to use a boolean flag instead of two string sets.

This PR also fixes various problems:

  1. The name check should respect the case-sensitivity setting.
  2. A forward name conflict such as with t as (with t2 as ...), t2 as ... is not a real conflict, so we shouldn't fail on it.
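
The two rules above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `Cte`, `firstConflict`, and the two resolver values are hypothetical names, and a CTE is modeled only as a name plus the relations of a WITH clause nested in its definition. The sketch assumes a name conflicts only with names defined earlier in an enclosing WITH clause, compared through a pluggable resolver.

```scala
object CteNameCheckSketch {
  // Name comparison is pluggable so the check can follow a case-sensitivity setting.
  type Resolver = (String, String) => Boolean
  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Hypothetical stand-in for one CTE relation: its name plus the relations
  // of a WITH clause nested inside its definition.
  final case class Cte(name: String, nested: List[Cte] = Nil)

  // Returns the first name that clashes with a name from an enclosing WITH clause.
  // `outer` holds the names already defined when this WITH clause starts.
  def firstConflict(
      relations: List[Cte],
      outer: List[String],
      resolver: Resolver): Option[String] = {
    def loop(remaining: List[Cte], visible: List[String]): Option[String] =
      remaining match {
        case Nil => None
        case r :: rest =>
          if (outer.exists(resolver(_, r.name))) Some(r.name)
          else {
            // A nested WITH sees only names defined before the current relation,
            // which is why a forward reuse of a name is not reported.
            firstConflict(r.nested, visible, resolver)
              .orElse(loop(rest, r.name :: visible))
          }
      }
    loop(relations, outer)
  }

  def main(args: Array[String]): Unit = {
    // WITH t AS (WITH t2 AS ...), t2 AS ... : the outer t2 is defined later.
    val forward = List(Cte("t", List(Cte("t2"))), Cte("t2"))
    println(firstConflict(forward, Nil, caseInsensitive)) // None

    // WITH t AS ..., t2 AS (WITH T AS ...) : T clashes with t only if case is ignored.
    val backward = List(Cte("t"), Cte("t2", List(Cte("T"))))
    println(firstConflict(backward, Nil, caseInsensitive)) // Some(T)
    println(firstConflict(backward, Nil, caseSensitive))   // None
  }
}
```

Passing the comparison in as a function keeps the traversal independent of the case-sensitivity setting, which is the same reason the snippet under review calls `resolver(_, name)` instead of testing set membership.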

Why are the changes needed?

To correct the behavior.

Does this PR introduce any user-facing change?

Yes, it fixes the aforementioned behaviors.

How was this patch tested?

New tests.

-- !query output
org.apache.spark.sql.AnalysisException
Name t is ambiguous in nested CTE. Please set spark.sql.legacy.ctePrecedencePolicy to CORRECTED so that name defined in inner CTE takes precedence. If set it to LEGACY, outer CTE definitions will take precedence. See more details in SPARK-28228.;
Contributor Author

@cloud-fan cloud-fan Apr 27, 2020

I tried this query with Spark 2.4.5 (replacing t(c) AS (SELECT 1) with t AS (SELECT 1 c), since Spark 2.4 doesn't support the t(c) syntax), and it fails in the parser. So we don't need to fail here.

Member

Got it.

@cloud-fan cloud-fan changed the title [SPARK-31535][SQL][FOLLOWUP] Simplify name conflict check in CTE resolution [SPARK-31577][SQL] Fix various problems when check name conflicts of CTE relations Apr 27, 2020
}.toSet ++ namesInSubqueries
assertNoNameConflictsInCTE(child, outerCTERelationNames, newNames)
w.innerChildren.foreach(assertNoNameConflictsInCTE(_, newNames, newNames))
assertNoNameConflictsInCTE(relation, newNames)
Contributor

Do you think we could use relation.child here and drop the SubqueryAlias case below?

Contributor Author

good idea! updated.

@peter-toth
Contributor

That flag is a nice trick so LGTM, I just left a minor note.
The other 2 fixes also look good.

newNames ++= outerCTERelationNames
relations.foreach {
  case (name, relation) =>
    if (startOfQuery && outerCTERelationNames.exists(resolver(_, name))) {
Member

My bad. I forgot to check case sensitivity.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-31577][SQL] Fix various problems when check name conflicts of CTE relations [SPARK-31577][SQL] Fix case-sensitivity and forward name conflict problems when check name conflicts of CTE relations Apr 27, 2020
@dongjoon-hyun
Member

Thank you so much, @cloud-fan .

SparkQA commented Apr 27, 2020

Test build #121920 has finished for PR 28371 at commit 24a487d.

  • This patch fails from timeout after a configured wait of 400m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

dongjoon-hyun commented Apr 27, 2020

All Scala/Java/Python tests passed, but the build timed out during R testing.

I ran the R UT manually.

══ testthat results  ═══════════════════════════════════════════════════════════
[ OK: 13 | SKIPPED: 0 | WARNINGS: 0 | FAILED: 0 ]
✔ |  OK F W S | Context
✔ |  11       | binary functions [1.5 s]
✔ |   4       | functions on binary files [1.3 s]
✔ |   2       | broadcast variables [0.3 s]
✔ |   5       | functions in client.R
✔ |  46       | test functions in sparkR.R [4.4 s]
✔ |   2       | include R packages [0.2 s]
✔ |   2       | JVM API [0.1 s]
✔ |  75       | MLlib classification algorithms, except for tree-based algorithms [55.2 s]
✔ |  70       | MLlib clustering algorithms [23.8 s]
✔ |   6       | MLlib frequent pattern mining [1.8 s]
✔ |   8       | MLlib recommendation algorithms [4.6 s]
✔ | 136       | MLlib regression algorithms, except for tree-based algorithms [50.6 s]
✔ |   8       | MLlib statistics algorithms [0.4 s]
✔ |  94       | MLlib tree-based algorithms [40.7 s]
✔ |  29       | parallelize() and collect() [0.4 s]
✔ | 428       | basic RDD functions [15.0 s]
✔ |  39       | SerDe functionality [2.2 s]
✔ |  20       | partitionBy, groupByKey, reduceByKey etc. [2.1 s]
✔ |   4       | functions in sparkR.R
✔ |  16       | SparkSQL Arrow optimization [10.5 s]
✔ |   6       | test show SparkDataFrame when eager execution is enabled. [0.7 s]
✔ | 1177       | SparkSQL functions [106.5 s]
✔ |  42       | Structured Streaming [53.8 s]
✔ |  16       | tests RDD function take() [0.5 s]
✔ |  14       | the textFile() function [1.3 s]
✔ |  46       | functions in utils.R [0.3 s]
✔ |   0     1 | Windows-specific tests
────────────────────────────────────────────────────────────────────────────────
test_Windows.R:22: skip: sparkJars tag in SparkContext
Reason: This test is only for Windows, skipped
────────────────────────────────────────────────────────────────────────────────

══ Results ═════════════════════════════════════════════════════════════════════
Duration: 378.6 s

OK:       2306
Failed:   0
Warnings: 0
Skipped:  1

Thank you so much for this fix, @cloud-fan and @peter-toth .
Merged to master/3.0.

dongjoon-hyun pushed a commit that referenced this pull request Apr 27, 2020
…blems when check name conflicts of CTE relations

### What changes were proposed in this pull request?

This is a followup of #28318, to make the code more readable, by adding some comments to explain the trick and simplify the code to use a boolean flag instead of 2 string sets.

This PR also fixes various problems:
1. the name check should consider case sensitivity
2. forward name conflicts like `with t as (with t2 as ...), t2 as ...` is not a real conflict and we shouldn't fail.

### Why are the changes needed?

correct the behavior

### Does this PR introduce any user-facing change?

yes, fix the fore-mentioned behaviors.

### How was this patch tested?

new tests

Closes #28371 from cloud-fan/followup.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 2f4f38b)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Member

@xuanyuanking xuanyuanking left a comment

Late LGTM, thanks for the fix.
