New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-31577][SQL] Fix case-sensitivity and forward name conflict problems when check name conflicts of CTE relations #28371
Conversation
-- !query output | ||
org.apache.spark.sql.AnalysisException | ||
Name t is ambiguous in nested CTE. Please set spark.sql.legacy.ctePrecedencePolicy to CORRECTED so that name defined in inner CTE takes precedence. If set it to LEGACY, outer CTE definitions will take precedence. See more details in SPARK-28228.; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this query with Spark 2.4.5 (need to replace t(c) AS (SELECT 1)
with t AS (SELECT 1 c)
as Spark 2.4 doesn't support t(c)
syntax), it fails the parser. So we don't need to fail here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
}.toSet ++ namesInSubqueries | ||
assertNoNameConflictsInCTE(child, outerCTERelationNames, newNames) | ||
w.innerChildren.foreach(assertNoNameConflictsInCTE(_, newNames, newNames)) | ||
assertNoNameConflictsInCTE(relation, newNames) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we could use relation.child
here and drop the SubqueryAlias
case below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea! updated.
That flag is a nice trick so LGTM, I just left a minor note. |
newNames ++= outerCTERelationNames | ||
relations.foreach { | ||
case (name, relation) => | ||
if (startOfQuery && outerCTERelationNames.exists(resolver(_, name))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad. I missed to check case-sensitivity.
Thank you so much, @cloud-fan . |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Test build #121920 has finished for PR 28371 at commit
|
All Scala/Java/Python test passed, but it's timeouted at R testing. I ran the R UT manually.
Thank you so much for this fix, @cloud-fan and @peter-toth . |
…blems when check name conflicts of CTE relations ### What changes were proposed in this pull request? This is a followup of #28318, to make the code more readable, by adding some comments to explain the trick and simplify the code to use a boolean flag instead of 2 string sets. This PR also fixes various problems: 1. the name check should consider case sensitivity 2. forward name conflicts like `with t as (with t2 as ...), t2 as ...` is not a real conflict and we shouldn't fail. ### Why are the changes needed? correct the behavior ### Does this PR introduce any user-facing change? yes, fix the fore-mentioned behaviors. ### How was this patch tested? new tests Closes #28371 from cloud-fan/followup. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 2f4f38b) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Late LGTM, thanks for the fixing.
What changes were proposed in this pull request?
This is a followup of #28318, to make the code more readable, by adding some comments to explain the trick and simplify the code to use a boolean flag instead of 2 string sets.
This PR also fixes various problems:
with t as (with t2 as ...), t2 as ...
is not a real conflict and we shouldn't fail.Why are the changes needed?
correct the behavior
Does this PR introduce any user-facing change?
yes, fix the fore-mentioned behaviors.
How was this patch tested?
new tests