-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-45752][SQL] Unreferenced CTE should all be checked by CheckAnalysis0 #43614
Conversation
e876a60
to
96defe4
Compare
val e = intercept[AnalysisException](sql( | ||
s""" | ||
|with | ||
|a as (select * from t), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|a as (select * from t), | |
|a as (select * from non_exist), |
@@ -120,6 +120,30 @@ case class InlineCTE(alwaysInline: Boolean = false) extends Rule[LogicalPlan] { | |||
} | |||
} | |||
} | |||
// If a CTE ref count is 0, the other CTE that is referencing it should also have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit concerned about changing the ref count directly. I think the ref count is correct, the problem is in CheckAnalysis
when we decide which CTE relations need to be checked ahead. The current condition is simply refCount == 0
, but we should also check CTE relations referenced by CTE relations that has 0 refCount.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @cloud-fan. Probably you just need to call cleanCTEMap()
in CheckAnalysis
too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to address this comment.
// If a CTE ref count is 0, the other CTE that is referencing it should also have | ||
// a 0 ref count. | ||
val visited: mutable.Map[Long, Boolean] = mutable.Map.empty.withDefaultValue(false) | ||
cteMap.foreach { case(cteId, _) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cteMap.foreach { case(cteId, _) => | |
cteMap.foreach { case (cteId, (_, refCount, _)) => |
cteMap.foreach { case(cteId, _) => | ||
val (_, refCount, _) = cteMap(cteId) | ||
if (refCount == 0 && !visited(cteId) ) { | ||
reconcileCTERefCount(cteMap, visited, cteId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems we should inline reconcileCTERefCount
here.
The code has been updated. Please take a look. |
sql/core/src/test/scala/org/apache/spark/sql/CTEInlineSuite.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
Outdated
Show resolved
Hide resolved
@@ -156,7 +171,15 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB | |||
// If a CTE relation is never used, it will disappear after inline. Here we explicitly check | |||
// analysis for it, to make sure the entire query plan is valid. | |||
try { | |||
if (refCount == 0) checkAnalysis0(relation.child) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we can keep this check as it is, but call the already existing cleanCTEMap()
right after buildCTEMap()
a bit above? cleanCTEMap()
's comment seems wrong and it doesn't remove CTE defs without useful refs, but it just set ref counts to 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the current code is better as it guarantees the leaf CTE relations (do not reference any other) are checked first. Maybe we can make it clearer using comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I added some comments to explain we are trying to check the leaf relation first because that gives the most accurate error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply, I was travelling this week.
The reason why I would suggest the cleanCTEMap()
approach is because you might be checking a CTE 2 times now. Not in this new code, as visited
map makes sure to check a CTE only once here, but later when we check the inlined plan it might contain a CTE that was referenced from an other CTE with ref count = 0 (so the CTE is checked in the new code) and also from a third CTE with valid references (so the CTE is checked in the inlined plan again).
As for checking the leafs first, checking the CTEs in id order guarantees that. This is because how we do CTE resolution (assign ids to defs) in CTESubstitution
. cleanCTEMap()
also uses this property to quickly clear useless transitive reference counts.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
Outdated
Show resolved
Hide resolved
…ysis/CheckAnalysis.scala
thanks, merging to master! |
…ions ### What changes were proposed in this pull request? #43614 let unreferenced `CTE` checked by `CheckAnalysis0`. This PR follows up #43614 to simplify the code for check unreferenced CTE relations. ### Why are the changes needed? Simplify the code for check unreferenced CTE relations ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? Exists test cases. ### Was this patch authored or co-authored using generative AI tooling? 'No'. Closes #43727 from beliefer/SPARK-45752_followup. Authored-by: Jiaan Geng <beliefer@163.com> Signed-off-by: Jiaan Geng <beliefer@163.com>
What changes were proposed in this pull request?
This PR fixes an issue that if a CTE is referenced by a non-referenced CTE, then this CTE should also have ref count as 0 and goes through CheckAnalysis0. This will guarantee analyzer throw proper error message for problematic CTE which is not referenced.
Why are the changes needed?
To improve error message for non-referenced CTE case.
Does this PR introduce any user-facing change?
NO
How was this patch tested?
UT
Was this patch authored or co-authored using generative AI tooling?
NO