Skip to content

Comments

[SPARK-44714] Ease restriction of LCA resolution regarding queries with having#42276

Closed
anchovYu wants to merge 6 commits intoapache:masterfrom
anchovYu:lca-limitation-better-error
Closed

[SPARK-44714] Ease restriction of LCA resolution regarding queries with having#42276
anchovYu wants to merge 6 commits intoapache:masterfrom
anchovYu:lca-limitation-better-error

Conversation

@anchovYu
Copy link
Contributor

@anchovYu anchovYu commented Aug 1, 2023

What changes were proposed in this pull request?

This PR eases some restriction of LCA resolution regarding queries with having.

Previously LCA won't rewrite (to the new plan shape) when the whole queries contains UnresolvedHaving, in case it breaks the plan shape of UnresolvedHaving - Aggregate that can be recognized by other rules. But this limitation is too strict and it causes some deadlock in having - lca - window queries. See https://issues.apache.org/jira/browse/SPARK-42936 for more details and examples.

With this PR, it will only skip LCA resolution on the Aggregate whose direct parent is UnresolvedHaving. This is enabled by a new bottom-up resolution without using the transform or resolve utility function.

This PR also recognizes a vulnerability related to TEMP_RESOVLED_COLUMN and comments in the code. It should be considered as future work.

Why are the changes needed?

More complete functionality and better user experience.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New tests.

@github-actions github-actions bot added the SQL label Aug 1, 2023
"LATERAL_COLUMN_ALIAS_IN_AGGREGATE_WITH_WINDOW_AND_HAVING" : {
"message" : [
"Referencing lateral column alias <lca> in the aggregate query both with window expressions and with having clause. Please rewrite the aggregate query by removing the having clause or removing lateral alias reference in the SELECT list."
"Referencing lateral column alias <lca> in the query both with window expressions and with having clause. Please rewrite the aggregate query by removing the having clause or removing lateral alias reference in the SELECT list that has the window expression and retry."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anchovYu anchovYu changed the title [WIP] Throw a more tailored error message for having - window - LCA unresolved case [WIP] Ease restriction of LCA resolution regarding queries with having Aug 7, 2023
@anchovYu anchovYu changed the title [WIP] Ease restriction of LCA resolution regarding queries with having [SPARK-44714] Ease restriction of LCA resolution regarding queries with having Aug 8, 2023
&& aggregateExpressions.exists(_.containsPattern(LATERAL_COLUMN_ALIAS_REFERENCE)) =>
case pOriginal: Project if ruleApplicableOnOperator(pOriginal, pOriginal.projectList)
&& pOriginal.projectList.exists(_.containsPattern(LATERAL_COLUMN_ALIAS_REFERENCE)) =>
val p @ Project(projectList, child) = pOriginal.mapChildren(apply0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bottom-up resolution. The rest of code is fully copied and has no change.

if ruleApplicableOnOperator(aggOriginal, aggOriginal.aggregateExpressions)
&& aggOriginal.aggregateExpressions.exists(
_.containsPattern(LATERAL_COLUMN_ALIAS_REFERENCE)) =>
val agg @ Aggregate(groupingExpressions, aggregateExpressions, _) =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bottom up resolution. The rest of code is fully copied and has no change (except the one I commented out).

}
}
if (!aggregateExpressions.forall(eligibleToLiftUp)) {
agg
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously this line was return agg. That could be risky to return in a closure.. But for apply0 with or without return should be the same.

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.5!

@cloud-fan cloud-fan closed this in 29e8331 Aug 8, 2023
cloud-fan pushed a commit that referenced this pull request Aug 8, 2023
…th having

### What changes were proposed in this pull request?
This PR eases some restriction of LCA resolution regarding queries with having.

Previously LCA won't rewrite (to the new plan shape) when the whole queries contains `UnresolvedHaving`, in case it breaks the plan shape of `UnresolvedHaving - Aggregate` that can be recognized by other rules. But this limitation is too strict and it causes some deadlock in having - lca - window queries. See https://issues.apache.org/jira/browse/SPARK-42936 for more details and examples.

With this PR, it will only skip LCA resolution on the `Aggregate` whose direct parent is `UnresolvedHaving`. This is enabled by a new bottom-up resolution without using the transform or resolve utility function.

This PR also recognizes a vulnerability related to `TEMP_RESOVLED_COLUMN` and comments in the code. It should be considered as future work.

### Why are the changes needed?
More complete functionality and better user experience.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
New tests.

Closes #42276 from anchovYu/lca-limitation-better-error.

Authored-by: Xinyi Yu <xinyi.yu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 29e8331)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
valentinp17 pushed a commit to valentinp17/spark that referenced this pull request Aug 24, 2023
…th having

### What changes were proposed in this pull request?
This PR eases some restriction of LCA resolution regarding queries with having.

Previously LCA won't rewrite (to the new plan shape) when the whole queries contains `UnresolvedHaving`, in case it breaks the plan shape of `UnresolvedHaving - Aggregate` that can be recognized by other rules. But this limitation is too strict and it causes some deadlock in having - lca - window queries. See https://issues.apache.org/jira/browse/SPARK-42936 for more details and examples.

With this PR, it will only skip LCA resolution on the `Aggregate` whose direct parent is `UnresolvedHaving`. This is enabled by a new bottom-up resolution without using the transform or resolve utility function.

This PR also recognizes a vulnerability related to `TEMP_RESOVLED_COLUMN` and comments in the code. It should be considered as future work.

### Why are the changes needed?
More complete functionality and better user experience.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
New tests.

Closes apache#42276 from anchovYu/lca-limitation-better-error.

Authored-by: Xinyi Yu <xinyi.yu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants