Skip to content

Commit

Permalink
[SPARK-41985][SQL] Centralize more column resolution rules
Browse files Browse the repository at this point in the history
### What changes were proposed in this pull request?

This is a followup of #38888 .

When I search for all the matching of `UnresolvedAttribute`, I found that there are still a few rules doing column resolution:
1. ResolveAggAliasInGroupBy
2. ResolveGroupByAll
3. ResolveOrderByAll
4. ResolveDefaultColumns

This PR merges the first 3 into `ResolvedReferences`. The last one will be done with a separate PR, as it's more complicated.

To avoid making the rule `ResolvedReferences` bigger and bigger, this PR pulls out the resolution code for `Aggregate` to a separated virtual rule (only be used by `ResolvedReferences`). The same to `Sort`. We can refactor and add more virtual rules later.

### Why are the changes needed?

It's problematic to not centralize all the column resolution logic, as the execution order of the rules is not reliable. It actually leads to regression after #38888  : `select a from t where exists (select 1 as a group by a)`. The `group by a` should be resolved as `1 as a`, but now it's resolved as outer reference `a`. This is because `ResolveReferences` runs before `ResolveAggAliasInGroupBy`, and resolves outer references too early.

### Does this PR introduce _any_ user-facing change?

Fixes a bug, but the bug is not released yet.

### How was this patch tested?

new tests

Closes #39508 from cloud-fan/column.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 40ca27c)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
  • Loading branch information
cloud-fan committed Feb 1, 2023
1 parent c421b51 commit 3927c5e
Show file tree
Hide file tree
Showing 12 changed files with 918 additions and 631 deletions.
5 changes: 5 additions & 0 deletions core/src/main/resources/error/error-classes.json
Original file line number Diff line number Diff line change
Expand Up @@ -1521,6 +1521,11 @@
"Referencing a lateral column alias <lca> in the aggregate function <aggFunc>."
]
},
"LATERAL_COLUMN_ALIAS_IN_GROUP_BY" : {
"message" : [
"Referencing a lateral column alias via GROUP BY alias/ALL is not supported yet."
]
},
"LATERAL_JOIN_USING" : {
"message" : [
"JOIN USING with LATERAL correlation."
Expand Down

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
Expand Up @@ -238,7 +238,7 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB

// Fail if we still have an unresolved all in group by. This needs to run before the
// general unresolved check below to throw a more tailored error message.
ResolveGroupByAll.checkAnalysis(operator)
ResolveReferencesInAggregate.checkUnresolvedGroupByAll(operator)

getAllExpressions(operator).foreach(_.foreachUp {
case a: Attribute if !a.resolved =>
Expand Down Expand Up @@ -762,8 +762,8 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB

private def getAllExpressions(plan: LogicalPlan): Seq[Expression] = {
plan match {
// `groupingExpressions` may rely on `aggregateExpressions`, due to the GROUP BY alias
// feature. We should check errors in `aggregateExpressions` first.
// We only resolve `groupingExpressions` if `aggregateExpressions` is resolved first (See
// `ResolveReferencesInAggregate`). We should check errors in `aggregateExpressions` first.
case a: Aggregate => a.aggregateExpressions ++ a.groupingExpressions
case _ => plan.expressions
}
Expand Down

Large diffs are not rendered by default.

This file was deleted.

This file was deleted.

Loading

0 comments on commit 3927c5e

Please sign in to comment.