-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-41635][SQL] Fix group by all error reporting #39509
Conversation
|
||
-- make sure we report the right error when there's an attribute in correlated subquery | ||
-- that is, to report UNRESOLVED_ALL_IN_GROUP_BY, rather than some random subquery error. | ||
select coutnry, (select count(*) from data d1 where d1.country = d2.country), count(id) from data d2 group by all; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
coutnry
is an non-existing column and we should report normal missing column error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😢
@@ -93,8 +93,9 @@ object ResolveGroupByAll extends Rule[LogicalPlan] { | |||
* end of analysis, so we can tell users that we fail to infer the grouping columns. | |||
*/ | |||
def checkAnalysis(operator: LogicalPlan): Unit = operator match { | |||
case a: Aggregate if matchToken(a) => | |||
if (a.aggregateExpressions.exists(_.exists(_.isInstanceOf[Attribute]))) { | |||
case a: Aggregate if a.aggregateExpressions.forall(_.resolved) && matchToken(a) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: there are some codes duplicated with the apply
method. We can consider simple refactors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check code here is a partial version of the apply
code. I can't think of a simple refactor that can reduce more code duplication. Let's think about this later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right about the check here. For the grouping expression inference, I think we can refactor it as #39523
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except one comment
thanks for review, merging to master! |
What changes were proposed in this pull request?
This is a followup of #39134 . It fixes a bug in the error reporting: when detecting the failed GROUP BY ALL case, we should use the same code that was used to resolve GROUP BY ALL. This bug was hidden because the test has a typo.
Why are the changes needed?
bug fix
Does this PR introduce any user-facing change?
yes, better errors
How was this patch tested?
new tests