Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-41635][SQL] Fix group by all error reporting #39509

Closed
wants to merge 1 commit into from

Conversation

cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

This is a followup of #39134 . It fixes a bug in the error reporting: when detecting the failed GROUP BY ALL case, we should use the same code that was used to resolve GROUP BY ALL. This bug was hidden because the test has a typo.

Why are the changes needed?

bug fix

Does this PR introduce any user-facing change?

yes, better errors

How was this patch tested?

new tests

@github-actions github-actions bot added the SQL label Jan 11, 2023

-- make sure we report the right error when there's an attribute in correlated subquery
-- that is, to report UNRESOLVED_ALL_IN_GROUP_BY, rather than some random subquery error.
select coutnry, (select count(*) from data d1 where d1.country = d2.country), count(id) from data d2 group by all;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coutnry is an non-existing column and we should report normal missing column error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😢

@cloud-fan
Copy link
Contributor Author

cc @rxin @gengliangwang

@@ -93,8 +93,9 @@ object ResolveGroupByAll extends Rule[LogicalPlan] {
* end of analysis, so we can tell users that we fail to infer the grouping columns.
*/
def checkAnalysis(operator: LogicalPlan): Unit = operator match {
case a: Aggregate if matchToken(a) =>
if (a.aggregateExpressions.exists(_.exists(_.isInstanceOf[Attribute]))) {
case a: Aggregate if a.aggregateExpressions.forall(_.resolved) && matchToken(a) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: there are some codes duplicated with the apply method. We can consider simple refactors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check code here is a partial version of the apply code. I can't think of a simple refactor that can reduce more code duplication. Let's think about this later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right about the check here. For the grouping expression inference, I think we can refactor it as #39523

Copy link
Member

@gengliangwang gengliangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except one comment

@cloud-fan
Copy link
Contributor Author

thanks for review, merging to master!

@cloud-fan cloud-fan closed this in 63479a2 Jan 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants