Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-37943][SQL] Use error classes in the compilation errors of grouping #35389

Closed
wants to merge 3 commits into from

Conversation

SaurabhChawla100
Copy link
Contributor

@SaurabhChawla100 SaurabhChawla100 commented Feb 3, 2022

What changes were proposed in this pull request?

Migrate the following errors in QueryCompilationErrors onto use error classes:
groupingMustWithGroupingSetsOrCubeOrRollupError => UNSUPPORTED_GROUPING_EXPRESSION

Why are the changes needed?

Porting grouping /grouping Id errors to new error framework.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added the unit test in QueryCompilationErrorsSuite and tested the unit test

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

Copy link
Contributor

@amaliujia amaliujia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. I have only a question for the error message.

cc @amaliujia

@@ -153,6 +153,9 @@
"message" : [ "The feature is not supported: %s" ],
"sqlState" : "0A000"
},
"UNSUPPORTED_GROUPING_EXPRESSION" : {
"message" : [ "grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup" ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking GroupingSets/Cube/Rollup is not easy to understand. Right now I cannot immediately recall what is grouping sets or cube or rollup.

Is it possible to use other way to explain grouping()/grouping_id()? Like they can only be used in syntax a, b, c

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GroupingSets/Cube/Rollup are actually first class SQL syntax. So saying them in error message should be fine as people can quickly search and then know what GroupingSets/Cube/Rollup is.

}
assert(errMsg.errorClass === Some("UNSUPPORTED_GROUPING_EXPRESSION"))
assert(errMsg.message ===
"grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we can already pass a parametrized reason here as we know this is sort with grouping?

Copy link
Contributor Author

@SaurabhChawla100 SaurabhChawla100 Feb 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am understanding correctly So the message should be like this

"Sort with grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup" 
   
instead of 

"grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup"

Similar case for filter

"Filter with grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup" 

instead of 

"grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the code val groupingExprs = findGroupingExprs(child) . findGroupingExprs is called from two places

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L733)

and

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L740.

In that case if want to parameterised the error by updating the input param to the method
def findGroupingExprs(plan: LogicalPlan) to def findGroupingExprs(plan: LogicalPlan, parentInfo: String)

So that when throwing the exception we can add the where it failed either in filter or sort.

If this looks fine, I can make that change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NVM. Sort with grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup is not really more helpful.

assert(errMsg.message ===
"grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup")
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove the empty line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 88 to 104
// filter with grouping
var errMsg = intercept[AnalysisException] {
df.filter("grouping(CustomerId)=17850")
.groupBy("CustomerId").agg(Map("Quantity" -> "max"))
}
assert(errMsg.message ===
"grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup")
assert(errMsg.errorClass === Some("UNSUPPORTED_GROUPING_EXPRESSION"))

// filter with grouping_id
errMsg = intercept[AnalysisException] {
df.filter("grouping_id(CustomerId)=17850").
groupBy("CustomerId").agg(Map("Quantity" -> "max"))
}
assert(errMsg.errorClass === Some("UNSUPPORTED_GROUPING_EXPRESSION"))
assert(errMsg.message ===
"grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// filter with grouping
var errMsg = intercept[AnalysisException] {
df.filter("grouping(CustomerId)=17850")
.groupBy("CustomerId").agg(Map("Quantity" -> "max"))
}
assert(errMsg.message ===
"grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup")
assert(errMsg.errorClass === Some("UNSUPPORTED_GROUPING_EXPRESSION"))
// filter with grouping_id
errMsg = intercept[AnalysisException] {
df.filter("grouping_id(CustomerId)=17850").
groupBy("CustomerId").agg(Map("Quantity" -> "max"))
}
assert(errMsg.errorClass === Some("UNSUPPORTED_GROUPING_EXPRESSION"))
assert(errMsg.message ===
"grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup")
Seq("grouping", "grouping_id").foreach { grouping =>
val errMsg = intercept[AnalysisException] {
df.filter(s"$grouping(CustomerId)=17850")
.groupBy("CustomerId").agg(Map("Quantity" -> "max"))
}
assert(errMsg.message ===
"grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup")
assert(errMsg.errorClass === Some("UNSUPPORTED_GROUPING_EXPRESSION"))
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 114 to 130
// Sort with grouping
var errMsg = intercept[AnalysisException] {
df.sort(grouping("CustomerId"))
.groupBy("CustomerId").agg(Map("Quantity" -> "max"))
}
assert(errMsg.errorClass === Some("UNSUPPORTED_GROUPING_EXPRESSION"))
assert(errMsg.message ===
"grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup")

// Sort with grouping_id
errMsg = intercept[AnalysisException] {
df.sort(grouping_id("CustomerId")).
groupBy("CustomerId").agg(Map("Quantity" -> "max"))
}
assert(errMsg.errorClass === Some("UNSUPPORTED_GROUPING_EXPRESSION"))
assert(errMsg.message ===
"grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Sort with grouping
var errMsg = intercept[AnalysisException] {
df.sort(grouping("CustomerId"))
.groupBy("CustomerId").agg(Map("Quantity" -> "max"))
}
assert(errMsg.errorClass === Some("UNSUPPORTED_GROUPING_EXPRESSION"))
assert(errMsg.message ===
"grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup")
// Sort with grouping_id
errMsg = intercept[AnalysisException] {
df.sort(grouping_id("CustomerId")).
groupBy("CustomerId").agg(Map("Quantity" -> "max"))
}
assert(errMsg.errorClass === Some("UNSUPPORTED_GROUPING_EXPRESSION"))
assert(errMsg.message ===
"grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup")
Seq(grouping("CustomerId"), grouping_id("CustomerId")).foreach { grouping =>
val errMsg = intercept[AnalysisException] {
df.sort(grouping)
.groupBy("CustomerId").agg(Map("Quantity" -> "max"))
}
assert(errMsg.errorClass === Some("UNSUPPORTED_GROUPING_EXPRESSION"))
assert(errMsg.message ===
"grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup")
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@MaxGekk MaxGekk changed the title [SPARK-37943][SQL]Use error classes in the compilation errors of grouping [SPARK-37943][SQL] Use error classes in the compilation errors of grouping Feb 6, 2022
@MaxGekk
Copy link
Member

MaxGekk commented Feb 8, 2022

+1, LGTM. Merging to master.
Thank you, @SaurabhChawla100 and @amaliujia for review.

@MaxGekk MaxGekk closed this in 08c851d Feb 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants