[SPARK-40018][SQL][TESTS] Output SparkThrowable to SQL golden files in JSON format #37452
Conversation
@cloud-fan @gengliangwang @srielau @anchovYu Could you review this PR, please?
344 out of the 470 changed errors do not have an associated error class, which means losing all the information about the error. This could hide bugs related to unexpected changes in the errors.
@@ -128,7 +128,7 @@ select sort_array(array('b', 'd'), '1')
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
cannot resolve 'sort_array(array('b', 'd'), '1')' due to data type mismatch: Sort order in second argument requires a boolean literal.; line 1 pos 7
{"errorClass":null,"messageParameters":[],"queryContext":[]}
I'm with @entong, we can't do this.
If there is no defined error class, we need to have a default.
How about:
{"errorClass":"legacy","messageParameters":["message" -> "original message"],"queryContext":[]}
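The suggested fallback could be sketched as follows. This is illustrative only, not the PR's actual code: the helper name and the minimal escaping are assumptions, and the message is carried as a named parameter instead of being dropped.

```scala
// Hypothetical sketch of the "legacy" fallback: when an error has no
// error class, keep the original message as a named parameter.
object LegacyErrorJson {
  def toLegacyJson(originalMessage: String): String = {
    // Minimal JSON string escaping (backslashes and double quotes only).
    val escaped = originalMessage.replace("\\", "\\\\").replace("\"", "\\\"")
    s"""{"errorClass":"legacy","messageParameters":{"message":"$escaped"},"queryContext":[]}"""
  }
}
```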
Hm, it's a bit odd to me that the user-facing error contains a JSON exception. E.g., if you run it in spark-sql, then users would face such a JSON-encoded string, which I don't think is good.
@HyukjinKwon These are not user-facing changes. I changed how golden files are generated in tests. The output of spark-sql is the same.
How about:
{"errorClass":"legacy","messageParameters":["message" -> "original message"],"queryContext":[]}
It makes sense. Let me do that. Thanks.
@cloud-fan @HyukjinKwon @srielau @entong @gengliangwang @anchovYu Could you take a look at this PR one more time, please?
== SQL(line 1, position 8) ==
select element_at(array(1, 2, 3), 5)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
{"errorClass":"INVALID_ARRAY_INDEX_IN_ELEMENT_AT","messageParameters":["5","3","\"spark.sql.ansi.enabled\""],"queryContext":[{"objectType":"","objectName":"","startIndex":7,"stopIndex":35,"fragment":"element_at(array(1, 2, 3), 5"}]}
I think the design of JSON called for messageParameter entry to be a map (parameterName -> parameterValue).
If we put parameterName into the golden files, we prevent tech writers from modifying parameter names in error-classes.json. That seems like an unnecessary restriction, doesn't it? In any case, the order of params is fixed/constant in the code. cc @cloud-fan WDYT?
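For contrast, the two shapes under discussion might look like this. The parameter names in the map form ("indexValue", "arraySize") are invented for illustration; the real names would come from error-classes.json.

```scala
// Positional form, as in the golden file above: values only, order fixed
// by the code, so renaming a parameter does not touch golden files.
val positional =
  """{"errorClass":"INVALID_ARRAY_INDEX_IN_ELEMENT_AT","messageParameters":["5","3"],"queryContext":[]}"""

// Map form (parameterName -> parameterValue) as the JSON design called
// for: self-describing, but golden files now depend on parameter names.
val asMap =
  """{"errorClass":"INVALID_ARRAY_INDEX_IN_ELEMENT_AT","messageParameters":{"indexValue":"5","arraySize":"3"},"queryContext":[]}"""

println(positional)
println(asMap)
```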
I think the design of JSON called for messageParameter entry to be a map (parameterName -> parameterValue).
@@ -71,6 +76,30 @@ trait SQLQueryTestHelper {
  if (isSorted(df.queryExecution.analyzed)) (schema, answer) else (schema, answer.sorted)
}

private def toJson(e: SparkThrowable): String = {
I'm wondering whether we have messed up the order of the PRs here.
In my mental model we would implement the machine-readable error message first and then simply "flip the config" to produce the golden files in that format. What we have here now is hand-crafted JSON. Is that temporary?
Generating the JSON is a small piece of code; I don't see any problem with removing or moving it in the near future, then introducing the config and checking it against the already existing golden files.
What we have here now is hand-crafted JSON.
How else would you generate JSON from SparkThrowable, I wonder?
Is that temporary?
Not final.
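The hand-crafted approach can be sketched without any Spark dependency. The trait below is a stand-in that only mirrors the accessors a SparkThrowable-like type would expose; it is defined here solely to keep the example self-contained and is not Spark's actual API.

```scala
// Self-contained sketch: serialize the machine-readable fields of an
// error into the JSON shape used by the golden files.
trait ThrowableInfo {
  def errorClass: Option[String]
  def messageParameters: Seq[String]
}

def toJson(e: ThrowableInfo): String = {
  val cls = e.errorClass.map(c => "\"" + c + "\"").getOrElse("null")
  val params = e.messageParameters.map(p => "\"" + p + "\"").mkString("[", ",", "]")
  s"""{"errorClass":$cls,"messageParameters":$params,"queryContext":[]}"""
}

val sample = new ThrowableInfo {
  val errorClass = Some("INVALID_ARRAY_INDEX_IN_ELEMENT_AT")
  val messageParameters = Seq("5", "3")
}
println(toJson(sample))
// {"errorClass":"INVALID_ARRAY_INDEX_IN_ELEMENT_AT","messageParameters":["5","3"],"queryContext":[]}
```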
I have the same question: shall we add a user-facing feature first that allows users to enable JSON style error message? Due to the low coverage of error classes today, I'd also suggest keeping the message unchanged for errors without error classes (no JSON).
LGTM
Merging to master. Thank you, @srielau @cloud-fan @entong @HyukjinKwon for the review.
try {
  result
} catch {
  case e: SparkThrowable with Throwable if e.getErrorClass != null =>
    (emptySchema, Seq(e.getClass.getName, getMessage(e, format)))
We should normalize the error message as before, see L94
// Do not output the logical plan tree which contains expression IDs.
// Also implement a crude way of masking expression IDs in the error message
// with a generic pattern "###".
val msg = if (a.plan.nonEmpty) a.getSimpleMessage else a.getMessage
(emptySchema, Seq(a.getClass.getName, msg.replaceAll("#\\d+", "#x")))
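The masking rule from that snippet can be exercised in isolation; this is just the regex from the code above wrapped in a small helper.

```scala
// Mask expression IDs such as "a#123" to the stable placeholder "a#x"
// so that regenerated golden files do not churn on fresh IDs.
def maskExprIds(msg: String): String = msg.replaceAll("#\\d+", "#x")

println(maskExprIds("cannot resolve 'a#123' given input columns: [b#7, c#42]"))
// cannot resolve 'a#x' given input columns: [b#x, c#x]
```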
@cloud-fan Do you have an example where we output a plan in an error with an error class? As far as I know, we are trying to avoid outputting any plans when we migrate to error classes.
The sql fragment (query context) may contain expr IDs.
What changes were proposed in this pull request?
In the PR, I propose to catch exceptions of the type SparkThrowable in the test suite sub-classes of SQLQueryTestHelper, and output the content of SparkThrowable in JSON format, see SQLQueryTestHelper.handleExceptions(). Also, the PR regenerates all SQL golden files.
When the error class is not set (null) in SparkThrowable, we output error messages as is, in the same way as before the PR.
Why are the changes needed?
Does this PR introduce any user-facing change?
No.
How was this patch tested?
By running the affected test suite: