Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-40018][SQL][TESTS] Output SparkThrowable to SQL golden files in JSON format #37452

Closed
wants to merge 5 commits into from

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Aug 9, 2022

What changes were proposed in this pull request?

In the PR, I propose to catch exceptions of the type SparkThrowable in the test suite sub-classes of SQLQueryTestHelper:

  • SQLQueryTestSuite
  • TPCDSQueryTestSuite
  • ThriftServerQueryTestSuite

and output the content of SparkThrowable in the JSON format, see SQLQueryTestHelper.handleExceptions().

Also, the PR regenerates all SQL golden files.

When the error class is set (null) in SparkThrowable, we output error messages as is in the same way as before the PR.

Why are the changes needed?

  1. To put only important information to SQL golden files
  2. To avoid dependencies from the content of error messages

Does this PR introduce any user-facing change?

No.

How was this patch tested?

By running the affected test suite:

$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"

@github-actions github-actions bot added the SQL label Aug 9, 2022
@MaxGekk MaxGekk changed the title [WIP][SQL][TESTS] Output SparkThrowable to SQL golden files in JSON format [WIP][SPARK-40018][SQL][TESTS] Output SparkThrowable to SQL golden files in JSON format Aug 9, 2022
@MaxGekk MaxGekk marked this pull request as ready for review August 9, 2022 15:17
@MaxGekk MaxGekk changed the title [WIP][SPARK-40018][SQL][TESTS] Output SparkThrowable to SQL golden files in JSON format [SPARK-40018][SQL][TESTS] Output SparkThrowable to SQL golden files in JSON format Aug 9, 2022
@MaxGekk
Copy link
Member Author

MaxGekk commented Aug 9, 2022

@cloud-fan @gengliangwang @srielau @anchovYu Could you review this PR, please.

@entong
Copy link

entong commented Aug 9, 2022

344 out of the 470 errors changed do not have an error class associated which means losing all the information about the error. This could hide bugs related to unexpected changes on the errors.

@@ -128,7 +128,7 @@ select sort_array(array('b', 'd'), '1')
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
cannot resolve 'sort_array(array('b', 'd'), '1')' due to data type mismatch: Sort order in second argument requires a boolean literal.; line 1 pos 7
{"errorClass":null,"messageParameters":[],"queryContext":[]}
Copy link
Contributor

@srielau srielau Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm with @entong, we can't do this.
If there is no defined error class we need have a default.
How about:
{"errorClass":"legacy","messageParameters":["message" -> "original message"],"queryContext":[]}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, it's a bit odd to me that the user-facing error contains JSON exception. e.g., if you run it in spark-sql then users would face such JSON encoded string .. that I don't think it's good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HyukjinKwon This is not user-facing changes. I changed the way how golden files are generated in tests. Output of spark-sql is the same.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:
{"errorClass":"legacy","messageParameters":["message" -> "original message"],"queryContext":[]}

It makes sense. Let me do that. Thanks.

@MaxGekk
Copy link
Member Author

MaxGekk commented Aug 10, 2022

@cloud-fan @HyukjinKwon @srielau @entong @gengliangwang @anchovYu Could you take a look at this PR one more time, please.

== SQL(line 1, position 8) ==
select element_at(array(1, 2, 3), 5)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
{"errorClass":"INVALID_ARRAY_INDEX_IN_ELEMENT_AT","messageParameters":["5","3","\"spark.sql.ansi.enabled\""],"queryContext":[{"objectType":"","objectName":"","startIndex":7,"stopIndex":35,"fragment":"element_at(array(1, 2, 3), 5"}]}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the design of JSON called for messageParameter entry to be a map (parameterName -> parameterValue).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we put parameterName to golden files, we prevent tech writers from modifying of parameters names in error-classes.json. It seems it is unnecessary restriction, isn't. In any case, the order of params is fixed/constant in the code. cc @cloud-fan WDYT?

Copy link
Contributor

@srielau srielau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the design of JSON called for messageParameter entry to be a map (parameterName -> parameterValue).

@@ -71,6 +76,30 @@ trait SQLQueryTestHelper {
if (isSorted(df.queryExecution.analyzed)) (schema, answer) else (schema, answer.sorted)
}

private def toJson(e: SparkThrowable): String = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering whether we have messed up the order of the PRs here.
In my mental model I had us implement machineReadable Error message and then simply "flip the config" to produce the golden files in that format. What we have here now is a hand crafted JSON. Is that temporary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generating the JSON file is a small piece of code. Don't see any problem to remove/move it in the near future. And introduce the config, and then check it on the already existing golden files.

What we have here now is a hand crafted JSON.

How else would you generate JSON from SparkThrowable, I wonder.

Is that temporary?

Not final.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the same question: shall we add a user-facing feature first that allows users to enable JSON style error message? Due to the low coverage of error classes today, I'd also suggest keeping the message unchanged for errors without error classes (no JSON).

Copy link
Contributor

@srielau srielau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MaxGekk
Copy link
Member Author

MaxGekk commented Aug 19, 2022

Merging to master. Thank you, @srielau @cloud-fan @entong @HyukjinKwon for review.

@MaxGekk MaxGekk closed this in db5aea6 Aug 19, 2022
try {
result
} catch {
case e: SparkThrowable with Throwable if e.getErrorClass != null =>
(emptySchema, Seq(e.getClass.getName, getMessage(e, format)))
Copy link
Contributor

@cloud-fan cloud-fan Aug 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should normalize the error message as before, see L94

        // Do not output the logical plan tree which contains expression IDs.
        // Also implement a crude way of masking expression IDs in the error message
        // with a generic pattern "###".
        val msg = if (a.plan.nonEmpty) a.getSimpleMessage else a.getMessage
        (emptySchema, Seq(a.getClass.getName, msg.replaceAll("#\\d+", "#x")))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan Do you have an example where we output a plan in an error with error class. As far as I know we are trying to avoid output any plans when we migrate on error classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sql fragment (query context) may contain expr IDs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants