[SPARK-40018][SQL][TESTS] Output `SparkThrowable` to SQL golden files in JSON format #37452

MaxGekk · 2022-08-09T12:40:09Z

What changes were proposed in this pull request?

In the PR, I propose to catch exceptions of the type SparkThrowable in the test suite sub-classes of SQLQueryTestHelper:

SQLQueryTestSuite
TPCDSQueryTestSuite
ThriftServerQueryTestSuite

and output the content of SparkThrowable in the JSON format, see SQLQueryTestHelper.handleExceptions().

Also, the PR regenerates all SQL golden files.

When the error class is not set (null) in SparkThrowable, we output the error message as is, in the same way as before the PR.
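
A minimal sketch of the dispatch described above, not the actual `SQLQueryTestHelper.handleExceptions()` code. `ErrorWithClass` is a hypothetical stand-in for `SparkThrowable` so the example runs without Spark on the classpath, `EXAMPLE_ERROR`-style class names are made up, and the JSON is assembled by hand with only basic escaping.

```scala
object GoldenFileErrorSketch {
  // Hypothetical stand-in for SparkThrowable; not the real Spark interface.
  class ErrorWithClass(
      val errorClass: String, // null when the error has no error class yet
      val params: Seq[(String, String)],
      message: String) extends Exception(message)

  // Escape only backslashes and double quotes; enough for this illustration.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  private def toJson(e: ErrorWithClass): String = {
    val ps = e.params.map { case (k, v) => quote(k) + ":" + quote(v) }
    "{\"errorClass\":" + quote(e.errorClass) +
      ",\"messageParameters\":{" + ps.mkString(",") + "}}"
  }

  // Returns the lines that would be written to the golden file for one query.
  def handle(result: => Seq[String]): Seq[String] = {
    try {
      result
    } catch {
      case e: ErrorWithClass if e.errorClass != null =>
        // Error class is set: emit the structured JSON form.
        Seq(e.getClass.getName, toJson(e))
      case e: Exception =>
        // No error class: keep the plain message, as before the PR.
        Seq(e.getClass.getName, e.getMessage)
    }
  }
}
```

With this shape, a golden file depends on the message text only for errors that have not yet been given an error class.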

Why are the changes needed?

To put only important information into SQL golden files
To avoid dependencies on the content of error messages

Does this PR introduce any user-facing change?

No.

How was this patch tested?

By running the affected test suite:

$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"

MaxGekk · 2022-08-09T15:18:51Z

@cloud-fan @gengliangwang @srielau @anchovYu Could you review this PR, please?

entong · 2022-08-09T22:37:27Z

344 out of the 470 changed errors do not have an associated error class, which means losing all the information about the error. This could hide bugs caused by unexpected changes to the errors.

srielau · 2022-08-09T22:57:42Z

sql/core/src/test/resources/sql-tests/results/ansi/array.sql.out

@@ -128,7 +128,7 @@ select sort_array(array('b', 'd'), '1')
 struct<>
 -- !query output
 org.apache.spark.sql.AnalysisException
-cannot resolve 'sort_array(array('b', 'd'), '1')' due to data type mismatch: Sort order in second argument requires a boolean literal.; line 1 pos 7
+{"errorClass":null,"messageParameters":[],"queryContext":[]}


I'm with @entong, we can't do this.
If there is no defined error class, we need to have a default.
How about:
{"errorClass":"legacy","messageParameters":["message" -> "original message"],"queryContext":[]}

Hm, it's a bit odd to me that the user-facing error contains a JSON exception. E.g., if you run it in spark-sql, users would face such a JSON-encoded string, which I don't think is good.

@HyukjinKwon This is not a user-facing change. I changed how golden files are generated in tests. The output of spark-sql is the same.

How about:
{"errorClass":"legacy","messageParameters":["message" -> "original message"],"queryContext":[]}

It makes sense. Let me do that. Thanks.
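
A minimal sketch of the proposed default, assuming the `"message" -> "original message"` notation above stands for a JSON object with a single `message` key. The object and method names are hypothetical, and the escaping covers only backslashes and double quotes.

```scala
object LegacyErrorClassSketch {
  // Escape only backslashes and double quotes; enough for this illustration.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Wrap an error that has no error class in the suggested "legacy" envelope,
  // so the original message text is still recorded in the golden file.
  def legacyJson(message: String): String =
    "{\"errorClass\":\"legacy\",\"messageParameters\":{\"message\":" +
      quote(message) + "},\"queryContext\":[]}"
}
```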

MaxGekk · 2022-08-10T09:30:25Z

@cloud-fan @HyukjinKwon @srielau @entong @gengliangwang @anchovYu Could you take a look at this PR one more time, please?

srielau · 2022-08-10T15:11:51Z

sql/core/src/test/resources/sql-tests/results/ansi/array.sql.out

-== SQL(line 1, position 8) ==
-select element_at(array(1, 2, 3), 5)
-       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+{"errorClass":"INVALID_ARRAY_INDEX_IN_ELEMENT_AT","messageParameters":["5","3","\"spark.sql.ansi.enabled\""],"queryContext":[{"objectType":"","objectName":"","startIndex":7,"stopIndex":35,"fragment":"element_at(array(1, 2, 3), 5"}]}


I think the design of JSON called for messageParameter entry to be a map (parameterName -> parameterValue).

If we put parameterName into golden files, we prevent tech writers from modifying parameter names in error-classes.json. That seems like an unnecessary restriction, doesn't it? In any case, the order of params is fixed in the code. cc @cloud-fan WDYT?
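
The trade-off between the two shapes can be sketched as follows. This is an illustration only: the parameter names (`indexValue`, `arraySize`, `index`, `size`) are hypothetical and not taken from error-classes.json.

```scala
object MessageParametersSketch {
  // Escape only backslashes and double quotes; enough for this illustration.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Positional form: values only, in the order fixed by the code.
  def asArray(params: Seq[(String, String)]): String =
    params.map { case (_, v) => quote(v) }.mkString("[", ",", "]")

  // Named form: parameterName -> parameterValue.
  def asMap(params: Seq[(String, String)]): String =
    params.map { case (k, v) => quote(k) + ":" + quote(v) }.mkString("{", ",", "}")
}
```

Renaming a parameter leaves the array form unchanged but alters the map form, so golden files that store the map must be regenerated after such a rename. In exchange, the map form is self-describing.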

srielau · 2022-08-10T15:15:55Z

sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala

@@ -71,6 +76,30 @@ trait SQLQueryTestHelper {
    if (isSorted(df.queryExecution.analyzed)) (schema, answer) else (schema, answer.sorted)
  }

+  private def toJson(e: SparkThrowable): String = {


I'm wondering whether we have messed up the order of the PRs here.
In my mental model, we would implement the machine-readable error message first and then simply "flip the config" to produce the golden files in that format. What we have here now is hand-crafted JSON. Is that temporary?

Generating the JSON is a small piece of code. I don't see any problem with removing or moving it in the near future, then introducing the config and checking it against the already existing golden files.

What we have here now is a hand crafted JSON.

How else would you generate JSON from SparkThrowable, I wonder.

Is that temporary?

Not final.

I have the same question: shall we first add a user-facing feature that allows users to enable JSON-style error messages? Due to the low coverage of error classes today, I'd also suggest keeping the message unchanged for errors without error classes (no JSON).

srielau

LGTM

MaxGekk · 2022-08-19T15:10:39Z

Merging to master. Thank you, @srielau @cloud-fan @entong @HyukjinKwon for the review.

cloud-fan · 2022-08-30T05:50:56Z

sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala

    try {
      result
    } catch {
+      case e: SparkThrowable with Throwable if e.getErrorClass != null =>
+        (emptySchema, Seq(e.getClass.getName, getMessage(e, format)))


We should normalize the error message as before, see L94

// Do not output the logical plan tree which contains expression IDs.
// Also implement a crude way of masking expression IDs in the error message
// with a generic pattern "###".
val msg = if (a.plan.nonEmpty) a.getSimpleMessage else a.getMessage
(emptySchema, Seq(a.getClass.getName, msg.replaceAll("#\\d+", "#x")))

@cloud-fan Do you have an example where we output a plan in an error with an error class? As far as I know, we are trying to avoid outputting any plans when we migrate to error classes.

The sql fragment (query context) may contain expr IDs.
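
A minimal sketch of the normalization being requested, reusing the `replaceAll("#\\d+", "#x")` call quoted above. The object name, the method name, and the sample text in the usage note are hypothetical.

```scala
object ExprIdMaskingSketch {
  // Replace every "#<digits>" with "#x" so that golden files do not change
  // when expression IDs are renumbered between runs.
  def maskExprIds(text: String): String = text.replaceAll("#\\d+", "#x")
}
```

Applied to a fragment such as `a#12 + b#34`, this yields `a#x + b#x`. The same masking could be applied to the JSON string as a whole, which would also cover fragments inside the query context.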

Output SparkThrowable to SQL golden files in JSON format

aa6d1a2

github-actions bot added the SQL label Aug 9, 2022

MaxGekk changed the title ~~[WIP][SQL][TESTS] Output SparkThrowable to SQL golden files in JSON format~~ [WIP][SPARK-40018][SQL][TESTS] Output SparkThrowable to SQL golden files in JSON format Aug 9, 2022

MaxGekk marked this pull request as ready for review August 9, 2022 15:17

MaxGekk changed the title ~~[WIP][SPARK-40018][SQL][TESTS] Output SparkThrowable to SQL golden files in JSON format~~ [SPARK-40018][SQL][TESTS] Output SparkThrowable to SQL golden files in JSON format Aug 9, 2022

MaxGekk requested review from cloud-fan and gengliangwang August 9, 2022 15:17

srielau suggested changes Aug 9, 2022

Set the legacy error class by default

7936e2e

srielau reviewed Aug 10, 2022

srielau suggested changes Aug 10, 2022

srielau reviewed Aug 10, 2022

Change the type of messageParameters from array to map

f8db16f

srielau approved these changes Aug 16, 2022

MaxGekk added 2 commits August 19, 2022 13:05

Merge remote-tracking branch 'origin/master' into sql-golden-files-json

27bc5df

Output the legacy messages as is

6a6669b

cloud-fan approved these changes Aug 19, 2022

MaxGekk closed this in db5aea6 Aug 19, 2022

cloud-fan reviewed Aug 30, 2022
