-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-40098][SQL] Format error messages in the Thrift Server #37520
Conversation
@cloud-fan @gengliangwang @HyukjinKwon Could you review this PR, please. |
core/src/main/scala/org/apache/spark/SparkThrowableHelper.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/SparkThrowableHelper.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/SparkThrowableHelper.scala
Outdated
Show resolved
Hide resolved
I want to point out that I'm not married to the labels MINIMAL, STANDARD, and PRETTY.... If someone has more appropriate names .. |
"startIndex - the integer represents the starting index in the query text which throws the exception. The index starts from 0, but it can be -1 if the index is unknown. |
Also this is a map: |
@srielau If you mean #36651 but it increased the position in textual form of query context not in |
Yes I changed the position in the text summary. |
If we assume that the messages in the JSON format will be parsed machinery, it would be more natural to begin the indexes from 0, IMHO. |
After offline discussion with @srielau @cloud-fan , I am going to change |
core/src/main/scala/org/apache/spark/SparkThrowableHelper.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Outdated
Show resolved
Hide resolved
…onf.scala Co-authored-by: Serge Rielau <srielau@users.noreply.github.com>
g.writeStringField(name, value) | ||
} | ||
g.writeEndObject() | ||
g.writeArrayFieldStart("queryContext") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we still need the queryContext
field if there is no query context in the error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove it.
val ERROR_MESSAGE_FORMAT = buildConf("spark.sql.error.messageFormat") | ||
.doc("When PRETTY, the error message consists of textual representation of error class, " + | ||
"message and query context. The MINIMAL and STANDARD formats are pretty JSON formats where " + | ||
"STANDARD includes an additional JSON field `message`.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's also mention where this config applies: thriftserver and sql shell
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this logic, we should mention where every SQL config is applied for. I could mention the Thrift server as one of applications and of the config, only as an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
| "objectName" : "", | ||
| "startIndex" : 8, | ||
| "stopIndex" : 12, | ||
| "fragment" : "1 / " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it a bug? it should be 1 / 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this bug thrift-server only? if yes we can probably create a ticket and ask someone who knows thriftserver better to fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not thrift-server specific:
withSQLConf(SQLConf.ANSI_ENABLED.key -> "true") {
val e = intercept[SparkArithmeticException] {
sql("select 1 / 0").collect()
}
println("'" + e.getQueryContext()(0).fragment() + "'")
}
'1 / '
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the issue in:
override lazy val fragment: String = {
if (!isValid) {
""
} else {
sqlText.get.substring(originStartIndex.get, originStopIndex.get)
}
}
originStopIndex.get
points to the last char of the fragment but substring()
takes a sub-string up to endIndex - 1
.
Let me open a PR and fix this separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the JIRA for the issue: https://issues.apache.org/jira/browse/SPARK-40136
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have opened the PR w/ a bug fix: #37566
What is the reason for the limit? Will it break notebook and query editor?
I think the point is that we want tools to start using it.
Sent from my iPhone
On Aug 17, 2022, at 9:34 PM, Wenchen Fan ***@***.***> wrote:
@cloud-fan commented on this pull request.
________________________________
In sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala<#37520 (comment)>:
@@ -3876,6 +3876,15 @@ object SQLConf {
.booleanConf
.createWithDefault(false)
+ val ERROR_MESSAGE_FORMAT = buildConf("spark.sql.error.messageFormat")
+ .doc("When PRETTY, the error message consists of textual representation of error class, " +
+ "message and query context. The MINIMAL and STANDARD formats are pretty JSON formats where " +
+ "STANDARD includes an additional JSON field `message`.")
let's also mention where this config applies: thriftserver and sql shell
—
Reply to this email directly, view it on GitHub<#37520 (review)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AA22CFGHCBQXN6WA4WU5XEDVZW4OLANCNFSM56RPFPYA>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
override val stopIndex = -1 | ||
override val fragment = "1 / 0" | ||
} | ||
val e = new SparkArithmeticException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's also test an exception with error class but without query context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add a test w/ a sub-class but w/o query context. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, do you mean before this PR thriftserver print error class twice?
ImageFileFormatSuite failed again. I re-ran it locally. Seems like it is just a flaky test. Merging to master. |
### What changes were proposed in this pull request? 1. Respect the SQL config `spark.sql.error.messageFormat` introduced by #37520, and output error messages in the one of format: `PRETTY` (by default), `MINIMAL` or `STANDARD`. 2. In the `PRETTY` format, output the error message of the `AnalysisException` exception in the same way as for other exceptions, i. e. w/o `Error in query:`. 3. Take into account the `silent` CLI option, and output the call stack only when it is `false`. In the `MINIMAL` and `STANDARD` formats don't print the call stack independently from the `silent` mode. ### Why are the changes needed? To respect the SQL config `spark.sql.error.messageFormat` and to be consistent to error outputs of the Thrift Server. ### Does this PR introduce _any_ user-facing change? Yes. The PR changes the behavior for `AnalysisException`. In that case, `spark-sql` does not output the prefix: `Error in query:` by default (format is PRETTY). Before: ```sql spark-sql> DROP TABLE ajhkdha; Error in query: Table or view not found: ajhkdha; line 1 pos 11; ``` After: ```sql spark-sql> DROP TABLE ajhkdha; Table or view not found: ajhkdha; line 1 pos 11; ``` ### How was this patch tested? By running the modified test suites: ``` $ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly org.apache.spark.sql.hive.thriftserver.CliSuite" ``` Closes #37590 from MaxGekk/spark-sql-error-json. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Max Gekk <max.gekk@gmail.com>
@@ -36,11 +36,10 @@ object HiveThriftServerErrors { | |||
" new task for execution, please retry the operation", rejected) | |||
} | |||
|
|||
def runningQueryError(e: Throwable): Throwable = e match { | |||
case st: SparkThrowable => | |||
val errorClassPrefix = Option(st.getErrorClass).map(e => s"[$e] ").getOrElse("") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that thriftserver already print the error class. Can we keep it when the format is PRETTY?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PRETTY message includes the error class, see the test
https://github.com/apache/spark/pull/37520/files#diff-44a4c34050c5ff3cc3488e7e800ad41d791936c97c2f9535371931e43868741bR165
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, do you mean before this PR thriftserver print error class twice?
Yep, it did.
case st: SparkThrowable => | ||
val errorClassPrefix = Option(st.getErrorClass).map(e => s"[$e] ").getOrElse("") | ||
new HiveSQLException( | ||
s"Error running query: ${errorClassPrefix}${st.toString}", st.getSqlState, st) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
found a behavior change here: previously we call st.toString
, but now SparkThrowableHelper.getMessage
will call st.getMessage
at the end.
Can we restore the previous behavior?
…es in the Thrift Server ### What changes were proposed in this pull request? In the PR, I propose: 1. Output errors in the PRETTY format in the same way as before the PR #37520. 2. Do not output non-JSON elements in the MINIMAL and STANDARD formats. ### Why are the changes needed? 1. To not break existing apps that might expect text errors in particular format. 3. Do not output extra text when the Thrift Server outputs errors in an JSON format. ### Does this PR introduce _any_ user-facing change? Yes. ### How was this patch tested? By running the modified tests: ``` $ build/sbt -Phive -Phive-thriftserver "test:testOnly *ThriftServerWithSparkContextInBinarySuite" ``` Closes #37773 from MaxGekk/thrift-serv-json-errors-followup. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Max Gekk <max.gekk@gmail.com>
Please stick with 1. we should not change values based on formatting of the error.
…Sent from my iPhone
On Aug 16, 2022, at 1:38 PM, Maxim Gekk ***@***.***> wrote:
For Json format output, I am ok with 0-based or 1-based. Either way makes sense...
If we assume that the messages in the JSON format will be parsed machinery, it would be more natural to begin the indexes from 0, IMHO.
—
Reply to this email directly, view it on GitHub<#37520 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AA22CFGO7PHMRABGOZVH6M3VZP34PANCNFSM56RPFPYA>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
What changes were proposed in this pull request?
In the PR, I propose to introduce new SQL config
spark.sql.error.messageFormat
to control the format of Spark SQL error messages. The config can have at least the following string values:SparkThrowable
. Compact form w/ minimal set of fields. Aims to be parsed machinery.Added new method
getMessage()
toSparkThrowableHelper
which formatsSparkThrowable
+Throwable
according to the requested format (see the SQL config). When the format is MINIMAL or STANDARD, the JSON record includes the following fields:errorClass
- human-readable, unique, and consistent representation of the error category. It is a string.errorSubClass
- optional sub-class of the error class as a string. It may be missing in the result string.sqlState
- optional string w/ error identifier across SQL engines. It may be missing in the result string.message
- the message template fromerror-classes.json
. It is included to JSON when the format is STANDARD.messageParameters
- a map of message parameter names to its values.queryContext
- an array of query contexts of the error message which the following JSON fields. If it is empty, the array is not outputted:objectType
- the string represents the object type of the query which throws the exception. If the exception is directly from the main query, it should be an empty string.objectName
- the string represents the object name of the query which throws the exception. If the exception is directly from the main query, it should be an empty string.startIndex
- the integer represents the starting index in the query text which throws the exception. The index starts from 1. If the index is unknown, the field is not outputted.stopIndex
- the integer represents the stopping index in the query which throws the exception. The index starts from 1. If the index is unknown, the field is not outputted.fragment
- the string represents the corresponding fragment of the query which throws the exception. If the query fragment cannot be identified, the string is empty.For example:
MINIMAL
STANDARD
When
errorClass
isnull
, Spark outputs the error message in the legacy form where:errorClass
isLEGACY
messageParameters
is an array w/ only one elements as a string. The element contains textual error message.In the case if the specified format is PRETTY, Spark follows existing behavior and outputs error message as plain text, for example:
Also, this PR modifies the Thrift server to output the error messages according to current value of the SQL config
spark.sql.error.messageFormat
.Why are the changes needed?
Does this PR introduce any user-facing change?
Yes.
How was this patch tested?
By running new tests: