[SPARK-36108][SQL] Refactor first set of 20 query parsing errors to use error classes#33535
beliefer wants to merge 10 commits into apache:master from
Kubernetes integration test starting
cc @karenfeng FYI
Kubernetes integration test status success
Kubernetes integration test starting
Test build #141693 has finished for PR 33535 at commit
Kubernetes integration test status failure
Test build #141695 has finished for PR 33535 at commit
karenfeng left a comment
Thanks for working on this! I left some suggestions to simplify the error classes. We should also make sure the SQLSTATEs match the ISO standard; if you're not sure what SQLSTATE fits, you can also leave it empty for now.
    "sqlState" : "10001"
  },
  "INSERT_OVERWRITE_DIRECTORY_UNSUPPORTED" : {
    "message" : [ "INSERT OVERWRITE DIRECTORY is not supported" ],
It'd be cleaner if we parametrized this error message to create a single error class representing simple operations that are unsupported in all cases, such as OPERATION_UNSUPPORTED: %s is not supported. Then we can collapse this with some of the error classes below.
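As a rough illustration of this suggestion, the Scala sketch below keeps one message template per error class and fills the `%s` placeholder with the operation name, so several fixed "... is not supported" messages collapse into a single entry. The `ErrorClassSketch` object and its in-memory map are hypothetical stand-ins for entries in the error-classes JSON file, not Spark's actual lookup code.

```scala
object ErrorClassSketch {
  // Hypothetical in-memory stand-in for the JSON file: error class -> message template.
  val templates: Map[String, String] = Map(
    "OPERATION_UNSUPPORTED" -> "%s is not supported"
  )

  // Look up the template for `errorClass` and substitute `params` into its %s slots.
  def message(errorClass: String, params: String*): String = {
    val template = templates.getOrElse(
      errorClass,
      throw new IllegalArgumentException(s"Unknown error class: $errorClass"))
    template.format(params: _*)
  }
}
```

For example, `ErrorClassSketch.message("OPERATION_UNSUPPORTED", "INSERT OVERWRITE DIRECTORY")` yields `INSERT OVERWRITE DIRECTORY is not supported`, matching the existing message while leaving only one entry to audit.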
    "message" : [ "There must be at least one WHEN clause in a MERGE statement" ],
    "sqlState" : "10008"
  },
  "NON_LAST_MATCHED_CLAUSE_OMIT_CONDITION" : {
We can also parametrize these two error messages to simplify auditing.
  },
  "INVALID_INSERT_INTO_CONTEXT" : {
    "message" : [ "Invalid InsertIntoContext" ],
    "sqlState" : "10001"
The SQLSTATEs should reflect those set in the ANSI/ISO standard; see https://github.com/apache/spark/blob/master/core/src/main/resources/error/README.md.
    "message" : [ "INSERT OVERWRITE DIRECTORY is not supported" ],
    "sqlState" : "10002"
  },
  "COLUMNS_ALIASES_NOT_ALLOWED_IN_OPERATION" : {
Grammar nit: Columns aliases -> Column aliases.
    "message" : [ "Empty source for merge: you should specify a source table/subquery in merge." ],
    "sqlState" : "10004"
  },
  "UNRECOGNIZED_MATCHED_ACTION" : {
Can we simplify this by parametrizing? Then we only need one for MATCHED/NOT_MATCHED.
    "message" : [ "LATERAL cannot be used together with PIVOT in FROM clause" ],
    "sqlState" : "10016"
  },
  "LATERAL_JOIN_WITH_NATURAL_JOIN_UNSUPPORTED" : {
I think we can merge these as well.
    "sqlState" : "10020"
  }
} No newline at end of file
}
This newline may be causing the test failures; we guarantee that the spacing is correct with a round trip read/write in the unit tests.
    "message" : [ "Writing job aborted" ],
    "sqlState" : "40000"
  },
  "INVALID_INSERT_INTO_CONTEXT" : {
These need to be in alphabetical order to pass the style tests.
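The two formatting points raised here, a trailing newline at the end of the file and error classes listed alphabetically, can be checked mechanically. The sketch below is a hypothetical stand-alone version that works on plain strings; it is not Spark's style test, which (per the earlier comment) relies on a round-trip read/write of the JSON.

```scala
object ErrorFileStyleSketch {
  // True if every adjacent pair of names is in strictly ascending order.
  def isSorted(names: Seq[String]): Boolean =
    firstOutOfOrder(names).isEmpty

  // The first adjacent pair (a, b) where a does not sort strictly before b, if any.
  def firstOutOfOrder(names: Seq[String]): Option[(String, String)] =
    names.zip(names.drop(1)).find { case (a, b) => a.compareTo(b) >= 0 }

  // True if the file text ends with exactly one newline character.
  def endsWithSingleNewline(content: String): Boolean =
    content.endsWith("\n") && !content.endsWith("\n\n")
}
```

Returning the offending pair from `firstOutOfOrder` makes a failure easier to act on than a bare "not sorted" result.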
Can you tell me how to determine which SQLSTATEs to use?
Kubernetes integration test starting
Kubernetes integration test unable to build dist. exiting with code: 1
Kubernetes integration test status failure
Test build #141771 has finished for PR 33535 at commit
Hi @beliefer! The SQLSTATEs are defined in the ISO standard, but it's behind a paywall. You can look at free resources like the Wikipedia article, or the Oracle or DB2 manuals (although they often define their own classes and subclasses). I'll push a PR soon with a summarized version of the ISO standards.
Test build #141772 has finished for PR 33535 at commit
  },
  "COLUMN_ALIASES_NOT_ALLOWED_IN_OPERATION" : {
    "message" : [ "Columns aliases are not allowed in %s." ],
    "sqlState" : "10000"
See #33560 for the standard SQLSTATEs. I think this would be 42000; we'll do another pass before the 3.3 release to clean up any incorrect ones.
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #141803 has finished for PR 33535 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #141812 has finished for PR 33535 at commit
ping @karenfeng @HyukjinKwon: any other suggestions?
karenfeng left a comment
Thanks for doing this! LGTM. @HyukjinKwon, can you also take a look?
ping @cloud-fan too.
    "sqlState" : "42000"
  },
  "COLUMN_ALIASES_NOT_ALLOWED_IN_OPERATION" : {
    "message" : [ "Columns aliases are not allowed in %s." ],
columns aliases -> column aliases?
    "sqlState" : "42000"
  },
  "INVALID_INSERT_INTO_CONTEXT" : {
    "message" : [ "Invalid InsertIntoContext" ],
This error message looks a bit vague. What exactly is the error?
The message "Invalid InsertIntoContext" is thrown when withInsertInto does not match.
This is a user-facing error message, do we expect end-users to understand this message?
Kubernetes integration test starting
Kubernetes integration test status success
Test build #142018 has finished for PR 33535 at commit
Kubernetes integration test starting
Kubernetes integration test status success
Test build #142078 has finished for PR 33535 at commit
  },
  "TRANSFORM_WITH_SERDE_UNSUPPORTED" : {
    "message" : [ "TRANSFORM with serde is only supported in hive mode" ],
    "sqlState" : "42000"
This should probably be 0A000
  "UNSUPPORTED_LATERAL_JOIN_TYPE" : {
    "message" : [ "Unsupported LATERAL join type %s" ],
    "sqlState" : "42000"
  },
This should probably be 0A000
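For context on these two suggestions: a SQLSTATE is five characters (digits or upper-case letters), where the first two give the class and the last three the subclass, with `000` meaning no specific subclass. In the standard, class `42` is "syntax error or access rule violation" and class `0A` is "feature not supported", which is why `0A000` fits errors about unsupported features better than `42000`. The helper below is a hypothetical sketch that only knows these two classes; it is not part of Spark.

```scala
object SqlStateSketch {
  // Only the two classes discussed in this review; the standard defines many more.
  private val knownClasses: Map[String, String] = Map(
    "0A" -> "feature not supported",
    "42" -> "syntax error or access rule violation"
  )

  // Five characters, each a digit 0-9 or an upper-case letter A-Z.
  def isWellFormed(sqlState: String): Boolean =
    sqlState.length == 5 &&
      sqlState.forall(c => (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z'))

  // Split a well-formed code into (class, subclass), e.g. "0A000" -> ("0A", "000").
  def split(sqlState: String): Option[(String, String)] =
    if (isWellFormed(sqlState)) Some((sqlState.take(2), sqlState.drop(2))) else None

  // Describe the class of a code, if it is one of the known classes.
  def describeClass(sqlState: String): Option[String] =
    split(sqlState).flatMap { case (cls, _) => knownClasses.get(cls) }
}
```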
case hiveDir: InsertOverwriteHiveDirContext =>
  val (isLocal, storage, provider) = visitInsertOverwriteHiveDir(hiveDir)
  InsertIntoDir(isLocal, storage, provider, query, overwrite = true)
case _ =>

new ParseException(s"Unrecognized matched action: ${ctx.matchedAction().getText}",
  ctx.matchedAction())
new ParseException("UNRECOGNIZED_ACTION",
  Array("matched", ctx.matchedAction().getText), ctx.matchedAction())
Can we make matched all caps? This may make it clearer.
new ParseException(s"Unrecognized not matched action: ${ctx.notMatchedAction().getText}",
  ctx.notMatchedAction())
new ParseException("UNRECOGNIZED_ACTION",
  Array("not matched", ctx.notMatchedAction().getText), ctx.notMatchedAction())
Can we make not matched all caps? This may make it clearer.
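With both capitalization suggestions applied, the shared `UNRECOGNIZED_ACTION` class would render the clause kind in capitals. The sketch below assumes a template of `Unrecognized %s action: %s`, chosen only because it reproduces the two original messages; the PR's actual template is not shown in this excerpt, and `UnrecognizedActionSketch` is a hypothetical name.

```scala
object UnrecognizedActionSketch {
  // Assumed template for the shared UNRECOGNIZED_ACTION error class.
  val Template = "Unrecognized %s action: %s"

  // The two clause kinds, written in capitals as suggested in the review.
  val ClauseKinds: Set[String] = Set("MATCHED", "NOT MATCHED")

  // Build the message for `clauseKind` and the offending action text.
  def message(clauseKind: String, actionText: String): String = {
    require(ClauseKinds.contains(clauseKind), s"Unexpected clause kind: $clauseKind")
    Template.format(clauseKind, actionText)
  }
}
```

Restricting the first parameter to a fixed set keeps the capitalized spelling consistent across call sites.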
def combinationQueryResultClausesUnsupportedError(ctx: QueryOrganizationContext): Throwable = {
  new ParseException(
    "Combination of ORDER BY/SORT BY/DISTRIBUTE BY/CLUSTER BY is not supported", ctx)
  new ParseException("OPERATION_UNSUPPORTED",
We should probably have a separate error class for unsupported combinations.
def lateralJoinWithNaturalJoinUnsupportedError(ctx: ParserRuleContext): Throwable = {
  new ParseException("LATERAL join with NATURAL join is not supported", ctx)
  new ParseException("OPERATION_UNSUPPORTED", Array("LATERAL join with NATURAL join"), ctx)
I think we can also make a separate error class here as well, maybe about how different join types are incompatible.
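One way to act on these two suggestions is to give each family of errors its own parameterized class instead of folding everything into `OPERATION_UNSUPPORTED`. The class names and templates below (`INCOMPATIBLE_JOIN_TYPES`, `UNSUPPORTED_CLAUSE_COMBINATION`) are hypothetical placeholders for illustration, not the names chosen in this PR.

```scala
object DedicatedErrorClassesSketch {
  // Hypothetical dedicated classes instead of the generic OPERATION_UNSUPPORTED.
  val templates: Map[String, String] = Map(
    "INCOMPATIBLE_JOIN_TYPES" -> "The join types %s and %s are incompatible",
    "UNSUPPORTED_CLAUSE_COMBINATION" -> "Combination of %s is not supported"
  )

  // Fill the chosen template; fails fast on an unknown class or a wrong arity.
  def message(errorClass: String, params: String*): String = {
    val template = templates(errorClass)
    val slots = "%s".r.findAllMatchIn(template).size
    require(slots == params.size, s"$errorClass expects $slots parameters, got ${params.size}")
    template.format(params: _*)
  }
}
```

A separate class per family keeps related errors grouped when auditing, and the arity check catches call sites that pass the wrong number of parameters.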
private def intercept(sqlCommand: String, messages: String*): Unit =
  interceptParseException(parsePlan)(sqlCommand, messages: _*)

private def interceptWithErrorClass(sqlCommand: String, messages: String*)(
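The new helper's signature is cut off in this excerpt. As a framework-free illustration of what an error-class-aware assertion can check, the sketch below runs a block, expects an exception tagged with a given error class, and verifies message fragments. `ClassifiedException` and `interceptErrorClass` are hypothetical names; the sketch does not use Spark's `ParseException` or its test utilities.

```scala
object InterceptSketch {
  // Hypothetical exception that carries an error class alongside its message.
  final class ClassifiedException(val errorClass: String, message: String)
    extends Exception(message)

  // Run `body`, require a ClassifiedException with `expectedClass`, and check that
  // its message contains every fragment. Other exception types propagate unchanged.
  def interceptErrorClass(expectedClass: String, fragments: String*)(
      body: => Any): ClassifiedException = {
    val caught: Option[ClassifiedException] =
      try {
        body
        None
      } catch {
        case e: ClassifiedException => Some(e)
      }
    val e = caught.getOrElse(
      throw new AssertionError(s"Expected $expectedClass, but nothing was thrown"))
    assert(e.errorClass == expectedClass,
      s"Expected error class $expectedClass, got ${e.errorClass}")
    fragments.foreach { f =>
      assert(e.getMessage.contains(f), s"Message '${e.getMessage}' lacks '$f'")
    }
    e
  }
}
```

Asserting on the error class rather than only on message text lets tests survive later wording fixes such as "Columns aliases" to "Column aliases".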
    "sqlState" : "42000"
  },
  "INSERTED_VALUE_NUMBER_NOT_MATCH_FIELD_NUMBER" : {
    "message" : [ "The number of inserted values cannot match the fields." ],

    "message" : [ "Invalid pivot column '%s'. Pivot columns must be comparable." ],
    "sqlState" : "42000"
  },
  "INSERTED_VALUE_NUMBER_NOT_MATCH_FIELD_NUMBER" : {
The grammar is a bit strange. Maybe INSERTED_VALUE_AND_FIELD_NUMBER_MISMATCH?
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
@cloud-fan Could you help me to remove
What changes were proposed in this pull request?
This PR refactors some exceptions in QueryParsingErrors to use error classes.
There are currently ~100 exceptions in this file, so this PR only focuses on the first set of 20.
Why are the changes needed?
To improve auditing, reduce duplication, and improve quality of error messages thrown from Spark, we should group them in a single JSON file (as discussed in the mailing list and introduced in SPARK-34920).
Does this PR introduce any user-facing change?
No.
This PR only switches these errors to the new error classes.
How was this patch tested?
Jenkins test.