Skip to content

[SPARK-36108][SQL] Refactor first set of 20 query parsing errors to use error classes#33535

Closed
beliefer wants to merge 10 commits intoapache:masterfrom
beliefer:SPARK-36108
Closed

[SPARK-36108][SQL] Refactor first set of 20 query parsing errors to use error classes#33535
beliefer wants to merge 10 commits intoapache:masterfrom
beliefer:SPARK-36108

Conversation

@beliefer
Copy link
Contributor

What changes were proposed in this pull request?

This PR refactor some exceptions in QueryParsingErrors to use error classes.

There are currently ~100 exceptions in this file; so this PR only focuses on the first set of 20.

Why are the changes needed?

To improve auditing, reduce duplication, and improve quality of error messages thrown from Spark, we should group them in a single JSON file (as discussed in the mailing list and introduced in SPARK-34920).

Does this PR introduce any user-facing change?

'No'.
Just use new error classes.

How was this patch tested?

Jenkins test.

@SparkQA
Copy link

SparkQA commented Jul 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46207/

@HyukjinKwon
Copy link
Member

cc @karenfeng FYI

@SparkQA
Copy link

SparkQA commented Jul 27, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46207/

@SparkQA
Copy link

SparkQA commented Jul 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46209/

@SparkQA
Copy link

SparkQA commented Jul 27, 2021

Test build #141693 has finished for PR 33535 at commit 32b28e0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 27, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46209/

@SparkQA
Copy link

SparkQA commented Jul 27, 2021

Test build #141695 has finished for PR 33535 at commit 9aea3eb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@karenfeng karenfeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! I left some suggestions to simplify the error classes. We should also make sure the SQLSTATEs match the ISO standard; if you're not sure what SQLSTATE fits, you can also leave it empty for now.

"sqlState" : "10001"
},
"INSERT_OVERWRITE_DIRECTORY_UNSUPPORTED" : {
"message" : [ "INSERT OVERWRITE DIRECTORY is not supported" ],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be cleaner if we parametrized this error message to create a single error class representing simple operations that are unsupported in all cases, such as OPERATION_UNSUPPORTED: %s is not supported. Then we can collapse this with some of the error classes below.

"message" : [ "There must be at least one WHEN clause in a MERGE statement" ],
"sqlState" : "10008"
},
"NON_LAST_MATCHED_CLAUSE_OMIT_CONDITION" : {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also parametrize these two error messages to simplify auditing.

},
"INVALID_INSERT_INTO_CONTEXT" : {
"message" : [ "Invalid InsertIntoContext" ],
"sqlState" : "10001"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SQLSTATEs should reflect those set in the ANSI/ISO standard; see https://github.com/apache/spark/blob/master/core/src/main/resources/error/README.md.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure.

"message" : [ "INSERT OVERWRITE DIRECTORY is not supported" ],
"sqlState" : "10002"
},
"COLUMNS_ALIASES_NOT_ALLOWED_IN_OPERATION" : {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammar nit: Columns aliases -> Column aliases.

"message" : [ "Empty source for merge: you should specify a source table/subquery in merge." ],
"sqlState" : "10004"
},
"UNRECOGNIZED_MATCHED_ACTION" : {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we simplify this by parametrizing? Then we only need one for MATCHED/NOT_MATCHED.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

"message" : [ "LATERAL cannot be used together with PIVOT in FROM clause" ],
"sqlState" : "10016"
},
"LATERAL_JOIN_WITH_NATURAL_JOIN_UNSUPPORTED" : {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can merge these as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah.

"sqlState" : "10020"
}
} No newline at end of file
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This newline may be causing the test failures; we guarantee that the spacing is correct with a round trip read/write in the unit tests.

"message" : [ "Writing job aborted" ],
"sqlState" : "40000"
},
"INVALID_INSERT_INTO_CONTEXT" : {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These need to be in alphabetical order to pass the style tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got it.

@beliefer
Copy link
Contributor Author

Thanks for working on this! I left some suggestions to simplify the error classes. We should also make sure the SQLSTATEs match the ISO standard; if you're not sure what SQLSTATE fits, you can also leave it empty for now.

Can you tell me how to know the SQLSTATEs?

@SparkQA
Copy link

SparkQA commented Jul 28, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46284/

@SparkQA
Copy link

SparkQA commented Jul 28, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46285/

@SparkQA
Copy link

SparkQA commented Jul 28, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46284/

@SparkQA
Copy link

SparkQA commented Jul 28, 2021

Test build #141771 has finished for PR 33535 at commit 91d05b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@karenfeng
Copy link
Contributor

Hi @beliefer! The SQLSTATEs are in the ISO standard, but it's behind a paywall. You can look at free resources like the Wikipedia article, or the Oracle or DB2 manuals (although they often define their own classes and subclasses). I'll push a PR soon with a summarized version of the ISO standards.

@SparkQA
Copy link

SparkQA commented Jul 28, 2021

Test build #141772 has finished for PR 33535 at commit 907e96d.

  • This patch fails from timeout after a configured wait of 500m.
  • This patch merges cleanly.
  • This patch adds no public classes.

},
"COLUMN_ALIASES_NOT_ALLOWED_IN_OPERATION" : {
"message" : [ "Columns aliases are not allowed in %s." ],
"sqlState" : "10000"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #33560 for the standard SQLSTATEs. I think this would be 42000; we'll do another pass before the 3.3 release to clean up any incorrect ones.

@SparkQA
Copy link

SparkQA commented Jul 29, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46316/

@SparkQA
Copy link

SparkQA commented Jul 29, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46316/

@SparkQA
Copy link

SparkQA commented Jul 29, 2021

Test build #141803 has finished for PR 33535 at commit b1f97b0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 29, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46325/

@SparkQA
Copy link

SparkQA commented Jul 29, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46325/

@SparkQA
Copy link

SparkQA commented Jul 29, 2021

Test build #141812 has finished for PR 33535 at commit f3380a5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Copy link
Contributor Author

ping @karenfeng @HyukjinKwon Any other suggestion?

Copy link
Contributor

@karenfeng karenfeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this! LGTM. @HyukjinKwon, can you also take a look?

@beliefer
Copy link
Contributor Author

beliefer commented Aug 2, 2021

ping @cloud-fan too.

"sqlState" : "42000"
},
"COLUMN_ALIASES_NOT_ALLOWED_IN_OPERATION" : {
"message" : [ "Columns aliases are not allowed in %s." ],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

columns aliases -> column aliases?

"sqlState" : "42000"
},
"INVALID_INSERT_INTO_CONTEXT" : {
"message" : [ "Invalid InsertIntoContext" ],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error message looks a big vague. What exactly the error is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message "Invalid InsertIntoContext" is thrown when withInsertInto does not match.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a user-facing error message, do we expect end-users to understand this message?

@SparkQA
Copy link

SparkQA commented Aug 4, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46529/

@SparkQA
Copy link

SparkQA commented Aug 4, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46529/

@SparkQA
Copy link

SparkQA commented Aug 4, 2021

Test build #142018 has finished for PR 33535 at commit 3971047.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46591/

@SparkQA
Copy link

SparkQA commented Aug 5, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46591/

@SparkQA
Copy link

SparkQA commented Aug 5, 2021

Test build #142078 has finished for PR 33535 at commit 86ea00d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

},
"TRANSFORM_WITH_SERDE_UNSUPPORTED" : {
"message" : [ "TRANSFORM with serde is only supported in hive mode" ],
"sqlState" : "42000"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be 0A000

"UNSUPPORTED_LATERAL_JOIN_TYPE" : {
"message" : [ "Unsupported LATERAL join type %s" ],
"sqlState" : "42000"
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be 0A000

case hiveDir: InsertOverwriteHiveDirContext =>
val (isLocal, storage, provider) = visitInsertOverwriteHiveDir(hiveDir)
InsertIntoDir(isLocal, storage, provider, query, overwrite = true)
case _ =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this removed?

new ParseException(s"Unrecognized matched action: ${ctx.matchedAction().getText}",
ctx.matchedAction())
new ParseException("UNRECOGNIZED_ACTION",
Array("matched", ctx.matchedAction().getText), ctx.matchedAction())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make matched all caps? This may make it clearer.

new ParseException(s"Unrecognized not matched action: ${ctx.notMatchedAction().getText}",
ctx.notMatchedAction())
new ParseException("UNRECOGNIZED_ACTION",
Array("not matched", ctx.notMatchedAction().getText), ctx.notMatchedAction())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make not matched all caps? This may make it clearer.

def combinationQueryResultClausesUnsupportedError(ctx: QueryOrganizationContext): Throwable = {
new ParseException(
"Combination of ORDER BY/SORT BY/DISTRIBUTE BY/CLUSTER BY is not supported", ctx)
new ParseException("OPERATION_UNSUPPORTED",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably have a separate error class for unsupported combinations.


def lateralJoinWithNaturalJoinUnsupportedError(ctx: ParserRuleContext): Throwable = {
new ParseException("LATERAL join with NATURAL join is not supported", ctx)
new ParseException("OPERATION_UNSUPPORTED", Array("LATERAL join with NATURAL join"), ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can also make a separate error class here as well, maybe about how different join types are incompatible.

private def intercept(sqlCommand: String, messages: String*): Unit =
interceptParseException(parsePlan)(sqlCommand, messages: _*)

private def interceptWithErrorClass(sqlCommand: String, messages: String*)(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test!

"sqlState" : "42000"
},
"INSERTED_VALUE_NUMBER_NOT_MATCH_FIELD_NUMBER" : {
"message" : [ "The number of inserted values cannot match the fields." ],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: cannot -> does not

"message" : [ "Invalid pivot column '%s'. Pivot columns must be comparable." ],
"sqlState" : "42000"
},
"INSERTED_VALUE_NUMBER_NOT_MATCH_FIELD_NUMBER" : {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The grammar is a bit strange. Maybe INSERTED_VALUE_AND_FIELD_NUMBER_MISMATCH?

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Dec 26, 2021
@beliefer
Copy link
Contributor Author

@cloud-fan Could you help me to remove Stale label?

@github-actions github-actions bot closed this Dec 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants