Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-36291][SQL] Refactor second set of 20 in QueryExecutionErrors to use error classes #33839

Conversation

dgd-contributor
Copy link

What changes were proposed in this pull request?

Refactor some exceptions in QueryExecutionErrors to use error classes.

inputTypeUnsupportedError
invalidFractionOfSecondError
overflowInSumOfDecimalError
overflowInIntegralDivideError
mapSizeExceedArraySizeWhenZipMapError
copyNullFieldNotAllowedError
literalTypeUnsupportedError
noDefaultForDataTypeError
doGenCodeOfAliasShouldNotBeCalledError
orderedOperationUnsupportedByDataTypeError
regexGroupIndexLessThanZeroError
regexGroupIndexExceedGroupCountError
invalidUrlError
dataTypeOperationUnsupportedError
mergeUnsupportedByWindowFunctionError
dataTypeUnexpectedError
typeUnsupportedError
negativeValueUnexpectedError
addNewFunctionMismatchedWithFunctionError
cannotGenerateCodeForUncomparableTypeError

Why are the changes needed?

There are currently ~350 exceptions in this file; so this PR only focuses on the second set of 20.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existed UT Testcase

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@HyukjinKwon HyukjinKwon changed the title [SPARK-36291][SQL] refactor second set of 20 in QueryExecutionErrors to use error classes [SPARK-36291][SQL] Refactor second set of 20 in QueryExecutionErrors to use error classes Aug 26, 2021
@HyukjinKwon
Copy link
Member

cc @karenfeng FYI

@dgd-contributor dgd-contributor changed the title [SPARK-36291][SQL] Refactor second set of 20 in QueryExecutionErrors to use error classes [WIP][SPARK-36291][SQL] Refactor second set of 20 in QueryExecutionErrors to use error classes Aug 26, 2021
"UNABLE_TO_ACQUIRE_MEMORY" : {
"message" : [ "Unable to acquire %s bytes of memory, got %s" ]
},
"UNEXPECTED_DATATYPE" : {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we try to merge them?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we merge them like

"UNEXPECTED" : {
    "message" : [ "Unexpected %s" ],
    "sqlState" : "22000"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to just keep one or the other; maybe UNEXPECTED_TYPE?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think 42000 may be a better fit.

@Peng-Lei
Copy link
Contributor

There is no suitable SQLSTATE for all error class added newly ?

@dgd-contributor
Copy link
Author

dgd-contributor commented Aug 30, 2021

There is no suitable SQLSTATE for all error class added newly ?

No I haven't working on adding sqlstate, I will add them soon

Copy link
Contributor

@karenfeng karenfeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @dgd-contributor! I left some comments.

core/src/main/resources/error/error-classes.json Outdated Show resolved Hide resolved
core/src/main/resources/error/error-classes.json Outdated Show resolved Hide resolved
core/src/main/resources/error/error-classes.json Outdated Show resolved Hide resolved
"UNABLE_TO_ACQUIRE_MEMORY" : {
"message" : [ "Unable to acquire %s bytes of memory, got %s" ]
},
"UNEXPECTED_DATATYPE" : {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to just keep one or the other; maybe UNEXPECTED_TYPE?

core/src/main/resources/error/error-classes.json Outdated Show resolved Hide resolved
core/src/main/resources/error/error-classes.json Outdated Show resolved Hide resolved
core/src/main/resources/error/error-classes.json Outdated Show resolved Hide resolved
"UNABLE_TO_ACQUIRE_MEMORY" : {
"message" : [ "Unable to acquire %s bytes of memory, got %s" ]
},
"UNEXPECTED_DATATYPE" : {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think 42000 may be a better fit.

core/src/main/resources/error/error-classes.json Outdated Show resolved Hide resolved
core/src/main/scala/org/apache/spark/SparkException.scala Outdated Show resolved Hide resolved
Copy link
Contributor

@karenfeng karenfeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @dgd-contributor! I left some comments, please tag me when you address them.

@@ -14,13 +21,20 @@
"message" : [ "Found duplicate keys '%s'" ],
"sqlState" : "23000"
},
"EXCEED_ARRAY_SIZE_WHEN_ZIP_MAP" : {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make much sense grammatically. How about ZIPPED_MAPS_EXCEED_SIZE_LIMIT?

@@ -46,6 +68,10 @@
"message" : [ "Index %s must be between 0 and the length of the ArrayData." ],
"sqlState" : "22023"
},
"INTEGRAL_DIVIDE_OVERFLOW" : {
"message" : [ "Overflow in integral divide." ],
"sqlState" : "22015"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 22003 is a better fit here.

},
"GROUP_INDEX_LESS_THAN_ZERO" : {
"message" : [ "The specified group index cannot be less than zero" ],
"sqlState" : "07009"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 42000 is a better fit here as well.

"FAILED_RENAME_PATH" : {
"message" : [ "Failed to rename %s to %s as destination already exists" ],
"sqlState" : "22023"
},
"FAILED_SET_ORIGINAL_PERMISSION_BACK" : {
"message" : [ "Failed to set original permission %s back to the created path: %s. Exception: %s" ]
},
"FOUND_NEGATIVE_VALUES" : {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think UNEXPECTED_NEGATIVE_VALUES is a better descriptor.

"CONCURRENT_QUERY" : {
"message" : [ "Another instance of this query was just started by a concurrent session." ]
},
"COPY_NULL_FIELD" : {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think COPY_NULL_FIELD_NOT_ALLOWED is a better descriptor.

@@ -77,6 +111,14 @@
"message" : [ "PARTITION clause cannot contain a non-partition column name: %s" ],
"sqlState" : "42000"
},
"NOT_MATCH_FUNCTION_NAME" : {
"message" : [ "%s is not matched at addNewFunction" ],
"sqlState" : "42000"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an internal error, let's remove the SQLSTATE.

@@ -77,6 +111,14 @@
"message" : [ "PARTITION clause cannot contain a non-partition column name: %s" ],
"sqlState" : "42000"
},
"NOT_MATCH_FUNCTION_NAME" : {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make much sense grammatically. How about ADD_NEW_FUNCTION_NAME_MISMATCHED?

"UNSUPPORTED_LITERAL_TYPE" : {
"message" : [ "Unsupported literal type %s %s" ],
"sqlState" : "0A000"
},
"UNSUPPORTED_ORDERED_OPERATIONS" : {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be more specific, like ORDERED_OPERATIONS_UNSUPPORTED_BY_DATA_TYPE

"UNRECOGNIZED_SQL_TYPE" : {
"message" : [ "Unrecognized SQL type %s" ],
"sqlState" : "42000"
},
"UNSUPPORTED_DATATYPE" : {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: DATATYPE -> DATA_TYPE.

@dgd-contributor
Copy link
Author

@karenfeng Can you re-review, I've resolve conflict and what you suggested

@dgd-contributor dgd-contributor changed the title [WIP][SPARK-36291][SQL] Refactor second set of 20 in QueryExecutionErrors to use error classes [SPARK-36291][SQL] Refactor second set of 20 in QueryExecutionErrors to use error classes Sep 21, 2021
Copy link
Contributor

@karenfeng karenfeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall - I only have one small typo nit. @HyukjinKwon, can you take a pass once that's addressed?

@@ -60,6 +70,14 @@
"GROUPING_SIZE_LIMIT_EXCEEDED" : {
"message" : [ "Grouping sets size cannot be greater than %s" ]
},
"GROUP_INDEX_EXCEED_GROUP_COUNT" : {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: EXCEED -> EXCEEDS

}

def invalidFractionOfSecondError(): DateTimeException = {
new SparkDateTimeException(errorClass = "INVALID_FRACTION_OF_SECOND", Array.empty)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this changed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, just a clean up

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Sep 23, 2021

@dgd-contributor, BTW, I think we shouldn't use one account shared to many people because:

  1. it is ambiguous who reviewers communicate w/, e.g.) reviewers asked one thing to fix in a PR but other PRs still have the same mistakes
  2. it is unclear who's the main contributor in the change

I sent an email to the address in your GitHub profile but I haven't got a reply yet. Would you mind contacting me via gurwls223@apache.org?

@HyukjinKwon
Copy link
Member

@dgd-contributor, please contact me or private@spark.apache.org. As I shared in the email, the submissions from the specific shared account will not be accepted for now.

@dgd-contributor
Copy link
Author

@dgd-contributor, please contact me or private@spark.apache.org. As I shared in the email, the submissions from the specific shared account will not be accepted for now.

Could you please check mail from ncthanh.work@gmail.com? Sorry for the inconvenience because the email from your side has been blocked by our mail system.

@github-actions
Copy link

github-actions bot commented Jan 6, 2022

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jan 6, 2022
@github-actions github-actions bot closed this Jan 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants