Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-45710][SQL] Assign names to error _LEGACY_ERROR_TEMP_21[59,60,61,62] #43567

Closed
wants to merge 4 commits into from

Conversation

dengziming
Copy link
Member

@dengziming dengziming commented Oct 28, 2023

What changes were proposed in this pull request?

This PR are removing _LEGACY_ERROR_TEMP_21[59,60,61,62] and TOO_MANY_ARRAY_ELEMENTS:

  1. _LEGACY_ERROR_TEMP_2159 is used in concat/array_insert;
  2. _LEGACY_ERROR_TEMP_2160 is only used in flatten;
  3. _LEGACY_ERROR_TEMP_2161 is used in array_repeat/array_insert/array_distinct/array_union/array_intersect/array_remove;
  4. _LEGACY_ERROR_TEMP_2162 is used in array_union/array_distinct;
  5. There is another similar error class TOO_MANY_ARRAY_ELEMENTS which are used in UnsafeArrayWriter.java.

I removed these 5 similar error classes and create a new error class COLLECTION_SIZE_LIMIT_EXCEEDED with 3 sub-classes:

  1. PARAMETER is used when the parameter exceed size limit, such as array_repeat with count too large;
  2. FUNCTION is used when trying to create an array exceeding size limit in a function, for example, flatten 2 arrays to a larger array;
  3. INITIALIZE is used in UnsafeArrayWriter.java when trying to initialize an array exceeding size limit.

Why are the changes needed?

To assign proper name as a part of activity in SPARK-37935.

Does this PR introduce any user-facing change?

Yes, the error message will include the error class name.

How was this patch tested?

  1. COLLECTION_SIZE_LIMIT_EXCEEDED.PARAMETER can be tested from use code;
  2. COLLECTION_SIZE_LIMIT_EXCEEDED.FUNCTION is tested using a ColumnarArray in concat/flatten, but can't be tested in array_insert/array_distinct/array_union/array_intersect/array_remove since we need to deduplicate the data and create an array which will cause OOM.
  3. INITIALIZE is already tested in a existing case.

Was this patch authored or co-authored using generative AI tooling?

No.

@dengziming
Copy link
Member Author

The use of the 4 similar errors is a bit confusing, we have separate errors for flatten/concat/union, but we are using CREATE_XXX for array_repeat/array_insert/array_distinct/...
I firstly move CREATE_XXX to a subclass of INVALID_PARAMETER_VALUE, but moved it out since it's used in other places.
Do you think we should merge these 4 errors into one CREATE_ARRAYS_WITH_ELEMENTS_EXCEED_LIMIT. @MaxGekk

Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dengziming There is another similar error class TOO_MANY_ARRAY_ELEMENTS. I would create new error class like COLLECTION_SIZE_LIMIT_EXCEEDED with sub-classes.

Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except of some comments. @dengziming Please, update PR's description according to your recent changes.

},
"INITIALIZE" : {
"message" : [
"cannot initialize array with specified parameters."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"cannot initialize array with specified parameters."
"cannot initialize an array with specified parameters."

"subClass" : {
"FUNCTION" : {
"message" : [
"unsuccessful try to create arrays in function <functionName>."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"unsuccessful try to create arrays in function <functionName>."
"unsuccessful try to create arrays in the function <functionName>."

},
"PARAMETER" : {
"message" : [
"the value of parameter(s) <parameter> in <functionName> is invalid."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"the value of parameter(s) <parameter> in <functionName> is invalid."
"the value of parameter(s) <parameter> in the function <functionName> is invalid."

Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Waiting for CI.

new SparkIllegalArgumentException(
errorClass = "TOO_MANY_ARRAY_ELEMENTS",
errorClass = "COLLECTION_SIZE_LIMIT_EXCEEDED",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a sub-class.

@MaxGekk
Copy link
Member

MaxGekk commented Nov 5, 2023

The failed python GA is not related to the changes, and it passed a couple commits above.
+1, LGTM. Merging to master.
Thank you, @dengziming.

@MaxGekk MaxGekk closed this in 9cbc2d1 Nov 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants