-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-40403][SQL] Calculate unsafe array size using longs to avoid negative size in error message #37852
[SPARK-40403][SQL] Calculate unsafe array size using longs to avoid negative size in error message #37852
Conversation
long totalInitialSize = headerInBytes + fixedPartInBytesLong; | ||
|
||
if (totalInitialSize > Integer.MAX_VALUE) { | ||
throw new IllegalArgumentException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to trigger the error from user space (from SQL for instance). If so, please, introduce an error class and place it to error-classes.json.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, let's make the error more user-facing.
error class: TOO_MANY_ARRAY_ELEMENTS
message: "Cannot initialize array with %numElements elements"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a point of clarification. The issue isn't too many elements per se. It's too many elements for the given element size.
If, in the above example, I had cast val
as int, the collect_list
would have succeeded (i.e., 268271216 int elements is just fine, but the same number of bigint elements is not).
How about this for a message?:
"Cannot initialize array with %numElements elements of size %size"
Caveat: my suggested message will be a little confusing for arrays of non-primitives, where the element size is always 8 (for the size/offset).
cc @cloud-fan FYI |
ca2c406
to
22f70cd
Compare
Merged to master. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM.
What changes were proposed in this pull request?
Change
UnsafeArrayWriter#initialize
to use longs rather than ints when calculating the initial size of the array.Why are the changes needed?
When calculating the initial size in bytes needed for the array,
UnsafeArrayWriter#initialize
uses an int expression, which can overflow. The initialize method then passes the negative size toBufferHolder#grow
, which complains about the negative size.Example (the following will run just fine on a 16GB laptop, despite the large driver size setting):
After a few minutes,
BufferHolder#grow
will throw the following exception:This query was going to fail anyway, but the message makes it looks like a bug in Spark rather than a user problem.
UnsafeArrayWriter#initialize
should calculate using a long expression and fail if the size exceedsInteger.MAX_VALUE
, showing the actual initial size in the error message.Note: This issue is not related to SPARK-39608, as far as I can tell, despite having the same symptom
Does this PR introduce any user-facing change?
Other than a better error message, no.
How was this patch tested?
New unit test.