[WIP][SPARK-42715][SQL] Tips for Optimizing NegativeArraySizeException #40341
@@ -204,7 +204,12 @@ public void initBatch(
 * by copying from ORC VectorizedRowBatch columns to Spark ColumnarBatch columns.
 */
private boolean nextBatch() throws IOException {
recordReader.nextBatch(wrap.batch());
try {
Will Parquet have the same issue?
Good point, I will write a test for that.
recordReader.nextBatch(wrap.batch());
try {
recordReader.nextBatch(wrap.batch());
} catch (NegativeArraySizeException e) {
Is there a way to build a unit test that triggers and catches the exception?
Thanks for the suggestions, they sound good. I will get it done.
I also ran into the same stack trace. How much of an adjustment would be appropriate? @chong0929
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
In ORC batch reads, byte arrays are used to store the data of the columns being read. When the total data of a batch exceeds Int.MaxValue, a NegativeArraySizeException can be thrown. This PR catches that exception and rethrows the same exception type with a friendlier message.
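For illustration only, here is a minimal standalone sketch of the idea, not the code in this PR. It shows how an `int` byte count wraps to a negative value once it passes `Integer.MAX_VALUE`, and how a wrapper can catch the resulting `NegativeArraySizeException` and rethrow the same type with a hint. The names `OrcBatchHintSketch`, `BatchReader`, `totalBytesAsInt`, and `nextBatchWithHint`, as well as the wording of the hint, are hypothetical and are not part of Spark or ORC.

```java
import java.io.IOException;

// Hypothetical sketch; none of these names exist in Spark or ORC.
final class OrcBatchHintSketch {

    /** Stand-in for a call such as recordReader.nextBatch(...). */
    public interface BatchReader {
        boolean nextBatch() throws IOException;
    }

    /** Computes a batch's byte count in int arithmetic, which wraps on overflow. */
    public static int totalBytesAsInt(int rows, int bytesPerRow) {
        return rows * bytesPerRow; // silently wraps past Integer.MAX_VALUE
    }

    /** Runs the read and rethrows NegativeArraySizeException with a hint. */
    public static boolean nextBatchWithHint(BatchReader reader) throws IOException {
        try {
            return reader.nextBatch();
        } catch (NegativeArraySizeException e) {
            // Illustrative wording only; the PR's actual message is not shown here.
            NegativeArraySizeException friendly = new NegativeArraySizeException(
                "The data of one ORC batch appears to exceed Integer.MAX_VALUE bytes; "
                    + "consider reading with a smaller batch size.");
            friendly.initCause(e); // keep the original stack trace as the cause
            throw friendly;
        }
    }

    public static void main(String[] args) throws IOException {
        // 4096 rows of 600,000 bytes each is about 2.46 GB, more than an int can hold.
        int size = totalBytesAsInt(4096, 600_000);
        System.out.println("computed size is negative: " + (size < 0));
        try {
            nextBatchWithHint(() -> new byte[size].length > 0);
        } catch (NegativeArraySizeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the negative size is rejected before any memory is allocated, a check along these lines could be exercised in a unit test without creating a multi-gigabyte ORC file, although a test against the real reader would still need suitable input data.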
Why are the changes needed?
To provide a friendlier message when reading an ORC file fails with java.lang.NegativeArraySizeException.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing tests.