Got orc::ParseError "bad read in nextBuffer" when using SearchArgument with nested struct #1296
Comments
Thank you for reporting, @jnwan.
I have reproduced the issue. The root cause is that the reader tried to read col3 (columnId = 4), which has no stream content: both its PRESENT and DATA streams have zero length, as listed below. The parent of col3 is col2 (columnId = 3), whose values are all null, so the reader should stop at col2 without touching col3.
Rows: 50000
Compression: ZLIB
Compression size: 65536
Calendar: Julian/Gregorian
Type: struct<col0:struct<col1:int>,col2:struct<col3:int>>
Stripe Statistics:
Stripe 1:
Column 0: count: 50000 hasNull: false
Column 1: count: 50000 hasNull: false
Column 2: count: 50000 hasNull: false min: 0 max: 149992 sum: 3752012883
Column 3: count: 0 hasNull: true
Column 4: count: 0 hasNull: true sum: 0
File Statistics:
Column 0: count: 50000 hasNull: false
Column 1: count: 50000 hasNull: false
Column 2: count: 50000 hasNull: false min: 0 max: 149992 sum: 3752012883
Column 3: count: 0 hasNull: true
Column 4: count: 0 hasNull: true sum: 0
Stripes:
Stripe: offset: 3 data: 129019 rows: 50000 tail: 68 index: 216
Stream: column 0 section ROW_INDEX start: 3 length 17
Stream: column 1 section ROW_INDEX start: 20 length 17
Stream: column 2 section ROW_INDEX start: 37 length 122
Stream: column 3 section ROW_INDEX start: 159 length 35
Stream: column 4 section ROW_INDEX start: 194 length 25
Stream: column 2 section DATA start: 219 length 129007
Stream: column 3 section PRESENT start: 129226 length 12
Stream: column 4 section PRESENT start: 129238 length 0
Stream: column 4 section DATA start: 129238 length 0
Encoding column 0: DIRECT
Encoding column 1: DIRECT
Encoding column 2: DIRECT_V2
Encoding column 3: DIRECT
Encoding column 4: DIRECT_V2
I will file a JIRA and fix it shortly.
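To illustrate the root cause described above, here is a minimal sketch of a struct reader that stops at an all-null parent. It is a toy model, not the ORC C++ implementation or the actual fix: `ToyLeaf` and `readStructChild` are hypothetical names, and the streams are modeled as already-decoded vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Toy model only: these types are hypothetical and are not the ORC C++ API.
struct ToyLeaf {
  std::vector<int64_t> data;  // decoded DATA stream; empty when the stream has zero length
  std::size_t pos = 0;        // index of the next unread value

  // Reads `count` values and throws if the stream does not hold that many.
  std::vector<int64_t> read(std::size_t count) {
    assert(pos <= data.size());
    if (pos + count > data.size()) {
      throw std::runtime_error("bad read in nextBuffer");
    }
    const auto first = data.begin() + static_cast<std::ptrdiff_t>(pos);
    std::vector<int64_t> out(first, first + static_cast<std::ptrdiff_t>(count));
    pos += count;
    return out;
  }
};

// Reads one batch of a struct column that has a single int child.
// parentPresent[i] is false when row i of the struct is null. The child only
// stores values for non-null parent rows, so it is asked for that many values.
std::vector<int64_t> readStructChild(const std::vector<bool>& parentPresent,
                                     ToyLeaf& child) {
  const auto nonNull = static_cast<std::size_t>(
      std::count(parentPresent.begin(), parentPresent.end(), true));
  if (nonNull == 0) {
    return {};  // every parent row is null: stop here and never touch the child
  }
  return child.read(nonNull);
}
```

In this model, calling `readStructChild` with 50000 null parent rows and an empty `ToyLeaf` returns an empty result, while calling `read(1)` directly on the same empty leaf throws.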
To fix this bug, ColumnReader needs to handle the case where the data stream does not exist.
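As a rough sketch of what handling a missing data stream could look like, the toy reader below treats an absent or zero-length DATA stream as "no values available" instead of trying to decode it. Everything here is hypothetical (`ToyStripe`, `ToyStreamKey`, `ToyIntReader`); it does not reflect the real ColumnReader classes or the change that was merged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy model only: hypothetical names, not the ORC C++ API.
enum class ToyStreamKind { PRESENT, DATA };
using ToyStreamKey = std::pair<uint64_t, ToyStreamKind>;         // (columnId, kind)
using ToyStripe = std::map<ToyStreamKey, std::vector<int64_t>>;  // decoded streams

class ToyIntReader {
 public:
  // The stripe must outlive the reader, because the reader keeps a pointer into it.
  ToyIntReader(const ToyStripe& stripe, uint64_t columnId) {
    auto it = stripe.find(ToyStreamKey{columnId, ToyStreamKind::DATA});
    // A missing stream and a zero-length stream are treated the same way.
    if (it != stripe.end() && !it->second.empty()) {
      data_ = &it->second;
    }
  }

  bool hasData() const { return data_ != nullptr; }

  // Seeking is a no-op on the stream itself, so it is safe without a stream.
  void seekToStart() { pos_ = 0; }

  // Reading zero values always succeeds; reading more needs a real stream.
  std::vector<int64_t> next(std::size_t count) {
    if (count == 0) {
      return {};
    }
    if (data_ == nullptr || pos_ + count > data_->size()) {
      throw std::runtime_error("DATA stream missing or exhausted");
    }
    assert(pos_ <= data_->size());
    const auto first = data_->begin() + static_cast<std::ptrdiff_t>(pos_);
    std::vector<int64_t> out(first, first + static_cast<std::ptrdiff_t>(count));
    pos_ += count;
    return out;
  }

 private:
  const std::vector<int64_t>* data_ = nullptr;  // null when there is nothing to read
  std::size_t pos_ = 0;
};
```

With a stripe that has a populated DATA stream for column 2 and a zero-length one for column 4, `hasData()` is true for the first reader and false for the second, and the second can still be seeked and asked for zero values without an error.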
Thank you so much, @wgtmac and @coderex2522!
I created a new issue in Jira.
@wgtmac has explained the root cause well. I just want to re-emphasize that the same issue happens with other complex column types such as map: an empty map will also hit the "bad read in nextBuffer" error.
This issue has been fixed in the main branch. Please give it a try and let us know if there is any issue. Thanks, @jnwan!
Verified the issue got fixed! Thank you!
Below is the code to reproduce the issue. It works when I remove the empty struct column "col2", write a small number of rows, or change the value to "rand() % 100".
Am I doing anything wrong?
I am on version 1.7.2.
Code
Stack trace: