Panic when first data page is skipped using ColumnChunkData::Sparse #2543

Closed
thinkharderdev opened this issue Aug 21, 2022 · 0 comments · Fixed by #2552
Labels
bug parquet Changes to the parquet crate

Comments

@thinkharderdev
Contributor

Describe the bug

If a row selection skips the first data page of a column, SerializedPageReader returns an error even though the selection is valid.

To Reproduce

Create a ParquetRecordBatchStream using a RowFilter that skips the first page in any column.

Expected behavior

The stream should read the selected rows without error, even when the first data page of a column is skipped.

Additional context

In GenericColumnReader::has_next, when num_buffered_values is 0, we call GenericColumnReader::read_new_page. In GenericRecordReader::skip_records we have to check whether we are at the end of the column, and that check will always read the first data page. When the selection skips that page, it is never fetched, so we get an error.
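
The control flow described above can be illustrated with a small standalone model. This is only a sketch and not the parquet crate's code: `SparseChunk`, `ToyColumnReader`, and `end_of_column_check` are hypothetical names made up for illustration, and the model assumes that a sparse chunk holds only the pages the selection asked for.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a sparsely fetched column chunk (hypothetical type,
/// not the parquet crate's `ColumnChunkData::Sparse`). Only pages that the
/// row selection needs are present, keyed by page index.
struct SparseChunk {
    num_pages: usize,
    fetched: BTreeMap<usize, Vec<i32>>,
}

/// Toy reader (hypothetical) that mirrors the described behaviour:
/// `has_next` loads a new page whenever nothing is buffered.
struct ToyColumnReader {
    chunk: SparseChunk,
    next_page: usize,
    num_buffered_values: usize,
}

impl ToyColumnReader {
    fn new(chunk: SparseChunk) -> Self {
        ToyColumnReader { chunk, next_page: 0, num_buffered_values: 0 }
    }

    /// Loads the next page in order; errors if that page was never fetched.
    fn read_new_page(&mut self) -> Result<bool, String> {
        if self.next_page >= self.chunk.num_pages {
            return Ok(false); // end of column
        }
        let len = self
            .chunk
            .fetched
            .get(&self.next_page)
            .map(|values| values.len())
            .ok_or_else(|| format!("page {} was not fetched", self.next_page))?;
        self.num_buffered_values = len;
        self.next_page += 1;
        Ok(true)
    }

    /// End-of-column check: with an empty buffer this always reads a page,
    /// which on a fresh reader is the first data page.
    fn has_next(&mut self) -> Result<bool, String> {
        if self.num_buffered_values == 0 {
            return self.read_new_page();
        }
        Ok(true)
    }
}

/// Builds a chunk with `num_pages` pages of which only `fetched_pages` were
/// loaded, then runs the end-of-column check a skip would start with.
fn end_of_column_check(num_pages: usize, fetched_pages: &[usize]) -> Result<bool, String> {
    let fetched: BTreeMap<usize, Vec<i32>> =
        fetched_pages.iter().map(|&i| (i, vec![0; 4])).collect();
    let mut reader = ToyColumnReader::new(SparseChunk { num_pages, fetched });
    reader.has_next()
}
```

In this model, `end_of_column_check(3, &[0, 1, 2])` succeeds, while `end_of_column_check(3, &[1, 2])` returns an error because page 0 is touched even though the selection never needed it.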

@tustvold tustvold added the parquet Changes to the parquet crate label Sep 8, 2022