
bug(parquet): Disabling global statistics but enabling for particular column breaks reading #4587

Closed
ozgrakkurt opened this issue Jul 29, 2023 · 2 comments · Fixed by #4589
Labels: bug, parquet (Changes to the parquet crate)

Comments

@ozgrakkurt

If I write files with:

.set_statistics_enabled(EnabledStatistics::None)
.set_column_statistics_enabled("block_number".into(), EnabledStatistics::Page)

When I query the file with DataFusion, or directly with parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder, reading fails with: "missing offset index"

It seems the writer skips the offset index when page statistics are globally disabled, even for a column that has page statistics enabled.
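The suspected cause can be made concrete with a small std-only Rust model. This is a sketch, not the parquet crate's code: `EnabledStatistics` is a local mirror of the crate's enum, `Props` is a hypothetical stand-in for `WriterProperties`, and the `indexes_*` functions are illustrative. It assumes that a per-column setting overrides the global default, that the column index (page min/max values) is written only for `Page`-level columns, and that the offset index, which only records page locations, needs no statistics at all. `indexes_suspected` encodes the guess in the report: the offset index is gated on the global setting alone.

```rust
use std::collections::HashMap;

/// Local mirror of the levels in the parquet crate's `EnabledStatistics`
/// (hypothetical stand-in, not the real type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EnabledStatistics {
    None,
    Chunk,
    Page,
}

/// Hypothetical stand-in for `WriterProperties`: a global default plus
/// per-column overrides keyed by column name.
pub struct Props {
    pub default_stats: EnabledStatistics,
    pub column_stats: HashMap<String, EnabledStatistics>,
}

impl Props {
    /// The per-column setting wins; otherwise fall back to the global default.
    pub fn statistics_enabled(&self, col: &str) -> EnabledStatistics {
        self.column_stats.get(col).copied().unwrap_or(self.default_stats)
    }
}

/// Which page-index structures a column chunk gets: (column index, offset index).
pub type Indexes = (bool, bool);

/// Expected behaviour: a column index only for `Page`-level columns, and an
/// offset index for every column, since page locations need no statistics.
pub fn indexes_expected(props: &Props, col: &str) -> Indexes {
    (props.statistics_enabled(col) == EnabledStatistics::Page, true)
}

/// Hypothesis from the report: the column index follows the per-column
/// setting, but the offset index is gated on the *global* default, so a
/// per-column `Page` override gets no offset index.
pub fn indexes_suspected(props: &Props, col: &str) -> Indexes {
    (
        props.statistics_enabled(col) == EnabledStatistics::Page,
        props.default_stats == EnabledStatistics::Page,
    )
}
```

With the global default set to `None` and `block_number` overridden to `Page`, the expected model gives `block_number` both indexes and every other column only an offset index. The suspected model instead gives `block_number` a column index with no matching offset index, which would be consistent with a "missing offset index" error on read.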

I would expect that, if the writer doesn't write an offset index, the reader shouldn't try to filter pages by statistics. Also, if set_column_statistics_enabled is not meant to override the global setting in this way, that should be documented.
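The reader-side expectation can also be sketched without the parquet crate. Assuming a local `PageLocation` that mirrors the fields of a Parquet offset-index entry (`offset`, `compressed_page_size`, `first_row_index`), the hypothetical `select_pages` below prunes pages for a requested row range (for example, one derived from page statistics) when an offset index exists, and falls back to reading the whole column chunk when it does not, instead of failing. This illustrates the behaviour requested above; it is not how the parquet crate is implemented.

```rust
use std::ops::Range;

/// Local mirror of one offset-index entry (hypothetical stand-in for the
/// Parquet `PageLocation` struct).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PageLocation {
    pub offset: i64,
    pub compressed_page_size: i32,
    pub first_row_index: i64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PageSelection {
    /// No offset index: read every page of the column chunk.
    All,
    /// Indices of the pages that overlap the requested rows.
    Pages(Vec<usize>),
}

/// Select the pages of one column chunk that overlap `rows` (half-open).
/// `num_rows` is the total row count of the chunk.
pub fn select_pages(
    offset_index: Option<&[PageLocation]>,
    num_rows: i64,
    rows: Range<i64>,
) -> PageSelection {
    let Some(pages) = offset_index else {
        // Without page locations no page can be skipped, so fall back to a full scan.
        return PageSelection::All;
    };
    let selected = pages
        .iter()
        .enumerate()
        .filter(|(i, page)| {
            // A page spans rows up to the next page's first row, or the end of the chunk.
            let end = pages.get(i + 1).map_or(num_rows, |next| next.first_row_index);
            page.first_row_index < rows.end && rows.start < end
        })
        .map(|(i, _)| i)
        .collect();
    PageSelection::Pages(selected)
}
```

Falling back to `All` keeps the read correct at the cost of extra I/O: page pruning is an optimization, so in this model a missing offset index degrades performance rather than causing an error.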

@tustvold
Contributor

This appears to have been fixed by #4567; #4589 adds a test for it.

tustvold added a commit that referenced this issue Aug 1, 2023
* Test disabling page index statistics (#4587)

* Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

@tustvold
Contributor

label_issue.py automatically added labels {'parquet'} from #4589

tustvold added the parquet label (Changes to the parquet crate) on Aug 15, 2023

2 participants