Add public API for decoding parquet footer #1804

Merged: 2 commits, Jun 7, 2022

@tustvold tustvold commented Jun 6, 2022

Which issue does this PR close?

Part of #1605

Part of #1163

Rationale for this change

This makes it possible for AsyncChunkReader implementations to implement AsyncChunkReader::get_metadata without having to duplicate functionality from the parquet crate. It also feeds into the story of separating the IO that fetches bytes from the logic that decodes them.
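As a rough illustration of keeping decoding separate from IO, here is a minimal sketch, not the parquet crate's actual code. It assumes the standard parquet layout in which the last 8 bytes of a file hold a 4-byte little-endian metadata length followed by the magic bytes `PAR1`. The name `decode_footer_sketch` and the `String` error type are hypothetical stand-ins chosen for this example.

```rust
/// Footer size: 4-byte little-endian metadata length + 4-byte magic.
const FOOTER_SIZE: usize = 8;
const PARQUET_MAGIC: &[u8] = b"PAR1";

/// Hypothetical stand-in for a public footer decoder.
/// It only looks at bytes it is given and performs no IO.
fn decode_footer_sketch(footer: &[u8; FOOTER_SIZE]) -> Result<usize, String> {
    // The trailing 4 bytes must be the parquet magic.
    if &footer[4..] != PARQUET_MAGIC {
        return Err("invalid parquet file: corrupt footer".to_string());
    }
    // The leading 4 bytes are the metadata length, little-endian.
    let len = i32::from_le_bytes([footer[0], footer[1], footer[2], footer[3]]);
    if len < 0 {
        return Err(format!("invalid parquet file: metadata length {} is negative", len));
    }
    Ok(len as usize)
}
```

Because the function takes only a fixed-size byte array, a caller can fetch those 8 bytes however it likes (file, object store, in-memory buffer) and hand them over for decoding.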

What changes are included in this PR?

Tweaks the methods used to read the parquet footer so that they can be exposed publicly.

Are there any user-facing changes?

The sync reader will no longer pre-fetch bytes when reading the footer. I'm inclined to think this will be a complete wash in terms of performance, and it makes the implementation simpler.
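For readers unfamiliar with what reading the footer without pre-fetching looks like, the following is a hedged sketch of a two-read approach using only the standard library. The helper name `read_metadata_bytes` is hypothetical and this is not the crate's implementation; it assumes the usual layout of metadata followed by an 8-byte footer (length plus `PAR1` magic), and it returns the still-encoded metadata bytes rather than parsing them.

```rust
use std::io::{self, ErrorKind, Read, Seek, SeekFrom};

const FOOTER_SIZE: usize = 8;
const PARQUET_MAGIC: &[u8] = b"PAR1";

/// Hypothetical helper: returns the raw (still encoded) metadata bytes.
fn read_metadata_bytes<R: Read + Seek>(reader: &mut R) -> io::Result<Vec<u8>> {
    // Read 1: only the fixed-size footer at the very end of the input.
    reader.seek(SeekFrom::End(-(FOOTER_SIZE as i64)))?;
    let mut footer = [0u8; FOOTER_SIZE];
    reader.read_exact(&mut footer)?;

    if &footer[4..] != PARQUET_MAGIC {
        return Err(io::Error::new(ErrorKind::InvalidData, "missing PAR1 magic"));
    }
    let len = i32::from_le_bytes([footer[0], footer[1], footer[2], footer[3]]);
    if len < 0 {
        return Err(io::Error::new(ErrorKind::InvalidData, "negative metadata length"));
    }

    // Read 2: exactly the metadata, which sits immediately before the footer.
    reader.seek(SeekFrom::End(-(FOOTER_SIZE as i64) - len as i64))?;
    let mut metadata = vec![0u8; len as usize];
    reader.read_exact(&mut metadata)?;
    Ok(metadata)
}
```

The trade-off is one extra seek and read in exchange for never fetching bytes that turn out not to be needed.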

@github-actions github-actions bot added the parquet Changes to the parquet crate label Jun 6, 2022
@tustvold tustvold marked this pull request as ready for review June 7, 2022 11:03
@alamb alamb left a comment

lgtm. DRY for the win


- self.seek(SeekFrom::End(-8 - metadata_len)).await?;
+ let metadata_len = decode_footer(&buf)?;
+ self.seek(SeekFrom::End(-8 - metadata_len as i64)).await?;
I wonder if we can make this magic 8 a const somewhere to help readability -- or maybe it is already FOOTER_SIZE

@tustvold tustvold Jun 7, 2022

It actually makes the code harder to read, as FOOTER_SIZE is usize but then SeekFrom wants i64... The compiler checks the array size, and I don't suppose it is likely to change, so I think we should be fine.

Edit: Introduced FOOTER_SIZE_I64 to work around this
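The type mismatch discussed here can be sketched as follows. Array lengths in Rust are `usize`, while `std::io::SeekFrom::End` takes an `i64`, so a second constant avoids a cast at every seek. The constant names come from this thread, but the definitions below and the helper `metadata_seek` are assumptions made for illustration, not the merged code.

```rust
use std::io::SeekFrom;

/// Used where an array length is needed, e.g. `[u8; FOOTER_SIZE]`.
const FOOTER_SIZE: usize = 8;
/// Assumed definition: the same value, typed for use with `SeekFrom`.
const FOOTER_SIZE_I64: i64 = FOOTER_SIZE as i64;

/// Hypothetical helper: the seek target for the start of the metadata,
/// which sits `metadata_len` bytes before the footer.
fn metadata_seek(metadata_len: usize) -> SeekFrom {
    SeekFrom::End(-FOOTER_SIZE_I64 - metadata_len as i64)
}
```

Deriving the `i64` constant from the `usize` one keeps a single source of truth for the value.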

@tustvold tustvold merged commit a439f7f into apache:master Jun 7, 2022