Drop Seek requirement for ParquetWriter #937

Closed
vlushn opened this issue Nov 9, 2021 · 4 comments
Labels
enhancement Any new improvement worthy of a entry in the changelog

Comments


vlushn commented Nov 9, 2021

The Seek trait bound on parquet::file::writer::ParquetWriter prevents streaming Parquet directly over a messaging bus. In the implementation, it seems to be used mainly to write metadata about the size of the data body and the footer: https://github.com/apache/arrow-rs/blob/master/parquet/src/file/writer.rs#L196

It looks like the trait bound could be removed if the writer kept track of the number of bytes written.

@jorgecarleitao
Member

FWIW, I was able to achieve this in parquet2, but it required rewriting the thrift library so that it reports the number of bytes written. This is not doable without a backward-incompatible change to the thrift library.

@vlushn
Author

vlushn commented Nov 9, 2021

> FWIW, I was able to achieve this in parquet2, but it required rewriting the thrift library so that it reports the number of bytes written. This is not doable without a backward-incompatible change to the thrift library.

Oh, good to know. Do you think it might be possible to wrap the writer so that it keeps a cumulative count of the bytes written? That could be enough.
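
To make the wrapping idea concrete, here is a rough sketch of what such a wrapper might look like. The name CountingWriter and its methods are hypothetical and are not taken from the parquet crate; the sketch only assumes that the sink implements std::io::Write.

```rust
use std::io::{self, Write};

/// Hypothetical wrapper (not part of the parquet crate): forwards every
/// write to the inner sink and keeps a running total of bytes written.
pub struct CountingWriter<W: Write> {
    inner: W,
    bytes_written: u64,
}

impl<W: Write> CountingWriter<W> {
    pub fn new(inner: W) -> Self {
        Self { inner, bytes_written: 0 }
    }

    /// Total bytes accepted by the inner writer so far. For a sink that
    /// starts empty, this equals the position a seek would have reported.
    pub fn bytes_written(&self) -> u64 {
        self.bytes_written
    }

    /// Gives the inner sink back to the caller.
    pub fn into_inner(self) -> W {
        self.inner
    }
}

impl<W: Write> Write for CountingWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Count only the bytes the inner writer actually accepted,
        // which may be fewer than buf.len().
        let n = self.inner.write(buf)?;
        self.bytes_written += n as u64;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}
```

With something like this, the offsets and sizes that go into the footer metadata could presumably be read from bytes_written() instead of querying the stream position, so the sink would only need Write.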

@jorgecarleitao
Member

That is an excellent idea. Kudos for that!

@tustvold
Contributor

Closed by #1719
