Make current position available in FileWriter #1691
Comments
Hello, since I opened this feature request … While adapting to the breaking changes, I got the feeling that … Cheers, Markus
Thank you for your feedback, and glad to hear the API is moving in a direction that you like 😄 I think … Edit: I think you're right and …
Hello @tustvold, thanks for your help here. Now I am a little bit confused. In order to implement pacman82/odbc2parquet#190 (tl;dr: I want to stop writing row groups as soon as the file size surpasses a user-defined threshold, and start writing the next row group into a new file), should I add the …
Sorry for confusing things; compressed size is the correct thing to use. I think the crate might be writing the wrong thing for total_byte_size, but that's a separate issue I'll file if/when I confirm it.
My test cases and my users are both happy. See: pacman82/odbc2parquet#190. So …
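The splitting strategy described in this thread (stop adding row groups once the file passes a user-defined size, then continue in a new file) can be sketched in plain Rust. This is a minimal sketch under stated assumptions: `SizeSplitter`, `record_row_group`, and `start_new_file` are hypothetical names, the row group sizes are made-up numbers, and in a real program each size would be the compressed size of the row group just written, as suggested in the comment above, rather than a hard-coded value.

```rust
/// Hypothetical helper that decides when to roll over to a new output file.
struct SizeSplitter {
    threshold: u64,     // user-defined size limit per file, in bytes
    current_bytes: u64, // compressed bytes recorded for the current file so far
    file_index: usize,  // index of the file currently being written
}

impl SizeSplitter {
    fn new(threshold: u64) -> Self {
        SizeSplitter { threshold, current_bytes: 0, file_index: 0 }
    }

    /// Records a finished row group and returns `true` if the current file has
    /// now surpassed the threshold, i.e. the next row group belongs in a new file.
    fn record_row_group(&mut self, compressed_size: u64) -> bool {
        self.current_bytes += compressed_size;
        self.current_bytes > self.threshold
    }

    /// Starts a new file: resets the byte counter and advances the file index.
    fn start_new_file(&mut self) {
        self.current_bytes = 0;
        self.file_index += 1;
    }
}

fn main() {
    // Made-up compressed sizes of successive row groups.
    let row_groups = [400u64, 400, 400, 300, 900, 100];
    let mut splitter = SizeSplitter::new(1_000);
    // For each row group, remember which file it was written to.
    let mut assignment = Vec::new();
    for size in row_groups {
        assignment.push(splitter.file_index);
        if splitter.record_row_group(size) {
            splitter.start_new_file();
        }
    }
    println!("{:?}", assignment);
}
```

With a 1,000-byte threshold this prints `[0, 0, 0, 1, 1, 2]`: the third row group pushes file 0 past the limit, so the fourth starts file 1. Because the check runs after a row group is recorded, a file can overshoot the threshold by up to one row group; a program that must never exceed the limit would instead compare `current_bytes + next_size` against the threshold before writing. A real implementation would also likely open the next file lazily, so that a rollover after the final row group does not leave an empty file behind.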
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I would like a way to track the file size of a parquet file I am writing, so I can split my dataset into chunks of roughly the same size. For more context, please see this issue in the downstream odbc2parquet crate: pacman82/odbc2parquet#190
Describe the solution you'd like
Make the current stream position (i.e. the number of bytes written so far into the inner io::Write) available in the implementation of SerializedFileWriter, or even through the FileWriter trait.
Describe alternatives you've considered
As a workaround, I could create a wrapper around File which shares an Rc<usize> counter with the application logic.
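The workaround above can be sketched as a small wrapper that implements std::io::Write and counts the bytes passed through to the inner writer. Two deviations from the description, both assumptions of this sketch: a plain `Rc<usize>` cannot be incremented through a shared handle, so the counter is an `Rc<Cell<usize>>`; and a `Vec<u8>` stands in for the `File` so the example runs without touching the filesystem. `CountingWriter` is a hypothetical name, not part of any crate.

```rust
use std::cell::Cell;
use std::io::{self, Write};
use std::rc::Rc;

/// Hypothetical wrapper around any `io::Write` that counts the bytes written
/// through it. The counter is shared via `Rc<Cell<usize>>`, so application code
/// can read it while the writer itself is owned elsewhere.
struct CountingWriter<W: Write> {
    inner: W,
    written: Rc<Cell<usize>>,
}

impl<W: Write> CountingWriter<W> {
    /// Returns the wrapper together with a handle to its byte counter.
    fn new(inner: W) -> (Self, Rc<Cell<usize>>) {
        let written = Rc::new(Cell::new(0));
        let writer = CountingWriter { inner, written: Rc::clone(&written) };
        (writer, written)
    }
}

impl<W: Write> Write for CountingWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Count only the bytes the inner writer actually accepted.
        let n = self.inner.write(buf)?;
        self.written.set(self.written.get() + n);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

fn main() -> io::Result<()> {
    // A Vec<u8> stands in for a File so the example runs anywhere.
    let (mut writer, counter) = CountingWriter::new(Vec::<u8>::new());
    writer.write_all(b"PAR1")?;
    writer.write_all(&[0u8; 96])?;
    writer.flush()?;
    println!("bytes written: {}", counter.get());
    Ok(())
}
```

This prints `bytes written: 100`. In a real program the wrapper would be handed to the parquet writer in place of the bare File, while the application keeps the counter handle. The count only covers bytes the writer has already passed to its inner io::Write, so data it presumably still holds in memory (such as an unfinished row group) is not reflected until it is written out. For a writer shared across threads, `Arc<AtomicUsize>` would take the place of `Rc<Cell<usize>>`.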