Get length of compressed stream so far without closing stream? #92

Open
robertfeldt opened this issue Dec 4, 2019 · 2 comments

Comments

@robertfeldt

I would like to get the length of the compressed stream so far, without closing the stream or affecting continued compression. I understand that most codecs might not support this, given their internal block lengths etc., but maybe there are ways to get close to this behavior?

The use case is something like this:

  • We have a very long string/stream which has been compressed already, C(s_long)
  • We now have a set of N shorter strings S_shorts = [s1, ..., sN] and we want to calculate map(length, [C(s_long * s1), ..., C(s_long * sN)]) but without having to redo the whole C(s_long) compression for each of the shorter strings si (since calculating C(s_long) might be costly in time).
  • Note that we only need the lengths of all the C(s_long * si), not their actual bytes.

Any ideas how this can be done as fast as possible? :)

Currently I basically do Huffman coding or dictionary-based compression by hand, so I can save the intermediate tree/dictionary and reuse it for each of the short strings, but it would be nice if there were a way to use more advanced compressors like the CodecX ones in the TranscodingStreams framework.
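A rough sketch of that kind of cached-state approach, written in Python rather than Julia and only as an illustration. It assumes an order-0 Huffman code over bytes and counts payload bits only (no code table); the function names `huffman_bits` and `lengths_with_cached_counts` are hypothetical and not part of any library.

```python
import heapq
from collections import Counter


def huffman_bits(counts):
    """Total payload bits of a Huffman encoding for the given symbol counts."""
    heap = [c for c in counts.values() if c > 0]
    if not heap:
        return 0
    if len(heap) == 1:
        # Assumption: a lone symbol still costs one bit per occurrence.
        return heap[0]
    heapq.heapify(heap)
    total = 0
    # The encoded length equals the sum of all merged (internal node) weights.
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total


def lengths_with_cached_counts(s_long, shorts):
    """Huffman-coded length (in bits) of s_long + s for each s in shorts."""
    base = Counter(s_long)  # the expensive pass over s_long happens once
    lengths = []
    for s in shorts:
        counts = base.copy()  # cost depends on alphabet size, not len(s_long)
        counts.update(s)
        lengths.append(huffman_bits(counts))
    return lengths
```

Each short string then costs one pass over the string itself plus a rebuild over the alphabet, independent of the length of `s_long`.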

@nhz2
Member

nhz2 commented Mar 17, 2024

This seems similar to https://stackoverflow.com/questions/11662745/how-can-one-copy-the-internal-state-of-zlib-compressor-object-in-python

I think a potential solution would be to add deepcopy support for Codecs.
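For reference, the idea from the linked question can be sketched in Python with the standard-library `zlib` module, whose compression objects provide a `copy()` method. This is not TranscodingStreams.jl code, and the helper name `compressed_lengths` is hypothetical; it only illustrates what copy support for a codec would enable.

```python
import zlib


def compressed_lengths(s_long, shorts, level=6):
    """Length in bytes of the zlib stream for s_long + s, for each s in shorts."""
    base = zlib.compressobj(level)
    # Feed the long prefix once; some output may already be emitted here,
    # the rest stays buffered inside the compressor state.
    prefix_len = len(base.compress(s_long))
    lengths = []
    for s in shorts:
        branch = base.copy()  # duplicate the compressor state; base is untouched
        tail = branch.compress(s) + branch.flush()  # finish only the copy
        lengths.append(prefix_len + len(tail))
    return lengths
```

The copy duplicates the compressor's internal state (window and hash tables), so the per-string cost depends on the codec's state size and on the short string, not on the length of `s_long`. Whether that is fast enough would need to be measured for the codec in question.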

@robertfeldt
Author

Yes, deepcopy would really solve this. I'm not sure it would be very performant (which is crucial in my case), but it's worth trying if there is a general use case for supporting deepcopy (at least for some codecs).
