Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZlibEncoder doesn't report correct compressed size #246

Open
moulins opened this issue Jul 29, 2020 · 7 comments
Open

ZlibEncoder doesn't report correct compressed size #246

moulins opened this issue Jul 29, 2020 · 7 comments

Comments

@moulins
Copy link

moulins commented Jul 29, 2020

The following snippet:

fn main() {
    use std::io::Write;

    let buf = Vec::new();
    let mut file = flate2::write::ZlibEncoder::new(buf, flate2::Compression::best());
    file.write_all(b"Hello world!").unwrap();
    file.flush().unwrap();
    println!("reported size: {}", file.total_out());
    println!("actual size: {}", file.finish().unwrap().len());
}

Prints:

reported size: 20
actual size: 26

The 6 missing bytes correspond to the ZLIB header; ZlibEncoder doesn't take into account its size and only returns the size of the wrapped deflate stream.

If this is the intended behavior, I think it should be clarified in the total_out() documentation, because this is very unintuitive and unexpected.

@alexcrichton
Copy link
Member

Thanks for the report! Is this an issue with all the backends? Or just one? If so it may be a bug for that specific backend.

@moulins
Copy link
Author

moulins commented Jul 30, 2020

The bug is reproducible on all 4 backends.
For completeness, I also tested the read and bufread encoders, and their total_in method correctly returns the size including the header.

@alexcrichton
Copy link
Member

Hm ok if this reproduces everywhere it may be best to just update the documentation to indicate it doesn't include the 6-byte header.

@moulins
Copy link
Author

moulins commented Jul 31, 2020

This means there's an asymmetry between write::ZlibEncoder and read::ZlibEncoder though: the write encoder doesn't include the header, but the read encoder does!

Personally, I would prefer this to be fixed, but I don't really know what this entails for the implementation, and as long as the current behavior is correctly documented I can live with it.

@alexcrichton
Copy link
Member

That's true yeah, if all the backends behave consistently we can work around that in each implementation. Seems reasonable to fix then!

@moulins
Copy link
Author

moulins commented Aug 6, 2020

Note: I just found out that my explanation was slightly wrong, as the ZLIB header is only 2 bytes, not 6.

The 6 extra bytes actually correspond to:

  • The 2 bytes ZLIB header at the start of the stream
  • The 4 bytes CRC checksum at the end of the stream

@oyvindln
Copy link
Contributor

oyvindln commented Oct 4, 2020

There zlib wrapper can have further 4 bytes of checksum following the header at the start of the stream if FDICT is used (which is only available with the zlib back-end currently.).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants