
Implement read::FrameEncoder and rename existing APIs to match flate2 #18

Closed
wants to merge 4 commits

Conversation


@emk emk commented Feb 13, 2018

Implements #17.

This patch reorganizes the API to work more like flate2 and adds a new snap::read::FrameEncoder, which can be used to compress a Read object while reading from it (a brief usage sketch follows the list). The new interfaces are:

  • snap::Reader becomes snap::read::FrameDecoder.
  • snap::Writer becomes snap::write::FrameEncoder.
  • The type snap::read::FrameEncoder is new.
  • We do not implement snap::write::FrameDecoder yet, but it could be added.
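
A minimal usage sketch of the renamed API (only the constructors named above are assumed; the `roundtrip` helper and its error handling are illustrative, not code from this patch):

use std::io::{self, Read};

use snap::read::{FrameDecoder, FrameEncoder};

// Round-trip some bytes: compress them by *reading* through the new
// read::FrameEncoder, then decompress them with read::FrameDecoder.
fn roundtrip(data: &[u8]) -> io::Result<Vec<u8>> {
    let mut compressed = Vec::new();
    FrameEncoder::new(data).read_to_end(&mut compressed)?;

    let mut decompressed = Vec::new();
    FrameDecoder::new(&compressed[..]).read_to_end(&mut decompressed)?;
    Ok(decompressed)
}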

As for tests:

  • All the old test cases work.
  • The szip crate tests are now run as part of CI.
  • All the round-trip test cases now include tests to make sure that snap::write::FrameEncoder and snap::read::FrameEncoder produce matching compressed streams.
  • We add a test case that verifies that the output of snap::read::FrameEncoder is the same with very small and very large buffers (because it uses a faster code path for larger buffers).

I was able to refactor out the core of snap::write::FrameEncoder and re-use it, but the two snap::read implementations do not yet share code, because sharing it didn't seem to improve clarity much.

We include `szip` in the standard build, but exclude `snappy-cpp` for
now, because it requires C++ dependencies and isn't normally used. We
also add `szip` to the standard builds on Travis and AppVeyor.
These renamings are as per the discussion at
BurntSushi#17, and they're
intended to match the API of the `flate2` crate.
We factor out some of the code from `snap::write::FrameEncoder`, but it
might be possible to factor out some more code from
`snap::read::FrameDecoder` as well. We provide test cases to make sure
that we encode all the same files the same way that the original code
does.

We have some code which attempts to use either an internal compression
buffer or the one provided by our caller if it's big enough. We probably
want one or two more tests to make sure that this works correctly both
ways.
The `--all` flag doesn't work with older versions of cargo.
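
The internal-versus-caller buffer idea mentioned in the commit message above, as a rough self-contained sketch; `compress_block_into`, `compress`, and `worst_case` are stand-ins invented for illustration, and the actual code in this PR is organized differently:

// Compress one block either directly into the caller's `dst` (fast path,
// no extra copy) or into an internal `scratch` buffer when `dst` is too
// small to hold a worst-case compressed block.
fn compress_block_into(
    block: &[u8],
    dst: &mut [u8],
    scratch: &mut Vec<u8>,
    worst_case: usize,
    compress: impl Fn(&[u8], &mut [u8]) -> usize,
) -> usize {
    if dst.len() >= worst_case {
        // The caller's buffer is big enough: compress straight into it.
        compress(block, dst)
    } else {
        // Otherwise compress into our own buffer; subsequent `read` calls
        // can drain `scratch` into the caller's smaller buffer.
        scratch.resize(worst_case, 0);
        let n = compress(block, &mut scratch[..]);
        scratch.truncate(n);
        n
    }
}
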
@emk emk changed the title WIP: Implement read::FrameEncoder Implement read::FrameEncoder and rename existing APIs to match flate2 Feb 22, 2018

emk commented Feb 22, 2018

OK, this should be ready for code review! I feel like this is only at about 90% of my usual standard, because it took longer than I had planned and I'm feeling a bit out of it this morning. So please don't hesitate to suggest improvements! I think it's probably still possible to share more code between snap::read::FrameDecoder and snap::write::FrameEncoder, but this morning I can't see any way to do it without adding a whole pile of potentially-clunky abstractions that feel a little forced. I was able to share the core of the framing code between write and read, so that's nice.

Thank you very much for considering this patch. And also for all your other awesome Rust stuff; we actually use several of your tools at Faraday and we're always delighted by the quality.


@BurntSushi BurntSushi left a comment


@emk Hi! Thanks so much. Apologies for the delay, but it honestly might be a bit before I can review this more thoroughly. I thank you very much for splitting this out into separate commits---that will make it much easier on me. :-)

I did a brief skim first to see if there's anything I could request at the outset, and I think my biggest request would be more tests. Is there a way to hook into the existing test infrastructure so that this gets the same coverage as the other framed reader/writer?

ci/script.sh (outdated)
@@ -2,7 +2,7 @@

set -ex

-cargo build --all --verbose
+cargo build --verbose
Owner

How old? Since we're going to do a semver bump anyway, we can increase the minimum Rust version required.

Contributor Author

It's probably not 100% necessary to bump the minimum Rust version, because we already run a cargo test --all --verbose further down, once we've established that we have a reasonably new Rust version. So test coverage has actually gone up from before.

@emk emk left a comment


In answer to your question, here's a summary of the tests I added. I think we already have the tests you want, but I'm not 100% sure.

fn read_and_write_frame_encoder_match() {
    use super::{read_frame_press, write_frame_press};
    let d = &$data[..];
    assert_eq!(read_frame_press(d), write_frame_press(d));
Contributor Author

@BurntSushi This is the most important of the new tests. It's part of your testtrip! macro, and for every round-trip test case you already have, it verifies that read::FrameEncoder and write::FrameEncoder produce identical output. Does this seem sufficient to you?
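
For context, the two helpers being compared presumably look roughly like this (the exact test-module code in the PR may differ):

use std::io::{Read, Write};

// Compress by pulling `data` through the reader-based encoder.
fn read_frame_press(data: &[u8]) -> Vec<u8> {
    let mut buf = Vec::new();
    snap::read::FrameEncoder::new(data)
        .read_to_end(&mut buf)
        .unwrap();
    buf
}

// Compress by pushing `data` through the writer-based encoder.
fn write_frame_press(data: &[u8]) -> Vec<u8> {
    let mut buf = Vec::new();
    {
        let mut w = snap::write::FrameEncoder::new(&mut buf);
        w.write_all(data).unwrap();
        w.flush().unwrap();
    }
    buf
}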

// The `read::FrameEncoder` code uses different code paths depending on buffer
// size, so let's test both. Also, very small buffers are a good stress test.
#[test]
fn read_frame_encoder_big_and_little_buffers() {
Contributor Author

This test case tries to verify some buffer-copying optimizations in the new code by trying a huge output buffer and a ridiculously tiny one, and making sure we get the same results.
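
Illustratively, the property being checked is along these lines; `press_with_chunk_size` is a made-up helper, not the code in the PR:

use std::io::Read;

// Compress `data` through read::FrameEncoder, reading the compressed
// stream `chunk` bytes at a time.
fn press_with_chunk_size(data: &[u8], chunk: usize) -> Vec<u8> {
    let mut enc = snap::read::FrameEncoder::new(data);
    let mut out = Vec::new();
    let mut buf = vec![0u8; chunk];
    loop {
        let n = enc.read(&mut buf).unwrap();
        if n == 0 {
            break;
        }
        out.extend_from_slice(&buf[..n]);
    }
    out
}

// The tiny-buffer and huge-buffer paths must agree:
// assert_eq!(press_with_chunk_size(data, 1), press_with_chunk_size(data, 1 << 20));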

@BurntSushi BurntSushi mentioned this pull request Feb 8, 2020
@BurntSushi BurntSushi left a comment


Well, I think this might be a new personal best for me. A whole ~two years have passed since you submitted this pull request! It has actually been sitting in my inbox this whole time, but I just couldn't bring myself to context switch back into this project. It actually took me an entire morning to re-familiarize myself with this crate and thoroughly review this PR!

In any case, since it's been so long, I'm going to make the change I requested myself (unless I hear back from you) and otherwise bring your commits in via a rollup PR (which will do various other cleanup things).

I just wanted to thank you again for this work. It really is outstanding!

// up to the specified number of bytes from a file.
self.src.truncate(0);
try!(self.r.by_ref().take(MAX_BLOCK_SIZE as u64)
     .read_to_end(&mut self.src));
Owner

I know it's been a while, but do you remember what led you to do this instead of just self.r.read(&mut self.src)? Granted, the latter is somewhat at the mercy of the underlying reader. If the underlying reader emits very small reads, then compression is going to be pretty stinky. On the other hand, this approach means that the reader will stubbornly wait until it has a pretty beefy number of bytes or until it hits EOF. I guess I'm just trying to picture what happens when the reader is a blocking network socket. Should our reader really try to sit and wait until its buffer is full?

I'm not a compression expert, so I suppose we should probably do whatever others do... So I drilled down into the flate2 crate, and it looks like it does indeed just issue a read call without trying to fill up as many bytes as it can: https://github.com/alexcrichton/flate2-rs/blob/4d62a8936981a74a377459155e6d41333c66cf62/src/zio.rs#L118-L157 (There is a loop, but that loop is just trying to make sure it reads a non-zero number of bytes.)

Thinking about this more, I think flate2's behavior is "better." While either choice can result in bad things, I think 1) potentially blocking like this is perhaps more surprising and 2) since self.src is appropriately sized, a "well behaved" implementation of std::io::Read really should fill up the buffer with as many bytes as it can. For sure, OS level read calls generally do this.
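
To make the trade-off concrete, here is a simplified paraphrase of the two strategies (not the actual code of either crate; flate2's loop for skipping zero-length reads is omitted):

use std::io::{self, Read};

// Strategy in this PR: keep reading until `src` holds a full block (or EOF),
// which maximizes per-block compression but can stall on a blocking reader.
fn fill_block<R: Read>(r: &mut R, src: &mut Vec<u8>, max_block: u64) -> io::Result<usize> {
    src.clear();
    r.take(max_block).read_to_end(src)
}

// flate2-style: compress whatever a single `read` call returns, so the
// encoder produces output as soon as the underlying reader does.
fn read_some<R: Read>(r: &mut R, src: &mut [u8]) -> io::Result<usize> {
    r.read(src)
}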


emk commented Feb 8, 2020

Please do not feel guilty about how long it took! I also maintain many projects, though not nearly as many as you, and I understand completely.

Please make any changes that you think are appropriate. I fear that any insight I might have today, after a long week, is of no special value.

Thank you once again for so many excellent crates! I've used so much of your Rust code over the years, and it's one of the things that I really enjoy about Rust.

BurntSushi pushed a commit that referenced this pull request Feb 9, 2020
We factor out some of the code from `snap::write::FrameEncoder`. We
provide test cases to make sure that we encode all the same files the
same way that the original code does.

Closes #17, Closes #18