
Implement read::FrameEncoder and rename existing APIs to match flate2 #18

Closed
wants to merge 4 commits

Conversation


@emk emk commented Feb 13, 2018

Implements #17.

This patch reorganizes the API to work more like flate2 and adds a new snap::read::FrameEncoder, which can be used to compress a Read object while reading from it (a brief usage sketch follows the list). The new interfaces are:

  • snap::Reader becomes snap::read::FrameDecoder.
  • snap::Writer becomes snap::write::FrameEncoder.
  • The type snap::read::FrameEncoder is new.
  • We do not implement snap::write::FrameDecoder yet, but it could be added.
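
A minimal usage sketch of the renamed API (only the constructors named above are assumed; the `roundtrip` helper and its error handling are illustrative, not code from this patch):

use std::io::{self, Read};

use snap::read::{FrameDecoder, FrameEncoder};

// Round-trip some bytes: compress them by *reading* through the new
// read::FrameEncoder, then decompress them with read::FrameDecoder.
fn roundtrip(data: &[u8]) -> io::Result<Vec<u8>> {
    let mut compressed = Vec::new();
    FrameEncoder::new(data).read_to_end(&mut compressed)?;

    let mut decompressed = Vec::new();
    FrameDecoder::new(&compressed[..]).read_to_end(&mut decompressed)?;
    Ok(decompressed)
}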

As for tests:

  • All the old test cases work.
  • The szip crate tests are now run as part of CI.
  • All the round-trip test cases now include tests to make sure that snap::write::FrameEncoder and snap::read::FrameEncoder produce matching compressed streams.
  • We add a test case that verifies that the output of snap::read::FrameEncoder is the same with very small and very large buffers (because it uses a faster code path for larger buffers).

I was able to refactor out the core of snap::write::FrameEncoder and re-use it, but the two snap::read implementations do not yet share code, because sharing it didn't seem to improve clarity much.

We include `szip` in the standard build, but exclude `snappy-cpp` for
now, because it requires C++ dependencies and isn't normally used. We
also add `szip` to the standard builds on Travis and AppVeyor.
These renamings are as per the discussion at
BurntSushi#17, and they're
intended to match the API of the `flate2` crate.
We factor out some of the code from `snap::write::FrameEncoder`, but it
might be possible to factor out some more code from
`snap::read::FrameDecoder` as well. We provide test cases to make sure
that we encode all the same files the same way that the original code
does.

We have some code which attempts to use either an internal compression
buffer or the one provided by our caller if it's big enough. We probably
want one or two more tests to make sure that this works correctly both
ways.
The `--all` flag doesn't work with older versions of cargo.
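
The internal-versus-caller buffer idea mentioned in the commit message above, as a rough self-contained sketch; `compress_block_into`, `compress`, and `worst_case` are stand-ins invented for illustration, and the actual code in this PR is organized differently:

// Compress one block either directly into the caller's `dst` (fast path,
// no extra copy) or into an internal `scratch` buffer when `dst` is too
// small to hold a worst-case compressed block.
fn compress_block_into(
    block: &[u8],
    dst: &mut [u8],
    scratch: &mut Vec<u8>,
    worst_case: usize,
    compress: impl Fn(&[u8], &mut [u8]) -> usize,
) -> usize {
    if dst.len() >= worst_case {
        // The caller's buffer is big enough: compress straight into it.
        compress(block, dst)
    } else {
        // Otherwise compress into our own buffer; subsequent `read` calls
        // can drain `scratch` into the caller's smaller buffer.
        scratch.resize(worst_case, 0);
        let n = compress(block, &mut scratch[..]);
        scratch.truncate(n);
        n
    }
}
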
@emk emk changed the title WIP: Implement read::FrameEncoder Implement read::FrameEncoder and rename existing APIs to match flate2 Feb 22, 2018

emk commented Feb 22, 2018

OK, this should be ready for code review! I feel like this is only at about 90% of my usual standard, because it took longer than I had planned and I'm feeling a bit out of it this morning. So please don't hesitate to suggest improvements! I think it's probably still possible to share more code between snap::read::FrameDecoder and snap::write::FrameEncoder, but this morning I can't see any way to do it without adding a whole pile of potentially-clunky abstractions that feel a little forced. I was able to share the core of the framing code between write and read, so that's nice.

Thank you very much for considering this patch. And also for all your other awesome Rust stuff; we actually use several of your tools at Faraday and we're always delighted by the quality.


@BurntSushi BurntSushi left a comment


@emk Hi! Thanks so much. Apologies for the delay, but it honestly might be a bit before I can review this more thoroughly. I thank you very much for splitting this out into separate commits---that will make it much easier on me. :-)

I did a brief skim first to see if there's anything I could request at the outset, and I think my biggest request would be more tests. Is there a way to hook into the existing test infrastructure so that this gets the same coverage as the other framed reader/writer?

ci/script.sh (outdated)
@@ -2,7 +2,7 @@

set -ex

-cargo build --all --verbose
+cargo build --verbose
Owner

How old? Since we're going to do a semver bump anyway, we can increase the minimum Rust version required.

Contributor Author

It's probably not 100% necessary to bump the minimum Rust version, because we already run a cargo test --all --verbose further down, once we've established that we have a reasonably new Rust version. So test coverage has actually gone up from before.

@emk emk left a comment


In answer to your question, here's a summary of the tests I added. I think we already have the tests you want, but I'm not 100% sure.

fn read_and_write_frame_encoder_match() {
    use super::{read_frame_press, write_frame_press};
    let d = &$data[..];
    assert_eq!(read_frame_press(d), write_frame_press(d));
Contributor Author

@BurntSushi This is the most important of the new tests. It's part of your testtrip! macro, and for every round-trip test case you already have, it verifies that read::FrameEncoder and write::FrameEncoder produce identical output. Does this seem sufficient to you?
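
For context, the two helpers being compared presumably look roughly like this (the exact test-module code in the PR may differ):

use std::io::{Read, Write};

// Compress by pulling `data` through the reader-based encoder.
fn read_frame_press(data: &[u8]) -> Vec<u8> {
    let mut buf = Vec::new();
    snap::read::FrameEncoder::new(data)
        .read_to_end(&mut buf)
        .unwrap();
    buf
}

// Compress by pushing `data` through the writer-based encoder.
fn write_frame_press(data: &[u8]) -> Vec<u8> {
    let mut buf = Vec::new();
    {
        let mut w = snap::write::FrameEncoder::new(&mut buf);
        w.write_all(data).unwrap();
        w.flush().unwrap();
    }
    buf
}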

// The `read::FrameEncoder` code uses different code paths depending on buffer
// size, so let's test both. Also, very small buffers are a good stress test.
#[test]
fn read_frame_encoder_big_and_little_buffers() {
Contributor Author

This test case tries to verify some buffer-copying optimizations in the new code by trying a huge output buffer and a ridiculously tiny one, and making sure we get the same results.
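
Illustratively, the property being checked is along these lines; `press_with_chunk_size` is a made-up helper, not the code in the PR:

use std::io::Read;

// Compress `data` through read::FrameEncoder, reading the compressed
// stream `chunk` bytes at a time.
fn press_with_chunk_size(data: &[u8], chunk: usize) -> Vec<u8> {
    let mut enc = snap::read::FrameEncoder::new(data);
    let mut out = Vec::new();
    let mut buf = vec![0u8; chunk];
    loop {
        let n = enc.read(&mut buf).unwrap();
        if n == 0 {
            break;
        }
        out.extend_from_slice(&buf[..n]);
    }
    out
}

// The tiny-buffer and huge-buffer paths must agree:
// assert_eq!(press_with_chunk_size(data, 1), press_with_chunk_size(data, 1 << 20));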

@BurntSushi BurntSushi mentioned this pull request Feb 8, 2020
@BurntSushi BurntSushi left a comment


Well, I think this might be a new personal best for me. A whole ~two years have passed since you submitted this pull request! It has actually been sitting in my inbox this whole time, but I just couldn't bring myself to context switch back into this project. It actually took me an entire morning to re-familiarize myself with this crate and thoroughly review this PR!

In any case, since it's been so long, I'm going to make the change I requested myself (unless I hear back from you) and otherwise bring your commits in via a rollup PR (which will do various other cleanup things).

I just wanted to thank you again for this work. It really is outstanding!

// up to the specified number of bytes from a file.
self.src.truncate(0);
try!(self.r.by_ref().take(MAX_BLOCK_SIZE as u64)
     .read_to_end(&mut self.src));
Owner

I know it's been a while, but do you remember what led you to do this instead of just self.r.read(&mut self.src)? Granted, the latter is somewhat at the mercy of the underlying reader. If the underlying reader emits very small reads, then compression is going to be pretty stinky. On the other hand, this approach means that the reader will stubbornly wait until it has a pretty beefy number of bytes or until it hits EOF. I guess I'm just trying to picture what happens when the reader is a blocking network socket. Should our reader really try to sit and wait until its buffer is full?

I'm not a compression expert, so I suppose we should probably do whatever others do... So I drilled down into the flate2 crate, and it looks like it does indeed just issue a read call without trying to fill up as many bytes as it can: https://github.com/alexcrichton/flate2-rs/blob/4d62a8936981a74a377459155e6d41333c66cf62/src/zio.rs#L118-L157 (There is a loop, but that loop is just trying to make sure it reads a non-zero number of bytes.)

Thinking about this more, I think flate2's behavior is "better." While either choice can result in bad things, I think 1) potentially blocking like this is perhaps more surprising and 2) since self.src is appropriately sized, a "well behaved" implementation of std::io::Read really should fill up the buffer with as many bytes as it can. For sure, OS level read calls generally do this.
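
To make the trade-off concrete, here is a simplified paraphrase of the two strategies (not the actual code of either crate; flate2's loop for skipping zero-length reads is omitted):

use std::io::{self, Read};

// Strategy in this PR: keep reading until `src` holds a full block (or EOF),
// which maximizes per-block compression but can stall on a blocking reader.
fn fill_block<R: Read>(r: &mut R, src: &mut Vec<u8>, max_block: u64) -> io::Result<usize> {
    src.clear();
    r.take(max_block).read_to_end(src)
}

// flate2-style: compress whatever a single `read` call returns, so the
// encoder produces output as soon as the underlying reader does.
fn read_some<R: Read>(r: &mut R, src: &mut [u8]) -> io::Result<usize> {
    r.read(src)
}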


emk commented Feb 8, 2020

Please do not feel guilty about how long it took! I also maintain many projects, though not nearly as many as you, and I understand completely.

Please make any changes that you think are appropriate. I fear that any insight I might have today, after a long week, is of no special value.

Thank you once again for so many excellent crates! I've used so much of your Rust code over the years, and it's one of the things that I really enjoy about Rust.

BurntSushi pushed a commit that referenced this pull request Feb 9, 2020
We factor out some of the code from `snap::write::FrameEncoder`. We
provide test cases to make sure that we encode all the same files the
same way that the original code does.

Closes #17, Closes #18