-
-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement read::FrameEncoder
and rename existing APIs to match flate2
#18
Conversation
We include `szip` in the standard build, but exclude `snappy-cpp` for now, because it requires C++ dependencies and isn't normally used. We also add it to the standard builds on Travis and AppVeyor.
These renamings are as per the discussion at BurntSushi#17, and they're intended to match the API of the `flate2` crate.
We factor out some of the code from `snap::write::FrameEncoder`, but it might be possible to factor out some more code from `snap::read::FrameDecoder` as well. We provide test cases to make sure that we encode all the same files the same way that the original code does. We have some code which attempts to use either an internal compression buffer or the one provided by our caller if it's big enough. We probably want one or two more tests to make sure that this works correctly both ways.
51aad42
to
b329a2b
Compare
The `--all` flag doesn't work with older versions of cargo.
read::FrameEncoder
read::FrameEncoder
and rename existing APIs to match flate2
OK, this should be ready for code review! I feel like this is only at about 90% of my usual standard, because it took longer than I had planned and I'm feeling a bit out of it this morning. So please don't hesitate to suggest to improvements! I think it's probably still possible to share more code between Thank you very much for considering this patch. And also for all your other awesome Rust stuff; we actually use several of your tools at Faraday and we're always delighted by the quality. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emk Hi! Thanks so much. Apologies for the delay, but it honestly might be a bit before I can review this more thoroughly. I thank you very much for splitting this out into separate commits---that will make it much easier on me. :-)
I did a brief skim first to see if there's anything I could request at the outset, and I think my biggest request would be more tests. Is there a way to hook into the existing test infrastructure so that this gets the same coverage as the other framed reader/writer?
ci/script.sh
Outdated
@@ -2,7 +2,7 @@ | |||
|
|||
set -ex | |||
|
|||
cargo build --all --verbose | |||
cargo build --verbose |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How old? Since we're going to do a semver bump anyway, we can increase the minimum Rust version required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably not 100% necessary to bump the minimum Rust version, because we already run a cargo test --all --verbose
further down, once we've established that we have a reasonably new Rust version. So test coverage has actually gone up from before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In answer to your question, here's a summary of the tests I added. I think we already have the tests you want, but I'm not 100% sure.
fn read_and_write_frame_encoder_match() { | ||
use super::{read_frame_press, write_frame_press}; | ||
let d = &$data[..]; | ||
assert_eq!(read_frame_press(d), write_frame_press(d)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BurntSushi This is the most important of the new tests. It's part of your testtrip!
macro, and for every round-trip test case you already have, it verifies that read::FrameEncoder
and write::FrameEncoder
produce identical output. Does this seem sufficient to you?
// The `read::FrameEncoder` code uses different code paths depending on buffer | ||
// size, so let's test both. Also, very small buffers are a good stress test. | ||
#[test] | ||
fn read_frame_encoder_big_and_little_buffers() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case tries to verify some buffer copying optimizations in the new code, by trying with a huge output buffer and a ridiculously tiny one, and making sure we get the same results
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I think this might be a new personal best for me. A whole ~two years have passed since you submitted this pull request! It has actually been sitting in my inbox this whole time, but I just couldn't bring myself to context switch back into this project. It actually took me an entire morning to re-familiarize myself with this crate and thoroughly review this PR!
In any case, since it's been so long, I'm going to make the change I requested myself (unless I hear back from you) and other bring your commits in via a rollup PR (which will do various other cleanup things).
I just wanted to thank you again for this work. It really is outstanding!
// up to the specified number of bytes from a file. | ||
self.src.truncate(0); | ||
try!(self.r.by_ref().take(MAX_BLOCK_SIZE as u64) | ||
.read_to_end(&mut self.src)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it's been a while, but do you remember what led you to do this instead of just self.r.read(&mut self.src)
? Granted, the latter is somewhat at the mercy of the underlying reader. If the underlying reader emits very small reads, then compression is going to be pretty stinky. On the other hand, this approach means that the reader will stubbornly wait until it has a pretty beefy number of bytes or until it hits EOF. I guess I'm just trying to picture what happens when the reader is a blocking network socket. Should our reader really try to sit and wait until its buffer is full?
I'm not a compression expert, so I suppose we should probably do whatever others do... So I drilled down into the flate2
crate, and it looks like it does indeed just issue a read
call without trying to fill up as many bytes as it can: https://github.com/alexcrichton/flate2-rs/blob/4d62a8936981a74a377459155e6d41333c66cf62/src/zio.rs#L118-L157 (There is a loop, but that loop is just trying to make sure it reads a non-zero number of bytes.)
Thinking about this more, I think flate2
's behavior is "better." While either choice can result in bad things, I think 1) potentially blocking like this is perhaps more surprising and 2) since self.src
is appropriately sized, a "well behaved" implementation of std::io::Read
really should fill up the buffer with as many bytes as it can. For sure, OS level read
calls generally do this.
Please do not feel guilty about how long it took! I also maintain many projects, though not nearly as many as you, and I understand completely. Please make any changes that you think are appropriate. I fear that any insight I might have today, after a long week, is no special value. Thank you once again for so many excellent crates! I've used so much of your Rust code over the years, and it's one of the things that I really enjoy about Rust. |
Implements #17.
This patch reorganizes the API to work more like
flate2
and adds a newsnap::read::FrameEncoder
which can be used to compress aRead
object while reading from it. The new interfaces are:snap::Reader
becomessnap::read::FrameDecoder
.snap::Writer
becomessnap::write::FrameEncoder
.snap::read::FrameEncoder
is new.snap::write::FrameDecoder
yet, but it could be added.As for tests:
szip
crate tests are now run as part of CI.snap::write::FrameEncoder
andsnap::read::FrameEncoder
produce matching compressed streams.snap::read::FrameEncoder
is the same with very small and very large buffers (because it uses a faster code path for larger buffers).I was able to refactor out the core of
snap::write::FrameEncoder
and re-use it, but the the twosnap::read
implementations do not yet share code, because it didn't seem to help the clarity very much.