Fix: zstd decompression is truncated to the first frame in the stream #400
Conversation
Ok. This particular .zstd file has multiple frames in it. async-compression can only handle one frame, which is wrong: a stream can have as many frames as it wants. See this code in compression-codecs:

```rust
impl Decode for ZstdDecoder {
    fn decode(...) -> Result<bool, ...> {
        let status = self
            .decoder
            .get_mut()
            .run_on_buffers(input.unwritten(), output.unwritten_mut())?;
        input.advance(status.bytes_read);
        output.advance(status.bytes_written);
        Ok(status.remaining == 0)
    }
}
```

See https://docs.rs/zstd/latest/zstd/stream/raw/struct.Status.html

Do not take my word for this being correct, I do not know. But if you apply

```diff
- Ok(status.remaining == 0)
+ Ok(status.remaining == 0 && status.bytes_read == 0)
```

then this test passes.
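A minimal sketch of the proposed semantics, in Python rather than Rust so it runs against the standard library alone (gzip/zlib multi-member streams stand in for zstd multi-frame streams, and the `decompress_all` helper is invented for illustration): report end-of-stream only once a frame has ended *and* no input remains, resetting the decoder between frames.

```python
import gzip
import zlib

def decompress_all(data: bytes) -> bytes:
    """Decompress a stream that may contain several concatenated members.

    Treating the first member's end-of-stream marker as EOF reproduces the
    truncation bug; resetting and continuing while input remains does not.
    """
    out = bytearray()
    d = zlib.decompressobj(wbits=31)        # wbits=31 selects the gzip wrapper
    while data:
        out += d.decompress(data)
        if not d.eof:                       # mid-member: need more input
            break
        data = d.unused_data                # bytes past this member's boundary
        d = zlib.decompressobj(wbits=31)    # reset for the next member
    return bytes(out)

two = gzip.compress(b"hello\n") + gzip.compress(b"goodbye\n")
assert decompress_all(two) == b"hello\ngoodbye\n"
```

The loop mirrors the suggested `remaining == 0 && bytes_read == 0` check: finishing a frame is not the same as exhausting the input.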
Force-pushed 0718e9d to e5dd0d5.
This fails with 131072 bytes read, which is coincidentally zstd's max block size.
Force-pushed e5dd0d5 to 3b6506a.
The failing test
Thanks. Maybe we need to properly support multi-frame streams, the way gzip does (IIRC the gzip decoder has a reset implementation).
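As an aside on the gzip comparison, the multi-member behavior is easy to demonstrate from Python's standard library (illustrative only, not code from this PR): `gzip.decompress` keeps reading across member boundaries, while a raw single-member `zlib` decoder stops at the first frame boundary, which is the same shape of behavior this issue describes for `ZstdDecoder`.

```python
import gzip
import zlib

# Two independently compressed gzip members, concatenated into one stream.
stream = gzip.compress(b"hello\n") + gzip.compress(b"goodbye\n")

# gzip.decompress keeps reading across member boundaries.
full = gzip.decompress(stream)

# A single-member decoder stops at the first end-of-stream marker.
d = zlib.decompressobj(wbits=31)   # wbits=31 selects the gzip wrapper
first = d.decompress(stream)

assert full == b"hello\ngoodbye\n"
assert first == b"hello\n"
assert d.eof and d.unused_data     # frame finished, but input left over
```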
zstd does have a […]. async-compression already has a […]. Our decoder assumes that you know whether there are multiple frames, and […]
Hmmm. I guess this test does pass with […]. However, the reason I went so deep on this was I REALLY expected ZstdDecoder to do what the command-line tools do:

```shell
$ (echo hello | zstd; echo goodbye | zstd) | zstd -d
hello
goodbye
$ (echo hello | gzip; echo goodbye | gzip) | gzip -d
hello
goodbye
```

whereas this prints only "hello":

```rust
io::copy(&mut ZstdDecoder::new(stdin), &mut stdout).unwrap();
```

Obviously that will be a breaking change, so it probably requires a version bump, but it may also be a "fixing change" by making the code work the way a lot of people already assumed it did.
I think we'd need better documentation for that. And optionally, we may have a […]
Summary: Remote execution with BuildBuddy is still impacted by the bug described in <facebook#1103 (comment)>. Zstd compression can be re-enabled after `async-compression` has been fixed, potentially by <Nullus157/async-compression#400>. Differential Revision: D87992619
…bers(true) Summary: This fixes the truncated materializations when using Remote Execution with BuildBuddy described in <facebook#1103 (comment)>. The fix was discovered by cormacrelf and NobodyXu in <Nullus157/async-compression#400>. Differential Revision: D88011967
…bers(true) (#1156) Summary: Pull Request resolved: #1156 This fixes the truncated materializations when using Remote Execution with BuildBuddy described in <#1103 (comment)>. The fix was discovered by cormacrelf and NobodyXu in <Nullus157/async-compression#400>. Reviewed By: diliop Differential Revision: D88011967 fbshipit-source-id: bc3d92ff77812c5aa67325fcab696de7bd827ce1
This fails with 131072 bytes read, which is coincidentally zstd's max block size. Bug discovered in facebook/buck2#1103.
Clearly poll_read is returning "ready" with 0 bytes read when it should still be pending. I have not done a bisect to find where this bug was introduced, but you could do it with a script that applies this commit as a patch at each bisect eval point.
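To make that last point concrete, here is a hypothetical Python sketch (the `TruncatingReader` class is invented for illustration; it is not the actual async-compression code): both Rust's `AsyncRead` and Python's file protocol treat a zero-byte read as end-of-stream, so a reader that reports "ready, 0 bytes" while data remains makes any copy loop stop early and silently drop the rest.

```python
import io

class TruncatingReader(io.RawIOBase):
    """Hypothetical reader modeling poll_read returning Ready with 0 bytes
    too early: it claims EOF after the first chunk."""
    def __init__(self, chunks):
        self.chunks = list(chunks)
        self.done = False

    def readable(self):
        return True

    def readinto(self, b):
        if self.done or not self.chunks:
            return 0                    # 0 bytes == EOF to every copy loop
        chunk = self.chunks.pop(0)
        b[:len(chunk)] = chunk
        self.done = True                # bug: claims EOF with data remaining
        return len(chunk)

src = TruncatingReader([b"hello\n", b"goodbye\n"])
dst = io.BytesIO()
while True:                             # an io::copy-style loop
    buf = src.read(8192)
    if not buf:
        break                           # zero-byte read: the loop stops here
    dst.write(buf)

assert dst.getvalue() == b"hello\n"     # b"goodbye\n" is silently lost
```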