Description
In trying to use `Zstd::StreamingCompress.new`, I seem to have encountered a couple of different data corruption issues. They present themselves as follows:
- If you use `Zstd::StreamingCompress.new` and pass in a string of roughly 16,000 up to 131,072 bytes with `<<` or `.write`, it will sometimes, at seemingly random lengths, produce a compressed string for which `Zstd.decompress` returns output that differs from the input. Decompression doesn't fail, but the output is different and corrupt. However, the `zstd` CLI tool can decompress the same compressed string successfully, and its output matches the input exactly. So the compressed data doesn't necessarily seem corrupt; rather, `Zstd.decompress` itself seems unable to handle it.
- If you use `Zstd::StreamingCompress.new` and pass in a string of 131,072 bytes or more with `<<` or `.write`, `Zstd::StreamingCompress` consistently produces corrupt output that cannot be decompressed by either the gem or the CLI tool.
- If you use `Zstd::StreamingCompress.new` and pass in data with `.compress` (instead of `<<` or `.write`), it exhibits only the first issue above, not the second. Data larger than ~16,000 bytes passed to `.compress` will randomly produce compressed output that `Zstd.decompress` decodes incorrectly, while the `zstd` CLI decodes it correctly. Strings of 131,072 bytes or more do not fail outright as in the second issue; they continue to exhibit the first issue (mismatched `Zstd.decompress` output).
I cannot reproduce these issues if I use `Zstd.compress`, so this seems specific to streaming compression.
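To make the two failure modes explicit (a silent mismatch versus an outright error), here is a minimal sketch of a round-trip classifier, similar in spirit to the checks the reproduction script performs. `classify_round_trip` is a hypothetical helper, not part of zstd-ruby; it accepts any decompressor callable, and Zlib from Ruby's standard library stands in for zstd so the sketch runs without the gem.

```ruby
require "zlib"

# Hypothetical helper: classify a single round trip.
#   :ok       - decompressor returned exactly the original bytes
#   :mismatch - decompressor returned without raising, but the bytes differ
#   :error    - decompressor raised, or returned something that isn't a String
# `decompressor` is any object responding to #call(compressed_string).
def classify_round_trip(original, compressed, decompressor)
  decompressed = decompressor.call(compressed)
  # Compare as binary so an encoding difference alone isn't reported as corruption.
  decompressed.b == original.b ? :ok : :mismatch
rescue StandardError
  :error
end

# Demo with Zlib standing in for zstd, so the sketch runs on plain Ruby.
data = "a" * 20_000
deflated = Zlib::Deflate.deflate(data)
p classify_round_trip(data, deflated, Zlib::Inflate.method(:inflate))                   # prints :ok
p classify_round_trip(data, deflated, ->(c) { Zlib::Inflate.inflate(c).tr("a", "b") }) # prints :mismatch
p classify_round_trip(data, "garbage", Zlib::Inflate.method(:inflate))                  # prints :error
```

With zstd-ruby loaded, the decompressor argument would be `->(c) { Zstd.decompress(c) }`, so `:mismatch` would correspond to the first issue and `:error` to the second.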
Reproduction script
I've reproduced this with both zstd-ruby 1.5.7.0 and 2.0.0.pre.preview1 on the following platforms:
- `ruby 3.4.5 (2025-07-16 revision 20cda200d3) +PRISM [arm64-darwin24]`
- `ruby 3.4.5 (2025-07-16 revision 20cda200d3) +PRISM [x86_64-linux]`
Here's my attempt at a reproduction script; save the following to `test_zstd.rb`. Sorry that it's a bit convoluted in order to test all three situations. More explanation of usage and some abbreviated output are below:
```ruby
require "bundler/inline"
require "digest"
require "tempfile"

gemfile do
  source "https://rubygems.org"
  gem "zstd-ruby", "1.5.7.0"
end

def compare_compressed(original:, compressed:)
  begin
    decompressed = Zstd.decompress(compressed)
  rescue => e
    decompress_error = e
  end

  if original != decompressed
    if decompress_error
      puts "Decompression error for #{original.bytesize} bytes input (#{decompress_error})"
    else
      puts "Content mismatch for #{original.bytesize} bytes input"
    end
    puts " Original: #{original.bytesize} bytes, #{Digest::SHA256.hexdigest(original)[0, 10]} checksum"
    if decompressed
      puts " Zstd.decompress: #{decompressed.bytesize} bytes, #{Digest::SHA256.hexdigest(decompressed)[0, 10]} checksum"
    end

    begin
      cli_decompressed = Tempfile.create(binmode: true) do |temp_write|
        temp_write.write(compressed)
        temp_write.close
        Tempfile.create(binmode: true) do |temp_read|
          system "zstd", "--decompress", "--quiet", "--force", "-o", temp_read.path, temp_write.path, exception: true
          File.read(temp_read.path, binmode: true)
        end
      end
      puts " zstd cli: #{cli_decompressed.bytesize} bytes, #{Digest::SHA256.hexdigest(cli_decompressed)[0, 10]} checksum"
    rescue => e
      puts " zstd cli error: #{e}"
    end
  end
end

def test_stream_write
  (1..256_000).each do |length|
    original = "a" * length
    stream = Zstd::StreamingCompress.new
    stream << original
    res = stream.finish
    compare_compressed(original: original, compressed: res)
  end
end

def test_stream_compress
  (1..256_000).each do |length|
    original = "a" * length
    stream = Zstd::StreamingCompress.new
    res = stream.compress(original)
    res << stream.finish
    compare_compressed(original: original, compressed: res)
  end
end

def test_compress
  (1..256_000).each do |length|
    original = "a" * length
    res = Zstd.compress(original)
    compare_compressed(original: original, compressed: res)
  end
end

case ARGV[0]
when "stream_write"
  puts "=== Zstd::StreamingCompress.new with << ==="
  test_stream_write
when "stream_compress"
  puts "=== Zstd::StreamingCompress.new with .compress ==="
  test_stream_compress
when "compress"
  puts "=== Zstd.compress ==="
  test_compress
else
  abort "Unknown test mode: #{ARGV[0].inspect}"
end
```
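The CLI cross-check in `compare_compressed` writes the compressed bytes to one tempfile and reads the CLI's result back from another. As a possibly simpler alternative, the sketch below pipes the bytes through the CLI with Ruby's standard `Open3.capture3` (using its `stdin_data:` and `binmode: true` options). It assumes `zstd` is on `PATH` and reads from stdin when given no input file; the method name `cli_decompress` and its `cmd:` keyword are hypothetical, the latter only so another command can be substituted.

```ruby
require "open3"

# Hypothetical helper: pipe compressed bytes into an external decompressor
# and return its stdout. Assumes `zstd` is on PATH:
#   -d decompresses, -c writes to stdout, -q suppresses progress output.
# With no input file argument, zstd reads from stdin.
def cli_decompress(compressed, cmd: ["zstd", "-d", "-c", "-q"])
  # binmode: true keeps the pipes binary so no newline/encoding conversion happens.
  out, err, status = Open3.capture3(*cmd, stdin_data: compressed, binmode: true)
  raise "#{cmd.first} failed (exit #{status.exitstatus}): #{err.strip}" unless status.success?

  out
end
```

Inside `compare_compressed`, the `Tempfile.create` block could then become `cli_decompressed = cli_decompress(compressed)`. A non-zero exit raises, much like `system(..., exception: true)`, except that the CLI's stderr ends up in the exception message instead of being printed separately.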
Reproduction script usage
- Run `ruby test_zstd.rb stream_write` to test `Zstd::StreamingCompress.new` with `<<`. This should exhibit the first issue above at random for input sizes in the ~16,000 to 131,072 byte range, and the second issue consistently for inputs of 131,072 bytes or more.
- Run `ruby test_zstd.rb stream_compress` to test `Zstd::StreamingCompress.new` with `.compress`. This should exhibit the third issue described above, with inputs larger than ~16,000 bytes randomly having issues.
- Run `ruby test_zstd.rb compress` to test `Zstd.compress`, which generates no errors for me.
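Because the second issue starts exactly at 131,072 bytes, which is 2**17 (128 KiB), checking just that boundary doesn't require scanning every length from 1 to 256,000. The sketch below uses a hypothetical helper, `lengths_around` (not part of zstd-ruby), to build a short list of lengths on either side of the boundary. As an unverified observation only: 128 KiB is also zstd's maximum block size (`ZSTD_BLOCKSIZE_MAX`), though whether that is related to the failure is a guess.

```ruby
# Hypothetical helper: the input lengths within `radius` bytes of `boundary`,
# for a quick targeted check instead of scanning 1..256_000.
def lengths_around(boundary, radius: 2)
  ((boundary - radius)..(boundary + radius)).to_a
end

# 131,072 bytes is exactly 2**17 (128 KiB), the length at which
# `stream_write` starts failing outright in the report.
FAILURE_BOUNDARY = 2**17

p FAILURE_BOUNDARY                 # prints 131072
p lengths_around(FAILURE_BOUNDARY) # prints [131070, 131071, 131072, 131073, 131074]
```

In `test_stream_write`, replacing `(1..256_000).each` with `lengths_around(FAILURE_BOUNDARY).each` would check only those five lengths. The intermittent mismatches between ~16,000 and 131,072 bytes still need the wider scan, since in the sample output they don't appear to cluster around a single length.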
Reproduction script example output
- For `ruby test_zstd.rb stream_write`, note that the `zstd` CLI actually decompresses the Ruby stream-compressed data back to the original input, even when `Zstd.decompress` does not (that's what the checksums in the output show). However, once you reach 131,072 bytes, all decompression starts failing completely:

  ```
  === Zstd::StreamingCompress.new with << ===
  Content mismatch for 16937 bytes input
   Original: 16937 bytes, e31be8f076 checksum
   Zstd.decompress: 16937 bytes, 41fb77f572 checksum
   zstd cli: 16937 bytes, e31be8f076 checksum
  Content mismatch for 21515 bytes input
   Original: 21515 bytes, 1cccd688d3 checksum
   Zstd.decompress: 21515 bytes, 6695186c17 checksum
   zstd cli: 21515 bytes, 1cccd688d3 checksum
  [...]
  Content mismatch for 97075 bytes input
   Original: 97075 bytes, 1f642ed1a7 checksum
   Zstd.decompress: 97075 bytes, acc3eb205e checksum
   zstd cli: 97075 bytes, 1f642ed1a7 checksum
  Decompression error for 131072 bytes input (not compressed by zstd: Unspecified error code)
   Original: 131072 bytes, b44ffb72fc checksum
  zstd: /var/folders/td/52lw67lj0wz36_rhqflz24_19mm_gh/T/20250813-83789-e4qsiy: unknown header
   zstd cli error: Command failed with exit 1: zstd
  Decompression error for 131073 bytes input (not compressed by zstd: Unspecified error code)
   Original: 131073 bytes, 7e009ea4ef checksum
  zstd: /var/folders/td/52lw67lj0wz36_rhqflz24_19mm_gh/T/20250813-83789-uzf2ug: unsupported format
   zstd cli error: Command failed with exit 1: zstd
  [...]
  ```
- For `ruby test_zstd.rb stream_compress`, note that it still exhibits the first issue, and it keeps behaving the same way above 131,072 bytes of input (mismatches rather than outright failures):

  ```
  === Zstd::StreamingCompress.new with .compress ===
  Content mismatch for 16671 bytes input
   Original: 16671 bytes, 75fa71ee56 checksum
   Zstd.decompress: 16671 bytes, fefcb91c80 checksum
   zstd cli: 16671 bytes, 75fa71ee56 checksum
  Content mismatch for 16936 bytes input
   Original: 16936 bytes, a9d4d5bb65 checksum
   Zstd.decompress: 16936 bytes, adfd120dfd checksum
   zstd cli: 16936 bytes, a9d4d5bb65 checksum
  [...]
  Content mismatch for 98731 bytes input
   Original: 98731 bytes, 093614bb66 checksum
   Zstd.decompress: 98731 bytes, fa9fe924fd checksum
   zstd cli: 98731 bytes, 093614bb66 checksum
  Content mismatch for 131185 bytes input
   Original: 131185 bytes, b5b6d4d116 checksum
   Zstd.decompress: 131185 bytes, 16ab74a052 checksum
   zstd cli: 131185 bytes, b5b6d4d116 checksum
  [...]
  Content mismatch for 244541 bytes input
   Original: 244541 bytes, 951b4d7ef8 checksum
   Zstd.decompress: 244541 bytes, 170cce21a4 checksum
   zstd cli: 244541 bytes, 951b4d7ef8 checksum
  Content mismatch for 247016 bytes input
   Original: 247016 bytes, 2b51d7363f checksum
   Zstd.decompress: 247016 bytes, c5bc5222b8 checksum
   zstd cli: 247016 bytes, 2b51d7363f checksum
  ```
- For `ruby test_zstd.rb compress`, which doesn't use the streaming compressor, everything seems to work, and the tests print no mismatches:

  ```
  === Zstd.compress ===
  ```
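Because a full run produces many lines (elided above with `[...]`), it may help to condense the output. The sketch below parses the failure lines that `compare_compressed` prints and merges consecutive lengths into ranges for each failure kind. All names here (`FAILURE_LINE`, `failing_lengths`, `collapse`, `summarize`) are hypothetical helpers, and the sample is copied from the abbreviated `stream_write` output above. It needs Ruby 2.7+ for `filter_map`.

```ruby
# Matches the first line compare_compressed prints for each failing length.
FAILURE_LINE = /^(Content mismatch|Decompression error) for (\d+) bytes input/

# Returns [kind, length] pairs, e.g. ["Content mismatch", 16937].
def failing_lengths(output)
  output.each_line.filter_map do |line|
    m = FAILURE_LINE.match(line)
    [m[1], Integer(m[2])] if m
  end
end

# Merges integers into runs of consecutive values: [1, 2, 3, 7] -> [1..3, 7..7].
def collapse(lengths)
  lengths.sort.uniq.slice_when { |a, b| b != a + 1 }.map { |run| run.first..run.last }
end

# Groups failures by kind and collapses each kind's lengths into ranges.
def summarize(output)
  failing_lengths(output)
    .group_by(&:first)
    .transform_values { |pairs| collapse(pairs.map(&:last)) }
end

sample = <<~OUT
  Content mismatch for 16937 bytes input
   Original: 16937 bytes, e31be8f076 checksum
  Content mismatch for 21515 bytes input
  Content mismatch for 97075 bytes input
  Decompression error for 131072 bytes input (not compressed by zstd: Unspecified error code)
  Decompression error for 131073 bytes input (not compressed by zstd: Unspecified error code)
OUT

summarize(sample).each do |kind, ranges|
  puts "#{kind}: #{ranges.join(", ")}"
end
```

For a full run, redirecting stdout to a file (for example `ruby test_zstd.rb stream_write > out.txt`) and passing `File.read("out.txt")` to `summarize` would give one line per failure kind. The `zstd:` lines come from the CLI's stderr and aren't matched in any case.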