Patched an infinite loop bug in src/mem.rs, impl Decompress::decompress() #86
Hello!
Version
Problem
I ran into an infinite loop while decompressing an ASCII file whose decompressed size is larger than 4 GB.
The following code is an example of my usage:
My program got stuck at `trace_file.read_to_string()`. To investigate, I went through the code carefully and found the root cause in the bzip2 Rust library.
Root Cause
The function `read_to_string()` indirectly invokes `std::io::default_read_to_end()`. The following is the source of `std::io::default_read_to_end()`:
The stdlib function `std::io::default_read_to_end()` doubles the buffer's capacity every time it fills up. Since the buffer's initial capacity is 32, the capacity of the `Vec` `buf` is always a power of two.
My ASCII file is a little larger than 4 GB (0x1_0000_0000, 4294967296) once decompressed, so the capacity of the buffer is eventually extended to 8 GB (0x2_0000_0000, 8589934592).
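The growth pattern described above can be sketched as follows. This is only a model of the doubling behaviour, not the stdlib code itself; the function name `doubling_capacities` and the use of `u64` (so the sketch also builds on 32-bit targets) are my own assumptions for illustration.

```rust
/// Capacities a buffer passes through if it starts at `initial` bytes and
/// doubles whenever it is full, stopping at the first capacity that can
/// hold `needed` bytes. (Hypothetical helper, not part of std or bzip2.)
fn doubling_capacities(initial: u64, needed: u64) -> Vec<u64> {
    let mut caps = vec![initial];
    let mut cap = initial;
    while cap < needed {
        cap *= 2; // model of "double the capacity when full"
        caps.push(cap);
    }
    caps
}
```

Calling `doubling_capacities(32, 0x1_0000_0001)` models a payload one byte larger than 4 GB: every capacity in the result is a power of two, the sequence passes through 0x1_0000_0000, and it ends at 0x2_0000_0000.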
In `default_read_to_end()`, `read_buf` is a `BorrowedBuf` borrowed from the original buffer; it marks the start of the spare space in `buf`.
There is a moment when the spare space covered by `read_buf` is exactly 0x1_0000_0000 bytes and the capacity of `buf` is 0x2_0000_0000 bytes, which means the first half of `buf` holds decompressed data and the second half is spare space waiting to be filled.
The function call `r.read_buf(cursor.reborrow())` indirectly calls `Decompress::decompress()`, whose source is listed below:
At this point, the length of `output` is `0x1_0000_0000`. When it is cast to `c_uint` (a 32-bit unsigned integer), the high 32 bits of `output.len()` are lost, so `avail_out` is zero when the C decompression function is invoked.
In that case no data is extracted and the function returns immediately. However, the calling functions keep asking for the remaining data to be decompressed, so the loop never ends.
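The truncation can be reproduced in isolation. The snippet below is a minimal sketch that assumes `c_uint` is 32 bits wide (true on the common platforms); `truncated_avail_out` is a hypothetical helper that mimics the cast, not a function from the bzip2 crate, and it takes a `u64` so the example does not depend on the width of `usize`.

```rust
use std::os::raw::c_uint;

/// Mimics the lossy conversion: an `as` cast to a narrower unsigned
/// integer keeps only the low bits. (Hypothetical helper for illustration.)
fn truncated_avail_out(output_len: u64) -> c_uint {
    output_len as c_uint
}
```

With this helper, `truncated_avail_out(0x1_0000_0000)` evaluates to 0, so the C side is told there is no room for output, while a small length such as 4096 passes through unchanged.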
Patch
The following is my patch.
The assignment to `self.inner.raw.avail_out` is modified: when `output.len()` exceeds the range of `c_uint`, `avail_out` is set to the maximum value of `c_uint`.
Other functions may contain the same problem.
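The idea behind the patch can be sketched as a saturating conversion. This is an illustration of the clamping approach rather than the exact diff; `clamped_avail_out` is a hypothetical name, and the `u64` parameter is an assumption made so the sketch is independent of the width of `usize`.

```rust
use std::os::raw::c_uint;

/// Saturating conversion: lengths that do not fit in `c_uint` become
/// `c_uint::MAX` instead of wrapping around to a small value or zero.
/// (Hypothetical helper for illustration.)
fn clamped_avail_out(output_len: u64) -> c_uint {
    output_len.min(c_uint::MAX as u64) as c_uint
}
```

With clamping, a 0x1_0000_0000-byte output slice is reported to the C library as `c_uint::MAX` bytes of available space, so each call can make progress and the caller simply calls again for the remainder. The same pattern would apply to any other place where a slice length is cast to `c_uint`.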