
Patched an infinite loop bug in src/mem.rs, impl Decompress::decompress() #86

Merged
merged 2 commits into alexcrichton:master on Jan 5, 2023

Conversation

@bjrjk (Contributor) commented Dec 23, 2022

Hello!

Version

stable-x86_64-unknown-linux-gnu (default)
rustc 1.66.0 (69f9c33d7 2022-12-12)

Problem

I came across an infinite loop bug while trying to decompress a zipped ASCII file larger than 4 GB.

The following code is an example of my usage:

    use std::fs;
    use std::io::Read; // for read_to_string() on the zip entry

    pub fn from_zip_file(file_path: String) -> Trace {
        let zip_file =
            fs::File::open(&file_path).expect(&format!("Could not open file {}", &file_path));
        let mut zip_archive = zip::ZipArchive::new(zip_file)
            .expect(&format!("Could not open archive {}", &file_path));

        println!("Trace::from_zip_file: unzipping {}", &file_path);
        let mut trace_file = zip_archive.by_index(0).unwrap();
        let trace_file_path = trace_file.sanitized_name().to_str().unwrap().to_string();
        println!("Trace::from_zip_file: unzipped {}", &file_path);

        let mut trace_content = String::new();
        // This call never returns when the entry is larger than 4 GB.
        trace_file
            .read_to_string(&mut trace_content)
            .expect(&format!("Could not read unzipped file {}", trace_file_path));
        println!("Trace::from_zip_file: read_to_string {}", &file_path);

        // (construction of the returned Trace from trace_content omitted)
    }

My program got stuck in trace_file.read_to_string().
To investigate the problem, I analyzed the code carefully and found the root cause in the bzip2 Rust library.

Root Cause

The function read_to_string() indirectly invokes std::io::default_read_to_end(). The following is the source of std::io::default_read_to_end():

// This uses an adaptive system to extend the vector when it fills. We want to
// avoid paying to allocate and zero a huge chunk of memory if the reader only
// has 4 bytes while still making large reads if the reader does have a ton
// of data to return. Simply tacking on an extra DEFAULT_BUF_SIZE space every
// time is 4,500 times (!) slower than a default reservation size of 32 if the
// reader has a very small amount of data to return.
pub(crate) fn default_read_to_end<R: Read + ?Sized>(r: &mut R, buf: &mut Vec<u8>) -> Result<usize> {
    let start_len = buf.len();
    let start_cap = buf.capacity();

    let mut initialized = 0; // Extra initialized bytes from previous loop iteration
    loop {
        if buf.len() == buf.capacity() {
            buf.reserve(32); // buf is full, need more space
        }

        let mut read_buf: BorrowedBuf<'_> = buf.spare_capacity_mut().into();

        // SAFETY: These bytes were initialized but not filled in the previous loop
        unsafe {
            read_buf.set_init(initialized);
        }

        let mut cursor = read_buf.unfilled();
        match r.read_buf(cursor.reborrow()) {
            Ok(()) => {}
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }

        if cursor.written() == 0 {
            return Ok(buf.len() - start_len);
        }

        // store how much was initialized but not filled
        initialized = cursor.init_ref().len();

        // SAFETY: BorrowedBuf's invariants mean this much memory is initialized.
        unsafe {
            let new_len = read_buf.filled().len() + buf.len();
            buf.set_len(new_len);
        }

        if buf.len() == buf.capacity() && buf.capacity() == start_cap {
            // The buffer might be an exact fit. Let's read into a probe buffer
            // and see if it returns `Ok(0)`. If so, we've avoided an
            // unnecessary doubling of the capacity. But if not, append the
            // probe buffer to the primary buffer and let its capacity grow.
            let mut probe = [0u8; 32];

            loop {
                match r.read(&mut probe) {
                    Ok(0) => return Ok(buf.len() - start_len),
                    Ok(n) => {
                        buf.extend_from_slice(&probe[..n]);
                        break;
                    }
                    Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
                    Err(e) => return Err(e),
                }
            }
        }
    }
}

The stdlib function std::io::default_read_to_end() doubles the buffer's capacity each time it fills. Since the buffer's initial capacity is 32, the capacity of the Vec buf is always a power of two.
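The growth pattern is easy to observe in isolation. A minimal standalone sketch (the exact growth policy of Vec::reserve is an implementation detail of the stdlib, but current rustc amortizes by doubling):

    fn main() {
        let mut buf: Vec<u8> = Vec::new();
        for _ in 0..6 {
            if buf.len() == buf.capacity() {
                buf.reserve(32); // same call default_read_to_end makes when full
            }
            buf.resize(buf.capacity(), 0); // pretend the reader filled all spare space
            println!("capacity = {}", buf.capacity()); // 32, 64, 128, 256, ... all powers of two
        }
    }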

My ASCII file is a little larger than 4 GB (0x1_0000_0000 = 4294967296 bytes), so the buffer's capacity eventually grows to 8 GB (0x2_0000_0000 = 8589934592 bytes).

In default_read_to_end(), read_buf is a BorrowedBuf borrowed from the original buffer; it covers the spare space at the end of buf.

There is a point at which buf.len() is exactly 0x1_0000_0000 while buf.capacity() is 0x2_0000_0000: the first half of buf holds decompressed data, and the second half, covered by read_buf, is 0x1_0000_0000 bytes of spare space waiting to be filled.
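In numbers, the stuck loop state looks like this (a self-checking sketch of the values above, not taken from a live run):

    fn main() {
        let len: usize = 0x1_0000_0000;      // buf.len(): 4 GiB of decompressed data
        let capacity: usize = 0x2_0000_0000; // buf.capacity() after the last doubling
        let spare = capacity - len;          // the region read_buf is borrowed from
        assert_eq!(spare, 0x1_0000_0000);    // exactly 2^32 bytes of spare space
    }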

The function call r.read_buf(cursor.reborrow()) indirectly calls Decompress::decompress(), whose source is listed below:

    /// Decompress a block of input into a block of output.
    pub fn decompress(&mut self, input: &[u8], output: &mut [u8]) -> Result<Status, Error> {
        self.inner.raw.next_in = input.as_ptr() as *mut _;
        self.inner.raw.avail_in = input.len() as c_uint;
        self.inner.raw.next_out = output.as_mut_ptr() as *mut _;
        self.inner.raw.avail_out = output.len() as c_uint;
        unsafe {
            match ffi::BZ2_bzDecompress(&mut *self.inner.raw) {
                ffi::BZ_OK => Ok(Status::Ok),
                ffi::BZ_MEM_ERROR => Ok(Status::MemNeeded),
                ffi::BZ_STREAM_END => Ok(Status::StreamEnd),
                ffi::BZ_PARAM_ERROR => Err(Error::Param),
                ffi::BZ_DATA_ERROR => Err(Error::Data),
                ffi::BZ_DATA_ERROR_MAGIC => Err(Error::DataMagic),
                ffi::BZ_SEQUENCE_ERROR => Err(Error::Sequence),
                c => panic!("wut: {}", c),
            }
        }
    }

At this point, the length of output is 0x1_0000_0000.

When output.len() is cast to c_uint (a 32-bit unsigned integer), its high 32 bits are lost, so avail_out is zero when the C decompression function is invoked.

In that case, no data is extracted and the function returns immediately. The caller, however, keeps asking for the remaining data to be decompressed, so the program loops forever.
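The truncation itself is easy to reproduce in isolation. A minimal sketch for a 64-bit target, where usize is 64 bits and c_uint is 32 bits:

    use std::os::raw::c_uint;

    fn main() {
        let output_len: usize = 0x1_0000_0000; // 4 GiB of spare output space
        let avail_out = output_len as c_uint;  // the same cast decompress() performs
        assert_eq!(avail_out, 0);              // high 32 bits are lost
        // BZ2_bzDecompress is told it has zero bytes of output space, writes
        // nothing, and the read loop above calls decompress() again forever.
    }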

Patch

The following is my patch. The assignment to self.inner.raw.avail_out is modified: when output is non-empty but output.len() would truncate to zero (i.e. its low 32 bits are all zero), avail_out is set to the maximum value of c_uint instead.

Other functions may contain the same problem.

    /// Decompress a block of input into a block of output.
    pub fn decompress(&mut self, input: &[u8], output: &mut [u8]) -> Result<Status, Error> {
        self.inner.raw.next_in = input.as_ptr() as *mut _;
        self.inner.raw.avail_in = input.len() as c_uint;
        self.inner.raw.next_out = output.as_mut_ptr() as *mut _;
        self.inner.raw.avail_out = {
            let avail_out = output.len();
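            // A non-empty output whose length has all-zero low 32 bits would
            // truncate to 0 below; hand bzip2 c_uint::MAX instead so it can
            // make progress.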
            if (avail_out > 0) && (avail_out & c_uint::MAX as usize == 0) {
                c_uint::MAX
            } else {
                avail_out as c_uint
            }
        };
        unsafe {
            match ffi::BZ2_bzDecompress(&mut *self.inner.raw) {
                ffi::BZ_OK => Ok(Status::Ok),
                ffi::BZ_MEM_ERROR => Ok(Status::MemNeeded),
                ffi::BZ_STREAM_END => Ok(Status::StreamEnd),
                ffi::BZ_PARAM_ERROR => Err(Error::Param),
                ffi::BZ_DATA_ERROR => Err(Error::Data),
                ffi::BZ_DATA_ERROR_MAGIC => Err(Error::DataMagic),
                ffi::BZ_SEQUENCE_ERROR => Err(Error::Sequence),
                c => panic!("wut: {}", c),
            }
        }
    }
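To sanity-check the new assignment, the clamping expression can be pulled out and exercised directly (a sketch; clamp_avail is a hypothetical helper for illustration, not part of the crate's API):

    use std::os::raw::c_uint;

    // Mirrors the patched assignment of avail_out above.
    fn clamp_avail(len: usize) -> c_uint {
        if len > 0 && (len & c_uint::MAX as usize) == 0 {
            c_uint::MAX // a non-zero multiple of 2^32 must not become 0
        } else {
            len as c_uint // keeps the low 32 bits, non-zero unless len itself is 0
        }
    }

    fn main() {
        assert_eq!(clamp_avail(0), 0);
        assert_eq!(clamp_avail(4096), 4096);
        assert_eq!(clamp_avail(0x1_0000_0000), c_uint::MAX); // the 4 GiB case
        assert_eq!(clamp_avail(0x1_0000_0001), 1);           // truncates, but stays non-zero
    }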

@bjrjk (Contributor, Author) commented Jan 5, 2023

The code in impl Decompress::decompress has been patched accordingly.
It seems that the code in impl Compress::compress has the same problem; it has been patched as well.

@alexcrichton merged commit 90c9c18 into alexcrichton:master on Jan 5, 2023
@bjrjk (Contributor, Author) commented Jan 10, 2023

This issue has been assigned CVE-2023-22895.
