
Patched an infinite loop bug in src/mem.rs, impl Decompress::decompress() #86

Merged
merged 2 commits into alexcrichton:master on Jan 5, 2023

Conversation

@bjrjk (Contributor) commented Dec 23, 2022

Hello!

Version

stable-x86_64-unknown-linux-gnu (default)
rustc 1.66.0 (69f9c33d7 2022-12-12)

Problem

I came across an infinite loop bug while trying to decompress a zipped ASCII file larger than 4 GB.

The following code is an example of my usage:

    use std::fs;
    use std::io::Read; // for read_to_string() on the zip entry

    pub fn from_zip_file(file_path: String) -> Trace {
        let zip_file =
            fs::File::open(&file_path).expect(&format!("Could not open file {}", &file_path));
        let mut zip_archive = zip::ZipArchive::new(zip_file)
            .expect(&format!("Could not open archive {}", &file_path));

        println!("Trace::from_zip_file: unzipping {}", &file_path);
        let mut trace_file = zip_archive.by_index(0).unwrap();
        let trace_file_path = trace_file.sanitized_name().to_str().unwrap().to_string();
        println!("Trace::from_zip_file: unzipped {}", &file_path);

        let mut trace_content = String::new();
        // This call never returns when the entry is larger than 4 GB.
        trace_file
            .read_to_string(&mut trace_content)
            .expect(&format!("Could not read unzipped file {}", trace_file_path));
        println!("Trace::from_zip_file: read_to_string {}", &file_path);

        // (construction of the returned Trace from trace_content omitted)
    }

My program got stuck in trace_file.read_to_string().
To investigate the problem, I analyzed the code carefully and found the root cause in the bzip2 Rust library.

Root Cause

The function read_to_string() indirectly invokes std::io::default_read_to_end(). The following is the source of std::io::default_read_to_end():

// This uses an adaptive system to extend the vector when it fills. We want to
// avoid paying to allocate and zero a huge chunk of memory if the reader only
// has 4 bytes while still making large reads if the reader does have a ton
// of data to return. Simply tacking on an extra DEFAULT_BUF_SIZE space every
// time is 4,500 times (!) slower than a default reservation size of 32 if the
// reader has a very small amount of data to return.
pub(crate) fn default_read_to_end<R: Read + ?Sized>(r: &mut R, buf: &mut Vec<u8>) -> Result<usize> {
    let start_len = buf.len();
    let start_cap = buf.capacity();

    let mut initialized = 0; // Extra initialized bytes from previous loop iteration
    loop {
        if buf.len() == buf.capacity() {
            buf.reserve(32); // buf is full, need more space
        }

        let mut read_buf: BorrowedBuf<'_> = buf.spare_capacity_mut().into();

        // SAFETY: These bytes were initialized but not filled in the previous loop
        unsafe {
            read_buf.set_init(initialized);
        }

        let mut cursor = read_buf.unfilled();
        match r.read_buf(cursor.reborrow()) {
            Ok(()) => {}
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }

        if cursor.written() == 0 {
            return Ok(buf.len() - start_len);
        }

        // store how much was initialized but not filled
        initialized = cursor.init_ref().len();

        // SAFETY: BorrowedBuf's invariants mean this much memory is initialized.
        unsafe {
            let new_len = read_buf.filled().len() + buf.len();
            buf.set_len(new_len);
        }

        if buf.len() == buf.capacity() && buf.capacity() == start_cap {
            // The buffer might be an exact fit. Let's read into a probe buffer
            // and see if it returns `Ok(0)`. If so, we've avoided an
            // unnecessary doubling of the capacity. But if not, append the
            // probe buffer to the primary buffer and let its capacity grow.
            let mut probe = [0u8; 32];

            loop {
                match r.read(&mut probe) {
                    Ok(0) => return Ok(buf.len() - start_len),
                    Ok(n) => {
                        buf.extend_from_slice(&probe[..n]);
                        break;
                    }
                    Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
                    Err(e) => return Err(e),
                }
            }
        }
    }
}

The stdlib function std::io::default_read_to_end() doubles the buffer's capacity each time it fills. Since the buffer's initial capacity is 32, the capacity of the Vec buf is always a power of two.
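The growth pattern is easy to observe in isolation. A minimal standalone sketch (the exact growth policy of Vec::reserve is an implementation detail of the stdlib, but current rustc amortizes by doubling):

    fn main() {
        let mut buf: Vec<u8> = Vec::new();
        for _ in 0..6 {
            if buf.len() == buf.capacity() {
                buf.reserve(32); // same call default_read_to_end makes when full
            }
            buf.resize(buf.capacity(), 0); // pretend the reader filled all spare space
            println!("capacity = {}", buf.capacity()); // 32, 64, 128, 256, ... all powers of two
        }
    }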

My ASCII file is a little larger than 4 GB (0x1_0000_0000 = 4294967296 bytes), so the buffer's capacity eventually grows to 8 GB (0x2_0000_0000 = 8589934592 bytes).

In default_read_to_end(), read_buf is a BorrowedBuf borrowed from the original buffer; it covers the spare space at the end of buf.

There is a point at which buf.len() is exactly 0x1_0000_0000 while buf.capacity() is 0x2_0000_0000: the first half of buf holds decompressed data, and the second half, covered by read_buf, is 0x1_0000_0000 bytes of spare space waiting to be filled.
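In numbers, the stuck loop state looks like this (a self-checking sketch of the values above, not taken from a live run):

    fn main() {
        let len: usize = 0x1_0000_0000;      // buf.len(): 4 GiB of decompressed data
        let capacity: usize = 0x2_0000_0000; // buf.capacity() after the last doubling
        let spare = capacity - len;          // the region read_buf is borrowed from
        assert_eq!(spare, 0x1_0000_0000);    // exactly 2^32 bytes of spare space
    }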

The function call r.read_buf(cursor.reborrow()) indirectly calls Decompress::decompress(), whose source is listed below:

    /// Decompress a block of input into a block of output.
    pub fn decompress(&mut self, input: &[u8], output: &mut [u8]) -> Result<Status, Error> {
        self.inner.raw.next_in = input.as_ptr() as *mut _;
        self.inner.raw.avail_in = input.len() as c_uint;
        self.inner.raw.next_out = output.as_mut_ptr() as *mut _;
        self.inner.raw.avail_out = output.len() as c_uint;
        unsafe {
            match ffi::BZ2_bzDecompress(&mut *self.inner.raw) {
                ffi::BZ_OK => Ok(Status::Ok),
                ffi::BZ_MEM_ERROR => Ok(Status::MemNeeded),
                ffi::BZ_STREAM_END => Ok(Status::StreamEnd),
                ffi::BZ_PARAM_ERROR => Err(Error::Param),
                ffi::BZ_DATA_ERROR => Err(Error::Data),
                ffi::BZ_DATA_ERROR_MAGIC => Err(Error::DataMagic),
                ffi::BZ_SEQUENCE_ERROR => Err(Error::Sequence),
                c => panic!("wut: {}", c),
            }
        }
    }

At this point, the length of output is 0x1_0000_0000.

When output.len() is cast to c_uint (a 32-bit unsigned integer), its high 32 bits are lost, so avail_out is zero when the C decompression function is invoked.

In that case, no data is extracted and the function returns immediately. The caller, however, keeps asking for the remaining data to be decompressed, so the program loops forever.
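The truncation itself is easy to reproduce in isolation. A minimal sketch for a 64-bit target, where usize is 64 bits and c_uint is 32 bits:

    use std::os::raw::c_uint;

    fn main() {
        let output_len: usize = 0x1_0000_0000; // 4 GiB of spare output space
        let avail_out = output_len as c_uint;  // the same cast decompress() performs
        assert_eq!(avail_out, 0);              // high 32 bits are lost
        // BZ2_bzDecompress is told it has zero bytes of output space, writes
        // nothing, and the read loop above calls decompress() again forever.
    }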

Patch

The following is my patch. The assignment to self.inner.raw.avail_out is modified: when output is non-empty but output.len() would truncate to zero (i.e. its low 32 bits are all zero), avail_out is set to the maximum value of c_uint instead.

Other functions may contain the same problem.

    /// Decompress a block of input into a block of output.
    pub fn decompress(&mut self, input: &[u8], output: &mut [u8]) -> Result<Status, Error> {
        self.inner.raw.next_in = input.as_ptr() as *mut _;
        self.inner.raw.avail_in = input.len() as c_uint;
        self.inner.raw.next_out = output.as_mut_ptr() as *mut _;
        self.inner.raw.avail_out = {
            let avail_out = output.len();
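            // A non-empty output whose length has all-zero low 32 bits would
            // truncate to 0 below; hand bzip2 c_uint::MAX instead so it can
            // make progress.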
            if (avail_out > 0) && (avail_out & c_uint::MAX as usize == 0) {
                c_uint::MAX
            } else {
                avail_out as c_uint
            }
        };
        unsafe {
            match ffi::BZ2_bzDecompress(&mut *self.inner.raw) {
                ffi::BZ_OK => Ok(Status::Ok),
                ffi::BZ_MEM_ERROR => Ok(Status::MemNeeded),
                ffi::BZ_STREAM_END => Ok(Status::StreamEnd),
                ffi::BZ_PARAM_ERROR => Err(Error::Param),
                ffi::BZ_DATA_ERROR => Err(Error::Data),
                ffi::BZ_DATA_ERROR_MAGIC => Err(Error::DataMagic),
                ffi::BZ_SEQUENCE_ERROR => Err(Error::Sequence),
                c => panic!("wut: {}", c),
            }
        }
    }
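To sanity-check the new assignment, the clamping expression can be pulled out and exercised directly (a sketch; clamp_avail is a hypothetical helper for illustration, not part of the crate's API):

    use std::os::raw::c_uint;

    // Mirrors the patched assignment of avail_out above.
    fn clamp_avail(len: usize) -> c_uint {
        if len > 0 && (len & c_uint::MAX as usize) == 0 {
            c_uint::MAX // a non-zero multiple of 2^32 must not become 0
        } else {
            len as c_uint // keeps the low 32 bits, non-zero unless len itself is 0
        }
    }

    fn main() {
        assert_eq!(clamp_avail(0), 0);
        assert_eq!(clamp_avail(4096), 4096);
        assert_eq!(clamp_avail(0x1_0000_0000), c_uint::MAX); // the 4 GiB case
        assert_eq!(clamp_avail(0x1_0000_0001), 1);           // truncates, but stays non-zero
    }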

@bjrjk (Contributor, Author) commented Jan 5, 2023

The code in impl Decompress::decompress has been patched accordingly.
It seems that the code in impl Compress::compress has the same problem; it has been patched as well.

@alexcrichton merged commit 90c9c18 into alexcrichton:master on Jan 5, 2023
@bjrjk (Contributor, Author) commented Jan 10, 2023

This issue has been assigned CVE-2023-22895.
