
check output buffer bounds in SnappyBlockDecompressor #63910

Open
sahvx655-wq wants to merge 3 commits into apache:master from sahvx655-wq:snappy-block-output-bounds


@sahvx655-wq sahvx655-wq commented May 30, 2026

Problem

SnappyBlockDecompressor::decompress in be/src/util/decompressor.cpp can write past the line-reader output buffer when handling a crafted SNAPPYBLOCK stream (e.g. a CSV load).

The decompress loop has two levels. The large-block length is read into remaining_decompressed_large_block_len and bounds-checked against the remaining output:

if (remaining_output_len < remaining_decompressed_large_block_len) {
    // need more output buffer ...
}

But inside the inner loop each small block's uncompressed length comes from the per-block snappy header via snappy::GetUncompressedLength, which is attacker-controlled and independent of the large-block length. That value is then passed straight to snappy::RawUncompress, which has no destination-capacity argument:

snappy::GetUncompressedLength(input_ptr, compressed_small_block_len, &decompressed_small_block_len);
snappy::RawUncompress(input_ptr, compressed_small_block_len, output_ptr); // no capacity check

So a small block can declare a decompressed length larger than the space actually left in the output buffer, and RawUncompress writes out of bounds (heap OOB write past the output buffer).

Fix

Before calling RawUncompress, check decompressed_small_block_len against the remaining output space (output_max_len - (output_ptr - output)) and return an error if it doesn't fit, instead of trusting the per-block header.
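A minimal sketch of the capacity check described above, written as a standalone helper so it can be read without the rest of decompressor.cpp. The function name small_block_fits and its parameter list are hypothetical and are not taken from the patch; only the arithmetic (output_max_len minus the bytes already written) follows the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not the actual Doris code): returns true when a small
// block that declares `declared_len` decompressed bytes still fits in the
// unused tail of the output buffer.
inline bool small_block_fits(const uint8_t* output, const uint8_t* output_ptr,
                             std::size_t output_max_len, std::size_t declared_len) {
    // Precondition: the write cursor lies inside [output, output + output_max_len].
    assert(output_ptr >= output);
    const std::size_t used = static_cast<std::size_t>(output_ptr - output);
    assert(used <= output_max_len);

    // Space left after what earlier small blocks already wrote.
    const std::size_t available = output_max_len - used;

    // The header-declared length is untrusted input, so compare it against the
    // real capacity before any decompression call writes through output_ptr.
    return declared_len <= available;
}
```

In the decompress loop, a false result would map to the error return instead of the RawUncompress call. Computing the available space first and comparing afterwards avoids adding the untrusted length to a pointer or offset, which could itself overflow.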

Behavior change

Valid SNAPPYBLOCK streams are unaffected. A malformed/crafted stream that previously overflowed now returns an InternalError reporting the declared length and the available output length.

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@sahvx655-wq
Author

Updated the description with the problem, the exact overflow path, and the fix. Short version: the large-block length is bounds-checked but each small block's uncompressed length comes from its own snappy header and goes straight into RawUncompress (no capacity arg), so a crafted block can write past the output buffer. The patch checks the per-block length against the remaining output before decompressing.

// without a destination-capacity argument, so the header-declared length must be
// checked against the remaining output buffer to avoid an out-of-bounds write.
std::size_t available_output_len = output_max_len - (output_ptr - output);
if (decompressed_small_block_len > available_output_len) {
Contributor

Please add a BE unit test (beut) for this case.

Author

Added a beut in be/test/util/snappy_block_decompressor_test.cpp. It crafts a single-small-block SNAPPYBLOCK stream whose snappy header declares 4096 bytes but hands the decompressor a 64-byte output buffer (with the large-block length set to 1 so the outer check passes and we reach the inner one). Before the fix that path overflowed; now it returns an error. There's also a valid round-trip case so the check doesn't regress normal streams.
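For readers who want to see the shape of the crafted input, here is a rough sketch of how the framing could be assembled. It assumes the SNAPPYBLOCK layout is a 4-byte big-endian large-block length, followed by a 4-byte big-endian compressed length for each small block, followed by the raw snappy block, whose first bytes are the uncompressed length as a base-128 varint. The helper names (put_be32, put_varint32, frame_single_small_block) are hypothetical and are not the ones used in the test file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Append a 32-bit value in big-endian byte order (assumed framing for both
// the large-block length and the per-small-block compressed length).
inline void put_be32(std::vector<uint8_t>& out, uint32_t v) {
    out.push_back(static_cast<uint8_t>(v >> 24));
    out.push_back(static_cast<uint8_t>(v >> 16));
    out.push_back(static_cast<uint8_t>(v >> 8));
    out.push_back(static_cast<uint8_t>(v));
}

// Append a little-endian base-128 varint, the encoding snappy uses for the
// uncompressed length at the start of a raw block.
inline void put_varint32(std::vector<uint8_t>& out, uint32_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<uint8_t>(v | 0x80)); // low 7 bits + continuation bit
        v >>= 7;
    }
    out.push_back(static_cast<uint8_t>(v));
}

// Hypothetical builder: wrap one small block in one large block.
inline std::vector<uint8_t> frame_single_small_block(uint32_t large_block_len,
                                                     const std::vector<uint8_t>& small_block) {
    assert(small_block.size() <= std::numeric_limits<uint32_t>::max());
    std::vector<uint8_t> stream;
    put_be32(stream, large_block_len); // kept tiny so the outer check passes
    put_be32(stream, static_cast<uint32_t>(small_block.size()));
    stream.insert(stream.end(), small_block.begin(), small_block.end());
    return stream;
}
```

Calling put_varint32 with 4096 produces the two bytes 0x80 0x20, which is the preamble a block declaring 4096 decompressed bytes would carry. A real test still needs a snappy body after that preamble, for example the output of compressing a 4096-byte input, so this sketch covers only the framing.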
