blosc2_decompress_ctx can overrun src #40
Yes, for Blosc1 compressed chunks, I think this makes total sense. The proposal would be to add a new function for safe decompression, something like:
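As a rough illustration of what a size-aware entry point could check first, here is a hedged C sketch. The function name, and the idea of validating only the fixed-size header, are assumptions for illustration, not the maintainer's actual proposal; the field offsets follow the classic 16-byte Blosc1 header layout (`nbytes`/`blocksize`/`cbytes` as little-endian 32-bit integers).

```c
#include <stddef.h>
#include <stdint.h>

#define BLOSC_MIN_HEADER_LENGTH 16  /* classic Blosc1 header size */

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical first step of a safe decompressor: confirm the header is
 * fully present and that the compressed size it claims fits inside the
 * caller-supplied srcsize.  Returns 0 if consistent, -1 otherwise. */
int validate_blosc_header(const uint8_t *src, size_t srcsize) {
    if (srcsize < BLOSC_MIN_HEADER_LENGTH)
        return -1;  /* truncated before the header even ends */
    uint32_t cbytes = read_le32(src + 12);  /* compressed size per header */
    if (cbytes < BLOSC_MIN_HEADER_LENGTH || cbytes > srcsize)
        return -1;  /* header claims more bytes than the caller provided */
    return 0;
}
```

A real safe API would of course carry this bound through every subsequent read, not just the header.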
Also, I was thinking of optionally adding a (64-bit) hash at the end of frames (a new container that has recently been introduced in Blosc2). This could bring much higher safety for them. If a hash exists, functions like
I think that if blosc2 is already making backwards-incompatible changes to the API, then why bother to introduce a new decompression function? It would be simpler just to make the normal decompression function safe by default. An alternate version, however, would be useful for blosc1, which doesn't want to break backwards compatibility.

I think that a builtin checksum might be useful for some applications, but it doesn't help with the safety situation. A malicious chunk, after all, could still have a valid checksum. Checksums, even cryptographic ones, only protect against accidental damage.

Regarding bundled checksums, there is one benefit that really intrigues me: cache efficiency. Using a separate checksum layer atop the compression layer requires an application to traverse a block of data twice. Later parts of the first pass could evict useful data from the CPU cache. But a combined hash+compress algorithm could do the whole thing in a single pass. I don't know how much of a performance impact that would have, but it sounds like it might be significant.
I have put some serious thought into this, and I think that in scenarios where safety is important, the user can still call

One might argue that the
For reference, I have just filed a new ticket for implementing checksums.
However,
Uh, I don't completely understand the difference between passing the size of the source buffer and making a decision based on the

On the other hand, one cannot enforce a source-buffer size limit the way you suggest, because blosc compressed buffers are not decompressed strictly sequentially: the decoder needs to access the index of the different blocks inside the buffer, which is stored at the end of the buffer, so we are forced to read at minimum

For the record, the reason the compressed buffers need the index at the end is that parallelization creates blocks of different sizes, and the decoder needs this info so it can pass the starting point of every block to each decompressing thread.
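To make the point about the trailing index concrete, here is a hedged sketch (not the exact on-disk format) of how a size-aware decoder could still consult an index stored at the end of the buffer safely: it bounds-checks the index region itself and every offset it reads from it.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Assume a simplified layout: payload bytes, followed by nblocks
 * little-endian 32-bit block start offsets at the very end of the
 * buffer.  Returns 0 if the index fits and every offset points into
 * the payload region, -1 otherwise. */
int check_block_index(const uint8_t *buf, size_t size, size_t nblocks) {
    if (nblocks > size / 4)
        return -1;  /* the index alone would not fit in the buffer */
    size_t payload_end = size - 4 * nblocks;
    const uint8_t *index = buf + payload_end;
    for (size_t i = 0; i < nblocks; i++) {
        uint32_t start = read_le32(index + 4 * i);
        if (start >= payload_end)
            return -1;  /* block would start outside the payload */
    }
    return 0;
}
```

The decoder still "jumps to the end first", but never dereferences anything beyond the caller-supplied size.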
What I have in mind is an application that receives a blosc buffer from an untrusted source. The pseudocode would look like this:
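A C sketch of that flow, under stated assumptions: `decompress_checked` and `handle_untrusted` are hypothetical names, and the decompressor here is a stub that only validates the length bound, to keep the example self-contained. The point is that the application knows the exact byte count from its transport and passes that bound down, so a truncated or malicious buffer yields an error instead of an overrun.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_COMPRESSED (1u << 20)  /* application-chosen upper bound */

/* Stand-in for a hypothetical size-checked decompression API; a real
 * implementation would never read past src + srcsize. */
static int decompress_checked(const uint8_t *src, size_t srcsize,
                              uint8_t *dest, size_t destsize) {
    if (srcsize < 16 || srcsize > MAX_COMPRESSED)
        return -1;  /* too small to hold a header, or suspiciously large */
    (void)src; (void)dest; (void)destsize;
    return 0;
}

/* Consumer of a buffer received from an untrusted peer.  recv_len is
 * known exactly from the transport (socket read, file size, ...). */
int handle_untrusted(const uint8_t *recv_buf, size_t recv_len) {
    uint8_t out[4096];
    if (decompress_checked(recv_buf, recv_len, out, sizeof out) != 0)
        return -1;  /* reject: truncated, oversized, or malformed */
    /* ... safe to use out here ... */
    return 0;
}
```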
Even if the received buffer is malicious, then
Not only does that strategy require additional error-prone code in every blosc consumer, but the output of

Why can't
Yes, I am suggesting exactly that. I agree that you cannot trust the output of
blosc2_decompress_ctx does not take the size of the source buffer as an argument. Instead, it assumes that the buffer is complete and correctly formed. That can result in array overruns for buffers that are truncated, malicious, or otherwise corrupt. It should take the source buffer length as an argument, and never read beyond it.