blosc2_decompress_ctx can overrun src #40
Yes, for Blosc1 compressed chunks, I think this makes total sense. The proposal would be to add a new function for safe decompression, something like:
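As a rough illustration of what a size-aware entry point could check first, here is a hedged C sketch. The function name, and the idea of validating only the fixed-size header, are assumptions for illustration, not the maintainer's actual proposal; the field offsets follow the classic 16-byte Blosc1 header layout (`nbytes`/`blocksize`/`cbytes` as little-endian 32-bit integers).

```c
#include <stddef.h>
#include <stdint.h>

#define BLOSC_MIN_HEADER_LENGTH 16  /* classic Blosc1 header size */

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical first step of a safe decompressor: confirm the header is
 * fully present and that the compressed size it claims fits inside the
 * caller-supplied srcsize.  Returns 0 if consistent, -1 otherwise. */
int validate_blosc_header(const uint8_t *src, size_t srcsize) {
    if (srcsize < BLOSC_MIN_HEADER_LENGTH)
        return -1;  /* truncated before the header even ends */
    uint32_t cbytes = read_le32(src + 12);  /* compressed size per header */
    if (cbytes < BLOSC_MIN_HEADER_LENGTH || cbytes > srcsize)
        return -1;  /* header claims more bytes than the caller provided */
    return 0;
}
```

A real safe API would of course carry this bound through every subsequent read, not just the header.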
Also, I was thinking of optionally adding a (64-bit) hash at the end of frames (a new container that has recently been introduced in Blosc2). This could bring much higher safety for them. If a hash exists, functions like
I think that if blosc2 is already making backwards-incompatible changes to the API, then why bother to introduce a new decompression function? It would be simpler just to make the normal decompression function safe by default. An alternate version, however, would be useful for blosc1, which doesn't want to break backwards compatibility.

I think that a builtin checksum might be useful for some applications, but it doesn't help with the safety situation. A malicious chunk, after all, could still have a valid checksum. Checksums, even cryptographic ones, only protect against accidental damage.

Regarding bundled checksums, there is one benefit that really intrigues me: cache efficiency. Using a separate checksum layer atop the compression layer requires an application to traverse a block of data twice. Later parts of the first pass could evict useful data from the CPU cache. But a combined hash+compress algorithm could do the whole thing in a single pass. I don't know how much of a performance impact that would have, but it sounds like it might be significant.
I have put some serious thought into this, and I think that in scenarios where safety is important, the user can still call

One might argue that the
For reference, I have just filed a new ticket for implementing checksums.
However,
Uh, I don't completely understand the difference between passing the size of the source buffer and making a decision based on the

On the other hand, one cannot enforce a source-buffer size limit the way you suggest, because blosc compressed buffers are not decompressed strictly sequentially: the decoder needs to access the index of the different blocks inside the buffer, which is stored at the end of the buffer, so we are forced to read at minimum

For the record, the reason the compressed buffers need the index at the end is that parallelization creates blocks of different sizes, and the decoder needs this info so it can pass the starting point of every block to each decompressing thread.
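To make the point about the trailing index concrete, here is a hedged sketch (not the exact on-disk format) of how a size-aware decoder could still consult an index stored at the end of the buffer safely: it bounds-checks the index region itself and every offset it reads from it.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Assume a simplified layout: payload bytes, followed by nblocks
 * little-endian 32-bit block start offsets at the very end of the
 * buffer.  Returns 0 if the index fits and every offset points into
 * the payload region, -1 otherwise. */
int check_block_index(const uint8_t *buf, size_t size, size_t nblocks) {
    if (nblocks > size / 4)
        return -1;  /* the index alone would not fit in the buffer */
    size_t payload_end = size - 4 * nblocks;
    const uint8_t *index = buf + payload_end;
    for (size_t i = 0; i < nblocks; i++) {
        uint32_t start = read_le32(index + 4 * i);
        if (start >= payload_end)
            return -1;  /* block would start outside the payload */
    }
    return 0;
}
```

The decoder still "jumps to the end first", but never dereferences anything beyond the caller-supplied size.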
What I have in mind is an application that receives a blosc buffer from an untrusted source. The pseudocode would look like this:
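A C sketch of that flow, under stated assumptions: `decompress_checked` and `handle_untrusted` are hypothetical names, and the decompressor here is a stub that only validates the length bound, to keep the example self-contained. The point is that the application knows the exact byte count from its transport and passes that bound down, so a truncated or malicious buffer yields an error instead of an overrun.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_COMPRESSED (1u << 20)  /* application-chosen upper bound */

/* Stand-in for a hypothetical size-checked decompression API; a real
 * implementation would never read past src + srcsize. */
static int decompress_checked(const uint8_t *src, size_t srcsize,
                              uint8_t *dest, size_t destsize) {
    if (srcsize < 16 || srcsize > MAX_COMPRESSED)
        return -1;  /* too small to hold a header, or suspiciously large */
    (void)src; (void)dest; (void)destsize;
    return 0;
}

/* Consumer of a buffer received from an untrusted peer.  recv_len is
 * known exactly from the transport (socket read, file size, ...). */
int handle_untrusted(const uint8_t *recv_buf, size_t recv_len) {
    uint8_t out[4096];
    if (decompress_checked(recv_buf, recv_len, out, sizeof out) != 0)
        return -1;  /* reject: truncated, oversized, or malformed */
    /* ... safe to use out here ... */
    return 0;
}
```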
Even if the received buffer is malicious, then
Not only does that strategy require additional error-prone code in every blosc consumer, but the output of

Why can't
Yes, I am suggesting exactly that. I agree that you cannot trust the output of
blosc2_decompress_ctx does not take the size of the source buffer as an argument. Instead, it assumes that the buffer is complete and correctly formed. That can result in array overruns for buffers that are truncated, malicious, or otherwise corrupt. It should take the source buffer length as an argument, and never read beyond it.