Skip to content

[COMPRESS-726] Add optional decompression bomb protection to CompressorStreamFactory#777

Draft
tomasilluminati wants to merge 1 commit into
apache:masterfrom
tomasilluminati:COMPRESS-726-bomb-guard
Draft

[COMPRESS-726] Add optional decompression bomb protection to CompressorStreamFactory#777
tomasilluminati wants to merge 1 commit into
apache:masterfrom
tomasilluminati:COMPRESS-726-bomb-guard

Conversation

@tomasilluminati

Copy link
Copy Markdown

Adds output-counting guard BombGuardCompressorInputStream that caps decompressed size and an optional mean compression-ratio check with a configurable grace. Off by default, opt-in via CompressorStreamFactory setters. CompressorInputStream now implements InputStreamStatistics. Tests and a changes.xml entry are included.

…orStreamFactory

Adds output-counting guard BombGuardCompressorInputStream that caps decompressed size and an optional mean compression-ratio check with a configurable grace. Off by default, opt-in via CompressorStreamFactory setters. CompressorInputStream now implements InputStreamStatistics. Tests and a changes.xml entry are included.
@tomasilluminati

Copy link
Copy Markdown
Author

@ppkarwasz COMPRESS-726 PR Opened

@garydgregory garydgregory left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi All,
I have some initial thoughts on this line of work:

  • Let's loose the whole "bomb" FUD terminology. I might just want to enforce limits. I should be able to do this in one place that also happens to catch possible attacks if an app decides to accept untrusted input. The goal is to enforce a limit, regardless of the input's intent. I have a door on my house to avoid people, animals, weather, and so on from coming in, I don't have a "thief-guard".
  • Use a builder for construction with a private ctor(Builder), otherwise, this will be another class that will get one constructor after another over time.
  • Why no pull up the feature in the super class or move it to IO's BoundedInputStream? The code already uses a BoundedInputStream after all.

@tomasilluminati

Copy link
Copy Markdown
Author

@garydgregory
Thanks, this is really helpful, and honestly it points at a simpler shape than what I pushed.

If all we want is a decompressed-size limit, IO's BoundedInputStream already does it (setMaxCount plus an onMaxCount that throws). So we don't really need a compress-specific class or a new exception. I'm happy to drop BombGuardCompressorInputStream and DecompressionBombException and just have CompressorStreamFactory wrap the stream in a BoundedInputStream when a max size is set. That's the single place you're describing, it reuses IO, and the bomb wording goes away. The builder point sorts itself out too, since there's no class left to build.

The only bit that doesn't fit there is the compression-ratio check, since BoundedInputStream has no idea about ratios and it needs getCompressedCount() from the compressor side. So really the only open question is scope. Just the size limit, or the ratio guard too?

  • Size only, I drop the class and the exception and wire BoundedInputStream into the factory. Smallest change.
  • Ratio too, I keep it but move it onto CompressorInputStream where both counts already live, still exposed through the factory.

My hunch from your note is size-only, but just say the word and I'll redo it that way. Either way the InputStreamStatistics change stays, the compressed count is handy on its own.

Thanks again for the steer.

@garydgregory

Copy link
Copy Markdown
Member

@tomasilluminati
My initial reaction to the ratio check feature is that it's nice but I doubt folks will know how to use it properly without shooting themselves in the foot. "Our app blew up when we allowed users to import files of type foo but everything was fine with bar files" kind of thing. That said, there is a way to implement a ratio check I think using org.apache.commons.io.input.ProxyInputStream.AbstractBuilder.setAfterRead(IOIntConsumer) where you'd do the check in the consumer (BoundedInputStream extends ProxyInputStream).
This one can simmer... ;) maybe @ppkarwasz has some thoughts.

@tomasilluminati

Copy link
Copy Markdown
Author

@garydgregory
Yeah, I'm with you on the footgun. The ratio is genuinely hard to tune and easy to get wrong, exactly the "works for bar, blows up on foo" trap, so I'd rather not hand people a sharp tool they can't aim.

Nice pointer on setAfterRead, hadn't thought of it. That actually means neither piece needs a new class. The size cap is setMaxCount plus a throwing onMaxCount, and if the ratio ever comes back it can ride along as a setAfterRead consumer on the same BoundedInputStream.

So my lean is to keep it minimal, wire an absolute decompressed-size limit into CompressorStreamFactory via BoundedInputStream, drop the bomb class and the exception, and leave the ratio out for now. We can always document it as a recipe later.

Happy to let it simmer and hear what @ppkarwasz thinks before I change anything.

@tomasilluminati

Copy link
Copy Markdown
Author

@garydgregory , I've got the minimal version ready, absolute decompressed-size limit composed over IO's BoundedInputStream, dropping the extra class and the exception. One note, since the cap is on the real decompressed output, it already covers the faked-payload-size case Piotr raised (it ignores any declared size). @ppkarwasz, given that, do you still want the mean-ratio guard and an on-by-default limit? The ratio fires earlier and needs no tuned number, so I'm happy to fold it back as a second, opt-in check if you want it. Otherwise I'll push the size-only version.

@ppkarwasz

Copy link
Copy Markdown
Member

My initial reaction to the ratio check feature is that it's nice but I doubt folks will know how to use it properly without shooting themselves in the foot. "Our app blew up when we allowed users to import files of type foo but everything was fine with bar files" kind of thing.

As with most features, we probably need to provide sensible defaults, so users don't need to modify them. AI gives the following usual compression rates:

  • Binaries: 2-3×
  • Text/source code: 3-5×
  • XML/JSON: 5-15×
  • Log files: 10-30×

Higher compression rates are impossible, unless you compress zeros, so 100× looks like a safe default. When I chose the default ratio for apache/mina#52, I tried to copy what HTTP Server does: mod_deflate, has these knobs:

  • DeflateInflateRatioLimit defaults to 200,
  • Rather than a grace size, mod_deflate only rejects a stream once the cumulative ratio stays above the limit for DeflateInflateRatioBurst + 1 (default 4) consecutive output buffers. With the default 8 KiB buffer, that corresponds to roughly 32 KiB of decompressed output.

@ppkarwasz

Copy link
Copy Markdown
Member

@ppkarwasz, given that, do you still want the mean-ratio guard and an on-by-default limit?

I guess, we can deliver a mean-ratio guard in a separate wrapper: only a limited number of users need it. Basically only if the compressed stream comes primarily from untrusted sources (like in a server), users might want to add a mean-ratio check as a fail-fast mechanism to prevent a DoS.

However, I think it would be nice if every CompressorInputStream inherited from BoundedInputStream: for archive formats that contain compressed data (ZIP, 7z), we usually have a (declared) decompressed size, so we could set that size as limit and prevent corrupted archives to fill the disk. Malicious archives, of course, would need the user to set a hard maximum decompressed size.

@garydgregory

Copy link
Copy Markdown
Member

To make it clear, I'll move this PR draft since it is not ready as there are a few aspects still being discussed.

@garydgregory garydgregory marked this pull request as draft July 2, 2026 11:46
@tomasilluminati

Copy link
Copy Markdown
Author

@ppkarwasz thanks, that lines up well.

A separate opt-in wrapper for the mean-ratio guard sounds right, it keeps the cost at zero for the common case and puts the fail-fast check where untrusted-source users actually want it. Happy to add it that way, following the mod_deflate shape you described (a burst tolerance in output buffers rather than a byte grace), with 100x as the default once it is opt-in.

On making every CompressorInputStream inherit from BoundedInputStream to use the declared ZIP/7z size: appealing for catching corrupt archives, but it is a much bigger change than this PR, it touches the base type every format extends and the japicmp surface, and the declared size lives at the archive layer rather than the codec layer, so wiring it cleanly is its own design. I'd keep it out of 726 and open a separate ticket if you want to pursue it.

So where I think this lands: absolute decompressed-size cap as the core here (composed over IO's BoundedInputStream), the mean-ratio guard as a separate opt-in wrapper with a 100x default, and the inherit-from-BoundedInputStream idea as its own follow-up. Does that match what you and @garydgregory have in mind before I rework the PR?

@garydgregory

Copy link
Copy Markdown
Member

Giving defaults like:

  • Binaries: 2-3×
  • Text/source code: 3-5×
  • XML/JSON: 5-15×
  • Log files: 10-30×

is why this should NOT be added as a feature to this class. It's based on guesses at best and will certainly shoot someone in the foot or a more useful body part.

@garydgregory

garydgregory commented Jul 2, 2026

Copy link
Copy Markdown
Member

@ppkarwasz, given that, do you still want the mean-ratio guard and an on-by-default limit?

I guess, we can deliver a mean-ratio guard in a separate wrapper: only a limited number of users need it. Basically only if the compressed stream comes primarily from untrusted sources (like in a server), users might want to add a mean-ratio check as a fail-fast mechanism to prevent a DoS.

However, I think it would be nice if every CompressorInputStream inherited from BoundedInputStream: for archive formats that contain compressed data (ZIP, 7z), we usually have a (declared) decompressed size, so we could set that size as limit and prevent corrupted archives to fill the disk. Malicious archives, of course, would need the user to set a hard maximum decompressed size.

I think it would be nice if every CompressorInputStream inherited from BoundedInputStream:

I think I like that idea.

What we are saying today is: If you want to bound decompression, you must compose your CompressorInputStream with a BoundedInputStream yourself and configure it.

Instead, what we would be saying is: Bounding decompression is so important to all possible call sites, that we are no longer offering composition as an option, we are now hard-coding the feature by inheriting it and making it available to all call sites, all the time. You still need to set a boundary in you call site.

Doing so would require breaking binary compatibility because the deprecated method CompressorInputStream.getCount() returns an int and BoundedInputStream.getCount() returns a long. Food for thought...

@tomasilluminati

Copy link
Copy Markdown
Author

@ppkarwasz, given that, do you still want the mean-ratio guard and an on-by-default limit?

I guess, we can deliver a mean-ratio guard in a separate wrapper: only a limited number of users need it. Basically only if the compressed stream comes primarily from untrusted sources (like in a server), users might want to add a mean-ratio check as a fail-fast mechanism to prevent a DoS.
However, I think it would be nice if every CompressorInputStream inherited from BoundedInputStream: for archive formats that contain compressed data (ZIP, 7z), we usually have a (declared) decompressed size, so we could set that size as limit and prevent corrupted archives to fill the disk. Malicious archives, of course, would need the user to set a hard maximum decompressed size.

I think it would be nice if every CompressorInputStream inherited from BoundedInputStream:

I think I like that idea.

What we are saying today is: If you want to bound decompression, you must compose your CompressorInputStream with a BoundedInputStream yourself and configure it.

Instead, what we would be saying is: Bounding decompression is so important to all possible call sites, that we are no longer offering composition as an option, we are now hard-coding the feature by inheriting it and making it available to all call sites, all the time. You still need to set a boundary in you call site.

Doing so would require breaking binary compatibility because the deprecated method CompressorInputStream.getCount() returns an int and BoundedInputStream.getCount() returns a long. Food for thought…

I like the intent, one honest catch beyond the getCount int/long break you flagged. BoundedInputStream bounds the bytes it reads from its wrapped stream, which for a decompressor is the compressed input side, whereas bomb protection needs to bound the decompressed output. So the inheritance only does the right thing if every format routes its output through the proxy, which is the big part of the change. Before anyone builds it, worth pinning down what the inherited bound counts, decompressed output or the compressed input it proxies. Happy to prototype whichever way you and @ppkarwasz settle on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants