OPENNLP-1821: Prevent OutOfMemory Due To Huge Array Allocation #1022
mawiesne merged 8 commits into apache:main
Conversation
rzo1
left a comment
Overall, looks good to me, but can we make MAX_ENTRIES configurable via a system property? In addition, it might be worth applying the same limit to ModelParameterChunker.readUTF() and TwoPassDataIndexer (EventStream.read()), no?
I thought this limit is sufficient and we don't need more customisation here. Do we need a system property?
I will add the restriction in ModelParameterChunker, but the TwoPassDataIndexer usage does not seem user-controlled, as it reads from a temp file. @rzo1
We never know what people are doing in the wild, so a config option would be the safe way, imho.
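A system-property-based limit as suggested above could be sketched as follows. The property key and default value here are illustrative assumptions, not necessarily what OpenNLP ended up using:

```java
public class ConfigurableLimit {

    // Hypothetical property key; the actual key chosen in OpenNLP may differ.
    // Integer.getInteger reads the system property and falls back to the
    // given default if the property is unset or not a valid integer.
    static final int MAX_ENTRIES =
        Integer.getInteger("opennlp.model.max.entries", 10_000_000);

    public static void main(String[] args) {
        System.out.println("limit=" + MAX_ENTRIES);
    }
}
```

Running with `-Dopennlp.model.max.entries=500000` would override the default; without the property, the hard-coded fallback applies.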
@rzo1: Fixed. Please review.
I did a search for the same pattern in the morning and found [...]. Regardless of that: Checkstyle is currently failing.
Thx @subbudvk for the recent contributions. Really looking forward to the next round of releases with your name in the contributors list.
* Fix: Prevent OOM/DoS from crafted inputs
* Customizable entry code in OpenNLP
* Use MAX_ENTRIES declared to prevent OOM
* Use correct exception in fix for OOM

(cherry picked from commit 96a073f)
Description
getOutcomes(), getOutcomePatterns(), and getPredicates() in AbstractModelReader
read a 32-bit integer from the binary stream and use it directly as an array size
with no bounds check. A malformed model file with any count field set to
Integer.MAX_VALUE causes an OutOfMemoryError at allocation time, before any
model data is validated.
Fix
Added a MAX_ENTRIES = 10_000_000 limit. All three methods now throw
InvalidFormatException if the count field is negative or exceeds the limit.
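A minimal sketch of the guarded read described above. The method name and the plain IOException are illustrative stand-ins; the actual AbstractModelReader code throws InvalidFormatException and reads through its own stream abstraction:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class BoundedReadSketch {

    // Cap mirroring the MAX_ENTRIES limit described in the fix.
    static final int MAX_ENTRIES = 10_000_000;

    // Hypothetical helper: reads a count field and validates it
    // BEFORE allocating the array, so a crafted count can no longer
    // trigger a huge allocation.
    static String[] readStringArray(DataInputStream in) throws IOException {
        int count = in.readInt();
        if (count < 0 || count > MAX_ENTRIES) {
            // The real fix throws InvalidFormatException here.
            throw new IOException("Invalid count field: " + count);
        }
        String[] entries = new String[count];
        for (int i = 0; i < count; i++) {
            entries[i] = in.readUTF();
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        // Craft a stream whose count field is Integer.MAX_VALUE,
        // mimicking a malformed model file.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new DataOutputStream(bos).writeInt(Integer.MAX_VALUE);
        try {
            readStringArray(new DataInputStream(
                new ByteArrayInputStream(bos.toByteArray())));
            System.out.println("no error");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Without the bounds check, `new String[count]` with count = Integer.MAX_VALUE would attempt a multi-gigabyte allocation and fail with OutOfMemoryError before any model data is read.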