OPENNLP-1821: Prevent OutOfMemory Due To Huge Array Allocation by subbudvk · Pull Request #1022 · apache/opennlp

subbudvk · 2026-04-26T09:37:07Z

Description

getOutcomes(), getOutcomePatterns(), and getPredicates() in AbstractModelReader
read a 32-bit integer from the binary stream and use it directly as an array size
with no bounds check. A malformed model file with any count field set to
Integer.MAX_VALUE causes an OutOfMemoryError at allocation time, before any
model data is validated.

Fix

Added a MAX_ENTRIES = 10_000_000 limit. All three methods now throw
InvalidFormatException if the count field is negative or exceeds the limit.
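A minimal sketch of the check described above. It assumes the count is read from a `DataInputStream`-like source with `readInt()`. The class and helper names (`BoundedCountDemo`, `readBoundedCount`, `readOutcomes`) are hypothetical, and the sketch throws a plain `IOException` where the PR throws OpenNLP's `InvalidFormatException`:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class BoundedCountDemo {

  // Same value as the limit described in the PR; this class itself is illustrative.
  static final int MAX_ENTRIES = 10_000_000;

  // Reads a 32-bit count and rejects negative or oversized values before any allocation.
  // Throws IOException here; the PR throws OpenNLP's InvalidFormatException instead.
  static int readBoundedCount(DataInputStream in, String field) throws IOException {
    int count = in.readInt();
    if (count < 0 || count > MAX_ENTRIES) {
      throw new IOException("Invalid " + field + " count " + count
          + ", expected 0.." + MAX_ENTRIES);
    }
    return count;
  }

  // Hypothetical reader: the array is only allocated after the count has been validated.
  static String[] readOutcomes(DataInputStream in) throws IOException {
    int count = readBoundedCount(in, "outcome");
    String[] outcomes = new String[count];
    for (int i = 0; i < count; i++) {
      outcomes[i] = in.readUTF();
    }
    return outcomes;
  }

  // Builds a stream containing only a big-endian 32-bit count, as readInt() expects.
  static DataInputStream streamWithCount(int count) {
    byte[] bytes = {
        (byte) (count >>> 24), (byte) (count >>> 16), (byte) (count >>> 8), (byte) count
    };
    return new DataInputStream(new ByteArrayInputStream(bytes));
  }

  public static void main(String[] args) {
    try {
      readOutcomes(streamWithCount(Integer.MAX_VALUE));
      System.out.println("allocated");
    } catch (IOException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Validating the count before `new String[count]` is what matters: without the check, a header of `Integer.MAX_VALUE` fails at allocation, and a negative header fails with `NegativeArraySizeException`, neither of which tells the caller that the model file is malformed.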

rzo1

Overall, looks good to me, but can we make MAX_ENTRIES configurable via a system property? In addition, it might be worth applying the same check to ModelParameterChunker.readUTF() and TwoPassDataIndexer (EventStream.read()), no?

subbudvk · 2026-04-26T14:50:44Z

I thought this limit would be sufficient, so we wouldn't need further customisation here. Do we need a system property?

subbudvk · 2026-04-26T14:56:01Z

I will add the restriction to ModelParameterChunker, but TwoPassDataIndexer's input does not seem to be user-controlled, since it comes from a temp file. @rzo1

rzo1 · 2026-04-26T14:59:01Z

We never know what people are doing in the wild, so a config option would be the safe way, imho.
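A sketch of how the limit could be made configurable through a system property, as requested in this exchange. The property name `opennlp.model.maxEntries` and the class `ConfigurableLimitDemo` are hypothetical; the thread does not say which name, if any, the merged change uses:

```java
public class ConfigurableLimitDemo {

  // Hypothetical property name, chosen for illustration only.
  static final String MAX_ENTRIES_PROPERTY = "opennlp.model.maxEntries";
  static final int DEFAULT_MAX_ENTRIES = 10_000_000;

  // Reads the limit on each call so that tests can change the property at runtime.
  // Integer.getInteger falls back to the default when the property is unset or not a
  // valid integer; non-positive values are also replaced by the default.
  static int maxEntries() {
    int value = Integer.getInteger(MAX_ENTRIES_PROPERTY, DEFAULT_MAX_ENTRIES);
    return value > 0 ? value : DEFAULT_MAX_ENTRIES;
  }

  public static void main(String[] args) {
    System.out.println("default:  " + maxEntries());
    System.setProperty(MAX_ENTRIES_PROPERTY, "500");
    System.out.println("override: " + maxEntries());
    System.setProperty(MAX_ENTRIES_PROPERTY, "not-a-number");
    System.out.println("invalid:  " + maxEntries());
    System.clearProperty(MAX_ENTRIES_PROPERTY);
  }
}
```

With this sketch, a user loading unusually large models would start the JVM with `-Dopennlp.model.maxEntries=20000000` (hypothetical name). Caching the value in a `static final` field instead would read the property only once at class initialization, which is cheaper but fixes the limit for the lifetime of the JVM.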

subbudvk · 2026-04-26T15:07:22Z

@rzo1: Fixed. Please review.

mawiesne

Thx @subbudvk - I've left several comments and one (open) question.

rzo1 · 2026-04-27T06:06:52Z

I searched for the same pattern this morning and found HeadRules (and AncoraSpanishHeadRules). What do you guys think about HeadRules#199 (and AncoraSpanishHeadRules#215)?

Judging from HEAD_RULES_MODEL_ENTRY_NAME in ParserModel, could it be reachable when loading a .bin model?

Regardless of that: Checkstyle is currently failing.
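A sketch of a defensive parse for the HeadRules case raised above. It assumes, without confirmation from the thread, that each rule line has the layout `count type direction tag...` with the leading count giving the number of tokens that follow it and sizing the tag array. The class `HeadRuleLineDemo` and its `parseLine` method are hypothetical. Sizing the array from the tokens actually present, and only cross-checking the declared count, means a crafted header cannot force a large allocation:

```java
import java.util.StringTokenizer;

public class HeadRuleLineDemo {

  // Illustrative holder for one parsed rule line.
  static final class Rule {
    final String type;
    final String direction;
    final String[] tags;

    Rule(String type, String direction, String[] tags) {
      this.type = type;
      this.direction = direction;
      this.tags = tags;
    }
  }

  // Parses a line assumed to look like "count type direction tag1 tag2 ...".
  // The tag array is sized from the tokens that are actually present; the declared
  // count is only cross-checked, never used as an allocation size.
  static Rule parseLine(String line) {
    StringTokenizer st = new StringTokenizer(line);
    if (st.countTokens() < 3) {
      throw new IllegalArgumentException("Malformed head rule line: " + line);
    }
    int declared;
    try {
      declared = Integer.parseInt(st.nextToken());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Non-numeric count in head rule line: " + line, e);
    }
    String type = st.nextToken();
    String direction = st.nextToken();
    int actualTags = st.countTokens(); // tokens remaining after count, type and direction
    // Compare against actualTags + 2 so no arithmetic is done on the untrusted value.
    if (declared != actualTags + 2) {
      throw new IllegalArgumentException("Declared count " + declared
          + " does not match " + (actualTags + 2) + " tokens in line: " + line);
    }
    String[] tags = new String[actualTags]; // bounded by the line length, not the header
    for (int i = 0; i < actualTags; i++) {
      tags[i] = st.nextToken();
    }
    return new Rule(type, direction, tags);
  }

  public static void main(String[] args) {
    Rule rule = parseLine("4 ADJP 0 NNS QP");
    System.out.println(rule.type + " " + String.join(",", rule.tags));
    try {
      parseLine("2147483647 ADJP 0 NNS");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Unlike the binary model reader, a text line already bounds how many tags can exist, so this approach needs no separate MAX_ENTRIES constant.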

mawiesne · 2026-04-27T07:25:54Z

Thx @subbudvk for the recent contributions. Really looking forward to the next round of releases with your name in the contributors list.

subbudvk · 2026-04-27T13:38:00Z

Thanks @mawiesne @rzo1

Looking forward to making more meaningful contributions!

* Fix : Prevent OOM/DoS from Crafted Inputs
* Customizable entry code in OpenNLP
* Use Max_Entries Declared to prevent OOM
* Use correct exception in fix for OOM

(cherry picked from commit 96a073f)

subbudvk · 2026-04-27T17:03:33Z

Thanks, team, for merging my fix.

On Mon, 27 Apr 2026, 12:26 am, Richard Zowalla approved this pull request.

subbudvk added 2 commits April 26, 2026 15:01

Fix : Prevent OOM/DoS from Crafted Inputs · 1a7cce5

Fix : Prevent OOM/DoS from Crafted Inputs · 4fda81f

rzo1 reviewed Apr 26, 2026

subbudvk added 2 commits April 26, 2026 20:34

Customizable entry code in OpenNLP · ea9062b

Use Max_Entries Declared to prevent OOM · 41b03c1

subbudvk requested a review from rzo1 April 26, 2026 15:07

rzo1 requested a review from mawiesne April 26, 2026 15:54

rzo1 approved these changes Apr 26, 2026

rzo1 changed the title ~~Prevent OOM/DoS in AbstractModelReader~~ OPENNLP-1821 - Prevent OutOfMemory Due To Huge Array Allocation Apr 26, 2026

rzo1 requested review from atarora and jzonthemtn April 26, 2026 19:11

mawiesne changed the title ~~OPENNLP-1821 - Prevent OutOfMemory Due To Huge Array Allocation~~ OPENNLP-1821: Prevent OutOfMemory Due To Huge Array Allocation Apr 26, 2026

mawiesne requested changes Apr 26, 2026

subbudvk added 3 commits April 27, 2026 09:09

Fix : Prevent OOM/DoS from Crafted Inputs · 792cb9d

Remove changes from ModelParameterChunker · e2ec0f5

Use correct exception in fix for OOM · ab4b1ab

subbudvk requested a review from mawiesne April 27, 2026 03:42

mawiesne approved these changes Apr 27, 2026

mawiesne assigned subbudvk Apr 27, 2026

mawiesne added the java label Apr 27, 2026

Fix : Checkstyle errors · 821e4b2

mawiesne approved these changes Apr 27, 2026

mawiesne merged commit 96a073f into apache:main Apr 27, 2026
9 checks passed


3 participants
