fix(engine, java): reduce heap retention by releasing AST and detection state eagerly #417
Open
vdstech wants to merge 1 commit into
b870b60 to 4169621
…n state eagerly

Fixes cbomkit#371.

Large Java codebases (e.g. 28 000-file projects) could retain hundreds of fully-resolved Java ASTs simultaneously because DetectionExecutive held a strong reference to the tree after analysis, and DetectionStore instances kept their full value/child collections alive until the end of the scan.

Changes:
- DetectionExecutive: null out tree in a finally block inside start() so the AST is eligible for GC immediately after rule analysis, even on exception.
- DetectionExecutive: track deferred-hook registrations with a boolean flag; call releaseResources() eagerly when no deferred hooks are present, and expose releaseDeferredResources() / hasDeferredHooks() / isReleased() for the language layer to drive cleanup after all hooks have fired.
- IStatusReporting: add default method onDeferredHookRegistration() as a backward-compatible lifecycle callback so DetectionExecutive can be notified when a deferred hook is registered without breaking existing implementations.
- DetectionStore: add release() to recursively clear detectionValues, children, and actionValue; simplify ifPresentOrElse(..., () -> {}) calls to ifPresent.
- JavaBaseDetectionRule: collect executives that registered deferred hooks and call releaseDeferredResources() on all of them in leaveFile(), ensuring state is freed after every file regardless of deferred activity.
- JavaAggregator: add resetLanguageSupport() and call JavaScanMemoryLogger.reset() inside reset() to keep observability counters aligned with scan lifecycle.
- JavaScanMemoryLogger: new lightweight utility that samples JVM heap usage and logs progress every N files, making memory regressions visible in scan logs.
- OutputFileJob: retain JavaScanMemoryLogger.Snapshot logging for observability; no functional change to CBOM output path.

Tests added:
- DetectionExecutiveLifecycleTest (8 tests) - deferred hook state machine
- DetectionStoreReleaseTest (7 tests) - recursive release correctness
- JavaScanMemoryLoggerTest (9 tests) - counter, throttling, and reset behaviour

Signed-off-by: Karunakar Mattaparthi <karunakar.mattaparthi@nokia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
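The backward-compatibility claim for the new IStatusReporting callback rests on Java default methods: an interface can gain a method with a no-op body and existing implementations keep compiling. A minimal sketch of that idea follows. The interface, its onFinding method, and both reporter classes are hypothetical stand-ins, not the project's actual types; only the callback name comes from the commit message.

```java
/** Illustrative only: shows why a default method does not break existing implementers. */
class DefaultCallbackSketch {

    /** Hypothetical stand-in for a status-reporting interface. */
    interface StatusReporting {
        void onFinding(String finding); // assumed pre-existing abstract method

        // New lifecycle callback: a no-op unless an implementation opts in.
        default void onDeferredHookRegistration() { }
    }

    /** Written before the callback existed; compiles unchanged. */
    static final class LegacyReporter implements StatusReporting {
        @Override
        public void onFinding(String finding) { }
    }

    /** Opts in to the callback to remember that cleanup must be deferred. */
    static final class TrackingReporter implements StatusReporting {
        boolean deferredHookSeen;

        @Override
        public void onFinding(String finding) { }

        @Override
        public void onDeferredHookRegistration() {
            deferredHookSeen = true;
        }
    }

    public static void main(String[] args) {
        StatusReporting legacy = new LegacyReporter();
        legacy.onDeferredHookRegistration(); // inherited no-op

        TrackingReporter tracking = new TrackingReporter();
        tracking.onDeferredHookRegistration();
        System.out.println("tracking saw hook: " + tracking.deferredHookSeen);
    }
}
```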
4169621 to 22ae4ac
Closes #371
Problem
Large Java codebases (28 000+ files) caused the scanner to consume up to
90 GB of heap. The root cause was DetectionExecutive holding strong
references to fully-resolved Java ASTs long after analysis completed,
combined with DetectionStore retaining its value/child collections for
the entire scan duration.
Solution
Eagerly release ASTs and detection state after each file:
- DetectionExecutive: null out tree in a finally block in start() so the AST is GC-eligible immediately after rule analysis.
- DetectionExecutive: expose a deferred-hook lifecycle. When no deferred hooks are registered, releaseResources() is called eagerly after emitFinding(); when deferred hooks exist, JavaBaseDetectionRule drives cleanup via releaseDeferredResources() in leaveFile().
- IStatusReporting: add onDeferredHookRegistration() default method as a backward-compatible lifecycle callback.
- DetectionStore: add release() to recursively clear detectionValues, children, and actionValue.
- JavaBaseDetectionRule: call releaseDeferredResources() on all deferred executives in leaveFile().
- JavaScanMemoryLogger: new utility to log JVM heap usage every N files, making memory regressions visible in scan output.
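The lifecycle described in the list above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the Executive and FileScope classes are hypothetical stand-ins, the AST is modelled as a plain Object, and the eager release happens right after analysis rather than after emitFinding(). Only the method names are taken from the description.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: method names mirror the PR text, the bodies are assumptions. */
class ExecutiveLifecycleSketch {

    static final class Executive {
        private Object tree;           // stand-in for the fully-resolved AST
        private boolean deferredHooks; // flipped when a deferred hook registers
        private boolean released;

        Executive(Object tree) {
            this.tree = tree;
        }

        /** Lifecycle callback: remember that cleanup must wait for the hooks. */
        void onDeferredHookRegistration() {
            deferredHooks = true;
        }

        void start(Runnable ruleAnalysis) {
            try {
                ruleAnalysis.run();
            } finally {
                tree = null; // AST becomes GC-eligible even if analysis throws
            }
            if (!deferredHooks) {
                releaseResources(); // nothing pending, so free state eagerly
            }
        }

        /** Called by the language layer once all deferred hooks have fired. */
        void releaseDeferredResources() {
            if (deferredHooks && !released) {
                releaseResources();
            }
        }

        boolean hasDeferredHooks() { return deferredHooks; }
        boolean isReleased() { return released; }
        boolean holdsTree() { return tree != null; }

        private void releaseResources() {
            released = true; // a real implementation would also release its stores
        }
    }

    /** Hypothetical stand-in for the rule's per-file bookkeeping. */
    static final class FileScope {
        private final List<Executive> deferred = new ArrayList<>();

        void track(Executive executive) {
            if (executive.hasDeferredHooks()) {
                deferred.add(executive);
            }
        }

        void leaveFile() {
            deferred.forEach(Executive::releaseDeferredResources);
            deferred.clear(); // do not carry executives over to the next file
        }
    }

    public static void main(String[] args) {
        Executive eager = new Executive(new Object());
        eager.start(() -> { });
        System.out.println("eager released: " + eager.isReleased());

        Executive waiting = new Executive(new Object());
        waiting.start(waiting::onDeferredHookRegistration);
        FileScope scope = new FileScope();
        scope.track(waiting);
        System.out.println("released before leaveFile: " + waiting.isReleased());
        scope.leaveFile();
        System.out.println("released after leaveFile: " + waiting.isReleased());
    }
}
```

Clearing the per-file list in leaveFile() matters as much as the release call itself: a list that keeps growing across files would reintroduce the retention the PR is trying to remove.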
Testing
24 new unit tests covering the deferred-hook state machine, recursive
store release, and memory logger behaviour.
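As an illustration of the property a recursive-release test would check, here is a minimal sketch. The Store class is a hypothetical stand-in with the three fields named in the description; the element types and the isEmpty() helper are assumptions made for the example, not the real DetectionStore API.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a tree-shaped store whose release() clears every level. */
class StoreReleaseSketch {

    static final class Store {
        final List<String> detectionValues = new ArrayList<>();
        final List<Store> children = new ArrayList<>();
        String actionValue; // may be null when no action was recorded

        void release() {
            // Release descendants first so no subtree keeps its values alive.
            for (Store child : children) {
                child.release();
            }
            children.clear();
            detectionValues.clear();
            actionValue = null;
        }

        boolean isEmpty() {
            return detectionValues.isEmpty() && children.isEmpty() && actionValue == null;
        }
    }

    public static void main(String[] args) {
        Store root = new Store();
        Store child = new Store();
        Store grandchild = new Store();
        root.detectionValues.add("AES");
        child.actionValue = "128";
        grandchild.detectionValues.add("GCM");
        child.children.add(grandchild);
        root.children.add(child);

        root.release();
        System.out.println("all levels empty: "
                + (root.isEmpty() && child.isEmpty() && grandchild.isEmpty()));
    }
}
```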
Observed impact
Validated against Elasticsearch (28 000 Java files) and Kafka (6 000 Java
files). Peak heap dropped from >90 GB to normal scanner levels.
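Heap figures like these can be observed during a scan by sampling the JVM every N files. The sketch below shows one way to do that with java.lang.Runtime; it is not the JavaScanMemoryLogger from this PR, and the class name, constructor parameter, and log format are all assumptions.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative only: counts scanned files and logs heap usage every `interval` files. */
class HeapSamplerSketch {
    private static final long MIB = 1024L * 1024L;

    private final int interval;
    private final AtomicLong files = new AtomicLong();

    HeapSamplerSketch(int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.interval = interval;
    }

    /** Counts one scanned file; returns true when a heap sample was logged. */
    boolean onFileScanned() {
        long n = files.incrementAndGet();
        if (n % interval != 0) {
            return false; // throttle: only every interval-th file is sampled
        }
        Runtime rt = Runtime.getRuntime();
        long usedMiB = (rt.totalMemory() - rt.freeMemory()) / MIB;
        long maxMiB = rt.maxMemory() / MIB;
        System.out.printf("scanned %d files, heap used %d MiB (max %d MiB)%n", n, usedMiB, maxMiB);
        return true;
    }

    long filesScanned() {
        return files.get();
    }

    /** Resets the counter so a new scan starts counting from zero. */
    void reset() {
        files.set(0);
    }

    public static void main(String[] args) {
        HeapSamplerSketch sampler = new HeapSamplerSketch(1000);
        for (int i = 0; i < 3000; i++) {
            sampler.onFileScanned(); // logs at files 1000, 2000 and 3000
        }
        sampler.reset();
    }
}
```

The used-heap value is totalMemory() minus freeMemory(), so it includes garbage that has not been collected yet. A sample taken this way is a trend indicator for spotting regressions, not an exact measure of retained objects.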