fix(engine, java): reduce heap retention by releasing AST and detection state eagerly #417

Open
vdstech wants to merge 1 commit into cbomkit:main from vdstech:fix/issue-371-reduce-heap-retention

@vdstech vdstech commented May 20, 2026

Closes #371

Problem

Large Java codebases (28 000+ files) caused the scanner to consume up to
90 GB of heap. The root cause was DetectionExecutive holding strong
references to fully-resolved Java ASTs long after analysis completed,
combined with DetectionStore retaining its value/child collections for
the entire scan duration.

Solution

Eagerly release ASTs and detection state after each file:

  • DetectionExecutive: null out tree in a finally block in
    start() so the AST is GC-eligible immediately after rule analysis.
  • DetectionExecutive: expose a deferred-hook lifecycle. When no
    deferred hooks are registered, releaseResources() is called eagerly
    after emitFinding(); when deferred hooks exist, JavaBaseDetectionRule
    drives cleanup via releaseDeferredResources() in leaveFile().
  • IStatusReporting: add onDeferredHookRegistration() default method
    as a backward-compatible lifecycle callback.
  • DetectionStore: add release() to recursively clear
    detectionValues, children, and actionValue.
  • JavaBaseDetectionRule: call releaseDeferredResources() on all
    deferred executives in leaveFile().
  • JavaScanMemoryLogger: new utility to log JVM heap usage every N
    files, making memory regressions visible in scan output.
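
The lifecycle described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `ExecutiveSketch`, its fields, and the `hasTree()` accessor are hypothetical stand-ins, and only the method names `start()`, `onDeferredHookRegistration()`, `releaseResources()`, `releaseDeferredResources()`, `hasDeferredHooks()` and `isReleased()` are taken from the description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for DetectionExecutive; bodies are illustrative only.
final class ExecutiveSketch {
    private Object tree;                                 // stands in for the resolved AST
    private List<String> state = new ArrayList<>();      // stands in for detection state
    private boolean deferredHooks;
    private boolean released;

    ExecutiveSketch(Object tree) {
        this.tree = tree;
        this.state.add("finding");
    }

    // Lifecycle callback: a deferred hook was registered, so cleanup must wait.
    void onDeferredHookRegistration() {
        deferredHooks = true;
    }

    void start(Runnable analysis) {
        try {
            analysis.run();
        } finally {
            tree = null; // the AST becomes GC-eligible even if analysis throws
        }
        // A real executive would emit its findings here before releasing.
        if (!deferredHooks) {
            releaseResources(); // eager path: nothing else needs the state
        }
    }

    // Called by the language layer (for example in leaveFile()) once hooks have fired.
    void releaseDeferredResources() {
        if (deferredHooks && !released) {
            releaseResources();
        }
    }

    private void releaseResources() {
        state.clear();
        released = true;
    }

    boolean hasDeferredHooks() { return deferredHooks; }
    boolean isReleased() { return released; }
    boolean hasTree() { return tree != null; }
}
```

In this sketch an executive without deferred hooks is released as soon as `start()` returns, while one that registered a hook keeps its state until `releaseDeferredResources()` is called.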

Testing

24 new unit tests covering the deferred-hook state machine, recursive
store release, and memory logger behaviour.

Observed impact

Validated against Elasticsearch (28 000 Java files) and Kafka (6 000 Java
files). Peak heap dropped from >90 GB to normal scanner levels.

@vdstech vdstech requested a review from a team as a code owner May 20, 2026 04:34
@vdstech vdstech force-pushed the fix/issue-371-reduce-heap-retention branch 2 times, most recently from b870b60 to 4169621 Compare May 20, 2026 06:51
…n state eagerly

Fixes cbomkit#371. Large Java codebases (e.g. 28 000-file projects) could retain
hundreds of fully-resolved Java ASTs simultaneously because DetectionExecutive
held a strong reference to the tree after analysis, and DetectionStore instances
kept their full value/child collections alive until the end of the scan.

Changes:
- DetectionExecutive: null out tree in a finally block inside start() so the
  AST is eligible for GC immediately after rule analysis, even on exception.
- DetectionExecutive: track deferred-hook registrations with a boolean flag;
  call releaseResources() eagerly when no deferred hooks are present, and
  expose releaseDeferredResources() / hasDeferredHooks() / isReleased() for
  the language layer to drive cleanup after all hooks have fired.
- IStatusReporting: add default method onDeferredHookRegistration() as a
  backward-compatible lifecycle callback so DetectionExecutive can be notified
  when a deferred hook is registered without breaking existing implementations.
- DetectionStore: add release() to recursively clear detectionValues, children,
  and actionValue; simplify ifPresentOrElse(..., () -> {}) calls to ifPresent.
- JavaBaseDetectionRule: collect executives that registered deferred hooks and
  call releaseDeferredResources() on all of them in leaveFile(), ensuring state
  is freed after every file regardless of deferred activity.
- JavaAggregator: add resetLanguageSupport() and call JavaScanMemoryLogger.reset()
  inside reset() to keep observability counters aligned with scan lifecycle.
- JavaScanMemoryLogger: new lightweight utility that samples JVM heap usage and
  logs progress every N files, making memory regressions visible in scan logs.
- OutputFileJob: retain JavaScanMemoryLogger.Snapshot logging for observability;
  no functional change to CBOM output path.
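
As an illustration of the recursive release() described for DetectionStore above, a minimal sketch could look like this. `StoreSketch` and its element types are hypothetical; only the field names detectionValues, children and actionValue come from the commit message, and the real class may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for DetectionStore; element types are placeholders.
final class StoreSketch {
    private final List<String> detectionValues = new ArrayList<>();
    private final List<StoreSketch> children = new ArrayList<>();
    private String actionValue;

    StoreSketch addValue(String value) {
        detectionValues.add(value);
        return this;
    }

    StoreSketch addChild(StoreSketch child) {
        children.add(child);
        return this;
    }

    StoreSketch setActionValue(String value) {
        actionValue = value;
        return this;
    }

    // Release children first, then drop this store's own references.
    void release() {
        for (StoreSketch child : children) {
            child.release();
        }
        children.clear();
        detectionValues.clear();
        actionValue = null;
    }

    boolean isEmpty() {
        return detectionValues.isEmpty() && children.isEmpty() && actionValue == null;
    }
}
```

Releasing depth-first means a child that is still referenced from elsewhere also drops its contents, not only the link from its parent.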

Tests added:
- DetectionExecutiveLifecycleTest (8 tests) - deferred hook state machine
- DetectionStoreReleaseTest (7 tests) - recursive release correctness
- JavaScanMemoryLoggerTest (9 tests) - counter, throttling, and reset behaviour
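
For context on the counter, throttling and reset behaviour these tests refer to, here is a rough sketch of a heap-sampling logger. It is an assumption-laden illustration: `HeapLoggerSketch`, its instance-based design and the configurable interval are hypothetical and are not the JavaScanMemoryLogger API. Only the standard `Runtime` calls are real.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a throttled heap logger; not the PR's JavaScanMemoryLogger.
final class HeapLoggerSketch {
    private static final long MB = 1024L * 1024L;

    private final AtomicLong filesScanned = new AtomicLong();
    private final int interval; // log every N files (assumed to be configurable)

    HeapLoggerSketch(int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.interval = interval;
    }

    // Counts one file; returns true when a heap sample was logged.
    boolean onFileScanned() {
        long n = filesScanned.incrementAndGet();
        if (n % interval != 0) {
            return false; // throttled: no log line for this file
        }
        Runtime rt = Runtime.getRuntime();
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / MB;
        long maxMb = rt.maxMemory() / MB;
        System.out.printf("scanned %d files, heap used %d MB of %d MB max%n", n, usedMb, maxMb);
        return true;
    }

    long count() {
        return filesScanned.get();
    }

    // Resets the counter so a new scan starts from zero.
    void reset() {
        filesScanned.set(0);
    }
}
```

The used-heap figure here is `totalMemory() - freeMemory()`, which includes garbage not yet collected, so a sampled value is an upper bound on live data rather than an exact measurement.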

Signed-off-by: Karunakar Mattaparthi <karunakar.mattaparthi@nokia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@vdstech vdstech force-pushed the fix/issue-371-reduce-heap-retention branch from 4169621 to 22ae4ac Compare May 20, 2026 06:55
Development

Successfully merging this pull request may close these issues.

High heap memory requirements