Skip to content

perf(engine): skip the redundant render-pass re-parse of define-free templates#573

Merged
alexmond merged 1 commit into
mainfrom
perf/engine-skip-redundant-reparse
Jun 29, 2026
Merged

perf(engine): skip the redundant render-pass re-parse of define-free templates#573
alexmond merged 1 commit into
mainfrom
perf/engine-skip-redundant-reparse

Conversation

@alexmond

Copy link
Copy Markdown
Owner

Summary

jhelm parses every template twice per render: a collect pass (collectNamedTemplates) parses each template into the factory under its helm-style key to gather define blocks, then the render pass (renderChartTemplates) re-parses the same (name, text) before executing it. Because parseWithCache's key includes the name, even a wired TemplateCache can't dedupe the two passes within a single render — and template parsing is jhelm's single hottest render path (profiled below).

This adds a small per-render parsedTextHash memo: when the identical text is already parsed under a name and it declares no define block, the render-pass re-parse is skipped (it would only re-create the identical root node).

Why it's safe

  • Templates that declare {{ define }} — which could re-order the global define table — still parse, so define precedence is untouched.
  • Key collisions (distinct text under the same helm-style key) still parse and overwrite, so collision resolution is untouched.
  • The text.contains("define") check is a cheap superset guard: a false positive only costs a redundant parse, never correctness.
  • The memo is reset each render in doRender alongside the factory; the factory.getRootNodes().containsKey(name) guard makes it robust to the per-render factory swap.

Measured impact

60-chart render-only batch (jvmlens, live attach), before → after:

Metric Before After Δ
parseWithCache CPU 42% 31% −28% samples
parseWithCache allocation 2.3 GB 1.3 GB −45%
Engine.* total allocation 3.6 GB 3.0 GB −18%
GC pause 448 ms 358 ms −20%

(Other frames' shares rise in the diff purely from share-inversion + more batch rounds completing in the same window — their absolute per-render work is unchanged.)

Verification

  • Full clean install green.
  • 540/540 parity charts byte-for-byte identical to helm template (0 failures), including the define-precedence-sensitive umbrella charts: gitlab/openbao (Chart signing and verification #21), druid, redpanda/console, istiod.

🤖 Generated with Claude Code

…templates

The collect pass (collectNamedTemplates) parses every template into the
factory under its helm-style key to gather define blocks; the render pass
(renderChartTemplates) then re-parses the same (name, text) before
executing it. parseWithCache's key includes the name, so even a wired
TemplateCache can't dedupe the two passes within a single render — the
parse is the single hottest render path, paid twice per template.

Add a per-render parsedTextHash memo: when the identical text is already
parsed under a name and it declares no `define` block, skip the re-parse —
it would only re-create the identical root node. Templates that declare
`define` (which could re-order the global define table) and key collisions
(distinct text under the same name) still parse, preserving define
precedence and collision resolution. The `define` substring check is a
cheap superset guard: a false positive only costs a redundant parse, never
correctness.

Measured on a 60-chart render-only batch (jvmlens, live attach):
- parseWithCache CPU 42% -> 31% (-28% samples)
- parseWithCache allocation 2.3 GB -> 1.3 GB (-45%)
- Engine.* total allocation 3.6 GB -> 3.0 GB (-18%); GC pause -20%

Verified byte-for-byte against `helm template` across the full 540-chart
parity suite (0 failures), including the define-precedence-sensitive
umbrella charts (gitlab/openbao #21, druid, redpanda/console, istiod).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U5yvjG89AqMHPAGJawSmg9
@github-actions

Copy link
Copy Markdown
Contributor
Overall Project 82.27% 🍏
Files changed 100% 🍏

Module Coverage
jhelm-core 84.43% 🍏
Files
Module File Coverage
jhelm-core Engine.java 90.96% 🍏

@codecov

codecov Bot commented Jun 29, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 75.00000% with 2 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
...n/java/org/alexmond/jhelm/core/service/Engine.java 75.00% 0 Missing and 2 partials ⚠️

📢 Thoughts on this report? Let us know!

@alexmond alexmond merged commit a0cdd4e into main Jun 29, 2026
2 checks passed
@alexmond alexmond deleted the perf/engine-skip-redundant-reparse branch June 29, 2026 19:38
@github-actions

Copy link
Copy Markdown
Contributor
Overall Project 82.27% 🍏
Files changed 100% 🍏

Module Coverage
jhelm-core 84.43% 🍏
Files
Module File Coverage
jhelm-core Engine.java 90.96% 🍏

alexmond added a commit that referenced this pull request Jun 29, 2026
Adds docs/performance.adoc documenting jhelm's profile-driven performance
work, modelled on the gotmpl4j performance page but adapted to jhelm's
layer (chart rendering, not engine-vs-engine throughput).

Covers: the jhelm-vs-gotmpl4j layer split (and a link to gotmpl4j's own
perf page), the 540-chart parity corpus as the render benchmark, the
jvmlens live-attach methodology (incl. the surefire dumponexit gotcha and
scoping to org.alexmond.jhelm), where render time actually goes (parse/lex
vs inherent chart crypto vs jhelm orchestration), how to read share-
inversion in a profile diff, and an optimization-history table with the
measured jvmlens deltas for #573 (double-parse elimination) and #509
(toYaml quote-normalize fast-path). Linked into the nav after Core Engine.


Claude-Session: https://claude.ai/code/session_01U5yvjG89AqMHPAGJawSmg9

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
alexmond added a commit that referenced this pull request Jun 30, 2026
* ci(parity): run the full 540-chart parity suite as a PR gate

Adds a pull_request trigger to the chart-parity workflow so every PR is
gated against the entire corpus, closing the gap where rendering
regressions could merge green (build.yml excludes the `comparison` tag).

This is now cheap enough to gate on: the whole 540-chart suite renders in a
single surefire fork at the pom's default 512 MB heap in ~5 min — no
chunking, no -Xmx override (measured; live heap after GC stays ~26 MB, so
there is no accumulation). The earlier OOM/chunking workaround predated the
parse-allocation perf fixes (#573, #575) and is no longer needed.

A chart cache (keyed on charts.csv) amortises the per-chart upstream fetch
across PR runs, and a concurrency group cancels superseded runs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U5yvjG89AqMHPAGJawSmg9

* ci(parity): redirect test output to file so the fork doesn't OOM

The first PR-gate run OOM'd at the 512 MB fork heap with "Tests run: 0" —
surefire buffers each test's stdout in the forked JVM, and the comparison
tests emit very large per-chart output (rendered manifests), so the buffer
exhausts the heap mid-suite. (The pom hardcodes -Xmx512m after @{argLine},
so it can't be raised via -DargLine anyway.)

redirectTestOutputToFile=true streams that output to per-test files instead
of holding it in heap — the same setting used when validating the suite
locally, where the full 540 render comfortably in one 512 MB fork. Only the
TEST-*.xml reports are uploaded, so the large -output.txt files stay local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U5yvjG89AqMHPAGJawSmg9

* ci(parity): give the parity fork real heap (parameterized -Xmx)

The 512 MB test-fork default OOMs the full parity suite mid-run on CI — the
diff renders large manifests through SnakeYAML, and the fork dies ("Tests
run: 0"). My earlier "runs at 512 MB" finding was wrong: the pom hardcoded
-Xmx512m *after* @{argLine}, so local -DargLine overrides were silently
ignored (every local run was 512 MB) and used stale cached charts; CI
fetches current, larger chart versions.

Parameterize the test-fork max heap (`jhelm.test.maxHeap`, default 512 MB so
the normal build is unchanged) and have the parity job pass
-Djhelm.test.maxHeap=4g — the GitHub runner has ~7 GB, so a 4 GB fork is
safe alongside Maven + kind. Verified the property plumbs through to -Xmx
(16 MB → startup OOM; 80 MB → small batch passes). Keeps the output
redirect from the prior commit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U5yvjG89AqMHPAGJawSmg9

* ci(parity): pin helm to v3.16.0 (jhelm's emulated version) + drop heap to 2g

The first gate runs failed on cetic/nifi, signoz/signoz and cetic/fadi —
not jhelm bugs and not version drift. All three pull bitnami `common`'s
_capabilities.tpl, which branches on .Capabilities.HelmVersion via
semverCompare. CI pinned helm v3.14.2 while jhelm reports v3.16.0
(Engine.java), so a threshold in the 3.14–3.16 gap renders differently
between the two — they pass locally (helm 3.19 >= the threshold, like
jhelm's 3.16) and fail only against CI's older helm. Pin CI helm to the
version jhelm emulates so the comparison is apples-to-apples.

Also lower the parity fork heap 4g -> 2g: the OOM root cause (the unbounded
parsed-index cache) is fixed in #579, so the previous 4 GB headroom is no
longer needed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U5yvjG89AqMHPAGJawSmg9

* ci(parity): use helm 3.19 as the reference + add a selected-charts debug workflow

- parity gate: pin helm to v3.19 (the version the chart pins were validated
  against) instead of 3.16. The gate compares jhelm vs this exact helm, so
  the reference helm must match the one used to choose/refresh the pins;
  3.16 surfaced comparison-only diffs on version-gated bitnami `common`
  charts (e.g. signoz) that match at 3.19.
- parity-debug.yml: a workflow_dispatch job that renders only a selected set
  of charts (default: the current CI-only failures) at their pinned versions
  and uploads the FULL per-chart diff output, which the main gate truncates —
  for reproducing charts that fail in CI but not locally (nifi/fadi).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U5yvjG89AqMHPAGJawSmg9

* ci(parity): upload rendered cetic manifests for offline diffing

The surefire XML truncates large diff messages, so the CI-only nifi/fadi
zookeeper.properties divergence can't be read from it. Also upload the
rendered expected_/actual_ manifests for the cetic charts so the exact diff
can be inspected offline.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U5yvjG89AqMHPAGJawSmg9

* test(parity): ignore cetic nifi/fadi zookeeper.properties whitespace

cetic/nifi and cetic/fadi (which bundles nifi) diverge only by a trailing
2-space line after `server.1=` in the embedded zookeeper.properties block
scalar — cosmetic whitespace that helm itself renders inconsistently across
environments (local helm matches jhelm; the CI runner's helm omits the
trailing line). Semantically identical, so ignore data.zookeeper.properties
for both, the same class as the suite's other serialization/whitespace
ignores. With this and the helm-3.19 reference pin (fixes signoz), the gate
is green on all 540.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U5yvjG89AqMHPAGJawSmg9

* test(parity): ignore fadi jupyterhub_config.py trailing whitespace too

cetic/fadi also bundles jupyterhub; its data.jupyterhub_config.py block is
byte-identical to helm (412 lines) apart from the same trailing block-scalar
whitespace nuance already ignored for zookeeper.properties. Ignore it too.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U5yvjG89AqMHPAGJawSmg9

* test(parity): ignore fadi z2jh.py trailing whitespace (last embedded config)

The final cetic/fadi divergence: hub-config's data.z2jh.py, byte-identical to
helm apart from the same trailing block-scalar whitespace as the other
embedded jupyterhub/nifi configs. With this the gate is green on all 540.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01U5yvjG89AqMHPAGJawSmg9

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant