v1.7.1 — 2026-06-09
Open cycle — bug-fix / housekeeping. Entries land here as they merge.
Performance
- Text wrapping stops re-measuring the growing line prefix. The greedy line
wrapper in TextFlowSupport now keeps a running line width and measures each
token once, instead of re-measuring the whole accumulated line on every token.
This removes O(line-length × tokens) measured-character work — and the
per-glyph sanitize/encode it triggered — from paragraph layout. Output is
byte-identical: all layout and visual-regression snapshots pass unchanged.
The effect is workload-dependent and concentrated in long-text documents;
measured locally (same-session A/B, full profile) a long multi-page proposal
rendered markedly faster, and a measurement-count probe showed ~9× fewer
measured characters on a long paragraph. No public API or behaviour change.
- Long-token line breaking is no longer quadratic. TextFlowSupport.fitCharacters
now binary-searches the break point instead of re-measuring every growing prefix
one character at a time. For an unbreakable run (long URL/ID, no-space CJK, or a
very narrow column) this cuts measurement calls and measured characters by
~80–85% (probe: 652 → 97 width calls, 36k → 7k measured chars on a 600-char
token). Output is byte-identical — the fit predicate is monotonic, so the
search returns the same break index. No public API or behaviour change.
- Text measurement no longer embeds binary fonts into a throwaway document.
The layout measurement pipeline used to subset-embed every Google/custom font
family into a private PDDocument that was immediately discarded, repeated on
every new DocumentSession, because each render in a server opens a fresh
session. Measurement now resolves binary families to a per-thread cached
font (mirroring the existing parsed-TrueType cache) bound to a reusable,
never-saved document, so a family embeds once per worker thread instead of once
per session, and opening measurement resources owns no PDF document at all.
Output is byte-identical — both paths read glyph widths and metrics from the
same parsed TrueTypeFont; proven by a 960-case render-vs-measurement
width-parity check (max |Δ| = 0.0), a new MeasurementFontParityTest, and the
full visual-regression / snapshot suite passing unchanged. Only Google/custom-font
documents are affected (the standard-14 path never embedded); a measurement probe
showed the per-session embed waste drop ~94–97% (≈1.5–3 MB and ≈2–4.5 ms of font
subsetting removed per session after the first on a thread). Standard-14-only
documents are unaffected. No public API or behaviour change.
- Glyph-coverage probing is memoized instead of repeated per glyph. The render
sanitizer (GlyphFallbackLogger.sanitize, shared by paragraph spans, table
cells, watermark and header/footer chrome, and by width measurement) used to
call PDFont.encode for every code point of every string, allocating a
String per glyph and, for any glyph the font cannot encode, throwing and
catching an exception, at measurement and again at render. Coverage is now
memoized per (font, code point): encode runs once per distinct glyph, then
it is a map lookup, and kept glyphs append by code point with no per-glyph
String. Output is byte-identical — the substitution decision is the same
encode, only cached; the glyph-fallback warning cadence is unchanged (pinned
by PdfFontSanitizerTest, and width parity by MeasurementFontParityTest).
This removes real per-glyph work from the render hot path: a long document
re-probed tens of thousands of glyph occurrences that now collapse to roughly
the number of distinct characters it uses. No public API or behaviour change.
- Paragraph render writes font and colour operators only when they change. The
paragraph render handler emitted a setFont (Tf) and setNonStrokingColor
(rg) operator for every text span, even across the spans of a single-style
paragraph. It now tracks the last-written (font, size) and colour across the
paragraph's graphics-state block and re-emits only on a real change (invalidating
after inline images/shapes), so a multi-span single-style paragraph carries one
Tf + one rg instead of one pair per span, which leaves fewer operators for PDFBox to
serialize. Rendered output is unchanged (the skipped operators were
redundant); pinned by the visual-regression suite plus a content-stream test
asserting one Tf across many drawn spans. No public API or behaviour change.
- Table cell text is sanitized once per cell instead of three times. Resolving
a table ran each cell's lines through sanitizeCellLines separately in the
natural-width, natural-height and resolve passes, rebuilding the list and its
per-line control-character cleanup up to three times per cell. The sanitized
lines are now computed once when the logical grid is built and reused by all
three passes. Output is byte-identical (sanitization is deterministic); on a
large table this removes the dominant per-cell layout allocation. No public API
or behaviour change.
- Process-wide line-metrics cache stops inserting instead of flushing when full.
The static line-metrics cache clear()-ed every entry once it passed 50,000
distinct styles — a full flush whose non-atomic check-then-clear is a
thundering-herd recompute under concurrent rendering. It now stops inserting at
the cap and keeps the existing entries (distinct styles are few in real use, so
this is only a pathological-explosion guard; it runs on a cache miss, never on
the per-measurement path). Measured line metrics are unchanged. No public API
or behaviour change.
- Auto-size font fitting binary-searches the size grid. A paragraph with
autoSize(...) resolved its font size by scanning every step from max down to
min, re-measuring the line at each candidate (up to ~50 measurements). Line width
is linear in font size, so the fit is monotonic — the search now binary-searches
the grid for the same boundary in ~log2(n) measurements instead of n. Output is
byte-identical — it returns the same grid size the linear scan did (covered by
the existing auto-size integration and snapshot tests). No public API or behaviour
change.
- Table pagination stops re-copying the tail on every page split. A table that
spans many pages is split page-by-page, and each split re-sliced the shrinking
tail by List.copyOf-ing its row and row-height lists, even though the source
layout already holds those lists immutably, so the copy made continuation
O(rows × pages). The body-only slice now reuses the immutable sub-list views
directly. Output is byte-identical — same rows in the same order (all table
layout, pagination, and visual-regression tests pass unchanged); a deterministic
allocation probe on a 2,500-row / 68-page table shows warm compile allocation
drop 11,155 KB → 9,851 KB (−11.7%). No public API or behaviour change.
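
The binary-search entries above (long-token breaking and auto-size fitting) both rest on the same property: when the "does it fit" predicate is monotonic, a binary search finds the same boundary a linear scan would. The following is a minimal sketch of that idea only. `FitSearchSketch`, `fitLinear`, `fitBinary`, and the fixed-width glyph model are hypothetical illustrations, not the code in TextFlowSupport.

```java
import java.util.function.IntPredicate;

/** Hypothetical sketch of a monotonic fit search; not the library's implementation. */
final class FitSearchSketch {

    /** Linear scan: grows the prefix one character at a time until it stops fitting. */
    static int fitLinear(int max, IntPredicate fits) {
        int n = 0;
        while (n < max && fits.test(n + 1)) {
            n++;
        }
        return n;
    }

    /**
     * Binary search for the same boundary. Correct only when fits is monotonic:
     * once a prefix length fails, every longer prefix fails too.
     */
    static int fitBinary(int max, IntPredicate fits) {
        int lo = 0;   // invariant: a prefix of length lo fits (length 0 always does)
        int hi = max; // invariant: the answer is never larger than hi
        while (lo < hi) {
            int mid = lo + (hi - lo + 1) / 2; // upper midpoint so the loop always shrinks
            if (fits.test(mid)) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Assumed model: every glyph is 5.0 units wide and the column is 1234.0 units.
        double glyphWidth = 5.0;
        double columnWidth = 1234.0;
        int tokenLength = 600;

        int[] calls = {0};
        IntPredicate fits = n -> {
            calls[0]++; // count "measurement" calls
            return n * glyphWidth <= columnWidth;
        };

        int linear = fitLinear(tokenLength, fits);
        int linearCalls = calls[0];
        calls[0] = 0;
        int binary = fitBinary(tokenLength, fits);

        System.out.println("linear=" + linear + " (" + linearCalls + " calls), "
                + "binary=" + binary + " (" + calls[0] + " calls)");
    }
}
```

Both methods return the same index for any monotonic predicate, which is why the changelog can claim byte-identical output. The sketch counts predicate calls only; the measured-character savings quoted in the entries also depend on prefix length, which this model does not capture.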
Deprecations
- Font.adjustFontSizeToFit(...) is deprecated. The engine-internal
Font#adjustFontSizeToFit (and its PdfFont / WordFont implementations) is
unused and incorrect — the only real implementation re-measured with the
unchanged style, so it always returned the minimum size. Canonical auto-size is
resolved by the layout compiler. The method is kept for binary compatibility and
scheduled for removal in the next major.
- The legacy ECS engine packages are deprecated. com.demcha.compose.engine.core,
engine.layout (and engine.layout.container), and engine.pagination are the
original Entity-based layout/pagination engine, a parallel second engine
whose execution path the canonical pipeline
(GraphCompose.document() → DocumentSession → LayoutCompiler) never runs; it
imports nothing from them directly, and the former GraphCompose.pdf(...)
entry point has already been removed. The ECS execution engine runs only under
the legacy engine regression tests. The packages are now @Deprecated (package
level, so no deprecation-warning cascade)
with corrected package docs, to stop misdirecting contributors into optimizing a
dead engine. The genuinely shared engine packages (engine.components,
engine.measurement, engine.font, engine.render) are not deprecated.
No public API or behaviour change.
- TextMeasurementSystem decoupled from engine.core.SystemECS. The shared
text-measurement contract (engine.measurement.TextMeasurementSystem) dropped
its vestigial extends SystemECS and the no-op process(EntityManager) default
it carried; it was never consumed as an ECS system. The legacy ECS engine now
obtains the measurement service via SystemRegistry.registerTextMeasurement(...)
/ textMeasurement() instead of enrolling it as a process()-driven system,
completing the isolation of the deprecated engine.core from live and shared
code (only the legacy engine regression tests still reference it). Dropping the
super-interface is binary-incompatible on paper, so
engine.measurement.TextMeasurementSystem is excluded from the japicmp gate
until the baseline advances past this release. No canonical API or behaviour
change.
- The legacy ECS PDF render pipeline is deprecated. Follow-up to the ECS
engine deprecation above. The Entity-based PDFBox renderer
(PdfRenderingSystemECS and its collaborators: PdfRenderSession, PdfCanvas,
PdfStream, PdfImageCache, PdfFileManagerSystem, PdfGuidesRenderer, the
render-marker handlers, and the TableCellBox / PdfBookmarkBuilder helpers) is
the renderer for the removed GraphCompose.pdf(...) surface and now runs only
under the legacy engine regression tests; canonical PDF output goes through
com.demcha.compose.document.backend.fixed.pdf. Because engine.render.pdf is a
mixed package (it also holds the canonical-shared PdfFont,
GlyphFallbackLogger, and the header/footer + watermark post-processors), the
legacy classes were physically moved into a new engine.render.pdf.ecs
(with .handlers / .helpers sub-packages), which is then @Deprecated at
package level (so no deprecation-warning cascade, same pattern as the ECS engine
packages). The four genuinely shared engine.render.pdf types are not
deprecated and stay put. No behaviour change. The relocated renderer has no
public entry point and carries no binary-compatibility promise, so the move is
excluded from the japicmp gate rather than treated as a breaking removal.
Internal
- Text-measurement line metrics resolve through the Font contract instead of a
PDF-specific fast path. FontLibraryTextMeasurementSystem previously
special-cased instanceof PdfFont to obtain real ascent/descent/leading (every
other backend font fell back to a degraded lineHeight-only metric), which
coupled the shared measurement system to engine.render.pdf.PdfFont and meant a
new backend could get first-class metrics only by editing shared code. Vertical
metrics and the process-wide cache key now live on the backend-neutral Font<T>
seam (Font.lineMetrics(...) + Font.measurementCacheKey(...), both default
methods; new FontLineMetrics record), so a backend supplies first-class metrics
by overriding the contract and the shared measurement system no longer imports
PdfFont. Binary-compatible (default methods only; japicmp green) and
behaviour-neutral — PDF and Word produce identical metrics, covered by the
existing suite plus new polymorphism tests.
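
The shape of that refactor can be sketched as follows: an interface gains a default method, so existing implementations keep compiling and linking, while a backend with real font data overrides it and the shared code drops its instanceof check. All names and signatures here (`FontContractSketch`, `LineMetrics`, `BasicFont`, `RichFont`, the metric ratios) are hypothetical stand-ins; the changelog does not show the real Font<T> or FontLineMetrics signatures, and the fallback shown for the default method is an assumption.

```java
/** Hypothetical sketch of a default-method seam; not the library's Font<T> contract. */
final class FontContractSketch {

    /** Backend-neutral vertical metrics (stand-in for the FontLineMetrics record). */
    record LineMetrics(double ascent, double descent, double leading) {
        double lineHeight() {
            return ascent + descent + leading;
        }
    }

    interface Font {
        double lineHeight(double size);

        /**
         * Default method: implementations written before this method existed
         * still work. Assumed fallback: line-height-only metrics.
         */
        default LineMetrics lineMetrics(double size) {
            return new LineMetrics(lineHeight(size), 0.0, 0.0);
        }
    }

    /** A backend that only knows its line height inherits the default. */
    static final class BasicFont implements Font {
        @Override
        public double lineHeight(double size) {
            return size * 1.25;
        }
    }

    /** A backend with real font data overrides the contract method. */
    static final class RichFont implements Font {
        @Override
        public double lineHeight(double size) {
            return size * 1.25;
        }

        @Override
        public LineMetrics lineMetrics(double size) {
            // Assumed ratios for illustration only.
            return new LineMetrics(size * 0.75, size * 0.25, size * 0.25);
        }
    }

    /** Shared measurement code depends on the interface only: no instanceof. */
    static LineMetrics measure(Font font, double size) {
        return font.lineMetrics(size);
    }

    public static void main(String[] args) {
        System.out.println(measure(new BasicFont(), 8.0));
        System.out.println(measure(new RichFont(), 8.0));
    }
}
```

Adding a default method to an interface is a binary-compatible change for existing implementors, which is consistent with the japicmp result reported in the entry.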
Tests / tooling
- Benchmark regression gate and measurement probe (benchmarks module, not part
of the published library). BenchmarkVerdictTool compares a current-speed run
to the committed baseline (baselines/current-speed-full.json) and reports
improved / neutral / regressed. The hard gate fails only on an average-latency
regression beyond the noise band; peak heap is advisory (the peakHeapMb
used-heap delta is noisy with GC timing; use the probe's per-compile allocation
bytes for deterministic heap). A single run is advisory; the hard gate needs a
median (-Repeat >= 2).
MeasurementCountBenchmark + CountingTextMeasurementSystem capture
deterministic measurement-call counts and per-compile allocation bytes for
proving algorithmic / allocation changes (the probe warms up the JVM before its
allocation window, so Alloc KB reflects steady state, not one-time
class-load / JIT cold-start). scripts/run-benchmarks.ps1 gains the
11-verdict-current-speed step (skippable via -SkipVerdict).
- Cross-platform A/B benchmark harness. scripts/ab-bench.sh (Linux / macOS /
Windows Git Bash) joins the PowerShell scripts/ab-bench.ps1 to compare engine
speed between two branches: interleaved runs, median, per-scenario diff via the
existing BenchmarkMedianTool / BenchmarkDiffTool. A path-filtered
ab-bench-smoke CI job runs it on Linux; .gitattributes pins *.sh and mvnw
to LF so the wrappers stay runnable cross-platform. Benchmark tooling only, not
part of the published library.
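
As a rough illustration of the gate logic described above (median of repeated runs compared to a baseline, with a noise band deciding improved / neutral / regressed), here is a minimal sketch. `VerdictSketch`, its method names, and the 5% band in the usage are assumptions for illustration; the real BenchmarkVerdictTool inputs and thresholds are not stated in this changelog.

```java
import java.util.Arrays;

/** Hypothetical sketch of a noise-band verdict; not the BenchmarkVerdictTool code. */
final class VerdictSketch {

    enum Verdict { IMPROVED, NEUTRAL, REGRESSED }

    /** Median of the samples; the input array is not modified. */
    static double median(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /**
     * Compares the median latency of the current runs to the baseline.
     * noiseBand is a relative tolerance, e.g. 0.05 for 5% (an assumed value).
     */
    static Verdict verdict(double baselineMs, double[] runsMs, double noiseBand) {
        double change = (median(runsMs) - baselineMs) / baselineMs;
        if (change > noiseBand) {
            return Verdict.REGRESSED;
        }
        if (change < -noiseBand) {
            return Verdict.IMPROVED;
        }
        return Verdict.NEUTRAL;
    }

    public static void main(String[] args) {
        double baselineMs = 100.0;
        // One noisy outlier (120) does not move the median past the band.
        System.out.println(verdict(baselineMs, new double[] {104.0, 120.0, 102.0}, 0.05));
        System.out.println(verdict(baselineMs, new double[] {110.0, 112.0}, 0.05));
        System.out.println(verdict(baselineMs, new double[] {80.0, 90.0, 85.0}, 0.05));
    }
}
```

Using the median rather than a single run is what makes the verdict robust to one slow outlier, which matches the note that a single run is advisory only.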