GraphCompose v1.7.1

github-actions released this 09 Jun 20:00

v1.7.1 — 2026-06-09

Open cycle — bug-fix / housekeeping. Entries land here as they merge.

Performance

  • Text wrapping stops re-measuring the growing line prefix. The greedy line
    wrapper in TextFlowSupport now keeps a running line width and measures each
    token once, instead of re-measuring the whole accumulated line on every token.
    This removes O(line-length × tokens) measured-character work — and the
    per-glyph sanitize/encode it triggered — from paragraph layout. Output is
    byte-identical: all layout and visual-regression snapshots pass unchanged.

    The effect is workload-dependent and concentrated in long-text documents;
    measured locally (same-session A/B, full profile) a long multi-page proposal
    rendered markedly faster, and a measurement-count probe showed ~9× fewer
    measured characters on a long paragraph. No public API or behaviour change.
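
The running-width idea can be sketched as follows. This is a minimal illustration, not the TextFlowSupport code: the class name, the `wrap` signature, and the `measure` callback are assumptions, and it assumes line width is additive over tokens and separating spaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch: greedy wrapping with a running line width.
final class RunningWidthWrap {
    static List<String> wrap(List<String> tokens, double maxWidth,
                             ToDoubleFunction<String> measure) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        double lineWidth = 0.0;
        double spaceWidth = measure.applyAsDouble(" "); // measured once per call
        for (String token : tokens) {
            double tokenWidth = measure.applyAsDouble(token); // one measurement per token
            double candidate = line.length() == 0
                    ? tokenWidth
                    : lineWidth + spaceWidth + tokenWidth;
            if (line.length() > 0 && candidate > maxWidth) {
                lines.add(line.toString()); // flush the full line
                line.setLength(0);
                candidate = tokenWidth;     // the token starts the next line
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(token);
            lineWidth = candidate;          // running width, never re-measured
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }
}
```

Each token is measured exactly once, so the measured-character work grows with the text length rather than with line length times token count.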

  • Long-token line breaking is no longer quadratic. TextFlowSupport.fitCharacters
    now binary-searches the break point instead of re-measuring every growing prefix
    one character at a time. For an unbreakable run (long URL/ID, no-space CJK, or a
    very narrow column) this cuts measurement calls and measured characters by
    ~80–85% (probe: 652 → 97 width calls, 36k → 7k measured chars on a 600-char
    token). Output is byte-identical — the fit predicate is monotonic, so the
    search returns the same break index. No public API or behaviour change.
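
A binary search over a monotonic fit predicate might look like the sketch below. The class name, signature, and `measure` callback are assumptions for illustration, and the sketch indexes by `char` and ignores surrogate pairs for brevity.

```java
import java.util.function.ToDoubleFunction;

// Hypothetical sketch: find the longest prefix of a token that fits a width.
final class FitCharactersSketch {
    /** Largest n in [0, token.length()] whose prefix of length n fits in maxWidth. */
    static int fitCharacters(String token, double maxWidth,
                             ToDoubleFunction<String> measure) {
        int lo = 0;              // the empty prefix always fits
        int hi = token.length(); // upper bound on the answer
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // round up so the loop always progresses
            if (measure.applyAsDouble(token.substring(0, mid)) <= maxWidth) {
                lo = mid;        // prefix fits: the answer is at least mid
            } else {
                hi = mid - 1;    // prefix overflows: the answer is below mid
            }
        }
        return lo;
    }
}
```

Because a longer prefix is never narrower than a shorter one, the search lands on the same index a character-by-character scan would find, using about log2(n) measurements.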

  • Text measurement no longer embeds binary fonts into a throwaway document.
    The layout measurement pipeline used to subset-embed every Google/custom font
    family into a private PDDocument that was immediately discarded — repeated on
    every new DocumentSession, because each render in a server opens a fresh
    session. Measurement now resolves binary families to a per-thread cached
    font (mirroring the existing parsed-TrueType cache) bound to a reusable,
    never-saved document, so a family embeds once per worker thread instead of once
    per session, and opening measurement resources owns no PDF document at all.
    Output is byte-identical — both paths read glyph widths and metrics from the
    same parsed TrueTypeFont; proven by a 960-case render-vs-measurement
    width-parity check (max |Δ| = 0.0), a new MeasurementFontParityTest, and the
    full visual-regression / snapshot suite passing unchanged. Only Google/custom-font
    documents are affected; standard-14-only documents never embedded and see no
    change. A measurement probe showed the per-session embed waste drop ~94–97%
    (≈1.5–3 MB and ≈2–4.5 ms of font subsetting removed per session after the first
    on a thread). No public API or behaviour change.
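
The once-per-worker-thread behaviour can be illustrated with a generic per-thread cache. This is only a sketch of the caching pattern: `PerThreadFontCache`, its `loader`, and the family-name key are hypothetical and say nothing about how the engine binds fonts to its reusable document.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: each thread loads a font family at most once.
final class PerThreadFontCache<F> {
    // One private map per thread, so no locking is needed on lookup.
    private final ThreadLocal<Map<String, F>> cache =
            ThreadLocal.withInitial(HashMap::new);
    private final Function<String, F> loader; // stands in for the expensive embed step

    PerThreadFontCache(Function<String, F> loader) {
        this.loader = loader;
    }

    F resolve(String family) {
        // The first call on a thread runs the loader; later calls are map lookups.
        return cache.get().computeIfAbsent(family, loader);
    }
}
```

With a fixed pool of worker threads, the load cost is paid once per thread and family rather than once per session.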

  • Glyph-coverage probing is memoized instead of repeated per glyph. The render
    sanitizer (GlyphFallbackLogger.sanitize — shared by paragraph spans, table
    cells, watermark and header/footer chrome, and by width measurement) used to
    call PDFont.encode for every code point of every string, allocating a
    String per glyph and, for any glyph the font cannot encode, throwing and
    catching an exception, both at measurement and again at render. Coverage is now
    memoized per (font, code point): encode runs once per distinct glyph, then
    it is a map lookup, and kept glyphs append by code point with no per-glyph
    String. Output is byte-identical — the substitution decision is the same
    encode, only cached; the glyph-fallback warning cadence is unchanged (pinned
    by PdfFontSanitizerTest, and width parity by MeasurementFontParityTest).
    This removes real per-glyph work from the render hot path: a long document
    re-probed tens of thousands of glyph occurrences that now collapse to roughly
    the number of distinct characters it uses. No public API or behaviour change.
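
The memoization can be sketched as below, with one memo instance per font. The class name, the `canEncode` predicate (a stand-in for the expensive encode probe), and the replacement parameter are assumptions; the sketch is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical sketch: per-font memo of which code points can be encoded.
final class GlyphCoverageMemo {
    private final IntPredicate canEncode; // stand-in for the costly probe
    private final Map<Integer, Boolean> coverage = new HashMap<>();

    GlyphCoverageMemo(IntPredicate canEncode) {
        this.canEncode = canEncode;
    }

    String sanitize(String text, int replacement) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            // The probe runs once per distinct code point, then it is a lookup.
            boolean ok = coverage.computeIfAbsent(cp, canEncode::test);
            // Append by code point, with no per-glyph String allocation.
            out.appendCodePoint(ok ? cp : replacement);
        });
        return out.toString();
    }
}
```

The substitution decision is still made by the same predicate, so the output matches the unmemoized version; only the number of probe calls changes.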

  • Paragraph render writes font and colour operators only when they change. The
    paragraph render handler emitted a setFont (Tf) and setNonStrokingColor
    (rg) operator for every text span, even across the spans of a single-style
    paragraph. It now tracks the last-written (font, size) and colour across the
    paragraph's graphics-state block and re-emits only on a real change (invalidating
    after inline images/shapes), so a multi-span single-style paragraph carries one
    Tf + one rg instead of one pair per span — fewer operators for PDFBox to
    serialize. Rendered output is unchanged (the skipped operators were
    redundant); pinned by the visual-regression suite plus a content-stream test
    asserting one Tf across many drawn spans. No public API or behaviour change.
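
A state tracker of this kind might look like the following sketch. It records operators as strings instead of writing a real content stream; `TextStateTracker`, `drawSpan`, and `invalidate` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: emit Tf / rg only when font, size, or colour changes.
final class TextStateTracker {
    private String lastFont;   // null means "unknown, must re-emit"
    private float lastSize;
    private String lastColour; // null means "unknown, must re-emit"
    final List<String> ops = new ArrayList<>();

    void drawSpan(String font, float size, String colour, String text) {
        if (!Objects.equals(font, lastFont) || size != lastSize) {
            ops.add("/" + font + " " + size + " Tf");
            lastFont = font;
            lastSize = size;
        }
        if (!Objects.equals(colour, lastColour)) {
            ops.add(colour + " rg");
            lastColour = colour;
        }
        ops.add("(" + text + ") Tj");
    }

    /** Call after anything that may change graphics state, such as an inline image. */
    void invalidate() {
        lastFont = null;
        lastColour = null;
    }
}
```

Three same-style spans produce one Tf and one rg followed by three Tj operators; after `invalidate()` the next span re-emits both.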

  • Table cell text is sanitized once per cell instead of three times. Resolving
    a table ran each cell's lines through sanitizeCellLines separately in the
    natural-width, natural-height and resolve passes, rebuilding the list and its
    per-line control-character cleanup up to three times per cell. The sanitized
    lines are now computed once when the logical grid is built and reused by all
    three passes. Output is byte-identical (sanitization is deterministic); on a
    large table this removes the dominant per-cell layout allocation. No public API
    or behaviour change.

  • Process-wide line-metrics cache stops inserting instead of flushing when full.
    The static line-metrics cache clear()-ed every entry once it passed 50,000
    distinct styles: a full flush whose non-atomic check-then-clear could trigger a
    thundering-herd recompute under concurrent rendering. It now stops inserting at
    the cap and keeps the existing entries (distinct styles are few in real use, so
    this is only a pathological-explosion guard; it runs on a cache miss, never on
    the per-measurement path). Measured line metrics are unchanged. No public API
    or behaviour change.
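
A stop-inserting-at-the-cap cache can be sketched like this. `CappedCache` and its `compute` function are hypothetical; the cap is soft (it may be exceeded slightly under races), and `compute` must not return null because `ConcurrentHashMap` rejects null values.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: a cache that stops growing at a cap instead of flushing.
final class CappedCache<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    private final int cap;
    private final Function<K, V> compute;

    CappedCache(int cap, Function<K, V> compute) {
        this.cap = cap;
        this.compute = compute;
    }

    V get(K key) {
        V cached = map.get(key);
        if (cached != null) {
            return cached;                 // hit path: no size check at all
        }
        V value = compute.apply(key);      // miss path: compute outside the map
        if (map.size() < cap) {            // only insert while below the cap
            V previous = map.putIfAbsent(key, value);
            if (previous != null) {
                return previous;           // another thread won the race
            }
        }
        return value;                      // over the cap: return without caching
    }

    int size() {
        return map.size();
    }
}
```

Existing entries are never evicted, so a burst of new keys cannot force every thread to recompute entries that were already cached.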

  • Auto-size font fitting binary-searches the size grid. A paragraph with
    autoSize(...) resolved its font size by scanning every step from max down to
    min, re-measuring the line at each candidate (up to ~50 measurements). Line width
    is linear in font size, so the fit is monotonic, and the resolver now binary-searches
    the grid for the same boundary in ~log2(n) measurements instead of n. Output is
    byte-identical: it returns the same grid size the linear scan did (covered by
    the existing auto-size integration and snapshot tests). No public API or behaviour
    change.
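
The grid search can be sketched as follows, assuming a grid of max, max - step, and so on down to min, and a monotonic `fits` predicate. The class name, signature, and the fallback to the smallest grid size when nothing fits are assumptions, not the engine's documented behaviour.

```java
import java.util.function.DoublePredicate;

// Hypothetical sketch: binary search for the largest grid size that fits.
final class AutoSizeSketch {
    /** Returns the largest grid size for which fits is true, or the smallest grid size if none fits. */
    static double resolve(double min, double max, double step, DoublePredicate fits) {
        // Index 0 is max; index `last` is the smallest size on the grid.
        int last = (int) Math.floor((max - min) / step + 1e-9);
        int lo = 0;
        int hi = last;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (fits.test(max - mid * step)) {
                hi = mid;      // this size fits: try larger sizes (smaller index)
            } else {
                lo = mid + 1;  // too big: only smaller sizes can fit
            }
        }
        return max - lo * step;
    }
}
```

For a grid of 25 candidate sizes this needs at most five fit checks, compared with up to 25 for the linear scan.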

  • Table pagination stops re-copying the tail on every page split. A table that
    spans many pages is split page-by-page, and each split re-sliced the shrinking
    tail by List.copyOf-ing its row and row-height lists — even though the source
    layout already holds those lists immutably, so the copy made continuation
    O(rows × pages). The body-only slice now reuses the immutable sub-list views
    directly. Output is byte-identical — same rows in the same order (all table
    layout, pagination, and visual-regression tests pass unchanged); a deterministic
    allocation probe on a 2,500-row / 68-page table shows warm compile allocation
    drop 11,155 KB → 9,851 KB (−11.7%). No public API or behaviour change.
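
The difference between copying the tail and viewing it can be shown in a few lines. `TailSliceSketch` is a hypothetical helper; it relies only on `List.subList`, which returns a view backed by the source list, and is safe here because the source list is immutable.

```java
import java.util.List;

// Hypothetical sketch: slice the remaining rows as a view instead of a copy.
final class TailSliceSketch {
    /** Rows remaining after the first `consumed` rows, without copying elements. */
    static <T> List<T> tail(List<T> immutableRows, int consumed) {
        // subList is O(1) to create; wrapping it in List.copyOf would copy every remaining row.
        return immutableRows.subList(consumed, immutableRows.size());
    }
}
```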

Deprecations

  • Font.adjustFontSizeToFit(...) is deprecated. The engine-internal
    Font#adjustFontSizeToFit (and its PdfFont / WordFont implementations) is
    unused and incorrect — the only real implementation re-measured with the
    unchanged style, so it always returned the minimum size. Canonical auto-size is
    resolved by the layout compiler. The method is kept for binary compatibility and
    is scheduled for removal in the next major release.

  • The legacy ECS engine packages are deprecated. com.demcha.compose.engine.core,
    engine.layout (and engine.layout.container), and engine.pagination are the
    original Entity-based layout/pagination engine — a parallel second engine
    whose execution path the canonical pipeline
    (GraphCompose.document() → DocumentSession → LayoutCompiler) never runs; it
    imports nothing from them directly, and the former GraphCompose.pdf(...)
    entry point has already been removed. The ECS execution engine runs only under
    the legacy engine regression tests. The packages are now @Deprecated (package
    level, so no deprecation-warning cascade) with corrected package docs, to stop
    misdirecting contributors into optimizing a
    dead engine. The genuinely shared engine packages (engine.components,
    engine.measurement, engine.font, engine.render) are not deprecated.
    No public API or behaviour change.

  • TextMeasurementSystem decoupled from engine.core.SystemECS. The shared
    text-measurement contract (engine.measurement.TextMeasurementSystem) dropped
    its vestigial extends SystemECS and the no-op process(EntityManager) default
    it carried — it was never consumed as an ECS system. The legacy ECS engine now
    obtains the measurement service via SystemRegistry.registerTextMeasurement(...)
    / textMeasurement() instead of enrolling it as a process()-driven system,
    completing the isolation of the deprecated engine.core from live and shared
    code (only the legacy engine regression tests still reference it). Dropping the
    super-interface is binary-incompatible on paper, so
    engine.measurement.TextMeasurementSystem is excluded from the japicmp gate
    until the baseline advances past this release. No canonical API or behaviour
    change.

  • The legacy ECS PDF render pipeline is deprecated. Follow-up to the ECS
    engine deprecation above. The Entity-based PDFBox renderer
    (PdfRenderingSystemECS and its collaborators — PdfRenderSession, PdfCanvas,
    PdfStream, PdfImageCache, PdfFileManagerSystem, PdfGuidesRenderer, the
    render-marker handlers, and the TableCellBox / PdfBookmarkBuilder helpers) is
    the renderer for the removed GraphCompose.pdf(...) surface and now runs only
    under the legacy engine regression tests; canonical PDF output goes through
    com.demcha.compose.document.backend.fixed.pdf. Because engine.render.pdf is a
    mixed package — it also holds the canonical-shared PdfFont,
    GlyphFallbackLogger, and the header/footer + watermark post-processors — the
    legacy classes were physically moved into a new engine.render.pdf.ecs
    (with .handlers / .helpers sub-packages), which is then @Deprecated at
    package level (so no deprecation-warning cascade, same pattern as the ECS engine
    packages). The four genuinely shared engine.render.pdf types are not
    deprecated and stay put. No behaviour change. The relocated renderer has no
    public entry point and carries no binary-compatibility promise, so the move is
    excluded from the japicmp gate rather than treated as a breaking removal.

Internal

  • Text-measurement line metrics resolve through the Font contract instead of a
    PDF-specific fast path. FontLibraryTextMeasurementSystem previously
    special-cased instanceof PdfFont to obtain real ascent/descent/leading, while every
    other backend font fell back to a degraded lineHeight-only metric. That
    coupled the shared measurement system to engine.render.pdf.PdfFont and meant a
    new backend could get first-class metrics only by editing shared code. Vertical
    metrics and the process-wide cache key now live on the backend-neutral Font<T>
    seam (Font.lineMetrics(...) + Font.measurementCacheKey(...), both default
    methods; new FontLineMetrics record), so a backend supplies first-class metrics
    by overriding the contract and the shared measurement system no longer imports
    PdfFont. Binary-compatible (default methods only; japicmp green) and
    behaviour-neutral — PDF and Word produce identical metrics, covered by the
    existing suite plus new polymorphism tests.
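
The shape of such a contract can be sketched with default methods. The names Font, FontLineMetrics, lineMetrics, and measurementCacheKey come from the notes above, but every signature, the record's fields, and the `FixedMetricsFont` backend are assumptions made for illustration.

```java
// Hypothetical field layout for the metrics record.
record FontLineMetrics(double ascent, double descent, double leading) {
    double lineHeight() {
        return ascent + descent + leading;
    }
}

// Hypothetical shape of a backend-neutral font contract.
interface Font<T> {
    double lineHeight(T style);

    /** Default: degraded metrics derived from the line height only. */
    default FontLineMetrics lineMetrics(T style) {
        return new FontLineMetrics(lineHeight(style), 0.0, 0.0);
    }

    /** Default cache key; a backend may override it to share cache entries. */
    default Object measurementCacheKey(T style) {
        return style;
    }
}

// Hypothetical backend that supplies first-class metrics by overriding the default.
final class FixedMetricsFont implements Font<Double> {
    @Override
    public double lineHeight(Double size) {
        return size * 1.2;
    }

    @Override
    public FontLineMetrics lineMetrics(Double size) {
        return new FontLineMetrics(size * 0.9, size * 0.2, size * 0.1);
    }
}
```

Shared code calls `font.lineMetrics(style)` without any instanceof check, and adding default methods leaves existing implementations binary-compatible.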

Tests / tooling

  • Benchmark regression gate and measurement probe (benchmarks module, not part
    of the published library). BenchmarkVerdictTool compares a current-speed run
    to the committed baseline (baselines/current-speed-full.json) and reports
    improved / neutral / regressed. The hard gate fails only on an average-latency
    regression beyond the noise band; peak heap is advisory (the peakHeapMb
    used-heap delta is noisy with GC timing; use the probe's per-compile allocation
    bytes for deterministic heap). A single run is advisory; the hard gate needs a
    median (-Repeat >= 2). MeasurementCountBenchmark + CountingTextMeasurementSystem
    capture
    deterministic measurement-call counts and per-compile allocation bytes for
    proving algorithmic / allocation changes (the probe warms up the JVM before its
    allocation window, so Alloc KB reflects steady state, not one-time
    class-load / JIT cold-start). scripts/run-benchmarks.ps1 gains the
    11-verdict-current-speed step (skippable via -SkipVerdict).
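
The verdict logic described above can be approximated by the sketch below. It is not BenchmarkVerdictTool: the class, the `verdict` signature, and the `noiseBand` parameter (a fraction such as 0.10) are hypothetical, and the actual noise band is not stated in these notes.

```java
import java.util.Arrays;

// Hypothetical sketch: classify a median latency against a baseline with a noise band.
final class VerdictSketch {
    enum Verdict { IMPROVED, NEUTRAL, REGRESSED }

    static double median(double[] samples) {
        double[] sorted = samples.clone(); // do not reorder the caller's array
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    static Verdict verdict(double baselineMs, double[] currentMs, double noiseBand) {
        double ratio = median(currentMs) / baselineMs;
        if (ratio > 1.0 + noiseBand) {
            return Verdict.REGRESSED; // slower than baseline beyond the noise band
        }
        if (ratio < 1.0 - noiseBand) {
            return Verdict.IMPROVED;  // faster than baseline beyond the noise band
        }
        return Verdict.NEUTRAL;       // within noise
    }
}
```

Using the median of repeated runs keeps a single slow outlier from failing the gate.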

  • Cross-platform A/B benchmark harness. scripts/ab-bench.sh (Linux / macOS /
    Windows Git Bash) joins the PowerShell scripts/ab-bench.ps1 to compare engine
    speed between two branches — interleaved runs, median, per-scenario diff via the
    existing BenchmarkMedianTool / BenchmarkDiffTool. A path-filtered
    ab-bench-smoke CI job runs it on Linux; .gitattributes pins *.sh and mvnw
    to LF so the wrappers stay runnable cross-platform. Benchmark tooling only — not
    part of the published library.