v1.2.0 — addSignaturePlaceholder, ASN.1 fix, page-by-page streaming, UAX #9 embeddings, USE-lite classifier, smart tables

@Nizoka Nizoka released this 27 May 21:04
· 2 commits to main since this release
5312a53

Closes issues #45 (addSignaturePlaceholder() API) and #46 (X.509 issuer/subject DN slice corruption), ships object-boundary page-by-page streaming, completes UAX #9 with embedding controls (LRE/RLE/LRO/RLO/PDF), and lands a USE-lite cluster classifier for future Indic shaper rewires. 100% backward-compatible. Every new feature is additive or opt-in. Pre-existing PDFs are byte-identical for unchanged code paths.

Two roadmap items intentionally slip to v1.3.0: COLRv1 colour emoji (renderer needs PDF shading-dictionary polish) and pixel-diff visual regression (PNG-baseline tooling). Monochrome emoji from v1.1.0 is unchanged. The USE-lite classifier ships as a public API in v1.2.0; rewiring the Devanagari/Bengali/Tamil shapers to consume it is the v1.3.0 follow-up.

Highlights

  • feat(crypto, #45): new addSignaturePlaceholder(pdfBytes, options?) API — inject an AcroForm + invisible signature widget plus a /Sig dictionary into any existing PDF via an incremental update (ISO 32000-1 §7.5.6). Idempotent: returns the input unchanged when an /FT /Sig widget already exists. Enables the one-call signPdfBytes(addSignaturePlaceholder(buildDocumentPDFBytes(...))) ergonomic that downstream tooling (pdfnative-cli) previously shipped as a local workaround.
  • fix(crypto, #46): parseCertificate() issuer and subject raw slices now correctly begin with the ASN.1 SEQUENCE tag 0x30. ASN.1 decodeAt() only patched direct-child offsets, so grandchildren carried offsets relative to their parent's value buffer rather than the original DER — producing malformed slices that broke CMS IssuerAndSerialNumber parsing in Adobe Reader and openssl-cms. decodeAt() now walks descendants recursively to absolutise every offset; a defensive raw[0] === 0x30 assertion lives at the parseName() boundary.
  • feat(core): buildDocumentPDFStreamPageByPage() and buildPDFStreamPageByPage() — emit an existing PDF binary as an AsyncGenerator<Uint8Array> chunked at PDF object boundaries (\nendobj\n). Useful for streaming the assembled PDF over HTTP / Node WriteStream without holding the full body in memory beyond a single chunk. Internal page-by-page assembly (one page object at a time before the final binary exists) remains a v1.3 target — flagged in the JSDoc.
  • feat(shaping): UAX #9 explicit embeddings — normalizeBidiEmbeddings() rewrites LRE / RLE / LRO / RLO / PDF (U+202A–U+202E) to their sealed-isolate equivalents (LRI / RLI / PDI) using a stack with max depth 125 before the BiDi resolver runs. resolveBidiRuns() invokes the normaliser internally, so existing callers gain support transparently. Combined with the v1.1.0 isolates work, pdfnative now handles every UAX #9 directional control in common use. Character-level direction overrides inside LRO/RLO scopes (UAX #9 X4–X5) are simplified — only the base direction is normalised; full override tracking is deferred until users demand it.
  • feat(shaping): USE-lite cluster classifier in src/shaping/use-lite.ts: classifyUseCategory(cp) + classifyClusters(cps) return per-cluster { base, reph, prebase, postbase, premarks, postmarks } with per-script tables for Devanagari, Bengali, and Tamil. Public API ready to ship; consumed by the v1.3.0 shaper rewire.
  • feat(core): Smart tables — planner-driven table rendering with automatic wrap-on-overflow, multi-page slicing with repeated headers, optional zebra striping, captions, and configurable minimum row height / cell padding. Six new optional TableBlock fields ship: wrap ('auto' | 'always' | 'never', default 'auto'), repeatHeader (default true), zebra, caption, minRowHeight, cellPadding. Existing tables that fit on one page are byte-identical to v1.1.0 output. Tagged-mode (/Table, /TR, /TH, /TD, /Caption) is preserved across slices via a shared structure-tree accumulator (ISO 14289-1 §7.10.6). See docs/guides/tables.md.
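
The ASN.1 offset fix (#46) above can be illustrated with a small, self-contained sketch. It is not the library's decodeAt(): the DerNode shape and the decodeDer, shiftOffsets, and rawSlice names are hypothetical, and only definite-length DER is handled. The point it shows is that when children are decoded from a parent's value buffer, the whole subtree (not just the direct child) must be shifted to become absolute against the original DER.

```typescript
// Hypothetical node shape for illustration; the library's real type may differ.
interface DerNode {
    tag: number;
    offset: number;       // start of the TLV, absolute in the original buffer once decoded
    headerLength: number; // bytes used by tag + length octets
    length: number;       // bytes in the value
    children: DerNode[];
}

// Read a single-byte tag and a definite length (short or long form).
function readHeader(buf: Uint8Array, pos: number): { tag: number; headerLength: number; length: number } {
    const tag = buf[pos];
    const first = buf[pos + 1];
    if (first < 0x80) return { tag, headerLength: 2, length: first };
    const n = first & 0x7f; // long form: the next n bytes hold the length
    let length = 0;
    for (let i = 0; i < n; i++) length = length * 256 + buf[pos + 2 + i];
    return { tag, headerLength: 2 + n, length };
}

// Shift a node and every descendant by delta.
function shiftOffsets(node: DerNode, delta: number): void {
    node.offset += delta;
    for (const child of node.children) shiftOffsets(child, delta);
}

function decodeDer(buf: Uint8Array, pos = 0): DerNode {
    const { tag, headerLength, length } = readHeader(buf, pos);
    const node: DerNode = { tag, offset: pos, headerLength, length, children: [] };
    if ((tag & 0x20) !== 0) { // constructed: SEQUENCE, SET, ...
        const valueStart = pos + headerLength;
        const value = buf.subarray(valueStart, valueStart + length);
        let p = 0;
        while (p < value.length) {
            const child = decodeDer(value, p); // subtree offsets are relative to `value`
            // Shifting only child.offset here reproduces the bug class described above:
            // grandchildren would stay relative to the parent's value buffer.
            shiftOffsets(child, valueStart);
            node.children.push(child);
            p += child.headerLength + child.length;
        }
    }
    return node;
}

// Raw TLV slice of a node, taken from the original buffer.
function rawSlice(buf: Uint8Array, node: DerNode): Uint8Array {
    return buf.subarray(node.offset, node.offset + node.headerLength + node.length);
}
```

With absolute offsets, a slice taken for a nested SEQUENCE starts with its own tag byte, which is the property the raw[0] === 0x30 assertion guards.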

Fixed

  • fix(core, tables): renderTable no longer hardcodes the 4th column (i === 3) as the Amount column with Helvetica-Bold + credit/debit colouring. The styling is now driven by the explicit, opt-in ColumnDef.kind === 'amount' field. Combined with the wrap-aware truncate (next bullet), this resolves the table-smart-autofit.pdf clipping where the Notes column was unintentionally rendered bold and the auto-fit planner — measuring with regular metrics — sized the column too narrowly. The legacy buildPDF() financial path keeps the historical i === 3 heuristic for byte-identical v1.0/v1.1 output. (src/core/pdf-renderers.ts, src/types/pdf-types.ts)
  • fix(core, tables): emitCell only applies the v1.1 character-truncate (mx / mxH) when wrap: 'never'. Under wrap: 'auto' (the v1.2.0 default) and wrap: 'always' the planner has already sized the column to fit the text; the redundant char-truncate previously truncated text that genuinely fits, producing spurious ellipses in auto-fitted tables. (src/core/pdf-renderers.ts)
  • fix(crypto, #46): ASN.1 decodeAt() now recursively rewrites every descendant node's offset to be absolute against the original DER buffer. Previously, only direct children were patched, so parseName()'s fullDer.subarray(node.offset, …) returned a slice off by exactly the offset of the parent's value field. CMS signatures using these slices in IssuerAndSerialNumber now validate in Adobe Reader, openssl-cms, and pdfnative's own verify path. Defensive raw[0] === 0x30 assertion added at the parseName() boundary to catch any future regression. (src/crypto/asn1.ts, src/crypto/x509.ts)
  • fix(shaping): invisible Unicode bidirectional formatting characters (LRM/RLM U+200E/F, LRE/RLE/PDF/LRO/RLO U+202A–E, LRI/RLI/FSI/PDI U+2066–9) are now stripped at the encoder boundary. The BiDi resolver consumed them when it ran, but it only runs on RTL paragraphs, so pure-LTR text containing an orphan PDF or isolate marker would otherwise reach the cmap as .notdef and render as tofu. New public stripBidiControls(text) helper exported from the root; applied transparently in pdfString(), helveticaWidth(), and the Unicode encoding context's textRuns() / ps(). Zero behaviour change on text without control characters. (src/shaping/bidi.ts, src/fonts/encoding.ts, src/core/encoding-context.ts)
  • fix(fonts, tables): right- and centre-aligned bold text, meaning table headers (Helvetica-Bold via enc.f2) and table captions, is now measured with Adobe Helvetica-Bold AFM advance widths instead of Helvetica-Regular. Pre-1.2.0, the renderer measured "Amount" at ~25.44pt (Regular) but the glyphs actually rendered ~30.22pt wide (Bold) at 8pt, so the trailing glyph overshot the column boundary by ~2pt and the t was clipped or overhung into the neighbour column. Fix: new helveticaBoldWidth(str, sz) public function in src/fonts/encoding.ts and an opt-in bold flag on txtR/txtC/txtRTagged/txtCTagged in src/core/pdf-text.ts. Wired through smart-table headers (src/core/pdf-renderers.ts), legacy buildPDF() headers (src/core/pdf-builder.ts), and autoFitColumns header measurement (src/core/pdf-column-fit.ts). Visual: the t of Amount now sits comfortably inside the column on every table sample. Unicode/CIDFont mode uses per-font metrics and is unaffected. (src/fonts/encoding.ts, src/core/pdf-text.ts, src/core/pdf-renderers.ts, src/core/pdf-builder.ts, src/core/pdf-column-fit.ts)
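
For readers who want to see what stripping at the encoder boundary amounts to, here is a minimal sketch. It is an illustration only and not the library's stripBidiControls(): the function name stripBidiControlsSketch is hypothetical, and the character set is taken directly from the ranges listed in the fix above.

```typescript
// Code points listed in the release note: LRM/RLM (U+200E, U+200F),
// LRE/RLE/PDF/LRO/RLO (U+202A to U+202E), LRI/RLI/FSI/PDI (U+2066 to U+2069).
const BIDI_CONTROLS = /[\u200E\u200F\u202A-\u202E\u2066-\u2069]/g;

// Hypothetical stand-in: remove invisible directional controls before
// the text is measured or encoded, so they never reach the font's cmap.
function stripBidiControlsSketch(text: string): string {
    return text.replace(BIDI_CONTROLS, '');
}
```

Text without any of these code points passes through unchanged, which matches the "zero behaviour change" claim for ordinary input.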

Added

  • feat(types, tables): new optional ColumnDef.kind?: 'amount' field. Opt-in replacement for the pre-1.2.0 hardcoded i === 3 heuristic in renderTable — when set, data cells in the column render in Helvetica-Bold with credit/debit colouring driven by row.type. Reserved enum (further kind values may be added in future minor releases). (src/types/pdf-types.ts)
  • feat(core, mcp): new PDF_A_CONFORMANCE_TARGETS = ['pdfa1b', 'pdfa2b', 'pdfa2u', 'pdfa3b'] as const and PdfAConformanceTarget type exported from the root. Single source of truth for tooling — the pdfnative-mcp server's add_table / generate_basic_pdf tool schemas can import { PDF_A_CONFORMANCE_TARGETS } from 'pdfnative' and feed the array straight into their JSON-schema enum: field instead of hardcoding string literals. Materially improves how Gemini-CLI and other LLM agents discover the legal pdfA values. (src/core/pdf-tags.ts)
  • feat(crypto, #45): addSignaturePlaceholder(pdfBytes, options?) exported from the root. Options: placeholderBytes (default 16 384), fieldName (default 'Signature1'), pageIndex (default 0), signingTime / name / reason / location / contactInfo (forwarded to the /Sig dictionary). Throws on encrypted input. Idempotent on already-signed PDFs (verified by a dedicated test case + sample generator). (src/core/pdf-sig-placeholder.ts)
  • refactor(crypto): new SigDictMetadata interface in src/core/pdf-signature.ts — the metadata-only subset of PdfSignOptions (name, reason, location, contactInfo, signingTime) reused by both buildSigDict() and addSignaturePlaceholder(). PdfSignOptions now extends SigDictMetadata.
  • refactor(parser): src/parser/pdf-modifier.ts gains addRawObject(body) plus an internal rawBodies: Map<number, string> so placeholder-style raw object payloads (containing /Contents <00…00>) round-trip through the incremental-save path without re-serialisation that would corrupt the hex placeholder.
  • feat(core): buildDocumentPDFStreamPageByPage() and buildPDFStreamPageByPage() exported from the root. Both return AsyncGenerator<Uint8Array> chunked at PDF object boundaries (\nendobj\n). Honour a chunkSize option for further sub-chunking; default is 65 536 bytes. (src/core/pdf-stream-writer.ts)
  • feat(shaping): normalizeBidiEmbeddings(text) in src/shaping/bidi.ts — exported alongside resolveBidiRuns(). Standalone for callers that want to pre-normalise text before their own BiDi pipeline.
  • feat(shaping): USE-lite classifier — UseCategory, UseClassifiedCp, UseCluster, classifyUseCategory(cp), classifyClusters(cps) exported from the root. (src/shaping/use-lite.ts)
  • scripts(samples): two new sample generators wired into npm run test:generate:
    • scripts/generators/signature-placeholder.ts — produces test-output/signature/signature-placeholder-unsigned.pdf and signature-placeholder-idempotent.pdf (the latter byte-equal to the former, proving the no-op contract).
    • scripts/generators/bidi-embeddings-showcase.ts — produces test-output/bidi/bidi-embeddings-showcase.pdf exercising LRE / RLE / LRO / RLO / PDF in Hebrew/English mixed paragraphs.
  • feat(core, tables): six new optional TableBlock fields, all @since 1.2.0, fully backward-compatible:
    • wrap?: 'auto' | 'always' | 'never': 'auto' (default) keeps single-line rows when content fits the column and wraps only on overflow; 'always' wraps every cell; 'never' clips like v1.1.0.
    • repeatHeader?: boolean — when true (default), the header row reprints at the top of every continuation page so the reader never loses context.
    • zebra?: boolean | PdfColor — alternating data-row fill. true uses the v1.2.0 default '0.969 0.973 0.984'; any PdfColor (hex, tuple, or PDF rgb string) overrides.
    • caption?: string — caption printed once above the first slice of the table; tagged-mode emits a /Caption structure element as a child of /Table (ISO 14289-1 §7.10.6).
    • minRowHeight?: number — minimum visual height per row in points (default 12).
    • cellPadding?: number — internal cell padding in points (default 3).
  • feat(core, tables): new internal planTable(table, x, y, width, ctx, … ) measurement function and internal TableSlice type in src/core/pdf-renderers.ts. The planner runs once per table; _paginateBlocks() slices the result at row boundaries before any drawing happens. This separation keeps renderTable() page-lifecycle-free and lets the document paginator make multi-page decisions deterministically. Not re-exported from the package root — see docs/guides/tables.md for the internal contract. (src/core/pdf-document.ts, src/core/pdf-renderers.ts)
  • scripts(samples): new scripts/generators/document-table-parity.ts — four samples covering the new table features:
    • test-output/document/table-wrap-auto.pdf: the wrap: 'auto' mode with mixed short/long cells.
    • test-output/document/table-multipage-header-repeat.pdf — 60-row table with header reprinted on each continuation page.
    • test-output/document/table-zebra-caption.pdf — zebra striping + caption + minRowHeight.
    • test-output/document/table-smart-autofit.pdf: autoFit: true columns combined with wrap: 'auto'.
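
The plan-then-slice separation described for planTable() and _paginateBlocks() can be sketched in isolation. The code below is a simplified illustration under stated assumptions, not the internal implementation: RowSlice and sliceRowsSketch are hypothetical names, row heights are assumed to be already measured, and only row-boundary slicing with an optional repeated header is modelled.

```typescript
// Hypothetical slice descriptor: rows [rowStart, rowEnd) drawn on one page.
interface RowSlice {
    rowStart: number;
    rowEnd: number;
    withHeader: boolean;
}

// Split pre-measured rows into page-sized slices at row boundaries.
// firstPageSpace: vertical space left on the page where the table starts.
// pageSpace: vertical space available on each fresh continuation page.
function sliceRowsSketch(
    rowHeights: number[],
    headerHeight: number,
    firstPageSpace: number,
    pageSpace: number,
    repeatHeader = true,
): RowSlice[] {
    const slices: RowSlice[] = [];
    let start = 0;
    let space = firstPageSpace;
    let fresh = firstPageSpace >= pageSpace; // is the current page empty?
    let first = true;                        // the first slice always carries the header
    while (start < rowHeights.length) {
        const withHeader = first || repeatHeader;
        let remaining = space - (withHeader ? headerHeight : 0);
        let end = start;
        while (end < rowHeights.length && rowHeights[end] <= remaining) {
            remaining -= rowHeights[end];
            end++;
        }
        if (end === start) {
            if (!fresh) {
                // Nothing fits in the leftover space: retry on a fresh page.
                space = pageSpace;
                fresh = true;
                continue;
            }
            end = start + 1; // a row taller than a page is placed alone to avoid looping forever
        }
        slices.push({ rowStart: start, rowEnd: end, withHeader });
        start = end;
        first = false;
        space = pageSpace;
        fresh = true;
    }
    return slices;
}
```

Because slicing happens on measured heights before any drawing, the same plan always yields the same page breaks, which is the determinism the bullet above refers to.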

Changed

  • feat(fonts): helveticaBoldWidth(str, sz) exported from the root (also re-exported from pdfnative/fonts). Mirrors the existing helveticaWidth but uses Adobe Helvetica-Bold AFM advance widths. Strips invisible BiDi controls before measuring (zero-width per UAX #9). Drives the bold-header positioning fix described above. (src/fonts/encoding.ts)
  • feat(core): txtR, txtC, txtRTagged, txtCTagged in src/core/pdf-text.ts gain an optional trailing bold: boolean = false parameter that switches Latin-mode width measurement to helveticaBoldWidth. Backward-compatible default.
  • chore(types): SigDictMetadata interface now re-exported from the package root — the v1.2.0 release notes already documented it as a stable public type; this aligns the runtime surface.
  • chore(meta): version bumped to 1.2.0. Still zero runtime dependencies.
  • test: 53 test files / 1822 tests, all green. New coverage: 13 cases for addSignaturePlaceholder, 8 for page-by-page streaming, 13 for normalizeBidiEmbeddings, 23 for the USE-lite classifier, 6 for stripBidiControls, 14 for smart tables (7 planner unit tests + 7 end-to-end including byte-stability, header repetition, zebra, caption, tagged mode, wrap modes), 8 for helveticaBoldWidth, 2 for bold-header positioning (regression guard against pre-1.2.0 column overflow), 3 for ColumnDef.kind === 'amount' opt-in and wrap-aware truncate (the v1.2.0 polish fix).
  • feat(core, tables): wrap defaults to 'auto' (was effectively 'never' / clip in v1.1.0) and repeatHeader defaults to true. Single-page tables that fit without wrapping remain byte-identical to v1.1.0 for their body rendering; right- and centre-aligned header cells shift by 2–5pt vs v1.1.0 because the bold-width fix corrects the historical positioning bug — a genuine glyph-placement improvement, not a regression. To opt back into the v1.1.0 single-pass body behaviour, set repeatHeader: false and wrap: 'never' (the header positioning fix is unconditional and not opt-out).
  • scripts(samples): emoji-basic.pdf and emoji-table.pdf now register 'latin' alongside 'emoji' so ASCII codepoints (digits in the Duration column, punctuation between emoji on long lines) route to Noto Sans VF with proportional advance widths instead of Noto Emoji's em-wide glyphs. Visual regressions reported on the v1.2.0 preview builds (Duration column rendering as "1 s2", right-margin overflow on the Transport row) now resolved. Signature samples (digital-signature.*, signature-placeholder-*) gain inline clarifier paragraphs explaining the expected Adobe Reader validator output for self-signed certificates and unsigned placeholders.
  • scripts(samples): bidi-embeddings-showcase.pdf — restored a missing space in the orphan-PDF demo paragraph (was "textwith", now "text with"). Cosmetic fix; no behavioural change.

Deferred to v1.3.0

  • COLRv1 colour emoji. Extractor for COLR/CPAL is staged in tools/build-font-data.cjs but the PDF renderer (axial shading dictionaries + PaintComposite/PaintMask) deserves a dedicated polish pass. Monochrome emoji via Noto Emoji from v1.1.0 is unchanged.
  • USE-lite shaper rewire. The classifier ships as a public API in v1.2.0; the Devanagari/Bengali/Tamil shapers continue to use their v1.1.0 ad-hoc cluster logic for now. v1.3.0 will rewire them to consume classifyClusters() and fix the remaining nukta+virama, half-form, eyelash-ra, and ya-phalaa edge cases.
  • Internal page-by-page assembly. The current buildDocumentPDFStreamPageByPage() chunks an already-assembled PDF at object boundaries. True one-page-at-a-time assembly (where the full binary never exists in memory) requires factoring the 1000-line buildDocumentPDF() body around a page generator — a risky refactor we declined to ship in v1.2.0 in favour of correctness on the issues above.
  • Pixel-diff visual regression on the test-output/extreme/ baselines. Tooling (zero-dep PNG decoder, baseline PNGs, CI workflow) deferred.
  • Universal Shaping Engine (full). v1.2.0 ships USE-lite — a pragmatic subset covering documented Bengali/Devanagari/Tamil edge cases. Full USE (Khmer, Myanmar, complex Sinhala) tracked for v1.3+.
  • WASM acceleration of font subsetting and compression.
  • UAX #9 character-level overrides inside LRO/RLO scopes (X4–X5). v1.2.0 normalises base direction only — sufficient for the embeddings use cases reported in the wild; full override tracking gated on demand.

Upgrade

npm install pdfnative@1.2.0

New one-call sign workflow:

import {
    buildDocumentPDFBytes,
    addSignaturePlaceholder,
    signPdfBytes,
} from 'pdfnative';

const unsigned = buildDocumentPDFBytes(params);
const placeheld = addSignaturePlaceholder(unsigned, { fieldName: 'Author' });
const signed = signPdfBytes(placeheld, { signerCert, rsaKey, algorithm: 'rsa-sha256' });

Object-boundary page-by-page streaming:

import { buildDocumentPDFBytes, buildPDFStreamPageByPage } from 'pdfnative';
import { createWriteStream } from 'node:fs';

const bytes = buildDocumentPDFBytes(params);
const out = createWriteStream('huge-report.pdf');
for await (const chunk of buildPDFStreamPageByPage(bytes)) out.write(chunk);
out.end();
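
To make "chunked at PDF object boundaries" concrete, the following self-contained sketch splits an already-assembled byte array after each \nendobj\n marker. It is an illustration of the technique only, not the code in src/core/pdf-stream-writer.ts: splitAtObjectBoundaries and streamAtObjectBoundaries are hypothetical names, and the chunkSize sub-chunking option is not modelled.

```typescript
// The bytes of "\nendobj\n".
const ENDOBJ_MARKER = new Uint8Array([0x0a, 0x65, 0x6e, 0x64, 0x6f, 0x62, 0x6a, 0x0a]);

// Yield views into `pdf`, each ending right after an object boundary.
// The final chunk holds whatever follows the last object (xref, trailer, %%EOF).
function* splitAtObjectBoundaries(pdf: Uint8Array): Generator<Uint8Array> {
    let start = 0;
    for (let i = 0; i + ENDOBJ_MARKER.length <= pdf.length; i++) {
        let match = true;
        for (let j = 0; j < ENDOBJ_MARKER.length; j++) {
            if (pdf[i + j] !== ENDOBJ_MARKER[j]) {
                match = false;
                break;
            }
        }
        if (match) {
            const end = i + ENDOBJ_MARKER.length;
            yield pdf.subarray(start, end); // a view, not a copy
            start = end;
            i = end - 1; // resume scanning after the marker
        }
    }
    if (start < pdf.length) yield pdf.subarray(start);
}

// Async wrapper with the same chunking, shaped like an AsyncGenerator<Uint8Array>.
async function* streamAtObjectBoundaries(pdf: Uint8Array): AsyncGenerator<Uint8Array> {
    for (const chunk of splitAtObjectBoundaries(pdf)) yield chunk;
}
```

Since the chunks are views over the input, concatenating them reproduces the original bytes exactly; the full binary still exists in memory, which is the limitation the notes flag as a v1.3 target.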

UAX #9 embeddings (LRE/RLE/LRO/RLO/PDF) now Just Work:

const para = `English text \u202B${'Hebrew text'}\u202C continues in English.`;
// resolveBidiRuns(para) sees RLI/PDI internally — same visual output as the isolate form.
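
The rewrite that turns RLE ... PDF into RLI ... PDI can be sketched as follows. This is a simplified illustration, not the library's normalizeBidiEmbeddings(): the name normalizeEmbeddingsSketch is hypothetical, counters stand in for the stack because every scope closes with the same PDI, existing isolate characters pass through untouched, and dropping an orphan PDF is an assumption of this sketch. As in the release note, LRO/RLO only contribute their base direction.

```typescript
const LRE = '\u202A', RLE = '\u202B', PDF = '\u202C', LRO = '\u202D', RLO = '\u202E';
const LRI = '\u2066', RLI = '\u2067', PDI = '\u2069';
const MAX_DEPTH = 125; // maximum embedding depth defined by UAX #9

function normalizeEmbeddingsSketch(text: string): string {
    let out = '';
    let depth = 0;    // scopes that were rewritten to isolates and are still open
    let overflow = 0; // initiators dropped because MAX_DEPTH was reached
    for (let i = 0; i < text.length; i++) {
        const ch = text.charAt(i);
        if (ch === LRE || ch === LRO || ch === RLE || ch === RLO) {
            if (depth >= MAX_DEPTH) {
                overflow++; // ignore the initiator, remember to ignore its PDF
                continue;
            }
            depth++;
            out += ch === LRE || ch === LRO ? LRI : RLI;
        } else if (ch === PDF) {
            if (overflow > 0) {
                overflow--;
            } else if (depth > 0) {
                depth--;
                out += PDI;
            }
            // Otherwise this is an orphan PDF with no open scope: dropped (assumption).
        } else {
            out += ch;
        }
    }
    return out;
}
```

Applied to the paragraph above, the U+202B / U+202C pair becomes U+2067 / U+2069, which is the isolate form the comment refers to.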

Smart tables — wrap, repeated headers, zebra, caption:

import { buildDocumentPDFBytes } from 'pdfnative';

const bytes = buildDocumentPDFBytes({
    blocks: [
        {
            type: 'table',
            columns: [
                { key: 'item', label: 'Item', width: 0.6, autoFit: true },
                { key: 'qty', label: 'Qty', width: 0.2, align: 'right' },
                { key: 'price', label: 'Price', width: 0.2, align: 'right' },
            ],
            rows: bigInvoiceRows, // any length — slices across pages automatically
            wrap: 'auto',         // single-line when it fits, wraps on overflow
            repeatHeader: true,   // header reprints on every continuation page
            zebra: true,          // alternating row fill
            caption: 'Invoice line items',
            minRowHeight: 14,
            cellPadding: 5,
        },
    ],
});

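No code changes are required for existing users: every API from v1.1.0 still works, and output is byte-identical for unchanged code paths. The exceptions are the documented table changes (the new wrap / repeatHeader defaults, the corrected bold-header positioning, and ColumnDef.kind replacing the implicit i === 3 heuristic in the document builder).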

Downstream integration notes

This section coordinates v1.2.0 changes with the rest of the ecosystem (pdfnative-cli, pdfnative-mcp, and any third-party integrators). Adopting v1.2.0 is opt-in and 100% backward-compatible; the items below are improvements you can light up by upgrading.

For pdfnative-mcp maintainers

  • prepare_signature_placeholder tool — now a thin wrapper. v0.3.0 ships a local re-implementation of placeholder injection. From pdfnative 1.2.0 onward, this collapses to one call: addSignaturePlaceholder(pdfBytes, { fieldName, placeholderBytes, signingTime, name, reason, location, contactInfo }). The local logic can be removed; behaviour is byte-identical and idempotent (returns input unchanged on already-signed PDFs).
  • v0.4 roadmap item "sign_pdf placeholder auto-injection — sign any PDF in a single call". Now trivially implementable: signPdfBytes(addSignaturePlaceholder(pdfBytes), opts).
  • inspect_pdf tool — new field opportunity. Expose whether the input PDF already contains an /FT /Sig widget (helps AI agents decide between "sign" and "re-sign" workflows). Detection logic is the same heuristic addSignaturePlaceholder() uses internally.
  • add_table tool — six new optional fields to forward. wrap, repeatHeader, zebra, caption, minRowHeight, cellPadding. Defaults (wrap: 'auto', repeatHeader: true) match v1.2.0's documented defaults — surface them as optional MCP-tool parameters so agent-driven invoice/report workflows get multi-page-safe tables out of the box.
  • PDF/A target enum — single source of truth. Replace any hardcoded enum: ['pdfa1b','pdfa2b','pdfa2u','pdfa3b'] in your tool schemas with import { PDF_A_CONFORMANCE_TARGETS } from 'pdfnative' and spread the array. Keeps the MCP tool schema in lockstep with the pdfnative tagged option as new conformance targets are added.
  • ColumnDef.kind — explicit amount styling. When the agent renders a financial table, set columns[i].kind = 'amount' on the amount column to opt into Helvetica-Bold + credit/debit colouring driven by row.type. The pre-1.2.0 implicit i === 3 heuristic is gone in the document builder.
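
The single-source-of-truth idea for the PDF/A enum can be sketched like this. To keep the snippet self-contained, the constant is declared locally with the values quoted in these notes; in real tooling it would come from import { PDF_A_CONFORMANCE_TARGETS } from 'pdfnative'. The schema object, the pdfA property name, and the isPdfATarget guard are hypothetical and only illustrate how a tool schema might consume the array.

```typescript
// Local stand-in mirroring the values listed in the release notes.
// In real code this would be imported from 'pdfnative' instead.
const PDF_A_CONFORMANCE_TARGETS = ['pdfa1b', 'pdfa2b', 'pdfa2u', 'pdfa3b'] as const;
type PdfAConformanceTarget = (typeof PDF_A_CONFORMANCE_TARGETS)[number];

// Hypothetical JSON-schema fragment for a tool input: the enum is spread
// from the shared constant rather than hardcoded as string literals.
const toolInputSchema = {
    type: 'object',
    properties: {
        pdfA: {
            type: 'string',
            enum: [...PDF_A_CONFORMANCE_TARGETS],
        },
    },
};

// Hypothetical runtime guard derived from the same array.
function isPdfATarget(value: string): value is PdfAConformanceTarget {
    return (PDF_A_CONFORMANCE_TARGETS as readonly string[]).indexOf(value) !== -1;
}
```

If a future release adds a conformance target, both the schema enum and the guard pick it up without edits on the tooling side.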

For pdfnative-cli maintainers

  • sign command: drop local placeholder logic. v0.3.0's sign subcommand carries its own placeholder injector; replace with addSignaturePlaceholder() from pdfnative@1.2.0. Eliminates a class of subtle xref and /ByteRange bugs.
  • verify command — issuer/subject DNs now correct on every signed PDF. Fix #46 (ASN.1 grandchild offsets in parseName()) means CMS IssuerAndSerialNumber parses correctly. Any cached X.509 issuer/subject slices from previously-signed PDFs should be invalidated.
  • render --stream — new page-by-page mode. buildDocumentPDFStreamPageByPage() complements the existing streamDocumentPdf() with object-boundary chunking — useful when piping huge PDFs through stdout without buffering.
  • render — smart tables enabled by default. Documents emitted by the CLI that include large TableBlocks now wrap on overflow and reprint headers across pages automatically. To preserve v1.1.0 output bit-for-bit, callers can set wrap: 'never' and repeatHeader: false on each table block.

For third-party integrators

  • The new public exports (addSignaturePlaceholder, buildPDFStreamPageByPage, buildDocumentPDFStreamPageByPage, normalizeBidiEmbeddings, classifyUseCategory, classifyClusters, UseCategory, UseClassifiedCp, UseCluster, SigDictMetadata, helveticaBoldWidth, PDF_A_CONFORMANCE_TARGETS, PdfAConformanceTarget) are all stable. No removals, no signature changes, no behavioural regressions on existing exports. Six new optional TableBlock fields (wrap, repeatHeader, zebra, caption, minRowHeight, cellPadding) plus one new optional ColumnDef.kind field are additive; omitting them keeps v1.1.0 single-page body bytes identical (header glyph positioning shifts by 2–5pt for right-/centre-aligned headers — a documented correctness fix; tables with no kind: 'amount' column no longer render any cell in bold credit/debit colour, which is a documented behaviour change for document-builder tables that previously relied on the i === 3 implicit heuristic). planTable() is an internal renderer primitive (not re-exported from the root) — it is documented in docs/guides/tables.md for contributors, not as part of the public API surface.
  • Cross-repo coordination uses explicit version pins, not shared knowledge bases. If you build on pdfnative, pin a minor in your package.json and re-pin per release after re-running your integration tests.

Credits

  • ISO 32000-1:2008 §12.7 (interactive forms) / §12.8 (digital signatures) / §7.5.6 (incremental updates).
  • RFC 5280 (X.509 v3 certificates) and RFC 5652 (CMS SignedData) for the issuer/subject slice fix.
  • Unicode Bidirectional Algorithm (UAX #9) for the embeddings work.
  • Universal Shaping Engine (Microsoft) for the cluster-classification baseline.