v1.1.0 — PDF/A Latin embedding, BiDi isolates, Arabic harakat, emoji

@Nizoka Nizoka released this 29 Apr 22:49
62f8c2e

Released 2026-04-30

Maximalist minor release. Closes the two largest open epics — issue #28 (PDF/A Latin font embedding) and issue #25 (full UAX #9 BiDi isolates + GPOS MarkBasePos for Arabic harakat) — and adds first-class monochrome emoji support, auto-fit table columns, and per-cell clipping. Folds the alpha.1 / alpha.2 medium-term items into a single stable cut.

100% backward-compatible. All new features are opt-in and gated on font registration or explicit table flags. Documents that do not opt in produce output byte-identical to v1.0.x. 1726 tests green across 48 files.

Highlights

  • fonts(latin): Noto Sans VF (OFL-1.1) can now be bundled as a fallback for PDF/A documents that use non-WinAnsi Latin characters (curly quotes, em-dash, ellipsis…). Opt in via registerFont('latin', () => import('pdfnative/fonts/noto-sans-data.js')). The fallback activates automatically in PDF/A modes when the encoding context detects characters outside WinAnsi. Closes #28.
  • shaping(bidi): UAX #9 isolate handling — LRI / RLI / FSI / PDI (U+2066–U+2069) are now honoured. Mixed-script paragraphs containing isolated runs are resolved correctly with full recursion. Three-tier dispatcher: public resolveBidiRuns() finds outermost isolate pairs, resolveBidiRunsForced() recurses with forced level, resolveBidiCore() runs the W1–W7 / N1–N2 / L2 pipeline.
  • shaping(arabic): GPOS MarkBasePos applied to transparent marks (harakat — fatha, kasra, damma, sukun, shadda, …). Marks now anchor on the preceding base glyph using font-provided GPOS anchor data, falling back to (0, 0) when absent. Closes the visual half of #25.
  • shaping(drivers): new shared gsub-driver.ts (tryLigature()) and gpos-positioner.ts (positionMarkOnBase()) modules. Bengali, Tamil, Devanagari, and Arabic shapers now route through a single GSUB lookup helper and a single GPOS anchor helper instead of three duplicated implementations.
  • shaping(emoji): monochrome emoji via Noto Emoji (OFL-1.1, 1891 glyphs). Opt-in via registerFont('emoji', () => import('pdfnative/fonts/noto-emoji-data.js')). Detection covers the full BMP/SMP emoji ranges (U+1F300–U+1FAFF, U+2600–U+27BF, …) plus Fitzpatrick modifiers (U+1F3FB–U+1F3FF), ZWJ (U+200D), and VS-15 / VS-16 (U+FE0E / U+FE0F). Multi-font run splitting routes emoji codepoints to the registered 'emoji' font automatically.
  • core(table): TableBlock.autoFitColumns (alpha.2) and TableBlock.clipCells (alpha.2) now part of the stable surface. Defaults preserve v1.0.x byte output.
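
The mark anchoring described under shaping(arabic) can be illustrated with a minimal sketch. It is not pdfnative's positionMarkOnBase(): the Anchor, Offset, and AnchorTable types and the markOffset() helper are hypothetical, and GPOS mark classes are left out for brevity. It only shows the usual idea of shifting a mark so its anchor lands on the base glyph's anchor, with the (0, 0) fallback mentioned above.

```typescript
// Hypothetical shapes for illustration; the real GPOS data in pdfnative may differ.
interface Anchor { x: number; y: number }
interface Offset { dx: number; dy: number }

// Assumed lookup tables keyed by glyph ID, values in font units.
type AnchorTable = ReadonlyMap<number, Anchor>;

/**
 * Offset (relative to the base glyph origin) that makes the mark's anchor
 * coincide with the base's anchor. Falls back to (0, 0) when either anchor
 * is missing, mirroring the fallback described in these notes.
 */
function markOffset(
  baseGid: number,
  markGid: number,
  baseAnchors: AnchorTable,
  markAnchors: AnchorTable,
): Offset {
  const base = baseAnchors.get(baseGid);
  const mark = markAnchors.get(markGid);
  if (!base || !mark) return { dx: 0, dy: 0 };
  return { dx: base.x - mark.x, dy: base.y - mark.y };
}

// Usage with made-up glyph IDs and anchor coordinates.
const demoBaseAnchors = new Map([[10, { x: 500, y: 700 }]]);
const demoMarkAnchors = new Map([[20, { x: 100, y: 50 }]]);
const demoOffset = markOffset(10, 20, demoBaseAnchors, demoMarkAnchors);
```

A real positioner also has to account for the base glyph's advance and for marks stacked on other marks, which this sketch ignores.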

Fixed (PDF/A conformance hardening)

  • pdfa(font embedding): Object 3 (/F1) and Object 4 (/F2) are now Type0 redirector dicts pointing to the embedded CIDFontType2 / FontFile2 chain when a Latin font entry is registered — eliminating unembedded Helvetica / Helvetica-Bold standard-14 references that broke veraPDF (ISO 19005-1 §6.3.4 / ISO 19005-2 §6.2.11.4.1).
  • pdfa(xmp utf-8): XMP metadata streams now go through a binary-safe UTF-8 encoder (utf8EncodeBinaryString()) before toBytes(), so em-dashes, ellipses, smart quotes, and CJK in <dc:title> are preserved and match /Info /Title byte-for-byte (ISO 19005-1 §6.7.3 t1).
  • pdfa(xmp parity): buildXMPMetadata() emits <dc:description> and <pdf:Keywords> whenever /Info /Subject and /Info /Keywords are set, satisfying ISO 19005-1 §6.7.3 t4 / t5 parity rules. Unblocks PDF/A-1b validation for documents carrying subject or keywords metadata.
  • pdfa(encoding fallback): createEncodingContext(fontEntries, pdfA=true) disables the WinAnsi/Helvetica fallback. Characters outside the primary CIDFont's cmap render as .notdef instead of routing to an unembedded Type1 font.
  • pdfa(annotations /F 4): Link annotations (/Subtype /Link, both /URI and /GoTo) and form widgets (/Subtype /Widget) now emit /F 4 (Print flag set, NoView/Hidden/Invisible cleared) per ISO 19005-2 §6.5.3 / veraPDF rule 6.3.2-1. Required on every annotation in PDF/A-2 / PDF/A-3.
  • scripts(samples): Five PDF/A-claiming sample generators (barcode-tagged, compressed-tagged-pdfa2b, header-footer-tagged, tagged-accessibility-complex, toc-tagged) now register a latin font entry so the generated samples pass veraPDF rule 6.2.11.4.1-1. The pdfa-variants and pdfa-latin-embedding generators were already wired in alpha.1.
  • ci(verapdf): veraPDF validation is now blocking on PRs and pushes — no more continue-on-error. validate-pdfa.ts auto-detects PDF/A claims via XMP pdfaid:part, so non-PDF/A samples never trigger CI failures.
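
To make the XMP UTF-8 fix above more concrete, here is a rough sketch of what a binary-safe UTF-8 encoder can look like. The function name utf8ToBinaryString() is hypothetical and this is not the library's utf8EncodeBinaryString(); it assumes the goal is a "binary string" in which every character code is one UTF-8 byte (0–255), so that a later byte-wise conversion cannot truncate non-Latin-1 characters.

```typescript
/**
 * Encode a JavaScript string as UTF-8 and return a binary string whose
 * char codes are the individual UTF-8 bytes. Hypothetical helper.
 */
function utf8ToBinaryString(text: string): string {
  let out = '';
  for (let i = 0; i < text.length; i++) {
    let cp = text.charCodeAt(i);
    // Combine a valid surrogate pair into a single code point.
    if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < text.length) {
      const low = text.charCodeAt(i + 1);
      if (low >= 0xdc00 && low <= 0xdfff) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (low - 0xdc00);
        i++;
      }
    }
    // Replace any lone surrogate with U+FFFD, as TextEncoder does.
    if (cp >= 0xd800 && cp <= 0xdfff) cp = 0xfffd;

    if (cp < 0x80) {
      out += String.fromCharCode(cp);
    } else if (cp < 0x800) {
      out += String.fromCharCode(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      out += String.fromCharCode(
        0xe0 | (cp >> 12),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f),
      );
    } else {
      out += String.fromCharCode(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f),
      );
    }
  }
  return out;
}

// An em-dash (U+2014) becomes the three bytes E2 80 94.
const encodedDash = utf8ToBinaryString('\u2014');
```

Without a step like this, a byte conversion that keeps only the low 8 bits of each UTF-16 code unit would turn U+2014 into the single byte 0x14.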

Added

  • fonts(latin): fonts/noto-sans-data.{js,d.ts} — Noto Sans VF subsetted, 4515 glyphs, 3094 cmap entries. OFL-1.1.
  • fonts(emoji): fonts/noto-emoji-data.{js,d.ts} — Noto Emoji monochrome, 1891 glyphs, 1489 cmap entries. OFL-1.1.
  • shaping(bidi): isolate support — LRI (U+2066), RLI (U+2067), FSI (U+2068), PDI (U+2069) are classified as BN and their contents are resolved recursively. Nested isolates are supported. Unmatched isolates fall through gracefully.
  • shaping(arabic): MarkBasePos applied to transparent (joining type 'T') marks. lastBaseGid tracking through the shaping pipeline including lam-alef ligatures.
  • shaping(drivers): src/shaping/gsub-driver.ts exporting tryLigature(gids, ligatures) and src/shaping/gpos-positioner.ts exporting getBaseAnchor, getMarkAnchor, getMark2MarkAnchor, positionMarkOnBase. Bengali / Tamil / Devanagari / Arabic shapers refactored to use them.
  • shaping(emoji): EMOJI_RANGES, isEmojiCodepoint, containsEmoji, FITZPATRICK_START/END, ZWJ, VS15, VS16 exported from src/shaping/script-registry.ts. detectCharLang() returns 'emoji' for emoji codepoints; detectFallbackLangs() adds 'emoji' automatically.
  • scripts(download-fonts): Noto Emoji entry in the manifest for reproducible npm run download:fonts.
  • tests: tests/shaping/phase2-shaping.test.ts (24 tests, GSUB driver + GPOS positioner + BiDi isolates + Arabic GPOS), tests/shaping/emoji.test.ts (15 tests, ranges + predicates + script-detect integration + baked module shape), tests/fonts/pdfa-latin-embedding.test.ts (PDF/A Latin embedding integration).
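
The emoji detection and run splitting listed above can be sketched roughly as follows. This is not the code in src/shaping/script-registry.ts: the names SKETCH_EMOJI_RANGES, isEmojiLike(), and splitEmojiRuns() are hypothetical, and the range table contains only the two ranges spelled out in these notes (the full list is elided there). The Fitzpatrick modifiers U+1F3FB–U+1F3FF already fall inside the first range.

```typescript
// Only the ranges named in the release notes; the real table is larger.
const SKETCH_EMOJI_RANGES: ReadonlyArray<readonly [number, number]> = [
  [0x1f300, 0x1faff],
  [0x2600, 0x27bf],
];
const SKETCH_ZWJ = 0x200d;
const SKETCH_VS15 = 0xfe0e;
const SKETCH_VS16 = 0xfe0f;

/** True for code points in the sketch ranges, plus ZWJ and VS-15 / VS-16. */
function isEmojiLike(cp: number): boolean {
  if (cp === SKETCH_ZWJ || cp === SKETCH_VS15 || cp === SKETCH_VS16) return true;
  return SKETCH_EMOJI_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi);
}

interface FontRun { font: 'emoji' | 'default'; text: string }

/** Split text into consecutive runs that should use the emoji font or the default font. */
function splitEmojiRuns(text: string): FontRun[] {
  const runs: FontRun[] = [];
  let i = 0;
  while (i < text.length) {
    const cp = text.codePointAt(i);
    if (cp === undefined) break;
    // Code points above U+FFFF occupy two UTF-16 code units.
    const width = cp > 0xffff ? 2 : 1;
    const font = isEmojiLike(cp) ? 'emoji' : 'default';
    const chunk = text.slice(i, i + width);
    const last = runs[runs.length - 1];
    if (last !== undefined && last.font === font) last.text += chunk;
    else runs.push({ font, text: chunk });
    i += width;
  }
  return runs;
}

// "Hi 😀!" splits into default, emoji, default.
const demoRuns = splitEmojiRuns('Hi \ud83d\ude00!');
```

One simplification to be aware of: this sketch sends ZWJ and variation selectors to the emoji font even when they appear between non-emoji characters, which a production splitter would likely handle with more context.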

Changed

  • shaping(bidi): resolveBidiRuns() rewritten as a recursive isolate-aware dispatcher. Behaviour unchanged for inputs without isolate characters — output is byte-identical for all pre-v1.1.0 fixtures.
  • shaping(types): fixPunctuationAffinity and fixBracketPairing widened to readonly number[] to match the new core pipeline. No public API impact.
  • shaping(bengali, tamil, devanagari): local tryLigature definitions removed. Shapers now declare a thin tryLig(gids) closure that forwards to the shared driver. Output bytes unchanged.
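
As an illustration of the first tier of the isolate-aware dispatcher, the sketch below finds outermost matched isolate pairs in a sequence of code points. It is an assumption-laden stand-in, not resolveBidiRuns() itself: findOutermostIsolatePairs() and IsolatePair are hypothetical names, matching follows the stack-based reading of UAX #9 BD9, and the UAX #9 depth limit is ignored.

```typescript
const CP_LRI = 0x2066;
const CP_RLI = 0x2067;
const CP_FSI = 0x2068;
const CP_PDI = 0x2069;

interface IsolatePair { open: number; close: number }

/**
 * Return index pairs of isolate initiators and their matching PDI that are
 * not nested inside another matched pair. Unmatched initiators and stray
 * PDIs are skipped, so the surrounding text is treated as ordinary content.
 */
function findOutermostIsolatePairs(cps: readonly number[]): IsolatePair[] {
  // Pass 1: match each PDI with the most recent unmatched initiator.
  const matchOf = new Map<number, number>();
  const stack: number[] = [];
  cps.forEach((cp, idx) => {
    if (cp === CP_LRI || cp === CP_RLI || cp === CP_FSI) {
      stack.push(idx);
    } else if (cp === CP_PDI) {
      const open = stack.pop();
      if (open !== undefined) matchOf.set(open, idx);
    }
  });

  // Pass 2: walk left to right and jump over each matched pair,
  // so only pairs not enclosed by another pair are reported.
  const pairs: IsolatePair[] = [];
  let i = 0;
  while (i < cps.length) {
    const close = matchOf.get(i);
    if (close === undefined) {
      i++;
      continue;
    }
    pairs.push({ open: i, close });
    i = close + 1;
  }
  return pairs;
}

// A, LRI, B, RLI, C, PDI, PDI, D: the outer LRI at index 1 closes at index 6.
const demoPairs = findOutermostIsolatePairs(
  [0x41, CP_LRI, 0x42, CP_RLI, 0x43, CP_PDI, CP_PDI, 0x44],
);
```

A recursive resolver could then process the content between open and close with a forced level and repeat the search inside it, which is the role the notes assign to resolveBidiRunsForced().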

Documentation

  • New release notes (this file).
  • README, ROADMAP, and .github/copilot-instructions.md updated to reflect new modules, emoji support, and PDF/A Latin embedding.
  • New emoji guide at pdfnative.dev/guides/emoji.html.
  • PDF/A guide refreshed with Latin embedding example.

Deferred to v1.2.0

  • Full UAX #9 explicit embeddings and overrides (LRE / RLE / LRO / RLO / PDF). Isolates ship now; embeddings remain rare in practice and require a deeper level-stack refactor.
  • True page-by-page constant-memory streaming (buildDocumentPDFStreamPageByPage()).
  • COLRv1 colour emoji (this release ships monochrome only).

Upgrade

npm install pdfnative@1.1.0

Opt into the new font modules as needed:

import { registerFont } from 'pdfnative';

// PDF/A documents that need non-WinAnsi Latin fallback
registerFont('latin', () => import('pdfnative/fonts/noto-sans-data.js'));

// Emoji rendering (monochrome)
registerFont('emoji', () => import('pdfnative/fonts/noto-emoji-data.js'));

No code changes required for users who don't register 'latin' or 'emoji' — pre-existing PDFs are byte-identical.

Credits

  • Noto Sans, Noto Emoji © Google LLC, licensed under SIL Open Font License 1.1.
  • UAX #9 reference: Unicode Bidirectional Algorithm.
  • ISO 32000-1:2008 §9.7 (CIDFont), §9.10 (ToUnicode), §14.8 (Tagged PDF), ISO 19005-2 (PDF/A-2).