v1.1.0 — PDF/A Latin embedding, BiDi isolates, Arabic harakat, emoji
Released 2026-04-30
Maximalist minor release. Closes the two largest open epics — issue #28 (PDF/A Latin font embedding) and issue #25 (full UAX #9 BiDi isolates + GPOS MarkBasePos for Arabic harakat) — and adds first-class monochrome emoji support, auto-fit table columns, and per-cell clipping. Folds the alpha.1 / alpha.2 medium-term items into a single stable cut.
100% backward-compatible. All new features are opt-in and gated on font registration or explicit table flags. Pre-existing PDFs are byte-identical. 1726 tests green across 48 files.
Highlights
- fonts(latin): Noto Sans VF (OFL-1.1) is now bundleable as a fallback for PDF/A documents that use non-WinAnsi Latin (curly quotes, em-dash, ellipsis…). Opt-in via `registerFont('latin', () => import('pdfnative/fonts/noto-sans-data.js'))`. Automatically activates for PDF/A modes when the encoding context detects characters outside WinAnsi. Closes #28.
- shaping(bidi): UAX #9 isolate handling: LRI / RLI / FSI / PDI (U+2066–U+2069) are now honoured. Mixed-script paragraphs containing isolated runs are resolved correctly with full recursion. Three-tier dispatcher: public `resolveBidiRuns()` finds outermost isolate pairs, `resolveBidiRunsForced()` recurses with forced level, `resolveBidiCore()` runs the W1–W7 / N1–N2 / L2 pipeline.
- shaping(arabic): GPOS MarkBasePos applied to transparent marks (harakat: fatha, kasra, damma, sukun, shadda, …). Marks now anchor on the preceding base glyph using font-provided GPOS anchor data, falling back to (0, 0) when absent. Closes the visual half of #25.
- shaping(drivers): new shared `gsub-driver.ts` (`tryLigature()`) and `gpos-positioner.ts` (`positionMarkOnBase()`) modules. Bengali, Tamil, Devanagari, and Arabic shapers now route through a single GSUB lookup helper and a single GPOS anchor helper instead of three duplicated implementations.
- shaping(emoji): monochrome emoji via Noto Emoji (OFL-1.1, 1891 glyphs). Opt-in via `registerFont('emoji', () => import('pdfnative/fonts/noto-emoji-data.js'))`. Detection covers the full BMP/SMP emoji ranges (U+1F300–U+1FAFF, U+2600–U+27BF, …) plus Fitzpatrick modifiers (U+1F3FB–U+1F3FF), ZWJ (U+200D), and VS-15 / VS-16 (U+FE0E / U+FE0F). Multi-font run splitting routes emoji codepoints to the registered `'emoji'` font automatically.
- core(table): `TableBlock.autoFitColumns` (alpha.2) and `TableBlock.clipCells` (alpha.2) are now part of the stable surface. Defaults preserve v1.0.x byte output.
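
The "finds outermost isolate pairs" step of the BiDi dispatcher can be illustrated with a small stack-based scan. This is a minimal sketch, not the pdfnative implementation: `findOutermostIsolatePairs` and `IsolatePair` are hypothetical names, and the only facts taken from the notes above are the four isolate code points.

```typescript
// Hypothetical helper, not part of the pdfnative API: locate the outermost
// isolate pairs (LRI / RLI / FSI ... PDI) in an array of code points.
const LRI = 0x2066;
const RLI = 0x2067;
const FSI = 0x2068;
const PDI = 0x2069;

interface IsolatePair {
  open: number;  // index of the isolate initiator
  close: number; // index of the matching PDI
}

function isIsolateInitiator(cp: number): boolean {
  return cp === LRI || cp === RLI || cp === FSI;
}

function findOutermostIsolatePairs(cps: readonly number[]): IsolatePair[] {
  const pairs: IsolatePair[] = [];
  const stack: number[] = []; // indices of currently open initiators
  cps.forEach((cp, i) => {
    if (isIsolateInitiator(cp)) {
      stack.push(i);
    } else if (cp === PDI && stack.length > 0) {
      const open = stack.pop() as number;
      // A pair that closes back to depth 0 is outermost; nested pairs are
      // left for a recursive pass over the isolated run.
      if (stack.length === 0) pairs.push({ open, close: i });
    }
    // A PDI with no open initiator is unmatched and ignored.
  });
  // Initiators still on the stack are unmatched; this sketch does not report
  // them, nor any pair nested inside them.
  return pairs;
}
```

A caller could then resolve the text between `open` and `close` recursively with a forced level and treat the pair as a single neutral unit in the outer run. How pdfnative does this internally is not shown here.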
Fixed (PDF/A conformance hardening)
- pdfa(font embedding): Object 3 (`/F1`) and Object 4 (`/F2`) are now Type0 redirector dicts pointing to the embedded `CIDFontType2` / `FontFile2` chain when a Latin font entry is registered, eliminating unembedded `Helvetica` / `Helvetica-Bold` standard-14 references that broke veraPDF (ISO 19005-1 §6.3.4 / ISO 19005-2 §6.2.11.4.1).
- pdfa(xmp utf-8): XMP metadata streams now go through a binary-safe UTF-8 encoder (`utf8EncodeBinaryString()`) before `toBytes()`, preserving em-dash, ellipsis, smart quotes, CJK in `<dc:title>` and matching `/Info /Title` byte-for-byte (ISO 19005-1 §6.7.3 t1).
- pdfa(xmp parity): `buildXMPMetadata()` emits `<dc:description>` and `<pdf:Keywords>` whenever `/Info /Subject` and `/Info /Keywords` are set, satisfying ISO 19005-1 §6.7.3 t4 / t5 parity rules. Unblocks PDF/A-1b validation for documents carrying subject or keywords metadata.
- pdfa(encoding fallback): `createEncodingContext(fontEntries, pdfA=true)` disables the WinAnsi/Helvetica fallback. Characters outside the primary CIDFont's cmap render as `.notdef` instead of routing to an unembedded Type1 font.
- pdfa(annotations `/F 4`): Link annotations (`/Subtype /Link`, both `/URI` and `/GoTo`) and form widgets (`/Subtype /Widget`) now emit `/F 4` (Print flag set, NoView/Hidden/Invisible cleared) per ISO 19005-2 §6.5.3 / veraPDF rule 6.3.2-1. Required on every annotation in PDF/A-2 / PDF/A-3.
- scripts(samples): Five PDF/A-claiming sample generators (`barcode-tagged`, `compressed-tagged-pdfa2b`, `header-footer-tagged`, `tagged-accessibility-complex`, `toc-tagged`) now register a `latin` font entry so the generated samples pass veraPDF rule 6.2.11.4.1-1. The `pdfa-variants` and `pdfa-latin-embedding` generators were already wired in alpha.1.
- ci(verapdf): veraPDF validation is now blocking on PRs and pushes, with no more `continue-on-error`. `validate-pdfa.ts` auto-detects PDF/A claims via XMP `pdfaid:part`, so non-PDF/A samples never trigger CI failures.
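
The idea behind a "binary-safe UTF-8 encoder" is to turn a JavaScript string into a string whose characters each carry one UTF-8 byte (0x00–0xFF), so that a later byte conversion cannot truncate or mangle non-ASCII text. The sketch below is only an illustration of that technique: `utf8ToBinaryString` is a hypothetical stand-in, and the real `utf8EncodeBinaryString()` may be implemented differently.

```typescript
// Hypothetical stand-in for the technique; not the pdfnative source.
// Returns a "binary string": one character per UTF-8 byte.
function utf8ToBinaryString(text: string): string {
  let out = '';
  for (let i = 0; i < text.length; i++) {
    let cp = text.charCodeAt(i);
    // Combine a valid surrogate pair into a single code point.
    if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < text.length) {
      const lo = text.charCodeAt(i + 1);
      if (lo >= 0xdc00 && lo <= 0xdfff) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (lo - 0xdc00);
        i++;
      }
    }
    // Lone surrogates fall through and are encoded as three bytes;
    // a stricter encoder might substitute U+FFFD instead.
    if (cp < 0x80) {
      out += String.fromCharCode(cp);
    } else if (cp < 0x800) {
      out += String.fromCharCode(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      out += String.fromCharCode(
        0xe0 | (cp >> 12),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f),
      );
    } else {
      out += String.fromCharCode(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f),
      );
    }
  }
  return out;
}
```

With this shape, an em-dash (U+2014) becomes the three characters 0xE2 0x80 0x94, which a byte-wise writer can emit unchanged into the XMP stream.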
Added
- fonts(latin): `fonts/noto-sans-data.{js,d.ts}` (Noto Sans VF subsetted, 4515 glyphs, 3094 cmap entries). OFL-1.1.
- fonts(emoji): `fonts/noto-emoji-data.{js,d.ts}` (Noto Emoji monochrome, 1891 glyphs, 1489 cmap entries). OFL-1.1.
- shaping(bidi): isolate support. LRI (U+2066), RLI (U+2067), FSI (U+2068), PDI (U+2069) are classified as `BN` and recursed. Nested isolates supported. Unmatched isolates fall through gracefully.
- shaping(arabic): MarkBasePos applied to transparent (joining type 'T') marks. `lastBaseGid` tracking through the shaping pipeline, including lam-alef ligatures.
- shaping(drivers): `src/shaping/gsub-driver.ts` exporting `tryLigature(gids, ligatures)` and `src/shaping/gpos-positioner.ts` exporting `getBaseAnchor`, `getMarkAnchor`, `getMark2MarkAnchor`, `positionMarkOnBase`. Bengali / Tamil / Devanagari / Arabic shapers refactored to use them.
- shaping(emoji): `EMOJI_RANGES`, `isEmojiCodepoint`, `containsEmoji`, `FITZPATRICK_START/END`, `ZWJ`, `VS15`, `VS16` exported from `src/shaping/script-registry.ts`. `detectCharLang()` returns `'emoji'` for emoji codepoints; `detectFallbackLangs()` adds `'emoji'` automatically.
- scripts(download-fonts): Noto Emoji entry in the manifest for reproducible `npm run download:fonts`.
- tests: `tests/shaping/phase2-shaping.test.ts` (24 tests, GSUB driver + GPOS positioner + BiDi isolates + Arabic GPOS), `tests/shaping/emoji.test.ts` (15 tests, ranges + predicates + script-detect integration + baked module shape), `tests/fonts/pdfa-latin-embedding.test.ts` (PDF/A Latin embedding integration).
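
To make the emoji detection and run splitting concrete, here is a rough sketch built only from the code points named in these notes. All identifiers (`SKETCH_EMOJI_RANGES`, `isEmojiLike`, `splitEmojiRuns`) are hypothetical and deliberately differ from the exported names; the range table is partial, since the notes elide the remaining ranges.

```typescript
// Illustrative only: a partial range table using the ranges listed in these
// notes. The shipped EMOJI_RANGES covers more than this.
const SKETCH_EMOJI_RANGES: ReadonlyArray<readonly [number, number]> = [
  [0x1f300, 0x1faff], // includes Fitzpatrick modifiers U+1F3FB–U+1F3FF
  [0x2600, 0x27bf],
];
const SKETCH_ZWJ = 0x200d;
const SKETCH_VS15 = 0xfe0e;
const SKETCH_VS16 = 0xfe0f;

// Assumption: joiners and variation selectors count as emoji here, so that
// a ZWJ sequence stays in a single run.
function isEmojiLike(cp: number): boolean {
  if (cp === SKETCH_ZWJ || cp === SKETCH_VS15 || cp === SKETCH_VS16) return true;
  return SKETCH_EMOJI_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi);
}

interface Run {
  text: string;
  emoji: boolean;
}

// Split text into maximal runs of emoji / non-emoji code points.
function splitEmojiRuns(text: string): Run[] {
  const runs: Run[] = [];
  let i = 0;
  while (i < text.length) {
    const hi = text.charCodeAt(i);
    let cp = hi;
    let len = 1;
    // Decode a surrogate pair into one supplementary-plane code point.
    if (hi >= 0xd800 && hi <= 0xdbff && i + 1 < text.length) {
      const lo = text.charCodeAt(i + 1);
      if (lo >= 0xdc00 && lo <= 0xdfff) {
        cp = 0x10000 + ((hi - 0xd800) << 10) + (lo - 0xdc00);
        len = 2;
      }
    }
    const emoji = isEmojiLike(cp);
    const chunk = text.slice(i, i + len);
    const last = runs[runs.length - 1];
    if (last && last.emoji === emoji) last.text += chunk;
    else runs.push({ text: chunk, emoji });
    i += len;
  }
  return runs;
}
```

One limitation of this simplification: a ZWJ inside non-emoji text (for example in Indic scripts) would also be classified as emoji, so a production splitter would need to look at the surrounding code points before routing it.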
Changed
- shaping(bidi): `resolveBidiRuns()` rewritten as a recursive isolate-aware dispatcher. Behaviour unchanged for inputs without isolate characters: output is byte-identical for all pre-v1.1.0 fixtures.
- shaping(types): `fixPunctuationAffinity` and `fixBracketPairing` widened to `readonly number[]` to match the new core pipeline. No public API impact.
- shaping(bengali, tamil, devanagari): local `tryLigature` definitions removed. Shapers now declare a thin `tryLig(gids)` closure that forwards to the shared driver. Output bytes unchanged.
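
The "thin closure forwarding to a shared driver" pattern might look roughly like the following. This is a sketch under stated assumptions: the ligature table shape (a record keyed by comma-joined component glyph IDs), the `tryLigatureSketch` name, and the two-glyph greedy loop are all invented for illustration, and the real `tryLigature(gids, ligatures)` signature details are not given in these notes.

```typescript
// Assumed table shape: "gid,gid,..." -> ligature glyph ID. Hypothetical.
type LigatureTable = Readonly<Record<string, number>>;

// Shared lookup helper (stand-in for the shared GSUB driver).
function tryLigatureSketch(
  gids: readonly number[],
  ligatures: LigatureTable,
): number | null {
  const key = gids.join(',');
  return Object.prototype.hasOwnProperty.call(ligatures, key)
    ? ligatures[key]
    : null;
}

// A script shaper binds its own table once and uses a thin closure after that.
function makeShaper(ligatures: LigatureTable) {
  const tryLig = (gids: readonly number[]) => tryLigatureSketch(gids, ligatures);

  return function shape(gids: readonly number[]): number[] {
    const out: number[] = [];
    let i = 0;
    while (i < gids.length) {
      // Greedily try a two-glyph ligature at the current position.
      if (i + 1 < gids.length) {
        const lig = tryLig([gids[i], gids[i + 1]]);
        if (lig !== null) {
          out.push(lig);
          i += 2;
          continue;
        }
      }
      out.push(gids[i]);
      i++;
    }
    return out;
  };
}
```

The point of the closure is that each shaper keeps its call sites short (`tryLig(gids)`) while the lookup logic lives in one place, which is consistent with the "output bytes unchanged" claim for a pure refactor.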
Documentation
- New release notes (this file).
- README, ROADMAP, and `.github/copilot-instructions.md` updated to reflect new modules, emoji support, and PDF/A Latin embedding.
- New emoji guide at pdfnative.dev/guides/emoji.html.
- PDF/A guide refreshed with Latin embedding example.
Deferred to v1.2.0
- Full UAX #9 embeddings (LRE / RLE / LRO / RLO / PDF) — isolates ship now, embeddings remain rare in practice and require a deeper level-stack refactor.
- True page-by-page constant-memory streaming (`buildDocumentPDFStreamPageByPage()`).
- COLRv1 colour emoji (this release ships monochrome only).
Upgrade
    npm install pdfnative@1.1.0

Opt into the new font modules as needed:

    import { registerFont } from 'pdfnative';

    // PDF/A documents that need non-WinAnsi Latin fallback
    registerFont('latin', () => import('pdfnative/fonts/noto-sans-data.js'));

    // Emoji rendering (monochrome)
    registerFont('emoji', () => import('pdfnative/fonts/noto-emoji-data.js'));

No code changes are required for users who don't register `'latin'` or `'emoji'`; pre-existing PDFs are byte-identical.
Credits
- Noto Sans, Noto Emoji © Google LLC, licensed under SIL Open Font License 1.1.
- UAX #9 reference: Unicode Bidirectional Algorithm.
- ISO 32000-1:2008 §9.7 (CIDFont), §9.10 (ToUnicode), §14.8 (Tagged PDF, the basis for PDF/UA), ISO 19005-2 (PDF/A-2).