v1.1.0 — PDF/A Latin embedding, BiDi isolates, Arabic harakat, emoji #41
Nizoka announced in Announcements
Released 2026-04-30
Maximalist minor release. Closes the two largest open epics — issue #28 (PDF/A Latin font embedding) and issue #25 (full UAX #9 BiDi isolates + GPOS MarkBasePos for Arabic harakat) — and adds first-class monochrome emoji support, auto-fit table columns, and per-cell clipping. Folds the alpha.1 / alpha.2 medium-term items into a single stable cut.
100% backward-compatible. All new features are opt-in and gated on font registration or explicit table flags. Pre-existing PDFs are byte-identical. 1726 tests green across 48 files.
Highlights
- registerFont('latin', () => import('pdfnative/fonts/noto-sans-data.js')). Automatically activates for PDF/A modes when the encoding context detects characters outside WinAnsi. Closes #28.
- resolveBidiRuns() finds outermost isolate pairs, resolveBidiRunsForced() recurses with forced level, resolveBidiCore() runs the W1–W7 / N1–N2 / L2 pipeline.
- gsub-driver.ts (tryLigature()) and gpos-positioner.ts (positionMarkOnBase()) modules. Bengali, Tamil, Devanagari, and Arabic shapers now route through a single GSUB lookup helper and a single GPOS anchor helper instead of three duplicated implementations.
- registerFont('emoji', () => import('pdfnative/fonts/noto-emoji-data.js')). Detection covers the full BMP/SMP emoji ranges (U+1F300–U+1FAFF, U+2600–U+27BF, …) plus Fitzpatrick modifiers (U+1F3FB–U+1F3FF), ZWJ (U+200D), and VS-15 / VS-16 (U+FE0E / U+FE0F). Multi-font run splitting routes emoji codepoints to the registered 'emoji' font automatically.
- TableBlock.autoFitColumns (alpha.2) and TableBlock.clipCells (alpha.2) now part of the stable surface. Defaults preserve v1.0.x byte output.
Fixed (PDF/A conformance hardening)
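One fix in this section makes every annotation emit /F 4. As a rough illustration of what that value encodes, the sketch below uses the annotation flag bit positions from the PDF specification. The helper name toPdfaAnnotationFlags is hypothetical and is not taken from the library.

```typescript
// Annotation flag bits for the /F entry, per the PDF specification.
const AnnotFlag = {
  Invisible: 1 << 0, // 1
  Hidden: 1 << 1,    // 2
  Print: 1 << 2,     // 4
  NoView: 1 << 5,    // 32
} as const;

// Hypothetical helper (not the library's API): set Print and clear
// Invisible, Hidden and NoView, leaving any other bits untouched.
function toPdfaAnnotationFlags(flags: number): number {
  const forbidden = AnnotFlag.Invisible | AnnotFlag.Hidden | AnnotFlag.NoView;
  return (flags & ~forbidden) | AnnotFlag.Print;
}

// An annotation that starts with no flags ends up with Print only.
const linkFlags = toPdfaAnnotationFlags(0);
console.log(`/F ${linkFlags}`);
```

With no other bits set, the result is exactly the Print bit, which is why the emitted value is 4.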
- /F1) and Object 4 (/F2) are now Type0 redirector dicts pointing to the embedded CIDFontType2/FontFile2 chain when a Latin font entry is registered, eliminating unembedded Helvetica/Helvetica-Bold standard-14 references that broke veraPDF (ISO 19005-1 §6.3.4 / ISO 19005-2 §6.2.11.4.1).
- utf8EncodeBinaryString()) before toBytes(), preserving em-dash, ellipsis, smart quotes, CJK in <dc:title> and matching /Info /Title byte-for-byte (ISO 19005-1 §6.7.3 t1).
- buildXMPMetadata() emits <dc:description> and <pdf:Keywords> whenever /Info /Subject and /Info /Keywords are set, satisfying ISO 19005-1 §6.7.3 t4 / t5 parity rules. Unblocks PDF/A-1b validation for documents carrying subject or keywords metadata.
- createEncodingContext(fontEntries, pdfA=true) disables the WinAnsi/Helvetica fallback. Characters outside the primary CIDFont's cmap render as .notdef instead of routing to an unembedded Type1 font.
- /F 4): Link annotations (/Subtype /Link, both /URI and /GoTo) and form widgets (/Subtype /Widget) now emit /F 4 (Print flag set, NoView/Hidden/Invisible cleared) per ISO 19005-2 §6.5.3 / veraPDF rule 6.3.2-1. Required on every annotation in PDF/A-2 / PDF/A-3.
- barcode-tagged, compressed-tagged-pdfa2b, header-footer-tagged, tagged-accessibility-complex, toc-tagged) now register a latin font entry so the generated samples pass veraPDF rule 6.2.11.4.1-1. The pdfa-variants and pdfa-latin-embedding generators were already wired in alpha.1.
- continue-on-error. validate-pdfa.ts auto-detects PDF/A claims via XMP pdfaid:part, so non-PDF/A samples never trigger CI failures.
Added
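The emoji predicates added in this release can be pictured with a small self-contained sketch. It reuses the exported names (EMOJI_RANGES, isEmojiCodepoint, containsEmoji, ZWJ, VS15, VS16) for readability, but the bodies are assumptions, and the range table holds only the two ranges spelled out in these notes, not the library's full list. FontRun and splitEmojiRuns are hypothetical names for the run-splitting idea.

```typescript
// Only the two ranges named in the release notes; the real table is longer.
const EMOJI_RANGES: ReadonlyArray<readonly [number, number]> = [
  [0x1f300, 0x1faff], // also contains the Fitzpatrick modifiers U+1F3FB-U+1F3FF
  [0x2600, 0x27bf],
];
const ZWJ = 0x200d;
const VS15 = 0xfe0e;
const VS16 = 0xfe0f;

function isEmojiCodepoint(cp: number): boolean {
  return EMOJI_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi);
}

function containsEmoji(text: string): boolean {
  // Array.from iterates by code point, so surrogate pairs stay intact.
  return Array.from(text).some((ch) => isEmojiCodepoint(ch.codePointAt(0)!));
}

interface FontRun {
  font: 'emoji' | 'base';
  text: string;
}

// Hypothetical run splitter: emoji code points go to the 'emoji' font,
// and ZWJ / variation selectors stay attached to the preceding run.
function splitEmojiRuns(text: string): FontRun[] {
  const runs: FontRun[] = [];
  for (const ch of Array.from(text)) {
    const cp = ch.codePointAt(0)!;
    const last = runs[runs.length - 1];
    const isJoiner = cp === ZWJ || cp === VS15 || cp === VS16;
    const font: FontRun['font'] =
      isJoiner && last ? last.font : isEmojiCodepoint(cp) ? 'emoji' : 'base';
    if (last && last.font === font) last.text += ch;
    else runs.push({ font, text: ch });
  }
  return runs;
}

const sample = 'Hi ' + String.fromCodePoint(0x1f600) + '!';
console.log(splitEmojiRuns(sample).map((r) => r.font));
```

Keeping joiners and variation selectors with the preceding run means a ZWJ sequence is handed to a single font instead of being broken up mid-sequence.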
- fonts/noto-sans-data.{js,d.ts}: Noto Sans VF subsetted, 4515 glyphs, 3094 cmap entries. OFL-1.1.
- fonts/noto-emoji-data.{js,d.ts}: Noto Emoji monochrome, 1891 glyphs, 1489 cmap entries. OFL-1.1.
- BN and recursed. Nested isolates supported. Unmatched isolates fall through gracefully.
- lastBaseGid tracking through the shaping pipeline including lam-alef ligatures.
- src/shaping/gsub-driver.ts exporting tryLigature(gids, ligatures) and src/shaping/gpos-positioner.ts exporting getBaseAnchor, getMarkAnchor, getMark2MarkAnchor, positionMarkOnBase. Bengali / Tamil / Devanagari / Arabic shapers refactored to use them.
- EMOJI_RANGES, isEmojiCodepoint, containsEmoji, FITZPATRICK_START/END, ZWJ, VS15, VS16 exported from src/shaping/script-registry.ts. detectCharLang() returns 'emoji' for emoji codepoints; detectFallbackLangs() adds 'emoji' automatically.
- npm run download:fonts.
- tests/shaping/phase2-shaping.test.ts (24 tests, GSUB driver + GPOS positioner + BiDi isolates + Arabic GPOS), tests/shaping/emoji.test.ts (15 tests, ranges + predicates + script-detect integration + baked module shape), tests/fonts/pdfa-latin-embedding.test.ts (PDF/A Latin embedding integration).
Changed
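To make the isolate-aware dispatch in resolveBidiRuns() more concrete, here is a minimal sketch of locating outermost isolate pairs, following the matching rule in UAX #9 (an initiator pairs with the PDI that returns the nesting depth to zero). The function name findOutermostIsolatePairs is hypothetical; this is not the library's implementation, only an illustration of the first step before each pair's interior is resolved recursively.

```typescript
// Isolate formatting characters defined by UAX #9.
const LRI = 0x2066;
const RLI = 0x2067;
const FSI = 0x2068;
const PDI = 0x2069;

function isIsolateInitiator(cp: number): boolean {
  return cp === LRI || cp === RLI || cp === FSI;
}

// Hypothetical helper: returns [start, end] index pairs of the outermost
// matched isolates. Nested pairs are left for a later recursive pass.
function findOutermostIsolatePairs(cps: readonly number[]): Array<[number, number]> {
  const pairs: Array<[number, number]> = [];
  let i = 0;
  while (i < cps.length) {
    if (!isIsolateInitiator(cps[i])) {
      i++;
      continue;
    }
    let depth = 1;
    let close = -1;
    for (let j = i + 1; j < cps.length; j++) {
      if (isIsolateInitiator(cps[j])) {
        depth++;
      } else if (cps[j] === PDI && --depth === 0) {
        close = j;
        break;
      }
    }
    if (close === -1) {
      i++; // unmatched initiator: skip it and keep scanning
      continue;
    }
    pairs.push([i, close]);
    i = close + 1; // jump past the whole isolate
  }
  return pairs;
}

// 'a' RLI 'b' LRI 'c' PDI PDI 'd': one outermost pair spanning indices 1..6.
const demo = [0x61, RLI, 0x62, LRI, 0x63, PDI, PDI, 0x64];
console.log(findOutermostIsolatePairs(demo));
```

Input without any isolate characters yields an empty list, which is consistent with the byte-identical behaviour noted for pre-v1.1.0 fixtures.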
- resolveBidiRuns() rewritten as a recursive isolate-aware dispatcher. Behaviour unchanged for inputs without isolate characters: output is byte-identical for all pre-v1.1.0 fixtures.
- fixPunctuationAffinity and fixBracketPairing widened to readonly number[] to match the new core pipeline. No public API impact.
- tryLigature definitions removed. Shapers now declare a thin tryLig(gids) closure that forwards to the shared driver. Output bytes unchanged.
Documentation
- .github/copilot-instructions.md updated to reflect new modules, emoji support, and PDF/A Latin embedding.
Deferred to v1.2.0
- buildDocumentPDFStreamPageByPage()).
Upgrade
Opt into the new font modules as needed:
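A minimal sketch of the opt-in pattern, using the two registration calls from the Highlights. The registerFont stand-in and the fontRegistry map are hypothetical and exist only so the snippet runs on its own; in a real project registerFont would come from the library, whose import path is not stated in these notes.

```typescript
// Hypothetical stand-in for the library's registerFont, so this runs alone.
type FontLoader = () => Promise<unknown>;
const fontRegistry = new Map<string, FontLoader>();

function registerFont(lang: string, loader: FontLoader): void {
  // Only the loader is stored; nothing is imported until it is called.
  fontRegistry.set(lang, loader);
}

// Module specifiers as given in the release notes.
const latinModule = 'pdfnative/fonts/noto-sans-data.js';
const emojiModule = 'pdfnative/fonts/noto-emoji-data.js';

// Register only the modules you need; each is loaded lazily.
registerFont('latin', () => import(latinModule));
registerFont('emoji', () => import(emojiModule));

console.log(Array.from(fontRegistry.keys()));
```

Because each loader is a function wrapping a dynamic import, registering a font costs nothing until the font data is actually requested.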
No code changes required for users who don't register 'latin' or 'emoji'; pre-existing PDFs are byte-identical.
Credits
This discussion was created from the release v1.1.0 — PDF/A Latin embedding, BiDi isolates, Arabic harakat, emoji.