v1.3.0 — COLRv1 colour emoji, USE-lite shaper integration, true constant-memory streaming, UAX #9 X4–X5 overrides, pixel-diff visual regression, #48 CP-1252 fix
Closes issue #48 (CP-1252
extended characters not extractable under base-14 Helvetica) and delivers the
complete v1.3.0 roadmap with zero deferrals: COLRv1 colour emoji,
USE-lite shaper integration, true constant-memory streaming, UAX #9 X4–X5
character-level overrides, and a dual-mode pixel-diff visual-regression suite.
100% backward-compatible — every new feature is additive or opt-in, and
pre-existing PDFs are byte-identical for unchanged code paths.
Still zero runtime dependencies. 71 test files / 1982 tests, all green.
Highlights
-
feat(shaping): Telugu script (te). A new pure-JS GSUB/GPOS
mini-shaper (src/shaping/telugu-shaper.ts) brings Telugu (~95 M speakers,
ISO 15924 Telu, U+0C00–U+0C7F) to pdfnative’s 16 existing scripts. It builds
virama-mediated conjunct clusters, forms subjoined-consonant ligatures via the
shared gsub-driver, and positions above/below vowel signs and modifiers via
the shared gpos-positioner, with no reph and no pre-base reordering
(Telugu specifics). Bundled font pdfnative/fonts/noto-telugu-data.js (Noto
Sans Telugu, OFL-1.1). Real-font shaping of తెలుగు / నమస్తే / క్షి /
శ్రీ / జ్ఞ produces zero .notdef and correct conjuncts. Opt-in via
registerFont('te', () => import('pdfnative/fonts/noto-telugu-data.js')).
(src/shaping/telugu-shaper.ts)
-
feat(shaping): Five underserved scripts: Amharic/Ethiopic (am),
Sinhala (si), Tibetan (bo), Khmer (km), Myanmar (my). Four new
pure-JS mini-shapers, plus detection-only Ethiopic support, extend
pdfnative from 17 to 22 Unicode scripts,
following the Telugu model (shared gsub-driver + gpos-positioner,
zero-dependency, pure functions). Ethiopic (U+1200–U+137F) is a syllabic
abugida needing no reordering, so it gets detection + font routing only.
Sinhala (src/shaping/sinhala-shaper.ts, U+0D80–U+0DFF) builds
virama conjuncts, reorders the pre-base kombuva (U+0DD9-class), and
decomposes two-part vowels. Tibetan
(src/shaping/tibetan-shaper.ts, U+0F00–U+0FFF) performs vertical
subjoined-consonant stacking. Khmer
(src/shaping/khmer-shaper.ts, U+1780–U+17FF) is USE-lite: coeng
subscripts, pre-base vowels, two-part vowel decomposition. Myanmar
(src/shaping/myanmar-shaper.ts, U+1000–U+109F) is USE-lite: medials,
pre-base medial-ra (U+103C) and e-vowel (U+1031), virama stacking. Khmer and
Myanmar are pragmatic USE-lite implementations with documented limitations
(two-part-vowel MultipleSubst is handled JS-side via shaper decomposition
tables, not the OpenType extractor). Bundled fonts (all OFL-1.1):
noto-ethiopic-data.js, noto-sinhala-data.js, noto-tibetan-data.js
(Noto Serif Tibetan), noto-khmer-data.js, noto-myanmar-data.js. Opt-in via
registerFont('am'|'si'|'bo'|'km'|'my', () => import('pdfnative/fonts/...')).
-
feat(core): Opt-in Unicode normalization (layout.normalize).
PdfLayoutOptions.normalize?: 'NFC'|'NFD'|'NFKC'|'NFKD'|false (default
false) applies native String.prototype.normalize to text before encoding,
so decomposed input (e.g. combining diacritics) composes to the form the
embedded font expects. Off by default → byte-identical output for existing
callers. (src/core/encoding-context.ts)
-
fix(crypto): CSPRNG-only randomness. PDF encryption now throws
if no cryptographically secure random source (crypto.getRandomValues) is
available, instead of silently falling back to Math.random. Encryption keys
and IVs are never derived from a non-CSPRNG source.
(src/core/pdf-encrypt.ts)
-
feat(core): Configurable document block limit (layout.maxBlocks).
The previously hard-coded 10 000-block safety cap in assembleDocumentParts()
is now configurable, and the default is raised to 100 000 (DEFAULT_MAX_BLOCKS).
Large reports (e.g. multi-thousand-page medical or financial documents) no
longer hit a spurious ceiling; callers can raise or lower it per document via
layout.maxBlocks. The over-limit error now names the active limit and how to
change it. (src/core/pdf-document.ts, src/core/pdf-layout.ts)
-
feat(parser): validatePdfUA(), a PDF/UA structural validator. A new
read-only, zero-byte-risk developer gate (ISO 14289-1) that parses a PDF and
checks /MarkInfo /Marked, /StructTreeRoot + /ParentTree, /Metadata,
/Lang, and per-page /MCID uniqueness. Returns
{ valid, errors, warnings }. Complements (does not replace) veraPDF.
(src/parser/pdf-ua-validator.ts)
-
fix(shaping, colour emoji): No more tofu from selectors/joiners.
Emoji variation selectors (VS-15/VS-16), the ZWJ/ZWNJ joiners, and Fitzpatrick
skin-tone modifiers that no registered font covers are now dropped during
run-splitting instead of resolving to .notdef (the box). Joiners are
still preserved when an Indic shaper font maps them. New isZeroWidthFormat()
predicate. (src/shaping/multi-font.ts, src/shaping/script-registry.ts)
-
fix(core, colour emoji): Computed Form /BBox (no clipping).
renderColorGlyph() now derives each colour-glyph Form /BBox from the
transformed contour bounds rather than the hard-coded em box, so colour emoji
that dip below the baseline are no longer clipped. (src/core/pdf-color-glyph.ts)
-
feat(fonts): COLRv1 colour emoji. Noto Color Emoji (OFL-1.1) is
bundleable as a curated subset (pdfnative/fonts/noto-color-emoji-data.js,
221 colour glyphs, 936 KB). COLR v0 solid layers and COLR v1 linear / radial
gradients render as native PDF Form XObjects (/ShadingType 2/3 +
/ExtGState constant-alpha), one indirect object per unique glyph,
forward-referenced into every page /XObject. The COLR / CPAL / glyf
parsers are self-written and zero-dependency. Opt-in:
registerFont('emoji', () => import('pdfnative/fonts/noto-color-emoji-data.js')).
When not registered, monochrome emoji (Noto Emoji, v1.1.0) is unchanged and
documents are byte-identical. Sweep gradients + Porter-Duff compositing fall
back gracefully to monochrome (documented limitation).
(src/core/color-emoji.ts, src/fonts/colr-parser.ts, src/fonts/glyf-outline.ts)
-
feat(core): True constant-memory streaming. buildPDFStreamTrue()
and buildDocumentPDFStreamTrue() assemble the PDF into its raw
object/framing parts and yield fixed-size Uint8Array chunks while freeing each
part as it is emitted, so the fully-joined PDF binary never materialises in
memory. Peak memory is bounded by the chunk size plus the single largest
part (a content stream or embedded font subset). Byte-identical output to
buildDocumentPDFBytes() / buildPDFBytes(). The v1.2.0 object-boundary
variants (*StreamPageByPage) and fixed-size variants (*Stream) are
retained. (src/core/pdf-stream-writer.ts)
-
feat(shaping): UAX #9 X4–X5 overrides. resolveBidiRuns() now
performs full character-level direction overrides inside LRO / RLO scopes:
every codepoint within the scope is forced to strong L (LRO) or strong R
(RLO) before the W/N/L rules run, not merely the base paragraph direction
(the v1.2.0 behaviour). Nested embeddings and isolates recurse correctly.
(src/shaping/bidi.ts)
-
feat(shaping): USE-lite shaper integration. The v1.2.0 cluster
classifier (classifyUseCategory()) is now the joiner-classification
authority across the Devanagari, Bengali, and Tamil shapers. Orphan ZWJ /
ZWNJ no longer reach the cmap as .notdef; ZWJ between a halant/pulli and
the next consonant continues a conjunct (half-form, eyelash-ra, ya-phalaa),
while ZWNJ breaks it, keeping a visible virama. (src/shaping/use-lite.ts)
-
test(visual): Dual-mode pixel-diff visual regression. Two
complementary guards over self-contained extreme-script fixtures (Tamil,
Bengali + Devanagari, Arabic) built with the real bundled fonts:
  - a glyph-position snapshot that extracts every show operator's font,
    size, baseline x/y, and glyph IDs into a committed JSON baseline; and
  - a rendered-glyph pixel diff that parses the embedded FontFile2
    outlines, scan-fills the shaped glyphs at their positions into a
    grayscale bitmap, and compares against a committed PNG baseline
    (≤1% pixel tolerance) using a self-written, zero-dependency grayscale PNG
    encoder/decoder.
A CI workflow (visual-regression.yml)
runs both, gated on src/shaping/**, src/fonts/**, src/core/**, and
fonts/**. (tests/visual/)
-
fix(fonts, #48): CP-1252 extended characters. Base-14 Helvetica text now carries a
/ToUnicode CMap, so the Windows-1252 high range (€ ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ
Ž ‘ ’ “ ” • – — ˜ ™ š › œ ž Ÿ) is correctly extractable and searchable in any
viewer. When a latin font is registered (Noto Sans VF), these glyphs
additionally embed and render so the Euro sign et al. are visible, not
viewer-tofu. Falls back to the correct WinAnsi byte (byte-stable) when no
latin font is registered. (src/fonts/encoding.ts)
-
fix(core, tagged PDF): Per-line MCID allocation in wrapped table cells
and multi-line table captions. Previously a single MCID was allocated per
cell/caption block and reused on every wrapped line, so a wrapped cell emitted
duplicate /MCID values inside one /TD, /TH, or /Caption, a PDF/UA
(ISO 14289-1 §7.10) structure violation flagged by downstream validators.
Each wrapped line now receives a distinct, monotonically increasing MCID.
Single-line cells, the legacy buildPDF() table path, headers, footers, and
the TOC were already single-MCID and are byte-identical.
(src/core/pdf-renderers.ts)
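
The per-line MCID fix described in the last item can be pictured with a minimal sketch. Everything here (McidAllocator, tagWrappedCell, hasUniqueMcids) is a hypothetical name chosen for illustration and is not taken from src/core/pdf-renderers.ts; the sketch only assumes that MCIDs are handed out from a per-page counter and that each wrapped line asks for its own.

```typescript
// Illustrative only: these names are assumptions, not pdfnative internals.
interface MarkedLine {
  text: string;
  mcid: number;
}

class McidAllocator {
  private next = 0;

  /** Returns a fresh, monotonically increasing MCID for the current page. */
  allocate(): number {
    return this.next++;
  }
}

/** Tags every wrapped line of one cell or caption with its own MCID. */
function tagWrappedCell(lines: string[], alloc: McidAllocator): MarkedLine[] {
  return lines.map((text) => ({ text, mcid: alloc.allocate() }));
}

/** True when no MCID occurs twice, the property the fix restores. */
function hasUniqueMcids(lines: MarkedLine[]): boolean {
  return new Set(lines.map((line) => line.mcid)).size === lines.length;
}

const alloc = new McidAllocator();
const cellA = tagWrappedCell(["Quarterly revenue", "(restated)"], alloc);
const cellB = tagWrappedCell(["42"], alloc);
```

Under the earlier behaviour both lines of cellA would have shared one MCID; in this sketch a single-line cell such as cellB still consumes exactly one.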
Added
-
API: validatePdfUA(bytes) => PdfUAValidationResult, a read-only PDF/UA
(ISO 14289-1) structural checker; PdfUAValidationResult exported from the
root.
-
API: layout.maxBlocks?: number on PdfLayoutOptions and the exported
DEFAULT_MAX_BLOCKS (100 000) constant.
-
API (shaping): shapeTeluguText, isTeluguCodepoint, containsTelugu,
TELUGU_START, TELUGU_END, and isZeroWidthFormat exported from the root.
-
API (shaping): shapeSinhalaText, shapeTibetanText, shapeKhmerText,
shapeMyanmarText, plus is{Ethiopic,Sinhala,Tibetan,Khmer,Myanmar}Codepoint,
contains{Ethiopic,Sinhala,Tibetan,Khmer,Myanmar}, and the corresponding
*_START / *_END range constants exported from the root.
-
API (core): layout.normalize?: 'NFC'|'NFD'|'NFKC'|'NFKD'|false on
PdfLayoutOptions (default false).
-
fonts: bundled pdfnative/fonts/noto-telugu-data.{js,d.ts} (Noto Sans
Telugu, OFL-1.1); scripts/download-fonts.ts gains the Noto Sans Telugu entry.
-
fonts: bundled
pdfnative/fonts/noto-{ethiopic,sinhala,tibetan,khmer,myanmar}-data.{js,d.ts}
(Noto Sans Ethiopic / Sinhala / Khmer / Myanmar + Noto Serif Tibetan, all
OFL-1.1); scripts/download-fonts.ts gains the five corresponding entries.
-
samples: scripts/generators/currency-symbols.ts (base-14 €£¥¢ +
embedded ₹₩₪₫₺₽₿ + Thai baht ฿ routed to the embedded Thai font +
multi-currency table) verifies the #48 Euro fix end to
end; alphabet-telugu script-coverage sample plus new Telugu document
(doc-telugu.pdf), shaping (shaping-telugu.pdf) and multi-script
font-subsetting coverage; the colour-emoji showcase now
emits a third real-world document (color-emoji-real.pdf); plus five new
script-coverage samples (alphabet-ethiopic, alphabet-sinhala,
alphabet-tibetan, alphabet-khmer, alphabet-myanmar). Each of the five
new scripts also gains a dedicated per-language document
(doc-sinhala.pdf, doc-tibetan.pdf, doc-khmer.pdf, doc-myanmar.pdf,
doc-amharic.pdf) at full parity with doc-telugu.pdf; the four shaper-backed
scripts add text-shaping deep-dives (shaping-sinhala.pdf,
shaping-tibetan.pdf, shaping-khmer.pdf, shaping-myanmar.pdf); and all
five appear in the multi-script font-subsetting and 22-script multi-language
showcases. npm run test:generate now produces 187 sample PDFs across 32
generators.
-
API: buildPDFStreamTrue(params, layoutOptions?, streamOptions?) and
buildDocumentPDFStreamTrue(params, layoutOptions?, streamOptions?), both
returning AsyncGenerator<Uint8Array>. They honour the existing
StreamOptions.chunkSize (default 65 536 bytes) and reject TOC blocks and
{pages} templates at the boundary (same constraints as the other streaming
entry points).
-
API: the curated colour-emoji font module
pdfnative/fonts/noto-color-emoji-data.js and its .d.ts. FontData gains
an optional colorGlyphs field; new public types CpalColor, ColorStop,
GradientExtend, SolidPaint, LinearGradientPaint, RadialGradientPaint,
ColorPaint, ColorLayer, ColorGlyph exported from the root.
-
tooling: scripts/build-color-emoji-data.ts converts the full
NotoColorEmoji-Regular.ttf into a curated-subset data module via the COLR /
CPAL parser and glyf subsetter (composite-aware, stable GIDs).
scripts/download-fonts.ts gains the Noto Color Emoji entry.
-
samples: scripts/generators/color-emoji-showcase.ts (three colour-emoji
PDFs: basic palette, mixed Latin+emoji, real-world status report) wired into
npm run test:generate. scripts/generators/use-lite-showcase.ts
renders the public classifyClusters() / classifyUseCategory() output for
Indic clusters; streaming-showcase.ts gains buildPDFStreamTrue() /
buildDocumentPDFStreamTrue() demos; bidi-embeddings-showcase.ts documents
the X4–X5 LRO/RLO overrides.
-
docs: version numbers outside the live npm badges are no longer hard-coded
(single source of truth: docs/assets/versions.js + [data-pn-badge]); the
nav logo no longer overflows the brand link (dedicated 1024 px breakpoint);
the extreme-scripts playground gains UAX #9 embeddings and COLRv1
colour-emoji presets; the medical playground is recalibrated to ~3.875
pages/patient and adds 5 000- and 10 000-page stress options. A new
all-scripts playground (docs/playgrounds/all-scripts.html) generates a
single PDF containing all 22 Unicode scripts plus native COLRv1 colour emoji
in the browser, showcasing automatic per-code-point font routing, BiDi,
GSUB/GPOS shaping, and subsetting.
-
tests: tests/visual/ holds the fixtures, content/font extractor, glyf
rasteriser, grayscale PNG codec, and the two visual-regression test files;
baselines committed under tests/visual/baselines/.
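
Of the additions listed above, layout.normalize is the easiest to illustrate in isolation, because the notes say it applies the native String.prototype.normalize before encoding. The sketch below shows only that string-level effect; applyNormalize and NormalizeForm are hypothetical names for illustration and are not part of the pdfnative API.

```typescript
// Illustrative only: applyNormalize is an assumed helper, not a pdfnative export.
type NormalizeForm = "NFC" | "NFD" | "NFKC" | "NFKD" | false;

/** Returns the text unchanged when normalization is off (the default). */
function applyNormalize(text: string, form: NormalizeForm = false): string {
  return form === false ? text : text.normalize(form);
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT: two code units.
const decomposed = "e\u0301";

// NFC composes the pair into the single precomposed code point U+00E9.
const composed = applyNormalize(decomposed, "NFC");

// With the default (false) the input passes through untouched.
const untouched = applyNormalize(decomposed);

// NFKC additionally folds compatibility characters such as the "fi" ligature.
const folded = applyNormalize("\ufb01", "NFKC");
```

Passing the text through untouched by default is what keeps output byte-identical for callers who never set the option.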
Changed
- package.json: added "./fonts/*" and "./package.json" to exports
(the documented import('pdfnative/fonts/...') subpaths were previously not
resolvable under Node's ESM exports map). Version bumped to 1.3.0. Still
zero runtime dependencies.
- refactor(core): buildPDF() and buildDocumentPDF() now delegate to
new internal assembleTableParts() / assembleDocumentParts() helpers that
return the raw string[] parts; the public builders simply .join('') the
result. Byte-identical output; enables true streaming without a second
assembly path.
- test: 71 test files / 1982 tests, all green. New coverage: colour-emoji
integration + module-shape, true-streaming byte-parity + constraints,
Indic ZWJ/ZWNJ/eyelash/ya-phalaa edge cases, UAX #9 X4–X5 overrides,
per-line MCID uniqueness in wrapped table cells/captions, the
visual-regression suite, the Telugu mini-shaper, validatePdfUA, the
configurable maxBlocks limit, and the colour-emoji selector/joiner-drop and
computed-BBox fixes.
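
The refactor above (builders join a string[] of parts, streaming emits the same parts in fixed-size chunks) can be sketched as follows. This is a simplified, synchronous illustration under stated assumptions: toBytes and streamParts are hypothetical names, the parts are treated as one-byte-per-character strings, and the real entry points are async generators whose internals are not shown in these notes.

```typescript
// Illustrative only: toBytes and streamParts are assumed names, not pdfnative APIs.

/** Converts a binary (one byte per character) string to bytes. */
function toBytes(text: string): Uint8Array {
  const out = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) out[i] = text.charCodeAt(i) & 0xff;
  return out;
}

/** Yields fixed-size chunks from parts without ever joining them. */
function* streamParts(parts: Iterable<string>, chunkSize = 65536): Generator<Uint8Array> {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  let chunk = new Uint8Array(chunkSize);
  let used = 0;
  for (const part of parts) {
    // Only the current part and the current chunk are held at any time.
    for (let j = 0; j < part.length; j++) {
      chunk[used++] = part.charCodeAt(j) & 0xff;
      if (used === chunkSize) {
        yield chunk;
        chunk = new Uint8Array(chunkSize);
        used = 0;
      }
    }
  }
  if (used > 0) yield chunk.subarray(0, used); // final, possibly short chunk
}

const demoParts = ["%PDF-1.7\n", "1 0 obj\n<<>>\nendobj\n", "%%EOF\n"];
const joinedBytes = toBytes(demoParts.join(""));
const demoChunks = Array.from(streamParts(demoParts, 8));
```

Concatenating demoChunks gives the same bytes as joinedBytes, which is the byte-parity property the new tests are described as covering.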
Downstream integration notes
- New public APIs: buildPDFStreamTrue, buildDocumentPDFStreamTrue, the
colour-emoji FontData.colorGlyphs field + colour paint types, the
pdfnative/fonts/noto-color-emoji-data.js subpath, validatePdfUA (+
PdfUAValidationResult), layout.maxBlocks / DEFAULT_MAX_BLOCKS, the Telugu
shaper surface (shapeTeluguText, isTeluguCodepoint, containsTelugu,
TELUGU_START / TELUGU_END), isZeroWidthFormat, and the
pdfnative/fonts/noto-telugu-data.js subpath. No APIs were removed or
changed in a breaking way.
- pdfnative-mcp and pdfnative-cli reach 1.0.0 alongside this
release; both pin pdfnative@^1.3.0. Colour emoji is opt-in in both via the
existing font-registration surface.
- Behaviour shifts: none for existing code paths. Colour emoji only
activates when an emoji font with colorGlyphs is registered; the
/ToUnicode addition for base-14 fonts is additive (improves extraction,
does not change rendered glyphs).
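
To make the /ToUnicode note concrete, here is a rough sketch of the kind of byte-to-Unicode table such a CMap carries for the Windows-1252 high range. The mapping values are the standard CP-1252 assignments; CP1252_HIGH, toHex, and buildBfcharBlock are hypothetical names, and the bfchar layout shown is generic CMap syntax rather than the exact output of src/fonts/encoding.ts.

```typescript
// Illustrative only: names and output layout are assumptions, not pdfnative internals.

/** CP-1252 bytes 0x80-0x9F that have a Unicode assignment (27 entries). */
const CP1252_HIGH: ReadonlyArray<readonly [number, number]> = [
  [0x80, 0x20ac], [0x82, 0x201a], [0x83, 0x0192], [0x84, 0x201e],
  [0x85, 0x2026], [0x86, 0x2020], [0x87, 0x2021], [0x88, 0x02c6],
  [0x89, 0x2030], [0x8a, 0x0160], [0x8b, 0x2039], [0x8c, 0x0152],
  [0x8e, 0x017d], [0x91, 0x2018], [0x92, 0x2019], [0x93, 0x201c],
  [0x94, 0x201d], [0x95, 0x2022], [0x96, 0x2013], [0x97, 0x2014],
  [0x98, 0x02dc], [0x99, 0x2122], [0x9a, 0x0161], [0x9b, 0x203a],
  [0x9c, 0x0153], [0x9e, 0x017e], [0x9f, 0x0178],
];

/** Upper-case hex, left-padded with zeros to the given width (up to 4). */
function toHex(value: number, width: number): string {
  return ("0000" + value.toString(16).toUpperCase()).slice(-width);
}

/** Renders the pairs as one beginbfchar/endbfchar section of a CMap. */
function buildBfcharBlock(pairs: ReadonlyArray<readonly [number, number]>): string {
  const rows = pairs.map(([byte, codePoint]) => `<${toHex(byte, 2)}> <${toHex(codePoint, 4)}>`);
  return [`${pairs.length} beginbfchar`, ...rows, "endbfchar"].join("\n");
}

const bfcharBlock = buildBfcharBlock(CP1252_HIGH);
```

A table like this only affects what text extraction reports for each byte, which is why the change is described as additive and not as a rendering change.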