Release v1.3.0 — COLRv1 colour emoji, USE-lite shaper integration, true constant-memory streaming, UAX #9 X4–X5 overrides, pixel-diff visual regression, #48 CP-1252 fix · Nizoka/pdfnative

Closes issue #48 (CP-1252
extended characters not extractable under base-14 Helvetica) and delivers the
complete v1.3.0 roadmap with zero deferrals: COLRv1 colour emoji,
USE-lite shaper integration, true constant-memory streaming, UAX #9 X4–X5
character-level overrides, and a dual-mode pixel-diff visual-regression suite.
100% backward-compatible — every new feature is additive or opt-in, and
pre-existing PDFs are byte-identical for unchanged code paths.

Still zero runtime dependencies. 71 test files / 1982 tests, all green.

Highlights

feat(shaping): Telugu script (te). A new pure-JS GSUB/GPOS
mini-shaper (src/shaping/telugu-shaper.ts) adds Telugu (~95 M speakers,
ISO 15924 Telu, U+0C00–U+0C7F) to pdfnative’s 16 existing scripts. It builds
virama-mediated conjunct clusters, forms subjoined-consonant ligatures via the
shared gsub-driver, and positions above/below vowel signs and modifiers via
the shared gpos-positioner. Per Telugu’s shaping rules there is no reph and
no pre-base reordering. Bundled font: pdfnative/fonts/noto-telugu-data.js (Noto
Sans Telugu, OFL-1.1). Real-font shaping of తెలుగు / నమస్తే / క్షి /
శ్రీ / జ్ఞ produces zero .notdef and correct conjuncts. Opt-in via
registerFont('te', () => import('pdfnative/fonts/noto-telugu-data.js')).
(src/shaping/telugu-shaper.ts)
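The cluster-building step described above can be sketched as follows. Marks always attach to the preceding cluster, and a virama followed by a consonant keeps the cluster open so the whole conjunct reaches GSUB together. The function names and the simplified code-point ranges are assumptions for illustration, not the contents of telugu-shaper.ts, which then applies the font's GSUB/GPOS lookups to each cluster.

```typescript
const TELUGU_VIRAMA = 0x0c4d;

// Consonants U+0C15–U+0C39 plus the historic letters U+0C58–U+0C5A.
function isTeluguConsonant(cp: number): boolean {
  return (cp >= 0x0c15 && cp <= 0x0c39) || (cp >= 0x0c58 && cp <= 0x0c5a);
}

// Combining marks: candrabindu/anusvara/visarga, nukta, vowel signs,
// virama, length marks, and the vocalic-l vowel signs.
function isTeluguMark(cp: number): boolean {
  return (
    (cp >= 0x0c00 && cp <= 0x0c04) ||
    cp === 0x0c3c ||
    (cp >= 0x0c3e && cp <= 0x0c56) ||
    cp === 0x0c62 ||
    cp === 0x0c63
  );
}

// Split text into clusters: a new cluster starts unless the code point is a
// mark, or a consonant directly following a virama (conjunct continuation).
function teluguClusters(text: string): string[] {
  const clusters: string[] = [];
  let current = "";
  let prev = -1;
  for (const ch of text) {
    const cp = ch.codePointAt(0)!;
    const attaches =
      current !== "" &&
      (isTeluguMark(cp) || (prev === TELUGU_VIRAMA && isTeluguConsonant(cp)));
    if (!attaches && current !== "") {
      clusters.push(current);
      current = "";
    }
    current += ch;
    prev = cp;
  }
  if (current !== "") clusters.push(current);
  return clusters;
}
```

For example, నమస్తే splits into three clusters, the last being the conjunct స్తే.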
feat(shaping): Five underserved scripts — Amharic/Ethiopic (am),
Sinhala (si), Tibetan (bo), Khmer (km), Myanmar (my). Five new
pure-JS mini-shapers extend pdfnative from 17 to 22 Unicode scripts,
following the Telugu model (shared gsub-driver + gpos-positioner,
zero-dependency, pure functions). Ethiopic (U+1200–U+137F) is a syllabic
abugida that needs no reordering, so it gets detection and font routing only.
Sinhala (src/shaping/sinhala-shaper.ts, U+0D80–U+0DFF) builds
virama conjuncts, reorders the pre-base kombuva (U+0DD9-class), and
decomposes two-part vowels. Tibetan
(src/shaping/tibetan-shaper.ts, U+0F00–U+0FFF) performs vertical
subjoined-consonant stacking. Khmer
(src/shaping/khmer-shaper.ts, U+1780–U+17FF) is USE-lite — coeng
subscripts, pre-base vowels, two-part vowel decomposition. Myanmar
(src/shaping/myanmar-shaper.ts, U+1000–U+109F) is USE-lite — medials,
pre-base medial-ra (U+103C) and e-vowel (U+1031), virama stacking. Khmer and
Myanmar are pragmatic USE-lite implementations with documented limitations
(two-part-vowel MultipleSubst is handled JS-side via shaper decomposition
tables, not the OpenType extractor). Bundled fonts (all OFL-1.1):
noto-ethiopic-data.js, noto-sinhala-data.js, noto-tibetan-data.js
(Noto Serif Tibetan), noto-khmer-data.js, noto-myanmar-data.js. Opt-in via
registerFont('am'|'si'|'bo'|'km'|'my', () => import('pdfnative/fonts/...')).
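Ethiopic needs only the first half of this pipeline: detect the script, then route the run to its registered font. A minimal sketch of that detection and routing, using the block ranges quoted in these notes. scriptOf and splitByScript are hypothetical helpers, not the exported is*Codepoint / contains* functions, and a production splitter would also keep spaces, digits and joiners inside the surrounding run.

```typescript
type ScriptTag = "te" | "si" | "bo" | "my" | "am" | "km";

// Block ranges as listed in the release notes (hypothetical table).
const SCRIPT_RANGES: ReadonlyArray<readonly [ScriptTag, number, number]> = [
  ["te", 0x0c00, 0x0c7f],
  ["si", 0x0d80, 0x0dff],
  ["bo", 0x0f00, 0x0fff],
  ["my", 0x1000, 0x109f],
  ["am", 0x1200, 0x137f],
  ["km", 0x1780, 0x17ff],
];

function scriptOf(cp: number): ScriptTag | null {
  for (const [tag, start, end] of SCRIPT_RANGES) {
    if (cp >= start && cp <= end) return tag;
  }
  return null; // not one of these scripts: leave to the default font chain
}

interface ScriptRun {
  script: ScriptTag | null;
  text: string;
}

// Group consecutive code points of the same script so each run can be
// handed to the font registered under that tag.
function splitByScript(text: string): ScriptRun[] {
  const runs: ScriptRun[] = [];
  for (const ch of text) {
    const script = scriptOf(ch.codePointAt(0)!);
    const last = runs[runs.length - 1];
    if (last && last.script === script) last.text += ch;
    else runs.push({ script, text: ch });
  }
  return runs;
}
```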
feat(core): Opt-in Unicode normalization (layout.normalize).
PdfLayoutOptions.normalize?: 'NFC'|'NFD'|'NFKC'|'NFKD'|false (default
false) applies native String.prototype.normalize to text before encoding,
so decomposed input (e.g. combining diacritics) composes to the form the
embedded font expects. Off by default → byte-identical output for existing
callers. (src/core/encoding-context.ts)
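In plain JavaScript terms the option amounts to an optional call to String.prototype.normalize. A minimal sketch (prepareText is a hypothetical name) showing why the default of false leaves text, and therefore output bytes, untouched, while 'NFC' composes a base letter plus a combining accent:

```typescript
type NormalizeOption = "NFC" | "NFD" | "NFKC" | "NFKD" | false;

// Normalize only when asked; false (the default) returns the input as-is.
function prepareText(text: string, normalize: NormalizeOption = false): string {
  return normalize === false ? text : text.normalize(normalize);
}

// "e" + COMBINING ACUTE ACCENT (two code units) composes to U+00E9 under NFC.
const decomposed = "e\u0301";
console.log(prepareText(decomposed, "NFC") === "\u00e9"); // true
console.log(prepareText(decomposed).length); // 2 (unchanged by default)
```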
fix(crypto): CSPRNG-only randomness. PDF encryption now throws
if no cryptographically secure random source (crypto.getRandomValues) is
available, instead of silently falling back to Math.random. Encryption keys
and IVs are never derived from a non-CSPRNG source.
(src/core/pdf-encrypt.ts)
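A sketch of the CSPRNG-only policy, assuming Web Crypto's getRandomValues is the only acceptable source. The helper names are hypothetical. The slicing loop exists because getRandomValues rejects requests larger than 65 536 bytes.

```typescript
interface RandomSource {
  getRandomValues(array: Uint8Array): Uint8Array;
}

// Look up Web Crypto without assuming DOM typings are present.
function findCsprng(): RandomSource | null {
  const c = (globalThis as unknown as { crypto?: Partial<RandomSource> }).crypto;
  return typeof c?.getRandomValues === "function" ? (c as RandomSource) : null;
}

// Fill `length` bytes from a CSPRNG or throw. There is deliberately no
// Math.random fallback, so weak randomness can never reach keys or IVs.
function secureRandomBytes(
  length: number,
  source: RandomSource | null = findCsprng(),
): Uint8Array {
  if (source === null) {
    throw new Error("No cryptographically secure random source (crypto.getRandomValues) available");
  }
  const out = new Uint8Array(length);
  for (let offset = 0; offset < length; offset += 65536) {
    // Each call fills at most 65 536 bytes of the output in place.
    source.getRandomValues(out.subarray(offset, Math.min(offset + 65536, length)));
  }
  return out;
}
```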
feat(core): Configurable document block limit (layout.maxBlocks).
The previously hard-coded 10 000-block safety cap in assembleDocumentParts()
is now configurable and the default raised to 100 000 (DEFAULT_MAX_BLOCKS).
Large reports (e.g. multi-thousand-page medical or financial documents) no
longer hit a spurious ceiling; callers can raise or lower it per document via
layout.maxBlocks. The over-limit error now names the active limit and how to
change it. (src/core/pdf-document.ts, src/core/pdf-layout.ts)
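A hypothetical guard illustrating the behaviour described above: a configurable limit with a 100 000 default and an error that names both the active limit and the option that changes it. The constant mirrors DEFAULT_MAX_BLOCKS but is defined locally; the exact comparison and message pdfnative uses may differ.

```typescript
// Local mirror of the documented default (pdfnative exports its own constant).
const DEFAULT_MAX_BLOCKS = 100_000;

// Reject documents whose block count exceeds the active limit.
function checkBlockLimit(blockCount: number, maxBlocks: number = DEFAULT_MAX_BLOCKS): void {
  if (!Number.isInteger(maxBlocks) || maxBlocks < 1) {
    throw new RangeError(`maxBlocks must be a positive integer, got ${maxBlocks}`);
  }
  if (blockCount > maxBlocks) {
    throw new Error(
      `Document has ${blockCount} blocks, exceeding the limit of ${maxBlocks}; ` +
        `raise it with layout.maxBlocks.`,
    );
  }
}
```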
feat(parser): validatePdfUA() — PDF/UA structural validator. A new
read-only developer gate (ISO 14289-1) that never modifies the bytes it inspects; it parses a PDF and
checks /MarkInfo /Marked, /StructTreeRoot + /ParentTree, /Metadata,
/Lang, and per-page /MCID uniqueness. Returns
{ valid, errors, warnings }. Complements (does not replace) veraPDF.
(src/parser/pdf-ua-validator.ts)
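One of the listed checks, per-page /MCID uniqueness, can be approximated by scanning each page's decoded content stream. This is a rough, hypothetical sketch that returns the same { valid, errors, warnings } shape. It ignores MCIDs carried in named property lists or Form XObjects, which a real validator must follow.

```typescript
interface McidCheckResult {
  valid: boolean;
  errors: string[];
  warnings: string[];
}

// MCIDs only need to be unique within one page, so track them per page.
function checkMcidUniqueness(pageContents: string[]): McidCheckResult {
  const errors: string[] = [];
  pageContents.forEach((content, pageIndex) => {
    const seen = new Set<number>();
    const reported = new Set<number>();
    for (const match of content.matchAll(/\/MCID\s+(\d+)/g)) {
      const id = Number(match[1]);
      if (seen.has(id) && !reported.has(id)) {
        errors.push(`page ${pageIndex + 1}: duplicate /MCID ${id}`);
        reported.add(id); // report each duplicate once per page
      }
      seen.add(id);
    }
  });
  return { valid: errors.length === 0, errors, warnings: [] };
}
```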
fix(shaping, colour emoji): No more tofu from selectors/joiners.
Emoji variation selectors (VS-15/VS-16), ZWJ/ZWNJ, and Fitzpatrick
skin-tone modifiers that no registered font covers are now dropped during
run-splitting instead of resolving to .notdef (the tofu box). Joiners are
still preserved when an Indic shaper font maps them. New isZeroWidthFormat()
predicate. (src/shaping/multi-font.ts, src/shaping/script-registry.ts)
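A sketch of the drop-during-run-splitting rule, assuming the droppable set is exactly the code points named above (VS-15/VS-16, ZWNJ/ZWJ, and the Fitzpatrick modifiers U+1F3FB–U+1F3FF). isDroppableFormat and dropUncoveredFormats are hypothetical; pdfnative's isZeroWidthFormat() may cover a different set.

```typescript
// Hypothetical predicate for the selector/joiner/modifier set named above.
function isDroppableFormat(cp: number): boolean {
  return (
    cp === 0xfe0e || cp === 0xfe0f || // VS-15, VS-16
    cp === 0x200c || cp === 0x200d || // ZWNJ, ZWJ
    (cp >= 0x1f3fb && cp <= 0x1f3ff) // Fitzpatrick skin-tone modifiers
  );
}

// Drop such code points only when no registered font covers them; every
// other code point (even an uncovered base character) is kept.
function dropUncoveredFormats(text: string, isCovered: (cp: number) => boolean): string {
  let out = "";
  for (const ch of text) {
    const cp = ch.codePointAt(0)!;
    if (isDroppableFormat(cp) && !isCovered(cp)) continue;
    out += ch;
  }
  return out;
}
```

An Indic font that maps ZWJ keeps the joiner in place, which matches the preservation rule above.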
fix(core, colour emoji): Computed Form /BBox (no clipping).
renderColorGlyph() now derives each colour-glyph Form /BBox from the
transformed contour bounds rather than the hard-coded em box, so colour emoji
that dip below the baseline are no longer clipped. (src/core/pdf-color-glyph.ts)
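Deriving the box from transformed outline points rather than the em square is a min/max over the transformed coordinates. A hypothetical sketch using the PDF matrix convention [a b c d e f]. Because a quadratic TrueType curve stays inside the convex hull of its on- and off-curve points, the box of all points is a safe (occasionally slightly loose) bound.

```typescript
type Point = [number, number];
// PDF affine matrix [a b c d e f]: x' = a*x + c*y + e, y' = b*x + d*y + f.
type Matrix = [number, number, number, number, number, number];

// Bounding box [minX minY maxX maxY] of all contour points after transformation.
function transformedBBox(
  points: Point[],
  [a, b, c, d, e, f]: Matrix,
): [number, number, number, number] {
  if (points.length === 0) return [0, 0, 0, 0];
  let minX = Infinity, minY = Infinity, maxX = -Infinity, maxY = -Infinity;
  for (const [x, y] of points) {
    const tx = a * x + c * y + e;
    const ty = b * x + d * y + f;
    minX = Math.min(minX, tx);
    maxX = Math.max(maxX, tx);
    minY = Math.min(minY, ty);
    maxY = Math.max(maxY, ty);
  }
  return [minX, minY, maxX, maxY];
}
```

A descender point at y = -200 yields a negative minY, so the Form /BBox extends below the baseline instead of clipping there.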
feat(fonts): COLRv1 colour emoji. Noto Color Emoji (OFL-1.1) is
bundleable as a curated subset (pdfnative/fonts/noto-color-emoji-data.js,
221 colour glyphs, 936 KB). COLR v0 solid layers and COLR v1 linear / radial
gradients render as native PDF Form XObjects (/Shading Type 2/3 +
/ExtGState constant-alpha), one indirect object per unique glyph,
forward-referenced into every page /XObject. The COLR / CPAL / glyf
parsers are self-written and zero-dependency. Opt-in:
registerFont('emoji', () => import('pdfnative/fonts/noto-color-emoji-data.js')).
When not registered, monochrome emoji (Noto Emoji, v1.1.0) is unchanged and
documents are byte-identical. Sweep gradients + Porter-Duff compositing fall
back gracefully to monochrome (documented limitation). (src/core/color-emoji.ts, src/fonts/colr-parser.ts, src/fonts/glyf-outline.ts)
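A sketch of how one COLRv1 linear gradient could map to a PDF axial shading, assuming two colour stops, RGB colours already scaled to 0..1 from CPAL's 8-bit values, and PAD extend (PDF /Extend has no equivalent for REPEAT or REFLECT). COLRv1 defines the gradient with three points: lines of constant colour run parallel to p0→p2, so the effective end point is p1 projected onto the line through p0 perpendicular to p0→p2. Function names are hypothetical; multi-stop gradients would need a stitching function (/FunctionType 3) instead.

```typescript
type Vec = { x: number; y: number };
type Rgb = [number, number, number]; // components in 0..1

// Project p1 onto the line through p0 perpendicular to p0→p2.
function effectiveEnd(p0: Vec, p1: Vec, p2: Vec): Vec {
  const nx = -(p2.y - p0.y);
  const ny = p2.x - p0.x;
  const len2 = nx * nx + ny * ny;
  if (len2 === 0) return p1; // degenerate input: this sketch falls back to p1
  const t = ((p1.x - p0.x) * nx + (p1.y - p0.y) * ny) / len2;
  return { x: p0.x + t * nx, y: p0.y + t * ny };
}

// Compact PDF number formatting (at most 4 decimals, no trailing zeros).
const num = (v: number): string => String(Number(v.toFixed(4)));

// Two-stop axial shading (/ShadingType 2) with an exponential
// interpolation function (/FunctionType 2, /N 1) between the colours.
function axialShading(p0: Vec, p1: Vec, p2: Vec, c0: Rgb, c1: Rgb): string {
  const end = effectiveEnd(p0, p1, p2);
  const rgb = (c: Rgb): string => `[${c.map(num).join(" ")}]`;
  return (
    `<< /ShadingType 2 /ColorSpace /DeviceRGB ` +
    `/Coords [${[p0.x, p0.y, end.x, end.y].map(num).join(" ")}] ` +
    `/Function << /FunctionType 2 /Domain [0 1] /C0 ${rgb(c0)} /C1 ${rgb(c1)} /N 1 >> ` +
    `/Extend [true true] >>`
  );
}
```

With p2 straight above p0, the constant-colour lines are vertical, so the gradient runs horizontally and p1 = (10, 5) collapses to (10, 0).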
feat(core): True constant-memory streaming. buildPDFStreamTrue()
and buildDocumentPDFStreamTrue() assemble the PDF into its raw object/
framing parts and yield fixed-size Uint8Array chunks while freeing each
part as it is emitted — the fully-joined PDF binary never materialises in
memory. Peak memory is bounded by the chunk size plus the single largest
part (a content stream or embedded font subset). Byte-identical output to
buildDocumentPDFBytes() / buildPDFBytes(). The v1.2.0 object-boundary
variants (*StreamPageByPage) and fixed-size variants (*Stream) are
retained. (src/core/pdf-stream-writer.ts)
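The core idea is to copy parts into a fixed-size buffer and release each part once it has been copied, which fits in a small async generator. This sketch assumes the parts are already Uint8Array values (pdfnative's assembly helpers return string parts) and uses a hypothetical name. Peak memory is one chunk plus the part currently being copied.

```typescript
// Stream pre-assembled byte parts as fixed-size chunks, dropping each
// part reference from the array once its bytes have been copied.
async function* streamParts(
  parts: Array<Uint8Array | null>,
  chunkSize = 65536,
): AsyncGenerator<Uint8Array> {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  let buf = new Uint8Array(chunkSize);
  let fill = 0;
  for (let i = 0; i < parts.length; i++) {
    const part = parts[i];
    if (part == null) continue;
    parts[i] = null; // release the array's reference to this part
    let offset = 0;
    while (offset < part.length) {
      const n = Math.min(chunkSize - fill, part.length - offset);
      buf.set(part.subarray(offset, offset + n), fill);
      fill += n;
      offset += n;
      if (fill === chunkSize) {
        yield buf;
        buf = new Uint8Array(chunkSize); // never mutate a chunk already handed out
        fill = 0;
      }
    }
  }
  if (fill > 0) yield buf.subarray(0, fill); // final, possibly short chunk
}
```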
feat(shaping): UAX #9 X4–X5 overrides. resolveBidiRuns() now
performs full character-level direction overrides inside LRO / RLO scopes —
every codepoint within the scope is forced to strong L (LRO) or strong R
(RLO) before the W/N/L rules run, not merely the base paragraph direction
(the v1.2.0 behaviour). Nested embeddings and isolates recurse correctly.
(src/shaping/bidi.ts)
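A reduced sketch of UAX #9 rules X2–X7 without the isolate rules. Every LRE/RLE/LRO/RLO pushes an entry recording its override status, PDF pops it, and each ordinary character takes the override direction whenever one is active. The coarse baseClass classifier and the choice to mark formatting characters as BN (rather than removing them) are illustrative assumptions; isolates, the depth limit and embedding levels are omitted.

```typescript
type BidiClass = "L" | "R" | "AL" | "EN" | "ON" | "BN";
type OverrideStatus = "neutral" | "L" | "R";

const LRE = 0x202a, RLE = 0x202b, PDF = 0x202c, LRO = 0x202d, RLO = 0x202e;

// Deliberately coarse classifier, for illustration only.
function baseClass(cp: number): BidiClass {
  if ((cp >= 0x41 && cp <= 0x5a) || (cp >= 0x61 && cp <= 0x7a)) return "L";
  if (cp >= 0x30 && cp <= 0x39) return "EN";
  if (cp >= 0x0590 && cp <= 0x05ff) return "R";
  if (cp >= 0x0600 && cp <= 0x06ff) return "AL";
  return "ON";
}

// Per-code-point classes with explicit embeddings and overrides applied.
function applyOverrides(text: string): BidiClass[] {
  const stack: OverrideStatus[] = ["neutral"]; // override status per embedding
  const out: BidiClass[] = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0)!;
    if (cp === LRE || cp === RLE || cp === LRO || cp === RLO) {
      // X2/X3 embeddings reset the override; X4/X5 overrides set it.
      stack.push(cp === LRO ? "L" : cp === RLO ? "R" : "neutral");
      out.push("BN");
    } else if (cp === PDF) {
      if (stack.length > 1) stack.pop(); // X7: an unmatched PDF is ignored
      out.push("BN");
    } else {
      const status = stack[stack.length - 1] ?? "neutral";
      // X6: an active override replaces the character's own class.
      out.push(status === "neutral" ? baseClass(cp) : status);
    }
  }
  return out;
}
```

For instance, the digit inside "a RLO b 1 PDF c" resolves to strong R rather than EN, and an LRE nested in an RLO scope restores the characters' own classes.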
feat(shaping): USE-lite shaper integration. The v1.2.0 cluster
classifier (classifyUseCategory()) is now the joiner-classification
authority across the Devanagari, Bengali, and Tamil shapers. Orphan ZWJ /
ZWNJ no longer reach the cmap as .notdef; ZWJ between a halant/pulli and
the next consonant continues a conjunct (half-form, eyelash-ra, ya-phalaa)
while ZWNJ breaks it, leaving a visible virama. (src/shaping/use-lite.ts)
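The joiner rule can be expressed as a small decision function. This is a hypothetical sketch with coarse consonant ranges for Devanagari, Bengali and Tamil, not the classifyUseCategory() logic itself. It does not model sequences in which the joiner comes before the virama.

```typescript
type JoinerEffect = "continue-conjunct" | "break-keep-virama" | "drop";

const ZWNJ = 0x200c;
const ZWJ = 0x200d;
// Devanagari virama, Bengali virama (hasanta), Tamil pulli.
const VIRAMAS = new Set([0x094d, 0x09cd, 0x0bcd]);

// Coarse consonant blocks for the three scripts (illustrative only).
function isConsonant(cp: number): boolean {
  return (
    (cp >= 0x0915 && cp <= 0x0939) ||
    (cp >= 0x0995 && cp <= 0x09b9) ||
    (cp >= 0x0b95 && cp <= 0x0bb9)
  );
}

// Decide what a ZWJ/ZWNJ does, given the code points on either side.
function joinerEffect(
  prev: number | undefined,
  joiner: number,
  next: number | undefined,
): JoinerEffect {
  if (joiner !== ZWJ && joiner !== ZWNJ) throw new RangeError("not a joiner");
  const betweenViramaAndConsonant =
    prev !== undefined && next !== undefined && VIRAMAS.has(prev) && isConsonant(next);
  if (!betweenViramaAndConsonant) return "drop"; // orphan: never sent to the cmap
  return joiner === ZWJ ? "continue-conjunct" : "break-keep-virama";
}
```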
test(visual): Dual-mode pixel-diff visual regression. Two
complementary guards over self-contained extreme-script fixtures (Tamil,
Bengali + Devanagari, Arabic) built with the real bundled fonts:
1. a glyph-position snapshot that extracts every show operator's font,
  size, baseline x/y, and glyph IDs into a committed JSON baseline; and
2. a rendered-glyph pixel diff that parses the embedded FontFile2
  outlines, scan-fills the shaped glyphs at their positions into a
  grayscale bitmap, and compares against a committed PNG baseline
  (≤1% pixel tolerance) using a self-written, zero-dependency grayscale PNG
  encoder/decoder.
A CI workflow (visual-regression.yml)
runs both, gated on src/shaping/**, src/fonts/**, src/core/**, and
fonts/**. (tests/visual/)
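The pass/fail rule of the pixel-diff guard reduces to counting differing pixels between two equally sized grayscale bitmaps. A hypothetical sketch, gated at the 1% threshold quoted above; the names and the per-pixel tolerance parameter are assumptions, not the suite's actual helpers.

```typescript
interface GrayImage {
  width: number;
  height: number;
  data: Uint8Array; // 8-bit grey levels, row-major
}

// Fraction of pixels whose grey levels differ by more than `levelTolerance`.
function diffRatio(a: GrayImage, b: GrayImage, levelTolerance = 0): number {
  if (a.width !== b.width || a.height !== b.height) return 1; // size mismatch fails outright
  const total = a.width * a.height;
  if (a.data.length !== total || b.data.length !== total) {
    throw new Error("buffer size does not match dimensions");
  }
  if (total === 0) return 0;
  let differing = 0;
  for (let i = 0; i < total; i++) {
    if (Math.abs(a.data[i] - b.data[i]) > levelTolerance) differing++;
  }
  return differing / total;
}

function matchesBaseline(actual: GrayImage, baseline: GrayImage, maxRatio = 0.01): boolean {
  return diffRatio(actual, baseline) <= maxRatio;
}
```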
fix(fonts, #48):
CP-1252 extended characters. Base-14 Helvetica text now carries a
/ToUnicode CMap, so the Windows-1252 high range (€ ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ
Ž ‘ ’ “ ” • – — ˜ ™ š › œ ž Ÿ) is correctly extractable and searchable in any
viewer. When a Latin font is registered (Noto Sans VF), these glyphs are
additionally embedded and rendered, so the Euro sign and its neighbours are
visible instead of showing as viewer tofu. When no Latin font is registered,
output falls back to the correct WinAnsi byte (byte-stable). (src/fonts/encoding.ts)
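For reference, the CP-1252 high block and a ToUnicode CMap that covers it can be sketched as below. The mapping table is the standard Windows-1252 assignment (0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined); the CMap layout and function name are a generic, hypothetical example rather than the output of encoding.ts.

```typescript
// CP-1252 bytes 0x80–0x9F that differ from ISO-8859-1 (27 defined slots).
const CP1252_HIGH: ReadonlyArray<readonly [number, number]> = [
  [0x80, 0x20ac], [0x82, 0x201a], [0x83, 0x0192], [0x84, 0x201e], [0x85, 0x2026],
  [0x86, 0x2020], [0x87, 0x2021], [0x88, 0x02c6], [0x89, 0x2030], [0x8a, 0x0160],
  [0x8b, 0x2039], [0x8c, 0x0152], [0x8e, 0x017d], [0x91, 0x2018], [0x92, 0x2019],
  [0x93, 0x201c], [0x94, 0x201d], [0x95, 0x2022], [0x96, 0x2013], [0x97, 0x2014],
  [0x98, 0x02dc], [0x99, 0x2122], [0x9a, 0x0161], [0x9b, 0x203a], [0x9c, 0x0153],
  [0x9e, 0x017e], [0x9f, 0x0178],
];

const hex = (n: number, width: number): string =>
  n.toString(16).toUpperCase().padStart(width, "0");

// ToUnicode CMap for a single-byte simple font: identity ranges for ASCII
// and Latin-1, explicit bfchar entries for the CP-1252 high block.
function winAnsiToUnicodeCMap(): string {
  const bfchar = CP1252_HIGH.map(([code, uni]) => `<${hex(code, 2)}> <${hex(uni, 4)}>`);
  return [
    "/CIDInit /ProcSet findresource begin",
    "12 dict begin",
    "begincmap",
    "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
    "/CMapName /Adobe-Identity-UCS def",
    "/CMapType 2 def",
    "1 begincodespacerange",
    "<00> <FF>",
    "endcodespacerange",
    "2 beginbfrange",
    "<20> <7E> <0020>",
    "<A0> <FF> <00A0>",
    "endbfrange",
    `${bfchar.length} beginbfchar`,
    ...bfchar,
    "endbfchar",
    "endcmap",
    "CMapName currentdict /CMap defineresource pop",
    "end",
    "end",
  ].join("\n");
}
```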
fix(core, tagged PDF): Per-line MCID allocation in wrapped table cells
and multi-line table captions. Previously a single MCID was allocated per
cell/caption block and reused on every wrapped line, so a wrapped cell emitted
duplicate /MCID values inside one /TD / /TH / /Caption — a PDF/UA
(ISO 14289-1 §7.10) structure violation flagged by downstream validators.
Each wrapped line now receives a distinct, monotonically increasing MCID.
Single-line cells, the legacy buildPDF() table path, headers, footers, and
the TOC were already single-MCID and are byte-identical.
(src/core/pdf-renderers.ts)
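A hypothetical sketch of the fix: a per-page counter hands out a fresh MCID for every wrapped line, so each line gets its own BDC/EMC pair. The font resource name /F1, the coordinates and the helper names are assumptions. A real renderer must also add every MCID to the cell's /TD structure element and to the page's /ParentTree entry.

```typescript
// Per-page MCID counter: MCIDs must be unique within one page.
class McidAllocator {
  private next = 0;
  allocate(): number {
    return this.next++;
  }
}

// Escape the characters that are special inside a PDF literal string.
function pdfString(s: string): string {
  return `(${s.replace(/[\\()]/g, (ch) => `\\${ch}`)})`;
}

// Emit each wrapped line of one table cell as its own marked-content
// sequence, drawing a fresh MCID per line instead of one per cell.
function emitWrappedCell(
  lines: string[],
  x: number,
  topY: number,
  leading: number,
  mcids: McidAllocator,
): { ops: string; mcids: number[] } {
  const ids: number[] = [];
  const ops = lines.map((line, i) => {
    const id = mcids.allocate();
    ids.push(id);
    return `/TD <</MCID ${id}>> BDC BT /F1 9 Tf ${x} ${topY - i * leading} Td ${pdfString(line)} Tj ET EMC`;
  });
  return { ops: ops.join("\n"), mcids: ids };
}
```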

Added

API: validatePdfUA(bytes) => PdfUAValidationResult — read-only PDF/UA
(ISO 14289-1) structural checker; PdfUAValidationResult exported from the
root.
API: layout.maxBlocks?: number on PdfLayoutOptions and the exported
DEFAULT_MAX_BLOCKS (100 000) constant.
API (shaping): shapeTeluguText, isTeluguCodepoint, containsTelugu,
TELUGU_START, TELUGU_END, and isZeroWidthFormat exported from the root.
API (shaping): shapeSinhalaText, shapeTibetanText, shapeKhmerText,
shapeMyanmarText, plus is{Ethiopic,Sinhala,Tibetan,Khmer,Myanmar}Codepoint,
contains{Ethiopic,Sinhala,Tibetan,Khmer,Myanmar}, and the corresponding
*_START / *_END range constants exported from the root.
API (core): layout.normalize?: 'NFC'|'NFD'|'NFKC'|'NFKD'|false on
PdfLayoutOptions (default false).
fonts: bundled pdfnative/fonts/noto-telugu-data.{js,d.ts} (Noto Sans
Telugu, OFL-1.1); scripts/download-fonts.ts gains the Noto Sans Telugu entry.
fonts: bundled pdfnative/fonts/noto-{ethiopic,sinhala,tibetan,khmer,myanmar}-data.{js,d.ts}
(Noto Sans Ethiopic / Sinhala / Khmer / Myanmar + Noto Serif Tibetan, all
OFL-1.1); scripts/download-fonts.ts gains the five corresponding entries.
samples: scripts/generators/currency-symbols.ts (base-14 €£¥¢ +
embedded ₹₩₪₫₺₽₿ + Thai baht ฿ routed to the embedded Thai font +
multi-currency table) verifies the #48 Euro fix end to
end; alphabet-telugu script-coverage sample plus new Telugu document
(doc-telugu.pdf), shaping (shaping-telugu.pdf) and multi-script
font-subsetting coverage; the colour-emoji showcase now
emits a third real-world document (color-emoji-real.pdf); plus five new
script-coverage samples (alphabet-ethiopic, alphabet-sinhala,
alphabet-tibetan, alphabet-khmer, alphabet-myanmar). Each of the five
new scripts also gains a dedicated per-language document
(doc-sinhala.pdf, doc-tibetan.pdf, doc-khmer.pdf, doc-myanmar.pdf,
doc-amharic.pdf) at full parity with doc-telugu.pdf; the four shaper-backed
scripts add text-shaping deep-dives (shaping-sinhala.pdf,
shaping-tibetan.pdf, shaping-khmer.pdf, shaping-myanmar.pdf); and all
five appear in the multi-script font-subsetting and 22-script multi-language
showcases. npm run test:generate now produces 187 sample PDFs across 32
generators.
API: buildPDFStreamTrue(params, layoutOptions?, streamOptions?) and
buildDocumentPDFStreamTrue(params, layoutOptions?, streamOptions?) —
AsyncGenerator<Uint8Array>. Honour the existing StreamOptions.chunkSize
(default 65 536 bytes). Reject TOC blocks and {pages} templates at the
boundary (same constraints as the other streaming entry points).
API: the curated colour-emoji font module
pdfnative/fonts/noto-color-emoji-data.js and its .d.ts. FontData gains
an optional colorGlyphs field; new public types CpalColor, ColorStop,
GradientExtend, SolidPaint, LinearGradientPaint, RadialGradientPaint,
ColorPaint, ColorLayer, ColorGlyph exported from the root.
tooling: scripts/build-color-emoji-data.ts — converts the full
NotoColorEmoji-Regular.ttf into a curated-subset data module via the COLR /
CPAL parser and glyf subsetter (composite-aware, stable GIDs).
scripts/download-fonts.ts gains the Noto Color Emoji entry.
samples: scripts/generators/color-emoji-showcase.ts (three colour-emoji
PDFs: basic palette, mixed Latin+emoji, real-world status report) wired into
npm run test:generate. scripts/generators/use-lite-showcase.ts
renders the public classifyClusters() / classifyUseCategory() output for
Indic clusters; streaming-showcase.ts gains buildPDFStreamTrue() /
buildDocumentPDFStreamTrue() demos; bidi-embeddings-showcase.ts documents
the X4–X5 LRO/RLO overrides.
docs: apart from the live npm badges, the docs are now version-agnostic
(single source of truth: docs/assets/versions.js + [data-pn-badge]); the
nav logo no longer overflows the brand link (dedicated 1024 px breakpoint);
the extreme-scripts playground gains UAX #9 embeddings and COLRv1
colour-emoji presets; the medical playground is recalibrated to ~3.875
pages/patient and adds 5 000- and 10 000-page stress options. A new
all-scripts playground (docs/playgrounds/all-scripts.html) generates a
single PDF containing all 22 Unicode scripts plus native COLRv1 colour emoji
in the browser, showcasing automatic per-code-point font routing, BiDi,
GSUB/GPOS shaping, and subsetting.
tests: tests/visual/ — fixtures, content/font extractor, glyf
rasteriser, grayscale PNG codec, and the two visual-regression test files;
baselines committed under tests/visual/baselines/.

Changed

package.json: added "./fonts/*" and "./package.json" to exports
(the documented import('pdfnative/fonts/...') subpaths were previously not
resolvable under Node's ESM exports map). Version bumped to 1.3.0. Still
zero runtime dependencies.
refactor(core): buildPDF() and buildDocumentPDF() now delegate to
new internal assembleTableParts() / assembleDocumentParts() helpers that
return the raw string[] parts; the public builders simply .join('') the
result. Byte-identical output; enables true streaming without a second
assembly path.
test: 71 test files / 1982 tests, all green. New coverage: colour-emoji
integration + module-shape, true-streaming byte-parity + constraints,
Indic ZWJ/ZWNJ/eyelash/ya-phalaa edge cases, UAX #9 X4–X5 overrides,
per-line MCID uniqueness in wrapped table cells/captions, the
visual-regression suite, the Telugu mini-shaper, validatePdfUA, the
configurable maxBlocks limit, and the colour-emoji selector/joiner-drop and
computed-BBox fixes.

Downstream integration notes

New public APIs: buildPDFStreamTrue, buildDocumentPDFStreamTrue, the
colour-emoji FontData.colorGlyphs field + colour paint types, the
pdfnative/fonts/noto-color-emoji-data.js subpath, validatePdfUA (+
PdfUAValidationResult), layout.maxBlocks / DEFAULT_MAX_BLOCKS,
layout.normalize, the Telugu shaper surface (shapeTeluguText,
isTeluguCodepoint, containsTelugu, TELUGU_START/TELUGU_END), the Sinhala,
Tibetan, Khmer and Myanmar shapers plus the Ethiopic/Sinhala/Tibetan/Khmer/
Myanmar detection helpers and range constants, isZeroWidthFormat, and the
pdfnative/fonts/noto-{telugu,ethiopic,sinhala,tibetan,khmer,myanmar}-data.js
subpaths. No APIs were removed or changed in a breaking way.
pdfnative-mcp and pdfnative-cli reach 1.0.0 alongside this
release; both pin pdfnative@^1.3.0. Colour emoji is opt-in in both via the
existing font-registration surface.
Behaviour shifts: none for existing code paths. Colour emoji only
activates when an emoji font with colorGlyphs is registered; the
/ToUnicode addition for base-14 fonts is additive (improves extraction,
does not change rendered glyphs).
