Releases: QrCommunication/gigapdf-lib
gigapdf-lib v0.72.0
Fidelity release focused on text extraction and AcroForm rendering on dense
government forms (CERFA). The public API is additive — existing behaviour is
unchanged except where noted as a fix.
Added
- FormField now surfaces its text-formatting metadata. Each field exposes
comb (the /Ff comb flag for fixed-pitch character cells), quadding (the
/Q justification: 0 left, 1 centre, 2 right), and the default-appearance font
and size parsed from the field's /DA string as daFont / daSize. This lets
a host reproduce a field's intended layout (combed cells, alignment, font
metrics) without re-parsing the appearance stream.
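
For readers unfamiliar with /DA strings, the sketch below shows how a font name and size can be pulled out of one (for example "/Helv 12 Tf 0 g") and how the /Q values map to alignments. It is an illustration only: parseDefaultAppearance, quaddingToAlign and the DefaultAppearance shape are hypothetical names, not the library's parser or types.

```typescript
// Hypothetical shape mirroring the daFont / daSize fields named in the note.
interface DefaultAppearance {
  daFont: string | null; // font resource name without the leading slash
  daSize: number | null; // size operand of Tf
}

// Extract the operands of the last `Tf` operator found in a /DA string.
function parseDefaultAppearance(da: string): DefaultAppearance {
  const re = /\/([^\s\/\[\]()<>{}%]+)\s+(-?\d*\.?\d+)\s+Tf\b/g;
  let font: string | null = null;
  let size: number | null = null;
  let m: RegExpExecArray | null;
  while ((m = re.exec(da)) !== null) {
    font = m[1] ?? null;
    size = Number(m[2]);
  }
  return { daFont: font, daSize: size };
}

// Map the /Q quadding value to an alignment keyword (0 left, 1 centre, 2 right).
function quaddingToAlign(q: number): "left" | "center" | "right" {
  return q === 1 ? "center" : q === 2 ? "right" : "left";
}
```

A host could feed the resulting font, size and alignment into its own text-input overlay so the editable field matches the form's intended look.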
Fixed
- Spurious in-word spaces during text extraction. A new gap-aware
runs_join helper now drives all four reconstruction paths (lines,
paragraphs, lists, tables): a word split across several font runs no longer
emits a phantom space at each run boundary (e.g. ENFANT S is reassembled as
ENFANTS). Spacing is decided from the real inter-run gap, not the mere fact
that the text changed font.
- Form-field appearances double-rendered behind the editable text. A
widget_appearances flag makes renderPageNoText / renderPageExcluding
omit the /AP appearance streams of AcroForm widgets, so a filled field's
baked-in value no longer shows through underneath the live, editable overlay.
- Borderless prose misdetected as a table. A line_has_gutter guard now
requires a real inter-cell gutter before promoting a borderless block to a
table: a two-run-per-line prose notice is kept as prose, while genuine tables
(wide column gutters) are still recognised.
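
The gap-aware joining idea in the first fix can be sketched as follows. This is not the engine's runs_join; the TextRun shape, the joinRuns name and the 0.2 gap ratio are assumptions chosen for illustration.

```typescript
// Hypothetical run shape: text plus its horizontal extent in page units.
interface TextRun {
  text: string;
  x: number;        // left edge
  width: number;    // advance width
  fontSize: number;
}

// Join the runs of one line. A space is inserted only when the measured gap
// between two runs exceeds a fraction of the font size (assumed heuristic),
// never merely because the font changed.
function joinRuns(runs: TextRun[], gapRatio = 0.2): string {
  let out = "";
  let prev: TextRun | null = null;
  for (const run of runs) {
    if (prev !== null) {
      const gap = run.x - (prev.x + prev.width);
      const threshold = gapRatio * Math.min(prev.fontSize, run.fontSize);
      const alreadySpaced = out.endsWith(" ") || run.text.startsWith(" ");
      if (gap > threshold && !alreadySpaced) out += " ";
    }
    out += run.text;
    prev = run;
  }
  return out;
}
```

With this rule, two runs that touch (such as "ENFANT" and "S" in different fonts) are concatenated, while runs separated by a visible gap still get a space.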
gigapdf-lib v0.71.1
Documentation-only patch. No code changes — the WASM blob is byte-for-byte
identical to 0.71.0.
Documentation
- Complete overhaul of the SDK documentation for 0.71: API reference (signature
matrix for B / B-T / LTV signing, full ~263-method surface, removal of the
phantom OCR methods doc.ocr / ocrText / extractText), USAGE guide (the
four signing-signature levels + the host-fetch two-phase model + an SSRF note),
COOKBOOK (added signTimestamped / signLtv recipes and an image-watermark
recipe), plus the README and sdk/README (npm). No behavioural change: the
WASM is identical to 0.71.0.
gigapdf-lib v0.71.0
Long-term validation release: PAdES-LTV builds on the B-T timestamped signatures
from 0.70 by embedding the validation material (certificate chain + revocation
responses) so a signature keeps verifying long after its certificates expire or
are revoked. The public API is additive — existing behaviour is unchanged.
Added
- PAdES-LTV (B-LT / B-LTA). New SDK GigaPdfDoc.signLtv() (async) produces a
long-term-validation signature: it first builds a B-T signature
(signTimestamped), then embeds a Document Security Store (/DSS with
/Certs, /OCSPs, /CRLs, and per-signature /VRI) carrying the revocation
material for the certificate chain (B-LT). With archiveTimestamp it also adds
a /DocTimeStamp document timestamp (ETSI.RFC3161 subfilter) over the whole
updated file for B-LTA, refreshing the long-term trust anchor. The engine
computes which OCSP/CRL endpoints to query from the certificates' AIA / CRL-DP
extensions; the host fetches them (the WASM core has no network stack, same
pure-data two-phase model as the TSA). OCSP requests follow RFC 6960; CRLs are
parsed as CertificateList. The exported defaultOcspPost and defaultCrlGet
perform the round trips via fetch, and the revocationFetch / crlFetch
hooks let the host add auth/proxy/retries and apply its own SSRF allow-list.
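
Because the OCSP/CRL URLs come from certificate extensions, a host may want to vet them before fetching. The sketch below shows one way to express an SSRF allow-list. The RevocationFetcher type is a hypothetical hook shape invented for this example; the actual signatures of revocationFetch and crlFetch are not given in these notes and may differ.

```typescript
// Return true only for http(s) URLs whose host is on the allow-list.
function isAllowedRevocationUrl(rawUrl: string, allowedHosts: string[]): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not an absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  const host = url.hostname.toLowerCase();
  return allowedHosts.some((h) => h.toLowerCase() === host);
}

// Hypothetical hook shape (url, request body) => response bytes.
// This is an assumption for illustration, not the SDK's signature.
type RevocationFetcher = (url: string, body: Uint8Array) => Promise<Uint8Array>;

// Wrap any fetcher so that URLs outside the allow-list are rejected up front.
function withAllowList(inner: RevocationFetcher, allowedHosts: string[]): RevocationFetcher {
  return async (url, body) => {
    if (!isAllowedRevocationUrl(url, allowedHosts)) {
      throw new Error(`blocked revocation URL: ${url}`);
    }
    return inner(url, body);
  };
}
```

A hostname check alone does not cover every SSRF vector (DNS rebinding, redirects), so a production host would likely combine it with network-level controls.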
Fixed
- B-T id-aa-timeStampToken now carries the bare TimeStampToken.
signFinishTimestamped / signTimestamped previously embedded the TSA's raw
TimeStampResp (SEQUENCE { PKIStatusInfo, TimeStampToken }) verbatim in the
id-aa-timeStampToken unsigned attribute. The engine now unwraps the response
to the bare TimeStampToken (a CMS ContentInfo) before embedding it, as
required by RFC 3161 §3.3.2 / ETSI EN 319 122, matching the B-LTA
document-timestamp path. Both a raw TimeStampResp and an already-unwrapped
token are accepted (the PKIStatusInfo gate is still enforced).
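
The unwrapping described above can be illustrated with a minimal DER walk. This is a sketch, not the engine's code: readTlv and unwrapTimeStampToken are hypothetical names, and the example assumes definite-length DER and a one-byte status value.

```typescript
// Read one DER TLV header at `pos`; returns the tag and the content bounds.
function readTlv(buf: Uint8Array, pos: number): { tag: number; start: number; end: number } {
  if (pos + 2 > buf.length) throw new Error("truncated DER");
  const tag = buf[pos];
  let len = buf[pos + 1];
  let start = pos + 2;
  if (len & 0x80) {
    // Long form: low 7 bits give the number of length bytes that follow.
    const n = len & 0x7f;
    if (n === 0 || n > 4 || start + n > buf.length) throw new Error("unsupported DER length");
    len = 0;
    for (let i = 0; i < n; i++) len = len * 256 + buf[start + i];
    start += n;
  }
  const end = start + len;
  if (end > buf.length) throw new Error("truncated DER");
  return { tag, start, end };
}

// Accept either a TimeStampResp or a bare TimeStampToken (CMS ContentInfo)
// and return the bare token bytes.
function unwrapTimeStampToken(der: Uint8Array): Uint8Array {
  const outer = readTlv(der, 0);
  if (outer.tag !== 0x30) throw new Error("expected SEQUENCE");
  const first = readTlv(der, outer.start);
  // A ContentInfo starts with an OBJECT IDENTIFIER: already a bare token.
  if (first.tag === 0x06) return der.subarray(0, outer.end);
  if (first.tag !== 0x30) throw new Error("unexpected structure");
  // PKIStatusInfo ::= SEQUENCE { status INTEGER, ... }
  const status = readTlv(der, first.start);
  if (status.tag !== 0x02 || status.end - status.start !== 1) throw new Error("bad PKIStatus");
  const code = der[status.start];
  // 0 = granted, 1 = grantedWithMods; anything else is a rejection.
  if (code !== 0 && code !== 1) throw new Error(`TSA rejected request (status ${code})`);
  if (first.end >= outer.end) throw new Error("response carries no token");
  const token = readTlv(der, first.end);
  if (token.tag !== 0x30) throw new Error("expected TimeStampToken SEQUENCE");
  return der.subarray(first.end, token.end);
}
```

Checking the first child's tag is what lets both input forms be accepted: a response begins with the PKIStatusInfo SEQUENCE, while a bare token begins with the content-type OID.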
gigapdf-lib v0.70.0
Fidelity + standards release: advanced (PAdES-B-T) timestamped signatures,
richer shading and JPEG decoding at the rasteriser, complex-script text shaping
for Indic writing systems, CFF flex curves, and RTF image import. The public API
is additive — existing behaviour is unchanged.
Added
- PAdES-B-T trusted timestamps (RFC 3161). New SDK
GigaPdfDoc.signTimestamped() (async) embeds an RFC 3161 timestamp token in
the SignerInfo for an advanced-level PAdES-B-T signature: ETSI.CAdES.detached
subfilter, signing-certificate-v2 (ESS) signed attribute, and the
id-aa-timeStampToken unsigned attribute. Uses the engine's pure-data
two-phase TSA flow (core emits the TimeStampReq, host POSTs it, core embeds
the returned token) since the WASM core has no network stack; tsaFetch lets
the host add auth/proxy/retries and apply its own SSRF allow-list, and the
exported defaultTsaPost POSTs application/timestamp-query via fetch
(e.g. FreeTSA). Signs with an imported PKCS#12 or a freshly generated
self-signed identity.
- Mesh shadings at the rasteriser. Free-form (type 4), lattice (type 5),
Coons (type 6) and tensor (type 7) shadings are now rendered as Gouraud
triangles (pure, zero-dep decoder; Coons/tensor patches tessellated per
ISO 32000-1 §8.7.4.5.7), with per-vertex colour resolved through
Separation / DeviceN / ICCBased / CMYK / Gray. Axial (2) and radial (3)
shadings are unchanged.
- Arithmetic-coded JPEG decoding. SOF9 (sequential) and SOF10 (progressive)
JPEGs now decode via a hand-rolled ISO/IEC 10918-1 Annex MQ arithmetic decoder
with the F.1.4 DC/AC context models and DAC conditioning. Baseline/Huffman
paths are unchanged; lossless (SOF3/SOF11) and 12-bit Huffman (SOF1) remain
gracefully unsupported.
- Indic complex-script shaping. A syllabic reordering machine for the
Brahmi-derived scripts (Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil,
Telugu, Kannada, Malayalam), covering reph and pre-base matra reordering, plus
the missing OpenType lookups: GSUB 2 (multiple), GSUB 3 (alternate), GSUB 8
(reverse chaining single) and GPOS 3 (cursive attachment). Latin and the
existing contextual paths are unchanged.
- CFF/Type2 flex operators. The Type2 charstring interpreter now implements
the four flex operators (flex, flex1, hflex, hflex1, Adobe TN #5177),
each emitting two cubic curves; CFF glyphs using flex no longer drop or
mis-render contour segments.
- RTF image import. RTF import parses the \pict group, extracting
\pngblip / \jpegblip payloads as <img src="data:image/…;base64,…">
(display size recovered from \picwgoal / \pichgoal), reusing the HTML
engine's image-embed pipeline. DIB/BMP, WMF/EMF and binary \bin payloads are
skipped (documented limits), guarded by a PNG/JPEG magic-byte check.
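
To make the last item concrete, here is a small sketch of the two checks it mentions: sniffing a hex-encoded \pict payload by magic bytes, and converting a \picwgoal / \pichgoal value from twips. The function names are hypothetical, and the choice of CSS pixels as the output unit is an assumption; the importer's real internals are not shown in these notes.

```typescript
// Decode the hex payload of a \pict group (whitespace is ignored).
function hexToBytes(hex: string): Uint8Array {
  const clean = hex.replace(/\s+/g, "");
  if (clean.length % 2 !== 0 || /[^0-9a-fA-F]/.test(clean)) {
    throw new Error("invalid hex payload");
  }
  const out = new Uint8Array(clean.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(clean.slice(i * 2, i * 2 + 2), 16);
  }
  return out;
}

// Identify PNG or JPEG by leading magic bytes; anything else returns null
// so the caller can skip the payload.
function sniffImage(bytes: Uint8Array): "image/png" | "image/jpeg" | null {
  const png = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (bytes.length >= png.length && png.every((b, i) => bytes[i] === b)) return "image/png";
  if (bytes.length >= 3 && bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff) return "image/jpeg";
  return null;
}

// Goal sizes are in twips: 1440 per inch, hence 15 per CSS pixel at 96 dpi.
function twipsToCssPx(twips: number): number {
  return twips / 15;
}
```

Sniffing the bytes rather than trusting the \pngblip / \jpegblip keyword means a mislabelled payload is skipped instead of producing a broken data URI.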
gigapdf-lib v0.69.0
Image-watermark release: stamp a raster image across any range of pages, with
the same ergonomics as the existing text watermark. The text watermark is
unchanged.
Added
- Image watermark. Stamp a raster image over pages:
addImageWatermark (SDK) / add_image_watermark (core) /
gp_add_image_watermark (FFI). Accepts PNG / JPEG / WebP / GIF / AVIF
source images and supports per-watermark opacity, anchoring
(center + four corners) with offsets, rotation (about the image center),
scaling to a target size (aspect-follow), and an optional tiling grid.
The image XObject is embedded once and referenced on each target page,
reusing the existing image-embed/raster-transcode pipeline. The text
watermark and add_image behavior are unchanged.
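
The anchoring and scaling options can be pictured with a little geometry. The sketch below is not the addImageWatermark implementation: the Anchor names, the Dims type, the aspect-preserving fit and the reading of offsets as insets from the anchored edges are all assumptions made for illustration.

```typescript
type Anchor = "center" | "top-left" | "top-right" | "bottom-left" | "bottom-right";

interface Dims {
  width: number;
  height: number;
}

// Scale an image to fit inside a target box while preserving aspect ratio.
function fitDims(image: Dims, target: Dims): Dims {
  const k = Math.min(target.width / image.width, target.height / image.height);
  return { width: image.width * k, height: image.height * k };
}

// Lower-left corner of the image in PDF user space (origin bottom-left, y up).
// For corners, dx/dy are treated as insets from the anchored edges; for the
// centre anchor they are plain translations.
function placeImage(page: Dims, image: Dims, anchor: Anchor, dx = 0, dy = 0): { x: number; y: number } {
  switch (anchor) {
    case "center":
      return { x: (page.width - image.width) / 2 + dx, y: (page.height - image.height) / 2 + dy };
    case "top-left":
      return { x: dx, y: page.height - image.height - dy };
    case "top-right":
      return { x: page.width - image.width - dx, y: page.height - image.height - dy };
    case "bottom-left":
      return { x: dx, y: dy };
    case "bottom-right":
      return { x: page.width - image.width - dx, y: dy };
  }
}
```

Because the image XObject is shared, only this per-page placement (plus rotation and opacity) would vary from page to page.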
gigapdf-lib v0.68.0
Format-reach + import/render fidelity release: the unified model now exports
Markdown / CSV / EPUB end to end, Office/ODF import preserves far more
structure, the HTML→PDF renderer gains the remaining common CSS, and several
image-codec and rendering bugs are fixed.
Added
- Markdown / CSV / EPUB model export. The unified editable model can now be
raised to Markdown (modelToMd), CSV (RFC 4180, modelToCsv) and
EPUB 3 (modelToEpub), alongside the existing
modelTo{Docx,Xlsx,Pptx,Odt,Ods,Odp,Pdf,Html,Rtf} targets (ABI
gp_model_to_{md,csv,epub}).
- Complete Markdown modelling. CodeBlock, Blockquote and
HorizontalRule are first-class in the model: full Markdown round-trip
(headings, runs, links, images, nested lists, GFM tables, code blocks,
block-quotes, horizontal rules, footnotes, front-matter) rendered and exported
consistently across formats.
- Office / ODF import fidelity. DOCX/XLSX/PPTX and ODF (.odt / .ods /
.odp) import now preserves images, hyperlinks, strikethrough,
highlighting, spreadsheet formulas, grouped shapes, charts, SmartArt text and
master/layout (theme) inheritance.
- HTML / CSS → PDF: remaining common CSS. Radial and conic
gradients, font-weight 100–900, box-shadow (blur), elliptical
border-radius, dashed/dotted borders, linear-gradient and
position: sticky.
- OpenType text shaping. GPOS mark positioning, GSUB contextual, script
selection and Arabic joining (complex scripts only; Latin unchanged).
- Image codecs. SVG <text> rendering and GIF multi-frame decoding.
- Run highlight. Character-level background is painted and emitted across
HTML, PDF and Office output.
- setTextRunStyle. Run-level style bake exposed in the SDK.
- Mermaid flowchart renderer in the HTML engine (graph TD/LR, node shapes,
typed edges + arrowheads, Sugiyama layout → PDF vectors).
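
For reference, the RFC 4180 quoting rules that a CSV export has to follow are short enough to sketch. This is an independent illustration, not the modelToCsv implementation; csvField and toCsv are hypothetical helper names.

```typescript
// Quote a field only when it contains a comma, a double quote or a line
// break; embedded double quotes are escaped by doubling them.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Serialise rows with CRLF record separators, as RFC 4180 specifies.
function toCsv(rows: string[][]): string {
  return rows.map((row) => row.map(csvField).join(",")).join("\r\n") + "\r\n";
}
```

Fields with embedded newlines stay inside their quotes, so a multi-line cell survives a round trip through spreadsheet software.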
Fixed
- AVIF multi-tile decode: corrupt images > 9.4 MP. Multi-tile AVIFs were
decoded as a single tile, garbling pixels. The AV1 spec forces multi-tile
above ~9.4 MP, so essentially every modern phone/camera AVIF was silently
corrupted. Each tile is now decoded independently; single-tile and existing
fixtures are byte-for-byte unchanged (validated bit-exact vs dav1d).
- WebP lossless (VP8L): lossless transforms + meta-Huffman now decode real
cwebp/libwebp lossless images correctly.
Changed
- Non-Device colorspaces: Pattern fills and Separation / ICCBased colours
in content streams are unified through the raster colour resolver (consistent
with the rasterizer) instead of a device-default fallback.
- Docs honesty: README corrected to near-zero-dependency (hand-written
PDF/render/conversion core; RustCrypto for crypto/signatures; Boa for
JS; the earlier from-scratch JS engine is gone), 1198 tests (was 284), and
.wasm ~5.6 MB (was ~540 KB, before Boa was bundled).
gigapdf-lib v0.67.0
Added
- Structured-editing ModelOps + permissions API exposed in the SDK. New
applyModelOps variants: paragraph formatting (setParagraphStyle: align/indent/
spacing/line-height), lists (setListLevel / setListMarker / setListOrdered),
absolute block placement (setBlockFrame / setBlockRotation), and table styling
(setCellShading / setRowHeight / setColWidth / setTableBorder). Table structural
edits (insertTableRow / deleteTableRow / insertTableColumn / deleteTableColumn /
setCellSpan + sheet row/column ops) and GigaPdfDoc permission helpers
(permissionsToP / decodePermissions / getPermissions + saveEncrypted({ flags }))
are now callable from JS.
Changed
- 8 PDF permission flags are functional: /P is computed from named flags per
ISO 32000-1 Table 22 (previously a cosmetic integer).
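
As background, the /P integer is a signed 32-bit bit field. The sketch below computes it from eight named flags using the bit positions of ISO 32000-1 Table 22 (revision 3 or later handlers). The flag names and the computeP function are hypothetical; the SDK's permissionsToP may use different names and options.

```typescript
// Hypothetical flag names for the eight standard permissions.
interface PdfPermissions {
  print: boolean;
  modify: boolean;
  copy: boolean;
  annotate: boolean;
  fillForms: boolean;
  extract: boolean;
  assemble: boolean;
  printHighRes: boolean;
}

// 1-based bit positions from ISO 32000-1 Table 22.
const PERMISSION_BITS: Record<keyof PdfPermissions, number> = {
  print: 3,
  modify: 4,
  copy: 5,
  annotate: 6,
  fillForms: 9,
  extract: 10,
  assemble: 11,
  printHighRes: 12,
};

// Compute /P as a signed 32-bit integer.
function computeP(perms: PdfPermissions): number {
  // Reserved bits 7-8 and 13-32 are set; bits 1-2 stay clear.
  let p = 0xfffff0c0;
  for (const name of Object.keys(PERMISSION_BITS) as (keyof PdfPermissions)[]) {
    if (perms[name]) p |= 1 << (PERMISSION_BITS[name] - 1);
  }
  return p | 0; // force the signed 32-bit interpretation
}
```

With every flag granted the result is -4, and with none granted it is -3904, two values commonly seen in encrypted PDFs.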
gigapdf-lib v0.66.0
Added
- HTML/CSS rendering: LibreOffice-level fidelity. htmlRender gains real CSS
grid (fr/minmax/repeat/span/auto-rows) and complete flexbox
(basis/grow/shrink/wrap/justify/align), multi-column (column-count/columns/
column-gap), pragmatic RTL/bidi (direction/dir, RTL block/inline/run
layout), table fidelity (colspan/rowspan, LibreOffice-level), text styling
(super/sub, underline, strike), @media, font shorthand and further CSS-2 coverage.
- Document reconstruction (structuredText), waves R1–R10. Typed + populated
pageBlocks bodies, merged-cell spans, strikethrough, hyperlinks, paragraph
spacing, super/subscript, document outline + figure captions, list nesting +
continuation lines, multi-column reading order, multiple tables per page
(connected-component split), borderless right/decimal-aligned columns, true
decimal-tab alignment.
- PDF permissions: 8 functional flags. getPermissions + correct /P encoding
of the 8 standard permission bits (print, modify, copy, annotate, fill-forms,
extract, assemble, high-res print).
- Model structural edits. Table & sheet structural-edit ModelOps.
OCR (native gigapdf-ocr-rten crate: host-side, not bundled in the npm package)
- Pivoted the OCR engine to PaddleOCR PP-OCR on RTen (pure-Rust ONNX, no C++/
Tesseract): 13 printed languages incl. our own Hebrew model, with automatic
per-line script selection.
- Handwriting recognizer (latin_hw): our own CRNN trained on real handwriting
(IAM/RIMES/NorHand/…; standard nn.LSTM → dynamic-width ONNX), opt-in via
recognize_page_handwriting / recognize_page_with(img, "latin_hw").
- Full OCR documentation refresh (architecture, training data, SDK, cookbook).
gigapdf-lib v0.65.0
Added
- Office→PDF phase-2 fonts: officeToPdfWith(office, fonts) (ABI
gp_office_to_pdf_with_fonts, core office_to_pdf_with_fonts) completes the
two-phase font flow opened by officeNeededFonts: hand back the host-fetched
faces for the families a container references but doesn't embed (e.g.
Carlito for a Calibri reference) and styled runs lay out + paint with the right
metrics instead of drifting onto the bundled fallback. The supplied faces are
merged with whatever the document embeds itself (embedded faces win on
conflict), so an empty fonts array yields exactly officeToPdf's output
(no regression). fonts uses the same packed blob as htmlRender.
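
The merge rule stated above (embedded faces win on conflict, an empty supplied list changes nothing) can be sketched with a keyed map. This is an illustration only: the FaceEntry shape, the key format and mergeFaces are hypothetical and do not reflect the packed blob format that fonts actually uses.

```typescript
// Hypothetical face record; the real `fonts` argument is a packed blob.
interface FaceEntry {
  family: string;
  bold: boolean;
  italic: boolean;
  source: "embedded" | "supplied";
}

// Faces conflict when family (case-insensitive) and style both match.
function faceKey(f: FaceEntry): string {
  return `${f.family.toLowerCase()}|${f.bold ? "b" : ""}|${f.italic ? "i" : ""}`;
}

// Merge host-supplied faces with the document's own; embedded entries are
// written last so they override any supplied face with the same key.
function mergeFaces(embedded: FaceEntry[], supplied: FaceEntry[]): FaceEntry[] {
  const byKey = new Map<string, FaceEntry>();
  for (const f of supplied) byKey.set(faceKey(f), f);
  for (const f of embedded) byKey.set(faceKey(f), f);
  return Array.from(byKey.values());
}
```

Writing the embedded faces last is what guarantees the no-regression property: with no supplied faces, the result is exactly the document's own set.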
gigapdf-lib v0.64.0
Office↔PDF fidelity program — import all formats → PDF and export PDF → all
formats much closer to 1:1, including complex layouts (boxes/encadrés).
Added
- Office→PDF preserves absolute layout: presentation/box geometry is no
longer reflowed into a flat stack. PPTX/ODP shapes, images and tables carrying
an explicit a:xfrm / draw:frame are emitted at their exact coordinates
(EMU/ODF units → pt), with slide backgrounds and a:schemeClr theme colours
resolved. DOCX floating/anchored drawings (wp:anchor) and text boxes
(w:txbxContent) become absolutely-positioned frames (the “encadrés”), and
explicit page breaks (w:br type=page, w:pageBreakBefore, section breaks)
are honoured.
- XLSX/ODS render with cell styling: fonts (bold/italic/underline/size/
colour/family), borders, alignment and row heights are read from each cell's
style and applied at render (theme colours resolved); ODS cells were previously
unstyled. Merges, column widths and number formats unchanged.
- PDF→Office export preserves absolute layout: text boxes, images and vector
rectangles/paths (fill/stroke/dash) are exported at their exact coordinates for
PPTX/ODP/DOCX/ODT, so an exported deck/doc opened in PowerPoint/Word/Impress/
Writer looks like the source PDF, encadrés included.
- Office→PDF embeds the document's own fonts: a self-embedding DOCX/PPTX/
XLSX (word|ppt|xl/fonts/*.odttf, de-obfuscated per ECMA-376 §17.8.1) or ODT/
ODS/ODP (Fonts/*, TTF/OTF) renders with its own typefaces (exact glyphs
and metrics, no reflow drift) instead of the bundled Liberation fallback.
- officeNeededFonts(office) / gp_office_needed_fonts, phase-1 for
officeToPdf: returns the fonts a container references but doesn't embed
(HtmlFontRequest[]), so the host can fetch metric clones (Carlito↔Calibri,
Arimo↔Arial, …) into its font cache for correct line-breaking. null for an
unrecognized archive, [] when nothing is needed.
- Stateful RTF rendering: rtfToPdf now uses a real RTF parser with a {}
group state stack: character styling (\b \i \ul \strike \cf \fs \f via
font/colour tables), paragraph alignment/indents (\qc \qr \qj \li \fi), tables
(\trowd \cell \row) and correct CP1252 (\'80 → €, smart quotes, dashes) instead
of the previous text-only extraction.
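
The CP1252 point in the last item is easy to get wrong, because bytes 0x80 to 0x9F differ from Latin-1. The sketch below decodes RTF \'hh escapes with an explicit table for that range. It is an illustration under the assumption that the document's code page is 1252; the function names are hypothetical and unrelated to rtfToPdf's internals.

```typescript
// Windows-1252 mappings for 0x80..0x9F; null marks bytes the code page
// leaves undefined. Everything else matches Latin-1.
const CP1252_HIGH: (number | null)[] = [
  0x20ac, null, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, null, 0x017d, null,
  null, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, null, 0x017e, 0x0178,
];

// Map one CP1252 byte to a string; undefined bytes become U+FFFD.
function cp1252ToChar(byte: number): string {
  if (byte >= 0x80 && byte <= 0x9f) {
    const cp = CP1252_HIGH[byte - 0x80];
    return cp == null ? "\ufffd" : String.fromCharCode(cp);
  }
  return String.fromCharCode(byte);
}

// Replace every \'hh escape in an RTF text fragment with its character.
function decodeRtfHexEscapes(text: string): string {
  return text.replace(/\\'([0-9a-fA-F]{2})/g, (_match, hex: string) =>
    cp1252ToChar(parseInt(hex, 16)),
  );
}
```

A full parser would also honour \ansicpg and per-font charsets, which this sketch deliberately ignores.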