Releases: QrCommunication/gigapdf-lib

gigapdf-lib v0.72.0

23 Jun 22:55

Fidelity release focused on text extraction and AcroForm rendering on dense
government forms (CERFA). The public API is additive — existing behaviour is
unchanged except where noted as a fix.

Added

  • FormField now surfaces its text-formatting metadata. Each field exposes
    comb (the /Ff comb flag for fixed-pitch character cells), quadding (the
    /Q justification: 0 left, 1 centre, 2 right), and the default-appearance font
    and size parsed from the field's /DA string as daFont / daSize. This lets
    a host reproduce a field's intended layout (combed cells, alignment, font
    metrics) without re-parsing the appearance stream.
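
As a rough illustration of how a host might consume this metadata, the sketch below maps a field's formatting to overlay styling. Only the names comb, quadding, daFont and daSize come from these notes; the interface shape, the maxLen property, the fallback font and the 12 pt fallback size are assumptions made for the example.

```typescript
// Hypothetical shape: the property names come from the release notes,
// their exact types and the extra maxLen field are assumptions.
interface FieldTextFormat {
  comb: boolean;
  quadding: 0 | 1 | 2;
  daFont: string | null;
  daSize: number | null;
  maxLen?: number; // comb cells are only defined when the field has a /MaxLen
}

interface OverlayStyle {
  textAlign: "left" | "center" | "right";
  fontFamily: string;
  fontSizePt: number;
  cellWidthPt: number | null; // null when the field is not combed
}

function overlayStyle(f: FieldTextFormat, fieldWidthPt: number): OverlayStyle {
  // /Q: 0 left, 1 centre, 2 right
  const textAlign = (["left", "center", "right"] as const)[f.quadding];
  // A comb field splits its width into /MaxLen equally sized cells.
  const cellWidthPt =
    f.comb && f.maxLen && f.maxLen > 0 ? fieldWidthPt / f.maxLen : null;
  return {
    textAlign,
    fontFamily: f.daFont ?? "Helvetica", // assumed fallback
    // A /DA size of 0 means "auto-size"; 12 pt is an arbitrary stand-in here.
    fontSizePt: f.daSize && f.daSize > 0 ? f.daSize : 12,
    cellWidthPt,
  };
}
```

A combed 10-character field that is 200 pt wide would get 20 pt cells under these assumptions.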

Fixed

  • Spurious in-word spaces during text extraction. A new gap-aware
    runs_join helper now drives all four reconstruction paths (lines,
    paragraphs, lists, tables): a word split across several font runs no longer
    emits a phantom space at each run boundary (e.g. ENFANT S is reassembled as
    ENFANTS). Spacing is decided from the real inter-run gap, not the mere fact
    that the text changed font.
  • Form-field appearances double-rendered behind the editable text. A
    widget_appearances flag makes renderPageNoText / renderPageExcluding
    omit the /AP appearance streams of AcroForm widgets, so a filled field's
    baked-in value no longer shows through underneath the live, editable overlay.
  • Borderless prose misdetected as a table. A line_has_gutter guard now
    requires a real inter-cell gutter before promoting a borderless block to a
    table: a two-run-per-line prose notice is kept as prose, while genuine tables
    (wide column gutters) are still recognised.
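
The two gap-based decisions above (space or no space between runs, prose or table) can be sketched as follows. This is not the library's runs_join or line_has_gutter code: the Run shape and both thresholds are illustrative guesses, chosen only to show the idea of deciding from measured gaps rather than from font changes.

```typescript
// Hypothetical run geometry: left edge, advance width and font size in points.
interface Run {
  text: string;
  x: number;
  width: number;
  fontSize: number;
}

// Illustrative thresholds, expressed as fractions of the font size.
const SPACE_GAP_EM = 0.2; // a wider gap is treated as a word space
const GUTTER_GAP_EM = 2.0; // a wider gap is treated as a column gutter

function gapBetween(a: Run, b: Run): number {
  return b.x - (a.x + a.width);
}

// Join the runs of one line, inserting a space only where the real gap is wide.
function joinRuns(runs: Run[]): string {
  const sorted = [...runs].sort((a, b) => a.x - b.x);
  let out = "";
  sorted.forEach((run, i) => {
    if (i > 0) {
      const prev = sorted[i - 1];
      const em = Math.min(prev.fontSize, run.fontSize);
      const wide = gapBetween(prev, run) > SPACE_GAP_EM * em;
      if (wide && !out.endsWith(" ") && !run.text.startsWith(" ")) out += " ";
    }
    out += run.text;
  });
  return out;
}

// True when at least one inter-run gap is wide enough to be a cell gutter.
function hasGutter(runs: Run[]): boolean {
  const sorted = [...runs].sort((a, b) => a.x - b.x);
  return sorted.some((run, i) => {
    if (i === 0) return false;
    const prev = sorted[i - 1];
    const em = Math.min(prev.fontSize, run.fontSize);
    return gapBetween(prev, run) > GUTTER_GAP_EM * em;
  });
}
```

With these thresholds, two runs "ENFANT" and "S" separated by a 0.3 pt gap at 10 pt join without a space, while a 60 pt gap on the same line counts as a gutter.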

gigapdf-lib v0.71.1

23 Jun 20:35

Documentation-only patch. No code changes — the WASM blob is byte-for-byte
identical to 0.71.0.

Documentation

  • Complete overhaul of the SDK documentation for 0.71: API reference (signature
    matrix for B / B-T / LTV signing, full ~263-method surface, removal of the
    phantom OCR methods doc.ocr / ocrText / extractText), USAGE guide (the
    four signature levels, the host-fetch two-phase model, and an SSRF note),
    COOKBOOK (added signTimestamped / signLtv recipes and an image-watermark
    recipe), plus the README and sdk/README (npm). No behavioural change: the
    WASM is identical to 0.71.0.

gigapdf-lib v0.71.0

23 Jun 18:56

Long-term validation release: PAdES-LTV builds on the B-T timestamped signatures
from 0.70 by embedding the validation material (certificate chain + revocation
responses) so a signature keeps verifying long after its certificates expire or
are revoked. The public API is additive — existing behaviour is unchanged.

Added

  • PAdES-LTV (B-LT / B-LTA). New SDK GigaPdfDoc.signLtv() (async) produces a
    long-term-validation signature: it first builds a B-T signature
    (signTimestamped), then embeds a Document Security Store (/DSS with
    /Certs, /OCSPs, /CRLs, and per-signature /VRI) carrying the revocation
    material for the certificate chain (B-LT). With archiveTimestamp it also adds
    a /DocTimeStamp document timestamp (ETSI.RFC3161 subfilter) over the whole
    updated file for B-LTA, refreshing the long-term trust anchor. The engine
    computes which OCSP/CRL endpoints to query from the certificates' AIA / CRL-DP
    extensions; the host fetches them (the WASM core has no network stack, same
    pure-data two-phase model as the TSA). OCSP requests follow RFC 6960; CRLs are
    parsed as CertificateList. The exported defaultOcspPost and defaultCrlGet
    perform the round trips via fetch, and the revocationFetch / crlFetch
    hooks let the host add auth/proxy/retries and apply its own SSRF allow-list.
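
Because the OCSP and CRL URLs come from the certificates being validated, a host will usually want to filter them before fetching. The sketch below shows one way to wrap a fetcher with a host allow-list. The RevocationFetcher type is a stand-in invented for this example; the real signature of the revocationFetch / crlFetch hooks is not described in these notes.

```typescript
// Hypothetical fetcher shape, not the SDK's actual hook signature.
type RevocationFetcher = (url: string, body?: Uint8Array) => Promise<Uint8Array>;

// Accept only http(s) URLs whose hostname is explicitly allow-listed.
function isAllowedEndpoint(
  rawUrl: string,
  allowedHosts: ReadonlySet<string>,
): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not parseable as an absolute URL
  }
  // OCSP responders and CRL distribution points are commonly plain http.
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  return allowedHosts.has(url.hostname.toLowerCase());
}

// Wrap any fetcher so that disallowed endpoints are rejected before any I/O.
function withAllowList(
  inner: RevocationFetcher,
  allowedHosts: ReadonlySet<string>,
): RevocationFetcher {
  return async (url, body) => {
    if (!isAllowedEndpoint(url, allowedHosts)) {
      throw new Error(`blocked revocation endpoint: ${url}`);
    }
    return inner(url, body);
  };
}
```

A hostname allow-list is the simplest policy. A stricter host might also resolve the name and reject private or link-local addresses, which this sketch does not attempt.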

Fixed

  • B-T id-aa-timeStampToken now carries the bare TimeStampToken.
    signFinishTimestamped / signTimestamped previously embedded the TSA's raw
    TimeStampResp (SEQUENCE { PKIStatusInfo, TimeStampToken }) verbatim in the
    id-aa-timeStampToken unsigned attribute. The engine now unwraps the response
    to the bare TimeStampToken (a CMS ContentInfo) before embedding it — as
    required by RFC 3161 §3.3.2 / ETSI EN 319 122 — matching the B-LTA
    document-timestamp path. Both a raw TimeStampResp and an already-unwrapped
    token are accepted (the PKIStatusInfo gate is still enforced).
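
To make the unwrapping step concrete, here is a minimal DER sketch of the same idea. It is not the engine's code: it only walks the outer structure, handles single-byte tags and definite lengths of up to four bytes, and checks that PKIStatus is 0 (granted) or 1 (grantedWithMods) before returning the token bytes.

```typescript
interface Tlv {
  tag: number;
  start: number; // offset of the tag byte
  contentStart: number; // offset of the first content byte
  end: number; // offset just past the last content byte
}

// Read one DER tag-length-value header at the given offset.
function readTlv(buf: Uint8Array, offset: number): Tlv {
  if (offset + 2 > buf.length) throw new Error("truncated DER");
  const tag = buf[offset];
  let len = buf[offset + 1];
  let pos = offset + 2;
  if (len & 0x80) {
    const n = len & 0x7f; // number of length bytes that follow
    if (n === 0 || n > 4 || pos + n > buf.length) {
      throw new Error("unsupported DER length");
    }
    len = 0;
    for (let i = 0; i < n; i++) len = len * 256 + buf[pos + i];
    pos += n;
  }
  if (pos + len > buf.length) throw new Error("truncated DER");
  return { tag, start: offset, contentStart: pos, end: pos + len };
}

// Accept either a TimeStampResp or a bare TimeStampToken; return the token.
function toTimeStampToken(der: Uint8Array): Uint8Array {
  const outer = readTlv(der, 0);
  if (outer.tag !== 0x30) throw new Error("expected a SEQUENCE");
  const first = readTlv(der, outer.contentStart);
  // A ContentInfo starts with an OBJECT IDENTIFIER: already a bare token.
  if (first.tag === 0x06) return der.subarray(outer.start, outer.end);
  if (first.tag !== 0x30) throw new Error("expected PKIStatusInfo");
  // PKIStatusInfo begins with the INTEGER status.
  const status = readTlv(der, first.contentStart);
  if (status.tag !== 0x02 || status.end - status.contentStart !== 1) {
    throw new Error("malformed PKIStatus");
  }
  const value = der[status.contentStart];
  if (value !== 0 && value !== 1) throw new Error(`TSA status ${value}`);
  if (first.end >= outer.end) throw new Error("response carries no token");
  const token = readTlv(der, first.end);
  if (token.tag !== 0x30) throw new Error("expected TimeStampToken");
  return der.subarray(token.start, token.end);
}
```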

gigapdf-lib v0.70.0

23 Jun 16:30

Fidelity + standards release: advanced (PAdES-B-T) timestamped signatures,
richer shading and JPEG decoding at the rasteriser, complex-script text shaping
for Indic writing systems, CFF flex curves, and RTF image import. The public API
is additive — existing behaviour is unchanged.

Added

  • PAdES-B-T trusted timestamps (RFC 3161). New SDK
    GigaPdfDoc.signTimestamped() (async) embeds an RFC 3161 timestamp token in
    the SignerInfo for an advanced-level PAdES-B-T signature — ETSI.CAdES.detached
    subfilter, signing-certificate-v2 (ESS) signed attribute, and the
    id-aa-timeStampToken unsigned attribute. Uses the engine's pure-data
    two-phase TSA flow (core emits the TimeStampReq, host POSTs it, core embeds
    the returned token) since the WASM core has no network stack; tsaFetch lets
    the host add auth/proxy/retries and apply its own SSRF allow-list, and the
    exported defaultTsaPost POSTs application/timestamp-query via fetch
    (e.g. FreeTSA). Signs with an imported PKCS#12 or a freshly generated
    self-signed identity.
  • Mesh shadings at the rasteriser. Free-form (type 4), lattice (type 5),
    Coons (type 6) and tensor (type 7) shadings are now rendered as Gouraud
    triangles (pure, zero-dep decoder; Coons/tensor patches tessellated per
    ISO 32000-1 §8.7.4.5.7), with per-vertex colour resolved through
    Separation/DeviceN/ICCBased/CMYK/Gray. Axial (2) and radial (3)
    shadings are unchanged.
  • Arithmetic-coded JPEG decoding. SOF9 (sequential) and SOF10 (progressive)
    JPEGs now decode via a hand-rolled ISO/IEC 10918-1 Annex D arithmetic
    (QM-coder) decoder with the F.1.4 DC/AC context models and DAC conditioning.
    Baseline/Huffman paths are unchanged; lossless (SOF3/SOF11) and 12-bit
    Huffman (SOF1) remain gracefully unsupported.
  • Indic complex-script shaping. A syllabic reordering machine for the
    Brahmi-derived scripts (Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil,
    Telugu, Kannada, Malayalam) — reph and pre-base matra reordering — plus the
    missing OpenType lookups: GSUB 2 (multiple), GSUB 3 (alternate), GSUB 8
    (reverse chaining single) and GPOS 3 (cursive attachment). Latin and the
    existing contextual paths are unchanged.
  • CFF/Type2 flex operators. The Type2 charstring interpreter now implements
    the four flex operators (flex, flex1, hflex, hflex1, Adobe TN #5177),
    each emitting two cubic curves — CFF glyphs using flex no longer drop or
    mis-render contour segments.
  • RTF image import. RTF import parses the \pict group, extracting
    \pngblip/\jpegblip payloads as <img src="data:image/…;base64,…">
    (display size recovered from \picwgoal/\pichgoal), reusing the HTML
    engine's image-embed pipeline. DIB/BMP, WMF/EMF and binary \bin payloads are
    skipped (documented limits), guarded by a PNG/JPEG magic-byte check.
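
The magic-byte guard mentioned for RTF image import can be pictured like this. It is a sketch of the general technique, not the importer's code: it assumes the \pict payload has already been isolated as a hex string, and the function names are made up for the example.

```typescript
// Decode an RTF hex payload (whitespace is allowed between digits).
function hexToBytes(hex: string): Uint8Array {
  const clean = hex.replace(/\s+/g, "");
  if (clean.length % 2 !== 0 || /[^0-9a-fA-F]/.test(clean)) {
    throw new Error("invalid hex payload");
  }
  const out = new Uint8Array(clean.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(clean.slice(2 * i, 2 * i + 2), 16);
  }
  return out;
}

const PNG_MAGIC = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const JPEG_MAGIC = [0xff, 0xd8, 0xff];

function startsWithBytes(bytes: Uint8Array, magic: number[]): boolean {
  return bytes.length >= magic.length && magic.every((b, i) => bytes[i] === b);
}

// Trust the bytes rather than the \pngblip / \jpegblip keyword alone.
function sniffImageMime(bytes: Uint8Array): "image/png" | "image/jpeg" | null {
  if (startsWithBytes(bytes, PNG_MAGIC)) return "image/png";
  if (startsWithBytes(bytes, JPEG_MAGIC)) return "image/jpeg";
  return null; // anything else (BMP, WMF, EMF, ...) is skipped
}
```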

gigapdf-lib v0.69.0

23 Jun 11:40

Image-watermark release: stamp a raster image across any range of pages, with
the same ergonomics as the existing text watermark. The text watermark is
unchanged.

Added

  • Image watermark. Stamp a raster image over pages —
    addImageWatermark (SDK) / add_image_watermark (core) /
    gp_add_image_watermark (FFI). Accepts PNG / JPEG / WebP / GIF / AVIF
    source images and supports per-watermark opacity, anchoring
    (center + four corners) with offsets, rotation (about the image center),
    scaling to a target size (aspect-follow), and an optional tiling grid.
    The image XObject is embedded once and referenced on each target page,
    reusing the existing image-embed/raster-transcode pipeline. The text
    watermark and add_image behavior are unchanged.
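
The interaction between anchoring, offsets and aspect-follow scaling is easier to see with numbers. The sketch below computes a placement box in default PDF user space (origin at the bottom-left, y pointing up). The option names and the rule that corner offsets push inward from the anchored edges are assumptions for illustration; the actual addImageWatermark options are not spelled out in these notes.

```typescript
type Anchor = "center" | "top-left" | "top-right" | "bottom-left" | "bottom-right";

interface Size {
  width: number;
  height: number;
}

interface Box extends Size {
  x: number; // left edge, in points
  y: number; // bottom edge, in points
}

function placeWatermark(
  page: Size,
  image: Size,
  targetWidth: number,
  anchor: Anchor,
  offset: { x: number; y: number } = { x: 0, y: 0 },
): Box {
  // Aspect-follow: the height is derived from the requested width.
  const width = targetWidth;
  const height = image.height * (targetWidth / image.width);
  if (anchor === "center") {
    return {
      x: (page.width - width) / 2 + offset.x,
      y: (page.height - height) / 2 + offset.y,
      width,
      height,
    };
  }
  // Assumed convention: corner offsets move the image inward from the corner.
  const x = anchor.endsWith("left") ? offset.x : page.width - width - offset.x;
  const y = anchor.startsWith("bottom") ? offset.y : page.height - height - offset.y;
  return { x, y, width, height };
}
```

For a 595 × 842 pt page and a 200 × 100 px image scaled to 100 pt wide, the centred box starts at (247.5, 396) and is 100 × 50 pt.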

gigapdf-lib v0.68.0

23 Jun 08:05

Format-reach + import/render fidelity release: the unified model now exports
Markdown / CSV / EPUB end to end, Office/ODF import preserves far more
structure, the HTML→PDF renderer gains the remaining common CSS, and several
image-codec and rendering bugs are fixed.

Added

  • Markdown / CSV / EPUB model export. The unified editable model can now be
    raised to Markdown (modelToMd), CSV (RFC 4180, modelToCsv) and
    EPUB 3 (modelToEpub), alongside the existing
    modelTo{Docx,Xlsx,Pptx,Odt,Ods,Odp,Pdf,Html,Rtf} targets (ABI
    gp_model_to_{md,csv,epub}).
  • Complete Markdown modelling. CodeBlock, Blockquote and
    HorizontalRule are first-class in the model — full Markdown round-trip
    (headings, runs, links, images, nested lists, GFM tables, code blocks,
    block-quotes, horizontal rules, footnotes, front-matter) rendered and exported
    consistently across formats.
  • Office / ODF import fidelity. DOCX/XLSX/PPTX and ODF (.odt/.ods/.odp)
    import now preserves images, hyperlinks, strikethrough, highlighting,
    spreadsheet formulas, grouped shapes, charts, SmartArt text and
    master/layout (theme) inheritance.
  • HTML / CSS → PDF: remaining common CSS. Radial and conic gradients,
    font-weight 100–900, box-shadow (blur), elliptical border-radius,
    dashed/dotted borders, linear-gradient and position: sticky.
  • OpenType text shaping. GPOS mark positioning, GSUB contextual, script
    selection and Arabic joining (complex scripts only; Latin unchanged).
  • Image codecs. SVG <text> rendering and GIF multi-frame decoding.
  • Run highlight. Character-level background is painted and emitted across
    HTML, PDF and Office output.
  • setTextRunStyle. Run-level style bake exposed in the SDK.
  • Mermaid flowchart renderer in the HTML engine (graph TD/LR, node shapes,
    typed edges + arrowheads, Sugiyama layout → PDF vectors).
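
For readers unfamiliar with what RFC 4180 output implies for the CSV export, the quoting rules are short enough to sketch. This is a generic illustration, not modelToCsv itself: fields containing a comma, a double quote or a line break are wrapped in double quotes, embedded quotes are doubled, and records end with CRLF.

```typescript
// Quote a single field only when RFC 4180 requires it.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Serialise rows of cells; every record is terminated by CRLF.
function toCsv(rows: string[][]): string {
  return rows.map((row) => row.map(csvField).join(",") + "\r\n").join("");
}
```

For example, the row ["a", "b,c"] serialises to a,"b,c" followed by CRLF.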

Fixed

  • AVIF multi-tile decode — corrupt images > 9.4 MP. Multi-tile AVIFs were
    decoded as a single tile, garbling pixels. The AV1 spec forces multi-tile
    above ~9.4 MP, so essentially every modern phone/camera AVIF was silently
    corrupted. Each tile is now decoded independently; single-tile and existing
    fixtures are byte-for-byte unchanged (validated bit-exact vs dav1d).
  • WebP lossless (VP8L) — lossless transforms + meta-Huffman now decode real
    cwebp/libwebp lossless images correctly.

Changed

  • Non-Device colorspaces — Pattern fills and Separation/ICCBased colours
    in content streams are unified through the raster colour resolver (consistent
    with the rasterizer) instead of a device-default fallback.
  • Docs honesty — README corrected to near-zero-dependency (hand-written
    PDF/render/conversion core; RustCrypto for crypto/signatures; Boa for
    JS — the earlier from-scratch JS engine is gone), 1198 tests (was 284), and
    .wasm ~5.6 MB (was ~540 KB, before Boa was bundled).

gigapdf-lib v0.67.0

23 Jun 00:37

Added

  • Structured-editing ModelOps + permissions API exposed in the SDK. New
    applyModelOps variants: paragraph formatting (setParagraphStyle — align/indent/
    spacing/line-height), lists (setListLevel/setListMarker/setListOrdered),
    absolute block placement (setBlockFrame/setBlockRotation), and table styling
    (setCellShading/setRowHeight/setColWidth/setTableBorder). Table structural
    edits (insertTableRow/deleteTableRow/insertTableColumn/deleteTableColumn/
    setCellSpan + sheet row/column ops) and GigaPdfDoc permission helpers
    (permissionsToP/decodePermissions/getPermissions + saveEncrypted({ flags }))
    are now callable from JS.

Changed

  • 8 PDF permission flags are functional: /P is computed from named flags per
    ISO 32000-1 Table 22 (previously a cosmetic integer).
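
As a worked example of what computing /P from named flags involves, the sketch below follows the bit positions of ISO 32000-1 Table 22 (bits are numbered from 1; bits 7, 8 and 13 to 32 are reserved and set to 1; bits 1 and 2 are 0). The flag names and function names are invented for the example and are not the SDK's permissionsToP / decodePermissions.

```typescript
// 1-based bit positions from ISO 32000-1 Table 22; names are hypothetical.
const PERMISSION_BITS = {
  print: 3,
  modify: 4,
  copy: 5,
  annotate: 6,
  fillForms: 9,
  extract: 10,
  assemble: 11,
  printHighRes: 12,
} as const;

type PermissionName = keyof typeof PERMISSION_BITS;

// Reserved bits 7-8 and 13-32 set, everything else clear.
const RESERVED_ONES = 0xfffff0c0;

// Build the signed 32-bit /P value from the granted permissions.
function encodeP(granted: readonly PermissionName[]): number {
  let p = RESERVED_ONES;
  for (const name of granted) p |= 1 << (PERMISSION_BITS[name] - 1);
  return p | 0; // force a signed 32-bit integer
}

// List the permissions whose bit is set in a /P value.
function decodeP(p: number): PermissionName[] {
  const names = Object.keys(PERMISSION_BITS) as PermissionName[];
  return names.filter((n) => (p & (1 << (PERMISSION_BITS[n] - 1))) !== 0);
}
```

Under this encoding, granting nothing yields -3904, granting only print yields -3900, and granting all eight flags yields -4.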

gigapdf-lib v0.66.0

22 Jun 23:44

Added

  • HTML/CSS rendering: LibreOffice-level fidelity. htmlRender gains real CSS
    grid (fr/minmax/repeat/span/auto-rows) and complete flexbox
    (basis/grow/shrink/wrap/justify/align), multi-column (column-count/columns/
    column-gap), pragmatic RTL/bidi (direction/dir, RTL block/inline/run
    layout), table fidelity (colspan/rowspan, LibreOffice-level), text styling
    (super/sub, underline, strike), @media, font shorthand and further CSS-2 coverage.
  • Document reconstruction (structuredText) — waves R1–R10. Typed + populated
    pageBlocks bodies, merged-cell spans, strikethrough, hyperlinks, paragraph
    spacing, super/subscript, document outline + figure captions, list nesting +
    continuation lines, multi-column reading order, multiple tables per page
    (connected-component split), borderless right/decimal-aligned columns, true
    decimal-tab alignment.
  • PDF permissions — 8 functional flags. getPermissions + correct /P encoding
    of the 8 standard permission bits (print, modify, copy, annotate, fill-forms,
    extract, assemble, high-res print).
  • Model structural edits. Table & sheet structural-edit ModelOps.

OCR (native gigapdf-ocr-rten crate — host-side, not bundled in the npm package)

  • Pivoted the OCR engine to PaddleOCR PP-OCR on RTen (pure-Rust ONNX, no C++/
    Tesseract): 13 printed languages incl. our own Hebrew model, with automatic
    per-line script selection.
  • Handwriting recognizer (latin_hw) — our own CRNN trained on real handwriting
    (IAM/RIMES/NorHand/…; standard nn.LSTM → dynamic-width ONNX), opt-in via
    recognize_page_handwriting / recognize_page_with(img, "latin_hw").
  • Full OCR documentation refresh (architecture, training data, SDK, cookbook).

gigapdf-lib v0.65.0

22 Jun 13:36

Added

  • Office→PDF phase-2 fonts. officeToPdfWith(office, fonts) (ABI
    gp_office_to_pdf_with_fonts, core office_to_pdf_with_fonts) completes the
    two-phase font flow opened by officeNeededFonts: hand back the host-fetched
    faces for the families a container references but doesn't embed (e.g.
    Carlito for a Calibri reference) and styled runs lay out + paint with the right
    metrics instead of drifting onto the bundled fallback. The supplied faces are
    merged with whatever the document embeds itself (embedded faces win on
    conflict), so an empty fonts array yields exactly officeToPdf's output
    (no regression). fonts uses the same packed blob as htmlRender.
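
The merge rule can be pictured with a small sketch. It is only a model of the behaviour described here, not the engine's data structures: the FaceEntry shape and the family/bold/italic key are assumptions, and the real fonts argument is a packed blob rather than an array of objects.

```typescript
// Hypothetical in-memory view of a font face.
interface FaceEntry {
  family: string;
  bold: boolean;
  italic: boolean;
  data: Uint8Array;
}

function faceKey(f: FaceEntry): string {
  return `${f.family.toLowerCase()}|${f.bold ? "b" : "r"}|${f.italic ? "i" : "n"}`;
}

// Merge host-supplied faces with embedded ones; embedded faces win on conflict.
function mergeFaces(embedded: FaceEntry[], supplied: FaceEntry[]): FaceEntry[] {
  const byKey = new Map<string, FaceEntry>();
  for (const f of supplied) byKey.set(faceKey(f), f);
  for (const f of embedded) byKey.set(faceKey(f), f); // overwrites supplied
  return Array.from(byKey.values());
}
```

With an empty supplied list the result is just the embedded set, which mirrors the no-regression property stated above.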

gigapdf-lib v0.64.0

22 Jun 13:15

Office↔PDF fidelity program — import all formats → PDF and export PDF → all
formats much closer to 1:1, including complex layouts (boxes/encadrés).

Added

  • Office→PDF preserves absolute layout — presentation/box geometry is no
    longer reflowed into a flat stack. PPTX/ODP shapes, images and tables carrying
    an explicit a:xfrm / draw:frame are emitted at their exact coordinates
    (EMU/ODF units → pt), with slide backgrounds and a:schemeClr theme colours
    resolved. DOCX floating/anchored drawings (wp:anchor) and text boxes
    (w:txbxContent) become absolutely-positioned frames (the “encadrés”), and
    explicit page breaks (w:br type=page, w:pageBreakBefore, section breaks)
    are honoured.
  • XLSX/ODS render with cell styling — fonts (bold/italic/underline/size/
    colour/family), borders, alignment and row heights are read from each cell's
    style and applied at render (theme colours resolved); ODS cells were previously
    unstyled. Merges, column widths and number formats unchanged.
  • PDF→Office export preserves absolute layout — text boxes, images and vector
    rectangles/paths (fill/stroke/dash) are exported at their exact coordinates for
    PPTX/ODP/DOCX/ODT, so an exported deck/doc opened in PowerPoint/Word/Impress/
    Writer looks like the source PDF, encadrés included.
  • Office→PDF embeds the document's own fonts — a self-embedding DOCX/PPTX/
    XLSX (word|ppt|xl/fonts/*.odttf, de-obfuscated per ECMA-376 §17.8.1) or ODT/
    ODS/ODP (Fonts/*, TTF/OTF) renders with its own typefaces (exact glyphs
    and metrics, no reflow drift) instead of the bundled Liberation fallback.
  • officeNeededFonts(office) / gp_office_needed_fonts — phase-1 for
    officeToPdf: returns the fonts a container references but doesn't embed
    (HtmlFontRequest[]), so the host can fetch metric clones (Carlito↔Calibri,
    Arimo↔Arial, …) into its font cache for correct line-breaking. null for an
    unrecognized archive, [] when nothing is needed.
  • Stateful RTF rendering. rtfToPdf now uses a real RTF parser with a {}
    group state stack: character styling (\b \i \ul \strike \cf \fs \f via
    font/colour tables), paragraph alignment/indents (\qc\qr\qj\li\fi), tables
    (\trowd\cell\row) and correct CP1252 (\'80→€, smart quotes, dashes) instead
    of the previous text-only extraction.
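
For reference, the unit conversion behind the "EMU/ODF units → pt" step in the absolute-layout item is plain arithmetic: one inch is 914,400 EMU and 72 pt, so one point is 12,700 EMU. The helper names below are made up for this sketch, and the ODF parser covers only a handful of absolute units.

```typescript
const EMU_PER_PT = 12700; // 914400 EMU per inch / 72 pt per inch

function emuToPt(emu: number): number {
  return emu / EMU_PER_PT;
}

// Points per unit for the absolute length units handled by this sketch.
const PT_PER_UNIT: Record<string, number> = {
  pt: 1,
  pc: 12,
  in: 72,
  cm: 72 / 2.54,
  mm: 72 / 25.4,
};

// Parse an ODF-style length such as "2.54cm"; returns null if unrecognised.
function odfLengthToPt(length: string): number | null {
  const m = /^(-?\d*\.?\d+)(pt|pc|in|cm|mm)$/.exec(length.trim());
  if (!m) return null;
  return parseFloat(m[1]) * PT_PER_UNIT[m[2]];
}
```

A shape with an a:off x of 914400 EMU therefore sits 72 pt from the slide's left edge.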