Releases: Xero-Team/zpdf

v0.6.0

19 Jun 07:33

zpdf v0.6.0

A feature release, all with zero C/C++ dependencies: interactive AcroForm
support with generated field appearances, password-protected document
decryption (user/owner passwords), type 4–7 mesh shadings, the full predefined
CJK CMap families (Big5 / Shift-JIS / KSC / GBK / EUC-JP), and DeviceCMYK colour
fidelity.

Password-protected documents (non-empty user/owner password)

Encrypted PDFs that require a password now open with one supplied — previously
only the empty-password case decrypted. The cryptographic core (RC4/AES-128/256,
MD5/SHA-2, the R6 hardened hash, key-derivation Algorithms 2 / 2.A / 2.B) was
already in place and tested; this wires a real password through it. Still zero
C/C++ dependencies.

  • New API: PdfDocument::open_with_password(data, pw) (and
    _and_limits); PdfFile::parse_with_password; PdfDocument::is_encrypted().
    A wrong password returns the new Error::WrongPassword; the password may be
    the user or the owner password. The default open() is unchanged.
  • Owner-password recovery (zpdf-parser/src/crypt.rs): RC4 documents now
    authenticate the owner password too — Algorithm 7 derives the owner key,
    RC4-decrypts /O to recover the user password (single pass for R2, the 20
    reverse-counter passes for R≥3), then re-derives the file key via Algorithm 2.
    authenticate_rc4 tries the password as user (Algorithm 6) then owner. V5
    (AES-256) already had the owner path; it now uses the supplied password.
  • Robustness preserved: the empty-password default open stays lenient — an
    RC4 document whose /U doesn't validate under the empty password still opens
    best-effort (with a warning), so the malformed/adversarial corpus is
    unaffected. Only an explicitly-supplied non-empty password that authenticates
    as neither user nor owner raises WrongPassword.
  • CLI: a --password <pw> flag on info / dump / render / text /
    forms; render notes when a document is encrypted and no password was given.
  • Verified by new unit tests: a hand-built RC4 V2/R3-128 PDF with distinct user
    and owner passwords decrypts under either (owner via Algorithm 7 recovery),
    a wrong password returns WrongPassword, and the empty-password default open
    degrades without erroring (no corpus regression).
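
The owner-password recovery step for RC4 documents can be sketched as below. This is a minimal illustration rather than the code in crypt.rs: the function names `rc4`, `seal_user_password`, and `recover_user_password` are hypothetical, and the owner key is taken as an input because deriving it needs MD5, which is left out to keep the sketch dependency-free.

```rust
/// Plain RC4: key scheduling, then keystream XOR.
/// Encryption and decryption are the same operation.
fn rc4(key: &[u8], data: &[u8]) -> Vec<u8> {
    assert!(!key.is_empty(), "RC4 needs a non-empty key");
    let mut s = [0u8; 256];
    for (i, b) in s.iter_mut().enumerate() {
        *b = i as u8;
    }
    let mut j: u8 = 0;
    for i in 0..256 {
        j = j.wrapping_add(s[i]).wrapping_add(key[i % key.len()]);
        s.swap(i, j as usize);
    }
    let (mut i, mut j) = (0u8, 0u8);
    data.iter()
        .map(|&byte| {
            i = i.wrapping_add(1);
            j = j.wrapping_add(s[i as usize]);
            s.swap(i as usize, j as usize);
            byte ^ s[s[i as usize].wrapping_add(s[j as usize]) as usize]
        })
        .collect()
}

/// Hypothetical writer-side helper: how /O is produced from the padded
/// user password (one pass for R2, passes 0..=19 with key XOR counter for R>=3).
fn seal_user_password(owner_key: &[u8], padded_user: &[u8], revision: u8) -> Vec<u8> {
    let passes: u8 = if revision < 3 { 1 } else { 20 };
    let mut buf = padded_user.to_vec();
    for counter in 0..passes {
        let pass_key: Vec<u8> = owner_key.iter().map(|b| b ^ counter).collect();
        buf = rc4(&pass_key, &buf);
    }
    buf
}

/// Reader-side recovery: undo the passes in reverse counter order
/// (19 down to 0 for R>=3, a single pass for R2).
fn recover_user_password(owner_key: &[u8], o_entry: &[u8], revision: u8) -> Vec<u8> {
    let passes: u8 = if revision < 3 { 1 } else { 20 };
    let mut buf = o_entry.to_vec();
    for counter in (0..passes).rev() {
        let pass_key: Vec<u8> = owner_key.iter().map(|b| b ^ counter).collect();
        buf = rc4(&pass_key, &buf);
    }
    buf
}
```

Because RC4 is symmetric, running the passes in reverse counter order exactly undoes the writer's passes; the recovered bytes are the padded user password, which would then go through the usual user-password key derivation.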

Interactive forms (AcroForm)

Interactive form fields now have a field model and, crucially for a renderer,
generated appearances — a text or choice field whose producer left no /AP
stream (or set /NeedAppearances) is now drawn with its value, instead of
rendering blank. Still zero C/C++ dependencies.

  • Field model (zpdf-document/src/forms.rs, new): walks /Root /AcroForm
    /Fields (with /Kids recursion, cycle/depth guards), resolving the tree into
    terminal FormFields with fully-qualified names (/T partials joined by
    ".") and inherited /FT /V /DA /Ff /Q (PDF 12.7.3.2). Each field
    records its widget-annotation ids, its kind (Tx/Btn/Ch/Sig), value
    (string / name / multi-select list, UTF-16BE-aware), flags, /MaxLen, and
    /Opt. Exposed as PdfDocument::acro_form() and a new zpdf forms <file>
    CLI command that lists fields, types, and values.
  • Appearance generation (forms.rs + zpdf-content annotation painter):
    for text and choice fields needing one, a form-XObject appearance is
    synthesized and painted through the existing /AP path (a synthetic
    PdfStream replayed by do_form_xobject, so both CPU and wgpu backends
    render it with no backend changes). It honors the /DA font / size / color
    (size 0 auto-fits height then width), /Q justification (left / center /
    right), and the multiline, comb (/MaxLen cells), and list-box layout
    modes. The /DA font name resolves through the AcroForm /DR font resources,
    falling back to a synthesized standard Helvetica (load_form_fonts now also
    loads inline font dicts). Content is emitted as WinAnsi single-byte text.
  • Non-regressing by construction: generation fires only when the widget has
    no usable /AP or /NeedAppearances is set; an existing producer
    appearance is otherwise kept untouched. Buttons (checkbox/radio) keep their
    supplied /AP states — only the /AS selection is hardened to fall back to
    the field /V when /AS is absent. Password and push-button fields never
    generate. Bounded against adversarial forms (field-count / depth / value-length
    caps, visited-set cycle guard), consistent with the existing anti-hang budgets.
  • Verified by unit tests (field-tree FQN + inheritance + widget mapping, DA
    parsing, UTF-16BE values, comb/escape helpers) and end-to-end CPU render
    acceptance tests (a text field's value rasterizes to glyphs inside its rect
    via both the /DR font and the Helvetica fallback; an existing /AP is not
    overridden).
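
The field-tree walk described above (fully-qualified names, inheritance, cycle and depth guards) can be sketched roughly as follows. Everything here is an illustrative assumption, not the forms.rs implementation: `Node`, `Terminal`, `MAX_DEPTH`, and `collect_fields` are hypothetical names, the tree is a simple in-memory arena instead of PDF objects, only /T, /FT and /V are modelled, and any node without kids is treated as a terminal field.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory stand-in for a field dictionary.
#[derive(Default)]
struct Node {
    t: Option<String>,  // /T partial name
    ft: Option<String>, // /FT field type (inheritable)
    v: Option<String>,  // /V value (inheritable)
    kids: Vec<usize>,   // /Kids, as indices into the arena
}

/// A resolved terminal field.
struct Terminal {
    name: String,
    ft: Option<String>,
    v: Option<String>,
}

/// Assumed depth cap; the real budget is not stated in the notes.
const MAX_DEPTH: usize = 32;

fn walk(
    arena: &[Node],
    id: usize,
    prefix: &str,
    ft: Option<&str>,
    v: Option<&str>,
    depth: usize,
    visited: &mut HashSet<usize>,
    out: &mut Vec<Terminal>,
) {
    // Depth cap, bounds check, and visited-set cycle guard.
    if depth > MAX_DEPTH || id >= arena.len() || !visited.insert(id) {
        return;
    }
    let node = &arena[id];
    // Join /T partials with "." to build the fully-qualified name.
    let name = match &node.t {
        Some(t) if prefix.is_empty() => t.clone(),
        Some(t) => format!("{prefix}.{t}"),
        None => prefix.to_string(),
    };
    // A node's own entry wins; otherwise the nearest ancestor's is inherited.
    let ft = node.ft.as_deref().or(ft);
    let v = node.v.as_deref().or(v);
    if node.kids.is_empty() {
        out.push(Terminal {
            name,
            ft: ft.map(str::to_string),
            v: v.map(str::to_string),
        });
    } else {
        for &kid in &node.kids {
            walk(arena, kid, &name, ft, v, depth + 1, visited, out);
        }
    }
}

fn collect_fields(arena: &[Node], roots: &[usize]) -> Vec<Terminal> {
    let mut visited = HashSet::new();
    let mut out = Vec::new();
    for &root in roots {
        walk(arena, root, "", None, None, 0, &mut visited, &mut out);
    }
    out
}
```

A kid that points back at an ancestor is skipped by the visited set instead of recursing forever, which is the property the cycle guard is meant to give.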

Mesh shadings (types 4–7)

The four mesh shading types now decode and render, completing the shading
family (sh and shading-pattern fills), still with zero C/C++ dependencies.

  • Type 4 (free-form Gouraud triangle mesh) and Type 5 (lattice-form
    Gouraud mesh) — the packed vertex bit-stream is decoded MSB-first with
    per-vertex byte alignment; type 4 follows the edge-flag triangle strip
    (f=0 starts a triangle, f=1/f=2 reuse the previous triangle's vbc/vac
    side), type 5 triangulates /VerticesPerRow rows pairwise.
  • Type 6 (Coons patch mesh) and Type 7 (tensor-product patch mesh) —
    per-patch byte-aligned records with the f=1/2/3 shared-edge control-point /
    corner-colour reuse table; the Coons surface is evaluated directly as
    S = SC + SD − SB, the tensor surface as a bicubic over all 16 control
    points (interior points placed per the ISO §8.7.4.5.8 grid). Patches are
    tessellated into a triangle grid.
  • Implemented in a new zpdf-content/src/mesh.rs (decoder + tessellation) plus
    a Gouraud triangle rasterizer in shading.rs. Meshes rasterize through the
    existing shading→image path, so both the CPU and wgpu backends render them
    with no backend changes. Decoded with the spec's image-/Decode mapping;
    with a /Function the single parametric value per vertex is mapped through
    the function. Vertex colours are resolved to RGB then interpolated
    (barycentric per triangle, bilinear per patch) — matching pdf.js/pdfium.
  • Robustness: a 32-bit coordinate divisor computed in u64 (no overflow), a
    first-patch-flag≠0 guard, graceful truncation of incomplete trailing
    triangles/rows/patches, and a 2M-triangle ceiling — consistent with the
    existing anti-hang budgets. Verified by unit test vectors (hand-computed
    decode + interpolation) and CPU end-to-end render tests.
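
The vertex bit-stream decoding mentioned above can be illustrated with a small sketch: an MSB-first bit reader with byte alignment, plus the image-style /Decode mapping with the divisor held in u64. `BitReader` and `decode_component` are hypothetical names chosen for this example; the actual mesh.rs layout is not shown in these notes.

```rust
/// Hypothetical MSB-first bit reader over a packed mesh stream.
struct BitReader<'a> {
    data: &'a [u8],
    bit_pos: usize,
}

impl<'a> BitReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        Self { data, bit_pos: 0 }
    }

    /// Reads `n` bits (1..=32), most significant bit first.
    /// Returns None when the stream has fewer than `n` bits left,
    /// which lets a caller drop an incomplete trailing vertex.
    fn read_bits(&mut self, n: u32) -> Option<u32> {
        if n == 0 || n > 32 {
            return None;
        }
        let end = self.bit_pos.checked_add(n as usize)?;
        if end > self.data.len() * 8 {
            return None;
        }
        let mut value: u64 = 0;
        for _ in 0..n {
            let byte = self.data[self.bit_pos / 8];
            let bit = (byte >> (7 - (self.bit_pos % 8))) & 1;
            value = (value << 1) | u64::from(bit);
            self.bit_pos += 1;
        }
        Some(value as u32)
    }

    /// Skips padding so the next read starts on a byte boundary.
    fn align_to_byte(&mut self) {
        self.bit_pos = (self.bit_pos + 7) / 8 * 8;
    }
}

/// Maps a raw `bits`-wide sample into [dmin, dmax].
/// `bits` must be in 1..=32; the divisor 2^bits - 1 is computed in u64
/// so that bits == 32 does not overflow.
fn decode_component(raw: u32, bits: u32, dmin: f64, dmax: f64) -> f64 {
    let divisor = (1u64 << bits) - 1;
    dmin + (raw as f64) * (dmax - dmin) / (divisor as f64)
}
```

A decoder built on this would read the edge flag, coordinates, and colour components for one vertex, call `align_to_byte`, and stop cleanly as soon as `read_bits` returns None.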

Predefined CJK byte-encoded CMaps (Big5 / Shift-JIS / KSC / GBK / EUC-JP)

Completes the predefined-CMap support that previously covered only GBpc-EUC
(GB2312). The remaining legacy byte-encoded families — used by non-embedded CJK
fonts — no longer fall back to Identity-H (which produced wrong/blank glyphs);
they now decode correctly for both rendering and text extraction. Still zero
C/C++ dependencies.

  • New encodings: GBK (GBK-EUC, GBKp-EUC, GBK2K, GB-EUC), Big5
    (B5pc, ETen-B5, ETenms-B5, HKscs-B5), Shift-JIS (the *-RKSJ family:
    90ms/90msp/90pv/83pv/Add/Ext), EUC-KR / UHC (KSC-EUC,
    KSCms-UHC, KSCpc-EUC), and EUC-JP (EUC-H/V) — each in both -H and -V
    writing modes.
  • How it works: CidCMap gains a LegacyEncoding enum. Each encoding
    declares its codespace (so next_code segments mixed 1-/2-byte text — including
    the Shift-JIS single-byte half-width katakana block 0xA1–0xDF and the EUC-JP
    SS2 kana lead 0x8E) and a 2-byte → Unicode table. For a substituted
    (non-embedded) face the code is decoded to Unicode and the glyph resolves
    through the face's Unicode cmap; the system-font substitution already picks
    the right CJK face from the descendant's /CIDSystemInfo /Ordering
    (GB1/CNS1/Japan1/Korea1). 1-byte ASCII keeps a CID range for /W Latin
    advances; 2-byte CJK falls to /DW (full width), matching the GBpc precedent.
  • Tables: baked, sorted (u16, u16) slices (binary-searchable) generated by
    crates/zpdf-font/tools/gen_cjk_tables.py from the Python standard-library
    codecs (gbk, cp950, cp932, cp949, euc_jp) — the same technique used
    for the hand-baked gb2312.rs, so no new runtime dependency.
  • Scope: embedded fonts that use a predefined byte-encoded CMap (rare —
    embedders almost always re-encode to Identity-H) keep the existing CID path
    and are not in scope. Verified by unit tests (segmentation + decode +
    name classification per encoding) and end-to-end text/render round-trips
    for Big5, GBK, Shift-JIS (incl. half-width kana), EUC-KR, and EUC-JP.
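
To make the codespace segmentation and table lookup concrete, here is a minimal Shift-JIS sketch. It is an assumption-laden illustration, not the CidCMap code: the signatures of `next_code`, `to_unicode`, and `decode` are invented for this example, the lead-byte ranges are the commonly used 0x81–0x9F and 0xE0–0xFC, single bytes below 0x80 are treated as plain ASCII, and `SJIS_SAMPLE` holds only three entries where the generated tables would hold thousands.

```rust
/// Tiny illustrative slice of a sorted (code, Unicode) table.
const SJIS_SAMPLE: &[(u16, u16)] = &[
    (0x8140, 0x3000), // ideographic space
    (0x82A0, 0x3042), // hiragana A
    (0x88EA, 0x4E00), // CJK ideograph "one"
];

/// Splits off the next code: lead bytes start a 2-byte code,
/// everything else (ASCII, half-width katakana) is a 1-byte code.
fn next_code(bytes: &[u8]) -> Option<(u16, usize)> {
    let &first = bytes.first()?;
    let is_lead = matches!(first, 0x81..=0x9F | 0xE0..=0xFC);
    match (is_lead, bytes.get(1)) {
        (true, Some(&second)) => Some(((u16::from(first) << 8) | u16::from(second), 2)),
        // A truncated 2-byte code at end of input is consumed as one byte.
        _ => Some((u16::from(first), 1)),
    }
}

/// Maps a segmented code to a BMP code point.
fn to_unicode(code: u16, len: usize, table: &[(u16, u16)]) -> Option<u16> {
    match (len, code) {
        (1, 0x00..=0x7F) => Some(code),
        // Half-width katakana block 0xA1..=0xDF maps linearly from U+FF61.
        (1, 0xA1..=0xDF) => Some(0xFF61 + (code - 0xA1)),
        (2, _) => table
            .binary_search_by_key(&code, |&(c, _)| c)
            .ok()
            .map(|i| table[i].1),
        _ => None,
    }
}

/// Decodes a byte string, substituting U+FFFD for unmapped codes.
fn decode(bytes: &[u8], table: &[(u16, u16)]) -> String {
    let mut out = String::new();
    let mut pos = 0;
    while let Some((code, len)) = next_code(&bytes[pos..]) {
        let ch = to_unicode(code, len, table)
            .and_then(|u| char::from_u32(u32::from(u)))
            .unwrap_or('\u{FFFD}');
        out.push(ch);
        pos += len;
    }
    out
}
```

Keeping the table as a sorted slice means the lookup is a plain binary search with no allocation or hashing at load time, which fits the baked-table approach described above.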

DeviceCMYK colour fidelity

DeviceCMYK without an ICC profile previously used the crude (1−c)(1−k)
conversion, which renders oversaturated, unlike a reference viewer. It now uses
the Adobe DeviceCMYK→sRGB polynomial approximation (fitted to US Web Coated
SWOP — the same one Acrobat and pdf.js use), so colours match a reference
renderer. Pure cyan goes from (0, 255, 255) to (0, 185, 242); pure yellow
to (255, 235, 61). Most visibly, 100 % K renders as a dark near-black
(44, 46, 53), not pure black: ink impurity, matching Acrobat.
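
For reference, the old (1−c)(1−k) conversion that this release replaces can be written in a few lines. The function name `naive_cmyk_to_rgb` is hypothetical, and the rounding to 8-bit is an assumption; the polynomial that replaces it is not reproduced here because its coefficients are not given in these notes.

```rust
/// The crude conversion: each RGB channel is (1 - ink) * (1 - k),
/// with inputs clamped to 0..1 and the result scaled to 8 bits.
fn naive_cmyk_to_rgb(c: f32, m: f32, y: f32, k: f32) -> [u8; 3] {
    let [c, m, y, k] = [c, m, y, k].map(|v| v.clamp(0.0, 1.0));
    let channel = |ink: f32| ((1.0 - ink) * (1.0 - k) * 255.0).round() as u8;
    [channel(c), channel(m), channel(y)]
}
```

With this formula pure cyan lands on the fully saturated (0, 255, 255) and 100 % K on pure black, which is the behaviour described above as oversaturated compared with a reference viewer.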

  • Single source of truth in zpdf_color::cmyk_to_rgb (inputs clamped to 0..1).
    Applies to DeviceCMYK fills/strokes, raw CMYK images (Flate/LZW, via
    zpdf-image, which delegates), Indexed-over-CMYK palettes, Separation/DeviceN
    tint transforms whose alternate space is DeviceCMYK, and — so the filter
    pipeline stays consistent — the Adobe-YCCK JPEG decode arm in zpdf-parser
    (ycck_to_rgb now recovers the true CMYK and runs the polynomial instead of
    the old (1−c)(1−k) ink weighting). No new third-party dependency.
  • Unchanged: DeviceCMYK with an ICC/De...

v0.5.0

15 Jun 10:22

What's Changed

New Contributors

Full Changelog: v0.4.0...v0.5.0

v0.4.0

11 Jun 15:32

What's Changed

New Contributors

Full Changelog: https://github.com/Xero-Team/zpdf/commits/v0.4.0
