zpdf v0.6.0
A feature release, all with zero C/C++ dependencies: interactive AcroForm
support with generated field appearances, password-protected document
decryption (user/owner passwords), type 4–7 mesh shadings, the full predefined
CJK CMap families (Big5 / Shift-JIS / KSC / GBK / EUC-JP), and DeviceCMYK colour
fidelity.
Password-protected documents (non-empty user/owner password)
Encrypted PDFs that require a password now open with one supplied — previously
only the empty-password case decrypted. The cryptographic core (RC4/AES-128/256,
MD5/SHA-2, the R6 hardened hash, key-derivation Algorithms 2 / 2.A / 2.B) was
already in place and tested; this wires a real password through it. Still zero
C/C++ dependencies.
- New API: PdfDocument::open_with_password(data, pw) (and _and_limits);
  PdfFile::parse_with_password; PdfDocument::is_encrypted(). A wrong password
  returns the new Error::WrongPassword; the password may be the user or the
  owner password. The default open() is unchanged.
- Owner-password recovery (zpdf-parser/src/crypt.rs): RC4 documents now
  authenticate the owner password too. Algorithm 7 derives the owner key,
  RC4-decrypts /O to recover the user password (single pass for R2, the 20
  reverse-counter passes for R≥3), then re-derives the file key via
  Algorithm 2. authenticate_rc4 tries the password as user (Algorithm 6) then
  owner. V5 (AES-256) already had the owner path; it now uses the supplied
  password.
- Robustness preserved: the empty-password default open stays lenient; an RC4
  document whose /U doesn't validate under the empty password still opens
  best-effort (with a warning), so the malformed/adversarial corpus is
  unaffected. Only an explicitly-supplied non-empty password that
  authenticates as neither user nor owner raises WrongPassword.
- CLI: a --password <pw> flag on info / dump / render / text / forms; render
  notes when a document is encrypted and no password was given.
- Verified by new unit tests: a hand-built RC4 V2/R3-128 PDF with distinct user
  and owner passwords decrypts under either (owner via Algorithm 7 recovery),
  a wrong password returns WrongPassword, and the empty-password default open
  degrades without erroring (no corpus regression).
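
The owner-password recovery step described above (RC4-decrypting /O in a single pass for R2, or 20 reverse-counter passes for R≥3) can be sketched as follows. This is a minimal illustration, not the code in crypt.rs: the names rc4, recover_user_password and wrap_for_test are hypothetical, and the owner key is assumed to have been derived already (the MD5-based part of Algorithm 7 is omitted).

```rust
/// Plain RC4 (key scheduling + keystream XOR). Encrypting and decrypting
/// are the same operation.
fn rc4(key: &[u8], data: &[u8]) -> Vec<u8> {
    assert!(!key.is_empty(), "RC4 key must not be empty");
    // Key scheduling: permute the identity state using the key bytes.
    let mut s = [0u8; 256];
    for (i, v) in s.iter_mut().enumerate() {
        *v = i as u8;
    }
    let mut j: u8 = 0;
    for i in 0..256 {
        j = j.wrapping_add(s[i]).wrapping_add(key[i % key.len()]);
        s.swap(i, j as usize);
    }
    // Keystream generation, XORed into the data byte by byte.
    let (mut i, mut j) = (0u8, 0u8);
    data.iter()
        .map(|&b| {
            i = i.wrapping_add(1);
            j = j.wrapping_add(s[i as usize]);
            s.swap(i as usize, j as usize);
            b ^ s[s[i as usize].wrapping_add(s[j as usize]) as usize]
        })
        .collect()
}

/// Hypothetical helper: recover the padded user password from the /O entry,
/// given an already-derived owner key.
fn recover_user_password(owner_key: &[u8], o_entry: &[u8], revision: u32) -> Vec<u8> {
    if revision < 3 {
        // R2: a single RC4 pass with the owner key.
        return rc4(owner_key, o_entry);
    }
    // R>=3: 20 passes, counter running 19 down to 0, each pass keyed with
    // the owner key XORed byte-wise with the counter.
    let mut buf = o_entry.to_vec();
    for counter in (0u8..20).rev() {
        let round_key: Vec<u8> = owner_key.iter().map(|b| b ^ counter).collect();
        buf = rc4(&round_key, &buf);
    }
    buf
}

/// Forward direction (counter 0, then 1..=19), used here only to build a
/// test /O value to round-trip against.
fn wrap_for_test(owner_key: &[u8], padded_user: &[u8], revision: u32) -> Vec<u8> {
    let mut buf = rc4(owner_key, padded_user);
    if revision >= 3 {
        for counter in 1u8..20 {
            let round_key: Vec<u8> = owner_key.iter().map(|b| b ^ counter).collect();
            buf = rc4(&round_key, &buf);
        }
    }
    buf
}
```

The recovered bytes are the padded user password, which is then fed through the normal user-password key derivation (Algorithm 2 in the notes above).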
Interactive forms (AcroForm)
Interactive form fields now have a field model and, crucially for a renderer,
generated appearances — a text or choice field whose producer left no /AP
stream (or set /NeedAppearances) is now drawn with its value, instead of
rendering blank. Still zero C/C++ dependencies.
- Field model (zpdf-document/src/forms.rs, new): walks
  /Root /AcroForm /Fields (with /Kids recursion, cycle/depth guards),
  resolving the tree into terminal FormFields with fully-qualified names
  (/T partials joined by .) and inherited /FT, /V, /DA, /Ff, /Q
  (PDF 12.7.3.2). Each field records its widget-annotation ids, its kind
  (Tx/Btn/Ch/Sig), value (string / name / multi-select list, UTF-16BE-aware),
  flags, /MaxLen, and /Opt. Exposed as PdfDocument::acro_form() and a new
  zpdf forms <file> CLI command that lists fields, types, and values.
- Appearance generation (forms.rs + zpdf-content annotation painter): for text
  and choice fields needing one, a form-XObject appearance is synthesized and
  painted through the existing /AP path (a synthetic PdfStream replayed by
  do_form_xobject, so both CPU and wgpu backends render it with no backend
  changes). It honors the /DA font / size / color (size 0 auto-fits height
  then width), /Q justification (left / center / right), and the multiline,
  comb (/MaxLen cells), and list-box layout modes. The /DA font name resolves
  through the AcroForm /DR font resources, falling back to a synthesized
  standard Helvetica (load_form_fonts now also loads inline font dicts).
  Content is emitted as WinAnsi single-byte text.
- Non-regressing by construction: generation fires only when the widget has
  no usable /AP or /NeedAppearances is set; an existing producer appearance
  is otherwise kept untouched. Buttons (checkbox/radio) keep their supplied
  /AP states; only the /AS selection is hardened to fall back to the field
  /V when /AS is absent. Password and push-button fields never generate.
  Bounded against adversarial forms (field-count / depth / value-length caps,
  visited-set cycle guard), consistent with the existing anti-hang budgets.
- Verified by unit tests (field-tree FQN + inheritance + widget mapping, DA
  parsing, UTF-16BE values, comb/escape helpers) and end-to-end CPU render
  acceptance tests (a text field's value rasterizes to glyphs inside its rect
  via both the /DR font and the Helvetica fallback; an existing /AP is not
  overridden).
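
The field-tree resolution described above (joining /T partials with a dot, inheriting /FT from ancestors, and guarding against cycles and excessive depth) might look roughly like the sketch below. Everything here is illustrative: Node, Terminal, collect_fields and MAX_DEPTH are hypothetical names, the tree is modelled as an index arena rather than real PDF objects, and only /FT is inherited to keep it short.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a field dictionary (not zpdf's actual type).
struct Node {
    t: Option<&'static str>,  // partial name (/T)
    ft: Option<&'static str>, // field type (/FT), inheritable
    kids: Vec<usize>,         // indices of child nodes (/Kids)
}

/// A resolved terminal field: fully-qualified name plus effective /FT.
#[derive(Debug, PartialEq)]
struct Terminal {
    name: String,
    ft: Option<&'static str>,
}

/// Illustrative depth cap; the real budget is not stated in these notes.
const MAX_DEPTH: usize = 32;

fn collect_fields(nodes: &[Node], roots: &[usize]) -> Vec<Terminal> {
    let mut out = Vec::new();
    let mut visited = HashSet::new();
    for &root in roots {
        walk(nodes, root, "", None, 0, &mut visited, &mut out);
    }
    out
}

fn walk(
    nodes: &[Node],
    idx: usize,
    prefix: &str,
    inherited_ft: Option<&'static str>,
    depth: usize,
    visited: &mut HashSet<usize>,
    out: &mut Vec<Terminal>,
) {
    // Depth cap, bounds check, and visited-set cycle guard in one place.
    if depth > MAX_DEPTH || idx >= nodes.len() || !visited.insert(idx) {
        return;
    }
    let node = &nodes[idx];
    // Join /T partials with '.'; a node without /T contributes nothing.
    let name = match (prefix.is_empty(), node.t) {
        (_, None) => prefix.to_string(),
        (true, Some(t)) => t.to_string(),
        (false, Some(t)) => format!("{}.{}", prefix, t),
    };
    // The node's own /FT wins; otherwise inherit from the nearest ancestor.
    let ft = node.ft.or(inherited_ft);
    if node.kids.is_empty() {
        out.push(Terminal { name, ft });
    } else {
        for &kid in &node.kids {
            walk(nodes, kid, &name, ft, depth + 1, visited, out);
        }
    }
}
```

One simplification to be aware of: in a real document the kids of a terminal field can be widget annotations without /T, which this sketch does not distinguish from child fields.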
Mesh shadings (types 4–7)
The four mesh shading types now decode and render, completing the shading
family (sh and shading-pattern fills), still with zero C/C++ dependencies.
- Type 4 (free-form Gouraud triangle mesh) and Type 5 (lattice-form Gouraud
  mesh): the packed vertex bit-stream is decoded MSB-first with per-vertex
  byte alignment; type 4 follows the edge-flag triangle strip (f=0 starts a
  triangle, f=1 / f=2 reuse the previous triangle's vbc / vac side), type 5
  triangulates /VerticesPerRow rows pairwise.
- Type 6 (Coons patch mesh) and Type 7 (tensor-product patch mesh): per-patch
  byte-aligned records with the f=1/2/3 shared-edge control-point /
  corner-colour reuse table; the Coons surface is evaluated directly as
  S = SC + SD − SB, the tensor surface as a bicubic over all 16 control
  points (interior points placed per the ISO §8.7.4.5.8 grid). Patches are
  tessellated into a triangle grid.
- Implemented in a new zpdf-content/src/mesh.rs (decoder + tessellation) plus
  a Gouraud triangle rasterizer in shading.rs. Meshes rasterize through the
  existing shading→image path, so both the CPU and wgpu backends render them
  with no backend changes. Decoded with the spec's image-/Decode mapping;
  with a /Function the single parametric value per vertex is mapped through
  the function. Vertex colours are resolved to RGB then interpolated
  (barycentric per triangle, bilinear per patch), matching pdf.js/pdfium.
- Robustness: a 32-bit coordinate divisor computed in u64 (no overflow), a
  first-patch-flag≠0 guard, graceful truncation of incomplete trailing
  triangles/rows/patches, and a 2M-triangle ceiling, consistent with the
  existing anti-hang budgets. Verified by unit test vectors (hand-computed
  decode + interpolation) and CPU end-to-end render tests.
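
To make the vertex decoding above concrete, here is a small sketch of an MSB-first bit reader with byte alignment, plus the /Decode-style mapping of a raw sample onto a range with the divisor computed in u64. BitReader and decode_sample are hypothetical names chosen for illustration; this is not the code in mesh.rs.

```rust
/// MSB-first bit reader over a byte slice (illustrative, not zpdf's type).
struct BitReader<'a> {
    data: &'a [u8],
    bit_pos: usize, // absolute bit offset from the start of `data`
}

impl<'a> BitReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        Self { data, bit_pos: 0 }
    }

    /// Read `n` bits (1..=32), most significant bit first.
    /// Returns None when the stream is truncated, so the caller can stop
    /// gracefully instead of panicking.
    fn read_bits(&mut self, n: u32) -> Option<u32> {
        if n == 0 || n > 32 || self.bit_pos + n as usize > self.data.len() * 8 {
            return None;
        }
        let mut v: u64 = 0;
        for _ in 0..n {
            let byte = self.data[self.bit_pos / 8];
            let bit = (byte >> (7 - self.bit_pos % 8)) & 1;
            v = (v << 1) | bit as u64;
            self.bit_pos += 1;
        }
        Some(v as u32)
    }

    /// Skip any remaining bits so the next read starts on a byte boundary.
    fn align(&mut self) {
        self.bit_pos = (self.bit_pos + 7) / 8 * 8;
    }
}

/// Map a raw `bits`-wide sample onto [dmin, dmax].
fn decode_sample(raw: u32, bits: u32, dmin: f64, dmax: f64) -> f64 {
    assert!((1..=32).contains(&bits), "bits must be in 1..=32");
    // 2^bits - 1 computed in u64, so bits == 32 does not overflow.
    let divisor = (1u64 << bits) - 1;
    dmin + raw as f64 * (dmax - dmin) / divisor as f64
}
```

A decoder would typically read a flag and the coordinate and colour samples for one vertex, call align(), and repeat, treating a None from read_bits as the end of usable data.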
Predefined CJK byte-encoded CMaps (Big5 / Shift-JIS / KSC / GBK / EUC-JP)
Completes the predefined-CMap support that previously covered only GBpc-EUC
(GB2312). The remaining legacy byte-encoded families — used by non-embedded CJK
fonts — no longer fall back to Identity-H (which produced wrong/blank glyphs);
they now decode correctly for both rendering and text extraction. Still zero
C/C++ dependencies.
- New encodings: GBK (GBK-EUC, GBKp-EUC, GBK2K, GB-EUC), Big5 (B5pc, ETen-B5,
  ETenms-B5, HKscs-B5), Shift-JIS (the *-RKSJ family: 90ms / 90msp / 90pv /
  83pv / Add / Ext), EUC-KR / UHC (KSC-EUC, KSCms-UHC, KSCpc-EUC), and EUC-JP
  (EUC-H/V), each in both -H and -V writing modes.
- How it works: CidCMap gains a LegacyEncoding enum. Each encoding declares
  its codespace (so next_code segments mixed 1-/2-byte text, including the
  Shift-JIS single-byte half-width katakana block 0xA1–0xDF and the EUC-JP
  SS2 kana lead 0x8E) and a 2-byte → Unicode table. For a substituted
  (non-embedded) face the code is decoded to Unicode and the glyph resolves
  through the face's Unicode cmap; the system-font substitution already picks
  the right CJK face from the descendant's /CIDSystemInfo /Ordering
  (GB1/CNS1/Japan1/Korea1). 1-byte ASCII keeps a CID range for /W Latin
  advances; 2-byte CJK falls to /DW (full width), matching the GBpc
  precedent.
- Tables: baked, sorted (u16, u16) slices (binary-searchable) generated by
  crates/zpdf-font/tools/gen_cjk_tables.py from the Python standard-library
  codecs (gbk, cp950, cp932, cp949, euc_jp), the same technique used for the
  hand-baked gb2312.rs, so no new runtime dependency.
- Scope: embedded fonts that use a predefined byte-encoded CMap (rare;
  embedders almost always re-encode to Identity-H) keep the existing CID path
  and are not in scope. Verified by unit tests (segmentation + decode +
  name classification per encoding) and end-to-end text / render round-trips
  for Big5, GBK, Shift-JIS (incl. half-width kana), EUC-KR, and EUC-JP.
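
As an illustration of the segmentation and table lookup described above, the sketch below splits Shift-JIS bytes into 1- and 2-byte codes and resolves 2-byte codes by binary search over a sorted (u16, u16) slice. It is a simplified model, not the CidCMap implementation: SJIS_SAMPLE is a two-entry excerpt standing in for the generated table, the codespace ranges are a simplification, and next_code, to_unicode and decode use hypothetical signatures.

```rust
/// Tiny illustrative excerpt of a sorted (code, unicode) table.
/// The real tables are generated and far larger.
const SJIS_SAMPLE: &[(u16, u16)] = &[
    (0x82A0, 0x3042), // HIRAGANA LETTER A
    (0x82A2, 0x3044), // HIRAGANA LETTER I
];

/// Split off the next code using a simplified Shift-JIS codespace:
/// lead bytes 0x81..=0x9F and 0xE0..=0xFC start a 2-byte code, everything
/// else is a single byte. A lead byte with no trail byte is returned alone.
fn next_code(bytes: &[u8]) -> Option<(u16, usize)> {
    let &b0 = bytes.first()?;
    if matches!(b0, 0x81..=0x9F | 0xE0..=0xFC) {
        if let Some(&b1) = bytes.get(1) {
            return Some((((b0 as u16) << 8) | b1 as u16, 2));
        }
    }
    Some((b0 as u16, 1))
}

/// Map a segmented code to Unicode; None when it is not covered.
fn to_unicode(code: u16, len: usize) -> Option<char> {
    let cp = if len == 1 {
        match code {
            // ASCII passes through unchanged (a simplification).
            0x00..=0x7F => code as u32,
            // Single-byte half-width katakana block maps onto U+FF61..
            0xA1..=0xDF => 0xFF61 + (code as u32 - 0xA1),
            _ => return None,
        }
    } else {
        // Binary search over the sorted table, keyed on the code.
        let i = SJIS_SAMPLE.binary_search_by_key(&code, |&(c, _)| c).ok()?;
        SJIS_SAMPLE[i].1 as u32
    };
    char::from_u32(cp)
}

/// Decode a byte string, substituting U+FFFD for unmapped codes.
fn decode(bytes: &[u8]) -> String {
    let mut out = String::new();
    let mut rest = bytes;
    while let Some((code, len)) = next_code(rest) {
        out.push(to_unicode(code, len).unwrap_or('\u{FFFD}'));
        rest = &rest[len..];
    }
    out
}
```

Keeping the table as a sorted slice means lookups need no allocation or hashing at runtime, which fits the baked-table approach the notes describe.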
DeviceCMYK colour fidelity
DeviceCMYK without an ICC profile previously used the crude (1−c)(1−k)
conversion, which renders oversaturated, unlike a reference viewer. It now uses
the Adobe DeviceCMYK→sRGB polynomial approximation (fitted to US Web Coated
SWOP — the same one Acrobat and pdf.js use), so colours match a reference
renderer. Pure cyan goes from (0, 255, 255) to (0, 185, 242); pure yellow
to (255, 235, 61). Most visibly, 100 % K renders as a dark near-black
(44, 46, 53), not pure black — ink impurity, matching Acrobat.
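
For reference, the old (1−c)(1−k) conversion mentioned above is easy to reproduce. The sketch below shows only that naive formula, under the assumption that each channel is scaled to 0..255 and rounded; naive_cmyk_to_rgb is a hypothetical name, and the polynomial coefficients used by the new path are not reproduced here.

```rust
/// The naive DeviceCMYK -> RGB conversion: each channel is
/// (1 - ink) * (1 - k), scaled to 0..=255. Inputs are clamped to 0..1.
fn naive_cmyk_to_rgb(c: f32, m: f32, y: f32, k: f32) -> (u8, u8, u8) {
    let black = 1.0 - k.clamp(0.0, 1.0);
    let channel = |ink: f32| {
        let v = (1.0 - ink.clamp(0.0, 1.0)) * black;
        (v * 255.0).round() as u8
    };
    (channel(c), channel(m), channel(y))
}
```

With this formula pure cyan comes out as (0, 255, 255) and 100 % K as (0, 0, 0), which are the fully saturated results that the polynomial approximation replaces.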
- Single source of truth in zpdf_color::cmyk_to_rgb (inputs clamped to 0..1).
  Applies to DeviceCMYK fills/strokes, raw CMYK images (Flate/LZW, via
  zpdf-image, which delegates), Indexed-over-CMYK palettes, Separation/DeviceN
  tint transforms whose alternate space is DeviceCMYK, and (so the filter
  pipeline stays consistent) the Adobe-YCCK JPEG decode arm in zpdf-parser
  (ycck_to_rgb now recovers the true CMYK and runs the polynomial instead of
  the old (1−c)(1−k) ink weighting). No new third-party dependency.
- Unchanged: DeviceCMYK with an ICC/Default profile converts through the
  moxcms transform (already accurate). Plain Adobe-CMYK JPEGs (APP14 transform
  0/1) are still colour-converted internally by zune-jpeg, which has no
  raw-CMYK output arm we can intercept; this is a minor residual non-fidelity
  for that one path.
- Verified against the pdf.js coefficients, cross-checked numerically, with an
  end-to-end CMYK render whose pixels match the polynomial, and per-encoding
  YCCK unit vectors (white/black/gray/colour) against hand-computed references.
Full Changelog: v0.5.0...v0.6.0