
fix: eliminate panic surfaces on untrusted PDF input#70

Merged: pratyush618 merged 2 commits into main from fix/panic-surfaces on Apr 24, 2026

@pratyush618
Collaborator

Summary

Third batch from the codebase audit. Closes the last 🔴 high-priority security item: panic-on-input sites in the PDF engine.

Malformed or malicious PDFs could turn table detection, stamping, watermarking, bookmarks, and PDF/UA validation into denial-of-service primitives through unwrap()-based error handling on lopdf accessors and partial_cmp on NaN coordinates. Every site is now either a structured PdfError or a demonstrably-safe direct operation.

Changes

fix(table) — NaN-safe coordinate sorting

  • table/{grid,lattice,stream}.rs — swap partial_cmp(..).unwrap() for f64::total_cmp. NaN coords from degenerate content-stream math no longer panic.
  • table/grid.rs — replace first()/last().unwrap() with direct indexing guarded by the existing length check.
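The two table changes can be illustrated with a minimal sketch. The function names below are hypothetical and are not taken from the paperjam-core sources; only `f64::total_cmp` and the guarded-indexing idea come from the description above.

```rust
/// Sort coordinates with a total order. Unlike
/// `partial_cmp(..).unwrap()`, `total_cmp` never panics on NaN.
fn sort_coords(coords: &mut [f64]) {
    coords.sort_by(f64::total_cmp);
}

/// Hypothetical helper: return the (min, max) of an already sorted
/// slice. The emptiness guard makes the direct indexing below safe,
/// so no `first()/last().unwrap()` is needed.
fn span(sorted: &[f64]) -> Option<(f64, f64)> {
    if sorted.is_empty() {
        return None;
    }
    Some((sorted[0], sorted[sorted.len() - 1]))
}
```

With `total_cmp`, NaN values are ordered at the extremes (negative NaN first, positive NaN last) instead of aborting the sort, so callers that care about NaN still need to filter it out themselves.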

fix(core) — structured errors for PDF dict access

  • stamp/mod.rs — two get_object_mut().unwrap() + as_dict_mut().unwrap() pairs now return PdfError::Annotation.
  • watermark/mod.rs — same pattern on six sites plus four from_utf8(..).unwrap() calls replaced by const &str resource names (no runtime decoding).
  • bookmarks/mod.rs: child_ids.last().unwrap() folded into an if let (Some, Some) over both ends.
  • validation/pdf_ua.rs: elem_dict.get(b"K").unwrap() replaced by pattern-binding the matched dict.
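As a rough sketch of the unwrap-to-`?` conversion described in this list: the types below are simplified stand-ins written for illustration. They are not lopdf's `Document`/`Object` nor paperjam-core's real `PdfError`, and `set_rotation` is a hypothetical caller.

```rust
use std::collections::HashMap;

// Stand-in error type; the real PdfError has more variants.
#[derive(Debug, PartialEq)]
enum PdfError {
    Annotation(String),
}

// Stand-in for a PDF object: either a dictionary or something else.
#[derive(Debug)]
enum Object {
    Dictionary(HashMap<String, i64>),
    Integer(i64),
}

// Stand-in for a document holding objects by id.
struct Document {
    objects: HashMap<u32, Object>,
}

impl Document {
    /// Fallible lookup: a missing object is an error, not a panic.
    fn get_object_mut(&mut self, id: u32) -> Result<&mut Object, PdfError> {
        self.objects
            .get_mut(&id)
            .ok_or_else(|| PdfError::Annotation(format!("object {id} not found")))
    }
}

impl Object {
    /// Fallible downcast: a non-dict object is an error, not a panic.
    fn as_dict_mut(&mut self) -> Result<&mut HashMap<String, i64>, PdfError> {
        match self {
            Object::Dictionary(dict) => Ok(dict),
            _ => Err(PdfError::Annotation("expected a dictionary".to_string())),
        }
    }
}

/// Hypothetical caller: both failure modes propagate through `?`
/// instead of `get_object_mut(..).unwrap().as_dict_mut().unwrap()`.
fn set_rotation(doc: &mut Document, page_id: u32, degrees: i64) -> Result<(), PdfError> {
    let dict = doc.get_object_mut(page_id)?.as_dict_mut()?;
    dict.insert("Rotate".to_string(), degrees);
    Ok(())
}
```

The point of the shape is that a malformed document (missing object, wrong object type) surfaces as an `Err` the caller can report, rather than taking down the process.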

Residual unwrap audit

After this PR, grep -n "unwrap()" across table/, stamp/, watermark/, bookmarks/, validation/ in paperjam-core returns zero hits. The remaining unwraps across paperjam-core (encryption key derivation, Mutex locks) are either genuinely infallible (compile-time-known key lengths) or a separate refactor (poison-safe locking).
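For context on the deferred Mutex item, one common shape for poison-safe locking in Rust is sketched below. This is only an assumption about what such a refactor could look like; the PR does not say which approach paperjam-core will take, and `lock_ignoring_poison` is a hypothetical name.

```rust
use std::sync::{Mutex, MutexGuard};

/// Acquire the lock even if a previous holder panicked.
/// `lock().unwrap()` would panic on a poisoned mutex; here the guard
/// is recovered from the PoisonError instead.
fn lock_ignoring_poison<T>(mutex: &Mutex<T>) -> MutexGuard<'_, T> {
    mutex
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}
```

Recovering the guard is only appropriate when the protected data cannot be left in a broken state by a panic; otherwise returning an error to the caller is the safer choice.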

What's still outstanding from the audit

  • Layer discipline (paperjam-epub → paperjam-html): needs direction: amend CLAUDE.md or extract shared helper?
  • Rust test gap — 13/15 crates still have no tests.
  • paperjam-studio scope — currently a static server, not wired to the engine.
  • Easy polish cluster (rust-toolchain.toml, CHANGELOG, crate docs, paperjam-async feature drag, justfile, etc.).

Test plan

  • cargo test --workspace — 11 existing tests still pass, no regressions
  • cargo clippy --workspace --all-targets -- -D warnings
  • cargo fmt --all --check
  • uv run pytest tests/python/ — 88 passed, 4 skipped
  • pre-commit run --all-files — all hooks pass

Table detection sorted PDF-derived f64 coordinates with
`partial_cmp(..).unwrap()`, which panics if any coordinate is NaN.
Malformed PDF content streams can produce NaN in computed
coordinates (e.g. degenerate matrix ops during text positioning),
turning a parsing routine into a denial-of-service primitive.

Switched all three table-extraction sort sites (grid.rs, lattice.rs,
stream.rs) to `f64::total_cmp`, which provides a total ordering
without panicking on NaN. Also replaced the `first()/last().unwrap()`
bbox construction in grid.rs with direct indexing; the preceding
length guard makes the invariant local and explicit.

`get_object_mut().unwrap()`, `as_dict_mut().unwrap()`, and
`from_utf8().unwrap()` were used throughout the stamp, watermark,
bookmarks, and PDF/UA validation paths. The lopdf accessors return
Results — a malformed PDF that fails an invariant (missing page
object, non-dict where a dict was expected, cycle in /K tree) would
hit one of these and panic the process, rather than returning a
structured PdfError.

Converted each site to `?`-chained Result handling with the existing
PdfError variants (Annotation / Watermark / etc.). The watermark
module also replaced two `from_utf8(bytes).unwrap()` calls with
top-level `const &str` names so the ASCII resource names are the
source of truth and no runtime UTF-8 decoding is needed. The
bookmarks and PDF/UA changes are structurally equivalent idiomatic
rewrites (pattern-bind the child list, fold first/last checks into
one `if let`).
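The two idioms mentioned in this commit message, folding the first/last checks into one `if let` and using `const &str` resource names, might look roughly like the following. The constant value and function names are hypothetical; the actual names in watermark/mod.rs and bookmarks/mod.rs are not shown in this PR.

```rust
// Hypothetical resource name. Keeping it as a `&str` constant means
// no `from_utf8(bytes).unwrap()` is needed when building content.
const WATERMARK_GS_NAME: &str = "GS_Watermark";

/// Bind both ends of the child list in a single pattern. An empty
/// list falls through to `None` instead of panicking on `unwrap()`.
fn first_and_last(child_ids: &[u32]) -> Option<(u32, u32)> {
    if let (Some(&first), Some(&last)) = (child_ids.first(), child_ids.last()) {
        Some((first, last))
    } else {
        None
    }
}

/// Build a content-stream operator from the constant name.
fn resource_operator() -> String {
    format!("/{WATERMARK_GS_NAME} gs")
}
```

For a single-element list both bindings refer to the same element, which matches what separate `first()` and `last()` calls would return.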

github-actions bot added the rust label (Pull requests that update rust code) on Apr 24, 2026
pratyush618 merged commit 749ee14 into main on Apr 24, 2026
13 checks passed
pratyush618 deleted the fix/panic-surfaces branch on April 24, 2026 10:27
pratyush618 restored the fix/panic-surfaces branch on April 24, 2026 10:27

pratyush618 added a commit that referenced this pull request on Apr 24, 2026:

* chore: add rust-toolchain and justfile for consistent dev tooling

rust-toolchain.toml pins every contributor and CI invocation to the
same stable toolchain with rustfmt, clippy, and the
wasm32-unknown-unknown target. Previously CI used
dtolnay/rust-toolchain@stable while contributors installed their own;
minor version drift between them could produce clippy lint
discrepancies at merge time.

justfile captures the common build / test / lint commands documented
in CLAUDE.md as executable recipes. `just` (no args) prints the full
list, and the common flows (build, test, check, fmt, clean-all) are
one step each so local iteration matches the pre-commit chain.

* chore(async): stop force-enabling signatures/validation on core

paperjam-async currently only reaches into paperjam_core::render, yet
its manifest force-enabled the signatures and validation features on
paperjam-core for every consumer. Downstream crates that need those
features (paperjam-py does, explicitly) keep working unchanged;
lightweight async consumers no longer drag in the x509-parser / cms /
rsa / p256 / sha1 / pkcs8 / spki / ureq / rustls / roxmltree tree.

* docs: crate-level rustdoc across the workspace

Every library crate now has a `//!` summary describing its scope,
its entry points, and how it fits into the broader paperjam
ecosystem. Uniform style: plain prose, no intra-doc links in
crate-level summaries (simpler to maintain, no rustdoc link
warnings to manage).

Also fixes two pre-existing rustdoc warnings uncovered along the
way: an `[OPTIONAL]` literal in signature/tsa.rs that rustdoc was
parsing as an intra-doc link, and a bare URL in model/annotations.rs
flagged for auto-linking. The PyO3 `PyDocument` and `PyPage` classes
get class-level docs that clarify they are the native layer beneath
the pure-Python `paperjam.Document` / `paperjam.Page` wrappers.

After this commit `cargo doc --workspace --no-deps` produces zero
warnings.

* chore(ci): run docs workflow on PRs and install wasm-opt

The docs workflow previously fired only on pushes to main, so docs
regressions (broken wasm builds, Docusaurus compile errors, bad
links) were invisible until after merge. Now PRs with matching
paths run the full build (without deploying) so problems surface in
the PR check run.

Also installs binaryen, whose wasm-opt binary wasm-pack invokes
automatically when present on PATH. Release-mode WASM bundles
shrink by 20-30% with no code changes.

Concurrency group is keyed on ref so PR builds and deploy builds
don't cancel each other; the deploy job is skipped on pull_request
events to preserve production pages behaviour.

* docs(changelog): record [Unreleased] entries since 0.2.0

Document the audit-driven work that has landed on main but hasn't
been cut into a release yet: the ZIP-entry and MCP sandbox security
hardening (#69), the panic-surface cleanup in the PDF engine (#70),
the form-bindings stub sync and metadata / docs refresh (#68), plus
the tooling, docs, and paperjam-async feature adjustments from this
polish branch.

* fix(ci): install pinned binaryen release instead of apt binaryen

Ubuntu's apt-shipped binaryen is ~v108, which predates the default
enablement of bulk-memory and sign-extension instructions in rustc
output. The result is wasm-pack invoking /usr/bin/wasm-opt on a
valid modern wasm module and wasm-opt rejecting it with
"[wasm-validator error] Bulk memory operation (bulk memory is
disabled)" — observed on the PR #71 run.

Download and install a pinned binaryen release tarball from the
upstream GitHub releases page. version_119 is known-good against
the current rustc and supports all default features. Future bumps
change one env var.

* chore(ci): verify binaryen tarball checksum and cache across runs

Harden the binaryen install step that landed in the previous commit:

- SHA256-pin the downloaded tarball (value verified against a local
  download of version_119). Guards against upstream tampering or an
  accidental silent swap.
- Split the version-check into a dedicated Verify step so the log
  shows the installed wasm-opt version unambiguously.
- Wrap the install in actions/cache keyed on the pinned version so
  subsequent runs skip the download. Saves ~3-5s per run.

* fix(wasm): tell wasm-pack to enable bulk-memory and sign-ext in wasm-opt

rustc 1.82+ emits bulk-memory and sign-extension instructions in its
default wasm output. wasm-pack's baseline wasm-opt invocation ("-O")
does not pass --enable-bulk-memory / --enable-sign-ext, so even a
modern binaryen rejects the module with "Bulk memory operations
require bulk memory [--enable-bulk-memory]" during validation.

Configure the flags in paperjam-wasm's Cargo.toml metadata block so
wasm-pack invokes wasm-opt with the right feature set. This is what
was blocking CI #71 even after installing a modern binaryen.

* fix(wasm): extend wasm-opt feature set to the full rustc default list

Rust 1.87 / LLVM 20 enabled bulk-memory and nontrapping-fptoint in
the default wasm32-unknown-unknown feature set, alongside the
previously-defaulted multivalue, mutable-globals, reference-types,
and sign-ext. wasm-pack's baseline "-O" invocation of wasm-opt does
not pass any of them, so the optimiser rejects a perfectly valid
rustc-emitted module.

The previous commit only enabled bulk-memory and sign-ext, which
exposed a follow-on validator error on `i32.trunc_sat_f64_s`
(nontrapping-fptoint). Rather than re-play whack-a-mole for each
feature, pass the full list that matches the rustc default set
documented in the wasm32-unknown-unknown platform-support page.

Ref: https://doc.rust-lang.org/rustc/platform-support/wasm32-unknown-unknown.html