fix: eliminate panic surfaces on untrusted PDF input#70
Merged
pratyush618 merged 2 commits intomainfrom Apr 24, 2026
Merged
Conversation
Table detection sorted PDF-derived f64 coordinates with `partial_cmp(..).unwrap()`, which panics if any coordinate is NaN. Malformed PDF content streams can produce NaN in computed coordinates (e.g. degenerate matrix ops during text positioning), turning a parsing routine into a denial-of-service primitive. Switched all three table-extraction sort sites (grid.rs, lattice.rs, stream.rs) to `f64::total_cmp`, which provides a total ordering without panicking on NaN. Also replaced the `first()/last().unwrap()` bbox construction in grid.rs with direct indexing; the preceding length guard makes the invariant local and explicit.
`get_object_mut().unwrap()`, `as_dict_mut().unwrap()`, and `from_utf8().unwrap()` were used throughout the stamp, watermark, bookmarks, and PDF/UA validation paths. The lopdf accessors return Results — a malformed PDF that fails an invariant (missing page object, non-dict where a dict was expected, cycle in /K tree) would hit one of these and panic the process, rather than returning a structured PdfError. Converted each site to `?`-chained Result handling with the existing PdfError variants (Annotation / Watermark / etc.). The watermark module also replaced two `from_utf8(bytes).unwrap()` calls with top-level `const &str` names so the ASCII resource names are the source of truth and no runtime UTF-8 decoding is needed. The bookmarks and PDF/UA changes are structurally equivalent idiomatic rewrites (pattern-bind the child list, fold first/last checks into one `if let`).
6 tasks
pratyush618
added a commit
that referenced
this pull request
Apr 24, 2026
* chore: add rust-toolchain and justfile for consistent dev tooling rust-toolchain.toml pins every contributor and CI invocation to the same stable toolchain with rustfmt, clippy, and the wasm32-unknown-unknown target. Previously CI used dtolnay/rust-toolchain@stable while contributors installed their own; minor version drift between them could produce clippy lint discrepancies at merge time. justfile captures the common build / test / lint commands documented in CLAUDE.md as executable recipes. `just` (no args) prints the full list, and the common flows (build, test, check, fmt, clean-all) are one step each so local iteration matches the pre-commit chain. * chore(async): stop force-enabling signatures/validation on core paperjam-async currently only reaches into paperjam_core::render, yet its manifest force-enabled the signatures and validation features on paperjam-core for every consumer. Downstream crates that need those features (paperjam-py does, explicitly) keep working unchanged; lightweight async consumers no longer drag in the x509-parser / cms / rsa / p256 / sha1 / pkcs8 / spki / ureq / rustls / roxmltree tree. * docs: crate-level rustdoc across the workspace Every library crate now has a `//!` summary describing its scope, its entry points, and how it fits into the broader paperjam ecosystem. Uniform style: plain prose, no intra-doc links in crate-level summaries (simpler to maintain, no rustdoc link warnings to manage). Also fixes two pre-existing rustdoc warnings uncovered along the way: an `[OPTIONAL]` literal in signature/tsa.rs that rustdoc was parsing as an intra-doc link, and a bare URL in model/annotations.rs flagged for auto-linking. The PyO3 `PyDocument` and `PyPage` classes get class-level docs that clarify they are the native layer beneath the pure-Python `paperjam.Document` / `paperjam.Page` wrappers. After this commit `cargo doc --workspace --no-deps` produces zero warnings. * chore(ci): run docs workflow on PRs and install wasm-opt The docs workflow previously fired only on pushes to main, so docs regressions (broken wasm builds, Docusaurus compile errors, bad links) were invisible until after merge. Now PRs with matching paths run the full build (without deploying) so problems surface in the PR check run. Also installs binaryen, whose wasm-opt binary wasm-pack invokes automatically when present on PATH. Release-mode WASM bundles shrink by 20-30% with no code changes. Concurrency group is keyed on ref so PR builds and deploy builds don't cancel each other; the deploy job is skipped on pull_request events to preserve production pages behaviour. * docs(changelog): record [Unreleased] entries since 0.2.0 Document the audit-driven work that has landed on main but hasn't been cut into a release yet: the ZIP-entry and MCP sandbox security hardening (#69), the panic-surface cleanup in the PDF engine (#70), the form-bindings stub sync and metadata / docs refresh (#68), plus the tooling, docs, and paperjam-async feature adjustments from this polish branch. * fix(ci): install pinned binaryen release instead of apt binaryen Ubuntu's apt-shipped binaryen is ~v108, which predates the default enablement of bulk-memory and sign-extension instructions in rustc output. The result is wasm-pack invoking /usr/bin/wasm-opt on a valid modern wasm module and wasm-opt rejecting it with "[wasm-validator error] Bulk memory operation (bulk memory is disabled)" — observed on the PR #71 run. Download and install a pinned binaryen release tarball from the upstream GitHub releases page. version_119 is known-good against the current rustc and supports all default features. Future bumps change one env var. * chore(ci): verify binaryen tarball checksum and cache across runs Harden the binaryen install step that landed in the previous commit: - SHA256-pin the downloaded tarball (value verified against a local download of version_119). Guards against upstream tampering or an accidental silent swap. - Split the version-check into a dedicated Verify step so the log shows the installed wasm-opt version unambiguously. - Wrap the install in actions/cache keyed on the pinned version so subsequent runs skip the download. Saves ~3-5s per run. * fix(wasm): tell wasm-pack to enable bulk-memory and sign-ext in wasm-opt rustc 1.82+ emits bulk-memory and sign-extension instructions in its default wasm output. wasm-pack's baseline wasm-opt invocation ("-O") does not pass --enable-bulk-memory / --enable-sign-ext, so even a modern binaryen rejects the module with "Bulk memory operations require bulk memory [--enable-bulk-memory]" during validation. Configure the flags in paperjam-wasm's Cargo.toml metadata block so wasm-pack invokes wasm-opt with the right feature set. This is what was blocking CI #71 even after installing a modern binaryen. * fix(wasm): extend wasm-opt feature set to the full rustc default list Rust 1.87 / LLVM 20 enabled bulk-memory and nontrapping-fptoint in the default wasm32-unknown-unknown feature set, alongside the previously-defaulted multivalue, mutable-globals, reference-types, and sign-ext. wasm-pack's baseline "-O" invocation of wasm-opt does not pass any of them, so the optimiser rejects a perfectly valid rustc-emitted module. The previous commit only enabled bulk-memory and sign-ext, which exposed a follow-on validator error on `i32.trunc_sat_f64_s` (nontrapping-fptoint). Rather than re-play whack-a-mole for each feature, pass the full list that matches the rustc default set documented in the wasm32-unknown-unknown platform-support page. Ref: https://doc.rust-lang.org/rustc/platform-support/wasm32-unknown-unknown.html
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Third batch from the codebase audit. Closes the last 🔴 high-priority security item: panic-on-input sites in the PDF engine.
Malformed or malicious PDFs could turn table detection, stamping, watermarking, bookmarks, and PDF/UA validation into denial-of-service primitives through
unwrap()-based error handling on lopdf accessors andpartial_cmpon NaN coordinates. Every site is now either a structuredPdfErroror a demonstrably-safe direct operation.Changes
fix(table)— NaN-safe coordinate sortingtable/{grid,lattice,stream}.rs— swappartial_cmp(..).unwrap()forf64::total_cmp. NaN coords from degenerate content-stream math no longer panic.table/grid.rs— replacefirst()/last().unwrap()with direct indexing guarded by the existing length check.fix(core)— structured errors for PDF dict accessstamp/mod.rs— twoget_object_mut().unwrap()+as_dict_mut().unwrap()pairs now returnPdfError::Annotation.watermark/mod.rs— same pattern on six sites plus fourfrom_utf8(..).unwrap()calls replaced byconst &strresource names (no runtime decoding).bookmarks/mod.rs—child_ids.last().unwrap()folded into anif let (Some, Some)over both ends.validation/pdf_ua.rs—elem_dict.get(b"K").unwrap()replaced by pattern-binding the matched dict.Residual unwrap audit
After this PR,
grep -n "unwrap()"acrosstable/,stamp/,watermark/,bookmarks/,validation/in paperjam-core returns zero hits. The remaining unwraps across paperjam-core (encryption key derivation, Mutex locks) are either genuinely infallible (compile-time-known key lengths) or a separate refactor (poison-safe locking).What's still outstanding from the audit
paperjam-epub→paperjam-html) — needs direction: amend CLAUDE.md or extract shared helper?paperjam-studioscope — currently a static server, not wired to the engine.rust-toolchain.toml, CHANGELOG, crate docs,paperjam-asyncfeature drag,justfile, etc.).Test plan
cargo test --workspace— 11 existing tests still pass, no regressionscargo clippy --workspace --all-targets -- -D warningscargo fmt --all --checkuv run pytest tests/python/— 88 passed, 4 skippedpre-commit run --all-files— all hooks pass