chore: audit-driven cleanup (stubs, metadata, docs, release profile) #68

Merged
pratyush618 merged 5 commits into main from chore/audit-cleanup on Apr 24, 2026

@pratyush618 (Collaborator)

Summary

First batch of fixes from the parallel codebase audit. All are low-risk, high-signal items: no behavior changes for end users, just correctness and hygiene.

  • fix(py): modify_form_field, add_form_field, and fill_form's generate_appearances kwarg were PyO3-exposed but missing from _paperjam.pyi, so mypy could not see them. Closes the CLAUDE.md "stubs in sync" gap for the form APIs.
  • chore: align the workspace version with CHANGELOG.md (0.1.3 → 0.2.0) and add a [profile.release] with thin LTO, codegen-units = 1, and symbol stripping, plus a release-with-debug profile for profiling.
  • chore(pyproject): add a multi-format description, readme, project.urls, and extra classifiers/keywords so the PyPI page isn't blank. Drops the stale Sphinx [docs] extra (the site is Docusaurus now).
  • docs: README CLI examples used the wrong binary name (paperjam vs actual pj) and referenced a --format csv flag that doesn't exist on extract tables. installation.md still referenced a cd docs && make html Sphinx flow and the wrong GitHub org.
  • chore(gitignore): two dead entries — python/paperjam/libpdfium.so (path renamed to py_src/ long ago) and _build (Sphinx leftover).

What's not in this PR

Bigger items from the audit kept out deliberately:

  • SafeZip wrapper for DOCX/XLSX/PPTX/EPUB (zip-bomb protection). Needs a shared helper + routing through four crates.
  • paperjam-mcp path sandboxing. resolve_path accepts absolute paths and does no containment check.
  • Panic surfaces on untrusted PDFs. f64::partial_cmp().unwrap() in table detection (NaN → panic), get_object_mut().unwrap() / as_dict_mut().unwrap() in stamp/watermark/validation.
  • Rust test coverage. 13 of 15 crates have zero Rust tests.
  • rust-toolchain.toml + MSRV CI job.

These belong on dedicated branches so review stays scoped.
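To illustrate the NaN panic listed above: `f64::partial_cmp` returns `None` whenever either operand is NaN, so `partial_cmp(..).unwrap()` panics on a malformed coordinate. A minimal sketch of a panic-free alternative follows, assuming the table detector sorts and compares raw `f64` coordinates. The function names are hypothetical and this is not the code from the follow-up branch.

```rust
/// Hypothetical helper: sorts x-coordinates in place without panicking on NaN.
///
/// `a.partial_cmp(b).unwrap()` panics when either value is NaN because the
/// pair is unordered. `f64::total_cmp` (stable since Rust 1.62) always
/// returns an `Ordering`; under its total order a positive NaN sorts after
/// +infinity and a negative NaN before -infinity.
pub fn sort_coords(xs: &mut [f64]) {
    xs.sort_by(|a, b| a.total_cmp(b));
}

/// Hypothetical helper: returns the largest coordinate, skipping NaN entries.
/// Returns `None` if the slice is empty or contains only NaN.
pub fn rightmost_edge(xs: &[f64]) -> Option<f64> {
    xs.iter()
        .copied()
        .filter(|x| !x.is_nan())
        .max_by(f64::total_cmp)
}
```

Sorting with `total_cmp` keeps NaN entries in the slice, so downstream code still has to decide what to do with them; `rightmost_edge` shows the filter-first alternative.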

What was dropped from the audit

  • Zip dep unification in paperjam-docx: docx-rs 0.4 transitively pulls zip 0.6.6, so bumping our direct dep to 2.x would add a version rather than consolidate.
  • unreachable!() in encryption/mod.rs: the outer match is exhaustive on EncryptionAlgorithm; the inner arms are genuinely unreachable. Not a real panic risk.
  • Cargo.lock / uv.lock / CLAUDE.md in .gitignore: verified none are tracked. Entries work as intended.
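The reasoning behind dropping the unreachable!() finding can be sketched as follows. The enum and its three variants are hypothetical stand-ins, not paperjam's actual EncryptionAlgorithm; the point is the shape: an outer match that is exhaustive, and an inner match whose extra arm the compiler demands but no input can reach.

```rust
/// Hypothetical stand-in for paperjam's `EncryptionAlgorithm`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EncryptionAlgorithm {
    Rc4,
    Aes128,
    Aes256,
}

/// Returns the AES key size in bits, or `None` for RC4.
pub fn aes_key_bits(alg: EncryptionAlgorithm) -> Option<u32> {
    match alg {
        EncryptionAlgorithm::Rc4 => None,
        // This arm only ever binds one of the two AES variants...
        aes @ (EncryptionAlgorithm::Aes128 | EncryptionAlgorithm::Aes256) => Some(match aes {
            EncryptionAlgorithm::Aes128 => 128,
            EncryptionAlgorithm::Aes256 => 256,
            // ...but the inner match is checked against the whole enum, so
            // the compiler still requires an Rc4 arm. It can never execute.
            EncryptionAlgorithm::Rc4 => unreachable!("Rc4 is handled by the outer arm"),
        }),
    }
}
```

Exhaustiveness checking does not carry the outer arm's refinement into the inner match, which is why the unreachable!() arm exists at all and why it is not a real panic path.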

Test plan

  • cargo check --workspace
  • cargo clippy --workspace --all-targets -- -D warnings
  • cargo fmt --all --check
  • uv run ruff check py_src/ + ruff format --check
  • uv run mypy py_src/
  • uv run pytest tests/python/ — 88 passed, 4 skipped
  • pre-commit run --all-files — all hooks pass

modify_form_field and add_form_field were PyO3-exposed but absent
from the type stubs, so callers never got static checking. fill_form
also gained a generate_appearances kwarg that was missing from its
stub signature.

CHANGELOG declares 0.2.0 shipped on 2026-04-04 but manifests still
pointed at 0.1.3. Aligning the workspace version so the next publish
matches the changelog.

Release profile picks up the usual size/perf wins (thin LTO, single
codegen unit, symbol strip) and keeps a release-with-debug variant
for profiling.

- bump version to 0.2.0
- describe multi-format scope (PDF/DOCX/XLSX/PPTX/HTML/EPUB)
- add readme, project urls, and Office/Business classifier so the
  PyPI page is not blank
- drop stale [docs] Sphinx extra: the site is Docusaurus
  (docs-site/), built via npm, not sphinx-build

- README: the installed binary is pj, not paperjam; the tables
  subcommand never had a --format csv flag (output format is a
  global text|json switch), so replace with a realistic invocation
- installation.md: replace leftover Sphinx build steps with the
  Docusaurus workflow, fix the wrong clone org, align the feature
  flag table with the actual crate features (ltv, validation,
  parallel, mmap)

python/paperjam/libpdfium.so pointed at a path that was renamed to
py_src/ long ago, and the Sphinx _build entry is from before the
docs moved to Docusaurus. Both were no-ops.

@github-actions (bot) added the documentation, rust, and python labels on Apr 24, 2026
pratyush618 merged commit 37d6657 into main on Apr 24, 2026 (13 checks passed)
pratyush618 added a commit that referenced this pull request Apr 24, 2026
* chore: add rust-toolchain and justfile for consistent dev tooling

rust-toolchain.toml pins every contributor and CI invocation to the
same stable toolchain with rustfmt, clippy, and the
wasm32-unknown-unknown target. Previously CI used
dtolnay/rust-toolchain@stable while contributors installed their own;
minor version drift between them could produce clippy lint
discrepancies at merge time.

justfile captures the common build / test / lint commands documented
in CLAUDE.md as executable recipes. `just` (no args) prints the full
list, and the common flows (build, test, check, fmt, clean-all) are
one step each so local iteration matches the pre-commit chain.

* chore(async): stop force-enabling signatures/validation on core

paperjam-async currently only reaches into paperjam_core::render, yet
its manifest force-enabled the signatures and validation features on
paperjam-core for every consumer. Downstream crates that need those
features (paperjam-py does, explicitly) keep working unchanged;
lightweight async consumers no longer drag in the x509-parser / cms /
rsa / p256 / sha1 / pkcs8 / spki / ureq / rustls / roxmltree tree.

* docs: crate-level rustdoc across the workspace

Every library crate now has a `//!` summary describing its scope,
its entry points, and how it fits into the broader paperjam
ecosystem. Uniform style: plain prose, no intra-doc links in
crate-level summaries (simpler to maintain, no rustdoc link
warnings to manage).

Also fixes two pre-existing rustdoc warnings uncovered along the
way: an `[OPTIONAL]` literal in signature/tsa.rs that rustdoc was
parsing as an intra-doc link, and a bare URL in model/annotations.rs
flagged for auto-linking. The PyO3 `PyDocument` and `PyPage` classes
get class-level docs that clarify they are the native layer beneath
the pure-Python `paperjam.Document` / `paperjam.Page` wrappers.

After this commit `cargo doc --workspace --no-deps` produces zero
warnings.
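Both rustdoc fixes mentioned above can be sketched on a hypothetical function; this is not the actual contents of signature/tsa.rs or model/annotations.rs. RFC 3161 and its URL are used only as a plausible example of text containing a bracketed `OPTIONAL` and a link.

```rust
/// Hypothetical helper: returns the RFC 3161 specification URL.
///
/// In a time-stamp request the `nonce` field is \[OPTIONAL\]. Written
/// unescaped, rustdoc would try to resolve the brackets as an intra-doc
/// link and warn; escaping them (or using backticks, as in `[OPTIONAL]`)
/// keeps the literal text.
///
/// Spec: <https://www.rfc-editor.org/rfc/rfc3161>
///
/// Wrapping the URL in angle brackets makes it an explicit autolink, which
/// satisfies the `rustdoc::bare_urls` lint.
pub fn tsa_spec_url() -> &'static str {
    "https://www.rfc-editor.org/rfc/rfc3161"
}
```

Running `cargo doc --no-deps` on a crate containing only this function should produce no link warnings; removing the backslashes or the angle brackets brings them back.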

* chore(ci): run docs workflow on PRs and install wasm-opt

The docs workflow previously fired only on pushes to main, so docs
regressions (broken wasm builds, Docusaurus compile errors, bad
links) were invisible until after merge. Now PRs with matching
paths run the full build (without deploying) so problems surface in
the PR check run.

Also installs binaryen, whose wasm-opt binary wasm-pack invokes
automatically when present on PATH. Release-mode WASM bundles
shrink by 20-30% with no code changes.

Concurrency group is keyed on ref so PR builds and deploy builds
don't cancel each other; the deploy job is skipped on pull_request
events to preserve production pages behaviour.

* docs(changelog): record [Unreleased] entries since 0.2.0

Document the audit-driven work that has landed on main but hasn't
been cut into a release yet: the ZIP-entry and MCP sandbox security
hardening (#69), the panic-surface cleanup in the PDF engine (#70),
the form-bindings stub sync and metadata / docs refresh (#68), plus
the tooling, docs, and paperjam-async feature adjustments from this
polish branch.

* fix(ci): install pinned binaryen release instead of apt binaryen

Ubuntu's apt-shipped binaryen is ~v108, which predates the default
enablement of bulk-memory and sign-extension instructions in rustc
output. The result is wasm-pack invoking /usr/bin/wasm-opt on a
valid modern wasm module and wasm-opt rejecting it with
"[wasm-validator error] Bulk memory operation (bulk memory is
disabled)" — observed on the PR #71 run.

Download and install a pinned binaryen release tarball from the
upstream GitHub releases page. version_119 is known-good against
the current rustc and supports all default features. Future bumps
change one env var.

* chore(ci): verify binaryen tarball checksum and cache across runs

Harden the binaryen install step that landed in the previous commit:

- SHA256-pin the downloaded tarball (value verified against a local
  download of version_119). Guards against upstream tampering or an
  accidental silent swap.
- Split the version-check into a dedicated Verify step so the log
  shows the installed wasm-opt version unambiguously.
- Wrap the install in actions/cache keyed on the pinned version so
  subsequent runs skip the download. Saves ~3-5s per run.

* fix(wasm): tell wasm-pack to enable bulk-memory and sign-ext in wasm-opt

Recent rustc releases emit bulk-memory and sign-extension instructions in their
default wasm output. wasm-pack's baseline wasm-opt invocation ("-O")
does not pass --enable-bulk-memory / --enable-sign-ext, so even a
modern binaryen rejects the module with "Bulk memory operations
require bulk memory [--enable-bulk-memory]" during validation.

Configure the flags in paperjam-wasm's Cargo.toml metadata block so
wasm-pack invokes wasm-opt with the right feature set. This is what
was blocking CI on #71 even after installing a modern binaryen.

* fix(wasm): extend wasm-opt feature set to the full rustc default list

Rust 1.87 / LLVM 20 enabled bulk-memory and nontrapping-fptoint in
the default wasm32-unknown-unknown feature set, alongside the
previously-defaulted multivalue, mutable-globals, reference-types,
and sign-ext. wasm-pack's baseline "-O" invocation of wasm-opt does
not pass any of them, so the optimiser rejects a perfectly valid
rustc-emitted module.

The previous commit only enabled bulk-memory and sign-ext, which
exposed a follow-on validator error on `i32.trunc_sat_f64_s`
(nontrapping-fptoint). Rather than play whack-a-mole for each
feature, pass the full list that matches the rustc default set
documented in the wasm32-unknown-unknown platform-support page.

Ref: https://doc.rust-lang.org/rustc/platform-support/wasm32-unknown-unknown.html
pratyush618 mentioned this pull request on Apr 24, 2026