fix(zip): shared SafeArchive with four-axis decompression-bomb caps #72
Merged
pratyush618 merged 2 commits into main on Apr 24, 2026
Introduce a shared hardened ZIP reader in paperjam-model, gated behind the optional `zip_safety` feature so the PDF engine and other non-ZIP consumers keep a zero-dependency paperjam-model.

`SafeArchive` wraps a `zip::ZipArchive` and enforces four independent caps across the lifetime of a single archive scan:

1. per-entry decompressed size
2. aggregate decompressed-byte budget
3. entry count
4. compression ratio (declared / compressed)

Each cap surfaces as a structured `ZipSafetyError` variant so callers can convert the failure into their own error type with `#[from]`.

`ArchiveLimits::DEFAULT` is tuned for ordinary office / EPUB files (100 MB per entry, 500 MB total, 10k entries, 100x ratio); consumers can override any field.

Seven tests cover the normal path, each cap, missing-entry handling, and non-UTF-8 decoding.
…model

Replace the per-crate `safe_read` modules with the new `paperjam_model::zip_safety::SafeArchive`. Each format crate now:

- enables the `zip_safety` feature on paperjam-model
- deletes its local safe_read.rs (~170 lines each, identical logic)
- threads a single SafeArchive through its parser, so the per-entry, total-bytes, entry-count, and compression-ratio caps apply across every archive read rather than just individual entries
- exposes the archive-safety failures as a single `Archive(#[from] ZipSafetyError)` variant on EpubError / PptxError; the per-crate duplicates of EntryTooLarge / ArchiveTotalExceeded / ... are dropped

Net effect: ~200 lines of duplicated security-critical code collapses into one implementation with seven tests in paperjam-model, plus the same bounded-read semantics now cover images, TOC entries, and slide notes that were previously capped only on per-entry size.
Summary
Completes the SafeZip work sketched during the security pass: the per-entry caps that landed in #69 become one of four caps enforced across every archive read, and the ~170-line implementation that was duplicated between `paperjam-epub` and `paperjam-pptx` is now a single shared module in `paperjam-model`.
What changed
`paperjam-model::zip_safety` (new, feature-gated)
Opt-in via a `zip_safety` Cargo feature so the PDF engine and other non-ZIP consumers keep a zero-dependency `paperjam-model`. Surface:
- `ArchiveLimits` with four independent caps and a sensible `DEFAULT`
- `SafeArchive<R>` wrapper that tracks running totals for the scan
- `ZipSafetyError` with typed variants for each way a read can fail
Four caps enforced on every read:
- `max_entry_bytes`
- `max_total_bytes`
- `max_entries`
- `max_ratio`
Seven tests cover the happy path, each cap, missing-entry handling, and non-UTF-8 decoding.
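As a rough illustration of how four caps and running totals can fit together, here is a minimal std-only sketch. It is not the code from this PR: the field names follow the cap list above and the default values follow the commit message, but `Budget`, `CapError`, `admit`, and the reading of "MB" as 1024 * 1024 bytes are all assumptions made for the example.

```rust
// Hypothetical stand-in for the PR's `ArchiveLimits`; field names follow the
// cap list in the PR description, everything else is assumed.
#[derive(Debug, Clone, Copy)]
pub struct ArchiveLimits {
    pub max_entry_bytes: u64,
    pub max_total_bytes: u64,
    pub max_entries: usize,
    pub max_ratio: u64,
}

impl ArchiveLimits {
    // Defaults quoted in the commit message: 100 MB per entry, 500 MB total,
    // 10k entries, 100x ratio. Treating 1 MB as 1024 * 1024 is an assumption.
    pub const DEFAULT: ArchiveLimits = ArchiveLimits {
        max_entry_bytes: 100 * 1024 * 1024,
        max_total_bytes: 500 * 1024 * 1024,
        max_entries: 10_000,
        max_ratio: 100,
    };
}

// Hypothetical error type with one variant per cap.
#[derive(Debug, PartialEq)]
pub enum CapError {
    EntryTooLarge { declared: u64, limit: u64 },
    TotalExceeded { total: u64, limit: u64 },
    TooManyEntries { limit: usize },
    RatioExceeded { declared: u64, compressed: u64, limit: u64 },
}

// Hypothetical tracker holding the running totals for one archive scan.
pub struct Budget {
    limits: ArchiveLimits,
    total_bytes: u64,
    entries_read: usize,
}

impl Budget {
    pub fn new(limits: ArchiveLimits) -> Self {
        Budget { limits, total_bytes: 0, entries_read: 0 }
    }

    pub fn total(&self) -> u64 {
        self.total_bytes
    }

    /// Check one entry's declared (uncompressed) and compressed sizes against
    /// all four caps; the running totals advance only if every check passes.
    pub fn admit(&mut self, declared: u64, compressed: u64) -> Result<(), CapError> {
        let l = self.limits;
        if self.entries_read >= l.max_entries {
            return Err(CapError::TooManyEntries { limit: l.max_entries });
        }
        if declared > l.max_entry_bytes {
            return Err(CapError::EntryTooLarge { declared, limit: l.max_entry_bytes });
        }
        // declared / compressed > max_ratio, written as a multiplication so a
        // zero compressed size cannot cause a division by zero.
        if declared > compressed.saturating_mul(l.max_ratio) {
            return Err(CapError::RatioExceeded { declared, compressed, limit: l.max_ratio });
        }
        let total = self.total_bytes.saturating_add(declared);
        if total > l.max_total_bytes {
            return Err(CapError::TotalExceeded { total, limit: l.max_total_bytes });
        }
        self.total_bytes = total;
        self.entries_read += 1;
        Ok(())
    }
}
```

The sketch checks only the sizes an archive declares. Those headers can be forged, so a hardened reader would also need to bound the bytes actually decompressed, for example by wrapping the entry reader in `std::io::Read::take`.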
`paperjam-epub` + `paperjam-pptx` (consolidate)
- Delete the local `safe_read.rs` modules (~170 lines each, identical)
- Enable `zip_safety` on `paperjam-model`
- Thread a single `SafeArchive` through the parser so the caps apply across every read (images, TOC, slide notes) rather than per-entry only
- Expose archive-safety failures as `Archive(#[from] ZipSafetyError)`; the external error surface is `#[error(transparent)]`, so callers see the underlying typed error directly
Net: ~200 lines of security-critical duplicated code collapses to one authoritative implementation.
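The error-wrapping pattern described here can be sketched without any dependencies. The hand-written `From` and `Display` impls below approximate what `thiserror`'s `#[from]` and `#[error(transparent)]` attributes generate. The variants, the `read_entry` and `parse_epub` helpers, the 1,000-byte limit, and the message text are hypothetical and only mirror the names used in this PR.

```rust
use std::fmt;

// Hypothetical, reduced stand-in for the shared error type.
#[derive(Debug, PartialEq)]
pub enum ZipSafetyError {
    EntryTooLarge { name: String, limit: u64 },
    TooManyEntries { limit: usize },
}

impl fmt::Display for ZipSafetyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ZipSafetyError::EntryTooLarge { name, limit } => {
                write!(f, "entry {} exceeds {} bytes", name, limit)
            }
            ZipSafetyError::TooManyEntries { limit } => {
                write!(f, "archive has more than {} entries", limit)
            }
        }
    }
}

impl std::error::Error for ZipSafetyError {}

// Hypothetical format-crate error with a single wrapping variant.
#[derive(Debug)]
pub enum EpubError {
    Archive(ZipSafetyError),
    MissingContainer,
}

impl fmt::Display for EpubError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // "Transparent": forward the inner error's message unchanged.
            EpubError::Archive(inner) => fmt::Display::fmt(inner, f),
            EpubError::MissingContainer => write!(f, "missing container file"),
        }
    }
}

impl std::error::Error for EpubError {}

// Roughly what `#[from]` derives: lets `?` convert the shared error.
impl From<ZipSafetyError> for EpubError {
    fn from(e: ZipSafetyError) -> Self {
        EpubError::Archive(e)
    }
}

// Hypothetical bounded read that fails with the shared error type.
fn read_entry(name: &str, declared: u64) -> Result<u64, ZipSafetyError> {
    const LIMIT: u64 = 1_000;
    if declared > LIMIT {
        Err(ZipSafetyError::EntryTooLarge { name: name.to_string(), limit: LIMIT })
    } else {
        Ok(declared)
    }
}

// Hypothetical parser entry point: `?` converts via the `From` impl above.
pub fn parse_epub(declared: u64) -> Result<u64, EpubError> {
    let bytes = read_entry("content.opf", declared)?;
    Ok(bytes)
}
```

With this shape a caller can match on `EpubError::Archive(ZipSafetyError::EntryTooLarge { .. })` to react to one specific cap, while the displayed message stays that of the underlying error.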
Why now
The audit for "clean, production-grade, maintainable" code found that the first-pass security fix worked but violated DRY in the most sensitive module. Any future tuning (adjusting defaults, adding caps, fixing a bug) would have to land in two places.
Test plan
- `cargo test --workspace`: 4 xlsx reader + 5 mcp sandbox + 7 new zip_safety = 16 Rust tests, no regressions
- `cargo clippy --workspace --all-targets -- -D warnings`
- `cargo fmt --all --check`
- `uv run pytest tests/python/`: 88 passed, 4 skipped
- `pre-commit run --all-files`: every hook passes
What's still outstanding from the audit
After this PR:
- `paperjam-epub` → `paperjam-html`), `paperjam-studio` scope, Rust test strategy: all still need your direction.
- `calamine` / `docx-rs` pin refresh.