
feat: partial extraction reporting and --atomic flag (#89) #105

Merged: bug-ops merged 2 commits into main from enhancement-extraction-is-not on Mar 15, 2026


@bug-ops (Owner) commented Mar 15, 2026

Summary

  • Add ExtractionError::PartialExtraction { source, report } variant that wraps the original error with an ExtractionReport snapshot captured at the point of failure. TAR, ZIP, and 7z extractors now wrap mid-extraction errors in this variant when at least one file was written before the error.
  • Add --atomic CLI flag: extracts to a temporary directory in the same parent as the destination, renames on success, and uses TempDir RAII drop for cleanup on failure.
  • JSON error output includes a partial_report field; human-readable output shows a warning with the count of files written before the error.
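To make the wrapping rule in the first bullet concrete, here is a minimal sketch. The type shapes are simplified assumptions based on the snippets quoted later in this thread (an `ExtractionReport` with only `files_extracted`, a `SecurityViolation { reason }` variant, a boxed `source`). `wrap_if_partial` and `extract_all` are hypothetical helpers, not exarch-core API, and encrypted entries are modelled as a slice of booleans.

```rust
/// Simplified stand-in for exarch-core's report (assumption: one field only).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ExtractionReport {
    pub files_extracted: usize,
}

/// Simplified, hypothetical error type; the real enum has more variants.
#[derive(Debug)]
pub enum ExtractionError {
    SecurityViolation { reason: String },
    /// The original error plus a progress snapshot taken when it occurred.
    PartialExtraction {
        source: Box<ExtractionError>,
        report: ExtractionReport,
    },
}

/// Hypothetical helper: wrap `err` only when at least one file was already
/// written; an error on the very first entry is returned unchanged.
fn wrap_if_partial(err: ExtractionError, report: &ExtractionReport) -> ExtractionError {
    if report.files_extracted > 0 {
        ExtractionError::PartialExtraction {
            source: Box::new(err),
            report: report.clone(),
        }
    } else {
        err
    }
}

/// Hypothetical extractor loop. `encrypted[i]` marks entry `i` as
/// password-protected; every other entry counts as one written file.
fn extract_all(encrypted: &[bool]) -> Result<ExtractionReport, ExtractionError> {
    let mut report = ExtractionReport::default();
    for &is_encrypted in encrypted {
        if is_encrypted {
            let err = ExtractionError::SecurityViolation {
                reason: "archive is password-protected".to_string(),
            };
            return Err(wrap_if_partial(err, &report));
        }
        // A real extractor would write the entry to disk here.
        report.files_extracted += 1;
    }
    Ok(report)
}
```

With 400 entries and the encrypted one at index 125, `extract_all` returns `PartialExtraction` with `files_extracted == 125`, the same shape the review comment below describes; an encrypted first entry still yields a bare `SecurityViolation`.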

Changes

  • New public API: ExtractionOptions { atomic: bool }, extract_archive_with_options(), and extract_archive_full()
  • Move tempfile from dev-dependencies to dependencies in exarch-core
  • Remove const from is_security_violation() and is_recoverable() to allow delegation through Box<ExtractionError>
  • Python and Node binding error converters updated with exhaustive PartialExtraction match arms
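The `const` removal above follows from delegation: for a wrapped error, `is_security_violation()` has to ask the boxed inner error. Below is a sketch of that pattern with a simplified, hypothetical `ExtractionError` (the `InvalidArchive` variant is an assumption, and the report is flattened to a plain `files_extracted` count). The PR reports that keeping these methods `const` blocked the call through `Box<ExtractionError>`; `is_recoverable()` would delegate the same way.

```rust
/// Simplified, hypothetical error type used only for this sketch.
#[derive(Debug)]
pub enum ExtractionError {
    SecurityViolation { reason: String },
    InvalidArchive(String), // hypothetical non-security variant
    PartialExtraction {
        source: Box<ExtractionError>,
        files_extracted: usize,
    },
}

impl ExtractionError {
    /// A plain (non-const) method: for a wrapped error it recurses into the
    /// boxed source, so any depth of wrapping is classified by the root error.
    pub fn is_security_violation(&self) -> bool {
        match self {
            Self::SecurityViolation { .. } => true,
            Self::InvalidArchive(_) => false,
            Self::PartialExtraction { source, .. } => source.is_security_violation(),
        }
    }
}
```

Because the wrapper answers by asking its source, callers that only use these predicates keep working unchanged whether or not an error was wrapped in `PartialExtraction`.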

Known follow-ups (non-blocking)

  • 7z extractor reports 0 items written in PartialExtraction (callback API limitation) — filed separately
  • Binding error converters discard the partial report; bindings will surface it when ExtractionOptions is exposed — deferred to binding PR
  • --force + --atomic: the existing destination is removed before extraction starts (documented in the help text), so a failed atomic extraction still loses the old destination; removing it only after a successful extraction is deferred
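The --atomic flow from the summary (extract into a temporary directory next to the destination, rename on success, rely on drop-based cleanup on failure) can be sketched with the standard library alone. This is an illustration, not exarch's code: the PR uses `tempfile::TempDir`, which this sketch replaces with a hypothetical hand-rolled `TempDirGuard` so it has no dependencies, and `extract_atomically`, whose closure parameter stands in for the real extractor, is also hypothetical. Like the --force behaviour noted above, it assumes the destination does not exist when the rename happens.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

// Per-process counter so guards created in quick succession get distinct names.
static COUNTER: AtomicU64 = AtomicU64::new(0);

/// Hand-rolled stand-in for `tempfile::TempDir`: the directory is removed
/// on drop unless `disarm()` was called.
struct TempDirGuard {
    path: PathBuf,
    armed: bool,
}

impl TempDirGuard {
    fn new_in(parent: &Path) -> io::Result<Self> {
        let n = COUNTER.fetch_add(1, Ordering::Relaxed);
        let path = parent.join(format!(".extract-tmp-{}-{}", std::process::id(), n));
        fs::create_dir(&path)?; // errors if the name already exists
        Ok(Self { path, armed: true })
    }

    fn disarm(&mut self) {
        self.armed = false;
    }
}

impl Drop for TempDirGuard {
    fn drop(&mut self) {
        if self.armed {
            // Best-effort cleanup; Drop cannot report errors.
            let _ = fs::remove_dir_all(&self.path);
        }
    }
}

/// Runs `extract` against a fresh sibling directory of `dest`, then renames it
/// into place. Assumes `dest` does not exist yet.
fn extract_atomically<F>(dest: &Path, extract: F) -> io::Result<()>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    // Same parent as `dest`, so the final rename stays on one filesystem.
    let parent = match dest.parent() {
        Some(p) if !p.as_os_str().is_empty() => p,
        _ => Path::new("."),
    };
    let mut tmp = TempDirGuard::new_in(parent)?;
    extract(tmp.path.as_path())?; // on error, `tmp` drops and the partial tree is removed
    fs::rename(&tmp.path, dest)?; // on error, the same Drop cleanup runs
    tmp.disarm(); // contents now live at `dest`; nothing left to clean up
    Ok(())
}
```

Extracting into a sibling directory keeps the partially written tree out of `dest` entirely. `std::fs::rename` fails when source and target are on different filesystems, which is presumably why the temporary directory lives in the destination's parent rather than in the system temp directory.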

Test plan

  • cargo +nightly fmt --all -- --check passes
  • cargo clippy --all-targets --all-features --workspace -- -D warnings passes (0 warnings)
  • cargo nextest run --workspace --all-features --exclude exarch-python --exclude exarch-node --lib --bins — 542 passed, 0 failed
  • cargo deny check clean
  • Manual smoke test: exarch extract --max-file-size 500 two-files.zip /tmp/out/ — warning appears with file count
  • Manual smoke test: exarch extract --atomic archive.tar.gz /tmp/out/ — temp dir cleaned on failure

Closes #89

github-actions bot added the core, docs, python, node, and dependencies labels on Mar 15, 2026
@bug-ops force-pushed the enhancement-extraction-is-not branch from 5c218f1 to 71467f3 on March 15, 2026 01:57
@bug-ops enabled auto-merge (squash) on March 15, 2026 01:57
@bug-ops (Owner, Author) left a comment

Failing test: test_per_entry_encrypted_check_catches_missed_by_sampling

The test at crates/exarch-core/src/formats/zip.rs:1544 fails on all platforms because PR #105 introduces PartialExtraction wrapping for mid-extraction errors.

Root cause: With a 400-entry archive and the encrypted entry at index 125, extraction writes 125 plain files before hitting the encrypted entry. The new PartialExtraction variant wraps the error:

```text
PartialExtraction {
    source: SecurityViolation { reason: "archive is password-protected..." },
    report: ExtractionReport { files_extracted: 125, ... }
}
```

The test asserts matches!(err, ExtractionError::SecurityViolation { .. }) — which no longer matches because the error is now wrapped.

Fix: Update the assertion to accept both forms:

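```rust
// An or-pattern cannot bind `source` in only one alternative (E0408), and a
// guard would apply to both alternatives, so the two shapes are checked separately.
assert!(
    matches!(&err, ExtractionError::SecurityViolation { .. })
        || matches!(
            &err,
            ExtractionError::PartialExtraction { source, .. }
                if matches!(source.as_ref(), ExtractionError::SecurityViolation { .. })
        ),
    "per-entry check must catch encrypted entry missed by sampling, got: {err:?}"
);
```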

Alternatively, unwrap the inner source before asserting:

```rust
let root = match &err {
    ExtractionError::PartialExtraction { source, .. } => source.as_ref(),
    other => other,
};
assert!(
    matches!(root, ExtractionError::SecurityViolation { .. }),
    "per-entry check must catch encrypted entry missed by sampling, got: {err:?}"
);
```

This is the only failing test. Everything else on the PR passes.

@bug-ops (Owner, Author) commented Mar 15, 2026

Failing test: test_per_entry_encrypted_check_catches_missed_by_sampling

The test at crates/exarch-core/src/formats/zip.rs:1544 fails on all platforms because PR #105 introduces PartialExtraction wrapping for mid-extraction errors.

Root cause: With a 400-entry archive and the encrypted entry at index 125, extraction writes 125 plain files before hitting the encrypted entry. The new PartialExtraction variant now wraps the error:

```text
PartialExtraction {
    source: SecurityViolation { reason: "archive is password-protected..." },
    report: ExtractionReport { files_extracted: 125, ... }
}
```

The test asserts matches!(err, ExtractionError::SecurityViolation { .. }) — which no longer matches.

Fix: Unwrap the inner source before asserting:

```rust
let root = match &err {
    ExtractionError::PartialExtraction { source, .. } => source.as_ref(),
    other => other,
};
assert!(
    matches!(root, ExtractionError::SecurityViolation { .. }),
    "per-entry check must catch encrypted entry missed by sampling, got: {err:?}"
);
```

This is the only failing test; everything else passes.

bug-ops added 2 commits March 15, 2026 03:08
Add ExtractionError::PartialExtraction variant that wraps the original
error together with an ExtractionReport snapshot captured at the point
of failure. All three extractors (TAR, ZIP, 7z) now wrap mid-extraction
errors in this variant when at least one file was written before the
error occurred.

The CLI displays a human-readable warning ("Extraction was stopped.
N items were written to disk before the error.") and the JSON error
output includes a partial_report field.

Add ExtractionOptions struct with an atomic flag, plus
extract_archive_with_options() and extract_archive_full() public API
functions. When --atomic is set, the CLI extracts into a temporary
directory inside the same parent as the destination, renames on
success, and relies on TempDir RAII drop for cleanup on failure.

Move tempfile from dev-dependencies to dependencies in exarch-core.
Remove const qualifier from is_security_violation() and
is_recoverable() to allow delegation through Box<ExtractionError>.
Update Python and Node binding error converters with exhaustive
PartialExtraction match arms.

Closes #89

Restore 8 tests and 3 helpers that were accidentally removed from zip.rs
during the PartialExtraction implementation: the raw_zip_with_custom_entry,
crc32_ieee, and create_large_archive_with_encrypted_entry helpers, plus
test_unsupported_compression_method_rejected,
test_symlink_target_too_large, test_symlink_target_invalid_utf8,
test_password_protected_large_archive_{first,middle,last}_entry,
test_large_archive_no_encryption_passes_constructor, and
test_per_entry_encrypted_check_catches_missed_by_sampling.

Update test_per_entry_encrypted_check_catches_missed_by_sampling to
unwrap one level of PartialExtraction before asserting SecurityViolation:
entries 0..125 extract successfully before the encrypted entry is hit,
so the error is now wrapped in PartialExtraction by design.
@bug-ops force-pushed the enhancement-extraction-is-not branch from 3d55e4a to 37143c4 on March 15, 2026 02:08
@bug-ops merged commit 83a96fd into main on Mar 15, 2026
24 checks passed
@bug-ops deleted the enhancement-extraction-is-not branch on March 15, 2026 02:12

Labels: core, dependencies, docs, node, python

Development

Successfully merging this pull request may close these issues.

enhancement: extraction is not atomic — partial state left on disk when error occurs mid-archive
