lzah: StuffIt classic method-5 (LZAH) decoder, validated against real archives #71
Merged
Implement a decode-only codec for StuffIt classic compression method 5 (low nibble 5), per the clean-room functional format spec:

- 4096-byte pre-seeded LZSS sliding window (fixed dictionary pattern).
- MSB-first bitstream.
- Single 314-symbol adaptive (sibling-property) Huffman tree for literal/length tokens, with rescale at root frequency 0x8000.
- Separate static 64-symbol canonical offset-high prefix code built from the length-count rule; 6 literal low bits; distance (h<<6)+low6+1.
- No in-band end-of-stream: length is out of band via DecoderConfig::with_len(n); a non-empty stream with no length returns Error::Unsupported.

The encoder is a permanent Unsupported stub (no StuffIt encoder exists).

no_std, zero-dep, forbid(unsafe_code); bounds-checked window/offset, checked termination at expected_len, truncation/invalid tree-walk map to UnexpectedEnd/Corrupt with no panics on crafted input.

Registered under feature `lzah` (marker `Lzah`, name "lzah") in Cargo.toml (plus the `all` meta-feature), lib.rs, and the factory.

Validation: tests/lzah.rs includes a minimal SIT! container walker and a CRC-16 (poly 0xA001) check over decoded forks. The bundled tests/fixtures/lzah/convert-archive-fix.sit (3 method-5 forks) is verified in CI; the full staged fixture set (17 method-5 forks across 5 real classic StuffIt archives, including one that triggers the adaptive-tree rebuild) is verified during development. Plus unit tests for empty fork, truncation, and None-length-on-non-empty.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
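To make the window handling in the commit message concrete (4096-byte window, bounds-checked offsets, termination at `expected_len`), here is a minimal sketch. It is not the code from this PR: `Window`, `SketchError`, `push`, and `copy_match` are hypothetical names, the seed contents and start position are passed in because the real pre-seed pattern is defined by the spec and not reproduced here, `Vec` is used for brevity even though the real crate is `no_std`, and stopping a match early at `expected_len` is an assumption about how the bound is enforced.

```rust
/// Window size stated in the PR: a 4096-byte LZSS sliding window.
pub const WINDOW_SIZE: usize = 4096;

/// Hypothetical error type for this sketch (the crate has its own `Error`).
#[derive(Debug, PartialEq, Eq)]
pub enum SketchError {
    Corrupt,
}

/// Circular LZSS window. The seed and start position are parameters because
/// the real pre-seed pattern comes from the format spec (assumption: any
/// 4096-byte seed can stand in for it in this sketch).
pub struct Window {
    buf: [u8; WINDOW_SIZE],
    pos: usize,
}

impl Window {
    pub fn new(seed: [u8; WINDOW_SIZE], start: usize) -> Self {
        Self { buf: seed, pos: start % WINDOW_SIZE }
    }

    /// Emit one byte: record it in the window and append it to the output.
    pub fn push(&mut self, byte: u8, out: &mut Vec<u8>) {
        self.buf[self.pos] = byte;
        self.pos = (self.pos + 1) % WINDOW_SIZE;
        out.push(byte);
    }

    /// Copy `len` bytes from `distance` bytes back, one byte at a time so
    /// that overlapping matches (distance < len) repeat correctly.
    /// Rejects out-of-range distances instead of panicking and never lets
    /// the output grow past `expected_len`.
    pub fn copy_match(
        &mut self,
        distance: usize,
        len: usize,
        out: &mut Vec<u8>,
        expected_len: usize,
    ) -> Result<(), SketchError> {
        if distance == 0 || distance > WINDOW_SIZE {
            return Err(SketchError::Corrupt);
        }
        for _ in 0..len {
            if out.len() >= expected_len {
                break; // assumption: output is simply capped at expected_len
            }
            let src = (self.pos + WINDOW_SIZE - distance) % WINDOW_SIZE;
            let byte = self.buf[src];
            self.push(byte, out);
        }
        Ok(())
    }
}
```

Because every index is reduced modulo `WINDOW_SIZE` and the distance is validated up front, no slice access in this sketch can go out of bounds regardless of the input.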
This was referenced May 30, 2026
Summary
A pure-Rust, `no_std`, decode-only codec for StuffIt classic compression method 5 ("LZAH"): LZSS (4 KiB sliding window, MSB-first bitstream) plus a single 314-symbol adaptive (sibling-property) Huffman tree, with a static canonical offset-high prefix code and the fixed window pre-seed.

Built clean-room from a facts-only functional spec (Chinese wall: the analysis team read the LGPL reference and produced a facts-only spec plus black-box fixtures; this implementation was written from the spec alone, and no reference source was read).
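As an illustration of the MSB-first bitstream and the distance rule quoted in the commit message (`(h<<6)+low6+1`), the following sketch shows one way to read bits and combine the two offset parts. `MsbBitReader`, `SketchError`, and `match_distance` are hypothetical names chosen for this example, not the PR's API, and the decoding of the offset-high symbol itself (the 64-symbol prefix code) is left out.

```rust
/// Hypothetical error type for this sketch (the crate has its own `Error`).
#[derive(Debug, PartialEq, Eq)]
pub enum SketchError {
    UnexpectedEnd,
}

/// Reads bits most-significant-bit first from a byte slice.
pub struct MsbBitReader<'a> {
    data: &'a [u8],
    byte_pos: usize,
    bit_pos: u8, // 0 means the next bit is the 0x80 bit of data[byte_pos]
}

impl<'a> MsbBitReader<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Self { data, byte_pos: 0, bit_pos: 0 }
    }

    /// Read one bit; running off the end is an error, never a panic.
    pub fn read_bit(&mut self) -> Result<u32, SketchError> {
        let byte = *self
            .data
            .get(self.byte_pos)
            .ok_or(SketchError::UnexpectedEnd)?;
        let bit = (byte >> (7 - self.bit_pos)) & 1;
        self.bit_pos += 1;
        if self.bit_pos == 8 {
            self.bit_pos = 0;
            self.byte_pos += 1;
        }
        Ok(u32::from(bit))
    }

    /// Read `n` bits, first bit read becomes the most significant.
    pub fn read_bits(&mut self, n: u32) -> Result<u32, SketchError> {
        let mut value = 0u32;
        for _ in 0..n {
            value = (value << 1) | self.read_bit()?;
        }
        Ok(value)
    }
}

/// Combine the decoded offset-high symbol `h` (0..=63) with the 6 literal
/// low bits into a match distance: (h << 6) + low6 + 1.
pub fn match_distance(offset_high: u32, low6: u32) -> u32 {
    (offset_high << 6) + (low6 & 0x3F) + 1
}
```

With a 64-symbol high part and 6 low bits, the distances span 1 through 4096, which matches the 4 KiB window size.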
Interface
- Input is the raw fork payload, with the uncompressed size supplied out of band via `DecoderConfig::with_len(n)` (StuffIt stores the size in the container header; the LZAH stream has no in-band end marker). No invented framing.
- A `None` length on a non-empty stream returns `Error::Unsupported`.
- Decode-only (no StuffIt encoder exists); the encoder stub returns `Unsupported`.

Validation: real StuffIt archives
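Unlike a self-round-trip, this is validated against genuine `.sit` files: a minimal `SIT!` container walker in the test extracts each method-5 fork, decodes it with the header's uncompressed length, and checks the decoded bytes against the stored per-fork CRC-16 (poly `0xA001`).

- The full staged fixture set (17 method-5 forks across 5 real classic StuffIt archives) is verified during development, including one fork that triggers the adaptive-tree rebuild (root frequency `0x8000`).
- One archive (`tests/fixtures/lzah/convert-archive-fix.sit`, 3 forks) is bundled and CRC-validated in CI.
- Unit tests: empty fork, `None`-length-on-non-empty → `Unsupported`, truncation → clean error.

DoS hygiene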
`#![forbid(unsafe_code)]`, zero-dep; no panics reachable from input; bounds-checked window/offset/tree-walk; output bounded by `expected_len`; truncation/invalid walks → `UnexpectedEnd`/`Corrupt`.

Notes
- `held/lzah` prototype (a guess with no real spec); that branch can be dropped.

Verification
`cargo fmt --check` clean · `clippy --all-targets --all-features -- -D warnings` → 0 · docs build clean · all features build · lzah tests (6) pass.

🤖 Generated with Claude Code
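For reference, the per-fork check described under Validation uses a CRC-16 with polynomial `0xA001`. A minimal bitwise sketch follows. The function name `crc16_arc` is hypothetical, and the zero initial value with no final XOR is an assumption (the common CRC-16/ARC parameters); the PR text states only the polynomial.

```rust
/// Bitwise CRC-16 with the reflected polynomial 0xA001.
/// Assumption: initial value 0 and no final XOR (CRC-16/ARC parameters);
/// the PR description only names the polynomial.
pub fn crc16_arc(data: &[u8]) -> u16 {
    let mut crc: u16 = 0;
    for &byte in data {
        crc ^= u16::from(byte);
        for _ in 0..8 {
            // Shift right one bit; fold in the polynomial when a 1 falls out.
            crc = if crc & 1 != 0 {
                (crc >> 1) ^ 0xA001
            } else {
                crc >> 1
            };
        }
    }
    crc
}
```

Under those parameters the standard check input `b"123456789"` yields `0xBB3D`. A test harness would compare the result for each decoded fork with the CRC stored in that fork's container header.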