
lzah: StuffIt classic method-5 (LZAH) decoder, validated against real archives#71

Merged
MagicalTux merged 2 commits into master from lzah-method5-2026-05-30
May 30, 2026

@MagicalTux
Member

Summary

A pure-Rust, no_std, decode-only codec for StuffIt classic compression method 5 ("LZAH"): LZSS (4 KiB sliding window, MSB-first bitstream) plus a single 314-symbol adaptive (sibling-property) Huffman tree, a static canonical offset-high prefix code, and the fixed window pre-seed.
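
To make "MSB-first" concrete, here is a minimal sketch of a bit reader that consumes the most significant bit of each byte first. The `BitReader` type and its method names are illustrative assumptions, not the types used in this PR.

```rust
/// Illustrative MSB-first bit reader (hypothetical; not the crate's type).
pub struct BitReader<'a> {
    data: &'a [u8],
    pos: usize, // index of the byte currently being consumed
    bit: u8,    // number of bits already taken from that byte (0..=7)
}

impl<'a> BitReader<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Self { data, pos: 0, bit: 0 }
    }

    /// Read one bit, taking bit 7 of each byte first.
    /// Returns None once the input is exhausted (no panic on truncation).
    pub fn read_bit(&mut self) -> Option<u8> {
        let byte = *self.data.get(self.pos)?;
        let b = (byte >> (7 - self.bit)) & 1;
        self.bit += 1;
        if self.bit == 8 {
            self.bit = 0;
            self.pos += 1;
        }
        Some(b)
    }

    /// Read `n` bits (n <= 16); the first bit read becomes the top bit.
    pub fn read_bits(&mut self, n: u8) -> Option<u16> {
        debug_assert!(n <= 16);
        let mut v = 0u16;
        for _ in 0..n {
            v = (v << 1) | u16::from(self.read_bit()?);
        }
        Some(v)
    }
}
```

Returning `Option` rather than indexing directly is one way to turn a truncated stream into a clean error instead of a panic.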

Built clean-room from a facts-only functional spec (Chinese-wall: the analysis team read the LGPL reference and produced a facts-only spec + black-box fixtures; this implementation was written from the spec alone — no reference source was read).

Interface

The decoder takes the raw fork payload, with the uncompressed size supplied out of band via DecoderConfig::with_len(n) (StuffIt stores the size in the container header; the LZAH stream has no in-band end marker). No framing is invented. A None length on a non-empty stream returns Error::Unsupported. The codec is decode-only (no StuffIt encoder exists); the encoder stub returns Unsupported.
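
The length contract can be sketched as follows. The types below are stand-ins that only mirror the names mentioned above (`DecoderConfig::with_len`, `Error::Unsupported`); their fields, the `check_length_contract` helper, and the treatment of an empty stream with no length as zero bytes of output are assumptions for illustration, not the crate's actual definitions.

```rust
// Stand-in types for illustration only; the real crate's definitions may differ.
#[derive(Debug, PartialEq)]
pub enum Error {
    Unsupported,
}

#[derive(Default)]
pub struct DecoderConfig {
    expected_len: Option<usize>, // hypothetical field name
}

impl DecoderConfig {
    pub fn with_len(n: usize) -> Self {
        Self { expected_len: Some(n) }
    }
}

/// Hypothetical helper: resolve how many bytes the decoder should emit.
/// The stream has no end marker, so a non-empty input without a length
/// cannot be decoded safely and is rejected.
pub fn check_length_contract(cfg: &DecoderConfig, input: &[u8]) -> Result<usize, Error> {
    match (cfg.expected_len, input.is_empty()) {
        (Some(n), _) => Ok(n),
        (None, true) => Ok(0), // assumption: empty fork decodes to nothing
        (None, false) => Err(Error::Unsupported),
    }
}
```

A caller would read the uncompressed size from the container header and pass it through `with_len` before decoding each fork.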

Validation — real StuffIt archives

Rather than relying on a self-round-trip, validation uses genuine .sit files: a minimal SIT! container walker in the test extracts each method-5 fork, decodes it with the header's uncompressed length, and checks the decoded bytes against the stored per-fork CRC-16 (poly 0xA001).

  • 17 method-5 forks across all 5 real fixtures pass (incl. the large fixture that exercises the adaptive-tree rebuild at root frequency 0x8000).
  • One small fixture (tests/fixtures/lzah/convert-archive-fix.sit, 3 forks) is bundled and CRC-validated in CI.
  • Plus unit tests: empty fork, None-length-on-non-empty → Unsupported, truncation → clean error.
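
For reference, a bitwise CRC-16 with the reflected polynomial 0xA001 can be sketched as below. The initial value of 0 and the absence of a final XOR are assumptions (they match the common CRC-16/ARC parameters); the function name is illustrative and is not taken from tests/lzah.rs.

```rust
/// Bitwise CRC-16 using the reflected polynomial 0xA001.
/// Assumes init = 0 and no final XOR (CRC-16/ARC parameters).
pub fn crc16_a001(data: &[u8]) -> u16 {
    let mut crc: u16 = 0;
    for &byte in data {
        crc ^= u16::from(byte);
        for _ in 0..8 {
            // Shift right one bit; fold in the polynomial when a 1 falls out.
            crc = if crc & 1 != 0 {
                (crc >> 1) ^ 0xA001
            } else {
                crc >> 1
            };
        }
    }
    crc
}
```

With these parameters the standard check string `b"123456789"` yields 0xBB3D, which is a quick way to confirm the routine before comparing against a fork's stored CRC.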

DoS hygiene

#![forbid(unsafe_code)], zero-dep; no panics reachable from input; bounds-checked window/offset/tree-walk; output bounded by expected_len; truncation/invalid walks → UnexpectedEnd/Corrupt.
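
As an illustration of the bounds-checking idea, the sketch below copies an LZSS match through a 4096-byte ring buffer, rejects out-of-range distances, and never emits more than `expected_len` bytes. The function, its signature, the use of `Vec` (the real codec is no_std), and the choice to truncate an overlong match rather than report an error are all assumptions, not the PR's code.

```rust
const WINDOW: usize = 4096;

#[derive(Debug, PartialEq)]
pub enum Error {
    Corrupt,
}

/// Hypothetical match copy: `distance` counts back from the write position.
pub fn copy_match(
    window: &mut [u8; WINDOW],
    pos: &mut usize,
    distance: usize,
    len: usize,
    out: &mut Vec<u8>,
    expected_len: usize,
) -> Result<(), Error> {
    // A valid distance lies in 1..=WINDOW; anything else is a corrupt stream.
    if distance == 0 || distance > WINDOW {
        return Err(Error::Corrupt);
    }
    *pos %= WINDOW; // keep the write index in range even if the caller's was not
    // Never write past the out-of-band length.
    let remaining = expected_len.saturating_sub(out.len());
    for _ in 0..len.min(remaining) {
        let src = (*pos + WINDOW - distance) % WINDOW;
        let byte = window[src];
        window[*pos] = byte;
        *pos = (*pos + 1) % WINDOW;
        out.push(byte);
    }
    Ok(())
}
```

Copying byte by byte lets a match overlap its own output, which is how LZSS encodes runs; the modular indexing keeps every access inside the window.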

Notes

  • Distinct from method 13 ("LZ+Huffman") — branch on the numeric method byte, never the display string.
  • Supersedes the earlier held/lzah prototype (a guess with no real spec); that branch can be dropped.
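
Branching on the numeric method value might look like the sketch below. The `Method` enum and `classify_method` are hypothetical names, and masking to the low nibble follows the commit message's "low nibble 5" wording; what the high nibble carries is not stated here and is simply ignored.

```rust
/// Hypothetical classification of a StuffIt fork's method byte.
#[derive(Debug, PartialEq)]
pub enum Method {
    Lzah,        // method 5
    LzHuffman13, // method 13, a different codec
    Other(u8),
}

/// Dispatch on the numeric method (low nibble), never on a display name.
pub fn classify_method(method_byte: u8) -> Method {
    match method_byte & 0x0F {
        5 => Method::Lzah,
        13 => Method::LzHuffman13,
        n => Method::Other(n),
    }
}
```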

Verification

cargo fmt --check clean · clippy --all-targets --all-features -- -D warnings → 0 warnings · docs build clean · all features build · lzah tests (6) pass.

🤖 Generated with Claude Code

MagicalTux and others added 2 commits May 30, 2026 23:26
Implement a decode-only codec for StuffIt classic compression method 5
(low nibble 5), per the clean-room functional format spec:

- 4096-byte pre-seeded LZSS sliding window (fixed dictionary pattern).
- MSB-first bitstream.
- Single 314-symbol adaptive (sibling-property) Huffman tree for
  literal/length tokens, with rescale at root frequency 0x8000.
- Separate static 64-symbol canonical offset-high prefix code built from
  the length-count rule; 6 literal low bits; distance (h<<6)+low6+1.
- No in-band end-of-stream: length is out of band via
  DecoderConfig::with_len(n); a non-empty stream with no length returns
  Error::Unsupported. The encoder is a permanent Unsupported stub (no
  StuffIt encoder exists).
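
As a worked example of the distance rule above, the following sketch combines a decoded offset-high symbol `h` (0..=63) with the 6 literal low bits. The function name and the `Option` return are illustrative assumptions.

```rust
/// distance = (h << 6) + low6 + 1, with h and low6 each in 0..=63.
/// Returns None for out-of-range inputs instead of producing a bad distance.
pub fn match_distance(h: u8, low6: u8) -> Option<u16> {
    if h >= 64 || low6 >= 64 {
        return None;
    }
    Some((u16::from(h) << 6) + u16::from(low6) + 1)
}
```

The smallest result is 1 (h = 0, low6 = 0) and the largest is 4096 (h = 63, low6 = 63), so every decodable distance fits the 4096-byte window.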

no_std, zero-dep, forbid(unsafe_code); bounds-checked window/offset,
checked termination at expected_len, truncation/invalid tree-walk map to
UnexpectedEnd/Corrupt with no panics on crafted input.

Registered under feature `lzah` (marker `Lzah`, name "lzah") in Cargo.toml
(plus the `all` meta-feature), lib.rs, and the factory.

Validation: tests/lzah.rs includes a minimal SIT! container walker and a
CRC-16 (poly 0xA001) check over decoded forks. The bundled
tests/fixtures/lzah/convert-archive-fix.sit (3 method-5 forks) is verified
in CI; the full staged fixture set (17 method-5 forks across 5 real
classic StuffIt archives, including one that triggers the adaptive-tree
rebuild) is verified during development. Plus unit tests for empty fork,
truncation, and None-length-on-non-empty.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@MagicalTux MagicalTux merged commit 655fbd2 into master May 30, 2026
31 checks passed