A collection of compression algorithms in pure Rust.
compcol puts every supported algorithm — RLE, deflate, zlib, gzip,
LZMA, xz, Zstandard, Brotli, LZ4, Snappy, LZW, LZO, LZX, Quantum, plus
decoders for RAR 1/2/3/5 — behind one uniform streaming trait, with
each algorithm gated by its own Cargo feature so downstream crates
only pay for what they pull in. A runtime by-name factory makes
algorithms selectable from configuration or a CLI flag, and a
compcol binary turns the library into a Unix-style filter.
- Pure Rust. No bindgen, no FFI, no C dependencies. The crate has zero runtime dependencies: nothing in [dependencies].
- 100% safe. unsafe_code = "forbid" is set crate-wide; the library never opts out.
- no_std. The library is #![no_std]. alloc is used by everything except the bare-bones rle algorithm; algorithms that need large windows or work buffers pull in alloc automatically.
- Streaming. The caller owns both buffers; the codec preserves its state across calls. Works in a 1-byte-on-both-sides streaming loop.
- Per-algorithm features. default = ["alloc", "rle", "deflate", "zlib", "gzip", "factory"]. Everything else is opt-in.
- all meta-feature. features = ["all"] is a single name that enables every algorithm, useful for downstream crates and the CLI install command instead of a 20-item feature list.
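To make the streaming bullet concrete, here is a minimal sketch of a codec that keeps its state between calls and still works when both buffers are one byte long. It is not compcol's rle implementation: ToyRle, its (count, byte) output format, and the encode_one_byte_at_a_time helper are hypothetical names chosen for illustration, and the local Progress struct only mirrors the shape of the crate's type.

```rust
/// Local stand-in for a progress report (assumption: mirrors the crate's shape).
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct Progress {
    pub consumed: usize,
    pub written: usize,
    pub done: bool,
}

/// Hypothetical toy run-length encoder that emits (count, byte) pairs.
#[derive(Default)]
pub struct ToyRle {
    run: Option<(u8, u8)>, // (byte, count) of the run still being extended
    pending: [u8; 2],      // encoded pair waiting for output space
    pending_len: usize,
    pending_pos: usize,
}

impl ToyRle {
    /// Copy queued bytes into `output`; returns true once the queue is empty.
    fn drain(&mut self, output: &mut [u8], written: &mut usize) -> bool {
        while self.pending_pos < self.pending_len {
            if *written == output.len() {
                return false; // caller's buffer is full, keep the rest for later
            }
            output[*written] = self.pending[self.pending_pos];
            *written += 1;
            self.pending_pos += 1;
        }
        self.pending_pos = 0;
        self.pending_len = 0;
        true
    }

    fn queue(&mut self, count: u8, byte: u8) {
        self.pending = [count, byte];
        self.pending_len = 2;
    }

    /// Consume as much input as the output space allows.
    pub fn encode(&mut self, input: &[u8], output: &mut [u8]) -> Progress {
        let mut p = Progress::default();
        while self.drain(output, &mut p.written) && p.consumed < input.len() {
            let b = input[p.consumed];
            p.consumed += 1;
            let current = self.run;
            self.run = match current {
                // Same byte and the counter still fits: extend the run.
                Some((byte, n)) if byte == b && n < u8::MAX => Some((byte, n + 1)),
                // Run ended (or counter saturated): queue it, start a new one.
                Some((byte, n)) => {
                    self.queue(n, byte);
                    Some((b, 1))
                }
                None => Some((b, 1)),
            };
        }
        p
    }

    /// Flush the open run; `done` is true once everything has been written.
    pub fn finish(&mut self, output: &mut [u8]) -> Progress {
        let mut p = Progress::default();
        if self.drain(output, &mut p.written) {
            if let Some((byte, n)) = self.run.take() {
                self.queue(n, byte);
            }
            p.done = self.drain(output, &mut p.written);
        }
        p
    }
}

/// Drive the encoder with 1-byte input slices and a 1-byte output buffer.
pub fn encode_one_byte_at_a_time(data: &[u8]) -> Vec<u8> {
    let mut enc = ToyRle::default();
    let mut out = Vec::new();
    let mut slot = [0u8; 1];
    let mut pos = 0;
    while pos < data.len() {
        let p = enc.encode(&data[pos..pos + 1], &mut slot);
        out.extend_from_slice(&slot[..p.written]);
        pos += p.consumed;
    }
    loop {
        let p = enc.finish(&mut slot);
        out.extend_from_slice(&slot[..p.written]);
        if p.done {
            return out;
        }
    }
}
```

The point of the sketch is the pending queue: because a finished run may not fit in the caller's buffer, the encoder has to remember the unwritten bytes and hand them out on later calls, which is what lets the caller choose any buffer size.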
| Algorithm | Feature | Extension | Encoder | Decoder | Cross-validation |
|---|---|---|---|---|---|
| RLE | rle | .rle | full | full | - |
| Deflate (RFC 1951) | deflate | .deflate | full (lazy LZ77 + dynamic / fixed / stored Huffman; cross-block matching) | full | python3 -c "import zlib" |
| Zlib (RFC 1950) | zlib | .zz | full | full | python3 -c "import zlib" |
| Gzip (RFC 1952) | gzip | .gz | full | full | gzip(1) |
| LZ4 block format | lz4 | .lz4 | LZ77 hash matcher | full | - |
| Snappy | snappy | .sz | LZ77 hash matcher (raw block format) | full | - |
| LZW (compress(1) .Z) | lzw | .lzw | full | full | compress(1) / uncompress(1) |
| LZMA (legacy .lzma) | lzma | .lzma | full | full | python3 -m lzma (FORMAT_ALONE) |
| xz | xz | .xz | compressed-LZMA2 chunks + uncompressed fallback | full envelope + all reset variants | xz(1) both directions |
| Zstandard (RFC 8478) | zstd | .zst | LZ77 + Huffman literals + FSE_Compressed_Mode sequences + repeat offsets + RLE blocks | full Compressed_Block | zstd(1) both directions |
| Brotli (RFC 7932) | brotli | .br | LZ77 + length-limited Huffman + 704-symbol IC alphabet + static-dictionary refs | full (with 122 KiB static dictionary) | brotli(1) both directions |
| LZO (LZO1X-1) | lzo | .lzo | LZ77 hash matcher | full | python3 -c "import lzo" |
| LZX (Microsoft CAB / WIM) | lzx | .lzx | uncompressed blocks only | full (verbatim + aligned-offset + uncompressed; E8 filter) | - |
| Quantum (Stac, old CAB) | quantum | .q | Unsupported (no public encoder exists) | full (libmspack-equivalent) | libmspack regression fixtures |
| LZFSE (Apple) | lzfse | .lzfse | Unsupported (decoder-only) | bvx- raw + bvxn (LZVN); bvx2 returns Unsupported | hand-built fixtures (no Apple toolchain bundled) |
| ADC (Apple DMG) | adc | .adc | LZSS-style greedy match-finder | full | hand-built fixtures |
| RAR 1.x | rar1 | .rar | Unsupported (license) | building blocks only (Huffman tables not license-clean) | - |
| RAR 2.x | rar2 | .rar | Unsupported (license) | full LZ77+Huffman + audio predictor | real rar-2.60 fixtures |
| RAR 3.x | rar3 | .rar | Unsupported (license) | full LZ77+Huffman + E8 filter; PPMd & VM filters refused | libarchive RAR3 fixtures |
| RAR 5.x | rar5 | .rar | Unsupported (license) | full LZ77+Huffman + x86 filter; Delta/ARM refused | RARLAB-CLI fixtures |
The RAR encoders are permanently Unsupported per RARLAB's unRAR
license terms (every clean-room RAR reader — libarchive, The
Unarchiver, 7-Zip — ships decoder-only for the same reason).
Every other algorithm decodes real-world output from its reference toolchain and, where an encoder exists, produces output that the same reference toolchain accepts. Some encoders (zstd, brotli) lag the reference's compression ratio because they skip features like FSE-compressed Huffman weight tables (zstd) or encoder-side static-dictionary lookups for non-English text (brotli); the wire format is always conformant.
# Cargo.toml
[dependencies]
compcol = { version = "0.1", features = ["gzip", "factory"] }

use compcol::{Algorithm, Encoder, Decoder, Progress, Error};
pub struct Progress {
pub consumed: usize, // bytes read from input
pub written: usize, // bytes written to output
pub done: bool, // true once finish() has fully drained
}
pub trait Encoder {
fn encode(&mut self, input: &[u8], output: &mut [u8]) -> Result<Progress, Error>;
fn finish(&mut self, output: &mut [u8]) -> Result<Progress, Error>;
fn reset(&mut self);
}
pub trait Decoder {
fn decode(&mut self, input: &[u8], output: &mut [u8]) -> Result<Progress, Error>;
fn finish(&mut self, output: &mut [u8]) -> Result<Progress, Error>;
fn reset(&mut self);
/// Advance the decompressed stream by up to `n` bytes without
/// emitting them. Default impl reads-and-discards through a small
/// scratch buffer; algorithms can override for cheaper skipping.
fn skip(&mut self, input: &[u8], n: usize) -> Result<Progress, Error>;
}
pub trait Algorithm {
const NAME: &'static str;
type Encoder: Encoder;
type Decoder: Decoder;
fn encoder() -> Self::Encoder;
fn decoder() -> Self::Decoder;
}

For callers that already have the whole payload in memory:
use compcol::gzip::Gzip;
use compcol::vec::{compress_to_vec, decompress_to_vec, compress_to_vec_with};
let plain = b"hello world hello world hello world";
let compressed = compress_to_vec::<Gzip>(plain)?;
let decoded = decompress_to_vec::<Gzip>(&compressed)?;
assert_eq!(decoded, plain);
// With explicit config:
let small = compress_to_vec_with::<Gzip>(
plain, compcol::gzip::EncoderConfig { level: 9 },
)?;
# Ok::<(), compcol::Error>(())

compress_to_vec_with / decompress_to_vec_with accept the
algorithm's EncoderConfig / DecoderConfig for tuning (level,
quality, etc.). Both are available under the alloc feature; std is not
required.
For files, sockets, or any other Read/Write source, use the adapters
in compcol::io. All four directions are covered; pick by which side
you control and which direction the bytes flow.
use std::io::{Read, Write};
use compcol::{Algorithm, gzip::Gzip};
use compcol::io::{EncoderWriter, DecoderReader};
// Write plaintext, get a compressed file.
let file = std::fs::File::create("hello.txt.gz")?;
let mut w = EncoderWriter::new(file, Gzip::encoder());
w.write_all(b"hello, gzip\n")?;
let _file = w.finish()?; // returns the inner File
// Read a compressed file as if it were plain text.
let file = std::fs::File::open("hello.txt.gz")?;
let mut r = DecoderReader::new(file, Gzip::decoder());
let mut decoded = String::new();
r.read_to_string(&mut decoded)?;
# Ok::<(), std::io::Error>(())

EncoderReader (reads plain bytes from the inner reader and yields
compressed bytes) and DecoderWriter (accepts compressed bytes and
writes plain bytes to the inner writer) round out the set. Writers
call finish on Drop on a best-effort basis, so call finish()
explicitly to catch errors. Requires the std feature.
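The reason an explicit finish() matters is that Drop cannot return an error. The sketch below shows the pattern with a hypothetical TrailerWriter that appends a fixed trailer when the stream ends; it is not compcol's EncoderWriter, and the type name and the TRAILER bytes are assumptions made for illustration.

```rust
use std::io::{self, Write};

/// Hypothetical trailer bytes standing in for a real format's end-of-stream data.
const TRAILER: &[u8] = b"<END>";

/// Hypothetical wrapper that must write a trailer when the stream ends.
pub struct TrailerWriter<W: Write> {
    inner: Option<W>, // None once finish() has handed the writer back
}

impl<W: Write> TrailerWriter<W> {
    pub fn new(inner: W) -> Self {
        Self { inner: Some(inner) }
    }

    /// Write the trailer and return the inner writer; errors reach the caller.
    pub fn finish(mut self) -> io::Result<W> {
        let mut w = self.inner.take().expect("inner writer is present until finish");
        w.write_all(TRAILER)?;
        w.flush()?;
        Ok(w)
    }
}

impl<W: Write> Write for TrailerWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        match self.inner.as_mut() {
            Some(w) => w.write(buf),
            None => Err(io::Error::new(io::ErrorKind::Other, "writer already finished")),
        }
    }

    fn flush(&mut self) -> io::Result<()> {
        match self.inner.as_mut() {
            Some(w) => w.flush(),
            None => Ok(()),
        }
    }
}

impl<W: Write> Drop for TrailerWriter<W> {
    fn drop(&mut self) {
        // Best effort only: Drop has no way to report a failure,
        // so any I/O error here is silently discarded.
        if let Some(mut w) = self.inner.take() {
            let _ = w.write_all(TRAILER);
            let _ = w.flush();
        }
    }
}
```

With this shape, dropping the writer still produces a complete stream in the common case, but a full disk or closed socket at that moment goes unnoticed unless the caller used finish().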
use compcol::gzip::{Encoder, Decoder};
use compcol::{Encoder as _, Decoder as _};
let input = b"hello world hello world hello world";
// Encode: feed input until all of it is consumed, then drain with finish().
let mut enc = Encoder::new();
let mut buf = [0u8; 256];
let mut encoded = Vec::new();
let mut consumed = 0;
while consumed < input.len() {
    let p = enc.encode(&input[consumed..], &mut buf).unwrap();
    encoded.extend_from_slice(&buf[..p.written]);
    consumed += p.consumed;
}
loop {
    let p = enc.finish(&mut buf).unwrap();
    encoded.extend_from_slice(&buf[..p.written]);
    if p.done { break; }
}
// Decode: same shape, with a guard against a call that makes no progress.
let mut dec = Decoder::new();
let mut decoded = Vec::new();
let mut c2 = 0;
while c2 < encoded.len() {
    let p = dec.decode(&encoded[c2..], &mut buf).unwrap();
    decoded.extend_from_slice(&buf[..p.written]);
    c2 += p.consumed;
    if p.consumed == 0 && p.written == 0 { break; }
}
loop {
    let p = dec.finish(&mut buf).unwrap();
    decoded.extend_from_slice(&buf[..p.written]);
    if p.done { break; }
}
assert_eq!(decoded, input);

use compcol::{factory, Encoder as _, Decoder as _};
let mut enc = factory::encoder_by_name("gzip")
.expect("gzip not compiled in");
let mut out = [0u8; 1024];
let p = enc.encode(b"hello", &mut out).unwrap();
// ...
println!("available algorithms: {:?}", factory::names());

factory::extension(name) returns the conventional file extension for
each algorithm (e.g. "gz" for gzip, "zst" for zstd).
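For readers curious how a by-name factory of this kind can be put together, here is a minimal sketch built on a static registry of constructor functions that return boxed trait objects. Everything in it is hypothetical: the ToyEncoder trait, the Stored and Rle codecs, the REGISTRY table, and the "bin" extension are illustration only and say nothing about how compcol::factory is implemented.

```rust
/// Hypothetical object-safe codec trait for this sketch.
pub trait ToyEncoder {
    /// Encode all of `input` in one call.
    fn encode_all(&mut self, input: &[u8]) -> Vec<u8>;
}

/// Pass-through "codec": output equals input.
struct Stored;
impl ToyEncoder for Stored {
    fn encode_all(&mut self, input: &[u8]) -> Vec<u8> {
        input.to_vec()
    }
}

/// Toy run-length codec emitting (count, byte) pairs.
struct Rle;
impl ToyEncoder for Rle {
    fn encode_all(&mut self, input: &[u8]) -> Vec<u8> {
        let mut out: Vec<u8> = Vec::new();
        for &b in input {
            let n = out.len();
            if n >= 2 && out[n - 1] == b && out[n - 2] < u8::MAX {
                out[n - 2] += 1; // extend the current run
            } else {
                out.push(1); // start a new run
                out.push(b);
            }
        }
        out
    }
}

fn make_stored() -> Box<dyn ToyEncoder> {
    Box::new(Stored)
}

fn make_rle() -> Box<dyn ToyEncoder> {
    Box::new(Rle)
}

/// One registry row: name, conventional extension, constructor.
struct Entry {
    name: &'static str,
    ext: &'static str,
    make: fn() -> Box<dyn ToyEncoder>,
}

/// In a feature-gated crate each row would sit behind its own #[cfg(feature = ...)].
const REGISTRY: &[Entry] = &[
    Entry { name: "stored", ext: "bin", make: make_stored },
    Entry { name: "rle", ext: "rle", make: make_rle },
];

/// Look up an encoder by name; None means it is not in the registry.
pub fn toy_encoder_by_name(name: &str) -> Option<Box<dyn ToyEncoder>> {
    REGISTRY.iter().find(|e| e.name == name).map(|e| (e.make)())
}

/// Names of every registered codec, in registry order.
pub fn toy_names() -> Vec<&'static str> {
    REGISTRY.iter().map(|e| e.name).collect()
}

/// Conventional file extension for a registered codec.
pub fn toy_extension(name: &str) -> Option<&'static str> {
    REGISTRY.iter().find(|e| e.name == name).map(|e| e.ext)
}
```

Returning Option keeps the "not compiled in" case visible to the caller, which is why the example above pairs encoder_by_name with expect.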
Decoder::skip is useful for tar-style archive browsing: read a header, skip past the file body, read the next header:
use compcol::gzip::Decoder;
use compcol::Decoder as _;
// `compressed` is assumed to hold a complete gzip stream (not shown).
let mut dec = Decoder::new();
// Skip past the first 100 decompressed bytes…
let p = dec.skip(&compressed[..], 100).unwrap();
// …then decode the next 50:
let mut out = [0u8; 50];
let p = dec.decode(&compressed[p.consumed..], &mut out).unwrap();

The default skip implementation just reads and discards through a
small scratch buffer, so it works for every algorithm. Individual
decoders are free to override it with a smarter implementation when the
format allows it (e.g. fast-forwarding through stored deflate blocks
without LZ77 expansion).
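A read-and-discard default of the kind described above could look roughly like the sketch below. It uses local stand-ins rather than the crate's types: the Progress and Error definitions, the 32-byte scratch size, the Passthrough decoder, and the choice to report skipped bytes in the written field are all assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Progress {
    pub consumed: usize,
    pub written: usize,
    pub done: bool,
}

#[derive(Debug)]
pub enum Error {
    Corrupt,
}

pub trait Decoder {
    fn decode(&mut self, input: &[u8], output: &mut [u8]) -> Result<Progress, Error>;

    /// Default skip: decode into a scratch buffer and throw the bytes away.
    /// Assumption: `written` in the result counts the bytes skipped.
    fn skip(&mut self, input: &[u8], n: usize) -> Result<Progress, Error> {
        let mut scratch = [0u8; 32];
        let mut total = Progress { consumed: 0, written: 0, done: false };
        while total.written < n {
            // Never decode past the skip target.
            let want = (n - total.written).min(scratch.len());
            let p = self.decode(&input[total.consumed..], &mut scratch[..want])?;
            total.consumed += p.consumed;
            total.written += p.written;
            if p.consumed == 0 && p.written == 0 {
                break; // no progress: the caller has to supply more input
            }
        }
        Ok(total)
    }
}

/// Hypothetical decoder whose "decompressed" output equals its input.
pub struct Passthrough;

impl Decoder for Passthrough {
    fn decode(&mut self, input: &[u8], output: &mut [u8]) -> Result<Progress, Error> {
        let n = input.len().min(output.len());
        output[..n].copy_from_slice(&input[..n]);
        Ok(Progress { consumed: n, written: n, done: false })
    }
}
```

Because the default is written purely in terms of decode, any decoder gets it for free, and a format-aware override only has to preserve the same accounting.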
The compcol binary ships with the crate. Install with:

cargo install --path . --features all

…or pick a subset:

cargo install --path . --features "gzip,zstd,brotli,lz4,factory"

Usage: compcol -t ALGO [OPTIONS] [INPUT]
Required:
-t, --type ALGO Algorithm (use --list to see what's compiled in)
Mode:
-d, --decompress Decompress instead of compress
Output (mutually exclusive):
-c, --stdout Write to stdout, keep input file
-o, --output PATH Write to PATH
(default, INPUT given) Write to <INPUT>.<ext> on compress, or strip
<ext> on decompress; remove INPUT on success
(default, no INPUT) Read stdin, write stdout
Misc:
-k, --keep Keep input file even in in-place mode
-f, --force Overwrite an existing output file
-L, --list List available algorithms and exit
-V, --version Print version and exit
-h, --help Print this help and exit
# Pipe-style use (gzip via stdin → stdout)
cat README.md | compcol -t gzip > README.md.gz
# In-place compression (mirrors gzip(1) semantics: removes the original)
compcol -t gzip README.md # → README.md.gz, removes README.md
# Keep the original
compcol -t gzip -k README.md # → README.md.gz, keeps README.md
# Decompress
compcol -t gzip -d README.md.gz # → README.md, removes README.md.gz
# Force overwrite of an existing output file
compcol -t gzip -f README.md
# Decompress into a pager
compcol -t xz -d archive.xz -c | less
# Mix algorithms
compcol -t zstd payload.bin # → payload.bin.zst
compcol -t brotli payload.bin # → payload.bin.br
# List what's compiled in
compcol --list

Exit codes: 0 success, 1 runtime / I/O error, 2 usage / argument
error.
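The default in-place naming rule from the usage text (append the extension on compress, strip it on decompress) can be sketched as a small path helper. The function name default_output_path is hypothetical, and returning None when the input does not end in the expected extension is an assumption; the usage text does not say what the binary does in that case.

```rust
use std::ffi::{OsStr, OsString};
use std::path::{Path, PathBuf};

/// Hypothetical helper: derive the in-place output path for `input`.
/// `ext` is given without a leading dot, e.g. "gz".
pub fn default_output_path(input: &Path, ext: &str, decompress: bool) -> Option<PathBuf> {
    if decompress {
        // Strip the extension only when it is the one we expect.
        if input.extension() == Some(OsStr::new(ext)) {
            Some(input.with_extension(""))
        } else {
            None // assumption: refuse rather than guess an output name
        }
    } else {
        // Append ".<ext>" to the full file name, keeping any existing extension.
        let mut name: OsString = input.as_os_str().to_owned();
        name.push(".");
        name.push(ext);
        Some(PathBuf::from(name))
    }
}
```

Appending to the whole name rather than calling with_extension on compress is what keeps README.md from turning into README.gz.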
[features]
default = ["alloc", "rle", "deflate", "zlib", "gzip", "factory"]
# Meta-feature: pulls in every algorithm. Unlike `--all-features`, it does not enable `std`.
all = ["alloc", "factory",
"rle", "deflate", "zlib", "gzip",
"lzma", "xz",
"zstd", "brotli", "lz4", "snappy", "lzw",
"lzo", "lzx", "quantum", "lzfse", "adc",
"rar1", "rar2", "rar3", "rar5"]
alloc = []
std = ["alloc"] # std::io::{Read,Write} adapters in compcol::io
factory = ["alloc"] # by-name lookup, returns Box<dyn …>
rle = [] # no_std clean (alloc not required)
deflate = ["alloc"]
zlib = ["deflate"]
gzip = ["deflate"]
lzma = ["alloc"]
xz = ["lzma"]
zstd = ["alloc"]
brotli = ["alloc"]
lz4 = ["alloc"]
snappy = ["alloc"]
lzw = ["alloc"]
lzo = ["alloc"]
lzx = ["alloc"]
quantum = ["alloc"]
lzfse = ["alloc"] # decoder-only, bvx2 returns Unsupported
adc = ["alloc"]
rar1 = ["alloc"]
rar2 = ["alloc"]
rar3 = ["alloc"]
rar5 = ["alloc"]

A bare --no-default-features build produces a library with just the
trait surface, which is useful for the most constrained embedded targets.
Adding rle gives an algorithm that doesn't need alloc. Adding any
other algorithm feature pulls in alloc and the codec.
The alloc feature also enables compcol::vec (one-shot
compress_to_vec / decompress_to_vec helpers). The std feature
adds compcol::io (the Read/Write adapters) plus
From<Error> for std::io::Error so adapter code can use ?
freely.
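As an illustration of why that conversion makes ? convenient in adapter code, here is a self-contained sketch with a local stand-in error type. CodecError, the mapping onto io::ErrorKind values, and the check_magic / open_stream helpers are hypothetical; the crate's actual mapping may differ.

```rust
use std::fmt;
use std::io;

/// Local stand-in for a codec error enum (a subset, for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CodecError {
    UnexpectedEnd,
    BadHeader,
    Unsupported,
}

impl fmt::Display for CodecError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let msg = match self {
            CodecError::UnexpectedEnd => "stream ended early",
            CodecError::BadHeader => "container header malformed",
            CodecError::Unsupported => "unsupported option or mode",
        };
        f.write_str(msg)
    }
}

impl std::error::Error for CodecError {}

/// Hypothetical mapping onto io::ErrorKind; the original error stays attached.
impl From<CodecError> for io::Error {
    fn from(e: CodecError) -> Self {
        let kind = match e {
            CodecError::UnexpectedEnd => io::ErrorKind::UnexpectedEof,
            CodecError::BadHeader => io::ErrorKind::InvalidData,
            CodecError::Unsupported => io::ErrorKind::Unsupported,
        };
        io::Error::new(kind, e)
    }
}

/// Codec-level check that knows nothing about std::io (gzip magic bytes).
fn check_magic(header: &[u8]) -> Result<(), CodecError> {
    if header.len() < 2 {
        Err(CodecError::UnexpectedEnd)
    } else if header[..2] != [0x1f, 0x8b] {
        Err(CodecError::BadHeader)
    } else {
        Ok(())
    }
}

/// Adapter-level function: `?` converts CodecError into io::Error via From.
pub fn open_stream(header: &[u8]) -> io::Result<()> {
    check_magic(header)?;
    Ok(())
}
```

The codec layer stays free of std::io while callers that already work with io::Result see a single error type.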
features = ["all"] enables every algorithm and is the most ergonomic
choice when you don't know in advance which formats you'll see.
The compcol binary is gated on features = ["factory"] so a
--no-default-features library build doesn't try to compile it.
compcol::Error is a single crate-wide enum, so the Encoder and Decoder
traits need no associated error type and work as trait objects:
pub enum Error {
Corrupt, // generic malformed input
UnexpectedEnd, // finish() called mid-stream
OutputTooSmall, // codec has a minimum atomic output size
BadHeader, // container header malformed
InvalidBlockType, // deflate BTYPE=3, etc.
InvalidHuffmanTree, // code lengths violate Kraft inequality
InvalidDistance, // LZ77 back-reference out of range
ChecksumMismatch, // Adler-32 / CRC-32 mismatch
TrailerMismatch, // gzip ISIZE doesn't match output length
Unsupported, // option / mode this build doesn't implement
}

cargo build # builds lib + bin (default features)
cargo build --no-default-features # bare no_std lib
cargo build --no-default-features --features rle # narrowest alloc-free build
cargo build --no-default-features --features all # every algorithm, still no_std
cargo test --all-features # full test suite
cargo clippy --all-features --all-targets -- -D warnings # lint clean
cargo fmt --all --check # format clean

The crate currently ships with ~566 tests across 23 test binaries,
including round-trip tests for every algorithm with an encoder,
cross-validation against system gzip / xz / zstd / brotli /
compress / lz4 / python3 lzo / python3 lzma, and hand-crafted
hex fixtures for every decoder-only format (RAR 2/3/5, Quantum, LZX).
A simple benchmark harness lives at examples/bench.rs. Run it with:
cargo run --release --features all --example bench

It measures each compiled-in algorithm's encoder/decoder throughput
and compression ratio on a small fixed corpus and compares against
the system reference when one is installed. A snapshot of the output
is kept in BENCH.md.
MIT. © 2026 Karpeles Lab Inc.