A multi-threaded file compressor and decompressor written in Rust. It implements Huffman encoding/decoding and uses Rayon to process multiple files in parallel.
This project has two modes:
- `main.rs` asks you to choose encode or decode.
- In encode mode, it scans `./to_encode/` and processes every file in that folder in parallel.
- For each file, `encoder.rs`:
  - counts character frequencies,
  - builds a Huffman tree (min-heap),
  - generates an encoding table (symbol -> bitstring).
- `compress.rs` writes the compressed output into `./encoded_output/`: `{original_filename}_encoded.bin`
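The frequency count, min-heap, and encoding table steps above can be sketched in plain Rust. This is an illustrative sketch, not the project's actual API; `Node`, `build_tree`, and `build_table` are made-up names here:

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

// A Huffman tree node: leaves hold a symbol, internal nodes hold two children.
#[derive(Debug, PartialEq, Eq)]
enum Node {
    Leaf { freq: u64, symbol: u8 },
    Internal { freq: u64, left: Box<Node>, right: Box<Node> },
}

impl Node {
    fn freq(&self) -> u64 {
        match self {
            Node::Leaf { freq, .. } | Node::Internal { freq, .. } => *freq,
        }
    }
}

// Order by frequency so BinaryHeap<Reverse<Node>> behaves as a min-heap.
impl Ord for Node {
    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
        self.freq().cmp(&other.freq())
    }
}
impl PartialOrd for Node {
    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
        Some(self.cmp(other))
    }
}

// Count byte frequencies, then repeatedly merge the two lowest-frequency nodes.
fn build_tree(data: &[u8]) -> Option<Node> {
    let mut freqs: HashMap<u8, u64> = HashMap::new();
    for &b in data {
        *freqs.entry(b).or_insert(0) += 1;
    }
    let mut heap: BinaryHeap<Reverse<Node>> = freqs
        .into_iter()
        .map(|(symbol, freq)| Reverse(Node::Leaf { freq, symbol }))
        .collect();
    while heap.len() > 1 {
        let Reverse(a) = heap.pop()?;
        let Reverse(b) = heap.pop()?;
        heap.push(Reverse(Node::Internal {
            freq: a.freq() + b.freq(),
            left: Box::new(a),
            right: Box::new(b),
        }));
    }
    heap.pop().map(|Reverse(n)| n)
}

// Walk the tree to produce the symbol -> bitstring table ("0" = left, "1" = right).
fn build_table(node: &Node, prefix: String, table: &mut HashMap<u8, String>) {
    match node {
        Node::Leaf { symbol, .. } => {
            table.insert(*symbol, if prefix.is_empty() { "0".into() } else { prefix });
        }
        Node::Internal { left, right, .. } => {
            build_table(left, format!("{prefix}0"), table);
            build_table(right, format!("{prefix}1"), table);
        }
    }
}

fn main() {
    let data = b"abracadabra";
    let tree = build_tree(data).expect("non-empty input");
    let mut table = HashMap::new();
    build_table(&tree, String::new(), &mut table);
    // The most frequent symbol ('a') gets one of the shortest codes.
    println!("{table:?}");
}
```

Since frequent symbols sit deeper in the merge order, they end up with shorter codes, which is where the compression comes from.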
Binary format (what gets stored inside `*_encoded.bin`):

- number of table entries (`u32`)
- for each entry:
  - key length (`u8`) + key bytes
  - code length (`u8`) + code bytes
- original symbol count (`u64`)
- encoded bitstream (packed into bytes)
There’s also an optional debug file for inspection: `{original_filename}_debug.bin`
- In decode mode, it scans `./to_decode/` and processes every file in that folder in parallel.
- `decoder.rs` reads the binary format and reconstructs the Huffman table (bitstring -> symbol).
- `expand.rs` reads:
  - the original symbol count,
  - the packed bitstream,
  then walks the bits until the symbol count is reached.
- Output is written to `./decoded_output/`: `{input_filename}_decoded.txt`
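The bit-walk in the decode path can be sketched like this. `expand` here is an illustrative stand-in for the project's `expand.rs`, and MSB-first bit packing is an assumption:

```rust
use std::collections::HashMap;

// Walk the packed bitstream bit by bit (MSB-first is assumed), growing the
// current bitstring until it matches a table entry, and stop once
// `symbol_count` symbols have been produced (padding bits are ignored).
fn expand(bitstream: &[u8], table: &HashMap<String, u8>, symbol_count: u64) -> Vec<u8> {
    let mut out = Vec::with_capacity(symbol_count as usize);
    let mut current = String::new();
    'outer: for byte in bitstream {
        for bit in (0..8).rev() {
            current.push(if (byte >> bit) & 1 == 1 { '1' } else { '0' });
            if let Some(&symbol) = table.get(&current) {
                out.push(symbol);
                current.clear();
                if out.len() as u64 == symbol_count {
                    break 'outer;
                }
            }
        }
    }
    out
}

fn main() {
    let mut table = HashMap::new();
    table.insert("0".to_string(), b'a');
    table.insert("10".to_string(), b'b');
    table.insert("11".to_string(), b'c');
    // bits: 0 10 11 0 -> "abca", packed MSB-first into one byte
    let decoded = expand(&[0b0101_1000], &table, 4);
    assert_eq!(decoded, b"abca");
}
```

The symbol count from the header is what makes the stop condition unambiguous: without it, the padding bits in the final byte could decode to spurious extra symbols.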
- `Cargo.toml` / `Cargo.lock`: Rust package metadata + dependencies
- `config.toml`: folder config (currently not wired into the main encode/decode flow)
- `README.md`: project description (this file)
- `LICENSE`: license info
- `to_encode/`: put the text files you want to compress here
- `encoded_output/`: output
  - `*_encoded.bin` (compressed data)
  - `*_debug.bin` (debug info)
- `to_decode/`: put the `*_encoded.bin` files you want to decompress here
- `decoded_output/`: output
  - `*_decoded.txt`
- `main.rs`: CLI entry point (encode/decode + parallel file processing)
- `path_read.rs`: lists all files in a folder
- `thread_pool.rs`: Rayon thread pool + parallel processing helper
- `file_io.rs`: reads input files (text/binary)
- `encoder.rs`: builds frequency map, Huffman tree, and encoding table
- `compress.rs`: writes Huffman table + encoded content into the binary format
- `decoder.rs`: reads the binary format and reconstructs the decoding table
- `expand.rs`: expands the encoded bitstream back into the original text
- `sturcture.rs`: Huffman tree `Node` + heap ordering (name is a typo but works)
Rust/Cargo will download everything automatically the first time you build.
For a quick build: `cargo build`

Run directly with Cargo: `cargo run`. It'll ask you to type `encode` or `decode`.
- Put files in: `to_encode/`
- Run the program (`cargo run` or the release binary).
- When prompted, type `encode`.
- Outputs go to: `encoded_output/`
- Copy/move the `*_encoded.bin` files from `encoded_output/` into: `to_decode/`
- Run the program (`cargo run` or the release binary).
- When prompted, type `decode`.
- Outputs go to: `decoded_output/`
If you want the fast/optimized build: `cargo build --release`

Then run: `./target/release/huffman-encoding`

- Input text size: ~800 MB (100 MB * 8 files)
- Encoded binary size: ~600 MB
- Compression ratio: ~25% size reduction
- Encoding time: ~7.8 s (almost no other system load, 8 threads for 8 files)
- Decompression used the compressed data from the run above
- Time taken: ~22.3 s (same favourable conditions as above)
Small note: this project can feel fast mainly because it’s a simple Huffman implementation and it currently focuses on text files. But the compression ratio is also not as good as mature tools like gzip/zstd — that’s the tradeoff.
This project was built mainly to learn Rust, so it’s not meant to compete with tools like gzip or zlib. It hasn’t been optimized for speed or compression ratio — it’s just a Huffman coding implementation in Rust.
There’s still a lot of optimization potential (both algorithmic and code-level). One of the main bottlenecks right now is file I/O:
- the encoder currently reads the whole file into memory,
- and the compressed writer ends up reading the input again when producing the final binary output.
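One way to relieve the first of those bottlenecks is to stream the frequency pass with a `BufReader` instead of loading the whole file. A sketch, under the assumption that a chunked single pass would fit the existing pipeline:

```rust
use std::fs::File;
use std::io::{BufReader, Read, Result};

// Count byte frequencies by reading the file in fixed-size chunks,
// so memory use stays constant regardless of file size.
fn count_frequencies(path: &str) -> Result<[u64; 256]> {
    let mut reader = BufReader::new(File::open(path)?);
    let mut freqs = [0u64; 256];
    let mut chunk = [0u8; 64 * 1024];
    loop {
        let n = reader.read(&mut chunk)?;
        if n == 0 {
            break; // EOF
        }
        for &b in &chunk[..n] {
            freqs[b as usize] += 1;
        }
    }
    Ok(freqs)
}
```

A second streaming pass could then emit the encoded bits, avoiding the re-read of the input during compression as well.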
Feel free to explore, modify, and experiment with the code!