🗜️ Parallel Huffman Compressor (Rust)

A multi-threaded file compressor and decompressor written in Rust. It implements Huffman encoding/decoding and uses Rayon to process multiple files in parallel.

How it works

This project has two modes:

1) Encode (compress)

main.rs asks you to choose encode or decode.
In encode mode, it scans ./to_encode/ and processes every file in that folder in parallel.
For each file, encoder.rs:
- counts character frequencies,
- builds a Huffman tree (min-heap),
- generates an encoding table (symbol -> bitstring).
compress.rs writes the compressed output into ./encoded_output/:
- {original_filename}_encoded.bin
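
The encoder steps above (frequency count, min-heap tree build, table generation) can be sketched as follows. This is a minimal illustration, not the code in encoder.rs: it works on bytes rather than characters, and the `Node` enum, `count_frequencies`, `build_table`, and `assign_codes` are hypothetical names chosen for this example.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

/// Hypothetical tree node; the project's own node type lives in sturcture.rs.
enum Node {
    Leaf(u8),
    Internal(Box<Node>, Box<Node>),
}

/// Count how often each byte occurs in the input.
fn count_frequencies(data: &[u8]) -> HashMap<u8, u64> {
    let mut freq = HashMap::new();
    for &b in data {
        *freq.entry(b).or_insert(0u64) += 1;
    }
    freq
}

/// Build an encoding table (symbol -> bitstring) from the frequencies.
fn build_table(freq: &HashMap<u8, u64>) -> HashMap<u8, String> {
    // Sort so the result does not depend on HashMap iteration order.
    let mut symbols: Vec<(u8, u64)> = freq.iter().map(|(&s, &w)| (s, w)).collect();
    symbols.sort();

    // Nodes live in a Vec; the heap only holds (weight, index) pairs.
    // BinaryHeap is a max-heap, so Reverse turns it into a min-heap.
    let mut nodes: Vec<Option<Node>> = Vec::new();
    let mut heap = BinaryHeap::new();
    for (sym, weight) in symbols {
        heap.push(Reverse((weight, nodes.len())));
        nodes.push(Some(Node::Leaf(sym)));
    }

    // Repeatedly merge the two lightest subtrees.
    while heap.len() > 1 {
        let Reverse((w1, i1)) = heap.pop().unwrap();
        let Reverse((w2, i2)) = heap.pop().unwrap();
        let left = nodes[i1].take().unwrap();
        let right = nodes[i2].take().unwrap();
        heap.push(Reverse((w1 + w2, nodes.len())));
        nodes.push(Some(Node::Internal(Box::new(left), Box::new(right))));
    }

    let mut table = HashMap::new();
    if let Some(Reverse((_, root))) = heap.pop() {
        let root = nodes[root].take().unwrap();
        assign_codes(&root, String::new(), &mut table);
    }
    table
}

/// Walk the tree: left edges append '0', right edges append '1'.
fn assign_codes(node: &Node, prefix: String, table: &mut HashMap<u8, String>) {
    match node {
        Node::Leaf(sym) => {
            // An input with a single distinct symbol would get an empty code; use "0".
            let code = if prefix.is_empty() { "0".to_string() } else { prefix };
            table.insert(*sym, code);
        }
        Node::Internal(left, right) => {
            assign_codes(left, format!("{prefix}0"), table);
            assign_codes(right, format!("{prefix}1"), table);
        }
    }
}
```

For the input `b"aaaabbc"`, the most frequent symbol `a` receives a 1-bit code and `b` and `c` receive 2-bit codes.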

Binary format (what gets stored inside *_encoded.bin):

number of table entries (u32)
for each entry:
- key length (u8) + key bytes
- code length (u8) + code bytes
original symbol count (u64)
encoded bitstream (packed into bytes)
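
A writer for this layout could look roughly like the sketch below. The field order follows the list above; everything else is an assumption made for illustration, since the README does not specify it: little-endian integers, codes stored as ASCII '0'/'1' bytes, and a bitstream packed most-significant-bit first with zero padding. The names `write_encoded` and `pack_bits` are hypothetical and do not come from compress.rs.

```rust
use std::io::{self, Write};

/// Write the table, symbol count, and packed bitstream in the listed order.
/// Assumes little-endian integers and codes stored as ASCII '0'/'1' bytes.
fn write_encoded<W: Write>(
    out: &mut W,
    table: &[(String, String)], // (symbol, code) pairs
    symbol_count: u64,
    bits: &str, // encoded content as '0'/'1' characters
) -> io::Result<()> {
    out.write_all(&(table.len() as u32).to_le_bytes())?;
    for (key, code) in table {
        // Lengths are u8 fields, so keys and codes longer than 255 bytes
        // would not fit; this sketch does not guard against that.
        out.write_all(&[key.len() as u8])?;
        out.write_all(key.as_bytes())?;
        out.write_all(&[code.len() as u8])?;
        out.write_all(code.as_bytes())?;
    }
    out.write_all(&symbol_count.to_le_bytes())?;
    out.write_all(&pack_bits(bits))?;
    Ok(())
}

/// Pack a string of '0'/'1' characters into bytes, first bit in the high position.
/// The final byte is padded with zero bits.
fn pack_bits(bits: &str) -> Vec<u8> {
    let mut bytes = vec![0u8; (bits.len() + 7) / 8];
    for (i, c) in bits.bytes().enumerate() {
        if c == b'1' {
            bytes[i / 8] |= 0x80 >> (i % 8);
        }
    }
    bytes
}
```

Storing the original symbol count is what lets a reader ignore the padding bits in the last byte.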

There’s also a debug file (optional / for inspection):

{original_filename}_debug.bin

2) Decode (decompress)

In decode mode, it scans ./to_decode/ and processes every file in that folder in parallel.
decoder.rs reads the binary format and reconstructs the Huffman table (bitstring -> symbol).
expand.rs reads:
- the original symbol count
- the packed bitstream
It then walks the bits until the symbol count is reached.
Output is written to ./decoded_output/:
- {input_filename}_decoded.txt
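
The bit-walking step described above can be sketched as follows. This is an illustration under stated assumptions rather than the code in expand.rs: bits are assumed to be packed most-significant-bit first, the decoding table maps bitstrings to `char` values, and the function name `expand` and its signature are hypothetical.

```rust
use std::collections::HashMap;

/// Decode exactly `symbol_count` symbols from a packed bitstream.
/// Bits are read from the high bit of each byte downwards.
fn expand(packed: &[u8], symbol_count: u64, table: &HashMap<String, char>) -> String {
    let mut out = String::new();
    let mut current = String::new(); // bits collected since the last match
    let mut decoded = 0u64;

    'bytes: for &byte in packed {
        for shift in (0..8).rev() {
            // Stop as soon as the symbol count is reached, so that
            // padding bits in the last byte are never interpreted.
            if decoded == symbol_count {
                break 'bytes;
            }
            current.push(if (byte >> shift) & 1 == 1 { '1' } else { '0' });
            // Huffman codes are prefix-free, so the first match is the right one.
            if let Some(&sym) = table.get(&current) {
                out.push(sym);
                current.clear();
                decoded += 1;
            }
        }
    }
    out
}
```

With the table `1 -> a`, `01 -> b`, the single byte `0xB0` (bits `1011 0000`) and a symbol count of 3 decode to `aba`; the four trailing zero bits are padding and are skipped.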

File Structure

Root

Cargo.toml / Cargo.lock: Rust package metadata + dependencies
config.toml: folder config (currently not wired into the main encode/decode flow)
README.md: project description (this file)
LICENSE: license info

Input / Output folders

to_encode/
- put the text files you want to compress here
encoded_output/
- output: *_encoded.bin (compressed data)
- output: *_debug.bin (debug info)
to_decode/
- put the *_encoded.bin files you want to decompress here
decoded_output/
- output: *_decoded.txt

Source code (`src/`)

main.rs: CLI entry point (encode/decode + parallel file processing)
path_read.rs: lists all files in a folder
thread_pool.rs: Rayon thread pool + parallel processing helper
file_io.rs: reads input files (text/binary)
encoder.rs: builds frequency map, Huffman tree, and encoding table
compress.rs: writes Huffman table + encoded content into the binary format
decoder.rs: reads the binary format and reconstructs the decoding table
expand.rs: expands the encoded bitstream back into the original text
sturcture.rs: Huffman tree Node + heap ordering (the file name is a misspelling of "structure", but the module works)
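
To illustrate the per-file parallelism that main.rs and thread_pool.rs provide, here is a small sketch. Note that it swaps in the standard library's scoped threads (one thread per item) instead of Rayon so that it has no external dependencies; the project itself uses a Rayon thread pool, where the same shape is typically written with `par_iter()`. The helper name `process_in_parallel` is hypothetical.

```rust
use std::thread;

/// Run `work` on every item, one scoped thread per item,
/// and collect the results in input order.
fn process_in_parallel<T, R, F>(items: &[T], work: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    // Share the closure by reference so every thread can call it.
    let work = &work;
    thread::scope(|scope| {
        // Spawn all workers first so they run concurrently.
        let handles: Vec<_> = items
            .iter()
            .map(|item| scope.spawn(move || work(item)))
            .collect();
        // Join in spawn order, which preserves the input order.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Called with a list of file paths and a closure that encodes one file, this processes all files at once. A thread pool such as Rayon's is usually the better choice when there are many more files than CPU cores, because it caps the number of threads.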

Usage

0) Build (download deps)

Rust/Cargo will download everything automatically the first time you build.

For a quick build:

cargo build

1) Quick run (trial)

Run directly with Cargo:

cargo run

It’ll ask you to type encode or decode.

2) Encode (compress)

Put files in: to_encode/
Run the program (cargo run or the release binary).
When prompted, type encode.
Outputs go to: encoded_output/

3) Decode (decompress)

Copy/move the *_encoded.bin files from encoded_output/ into: to_decode/
Run the program (cargo run or the release binary).
When prompted, type decode.
Outputs go to: decoded_output/

4) Release build (final)

If you want the fast/optimized build:

cargo build --release

Then run:

./target/release/huffman-encoding

Stats

Encoding

Input text size: ~800 MB (8 files × 100 MB)
Encoded binary size: ~600 MB
Space saved: ~25% (the output is ~75% of the input size)
Encoding time: ~7.8 s (near-idle machine, 8 threads for 8 files)

Decoding

Input: the compressed data produced in the encoding run above
Decoding time: ~22.3 s (same favourable conditions as above)

Small note: this project can feel fast mainly because it’s a simple Huffman implementation that currently focuses on text files. The tradeoff is a compression ratio that is not as good as mature tools like gzip or zstd.

This project was built mainly to learn Rust, so it’s not meant to compete with tools like gzip or zlib. It hasn’t been optimized for speed or compression ratio; it’s just a Huffman coding implementation in Rust.

There’s still a lot of optimization potential (both algorithmic and code-level). One of the main bottlenecks right now is file I/O:

- the encoder currently reads the whole file into memory,
- the compressed writer then reads the input again when producing the final binary output.
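
One possible way to avoid the second read is to keep the buffer from the first read and encode directly from it. The sketch below shows only that encoding step; it assumes a byte-keyed table (symbol -> bitstring) is already available, and the name `encode_buffer` is hypothetical rather than taken from the project.

```rust
use std::collections::HashMap;

/// Encode an in-memory buffer with an existing table, so the file
/// does not have to be read from disk a second time.
/// Returns None if the buffer contains a symbol missing from the table.
fn encode_buffer(data: &[u8], table: &HashMap<u8, String>) -> Option<String> {
    let mut bits = String::new();
    for b in data {
        bits.push_str(table.get(b)?);
    }
    Some(bits)
}
```

The same `data` slice that was used to count frequencies can be passed here, which trades a second disk read for keeping the input in memory a little longer.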

Feel free to explore, modify, and experiment with the code!

Soumyadip-gole/Multi-threaded-Huffman
