Dynamic File Compression Utility is a Data Structures and Algorithms (DSA) project that implements Huffman Coding for lossless file compression and decompression.
The application reads a text file, calculates character frequencies, builds a Huffman Tree using a Min Heap, generates optimal binary codes, compresses the file, and reconstructs the original content during decompression.
This project demonstrates practical applications of Trees, Heaps, Greedy Algorithms, Hash Maps, and File Handling.
Large files consume storage space and increase data transfer time.
The objective of this project is to reduce file size using Huffman Coding while ensuring that the original file can be recovered without any loss of information.
- Lossless file compression
- Huffman Coding implementation
- File decompression
- Compression statistics generation
- Command Line Interface (CLI)
- Frequency table generation
- Huffman Tree construction
- Compression ratio calculation
- Modular project structure
- Python 3
- heapq
- collections
- argparse
- json
- os
Used to store character frequencies.
Example:
{
'a': 15,
'b': 10,
'c': 5
}Used to efficiently select the two nodes with the smallest frequency while building the Huffman Tree.
The Huffman Tree is implemented using a Binary Tree.
Huffman Coding is a Greedy Algorithm because it repeatedly selects the least frequent nodes to generate optimal prefix codes.
Used for traversing the Huffman Tree and generating binary codes.
Input File
|
v
Read File
|
v
Frequency Table
|
v
Min Heap
|
v
Huffman Tree
|
v
Code Generation
|
v
Compression
|
v
Compressed File
|
v
Decompression
|
v
Original File Recovery
Dynamic-File-Compression-Utility/
│
├── input_files/
│
├── compressed_files/
│
├── decompressed_files/
│
├── src/
│ ├── __init__.py
│ ├── huffman.py
│ ├── compressor.py
│ └── decompressor.py
│
├── outputs/
├── images/
├── docs/
│
├── README.md
├── requirements.txt
├── .gitignore
└── main.py
git clone https://github.com/your-username/Dynamic-File-Compression-Utility.gitcd Dynamic-File-Compression-Utilitypip install -r requirements.txtpython main.py compress input_files/sample.txtpython main.py decompress compressed_files/sample.txt.huffCompression Successful
Original Size : 233 bytes
Compressed Size : 954 bytes
Compression Ratio : 409.44 %
Decompression Successful
Output :
decompressed_files/sample.txt_decompressed.txt
This project stores Huffman codes as text strings of 0s and 1s.
Therefore, the compressed file may appear larger than the original file.
In production compression systems, encoded bits are packed into binary bytes, resulting in actual compression.
The purpose of this implementation is educational and focused on demonstrating DSA concepts.
Add screenshot here:
images/project_structure.png
Add screenshot here:
images/compression_output.png
Add screenshot here:
images/decompression_output.png
Add screenshot here:
images/github_repository.png
Time Complexity:
O(n)
Time Complexity:
O(n log n)
Time Complexity:
O(n log n)
Time Complexity:
O(n)
Time Complexity:
O(n)
Through this project, I learned:
- Huffman Coding
- File Compression Techniques
- Binary Trees
- Min Heap Implementation
- Greedy Algorithms
- Recursion
- File Handling in Python
- Command Line Interface Development
- Project Documentation
- GitHub Project Management
- Binary bit-level compression
- Huffman Tree visualization
- GUI interface using Tkinter
- Multi-file compression
- Compression benchmarking
- Unit testing
- Real-time compression statistics
- Support for binary files
This project implements Huffman Coding for lossless file compression. It calculates character frequencies, builds a Huffman Tree using a Min Heap, generates optimal binary codes, compresses the file, and later reconstructs the original file through decompression. The project demonstrates Hash Maps, Priority Queues, Binary Trees, Greedy Algorithms, and File Handling.
A Min Heap allows efficient retrieval of the two nodes with the smallest frequency, which is required while constructing the Huffman Tree.
Because at every step it selects the locally optimal choice, which is combining the two least frequent nodes.
Lossless compression allows the original data to be perfectly reconstructed after decompression.
Aryan Choughule
Engineering Student | DSA Enthusiast | Aspiring Software Developer
This project is developed for educational and learning purposes.