Dynamic File Compression Utility

Overview

Dynamic File Compression Utility is a Data Structures and Algorithms (DSA) project that implements Huffman Coding for lossless file compression and decompression.

The application reads a text file, calculates character frequencies, builds a Huffman Tree using a Min Heap, generates optimal binary codes, compresses the file, and reconstructs the original content during decompression.

This project demonstrates practical applications of Trees, Heaps, Greedy Algorithms, Hash Maps, and File Handling.

Problem Statement

Large files consume storage space and increase data transfer time.

The objective of this project is to reduce file size using Huffman Coding while ensuring that the original file can be recovered without any loss of information.

Features

Lossless file compression
Huffman Coding implementation
File decompression
Compression statistics generation
Command Line Interface (CLI)
Frequency table generation
Huffman Tree construction
Compression ratio calculation
Modular project structure

Technologies Used

Language

Python 3

Libraries

heapq
collections
argparse
json
os

Data Structures and Algorithms Used

Hash Map (Dictionary)

Used to store character frequencies.

Example:

{
    'a': 15,
    'b': 10,
    'c': 5
}

Min Heap (Priority Queue)

Used to efficiently select the two nodes with the smallest frequency while building the Huffman Tree.

Binary Tree

The Huffman Tree is implemented using a Binary Tree.

Greedy Algorithm

Huffman Coding is a Greedy Algorithm because it repeatedly selects the least frequent nodes to generate optimal prefix codes.

Recursion

Used for traversing the Huffman Tree and generating binary codes.

Project Architecture

Input File
     |
     v
Read File
     |
     v
Frequency Table
     |
     v
Min Heap
     |
     v
Huffman Tree
     |
     v
Code Generation
     |
     v
Compression
     |
     v
Compressed File
     |
     v
Decompression
     |
     v
Original File Recovery

Folder Structure

Dynamic-File-Compression-Utility/
│
├── input_files/
│
├── compressed_files/
│
├── decompressed_files/
│
├── src/
│   ├── __init__.py
│   ├── huffman.py
│   ├── compressor.py
│   └── decompressor.py
│
├── outputs/
├── images/
├── docs/
│
├── README.md
├── requirements.txt
├── .gitignore
└── main.py

Installation

Clone Repository

git clone https://github.com/your-username/Dynamic-File-Compression-Utility.git

Navigate to Project

cd Dynamic-File-Compression-Utility

Install Requirements

pip install -r requirements.txt

How to Run

Compress a File

python main.py compress input_files/sample.txt

Decompress a File

python main.py decompress compressed_files/sample.txt.huff

Sample Output

Compression

Compression Successful

Original Size : 233 bytes
Compressed Size : 954 bytes
Compression Ratio : 409.44 %

Decompression

Decompression Successful

Output :
decompressed_files/sample.txt_decompressed.txt

Explanation of Compression Ratio

This project stores Huffman codes as text strings of 0s and 1s.

Therefore, the compressed file may appear larger than the original file.

In production compression systems, encoded bits are packed into binary bytes, resulting in actual compression.

The purpose of this implementation is educational and focused on demonstrating DSA concepts.

Screenshots

Project Structure

Add screenshot here:

images/project_structure.png

Compression Output

Add screenshot here:

images/compression_output.png

Decompression Output

Add screenshot here:

images/decompression_output.png

GitHub Repository

Add screenshot here:

images/github_repository.png

Complexity Analysis

Frequency Table

Time Complexity:

O(n)

Heap Construction

Time Complexity:

O(n log n)

Huffman Tree Construction

Time Complexity:

O(n log n)

Encoding

Time Complexity:

O(n)

Decoding

Time Complexity:

O(n)

Learning Outcomes

Through this project, I learned:

Huffman Coding
File Compression Techniques
Binary Trees
Min Heap Implementation
Greedy Algorithms
Recursion
File Handling in Python
Command Line Interface Development
Project Documentation
GitHub Project Management

Future Enhancements

Binary bit-level compression
Huffman Tree visualization
GUI interface using Tkinter
Multi-file compression
Compression benchmarking
Unit testing
Real-time compression statistics
Support for binary files

Interview Questions

Explain your project.

This project implements Huffman Coding for lossless file compression. It calculates character frequencies, builds a Huffman Tree using a Min Heap, generates optimal binary codes, compresses the file, and later reconstructs the original file through decompression. The project demonstrates Hash Maps, Priority Queues, Binary Trees, Greedy Algorithms, and File Handling.

Why did you use a Min Heap?

A Min Heap allows efficient retrieval of the two nodes with the smallest frequency, which is required while constructing the Huffman Tree.

Why is Huffman Coding called a Greedy Algorithm?

Because at every step it selects the locally optimal choice, which is combining the two least frequent nodes.

What is lossless compression?

Lossless compression allows the original data to be perfectly reconstructed after decompression.

Author

Aryan Choughule

Engineering Student | DSA Enthusiast | Aspiring Software Developer

License

This project is developed for educational and learning purposes.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
compressed_files		compressed_files
decompressed_files		decompressed_files
input_files		input_files
src		src
.gitignore		.gitignore
README.md		README.md
main.py		main.py
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Dynamic File Compression Utility

Overview

Problem Statement

Features

Technologies Used

Language

Libraries

Data Structures and Algorithms Used

Hash Map (Dictionary)

Min Heap (Priority Queue)

Binary Tree

Greedy Algorithm

Recursion

Project Architecture

Folder Structure

Installation

Clone Repository

Navigate to Project

Install Requirements

How to Run

Compress a File

Decompress a File

Sample Output

Compression

Decompression

Explanation of Compression Ratio

Screenshots

Project Structure

Compression Output

Decompression Output

GitHub Repository

Complexity Analysis

Frequency Table

Heap Construction

Huffman Tree Construction

Encoding

Decoding

Learning Outcomes

Future Enhancements

Interview Questions

Explain your project.

Why did you use a Min Heap?

Why is Huffman Coding called a Greedy Algorithm?

What is lossless compression?

Author

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages