Skip to content

Aryan-05CS/Dynamic-File-Compression-Utility

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Dynamic File Compression Utility

Overview

Dynamic File Compression Utility is a Data Structures and Algorithms (DSA) project that implements Huffman Coding for lossless file compression and decompression.

The application reads a text file, calculates character frequencies, builds a Huffman Tree using a Min Heap, generates optimal binary codes, compresses the file, and reconstructs the original content during decompression.

This project demonstrates practical applications of Trees, Heaps, Greedy Algorithms, Hash Maps, and File Handling.


Problem Statement

Large files consume storage space and increase data transfer time.

The objective of this project is to reduce file size using Huffman Coding while ensuring that the original file can be recovered without any loss of information.


Features

  • Lossless file compression
  • Huffman Coding implementation
  • File decompression
  • Compression statistics generation
  • Command Line Interface (CLI)
  • Frequency table generation
  • Huffman Tree construction
  • Compression ratio calculation
  • Modular project structure

Technologies Used

Language

  • Python 3

Libraries

  • heapq
  • collections
  • argparse
  • json
  • os

Data Structures and Algorithms Used

Hash Map (Dictionary)

Used to store character frequencies.

Example:

{
    'a': 15,
    'b': 10,
    'c': 5
}

Min Heap (Priority Queue)

Used to efficiently select the two nodes with the smallest frequency while building the Huffman Tree.

Binary Tree

The Huffman Tree is implemented using a Binary Tree.

Greedy Algorithm

Huffman Coding is a Greedy Algorithm because it repeatedly selects the least frequent nodes to generate optimal prefix codes.

Recursion

Used for traversing the Huffman Tree and generating binary codes.


Project Architecture

Input File
     |
     v
Read File
     |
     v
Frequency Table
     |
     v
Min Heap
     |
     v
Huffman Tree
     |
     v
Code Generation
     |
     v
Compression
     |
     v
Compressed File
     |
     v
Decompression
     |
     v
Original File Recovery

Folder Structure

Dynamic-File-Compression-Utility/
│
├── input_files/
│
├── compressed_files/
│
├── decompressed_files/
│
├── src/
│   ├── __init__.py
│   ├── huffman.py
│   ├── compressor.py
│   └── decompressor.py
│
├── outputs/
├── images/
├── docs/
│
├── README.md
├── requirements.txt
├── .gitignore
└── main.py

Installation

Clone Repository

git clone https://github.com/your-username/Dynamic-File-Compression-Utility.git

Navigate to Project

cd Dynamic-File-Compression-Utility

Install Requirements

pip install -r requirements.txt

How to Run

Compress a File

python main.py compress input_files/sample.txt

Decompress a File

python main.py decompress compressed_files/sample.txt.huff

Sample Output

Compression

Compression Successful

Original Size : 233 bytes
Compressed Size : 954 bytes
Compression Ratio : 409.44 %

Decompression

Decompression Successful

Output :
decompressed_files/sample.txt_decompressed.txt

Explanation of Compression Ratio

This project stores Huffman codes as text strings of 0s and 1s.

Therefore, the compressed file may appear larger than the original file.

In production compression systems, encoded bits are packed into binary bytes, resulting in actual compression.

The purpose of this implementation is educational and focused on demonstrating DSA concepts.


Screenshots

Project Structure

Add screenshot here:

images/project_structure.png

Compression Output

Add screenshot here:

images/compression_output.png

Decompression Output

Add screenshot here:

images/decompression_output.png

GitHub Repository

Add screenshot here:

images/github_repository.png

Complexity Analysis

Frequency Table

Time Complexity:

O(n)

Heap Construction

Time Complexity:

O(n log n)

Huffman Tree Construction

Time Complexity:

O(n log n)

Encoding

Time Complexity:

O(n)

Decoding

Time Complexity:

O(n)

Learning Outcomes

Through this project, I learned:

  • Huffman Coding
  • File Compression Techniques
  • Binary Trees
  • Min Heap Implementation
  • Greedy Algorithms
  • Recursion
  • File Handling in Python
  • Command Line Interface Development
  • Project Documentation
  • GitHub Project Management

Future Enhancements

  • Binary bit-level compression
  • Huffman Tree visualization
  • GUI interface using Tkinter
  • Multi-file compression
  • Compression benchmarking
  • Unit testing
  • Real-time compression statistics
  • Support for binary files

Interview Questions

Explain your project.

This project implements Huffman Coding for lossless file compression. It calculates character frequencies, builds a Huffman Tree using a Min Heap, generates optimal binary codes, compresses the file, and later reconstructs the original file through decompression. The project demonstrates Hash Maps, Priority Queues, Binary Trees, Greedy Algorithms, and File Handling.

Why did you use a Min Heap?

A Min Heap allows efficient retrieval of the two nodes with the smallest frequency, which is required while constructing the Huffman Tree.

Why is Huffman Coding called a Greedy Algorithm?

Because at every step it selects the locally optimal choice, which is combining the two least frequent nodes.

What is lossless compression?

Lossless compression allows the original data to be perfectly reconstructed after decompression.


Author

Aryan Choughule

Engineering Student | DSA Enthusiast | Aspiring Software Developer


License

This project is developed for educational and learning purposes.

About

A Huffman Coding based file compression utility demonstrating Min Heap, Binary Tree, Greedy Algorithm, Hash Map and File Handling concepts in Python.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages