JAS Compression is a custom text compression library that uses tokenization, specialized preprocessing for different text-based formats, and deterministic Huffman encoding to compress and decompress text files. The project is designed to handle plain text, JSON, CSV, XML, and YAML formats.
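As an illustration of what format-specific preprocessing can look like, here is a minimal sketch for JSON. It assumes normalization means dropping insignificant whitespace before tokenization. The function name `normalize_json` is hypothetical and is not taken from the library's `structured.py`.

```python
import json


def normalize_json(text):
    """Re-serialize JSON without insignificant whitespace.

    Hypothetical helper for illustration only; the library's own
    normalization may differ. Key order is preserved, but the original
    indentation and spacing are not recoverable after this step.
    """
    data = json.loads(text)
    # Compact separators remove the spaces json.dumps adds by default.
    return json.dumps(data, separators=(",", ":"), ensure_ascii=False)


if __name__ == "__main__":
    print(normalize_json('{ "a": 1,\n  "b": [1, 2] }'))
```

A compact canonical form gives the tokenizer fewer distinct whitespace tokens to encode, at the cost of not reproducing the input byte for byte.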
- Tokenization: Breaks text into tokens (words, punctuation, whitespace, etc.).
- Special Phrase Detection: Identifies and replaces frequently occurring special phrases to improve compression.
- Deterministic Huffman Encoding: Uses a deterministic Huffman tree for consistent encoding and decoding.
- Format-Specific Preprocessing: Supports normalization for JSON, CSV, XML, and YAML files.
- Command-Line Interface: Provides a CLI for compression and decompression with verbose logging and progress bars.
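The tokenization and deterministic Huffman steps listed above can be sketched as follows. This is an illustrative example, not the library's implementation: the token pattern, the tie-breaking rule, and the names `tokenize`, `build_codes`, `encode`, and `decode` are all assumptions, and bits are kept as a string of `0`/`1` characters for readability rather than packed into bytes.

```python
import heapq
import re
from collections import Counter

# Assumed token pattern: a word, a run of whitespace, or one other character.
TOKEN_RE = re.compile(r"\w+|\s+|[^\w\s]")


def tokenize(text):
    """Split text into tokens; joining the tokens reproduces the input."""
    return TOKEN_RE.findall(text)


def build_codes(tokens):
    """Build a Huffman code table with deterministic tie-breaking."""
    freq = Counter(tokens)
    if not freq:
        return {}
    # Heap entries are (frequency, smallest token in subtree, subtree).
    # The second field is unique per subtree, so equal frequencies are
    # always resolved the same way regardless of input order.
    heap = [(count, tok, tok) for tok, count in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct token still needs a one-bit code.
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        n1, k1, t1 = heapq.heappop(heap)
        n2, k2, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, min(k1, k2), (t1, t2)))
    # Walk the tree: leaves are strings, internal nodes are 2-tuples.
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            codes[node] = prefix
    return codes


def encode(tokens, codes):
    """Concatenate the code of each token into one bit string."""
    return "".join(codes[tok] for tok in tokens)


def decode(bits, codes):
    """Rebuild the text by matching prefix-free codes left to right."""
    inverse = {code: tok for tok, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    text = "the cat and the hat"
    tokens = tokenize(text)
    codes = build_codes(tokens)
    assert decode(encode(tokens, codes), codes) == text
```

Because ties are broken by token value rather than by insertion order, the compressor and decompressor can rebuild the same tree from the same frequency table, which is what makes the encoding reproducible.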
You can install the package via PyPI:
pip install jas-compression
Or, for the latest development version, clone the repository and install it locally:
git clone https://github.com/yourusername/jas-compression.git
cd jas-compression
pip install .

To compress a text file:
python -m jas.cli compress input.txt output.jas --verbose

To decompress a .jas file:
python -m jas.cli decompress output.jas result.txt --verbose

Project structure:
jas-compression/
├── jas/
│ ├── __init__.py
│ ├── compressor.py
│ ├── decompressor.py
│ ├── cli.py
│ ├── huffman.py
│ ├── tokenizer.py
│ ├── structured.py
│ ├── utils.py
│ └── bitstream.py
├── README.md
├── setup.py
├── MANIFEST.in
├── LICENSE
└── requirements.txt

Contributions are welcome! Please open an issue or submit a pull request on GitHub. Make sure to follow the existing code style and include tests for any new features.
This project is licensed under the MIT License.