with Python
NOTICE: The interface looks better on Windows than on Linux.
This project implements the Burrows Wheeler Transform (BWT) and Huffman coding in Python to compress genome sequences.
The application can be used:
- To compress genome files (fasta, txt).
- To decompress files and recover the genomic sequence.
- To run the Burrows Wheeler Transform of a given sequence step by step.
- To decrypt a Burrows Wheeler sequence.
- To visualize the full compression and decompression process of a genome.
A genomic sequence can be entered manually or loaded from a file (fasta, txt).
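As a rough illustration of the file input mentioned above, the sketch below reads a sequence from a FASTA or plain-text source. The helper name `read_sequence` is hypothetical and not taken from the project; it assumes a single-record file and simply skips header lines starting with `>`.

```python
import io


def read_sequence(handle):
    """Return the sequence from a FASTA or plain-text handle as one uppercase string.

    Hypothetical helper: assumes a single record; with several records
    the sequences would simply be concatenated.
    """
    parts = []
    for line in handle:
        line = line.strip()
        # Skip blank lines and FASTA header lines, which start with '>'.
        if not line or line.startswith(">"):
            continue
        parts.append(line.upper())
    return "".join(parts)


# An in-memory file stands in for a real .fasta or .txt file.
fasta = io.StringIO(">seq1 example\nACGT\nacgtn\n")
sequence = read_sequence(fasta)
print(sequence)
```

With a real file, the same function could be called as `read_sequence(open(path))`, where `path` points to a fasta or txt file.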
To get a local copy up and running, follow these simple steps:
- Clone the repository locally
git clone https://github.com/LouaiKB/-BWT-Huffman-coding
cd ./-BWT-Huffman-coding/
- Install all the dependencies from the requirements.txt
pip install -r requirements.txt
- If problems occur while installing the dependencies, try:
pip install -r requirements.txt --no-index --find-links file:///tmp/packages
cd scripts/
# run main.py
python main.py
- To compress a sequence, enter it manually in the text box and press the compression button. If you have a genome file, you can press the button directly and run the compression step by step. Note: only fasta or txt files are accepted.
- Once you click the button, a Toplevel window appears, et voilà!
NOTICE: The Huffman binary tree is presented in the Newick format.
- Press the Next button to complete the process.
NOTICE: The compression process saves two files: the compressed file and an associated JSON file, which is used for the decompression process.
- For decompression, the sequence cannot be entered manually, because the associated JSON file is required.
- Press decompress, then choose the compressed sequence file and the JSON file associated with it. NOTICE: The compressed file and the JSON file have the same name.
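To make the role of the associated JSON file more concrete, here is a minimal Huffman coding sketch. It is not the project's code: the function names and the JSON layout (the `codes` and `newick` keys) are assumptions chosen for illustration, the compressed data is kept as a string of `0`/`1` characters rather than packed bytes, and the input is assumed to be non-empty with symbols that are valid unquoted Newick labels.

```python
import heapq
import json
from collections import Counter
from itertools import count


def build_tree(text):
    """Build a Huffman tree as nested tuples: leaf = symbol, node = (left, right)."""
    tiebreak = count()  # unique counter so the heap never compares tree objects
    heap = [(freq, next(tiebreak), sym) for sym, freq in sorted(Counter(text).items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees into one node.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    return heap[0][2]


def make_codes(tree, prefix=""):
    """Map each symbol to its bit string (left = '0', right = '1')."""
    if isinstance(tree, str):
        return {tree: prefix or "0"}  # a single-symbol input still needs one bit
    left, right = tree
    codes = make_codes(left, prefix + "0")
    codes.update(make_codes(right, prefix + "1"))
    return codes


def to_newick(tree):
    """Serialize the tree topology in Newick notation (without the final ';')."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


def compress(text):
    """Return the encoded bit string and a JSON sidecar (hypothetical layout)."""
    tree = build_tree(text)
    codes = make_codes(tree)
    bits = "".join(codes[sym] for sym in text)
    sidecar = json.dumps({"codes": codes, "newick": to_newick(tree) + ";"})
    return bits, sidecar


def decompress(bits, sidecar):
    """Decode the bit string using the code table stored in the sidecar."""
    decode = {code: sym for sym, code in json.loads(sidecar)["codes"].items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in decode:  # Huffman codes are prefix-free, so the first match is correct
            out.append(decode[current])
            current = ""
    return "".join(out)


bits, sidecar = compress("AT$ACG")
assert decompress(bits, sidecar) == "AT$ACG"
```

The round trip only works because the code table travels with the compressed data, which is why decompression cannot start from a manually entered sequence alone.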
- The BWT button performs the Burrows Wheeler Transform.
- Enter the sequence manually or with a file.
- You can choose whether to run the transform step by step or not.
* The ***Burrows Wheeler transform*** is presented in the last column of the Burrows Wheeler construction matrix.
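The construction matrix can be sketched in a few lines. This is the textbook formulation (sort all rotations of the sequence plus a `$` terminator, then read the last column), not necessarily how the application builds it; the function names are illustrative.

```python
def bwt_matrix(sequence, terminator="$"):
    """Return the sorted rotations of sequence + terminator (the construction matrix)."""
    text = sequence + terminator
    rotations = [text[i:] + text[:i] for i in range(len(text))]
    return sorted(rotations)


def bwt(sequence, terminator="$"):
    """Return the transform: the last character of each sorted rotation."""
    return "".join(row[-1] for row in bwt_matrix(sequence, terminator))


# Step-by-step view: print each row of the matrix, then the transform.
for row in bwt_matrix("ACGTA"):
    print(row)
print(bwt("ACGTA"))  # AT$ACG
```

Sorting full rotations takes quadratic memory, which is fine for a demonstration but would be replaced by a suffix-array approach for whole genomes.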
- Choose BWT decryption to recover the original sequence from its BWT.
- The original sequence is presented in the row that ends with '$' in the Burrows Wheeler reconstruction matrix.
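The reconstruction matrix can be rebuilt with the classic prepend-and-sort procedure. The sketch below is an assumption about the approach, not the project's code; it expects the transform to contain the `$` terminator exactly once.

```python
def inverse_bwt(transformed, terminator="$"):
    """Rebuild the original sequence from its Burrows Wheeler transform."""
    rows = [""] * len(transformed)
    for _ in range(len(transformed)):
        # Prepend the BWT column to every row, then sort the rows again.
        rows = sorted(ch + row for ch, row in zip(transformed, rows))
    # After len(transformed) rounds the rows are the sorted rotations;
    # the one ending with the terminator is the original text.
    original = next(row for row in rows if row.endswith(terminator))
    return original[:-1]  # drop the terminator


print(inverse_bwt("AT$ACG"))  # ACGTA
```

Each round adds one column to the matrix, which matches a step-by-step display of the reconstruction.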
- The full visualization runs the entire compression and decompression process, from the Burrows Wheeler Transform to the Huffman coding.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
Project Link: https://github.com/LouaiKB/-BWT-Huffman-coding