Generic compression using an autoencoder
Python 3.X.X is required for this to work.
- Create your Python 3 virtual environment
- Install the necessary packages (a quick import check is sketched after this list):
pip install tensorflow==2.1.0
pip install keras==2.3.1
pip install opencv-python==4.2.0.32
pip install Pillow==7.0.0
pip install image==1.5.28
pip install noisereduce
pip install numpy
pip install matplotlib
- Download the following files from this GitHub repo:
- Download this .zip file --> Click Here
- Put the downloaded files in a single folder
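Once the packages are installed, a quick import check like the one below can confirm the pinned versions landed correctly. This is just a convenience sketch; the version strings in the comments simply mirror the pins above.

```python
# Sanity check: confirm the pinned packages import and report
# the versions requested in the Installation section.
import tensorflow as tf
import keras
import cv2
import PIL

print(tf.__version__)     # expected: 2.1.0
print(keras.__version__)  # expected: 2.3.1
print(cv2.__version__)    # expected: 4.2.0.32
print(PIL.__version__)    # expected: 7.0.0
```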
Open your CMD and navigate to the installation folder from step 4 of the Installation section.
Note: the model detects the file type (image/audio) automatically; you don't have to specify it.
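How the detection works isn't documented here; one plausible implementation, assuming the tool keys off the input file's extension (the encode step does require one), would look like this. `detect_type` is a hypothetical helper, not necessarily the repo's actual function.

```python
import os

# Assumption: type detection is by file extension; these sets mirror
# the supported file types listed later in this README.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tiff"}
AUDIO_EXTS = {".wav"}

def detect_type(path):
    ext = os.path.splitext(path)[1].lower()
    if ext in IMAGE_EXTS:
        return "image"
    if ext in AUDIO_EXTS:
        return "audio"
    raise ValueError("Unsupported file type: " + ext)
```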
To Encode (compress), use the following command:
python main.py encode [input_file_path] [compressed_file_path]
Examples:
python main.py encode myimage.png mycompressed
python main.py encode myaudio.wav mycompressed
Note: You are required to include the input file's extension, but not an extension for the compressed file.
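Conceptually, encoding an image amounts to cutting it into 32x32 blocks (see the preprocessing notes at the end of this README) and running them through the encoder half of the trained autoencoder. The sketch below illustrates that idea only; the weights file name `encoder.h5` and the `.npy` output are assumptions, since the repo's actual compressed format isn't documented here.

```python
import numpy as np
import cv2
from keras.models import load_model

# Hypothetical weights file; the repo's actual artifact may differ.
encoder = load_model("encoder.h5")

img = cv2.imread("myimage.png")
h, w = img.shape[:2]

# Cut the image into 32x32 blocks to match the model's input size.
blocks = [img[y:y + 32, x:x + 32]
          for y in range(0, h - h % 32, 32)
          for x in range(0, w - w % 32, 32)]

# Encode all blocks in one batch and save the latent codes.
latent = encoder.predict(np.asarray(blocks, dtype="float32") / 255.0)
np.save("mycompressed.npy", latent)
```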
To Decode (decompress), use the following command:
python main.py decode [compressed_file_path] [output_file_path]
Examples:
python main.py decode mycompressed my_image_output.png
python main.py decode mycompressed my_audio_output
Note: You are required to include the output file extension for image output only.
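Decoding is the mirror image: load the latent codes, run them through the decoder half, and stitch the reconstructed 32x32 blocks back together. A sketch under the same assumptions as the encode example above (the block layout is hard-coded here; the real tool would have to store the original dimensions alongside the latent codes):

```python
import numpy as np
import cv2
from keras.models import load_model

decoder = load_model("decoder.h5")  # hypothetical weights file
latent = np.load("mycompressed.npy")

# Reconstruct the 32x32x3 blocks from their latent codes.
blocks = (decoder.predict(latent) * 255.0).astype("uint8")

# Assumption for illustration: the source image was 4 blocks wide.
blocks_per_row = 4
rows = [np.hstack(blocks[i:i + blocks_per_row])
        for i in range(0, len(blocks), blocks_per_row)]
cv2.imwrite("my_image_output.png", np.vstack(rows))
```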
Supported file types:
- .wav
- .jpeg
- .jpg
- .png
- .tiff
Note: Not all of the datasets' content was used, due to resource limitations. The following image datasets were used for training:
- https://www.kaggle.com/evgeniumakov/images4k
- http://www.cs.toronto.edu/~kriz/cifar.html
- https://www.kaggle.com/hsankesara/flickr-image-dataset
- https://www.kaggle.com/vishalsubbiah/pokemon-images-and-types
- All images are first processed with data_generator.py before being used for training.
- All images are cut into 32x32 blocks to match the model's input size (a block-cutting sketch follows this list).
- Around 1,000,000 32x32x3 images are used for training (the datasets contain 15,000,000+).
- Beatport EDM Key Dataset: https://zenodo.org/record/1101082#.XqyLuqgzZPZ
Note: A 2.49 GB portion of the dataset (125 WAV songs) is used in training.
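As referenced in the preprocessing list above, the block-cutting step could look roughly like the sketch below. It is illustrative only; the directory layout and function name are placeholders, not the actual contents of data_generator.py.

```python
import os
import cv2
import numpy as np

def generate_blocks(image_dir, block=32):
    """Cut every image in image_dir into block x block training samples."""
    samples = []
    for name in os.listdir(image_dir):
        img = cv2.imread(os.path.join(image_dir, name))
        if img is None:  # skip unreadable / non-image files
            continue
        h, w = img.shape[:2]
        for y in range(0, h - h % block, block):
            for x in range(0, w - w % block, block):
                samples.append(img[y:y + block, x:x + block])
    return np.asarray(samples)

# e.g. training_set = generate_blocks("datasets/images")
```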