TinyZZZ is a simple, standalone data compressor/decompressor that supports several popular data compression algorithms, including GZIP, LZ4, ZSTD, LZMA and LPAQ8. All of them are written in C. Unlike the official implementations, this code focuses on simplicity and ease of understanding.
TinyZZZ currently supports the following compression algorithms:
format | file suffix | compress | decompress |
---|---|---|---|
GZIP | .gz | 510 lines of C | ❌ not yet supported |
LZ4 | .lz4 | 170 lines of C | 190 lines of C |
ZSTD | .zst | ❌ not yet supported | 760 lines of C |
LZMA | .lzma | 780 lines of C | 480 lines of C |
LPAQ8 | .lpaq8 | 860 lines of C | 860 lines of C |
Explanation:
format | year | Explanation |
---|---|---|
GZIP | 1989 | GZIP is an old, well-known lossless data compression algorithm with excellent compatibility. The core compression algorithm of GZIP is Deflate. Compressed GZIP files use the file name suffix ".gz". |
LZ4 | 2014 | LZ4 is a new, lightweight lossless data compression algorithm with very high decompression speed. Compressed LZ4 files use the file name suffix ".lz4". |
ZSTD | 2016 | ZSTD (Zstandard) is a new lossless data compression algorithm with a high compression ratio and high decompression speed. Compressed ZSTD files use the file name suffix ".zst". |
LZMA | 2000 | LZMA is a lossless data compression algorithm with a higher compression ratio than LZ4, GZIP, BZIP, and ZSTD. Several archive container formats support LZMA: (1) ".lzma" is a very simple container for LZMA; it is a legacy format that is gradually being replaced by ".xz". (2) ".7z" and ".xz", whose default compression method is LZMA. |
LPAQ8 | 2008 | LPAQ8 is a slow lossless data compression algorithm with a high compression ratio, written by Alexander Rhatushnyak and Matt Mahoney. Its basic principle is context mixing rather than LZ77. You can download the official implementation of LPAQ8 from https://mattmahoney.net/dc/lpaq8.zip . I've put lpaq8.exe (the official executable) in this repo for comparison. |
ZIP | 1989 | ZIP is not actually a data compression algorithm, but a container format that supports packaging files and compressing them with many compression algorithms. This code supports compressing a file into a ZIP container using either the Deflate algorithm or the LZMA algorithm. |
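Most of the formats in the table above can be recognized from the first bytes of a file. The following is a minimal sketch, not part of TinyZZZ: `format_t` and `guess_format` are hypothetical names chosen for illustration. It checks the GZIP, LZ4 frame, ZSTD frame and ZIP local-file-header magic numbers. The legacy ".lzma" format has no magic number, so it cannot be detected this way, and LPAQ8 is not covered here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of TinyZZZ): guess the format of a
   compressed file from its leading magic bytes. */
typedef enum { FMT_UNKNOWN, FMT_GZIP, FMT_LZ4, FMT_ZSTD, FMT_ZIP } format_t;

format_t guess_format(const uint8_t *buf, size_t len)
{
    /* Magic numbers as they appear on disk (little-endian byte order). */
    static const uint8_t lz4_magic[4]  = {0x04, 0x22, 0x4D, 0x18}; /* LZ4 frame  */
    static const uint8_t zstd_magic[4] = {0x28, 0xB5, 0x2F, 0xFD}; /* ZSTD frame */
    static const uint8_t zip_magic[4]  = {'P', 'K', 0x03, 0x04};   /* ZIP local file header */

    if (len >= 2 && buf[0] == 0x1F && buf[1] == 0x8B) return FMT_GZIP;
    if (len >= 4 && memcmp(buf, lz4_magic, 4) == 0)   return FMT_LZ4;
    if (len >= 4 && memcmp(buf, zstd_magic, 4) == 0)  return FMT_ZSTD;
    if (len >= 4 && memcmp(buf, zip_magic, 4) == 0)   return FMT_ZIP;
    return FMT_UNKNOWN; /* includes ".lzma", which has no magic number */
}
```

To use it, read the first four bytes of a file into a buffer and pass them to `guess_format`. A result of `FMT_UNKNOWN` only means that none of the four signatures matched.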
On Linux, run the following command to compile. The output binary is tinyZZZ.
Note: The code complies with the C99 standard.
gcc src/*.c -O2 -std=c99 -Wall -o tinyZZZ
If you have MinGW installed on Windows, run the following command to compile. The output executable is tinyZZZ.exe.
gcc src\*.c -O2 -std=c99 -Wall -o tinyZZZ.exe
If you have added the MSVC compiler (cl.exe) to your environment, run the following command to compile. The output executable is tinyZZZ.exe.
cl src\*.c /Ox /FetinyZZZ.exe
Run TinyZZZ to show usage:
└─$ ./tinyZZZ
|-------------------------------------------------------------------------------------------|
| Usage : |
| - decompress a GZIP file : *** not yet supported! *** |
| - compress a file to GZIP file : tinyZZZ -c --gzip <input_file> <output_file(.gz)> |
| - decompress a LZ4 file : tinyZZZ -d --lz4 <input_file(.lz4)> <output_file> |
| - compress a file to LZ4 file : tinyZZZ -c --lz4 <input_file> <output_file(.lz4)> |
| - decompress a ZSTD file : tinyZZZ -d --zstd <input_file(.zst)> <output_file> |
| - compress a file to ZSTD file : *** not yet supported! *** |
| - decompress a LZMA file : tinyZZZ -d --lzma <input_file(.lzma)> <output_file> |
| - compress a file to LZMA file : tinyZZZ -c --lzma <input_file> <output_file(.lzma)> |
| - decompress a LPAQ8 file : tinyZZZ -d --lpaq8 <input_file(.lpaq8)> <output_file> |
| - compress a file to LPAQ8 file: tinyZZZ -c --lpaq8 <input_file> <output_file(.lpaq8)> |
|-------------------------------------------------------------------------------------------|
| Usage (compress to ZIP container) : |
| - use Deflate method : tinyZZZ -c --gzip --zip <input_file> <output_file(.zip)> |
| - use LZMA method : tinyZZZ -c --lzma --zip <input_file> <output_file(.zip)> |
|-------------------------------------------------------------------------------------------|
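The usage text above implies a small set of valid flag combinations. The sketch below shows one way those rules could be checked in C. It is an illustration only, not the parser TinyZZZ actually uses: `options_t`, `alg_t` and `parse_args` are hypothetical names, and the rules are taken solely from the usage text (no GZIP decompression, no ZSTD compression, `--zip` only together with `-c` and `--gzip` or `--lzma`).

```c
#include <stddef.h>
#include <string.h>

typedef enum { ALG_NONE, ALG_GZIP, ALG_LZ4, ALG_ZSTD, ALG_LZMA, ALG_LPAQ8 } alg_t;

typedef struct {
    int         compress; /* 1 for -c, 0 for -d, -1 if neither was given */
    int         zip;      /* 1 if --zip was given */
    alg_t       alg;      /* algorithm selected by --gzip, --lz4, ... */
    const char *in;       /* first non-flag argument  */
    const char *out;      /* second non-flag argument */
} options_t;

/* Hypothetical parser: returns 0 if the arguments match one of the
   combinations listed in the usage text, -1 otherwise. */
int parse_args(int argc, char **argv, options_t *o)
{
    int i;
    o->compress = -1; o->zip = 0; o->alg = ALG_NONE; o->in = NULL; o->out = NULL;

    for (i = 1; i < argc; i++) {
        const char *a = argv[i];
        if      (strcmp(a, "-c") == 0)      o->compress = 1;
        else if (strcmp(a, "-d") == 0)      o->compress = 0;
        else if (strcmp(a, "--gzip") == 0)  o->alg = ALG_GZIP;
        else if (strcmp(a, "--lz4") == 0)   o->alg = ALG_LZ4;
        else if (strcmp(a, "--zstd") == 0)  o->alg = ALG_ZSTD;
        else if (strcmp(a, "--lzma") == 0)  o->alg = ALG_LZMA;
        else if (strcmp(a, "--lpaq8") == 0) o->alg = ALG_LPAQ8;
        else if (strcmp(a, "--zip") == 0)   o->zip = 1;
        else if (o->in == NULL)             o->in = a;
        else if (o->out == NULL)            o->out = a;
        else return -1;                     /* too many file names */
    }

    if (o->compress < 0 || o->alg == ALG_NONE || !o->in || !o->out) return -1;
    if (!o->compress && o->alg == ALG_GZIP) return -1; /* GZIP decompress: not yet supported */
    if (o->compress && o->alg == ALG_ZSTD)  return -1; /* ZSTD compress: not yet supported   */
    if (o->zip && !(o->compress && (o->alg == ALG_GZIP || o->alg == ALG_LZMA)))
        return -1;                                     /* ZIP: Deflate or LZMA, compress only */
    return 0;
}
```

Keeping the support matrix in one function means a newly supported direction (for example GZIP decompression) would only need one rule removed.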
Example 1: to decompress example.txt.zst to example.txt, use the following command.
./tinyZZZ -d --zstd example.txt.zst example.txt
Example 2: to compress example.txt to example.txt.gz, use the following command. The resulting ".gz" file can be extracted by many other programs, such as 7ZIP and WinRAR.
./tinyZZZ -c --gzip example.txt example.txt.gz
Example 3: to compress example.txt to example.txt.lzma, use the following command.
./tinyZZZ -c --lzma example.txt example.txt.lzma
Example 4: to decompress example.txt.lzma to example.txt, use the following command.
./tinyZZZ -d --lzma example.txt.lzma example.txt
Example 5: to compress example.txt to example.txt.lz4, use the following command.
./tinyZZZ -c --lz4 example.txt example.txt.lz4
Example 6: to decompress example.txt.lz4 to example.txt, use the following command.
./tinyZZZ -d --lz4 example.txt.lz4 example.txt
Example 7: to compress example.txt to example.zip using the Deflate method, use the following command. The resulting ".zip" file can be extracted by many other programs, such as 7ZIP and WinRAR.
./tinyZZZ -c --gzip --zip example.txt example.zip
Example 8: to compress example.txt to example.zip using the LZMA method, use the following command. The resulting ".zip" file can be extracted by many other programs, such as 7ZIP and WinRAR.
./tinyZZZ -c --lzma --zip example.txt example.zip
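Examples 7 and 8 produce ZIP files that differ in the compression method recorded in the container. As a rough sketch (not part of TinyZZZ; `zip_first_entry_method` is a hypothetical helper name), the method of the first entry can be read from the ZIP local file header: the 4-byte signature "PK\x03\x04" is followed by two 2-byte fields, and the little-endian compression method sits at byte offset 8. In the ZIP specification, method 8 is Deflate and method 14 is LZMA.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of TinyZZZ): return the compression
   method id stored in the first local file header of a ZIP file,
   or -1 if the buffer does not start with a local file header. */
int zip_first_entry_method(const uint8_t *buf, size_t len)
{
    if (len < 10) return -1;                       /* need bytes 0..9 */
    if (buf[0] != 'P' || buf[1] != 'K' || buf[2] != 0x03 || buf[3] != 0x04)
        return -1;                                 /* wrong signature */
    /* offset 4: version needed, offset 6: flags, offset 8: method (little-endian) */
    return (int)buf[8] | ((int)buf[9] << 8);
}
```

Reading the first 10 bytes of example.zip and passing them to this function would be expected to yield 8 after Example 7 and 14 after Example 8, assuming the file begins with a local file header as ordinary ZIP files do.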
On Windows, you can use the official 7ZIP/LZMA software to decompress the generated ".lzma" file. To get it, download the "LZMA SDK" and extract it. The "bin" directory contains "lzma.exe". To decompress a ".lzma" file, run a command of the following form:
.\lzma.exe d <input_lzma_file> <output_file>
On Linux, you can decompress a ".lzma" file using the official "p7zip" software. First install it (on Debian/Ubuntu, the 7z command may instead be provided by the p7zip-full package):
apt-get install p7zip
Then use the following command to decompress the ".lzma" file. It may report an error: "ERROR: There are some data after the end of the payload data". You can ignore it: there may be an extra "0x00" byte at the end of the ".lzma" file, which does not affect decompression of the actual data.
7z x [input_lzma_file]
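When checking a ".lzma" file by hand, it can help to look at its header. The legacy ".lzma" container starts with a 13-byte header: one properties byte encoding lc, lp and pb as (pb*5 + lp)*9 + lc, a 4-byte little-endian dictionary size, and an 8-byte little-endian uncompressed size (all bits set means the size is unknown and an end marker is used). The sketch below decodes that header; `lzma_header_t` and `parse_lzma_header` are hypothetical names for illustration and are not taken from the TinyZZZ source.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    unsigned lc, lp, pb;         /* literal context, literal position, position bits */
    uint32_t dict_size;          /* dictionary size in bytes */
    uint64_t uncompressed_size;  /* valid only when size_known is nonzero */
    int      size_known;         /* 0 if the size field is all 0xFF bytes */
} lzma_header_t;

/* Hypothetical helper: decode the 13-byte header of a legacy ".lzma" file.
   Returns 0 on success, -1 if the buffer is too short or the properties
   byte is out of range. */
int parse_lzma_header(const uint8_t *buf, size_t len, lzma_header_t *h)
{
    unsigned props;
    int i;

    if (len < 13 || buf[0] >= 9 * 5 * 5) return -1;

    props = buf[0];               /* props = (pb * 5 + lp) * 9 + lc */
    h->lc = props % 9;  props /= 9;
    h->lp = props % 5;
    h->pb = props / 5;

    h->dict_size = 0;             /* bytes 1..4, little-endian */
    for (i = 3; i >= 0; i--)
        h->dict_size = (h->dict_size << 8) | buf[1 + i];

    h->uncompressed_size = 0;     /* bytes 5..12, little-endian */
    for (i = 7; i >= 0; i--)
        h->uncompressed_size = (h->uncompressed_size << 8) | buf[5 + i];

    h->size_known = (h->uncompressed_size != UINT64_MAX);
    return 0;
}
```

For instance, a properties byte of 0x5D decodes to lc=3, lp=0, pb=2, the commonly used default parameters.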
- GZIP specification: https://www.rfc-editor.org/rfc/rfc1952
- Deflate algorithm specification: https://www.rfc-editor.org/rfc/rfc1951
- LZ4 official code: https://github.com/lz4/lz4
- LZ4 specification: https://github.com/lz4/lz4/blob/dev/doc/lz4_Frame_format.md , https://github.com/lz4/lz4/blob/dev/doc/lz4_Block_format.md
- ZSTD specification: https://www.rfc-editor.org/rfc/rfc8878
- ZSTD official code: https://github.com/facebook/zstd
- ZSTD official lightweight decompressor: https://github.com/facebook/zstd/tree/dev/doc/educational_decoder
- LZMA official code and the 7ZIP software: https://www.7-zip.org/sdk.html
- Another LZMA official code and the XZ software: https://tukaani.org/xz/
- An introduction to the LZMA algorithm: https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Markov_chain_algorithm
- An FPGA-based hardware GZIP data compressor: https://github.com/WangXuan95/FPGA-Gzip-compressor
- An FPGA-based hardware LZMA data compressor: https://github.com/WangXuan95/FPGA-LZMA-compressor
- LPAQ8 official code: https://mattmahoney.net/dc/#lpaq
- Principle of context mixing and PAQ: https://mattmahoney.net/dc/dce.html#Section_43