Release BSLab (including lzp-rollhash, mtf_shelwien and mtf_cuda) and Radix-sort benchmarks · Bulat-Ziganshin/Compression-Research

BSL (the block-sorting lab) benchmark:

Many experiments with CUDA MTF implementations. Depending on the actual data, the best one is mtf_scalar, mtf_2buffers or mtf_4by8 (see results.txt).
CUDA MTF raw speed reaches 700 MB/s on ENWIK8 data, which corresponds to an effective speed of 1.5-2 GB/s, since the preceding RLE stage removes 60-70% of the BWT output.
CPU MTF algorithm by Eugene Shelwien: 150-200 MB/s raw speed, i.e. 500 MB/s effective speed per core.
New rolling-hash-based LZP preprocessing algorithm, up to 500 MB/s per core.
An almost complete LZP+BWT/ST+RLE+MTF stack (only entropy coding is not yet implemented), which makes it possible to measure the speed and compression ratio of various stage combinations.
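
For readers unfamiliar with the MTF (move-to-front) stage mentioned above, here is a minimal scalar sketch in C++. It is the textbook byte-wise transform, not the code of mtf_scalar, mtf_shelwien or any other implementation in this release; the function names `mtf_encode` and `mtf_decode` are hypothetical and chosen only for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Textbook move-to-front encode (illustrative, not the release's code):
// each byte is replaced by its current rank in a recency list, and the
// byte is then moved to the front of that list.
std::vector<std::uint8_t> mtf_encode(const std::vector<std::uint8_t>& in) {
    std::array<std::uint8_t, 256> order;
    std::iota(order.begin(), order.end(), 0);  // initial list: 0, 1, ..., 255
    std::vector<std::uint8_t> out;
    out.reserve(in.size());
    for (std::uint8_t c : in) {
        std::size_t rank = 0;
        while (order[rank] != c) ++rank;       // linear search for the byte
        out.push_back(static_cast<std::uint8_t>(rank));
        // Shift the entries in front of c one slot back, then put c first.
        for (std::size_t i = rank; i > 0; --i) order[i] = order[i - 1];
        order[0] = c;
    }
    return out;
}

// Inverse transform: each rank selects a byte from the same recency list.
std::vector<std::uint8_t> mtf_decode(const std::vector<std::uint8_t>& in) {
    std::array<std::uint8_t, 256> order;
    std::iota(order.begin(), order.end(), 0);
    std::vector<std::uint8_t> out;
    out.reserve(in.size());
    for (std::uint8_t rank : in) {
        const std::uint8_t c = order[rank];
        out.push_back(c);
        for (std::size_t i = rank; i > 0; --i) order[i] = order[i - 1];
        order[0] = c;
    }
    return out;
}
```

Runs of repeated bytes, which are common in BWT output, turn into runs of zeros under this transform. The per-byte search and shift form a serial dependency chain, which is one reason fast CPU and GPU variants differ substantially from this simple loop.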

Radix-sort benchmark: measures speed of the CUB radix sort with various parameters.
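
To illustrate the kind of parameter such a benchmark can vary, here is a small CPU-side LSD radix sort in C++ with a configurable number of bits per pass. This is a plain sequential sketch, not CUB and not the benchmark code from this release; the function name `radix_sort_u32` and the `bits_per_pass` parameter are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative LSD radix sort for 32-bit keys (sequential CPU code).
// bits_per_pass sets the digit width: wider digits mean fewer passes
// but larger counting tables.
void radix_sort_u32(std::vector<std::uint32_t>& keys, unsigned bits_per_pass = 8) {
    assert(bits_per_pass >= 1 && bits_per_pass <= 16);
    const std::size_t buckets = std::size_t{1} << bits_per_pass;
    const std::uint32_t mask = static_cast<std::uint32_t>(buckets - 1);
    std::vector<std::uint32_t> tmp(keys.size());
    std::vector<std::size_t> count(buckets);

    for (unsigned shift = 0; shift < 32; shift += bits_per_pass) {
        // Histogram of the current digit.
        std::fill(count.begin(), count.end(), 0);
        for (std::uint32_t k : keys) ++count[(k >> shift) & mask];

        // Exclusive prefix sum turns counts into starting offsets.
        std::size_t sum = 0;
        for (std::size_t& c : count) {
            const std::size_t n = c;
            c = sum;
            sum += n;
        }

        // Stable scatter into the temporary buffer, then swap buffers.
        for (std::uint32_t k : keys) tmp[count[(k >> shift) & mask]++] = k;
        keys.swap(tmp);
    }
}
```

Calling `radix_sort_u32(keys, 8)` sorts in four passes, while `radix_sort_u32(keys, 11)` needs only three at the cost of a 2048-entry table per pass.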

All GPU speeds were measured on a GF560Ti overclocked to 900 MHz. All CPU speeds were measured on a Haswell i7-4770.
