blosc use case #73
Closed: further research answered most questions.
Yes, the default compressor in Blosc (BloscLZ) is geared towards speed, not compression ratio.
Francesc Alted
http://heartofcomp.altervista.org/MOC/MOCACE.htm Would it be worth submitting Blosc and getting it into the fray? Looking over the benchmark section, I notice that BloscLZ is the only decompressor able to outperform memcpy, at least on your machine. The Blosc zlib benchmarks (http://www.blosc.org/benchmarks-zlib.html) use a different compression-ratio scale than the other compressors. They also start at 0 (vs. 1), which interferes with the graphs' readability. A chart across compressors would ease comparison. I sure would like to see BloscLZ take its due place within the compressor benchmark community.
Regarding the zlib benchmarks, the first measurement is also at one, but because zlib achieves such high compression ratios, especially with that dataset, it looks like the measurement is at zero. Ideally we should start all graphs at one, since that means "no compression". Regarding the speed of BloscLZ, I believe what you are seeing is a distortion due to measurement. The only benchmarks we have listed for LZ4 right now are from a BlueGene. That is an HPC architecture, and let's just say things behave differently there than on commodity hardware. I believe that both LZ4 and BloscLZ (and maybe Snappy too) can outperform memcpy on commodity hardware.
FYI: the reason we get these "off-the-charts" ratios for zlib is the shuffle filter in Blosc, which can pre-condition certain datasets favorably for zlib, effectively boosting the compression ratio. See also http://slides.zetatech.org/haenel-ep14-compress-me-stupid.pdf, page 23 onwards.
https://www.youtube.com/watch?v=IzqlWUTndTo at 9:39 and at 11:19 shows compressor charts vs. memcpy for each distribution type. I see Intel Core i5 results for each supported compressor on http://blosc.org/synthetic-benchmarks.html, so perhaps the benchmark distortion has some other source?
I am checking if I could use blosc to compress 1000 char long strings or so.
As a test I am using the string "Methionylthreonylthreonylglutaminyla..." which is highly repetitive.
http://blog.jmay.us/2009/11/longest-english-word.html
I modified simple.c and the best I can get is 1.5x compression with shuffle and 2.8x without shuffle at clevel 9
Without shuffle:

| chars | ratio |
| ----- | ----- |
| 1000  | 1.4x  |
| 2000  | 1.8x  |
| 3000  | 2x    |
| 4000  | 2.1x  |
| 5000  | 2.3x  |

ZIP compresses the full string to 5.5x.
My settings follow:

```c
#define LINESIZE 98310
#define SIZE 100000
#define SHAPE {10,10,10}
#define CHUNKSHAPE {1,10,10}

static unsigned char data[LINESIZE];
static unsigned char data_out[SIZE];
static unsigned char data_dest[LINESIZE];
```
Questions:

- Am I within expected compression ratios without switching to Zlib?
- Is the block/string I intend to compress too small for the Blosc use case?
- Is there any prospect for Blosc to support indexed, random access to compressed blocks?
- Any suggestions for performant "small" string compression?