Add info about the codec used for compression #8

Closed
FrancescAlted opened this issue Feb 16, 2014 · 2 comments

Comments

@FrancescAlted
Member

Currently, the info subcommand does not offer info on the codec used for compressing a file:

$ blpk i p.dat.blp
blpk: bloscpack header: 
blpk:     format_version=3,
blpk:     offsets=True,
blpk:     metadata=False,
blpk:     checksum='adler32',
blpk:     typesize=8,
blpk:     chunk_size=1.0M (1048576B),
blpk:     last_chunk=962.0K (985088B),
blpk:     nchunks=763,
blpk:     max_app_chunks=7630
blpk: 'offsets':
blpk: [67176,257788,400131,536937,653836,...]

I think the recently added get_clib(cbuffer) function in python-blosc 1.2.1 should help here.
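For illustration, here is a minimal sketch of what inspecting a single chunk could look like without any dependency. It assumes the Blosc 1 chunk header layout, in which the top three bits of the third (flags) byte carry the compressed-format code. The helper name chunk_codec_format and the labels in FORMAT_NAMES are hypothetical and are not part of python-blosc or Bloscpack; in practice get_clib(cbuffer) would be the call to use.

```python
import struct

# Assumed mapping of Blosc 1 compressed-format codes to codec families.
# The labels are illustrative, not the strings python-blosc returns.
FORMAT_NAMES = {0: "blosclz", 1: "lz4", 2: "snappy", 3: "zlib"}


def chunk_codec_format(chunk: bytes) -> int:
    """Return the compressed-format code from a Blosc 1 chunk header.

    Hypothetical helper: assumes a 16-byte header whose first four bytes
    are version, versionlz, flags and typesize.
    """
    if len(chunk) < 16:
        raise ValueError("a Blosc chunk header is 16 bytes long")
    _version, _versionlz, flags, _typesize = struct.unpack_from("<4B", chunk)
    # The codec format lives in bits 5-7 of the flags byte (assumption).
    return (flags & 0xE0) >> 5


def chunk_codec_name(chunk: bytes) -> str:
    """Map the format code to a readable label, or 'unknown'."""
    return FORMAT_NAMES.get(chunk_codec_format(chunk), "unknown")
```

Only the first 16 bytes of a chunk are needed, so this stays cheap for one chunk but still requires a seek per chunk if every chunk is inspected.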

@esc
Member

esc commented Feb 16, 2014

Yes and no. One could print the clib of, for example, the first chunk. However, since append makes it perfectly fine to use a different codec for different chunks, that might not be entirely reliable. One could also print the codec used in every chunk, but that would defeat the purpose of the info subcommand, which by design is meant to return quickly and read only the header of the file. The best solution would be to add a new field to the Bloscpack header that contains the clib int, or -1 if it is non-uniform. The problem is that the header definition would have to change and the code to handle this would have to be written. ;)
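To make the header-field idea concrete, here is a rough sketch of how such a value could be derived and maintained. Everything in it is an assumption for illustration: the names NON_UNIFORM, header_codec_field and update_on_append are hypothetical, and no such field exists in the format_version=3 header shown above.

```python
from typing import Iterable

NON_UNIFORM = -1  # sentinel from the discussion: chunks use mixed codecs


def header_codec_field(chunk_codecs: Iterable[int]) -> int:
    """Collapse per-chunk codec ids into one header value.

    Returns the shared codec id when all chunks agree, else NON_UNIFORM.
    An empty sequence also yields NON_UNIFORM (an arbitrary choice here).
    """
    distinct = set(chunk_codecs)
    if len(distinct) == 1:
        return distinct.pop()
    return NON_UNIFORM


def update_on_append(current: int, appended_codec: int) -> int:
    """Update the header value after an append without rereading chunks."""
    return current if current == appended_codec else NON_UNIFORM
```

With a field like this, the info subcommand would still read only the header, and append would need just one comparison to keep the value correct.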

@FrancescAlted
Member Author

Ah yes, makes perfect sense. In fact, in the future it would be nice to add a special compression mode in Blosc (either c-blosc or python-blosc) so that a codec could be chosen automatically in order to optimize compression ratio, speed, or a balance between them.

So I agree that it makes no sense to support this at all, except for the case where the clib is enforced to be uniform.
