A blocking, shuffling and lossless compression library that can be faster than `memcpy()`.

Blosc: A blocking, shuffling and lossless compression library

Author: Francesc Alted
Contact: francesc@blosc.org
URL: http://www.blosc.org

What is it?

Blosc is a high-performance compressor optimized for binary data. It is designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch via a memcpy() call. Blosc is the first compressor (that I'm aware of) meant not only to reduce the size of large datasets on disk or in memory, but also to accelerate memory-bound computations.

It uses the blocking technique to reduce activity on the memory bus as much as possible. In short, this technique divides datasets into blocks small enough to fit in the caches of modern processors and performs compression / decompression there. It also leverages SIMD instructions (SSE2, AVX2), if available, and the multi-threading capabilities of CPUs to accelerate compression / decompression as much as possible.
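
The blocking idea can be illustrated with a minimal sketch in plain C. This is not C-Blosc's internal code: the names `block_fn`, `for_each_block` and `sum_block` are hypothetical, and the callback merely stands in for whatever per-block work (for example a codec call) would run while the block is still in cache.

```c
#include <stddef.h>

/* Hypothetical callback type: handles one block of at most block_size bytes. */
typedef void (*block_fn)(const unsigned char *block, size_t nbytes, void *ctx);

/* Walk `src` in consecutive blocks of `block_size` bytes (the last block may
 * be shorter) and hand each one to `fn`. Returns the number of blocks
 * visited, or 0 if block_size is 0. */
size_t for_each_block(const unsigned char *src, size_t nbytes,
                      size_t block_size, block_fn fn, void *ctx)
{
    size_t nblocks = 0;

    if (block_size == 0)
        return 0;
    while (nbytes > 0) {
        size_t len = nbytes < block_size ? nbytes : block_size;
        fn(src, len, ctx);      /* all work on this block happens here */
        src += len;
        nbytes -= len;
        nblocks++;
    }
    return nblocks;
}

/* Example callback: accumulates a byte sum, standing in for a codec call. */
void sum_block(const unsigned char *block, size_t nbytes, void *ctx)
{
    unsigned long *total = (unsigned long *)ctx;
    size_t i;

    for (i = 0; i < nbytes; i++)
        *total += block[i];
}
```

Choosing `block_size` so that a block and its output both fit in a cache level is the point of the technique; the value to use depends on the processor and is left to the caller in this sketch.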

See some benchmarks about Blosc performance.

Blosc is distributed using the BSD license, see LICENSES/BLOSC.txt for details.

Meta-compression and other differences over existing compressors

C-Blosc is not like other compressors: it is better described as a meta-compressor, because it can use different compressors and filters (programs that generally improve compression ratio). It can still be called a compressor, though, because it already ships with several compressors and filters and so can work like a regular codec.

Currently C-Blosc comes with support for BloscLZ, a compressor heavily based on FastLZ (http://fastlz.org/); LZ4 and LZ4HC (https://github.com/Cyan4973/lz4); Snappy (https://github.com/google/snappy); Zlib (http://www.zlib.net/); and Zstd (http://www.zstd.net).
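
To make the meta-compressor idea concrete, here is a small sketch of a codec table with lookup by name. Everything in it is an assumption made for illustration: the `codec` struct, `find_codec`, and the two toy codecs ("store" and "rle") are hypothetical and are not the codecs or the internal layout that C-Blosc uses.

```c
#include <stddef.h>
#include <string.h>

/* A codec turns nbytes of src into at most destsize bytes of dest. It
 * returns the number of bytes written; 0 means empty input or dest too
 * small. */
typedef size_t (*codec_fn)(const unsigned char *src, size_t nbytes,
                           unsigned char *dest, size_t destsize);

/* Toy codec 1: plain copy. */
size_t store_compress(const unsigned char *src, size_t nbytes,
                      unsigned char *dest, size_t destsize)
{
    if (nbytes > destsize)
        return 0;
    memcpy(dest, src, nbytes);
    return nbytes;
}

/* Toy codec 2: run-length encoding as (count, value) pairs, count 1..255. */
size_t rle_compress(const unsigned char *src, size_t nbytes,
                    unsigned char *dest, size_t destsize)
{
    size_t in = 0, out = 0;

    while (in < nbytes) {
        unsigned char value = src[in];
        size_t run = 1;

        while (in + run < nbytes && run < 255 && src[in + run] == value)
            run++;
        if (out + 2 > destsize)
            return 0;
        dest[out++] = (unsigned char)run;
        dest[out++] = value;
        in += run;
    }
    return out;
}

/* Hypothetical registry entry: a name plus the function implementing it. */
struct codec {
    const char *name;
    codec_fn compress;
};

static const struct codec codecs[] = {
    { "store", store_compress },
    { "rle",   rle_compress },
};

/* Look a codec up by name; returns NULL when the name is unknown. */
const struct codec *find_codec(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof codecs / sizeof codecs[0]; i++)
        if (strcmp(codecs[i].name, name) == 0)
            return &codecs[i];
    return NULL;
}
```

Because callers only see the common function signature, a new codec can be added by appending one table entry, which is the property that lets a meta-compressor wrap codecs that were written independently.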

C-Blosc also comes with highly optimized shuffle and bitshuffle filters, which can use SSE2 or AVX2 instructions if available (for info on how and why shuffling works see here). Additional compressors or filters may be added in the future.

Blosc coordinates the different compressors and filters so that they automatically benefit from the blocking technique and from multi-threaded execution (if several cores are available). As a result, every codec and filter can run at very high speed, even if it was not originally designed for blocking or multi-threading.

Finally, C-Blosc is especially well suited to binary data because its integrated shuffle and bitshuffle filters can use the type size meta-information to improve the compression ratio.
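
As a rough illustration of how the type size is used, the following scalar sketch implements a byte shuffle under the usual definition: byte j of every element is gathered into one contiguous run. It is not C-Blosc's SSE2/AVX2 implementation, the function names are hypothetical, and copying trailing bytes unchanged is a choice made for this sketch.

```c
#include <stddef.h>

/* Byte shuffle: for elements of `typesize` bytes, gather byte j of every
 * element into one contiguous run in dest. Trailing bytes that do not form
 * a whole element are copied unchanged. src and dest must not overlap. */
void shuffle_bytes(size_t typesize, size_t nbytes,
                   const unsigned char *src, unsigned char *dest)
{
    size_t nelem = typesize ? nbytes / typesize : 0;
    size_t i, j;

    for (j = 0; j < typesize; j++)
        for (i = 0; i < nelem; i++)
            dest[j * nelem + i] = src[i * typesize + j];
    for (i = nelem * typesize; i < nbytes; i++)
        dest[i] = src[i];
}

/* Inverse of shuffle_bytes: scatter each run back into its elements. */
void unshuffle_bytes(size_t typesize, size_t nbytes,
                     const unsigned char *src, unsigned char *dest)
{
    size_t nelem = typesize ? nbytes / typesize : 0;
    size_t i, j;

    for (j = 0; j < typesize; j++)
        for (i = 0; i < nelem; i++)
            dest[i * typesize + j] = src[j * nelem + i];
    for (i = nelem * typesize; i < nbytes; i++)
        dest[i] = src[i];
}
```

For four 4-byte elements whose bytes are 1 0 0 0, 2 0 0 0, 3 0 0 0 and 4 0 0 0, the shuffled buffer is 1 2 3 4 followed by twelve zero bytes. Long runs like that are much easier for a codec to compress than the original interleaved layout.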

When taken together, all these features set Blosc apart from other compression libraries.

Compiling the Blosc library

Blosc can be built, tested and installed using CMake. The following procedure describes the "out of source" build.

  $ cd c-blosc
  $ mkdir build
  $ cd build

Now run CMake configuration and optionally specify the installation directory (e.g. '/usr' or '/usr/local'):

  $ cmake -DCMAKE_INSTALL_PREFIX=your_install_prefix_directory ..

CMake lets you configure Blosc in many different ways, such as preferring internal or external sources for compressors, or enabling/disabling them. Configuration can also be performed using the UI tools provided by CMake (ccmake or cmake-gui):

  $ ccmake ..      # run a curses-based interface
  $ cmake-gui ..   # run a graphical interface

Build, test and install Blosc:

  $ cmake --build .
  $ ctest
  $ cmake --build . --target install

The static and dynamic versions of the Blosc library, together with the header files, will be installed into the specified CMAKE_INSTALL_PREFIX.

Codec support with CMake

C-Blosc comes with full sources for LZ4, LZ4HC, Snappy, Zlib and Zstd. In general, you should not worry about these libraries being missing from your system (or CMake not finding them), because by default the included sources are compiled automatically and included in the C-Blosc library. This means you can count on complete support for all the codecs in every Blosc deployment (unless you explicitly exclude some of them).

But in case you want to force Blosc to use external codec libraries instead of the included sources, you can do that:

  $ cmake -DPREFER_EXTERNAL_ZSTD=ON ..

You can also disable support for some compression libraries:

  $ cmake -DDEACTIVATE_SNAPPY=ON ..  # in case you don't have a C++ compiler

Examples

In the examples/ directory you can find hints on how to use Blosc inside your app.

Supported platforms

Blosc is meant to support all platforms where a C89-compliant C compiler can be found. The most heavily tested ones are Intel (Linux, Mac OSX and Windows) and ARM (Linux), but more exotic ones, such as the IBM Blue Gene Q embedded "A2" processor, are reported to work too.

Mac OSX troubleshooting

If you run into compilation troubles when using Mac OSX, please make sure that you have installed the command line developer tools. You can always install them with:

  $ xcode-select --install

Wrapper for Python

Blosc has an official wrapper for Python. See:

https://github.com/Blosc/python-blosc

Command line interface and serialization format for Blosc

Blosc can be used from the command line by using Bloscpack. See:

https://github.com/Blosc/bloscpack

Filter for HDF5

For those who want to use Blosc as a filter in the HDF5 library, there is a sample implementation in the hdf5-blosc project at:

https://github.com/Blosc/hdf5-blosc

Mailing list

There is an official mailing list for Blosc at:

blosc@googlegroups.com http://groups.google.es/group/blosc

Acknowledgments

See THANKS.rst.


Enjoy data!