
C-Blosc2: A simple, compressed, fast and persistent data store library for C

Author: The Blosc Development Team
Contact: blosc@blosc.org
URL: http://www.blosc.org
Gitter: https://gitter.im/Blosc/c-blosc

What is it?

Blosc is a high performance compressor optimized for binary data (e.g. floating point numbers, integers and booleans). It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() call. Blosc's main goal is not just to reduce the size of large datasets on-disk or in-memory, but also to accelerate memory-bound computations.

C-Blosc2 is the new major version of C-Blosc, with full support for 64-bit containers, filter pipelining, new filters, new codecs and dictionaries for improved compression ratio. The new 64-bit data containers support both sparse (super-chunks) and sequential (frames) storage, either in-memory or on-disk. The frame is a very simple sequential format meant to be used either for persistence or for sending data to other processes or machines. Finally, frames can be annotated with user-provided metainfo (metalayers, usermeta). More info about the improved capabilities of C-Blosc2 can be found in this talk.

C-Blosc2 tries hard to be backward compatible with both the C-Blosc1 API and its in-memory format. Furthermore, if you just use the C-Blosc1 API, you are guaranteed to generate compressed data containers that can be read with C-Blosc2, while still getting the benefit of better performance, for example by leveraging the accelerated versions of codecs present in Intel's IPP (LZ4 is supported now and others will follow).

C-Blosc2 is currently in beta stage, so it is not ready to be used in production yet. Having said this, the beta stage means that the API has been declared frozen, so there is a guarantee that your programs will continue to work with future versions of the library. If you want to collaborate in this development, you are welcome. We need help in the different areas listed in the ROADMAP; also, be sure to read our DEVELOPING-GUIDE. Blosc is distributed using the BSD license.

Meta-compression and other advantages over existing compressors

C-Blosc2 is not like other compressors: it should rather be called a meta-compressor, because it can use different compressors and filters (programs that generally improve the compression ratio). At the same time, it can also be called a compressor, because it already comes with several compressors and filters, so it can work as one out of the box.

Currently C-Blosc2 comes with support for BloscLZ (a compressor heavily based on FastLZ), LZ4 and LZ4HC, Zstd, Lizard and Zlib (via miniz), as well as highly optimized shuffle and bitshuffle filters that can use SSE2, AVX2, NEON or ALTIVEC instructions, if available (for info on how shuffling works, see slide 17 of http://www.slideshare.net/PyData/blosc-py-data-2014).

Blosc is in charge of coordinating the different compressors and filters so that they can leverage the blocking technique as well as multi-threaded execution automatically. As a result, every codec and filter in the pipeline will run efficiently on modern CPUs, even if it was not initially designed for blocking or multi-threading.

Another important aspect of C-Blosc2 is that it splits large datasets into smaller containers called chunks, which are basically Blosc1 containers. For maximum performance, these chunks are meant to fit in the LLC (Last Level Cache) of CPUs. In practice, this means that to leverage C-Blosc2 containers effectively, the user should ask C-Blosc2 to decompress one chunk, consume it before it hits main memory, and then proceed with the next chunk (as in any streaming operation). We call this process Streamed Compressed Computing; it effectively prevents uncompressed data from traveling to RAM, saving precious time on modern architectures where RAM access is very expensive compared with CPU speeds.

Multidimensional containers

As mentioned above, C-Blosc2 adds a powerful mechanism for adding different metalayers on top of its containers. Caterva is a sibling library that adds such a metalayer specifying not only the dimensionality of a dataset, but also the dimensionality of the chunks inside the dataset. In addition, Caterva adds machinery for retrieving arbitrary multi-dimensional slices (aka hyper-slices) out of the multi-dimensional containers in the most efficient way. Hence, Caterva brings the convenience of multi-dimensional containers to your application very easily. For more info, check out the Caterva documentation.

Compiling the C-Blosc2 library with CMake

Blosc can be built, tested and installed using CMake. The following procedure describes a typical CMake build.

Create the build directory inside the sources and move into it:

$ cd c-blosc2-sources
$ mkdir build
$ cd build

Now run CMake configuration and optionally specify the installation directory (e.g. '/usr' or '/usr/local'):

$ cmake -DCMAKE_INSTALL_PREFIX=your_install_prefix_directory ..

CMake allows configuring Blosc in many different ways, like preferring internal or external sources for compressors or enabling/disabling them. Please note that configuration can also be performed using UI tools provided by CMake (ccmake or cmake-gui):

$ ccmake ..      # run a curses-based interface
$ cmake-gui ..   # run a graphical interface

Build, test and install Blosc:

$ cmake --build .
$ ctest
$ cmake --build . --target install

The static and dynamic versions of the Blosc library, together with header files, will be installed into the specified CMAKE_INSTALL_PREFIX.

Once you have compiled your Blosc library, you can easily link your apps with it as shown in the examples/ directory.

Handling support for codecs (LZ4, LZ4HC, Zstd, Lizard, Zlib)

C-Blosc2 comes with full sources for LZ4, LZ4HC, Zstd, Lizard and Zlib, so in general you should not worry about not having (or CMake not finding) these libraries on your system: by default, the included sources will be automatically compiled and included in the C-Blosc2 library. This means that you can be confident of having complete support for all the codecs in all Blosc deployments (unless you explicitly exclude support for some of them).

If you want to force Blosc to use external libraries instead of the included compression sources:

$ cmake -DPREFER_EXTERNAL_LZ4=ON ..

You can also disable support for some compression libraries:

$ cmake -DDEACTIVATE_ZSTD=ON ..

Supported platforms

C-Blosc2 is meant to support all platforms where a C99-compliant C compiler can be found. The most tested ones are Intel (Linux, Mac OSX and Windows) and ARM (Linux), but exotic ones such as the IBM Blue Gene Q embedded "A2" processor are reported to work too.

For Windows, you will need at least VS2015 on x86 and x64 targets (i.e. ARM is not supported on Windows).

For Mac OSX, make sure that you have installed the command line developer tools. You can always install them with:

$ xcode-select --install

Support for the LZ4 optimized version in Intel IPP

C-Blosc2 comes with support for a highly optimized version of the LZ4 codec present in Intel IPP; in fact, if the CMake machinery in C-Blosc2 discovers IPP installed on your system, it will use it automatically by default. Here is a way to easily install Intel IPP on Ubuntu machines:

$ wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
$ sudo apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
$ sudo sh -c 'echo deb https://apt.repos.intel.com/ipp all main > /etc/apt/sources.list.d/intel-ipp.list'
$ sudo apt-get update && sudo apt-get install intel-ipp-64bit-2019.X  # replace .X by the latest version

Check the Intel IPP website for instructions on how to install it on other platforms.

Mailing list

There is an official mailing list for Blosc at:

blosc@googlegroups.com http://groups.google.es/group/blosc

Acknowledgments

See THANKS.rst.


Enjoy data!