A columnar data container that can be compressed.


bcolz: columnar and compressed data containers



bcolz provides columnar, chunked data containers that can be compressed either in-memory or on-disk. Column storage allows for efficient table queries, as well as for cheap column addition and removal. It is based on NumPy, and uses it as the standard data container to communicate with bcolz objects, but it also comes with import/export facilities to/from HDF5/PyTables tables and pandas dataframes.

bcolz objects are compressed by default not only for reducing memory/disk storage, but also to improve I/O speed. The compression process is carried out internally by Blosc, a high-performance, multithreaded meta-compressor that is optimized for binary data (although it works with text data just fine too).
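To make the idea of a chunked, compressed container concrete, here is a minimal sketch using only the Python standard library. It is not how bcolz is implemented: `ChunkedArray` and its attributes are hypothetical names chosen for this illustration, and zlib stands in for Blosc.

```python
import zlib
from array import array


class ChunkedArray:
    """Toy append-only container made of fixed-size compressed chunks.

    Hypothetical illustration only; zlib stands in for Blosc here.
    """

    def __init__(self, typecode="d", chunklen=1024, clevel=5):
        self.typecode = typecode
        self.chunklen = chunklen      # items per compressed chunk
        self.clevel = clevel          # zlib compression level
        self.chunks = []              # compressed, complete chunks
        self.tail = array(typecode)   # uncompressed leftover items

    def append(self, values):
        self.tail.extend(values)
        # Compress every complete chunk as soon as it fills up.
        while len(self.tail) >= self.chunklen:
            full = self.tail[:self.chunklen]
            self.tail = self.tail[self.chunklen:]
            self.chunks.append(zlib.compress(full.tobytes(), self.clevel))

    def __len__(self):
        return len(self.chunks) * self.chunklen + len(self.tail)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        nchunk, offset = divmod(index, self.chunklen)
        if nchunk == len(self.chunks):
            return self.tail[offset]
        # Only the chunk that holds the item is decompressed.
        chunk = array(self.typecode)
        chunk.frombytes(zlib.decompress(self.chunks[nchunk]))
        return chunk[offset]

    @property
    def nbytes(self):
        """Size the data would take uncompressed."""
        return len(self) * self.tail.itemsize

    @property
    def cbytes(self):
        """Size actually held: compressed chunks plus the raw tail."""
        return sum(len(c) for c in self.chunks) + len(self.tail) * self.tail.itemsize


ca = ChunkedArray(typecode="q", chunklen=1000)
ca.append(range(10_500))
print(len(ca), ca[9_999], ca.nbytes, ca.cbytes)
```

Because each chunk is compressed independently, reading one item costs one chunk decompression rather than a pass over the whole array, and appending never rewrites chunks that are already stored.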

bcolz can also use numexpr internally (the default when numexpr is installed) or dask to accelerate many vector and query operations, although it can fall back to pure NumPy for them too. numexpr and dask can optimize memory usage and use multithreading for the computations, which makes these operations fast. Combined with the disk-based, compressed carray/ctable containers, this allows out-of-core computations to be performed efficiently and, most importantly, transparently.
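The out-of-core pattern mentioned above can be sketched with the standard library alone: data lives on disk as compressed chunks, and a query streams over them one chunk at a time. This is only an illustration of the idea, not bcolz's on-disk format or query engine; `write_chunks`, `iter_chunks`, `count_where` and the `.chunk` file naming are hypothetical, and zlib again stands in for Blosc.

```python
import os
import tempfile
import zlib
from array import array


def write_chunks(values, rootdir, chunklen=1000):
    """Store `values` as one compressed file per chunk under `rootdir`.

    For brevity the input is materialized in memory first; a real
    container would compress chunks as the data arrives.
    """
    buf = array("d", values)
    for n, start in enumerate(range(0, len(buf), chunklen)):
        path = os.path.join(rootdir, f"{n:08d}.chunk")
        with open(path, "wb") as f:
            f.write(zlib.compress(buf[start:start + chunklen].tobytes()))


def iter_chunks(rootdir):
    """Yield one decompressed chunk at a time, in storage order."""
    for name in sorted(os.listdir(rootdir)):
        with open(os.path.join(rootdir, name), "rb") as f:
            chunk = array("d")
            chunk.frombytes(zlib.decompress(f.read()))
        yield chunk


def count_where(rootdir, predicate):
    """Count matching items while holding a single chunk in memory."""
    return sum(1 for chunk in iter_chunks(rootdir) for x in chunk if predicate(x))


with tempfile.TemporaryDirectory() as rootdir:
    write_chunks((i * 0.5 for i in range(10_000)), rootdir)
    nchunks = len(os.listdir(rootdir))
    hits = count_where(rootdir, lambda x: x > 4000.0)

print(nchunks, hits)
```

The query side never needs more memory than one decompressed chunk, which is what lets this style of computation scale past the available RAM.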

Just to whet your appetite, here is an example with real data, where bcolz is already fulfilling the promise of accelerating memory I/O by using compression:


By using compression, you can deal with more data using the same amount of memory, which is very good in itself. But in case you are wondering about the price to pay in terms of performance, you should know that nowadays memory access is the most common bottleneck in many computational scenarios, and that CPUs spend most of their time waiting for data. Hence, keeping data compressed in memory can reduce the stress on the memory subsystem as well.

Furthermore, columnar means that tabular datasets are stored in column-wise order, and this turns out to offer better opportunities to improve the compression ratio. This is because data tends to expose more similarity among elements that sit in the same column than among those in the same row, so compressors generally do a much better job when data is laid out column-wise. In addition, when you have to deal with tables with a large number of columns and your operations only involve some of them, columnar storage tends to be much more effective because it minimizes the amount of data that travels to the CPU caches.
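The effect of layout on compression can be checked with a small, self-contained experiment. The sketch below packs the same synthetic two-column table row-wise and column-wise and compresses both with zlib (a stand-in for Blosc). The column names `region` and `token` and the data itself are made up for this illustration, and were chosen so that one column is very regular and the other is essentially random.

```python
import random
import struct
import zlib

rng = random.Random(0)   # fixed seed so runs are repeatable
n = 10_000

# A slowly changing column (sorted group ids) and a hash-like column.
region = [i // 2_500 for i in range(n)]
token = [rng.getrandbits(64) for _ in range(n)]

# Row-wise layout: both fields of each record sit next to each other.
row_bytes = b"".join(struct.pack("<qQ", r, t) for r, t in zip(region, token))

# Column-wise layout: each column is one contiguous buffer.
region_bytes = struct.pack(f"<{n}q", *region)
token_bytes = struct.pack(f"<{n}Q", *token)

row_size = len(zlib.compress(row_bytes))
region_size = len(zlib.compress(region_bytes))
token_size = len(zlib.compress(token_bytes))
col_size = region_size + token_size

print("raw bytes:          ", len(row_bytes))
print("row-wise compressed:", row_size)
print("col-wise compressed:", col_size)
```

In the column-wise layout the regular column shrinks to almost nothing, while in the row-wise layout the compressor has to describe the regular field again in every record, interleaved with incompressible bytes. Real tables are less extreme than this synthetic one, so the size of the gain depends on the data.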

So, the ultimate goal for bcolz is not only to reduce the memory needs of large arrays/tables, but also to make bcolz operations faster than those on a traditional data container like the ones in NumPy or pandas. That is already the case in some real-life scenarios (see the notebook above), and it should become more noticeable with forthcoming, faster CPUs that integrate more cores and wider vector units.


Requisites

  • Python 2.7 or >= 3.5
  • NumPy >= 1.8
  • Cython >= 0.22 (just for compiling the beast)
  • C-Blosc >= 1.8.0 (optional, as the internal Blosc will be used by default)


Optional:

  • numexpr >= 2.5.2
  • dask >= 0.9.0
  • pandas
  • tables (pytables)


Building

There are different ways to compile bcolz, depending on whether you want to link with an already installed Blosc library or not.

Compiling with an installed Blosc library (recommended)

Python and Blosc-powered extensions have a difficult relationship when compiled using GCC, which is why using an external C-Blosc library is recommended for maximum performance (for details, see

Download and install the C-Blosc library. Then you can tell bcolz where the C-Blosc library is in a couple of ways:

Using an environment variable:

$ BLOSC_DIR=/usr/local     (or "set BLOSC_DIR=\blosc" on Win)
$ export BLOSC_DIR         (not needed on Win)
$ python setup.py build_ext --inplace

Using a flag:

$ python setup.py build_ext --inplace --blosc=/usr/local

Compiling without an installed Blosc library

bcolz also ships with the Blosc sources so, assuming that you have a C++ compiler installed, run:

$ python setup.py build_ext --inplace

That's all. You can proceed to the testing section now.

Note: The C++ compiler is required only for the Snappy dependency. The other components of Blosc are pure C (including the LZ4 and Zlib libraries).


Testing

After compiling, you can quickly check that the package is sane by running:

$ PYTHONPATH=.   (or "set PYTHONPATH=." on Windows)
$ export PYTHONPATH    (not needed on Windows)
$ python -c"import bcolz; bcolz.test()"  # add `heavy=True` if desired


Installing

Install it as a typical Python package:

$ pip install -U .

Optionally, install the additional dependencies:

$ pip install .[optional]


Documentation

You can find the online manual at:

but of course, you can always access the docstrings from the console (e.g. help(bcolz.ctable)).

Also, you may want to look at the bench/ directory for some usage examples.


Resources

Visit the main bcolz site repository at:

Home of Blosc compressor:

Users' mailing list:

An introductory talk (20 min) about bcolz at EuroPython 2014. Slides here.


License

Please see BCOLZ.txt in the LICENSES/ directory.

Share your experience

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy Data!
