FrancescAlted edited this page Jun 25, 2014 · 6 revisions

bcolz: columnar and compressed data containers

bcolz provides columnar and compressed data containers. Column storage allows for efficiently querying tables with a large number of columns. It also allows for cheap addition and removal of columns. In addition, bcolz objects are compressed by default to reduce memory and disk I/O needs. Compression is carried out internally by Blosc, a high-performance compressor that is optimized for binary data.
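
To make the column-storage idea concrete, here is a minimal sketch using only the Python standard library. It is not the bcolz API: the TinyColumnTable class and its method names are hypothetical, and zlib stands in for Blosc. Each column lives in its own compressed buffer, so adding or removing a column never touches the others, and a query on one column decompresses only that column.

```python
import zlib
from array import array


class TinyColumnTable:
    """Hypothetical toy table (not bcolz): one compressed buffer per column."""

    def __init__(self):
        # column name -> (array typecode, compressed bytes)
        self._cols = {}

    def add_column(self, name, typecode, values):
        # Pack the values into a typed binary buffer, then compress it.
        # zlib is an assumption here; bcolz itself uses Blosc.
        raw = array(typecode, values).tobytes()
        self._cols[name] = (typecode, zlib.compress(raw))

    def del_column(self, name):
        # Dropping a column leaves every other column's buffer untouched.
        del self._cols[name]

    def column(self, name):
        # Decompress a single column back into a typed array.
        typecode, blob = self._cols[name]
        out = array(typecode)
        out.frombytes(zlib.decompress(blob))
        return out

    @property
    def names(self):
        return list(self._cols)


table = TinyColumnTable()
table.add_column("id", "q", range(1000))
table.add_column("price", "d", (i * 0.5 for i in range(1000)))

# Querying "price" decompresses only that column, not "id".
expensive = sum(1 for p in table.column("price") if p > 400.0)

# Adding and removing columns are independent, cheap operations.
table.add_column("flag", "b", (i % 2 for i in range(1000)))
table.del_column("id")
```

A row-oriented layout would have to rewrite every row to add or drop a field; here the cost is limited to the single column involved.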


By using compression, you can deal with more data using the same amount of memory. You may wonder what the price is in terms of performance. Nowadays memory access is the most common bottleneck in many computational scenarios, and CPUs spend most of their time waiting for data. Keeping data compressed in memory can reduce the stress on the memory subsystem.
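
As a rough illustration of the memory side of that trade-off, the following sketch compresses a buffer of 64-bit integers and compares sizes. It uses zlib from the standard library as a stand-in for Blosc, so the ratio it reports is only indicative and says nothing about bcolz's actual speed or compression levels.

```python
import zlib
from array import array

# 100,000 sequential 64-bit integers: 800,000 bytes uncompressed.
values = array("q", range(100_000))
raw = values.tobytes()

# zlib is an assumption for illustration; bcolz relies on Blosc instead.
packed = zlib.compress(raw)
ratio = len(raw) / len(packed)

# Decompression restores the original data exactly (lossless).
restored = array("q")
restored.frombytes(zlib.decompress(packed))

print(f"raw: {len(raw)} bytes, compressed: {len(packed)} bytes, ratio: {ratio:.1f}x")
```

The actual ratio depends heavily on the data: regular or low-entropy columns compress well, while random values may barely shrink.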

In other words, the ultimate goal for bcolz is not only to reduce the memory needs of large arrays, but also to make operations on bcolz objects faster than on a traditional ndarray object from NumPy. That is already true for some special cases now (2011), and it should happen more generally in the near future, when bcolz is able to take advantage of newer CPUs integrating more cores and wider vector units (256 bits and more).

See https://github.com/Blosc/bcolz/wiki/Query-Speed-and-Compression for an example of how bcolz can perform complex queries on your datasets in an easy, yet powerful way.


bcolz is distributed under the provisions of the 3-clause BSD license. Please see BCOLZ.txt in the LICENSES/ directory of the sources.