A Python package to manage extremely large amounts of data

PyTables: hierarchical datasets in Python

URL: http://www.pytables.org/

PyTables is a package for managing hierarchical datasets, designed to cope efficiently with extremely large amounts of data.

It is built on top of the HDF5 library and the NumPy package. It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code (generated using Cython), makes it a fast yet extremely easy-to-use tool for interactively saving and retrieving very large amounts of data. One important feature of PyTables is that it optimizes memory and disk resources so that data takes much less space (by a factor of 3 to 5, and more if the data is compressible) than with other solutions such as relational or object-oriented databases.

State-of-the-art compression

PyTables comes with out-of-the-box support for the Blosc compressor. This allows for extremely high compression speed while keeping decent compression ratios. As a result, I/O can be accelerated to a large extent, and you may end up achieving higher performance than the bandwidth provided by your I/O subsystem. See the Tuning The Chunksize section of the Optimization Tips chapter of the user documentation for some benchmarks.
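As a minimal sketch of how Blosc compression can be requested, the snippet below stores a highly compressible array with a `Filters` instance and compares the in-memory size with the size on disk. The file name, the node name `zeros` and the compression level are assumptions made for this illustration only, and the actual ratio depends on your data.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical file location, chosen only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "blosc_demo.h5")

# Highly compressible sample data: one million float64 zeros.
data = np.zeros(10 ** 6, dtype=np.float64)

# Request the Blosc compressor at a moderate compression level.
filters = tables.Filters(complevel=5, complib="blosc")

with tables.open_file(path, mode="w") as h5file:
    # A CArray is a chunked array, so it can be compressed.
    h5file.create_carray(h5file.root, "zeros", obj=data, filters=filters)

with tables.open_file(path, mode="r") as h5file:
    # Decompression is transparent when reading.
    restored = h5file.root.zeros[:]

raw_size = data.nbytes
disk_size = os.path.getsize(path)
print("in memory: %d bytes, on disk: %d bytes" % (raw_size, disk_size))
```

Real measurements are less flattering than an array of zeros, so treat this only as a way to check that compression is active.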

Not a RDBMS replacement

PyTables is not designed to work as a relational database replacement, but rather as a teammate. If you want to work with large datasets of multidimensional data (for example, for multidimensional analysis), or just provide a categorized structure for some portions of your cluttered RDBMS, then give PyTables a try. It works well for storing data from data acquisition systems (DAS), simulation software, and network data monitoring systems (for example, traffic measurements of IP packets on routers), or as a centralized repository for system logs, to name only a few possible uses.

Tables

A table is defined as a collection of records whose values are stored in fixed-length fields. All records have the same structure and all values in each field have the same data type. "Fixed-length" fields and strict "data types" may seem a strange requirement for an interpreted language like Python, but they serve a useful function if the goal is to save very large quantities of data (such as those generated by many scientific applications) in an efficient manner that reduces demand on CPU time and I/O.
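The following sketch shows one way such a fixed-structure table can be declared and queried. The `Particle` description, its column names and the file name are hypothetical and exist only for this example.

```python
import os
import tempfile

import tables


class Particle(tables.IsDescription):
    # Every column has a fixed size and a fixed data type.
    name = tables.StringCol(16)    # string of at most 16 bytes
    energy = tables.Float64Col()   # double-precision float
    idnumber = tables.Int32Col()   # 32-bit signed integer


# Hypothetical file location, chosen only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "table_demo.h5")

with tables.open_file(path, mode="w") as h5file:
    table = h5file.create_table(h5file.root, "particles", Particle)

    # Fill the table one record at a time through the row accessor.
    row = table.row
    for i in range(10):
        row["name"] = ("particle-%d" % i).encode("ascii")
        row["energy"] = float(i) ** 2
        row["idnumber"] = i
        row.append()
    table.flush()

    # Select records with a condition expressed as a string.
    high_energy_ids = [int(r["idnumber"]) for r in table.where("energy > 50")]

    # Size in bytes of a single record, as reported by PyTables.
    row_size = table.rowsize

print(high_energy_ids)
print(row_size)
```

Because every record has the same layout, the size of a row is known in advance, which is what makes bulk reads and queries over many records cheap.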

Arrays

There are other useful objects, such as arrays, enlargeable arrays, and variable-length arrays, that can serve different needs in your project.
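A brief sketch of the three array flavors mentioned above follows. The node names `fixed`, `growing` and `ragged`, as well as the file name, are assumptions made for this illustration.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical file location, chosen only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "arrays_demo.h5")

with tables.open_file(path, mode="w") as h5file:
    # Plain array: shape and type are fixed at creation time.
    fixed = h5file.create_array(
        h5file.root, "fixed", np.arange(6, dtype=np.int64).reshape(2, 3))

    # Enlargeable array: the dimension declared with length 0 can grow.
    earray = h5file.create_earray(
        h5file.root, "growing", atom=tables.Int64Atom(), shape=(0, 3))
    earray.append(np.array([[1, 2, 3]], dtype=np.int64))
    earray.append(np.array([[4, 5, 6], [7, 8, 9]], dtype=np.int64))

    # Variable-length array: each row may hold a different number of items.
    vlarray = h5file.create_vlarray(
        h5file.root, "ragged", atom=tables.Int32Atom())
    vlarray.append([1, 2, 3])
    vlarray.append([4])

    fixed_shape = tuple(int(n) for n in fixed.shape)
    growing_shape = tuple(int(n) for n in earray.shape)
    ragged_lengths = [len(row) for row in vlarray]

print(fixed_shape, growing_shape, ragged_lengths)
```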

Easy to use

One of the principal objectives of PyTables is to be user-friendly. In addition, many different iterators have been implemented to make interactive work as productive as possible.
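As a rough illustration of the iterators mentioned above, the sketch below walks over the nodes of a small file and iterates over part of an array. The group name `detector` and the array names are hypothetical.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical file location, chosen only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "iterators_demo.h5")

with tables.open_file(path, mode="w") as h5file:
    group = h5file.create_group(h5file.root, "detector")
    pressure = h5file.create_array(
        group, "pressure", np.array([25.0, 36.0, 49.0]))
    h5file.create_array(group, "temperature", np.array([290.0, 291.5]))

    # Iterate over every array stored anywhere below the root group.
    array_paths = sorted(
        node._v_pathname for node in h5file.walk_nodes("/", classname="Array"))

    # Iterate over a range of rows without loading the whole array.
    tail = [float(value) for value in pressure.iterrows(start=1)]

print(array_paths)
print(tail)
```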

Platforms

We are using Linux on top of Intel32 and Intel64 boxes as the main development platforms, but PyTables should be easy to compile/install on other UNIX or Windows machines.

Compiling

To compile PyTables you will need, at least, a recent version of the HDF5 (C flavor) library, the Zlib compression library, and the NumPy and Numexpr packages. In addition, it comes with support for the Blosc, LZO and bzip2 compressor libraries. Blosc is mandatory, but PyTables ships with the Blosc sources so, although it is recommended to have Blosc installed on your system, you don't absolutely need to install it separately. The LZO and bzip2 compression libraries are, however, optional.

Installation

1. Make sure you have HDF5 version 1.8.4 or above. HDF5 1.10.x is not supported.

On OSX you can install HDF5 using Homebrew:

$ brew tap homebrew/science
$ brew install hdf5

On Ubuntu:

$ sudo apt-get install libhdf5-serial-dev

If you have the HDF5 library in some non-standard location (that is, where the compiler and the linker can't find it) you can use the environment variable HDF5_DIR to specify its location. See the manual for more details.

2. For stability (and performance) reasons, it is strongly recommended that you install the C-Blosc library separately, although you may prefer to let PyTables use its internal C-Blosc sources.

3. Optionally, consider installing the LZO compression library and/or the bzip2 compression library.

4. Install:

    $ pip install tables

5. To run the test suite, run:

    $ python -m tables.tests.test_all

    If any test does not pass, please send us the complete test output.
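After installing, a quick sanity check from Python can show which libraries PyTables was built against. This is only a sketch: it assumes the `tables.which_lib_version` helper is available in your PyTables version, and the set of library names queried here is an assumption for illustration.

```python
import tables

print("PyTables version: %s" % tables.__version__)

# Library names to query; optional ones may be reported as not available.
versions = {}
for lib in ("hdf5", "zlib", "blosc", "lzo", "bzip2"):
    # Returns None when PyTables was built without the library.
    versions[lib] = tables.which_lib_version(lib)

for lib, info in sorted(versions.items()):
    status = "not available" if info is None else "version %s" % (info[1],)
    print("%-6s %s" % (lib, status))
```

If an optional library such as LZO or bzip2 shows up as not available, PyTables still works, but the corresponding compressor cannot be used.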

Enjoy data! -- The PyTables Team