<div class="alert alert-info">Much of the material in this notebook has been drawn from the latest PyTables [official tutorials](http://www.pytables.org/usersguide/tutorials.html). This use is allowed under the terms of PyTables' [BSD 3-clause license](https://opensource.org/licenses/BSD-3-Clause). The tutorial material has been reformatted into Notebook format and changes have been made to the text in some places. So this notebook could be accurately called a fork of the PyTables tutorials.</div>

# Contents
- [Background](#Background)
- [A PyTables Glossary](#A-PyTables-Glossary)
- [Our First HDF5 File](#Our-First-HDF5-File)
- [Browsing the Data Tree](#Browsing-the-Data-Tree)
- [Multidimensional Data](#Multidimensional-Data)
- [References](#References)

# Background
[The Hierarchical Data Format (HDF)](https://www.hdfgroup.org/) is designed to store and organise large amounts of data in a hierarchy of groups and datasets, along with descriptive metadata. HDF is self-describing: metadata in the file allows applications to interpret the structure and contents of a file without reference to outside information.

## PyTables
It is hard to do better than the [official documentation](http://www.pytables.org/index.html) for a description:

> PyTables is a package for managing hierarchical datasets and designed to efficiently and easily cope with extremely large amounts of data.

>PyTables is built on top of the HDF5 library, using the Python language and the NumPy package. It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code (generated using Cython), makes it a fast, yet extremely easy to use tool for interactively browse, process and search very large amounts of data. 

## PyTables vs h5py
At present, there are two main Python libraries for working with HDF5 data: [PyTables](http://www.pytables.org/) and [h5py](http://www.h5py.org/). h5py follows the underlying HDF5 API closely, mapping the HDF5 data to NumPy data structures. PyTables provides a higher-level, database-like approach to data storage, with features such as advanced indexing, fast queries, undo/redo, and enriched (compared to NumPy/h5py) data types.

For more information on PyTables vs h5py, you can read both the [PyTables](http://www.pytables.org/FAQ.html#how-does-pytables-compare-with-the-h5py-project) and [h5py](http://docs.h5py.org/en/latest/faq.html#what-s-the-difference-between-h5py-and-pytables) sides of the story. Work is in progress to unify the efforts of h5py and PyTables, although this work has not yet reached the main release. This effort will see PyTables being built on top of h5py rather than independently producing bindings directly to the HDF5 API. See:
- [PyTables: A New Backend Interface](https://github.com/PyTables/PyTables/blob/pt4/doc/New-Backend-Interface.rst)
- [Python and HDF5 - A Vision](https://www.hdfgroup.org/2015/09/python-hdf5-a-vision/)

# A PyTables Glossary
HDF5 files are organised into a hierarchical tree-like structure. Starting from the file, nodes are used to represent each item in the tree. PyTables contains a number of different node types including groups, tables, and arrays. 

**File**: The file is the basic unit of storage for HDF5 and PyTables. The entire data hierarchy is stored in a single, often large, file on disk (note I am glossing over the external link functionality which lets you create a link from a node to an external file).

The PyTables interface to the HDF5 file is provided by the [`File`](http://www.pytables.org/usersguide/libref/file_class.html) class.

**Node**: Each element in the hierarchy is represented by a node. Some features of interest are:
- Root node: This sits at the top of the hierarchy. This node is always present, even in an empty file. It can be accessed by the attribute `File.root`.
- Paths: Every node in the hierarchy has a path. Similar to files in a file system, the path is formed by concatenating the names of parent nodes, separated by a `/`. Paths start with `/` to represent the root node.
- There are two main types of node: groups and leaves.
- Nodes can contain metadata.

**Groups**: PyTables uses the [`Group`](http://www.pytables.org/usersguide/libref/hierarchy_classes.html#the-group-class) class to organise the data. Instances of this class are grouping structures containing child instances of zero or more groups or leaves, together with supporting metadata. Each group has exactly one parent group.

**Leaves**: Leaf nodes sit inside a group node, but unlike a group they cannot have any further children. This is similar to files in a file system.

**Tables**: The [`Table`](http://www.pytables.org/usersguide/libref/structured_storage.html#the-table-class) class allows storage of heterogeneous tabular data in an HDF5 file. Table data consists of a unidimensional sequence of rows, where each row contains one or more fields. Fields have an associated unique name and position, with the first field having position 0. All rows have the same fields, which are arranged in columns. Tables are leaf nodes.

**Arrays**: Arrays allow the storage of multidimensional homogeneous data in an HDF5 file. The main PyTables class is the [`Array`](http://www.pytables.org/usersguide/libref/homogenous_storage.html#the-array-class), although other classes are available for enabling [data compression](http://www.pytables.org/usersguide/libref/homogenous_storage.html#carrayclassdescr), [resizable arrays](http://www.pytables.org/usersguide/libref/homogenous_storage.html#the-earray-class), and [ragged arrays](http://www.pytables.org/usersguide/libref/homogenous_storage.html#the-vlarray-class). If you have previously used HDF5 or NetCDF, then PyTables arrays will be the most familiar mechanism for storing data.
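
To make the glossary concrete, here is a minimal sketch that builds a tiny tree and lists each node with its path and kind. The file name `glossary_demo.h5`, the node names and the one-column table are assumptions made up for this illustration; the file is written to a temporary directory so nothing else is touched.

```python
import os
import tempfile

import numpy as np
import tables as pt

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'glossary_demo.h5')  # hypothetical file name
    with pt.open_file(path, mode='w', title='Glossary demo') as h5:
        # One group under the root, holding one array leaf and one table leaf
        group = h5.create_group('/', 'detector', title='A group node')
        h5.create_array(group, 'counts', obj=np.arange(4), title='An array leaf')
        h5.create_table(group, 'readout',
                        description=np.dtype([('TDCcount', np.uint8)]),
                        title='A table leaf')

        # Record (path, kind) for every node, starting from the root
        nodes = []
        for node in h5.walk_nodes('/'):
            kind = 'group' if isinstance(node, pt.Group) else 'leaf'
            nodes.append((node._v_pathname, kind))

for pathname, kind in nodes:
    print(pathname, kind)
```

Every path starts with `/`, and the leaves sit below the group that contains them.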

# Our First HDF5 File
In this section, we will see how to define our own records in Python and save collections of them (i.e. a table) into a file. Then we will select some of the data in the table using Python cuts and create NumPy arrays to store this selection as separate objects in a tree.

First, import PyTables and NumPy:

In [None]:
import numpy as np
import tables as pt

## Define the Data
Now, imagine that we have a particle detector and we want to create a table object in order to save data retrieved from it. First you need to define the table: the number of columns it has, what kind of object is contained in each column, and so on.

Our particle detector has a TDC (Time to Digital Converter) counter with a dynamic range of 8 bits and an ADC (Analog to Digital Converter) with a range of 16 bits. For these values, we will define 2 fields in our record object called `TDCcount` and `ADCcount`. We also want to save the grid position in which the particle has been detected, so we will add two new fields called `grid_i` and `grid_j`. Our instrumentation can also obtain the pressure and energy of the particle. The resolution of the pressure gauge allows us to use a single-precision float to store pressure readings, while the energy value will need a double-precision float. Finally, to track the particle we want to assign it a name that identifies the kind of particle it is, and a unique numeric identifier. So we will add two more fields: `name` will be a string of up to 16 characters, and `idnumber` will be an integer of 64 bits (to allow us to store records for extremely large numbers of particles).

Having determined our columns and their types, we can now declare a new Particle class that will contain all this information:

In [None]:
class Particle(pt.IsDescription):
    name      = pt.StringCol(16)   # 16-character String
    idnumber  = pt.Int64Col()      # Signed 64-bit integer
    ADCcount  = pt.UInt16Col()     # Unsigned short integer
    TDCcount  = pt.UInt8Col()      # unsigned byte
    grid_i    = pt.Int32Col()      # 32-bit integer
    grid_j    = pt.Int32Col()      # 32-bit integer
    pressure  = pt.Float32Col()    # float  (single-precision)
    energy    = pt.Float64Col()    # double (double-precision)

We declare a class variable for each field, assigning an instance of the appropriate Col subclass, according to the required column attributes (the data type, the length, the shape, etc). See the [The Col class and its descendants](http://www.pytables.org/usersguide/libref/declarative_classes.html#colclassdescr) for a complete description of these subclasses. See also [Supported data types in PyTables](http://www.pytables.org/usersguide/datatypes.html#datatypes) for a list of data types supported by the Col constructor.
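
As a rough cross-check of the column choices, the same record can be sketched as a NumPy structured dtype. The name `particle_dtype` and the field order are assumptions for this illustration only; it does not show how PyTables lays the table out internally.

```python
import numpy as np

# Hypothetical NumPy mirror of the Particle record, in declaration order
particle_dtype = np.dtype([
    ('name', 'S16'),           # 16-byte string
    ('idnumber', np.int64),    # signed 64-bit integer
    ('ADCcount', np.uint16),   # 16-bit ADC reading
    ('TDCcount', np.uint8),    # 8-bit TDC reading
    ('grid_i', np.int32),
    ('grid_j', np.int32),
    ('pressure', np.float32),  # single precision
    ('energy', np.float64),    # double precision
])

bytes_per_row = particle_dtype.itemsize   # sum of the field widths
tdc_max = np.iinfo(np.uint8).max          # largest value an 8-bit counter can hold
adc_max = np.iinfo(np.uint16).max         # largest value a 16-bit counter can hold
print(bytes_per_row, tdc_max, adc_max)
```

The integer limits explain the `% 256` and `% (1 << 16)` wrapping used when the table is filled later on.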

From now on, we can use the `Particle` class as a descriptor for our detector data table. We will see later on how to pass this class when constructing the table. But first, we must create a file where all the actual data pushed into our table will be saved.

## Creating a PyTables File
Use the top-level `open_file()` function to create a PyTables file:

In [None]:
?pt.open_file

In [None]:
h5file = pt.open_file(
    filename='tutorial1.h5',  # File name (will be created)
    mode='w',  # Create in write mode
    title='Test file')  # Our first metadata - a descriptive title for the file

In [None]:
h5file.root

This function attempts to open the file, and if successful, returns the `File` (see [The File Class](http://www.pytables.org/usersguide/libref/file_class.html#fileclassdescr)) object instance `h5file`. The root of the object tree is specified in the instance's root attribute.

## Creating a new group
Now, to better organize our data, we will create a group called detector that branches from the root node. We will save our particle data table in this group:

In [None]:
?pt.File.create_group

In [None]:
group = h5file.create_group(
    where='/',
    name='detector',
    title='Detector information')

Here, we have taken the `File` instance h5file and invoked its `File.create_group()` method to create a new group called `detector` branching from "/" (another way to refer to the h5file.root object we mentioned above). This will create a new `Group` (see [The Group class](http://www.pytables.org/usersguide/libref/hierarchy_classes.html#groupclassdescr)) object instance that will be assigned to the variable group.

## Creating a New Table
Let’s now create a `Table` (see [The Table class](http://www.pytables.org/usersguide/libref/structured_storage.html#tableclassdescr)) object as a branch off the newly-created group. We do that by calling the `File.create_table()` method of the h5file object:

In [None]:
?pt.File.create_table

In [None]:
table = h5file.create_table(where=group, name='readout', description=Particle, title='Readout example')

Right, so now we have created a table based on the `Particle` class, under the 'detector' group. We can examine the file structure by printing the `File` variable:

In [None]:
print(h5file)

More information, including the column datatypes for each table can also be displayed:

In [None]:
display(h5file)

We can also print just the table:

In [None]:
print(table)

What do you think that `Table(0,)` means? 

**Hint:** try printing the table again after adding some data.

## Adding Data to the Table
We can now start adding data to the readout table. First we get a reference to the [`Row`](http://www.pytables.org/usersguide/libref/structured_storage.html#rowclassdescr) handle for the table. Data for each column and row can then be written to the `Row` object as though it were a dictionary, with keys corresponding to the column names.

In [None]:
particle = table.row
particle

Note that the `Row` instance keeps track of the current row. Calling `Row.append()` saves the current data and moves the internal reference to a new row.

In [None]:
for i in range(10):
    particle['name']  = 'Particle: %6d' % (i)
    particle['TDCcount'] = i % 256
    particle['ADCcount'] = (i * 256) % (1 << 16)
    particle['grid_i'] = i
    particle['grid_j'] = 10 - i
    particle['pressure'] = float(i*i)
    particle['energy'] = float(particle['pressure'] ** 4)
    particle['idnumber'] = i * (2 ** 34)
    # Insert a new particle record
    particle.append()

After we have processed all our data, we should flush the table’s I/O buffer if we want to write all this data to disk. We achieve that by calling the `table.flush()` method. We then close the table and the file, since we are finished with them for now:

In [None]:
table.flush()
table.close()
h5file.close()

Remember, flushing a table is a very important step as it will not only help to maintain the integrity of your file, but also will free valuable memory resources (i.e. internal buffers) that your program may need for other things.
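
One way to make sure a file is always closed is to open it in a `with` block. The sketch below assumes a throwaway file name (`flush_demo.h5`) and a hypothetical one-column table; it flushes the table explicitly and then confirms that the rows can be read back from disk.

```python
import os
import tempfile

import numpy as np
import tables as pt

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'flush_demo.h5')  # hypothetical file name

    with pt.open_file(path, mode='w') as h5:
        table = h5.create_table('/', 'readout',
                                description=np.dtype([('TDCcount', np.uint8)]))
        row = table.row
        for i in range(10):
            row['TDCcount'] = i
            row.append()
        table.flush()           # push the buffered rows to the file

    still_open = h5.isopen      # falsy: leaving the with block closed the file

    # Reopen read-only and check that all rows reached the disk
    with pt.open_file(path, mode='r') as h5:
        nrows_on_disk = h5.root.readout.nrows

print(bool(still_open), nrows_on_disk)
```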

## Reading and Selecting Data in a Table
Let's have a look at our current working directory:

In [None]:
!ls

`tutorial1.h5` contains the table of particle detector readouts. So let's open it in `r+`-mode (read-write mode, but the file must already exist) and read some data:

In [None]:
h5file = pt.open_file(
    filename='tutorial1.h5',
    mode='r+')
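
The difference between the file modes can be seen in a small experiment. This is a sketch on a temporary file with a made-up name (`modes_demo.h5`): it creates the file with `'w'`, shows that `'r'` refuses modifications, and then modifies the file with `'r+'`.

```python
import os
import tempfile

import tables as pt

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'modes_demo.h5')  # hypothetical file name

    # 'w' creates the file, replacing any existing file of the same name
    with pt.open_file(path, mode='w') as h5:
        h5.create_group('/', 'detector')

    # 'r' is read-only: attempts to modify the file are refused
    with pt.open_file(path, mode='r') as h5:
        try:
            h5.create_group('/', 'columns')
            read_only_refused = False
        except pt.FileModeError:
            read_only_refused = True

    # 'r+' reads and writes an existing file
    with pt.open_file(path, mode='r+') as h5:
        h5.create_group('/', 'columns')
        group_names = sorted(node._v_name for node in h5.iter_nodes('/'))

print(read_only_refused, group_names)
```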

If you need to check the file structure, just print it:

In [None]:
print(h5file)

There is the readout table containing the particle detector readings that we need. Let's read it:

In [None]:
table = h5file.root.detector.readout  # create an alias to the readout table
print(table)

The `Table` class supports iteration by rows:

In [None]:
for row in table:
    print(row)

That's not really what we expected. Recall that the `Row` class is a PyTables class, not the `Particle` class we defined earlier. To access the row data, you can call the `Row.fetch_all_fields()` method, which returns a tuple of all field data as NumPy scalar types. You can also use slice syntax, but in this case the fields are returned as native Python types. See the [Row documentation](http://www.pytables.org/usersguide/libref/structured_storage.html#rowclassdescr) for details.

In [None]:
for row in table:
    print(row.fetch_all_fields())

Fields can also be retrieved by name:

In [None]:
for row in table:
    print(row['TDCcount'], ':', row['pressure'])

If you need to check the column names, you can either refer to the column definition class (`Particle`), or query the table `colnames` attribute:

In [None]:
table.colnames
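
Besides iterating row by row, a table can also be read in bulk into NumPy arrays. The sketch below uses a hypothetical two-column stand-in for the readout table (`readout_dtype`) in a temporary file, and reads it with `Table.read()`, slicing and `Table.col()`.

```python
import os
import tempfile

import numpy as np
import tables as pt

# Hypothetical two-column stand-in for the readout table
readout_dtype = np.dtype([('TDCcount', np.uint8), ('pressure', np.float32)])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'bulk_read_demo.h5')  # hypothetical file name
    with pt.open_file(path, mode='w') as h5:
        table = h5.create_table('/', 'readout', description=readout_dtype)
        row = table.row
        for i in range(10):
            row['TDCcount'] = i
            row['pressure'] = float(i * i)
            row.append()
        table.flush()

        everything = table.read()                 # all rows, as a structured array
        first_three = table[0:3]                  # slicing also returns a structured array
        pressures = table.read(field='pressure')  # a single column
        tdc_counts = table.col('TDCcount')        # another way to get one column

print(first_three)
print(pressures)
```

Bulk reads load the requested rows into memory, so for very large tables it is worth restricting them with slices or queries.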

Very often though, you don't want to retrieve an entire dataset. PyTables provides efficient ways to query and filter the data. Let's select all pressure values for observations where $TDCcount > 3$ and $20 \leq pressure < 50$:

In [None]:
pressure = [x['pressure'] for x in table if x['TDCcount'] > 3 and 20 <= x['pressure'] < 50]
pressure

List comprehensions work well for small data sets, but PyTables provides additional search functionality that is more appropriate for large tables or where query speed is critical. These are called *in-kernel* and *indexed* queries, and you can use them through `Table.where()` and other related methods.

Let’s repeat the pressure query with an in-kernel method:

In [None]:
pressure = [x['pressure'] for x in table.where('(TDCcount > 3) & (pressure >= 20) & (pressure < 50)')]
pressure

Note that this functionality is built on top of the NumExpr library. If you have used this library before (directly or perhaps through `pandas.eval`), you will recall that the syntax is not exactly the same as pure Python. Notice the use of `&` instead of `and`, as well as the parentheses around each term. 

 See [Condition Syntax](http://www.pytables.org/usersguide/condition_syntax.html#condition-syntax) and [Accelerating your searches](http://www.pytables.org/usersguide/optimization.html#searchoptim) for more information on in-kernel and indexed selections.
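
Two of the related methods are `Table.read_where()`, which returns the matching data as a NumPy array, and `Table.get_where_list()`, which returns the matching row numbers. The sketch below repeats the pressure query on a hypothetical two-column table in a temporary file; the bounds `lo` and `hi` are passed in through `condvars` rather than written into the condition string.

```python
import os
import tempfile

import numpy as np
import tables as pt

# Hypothetical two-column stand-in for the readout table
readout_dtype = np.dtype([('TDCcount', np.uint8), ('pressure', np.float32)])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'query_demo.h5')  # hypothetical file name
    with pt.open_file(path, mode='w') as h5:
        table = h5.create_table('/', 'readout', description=readout_dtype)
        row = table.row
        for i in range(10):
            row['TDCcount'] = i
            row['pressure'] = float(i * i)
            row.append()
        table.flush()

        condition = '(TDCcount > 3) & (pressure >= lo) & (pressure < hi)'
        bounds = {'lo': 20, 'hi': 50}

        # Matching pressure values, returned as a NumPy array
        selected = table.read_where(condition, condvars=bounds, field='pressure')
        # Row numbers of the matching rows
        row_numbers = table.get_where_list(condition, condvars=bounds)

print(selected, row_numbers)
```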

### Strings Require Special Care
<div class="alert alert-danger">
Recall that in the `Particle` definition, we defined `name` as a 16 character string (`StringCol(16)`)? PyTables stores this as a byte-array and not a string. Among other things, this means that queries on string columns need special handling.
</div>

First, let's check the column types through the table object:

In [None]:
table.coltypes

It says that `name` is a `string`. Let's retrieve the first item from the name column and examine the actual type:

In [None]:
display(table.cols.name[0])
display(type(table.cols.name[0]))

Let's see how this affects a query:

In [None]:
for row in table.where('(name == "Particle:      5") | (name == "Particle:      7")'):
    print(row['name'])

In Python 2 the previous query will work, but in Python 3 it fails since a unicode literal cannot be compared to a NumPy byte array. To build a string query that works on all Python versions, you need to specify byte array literals:

In [None]:
for row in table.where('(name == b"Particle:      5") | (name == b"Particle:      7")'):
    print(row['name'])
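
When the names come from Python strings rather than being typed by hand, the condition can be assembled programmatically. The helper `name_condition()` below is hypothetical (it is not part of PyTables) and assumes ASCII names; it only builds the condition string, which could then be passed to `Table.where()`.

```python
def name_condition(column, values):
    """Build a condition matching any of the given names as byte literals."""
    terms = ['({} == {!r})'.format(column, value.encode('ascii'))
             for value in values]
    return ' | '.join(terms)


wanted = ['Particle: %6d' % i for i in (5, 7)]
condition = name_condition('name', wanted)
print(condition)

# The underlying issue: in Python 3, bytes and str never compare equal
stored = wanted[0].encode('ascii')         # what the table holds
same_as_text = (stored == wanted[0])       # False
same_after_decode = (stored.decode('ascii') == wanted[0])  # True
print(same_as_text, same_after_decode)
```

Decoding the stored bytes, as in the last comparison, is also how you would turn query results back into ordinary Python strings.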

## Adding New Data to a Table
In order to separate the selected data from the mass of detector data, we will create a new group columns branching off the root group. Afterwards, under this group, we will create two arrays that will contain the selected pressure and name data.

Note that the new arrays are not a dynamic query. If the main particle data changes later, the new data will not automatically reflect the changes.

First, we create the group:

In [None]:
gcolumns = h5file.create_group(
    where=h5file.root,  # Note that we give a reference to the parent group, instead of the string path "/"
    name="columns",
    title="Pressure and Name")

Now, create the pressure array using the `File.create_array()` method, converting the list of pressure results to a NumPy array:

In [None]:
h5file.create_array(
    where=gcolumns,
    name='pressure',
    obj=np.array(pressure),  # The data to be saved into the array, converted from list to numpy array
    title="Pressure column selection")

Now the array for names. In this case we store the Python list as-is:

In [None]:
names = [x['name'] for x in table.where('(TDCcount > 3) & (pressure >= 20) & (pressure < 50)')]
h5file.create_array(
    where=gcolumns,
    name='name',
    obj=names,
    title="Name column selection")

As you can see, `File.create_array()` accepts names (which is a regular Python list) as the `obj` parameter. Actually, it accepts a variety of different regular objects as parameters. The flavor attribute (see the output above) saves the original object type so that PyTables will be able to retrieve exactly the same object from disk later on.

Now let's examine the current file structure to confirm that our new arrays are there:

In [None]:
print(h5file)

## Closing the File and Examining the Contents
First, let's close the file (closing also flushes the file to disk first):

In [None]:
h5file.close()

You have now created your first PyTables file with a table and two arrays. You can examine it with any generic HDF5 tool, such as [h5dump](https://support.hdfgroup.org/HDF5/Tutor/cmdtoolview.html#dh5dump) or [h5ls](https://support.hdfgroup.org/HDF5/Tutor/cmdtoolview.html#h5ls). Here is what `tutorial1.h5` looks like when read with the h5ls program:

In [None]:
!h5ls -r tutorial1.h5

You can also use the PyTables command-line utility `ptdump`. 

In [None]:
!ptdump tutorial1.h5

`ptdump` understands the PyTables metadata in the file, and gives more information about how the data will appear to PyTables compared to generic utilities.

Note that h5ls described both '/columns/name' and '/detector/readout' as datasets, while ptdump understands that one is a PyTables array and the other is a table.

# Browsing the Data Tree
In this section we will learn how to browse the HDF5 data tree, and how to read and write data and metadata.

## Traversing the Tree
Let's start by opening the file from the last section:

In [None]:
h5file = pt.open_file('tutorial1.h5', 'a')

You can get a preliminary overview of the object tree by simply printing the existing `File` instance:

In [None]:
print(h5file)

Now let’s make use of the `File` iterator to see how to list all the nodes in the object tree:

In [None]:
for node in h5file:
    print(node)

That shows all nodes (`RootGroup`, `Group`, `Array`, and `Table` in this case), and is equivalent to calling `File.walk_nodes()`.

There are two basic methods for examining the tree structure. `File.walk_nodes()` performs an in-order recursive traversal of the tree. `File.iter_nodes()` performs a non-recursive traversal of a single node. Both methods accept an optional argument `where` indicating the starting node, and an optional argument `classname` indicating the specific node types to return.

Let's look at a few uses.

All arrays:

In [None]:
for a in h5file.walk_nodes(classname='Array'):
    print(a)

All tables:

In [None]:
for t in h5file.walk_nodes(classname='Table'):
    print(t)

Just the children of the `columns` group:

In [None]:
for n in h5file.iter_nodes(where='/columns'):
    print(n)

And finally, all the [leaf](http://www.pytables.org/usersguide/libref/hierarchy_classes.html#leafclassdescr) nodes in the detector group:

In [None]:
for n in h5file.walk_nodes(h5file.root.detector, 'Leaf'):
    print(n)

If you are wondering, a leaf node is simply any node that does not (and often cannot) have children.

## Working with Metadata
PyTables provides an easy and concise way to complement the meaning of your node objects on the tree by using the `AttributeSet` class (see [The AttributeSet class](http://www.pytables.org/usersguide/libref/declarative_classes.html#attributesetclassdescr)). You can access this object through the standard attribute `attrs` in `Leaf` nodes and `_v_attrs` in `Group` nodes.

For example, let’s imagine that we want to save the date indicating when the data in the */detector/readout* table has been acquired, as well as the temperature during the gathering process:

In [None]:
table = h5file.root.detector.readout
table.attrs.gath_date = "Wed, 06/12/2003 18:33"
table.attrs.temperature = 18.4
table.attrs.temp_scale = "Celsius"

Retrieving specific attribute values is simple:

In [None]:
table.attrs.temp_scale

Deleting attributes is also simple:

In [None]:
del table.attrs.gath_date

We can also examine the current set of attributes through `Table.attrs`:

In [None]:
display(table.attrs)

However, that displays all attributes including the PyTables system attributes. You get more fine-grained control with the `AttributeSet._f_list()` method:

In [None]:
table.attrs._f_list()

In [None]:
table.attrs._f_list('all')

In [None]:
table.attrs._f_list('user')

In [None]:
table.attrs._f_list('sys')

Iterating over attributes while retrieving their name and value is slightly less elegant, but possible. You iterate by name as shown above, and then look up the value by name on the attribute set:

In [None]:
for name in table.attrs._f_list():
    print("{0}: {1}".format(name, table.attrs[name]))

You can also rename attributes:

In [None]:
table.attrs._f_rename('temp_scale', 'tempScale')
table.attrs

If we flush the file now, we can check with an external tool to see that the new metadata is stored in the file:

In [None]:
h5file.flush()
!h5ls -v tutorial1.h5/detector/readout
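
The examples in this section all used a table, which is a leaf. Groups carry attributes in the same way, through `_v_attrs`. The sketch below uses a temporary file and made-up attribute names (`location`, `channels`) to set attributes on a group and read them back after reopening the file.

```python
import os
import tempfile

import tables as pt

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'attrs_demo.h5')  # hypothetical file name

    with pt.open_file(path, mode='w') as h5:
        group = h5.create_group('/', 'detector')
        # Hypothetical metadata attached to the group itself
        group._v_attrs.location = 'Hall B'
        group._v_attrs.channels = 16

    # Reopen read-only to confirm the attributes were stored in the file
    with pt.open_file(path, mode='r') as h5:
        attrs = h5.root.detector._v_attrs
        user_attrs = sorted(attrs._f_list('user'))
        location = attrs.location
        channels = attrs.channels

print(user_attrs, location, channels)
```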

## Getting `object` Metadata
Each object in PyTables has metadata information about the data in the file. Normally this is accessible through the node instance variables. Let's take a look at some examples:

In [None]:
print("Object:", table)
print("Table name:", table.name)
print("Table title:", table.title)
print("Number of rows in table:", table.nrows)
print("Table variable names with their type and shape:")
for name in table.colnames:
    print(name, ':= %s, %s' % (table.coldtypes[name], table.coldtypes[name].shape))

Note that `table.coldtypes` is a dictionary mapping the name of each column to the corresponding NumPy `dtype`.

Now, let's retrieve the /columns/pressure Array object and look at the metadata:

In [None]:
pressureObject = h5file.get_node("/columns/pressure")
pressureObject

In [None]:
print("shape: ==>", pressureObject.shape)
print("title: ==>", pressureObject.title)
print("atom:  ==>", pressureObject.atom)
print("dtype:  ==>", pressureObject.dtype)

## Reading Data from an Array
You can use the `Array.read()` method to retrieve the data:

In [None]:
pressureArray = pressureObject.read()
display(pressureArray)
type(pressureArray)

Note that `read()` returned a NumPy array. Recall that PyTables stores type information for each node in the system attribute `FLAVOR`. It uses this metadata to automatically return the same data type that was stored. For example, recall that we stored */columns/name* as a Python `list`:

In [None]:
type(h5file.get_node("/columns/name").read())
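
The flavor round trip can be checked side by side. This sketch stores the same values once as a Python list and once as a NumPy array in a temporary file (the node names `as_list` and `as_ndarray` are made up for the illustration) and compares what comes back.

```python
import os
import tempfile

import numpy as np
import tables as pt

values = [25.0, 36.0, 49.0]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'flavor_demo.h5')  # hypothetical file name
    with pt.open_file(path, mode='w') as h5:
        h5.create_array('/', 'as_list', obj=values)
        h5.create_array('/', 'as_ndarray', obj=np.array(values))

        # The flavor records the kind of object that was originally stored
        flavors = (h5.root.as_list.flavor, h5.root.as_ndarray.flavor)
        list_back = h5.root.as_list.read()
        array_back = h5.root.as_ndarray.read()

print(flavors)
print(type(list_back), type(array_back))
```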

## Appending Data to an Existing Table
Adding new rows to a table is done in the same way as the initial table creation. First find the table node, then get the row iterator, append the data and finally flush the table.

Now let's append some new rows to the readout table:

In [None]:
table = h5file.root.detector.readout
particle = table.row
for i in range(10, 15):
    particle['name']  = 'Particle: %6d' % (i)
    particle['TDCcount'] = i % 256
    particle['ADCcount'] = (i * 256) % (1 << 16)
    particle['grid_i'] = i
    particle['grid_j'] = 10 - i
    particle['pressure'] = float(i*i)
    particle['energy'] = float(particle['pressure'] ** 4)
    particle['idnumber'] = i * (2 ** 34)
    particle.append()
table.flush()

For this to work, the file must have been opened in a mode that allows writing (`'a'` or `'r+'`); here we opened it with `'a'`.

In [None]:
for row in table:
    print(row.fetch_all_fields())

## Modifying Existing Table Data
We will start by modifying single cells in the first row of the `readout` table, indexing into the corresponding columns:

In [None]:
print("Before modif-->", table[0])
table.cols.TDCcount[0] = 1
print("After modifying first row of TDCcount-->", table[0])
table.cols.energy[0] = 2
print("After modifying first row of energy-->", table[0])

We can modify complete ranges of columns as well. Note that PyTables slicing notation generally follows the NumPy convention of `object[start:stop:step]`.

In [None]:
table.cols.TDCcount[2:5] = [2,3,4]
print("After modifying slice [2:5] of TDCcount-->")
print(table[0:5])

In [None]:
table.cols.energy[1:9:3] = [2,3,4]
print("After modifying slice [1:9:3] of energy-->")
print(table[0:9])
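
If it is not obvious which rows a slice touches, plain Python can list them. The sketch below assumes a table of 15 rows (the variable `nrows` is just an illustration) and expands the two slices used above into explicit row numbers.

```python
nrows = 15  # assumed table length for this illustration

# slice.indices() resolves start, stop and step against a given length
rows_2_5 = list(range(*slice(2, 5).indices(nrows)))
rows_1_9_3 = list(range(*slice(1, 9, 3).indices(nrows)))

print(rows_2_5)     # rows touched by [2:5]
print(rows_1_9_3)   # rows touched by [1:9:3]
```

Each slice selects three rows, which is why both assignments above supply exactly three values.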

Finally, there is a way to modify table data using the `Row` accessor that we have used for appending rows. This can be combined with table queries:

In [None]:
print(table.cols.energy[0:4])

for row in table.where('TDCcount <= 2'):
    row['energy'] = row['TDCcount'] * 2
    row.update()
    
print(table.cols.energy[0:4])

## Deleting Table Rows
Use the `Table.remove_rows()` method. It deletes rows in the half-open range [start, stop) (the start index is included, the stop index is not). For example, delete rows 5 to 9:

In [None]:
table.remove_rows(5, 10)

`remove_rows()` returns the number of removed rows.

Single rows can also be removed with `Table.remove_row()`:

In [None]:
table.remove_row(0)
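
To see exactly which rows disappear, the sketch below repeats both calls on a hypothetical one-column table in a temporary file, where each row stores its original row number in `idnumber`.

```python
import os
import tempfile

import numpy as np
import tables as pt

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'remove_demo.h5')  # hypothetical file name
    with pt.open_file(path, mode='w') as h5:
        table = h5.create_table('/', 'readout',
                                description=np.dtype([('idnumber', np.int64)]))
        row = table.row
        for i in range(15):
            row['idnumber'] = i   # remember the original row number
            row.append()
        table.flush()

        removed = table.remove_rows(5, 10)   # rows 5, 6, 7, 8, 9 (10 is excluded)
        table.remove_row(0)                  # then the first remaining row
        remaining = table.col('idnumber').tolist()

print(removed, remaining)
```

Note that row numbers shift after each removal, so later calls refer to positions in the shortened table.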

## Modifying Existing Array Data
Let’s see how to modify data in the `pressureObject` array:

In [None]:
pressureObject = h5file.root.columns.pressure
print("Before modif-->", pressureObject[:])

In [None]:
pressureObject[0] = 2
print("First modif-->", pressureObject[:])

In [None]:
pressureObject[1:3] = [2.1, 3.5]
print("Second modif-->", pressureObject[:])

In [None]:
pressureObject[::2] = [1,2]
print("Third modif-->", pressureObject[:])

In general, you can use any combination of (multidimensional) extended slicing, with the sole exception that you cannot use negative values for step to refer to indexes that you want to modify. See [`Array.__getitem__()`](http://www.pytables.org/usersguide/libref/homogenous_storage.html#tables.Array.__getitem__) for more examples on how to use extended slicing in PyTables objects.

This section is now complete, so close the file:

In [None]:
h5file.close()

# Multidimensional Data
Now it’s time for a more real-life example (i.e. with errors in the code). We will create two groups that branch directly from the root node, Particles and Events. Then, we will put three tables in each group. In Particles we will put tables based on the `Particle` descriptor and in Events, the tables based on the `Event` descriptor.

Afterwards, we will provision the tables with a number of records. Finally, we will read the newly-created table /Events/TEvent3 and select some values from it, using a list comprehension.

We also introduce a new way to describe a table: as a structured NumPy dtype (or even as a dictionary), as you can see in the `Event` description. See [`File.create_table()`](http://www.pytables.org/usersguide/libref/file_class.html#tables.File.create_table) for the different kinds of descriptor objects that can be passed to this method.

This section uses a different `Particle` definition to the earlier sections, so let's define it first:

In [None]:
class Particle(pt.IsDescription):
    name        = pt.StringCol(itemsize=16)  # 16-character string
    lati        = pt.Int32Col()              # integer
    longi       = pt.Int32Col()              # integer
    pressure    = pt.Float32Col(shape=(2,3)) # array of floats (single-precision)
    temperature = pt.Float64Col(shape=(2,3)) # array of doubles (double-precision)

Now define the `Event` table. We could do this in the same manner as `Particle`, but here we demonstrate the use of a NumPy `dtype` structure:

In [None]:
Event = np.dtype([
    ("name"     , "S16"),
    ("TDCcount" , np.uint8),
    ("ADCcount" , np.uint16),
    ("xcoord"   , np.float32),
    ("ycoord"   , np.float32)
    ])
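
The dictionary form of a table description, mentioned earlier in this section, looks like this. It is a sketch only: the table name `TEventDict` and the temporary file are made up, and the `pos` argument is used here to pin the column order explicitly.

```python
import os
import tempfile

import numpy as np
import tables as pt

# Same fields as the Event dtype, written as a dictionary of Col instances
event_description = {
    'name':     pt.StringCol(16, pos=0),
    'TDCcount': pt.UInt8Col(pos=1),
    'ADCcount': pt.UInt16Col(pos=2),
    'xcoord':   pt.Float32Col(pos=3),
    'ycoord':   pt.Float32Col(pos=4),
}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'dict_descr_demo.h5')  # hypothetical file name
    with pt.open_file(path, mode='w') as h5:
        table = h5.create_table('/', 'TEventDict', description=event_description)
        colnames = list(table.colnames)
        adc_dtype = table.coldtypes['ADCcount']

print(colnames, adc_dtype)
```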

Open a new file in "w"rite mode:

In [None]:
fileh = pt.open_file("tutorial2.h5", mode = "w")

Get the HDF5 root group:

In [None]:
root = fileh.root

Create the groups:

In [None]:
for groupname in ("Particles", "Events"):
    group = fileh.create_group(root, groupname)

<div class="alert alert-danger">**Note:** The following two code cells contain deliberate errors. Please experiment with changing the code to explore some of the sanity checking that PyTables performs.</div>

Now, create and fill the tables in Particles group:

In [None]:
# Create 3 new tables
for tablename in ("TParticle1", "TParticle2", "TParticle3"):
    # Create a table, or retrieve it if the table already exists (so this code cell can be executed multiple times)
    try:
        table = fileh.create_table("/Particles", tablename, Particle, "Particles: " + tablename)
    except pt.NodeError:
        table = fileh.get_node(root.Particles, tablename)

    # Get the record object associated with the table:
    particle = table.row

    # Fill the table with 257 particles
    for i in range(257):
        # First, assign the values to the Particle record
        particle['name'] = 'Particle: %6d' % (i)
        particle['lati'] = i
        particle['longi'] = 10 - i

        ########### Detectable errors start here. Play with them!
        #particle['pressure'] = np.array(i*np.arange(2*3)).reshape((2,4))  # Incorrect
        particle['pressure'] = np.array(i*np.arange(2*3)).reshape((2,3)) # Correct
        ########### End of errors

        particle['temperature'] = (i**2)     # Broadcasting

        # This injects the Record values
        particle.append()

    # Flush the table buffers
    table.flush()

Now, the Events group:

In [None]:
for tablename in ("TEvent1", "TEvent2", "TEvent3"):
    # Create or retrieve the table
    try:
        table = fileh.create_table(root.Events, tablename, Event, "Events: " + tablename)
    except pt.NodeError:
        table = fileh.get_node(root.Events, tablename)

    # Get the record object associated with the table:
    event = table.row

    # Fill the table with 257 events
    for i in range(257):
        # First, assign the values to the Event record
        event['name']  = 'Event: %6d' % (i)
        event['TDCcount'] = i % (1<<8)   # Correct range

        ########### Detectable errors start here. Play with them!
        #event['xcoor'] = float(i**2)     # Wrong spelling
        event['xcoord'] = float(i**2)   # Correct spelling
        #event['ADCcount'] = "sss"        # Wrong type
        event['ADCcount'] = i * 2       # Correct type
        ########### End of errors

        event['ycoord'] = float(i)**4

        # This injects the Record values
        event.append()

    # Flush the buffers
    table.flush()

## Shape Checking

One of the preceding errors looked like this:

```python
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-70395748746d> in <module>()
     18 
     19         ########### Detectable errors start here. Play with them!
---> 20         particle['pressure'] = np.array(i*np.arange(2*3)).reshape((2,4))  # Incorrect
     21         #particle['pressure'] = np.array(i*np.arange(2*3)).reshape((2,3)) # Correct
     22         ########### End of errors

ValueError: cannot reshape array of size 6 into shape (2,4)
```

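This error indicates that the array being built has an incompatible shape. Strictly speaking, the `ValueError` in this traceback is raised by NumPy's `reshape()` before PyTables sees the value: `np.arange(2*3)` has 6 elements, which cannot be rearranged into a (2,4) array. An array that really did have shape (2,4) would be rejected as well, because it cannot be stored in a `pressure` cell, which was defined with the shape (2,3).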

In general, these kinds of operations are forbidden, with one valid exception: when you assign a scalar value to a multidimensional column cell, all the cell elements are populated with the value of the scalar. For example:

```python
particle['temperature'] = (i**2)    # Broadcasting
```

The value `i**2` is assigned to all the elements of the temperature table cell. This capability is provided by the NumPy package and is known as broadcasting.
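
The same rule can be tried in plain NumPy, without a table. In this sketch the array `cell` stands in for one (2,3) table cell: a scalar fills every element, while an array of shape (2,4) is rejected.

```python
import numpy as np

cell = np.zeros((2, 3), dtype=np.float64)   # stand-in for one table cell

# A scalar is broadcast to every element of the cell
cell[...] = 7 ** 2
print(cell)

# An array with an incompatible shape cannot be broadcast into the cell
try:
    cell[...] = np.arange(8).reshape((2, 4))
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)
```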

## Field Name Checking
Another error was the `KeyError`:
```python
KeyError: 'no such column: xcoor'
```
This error indicates that we are attempting to assign a value to a non-existent field in the event table object. By looking carefully at the field names in the `Event` dtype, we see that we misspelled the `xcoord` field (we wrote `xcoor` instead). This is unusual behavior for Python, as normally when you assign a value to a non-existent instance variable, Python creates a new variable with that name. Such a feature can be dangerous when dealing with an object that contains a fixed list of field names. PyTables checks that the field exists and raises a `KeyError` if the check fails.
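
The contrast with ordinary Python objects can be sketched in a few lines. The classes `Plain` and `FixedRecord` below are hypothetical illustrations of the idea; they are not how PyTables implements its `Row` object.

```python
class Plain:
    """An ordinary Python object: any attribute name is accepted."""


class FixedRecord:
    """A record that only accepts a fixed set of field names."""

    def __init__(self, fields):
        self._values = dict.fromkeys(fields)

    def __setitem__(self, key, value):
        if key not in self._values:
            raise KeyError('no such column: ' + key)
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]


obj = Plain()
obj.xcoor = 1.0          # the typo goes unnoticed and creates a new attribute

event = FixedRecord(['xcoord', 'ycoord'])
event['xcoord'] = 1.0    # a known field is accepted
try:
    event['xcoor'] = 1.0
    typo_caught = False
except KeyError:
    typo_caught = True   # the misspelled field is rejected
print(typo_caught)
```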

## Data Type Checking
Finally, the last issue was a `TypeError` exception:
```python
TypeError: invalid type (<class 'str'>) for column ``ADCcount``
```

This is because we defined the `ADCcount` column as type `np.uint16`, so assigning a string value is invalid.

In [None]:
root.Events.TEvent1.coldtypes['ADCcount']

## Wrapping Up
Assuming you fixed the errors, let's close the file and examine the structure:

In [None]:
fileh.close()

In [None]:
!ptdump -v tutorial2.h5

# Where to Now?
PyTables is a large library, with many advanced features that were not covered in this notebook. Good next steps are:
- The [Using PyTables for Larger-Than-RAM Data Processing](https://kastnerkyle.github.io/posts/using-pytables-for-larger-than-ram-data-processing) blog post. I have placed a copy of the source notebook for this blog [here](/Additional%20Notebooks/Using%20PyTables%20for%20Larger-Than-RAM%20Data%20Processing/Using%20PyTables%20for%20Larger-Than-RAM%20Data%20Processing.ipynb), with some edits to adapt it to more recent PyTables versions (if you download and run the notebook from the online source, it will give a lot of deprecated function warnings). This blog gives a good introduction to working with large data sets, including the use of the chunking and compression features of HDF5.
- The official [PyTables Tutorial](http://www.pytables.org/usersguide/tutorials.html), particularly [the second half](http://www.pytables.org/usersguide/tutorials.html#using-links-for-more-convenient-access-to-nodes) which has not been covered here.
- [The PyTables Documentation](http://www.pytables.org/index.html).

# References
- [PyTables](http://www.pytables.org)
- [h5py](http://www.h5py.org)
- [HDF5](https://www.hdfgroup.org/)