# Basic Datatypes in HDF5

> Objectives:
> * How to create (and read) HDF5 files with datasets of homogeneous and heterogeneous datatypes
> * See how h5py and PyTables achieve the same thing with their own APIs
> * Be introduced to the `IsDescription` class in PyTables for declaring tables (instead of NumPy dtypes)

In [None]:
import numpy as np

## Homogeneous datatypes

In [None]:
arr_to_store = np.arange(10, dtype=np.int8)

### Using h5py

In [None]:
import h5py

In [None]:
f = h5py.File("homogeneous1.h5", "w")

In [None]:
f.create_dataset(data=arr_to_store, name="mydata")

In [None]:
f['/mydata'][:]    # data can be accessed in a NumPy-like interface

In [None]:
f

In [None]:
f.close()

In [None]:
!h5ls -rv homogeneous1.h5

In [None]:
%ls -l homogeneous1.h5
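
The cells above write the dataset but never reopen the file. As a minimal sketch of the read side, the block below writes the same array and then reopens the file in read-only mode. The file name `homogeneous_demo.h5` and the temporary directory are assumptions made for illustration so that the example does not touch the files created above.

```python
import os
import tempfile

import h5py
import numpy as np

arr = np.arange(10, dtype=np.int8)

# Hypothetical file name, placed in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "homogeneous_demo.h5")

# Write: the context manager closes the file even if an error occurs.
with h5py.File(path, "w") as f:
    f.create_dataset("mydata", data=arr)

# Read: reopen in read-only mode and pull the data into memory.
with h5py.File(path, "r") as f:
    dset = f["mydata"]
    stored_dtype = dset.dtype   # datatype recorded in the file
    stored_shape = dset.shape   # shape recorded in the file
    read_back = dset[:]         # slicing returns a NumPy array

print(stored_dtype, stored_shape)
print(read_back)
```

Using `with` instead of an explicit `f.close()` is a stylistic choice here; both approaches leave the file properly closed.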

### Using PyTables

In [None]:
import tables

In [None]:
f2 = tables.open_file("homogeneous2.h5", "w")

In [None]:
f2.create_array(f2.root, name="mydata", obj=arr_to_store)

In [None]:
f2.root.mydata[:]  # data can be accessed in a NumPy-like interface

In [None]:
f2

In [None]:
f2.close()

In [None]:
!h5ls -v homogeneous2.h5/mydata

In [None]:
%ls -l homogeneous*.h5

## Compound Datatypes

In [None]:
dtype = np.dtype([("myfield1", np.int32), ("myfield2", np.float64), ("myfield3", "S4")])
table_to_store = np.fromiter(((i, i**2, "%s"%i**3) for i in range(10)), dtype=dtype)

In [None]:
table_to_store
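
Before storing the table, it may help to see how a NumPy structured array behaves, since both h5py and PyTables map it to an HDF5 compound datatype. The sketch below rebuilds an equivalent array from a list of tuples (an assumption made to keep the example self-contained; the variable names `records`, `table`, and `subset` are illustrative) and accesses it by field, by row, and with a boolean mask.

```python
import numpy as np

dtype = np.dtype([("myfield1", np.int32), ("myfield2", np.float64), ("myfield3", "S4")])

# Same values as the table above, built from a list of tuples.
records = [(i, i**2, str(i**3).encode("ascii")) for i in range(10)]
table = np.array(records, dtype=dtype)

col = table["myfield1"]                 # one field across all records
row = table[3]                          # one record with all its fields
subset = table[table["myfield2"] > 50]  # records selected by a mask on one field

print(table.dtype.names)  # field names, in declaration order
print(table.itemsize)     # bytes per record: 4 + 8 + 4
print(row)
print(subset)
```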

### Using h5py

In [None]:
f = h5py.File("compound1.h5", "w")

In [None]:
f.create_dataset(data=table_to_store, name="mydata")

In [None]:
f['mydata'].dtype

In [None]:
f['mydata'][:]

In [None]:
f.close()

In [None]:
!h5ls -v compound1.h5

### Using PyTables (simple way)

In [None]:
f2 = tables.open_file("compound2.h5", "w")

In [None]:
f2.create_table(f2.root, name="mydata", obj=table_to_store)

In [None]:
f2.close()

### Using PyTables (description way)

In [None]:
class MyTable(tables.IsDescription):
    myfield1 = tables.Int32Col()
    myfield2 = tables.Float64Col()
    myfield3 = tables.StringCol(itemsize=4)

In [None]:
f3 = tables.open_file("compound3.h5", "w")

In [None]:
t = f3.create_table(f3.root, "mydata", MyTable)

In [None]:
t.append(table_to_store)

In [None]:
f3.close()
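
As a sketch of what a declared table offers when it is read back, the block below recreates the same description in a separate file, appends the rows as plain tuples, and then runs an in-kernel query with `read_where`. The file name `compound_demo.h5` and the query condition are assumptions chosen for illustration.

```python
import os
import tempfile

import tables


class MyTable(tables.IsDescription):
    myfield1 = tables.Int32Col()
    myfield2 = tables.Float64Col()
    myfield3 = tables.StringCol(itemsize=4)


# Hypothetical file name, placed in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "compound_demo.h5")

# Write: create an empty table from the description, then append rows.
with tables.open_file(path, "w") as f:
    t = f.create_table(f.root, "mydata", MyTable)
    t.append([(i, i**2, str(i**3).encode("ascii")) for i in range(10)])
    t.flush()

# Read: reopen read-only and select rows with a condition string.
with tables.open_file(path, "r") as f:
    t = f.root.mydata
    colnames = t.colnames                    # column names of the table
    nrows = t.nrows                          # number of stored rows
    selected = t.read_where("myfield1 > 5")  # returns a structured array

print(colnames, nrows)
print(selected)
```

The condition string is evaluated by PyTables itself, so only the matching rows are returned as a NumPy structured array.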

In [None]:
!h5ls -v compound2.h5/mydata

In [None]:
%ls -lh compound*.h5

Hmm, it seems like the PyTables files are larger than the h5py one. Why? Let's inspect the files a bit:

In [None]:
!h5ls compound1.h5

In [None]:
!h5ls compound2.h5

We see that the dimensionality of the table created with PyTables is `{10/Inf}`: 10 rows at present, with an unlimited maximum size, which indicates that the dataset is chunked. The one created with h5py is just `{10}`, which means that it is not using chunking.  Chunked datasets take more space than non-chunked ones because they need extra metadata to index the chunks, and this is why the PyTables files are larger.

The reason why PyTables tables are chunked by default is that they can be enlarged and compressed, and chunking is required in order to allow that.  More on chunking later.
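
The same distinction can be reproduced with h5py alone. The sketch below assumes a hypothetical file `chunking_demo.h5` and dataset names `fixed` and `growable`: passing `maxshape=(None,)` asks h5py for a resizable dataset, which makes it choose a chunked layout, while the default call stays contiguous.

```python
import os
import tempfile

import h5py
import numpy as np

arr = np.arange(10, dtype=np.int8)

# Hypothetical file name, placed in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "chunking_demo.h5")

with h5py.File(path, "w") as f:
    # Default: fixed size, contiguous layout.
    fixed = f.create_dataset("fixed", data=arr)
    # Unlimited first dimension: h5py picks a chunk shape automatically.
    growable = f.create_dataset("growable", data=arr, maxshape=(None,))

    fixed_chunks = fixed.chunks        # None for a contiguous dataset
    growable_chunks = growable.chunks  # a tuple for a chunked dataset

    # Only the chunked dataset can be enlarged after creation.
    growable.resize((20,))
    growable[10:] = np.arange(10, 20, dtype=np.int8)
    grown = growable[:]

print(fixed_chunks, growable_chunks)
print(grown)
```

Running `h5ls` on such a file should show the `growable` dataset with an `Inf` maximum dimension, as seen for the PyTables table above.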