# H5Py Primer
---
H5Py is a Python library for interacting with HDF5 files. Below are the most commonly used features.

Table of Contents:
1. [Importing h5py](#section1)
2. [Working with files](#section2)
3. [Working with groups](#section3)
4. [Working with datasets](#section4)
5. [Working with attributes](#section5)

REFERENCES:
- [1] Johansson, *Numerical Python: A Practical Techniques Approach for Industry*

## 1. Importing h5py <a id='section1'></a>
Before using any h5py functionality, you first have to import the module.

In [138]:
# import the library
import h5py
import numpy as np #numpy will also be needed for some of the examples below

## 2. Working with files <a id='section2'></a>
This section shows how to create new HDF5 files and how to open and read existing ones.

In [88]:
# create a new file in write mode
f = h5py.File('testfile.hdf5', mode='w')

In [89]:
# check the current mode of a file
# NOTE: once a file is created/opened, its mode is either read-only ('r') or read-write ('r+'). 
f.mode

'r+'

In [90]:
# flush buffer
f.flush()

In [91]:
# close a file
f.close()

In [92]:
# open an existing file in read-only mode (file must exist)
f = h5py.File('testfile.hdf5', mode='r')
f.mode

'r'

In [93]:
# open an existing file in read-write mode (file must exist)
f.flush()
f.close()
f = h5py.File('testfile.hdf5', mode='r+')
f.mode

'r+'

In [94]:
# try to create a new file, fail if file already exists
f2 = h5py.File('testfile2.hdf5', mode='x')
f2.mode

'r+'

In [96]:
# open an existing file in read-write mode, create file if it doesn't already exist
f.flush()
f.close()
f = h5py.File('testfile.hdf5', mode='a')
f.mode

'r+'
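
The cells above call `flush()` and `close()` by hand. As a minimal sketch of an alternative, `h5py.File` can also be used as a context manager, which closes the file when the block exits. The scratch directory and the file name `context_demo.hdf5` are assumptions made for illustration so that `testfile.hdf5` is left untouched.

```python
import os
import tempfile

import h5py

# Hypothetical scratch location, used so the example does not touch testfile.hdf5.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "context_demo.hdf5")

# The file is closed automatically when the with-block exits,
# even if an exception is raised inside it.
with h5py.File(path, mode="w") as fh:
    fh.create_group("experiment1")
    mode_inside = fh.mode  # mode reported while the file is open

# A closed File object evaluates to False.
still_open = bool(fh)

# Re-open read-only to confirm that the group reached the disk.
with h5py.File(path, mode="r") as fh:
    names = list(fh.keys())

print(mode_inside, still_open, names)
```

This form avoids leaving a file handle open when a cell fails halfway through.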

## 3. Working with groups <a id='section3'></a>
This section illustrates how to create and explore groups in an hdf5 file.

In [174]:
# first make sure the file is opened in read-write mode
f = h5py.File('testfile.hdf5', mode='a')

In [175]:
# read the name of the root group
f.name

'/'

In [176]:
# create a new subgroup
grp1 = f.create_group("experiment1")
grp1.name

'/experiment1'

In [177]:
# create a group hierarchy (automatically creating parent groups if they don't already exist)
grp2_s1 = f.create_group("experiment2/simulation1")
grp2_s2 = f.create_group("experiment2/simulation2")
grp2_s1.name

'/experiment2/simulation1'

In [178]:
# dictionary-like lookup for a group
f['/'], f['/experiment1'], f['/experiment2']

(<HDF5 group "/" (2 members)>,
 <HDF5 group "/experiment1" (0 members)>,
 <HDF5 group "/experiment2" (2 members)>)

In [179]:
# dictionary-like lookup for a subgroup
exp2 = f['/experiment2']
exp2['simulation1']

<HDF5 group "/experiment2/simulation1" (0 members)>

In [180]:
# get names of subgroups within a group
list(f.keys())

['experiment1', 'experiment2']

In [181]:
# get (name, value) tuples for each item in a group
list(f.items())

[('experiment1', <HDF5 group "/experiment1" (0 members)>),
 ('experiment2', <HDF5 group "/experiment2" (2 members)>)]

In [182]:
# traverse the hierarchy of groups in a file
f.visit(lambda x: print(x))

experiment1
experiment2
experiment2/simulation1
experiment2/simulation2


In [183]:
# traverse the hierarchy of (name, item) tuples in a file
f.visititems(lambda name, item: print(name, item))

experiment1 <HDF5 group "/experiment1" (0 members)>
experiment2 <HDF5 group "/experiment2" (2 members)>
experiment2/simulation1 <HDF5 group "/experiment2/simulation1" (0 members)>
experiment2/simulation2 <HDF5 group "/experiment2/simulation2" (0 members)>


In [184]:
# test group membership with the 'in' operator
print('simulation1' in f)
print('simulation1' in f['experiment1'])
print('simulation1' in f['experiment2'])

False
False
True
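
The membership test above only checks direct children by name. The sketch below shows three related lookups that may be useful: testing a full path with `in`, `get()` for a lookup that returns `None` instead of raising `KeyError`, and `require_group()` for "open it, or create it if missing". The file name `groups_demo.hdf5` and the group `experiment3` are hypothetical names introduced for this example.

```python
import os
import tempfile

import h5py

# Hypothetical scratch file, separate from testfile.hdf5.
path = os.path.join(tempfile.mkdtemp(), "groups_demo.hdf5")

with h5py.File(path, mode="w") as fh:
    fh.create_group("experiment2/simulation1")

    # A full path can be used in a membership test.
    nested_found = "experiment2/simulation1" in fh

    # get() returns None (or a supplied default) when the name is absent.
    missing = fh.get("experiment3")

    # require_group() creates the group on the first call
    # and simply opens the existing group on the second call.
    grp_a = fh.require_group("experiment3")
    grp_b = fh.require_group("experiment3")
    group_names = (grp_a.name, grp_b.name)

    top_level = sorted(fh.keys())

print(nested_found, missing, group_names, top_level)
```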


In [185]:
# use external hdf5 utilities to explore a file (provided by the package hdf5-tools)
f.flush()
f.close()
!h5ls -r testfile.hdf5

/                        Group
/experiment1             Group
/experiment2             Group
/experiment2/simulation1 Group
/experiment2/simulation2 Group


## 4. Working with datasets <a id='section4'></a>
This section illustrates how to create, retrieve, modify, and delete datasets in an hdf5 file.

In [186]:
# first make sure the file is opened in read-write mode
f = h5py.File('testfile.hdf5', mode='a')

In [187]:
# create a dataset by directly assigning an array to an item
data1 = np.random.randn(100, 100)
f['/experiment1/simulation1/data1'] = data1
f.visititems(lambda name, item: print(name, item))

experiment1 <HDF5 group "/experiment1" (1 members)>
experiment1/simulation1 <HDF5 group "/experiment1/simulation1" (1 members)>
experiment1/simulation1/data1 <HDF5 dataset "data1": shape (100, 100), type "<f8">
experiment2 <HDF5 group "/experiment2" (2 members)>
experiment2/simulation1 <HDF5 group "/experiment2/simulation1" (0 members)>
experiment2/simulation2 <HDF5 group "/experiment2/simulation2" (0 members)>


In [188]:
# create a dataset using the create_dataset method
f.create_dataset('experiment1/simulation1/data2', data=np.random.randn(100, 100))
f.visititems(lambda name, item: print(name, item))

experiment1 <HDF5 group "/experiment1" (1 members)>
experiment1/simulation1 <HDF5 group "/experiment1/simulation1" (2 members)>
experiment1/simulation1/data1 <HDF5 dataset "data1": shape (100, 100), type "<f8">
experiment1/simulation1/data2 <HDF5 dataset "data2": shape (100, 100), type "<f8">
experiment2 <HDF5 group "/experiment2" (2 members)>
experiment2/simulation1 <HDF5 group "/experiment2/simulation1" (0 members)>
experiment2/simulation2 <HDF5 group "/experiment2/simulation2" (0 members)>


In [190]:
# create an empty dataset using the create_dataset method
f.create_dataset('experiment1/simulation1/data3', shape=(100, 100), fillvalue=-1, dtype='float64')
f.visititems(lambda name, item: print(name, item))

experiment1 <HDF5 group "/experiment1" (1 members)>
experiment1/simulation1 <HDF5 group "/experiment1/simulation1" (3 members)>
experiment1/simulation1/data1 <HDF5 dataset "data1": shape (100, 100), type "<f8">
experiment1/simulation1/data2 <HDF5 dataset "data2": shape (100, 100), type "<f8">
experiment1/simulation1/data3 <HDF5 dataset "data3": shape (100, 100), type "<f8">
experiment2 <HDF5 group "/experiment2" (2 members)>
experiment2/simulation1 <HDF5 group "/experiment2/simulation1" (0 members)>
experiment2/simulation2 <HDF5 group "/experiment2/simulation2" (0 members)>
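
`create_dataset` accepts further storage options beyond `shape`, `dtype`, and `fillvalue`. The following is a minimal sketch, not part of the original walkthrough, that combines chunked storage, gzip compression, and a resizable first axis. The dataset name `results`, the chunk shape, and the file name are assumptions chosen for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file, separate from testfile.hdf5.
path = os.path.join(tempfile.mkdtemp(), "dataset_demo.hdf5")

with h5py.File(path, mode="w") as fh:
    # Chunked, gzip-compressed dataset whose first axis may grow without limit.
    dset = fh.create_dataset(
        "results",
        shape=(10, 4),
        maxshape=(None, 4),   # None marks an unlimited axis
        dtype="float64",
        chunks=(5, 4),
        compression="gzip",
        fillvalue=-1,
    )

    dset[:5] = np.zeros((5, 4))   # write the first five rows
    dset.resize((20, 4))          # grow the dataset along the first axis

    new_shape = dset.shape
    written_value = dset[0, 0]
    unwritten_value = dset[15, 0]  # never written, so it reads back as the fill value
    chunk_shape = dset.chunks
    codec = dset.compression

print(new_shape, written_value, unwritten_value, chunk_shape, codec)
```

A resizable dataset must be chunked, which is why `maxshape` and `chunks` appear together here.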


In [191]:
# retrieve a dataset by dictionary-like lookup
dset = f['/experiment1/simulation1/data1']
dset

<HDF5 dataset "data1": shape (100, 100), type "<f8">

In [192]:
# get dataset properties (name, dtype, shape, length)
print('dataset name is', dset.name)
print('dataset type is', dset.dtype)
print('dataset shape is', dset.shape)
print('dataset length is', dset.len())

dataset name is /experiment1/simulation1/data1
dataset type is float64
dataset shape is (100, 100)
dataset length is 100


In [163]:
# get data from dataset as a numpy array
npdset = dset[...]
print(type(npdset))
npdset

<class 'numpy.ndarray'>


array([[-1.55122402, -0.25014858, -0.53197104, ...,  0.86986041,
         0.52466628,  1.97577642],
       [-1.3420808 ,  1.24863887,  0.51486834, ...,  2.15678855,
         0.45520741,  1.24565881],
       [ 0.25870432,  1.10026541, -0.24941832, ...,  0.26728039,
        -0.99732445,  0.848655  ],
       ...,
       [-0.14897499,  0.02556633, -0.52910275, ...,  1.13520198,
        -1.00031886, -0.27633616],
       [-0.30683423,  0.4422905 , -0.59246   , ...,  1.03244417,
         0.13213584, -1.38594066],
       [ 0.00835378,  0.17420096, -0.10452702, ...,  0.89675   ,
        -0.42731051,  0.47879896]])

In [167]:
# get part of the data from a dataset using numpy-like slicing
# NOTE: the slicing is done within the HDF5 library, not NumPy, so only the
# requested part of the dataset is read into memory.
first_col = dset[:,0]
first_col.shape

(100,)
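
As a further illustration of partial reads, the sketch below slices a rectangular block with a step and checks it against the in-memory array the dataset was created from. The file name `slicing_demo.hdf5` and the dataset name `data` are assumptions for this example only.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file, separate from testfile.hdf5.
path = os.path.join(tempfile.mkdtemp(), "slicing_demo.hdf5")

# Reference array kept in memory so the slices can be checked.
full = np.arange(10000, dtype="float64").reshape(100, 100)

with h5py.File(path, mode="w") as fh:
    dset = fh.create_dataset("data", data=full)

    # Rows 10 to 19, every second column; the result is a NumPy array.
    block = dset[10:20, ::2]
    # Indexing with a single integer returns one full row.
    row = dset[3]

    block_shape = block.shape
    row_shape = row.shape
    is_array = isinstance(block, np.ndarray)
    matches = np.array_equal(block, full[10:20, ::2])

print(block_shape, row_shape, is_array, matches)
```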

In [196]:
# change/fill dataset using numpy-like assignments
dset[:, 0] = np.arange(100)
dset[:, 0]

array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.,
       13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25.,
       26., 27., 28., 29., 30., 31., 32., 33., 34., 35., 36., 37., 38.,
       39., 40., 41., 42., 43., 44., 45., 46., 47., 48., 49., 50., 51.,
       52., 53., 54., 55., 56., 57., 58., 59., 60., 61., 62., 63., 64.,
       65., 66., 67., 68., 69., 70., 71., 72., 73., 74., 75., 76., 77.,
       78., 79., 80., 81., 82., 83., 84., 85., 86., 87., 88., 89., 90.,
       91., 92., 93., 94., 95., 96., 97., 98., 99.])

In [198]:
# delete items from a group
del f['experiment1/simulation1/data3']
f.visititems(lambda name, item: print(name, item))

experiment1 <HDF5 group "/experiment1" (1 members)>
experiment1/simulation1 <HDF5 group "/experiment1/simulation1" (2 members)>
experiment1/simulation1/data1 <HDF5 dataset "data1": shape (100, 100), type "<f8">
experiment1/simulation1/data2 <HDF5 dataset "data2": shape (100, 100), type "<f8">
experiment2 <HDF5 group "/experiment2" (2 members)>
experiment2/simulation1 <HDF5 group "/experiment2/simulation1" (0 members)>
experiment2/simulation2 <HDF5 group "/experiment2/simulation2" (0 members)>
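
The `visititems` listings above print groups and datasets together. A small sketch of one way to keep only the datasets is to test each visited object with `isinstance(obj, h5py.Dataset)`. The callback name `collect_datasets`, the file name, and the sample layout are hypothetical and only mirror the structure used in this primer.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file, separate from testfile.hdf5.
path = os.path.join(tempfile.mkdtemp(), "visit_demo.hdf5")

dataset_paths = []


def collect_datasets(name, obj):
    """Record the path of every dataset encountered during traversal."""
    if isinstance(obj, h5py.Dataset):
        dataset_paths.append(name)
    # Returning None tells visititems to continue with the next item.


with h5py.File(path, mode="w") as fh:
    fh["experiment1/simulation1/data1"] = np.zeros((2, 2))
    fh["experiment1/simulation1/data2"] = np.ones((2, 2))
    fh.create_group("experiment2/simulation1")  # a group without datasets
    fh.visititems(collect_datasets)

print(sorted(dataset_paths))
```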


## 5. Working with attributes <a id='section5'></a>

In [199]:
# first make sure the file is opened in read-write mode
f = h5py.File('testfile.hdf5', mode='a')

In [201]:
# access attributes of hdf5 objects using the attrs property
f.attrs

<Attributes of HDF5 object at 140447206666312>

In [211]:
# retrieve attribute names
list(f.attrs.keys())

['description']

In [212]:
# create an attribute on the root group
f.attrs['description'] = 'Simulation data for project X'
list(f['/'].attrs.keys())

['description']

In [214]:
# create attributes on a (sub)group
f['experiment1'].attrs['mass ratio'] = '2.5'
f['experiment2'].attrs['mass ratio'] = '1.0'
list(f['experiment1'].attrs.keys()), list(f['experiment2'].attrs.keys())

(['mass ratio'], ['mass ratio'])

In [216]:
# create attributes on a dataset
f['experiment1/simulation1/data1'].attrs['m1'] = 1.0
f['experiment1/simulation1/data1'].attrs['m2'] = 2.5
f['experiment1/simulation1/data1'].attrs['r0'] = 30.0
list(f['experiment1/simulation1/data1'].attrs.keys())

['m1', 'm2', 'r0']

In [220]:
# test for the existence of an attribute using the 'in' operator
print('mass ratio' in f['experiment1'].attrs)
print('mass ratio' in f['experiment2'].attrs)

True
True


In [221]:
# delete existing attributes
del f['experiment1'].attrs['mass ratio']
print('mass ratio' in f['experiment1'].attrs)

False
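
The cells above set and delete attributes but do not read their values back. The sketch below, under the assumption that numeric values are stored as numbers rather than strings, shows reading scalar, string, and array attributes and using `get()` with a default for a missing name. The attribute names `label` and `box size` and the file name are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file, separate from testfile.hdf5.
path = os.path.join(tempfile.mkdtemp(), "attrs_demo.hdf5")

with h5py.File(path, mode="w") as fh:
    grp = fh.create_group("experiment1")

    # Attributes can hold numbers, strings, and small NumPy arrays.
    grp.attrs["mass ratio"] = 2.5
    grp.attrs["label"] = "run A"
    grp.attrs["box size"] = np.array([1.0, 2.0, 3.0])

    # Read the values back with dictionary-like lookup.
    ratio = float(grp.attrs["mass ratio"])
    label = grp.attrs["label"]
    box = grp.attrs["box size"]

    # get() returns the supplied default when the attribute does not exist.
    fallback = grp.attrs.get("missing", "n/a")

    names = sorted(grp.attrs.keys())

print(ratio, label, box, fallback, names)
```

Storing numbers as numbers keeps them usable in arithmetic after reading, without a string conversion step.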
