# Snippet for HDF5

#### Kornpob Bhirombhakdi: kbhirombhakdi [at] stsci [dot] edu

This snippet shows how to work with HDF5 files using the h5py package.

In [1]:
import h5py
import numpy as np

## Basics

HDF5 stands for <font color='orange'>Hierarchical Data Format</font> (version 5). An HDF5 file typically has a <font color='orange'>.h5 or .hdf5</font> extension.

Information in an HDF5 file is organized into <font color='orange'>Groups, Datasets, and Attributes.</font> A Group is a folder-like container that holds Datasets and other Groups. A Dataset is an array of data, which h5py reads and writes as a numpy array. Attributes are metadata attached to the file itself or to any Group or Dataset. Schematically, an HDF5 file is therefore structured as: <font color='orange'>(file.h5: Attributes) $\rightarrow$ many {(Group: Attributes), (Dataset: Attributes)}</font> $\rightarrow$ many {(Group: Attributes), (Dataset: Attributes)} $\rightarrow$ and so on.
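The hierarchy described above can be listed programmatically. The following is a minimal sketch, assuming a throwaway file `demo.h5` in a temporary directory; the helper name `describe` and the attribute key `note` are made up for illustration. It relies on h5py's `visititems`, which calls a function for every Group and Dataset below the root.

```python
import os
import tempfile

import h5py
import numpy as np


def describe(path):
    """Return one line per object: full name, kind, and attribute keys."""
    lines = []

    def visit(name, obj):
        # obj is either an h5py.Group or an h5py.Dataset
        kind = "Dataset" if isinstance(obj, h5py.Dataset) else "Group"
        lines.append(f"{name} ({kind}: {sorted(obj.attrs.keys())})")

    with h5py.File(path, "r") as f:
        lines.append(f"/ (File: {sorted(f.attrs.keys())})")
        f.visititems(visit)  # walks every Group and Dataset recursively
    return lines


# Build a tiny throwaway file (hypothetical names) so the sketch is self-contained.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(demo_path, "w") as f:
    f.attrs["note"] = "file-level metadata"
    grp = f.create_group("group1")
    grp.attrs["note"] = "group-level metadata"
    grp.create_dataset("dataset2", data=np.zeros((5, 2)))

for line in describe(demo_path):
    print(line)
```

Each printed line follows the (object: Attributes) pattern of the schematic, with nested objects shown by their full path such as `group1/dataset2`.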

<font color='orange'>HDFView</font> is a free GUI application for browsing and editing HDF5 files.

## Write an HDF5 file

In [2]:
!rm -f './test.h5' # remove any existing file (-f: no error if the file is missing)

x = h5py.File('./test.h5', 'w') # create an HDF5 file from scratch ('w' overwrites an existing file)

x.attrs['attrs1'] = '123' # create the file's attributes

x.create_group('group1') # create a group
x['group1'].attrs['attrs2'] = '456' # create an attribute of a group

x.create_dataset('dataset1',
                 data=np.random.random_sample(size=(10, 5)),
                 dtype='float64') # create a dataset at root
x['dataset1'].attrs['attrs3'] = '789' # create the dataset's attributes

x['group1'].create_dataset('dataset2',
                           data=np.random.random_sample(size=(5, 2)),
                           dtype='float64') # create a dataset in a group

x.close() # close the file
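As an alternative to calling `close()` explicitly, h5py files can be opened in a `with` block, which closes the file even if an error occurs partway through. The sketch below writes the same structure as the cell above; the file name `test_with.h5` and the temporary directory are assumptions made so the example does not touch `./test.h5`.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical output location, chosen to leave ./test.h5 untouched.
path = os.path.join(tempfile.mkdtemp(), "test_with.h5")

with h5py.File(path, "w") as f:
    f.attrs["attrs1"] = "123"  # file attribute

    grp = f.create_group("group1")  # create_group returns the new Group
    grp.attrs["attrs2"] = "456"

    dset = f.create_dataset("dataset1",
                            data=np.random.random_sample(size=(10, 5)),
                            dtype="float64")  # create_dataset returns the new Dataset
    dset.attrs["attrs3"] = "789"

    grp.create_dataset("dataset2",
                       data=np.random.random_sample(size=(5, 2)),
                       dtype="float64")
# The file is closed here, on leaving the with block.
```

Keeping the returned `grp` and `dset` objects avoids looking the same objects up again by name.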

## Read an HDF5 file

In [3]:
x = h5py.File('./test.h5', 'r') # open an HDF5 file in read-only mode

print("File's attributes: ", x.attrs.keys()) # show the keys of the file's attributes
print(x.attrs['attrs1']) # show the value of a file attribute given its key
print('#################################')

print("File's objects: ", x.keys()) # show the file's objects (i.e., Groups and Datasets)
print('#################################')

print(x['dataset1'].attrs.keys()) # accessing a dataset's attributes
print(x['dataset1'][:]) # accessing a dataset's data
print('#################################')

print(x['group1'].attrs.keys()) # accessing a group's attributes
print(x['group1'].keys()) # accessing a group's objects

x.close() # close the file

File's attributes:  <KeysViewHDF5 ['attrs1']>
123
#################################
File's objects:  <KeysViewHDF5 ['dataset1', 'group1']>
#################################
<KeysViewHDF5 ['attrs3']>
[[0.21650683 0.16904489 0.45770083 0.92371047 0.52116551]
 [0.63930414 0.99667139 0.11047377 0.12729058 0.73707412]
 [0.70042789 0.42432268 0.96307696 0.58687771 0.36471726]
 [0.66515653 0.33969977 0.02072865 0.80874734 0.09299391]
 [0.8846647  0.85431128 0.30457131 0.90445188 0.78008685]
 [0.41711827 0.74956029 0.38429253 0.19425888 0.66459913]
 [0.55017959 0.21366811 0.38940415 0.69122118 0.3319481 ]
 [0.57539718 0.23914961 0.56327444 0.514589   0.09942183]
 [0.73311367 0.69936805 0.23025088 0.10757085 0.97525942]
 [0.4115406  0.76299992 0.66351388 0.70511008 0.35093854]]
#################################
<KeysViewHDF5 ['attrs2']>
<KeysViewHDF5 ['dataset2']>
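The cell above loads a whole dataset with `[:]`. A dataset can also be inspected and read in part using numpy-style slicing, which is useful when the array is large. The following is a minimal sketch, assuming a throwaway file `slice_demo.h5` filled with known values (both the file name and the values are made up for illustration).

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical demo file with predictable contents: 0, 1, ..., 49 in a 10x5 array.
path = os.path.join(tempfile.mkdtemp(), "slice_demo.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("dataset1",
                     data=np.arange(50, dtype="float64").reshape(10, 5))

with h5py.File(path, "r") as f:
    dset = f["dataset1"]
    shape, dtype = dset.shape, dset.dtype  # metadata only, no array data is read
    first_rows = dset[0:2, :]              # reads rows 0 and 1 into a numpy array
    column = dset[:, 3]                    # reads the fourth column only

# The slices are ordinary numpy arrays, so they remain usable after the file is closed.
print(shape, dtype)
print(first_rows)
print(column)
```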
