# Attributes 101

**Source:** *Python and HDF5* by Andrew Collette, O'Reilly 2013.

HDF5 attributes are the main facility for storing *metadata*. In the simplest case, they are mere key/value pairs. In HDF5, however, they can be full-blown array variables, albeit without some of the conveniences of their dataset cousins (no partial I/O, no chunking or compression, etc.).

In [1]:
import numpy as np, h5py

In [2]:
f = h5py.File('attrsdemo.hdf5','w', libver="latest")

In [3]:
dset = f.create_dataset('dataset',(100,))

The `attrs` property of an HDF5 object is the gateway to its collection of HDF5 attributes.

In [4]:
dset.attrs

<Attributes of HDF5 object at 4384140736>

## Create

HDF5 attributes can be created directly from certain Python objects, or from scratch if more control over their creation is required.

### From Python Object

In [5]:
dset.attrs['title'] = "Dataset from third round of experiments"

In [6]:
dset.attrs['sample_rate'] = 100e6    # 100 MHz digitizer setting

In [7]:
dset.attrs['run_id'] = 144

We can "stuff" entire Python objects into attributes. *This may or may not be a good idea.*

In [8]:
import pickle

In [9]:
pickled_object = pickle.dumps({'key': 42}, protocol=0)

In [10]:
pickled_object

"(dp0\nS'key'\np1\nI42\ns."

In [11]:
dset.attrs['object'] = pickled_object

In [14]:
obj = pickle.loads(dset.attrs['object'])

In [15]:
obj

{'key': 42}

In [16]:
dset.attrs['object']

"(dp0\nS'key'\np1\nI42\ns."

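One alternative to pickling, offered here as an assumption rather than something the source recommends, is to serialize simple metadata as JSON text. The sketch below uses only the standard library; the `metadata` dictionary is a hypothetical stand-in, and the resulting string could be assigned to an attribute in the same way as the `title` string earlier.

```python
import json

# Hypothetical metadata record; any JSON-serializable structure would do.
metadata = {'key': 42}

# json.dumps returns plain text, so the value stays readable by tools
# that know nothing about Python or pickle.
encoded = json.dumps(metadata, sort_keys=True)

# Decoding only parses data. pickle.loads, by contrast, can run arbitrary
# code and should be reserved for trusted input.
decoded = json.loads(encoded)

print(encoded)              # {"key": 42}
print(decoded == metadata)  # True
```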
Attributes can be full-blown array variables.

In [17]:
dset.attrs['ones'] = np.ones((100, 100))

#### (Not the) Latest File Format

In the old days, there was a 64 KB size limit on attributes. Opening a file without `libver="latest"` shows what happened back then:

In [18]:
o = h5py.File('old.hdf5','w', driver="core")

In [19]:
o.attrs

<Attributes of HDF5 object at 4384260000>

In [20]:
o.attrs['ones'] = np.ones((100, 100))

RuntimeError: Unable to create attribute (Object header message is too large)

In [21]:
o.close()
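A quick size check, using NumPy only, suggests why the assignment above fails. The sketch assumes the limit is 64 KiB (65536 bytes) and ignores any header overhead; `limit` is an illustrative name, not an h5py constant.

```python
import numpy as np

ones = np.ones((100, 100))   # float64 by default: 8 bytes per element
limit = 64 * 1024            # assumed 64 KiB limit, in bytes

print(ones.dtype)            # float64
print(ones.nbytes)           # 80000
print(ones.nbytes > limit)   # True
```

The raw data alone is 80000 bytes, which already exceeds the assumed limit.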

### From Scratch

Creating an attribute from scratch gives you more control over its (element) type and shape.

In [22]:
dset.attrs.create('two_byte_int', 190, dtype='i2')

In [23]:
dset.attrs['two_byte_int']

190
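The same `create` call also accepts array-like data. The following is a minimal sketch, assuming an in-memory file (the `core` driver with `backing_store=False`) so that nothing is written to disk; the file name and the `gain` attribute are hypothetical.

```python
import numpy as np
import h5py

# In-memory file: the name is only a label because backing_store=False.
with h5py.File('scratch_demo.hdf5', 'w', driver='core', backing_store=False) as f:
    dset = f.create_dataset('dataset', (100,))

    # Store a list of Python ints as 32-bit floats by naming the dtype.
    dset.attrs.create('gain', [1, 2, 3, 4], dtype='f4')

    # Reading the attribute back returns a NumPy array of the stored type.
    gain = dset.attrs['gain']

print(gain.dtype)   # float32
print(gain.shape)   # (4,)
```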

## Read

In [24]:
[(name, val) for name, val in dset.attrs.items()]

[(u'ones', array([[ 1.,  1.,  1., ...,  1.,  1.,  1.],
         [ 1.,  1.,  1., ...,  1.,  1.,  1.],
         [ 1.,  1.,  1., ...,  1.,  1.,  1.],
         ..., 
         [ 1.,  1.,  1., ...,  1.,  1.,  1.],
         [ 1.,  1.,  1., ...,  1.,  1.,  1.],
         [ 1.,  1.,  1., ...,  1.,  1.,  1.]])),
 (u'title', 'Dataset from third round of experiments'),
 (u'two_byte_int', 190),
 (u'run_id', 144),
 (u'sample_rate', 100000000.0),
 (u'object', "(dp0\nS'key'\np1\nI42\ns.")]

In [25]:
dset.attrs.get('run_id')

144

In [26]:
print(dset.attrs.get('missing'))

None
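Because `attrs` behaves like a dictionary, membership tests and a fallback value for `get` should also work. This is a small sketch under that assumption, again using a hypothetical in-memory file.

```python
import h5py

with h5py.File('read_demo.hdf5', 'w', driver='core', backing_store=False) as f:
    f.attrs['run_id'] = 144

    # Membership test: no exception for absent names.
    has_run_id = 'run_id' in f.attrs
    has_missing = 'missing' in f.attrs

    # get() takes an optional default, like dict.get().
    fallback = f.attrs.get('missing', 'n/a')

    # keys() lists the attribute names attached to the object.
    names = sorted(f.attrs.keys())

print(has_run_id, has_missing)   # True False
print(fallback)                  # n/a
print(names)                     # ['run_id']
```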


## Update

In [27]:
dset.attrs['run_id']

144

In [28]:
dset.attrs['run_id'] = 142

In [29]:
dset.attrs['run_id']

142

In [30]:
dset.attrs.modify('two_byte_int', 40000)

In [31]:
dset.attrs['two_byte_int']

32767
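The result 32767 is consistent with the attribute keeping its two-byte integer type, with the out-of-range value 40000 clamped to the largest value that type can hold. That reading is an inference from the output above. The range itself can be checked with NumPy:

```python
import numpy as np

# Range of a two-byte signed integer (the 'i2' dtype used earlier).
info = np.iinfo(np.int16)

print(info.min, info.max)   # -32768 32767
print(40000 > info.max)     # True
```

If the new value must be stored exactly, one option is to assign it with `dset.attrs['two_byte_int'] = 40000` instead, which replaces the attribute rather than reusing its existing type.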

## Delete

Unlike HDF5 objects, attributes are **not** linked, and the underlying space in the file is freed when an attribute is deleted.

In [32]:
del dset.attrs['run_id']

In [33]:
dset.attrs['run_id']

KeyError: "Can't open attribute (Can't locate attribute in name index)"
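Code that cannot be sure an attribute still exists can catch the `KeyError` or fall back to `get`. The following is a minimal sketch, assuming a hypothetical in-memory file:

```python
import h5py

with h5py.File('delete_demo.hdf5', 'w', driver='core', backing_store=False) as f:
    f.attrs['run_id'] = 144
    del f.attrs['run_id']

    # After deletion the name is no longer present.
    still_there = 'run_id' in f.attrs

    # Indexing a deleted attribute raises KeyError, as shown above.
    try:
        f.attrs['run_id']
        raised = False
    except KeyError:
        raised = True

    # get() returns None instead of raising.
    value = f.attrs.get('run_id')

print(still_there, raised, value)   # False True None
```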

In [34]:
f.close()