# Groups 101

**Source:** *Python and HDF5* by Andrew Collette, O'Reilly 2013. (mainly chapter 5)

HDF5 groups (and links) are the main tool to organize the objects in an HDF5 file. For beginners, it's OK to think about groups as nested "folders" or "drawers" in an "HDF5 cabinet." To use them effectively you'll have to understand the limitations of that model.

In [1]:
import numpy as np, h5py

In [2]:
f = h5py.File("groups.hdf5", "w", libver="latest", driver="core")

## Group = Collection of Links

An *HDF5 link* is an explicit representation of an association between a single source (the group) and a single destination. There are different "flavors" of links, which differ in how the destination is specified.

An HDF5 group is a collection of links, **not** objects.

### Create

HDF5 groups can be created by hand or as a side effect of creating other objects.

#### As a Side-Effect

In [80]:
dset = f.create_dataset("/group/subgroup/test2", (10, 10))


When you supply a full path, HDF5 creates all the intermediate groups.
How many groups are there? (Correct answer: 3, namely the root group, `/group`, and `/group/subgroup`.)
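To check the count, here is a minimal sketch that repeats the `create_dataset` call on a throwaway in-memory file and tests which paths resolve to groups. The file name `side_effect_demo.hdf5` and the variable names are placeholders, not part of the original notebook.

```python
import h5py

# In-memory demo file (hypothetical name); nothing is written to disk.
with h5py.File("side_effect_demo.hdf5", "w", driver="core", backing_store=False) as demo:
    demo.create_dataset("/group/subgroup/test2", (10, 10))

    # Check which of the candidate paths resolve to groups rather than datasets.
    candidates = ["/", "/group", "/group/subgroup", "/group/subgroup/test2"]
    group_paths = [p for p in candidates if isinstance(demo[p], h5py.Group)]

print(group_paths)
```

The last candidate is the dataset itself, so it is filtered out and three group paths remain.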

In [81]:
f == f["/"]  # Root group

True

In [82]:
f["/"] == f["/group"]

False

In [83]:
f["/group"] == f["/group/subgroup"]

False

#### From Scratch

We can also work from scratch using the `create_group` method of the file object `f` (its class, `h5py.File`, is a subclass of the more generic `Group` class).

This means that the file itself is a **group** (the root group "/").
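A small sketch of what "the file is a group" means in practice, assuming a throwaway in-memory file (the name `root_demo.hdf5` is a placeholder):

```python
import h5py

with h5py.File("root_demo.hdf5", "w", driver="core", backing_store=False) as demo:
    file_is_group = isinstance(demo, h5py.Group)  # File derives from Group
    root_name = demo.name                         # name of the root group
    sub = demo.create_group("SubGroup-fs")        # group methods work on the file
    parent_name = sub.parent.name                 # the subgroup's parent is the root

print(file_is_group, root_name, parent_name)
```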

In [84]:
subgroup = f.create_group("SubGroup-fs")

In [85]:
subsubgroup = subgroup.create_group("AnotherGroup")

In [86]:
subsubgroup.name

u'/SubGroup-fs/AnotherGroup'

In [87]:
out = f.create_group('/some/big/path')
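As with datasets, `create_group` fills in missing intermediate groups when given a longer path. The sketch below also uses `require_group`, which returns the existing group instead of failing when the path is already there. The file name is a placeholder.

```python
import h5py

with h5py.File("create_group_demo.hdf5", "w", driver="core", backing_store=False) as demo:
    leaf = demo.create_group("/some/big/path")

    # The intermediate groups now exist as well.
    intermediates = [p for p in ("/some", "/some/big") if p in demo]

    # require_group opens the group if it exists and creates it otherwise.
    again = demo.require_group("/some/big/path")
    same_name = again.name == leaf.name

print(intermediates, same_name)
```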

### Read

The link collections stored in HDF5 groups can be accessed and traversed like Python dictionaries. 

In [88]:
len(f)

3

In [89]:
list(f.keys())

[u'group', u'SubGroup-fs', u'some']

In [91]:
[(x,y) for x, y in f.items()]

[(u'group', <HDF5 group "/group" (1 members)>),
 (u'SubGroup-fs', <HDF5 group "/SubGroup-fs" (1 members)>),
 (u'some', <HDF5 group "/some" (1 members)>)]

In [92]:
'some' in f

True

In [93]:
def printname(name):
    print(name)

We can recursively visit all objects (groups and datasets) in a file, or only those below a given group.

In [94]:
f.visit(printname)

group
group/subgroup
group/subgroup/test2
SubGroup-fs
SubGroup-fs/AnotherGroup
some
some/big
some/big/path


In [95]:
mylist = []

In [96]:
f.visit(mylist.append)

In [97]:
mylist

[u'group',
 u'group/subgroup',
 u'group/subgroup/test2',
 u'SubGroup-fs',
 u'SubGroup-fs/AnotherGroup',
 u'some',
 u'some/big',
 u'some/big/path']
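`visit` only passes names to the callback. When the object itself is needed, `visititems` passes both the name and the object. The following sketch (placeholder file and function names) classifies every member, starts a second traversal below a subgroup, and shows that a non-`None` return value stops the traversal early.

```python
import h5py

with h5py.File("visit_demo.hdf5", "w", driver="core", backing_store=False) as demo:
    demo.create_dataset("/group/subgroup/test2", (10, 10))
    demo.create_group("/some/big/path")

    kinds = {}

    def record_kind(name, obj):
        # name is the path relative to the group where the visit started
        kinds[name] = "dataset" if isinstance(obj, h5py.Dataset) else "group"
        # returning None (implicitly) keeps the traversal going

    demo.visititems(record_kind)

    # Start the traversal at a subgroup instead of the root.
    below_some = []
    demo["some"].visit(below_some.append)

    def first_dataset(name, obj):
        # any non-None return value stops the traversal and is handed back
        return name if isinstance(obj, h5py.Dataset) else None

    found = demo.visititems(first_dataset)

print(kinds)
print(below_some, found)
```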

### Working with links

What does it mean to give an object a name in the file? You might think that the name is part of the object, in the same way that the dtype or shape are part of a dataset. But this isn’t the case. There’s a layer between the group object and the objects that are its members. The two are related by the concept of links.

Links in HDF5 are handled in much the same way as in modern filesystems. Objects like datasets and groups don’t have an intrinsic name; rather, they have an address (byte offset) in the file that HDF5 has to look up. When you assign an object to a name in a group, that address is recorded in the group and associated with the name you provided to form a link.

This means that objects in an **HDF5 file can have more than one name**; in fact, they have as many names as there exist links pointing to them. The number of links that point to an object is recorded, and when no more links exist, the space used for the object is freed.

This kind of link, the default in HDF5, is called a hard link.
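To make "one object, several names" concrete, here is a minimal sketch on a throwaway in-memory file (all names are placeholders): data written through one name is visible through the other, because both links record the same object.

```python
import h5py

with h5py.File("hardlink_demo.hdf5", "w", driver="core", backing_store=False) as demo:
    demo.create_dataset("first", shape=(3,), dtype="i4")  # filled with zeros by default
    demo["second"] = demo["first"]  # a second hard link to the same dataset

    demo["second"][0] = 42                  # write through one name...
    seen_via_first = int(demo["first"][0])  # ...and read it back through the other

    # Ask for the class of the link itself rather than the object it points to.
    link_kind = demo.get("second", getclass=True, getlink=True)

print(seen_via_first, link_kind)
```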

#### Hard Links

Let us create an example with multiple links to the same group.

In [98]:
f = h5py.File('linksdemo.hdf5','w')

In [99]:
grpx = f.create_group('x')

In [100]:
grpx.name 

u'/x'

Let us create a new link pointing to the group.

In [101]:
f['y']=grpx

In [102]:
grpy=f['y']

In [103]:
grpy==grpx

True

We can also add a third name for the same group:

In [104]:
f['pio']=f['y']

In [105]:
list(f.keys())

[u'pio', u'x', u'y']

In [106]:
f.visit(printname)

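pio

Note that `visit` reports each object only once, so only one of the three names (`pio`, `x`, `y`) appears.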


In [107]:
f['pio'] == f['x']

True

#### Soft Links

Unlike “hard” links, which associate a link name with a particular object in the file, soft links instead store **the path to an object**.

Because only a path is stored, a soft link can point to a destination that does not exist yet, or that will never exist.
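A minimal sketch of such a dangling soft link, assuming a throwaway in-memory file; the names `later` and `/results/final` are made up for illustration. The link can be created and inspected before its target exists, but following it raises `KeyError` until the target is created.

```python
import h5py

with h5py.File("dangling_demo.hdf5", "w", driver="core", backing_store=False) as demo:
    demo["later"] = h5py.SoftLink("/results/final")  # target does not exist yet

    # Inspect the link itself: it only stores a path.
    stored_path = demo.get("later", getlink=True).path

    try:
        demo["later"]            # following the link fails for now
        resolved_before = True
    except KeyError:
        resolved_before = False

    demo.create_dataset("/results/final", data=[1.0, 2.0])
    shape_after = demo["later"].shape  # now the link resolves

print(stored_path, resolved_before, shape_after)
```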



In [110]:
dset = f.create_dataset("/x/d1/d2", (10, 10))

In [113]:
f.visit(printname)

pio
pio/d1
pio/d1/d2


In [122]:
f['soft'] = h5py.SoftLink('pio')

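RuntimeError: Unable to create link (Name already exists)

This error only means that the name `soft` was already in use, most likely because the cell had been run before (note the out-of-order cell numbers). On a fresh file the assignment succeeds.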

In [117]:
f['soft2'] = h5py.SoftLink('pio/d1/d2')

In [118]:
f.visit(printname)

pio
pio/d1
pio/d1/d2


In [125]:
f['soft2'] == dset

True

In [126]:
del dset

In [128]:
dset2=f.create_dataset("/x/d1/d2", (10, 10))

RuntimeError: Unable to create link (Name already exists)

In [127]:
f['soft2'] == dset

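NameError: name 'dset' is not defined

`del dset` removed only the Python variable. The dataset is still linked at `/x/d1/d2` (hence the "Name already exists" error above), while the Python name `dset` is gone (hence the `NameError`).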

Soft links are therefore a great way to refer to “the object which resides at /some/particular/path,” rather than any specific object in the file.

This can be very handy if, for example, a particular dataset represents some information that needs to be updated without breaking all the links to it elsewhere in the file.
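The difference from a hard link shows up when the object at the path is replaced. In this sketch (placeholder names, throwaway in-memory file) the soft link follows the path to the new dataset, while the hard link keeps pointing at the original object.

```python
import h5py

with h5py.File("softlink_demo.hdf5", "w", driver="core", backing_store=False) as demo:
    demo.create_dataset("/x/current", data=[1, 2, 3])
    demo["latest"] = h5py.SoftLink("/x/current")  # stores the path only
    demo["pinned"] = demo["/x/current"]           # hard link to the object itself

    # Replace whatever lives at /x/current with a new dataset.
    del demo["/x/current"]
    demo.create_dataset("/x/current", data=[10, 20])

    via_soft = demo["latest"][...].tolist()  # follows the path to the new data
    via_hard = demo["pinned"][...].tolist()  # still the original object

print(via_soft, via_hard)
```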

#### External Links

A link's destination can be an object in another HDF5 file.

In [71]:
f['external_alias'] = h5py.ExternalLink("weather.h5", "/15/temperature")

In [72]:
f['external_alias'][...]  # read the data; the older .value property was removed in h5py 3.0

array([ 0.64248708,  0.32321578,  0.62666741, ...,  0.31430819,
        0.04177788,  0.1150275 ])

In [73]:
f['anotherlink'] = h5py.ExternalLink('missing.hdf5','/')

In [74]:
for name in f:
    print(name, f.get(name, getclass=True, getlink=True))

(u'external_alias', <class 'h5py._hl.group.ExternalLink'>)
(u'bogus_group_alias', <class 'h5py._hl.group.SoftLink'>)
(u'pio', <class 'h5py._hl.group.HardLink'>)
(u'soft_alias', <class 'h5py._hl.group.SoftLink'>)
(u'x', <class 'h5py._hl.group.HardLink'>)
(u'y', <class 'h5py._hl.group.HardLink'>)
(u'anotherlink', <class 'h5py._hl.group.ExternalLink'>)
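The cells above assume that a file `weather.h5` already exists. A self-contained sketch of the same idea follows, using two temporary files whose names are placeholders; the dataset path `/15/temperature` is borrowed from the example above and filled with made-up values.

```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmpdir:
    target_path = os.path.join(tmpdir, "target_demo.hdf5")  # hypothetical file names
    source_path = os.path.join(tmpdir, "source_demo.hdf5")

    # Write the file that will be linked to, then close it.
    with h5py.File(target_path, "w") as target:
        target.create_dataset("/15/temperature", data=np.arange(5.0))

    with h5py.File(source_path, "w") as source:
        source["external_alias"] = h5py.ExternalLink(target_path, "/15/temperature")

        # The link stores a file name and a path inside that file.
        link = source.get("external_alias", getlink=True)
        linked_path = link.path

        values = source["external_alias"][...]  # read through the link

print(linked_path, values)
```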


### Delete "=" Unlink

In [75]:
del f["anotherlink"]

In [76]:
"anotherlink" in f

False
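Deleting a name removes only that link. As a sketch (placeholder names, throwaway in-memory file): when a group has two hard links and one is deleted, the group stays reachable and usable through the other.

```python
import h5py

with h5py.File("unlink_demo.hdf5", "w", driver="core", backing_store=False) as demo:
    grp = demo.create_group("x")
    demo["y"] = grp  # second hard link to the same group

    del demo["x"]    # removes the name "x", not the group
    names = list(demo.keys())

    # The group is still reachable and writable through its other name.
    demo["y"].create_group("child")
    child_visible = "child" in demo["y"]

print(names, child_visible)
```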

In [77]:
f.close()