# Groups 101

**Source:** *Python and HDF5* by Andrew Collette, O'Reilly 2013.

HDF5 groups (and links) are the main tool for organizing the objects in an HDF5 file. For beginners, it's OK to think of groups as nested "folders" or "drawers" in an "HDF5 cabinet." To use them effectively, you'll have to understand the limitations of that model.

In [None]:
import numpy as np, h5py

In [None]:
f = h5py.File("groups.hdf5", "w", libver="latest", driver="core")

## Group = Collection of Links

An *HDF5 link* is an explicit representation of an association between a single source (the group) and a single destination. There are different "flavors" of links, which differ in how the destination is specified.

An HDF5 group is a collection of links, **not** objects.
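
To make the distinction between links and objects concrete, here is a minimal sketch. It assumes a throwaway in-memory file; the names `links_demo.h5`, `data`, and `shortcut` are made up for illustration. It asks the same group two questions about each member: what kind of object the link leads to, and what kind of link is stored.

```python
import h5py

# In-memory scratch file (hypothetical name); nothing is written to disk.
with h5py.File("links_demo.h5", "w", driver="core", backing_store=False) as demo:
    demo.create_dataset("data", (4,))           # stored under a hard link
    demo["shortcut"] = h5py.SoftLink("/data")   # stored as a path, not an object

    # getclass=True alone follows each link and reports the destination's class.
    object_classes = {name: demo.get(name, getclass=True) for name in demo}
    # Adding getlink=True reports the class of the link kept in the group.
    link_classes = {name: demo.get(name, getclass=True, getlink=True) for name in demo}

print(object_classes)
print(link_classes)
```

Both names lead to a dataset, but the group itself only records one hard link and one soft link.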

### Create

HDF5 groups can be created "by hand" or as a "side effect" of creating other objects.

#### As a Side Effect

In [None]:
dset = f.create_dataset("/group/subgroup/test2", (10, 10))

How many groups are there? (Correct answer: 3. They are the root group `/`, `/group`, and `/group/subgroup`; `test2` is a dataset, not a group.)
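
One way to check the count programmatically is sketched below. It assumes a fresh in-memory file (the name `count_groups.h5` is made up) rather than the notebook's `f`, and it adds the root group by hand because `visititems` only reports objects below its starting point.

```python
import h5py

with h5py.File("count_groups.h5", "w", driver="core", backing_store=False) as demo:
    demo.create_dataset("/group/subgroup/test2", (10, 10))

    group_names = ["/"]  # the root group is not reported by visititems

    def collect_groups(name, obj):
        # Names arrive relative to the starting group, without a leading slash.
        if isinstance(obj, h5py.Group):
            group_names.append("/" + name)

    demo.visititems(collect_groups)

print(len(group_names), group_names)
```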

In [None]:
f == f["/"]  # Root group

In [None]:
f["/"] == f["/group"]

In [None]:
f["/group"] == f["/group/subgroup"]

#### From Scratch

In [None]:
subgroup = f.create_group("SubGroup")

In [None]:
subsubgroup = subgroup.create_group("AnotherGroup")

In [None]:
subsubgroup.name

In [None]:
out = f.create_group('/some/big/path')
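
The call above creates any missing intermediate groups along the path. A short sketch of that behavior, using a separate in-memory file (hypothetical name `nested.h5`), also shows `require_group`, which returns the existing group instead of creating a new one:

```python
import h5py

with h5py.File("nested.h5", "w", driver="core", backing_store=False) as demo:
    leaf = demo.create_group("/some/big/path")
    leaf_name = leaf.name

    # The intermediate groups were created along the way.
    intermediates_exist = ("some" in demo, "some/big" in demo)

    # require_group opens the group if it already exists.
    same_group = demo.require_group("/some/big/path") == leaf

print(leaf_name, intermediates_exist, same_group)
```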

### Read

The link collections stored in HDF5 groups can be accessed and traversed like Python dictionaries. 

In [None]:
len(f)

In [None]:
list(f.keys())

In [None]:
list(f.items())

In [None]:
'some' in f
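
One limit of the dictionary analogy is worth seeing in code: `len`, `keys`, and iteration cover only the direct members of a group, while membership tests and lookups accept whole paths. The sketch below assumes a small in-memory file (the name `read_demo.h5` is made up).

```python
import h5py

with h5py.File("read_demo.h5", "w", driver="core", backing_store=False) as demo:
    demo.create_dataset("/group/subgroup/test2", (10, 10))

    top_level = list(demo.keys())            # direct members of the root only
    nested = list(demo["group/subgroup"])    # iterating a group yields names
    has_nested_path = "group/subgroup/test2" in demo
    fallback = demo.get("missing", "not found")  # dict-style default

print(top_level, nested, has_nested_path, fallback)
```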

In [None]:
def printname(name):
    print(name)

We can recursively visit every object (groups and datasets) in a file, or only those below a given group.

In [None]:
f.visit(printname)

In [None]:
mylist = []

In [None]:
f.visit(mylist.append)

In [None]:
mylist
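
`visit` passes only names to the callback. When the callback also needs the object, `visititems` passes both. The sketch below, again on a made-up in-memory file, uses it to stop at the first dataset, relying on the rule that any return value other than `None` ends the walk and is handed back to the caller. It also starts a second walk from a subgroup.

```python
import h5py

with h5py.File("visit_demo.h5", "w", driver="core", backing_store=False) as demo:
    demo.create_dataset("/group/subgroup/test2", (10, 10))

    def first_dataset(name, obj):
        # Returning None continues the walk; any other value stops it.
        return name if isinstance(obj, h5py.Dataset) else None

    found = demo.visititems(first_dataset)

    # Starting at a subgroup yields names relative to that subgroup.
    below_group = []
    demo["group"].visit(below_group.append)

print(found, below_group)
```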

### Update

#### Hard Links

In [None]:
f['group_alias'] = f['group']

In [None]:
list(f.keys())

In [None]:
f.visit(printname)

In [None]:
f['group_alias'] == f['group']
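
Because a hard link is just another name for the same object, removing one name does not remove the object. A minimal sketch, assuming an in-memory file and made-up names `original` and `alias`:

```python
import h5py
import numpy as np

with h5py.File("hardlink_demo.h5", "w", driver="core", backing_store=False) as demo:
    demo.create_dataset("original", data=np.arange(5))
    demo["alias"] = demo["original"]         # second hard link to the same dataset

    same_object = demo["alias"] == demo["original"]

    del demo["original"]                     # removes one link, not the data
    remaining = list(demo.keys())
    total = int(demo["alias"][()].sum())     # data still reachable via the alias

print(same_object, remaining, total)
```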

#### Soft Links

We can create links whose destinations do not exist yet, or may never exist.

In [None]:
f['soft_alias'] = h5py.SoftLink("group/subgroup/test2")

In [None]:
f['soft_alias'] == f['group/subgroup/test2']

In [None]:
f['bogus_group_alias'] = h5py.SoftLink("group/subgroup/test22")

In [None]:
f['bogus_group_alias']  # raises KeyError: the link's destination does not exist
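
A soft link stores only a path, so a dangling link starts working as soon as something appears at that path. The sketch below assumes an in-memory file; the names `pending` and `/later/data` are made up for illustration.

```python
import h5py

with h5py.File("softlink_demo.h5", "w", driver="core", backing_store=False) as demo:
    demo["pending"] = h5py.SoftLink("/later/data")   # destination does not exist yet

    try:
        demo["pending"]
        resolved_before = True
    except KeyError:
        resolved_before = False                      # dangling link cannot be opened

    demo.create_dataset("/later/data", data=[1, 2, 3])
    resolved_after = demo["pending"][()].tolist()    # the same link now resolves

    stored_path = demo.get("pending", getlink=True).path

print(resolved_before, resolved_after, stored_path)
```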

#### External Links

A link's destination can be an object in another HDF5 file.

In [None]:
f['external_alias'] = h5py.ExternalLink("weather.h5", "/15/temperature")

In [None]:
f['external_alias'][()]  # `.value` was removed in h5py 3.0; needs weather.h5 to exist

In [None]:
f['anotherlink'] = h5py.ExternalLink('missing.hdf5','/')

In [None]:
for name in f:
    print(name, f.get(name, getclass=True, getlink=True))
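
External links need real files on disk, so the following sketch builds both files in a temporary directory rather than relying on `weather.h5`. The file names `target.h5` and `source.h5` and the temperature values are made up for illustration; only the dataset path mirrors the cell above.

```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmpdir:
    target_path = os.path.join(tmpdir, "target.h5")
    source_path = os.path.join(tmpdir, "source.h5")

    # File that actually holds the data; closed before it is linked to.
    with h5py.File(target_path, "w") as target:
        target.create_dataset("/15/temperature", data=np.array([20.5, 21.0, 19.5]))

    # File that only holds a link pointing into the first one.
    with h5py.File(source_path, "w") as source:
        source["external_alias"] = h5py.ExternalLink(target_path, "/15/temperature")

        link = source.get("external_alias", getlink=True)
        link_target = (os.path.basename(link.filename), link.path)

        # Reading through the link opens the other file behind the scenes.
        values = source["external_alias"][()].tolist()

print(link_target, values)
```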

### Delete "=" Unlink

In [None]:
del f["anotherlink"]

In [None]:
"anotherlink" in f

In [None]:
f.close()

## Advanced Topics for Discussion

- Reference counting in HDF5
- Group compression