# Layouts

In this package, a layout describes the required tree structure and meta information of an HDF5 file. What can be defined via a layout, and how, is outlined in [a later section](#Write-an-individual-layout).

A layout is essentially an HDF5 file itself and is used in combination with another HDF5 file, e.g. one that contains measurement data. Specialized "wrapper-HDF files" in this package, such as `H5File` or `H5PIV`, already come with a pre-defined layout associated with them.

The idea behind defining layouts is that the data generation process is supervised: the user filling an HDF5 file with data can perform a *check* against the provided layout file. This avoids surprises and inconsistencies when exchanging the file with others, because everybody involved has agreed on the respective layout. Ideally, this saves a lot of time and costs.

There are two ways to work with layouts:
1. Via raw Python
2. Via the shell

Using the shell is recommended for quick checks and to get an overview. Let's first do that and find out which layouts are already provided by the package:

In [1]:
! h5tbx layout --list-registered

@ C:\Users\da4323\AppData\Local\h5rdmtoolbox\h5rdmtoolbox\layouts:
 > H5File
 > H5Flow
 > H5PIV


If you created a layout or got a layout file from another person and want to register it in the package so you can access and use it from everywhere on your machine, register it via:
```bat
h5tbx layout -r <hdf_filename.hdf>
```
Find out more about what the command line interface can do by printing out the help:
```bat
h5tbx layout -h
```

Now let's head back to raw Python and discuss how to generate a layout and apply checks:

In [2]:
import h5rdmtoolbox as h5tbx

If an `H5File` (or a subclass of it) is already associated with a layout, we can access it via the `layout` property (we saw above that a layout for `H5File` is indeed already registered):

In [3]:
with h5tbx.H5File() as h5:
    lay = h5.layout
print(lay)

<H5FileLayout with 0 issues>


The representation states how many issues the layout has recorded (none in the output above). To list the issues in detail, call `report()`. Once a check has been run, a missing root attribute such as `title` would show up here:

In [4]:
lay.report()

H5FileLayout issue report (0 issues)
-------------------


As said in the introduction, a layout is an HDF5 file itself, so it has features similar to the wrapper files; for example, we can get a nice HTML representation of its content. This way we can see what any file written with `H5File` is required to contain.

Doing so, we directly see that the layout **requires two attributes**, namely `__h5rdmtoolbox_version__` and `title`:

In [5]:
lay

## Check layout
Let's create an empty HDF5 file with `h5py`, so it contains no data. If we then open it with an `H5File` wrapper, content defined in the layout file is missing, so we expect the check to report issues (in the run recorded below, the missing root attribute `title`):

In [6]:
# create an HDF5 file with the package h5py:
import h5py
filename = h5tbx.generate_temporary_filename()
with h5py.File(filename, 'w'):
    pass

# open with h5tbx and perform a check:
with h5tbx.H5File(filename, layout='H5File') as h5:
    print('run check')
    h5.check()
    
    print('\nrun check again - different call')
    h5.layout.check(h5['/'])
    
print('\nrepr of layout:')
print(h5.layout)

2022-10-21_17:39:24,538 ERROR    [layout.py:94] Attribute title missing in group /
2022-10-21_17:39:24,538 ERROR    [layout.py:94] Attribute title missing in group /
2022-10-21_17:39:24,538 ERROR    [layout.py:94] Attribute title missing in group /
2022-10-21_17:39:24,538 ERROR    [layout.py:94] Attribute title missing in group /


run check

run check again - different call

repr of layout:
<H5FileLayout with 1 issues>


The command line equivalent would be:

In [7]:
! h5tbx layout -l H5File -c "{filename}"

Checking C:\Users\da4323\AppData\Local\h5rdmtoolbox\h5rdmtoolbox\tmp\tmp155\tmp2 with layout H5File


2022-10-21_17:39:26,412 ERROR    [layout.py:94] Attribute title missing in group /
2022-10-21_17:39:26,412 ERROR    [layout.py:94] Attribute title missing in group /


## Write an individual layout

The class `H5Layout` can be found in the sub-package `conventions`. To add groups, datasets and attributes to an `H5Layout` object, we need to initialize it first and then work with its `File` property.

Let's first create a reference HDF5 file to which the layout shall be applied. Initially it is empty except for the `title` attribute:

In [8]:
with h5tbx.H5File() as h5:
    h5.attrs['title'] = 'A test file'
    hdf_filename = h5.hdf_filename

Initialize the `H5Layout` object:

In [9]:
hdf_layout_filename = h5tbx.generate_temporary_filename(suffix='.hdf')
mylayout = h5tbx.conventions.H5Layout(hdf_layout_filename)
mylayout

### Attributes
Many parts of the HDF5 tree structure can be defined via the layout. Most obviously, we can define attributes. There are two kinds of definition:
1. Defining the exact value
2. Defining the name with any value

Let's require the user to set the attribute `title` but leave its value open (using the special value string `__any`), and require the root attribute `type` to have the value `testdata`.

Attribute values starting with `__` are not compared during the check. Thus, instead of `__any`, we can use any other string with that prefix, e.g. `__a_title`:

In [10]:
with mylayout.File(mode='w') as h5:
    h5.attrs['title'] = '__any'
    h5.attrs['type'] = 'testdata'
mylayout.check_file(hdf_filename)
mylayout.report()

2022-10-21_17:39:26,730 ERROR    [layout.py:94] Attribute type missing in group /
2022-10-21_17:39:26,730 ERROR    [layout.py:94] Attribute type missing in group /


H5FileLayout issue report (1 issues)
-------------------
/.type: -> missing
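
The two kinds of attribute definition can be sketched in plain Python. This is an illustration under the rule stated above (a required value starting with `__` matches any actual value), not the package's code; `check_attributes` is a hypothetical helper and plain dictionaries stand in for HDF5 attributes.

```python
def check_attributes(required, actual):
    """Compare required attributes (from a layout) with actual ones (from a file).

    Hypothetical helper, not the package's code. Assumed rule: a required
    value starting with "__" matches any actual value.
    """
    issues = []
    for name, required_value in required.items():
        if name not in actual:
            issues.append(f'{name}: missing')
        elif isinstance(required_value, str) and required_value.startswith('__'):
            continue  # placeholder value: only the presence of the name matters
        elif actual[name] != required_value:
            issues.append(f'{name}: wrong')
    return issues


layout_attrs = {'title': '__any', 'type': 'testdata'}

# Same situation as in the cell above: "title" is set, "type" is not.
missing_type = check_attributes(layout_attrs, {'title': 'A test file'})
# Any title is accepted, but "type" must be exactly "testdata".
wrong_type = check_attributes(layout_attrs, {'title': 'Other', 'type': 'rawdata'})
all_good = check_attributes(layout_attrs, {'title': 'Other', 'type': 'testdata'})

print(missing_type)  # ['type: missing']
print(wrong_type)    # ['type: wrong']
print(all_good)      # []
```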


### Datasets

It generally makes no sense to define the data content of a dataset. However, it can be reasonable to define the expected shape or number of dimensions of the array and, once again, its attributes. To do so, we create the expected dataset at the expected group level and give it any shape, preferably the smallest possible one to keep the layout file small. The actual shape of the dataset in the layout file does not matter, because shape and dimension checks are based on specific attributes assigned to the dataset.

Special attribute names start and end with `__`. The ones available for datasets in layouts are:
- `__shape__`: Specify the exact shape: tuple of int
- `__ndim__`: Specify the exact number of dimensions: int
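
How such special attributes could drive a check is sketched below in plain Python. The helper `check_array_layout` is hypothetical, and the attribute names `__shape__` and `__ndim__` are taken from the example cell that follows; the package's actual logic may differ. Note that the shape of the placeholder dataset in the layout file is never looked at.

```python
def check_array_layout(special_attrs, actual_shape):
    """Check an array shape against the special layout attributes.

    Hypothetical helper for illustration; assumes `__shape__` demands an exact
    shape and `__ndim__` an exact number of dimensions.
    """
    issues = []
    if '__shape__' in special_attrs:
        expected_shape = tuple(special_attrs['__shape__'])
        if tuple(actual_shape) != expected_shape:
            issues.append(f'wrong shape: {tuple(actual_shape)} != {expected_shape}')
    if '__ndim__' in special_attrs:
        expected_ndim = int(special_attrs['__ndim__'])
        if len(actual_shape) != expected_ndim:
            issues.append(f'wrong ndim: {len(actual_shape)} != {expected_ndim}')
    return issues


ok_1d = check_array_layout({'__ndim__': 1}, (3,))
bad_shape = check_array_layout({'__shape__': (20, 30)}, (20, 32))
bad_ndim = check_array_layout({'__ndim__': 2}, (3,))

print(ok_1d)      # []
print(bad_shape)  # ['wrong shape: (20, 32) != (20, 30)']
print(bad_ndim)   # ['wrong ndim: 1 != 2']
```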

In [11]:
with mylayout.File(mode='r+') as h5:
    # expect a 1D array dataset called "x":
    dsx = h5.create_dataset('x', shape=(1,))
    dsx.attrs['__ndim__'] = 1
    dsx.attrs['standard_name.alt:long_name'] = '__any'
    dsx.attrs['units'] = 'm' 
    
    # expect a 2D array dataset called "data" with shape (20, 30):
    dsdata = h5.create_dataset('data', shape=(1,))
    dsdata.attrs['__shape__'] = (20, 30)
    dsdata.attrs['standard_name.alt:long_name'] = '__any'
    dsdata.attrs['units'] = 'kg' 

Let's update the reference file:

In [12]:
with h5py.File(hdf_filename, mode='r+') as h5:
    h5.attrs['type'] = 'testdata'
    dsx = h5.create_dataset('x', data=[1,2,3])
    dsx.attrs['long_name'] = 'x coordinate'
    
    dsdata = h5.create_dataset('data', shape=(20, 32))
    dsdata.attrs['units'] = 'm'

The layout check reveals that some meta data is missing and some is wrong:

In [13]:
mylayout.check_file(hdf_filename)
mylayout.report()

2022-10-21_17:39:26,767 ERROR    [layout.py:31] Wrong shape of dataset /data: (20, 30) != (20, 32)
2022-10-21_17:39:26,767 ERROR    [layout.py:31] Wrong shape of dataset /data: (20, 30) != (20, 32)
2022-10-21_17:39:26,770 ERROR    [layout.py:77] Neither of the attribute standard_name, long_name exist in /data
2022-10-21_17:39:26,770 ERROR    [layout.py:77] Neither of the attribute standard_name, long_name exist in /data
2022-10-21_17:39:26,770 ERROR    [layout.py:68] Base-units check failed for /data: kg != m
2022-10-21_17:39:26,770 ERROR    [layout.py:68] Base-units check failed for /data: kg != m
2022-10-21_17:39:26,774 ERROR    [layout.py:71] Attribute units missing in /x
2022-10-21_17:39:26,774 ERROR    [layout.py:71] Attribute units missing in /x


H5FileLayout issue report (4 issues)
-------------------
/data: -> wrong shape: (20, 32) != (20, 30)
/data.standard_name or long_name: -> missing
/data.units: -> wrong
/x.units: -> missing
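
Judging from the report, a layout key such as `standard_name.alt:long_name` appears to mean that either `standard_name` or `long_name` must exist. The sketch below shows one way to interpret such a key. Both helper names are hypothetical, and the `.alt:` semantics are an assumption inferred from the output above, not taken from the package source.

```python
def split_alternatives(layout_key, marker='.alt:'):
    """Split a layout attribute key into the list of accepted attribute names.

    Assumption inferred from the report above: "a.alt:b" means that either
    attribute "a" or attribute "b" has to exist.
    """
    return layout_key.split(marker)


def has_any(layout_key, actual_attrs):
    """Return True if at least one of the accepted names is present."""
    return any(name in actual_attrs for name in split_alternatives(layout_key))


key = 'standard_name.alt:long_name'
names = split_alternatives(key)
# Dataset "x" in the reference file carries a long_name, "data" only has units.
x_ok = has_any(key, {'long_name': 'x coordinate'})
data_ok = has_any(key, {'units': 'm'})

print(names)    # ['standard_name', 'long_name']
print(x_ok)     # True
print(data_ok)  # False
```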


### Groups

Defining groups is straightforward and similar to defining datasets, just without the option to specify the shape, number of dimensions, and the like.
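
As a rough illustration of what a group check amounts to, the following sketch walks a layout file with `h5py` and collects every group that the target file lacks. The helpers `missing_groups` and `open_in_memory` are hypothetical and not part of the package; the files are kept in memory for the sake of the example.

```python
import h5py


def missing_groups(layout_file, target_file):
    """Return the names of groups defined in the layout but absent in the target.

    Hypothetical helper for illustration only; not part of h5rdmtoolbox.
    """
    missing = []

    def visit(name, obj):
        # visititems passes the path relative to the root, e.g. "results/run1".
        if isinstance(obj, h5py.Group):
            if not isinstance(target_file.get(name), h5py.Group):
                missing.append('/' + name)

    layout_file.visititems(visit)
    return sorted(missing)


def open_in_memory(name):
    # Keep everything in memory; nothing is written to disk.
    return h5py.File(name, 'w', driver='core', backing_store=False)


with open_in_memory('group_layout.hdf') as layout, open_in_memory('group_data.hdf') as data:
    layout.create_group('results/run1')  # also creates the parent "results"
    data.create_group('results')         # the target only has the parent group
    result = missing_groups(layout, data)

print(result)  # ['/results/run1']
```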