# CF-like Convention

The "CF-like Convention" is based on the CF Metadata Convention (https://cfconventions.org/).
At its core, a so-called `standard_name` is defined for each relevant physical quantity. Each one is documented with its
canonical units and a comprehensive description in a versioned table available to all users. No credit is claimed
for the following, as everything is based on the CF Metadata Convention.

> 💡 In the scope of this package, the convention is called "cf-like" because it does not incorporate all features of the CF Metadata Convention.


## Standard Names
The `standard_name` is used to identify a physical quantity. To ensure interoperability, findability and
reusability, each dataset **must** be assigned either the attribute `long_name` or `standard_name`.

long_name
    A human-readable string.
standard_name
    A string respecting more or less strict rules defined by a community and defined in a name table.

Note that the `long_name` attribute does not guarantee interoperability, whereas `standard_name` does, provided
the convention is known to each user.
In addition, the attribute `units` is required. As we work with scientific
data, each dataset has a physical unit, e.g. [m]. If no physical unit can be set, it might be because the
variable is dimensionless, which is still information about the unit, so we set `units=''`.

If the `standard_name` is provided in the dataset creation method and a standard name table (snt) is
available, then `units` is verified by that table. The table holds the base-units (canonical units) for
each standard name (check is performed on basic SI-units).
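
The unit check described above can be illustrated without the toolbox. The following is only a minimal sketch, not the toolbox's implementation: the table contents, the `SI_DIMENSIONS` mapping and the `units_match` helper are hypothetical, only a handful of unit strings are supported, and it assumes that a unit passes when it reduces to the same SI base units as the canonical unit.

```python
# Hypothetical, heavily simplified stand-in for a standard name table.
TABLE = {
    "x_velocity": {"canonical_units": "m/s", "description": "x component of velocity"},
    "time": {"canonical_units": "s", "description": "physical time"},
}

# Exponents of the SI base units (metre, second, kelvin) for a few unit strings.
# A real implementation would parse arbitrary unit strings instead of using a lookup.
SI_DIMENSIONS = {
    "": (0, 0, 0),       # dimensionless
    "m": (1, 0, 0),
    "mm": (1, 0, 0),
    "s": (0, 1, 0),
    "K": (0, 0, 1),
    "m/s": (1, -1, 0),
    "mm/s": (1, -1, 0),
}


def units_match(standard_name, units):
    """Return True if `units` has the same SI base dimensions as the canonical units."""
    canonical = TABLE[standard_name]["canonical_units"]
    return SI_DIMENSIONS[units] == SI_DIMENSIONS[canonical]


print(units_match("x_velocity", "mm/s"))  # same dimensions as m/s
print(units_match("x_velocity", "K"))     # temperature is not a velocity
```

Comparing dimensions rather than strings is what lets a scaled unit such as `mm/s` pass for a quantity whose canonical unit is `m/s`.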


## Standardized Name Table

A standardized name table (snt) is likewise motivated by the CF Metadata Convention. It is a table
containing at least the name, description and canonical_units of each entry. A Python class is provided to read
a table from, and write it to, a YML or XML file. Such an object
is passed to a wrapper-HDF-class to control the dataset metadata described above.
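
To make the structure of such a table concrete, here is a small sketch that writes a table to XML and reads it back using only the standard library. It does not use the toolbox class: the functions `table_to_xml` and `table_from_xml` are hypothetical, and the element names loosely follow the layout of the CF standard name table but are assumptions.

```python
import xml.etree.ElementTree as ET


def table_to_xml(name, version_number, table):
    """Serialize a {standard_name: {canonical_units, description}} dict to an XML string."""
    root = ET.Element("standard_name_table", name=name, version_number=str(version_number))
    for standard_name, entry in table.items():
        node = ET.SubElement(root, "entry", id=standard_name)
        ET.SubElement(node, "canonical_units").text = entry["canonical_units"]
        ET.SubElement(node, "description").text = entry["description"]
    return ET.tostring(root, encoding="unicode")


def table_from_xml(xml_string):
    """Parse the XML produced by `table_to_xml` back into (name, version_number, table)."""
    root = ET.fromstring(xml_string)
    table = {}
    for node in root.findall("entry"):
        table[node.get("id")] = {
            # findtext returns "" for an empty element, so dimensionless entries survive
            "canonical_units": node.findtext("canonical_units", default=""),
            "description": node.findtext("description", default=""),
        }
    return root.get("name"), int(root.get("version_number")), table


example = {
    "x_velocity": {"canonical_units": "m/s", "description": "x component of velocity"},
    "void_fraction": {"canonical_units": "", "description": "a dimensionless quantity"},
}
xml_string = table_to_xml("Test_SNC", 1, example)
print(xml_string)
```

Keeping the name and version on the root element means a table file is self-describing, which is what makes a versioned table shareable between users.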

## Usage

To activate the convention inside the `h5rdmtoolbox`, call `use('cflike')`:

In [1]:
import h5rdmtoolbox as h5tbx
from h5rdmtoolbox import use

use('cflike')

2023-03-27_12:34:52,419 INFO     [__init__.py:78] Switched to "cflike"


Whenever a dataset is written and the parameter `standard_name` is set, the name is verified against the standard name table associated with the wrapper class. If the constant `STRICT` is set to True (default), the name is looked up in the table and, if it is not found, the dataset cannot be written. To allow standard names that fulfill the spelling requirements but are not yet listed in the table, set `STRICT` to False:

In [2]:
h5tbx.conventions.cflike.standard_name.STRICT = False
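
The difference between strict and non-strict checking can be sketched in plain Python. This is an illustration only: the helper `check_name` below is hypothetical, and the spelling rule (lowercase letters, digits and underscores, starting with a letter) is an assumption modelled on CF standard names, not necessarily the exact rule the toolbox applies.

```python
import re

# Assumed spelling rule: starts with a lowercase letter, then lowercase letters,
# digits or underscores.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")


def check_name(name, table, strict=True):
    """Validate the spelling of `name` and, if `strict`, require it to be in `table`."""
    if NAME_PATTERN.fullmatch(name) is None:
        raise ValueError(f"'{name}' violates the spelling rules")
    if strict and name not in table:
        raise KeyError(f"'{name}' is not listed in the table")
    return True


table = {"x_velocity": {"canonical_units": "m/s"}}

check_name("x_velocity", table)                # listed and well spelled
check_name("z_velocity", table, strict=False)  # well spelled, not listed: accepted
```

With `strict=True`, the second call would raise a `KeyError`, while a name such as `'X Velocity'` is rejected in both modes.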

The name of the currently enabled convention can be queried, and the string representation of the HDF5 file object indicates it as well:

In [3]:
h5tbx.get_current_convention_name()

'cflike'

In [4]:
h5tbx.use('default')
with h5tbx.File() as h5:
    print(h5)
    h5.attrs.title=[1,2,3]
    print(h5.attrs.keys())
    print(h5.attrs['title'])

2023-03-27_12:34:52,476 INFO     [__init__.py:63] Switched to "default"


<class 'h5rdmtoolbox.File' convention: default>
{'standard_name': <class 'h5rdmtoolbox.conventions.cflike.standard_name.StandardNameGroupAttribute'>, 'standard_name_table': <class 'h5rdmtoolbox.conventions.cflike.standard_name.StandardNameTableAttribute'>, 'units': <class 'h5rdmtoolbox.conventions.cflike.units.UnitsAttribute'>, 'long_name': <class 'h5rdmtoolbox.conventions.cflike.long_name.LongNameAttribute'>, 'title': <class 'h5rdmtoolbox.conventions.cflike.title.TitleAttribute'>, 'references': <class 'h5rdmtoolbox.conventions.cflike.references.ReferencesAttribute'>}
<KeysViewHDF5 ['__h5rdmtoolbox_version__']>


KeyError: "Can't open attribute (can't locate attribute: 'title')"

Some attributes must be provided with each dataset:
- `units`
- `standard_name`
- `long_name` (not needed if `standard_name` exists and vice versa)

**units**<br>
Each dataset written to the HDF5 file is expected to have either a physical unit or no unit at all. Either way, this is recorded in the attribute `units`.

**standard_name and long_name**<br>
For the sake of improved readability and interpretability, the convention suggests using `long_name` or `standard_name` as additional attributes. While `long_name` is a human-readable and interpretable attribute, `standard_name` is intended to be read by a machine (other software). This makes it possible to automate exploration and processing work.
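
The attribute requirements listed above can be expressed as a short validation routine. The function below is a hypothetical sketch that follows the list literally; it operates on a plain dictionary of attributes and is not the check performed by the toolbox.

```python
def validate_dataset_attributes(attrs):
    """Return a list of problems found in a dataset's attribute dictionary."""
    problems = []
    # `units` must always be present; an empty string marks dimensionless data.
    if "units" not in attrs:
        problems.append("missing 'units' (use '' for dimensionless data)")
    # One of the two names is sufficient.
    if "standard_name" not in attrs and "long_name" not in attrs:
        problems.append("missing 'standard_name' or 'long_name'")
    return problems


print(validate_dataset_attributes({"units": "m/s", "standard_name": "x_velocity"}))
print(validate_dataset_attributes({"long_name": "a counter"}))
```

Returning a list of problems instead of raising on the first one lets a caller report everything that is wrong with a dataset at once.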

### Dataset Creation

In [None]:
with h5tbx.File(title='MyFile') as h5:
    h5.create_dataset('u', data=1.2,
                      units='m/s',
                      standard_name='x_velocity')
    h5.create_dataset('v', data=[1,2,3],
                      units='m/s',
                      standard_name='y_velocity')
    h5.create_dataset('method', data="linear_interpolation",
                      long_name='The interpolation method used for something.')
    h5.dump()

### Group Creation

In addition to the standard way of creating groups, a `long_name` can be passed. It is optional, though:

In [None]:
with h5tbx.File() as h5:
    h5.create_group('mygrp')
    h5.create_group('othergrp', long_name='my other group')
    h5.sdump()

### Other attributes definitions

During file initialization, a `title` can be passed:

In [None]:
with h5tbx.File(title='MyFile') as h5:
    h5.dump()

## Initialize a Standard Name Convention
A standardized name table is an XML document which contains (at least) a description and a canonical unit for each standardized name. We'll build one from scratch first and then have a look at already implemented ones:

Call `StandardNameTable` from the sub-package `conventions` and provide a `name`, `version_number`, `table`, `contact` and `institution`:

In [None]:
from h5rdmtoolbox.conventions.cflike import StandardNameTable

In [None]:
sc = StandardNameTable(
    name='Test_SNC',
    table={},
    version_number=1,
    contact='contact@python.com',
    institution='my_institution'
)
sc

We have built an empty convention (no table content). Let's add content. We can do this by creating a dictionary first...

In [None]:
tabledict = {'x_velocity': dict(canonical_units='m/s', description='velocity is a vector quantity.')}
tabledict

... and add it to the object by calling `update()`:

In [None]:
sc.update(tabledict)
sc.dump()

Entries can be assigned by using `set` or `modify`, depending on whether the entry already exists or not:

In [None]:
sc.set('time', canonical_units='s', description='physical time')
sc.modify('x_velocity', canonical_units='m/s', description='velocity is a vector quantity. x indicates the component in x-axis direction')
sc.set('y_velocity', canonical_units='m/s', description='velocity is a vector quantity. y indicates the component in y-axis direction')
sc.set('z_velocity', canonical_units='m/s', description='velocity is a vector quantity. z indicates the component in z-axis direction')
sc.sdump()

## Saving Standard Name Table

Standard name tables should be saved as XML documents or YAML files:

In [None]:
xml_filename = h5tbx.generate_temporary_filename(suffix='.xml')
sc.to_xml(xml_filename)

yml_filename = h5tbx.generate_temporary_filename(suffix='.yml')
sc.to_yaml(yml_filename)
pass

For later usage from anywhere, the table can be registered with the toolbox by calling `register()`. This saves the convention as a YML file in the user directory for standard name data, using the `versionname`:

In [None]:
sc.register(overwrite=True)
StandardNameTable.print_registered()

The registered tables can also be listed from the command line:

In [None]:
! h5tbx standard_name --list-registered

## Load Standard Name Convention from file

If you already have standard name tables at hand, just load them. They must be provided as XML or YML:

In [None]:
sc_test = StandardNameTable.from_yaml(yml_filename)
print(sc_test.versionname)
sc_test.dump()

In [None]:
sc_test['x_velocity']

In [None]:
sc_test['x_velocity'].snt

Load a registered Standard Name Table from the toolbox:

In [None]:
StandardNameTable.load_registered('Test_SNC-v1').dump()

## Load from web
Ideally, a community has defined a naming convention, just like the CF conventions from which the concept is adopted. Let's import their XML document:

In [None]:
cf = StandardNameTable.from_web(url='https://cfconventions.org/Data/cf-standard-names/79/src/cf-standard-name-table.xml')
cf

In [None]:
cf.versionname

In [None]:
cf.dump(max_rows=4)

## Perform checks
A naming convention can be used to test whether new standard names comply with it:

In [None]:
cf.check_name('zenith_angle', strict=True)

In [None]:
cf['x_wind_gust'].canonical_units

In [None]:
try:
    cf.check_units('x_wind_gust', units='m/s')
except h5tbx.errors.StandardNameError as e:
    print(e)

In [None]:
try:
    cf.check_units('zenith_angle', units='K')
except h5tbx.errors.StandardNameError as e:
    print(e)
cf.check_units('zenith_angle', units='degree')

A check can also be performed on a whole file:

In [None]:
with h5tbx.File() as h5:
    h5.create_dataset('zenith angle 1', shape=(3,), units='K', standard_name='zenith_angle')
    h5.create_dataset('zenith angle 2', shape=(3,), units='degree', standard_name='zenith_angle')

In [None]:
cf.check_file(h5.hdf_filename, raise_error=False)

The same check is available from the command line once the table has been registered:

In [None]:
cf.register(overwrite=True)

In [None]:
! h5tbx standard_name -f {h5.hdf_filename} -t {cf.versionname}