# CF-like Convention

The "CF-like Convention" is based on the CF Metadata Convention (https://cfconventions.org/).
At its core, a so-called `standard_name` is defined for each relevant physical quantity. Each name is documented with its
canonical units and a comprehensive description in a versioned table available to all users. No credit is claimed
for the following, as everything is based on the CF Metadata Convention.

> 💡 In the scope of this package, the convention is called "cf-like" because it does not incorporate all features of the CF Metadata Convention.


## Standard Names
The `standard_name` identifies a physical quantity. To ensure interoperability, findability and
re-usability, each dataset **must** be assigned either the attribute `long_name` or `standard_name`.

long_name
    A human-readable string.
standard_name
    A string respecting more or less strict rules defined by a community and defined in a name table.

Note that the `long_name` attribute does not guarantee interoperability, but `standard_name` does, provided
the convention is known to each user.
In addition, the attribute `units` is required. As we work with scientific
data, each dataset has a physical unit, e.g. [m]. If no physical unit can be set, it might be because the
variable is dimensionless, which is information about the unit in itself, so we set `units=''`.

If the `standard_name` is provided in the dataset creation method and a standard name table (snt) is
available, then `units` is verified by that table. The table holds the base-units (canonical units) for
each standard name (check is performed on basic SI-units).


In [1]:
from h5rdmtoolbox.conventions.cflike import StandardName
StandardName(name='x_wind', canonical_units='m/s', description='x wind component')

<StandardName: x_wind [m/s] | SNT: None | desc: x wind component>
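
How a comparison on SI base units could work is sketched below in plain Python. This is a minimal illustration, not the toolbox's implementation: the names `DIMENSIONS`, `to_base_dimensions` and `units_match` are hypothetical, the parser only handles products and quotients of the few symbols listed, and it assumes that only the dimension (not the scale factor) has to agree with the canonical units.

```python
from collections import Counter

# Hypothetical lookup: unit symbol -> exponents of SI base units.
# Scale factors (km vs. m, h vs. s) are ignored; only the dimension is compared.
DIMENSIONS = {
    'm': {'m': 1},
    'km': {'m': 1},
    's': {'s': 1},
    'h': {'s': 1},
    'kg': {'kg': 1},
    'K': {'K': 1},
    'N': {'kg': 1, 'm': 1, 's': -2},
    'Pa': {'kg': 1, 'm': -1, 's': -2},
}


def to_base_dimensions(units):
    """Reduce a unit string such as 'km/h' to a dict of SI base-unit exponents."""
    if units.strip() == '':
        return {}  # dimensionless
    total = Counter()
    sign = 1
    token = ''
    for char in units + '*':  # the trailing '*' flushes the last token
        if char in '*/':
            # An unknown symbol raises a KeyError in this sketch.
            for base, exponent in DIMENSIONS[token.strip()].items():
                total[base] += sign * exponent
            sign = 1 if char == '*' else -1
            token = ''
        else:
            token += char
    return {base: exponent for base, exponent in total.items() if exponent != 0}


def units_match(units, canonical_units):
    """True if both unit strings reduce to the same SI base dimensions."""
    return to_base_dimensions(units) == to_base_dimensions(canonical_units)


print(units_match('km/h', 'm/s'))  # True
print(units_match('K', 'm/s'))     # False
```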

## Standardized Name Table

A standardized name table (snt) is, again, motivated by the CF Metadata Convention. It is a table
containing at least name, description and canonical_units. A Python class is provided to read
a table from, and write it to, a YML or XML file. Such an object
is passed to a wrapper-HDF-class to control the above described metadata of datasets.

## Usage

To activate the convention inside the `h5rdmtoolbox`, call `use('cflike')`:

In [2]:
import h5rdmtoolbox as h5tbx
from h5rdmtoolbox import use

use('cflike')

2023-03-27_23:34:41,296 INFO     [__init__.py:78] Switched to "cflike"


Whenever a dataset is written and the parameter `standard_name` is set, it is verified against the standard name convention/table associated with the wrapper class. If the constant `STRICT` is set to True (default), the name is looked up in the table and, if not found, the dataset cannot be written. To allow standard names that fulfill the spelling requirements but are not yet listed in the table, set `STRICT` to False:

In [3]:
h5tbx.conventions.cflike.standard_name.STRICT = False
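
The two-stage behaviour described above (spelling first, then a table lookup only in strict mode) can be pictured with a few lines of plain Python. This is a sketch under assumptions: `NAME_PATTERN` and `is_acceptable` are hypothetical names, and the spelling rule used here (lower-case letters, digits and underscores) is an assumed one, not necessarily the rule the toolbox applies.

```python
import re

# Assumed spelling rule: lower-case letters, digits and underscores,
# starting with a letter. The toolbox may use a different rule.
NAME_PATTERN = re.compile(r'^[a-z][a-z0-9_]*$')


def is_acceptable(name, table, strict=True):
    """Check spelling first; in strict mode the name must also be in the table."""
    if NAME_PATTERN.match(name) is None:
        return False
    if strict:
        return name in table
    return True


table = {'x_velocity': {'canonical_units': 'm/s', 'description': 'x component'}}

print(is_acceptable('x_velocity', table))                # True
print(is_acceptable('y_velocity', table, strict=True))   # False
print(is_acceptable('y_velocity', table, strict=False))  # True
print(is_acceptable('Y Velocity', table, strict=False))  # False
```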

The string representation of the HDF5 file object indicates that the cflike-convention is enabled:

Some attributes must be provided with a dataset:
- `units`
- `standard_name`
- `long_name` (not needed if `standard_name` exists and vice versa)

**units**<br>
We expect that each dataset written to the HDF5 file has a physical unit or no unit at all. The unit is stored in the attribute `units`.

**standard_name and long_name**<br>
For the sake of improved readability and interpretability, the convention suggests using `long_name` or `standard_name` as additional attributes. While `long_name` is a human-readable and interpretable attribute, `standard_name` is intended to be read by a machine (other software). This allows exploration and processing work to be automated.

### Dataset Creation

In [4]:
with h5tbx.File(title='MyFile') as h5:
    h5.create_dataset('u', data=1.2,
                      units='m/s',
                      standard_name='x_velocity')
    h5.create_dataset('v', data=[1,2,3],
                      units='m/s',
                      standard_name='y_velocity')
    h5.create_dataset('method', data="linear_interpolation",
                      long_name='The interpolation method used for something.')
    h5.dump()

### Group Creation

In addition to the standard way of creating groups, a `long_name` can be passed. It is optional, though:

In [5]:
with h5tbx.File() as h5:
    h5.create_group('mygrp')
    h5.create_group('othergrp', long_name='my other group')
    h5.sdump()

a: __h5rdmtoolbox_version__: 0.3.0a8
/mygrp
/othergrp
  a: long_name: my other group


### Other attribute definitions

During file initialization, a `title` can be passed:

In [6]:
with h5tbx.File(title='MyFile') as h5:
    h5.dump()

## Initialize a Standard Name Convention
A standardized name table is an XML document which contains (at least) a description and a canonical unit for each standardized name. We'll build one from scratch first and then have a look at already implemented ones:

Call `StandardNameTable` from the sub-package `conventions` and provide a `name`, `version_number`, `table`, `contact` and `institution`:

In [7]:
from h5rdmtoolbox.conventions.cflike import StandardNameTable

In [8]:
sc = StandardNameTable(
    name='Test_SNC',
    table={},
    version_number=1,
    contact='contact@python.com',
    institution='my_institution'
)
sc

Test_SNC (version number: 1)

We have built an empty convention (no table content). Let's add content. We can do this by creating a dictionary first...

In [9]:
tabledict = {'x_velocity': dict(canonical_units='m/s', description='velocity is a vector quantity.')}
tabledict

{'x_velocity': {'canonical_units': 'm/s',
  'description': 'velocity is a vector quantity.'}}

... and add it to the object by calling `update()`:

In [10]:
sc.update(tabledict)
sc.dump()

Unnamed: 0,canonical_units,description
x_velocity,m/s,velocity is a vector quantity.


Entries are added with `set` or changed with `modify`, depending on whether the entry already exists or not:

In [11]:
sc.set('time', canonical_units='s', description='physical time')
sc.modify('x_velocity', canonical_units='m/s', description='velocity is a vector quantity. x indicates the component in y-axis direction')
sc.set('y_velocity', canonical_units='m/s', description='velocity is a vector quantity. y indicates the component in y-axis direction')
sc.set('z_velocity', canonical_units='m/s', description='velocity is a vector quantity. z indicates the component in z-axis direction')
sc.sdump()

Test_SNC (version: 1)
+------------+-------------------+------------------------------------------------------------------------------+
|            | canonical_units   | description                                                                  |
|------------+-------------------+------------------------------------------------------------------------------|
| time       | s                 | physical time                                                                |
| x_velocity | m/s               | velocity is a vector quantity. x indicates the component in y-axis direction |
| y_velocity | m/s               | velocity is a vector quantity. y indicates the component in y-axis direction |
| z_velocity | m/s               | velocity is a vector quantity. z indicates the component in z-axis direction |
+------------+-------------------+------------------------------------------------------------------------------+


## Saving Standard Name Table

Standard name tables should be saved as XML documents or YAML files:

In [12]:
xml_filename = h5tbx.generate_temporary_filename(suffix='.xml')
sc.to_xml(xml_filename)

yml_filename = h5tbx.generate_temporary_filename(suffix='.yml')
sc.to_yaml(yml_filename)
pass
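
To give an idea of what such an XML document can look like, the sketch below serializes a table dictionary with the standard library and reads it back. The layout (one `entry` element per name, with `canonical_units` and `description` children) is loosely modelled on the CF standard name table; the file actually written by `to_xml` may differ, and `table_to_xml` and `xml_to_table` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def table_to_xml(name, version_number, table):
    """Serialize a table dict into an XML string (assumed, CF-inspired layout)."""
    root = ET.Element(name)
    ET.SubElement(root, 'version_number').text = str(version_number)
    for standard_name, entry in sorted(table.items()):
        node = ET.SubElement(root, 'entry', id=standard_name)
        ET.SubElement(node, 'canonical_units').text = entry['canonical_units']
        ET.SubElement(node, 'description').text = entry['description']
    return ET.tostring(root, encoding='unicode')


def xml_to_table(xml_text):
    """Inverse of table_to_xml: return (name, version_number, table)."""
    root = ET.fromstring(xml_text)
    table = {}
    for node in root.findall('entry'):
        table[node.get('id')] = {
            'canonical_units': node.findtext('canonical_units', default=''),
            'description': node.findtext('description', default=''),
        }
    return root.tag, int(root.findtext('version_number')), table


example = {
    'time': {'canonical_units': 's', 'description': 'physical time'},
    'x_velocity': {'canonical_units': 'm/s',
                   'description': 'velocity is a vector quantity.'},
}
xml_text = table_to_xml('Test_SNC', 1, example)
print(xml_text)
print(xml_to_table(xml_text) == ('Test_SNC', 1, example))  # True
```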

For later use from anywhere, the table can be registered with the toolbox by calling `register()`. It saves the convention as a yml file in the user directory for standard name data and uses the `versionname` to identify it:

In [13]:
sc.register(overwrite=True)
StandardNameTable.print_registered()

 > fluid-v1
 > piv-v1
 > Test-v1
 > Test_SNC-v1
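
Judging from the names listed above, a `versionname` combines the table name and the version number as `<name>-v<version_number>`. A minimal sketch of that pattern, with the hypothetical helpers `make_versionname` and `split_versionname`:

```python
def make_versionname(name, version_number):
    """Compose an identifier such as 'Test_SNC-v1' (pattern inferred from the output)."""
    return f'{name}-v{version_number}'


def split_versionname(versionname):
    """Split 'Test_SNC-v1' back into ('Test_SNC', 1) at the last '-v'."""
    name, _, version = versionname.rpartition('-v')
    return name, int(version)


print(make_versionname('Test_SNC', 1))   # Test_SNC-v1
print(split_versionname('piv-v1'))       # ('piv', 1)
```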


The registered tables can also be listed from the command line:

In [14]:
! h5tbx standard_name --list-registered

 > fluid-v1
 > piv-v1
 > Test-v1
 > Test_SNC-v1


## Load Standard Name Convention from file

If you have standard name tables at hand, just load them. They must be provided as XML or YML:

In [15]:
sc_test = StandardNameTable.from_yaml(yml_filename)
print(sc_test.versionname)
sc_test.dump()

Test_SNC-v1


Unnamed: 0,canonical_units,description
time,s,physical time
x_velocity,m/s,velocity is a vector quantity. x indicates the component in y-axis direction
y_velocity,m/s,velocity is a vector quantity. y indicates the component in y-axis direction
z_velocity,m/s,velocity is a vector quantity. z indicates the component in z-axis direction


In [16]:
sc_test['x_velocity']

<StandardName: x_velocity [m/s] | SNT: Test_SNC | desc: velocity is a vector quantity. x indicates the component in y-axis direction>

In [17]:
sc_test['x_velocity'].snt

Test_SNC (version number: 1)

Load a registered Standard Name Table from the toolbox:

In [18]:
StandardNameTable.load_registered('Test_SNC-v1').dump()

Unnamed: 0,canonical_units,description
time,s,physical time
x_velocity,m/s,velocity is a vector quantity. x indicates the component in y-axis direction
y_velocity,m/s,velocity is a vector quantity. y indicates the component in y-axis direction
z_velocity,m/s,velocity is a vector quantity. z indicates the component in z-axis direction


## Load from web
Ideally, a community has defined a naming convention, just like the CF conventions from which the concept is adopted. Let's import their XML document, here version 79:

In [19]:
cf = StandardNameTable.from_web(url='https://cfconventions.org/Data/cf-standard-names/79/src/cf-standard-name-table.xml')
cf

standard_name_table (version number: 79)

In [20]:
cf.versionname

'standard_name_table-v79'

In [21]:
cf.dump(max_rows=4)

Unnamed: 0,canonical_units,grib,amip,description
acoustic_signal_roundtrip_travel_time_in_sea_water,s,,,"The quantity with standard name acoustic_signal_roundtrip_travel_time_in_sea_water is the time taken for an acoustic signal to propagate from the emitting instrument to a reflecting surface and back again to the instrument. In the case of an instrument based on the sea floor and measuring the roundtrip time to the sea surface, the data are commonly used as a measure of ocean heat content."
aerodynamic_particle_diameter,m,,,The diameter of a spherical particle with density 1000 kg m-3 having the same aerodynamic properties as the particles in question.
...,...,...,...,...
y_wind_gust,m s-1,,,"""y"" indicates a vector component along the grid y-axis, positive with increasing y. Wind is defined as a two-dimensional (horizontal) air velocity vector, with no vertical component. (Vertical motion in the atmosphere has the standard name upward_air_velocity.) A gust is a sudden brief period of high wind speed. In an observed time series of wind speed, the gust wind speed can be indicated by a cell_methods of maximum for the time-interval. In an atmospheric model which has a parametrised calculation of gustiness, the gust wind speed may be separately diagnosed from the wind speed."
zenith_angle,degree,,,Zenith angle is the angle to the local vertical; a value of zero is directly overhead.


## Perform checks
A naming convention can be used to test whether new standard names comply with it:

In [22]:
cf.check_name('zenith_angle', strict=True)

True

In [23]:
cf['x_wind_gust'].canonical_units

'm/s'

In [24]:
try:
    cf.check_units('x_wind_gust', units='m/s')
except h5tbx.errors.StandardNameError as e:
    print(e)

In [25]:
try:
    cf.check_units('zenith_angle', units='K')
except h5tbx.errors.StandardNameError as e:
    print(e)
cf.check_units('zenith_angle', units='degree')

Unit of standard name "zenith_angle" not as expected: "K" != "degree"


True

Perform a check on a file:

In [26]:
with h5tbx.File() as h5:
    h5.create_dataset('zenith angle 1', shape=(3,), units='K', standard_name='zenith_angle')
    h5.create_dataset('zenith angle 2', shape=(3,), units='degree', standard_name='zenith_angle')

In [27]:
cf.check_file(h5.hdf_filename, raise_error=False)

2023-03-27_23:34:45,205 ERROR    [standard_name.py:830]  > ds: /zenith angle 1: Unit of standard name "zenith_angle" not as expected: "K" != "degree"
2023-03-27_23:34:45,205 ERROR    [standard_name.py:830]  > ds: /zenith angle 1: Unit of standard name "zenith_angle" not as expected: "K" != "degree"
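
Conceptually, a file check visits every dataset that carries a `standard_name` and compares its `units` with the table. The sketch below mimics this on a plain dictionary instead of an HDF5 file. `check_datasets` is a hypothetical helper, and units are compared as plain strings for brevity, whereas the toolbox performs its check on SI base units.

```python
def check_datasets(datasets, table):
    """Return one message per dataset whose units disagree with the table.

    `datasets` maps a dataset path to its attribute dict (stand-in for a file).
    """
    messages = []
    for path, attrs in datasets.items():
        name = attrs.get('standard_name')
        if name is None or name not in table:
            continue  # nothing to compare against
        expected = table[name]['canonical_units']
        found = attrs.get('units', '')
        if found != expected:
            messages.append(
                f'{path}: Unit of standard name "{name}" not as expected: '
                f'"{found}" != "{expected}"'
            )
    return messages


table = {'zenith_angle': {'canonical_units': 'degree'}}
datasets = {
    '/zenith angle 1': {'units': 'K', 'standard_name': 'zenith_angle'},
    '/zenith angle 2': {'units': 'degree', 'standard_name': 'zenith_angle'},
}
for message in check_datasets(datasets, table):
    print(message)
```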


The same check can be run from the command line. The table must be registered first:

In [28]:
cf.register(overwrite=True)

In [29]:
! h5tbx standard_name -f {h5.hdf_filename} -t {cf.versionname}

 > Checking file "C:\Users\da4323\AppData\Local\h5rdmtoolbox\h5rdmtoolbox\tmp\tmp862\tmp11.hdf" with standard name table "standard_name_table-v79"


2023-03-27_23:35:02,603 ERROR    [standard_name.py:830]  > ds: /zenith angle 1: Unit of standard name "zenith_angle" not as expected: "K" != "degree"
2023-03-27_23:35:02,603 ERROR    [standard_name.py:830]  > ds: /zenith angle 1: Unit of standard name "zenith_angle" not as expected: "K" != "degree"
