# Dataset creation

As motivated earlier, the package requires certain metadata in order to meet the FAIR principles. Creating a dataset the way it is done with plain `h5py` is therefore not possible: `units` must be passed together with either `standard_name` or `long_name`:

In [None]:
import h5rdmtoolbox as h5tbx

In [None]:
with h5tbx.H5File(standard_name_table=None) as h5:
    try:
        h5.create_dataset('x', shape=(4,))
    except h5tbx.conventions.UnitsError as e:
        print(e)
    h5.create_dataset('x', shape=(4,), units='m', long_name='a coordinate')
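
The rule demonstrated above can be summarized in a few lines of plain Python. This is a minimal sketch, not the package's implementation: the function `check_required_metadata` and the exception class `MissingMetadataError` are hypothetical names introduced only for illustration. It assumes that `units` is always required (an empty string meaning "unitless") and that at least one of `standard_name` or `long_name` must be given.

```python
class MissingMetadataError(ValueError):
    """Hypothetical error type; the package raises its own exceptions."""


def check_required_metadata(units=None, standard_name=None, long_name=None):
    """Raise if the metadata assumed to be required for a dataset is incomplete."""
    # An empty string is accepted (unitless); only a missing value is rejected.
    if units is None:
        raise MissingMetadataError('units must be provided')
    if standard_name is None and long_name is None:
        raise MissingMetadataError('either standard_name or long_name must be provided')
    return True


# Usage: the first call fails, the second one passes.
try:
    check_required_metadata()
except MissingMetadataError as e:
    print(e)
print(check_required_metadata(units='m', long_name='a coordinate'))
```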

So far we have only used a long name. What about a standard name?

In [None]:
with h5tbx.H5File(standard_name_table=None) as h5:
    h5.create_dataset('x', shape=(4,), units='m', standard_name='a coordinate')

This works without complaint because standard names are not validated yet: we did not specify a `convention` for the `H5File` object. In fact, we even passed `standard_name_table=None`.

Let's pass the already implemented fluid convention to the wrapper class (the convention is, once again, motivated by the CF conventions). We deliberately run into several errors first:

In [None]:
with h5tbx.H5File(standard_name_table=h5tbx.conventions.FluidStandardNameTable) as h5:
    try:
        h5.create_dataset('x', shape=(4,), units='m', standard_name='a coordinate')
    except h5tbx.conventions.StandardizedNameError as e:
        print(e)
    
    try:
        h5.create_dataset('x', shape=(4,), units='m', standard_name='a_coordinate')
    except h5tbx.conventions.StandardizedNameError as e:
        print(e)
    
    try:
        h5.create_dataset('x', shape=(4,), units='kg', standard_name='x_coordinate')  # note the wrong units!
    except h5tbx.conventions.StandardizedNameError as e:
        print(e)
        
    h5.create_dataset('x', shape=(4,), units='m', standard_name='x_coordinate')  # now finally correct
    h5.create_dataset('y', shape=(4,), units='km', standard_name='y_coordinate')  # only the base units are checked
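
The three failing calls and the two accepted ones suggest three separate checks: the name must be well formed, it must be listed in the table, and the units must have the right physical dimension. The sketch below mimics that behaviour in plain Python. It is an illustration under stated assumptions, not the logic of `FluidStandardNameTable`: the dictionaries `TABLE` and `UNIT_DIMENSION` and the function `check_standard_name` are hypothetical, and the naming pattern (lower case, digits, underscores) is assumed.

```python
import re

# Hypothetical, tiny stand-in for a standard name table: name -> canonical units.
TABLE = {'x_coordinate': 'm', 'y_coordinate': 'm'}

# Toy unit registry: unit -> physical dimension (assumed for this sketch only).
UNIT_DIMENSION = {'m': 'length', 'km': 'length', 'kg': 'mass'}


def check_standard_name(name, units):
    """Return None if the pair is accepted, else a message for the first problem."""
    if not re.fullmatch(r'[a-z][a-z0-9_]*', name):
        return f'"{name}" is not a well-formed name'
    if name not in TABLE:
        return f'"{name}" is not listed in the table'
    if units not in UNIT_DIMENSION:
        return f'unknown units "{units}"'
    # Only the dimension is compared, so "km" is accepted where "m" is expected.
    if UNIT_DIMENSION[units] != UNIT_DIMENSION[TABLE[name]]:
        return f'units "{units}" do not match "{TABLE[name]}" for "{name}"'
    return None


for name, units in [('a coordinate', 'm'), ('a_coordinate', 'm'),
                    ('x_coordinate', 'kg'), ('x_coordinate', 'm'),
                    ('y_coordinate', 'km')]:
    print(name, units, '->', check_standard_name(name, units) or 'ok')
```

Comparing dimensions rather than unit strings is what lets `km` pass for `y_coordinate` while `kg` is rejected for `x_coordinate`.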

### String datasets
String datasets can be created very quickly. Neither `standard_name`, `long_name` nor `units` *must* be given. Units generally make no sense for strings, but a long name and a standard name can still be passed via the method parameters.<br>
The dump method displays single strings but not lists of strings.<br>
Slicing still returns an `xarray.DataArray`, because the attributes should stay attached to the object. Use `.values` to get the raw string:

In [None]:
with h5tbx.H5File() as h5:
    h5.create_string_dataset('astr', 'hello_world')
    h5.create_string_dataset('string_list', ['hello', 'world'])
    h5.dump()
    
    print('---\n', h5['astr'][()])
    print('---\n',h5['astr'].values[()])
    
    print('---\n', h5['string_list'][:])
    print('---\n',h5['string_list'].values[:])

### Advanced dataset creation

There is more to dataset creation. You can:
- add attributes

In [None]:
with h5tbx.H5File() as h5:
    h5.create_dataset('ds', shape=(10, ), units='', attrs=dict(long_name='a long name', anothera='another attr'))  # unitless dataset. long_name is passed via parameter attrs

- make and attach scales (Note the output using `dump()`: the scale "link" is shown)

In [None]:
with h5tbx.H5File() as h5:
    h5.create_dataset('x', data=[1,2,3], units='m', standard_name='x_coordinate', make_scale=True)
    h5.create_dataset('t', data=[20.1, 18.5, 24.7], units='degC', standard_name='temperature', attach_scale=h5['x'])
    print(h5.t.x)  # note that the dimension scale can be accessed using attribute-style syntax
    h5.dump()

- add `xarray.DataArray` objects

In [None]:
import xarray as xr
import numpy as np
arr = xr.DataArray(
    dims=('y', 'x'),
    data=np.random.rand(3, 2),
    coords={
        'y': xr.DataArray(dims='y', data=[1, 2, 3],
                          attrs={'units': 'm', 'standard_name': 'y_coordinate'}),
        'x': xr.DataArray(dims='x', data=[0, 1],
                          attrs={'standard_name': 'x_coordinate'}),
    },
    attrs={'long_name': 'a long name', 'units': 'm/s'},
)

with h5tbx.H5File() as h5:
    h5.create_dataset('temperature', data=arr)
    h5.dump()

We may also create a dataset via item assignment, i.e. using `__setitem__`:

In [None]:
with h5tbx.H5File() as h5:
    h5['x'] = [1,2,3], 'm/s', {'long_name':'hallo'}
with h5tbx.H5File() as h5:
    h5['x'] = ([1,2,3], 'm/s', 'long_name', 'standard_name')
with h5tbx.H5File() as h5:
    h5['x'] = ([1,2,3], dict(units='m/s', long_name='long_name',
                             attrs={'hello': 'world'}, compression='gzip'))
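
To make the three assignment forms above easier to compare, here is one plausible way such a right-hand side could be normalized into data plus keyword arguments. This is a sketch under assumptions, not the package's actual parsing logic: the helper `normalize_setitem_value` is hypothetical, the positional order `units`, `long_name`, `standard_name` is inferred from the second example, and a trailing dictionary is assumed to hold keyword arguments.

```python
def normalize_setitem_value(value):
    """Turn the right-hand side of ``h5[name] = value`` into (data, kwargs)."""
    if not isinstance(value, tuple):
        # Plain data without any metadata.
        return value, {}
    data, *rest = value
    kwargs = {}
    # Assumption: a trailing dict carries keyword arguments.
    if rest and isinstance(rest[-1], dict):
        kwargs.update(rest.pop())
    # Assumption: remaining positional items follow this fixed order.
    for key, item in zip(('units', 'long_name', 'standard_name'), rest):
        kwargs[key] = item
    return data, kwargs


print(normalize_setitem_value(([1, 2, 3], 'm/s', {'long_name': 'hallo'})))
print(normalize_setitem_value(([1, 2, 3], 'm/s', 'long_name', 'standard_name')))
print(normalize_setitem_value(([1, 2, 3], dict(units='m/s', compression='gzip'))))
```

Under these assumptions, all three forms reduce to the same kind of call: the data as the first argument and everything else as keywords.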