# Decorators

`earthkit-data` provides a number of decorators that are used in downstream earthkit packages to
make the user interface simpler for non-technical users.

This notebook demonstrates these decorators and their effect when applied to functions.

In [1]:
# Import earthkit-data and set the cache policy
from earthkit import data as ekd
ekd.settings.set("cache-policy", "user")

# Import the handlers
from earthkit.data.utils.inputs_transform import format_handler, metadata_handler

# Import some common libraries used in the type-hint examples
import xarray as xr
import numpy as np
import pandas as pd

First, create some earthkit data objects to use in the examples. The following are various `Reader` objects created from remote test data files:

In [2]:
from earthkit.data.testing import earthkit_remote_test_data_file
ekds_grib = ekd.from_source("url", earthkit_remote_test_data_file("era5_temperature_europe_20150101.grib"))
ekds_geojson = ekd.from_source("url", earthkit_remote_test_data_file("NUTS_RG_60M_2021_4326_LEVL_0.geojson"))
ekds_netcdf = ekd.from_source("url", earthkit_remote_test_data_file("test_single.nc"))
ekds_netcdf_satellite = ekd.from_source("url", earthkit_remote_test_data_file("CO2_iasi_metop_c_nlis_2021_01.nc"))

reader_objects = [
    ekds_grib,
    ekds_geojson,
    ekds_netcdf,
    ekds_netcdf_satellite
]



Next, create some common data objects from the `Reader` objects to use in the examples:

In [3]:
fieldlist_example = ekds_grib.to_fieldlist()
xarray_ds_example = ekds_netcdf.to_xarray()
xarray_da_example = xarray_ds_example[list(xarray_ds_example.data_vars)[0]]
pandas_df_example = ekds_netcdf_satellite.to_pandas()
pandas_series_example = pandas_df_example.iloc[:, 0]
numpy_example = ekds_grib.to_numpy()
geopandas_example = ekds_geojson.to_geopandas()

data_objects = [
    fieldlist_example,
    xarray_ds_example,
    xarray_da_example,
    pandas_df_example,
    pandas_series_example,
    numpy_example,
    geopandas_example
]



## The `format_handler` decorator

The `format_handler` decorator automatically converts the input data to the format a function requires.
By default it uses the type hints in the function signature: the example below shows a range of input
types being converted to the type declared for `data` in the signature, an `xarray.DataArray`.


In [4]:
@format_handler()
def xrDataArray_function(data: xr.DataArray):
    assert isinstance(data, xr.DataArray), f"data is not an xarray.DataArray: got {type(data)}"
    return data

test_types = [
    ekds_grib,
    ekds_netcdf,
    ekds_netcdf_satellite,
    fieldlist_example,
    xarray_da_example,
    xarray_ds_example,
    pandas_series_example,
    numpy_example
]

print(f"{'Input type':<30} -> {'Output type':<30}")
for input in test_types:
    output = xrDataArray_function(input)
    print(f"{type(input)} -> {type(output)}")

Input type                     -> Output type                   
<class 'earthkit.data.readers.grib.file.GRIBReader'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'earthkit.data.readers.netcdf.NetCDFFieldListReader'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'earthkit.data.readers.netcdf.NetCDFFieldListReader'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'earthkit.data.indexing.fieldlist.SimpleFieldList'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'xarray.core.dataarray.DataArray'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'xarray.core.dataset.Dataset'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'pandas.Series'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'numpy.ndarray'> -> <class 'xarray.core.dataarray.DataArray'>
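To make the mechanism concrete, here is a minimal, stdlib-only sketch of how a decorator *could* read the annotation of the first parameter and convert the argument before the call. This is not the `format_handler` implementation: the `CONVERTERS` registry and the `convert_to_hint` name are hypothetical, and plain built-in types stand in for the earthkit and xarray types.

```python
import functools
import inspect
import typing

# Hypothetical registry mapping a target type to a converter function.
# Plain built-in types stand in for earthkit / xarray types here.
CONVERTERS = {
    float: float,
    str: str,
    list: list,
}


def convert_to_hint(func):
    """Convert the first argument to the type in its annotation (sketch)."""
    first = next(iter(inspect.signature(func).parameters))
    target = typing.get_type_hints(func).get(first)

    @functools.wraps(func)
    def wrapper(data, *args, **kwargs):
        # Only convert when the argument is not already of the annotated type.
        if target is not None and not isinstance(data, target):
            data = CONVERTERS[target](data)  # KeyError if no converter is registered
        return func(data, *args, **kwargs)

    return wrapper


@convert_to_hint
def double(data: float):
    assert isinstance(data, float), f"data is not a float: got {type(data)}"
    return data * 2


print(double("1.5"))  # the str is converted to a float first, prints 3.0
print(double(2))      # so is the int, prints 4.0
```

Looking the target type up from the signature means the function author only writes an ordinary type hint; the conversion logic lives entirely in the decorator.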




The example above only uses input objects that can be successfully converted to the requested type, an `xarray.DataArray`. When the conversion is not possible, the decorator passes the unconverted object to the function and lets the function fail naturally. Below we pass a `pandas.DataFrame` object, which cannot be converted to an `xarray.DataArray` because it has too many columns (i.e. variables):

In [5]:
xrDataArray_function(pandas_df_example)

AssertionError: data is not an xarray.DataArray: got <class 'pandas.DataFrame'>
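The pass-through behaviour can be sketched by wrapping the conversion in a `try`/`except` and falling back to the original object. This is again a hypothetical, stdlib-only illustration (`convert_or_pass` is not an earthkit function): `int` plays the role of the requested type, its constructor acts as the converter, and a non-numeric string plays the role of the un-convertible `pandas.DataFrame`.

```python
import functools
import inspect
import typing


def convert_or_pass(func):
    """Try to convert the first argument to its annotated type (sketch).

    If the conversion raises, the original object is passed through
    unchanged and the decorated function decides what to do with it.
    """
    first = next(iter(inspect.signature(func).parameters))
    target = typing.get_type_hints(func)[first]

    @functools.wraps(func)
    def wrapper(data, *args, **kwargs):
        if not isinstance(data, target):
            try:
                data = target(data)  # the type's constructor acts as the converter
            except (TypeError, ValueError):
                pass  # conversion failed: keep the original object
        return func(data, *args, **kwargs)

    return wrapper


@convert_or_pass
def as_int(data: int):
    assert isinstance(data, int), f"data is not an int: got {type(data)}"
    return data


print(as_int("42"))  # converted, prints 42

try:
    as_int("forty-two")  # int("forty-two") fails, so the str reaches the function
except AssertionError as err:
    print(f"AssertionError: {err}")
```

Letting the function fail on its own keeps the error message under the function author's control, as with the `AssertionError` shown above.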

If the downstream function can handle multiple data types, the decorator first checks whether the input already matches any of the listed types; if not, it attempts to convert the input to each listed type in order. In the example below, the function accepts either an `xarray.DataArray` or an `xarray.Dataset`. The objects from the previous example are again converted to `xarray.DataArray`, except the `xarray.Dataset`, which already matches a listed type and is passed through unchanged. The new objects (which cannot be converted to an `xarray.DataArray`) are converted to an `xarray.Dataset`.

In [None]:
@format_handler()
def xrDataArray_function(data: xr.DataArray | xr.Dataset):
    assert isinstance(data, (xr.DataArray, xr.Dataset)), f"data is not an xarray.DataArray or xarray.Dataset: got {type(data)}"
    return data

test_types = [
    ekds_grib,
    ekds_netcdf,
    ekds_netcdf_satellite,
    fieldlist_example,
    xarray_da_example,
    xarray_ds_example,
    pandas_series_example,
    numpy_example,
    pandas_df_example,
    geopandas_example,
    ekds_geojson,
]

print(f"{'Input type':<30} -> {'Output type':<30}")
for input in test_types:
    output = xrDataArray_function(input)
    print(f"{type(input)} -> {type(output)}")

Input type                     -> Output type                   
<class 'earthkit.data.readers.grib.file.GRIBReader'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'earthkit.data.readers.netcdf.NetCDFFieldListReader'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'earthkit.data.readers.netcdf.NetCDFFieldListReader'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'earthkit.data.indexing.fieldlist.SimpleFieldList'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'xarray.core.dataarray.DataArray'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'xarray.core.dataset.Dataset'> -> <class 'xarray.core.dataset.Dataset'>
<class 'pandas.Series'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'numpy.ndarray'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'pandas.DataFrame'> -> <class 'xarray.core.dataset.Dataset'>
<class 'geopandas.geodataframe.GeoDataFrame'> -> <class 'xarray.core.dataset.Dataset'>
<class 'earthkit.data.readers.geojson.GeojsonReader'> -> <clas
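The union case can be sketched in the same style. `typing.get_args` unpacks `typing.Union[X, Y]` (and the `X | Y` syntax used above, on Python 3.10+) into its member types; the object is returned untouched if it already matches one of them, and otherwise each type is tried in the order listed. As before, `convert_to_union` is a hypothetical name and the built-in type constructors stand in for earthkit's real converters.

```python
import functools
import inspect
import typing


def convert_to_union(func):
    """Convert the first argument to the first listed type that works (sketch)."""
    first = next(iter(inspect.signature(func).parameters))
    hint = typing.get_type_hints(func)[first]
    # get_args(Union[int, tuple]) -> (int, tuple); a plain class gives (), so fall back
    targets = typing.get_args(hint) or (hint,)

    @functools.wraps(func)
    def wrapper(data, *args, **kwargs):
        # 1. Already one of the accepted types: use it as-is.
        if not isinstance(data, targets):
            # 2. Otherwise try each type in the order it appears in the hint.
            for target in targets:
                try:
                    data = target(data)
                    break
                except (TypeError, ValueError):
                    continue
            # 3. If nothing worked, `data` is passed through unchanged.
        return func(data, *args, **kwargs)

    return wrapper


@convert_to_union
def int_or_tuple(data: typing.Union[int, tuple]):
    return data


print(int_or_tuple("7"))     # int("7") works first, prints 7
print(int_or_tuple("abc"))   # int fails, tuple works, prints ('a', 'b', 'c')
print(int_or_tuple((1, 2)))  # already a tuple, prints (1, 2)
print(int_or_tuple(None))    # both conversions fail, prints None
```

The order of the types in the hint matters: it is what makes `"7"` become an `int` rather than a tuple of characters, just as an `xarray.DataArray` is preferred over an `xarray.Dataset` in the example above.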



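The same mechanism applies to non-xarray types. In the example below the function accepts either a `str` or a `numpy.ndarray`. The satellite NetCDF reader is passed through unconverted, the string already matches an accepted type and is left as-is, and the integer `1` is converted to a `str`:

In [None]: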
@format_handler()
def np_str_function(data: str | np.ndarray):
    print(type(data))
    return data

this_np_str = np_str_function(ekds_netcdf_satellite)
this_converted_str = np_str_function("a_string")
this_converted_str = np_str_function(1)

<class 'earthkit.data.readers.netcdf.NetCDFFieldListReader'>
<class 'str'>
<class 'str'>


## The `metadata_handler` decorator

The `metadata_handler` decorator modifies metadata according to the function's requirements. It currently provides an option to ensure that the input data are in the required units, and an option to add provenance metadata to the returned object.

### `ensure_units`

The example below demonstrates how to ensure that the data are in degrees Celsius. The conversion uses [`pint`](https://pint.readthedocs.io/en/stable/), so the target units string must be one that `pint` recognises. The source units are detected from the metadata of the data object. If either the source or the target units cannot be detected, no conversion is attempted.

In [7]:
@metadata_handler(ensure_units={"data": "celsius"})
def xrDataArray_convert_units(data: xr.DataArray):
    print(f"Units in function: {data.units}")
    return data

print(f"Input units before function call: {xarray_da_example.units}")
new_data = xrDataArray_convert_units(xarray_da_example)
print(f"Input units after function call: {xarray_da_example.units}")
print(f"New data units after function call: {new_data.units}")

Input units before function call: K
Units in function: celsius
Input units after function call: K
New data units after function call: celsius


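The output above shows that, inside the function, the units have been converted to those requested (`"celsius"`), but the input object in the top-level workflow is not modified in place: it keeps its original units (`K`). The returned object carries the converted units because the function returns the `data` it received.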
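This copy-on-convert behaviour can be sketched without xarray or pint. In the hypothetical example below, a small `Array` dataclass stands in for an `xarray.DataArray`, a hand-written `CONVERSIONS` table stands in for pint's unit registry, and `ensure_units` is an illustrative decorator, not the earthkit one. A converted *copy* is passed to the function, so the caller's object keeps its original units, and unknown source or target units leave the argument untouched.

```python
import dataclasses
import functools
import inspect
from typing import Optional

# Hypothetical stand-in for pint: (source units, target units) -> converter.
CONVERSIONS = {
    ("K", "celsius"): lambda v: v - 273.15,
    ("celsius", "K"): lambda v: v + 273.15,
}


@dataclasses.dataclass(frozen=True)
class Array:
    """Minimal stand-in for an xarray.DataArray: values plus a units attribute."""

    values: tuple
    units: Optional[str] = None


def ensure_units(targets):
    """Pass converted copies of the named arguments to the function (sketch)."""

    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            for name, target in targets.items():
                arr = bound.arguments.get(name)
                source = getattr(arr, "units", None)
                convert = CONVERSIONS.get((source, target))
                if convert is None:
                    continue  # source or target units unknown: no conversion
                # dataclasses.replace returns a new object; the caller's is untouched.
                bound.arguments[name] = dataclasses.replace(
                    arr, values=tuple(convert(v) for v in arr.values), units=target
                )
            return func(*bound.args, **bound.kwargs)

        return wrapper

    return decorator


@ensure_units({"data": "celsius"})
def show_units(data: Array):
    print(f"Units in function: {data.units}")
    return data


temps = Array(values=(273.15, 300.0), units="K")
result = show_units(temps)
print(f"Input units after call: {temps.units}")  # K
print(f"Returned units: {result.units}")         # celsius
```

Building a new object instead of mutating the argument avoids surprising side effects in the caller's workflow, which matches the behaviour observed in the notebook output above.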