## Xarray engine: auxiliary coordinates

In [1]:
import earthkit.data as ekd

### Basic examples

In [2]:
ds_fl = ekd.from_source("sample", "pl.grib")
ds_fl.ls().head()

|   | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 700 | 20240603 | 0 | 0 | fc | 0 | regular_ll |
| 1 | ecmf | r | isobaricInhPa | 700 | 20240603 | 0 | 0 | fc | 0 | regular_ll |
| 2 | ecmf | t | isobaricInhPa | 500 | 20240603 | 0 | 0 | fc | 0 | regular_ll |
| 3 | ecmf | r | isobaricInhPa | 500 | 20240603 | 0 | 0 | fc | 0 | regular_ll |
| 4 | ecmf | t | isobaricInhPa | 700 | 20240603 | 0 | 6 | fc | 0 | regular_ll |


In [3]:
ds = ds_fl.to_xarray(
    profile="grib",
    aux_coords={"expver": ("expver", "forecast_reference_time")}, 
)
ds.load()

In [4]:
ds2 = ds_fl.to_xarray(
    profile="grib",
    remapping={"centre_expver": "{centre}_{expver}"}, 
    aux_coords={"centre_and_expver": ("centre_expver", ("forecast_reference_time", "step"))}, 
)
ds2.load()
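
The ``remapping`` value above looks like a format template over metadata keys. As a rough illustration of the string it is expected to produce per field, the sketch below uses plain Python string formatting as a stand-in; this is an assumption about the behaviour, not earthkit-data's actual implementation. The ``centre`` value is taken from the listing above, while the ``expver`` value ``"0001"`` is hypothetical.

```python
# Hypothetical metadata for a single field: "ecmf" appears in the listing above,
# the expver value "0001" is made up for illustration only.
metadata = {"centre": "ecmf", "expver": "0001"}

# Same template string as passed via the remapping argument.
template = "{centre}_{expver}"

# Plain str.format stands in for the library's own key substitution here.
centre_expver = template.format(**metadata)
print(centre_expver)
```

The resulting composite key, ``"centre_expver"``, is then what the ``aux_coords`` entry refers to as its metadata source.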

Auxiliary coordinates can also be declared together with the ``mono_variable`` setting.

In [5]:
ds3 = ds_fl.to_xarray(
    profile="grib",
    fixed_dims=["param", "forecast_reference_time", "step", "level"], 
    mono_variable=True, 
    remapping={"centre_expver": "{centre}_{expver}"}, 
    aux_coords={"centre_and_expver": ("centre_expver", ("forecast_reference_time", "step"))}, 
)
ds3.load()

### Quantiles in a probabilistic forecast

Let us now consider a probabilistic forecast of 2-metre temperature.

In [6]:
ds_fl2 = ekd.from_source("sample", "quantiles_pd.grib")

In this dataset, the fields are indexed by the metadata key ``"quantile"``, which is in turn composed of ``"number"`` and ``"numberOfForecastsInEnsemble"``.

In [7]:
ds_fl2.ls(keys=[
    "shortName", 
    "dataDate", 
    "dataTime", 
    "stepRange", 
    "dataType", 
    "quantile", 
    "number", 
    "numberOfForecastsInEnsemble"
])

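|   | shortName | dataDate | dataTime | stepRange | dataType | quantile | number | numberOfForecastsInEnsemble |
|---|---|---|---|---|---|---|---|---|
| 0 | 2tp | 20251209 | 0 | 0-168 | pd | 1:3 | 1 | 3 |
| 1 | 2tp | 20251209 | 0 | 0-168 | pd | 1:5 | 1 | 5 |
| 2 | 2tp | 20251209 | 0 | 0-168 | pd | 1:10 | 1 | 10 |
| 3 | 2tp | 20251209 | 0 | 0-168 | pd | 2:3 | 2 | 3 |
| 4 | 2tp | 20251209 | 0 | 0-168 | pd | 2:5 | 2 | 5 |
| 5 | 2tp | 20251209 | 0 | 0-168 | pd | 2:10 | 2 | 10 |
| 6 | 2tp | 20251209 | 0 | 0-168 | pd | 3:3 | 3 | 3 |
| 7 | 2tp | 20251209 | 0 | 0-168 | pd | 3:5 | 3 | 5 |
| 8 | 2tp | 20251209 | 0 | 0-168 | pd | 3:10 | 3 | 10 |
| 9 | 2tp | 20251209 | 0 | 0-168 | pd | 4:5 | 4 | 5 |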


Note that, in this context, the usual meaning of the metadata key ``"number"`` (and the related ``"numberOfForecastsInEnsemble"``) is overridden by ``"quantile"``. As a result, the ensemble dimension normally derived from ``"number"`` is no longer applicable.

For this reason, we must:
- declare ``"quantile"`` as an extra dimension, and
- remove the predefined ensemble dimension ``"number"``, since it would otherwise conflict with the ``"quantile"`` dimension.

Still, it might be useful to keep the information carried by ``"number"`` and ``"numberOfForecastsInEnsemble"`` as auxiliary coordinates.
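
To see why these two keys fit naturally as coordinates along ``"quantile"``, the following plain-Python sketch uses the rows from the listing above to check that each ``"quantile"`` label determines both values. The helper ``split_quantile`` and the dictionary names are illustrative only and are not part of earthkit-data.

```python
# Rows copied from the listing above:
# (quantile, number, numberOfForecastsInEnsemble)
rows = [
    ("1:3", 1, 3), ("1:5", 1, 5), ("1:10", 1, 10),
    ("2:3", 2, 3), ("2:5", 2, 5), ("2:10", 2, 10),
    ("3:3", 3, 3), ("3:5", 3, 5), ("3:10", 3, 10),
    ("4:5", 4, 5),
]


def split_quantile(label):
    """Split a label such as "2:10" into (rank, total) integers."""
    rank, total = label.split(":")
    return int(rank), int(total)


# In the listed rows, the two parts of each label match the other two keys.
for label, number, total in rows:
    assert split_quantile(label) == (number, total)

# Each quantile label therefore maps to exactly one value of each key,
# which is what a coordinate defined along "quantile" needs.
quantile_rank = {label: number for label, number, _ in rows}
nquantiles = {label: total for label, _, total in rows}

print(quantile_rank["2:10"], nquantiles["2:10"])
```

The names ``quantile_rank`` and ``nquantiles`` mirror the auxiliary coordinate names used in the next cell.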

In [8]:
ds4 = ds_fl2.to_xarray(
    profile="grib", 
    squeeze=False, 
    extra_dims="quantile", 
    drop_dims="number", 
    add_earthkit_attrs=False, 
    aux_coords={
        "quantile_rank": ("number", "quantile"), 
        "nquantiles": ("numberOfForecastsInEnsemble", "quantile")
    }
)
ds4.load()