## Xarray engine: extra dimensions

In [1]:
import earthkit.data as ekd

### 2D wave spectra example

We analyse a 2D wave spectra product at 00 UTC and 03 UTC on 2025-12-15.
A specific feature of this dataset is that, in addition to the standard
temporal dimension, the fields are indexed by wave **direction** and
**frequency**.

In [2]:
ds_fl = ekd.from_source("sample", "2d-wave-spectra_an.grib")

In [3]:
ds_fl.ls(extra_keys=["directionNumber", "frequencyNumber"])

,centre,shortName,typeOfLevel,level,dataDate,dataTime,stepRange,dataType,number,gridType,directionNumber,frequencyNumber
0,ecmf,2dfd,meanSea,0,20251215,0,0,an,0,regular_ll,1,1
1,ecmf,2dfd,meanSea,0,20251215,0,0,an,0,regular_ll,2,1
2,ecmf,2dfd,meanSea,0,20251215,0,0,an,0,regular_ll,3,1
3,ecmf,2dfd,meanSea,0,20251215,0,0,an,0,regular_ll,4,1
4,ecmf,2dfd,meanSea,0,20251215,0,0,an,0,regular_ll,5,1
...,...,...,...,...,...,...,...,...,...,...,...,...
2083,ecmf,2dfd,meanSea,0,20251215,300,0,an,0,regular_ll,32,29
2084,ecmf,2dfd,meanSea,0,20251215,300,0,an,0,regular_ll,33,29
2085,ecmf,2dfd,meanSea,0,20251215,300,0,an,0,regular_ll,34,29
2086,ecmf,2dfd,meanSea,0,20251215,300,0,an,0,regular_ll,35,29


To represent this structure in Xarray, the predefined dimensions of the
Xarray engine must therefore be complemented with dimensions derived
from the metadata keys ``"directionNumber"`` and ``"frequencyNumber"``.
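The listing above suggests a regular layout in which the direction number varies fastest, then the frequency number, then the time. The following minimal sketch in plain Python checks that reading against the row numbers shown. The count of 36 directions is an assumption inferred from the truncated listing (the last visible row has direction 35), and `field_position` is a hypothetical helper, not part of earthkit-data.

```python
# Assumed sizes, inferred from the listing above (not read from the file).
N_TIMES = 2          # 00 UTC and 03 UTC
N_FREQUENCIES = 29
N_DIRECTIONS = 36    # assumption: the listing ends one row after direction 35


def field_position(row):
    """Map a row number of the listing to 1-based (time, frequency, direction)."""
    time_idx, rest = divmod(row, N_FREQUENCIES * N_DIRECTIONS)
    freq_idx, dir_idx = divmod(rest, N_DIRECTIONS)
    return time_idx + 1, freq_idx + 1, dir_idx + 1


total_fields = N_TIMES * N_FREQUENCIES * N_DIRECTIONS
print(total_fields)            # number of fields implied by the assumed sizes
print(field_position(0))       # first row of the listing
print(field_position(2083))    # a row from the tail of the listing
```

Under these assumptions each (time, frequency, direction) combination is filled by exactly one field, which is what allows the two metadata keys to act as dimensions.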

In [4]:
ds = ds_fl.to_xarray(
    profile="grib", 
    extra_dims=["directionNumber", "frequencyNumber"], 
    add_earthkit_attrs=False, 
)
ds

The ``extra_dims`` option also supports defining an explicit mapping
between the name of an extra dimension and the corresponding metadata
key, in a way that is conceptually similar to **dimension roles**.

In [5]:
ds2 = ds_fl.to_xarray(
    profile="grib", 
    extra_dims=[
        {"d": "directionNumber"}, 
        {"f": "frequencyNumber"}
    ],
    add_earthkit_attrs=False, 
)
ds2
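To make the two accepted forms of ``extra_dims`` easier to compare, here is a minimal sketch of how such a specification could be normalised into pairs of dimension name and metadata key. The function `normalise_extra_dims` is hypothetical and written only for illustration; it is not how earthkit-data implements the option.

```python
def normalise_extra_dims(extra_dims):
    """Return a list of (dimension_name, metadata_key) pairs.

    Hypothetical helper for illustration only; not earthkit-data's code.
    """
    if isinstance(extra_dims, (str, dict)):
        extra_dims = [extra_dims]  # a single entry is treated as a one-item list

    pairs = []
    for item in extra_dims:
        if isinstance(item, str):
            # Plain key: the dimension takes the name of the metadata key.
            pairs.append((item, item))
        elif isinstance(item, dict):
            # Mapping: {dimension_name: metadata_key}
            pairs.extend(item.items())
        else:
            raise TypeError(f"unsupported extra_dims entry: {item!r}")
    return pairs


print(normalise_extra_dims(["directionNumber", "frequencyNumber"]))
print(normalise_extra_dims([{"d": "directionNumber"}, {"f": "frequencyNumber"}]))
```

In this reading, the plain-string form is shorthand for a mapping in which the dimension name and the metadata key coincide.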

### Quantiles in a probabilistic forecast

Let us now consider a probabilistic forecast of 2-metre temperature.

In [6]:
ds_fl2 = ekd.from_source("sample", "quantiles_pd.grib")

In this dataset, the fields are indexed by the metadata key ``"quantile"``.

In [7]:
ds_fl2.ls(keys=[
    "shortName", 
    "dataDate", 
    "dataTime", 
    "stepRange", 
    "dataType", 
    "quantile", 
    "number", 
    "numberOfForecastsInEnsemble"
])

,shortName,dataDate,dataTime,stepRange,dataType,quantile,number,numberOfForecastsInEnsemble
0,2tp,20251209,0,0-168,pd,1:3,1,3
1,2tp,20251209,0,0-168,pd,1:5,1,5
2,2tp,20251209,0,0-168,pd,1:10,1,10
3,2tp,20251209,0,0-168,pd,2:3,2,3
4,2tp,20251209,0,0-168,pd,2:5,2,5
5,2tp,20251209,0,0-168,pd,2:10,2,10
6,2tp,20251209,0,0-168,pd,3:3,3,3
7,2tp,20251209,0,0-168,pd,3:5,3,5
8,2tp,20251209,0,0-168,pd,3:10,3,10
9,2tp,20251209,0,0-168,pd,4:5,4,5


Note that, in this context, the usual meaning of the metadata key ``"number"`` (and the related ``"numberOfForecastsInEnsemble"``) is overridden by ``"quantile"``. As a result, the ensemble dimension normally derived from ``"number"`` is no longer applicable.

For this reason, we must:

- declare ``"quantile"`` as an extra dimension, and

- remove the predefined ensemble dimension ``"number"``, since it would otherwise conflict with the ``"quantile"`` dimension.

In [8]:
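ds3 = ds_fl2.to_xarray(
    profile="grib",
    squeeze=False,  # keep dimensions of length 1
    extra_dims="quantile",  # add a dimension built from the "quantile" key
    drop_dims="number",  # remove the predefined ensemble dimension
    add_earthkit_attrs=False,
)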
ds3
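The relationship between ``"quantile"``, ``"number"`` and ``"numberOfForecastsInEnsemble"`` can be checked directly against the listing above. The sketch below uses plain Python and a few rows copied from that listing. `parse_quantile` is a hypothetical helper, and reading the label as a fraction is an assumption made for illustration.

```python
from fractions import Fraction


def parse_quantile(label):
    """Split a label such as "2:5" into (number, total) integers."""
    number, total = label.split(":")
    return int(number), int(total)


# (quantile, number, numberOfForecastsInEnsemble) copied from the listing above
rows = [("1:3", 1, 3), ("2:5", 2, 5), ("3:10", 3, 10), ("4:5", 4, 5)]

for label, number, total in rows:
    # In the listing, the two parts of the label repeat the other two keys.
    assert parse_quantile(label) == (number, total)
    # Assumption: the label can be read as the fraction number/total.
    print(label, Fraction(number, total))
```

Because ``"number"`` merely repeats the first part of the quantile label in these rows, it carries no independent ensemble information.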

### ``ensure_dims`` vs ``extra_dims``

The ``extra_dims`` and ``ensure_dims`` options partially overlap in their usage. When introducing a new dimension that must not be squeezed, it is sufficient to list it in ``ensure_dims``; there is no need to repeat it in ``extra_dims``. The example below selects a single quantile, so with ``squeeze=True`` the ``"quantile"`` dimension would have length 1, and listing it in ``ensure_dims`` keeps it.

In [9]:
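ds4 = ds_fl2.sel(quantile="2:3").to_xarray(  # select a single quantile
    profile="grib",
    squeeze=True,  # drop dimensions of length 1 ...
    ensure_dims="quantile",  # ... except "quantile", which is kept
    drop_dims="number",
    add_earthkit_attrs=False,
)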
ds4
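The interplay between ``squeeze`` and ``ensure_dims`` can be summarised with a small conceptual model. The sketch below is plain Python: `kept_dims` is a hypothetical helper and the dimension sizes are illustrative values, not read from the data. It is not earthkit-data's implementation.

```python
def kept_dims(sizes, squeeze=True, ensure_dims=()):
    """Return the dimension names that survive squeezing.

    sizes: dict mapping dimension name to its length.
    Hypothetical helper for illustration only; not earthkit-data's code.
    """
    if isinstance(ensure_dims, str):
        ensure_dims = [ensure_dims]
    return [
        name
        for name, size in sizes.items()
        if not squeeze or size > 1 or name in ensure_dims
    ]


# Illustrative sizes after selecting a single quantile (not read from the data).
sizes = {"quantile": 1, "latitude": 10, "longitude": 20}

print(kept_dims(sizes, squeeze=True))                          # quantile is squeezed out
print(kept_dims(sizes, squeeze=True, ensure_dims="quantile"))  # quantile is kept
print(kept_dims(sizes, squeeze=False))                         # nothing is squeezed
```

In this model, a dimension of length 1 survives either because squeezing is switched off or because it is explicitly protected.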