## Xarray engine: mono variable

This notebook demonstrates how to generate an Xarray dataset with a single data variable (one DataArray) that holds all the parameters from a GRIB fieldlist. This layout is often needed for machine learning.
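To make the target layout concrete, here is a minimal pure-Python sketch of stacking per-parameter arrays along a "param" axis. It only illustrates the idea and is not how earthkit-data builds the dataset; the labels `"2t"` and `"2d"` and all numbers are placeholders chosen for this example.

```python
# Illustrative values only; "2t" and "2d" are placeholder labels for two parameters.
params = ["2t", "2d"]
per_param = {
    "2t": [[280.0, 281.0, 282.0], [283.0, 284.0, 285.0]],  # indexed [valid_time][value]
    "2d": [[275.0, 276.0, 277.0], [278.0, 279.0, 280.0]],
}
n_times = len(per_param["2t"])

# Mono-variable layout: one nested array indexed [valid_time][param][value].
mono = [[per_param[p][t] for p in params] for t in range(n_times)]

shape = (len(mono), len(mono[0]), len(mono[0][0]))
print(shape)       # (2, 2, 3)
print(mono[1][0])  # [283.0, 284.0, 285.0]
```

With this layout, a single index along the first axis returns every parameter for one valid time, which is the access pattern a training loop typically wants.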

First, we get 2m temperature and 2m dewpoint data for a whole year on a low-resolution regular latitude-longitude grid. The data contains two fields per day (at 00 and 12 UTC) for each parameter.

In [1]:
import earthkit.data as ekd
ds_fl = ekd.from_source("sample", "t2_td2_1_year.grib")
len(ds_fl)

1464
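The field count can be checked with a little arithmetic. The sketch below assumes the year in the sample file is a leap year (366 days); the file is not inspected here, but that assumption is consistent with the 1464 fields reported above.

```python
days_in_year = 366   # assumption: leap year, consistent with the 1464 fields above
times_per_day = 2    # 00 and 12 UTC
n_params = 2         # 2m temperature and 2m dewpoint

n_valid_times = days_in_year * times_per_day  # distinct valid times
n_fields = n_valid_times * n_params           # one field per valid time and parameter

print(n_valid_times, n_fields)  # 732 1464
```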

In [2]:
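ds = ds_fl.to_xarray(fixed_dims=["valid_time", "param"],  # dimensions to generate, in this order
                     ensure_dims=["number"],
                     mono_variable=True,           # put all parameters into a single data variable
                     chunks={"valid_time": 1},     # one Dask chunk per valid time
                     flatten_values=True,          # flatten each field's values into one dimension
                     add_earthkit_attrs=False,     # do not add earthkit-specific attributes
                     )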
ds

Output (Dask-backed arrays listed in the dataset repr):

| Array | Shape | Chunk shape | Bytes | Chunk bytes | Dask graph | Data type |
|---|---|---|---|---|---|---|
| 1-D arrays of length 9 (two listed) | (9,) | (9,) | 72 B | 72 B | 1 chunk in 2 graph layers | float64 numpy.ndarray |
| Data variable | (732, 2, 9) | (1, 2, 9) | 102.94 kiB | 144 B | 732 chunks in 2 graph layers | float64 numpy.ndarray |


When generating the Xarray dataset, we flattened the field values (`flatten_values=True`) and chose the chunking (`chunks={"valid_time": 1}`) so that one chunk contains all the data belonging to a given valid time.
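The chunk sizes reported in the output follow directly from the array shape and the chunking. The sketch below reproduces them with plain arithmetic, taking the shapes from the output above and assuming 8 bytes per float64 value; it does not call Dask.

```python
from math import ceil, prod

shape = (732, 2, 9)        # (valid_time, param, flattened values), from the output above
chunk_shape = (1, 2, 9)    # one valid time per chunk
itemsize = 8               # bytes per float64 value

total_bytes = prod(shape) * itemsize        # size of the whole array
chunk_bytes = prod(chunk_shape) * itemsize  # size of a single chunk
n_chunks = prod(ceil(s / c) for s, c in zip(shape, chunk_shape))

print(f"{total_bytes / 1024:.2f} kiB")  # 102.94 kiB
print(chunk_bytes, n_chunks)            # 144 732
```

Each chunk therefore holds both parameters at every grid point for one valid time, so loading a single training sample touches exactly one chunk.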

In [3]:
ds["data"]

Output (Dask-backed arrays listed in the DataArray repr):

| Array | Shape | Chunk shape | Bytes | Chunk bytes | Dask graph | Data type |
|---|---|---|---|---|---|---|
| `data` values | (732, 2, 9) | (1, 2, 9) | 102.94 kiB | 144 B | 732 chunks in 2 graph layers | float64 numpy.ndarray |
| 1-D arrays of length 9 (two listed) | (9,) | (9,) | 72 B | 72 B | 1 chunk in 2 graph layers | float64 numpy.ndarray |


#### Adding ensemble dimension

We add the ensemble member as an additional dimension to the generated Xarray dataset. Because the input is not ensemble data, the value of the "number" ecCodes key can be missing. We therefore need to provide a meaningful default with the `defaults` kwarg to be able to build the "number" dimension.

In [4]:
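ds = ds_fl.to_xarray(fixed_dims=["valid_time", "param", "number"],  # "number" is now a dimension
                     ensure_dims=["number"],
                     mono_variable=True,           # put all parameters into a single data variable
                     chunks={"valid_time": 1},     # one Dask chunk per valid time
                     flatten_values=True,          # flatten each field's values into one dimension
                     add_earthkit_attrs=False,     # do not add earthkit-specific attributes
                     defaults={"number": 0},       # value used when the "number" key is missing
                     )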
ds

Output (Dask-backed arrays listed in the dataset repr):

| Array | Shape | Chunk shape | Bytes | Chunk bytes | Dask graph | Data type |
|---|---|---|---|---|---|---|
| 1-D arrays of length 9 (two listed) | (9,) | (9,) | 72 B | 72 B | 1 chunk in 2 graph layers | float64 numpy.ndarray |
| Data variable | (732, 2, 1, 9) | (1, 2, 1, 9) | 102.94 kiB | 144 B | 732 chunks in 2 graph layers | float64 numpy.ndarray |
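The role of the default can be pictured with a small pure-Python sketch. This is a conceptual illustration only, not earthkit-data's implementation: the helper `dim_values`, the metadata dictionaries and the dates in them are all hypothetical.

```python
# Hypothetical per-field metadata; deterministic fields carry no "number" key.
fields = [
    {"param": "2t", "valid_time": "2020-01-01T00"},
    {"param": "2d", "valid_time": "2020-01-01T00"},
    {"param": "2t", "valid_time": "2020-01-01T12"},
]


def dim_values(fields, key, defaults):
    """Return the sorted unique values of `key`, using the default when it is missing."""
    values = set()
    for field in fields:
        value = field.get(key, defaults.get(key))
        if value is None:
            raise KeyError(f"no value or default available for {key!r}")
        values.add(value)
    return sorted(values)


number_coord = dim_values(fields, "number", {"number": 0})
print(number_coord)  # [0]
```

Every field falls back to the same default, so the resulting dimension has a single entry. This matches the size-1 axis in the shape `(732, 2, 1, 9)` above and explains why the total size stays at 102.94 kiB.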
