# Testing earthkit-data xarray engine on single level ERA5 data from CDS

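To run this notebook, install the `feature/improve-xr-splitter` development branch of the earthkit-data package (for example with `pip install git+https://github.com/ecmwf/earthkit-data.git@feature/improve-xr-splitter`):
https://github.com/ecmwf/earthkit-data/tree/feature/improve-xr-splitter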

In [1]:
import earthkit.data as ekd

Load a dataset of about 12,600 GRIB messages containing single-level ERA5 data from the CDS, at degraded spatial resolution:

In [2]:
fl = ekd.from_source('url', 'https://get.ecmwf.int/repository/test-data/earthkit-data/test-data/xr_engine/cds-reanalysis-era5-single-levels-20230101-low-resol.grib')


Explore the content of the dataset

In [3]:
fl.unique_values('edition', 'stream', 'dataType', 'stepType', 'gridType', 'Ni')

{'edition': (1, 2),
 'stream': ('oper', 'wave', 'ewda', 'enda'),
 'dataType': ('an', 'fc', 'em', 'es'),
 'stepType': ('instant', 'accum', 'max', 'avg'),
 'gridType': ('regular_ll',),
 'Ni': (36, 18, 12)}
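
Conceptually, `unique_values` collects the distinct values that each requested key takes across all messages. The plain-Python sketch below illustrates the idea on a few made-up metadata records. It is not earthkit's implementation, and the helper name `unique_values_sketch` is hypothetical. It keeps values in order of first appearance. That order is consistent with the unsorted tuples printed above, but it is an assumption about earthkit's behaviour.

```python
from typing import Any, Dict, Iterable, List, Tuple


def unique_values_sketch(records: Iterable[Dict[str, Any]],
                         *keys: str) -> Dict[str, Tuple[Any, ...]]:
    """Return the distinct values of each key, in order of first appearance.

    Hypothetical helper for illustration only. A key passed twice appears
    once in the result, because the result is a dict.
    """
    seen: Dict[str, List[Any]] = {k: [] for k in keys}
    for rec in records:
        for k in keys:
            v = rec.get(k)
            if v not in seen[k]:
                seen[k].append(v)
    return {k: tuple(vals) for k, vals in seen.items()}


# Made-up metadata records standing in for GRIB messages
records = [
    {"edition": 1, "stream": "oper", "Ni": 36},
    {"edition": 1, "stream": "wave", "Ni": 18},
    {"edition": 1, "stream": "ewda", "Ni": 12},
    {"edition": 2, "stream": "enda", "Ni": 18},
]

print(unique_values_sketch(records, "edition", "stream", "Ni"))
# → {'edition': (1, 2), 'stream': ('oper', 'wave', 'ewda', 'enda'), 'Ni': (36, 18, 12)}
```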

In [4]:
fl_ls = fl.ls(extra_keys=['stream', 'stepType', 'step', 'Ni', 'Nj', 'validityDate', 'validityTime', 
                         'gridType', 'md5GridSection', 'bitmapPresent', 'gridSpec', 'edition'])
fl_ls

| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType | ... | stepType | step | Ni | Nj | validityDate | validityTime | md5GridSection | bitmapPresent | gridSpec | edition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | 10u | surface | 0 | 20230101 | 0 | 0 | an | 0 | regular_ll | ... | instant | 0 | 36 | 19 | 20230101 | 0 | 33c7d6025995e1b4913811e77d38ec50 | 0 | | 1 |
| 1 | ecmf | 10v | surface | 0 | 20230101 | 0 | 0 | an | 0 | regular_ll | ... | instant | 0 | 36 | 19 | 20230101 | 0 | 33c7d6025995e1b4913811e77d38ec50 | 0 | | 1 |
| 2 | ecmf | 2d | surface | 0 | 20230101 | 0 | 0 | an | 0 | regular_ll | ... | instant | 0 | 36 | 19 | 20230101 | 0 | 33c7d6025995e1b4913811e77d38ec50 | 0 | | 1 |
| 3 | ecmf | 2t | surface | 0 | 20230101 | 0 | 0 | an | 0 | regular_ll | ... | instant | 0 | 36 | 19 | 20230101 | 0 | 33c7d6025995e1b4913811e77d38ec50 | 0 | | 1 |
| 4 | ecmf | msl | surface | 0 | 20230101 | 0 | 0 | an | 0 | regular_ll | ... | instant | 0 | 36 | 19 | 20230101 | 0 | 33c7d6025995e1b4913811e77d38ec50 | 0 | | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12611 | ecmf | swvl1 | depthBelowLandLayer | 0 | 20230101 | 2100 | 0 | es | 0 | regular_ll | ... | instant | 0 | 18 | 9 | 20230101 | 2100 | 3d13e67882e20f1c127f846bdc472564 | 0 | | 1 |
| 12612 | ecmf | swvl2 | depthBelowLandLayer | 7 | 20230101 | 2100 | 0 | es | 0 | regular_ll | ... | instant | 0 | 18 | 9 | 20230101 | 2100 | 3d13e67882e20f1c127f846bdc472564 | 0 | | 1 |
| 12613 | ecmf | swvl3 | depthBelowLandLayer | 28 | 20230101 | 2100 | 0 | es | 0 | regular_ll | ... | instant | 0 | 18 | 9 | 20230101 | 2100 | 3d13e67882e20f1c127f846bdc472564 | 0 | | 1 |
| 12614 | ecmf | swvl4 | depthBelowLandLayer | 100 | 20230101 | 2100 | 0 | es | 0 | regular_ll | ... | instant | 0 | 18 | 9 | 20230101 | 2100 | 3d13e67882e20f1c127f846bdc472564 | 0 | | 1 |


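The `md5GridSection` key is not ideal for finding GRIB messages that share the same grid, because the grid section is organised differently in GRIB editions 1 and 2. In the counts below, for example, the Ni = 18 messages have one hash in edition 1 and a different one in edition 2.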

In [5]:
fl_ls[['edition', 'Ni', 'md5GridSection']].value_counts().reset_index().sort_values('Ni')

| | edition | Ni | md5GridSection | count |
|---|---|---|---|---|
| 2 | 1 | 12 | e09e4d6171c0ac85da1d256b2f8acf88 | 1840 |
| 0 | 1 | 18 | 3d13e67882e20f1c127f846bdc472564 | 5640 |
| 3 | 2 | 18 | 82a7e502a7ebe916255822ef509349d8 | 24 |
| 1 | 1 | 36 | 33c7d6025995e1b4913811e77d38ec50 | 5112 |

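One edition-independent alternative is to identify a grid by a tuple of geometry keys rather than by a hash of the encoded grid section. The sketch below groups made-up records both ways.

Several parts of it are assumptions:

* The choice of `("gridType", "Ni", "Nj")` as the grid identity.
* `Nj = 9` for the edition 2 messages, which are not shown in the listing above.
* The helper `group_by`, which is hypothetical.

The hash values are copied from the table above.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple

# Made-up records mimicking rows of the ls() output. The md5GridSection values
# come from the table above; Nj = 9 for the edition 2 record is an assumption.
records = [
    {"edition": 1, "gridType": "regular_ll", "Ni": 36, "Nj": 19,
     "md5GridSection": "33c7d6025995e1b4913811e77d38ec50"},
    {"edition": 1, "gridType": "regular_ll", "Ni": 18, "Nj": 9,
     "md5GridSection": "3d13e67882e20f1c127f846bdc472564"},
    {"edition": 2, "gridType": "regular_ll", "Ni": 18, "Nj": 9,
     "md5GridSection": "82a7e502a7ebe916255822ef509349d8"},
]

# Assumed edition-independent geometry keys. A stricter check would also compare
# the bounding box and increments (e.g. the ecCodes key
# latitudeOfFirstGridPointInDegrees).
GRID_KEYS = ("gridType", "Ni", "Nj")


def group_by(records: Sequence[Dict[str, Any]],
             keys: Sequence[str]) -> Dict[Tuple[Any, ...], List[int]]:
    """Map each distinct combination of `keys` to the indices of matching records."""
    groups: Dict[Tuple[Any, ...], List[int]] = defaultdict(list)
    for i, rec in enumerate(records):
        groups[tuple(rec[k] for k in keys)].append(i)
    return dict(groups)


by_md5 = group_by(records, ["md5GridSection"])
by_geometry = group_by(records, GRID_KEYS)

print(len(by_md5))       # 3: the Ni = 18 grid is split by GRIB edition
print(len(by_geometry))  # 2: the Ni = 18 grid is treated as a single grid
```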

Some further metadata exploration

In [6]:
fl_ls[['dataDate', 'dataTime', 'stepType', 'step', 'stepRange', 'validityDate', 'validityTime']].value_counts().reset_index()

| | dataDate | dataTime | stepType | step | stepRange | validityDate | validityTime | count |
|---|---|---|---|---|---|---|---|---|
| 0 | 20230101 | 900 | instant | 0 | 0 | 20230101 | 900 | 568 |
| 1 | 20230101 | 2100 | instant | 0 | 0 | 20230101 | 2100 | 568 |
| 2 | 20230101 | 300 | instant | 0 | 0 | 20230101 | 300 | 568 |
| 3 | 20230101 | 1800 | instant | 0 | 0 | 20230101 | 1800 | 568 |
| 4 | 20230101 | 600 | instant | 0 | 0 | 20230101 | 600 | 568 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 133 | 20230101 | 600 | max | 7 | 6-7 | 20230101 | 1300 | 5 |
| 134 | 20230101 | 600 | max | 8 | 7-8 | 20230101 | 1400 | 5 |
| 135 | 20230101 | 600 | max | 10 | 9-10 | 20230101 | 1600 | 5 |
| 136 | 20230101 | 600 | max | 11 | 10-11 | 20230101 | 1700 | 5 |

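The rows above are consistent with the validity time being the reference time (`dataDate`, `dataTime`) plus `step` hours. In row 133, for example, `dataTime` 600 with step 7 gives `validityTime` 1300. The stdlib sketch below reproduces that arithmetic under two assumptions: `dataTime` and `validityTime` are integers in hhmm form, and `step` is in hours. The function name `validity` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Tuple


def validity(data_date: int, data_time: int, step_hours: int) -> Tuple[int, int]:
    """Return (validityDate, validityTime) as ints, e.g. (20230101, 1300).

    Assumes data_time is hhmm (600 means 06:00) and step_hours is in hours.
    """
    ref = datetime.strptime(f"{data_date}{data_time:04d}", "%Y%m%d%H%M")
    valid = ref + timedelta(hours=step_hours)
    return int(valid.strftime("%Y%m%d")), int(valid.strftime("%H%M"))


print(validity(20230101, 600, 7))   # (20230101, 1300), as in row 133
print(validity(20230101, 2100, 0))  # (20230101, 2100), an analysis field
```

Using `datetime` rather than adding integers directly handles steps that cross midnight, where the date must also change.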

In [7]:
fl_ls['number'].value_counts()

number
0    12616
Name: count, dtype: int64

An example of conversion to xarray, splitting the data with respect to several keys so that each combination of the split keys yields a separate Dataset:

In [8]:
dss, split_coords_list = fl.to_xarray(
    split_dims=['stream', 'dataType', 'edition', 'Ni'], 
    time_dim_mode='valid_time', 
    squeeze=False, 
)
len(dss)

11

In [9]:
split_coords_list

[{'stream': 'enda', 'dataType': 'an', 'edition': 1, 'Ni': 18},
 {'stream': 'enda', 'dataType': 'em', 'edition': 1, 'Ni': 18},
 {'stream': 'enda', 'dataType': 'es', 'edition': 1, 'Ni': 18},
 {'stream': 'enda', 'dataType': 'fc', 'edition': 1, 'Ni': 18},
 {'stream': 'enda', 'dataType': 'fc', 'edition': 2, 'Ni': 18},
 {'stream': 'ewda', 'dataType': 'an', 'edition': 1, 'Ni': 12},
 {'stream': 'ewda', 'dataType': 'em', 'edition': 1, 'Ni': 12},
 {'stream': 'ewda', 'dataType': 'es', 'edition': 1, 'Ni': 12},
 {'stream': 'oper', 'dataType': 'an', 'edition': 1, 'Ni': 36},
 {'stream': 'oper', 'dataType': 'fc', 'edition': 1, 'Ni': 36},
 {'stream': 'wave', 'dataType': 'an', 'edition': 1, 'Ni': 18}]
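
Conceptually, `split_dims` partitions the messages by every distinct combination of the listed keys; here that gives the 11 combinations above. The plain-Python sketch below reproduces only this grouping step on made-up records. It builds no xarray objects, is not earthkit's code, and uses a hypothetical helper name, `split_by`. It returns the combinations sorted, which matches the order printed above; whether the branch always sorts is an assumption.

```python
from typing import Any, Dict, List, Sequence, Tuple

Record = Dict[str, Any]


def split_by(records: Sequence[Record],
             split_dims: Sequence[str]) -> Tuple[List[List[Record]], List[Record]]:
    """Group records by the values of `split_dims`.

    Returns (groups, split_coords_list), aligned index by index, with the
    key combinations in sorted order (assumed ordering, for illustration).
    """
    groups: Dict[Tuple[Any, ...], List[Record]] = {}
    for rec in records:
        key = tuple(rec[d] for d in split_dims)
        groups.setdefault(key, []).append(rec)
    keys = sorted(groups)
    split_coords_list = [dict(zip(split_dims, k)) for k in keys]
    return [groups[k] for k in keys], split_coords_list


# Made-up records; only the split keys matter here
records = [
    {"id": 0, "stream": "oper", "dataType": "an", "edition": 1, "Ni": 36},
    {"id": 1, "stream": "wave", "dataType": "an", "edition": 1, "Ni": 18},
    {"id": 2, "stream": "enda", "dataType": "fc", "edition": 2, "Ni": 18},
    {"id": 3, "stream": "oper", "dataType": "an", "edition": 1, "Ni": 36},
]

groups, coords = split_by(records, ["stream", "dataType", "edition", "Ni"])
print(len(groups))                   # 3
print(coords[0])                     # {'stream': 'enda', 'dataType': 'fc', 'edition': 2, 'Ni': 18}
print([r["id"] for r in groups[1]])  # [0, 3]
```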

In [10]:
dss[0]