## GRIB: selection using metadata

We read an example GRIB file containing 18 messages.

In [1]:
import earthkit.data as ekd

ds = ekd.from_source("sample", "tuv_pl.grib")
len(ds)

18

### Using sel

In [2]:
a = ds.sel(level=500)
a.ls()

Unnamed: 0,variable,valid_datetime,base_datetime,step,level,vertical_type,member,grid_type
0,t,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,500,pressure,0,regular_ll
1,u,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,500,pressure,0,regular_ll
2,v,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,500,pressure,0,regular_ll


We can use a dict instead of keyword arguments:

In [3]:
a = ds.sel({"level": 500, "variable": "v"})
a.ls()

Unnamed: 0,variable,valid_datetime,base_datetime,step,level,vertical_type,member,grid_type
0,v,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,500,pressure,0,regular_ll


Lists are accepted:

In [4]:
a = ds.sel(level=[500, 850])
a.ls()

Unnamed: 0,variable,valid_datetime,base_datetime,step,level,vertical_type,member,grid_type
0,t,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,850,pressure,0,regular_ll
1,u,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,850,pressure,0,regular_ll
2,v,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,850,pressure,0,regular_ll
3,t,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,500,pressure,0,regular_ll
4,u,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,500,pressure,0,regular_ll
5,v,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,500,pressure,0,regular_ll


Slices define closed intervals: both the start and the stop value are included, unlike in normal Python indexing:

In [5]:
a = ds.sel(variable="t", level=slice(500, 850))
a.ls()

Unnamed: 0,variable,valid_datetime,base_datetime,step,level,vertical_type,member,grid_type
0,t,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,850,pressure,0,regular_ll
1,t,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,700,pressure,0,regular_ll
2,t,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,500,pressure,0,regular_ll
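
The difference between a closed interval on values and a normal Python slice on positions can be illustrated without earthkit-data. The sketch below is plain Python and not the library's implementation; `select_closed` is a hypothetical helper, its handling of `None` bounds is an assumption of the sketch, and the level values are the ones found in the sample file.

```python
# Illustrative only: mimics value-based, closed-interval selection
# on a plain list of level values (not earthkit-data's own code).
levels = [300, 400, 500, 700, 850, 1000]


def select_closed(values, interval):
    """Keep the values v with interval.start <= v <= interval.stop.

    A bound of None leaves that side unbounded (assumption of this sketch).
    """
    lo, hi = interval.start, interval.stop
    return [
        v
        for v in values
        if (lo is None or v >= lo) and (hi is None or v <= hi)
    ]


by_value = select_closed(levels, slice(500, 850))  # both bounds are kept
by_position = levels[2:4]                          # stop index is excluded

print(by_value)     # [500, 700, 850]
print(by_position)  # [500, 700]
```

The value-based version keeps 850 because it compares values against the bounds, whereas the positional slice stops one element before index 4.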


### Using isel

In [6]:
a = ds.isel(level=0)
a.ls()

Unnamed: 0,variable,valid_datetime,base_datetime,step,level,vertical_type,member,grid_type
0,t,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,300,pressure,0,regular_ll
1,u,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,300,pressure,0,regular_ll
2,v,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,300,pressure,0,regular_ll


In [7]:
a = ds.isel({"level": 2, "shortName": 1})
a.ls()

In [8]:
a = ds.isel(level=[2, 3], param=0)
a.ls()

Unnamed: 0,variable,valid_datetime,base_datetime,step,level,vertical_type,member,grid_type
0,t,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,700,pressure,0,regular_ll
1,t,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,500,pressure,0,regular_ll


Slices follow normal Python indexing, so the stop index is excluded:

In [9]:
a = ds.isel(level=slice(2, 5), param=0)
a.ls()

Unnamed: 0,variable,valid_datetime,base_datetime,step,level,vertical_type,member,grid_type
0,t,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,850,pressure,0,regular_ll
1,t,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,700,pressure,0,regular_ll
2,t,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,500,pressure,0,regular_ll
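
Judging from the outputs in this notebook, the positions passed to isel appear to refer to the list of unique values of each key, as reported by ds.indices(). The following plain-Python sketch reproduces the selections above on hard-coded lists; it is an illustration of the index arithmetic only and makes no claim about how earthkit-data implements it.

```python
# Unique values per key, copied from the outputs in this notebook.
params = ["t", "u", "v"]
levels = [300, 400, 500, 700, 850, 1000]

# Position-based picks, following normal Python indexing rules.
first_level = levels[0]                     # like isel(level=0)
picked_levels = [levels[i] for i in [2, 3]]  # like isel(level=[2, 3])
sliced_levels = levels[slice(2, 5)]          # like isel(level=slice(2, 5))
first_param = params[0]                      # like isel(param=0)

print(first_level)    # 300
print(picked_levels)  # [500, 700]
print(sliced_levels)  # [500, 700, 850]
print(first_param)    # t
```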


### Using order_by

In [10]:
b = a.order_by()
b.ls()

Unnamed: 0,variable,valid_datetime,base_datetime,step,level,vertical_type,member,grid_type
0,t,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,850,pressure,0,regular_ll
1,t,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,700,pressure,0,regular_ll
2,t,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,500,pressure,0,regular_ll


The sorting keys can be specified as a list:

In [11]:
b = a.order_by(["variable"])
b.ls()

Unnamed: 0,variable,valid_datetime,base_datetime,step,level,vertical_type,member,grid_type
0,t,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,850,pressure,0,regular_ll
1,t,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,700,pressure,0,regular_ll
2,t,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,500,pressure,0,regular_ll


We can also prescribe the order of the values within a key. This only works when every value the key can take is listed:

In [12]:
a = a.order_by(variable=["v", "t", "u"])
a.ls()

Unnamed: 0,variable,valid_datetime,base_datetime,step,level,vertical_type,member,grid_type
0,t,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,850,pressure,0,regular_ll
1,t,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,700,pressure,0,regular_ll
2,t,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,500,pressure,0,regular_ll
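
To show what sorting by a prescribed value order means, here is a minimal sketch in plain Python. The `records` list is a made-up stand-in for field metadata and `rank` is a hypothetical lookup table; none of this is earthkit-data code.

```python
# Made-up stand-in for field metadata (not read from the GRIB file).
records = [
    {"variable": "t", "level": 500},
    {"variable": "u", "level": 500},
    {"variable": "v", "level": 500},
    {"variable": "t", "level": 850},
]

# Prescribed order of the values of the "variable" key.
wanted = ["v", "t", "u"]
rank = {name: pos for pos, name in enumerate(wanted)}

# sorted() is stable, so records sharing a variable keep their
# original relative order.
ordered = sorted(records, key=lambda r: rank[r["variable"]])

for r in ordered:
    print(r["variable"], r["level"])
```

In this sketch a value missing from `wanted` has no rank and the lookup raises a `KeyError`, which gives an intuition for why all possible values need to be listed.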


### Combining sel and order_by

In [13]:
a = ds.sel(level=[500, 850]).order_by(["variable"])
a.ls()

Unnamed: 0,variable,valid_datetime,base_datetime,step,level,vertical_type,member,grid_type
0,t,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,850,pressure,0,regular_ll
1,t,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,500,pressure,0,regular_ll
2,u,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,850,pressure,0,regular_ll
3,u,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,500,pressure,0,regular_ll
4,v,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,850,pressure,0,regular_ll
5,v,2018-08-01 12:00:00,2018-08-01 12:00:00,0 days,500,pressure,0,regular_ll


### Using indices

In [14]:
ds.indices()

{'param': ['t', 'u', 'v'],
 'level': [300, 400, 500, 700, 850, 1000],
 'shortName': []}

We can use the *squeeze* option to see only the keys that have more than one value:

In [15]:
ds.indices(squeeze=True)

{'param': ['t', 'u', 'v'], 'level': [300, 400, 500, 700, 850, 1000]}
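
The effect of *squeeze* on the output above can be mimicked with a dictionary comprehension. This is only a sketch of the filtering rule as described in the text (keep keys with more than one value), applied to a hard-coded copy of the output; the library may implement it differently.

```python
# Hard-coded copy of the ds.indices() output shown above.
indices = {
    "param": ["t", "u", "v"],
    "level": [300, 400, 500, 700, 850, 1000],
    "shortName": [],
}

# Keep only the keys that have more than one value.
squeezed = {key: values for key, values in indices.items() if len(values) > 1}

print(squeezed)
```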

In [16]:
ds.index("param")

['t', 'u', 'v']

In [17]:
ds.index("date")

KeyError: 'date'

Aliases can be used. For example, we can use level instead of levelist:

In [None]:
ds.index("level")

Count the number of fields for each available level:

In [None]:
for level in ds.index("level"):
    print(f"level={level} len={len(ds.sel(level=level))}")
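
The same kind of per-level count can be sketched with `collections.Counter` on plain Python data. The `fields` list below is a hypothetical stand-in for the 18 messages, built on the assumption that each of the three variables is present on each of the six levels (which is consistent with 3 x 6 = 18); it is not read from the GRIB file.

```python
from collections import Counter
from itertools import product

variables = ["t", "u", "v"]
levels = [300, 400, 500, 700, 850, 1000]

# Stand-in for the 18 fields: one (variable, level) pair per field.
# Assumption: every variable is available on every level.
fields = list(product(variables, levels))

# Count how many fields share each level in a single pass.
counts = Counter(level for _, level in fields)

for level in levels:
    print(f"level={level} len={counts[level]}")
```

Counting in one pass avoids running a separate selection per level, which may matter for larger collections of fields.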