## Using GRIB indexing 

In [1]:
import earthkit.data

First we prepare the data:

In [2]:
earthkit.data.download_example_file(["test.grib", "tuv_pl.grib"])

In [3]:
!test -d _grib_dir_with_sql || (mkdir -p _grib_dir_with_sql; cp -f test.grib tuv_pl.grib _grib_dir_with_sql/)

### Indexing

We can index the input GRIB data by using the **indexing** option in *from_source()*. The indexing is performed on first data access using the MARS ecCodes keys. The index data is stored in a sqlite database located in the earthkit-data cache. Subsequent loading of the same data is very fast because it reuses the cached index data.

In [4]:
fs = earthkit.data.from_source("file", "tuv_pl.grib", indexing=True)

In [5]:
len(fs)

18

Indexing also works for a list of files or for directories (here "\_grib_dir_with_sql" is a directory):

In [6]:
fs = earthkit.data.from_source("file", "./_grib_dir_with_sql", indexing=True)

In [7]:
len(fs)

20
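To illustrate the idea of a metadata index kept in sqlite, here is a minimal conceptual sketch using only the Python standard library. It is not earthkit-data's actual implementation: the table name `idx`, the column `msg_offset`, the helper names `build_index` and `select`, and the sample records are all hypothetical. It only shows why a query against such an index can locate messages without opening the GRIB file.

```python
import sqlite3

# Hypothetical, simplified metadata records. A real index would be filled
# by scanning the GRIB messages once, on first data access.
records = [
    {"path": "tuv_pl.grib", "msg_offset": 0, "param": "t", "level": 1000},
    {"path": "tuv_pl.grib", "msg_offset": 100, "param": "u", "level": 1000},
    {"path": "tuv_pl.grib", "msg_offset": 200, "param": "t", "level": 500},
    {"path": "tuv_pl.grib", "msg_offset": 300, "param": "u", "level": 500},
]

INDEXED_KEYS = {"param", "level"}  # assumption: only these keys are indexed


def build_index(conn, records):
    """Store one row per message: where it lives plus its indexed metadata."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS idx "
        "(path TEXT, msg_offset INTEGER, param TEXT, level INTEGER)"
    )
    conn.executemany(
        "INSERT INTO idx VALUES (:path, :msg_offset, :param, :level)", records
    )
    conn.commit()


def select(conn, **filters):
    """Return (path, msg_offset) pairs matching the filters, using only the index."""
    unknown = set(filters) - INDEXED_KEYS
    if unknown:
        # Keys that were never indexed cannot be answered from the database.
        raise KeyError(f"keys not in index: {sorted(unknown)}")
    sql = "SELECT path, msg_offset FROM idx"
    if filters:
        sql += " WHERE " + " AND ".join(f"{key} = ?" for key in filters)
    sql += " ORDER BY msg_offset"
    return conn.execute(sql, tuple(filters.values())).fetchall()


conn = sqlite3.connect(":memory:")  # a real cache would use a file on disk
build_index(conn, records)
hits = select(conn, level=500)
print(hits)
```

Because the index stores the location of each message alongside its metadata, a selection reduces to a database query; the data itself only needs to be read when field values are requested.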

### Methods using the SQL database

When calling *sel()*, *isel()* or *order_by()*, the metadata is read directly from the index database, so there is no need to load or open any of the GRIB messages.

In [8]:
a = fs.sel(level=500)
len(a)

3

In [9]:
a = fs.order_by("param")
len(a)

20

In [10]:
a = fs.isel(level=2)
len(a)

3
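The difference between value-based and index-based selection can be sketched in plain Python. This is an illustration only, not the earthkit-data implementation: the functions below are hypothetical stand-ins operating on a list of dicts, and the sketch assumes that the index passed to `isel` refers to the sorted list of unique values of the key.

```python
# Hypothetical metadata for 9 fields: 3 parameters on 3 pressure levels.
fields = [
    {"param": param, "level": level}
    for level in (1000, 850, 700)
    for param in ("t", "u", "v")
]


def sel(fields, **conditions):
    """Value-based selection: keep fields whose metadata match the given values."""

    def matches(field):
        for key, wanted in conditions.items():
            # Accept either a single value or a list of allowed values.
            allowed = wanted if isinstance(wanted, (list, tuple)) else [wanted]
            if field[key] not in allowed:
                return False
        return True

    return [f for f in fields if matches(f)]


def isel(fields, **indices):
    """Index-based selection: the index refers to the unique values of a key."""
    conditions = {}
    for key, i in indices.items():
        unique = sorted({f[key] for f in fields})  # assumption: sorted order
        conditions[key] = unique[i]
    return sel(fields, **conditions)


def order_by(fields, *keys):
    """Sort fields by one or more metadata keys (stable sort)."""
    return sorted(fields, key=lambda f: tuple(f[k] for k in keys))


by_value = sel(fields, level=850)    # fields whose level equals 850
by_index = isel(fields, level=2)     # fields on the third unique level
by_param = order_by(fields, "param")
print(len(by_value), len(by_index), [f["param"] for f in by_param])
```

In this sketch `sel(level=850)` filters on the metadata value itself, while `isel(level=2)` first resolves position 2 in the list of unique levels and then filters on that value.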

When a key that is not present in the index database is used, the methods above do not work:

In [11]:
try:
    a = fs.sel(gridType="regular_ll")
except KeyError as e:
    print(f"error: {e}")   

### Methods not using the SQL database

Most of the other methods still need to load/open the GRIB messages to extract the required metadata. Ideally they should all use the index database. 

In [12]:
print(fs[1])

GribField(u,1000,20180801,1200,0,0)


In [13]:
fs[2:4].metadata("param")

['v', 't']

### from_source() arguments

Selection and sorting arguments can be passed directly to *from_source()*:

In [14]:
fs = earthkit.data.from_source("file", "./_grib_dir_with_sql", indexing=True, level=500)
fs.ls()

| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | u | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 2 | ecmf | v | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |


In [15]:
fs = earthkit.data.from_source("file", "./_grib_dir_with_sql", indexing=True, level=[500, 850], order_by="variable")
fs.ls()

| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | t | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 2 | ecmf | u | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 3 | ecmf | u | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 4 | ecmf | v | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 5 | ecmf | v | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
