In [1]:
!test -f test.grib || wget https://github.com/ecmwf/emohawk/raw/main/docs/examples/test.grib
!test -f tuv_pl.grib || wget https://github.com/ecmwf/emohawk/raw/main/docs/examples/tuv_pl.grib
!test -d _grib_dir_with_sql || (mkdir -p _grib_dir_with_sql; cp -f test.grib tuv_pl.grib _grib_dir_with_sql/)

### Using SQL-based GRIB indexing 

We load all the GRIB files in a directory and generate an index based on the MARS keys. The index is stored in a SQLite database (located in the emohawk cache). Subsequent loads of the data are very fast because they use the database in the cache.

In [2]:
import emohawk

fs = emohawk.load_from("directory", "./_grib_dir_with_sql")
type(fs)

emohawk.readers.grib.index.FieldsetInFilesWithSqlIndex

The resulting object behaves like any other GRIB object.

In [3]:
len(fs)

20
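
To make the idea of a metadata index concrete, here is a minimal sketch using Python's standard `sqlite3` module. It is not emohawk's actual schema or code: the table name `entries`, the columns `byte_offset` and `byte_length`, the helper `build_index` and the sample records are all assumptions made for illustration. The point is only that each GRIB message is scanned once, and its keys plus its position in the file are stored as one database row.

```python
import sqlite3

# Hypothetical metadata records. In a real index these would be extracted
# once from each GRIB message when the directory is first scanned.
records = [
    {"path": "tuv_pl.grib", "byte_offset": 0, "byte_length": 100, "shortName": "t", "level": 1000},
    {"path": "tuv_pl.grib", "byte_offset": 100, "byte_length": 100, "shortName": "u", "level": 1000},
    {"path": "tuv_pl.grib", "byte_offset": 200, "byte_length": 100, "shortName": "v", "level": 1000},
]


def build_index(records, db_path=":memory:"):
    """Store one row per message (illustrative schema, not emohawk's)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        "path TEXT, byte_offset INTEGER, byte_length INTEGER, "
        "shortName TEXT, level INTEGER)"
    )
    conn.executemany(
        "INSERT INTO entries VALUES "
        "(:path, :byte_offset, :byte_length, :shortName, :level)",
        records,
    )
    conn.commit()
    return conn


conn = build_index(records)
# Counting the messages only touches the database, not the GRIB file.
count = conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
print(count)
```

With a file path instead of `":memory:"`, the database persists between sessions, which is why a second load can skip the scanning step.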

#### Methods using the SQL database

When calling sel() and order_by(), the metadata is read directly from the SQL database, so there is no need to load or open any of the GRIB messages.

In [4]:
a = fs.sel(level=500)
type(a)

emohawk.readers.grib.index.FieldsetInFilesWithSqlIndex

In [5]:
b = fs.order_by("shortName")
type(b)

emohawk.readers.grib.index.FieldsetInFilesWithSqlIndex
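
As a rough illustration of why these two methods can avoid the GRIB files, the sketch below maps a selection and an ordering onto a single SQL query. This is an assumption about the general technique, not emohawk's implementation: the `entries` table, the `byte_offset` column, the `ALLOWED_KEYS` whitelist and the `select_offsets` helper are hypothetical names.

```python
import sqlite3

# Tiny in-memory index with made-up rows: (byte_offset, shortName, level).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (byte_offset INTEGER, shortName TEXT, level INTEGER)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [(0, "u", 1000), (100, "t", 500), (200, "v", 500), (300, "t", 850)],
)

# Column names cannot be bound as SQL parameters, so they are checked
# against a whitelist before being placed in the query text.
ALLOWED_KEYS = {"shortName", "level"}


def select_offsets(conn, order_by=None, **conditions):
    """Return message offsets matching the conditions, optionally ordered."""
    for key in list(conditions) + ([order_by] if order_by else []):
        if key not in ALLOWED_KEYS:
            raise KeyError(key)
    sql = "SELECT byte_offset FROM entries"
    if conditions:
        sql += " WHERE " + " AND ".join(f"{key} = ?" for key in conditions)
    # byte_offset is the tie-breaker so the result order is deterministic.
    sql += " ORDER BY " + (f"{order_by}, byte_offset" if order_by else "byte_offset")
    return [row[0] for row in conn.execute(sql, list(conditions.values()))]


at_500 = select_offsets(conn, level=500)              # analogous to sel(level=500)
by_name = select_offsets(conn, order_by="shortName")  # analogous to order_by("shortName")
print(at_500, by_name)
```

Only the offsets are returned here; a message would be read from disk later, when its values are actually needed.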

#### Methods not using the SQL database

Most other methods still need to load and open the GRIB messages to extract the required metadata. Ideally, they would all use the database.

In [6]:
print(fs[1])

GribField(u,1000,20180801,1200,0,0)


In [7]:
g = fs[2:4]
g.ls()

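|   | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|--------|-----------|-------------|-------|----------|----------|-----------|----------|--------|----------|
| 0 | ecmf | v | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | t | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll |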


#### Availability

The availability tells us what MARS settings we would need if we wanted to download the data from the MARS archive.

In [8]:
a.availability

'andate=None, anoffset=None, antime=None, channel=None, class=od, date=20180801, diagnostic=None, direction=None, domain=g, expver=0001, fcmonth=None, fcperiod=None, frequency=None, hdate=None, ident=None, instrument=None, iteration=None, leadtime=None, levelist=500, levtype=pl, method=None, number=None, obstype=None, opttime=None, origin=None, param=[t, u, v], quantile=None, range=None, reference=None, reportype=None, step=0, stream=oper, time=1200, type=an, verify=None\n'
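
The string above lists, for each MARS key, the value or values present in the selection. A minimal sketch of how such a summary could be assembled from per-message metadata is shown below. It only mimics the output format seen here and is not emohawk's implementation; the `summarise` helper and the sample records are hypothetical.

```python
def summarise(records, keys):
    """Build a 'key=value' summary of the distinct values found for each key."""
    parts = []
    for key in sorted(keys):
        # Collect distinct, non-missing values for this key.
        values = sorted({r[key] for r in records if r.get(key) is not None}, key=str)
        if not values:
            summary = "None"  # key absent from every record
        elif len(values) == 1:
            summary = str(values[0])
        else:
            summary = "[" + ", ".join(str(v) for v in values) + "]"
        parts.append(f"{key}={summary}")
    return ", ".join(parts)


# Made-up records resembling the three fields selected at 500 hPa.
records = [
    {"param": "t", "levelist": 500, "date": 20180801},
    {"param": "u", "levelist": 500, "date": 20180801},
    {"param": "v", "levelist": 500, "date": 20180801},
]

text = summarise(records, ["param", "levelist", "date", "number"])
print(text)
```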