## Reading GRIB from FDB

In [1]:
import earthkit.data

<div class="alert alert-block alert-warning">
This example is only expected to work on the ECMWF ATOS supercomputer. It requires FDB access, and the <b>FDB_HOME</b> environment variable must be set correctly.
</div>
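Before running the cells below, it can help to confirm that the environment is configured. The following is a minimal sketch, assuming that **FDB_HOME** should point to an existing directory; the helper name `fdb_home_is_usable` is hypothetical and not part of earthkit-data.

```python
import os


def fdb_home_is_usable(env=None):
    """Return True if FDB_HOME is set and points to an existing directory.

    Hypothetical helper: it only inspects the environment and does not
    contact FDB, so it cannot confirm that FDB access actually works.
    """
    env = os.environ if env is None else env
    fdb_home = env.get("FDB_HOME", "")
    # An empty or missing value means FDB has not been configured.
    return bool(fdb_home) and os.path.isdir(fdb_home)


if not fdb_home_is_usable():
    print("FDB_HOME is not set or does not point to a directory")
```

A passing check is necessary but not sufficient: the FDB configuration itself may still be invalid.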

In [2]:
# the date must be adjusted, since FDB at ECMWF only stores the most recent dates
request = {
    'class': 'od',
    'expver': '0001',
    'stream': 'oper',
    'date': '20230524',
    'time': [0, 12],
    'domain': 'g',
    'type': 'an',
    'levtype': 'sfc',
    'step': 0,
    'param': [151, 167]
}
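Because the date in the request has to be recent, it may be convenient to compute it rather than hard-code it. The sketch below builds the same request with a date relative to today. The helper `recent_date` and the default of one day back are assumptions for illustration; how many days FDB actually keeps is not stated here.

```python
from datetime import date, timedelta


def recent_date(days_back=1, today=None):
    """Return the date `days_back` days before `today` as a YYYYMMDD string."""
    today = date.today() if today is None else today
    return (today - timedelta(days=days_back)).strftime("%Y%m%d")


# Same keys as the request above; only the date is computed.
recent_request = {
    "class": "od",
    "expver": "0001",
    "stream": "oper",
    "date": recent_date(),  # assumption: yesterday is still available
    "time": [0, 12],
    "domain": "g",
    "type": "an",
    "levtype": "sfc",
    "step": 0,
    "param": [151, 167],
}
```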

### Reading as a stream

We can retrieve data from FDB as a stream.

#### Stream: iteration with one field at a time in memory

In [3]:
ds = earthkit.data.from_source("fdb", request)

Nothing is read at this moment.

We can only use *ds* for iteration. Fields created during the iteration are deleted when they go out of scope:

In [4]:
for f in ds:
    print(f"  param={f['param']} shape={f.values.shape} mean={f.values.mean()}")

  param=msl shape=(6599680,) mean=101179.41746872576
  param=2t shape=(6599680,) mean=289.94663208132107
  param=msl shape=(6599680,) mean=101172.08997167287
  param=2t shape=(6599680,) mean=290.78671716434076


Once the iteration is completed, there is nothing left in *ds*, so iterating again yields no fields:

In [5]:
for f in ds:
    print(f"type(f)={type(f)}")
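The single-pass behaviour seen above is the same as that of a plain Python generator. The following sketch is only an analogy and does not use earthkit-data or FDB; `field_stream` is a hypothetical stand-in for the stream.

```python
def field_stream(names):
    """Yield one item at a time, like a stream that cannot be rewound."""
    for name in names:
        yield name


stream = field_stream(["msl", "2t", "msl", "2t"])

first_pass = list(stream)   # consumes the stream
second_pass = list(stream)  # nothing is left to read

print(first_pass)   # ['msl', '2t', 'msl', '2t']
print(second_pass)  # []
```

To iterate again, a new stream has to be created, which is what the next cells do by calling *from_source()* once more.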

#### Stream: using batch_size

We can read several fields at a time from the stream into memory by passing **batch_size** to *from_source()*:

In [6]:
ds = earthkit.data.from_source("fdb", request, batch_size=2)

In [7]:
for f in ds:
    # f is a FieldList containing 2 fields. It gets deleted when going out of scope
    print(len(f))
    print(f.metadata("param"))

2
['msl', '2t']
2
['msl', '2t']
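The grouping produced by **batch_size** can be pictured with a small pure-Python sketch. This illustrates the concept only and is not how earthkit-data implements it; the `batched` helper is hypothetical. In this sketch the last batch may be shorter than the others when the number of items is not a multiple of the batch size.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield lists of up to `batch_size` consecutive items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return  # the underlying stream is exhausted
        yield batch


params = ["msl", "2t", "msl", "2t"]
batches = list(batched(params, 2))

for batch in batches:
    print(len(batch), batch)
```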


#### Stream: storing all the fields in memory

In [8]:
ds = earthkit.data.from_source("fdb", request, batch_size=0)

Nothing is read at this moment:

In [9]:
print(f"stored fields count={len(ds._reader._fields)}")

stored fields count=0


If we call any function on the fieldlist, it reads the messages into memory:

In [10]:
len(ds)

4

In [11]:
print(f"stored fields count={len(ds._reader._fields)}")

stored fields count=4
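The pattern observed above, where nothing is stored until the first call that needs the data, is a form of lazy loading. The sketch below reproduces the same before/after counts with a hypothetical `LazyFields` class. It is an illustration under that assumption, not the earthkit-data reader.

```python
class LazyFields:
    """Hold a loader and read everything from it on first use."""

    def __init__(self, loader):
        self._loader = loader  # callable returning an iterable of items
        self._fields = []
        self._loaded = False

    def _load(self):
        # Read the whole source once; later calls reuse the stored items.
        if not self._loaded:
            self._fields = list(self._loader())
            self._loaded = True

    def __len__(self):
        self._load()
        return len(self._fields)


lazy = LazyFields(lambda: iter(["msl", "2t", "msl", "2t"]))

before = len(lazy._fields)  # nothing has been read yet
count = len(lazy)           # triggers the read
after = len(lazy._fields)   # everything is now in memory

print(before, count, after)
```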


In [12]:
ds.ls()

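|   | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|--------|-----------|-------------|-------|----------|----------|-----------|----------|--------|----------|
| 0 | ecmf | msl | surface | 0 | 20230524 | 0 | 0 | an | 0 | reduced_gg |
| 1 | ecmf | 2t | surface | 0 | 20230524 | 0 | 0 | an | 0 | reduced_gg |
| 2 | ecmf | msl | surface | 0 | 20230524 | 1200 | 0 | an | 0 | reduced_gg |
| 3 | ecmf | 2t | surface | 0 | 20230524 | 1200 | 0 | an | 0 | reduced_gg |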


In [13]:
ds.sel(param="2t").ls()

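|   | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|--------|-----------|-------------|-------|----------|----------|-----------|----------|--------|----------|
| 0 | ecmf | 2t | surface | 0 | 20230524 | 0 | 0 | an | 0 | reduced_gg |
| 1 | ecmf | 2t | surface | 0 | 20230524 | 1200 | 0 | an | 0 | reduced_gg |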


In [14]:
ds.to_xarray()

### Reading into a file

We can retrieve data from FDB into a file, which is stored in the cache:

In [15]:
ds = earthkit.data.from_source("fdb", request, stream=False)

In [16]:
ds.ls()

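|   | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|--------|-----------|-------------|-------|----------|----------|-----------|----------|--------|----------|
| 0 | ecmf | msl | surface | 0 | 20230524 | 0 | 0 | an | 0 | reduced_gg |
| 1 | ecmf | 2t | surface | 0 | 20230524 | 0 | 0 | an | 0 | reduced_gg |
| 2 | ecmf | msl | surface | 0 | 20230524 | 1200 | 0 | an | 0 | reduced_gg |
| 3 | ecmf | 2t | surface | 0 | 20230524 | 1200 | 0 | an | 0 | reduced_gg |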


The data is now cached. Subsequent retrievals will use the cached file directly.