# General data reader for AQUA 
## Spatial regridding

The reader also includes regridding functionality. The regridder uses sparse matrix multiplication: it first generates the interpolation weights (an operation that needs to be done only once) and then reuses them for each regridding operation.
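To illustrate the "compute once, apply many times" idea, here is a minimal NumPy sketch. It is not the *smmregrid* implementation. The toy weights, the COO triplet layout and the `apply_weights` helper are hypothetical. Regridding one time step is a sparse matrix-vector product, and the same weights are reused for every step.

```python
import numpy as np

# Toy weights (hypothetical): 4 source cells remapped onto 2 target cells,
# stored as sparse COO triplets (target row, source column, weight).
rows = np.array([0, 0, 1, 1])
cols = np.array([0, 1, 2, 3])
vals = np.array([0.5, 0.5, 0.5, 0.5])
n_target = 2


def apply_weights(field, rows, cols, vals, n_target):
    """Sparse matrix-vector product: out[r] = sum of vals[k] * field[cols[k]] over all k with rows[k] == r."""
    return np.bincount(rows, weights=vals * field[cols], minlength=n_target)


# The same weights are reused for every time step, so nothing is recomputed.
time_series = np.array([[1.0, 3.0, 5.0, 7.0],
                        [0.0, 2.0, 4.0, 6.0]])
regridded = np.stack([apply_weights(step, rows, cols, vals, n_target) for step in time_series])
```

Generating the weights is the expensive part. Applying them only costs one pass over the non-zero entries per field.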

If the regridding weights do not already exist, the reader generates them automatically. It stores them in a directory specified in a machine-specific `config/machines/<machine-name>/regrid.yaml` file, where `<machine-name>` is the name of the current machine (e.g. "levante"). This directory could be shared among the members of a research group to reduce the need to recompute the weights. The same file also contains a list of predefined target grids (only regular lon-lat for now). For example, "r100" is a regular grid at 1° resolution. New target grids can be defined in `regrid.yaml`. Since CDO is used to generate the weights, a sample target file could also be used as a grid description.
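The examples below use "r100", "r200" and "r250". The following sketch assumes that these names follow the pattern `rNNN` = NNN/100 degrees. This is an assumption extrapolated from "r100" being 1°. Under that assumption, the helper `aqua_name_to_cdo_grid` (hypothetical, not part of AQUA) translates such a name into CDO's own global lon-lat grid name `r<nlon>x<nlat>`.

```python
def aqua_name_to_cdo_grid(name: str) -> str:
    """Translate an 'rNNN' grid name (assumed to mean NNN/100 degrees) into a CDO 'r<nlon>x<nlat>' grid name."""
    if not (name.startswith("r") and name[1:].isdigit()):
        raise ValueError(f"unsupported grid name: {name}")
    resolution = int(name[1:]) / 100.0  # degrees, e.g. r100 -> 1.0, r250 -> 2.5
    nlon = 360.0 / resolution
    nlat = 180.0 / resolution
    if not (nlon.is_integer() and nlat.is_integer()):
        raise ValueError(f"a {resolution} degree spacing does not divide the globe evenly")
    return f"r{int(nlon)}x{int(nlat)}"


print(aqua_name_to_cdo_grid("r100"), aqua_name_to_cdo_grid("r250"))
```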

3D weights are particularly expensive in terms of computation and memory. We suggest running the first weight generation on a dedicated computing node. This happens the first time you initialize the Reader for a given data source and target grid. For example, FESOM NG5 data with `nproc=16` computing cores (the default, which spawns 16 parallel CDO processes) need up to about 170 GB of memory, and the calculation takes about 10 minutes. This needs to be done only once, to precompute the weights.
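The "compute once, then reuse" behaviour amounts to a file cache. Here is a minimal sketch with NumPy `.npz` files. The reader itself stores NetCDF weight files generated by CDO, and `load_or_compute_weights` and the toy weights are hypothetical.

```python
import os
import tempfile

import numpy as np


def load_or_compute_weights(path, compute):
    """Return the weights cached at `path`, calling `compute()` and saving the result only if the file is missing."""
    if not path.endswith(".npz"):
        raise ValueError("path must end with .npz (np.savez would otherwise append the suffix)")
    if os.path.exists(path):
        with np.load(path) as cached:
            return {key: cached[key] for key in cached.files}
    weights = compute()  # the expensive step, done only once
    np.savez(path, **weights)
    return weights


calls = []  # records how often the toy "expensive" computation runs


def compute_toy_weights():
    calls.append(1)
    return {"rows": np.array([0, 0, 1]), "cols": np.array([0, 1, 2]), "vals": np.array([0.5, 0.5, 1.0])}


with tempfile.TemporaryDirectory() as weights_dir:
    path = os.path.join(weights_dir, "weights_toy_r100.npz")
    first = load_or_compute_weights(path, compute_toy_weights)   # computed and saved
    second = load_or_compute_weights(path, compute_toy_weights)  # loaded from disk
```

Sharing `weights_dir` among users, as suggested above, means only the first user pays for `compute()`.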

CDO is used to generate the weights, so it must be available in your environment. It is not needed to use the regridder once the weights exist. For now (this may change in the future), the regridder configuration file also stores information on files containing a grid description for different data sources. A fixed path to the CDO executable for each machine can be specified in the `config/config.yaml` file. If it is not specified, the system `$PATH` will be used.
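The executable lookup described above can be sketched as follows. The helpers `resolve_cdo` and `weights_command` and the path `/opt/cdo/bin/cdo` are hypothetical. The sketch only builds a command and does not claim this is exactly how AQUA calls CDO. It uses `genycon` because the weight file names below contain "ycon". `cdo genycon,<grid> infile outfile` is CDO's syntax for generating first-order conservative (YAC) weights.

```python
import shutil


def resolve_cdo(configured_path=None):
    """Use the machine-specific CDO path if one is configured, otherwise search the system $PATH."""
    if configured_path:
        return configured_path
    return shutil.which("cdo")  # None if CDO is not installed


def weights_command(cdo, method, target_grid, source_file, weights_file):
    """Build (but do not run) a CDO weight-generation command line."""
    return [cdo, f"{method},{target_grid}", source_file, weights_file]


cmd = weights_command(resolve_cdo("/opt/cdo/bin/cdo"), "genycon", "r360x180", "source.nc", "weights.nc")
print(" ".join(cmd))
```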

For regridding to work you will need the external [smmregrid](https://github.com/jhardenberg/smmregrid) module.

```python
from aqua import Reader
```

### Example 1: IFS

We load IFS data specifying that we wish to regrid them to a 2° grid.

```python
reader = Reader(model="IFS", exp="tco2559-ng5", source="ICMGG_atm2d", regrid="r200", fix=False)
data = reader.retrieve()
```


By default, the data are still on the original (raw) grid. Let's look at the 2 m temperature (`2t`):

```python
data["2t"][1, :]
```

... but we can now ask to regrid it (or part of it) to the destination grid which we chose when we instantiated the reader.

```python
tasr = reader.regrid(data["2t"][0:3, :])
```

```python
tasr[0, :, :].plot()
```

### Example 2: ICON

Instantiate a reader for ICON data, specifying that we want to interpolate to a 2° grid. The target grids are defined in the `regrid.yaml` file, and the weights are saved in the directory specified there. If the weights file does not already exist in our collection, it will be created automatically.

```python
from aqua import Reader

reader = Reader(model="ICON", exp="ngc2009", source="atm_2d_ml_R02B09", regrid="r200")
```

Load the actual data. By default these data have not been regridded yet. 

You could regrid them directly by passing `regrid=True` to `retrieve()`, but be warned that without a selection on dates this will take longer. It is usually more efficient to first load the data, select a subset, and then regrid.
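Selecting first is safe because regridding acts on each time step independently. This toy NumPy sketch (synthetic data, hypothetical weights, not the AQUA code path) checks that both orders give the same result, while selecting first does 3 sparse products instead of 1000.

```python
import numpy as np

# Toy data (hypothetical): 1000 time steps on 4 source cells, fixed sparse weights onto 2 target cells
rng = np.random.default_rng(0)
field = rng.normal(size=(1000, 4))
rows = np.array([0, 0, 1, 1])
cols = np.array([0, 1, 2, 3])
vals = np.full(4, 0.5)


def regrid(data):
    """Apply the sparse weights to each time step of a (time, cell) array."""
    return np.stack([np.bincount(rows, weights=vals * step[cols], minlength=2) for step in data])


select_then_regrid = regrid(field[:3])   # 3 sparse products
regrid_then_select = regrid(field)[:3]   # 1000 sparse products, 997 of them discarded
```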

```python
data = reader.retrieve()
```

Inspecting the dataset confirms that it is still on the original grid:

```python
data
```

Now we actually regrid part of the data (the first 96 time steps):

```python
tasr = reader.regrid(data["2t"][0:96, :])
```

```python
tasr
```

```python
tasr.mean("time").plot()
```

### Example 3: Original 2D FESOM data - further interpolation

```python
from aqua import Reader

reader = Reader(model="FESOM", exp="tco2559-ng5-cycle3", source="2D_daily_native", regrid="r250")
```

```python
data = reader.retrieve()
```

```python
# zos is the sea surface height, so name the variable accordingly
zos0 = data.zos.isel(time=0)
```

```python
zos0
```

```python
zosr = reader.regrid(zos0)
```

```python
zosr.plot()
```

### Example 4: Original 3D FESOM data - further interpolation

The regridder can also deal with 3D masked fields, thanks to recent functionality in *smmregrid*.

This works with any input source, even one containing multiple vertical coordinates.
Let's show this by loading original 3D FESOM data.

Please be advised that computing the weights is a very memory-intensive task. This happens only the first time, if the weights are not already available. A full Levante node with 256 GB of memory may be needed for 3D FESOM data from the tco2559 experiment. With the default 16 cores, generating the weights for FESOM, which has two vertical coordinates, may take about 40 minutes, so please be patient.
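Masking changes the interpolation at every level, because the set of valid (ocean) source cells changes with depth. This NumPy sketch shows one common way to handle it: renormalize the weights by the valid fraction of each target cell. The toy weights and the `regrid_masked` helper are hypothetical, and this is not a description of *smmregrid* internals. Here the renormalization is done on the fly. Storing separate weights per level instead, which is presumably part of what makes 3D weights large, moves that work into the one-off generation step.

```python
import numpy as np

# Toy weights (hypothetical): 4 source cells onto 2 target cells
rows = np.array([0, 0, 1, 1])
cols = np.array([0, 1, 2, 3])
vals = np.array([0.5, 0.5, 0.5, 0.5])


def regrid_masked(field, n_target=2):
    """Regrid a field where NaN marks masked (land) cells, renormalizing by the valid weight of each target cell."""
    valid = ~np.isnan(field)
    filled = np.where(valid, field, 0.0)
    numerator = np.bincount(rows, weights=vals * filled[cols], minlength=n_target)
    valid_weight = np.bincount(rows, weights=vals * valid[cols], minlength=n_target)
    out = np.full(n_target, np.nan)  # target cells with no valid source stay masked
    np.divide(numerator, valid_weight, out=out, where=valid_weight > 0)
    return out


surface = regrid_masked(np.array([1.0, np.nan, 5.0, 7.0]))    # one land cell near the surface
deep = regrid_masked(np.array([np.nan, np.nan, 4.0, np.nan]))  # more land at depth
```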

The weights for the "r100" grid (1° regular grid) have been precomputed on Levante.

```python
from aqua import Reader

reader = Reader(model="FESOM", exp="tco2559-ng5-cycle3", source="3D_daily_native", regrid="r100")
```

The reader has loaded the precomputed 3D weights (for example from `weights_FESOM_tco2559-ng5_original_3d_ycon_r100_l3d.nc`) and organized them in a dictionary keyed by vertical coordinate ("2d" for 2D variables):

```python
reader.weights
```
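A dictionary keyed by vertical coordinate implies a dispatch step: each variable picks the weights matching the vertical dimension it carries. Here is a minimal sketch of such a lookup. The `weights_key` helper is hypothetical and the horizontal dimension name `ncells` is made up. The vertical names `nz1` and `nz` are the FESOM coordinates used in the examples here. The real keys in `reader.weights` may differ.

```python
def weights_key(dims, vertical_coords):
    """Pick the weights entry for a variable from its dimensions, or '2d' if it has no vertical coordinate."""
    matches = [dim for dim in dims if dim in vertical_coords]
    if len(matches) > 1:
        raise ValueError(f"variable has more than one vertical coordinate: {matches}")
    return matches[0] if matches else "2d"


vertical = ["nz1", "nz"]  # FESOM's two vertical coordinates, as used in the examples
print(weights_key(("time", "nz1", "ncells"), vertical))  # 3D variable on nz1
print(weights_key(("time", "ncells"), vertical))         # 2D variable
```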

We can now retrieve the data

```python
data = reader.retrieve()
data
```

and regrid them (alternatively, we could have passed `regrid=True` directly to the `retrieve()` method):

```python
s0 = data.so.isel(time=0)
s0r = reader.regrid(s0)
```

```python
s0r.isel(nz1=0).plot()
```

```python
s0r.isel(nz1=55).plot()
```

We can also ask the regridder to regrid all variables at once upon retrieval (in this case the data include variables depending on two different vertical coordinates):

```python
data = reader.retrieve(regrid=True)
w0 = data.wo.isel(time=0)
w0.isel(nz=5).plot()
```

### Example 5: ICON on HEALPix

We can also regrid ICON HEALPix data from NextGEMS Cycle 3. Here is an example for three different zoom levels (i.e. levels of the grid hierarchy, with 0 the coarsest):

```python
import matplotlib.pyplot as plt
from aqua import Reader

# Regrid the first time step of 2t from three HEALPix zoom levels onto the same 1° grid
for zoom in [0, 3, 6]:
    reader = Reader(model="ICON", exp="ngc3028", source="P1D", zoom=zoom, regrid="r100")
    data = reader.retrieve()
    tas = reader.regrid(data["2t"][0])
    plt.figure()
    tas.plot()
    plt.title(f"zoom={zoom}")
```
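To put the zoom levels in context, here is a short sketch of the standard HEALPix relations, assuming the usual convention that zoom level z means nside = 2**z. The number of pixels is 12 · nside², and the approximate pixel size is the square root of the mean pixel area. At zoom 0 the source pixels are far larger than the 1° target cells, so the regridded map mostly reflects the coarse source resolution.

```python
import math


def healpix_npix(zoom):
    """Number of HEALPix pixels at a zoom level, with nside = 2**zoom and npix = 12 * nside**2."""
    nside = 2 ** zoom
    return 12 * nside ** 2


def healpix_resolution_deg(zoom):
    """Approximate pixel size in degrees: the square root of the mean pixel area."""
    sphere_area_deg2 = 4 * math.pi * (180 / math.pi) ** 2
    return math.sqrt(sphere_area_deg2 / healpix_npix(zoom))


for zoom in [0, 3, 6]:
    print(zoom, healpix_npix(zoom), round(healpix_resolution_deg(zoom), 2))
```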

### Example 6: 3D ICON data (NextGEMS cycle 3, HEALPix)

We load NextGEMS data on a HEALPix grid at zoom level 6, regridding them already when reading. The weights will be computed only the first time this is called (if they are not already available in the grids directory).
We retrieve four variables: two 2D ones (zos, 2t) and two 3D ones living on different vertical levels (tke, to).

```python
from aqua import Reader

zoom = 6
reader = Reader(model="ICON", exp="ngc3028", source="P1D", zoom=zoom, regrid="r100")
data = reader.retrieve(vars=["to", "tke", "zos", "2t"], regrid=True)
```

3D ocean data have been correctly masked:

```python
data.ocpt.isel(depth_full=60).isel(time=0).plot()
```

```python
data.tke.isel(depth_half=12).isel(time=0).plot()
```

```python
data.zos.isel(time=0).plot()
```

while atmospheric variables have not been masked:

```python
data["2t"].isel(time=0).plot()
```

The Reader distinguishes ICON atmospheric from ocean components by checking for the presence of the "component: ocean" attribute (any attribute could be used; this is defined in `regrid.yaml`).
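The attribute check can be sketched in a few lines. The `is_ocean` helper and the attribute dictionaries are made up for illustration. Only the "component: ocean" pair comes from the text above, and the actual key and value used by the Reader are whatever `regrid.yaml` defines.

```python
def is_ocean(attrs, key="component", value="ocean"):
    """Return True if a variable's attributes mark it as part of the ocean component (key/value are configurable)."""
    return attrs.get(key) == value


# Hypothetical variable attributes, loosely modelled on the example above
variables = {
    "2t": {"units": "K"},
    "zos": {"component": "ocean"},
    "to": {"component": "ocean"},
}
masked = sorted(name for name, attrs in variables.items() if is_ocean(attrs))
print(masked)
```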