## The Reader class

First, we import the `Reader` class, the cornerstone of AQUA.

In [1]:
from aqua import Reader

In AQUA, the available data are organized in a catalog: a collection of YAML files that hides the technical details of the data from the end user. Catalogs are meant to be shared among users to make data retrieval easier.

When we instantiate the `Reader` class, we are not yet retrieving the data; we are only preparing the `Reader` object to do so.

The mandatory arguments of the `Reader` class are the three levels of the AQUA catalog hierarchy:

model, exp, source

where:

- `model` is the name of the climate model, reanalysis or observational dataset
- `exp` is the name of the experiment. For production runs it is control-1990, historical-1990 or ssp370, while for test runs it may be the autosubmit expid
- `source` is a third level that can be used to specify frequency, realm, resolution, etc.

Another very important argument is `catalog`, which specifies the catalog in which the YAML files are stored. The `Reader` can guess the catalog if the `catalog` argument is not passed, but if another catalog might contain the same model-exp-source triplet, you should specify it explicitly.

Let's try to open the monthly dataset of the ERA5 reanalysis.

In [2]:
reader_era5 = Reader(catalog='obs', model='ERA5', exp='era5', source='monthly')

To get our dataset as an `xarray.Dataset` object, we must call the `retrieve` method of the Reader object.

In [3]:
data_era5 = reader_era5.retrieve()

The `data_era5` object is an `xarray.Dataset` containing all the variables of the dataset at all times.
The data are lazily loaded: nothing is read into memory until you ask for it. This is very useful when dealing with big datasets, and it allows you to work natively with `dask` to parallelize the code.

In [4]:
data_era5

[Output: `xarray.Dataset` repr. Every variable is a lazy dask array of dtype float32. Variables without a `plev` dimension have shape (1008, 721, 1440), 3.90 GiB each, split into 84 chunks of shape (12, 721, 1440), 47.53 MiB per chunk. Variables on pressure levels have shape (1008, 8, 721, 1440), 31.19 GiB each, split into 672 chunks of shape (12, 1, 721, 1440), 47.53 MiB per chunk.]

Finally, we can get some information on the source with the `info` method of the Reader object.

In [5]:
reader_era5.info()

Reader for model ERA5, experiment era5, source monthly
Data fixing is active:
  Fixer name is ERA5-destine-v1
Metadata:
  source_grid_name: era5-r025
  fixer_name: ERA5-destine-v1
  catalog_dir: /home/b/b382289/.aqua/catalogs/obs/catalog/ERA5/
  dims: {'time': 1008, 'lon': 1440, 'lat': 721, 'plev': 8}
  data_vars: {'CI': ['time', 'lon', 'lat'], 'E': ['time', 'lon', 'lat'], 'HCC': ['time', 'lon', 'lat'], 'LCC': ['time', 'lon', 'lat'], 'MCC': ['time', 'lon', 'lat'], 'MSL': ['time', 'lon', 'lat'], 'Q': ['time', 'lon', 'lat', 'plev'], 'SLHF': ['time', 'lon', 'lat'], 'SSHF': ['time', 'lon', 'lat'], 'SSR': ['time', 'lon', 'lat'], 'SSTK': ['time', 'lon', 'lat'], 'STR': ['time', 'lon', 'lat'], 'T': ['time', 'lon', 'lat', 'plev'], 'T2M': ['time', 'lon', 'lat'], 'TCC': ['time', 'lon', 'lat'], 'TP': ['time', 'lon', 'lat'], 'TSR': ['time', 'lon', 'lat'], 'TTR': ['time', 'lon', 'lat'], 'U': ['time', 'lon', 'lat', 'plev'], 'V': ['time', 'lon', 'lat', 'plev']}
  coords: ('time', 'lon', 'lat', 'plev')

## Reader important arguments

We have seen the mandatory arguments of the Reader class, but it also accepts other important arguments:

- `fix`, True by default. It fixes the metadata of the dataset (variable names, units, etc.). By default it makes your data GSV compliant, but custom fixes can be built to adapt your dataset to another standard.
- `areas`, also True by default. If True, the Reader loads the areas of the grid cells, which are needed to compute area-weighted means of the data.
- `regrid`, similar to `areas`: if you want to regrid your data, the Reader has to load the target grid. To enable it, set `regrid` to the resolution of the target grid, e.g. 'r100' for a 1°x1° grid.
- `startdate` and `enddate`, the start and end dates of the data to retrieve. For big datasets even exploring the metadata can be time consuming, so it is useful to specify the time range of interest.
- `loglevel`, the verbosity of the logger. The default is 'WARNING', but you can set it to 'INFO' or 'DEBUG' to get more detail on the operations performed by the Reader. The level also propagates to internal functions, so the output can become quite verbose.

Additional `**kwargs` can be defined in the catalog entry and passed when calling the Reader, for example to select ensemble members.

!! Only the fixer is applied automatically to the dataset. The regridder and the areas are applied only when needed, i.e. when you ask for them.
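To illustrate how these options combine, here is a minimal sketch. The argument names come from the list above; the values (the `r100` target grid, the dates and their `YYYY-MM-DD` string format, the `INFO` log level) are assumptions chosen for the example, and `open_reader` is a hypothetical helper, not part of AQUA.

```python
# Reader arguments collected in one place (values are illustrative assumptions).
READER_KWARGS = {
    "catalog": "obs",
    "model": "ERA5",
    "exp": "era5",
    "source": "monthly",
    "fix": True,                # default: apply the metadata fixer
    "areas": True,              # default: load the grid-cell areas
    "regrid": "r100",           # target grid for later regridding (1°x1°)
    "startdate": "1990-01-01",  # assumed date-string format
    "enddate": "2005-12-31",
    "loglevel": "INFO",
}


def open_reader(**overrides):
    """Hypothetical helper: build a Reader from READER_KWARGS plus overrides."""
    # Imported here so the dictionary above can be inspected without AQUA installed.
    from aqua import Reader

    return Reader(**{**READER_KWARGS, **overrides})
```

With AQUA and the catalog installed, `reader = open_reader(loglevel="DEBUG")` would create the same Reader with a more verbose logger.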

## Retrieve arguments

The `retrieve` method of the Reader object has some important arguments:

- `var`, the variable to retrieve. If not specified, all the variables of the dataset will be retrieved.
- `startdate` and `enddate`, the start and end dates of the data to retrieve. If not specified, the whole dataset is retrieved. As you can see, these can be specified either at Reader or at retrieve level. If you already know which dates you need, it is better to specify them at Reader level.
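A selective retrieval could look like the sketch below. `retrieve_subset` is a hypothetical wrapper introduced only for this example; it assumes `reader` is a Reader instance such as `reader_era5`, and it forwards only the arguments that are actually set, so the `retrieve` defaults described above still apply.

```python
def retrieve_subset(reader, var=None, startdate=None, enddate=None):
    """Hypothetical wrapper: call reader.retrieve with only the arguments that are set."""
    kwargs = {"var": var, "startdate": startdate, "enddate": enddate}
    # Drop unset arguments so retrieve() falls back to its own defaults.
    kwargs = {key: value for key, value in kwargs.items() if value is not None}
    return reader.retrieve(**kwargs)
```

For instance, `retrieve_subset(reader_era5, var="2t", startdate="1990-01-01", enddate="2005-12-31")` would ask for a single variable over a limited period; the `YYYY-MM-DD` date format is an assumption.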

## xarray compatibility

The data retrieved by the Reader are xarray.Dataset objects, so you can use all the xarray functionalities to analyze and plot the data.

Let's try to do a little exercise with the ERA5 data and the IFS-NEMO historical simulation from phase1.

You can open these two entries:

- catalog='climatedt-phase1', model='IFS-NEMO', exp='historical-1990', source='lra-r100-monthly'
- catalog='obs', model='ERA5', exp='era5', source='monthly'

Both sources are on regular lon-lat grids, so the global mean temperature (var=2t) is easy to compute for each of them and to plot as a time series. Can you do it for the period 1990-2005 using the Reader and the xarray functionalities?

If you need a hint or the solution, please check the hedgedoc of the AQUAthon.
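As a starting point, the sketch below shows one common way to compute an area-weighted global mean on a regular lon-lat grid with plain xarray: weighting each grid point by the cosine of its latitude. It runs on synthetic data rather than on the catalog entries; the constant 288 K field, the 1° grid and the `global_mean` helper are assumptions made for the illustration, and this is not necessarily the approach used in the AQUAthon solution.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a retrieved field on a regular 1-degree lon-lat grid.
time = pd.date_range("1990-01-01", "2005-12-01", freq="MS")  # monthly, 1990-2005
lat = np.arange(-89.5, 90.0, 1.0)
lon = np.arange(0.5, 360.0, 1.0)
field = xr.DataArray(
    np.full((time.size, lat.size, lon.size), 288.0, dtype="float32"),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="2t",
)


def global_mean(data):
    """Area-weighted mean over lat and lon, using cos(latitude) as the weight."""
    weights = np.cos(np.deg2rad(data["lat"]))
    return data.weighted(weights).mean(dim=("lat", "lon"))


series = global_mean(field)  # one value per time step
```

With real data, the same function could be applied to a variable of the retrieved dataset after selecting the period, e.g. `data_era5["2t"].sel(time=slice("1990", "2005"))`, and `series.plot()` would then draw the time series (this requires matplotlib).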