# Search and discover data using ```intake-esm```

This example shows how to use ```intake-esm``` to:
- search and discover the CMIP6 datasets available in the ENES Data Space archive 
- load the corresponding data assets (NetCDF files) into xarray datasets.

Finding, investigating and loading assets into data array containers such as xarray can be a daunting task due to the large number of files a user may be interested in. 

**```intake-esm```** (https://github.com/intake/intake-esm) is a data cataloging utility built on top of ```intake```, ```pandas``` and ```xarray```. It addresses these issues by providing functionality for searching, discovering, and loading data.

Import the required modules

In [None]:
import intake

**Open an ESM (Earth System Model) collection definition file**: ```intake-esm``` uses it to establish a link to a database (a CSV file) that contains the locations of the data assets and their associated metadata.

The ESM collection file is located in the ```data``` folder (```CMIP6_ESM_colletion_file.json```).

In [None]:
from os.path import expanduser, join

# Build the path to the collection file in the data folder of the home directory
esm_file = join(expanduser("~"), "data", "CMIP6_ESM_colletion_file.json")
col = intake.open_esm_datastore(esm_file)
col
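
To make the role of the collection file more concrete, the sketch below writes a minimal collection definition and its companion CSV database to a temporary folder, using only the standard library. The field names follow the ESM collection specification as it is commonly used with intake-esm, but the identifiers, the column set and the file paths are illustrative assumptions, not the contents of the ENES Data Space files.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical working folder; the real files live under ~/data
workdir = Path(tempfile.mkdtemp())

# CSV database: one row per data asset (NetCDF file) plus its metadata.
# The rows and paths are made up for illustration.
rows = [
    {"source_id": "CMCC-CM2-SR5", "experiment_id": "ssp585",
     "variable_id": "tas", "path": "/hypothetical/tas_ssp585.nc"},
    {"source_id": "CMCC-CM2-SR5", "experiment_id": "historical",
     "variable_id": "tas", "path": "/hypothetical/tas_historical.nc"},
]
csv_path = workdir / "catalog.csv"
with open(csv_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)

# Collection definition: points to the CSV and describes its columns
collection = {
    "esmcat_version": "0.1.0",
    "id": "toy-cmip6",
    "description": "Toy collection for illustration only",
    "catalog_file": str(csv_path),
    "attributes": [
        {"column_name": "source_id"},
        {"column_name": "experiment_id"},
        {"column_name": "variable_id"},
    ],
    "assets": {"column_name": "path", "format": "netcdf"},
}
json_path = workdir / "collection.json"
json_path.write_text(json.dumps(collection, indent=2))

print(json_path.read_text())
```

A real collection file may also carry an aggregation section that tells intake-esm how to combine assets into datasets; it is left out here to keep the sketch short.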

In [None]:
col.df.head()

In [None]:
import pprint

# List the unique values found in the source_id and experiment_id columns
uniques = col.unique(columns=["source_id","experiment_id"])
pprint.pprint(uniques, compact=True)

### Search and Discovery: execute a search query against the catalog

We are interested in:
- datasets produced by ```CMCC``` with the ```CMCC-CM2-SR5``` model
- the ```tas``` variable (near-surface air temperature)
- the ```ssp585``` experiment

In [None]:
query = dict(
    experiment_id="ssp585",
    source_id="CMCC-CM2-SR5",
    variable_id="tas",
)
cat = col.search(**query)
cat.df
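
Conceptually, a search keeps only the catalog rows whose columns match every term of the query. The sketch below mimics that behavior on a small, made-up list of rows in plain Python. It illustrates the idea only and is not how intake-esm implements `search`; the `filter_rows` helper is hypothetical.

```python
# Made-up catalog rows standing in for the CSV database
rows = [
    {"source_id": "CMCC-CM2-SR5", "experiment_id": "ssp585", "variable_id": "tas"},
    {"source_id": "CMCC-CM2-SR5", "experiment_id": "ssp585", "variable_id": "pr"},
    {"source_id": "CMCC-CM2-SR5", "experiment_id": "historical", "variable_id": "tas"},
    {"source_id": "CMCC-ESM2", "experiment_id": "ssp585", "variable_id": "tas"},
]


def filter_rows(rows, **query):
    """Hypothetical helper: keep rows that match every key/value in the query."""
    return [
        row for row in rows
        if all(row.get(column) == value for column, value in query.items())
    ]


query = dict(experiment_id="ssp585", source_id="CMCC-CM2-SR5", variable_id="tas")
matches = filter_rows(rows, **query)
print(matches)
```

Each additional query term can only narrow the result, which is why it helps to start with a broad query and add terms step by step.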

In [None]:
cat.keys()
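
The keys identify the datasets that will be built from the matching assets: assets sharing the same values for a set of grouping columns are combined under one key. The sketch below illustrates the idea in plain Python, assuming (hypothetically) that the grouping columns are `source_id`, `experiment_id` and `table_id` and that their values are joined with a dot. The actual grouping columns are defined in the collection file and may differ.

```python
from collections import defaultdict

# Made-up asset rows; the paths are placeholders
rows = [
    {"source_id": "CMCC-CM2-SR5", "experiment_id": "ssp585",
     "table_id": "Amon", "path": "/hypothetical/tas_Amon_a.nc"},
    {"source_id": "CMCC-CM2-SR5", "experiment_id": "ssp585",
     "table_id": "Amon", "path": "/hypothetical/tas_Amon_b.nc"},
    {"source_id": "CMCC-CM2-SR5", "experiment_id": "ssp585",
     "table_id": "day", "path": "/hypothetical/tas_day_a.nc"},
]

# Assumed grouping columns (a hypothetical choice for this sketch)
groupby_attrs = ["source_id", "experiment_id", "table_id"]

# Collect the asset paths that fall under each key
groups = defaultdict(list)
for row in rows:
    key = ".".join(row[attr] for attr in groupby_attrs)
    groups[key].append(row["path"])

for key, paths in groups.items():
    print(key, len(paths))
```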

### Access data

When you are satisfied with the results of your query, you can ask ```intake-esm``` to load the data assets (NetCDF files) into xarray datasets.

In [None]:
dset_dict = cat.to_dataset_dict()

In [None]:
dset_dict

### Analyze data

As an example, let's compute a temporal aggregation (the mean over time) and plot the result on a map.

In [None]:
# Take the first dataset returned by the query and average tas over time
dset = dset_dict[list(dset_dict)[0]]
mean = dset.tas.mean('time').compute()
mean.plot()
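
Averaging over `time` collapses that dimension and leaves one value per grid cell. The toy NumPy sketch below applies the same reduction to a made-up array with dimensions (time, lat, lon). The values are invented, and only the shape logic carries over to the real `tas` field.

```python
import numpy as np

# Made-up temperatures in kelvin: 3 time steps on a 2 x 2 (lat x lon) grid
tas = np.array([
    [[280.0, 282.0], [290.0, 292.0]],
    [[281.0, 283.0], [291.0, 293.0]],
    [[282.0, 284.0], [292.0, 294.0]],
])

# Mean along axis 0 (time): one value per (lat, lon) grid cell
time_mean = tas.mean(axis=0)
print(time_mean.shape)
print(time_mean)
```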

In [None]:
import warnings

import cartopy.crs as ccrs
from cartopy.util import add_cyclic_point
import matplotlib.pyplot as plt
import numpy as np

#Silence library warnings to keep the notebook output readable
warnings.filterwarnings("ignore")

fig = plt.figure(figsize=(10, 5), dpi=100)

#Add Geo axes to the figure with the specified projection (PlateCarree)
projection = ccrs.PlateCarree()
ax = plt.axes(projection=projection)

#Draw coastline and gridlines
ax.coastlines()

gl = ax.gridlines(crs=projection, draw_labels=True, linewidth=1, color='black', alpha=0.9, linestyle=':')
#top_labels/right_labels replace xlabels_top/ylabels_right, deprecated since cartopy 0.18
gl.top_labels = False
gl.right_labels = False

#Get the latitude and longitude coordinates of the time-mean temperature field
lat = mean.lat
lon = mean.lon

#Wraparound points in longitude
var_cyclic, lon_cyclic = add_cyclic_point(mean, coord=np.asarray(lon))
x, y = np.meshgrid(lon_cyclic,lat)

#Define 21 evenly spaced color levels spanning the data range
clevs = np.linspace(np.nanmin(mean), np.nanmax(mean), 21)

#Set filled contour plot
cnplot = ax.contourf(x, y, var_cyclic, clevs, cmap=plt.cm.jet, transform=projection)
plt.colorbar(cnplot,ax=ax)

ax.set_aspect('auto', adjustable=None)

plt.title('Near-Surface Air Temperature (K)')
plt.show()
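
The wraparound step in the plotting code deserves a short illustration. A global grid usually stops one step short of 360 degrees, which leaves a blank strip in a filled contour plot. Adding a cyclic point repeats the first longitude column at the end of the data and extends the longitude coordinate by one grid step. The NumPy sketch below reproduces that idea by hand on made-up values; it illustrates the concept and is not cartopy's implementation of `add_cyclic_point`.

```python
import numpy as np

# Made-up field on a 2 x 4 (lat x lon) grid with 90-degree longitude spacing
lon = np.array([0.0, 90.0, 180.0, 270.0])
data = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
])

# Repeat the first longitude column at the end of the data
data_cyclic = np.concatenate([data, data[:, :1]], axis=1)

# Extend the longitude coordinate by one grid step (270 + 90 = 360)
lon_cyclic = np.append(lon, lon[-1] + (lon[1] - lon[0]))

print(data_cyclic.shape)
print(lon_cyclic)
```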