# **Accessing CMIP6 catalogs with intake-esm**

This notebook shows how to access CMIP6 data that is not available in the shared folders on the JupyterHub.

First, import `intake` and `intake_esm`. We also suppress `FutureWarning` messages to keep the output readable.

In [None]:
import intake
import intake_esm
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)



Then open the catalog file of a known dataset, e.g. the [Pangeo CMIP6 Google Cloud collection](https://pangeo-data.github.io/pangeo-cmip6-cloud/):



In [None]:
cat = intake.open_esm_datastore("https://storage.googleapis.com/cmip6/pangeo-cmip6.json")
cat

The summary above tells us that this catalog contains 7674 data assets. You can get more information on the individual data assets contained in the catalog by looking at the underlying dataframe created when we load the catalog:

In [None]:
cat.df.head()

## Finding unique entries
To get the unique values of the columns in the catalog, intake-esm provides a `unique()` method.

Let's query the data catalog to see which models (`source_id`), experiments (`experiment_id`) and temporal frequencies (`table_id`) are available.

In [None]:
unique = cat.unique()
unique

In [None]:
unique['source_id']

In [None]:
unique['experiment_id']

In [None]:
unique['table_id']
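
Because `cat.df` is an ordinary pandas DataFrame, the output of `unique()` can be cross-checked with plain pandas. The sketch below is a minimal illustration on a toy table, not intake-esm's own implementation: the rows of `toy_df` are made up for the example, and only the column names come from the catalog.

```python
import pandas as pd

# Toy stand-in for cat.df: the rows are invented for illustration only.
toy_df = pd.DataFrame(
    {
        "source_id": ["CESM2", "CESM2", "GFDL-ESM4"],
        "experiment_id": ["historical", "ssp585", "historical"],
        "table_id": ["Lmon", "Lmon", "Amon"],
    }
)

# Collect the sorted unique values of each column of interest.
columns = ["source_id", "experiment_id", "table_id"]
unique_values = {col: sorted(toy_df[col].unique()) for col in columns}
print(unique_values)
```

On the real catalog, replacing `toy_df` with `cat.df` should give the same kind of per-column listing.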

The `search()` method allows the user to perform a query on a catalog using keyword arguments. The keyword argument names must match column names in the catalog. The search method returns a subset of the catalog with all the entries that match the provided query.

In the example below, we are going to search for the following:

* source: `CESM2` model
* experiment: `historical`
* member: `r1i1p1f1` ensemble member
* table: `Lmon` Land monthly
* variable: `tsl` Temperature of a soil layer (K)



In [None]:
cat_subset = cat.search(
    source_id="CESM2",
    experiment_id="historical",
    member_id="r1i1p1f1",
    table_id="Lmon",
    variable_id="tsl",
)

This query returns one dataset. You can also pass lists to the `search()` keyword arguments, e.g. `experiment_id=["historical", "ssp585"]`.

In [None]:
cat_subset
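
To make the matching rule concrete, here is a minimal pandas sketch of how such a query can be read: each keyword names a column, a scalar must match exactly, and a list matches any of its values. This is only an illustration of the behaviour described above, not intake-esm's implementation; `toy_search` and the rows of `toy_df` are hypothetical.

```python
import pandas as pd

# Toy stand-in for cat.df: the rows are invented for illustration only.
toy_df = pd.DataFrame(
    {
        "source_id": ["CESM2", "CESM2", "CESM2", "GFDL-ESM4"],
        "experiment_id": ["historical", "ssp585", "historical", "historical"],
        "table_id": ["Lmon", "Lmon", "Amon", "Lmon"],
        "variable_id": ["tsl", "tsl", "tas", "tsl"],
    }
)


def toy_search(df, **query):
    """Keep the rows whose columns match every keyword (hypothetical helper)."""
    mask = pd.Series(True, index=df.index)
    for column, wanted in query.items():
        # A scalar is treated as a one-element list of accepted values.
        values = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
        mask &= df[column].isin(values)
    return df[mask]


subset = toy_search(
    toy_df,
    source_id="CESM2",
    experiment_id=["historical", "ssp585"],
    variable_id="tsl",
)
print(subset)
```

The conditions on different columns are combined with a logical AND, while the values inside one list are combined with a logical OR.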

intake-esm provides convenience utilities for loading the query results into higher-level xarray datasets. The logic for merging and concatenating the query results is defined in the input JSON file and is available under the `.esmcat.aggregation_control` property of the catalog:

In [None]:
cat.esmcat.aggregation_control

Now, let's import some extra modules (`gcsfs` is needed to read from Google Cloud Storage):

In [None]:
import xarray
import gcsfs

Then open the query result as a dictionary of `xarray.Dataset` objects. Wait until the progress bar reaches 100%.

In [None]:
dset_dict = cat_subset.to_dataset_dict(
    xarray_open_kwargs={"consolidated": True, "decode_times": True, "use_cftime": True},
)


In [None]:
list(dset_dict.keys())
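
Each key is a dot-separated string. The sketch below splits such a key back into named fields. It assumes the key is built from the six attributes `activity_id`, `institution_id`, `source_id`, `experiment_id`, `table_id` and `grid_label`, in that order; check `cat.esmcat.aggregation_control` to confirm this for your catalog. `parse_key` is a hypothetical helper, not part of intake-esm.

```python
# Assumed key layout: six catalog attributes joined with "." (verify this
# against cat.esmcat.aggregation_control before relying on it).
KEY_FIELDS = [
    "activity_id",
    "institution_id",
    "source_id",
    "experiment_id",
    "table_id",
    "grid_label",
]


def parse_key(key, fields=KEY_FIELDS, sep="."):
    """Split a dataset key into a {field: value} dict (hypothetical helper)."""
    parts = key.split(sep)
    if len(parts) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(parts)}: {key!r}")
    return dict(zip(fields, parts))


info = parse_key("CMIP.NCAR.CESM2.historical.Lmon.gn")
print(info)
```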


Now, let's take one dataset from the dictionary and plot something.

In [None]:
ds = dset_dict['CMIP.NCAR.CESM2.historical.Lmon.gn']
ds

Next, we extract the soil temperature from the dataset as an `xarray.DataArray`, average it over the year 2000 at the soil level nearest to `depth=3.0`, convert it from kelvin to degrees Celsius, and plot it:

In [None]:
# Annual mean for 2000 at the soil level nearest to depth=3.0
tsl = (
    ds.tsl.sel(time=slice("2000-01", "2000-12"))
    .sel(depth=3.0, method="nearest")
    .mean(dim="time", keep_attrs=True)
)
# Convert kelvin to degrees Celsius and update the units attribute
tsl.values = tsl.values - 273.15
tsl.attrs.update({"units": "C"})
tsl.plot(robust=True)
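
The conversion above overwrites the array values in place. As an alternative, the following sketch performs the same kelvin-to-Celsius conversion without mutating the input and carries the attributes over explicitly. It runs on a small synthetic array whose values, coordinates and attributes are made up for the example, so it does not need the catalog.

```python
import numpy as np
import xarray as xr

# Synthetic soil temperatures in kelvin; values and coordinates are made up.
tsl_k = xr.DataArray(
    np.array([[273.15, 283.15], [263.15, 273.15]]),
    dims=("lat", "lon"),
    coords={"lat": [60.0, 70.0], "lon": [0.0, 90.0]},
    name="tsl",
    attrs={"units": "K", "long_name": "Temperature of Soil"},
)

# Subtracting a scalar returns a new DataArray; re-attach the original
# attributes explicitly and overwrite only the units.
tsl_c = (tsl_k - 273.15).assign_attrs({**tsl_k.attrs, "units": "degC"})
print(tsl_c.attrs["units"], float(tsl_c.min()), float(tsl_c.max()))
```

Keeping the original array untouched makes it safe to re-run the cell without subtracting 273.15 twice.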


We can also import cartopy and make a better plot, with the 0 °C contour marking the extent of frozen ground in the Northern Hemisphere.

In [None]:
import cartopy.crs as ccrs

In [None]:
tsl2d = tsl.squeeze(drop=True)  # drop length-1 dimensions before plotting
p = tsl2d.plot(
    subplot_kws=dict(projection=ccrs.NorthPolarStereo(), facecolor="white"),
    transform=ccrs.PlateCarree(),
)
p.axes.gridlines()
p.axes.coastlines()
# Blue line: the 0 °C contour
c = tsl2d.plot.contour(ax=p.axes, levels=[0], colors="blue", transform=ccrs.PlateCarree())
p.axes.set_extent((-180, 180, 45, 90), ccrs.PlateCarree())
