<img src="../images/intake.png" width=250 alt="Intake logo"></img>
<img src="../images/cmip6-cookbook-thumbnail.png" width=250 alt="CMIP6 image"></img>

# Load CMIP6 Data with Intake-ESM

---

## Overview

[Intake-ESM](https://intake-esm.readthedocs.io/en/latest/) is an experimental package that aims to provide a higher-level interface for searching and loading Earth System Model data archives, such as CMIP6. The package is under very active development, and features may be unstable. Please report any [issues or suggestions on GitHub](https://github.com/intake/intake-esm/issues).

## Prerequisites

| Concepts | Importance | Notes |
| --- | --- | --- |
| [Intro to Xarray](https://foundations.projectpythia.org/core/xarray/xarray-intro.html) | Necessary | |
| [Understanding of NetCDF](https://foundations.projectpythia.org/core/data-formats/netcdf-cf.html) | Helpful | Familiarity with metadata structure |

- **Time to learn**: 5 minutes

---

## Imports

In [1]:
import xarray as xr
xr.set_options(display_style='html')
import intake
%matplotlib inline

## Loading Data

Intake-ESM works by parsing an [ESM Collection Spec](https://github.com/NCAR/esm-collection-spec/) and converting it to an [Intake](https://intake.readthedocs.io/en/latest/) catalog. The collection spec is stored in a `.json` file. Here we open it using Intake.

In [2]:
cat_url = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
col = intake.open_esm_datastore(cat_url)
col

| | unique |
| --- | --- |
| activity_id | 18 |
| institution_id | 36 |
| source_id | 88 |
| experiment_id | 170 |
| member_id | 657 |
| table_id | 37 |
| variable_id | 700 |
| grid_label | 10 |
| zstore | 514818 |
| dcpp_init_year | 60 |


We can now use Intake methods to search the collection, and, if desired, export a Pandas dataframe.

In [3]:
cat = col.search(experiment_id=['historical', 'ssp585'], table_id='Oyr', variable_id='o2',
                 grid_label='gn')
cat.df

| | activity_id | institution_id | source_id | experiment_id | member_id | table_id | variable_id | grid_label | zstore | dcpp_init_year | version |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | CMIP | IPSL | IPSL-CM6A-LR | historical | r8i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | | 20180803 |
| 1 | CMIP | IPSL | IPSL-CM6A-LR | historical | r5i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | | 20180803 |
| 2 | CMIP | IPSL | IPSL-CM6A-LR | historical | r26i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | | 20180803 |
| 3 | CMIP | IPSL | IPSL-CM6A-LR | historical | r2i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | | 20180803 |
| 4 | CMIP | IPSL | IPSL-CM6A-LR | historical | r6i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/histor... | | 20180803 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 168 | CMIP | CSIRO | ACCESS-ESM1-5 | historical | r11i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/hist... | | 20200803 |
| 169 | CMIP | EC-Earth-Consortium | EC-Earth3-CC | historical | r1i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-E... | | 20210113 |
| 170 | ScenarioMIP | EC-Earth-Consortium | EC-Earth3-CC | ssp585 | r1i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/ScenarioMIP/EC-Earth-Consorti... | | 20210113 |
| 171 | CMIP | CMCC | CMCC-ESM2 | historical | r1i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP6/CMIP/CMCC/CMCC-ESM2/historica... | | 20210114 |
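
As a rough mental model of the query above: a row is kept only if it matches every keyword, and a list of values for one keyword means "any of these". The sketch below imitates that behaviour in plain Python on a few rows copied from the table. It is not how Intake-ESM is implemented; `search_rows` is a hypothetical helper, and the last row is made up to show a non-match.

```python
# Three rows copied from the search result, plus one hypothetical non-matching row.
rows = [
    {"source_id": "IPSL-CM6A-LR", "experiment_id": "historical",
     "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
    {"source_id": "EC-Earth3-CC", "experiment_id": "ssp585",
     "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
    {"source_id": "CMCC-ESM2", "experiment_id": "historical",
     "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
    # Hypothetical row (not taken from the catalog output above).
    {"source_id": "EC-Earth3-CC", "experiment_id": "ssp245",
     "table_id": "Omon", "variable_id": "thetao", "grid_label": "gn"},
]


def search_rows(rows, **query):
    """Keep rows matching all keywords; a list of values means 'any of'."""
    def matches(row):
        for column, wanted in query.items():
            allowed = [wanted] if isinstance(wanted, str) else list(wanted)
            if row.get(column) not in allowed:
                return False
        return True

    return [row for row in rows if matches(row)]


hits = search_rows(
    rows,
    experiment_id=["historical", "ssp585"],
    table_id="Oyr",
    variable_id="o2",
    grid_label="gn",
)
print([row["source_id"] for row in hits])
```

Only the hypothetical row is dropped, because its `experiment_id`, `table_id`, and `variable_id` fall outside the query.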


Intake knows how to automatically open the Datasets using Xarray. Furthermore, Intake-ESM contains special logic to concatenate and merge the individual results of our query into larger, higher-level aggregated Xarray Datasets.

In [4]:
dset_dict = cat.to_dataset_dict(zarr_kwargs={'consolidated': True})
list(dset_dict.keys())

--> The keys in the returned dictionary of datasets are constructed as follows:
	'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'


['ScenarioMIP.DKRZ.MPI-ESM1-2-HR.ssp585.Oyr.gn',
 'ScenarioMIP.NCC.NorESM2-MM.ssp585.Oyr.gn',
 'ScenarioMIP.DWD.MPI-ESM1-2-HR.ssp585.Oyr.gn',
 'ScenarioMIP.CCCma.CanESM5-CanOE.ssp585.Oyr.gn',
 'CMIP.HAMMOZ-Consortium.MPI-ESM-1-2-HAM.historical.Oyr.gn',
 'CMIP.IPSL.IPSL-CM5A2-INCA.historical.Oyr.gn',
 'ScenarioMIP.EC-Earth-Consortium.EC-Earth3-CC.ssp585.Oyr.gn',
 'CMIP.MRI.MRI-ESM2-0.historical.Oyr.gn',
 'CMIP.MPI-M.MPI-ESM1-2-LR.historical.Oyr.gn',
 'CMIP.CMCC.CMCC-ESM2.historical.Oyr.gn',
 'ScenarioMIP.MPI-M.MPI-ESM1-2-LR.ssp585.Oyr.gn',
 'CMIP.CSIRO.ACCESS-ESM1-5.historical.Oyr.gn',
 'CMIP.NCC.NorESM2-MM.historical.Oyr.gn',
 'ScenarioMIP.NCC.NorESM2-LM.ssp585.Oyr.gn',
 'CMIP.NCC.NorESM2-LM.historical.Oyr.gn',
 'ScenarioMIP.MRI.MRI-ESM2-0.ssp585.Oyr.gn',
 'CMIP.EC-Earth-Consortium.EC-Earth3-CC.historical.Oyr.gn',
 'CMIP.MIROC.MIROC-ES2L.historical.Oyr.gn',
 'CMIP.MPI-M.MPI-ESM1-2-HR.historical.Oyr.gn',
 'ScenarioMIP.CMCC.CMCC-ESM2.ssp585.Oyr.gn',
 'ScenarioMIP.MIROC.MIROC-ES2L.ssp585.
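
The key pattern printed above can be reproduced by joining six catalog columns with a dot. The sketch below does that for three rows copied from the search result and groups their `member_id` values by key, which illustrates why several catalog rows can end up in a single dictionary entry. It is a simplified illustration, not Intake-ESM's own code; `KEY_FIELDS` and `make_key` are hypothetical names.

```python
from collections import defaultdict

# Column order taken from the message printed by to_dataset_dict().
KEY_FIELDS = ["activity_id", "institution_id", "source_id",
              "experiment_id", "table_id", "grid_label"]


def make_key(row, sep="."):
    """Join the key columns of one catalog row into a dictionary key."""
    return sep.join(row[field] for field in KEY_FIELDS)


# Rows copied from the search result (only the columns needed here).
rows = [
    {"activity_id": "CMIP", "institution_id": "IPSL", "source_id": "IPSL-CM6A-LR",
     "experiment_id": "historical", "member_id": "r8i1p1f1",
     "table_id": "Oyr", "grid_label": "gn"},
    {"activity_id": "CMIP", "institution_id": "IPSL", "source_id": "IPSL-CM6A-LR",
     "experiment_id": "historical", "member_id": "r5i1p1f1",
     "table_id": "Oyr", "grid_label": "gn"},
    {"activity_id": "CMIP", "institution_id": "CMCC", "source_id": "CMCC-ESM2",
     "experiment_id": "historical", "member_id": "r1i1p1f1",
     "table_id": "Oyr", "grid_label": "gn"},
]

# Rows that share a key differ only in columns outside KEY_FIELDS (here, member_id).
members_by_key = defaultdict(list)
for row in rows:
    members_by_key[make_key(row)].append(row["member_id"])

for key, members in members_by_key.items():
    print(key, members)
```

The two IPSL rows map to the same key, so their ensemble members are collected together, while the CMCC row gets a key of its own.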

In [5]:
ds = dset_dict['CMIP.CCCma.CanESM5.historical.Oyr.gn']
ds

Dask array summaries from the Dataset output, one row per array:

| Array size | Chunk size | Array shape | Chunk shape | Dask graph | Data type |
| --- | --- | --- | --- | --- | --- |
| 818.44 kiB | 818.44 kiB | (291, 360) | (291, 360) | 1 chunks in 170 graph layers | float64 numpy.ndarray |
| 720 B | 720 B | (45, 2) | (45, 2) | 1 chunks in 170 graph layers | float64 numpy.ndarray |
| 818.44 kiB | 818.44 kiB | (291, 360) | (291, 360) | 1 chunks in 170 graph layers | float64 numpy.ndarray |
| 2.58 kiB | 2.58 kiB | (165, 2) | (165, 2) | 1 chunks in 170 graph layers | object numpy.ndarray |
| 3.20 MiB | 3.20 MiB | (291, 360, 4) | (291, 360, 4) | 1 chunks in 170 graph layers | float64 numpy.ndarray |
| 3.20 MiB | 3.20 MiB | (291, 360, 4) | (291, 360, 4) | 1 chunks in 170 graph layers | float64 numpy.ndarray |
| 101.42 GiB | 215.80 MiB | (35, 1, 165, 45, 291, 360) | (1, 1, 12, 45, 291, 360) | 490 chunks in 106 graph layers | float32 numpy.ndarray |
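
The sizes reported for the largest array in the output above follow directly from its shape, chunk shape, and data type. The short calculation below reproduces them. It is a sketch that assumes regular chunking (every chunk has the listed chunk shape, except possibly the last one along each axis) and 4 bytes per `float32` element.

```python
import math

shape = (35, 1, 165, 45, 291, 360)        # full array shape from the output
chunk_shape = (1, 1, 12, 45, 291, 360)    # chunk shape from the output
itemsize = 4                              # bytes per float32 element

array_bytes = math.prod(shape) * itemsize
chunk_bytes = math.prod(chunk_shape) * itemsize

# Number of chunks along each axis, rounded up, then multiplied together.
n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunk_shape))

print(f"array: {array_bytes / 2**30:.2f} GiB")
print(f"chunk: {chunk_bytes / 2**20:.2f} MiB")
print(f"chunks: {n_chunks}")
```

The array is far larger than typical memory, but nothing is loaded until a computation is requested; each task then works on one chunk of roughly 216 MiB at a time.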


---

## Summary
In this notebook, we used Intake-ESM to open an Xarray Dataset for one particular model and experiment.

### What's next?
We will see an example of downloading a dataset with `fsspec` and `zarr`.

## Resources and references
- [Original notebook in the Pangeo Gallery](http://gallery.pangeo.io/repos/pangeo-gallery/cmip6/intake_ESM_example.html) by Henri Drake and [Ryan Abernathey](https://ocean-transport.github.io/)