In [4]:
# You don't need to run this; it's just a workaround so images show up in these GitHub Jupyter notebooks
from dapper.utils import display_image_gh_notebook
from IPython.display import HTML

## 4. Downloading global CMIP data
This notebook demonstrates how to download global CMIP6 files that match your criteria (variables, models, experiments, etc.). It covers only downloading the raw CMIP6 files, not formatting them for ELM (that functionality does not exist yet).

`dapper` uses a Pangeo-hosted CMIP repository because we found ESGF hard to rely on: its nodes are transient and not always available. The Pangeo archive standardizes everything into a quickly searchable, downloadable archive, but it is not a perfect mirror of all the data available across ESGF. If you can't find what you need here, you may have to look in ESGF. Google Earth Engine also hosts a downscaled set of CMIP6 models and variables, but it includes only a limited set of variables, not everything needed for ELM runs, so we do not provide functionality for sampling it.

Searching and downloading from the Pangeo archive does not require an account. Unlike ERA5-Land data, which requires a Google Earth Engine account, this should work straight out of the box.

As with ERA5-Land Hourly data, we specify a `params` dictionary and then send our request. Let's take a closer look at these `params`.

| Key          | Definition                                                                                                       | Examples                                                   |
|--------------|------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------|
| `models`     | Climate models (or "sources") that produced the simulation data, each with unique physics, resolution, and configurations. | `CESM2`, `IPSL-CM6A-LR`, `CanESM5`, `MPI-ESM1-2-HR`        |
| `variables`  | Climate variables simulated by the models, including atmospheric, oceanic, and land-surface data.                | `pr`, `tas`, `psl`, `ua`                                   |
| `experiment` | Predefined scenarios that specify forcing conditions used in climate simulations.                                | `historical`, `ssp245`, `ssp370`, `ssp585`, `piControl`    |
| `table`      | Frequency and domain of the model output data.                                                                   | `Amon`, `day`, `Omon`, `Lmon`                              |
| `ensemble`   | Identifier specifying realization, initialization, physics, and forcing configurations for the model run.        | `r1i1p1f1`, `r2i1p1f1`, `r1i2p1f2`                         |

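The `ensemble` identifiers in the table follow the CMIP6 variant-label pattern `r<N>i<N>p<N>f<N>` (realization, initialization, physics, forcing). As a minimal sketch, independent of `dapper`, a label can be split into those four indices like this; `parse_variant_label` is a hypothetical helper written for illustration, not part of `dapper`.

```python
import re

# CMIP6 variant labels have the form r<N>i<N>p<N>f<N>.
VARIANT_PATTERN = re.compile(r"r(\d+)i(\d+)p(\d+)f(\d+)")


def parse_variant_label(label):
    """Split a CMIP6 variant label into its four integer indices.

    Hypothetical helper for illustration; not part of dapper.
    """
    match = VARIANT_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"Not a CMIP6 variant label: {label!r}")
    r, i, p, f = (int(x) for x in match.groups())
    return {"realization": r, "initialization": i, "physics": p, "forcing": f}


print(parse_variant_label("r1i2p1f2"))
# {'realization': 1, 'initialization': 2, 'physics': 1, 'forcing': 2}
```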
You do not need to specify all of these keys. For example, if you're not sure which models you want, just leave `models` out and the search will return every model that matches your other criteria. Let's try it out.

In [None]:
from dapper.cmip import cmip_utils as cuts

# We will leave models out for now
params = {
    'variables' : ['pr', 'tas'],
    'experiment' : 'historical',
    'table' : ['Amon'],
    'ensemble' : 'r1i1p1f1',
}

available = cuts.find_available_data(params)

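To make the effect of omitting a key concrete, here is a minimal, self-contained sketch of how a `params`-style dictionary could be turned into a catalog filter. It does not call `dapper` and is not its implementation: the key-to-facet mapping (`models` to `source_id`, and so on) is an assumption based on standard CMIP6 facet names, `build_query` and `search` are hypothetical helpers, and the catalog rows are made up for illustration.

```python
# Assumed mapping from dapper-style params keys to CMIP6 facet names.
PARAM_TO_FACET = {
    "models": "source_id",
    "variables": "variable_id",
    "experiment": "experiment_id",
    "table": "table_id",
    "ensemble": "member_id",
}


def build_query(params):
    """Translate params into {facet: [allowed values]}; omitted keys add no constraint."""
    query = {}
    for key, value in params.items():
        allowed = [value] if isinstance(value, str) else list(value)
        query[PARAM_TO_FACET[key]] = allowed
    return query


def search(catalog, query):
    """Keep rows whose value for every queried facet is in the allowed list."""
    return [
        row for row in catalog
        if all(row[facet] in allowed for facet, allowed in query.items())
    ]


# Made-up rows for illustration only; they do not describe the real archive.
FACETS = ("source_id", "variable_id", "experiment_id", "table_id", "member_id")
toy_catalog = [
    dict(zip(FACETS, row))
    for row in [
        ("CESM2", "pr", "historical", "Amon", "r1i1p1f1"),
        ("CESM2", "tas", "historical", "Amon", "r1i1p1f1"),
        ("CanESM5", "pr", "historical", "Amon", "r1i1p1f1"),
        ("CanESM5", "tas", "ssp585", "Amon", "r1i1p1f1"),
        ("CanESM5", "pr", "historical", "day", "r1i1p1f1"),
    ]
]

# Same params as in the cell above: no 'models' key, so every model can match.
params = {
    "variables": ["pr", "tas"],
    "experiment": "historical",
    "table": ["Amon"],
    "ensemble": "r1i1p1f1",
}
matches = search(toy_catalog, build_query(params))
print(sorted({row["source_id"] for row in matches}))

# Adding 'models' narrows the same search to a single source.
narrowed = search(toy_catalog, build_query({**params, "models": ["CanESM5"]}))
print(len(narrowed))
```

Because each key only adds a constraint, leaving one out widens the result set rather than causing an error, which matches the behaviour described for `models` above.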