# Get available CMIP6 and CORDEX simulations

`Downclim` allows you to easily access CMIP6 and CORDEX simulations. 

This notebook shows how to get a list of available simulations for a given set of variables and regions. This step comes before actually downloading the data.

In [1]:
```python
from __future__ import annotations

from downclim.list_projections import (
    CMIP6Context,
    CORDEXContext,
    list_available_cmip6_simulations,
    list_available_cordex_simulations,
    save_projections,
)
```


`CMIP6Context` and `CORDEXContext` are the main classes for working with CMIP6 and CORDEX simulations. Each one holds all the information needed to process your request.

You can check which fields (mandatory or optional) each class accepts with `help()`, as done for `CMIP6Context` below.

## CMIP6 simulations

To request CMIP6 simulations available on Google Cloud Storage (GCS), use the `CMIP6Context` class.

You can filter CMIP6 simulations with all the usual CMIP6 fields. You can browse the Google Cloud CMIP6 dataset here: https://console.cloud.google.com/marketplace/details/noaa-public/cmip6?inv=1&invt=AbmZOg&project=raycast-366813. Downclim uses the Google Cloud copy of CMIP6, so we cannot guarantee that it matches the `ESGF` servers exactly. You can still check and filter CMIP6 data on an `ESGF` node, e.g. here: https://esgf-node.ipsl.upmc.fr/search/cmip6-ipsl/.

Here are the main fields you can use:

In [2]:
```python
cmip6_context = CMIP6Context(
    activity_id=["ScenarioMIP", "CMIP"],
    institution_id=["NOAA-GFDL", "CMCC"],
    experiment_id=["ssp126", "historical"],
    member_id="r1i1p1f1",
    table_id="Amon",
    variable_id=["tas", "pr"],
    grid_label="gn",
)
cmip6_context
```

CMIP6Context(activity_id=['ScenarioMIP', 'CMIP'], institution_id=['NOAA-GFDL', 'CMCC'], source_id=None, experiment_id=['ssp126', 'historical'], member_id='r1i1p1f1', table_id='Amon', variable_id=['tas', 'pr'], grid_label='gn')
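Each field narrows the search. As the `help()` output below shows, a field can be a single string or a list of strings, and unset fields are `None`. The Google Cloud CMIP6 catalog is a table with one row per dataset, and its columns are named after these fields (`activity_id`, `source_id`, `experiment_id`, ...). The sketch below shows how such a query can be matched against catalog rows. It assumes three rules: a list means "any of these values", `None` means "no constraint", and every field must match. `matches` and `filter_catalog` are hypothetical helpers, and Downclim's own implementation may differ.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping


def matches(row: Mapping[str, str], criteria: Mapping[str, str | list[str] | None]) -> bool:
    """Return True if `row` satisfies every criterion that is not None."""
    for field, wanted in criteria.items():
        if wanted is None:
            continue  # None: no constraint on this field
        allowed = [wanted] if isinstance(wanted, str) else wanted
        if row.get(field) not in allowed:
            return False  # one failing field is enough to reject the row
    return True


def filter_catalog(
    rows: Iterable[Mapping[str, str]],
    criteria: Mapping[str, str | list[str] | None],
) -> list[Mapping[str, str]]:
    """Keep the rows matching all criteria (AND across fields, OR within a list)."""
    return [row for row in rows if matches(row, criteria)]


# Illustrative, made-up catalog rows using catalog-style column names.
catalog = [
    {"source_id": "CMCC-ESM2", "experiment_id": "historical", "variable_id": "tas"},
    {"source_id": "CMCC-ESM2", "experiment_id": "ssp126", "variable_id": "pr"},
    {"source_id": "CMCC-ESM2", "experiment_id": "ssp585", "variable_id": "tas"},
]

query = {"experiment_id": ["ssp126", "historical"], "variable_id": ["tas", "pr"], "source_id": None}
for row in filter_catalog(catalog, query):
    print(row["experiment_id"], row["variable_id"])
# historical tas
# ssp126 pr
```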

For more information about the fields you can pass to a `CMIP6Context` object, use `help()`:

In [3]:
```python
help(CMIP6Context)
```

```
Help on class CMIP6Context in module downclim.list_projections:

class CMIP6Context(pydantic.main.BaseModel)
 |  CMIP6Context(*, activity_id: str | list[str] | None = ['ScenarioMIP', 'CMIP'], institution_id: str | list[str] | None = ['IPSL', 'NCAR'], source_id: str | list[str] | None = None, experiment_id: str | list[str] | None = ['ssp245', 'historical'], member_id: str | list[str] | None = 'r1i1p1f1', table_id: str | None = 'Amon', variable_id: str | list[str] | None = ['tas', 'tasmin', 'tasmax', 'pr'], grid_label: str | None = None) -> None
 |  
 |  Context about the query on the CMIP6 dataset.
 |  
 |  Entries of the dictionary can be either `str` or `list` of `str` if multiple values are provided. These following keys are available. None are mandatory:
 |  - activity_id: str, e.g "ScenarioMIP", "CMIP"
 |  - institution_id: str, e.g "IPSL", "NCAR"
 |  - source_id: str, e.g "IPSL-CM6A-LR", "CMCC-CM2-HR4"
 |  - experiment_id: str, e.g "ssp126", "historical"
 |  - member_id: str, e.g 
```

In [4]:
```python
cmip6_simulations = list_available_cmip6_simulations(cmip6_context)
cmip6_simulations
```

| | project | institute | model | experiment | ensemble | table | variable | grid_label | datanode | dcpp_init_year | version | domain | product | time_frequency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CMIP | CMCC | CMCC-CM2-SR5 | historical | r1i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP6/CMIP/CMCC/CMCC-CM2-SR5/histor... | | 20200616 | GLOBAL | output | mon |
| 1 | CMIP | CMCC | CMCC-CM2-SR5 | historical | r1i1p1f1 | Amon | pr | gn | gs://cmip6/CMIP6/CMIP/CMCC/CMCC-CM2-SR5/histor... | | 20200616 | GLOBAL | output | mon |
| 2 | ScenarioMIP | CMCC | CMCC-CM2-SR5 | ssp126 | r1i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP6/ScenarioMIP/CMCC/CMCC-CM2-SR5... | | 20200717 | GLOBAL | output | mon |
| 3 | ScenarioMIP | CMCC | CMCC-CM2-SR5 | ssp126 | r1i1p1f1 | Amon | pr | gn | gs://cmip6/CMIP6/ScenarioMIP/CMCC/CMCC-CM2-SR5... | | 20200717 | GLOBAL | output | mon |
| 4 | CMIP | CMCC | CMCC-ESM2 | historical | r1i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP6/CMIP/CMCC/CMCC-ESM2/historica... | | 20210114 | GLOBAL | output | mon |
| 5 | CMIP | CMCC | CMCC-ESM2 | historical | r1i1p1f1 | Amon | pr | gn | gs://cmip6/CMIP6/CMIP/CMCC/CMCC-ESM2/historica... | | 20210114 | GLOBAL | output | mon |
| 6 | ScenarioMIP | CMCC | CMCC-ESM2 | ssp126 | r1i1p1f1 | Amon | pr | gn | gs://cmip6/CMIP6/ScenarioMIP/CMCC/CMCC-ESM2/ss... | | 20210126 | GLOBAL | output | mon |
| 7 | ScenarioMIP | CMCC | CMCC-ESM2 | ssp126 | r1i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP6/ScenarioMIP/CMCC/CMCC-ESM2/ss... | | 20210126 | GLOBAL | output | mon |
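The returned table has one row per model, experiment and variable. Before downloading, it can be useful to check that each model provides every requested experiment/variable combination. The stdlib sketch below uses the `model`, `experiment` and `variable` values copied from the table above. `missing_combinations` is a hypothetical helper, not a Downclim function.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import product

# (model, experiment, variable) triples copied from the result table above.
results = [
    ("CMCC-CM2-SR5", "historical", "tas"),
    ("CMCC-CM2-SR5", "historical", "pr"),
    ("CMCC-CM2-SR5", "ssp126", "tas"),
    ("CMCC-CM2-SR5", "ssp126", "pr"),
    ("CMCC-ESM2", "historical", "tas"),
    ("CMCC-ESM2", "historical", "pr"),
    ("CMCC-ESM2", "ssp126", "pr"),
    ("CMCC-ESM2", "ssp126", "tas"),
]


def missing_combinations(rows, experiments, variables):
    """Map each model to the sorted (experiment, variable) pairs it lacks."""
    available = defaultdict(set)
    for model, experiment, variable in rows:
        available[model].add((experiment, variable))
    required = set(product(experiments, variables))  # every pair we asked for
    return {model: sorted(required - pairs) for model, pairs in available.items()}


print(missing_combinations(results, ["historical", "ssp126"], ["tas", "pr"]))
# {'CMCC-CM2-SR5': [], 'CMCC-ESM2': []}
```

An empty list means the model is complete for this request.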


<div class="alert alert-block alert-info"> 
<b>Note</b> 
You can also use a standard Python dictionary with the same fields as keys. However, the `CMIP6Context` class provides a more user-friendly way to interact with the data, including automatic checking of the fields.

An intermediate solution is to provide the fields in a dictionary and then convert it to a `CMIP6Context` object:
```python
from downclim.list_projections import CMIP6Context

my_dict = {
    'experiment_id': 'historical',
    'variable_id': 'tas',
    'table_id': 'Amon',
    # ... add any other CMIP6Context fields here
}
cmip6_context = CMIP6Context(**my_dict)
```
</div>
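The note mentions automatic checking of the fields. This notebook does not show whether `CMIP6Context` rejects an unknown key or silently ignores it, since that depends on how the class is configured. If you build queries as dictionaries, you may want an explicit guard. Below is a minimal sketch, assuming the field names listed in the `help(CMIP6Context)` signature above. `check_fields` and `CMIP6_FIELDS` are hypothetical names, not part of Downclim.

```python
from __future__ import annotations

# Field names taken from the help(CMIP6Context) signature shown above.
CMIP6_FIELDS = frozenset(
    {
        "activity_id", "institution_id", "source_id", "experiment_id",
        "member_id", "table_id", "variable_id", "grid_label",
    }
)


def check_fields(query: dict, allowed: frozenset[str] = CMIP6_FIELDS) -> dict:
    """Return `query` unchanged, or raise KeyError naming any unknown keys."""
    unknown = sorted(set(query) - allowed)
    if unknown:
        raise KeyError(f"Unknown field(s): {', '.join(unknown)}")
    return query


my_dict = {"experiment_id": "historical", "variable_id": "tas", "table_id": "Amon"}
check_fields(my_dict)  # passes: every key is a CMIP6Context field
# check_fields({"experiment": "historical"}) raises: "experiment" is a CORDEX field name.
# With Downclim imported: cmip6_context = CMIP6Context(**check_fields(my_dict))
```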

## CORDEX simulations

Similarly, you can use the `CORDEXContext` class to request CORDEX simulations available on `ESGF` nodes.

To learn more about CORDEX domains and find the one that covers your area of interest, see the [CORDEX domains website](https://cordex.org/domains/).
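CORDEX domain names such as `AUS-44` combine a region code (here `AUS`, Australasia) with the nominal grid spacing in hundredths of a degree (`44` for 0.44°). The sketch below parses this basic `REGION-NN` form. `parse_cordex_domain` is a hypothetical helper, and any code that does not follow this exact pattern raises `ValueError` rather than being guessed at.

```python
from __future__ import annotations

import re

# REGION-NN: three-letter region code, then grid spacing in hundredths of a degree.
DOMAIN_PATTERN = re.compile(r"(?P<region>[A-Z]{3})-(?P<spacing>\d{2})")


def parse_cordex_domain(code: str) -> tuple[str, float]:
    """Return (region, nominal grid spacing in degrees) for a code like "AUS-44"."""
    match = DOMAIN_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"Not a basic REGION-NN CORDEX domain code: {code!r}")
    return match["region"], int(match["spacing"]) / 100  # "44" -> 0.44


print(parse_cordex_domain("AUS-44"))  # ('AUS', 0.44)
print(parse_cordex_domain("EUR-11"))  # ('EUR', 0.11)
```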

For help with the search fields, have a look at the `ESGF` search page, e.g.: https://esgf-node.ipsl.upmc.fr/search/cordex-ipsl/.

In [5]:
```python
cordex_context = CORDEXContext(
    domain="AUS-44",
    experiment=["rcp26", "historical"],
    ensemble="r1i1p1",
    time_frequency="mon",  # the field is `time_frequency`, as shown in the output below
    variable=["tas", "tasmin", "tasmax", "pr"],
)
cordex_context
```

CORDEXContext(project='CORDEX', product='output', domain='AUS-44', institute=None, driving_model=None, experiment=['rcp26', 'historical'], experiment_family=None, ensemble='r1i1p1', rcm_model=None, downscaling_realisation=None, time_frequency='mon', variable=['tas', 'tasmin', 'tasmax', 'pr'], variable_long_name=None)

In [6]:
```python
cordex_context.model_dump()
```

{'project': 'CORDEX',
 'product': 'output',
 'domain': 'AUS-44',
 'institute': None,
 'driving_model': None,
 'experiment': ['rcp26', 'historical'],
 'experiment_family': None,
 'ensemble': 'r1i1p1',
 'rcm_model': None,
 'downscaling_realisation': None,
 'time_frequency': 'mon',
 'variable': ['tas', 'tasmin', 'tasmax', 'pr'],
 'variable_long_name': None}
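Since `model_dump()` returns a plain dictionary, it is easy to post-process. For instance, a search query only needs the fields you actually set, and list-valued fields can be encoded as repeated query parameters. The sketch below uses the standard library's `urllib.parse.urlencode` with `doseq=True` and makes no network request. The endpoint URL is an assumption for illustration and `to_query_string` is a hypothetical helper. This is not necessarily how Downclim builds its ESGF request, and some field names (e.g. `rcm_model`) may not match ESGF facet names.

```python
from __future__ import annotations

from urllib.parse import urlencode

# The dictionary returned by cordex_context.model_dump() above.
fields = {
    "project": "CORDEX", "product": "output", "domain": "AUS-44",
    "institute": None, "driving_model": None,
    "experiment": ["rcp26", "historical"], "experiment_family": None,
    "ensemble": "r1i1p1", "rcm_model": None, "downscaling_realisation": None,
    "time_frequency": "mon", "variable": ["tas", "tasmin", "tasmax", "pr"],
    "variable_long_name": None,
}


def to_query_string(fields: dict) -> str:
    """Drop unset (None) fields; encode list values as repeated key=value pairs."""
    set_fields = {key: value for key, value in fields.items() if value is not None}
    return urlencode(set_fields, doseq=True)


# Assumed ESGF search endpoint, for illustration only (nothing is fetched).
BASE_URL = "https://esgf-node.ipsl.upmc.fr/esg-search/search"
print(f"{BASE_URL}?{to_query_string(fields)}")
```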

In [7]:
```python
cordex_simulations = list_available_cordex_simulations(
    cordex_context, esgf_credential="../config/credentials_esgf.yml"
)
cordex_simulations
```

| | project | product | domain | institute | driving_model | experiment | ensemble | rcm_model | downscaling_realisation | time_frequency | variable | version | datanode |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|


This request returns no results.

<div class="alert alert-block alert-info"> 
<b>Note</b> 
`CORDEX` simulations do exist that match every criterion except `experiment`: no simulation set provides both the 'rcp26' and the 'historical' experiments.
</div>

## Save your requested simulations

You now have two `DataFrame`s containing information about the simulations you requested. You can save them to a `csv` file to keep track of your request. This will be useful later on, when you download the data and run the downscaling with `Downclim`.

In [30]:
```python
save_projections(
    cordex_simulations=cordex_simulations,
    cmip6_simulations=cmip6_simulations,
    output_file="simulations_list.csv",
)
```
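`save_projections` receives both tables and a single `output_file`. The CMIP6 and CORDEX tables have different columns (compare their headers above), so they must be aligned in some way to share one file. This notebook does not show the exact layout Downclim writes. As an illustration only, here is a stdlib sketch that writes tables with different columns to one CSV. It uses a hypothetical `source` column and leaves missing cells empty. `write_combined_csv` is hypothetical, not part of Downclim, and the rows in the usage example are made up.

```python
from __future__ import annotations

import csv
import io
from typing import TextIO


def write_combined_csv(tables: dict[str, list[dict[str, str]]], stream: TextIO) -> None:
    """Write tables with different columns into one CSV (illustrative layout)."""
    columns = ["source"]  # hypothetical column naming the table of origin
    for rows in tables.values():
        for row in rows:
            for key in row:
                if key not in columns:
                    columns.append(key)  # union of all columns, in first-seen order
    writer = csv.DictWriter(stream, fieldnames=columns, restval="")  # missing cells -> ""
    writer.writeheader()
    for source, rows in tables.items():
        for row in rows:
            writer.writerow({"source": source, **row})


buffer = io.StringIO()
write_combined_csv(
    {
        "cmip6": [{"model": "CMCC-ESM2", "experiment": "ssp126", "variable": "tas"}],
        "cordex": [{"domain": "AUS-44", "experiment": "rcp26", "variable": "pr"}],
    },
    buffer,
)
print(buffer.getvalue())
```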