# Using CleF - Climate Finder to discover ESGF data at NCI

This notebook shows examples of how to use the CleF (Climate Finder) Python module to search for ESGF data on the NCI server. <br>
Currently the tool is set up for CMIP5 and CMIP6 data, but other ESGF datasets such as CORDEX will be available in the future. <br> 

CleF is currently installed in the unstable version of the CMS conda module analysis3. This is managed by the CMS and is available simply by running
  >  module use /g/data3/hh5/public/modules <br>
  >  module load conda/analysis3-unstable
  
You can also use the module interactively, but for the moment we will use its command line options. <br>
Let's start!

## Command syntax

In [None]:
# run this if you haven't done so already in the terminal
!module use /g/data3/hh5/public/modules
!module load conda/analysis3-unstable

In [1]:
!clef

Usage: clef [OPTIONS] COMMAND [ARGS]...

Options:
  --remote   returns only ESGF search results
  --local    returns only local files matching ESGF search
  --missing  returns only missing files matching ESGF search
  --request  send NCI request to download missing files matching ESGF search
  --debug    Show debug info
  --help     Show this message and exit.

Commands:
  cmip5  Search ESGF and local database for CMIP5 files Constraints can be...
  cmip6  Search ESGF and local database for CMIP6 files Constraints can be...
  ds     Search local database for non-ESGF datasets


By simply running the command **clef** with no arguments, the tool shows the help message and then exits. It is equivalent to running
> clef --help <br>

We can see there are currently 3 sub-commands: **ds** to search for non-ESGF collections and one for each CMIP dataset, **cmip5** and **cmip6**.  <br>
There are also six options that can be passed before the sub-commands. One we have already seen is *--help*; most of the others modify how the tool deals with the main query output. We will have a look at them and at **ds** later. <br>
Let's start by searching for some CMIP5 data. To see what we can pass to the **cmip5** sub-command we can simply run it with its *--help* option.

## CMIP5

In [2]:
!clef cmip5 --help

Usage: clef cmip5 [OPTIONS] [QUERY]...

  Search ESGF and local database for CMIP5 files

  Constraints can be specified multiple times, in which case they are
  combined    using OR: -v tas -v tasmin will return anything matching
  variable = 'tas' or variable = 'tasmin'. The --latest flag will check ESGF
  for the latest version available, this is the default behaviour

Options:
  -e, --experiment x              CMIP5 experiment: piControl, rcp85, amip ...
  --experiment_family [Atmos-only|Control|Decadal|ESM|Historical|Idealized|Paleo|RCP]
                                  CMIP5 experiment family: Decadal, RCP ...
  -m, --model x                   CMIP5 model acronym: ACCESS1.3, MIROC5 ...
  -t, --table, --mip [Amon|Omon|OImon|LImon|Lmon|6hrPlev|6hrLev|3hr|Oclim|Oyr|aero|cfOff|cfSites|cfMon|cfDay|cf3hr|day|fx|grids]
  -v, --variable x                Variable name as shown in filanames: tas,
                                  pr, sic ...
  -en, --ensemble, --member TE

### Passing arguments and options

The *help* shows all the constraints we can pass to the tool. There are also some additional options which can change the way we run our search; for the moment we can ignore these and use their default values. <br>
Some of the constraints can be passed using an abbreviation, like *-v* instead of *--variable*. This is handy once you are more familiar with the tool. <br>
The same option can have more than one name: for example, *--ensemble* can also be passed as *--member*, because the terminology has changed between CMIP5 and CMIP6. <br>
You can pass as many constraints as you want and pass the same constraint more than once. Let's see what happens, though, if we do not pass any constraint.

In [3]:
!clef cmip5

Too many results 2296637, try limiting your search:
  https://esgf.nci.org.au/search/esgf-nci?query=&distrib=True&replica=False&latest=True&project=CMIP5


In [4]:
!clef cmip5 --variable tasmin --experiment historical --table day --ensemble r2i1p1s

No matches found on ESGF, check at https://esgf.nci.org.au/search/esgf-nci?query=&distrib=True&replica=False&latest=True&project=CMIP5&ensemble=r2i1p1s&experiment=historical&cmor_table=day&variable=tasmin


Oops, that wasn't reasonable! I misspelled the ensemble: "r2i1p1s" does not exist, and the tool is telling me it cannot find any matches.
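The link printed with each message is an ESGF search URL whose query string mirrors the constraints passed on the command line. As a rough sketch (this is not clef's own code, and `build_search_url` is a hypothetical helper written for this example), the URL from the failed search above can be rebuilt with the standard library:

```python
from urllib.parse import urlencode

# Base address copied from the clef output above.
ESGF_SEARCH = "https://esgf.nci.org.au/search/esgf-nci"


def build_search_url(project, **facets):
    """Hypothetical helper: assemble an ESGF search URL from facet constraints."""
    # Fixed parameters, in the same order as they appear in the printed link.
    params = {
        "query": "",
        "distrib": True,
        "replica": False,
        "latest": True,
        "project": project,
    }
    params.update(facets)
    return ESGF_SEARCH + "?" + urlencode(params)


url = build_search_url(
    "CMIP5",
    ensemble="r2i1p1s",
    experiment="historical",
    cmor_table="day",
    variable="tasmin",
)
print(url)
```

Opening the link in a browser is a quick way to check which facet value is responsible when a search returns nothing.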

In [5]:
!clef cmip5 --variable tasmin --experiment historical --table days --ensemble r2i1p1

Usage: clef cmip5 [OPTIONS] [QUERY]...
Try "clef cmip5 --help" for help.

Error: Invalid value for "--table" / "--mip" / "-t": invalid choice: days. (choose from Amon, Omon, OImon, LImon, Lmon, 6hrPlev, 6hrLev, 3hr, Oclim, Oyr, aero, cfOff, cfSites, cfMon, cfDay, cf3hr, day, fx, grids)


I made another spelling mistake. In this case the tool knows that I passed a wrong value and lists all the available options for the CMOR table. Eventually we are aiming to validate all the arguments we can, although for some (ensemble, for example) it is not possible to list all the possible values.
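The difference between the two failures comes down to whether an option has a closed list of valid values. The sketch below imitates that behaviour with the standard-library `argparse` module, which stands in here for whatever clef uses internally; the parser name `toy-clef` and the helper `try_parse` are assumptions made for illustration only.

```python
import argparse

# Table names copied from the error message above.
CMIP5_TABLES = [
    "Amon", "Omon", "OImon", "LImon", "Lmon", "6hrPlev", "6hrLev", "3hr",
    "Oclim", "Oyr", "aero", "cfOff", "cfSites", "cfMon", "cfDay", "cf3hr",
    "day", "fx", "grids",
]


def build_parser():
    """Toy parser with one restricted option and one free-form option."""
    parser = argparse.ArgumentParser(prog="toy-clef")
    # A closed list of choices: wrong values are rejected before any search.
    parser.add_argument("-t", "--table", "--mip", dest="table", choices=CMIP5_TABLES)
    # No list of choices: any string is accepted and only the search can fail.
    parser.add_argument("--ensemble", "--member", dest="ensemble")
    return parser


def try_parse(argv):
    """Return the parsed arguments, or None if argparse rejects them."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse prints a usage message listing the valid choices, then exits.
        return None


accepted = try_parse(["--table", "day", "--ensemble", "r2i1p1s"])
rejected = try_parse(["--table", "days", "--ensemble", "r2i1p1"])
print(accepted.table, accepted.ensemble)
print(rejected)
```

The misspelled ensemble passes the parser untouched, which is why that mistake only surfaced as an empty search result.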

In [6]:
!clef cmip5 --variable tasmin --experiment historical --table day --ensemble r2i1p1

/g/data1b/al33/replicas/CMIP5/combined/IPSL/IPSL-CM5A-LR/historical/day/atmos/day/r2i1p1/v20130506/tasmin
/g/data1b/al33/replicas/CMIP5/combined/IPSL/IPSL-CM5A-MR/historical/day/atmos/day/r2i1p1/v20130506/tasmin
/g/data1b/al33/replicas/CMIP5/combined/MOHC/HadGEM2-ES/historical/day/atmos/day/r2i1p1/v20110418/tasmin
/g/data1b/al33/replicas/CMIP5/combined/MOHC/HadCM3/historical/day/atmos/day/r2i1p1/v20140110/tasmin
/g/data1b/al33/replicas/CMIP5/combined/NOAA-GFDL/GFDL-CM3/historical/day/atmos/day/r2i1p1/v20120227/tasmin
/g/data1b/al33/replicas/CMIP5/combined/MPI-M/MPI-ESM-P/historical/day/atmos/day/r2i1p1/v20120315/tasmin
/g/data1b/al33/replicas/CMIP5/combined/MPI-M/MPI-ESM-MR/historical/day/atmos/day/r2i1p1/v20120503/tasmin
/g/data1b/al33/replicas/CMIP5/combined/MPI-M/MPI-ESM-LR/historical/day/atmos/day/r2i1p1/v20111006/tasmin
/g/data1b/al33/replicas/CMIP5/combined/CNRM-CERFACS/CNRM-CM5/historical/day/atmos/day/r2i1p1/v20120703/tasmin
/g/data1/rr3/publications/CMIP5/output1/CSIR

The tool first searches the ESGF for all the files that match the constraints we passed. It then looks for these files locally and, if it finds them, returns their paths on raijin.
For all the files it can't find locally, the tool checks an NCI table listing the downloads NCI is working on. Finally it lists the missing datasets which are in the download queue, followed by the datasets that are not available locally and that no one has yet requested. <br>
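The three-way split described above can be pictured with plain set operations. This is only a conceptual sketch, not how clef is implemented; the function `classify` and the short dataset labels are made up for the example.

```python
def classify(remote_ids, local_ids, queued_ids):
    """Split ESGF search results into local, queued and missing datasets."""
    remote = set(remote_ids)
    # Found on ESGF and already on disk.
    local = remote & set(local_ids)
    # Not on disk, but already in the download queue.
    queued = (remote - local) & set(queued_ids)
    # Neither on disk nor requested by anyone.
    missing = remote - local - queued
    return sorted(local), sorted(queued), sorted(missing)


# Hypothetical labels standing in for real dataset ids.
remote = ["ds-a", "ds-b", "ds-c", "ds-d"]
on_disk = ["ds-a", "ds-x"]
in_queue = ["ds-b"]

local, queued, missing = classify(remote, on_disk, in_queue)
print("local:  ", local)
print("queued: ", queued)
print("missing:", missing)
```

Note that `ds-x`, which exists locally but is not returned by the remote search, is ignored by this default workflow.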

The tool lists the dataset paths and dataset_ids. If you want, you can get a more detailed list by file by passing the *--format file* option. <br>

The search by default returns the latest available version. What if we want to have a look at all the available versions?

In [7]:
!clef cmip5 --variable tasmin --experiment historical --table Amon -m ACCESS1.0 --all-versions --format file

/g/data1/rr3/publications/CMIP5/output1/CSIRO-BOM/ACCESS1-0/historical/mon/atmos/Amon/r1i1p1/latest/tasmin

Everything available on ESGF is also available locally


The option *--all-versions* is the reverse of *--latest*, which is the default, so with it we get a list of all available versions. <br>
Since all the ACCESS1.0 data is available on NCI (which is the authoritative source for the ACCESS models), the tool doesn't find any missing datasets and lets us know about it.
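To make the difference between the two modes concrete, here is a small sketch of what "latest only" means for ids that end in a version field, as the ESGF dataset ids shown later in this notebook do. It is an illustration, not clef's code: `latest_only` is a hypothetical helper, and the `v20180101` entry is an invented older version added so there is something to discard.

```python
from collections import defaultdict


def latest_only(dataset_ids):
    """Keep only the highest version of each dataset."""
    versions = defaultdict(list)
    for dataset_id in dataset_ids:
        # The version is the last dot-separated field, e.g. v20181018.
        stem, version = dataset_id.rsplit(".", 1)
        versions[stem].append(version)
    # vYYYYMMDD strings of equal length sort chronologically.
    return [f"{stem}.{max(found)}" for stem, found in sorted(versions.items())]


all_versions = [
    "CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r1i1p1f2.Amon.clw.gr.v20181018",
    # Invented older version, for illustration only.
    "CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r1i1p1f2.Amon.clw.gr.v20180101",
    "CMIP6.CMIP.IPSL.IPSL-CM6A-LR.1pctCO2.r1i1p1f1.Amon.clw.gr.v20180727",
]

for dataset_id in latest_only(all_versions):
    print(dataset_id)
```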

## CMIP6

In [8]:
!clef cmip6 --help

Usage: clef cmip6 [OPTIONS] [QUERY]...

  Search ESGF and local database for CMIP6 files

  Constraints can be specified multiple times, in which case they are
  combined    using OR: -v tas -v tasmin will return anything matching
  variable = 'tas' or variable = 'tasmin'. The --latest flag will check ESGF
  for the latest version available, this is the default behaviour

Options:
  -mip, --activity [AerChemMIP|C4MIP|CDRMIP|CFMIP|CMIP|CORDEX|DAMIP|DCPP|DynVarMIP|FAFMIP|GMMIP|GeoMIP|HighResMIP|ISMIP6|LS3MIP|LUMIP|OMIP|PAMIP|PMIP|RFMIP|SIMIP|ScenarioMIP|VIACSAB|VolMIP]
  -e, --experiment x              CMIP6 experiment, list of available depends
                                  on activity
  --source_type [AER|AGCM|AOGCM|BGC|CHEM|ISM|LAND|OGCM|RAD|SLAB]
  -t, --table x                   CMIP6 CMOR table: Amon, SIday, Oday ...
  -m, --model, --source_id x      CMIP6 model id: GFDL-AM4, CNRM-CM6-1 ...
  -v, --variable x                CMIP6 variable name as in filenames
 

The **cmip6** sub-command works in the same way, but some constraints are different. As well as changes in terminology, CMIP6 has more attributes (*facets*) that can be used to search. <br>
Examples of these are the **activity** which groups experiments, **resolution** which is an approximation of the actual resolution and **grid**.

### Controlling the output: clef options

In [9]:
!clef --local cmip6 -e 1pctCO2 -t Amon -v tasmax -v tasmin -g gr

/g/data1b/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/1pctCO2/r1i1p1f2/Amon/tasmax/gr/v20180626/
/g/data1b/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/1pctCO2/r1i1p1f2/Amon/tasmin/gr/v20180626/
/g/data1b/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/1pctCO2/r1i1p1f2/Amon/tasmax/gr/v20181018/
/g/data1b/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/1pctCO2/r1i1p1f2/Amon/tasmin/gr/v20181018/
/g/data1b/oi10/replicas/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/1pctCO2/r1i1p1f1/Amon/tasmax/gr/v20180727/
/g/data1b/oi10/replicas/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/1pctCO2/r1i1p1f1/Amon/tasmin/gr/v20180727/


In this example we used the *--local* option for the main command **clef** to get only the local matching data path as output. <br> 
Note also that:
- we are using abbreviations for the options where available; 
- we are passing the variable *-v* option twice; 
- we used the CMIP6-specific option *-g/--grid* to search for all data that is not on the model native grid. A grid label does not indicate a grid common to all the CMIP6 output, only one specific to the model itself; the same is true for member_id and other attributes.<br>
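Passing *-v* twice works because values of the same option are combined with OR, while different options are combined with AND. A minimal sketch of that matching rule on a handful of invented records follows; `matches` is a hypothetical helper, not part of clef, and `gn` is the CMIP6 label for a model's native grid.

```python
def matches(record, constraints):
    """True if the record satisfies every facet; values within a facet are OR-ed."""
    return all(
        record.get(facet) in allowed
        for facet, allowed in constraints.items()
    )


# Invented records, just enough to exercise the rule.
records = [
    {"variable": "tasmax", "table": "Amon", "grid": "gr"},
    {"variable": "tasmin", "table": "Amon", "grid": "gr"},
    {"variable": "tas", "table": "Amon", "grid": "gr"},
    {"variable": "tasmin", "table": "Amon", "grid": "gn"},
]

# Equivalent of: -t Amon -v tasmax -v tasmin -g gr
constraints = {
    "variable": {"tasmax", "tasmin"},
    "table": {"Amon"},
    "grid": {"gr"},
}

selected = [record for record in records if matches(record, constraints)]
for record in selected:
    print(record)
```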

*--local* actually executes the search directly on the NCI MAS database, which is different from the default query, where the search is executed first on the ESGF and then its results are matched locally.<br>
In the example above the final result is exactly the same whichever way we perform the query. Searching locally can, however, give you more results if a node is offline or if a version has been unpublished from the ESGF but is still available locally. 

In [11]:
!clef --missing cmip6 -e 1pctCO2 -v clw -v clwvi -t Amon -g gr


Available on ESGF but not locally:
CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1.1pctCO2.r1i1p1f2.Amon.clw.gr.v20180626
CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1.1pctCO2.r1i1p1f2.Amon.clwvi.gr.v20180626
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r1i1p1f2.Amon.clw.gr.v20181018
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r1i1p1f2.Amon.clwvi.gr.v20181018
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r2i1p1f2.Amon.clw.gr.v20181031
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r2i1p1f2.Amon.clwvi.gr.v20181031
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r3i1p1f2.Amon.clw.gr.v20181107
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r3i1p1f2.Amon.clwvi.gr.v20181107
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r4i1p1f2.Amon.clw.gr.v20190328
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r4i1p1f2.Amon.clwvi.gr.v20190328
CMIP6.CMIP.IPSL.IPSL-CM6A-LR.1pctCO2.r1i1p1f1.Amon.clw.gr.v20180727
CMIP6.CMIP.IPSL.IPSL-CM6A-LR.1pctCO2.r1i1p1f1.Amon.clwvi.gr.v20180727


This time we used the *--missing* option and the tool returned only the results matching the constraints that are available on the ESGF but not locally (we changed variables to make sure to get some missing data back).

In [12]:
!clef --remote cmip6 -e 1pctCO2 -v tasmin -v tasmax -t Amon -g gr

CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r3i1p1f2.Amon.tasmax.gr.v20181107
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r4i1p1f2.Amon.tasmin.gr.v20190328
CMIP6.CMIP.IPSL.IPSL-CM6A-LR.1pctCO2.r1i1p1f1.Amon.tasmax.gr.v20180727
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r2i1p1f2.Amon.tasmin.gr.v20181031
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r4i1p1f2.Amon.tasmax.gr.v20190328
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r1i1p1f2.Amon.tasmin.gr.v20181018
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r1i1p1f2.Amon.tasmax.gr.v20181018
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r2i1p1f2.Amon.tasmax.gr.v20181031
CMIP6.CMIP.IPSL.IPSL-CM6A-LR.1pctCO2.r1i1p1f1.Amon.tasmin.gr.v20180727
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r3i1p1f2.Amon.tasmin.gr.v20181107
CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1.1pctCO2.r1i1p1f2.Amon.tasmax.gr.v20180626
CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1.1pctCO2.r1i1p1f2.Amon.tasmin.gr.v20180626


The *--remote* option returns the Dataset_ids of the data matching the constraints, regardless of whether they are available locally or not.
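Each CMIP6 Dataset_id is a dot-separated string whose fields follow the CMIP6 data reference syntax, so it can be turned back into facets. The sketch below assumes exactly ten fields and no dots inside any field, which holds for the ids listed above; `parse_dataset_id` is a helper written for this example, not a clef function.

```python
# Field names in the order they appear in a CMIP6 dataset id.
CMIP6_FACETS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
)


def parse_dataset_id(dataset_id):
    """Split a CMIP6 dataset id into a facet dictionary."""
    parts = dataset_id.split(".")
    if len(parts) != len(CMIP6_FACETS):
        raise ValueError(f"expected {len(CMIP6_FACETS)} fields, got {len(parts)}")
    return dict(zip(CMIP6_FACETS, parts))


facets = parse_dataset_id(
    "CMIP6.CMIP.IPSL.IPSL-CM6A-LR.1pctCO2.r1i1p1f1.Amon.tasmax.gr.v20180727"
)
print(facets["source_id"], facets["variable_id"], facets["version"])
```

This makes it easy to group or count the returned ids by model, member or variable.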

In [13]:
!clef --remote cmip6 -e 1pctCO2 -v tasmin -v tasmax -t Amon -g gr --format file

CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1.1pctCO2.r1i1p1f2.Amon.tasmax.gr.v20180626.tasmax_Amon_CNRM-CM6-1_1pctCO2_r1i1p1f2_gr_185001-199912.nc
CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1.1pctCO2.r1i1p1f2.Amon.tasmin.gr.v20180626.tasmin_Amon_CNRM-CM6-1_1pctCO2_r1i1p1f2_gr_185001-199912.nc
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r1i1p1f2.Amon.tasmax.gr.v20181018.tasmax_Amon_CNRM-ESM2-1_1pctCO2_r1i1p1f2_gr_185001-199912.nc
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r1i1p1f2.Amon.tasmin.gr.v20181018.tasmin_Amon_CNRM-ESM2-1_1pctCO2_r1i1p1f2_gr_185001-199912.nc
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r2i1p1f2.Amon.tasmax.gr.v20181031.tasmax_Amon_CNRM-ESM2-1_1pctCO2_r2i1p1f2_gr_185001-199912.nc
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r2i1p1f2.Amon.tasmin.gr.v20181031.tasmin_Amon_CNRM-ESM2-1_1pctCO2_r2i1p1f2_gr_185001-199912.nc
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r3i1p1f2.Amon.tasmax.gr.v20181107.tasmax_Amon_CNRM-ESM2-1_1pctCO2_r3i1p1f2_gr_185001-199912.nc
CMIP6.CMIP.CNRM-CERFACS.

Running the same command with the option *--format file* after the sub-command will return the File_ids instead of the default Dataset_ids. <br>
Please note that *--local*, *--remote* and *--missing* together with *--request*, which we will look at next, are all options of the main command **clef** and they need to come before any sub-commands.
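A File_id, as printed above, is the Dataset_id followed by a dot and the file name. As a sketch under that assumption (and assuming a file name that carries a time range, which files of time-invariant fields do not), the two parts can be separated and the file name decoded. Both helpers are hypothetical and written for this example.

```python
def split_file_id(file_id, n_fields=10):
    """Split a CMIP6 file id into (dataset_id, filename)."""
    # Split on the first n_fields dots only, so the ".nc" in the name survives.
    parts = file_id.split(".", n_fields)
    if len(parts) != n_fields + 1:
        raise ValueError(f"not a file id: {file_id!r}")
    return ".".join(parts[:n_fields]), parts[n_fields]


def parse_filename(filename):
    """Read the facets encoded in a CMIP6 file name that includes a time range."""
    stem = filename.rsplit(".", 1)[0]  # drop the .nc extension
    variable, table, model, experiment, member, grid, period = stem.split("_")
    start, end = period.split("-")
    return {
        "variable": variable, "table": table, "model": model,
        "experiment": experiment, "member": member, "grid": grid,
        "start": start, "end": end,
    }


file_id = (
    "CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1.1pctCO2.r1i1p1f2.Amon.tasmax.gr.v20180626."
    "tasmax_Amon_CNRM-CM6-1_1pctCO2_r1i1p1f2_gr_185001-199912.nc"
)
dataset_id, filename = split_file_id(file_id)
info = parse_filename(filename)
print(dataset_id)
print(filename)
print(info["start"], info["end"])
```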

## Requesting new data

What should we do if we find there is some data we are interested in that has not been downloaded or requested yet? <br>
This is a complex data collection, so NCI, in consultation with the community, decided that the best way to manage it was to have one point of reference. Part of this agreement is that NCI will download the files and update the database that **clef** is interrogating. After consultation with the community a priority list was decided, and NCI has started downloading anything that falls into it as soon as it becomes available. <br> <br>
Users can then request from the NCI helpdesk other combinations of variables, experiments etc. that do not fall into this list. <br>
The list is available from the NCI climate confluence website: <br>
Even without consulting the list you can use **clef**, as we demonstrated above, to search for the data. If it is not queued or downloaded already, **clef** will give you the option to request it from NCI. <br>
Let's see how it works.

In [14]:
%%bash
clef --request cmip6 -e 1pctCO2 -v clw -v clwvi -t Amon -g gr
no


Available on ESGF but not locally:
CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1.1pctCO2.r1i1p1f2.Amon.clw.gr.v20180626
CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1.1pctCO2.r1i1p1f2.Amon.clwvi.gr.v20180626
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r1i1p1f2.Amon.clw.gr.v20181018
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r1i1p1f2.Amon.clwvi.gr.v20181018
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r2i1p1f2.Amon.clw.gr.v20181031
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r2i1p1f2.Amon.clwvi.gr.v20181031
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r3i1p1f2.Amon.clw.gr.v20181107
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r3i1p1f2.Amon.clwvi.gr.v20181107
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r4i1p1f2.Amon.clw.gr.v20190328
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r4i1p1f2.Amon.clwvi.gr.v20190328
CMIP6.CMIP.IPSL.IPSL-CM6A-LR.1pctCO2.r1i1p1f1.Amon.clw.gr.v20180727
CMIP6.CMIP.IPSL.IPSL-CM6A-LR.1pctCO2.r1i1p1f1.Amon.clwvi.gr.v20180727

Finished writing file: CMIP6_pxp581_20190618T082540.txt
Do you want

We ran the same search that returned 12 missing datasets earlier, but this time we used the *--request* option after **clef**.<br>
The tool executes the search remotely, then looks for matches locally and on the NCI download list. Having found none, it gives us the option of putting in a request. <br>
It will accept any of the following as a positive answer:
> Y  YES y yes <br>

With any other answer, or if you don't pass anything, it will assume you don't want to put in a request.<br>
It still saved the request in a file we can use later.<br>
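The confirmation rule is strict: only the four spellings listed above count as a yes. A tiny sketch of that rule, with a hypothetical `wants_request` function that is not taken from clef's source:

```python
# The only answers treated as a positive reply, as listed above.
POSITIVE_ANSWERS = {"Y", "YES", "y", "yes"}


def wants_request(answer):
    """Return True only for one of the accepted positive answers."""
    return answer in POSITIVE_ANSWERS


for reply in ["yes", "Y", "no", "", "Yes"]:
    print(repr(reply), wants_request(reply))
```

Under this rule a mixed-case reply such as `Yes` is treated like any other answer, so no request is sent.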

In [15]:
!cat CMIP6_*.txt

dataset_id=CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1.1pctCO2.r1i1p1f2.Amon.clw.gr.v20180626
dataset_id=CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1.1pctCO2.r1i1p1f2.Amon.clwvi.gr.v20180626
dataset_id=CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r1i1p1f2.Amon.clw.gr.v20181018
dataset_id=CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r1i1p1f2.Amon.clwvi.gr.v20181018
dataset_id=CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r2i1p1f2.Amon.clw.gr.v20181031
dataset_id=CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r2i1p1f2.Amon.clwvi.gr.v20181031
dataset_id=CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r3i1p1f2.Amon.clw.gr.v20181107
dataset_id=CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r3i1p1f2.Amon.clwvi.gr.v20181107
dataset_id=CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r4i1p1f2.Amon.clw.gr.v20190328
dataset_id=CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r4i1p1f2.Amon.clwvi.gr.v20190328
dataset_id=CMIP6.CMIP.IPSL.IPSL-CM6A-LR.1pctCO2.r1i1p1f1.Amon.clw.gr.v20180727
dataset_id=CMIP6.CMIP.IPSL.IPSL-CM6A-LR.1p

If I had answered 'yes', the tool would have sent an e-mail to the NCI helpdesk with the text file attached. NCI can pass that file as input to their download tool and queue your request.
NB: if you are running clef from raijin you cannot send an e-mail, so in that case the tool will remind you to e-mail the generated request file to the NCI helpdesk yourself.
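Because the request file is plain text with one `dataset_id=...` entry per line, it is easy to read back, for example to check what you are about to send. The sketch below parses the text itself rather than opening a file, so it runs anywhere; `read_request` is a helper invented for this example.

```python
def read_request(text):
    """Return the dataset ids listed in the text of a clef request file."""
    dataset_ids = []
    for line in text.splitlines():
        key, _, value = line.strip().partition("=")
        # Skip blank or unrelated lines.
        if key == "dataset_id" and value:
            dataset_ids.append(value)
    return dataset_ids


# Two lines copied from the file shown above.
sample = """\
dataset_id=CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1.1pctCO2.r1i1p1f2.Amon.clw.gr.v20180626
dataset_id=CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1.1pctCO2.r1i1p1f2.Amon.clwvi.gr.v20180626
"""

requested = read_request(sample)
print(len(requested), "datasets in the request")
```

With a real file you would pass the result of `open(path).read()` instead of the sample string.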

## Integrating the local query in your scripts

Until now we have looked at how to run queries from the command line, but you can use the same query run by the *--local* option directly in your Python code. By doing so you also get access to a lot more information on the datasets returned, not only the path.<br>
To do so we first have to import some functions from the clef.code sub-module, in particular the **search** function, plus **connect** and **Session**, which we'll use to open a connection to the database.

In [16]:
from clef.code import *
db = connect()
s = Session()

**search** takes 3 inputs: the db session, the project (i.e. currently 'cmip5' or 'cmip6') and a dictionary containing the query constraints.  
Let's start by defining some constraints.

In [17]:
constraints = {'variable': 'tas', 'model': 'MIROC5', 'cmor_table': 'day', 'experiment': 'rcp85'}

The available keys depend on the project you are querying and the attributes stored by the database. You can use any of the *facets* used for ESGF, and in future we will be adding other options based on extra fields which are stored as attributes.

In [18]:
results = search(s, project='cmip5', **constraints)
results

[{'filenames': ['tas_day_MIROC5_rcp85_r1i1p1_20100101-20191231.nc',
   'tas_day_MIROC5_rcp85_r1i1p1_20900101-20991231.nc',
   'tas_day_MIROC5_rcp85_r1i1p1_20300101-20391231.nc',
   'tas_day_MIROC5_rcp85_r1i1p1_20400101-20491231.nc',
   'tas_day_MIROC5_rcp85_r1i1p1_20500101-20591231.nc',
   'tas_day_MIROC5_rcp85_r1i1p1_20800101-20891231.nc',
   'tas_day_MIROC5_rcp85_r1i1p1_21000101-21001231.nc',
   'tas_day_MIROC5_rcp85_r1i1p1_20060101-20091231.nc',
   'tas_day_MIROC5_rcp85_r1i1p1_20600101-20691231.nc',
   'tas_day_MIROC5_rcp85_r1i1p1_20700101-20791231.nc',
   'tas_day_MIROC5_rcp85_r1i1p1_20200101-20291231.nc'],
  'project': 'CMIP5',
  'institute': 'MIROC',
  'model': 'MIROC5',
  'experiment': 'rcp85',
  'frequency': 'day',
  'realm': 'atmos',
  'r': '1',
  'i': '1',
  'p': '1',
  'ensemble': 'r1i1p1',
  'cmor_table': 'day',
  'version': '20120710',
  'variable': 'tas',
  'pdir': '/g/data1b/al33/replicas/CMIP5/combined/MIROC/MIROC5/rcp85/day/atmos/day/r1i1p1/v20120710/tas',
  'periods':

Both the keys and values of the constraints get checked before being passed to the query function. This means that if you passed a key or a value that doesn't exist for the chosen project, the function will print a list of valid values and then exit.<br>
Let's re-write the constraints dictionary to show an example.

In [19]:
constraints = {'v': 'tas', 'm': 'MIROC5', 'table': 'day', 'experiment': 'rcp85', 'activity': 'CMIP'}
results = search(s, project='cmip5', **constraints)

Valid constraints are:
dict_values([['source_id', 'model', 'm'], ['realm'], ['time_frequency', 'frequency', 'f'], ['variable_id', 'variable', 'v'], ['experiment_id', 'experiment', 'e'], ['table_id', 'table', 'cmor_table', 't'], ['member_id', 'member', 'ensemble', 'en', 'mi'], ['institution_id', 'institution', 'institute'], ['experiment_family']])


SystemExit: 

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)


You can see that the function told us 'activity' is not a valid constraint for CMIP5; in fact, it can be used only with CMIP6.<br>
Note that the search accepted all the other abbreviations: there are a few terms that can be used for each key.<br>
The full list of valid keys is available from the GitHub repository:<br>
https://github.com/coecms/clef/blob/master/clef/data/valid_keys.json
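The alias groups printed in the error message suggest how such a check can work: every accepted spelling maps to one key, and anything outside the groups is rejected. The sketch below reproduces that idea with the CMIP5 groups copied from the output above. It is not clef's implementation: treating the first name of each group as the canonical one, and raising `ValueError` rather than exiting, are assumptions made for the example.

```python
# Alias groups copied from the "Valid constraints are:" output above.
VALID_CMIP5_KEYS = [
    ["source_id", "model", "m"],
    ["realm"],
    ["time_frequency", "frequency", "f"],
    ["variable_id", "variable", "v"],
    ["experiment_id", "experiment", "e"],
    ["table_id", "table", "cmor_table", "t"],
    ["member_id", "member", "ensemble", "en", "mi"],
    ["institution_id", "institution", "institute"],
    ["experiment_family"],
]

# Assumption: use the first name in each group as the canonical key.
ALIASES = {alias: group[0] for group in VALID_CMIP5_KEYS for alias in group}


def normalise(constraints):
    """Map every accepted alias to its canonical key, rejecting unknown keys."""
    unknown = [key for key in constraints if key not in ALIASES]
    if unknown:
        raise ValueError(f"invalid constraint(s) for CMIP5: {unknown}")
    return {ALIASES[key]: value for key, value in constraints.items()}


good = {"v": "tas", "m": "MIROC5", "table": "day", "experiment": "rcp85", "member": "r1i1p1"}
print(normalise(good))

try:
    normalise({"v": "tas", "activity": "CMIP"})
except ValueError as err:
    print(err)
```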

In [23]:
constraints = {'v': 'tas', 'm': 'MIROC5', 'table': 'day', 'experiment': 'rcp85', 'member': 'r1i1p1'}
results = search(s, project='cmip5', **constraints)

In [24]:
results

[{'filenames': ['tas_day_MIROC5_rcp85_r1i1p1_20100101-20191231.nc',
   'tas_day_MIROC5_rcp85_r1i1p1_20900101-20991231.nc',
   'tas_day_MIROC5_rcp85_r1i1p1_20300101-20391231.nc',
   'tas_day_MIROC5_rcp85_r1i1p1_20400101-20491231.nc',
   'tas_day_MIROC5_rcp85_r1i1p1_20500101-20591231.nc',
   'tas_day_MIROC5_rcp85_r1i1p1_20800101-20891231.nc',
   'tas_day_MIROC5_rcp85_r1i1p1_21000101-21001231.nc',
   'tas_day_MIROC5_rcp85_r1i1p1_20060101-20091231.nc',
   'tas_day_MIROC5_rcp85_r1i1p1_20600101-20691231.nc',
   'tas_day_MIROC5_rcp85_r1i1p1_20700101-20791231.nc',
   'tas_day_MIROC5_rcp85_r1i1p1_20200101-20291231.nc'],
  'project': 'CMIP5',
  'institute': 'MIROC',
  'model': 'MIROC5',
  'experiment': 'rcp85',
  'frequency': 'day',
  'realm': 'atmos',
  'r': '1',
  'i': '1',
  'p': '1',
  'ensemble': 'r1i1p1',
  'cmor_table': 'day',
  'version': '20120710',
  'variable': 'tas',
  'pdir': '/g/data1b/al33/replicas/CMIP5/combined/MIROC/MIROC5/rcp85/day/atmos/day/r1i1p1/v20120710/tas',
  'periods':