# Searching and Accessing CMIP5 Data with Clef

In this notebook, we demonstrate how to search CMIP datasets using Clef and how to access the data programmatically.

* Use the Clef CMIP data search tool
* Access CMIP5 data programmatically

This example uses the Coupled Model Intercomparison Project Phase 5 (CMIP5) collections. For more information, please visit the [data catalogue](https://geonetwork.nci.org.au/geonetwork/srv/eng/catalog.search#/metadata/f3525_9322_8600_7716) and the [terms of use](https://cmip.llnl.gov/cmip5/terms.html).

---

- Authors: NCI Virtual Research Environment Team
- Keywords: CMIP, Clef, Xarray
- Create Date: 2019-Nov; Update Date: 2020-Apr

## Clef

Clef searches the Earth System Grid Federation (ESGF) datasets stored at the Australian National Computational Infrastructure (NCI), covering both data published on the NCI ESGF node and files replicated locally from other ESGF nodes. For more information about the tool, please visit [Clef's documentation site](https://clef.readthedocs.io/en/latest/gettingstarted.html).

We use Clef to find the paths of CMIP5 data. Clef is available in the analysis3 conda environments provided under /g/data/hh5/public/modules.

#### Load Clef 

In [None]:
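# Each `!` line runs in its own subshell, so environment changes made by `module`
# do not persist to later cells; if `clef` is not found below, load these modules
# in a terminal before starting Jupyter.
!module use /g/data/hh5/public/modules
!module load conda/analysis3-unstable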
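Because `!` commands run in a subshell, it can be worth confirming from Python that `clef` is actually visible to the notebook kernel. A minimal check using only the standard library (the printed messages are illustrative, not Clef output):

```python
import shutil

# Look for the `clef` executable on the PATH inherited by this notebook kernel
clef_path = shutil.which('clef')
if clef_path is None:
    print('clef not found: load the hh5 analysis3 module before starting Jupyter')
else:
    print(f'clef found at {clef_path}')
```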

### Import Python packages

In [None]:
import xarray as xr
%matplotlib inline

### Use `clef cmip5` to search data

First, check the help information to see the search options available in `clef cmip5`.

In [None]:
!clef cmip5 --help

Next, we search for the available near-surface air temperature (`tas`) and precipitation (`pr`) data by specifying the search parameters, for example:

In [None]:
!clef cmip5  --experiment rcp26  --ensemble r1i1p1 --table Amon   --variable tas --variable pr
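The same search can also be driven from Python, for example to collect the matching directories in a list. A minimal sketch using `subprocess`, assuming `clef` is on the kernel's `PATH` and prints one result per line; it only uses the flags from the cell above, and `clef_cmip5_args` and `run_clef` are hypothetical helper names, not part of Clef:

```python
import subprocess

def clef_cmip5_args(**facets):
    """Build a `clef cmip5` command line; list values repeat the flag (e.g. --variable tas --variable pr)."""
    args = ['clef', 'cmip5']
    for name, value in facets.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            args += [f'--{name}', v]
    return args

def run_clef(args):
    """Run clef and return its non-empty output lines (assumed to be one result per line)."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return [line for line in result.stdout.splitlines() if line.strip()]

args = clef_cmip5_args(experiment='rcp26', ensemble='r1i1p1', table='Amon', variable=['tas', 'pr'])
print(' '.join(args))
# paths = run_clef(args)  # run this on Gadi once clef is available
```

Building the argument list separately from running it keeps the command easy to inspect before anything is executed.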

We can then set the values of the CMIP5 attributes according to the Clef search results.

CMIP5 data are organised according to their global attributes. We can access different datasets by changing the attributes in the directory path below:
**/g/data1b/al33/replicas/CMIP5/product/institute/model/experiment/frequency/realm/table/ensemble/version/variable**
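The directory template above can be turned into a small helper that joins the attribute values in DRS order, plus the inverse that splits a path (for example, one returned by Clef) back into its attributes. A sketch assuming every path lives under the `al33` replica root used in this notebook; `build_cmip5_dir`, `parse_cmip5_dir` and `CMIP5_FACETS` are illustrative names:

```python
from pathlib import PurePosixPath

CMIP5_ROOT = '/g/data1b/al33/replicas/CMIP5'
# Directory levels below the root, in the order of the template above
CMIP5_FACETS = ('product', 'institute', 'model', 'experiment', 'frequency',
                'realm', 'table', 'ensemble', 'version', 'variable')

def build_cmip5_dir(root=CMIP5_ROOT, **attrs):
    """Join the attribute values in DRS order under `root`."""
    missing = [facet for facet in CMIP5_FACETS if facet not in attrs]
    if missing:
        raise ValueError(f'missing attributes: {missing}')
    return str(PurePosixPath(root, *(attrs[facet] for facet in CMIP5_FACETS)))

def parse_cmip5_dir(path, root=CMIP5_ROOT):
    """Split a DRS directory back into its attributes (raises ValueError if not under `root`)."""
    parts = PurePosixPath(path).relative_to(root).parts
    if len(parts) != len(CMIP5_FACETS):
        raise ValueError(f'expected {len(CMIP5_FACETS)} levels below {root}, got {len(parts)}')
    return dict(zip(CMIP5_FACETS, parts))

tas_dir = build_cmip5_dir(product='combined', institute='MIROC', model='MIROC-ESM',
                          experiment='rcp26', frequency='mon', realm='atmos', table='Amon',
                          ensemble='r1i1p1', version='v20120710', variable='tas')
print(tas_dir)
print(parse_cmip5_dir(tas_dir)['version'])
```

Paths returned from other roots (for example, data published rather than replicated) would need a different `root` argument.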

There are four Representative Concentration Pathways (RCPs) in CMIP5: RCP2.6, RCP4.5, RCP6.0 and RCP8.5 (experiments `rcp26`, `rcp45`, `rcp60` and `rcp85`). These are greenhouse gas concentration (not emissions) trajectories adopted by the IPCC for its Fifth Assessment Report (AR5) in 2014. They supersede the Special Report on Emissions Scenarios (SRES) projections published in 2000. For more information, see [here](https://sedac.ciesin.columbia.edu/ddc/ar5_scenario_process/RCPs.html).
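Because the experiment is just one level of the directory tree, the same variable can be located under all four RCPs by changing that single component. A sketch using `glob`, assuming the files sit under the `combined` product of the `al33` replicas as in the examples in this notebook, with a wildcard for the version directory because its date is not known in advance; `rcp_patterns` is a hypothetical helper:

```python
import glob

# Nominal radiative forcing in 2100 (W/m^2) for each CMIP5 RCP experiment
RCP_FORCING = {'rcp26': 2.6, 'rcp45': 4.5, 'rcp60': 6.0, 'rcp85': 8.5}

def rcp_patterns(institute, model, variable, ensemble='r1i1p1',
                 root='/g/data1b/al33/replicas/CMIP5/combined'):
    """Glob patterns for monthly Amon files of one variable under each RCP.

    The version directory is a wildcard because its date is not known in advance.
    """
    return {
        exp: f'{root}/{institute}/{model}/{exp}/mon/atmos/Amon/{ensemble}/v*/{variable}/*.nc'
        for exp in RCP_FORCING
    }

patterns = rcp_patterns('MIROC', 'MIROC-ESM', 'tas')
for exp, pattern in patterns.items():
    # An empty list means no replica matched (the model may not have run that RCP)
    print(exp, RCP_FORCING[exp], sorted(glob.glob(pattern)))
```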
 

Below, as an example, we set these attributes to retrieve future projection data under the rcp26 scenario from ensemble member 'r1i1p1' of the 'MIROC-ESM' model simulations.

<div class="alert alert-warning">
<b>NOTE: </b>Because CMIP5 and CMIP6 use different DRS (Data Reference Syntax), the Clef search syntax differs slightly between the two datasets. Search terms must match the corresponding DRS exactly and are case sensitive.
</div>

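The example below uses CMIP6 DRS terms (`--activity`, `--source_id`, `--grid`) with `clef cmip5`, i.e. the wrong DRS for CMIP5 data, so it will not find the intended data.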

In [None]:
!clef cmip5  --activity ScenarioMIP  --source_id  CNRM-CM6-1 --table Amon  --variable tas  --variable pr   --grid gr 
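To see why CMIP6-style terms fail with `clef cmip5`, it helps to compare the directory levels of the two DRS trees. The CMIP5 list follows the directory template used in this notebook; the CMIP6 list follows the CMIP6 DRS directory layout below the `CMIP6` root. The correspondence dictionary is a rough, hand-written mapping for illustration, not something provided by Clef:

```python
# Directory levels of the CMIP5 DRS (as used in this notebook) and of the CMIP6 DRS
CMIP5_DIR_FACETS = ['product', 'institute', 'model', 'experiment', 'frequency',
                    'realm', 'table', 'ensemble', 'version', 'variable']
CMIP6_DIR_FACETS = ['activity_id', 'institution_id', 'source_id', 'experiment_id',
                    'member_id', 'table_id', 'variable_id', 'grid_label', 'version']

# Roughly equivalent levels (note CMIP6 member ids add a forcing index, e.g. r1i1p1f1)
CMIP5_TO_CMIP6 = {
    'institute': 'institution_id',
    'model': 'source_id',
    'experiment': 'experiment_id',
    'ensemble': 'member_id',
    'table': 'table_id',
    'variable': 'variable_id',
    'version': 'version',
}

only_cmip5 = [f for f in CMIP5_DIR_FACETS if f not in CMIP5_TO_CMIP6]
only_cmip6 = [f for f in CMIP6_DIR_FACETS if f not in CMIP5_TO_CMIP6.values()]
print('CMIP5 only:', only_cmip5)  # product, frequency, realm
print('CMIP6 only:', only_cmip6)  # activity_id, grid_label
```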

### Show data file names

In [6]:
!ls /g/data1b/al33/replicas/CMIP5/combined/MIROC/MIROC-ESM/rcp26/mon/atmos/Amon/r1i1p1/v20120710/tas/

tas_Amon_MIROC-ESM_rcp26_r1i1p1_200601-210012.nc


In [7]:
!ls /g/data1b/al33/replicas/CMIP5/combined/MIROC/MIROC-ESM/rcp26/mon/atmos/Amon/r1i1p1/v20120710/pr/

pr_Amon_MIROC-ESM_rcp26_r1i1p1_200601-210012.nc
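CMIP5 file names repeat several directory attributes in a fixed order, `<variable>_<table>_<model>_<experiment>_<ensemble>_<period>.nc`, as the two listings above show. A sketch of a parser based on a regular expression, assuming model and experiment names contain no underscores and treating the period as optional (time-invariant `fx` files have none); `parse_cmip5_filename` is a hypothetical helper:

```python
import re

# <variable>_<table>_<model>_<experiment>_<ensemble>[_<period>].nc
CMIP5_FILENAME = re.compile(
    r'^(?P<variable>[^_]+)_(?P<table>[^_]+)_(?P<model>[^_]+)_(?P<experiment>[^_]+)'
    r'_(?P<ensemble>r\d+i\d+p\d+)(?:_(?P<period>\d+-\d+))?\.nc$'
)

def parse_cmip5_filename(name):
    """Return the attributes encoded in a CMIP5 file name, or raise ValueError."""
    match = CMIP5_FILENAME.match(name)
    if match is None:
        raise ValueError(f'not a CMIP5 file name: {name}')
    return match.groupdict()

print(parse_cmip5_filename('tas_Amon_MIROC-ESM_rcp26_r1i1p1_200601-210012.nc'))
```

Climatology files, which append `-clim` to the period, would need an extra optional group in the pattern.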


### Use Xarray to open data

#### First, the near-surface air temperature data

In [2]:
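# CMIP5 DRS attributes for MIROC-ESM rcp26 monthly near-surface air temperature
cmip5Dir = '/g/data1b/al33/replicas/CMIP5'
product = 'combined'
institute = 'MIROC'
model = 'MIROC-ESM'
experiment = 'rcp26'
frequency = 'mon'
realm = 'atmos'
table = 'Amon'
ensemble = 'r1i1p1'
version = 'v20120710'
variable = 'tas'
period = '200601-210012'
# Directory: product/institute/model/experiment/frequency/realm/table/ensemble/version/variable
path = f'{cmip5Dir}/{product}/{institute}/{model}/{experiment}/{frequency}/{realm}/{table}/{ensemble}/{version}/{variable}'
# File name: variable_table_model_experiment_ensemble_period.nc
ds = xr.open_dataset(f'{path}/{variable}_{table}_{model}_{experiment}_{ensemble}_{period}.nc')
tas = ds.tas
tas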
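As a quick sanity check on the opened data, the number of monthly time steps implied by the `period` part of the file name can be computed; assuming the file has no gaps, `tas.time` should have that length. `months_in_period` is a hypothetical helper:

```python
def months_in_period(period):
    """Number of monthly time steps in a CMIP5 period string such as '200601-210012' (inclusive)."""
    start, end = period.split('-')
    start_year, start_month = int(start[:4]), int(start[4:6])
    end_year, end_month = int(end[:4]), int(end[4:6])
    return (end_year - start_year) * 12 + (end_month - start_month) + 1

n = months_in_period('200601-210012')
print(n)  # 1140 = 95 years x 12 months
# On Gadi, compare with the opened data, e.g. assert tas.sizes['time'] == n
```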

#### Then the precipitation data

In [3]:
# Same DRS attributes as the temperature cell, with the variable set to precipitation
cmip5Dir = '/g/data1b/al33/replicas/CMIP5'
product = 'combined'
institute = 'MIROC'
model = 'MIROC-ESM'
experiment = 'rcp26'
frequency = 'mon'
realm = 'atmos'
table = 'Amon'
ensemble = 'r1i1p1'
version = 'v20120710'
variable = 'pr'
period = '200601-210012'
path = f'{cmip5Dir}/{product}/{institute}/{model}/{experiment}/{frequency}/{realm}/{table}/{ensemble}/{version}/{variable}'
ds = xr.open_dataset(f'{path}/{variable}_{table}_{model}_{experiment}_{ensemble}_{period}.nc')
pr = ds.pr
pr

#### You can then loop over multiple files using the same path pattern.

For example, the daily sea surface temperature (`tos`) data are split across several files:

In [None]:
!ls /g/data1b/al33/replicas/CMIP5/combined/MIROC/MIROC-ESM/rcp26/day/ocean/day/r1i1p1/v20111129/tos/

In [4]:
cmip5Dir = '/g/data1b/al33/replicas/CMIP5'
product = 'combined'
institute = 'MIROC'
model = 'MIROC-ESM'
experiment = 'rcp26'
frequency = 'day'
realm = 'ocean'
table = 'day'
ensemble = 'r1i1p1'
version = 'v20111129'
variable = 'tos'
path = f'{cmip5Dir}/{product}/{institute}/{model}/{experiment}/{frequency}/{realm}/{table}/{ensemble}/{version}/{variable}'
# Each file covers 20 years (e.g. 20060101-20251231), so step the start year by 20
files = [f'{path}/{variable}_{table}_{model}_{experiment}_{ensemble}_{year}0101-{year + 19}1231.nc'
         for year in range(2006, 2086, 20)]
files

['/g/data1b/al33/replicas/CMIP5/combined/MIROC/MIROC-ESM/rcp26/day/ocean/day/r1i1p1/v20111129/tos/tos_day_MIROC-ESM_rcp26_r1i1p1_20060101-20251231.nc',
 '/g/data1b/al33/replicas/CMIP5/combined/MIROC/MIROC-ESM/rcp26/day/ocean/day/r1i1p1/v20111129/tos/tos_day_MIROC-ESM_rcp26_r1i1p1_20260101-20451231.nc',
 '/g/data1b/al33/replicas/CMIP5/combined/MIROC/MIROC-ESM/rcp26/day/ocean/day/r1i1p1/v20111129/tos/tos_day_MIROC-ESM_rcp26_r1i1p1_20460101-20651231.nc',
 '/g/data1b/al33/replicas/CMIP5/combined/MIROC/MIROC-ESM/rcp26/day/ocean/day/r1i1p1/v20111129/tos/tos_day_MIROC-ESM_rcp26_r1i1p1_20660101-20851231.nc']
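Hardcoding the 20-year chunks works only when every file follows that pattern. An alternative sketch lists the files with `glob` instead; because every name shares the same prefix and the dates are fixed width, sorting by name also sorts by start date, and any file that does not fit the 20-year pattern would be picked up too. `find_chunk_files` is a hypothetical helper:

```python
import glob
import os

def find_chunk_files(directory, variable):
    """Return all NetCDF files for `variable` in a DRS leaf directory, sorted by name (i.e. by start date)."""
    return sorted(glob.glob(os.path.join(directory, f'{variable}_*.nc')))

tos_dir = ('/g/data1b/al33/replicas/CMIP5/combined/MIROC/MIROC-ESM/rcp26/'
           'day/ocean/day/r1i1p1/v20111129/tos')
files_found = find_chunk_files(tos_dir, 'tos')
files_found
```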

#### Read multiple files into one dataset

In [None]:
ds=xr.open_mfdataset(files)
ds
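If only part of the period is needed, the file list can be filtered by the dates in the file names before calling `open_mfdataset`, so files outside the window are never opened. A sketch assuming daily files named with a `YYYYMMDD-YYYYMMDD` period, as in the list above; `files_overlapping` is a hypothetical helper:

```python
import re

def files_overlapping(files, first_year, last_year):
    """Keep files whose YYYYMMDD-YYYYMMDD period overlaps the years [first_year, last_year]."""
    selected = []
    for path in files:
        match = re.search(r'_(\d{4})\d{4}-(\d{4})\d{4}\.nc$', path)
        if match is None:
            continue  # no YYYYMMDD-YYYYMMDD period in the name (e.g. monthly or fx files)
        start, end = int(match.group(1)), int(match.group(2))
        if start <= last_year and end >= first_year:
            selected.append(path)
    return selected

example = [
    'tos_day_MIROC-ESM_rcp26_r1i1p1_20060101-20251231.nc',
    'tos_day_MIROC-ESM_rcp26_r1i1p1_20260101-20451231.nc',
    'tos_day_MIROC-ESM_rcp26_r1i1p1_20460101-20651231.nc',
]
print(files_overlapping(example, 2030, 2050))
# On Gadi: ds = xr.open_mfdataset(files_overlapping(files, 2030, 2050))
```

Alternatively, `ds.sel(time=slice('2030', '2050'))` on the full dataset gives the same time window without filtering the files first.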

#### Let's look at the sea surface temperature

In [None]:
tos=ds.tos
tos

### Summary

In this example, we showed how to use the Clef tool to search the CMIP5 data available on Gadi, and how to use xarray to open a single file or programmatically open multiple files at once based on the CMIP5 Data Reference Syntax (DRS).