# Using Xarray to Access CMIP6 Data from NCI



This notebook demonstrates how to:

* Use the Clef search tool to find CMIP data
* Access CMIP6 data programmatically

This example uses the Coupled Model Intercomparison Project Phase 6 (CMIP6) collections. For more information, please visit the [data catalogue](https://geonetwork.nci.org.au/geonetwork/srv/eng/catalog.search#/metadata/f6600_2266_8675_3563) and the [terms of use](https://pcmdi.llnl.gov/CMIP6/TermsOfUse/TermsOfUse6-1.html).

---

- Authors: NCI Virtual Research Environment Team
- Keywords: CMIP, Clef, Xarray
- Create Date: 2019-Nov; Update Date: 2020-Apr

### Prerequisites

To run this notebook on Gadi/VDI or on your local computer, the following modules are needed:

* Clef
* Xarray

You also need to be a member of the following data project to access the data:
* oi10

You can request to join the project through [NCI's user account management system](https://my.nci.org.au). 

## Clef

Clef searches the Earth System Grid Federation (ESGF) datasets stored at the Australian National Computational Infrastructure (NCI), covering both data published on the NCI ESGF node and files that are locally replicated from other ESGF nodes. For more information about the tool, please visit [Clef's documentation site](https://clef.readthedocs.io/en/latest/gettingstarted.html).

Currently it searches for the following datasets:

- CMIP5 raijin projects: rr3, where NCI is the primary publisher and al33 for replicas
- CMIP6 raijin projects: oi10 for replicas

We use Clef to find the paths of CMIP6 data. Clef is available in the analysis3 environments under /g/data/hh5/public/modules.

#### Load Clef 

In [None]:
!module use /g/data3/hh5/public/modules
!module load analysis3-20.04

### Import Python packages

In [1]:
import xarray as xr
%matplotlib inline

### Use `clef cmip6` to search data

First, check the help information to see the search options available in `clef cmip6`.

In [49]:
!clef cmip6 --help

Usage: clef cmip6 [OPTIONS] [QUERY]...

  Search ESGF and local database for CMIP6 files

  Constraints can be specified multiple times, in which case they are
  combined    using OR: -v tas -v tasmin will return anything matching
  variable = 'tas' or variable = 'tasmin'. The --latest flag will check ESGF
  for the latest version available, this is the default behaviour

Options:
  -mip, --activity [AerChemMIP|C4MIP|CDRMIP|CFMIP|CMIP|CORDEX|DAMIP|DCPP|DynVarMIP|FAFMIP|GMMIP|GeoMIP|HighResMIP|ISMIP6|LS3MIP|LUMIP|OMIP|PAMIP|PMIP|RFMIP|SIMIP|ScenarioMIP|VIACSAB|VolMIP]
  -e, --experiment x              CMIP6 experiment, list of available depends
                                  on activity
  --source_type [AER|AGCM|AOGCM|BGC|CHEM|ISM|LAND|OGCM|RAD|SLAB]
  -t, --table x                   CMIP6 CMOR table: Amon, SIday, Oday ...
  -m, --model, --source_id x      CMIP6 model id: GFDL-AM4, CNRM-CM6-1 ...
  -v, --variable x                CMIP6 variable name as in filenames
  -mi, --member TE

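Next, we search ScenarioMIP for available data. The example below searches for monthly near-surface air temperature (`tas`); precipitation (`pr`) can be searched in the same way by changing `--variable`.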

In [50]:
!clef cmip6  --activity ScenarioMIP  --experiment  ssp126 --member r1i1p1f2 --table Amon  --variable tas   --grid gr 

/g/data1b/oi10/replicas/CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-CM6-1/ssp126/r1i1p1f2/Amon/tas/gr/v20190219/
/g/data1b/oi10/replicas/CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-ESM2-1/ssp126/r1i1p1f2/Amon/tas/gr/v20190328/

Everything available on ESGF is also available locally
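The search output mixes directory paths with status messages, so it can help to keep only the paths before working with them. The following is a minimal sketch, assuming the output has already been captured as text (for example with IPython's `output = !clef cmip6 ...` syntax, which returns a list of lines). `extract_paths` is a hypothetical helper name, not part of Clef.

```python
def extract_paths(clef_output):
    """Return the directory paths found in Clef's text output.

    Accepts either one string or a list of lines. Any line that does not
    start with '/' (such as a status message) is ignored.
    """
    if isinstance(clef_output, str):
        lines = clef_output.splitlines()
    else:
        lines = list(clef_output)
    return [line.strip() for line in lines if line.strip().startswith("/")]


# Sample text copied from the search result shown above.
sample_output = """\
/g/data1b/oi10/replicas/CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-CM6-1/ssp126/r1i1p1f2/Amon/tas/gr/v20190219/
/g/data1b/oi10/replicas/CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-ESM2-1/ssp126/r1i1p1f2/Amon/tas/gr/v20190328/

Everything available on ESGF is also available locally
"""

paths = extract_paths(sample_output)
for path in paths:
    print(path)
```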


We can then set the values of the CMIP6 attributes according to the Clef search results.

CMIP6 data are organised according to their global attributes. We can access different data by changing the attributes in the directory template below:

**/g/data1b/oi10/replicas/CMIP6/activity_id/institution_id/source_id/experiment_id/member_id/table_id/variable/grid_label/version/**

For more information about the CMIP6 DRS tree, see this [documentation](https://docs.google.com/document/d/1h0r8RZr_f3-8egBMMh7aqLwy3snpD6_MrDz1q8n5XUk/edit).
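To read the attribute values off a path returned by Clef, the directory template above can be applied in reverse. This is a minimal sketch, assuming the path contains a `CMIP6` directory followed by the nine components in the order listed in the template; `parse_drs_path` and `DRS_FACETS` are hypothetical names introduced for illustration.

```python
# Attribute names in the order they appear after the CMIP6 root directory,
# taken from the directory template above.
DRS_FACETS = [
    "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable", "grid_label", "version",
]


def parse_drs_path(path, root_marker="CMIP6"):
    """Split a CMIP6 directory path into a dict of DRS attributes."""
    parts = [p for p in path.split("/") if p]
    if root_marker not in parts:
        raise ValueError(f"'{root_marker}' not found in path: {path}")
    start = parts.index(root_marker) + 1
    values = parts[start:start + len(DRS_FACETS)]
    if len(values) != len(DRS_FACETS):
        raise ValueError(f"Path has too few DRS components: {path}")
    return dict(zip(DRS_FACETS, values))


# One of the directories returned by the search above.
example = ("/g/data1b/oi10/replicas/CMIP6/ScenarioMIP/CNRM-CERFACS/"
           "CNRM-CM6-1/ssp126/r1i1p1f2/Amon/tas/gr/v20190219/")
facets = parse_drs_path(example)
for name, value in facets.items():
    print(f"{name:15s} {value}")
```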

ScenarioMIP scenarios are based on the Shared Socioeconomic Pathways (SSPs), which describe alternative future pathways of societal development. Among these scenarios, ssp126 is a low-emission pathway.
For information about the ScenarioMIP scenarios, see https://www.geosci-model-dev.net/9/3461/2016/

Below, we set these attributes to get future projection data under the ssp126 scenario, using member 'r1i1p1f2' of the CNRM-CM6-1 model simulations as an example.

<div class="alert alert-warning">
<b>NOTE: </b>Because CMIP5 and CMIP6 use different DRS (Data Reference Syntax) trees, the Clef search syntax differs slightly between the two datasets. Search options must strictly match the DRS tree of the dataset being searched, and they are case sensitive.
</div>

Below is an example of a search that uses the wrong DRS: it passes CMIP6 options to `clef cmip5`.

In [None]:
!clef cmip5  --activity ScenarioMIP  --source_id  CNRM-CM6-1 --table Amon  --variable tas  --variable pr   --grid gr 

### Use Xarray to open data

#### temperature

In [2]:
cmip6Dir='/g/data/oi10/replicas/CMIP6'
activity='ScenarioMIP'
institute='CNRM-CERFACS'
source='CNRM-CM6-1'
experiment='ssp126' 
member='r1i1p1f2'
table='Amon'
variable='tas'  
grid='gr'
version='v20190219'
period='201501-210012'
# Build the file path from the DRS attributes, then open the dataset
path=f'{cmip6Dir}/{activity}/{institute}/{source}/{experiment}/{member}/{table}/{variable}/{grid}/{version}/{variable}_{table}_{source}_{experiment}_{member}_{grid}_{period}.nc'
ds=xr.open_dataset(path)
tas=ds.tas
tas

#### precipitation

In [3]:
cmip6Dir='/g/data/oi10/replicas/CMIP6'
activity='ScenarioMIP'
institute='CNRM-CERFACS'
source='CNRM-CM6-1'
experiment='ssp126' 
member='r1i1p1f2'
table='Amon'
variable='pr'  
grid='gr'
version='v20190219'
period='201501-210012'
# Build the file path from the DRS attributes, then open the dataset
path=f'{cmip6Dir}/{activity}/{institute}/{source}/{experiment}/{member}/{table}/{variable}/{grid}/{version}/{variable}_{table}_{source}_{experiment}_{member}_{grid}_{period}.nc'
ds=xr.open_dataset(path)
pr=ds.pr
pr
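The temperature and precipitation cells above differ only in the value of `variable`. As a minimal sketch, the path construction could be wrapped in a small function so that each attribute is set once. `cmip6_file_path` is a hypothetical helper, not part of Clef or xarray, and it assumes the directory and filename layout used in the cells above.

```python
from pathlib import PurePosixPath


def cmip6_file_path(root, activity, institute, source, experiment,
                    member, table, variable, grid, version, period):
    """Build the full path of one CMIP6 file from its DRS attributes."""
    directory = PurePosixPath(root, activity, institute, source, experiment,
                              member, table, variable, grid, version)
    filename = (f"{variable}_{table}_{source}_{experiment}_"
                f"{member}_{grid}_{period}.nc")
    return str(directory / filename)


# Attributes shared by both variables (values from the cells above).
common = dict(
    root="/g/data/oi10/replicas/CMIP6",
    activity="ScenarioMIP",
    institute="CNRM-CERFACS",
    source="CNRM-CM6-1",
    experiment="ssp126",
    member="r1i1p1f2",
    table="Amon",
    grid="gr",
    version="v20190219",
    period="201501-210012",
)

# Only the variable changes between the two files.
paths = {var: cmip6_file_path(variable=var, **common) for var in ("tas", "pr")}
for var, path in paths.items():
    print(var, path)
```

Each resulting path can then be passed to `xr.open_dataset` as in the cells above.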

#### Loop over multiple datasets using the pattern above

First, list the files in the directory:

In [None]:
!ls /g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/historical/r1i1p1f2/3hr/pr/gr/v20180917/ 

In [8]:
cmip6Dir='/g/data/oi10/replicas/CMIP6'
activity='CMIP'
institute='CNRM-CERFACS'
source='CNRM-CM6-1'
experiment='historical' 
member='r1i1p1f2'
table='3hr'
variable='pr'  
grid='gr'
version='v20180917'
files=[f'{cmip6Dir}/{activity}/{institute}/{source}/{experiment}/{member}/{table}/{variable}/{grid}/{version}/{variable}_{table}_{source}_{experiment}_{member}_{grid}_{year}01010130-{year+9}12312230.nc' for year in range(1850, 2000, 10)]
files

['/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/historical/r1i1p1f2/3hr/pr/gr/v20180917/pr_3hr_CNRM-CM6-1_historical_r1i1p1f2_gr_185001010130-185912312230.nc',
 '/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/historical/r1i1p1f2/3hr/pr/gr/v20180917/pr_3hr_CNRM-CM6-1_historical_r1i1p1f2_gr_186001010130-186912312230.nc',
 '/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/historical/r1i1p1f2/3hr/pr/gr/v20180917/pr_3hr_CNRM-CM6-1_historical_r1i1p1f2_gr_187001010130-187912312230.nc',
 '/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/historical/r1i1p1f2/3hr/pr/gr/v20180917/pr_3hr_CNRM-CM6-1_historical_r1i1p1f2_gr_188001010130-188912312230.nc',
 '/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/historical/r1i1p1f2/3hr/pr/gr/v20180917/pr_3hr_CNRM-CM6-1_historical_r1i1p1f2_gr_189001010130-189912312230.nc',
 '/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/historical/r1i1p1f2/3hr/pr/gr/v20180917/pr_3hr_CNRM-CM6-1_historical_r1i1p1f2_gr_19000

In [None]:
ds=xr.open_mfdataset(files)
pr=ds.pr
pr
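The list comprehension above assumes that every file covers exactly ten years, which may not hold for every dataset (CMIP6 historical runs end in 2014, so a final file may be shorter). An alternative is to collect whatever files are present in the version directory. This is a minimal sketch using the standard-library `glob` module; `list_variable_files` is a hypothetical helper name.

```python
import glob
import os


def list_variable_files(directory, variable):
    """Return the sorted NetCDF files for `variable` found in `directory`.

    The date ranges in CMIP6 filenames are zero-padded, so sorting the
    names alphabetically also sorts them chronologically.
    """
    pattern = os.path.join(directory, f"{variable}_*.nc")
    return sorted(glob.glob(pattern))


# Usage on Gadi (requires membership of project oi10):
# version_dir = ('/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/'
#                'historical/r1i1p1f2/3hr/pr/gr/v20180917')
# files = list_variable_files(version_dir, 'pr')
# ds = xr.open_mfdataset(files)
```

The function returns an empty list when the directory does not exist or contains no matching files, so it is worth checking the result before opening it.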

### Summary

In this example, we showed how to use the Clef tool to search the CMIP6 data available on Gadi, and how to use xarray to open a single dataset or to programmatically open multiple datasets at once based on the CMIP6 DRS.