# Understanding Uncertainty in a CMIP6 Dataset

In this notebook we demonstrate how to estimate model uncertainty by comparing the temperature trends of the 6 ensemble members of the ssp126 experiment of the CNRM-CM6-1 model in the CMIP6 archive. The notebook shows how to:

* access data that include multiple ensemble members  
* make plots with multiple lines

This example uses the Coupled Model Intercomparison Project Phase 6 (CMIP6) collection. For more information, see the [data catalogue](https://geonetwork.nci.org.au/geonetwork/srv/eng/catalog.search#/metadata/f6600_2266_8675_3563) and [terms of use](https://pcmdi.llnl.gov/CMIP6/TermsOfUse/TermsOfUse6-1.html).

---

- Authors: NCI Virtual Research Environment Team
- Keywords: CMIP, xarray
- Create Date: 2020-Apr; Update Date: 2020-Apr

### Prerequisites

To run this notebook on Gadi/VDI, the following modules are needed:

* Clef
* Xarray

You also need to be a member of the following projects:
* oi10
* hh5

You can request to join these projects through [NCI's user account management system](https://my.nci.org.au).

### Load libraries

In [1]:
import xarray as xr

### Use Clef to check data availability

In [18]:
!module use /g/data3/hh5/public/modules
!module load conda/analysis3
!clef cmip6 --activity ScenarioMIP \
           --table  Amon          \
           --grid   gr            \
           --variable   tas        \
           --experiment  ssp126    \
           --model      CNRM-CM6-1

None
/g/data1b/oi10/replicas/CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-CM6-1/ssp126/r1i1p1f2/Amon/tas/gr/v20190219/
/g/data1b/oi10/replicas/CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-CM6-1/ssp126/r2i1p1f2/Amon/tas/gr/v20190410/
/g/data1b/oi10/replicas/CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-CM6-1/ssp126/r3i1p1f2/Amon/tas/gr/v20190410/
/g/data1b/oi10/replicas/CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-CM6-1/ssp126/r4i1p1f2/Amon/tas/gr/v20190410/
/g/data1b/oi10/replicas/CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-CM6-1/ssp126/r5i1p1f2/Amon/tas/gr/v20190410/
/g/data1b/oi10/replicas/CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-CM6-1/ssp126/r6i1p1f2/Amon/tas/gr/v20190410/

Everything available on ESGF is also available locally


In [19]:
!ls /g/data1b/oi10/replicas/CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-CM6-1/ssp126/r2i1p1f2/Amon/tas/gr/v20190410/

tas_Amon_CNRM-CM6-1_ssp126_r2i1p1f2_gr_201501-210012.nc


### Use xarray to open ensemble data files

In [2]:
Dir='/g/data1b/oi10/replicas/CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-CM6-1'
Files=[Dir+'/ssp126/r1i1p1f2/Amon/tas/gr/v20190219/tas_Amon_CNRM-CM6-1_ssp126_r1i1p1f2_gr_201501-210012.nc',
      Dir+'/ssp126/r2i1p1f2/Amon/tas/gr/v20190410/tas_Amon_CNRM-CM6-1_ssp126_r2i1p1f2_gr_201501-210012.nc',
      Dir+'/ssp126/r3i1p1f2/Amon/tas/gr/v20190410/tas_Amon_CNRM-CM6-1_ssp126_r3i1p1f2_gr_201501-210012.nc',
      Dir+'/ssp126/r4i1p1f2/Amon/tas/gr/v20190410/tas_Amon_CNRM-CM6-1_ssp126_r4i1p1f2_gr_201501-210012.nc',
      Dir+'/ssp126/r5i1p1f2/Amon/tas/gr/v20190410/tas_Amon_CNRM-CM6-1_ssp126_r5i1p1f2_gr_201501-210012.nc',
      Dir+'/ssp126/r6i1p1f2/Amon/tas/gr/v20190410/tas_Amon_CNRM-CM6-1_ssp126_r6i1p1f2_gr_201501-210012.nc']

ds1=xr.open_dataset(Files[0])
ds2=xr.open_dataset(Files[1])
ds3=xr.open_dataset(Files[2])
ds4=xr.open_dataset(Files[3])
ds5=xr.open_dataset(Files[4])
ds6=xr.open_dataset(Files[5])

In [3]:
ds1.tas

### Concatenate ensemble files into one dataset

In [4]:
ds_new=xr.concat([ds1.tas, ds2.tas, ds3.tas, ds4.tas, ds5.tas, ds6.tas], 'new_dim')
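
The new dimension can also carry a label for each member. The following is a minimal sketch on synthetic data, not the notebook's own method: the values are made up, and the names `member_a`, `member_b` and `labels` are illustrative assumptions. It passes a `pandas.Index` to `xr.concat` so that the dimension is named and labelled in one step.

```python
import pandas as pd
import xarray as xr

# Two tiny synthetic "members" sharing the same time axis (values are made up).
time = pd.date_range("2015-01-01", periods=3, freq="MS")
member_a = xr.DataArray([280.0, 281.0, 282.0], dims="time",
                        coords={"time": time}, name="tas")
member_b = xr.DataArray([279.0, 281.5, 283.0], dims="time",
                        coords={"time": time}, name="tas")

# A pandas Index names the new dimension and labels each member along it.
labels = pd.Index(["r1i1p1f2", "r2i1p1f2"], name="member_id")
ensemble = xr.concat([member_a, member_b], dim=labels)

print(ensemble.dims)
print(ensemble.sel(member_id="r2i1p1f2").values)
```

With labels attached, a single member can be selected by name with `.sel(member_id=...)` rather than by position.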

Instead of reading each file individually and concatenating the results, you can read them all into one dataset with Xarray's `open_mfdataset` function. The procedure above is shown only to demonstrate the `concat` function in Xarray.

In [None]:
# combine='nested' is needed in recent xarray versions whenever concat_dim is given
ds_all=xr.open_mfdataset(Dir+'/ssp126/r*i1p1f2/Amon/tas/gr/*/tas_Amon_CNRM-CM6-1_ssp126_r*i1p1f2_gr_201501-210012.nc', combine='nested', concat_dim='member_id')
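
The `member_id` dimension created this way has no coordinate labels. As a hedged sketch, the variant label of each member can be recovered from its file path with a regular expression. The helper `member_label` and the names `base_dir` and `paths` are hypothetical and not part of Xarray or Clef; the two paths are taken from the Clef listing above.

```python
import re

# CMIP6 variant labels have the form r<N>i<N>p<N>f<N>, for example r2i1p1f2.
VARIANT_RE = re.compile(r"r\d+i\d+p\d+f\d+")

def member_label(path):
    """Return the first variant label found in a CMIP6 file path (hypothetical helper)."""
    match = VARIANT_RE.search(path)
    if match is None:
        raise ValueError(f"no variant label in {path!r}")
    return match.group(0)

base_dir = "/g/data1b/oi10/replicas/CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-CM6-1"
paths = [
    base_dir + "/ssp126/r1i1p1f2/Amon/tas/gr/v20190219/"
    "tas_Amon_CNRM-CM6-1_ssp126_r1i1p1f2_gr_201501-210012.nc",
    base_dir + "/ssp126/r2i1p1f2/Amon/tas/gr/v20190410/"
    "tas_Amon_CNRM-CM6-1_ssp126_r2i1p1f2_gr_201501-210012.nc",
]

labels = [member_label(p) for p in paths]
print(labels)
```

The resulting list could then be attached with `assign_coords(member_id=labels)`, provided the label order matches the order in which the files were combined.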

### Data analysis and plotting

Model simulations are uncertain, which is why we need multiple models and multiple ensemble members.

In [6]:
# Average over all grid cells (unweighted), then average each calendar year
ds_yr=ds_all.mean(dim=('lat','lon')).resample(time='Y').mean(dim='time')
ds_yr
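
Note that `mean(dim=('lat','lon'))` gives every grid cell the same weight, although cells cover less area towards the poles. The following is a minimal sketch of an area-weighted alternative, assuming a regular latitude-longitude grid on which cell area scales with cos(latitude). The data are synthetic and the variable names are illustrative only.

```python
import numpy as np
import xarray as xr

# Synthetic field on a 3 x 4 lat/lon grid (values are made up):
# 300 K at the equator, 250 K at 60 degrees north and south.
lat = np.array([-60.0, 0.0, 60.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])
tas = xr.DataArray(
    np.array([[250.0] * 4, [300.0] * 4, [250.0] * 4]),
    dims=("lat", "lon"),
    coords={"lat": lat, "lon": lon},
    name="tas",
)

# Assumption: regular grid, so cell area is proportional to cos(latitude).
weights = np.cos(np.deg2rad(tas.lat))

plain_mean = tas.mean(dim=("lat", "lon"))                    # every cell counts equally
area_mean = tas.weighted(weights).mean(dim=("lat", "lon"))   # cells weighted by area

print(float(plain_mean))
print(float(area_mean))
```

In this toy case the unweighted mean is about 266.7 K and the weighted mean is about 275 K, because the high-latitude rows count for half as much as the equatorial row.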

### Add ensemble mean to dataset as member_id: mean

In [7]:
ds_yr_ens_mean=ds_yr.mean(dim='member_id')
ds_yr_addMean=xr.concat([ds_yr, ds_yr_ens_mean],'member_id')
ds_yr_addMean=ds_yr_addMean.assign_coords({"member_id": [1,2,3,4,5,6,'mean'] }) #change coordinates of member_id
ds_yr_addMean

In [None]:
# Select the temperature variable and draw one line per member plus the ensemble mean
tas_yr_addMean=ds_yr_addMean['tas']
tas_yr_addMean.plot.line(x='time', hue='member_id')

### Measure the root-mean-square distance of each ensemble member from the ensemble mean

In [10]:
import numpy as np
# Root-mean-square distance of each member from the ensemble mean, taken over time
dis=np.sqrt((np.square(ds_yr-ds_yr.mean(dim='member_id'))).mean(dim='time'))
# Select the variable first: on a Dataset, .values is the Mapping method, not the data
dis['tas'].values

Each member departs from the ensemble mean by around 0.15 degrees Celsius (root-mean-square over time), which gives a rough measure of the uncertainty.
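
To make the calculation concrete, here is a small worked sketch on synthetic numbers (three made-up members over four years, not CMIP6 data). It computes the same root-mean-square distance per member and, as a complementary measure not used in this notebook, the standard deviation across members at each time step.

```python
import numpy as np
import xarray as xr

# Synthetic annual means: members 0 and 1 are identical, member 2 is 3 units warmer.
tas = xr.DataArray(
    np.array([
        [1.0, 2.0, 3.0, 4.0],
        [1.0, 2.0, 3.0, 4.0],
        [4.0, 5.0, 6.0, 7.0],
    ]),
    dims=("member_id", "time"),
    name="tas",
)

ens_mean = tas.mean(dim="member_id")             # mean across members, per year
anomaly = tas - ens_mean                         # each member's departure from the mean
rms = np.sqrt((anomaly ** 2).mean(dim="time"))   # one value per member
spread = tas.std(dim="member_id")                # one value per year

print(rms.values)
print(spread.values)
```

Here the ensemble mean sits 1 unit above the first two members and 2 units below the third, so the per-member distances are 1, 1 and 2, while the across-member standard deviation is the square root of 2 in every year.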

### Summary

This example shows how to concatenate multiple ensemble files and plot them together to get a sense of model uncertainty. The ensemble members give different future temperature projections under scenario ssp126.