# Correlation between CMIP6 and ERA5


This notebook shows the correlation of precipitation between a CMIP6 model simulation and observation data from ERA5.

* Open CMIP6 and ERA5
* Data interpolation
* Calculate correlation
* Make a correlation animation across 30 years
---

- Authors: NCI Virtual Research Environment Team
- Keywords: CMIP6, ERA5, correlation, animation
- Create Date: 2020-Jul
    
---

### Load libraries

In [1]:
import xarray as xr
import numpy as np
from matplotlib import pyplot as plt, animation
import os
import cftime
from IPython.display import display, HTML
%matplotlib inline

### Open CMIP6 data

In [2]:
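# Read an example CMIP6 file: monthly precipitation (pr) from the IPSL-CM6A-LR historical run
file='/g/data/oi10/replicas/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Amon/pr/gr/v20180803/pr_Amon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc'
if not os.path.exists(file):
    raise FileNotFoundError(file)  # fail early instead of hitting an undefined ds_cmip6 below
ds_cmip6 = xr.open_dataset(file)
# Replace the time axis with mid-month dates on a 360-day calendar (1850-01 to 2014-12)
units = 'months since 1850-1-15 00:00:00'
time_360_ref = cftime.num2date(np.arange(0, (2014-1850+1)*12), units, '360_day')
ds_cmip6 = ds_cmip6.assign_coords(time=time_360_ref)
ds_cmip6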

### Open observation data -- ERA5

In [3]:
ds_tp_mon = xr.open_dataset('/g/data/dk92/notebooks/demo_data/tp_era5_mon_global_197901_201812.nc')
# Rename the coordinates and the variable to match the CMIP6 naming
ds_tp_mon = ds_tp_mon.rename({'latitude':'lat', 'longitude':'lon', 'tp':'pr'})
# Use the same kind of time axis as the CMIP6 data: mid-month dates on a 360-day calendar (1979-01 to 2018-12)
units = 'months since 1979-1-15 00:00:00'
time_360_ref = cftime.num2date(np.arange(0, (2018-1979+1)*12), units, '360_day')
ds_tp_mon = ds_tp_mon.assign_coords(time=time_360_ref)
# Convert lon from (-180, 180) to (0, 360): wrap with modulo, then sort so every value keeps its true longitude
ds_tp_mon = ds_tp_mon.assign_coords(lon=ds_tp_mon.lon % 360).sortby('lon')

In [4]:
ds_tp_mon
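
A common way to move longitudes from the (-180, 180) convention to (0, 360) is to wrap them with a modulo and then reorder the data, so that each value stays attached to its geographic location. The sketch below illustrates this on a tiny NumPy array; the longitudes and values are made up for illustration and are not taken from the ERA5 file. On an xarray dataset, the same idea can be expressed with `assign_coords` followed by `sortby('lon')`.

```python
import numpy as np

# Hypothetical longitudes on a [-180, 180) grid with 90-degree spacing.
lon = np.array([-180.0, -90.0, 0.0, 90.0])
# One made-up value per longitude so we can track where the data goes.
values = np.array([10.0, 20.0, 30.0, 40.0])

# Wrap into [0, 360): -180 -> 180, -90 -> 270, 0 -> 0, 90 -> 90.
lon_wrapped = lon % 360

# Reorder so longitudes increase monotonically, moving the data with them.
order = np.argsort(lon_wrapped)
lon_sorted = lon_wrapped[order]
values_sorted = values[order]

print(lon_sorted)     # longitudes now ascending in [0, 360)
print(values_sorted)  # each value is still paired with its original location
```

Sorting matters because interpolation routines generally expect a monotonically increasing coordinate.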

### Data interpolation -- remap ds_tp_mon to the same resolution as ds_cmip6

In [5]:
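# Interpolate the ERA5 data onto the CMIP6 latitude/longitude grid (linear interpolation by default)
new_lon = ds_cmip6.lon
new_lat = ds_cmip6.lat
ds_tp_mon_i = ds_tp_mon.interp(lat=new_lat, lon=new_lon)
ds_tp_mon_i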
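
To make the remapping step more concrete, here is a one-dimensional illustration of linear interpolation, which is the default method of `Dataset.interp`. It is only a sketch: the grids and the field are hypothetical, and `np.interp` stands in for the multi-dimensional interpolation that xarray performs.

```python
import numpy as np

# Hypothetical fine source grid (1-degree latitudes) and a made-up field on it.
src_lat = np.arange(-90.0, 91.0, 1.0)
src_val = 2.0 * src_lat + 5.0  # a linear field, so linear interpolation is exact

# Hypothetical coarser target grid whose points fall between source points.
dst_lat = np.array([-88.75, -0.5, 45.25, 88.75])

# np.interp does piecewise-linear interpolation along one dimension.
dst_val = np.interp(dst_lat, src_lat, src_val)

print(dst_val)
```

One difference worth keeping in mind: `np.interp` clamps target points that lie outside the source range to the edge values, whereas xarray's `interp` typically returns NaN there, so grid cells at the edge of the remapped dataset may be missing.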

### Calculate correlation

In [6]:
def get_corrcoef(inarray1, inarray2):
    # Pearson correlation coefficient between two 1-D time series
    return np.corrcoef(inarray1, inarray2)[0, 1]

# Use year 2000 as an example
cc = xr.apply_ufunc(
    get_corrcoef,                                        # first the function
    ds_cmip6['pr'].sel(time=slice('2000', '2000')),      # then the input data arrays
    ds_tp_mon_i['pr'].sel(time=slice('2000', '2000')),
    input_core_dims=[['time'], ['time']],                # correlate along the time dimension
    vectorize=True,                                      # apply the function to every grid cell
)
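
With `vectorize=True`, `get_corrcoef` is called once per grid cell, which can be slow on large grids. As a rough sketch of an alternative, the Pearson correlation can be computed for all cells at once with array arithmetic. The helper name `pearson_along_time` and the random test data are assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np

def pearson_along_time(a, b):
    """Pearson correlation along axis 0 for arrays shaped (time, lat, lon)."""
    a_anom = a - a.mean(axis=0)
    b_anom = b - b.mean(axis=0)
    cov = (a_anom * b_anom).sum(axis=0)
    norm = np.sqrt((a_anom ** 2).sum(axis=0) * (b_anom ** 2).sum(axis=0))
    return cov / norm  # NaN (with a warning) where either series is constant

# Made-up data: 12 "months" on a tiny 2 x 3 grid.
rng = np.random.default_rng(0)
model = rng.normal(size=(12, 2, 3))
obs = 0.5 * model + rng.normal(size=(12, 2, 3))

cc_fast = pearson_along_time(model, obs)

# Cross-check one grid cell against np.corrcoef, as used by get_corrcoef.
cc_ref = np.corrcoef(model[:, 0, 0], obs[:, 0, 0])[0, 1]
print(cc_fast.shape, np.isclose(cc_fast[0, 0], cc_ref))
```

Grid cells where one of the series has zero variance (for example, no precipitation in any month) produce a division by zero and therefore NaN. Recent xarray versions also provide `xr.corr(da_a, da_b, dim='time')`, which may be a simpler option if it is available in your environment.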



### We recommend using Dask for the correlation computation to improve performance

In [8]:
# If you run this notebook on your local computer or NCI's VDI instance, you can create a local cluster
from dask.distributed import Client
client = Client()
print(client)

<Client: 'tcp://127.0.0.1:40921' processes=8 threads=48, memory=161.06 GB>


In [None]:
# If you run this notebook on Gadi under the Pangeo environment, you can connect to the cluster using the scheduler.json file
from dask.distributed import Client
client = Client(scheduler_file='../scheduler.json')
print(client)

<div class="alert alert-info">
<b>Warning: Please make sure you specify the correct path to the scheduler.json file within your environment.</b>
</div>

Starting the Dask Client provides a dashboard that gives insight into the computation. The link to the dashboard becomes visible when you create the Client. We recommend keeping the dashboard open on one side of your screen and your notebook on the other, which is useful for learning purposes.

### Plotting

Please note that the following parts do not work within the Pangeo environment on Gadi, because the compute nodes have no external web access. However, you can run them on your local computer, VDI, or a Gadi login node if the Python environment is set up properly.

The following cell only works on your local computer and VDI. However, this does not prevent you from making an animation in the later part of this tutorial.

In [None]:
import cartopy.crs as ccrs
year = 2000
fig = plt.figure()
ax = plt.axes(projection=ccrs.PlateCarree())
ax.set_title(f'Correlation of Pr CMIP6 vs ERA5 {year}')
ax.coastlines()
colorlabel = np.linspace(-1, 1, 9)  # colorbar ticks every 0.25 from -1 to 1
cc.plot(ax=ax, vmin=-1, vmax=1, cmap='coolwarm', add_colorbar=True, add_labels=True,
        cbar_kwargs={'orientation':'horizontal', 'pad':0.05, 'shrink':1, 'label':'Correlation', 'ticks':colorlabel, 'spacing':'proportional'})


### Make animation

In [None]:
os.makedirs('images', exist_ok=True)  # make sure the output folder exists
for i in range(30):
    year = 1979 + i
    # Correlation over the 12 months of this year at every grid cell
    cc = xr.apply_ufunc(
        get_corrcoef,                                              # first the function
        ds_cmip6['pr'].sel(time=slice(str(year), str(year))),      # then the input data arrays
        ds_tp_mon_i['pr'].sel(time=slice(str(year), str(year))),
        input_core_dims=[['time'], ['time']],
        vectorize=True,
    )
    # Draw one frame per year on a fresh figure
    fig = plt.figure()
    ax = plt.axes(projection=ccrs.PlateCarree())
    ax.coastlines()
    cc.plot(ax=ax,
        vmin=-1, vmax=1,
        cmap='coolwarm',
        cbar_kwargs={
            'extend':'neither' # Don't extend the colorbar in either direction. Other possibilities
                               # would be 'both', 'min', or 'max'
        }
    )
    plt.title(f"Correlation of Pr CMIP6 vs ERA5 {year}")
    plt.savefig(f"images/Correlation_of_Pr_CMIP6_vs_ERA5_{year}.png")
    plt.close(fig)
    

In [14]:
import glob
from PIL import Image
img, *imgs = [Image.open(f) for f in sorted(glob.glob('images/Correlation_of_Pr_CMIP6_vs_ERA5_*.png'))]
img.save(fp='images/correlation.gif', format='GIF', append_images=imgs,
         save_all=True, duration=200, loop=0)

In [17]:
display(HTML("<img src='images/correlation.gif' />"))

### Summary
This notebook shows correlation maps between CMIP6 and ERA5 data, which indicate the areas where the model simulation is consistent with observations.

## Reference

https://vimeo.com/112794571