# Calculating mean climatology and threshold of daily tmax for Australia (based on Ritwik Misra's MHW code)

This notebook contains the code required to calculate the mean climatology and the 90th-percentile threshold of daily maximum temperature (tmax).

## Import modules and define functions 

Here we import the required modules and initialize a dask cluster for parallel computing.

In [1]:
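import dask as da
import dask.array  # make sure the dask.array submodule is loaded; da.array is used for the threshold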
from dask.distributed import LocalCluster, Client
from datetime import date
import glob
import numpy as np
import time
import xarray as xr
%pylab inline
local_dir = "/g/data/e14/cp3790/dask-workers" #Replace this with your local directory 
cluster = LocalCluster(processes=False, local_dir=local_dir)
client = Client(cluster)
client

Populating the interactive namespace from numpy and matplotlib




Client: Scheduler: inproc://10.0.64.15/16484/1, Dashboard: http://10.0.64.15/16484/1:8787/status
Cluster: Workers: 1, Cores: 8, Memory: 33.67 GB


Function created by Guillaume Serazin to reshape the data: the single `time` dimension is split into a `dayofyear` dimension and a `year` dimension.

In [2]:
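def reshape_data(da):
    """Reshape a daily DataArray from (time, ...) to (dayofyear, year, ...)."""
    dayofyear = []
    da_dayofyear = []
    # Process each day-of-year group (1..366) separately
    for doy, da_tmp in da.groupby('time.dayofyear'):
        dayofyear.append(doy)
        # Relabel the time axis by year so the groups can be aligned on 'year'
        da_tmp['time'] = da_tmp['time.year']
        da_tmp = da_tmp.rename({'time': 'year'})
        da_tmp = da_tmp.assign_coords(dayofyear=doy)
        da_dayofyear.append(da_tmp)
    # Join the per-day slices along a new 'dayofyear' dimension
    da_reshaped = xr.concat(da_dayofyear, dim='dayofyear')
    return da_reshaped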
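
To see what the reshaped layout looks like, here is a minimal sketch that uses plain NumPy and the standard library rather than xarray. The helper name `to_dayofyear_year` and the synthetic series are hypothetical; the sketch only illustrates the target (dayofyear, year) grid, in which day 366 exists in leap years and is NaN elsewhere.

```python
from datetime import date, timedelta

import numpy as np


def to_dayofyear_year(dates, values):
    """Arrange a daily series on a (dayofyear, year) grid, NaN where no data exists."""
    years = sorted({d.year for d in dates})
    grid = np.full((366, len(years)), np.nan)
    for d, v in zip(dates, values):
        doy = d.timetuple().tm_yday  # day of year, 1..366
        grid[doy - 1, years.index(d.year)] = v
    return years, grid


# Synthetic daily series covering 2011 (365 days) and 2012 (leap year, 366 days)
first_day = date(2011, 1, 1)
dates = [first_day + timedelta(days=i) for i in range(365 + 366)]
values = np.arange(len(dates), dtype=float)

years, grid = to_dayofyear_year(dates, values)
print(grid.shape)          # (366, 2)
print(np.isnan(grid[-1]))  # [ True False]: day 366 is missing in 2011, present in 2012
```

The NaN in the last row is why the later percentile step uses a NaN-aware function.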

## Opening files and loading data

In [5]:
files = sorted(glob.glob('/g/data/e14/cp3790/Charuni/ERA5-new/era5_dailytmax_*.nc'))

obs_aus = (xr.open_mfdataset(files, combine='nested', concat_dim='time', chunks={'latitude': 10})
           .sel(time=slice('1983', '2012'), longitude=slice(113, 154), latitude=slice(-10, -44)))
#baseline period for my calculation is 1983-2012
baseline_tmax = obs_aus["dmax"]
baseline_tmax.attrs['units'] = 'deg C'

## Reshaping and smoothing the data

We prepare the data for the rolling mean by reshaping it with the `reshape_data` function (created by Guillaume Serazin), so that the day of year and the year become separate dimensions.

In [7]:
reshaped_tmax = reshape_data(baseline_tmax)

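The data are 'circularized' by padding each end of the year with days from the opposite end, so that rolling-window operations have complete windows for the first and last days of the year.

In my analysis, I have retained the leap days, but if you wanted to drop them you could use `reshaped_tmax = reshaped_tmax.isel(dayofyear = slice(0,-1))` before the next cell.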

In [8]:
start = reshaped_tmax[:31] # the first 31 days
start['dayofyear'] = range(367, 398) # relabelled 367-397 so they are 'stitched' after day 366
end = reshaped_tmax[-31:] # the last 31 days
end['dayofyear'] = range(-30, 1) # relabelled -30 to 0 so they are 'stitched' before day 1
circular_tmax = xr.concat([end, reshaped_tmax, start], dim = 'dayofyear').chunk({'dayofyear' : 31})
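
The stitching above amounts to periodic ('wrap') padding along the day-of-year axis. As a minimal NumPy sketch, with a made-up one-dimensional series standing in for the real data and a hypothetical helper `circularize`, concatenating the last and first 31 days by hand gives the same result as `np.pad` with `mode='wrap'`.

```python
import numpy as np

PAD = 31  # number of days borrowed from each end of the year


def circularize(x, pad=PAD):
    """Prepend the last `pad` values and append the first `pad` values along axis 0."""
    return np.concatenate([x[-pad:], x, x[:pad]], axis=0)


doy_values = np.arange(1, 367, dtype=float)  # stand-in: one value per day of year
circular = circularize(doy_values)

print(circular.shape)  # (428,)
print(circular[:3])    # [336. 337. 338.]
print(circular[-3:])   # [29. 30. 31.]
print(np.array_equal(circular, np.pad(doy_values, PAD, mode="wrap")))  # True
```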

## Calculating mean climatology 

The multi-year mean for each day is smoothed twice: first with a centred 15-day rolling window, and then with a centred 31-day rolling window.

In [9]:
raw_tmax = circular_tmax.mean('year')
tmax_climatology_smooth = raw_tmax.rolling(dayofyear = 15, center = True).mean() 
tmax_climatology_smoother = tmax_climatology_smooth.rolling(dayofyear = 31, center = True).mean() 

In [10]:
tmax_climatology = tmax_climatology_smoother.isel(dayofyear = slice(31,-31)) # drop the first and last 31 days 
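
A centred 15-day window reaches 7 days to either side and a centred 31-day window reaches 15, so the two passes together need 7 + 15 = 22 days of padding per side. The 31 padded days are therefore enough for the trimmed result to be free of edge effects. The sketch below checks this with plain NumPy on a synthetic seasonal cycle; the helpers `rolling_mean_valid` and `circular_mean` are hypothetical, and `np.convolve` stands in for xarray's `rolling(...).mean()`.

```python
import numpy as np


def rolling_mean_valid(x, window):
    """Centred rolling mean that keeps only positions with a complete window."""
    return np.convolve(x, np.ones(window) / window, mode="valid")


def circular_mean(x, window):
    """Reference: centred rolling mean with explicit wrap-around indexing."""
    half = window // 2
    n = len(x)
    return np.array([x[np.arange(i - half, i + half + 1) % n].mean() for i in range(n)])


n_days, pad = 366, 31
days = np.arange(n_days)
raw = 25.0 + 8.0 * np.cos(2 * np.pi * days / n_days)      # synthetic seasonal cycle
circular = np.concatenate([raw[-pad:], raw, raw[:pad]])    # wrap-pad by 31 days

smooth = rolling_mean_valid(circular, 15)   # loses 7 points at each end
smoother = rolling_mean_valid(smooth, 31)   # loses a further 15 points at each end
lost = 15 // 2 + 31 // 2                    # 22 points lost per side in total
climatology = smoother[pad - lost: pad - lost + n_days]

reference = circular_mean(circular_mean(raw, 15), 31)
print(climatology.shape)                    # (366,)
print(np.allclose(climatology, reference))  # True
```

Because the padded result agrees with the wrap-around reference on all 366 days, trimming 31 days from each end removes every value affected by the array edges.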

Create a netCDF file to store the mean climatology values

In [20]:
xr.Dataset({'climatology': tmax_climatology}).to_netcdf(
    '/g/data/e14/cp3790/Charuni/climatology-australia.nc',
    encoding={'climatology': {
        'chunksizes': (100, tmax_climatology.shape[1], tmax_climatology.shape[2]),
        'zlib': True,
        'shuffle': True,
        'complevel': 2}})

## Calculating threshold

Unlike the mean climatology, where the rolling mean can be applied directly, the threshold is calculated with np.nanpercentile. We therefore first construct the rolling windows as an explicit dimension, pool the window days and the years together, and then take the 90th percentile of each pooled sample.

In [15]:
# Build a centred 15-day window around each day of year as a new 'rolling_days' dimension
percRolling = circular_tmax.rolling(dayofyear=15, center=True).construct('rolling_days')
# Pool the window days and the years into a single sample dimension 'z'
stacked = percRolling.stack(z = ('rolling_days', 'year'))
# Take the 90th percentile of each pooled sample, ignoring NaNs (da is the dask module here)
rawPerc_data = da.array.apply_along_axis(np.nanpercentile, stacked.get_axis_num('z'), stacked.data, 90)
# Rebuild a DataArray with the original coordinates and dimensions, minus 'year',
# to prepare the data for the final output
tmax_coords = circular_tmax.coords
new_coords = {name : tmax_coords[name] for name in tmax_coords if name != 'year'}
new_dims = [name for name in circular_tmax.dims if name != 'year']
rawPerc = xr.DataArray(rawPerc_data, coords = new_coords, dims = new_dims)
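
The pooling step can be illustrated on a small synthetic array. For every calendar day, the values from the surrounding 15-day window in all years are gathered into one sample, and the 90th percentile of that sample is taken with `np.nanpercentile`. This is a NumPy-only sketch for a single grid point, not the dask computation used above; the array sizes and random data are made up, and `sliding_window_view` assumes NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
n_days, n_years, window = 366, 30, 15
half = window // 2

# Synthetic (dayofyear, year) data for one grid point; day 366 is NaN in non-leap years
tmax = 25.0 + rng.normal(0.0, 3.0, size=(n_days, n_years))
tmax[-1, np.arange(n_years) % 4 != 0] = np.nan

# Wrap-pad along dayofyear so that every day has a complete 15-day window
padded = np.concatenate([tmax[-half:], tmax, tmax[:half]], axis=0)

# windows has shape (n_days, n_years, window); pool the year and window axes together
windows = sliding_window_view(padded, window, axis=0)
pooled = windows.reshape(n_days, n_years * window)

# 90th percentile of each pooled sample, ignoring the NaNs from missing leap days
threshold = np.nanpercentile(pooled, 90, axis=1)
print(threshold.shape)  # (366,)
```

Each pooled sample here holds up to 15 × 30 = 450 values, which is what makes a 90th percentile per calendar day reasonably stable.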

The raw percentiles are then smoothed with a final centred 31-day rolling mean, and the 31 padding days at each end are dropped.

In [17]:
tmax_threshold = rawPerc.rolling(dayofyear=31, center = True).mean()
print("Data smoothed, DONE.")
tmax_threshold = tmax_threshold.isel(dayofyear = slice(31,-31))
print("First and last 31 days sliced")

Data smoothed, DONE.
First and last 31 days sliced


Create a netCDF file to store the threshold values

In [24]:
xr.Dataset({'threshold': tmax_threshold}).to_netcdf(
    '/g/data/e14/cp3790/Charuni/threshold-australia.nc',
    encoding={'threshold': {
        'chunksizes': (100, tmax_threshold.shape[1], tmax_threshold.shape[2]),
        'zlib': True,
        'shuffle': True,
        'complevel': 2}})