# XMHW tests on the OFAM3 dataset

Purpose
-------
    This notebook tests how well xmhw parallelises marine heatwave (MHW) analysis on a subset of temperature data from OFAM3, a global ocean simulation at 10th-degree resolution covering 1980-2100. The simulation runs from 1980 to 2006 under JRA55 atmospheric forcing; thereafter the reanalysis forcing is repeated with the RCP8.5 climate trend added.

    Contents:
        1. Load the temperature data and visualise it (2D in space, 1D in time)
        2. Select the region around Australia for the heatwave analysis and discard the rest
        3. Calculate the climatology required for the heatwave analysis and save it as a new netCDF file
            [ this will be read in later, in a new session, to perform the heatwave analysis ]
        4. Perform the heatwave analysis using xmhw by iterating over the subsetted grid

Thanks to John Reilly for sharing his [code](https://github.com/Thomas-Moore-Creative/shared_sandbox/blob/main/mhw-3d-scalingTests-gadiJup.ipynb)
    


Sandbox edits by Thomas Moore, 27 April 2024.

### imports

In [2]:
import sys
import os

### data handling
import numpy as np
import pandas as pd
import xarray as xr
import scipy as sci

### plotting
import matplotlib.pyplot as plt
from matplotlib import ticker
from matplotlib.gridspec import GridSpec
import matplotlib.colors as mcolors
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import cmocean.cm as cmo
from cmocean.tools import lighten

### marine heatwaves python package
from xmhw.xmhw import threshold, detect

# print versions of packages
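# sys.version[:5] truncates two-digit minor versions (giving e.g. "3.10."), so take the first whitespace-separated token instead
print("python version =", sys.version.split()[0])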
print("numpy version =", np.__version__)
print("pandas version =", pd.__version__)
print("xarray version =", xr.__version__)
print("scipy version =", sci.__version__)
print("matplotlib version =", sys.modules[plt.__package__].__version__)
print("cmocean version =", sys.modules[cmo.__package__].__version__)
print("cartopy version =", sys.modules[ccrs.__package__].__version__)

python version = 3.10.
numpy version = 1.26.4
pandas version = 2.2.1
xarray version = 2024.3.0
scipy version = 1.12.0
matplotlib version = 3.8.3
cmocean version = v3.0.3
cartopy version = 0.22.0


### import the dask client for assessing performance

In [3]:
from dask.distributed import Client
client = Client(threads_per_worker=1)
client

Client summary (condensed from the widget output):
Connection method: Cluster object
Cluster type: distributed.LocalCluster
Dashboard: /proxy/8787/status
Scheduler comm: tcp://127.0.0.1:33815
Status: running, Using processes: True
Workers: 28, Total threads: 28, Total memory: 251.18 GiB
Each worker: 1 thread, 8.97 GiB memory, local directory under /jobfs/114626156.gadi-pbs/dask-scratch-space/

## grab the historical temperature data from fp2

In [4]:
wrkdir = "/g/data/fp2/OFAM3" #not using os.chdir() 

### what is the native chunking?
```
(base) tm4888@gadi-login-04 /g/data/fp2/OFAM3/jra55_historical.1/surface du -hs ocean_temp_sfc_2011_12.nc
320M	ocean_temp_sfc_2011_12.nc

short temp(Time, st_ocean, yt_ocean, xt_ocean) ;
		temp:long_name = "Potential temperature" ;
		temp:units = "degrees C" ;
		temp:valid_range = -32767s, 32767s ;
		temp:missing_value = -32768s ;
		temp:_FillValue = -32768s ;
		temp:packing = 4 ;
		temp:scale_factor = 0.001678518f ;
		temp:add_offset = 45.f ;
		temp:cell_methods = "time: mean" ;
		temp:time_avg_info = "average_T1,average_T2,average_DT" ;
		temp:coordinates = "geolon_t geolat_t" ;
		temp:standard_name = "sea_water_potential_temperature" ;
```

  
#### `du -hs` reports only the file size, not the chunking. Is temp stored as a single chunk? The packed `short` (int16) data occupy 320 MB on disk, so the array should be about twice that size once decoded to float32.
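As a rough check on that expectation, the packing attributes in the header can be applied by hand. The sketch below uses only the values printed above (`scale_factor`, `add_offset`, `valid_range`) together with the monthly chunk shape (31, 1500, 3600) that xarray reports once the files are opened. The helper name `unpack` is hypothetical and is not part of any library.

```python
# Values copied from the file header above
SCALE_FACTOR = 0.001678518
ADD_OFFSET = 45.0
VALID_RANGE = (-32767, 32767)


def unpack(packed_value):
    """Decode a packed short: packed * scale_factor + add_offset."""
    return packed_value * SCALE_FACTOR + ADD_OFFSET


t_min = unpack(VALID_RANGE[0])
t_max = unpack(VALID_RANGE[1])
print(f"representable temperature range: {t_min:.2f} to {t_max:.2f} degrees C")

# A December file holds 31 daily fields on a 1500 x 3600 grid
n_values = 31 * 1500 * 3600
size_int16_mib = n_values * 2 / 2**20    # 2 bytes per packed short
size_float32_mib = n_values * 4 / 2**20  # 4 bytes per decoded float32
print(f"packed int16: {size_int16_mib:.2f} MiB, decoded float32: {size_float32_mib:.2f} MiB")
```

The packed size is close to the 320M that `du -hs` reports, which suggests the variable is stored without compression, and the decoded size is the per-file chunk size that dask reports when the dataset is opened.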

In [5]:
# preprocessor to drop unwanted variables
def drop_stuff(ds, coords_to_drop, vars_to_drop):
    """
    Preprocessor function to drop specified coordinates and variables from a dataset loaded via xr.open_mfdataset

    Parameters:
        ds (xarray.Dataset): The dataset from which coordinates & variables are to be dropped.
        coords_to_drop (list of str): List of coordinate names to drop.
        vars_to_drop (list of str): List of variable names to drop.

    Returns:
        xarray.Dataset: Dataset with specified coordinates and variables dropped.
    """
    # Drop coordinates and variables; names not present in the dataset are ignored
    ds = ds.drop_vars(coords_to_drop, errors='ignore')
    ds = ds.drop_vars(vars_to_drop, errors='ignore')
    return ds
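A small illustration of how such a preprocessor behaves, using a tiny synthetic dataset rather than the OFAM3 files. The shapes and values are made up, and `functools.partial` is used here only as one possible alternative to a `lambda` for binding the two lists; the resulting callable could be passed as `preprocess=` to `xr.open_mfdataset`.

```python
from functools import partial

import numpy as np
import xarray as xr


def drop_stuff(ds, coords_to_drop, vars_to_drop):
    """Drop the named coordinates and variables, ignoring any that are absent."""
    ds = ds.drop_vars(coords_to_drop, errors="ignore")
    ds = ds.drop_vars(vars_to_drop, errors="ignore")
    return ds


# Tiny synthetic stand-in for one file (not real OFAM3 data)
ds = xr.Dataset(
    data_vars={
        "temp": (("Time", "yt_ocean", "xt_ocean"), np.zeros((2, 3, 4), dtype="float32")),
        "average_DT": (("Time",), np.ones(2)),
    },
    coords={
        "Time": ("Time", [0, 1]),
        "st_edges_ocean": ("st_edges_ocean", [0.0, 5.0]),
    },
)

# Bind the lists once so the result is a one-argument callable
preprocess = partial(
    drop_stuff,
    coords_to_drop=["st_edges_ocean", "nv"],      # "nv" is absent here and is skipped
    vars_to_drop=["Time_bounds", "average_DT"],   # "Time_bounds" is absent as well
)

cleaned = preprocess(ds)
print(list(cleaned.data_vars), list(cleaned.coords))
```

Because of `errors="ignore"`, the same lists can be reused across files that do not all carry every auxiliary variable.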

In [6]:
%%time
coords_to_drop = ['st_edges_ocean', 'nv']
vars_to_drop = ['Time_bounds', 'average_DT', 'average_T1', 'average_T2']
sst = xr.open_mfdataset(
    wrkdir + "/jra55_historical.1/surface/ocean_temp_sfc_*.nc",
    parallel=True,
    preprocess=lambda x: drop_stuff(x, coords_to_drop, vars_to_drop),
).squeeze()  # combine='by_coords' is default
sst

CPU times: user 4.8 s, sys: 600 ms, total: 5.4 s
Wall time: 7.38 s


| | Array | Chunk |
|---|---|---|
| Bytes | 264.51 GiB | 638.58 MiB |
| Shape | (13149, 1500, 3600) | (31, 1500, 3600) |
| Dask graph | 432 chunks in 866 graph layers | |
| Data type | float32 numpy.ndarray | |


## iterate around the Australian continent and compute the heatwaves

### calculate the climatology
    which we will use later for calculating the marine heatwaves in a subsequent step

In [7]:
%%time

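# NOTE: the (450, 650) grid in the output below implies that the regional subset
# (Contents, step 2) had already been applied to sst; that step is not shown in the cells above.
sst['doy'] = sst['time'].dt.dayofyear
# one chunk along time and 10 x 10 tiles in space, so each grid cell's full time series sits in a single chunk
sst = sst.chunk({"time":-1, "yt_ocean":10, "xt_ocean":10})
sst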


CPU times: user 2.49 s, sys: 66.9 ms, total: 2.56 s
Wall time: 2.18 s


| | Array | Chunk |
|---|---|---|
| Bytes | 14.33 GiB | 5.02 MiB |
| Shape | (13149, 450, 650) | (13149, 10, 10) |
| Dask graph | 2925 chunks in 1 graph layer | |
| Data type | float32 numpy.ndarray | |
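The chunk count and sizes in this output follow directly from the array and chunk shapes. A minimal sketch of the arithmetic, assuming 4-byte float32 values and using only the shapes reported above:

```python
import math

shape = {"time": 13149, "yt_ocean": 450, "xt_ocean": 650}  # array shape from the output above
chunks = {"time": 13149, "yt_ocean": 10, "xt_ocean": 10}   # chunk shape after rechunking
itemsize = 4                                               # bytes per float32 value

# Number of chunks is the product of the per-dimension chunk counts
n_chunks = math.prod(math.ceil(shape[d] / chunks[d]) for d in shape)
chunk_mib = math.prod(chunks.values()) * itemsize / 2**20
total_gib = math.prod(shape.values()) * itemsize / 2**30

print(f"{n_chunks} chunks of {chunk_mib:.2f} MiB, {total_gib:.2f} GiB in total")
```

Each chunk holds the complete time series for 100 grid cells, which suits a per-cell calculation over time, at the cost of a few thousand small chunks for the scheduler to track.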


### calculate the daily climatology and 90th percentile threshold to define a MHW

In [8]:
%%time

# tile size (number of grid cells) in x and y
di = 50
dj = 50

print("Calculating the climatology and threshold")
seas_list = []
thresh_list = []
for ii in np.arange(0, len(sst.coords['xt_ocean']), di):
    print(ii)
    for jj in np.arange(0, len(sst.coords['yt_ocean']), dj):
        # work on one di x dj tile at a time to keep memory use bounded
        tmp = sst.isel(xt_ocean=slice(ii, ii+di), yt_ocean=slice(jj, jj+dj))
        # day-of-year mean (seasonal climatology) and 90th percentile (MHW threshold)
        seas_list.append(tmp.groupby('doy').mean(dim='time').compute())
        thresh_list.append(tmp.groupby('doy').quantile(0.9, dim='time', skipna=True).compute())

### merge the lists into single xarrays with the results
print("Merging results")
seas_new = xr.merge(seas_list)
thresh_new = xr.merge(thresh_list)


Calculating the climatology and threshold
0
50
100
150
200
250
300
350
400
450
500
550
600
Merging results
CPU times: user 10min 51s, sys: 7min 56s, total: 18min 48s
Wall time: 17min 25s
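The tile-and-merge pattern in this cell can be checked on a small synthetic field. The sketch below is not the notebook's code: the data are random, the grid is 4 x 6, `doy` is attached as a coordinate rather than a data variable, and the tiles are stitched with `xr.combine_by_coords`, which is swapped in here as an alternative to `xr.merge`. It confirms that the tiled result matches a single pass over the whole grid.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small synthetic SST field: two years of daily data on a 4 x 6 grid (random values)
rng = np.random.default_rng(0)
time = pd.date_range("2001-01-01", "2002-12-31", freq="D")
ds = xr.Dataset(
    {"temp": (("time", "yt_ocean", "xt_ocean"), rng.normal(15.0, 2.0, (time.size, 4, 6)))},
    coords={"time": time, "yt_ocean": np.arange(4.0), "xt_ocean": np.arange(6.0)},
)
# Day of year as a coordinate along time, used as the grouping key
ds = ds.assign_coords(doy=("time", ds["time"].dt.dayofyear.data))

di, dj = 3, 2  # tile size in x and y (50 x 50 in the cell above)
seas_tiles, thresh_tiles = [], []
for ii in range(0, ds.sizes["xt_ocean"], di):
    for jj in range(0, ds.sizes["yt_ocean"], dj):
        tile = ds.isel(xt_ocean=slice(ii, ii + di), yt_ocean=slice(jj, jj + dj))
        grouped = tile.groupby("doy")
        seas_tiles.append(grouped.mean(dim="time"))
        thresh_tiles.append(grouped.quantile(0.9, dim="time", skipna=True))

# Stitch the tiles back together using their coordinate values
seas = xr.combine_by_coords(seas_tiles)
thresh = xr.combine_by_coords(thresh_tiles)

# The tiled result should match a single pass over the whole grid
seas_full = ds.groupby("doy").mean(dim="time")
thresh_full = ds.groupby("doy").quantile(0.9, dim="time", skipna=True)
xr.testing.assert_allclose(seas, seas_full)
xr.testing.assert_allclose(thresh, thresh_full)
print(dict(seas.sizes))
```

Because every grid cell is processed independently along time, the tile size affects only memory use and scheduling overhead, not the result.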


### smooth with a 31-day rolling mean (moving window) across day of year and trim the padded ends

In [9]:
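window = 31               # smoothing window in days
half = (window - 1) // 2  # 15 days of padding on each side

# wrap-pad so the window is periodic across the end of the year, then take a centred rolling mean over day of year
climatology = seas_new.pad(doy=half, mode='wrap').rolling(doy=window, center=True).mean()
threshold90 = thresh_new.pad(doy=half, mode='wrap').rolling(doy=window, center=True).mean(skipna=True)

# trim the padded days again so that only the original days of year remain
climatology = climatology.chunk({'doy':-1, 'yt_ocean':50, 'xt_ocean':50}).isel(doy=slice(half, -half))
threshold90 = threshold90.chunk({'doy':-1, 'yt_ocean':50, 'xt_ocean':50}).isel(doy=slice(half, -half)).drop_vars('quantile')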
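The pad, roll and trim sequence amounts to a circular moving average over day of year. A minimal NumPy sketch on a made-up 1D series (it does not use the notebook's data or xarray) compares the wrap-padded moving average against an explicit circular mean:

```python
import numpy as np

window = 31
half = (window - 1) // 2  # 15 days either side

# Synthetic day-of-year series with 366 values (the numbers are made up)
doy = np.arange(1, 367)
noise = np.random.default_rng(1).normal(0.0, 0.5, doy.size)
series = 15.0 + 3.0 * np.cos(2 * np.pi * doy / 366) + noise

# Pad each end with values from the opposite end, then take a centred moving average.
# mode="valid" keeps only windows lying fully inside the padded series,
# which plays the role of trimming the first and last 15 entries.
padded = np.pad(series, half, mode="wrap")
smoothed = np.convolve(padded, np.ones(window) / window, mode="valid")

# Reference: explicit circular mean over days d-15 ... d+15
reference = np.mean([np.roll(series, shift) for shift in range(-half, half + 1)], axis=0)

print(smoothed.shape, np.allclose(smoothed, reference))
```

Wrapping means that early January is smoothed with late December values instead of being lost to edge effects.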



In [10]:
print("Size (MB) of daily climatology = %i"%(climatology.nbytes/1e6))
print("Size (MB) of daily threshold90 = %i"%(threshold90.nbytes/1e6))

Size (MB) of daily climatology = 428
Size (MB) of daily threshold90 = 856


### save to disk

In [11]:
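%%time
# NOTE: this directory must be writable by the current user; the PermissionError below shows that it was not
outdir = "/g/data/es60/pjb581/heatwaves"

print("Saving climatology and threshold to disk")
climatology.to_netcdf(os.path.join(outdir, 'Australian_SST_daily_climatology.nc'), mode='w')
threshold90.to_netcdf(os.path.join(outdir, 'Australian_SST_daily_MHWthreshold.nc'), mode='w')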


Saving climatology and threshold to disk


PermissionError: [Errno 13] Permission denied: b'/g/data/es60/pjb581/heatwaves/Australian_SST_daily_climatology.nc'
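The error indicates that the current user cannot write to that project directory. One way to fail earlier, before a long computation, is to check writability up front. The sketch below is an assumption about how that could be done, not part of the original workflow: the helper `first_writable_dir` and the fallback to the system temporary directory are both hypothetical choices.

```python
import os
import tempfile


def first_writable_dir(candidates):
    """Return the first existing directory in `candidates` that the current user can write to."""
    for path in candidates:
        if os.path.isdir(path) and os.access(path, os.W_OK):
            return path
    raise PermissionError(f"none of these directories is writable: {candidates}")


# Hypothetical candidate list: the project directory from the cell above,
# then a scratch location that exists on any machine.
candidates = ["/g/data/es60/pjb581/heatwaves", tempfile.gettempdir()]
outdir = first_writable_dir(candidates)
print("writing output to", outdir)

# The files would then be written with, for example:
# climatology.to_netcdf(os.path.join(outdir, "Australian_SST_daily_climatology.nc"), mode="w")
```

`os.access` is only a pre-flight check; the write itself can still fail (for example through quota limits), so it does not replace handling the exception.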