# Derived indices for model predictors

In addition to the base variables processed in `2_hourly_to_daily.ipynb` (https://are.nci.org.au/node/gadi-cpu-bdw-0054.gadi.nci.org.au/45142/lab/tree/g/data/w42/dr6273/work/demand_model/2_hourly_to_daily.ipynb), here we compute further predictors derived from those variables.

What climate metrics might affect energy demand?

Bloomfield et al. (2019) https://rmets.onlinelibrary.wiley.com/doi/10.1002/met.1858 :
- Heating Degree Days (based on daily mean T) - Done
- Cooling Degree Days (based on daily mean T) - Done

van der Wiel et al. (2019) https://www.sciencedirect.com/science/article/pii/S1364032119302862#sec3 :
- Daily mean temperature (two linear regimes combined) - Done

Kang & Reiner (2022) https://www.sciencedirect.com/science/article/pii/S014098832200189X :
- This study is from Ireland
- Number of sun hours (radiation as proxy) - Done
- Wind speed - Done
- Humidity - Done
- Rainfall - Done

Me spitballing:
- Maximum daily T - Done
- Minimum daily T - Done
- Humidity (daily average and/or overnight?) - Done (daily average only)
- Heatwaves (e.g. EHF or even just 3-day T?)

Also useful read: https://www.energycouncil.com.au/media/mejc2mfz/extreme-weather-and-electricity-supply.pdf
- Heatwaves in multiple states (might not affect demand, only the ability to supply)
- Demand peaks on the 3rd and 4th days of a heatwave (so 4-day T?)
- Cloud cover (affects demand on the grid by reducing rooftop solar's ability to smooth the peaks; radiation as proxy) - Done

Could also consider:
- weighting the selected metrics by population (we eventually want state and national figures)
- Detrending everything, including demand.
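Both ideas are short to express. Below is a minimal NumPy sketch, assuming a hypothetical `pop` array of population counts per grid cell on the same grid as the field (no population data are loaded in this notebook), and assuming a plain least-squares linear trend is adequate, which may not hold for demand.

```python
import numpy as np

def population_weighted_mean(field, pop):
    """Population-weighted mean over the last two (lat, lon) axes.

    field: array (..., lat, lon), e.g. daily CDD
    pop:   array (lat, lon) of population counts per cell (hypothetical input)
    Cells where `field` is NaN are dropped from numerator and denominator.
    """
    field = np.asarray(field, dtype=float)
    w = np.broadcast_to(np.asarray(pop, dtype=float), field.shape)
    valid = ~np.isnan(field)
    num = np.where(valid, field * w, 0.0).sum(axis=(-2, -1))
    den = np.where(valid, w, 0.0).sum(axis=(-2, -1))
    return num / den

def linear_detrend(y):
    """Remove a least-squares linear trend from a 1-D series (e.g. annual demand)."""
    y = np.asarray(y, dtype=float)
    t = np.arange(y.size)
    slope, intercept = np.polyfit(t, y, 1)
    return y - (slope * t + intercept)
```

Weighting by counts rather than densities means grid-cell area is already folded into the weights; with population density you would also need an area weight.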

In [1]:
from dask.distributed import Client, LocalCluster
from dask_jobqueue import PBSCluster

# Optional dependency: xarray can use bottleneck to speed up rolling-window reductions
import bottleneck

In [2]:
# client.close()
# cluster.close()

In [16]:
# One node on Gadi has 48 cores - try to use up a full node before going to multiple nodes (jobs)

walltime = "00:30:00"
cores = 48
memory = str(4 * cores) + "GB"  # 4 GB per core -> "192GB" for a full node

# processes=cores gives one single-threaded Dask worker per core
cluster = PBSCluster(walltime=walltime, cores=cores, memory=memory, processes=cores,
                     job_extra_directives=["-q normal",
                                           "-P w42",
                                           "-l ncpus=" + str(cores),
                                           "-l mem=" + memory,
                                           "-l storage=gdata/w42+gdata/rt52"],
                     local_directory="$TMPDIR",
                     job_directives_skip=["select"])
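In dask-jobqueue, `cores` and `memory` are totals for one job, and `processes` splits that job into that many Dask worker processes with equal shares. The helper below is a hypothetical name, not part of dask-jobqueue; it just does that division for this configuration.

```python
def per_worker_resources(cores, memory_gb, processes):
    """Threads and memory (GB) per Dask worker when one job of `cores` cores and
    `memory_gb` GB is split into `processes` worker processes (hypothetical helper)."""
    threads_per_worker = cores // processes
    memory_per_worker_gb = memory_gb / processes
    return threads_per_worker, memory_per_worker_gb

threads, mem = per_worker_resources(cores=48, memory_gb=4 * 48, processes=48)
print(threads, mem)  # 1 4.0
```

With `processes=cores`, each worker is single-threaded with about 4 GB, so a single task needing more than roughly 4 GB may push its worker over the memory limit; fewer processes per job trades worker count for more memory per worker.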

In [17]:
cluster.scale(jobs=1)
client = Client(cluster)

In [18]:
client


In [3]:
import xarray as xr
import numpy as np

In [4]:
era_path = "/g/data/w42/dr6273/work/data/era5/"

In [5]:
years = range(1959, 2023)

# Cooling degree day

The amount by which the daily average temperature exceeds a comfort-level temperature of 24 °C; zero on days when the daily average is at or below 24 °C.

http://www.bom.gov.au/climate/maps/averages/degree-days/#:~:text=The%20heating%20degree%20days%20or,24%20degrees%20Celsius%20for%20cooling.
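As a formula, with $\bar{T}_d$ the daily average temperature (°C) on day $d$:

$$\mathrm{CDD}_d = \max\left(\bar{T}_d - 24,\ 0\right)$$

For example, a day averaging 27.5 °C contributes 3.5 cooling degree days, while a day averaging 22 °C contributes 0.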

In [19]:
def calc_cdd(T, comfort=24):
    """
    Cooling Degree Day.
    
    T: array of daily average temperature in degrees Celsius.
    """
    return (T - comfort).where(T > comfort, 0)
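One subtlety of `.where(T > comfort, 0)`: a missing temperature (NaN) fails the comparison, so it gets a CDD of 0 rather than NaN. ERA5 2 m temperature has no gaps, so this should not matter here, but it would if the input were ever masked (e.g. to land only). The NumPy sketch below mirrors the same keep-or-replace logic with `np.where` and contrasts it with `np.maximum`, which propagates NaN; both function names are illustrative.

```python
import numpy as np

def cdd_where(T, comfort=24.0):
    # Same logic as calc_cdd: keep T - comfort where T > comfort, else 0.
    # NaN > comfort is False, so missing days become 0.
    T = np.asarray(T, dtype=float)
    return np.where(T > comfort, T - comfort, 0.0)

def cdd_nan_aware(T, comfort=24.0):
    # np.maximum propagates NaN, so missing days stay missing.
    T = np.asarray(T, dtype=float)
    return np.maximum(T - comfort, 0.0)

T_demo = np.array([20.0, 24.0, 30.0, np.nan])
print(cdd_where(T_demo).tolist())      # [0.0, 0.0, 6.0, 0.0]
print(cdd_nan_aware(T_demo).tolist())  # [0.0, 0.0, 6.0, nan]
```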

In [53]:
# Chunk on open: the full daily record is about 90 GiB, far too large to load into memory at once
T = xr.open_mfdataset(era_path+"2t/daily/*.nc", chunks={"time": "200MB"})

In [54]:
# Convert from Kelvin to degrees Celsius
T = T - 273.15

In [55]:
# Dataset size in GiB
T.nbytes / 1024 ** 3

90.41259867325425
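As a rough check on that number, and on how much a 200 MB time chunk holds, assume the standard 0.25° ERA5 grid of 721 latitudes × 1440 longitudes stored as float32 (4 bytes per value). The grid and dtype are assumptions here, not read from the files.

```python
# Assumed: 0.25-degree global grid (721 x 1440), float32 values, one field per day
n_lat, n_lon, bytes_per_value = 721, 1440, 4

def is_leap(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

n_days = sum(366 if is_leap(y) else 365 for y in range(1959, 2023))
bytes_per_day = n_lat * n_lon * bytes_per_value
data_gib = n_days * bytes_per_day / 1024 ** 3

# Days per chunk if each chunk spans the full spatial grid
# (dask reads "200MB" as 200 * 10**6 bytes)
days_per_chunk = 200 * 10 ** 6 // bytes_per_day

print(n_days, bytes_per_day, round(data_gib, 3), days_per_chunk)  # 23376 4152960 90.412 48
```

The data alone come to about 90.412 GiB; the printed 90.4126 is slightly larger, plausibly because `Dataset.nbytes` also counts the coordinate arrays.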

In [20]:
cdd = calc_cdd(T)

In [21]:
cdd = cdd.rename({"t2m": "cdd"})

In [22]:
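# Rechunk so chunks are uniform along time: zarr needs a regular chunk grid, and
# open_mfdataset typically leaves a smaller chunk at the end of each input file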
cdd = cdd.chunk({"time": "200MB"})

In [24]:
cdd.to_zarr(
    era_path + "derived/cdd_24_era5_daily_1959-2022.zarr",
    mode="w",
    consolidated=True
)

<xarray.backends.zarr.ZarrStore at 0x14ca5c219540>

# Heating degree day

The amount by which the daily average temperature falls below a comfort-level temperature of 18 °C; zero on days when the daily average is at or above 18 °C.

http://www.bom.gov.au/climate/maps/averages/degree-days/#:~:text=The%20heating%20degree%20days%20or,24%20degrees%20Celsius%20for%20cooling.
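In symbols, with $\bar{T}_d$ the daily average temperature (°C) on day $d$:

$$\mathrm{HDD}_d = \max\left(18 - \bar{T}_d,\ 0\right)$$

For example, a day averaging 12 °C contributes 6 heating degree days, while a day averaging 20 °C contributes 0.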

In [56]:
def calc_hdd(T, comfort=18):
    """
    Heating Degree Day.
    
    T: array of daily average temperature in degrees Celsius.
    """
    return (comfort - T).where(T < comfort, 0)
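Because the heating threshold (18 °C) sits below the cooling threshold (24 °C), any day has at most one of HDD and CDD non-zero, and days averaging between 18 °C and 24 °C contribute to neither. A quick NumPy check on synthetic temperatures, using illustrative stand-ins that re-implement `calc_hdd` and `calc_cdd` with `np.where` so it runs without xarray:

```python
import numpy as np

# NumPy stand-ins for calc_hdd / calc_cdd (same keep-or-zero logic)
def hdd_np(T, comfort=18.0):
    T = np.asarray(T, dtype=float)
    return np.where(T < comfort, comfort - T, 0.0)

def cdd_np(T, comfort=24.0):
    T = np.asarray(T, dtype=float)
    return np.where(T > comfort, T - comfort, 0.0)

T_demo = np.linspace(-5.0, 40.0, 181)  # synthetic daily means, 0.25 degC apart
h, c = hdd_np(T_demo), cdd_np(T_demo)

never_both = bool(np.all((h == 0) | (c == 0)))
dead_band = (T_demo >= 18.0) & (T_demo <= 24.0)
neither_in_band = bool(np.all(h[dead_band] == 0) and np.all(c[dead_band] == 0))
print(never_both, neither_in_band)  # True True
```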

In [57]:
hdd = calc_hdd(T)

In [58]:
hdd = hdd.rename({"t2m": "hdd"})

In [59]:
# Rechunk to uniform chunk sizes along time, as required for writing to zarr
hdd = hdd.chunk({"time": "200MB"})

In [60]:
hdd.to_zarr(
    era_path + "derived/hdd_18_era5_daily_1959-2022.zarr",
    mode="w",
    consolidated=True
)

<xarray.backends.zarr.ZarrStore at 0x152743563ed0>

# 3- and 4-day rolling CDD and HDD

### AUS ONLY!

I can't figure out how to apply `rolling` operations on global data without killing workers, so let's do it by region for now.

In [73]:
def region_roll_and_write(region_coords, da_name, k):
    """
    Compute the k-day rolling mean of a regional subset of a derived dataset
    and write it to a zarr store.

    region_coords: dict with 'name': str, 'latitude': slice and 'longitude': slice
    da_name: str, first part of the zarr store name to read,
             i.e. <da_name>_era5_daily_1959-2022.zarr
    k: int, window length (in days) for rolling
    """
    ds = xr.open_zarr(
        era_path + "derived/" + da_name + "_era5_daily_1959-2022.zarr",
        consolidated=True
    )

    ds = ds.sel({
        "longitude": region_coords["longitude"],
        "latitude": region_coords["latitude"]
    })
    # Trailing window (xarray default center=False): day t averages days t-k+1..t,
    # and the first k-1 days are NaN because min_periods defaults to k
    ds_roll = ds.rolling(time=k).mean()
    ds_roll = ds_roll.chunk({"time": "200MB"})
    ds_roll.to_zarr(
        era_path + "derived/"+da_name+"_"+region_coords["name"]+"_rollmean"+str(k)+"_era5_daily_1959-2022.zarr",
        mode="w",
        consolidated=True
    )
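`ds.rolling(time=k).mean()` uses xarray's defaults: a trailing window and `min_periods` equal to the window length, so the value labelled day t averages days t-k+1 to t, and the first k-1 days are NaN. For a 4-day window, the 4th day of a hot spell is the first whose mean covers only hot days. A NumPy sketch of the same trailing mean, with an illustrative function name, for NaN-free input:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def trailing_rolling_mean(x, k):
    """Trailing k-point mean along the last axis, NaN for the first k-1 points.

    Intended to match xarray's rolling(time=k).mean() (center=False,
    min_periods=k) for NaN-free input.
    """
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    windows = sliding_window_view(x, k, axis=-1)  # shape (..., n - k + 1, k)
    out[..., k - 1:] = windows.mean(axis=-1)
    return out

print(trailing_rolling_mean([1, 2, 3, 4, 5], 3).tolist())  # [nan, nan, 2.0, 3.0, 4.0]
```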

In [74]:
aus_coords = {
    "name": "Aus",
    "longitude": slice(110, 155),
    # ERA5 latitudes run north to south, so the slice also goes north to south
    "latitude": slice(-10, -45)
}
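For a sense of scale, the sketch below counts grid points in this box, assuming the 0.25° ERA5 grid (721 × 1440 points, not checked against the files; the longitude convention does not change the count). The Australian box is only a few percent of the global field, which plausibly explains why the regional rolling runs where the global one did not.

```python
import numpy as np

# Assumed ERA5 0.25-degree grid: latitudes descending 90 -> -90, longitudes 0 -> 359.75
lat = np.arange(90.0, -90.25, -0.25)
lon = np.arange(0.0, 360.0, 0.25)

# Inclusive bounds, like sel(latitude=slice(-10, -45), longitude=slice(110, 155))
in_lat = (lat <= -10) & (lat >= -45)
in_lon = (lon >= 110) & (lon <= 155)

n_aus = int(in_lat.sum()) * int(in_lon.sum())
n_global = lat.size * lon.size
print(int(in_lat.sum()), int(in_lon.sum()), n_aus, round(n_aus / n_global, 4))  # 141 181 25521 0.0246
```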

In [75]:
for name in ["hdd_18", "cdd_24"]:
    for k in [3, 4]:
        region_roll_and_write(aus_coords, name, k)