# Predictors for the model

What climate metrics might affect energy demand?

Bloomfield et al (2019) https://rmets.onlinelibrary.wiley.com/doi/10.1002/met.1858 :
- Heating Degree Days (based on daily mean T)
- Cooling Degree Days (based on daily mean T)
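
A rough sketch of how the two degree-day metrics could be computed from daily mean T. The base temperatures (18 and 24 degC) are placeholder assumptions, not values taken from Bloomfield et al. (2019), and `degree_days` is a hypothetical helper name.

```python
import numpy as np

def degree_days(t_mean, hdd_base=18.0, cdd_base=24.0):
    """Return (HDD, CDD) per day from daily mean temperature in degC.

    hdd_base and cdd_base are assumed thresholds, for illustration only.
    """
    t_mean = np.asarray(t_mean, dtype=float)
    hdd = np.maximum(hdd_base - t_mean, 0.0)  # heating need on days colder than the base
    cdd = np.maximum(t_mean - cdd_base, 0.0)  # cooling need on days warmer than the base
    return hdd, cdd

hdd, cdd = degree_days([10.0, 18.0, 21.0, 30.0])
```

Days between the two bases contribute to neither metric, which is the main difference from using daily mean T directly.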

van der Wiel et al (2019) https://www.sciencedirect.com/science/article/pii/S1364032119302862#sec3 :
- Daily mean temperature (two linear regimes combined)
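
One possible way to represent "two linear regimes combined" is a continuous hinge model with a breakpoint chosen by grid search. This is a sketch under my own assumptions, not necessarily the method used by van der Wiel et al. (2019). The function name, the candidate grid and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_two_regimes(temp, demand, candidates):
    """Fit demand = a + b*T + c*max(T - bp, 0) for each candidate breakpoint bp
    and return (bp, coefficients) for the fit with the smallest squared error."""
    temp = np.asarray(temp, dtype=float)
    demand = np.asarray(demand, dtype=float)
    best = None
    for bp in candidates:
        # Design matrix: intercept, slope, and a hinge term that switches on above bp
        X = np.column_stack([np.ones_like(temp), temp, np.maximum(temp - bp, 0.0)])
        coef, *_ = np.linalg.lstsq(X, demand, rcond=None)
        sse = float(np.sum((X @ coef - demand) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(bp), coef)
    return best[1], best[2]

# Synthetic example: demand falls with T below 18 degC and rises above it
temp = np.linspace(0.0, 35.0, 71)
demand = 100.0 - 2.0 * temp + 5.0 * np.maximum(temp - 18.0, 0.0)
bp, coef = fit_two_regimes(temp, demand, candidates=np.arange(10.0, 30.0, 1.0))
```

The slope below the breakpoint is `coef[1]` and the slope above it is `coef[1] + coef[2]`.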

Kang & Reiner (2022) https://www.sciencedirect.com/science/article/pii/S014098832200189X :
- This study is from Ireland
- Number of sun hours
- Wind speed, humidity (less important than T)
- Rainfall (affects length of time spent indoors)

Me spitballing:
- Maximum daily T
- Minimum daily T
- Humidity (daily average and/or overnight?)
- Heatwaves (e.g. EHF or even just 3-day T?)
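
The "3-day T" idea could be as simple as a trailing multi-day mean of daily temperature. A minimal sketch, with a hypothetical helper name and made-up temperatures (the full EHF calculation is not attempted here):

```python
import numpy as np

def rolling_mean(t_daily, window=3):
    """Trailing `window`-day mean; the first window-1 days are NaN."""
    t_daily = np.asarray(t_daily, dtype=float)
    out = np.full(t_daily.shape, np.nan)
    if t_daily.size >= window:
        kernel = np.ones(window) / window
        # 'valid' keeps only positions where the full window is available
        out[window - 1:] = np.convolve(t_daily, kernel, mode="valid")
    return out

t3 = rolling_mean([30.0, 33.0, 36.0, 39.0, 27.0], window=3)
```

On an xarray object the equivalent would be `da.rolling(time=3).mean()`, and the window length is easy to vary.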

Also useful read: https://www.energycouncil.com.au/media/mejc2mfz/extreme-weather-and-electricity-supply.pdf
- Heatwaves in multiple states (might not affect demand though, just ability to supply)
- Demand peaks in 3rd and 4th day of a heatwave (so 4-day T?)
- Cloud cover (affects demand on grid by reducing rooftop solar's ability to smooth the peaks)

Could also consider:
- weighting the selected metrics by population (we eventually want state and national figures)
- Detrending everything, including demand.
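
A sketch of what these two steps might look like, assuming a population grid on the same lat/lon grid as the climate field and a simple linear trend. Both function names are hypothetical, and a linear fit is only one of several possible detrending choices.

```python
import numpy as np

def population_weighted_mean(field, population):
    """Weighted spatial mean of field (time, lat, lon) using population (lat, lon)."""
    field = np.asarray(field, dtype=float)
    weights = np.asarray(population, dtype=float)
    weights = weights / weights.sum()          # normalise so the weights sum to 1
    return (field * weights).sum(axis=(-2, -1))

def detrend_linear(series):
    """Remove a least-squares linear trend while keeping the series mean."""
    series = np.asarray(series, dtype=float)
    t = np.arange(series.size)
    slope = np.polyfit(t, series, 1)[0]
    return series - slope * (t - t.mean())

# Two time steps on a 2x2 grid; one cell has no population
field = np.array([[[10.0, 20.0], [30.0, 40.0]],
                  [[0.0, 0.0], [0.0, 4.0]]])
population = np.array([[1.0, 1.0], [0.0, 2.0]])
weighted = population_weighted_mean(field, population)

flat = detrend_linear([1.0, 3.0, 5.0, 7.0])
```

Restricting `population` to one state's grid cells (zeros elsewhere) would give state-level figures from the same function.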

In [1]:
from dask.distributed import Client, LocalCluster
from dask_jobqueue import PBSCluster

In [2]:
# One node on Gadi has 48 cores - try and use up a full node before going to multiple nodes (jobs)

walltime = "05:00:00"
cores = 24
memory = str(4 * cores) + "GB"

cluster = PBSCluster(walltime=str(walltime), cores=cores, memory=str(memory), processes=cores,
                     job_extra_directives=["-q normal",
                                           "-P w42",
                                           "-l ncpus="+str(cores),
                                           "-l mem="+str(memory),
                                           "-l storage=gdata/w42+gdata/rt52"],
                     local_directory="$TMPDIR",
                     job_directives_skip=["select"])

In [3]:
cluster.scale(jobs=1)
client = Client(cluster)

In [4]:
client

Connection method: Cluster object
Cluster type: dask_jobqueue.PBSCluster
Dashboard: /proxy/8787/status
Comm: tcp://10.6.121.1:45233
Workers: 0
Total threads: 0
Total memory: 0 B
Started: Just now


In [5]:
import xarray as xr
import numpy as np

In [6]:
%cd /g/data/w42/dr6273/work/energy_climate_modes

import functions as fn

/g/data/w42/dr6273/work/energy_climate_modes


In [7]:
root_path = "/g/data/rt52/era5/single-levels/reanalysis/"
write_path = "/g/data/w42/dr6273/work/data/era5/"

### Note:
We will process this differently to the energy/climate modes analysis (https://github.com/dougrichardson/energy_climate_modes/blob/main/1_hourly_to_daily.ipynb), because:
- we only need Australia (for now)
- we need daily aggregates in the same time zone as the demand data
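
ERA5 timestamps are in UTC, so starting each day at local midnight means picking the matching UTC hour as the first hour of the aggregation. A minimal sketch, assuming a fixed offset of UTC+10 with no daylight saving (whether this matches the demand data needs checking). `first_hour_utc` is a hypothetical helper.

```python
def first_hour_utc(local_start_hour, utc_offset_hours):
    """UTC hour at which a local day starting at local_start_hour begins."""
    return (local_start_hour - utc_offset_hours) % 24

# Local midnight at an assumed fixed offset of UTC+10
first_hour_aest = first_hour_utc(0, 10)
```

Note that a local day defined this way begins on the previous UTC date, so the date label given to each 24-hour block needs care.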

In [8]:
years = range(1959, 2023)

In [9]:
desired_first_hour = 0 # Set the desired first hour of the day

In [10]:
def hourly_to_daily(aggregate_function, variable, year, first_hour, data_path=root_path):
    """
    Compute 24-hour aggregates from hourly data for a given year.
    
    aggregate_function: function to aggregate hourly data (e.g. mean, max)
    variable: name of variable to process
    year:  year to process
    first_hour: desired first hour from which to compute 24-hour aggregations
    data_path: path to hourly data
    """
    # Open all hours in the year (~33 GB)
    hourly = xr.open_mfdataset(
        data_path + variable + "/" + str(year) + "/*.nc",
        chunks={"time": 24}
    )

    # Start the aggregation on the desired hour (e.g. 0000)
    data_first_hour = hourly["time"].dt.hour.item(0)
    desired_start_index = (first_hour - data_first_hour) % 24
    hourly = hourly.isel(time=slice(desired_start_index, None))

    # Aggregate each 24-hour block (mean, max, etc., depending on aggregate_function)
    daily = aggregate_function(hourly)

    # Re-assign time to be 0000 hour of each day
    daily = daily.assign_coords({"time": daily["time"].dt.floor("1D")})
    
    return daily
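
The start-index arithmetic in `hourly_to_daily` can be checked on a toy example. The sketch below uses a hypothetical hourly record that starts at 07:00 rather than real ERA5 data.

```python
import numpy as np

# Hypothetical hourly record whose first timestamp is 07:00 (three days long)
hours = (7 + np.arange(72)) % 24

first_hour = 0
data_first_hour = int(hours[0])
start = (first_hour - data_first_hour) % 24   # number of leading hours to skip

trimmed = hours[start:]
n_days = trimmed.size // 24                    # complete 24-hour blocks remaining
```

The modulo keeps the index non-negative when the desired first hour is earlier in the day than the first hour in the data.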

In [27]:
def write_by_year(aggregate_function, variable, years, first_hour, data_path, write_path, encoding_name=None):
    """
    Compute 24-hour aggregations from hourly data and write to file for each year.
    
    aggregate_function: function to aggregate hourly data (e.g. mean, max)
    variable: name of variable to process
    years: range of years to process
    first_hour: desired first hour from which to compute 24-hour aggregations
    data_path: path to hourly data
    write_path: path to write the daily data to
    encoding_name: name of variable to encode. Usually the same as variable.
    """
    for year in years:
        print(year)

        daily = hourly_to_daily(aggregate_function, variable, year, first_hour, data_path=data_path)
        
        write_daily(daily, variable, year, first_hour, write_path=write_path, encoding_name=encoding_name)

In [28]:
def write_daily(daily, variable, year, first_hour, write_path=write_path, encoding_name=None):
    """
    Write daily data to file
    
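    daily: daily data (xarray object) to be written
    variable: name of variable
    year: year being written
    first_hour: first hour used for the 24-hour aggregation (not used when writing)
    write_path: path to write the daily data to
    encoding_name: name of variable to encode. Usually the same as variable.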
    """
    # Chunk
    daily = daily.chunk({"time": 24})

    # Write to netcdf
    if isinstance(encoding_name, str):
        name = encoding_name
    else:
        name = variable
        
    encoding = {
        name: {"dtype": "float32"}
    }
    daily.to_netcdf(
        write_path + variable + "/daily/" + variable + "_era5_daily_" + str(year) + ".nc",
        mode="w",
        encoding=encoding
    )

In [13]:
def daily_mean(da):
    """
    24-hour mean of da.
    """
    return da.coarsen(time=24, boundary="trim").mean(keep_attrs=True)
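
For intuition, here is what a trimmed 24-hour block aggregation amounts to in plain NumPy. This is an illustrative equivalent, not the xarray implementation, and `block_aggregate` is a hypothetical name.

```python
import numpy as np

def block_aggregate(values, func, block=24):
    """Apply func over consecutive blocks along axis 0, dropping any incomplete final block."""
    values = np.asarray(values, dtype=float)
    n_blocks = values.shape[0] // block
    trimmed = values[: n_blocks * block]                       # like boundary="trim"
    blocks = trimmed.reshape((n_blocks, block) + values.shape[1:])
    return func(blocks, axis=1)

hourly_vals = np.arange(50, dtype=float)      # 2 full days plus 2 leftover hours
daily_mean_vals = block_aggregate(hourly_vals, np.mean)
daily_max_vals = block_aggregate(hourly_vals, np.max)
```

An xarray counterpart for the daily maximum would follow the same pattern as `daily_mean`, i.e. `da.coarsen(time=24, boundary="trim").max(keep_attrs=True)`.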

In [None]:
data_path = "/g/data/w42/dr6273/work/data/era5/"