### Extreme Event Case Study: Labor Day 2022 Heatwave

The **Historical Observations Data Platform** is a cloud-based, historical weather observations dataset that enables access to high-quality, rigorously quality-controlled open climate and weather data. The historical weather stations included in this dataset include information that can assess the severity, duration, frequency, and rate of change over time of extreme weather events, as well as supporting projections downscaling efforts. Stringent QA/QC procedures, in-line with international protocols, are applied with custom modifications relevant to the Western US and the energy sector are included (such as temperature and precipitation extremes, winds, and solar radiation). This notebook is a detailed investigation into how the QA/QC protocol performed during a known extreme event that stressed communities and the electric grid. 


The **Labor Day 2022 Heatwave** was a 10-day significant extreme heat event impacting the Western US, particularly California, with record-breaking air temperatures that strained the power grid. The heatwave resulted in widespread power outages across the state with severe impacts including [extreme loss of life](https://www.cdph.ca.gov/Programs/OHE/CDPH%20Document%20Library/Climate-Health-Equity/CDPH-2022-Heat-Wave-Excess-Mortality-Report.pdf), [unprecedented high energy demand](https://www.sdgetoday.com/news/conservation-efforts-and-energy-storage-support-grid-reliability-during-labor-day-heatwave#:~:text=California%20was%20hot%20this%20past,the%20state%20weathered%20the%20heat.), and multiple world records broken for air temperature. <br>
Records broken: 
- Sacramento, 116°F
- Livermore, 116°F, *record last set in 1950*
- Pasadena, 103°F, *record last set in 1938*
- Long Beach, 109°F


In [1]:
# Import relevant python packages
import pandas as pd
import xarray as xr

from case_study_eval_utils import *
# event_info, event_subset

In [2]:
# Settings for Labor Day Heatwave
event = "aug2022_heatwave"
event_start_date = "2022-08-30"
event_end_date = "2022-09-09" 

# Read in all stations list
all_stns = pd.read_csv(
    "s3://wecc-historical-wx/4_merge_wx/all_network_stationlist_merge.csv"
)

In [14]:
all_stns.columns

Index(['Unnamed: 0', 'era-id', 'latitude', 'longitude', 'elevation',
       'start-date', 'end-date', 'pulled', 'time_checked', 'network',
       'cleaned', 'time_cleaned', 'tas_nobs', 'tdps_nobs', 'tdps_derived_nobs',
       'ps_nobs', 'ps_derived_nobs', 'psl_nobs', 'ps_altimeter_nobs',
       'pr_nobs', 'pr_5min_nobs', 'pr_1h_nobs', 'pr_24h_nobs',
       'pr_localmid_nobs', 'hurs_nobs', 'sfcwind_nobs', 'sfcwind_dir_nobs',
       'rsds_nobs', 'total_nobs', 'qaqc', 'time_qaqc', 'merged', 'time_merge'],
      dtype='object')

In [21]:
def event_subset(
    df: pd.DataFrame,
    event: str,
    buffer: int | None = 7,
    alt_start_date: str | None = None,
    alt_end_date: str | None = None,
) -> pd.DataFrame:
    """
    Subsets for the event itself + buffer around to identify event.

    Parameters
    ---------
    df: pd.DataFrame
        stationlist dataframe
    event : str
        name of event
    buffer : int, optional
        number of days to include as a buffer around event start/end date
    alt_start_date : str
        date of different event, must be in format "YYYY-MM-DD"
    alt_end_date : str
        date of different event, must be in format "YYYY-MM-DD"

    Returns
    -------
    event_sub : pd.DataFrame
        subset of stationlist within date range of event or alternative
    """

    print(
        f"Subsetting station record for event duration with {str(buffer)} day buffer..."
    )

    alt_start_date = datetime.datetime(alt_start_date)
    alt_end_date = datetime.datetime(alt_end_date)

    # set to searchable datetime
    df["start-date"] = pd.to_datetime(df["start-date"])
    df["end-date"] = pd.to_datetime(df["end-date"])

    # grab dates from lookup dictionary
    event_start, event_end = event_info(event, alt_start_date, alt_end_date)

    # subset for event dates + buffer
    datemask = (
        df["start-date"]
        >= (pd.Timestamp(event_start) - datetime.timedelta(days=buffer))
    ) & (df["end-date"] <= (pd.Timestamp(event_end) + datetime.timedelta(days=buffer)))
    event_sub = df.loc[datemask]

    return event_sub

In [22]:
# Retrieve all stations that report observations during the event, using a 2-week window on either side
event_stns = event_subset(
    all_stns,
    event,
    7,
    "2022-08-30",
    "2022-09-09"
) 
 

# # subset all stations for time dates, probably within a 2 week window on either side
# depending on the event, susbet for specific variables (ask Victoria)

Subsetting station record for event duration with 7 day buffer...


TypeError: 'str' object cannot be interpreted as an integer

In [None]:
# produce simple timeseries plots of variable over the event
# include QC flags
# if possible, add shaded bars or something (look at old code) to indicate the event itself

In [None]:
# some kind of map

In [None]:
# table / stats "read out" on extremes during the event

In [None]:
# table / stats "read out" on QC flags, including if we think refinement to QC tests would improve coverage

In [None]:
# some function/thing in terms of how many stations "detected" the event

In [None]:
# summary information via markdown close out of what we have learned