# Heat Index Applications
This notebook walks through an analysis of the [NOAA Heat Index](https://www.weather.gov/ama/heatindex) across an energy service territory, using climate data projections in the Analytics Engine (AE).

To execute a given 'cell' of this notebook, place the cursor in the cell and press the 'play' icon, or simply press shift+enter together. Some cells will take longer to run, and you will see a [$\ast$] to the left of the cell while AE is still working.

**Intended Application**: As a user, I want to **<span style="color:#FF0000">understand summer trends in Heat Index across my region</span>** by:
1. Determining the historical and future trends of extreme heat
2. Understanding the trend in the number of days of high Heat Index values

## Step 0: Set-up

First, we'll import the Python library [climakitae](https://github.com/cal-adapt/climakitae), our AE toolkit for climate data analysis, along with the specific functions from that library that we'll use in this notebook, as well as any other Python libraries needed for the analysis.

In [None]:
import climakitae as ck
import pandas as pd

from climakitae.util.utils import read_csv_file

## Step 1: Get data across service territory

#### 1a) Grab location of interest by latitude and longitude
First, we'll select specific locations of interest using the latitude and longitude of the weather stations (approximately 7) throughout the service territory, and we also provide code for entering a custom lat-lon location. Note that for this example we will **not** retrieve the station data that is bias-corrected to each station. At present, bias-corrected station data on the Analytics Engine only provides air temperature as a variable, and Heat Index also requires either dew point temperature (coming soon!) or relative humidity. For the time being, we will therefore retrieve **non-bias-corrected** data at the location of interest.

In [None]:
# select data
selections = ck.Select()

selections.data_type = 'Gridded'
selections.timescale = 'hourly'
selections.variable_type='Derived Index'
selections.variable='NOAA Heat Index'
selections.resolution = '9 km'
selections.time_slice = (1981, 2010)

We'll use Fresno as the first example. The following stations fall within the PG&E service area: Arcata, Red Bluff, Stockton, San Jose, Fresno, San Luis Obispo, and Bakersfield.

In [None]:
from climakitae.core.paths import stations_csv_path
wx_stns = read_csv_file(stations_csv_path, index_col=[0])
wx_stns.head(5)

In [None]:
station_name = 'Fresno Yosemite International Airport (KFAT)'
one_stn = wx_stns.loc[wx_stns['station'] == station_name]

stn_lat = one_stn.LAT_Y.values[0]
stn_lon = one_stn.LON_X.values[0]
print(stn_lat, stn_lon)

If you would like to provide your own latitude and longitude coordinates, you can also customize the cell below and pass your own values. 

In [None]:
# stn_lat = YOUR_LAT_HERE
# stn_lon = YOUR_LON_HERE

Next, we'll use the latitude and longitude values to retrieve the historical data at that gridcell. 

In [None]:
selections.latitude = (stn_lat - 0.05, stn_lat + 0.05)
selections.longitude = (stn_lon - 0.05, stn_lon + 0.05)

# because we're retrieving a single grid cell, we also take the area average
selections.area_average = 'Yes'
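
As a quick sanity check on the ±0.05° box, the sketch below estimates its size in kilometers so it can be compared with the 9 km grid spacing. The helper `box_size_km`, the constant `KM_PER_DEG_LAT`, and the example latitude are illustrative assumptions and not part of climakitae.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude, in km


def box_size_km(lat, half_width_deg=0.05):
    """Approximate (north-south, east-west) extent in km of a box that spans
    +/- half_width_deg in latitude and longitude around a point at `lat`."""
    ns_km = 2 * half_width_deg * KM_PER_DEG_LAT
    # a degree of longitude shrinks with the cosine of the latitude
    ew_km = ns_km * math.cos(math.radians(lat))
    return ns_km, ew_km


# hypothetical latitude close to Fresno, for illustration only
ns_km, ew_km = box_size_km(36.78)
print(f"{ns_km:.1f} km north-south, {ew_km:.1f} km east-west")
```

A box of roughly this size is comparable to one 9 km grid cell, so depending on how the box lines up with the grid it may touch more than one cell; taking the area average collapses whatever is returned into a single time series.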

In [None]:
heatidx_hist_hour = selections.retrieve()

Choose a set of months to subset the data by. As we are interested in high heat events, we'll grab the May through September months. You can customize which months you may want to look at by modifying the `month_subset` object in the next cell below. This also helps trim our data size down further, which will speed up future data loading.

In [None]:
month_subset = [5, 6, 7, 8, 9] # May, June, July, August, September
heatidx_hist_hour = heatidx_hist_hour.isel(time = heatidx_hist_hour.time.dt.month.isin(month_subset))
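
To see what the month subsetting does without waiting on a data download, here is a minimal sketch on a synthetic hourly series. The array `da` and its values are placeholders, not AE data, and the sketch uses the `.sel` form of boolean indexing rather than the `.isel` call above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# synthetic hourly series covering one non-leap year (placeholder values)
time = pd.date_range("2001-01-01", periods=365 * 24, freq=pd.Timedelta(hours=1))
da = xr.DataArray(
    np.arange(time.size, dtype=float),
    coords={"time": time},
    dims="time",
    name="toy_heat_index",
)

month_subset = [5, 6, 7, 8, 9]  # May through September
in_season = da.time.dt.month.isin(month_subset)  # boolean mask along time

# keep only the time steps where the mask is True
summer = da.sel(time=in_season)
print(summer.sizes["time"], "of", da.sizes["time"], "hours kept")
```

Subsetting before loading is the useful part: only the retained hours need to be read into memory in the following step.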

In [None]:
heatidx_hist_hour = ck.load(heatidx_hist_hour) # loading hourly data into memory can take a while (~4 min)

The data object `heatidx_hist_hour` now holds hourly data for the selected summer months.

#### 1b) Calculate the daily max heat index to establish climatological trends
From the hourly heat index data, we'll now calculate the daily maximum heat index value. It is important to note that we will calculate the daily max heat index from the hourly heat index data, rather than taking the daily max air temperature and the average relative humidity, which artificially inflates the daily heat index value. 

If the daily median heat index is more relevant to your needs, we also provide the option in the cell below to calculate this instead. 

In [None]:
heatidx_hist_day = heatidx_hist_hour.resample(time='1D').max() # daily max
# heatidx_hist_day = heatidx_hist_hour.resample(time='1D').median() # daily median
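
The inflation effect described above can be illustrated with a small, self-contained sketch. It uses the published NOAA Rothfusz regression without the low- and high-humidity adjustments, and a made-up diurnal cycle in which the hottest hour is also the driest. This is an assumption-laden illustration: the function name `rothfusz_heat_index` and the synthetic cycle are hypothetical, and climakitae's own Heat Index derivation may differ in its details.

```python
import math


def rothfusz_heat_index(temp_f, rh_pct):
    """NOAA Rothfusz regression (degF, percent RH) without the humidity
    adjustments; it is intended for heat index values of about 80 degF and up."""
    t, r = temp_f, rh_pct
    return (-42.379 + 2.04901523 * t + 10.14333127 * r
            - 0.22475541 * t * r - 0.00683783 * t * t
            - 0.05481717 * r * r + 0.00122874 * t * t * r
            + 0.00085282 * t * r * r - 0.00000199 * t * t * r * r)


# hypothetical diurnal cycle: hottest and driest at 15:00
cycle = [math.cos(2 * math.pi * (h - 15) / 24) for h in range(24)]
temp_f = [85 + 15 * c for c in cycle]  # 70 to 100 degF
rh_pct = [40 - 20 * c for c in cycle]  # 60 % down to 20 %

# approach used in this notebook: heat index every hour, then the daily max
# (the cooler hours fall outside the regression's range but do not set the max)
hourly_hi = [rothfusz_heat_index(t, r) for t, r in zip(temp_f, rh_pct)]
daily_max_of_hourly = max(hourly_hi)

# shortcut to avoid: daily max temperature paired with daily mean humidity
mean_rh = sum(rh_pct) / len(rh_pct)
hi_from_daily_stats = rothfusz_heat_index(max(temp_f), mean_rh)

print(f"daily max of hourly heat index: {daily_max_of_hourly:.1f} degF")
print(f"heat index from max T, mean RH: {hi_from_daily_stats:.1f} degF")
```

Because relative humidity is usually lowest when the temperature peaks, pairing the daily maximum temperature with the daily mean humidity combines conditions that never occur in the same hour, which pushes the result above the true daily maximum.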

Let's visualize the historical trend: 

In [None]:
ck.view(heatidx_hist_day)

In the plot above, we visualize the daily max heat index values through the historical summer period. You'll note that in Fresno, there are daily values below 80°F. [Heat Index](https://www.weather.gov/ama/heatindex) typically only becomes "noticeable" to people above 80°F in terms of heat safety caution. We'll return to this in Step 2. 

#### 1c) Retrieve and calculate the projected trends
Next we will repeat the same data retrieval and daily max process as we did for the historical data so we can compare. 

In [None]:
selections.data_type = 'Gridded'
selections.timescale = 'hourly'
selections.variable_type='Derived Index'
selections.variable='NOAA Heat Index'
selections.scenario_historical = []
selections.scenario_ssp = ['SSP 3-7.0 -- Business as Usual']
selections.time_slice = (2040, 2070) # mid-century
selections.latitude = (stn_lat - 0.05, stn_lat + 0.05)
selections.longitude = (stn_lon - 0.05, stn_lon + 0.05)
selections.area_average = 'Yes'

In [None]:
heatidx_proj_hour = selections.retrieve()
heatidx_proj_hour = heatidx_proj_hour.isel(time = heatidx_proj_hour.time.dt.month.isin(month_subset))
heatidx_proj_hour = ck.load(heatidx_proj_hour) 
heatidx_proj_day = heatidx_proj_hour.resample(time='1D').max() # daily max

Now, let's visualize the projected Heat Index:

In [None]:
ck.view(heatidx_proj_day)

Compare the median historical and projected Heat Index values.

In [None]:
# what is the projected median heat index value?
hi_hist = heatidx_hist_day.median().values
hi_proj = heatidx_proj_day.median().values
print('Historical median HI: {:.2f}'.format(hi_hist))
print('Projected median HI: {:.2f}'.format(hi_proj))
print('The projected change in the median Heat Index value from historical is: {:.2f}°F'.
      format(hi_proj - hi_hist))

## Step 2: Calculate the number of days each year above a Heat Index threshold

As noted above, the Heat Index only becomes relevant for heat safety once it exceeds 80°F: prolonged exposure to a heat index above 80°F becomes dangerous to many people, especially vulnerable communities. We'll now count the number of days in each year (within the summer months selected above) that are at or above a specific threshold. Because the Heat Index [classification](https://www.noaa.gov/sites/default/files/2022-05/heatindex_chart_rh.pdf) system uses specific thresholds, we reproduce the NOAA guidance below, and we **strongly recommend** looking at multiple thresholds to understand Heat Index trends. We will start with 80°F.

| Classification | Heat Index |
|----------------|------------|
| Caution | 80 - 90°F|
| Extreme Caution | 90 - 103°F |
| Danger | 103 - 124°F |
| Extreme Danger | 125+°F |

In [None]:
hi_threshold = 80 # degF
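
To make it easier to step through several thresholds, the sketch below encodes the classification table as a small lookup. The names `HI_CLASSES` and `classify_heat_index` are hypothetical helpers, and the handling of values that sit exactly on a shared boundary (such as 90°F) is an assumption, since the table lists overlapping range edges.

```python
# lower bound (degF) of each NOAA class, taken from the table above
HI_CLASSES = [
    (125, "Extreme Danger"),
    (103, "Danger"),
    (90, "Extreme Caution"),
    (80, "Caution"),
]


def classify_heat_index(hi_f):
    """Return the NOAA class for a heat index in degF, or None below 80 degF.

    Assumption: a value on a shared boundary (e.g. 90) is assigned to the
    higher class, and values between 124 and 125 are treated as Danger.
    """
    for lower, label in HI_CLASSES:
        if hi_f >= lower:
            return label
    return None


# candidate values for hi_threshold, one per class
thresholds = sorted(lower for lower, _ in HI_CLASSES)
for value in (78, 85, 97, 110, 130):
    print(value, classify_heat_index(value))
```

Setting `hi_threshold` to each entry of `thresholds` in turn and re-running the cells that follow gives one day-count series per NOAA class.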

In [None]:
num_heatidx_histdays = (heatidx_hist_day >= hi_threshold).groupby('time.year').sum('time')
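
The counting step can be checked on a toy dataset where the answer is known in advance. In the sketch below, the simulation names and values are made up for illustration; only the `groupby('time.year').sum('time')` pattern mirrors the cell above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# toy daily-max series: two years, two hypothetical simulations, 70 degF baseline
time = pd.date_range("2001-01-01", "2002-12-31", freq="D")
values = np.full((2, time.size), 70.0)

july_2001 = (time.year == 2001) & (time.month == 7)
early_aug_2002 = (time.year == 2002) & (time.month == 8) & (time.day <= 10)
values[0, july_2001] = 85.0       # sim_a: 31 days at 85 degF in 2001
values[0, early_aug_2002] = 95.0  # sim_a: 10 days at 95 degF in 2002
values[1, july_2001] = 80.0       # sim_b: 31 days exactly at the threshold

daily_max = xr.DataArray(
    values,
    coords={"simulation": ["sim_a", "sim_b"], "time": time},
    dims=("simulation", "time"),
)

hi_threshold = 80  # degF
# the comparison gives True/False per day; summing within each year counts the True days
days_over = (daily_max >= hi_threshold).groupby("time.year").sum("time")
print(days_over.to_pandas())

# a threshold that is never reached yields zero for every year and simulation
none_over = (daily_max >= 105).groupby("time.year").sum("time")
```

Note that `>=` counts days that sit exactly on the threshold, which is why `sim_b` still registers 31 days in 2001.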

Before we visualize the historical trend, we'll calculate the multi-model median trend.

In [None]:
def trendline_median(data):
    '''Calculates trendline with the multi-model median'''
    data = data.sel(simulation="simulation median")
    m, b = data.polyfit(dim='year', deg=1).polyfit_coefficients.values
    trendline = m * data.year + b # y = mx + b
    trendline.name = 'trendline'
    return trendline

In [None]:
sim_med = (num_heatidx_histdays.median(dim='simulation').assign_coords({"simulation": "simulation median"}).expand_dims("simulation"))
hist_trend = trendline_median(sim_med)
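
For readers who want to confirm what the median-and-trendline step returns, here is a minimal sketch on synthetic annual counts with a known slope. The simulation names and numbers are invented, and the coefficients are selected by their `degree` label rather than by unpacking, which is an equivalent but more explicit way to read the fit.

```python
import numpy as np
import xarray as xr

years = np.arange(1981, 2011)
# three hypothetical simulations sharing a slope of 0.5 days/year, with different offsets
offsets = np.array([0.0, 2.0, 10.0])
counts = xr.DataArray(
    0.5 * (years - 1981)[None, :] + 60.0 + offsets[:, None],
    coords={"simulation": ["sim_a", "sim_b", "sim_c"], "year": years},
    dims=("simulation", "year"),
)

# median across simulations for each year (here, the middle series sim_b)
median = counts.median(dim="simulation")

# degree-1 least-squares fit along the year dimension
coeffs = median.polyfit(dim="year", deg=1).polyfit_coefficients
slope = float(coeffs.sel(degree=1))      # change in days per year
intercept = float(coeffs.sel(degree=0))
trendline = slope * median.year + intercept  # y = mx + b

print(f"slope: {slope:.3f} days per year")
```

The slope is the quantity of interest when comparing periods: it expresses how many additional days per year exceed the threshold over the fitted span.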

Visualize the historical trend in the number of days with Heat Index values at or above our designated threshold:

In [None]:
num_heatidx_histdays.hvplot.line(x='year', by='simulation', ylabel='# of days above {}°F Heat Index'.format(hi_threshold)) *\
hist_trend.hvplot.line(x='year', color='black', line_dash='dashed', label='trendline')

Note: if you raise the threshold (for example, to 105°F) and both the per-model lines and the median trendline come back flat, the Heat Index at this location never reaches that threshold in the modeled data.

We'll now repeat the process for the projected data.

In [None]:
num_heatidx_projdays = (heatidx_proj_day >= hi_threshold).groupby('time.year').sum('time')

In [None]:
sim_med = (num_heatidx_projdays.median(dim='simulation').assign_coords({"simulation": "simulation median"}).expand_dims("simulation"))
proj_trend = trendline_median(sim_med)

In [None]:
num_heatidx_projdays.hvplot.line(x='year', by='simulation', ylabel='# of days above {}°F Heat Index'.format(hi_threshold)) *\
proj_trend.hvplot.line(x='year', color='black', line_dash='dashed', label='trendline')

## Step 3: Export
Export any variable of interest here for your needs.

In [None]:
fn = 'heat_index_{}'.format(station_name.replace(" ", "_"))

In [None]:
# example for daily/annual data
ck.export(heatidx_hist_day, fn, 'CSV')