# Heat Index Applications
This notebook walks through the [NOAA Heat Index](https://www.weather.gov/ama/heatindex) using climate data projections in the Cal-Adapt: Analytics Engine (AE).

To execute a given 'cell' of this notebook, place the cursor in the cell and press the 'play' icon, or simply press shift+enter together. Some cells will take longer to run, and you will see a [$\ast$] to the left of the cell while AE is still working.

**Intended Application**: As a user, I want to **<span style="color:#FF0000">understand trends in Heat Index across my region</span>** by:
1. Calculating the number of hours per day throughout the year of high Heat Index values
2. Understanding the trend in nighttime temperatures that are above an 80°F Heat Index
3. Determining the historical and projected summer trends in extreme heat

## Step 0: Set-up

First, we'll import the Python library [climakitae](https://github.com/cal-adapt/climakitae), our AE toolkit for climate data analysis, along with the specific functions from that library that we'll use in this notebook and any other Python libraries needed for the analysis.

In [None]:
import climakitae as ck
import pandas as pd

from climakitae.util.utils import read_csv_file, get_closest_gridcell, compute_multimodel_stats, trendline, summary_table

## Step 1: Select data

### 1a) Grab location of interest by latitude and longitude
First, we'll pick a location of interest using the latitude and longitude of a weather station; a later cell shows how to supply a custom lat-lon location instead. Note that in this example we will **not** retrieve the station data that are bias-corrected to that station. At present, bias-corrected station data on the Analytics Engine provide only air temperature, and the Heat Index also requires either dew point temperature (coming soon!) or relative humidity. For the time being, we will therefore retrieve **non-bias-corrected** data at the location of interest.

In [None]:
# select historical data
selections = ck.Select()

selections.data_type = 'Gridded'
selections.area_subset = 'CA Electric Load Serving Entities (IOU & POU)'
selections.cached_area = ['Pacific Gas & Electric Company']
selections.timescale = 'hourly'
selections.variable_type = 'Derived Index'
selections.variable = 'NOAA Heat Index'
selections.resolution = '9 km'
selections.time_slice = (1981, 2010)

We will first look at the Fresno Airport weather station.

In [None]:
from climakitae.core.paths import stations_csv_path
wx_stns = read_csv_file(stations_csv_path, index_col=[0])
wx_stns.head(5)

In [None]:
station_name = 'Fresno Yosemite International Airport (KFAT)'
one_stn = wx_stns.loc[wx_stns['station'] == station_name]

stn_lat = one_stn.LAT_Y.values[0]
stn_lon = one_stn.LON_X.values[0]
print(stn_lat, stn_lon)

If you would like to provide your own latitude and longitude coordinates, you can also customize the cell below and pass your own values. However, if your location is outside of the default cached area (for example, we're looking at the PG&E service territory), you'll also need to reset the `selections.cached_area` to one that is more appropriate. You can check which options are available in the "Subset the data by" and "Location selection" dropdown menus in `selections.show()`.

In [None]:
# stn_lat = YOUR_LAT_HERE
# stn_lon = YOUR_LON_HERE
# selections.cached_area = ["YOUR CHOICE HERE"] # if different cached area from default

Next, we'll use the latitude and longitude values to retrieve the historical data at that gridcell. 

In [None]:
heatidx_hist_hour = selections.retrieve()

In [None]:
heatidx_hist_hour = get_closest_gridcell(heatidx_hist_hour, stn_lat, stn_lon)
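
To make the idea of "closest gridcell" concrete, here is a minimal sketch of a nearest-neighbour lookup on a two-dimensional latitude/longitude grid. It is not the `get_closest_gridcell` implementation; the helper `nearest_gridcell_index`, the synthetic 0.1-degree grid, and the approximate Fresno coordinates are all assumptions made for illustration.

```python
import numpy as np

def nearest_gridcell_index(lat2d, lon2d, target_lat, target_lon):
    """Return (row, col) of the grid cell whose centre is closest to the target.

    Hypothetical helper: uses squared distance in degrees, with longitude
    scaled by cos(latitude), which is adequate on a fine regional grid.
    """
    dlat = lat2d - target_lat
    dlon = (lon2d - target_lon) * np.cos(np.deg2rad(target_lat))
    dist2 = dlat**2 + dlon**2
    return np.unravel_index(np.argmin(dist2), dist2.shape)

# Synthetic 0.1-degree grid roughly covering central California (illustration only)
lats = np.linspace(35.0, 38.0, 31)
lons = np.linspace(-121.0, -119.0, 21)
lon2d, lat2d = np.meshgrid(lons, lats)

# Approximate coordinates for Fresno, used only as an example target
row, col = nearest_gridcell_index(lat2d, lon2d, 36.78, -119.72)
print(row, col, round(float(lat2d[row, col]), 1), round(float(lon2d[row, col]), 1))
```

The selected cell centre will generally sit a short distance from the requested point, so it can be worth printing the returned coordinates to see how far the gridcell is from the station.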

Next, load the data into memory. Because we are loading the entire dataset, this will take about 5 minutes. Hang tight!

In [None]:
heatidx_hist_hour = ck.load(heatidx_hist_hour)

### 1b) Retrieve the projected Heat Index data
Next, we will repeat the same retrieval for the projected data so we can compare it with the historical data. As with the historical data, retrieving the hourly projections will take a few minutes. Hang tight!

In [None]:
# select future data
selections.data_type = 'Gridded'
selections.area_subset = 'CA Electric Load Serving Entities (IOU & POU)'
selections.cached_area = ['Pacific Gas & Electric Company']
selections.timescale = 'hourly'
selections.variable_type='Derived Index'
selections.variable='NOAA Heat Index'
selections.resolution = '9 km'
selections.scenario_historical = []
selections.scenario_ssp = ['SSP 3-7.0 -- Business as Usual']
selections.time_slice = (2040, 2070) # mid-century

In [None]:
heatidx_proj_hour = selections.retrieve()
heatidx_proj_hour = get_closest_gridcell(heatidx_proj_hour, stn_lat, stn_lon)

Now, we will load in the hourly heat index projections. Because we are loading the entire dataset, this will take a few minutes. Hang tight!

In [None]:
heatidx_proj_hour = ck.load(heatidx_proj_hour)

## Step 2: Calculate the number of hours throughout the year above a threshold

### 2a) Sum the hours per day

Next, let's determine which part of each day is above a Heat Index threshold, and how many hours in each day exceed it. The [NOAA Heat Index](https://www.noaa.gov/sites/default/files/2022-05/heatindex_chart_rh.pdf) "kicks in" once its value is above 80°F: prolonged exposure to a heat index above 80°F becomes dangerous to many people, especially vulnerable communities.

The Occupational Safety and Health Administration (OSHA) uses the Heat Index to determine the risk of heat-related illness and the protections needed for outdoor workers. Below are the [specific thresholds used by OSHA](https://www.nalc.org/workplace-issues/body/OSHA-Using-the-Heat-Index-A-Guide-for-Employers.pdf), which differ slightly from those used by the NOAA Heat Index system. We **strongly recommend** looking at multiple thresholds to understand Heat Index trends.

| Classification | Heat Index |
|----------------|------------|
| Caution | <91°F |
| Moderate | 91 - 103°F |
| High | 103 - 115°F |
| Very High to Extreme | 115+°F |
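
The table above can be turned into a small lookup, sketched here with the standard library only. The function name `osha_class` is hypothetical, and the handling of values that fall exactly on a boundary (assigned to the higher class) is an assumption, since the table does not say which side is inclusive.

```python
from bisect import bisect_right

# Lower bounds (°F) of the Moderate, High, and Very High to Extreme classes
OSHA_BOUNDS = [91, 103, 115]
OSHA_LABELS = ["Caution", "Moderate", "High", "Very High to Extreme"]

def osha_class(heat_index_f):
    """Map a Heat Index value in °F to the classification in the table above."""
    # Assumption: a value exactly on a boundary falls into the higher class
    return OSHA_LABELS[bisect_right(OSHA_BOUNDS, heat_index_f)]

for value in (85, 91, 104.5, 120):
    print(value, "->", osha_class(value))
```

Any of the lower bounds in `OSHA_BOUNDS` could serve as an alternative value for the threshold used in the rest of this notebook.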

We'll start with 80°F as our default threshold.

In [None]:
hi_threshold = 80

In [None]:
# counts the number of hours in each day above the heat index threshold
num_heatidx_histhours = (heatidx_hist_hour >= hi_threshold).resample(time='1D').sum()
num_heatidx_histhours.name = 'Hours per day above Heat Index threshold of {}°F'.format(hi_threshold)
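
If the threshold-and-resample pattern is unfamiliar, the sketch below reproduces it on two days of made-up hourly values. It uses a pandas Series as a stand-in for the xarray data object (an assumption made to keep the example small); the comparison yields True/False per hour, and summing within each day counts the True values.

```python
import numpy as np
import pandas as pd

hi_threshold = 80  # °F, same default as in this notebook

# Two days of made-up hourly Heat Index values (°F)
time = pd.date_range("2000-07-01", periods=48, freq=pd.Timedelta(hours=1))
values = np.full(48, 75.0)
values[12:18] = 85.0            # day 1: six hours above the threshold
values[24 + 10:24 + 20] = 90.0  # day 2: ten hours above the threshold
heat_index = pd.Series(values, index=time)

# True/False per hour, then count the True values within each calendar day
hours_per_day = (heat_index >= hi_threshold).resample("1D").sum()
print(hours_per_day)
```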

Let's pick one year to visualize the annual cycle in Heat Index values. We select 2000 here, but the commented-out line of code below shows how to plot all of the data. To switch, comment out the first line under `# visualize` by adding a `#` symbol, and uncomment the line below it by removing the `#` symbol.

In [None]:
data_one_year = num_heatidx_histhours.sel(time="2000")

# visualize
data_one_year.hvplot.line(x='time', by='simulation') # a specific year
# num_heatidx_histhours.hvplot.line(x='time', by='simulation') # all years

Heat Index values above our selected threshold (default is 80°F) begin to pick up in March for Fresno, and tail off in November. While high Heat Index values are critical to know during the summer months, we should be aware of high Heat Index values throughout the entire year as well. 

### 2b) Sum the total hours per year
It may also be useful to know how many hours in the entire year are above the threshold, to see the trend over time. We'll calculate this next.

In [None]:
# sum per year
num_heatidx_hist_hours_per_year = num_heatidx_histhours.groupby('time.year').sum('time')
num_heatidx_hist_hours_per_year.name = 'Hours per year above Heat Index threshold of {}°F'.format(hi_threshold)

In [None]:
# visualize
num_heatidx_hist_hours_per_year.hvplot.line(x='year', by='simulation')

For context, there are 8,760 hours in a non-leap year. Let's also look at the future data to understand the projected trends in the number of high Heat Index hours per year.

In the following cells, we'll do all of the computation above in a single go, to condense the number of cells to run. You don't need to modify anything, unless you made changes above. 

In [None]:
# counts the number of hours in each day above the heat index threshold
num_heatidx_projhours = (heatidx_proj_hour >= hi_threshold).resample(time='1D').sum()
num_heatidx_projhours.name = 'Hours per day above Heat Index threshold of {}°F'.format(hi_threshold)

# sum per year
num_heatidx_proj_hours_per_year = num_heatidx_projhours.groupby('time.year').sum('time')
num_heatidx_proj_hours_per_year.name = 'Hours per year above Heat Index threshold of {}°F'.format(hi_threshold)

In [None]:
# visualize
num_heatidx_proj_hours_per_year.hvplot.line(x='year', by='simulation')
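
To compare yearly totals on a common scale, a count of hours can be expressed as a share of all hours in that year. The sketch below is one way to do this with the standard library; `share_of_year` is a hypothetical helper, the counts are invented, and it assumes a standard calendar with leap years, which may not match every simulation's calendar.

```python
import calendar

def share_of_year(hours_above, year):
    """Express an hours-per-year count as a percentage of all hours in that year."""
    hours_in_year = (366 if calendar.isleap(year) else 365) * 24
    return 100.0 * hours_above / hours_in_year

# Invented counts, for illustration only
for year, hours in [(1999, 1752), (2000, 1752)]:
    print(year, round(share_of_year(hours, year), 2), "%")
```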

### 2c) Export counts of hours per year
First we'll calculate summary statistics for our data object with a handy function `compute_multimodel_stats`, which will provide the min, max, mean, and median. We'll also use a helper function `summary_table` which will transform our data objects into an easy to read dataframe table.

In [None]:
# calculate statistics
num_heatidx_hist_hours_per_year = compute_multimodel_stats(num_heatidx_hist_hours_per_year)

# table format with summary stats
df_to_export = summary_table(num_heatidx_hist_hours_per_year)
df_to_export.head(5) # see first 5 rows
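
For intuition about what a multi-model summary contains, the sketch below computes the same four statistics across simulations for a small invented table. It is not the `compute_multimodel_stats` or `summary_table` implementation; the simulation names and counts are hypothetical, and a pandas DataFrame stands in for the xarray object.

```python
import pandas as pd

# Invented yearly counts for three simulations (columns) over four years (rows)
counts = pd.DataFrame(
    {
        "sim_a": [100, 120, 110, 130],
        "sim_b": [90, 115, 125, 140],
        "sim_c": [105, 110, 120, 150],
    },
    index=pd.Index([1981, 1982, 1983, 1984], name="year"),
)

# Reduce across the simulation axis, keeping one row per year
stats = pd.DataFrame(
    {
        "min": counts.min(axis=1),
        "max": counts.max(axis=1),
        "mean": counts.mean(axis=1),
        "median": counts.median(axis=1),
    }
)
print(stats)
```

Each row summarizes the spread across simulations for one year, which is the kind of table that is convenient to export to CSV.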

In [None]:
fn = 'num_heatidx_hours_per_year_{}'.format(station_name.replace(" ", "_").replace("(", "").replace(")", ""))
df_to_export.to_csv('{}.csv'.format(fn))

## Step 3: Understand trends in nighttime temperatures
Next we'll look specifically at nighttime temperatures in order to assess when it may be too hot outside for worker safety and for assets to cool down. 

### 3a) Subset for nighttime hours
First, let's subset our hourly Heat Index data to the nighttime hours. We will use 8pm-6am as "nighttime", but you can modify this based on your needs. **Note:** The WRF data on the Cal-Adapt: Analytics Engine are in UTC, not the local timezone. Timezone conversion functionality is currently in development, and this notebook will be updated once it is available. For reference, California is UTC-8 (or UTC-7 during Daylight Saving Time).

In [None]:
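# help for modifications
# time.dt.hour holds the UTC hour of day (0-23), so the list below contains hour values, not positions
# 8pm Pacific Daylight Time is 3am UTC (0300 UTC), which is hour 3
# 6am Pacific Daylight Time is 1pm UTC (1300 UTC), which is hour 13

night_subset = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13] # currently in UTC time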
heatidx_hist_nighthours = heatidx_hist_hour.isel(time=heatidx_hist_hour.time.dt.hour.isin(night_subset))
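
Until built-in timezone conversion is available, the UTC hours that correspond to a local window can be worked out with pandas. The sketch below is one possible approach, not an Analytics Engine feature: `local_hours_to_utc` is a hypothetical helper, and the two dates are arbitrary examples chosen to fall in daylight saving time and standard time.

```python
import pandas as pd

def local_hours_to_utc(local_hours, date, tz="America/Los_Angeles"):
    """Return the UTC hour of day for each local hour on the given calendar date."""
    stamps = pd.DatetimeIndex(
        [pd.Timestamp(date) + pd.Timedelta(hours=h) for h in local_hours]
    ).tz_localize(tz)
    return stamps.tz_convert("UTC").hour.tolist()

# Local hours that bound the nighttime window: 8pm and 6am
window_edges = [20, 6]

print("Summer (daylight saving time):", local_hours_to_utc(window_edges, "2000-07-15"))
print("Winter (standard time):", local_hours_to_utc(window_edges, "2000-01-15"))
```

The offset shifts by one hour between summer and winter, so a fixed list of UTC hours matches the local window exactly for only part of the year.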

### 3b) Sum the number of nighttime hours above a threshold of 80°F per day and per year
As in Steps 2a and 2b, we'll count the nighttime hours with Heat Index values above the threshold. We'll use the same threshold as above, 80°F, but you can set `hi_threshold` to any value of interest.

In [None]:
# sum per day
num_heatidx_hist_nighthours = (heatidx_hist_nighthours >= hi_threshold).resample(time='1D').sum()
num_heatidx_hist_nighthours.name = 'Nighttime hours per day above Heat Index threshold of {}°F'.format(hi_threshold)

In [None]:
# visualize
num_heatidx_hist_nighthours.hvplot.line(x='time', by='simulation')

In [None]:
# sum per year
num_heatidx_hist_nighthours_per_year = num_heatidx_hist_nighthours.groupby('time.year').sum('time')
num_heatidx_hist_nighthours_per_year.name = 'Nighttime hours per year above Heat Index threshold of {}°F'.format(hi_threshold)

In [None]:
# visualize
num_heatidx_hist_nighthours_per_year.hvplot.line(x='year', by='simulation')

We'll repeat the same steps for the projected future data, without visualizing them, in case you need this information.

In [None]:
# subset for nighttime hours
heatidx_proj_nighthours = heatidx_proj_hour.isel(time=heatidx_proj_hour.time.dt.hour.isin(night_subset))

# sum per day
num_heatidx_proj_nighthours = (heatidx_proj_nighthours >= hi_threshold).resample(time='1D').sum()
num_heatidx_proj_nighthours.name = 'Nighttime hours per day above Heat Index threshold of {}°F'.format(hi_threshold)

# sum per year
num_heatidx_proj_nighthours_per_year = num_heatidx_proj_nighthours.groupby('time.year').sum('time')
num_heatidx_proj_nighthours_per_year.name = 'Nighttime hours per year above Heat Index threshold of {}°F'.format(hi_threshold)

### 3c) Export counts of nighttime hours per year

As in Step 2c, we'll compute summary statistics and export the table.

In [None]:
# calculate summary stats
num_heatidx_hist_nighthours_per_year = compute_multimodel_stats(num_heatidx_hist_nighthours_per_year)

# table format with summary stats
df_to_export = summary_table(num_heatidx_hist_nighthours_per_year)
df_to_export.head(5) # see first 5 rows

In [None]:
fn = 'num_heatidx_nighthours_per_year_{}'.format(station_name.replace(" ", "_").replace("(", "").replace(")", ""))
df_to_export.to_csv('{}.csv'.format(fn))

## Step 4: Summarize the long term trends

### 4a) Calculate the daily maximum Heat Index

Using the [OSHA thresholds we noted above](https://www.nalc.org/workplace-issues/body/OSHA-Using-the-Heat-Index-A-Guide-for-Employers.pdf), we'll now look at a threshold of 91°F. Again, we **strongly recommend** looking at multiple thresholds to understand Heat Index trends.

In [None]:
hi_threshold = 91 # degF

If you would like to look at a specific month or season, uncomment the lines in the next cell. We are going to look at the entire year, so we leave this cell commented out; you can also skip it entirely to retain data for the whole year.

In [None]:
# month_subset = [5, 6, 7, 8, 9] # May, June, July, August, September
# heatidx_hist_hour = heatidx_hist_hour.isel(time = heatidx_hist_hour.time.dt.month.isin(month_subset)) # historical
# heatidx_proj_hour = heatidx_proj_hour.isel(time = heatidx_proj_hour.time.dt.month.isin(month_subset)) # future

From the hourly Heat Index data, we'll now calculate the daily maximum Heat Index. Note that we take the daily maximum of the hourly Heat Index values rather than computing a single value from the daily maximum air temperature and the daily average relative humidity. That shortcut artificially inflates the daily Heat Index, because the hottest hour of the day is typically also among the least humid.

If the daily median heat index is more relevant to your needs, we also provide the option in the cell below to calculate this instead. 

In [None]:
# historical
heatidx_hist_day = heatidx_hist_hour.resample(time='1D').max() # daily max
# heatidx_hist_day = heatidx_hist_hour.resample(time='1D').median() # daily median

# future
heatidx_proj_day = heatidx_proj_hour.resample(time='1D').max() # daily max
# heatidx_proj_day = heatidx_proj_hour.resample(time='1D').median() # daily median
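
To see why the order of operations matters, the sketch below compares the two approaches on a handful of invented hourly values. It uses only the base Rothfusz regression published by the National Weather Service, without the low- and high-humidity adjustments, which is an assumption and not necessarily how the Heat Index data in this notebook were derived. The function name and the sample values are hypothetical.

```python
def rothfusz_heat_index(temp_f, rh_pct):
    """Base Rothfusz regression for Heat Index (°F); intended for roughly T >= 80°F."""
    t, rh = temp_f, rh_pct
    return (
        -42.379 + 2.04901523 * t + 10.14333127 * rh
        - 0.22475541 * t * rh - 6.83783e-3 * t**2 - 5.481717e-2 * rh**2
        + 1.22874e-3 * t**2 * rh + 8.5282e-4 * t * rh**2
        - 1.99e-6 * t**2 * rh**2
    )

# Invented values for one hot day: temperature peaks when humidity bottoms out
temps_f = [80, 88, 95, 90, 84]
rh_pct = [65, 45, 25, 35, 55]

# Approach used in this notebook: Heat Index per hour, then the daily maximum
hi_from_hourly = max(rothfusz_heat_index(t, rh) for t, rh in zip(temps_f, rh_pct))

# Shortcut: daily maximum temperature combined with daily mean humidity
hi_from_daily_stats = rothfusz_heat_index(max(temps_f), sum(rh_pct) / len(rh_pct))

print(round(hi_from_hourly, 1), round(hi_from_daily_stats, 1))
```

With these sample values the shortcut comes out several degrees higher than the hourly-based maximum, because it pairs the peak temperature with humidity that never occurred at that hour.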

Let's visualize the historical trends next. 

In [None]:
ck.view(heatidx_hist_day)

Compare the median historical and projected Heat Index values.

In [None]:
# what is the projected change in the median daily max heat index value?
hi_hist = heatidx_hist_day.median().values
hi_proj = heatidx_proj_day.median().values
print('Historical median Heat Index: {:.2f}'.format(hi_hist))
print('Projected median Heat Index: {:.2f}'.format(hi_proj))
print('The projected change in the median Heat Index value from historical is: {:.2f}°F'.
      format(hi_proj - hi_hist))

### 4b) Calculate the number of days above the threshold

In [None]:
# calculate number of days above threshold
num_heatidx_histdays = (heatidx_hist_day >= hi_threshold).groupby('time.year').sum('time')
num_heatidx_histdays.name = 'Days above Heat Index threshold of {}°F'.format(hi_threshold)

# calculate summary statistics
num_heatidx_histdays_stats = compute_multimodel_stats(num_heatidx_histdays)

Let's visualize the trends:

In [None]:
num_heatidx_histdays.hvplot.line(x='year', by='simulation', title='') *\
trendline(num_heatidx_histdays_stats, kind='median').hvplot.line(x='year', color='black', line_dash='dashed', label='trendline')

Note: if you've raised the threshold (for example, to 105°F) and both the per-model lines and the median trendline are flat at zero, the Heat Index at this location never exceeds that threshold in the modeled data.
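
If a numeric slope is useful alongside the plotted trendline, a least-squares line can be fitted to the yearly counts. The sketch below uses `numpy.polyfit` on invented values; it is an illustration of a linear trend fit and is not claimed to be how the `trendline` helper works internally.

```python
import numpy as np

# Invented yearly median counts of days above the threshold
years = np.arange(1981, 1991)
days_above = np.array([40, 42, 41, 45, 44, 47, 46, 50, 49, 52])

# Fit against years since the start of the record to keep the fit well conditioned
x = years - years[0]
slope, intercept = np.polyfit(x, days_above, deg=1)
fitted = slope * x + intercept

print(f"Trend: {slope:.2f} days per year ({10 * slope:.1f} days per decade)")
```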

We'll now repeat the process for the projected data.

In [None]:
# calculate number of days above the threshold
num_heatidx_projdays = (heatidx_proj_day >= hi_threshold).groupby('time.year').sum('time')
num_heatidx_projdays.name = 'Days above Heat Index threshold of {}°F'.format(hi_threshold)

# calculate statistics
num_heatidx_projdays_stats = compute_multimodel_stats(num_heatidx_projdays)

In [None]:
# visualize results
num_heatidx_projdays.hvplot.line(x='year', by='simulation', title='') *\
trendline(num_heatidx_projdays_stats, kind='median').hvplot.line(x='year', color='black', line_dash='dashed', label='trendline')

### 4c) Calculate the number of Heat Index days per month
Alternatively, let's determine the number of days per month with Heat Index values above the threshold, as this may be useful for seasonal information. We'll be using the daily maximum Heat Index data.

In [None]:
num_heatidx_histmonths = (heatidx_hist_day >= hi_threshold).resample(time='1M').sum()
num_heatidx_histmonths.name = 'Days per month above Heat Index threshold of {}°F'.format(hi_threshold)

In [None]:
num_heatidx_histmonths.hvplot.line(x='time', by='simulation')
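
For a compact seasonal summary, the monthly counts can also be averaged by calendar month across years. The sketch below shows one way to do this with pandas on invented daily maxima; the Series stands in for the xarray data object, and the values are constructed so that only early July exceeds the threshold.

```python
import pandas as pd

hi_threshold = 91  # °F

# Two years of invented daily maximum Heat Index values: hot only in early July
days = pd.date_range("2000-01-01", "2001-12-31", freq="D")
daily_max = pd.Series(75.0, index=days)
daily_max[(days.month == 7) & (days.day <= 10)] = 95.0  # ten hot days each July

# Count exceedance days in every (year, month) pair
exceed = daily_max >= hi_threshold
per_month = exceed.groupby([days.year, days.month]).sum()
per_month.index.names = ["year", "month"]

# Average number of exceedance days in each calendar month across years
monthly_climatology = per_month.groupby(level="month").mean()
print(monthly_climatology)
```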

### 4d) Export summer counts of days per year

Like steps 2c and 3c, we'll calculate the min, max, mean, and median trends and format a table for easy export. 

In [None]:
# calculate summary stats from the per-simulation counts
num_heatidx_histdays_stats = compute_multimodel_stats(num_heatidx_histdays)

# table format with summary stats
df_to_export = summary_table(num_heatidx_histdays_stats)
df_to_export.head(5) # see first 5 rows

In [None]:
fn = 'num_heatidx_histdays_{}'.format(station_name.replace(" ", "_").replace("(", "").replace(")", ""))
df_to_export.to_csv('{}.csv'.format(fn))

**Note**: Any of the data variables that we've calculated throughout this notebook can be exported! 