# Heat Index Applications
This notebook walks through the [NOAA Heat Index](https://www.weather.gov/ama/heatindex) using climate data projections in the Analytics Engine. 
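The Analytics Engine provides "NOAA Heat Index" as a ready-made variable, so you do not need to compute it yourself. For intuition about what the index combines, here is a minimal sketch of NOAA's published heat index equation: a simple estimate for mild conditions, and the Rothfusz regression with its low- and high-humidity adjustments otherwise. It is an illustration under that assumption and may not match climakitae's internal implementation exactly; `heat_index_f` is a hypothetical helper name.

```python
import math


def heat_index_f(temp_f: float, rh: float) -> float:
    """Heat index (°F) from air temperature (°F) and relative humidity (%).

    Sketch of NOAA's published equation; hypothetical helper, not climakitae code.
    """
    # Simple estimate; NOAA averages it with the temperature to decide
    # whether the full regression is needed
    simple = 0.5 * (temp_f + 61.0 + (temp_f - 68.0) * 1.2 + rh * 0.094)
    if (simple + temp_f) / 2.0 < 80.0:
        return simple

    # Rothfusz multiple regression
    t, r = temp_f, rh
    hi = (-42.379 + 2.04901523 * t + 10.14333127 * r
          - 0.22475541 * t * r - 0.00683783 * t * t
          - 0.05481717 * r * r + 0.00122874 * t * t * r
          + 0.00085282 * t * r * r - 0.00000199 * t * t * r * r)

    # Adjustments for very dry or very humid conditions
    if r < 13.0 and 80.0 <= t <= 112.0:
        hi -= ((13.0 - r) / 4.0) * math.sqrt((17.0 - abs(t - 95.0)) / 17.0)
    elif r > 85.0 and 80.0 <= t <= 87.0:
        hi += ((r - 85.0) / 10.0) * ((87.0 - t) / 5.0)
    return hi


print(round(heat_index_f(90.0, 50.0), 1))  # 94.6
```

At 90°F and 50% relative humidity, the air "feels like" roughly 95°F, which is why humidity matters as much as temperature in the applications below.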

To execute a given 'cell' of this notebook, place the cursor in the cell and press the 'play' icon, or press Shift+Enter. Some cells take longer to run; you will see a [$\ast$] to the left of the cell while AE is still working.

**Intended Application**: As a user, I want to **<span style="color:#FF0000">understand trends in Heat Index across my region</span>** by:
1. Calculating the number of hours per day throughout the year of high Heat Index values
2. Understanding the trend in nighttime temperatures that are above an 80°F Heat Index
3. Determining the historical and projected number of days with a high Heat Index per month
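All three applications reduce to threshold counts on an hourly Heat Index series. The sketch below shows one way to compute them with pandas, assuming an hourly series already converted to local time. The 90°F "high" threshold, the hypothetical function name `heat_index_metrics`, and the use of the daily minimum as a stand-in for nighttime conditions are illustrative assumptions, not definitions taken from this notebook.

```python
import numpy as np
import pandas as pd


def heat_index_metrics(hi: pd.Series, high_thresh: float = 90.0,
                       night_thresh: float = 80.0):
    """Hypothetical helper: threshold counts on an hourly, local-time HI series."""
    # 1. Number of hours per day at or above the "high" threshold (assumed 90°F)
    hours_high_per_day = (hi >= high_thresh).resample("D").sum()

    # 2. Warm nights: the daily minimum HI (a proxy for nighttime) stays above 80°F
    daily = hi.resample("D").agg(["max", "min"])
    warm_nights = daily["min"] > night_thresh

    # 3. Number of days per month whose maximum HI reaches the "high" threshold
    high_days_per_month = (daily["max"] >= high_thresh).resample("MS").sum()
    return hours_high_per_day, warm_nights, high_days_per_month


# Toy data: three days of hourly values
idx = pd.date_range("2000-07-01", periods=72, freq="h")
vals = np.full(72, 85.0)   # day 1: 85°F all day
vals[24:48] = 95.0         # day 2: 95°F all day
vals[48:72] = 70.0         # day 3: 70°F ...
vals[60:64] = 92.0         # ... except four afternoon hours at 92°F
hours, warm, monthly = heat_index_metrics(pd.Series(vals, index=idx))
print(hours.tolist(), warm.tolist(), monthly.tolist())
```

With real model output you would apply the same logic per simulation and then compare historical and projected periods.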

**Runtime**: With the default settings, this notebook takes approximately **16 minutes** to run from start to finish. Modifications to selections may increase the runtime. 

## Step 0: Set-up

First, we'll import the Python library [climakitae](https://github.com/cal-adapt/climakitae), our AE toolkit for climate data analysis, along with the specific functions from that library that we'll use in this notebook and any other Python libraries needed for the analysis.

In [1]:
import climakitae as ck
import climakitaegui as ckg
import pandas as pd
import panel as pn
pn.extension()

from climakitae.core.data_interface import get_data

from climakitae.util.utils import (
    read_csv_file, get_closest_gridcell, compute_multimodel_stats, 
    trendline, summary_table, convert_to_local_time
)

## Step 1: Select data

### 1a) Grab location of interest by latitude and longitude
First, we'll select a specific location of interest using the latitude and longitude of a weather station; code to enter a custom latitude and longitude is also provided. Note that we will **not** retrieve the data that is bias-corrected to that station for this example. At present, bias-corrected station data on the Analytics Engine include only air temperature, and calculating the Heat Index also requires either dew point temperature (coming soon!) or relative humidity. For the time being, we will retrieve **non-bias-corrected** data at the location of interest.
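Dew point temperature and relative humidity are interchangeable inputs here because, together with air temperature, either one determines the other. A minimal sketch of that conversion using the Magnus approximation with one widely used set of coefficients (a = 17.625, b = 243.04 °C); `relative_humidity` is a hypothetical helper, not a climakitae function, and the Analytics Engine may use a different formula.

```python
import math

# Magnus-formula coefficients (one common parameterization); temperatures in °C
A = 17.625
B = 243.04  # °C


def relative_humidity(temp_c: float, dewpoint_c: float) -> float:
    """Approximate relative humidity (%) from air and dew point temperature (°C).

    RH is the ratio of saturation vapor pressure at the dew point to that at the
    air temperature; the Magnus prefactor cancels, leaving only the exponents.
    """
    gamma_td = A * dewpoint_c / (B + dewpoint_c)
    gamma_t = A * temp_c / (B + temp_c)
    return 100.0 * math.exp(gamma_td - gamma_t)


print(round(relative_humidity(30.0, 20.0), 1))
```

When the dew point equals the air temperature, the air is saturated and the function returns 100%.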

Note: For demonstration purposes, we are selecting only ten years of data. We recommend using a longer period (at least 30 years) for a scientific analysis.

In [2]:
# Select historical data
heatidx_hist_hour = get_data(
    variable="NOAA Heat Index",
    resolution="9 km",
    timescale="hourly",
    data_type="Gridded",
    area_subset="CA Electric Load Serving Entities (IOU & POU)",
    cached_area=["Pacific Gas & Electric Company"],
    time_slice=(1981, 1990),  # short period for quick demonstration
)

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!! Returned data array is huge. Operations could take 10x to infinity longer than 1GB of data !!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

-------
You have retrieved data for more than one SSP, but not all ensemble members for each GCM are available for all SSPs.

As a result, some scenario and simulation combinations may contain NaN values.

If you want to remove these empty simulations, it is recommended to first subset the data object by each individual scenario and then dropping NaN values.
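The warning above recommends subsetting by scenario and then dropping the empty simulations. A minimal xarray sketch of that pattern on toy data, assuming the array has dimensions named `scenario` and `simulation` (check `heatidx_hist_hour.dims` for the actual names); `split_by_scenario` and the scenario labels are hypothetical.

```python
import numpy as np
import xarray as xr


def split_by_scenario(da: xr.DataArray) -> dict:
    """Return {scenario: DataArray}, dropping simulations that are entirely NaN.

    Assumes dimensions named "scenario" and "simulation" (hypothetical here).
    """
    out = {}
    for scen in da["scenario"].values:
        one = da.sel(scenario=scen)
        # how="all": drop a simulation only if every value for it is NaN
        out[str(scen)] = one.dropna(dim="simulation", how="all")
    return out


# Toy example: 2 scenarios x 3 simulations x 4 time steps
data = np.random.default_rng(0).normal(80.0, 5.0, size=(2, 3, 4))
data[1, 2, :] = np.nan  # simulation "C" has no data for the second scenario
toy = xr.DataArray(
    data,
    dims=("scenario", "simulation", "time"),
    coords={"scenario": ["ssp245", "ssp370"],
            "simulation": ["A", "B", "C"],
            "time": np.arange(4)},
)
parts = split_by_scenario(toy)
print({k: [str(s) for s in v["simulation"].values] for k, v in parts.items()})
# {'ssp245': ['A', 'B', 'C'], 'ssp370': ['A', 'B']}
```

Using `how="all"` rather than `how="any"` keeps a simulation that is only partly missing, so genuine gaps are not silently discarded.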


In [None]:
#'location_subset': ['entire domain']
#'location_subset': ['coordinate selection']

We will first look at the Fresno Airport weather station.

In [None]:
from climakitae.core.paths import stations_csv_path
wx_stns = read_csv_file(stations_csv_path, index_col=[0])
wx_stns.head(5)

In [None]:
station_name = 'Fresno Yosemite International Airport (KFAT)'
one_stn = wx_stns.loc[wx_stns['station'] == station_name]

stn_lat = one_stn.LAT_Y.values[0]
stn_lon = one_stn.LON_X.values[0]
print(stn_lat, stn_lon)

If you would like to use your own latitude and longitude coordinates, customize the cell below with your own values. However, if your location is outside the cached area used above (here, the PG&E service territory), you'll also need to change the `cached_area` argument in the `get_data` call to a more appropriate region. You can browse the available options in the "Subset the data by" and "Location selection" dropdown menus of the climakitaegui selection panel (`selections = ckg.Select()`, then `selections.show()`).

In [None]:
# stn_lat = YOUR_LAT_HERE
# stn_lon = YOUR_LON_HERE
# If your point is outside the default cached area, re-run get_data() with
# cached_area=["YOUR CHOICE HERE"]

Because the dynamically downscaled WRF data in the Cal-Adapt: Analytics Engine are in UTC, we'll convert them to the time zone of the station we've selected. This is particularly important for determining the timing of the daily maximum and minimum temperatures. UTC is 8 hours ahead of Pacific Standard Time (7 hours during daylight saving time), so each UTC calendar day begins in the late afternoon of the previous local day, and a station's daily high and low can end up assigned to different days. The `convert_to_local_time` function corrects for this, ensuring that the resulting high and low temperatures fall within the same local daily timestamp.
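To see why the conversion matters, here is a small pandas illustration of the same idea. It is not the implementation of `convert_to_local_time`; it assumes an hourly series with a UTC-aware index and the `America/Los_Angeles` time zone, and `daily_extremes_local` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# A single model timestamp in UTC ...
ts_utc = pd.Timestamp("1985-07-02 01:00", tz="UTC")
# ... is still the evening of July 1 in Pacific Daylight Time (UTC-7)
ts_local = ts_utc.tz_convert("America/Los_Angeles")
print(ts_local)  # 1985-07-01 18:00:00-07:00


def daily_extremes_local(series_utc: pd.Series,
                         tz: str = "America/Los_Angeles") -> pd.DataFrame:
    """Daily max/min of an hourly UTC-indexed series, grouped by local calendar day."""
    local = series_utc.tz_convert(tz)  # relabel the index in local time
    return local.resample("D").agg(["max", "min"])


# 48 hourly values starting at midnight UTC on July 1
idx = pd.date_range("1985-07-01 00:00", periods=48, freq="h", tz="UTC")
hourly = pd.Series(np.arange(48, dtype=float), index=idx)
# Three local days: June 30 (partial), July 1 (full), July 2 (partial)
print(daily_extremes_local(hourly))
```

Grouping after the conversion means each daily high and low is drawn from the same local calendar day rather than from a UTC day that starts in the local afternoon.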

In [4]:
heatidx_hist_hour = convert_to_local_time(heatidx_hist_hour, (1981,1990))

TypeError: Boundaries.__init__() missing 1 required positional argument: 'boundary_catalog'

Now, we'll use the latitude and longitude values to retrieve the historical data from the grid cell closest to the station.

In [None]:
heatidx_hist_hour = get_closest_gridcell(heatidx_hist_hour, stn_lat, stn_lon)
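Conceptually, a closest-gridcell lookup finds the model cell whose center lies nearest the station. A minimal NumPy sketch of that idea on a toy grid, assuming 2-D latitude/longitude arrays of cell centers (as on a curvilinear model grid) and great-circle (haversine) distance; this is an illustration, not how `get_closest_gridcell` is necessarily implemented, and `nearest_gridcell_index` is hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def nearest_gridcell_index(lat2d, lon2d, lat, lon):
    """Return the (row, col) index of the cell center closest to (lat, lon).

    lat2d and lon2d are 2-D arrays of cell-center coordinates in degrees.
    """
    phi1, lam1 = np.radians(lat), np.radians(lon)
    phi2, lam2 = np.radians(lat2d), np.radians(lon2d)
    # Haversine great-circle distance from the point to every cell center
    a = (np.sin((phi2 - phi1) / 2) ** 2
         + np.cos(phi1) * np.cos(phi2) * np.sin((lam2 - lam1) / 2) ** 2)
    dist_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
    return np.unravel_index(np.argmin(dist_km), dist_km.shape)


# Toy 4 x 4 grid (illustrative coordinates only)
lats = np.linspace(36.0, 37.5, 4)
lons = np.linspace(-120.5, -119.0, 4)
lon2d, lat2d = np.meshgrid(lons, lats)
row, col = nearest_gridcell_index(lat2d, lon2d, 36.78, -119.72)
print(row, col, lat2d[row, col], lon2d[row, col])
```

Measuring distance on the sphere rather than in raw degrees avoids overweighting longitude differences, which shrink by a factor of about cos(latitude) at California's latitudes.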

Finally, we load the data into memory. Because this step loads and computes the entire dataset, the next code cell takes about **8 minutes**, so hang tight! You will also see two warnings; don't worry, they come from "no data" values outside the selected spatial extent.

In [None]:
heatidx_hist_hour = ck.load(heatidx_hist_hour)