# Extreme Heat Day and Warm Night Likelihood

This notebook briefly walks through how to calculate the extreme heat exposure metric `% of change in extreme heat day and warm night event likelihood` from Cal-Adapt: Analytics Engine data. This notebook may be expanded upon for inclusion in cae-notebooks in the future. 

### Step 0: Import libraries

In [30]:
import climakitae as ck
import pandas as pd
import numpy as np
import geopandas as gpd

from xclim.indices import warm_night_frequency, hot_spell_frequency # extreme heat day and warm night function
from climakitae.util.utils import convert_to_local_time

## AWS CREDENTIALS ARE NECESSARY WHEN WORKING THROUGH AE HUB FOR CRI WORK

### Step 1: Retrieve data

We need the 3 km spatial resolution data. In the panel displayed by `selections.show()`, make the following selections:
- Model grid spacing: 3km
- Timescale: hourly
- Variable: Air temperature
- Historical Data: Historical Climate
- Projections Data: SSP3-7.0

In [10]:
selections = ck.Select()

# set the selections programmatically instead of through the selections.show() panel
selections.area_average = 'No'
selections.timescale = 'hourly'
selections.variable = 'Air Temperature at 2m'
selections.area_subset = 'states'
selections.cached_area = ['CA']
selections.scenario_historical = ['Historical Climate']
selections.scenario_ssp = ['SSP3-7.0 -- Business as Usual']
selections.time_slice = (1980, 1982) # short slice (3 years)
selections.resolution = '3 km'
selections.units = 'degC'

ds = selections.retrieve()
# ds

Data transformation: PULL STEP


### Step 2: Subset data

In [28]:
def extreme_heat_ae_data_process(ds, varname):
    '''
    Reduces the size of the initial hourly raw temperature data in order to streamline compute time.
    Transforms the raw data into the following baseline metrics:
    * Warm night frequency
    * Extreme heat day frequency
    
    Methods
    -------
    Raw data are subsetted for bias-corrected dynamically-downscaled models.
    Model-subsetted data have num. of dimensions reduced (dropping any unnecessary dimensions).
    Metric is aggregated using xclim.indices functionality corresponding to the varname.
    
    Parameters
    ----------
    ds: xarray.Dataset
        Input data.
    varname: string
        Final metric name.
        
    Script
    ------
    cri_extreme_heat.ipynb
    
    Note
    ----
    Because the climate projections data is on the order of 2.4 TB in size, intermediary
    processed files are not produced for each stage of the metric calculation. All processing
    occurs in a single complete run in the notebook listed above.
    '''
    
    # subset for bias-corrected models
    data_models = ['WRF_EC-Earth3_r1i1p1f1', 'WRF_MPI-ESM1-2-HR_r3i1p1f1','WRF_TaiESM1_r1i1p1f1', 'WRF_MIROC6_r1i1p1f1']
    ds = ds.sel(simulation = data_models)
    print("Data transformation: dynamically-downscaled climate data subsetted for a-priori bias-corrected models.") ## metadata transformation
    ds = ds.squeeze()
    print("Data transformation: drop all singleton dimensions (scenario).") ## metadata transformation
    
    if varname == "warm_night_freq":
        # warm_night_frequency requires daily minimum temperature, threshold default is 22degC, freq annual
        tas_min = ds.resample(time='1D').min()
        ds_processed = warm_night_frequency(tas_min) # using all other default options
        print("Data transformation: daily minimum calculated from hourly data for input into xclim.indices.warm_night_frequency.") # metadata transformation
        
    elif varname == "hot_spell_freq":
        # hot_spell_frequency requires daily max temperature, threshold default is 30degC, window 3 days, freq annual
        tas_max = ds.resample(time='1D').max()
        ds_processed = hot_spell_frequency(tas_max) # using all other default options
        print("Data transformation: daily maximum calculated from hourly data for input into xclim.indices.hot_spell_frequency.") # metadata transformation

    else:
        # fail early with a clear message instead of returning an undefined ds_processed
        raise ValueError(f"Unknown varname '{varname}'; expected 'warm_night_freq' or 'hot_spell_freq'.")

    return ds_processed

In [29]:
VARNAME = "warm_night_freq" # varname options are warm_night_freq, hot_spell_freq; reused in the steps below
processed_ds = extreme_heat_ae_data_process(ds, VARNAME)

Data transformation: dynamically-downscaled climate data subsetted for a-priori bias-corrected models.
Data transformation: drop all singleton dimensions (scenario).
Data transformation: daily minimum calculated from hourly data for input into xclim.indices.warm_night_frequency.
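
To make the two indices concrete, the sketch below reproduces their logic in plain Python on a made-up series of hourly temperatures. It is an illustration only, not the xclim implementation: the helper names (`daily_extremes`, `count_warm_nights`, `count_hot_spells`) are hypothetical, the thresholds mirror the defaults noted in the code comments above (22 degC, 30 degC, 3-day window), and strict exceedance (`>`) is an assumption.

```python
from itertools import groupby

def daily_extremes(hourly, hours_per_day=24):
    """Split a flat list of hourly values into days; return (daily mins, daily maxs)."""
    days = [hourly[i:i + hours_per_day] for i in range(0, len(hourly), hours_per_day)]
    return [min(d) for d in days], [max(d) for d in days]

def count_warm_nights(daily_min, thresh=22.0):
    """Number of days whose minimum temperature exceeds the threshold."""
    return sum(1 for t in daily_min if t > thresh)

def count_hot_spells(daily_max, thresh=30.0, window=3):
    """Number of runs of at least `window` consecutive days above the threshold."""
    hot = [t > thresh for t in daily_max]
    return sum(1 for is_hot, run in groupby(hot) if is_hot and len(list(run)) >= window)

# Six made-up days: 12 hours at a "night" value, then 12 hours at a "day" value (degC)
night = [18, 23, 24, 21, 25, 19]
day = [29, 31, 33, 32, 28, 35]
hourly = [v for n, d in zip(night, day) for v in [n] * 12 + [d] * 12]

daily_min, daily_max = daily_extremes(hourly)
warm_nights = count_warm_nights(daily_min)  # nights at 23, 24 and 25 degC qualify
hot_spells = count_hot_spells(daily_max)    # one qualifying run: 31, 33, 32
print(warm_nights, hot_spells)
```

The last day (35 degC) is hot but isolated, so it does not count as a spell under a 3-day window.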


### Step 3: Reproject data to census tract projection

In [37]:
# read in CA census tiger file -- not working from s3 link, uploading manually to keep testing
# census_shp_dir = "s3://ca-climate-index/0_map_data/2021_tiger_census_tract/2021_ca_tract/"
census_shp_dir = "tl_2021_06_tract.shp"
ca_boundaries = gpd.read_file(census_shp_dir)

# need to rename columns so we don't have any duplicates in the final geodatabase
# build the mapping per column so old and new names cannot be misaligned around "geometry"
new_column_names = {column: "USCB_"+column for column in ca_boundaries.columns if column != "geometry"}
ca_boundaries = ca_boundaries.rename(columns=new_column_names)

In [None]:
import rioxarray # registers the .rio accessor on xarray objects (assumed to be installed)

def reproject_ae_data(ds, ca_boundaries, varname, additional_comments='Data not exported to s3 bucket.'):
    '''
    Given a shapefile with California Census Tracts:
    (1) Reproject the input data to the CRS of the California Census Tracts,
    (2) Clip to California Census Tracts,
    (3) Return reprojected dataset for metric calculation.
    
    Parameters
    ----------
    ds: xarray.Dataset
        Input data.
    ca_boundaries: shapefile
        CA census tract boundary shapefile.
    varname: string
        Final metric name.
    additional_comments: string
        Additional notes for processing.
        
    Script
    ------
    cri_extreme_heat.ipynb
    
    Note
    ----
    Because the climate projections data is on the order of 2.4 TB in size, intermediary
    processed files are not produced for each stage of the metric calculation. All processing
    occurs in a single complete run in the notebook listed above.
    '''
    
    # identify CRS of data
    orig_crs = ds.rio.crs ## still a placeholder: assumes the CRS can be read from the grid mapping metadata (None otherwise)
    print(f"Original CRS of data for {varname}: {orig_crs}")
    # check current coordinate system of the census tract data
    print(f"CRS of Census Tracts Shapefile: {ca_boundaries.crs}")
    
    if (orig_crs==ca_boundaries.crs):   
        print(f"Do not need to reproject {varname} since it is already in the same projection as the Census Tracts Shapefile.")
        reprojected_ds = ds
    else:       
        # geopandas to_crs() only applies to vector data, so the gridded data is reprojected with rioxarray instead
        reprojected_ds = ds.rio.reproject(ca_boundaries.crs) ## This is the big step here
        print(f"{varname} reprojected from {orig_crs} to {reprojected_ds.rio.crs} with rioxarray rio.reproject() function.")

    print(f"Additional comments: {additional_comments}.") # eg, code rerun, bug fix, etc
    
    return reprojected_ds

In [None]:
reprojected_ds = reproject_ae_data(processed_ds, ca_boundaries, varname=VARNAME)

### Step 4: Calculate metric

In [40]:
def clip_ae_data_to_tracts(ds):
    '''
    Ultimately need to aggregate gridded data to a single value per census tract
    Input: gridded data (3km)
    Returns: csv file one value per census tract
    '''
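
One way to picture the aggregation this stub describes is to average the grid cells whose centers fall inside each tract. The sketch below does this in plain Python with a ray-casting point-in-polygon test on made-up coordinates. All names (`point_in_polygon`, `aggregate_to_tracts`, the tract ids) are hypothetical. In the notebook itself a geospatial library would more likely do this work against the real tract geometries, and averaging cell centers is only one possible aggregation rule.

```python
from statistics import mean

def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon given as [(x0, y0), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def aggregate_to_tracts(cells, tracts):
    """Average the values of grid cells whose centers fall inside each tract.

    cells: list of (x, y, value); tracts: dict of tract id -> polygon vertices.
    Tracts that contain no cell center get None.
    """
    result = {}
    for tract_id, polygon in tracts.items():
        values = [v for x, y, v in cells if point_in_polygon(x, y, polygon)]
        result[tract_id] = mean(values) if values else None
    return result

# Made-up 2x2 grid of cell centers and three rectangular "tracts"
cells = [(0.5, 0.5, 10.0), (1.5, 0.5, 20.0), (0.5, 1.5, 30.0), (1.5, 1.5, 40.0)]
tracts = {
    "tract_A": [(0, 0), (1, 0), (1, 2), (0, 2)],  # covers the left column of cells
    "tract_B": [(1, 0), (2, 0), (2, 2), (1, 2)],  # covers the right column of cells
    "tract_C": [(5, 5), (6, 5), (6, 6), (5, 6)],  # contains no cell center
}
tract_means = aggregate_to_tracts(cells, tracts)
print(tract_means)
```

The `None` case matters in practice: a tract smaller than a 3 km grid cell may contain no cell center, so a real implementation needs a fallback rule such as using the nearest cell.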

In [None]:
# % of change in extreme heat day and warm night event likelihood
def extreme_heat_ae_data_metric_calc(ds, varname):
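    '''
    Calculates the % change in extreme heat day and warm night event likelihood
    between a historical baseline period and a future period.

    Methods
    -------
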
    Parameters
    ----------
    ds: xarray.Dataset
        Input data.
    varname: string
        Final metric name.
        
    Script
    ------
    cri_extreme_heat.ipynb
    
    Note
    ----
    Because the climate projections data is on the order of 2.4 TB in size, intermediary
    processed files are not produced for each stage of the metric calculation. All processing
    occurs in a single complete run in the notebook listed above.
    '''
        
    # historical baseline period (e.g., 1980-2010; exact years still to be decided)
    
    print("Data transformation: ") # metadata transformation

    
    # future period (WL?)
    print("Data transformation: ") # metadata transformation

    
    # calculate % change of likelihood
    print("Data transformation: % change of likelihood calculated.") # metadata transformation
    
    
    ## EXPORT STAGE
    # export two files here per metric
    # some sort of helper file across AE notebooks would be good
    ### Idea is to copy all functions over from this notebook into a single script to run just the metadata pieces?
    
    

    
    # Done.

In [None]:
extreme_heat_ae_data_metric_calc(reprojected_ds, VARNAME)
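
The function above is still a skeleton, so the arithmetic of the final step is sketched here in plain Python. This assumes the metric is the percent change in mean annual event count between the historical baseline and the future period; the notebook does not yet fix that definition. The helper `percent_change` and all numbers are hypothetical.

```python
def percent_change(baseline, future):
    """Percent change from baseline to future; None when the baseline is zero."""
    if baseline == 0:
        return None  # undefined: there are no baseline events to compare against
    return (future - baseline) / baseline * 100.0

# Made-up mean annual event counts per tract for the two periods
hist_events = {"tract_A": 4.0, "tract_B": 10.0, "tract_C": 0.0}
future_events = {"tract_A": 6.0, "tract_B": 25.0, "tract_C": 3.0}

change = {t: percent_change(hist_events[t], future_events[t]) for t in hist_events}
print(change)
```

A zero baseline makes the ratio undefined, so this sketch returns `None` for it; how such tracts should be treated in the final metric is a design decision left open here.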

In [None]:
## CLOSE DATA TO SAVE MEMORY
ds.close()
processed_ds.close()
reprojected_ds.close()