# Cal-CRAI Index: Climate Hazard Calculations

**Order of operations**:
1) Metric handling \
   1a - Retrieve data \
   1b - Min-max standardization \
   1c - Set hazard risk orientation (positive for when a larger value represents greater vulnerability, negative for when a larger value corresponds to decreased vulnerability)

2) Calculate indicators \
   2a - Isolate exposure and loss columns for all climate risk scenarios \
   2b - Isolate exposure and loss for each individual climate risk scenario \
   2c - Merge the all-scenario indicator columns with the individual climate risk indicator columns
   
3) Calculate hazard score \
   3a - Exposure * Loss columns \
   3b - Outlier Handling \
   3c - Min-max standardize the hazard score columns
   
4) Mask out inland counties for Sea Level Rise (SLR) Hazard Column \
   4a - Merge with SLR masking data \
   4b - Any tract not 'SLR impacted' is changed to NaN
   
5) Finalize Hazard Score

6) Visualize, save, and export Climate Hazard Score dataframe

In [15]:
import pandas as pd
import os
import sys
import numpy as np

# suppress pandas purely educational warnings
from warnings import simplefilter
simplefilter(action="ignore", category=pd.errors.PerformanceWarning)

sys.path.append(os.path.expanduser('../../'))
from scripts.utils.file_helpers import pull_csv_from_directory, upload_csv_aws, delete_items
from scripts.utils.cal_crai_plotting import plot_hazard_score, plot_region_domain # type: ignore
from scripts.utils.cal_crai_calculations import (handle_outliers, min_max_standardize, process_domain_csv_files,  # type: ignore
                                        indicator_dicts, add_census_tracts, domain_summary_stats, compute_summed_climate_indicators)

## Metric Handling
### 1a) Retrieve metric files and process

In [None]:
# set-up
bucket_name = 'ca-climate-index'
aws_dir = '3_fair_data/index_data/'

pull_csv_from_directory(bucket_name, aws_dir, output_folder='aws_csvs', search_zipped=False, print_name=False)

Process and merge climate hazard metric files together

In [None]:
# domain-specific
domain_prefix = 'climate_'
input_folder = r'aws_csvs'
output_folder = domain_prefix + "folder"
meta_csv = r'../utils/calcrai_metrics.csv'
merged_output_file = f'concatenate_{domain_prefix}metrics.csv'

metric_vulnerable_resilient_dict = process_domain_csv_files(domain_prefix, input_folder, output_folder, meta_csv, merged_output_file)

Now, take a look at the merged, single csv file

In [None]:
# read-in and view processed data
pd.set_option('display.max_columns', None)
cleaned_climate_df = pd.read_csv(merged_output_file)
cleaned_climate_df

Take a look at the resulting dictionary. We will use this later to set the orientation of certain metrics!

In [None]:
metric_vulnerable_resilient_dict

### 1b) Min-max standardization
Metrics are min-max standardized on a 0.01 to 0.99 scale.

In [None]:
# standardizing our df
columns_to_process = [col for col in cleaned_climate_df.columns if col != 'GEOID']
min_max_metrics = min_max_standardize(cleaned_climate_df, columns_to_process)
min_max_metrics
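
The rescaling described above can be sketched as follows. This is a minimal illustration, not the project's `min_max_standardize` (whose implementation lives in `scripts.utils` and is not shown here): the helper name `min_max_rescale`, its `low`/`high` parameters, and the sample values are assumptions made for the example.

```python
import pandas as pd

def min_max_rescale(series: pd.Series, low: float = 0.01, high: float = 0.99) -> pd.Series:
    """Hypothetical helper: linearly map the series minimum to `low` and maximum to `high`."""
    col_min, col_max = series.min(), series.max()
    if col_max == col_min:
        # A constant column has no spread, so return NaN rather than divide by zero
        return pd.Series(float("nan"), index=series.index)
    scaled = (series - col_min) / (col_max - col_min)  # plain 0-1 min-max
    return low + scaled * (high - low)                 # squeeze into [low, high]

# Made-up values for a single metric
example = pd.DataFrame({"GEOID": [1, 2, 3], "median_heat_warning_days": [0.0, 5.0, 10.0]})
example["median_heat_warning_days_min_max_standardized"] = min_max_rescale(
    example["median_heat_warning_days"]
)
```

Keeping the range slightly inside 0 and 1 means no tract is assigned an exact zero, which matters later when indicator values are multiplied.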

Isolate for GEOID and standardized columns exclusively

In [None]:
words = ['GEOID','_standardized']
selected_columns = []
for word in words:
    selected_columns.extend(min_max_metrics.columns[min_max_metrics.columns.str.endswith(word)].tolist())
min_max_standardized_climate_metrics_df = min_max_metrics[selected_columns]
min_max_standardized_climate_metrics_df.head()

### 1c) Set hazard risk orientation
* High values indicate resiliency to a climate hazard
* Low values indicate vulnerability to a climate hazard

For the climate domain, all metrics represent a community's vulnerability to climate hazards rather than its resilience. For example, 'median_heat_warning_days' represents a community's vulnerability to extreme heat: the higher the number, the more vulnerable the community. We therefore identify these 'vulnerable' metrics (in this case, all climate metrics) with our `metric_vulnerable_resilient_dict` dictionary and subtract their values from 1 so that high values indicate resiliency.

In [None]:
# Access the vulnerable column names from the dictionary
vulnerable_columns = metric_vulnerable_resilient_dict['vulnerable']

# Identify columns in the DataFrame that contain any of the vulnerable column names as substrings
vulnerable_columns_in_df = [col for col in min_max_standardized_climate_metrics_df.columns 
                           if any(vulnerable_col in col for vulnerable_col in vulnerable_columns)]

# Create a new DataFrame with the adjusted vulnerable columns
adjusted_vulnerable_df = min_max_standardized_climate_metrics_df.copy()

# Subtract the standardized vulnerable columns from one and store the result in the new DataFrame
adjusted_vulnerable_df.loc[:, vulnerable_columns_in_df] = (
    1 - adjusted_vulnerable_df.loc[:, vulnerable_columns_in_df]
)
adjusted_vulnerable_df.head()
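
To see what the flip does to individual values, here is a small self-contained sketch. The GEOIDs, metric values, and the `vulnerable_metrics` list are made up for illustration and stand in for the real dictionary entries.

```python
import pandas as pd

# Made-up standardized values for two tracts
standardized = pd.DataFrame({
    "GEOID": ["06001400100", "06001400200"],
    "median_heat_warning_days_min_max_standardized": [0.90, 0.20],
})
vulnerable_metrics = ["median_heat_warning_days"]  # hypothetical dictionary entry

# Match columns by substring, as in the cell above
flip_cols = [col for col in standardized.columns
             if any(name in col for name in vulnerable_metrics)]

oriented = standardized.copy()
oriented[flip_cols] = 1 - oriented[flip_cols]  # high vulnerability becomes low resiliency
```

The tract with many heat warning days (0.90) ends up with a low value (about 0.10), and the tract with few (0.20) ends up high (about 0.80).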

## Step 2: Calculate Indicators
Loop through the df columns and sum the metrics that belong to each indicator, based on the metric-to-indicator dictionary

For the climate domain, metrics are split between 'exposure' and 'loss'

In [None]:
domain_prefix[:-1]

### 2a) Isolate exposure and loss columns for all climate risk scenarios

In [None]:
summed_indicators_climate_systems = compute_summed_climate_indicators(
    adjusted_vulnerable_df, 
    indicator_dicts(domain_prefix[:-1]), print_summary=True
)

# show resulting dataframe to highlight the indicator values
summed_indicators_climate_systems = summed_indicators_climate_systems.rename(columns={'exposure':'all_domain_exposure', 'loss':'all_domain_loss'})
summed_indicators_climate_systems

### 2b) Isolate exposure and loss for each individual climate risk scenario
* create dictionary that separates metric columns by the five climate risks
* create another dictionary that further separates metric columns by exposure or loss
* data are then summed by climate risk and indicator type

In [25]:
standardized_climate_metrics = adjusted_vulnerable_df.copy()

# Remove '_min_max_standardized' suffix from column names
standardized_climate_metrics.columns = adjusted_vulnerable_df.columns.str.replace('_min_max_standardized', '', regex=False)

# Climate risk dictionary to group columns
climate_risk_mapping = {
    'drought': [
        'drought_coverage_percentage',
        'drought_crop_loss_acres',
        'drought_crop_loss_indemnity_amount',
        'change_in_drought_years',
        'percent_weeks_drought'
    ],
    'extreme_heat': [
        'mean_change_annual_heat_days',
        'mean_change_annual_warm_nights',
        'mean_change_cold_days',
        'heat_crop_loss_acres',
        'heat_crop_loss_indemnity_amount',
        'avg_age_adjust_heat_hospitalizations_per_10000',
        'median_heat_warning_days'
    ],
    'inland_flooding': [
        'floodplain_percentage',
        'avg_flood_insurance_payout_per_claim',
        'estimated_flood_crop_loss_cost',
        'precip_99percentile',
        'surface_runoff',
        'total_flood_fatalities',
        'median_flood_warning_days'
    ],
    'sea_level_rise': [
        'slr_vulnerable_building_content_cost',
        'building_exposed_slr_count',
        'slr_vulnerability_delta_percentage_change',
        'slr_vulnerable_wastewater_treatment_count',
        'rcp_4.5__50th_percent_change',
        'fire_stations_count_diff',
        'hospitals_count_diff',
        'police_stations_count_diff',
        'schools_count_diff'
    ],
    'wildfire': [
        'burn_area_m2',
        'change_ffwi_days',
        'average_damaged_destroyed_structures_wildfire',
        'average_annual_fatalities_wildfire',
        'median_red_flag_warning_days'
    ]
}

metric_to_indicator_climate_dict = {
                "exposure" :   ['drought_coverage_percentage',
                                'change_in_drought_years',
                                'percent_weeks_drought',
                                'precip_99percentile',
                                'surface_runoff',
                                'floodplain_percentage',
                                'median_flood_warning_days',
                                'mean_change_annual_heat_days',
                                'mean_change_annual_warm_nights',
                                'median_heat_warning_days',
                                'slr_vulnerability_delta_percentage_change',
                                'fire_stations_count_diff',
                                'police_stations_count_diff',
                                'schools_count_diff',
                                'hospitals_count_diff',
                                'slr_vulnerable_wastewater_treatment_count',
                                'building_exposed_slr_count',
                                'slr_vulnerable_building_content_cost',
                                'change_ffwi_days',
                                'median_red_flag_warning_days'
                ],
                "loss"  :  ['drought_crop_loss_acres',
                            'drought_crop_loss_indemnity_amount',
                            'avg_flood_insurance_payout_per_claim',
                            'estimated_flood_crop_loss_cost',
                            'total_flood_fatalities',
                            'mean_change_cold_days',
                            'heat_crop_loss_acres',
                            'heat_crop_loss_indemnity_amount',
                            'avg_age_adjust_heat_hospitalizations_per_10000',
                            'rcp_4.5__50th_percent_change',
                            'burn_area_m2',
                            'average_damaged_destroyed_structures_wildfire',
                            'average_annual_fatalities_wildfire'
]}

# Step 2: Group and sum the columns by climate risk and metric type
# Initialize an empty DataFrame to hold the summed data
climate_sums_df = pd.DataFrame()
climate_sums_df['GEOID'] = standardized_climate_metrics['GEOID']

# Loop over each climate risk and categorize by exposure/loss
for risk, columns in climate_risk_mapping.items():
    # Separate columns by 'exposure' and 'loss'
    exposure_columns = [col for col in columns if col in metric_to_indicator_climate_dict["exposure"]]
    loss_columns = [col for col in columns if col in metric_to_indicator_climate_dict["loss"]]
    
    # Sum the values for each category and add to the dataframe
    climate_sums_df[f'{risk}_exposure'] = standardized_climate_metrics[exposure_columns].sum(axis=1)
    climate_sums_df[f'{risk}_loss'] = standardized_climate_metrics[loss_columns].sum(axis=1)
    
for risk in climate_risk_mapping.keys():
    # Calculate product of exposure and loss for each climate risk
    # If loss indicator is zero, keep the exposure value instead of multiplying
    climate_sums_df[f'{risk}_hazard_score'] = np.where(
        climate_sums_df[f'{risk}_loss'] == 0,
        climate_sums_df[f'{risk}_exposure'],
        climate_sums_df[f'{risk}_exposure'] * climate_sums_df[f'{risk}_loss']
    )
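
The fallback in the `np.where` call above is easiest to follow on a few rows. The numbers below are invented for illustration; only the column naming pattern comes from this notebook.

```python
import numpy as np
import pandas as pd

# Invented summed indicator values for three tracts
sums = pd.DataFrame({
    "wildfire_exposure": [1.2, 0.8, 0.5],
    "wildfire_loss": [0.5, 0.0, 2.0],
})

# Multiply exposure by loss, but keep exposure alone where loss is exactly zero
sums["wildfire_hazard_score"] = np.where(
    sums["wildfire_loss"] == 0,
    sums["wildfire_exposure"],
    sums["wildfire_exposure"] * sums["wildfire_loss"],
)
```

Without the fallback, the second tract would score zero despite having non-zero exposure.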

In [None]:
climate_domain_exposure_loss = climate_sums_df.copy()

# Define the list of columns to exclude
exclude_columns = ['drought_hazard_score', 'extreme_heat_hazard_score', 
                   'inland_flooding_hazard_score', 'sea_level_rise_hazard_score', 
                   'wildfire_hazard_score']

# Drop these columns from the DataFrame
climate_domain_exposure_loss = climate_domain_exposure_loss.drop(columns=exclude_columns, errors='ignore')
climate_domain_exposure_loss.head()

### 2c) Merge the all climate risk indicator columns with the individual climate risk indicators columns

In [None]:
# Merge the per-scenario exposure and loss sums with `summed_indicators_climate_systems`
climate_exposure_loss_values = pd.merge(summed_indicators_climate_systems, climate_domain_exposure_loss, on='GEOID', how='left')
climate_exposure_loss_values

Save Indicator dataframe as a csv

In [28]:
# set-up file for export
indicator_filename = '{}domain_indicators.csv'.format(domain_prefix)
climate_exposure_loss_values.to_csv(indicator_filename, index=False)

## Step 3: Calculate Hazard Score

### 3a) Calculate the hazard score
Hazard score is: exposure * loss columns

In [None]:
# Define the list of columns to keep
keep_columns = ['GEOID', 'drought_hazard_score', 'extreme_heat_hazard_score', 
                   'inland_flooding_hazard_score', 'sea_level_rise_hazard_score', 
                   'wildfire_hazard_score']

# Keep only GEOID and the per-scenario hazard score columns
climate_hazard_scores_scenarios = climate_sums_df[keep_columns].copy()
climate_hazard_scores_scenarios

In [30]:
summed_indicators_climate_systems['hazard_score'] = summed_indicators_climate_systems['all_domain_exposure'] * summed_indicators_climate_systems['all_domain_loss']

In [None]:
climate_hazard_scores_cleaned = pd.merge(summed_indicators_climate_systems, climate_hazard_scores_scenarios, on='GEOID', how='left')
climate_hazard_scores_cleaned = climate_hazard_scores_cleaned.drop(columns=['all_domain_exposure', 'all_domain_loss'])
climate_hazard_scores_cleaned

### 3b) Outlier Handling
* set fencing for each hazard score at 25th and 75th percentiles
* reset values that exceed the fence to nearest fence value

In [None]:
climate_hazard_scores_outlier_handle = handle_outliers(climate_hazard_scores_cleaned, domain_prefix='climate', summary_stats=True)
climate_hazard_scores_outlier_handle
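
The fencing idea can be sketched with `Series.clip`. This is an assumption-laden illustration, not the project's `handle_outliers`: the helper name `clip_to_fences` and the multiplier `k` are hypothetical, and the sketch assumes Tukey-style fences built from the 25th and 75th percentiles. Setting `k=0` places the fences exactly at those percentiles.

```python
import pandas as pd

def clip_to_fences(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: clip values to fences derived from the 25th/75th percentiles."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Values beyond a fence are reset to the nearest fence value
    return series.clip(lower=lower, upper=upper)

# Invented hazard scores with one extreme value
scores = pd.Series([0.1, 0.2, 0.3, 0.4, 5.0])
clipped = clip_to_fences(scores)
```

Here the percentiles are 0.2 and 0.4, so with `k=1.5` the upper fence is 0.7 and the extreme value 5.0 is pulled down to it, while the other four values are unchanged.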

### 3c) Min-max standardize the product columns

In [None]:
columns_to_process = ['hazard_score'
                      ,'drought_hazard_score'
                      ,'extreme_heat_hazard_score'
                      ,'inland_flooding_hazard_score'
                      ,'sea_level_rise_hazard_score'
                      ,'wildfire_hazard_score']

min_max_domain = min_max_standardize(climate_hazard_scores_outlier_handle, columns_to_process)
min_max_domain

Isolate the census tract (GEOID) and standardized hazard score columns
* the leading zero needed to match census tract GEOIDs is added later, in Step 5

In [None]:
keep_columns = ['GEOID', 
                'hazard_score_min_max_standardized'
                ,'drought_hazard_score_min_max_standardized'
                ,'extreme_heat_hazard_score_min_max_standardized'
                ,'inland_flooding_hazard_score_min_max_standardized'
                ,'sea_level_rise_hazard_score_min_max_standardized'
                ,'wildfire_hazard_score_min_max_standardized'
]

climate_hazard_scores = min_max_domain[keep_columns].copy()

# Rename columns by removing '_min_max_standardized' suffix
climate_hazard_scores.columns = climate_hazard_scores.columns.str.replace('_min_max_standardized', '', regex=False)
climate_hazard_scores

## Step 4) Mask out inland counties for Sea Level Rise (SLR) Hazard Column

In [None]:
slr_mask_data = '../utils/slr_mask_layer.csv'
slr_mask = pd.read_csv(slr_mask_data)
slr_mask = slr_mask.drop(columns=['county', 'geometry', 'COUNTYFP'])
slr_mask.head()

### 4a) Merge with SLR masking data

In [None]:
climate_hazard_scores['GEOID'] = climate_hazard_scores['GEOID'].astype(str)
slr_mask['GEOID'] = slr_mask['GEOID'].astype(str)

climate_hazard_scores_slr_masked = pd.merge(climate_hazard_scores, slr_mask, on='GEOID', how='left')
climate_hazard_scores_slr_masked

### 4b) Any tract not 'SLR impacted' is changed to NaN

In [None]:
climate_hazard_scores_slr_masked.loc[climate_hazard_scores_slr_masked['slr_impacted'] == 0, 'sea_level_rise_hazard_score'] = np.nan
climate_hazard_scores_slr_masked = climate_hazard_scores_slr_masked.drop(columns='slr_impacted')
climate_hazard_scores_slr_masked.head()
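
Steps 4a and 4b together can be illustrated on two invented tracts. The GEOIDs and scores below are made up; the `slr_impacted` flag is assumed to be 1 for impacted tracts and 0 otherwise, as the comparison in the cell above implies.

```python
import numpy as np
import pandas as pd

# Invented scores and mask for one coastal and one inland tract
scores = pd.DataFrame({
    "GEOID": ["6001400100", "6019000100"],
    "sea_level_rise_hazard_score": [0.7, 0.3],
})
mask = pd.DataFrame({
    "GEOID": ["6001400100", "6019000100"],
    "slr_impacted": [1, 0],
})

# 4a: attach the mask flag to each tract
merged = scores.merge(mask, on="GEOID", how="left")

# 4b: blank out the SLR score wherever the tract is not SLR impacted
merged.loc[merged["slr_impacted"] == 0, "sea_level_rise_hazard_score"] = np.nan
merged = merged.drop(columns="slr_impacted")
```

Using NaN rather than zero keeps inland tracts out of any later averages of the sea level rise score instead of counting them as zero hazard.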

## Step 5) Finalize Hazard Score
* Add a leading zero to the GEOID column

In [38]:
climate_hazard_scores_final = climate_hazard_scores_slr_masked.copy()

# GEOID handling
climate_hazard_scores_final['GEOID'] = climate_hazard_scores_final['GEOID'].apply(lambda x: '0' + str(x))
climate_hazard_scores_final['GEOID'] = climate_hazard_scores_final['GEOID'].astype(str).apply(lambda x: x.rstrip('0').rstrip('.') if '.' in x else x)
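
As an alternative sketch (not what the cell above does), zero-padding to a fixed width with `str.zfill` gives the same result for integer GEOIDs and is safe to run more than once. It assumes tract GEOIDs should be 11 characters long; the sample values are invented.

```python
import pandas as pd

# Invented GEOIDs that lost their leading zero when read as integers
geoids = pd.Series([6001400100, 6037101110])

# Pad on the left with zeros up to 11 characters
padded = geoids.astype(str).str.zfill(11)
```

Unlike prepending `'0'` unconditionally, `zfill` leaves already-padded values unchanged, though it would not strip a trailing `.0` from GEOIDs that were read as floats.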

In [None]:
climate_hazard_scores_final

## Step 6) Visualize, save, and export Climate Hazard Score dataframe

Let's look at some summary statistics for this domain:

In [None]:
domain_summary_stats(climate_hazard_scores_final, 'hazard_score')

Map the hazard scores for all climate risk scenarios
* these are the denominators that go into each weighted scenario
* values are subtracted from 1 so that high values indicate high hazard

In [None]:
# Copy the dataset
flipped_climate_scenarios = climate_hazard_scores_final.copy()

# List of climate domain columns to process
climate_domain_columns = [
    'hazard_score',
    'drought_hazard_score',
    'extreme_heat_hazard_score',
    'wildfire_hazard_score',
    'sea_level_rise_hazard_score',
    'inland_flooding_hazard_score'
]

# Process each column in the list
for column in climate_domain_columns:
    # Subtract the column values from 1
    flipped_climate_scenarios[column] = 1 - flipped_climate_scenarios[column]
    
    # Get domain name for plotting
    if column == 'hazard_score':
        domain_name = 'All Climate Scenarios'
    else:
        domain_name = column.split('_hazard_score')[0]  # Extract everything before '_hazard_score'
        domain_name = domain_name.replace('_', ' ').title()
    
    # Call the plotting function
    plot_hazard_score(flipped_climate_scenarios, column_to_plot=column, domain=domain_name)

## Step 7) Export the final domain csv file

In [42]:
# set-up file for export
climate_hazard_scores_filename = 'climate_hazard_scores.csv'
climate_hazard_scores_final.to_csv(climate_hazard_scores_filename, index=False)

Upload the indicator and hazard score csv files to AWS

In [None]:
'''# upload to aws bucket
bucket_name = 'ca-climate-index'
directory = '3_fair_data/index_data'

files_upload = indicator_filename, climate_hazard_scores_filename

for file in files_upload:
    upload_csv_aws([file], bucket_name, directory)'''

## Delete desired csv files
* by default, all files and folders generated by this notebook

In [None]:
folders_to_delete = ["aws_csvs", "climate_folder"]
csv_files_to_delete = ["concatenate_climate_metrics.csv", "climate_hazard_scores.csv",
                       "climate_domain_indicators.csv"]

delete_items(folders_to_delete, csv_files_to_delete)