# Billion Dollar Meteorological Disasters in the United States

## Introduction

**Business Context.** While natural *events* often cannot be avoided, the risks they present can be managed, either by mitigation, avoidance, or insurance, in order to prevent them from becoming natural *disasters*. The consultancy firm you work for has been hired by an independent advocacy group that wants to conduct an analysis of the US emergency management system of preparedness, protection, mitigation, response, and recovery, with the purpose of proposing legislative reforms to make it more effective and financially efficient. Their ultimate goal is to help increase the government's ability to prevent disasters from happening and reduce the negative impact of those that cannot be completely avoided.

**Business Problem.** The client would like to know which storm event types are more likely to become disasters, and in which locations, as measured by the number of deaths, injuries, and economic damage they cause. Additionally, they would like to conduct a preliminary assessment of whether the [Post-Katrina Emergency Management Reform Act of 2006](https://www.congress.gov/bill/109th-congress/senate-bill/3721) had any impact on the severity of the disasters that occurred after the bill was signed. This Act centralized the US emergency management under the coordination of the Federal Emergency Management Agency (FEMA) as a response to the enormous human and material losses that were caused by Hurricane Katrina in August 2005.

**Analytical Context.** The dataset is a compressed GZIP file of storm events from 1970 to 2020 as recorded by the US [National Oceanic and Atmospheric Administration](https://www.ncdc.noaa.gov/stormevents/ftp.jsp). You can check the [documentation](https://www1.ncdc.noaa.gov/pub/data/swdi/stormevents/csvfiles/Storm-Data-Bulk-csv-Format.pdf) for more information.

In [1]:
# Importing relevant libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import folium
import pingouin as pg
from folium.plugins import HeatMap
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import numpy as np
from scipy.stats import chi2_contingency

## Loading in the dataset

In [2]:
#adding the `parse_dates` argument to tell pandas which columns should be interpreted as dates.
df = pd.read_csv("data/dataset.csv.gz", parse_dates=["BEGIN_DATE_TIME", "END_DATE_TIME"])
df.head()

Unnamed: 0,EPISODE_ID,EVENT_ID,STATE,EVENT_TYPE,BEGIN_DATE_TIME,BEGIN_YEAR,CZ_TIMEZONE,END_DATE_TIME,TOR_F_SCALE,BEGIN_LOCATION,END_LOCATION,BEGIN_LAT,BEGIN_LON,END_LAT,END_LON,TOTAL_DEATHS,TOTAL_INJURIES,TOTAL_DAMAGE_DEFLATED
0,,9987739,COLORADO,hail,1983-07-22 16:40:00,1983.0,CST,1983-07-22 16:40:00,,,,39.72,-104.6,,,0,0,0.0
1,,9987740,COLORADO,hail,1983-07-22 16:45:00,1983.0,CST,1983-07-22 16:45:00,,,,39.73,-104.87,,,0,0,0.0
2,,9987741,COLORADO,hail,1983-07-22 16:45:00,1983.0,CST,1983-07-22 16:45:00,,,,39.73,-104.93,,,0,0,0.0
3,,9987735,COLORADO,hail,1983-07-22 16:20:00,1983.0,CST,1983-07-22 16:20:00,,,,39.73,-104.85,,,0,0,0.0
4,,9987736,COLORADO,hail,1983-07-22 16:25:00,1983.0,CST,1983-07-22 16:25:00,,,,39.72,-104.82,,,0,0,0.0


Here is a description of the imported columns:

1. **EPISODE_ID**: The storm episode ID. A single episode can contain multiple events
2. **EVENT_ID**: This is the ID of the actual storm as such. Several storms can be grouped into an episode
3. **STATE**: The state or region where the event occurred
4. **EVENT_TYPE**: The type of the event
5. **BEGIN_DATE_TIME**: The date and time when the event started. Times and dates are in LST (Local Solar Time), which means that they reflect the local time, not a coordinated time
6. **BEGIN_YEAR**: The year in which the event began
7. **CZ_TIMEZONE**: The timezone of the place where the event occurred
8. **END_DATE_TIME**: The date and time when the event ended. Times and dates are in LST (Local Solar Time), which means that they reflect the local time, not a coordinated time
9. **TOR_F_SCALE**: The [enhanced Fujita scale](https://en.wikipedia.org/wiki/Enhanced_Fujita_scale) (highest recorded value). This scale measures the strength of a tornado based on the amount of damage that it caused. A level of `EF0` means "light damage" (wind speeds of 40 - 72 mph), and a level of `EF5` means "incredible damage" (261 - 318 mph). `EFU` means "Unknown"
10. **BEGIN_LOCATION**: The name of the city or village where the event started
11. **END_LOCATION**: The name of the city or village where the event ended
12. **BEGIN_LAT**: The latitude of the place where the event began
13. **BEGIN_LON**: The longitude of the place where the event began
14. **END_LAT**: The latitude of the place where the event ended
15. **END_LON**: The longitude of the place where the event ended
16. **TOTAL_DEATHS**: Deaths directly or indirectly attributable to the event
17. **TOTAL_INJURIES**: Injuries directly or indirectly attributable to the event
18. **TOTAL_DAMAGE_DEFLATED**: Estimated damage to property and crops in dollars. These dollars are "real" dollars, which means that the damages for all the years have been converted ([deflated](https://faculty.fuqua.duke.edu/~rnau/Decision411_2007/411infla.htm)) to the value they would have had in 1982-84. This was done to make the damages comparable across years, since dollars [change purchasing power every year](https://www.insider.com/fast-food-burgers-cost-every-year-2018-9) due to inflation. The deflation was done using the Bureau of Labor Statistics Urban Consumer Price Index, whose base period is 1982-84.
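
As a rough illustration of the deflation described for `TOTAL_DAMAGE_DEFLATED`, the sketch below converts a nominal dollar amount into base-period dollars by dividing by the price index for the event year, relative to a base-period index of 100. The function name `deflate` and the index value of 200 are hypothetical and chosen only to make the arithmetic easy to follow; they are not actual CPI figures.

```python
BASE_INDEX = 100.0  # the index is scaled so that the 1982-84 base period equals 100


def deflate(nominal_dollars: float, cpi_for_year: float) -> float:
    """Convert nominal dollars to base-period (1982-84) dollars."""
    if cpi_for_year <= 0:
        raise ValueError("cpi_for_year must be positive")
    return nominal_dollars * BASE_INDEX / cpi_for_year


# Hypothetical index of 200: prices have doubled since the base period,
# so 1,000,000 nominal dollars correspond to 500,000 base-period dollars.
real_damage = deflate(1_000_000, cpi_for_year=200.0)
print(real_damage)
```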

There are 62 unique storm types and 2,483,191 occurrences from 1970 to 2020 in the dataset analyzed. In a random sample of 20,000 storms from that period, roughly 49% of the storms begin and end in the same location, covering 21 unique types of 68. The most frequent of these include thunderstorm wind, hail, floods, tornadoes, marine thunderstorm wind, flash floods, waterspouts, lightning, marine hail, funnel clouds, heavy rain, dust devil, marine strong wind, debris flow, and marine high wind. <br><br>Additionally, roughly 49% of storms do not end in the same location, covering about 46 unique types of 68, a noticeably wider range of storm types. In both groups, thunderstorm wind is the most frequent storm type. We can also note that Texas has the highest number of storm occurrences. <br><br>
Overall, in the geo scatter map in 5.1, we can see that the Plains, Northeast, Central, South, and Southeast regions typically experience a higher frequency of disasters.<br><br>
Note: I ran the analysis on a smaller random sample to speed up processing and prevent the kernel from repeatedly dying. Consequently, the visualization in 5.1 may not depict the precise number of data points on the map.<br><br>
A choropleth map is another visualization option for finding patterns related to the size of the risk area around storm events.
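
Because the figures above come from a random sample, they will shift slightly between runs unless the sample is seeded. A minimal sketch, assuming a toy frame `toy_df` in place of the full dataset and an arbitrary seed of 42 (both hypothetical), shows how `random_state` makes `DataFrame.sample` repeatable:

```python
import pandas as pd

# Toy stand-in for the storm events table (hypothetical values).
toy_df = pd.DataFrame({
    "EVENT_ID": range(10),
    "EVENT_TYPE": ["hail", "tornado"] * 5,
})

# The same seed yields the same rows on every call.
sample_a = toy_df.sample(n=4, random_state=42)
sample_b = toy_df.sample(n=4, random_state=42)

print(sample_a.equals(sample_b))
```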

In [None]:
#Preparing data (this df is used in various exercise cells below)

#Creating a separate df for mapping 
begin_loc_map = pd.DataFrame(df)#.dropna(subset=['BEGIN_LAT', 'BEGIN_LON',]) # 'BEGIN_LOCATION', 'END_LOCATION'

#creating a smaller random sample to speed up processing 
begin_loc_map_sample = begin_loc_map.sample(n=20000)

#subsetting only necessary columns for better readability
begin_loc_map_sample = begin_loc_map_sample[['EVENT_ID', 'STATE', 'EVENT_TYPE', 'BEGIN_DATE_TIME',
       'BEGIN_YEAR', 'BEGIN_LOCATION', 'END_LOCATION', 'BEGIN_LAT', 'BEGIN_LON', 'END_LAT',
       'END_LON', 'TOTAL_DAMAGE_DEFLATED']]

#converting YEAR to Int to add to map hover text below
begin_loc_map_sample['year'] = begin_loc_map_sample['BEGIN_DATE_TIME'].dt.year.astype('Int64')

#bool to identify which events begin and end location coincide
begin_loc_map_sample['event_location_t_f'] = begin_loc_map_sample['BEGIN_LOCATION'] == begin_loc_map_sample['END_LOCATION']

#concatenating info columns to add to map hover text below
begin_loc_map_sample['text'] = begin_loc_map_sample['STATE'] + '<br>begin location: ' + begin_loc_map_sample['BEGIN_LOCATION'] + '<br>end location: ' + begin_loc_map_sample['END_LOCATION'] +'<br>type: ' + begin_loc_map_sample['EVENT_TYPE'] + '<br>' + begin_loc_map_sample['year'].astype(str)

#sample df of storm events which locations coincide to create Plotly trace1
sample_coincide = begin_loc_map_sample.copy()
sample_coincide = sample_coincide.loc[sample_coincide['event_location_t_f'] == True]

#sample df of storm events which locations DO NOT coincide to create Plotly trace2
sample_DN_coincide = begin_loc_map_sample.copy()
sample_DN_coincide = sample_DN_coincide.loc[sample_DN_coincide['event_location_t_f'] == False]
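
One caveat about the comparison in the cell above: in pandas, a missing value does not compare equal to anything, so an event with no recorded `BEGIN_LOCATION` or `END_LOCATION` lands in the "do not coincide" group. The sketch below, on a hypothetical three-row frame `toy`, gives such rows their own `unknown` label; the column name `location_match` is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "BEGIN_LOCATION": ["DENVER", "AURORA", np.nan],
    "END_LOCATION": ["DENVER", "BENNETT", np.nan],
})

# Plain equality treats the row with missing names as "not equal".
naive = toy["BEGIN_LOCATION"] == toy["END_LOCATION"]

# Label rows with a missing name separately before comparing.
both_known = toy["BEGIN_LOCATION"].notna() & toy["END_LOCATION"].notna()
toy["location_match"] = np.where(
    ~both_known, "unknown", np.where(naive, "coincide", "do not coincide")
)
print(toy)
```

Counting the `unknown` rows separately would show how much of the "do not coincide" share is due to missing location names rather than to storms that actually moved.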

In [None]:
#Plotly scatter mapbox map of Storm Events whose BEGIN_LOCATION and END_LOCATION do not coincide 
#vs. those in which they do coincide. 

#generating Plotly Scatter Mapbox
fig = px.scatter_mapbox( 
                        lat=[37.09],
                        lon=[-95.71],
                        zoom=2.6, 
                        height=650,
                        mapbox_style='carto-darkmatter',
                        title = '<b>Storm locations from 1970 to 2020 <br> Sample size: 20,000</b> <br> (Toggle legend to show separate groups)'
                       )

#Trace1 layer on map from 'sample_coincide' df of storm events whose begin and end location coincide
fig.add_trace(go.Scattermapbox(lon=sample_coincide['BEGIN_LON'],
                               lat=sample_coincide['BEGIN_LAT'],
                               name='Coincide',
                               hovertemplate = sample_coincide['text'],
                               opacity= .4,
                               mode='markers',
                               marker=dict(
                                       size= 5,
                                       color = 'magenta',
                                       opacity = .8,),
                                    ))

#Trace2 layer on map from 'sample_DN_coincide' df of storm events whose begin and end location do not coincide
fig.add_trace(go.Scattermapbox(lon=sample_DN_coincide['BEGIN_LON'],
                               lat=sample_DN_coincide['BEGIN_LAT'],
                               name='Do Not Coincide',
                               hovertemplate = sample_DN_coincide['text'],
                               mode='markers',
                               marker=dict(
                                       size= 5,
                                       color = '#DFFF00',
                                       opacity = 1),
                                    ))

#defining legend
fig.update_layout(legend = dict(bordercolor='black',
                                borderwidth=22,
                                itemclick= 'toggleothers',
                                bgcolor='#000003',
                                font_size=12,
                                x=0.9,
                                y=0.9,
                                traceorder='normal',
                                font=dict(family='monospace',
                                          size=12,
                                          color='white',
                                         )
                               ),
                 legend_title='<b>Begin & End<br>Locations:</b><br>'
                 )

#updating title and map background parameters
fig.update_layout(title_x=0.45,
                  title_y=0.95,
                  font_color='white',
                  title_font_size = 18,
                  title_font_family='monospace',
                  #mapbox={'style': 'carto-darkmatter', 'center': {'lon':-95.71, 'lat' : 37.09}, 'zoom': 3}, # (option 2)
                  margin={'r':0,'t':0,'l':0,'b':0},
                 )

#updating margin and hoverlabels
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0},  # remove the white gutter between the frame and map
                    # hover appearance
                    hoverlabel=dict(bgcolor='white',     # white background
                                    font_size=12,        # label font size
                                    font_family='monospace') # label font
                 )

#updating map background parameters
fig.update_layout(
    mapbox_style="white-bg",
    mapbox_layers=[
        {
            "below": 'traces',
            "sourcetype": "raster",
            "sourceattribution": "United States Geological Survey",
            "source": [
                "https://basemap.nationalmap.gov/arcgis/rest/services/USGSImageryOnly/MapServer/tile/{z}/{y}/{x}"
            ]
        },
        {
            "sourcetype": "raster",
            "sourceattribution": "Government of Canada",
            "source": ["https://geo.weather.gc.ca/geomet/?"
                       "SERVICE=WMS&VERSION=1.3.0&REQUEST=GetMap&BBOX={bbox-epsg-3857}&CRS=EPSG:3857"
                       "&WIDTH=1000&HEIGHT=1000&LAYERS=RADAR_1KM_RDBR&TILED=true&FORMAT=image/png"],
        }
      ])

fig.show()

In [None]:
#Statistical summary for categorical variables on sample set

import plotly.figure_factory as ff

table_cat = ff.create_table(begin_loc_map_sample.describe(include=['O']).T, index=True, index_title='Categorical columns')
table_cat

In [None]:
sample_coincide.EVENT_TYPE.unique()

In [None]:
sample_DN_coincide.EVENT_TYPE.unique()

In [None]:
sample_coincide[['EVENT_TYPE']].describe()

In [None]:
sample_DN_coincide.EVENT_TYPE.describe()

In [None]:
# #Frequency table of event types on entire df
# df.EVENT_TYPE.value_counts()

In [None]:
#Frequency table of event types on sample set
begin_loc_map_sample.EVENT_TYPE.value_counts()

In [None]:
# #Frequency table of BEGIN_LOCATION on entire df
# df.BEGIN_LOCATION.value_counts()

In [None]:
#Frequency table of BEGIN_LOCATION on sample set
begin_loc_map_sample.BEGIN_LOCATION.value_counts()

In [None]:
#Contingency table of storm event type by begin location on sample set
crosstab = pd.crosstab(index=begin_loc_map_sample['EVENT_TYPE'], columns=begin_loc_map_sample['BEGIN_LOCATION'])
crosstab

In [None]:
#Chi-square test of independence between EVENT_TYPE and BEGIN_LOCATION on sample set
chi2 = chi2_contingency(crosstab)
print('The P-value for EVENT_TYPE and BEGIN_LOCATION on sample set is: ', chi2[1])
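
With a sample of 20,000 events, a chi-square p-value can be very small even for a weak association, so an effect size helps judge how strong the relationship is. The following is a minimal pure-Python sketch of the Pearson chi-square statistic and Cramér's V for a small table of counts. The helper names `chi_square_statistic` and `cramers_v` and the 2x2 counts are hypothetical, not values from the storm data.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(table):
    """Cramér's V: 0 means no association, 1 means perfect association."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_square_statistic(table) / (n * k))


counts = [[10, 20], [20, 10]]  # hypothetical counts
print(round(chi_square_statistic(counts), 3), round(cramers_v(counts), 3))
```

Note that for 2x2 tables `chi2_contingency` applies a continuity correction by default, so its statistic will differ slightly from the uncorrected one computed here.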

In [None]:
begin_loc_map_sample.groupby('EVENT_TYPE')['BEGIN_LOCATION'].describe()

## Data Visualizations of Risk Assessment

Below are data visualization strategies, including metrics and plots of different kinds, for assessing risk:

1. A time series barplot by month and storm type is an effective option for visualizing which storm types are most likely to happen in a given month. We can also use a heatmap to visualize the relationship between storm type per month as well as a frequency table.
<br>
<br>
2. Because of the number of unique storm types, a density heatmap can be an effective tool to visualize the economic damage caused by storms.
<br>
<br>
3. A choropleth map can be useful for visualizing which locations the storms are most likely to happen by plotting the count per geo location identifier. A density heatmap is another tool we can use to visualize this data. 
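
Raw counts in a month-by-type table tend to be dominated by the most common event types. A sketch of one alternative, assuming a small hypothetical frame `toy_events`, uses the `normalize` argument of `pd.crosstab` so that each month's column shows shares rather than counts:

```python
import pandas as pd

# Hypothetical events; the real table would come from the storm dataset.
toy_events = pd.DataFrame({
    "EVENT_TYPE": ["hail", "hail", "tornado", "hail", "flood", "flood"],
    "event_month": [5, 5, 5, 6, 6, 6],
})

# normalize="columns" divides each count by its column (month) total.
share_by_month = pd.crosstab(
    index=toy_events["EVENT_TYPE"],
    columns=toy_events["event_month"],
    normalize="columns",
)
print(share_by_month.round(2))
```

Each column then sums to 1, which makes months with very different event totals easier to compare in a heatmap.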

In [3]:
#Creating df for contingency table and multiple plots below

month_event_df = df[['STATE', 'EVENT_TYPE', 'TOTAL_DAMAGE_DEFLATED', 'BEGIN_DATE_TIME','BEGIN_LOCATION', 'END_LOCATION']]#.dropna()

#Creating df copy to slice df accordingly. 
month_event_df = month_event_df.copy()

#Accessing month from BEGIN_DATE_TIME column as Int64 for readability
month_event_df['event_month'] = month_event_df['BEGIN_DATE_TIME'].dt.month.astype('Int64')

### No. 1 - Storm types likely to happen in a given month.

In [None]:
#Plotly density heatmap of storm types likely to happen in a given month

month_event_df_drop = month_event_df.dropna(subset=['event_month'])

fig = px.density_heatmap(month_event_df_drop,
                         x='event_month',
                         y='EVENT_TYPE',
                         color_continuous_scale='Viridis',
#                          nbinsx=12,
#                          nbinsy=14,
                         labels={col:col.replace('_', ' ') for col in month_event_df_drop.columns},
                         title='<b>Density heatmap of storms most likely to happen by month </b><br>1970 - 2020',
                         )
fig.show()

In [None]:
#Contingency table to validate the number of storms per month by type

month_event_cont_table = pd.crosstab(index=month_event_df['EVENT_TYPE'], columns=month_event_df['event_month'])

In [None]:
#Seaborn density heatmap of storm types likely to happen in a given month

sns.set(rc={'figure.figsize':(15,15)})
ax1 = sns.heatmap(month_event_cont_table, cmap="Blues")
ax1.set_title('Density Heatmap of Event Type vs. Month');

### No. 2 - How large the economic damages caused by the storms would be

In [None]:
#Creating a separate df for economic damages deflated

total_loss_map = df.dropna(subset=['BEGIN_LAT', 'BEGIN_LON', 'END_LAT', 'END_LON', 'TOTAL_DAMAGE_DEFLATED'])

#creating a smaller random sample to speed up processing 
total_loss_map_sample = total_loss_map.sample(n=20000)

#subsetting only necessary columns for better readability
total_loss_map_sample = total_loss_map_sample[['STATE', 'EVENT_TYPE', 'BEGIN_DATE_TIME',
       'BEGIN_YEAR', 'BEGIN_LOCATION', 'END_LOCATION', 'BEGIN_LAT', 'BEGIN_LON', 'END_LAT',
       'END_LON', 'TOTAL_DAMAGE_DEFLATED']]

#Converting total_damages_deflated from float to string to concat with hover text column only
total_loss_map_sample['total_damages_deflated'] = ["$%.4f" % i for i in total_loss_map_sample['TOTAL_DAMAGE_DEFLATED']]

total_loss_map_sample['year'] = ["%.f" % i for i in total_loss_map_sample['BEGIN_YEAR']]

#converting YEAR to Int to add to hover text column below
total_loss_map_sample['year_int'] = total_loss_map_sample['BEGIN_DATE_TIME'].dt.year.astype('Int64')

#converting TOTAL_DAMAGE_DEFLATED column from float to string to add to hover text column below
total_loss_map_sample['total_damages_deflated_f'] = ["%.f" % i for i in total_loss_map_sample['TOTAL_DAMAGE_DEFLATED']]

#concatenating info columns for hover text over map
total_loss_map_sample['text'] = total_loss_map_sample['STATE'] + '<br>type: ' + total_loss_map_sample['EVENT_TYPE'] + '<br>economic damages: ' + total_loss_map_sample['total_damages_deflated'] + '<br>' + total_loss_map_sample['year'].astype(str)

In [None]:
#Zipping lat, lon columns - preparing dataset for folium map BELOW

total_loss_map_sample_zip = list(zip(total_loss_map_sample['BEGIN_LAT'], total_loss_map_sample['BEGIN_LON'], total_loss_map_sample['TOTAL_DAMAGE_DEFLATED']))

In [None]:
#Folium  Heatmap weighted based on the total damage of events per location.
initial_coords = [44.08, -103.23]
folium_loss_hmap = folium.Map(location=initial_coords, zoom_start=2.5, tiles='CartoDB dark_matter')

#folium.Marker([25.76, -80.19], popup='Hello from Miami!').add_to(folium_loss_hmap)


hm_layer = HeatMap(total_loss_map_sample_zip,
                   #Parameters to adjust tiles color, size, blur
                   min_opacity=0.45,
                   radius=4.5,
                   blur=3.75, 
                 )
folium_loss_hmap.add_child(hm_layer)
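
Damage totals are often heavily skewed, so a few very costly events can dominate a heat map weighted by raw dollars. One option, sketched below under the assumption that compressing the weights is acceptable for the visual, is to apply `log1p` to the damage before passing the `(lat, lon, weight)` triples to `HeatMap`. The helper `log_weighted_points` and the sample coordinates are hypothetical.

```python
import math


def log_weighted_points(points):
    """Replace each (lat, lon, damage) weight with log(1 + damage)."""
    return [(lat, lon, math.log1p(damage)) for lat, lon, damage in points]


# Hypothetical points: one event with no damage, one with large damage.
raw_points = [(39.72, -104.60, 0.0), (29.95, -90.07, 1_000_000.0)]
scaled_points = log_weighted_points(raw_points)
print(scaled_points)
```

Using `log1p` rather than `log` keeps zero-damage events at a weight of zero instead of producing an error.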

In [None]:
#Plotly go.Scattergeo (bubble map) of economic damages caused by the storms 

#sorting by damage (largest first) so the positional slices below are rank groups
df2 = total_loss_map_sample.sort_values('TOTAL_DAMAGE_DEFLATED', ascending=False).copy()
df2['text'] = df2['BEGIN_LOCATION'] + '<br>type: ' + df2['EVENT_TYPE'] + '<br>economic damage: ' + (df2['TOTAL_DAMAGE_DEFLATED']/1e6).astype(str) + ' Million' + '<br>' + df2['year']
#each tuple is a (start, stop) range of row positions in the sorted df
limits = [(0,1),(2,3),(4,10),(11,20),(21,50),(51,300),(301,1000),(1001,5000),(5001,10000)]
colors = ['royalblue','crimson','lightseagreen','orange','chartreuse','cadetblue', 'royalblue','indigo','limegreen']
scale = 100000

fig = go.Figure()

for i in range(len(limits)):
    lim = limits[i]
    df_sub = df2.iloc[lim[0]:lim[1]]
    fig.add_trace(go.Scattergeo(
        locationmode = 'ISO-3',
        lon = df_sub['BEGIN_LON'],
        lat = df_sub['BEGIN_LAT'],
        hovertext = df_sub['text'],
        marker = dict(
            size = df_sub['TOTAL_DAMAGE_DEFLATED']/scale,
            color = colors[i],
            line_color='rgb(40,40,40)',
            line_width=0.5,
            sizemode = 'area'
        ),
        name = '{0} - {1}'.format(lim[0],lim[1])
    ))
                  
fig.update_layout(
        title_text = '<b>Economic damages caused by storms from 1970 to 2020</b><br>Sample size: 20,000<br>(Hover over map to view event info)',
        showlegend = False,
        geo = dict(
            scope = 'usa',
            landcolor = 'rgb(217, 217, 217)',
        )
    )

fig.show()

In [None]:
#Plotly bar plot of economic damages by storm type

fig = px.bar(total_loss_map_sample, 
             x='EVENT_TYPE', 
             y='TOTAL_DAMAGE_DEFLATED', 
             color='EVENT_TYPE', 
             title='<b>Economic damages by storm type from 1970-2020 | Sample size: 20,000<br></b>(De-select event types to view storms separately. Hover over for event info)',
             hover_data=['STATE', 'EVENT_TYPE', 'TOTAL_DAMAGE_DEFLATED','year'],
             labels={col:col.replace('_', ' ') for col in total_loss_map_sample.columns}) # remove underscore
fig.show()

In [None]:
#Plotly bar plot of total economic damages summed by storm type

total_damages_by_event = total_loss_map_sample.groupby('EVENT_TYPE')['TOTAL_DAMAGE_DEFLATED'].sum().reset_index()
fig = px.bar(total_damages_by_event, 
             x='EVENT_TYPE', 
             y='TOTAL_DAMAGE_DEFLATED', 
             color='EVENT_TYPE', 
             title='<b>Economic damages by storm type from 1970-2020 | Sample size: 20,000<br></b>(Deselect event types in legend to view storms separately. Hover over for event info)',
             labels={col:col.replace('_', ' ') for col in total_damages_by_event.columns}) # remove underscore
fig.show()

### No. 3 - Locations storms are most likely to happen

In [None]:
#contingency table to validate the number of storms per state by type
event_type_cont_table = pd.crosstab(index=month_event_df['EVENT_TYPE'], columns=month_event_df['STATE'])#.apply(np.log)
event_type_cont_table

In [None]:
#Seaborn density heatmap of locations (by STATE) storms are most likely to happen

sns.set(rc={'figure.figsize':(15,15)})
ax2 = sns.heatmap(event_type_cont_table, cmap="Blues")
ax2.set_title('Density Heatmap of Event Type vs. Locations');

In [None]:
#Plotly density heatmap of locations (by STATE) storms are most likely to happen

fig = px.density_heatmap(month_event_df,
                         x='STATE',
                         y='EVENT_TYPE',
                         color_continuous_scale='icefire',
#                          nbinsx=60,
#                          nbinsy=60,
                         )
fig.show()

### Hypothesis Testing with Pingouin Library

We conduct a hypothesis test for each event type to assess whether the average total damage differs between disasters that happened before the reform and those that happened after it, keeping only the event types that show a significant difference (using a significance threshold of $\alpha=0.01$). Since it is likely that not all events that have happened in the US are present in this dataset, we can interpret the data as a sample (conducting hypothesis tests on population data would not make sense).

**Note:** Event types that have no recorded events either before or after the Post-Katrina Emergency Management Reform Act of 2006 are ignored (since a $t$-test won't be possible).
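
To make the per-type comparison concrete, here is a minimal sketch of a two-sample test on a single event type using `scipy.stats.ttest_ind` with `equal_var=False` (Welch's t-test), which does not assume equal variances in the two groups. The damage figures are made up for illustration, and the notebook itself uses `pingouin` rather than this call.

```python
from scipy.stats import ttest_ind

# Hypothetical deflated damages for one event type.
pre_act = [1.0, 2.0, 3.0, 4.0, 5.0]
post_act = [6.0, 7.0, 8.0, 9.0, 10.0]

# equal_var=False selects Welch's t-test.
result = ttest_ind(pre_act, post_act, equal_var=False)

alpha = 0.01
is_significant = result.pvalue <= alpha
print(result.statistic, result.pvalue, is_significant)
```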

In [None]:
df["POST_ACT"] = df["BEGIN_YEAR"] > 2006

def test_differences(df):
    """
    Conducts a t-test on TOTAL_DAMAGE_DEFLATED comparing events
    that happened in 2006 or before with events that
    happened after that year.
    
    Inputs:
    `df`: A pandas DataFrame
    
    Outputs:
    `p_values_signif`: A Python dictionary in which the keys are the event type
    and the values are the significant p-values that resulted from the t-test (alpha
    of 0.01)
    
    Note: If an event type does not have associated events either before or
    after the act, ignore it and don't add it to the dictionary (since a t-test
    won't be possible)
    """
      
    pre_dam = df[df["POST_ACT"]==False][["EVENT_TYPE", "TOTAL_DAMAGE_DEFLATED"]].dropna(how="any")
    post_dam = df[df["POST_ACT"]==True][["EVENT_TYPE", "TOTAL_DAMAGE_DEFLATED"]].dropna(how="any")
    
    #dict to add keys produced by ttest of pre_dam and post_dam event types and p-values
    p_values_signif = {}
    
    #variable identifying unique storm event values in df
    list_of_unique_events = df['EVENT_TYPE'].unique()
    
    # creating list of pre/post dam TOTAL_DAMAGE_DEFLATED by event
    for i in list_of_unique_events:
        pre_dam_unique = pre_dam[pre_dam['EVENT_TYPE']==i].TOTAL_DAMAGE_DEFLATED
        post_dam_unique = post_dam[post_dam['EVENT_TYPE']==i].TOTAL_DAMAGE_DEFLATED
        
        #filtering out events not associated with pre/post dam events to run t-test
        if (len(pre_dam_unique)==0) or (len(post_dam_unique)==0):
            continue
        
        #perform t-test of unique events in pre/post dam df
        ttest = pg.ttest(pre_dam_unique, post_dam_unique)
        p = ttest['p-val'].item()
        #keep the event type only if its p-value is at or below alpha (0.01)
        if p <= 0.01:
            p_values_signif[i] = p
    
    return p_values_signif
test_differences(df)
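
Because the function above runs one test per event type, some p-values can fall below 0.01 by chance alone. A common, conservative adjustment is the Bonferroni correction, which divides the threshold by the number of tests. The sketch below assumes a hypothetical dictionary `p_values` of unfiltered per-type p-values; the helper `bonferroni_significant` is illustrative and is not part of the notebook.

```python
def bonferroni_significant(p_values, alpha=0.01):
    """Keep entries whose p-value passes the Bonferroni-adjusted threshold."""
    n_tests = len(p_values)
    if n_tests == 0:
        return {}
    threshold = alpha / n_tests
    return {name: p for name, p in p_values.items() if p <= threshold}


# Hypothetical p-values for four event types.
p_values = {"flash flood": 0.0001, "tornado": 0.004, "hail": 0.2, "drought": 0.002}
adjusted = bonferroni_significant(p_values)
print(adjusted)
```

With four tests the adjusted threshold is 0.0025, so an event type with a p-value of 0.004 would pass the unadjusted cut-off but not the adjusted one.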

Plotting significant event types and their total deflated damages as box plots, comparing the pre-Act events with the post-Act events.

In [None]:
#Plotly box plot - comparing post-act and pre-act 
#total deflated damages for significant event types.

#collecting the significant event types (the dict keys) into a list
sig_events = list(test_differences(df).keys())

#filtering significant events from df 
subset_df = df[df.EVENT_TYPE.isin(sig_events)]

#df of select columns for plotting and further analysis
subset_df = subset_df[['EVENT_TYPE', 'TOTAL_DAMAGE_DEFLATED', 'POST_ACT', 'BEGIN_YEAR']].copy() #copy so new columns can be added without a SettingWithCopyWarning

#converting BEGIN_YEAR to Int then str to add to map hover text below
subset_df['year'] = subset_df['BEGIN_YEAR'].astype('Int64')
subset_df['Year'] = subset_df['year'].astype(str)
subset_df['Location'] = df['STATE'].astype(str)

for event in sig_events:
    df_plot = subset_df[subset_df['EVENT_TYPE']==event]

    fig = px.box(df_plot,
                 x = 'EVENT_TYPE',
                 y = 'TOTAL_DAMAGE_DEFLATED',
                 color= 'POST_ACT',
                 #labels={False: 'pre-Act', True: 'post-Act'},
                 template = 'plotly_dark',
                 title='<b>Total Deflated Damage of Disaster Type</b><br>post-Act   vs.  pre-Act comparison<br>',
                 points='all', #selects between 'outliers', 'suspectedoutliers', 'all', or False for further analysis
                 color_discrete_sequence=['#0000FF', '#DFFF00'], #define CSS-colors
                 hover_data = ['Year'],
                 hover_name = 'Location', 
                 category_orders= {'POST_ACT': [False, True]},
                )            
    
    fig.update_layout( # customizes font,legend, orientation & position
    font_family='monospace',
    legend=dict(
        title=None, 
        orientation='h', 
        y=1, 
        yanchor='bottom', 
        x=0.5, 
        xanchor='center'),
        xaxis_title='Disaster Type',
        yaxis_title='Total Damage (Deflated)',
        hoverlabel=dict(bgcolor='black',
                        font_color = 'white',
                        font_size = 12,
                       ))
    

    fig.show()

In [None]:
# Seaborn Plots
#collecting the significant event types (the dict keys) into a list
sig_events = list(test_differences(df).keys())

#filtering significant events from df 
subset_df = df[df.EVENT_TYPE.isin(sig_events)]

#df of select columns for plotting and further analysis
subset_df = subset_df[['EVENT_TYPE', 'TOTAL_DAMAGE_DEFLATED', 'POST_ACT', 'BEGIN_YEAR']]#.dropna(how='any')

for event in sig_events:
    plt.figure(figsize = (15,8))
    sns.boxplot(x='EVENT_TYPE', y='TOTAL_DAMAGE_DEFLATED', hue='POST_ACT', data=subset_df[subset_df["EVENT_TYPE"]==event], showfliers=False)
    plt.title("Pre-Act vs Post-Act Event: " + event)
    plt.show()

## Statistical Analysis

In [None]:
#Frequency table of categorical variable within the significant difference df
subset_df_cont = ff.create_table(subset_df.describe(include=['O']).T, index=True, index_title='Categorical columns')
subset_df_cont

In [None]:
#Frequency table of event types with a significant difference
subset_df.EVENT_TYPE.value_counts()

In [None]:
#Count of events per type, post-Act and pre-Act
subset_df_crosstab = pd.crosstab(index=subset_df['EVENT_TYPE'], columns=subset_df['POST_ACT'])
subset_df_crosstab

In [None]:
chi2 = chi2_contingency(subset_df_crosstab)
print('The P-value for event type by post-Act and pre-Act: ', chi2[1])

In [None]:
#statistical summary of event types and post / pre act status
subset_df.groupby('EVENT_TYPE')['POST_ACT'].describe()

In [None]:
#statistical summary of event types and total damages
subset_df.groupby('EVENT_TYPE')['TOTAL_DAMAGE_DEFLATED'].describe()

In [None]:
#count and statistical summary of post and pre act status by total damages deflated
subset_df.groupby('POST_ACT')['TOTAL_DAMAGE_DEFLATED'].describe()

# Conclusion

Scientific studies indicate that extreme weather events such as heatwaves and large storms are likely to become more frequent or more intense with human-induced climate change. The Post-Katrina Emergency Management Reform Act of 2006 (PKEMRA) sought to remedy the gaps in national emergency recovery operations, which became evident after several devastating natural disasters took place. Out of 62 unique disaster types, nine returned p-values below the significance threshold of 0.01. Based on the hypothesis test results, we can conclude that for these nine disaster types the average deflated damage differs significantly between the periods before and after the implementation of PKEMRA. <br><br>As we continue to slice the data for deeper analysis, several patterns emerge. For instance, flash floods have the highest frequency of the nine disaster types, followed by tornadoes, heavy snow, and droughts. When analyzing the total damages for these disasters, flash floods, tornadoes, and droughts are the costliest weather and climate disasters. Though heavy snow may not have as high a total damage cost, it is among the most frequent events of the nine types. Natural disasters have cost our nation trillions of dollars, and PKEMRA was enacted in 2006 in an effort to mitigate the risk and loss from such disasters. An interesting question to consider is how PKEMRA has impacted the total damages of storms. <br><br>Natural disaster risk can be estimated at different levels, since frequency is derived either from the number of recorded events or from models of events, as illustrated in this case. The box plot visualized in 8.2 does not yet give us a complete story. Although the difference is statistically significant for nine of 62 disaster types, there are still many other risk factors to consider. Nonetheless, governments must bring about awareness, preparedness, and warning systems to reduce the impact of natural disasters on communities.

By Veronica Huxley