In [1]:
import pandas as pd
import math

fire_locations = pd.read_csv('../output/frontend_fire_data.csv')

fire_locations.head()

Unnamed: 0,IncidentName,UniqueFireIdentifier,FireDiscoveryDateTime,InitialResponseAcres,CreatedOnDateTime_dt,GACC,ContainmentDateTime,ControlDateTime,FireOutDateTime,DiscoveryAcres,FinalAcres,IncidentSize,InitialLatitude,InitialLongitude,Latitude,Longitude
0,SCHWARTZ,2003-WYCMX-200237,2003-01-01 07:00:00,,2020-07-04 16:03:32,RMCC,,,,,,,,,44.603333,-105.563056
1,MENTONE,2004-CABDU-006784,2004-07-06 01:04:00,,2020-10-13 01:01:00,OSCC,,,,,,,,,34.733889,-117.076667
2,NIFC RAMP SUPPORT,2008-IDGBK-000002,2008-07-22 15:10:00,,2019-08-05 14:33:55,GBCC,,,,,,,,,43.666667,-116.216667
3,St. Charles RX,2008-IDCTF-008902,2008-09-08 13:35:00,,2019-08-28 15:23:43,GBCC,,,,1.0,,20.0,42.103889,-111.547778,42.09633,-111.4269
4,Mingus/Cherry Rx,2008-AZPNF-000975,2008-10-07 18:16:00,,2019-09-19 21:08:46,SWCC,,,,,,,34.389999,-112.080002,34.598301,-112.072403


Let's take a look at which attributes are not well tracked, i.e. which ones have null values for a large share of instances.

In [2]:
total_instances = fire_locations.shape[0]
print("Total instances: " + str(total_instances))
missing_values = fire_locations.isnull().sum()
print("Missing values for each attribute:")
for attribute, count in missing_values.items():
    print(f"{attribute}: {count} ({int(count/total_instances * 10000)/100}%)")

Total instances: 247739
Missing values for each attribute:
IncidentName: 43 (0.01%)
UniqueFireIdentifier: 0 (0.0%)
FireDiscoveryDateTime: 0 (0.0%)
InitialResponseAcres: 161218 (65.07%)
CreatedOnDateTime_dt: 0 (0.0%)
GACC: 59 (0.02%)
ContainmentDateTime: 97406 (39.31%)
ControlDateTime: 111876 (45.15%)
FireOutDateTime: 103396 (41.73%)
DiscoveryAcres: 63894 (25.79%)
FinalAcres: 229701 (92.71%)
IncidentSize: 74327 (30.0%)
InitialLatitude: 65977 (26.63%)
InitialLongitude: 65977 (26.63%)
Latitude: 0 (0.0%)
Longitude: 0 (0.0%)
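
The same percentages can also be computed in one vectorized step, since the mean of a boolean mask is the fraction of missing entries. A minimal sketch on a small made-up frame (the `toy` data and its values are illustrative, not taken from the dataset). Note that `round` rounds rather than truncates, so the last digit can differ from the loop above.

```python
import pandas as pd

# Toy frame standing in for fire_locations (values are made up).
toy = pd.DataFrame({
    "IncidentSize": [20.0, None, 0.1, None],
    "Latitude": [44.6, 34.7, 43.7, 42.1],
})

# isnull() yields a boolean mask; its column-wise mean is the missing fraction.
missing_pct = (toy.isnull().mean() * 100).round(2)
print(missing_pct)
```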


Remove instances that are missing a value for either FireOutDateTime or IncidentSize, because we need both for the calculations.

In [3]:
# Drop rows that are missing either attribute in a single pass.
fire_locations.dropna(subset=["FireOutDateTime", "IncidentSize"], inplace=True)

print("Total instances: " + str(fire_locations.shape[0]))

Total instances: 136993



Remove the fires with UniqueFireIdentifier _2014-IDNCF-000609_ and _2014-AKFAS-411093_ because their records contain dates in the year 1530.

Then, find the difference between each fire's discovery date and time and its FireOut date and time to get the total time burned. Display the first few rows to verify the calculation. Finally, convert the time burned to hours so that we're working with a consistent time unit.

In [4]:
fire_locations.drop(fire_locations[fire_locations['UniqueFireIdentifier'] == '2014-AKFAS-411093'].index, inplace=True)
fire_locations.drop(fire_locations[fire_locations['UniqueFireIdentifier'] == '2014-IDNCF-000609'].index, inplace=True)

fire_locations['FireDiscoveryDateTime'] = pd.to_datetime(fire_locations['FireDiscoveryDateTime'])
fire_locations['FireOutDateTime'] = pd.to_datetime(fire_locations['FireOutDateTime'])
fire_locations['TimeBurned'] = fire_locations['FireOutDateTime'] - fire_locations['FireDiscoveryDateTime']

print(fire_locations[['FireDiscoveryDateTime', 'FireOutDateTime', 'TimeBurned']].head())

fire_locations['HoursBurned'] = fire_locations['TimeBurned'].dt.total_seconds() / (60*60)

   FireDiscoveryDateTime     FireOutDateTime      TimeBurned
16   2009-07-06 20:00:00 2009-07-07 20:00:00 1 days 00:00:00
74   2014-02-10 07:30:00 2014-02-10 22:30:48 0 days 15:00:48
80   2014-03-13 21:06:10 2014-03-13 22:26:14 0 days 01:20:04
81   2014-03-23 22:50:58 2014-03-28 22:00:03 4 days 23:09:05
82   2014-03-25 20:38:54 2014-03-25 23:00:28 0 days 02:21:34
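
Rather than spotting bad dates by hand, implausible timestamps like the 1530 ones can be flagged programmatically. The sketch below is one possible approach, not what the notebook does: the sample identifiers and the 1900 cutoff are assumptions made for illustration. Depending on the pandas version and datetime resolution, a year such as 1530 either fails to parse (and becomes `NaT` under `errors="coerce"`) or parses to a very early year, so the check covers both cases.

```python
import pandas as pd

# Hypothetical sample; the 1530 value mimics the kind of bad date described above.
sample = pd.DataFrame({
    "UniqueFireIdentifier": ["A-1", "A-2", "A-3"],
    "FireOutDateTime": [
        "2014-02-10 22:30:48",
        "1530-06-01 12:00:00",
        "2014-03-13 22:26:14",
    ],
})

# errors="coerce" turns values that cannot be represented into NaT instead of raising.
parsed = pd.to_datetime(sample["FireOutDateTime"], errors="coerce")

# Flag rows that failed to parse or whose year predates the cutoff.
# The 1900 cutoff is an assumption chosen for illustration.
suspicious = parsed.isna() | (parsed.dt.year < 1900)
print(sample.loc[suspicious, "UniqueFireIdentifier"].tolist())
```

Because rows with a genuinely missing date would also be flagged, a check like this is best run after the `dropna` step.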


Now, let's get a feel for the distribution of our data by looking at the range and average of the two attributes we'll use for the suppression result.

In [5]:
summary = fire_locations[['HoursBurned', 'IncidentSize']].describe()

time_burned_range = summary.loc['max', 'HoursBurned'] - summary.loc['min', 'HoursBurned']
acres_burned_range = summary.loc['max', 'IncidentSize'] - summary.loc['min', 'IncidentSize']
average_time_burned = summary.loc['mean', 'HoursBurned']
average_acres_burned = summary.loc['mean', 'IncidentSize']

print("HoursBurned Range:", time_burned_range)
print("HoursBurned Average:", average_time_burned)
print("IncidentSize Range:", acres_burned_range)
print("IncidentSize Average:", average_acres_burned)

HoursBurned Range: 28090.043611111112
HoursBurned Average: 296.50350427194445
IncidentSize Range: 589368.0
IncidentSize Average: 428.8481476254088


That's quite a range, and we know from Kole's graphs that, especially for acres burned, we have a great many very small fires and only a few large ones. With such a dramatic right (positive) skew, we won't be able to fully normalize the data.

![Acres Burned Distribution](./acres-distribution-chart.svg)

We'll use a log to normalize as much as we can. We'll also need to shift our data (x + 1) before taking the log, to make sure we don't get negative values from fires that burned less than an acre or for less than an hour.
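
To illustrate why the shift matters and how much the log tames the skew, here is a small sketch on made-up acreages (the values are illustrative only and are not drawn from the dataset):

```python
import math
import pandas as pd

# Made-up acreages spanning several orders of magnitude, mostly small fires.
acres = pd.Series([0.1, 0.3, 1.0, 3.0, 10.0, 50.0, 500.0, 70000.0])

# A plain log is negative for anything under one acre...
plain_log = acres.apply(math.log)
# ...while log(x + 1) maps 0 to 0 and stays non-negative.
shifted_log = acres.apply(math.log1p)

print(plain_log.min() < 0, shifted_log.min() >= 0)

# Sample skewness before and after the transform.
print(acres.skew(), shifted_log.skew())
```

The transformed values are still right-skewed, just far less so, which matches the expectation that the data cannot be fully normalized.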

The suppression result (stored in the `SupressionResult` column) is calculated from the weighted average of normalized time burned and normalized acreage burned. Acreage burned is given twice as much weight as time burned because we think it is a better indicator of how well a fire was suppressed. This average is then subtracted from 1 and converted to a percent, so that higher values indicate better suppression.

SuppressionResult = $\left(1 - \dfrac{\dfrac{\log(x + 1)}{\log(\text{MaxHoursBurned} + 1)} + \dfrac{2\log(y + 1)}{\log(\text{MaxAcresBurned} + 1)}}{3}\right) \times 100$

where $x$ is the `HoursBurned` and $y$ is the acres burned (`IncidentSize`) of a given fire. In the code below, the range (max minus min) of each attribute stands in for its maximum.
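
As a sanity check, the formula can be evaluated by hand for a single fire. The sketch below uses the `HoursBurned` and `IncidentSize` ranges reported in the summary output above in place of the maxima (the two coincide only when the minimum is zero, which is an assumption here). The function name `suppression_result` and the constant names are ours, introduced for illustration.

```python
import math

# Ranges reported in the summary output above; used as stand-ins for the maxima.
HOURS_RANGE = 28090.043611111112
ACRES_RANGE = 589368.0


def suppression_result(hours_burned, acres_burned):
    """Weighted average of log-normalized time and acreage, as a percent."""
    norm_time = math.log(hours_burned + 1) / math.log(HOURS_RANGE + 1)
    norm_acres = math.log(acres_burned + 1) / math.log(ACRES_RANGE + 1)
    # Acreage counts twice as much as time, hence the division by 3.
    return (1 - (norm_time + 2 * norm_acres) / 3) * 100


# A large fire: 2089.5 hours burned over 73622 acres.
score = suppression_result(2089.5, 73622.0)
print(round(score, 2))
```

A fire with zero hours and zero acres scores exactly 100, and the score falls as either quantity grows.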

In [6]:
log_time_range = math.log(time_burned_range+1)
log_acres_range = math.log(acres_burned_range+1)
print("log time: " + str(log_time_range) + " log acres: " + str(log_acres_range))

fire_locations["NormalizedTime"] = fire_locations["HoursBurned"].apply(lambda x: math.log(x+1)/log_time_range if x > 0 else 0)
fire_locations["NormalizedAcreage"] = fire_locations["IncidentSize"].apply(lambda x: math.log(x+1)/log_acres_range if x > 0 else 0)
fire_locations['SupressionResult'] = (1 - (fire_locations['NormalizedTime'] + (2 * fire_locations['NormalizedAcreage']))/3) * 100

log time: 10.243206071815102 log acres: 13.286807752042323
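
For larger frames, the element-wise `apply` can be replaced by a vectorized expression. This is a sketch of one equivalent formulation under the assumption that the column contains no missing values: clipping non-positive values to 0 reproduces the lambda's fallback, because `log1p(0)` is 0. With `NaN` present the two would differ (the lambda returns 0, the vectorized form propagates `NaN`). The toy values are made up.

```python
import numpy as np
import pandas as pd

# Toy values; the negative entry stands in for a row where the lambda falls back to 0.
hours = pd.Series([24.0, 1.5, 0.0, -3.0])
log_range = np.log(28090.043611111112 + 1)

# Element-wise version mirroring the notebook's lambda.
elementwise = hours.apply(lambda x: np.log(x + 1) / log_range if x > 0 else 0.0)

# Vectorized version: clip non-positive values to 0, then log1p(0) == 0.
vectorized = np.log1p(hours.clip(lower=0)) / log_range

print(np.allclose(elementwise, vectorized))
```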


Finally, let's take a look at how well our suppression result measures up against both the bigger fires and the rest of the data (mostly much smaller fires).

In [7]:
filtered_fire_locations = fire_locations[fire_locations['IncidentSize'] > 500]
print(filtered_fire_locations[['FireDiscoveryDateTime', 'HoursBurned', 'IncidentSize', 'NormalizedTime', 'NormalizedAcreage', 'SupressionResult']].head())

print(fire_locations[['FireDiscoveryDateTime', 'HoursBurned', 'IncidentSize', 'NormalizedTime', 'NormalizedAcreage', 'SupressionResult']].head())

    FireDiscoveryDateTime  HoursBurned  IncidentSize  NormalizedTime  \
128   2014-04-19 23:30:00  2089.500000       73622.0        0.746364   
134   2014-04-22 10:30:00  1450.500000        3828.0        0.710749   
183   2014-05-04 20:30:00   140.500000        2657.0        0.483472   
218   2014-05-11 20:59:00   412.016667        2202.0        0.588047   
219   2014-05-11 21:02:00  5274.466667        5484.0        0.836732   

     NormalizedAcreage  SupressionResult  
128           0.843447         18.891433  
134           0.620944         34.912109  
183           0.593471         44.319574  
218           0.579340         41.775782  
219           0.647994         28.909317  
   FireDiscoveryDateTime  HoursBurned  IncidentSize  NormalizedTime  \
16   2009-07-06 20:00:00    24.000000           0.1        0.314245   
74   2014-02-10 07:30:00    15.013333           0.3        0.270757   
80   2014-03-13 21:06:10     1.334444           3.0        0.082765   
81   2014-03-23 22:50:58 

In [8]:
fire_locations.to_csv('../output/suppression_stats.csv', index=False, columns=['UniqueFireIdentifier', 'HoursBurned', 'IncidentSize',
                                                                               'NormalizedTime','NormalizedAcreage', 'SupressionResult'])