# Modeling Fire Risk for Census Tracts

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Modeling-Fire-Risk-for-Census-Tracts" data-toc-modified-id="Modeling-Fire-Risk-for-Census-Tracts-1">Modeling Fire Risk for Census Tracts</a></span><ul class="toc-item"><li><span><a href="#Imports" data-toc-modified-id="Imports-1.1">Imports</a></span></li><li><span><a href="#Read-in-Datasets" data-toc-modified-id="Read-in-Datasets-1.2">Read in Datasets</a></span></li><li><span><a href="#Models-using-Monthly-Data" data-toc-modified-id="Models-using-Monthly-Data-1.3">Models using Monthly Data</a></span><ul class="toc-item"><li><span><a href="#Train-(Validation)-Test-Split" data-toc-modified-id="Train-(Validation)-Test-Split-1.3.1">Train-(Validation)-Test Split</a></span></li><li><span><a href="#RandomForestClassifier" data-toc-modified-id="RandomForestClassifier-1.3.2">RandomForestClassifier</a></span><ul class="toc-item"><li><span><a href="#Predicting-if-a-census-tract-will-have-a-fire-in-a-given-year" data-toc-modified-id="Predicting-if-a-census-tract-will-have-a-fire-in-a-given-year-1.3.2.1">Predicting if a census tract will have a fire in a given year</a></span></li></ul></li><li><span><a href="#GridSearch-(RandomForestClassifier)" data-toc-modified-id="GridSearch-(RandomForestClassifier)-1.3.3">GridSearch (RandomForestClassifier)</a></span></li><li><span><a href="#AdaBoostClassifier" data-toc-modified-id="AdaBoostClassifier-1.3.4">AdaBoostClassifier</a></span></li><li><span><a href="#XGBoostClassifier" data-toc-modified-id="XGBoostClassifier-1.3.5">XGBoostClassifier</a></span></li></ul></li><li><span><a href="#Models-using-Annual-Data" data-toc-modified-id="Models-using-Annual-Data-1.4">Models using Annual Data</a></span><ul class="toc-item"><li><span><a href="#Train-(Validation)-Test-Split" data-toc-modified-id="Train-(Validation)-Test-Split-1.4.1">Train-(Validation)-Test Split</a></span></li><li><span><a href="#RandomForestRegressor" 
data-toc-modified-id="RandomForestRegressor-1.4.2">RandomForestRegressor</a></span><ul class="toc-item"><li><span><a href="#Predicting-Number-of-Fire-Incidents-in-Census-Tracts-Over-a-Year" data-toc-modified-id="Predicting-Number-of-Fire-Incidents-in-Census-Tracts-Over-a-Year-1.4.2.1">Predicting Number of Fire Incidents in Census Tracts Over a Year</a></span></li></ul></li><li><span><a href="#Classification-Models" data-toc-modified-id="Classification-Models-1.4.3">Classification Models</a></span><ul class="toc-item"><li><span><a href="#Predict-if-a-census-tract-will-have-a-fire-in-a-given-year" data-toc-modified-id="Predict-if-a-census-tract-will-have-a-fire-in-a-given-year-1.4.3.1">Predict if a census tract will have a fire in a given year</a></span></li><li><span><a href="#Train-(Validation)-Test-Split-(for-Classification)" data-toc-modified-id="Train-(Validation)-Test-Split-(for-Classification)-1.4.3.2">Train-(Validation)-Test Split (for Classification)</a></span></li><li><span><a href="#RandomForestClassifier" data-toc-modified-id="RandomForestClassifier-1.4.3.3">RandomForestClassifier</a></span></li><li><span><a href="#GridSearch-(RandomForestClassifier)" data-toc-modified-id="GridSearch-(RandomForestClassifier)-1.4.3.4">GridSearch (RandomForestClassifier)</a></span></li><li><span><a href="#AdaBoostClassifier" data-toc-modified-id="AdaBoostClassifier-1.4.3.5">AdaBoostClassifier</a></span></li><li><span><a href="#XGBoostClassifier" data-toc-modified-id="XGBoostClassifier-1.4.3.6">XGBoostClassifier</a></span></li></ul></li></ul></li></ul></li></ul></div>

## Imports

In [230]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from collections import namedtuple
from sklearn.model_selection import TimeSeriesSplit,GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.ensemble import AdaBoostClassifier,AdaBoostRegressor,RandomForestRegressor,RandomForestClassifier
from xgboost import XGBClassifier,XGBRegressor
from sklearn.metrics import mean_squared_error,accuracy_score,recall_score,r2_score
from sklearn.metrics import precision_score,confusion_matrix,cohen_kappa_score
from sklearn.metrics import auc,roc_curve,roc_auc_score,classification_report

pd.options.display.max_columns = 999

## Read in Datasets

In [304]:
## Fire Incident & ACS Data Merged on Census Tracts by Month & Year
fire_tracts = pd.read_csv('./merged_acs_fire_tract_data.csv')
print(f'fire_tracts Shape: {fire_tracts.shape}')
fire_tracts.drop(columns = ['geoid'], inplace = True)
fire_tracts.head()

fire_tracts Shape: (155448, 100)


Unnamed: 0,year,month,tract,count,tot_BldgArea,tot_NumBldgs,tot_UnitsRes,tot_UnitsTotal,tot_Ext_Exten_Garage,tot_Ext_Extension,tot_Ext_Garage,tot_Ext_No_Ext_Gar,tot_ProxCode_Attached,tot_ProxCode_Detached,tot_ProxCode_NA,tot_ProxCode_Not_Provided,tot_ProxCode_Semi_Attached,tot_BsmtCode_Above_Gr_Full_Bsmt,tot_BsmtCode_Above_Gr_Part_Bsmt,tot_BsmtCode_Below_Gr_Full_Bsmt,tot_BsmtCode_Below_Gr_Part_Bsmt,tot_BsmtCode_No_Bsmt,tot_BsmtCode_Not_Provided,tot_BsmtCode_Unknown,tot_LandUse_01,tot_LandUse_02,tot_LandUse_03,tot_LandUse_04,tot_LandUse_05,tot_LandUse_06,tot_LandUse_07,tot_LandUse_08,tot_LandUse_09,tot_LandUse_Not_Provided,tot_LandUse_Parking_Garage,tot_LandUse_Vacant,tot_ComArea,tot_ResArea,tot_OfficeArea,tot_RetailArea,tot_GarageArea,tot_StrgeArea,tot_FactryArea,tot_OtherArea,ratio_ComArea,ratio_ResArea,ratio_OfficeArea,ratio_RetailArea,ratio_GarageArea,ratio_StrgeArea,ratio_FactryArea,ratio_OtherArea,avg_NumFloors,avg_YearBuilt,avg_BldgArea,avg_UnitArea,Total housing units,pct_Occupied housing units,pct_Vacant housing units,pct_1-unit detached,pct_1-unit attached,pct_2 units,pct_3 or 4 units,pct_5 to 9 units,pct_10 to 19 units,pct_20 or more units,pct_Built 2014 or later,pct_Built 2010 to 2013,pct_Built 2000 to 2009,pct_Built 1990 to 1999,pct_Built 1980 to 1989,pct_Built 1970 to 1979,pct_Built 1960 to 1969,pct_Built 1950 to 1959,pct_Built 1940 to 1949,pct_Built 1939 or earlier,pct_1 room,pct_2 rooms,pct_3 rooms,pct_4 rooms,pct_5 rooms,pct_6 rooms,pct_7 rooms,pct_8 rooms,pct_9 rooms or more,pct_Owner-occupied,pct_Renter-occupied,pct_Utility gas,pct_Bottled tank or LP gas,pct_Electricity,pct_Fuel oil kerosene etc.,pct_Coal or coke,pct_Wood,pct_Solar energy,pct_Other fuel,pct_No fuel used,pct_1.00 or less,pct_1.01 to 1.50,pct_1.51 or more
0,2013,1,1000100,0,1145016.0,24.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1145016.0,0.0,0.0,0.0,0.0,0.0,0.0,1145016.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1900.0,572508.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2013,1,1000201,0,1592632.0,43.0,1055.0,1089.0,0.0,17.0,0.0,24.0,24.0,3.0,10.0,0.0,4.0,6.0,0.0,21.0,0.0,6.0,0.0,8.0,0.0,15.0,1.0,11.0,0.0,0.0,1.0,8.0,1.0,0.0,3.0,1.0,692133.0,900498.0,1700.0,17118.0,0.0,0.0,0.0,673315.0,0.434584,0.565415,0.001067,0.010748,0.0,0.0,0.0,0.422769,4.780488,1912.138889,38844.68,35.670049,1095,0.899543,0.100457,0.0,0.0,0.006393,0.008219,0.057534,0.138813,0.789041,0.0,0.0,0.0,0.0,0.0,0.241096,0.109589,0.340639,0.049315,0.259361,0.058447,0.10411,0.292237,0.326941,0.10411,0.108676,0.0,0.0,0.005479,0.006091,0.993909,0.342132,0.04467,0.353299,0.209137,0.0,0.0,0.0,0.041624,0.009137,0.832487,0.081218,0.086294
2,2013,1,1000202,1,4111815.0,92.0,3568.0,3638.0,0.0,8.0,1.0,47.0,12.0,8.0,32.0,0.0,4.0,4.0,0.0,18.0,0.0,30.0,0.0,4.0,2.0,0.0,11.0,11.0,2.0,0.0,4.0,16.0,10.0,0.0,0.0,0.0,993682.0,2733569.0,33685.0,54640.0,9860.0,1500.0,0.0,891104.0,0.241665,0.664808,0.008192,0.013289,0.002398,0.000365,0.0,0.216718,4.915179,1928.875,73425.27,20.182866,3668,0.936205,0.063795,0.0,0.011178,0.003272,0.004362,0.041985,0.033806,0.905398,0.0,0.0,0.013904,0.01554,0.003544,0.021538,0.265812,0.19084,0.408942,0.07988,0.067067,0.066249,0.333697,0.343511,0.113686,0.059978,0.00627,0.0,0.009542,0.241118,0.758882,0.470588,0.01922,0.177053,0.210542,0.0,0.0,0.0,0.094351,0.028247,0.951951,0.042807,0.005242
3,2013,1,1000500,0,5721187.0,194.0,0.0,5.0,0.0,0.0,0.0,5.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,1.0,0.0,0.0,0.0,0.0,5721187.0,0.0,21146.0,4500.0,0.0,0.0,0.0,5695541.0,1.0,0.0,0.003696,0.000787,0.0,0.0,0.0,0.995517,1.2,1923.0,1144237.0,228847.48,40,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,2013,1,1000600,0,6327663.0,189.0,6051.0,6299.0,0.0,28.0,0.0,137.0,90.0,13.0,37.0,2.0,23.0,12.0,0.0,88.0,0.0,29.0,2.0,34.0,0.0,25.0,7.0,93.0,7.0,3.0,3.0,17.0,3.0,3.0,2.0,2.0,1693672.0,4637698.0,55974.0,233245.0,34253.0,19933.0,0.0,1214990.0,0.267662,0.732924,0.008846,0.036861,0.005413,0.00315,0.0,0.192012,5.612121,1926.380645,38349.47,6.088184,4336,0.945111,0.054889,0.0,0.0,0.024677,0.038284,0.014068,0.086024,0.836946,0.0,0.0,0.007841,0.026292,0.108164,0.178275,0.174354,0.136993,0.012223,0.355858,0.083256,0.124539,0.229013,0.229705,0.271679,0.061808,0.0,0.0,0.0,0.054173,0.945827,0.442411,0.014885,0.267692,0.222792,0.0,0.0,0.0,0.014885,0.037335,0.90654,0.044656,0.048804


In [305]:
## Fire Incident & ACS Data Merged on Census Tracts by Year
fire_tracts_annual = pd.read_csv('./merged_acs_annual_fire_tract_data.csv')
print(f'fire_tracts_annual Shape: {fire_tracts_annual.shape}')
fire_tracts_annual.drop(columns = ['geoid'], inplace = True)
fire_tracts_annual.head()

fire_tracts_annual Shape: (12954, 99)


Unnamed: 0,year,tract,count,tot_BldgArea,tot_NumBldgs,tot_UnitsRes,tot_UnitsTotal,tot_Ext_Exten_Garage,tot_Ext_Extension,tot_Ext_Garage,tot_Ext_No_Ext_Gar,tot_ProxCode_Attached,tot_ProxCode_Detached,tot_ProxCode_NA,tot_ProxCode_Not_Provided,tot_ProxCode_Semi_Attached,tot_BsmtCode_Above_Gr_Full_Bsmt,tot_BsmtCode_Above_Gr_Part_Bsmt,tot_BsmtCode_Below_Gr_Full_Bsmt,tot_BsmtCode_Below_Gr_Part_Bsmt,tot_BsmtCode_No_Bsmt,tot_BsmtCode_Not_Provided,tot_BsmtCode_Unknown,tot_LandUse_01,tot_LandUse_02,tot_LandUse_03,tot_LandUse_04,tot_LandUse_05,tot_LandUse_06,tot_LandUse_07,tot_LandUse_08,tot_LandUse_09,tot_LandUse_Not_Provided,tot_LandUse_Parking_Garage,tot_LandUse_Vacant,tot_ComArea,tot_ResArea,tot_OfficeArea,tot_RetailArea,tot_GarageArea,tot_StrgeArea,tot_FactryArea,tot_OtherArea,ratio_ComArea,ratio_ResArea,ratio_OfficeArea,ratio_RetailArea,ratio_GarageArea,ratio_StrgeArea,ratio_FactryArea,ratio_OtherArea,avg_NumFloors,avg_YearBuilt,avg_BldgArea,avg_UnitArea,Total housing units,pct_Occupied housing units,pct_Vacant housing units,pct_1-unit detached,pct_1-unit attached,pct_2 units,pct_3 or 4 units,pct_5 to 9 units,pct_10 to 19 units,pct_20 or more units,pct_Built 2014 or later,pct_Built 2010 to 2013,pct_Built 2000 to 2009,pct_Built 1990 to 1999,pct_Built 1980 to 1989,pct_Built 1970 to 1979,pct_Built 1960 to 1969,pct_Built 1950 to 1959,pct_Built 1940 to 1949,pct_Built 1939 or earlier,pct_1 room,pct_2 rooms,pct_3 rooms,pct_4 rooms,pct_5 rooms,pct_6 rooms,pct_7 rooms,pct_8 rooms,pct_9 rooms or more,pct_Owner-occupied,pct_Renter-occupied,pct_Utility gas,pct_Bottled tank or LP gas,pct_Electricity,pct_Fuel oil kerosene etc.,pct_Coal or coke,pct_Wood,pct_Solar energy,pct_Other fuel,pct_No fuel used,pct_1.00 or less,pct_1.01 to 1.50,pct_1.51 or more
0,2013,1000100,0,1145016.0,24.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1145016.0,0.0,0.0,0.0,0.0,0.0,0.0,1145016.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1900.0,572508.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2013,1000201,0,1592632.0,43.0,1055.0,1089.0,0.0,17.0,0.0,24.0,24.0,3.0,10.0,0.0,4.0,6.0,0.0,21.0,0.0,6.0,0.0,8.0,0.0,15.0,1.0,11.0,0.0,0.0,1.0,8.0,1.0,0.0,3.0,1.0,692133.0,900498.0,1700.0,17118.0,0.0,0.0,0.0,673315.0,0.434584,0.565415,0.001067,0.010748,0.0,0.0,0.0,0.422769,4.780488,1912.138889,38844.68,35.670049,1095,0.899543,0.100457,0.0,0.0,0.006393,0.008219,0.057534,0.138813,0.789041,0.0,0.0,0.0,0.0,0.0,0.241096,0.109589,0.340639,0.049315,0.259361,0.058447,0.10411,0.292237,0.326941,0.10411,0.108676,0.0,0.0,0.005479,0.006091,0.993909,0.342132,0.04467,0.353299,0.209137,0.0,0.0,0.0,0.041624,0.009137,0.832487,0.081218,0.086294
2,2013,1000202,5,4111815.0,92.0,3568.0,3638.0,0.0,8.0,1.0,47.0,12.0,8.0,32.0,0.0,4.0,4.0,0.0,18.0,0.0,30.0,0.0,4.0,2.0,0.0,11.0,11.0,2.0,0.0,4.0,16.0,10.0,0.0,0.0,0.0,993682.0,2733569.0,33685.0,54640.0,9860.0,1500.0,0.0,891104.0,0.241665,0.664808,0.008192,0.013289,0.002398,0.000365,0.0,0.216718,4.915179,1928.875,73425.27,20.182866,3668,0.936205,0.063795,0.0,0.011178,0.003272,0.004362,0.041985,0.033806,0.905398,0.0,0.0,0.013904,0.01554,0.003544,0.021538,0.265812,0.19084,0.408942,0.07988,0.067067,0.066249,0.333697,0.343511,0.113686,0.059978,0.00627,0.0,0.009542,0.241118,0.758882,0.470588,0.01922,0.177053,0.210542,0.0,0.0,0.0,0.094351,0.028247,0.951951,0.042807,0.005242
3,2013,1000500,0,5721187.0,194.0,0.0,5.0,0.0,0.0,0.0,5.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,1.0,0.0,0.0,0.0,0.0,5721187.0,0.0,21146.0,4500.0,0.0,0.0,0.0,5695541.0,1.0,0.0,0.003696,0.000787,0.0,0.0,0.0,0.995517,1.2,1923.0,1144237.0,228847.48,40,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,2013,1000600,14,6327663.0,189.0,6051.0,6299.0,0.0,28.0,0.0,137.0,90.0,13.0,37.0,2.0,23.0,12.0,0.0,88.0,0.0,29.0,2.0,34.0,0.0,25.0,7.0,93.0,7.0,3.0,3.0,17.0,3.0,3.0,2.0,2.0,1693672.0,4637698.0,55974.0,233245.0,34253.0,19933.0,0.0,1214990.0,0.267662,0.732924,0.008846,0.036861,0.005413,0.00315,0.0,0.192012,5.612121,1926.380645,38349.47,6.088184,4336,0.945111,0.054889,0.0,0.0,0.024677,0.038284,0.014068,0.086024,0.836946,0.0,0.0,0.007841,0.026292,0.108164,0.178275,0.174354,0.136993,0.012223,0.355858,0.083256,0.124539,0.229013,0.229705,0.271679,0.061808,0.0,0.0,0.0,0.054173,0.945827,0.442411,0.014885,0.267692,0.222792,0.0,0.0,0.0,0.014885,0.037335,0.90654,0.044656,0.048804


In [306]:
## Create tuple object called yearMonth to store year and month from fire_tracts as one variable
yearMonth = namedtuple('yearMonth', ['year','month'])
fire_tracts['year_month'] = [yearMonth(yr,mon) for yr,mon in zip(fire_tracts['year'],fire_tracts['month'])]
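
The `year_month` column is later compared against plain tuples such as `(2016,12)` to split the data chronologically. That works because tuples (and namedtuples) compare field by field, so the year is compared first and the month breaks ties. A minimal sketch of that ordering, using made-up sample values rather than rows from `fire_tracts`:

```python
from collections import namedtuple

# Same structure as the notebook's yearMonth tuple
yearMonth = namedtuple('yearMonth', ['year', 'month'])

# Hypothetical sample values, deliberately out of order
samples = [yearMonth(2017, 1), yearMonth(2016, 12), yearMonth(2013, 6)]

# Tuples compare field by field: year first, then month
ordered = sorted(samples)

# A namedtuple also compares against a plain tuple of the same length
in_training_window = [ym <= (2016, 12) for ym in samples]

print(ordered)
print(in_training_window)
```

Storing year and month as one comparable value keeps the date filters short, at the cost of an object-typed column that has to be dropped before modeling.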

## Models using Monthly Data

### Train-(Validation)-Test Split

In [307]:
## Create copy of fire_tracts called monthly_data
monthly_data = fire_tracts.copy()
monthly_data.drop(columns=['year','month'],inplace = True)

## Create binary column fire that contains a 1 if there was a fire in a census tract for a given month, 0 if not
monthly_data['fire'] = (monthly_data['count'] > 0).astype(int)

## Split monthly_data chronologically into training, validation, and testing sets
## (.copy() makes each split an independent DataFrame, so the in-place drops below do not raise SettingWithCopyWarning)
train = monthly_data[monthly_data['year_month'] <= (2016,12)].copy()        ## Includes all incidents that occurred in 2013, 2014, 2015, & 2016
validation = monthly_data[(monthly_data['year_month'] >= (2017,1)) &
                          (monthly_data['year_month'] <= (2017,12))].copy() ## Includes all incidents that occurred in 2017
test = monthly_data[monthly_data['year_month'] >= (2018,1)].copy()          ## Includes all incidents that occurred in 2018

train.drop(columns = ['year_month','tract'],inplace = True)
validation.drop(columns = ['year_month','tract'],inplace = True)
test.drop(columns = ['year_month','tract'],inplace = True)

cols = np.array(train.columns)

## Split training data into X_train (features) and y_train (target)
X_train = np.array(train.drop(columns = ['count','fire']))
y_train = np.array(train['fire'])

## Split validation data into X_validation (features) and y_validation (target)
X_validation = np.array(validation.drop(columns = ['count','fire']))
y_validation = np.array(validation['fire'])

## Split testing data into X_test (features) and y_test (target)
X_test = np.array(test.drop(columns = ['count','fire']))
y_test = np.array(test['fire'])

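
The split above filters a DataFrame by date and then drops columns in place. A minimal sketch of the same pattern on a toy frame, assuming a hypothetical integer `period` key (YYYYMM) in place of the `year_month` tuple and made-up rows; taking an explicit `.copy()` of each filtered slice is what keeps the later in-place `drop` from warning about writing to a slice:

```python
import pandas as pd

# Hypothetical toy frame standing in for monthly_data
toy = pd.DataFrame({
    'year':  [2016, 2016, 2017, 2017, 2018, 2018],
    'month': [11, 12, 1, 12, 1, 2],
    'tract': [1, 1, 1, 1, 1, 1],
    'count': [0, 2, 0, 1, 0, 3],
})

# Integer key (assumption: YYYYMM) that sorts chronologically
toy['period'] = toy['year'] * 100 + toy['month']

# .copy() makes each split an independent DataFrame
toy_train = toy[toy['period'] <= 201612].copy()
toy_validation = toy[(toy['period'] >= 201701) & (toy['period'] <= 201712)].copy()
toy_test = toy[toy['period'] >= 201801].copy()

# In-place drops now act on independent frames, not on slices of toy
for split in (toy_train, toy_validation, toy_test):
    split.drop(columns=['year', 'month', 'period', 'tract'], inplace=True)

print(len(toy_train), len(toy_validation), len(toy_test))
```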


In [308]:
print(f"Training Data Fire Percentage: {train['fire'].value_counts(normalize = True)[1]}")

Training Data Fire Percentage: 0.2521035973444496


In [309]:
print(f"Validation Data Fire Percentage: {validation['fire'].value_counts(normalize = True)[1]}")

Validation Data Fire Percentage: 0.23791878956306933


In [311]:
print(f"Testing Data Fire Percentage: {test['fire'].value_counts(normalize = True)[1]}")

Testing Data Fire Percentage: 0.13408985641500695
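
The three cells above look up the share of positive rows with `value_counts(normalize = True)[1]`. As a small sketch on a made-up binary series (not the notebook's data), the same share is simply the mean of the 0/1 column, and `.get(1, 0.0)` is one way to avoid a `KeyError` if a split happened to contain no fires:

```python
import pandas as pd

# Hypothetical binary target: 1 = at least one fire in the tract-month
fire = pd.Series([0, 1, 0, 0, 1, 0, 0, 0])

# The mean of a 0/1 column is the share of positives
rate_from_mean = fire.mean()

# Equivalent to the value_counts lookup; .get falls back to 0.0
# when the label 1 never occurs
rate_from_counts = fire.value_counts(normalize=True).get(1, 0.0)

print(rate_from_mean, rate_from_counts)
```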


### RandomForestClassifier
#### Predicting if a census tract will have a fire in a given year

In [312]:
## Instantiate RandomForestClassifier as rf
rf = RandomForestClassifier(n_estimators=200,
                            max_depth=25,
                            random_state=42,
                            verbose=1)
## Fit model on training data
rf.fit(X_train,y_train)

## Generate predictions on validation data
rf_pred = rf.predict(X_validation)

## Generate confusion matrix using predictions and y_validation
rf_cm = confusion_matrix(y_validation,rf_pred)
rf_cm_df = pd.DataFrame(rf_cm,
                        index = ['actual no fire', 'actual fire'],
                        columns = ['predicted no fire', 'predicted fire'])

## Calculate Kappa Coefficient (measures inter-rater agreement for categorical values)
rf_kappa = cohen_kappa_score(y_validation,rf_pred)

## Calculate false positive rate and true positive rate with roc_curve
## (rf_pred holds hard 0/1 labels, so the curve has a single interior point)
fpr, tpr, thresholds = roc_curve(y_validation,
                                 rf_pred,
                                 pos_label=1)
## Calculate ROC_AUC (with hard labels this equals the mean of recall and specificity)
roc_auc = auc(fpr,tpr)

## Calculate accuracy, recall, and precision with the sklearn metric functions imported above
rf_accuracy = accuracy_score(y_validation,rf_pred)
rf_recall = recall_score(y_validation,rf_pred)
rf_precision = precision_score(y_validation,rf_pred)

print(f'Accuracy: {rf_accuracy}')
print(f'Recall: {rf_recall}')
print(f'Precision: {rf_precision}')
print(f'Kappa Score: {rf_kappa}')
print(f'AUC Score: {roc_auc}')
print()
print('Confusion Matrix:')
print(rf_cm_df)
print()
print()

## Generate list of feature importances and sort by importance
feature_importance = pd.Series(data = rf.feature_importances_,
                               index = train.drop(columns = ['count','fire']).columns)
print('Feature Importances:')
print(feature_importance.sort_values(ascending = False))
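
The cell above passes hard 0/1 predictions to `roc_curve`. A sketch on synthetic data (not the fire dataset; the sample sizes and class weights are assumptions) of how that AUC relates to one computed from predicted probabilities: with hard labels the ROC curve has one interior point, so its area equals balanced accuracy, while `predict_proba` lets the metric rank every row.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Synthetic, imbalanced data (assumption: roughly 25% positives)
X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.75, 0.25], random_state=0)
X_fit, y_fit = X[:400], y[:400]
X_val, y_val = X[400:], y[400:]

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_fit, y_fit)

# Hard 0/1 labels versus the predicted probability of the positive class
hard_labels = model.predict(X_val)
fire_probability = model.predict_proba(X_val)[:, 1]

auc_from_labels = roc_auc_score(y_val, hard_labels)
auc_from_probabilities = roc_auc_score(y_val, fire_probability)
balanced_acc = balanced_accuracy_score(y_val, hard_labels)

print(f'AUC from hard labels:   {auc_from_labels:.4f}')
print(f'Balanced accuracy:      {balanced_acc:.4f}')
print(f'AUC from probabilities: {auc_from_probabilities:.4f}')
```

Whether to report the label-based or the probability-based figure depends on whether the model will be used to flag tracts at a fixed threshold or to rank them by risk.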

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 200 out of 200 | elapsed:  2.5min finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 200 out of 200 | elapsed:    3.3s finished


Accuracy: 0.7840821367917246
Recall: 0.33533419857235564
Precision: 0.57996632996633
Kappa Score: 0.3035458167698267
AUC Score: 0.6297568480706188

Confusion Matrix:
                predicted no fire  predicted fire
actual no fire              18247            1497
actual fire                  4097            2067


Feature Importances:
tot_BldgArea                       0.034827
tot_UnitsTotal                     0.033641
tot_UnitsRes                       0.031508
Total housing units                0.026858
tot_RetailArea                     0.023753
tot_LandUse_08                     0.021225
tot_ResArea                        0.020285
avg_NumFloors                      0.017623
tot_OfficeArea                     0.016336
pct_Utility gas                    0.015866
tot_ComArea                        0.015751
avg_YearBuilt                      0.015558
pct_Renter-occupied                0.015345
pct_Fuel oil kerosene etc.         0.014462
tot_LandUse_04                     0.014305
...
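
The reported scores can be re-derived by hand from the four counts in the validation confusion matrix. The sketch below copies those counts from the output above and also computes two figures the cell does not print, F1 and the accuracy of always predicting "no fire", which give some context for the 78% accuracy:

```python
# Counts copied from the validation confusion matrix above
tn, fp = 18247, 1497   # actual no fire: predicted no fire / predicted fire
fn, tp = 4097, 2067    # actual fire:    predicted no fire / predicted fire
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)

# With hard predictions, the ROC AUC is the mean of recall and specificity
specificity = tn / (tn + fp)
balanced_accuracy = (recall + specificity) / 2

# Accuracy of a model that always predicts "no fire"
majority_baseline = (tn + fp) / total

print(f'Accuracy: {accuracy:.4f} (majority baseline: {majority_baseline:.4f})')
print(f'Recall: {recall:.4f}  Precision: {precision:.4f}  F1: {f1:.4f}')
print(f'Balanced accuracy: {balanced_accuracy:.4f}')
```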

### GridSearch (RandomForestClassifier)

In [186]:
## TimeSeries Cross Validation
tscv = TimeSeriesSplit(n_splits = 3)

## Gridsearch Parameters for RandomForestClassifier
params = {'n_estimators':[40,50,60,100],
          'max_depth':[None,10,12,15],
          'max_features':[None,'sqrt','log2']}

## Scoring Metric is F1-Score (to balance precision & recall)
score = 'f1'

## Instantiate RandomForestClassifier as rf
rf = RandomForestClassifier()

## Instantiate GridSearchCV as gs
gs = GridSearchCV(rf, param_grid = params,
                  cv = tscv, scoring = score,
                  verbose = 2, n_jobs = 3)

## Fit GridSearch on training data
gs.fit(X_train,y_train)

print(f'Train - Best Parameters: {gs.best_params_}')
print()

means = gs.cv_results_['mean_test_score']
stds = gs.cv_results_['std_test_score']

print('GridSearch (Train) - Mean F1 Scores & Standard Deviations for Parameters:')
print()
## Print out mean F1-Scores with standard deviation of scores for each set of parameters
for mean,std,param in zip(means,stds,gs.cv_results_['params']):
    print(f'{round(mean,3)} (+/- {round(std*2,3)}) for {param}')

print()
print('Classification Report:')
y_pred = gs.predict(X_validation)
print(classification_report(y_validation,y_pred))
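
`TimeSeriesSplit` is used here so that each cross-validation fold is scored on rows that come after the rows it was trained on. A small sketch of the index pattern it produces, using a hypothetical 8-row array in place of `X_train`; it assumes the rows are already sorted from oldest to newest, which the splitter does not check:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical stand-in: 8 rows already sorted from oldest to newest
X_demo = np.arange(8).reshape(-1, 1)

tscv_demo = TimeSeriesSplit(n_splits=3)

# Collect (train indices, test indices) for each fold
folds = [(train_idx.tolist(), test_idx.tolist())
         for train_idx, test_idx in tscv_demo.split(X_demo)]

for train_idx, test_idx in folds:
    print(f'train rows {train_idx} -> validate on rows {test_idx}')
```

Each fold trains on an expanding prefix of the data and validates on the block that follows it, so no fold sees future rows during fitting.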

[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 200 out of 200 | elapsed:    3.6s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.0s
[Parallel(n_jobs=2)]: Done 196 tasks      | elapsed:    0.2s
[Parallel(n_jobs=2)]: Don

[Parallel(n_jobs=2)]: Done  50 out of  50 | elapsed:   43.3s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.0s
[Parallel(n_jobs=2)]: Done  50 out of  50 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.2s
[Parallel(n_jobs=2)]: Done  50 out of  50 | elapsed:    0.2s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:   13.7s
[Parallel(n_jobs=2)]: Done 100 out of 100 | elapsed:   29.8s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.0s
[Parallel(n_jobs=2)]: Done 100 out of 100 | elapsed:    0.1s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n

[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  25 out of  25 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  25 out of  25 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  25 out of  25 | elapsed:    3.0s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  25 out of  25 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  25 out of  25 | elapsed:    0.1s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  50 out of  50 | elapsed:    2.0s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46

[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  10 out of  10 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  10 out of  10 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  10 out of  10 | elapsed:    0.6s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  10 out of  10 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  10 out of  10 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  10 out of  10 | elapsed:    0.7s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  10

[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.2s
[Parallel(n_jobs=2)]: Done 196 tasks      | elapsed:    1.1s
[Parallel(n_jobs=2)]: Done 200 out of 200 | elapsed:    1.1s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   5 out of   5 | elapsed:    2.1s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   5 out of   5 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   5 out of   5 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   5 out of   5 | elapsed:    3.7s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   5 out of   5 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurre

[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.0s
[Parallel(n_jobs=2)]: Done 196 tasks      | elapsed:    0.4s
[Parallel(n_jobs=2)]: Done 200 out of 200 | elapsed:    0.4s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.0s
[Parallel(n_jobs=2)]: Done 196 tasks      | elapsed:    0.3s
[Parallel(n_jobs=2)]: Done 200 out of 200 | elapsed:    0.4s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:   31.1s
[Parallel(n_jobs=2)]: Done 196 tasks      | elapsed:  2.2min
[Parallel(n_jobs=2)]: Done 200 out of 200 | elapsed:  2.2min finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.0s
[Parallel(n_jobs=2)]: Done 196 tasks      | elap

[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.0s
[Parallel(n_jobs=2)]: Done 100 out of 100 | elapsed:    0.2s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.0s
[Parallel(n_jobs=2)]: Done 100 out of 100 | elapsed:    0.1s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  88 tasks      | elapsed:    6.5s
[Parallel(n_jobs=2)]: Done 100 out of 100 | elapsed:    7.4s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.0s
[Parallel(n_jobs=2)]: Done 100 out of 100 | elapsed:    0.1s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.1s
[Parallel(n_jobs=2)]

[Parallel(n_jobs=2)]: Done  50 out of  50 | elapsed:    1.4s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.0s
[Parallel(n_jobs=2)]: Done  50 out of  50 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.0s
[Parallel(n_jobs=2)]: Done  50 out of  50 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  50 out of  50 | elapsed:    2.6s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.0s
[Parallel(n_jobs=2)]: Done  50 out of  50 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.1s
[Parallel(n

[Parallel(n_jobs=2)]: Done  10 out of  10 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  10 out of  10 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  25 out of  25 | elapsed:   10.2s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  25 out of  25 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  25 out of  25 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  25 out of  25 | elapsed:   18.6s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  25 out of  25 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend Thre

[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   5 out of   5 | elapsed:    0.4s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   5 out of   5 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   5 out of   5 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   5 out of   5 | elapsed:    0.6s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   5 out of   5 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   5 out of   5 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  10 out 

[Parallel(n_jobs=2)]: Done  88 tasks      | elapsed:    7.2s
[Parallel(n_jobs=2)]: Done 200 out of 200 | elapsed:   16.3s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.0s
[Parallel(n_jobs=2)]: Done 196 tasks      | elapsed:    0.4s
[Parallel(n_jobs=2)]: Done 200 out of 200 | elapsed:    0.4s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.1s
[Parallel(n_jobs=2)]: Done 196 tasks      | elapsed:    0.9s
[Parallel(n_jobs=2)]: Done 200 out of 200 | elapsed:    0.9s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    5.7s
[Parallel(n_jobs=2)]: Done 196 tasks      | elapsed:   24.7s
[Parallel(n_jobs=2)]: Done 200 out of 200 | elapsed:   25.2s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 

[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.0s
[Parallel(n_jobs=2)]: Done 100 out of 100 | elapsed:    0.2s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.2s
[Parallel(n_jobs=2)]: Done 100 out of 100 | elapsed:    0.4s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    4.3s
[Parallel(n_jobs=2)]: Done 100 out of 100 | elapsed:    9.5s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.0s
[Parallel(n_jobs=2)]: Done 100 out of 100 | elapsed:    0.2s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.3s
[Parallel(n_jobs=2)]: Done 100 out of 100 | elapsed:    0.7s finished
[Parallel(n_jobs=2)]: Using ba

[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.0s
[Parallel(n_jobs=2)]: Done  50 out of  50 | elapsed:    0.1s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.2s
[Parallel(n_jobs=2)]: Done  50 out of  50 | elapsed:    0.2s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:   55.4s
[Parallel(n_jobs=2)]: Done  50 out of  50 | elapsed:  1.0min finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.0s
[Parallel(n_jobs=2)]: Done  50 out of  50 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.3s
[Parallel(n_jobs=2)]: Done  50 out of  50 | elapsed:    0.3s finished
[Parallel(n_jobs=2)]: Using ba

[Parallel(n_jobs=2)]: Done  25 out of  25 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  25 out of  25 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  25 out of  25 | elapsed:    2.3s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  25 out of  25 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  25 out of  25 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  25 out of  25 | elapsed:    3.4s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  25 out of  25 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend Thre

[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   5 out of   5 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   5 out of   5 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  10 out of  10 | elapsed:    0.4s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  10 out of  10 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  10 out of  10 | elapsed:    0.0s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  10 out of  10 | elapsed:    0.7s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  10

[Parallel(n_jobs=2)]: Done 200 out of 200 | elapsed:    1.0s finished
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    4.7s
[Parallel(n_jobs=2)]: Done 196 tasks      | elapsed:   20.1s
[Parallel(n_jobs=2)]: Done 200 out of 200 | elapsed:   20.6s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.0s
[Parallel(n_jobs=2)]: Done 196 tasks      | elapsed:    0.5s
[Parallel(n_jobs=2)]: Done 200 out of 200 | elapsed:    0.5s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    0.3s
[Parallel(n_jobs=2)]: Done 196 tasks      | elapsed:    1.6s
[Parallel(n_jobs=2)]: Done 200 out of 200 | elapsed:    1.7s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks   

Train - Best Parameters: {'max_depth': None, 'max_features': 'sqrt', 'n_estimators': 50}

Train - GridSearch Scores:

0.439 (+/- 0.007) for {'max_depth': None, 'max_features': None, 'n_estimators': 5}
0.44 (+/- 0.01) for {'max_depth': None, 'max_features': None, 'n_estimators': 10}
0.441 (+/- 0.007) for {'max_depth': None, 'max_features': None, 'n_estimators': 25}
0.439 (+/- 0.009) for {'max_depth': None, 'max_features': None, 'n_estimators': 50}
0.442 (+/- 0.006) for {'max_depth': None, 'max_features': None, 'n_estimators': 100}
0.441 (+/- 0.01) for {'max_depth': None, 'max_features': None, 'n_estimators': 200}
0.44 (+/- 0.004) for {'max_depth': None, 'max_features': 'sqrt', 'n_estimators': 5}
0.44 (+/- 0.003) for {'max_depth': None, 'max_features': 'sqrt', 'n_estimators': 10}
0.439 (+/- 0.003) for {'max_depth': None, 'max_features': 'sqrt', 'n_estimators': 25}
0.443 (+/- 0.005) for {'max_depth': None, 'max_features': 'sqrt', 'n_estimators': 50}
0.443 (+/- 0.002) for {'max_depth': Non

0.391 (+/- 0.007) for {'max_depth': 12, 'max_features': 'sqrt', 'n_estimators': 10}
0.388 (+/- 0.011) for {'max_depth': 12, 'max_features': 'sqrt', 'n_estimators': 25}
0.391 (+/- 0.016) for {'max_depth': 12, 'max_features': 'sqrt', 'n_estimators': 50}
0.388 (+/- 0.022) for {'max_depth': 12, 'max_features': 'sqrt', 'n_estimators': 100}
0.391 (+/- 0.012) for {'max_depth': 12, 'max_features': 'sqrt', 'n_estimators': 200}
0.374 (+/- 0.014) for {'max_depth': 12, 'max_features': 'log2', 'n_estimators': 5}
0.384 (+/- 0.024) for {'max_depth': 12, 'max_features': 'log2', 'n_estimators': 10}
0.386 (+/- 0.02) for {'max_depth': 12, 'max_features': 'log2', 'n_estimators': 25}
0.384 (+/- 0.02) for {'max_depth': 12, 'max_features': 'log2', 'n_estimators': 50}
0.383 (+/- 0.022) for {'max_depth': 12, 'max_features': 'log2', 'n_estimators': 100}
0.384 (+/- 0.015) for {'max_depth': 12, 'max_features': 'log2', 'n_estimators': 200}
0.431 (+/- 0.016) for {'max_depth': 15, 'max_features': None, 'n_estimators


              precision    recall  f1-score   support

           0       0.82      0.92      0.87     19744
           1       0.58      0.34      0.43      6164

   micro avg       0.78      0.78      0.78     25908
   macro avg       0.70      0.63      0.65     25908
weighted avg       0.76      0.78      0.76     25908



### AdaBoostClassifier

In [313]:
## Instantiate AdaBoostClassifier as ada
ada = AdaBoostClassifier(n_estimators = 1000,
                         random_state = 42,
                         algorithm = 'SAMME.R')

## Fit AdaBoostClassifier on training data
ada.fit(X_train,y_train)

## Generate predictions on test data
ada_pred = ada.predict(X_test)

## Generate confusion matrix on test data and predictions
ada_cm = confusion_matrix(y_test,ada_pred)
ada_cm_df = pd.DataFrame(ada_cm,
                        index = ['actual no fire', 'actual fire'],
                        columns = ['predicted no fire', 'predicted fire'])

## Calculate Kappa Coefficient
ada_kappa = cohen_kappa_score(y_test,ada_pred)

## Calculate false positive rate and true positive rate using roc_curve
fpr, tpr, thresholds = roc_curve(y_test,
                                 ada_pred,
                                 pos_label=1)
## Calculate roc_auc
roc_auc = auc(fpr,tpr)

## Calculate accuracy, precision, and recall using confusion matrix and true positive rate
ada_accuracy = float(ada_cm[0][0] + ada_cm[1][1])/ len(y_test)
ada_recall = tpr[1]
ada_precision = float(ada_cm[1][1])/ float(ada_cm[1][1]+ada_cm[0][1])

print(f'Accuracy: {ada_accuracy}')
print(f'Recall: {ada_recall}')
print(f'Precision: {ada_precision}')
print(f'Kappa Score: {ada_kappa}')
print(f'AUC Score: {roc_auc}')
print()
print('Confusion Matrix:')
ada_cm_df

Accuracy: 0.8364983788791107
Recall: 0.20293609671848015
Precision: 0.324585635359116
Kappa Score: 0.1634267662118054
AUC Score: 0.5687721403624495

Confusion Matrix:


                predicted no fire  predicted fire
actual no fire              20967            1467
actual fire                  2769             705
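
The cell above passes hard 0/1 predictions to `roc_curve`, which yields a single operating point rather than a full curve. As a minimal sketch of an alternative, the block below scores the positive-class probabilities instead and uses scikit-learn's `recall_score` and `precision_score` directly. It runs on synthetic data from `make_classification`, not on the fire dataset, and every variable name in it is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import (auc, precision_score, recall_score,
                             roc_auc_score, roc_curve)
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (assumption: NOT the fire dataset)
X_demo, y_demo = make_classification(n_samples=2000, n_features=10,
                                     weights=[0.85, 0.15], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          stratify=y_demo, random_state=42)

demo_clf = AdaBoostClassifier(n_estimators=50, random_state=42)
demo_clf.fit(X_tr, y_tr)

# Hard class labels and positive-class probabilities
hard_pred = demo_clf.predict(X_te)
proba = demo_clf.predict_proba(X_te)[:, 1]

# Recall and precision straight from scikit-learn
demo_recall = recall_score(y_te, hard_pred)
demo_precision = precision_score(y_te, hard_pred)

# AUC from hard labels (single operating point) vs. AUC from probabilities
fpr_hard, tpr_hard, _ = roc_curve(y_te, hard_pred)
auc_hard = auc(fpr_hard, tpr_hard)
auc_proba = roc_auc_score(y_te, proba)

print(f'Recall: {demo_recall}')
print(f'Precision: {demo_precision}')
print(f'AUC (hard labels): {auc_hard}')
print(f'AUC (probabilities): {auc_proba}')
```

The two AUC values generally differ, because the probability-based score evaluates the ranking across all thresholds while the hard-label score reflects only the default 0.5 cutoff.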


### XGBoostClassifier

In [314]:
## Instantiate XGBClassifier as xgb
xgb = XGBClassifier(learning_rate=0.1,
                    n_estimators=1000,
                    objective='binary:logistic',
                    n_jobs = 3,
                    random_state = 42)

## Fit XGBClassifier on training data
xgb.fit(X_train,y_train)

## Generate predictions using test data
xgb_pred = xgb.predict(X_test)

## Generate confusion matrix using y_test and predictions
xgb_cm = confusion_matrix(y_test,xgb_pred)
xgb_cm_df = pd.DataFrame(xgb_cm,
                        index = ['actual no fire', 'actual fire'],
                        columns = ['predicted no fire', 'predicted fire'])

## Calculate Kappa Coefficient
xgb_kappa = cohen_kappa_score(y_test,xgb_pred)

## Calculate false positive rate, true positive rate using roc_curve
fpr, tpr, thresholds = roc_curve(y_test,
                                 xgb_pred,
                                 pos_label=1)
roc_auc = auc(fpr,tpr)

## Calculate accuracy, precision, and recall using confusion matrix and true positive rate
xgb_accuracy = float(xgb_cm[0][0] + xgb_cm[1][1])/ len(y_test)
xgb_recall = tpr[1]
xgb_precision = float(xgb_cm[1][1])/ float(xgb_cm[1][1]+xgb_cm[0][1])

print(f'Accuracy: {xgb_accuracy}')
print(f'Recall: {xgb_recall}')
print(f'Precision: {xgb_precision}')
print(f'Kappa Score: {xgb_kappa}')
print(f'AUC Score: {roc_auc}')
print()
print('Confusion Matrix:')
xgb_cm_df

Accuracy: 0.8252277288868303
Recall: 0.2662636729994243
Precision: 0.31852617079889806
Kappa Score: 0.19131392775029488
AUC Score: 0.5890246777228556

Confusion Matrix:


                predicted no fire  predicted fire
actual no fire              20455            1979
actual fire                  2549             925
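
As a worked check, the reported XGBoost metrics can be reproduced from the four confusion-matrix counts alone. The sketch below copies those counts from the output above (the variable names are illustrative). It also shows that an AUC computed from hard 0/1 predictions equals the mean of recall and specificity, i.e. the balanced accuracy at the default threshold.

```python
# Counts copied from the XGBoost confusion matrix reported above
tn, fp = 20455, 1979   # actual no fire: predicted no fire, predicted fire
fn, tp = 2549, 925     # actual fire:    predicted no fire, predicted fire

total = tn + fp + fn + tp

accuracy = (tn + tp) / total
recall = tp / (tp + fn)          # true positive rate
precision = tp / (tp + fp)
specificity = tn / (tn + fp)     # true negative rate

# With hard labels the ROC "curve" is (0,0) -> (FPR, TPR) -> (1,1),
# so its area reduces to the mean of recall and specificity
balanced_accuracy = (recall + specificity) / 2

print(f'Accuracy: {accuracy}')
print(f'Recall: {recall}')
print(f'Precision: {precision}')
print(f'Balanced accuracy (hard-label AUC): {balanced_accuracy}')
```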


## Models using Annual Data

### Train-(Validation)-Test Split

In [317]:
## set fire_tracts_annual index as year and tract, save as annual_data
annual_data = fire_tracts_annual.set_index(['year','tract'])

## Split annual_data into training, validation, and testing sets
train = annual_data.loc[:2016] ## Includes all incidents that occurred in 2013, 2014, 2015, & 2016
validation = annual_data.loc[2017] ## Includes all incidents that occurred in 2017
test = annual_data.loc[2018]       ## Includes all incidents that occurred in 2018

## Split training data into X_train (features) and y_train (target)
## Targets are selected as 1-D Series, the shape scikit-learn expects for y
X_train = train.drop(columns = 'count')
y_train = train['count']

## Split validation data into X_validation (features) and y_validation (target)
X_validation = validation.drop(columns = 'count')
y_validation = validation['count']

## Split testing data into X_test (features) and y_test (target)
X_test = test.drop(columns = 'count')
y_test = test['count']
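
The year-based split relies on label slicing over the first level of a `(year, tract)` MultiIndex. The sketch below illustrates that behavior on a tiny made-up frame. The values, the `feature_a` column, and the `toy_` names are hypothetical, and the explicit `sort_index()` is an added precaution, since label slices on a MultiIndex generally require a sorted index.

```python
import pandas as pd

# Toy stand-in for fire_tracts_annual (hypothetical values)
toy = pd.DataFrame({
    'year':      [2015, 2015, 2016, 2016, 2017, 2017, 2018, 2018],
    'tract':     [101, 102, 101, 102, 101, 102, 101, 102],
    'feature_a': [1.0, 2.0, 1.5, 2.5, 1.2, 2.2, 1.1, 2.1],
    'count':     [0, 3, 1, 2, 0, 4, 2, 1],
})

# Index by (year, tract) and sort so label slicing on year is well defined
toy_data = toy.set_index(['year', 'tract']).sort_index()

toy_train = toy_data.loc[:2016]      # every year up to and including 2016
toy_validation = toy_data.loc[2017]  # 2017 only; the year level is dropped
toy_test = toy_data.loc[2018]        # 2018 only; the year level is dropped

# Features and a 1-D target
X_toy_train = toy_train.drop(columns='count')
y_toy_train = toy_train['count']

print(toy_train.index.get_level_values('year').unique().tolist())
print(toy_validation.index.tolist())
```

Splitting by year keeps later years out of the training data, so validation and test scores reflect performance on years the model has not seen.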

### RandomForestRegressor
#### Predicting Number of Fire Incidents in Census Tracts Over a Year

In [316]:
## Instantiate RandomForestRegressor as rf_reg
rf_reg = RandomForestRegressor(n_estimators=1000,
                               n_jobs = 3,
                               verbose = 1)

## Fit rf_reg model on training data
rf_reg.fit(X_train,y_train)

## Generate predictions on validation data
rf_reg_pred_val = rf_reg.predict(X_validation)

## Generate list of features and their importance in model
feature_importance = pd.Series(data = rf_reg.feature_importances_,
                               index = train.drop(columns = ['count']).columns)

## Print list of features sorted by importance
print(feature_importance.sort_values(ascending = False))

## Define function to calculate Adjusted R-Squared
def r2_adj(r_sq, N, p):
    adj_score = 1 - ((1-r_sq)*(N-1)/(N-p-1))
    return adj_score

## Compute R-Squared once, then print it along with the Adjusted R-Squared Score
r_sq = rf_reg.score(X_validation, y_validation)
print(f'R-Squared Score: {r_sq}')
print(f'Adjusted R-Squared Score: {r2_adj(r_sq, len(X_validation), X_validation.shape[1])}')


tot_UnitsTotal                     0.130093
avg_YearBuilt                      0.060655
tot_BldgArea                       0.043795
Total housing units                0.031210
tot_LandUse_08                     0.026136
pct_Fuel oil kerosene etc.         0.020206
pct_No fuel used                   0.018587
pct_Renter-occupied                0.018372
pct_4 rooms                        0.015639
ratio_StrgeArea                    0.015577
tot_LandUse_Parking_Garage         0.012876
pct_Built 1960 to 1969             0.012526
pct_Owner-occupied                 0.012460
pct_Built 1939 or earlier          0.012327
tot_LandUse_04                     0.012160
pct_Built 1970 to 1979             0.012122
pct_3 rooms                        0.012044
pct_10 to 19 units                 0.011778
tot_LandUse_06                     0.010724
pct_5 rooms                        0.010486
pct_Built 1980 to 1989             0.010354
pct_Built 1990 to 1999             0.010245
tot_ProxCode_Attached           

R-Squared Score: 0.7110997403848563


Adjusted R-Squared Score: 0.697796044474319
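
For reference, the `r2_adj` helper defined in the cell above corresponds to the usual adjusted R-squared formula, written here with the same argument names: $R^2$ is `r_sq`, $N$ is the number of scored observations, and $p$ is the number of features.

$$
R^2_{\text{adj}} = 1 - \left(1 - R^2\right)\frac{N - 1}{N - p - 1}
$$

Because $\frac{N-1}{N-p-1} > 1$ whenever $p \geq 1$, the adjusted score is always lower than $R^2$ when $R^2 < 1$, and the gap widens as more features are used relative to the number of observations.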


### Classification Models
#### Predict if a census tract will have a fire in a given year

#### Train-(Validation)-Test Split (for Classification)

In [269]:
## Create copy of annual_data for classification called annual_data_clf
annual_data_clf = annual_data.copy()

## Create binary column fire that contains a 1 if there was a fire in a census tract for a given year, 0 if not
annual_data_clf['fire'] = annual_data_clf['count'].map(lambda x: 1 if x>0 else 0)

## Split annual_data_clf into training, validation, and testing sets
train = annual_data_clf.loc[:2016]     ## Includes all incidents that occurred in 2013, 2014, 2015, & 2016
validation = annual_data_clf.loc[2017] ## Includes all incidents that occurred in 2017
test = annual_data_clf.loc[2018]       ## Includes all incidents that occurred in 2018

cols = np.array(train.columns)

## Split training data into X_train (features) and y_train (target)
X_train = np.array(train.drop(columns = ['count','fire']))
y_train = np.array(train['fire'])

## Split validation data into X_validation (features) and y_validation (target)
X_validation = np.array(validation.drop(columns = ['count','fire']))
y_validation = np.array(validation['fire'])

## Split testing data into X_test (features) and y_test (target)
X_test = np.array(test.drop(columns = ['count','fire']))
y_test = np.array(test['fire'])

In [277]:
print(f"Training Data Fire Percentage: {train['fire'].value_counts(normalize = True)[1]}")

Training Data Fire Percentage: 0.7729272811486799


In [281]:
print(f"Validation Data Fire Percentage: {validation['fire'].value_counts(normalize = True)[1]}")

Validation Data Fire Percentage: 0.7647058823529411


In [280]:
print(f"Testing Data Fire Percentage: {test['fire'].value_counts(normalize = True)[1]}")

Testing Data Fire Percentage: 0.6892079666512274


#### RandomForestClassifier

In [286]:
## Instantiate RandomForestClassifier as rf
rf = RandomForestClassifier(n_estimators=200,
                            max_depth=25,
                            random_state=42,
                            verbose = 1)
## Fit model on training data
rf.fit(X_train,y_train)

## Generate predictions on validation data
rf_pred = rf.predict(X_validation)

## Generate confusion matrix using predictions and y_validation
rf_cm = confusion_matrix(y_validation,rf_pred)
rf_cm_df = pd.DataFrame(rf_cm,
                        index = ['actual no fire', 'actual fire'],
                        columns = ['predicted no fire', 'predicted fire'])

## Calculate Kappa Coefficient (measures inter-rater agreement for categorical values)
rf_kappa = cohen_kappa_score(y_validation,rf_pred)

## Calculate false positive rate and true positive rate with roc_curve
fpr, tpr, thresholds = roc_curve(y_validation,
                                 rf_pred,
                                 pos_label=1)
## Calculate ROC_AUC
roc_auc = auc(fpr,tpr)

## Use confusion matrix (and true positive rate) to calculate accuracy, recall, and precision
rf_accuracy = float(rf_cm[0][0] + rf_cm[1][1])/ len(y_validation)
rf_recall = tpr[1]
rf_precision = float(rf_cm[1][1])/ float(rf_cm[1][1]+rf_cm[0][1])

print(f'Accuracy: {rf_accuracy}')
print(f'Recall: {rf_recall}')
print(f'Precision: {rf_precision}')
print(f'Kappa Score: {rf_kappa}')
print(f'AUC Score: {roc_auc}')
print()
print('Confusion Matrix:')
print(rf_cm_df)
print()
print()

## Generate list of feature importances and sort by importance
feature_importance = pd.Series(data = rf.feature_importances_,
                               index = train.drop(columns = ['count','fire']).columns)
print('Feature Importances:')
print(feature_importance.sort_values(ascending = False))


Accuracy: 0.8828161185734136
Recall: 0.9466989703210176
Precision: 0.9045138888888888
Kappa Score: 0.6563324011186575
AUC Score: 0.8109479103573591

Confusion Matrix:
                predicted no fire  predicted fire
actual no fire                343             165
actual fire                    88            1563


Feature Importances:
tot_BldgArea                       0.028239
tot_BsmtCode_No_Bsmt               0.022917
tot_UnitsTotal                     0.022342
tot_UnitsRes                       0.018896
tot_ResArea                        0.018308
Total housing units                0.017611
tot_ProxCode_NA                    0.016067
tot_Ext_No_Ext_Gar                 0.015100
pct_Fuel oil kerosene etc.         0.014393
tot_OtherArea                      0.014382
tot_BsmtCode_Unknown               0.014369
tot_ComArea                        0.013845
pct_5 rooms                        0.013821
avg_YearBuilt                      0.013665
tot_NumBldgs                       0.013650
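
The kappa score reported above can be reproduced from the confusion matrix alone, which makes it clearer what the coefficient measures: agreement between actual and predicted labels beyond what the row and column totals would produce by chance. The following is a minimal hand-rolled sketch, not scikit-learn's implementation; the function name `kappa_from_confusion` and the variable `rf_cm_values` are hypothetical, and the matrix values are copied from the validation output above.

```python
def kappa_from_confusion(cm):
    """Cohen's kappa for a square confusion matrix given as nested lists.

    Rows are actual classes, columns are predicted classes.
    (Hypothetical helper for illustration only.)
    """
    n = sum(sum(row) for row in cm)
    k = len(cm)
    # Observed agreement: share of samples on the diagonal (this is accuracy).
    p_observed = sum(cm[i][i] for i in range(k)) / n
    # Chance agreement: for each class, P(actual = class) * P(predicted = class).
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Validation confusion matrix copied from the RandomForestClassifier output.
rf_cm_values = [[343, 165],
                [88, 1563]]
kappa = kappa_from_confusion(rf_cm_values)
print(f'Kappa from confusion matrix: {kappa:.4f}')
```

The result rounds to 0.6563, matching the `Kappa Score` printed by `cohen_kappa_score`. Because the "fire" class dominates the validation set, chance agreement is already high, which is why kappa sits well below the raw accuracy.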


#### GridSearch (RandomForestClassifier)

In [287]:
## TimeSeries Cross Validation
tscv = TimeSeriesSplit(n_splits = 3)

## Gridsearch Parameters for RandomForestClassifier
params = {'n_estimators':[40,50,60,100],
          'max_depth':[None,10,12,15],
          'max_features':[None,'sqrt','log2']}

## Scoring Metric is F1-Score (to balance precision & recall)
score = 'f1'

## Instantiate RandomForestClassifier as rf
rf = RandomForestClassifier()

## Instantiate GridSearchCV as gs
gs = GridSearchCV(rf, param_grid = params,
                  cv = tscv, scoring = score,
                  verbose = 2, n_jobs = 3)

## Fit GridSearch on training data
gs.fit(X_train,y_train)

print(f'Train - Best Parameters: {gs.best_params_}')
print()

means = gs.cv_results_['mean_test_score']
stds = gs.cv_results_['std_test_score']

print('GridSearch (Train) - Mean F1 Scores & Standard Deviations for Parameters:')
print()
## Print out mean F1-Scores with standard deviation of scores for each set of parameters
for mean,std,param in zip(means,stds,gs.cv_results_['params']):
    print(f'{round(mean,3)} (+/- {round(std*2,3)}) for {param}')

print()
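## Classification report on validation data
## (gs.predict uses the best estimator, refit on all of X_train)
print('Classification Report:')
y_pred = gs.predict(X_validation)
print(classification_report(y_validation,y_pred))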

Fitting 3 folds for each of 48 candidates, totalling 144 fits


[Parallel(n_jobs=3)]: Using backend LokyBackend with 3 concurrent workers.
[Parallel(n_jobs=3)]: Done  35 tasks      | elapsed:  3.3min
[Parallel(n_jobs=3)]: Done 144 out of 144 | elapsed: 10.3min finished


Train - Best Parameters: {'max_depth': 15, 'max_features': 'sqrt', 'n_estimators': 100}

GridSearch (Train) - Mean F1 Scores & Standard Deviations for Parameters:

0.919 (+/- 0.007) for {'max_depth': None, 'max_features': None, 'n_estimators': 40}
0.918 (+/- 0.003) for {'max_depth': None, 'max_features': None, 'n_estimators': 50}
0.918 (+/- 0.003) for {'max_depth': None, 'max_features': None, 'n_estimators': 60}
0.92 (+/- 0.009) for {'max_depth': None, 'max_features': None, 'n_estimators': 100}
0.919 (+/- 0.006) for {'max_depth': None, 'max_features': 'sqrt', 'n_estimators': 40}
0.918 (+/- 0.003) for {'max_depth': None, 'max_features': 'sqrt', 'n_estimators': 50}
0.919 (+/- 0.006) for {'max_depth': None, 'max_features': 'sqrt', 'n_estimators': 60}
0.92 (+/- 0.008) for {'max_depth': None, 'max_features': 'sqrt', 'n_estimators': 100}
0.918 (+/- 0.003) for {'max_depth': None, 'max_features': 'log2', 'n_estimators': 40}
0.92 (+/- 0.008) for {'max_depth': None, 'max_features': 'log2', 'n_es
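
The grid search above scores each candidate with `TimeSeriesSplit(n_splits = 3)`, so every fold trains on earlier rows and tests on the rows that follow. The sketch below illustrates that expanding-window idea in plain Python. It is intended to mirror the default behaviour of `TimeSeriesSplit` (no gap, default test size), but it is a hand-rolled illustration: `expanding_window_splits` is a hypothetical helper, and it assumes the rows are already sorted chronologically.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) with a growing training window.

    Hypothetical helper: each test block directly follows its training block,
    so a fold is never trained on rows that come after its test rows.
    """
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train_idx = list(range(test_start))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


# Toy example with 8 time-ordered rows and 3 splits.
splits = list(expanding_window_splits(8, 3))
for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f'Fold {fold}: train={train_idx} test={test_idx}')
```

The first fold trains on the fewest rows, so its score is usually the noisiest of the three; this is one reason to report the standard deviation alongside the mean F1-Score.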

#### AdaBoostClassifier

In [288]:
## Instantiate AdaBoostClassifier as ada
ada = AdaBoostClassifier(n_estimators = 1000,
                         random_state = 42,
                         algorithm = 'SAMME.R')

## Fit AdaBoostClassifier on training data
ada.fit(X_train,y_train)

## Generate predictions on test data
ada_pred = ada.predict(X_test)

## Generate confusion matrix on test data and predictions
ada_cm = confusion_matrix(y_test,ada_pred)
ada_cm_df = pd.DataFrame(ada_cm,
                        index = ['actual no fire', 'actual fire'],
                        columns = ['predicted no fire', 'predicted fire'])

## Calculate Kappa Coefficient
ada_kappa = cohen_kappa_score(y_test,ada_pred)

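## Calculate false positive rate and true positive rate using roc_curve
## Note: ada_pred holds hard 0/1 labels, so the curve has a single operating point
## and the AUC below equals (TPR + TNR) / 2; pass ada.predict_proba(X_test)[:, 1]
## instead for a score-based AUC
fpr, tpr, thresholds = roc_curve(y_test,
                                 ada_pred,
                                 pos_label=1)
## Calculate roc_auc
roc_auc = auc(fpr,tpr)

## Calculate accuracy, recall, and precision using confusion matrix
ada_accuracy = float(ada_cm[0][0] + ada_cm[1][1])/ len(y_test)
ada_recall = float(ada_cm[1][1])/ float(ada_cm[1][1]+ada_cm[1][0])
ada_precision = float(ada_cm[1][1])/ float(ada_cm[1][1]+ada_cm[0][1])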

print(f'Accuracy: {ada_accuracy}')
print(f'Recall: {ada_recall}')
print(f'Precision: {ada_precision}')
print(f'Kappa Score: {ada_kappa}')
print(f'AUC Score: {roc_auc}')
print()
print('Confusion Matrix:')
ada_cm_df

Accuracy: 0.8036127836961556
Recall: 0.9603494623655914
Precision: 0.7965440356744704
Kappa Score: 0.4759750734732616
AUC Score: 0.7081926149383844

Confusion Matrix:


                predicted no fire  predicted fire
actual no fire                306             365
actual fire                    59            1429
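
The AUC values in this section are computed from `predict` output, that is, hard 0/1 labels rather than predicted probabilities. In that case the ROC curve has a single operating point and the AUC reduces to the mean of the true positive rate and the true negative rate. The sketch below demonstrates the difference on toy data (not taken from the fire dataset), using a rank-based definition of AUC; `rank_auc` is a hypothetical helper written for illustration, not a scikit-learn function.

```python
def rank_auc(y_true, scores):
    """ROC AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative count as half a win.
    (Hypothetical helper for illustration only.)
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy labels and predicted probabilities (assumed values, for illustration).
y_true = [0, 0, 1, 1, 1]
probabilities = [0.2, 0.6, 0.55, 0.7, 0.9]
# Thresholding at 0.5 is what a hard-label prediction amounts to.
hard_labels = [1 if p >= 0.5 else 0 for p in probabilities]

auc_from_scores = rank_auc(y_true, probabilities)
auc_from_labels = rank_auc(y_true, hard_labels)

# With hard labels the AUC collapses to the mean of TPR and TNR.
tpr = sum(1 for y, h in zip(y_true, hard_labels) if y == 1 and h == 1) / y_true.count(1)
tnr = sum(1 for y, h in zip(y_true, hard_labels) if y == 0 and h == 0) / y_true.count(0)

print(f'AUC from probabilities: {auc_from_scores:.3f}')
print(f'AUC from hard labels:   {auc_from_labels:.3f}')
print(f'(TPR + TNR) / 2:        {(tpr + tnr) / 2:.3f}')
```

For a fitted scikit-learn classifier, the usual score input is the positive-class column of `predict_proba`, for example `ada.predict_proba(X_test)[:, 1]`. The hard-label AUC is still a legitimate number (it is the balanced accuracy), but it should not be compared with score-based AUCs reported elsewhere.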


#### XGBoostClassifier

In [289]:
## Instantiate XGBClassifier as xgb
xgb = XGBClassifier(learning_rate=0.1,
                    n_estimators=1000,
                    objective='binary:logistic',
                    n_jobs = 3,
                    random_state = 42)

## Fit XGBClassifier on training data
xgb.fit(X_train,y_train)

## Generate predictions using test data
xgb_pred = xgb.predict(X_test)

## Generate confusion matrix using y_test and predictions
xgb_cm = confusion_matrix(y_test,xgb_pred)
xgb_cm_df = pd.DataFrame(xgb_cm,
                        index = ['actual no fire', 'actual fire'],
                        columns = ['predicted no fire', 'predicted fire'])

## Calculate Kappa Coefficient
xgb_kappa = cohen_kappa_score(y_test,xgb_pred)

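## Calculate false positive rate, true positive rate using roc_curve
## Note: xgb_pred holds hard 0/1 labels, so the curve has a single operating point
## and the AUC below equals (TPR + TNR) / 2; pass xgb.predict_proba(X_test)[:, 1]
## instead for a score-based AUC
fpr, tpr, thresholds = roc_curve(y_test,
                                 xgb_pred,
                                 pos_label=1)
## Calculate roc_auc
roc_auc = auc(fpr,tpr)

## Calculate accuracy, recall, and precision using confusion matrix
xgb_accuracy = float(xgb_cm[0][0] + xgb_cm[1][1])/ len(y_test)
xgb_recall = float(xgb_cm[1][1])/ float(xgb_cm[1][1]+xgb_cm[1][0])
xgb_precision = float(xgb_cm[1][1])/ float(xgb_cm[1][1]+xgb_cm[0][1])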

print(f'Accuracy: {xgb_accuracy}')
print(f'Recall: {xgb_recall}')
print(f'Precision: {xgb_precision}')
print(f'Kappa Score: {xgb_kappa}')
print(f'AUC Score: {roc_auc}')
print()
print('Confusion Matrix:')
xgb_cm_df

Accuracy: 0.8276980083371931
Recall: 0.9630376344086021
Precision: 0.8188571428571428
Kappa Score: 0.5495125198980055
AUC Score: 0.745304212137237

Confusion Matrix:


                predicted no fire  predicted fire
actual no fire                354             317
actual fire                    55            1433
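
The grid search earlier in this section optimizes F1-Score, but the AdaBoost and XGBoost cells report precision and recall separately. As a rough way to put the two test-set results on that same scale, F1 can be derived directly from the printed confusion matrices. This is a minimal sketch: `f1_from_confusion` and `test_matrices` are hypothetical names, and the counts are copied from the outputs above.

```python
def f1_from_confusion(cm):
    """F1 for the positive class of a 2x2 matrix laid out as [[TN, FP], [FN, TP]].

    Hypothetical helper; equivalent to the harmonic mean of precision and recall.
    """
    fp = cm[0][1]
    fn = cm[1][0]
    tp = cm[1][1]
    return 2 * tp / (2 * tp + fp + fn)


# Test-set confusion matrices copied from the AdaBoost and XGBoost outputs.
test_matrices = {
    'AdaBoost': [[306, 365], [59, 1429]],
    'XGBoost': [[354, 317], [55, 1433]],
}

for name, cm in test_matrices.items():
    print(f'{name} test F1: {f1_from_confusion(cm):.4f}')
```

These test-set values are not directly comparable with the RandomForestClassifier figures above, which were computed on the validation set.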
