<a href="https://colab.research.google.com/github/JimenaBaripatti/FeatureEngineering/blob/main/Fire_Data_Exploration.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Data source: https://open.toronto.ca/dataset/fire-incidents/

This dataset provides information similar to what is sent to the Ontario Fire Marshal. It covers only the fire incidents to which Toronto Fire Services responds, and it describes them in more detail than the dataset that includes all incident types. Only fire incidents as defined by the Ontario Fire Marshal are included. For privacy purposes, personal information is not provided and exact addresses have been aggregated to the nearest major or minor intersection. Some incidents have been excluded pursuant to exemptions under Section 8 of the Municipal Freedom of Information and Protection of Privacy Act (MFIPPA).




In [1]:
# setting up libraries
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm 
import statistics
from sklearn.metrics import matthews_corrcoef
from scipy.stats import chi2_contingency
import math
from patsy import dmatrices
from statsmodels.stats.outliers_influence import variance_inflation_factor
from sklearn.ensemble import IsolationForest
from scipy.stats import zscore

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)  # None disables truncation; -1 is deprecated

%matplotlib inline



In [7]:
# Read the fire incidents data from a local CSV file
fire = pd.read_csv('Fire_Incidents_Data.csv')

In [8]:
fire.head()

Unnamed: 0,_id,Area_of_Origin,Building_Status,Business_Impact,Civilian_Casualties,Count_of_Persons_Rescued,Estimated_Dollar_Loss,Estimated_Number_Of_Persons_Displaced,Exposures,Ext_agent_app_or_defer_time,Extent_Of_Fire,Final_Incident_Type,Fire_Alarm_System_Impact_on_Evacuation,Fire_Alarm_System_Operation,Fire_Alarm_System_Presence,Fire_Under_Control_Time,Ignition_Source,Incident_Number,Incident_Station_Area,Incident_Ward,Initial_CAD_Event_Type,Intersection,Last_TFS_Unit_Clear_Time,Latitude,Level_Of_Origin,Longitude,Material_First_Ignited,Method_Of_Fire_Control,Number_of_responding_apparatus,Number_of_responding_personnel,Possible_Cause,Property_Use,Smoke_Alarm_at_Fire_Origin,Smoke_Alarm_at_Fire_Origin_Alarm_Failure,Smoke_Alarm_at_Fire_Origin_Alarm_Type,Smoke_Alarm_Impact_on_Persons_Evacuating_Impact_on_Evacuation,Smoke_Spread,Sprinkler_System_Operation,Sprinkler_System_Presence,Status_of_Fire_On_Arrival,TFS_Alarm_Time,TFS_Arrival_Time,TFS_Firefighter_Casualties
0,1946929,81 - Engine Area,,,0,0,15000.0,,,2018-02-25T02:12:00,,01 - Fire,,,,2018-02-25T02:15:40,999 - Undetermined,F18020956,441,1.0,Vehicle Fire,Dixon Rd / 427 N Dixon Ramp,2018-02-25T02:38:31,43.686558,,-79.599419,47 - Vehicle,1 - Extinguished by fire department,1,4,99 - Undetermined,"896 - Sidewalk, street, roadway, highway, hwy (do not use for fire incidents)",,,,,,,,"7 - Fully involved (total structure, vehicle, spreading outdoor fire)",2018-02-25T02:04:29,2018-02-25T02:10:11,0
1,1946930,"75 - Trash, rubbish area (outside)",,,0,0,50.0,,,2018-02-25T02:29:42,,01 - Fire,,,,2018-02-25T02:32:24,999 - Undetermined,F18020969,116,18.0,Fire - Grass/Rubbish,Sheppard Ave E / Clairtrell Rd,2018-02-25T02:35:58,43.766135,,-79.390039,97 - Other,1 - Extinguished by fire department,1,4,03 - Suspected Vandalism,"896 - Sidewalk, street, roadway, highway, hwy (do not use for fire incidents)",,,,,,,,2 - Fire with no evidence from street,2018-02-25T02:24:43,2018-02-25T02:29:31,0
2,1946931,,,,0,0,,,,,,"03 - NO LOSS OUTDOOR fire (exc: Sus.arson,vandal,child playing,recycling or dump fires)",,,,,,F18021182,221,21.0,Fire - Highrise Residential,Danforth Rd / Savarin St,2018-02-25T19:14:03,43.74323,,-79.245061,,,6,22,,891 - Outdoor general auto parking,,,,,,,,,2018-02-25T18:29:59,2018-02-25T18:36:49,0
3,1946932,"75 - Trash, rubbish area (outside)",01 - Normal (no change),1 - No business interruption,0,0,0.0,0.0,,2018-02-25T19:19:25,1 - Confined to object of origin,01 - Fire,9 - Undetermined,8 - Not applicable (no system),9 - Undetermined,2018-02-25T19:20:00,999 - Undetermined,F18021192,133,5.0,Fire - Commercial/Industrial,Keele St / Lawrence Ave W,2018-02-25T20:07:42,43.708659,999.0,-79.478062,99 - Undetermined (formerly 98),1 - Extinguished by fire department,6,22,99 - Undetermined,511 - Department Store,9 - Floor/suite of fire origin: Smoke alarm presence undetermined,98 - Not applicable: Alarm operated OR presence/operation undetermined,9 - Type undetermined,"8 - Not applicable: No alarm, no persons present",99 - Undetermined,8 - Not applicable - no sprinkler system present,9 - Undetermined,"3 - Fire with smoke showing only - including vehicle, outdoor fires",2018-02-25T19:13:39,2018-02-25T19:18:07,0
4,1946933,,,,0,0,,,,,,"03 - NO LOSS OUTDOOR fire (exc: Sus.arson,vandal,child playing,recycling or dump fires)",,,,,,F18021271,132,8.0,Fire - Residential,Replin Rd / Tapestry Lane,2018-02-25T23:34:24,43.718118,,-79.443184,,,6,22,,860 - Lawn around structure,,,,,,,,,2018-02-25T23:20:43,2018-02-25T23:26:19,0


In [9]:
fire.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17536 entries, 0 to 17535
Data columns (total 43 columns):
 #   Column                                                         Non-Null Count  Dtype  
---  ------                                                         --------------  -----  
 0   _id                                                            17536 non-null  int64  
 1   Area_of_Origin                                                 15623 non-null  object 
 2   Building_Status                                                11216 non-null  object 
 3   Business_Impact                                                11214 non-null  object 
 4   Civilian_Casualties                                            17536 non-null  int64  
 5   Count_of_Persons_Rescued                                       17536 non-null  int64  
 6   Estimated_Dollar_Loss                                          15627 non-null  float64
 7   Estimated_Number_Of_Persons_Displaced                     

In [10]:
# Total number of missing values for each feature
fire.isnull().sum()

_id                                                              0    
Area_of_Origin                                                   1913 
Building_Status                                                  6320 
Business_Impact                                                  6322 
Civilian_Casualties                                              0    
Count_of_Persons_Rescued                                         0    
Estimated_Dollar_Loss                                            1909 
Estimated_Number_Of_Persons_Displaced                            6321 
Exposures                                                        17203
Ext_agent_app_or_defer_time                                      1914 
Extent_Of_Fire                                                   6322 
Final_Incident_Type                                              0    
Fire_Alarm_System_Impact_on_Evacuation                           6322 
Fire_Alarm_System_Operation                                      6322 
Fire_A
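
The raw counts above are easier to compare as shares of the total row count; for example, `Exposures` is missing in 17203 of 17536 rows, roughly 98%. The following is a minimal sketch of such a summary. The helper name `missing_summary` is hypothetical and not part of the notebook, and the toy frame is a made-up stand-in for `fire`.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages per column, largest first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "n_missing": counts,
        "pct_missing": (counts / len(df) * 100).round(1),
    })
    return summary.sort_values("n_missing", ascending=False)


# Toy stand-in for `fire`; the values are invented for illustration only.
toy = pd.DataFrame({
    "_id": [1, 2, 3, 4],
    "Area_of_Origin": ["81 - Engine Area", None,
                       "75 - Trash, rubbish area (outside)", None],
    "Exposures": [np.nan, np.nan, np.nan, 1.0],
})

summary = missing_summary(toy)
print(summary)
```

On the real data the call would be `missing_summary(fire)`.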

In [13]:
#explore dates

fire['Ext_agent_app_or_defer_time'].sample(n=10)

12651    2012-12-29T00:26:20
2131     2019-06-10T12:45:00
15952    2016-03-10T14:00:40
9429     2011-08-17T19:52:35
9179     2012-10-20T07:57:38
6417     2011-08-29T04:17:30
4802     2018-03-22T04:52:23
8467     2016-07-21T19:00:00
10050    2015-03-12T18:49:12
9605     2016-06-16T21:26:00
Name: Ext_agent_app_or_defer_time, dtype: object

In [14]:
dates = pd.to_datetime(fire['Ext_agent_app_or_defer_time'])

In [15]:
print(dates.min(), dates.max())

2011-01-01 05:15:59 2019-07-01 02:58:00
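
The same parsing could be applied to the other timestamp columns that appear in the `fire.head()` output. The sketch below assumes they all share the ISO-like format sampled above. The helper `parse_time_columns` is hypothetical, and `errors="coerce"` is a choice made here so that unparseable strings become `NaT` instead of raising. The toy rows reuse the first three records shown earlier.

```python
import pandas as pd

# Timestamp columns as named in the fire incidents header.
TIME_COLS = [
    "TFS_Alarm_Time",
    "TFS_Arrival_Time",
    "Ext_agent_app_or_defer_time",
    "Fire_Under_Control_Time",
    "Last_TFS_Unit_Clear_Time",
]


def parse_time_columns(df, cols=TIME_COLS):
    """Return a copy of df with the listed columns converted to datetime64."""
    out = df.copy()
    for col in cols:
        if col in out.columns:
            out[col] = pd.to_datetime(out[col], errors="coerce")
    return out


# First three records from the head() output; the third has no extinguishing time.
toy = pd.DataFrame({
    "TFS_Alarm_Time": ["2018-02-25T02:04:29", "2018-02-25T02:24:43", "2018-02-25T18:29:59"],
    "TFS_Arrival_Time": ["2018-02-25T02:10:11", "2018-02-25T02:29:31", "2018-02-25T18:36:49"],
    "Ext_agent_app_or_defer_time": ["2018-02-25T02:12:00", "2018-02-25T02:29:42", None],
})

parsed = parse_time_columns(toy)
for col in parsed.columns:
    # min() and max() skip NaT values by default.
    print(col, parsed[col].min(), parsed[col].max())
```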


Additional dataset - Toronto weather

https://www.weatherstats.ca/faq/#download-columns-ne

Column labels in the downloaded data for normals and extremes:

Suffix	Meaning
_v	Calculated value (max, min or mean)
_s	Standard deviation of mean
_c	Count of (number of) values included
_d	Date range for values
_y	Years where extreme occurred (limited to first 40)

For monthly normals and extremes, the date is always listed as the first day of the month. However, the data covers the first through the last day of that month (or through the current day for the ongoing month).
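
The suffix convention makes it straightforward to group the weather columns by what they hold, for instance to keep only the calculated values (`_v`). This is a rough sketch: the helper `columns_by_suffix` and the descriptive labels in `SUFFIXES` are hypothetical, and the column names are a handful taken from the weather file's `info()` listing.

```python
# Suffixes from the weatherstats FAQ, mapped to short labels chosen here.
SUFFIXES = {
    "_v": "value",
    "_s": "std_dev",
    "_c": "count",
    "_d": "date_range",
    "_y": "extreme_years",
}


def columns_by_suffix(columns):
    """Group column names by their trailing suffix; unmatched names are skipped."""
    groups = {suffix: [] for suffix in SUFFIXES}
    for col in columns:
        for suffix in SUFFIXES:
            if col.endswith(suffix):
                groups[suffix].append(col)
                break
    return groups


cols = [
    "date",
    "max_dew_point_v", "max_dew_point_s", "max_dew_point_c", "max_dew_point_d",
    "max_temperature_v", "max_temperature_s", "max_temperature_c", "max_temperature_d",
]

groups = columns_by_suffix(cols)
print(groups["_v"])
```

With the full frame loaded, `weather_to[["date"] + columns_by_suffix(weather_to.columns)["_v"]]` would keep the date plus the calculated values.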

In [16]:
# source: https://toronto.weatherstats.ca/download.html
weather_to = pd.read_csv('weatherstats_toronto_normal_daily.csv')

In [17]:
weather_to.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30695 entries, 0 to 30694
Data columns (total 53 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   date                     30695 non-null  object 
 1   max_dew_point_v          24796 non-null  float64
 2   max_dew_point_s          24796 non-null  float64
 3   max_dew_point_c          24796 non-null  float64
 4   max_dew_point_d          24796 non-null  object 
 5   max_relative_humidity_v  24796 non-null  float64
 6   max_relative_humidity_s  24796 non-null  float64
 7   max_relative_humidity_c  24796 non-null  float64
 8   max_relative_humidity_d  24796 non-null  object 
 9   max_temperature_v        30336 non-null  float64
 10  max_temperature_s        30336 non-null  float64
 11  max_temperature_c        30336 non-null  float64
 12  max_temperature_d        30336 non-null  object 
 13  max_wind_speed_v         24796 non-null  float64
 14  max_wind_speed_s      

In [18]:
weather_dates = pd.to_datetime(weather_to['date'])

In [19]:
print(weather_dates.min(), weather_dates.max())

1937-10-31 00:00:00 2021-11-13 00:00:00
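
The weather file spans 1937 to 2021, while the fire timestamps printed earlier run from 2011-01-01 to 2019-07-01, so only part of the weather data overlaps with the incidents. One possible way to keep just the overlapping days is sketched below. The helper `restrict_to_window` is hypothetical and the temperature values in the toy frame are invented; this is not necessarily how the two datasets are combined later.

```python
import pandas as pd


def restrict_to_window(weather, start, end, date_col="date"):
    """Keep weather rows whose calendar date lies within [start, end]."""
    dates = pd.to_datetime(weather[date_col])
    # normalize() drops the time of day so whole days are compared.
    lo = pd.Timestamp(start).normalize()
    hi = pd.Timestamp(end).normalize()
    mask = (dates >= lo) & (dates <= hi)
    return weather.loc[mask].reset_index(drop=True)


# Toy stand-in for `weather_to`; temperatures are made up.
toy_weather = pd.DataFrame({
    "date": ["1937-10-31", "2011-01-01", "2015-06-15", "2019-07-01", "2021-11-13"],
    "max_temperature_v": [10.0, -1.0, 24.0, 26.0, 8.0],
})

# Bounds taken from the fire timestamp range printed above.
window = restrict_to_window(toy_weather, "2011-01-01 05:15:59", "2019-07-01 02:58:00")
print(window)
```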
