### Data
1. Fire Incident Dispatch Data (NYC Open Data): https://data.cityofnewyork.us/Public-Safety/Fire-Incident-Dispatch-Data/8m42-w767/about_data

2. Incidents Responded to by Fire Companies (NYC Open Data): https://data.cityofnewyork.us/Public-Safety/Incidents-Responded-to-by-Fire-Companies/tm6d-hbzd/about_data

3. In-Service Alarm Box Locations (NYC Open Data): https://data.cityofnewyork.us/Public-Safety/In-Service-Alarm-Box-Locations/v57i-gtxb/data_preview

In [1]:
import pandas as pd
import requests
from sklearn.datasets import make_classification
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
import warnings
import geopandas as gpd
from urllib.request import urlopen 
import json
import urllib
warnings.filterwarnings('ignore')
from zipfile import ZipFile

#### Incidents Responded to by Fire Companies

In [2]:
# Read Data 
IncidentsRespTo = pd.read_csv('Incidents_Responded_to_by_Fire_Companies.csv')

In [3]:
IncidentsRespTo.head()

Unnamed: 0,IM_INCIDENT_KEY,FIRE_BOX,INCIDENT_TYPE_DESC,INCIDENT_DATE_TIME,ARRIVAL_DATE_TIME,UNITS_ONSCENE,LAST_UNIT_CLEARED_DATE_TIME,HIGHEST_LEVEL_DESC,TOTAL_INCIDENT_DURATION,ACTION_TAKEN1_DESC,...,ZIP_CODE,BOROUGH_DESC,FLOOR,CO_DETECTOR_PRESENT_DESC,FIRE_ORIGIN_BELOW_GRADE_FLAG,STORY_FIRE_ORIGIN_COUNT,FIRE_SPREAD_DESC,DETECTOR_PRESENCE_DESC,AES_PRESENCE_DESC,STANDPIPE_SYS_PRESENT_FLAG
0,55672688,2147,"300 - Rescue, EMS incident, other",01/01/2013 12:00:20 AM,01/01/2013 12:14:23 AM,1.0,01/01/2013 12:20:06 AM,"1 - More than initial alarm, less than Signal 7-5",1186.0,"00 - Action taken, other",...,10454.0,2 - Bronx,,,,,,,,
1,55672692,818,735A - Unwarranted alarm/defective condition o...,01/01/2013 12:00:37 AM,01/01/2013 12:09:03 AM,3.0,01/01/2013 12:30:06 AM,"1 - More than initial alarm, less than Signal 7-5",1769.0,86 - Investigate,...,10036.0,1 - Manhattan,,,,,,,,
2,55672693,9656,"300 - Rescue, EMS incident, other",01/01/2013 12:01:17 AM,01/01/2013 12:04:55 AM,1.0,01/01/2013 12:15:18 AM,"1 - More than initial alarm, less than Signal 7-5",841.0,"00 - Action taken, other",...,11418.0,5 - Queens,,,,,,,,
3,55672695,7412,412 - Gas leak (natural gas or LPG),01/01/2013 12:02:32 AM,01/01/2013 12:07:48 AM,4.0,01/01/2013 12:40:11 AM,"1 - More than initial alarm, less than Signal 7-5",2259.0,44 - Hazardous materials leak control & contai...,...,11103.0,5 - Queens,1.0,,,,,,,
4,55672697,4019,735A - Unwarranted alarm/defective condition o...,01/01/2013 12:01:49 AM,01/01/2013 12:06:27 AM,6.0,01/01/2013 12:24:56 AM,"1 - More than initial alarm, less than Signal 7-5",1387.0,86 - Investigate,...,11385.0,5 - Queens,,,,,,,,


In [5]:
# Check data Shape
IncidentsRespTo.shape

(5093942, 24)

In [6]:
# Check data attributes
IncidentsRespTo.columns

Index(['IM_INCIDENT_KEY', 'FIRE_BOX', 'INCIDENT_TYPE_DESC',
       'INCIDENT_DATE_TIME', 'ARRIVAL_DATE_TIME', 'UNITS_ONSCENE',
       'LAST_UNIT_CLEARED_DATE_TIME', 'HIGHEST_LEVEL_DESC',
       'TOTAL_INCIDENT_DURATION', 'ACTION_TAKEN1_DESC', 'ACTION_TAKEN2_DESC',
       'ACTION_TAKEN3_DESC', 'PROPERTY_USE_DESC', 'STREET_HIGHWAY', 'ZIP_CODE',
       'BOROUGH_DESC', 'FLOOR', 'CO_DETECTOR_PRESENT_DESC',
       'FIRE_ORIGIN_BELOW_GRADE_FLAG', 'STORY_FIRE_ORIGIN_COUNT',
       'FIRE_SPREAD_DESC', 'DETECTOR_PRESENCE_DESC', 'AES_PRESENCE_DESC',
       'STANDPIPE_SYS_PRESENT_FLAG'],
      dtype='object')

In [7]:
# Checking the percentage of missing data of each column
missing_data_percentage=round(IncidentsRespTo.isnull().sum()/len(IncidentsRespTo)*100).sort_values(ascending=False)
missing_data_percentage

STANDPIPE_SYS_PRESENT_FLAG      100.0
AES_PRESENCE_DESC               100.0
DETECTOR_PRESENCE_DESC          100.0
FIRE_SPREAD_DESC                100.0
STORY_FIRE_ORIGIN_COUNT         100.0
FIRE_ORIGIN_BELOW_GRADE_FLAG    100.0
CO_DETECTOR_PRESENT_DESC         99.0
ACTION_TAKEN3_DESC               91.0
FLOOR                            79.0
ACTION_TAKEN2_DESC               78.0
ARRIVAL_DATE_TIME                 3.0
UNITS_ONSCENE                     3.0
STREET_HIGHWAY                    1.0
BOROUGH_DESC                      0.0
IM_INCIDENT_KEY                   0.0
ZIP_CODE                          0.0
FIRE_BOX                          0.0
ACTION_TAKEN1_DESC                0.0
TOTAL_INCIDENT_DURATION           0.0
HIGHEST_LEVEL_DESC                0.0
LAST_UNIT_CLEARED_DATE_TIME       0.0
INCIDENT_DATE_TIME                0.0
INCIDENT_TYPE_DESC                0.0
PROPERTY_USE_DESC                 0.0
dtype: float64

Fields with a missing data percentage over 90% are assumed to be not useful.

In [8]:
# Drop columns 
drop_cols = ['STANDPIPE_SYS_PRESENT_FLAG','AES_PRESENCE_DESC','DETECTOR_PRESENCE_DESC',
           'FIRE_SPREAD_DESC','STORY_FIRE_ORIGIN_COUNT','FIRE_ORIGIN_BELOW_GRADE_FLAG',
           'CO_DETECTOR_PRESENT_DESC','ACTION_TAKEN3_DESC',]
IncidentsRespTo.drop(drop_cols, axis=1,inplace=True)
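
The column list above is typed by hand from the missing-data table. As a minimal sketch, the same kind of list could be derived directly from the 90% threshold. The helper name `columns_above_missing_threshold` and the toy frame are illustrative assumptions, not part of the original notebook, and the sketch compares unrounded fractions rather than rounded percentages.

```python
import numpy as np
import pandas as pd

def columns_above_missing_threshold(df: pd.DataFrame, threshold: float = 0.90) -> list:
    """Return the columns whose share of missing values exceeds `threshold`."""
    missing_share = df.isna().mean()  # fraction of nulls per column
    return missing_share[missing_share > threshold].index.tolist()

# Toy frame standing in for IncidentsRespTo (values are made up)
toy = pd.DataFrame({
    "IM_INCIDENT_KEY": range(20),
    "FLOOR": [np.nan] * 16 + ["1"] * 4,   # 80% missing, stays
    "AES_PRESENCE_DESC": [np.nan] * 20,   # 100% missing, goes
})

drop_cols_sketch = columns_above_missing_threshold(toy)
toy_clean = toy.drop(columns=drop_cols_sketch)
print(drop_cols_sketch)
```

Deriving the list this way keeps the stated threshold and the dropped columns in sync if the source file is refreshed.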

Some records report units on scene (UNITS_ONSCENE is populated) but have no arrival time (ARRIVAL_DATE_TIME is null).
Without an arrival time, any response-time analysis based on these records is questionable, so they are dropped.

In [9]:
# Drop records where units were on scene but no arrival time was recorded
IncidentsRespTo.drop(IncidentsRespTo[ (IncidentsRespTo.ARRIVAL_DATE_TIME.isnull()) & 
                    (IncidentsRespTo.UNITS_ONSCENE.notnull())].index, inplace=True )
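
The same filter can be written as a named boolean mask, which makes it easy to count the affected rows before removing them. This is a small sketch on a made-up three-row frame; only the two column names come from the dataset.

```python
import numpy as np
import pandas as pd

# Toy rows: consistent, inconsistent (units but no arrival), and fully missing
toy = pd.DataFrame({
    "ARRIVAL_DATE_TIME": ["01/01/2013 12:14:23 AM", np.nan, np.nan],
    "UNITS_ONSCENE": [1.0, 2.0, np.nan],
})

# True where a unit count exists but the arrival timestamp does not
inconsistent = toy["ARRIVAL_DATE_TIME"].isna() & toy["UNITS_ONSCENE"].notna()
print(inconsistent.sum(), "inconsistent record(s)")

toy_kept = toy[~inconsistent]
```

Rows where both fields are missing are kept, matching the behaviour of the drop call above.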

In [10]:
# Drop duplicates
IncidentsRespTo.drop_duplicates(inplace=True)

In [12]:
# Check the Total Incident Duration in minutes 
(IncidentsRespTo['TOTAL_INCIDENT_DURATION']/60).describe().round(2)

count    5084430.00
mean          24.36
std          531.42
min            0.00
25%           12.47
50%           17.52
75%           26.08
max      1183522.73
Name: TOTAL_INCIDENT_DURATION, dtype: float64

Check records with unusually long total incident durations. If the extreme data values do not make sense, drop them.

In [13]:
IncidentsRespTo.sort_values(by = 'TOTAL_INCIDENT_DURATION',ascending=False ).head(5)

Unnamed: 0,IM_INCIDENT_KEY,FIRE_BOX,INCIDENT_TYPE_DESC,INCIDENT_DATE_TIME,ARRIVAL_DATE_TIME,UNITS_ONSCENE,LAST_UNIT_CLEARED_DATE_TIME,HIGHEST_LEVEL_DESC,TOTAL_INCIDENT_DURATION,ACTION_TAKEN1_DESC,ACTION_TAKEN2_DESC,PROPERTY_USE_DESC,STREET_HIGHWAY,ZIP_CODE,BOROUGH_DESC,FLOOR
334258,56525066,1489,"300 - Rescue, EMS incident, other",10/27/2013 07:23:53 PM,10/27/2013 07:28:01 PM,1.0,01/27/2016 03:46:37 PM,"1 - More than initial alarm, less than Signal 7-5",71011364.0,"00 - Action taken, other",,UUU - Undetermined,,10305.0,3 - Staten Island,
1761014,60649294,6512,"300 - Rescue, EMS incident, other",11/21/2016 10:36:57 AM,11/21/2016 10:39:49 AM,1.0,01/31/2017 08:29:58 PM,"1 - More than initial alarm, less than Signal 7-5",6169981.0,"00 - Action taken, other",,UUU - Undetermined,DEFAULT RECORD FOR SF,11432.0,5 - Queens,
1759672,60644546,8671,"300 - Rescue, EMS incident, other",11/20/2016 08:11:36 AM,11/20/2016 08:15:21 AM,1.0,01/30/2017 12:54:05 PM,"1 - More than initial alarm, less than Signal 7-5",6151349.0,"00 - Action taken, other",,UUU - Undetermined,DEFAULT RECORD FOR SF,11436.0,5 - Queens,
3506175,67122302,1624,"400 - Hazardous condition, other",08/04/2020 12:45:52 PM,08/04/2020 12:49:28 PM,1.0,08/24/2020 09:33:36 AM,11 - First Alarm,1716464.0,65 - Secure property,,"960 - Street, other",EASTERN PKWY,11233.0,4 - Brooklyn,
3506235,67122637,761,"445 - Arcing, shorted electrical equipment",08/04/2020 12:59:24 PM,08/04/2020 01:06:50 PM,1.0,08/24/2020 09:33:24 AM,11 - First Alarm,1715640.0,82 - Notify other agencies.,,"960 - Street, other",HARMAN ST,11237.0,4 - Brooklyn,


The first three records have total incident durations (about 71.0 million, 6.17 million, and 6.15 million seconds) that are far larger than those of the fourth and fifth records (about 1.72 million seconds each). They are treated as unusual and dropped.

In [14]:
# Drop the top 3 unusual records by index label
IncidentsRespTo.drop([334258, 1761014, 1759672], inplace=True)
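
To put the five largest durations on a more readable scale, the sketch below converts the TOTAL_INCIDENT_DURATION values from the table above (in seconds) to days. The values and index labels are copied from that output; the variable names are illustrative.

```python
import pandas as pd

SECONDS_PER_DAY = 60 * 60 * 24

# Top five TOTAL_INCIDENT_DURATION values (seconds), keyed by row label
durations_seconds = pd.Series(
    [71011364.0, 6169981.0, 6151349.0, 1716464.0, 1715640.0],
    index=[334258, 1761014, 1759672, 3506175, 3506235],
)

durations_days = (durations_seconds / SECONDS_PER_DAY).round(2)
print(durations_days)
```

The first record spans roughly 822 days and the next two about 71 days each, while the fourth and fifth are close to 20 days.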

In [15]:
IncidentsRespTo.info()

<class 'pandas.core.frame.DataFrame'>
Index: 5085953 entries, 0 to 5093941
Data columns (total 16 columns):
 #   Column                       Dtype  
---  ------                       -----  
 0   IM_INCIDENT_KEY              int64  
 1   FIRE_BOX                     object 
 2   INCIDENT_TYPE_DESC           object 
 3   INCIDENT_DATE_TIME           object 
 4   ARRIVAL_DATE_TIME            object 
 5   UNITS_ONSCENE                float64
 6   LAST_UNIT_CLEARED_DATE_TIME  object 
 7   HIGHEST_LEVEL_DESC           object 
 8   TOTAL_INCIDENT_DURATION      float64
 9   ACTION_TAKEN1_DESC           object 
 10  ACTION_TAKEN2_DESC           object 
 11  PROPERTY_USE_DESC            object 
 12  STREET_HIGHWAY               object 
 13  ZIP_CODE                     object 
 14  BOROUGH_DESC                 object 
 15  FLOOR                        object 
dtypes: float64(2), int64(1), object(13)
memory usage: 659.6+ MB


In [16]:
# Check Zip Code
IncidentsRespTo['ZIP_CODE'].unique()

array([10454.0, 10036.0, 11418.0, 11103.0, 11385.0, 11215.0, 10001.0,
       10472.0, 11219.0, 10451.0, 10016.0, 10452.0, 10458.0, 10038.0,
       10462.0, 11231.0, 10469.0, 10314.0, 99999.0, 11211.0, 11234.0,
       10304.0, 10459.0, 11214.0, 11221.0, 11201.0, 11224.0, 10307.0,
       10040.0, 10465.0, 11226.0, 10018.0, 10011.0, 11222.0, 11354.0,
       10032.0, 10460.0, 10027.0, 10035.0, 11435.0, 10456.0, 10014.0,
       11694.0, 10303.0, 11213.0, 10003.0, 11691.0, 10024.0, 11420.0,
       11104.0, 10025.0, 11206.0, 11212.0, 10023.0, 10467.0, 10474.0,
       10022.0, 11236.0, 11369.0, 10308.0, 10301.0, 11372.0, 10031.0,
       10019.0, 11374.0, 10017.0, 10002.0, 11209.0, 11207.0, 10457.0,
       11203.0, 10455.0, 10021.0, 10009.0, 11238.0, 10013.0, 11414.0,
       11421.0, 10475.0, 10005.0, 10309.0, 11208.0, 10312.0, 11356.0,
       11429.0, 11375.0, 11218.0, 10461.0, 11237.0, 11106.0, 11233.0,
       11378.0, 11432.0, 11373.0, 10463.0, 11419.0, 11102.0, 10010.0,
       10030.0, 1000

In [23]:
# Check record with unusual Zip code value
IncidentsRespTo[IncidentsRespTo['ZIP_CODE'] == '11209-0000']

Unnamed: 0,IM_INCIDENT_KEY,FIRE_BOX,INCIDENT_TYPE_DESC,INCIDENT_DATE_TIME,ARRIVAL_DATE_TIME,UNITS_ONSCENE,LAST_UNIT_CLEARED_DATE_TIME,HIGHEST_LEVEL_DESC,TOTAL_INCIDENT_DURATION,ACTION_TAKEN1_DESC,ACTION_TAKEN2_DESC,PROPERTY_USE_DESC,STREET_HIGHWAY,ZIP_CODE,BOROUGH_DESC,FLOOR


In [19]:
# Check record with unusual Zip code value
IncidentsRespTo[IncidentsRespTo['ZIP_CODE'] == '11206-7216']

Unnamed: 0,IM_INCIDENT_KEY,FIRE_BOX,INCIDENT_TYPE_DESC,INCIDENT_DATE_TIME,ARRIVAL_DATE_TIME,UNITS_ONSCENE,LAST_UNIT_CLEARED_DATE_TIME,HIGHEST_LEVEL_DESC,TOTAL_INCIDENT_DURATION,ACTION_TAKEN1_DESC,ACTION_TAKEN2_DESC,PROPERTY_USE_DESC,STREET_HIGHWAY,ZIP_CODE,BOROUGH_DESC,FLOOR
5010194,72109589,783,412 - Gas leak (natural gas or LPG),09/30/2023 10:32:14 AM,09/30/2023 10:35:20 AM,3.0,09/30/2023 10:42:43 AM,11 - First Alarm,629.0,51 - Ventilate,64 - Shut down system,429 - Multifamily dwelling,LEWIS AVE,11206-7216,4 - Brooklyn,SECOND


In [21]:
# Replace the two ZIP+4 values found above with their five-digit ZIP codes
IncidentsRespTo.at[227295,'ZIP_CODE']='11209'
IncidentsRespTo.at[5010194,'ZIP_CODE']='11206'

In [22]:
# Drop null data in ZIP_CODE
IncidentsRespTo.dropna(subset=['ZIP_CODE'], inplace=True)
# Convert ZIP_CODE to string
IncidentsRespTo['ZIP_CODE']=IncidentsRespTo['ZIP_CODE'].astype(int).astype(str)
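
The fix above patches two specific rows by index label. As an alternative sketch, a ZIP column that mixes floats, ZIP+4 strings, and nulls can be normalized in one pass by keeping the leading five digits. The helper `normalize_zip` is hypothetical, and it assumes every valid value starts with five digits.

```python
import numpy as np
import pandas as pd

def normalize_zip(series: pd.Series) -> pd.Series:
    """Keep the leading five digits of each value; nulls stay null."""
    as_text = series.astype("string")  # 10454.0 -> "10454.0", NaN -> <NA>
    return as_text.str.extract(r"^(\d{5})", expand=False)

# Toy values mirroring the kinds of entries seen in ZIP_CODE
raw = pd.Series([10454.0, "11209-0000", "11206-7216", np.nan], dtype=object)
clean = normalize_zip(raw)
print(clean.tolist())
```

Placeholder codes such as 99999 pass through unchanged, so they would still need a separate decision.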

In [24]:
# Check Zip Code after converting
IncidentsRespTo['ZIP_CODE'].unique()

array(['10454', '10036', '11418', '11103', '11385', '11215', '10001',
       '10472', '11219', '10451', '10016', '10452', '10458', '10038',
       '10462', '11231', '10469', '10314', '99999', '11211', '11234',
       '10304', '10459', '11214', '11221', '11201', '11224', '10307',
       '10040', '10465', '11226', '10018', '10011', '11222', '11354',
       '10032', '10460', '10027', '10035', '11435', '10456', '10014',
       '11694', '10303', '11213', '10003', '11691', '10024', '11420',
       '11104', '10025', '11206', '11212', '10023', '10467', '10474',
       '10022', '11236', '11369', '10308', '10301', '11372', '10031',
       '10019', '11374', '10017', '10002', '11209', '11207', '10457',
       '11203', '10455', '10021', '10009', '11238', '10013', '11414',
       '11421', '10475', '10005', '10309', '11208', '10312', '11356',
       '11429', '11375', '11218', '10461', '11237', '11106', '11233',
       '11378', '11432', '11373', '10463', '11419', '11102', '10010',
       '10030', '100

In [25]:
# Convert date-time fields to the correct data type
IncidentsRespTo['INCIDENT_DATE_TIME'] = pd.to_datetime(IncidentsRespTo['INCIDENT_DATE_TIME'])
IncidentsRespTo['ARRIVAL_DATE_TIME'] = pd.to_datetime(IncidentsRespTo['ARRIVAL_DATE_TIME'])
IncidentsRespTo['LAST_UNIT_CLEARED_DATE_TIME'] = pd.to_datetime(IncidentsRespTo['LAST_UNIT_CLEARED_DATE_TIME'])
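
On a frame with several million rows, passing an explicit format to `pd.to_datetime` usually avoids per-value format inference. The sketch below assumes every timestamp follows the `MM/DD/YYYY hh:mm:ss AM/PM` pattern seen in the sample rows; `errors="coerce"` turns anything else into `NaT` instead of raising.

```python
import pandas as pd

# Assumed format, based on values such as "01/01/2013 12:00:20 AM"
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"

sample = pd.Series(["01/01/2013 12:00:20 AM", "10/27/2013 07:23:53 PM", None])
parsed = pd.to_datetime(sample, format=DATE_FORMAT, errors="coerce")
print(parsed)
```

Counting the resulting `NaT` values before and after conversion is a quick way to check whether the assumed format really covers the whole column.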

In [43]:
# Save the cleaned data
IncidentsRespTo.to_csv("Incidents_Responded_to_by_Fire_Companies_Cleaned.csv",index=False)

#### Fire Incident Dispatch Data

In [26]:
IncidentsDispatch = pd.read_csv('Fire_Incident_Dispatch_Data.csv')

In [27]:
IncidentsDispatch.head()

Unnamed: 0,STARFIRE_INCIDENT_ID,INCIDENT_DATETIME,ALARM_BOX_BOROUGH,ALARM_BOX_NUMBER,ALARM_BOX_LOCATION,INCIDENT_BOROUGH,ZIPCODE,POLICEPRECINCT,CITYCOUNCILDISTRICT,COMMUNITYDISTRICT,...,FIRST_ACTIVATION_DATETIME,FIRST_ON_SCENE_DATETIME,INCIDENT_CLOSE_DATETIME,VALID_DISPATCH_RSPNS_TIME_INDC,VALID_INCIDENT_RSPNS_TIME_INDC,INCIDENT_RESPONSE_SECONDS_QY,INCIDENT_TRAVEL_TM_SECONDS_QY,ENGINES_ASSIGNED_QUANTITY,LADDERS_ASSIGNED_QUANTITY,OTHER_UNITS_ASSIGNED_QUANTITY
0,2100404460110002,01/04/2021 12:01:00 AM,MANHATTAN,446,3 AVE & ST. MARKS PL,MANHATTAN,10003.0,9.0,2.0,103.0,...,01/04/2021 12:02:00 AM,,01/04/2021 12:07:00 AM,N,N,0.0,0.0,1.0,0.0,0.0
1,2100433250140001,01/04/2021 12:01:00 AM,BROOKLYN,3325,AVENUE O & E 13 ST,BROOKLYN,11230.0,70.0,48.0,314.0,...,01/04/2021 12:02:00 AM,01/04/2021 12:04:00 AM,01/04/2021 12:32:00 AM,N,Y,170.0,165.0,1.0,0.0,0.0
2,2100411280150003,01/04/2021 12:01:00 AM,QUEENS,1128,MOTT AVE & DICKENS ST,QUEENS,11691.0,101.0,31.0,414.0,...,01/04/2021 12:02:00 AM,,01/04/2021 12:05:00 AM,N,N,0.0,0.0,1.0,0.0,0.0
3,2100416590110004,01/04/2021 12:02:00 AM,MANHATTAN,1659,BROADWAY & 153 ST,MANHATTAN,10031.0,30.0,7.0,109.0,...,01/04/2021 12:02:00 AM,01/04/2021 12:07:00 AM,01/04/2021 12:31:00 AM,N,Y,318.0,314.0,1.0,0.0,0.0
4,2100413490110006,01/04/2021 12:02:00 AM,MANHATTAN,1349,5 AVE & 112 ST,MANHATTAN,10026.0,28.0,9.0,110.0,...,01/04/2021 12:03:00 AM,01/04/2021 12:17:00 AM,01/04/2021 12:18:00 AM,N,Y,871.0,834.0,1.0,0.0,0.0


In [28]:
IncidentsDispatch.shape

(10275092, 29)

In [29]:
# Checking the percentage of missing data of each column
missing_data_percentage=round(IncidentsDispatch.isnull().sum()/len(IncidentsDispatch)*100).sort_values(ascending=False)
missing_data_percentage

FIRST_ON_SCENE_DATETIME           16.0
CITYCOUNCILDISTRICT                6.0
COMMUNITYDISTRICT                  6.0
CONGRESSIONALDISTRICT              6.0
ZIPCODE                            6.0
POLICEPRECINCT                     6.0
COMMUNITYSCHOOLDISTRICT            6.0
INCIDENT_TRAVEL_TM_SECONDS_QY      4.0
INCIDENT_RESPONSE_SECONDS_QY       4.0
FIRST_ACTIVATION_DATETIME          1.0
FIRST_ASSIGNMENT_DATETIME          0.0
ENGINES_ASSIGNED_QUANTITY          0.0
VALID_INCIDENT_RSPNS_TIME_INDC     0.0
VALID_DISPATCH_RSPNS_TIME_INDC     0.0
INCIDENT_CLOSE_DATETIME            0.0
LADDERS_ASSIGNED_QUANTITY          0.0
STARFIRE_INCIDENT_ID               0.0
HIGHEST_ALARM_LEVEL                0.0
DISPATCH_RESPONSE_SECONDS_QY       0.0
INCIDENT_CLASSIFICATION_GROUP      0.0
INCIDENT_CLASSIFICATION            0.0
INCIDENT_DATETIME                  0.0
ALARM_LEVEL_INDEX_DESCRIPTION      0.0
ALARM_SOURCE_DESCRIPTION_TX        0.0
INCIDENT_BOROUGH                   0.0
ALARM_BOX_LOCATION       

None of the columns shown have more than 90% missing data, so all columns are kept.

VALID_DISPATCH_RSPNS_TIME_INDC: Indicates that the components comprising the generation of the DISPATCH_RESPONSE_SECONDS_QY are valid.

DISPATCH_RESPONSE_SECONDS_QY: The elapsed time in seconds between the incident_datetime and the first_assignment_datetime.

VALID_INCIDENT_RSPNS_TIME_INDC: Indicates that the components comprising the generation of the INCIDENT_RESPONSE_SECONDS_QY are valid.

INCIDENT_RESPONSE_SECONDS_QY: The elapsed time in seconds between the incident_datetime and the first_onscene_datetime.
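
The two elapsed-time definitions above can be sketched as timestamp differences. The rows below are made up for illustration (the exported timestamps shown earlier appear rounded to the minute, so recomputed values may not match the published `*_SECONDS_QY` fields exactly).

```python
import pandas as pd

# Illustrative rows only; the second incident has no on-scene time
toy = pd.DataFrame({
    "INCIDENT_DATETIME": pd.to_datetime(["2021-01-04 00:01:00", "2021-01-04 00:02:00"]),
    "FIRST_ASSIGNMENT_DATETIME": pd.to_datetime(["2021-01-04 00:01:30", "2021-01-04 00:02:10"]),
    "FIRST_ON_SCENE_DATETIME": pd.to_datetime(["2021-01-04 00:04:00", None]),
})

# Dispatch response: incident creation to first unit assignment
dispatch_seconds = (toy["FIRST_ASSIGNMENT_DATETIME"] - toy["INCIDENT_DATETIME"]).dt.total_seconds()

# Incident response: incident creation to first unit on scene (NaN if none arrived)
incident_seconds = (toy["FIRST_ON_SCENE_DATETIME"] - toy["INCIDENT_DATETIME"]).dt.total_seconds()
print(dispatch_seconds.tolist(), incident_seconds.tolist())
```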

Check how many records are flagged as having an invalid response time.

In [30]:
IncidentsDispatch['VALID_INCIDENT_RSPNS_TIME_INDC'].unique()

array(['N', 'Y'], dtype=object)

In [31]:
len(IncidentsDispatch[IncidentsDispatch['VALID_INCIDENT_RSPNS_TIME_INDC']=='N'])/len(IncidentsDispatch)*100

19.57654491074143

About 19.58% of records have an invalid incident response time (VALID_INCIDENT_RSPNS_TIME_INDC is 'N', so INCIDENT_RESPONSE_SECONDS_QY is not reliable). An invalid response time does not necessarily mean that the entire incident record is invalid, so these records are kept.
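
A share like this can also be read off `value_counts(normalize=True)`, which reports every category at once. The flags below are a made-up ten-row sample, not the real distribution.

```python
import pandas as pd

# Toy indicator column: 2 invalid ('N') out of 10
flags = pd.Series(list("NYYYNYYYYY"), name="VALID_INCIDENT_RSPNS_TIME_INDC")

# Percentage of rows per flag value
share_pct = flags.value_counts(normalize=True).mul(100)
print(share_pct.round(2))
```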

Consideration: can a valid incident have invalid FIRST_ON_SCENE_DATETIME data? An incident can be valid in terms of its occurrence and the response it generated while the time data was recorded or transmitted incorrectly, resulting in an invalid time entry.

In [32]:
IncidentsDispatch['VALID_DISPATCH_RSPNS_TIME_INDC'].unique()

array(['N'], dtype=object)

Every record has VALID_DISPATCH_RSPNS_TIME_INDC set to 'N', so this field does not distinguish any records in this dataframe; therefore, no action is taken based on this finding.

In [33]:
IncidentsDispatch['DISPATCH_RESPONSE_SECONDS_QY'].describe().round(2)

count    10274156.00
mean           38.67
std           296.54
min             0.00
25%             8.00
50%            16.00
75%            44.00
max        405447.00
Name: DISPATCH_RESPONSE_SECONDS_QY, dtype: float64

In [34]:
# The maximum dispatch response time is about 4.7 days, which is not reasonable
405447/60/60/24

4.692673611111111
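
The same conversion can be done with `pd.Timedelta`, which also breaks the value into days, hours, minutes, and seconds. This is only a sketch of the arithmetic above; the variable names are illustrative.

```python
import pandas as pd

# Maximum DISPATCH_RESPONSE_SECONDS_QY from the summary above
max_dispatch = pd.Timedelta(seconds=405447)

days_fraction = max_dispatch.total_seconds() / (60 * 60 * 24)
print(max_dispatch)             # days plus hh:mm:ss
print(round(days_fraction, 2))  # fractional days
```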

In [35]:
# Check the proportion of incidents with a dispatch response time exceeding five minutes.
len(IncidentsDispatch[IncidentsDispatch.DISPATCH_RESPONSE_SECONDS_QY>60*5])/len(IncidentsDispatch)*100

1.2476481962399948

Keep only incidents with a dispatch response time of five minutes or less (see the NFPA 1710 summary sheet linked below).

https://www.iaff.org/wp-content/uploads/Departments/Fire_EMS_Department/30541_Summary_Sheet_NFPA_1710_standard.pdf

In [36]:
IncidentsDispatch.drop(IncidentsDispatch[
    IncidentsDispatch.DISPATCH_RESPONSE_SECONDS_QY>60*5].index, inplace=True)
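
One detail of this filter is worth making explicit: a comparison against `NaN` evaluates to `False`, so rows with a missing DISPATCH_RESPONSE_SECONDS_QY are not dropped. The sketch below shows this on a made-up four-row frame, using a boolean mask in place of `drop`.

```python
import numpy as np
import pandas as pd

FIVE_MINUTES = 60 * 5  # cutoff in seconds

toy = pd.DataFrame({"DISPATCH_RESPONSE_SECONDS_QY": [8.0, 300.0, 301.0, np.nan]})

# True only where the value is present and strictly above the cutoff
too_slow = toy["DISPATCH_RESPONSE_SECONDS_QY"] > FIVE_MINUTES
toy_kept = toy[~too_slow]
print(toy_kept)
```

If rows with a missing dispatch time should also go, an extra `.notna()` condition would be needed.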

In [37]:
# Check ZIPCODE
IncidentsDispatch['ZIPCODE'].unique()

array([10003., 11230., 11691., 10031., 10026., 11692., 11373., 10452.,
       10304., 11207., 10473., 10002., 10468., 10451., 11356., 11206.,
       11367., 10009., 10038., 11212., 10303., 10035.,    nan, 11236.,
       11433., 11223., 10032., 10459., 11220., 10457., 10467., 10029.,
       11103., 11233., 11226., 10453., 10458., 11203., 10036., 10456.,
       10460., 11219., 10001., 10128., 11434., 10305., 10469., 10025.,
       10454., 11204., 11208., 11427., 11221., 11101., 11239., 11423.,
       11222., 10037., 11209., 10475., 11234., 10028., 11210., 10040.,
       11224., 11216., 11232., 11412., 10470., 11377., 11229., 10030.,
       10462., 10314., 10027., 11218., 11217., 11225., 10463., 11106.,
       11432., 11235., 10018., 10039., 10301., 10472., 11201., 11378.,
       10024., 11214., 11417., 10309., 10021., 11228., 11211., 10306.,
       11102., 11213., 10023., 10016., 11231., 11249., 10471., 11418.,
       10020., 10010., 10312., 10034., 11354., 11416., 11422., 10465.,
      

In [38]:
# Drop duplicates
IncidentsDispatch.drop_duplicates(inplace=True)

In [39]:
IncidentsDispatch.shape

(10146894, 29)

In [40]:
IncidentsDispatch.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10146894 entries, 0 to 10275091
Data columns (total 29 columns):
 #   Column                          Dtype  
---  ------                          -----  
 0   STARFIRE_INCIDENT_ID            object 
 1   INCIDENT_DATETIME               object 
 2   ALARM_BOX_BOROUGH               object 
 3   ALARM_BOX_NUMBER                int64  
 4   ALARM_BOX_LOCATION              object 
 5   INCIDENT_BOROUGH                object 
 6   ZIPCODE                         float64
 7   POLICEPRECINCT                  float64
 8   CITYCOUNCILDISTRICT             float64
 9   COMMUNITYDISTRICT               float64
 10  COMMUNITYSCHOOLDISTRICT         float64
 11  CONGRESSIONALDISTRICT           float64
 12  ALARM_SOURCE_DESCRIPTION_TX     object 
 13  ALARM_LEVEL_INDEX_DESCRIPTION   object 
 14  HIGHEST_ALARM_LEVEL             object 
 15  INCIDENT_CLASSIFICATION         object 
 16  INCIDENT_CLASSIFICATION_GROUP   object 
 17  DISPATCH_RESPONSE_SECONDS_QY  

In [42]:
# Convert date-time fields to the correct data type
IncidentsDispatch['INCIDENT_DATETIME'] = pd.to_datetime(IncidentsDispatch['INCIDENT_DATETIME'])
IncidentsDispatch['FIRST_ASSIGNMENT_DATETIME'] = pd.to_datetime(IncidentsDispatch['FIRST_ASSIGNMENT_DATETIME'])
IncidentsDispatch['FIRST_ACTIVATION_DATETIME'] = pd.to_datetime(IncidentsDispatch['FIRST_ACTIVATION_DATETIME'])
IncidentsDispatch['FIRST_ON_SCENE_DATETIME'] = pd.to_datetime(IncidentsDispatch['FIRST_ON_SCENE_DATETIME'])
IncidentsDispatch['INCIDENT_CLOSE_DATETIME'] = pd.to_datetime(IncidentsDispatch['INCIDENT_CLOSE_DATETIME'])

In [44]:
# Save the cleaned data
IncidentsDispatch.to_csv("Fire_Incident_Dispatch_Data_Cleaned.csv",index=False)
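
CSV files do not store column types, so the datetime conversion has to be repeated when a cleaned file is read back. The sketch below round-trips a one-row toy frame through an in-memory buffer (standing in for the cleaned file) and restores the types with `parse_dates` and `dtype`.

```python
import io
import pandas as pd

# One illustrative row with an ID string and a parsed timestamp
toy = pd.DataFrame({
    "STARFIRE_INCIDENT_ID": ["2100404460110002"],
    "INCIDENT_DATETIME": pd.to_datetime(["2021-01-04 00:01:00"]),
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)

# Restore types on load: timestamps as datetime64, IDs as text
reloaded = pd.read_csv(
    buffer,
    parse_dates=["INCIDENT_DATETIME"],
    dtype={"STARFIRE_INCIDENT_ID": "string"},
)
print(reloaded.dtypes)
```

Reading the ID as text keeps it from being parsed as a very large integer.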