# Jupyter Notebook Purpose
- Run all code cells to process the data into a single .csv or .csv.gz file
    - The geopandas part can be skipped if you are unable to `pip install` or `conda install` it

# Table of Contents

## 1. [Python Dependencies](#1)
### a. [Geopandas Issue](#1a)
___
## 2. [Folder Creation](#2)
- repeated just in case
___
## 3. [Toronto Fire Services Station Locations](#3)
### a. [From Shapefile to .csv](#3a)
___
## 4. [Toronto Fire Incidents merged with Toronto Weather](#4)
### a. [Merge DataFrames](#4a)
___
## 5. [Haversine Formula](#5)
### a. [Python Formula](#5a)
### b. [Closest Toronto Fire Services Station](#5b)
### c. [Closest Toronto Fire Hydrant](#5c)
- ABANDONED DUE TO LACK OF COMPUTER RESOURCES
___
## 6. [Manipulations of Merged Dataset](#6)
### a. [Create Minutes to Arrival and Minutes to Leave Columns](#6a)
### b. [Rename and Drop Columns and Change Datatypes](#6b)

# 1
# Python Dependencies

In [1]:
# Python Modules for Miscellaneous reasons
from zipfile import ZipFile  # to read and write to zipped folders
import requests  # simple HTTP library for Python
import os        # portable way to use operating system functionalities
import io        # Tool for working with streams (Input/Output data)
import datetime  # python classes for manipulating dates and times
import dateutil  # powerful extensions to standard datetime Python module
import time      # used for time.sleep() to delay the HTTP requests ever so slightly
import re        # used for Python regex library
import math      # radians, cos, sin, asin and sqrt are used for haversine formula
from IPython.display import display # use this to see the entire DataFrame in the right format

In [2]:
# DATA ANALYSIS Python Dependencies
import pandas as pd
import numpy as np

# 1a
# Geopandas Issue
## Geopandas has conflicts with a matplotlib library dependency
- Using it requires a separate Conda Virtual Environment
    - This is optional: the generated .csv file is included, so the steps below can be skipped
    - All code cells that would cause issues are commented out
    - Instructions for setting up the environment are below if you want to regenerate the file

## Create a geopandas Conda Virtual Environment

### 1. Create another Conda Virtual Environment - [link](https://geopandas.org/install.html)

#### A. ```conda create -n geo_env```
    - press 'y' and ENTER
#### B. ```conda activate geo_env```
    - MAKE SURE YOU ARE IN THE ```geo_env``` for all the following steps
#### C. ```conda config --env --add channels conda-forge```
#### D. ```conda config --env --set channel_priority strict```
#### E. ```conda install python=3 geopandas```

### 2. Download Jupyter Lab / Notebook - [link](https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html)

#### A. ```conda install -c conda-forge jupyterlab```

### 3. Open up this notebook in Jupyter Notebook / Lab
- ```jupyter lab```
- ```jupyter notebook```

In [3]:
# import geopandas as gpd

# 2
# Creation of Folders
- These folders store the `RAW_ZIPPED/`, `RAW_UNZIPPED/`, `PROCESSED_ZIPPED/`, and `PROCESSED_UNZIPPED/` data files
- Any folder that does not already exist is created

In [3]:
# Here are the major directory names
CWD_PATH = os.getcwd() # current working directory
RAW_ZIPPED_DIRECTORY = os.path.join(CWD_PATH, "RAW_ZIPPED")
RAW_UNZIPPED_DIRECTORY = os.path.join(CWD_PATH, "RAW_UNZIPPED")
PROCESSED_ZIPPED_DIRECTORY = os.path.join(CWD_PATH, "PROCESSED_ZIPPED")
PROCESSED_UNZIPPED_DIRECTORY = os.path.join(CWD_PATH, "PROCESSED_UNZIPPED")

In [45]:
# Check whether all 4 folders exist
# and create any that do not
print(f"{RAW_ZIPPED_DIRECTORY} exists") if os.path.isdir(RAW_ZIPPED_DIRECTORY) else (os.makedirs(RAW_ZIPPED_DIRECTORY), print(f"{RAW_ZIPPED_DIRECTORY} created"));
print(f"{RAW_UNZIPPED_DIRECTORY} exists") if os.path.isdir(RAW_UNZIPPED_DIRECTORY) else (os.makedirs(RAW_UNZIPPED_DIRECTORY), print(f"{RAW_UNZIPPED_DIRECTORY} created"));
print(f"{PROCESSED_ZIPPED_DIRECTORY} exists") if os.path.isdir(PROCESSED_ZIPPED_DIRECTORY) else (os.makedirs(PROCESSED_ZIPPED_DIRECTORY), print(f"{PROCESSED_ZIPPED_DIRECTORY} created"));
print(f"{PROCESSED_UNZIPPED_DIRECTORY} exists") if os.path.isdir(PROCESSED_UNZIPPED_DIRECTORY) else (os.makedirs(PROCESSED_UNZIPPED_DIRECTORY), print(f"{PROCESSED_UNZIPPED_DIRECTORY} created"));

In [46]:
# Get the folders for fire_incidents
FIRE_INCIDENTS = "FIRE_INCIDENTS"
FIRE_RAW_ZIPPED_DIRECTORY = os.path.join(RAW_ZIPPED_DIRECTORY, FIRE_INCIDENTS)
FIRE_RAW_UNZIPPED_DIRECTORY = os.path.join(RAW_UNZIPPED_DIRECTORY, FIRE_INCIDENTS)
FIRE_PROCESSED_ZIPPED_DIRECTORY = os.path.join(PROCESSED_ZIPPED_DIRECTORY, FIRE_INCIDENTS)
FIRE_PROCESSED_UNZIPPED_DIRECTORY = os.path.join(PROCESSED_UNZIPPED_DIRECTORY, FIRE_INCIDENTS)

# Create all the subfolders for the above folder
print(f"{FIRE_RAW_ZIPPED_DIRECTORY} exists") if os.path.isdir(FIRE_RAW_ZIPPED_DIRECTORY) else (os.makedirs(FIRE_RAW_ZIPPED_DIRECTORY), print(f"{FIRE_RAW_ZIPPED_DIRECTORY} created"));
print(f"{FIRE_RAW_UNZIPPED_DIRECTORY} exists") if os.path.isdir(FIRE_RAW_UNZIPPED_DIRECTORY) else (os.makedirs(FIRE_RAW_UNZIPPED_DIRECTORY), print(f"{FIRE_RAW_UNZIPPED_DIRECTORY} created"));
print(f"{FIRE_PROCESSED_ZIPPED_DIRECTORY} exists") if os.path.isdir(FIRE_PROCESSED_ZIPPED_DIRECTORY) else (os.makedirs(FIRE_PROCESSED_ZIPPED_DIRECTORY), print(f"{FIRE_PROCESSED_ZIPPED_DIRECTORY} created"));
print(f"{FIRE_PROCESSED_UNZIPPED_DIRECTORY} exists") if os.path.isdir(FIRE_PROCESSED_UNZIPPED_DIRECTORY) else (os.makedirs(FIRE_PROCESSED_UNZIPPED_DIRECTORY), print(f"{FIRE_PROCESSED_UNZIPPED_DIRECTORY} created"));

In [47]:
# Get the folders for toronto_weather
TORONTO_WEATHER = "TORONTO_WEATHER"
WEATHER_RAW_UNZIPPED_DIRECTORY = os.path.join(RAW_UNZIPPED_DIRECTORY, TORONTO_WEATHER)
WEATHER_PROCESSED_UNZIPPED_DIRECTORY = os.path.join(PROCESSED_UNZIPPED_DIRECTORY, TORONTO_WEATHER)

# Create all the subfolders for the above folder
print(f"{WEATHER_RAW_UNZIPPED_DIRECTORY} exists") if os.path.isdir(WEATHER_RAW_UNZIPPED_DIRECTORY) else (os.makedirs(WEATHER_RAW_UNZIPPED_DIRECTORY), print(f"{WEATHER_RAW_UNZIPPED_DIRECTORY} created"));
print(f"{WEATHER_PROCESSED_UNZIPPED_DIRECTORY} exists") if os.path.isdir(WEATHER_PROCESSED_UNZIPPED_DIRECTORY) else (os.makedirs(WEATHER_PROCESSED_UNZIPPED_DIRECTORY), print(f"{WEATHER_PROCESSED_UNZIPPED_DIRECTORY} created"));

In [48]:
# Get the folders for fire_stations
FIRE_STATIONS = "FIRE_STATIONS"
STATIONS_RAW_ZIPPED_DIRECTORY = os.path.join(RAW_ZIPPED_DIRECTORY, FIRE_STATIONS)
STATIONS_RAW_UNZIPPED_DIRECTORY = os.path.join(RAW_UNZIPPED_DIRECTORY, FIRE_STATIONS)

# Create all the subfolders for the above folder
print(f"{STATIONS_RAW_ZIPPED_DIRECTORY} exists") if os.path.isdir(STATIONS_RAW_ZIPPED_DIRECTORY) else (os.makedirs(STATIONS_RAW_ZIPPED_DIRECTORY), print(f"{STATIONS_RAW_ZIPPED_DIRECTORY} created"));
print(f"{STATIONS_RAW_UNZIPPED_DIRECTORY} exists") if os.path.isdir(STATIONS_RAW_UNZIPPED_DIRECTORY) else (os.makedirs(STATIONS_RAW_UNZIPPED_DIRECTORY), print(f"{STATIONS_RAW_UNZIPPED_DIRECTORY} created"));

In [49]:
# Get the folders for fire_hydrants
FIRE_HYDRANTS = "FIRE_HYDRANTS"
HYDRANTS_RAW_UNZIPPED_DIRECTORY = os.path.join(RAW_UNZIPPED_DIRECTORY, FIRE_HYDRANTS)

# Create all the subfolders for the above folder
print(f"{HYDRANTS_RAW_UNZIPPED_DIRECTORY} exists") if os.path.isdir(HYDRANTS_RAW_UNZIPPED_DIRECTORY) else (os.makedirs(HYDRANTS_RAW_UNZIPPED_DIRECTORY), print(f"{HYDRANTS_RAW_UNZIPPED_DIRECTORY} created"));
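The folder checks above could also be written as one small helper. This is a minimal sketch, not the notebook's own code: `ensure_dir` is a hypothetical name, and the demo runs inside a temporary directory so it does not touch the real data folders.

```python
import os
import tempfile


def ensure_dir(path):
    """Create `path` if it is missing and report what happened (hypothetical helper)."""
    existed = os.path.isdir(path)
    # exist_ok=True makes the call safe to repeat
    os.makedirs(path, exist_ok=True)
    print(f"{path} {'exists' if existed else 'created'}")
    return existed


# Demo in a throwaway directory rather than the real data folders
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = os.path.join(tmp, "RAW_UNZIPPED", "FIRE_HYDRANTS")
    first_call = ensure_dir(demo_dir)   # folder is created here
    second_call = ensure_dir(demo_dir)  # folder already exists here
    demo_dir_created = os.path.isdir(demo_dir)
```

Because `os.makedirs` creates intermediate folders, calling the helper on a subfolder also creates its parent.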

# 3
# Toronto Fire Services Station Locations

# 3a
# From Shapefile to .csv

In [10]:
# # get the shapefile path
# SHAPEFILE_PATH = os.path.join(STATIONS_RAW_UNZIPPED_DIRECTORY, "FIRE_FACILITY_WGS84.shp")

# # allow Geopandas to read it
# shapefile = gpd.read_file(SHAPEFILE_PATH)
# shapefile

,NAME,ADDRESS,X,Y,LATITUDE,LONGITUDE,WARD_NAME,MUN_NAME,OBJECTID,geometry
0,FIRE STATION 214,745 MEADOWVALE RD,331855.873,4850304.607,43.794219,-79.163605,Scarborough East (44),Scarborough,1567567.0,POINT (-79.16360 43.79422)
1,FIRE STATION 215,5318 LAWRENCE AVE E,333114.022,4848441.177,43.777401,-79.148069,Scarborough East (44),Scarborough,2250005.0,POINT (-79.14807 43.77740)
2,FIRE STATION 221,2575 EGLINTON AVE E,324515.001,4843677.541,43.734799,-79.255066,Scarborough Southwest (35),Scarborough,2048861.0,POINT (-79.25507 43.73480)
3,FIRE STATION 222,755 WARDEN AVE,322180.852,4842072.337,43.720408,-79.284094,Scarborough Southwest (35),Scarborough,2449586.0,POINT (-79.28409 43.72041)
4,FIRE STATION 223,116 DORSET RD,326275.035,4842479.439,43.723965,-79.233264,Scarborough Southwest (36),Scarborough,2172861.0,POINT (-79.23326 43.72397)
...,...,...,...,...,...,...,...,...,...,...
79,FIRE STATION 145,20 BEFFORT RD,308003.882,4843645.466,43.734764,-79.460035,York Centre (9),North York,3186765.0,POINT (-79.46004 43.73476)
80,FIRE STATION 146,2220 JANE ST,303996.599,4842402.430,43.723581,-79.509779,York West (7),North York,3072600.0,POINT (-79.50978 43.72358)
81,FIRE STATION 211,900 TAPSCOTT RD,325466.636,4853590.517,43.824001,-79.242873,Scarborough-Rouge River (42),Scarborough,1520443.0,POINT (-79.24287 43.82400)
82,FIRE STATION 212,8500 SHEPPARD AVE E,329838.107,4851487.882,43.804941,-79.188623,Scarborough-Rouge River (42),Scarborough,2303629.0,POINT (-79.18862 43.80494)


In [11]:
# # convert from geopandas to pandas dataframe
# df_locations = pd.DataFrame(shapefile)
# df_locations.index = df_locations["NAME"].str.extract(r'(\d+)')[0].astype('int32')
# df_locations.index.name = "INDEX"
# df_locations

INDEX,NAME,ADDRESS,X,Y,LATITUDE,LONGITUDE,WARD_NAME,MUN_NAME,OBJECTID,geometry
214,FIRE STATION 214,745 MEADOWVALE RD,331855.873,4850304.607,43.794219,-79.163605,Scarborough East (44),Scarborough,1567567.0,POINT (-79.16360 43.79422)
215,FIRE STATION 215,5318 LAWRENCE AVE E,333114.022,4848441.177,43.777401,-79.148069,Scarborough East (44),Scarborough,2250005.0,POINT (-79.14807 43.77740)
221,FIRE STATION 221,2575 EGLINTON AVE E,324515.001,4843677.541,43.734799,-79.255066,Scarborough Southwest (35),Scarborough,2048861.0,POINT (-79.25507 43.73480)
222,FIRE STATION 222,755 WARDEN AVE,322180.852,4842072.337,43.720408,-79.284094,Scarborough Southwest (35),Scarborough,2449586.0,POINT (-79.28409 43.72041)
223,FIRE STATION 223,116 DORSET RD,326275.035,4842479.439,43.723965,-79.233264,Scarborough Southwest (36),Scarborough,2172861.0,POINT (-79.23326 43.72397)
...,...,...,...,...,...,...,...,...,...,...
145,FIRE STATION 145,20 BEFFORT RD,308003.882,4843645.466,43.734764,-79.460035,York Centre (9),North York,3186765.0,POINT (-79.46004 43.73476)
146,FIRE STATION 146,2220 JANE ST,303996.599,4842402.430,43.723581,-79.509779,York West (7),North York,3072600.0,POINT (-79.50978 43.72358)
211,FIRE STATION 211,900 TAPSCOTT RD,325466.636,4853590.517,43.824001,-79.242873,Scarborough-Rouge River (42),Scarborough,1520443.0,POINT (-79.24287 43.82400)
212,FIRE STATION 212,8500 SHEPPARD AVE E,329838.107,4851487.882,43.804941,-79.188623,Scarborough-Rouge River (42),Scarborough,2303629.0,POINT (-79.18862 43.80494)


In [12]:
# # get the CSV path
# SHAPEFILE_CSV_PATH = os.path.join(STATIONS_RAW_UNZIPPED_DIRECTORY,
#                                   "Toronto_Fire_Station_Locations.csv")

# # write the DataFrame to csv
# df_locations.to_csv(SHAPEFILE_CSV_PATH)
# df_locations

INDEX,NAME,ADDRESS,X,Y,LATITUDE,LONGITUDE,WARD_NAME,MUN_NAME,OBJECTID,geometry
214,FIRE STATION 214,745 MEADOWVALE RD,331855.873,4850304.607,43.794219,-79.163605,Scarborough East (44),Scarborough,1567567.0,POINT (-79.16360 43.79422)
215,FIRE STATION 215,5318 LAWRENCE AVE E,333114.022,4848441.177,43.777401,-79.148069,Scarborough East (44),Scarborough,2250005.0,POINT (-79.14807 43.77740)
221,FIRE STATION 221,2575 EGLINTON AVE E,324515.001,4843677.541,43.734799,-79.255066,Scarborough Southwest (35),Scarborough,2048861.0,POINT (-79.25507 43.73480)
222,FIRE STATION 222,755 WARDEN AVE,322180.852,4842072.337,43.720408,-79.284094,Scarborough Southwest (35),Scarborough,2449586.0,POINT (-79.28409 43.72041)
223,FIRE STATION 223,116 DORSET RD,326275.035,4842479.439,43.723965,-79.233264,Scarborough Southwest (36),Scarborough,2172861.0,POINT (-79.23326 43.72397)
...,...,...,...,...,...,...,...,...,...,...
145,FIRE STATION 145,20 BEFFORT RD,308003.882,4843645.466,43.734764,-79.460035,York Centre (9),North York,3186765.0,POINT (-79.46004 43.73476)
146,FIRE STATION 146,2220 JANE ST,303996.599,4842402.430,43.723581,-79.509779,York West (7),North York,3072600.0,POINT (-79.50978 43.72358)
211,FIRE STATION 211,900 TAPSCOTT RD,325466.636,4853590.517,43.824001,-79.242873,Scarborough-Rouge River (42),Scarborough,1520443.0,POINT (-79.24287 43.82400)
212,FIRE STATION 212,8500 SHEPPARD AVE E,329838.107,4851487.882,43.804941,-79.188623,Scarborough-Rouge River (42),Scarborough,2303629.0,POINT (-79.18862 43.80494)


## Read in the csv
- Don't worry if you cannot do the geopandas part
- Just use the included "Toronto_Fire_Station_Locations.csv"

In [13]:
# read in the .csv and set the station number as the index
df_locations = pd.read_csv(
    os.path.join(STATIONS_RAW_UNZIPPED_DIRECTORY, "Toronto_Fire_Station_Locations.csv"),
    index_col="INDEX")
df_locations.head()

INDEX,NAME,ADDRESS,X,Y,LATITUDE,LONGITUDE,WARD_NAME,MUN_NAME,OBJECTID,geometry
214,FIRE STATION 214,745 MEADOWVALE RD,331855.873,4850304.607,43.794219,-79.163605,Scarborough East (44),Scarborough,1567567.0,POINT (-79.1636047829337 43.7942193852174)
215,FIRE STATION 215,5318 LAWRENCE AVE E,333114.022,4848441.177,43.777401,-79.148069,Scarborough East (44),Scarborough,2250005.0,POINT (-79.1480690620369 43.777400552994)
221,FIRE STATION 221,2575 EGLINTON AVE E,324515.001,4843677.541,43.734799,-79.255066,Scarborough Southwest (35),Scarborough,2048861.0,POINT (-79.25506590336811 43.7347987485643)
222,FIRE STATION 222,755 WARDEN AVE,322180.852,4842072.337,43.720408,-79.284094,Scarborough Southwest (35),Scarborough,2449586.0,POINT (-79.28409383366279 43.7204080292081)
223,FIRE STATION 223,116 DORSET RD,326275.035,4842479.439,43.723965,-79.233264,Scarborough Southwest (36),Scarborough,2172861.0,POINT (-79.2332642694298 43.7239653632118)


In [14]:
# data types are appropriate
df_locations.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 84 entries, 214 to 213
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   NAME       84 non-null     object 
 1   ADDRESS    84 non-null     object 
 2   X          84 non-null     float64
 3   Y          84 non-null     float64
 4   LATITUDE   84 non-null     float64
 5   LONGITUDE  84 non-null     float64
 6   WARD_NAME  84 non-null     object 
 7   MUN_NAME   84 non-null     object 
 8   OBJECTID   84 non-null     float64
 9   geometry   84 non-null     object 
dtypes: float64(5), object(5)
memory usage: 7.2+ KB


In [15]:
# there are no null values
df_locations.isnull().sum()

NAME         0
ADDRESS      0
X            0
Y            0
LATITUDE     0
LONGITUDE    0
WARD_NAME    0
MUN_NAME     0
OBJECTID     0
geometry     0
dtype: int64

In [16]:
# there are no duplicates either
df_locations["NAME"].duplicated().sum()

0

# 4
# Toronto Fire Incidents merged with Toronto Weather

# 4a
# Merge DataFrames

In [17]:
# read in the fire incident csv as a DataFrame
FIRE_PATH = os.path.join(FIRE_PROCESSED_UNZIPPED_DIRECTORY,
                         "2011-2018_Basic_Incident_Details.csv")
df_fire = pd.read_csv(FIRE_PATH, parse_dates=["Datetime"])

# read in the toronto weather csv as a DataFrame
WEATHER_PATH = os.path.join(WEATHER_PROCESSED_UNZIPPED_DIRECTORY,
                         "2010-2020_Toronto_Weather.csv")
df_weather = pd.read_csv(WEATHER_PATH, parse_dates=["Datetime"])

In [18]:
# merge fire incidents and toronto weather data together
df_merge = df_fire.merge(df_weather, on="Datetime")
df_merge.columns

Index(['Incident Number', 'Initial CAD Event Type',
       'Initial CAD Event Call Type', 'Final Incident Type',
       'Event Alarm Level', 'Call Source', 'Incident Station Area',
       'Incident Ward', 'LATITUDE', 'Longitude', 'Intersection',
       'TFS Alarm Time', 'TFS Arrival Time', 'Last TFS Unit Clear Time',
       'Persons Rescued', 'Datetime', 'MAX_TEMP', 'MIN_TEMP', 'MEAN_TEMP',
       'HDD', 'CDD', 'RAIN_MM', 'PRECIP_MM', 'SNOW_CM'],
      dtype='object')
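`merge` defaults to an inner join, so any incident whose `Datetime` has no match in the weather table is dropped without warning. A minimal sketch of how that could be checked, using the `how`, `validate` and `indicator` parameters of `DataFrame.merge`. The toy frames and their values are made up for illustration and are not taken from the real data.

```python
import pandas as pd

# Toy data only: three incidents, two weather days
toy_fire = pd.DataFrame({
    "Incident Number": ["A1", "A2", "A3"],
    "Datetime": pd.to_datetime(["2011-01-01", "2011-01-01", "2011-01-03"]),
})
toy_weather = pd.DataFrame({
    "Datetime": pd.to_datetime(["2011-01-01", "2011-01-02"]),
    "MEAN_TEMP": [6.4, 1.2],
})

# how="left" keeps every incident
# validate raises an error if a date repeats in the weather table
# indicator adds a "_merge" column saying where each row came from
toy_merge = toy_fire.merge(
    toy_weather, on="Datetime", how="left",
    validate="many_to_one", indicator=True,
)

# incidents with no weather record
unmatched = toy_merge.loc[toy_merge["_merge"] == "left_only", "Incident Number"].tolist()
print(unmatched)
```

An empty `unmatched` list on the real data would confirm that the inner join loses no incidents.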

In [19]:
# number of rows and columns
df_merge.shape

(975175, 24)

In [20]:
# null values per column: most columns are missing well under 0.01% of rows and "TFS Arrival Time" about 2.4%, which should be manageable
df_merge.isnull().sum()

Incident Number                    0
Initial CAD Event Type             0
Initial CAD Event Call Type        0
Final Incident Type               48
Event Alarm Level                  0
Call Source                       70
Incident Station Area             62
Incident Ward                      0
LATITUDE                          48
Longitude                         48
Intersection                      49
TFS Alarm Time                     0
TFS Arrival Time               23480
Last TFS Unit Clear Time           4
Persons Rescued                   68
Datetime                           0
MAX_TEMP                           0
MIN_TEMP                           0
MEAN_TEMP                          0
HDD                                0
CDD                                0
RAIN_MM                            0
PRECIP_MM                          0
SNOW_CM                            0
dtype: int64
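To put the counts above in proportion, the share of null rows can be worked out from the row total reported by `df_merge.shape`. A small sketch using only the numbers printed above; on the DataFrame itself, `df_merge.isnull().mean() * 100` should give the same percentages.

```python
# Counts copied from the output above
total_rows = 975175
null_counts = {
    "TFS Arrival Time": 23480,
    "Call Source": 70,
    "Persons Rescued": 68,
    "Incident Station Area": 62,
}

# share of null rows per column, in percent
null_pct = {col: 100 * n / total_rows for col, n in null_counts.items()}
for col, pct in null_pct.items():
    print(f"{col}: {pct:.3f}%")
```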

In [21]:
# remove all rows with a null "LATITUDE" or "Longitude" using a boolean mask
df_merge = df_merge.loc[df_merge["LATITUDE"].notnull() & df_merge["Longitude"].notnull()]
df_merge.shape

(975127, 24)

In [22]:
# 14 rows share an "Incident Number" with another row (7 numbers, each appearing twice)
df_merge["Incident Number"].duplicated(keep=False).sum()

14
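The `keep=False` argument is what makes the count cover every copy rather than only the later ones. A toy illustration with made-up values:

```python
import pandas as pd

# Toy incident numbers: "B" appears twice
toy_ids = pd.Series(["A", "B", "B", "C"])

# default keep="first" flags only the later copies
later_copies = int(toy_ids.duplicated().sum())
# keep=False flags every row that shares a value with another row
all_copies = int(toy_ids.duplicated(keep=False).sum())
print(later_copies, all_copies)  # prints "1 2"
```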

In [23]:
# Check duplicate "Incident Number" values
# The same event appears to be recorded twice, with a different
# "Final Incident Type" or "Call Source" on each row; this may be a data-entry error
df_merge.loc[df_merge["Incident Number"].duplicated(keep=False)]

,Incident Number,Initial CAD Event Type,Initial CAD Event Call Type,Final Incident Type,Event Alarm Level,Call Source,Incident Station Area,Incident Ward,LATITUDE,Longitude,...,Persons Rescued,Datetime,MAX_TEMP,MIN_TEMP,MEAN_TEMP,HDD,CDD,RAIN_MM,PRECIP_MM,SNOW_CM
358649,F13096124,FAHR - Alarm Highrise Residential,Emergency Fire,34 - Human - Perceived Emergency,0,05 - Telephone from Monitoring Agency,442.0,5,43.707785,-79.533627,...,0.0,2013-11-23,1.4,-10.6,-3.53,21.53,0.0,0.0,1.03,0.0
358650,F13096124,FAHR - Alarm Highrise Residential,Emergency Fire,99 - Other Response,0,05 - Telephone from Monitoring Agency,442.0,5,43.707785,-79.533627,...,0.0,2013-11-23,1.4,-10.6,-3.53,21.53,0.0,0.0,1.03,0.0
359742,F13097259,CCNE - Check Call - Non Emergency,Non Emergency,94 - Other Public Service,0,09 - Other Alarm,134.0,15,43.705077,-79.384773,...,0.0,2013-11-27,1.0,-6.6,-2.33,20.33,0.0,0.0,0.97,1.5
359743,F13097259,CCNE - Check Call - Non Emergency,Non Emergency,99 - Other Response,0,09 - Other Alarm,134.0,15,43.705077,-79.384773,...,0.0,2013-11-27,1.0,-6.6,-2.33,20.33,0.0,0.0,0.97,1.5
391077,F14016411,FICI - Fire - Commercial/Industrial,Emergency Fire,99 - Other Response,0,09 - Other Alarm,411.0,7,43.760025,-79.532077,...,0.0,2014-02-18,0.7,-7.5,-2.57,20.57,0.0,0.0,2.17,33.5
391078,F14016411,FICI - Fire - Commercial/Industrial,Emergency Fire,"03 - NO LOSS OUTDOOR fire (exc: Sus.arson,vand...",0,02 - Telephone from Civlian (other than 911),411.0,7,43.760025,-79.532077,...,0.0,2014-02-18,0.7,-7.5,-2.57,20.57,0.0,0.0,2.17,33.5
401808,F14027496,CCNE - Check Call - Non Emergency,Non Emergency,94 - Other Public Service,0,02 - Telephone from Civlian (other than 911),223.0,21,43.741609,-79.23261,...,0.0,2014-03-28,10.2,0.0,5.57,12.43,0.0,0.4,1.97,5.0
401809,F14027496,CCNE - Check Call - Non Emergency,Non Emergency,99 - Other Response,0,02 - Telephone from Civlian (other than 911),223.0,21,43.741609,-79.23261,...,0.0,2014-03-28,10.2,0.0,5.57,12.43,0.0,0.4,1.97,5.0
459379,F14087165,Medical,Medical,89 - Other Medical,0,04 - From Police Services,146.0,6,43.724894,-79.509607,...,0.0,2014-10-04,15.4,5.9,11.13,6.87,0.0,0.3,1.93,0.0
459380,F14087165,Medical,Medical,89 - Other Medical,0,11 - No alarm received - incident discovered b...,146.0,6,43.724894,-79.509607,...,0.0,2014-10-04,15.4,5.9,11.13,6.87,0.0,0.3,1.93,0.0


In [24]:
# drop every row whose "Incident Number" is duplicated, using a boolean mask
df_merge = df_merge.loc[~df_merge["Incident Number"].duplicated(keep=False)]
df_merge.head()

,Incident Number,Initial CAD Event Type,Initial CAD Event Call Type,Final Incident Type,Event Alarm Level,Call Source,Incident Station Area,Incident Ward,LATITUDE,Longitude,...,Persons Rescued,Datetime,MAX_TEMP,MIN_TEMP,MEAN_TEMP,HDD,CDD,RAIN_MM,PRECIP_MM,SNOW_CM
0,F11000010,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,342.0,9,43.679099,-79.461761,...,0.0,2011-01-01,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0
1,F11000011,Medical,Carbon Monoxide,89 - Other Medical,1,01 - 911,131.0,15,43.726342,-79.396401,...,0.0,2011-01-01,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0
2,F11000012,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,324.0,14,43.668548,-79.335324,...,0.0,2011-01-01,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0
3,F11000013,FIG - Fire - Grass/Rubbish,Emergency Fire,"03 - NO LOSS OUTDOOR fire (exc: Sus.arson,vand...",1,01 - 911,345.0,9,43.657123,-79.434313,...,0.0,2011-01-01,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0
4,F11000014,FAHR - Alarm Highrise Residential,Emergency Fire,"33 - Human - Malicious intent, prank",1,05 - Telephone from Monitoring Agency,142.0,7,43.75984,-79.516182,...,0.0,2011-01-01,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0


In [25]:
# check to see if the "Incident Number" is unique for each row
len(df_merge["Incident Number"].unique()) == len(df_merge)

True

In [26]:
# set the "Incident Number" as the index, since it is unique for each row
df_merge.index = df_merge["Incident Number"]
df_merge = df_merge.drop(columns=["Incident Number"])
df_merge.index.name = "INCIDENT_NUM"
df_merge.head()

INCIDENT_NUM,Initial CAD Event Type,Initial CAD Event Call Type,Final Incident Type,Event Alarm Level,Call Source,Incident Station Area,Incident Ward,LATITUDE,Longitude,Intersection,...,Persons Rescued,Datetime,MAX_TEMP,MIN_TEMP,MEAN_TEMP,HDD,CDD,RAIN_MM,PRECIP_MM,SNOW_CM
F11000010,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,342.0,9,43.679099,-79.461761,Silverthorn Ave / Turnberry Ave,...,0.0,2011-01-01,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0
F11000011,Medical,Carbon Monoxide,89 - Other Medical,1,01 - 911,131.0,15,43.726342,-79.396401,Lawrence Ave E / Mount Pleasant Rd,...,0.0,2011-01-01,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0
F11000012,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,324.0,14,43.668548,-79.335324,Endean Ave / Jones Ave,...,0.0,2011-01-01,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0
F11000013,FIG - Fire - Grass/Rubbish,Emergency Fire,"03 - NO LOSS OUTDOOR fire (exc: Sus.arson,vand...",1,01 - 911,345.0,9,43.657123,-79.434313,Dufferin St / Dufferin Park Ave,...,0.0,2011-01-01,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0
F11000014,FAHR - Alarm Highrise Residential,Emergency Fire,"33 - Human - Malicious intent, prank",1,05 - Telephone from Monitoring Agency,142.0,7,43.75984,-79.516182,Driftwood Ave / Wilmont Dr,...,0.0,2011-01-01,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0


In [27]:
# create "lat_long" column that is a tuple of latitude and longitude
df_merge["lat_long"] = list(zip(df_merge["LATITUDE"], df_merge["Longitude"]))
df_merge.head()

INCIDENT_NUM,Initial CAD Event Type,Initial CAD Event Call Type,Final Incident Type,Event Alarm Level,Call Source,Incident Station Area,Incident Ward,LATITUDE,Longitude,Intersection,...,Datetime,MAX_TEMP,MIN_TEMP,MEAN_TEMP,HDD,CDD,RAIN_MM,PRECIP_MM,SNOW_CM,lat_long
F11000010,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,342.0,9,43.679099,-79.461761,Silverthorn Ave / Turnberry Ave,...,2011-01-01,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0,"(43.679098579, -79.46176085399999)"
F11000011,Medical,Carbon Monoxide,89 - Other Medical,1,01 - 911,131.0,15,43.726342,-79.396401,Lawrence Ave E / Mount Pleasant Rd,...,2011-01-01,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0,"(43.726342492, -79.396400551)"
F11000012,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,324.0,14,43.668548,-79.335324,Endean Ave / Jones Ave,...,2011-01-01,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0,"(43.66854833, -79.335324059)"
F11000013,FIG - Fire - Grass/Rubbish,Emergency Fire,"03 - NO LOSS OUTDOOR fire (exc: Sus.arson,vand...",1,01 - 911,345.0,9,43.657123,-79.434313,Dufferin St / Dufferin Park Ave,...,2011-01-01,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0,"(43.6571226, -79.43431345399999)"
F11000014,FAHR - Alarm Highrise Residential,Emergency Fire,"33 - Human - Malicious intent, prank",1,05 - Telephone from Monitoring Agency,142.0,7,43.75984,-79.516182,Driftwood Ave / Wilmont Dr,...,2011-01-01,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0,"(43.759840161999996, -79.516182424)"


In [28]:
# write it to a folder
MERGED_FILE_PATH = os.path.join(FIRE_PROCESSED_UNZIPPED_DIRECTORY,
                               "2011-2018_Toronto_Fire_Incidents_Weather.csv")

df_merge.to_csv(MERGED_FILE_PATH)

# display 10 random values
df_merge.sample(10)

INCIDENT_NUM,Initial CAD Event Type,Initial CAD Event Call Type,Final Incident Type,Event Alarm Level,Call Source,Incident Station Area,Incident Ward,LATITUDE,Longitude,Intersection,...,Datetime,MAX_TEMP,MIN_TEMP,MEAN_TEMP,HDD,CDD,RAIN_MM,PRECIP_MM,SNOW_CM,lat_long
F14087298,FAHR - Alarm Highrise Residential,Emergency Fire,35 - Human - Accidental (alarm accidentally ac...,0,05 - Telephone from Monitoring Agency,426.0,9,43.641238,-79.422854,Lisgar St / Sudbury St,...,2014-10-04,15.4,5.9,11.13,6.87,0.0,0.3,1.93,0.0,"(43.641238113, -79.42285379099998)"
F18020951,FAR - Alarm Residential,Emergency Fire,38 - CO false alarm - equipment malfunction (n...,0,01 - 911,226.0,19,43.678884,-79.306452,Firstbrooke Rd / Burgess Ave,...,2018-02-24,6.2,0.5,3.63,14.37,0.0,2.3,1.53,0.0,"(43.678884499, -79.30645246599998)"
F14088828,FAHR - Alarm Highrise Residential,Emergency Fire,35 - Human - Accidental (alarm accidentally ac...,0,05 - Telephone from Monitoring Agency,341.0,12,43.695314,-79.45012,Dufferin St / Dufferin N Eglinton E Ramp / Ln ...,...,2014-10-09,15.0,4.5,9.95,8.05,0.0,0.0,0.0,0.0,"(43.695313958999996, -79.450119966)"
F14011515,FACI - Alarm Commercial/Industrial,Emergency Fire,"33 - Human - Malicious intent, prank",0,05 - Telephone from Monitoring Agency,314.0,13,43.661372,-79.38311,College St / Yonge St / Carlton St,...,2014-02-01,1.4,-7.5,-1.47,19.47,0.0,0.0,13.9,13.5,"(43.661371813, -79.383110377)"
F11131772,Medical,Carbon Monoxide,89 - Other Medical,0,05 - Telephone from Monitoring Agency,122.0,15,43.742918,-79.367245,Caravan Dr / Bamboo Grv,...,2011-10-08,25.5,6.5,16.87,1.13,0.0,0.0,0.0,0.0,"(43.742917546, -79.367244891)"
F11145328,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,346.0,10,43.638671,-79.413956,Pirandello St / East Liberty St,...,2011-11-09,15.0,8.6,11.63,6.37,0.0,2.4,4.57,0.0,"(43.638671208999995, -79.413955558)"
F14035008,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,332.0,10,43.649906,-79.387138,Richmond St W / Simcoe St,...,2014-04-22,15.1,4.8,10.33,7.67,0.0,0.4,2.6,0.0,"(43.649906415, -79.387137745)"
F16072291,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,322.0,15,43.708018,-79.341387,Overlea Blvd / William Morgan Dr,...,2016-08-06,30.5,18.5,24.63,0.0,6.63,0.0,0.0,0.0,"(43.708017748, -79.34138708899998)"
F14014936,Medical,Medical,89 - Other Medical,0,01 - 911,433.0,3,43.620512,-79.483268,Lake Shore Blvd W / Legion Rd,...,2014-02-13,-1.6,-17.0,-7.43,25.43,0.0,0.0,0.07,29.0,"(43.620511692, -79.483267663)"
F12006979,FAR - Alarm Residential,Emergency Fire,32 - Alarm System Equipment - Accidental activ...,0,05 - Telephone from Monitoring Agency,313.0,13,43.667973,-79.376491,Lourdes Lane / Homewood Ave,...,2012-01-18,-2.6,-9.0,-5.3,23.3,0.0,0.0,0.0,0.0,"(43.667973376999996, -79.376491122)"


In [29]:
# read in the csv file, parsing the "lat_long" text back into a tuple
import ast  # literal_eval parses literals only, unlike eval

df_merge = pd.read_csv(
    os.path.join(FIRE_PROCESSED_UNZIPPED_DIRECTORY,
                 "2011-2018_Toronto_Fire_Incidents_Weather.csv"),
    index_col="INCIDENT_NUM",
    converters={"lat_long": ast.literal_eval})
df_merge.head()

INCIDENT_NUM,Initial CAD Event Type,Initial CAD Event Call Type,Final Incident Type,Event Alarm Level,Call Source,Incident Station Area,Incident Ward,LATITUDE,Longitude,Intersection,...,Datetime,MAX_TEMP,MIN_TEMP,MEAN_TEMP,HDD,CDD,RAIN_MM,PRECIP_MM,SNOW_CM,lat_long
F11000010,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,342.0,9,43.679099,-79.461761,Silverthorn Ave / Turnberry Ave,...,2011-01-01,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0,"(43.679098579, -79.46176085399999)"
F11000011,Medical,Carbon Monoxide,89 - Other Medical,1,01 - 911,131.0,15,43.726342,-79.396401,Lawrence Ave E / Mount Pleasant Rd,...,2011-01-01,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0,"(43.726342492, -79.396400551)"
F11000012,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,324.0,14,43.668548,-79.335324,Endean Ave / Jones Ave,...,2011-01-01,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0,"(43.66854833, -79.335324059)"
F11000013,FIG - Fire - Grass/Rubbish,Emergency Fire,"03 - NO LOSS OUTDOOR fire (exc: Sus.arson,vand...",1,01 - 911,345.0,9,43.657123,-79.434313,Dufferin St / Dufferin Park Ave,...,2011-01-01,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0,"(43.6571226, -79.43431345399999)"
F11000014,FAHR - Alarm Highrise Residential,Emergency Fire,"33 - Human - Malicious intent, prank",1,05 - Telephone from Monitoring Agency,142.0,7,43.75984,-79.516182,Driftwood Ave / Wilmont Dr,...,2011-01-01,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0,"(43.759840161999996, -79.516182424)"
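The `lat_long` tuples are stored as text in the .csv, so they have to be parsed when the file is read back. A minimal sketch of parsing one cell with `ast.literal_eval`, which accepts only Python literals, whereas plain `eval` would execute whatever expression the file contains:

```python
import ast

# A tuple as it appears in the saved .csv: plain text
cell_text = "(43.679098579, -79.46176085399999)"

# literal_eval only parses Python literals; it does not run arbitrary code
lat_long = ast.literal_eval(cell_text)
print(type(lat_long).__name__, lat_long[0])

# Anything that is not a literal is rejected instead of being executed
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```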


# 5
# Haversine Formula
- Determines the great-circle distance between 2 points on a sphere - [link](https://en.wikipedia.org/wiki/Haversine_formula)
- Needed to relate the locations of Toronto Fire Services Stations and Toronto Fire Hydrants to every incident
    - determine the closest Station and Fire Hydrant for each incident
    - some Stations / Fire Hydrants may be overused and require additional resources
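
- For reference, the formula in its usual textbook form, with latitudes $\varphi_1, \varphi_2$ and longitudes $\lambda_1, \lambda_2$ in radians and sphere radius $r$ (the symbols are chosen here for illustration; the code cells use `latit_*`, `longit_*` and `radius`):

$$
a = \sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \, \cos\varphi_2 \, \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right),
\qquad
d = 2r \arcsin\left(\sqrt{a}\right)
$$

- The order of the subtraction does not matter because both differences are squared.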

In [30]:
# Python Haversine formula
# Adapted from
# "https://stackoverflow.com/questions/4913349/haversine-formula-in-python-bearing-and-distance-between-two-gps-points"
def haversine(latit_1=-90, latit_2=90, longit_1=0, longit_2=180, radius=6371000):
    """
    determines the great-circle distance between two
    points on a sphere (default value is Earth's) given their longitudes and latitudes.
    Longitudes and Latitudes are in decimal degrees.
    Returns value in metres (m).
    """
    # convert decimal degrees to radians
    longit_1 = math.radians(longit_1)
    latit_1 = math.radians(latit_1)
    longit_2 = math.radians(longit_2)
    latit_2 = math.radians(latit_2)
    
    # calculate the difference between latitudes and longitudes
    delta_latitude = latit_1 - latit_2
    delta_longitude = longit_1 - longit_2
    
    # great circle distance
    a = math.sin(delta_latitude / 2)**2 + math.cos(latit_1) * math.cos(latit_2) * math.sin(delta_longitude / 2)**2
    c = 2 * math.asin(math.sqrt(a))
    
    # return the great circle distance for Earth
    return radius * c

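# the default points are the two poles, which are half a circumference (pi * radius) apart,
# so getting pi back after dividing by the radius means it works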
haversine()/6371000

3.141592653589793

In [31]:
# Python Haversine formula for numpy arrays
# An attempt to cut the computation time (~1000 minutes for ~1 million rows)
# this one should be roughly 10 times faster
def np_haversine(latit_1=-90, latit_2=90, longit_1=0, longit_2=180, radius=6371000):
    """
    Calculate great circle distance between 2 points on a sphere (Earth is default).
    Positions are in decimal degrees.
    Function was rewritten to be calculated using Numpy methods.
    Returns value in metres (m).
    """
    # convert decimal degrees to radians
    latit_1 = np.radians(latit_1)
    latit_2 = np.radians(latit_2)
    longit_1 = np.radians(longit_1)
    longit_2 = np.radians(longit_2)
    
    # calculate the difference between latitudes and longitudes
    delta_latitude = latit_1 - latit_2
    delta_longitude = longit_1 - longit_2
    
    # great circle distance
    a = np.sin(delta_latitude / 2)**2 + np.cos(latit_1) * np.cos(latit_2) * np.sin(delta_longitude / 2)**2
    c = 2 * np.arcsin(np.sqrt(a))
    
    # return the distance
    return radius * c

np_haversine()/6371000

3.141592653589793
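
- A small sketch of why the numpy version helps: passing an array for one point and a scalar for the other returns all distances in one call. The function is repeated (without defaults) so the block runs on its own; the sample coordinates and the `station_*` / `incident` names are illustrative.

```python
import numpy as np

def np_haversine(latit_1, latit_2, longit_1, longit_2, radius=6371000):
    """Great-circle distance in metres; inputs are decimal degrees (scalars or arrays)."""
    latit_1, latit_2 = np.radians(latit_1), np.radians(latit_2)
    longit_1, longit_2 = np.radians(longit_1), np.radians(longit_2)
    a = (np.sin((latit_1 - latit_2) / 2) ** 2
         + np.cos(latit_1) * np.cos(latit_2) * np.sin((longit_1 - longit_2) / 2) ** 2)
    return radius * 2 * np.arcsin(np.sqrt(a))

# three sample station positions (arrays) and one incident position (scalars)
station_lat = np.array([43.794219, 43.777401, 43.734799])
station_lon = np.array([-79.163605, -79.148069, -79.255066])
incident = (43.679098579, -79.46176085399999)

# broadcasting: one distance per station, computed without a Python loop
distances = np_haversine(station_lat, incident[0], station_lon, incident[1])
print(distances.shape)       # one entry per station
print(np.argmin(distances))  # position of the nearest station in the arrays
```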

# 5b
# Closest Toronto Fire Services Station

In [32]:
# get LATITUDE and LONGITUDE of each fire department
df_station_locations = df_locations.loc[:, ["LATITUDE", "LONGITUDE"]]
df_station_locations.head()

INDEX,LATITUDE,LONGITUDE
214,43.794219,-79.163605
215,43.777401,-79.148069
221,43.734799,-79.255066
222,43.720408,-79.284094
223,43.723965,-79.233264


In [33]:
# get the minimum value and replace the value in lat_longs with the index value
# corresponding to the Closest Fire Station
def Closest_Point(x):
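    """
    Get the index of the closest point among all the datapoints in df_station_locations
    """
    # NaN never compares equal to anything (not even itself), so test for it explicitly
    if isinstance(x, float) and np.isnan(x):
        return x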
    
    # use the above DataFrame
    global df_station_locations
    
    # broadcasting will get us all the values we need
    np_haversine_array = np.array(
                        np_haversine(
                        df_station_locations["LATITUDE"].values,
                        x[0],
                        df_station_locations["LONGITUDE"].values,
                        x[1]
                                    )
                        )
    
    # get the minimum index
    minimum_index = np.argmin(np_haversine_array)
    
    # return the index of the minimum distance
    return df_station_locations.index[minimum_index]
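
- A possible faster alternative to calling `Closest_Point` once per row, offered as a sketch rather than the method used in this notebook: build an (incidents x stations) distance matrix with broadcasting, one chunk of incidents at a time, and take `argmin` along the station axis. `closest_station_ids` and `chunk_size` are hypothetical names; missing coordinates are not handled.

```python
import numpy as np

def np_haversine(latit_1, latit_2, longit_1, longit_2, radius=6371000):
    """Great-circle distance in metres; inputs are decimal degrees (scalars or arrays)."""
    latit_1, latit_2 = np.radians(latit_1), np.radians(latit_2)
    longit_1, longit_2 = np.radians(longit_1), np.radians(longit_2)
    a = (np.sin((latit_1 - latit_2) / 2) ** 2
         + np.cos(latit_1) * np.cos(latit_2) * np.sin((longit_1 - longit_2) / 2) ** 2)
    return radius * 2 * np.arcsin(np.sqrt(a))

def closest_station_ids(incident_lat, incident_lon, station_lat, station_lon,
                        station_ids, chunk_size=100_000):
    """Return, for every incident, the id of the nearest station."""
    incident_lat = np.asarray(incident_lat, dtype=float)
    incident_lon = np.asarray(incident_lon, dtype=float)
    station_lat = np.asarray(station_lat, dtype=float)
    station_lon = np.asarray(station_lon, dtype=float)
    station_ids = np.asarray(station_ids)

    nearest = np.empty(len(incident_lat), dtype=station_ids.dtype)
    for start in range(0, len(incident_lat), chunk_size):
        stop = start + chunk_size
        # shape (chunk, n_stations): incidents down the rows, stations across the columns
        distances = np_haversine(incident_lat[start:stop, None], station_lat[None, :],
                                 incident_lon[start:stop, None], station_lon[None, :])
        nearest[start:stop] = station_ids[np.argmin(distances, axis=1)]
    return nearest

# tiny illustrative input: three stations, two incidents
ids = closest_station_ids(
    incident_lat=[43.679098579, 43.78], incident_lon=[-79.461760854, -79.15],
    station_lat=[43.794219, 43.777401, 43.734799],
    station_lon=[-79.163605, -79.148069, -79.255066],
    station_ids=[214, 215, 221],
)
print(ids)
```

- Chunking keeps the temporary matrix small; the chunk size is a memory/speed trade-off, not a tuned value.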

## The following code will take ~20 minutes to run
- This is essentially a VLOOKUP() function from Excel
- YOU HAVE BEEN WARNED

In [34]:
# apply the Closest_Point function to obtain the Fire Station closest to the Event
df_slice = df_merge.copy()
df_slice["FIRE_STATION_CLOSEST"] = df_slice["lat_long"].apply(Closest_Point)
df_slice

INCIDENT_NUM,Initial CAD Event Type,Initial CAD Event Call Type,Final Incident Type,Event Alarm Level,Call Source,Incident Station Area,Incident Ward,LATITUDE,Longitude,Intersection,...,MAX_TEMP,MIN_TEMP,MEAN_TEMP,HDD,CDD,RAIN_MM,PRECIP_MM,SNOW_CM,lat_long,FIRE_STATION_CLOSEST
F11000010,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,342.0,9,43.679099,-79.461761,Silverthorn Ave / Turnberry Ave,...,11.5,0.9,6.40,11.60,0.0,3.7,8.7,0.0,"(43.679098579, -79.46176085399999)",342
F11000011,Medical,Carbon Monoxide,89 - Other Medical,1,01 - 911,131.0,15,43.726342,-79.396401,Lawrence Ave E / Mount Pleasant Rd,...,11.5,0.9,6.40,11.60,0.0,3.7,8.7,0.0,"(43.726342492, -79.396400551)",131
F11000012,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,324.0,14,43.668548,-79.335324,Endean Ave / Jones Ave,...,11.5,0.9,6.40,11.60,0.0,3.7,8.7,0.0,"(43.66854833, -79.335324059)",324
F11000013,FIG - Fire - Grass/Rubbish,Emergency Fire,"03 - NO LOSS OUTDOOR fire (exc: Sus.arson,vand...",1,01 - 911,345.0,9,43.657123,-79.434313,Dufferin St / Dufferin Park Ave,...,11.5,0.9,6.40,11.60,0.0,3.7,8.7,0.0,"(43.6571226, -79.43431345399999)",345
F11000014,FAHR - Alarm Highrise Residential,Emergency Fire,"33 - Human - Malicious intent, prank",1,05 - Telephone from Monitoring Agency,142.0,7,43.759840,-79.516182,Driftwood Ave / Wilmont Dr,...,11.5,0.9,6.40,11.60,0.0,3.7,8.7,0.0,"(43.759840161999996, -79.516182424)",142
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
F18139238,FAHRD - Alarm Highrise Residential Downtown,Emergency Fire,"33 - Human - Malicious intent, prank",0,05 - Telephone from Monitoring Agency,312.0,11,43.667162,-79.401539,Bloor St W / Huron St,...,6.3,-5.5,1.33,16.67,0.0,6.4,13.5,1.5,"(43.667161863000004, -79.401538955)",344
F18139239,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,442.0,5,0.000000,0.000000,M6M,...,6.3,-5.5,1.33,16.67,0.0,6.4,13.5,1.5,"(0.0, 0.0)",215
F18139240,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,322.0,14,0.000000,0.000000,M4J,...,6.3,-5.5,1.33,16.67,0.0,6.4,13.5,1.5,"(0.0, 0.0)",215
F18139241,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,442.0,2,0.000000,0.000000,M9P,...,6.3,-5.5,1.33,16.67,0.0,6.4,13.5,1.5,"(0.0, 0.0)",215
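
- The last rows above have coordinates `(0.0, 0.0)`, which appear to be placeholders for a missing location rather than a real position in Toronto, so their `FIRE_STATION_CLOSEST` value is probably not meaningful. A minimal sketch of one way to blank those values out, assuming that interpretation is right; `df_demo` and `no_location` are illustrative names and the two rows are copied from the output above.

```python
import pandas as pd

# two rows copied from the output above: one real location, one (0.0, 0.0) placeholder
df_demo = pd.DataFrame(
    {"LATITUDE": [43.667162, 0.0],
     "Longitude": [-79.401539, 0.0],
     "FIRE_STATION_CLOSEST": [344, 215]},
    index=pd.Index(["F18139238", "F18139239"], name="INCIDENT_NUM"),
)

# rows whose coordinates are exactly (0, 0) are treated as "location unknown"
no_location = (df_demo["LATITUDE"] == 0) & (df_demo["Longitude"] == 0)

# mask() replaces the flagged rows with NaN (the column becomes float64)
df_demo["FIRE_STATION_CLOSEST"] = df_demo["FIRE_STATION_CLOSEST"].mask(no_location)
print(df_demo)
```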


# 5c
# Closest Toronto Fire Hydrant
# NOT DONE AS IT WILL TAKE ~ 1000 hours to complete with my resources (~42 days)
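
- A possible workaround, not run in this notebook and assuming `scipy` is installed: convert every position to a 3D point on the unit sphere and query a k-d tree. The straight-line (chord) distance between two points on a sphere grows with their great-circle distance, so the nearest neighbour is the same in both measures. `to_unit_sphere` and `nearest_point_index` are hypothetical helper names, and the "hydrant" coordinates below are stand-ins, not real hydrant data.

```python
import numpy as np
from scipy.spatial import cKDTree

def to_unit_sphere(lat_deg, lon_deg):
    """Convert decimal-degree positions to (x, y, z) points on the unit sphere."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    return np.column_stack((np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)))

def nearest_point_index(query_lat, query_lon, target_lat, target_lon, radius=6371000):
    """For each query position, return the index of and distance (m) to the nearest target."""
    tree = cKDTree(to_unit_sphere(target_lat, target_lon))
    chord, index = tree.query(to_unit_sphere(query_lat, query_lon), k=1)
    # convert the chord length on the unit sphere back to an arc length in metres
    distance_m = 2 * radius * np.arcsin(np.clip(chord / 2, 0, 1))
    return index, distance_m

# stand-in target positions and a single incident position
index, distance_m = nearest_point_index(
    query_lat=[43.679098579], query_lon=[-79.461760854],
    target_lat=[43.794219, 43.777401, 43.734799],
    target_lon=[-79.163605, -79.148069, -79.255066],
)
print(index, distance_m)
```

- The tree is built once over the targets, and each lookup then avoids comparing against every target.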

# 6
# Manipulations of Merged Dataset
- The Final .csv file is not ready yet
    - some manipulations are required

# 6a
# Create Minutes to Arrival and Minutes to Leave Columns
- How long did it take for Fire Services to arrive on location?
- How long did it take for them to leave the location after arriving?
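
- A small sketch of the idea on made-up timestamps: subtracting two datetime columns gives a timedelta, which can be converted to minutes. The `times` frame and its column names are illustrative; a missing arrival time simply yields a missing result.

```python
import pandas as pd

# made-up alarm / arrival timestamps; the second arrival time is missing
times = pd.DataFrame({
    "alarm": ["2011-01-01 00:03:43", "2011-01-01 00:10:00"],
    "arrival": ["2011-01-01 00:10:02", None],
})

# datetime - datetime gives a timedelta Series (NaT where either side is missing)
elapsed = pd.to_datetime(times["arrival"]) - pd.to_datetime(times["alarm"])

# total seconds divided by 60 gives fractional minutes
minutes = (elapsed.dt.total_seconds() / 60).round(3)
print(minutes)
```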

In [35]:
# get the number of minutes it took for Fire Services to arrive to the emergency
df_slice["MINUTES_ARRIVAL"] = np.around((pd.to_datetime(df_slice["TFS Arrival Time"]) -
               pd.to_datetime(df_slice["TFS Alarm Time"])) / np.timedelta64(1, "m"), decimals=3)

# get the number of minutes after arriving it takes for the Fire Services to leave
df_slice["MINUTES_LEAVE"] = np.around((pd.to_datetime(df_slice["Last TFS Unit Clear Time"]) -
               pd.to_datetime(df_slice["TFS Arrival Time"])) / np.timedelta64(1, "m"), decimals=3)
df_slice

INCIDENT_NUM,Initial CAD Event Type,Initial CAD Event Call Type,Final Incident Type,Event Alarm Level,Call Source,Incident Station Area,Incident Ward,LATITUDE,Longitude,Intersection,...,MEAN_TEMP,HDD,CDD,RAIN_MM,PRECIP_MM,SNOW_CM,lat_long,FIRE_STATION_CLOSEST,MINUTES_ARRIVAL,MINUTES_LEAVE
F11000010,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,342.0,9,43.679099,-79.461761,Silverthorn Ave / Turnberry Ave,...,6.40,11.60,0.0,3.7,8.7,0.0,"(43.679098579, -79.46176085399999)",342,6.317,21.267
F11000011,Medical,Carbon Monoxide,89 - Other Medical,1,01 - 911,131.0,15,43.726342,-79.396401,Lawrence Ave E / Mount Pleasant Rd,...,6.40,11.60,0.0,3.7,8.7,0.0,"(43.726342492, -79.396400551)",131,5.117,6.183
F11000012,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,324.0,14,43.668548,-79.335324,Endean Ave / Jones Ave,...,6.40,11.60,0.0,3.7,8.7,0.0,"(43.66854833, -79.335324059)",324,4.517,17.617
F11000013,FIG - Fire - Grass/Rubbish,Emergency Fire,"03 - NO LOSS OUTDOOR fire (exc: Sus.arson,vand...",1,01 - 911,345.0,9,43.657123,-79.434313,Dufferin St / Dufferin Park Ave,...,6.40,11.60,0.0,3.7,8.7,0.0,"(43.6571226, -79.43431345399999)",345,6.000,9.883
F11000014,FAHR - Alarm Highrise Residential,Emergency Fire,"33 - Human - Malicious intent, prank",1,05 - Telephone from Monitoring Agency,142.0,7,43.759840,-79.516182,Driftwood Ave / Wilmont Dr,...,6.40,11.60,0.0,3.7,8.7,0.0,"(43.759840161999996, -79.516182424)",142,4.933,10.133
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
F18139238,FAHRD - Alarm Highrise Residential Downtown,Emergency Fire,"33 - Human - Malicious intent, prank",0,05 - Telephone from Monitoring Agency,312.0,11,43.667162,-79.401539,Bloor St W / Huron St,...,1.33,16.67,0.0,6.4,13.5,1.5,"(43.667161863000004, -79.401538955)",344,4.433,6.817
F18139239,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,442.0,5,0.000000,0.000000,M6M,...,1.33,16.67,0.0,6.4,13.5,1.5,"(0.0, 0.0)",215,5.650,2.667
F18139240,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,322.0,14,0.000000,0.000000,M4J,...,1.33,16.67,0.0,6.4,13.5,1.5,"(0.0, 0.0)",215,6.667,4.183
F18139241,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,442.0,2,0.000000,0.000000,M9P,...,1.33,16.67,0.0,6.4,13.5,1.5,"(0.0, 0.0)",215,7.717,2.867


# 6b
# Rename and Drop Columns and Change Datatypes
- some data cleaning before writing to .csv file and .csv.gz file

In [36]:
# get datatypes of columns
df_slice.info()

<class 'pandas.core.frame.DataFrame'>
Index: 975113 entries, F11000010 to F18139242
Data columns (total 27 columns):
 #   Column                       Non-Null Count   Dtype  
---  ------                       --------------   -----  
 0   Initial CAD Event Type       975113 non-null  object 
 1   Initial CAD Event Call Type  975113 non-null  object 
 2   Final Incident Type          975065 non-null  object 
 3   Event Alarm Level            975113 non-null  int64  
 4   Call Source                  975043 non-null  object 
 5   Incident Station Area        975052 non-null  float64
 6   Incident Ward                975113 non-null  int64  
 7   LATITUDE                     975113 non-null  float64
 8   Longitude                    975113 non-null  float64
 9   Intersection                 975112 non-null  object 
 10  TFS Alarm Time               975113 non-null  object 
 11  TFS Arrival Time             951654 non-null  object 
 12  Last TFS Unit Clear Time     975109 non-null  object

In [37]:
# There are some null values in "Incident Station Area" that one can infer from
# "FIRE_STATION_CLOSEST"
# (a single .loc assignment on the DataFrame itself avoids pandas' SettingWithCopyWarning)
station_is_null = df_slice["Incident Station Area"].isnull()
df_slice.loc[station_is_null, "Incident Station Area"] = \
            df_slice.loc[station_is_null, "FIRE_STATION_CLOSEST"]


In [38]:
# rename some columns in pandas
df_slice = df_slice.rename(columns={
    "Incident Station Area" : "FIRE_STATION",
    "Longitude" : "LONGITUDE",
    "Initial CAD Event Type" : "CAD_TYPE",
    "Initial CAD Event Call Type" : "CAD_CALL_TYPE",
    "Final Incident Type" : "FINAL_TYPE",
    "Event Alarm Level" : "ALARM_LEVEL",
    "Call Source" : "CALL_SOURCE",
    "Persons Rescued" : "PERSONS_RESCUED",
    "Datetime" : "DATETIME"
    })

In [39]:
# Drop columns that are no longer needed in the DataFrame
df_slice = df_slice.drop(columns=["Intersection", "TFS Alarm Time", "TFS Arrival Time",
                                 "Last TFS Unit Clear Time", "lat_long", "Incident Ward"])

In [40]:
# lets look at our changes
df_slice.info()

<class 'pandas.core.frame.DataFrame'>
Index: 975113 entries, F11000010 to F18139242
Data columns (total 21 columns):
 #   Column                Non-Null Count   Dtype  
---  ------                --------------   -----  
 0   CAD_TYPE              975113 non-null  object 
 1   CAD_CALL_TYPE         975113 non-null  object 
 2   FINAL_TYPE            975065 non-null  object 
 3   ALARM_LEVEL           975113 non-null  int64  
 4   CALL_SOURCE           975043 non-null  object 
 5   FIRE_STATION          975113 non-null  float64
 6   LATITUDE              975113 non-null  float64
 7   LONGITUDE             975113 non-null  float64
 8   PERSONS_RESCUED       975045 non-null  float64
 9   DATETIME              975113 non-null  object 
 10  MAX_TEMP              975113 non-null  float64
 11  MIN_TEMP              975113 non-null  float64
 12  MEAN_TEMP             975113 non-null  float64
 13  HDD                   975113 non-null  float64
 14  CDD                   975113 non-null  float64

In [41]:
# change datatype of 1 column in dataframe
df_slice["FIRE_STATION_CLOSEST"] = df_slice["FIRE_STATION_CLOSEST"].astype("float64")

In [42]:
# lets have a look at the entire DataFrame
from IPython.display import display

with pd.option_context('display.max_columns', None):
    display(df_slice)

INCIDENT_NUM,CAD_TYPE,CAD_CALL_TYPE,FINAL_TYPE,ALARM_LEVEL,CALL_SOURCE,FIRE_STATION,LATITUDE,LONGITUDE,PERSONS_RESCUED,DATETIME,MAX_TEMP,MIN_TEMP,MEAN_TEMP,HDD,CDD,RAIN_MM,PRECIP_MM,SNOW_CM,FIRE_STATION_CLOSEST,MINUTES_ARRIVAL,MINUTES_LEAVE
F11000010,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,342.0,43.679099,-79.461761,0.0,2011-01-01,11.5,0.9,6.40,11.60,0.0,3.7,8.7,0.0,342.0,6.317,21.267
F11000011,Medical,Carbon Monoxide,89 - Other Medical,1,01 - 911,131.0,43.726342,-79.396401,0.0,2011-01-01,11.5,0.9,6.40,11.60,0.0,3.7,8.7,0.0,131.0,5.117,6.183
F11000012,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,324.0,43.668548,-79.335324,0.0,2011-01-01,11.5,0.9,6.40,11.60,0.0,3.7,8.7,0.0,324.0,4.517,17.617
F11000013,FIG - Fire - Grass/Rubbish,Emergency Fire,"03 - NO LOSS OUTDOOR fire (exc: Sus.arson,vand...",1,01 - 911,345.0,43.657123,-79.434313,0.0,2011-01-01,11.5,0.9,6.40,11.60,0.0,3.7,8.7,0.0,345.0,6.000,9.883
F11000014,FAHR - Alarm Highrise Residential,Emergency Fire,"33 - Human - Malicious intent, prank",1,05 - Telephone from Monitoring Agency,142.0,43.759840,-79.516182,0.0,2011-01-01,11.5,0.9,6.40,11.60,0.0,3.7,8.7,0.0,142.0,4.933,10.133
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
F18139238,FAHRD - Alarm Highrise Residential Downtown,Emergency Fire,"33 - Human - Malicious intent, prank",0,05 - Telephone from Monitoring Agency,312.0,43.667162,-79.401539,0.0,2018-12-31,6.3,-5.5,1.33,16.67,0.0,6.4,13.5,1.5,344.0,4.433,6.817
F18139239,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,442.0,0.000000,0.000000,0.0,2018-12-31,6.3,-5.5,1.33,16.67,0.0,6.4,13.5,1.5,215.0,5.650,2.667
F18139240,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,322.0,0.000000,0.000000,0.0,2018-12-31,6.3,-5.5,1.33,16.67,0.0,6.4,13.5,1.5,215.0,6.667,4.183
F18139241,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,442.0,0.000000,0.000000,0.0,2018-12-31,6.3,-5.5,1.33,16.67,0.0,6.4,13.5,1.5,215.0,7.717,2.867


In [43]:
# reorder the columns to a more pleasing order
df_slice = df_slice.loc[:, 
            [
                'DATETIME', 'MINUTES_ARRIVAL', 'MINUTES_LEAVE',
                'FIRE_STATION', 'FIRE_STATION_CLOSEST', 'LATITUDE',
                'LONGITUDE', 'MAX_TEMP', 'MIN_TEMP',
                'MEAN_TEMP', 'HDD', 'CDD',
                'RAIN_MM', 'PRECIP_MM', 'SNOW_CM',
                'CAD_TYPE', 'CAD_CALL_TYPE', 'FINAL_TYPE',
                'ALARM_LEVEL', 'CALL_SOURCE', 'PERSONS_RESCUED'
            ]]

with pd.option_context('display.max_columns', None):
    display(df_slice)

INCIDENT_NUM,DATETIME,MINUTES_ARRIVAL,MINUTES_LEAVE,FIRE_STATION,FIRE_STATION_CLOSEST,LATITUDE,LONGITUDE,MAX_TEMP,MIN_TEMP,MEAN_TEMP,HDD,CDD,RAIN_MM,PRECIP_MM,SNOW_CM,CAD_TYPE,CAD_CALL_TYPE,FINAL_TYPE,ALARM_LEVEL,CALL_SOURCE,PERSONS_RESCUED
F11000010,2011-01-01,6.317,21.267,342.0,342.0,43.679099,-79.461761,11.5,0.9,6.40,11.60,0.0,3.7,8.7,0.0,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,0.0
F11000011,2011-01-01,5.117,6.183,131.0,131.0,43.726342,-79.396401,11.5,0.9,6.40,11.60,0.0,3.7,8.7,0.0,Medical,Carbon Monoxide,89 - Other Medical,1,01 - 911,0.0
F11000012,2011-01-01,4.517,17.617,324.0,324.0,43.668548,-79.335324,11.5,0.9,6.40,11.60,0.0,3.7,8.7,0.0,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,0.0
F11000013,2011-01-01,6.000,9.883,345.0,345.0,43.657123,-79.434313,11.5,0.9,6.40,11.60,0.0,3.7,8.7,0.0,FIG - Fire - Grass/Rubbish,Emergency Fire,"03 - NO LOSS OUTDOOR fire (exc: Sus.arson,vand...",1,01 - 911,0.0
F11000014,2011-01-01,4.933,10.133,142.0,142.0,43.759840,-79.516182,11.5,0.9,6.40,11.60,0.0,3.7,8.7,0.0,FAHR - Alarm Highrise Residential,Emergency Fire,"33 - Human - Malicious intent, prank",1,05 - Telephone from Monitoring Agency,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
F18139238,2018-12-31,4.433,6.817,312.0,344.0,43.667162,-79.401539,6.3,-5.5,1.33,16.67,0.0,6.4,13.5,1.5,FAHRD - Alarm Highrise Residential Downtown,Emergency Fire,"33 - Human - Malicious intent, prank",0,05 - Telephone from Monitoring Agency,0.0
F18139239,2018-12-31,5.650,2.667,442.0,215.0,0.000000,0.000000,6.3,-5.5,1.33,16.67,0.0,6.4,13.5,1.5,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,0.0
F18139240,2018-12-31,6.667,4.183,322.0,215.0,0.000000,0.000000,6.3,-5.5,1.33,16.67,0.0,6.4,13.5,1.5,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,0.0
F18139241,2018-12-31,7.717,2.867,442.0,215.0,0.000000,0.000000,6.3,-5.5,1.33,16.67,0.0,6.4,13.5,1.5,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,0.0


In [44]:
# write to a .csv file and to a compressed .csv.gz file
PATH_FINAL_CSV = os.path.join(FIRE_PROCESSED_UNZIPPED_DIRECTORY, "2011-2018_Toronto_Fire_Incidents_Weather_PROCESSED.csv")
PATH_FINAL_CSV_GZIP = os.path.join(FIRE_PROCESSED_ZIPPED_DIRECTORY, "2011-2018_Toronto_Fire_Incidents_Weather_PROCESSED.csv.gz")

# write to .csv
df_slice.to_csv(PATH_FINAL_CSV)

# write to compressed .csv.gz file
df_slice.to_csv(PATH_FINAL_CSV_GZIP, compression="gzip")
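
- A quick sketch of reading a gzip-compressed .csv back into pandas, using a tiny made-up DataFrame and a temporary folder instead of the real output paths: `pd.read_csv` infers the compression from the `.gz` extension, so no manual unzipping is needed.

```python
import os
import tempfile

import pandas as pd

# tiny stand-in for the processed DataFrame (illustrative values only)
df_demo = pd.DataFrame(
    {"MINUTES_ARRIVAL": [6.317, 5.117]},
    index=pd.Index(["F11000010", "F11000011"], name="INCIDENT_NUM"),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.csv.gz")
    df_demo.to_csv(demo_path, compression="gzip")

    # compression is inferred from the ".gz" extension by default
    df_back = pd.read_csv(demo_path, index_col="INCIDENT_NUM")

print(df_back)
```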