# Jupyter Notebook Purpose
- Running all code cells processes the data into a single .csv or .csv.gz file
    - the geopandas part can be skipped if you are unable to install geopandas with `pip install` or `conda install`
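The .csv.gz output is a gzip-compressed CSV. The sketch below shows one way pandas can write and read such a file; it uses a two-row toy DataFrame (not the notebook's real data) and a temporary directory, and it is only an illustration of the file format, not the notebook's own export code.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the processed data: two incident numbers from the notebook
df = pd.DataFrame(
    {"Incident Ward": [9, 15]},
    index=pd.Index(["F11000010", "F11000011"], name="INCIDENT_NUM"),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    gzip_path = os.path.join(tmp_dir, "example.csv.gz")

    # compression="gzip" is explicit here; pandas would also infer it from ".gz"
    df.to_csv(gzip_path, compression="gzip")

    # read_csv defaults to compression="infer", so the ".gz" extension is enough
    round_trip = pd.read_csv(gzip_path, index_col="INCIDENT_NUM")

print(round_trip)
```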

In [1]:
# Group 10 Collaborators
COLLABORATORS = ["Nidhi Punja",
                 "Judith Roth",
                 "Iman Dordizadeh Basirabad",
                 "Daniel Adam Cebula",
                 "Cynthia Fung",
                 "Ben Klassen"]

# Group 10 Members
for member in COLLABORATORS:
    print(f"Group 10 Member: {member:->30}")

Group 10 Member: -------------------Nidhi Punja
Group 10 Member: -------------------Judith Roth
Group 10 Member: -----Iman Dordizadeh Basirabad
Group 10 Member: ------------Daniel Adam Cebula
Group 10 Member: ------------------Cynthia Fung
Group 10 Member: -------------------Ben Klassen


# Table of Contents

## 1. [Python Dependencies](#1)
### a. [Geopandas Issue](#1a)
___
## 2. [Folder Creation](#2)
- repeated here in case the folders do not exist yet
___
## 3. [Toronto Fire Services Station Locations](#3)
### a. [From Shapefile to .csv](#3a)
___
## 4. [Toronto Fire Incidents merged with Toronto Weather](#4) - MOVED TO 03-DONE.ipynb
### a. [Merge DataFrames](#4a) - MOVED TO 03-DONE.ipynb
___
## 5. [Haversine Formula](#5)
### a. [Python Formula](#5a)
### b. [Closest Toronto Fire Services Station](#5b)
### c. [Closest Toronto Fire Hydrant](#5c)
- ABANDONED DUE TO LACK OF COMPUTER RESOURCES
___
## 6. [Manipulations of Fire Incidents Dataset](#6)
### a. [Create Minutes to Arrival and Minutes to Leave Columns](#6a)
### b. [Rename and Drop Columns and Change Datatypes](#6b)
___
## 7. [Creation of Zipped CSV Files](#7)

# 1
# Python Dependencies

In [2]:
# Python Modules for Miscellaneous reasons
from zipfile import ZipFile  # to read and write to zipped folders
import requests  # simple HTTP library for Python
import os        # portable way to use operating system functionalities
import io        # Tool for working with streams (Input/Output data)
import datetime  # python classes for manipulating dates and times
import dateutil  # powerful extensions to standard datetime Python module
import time      # used for time.sleep() to delay the HTTP requests ever so slightly
import re        # used for Python regex library
import math      # radians, cos, sin, asin and sqrt are used for haversine formula
from IPython.display import display # use this to see the entire DataFrame in the right format
from create_folder import create_folder # create folder function that I have defined and placed in create_folder.py file

In [3]:
# DATA ANALYSIS Python Dependencies
import pandas as pd
import numpy as np

# 1a
# Geopandas Issue
## Geopandas conflicts with a matplotlib library dependency
- Using it requires a separate Conda virtual environment
    - This is optional: the generated .csv file is included, so you do not need to follow the steps below to generate it
    - All code cells that would cause issues have been commented out
    - If you wish to run them anyway, the instructions are below

## Create a geopandas Conda Virtual Environment

### 1. Create another Conda Virtual Environment - [link](https://geopandas.org/install.html)

#### A. ```conda create -n geo_env```
    - press 'y' and ENTER
#### B. ```conda activate geo_env```
    - MAKE SURE YOU ARE IN THE ```geo_env``` for all the following steps
#### C. ```conda config --env --add channels conda-forge```
#### D. ```conda config --env --set channel_priority strict```
#### E. ```conda install python=3 geopandas```

### 2. Install JupyterLab / Jupyter Notebook - [link](https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html)

#### A. ```conda install -c conda-forge jupyterlab```

### 3. Open up this notebook in Jupyter Notebook / Lab
- ```jupyter lab```
- ```jupyter notebook```

In [4]:
# import geopandas as gpd

# 2
# Creation of Folders
- these folders will store the `RAW_ZIPPED/`, `RAW_UNZIPPED/`, `PROCESSED_ZIPPED/` and `PROCESSED_UNZIPPED/` data files
- each folder is created only if it does not already exist
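The contents of `create_folder.py` are not shown in this notebook. The following is a hypothetical sketch of a helper with the same call shape (a `folder_name` keyword argument), assuming it wraps `os.makedirs` and returns the path, as its use inside `os.path.join` suggests. It is not the group's actual implementation.

```python
import os
import tempfile


def create_folder(folder_name):
    """Hypothetical stand-in for create_folder.py: create the folder
    (and any missing parents) if needed, then return its path."""
    # exist_ok=True makes re-running the notebook harmless
    os.makedirs(folder_name, exist_ok=True)
    return folder_name


# Usage: build a nested RAW_ZIPPED/FIRE_INCIDENTS layout inside a temporary directory
base_dir = tempfile.mkdtemp()
raw_zipped = create_folder(folder_name=os.path.join(base_dir, "RAW_ZIPPED"))
fire_raw_zipped = create_folder(folder_name=os.path.join(raw_zipped, "FIRE_INCIDENTS"))

# A second call on an existing folder does not raise
create_folder(folder_name=fire_raw_zipped)
print(os.path.isdir(fire_raw_zipped))
```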

In [5]:
# Here are the major directory names that will hold the data / metadata
RAW_ZIPPED_DIRECTORY = create_folder(folder_name="RAW_ZIPPED")
RAW_UNZIPPED_DIRECTORY = create_folder(folder_name="RAW_UNZIPPED")
PROCESSED_ZIPPED_DIRECTORY = create_folder(folder_name="PROCESSED_ZIPPED")
PROCESSED_UNZIPPED_DIRECTORY = create_folder(folder_name="PROCESSED_UNZIPPED")

In [6]:
# Create folders for fire_incidents
FIRE_RAW_ZIPPED_DIRECTORY = create_folder(folder_name=os.path.join(RAW_ZIPPED_DIRECTORY, "FIRE_INCIDENTS"))
FIRE_RAW_UNZIPPED_DIRECTORY = create_folder(folder_name=os.path.join(RAW_UNZIPPED_DIRECTORY, "FIRE_INCIDENTS"))
FIRE_PROCESSED_ZIPPED_DIRECTORY = create_folder(folder_name=os.path.join(PROCESSED_ZIPPED_DIRECTORY, "FIRE_INCIDENTS"))
FIRE_PROCESSED_UNZIPPED_DIRECTORY = create_folder(folder_name=os.path.join(PROCESSED_UNZIPPED_DIRECTORY, "FIRE_INCIDENTS"))

In [7]:
# Create folders for toronto_weather
WEATHER_RAW_UNZIPPED_DIRECTORY = create_folder(folder_name=os.path.join(RAW_UNZIPPED_DIRECTORY, "TORONTO_WEATHER"))
WEATHER_PROCESSED_UNZIPPED_DIRECTORY = create_folder(folder_name=os.path.join(PROCESSED_UNZIPPED_DIRECTORY, "TORONTO_WEATHER"))
WEATHER_PROCESSED_ZIPPED_DIRECTORY = create_folder(folder_name=os.path.join(PROCESSED_ZIPPED_DIRECTORY, "TORONTO_WEATHER"))

In [8]:
# Create folders for fire_stations
STATIONS_RAW_ZIPPED_DIRECTORY = create_folder(folder_name=os.path.join(RAW_ZIPPED_DIRECTORY, "FIRE_STATIONS"))
STATIONS_RAW_UNZIPPED_DIRECTORY = create_folder(folder_name=os.path.join(RAW_UNZIPPED_DIRECTORY, "FIRE_STATIONS"))
STATIONS_PROCESSED_ZIPPED_DIRECTORY = create_folder(folder_name=os.path.join(PROCESSED_ZIPPED_DIRECTORY, "FIRE_STATIONS"))
STATIONS_PROCESSED_UNZIPPED_DIRECTORY = create_folder(folder_name=os.path.join(PROCESSED_UNZIPPED_DIRECTORY, "FIRE_STATIONS"))

# 3
# Toronto Fire Services Station Locations

# 3a
# From Shapefile to .csv

In [9]:
# # get the shapefile path
# SHAPEFILE_PATH = os.path.join(STATIONS_RAW_UNZIPPED_DIRECTORY, "FIRE_FACILITY_WGS84.shp")

# # allow Geopandas to read it
# shapefile = gpd.read_file(SHAPEFILE_PATH)
# shapefile

,NAME,ADDRESS,X,Y,LATITUDE,LONGITUDE,WARD_NAME,MUN_NAME,OBJECTID,geometry
0,FIRE STATION 214,745 MEADOWVALE RD,331855.873,4850304.607,43.794219,-79.163605,Scarborough East (44),Scarborough,1567567.0,POINT (-79.16360 43.79422)
1,FIRE STATION 215,5318 LAWRENCE AVE E,333114.022,4848441.177,43.777401,-79.148069,Scarborough East (44),Scarborough,2250005.0,POINT (-79.14807 43.77740)
2,FIRE STATION 221,2575 EGLINTON AVE E,324515.001,4843677.541,43.734799,-79.255066,Scarborough Southwest (35),Scarborough,2048861.0,POINT (-79.25507 43.73480)
3,FIRE STATION 222,755 WARDEN AVE,322180.852,4842072.337,43.720408,-79.284094,Scarborough Southwest (35),Scarborough,2449586.0,POINT (-79.28409 43.72041)
4,FIRE STATION 223,116 DORSET RD,326275.035,4842479.439,43.723965,-79.233264,Scarborough Southwest (36),Scarborough,2172861.0,POINT (-79.23326 43.72397)
...,...,...,...,...,...,...,...,...,...,...
79,FIRE STATION 145,20 BEFFORT RD,308003.882,4843645.466,43.734764,-79.460035,York Centre (9),North York,3186765.0,POINT (-79.46004 43.73476)
80,FIRE STATION 146,2220 JANE ST,303996.599,4842402.430,43.723581,-79.509779,York West (7),North York,3072600.0,POINT (-79.50978 43.72358)
81,FIRE STATION 211,900 TAPSCOTT RD,325466.636,4853590.517,43.824001,-79.242873,Scarborough-Rouge River (42),Scarborough,1520443.0,POINT (-79.24287 43.82400)
82,FIRE STATION 212,8500 SHEPPARD AVE E,329838.107,4851487.882,43.804941,-79.188623,Scarborough-Rouge River (42),Scarborough,2303629.0,POINT (-79.18862 43.80494)


In [10]:
# # convert from geopandas to pandas dataframe
# df_locations = pd.DataFrame(shapefile)

# # Set the index as the Fire Station Number
# df_locations.index = df_locations["NAME"].str.extract(r'(\d+)')[0].astype('int32')
# df_locations.index.name = "INDEX"
# df_locations

INDEX,NAME,ADDRESS,X,Y,LATITUDE,LONGITUDE,WARD_NAME,MUN_NAME,OBJECTID,geometry
214,FIRE STATION 214,745 MEADOWVALE RD,331855.873,4850304.607,43.794219,-79.163605,Scarborough East (44),Scarborough,1567567.0,POINT (-79.16360 43.79422)
215,FIRE STATION 215,5318 LAWRENCE AVE E,333114.022,4848441.177,43.777401,-79.148069,Scarborough East (44),Scarborough,2250005.0,POINT (-79.14807 43.77740)
221,FIRE STATION 221,2575 EGLINTON AVE E,324515.001,4843677.541,43.734799,-79.255066,Scarborough Southwest (35),Scarborough,2048861.0,POINT (-79.25507 43.73480)
222,FIRE STATION 222,755 WARDEN AVE,322180.852,4842072.337,43.720408,-79.284094,Scarborough Southwest (35),Scarborough,2449586.0,POINT (-79.28409 43.72041)
223,FIRE STATION 223,116 DORSET RD,326275.035,4842479.439,43.723965,-79.233264,Scarborough Southwest (36),Scarborough,2172861.0,POINT (-79.23326 43.72397)
...,...,...,...,...,...,...,...,...,...,...
145,FIRE STATION 145,20 BEFFORT RD,308003.882,4843645.466,43.734764,-79.460035,York Centre (9),North York,3186765.0,POINT (-79.46004 43.73476)
146,FIRE STATION 146,2220 JANE ST,303996.599,4842402.430,43.723581,-79.509779,York West (7),North York,3072600.0,POINT (-79.50978 43.72358)
211,FIRE STATION 211,900 TAPSCOTT RD,325466.636,4853590.517,43.824001,-79.242873,Scarborough-Rouge River (42),Scarborough,1520443.0,POINT (-79.24287 43.82400)
212,FIRE STATION 212,8500 SHEPPARD AVE E,329838.107,4851487.882,43.804941,-79.188623,Scarborough-Rouge River (42),Scarborough,2303629.0,POINT (-79.18862 43.80494)
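The index above comes from pulling the station number out of `NAME` with a regular expression. A minimal sketch of that step on three rows copied from the output, using a raw string so `\d` is not treated as a string escape and `expand=False` so a single capture group comes back as a Series:

```python
import pandas as pd

# Three rows copied from the shapefile output above
df_locations = pd.DataFrame({
    "NAME": ["FIRE STATION 214", "FIRE STATION 215", "FIRE STATION 145"],
    "ADDRESS": ["745 MEADOWVALE RD", "5318 LAWRENCE AVE E", "20 BEFFORT RD"],
})

# First run of digits in each name, converted to a 32-bit integer
station_numbers = (
    df_locations["NAME"].str.extract(r"(\d+)", expand=False).astype("int32")
)

# Use the station number as the index, named "INDEX" as in the notebook
df_locations.index = station_numbers
df_locations.index.name = "INDEX"
print(df_locations)
```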


In [11]:
# # get the CSV path
# SHAPEFILE_CSV_PATH = os.path.join(STATIONS_RAW_UNZIPPED_DIRECTORY,
#                                   "Toronto_Fire_Station_Locations.csv")

# # write the DataFrame to csv
# df_locations.to_csv(SHAPEFILE_CSV_PATH)

In [12]:
# # Not all the columns are needed
# # keep only the relevant ones
# df_locations = df_locations.loc[:,
#                                ["NAME", "ADDRESS", "LATITUDE",
#                                 "LONGITUDE", "WARD_NAME", "MUN_NAME"]]
# df_locations.head()

INDEX,NAME,ADDRESS,LATITUDE,LONGITUDE,WARD_NAME,MUN_NAME
214,FIRE STATION 214,745 MEADOWVALE RD,43.794219,-79.163605,Scarborough East (44),Scarborough
215,FIRE STATION 215,5318 LAWRENCE AVE E,43.777401,-79.148069,Scarborough East (44),Scarborough
221,FIRE STATION 221,2575 EGLINTON AVE E,43.734799,-79.255066,Scarborough Southwest (35),Scarborough
222,FIRE STATION 222,755 WARDEN AVE,43.720408,-79.284094,Scarborough Southwest (35),Scarborough
223,FIRE STATION 223,116 DORSET RD,43.723965,-79.233264,Scarborough Southwest (36),Scarborough


In [13]:
# # get the CSV path
# SHAPEFILE_CSV_PROCESSED_PATH = os.path.join(STATIONS_PROCESSED_UNZIPPED_DIRECTORY,
#                                   "Toronto_Fire_Station_Locations_PROCESSED.csv")

# # write the DataFrame to csv
# df_locations.to_csv(SHAPEFILE_CSV_PROCESSED_PATH)

## Read in the csv
- don't worry if you cannot do the geopandas part
- just use the included "Toronto_Fire_Station_Locations_PROCESSED.csv" file

In [14]:
# read in the processed .csv and use the "INDEX" column (Fire Station Number) as the index
df_locations = pd.read_csv(
    os.path.join(STATIONS_PROCESSED_UNZIPPED_DIRECTORY,
                 "Toronto_Fire_Station_Locations_PROCESSED.csv"),
    index_col="INDEX")
df_locations

INDEX,NAME,ADDRESS,LATITUDE,LONGITUDE,WARD_NAME,MUN_NAME
214,FIRE STATION 214,745 MEADOWVALE RD,43.794219,-79.163605,Scarborough East (44),Scarborough
215,FIRE STATION 215,5318 LAWRENCE AVE E,43.777401,-79.148069,Scarborough East (44),Scarborough
221,FIRE STATION 221,2575 EGLINTON AVE E,43.734799,-79.255066,Scarborough Southwest (35),Scarborough
222,FIRE STATION 222,755 WARDEN AVE,43.720408,-79.284094,Scarborough Southwest (35),Scarborough
223,FIRE STATION 223,116 DORSET RD,43.723965,-79.233264,Scarborough Southwest (36),Scarborough
...,...,...,...,...,...,...
145,FIRE STATION 145,20 BEFFORT RD,43.734764,-79.460035,York Centre (9),North York
146,FIRE STATION 146,2220 JANE ST,43.723581,-79.509779,York West (7),North York
211,FIRE STATION 211,900 TAPSCOTT RD,43.824001,-79.242873,Scarborough-Rouge River (42),Scarborough
212,FIRE STATION 212,8500 SHEPPARD AVE E,43.804941,-79.188623,Scarborough-Rouge River (42),Scarborough


In [15]:
# data types are appropriate
df_locations.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 84 entries, 214 to 213
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   NAME       84 non-null     object 
 1   ADDRESS    84 non-null     object 
 2   LATITUDE   84 non-null     float64
 3   LONGITUDE  84 non-null     float64
 4   WARD_NAME  84 non-null     object 
 5   MUN_NAME   84 non-null     object 
dtypes: float64(2), object(4)
memory usage: 4.6+ KB


In [16]:
# no column contains null values
df_locations.isnull().sum()

NAME         0
ADDRESS      0
LATITUDE     0
LONGITUDE    0
WARD_NAME    0
MUN_NAME     0
dtype: int64

In [17]:
# there are no duplicate station names either
df_locations["NAME"].duplicated().sum()

0

# 4
# Toronto Fire Incidents merged with Toronto Weather

# 4a
# Merge DataFrames

In [18]:
# read in the fire incident csv as a DataFrame
FIRE_PATH = os.path.join(FIRE_PROCESSED_UNZIPPED_DIRECTORY,
                         "2011-2018_Basic_Incident_Details.csv")
df_fire = pd.read_csv(FIRE_PATH, parse_dates=["DATETIME"])

# number of rows and columns
df_fire.shape

(975175, 16)

In [19]:
# preview the DataFrame
df_fire.head()

,Incident Number,Initial CAD Event Type,Initial CAD Event Call Type,Final Incident Type,Event Alarm Level,Call Source,Incident Station Area,Incident Ward,LATITUDE,Longitude,Intersection,TFS Alarm Time,TFS Arrival Time,Last TFS Unit Clear Time,Persons Rescued,DATETIME
0,F11000010,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,342.0,9,43.679099,-79.461761,Silverthorn Ave / Turnberry Ave,2011-01-01 00:03:43,2011-01-01 00:10:02,2011-01-01 00:31:18,0.0,2011-01-01 00:03:43
1,F11000011,Medical,Carbon Monoxide,89 - Other Medical,1,01 - 911,131.0,15,43.726342,-79.396401,Lawrence Ave E / Mount Pleasant Rd,2011-01-01 00:03:55,2011-01-01 00:09:02,2011-01-01 00:15:13,0.0,2011-01-01 00:03:55
2,F11000012,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,324.0,14,43.668548,-79.335324,Endean Ave / Jones Ave,2011-01-01 00:05:03,2011-01-01 00:09:34,2011-01-01 00:27:11,0.0,2011-01-01 00:05:03
3,F11000013,FIG - Fire - Grass/Rubbish,Emergency Fire,"03 - NO LOSS OUTDOOR fire (exc: Sus.arson,vand...",1,01 - 911,345.0,9,43.657123,-79.434313,Dufferin St / Dufferin Park Ave,2011-01-01 00:04:46,2011-01-01 00:10:46,2011-01-01 00:20:39,0.0,2011-01-01 00:04:46
4,F11000014,FAHR - Alarm Highrise Residential,Emergency Fire,"33 - Human - Malicious intent, prank",1,05 - Telephone from Monitoring Agency,142.0,7,43.75984,-79.516182,Driftwood Ave / Wilmont Dr,2011-01-01 00:06:07,2011-01-01 00:11:03,2011-01-01 00:21:11,0.0,2011-01-01 00:06:07


In [20]:
# get datatypes of each column
df_fire.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 975175 entries, 0 to 975174
Data columns (total 16 columns):
 #   Column                       Non-Null Count   Dtype         
---  ------                       --------------   -----         
 0   Incident Number              975175 non-null  object        
 1   Initial CAD Event Type       975175 non-null  object        
 2   Initial CAD Event Call Type  975175 non-null  object        
 3   Final Incident Type          975127 non-null  object        
 4   Event Alarm Level            975175 non-null  int64         
 5   Call Source                  975105 non-null  object        
 6   Incident Station Area        975113 non-null  float64       
 7   Incident Ward                975175 non-null  int64         
 8   LATITUDE                     975127 non-null  float64       
 9   Longitude                    975127 non-null  float64       
 10  Intersection                 975126 non-null  object        
 11  TFS Alarm Time            

In [21]:
# some null values, which I believe can be managed: most columns are missing fewer than 0.01% of values,
# but "TFS Arrival Time" is missing 23,480 values (about 2.4%)
df_fire.isnull().sum()

Incident Number                    0
Initial CAD Event Type             0
Initial CAD Event Call Type        0
Final Incident Type               48
Event Alarm Level                  0
Call Source                       70
Incident Station Area             62
Incident Ward                      0
LATITUDE                          48
Longitude                         48
Intersection                      49
TFS Alarm Time                     0
TFS Arrival Time               23480
Last TFS Unit Clear Time           4
Persons Rescued                   68
DATETIME                           0
dtype: int64
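To check the "how much is missing" claim, the counts above can be turned into percentages of the 975,175 rows. This sketch hard-codes the non-zero counts from the output so it runs on its own; on the real DataFrame, `df_fire.isnull().mean() * 100` gives the same percentages directly.

```python
import pandas as pd

TOTAL_ROWS = 975_175  # rows in df_fire, from df_fire.shape above

# Non-zero null counts copied from the output above
null_counts = pd.Series({
    "Final Incident Type": 48,
    "Call Source": 70,
    "Incident Station Area": 62,
    "LATITUDE": 48,
    "Longitude": 48,
    "Intersection": 49,
    "TFS Arrival Time": 23480,
    "Last TFS Unit Clear Time": 4,
    "Persons Rescued": 68,
})

# Percentage of rows missing each column, largest first
null_pct = (null_counts / TOTAL_ROWS * 100).sort_values(ascending=False)
print(null_pct.round(4))
```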

In [22]:
# remove all rows with a null "LATITUDE" or "Longitude" through boolean slicing
df_fire = df_fire.loc[df_fire["LATITUDE"].notnull() & df_fire["Longitude"].notnull()]
df_fire.shape

(975127, 16)
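The boolean mask above keeps only rows where both coordinates are present. A minimal sketch on a toy DataFrame (made-up incident numbers) showing that `dropna` restricted to the coordinate columns selects the same rows:

```python
import numpy as np
import pandas as pd

# Toy incidents: F2 has no latitude, F3 has no longitude
df_fire = pd.DataFrame({
    "Incident Number": ["F1", "F2", "F3", "F4"],
    "LATITUDE": [43.679099, np.nan, 43.668548, 43.657123],
    "Longitude": [-79.461761, -79.396401, np.nan, -79.434313],
})

# The boolean-mask approach used in the notebook
masked = df_fire.loc[df_fire["LATITUDE"].notnull() & df_fire["Longitude"].notnull()]

# dropna limited to the coordinate columns; how="any" (the default)
# drops a row if either coordinate is missing
dropped = df_fire.dropna(subset=["LATITUDE", "Longitude"])

print(masked.equals(dropped))
```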

In [23]:
# 14 rows share an Incident Number with another row (7 incidents recorded twice)
df_fire["Incident Number"].duplicated(keep=False).sum()

14

In [24]:
# Check the rows with duplicate "Incident Number" values
# The same event appears to be recorded twice, usually with a different "Final Incident Type"
# (or "Call Source"), which may be a data-entry error
df_fire.loc[df_fire["Incident Number"].duplicated(keep=False)]

,Incident Number,Initial CAD Event Type,Initial CAD Event Call Type,Final Incident Type,Event Alarm Level,Call Source,Incident Station Area,Incident Ward,LATITUDE,Longitude,Intersection,TFS Alarm Time,TFS Arrival Time,Last TFS Unit Clear Time,Persons Rescued,DATETIME
358649,F13096124,FAHR - Alarm Highrise Residential,Emergency Fire,34 - Human - Perceived Emergency,0,05 - Telephone from Monitoring Agency,442.0,5,43.707785,-79.533627,Weston Rd / Oak St,2013-11-23 17:47:14,2013-11-23 17:52:58,2013-11-23 19:41:10,0.0,2013-11-23 17:47:14
358650,F13096124,FAHR - Alarm Highrise Residential,Emergency Fire,99 - Other Response,0,05 - Telephone from Monitoring Agency,442.0,5,43.707785,-79.533627,Weston Rd / Oak St,2013-11-23 17:47:14,2013-11-23 17:52:58,2013-11-23 19:41:10,0.0,2013-11-23 17:47:14
359742,F13097259,CCNE - Check Call - Non Emergency,Non Emergency,94 - Other Public Service,0,09 - Other Alarm,134.0,15,43.705077,-79.384773,Manor Rd E / Forman Ave,2013-11-27 13:56:18,2013-11-27 14:12:20,2013-11-28 15:06:38,0.0,2013-11-27 13:56:18
359743,F13097259,CCNE - Check Call - Non Emergency,Non Emergency,99 - Other Response,0,09 - Other Alarm,134.0,15,43.705077,-79.384773,Manor Rd E / Forman Ave,2013-11-27 13:56:18,2013-11-27 14:12:20,2013-11-28 15:06:38,0.0,2013-11-27 13:56:18
391077,F14016411,FICI - Fire - Commercial/Industrial,Emergency Fire,99 - Other Response,0,09 - Other Alarm,411.0,7,43.760025,-79.532077,HWY 400 / 400 S Finch Ramp,2014-02-18 14:40:34,2014-02-18 14:46:23,2014-02-19 08:30:41,0.0,2014-02-18 14:40:34
391078,F14016411,FICI - Fire - Commercial/Industrial,Emergency Fire,"03 - NO LOSS OUTDOOR fire (exc: Sus.arson,vand...",0,02 - Telephone from Civlian (other than 911),411.0,7,43.760025,-79.532077,HWY 400 / 400 S Finch Ramp,2014-02-18 14:40:34,2014-02-18 14:46:23,2014-02-19 08:30:41,0.0,2014-02-18 14:40:34
401808,F14027496,CCNE - Check Call - Non Emergency,Non Emergency,94 - Other Public Service,0,02 - Telephone from Civlian (other than 911),223.0,21,43.741609,-79.23261,Cedar Brae Blvd / Bellamy Rd N / Trudelle St,2014-03-28 10:32:24,2014-03-28 10:38:55,2014-03-28 13:44:01,0.0,2014-03-28 10:32:24
401809,F14027496,CCNE - Check Call - Non Emergency,Non Emergency,99 - Other Response,0,02 - Telephone from Civlian (other than 911),223.0,21,43.741609,-79.23261,Cedar Brae Blvd / Bellamy Rd N / Trudelle St,2014-03-28 10:32:24,2014-03-28 10:38:55,2014-03-28 13:44:01,0.0,2014-03-28 10:32:24
459379,F14087165,Medical,Medical,89 - Other Medical,0,04 - From Police Services,146.0,6,43.724894,-79.509607,Jane St / Chalkfarm Dr / Heathrow Dr,2014-10-04 09:59:52,2014-10-04 10:01:14,2014-10-04 14:33:26,0.0,2014-10-04 09:59:52
459380,F14087165,Medical,Medical,89 - Other Medical,0,11 - No alarm received - incident discovered b...,146.0,6,43.724894,-79.509607,Jane St / Chalkfarm Dr / Heathrow Dr,2014-10-04 09:59:52,2014-10-04 10:01:14,2014-10-04 14:33:26,0.0,2014-10-04 09:59:52
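The difference between the default `keep="first"` and `keep=False` decides whether one copy of a duplicated incident survives. A small sketch on a toy Series, with F13096124 repeated as in the output above:

```python
import pandas as pd

# Toy column: F13096124 is recorded twice
incidents = pd.Series(
    ["F13096124", "F13096124", "F11000010", "F11000011"],
    name="Incident Number",
)

# keep="first" (default) flags only the second copy; keep=False flags every copy
print(incidents.duplicated().tolist())            # [False, True, False, False]
print(incidents.duplicated(keep=False).tolist())  # [True, True, False, False]

# Negating the keep=False mask removes both copies, as the notebook does
unique_only = incidents[~incidents.duplicated(keep=False)]
print(unique_only.is_unique)  # True
```

For a whole DataFrame, `df_fire.drop_duplicates(subset=["Incident Number"], keep=False)` expresses the same filter in one call.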


In [25]:
# drop every row whose "Incident Number" is duplicated (both copies) through slicing
# ~ is the Boolean NOT operator
df_fire = df_fire.loc[~df_fire["Incident Number"].duplicated(keep=False)]
df_fire.head()

,Incident Number,Initial CAD Event Type,Initial CAD Event Call Type,Final Incident Type,Event Alarm Level,Call Source,Incident Station Area,Incident Ward,LATITUDE,Longitude,Intersection,TFS Alarm Time,TFS Arrival Time,Last TFS Unit Clear Time,Persons Rescued,DATETIME
0,F11000010,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,342.0,9,43.679099,-79.461761,Silverthorn Ave / Turnberry Ave,2011-01-01 00:03:43,2011-01-01 00:10:02,2011-01-01 00:31:18,0.0,2011-01-01 00:03:43
1,F11000011,Medical,Carbon Monoxide,89 - Other Medical,1,01 - 911,131.0,15,43.726342,-79.396401,Lawrence Ave E / Mount Pleasant Rd,2011-01-01 00:03:55,2011-01-01 00:09:02,2011-01-01 00:15:13,0.0,2011-01-01 00:03:55
2,F11000012,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,324.0,14,43.668548,-79.335324,Endean Ave / Jones Ave,2011-01-01 00:05:03,2011-01-01 00:09:34,2011-01-01 00:27:11,0.0,2011-01-01 00:05:03
3,F11000013,FIG - Fire - Grass/Rubbish,Emergency Fire,"03 - NO LOSS OUTDOOR fire (exc: Sus.arson,vand...",1,01 - 911,345.0,9,43.657123,-79.434313,Dufferin St / Dufferin Park Ave,2011-01-01 00:04:46,2011-01-01 00:10:46,2011-01-01 00:20:39,0.0,2011-01-01 00:04:46
4,F11000014,FAHR - Alarm Highrise Residential,Emergency Fire,"33 - Human - Malicious intent, prank",1,05 - Telephone from Monitoring Agency,142.0,7,43.75984,-79.516182,Driftwood Ave / Wilmont Dr,2011-01-01 00:06:07,2011-01-01 00:11:03,2011-01-01 00:21:11,0.0,2011-01-01 00:06:07


In [26]:
# check to see if the "Incident Number" is unique for each row
len(df_fire["Incident Number"].unique()) == len(df_fire)

True

In [27]:
# set the "Incident Number" as the index, since it is unique for each row
df_fire = df_fire.set_index("Incident Number")  # also drops the column
df_fire.index.name = "INCIDENT_NUM"
df_fire.head()

INCIDENT_NUM,Initial CAD Event Type,Initial CAD Event Call Type,Final Incident Type,Event Alarm Level,Call Source,Incident Station Area,Incident Ward,LATITUDE,Longitude,Intersection,TFS Alarm Time,TFS Arrival Time,Last TFS Unit Clear Time,Persons Rescued,DATETIME
F11000010,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,342.0,9,43.679099,-79.461761,Silverthorn Ave / Turnberry Ave,2011-01-01 00:03:43,2011-01-01 00:10:02,2011-01-01 00:31:18,0.0,2011-01-01 00:03:43
F11000011,Medical,Carbon Monoxide,89 - Other Medical,1,01 - 911,131.0,15,43.726342,-79.396401,Lawrence Ave E / Mount Pleasant Rd,2011-01-01 00:03:55,2011-01-01 00:09:02,2011-01-01 00:15:13,0.0,2011-01-01 00:03:55
F11000012,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,324.0,14,43.668548,-79.335324,Endean Ave / Jones Ave,2011-01-01 00:05:03,2011-01-01 00:09:34,2011-01-01 00:27:11,0.0,2011-01-01 00:05:03
F11000013,FIG - Fire - Grass/Rubbish,Emergency Fire,"03 - NO LOSS OUTDOOR fire (exc: Sus.arson,vand...",1,01 - 911,345.0,9,43.657123,-79.434313,Dufferin St / Dufferin Park Ave,2011-01-01 00:04:46,2011-01-01 00:10:46,2011-01-01 00:20:39,0.0,2011-01-01 00:04:46
F11000014,FAHR - Alarm Highrise Residential,Emergency Fire,"33 - Human - Malicious intent, prank",1,05 - Telephone from Monitoring Agency,142.0,7,43.75984,-79.516182,Driftwood Ave / Wilmont Dr,2011-01-01 00:06:07,2011-01-01 00:11:03,2011-01-01 00:21:11,0.0,2011-01-01 00:06:07


In [28]:
# create "lat_long" column that is a tuple of latitude and longitude
df_fire["lat_long"] = list(zip(df_fire["LATITUDE"], df_fire["Longitude"]))
df_fire.head()

INCIDENT_NUM,Initial CAD Event Type,Initial CAD Event Call Type,Final Incident Type,Event Alarm Level,Call Source,Incident Station Area,Incident Ward,LATITUDE,Longitude,Intersection,TFS Alarm Time,TFS Arrival Time,Last TFS Unit Clear Time,Persons Rescued,DATETIME,lat_long
F11000010,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,342.0,9,43.679099,-79.461761,Silverthorn Ave / Turnberry Ave,2011-01-01 00:03:43,2011-01-01 00:10:02,2011-01-01 00:31:18,0.0,2011-01-01 00:03:43,"(43.679098579, -79.46176085399999)"
F11000011,Medical,Carbon Monoxide,89 - Other Medical,1,01 - 911,131.0,15,43.726342,-79.396401,Lawrence Ave E / Mount Pleasant Rd,2011-01-01 00:03:55,2011-01-01 00:09:02,2011-01-01 00:15:13,0.0,2011-01-01 00:03:55,"(43.726342492, -79.396400551)"
F11000012,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,324.0,14,43.668548,-79.335324,Endean Ave / Jones Ave,2011-01-01 00:05:03,2011-01-01 00:09:34,2011-01-01 00:27:11,0.0,2011-01-01 00:05:03,"(43.66854833, -79.335324059)"
F11000013,FIG - Fire - Grass/Rubbish,Emergency Fire,"03 - NO LOSS OUTDOOR fire (exc: Sus.arson,vand...",1,01 - 911,345.0,9,43.657123,-79.434313,Dufferin St / Dufferin Park Ave,2011-01-01 00:04:46,2011-01-01 00:10:46,2011-01-01 00:20:39,0.0,2011-01-01 00:04:46,"(43.6571226, -79.43431345399999)"
F11000014,FAHR - Alarm Highrise Residential,Emergency Fire,"33 - Human - Malicious intent, prank",1,05 - Telephone from Monitoring Agency,142.0,7,43.75984,-79.516182,Driftwood Ave / Wilmont Dr,2011-01-01 00:06:07,2011-01-01 00:11:03,2011-01-01 00:21:11,0.0,2011-01-01 00:06:07,"(43.759840161999996, -79.516182424)"


In [29]:
# write the cleaned DataFrame to a .csv file
MERGED_FILE_PATH = os.path.join(FIRE_PROCESSED_UNZIPPED_DIRECTORY,
                               "2011-2018_Toronto_Fire_Incidents.csv")

df_fire.to_csv(MERGED_FILE_PATH)

# display 10 random rows
df_fire.sample(10)

INCIDENT_NUM,Initial CAD Event Type,Initial CAD Event Call Type,Final Incident Type,Event Alarm Level,Call Source,Incident Station Area,Incident Ward,LATITUDE,Longitude,Intersection,TFS Alarm Time,TFS Arrival Time,Last TFS Unit Clear Time,Persons Rescued,DATETIME,lat_long
F17009060,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,313.0,14,43.676202,-79.358873,Danforth Ave / Broadview Ave,2017-01-28 09:48:11,2017-01-28 09:52:14,2017-01-28 09:54:12,0.0,2017-01-28 09:48:11,"(43.676202331999995, -79.358872563)"
F12022953,FIHR - Fire - Highrise Residential,Emergency Fire,"33 - Human - Malicious intent, prank",0,05 - Telephone from Monitoring Agency,224.0,19,43.69513,-79.303481,Lumsden Ave / Barrington Ave,2012-03-01 21:46:21,2012-03-01 21:50:35,2012-03-01 22:16:07,0.0,2012-03-01 21:46:21,"(43.695129656000006, -79.303480729)"
F18077223,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,114.0,18,0.0,0.0,M2N,2018-07-21 22:14:50,,2018-07-21 22:20:29,0.0,2018-07-21 22:14:50,"(0.0, 0.0)"
F12075958,FAR - Alarm Residential,Emergency Fire,31 - Alarm Equipment - Malfunction,0,05 - Telephone from Monitoring Agency,226.0,19,43.692588,-79.293723,Dentonia Park Ave / Sibley Ave / Dentonia Park...,2012-07-17 06:46:13,2012-07-17 06:51:39,2012-07-17 07:01:51,0.0,2012-07-17 06:46:13,"(43.692587671999995, -79.293722781)"
F13005912,FACI - Alarm Commercial/Industrial,Emergency Fire,32 - Alarm System Equipment - Accidental activ...,0,01 - 911,441.0,1,43.69379,-79.571919,Martin Grove Rd / Ronson Dr,2013-01-21 16:21:24,2013-01-21 16:24:40,2013-01-21 16:31:36,0.0,2013-01-21 16:21:24,"(43.69379026, -79.57191926899999)"
F13044516,FIG - Fire - Grass/Rubbish,Emergency Fire,23 - Open air burning/unauthorized controlled ...,0,02 - Telephone from Civlian (other than 911),224.0,19,43.684309,-79.312213,Woodbine Ave / Mendel Ave,2013-06-08 15:41:06,2013-06-08 15:44:53,2013-06-08 15:47:54,0.0,2013-06-08 15:41:06,"(43.684309494, -79.31221331099998)"
F12080544,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,125.0,16,43.71836,-79.331406,Deauville Lane / St Dennis Dr,2012-08-01 04:17:08,2012-08-01 04:24:37,2012-08-01 04:49:47,0.0,2012-08-01 04:17:08,"(43.718360231999995, -79.331406395)"
F13034264,CC - Check Call,Other Emergency Events,99 - Other Response,0,02 - Telephone from Civlian (other than 911),425.0,4,43.653412,-79.465171,West Rd / Colborne Lodge Dr / High Park Trl,2013-05-05 23:35:31,2013-05-05 23:50:25,2013-05-05 23:59:38,0.0,2013-05-05 23:35:31,"(43.653412418, -79.46517054899999)"
F12120049,TEMS - TEMS TRANSFERRED - READ REMARKS,Medical,62 - Vehicle Collision,0,03 - From Ambulance,232.0,21,43.772142,-79.251498,Ellesmere Rd / McCowan Rd,2012-12-14 13:49:45,2012-12-14 13:55:10,2012-12-14 14:03:05,0.0,2012-12-14 13:49:45,"(43.772142024, -79.25149843199999)"
F13017273,FAR - Alarm Residential,Emergency Fire,32 - Alarm System Equipment - Accidental activ...,0,05 - Telephone from Monitoring Agency,122.0,15,43.751671,-79.392722,Fenn Ave / Danville Dr,2013-03-02 16:13:03,2013-03-02 16:17:32,2013-03-02 16:20:10,0.0,2013-03-02 16:13:03,"(43.751671386000005, -79.39272233)"


In [30]:
# read the csv file back in; ast.literal_eval turns the "lat_long" strings back into tuples
# without executing arbitrary code (unlike eval)
import ast

df_fire = pd.read_csv(
    os.path.join(FIRE_PROCESSED_UNZIPPED_DIRECTORY,
                 "2011-2018_Toronto_Fire_Incidents.csv"),
    index_col="INCIDENT_NUM",
    converters={"lat_long": ast.literal_eval},
    parse_dates=["DATETIME"])
df_fire.head()

INCIDENT_NUM,Initial CAD Event Type,Initial CAD Event Call Type,Final Incident Type,Event Alarm Level,Call Source,Incident Station Area,Incident Ward,LATITUDE,Longitude,Intersection,TFS Alarm Time,TFS Arrival Time,Last TFS Unit Clear Time,Persons Rescued,DATETIME,lat_long
F11000010,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,342.0,9,43.679099,-79.461761,Silverthorn Ave / Turnberry Ave,2011-01-01 00:03:43,2011-01-01 00:10:02,2011-01-01 00:31:18,0.0,2011-01-01 00:03:43,"(43.679098579, -79.46176085399999)"
F11000011,Medical,Carbon Monoxide,89 - Other Medical,1,01 - 911,131.0,15,43.726342,-79.396401,Lawrence Ave E / Mount Pleasant Rd,2011-01-01 00:03:55,2011-01-01 00:09:02,2011-01-01 00:15:13,0.0,2011-01-01 00:03:55,"(43.726342492, -79.396400551)"
F11000012,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,324.0,14,43.668548,-79.335324,Endean Ave / Jones Ave,2011-01-01 00:05:03,2011-01-01 00:09:34,2011-01-01 00:27:11,0.0,2011-01-01 00:05:03,"(43.66854833, -79.335324059)"
F11000013,FIG - Fire - Grass/Rubbish,Emergency Fire,"03 - NO LOSS OUTDOOR fire (exc: Sus.arson,vand...",1,01 - 911,345.0,9,43.657123,-79.434313,Dufferin St / Dufferin Park Ave,2011-01-01 00:04:46,2011-01-01 00:10:46,2011-01-01 00:20:39,0.0,2011-01-01 00:04:46,"(43.6571226, -79.43431345399999)"
F11000014,FAHR - Alarm Highrise Residential,Emergency Fire,"33 - Human - Malicious intent, prank",1,05 - Telephone from Monitoring Agency,142.0,7,43.75984,-79.516182,Driftwood Ave / Wilmont Dr,2011-01-01 00:06:07,2011-01-01 00:11:03,2011-01-01 00:21:11,0.0,2011-01-01 00:06:07,"(43.759840161999996, -79.516182424)"


# 5
# Haversine Formula
- Determines the great-circle distance between 2 points on a sphere - [link](https://en.wikipedia.org/wiki/Haversine_formula)
- Needed to relate the locations of Toronto Fire Services Stations and Toronto Fire Hydrants to all incidents
    - determine the closest Stations and Fire Hydrants to each incident
    - possibly identify overused Stations / Fire Hydrants that require additional resources
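For reference, this is the standard haversine formula that both functions below implement, with latitudes $\varphi_1, \varphi_2$ and longitudes $\lambda_1, \lambda_2$ converted to radians and $R$ the sphere's radius (6,371,000 m by default in the code):

$$
\begin{aligned}
a &= \sin^2\left(\frac{\varphi_1 - \varphi_2}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\left(\frac{\lambda_1 - \lambda_2}{2}\right) \\
d &= 2R \arcsin\left(\sqrt{a}\right)
\end{aligned}
$$

For two antipodal points such as the South and North Poles, $a = 1$ and $d = \pi R$, which is why dividing the default result by the radius returns $\pi$.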

# 5a
# Python Formula

In [31]:
# Python Haversine formula
# Adapted from
# "https://stackoverflow.com/questions/4913349/haversine-formula-in-python-bearing-and-distance-between-two-gps-points"
import math

def haversine(latit_1=-90, latit_2=90, longit_1=0, longit_2=180, radius=6371000):
    """
    Determine the great-circle distance between two points on a sphere
    (Earth's mean radius by default) given their latitudes and longitudes.
    Note the argument order: both latitudes first, then both longitudes.
    Latitudes and longitudes are in decimal degrees.
    Returns the distance in metres (m).
    """
    # convert decimal degrees to radians
    longit_1 = math.radians(longit_1)
    latit_1 = math.radians(latit_1)
    longit_2 = math.radians(longit_2)
    latit_2 = math.radians(latit_2)
    
    # calculate the difference between latitudes and longitudes
    delta_latitude = latit_1 - latit_2
    delta_longitude = longit_1 - longit_2
    
    # great circle distance
    a = math.sin(delta_latitude / 2)**2 + math.cos(latit_1) * math.cos(latit_2) * math.sin(delta_longitude / 2)**2
    c = 2 * math.asin(math.sqrt(a))
    
    # return the great circle distance for Earth
    return radius * c

# sanity check: the default arguments are the South and North Poles, which are
# half a circumference apart, so distance / radius should equal pi
haversine()/6371000

3.141592653589793

In [32]:
# Python Haversine formula for numpy arrays
# An attempt to reduce the computation time (~1000 minutes for ~1 million rows):
# working on whole numpy arrays should make it roughly 10-30 times faster
def np_haversine(latit_1=-90, latit_2=90, longit_1=0, longit_2=180, radius=6371000):
    """
    Calculate the great-circle distance between 2 points on a sphere (Earth is the default).
    Positions are in decimal degrees; arguments may be scalars or numpy arrays,
    which are broadcast against each other.
    Function was rewritten to be calculated using Numpy methods.
    Returns value in metres (m).
    """
    # convert decimal degrees to radians
    latit_1 = np.radians(latit_1)
    latit_2 = np.radians(latit_2)
    longit_1 = np.radians(longit_1)
    longit_2 = np.radians(longit_2)
    
    # calculate the difference between latitudes and longitudes
    delta_latitude = latit_1 - latit_2
    delta_longitude = longit_1 - longit_2
    
    # great circle distance
    a = np.sin(delta_latitude / 2)**2 + np.cos(latit_1) * np.cos(latit_2) * np.sin(delta_longitude / 2)**2
    c = 2 * np.arcsin(np.sqrt(a))
    
    # return the distance
    return radius * c

np_haversine()/6371000

3.141592653589793
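A small self-contained check of the broadcasting behaviour that `np_haversine` relies on. The function is re-declared compactly here (same formula and argument order) so the snippet runs on its own; the station coordinates are those of stations 214, 215 and 221 from `df_station_locations` in section 5b, and the incident is the Fenn Ave / Danville Dr call shown above. It is a sketch for checking the idea, not a replacement for the notebook's function.

```python
import numpy as np

def np_haversine(latit_1, latit_2, longit_1, longit_2, radius=6371000):
    """Vectorised haversine distance in metres (same argument order as the notebook)."""
    latit_1, latit_2, longit_1, longit_2 = map(np.radians, (latit_1, latit_2, longit_1, longit_2))
    a = (np.sin((latit_1 - latit_2) / 2) ** 2
         + np.cos(latit_1) * np.cos(latit_2) * np.sin((longit_1 - longit_2) / 2) ** 2)
    return 2 * radius * np.arcsin(np.sqrt(a))

# 1 degree of latitude along a meridian should be R * pi / 180 (about 111,195 m)
one_degree = np_haversine(43.0, 44.0, -79.4, -79.4)

# broadcasting: one incident against several stations in a single call
station_lat = np.array([43.794219, 43.777401, 43.734799])    # stations 214, 215, 221
station_lon = np.array([-79.163605, -79.148069, -79.255066])
incident = (43.751671, -79.392722)                            # Fenn Ave / Danville Dr
distances = np_haversine(station_lat, incident[0], station_lon, incident[1])
closest = int(np.argmin(distances))                           # -> 2 (station 221)
print(one_degree, distances.shape, closest)
```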

# 5b
# Closest Toronto Fire Services Station

In [33]:
# get LATITUDE and LONGITUDE of each fire station
df_station_locations = df_locations.loc[:, ["LATITUDE", "LONGITUDE"]]
df_station_locations.head()

INDEX,LATITUDE,LONGITUDE
214,43.794219,-79.163605
215,43.777401,-79.148069
221,43.734799,-79.255066
222,43.720408,-79.284094
223,43.723965,-79.233264


In [34]:
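# for each incident, replace its lat_long tuple with the index value
# corresponding to the Closest Fire Station
def Closest_Point(x):
    """
    Return the index (station number) of the fire station in
    df_station_locations that is closest to the (latitude, longitude) tuple x.
    Missing coordinates return NaN.
    """
    # NaN never compares equal to itself, so `x == np.nan` is always False;
    # test explicitly for a missing value instead
    if not isinstance(x, tuple) or any(pd.isna(v) for v in x):
        return np.nan

    # NOTE: incidents recorded at (0.0, 0.0) are placeholders for missing coordinates,
    # yet they are still assigned a "closest" station (215 in the output below)

    # df_station_locations is read from the global scope (no `global` statement
    # is needed just to read it); broadcasting gives the distance to every station
    distances = np_haversine(
        df_station_locations["LATITUDE"].values,
        x[0],
        df_station_locations["LONGITUDE"].values,
        x[1],
    )

    # return the index (station number) of the minimum distance
    return df_station_locations.index[np.argmin(distances)]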

## The following code will take ~20 minutes to run
- It works much like an Excel VLOOKUP(): for every incident it looks up the closest fire station
- YOU HAVE BEEN WARNED

In [35]:
# apply the Closest_Point function to obtain the Fire Station closest to the Event
df_slice = df_fire.copy()
df_slice["FIRE_STATION_CLOSEST"] = df_slice["lat_long"].apply(Closest_Point)
df_slice

INCIDENT_NUM,Initial CAD Event Type,Initial CAD Event Call Type,Final Incident Type,Event Alarm Level,Call Source,Incident Station Area,Incident Ward,LATITUDE,Longitude,Intersection,TFS Alarm Time,TFS Arrival Time,Last TFS Unit Clear Time,Persons Rescued,DATETIME,lat_long,FIRE_STATION_CLOSEST
F11000010,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,342.0,9,43.679099,-79.461761,Silverthorn Ave / Turnberry Ave,2011-01-01 00:03:43,2011-01-01 00:10:02,2011-01-01 00:31:18,0.0,2011-01-01 00:03:43,"(43.679098579, -79.46176085399999)",342
F11000011,Medical,Carbon Monoxide,89 - Other Medical,1,01 - 911,131.0,15,43.726342,-79.396401,Lawrence Ave E / Mount Pleasant Rd,2011-01-01 00:03:55,2011-01-01 00:09:02,2011-01-01 00:15:13,0.0,2011-01-01 00:03:55,"(43.726342492, -79.396400551)",131
F11000012,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,324.0,14,43.668548,-79.335324,Endean Ave / Jones Ave,2011-01-01 00:05:03,2011-01-01 00:09:34,2011-01-01 00:27:11,0.0,2011-01-01 00:05:03,"(43.66854833, -79.335324059)",324
F11000013,FIG - Fire - Grass/Rubbish,Emergency Fire,"03 - NO LOSS OUTDOOR fire (exc: Sus.arson,vand...",1,01 - 911,345.0,9,43.657123,-79.434313,Dufferin St / Dufferin Park Ave,2011-01-01 00:04:46,2011-01-01 00:10:46,2011-01-01 00:20:39,0.0,2011-01-01 00:04:46,"(43.6571226, -79.43431345399999)",345
F11000014,FAHR - Alarm Highrise Residential,Emergency Fire,"33 - Human - Malicious intent, prank",1,05 - Telephone from Monitoring Agency,142.0,7,43.759840,-79.516182,Driftwood Ave / Wilmont Dr,2011-01-01 00:06:07,2011-01-01 00:11:03,2011-01-01 00:21:11,0.0,2011-01-01 00:06:07,"(43.759840161999996, -79.516182424)",142
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
F18139238,FAHRD - Alarm Highrise Residential Downtown,Emergency Fire,"33 - Human - Malicious intent, prank",0,05 - Telephone from Monitoring Agency,312.0,11,43.667162,-79.401539,Bloor St W / Huron St,2018-12-31 23:55:07,2018-12-31 23:59:33,2019-01-01 00:06:22,0.0,2018-12-31 23:55:07,"(43.667161863000004, -79.401538955)",344
F18139239,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,442.0,5,0.000000,0.000000,M6M,2018-12-31 23:56:38,2019-01-01 00:02:17,2019-01-01 00:04:57,0.0,2018-12-31 23:56:38,"(0.0, 0.0)",215
F18139240,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,322.0,14,0.000000,0.000000,M4J,2018-12-31 23:56:52,2019-01-01 00:03:32,2019-01-01 00:07:43,0.0,2018-12-31 23:56:52,"(0.0, 0.0)",215
F18139241,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,442.0,2,0.000000,0.000000,M9P,2018-12-31 23:59:30,2019-01-01 00:07:13,2019-01-01 00:10:05,0.0,2018-12-31 23:59:30,"(0.0, 0.0)",215
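The row-by-row `apply` above calls `np_haversine` once per incident. A sketch of one possible alternative, assuming the same haversine formula: process the incidents in chunks, let NumPy broadcast a (chunk × stations) distance matrix, and take `argmin` along each row. `closest_station_chunked` and `chunk_size` are hypothetical names chosen for this example; no timing is claimed, and memory use grows with `chunk_size` × number of stations.

```python
import numpy as np

def closest_station_chunked(incident_lat, incident_lon, station_lat, station_lon,
                            station_ids, chunk_size=50_000, radius=6371000):
    """
    Return the id of the nearest station (haversine distance) for every incident.
    Coordinates are in decimal degrees. Incidents are processed in chunks so the
    (chunk_size x n_stations) distance matrix stays small.
    """
    inc_lat = np.radians(np.asarray(incident_lat, dtype=float))
    inc_lon = np.radians(np.asarray(incident_lon, dtype=float))
    # stations as row vectors of shape (1, n_stations)
    st_lat = np.radians(np.asarray(station_lat, dtype=float))[np.newaxis, :]
    st_lon = np.radians(np.asarray(station_lon, dtype=float))[np.newaxis, :]
    station_ids = np.asarray(station_ids)

    result = np.empty(len(inc_lat), dtype=station_ids.dtype)
    for start in range(0, len(inc_lat), chunk_size):
        stop = start + chunk_size
        # incidents as column vectors of shape (chunk, 1)
        lat = inc_lat[start:stop, np.newaxis]
        lon = inc_lon[start:stop, np.newaxis]
        # (chunk, 1) broadcast against (1, n_stations) -> (chunk, n_stations)
        a = (np.sin((lat - st_lat) / 2) ** 2
             + np.cos(lat) * np.cos(st_lat) * np.sin((lon - st_lon) / 2) ** 2)
        distances = 2 * radius * np.arcsin(np.sqrt(a))
        result[start:stop] = station_ids[np.argmin(distances, axis=1)]
    return result

# small check with stations 214, 215 and 221 from df_station_locations:
# the first incident sits exactly on station 215, the second is the Fenn Ave call
ids = closest_station_chunked([43.777401, 43.751671], [-79.148069, -79.392722],
                              [43.794219, 43.777401, 43.734799],
                              [-79.163605, -79.148069, -79.255066],
                              [214, 215, 221], chunk_size=1)
print(ids)  # [215 221]

# hypothetical use in the notebook (not run here); assumes the LATITUDE /
# Longitude columns hold the same coordinates as the lat_long tuples:
# df_slice["FIRE_STATION_CLOSEST"] = closest_station_chunked(
#     df_slice["LATITUDE"].values, df_slice["Longitude"].values,
#     df_station_locations["LATITUDE"].values,
#     df_station_locations["LONGITUDE"].values,
#     df_station_locations.index.values)
```

Like `Closest_Point`, this sketch does no special handling of the (0.0, 0.0) placeholder coordinates.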


# 5c
# Closest Toronto Fire Hydrant
# NOT DONE AS IT WILL TAKE ~ 1000 hours to complete with my resources (~42 days)
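A possible way around the runtime problem, sketched under assumptions: no hydrant dataset is loaded in this notebook, so `ref_lat` / `ref_lon` below stand in for hypothetical hydrant coordinates, and `to_unit_vectors` / `nearest_by_dot_product` are illustrative names. On a sphere the central angle between unit vectors u and v is arccos(u·v), which shrinks as the dot product grows, so the nearest hydrant is the one with the largest dot product. That turns the search into chunked matrix multiplications. It is still a brute-force comparison of every incident with every hydrant; a spatial index such as a k-d tree built on the same 3-D vectors would avoid that. No runtime estimate is claimed.

```python
import numpy as np

def to_unit_vectors(lat_deg, lon_deg):
    """Convert latitudes/longitudes in decimal degrees to 3-D unit vectors of shape (n, 3)."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    return np.column_stack((np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)))

def nearest_by_dot_product(query_lat, query_lon, ref_lat, ref_lon,
                           chunk_size=10_000, radius=6371000):
    """
    For every query point return the index of the nearest reference point and
    the great-circle distance to it in metres. The nearest point is the one with
    the largest dot product between unit vectors (smallest central angle).
    """
    queries = to_unit_vectors(query_lat, query_lon)
    refs = to_unit_vectors(ref_lat, ref_lon)
    nearest = np.empty(len(queries), dtype=np.int64)
    for start in range(0, len(queries), chunk_size):
        # (chunk, 3) @ (3, n_refs) -> (chunk, n_refs) matrix of cosines
        dots = queries[start:start + chunk_size] @ refs.T
        nearest[start:start + chunk_size] = np.argmax(dots, axis=1)
    # central angle from the chord length |u - v| = 2 sin(angle / 2),
    # which stays accurate for very short distances (unlike arccos of the dot product)
    chord = np.linalg.norm(queries - refs[nearest], axis=1)
    angle = 2 * np.arcsin(np.clip(chord / 2, 0.0, 1.0))
    return nearest, radius * angle

# hypothetical "hydrant" coordinates and two incidents
idx, dist = nearest_by_dot_product([43.70, 44.80], [-79.40, -79.20],
                                   [43.70, 43.80, 0.0], [-79.40, -79.20, 0.0],
                                   chunk_size=1)
print(idx, dist)
```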

# 6
# Manipulations of Fire Incidents Dataset
- The final .csv file is not ready yet
    - some manipulations are still required

# 6a
# Create Minutes to Arrival and Minutes to Leave Columns
- How long it took Fire Services to arrive on location (TFS Arrival Time - TFS Alarm Time)
- How long after arriving the last unit cleared the location (Last TFS Unit Clear Time - TFS Arrival Time)

In [36]:
# get the number of minutes it took for Fire Services to arrive to the emergency
df_slice["MINUTES_ARRIVAL"] = np.around((pd.to_datetime(df_slice["TFS Arrival Time"]) -
               pd.to_datetime(df_slice["TFS Alarm Time"])) / np.timedelta64(1, "m"), decimals=3)

# get the number of minutes after arriving it takes for the Fire Services to leave
df_slice["MINUTES_LEAVE"] = np.around((pd.to_datetime(df_slice["Last TFS Unit Clear Time"]) -
               pd.to_datetime(df_slice["TFS Arrival Time"])) / np.timedelta64(1, "m"), decimals=3)
df_slice.head()

INCIDENT_NUM,Initial CAD Event Type,Initial CAD Event Call Type,Final Incident Type,Event Alarm Level,Call Source,Incident Station Area,Incident Ward,LATITUDE,Longitude,Intersection,TFS Alarm Time,TFS Arrival Time,Last TFS Unit Clear Time,Persons Rescued,DATETIME,lat_long,FIRE_STATION_CLOSEST,MINUTES_ARRIVAL,MINUTES_LEAVE
F11000010,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,342.0,9,43.679099,-79.461761,Silverthorn Ave / Turnberry Ave,2011-01-01 00:03:43,2011-01-01 00:10:02,2011-01-01 00:31:18,0.0,2011-01-01 00:03:43,"(43.679098579, -79.46176085399999)",342,6.317,21.267
F11000011,Medical,Carbon Monoxide,89 - Other Medical,1,01 - 911,131.0,15,43.726342,-79.396401,Lawrence Ave E / Mount Pleasant Rd,2011-01-01 00:03:55,2011-01-01 00:09:02,2011-01-01 00:15:13,0.0,2011-01-01 00:03:55,"(43.726342492, -79.396400551)",131,5.117,6.183
F11000012,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,324.0,14,43.668548,-79.335324,Endean Ave / Jones Ave,2011-01-01 00:05:03,2011-01-01 00:09:34,2011-01-01 00:27:11,0.0,2011-01-01 00:05:03,"(43.66854833, -79.335324059)",324,4.517,17.617
F11000013,FIG - Fire - Grass/Rubbish,Emergency Fire,"03 - NO LOSS OUTDOOR fire (exc: Sus.arson,vand...",1,01 - 911,345.0,9,43.657123,-79.434313,Dufferin St / Dufferin Park Ave,2011-01-01 00:04:46,2011-01-01 00:10:46,2011-01-01 00:20:39,0.0,2011-01-01 00:04:46,"(43.6571226, -79.43431345399999)",345,6.0,9.883
F11000014,FAHR - Alarm Highrise Residential,Emergency Fire,"33 - Human - Malicious intent, prank",1,05 - Telephone from Monitoring Agency,142.0,7,43.75984,-79.516182,Driftwood Ave / Wilmont Dr,2011-01-01 00:06:07,2011-01-01 00:11:03,2011-01-01 00:21:11,0.0,2011-01-01 00:06:07,"(43.759840161999996, -79.516182424)",142,4.933,10.133
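`df_slice.info()` in section 6b reports fewer non-null values for MINUTES_ARRIVAL (951654) and MINUTES_LEAVE (951650) than rows (975113), presumably because some incidents lack an arrival or clear time. The sketch below uses two hypothetical incidents (the first reuses the timestamps of F11000010). It shows that a missing timestamp becomes `NaT` and the resulting duration `NaN` rather than raising an error, and it counts negative durations as a basic consistency check. Dividing by `pd.Timedelta(minutes=1)` is equivalent to the notebook's division by `np.timedelta64(1, "m")`.

```python
import pandas as pd

# two hypothetical incidents: the first copies F11000010's timestamps,
# the second has no recorded arrival time
times = pd.DataFrame({
    "TFS Alarm Time": ["2011-01-01 00:03:43", "2011-01-01 00:05:00"],
    "TFS Arrival Time": ["2011-01-01 00:10:02", None],
    "Last TFS Unit Clear Time": ["2011-01-01 00:31:18", "2011-01-01 00:20:00"],
})

alarm = pd.to_datetime(times["TFS Alarm Time"])
arrival = pd.to_datetime(times["TFS Arrival Time"])
clear = pd.to_datetime(times["Last TFS Unit Clear Time"])

# dividing a timedelta Series by a one-minute Timedelta gives float minutes;
# a missing timestamp (NaT) propagates to NaN instead of raising
minutes_arrival = ((arrival - alarm) / pd.Timedelta(minutes=1)).round(3)
minutes_leave = ((clear - arrival) / pd.Timedelta(minutes=1)).round(3)

# negative durations would point to inconsistent timestamps
n_negative = int((minutes_arrival < 0).sum() + (minutes_leave < 0).sum())
print(minutes_arrival.tolist(), minutes_leave.tolist(), n_negative)
```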


# 6b
# Rename and Drop Columns and Change Datatypes
- some data cleaning before writing to .csv file

In [37]:
# get datatypes of columns
df_slice.info()

<class 'pandas.core.frame.DataFrame'>
Index: 975113 entries, F11000010 to F18139242
Data columns (total 19 columns):
 #   Column                       Non-Null Count   Dtype         
---  ------                       --------------   -----         
 0   Initial CAD Event Type       975113 non-null  object        
 1   Initial CAD Event Call Type  975113 non-null  object        
 2   Final Incident Type          975065 non-null  object        
 3   Event Alarm Level            975113 non-null  int64         
 4   Call Source                  975043 non-null  object        
 5   Incident Station Area        975052 non-null  float64       
 6   Incident Ward                975113 non-null  int64         
 7   LATITUDE                     975113 non-null  float64       
 8   Longitude                    975113 non-null  float64       
 9   Intersection                 975112 non-null  object        
 10  TFS Alarm Time               975113 non-null  object        
 11  TFS Arrival Time    

In [38]:
# There are some null values in "Incident Station Area" that one can infer from
# "FIRE_STATION_CLOSEST". Assign with a single .loc[row_mask, column] call so the
# DataFrame itself is updated; chained indexing (df[col].loc[mask] = ...) raises a
# SettingWithCopyWarning and is not guaranteed to modify df_slice
missing_station = df_slice["Incident Station Area"].isnull()
df_slice.loc[missing_station, "Incident Station Area"] = \
    df_slice.loc[missing_station, "FIRE_STATION_CLOSEST"]
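An equivalent way to fill the missing station areas, sketched on a hypothetical three-row frame: `Series.fillna` accepts another Series and aligns it on the index, so no boolean mask is needed. The cast to `int64` at the end mirrors In [42] and only works once no NaN values remain.

```python
import numpy as np
import pandas as pd

# hypothetical miniature of df_slice: one incident is missing its station area
df = pd.DataFrame(
    {"Incident Station Area": [342.0, np.nan, 324.0],
     "FIRE_STATION_CLOSEST": [342, 131, 324]},
    index=pd.Index(["F11000010", "F11000011", "F11000012"], name="INCIDENT_NUM"),
)

# fillna with a Series aligns on the index, so each missing value is replaced
# by the FIRE_STATION_CLOSEST value of the same incident
df["Incident Station Area"] = df["Incident Station Area"].fillna(df["FIRE_STATION_CLOSEST"])

# with no missing values left, the cast to int64 is safe
df["Incident Station Area"] = df["Incident Station Area"].astype("int64")
print(df)
```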


In [39]:
# rename some columns in pandas
df_slice = df_slice.rename(columns={
    "Incident Station Area" : "FIRE_STATION",
    "Longitude" : "LONGITUDE",
    "Initial CAD Event Type" : "CAD_TYPE",
    "Initial CAD Event Call Type" : "CAD_CALL_TYPE",
    "Final Incident Type" : "FINAL_TYPE",
    "Event Alarm Level" : "ALARM_LEVEL",
    "Call Source" : "CALL_SOURCE",
    "Persons Rescued" : "PERSONS_RESCUED"
    })

In [40]:
# Drop columns that are not needed in the final dataset
df_slice = df_slice.drop(columns=["Intersection", "TFS Alarm Time", "TFS Arrival Time",
                                 "Last TFS Unit Clear Time", "lat_long", "Incident Ward"])

In [41]:
# let's look at our changes
df_slice.info()

<class 'pandas.core.frame.DataFrame'>
Index: 975113 entries, F11000010 to F18139242
Data columns (total 13 columns):
 #   Column                Non-Null Count   Dtype         
---  ------                --------------   -----         
 0   CAD_TYPE              975113 non-null  object        
 1   CAD_CALL_TYPE         975113 non-null  object        
 2   FINAL_TYPE            975065 non-null  object        
 3   ALARM_LEVEL           975113 non-null  int64         
 4   CALL_SOURCE           975043 non-null  object        
 5   FIRE_STATION          975113 non-null  float64       
 6   LATITUDE              975113 non-null  float64       
 7   LONGITUDE             975113 non-null  float64       
 8   PERSONS_RESCUED       975045 non-null  float64       
 9   DATETIME              975113 non-null  datetime64[ns]
 10  FIRE_STATION_CLOSEST  975113 non-null  int64         
 11  MINUTES_ARRIVAL       951654 non-null  float64       
 12  MINUTES_LEAVE         951650 non-null  float64      

In [42]:
# change the datatype of FIRE_STATION to integer (its missing values were filled from FIRE_STATION_CLOSEST)
df_slice["FIRE_STATION"] = df_slice["FIRE_STATION"].astype("int64")

In [43]:
# let's have a look at the entire DataFrame
from IPython.display import display

with pd.option_context('display.max_columns', None):
    display(df_slice)

INCIDENT_NUM,CAD_TYPE,CAD_CALL_TYPE,FINAL_TYPE,ALARM_LEVEL,CALL_SOURCE,FIRE_STATION,LATITUDE,LONGITUDE,PERSONS_RESCUED,DATETIME,FIRE_STATION_CLOSEST,MINUTES_ARRIVAL,MINUTES_LEAVE
F11000010,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,342,43.679099,-79.461761,0.0,2011-01-01 00:03:43,342,6.317,21.267
F11000011,Medical,Carbon Monoxide,89 - Other Medical,1,01 - 911,131,43.726342,-79.396401,0.0,2011-01-01 00:03:55,131,5.117,6.183
F11000012,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,324,43.668548,-79.335324,0.0,2011-01-01 00:05:03,324,4.517,17.617
F11000013,FIG - Fire - Grass/Rubbish,Emergency Fire,"03 - NO LOSS OUTDOOR fire (exc: Sus.arson,vand...",1,01 - 911,345,43.657123,-79.434313,0.0,2011-01-01 00:04:46,345,6.000,9.883
F11000014,FAHR - Alarm Highrise Residential,Emergency Fire,"33 - Human - Malicious intent, prank",1,05 - Telephone from Monitoring Agency,142,43.759840,-79.516182,0.0,2011-01-01 00:06:07,142,4.933,10.133
...,...,...,...,...,...,...,...,...,...,...,...,...,...
F18139238,FAHRD - Alarm Highrise Residential Downtown,Emergency Fire,"33 - Human - Malicious intent, prank",0,05 - Telephone from Monitoring Agency,312,43.667162,-79.401539,0.0,2018-12-31 23:55:07,344,4.433,6.817
F18139239,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,442,0.000000,0.000000,0.0,2018-12-31 23:56:38,215,5.650,2.667
F18139240,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,322,0.000000,0.000000,0.0,2018-12-31 23:56:52,215,6.667,4.183
F18139241,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,442,0.000000,0.000000,0.0,2018-12-31 23:59:30,215,7.717,2.867


In [44]:
# reorder the columns to a more pleasing order
df_slice = df_slice.loc[:, 
            [
                'DATETIME', 'MINUTES_ARRIVAL', 'MINUTES_LEAVE',
                'FIRE_STATION', 'FIRE_STATION_CLOSEST', 'LATITUDE',
                'LONGITUDE',
                'CAD_TYPE', 'CAD_CALL_TYPE', 'FINAL_TYPE',
                'ALARM_LEVEL', 'CALL_SOURCE', 'PERSONS_RESCUED'
            ]]

with pd.option_context('display.max_columns', None):
    display(df_slice)

INCIDENT_NUM,DATETIME,MINUTES_ARRIVAL,MINUTES_LEAVE,FIRE_STATION,FIRE_STATION_CLOSEST,LATITUDE,LONGITUDE,CAD_TYPE,CAD_CALL_TYPE,FINAL_TYPE,ALARM_LEVEL,CALL_SOURCE,PERSONS_RESCUED
F11000010,2011-01-01 00:03:43,6.317,21.267,342,342,43.679099,-79.461761,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,0.0
F11000011,2011-01-01 00:03:55,5.117,6.183,131,131,43.726342,-79.396401,Medical,Carbon Monoxide,89 - Other Medical,1,01 - 911,0.0
F11000012,2011-01-01 00:05:03,4.517,17.617,324,324,43.668548,-79.335324,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,0.0
F11000013,2011-01-01 00:04:46,6.000,9.883,345,345,43.657123,-79.434313,FIG - Fire - Grass/Rubbish,Emergency Fire,"03 - NO LOSS OUTDOOR fire (exc: Sus.arson,vand...",1,01 - 911,0.0
F11000014,2011-01-01 00:06:07,4.933,10.133,142,142,43.759840,-79.516182,FAHR - Alarm Highrise Residential,Emergency Fire,"33 - Human - Malicious intent, prank",1,05 - Telephone from Monitoring Agency,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
F18139238,2018-12-31 23:55:07,4.433,6.817,312,344,43.667162,-79.401539,FAHRD - Alarm Highrise Residential Downtown,Emergency Fire,"33 - Human - Malicious intent, prank",0,05 - Telephone from Monitoring Agency,0.0
F18139239,2018-12-31 23:56:38,5.650,2.667,442,215,0.000000,0.000000,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,0.0
F18139240,2018-12-31 23:56:52,6.667,4.183,322,215,0.000000,0.000000,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,0.0
F18139241,2018-12-31 23:59:30,7.717,2.867,442,215,0.000000,0.000000,Medical,Medical,89 - Other Medical,0,03 - From Ambulance,0.0


In [45]:
# write the processed DataFrame to a .csv file (compressed copies are created in section 7)
PATH_FINAL_CSV = os.path.join(FIRE_PROCESSED_UNZIPPED_DIRECTORY, "2011-2018_Toronto_Fire_Incidents_PROCESSED.csv")

# write to .csv
df_slice.to_csv(PATH_FINAL_CSV)

# 7
# Creation of Zipped CSV Files

In [46]:
# write fire_incident data to a .csv.bz2 (compressed file)
PATH_FIRE_FINAL_CSV = os.path.join(FIRE_PROCESSED_UNZIPPED_DIRECTORY, "2011-2018_Toronto_Fire_Incidents_PROCESSED.csv")

# read the csv into a DataFrame
df_fire = pd.read_csv(PATH_FIRE_FINAL_CSV, parse_dates=["DATETIME"],
                     index_col="INCIDENT_NUM")

# write the csv to a csv.bz2 (compressed file)
PATH_FIRE_FINAL_CSV_BZ2 = os.path.join(FIRE_PROCESSED_ZIPPED_DIRECTORY, "2011-2018_Toronto_Fire_Incidents_PROCESSED.csv.bz2")
df_fire.to_csv(PATH_FIRE_FINAL_CSV_BZ2, compression='bz2')

df_fire.head()

INCIDENT_NUM,DATETIME,MINUTES_ARRIVAL,MINUTES_LEAVE,FIRE_STATION,FIRE_STATION_CLOSEST,LATITUDE,LONGITUDE,CAD_TYPE,CAD_CALL_TYPE,FINAL_TYPE,ALARM_LEVEL,CALL_SOURCE,PERSONS_RESCUED
F11000010,2011-01-01 00:03:43,6.317,21.267,342,342,43.679099,-79.461761,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,0.0
F11000011,2011-01-01 00:03:55,5.117,6.183,131,131,43.726342,-79.396401,Medical,Carbon Monoxide,89 - Other Medical,1,01 - 911,0.0
F11000012,2011-01-01 00:05:03,4.517,17.617,324,324,43.668548,-79.335324,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,0.0
F11000013,2011-01-01 00:04:46,6.0,9.883,345,345,43.657123,-79.434313,FIG - Fire - Grass/Rubbish,Emergency Fire,"03 - NO LOSS OUTDOOR fire (exc: Sus.arson,vand...",1,01 - 911,0.0
F11000014,2011-01-01 00:06:07,4.933,10.133,142,142,43.75984,-79.516182,FAHR - Alarm Highrise Residential,Emergency Fire,"33 - Human - Malicious intent, prank",1,05 - Telephone from Monitoring Agency,0.0


In [47]:
# write toronto_weather data to a .csv.bz2 (compressed file)
PATH_WEATHER_FINAL_CSV = os.path.join(WEATHER_PROCESSED_UNZIPPED_DIRECTORY, "2010-2020_Toronto_Weather.csv")

# read the csv into a DataFrame
df_weather = pd.read_csv(PATH_WEATHER_FINAL_CSV, parse_dates=["DATE"],
                        index_col="DATE")

# write the csv to a csv.bz2 (compressed file)
PATH_WEATHER_FINAL_CSV_BZ2 = os.path.join(WEATHER_PROCESSED_ZIPPED_DIRECTORY, "2010-2020_Toronto_Weather.csv.bz2")
df_weather.to_csv(PATH_WEATHER_FINAL_CSV_BZ2, compression='bz2')

df_weather.head()

DATE,MAX_TEMP,MIN_TEMP,MEAN_TEMP,HDD,CDD,RAIN_MM,PRECIP_MM,SNOW_CM
2010-01-01,1.9,-9.9,-3.0,21.0,0.0,0.0,0.63,0.0
2010-01-02,-9.7,-18.5,-14.05,32.05,0.0,0.0,0.33,1.0
2010-01-03,-9.3,-17.0,-12.9,30.9,0.0,0.0,1.9,1.0
2010-01-04,-6.7,-13.5,-9.85,27.85,0.0,0.0,0.27,3.5
2010-01-05,-3.6,-12.5,-7.65,25.65,0.0,0.0,1.3,4.5


In [48]:
# write fire station location data to a .csv.bz2 (compressed file)
PATH_STATIONS_FINAL_CSV = os.path.join(STATIONS_PROCESSED_UNZIPPED_DIRECTORY,
                                  "Toronto_Fire_Station_Locations_PROCESSED.csv")

# read the csv into a DataFrame
df_locations = pd.read_csv(PATH_STATIONS_FINAL_CSV, index_col="INDEX")

# write the csv to a csv.bz2 (compressed file)
PATH_STATIONS_FINAL_CSV_BZ2 = os.path.join(STATIONS_PROCESSED_ZIPPED_DIRECTORY, "Toronto_Fire_Station_Locations.csv.bz2")
df_locations.to_csv(PATH_STATIONS_FINAL_CSV_BZ2, compression='bz2')

df_locations.head()

INDEX,NAME,ADDRESS,LATITUDE,LONGITUDE,WARD_NAME,MUN_NAME
214,FIRE STATION 214,745 MEADOWVALE RD,43.794219,-79.163605,Scarborough East (44),Scarborough
215,FIRE STATION 215,5318 LAWRENCE AVE E,43.777401,-79.148069,Scarborough East (44),Scarborough
221,FIRE STATION 221,2575 EGLINTON AVE E,43.734799,-79.255066,Scarborough Southwest (35),Scarborough
222,FIRE STATION 222,755 WARDEN AVE,43.720408,-79.284094,Scarborough Southwest (35),Scarborough
223,FIRE STATION 223,116 DORSET RD,43.723965,-79.233264,Scarborough Southwest (36),Scarborough
