# Jupyter Notebook Purpose
- read the .csv.gz compressed file dataset into pandas DataFrame and perform draw total counts on interactive Google Maps
    - explore through geospatial analysis where the top 1000 counts of geospatial locations (budget concerns with API) of TFS Fire Incidents occur

## Group 10 Members

- ### A. Nidhi Punja - [Email](mailto:npunja@uwaterloo.ca)
- ### B. Judith Roth - [Email](mailto:j5roth@uwaterloo.ca)
- ### C. Iman Dordizadeh Basirabad - [Email](mailto:idordiza@uwaterloo.ca)
- ### D. Daniel Adam Cebula - [Email](mailto:dacebula@uwaterloo.ca)
- ### E. Cynthia Fung - [Email](mailto:c27fung@uwaterloo.ca)
- ### F. Ben Klassen - [Email](mailto:b6klasse@uwaterloo.ca)

In [1]:
# Group 10 Collaborators
COLLABORATORS = ["Nidhi Punja",
                 "Judith Roth",
                 "Iman Dordizadeh Basirabad",
                 "Daniel Adam Cebula",
                 "Cynthia Fung",
                 "Ben Klassen"]

# Group 10 Members
for _ in COLLABORATORS:
    print(f"Group 10 Member: {_:->30}")

Group 10 Member: -------------------Nidhi Punja
Group 10 Member: -------------------Judith Roth
Group 10 Member: -----Iman Dordizadeh Basirabad
Group 10 Member: ------------Daniel Adam Cebula
Group 10 Member: ------------------Cynthia Fung
Group 10 Member: -------------------Ben Klassen


# Table of Contents

## 1. [Python Dependecies](#1.-Python-Libraries-and-Dependencies[1,2,3,4,5,6])
### 1a. [jupyter-gmaps interactive google maps](#1a.-jupyter-gmaps-Interactive-Google-Maps-Installation-Steps)
___
## 2. [Folder Creation](#2.-Folder-Creation-for-Data-Analyses-and-Visualization)
___
## 3. [Read in the Data](#3.-Read-in-DataSet-as-a-Pandas-DataFrame)
___
## 4. [GroupBy GeoSpatial DataFrames](#4.-GroupBy-TFS-Fire-Incidents-Geospatial-Data)
___
## 5. [jupyter-gmaps Google Maps](#5.-Jupyter-Gmaps-Interactive-Google-Maps)
### 5a. [Geospatial Findings](#5a.-Interactive-Google-Maps-Findings)
___
## 6. [References](#6.-Jupyter-Notebook-References)
___

# 1. Python Libraries and Dependencies<sup>[1,2,3,4,5,6]</sup>

In [2]:
# Python Modules for Miscellaneous reasons
import os        # portable way to use operating system functionalities
import datetime  # python classes for manipulating dates and times
import dateutil  # powerful extensions to standard datetime Python module
import re        # used for Python regex library
from IPython.display import display # use this to see the entire DataFrame in the right format
from create_folder import create_folder # create folder function that I have defined and placed in create_folder.py file
import warnings  # suppress warnings from various Python libraries
try:
    from api_keys import google_maps_api_key # import the google maps api key (need to create Google Developer Account)
except:
    print("Google Maps API-KEY not present")

In [3]:
# DATA ANALYSIS / VISUALIZATION Python Dependencies
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## 1a. jupyter-gmaps Interactive Google Maps Installation Steps

### 1. Install jupyter-gmaps - [link](https://jupyter-gmaps.readthedocs.io/en/latest/install.html)

#### A. Use conda ```conda install -c conda-forge gmaps``` or pip ```pip install gmaps```
    - press 'y' and ENTER
#### B. ```jupyter nbextension enable --py --sys-prefix widgetsnbextension```
#### C. ```jupyter nbextension enable --py --sys-prefix gmaps```
#### D. ```jupyter labextension install @jupyter-widgets/jupyterlab-manager```
#### E. Open notebook in Jupyter Notebook / Lab again
- ```jupyter lab```
- ```jupyter notebook```

In [4]:
# import the jupyter-gmaps Python library
import gmaps

# configure the Google Maps API_KEY with jupyter-gmaps
gmaps.configure(api_key=google_maps_api_key)

# 2. Folder Creation for Data Analyses and Visualization
- generate a folder that will hold data analyses / visualizations

In [5]:
# get a connection to the major directory holding the data / metadata of the DataFrame
PROCESSED_ZIPPED_DIRECTORY = create_folder(folder_name="PROCESSED_ZIPPED")

# generate a images folder and a analyses folder to hold all relevant information
IMAGES_DIRECTORY = create_folder(folder_name="IMAGES")
ANALYSES_DIRECTORY = create_folder(folder_name="ANALYSES")

# get folders for fire incidents, toronto weather and fire station location data
FIRE_PROCESSED_ZIPPED_DIRECTORY = create_folder(folder_name=os.path.join(PROCESSED_ZIPPED_DIRECTORY, "FIRE_INCIDENTS"))
WEATHER_PROCESSED_ZIPPED_DIRECTORY = create_folder(folder_name=os.path.join(PROCESSED_ZIPPED_DIRECTORY, "TORONTO_WEATHER"))
STATIONS_PROCESSED_ZIPPED_DIRECTORY = create_folder(folder_name=os.path.join(PROCESSED_ZIPPED_DIRECTORY, "FIRE_STATIONS"))

# 3. Read in DataSet as a Pandas DataFrame
- DataSet is a compressed file (.csv.bz2)
- DataSet Metadata is a .csv file

In [6]:
# read in the metadata from .csv into memory
# use this metatdata to explain the columns
df_metadata = pd.read_csv(
    os.path.join(FIRE_PROCESSED_ZIPPED_DIRECTORY, "FINAL_DATASET_METADATA.csv"),
    index_col="COLUMN_NAME")

# display it
with pd.option_context('display.max_colwidth', 300):
    display(df_metadata)

Unnamed: 0_level_0,COLUMN_DESCRIPTION
COLUMN_NAME,Unnamed: 1_level_1
INCIDENT_NUM,Toronto Fire Services (TFS) incident number. Used as index for the DataFrame because it is unique for each call.
DATETIME,"Year, Month, Day, Hour, Minute, Second of when TFS was notified of the incident (alarm)."
MINUTES_ARRIVAL,Minutes it took for the first unit to arrive (after alarm).
MINUTES_LEAVE,Minutes it took for the first unit to leave (after arrival).
FIRE_STATION,Number of TFS Station where incident occurred.
FIRE_STATION_CLOSEST,Number of closest (by smallest Haversine formula distance calculation) TFS Station where incident occurred.
NAME,Name of column FIRE_STATION TFS Fire Station.
ADDRESS,Address of column FIRE_STATION TFS Fire Station.
LATITUDE_STATION,Latitude (Decimal Degrees) of column FIRE_STATION TFS Fire Station.
LONGITUDE_STATION,Longitude (Decimal Degrees) of column FIRE_STATION TFS Fire Station.


In [7]:
# read the merged DataFrame from .csv.bz2 file into DataFrame
PATH_MERGED_CSV_BZ2 = os.path.join(FIRE_PROCESSED_ZIPPED_DIRECTORY, "FINAL_DATASET.csv.bz2")

# pandas DataFrame generated and is referenced by df variable name
df = pd.read_csv(PATH_MERGED_CSV_BZ2,
                 compression='bz2', index_col="INCIDENT_NUM", parse_dates=["DATETIME"])

# FOR GEOSPATIAL DATA THIS IS NOT ENTIRELY THE CASE
# make the columns categorical (for faster queries)
# df["CAD_TYPE"] = pd.Categorical(df["CAD_TYPE"])
# df["CAD_CALL_TYPE"] = pd.Categorical(df["CAD_CALL_TYPE"])
# df["FINAL_TYPE"] = pd.Categorical(df["FINAL_TYPE"])
# df["CALL_SOURCE"] = pd.Categorical(df["CALL_SOURCE"])
# df["NAME"] = pd.Categorical(df["NAME"])
# df["ADDRESS"] = pd.Categorical(df["ADDRESS"])
# df["WARD_NAME"] = pd.Categorical(df["WARD_NAME"])
# df["MUN_NAME"] = pd.Categorical(df["MUN_NAME"])

# display it
with pd.option_context('display.max_columns', None):
    display(df.head())

Unnamed: 0_level_0,DATETIME,MINUTES_ARRIVAL,MINUTES_LEAVE,FIRE_STATION,FIRE_STATION_CLOSEST,NAME,ADDRESS,LATITUDE_STATION,LONGITUDE_STATION,WARD_NAME,MUN_NAME,CAD_TYPE,CAD_CALL_TYPE,FINAL_TYPE,ALARM_LEVEL,CALL_SOURCE,PERSONS_RESCUED,LATITUDE,LONGITUDE,MAX_TEMP,MIN_TEMP,MEAN_TEMP,HDD,CDD,RAIN_MM,PRECIP_MM,SNOW_CM
INCIDENT_NUM,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1
F11000010,2011-01-01 00:03:43,6.317,21.267,342.0,342.0,FIRE STATION 342,106 ASCOT AVE,43.679375,-79.44863,Davenport (17),former Toronto,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,0.0,43.679099,-79.461761,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0
F11000011,2011-01-01 00:03:55,5.117,6.183,131.0,131.0,FIRE STATION 131,3135 YONGE ST,43.726226,-79.402161,Don Valley West (25),former Toronto,Medical,Carbon Monoxide,89 - Other Medical,1,01 - 911,0.0,43.726342,-79.396401,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0
F11000012,2011-01-01 00:05:03,4.517,17.617,324.0,324.0,FIRE STATION 324,840 GERRARD ST E,43.667767,-79.343518,Toronto-Danforth (30),former Toronto,Medical,Medical,89 - Other Medical,1,03 - From Ambulance,0.0,43.668548,-79.335324,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0
F11000013,2011-01-01 00:04:46,6.0,9.883,345.0,345.0,FIRE STATION 345,1287 DUFFERIN ST,43.667401,-79.438153,Davenport (18),former Toronto,FIG - Fire - Grass/Rubbish,Emergency Fire,"03 - NO LOSS OUTDOOR fire (exc: Sus.arson,vand...",1,01 - 911,0.0,43.657123,-79.434313,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0
F11000014,2011-01-01 00:06:07,4.933,10.133,142.0,142.0,FIRE STATION 142,2753 JANE ST,43.745991,-79.514374,York Centre (9),North York,FAHR - Alarm Highrise Residential,Emergency Fire,"33 - Human - Malicious intent, prank",1,05 - Telephone from Monitoring Agency,0.0,43.75984,-79.516182,11.5,0.9,6.4,11.6,0.0,3.7,8.7,0.0


# 4. GroupBy TFS Fire Incidents Geospatial Data
- groupby by Latitude and Longitude (excluding nulls) for major / minor intersections and explore the FINAL_TYPE, CAD_CALL_TYPE columns for 2011 - 2018 years
- groupby Laittude and Longitude (excluding nulls) for TFS Fire Stations and explore FINAL_TYPE, CAD_CALL_TYPE columns for 2011 - 2018 years

In [8]:
# drop all rows where latitude and longitude is null
df.dropna(subset=['LATITUDE', 'LONGITUDE'], inplace=True)

# Grab the top 10 "FINAL_TYPE" categories only
top_10_FINAL_TYPE = [x for x in df["FINAL_TYPE"].value_counts().index[:10]]
df_FINAL_TYPE_ = df.loc[df["FINAL_TYPE"].isin(top_10_FINAL_TYPE), :]

# Grab the top 10 "CAD_CALL_TYPE"
top_10_CAD_CALL_TYPE = [x for x in df["CAD_CALL_TYPE"].value_counts().index[:10]]
df_CAD_CALL_TYPE_ = df.loc[df["CAD_CALL_TYPE"].isin(top_10_CAD_CALL_TYPE), :]

# no nulls should be present in "LATITUDE" and "LONGITUDE" columns
df_FINAL_TYPE_.isnull().sum()

DATETIME                    0
MINUTES_ARRIVAL         12546
MINUTES_LEAVE           12547
FIRE_STATION                0
FIRE_STATION_CLOSEST        0
NAME                        0
ADDRESS                     0
LATITUDE_STATION            0
LONGITUDE_STATION           0
WARD_NAME                   0
MUN_NAME                    0
CAD_TYPE                    0
CAD_CALL_TYPE               0
FINAL_TYPE                  0
ALARM_LEVEL                 0
CALL_SOURCE                22
PERSONS_RESCUED            19
LATITUDE                    0
LONGITUDE                   0
MAX_TEMP                    0
MIN_TEMP                    0
MEAN_TEMP                   0
HDD                         0
CDD                         0
RAIN_MM                     0
PRECIP_MM                   0
SNOW_CM                     0
dtype: int64

In [9]:
# Create a new dataframe with only Latitude / Longitude and FINAL_TYPE
df_FINAL_TYPE = df_FINAL_TYPE_.loc[:, ["LATITUDE", "LONGITUDE", "FINAL_TYPE"]]

# apply the pandas get dummies to turn catgory column to numerical (1 or 0)
df_FINAL_TYPE_DUMMIES = pd.get_dummies(df_FINAL_TYPE)

# groupby to get a count for each supplied LATITUDE and lONGITUDE pair
df_FINAL_TYPE_GROUPBY1 = (df_FINAL_TYPE_DUMMIES.iloc[:, 0:3]
              .groupby(["LATITUDE", "LONGITUDE"])
              .count()
              .rename(columns={df_FINAL_TYPE_DUMMIES.columns[2]:"TOTAL COUNT"}))

# groupby to get a sum (get relative count of each FINAL_TYPE category)
df_FINAL_TYPE_GROUPBY2 = (df_FINAL_TYPE_DUMMIES
              .groupby(["LATITUDE", "LONGITUDE"])
              .sum())

# merge the 2 dataframes together
df_FINAL_TYPE_GROUPBY = (df_FINAL_TYPE_GROUPBY1.merge(df_FINAL_TYPE_GROUPBY2,
                                                      how="inner", left_index=True, right_index=True)
                           .sort_values(by="TOTAL COUNT", ascending=False)
                           .reset_index())

# save the dataframe as a csv file
df_FINAL_TYPE_GROUPBY.to_csv(os.path.join(ANALYSES_DIRECTORY, "Top_10_FINAL_TYPE_Geospatial.csv"),
                             index=False)

df_FINAL_TYPE_GROUPBY

Unnamed: 0,LATITUDE,LONGITUDE,TOTAL COUNT,FINAL_TYPE_22 - Pot on Stove (no fire),FINAL_TYPE_24 - Other Cooking/toasting/smoke/steam (No Fire),FINAL_TYPE_31 - Alarm Equipment - Malfunction,FINAL_TYPE_32 - Alarm System Equipment - Accidental activation (exc. code 35),"FINAL_TYPE_33 - Human - Malicious intent, prank",FINAL_TYPE_34 - Human - Perceived Emergency,FINAL_TYPE_35 - Human - Accidental (alarm accidentally activated by person),FINAL_TYPE_62 - Vehicle Collision,FINAL_TYPE_66 - Persons Trapped in Elevator,FINAL_TYPE_89 - Other Medical
0,43.658295,-79.371032,2239,95.0,49.0,249.0,274.0,511.0,69.0,149.0,10.0,52.0,781.0
1,43.656163,-79.370123,2080,68.0,59.0,198.0,273.0,351.0,85.0,127.0,4.0,150.0,765.0
2,43.667295,-79.373778,1755,71.0,62.0,101.0,128.0,219.0,133.0,101.0,5.0,231.0,704.0
3,43.746814,-79.583807,1553,51.0,28.0,155.0,116.0,92.0,34.0,54.0,16.0,156.0,851.0
4,43.712046,-79.281003,1417,2.0,12.0,47.0,30.0,12.0,54.0,18.0,35.0,15.0,1192.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
19880,43.775506,-79.164807,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
19881,43.763875,-79.392464,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
19882,43.636648,-79.518873,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
19883,43.643878,-79.409802,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0


In [10]:
# Create a new dataframe with only Latitude / Longitude of TFS Stations and FINAL_TYPE
df_FINAL_TYPE_STATION = df_FINAL_TYPE_.loc[:, ["FIRE_STATION", "LATITUDE_STATION", "LONGITUDE_STATION", "FINAL_TYPE"]]

# apply the pandas get dummies to turn catgory column to numerical (1 or 0)
df_FINAL_TYPE_STATION_DUMMIES = pd.get_dummies(df_FINAL_TYPE_STATION)

# groupby to get a count for each supplied LATITUDE and lONGITUDE pair
df_FINAL_TYPE_STATION_GROUPBY1 = (df_FINAL_TYPE_STATION_DUMMIES.iloc[:, 0:4]
              .groupby(["FIRE_STATION", "LATITUDE_STATION", "LONGITUDE_STATION"])
              .count()
              .rename(columns={df_FINAL_TYPE_STATION_DUMMIES.columns[3]:"TOTAL COUNT"}))

# groupby to get a sum (get relative count of each FINAL_TYPE category)
df_FINAL_TYPE_STATION_GROUPBY2 = (df_FINAL_TYPE_STATION_DUMMIES
              .groupby(["FIRE_STATION", "LATITUDE_STATION", "LONGITUDE_STATION"])
              .sum())

# merge the 2 dataframes together
df_FINAL_TYPE_STATION_GROUPBY = (df_FINAL_TYPE_STATION_GROUPBY1.merge(df_FINAL_TYPE_STATION_GROUPBY2,
                                                      how="inner", left_index=True, right_index=True)
                           .sort_values(by="TOTAL COUNT", ascending=False)
                           .reset_index())

# save the dataframe as a csv file
df_FINAL_TYPE_STATION_GROUPBY.to_csv(os.path.join(ANALYSES_DIRECTORY, "Top_10_FINAL_TYPE_Fire_Stations_Geospatial.csv"),
                             index=False)

df_FINAL_TYPE_STATION_GROUPBY

Unnamed: 0,FIRE_STATION,LATITUDE_STATION,LONGITUDE_STATION,TOTAL COUNT,FINAL_TYPE_22 - Pot on Stove (no fire),FINAL_TYPE_24 - Other Cooking/toasting/smoke/steam (No Fire),FINAL_TYPE_31 - Alarm Equipment - Malfunction,FINAL_TYPE_32 - Alarm System Equipment - Accidental activation (exc. code 35),"FINAL_TYPE_33 - Human - Malicious intent, prank",FINAL_TYPE_34 - Human - Perceived Emergency,FINAL_TYPE_35 - Human - Accidental (alarm accidentally activated by person),FINAL_TYPE_62 - Vehicle Collision,FINAL_TYPE_66 - Persons Trapped in Elevator,FINAL_TYPE_89 - Other Medical
0,314.0,43.663066,-79.384665,24306,387.0,708.0,2090.0,2901.0,1047.0,860.0,1390.0,311.0,715.0,13897.0
1,332.0,43.648315,-79.389567,22645,233.0,426.0,2300.0,2979.0,787.0,949.0,1382.0,620.0,689.0,12280.0
2,426.0,43.645095,-79.438870,22623,784.0,640.0,1041.0,1666.0,525.0,1257.0,806.0,918.0,678.0,14308.0
3,325.0,43.659379,-79.364857,21805,604.0,459.0,1499.0,1980.0,2133.0,1026.0,1010.0,362.0,802.0,11930.0
4,313.0,43.672095,-79.375564,18184,439.0,499.0,996.0,1584.0,709.0,998.0,1082.0,664.0,898.0,10315.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
79,215.0,43.777401,-79.148069,3603,59.0,55.0,185.0,143.0,25.0,183.0,65.0,130.0,3.0,2755.0
80,211.0,43.824001,-79.242873,3554,54.0,44.0,325.0,313.0,17.0,127.0,78.0,457.0,16.0,2123.0
81,424.0,43.658363,-79.479633,1651,24.0,36.0,56.0,98.0,16.0,87.0,27.0,26.0,9.0,1272.0
82,346.0,43.633677,-79.421330,850,1.0,7.0,129.0,92.0,20.0,24.0,54.0,23.0,24.0,476.0


In [11]:
# Create a new dataframe with only Latitude / Longitude and CAD_CALL_TYPE
df_CAD_CALL_TYPE = df_CAD_CALL_TYPE_.loc[:, ["LATITUDE", "LONGITUDE", "CAD_CALL_TYPE"]]

# apply the pandas get dummies to turn catgory column to numerical (1 or 0)
df_CAD_CALL_TYPE_DUMMIES = pd.get_dummies(df_CAD_CALL_TYPE)

# groupby to get a count for each supplied LATITUDE and lONGITUDE pair
df_CAD_CALL_TYPE_GROUPBY1 = (df_CAD_CALL_TYPE_DUMMIES.iloc[:, 0:3]
              .groupby(["LATITUDE", "LONGITUDE"])
              .count()
              .rename(columns={df_CAD_CALL_TYPE_DUMMIES.columns[2]:"TOTAL COUNT"}))

# groupby to get a sum (get relative count of each CAD_CALL_TYPE category)
df_CAD_CALL_TYPE_GROUPBY2 = (df_CAD_CALL_TYPE_DUMMIES
              .groupby(["LATITUDE", "LONGITUDE"])
              .sum())

# merge the 2 dataframes together
df_CAD_CALL_TYPE_GROUPBY = (df_CAD_CALL_TYPE_GROUPBY1.merge(df_CAD_CALL_TYPE_GROUPBY2,
                                                      how="inner", left_index=True, right_index=True)
                           .sort_values(by="TOTAL COUNT", ascending=False)
                           .reset_index())

# save the dataframe as a csv file
df_CAD_CALL_TYPE_GROUPBY.to_csv(os.path.join(ANALYSES_DIRECTORY, "Top_10_CAD_CALL_TYPE_Geospatial.csv"),
                             index=False)

df_CAD_CALL_TYPE_GROUPBY

Unnamed: 0,LATITUDE,LONGITUDE,TOTAL COUNT,CAD_CALL_TYPE_CBRN & Hazardous Materials,CAD_CALL_TYPE_Carbon Monoxide,CAD_CALL_TYPE_Emergency Fire,CAD_CALL_TYPE_Medical,CAD_CALL_TYPE_Non Emergency,CAD_CALL_TYPE_Other Emergency Events,CAD_CALL_TYPE_Technical Rescue,CAD_CALL_TYPE_Vehicle Incident
0,43.658295,-79.371032,2405,17.0,14.0,1435.0,770.0,7.0,79.0,66.0,17.0
1,43.656163,-79.370123,2298,15.0,18.0,1266.0,749.0,3.0,73.0,166.0,8.0
2,43.667295,-79.373778,1977,42.0,7.0,868.0,694.0,12.0,94.0,255.0,5.0
3,43.746814,-79.583807,1769,10.0,8.0,629.0,847.0,6.0,75.0,182.0,12.0
4,43.712046,-79.281003,1560,7.0,1.0,274.0,1190.0,5.0,28.0,19.0,36.0
...,...,...,...,...,...,...,...,...,...,...,...
20067,43.764162,-79.211585,1,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
20068,43.637785,-79.398549,1,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
20069,43.743974,-79.281060,1,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
20070,43.818776,-79.175174,1,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0


In [12]:
# Create a new dataframe with only Latitude / Longitude of TFS Stations and CAD_CALL_TYPE
df_CAD_CALL_TYPE_STATION = df_FINAL_TYPE_.loc[:, ["FIRE_STATION", "LATITUDE_STATION", "LONGITUDE_STATION", "CAD_CALL_TYPE"]]

# apply the pandas get dummies to turn catgory column to numerical (1 or 0)
df_CAD_CALL_TYPE_STATION_DUMMIES = pd.get_dummies(df_CAD_CALL_TYPE_STATION)

# groupby to get a count for each supplied LATITUDE and lONGITUDE Station pair
df_CAD_CALL_TYPE_STATION_GROUPBY1 = (df_CAD_CALL_TYPE_STATION_DUMMIES.iloc[:, 0:4]
              .groupby(["FIRE_STATION", "LATITUDE_STATION", "LONGITUDE_STATION"])
              .count()
              .rename(columns={df_CAD_CALL_TYPE_STATION_DUMMIES.columns[3]:"TOTAL COUNT"}))

# groupby to get a sum (get relative count of each CAD_CALL_TYPE category)
df_CAD_CALL_TYPE_STATION_GROUPBY2 = (df_CAD_CALL_TYPE_STATION_DUMMIES
              .groupby(["FIRE_STATION", "LATITUDE_STATION", "LONGITUDE_STATION"])
              .sum())

# merge the 2 dataframes together
df_CAD_CALL_TYPE_STATION_GROUPBY = (df_CAD_CALL_TYPE_STATION_GROUPBY1.merge(df_CAD_CALL_TYPE_STATION_GROUPBY2,
                                                      how="inner", left_index=True, right_index=True)
                           .sort_values(by="TOTAL COUNT", ascending=False)
                           .reset_index())

# save the dataframe as a csv file
df_CAD_CALL_TYPE_STATION_GROUPBY.to_csv(os.path.join(ANALYSES_DIRECTORY, "Top_10_CAD_CALL_TYPE_Fire_Stations_Geospatial.csv"),
                             index=False)

df_CAD_CALL_TYPE_STATION_GROUPBY

Unnamed: 0,FIRE_STATION,LATITUDE_STATION,LONGITUDE_STATION,TOTAL COUNT,CAD_CALL_TYPE_CBRN & Hazardous Materials,CAD_CALL_TYPE_Carbon Monoxide,CAD_CALL_TYPE_Emergency Fire,CAD_CALL_TYPE_Medical,CAD_CALL_TYPE_Non Emergency,CAD_CALL_TYPE_Other Emergency Events,CAD_CALL_TYPE_Technical Rescue,CAD_CALL_TYPE_Vehicle Incident
0,314.0,43.663066,-79.384665,24306,83.0,101.0,8785.0,13794.0,21.0,512.0,740.0,270.0
1,332.0,43.648315,-79.389567,22645,84.0,157.0,8271.0,12131.0,25.0,714.0,724.0,539.0
2,426.0,43.645095,-79.438870,22623,119.0,674.0,5952.0,13621.0,47.0,719.0,705.0,786.0
3,325.0,43.659379,-79.364857,21805,118.0,349.0,8050.0,11586.0,22.0,532.0,832.0,316.0
4,313.0,43.672095,-79.375564,18184,90.0,345.0,5806.0,9946.0,46.0,452.0,915.0,584.0
...,...,...,...,...,...,...,...,...,...,...,...,...
79,215.0,43.777401,-79.148069,3603,26.0,323.0,596.0,2426.0,16.0,117.0,6.0,93.0
80,211.0,43.824001,-79.242873,3554,14.0,233.0,856.0,1892.0,16.0,160.0,18.0,365.0
81,424.0,43.658363,-79.479633,1651,13.0,99.0,283.0,1174.0,0.0,52.0,9.0,21.0
82,346.0,43.633677,-79.421330,850,1.0,1.0,307.0,476.0,0.0,23.0,25.0,17.0


# 5. Jupyter Gmaps Interactive Google Maps
- from the 4 DataFrame above draw several Interactive Google Maps using Google Maps API Key

In [13]:
# lets only look at the top 1000 locations
df_FINAL_TYPE_TOP_1000 = df_FINAL_TYPE_GROUPBY[:1000].copy()

# get locations
locations = df_FINAL_TYPE_TOP_1000[["LATITUDE", "LONGITUDE"]]

# get total count
total_count = df_FINAL_TYPE_TOP_1000["TOTAL COUNT"]

# make gmaps figure
fig = gmaps.figure(layout={
        'width': '400px',
        'height': '400px',
        'padding': '3px',
        'border': '1px solid black'
})

# create heat_map layer
heat_layer = gmaps.heatmap_layer(locations,
                                 weights=total_count,
                                 dissipating=False,
                                 max_intensity=300,
                                 point_radius=0.005)

# add heat_layer to figure
fig.add_layer(heat_layer)

# display the figure
display(fig)

Figure(layout=FigureLayout(border='1px solid black', height='400px', padding='3px', width='400px'))

In [14]:
# lets only look at the top 1000 locations
df_FINAL_TYPE_STATION_GROUPBY_COPY = df_FINAL_TYPE_STATION_GROUPBY.copy()

# get locations
locations_station = df_FINAL_TYPE_STATION_GROUPBY_COPY[["LATITUDE_STATION", "LONGITUDE_STATION"]]

# get total count
total_count_station = df_FINAL_TYPE_STATION_GROUPBY_COPY["TOTAL COUNT"]

# make gmaps figure
fig_station = gmaps.figure(layout={
        'width': '400px',
        'height': '400px',
        'padding': '3px',
        'border': '1px solid black'
})

# generate popup text
info_box_template = """
<dl>
<dt>Station: {FIRE_STATION}</dt>
</dl>
<dl>
<dt>Count: {TOTAL COUNT}</dt>
</dl>
"""
count_info = [info_box_template.format(**row) for index, row in df_FINAL_TYPE_STATION_GROUPBY.iterrows()]



# create marker layer
marker_station = gmaps.symbol_layer(locations_station,
                                    fill_color="blue",
                                    stroke_color="blue",
                                    scale=2,
                                    info_box_content=count_info)

# add heat_layer and symbols to figure
fig_station.add_layer(heat_layer)
fig_station.add_layer(marker_station)

# display the figure
display(fig_station)

Figure(layout=FigureLayout(border='1px solid black', height='400px', padding='3px', width='400px'))

In [15]:
# close the 2 jupyter gmap figures
fig.close()
fig_station.close()

In [16]:
# lets only look at the top 1000 locations
df_CAD_CALL_TYPE_TOP_1000 = df_CAD_CALL_TYPE_GROUPBY[:1000].copy()

# get locations
locations = df_CAD_CALL_TYPE_TOP_1000[["LATITUDE", "LONGITUDE"]]

# get total count
total_count = df_CAD_CALL_TYPE_TOP_1000["TOTAL COUNT"]

# make gmaps figure
fig = gmaps.figure(layout={
        'width': '400px',
        'height': '400px',
        'padding': '3px',
        'border': '1px solid black'
})

# create heat_map layer
heat_layer = gmaps.heatmap_layer(locations,
                                 weights=total_count,
                                 dissipating=False,
                                 max_intensity=300,
                                 point_radius=0.005)

# add heat_layer to figure
fig.add_layer(heat_layer)

# display the figure
display(fig)

Figure(layout=FigureLayout(border='1px solid black', height='400px', padding='3px', width='400px'))

In [17]:
# lets only look at the top 1000 locations
df_CAD_CALL_TYPE_STATION_GROUPBY_COPY = df_CAD_CALL_TYPE_STATION_GROUPBY.copy()

# get locations
locations_station = df_CAD_CALL_TYPE_STATION_GROUPBY_COPY[["LATITUDE_STATION", "LONGITUDE_STATION"]]

# get total count
total_count_station = df_CAD_CALL_TYPE_STATION_GROUPBY_COPY["TOTAL COUNT"]

# make gmaps figure
fig_station = gmaps.figure(layout={
        'width': '400px',
        'height': '400px',
        'padding': '3px',
        'border': '1px solid black'
})

# generate popup text
info_box_template = """
<dl>
<dt>Station: {FIRE_STATION}</dt>
</dl>
<dl>
<dt>Count: {TOTAL COUNT}</dt>
</dl>
"""
count_info = [info_box_template.format(**row) for index, row in df_CAD_CALL_TYPE_STATION_GROUPBY_COPY.iterrows()]



# create marker layer
marker_station = gmaps.symbol_layer(locations_station,
                                    fill_color="blue",
                                    stroke_color="blue",
                                    scale=2,
                                    info_box_content=count_info)

# add heat_layer and symbols to figure
fig_station.add_layer(heat_layer)
fig_station.add_layer(marker_station)

# display the figure
display(fig_station)

Figure(layout=FigureLayout(border='1px solid black', height='400px', padding='3px', width='400px'))

In [18]:
# close the 2 jupyter gmap figures
fig.close()
fig_station.close()

## 5a. Interactive Google Maps Findings
- further diagrams could be done by analyzing sub-categories in FINAL_TYPE and CAD_CALL_TYPE but were not done due to cost concerns
- from a surface view it appears that the top 1000 geospatial locations that are serviced by TFS Fire Stations are right around them
    - no clear bias exists that indicates a missappropriation of TFS resources.

# 6. Jupyter Notebook References

[1] "Python Documentation."  *Python Software Foundation*.  [Online](https://docs.python.org/).  [Accessed August 04, 2020]

[2] G. Niemeyer.  "dateutil - powerful extensions to datetime."  *dateutil*.  [Online](https://github.com/dateutil/dateutil).  [Accessed August 04, 2020]

[3] "pandas."  *PyData*.  [Online](https://pandas.pydata.org/).  [Accessed August 04, 2020]

[4] "NumPy - The fundamental package for scientific computing with Python."  *NumPy*.  [Online](https://numpy.org/).  [Accessed August 04, 2020]

[5] "Matplotlib:  Visualization with Python."  *The Matplotlib Development team*.  [Online](https://matplotlib.org/).  [Accessed August 04, 2020]

[6] P. Bugnion.  "jupyter-gmaps:  Interactive Google maps in the IPython Notebook."  *jupyter-gmaps*.  [Online](https://github.com/pbugnion/gmaps).  [Accessed August 04, 2020]