# Jupyter Notebook Purpose
- read the compressed .csv.bz2 dataset into a pandas DataFrame and perform analyses and visualizations on it
    - explore the dataset's features and labels to find trends in Toronto Fire Services (TFS) basic incidents

## Group 10 Members

- ### A. Nidhi Punja - [Email](mailto:npunja@uwaterloo.ca)
- ### B. Judith Roth - [Email](mailto:j5roth@uwaterloo.ca)
- ### C. Iman Dordizadeh Basirabad - [Email](mailto:idordiza@uwaterloo.ca)
- ### D. Daniel Adam Cebula - [Email](mailto:dacebula@uwaterloo.ca)
- ### E. Cynthia Fung - [Email](mailto:c27fung@uwaterloo.ca)
- ### F. Ben Klassen - [Email](mailto:b6klasse@uwaterloo.ca)

In [None]:
# Group 10 Collaborators
COLLABORATORS = ["Nidhi Punja",
                 "Judith Roth",
                 "Iman Dordizadeh Basirabad",
                 "Daniel Adam Cebula",
                 "Cynthia Fung",
                 "Ben Klassen"]

# Group 10 Members
for member in COLLABORATORS:
    print(f"Group 10 Member: {member:->30}")

# Table of Contents

## 1. [Python Dependencies](#1.-Python-Libraries-and-Dependencies[1,2,3,4,5,6])
___
## 2. [Folder Creation](#2.-Folder-Creation-for-Data-Analyses-and-Visualization)
___
## 3. [Read in the Data](#3.-Read-in-DataSet-as-a-Pandas-DataFrame)
___
## 4. [Toronto Fire Services (TFS) Incident Types](#4.-Toronto-Fire-Services-(TFS)-Incident-Type-Exploration)
___
## 5. [Toronto Fire Services (TFS) Time Series](#5.-Toronto-Fire-Services-Time-Series-Exploration)
___
## 6. [Weather / Climate effect on TFS Fire Incidents](#6.-The-Effect-of-Toronto-Weather-/-Climate-on-Toronto-Fire-Services-Basic-Incidents)
___
## 7. [Time and Days of the Week effect on TFS Fire Incidents](#7.-The-Effect-of-Time-and-Day-of-the-Week-on-TFS-Fire-Incidents)
___
## 8. [References](#8.-Jupyter-Notebook-References)
___

# 1. Python Libraries and Dependencies<sup>[1,2,3,4,5,6]</sup>

In [None]:
# Python Modules for Miscellaneous reasons
import os        # portable way to use operating system functionalities
import datetime  # python classes for manipulating dates and times
import dateutil  # powerful extensions to standard datetime Python module
import re        # used for Python regex library
from IPython.display import display # use this to see the entire DataFrame in the right format
from create_folder import create_folder # create folder function that I have defined and placed in create_folder.py file
import warnings  # suppress warnings from various Python libraries
import math      # import python math library for various functions
import string    # use this library to remove punctuation

In [None]:
# DATA ANALYSIS / VISUALIZATION Python Dependencies
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
# Seaborn data visualization library based on matplotlib
import seaborn as sns

In [None]:
# some matplotlib libraries for formatting
import matplotlib.ticker as tick
import matplotlib.dates as mdates

# 2. Folder Creation for Data Analyses and Visualization
- generate the folders that will hold the data analyses and visualizations
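`create_folder` is imported from a local `create_folder.py` file that is not shown in this notebook. Judging only from how it is called in this section (it takes a `folder_name` and its return value is used as a directory path in `os.path.join`), a minimal sketch of what such a helper might look like is below; the body is an assumption, not the group's actual implementation.

```python
import os


def create_folder(folder_name):
    """Hypothetical stand-in for create_folder.create_folder.

    Creates `folder_name` (and any missing parent folders) if it does not
    already exist, and returns its absolute path.
    """
    path = os.path.abspath(folder_name)
    # exist_ok=True makes the call safe to repeat when a cell is re-run
    os.makedirs(path, exist_ok=True)
    return path
```

Because `os.makedirs(..., exist_ok=True)` does nothing when the folder already exists, re-running the folder creation cell would not raise an error under this sketch.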

In [None]:
# get the path to the main directory holding the data and metadata of the DataFrame
PROCESSED_ZIPPED_DIRECTORY = create_folder(folder_name="PROCESSED_ZIPPED")

# create an images folder and an analyses folder to hold all outputs
IMAGES_DIRECTORY = create_folder(folder_name="IMAGES")
ANALYSES_DIRECTORY = create_folder(folder_name="ANALYSES")

# get the folders for the fire incident, Toronto weather and fire station location data
FIRE_PROCESSED_ZIPPED_DIRECTORY = create_folder(folder_name=os.path.join(PROCESSED_ZIPPED_DIRECTORY, "FIRE_INCIDENTS"))
WEATHER_PROCESSED_ZIPPED_DIRECTORY = create_folder(folder_name=os.path.join(PROCESSED_ZIPPED_DIRECTORY, "TORONTO_WEATHER"))
STATIONS_PROCESSED_ZIPPED_DIRECTORY = create_folder(folder_name=os.path.join(PROCESSED_ZIPPED_DIRECTORY, "FIRE_STATIONS"))

# 3. Read in DataSet as a Pandas DataFrame
- DataSet is a compressed file (.csv.bz2)
- DataSet Metadata is a .csv file

In [None]:
# read the metadata from the .csv file into memory
# use this metadata to explain the columns
df_metadata = pd.read_csv(
    os.path.join(FIRE_PROCESSED_ZIPPED_DIRECTORY, "FINAL_DATASET_METADATA.csv"),
    index_col="COLUMN_NAME")

# display it
with pd.option_context('display.max_colwidth', 300):
    display(df_metadata)

In [None]:
# read the merged DataFrame from .csv.bz2 file into DataFrame
PATH_MERGED_CSV_BZ2 = os.path.join(FIRE_PROCESSED_ZIPPED_DIRECTORY, "FINAL_DATASET.csv.bz2")

# pandas DataFrame generated and is referenced by df variable name
df = pd.read_csv(PATH_MERGED_CSV_BZ2,
                 compression='bz2', index_col="INCIDENT_NUM", parse_dates=["DATETIME"])

# make the text columns categorical (smaller memory footprint and faster queries)
CATEGORICAL_COLUMNS = ["CAD_TYPE", "CAD_CALL_TYPE", "FINAL_TYPE", "CALL_SOURCE",
                       "NAME", "ADDRESS", "WARD_NAME", "MUN_NAME"]
for column in CATEGORICAL_COLUMNS:
    df[column] = pd.Categorical(df[column])

# display it
with pd.option_context('display.max_columns', None):
    display(df.head())

# 4. Toronto Fire Services (TFS) Incident Type Exploration
- What types of call make up the majority of TFS basic fire incidents?
    - there are 69 final incident types in total
    - the top 15 types account for ~91% of incidents and the top 10 for ~85.5%
    - fire is only the 11th most common type
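The "top N" percentages can also be read off a cumulative share of the sorted counts. The sketch below uses made-up counts (not the TFS data); `cumulative_share` and `smallest_n_covering` are hypothetical helper names.

```python
import pandas as pd


def cumulative_share(counts):
    """Running fraction of all observations covered by the top-N categories."""
    ordered = counts.sort_values(ascending=False)
    return ordered.cumsum() / ordered.sum()


def smallest_n_covering(counts, target):
    """Smallest number of top categories whose combined share is >= target."""
    reached = (cumulative_share(counts) >= target).to_numpy()
    if not reached.any():
        raise ValueError("target share is never reached")
    # argmax on a boolean array returns the position of the first True
    return int(reached.argmax()) + 1


# made-up counts for illustration only
counts = pd.Series({"A": 50, "B": 30, "C": 15, "D": 5})
print(cumulative_share(counts).round(2).tolist())  # [0.5, 0.8, 0.95, 1.0]
print(smallest_n_covering(counts, 0.9))            # 3
```

Given `df["FINAL_TYPE"].value_counts()` as input, `smallest_n_covering(..., 0.9)` would return how many incident types are needed to cover 90% of incidents.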

In [None]:
# get the total number of observations for each category in the "FINAL_TYPE" column
# (rename_axis + reset_index(name=...) gives the same column names in pandas 1.x and 2.x)
df_FINAL_TYPE = (df["FINAL_TYPE"].value_counts()
                                 .rename_axis("FINAL_TYPE")
                                 .reset_index(name="COUNT"))

# set the index to start from 1 and set the index name
df_FINAL_TYPE = df_FINAL_TYPE.set_index(np.arange(1, len(df_FINAL_TYPE)+1))
df_FINAL_TYPE.index.name = "INDEX"

# create "CODE" and "DESCRIPTION" columns by splitting each type on its first "-"
type_parts = df_FINAL_TYPE["FINAL_TYPE"].astype(str).str.split("-", n=1)
df_FINAL_TYPE["CODE"] = type_parts.str[0].str.strip().astype(int)
df_FINAL_TYPE["DESCRIPTION"] = type_parts.str[1].str.strip()

# keep only the CODE, DESCRIPTION and COUNT columns, in that order
df_FINAL_TYPE = df_FINAL_TYPE.loc[:, ["CODE", "DESCRIPTION", "COUNT"]]

# display all the rows
with pd.option_context("display.max_rows", None):
    display(df_FINAL_TYPE)

In [None]:
# total number of Basic Fire Incidents across all final incident types
total_incidents = df_FINAL_TYPE["COUNT"].sum()

# The top 15 Final Incident Types account for ~91% of all Basic Fire Incidents
print(f"Top 15 Final Incident Types account for: "
      f"{df_FINAL_TYPE['COUNT'].iloc[:15].sum() / total_incidents:.2%} "
      "of all Basic Fire Incidents.\n\n")

# And number 11 is Fire
display(df_FINAL_TYPE.loc[11])

# The top 10 Final Incident Types account for ~85.5% of all Basic Fire Incidents
# Toronto Fire Services might be wise to change their name to Toronto Emergency Services instead...
print(f"\n\nTop 10 Final Incident Types account for: "
      f"{df_FINAL_TYPE['COUNT'].iloc[:10].sum() / total_incidents:.2%} "
      "of all Basic Fire Incidents.\n\n")

In [None]:
# take the top 15 rows (the index starts at 1, so .loc[:15] is rows 1 to 15)
# and copy them so that new columns can be added without SettingWithCopy warnings
df_FINAL_TYPE_15 = df_FINAL_TYPE.loc[:15].copy()

# percentage of all calls received, for each type
# (formatting the ratio directly avoids the float error of flooring a rounded value)
total_calls = df_FINAL_TYPE["COUNT"].sum()
df_FINAL_TYPE_15["TOTAL_CALLS_RECEIVED_%"] = (df_FINAL_TYPE_15["COUNT"] / total_calls).map("{:.0%}".format)

# percentage of the top 15 calls received, for each type
top_15_calls = df_FINAL_TYPE_15["COUNT"].sum()
df_FINAL_TYPE_15["TOP_15_CALLS_RECEIVED_%"] = (df_FINAL_TYPE_15["COUNT"] / top_15_calls).map("{:.0%}".format)

display(df_FINAL_TYPE_15)

In [None]:
with warnings.catch_warnings():
    warnings.simplefilter("ignore") # suppress SettingWithCopy warnings from pandas
    # append a total count row to the bottom
    df_FINAL_TYPE_15.loc["Total"] = df_FINAL_TYPE_15[["COUNT"]].sum()

    # save the DataFrame to a .csv file
    df_FINAL_TYPE_15.to_csv(os.path.join(ANALYSES_DIRECTORY, "TFS_Final_Incident_Types.csv"))

    display(df_FINAL_TYPE_15)

In [None]:
# free up memory
del df_FINAL_TYPE_15, df_FINAL_TYPE

# 5. Toronto Fire Services Time Series Exploration
- the data has timestamps from January 2011 to December 2018
    - let's explore how many basic fire incidents the TFS responds to on a daily and monthly basis
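One detail worth keeping in mind for the daily and monthly counts: `resample(...).count()` creates a row for every period between the first and last timestamp, so days with no incidents appear with a count of 0 rather than being missing. `count()` only counts non-null values, which is why a column without nulls is counted. A small sketch with made-up timestamps:

```python
import pandas as pd

# three made-up incidents: two on Jan 1 and one on Jan 3 (none on Jan 2)
timestamps = pd.to_datetime(["2011-01-01 08:15", "2011-01-01 21:40", "2011-01-03 12:00"])
incidents = pd.DataFrame({"CAD_TYPE": ["A", "B", "A"]}, index=timestamps)

# one row per calendar day, counting the non-null CAD_TYPE values in each day
daily = incidents.resample("D").count().rename(columns={"CAD_TYPE": "COUNT"})
print(daily["COUNT"].tolist())  # [2, 0, 1]
```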

### Daily

In [None]:
# take a slice of the DataFrame; since we only need a count,
# any column with no nulls will do, so CAD_TYPE was chosen
# set the "DATETIME" column (the timestamps) as the index
df_DATETIME = df.loc[:, ["DATETIME", "CAD_TYPE"]].set_index("DATETIME")

# resample daily to get the daily count of TFS Basic Fire Incidents
df_DATETIME_DAILY = df_DATETIME.resample("D").count().rename(columns={"CAD_TYPE":"COUNT"})
df_DATETIME_DAILY

In [None]:
# visualize the resampled daily count time series with matplotlib
fig, axes = plt.subplots(1, 1, figsize=(7, 3.5))

# get y lim
y_lim1 = 0
y_lim2 = math.ceil(df_DATETIME_DAILY["COUNT"].max() / 1000) * 1000

# get x lim
x_lim1 = datetime.datetime(year=2010, month=10, day=1)
x_lim2 = datetime.datetime(year=2019, month=2, day=1)

# plot the time series
axes.plot(df_DATETIME_DAILY.index,
          df_DATETIME_DAILY["COUNT"],
          color="blue",
          linewidth=0.5,
          label="Daily Call Counts for TFS");

# set axis limits
axes.set_xlim(x_lim1, x_lim2)
axes.set_ylim(y_lim1, y_lim2)

# set minor ticks for x axis to be the months
months = mdates.MonthLocator()
axes.xaxis.set_minor_locator(months)

# set minor ticks for y axis to be values of 100
hundreds = tick.MultipleLocator(100)
axes.yaxis.set_minor_locator(hundreds)

# set title, x axis title and y axis title
axes.set_title("Daily Total Calls Received by the Toronto Fire Services (TFS)")
axes.set_xlabel("Years (2011 - 2018 range)")
axes.set_ylabel("Number of Daily Calls")

# plot a legend
plt.legend()

# magical padding
plt.tight_layout()

# save the figure
fig.savefig(os.path.join(IMAGES_DIRECTORY, "Daily_Total_Calls_TFS.png"))

In [None]:
# There appears to be a maximum (an outlier?) on December 22, 2013
df_DATETIME_DAILY.loc[df_DATETIME_DAILY["COUNT"] == df_DATETIME_DAILY["COUNT"].max()]

#### According to the National Post<sup>[7]</sup>, December 22, 2013 was the day of a severe ice storm that left 300,000 people in Toronto isolated
- this was an unprecedented event
- let's remove this day and look at the time series plot again
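The next cell removes only the single largest day. A more systematic alternative, not what this notebook does, is to flag days whose count lies far from the median in units of the median absolute deviation (MAD). The sketch below uses the modified z-score with a threshold of 3.5, a commonly used rule of thumb taken here as an assumption; `flag_outliers_mad` is a hypothetical name and the counts are made up.

```python
import pandas as pd


def flag_outliers_mad(counts, threshold=3.5):
    """Boolean mask of values whose modified z-score exceeds `threshold`.

    modified z = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation from the median.
    """
    median = counts.median()
    mad = (counts - median).abs().median()
    if mad == 0:
        # at least half the values equal the median, so the score is undefined
        return pd.Series(False, index=counts.index)
    modified_z = 0.6745 * (counts - median) / mad
    return modified_z.abs() > threshold


# made-up daily counts with one extreme day
daily = pd.Series([300, 310, 295, 305, 1200, 298, 302],
                  index=pd.date_range("2013-12-18", periods=7, freq="D"))
print(daily[flag_outliers_mad(daily)])  # only 2013-12-22 is flagged
```

Unlike the `== max()` comparison, this mask can flag more than one day, so the flagged days would need to be inspected before any are dropped.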

In [None]:
# use ~ NOT boolean operator to slice out the outlier
df_DATETIME_DAILY = df_DATETIME_DAILY.loc[~(df_DATETIME_DAILY["COUNT"] == df_DATETIME_DAILY["COUNT"].max())]

# visualize the resampled daily count time series with matplotlib
fig, axes = plt.subplots(1, 1, figsize=(7, 3.5))

# get y lim
y_lim1 = 0
y_lim2 = math.ceil(df_DATETIME_DAILY["COUNT"].max() / 1000) * 1000

# get x lim
x_lim1 = datetime.datetime(year=2010, month=10, day=1)
x_lim2 = datetime.datetime(year=2019, month=2, day=1)

# plot the time series
axes.plot(df_DATETIME_DAILY.index,
          df_DATETIME_DAILY["COUNT"],
          color="blue",
          linewidth=0.5,
          label="Daily Call Counts for TFS");

# set axis limits
axes.set_xlim(x_lim1, x_lim2)
axes.set_ylim(y_lim1, y_lim2)

# set minor ticks for x axis to be the months
months = mdates.MonthLocator()
axes.xaxis.set_minor_locator(months)

# set minor ticks for y axis to be values of 50
fifties = tick.MultipleLocator(50)
axes.yaxis.set_minor_locator(fifties)

# set title, x axis title and y axis title
axes.set_title("Daily Total Calls Received by the Toronto Fire Services (TFS)")
axes.set_xlabel("Years (2011 - 2018 range)")
axes.set_ylabel("Number of Daily Calls")

# plot a legend
plt.legend()

# magical padding
plt.tight_layout()

# save the figure
fig.savefig(os.path.join(IMAGES_DIRECTORY, "Daily_Total_Calls_TFS_REMOVED_12-22-2013.png"))

In [None]:
# let's get the autocorrelation plot to see if there is a relationship
# between successive days
fig, axes = plt.subplots(1, 1, figsize=(7, 3.5))

# get x lim (stop at 2 years, i.e. 730 days)
x_lim1 = 0
x_lim2 = 730

# get the y lim
y_lim1 = -1
y_lim2 = 1

# autocorrelation plot
pd.plotting.autocorrelation_plot(df_DATETIME_DAILY, ax=axes)

# set the x and y limits
axes.set_xlim(x_lim1, x_lim2)
axes.set_ylim(y_lim1, y_lim2)
axes.set_title("Daily Total Number of Calls Autocorrelation Plot")

# save the figure
fig.savefig(os.path.join(IMAGES_DIRECTORY, "Daily_Total_Calls_Autocorrelation.png"))

# Daily counts appear to be strongly autocorrelated for lags of up to about 300 days
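
For reference, `pd.plotting.autocorrelation_plot` computes the sample autocorrelation at each lag h as the lag-h autocovariance divided by the variance (both normalised by n) and draws horizontal reference lines at ±1.96/√n and ±2.576/√n, the 95% and 99% bands. The sketch below reproduces that calculation with NumPy on a made-up series with a weekly cycle, which can help when reading off how many lags lie outside the band; `autocorrelation` and `band` are hypothetical names.

```python
import numpy as np


def autocorrelation(values, lag):
    """Lag-`lag` sample autocorrelation, normalised by n like autocorrelation_plot."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    mean = x.mean()
    c0 = np.sum((x - mean) ** 2) / n
    return np.sum((x[: n - lag] - mean) * (x[lag:] - mean)) / n / c0


def band(n, z=1.959963984540054):
    """Half-width of the 95% reference band for a series of length n."""
    return z / np.sqrt(n)


# made-up series: a 7-day cycle plus noise, 52 weeks long
rng = np.random.default_rng(0)
days = np.arange(364)
series = 10 * np.sin(2 * np.pi * days / 7) + rng.normal(0, 1, days.size)

for lag in (1, 7, 14):
    r = autocorrelation(series, lag)
    print(lag, round(r, 2), abs(r) > band(days.size))
```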

In [None]:
# let's take the first order difference to see if we can introduce
# stationarity to the time series
df_DATETIME_DAILY_DIFF = df_DATETIME_DAILY.diff(1).dropna()

# let's get the autocorrelation plot to see if there is a relationship
# for the first order difference
fig, axes = plt.subplots(1, 1, figsize=(7, 3.5))

# get x lim (stop at 2 years, i.e. 730 days)
x_lim1 = 0
x_lim2 = 730

# get the y lim
y_lim1 = -1
y_lim2 = 1

# autocorrelation plot
pd.plotting.autocorrelation_plot(df_DATETIME_DAILY_DIFF, ax=axes)

# set the x and y limits
axes.set_xlim(x_lim1, x_lim2)
axes.set_ylim(y_lim1, y_lim2)
axes.set_title("Daily Total Number of Calls First Order Difference Autocorrelation Plot")

fig.savefig(os.path.join(IMAGES_DIRECTORY, "Daily_Total_Calls_First_Order_Difference_Autocorrelation.png"))

# After first order differencing the time series appears stationary; a statistical
# test (e.g. augmented Dickey-Fuller) is still needed to confirm this, but visually it is clear
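
Why first order differencing helps: if a series is a linear trend plus noise, $y_t = a + b\,t + \varepsilon_t$, then $y_t - y_{t-1} = b + \varepsilon_t - \varepsilon_{t-1}$, which no longer depends on $t$. The sketch below applies the same `.diff(1).dropna()` call to made-up data (not the TFS counts) and compares the means of the first and second halves before and after differencing. This is a crude check, not a formal stationarity test; `half_means` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
t = np.arange(1000)
# made-up series: upward linear trend (slope 0.5 per step) plus noise
y = pd.Series(100 + 0.5 * t + rng.normal(0, 5, t.size))


def half_means(series):
    """Mean of the first half and of the second half of a series."""
    half = len(series) // 2
    return series.iloc[:half].mean(), series.iloc[half:].mean()


diffed = y.diff(1).dropna()
print("original halves:", [round(m, 1) for m in half_means(y)])
print("differenced halves:", [round(m, 2) for m in half_means(diffed)])
print("mean of differences (close to the slope 0.5):", round(diffed.mean(), 2))
```

The two halves of the original series have very different means because of the trend, while the differenced halves both sit near the slope.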

### Monthly

In [None]:
# resample monthly to get the monthly count of TFS Basic Fire Incidents
df_DATETIME_MONTHLY = df_DATETIME.resample("M").count().rename(columns={"CAD_TYPE":"COUNT"})
df_DATETIME_MONTHLY

In [None]:
# visualize the resampled monthly count time series with matplotlib
fig, axes = plt.subplots(1, 1, figsize=(7, 3.5))

# get y lim
y_lim1 = 0
y_lim2 = math.ceil(df_DATETIME_MONTHLY["COUNT"].max() / 1000) * 1000

# get x lim
x_lim1 = datetime.datetime(year=2010, month=10, day=1)
x_lim2 = datetime.datetime(year=2019, month=2, day=1)

# plot the time series
axes.plot(df_DATETIME_MONTHLY.index,
          df_DATETIME_MONTHLY["COUNT"],
          color="blue",
          linewidth=0.5,
          label="Monthly Call Counts for TFS");

# set axis limits
axes.set_xlim(x_lim1, x_lim2)
axes.set_ylim(y_lim1, y_lim2)

# set minor ticks for x axis to be the months
months = mdates.MonthLocator()
axes.xaxis.set_minor_locator(months)

# set minor ticks for y axis to be values of 100
hundreds = tick.MultipleLocator(100)
axes.yaxis.set_minor_locator(hundreds)

# set title, x axis title and y axis title
axes.set_title("Monthly Total Calls Received by the Toronto Fire Services (TFS)")
axes.set_xlabel("Years (2011 - 2018 range)")
axes.set_ylabel("Number of Monthly Calls")

# plot a legend
plt.legend()

# magical padding
plt.tight_layout()

# save the figure
fig.savefig(os.path.join(IMAGES_DIRECTORY, "Monthly_Total_Calls_TFS.png"))

# The spike on Dec. 22, 2013 is kept, as it does not stand out as an outlier
# in the monthly resampled data

In [None]:
# let's get the autocorrelation plot to see if there is a relationship
# between successive months
fig, axes = plt.subplots(1, 1, figsize=(7, 3.5))

# get the y lim
y_lim1 = -1
y_lim2 = 1

# autocorrelation plot
pd.plotting.autocorrelation_plot(df_DATETIME_MONTHLY, ax=axes)

# set the x and y limits
axes.set_ylim(y_lim1, y_lim2)
axes.set_title("Monthly Total Number of Calls Autocorrelation Plot")

# save the figure
fig.savefig(os.path.join(IMAGES_DIRECTORY, "Monthly_Total_Calls_Autocorrelation.png"))

# autocorrelation is present for the first 7 months

In [None]:
# let's take the first order difference to see if we can introduce
# stationarity to the time series
df_DATETIME_MONTHLY_DIFF = df_DATETIME_MONTHLY.diff(1).dropna()

# let's get the autocorrelation plot to see if there is a relationship
# for the first order difference
fig, axes = plt.subplots(1, 1, figsize=(7, 3.5))

# get the y lim
y_lim1 = -1
y_lim2 = 1

# autocorrelation plot
pd.plotting.autocorrelation_plot(df_DATETIME_MONTHLY_DIFF, ax=axes)

# set the y limits
axes.set_ylim(y_lim1, y_lim2)
axes.set_title("Monthly Total Number of Calls First Order Difference Autocorrelation Plot")

fig.savefig(os.path.join(IMAGES_DIRECTORY, "Monthly_Total_Calls_First_Order_Difference_Autocorrelation.png"))

# The first order difference shows no significant autocorrelation and appears to be stationary

In [None]:
# delete the dataframes from memory
del df_DATETIME, df_DATETIME_DAILY, df_DATETIME_DAILY_DIFF, df_DATETIME_MONTHLY, df_DATETIME_MONTHLY_DIFF

# 6. The Effect of Toronto Weather / Climate on Toronto Fire Services Basic Incidents
- What role do Toronto's weather and climate play in TFS basic incidents?
    - Weather and Climate (Temperature, Rain, Precipitation, Snow levels)
    - Building Energy Demands (Heating Degree Days (HDD)<sup>[8]</sup> and Cooling Degree Days (CDD))
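As a point of reference, degree days are computed from the daily mean temperature relative to a base temperature. The sketch below assumes the 18 °C base used by Environment and Climate Change Canada in its historical climate data; whether the `HDD` and `CDD` columns in this dataset were derived exactly this way is an assumption.

$$\mathrm{HDD} = \max(0,\ 18 - T_{\text{mean}}), \qquad \mathrm{CDD} = \max(0,\ T_{\text{mean}} - 18)$$

```python
import numpy as np

BASE_TEMP_C = 18.0  # assumed base temperature in Celsius


def degree_days(mean_temp_c, base=BASE_TEMP_C):
    """Return (HDD, CDD) arrays for daily mean temperatures in Celsius."""
    t = np.asarray(mean_temp_c, dtype=float)
    hdd = np.clip(base - t, 0, None)  # degrees below the base, else 0
    cdd = np.clip(t - base, 0, None)  # degrees above the base, else 0
    return hdd, cdd


hdd, cdd = degree_days([-10.0, 18.0, 25.5])
print(hdd.tolist(), cdd.tolist())  # [28.0, 0.0, 0.0] [0.0, 0.0, 7.5]
```

On any given day at most one of the two values is non-zero, which is why most incidents fall in the lowest bin of at least one of the two histograms.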

In [None]:
# get the top 10 "FINAL_TYPE" categories (value_counts sorts in descending order)
top_10 = list(df["FINAL_TYPE"].value_counts().index[:10])

# generate groupby aggregate statistics of the mean temperature for each of them
# (observed=True skips the unused categories of the categorical column)
df_TEMP_GROUPBY = (df.loc[df["FINAL_TYPE"].isin(top_10), ["MEAN_TEMP", "FINAL_TYPE"]]
                     .groupby("FINAL_TYPE", observed=True)
                     .agg(["mean", "std"])
                     .dropna()
                     .reset_index())

# rename columns
df_TEMP_GROUPBY.columns = ["FINAL_TYPE", "MEAN_TEMP", "STD_TEMP"]

# set the index
df_TEMP_GROUPBY.index = df_TEMP_GROUPBY["FINAL_TYPE"].apply(lambda x: x.split("-")[0].strip())
df_TEMP_GROUPBY.index.name = "CODE"

# save the DataFrame to a csv
df_TEMP_GROUPBY.to_csv(os.path.join(ANALYSES_DIRECTORY, "Top_10_Calls_and_Temperature.csv"))

df_TEMP_GROUPBY

In [None]:
# bar plot of the mean temperature (with standard deviation error bars) for the top 10 calls
fig, axes = plt.subplots(1, 1, figsize=(5, 3.5))

# get y lim
y_lim1 = 0
y_lim2 = 25

# plot the mean temperatures with standard deviation error bars
axes.bar(x=df_TEMP_GROUPBY.index,
         height=df_TEMP_GROUPBY["MEAN_TEMP"],
         yerr=df_TEMP_GROUPBY["STD_TEMP"],
         color="blue",
         ecolor="red")

# set axis limits
axes.set_ylim(y_lim1, y_lim2)

# set title, x axis title and y axis title
axes.set_title("Mean Temperature of the Top 10 TFS Fire Incident Calls")
axes.set_xlabel("Codes of Fire Incident Calls")
axes.set_ylabel("Mean Temperature (Celsius)")

# magical padding
plt.tight_layout()

# save the figure
fig.savefig(os.path.join(IMAGES_DIRECTORY, "Top_10_Calls_and_Temperature.png"))

# The large standard deviations show that the temperatures vary widely for every incident type
# No relationship can be gleaned

In [None]:
# A series of histograms of mean temperature for the Top 10 TFS Fire Incident Calls
for x in top_10:
    fig, axes = plt.subplots(1, 1, figsize=(3.5, 3.5))
    # ignore incidents with a missing mean temperature
    axes.hist(df.loc[df["FINAL_TYPE"] == x, "MEAN_TEMP"].dropna())
    # insert a line break after every 36 characters so long titles fit the figure
    axes.set_title(re.sub(r"(.{36})", r"\1\n", x, flags=re.DOTALL))
    axes.set_xlabel("Temperature (Celsius)")
    axes.set_ylabel("Frequency")
    # save the figure without any punctuation or whitespace in the file name
    fig.savefig(os.path.join(
        IMAGES_DIRECTORY,
        f"""{x.translate(str.maketrans('', '', string.punctuation)).replace(" ", "")}.png"""));

In [None]:
# Frequency of incidents within specific mean temperature ranges

# get the maximum and minimum of the column that is binned (MEAN_TEMP)
maximum_temp = df["MEAN_TEMP"].max()
minimum_temp = df["MEAN_TEMP"].min()

# round outward to multiples of the bin width so every temperature falls in a bin
step = 20
max_ceil = math.ceil(maximum_temp/step)*step
min_floor = math.floor(minimum_temp/step)*step

# generate cutting locations and labels
temp_range = list(np.arange(min_floor, max_ceil+1, step))
labels = [f"{value} to {temp_range[index+1]}" for (index, value) in enumerate(temp_range[:-1])]

# create column with bins (include_lowest keeps the minimum value in the first bin)
df["MEAN_TEMP_BINS"] = pd.cut(df["MEAN_TEMP"],
                              bins=temp_range,
                              labels=labels,
                              include_lowest=True)

# frequency of incidents corresponding to temperatures
df_MEAN_INCIDENTS = (df[["CAD_TYPE", "MEAN_TEMP_BINS"]]
                     .groupby("MEAN_TEMP_BINS")
                     .count()
                     .rename(columns={"CAD_TYPE":"Number of Incidents"}))
df_MEAN_INCIDENTS.index.name = "Mean Temperature (Celsius) Bins"

# save the dataframe
df_MEAN_INCIDENTS.to_csv(os.path.join(ANALYSES_DIRECTORY, "Temperature_Bins_and_TFS_Incidents.csv"))

df_MEAN_INCIDENTS

In [None]:
# delete the variables and drop the Temperature Bin
del df_TEMP_GROUPBY
df.drop(columns="MEAN_TEMP_BINS", inplace=True)

### Temperature varies widely among the top 10 incident types
### The frequency distribution of incidents peaks in the 0 to 20 °C mean temperature range
### In conclusion, I do not believe temperature plays an important role in predicting the type of incident TFS responds to
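The temperature, HDD/CDD and rain/snow cells all build bin edges and "low to high" labels in the same way. A reusable helper could look like the sketch below; `make_bins` is a hypothetical name and it assumes an integer bin width, as in this notebook. It rounds the edges outward to multiples of the width and, because it targets left-closed bins (`right=False`, as in the HDD/CDD and rain/snow cells), makes the last edge strictly greater than the maximum so that the largest value is never dropped.

```python
import math

import pandas as pd


def make_bins(values, step):
    """Bin edges and 'low to high' labels covering every value in `values`.

    Assumes an integer `step`. Edges are multiples of `step`, and the last
    edge is strictly greater than the maximum so left-closed bins include it.
    """
    low = math.floor(values.min() / step) * step
    high = (math.floor(values.max() / step) + 1) * step
    edges = list(range(low, high + step, step))
    labels = [f"{lo} to {hi}" for lo, hi in zip(edges[:-1], edges[1:])]
    return edges, labels


rain_mm = pd.Series([0.0, 0.0, 2.4, 7.9, 10.0])  # made-up daily rainfall
edges, labels = make_bins(rain_mm, step=5)
binned = pd.cut(rain_mm, bins=edges, right=False, labels=labels)
print(binned.value_counts(sort=False).to_dict())
```

With the notebook's own rounding to multiples of 10, a maximum of exactly 10.0 mm would fall on the last edge and be left out of a `right=False` bin; the helper avoids that case.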

## Heating Degree Day (HDD) and Cooling Degree Day (CDD)

In [None]:
# bin the HDD and CDD data

# get the max and min for HDD and CDD
max_HDD = df["HDD"].max()
min_HDD = df["HDD"].min()
max_CDD = df["CDD"].max()
min_CDD = df["CDD"].min()

# get ceiling and floor
max_ceil_HDD = math.ceil(max_HDD/10)*10
min_floor_HDD = math.floor(min_HDD/10)*10
max_ceil_CDD = math.ceil(max_CDD/10)*10
min_floor_CDD = math.floor(min_CDD/10)*10

# generate HDD and CDD range
step_HDD=5
step_CDD=2
HDD_range = list(np.arange(min_floor_HDD, max_ceil_HDD+1, step_HDD))
CDD_range = list(np.arange(min_floor_CDD, max_ceil_CDD+1, step_CDD))

# get the HDD and CDD labels
labels_HDD = [f"{value} to {HDD_range[index+1]}" for (index, value) in enumerate(HDD_range[:-1])]
labels_CDD = [f"{value} to {CDD_range[index+1]}" for (index, value) in enumerate(CDD_range[:-1])]

# set the bin labels to the dataframe
df["HDD_BINS"] = pd.cut(df["HDD"], bins= HDD_range, right = False, labels= labels_HDD)
df["CDD_BINS"] = pd.cut(df["CDD"], bins= CDD_range, right = False, labels= labels_CDD)

df.head()

In [None]:
# GroupBy count for HDD and CDD and save to csv
df_HDD_GROUPBY = df[["HDD_BINS", "CAD_TYPE"]].groupby("HDD_BINS").count().rename(columns={"CAD_TYPE":"COUNT"})
df_CDD_GROUPBY = df[["CDD_BINS", "CAD_TYPE"]].groupby("CDD_BINS").count().rename(columns={"CAD_TYPE":"COUNT"})

# save to csv
df_HDD_GROUPBY.to_csv(os.path.join(ANALYSES_DIRECTORY, "TFS_Fire_Incidents_HDD.csv"))
df_CDD_GROUPBY.to_csv(os.path.join(ANALYSES_DIRECTORY, "TFS_Fire_Incidents_CDD.csv"))

display(df_HDD_GROUPBY)
display(df_CDD_GROUPBY)

In [None]:
# Here is a histogram of HDD
fig, axes = plt.subplots(1, 1, figsize=(5, 3.5))

# plot the histogram
axes.hist("HDD", HDD_range, data=df)

# set title, x axis title and y axis title
axes.set_title("Frequency of Fire Incident Calls for given HDD")
axes.set_xlabel("HDD Bins (degree days, Celsius)")
axes.set_ylabel("Frequency")

# magical padding
plt.tight_layout()

# save the figure
fig.savefig(os.path.join(IMAGES_DIRECTORY, "TFS_Fire_Incidents_HDD.png"))

# The majority of incidents occur on days with minimal to no heating requirements

In [None]:
# Here is a histogram of CDD
fig, axes = plt.subplots(1, 1, figsize=(5, 3.5))

# plot the histogram
axes.hist("CDD", CDD_range, data=df)

# set title, x axis title and y axis title
axes.set_title("Frequency of Fire Incident Calls for given CDD")
axes.set_xlabel("CDD Bins (degree days, Celsius)")
axes.set_ylabel("Frequency")

# magical padding
plt.tight_layout()

# save the figure
fig.savefig(os.path.join(IMAGES_DIRECTORY, "TFS_Fire_Incidents_CDD.png"))

# The majority of incidents occur on days with minimal to no cooling requirements

In [None]:
# cross-tabulate incidents by month and HDD bin to check whether HDD has an effect
df_HDD_CROSSTAB = pd.crosstab(df["DATETIME"].dt.month, df["HDD_BINS"], df["DATETIME"].dt.month, aggfunc="count")
df_HDD_CROSSTAB.index.name = "MONTH"

# write to csv
df_HDD_CROSSTAB.to_csv(os.path.join(ANALYSES_DIRECTORY, "HDD_Month_Crosstab.csv"))

df_HDD_CROSSTAB

In [None]:
# HDD Crosstab
fig, axes = plt.subplots(1, 1, figsize=(7, 3.5))

# plot the HDD Crosstab
df_HDD_CROSSTAB.plot(kind="bar", stacked=True, rot=0, ax=axes)
axes.set_ylabel("Count")
axes.set_xlabel("Month")
axes.set_title("HDD Bins and TFS Incident Numbers per Month")
axes.legend(title="HDD Values",bbox_to_anchor=(1, 1))
plt.tight_layout()
# save the figure
fig.savefig(os.path.join(IMAGES_DIRECTORY, "HDD_Month_Crosstab.png"))

In [None]:
# cross-tabulate incidents by month and CDD bin to check whether CDD has an effect
df_CDD_CROSSTAB = pd.crosstab(df["DATETIME"].dt.month, df["CDD_BINS"], df["DATETIME"].dt.month, aggfunc="count")
df_CDD_CROSSTAB.index.name = "MONTH"

# write to csv
df_CDD_CROSSTAB.to_csv(os.path.join(ANALYSES_DIRECTORY, "CDD_Month_Crosstab.csv"))

df_CDD_CROSSTAB

In [None]:
# CDD Crosstab
fig, axes = plt.subplots(1, 1, figsize=(7, 3.5))

# plot the CDD Crosstab
df_CDD_CROSSTAB.plot(kind="bar", stacked=True, rot=0, ax=axes)
axes.set_ylabel("Count")
axes.set_xlabel("Month")
axes.set_title("CDD Bins and TFS Incident Numbers per Month")
axes.legend(title="CDD Values",bbox_to_anchor=(1, 1))
plt.tight_layout()
# save the figure
fig.savefig(os.path.join(IMAGES_DIRECTORY, "CDD_Month_Crosstab.png"))

In [None]:
# delete and drop variables / columns
del df_CDD_CROSSTAB, df_HDD_CROSSTAB, df_HDD_GROUPBY, df_CDD_GROUPBY
df.drop(columns=["HDD_BINS", "CDD_BINS"], inplace=True)

### There is no discernible trend showing a relationship between HDD or CDD and the number of TFS emergency calls over the months

## Rain and Snow

In [None]:
# bin the rain and snow data

# get the max and min for rain and snow
max_rain = df["RAIN_MM"].max()
min_rain = df["RAIN_MM"].min()
max_snow = df["SNOW_CM"].max()
min_snow = df["SNOW_CM"].min()

# get ceiling and floor
max_ceil_rain = math.ceil(max_rain/10)*10
min_floor_rain = math.floor(min_rain/10)*10
max_ceil_snow = math.ceil(max_snow/10)*10
min_floor_snow = math.floor(min_snow/10)*10

# generate rain and snow ranges
step_rain=5
step_snow=5
rain_range = list(np.arange(min_floor_rain, max_ceil_rain+1, step_rain))
snow_range = list(np.arange(min_floor_snow, max_ceil_snow+1, step_snow))

# get the rain and snow labels
labels_rain = [f"{value} to {rain_range[index+1]}" for (index, value) in enumerate(rain_range[:-1])]
labels_snow = [f"{value} to {snow_range[index+1]}" for (index, value) in enumerate(snow_range[:-1])]

# set the bin labels to the dataframe
df["RAIN_BINS"] = pd.cut(df["RAIN_MM"], bins= rain_range, right = False, labels= labels_rain)
df["SNOW_BINS"] = pd.cut(df["SNOW_CM"], bins= snow_range, right = False, labels= labels_snow)

df.head()

In [None]:
# GroupBy count for rain and snow and save to csv
df_RAIN_GROUPBY = df[["RAIN_BINS", "CAD_TYPE"]].groupby("RAIN_BINS").count().rename(columns={"CAD_TYPE":"COUNT"})
df_SNOW_GROUPBY = df[["SNOW_BINS", "CAD_TYPE"]].groupby("SNOW_BINS").count().rename(columns={"CAD_TYPE":"COUNT"})

# save to csv
df_RAIN_GROUPBY.to_csv(os.path.join(ANALYSES_DIRECTORY, "TFS_Fire_Incidents_RAIN.csv"))
df_SNOW_GROUPBY.to_csv(os.path.join(ANALYSES_DIRECTORY, "TFS_Fire_Incidents_SNOW.csv"))

display(df_RAIN_GROUPBY)
display(df_SNOW_GROUPBY)

In [None]:
# Here is a histogram of rain
fig, axes = plt.subplots(1, 1, figsize=(5, 3.5))

# plot the histogram
axes.hist("RAIN_MM", rain_range, data=df)

# set title, x axis title and y axis title
axes.set_title("Frequency of Fire Incident Calls for given mm of Rain")
axes.set_xlabel("Rain Bins (mm)")
axes.set_ylabel("Frequency")

# magical padding
plt.tight_layout()

# save the figure
fig.savefig(os.path.join(IMAGES_DIRECTORY, "TFS_Fire_Incidents_Rain.png"))

# A good majority occur on days with no rain

In [None]:
# Here is a histogram of snow
fig, axes = plt.subplots(1, 1, figsize=(5, 3.5))

# plot the histogram
axes.hist("SNOW_CM", snow_range, data=df)

# set title, x axis title and y axis title
axes.set_title("Frequency of Fire Incident Calls for given cm of Snow")
axes.set_xlabel("Snow Bins (cm)")
axes.set_ylabel("Frequency")

# magical padding
plt.tight_layout()

# save the figure
fig.savefig(os.path.join(IMAGES_DIRECTORY, "TFS_Fire_Incidents_Snow.png"))

# A good majority occur on days with no snow on the ground

In [None]:
# check data by month of year to see if rain or snow have an effect
df_RAIN_CROSSTAB = pd.crosstab(df["DATETIME"].dt.month, df["RAIN_BINS"], df["DATETIME"].dt.month, aggfunc="count")
df_RAIN_CROSSTAB.index.name = "MONTH"

# write to csv
df_RAIN_CROSSTAB.to_csv(os.path.join(ANALYSES_DIRECTORY, "RAIN_Month_Crosstab.csv"))

df_RAIN_CROSSTAB

In [None]:
# Rain Crosstab
fig, axes = plt.subplots(1, 1, figsize=(7, 3.5))

# plot the Rain Crosstab
df_RAIN_CROSSTAB.plot(kind="bar", stacked=True, rot=0, ax=axes)
axes.set_ylabel("Count")
axes.set_xlabel("Month")
axes.set_title("Rain Bins and TFS Incident Numbers per Month")
axes.legend(title="Rain Values",bbox_to_anchor=(1, 1))
plt.tight_layout()
# save the figure
fig.savefig(os.path.join(IMAGES_DIRECTORY, "Rain_Month_Crosstab.png"))

In [None]:
# check data by month of year to see if rain or snow have an effect
df_SNOW_CROSSTAB = pd.crosstab(df["DATETIME"].dt.month, df["SNOW_BINS"], df["DATETIME"].dt.month, aggfunc="count")
df_SNOW_CROSSTAB.index.name = "MONTH"

# write to csv
df_SNOW_CROSSTAB.to_csv(os.path.join(ANALYSES_DIRECTORY, "SNOW_Month_Crosstab.csv"))

df_SNOW_CROSSTAB

In [None]:
# Snow Crosstab
fig, axes = plt.subplots(1, 1, figsize=(7, 3.5))

# plot the Snow Crosstab
df_SNOW_CROSSTAB.plot(kind="bar", stacked=True, rot=0, ax=axes)
axes.set_ylabel("Count")
axes.set_xlabel("Month")
axes.set_title("Snow Bins and TFS Incident Numbers per Month")
axes.legend(title="Snow Values",bbox_to_anchor=(1, 1))
plt.tight_layout()
# save the figure
fig.savefig(os.path.join(IMAGES_DIRECTORY, "Snow_Month_Crosstab.png"))

### Snow and rain appear to have no real impact on the number of TFS Fire Incidents

# 7. The Effect of Time and Day of the Week on TFS Fire Incidents
- do specific days of the week or times of the day affect the number of TFS Fire Incidents?
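This section does not contain an analysis yet. One way it could be started is to cross-tabulate incident counts by day of the week and hour of the day from the `DATETIME` column. The sketch below does this on a few made-up timestamps standing in for `df["DATETIME"]`; the `DAY`/`HOUR` labels are illustrative choices.

```python
import pandas as pd

# made-up incident timestamps standing in for df["DATETIME"]
timestamps = pd.Series(pd.to_datetime([
    "2018-07-02 08:10",  # Monday
    "2018-07-02 08:45",  # Monday
    "2018-07-06 22:30",  # Friday
    "2018-07-07 14:05",  # Saturday
]))

day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# rows: day of week, columns: hour of day, values: number of incidents
by_day_hour = pd.crosstab(timestamps.dt.day_name(), timestamps.dt.hour,
                          rownames=["DAY"], colnames=["HOUR"])
# add rows for days without incidents and put the days in calendar order
by_day_hour = by_day_hour.reindex(day_order, fill_value=0)
print(by_day_hour)
```

Row sums (`by_day_hour.sum(axis=1)`) would give incidents per weekday and column sums incidents per hour.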

# 8. Jupyter Notebook References

[1] "Python Documentation."  *Python Software Foundation*.  [Online](https://docs.python.org/).  [Accessed August 04, 2020]

[2] G. Niemeyer.  "dateutil - powerful extensions to datetime."  *dateutil*.  [Online](https://github.com/dateutil/dateutil).  [Accessed August 04, 2020]

[3] "pandas."  *PyData*.  [Online](https://pandas.pydata.org/).  [Accessed August 04, 2020]

[4] "NumPy - The fundamental package for scientific computing with Python."  *NumPy*.  [Online](https://numpy.org/).  [Accessed August 04, 2020]

[5] "Matplotlib:  Visualization with Python."  *The Matplotlib Development team*.  [Online](https://matplotlib.org/).  [Accessed August 04, 2020]

[6] M. Waskom.  "seaborn:  statistical data visualization."  *seaborn*.  [Online](https://seaborn.pydata.org/).  [Accessed August 04, 2020]

[7] "Toronto ice storm 2013: Photos show city looking like crime scene with taped-off downed branches."  *National Post*. December 23, 2013. [Online](https://nationalpost.com/news/canada/toronto-ice-storm-2013-photos-from-the-gtas-winter-nightmare).  [Accessed August 04, 2020]

[8] "Heating degree day." *Wikipedia*.  [Online](https://en.wikipedia.org/wiki/Heating_degree_day).  [Accessed August 04, 2020]

[9] "City of Toronto Open Data Portal."  *City of Toronto*.  [Online](https://open.toronto.ca/).  [Accessed August 04, 2020]

[10] "Fire Services Basic Incident Details."  *City of Toronto*.  [Online](https://open.toronto.ca/dataset/fire-services-basic-incident-details/).  [Accessed August 04, 2020]

[11] "Historical Climate Data."  *Government of Canada*.  [Online](https://climate.weather.gc.ca/).  [Accessed August 04, 2020]

[12] "URL based procedure to automatically download data in bulk from Climate Website."  *Government of Canada*.  [Online].  *ftp://client_climate@ftp.tor.ec.gc.ca/Pub/Get_More_Data_Plus_de_donnees/Readme.txt*.  [Accessed August 04, 2020]

[13] "Fire Station Locations."  *City of Toronto*.  [Online](https://open.toronto.ca/dataset/fire-station-locations/).  [Accessed August 04, 2020]