## Motivation

- ### What is your dataset?
    The primary dataset used in the analysis is the "Fire Incidents" dataset, which is available on the San Francisco Open Data Portal. It contains detailed information about fire incidents that have occurred in San Francisco from 2003 to the present, including the location, date and time of each incident, the type of property affected, and details such as the cause and the level of damage. To keep the data manageable to visualize, we restricted the analysis to incidents from 2010 through 2022.

    Another dataset we used is "San Francisco Planning Neighborhood Groups", which describes the neighborhood groups of San Francisco; we use its GeoJSON boundaries on the map. In addition, we used the "City-owned Facilities - Fire and Police" dataset, which contains information such as the name, postal code, address, and facility ID of police and fire stations.

- ### Why did you choose this/these particular dataset(s)?

    Fires are a common occurrence worldwide, and they can cause significant damage to property and loss of life. It is therefore important to study and analyze fire incidents to understand their causes and how to prevent them in the future.

- ### What was your goal for the end user's experience?

    Our goal for the end user's experience is to provide insight into the patterns and trends of fire incidents in San Francisco: how often fires occur and what causes them, where and when they happen, and how effective the fire response and prevention measures are.





## Basic stats. Let's understand the dataset better

To analyze our dataset, we used a variety of Python libraries for data analysis and visualization, including Bokeh, NumPy, ipyleaflet, Plotly, seaborn, folium, and others. These libraries provide tools for manipulating and analyzing datasets and for creating maps, graphs, and other visualizations.

In [None]:
import os
import json
import random
from datetime import datetime

import numpy as np
import pandas as pd
import requests
import matplotlib.pyplot as plt
import seaborn as sns
import mplcursors
import calplot
import mpld3
import folium
from folium.plugins import HeatMap
import plotly.graph_objs as go
import plotly.offline as pyo
from bokeh.models import ColumnDataSource, Legend
from bokeh.io import output_notebook, output_file, show
from bokeh.palettes import Category10
from bokeh.plotting import figure
from ipyleaflet import Map, GeoJSON, Marker, AwesomeIcon, FullScreenControl


# Part 1

### Write a short section that discusses the dataset stats, containing key points/plots from your exploratory data analysis.

## Write about your choices in data cleaning and preprocessing

To conduct a meaningful analysis and obtain valuable insights, we first cleaned and preprocessed the data.
The code first checks whether the dataset is available locally. If it is not, it downloads the dataset from the remote API and saves it locally for further analysis. It then converts the date columns to datetime format, extracts the month and year from the incident date, and filters the dataset to include only the rows that belong to the city of San Francisco and fall between 2010 and 2022.
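One caveat about the download step: Socrata (SODA) endpoints such as the one used here return only 1,000 rows per request unless the `$limit` parameter is set, so a complete download has to be paged with `$limit` and `$offset`. The helper below is a minimal sketch that only builds the paged request URLs and performs no network calls; the function name `paged_urls`, the page size, and ordering by the `:id` system field are assumptions chosen for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.sfgov.org/resource/wr8u-xric.json"


def paged_urls(total_rows, page_size=50000, base_url=BASE_URL):
    """Build hypothetical SODA request URLs that together cover `total_rows` rows.

    Each URL requests one page via $limit/$offset; ordering by the
    system field :id is assumed here to keep pages stable between requests.
    """
    urls = []
    for offset in range(0, total_rows, page_size):
        query = urlencode({"$limit": page_size, "$offset": offset, "$order": ":id"})
        urls.append(f"{base_url}?{query}")
    return urls


for url in paged_urls(120000):
    print(url)
```

Each URL could then be fetched with `requests.get(url).json()` and the pages combined with `pd.concat`.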

In addition, the code drops the columns that are not needed for the analysis and creates a new column, 'focuse_Situation_by_number', containing the first three characters of the 'Primary Situation' column (its numeric situation code). Finally, it drops two erroneous rows (IDs 140383810 and 140660390) from the dataset.
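The filtering logic described above can be tried on a small frame. The sketch below uses invented rows (the IDs, dates, cities and situations are made up for illustration, not taken from the dataset) with the notebook's column names: it keeps 2010 through 2022, keeps the three San Francisco spellings, and extracts the three-character situation code.

```python
import pandas as pd

# Invented example rows using the notebook's column names
toy = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Incident Date": ["2009-05-01", "2015-07-04", "2022-12-24", "2021-03-15"],
    "City": ["SF", "San Francisco", "SAN FRANCISCO", "Daly City"],
    "Primary Situation": ["111 Building fire",
                          "700 False alarm or false call, other",
                          "113 Cooking fire, confined to container",
                          "150 Outside rubbish fire"],
})

toy["Incident Date"] = pd.to_datetime(toy["Incident Date"])
toy["Incident year"] = toy["Incident Date"].dt.year

# Keep 2010 through 2022 (between() is inclusive) and only the San Francisco spellings
toy = toy[toy["Incident year"].between(2010, 2022)]
toy = toy[toy["City"].isin(["SF", "San Francisco", "SAN FRANCISCO"])].copy()

# The first three characters of 'Primary Situation' are the situation code
toy["focuse_Situation_by_number"] = toy["Primary Situation"].str[:3]
print(toy[["ID", "Incident year", "focuse_Situation_by_number"]])
```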


In [None]:
##
#
# Author: 
# Salim Omar
#
##

# cleaning and preprocessing

csv_path = "../../Fire_Incidents.csv"

if not os.path.exists(csv_path):
    # Download the data from the API if it is not available locally.
    # Note: the SODA endpoint returns only 1,000 rows by default (the "$limit"
    # parameter requests more), and its JSON field names may differ from the
    # column headers of the full CSV export that the code below expects.
    url = "https://data.sfgov.org/resource/wr8u-xric.json"
    response = requests.get(url)
    response.raise_for_status()
    df = pd.DataFrame(response.json())
    df.to_csv(csv_path, index=False)
    print("The data has been read from", url)
else:
    # Load the CSV from the local file
    df = pd.read_csv(csv_path)
    print("The data was found locally")

# df.head()

# Convert the date/time columns to datetime
for col in ['Incident Date', 'Alarm DtTm', 'Arrival DtTm', 'Close DtTm']:
    df[col] = pd.to_datetime(df[col])

# Derive the time, month and year from the incident date
df['Incident Time'] = df['Incident Date'].dt.time
df['Incident month'] = df['Incident Date'].dt.month
df['Incident year'] = df['Incident Date'].dt.year

# Keep only incidents from 2010 through 2022
df = df[(df['Incident year'] >= 2010) &
        (df['Incident year'] <= 2022)]

# Print the number of rows
num_rows = df.shape[0]
print("The number of rows is:", num_rows)

# Keep only San Francisco rows (the City column uses several spellings);
# .copy() avoids SettingWithCopyWarning when columns are dropped/added below
df = df[df['City'].isin(['SF', 'San Francisco', 'SAN FRANCISCO'])].copy()

# Drop the columns that are not needed for this analysis
df.drop(columns=['Exposure Number',
                'Box',
                'Fire Fatalities',
                'Fire Injuries',
                'Civilian Fatalities',
                'Civilian Injuries',
                'Number of Alarms',
                'Mutual Aid',
                'Action Taken Secondary',
                'Action Taken Other',
                'Area of Fire Origin',
                'Ignition Cause',
                'Ignition Factor Primary',
                'Ignition Factor Secondary',
                'Item First Ignited',
                'Human Factors Associated with Ignition',
                'Structure Type',
                'Structure Status',
                'Floor of Fire Origin',
                'Fire Spread',
                'No Flame Spead',
                'Number of floors with minimum damage',
                'Number of floors with significant damage',
                'Number of floors with heavy damage',
                'Number of floors with extreme damage',
                'Detectors Present',
                'Detector Type',
                'Detector Operation',
                'Detector Effectiveness',
                'Detector Failure Reason',
                'Automatic Extinguishing System Present',
                'Automatic Extinguishing Sytem Type',
                'Automatic Extinguishing Sytem Perfomance',
                'Automatic Extinguishing Sytem Failure Reason',
                'Number of Sprinkler Heads Operating'
                ], inplace=True)
# the first three characters of 'Primary Situation' are its numeric situation code
df['focuse_Situation_by_number'] = df['Primary Situation'].str[:3]

# drop two erroneous rows
df.drop(df[df['ID'].isin([140383810, 140660390])].index, inplace=True)

print("Example on dataset:")
df.head()



The code below narrows the data to the variables we want to analyze and cleans them further. It creates a new dataframe (top10_df) containing Primary Situation, focuse_Situation_by_number, neighborhood_district, and Incident year. It also merges two similar alarm categories of Primary Situation into one common category and removes a category ('554 Assist invalid') that is not relevant to our analysis.
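A minimal sketch of the category merge and removal on an invented Series. Assigning the result of `Series.replace` back, rather than calling it with `inplace=True` on a selected column, avoids pandas' chained-assignment warnings.

```python
import pandas as pd

# Invented sample of Primary Situation values
situations = pd.Series([
    "745 Alarm system sounded/no fire-accidental",
    "735 Alarm system sounded due to malfunction",
    "554 Assist invalid",
    "111 Building fire",
])

# Merge the two alarm categories into one common label
situations = situations.replace(
    ["745 Alarm system sounded/no fire-accidental",
     "735 Alarm system sounded due to malfunction"],
    "745 Alarm system activation",
)
# Drop the category that is not relevant to the analysis
situations = situations[situations != "554 Assist invalid"]
print(situations.value_counts())
```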

In [None]:
##
#
# Author: 
# Salim Omar
#
##
# create a new dataframe for the top 10 Primary Situations
top10_df = df[['Primary Situation', 'focuse_Situation_by_number', 'neighborhood_district', 'Incident year']].copy()

# merge the two alarm categories into one
top10_df['Primary Situation'] = top10_df['Primary Situation'].replace(
    ['745 Alarm system sounded/no fire-accidental',
     '735 Alarm system sounded due to malfunction'],
    '745 Alarm system activation')

# remove a category that is not relevant for the analysis
top10_df = top10_df[top10_df['Primary Situation'] != '554 Assist invalid'].copy()

# recompute the situation code from the updated Primary Situation
top10_df['focuse_Situation_by_number'] = top10_df['Primary Situation'].str[:3]
top10_df.head(100)


In the code below we clean the dataset further. The code removes the string '- ' from the 'Primary Situation' column so that variants of the same situation are grouped together, drops rows where the situation is missing, and, when several situations are listed separated by commas, keeps only the text before the first comma. The cleaned values are stored in the 'call_Situation' Series.

Then, the code counts each unique value in 'call_Situation' with value_counts() and selects the 10 most frequent situations with nlargest().

With these counts we can plot the top 10 Primary Situations, that is, the 10 types of calls that were registered most often.

This information is useful for identifying the most common situations that the San Francisco Fire Department responds to, which can help prioritize resources and improve response strategies. 
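The cleaning and counting steps can be illustrated on a few invented labels: removing `'- '` makes `'111 - Building fire'` and `'111 Building fire'` identical, splitting at the first comma keeps only the first situation, and `value_counts().nlargest(n)` returns the n most frequent labels.

```python
import pandas as pd

# Invented raw labels, including a missing value
raw = pd.Series([
    "111 - Building fire",
    "111 Building fire",
    "700 False alarm or false call, other",
    "700 False alarm or false call",
    None,
    "113 Cooking fire, confined to container",
])

cleaned = (
    raw.dropna()
       .str.replace("- ", "", regex=False)  # unify '111 - X' and '111 X'
       .str.split(",").str[0]               # keep the text before the first comma
)
top2 = cleaned.value_counts().nlargest(2)
print(top2)
```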

In [None]:
##
#
# Author: 
# Salim Omar
#
##

# Some Primary Situation values contain '- ' but have the same code and meaning
# as the version without it, so remove '- ' to merge them
top10_df['Primary Situation'] = top10_df['Primary Situation'].str.replace('- ', '', regex=False)
top10_df.dropna(subset=['Primary Situation'], inplace=True)
# keep only the first situation when several are separated by commas
top10_df['Primary Situation'] = top10_df['Primary Situation'].str.split(',').str[0]

call_Situation = top10_df['Primary Situation']
# print(len(call_Situation))

# unique() lists the different situation types
ListOfSituation = call_Situation.unique()
# print(ListOfSituation)

Situation_count = call_Situation.value_counts()

# Get the top 10 most frequent situations
top10 = Situation_count.nlargest(10)
print("The top 10 list\n", top10)



## 1. Top 10 Primary Situations

This code generates a bar chart of the top 10 situations in the dataset. It uses Matplotlib to create the chart and seaborn to define the color palette.

The code first defines the size of the chart and creates a color palette. The top 10 situations are then plotted as bars in the palette's colors. The chart's title and axis labels are set, and grid lines are added.

The x-tick labels are shortened by removing the three-digit situation code, rotated, and given a smaller font size. A legend is added and the spacing is adjusted. The chart is then converted to HTML and saved to a file named "Top_10_plot.html". Finally, the chart is displayed with plt.show().

The plot shows which situations are reported most often, which can help inform resource allocation and emergency response strategies.


In [None]:
##
#
# Author: 
# Salim Omar
#
##

fig, ax = plt.subplots(figsize=(16, 7))

# define a color palette with one color per bar
cmap = sns.color_palette("ch:s=-.2,r=.6", n_colors=len(top10))[::-1]

# plot the top 10 situations with the colormap
top10.plot(kind='bar', color=cmap, ax=ax)

# set the chart title and axis labels
plt.title('Top 10 Situations ', fontsize=20)
plt.xlabel('Call Type', fontsize=16)
plt.ylabel('count', fontsize=16)

# add grid lines
ax.grid(True)

# remove the three-digit situation code from the x-tick labels, then rotate them and set the font size
new_xticklabels = [label.get_text()[3:].strip() for label in ax.get_xticklabels()]
ax.set_xticklabels(new_xticklabels, rotation=15, fontsize=10)

# add legend
ax.legend(loc='upper right')


# adjust spacing
fig.tight_layout()
fig.subplots_adjust(bottom=0.2)

# convert to HTML and save
html = mpld3.fig_to_html(fig)
with open('Top_10_plot.html', 'w') as f:
    f.write(html)

# display the chart
plt.show()


## 2. Bokeh plot of the top 10 Primary Situations per neighborhood_district

First, this code takes the list of the top 10 situations computed above. It then filters the data to include only the rows where the incident year is 2022 and the primary situation is one of the top 10 situations.

Next, it groups the data by primary situation and neighborhood district, counts the incidents in each group, and saves the result to a CSV file. It also calculates the total count for each primary situation.

Then, it pivots the result into a new DataFrame with neighborhood districts as rows and primary situations as columns, where each value is the number of incidents of that situation in that district.

Finally, it uses Bokeh to draw one set of bars per primary situation on a shared chart, with the neighborhoods on the x-axis and the count on the y-axis. The bar renderers are stored in a dictionary, and the (label, renderer) pairs are collected for the interactive legend.
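The group-and-pivot step can be reproduced on toy data. The neighborhood names and counts below are invented; `groupby(...).size()` counts incidents per (situation, neighborhood) pair, and `pivot_table` reshapes the result so that neighborhoods become rows and situations become columns, one column per Bokeh `vbar`. `fill_value=0` is an assumption made here so that missing pairs show as zero rather than NaN.

```python
import pandas as pd

# Invented incidents
incidents = pd.DataFrame({
    "Primary Situation": ["111 Building fire", "111 Building fire",
                          "700 False alarm", "700 False alarm", "700 False alarm"],
    "neighborhood_district": ["Mission", "Mission", "Mission", "Sunset", "Sunset"],
})

# Count incidents per (situation, neighborhood) pair
counts = (incidents
          .groupby(["Primary Situation", "neighborhood_district"])
          .size()
          .reset_index(name="count"))

# Neighborhoods as rows, situations as columns
pivoted = counts.pivot_table(index="neighborhood_district",
                             columns="Primary Situation",
                             values="count",
                             fill_value=0)
print(pivoted)
```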

In [None]:
##
#
# Author: 
# Salim Omar
#
##


# get a list of the top 10 situations
top10_situations = top10.index.tolist()

# keep only 2022 incidents whose situation is one of the top 10
top10_df = top10_df[top10_df['Incident year'] == 2022]
top10_df = top10_df[top10_df['Primary Situation'].isin(top10_situations)]

# count the incidents for each (situation, neighborhood) pair
neighborhood_Primary_Situation = top10_df.groupby(['Primary Situation', 'neighborhood_district']).size().reset_index(name='count')
neighborhood_Primary_Situation.to_csv("neighborhood_Primary_Situation.csv")

# calculate the total count for each primary situation
neighborhood_Primary_counts = top10_df.groupby(['Primary Situation']).size().reset_index(name='total_count')

# merge the two dataframes to get the total count on each row
neighborhood_Primary_Situation = pd.merge(neighborhood_Primary_Situation, neighborhood_Primary_counts, on='Primary Situation')

# the count per situation per neighborhood
neighborhood_Primary_Situation['count_pr_Situation_pr_neighborhood'] = neighborhood_Primary_Situation['count']

columns = ['Primary Situation', 'neighborhood_district', 'count_pr_Situation_pr_neighborhood']
focusData = neighborhood_Primary_Situation[columns]

# Pivot: neighborhoods as rows, situations as columns (missing pairs become 0)
pivoted_focusData = focusData.pivot_table(index='neighborhood_district', columns='Primary Situation',
                                          values='count_pr_Situation_pr_neighborhood', fill_value=0)
# print(pivoted_focusData)

# A DataFrame can be passed directly to ColumnDataSource; its named index
# ('neighborhood_district') becomes a column of the source
source = ColumnDataSource(data=pivoted_focusData)
output_notebook()

# Define a figure with a title and axis labels; the x-range is the list of neighborhoods
p = figure(x_range=list(pivoted_focusData.index), title="Counts for call situation per neighborhood",
           x_axis_label='neighborhood', width=1800)
colo = Category10[10]
p.xaxis.major_label_orientation = 1.2

bar = {}    # stores the vbar renderer for each situation
items = []  # (label, [renderer]) pairs for the legend

# add one set of bars per situation; all bars start muted
for indx, Situation in enumerate(pivoted_focusData.columns):
    bar[Situation] = p.vbar(x='neighborhood_district',
                            top=Situation,
                            source=source,
                            muted=True,
                            muted_alpha=0.05,
                            fill_alpha=1.0,
                            color=colo[indx],
                            width=0.7)
    items.append((Situation, [bar[Situation]]))

The code below finishes the Bokeh bar chart, which shows the count for each call situation in each neighborhood.
The chart gives a clear view of how calls are distributed across neighborhoods and situations and helps us identify patterns or trends.
All bars start muted; clicking a call type in the legend on the left highlights it, so we can see in which neighborhoods each type of situation occurs more often than others.

In [None]:
##
#
# Author: 
# Salim Omar
#
##
# The last thing to do is to make legend interactive and display the figure:
legend = Legend(items=items)
p.add_layout(legend, 'left')
p.legend.click_policy = "mute"
output_file('bokeh_Situation_pr_neighborhood.html')
show(p)


## 3. Map of the distribution of 4 Primary Situations across neighborhoods

This code selects the fire incidents from December 2022 and stores them in a new dataframe, df_2022. It keeps only the rows with the four situations of interest (codes 111, 113, 150 and 700), replaces these codes in the focuse_Situation_by_number column with descriptive labels, and keeps only the important columns. Finally, it converts the WKT values in the point column to longitude and latitude coordinates and stores them in the new columns lon and lat. The code also prints the unique values of the focuse_Situation_by_number column and the number of rows in df_2022.
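The `point` column stores coordinates as WKT text of the form `POINT (longitude latitude)`, which is why `p.x` is the longitude and `p.y` the latitude. The notebook parses it with `shapely.wkt.loads`; the sketch below swaps in a plain regular expression instead, only to illustrate the coordinate order without extra dependencies. It assumes every value is a simple two-dimensional `POINT` and returns `None` otherwise (for example for missing values). The sample coordinate is illustrative.

```python
import re

# Matches e.g. "POINT (-122.4194 37.7749)"; assumes a simple 2-D point
_POINT_RE = re.compile(r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$")


def point_to_lat_lon(wkt_text):
    """Return (lat, lon) from a WKT 'POINT (lon lat)' string, or None."""
    if not isinstance(wkt_text, str):
        return None  # missing values (e.g. NaN) are not strings
    match = _POINT_RE.match(wkt_text)
    if match is None:
        return None  # not a simple POINT
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon


print(point_to_lat_lon("POINT (-122.4194 37.7749)"))
```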

In [None]:
##
#
# Author: 
# Salim Omar
#
##

from shapely import wkt

# dataset for the map: incidents from December 2022
df_2022 = df[(df['Incident Date'] >= '2022-12-01') &
             (df['Incident Date'] <= '2022-12-31')]
# print(len(df_2022))
# print(df_2022['Primary Situation'].unique())

# keep the four situations shown on the map and give them descriptive labels
situation_labels = {
    '150': 'Outside rubbish fire',
    '111': 'Building fire',
    '700': 'False alarm/call',
    '113': 'Cooking fire',
}
df_2022 = df_2022[df_2022['focuse_Situation_by_number'].isin(list(situation_labels))].copy()
df_2022['focuse_Situation_by_number'] = df_2022['focuse_Situation_by_number'].replace(situation_labels)

# keep the important columns and drop the others
df_2022 = df_2022.loc[:, ['ID', 'point', 'Incident year', 'focuse_Situation_by_number']]

# parse the WKT 'POINT (lon lat)' strings; rows without a location are dropped
df_2022 = df_2022.dropna(subset=['point'])
df_2022['point'] = df_2022['point'].apply(wkt.loads)
df_2022['lon'] = df_2022['point'].apply(lambda p: p.x)
df_2022['lat'] = df_2022['point'].apply(lambda p: p.y)

ListOfSituation = df_2022['focuse_Situation_by_number'].unique()
print(ListOfSituation)

print(len(df_2022))
df_2022.head()

In [None]:
##
#
# Author: 
# Salim Omar
#
##

with open('../geo_map_data/Planning Neighborhood Groups Map.geojson', 'r') as f:
    data = json.load(f)

data["features"][0]


# Create a dictionary to map focuse_Situation_by_number values to colors
situation_color_dict = {
    # 'Fire/explosion': 'green',
    'Outside rubbish fire': 'orange',
    'False alarm/call': 'purple',
    'Cooking fire': 'blue',
    'Building fire': 'red'
}

color_icon_dict = {
    'red': 'fa-fire',
    # 'green': 'fa-aulance',
    'blue': 'fa-building',
    'orange': 'fa-free-code-camp',
    'purple': 'fa-bell'
}


# style callback: give each neighborhood polygon a black outline and a random fill color
def random_color(feature):
    return {
        'color': 'black',
        'fillColor': random.choice(['red', 'yellow', 'green', 'orange']),
    }


my_map = Map(center=(37.7749, -122.4194), zoom=12,
    layout={'height': '600px', 'width': '100%'})


# Add GeoJSON layer to the map
geojson_layer = GeoJSON(
    data=data,
    style={
        'color': 'gray',
        'weight': 3,
        'fillOpacity': 0.2

    },
    hover_style={
        'color': 'white', 'dashArray': '0', 'fillOpacity': 0.4,
    },
    style_callback=random_color,
    name='Neighborhoods',
)


my_map.add_layer(geojson_layer)

# Add markers to the map for each incident in the data
for index, row in df_2022.iterrows():
    location = (row['lat'], row['lon'])
    marker_color = situation_color_dict[row['focuse_Situation_by_number']]
    marker = Marker(location=location, draggable=False,
                    title=row['focuse_Situation_by_number'])
    marker.icon = AwesomeIcon(
        name=color_icon_dict[marker_color], marker_color=marker_color, icon_color='black')
    my_map.add_layer(marker)


my_map.add_control(FullScreenControl())

# Display the map
my_map

In [None]:
##
#
# Author: 
# Salim Omar
#
##

import folium
from folium.plugins import Fullscreen

# create a Folium map object from the ipyleaflet map
m = folium.Map(location=my_map.center,
            zoom_start=my_map.zoom, control_scale=True)

color_icon_dict2 = {
    'red': 'fire',
    'blue': 'cloud',
    'orange': 'trash',
    'purple': 'bell'
}

# Add GeoJSON layer to the map
geojson_layer = folium.GeoJson(
    data=data,
    style_function=lambda features: {
        'color': 'gray',
        'weight': 3,
        'fillOpacity': 0.2
    },
    highlight_function=lambda x: {'fillColor': random.choice(
        ['red', 'yellow', 'green', 'orange', 'blue'])},
    name='Neighborhoods',

)
geojson_layer.add_to(m)

# Add markers to the map for each incident in the data
for index, row in df_2022.iterrows():
    location = (row['lat'], row['lon'])
    marker_color = situation_color_dict[row['focuse_Situation_by_number']]
    icon = folium.Icon(icon=color_icon_dict2[marker_color], color=marker_color,icon_color='black')
    marker = folium.Marker(location=location, draggable=False,
                        title=row['focuse_Situation_by_number'], icon=icon)
    marker.add_to(m)

# add Fullscreen control to the map
Fullscreen().add_to(m)

# save the map as an HTML file

m.save('Situation_map.html')
m

## 4. Pie chart of the top 8 fire-causing heat sources

In [None]:
##
#
# Author: 
# Salim Omar
#
##

# Work on a separate dataframe so that dropping rows here does not affect df,
# which is reused in Part 2
df_heat = df.dropna(subset=['Heat Source'])
# Drop the undetermined/placeholder values
df_heat = df_heat[~df_heat['Heat Source'].isin(['UU Undetermined', 'UU - Undetermined', '-'])]

# Get the top 8 heat sources (strip the code prefix and any leftover '- ')
top_heat_sources1 = df_heat['Heat Source'].str[3:]
top_heat_sources1 = top_heat_sources1.str.replace('- ', '', regex=False)
top_heat_sources = top_heat_sources1.value_counts().nlargest(8)
print(top_heat_sources)

# Create a figure and axis with equal aspect ratio
fig, ax = plt.subplots(figsize=(10, 8), subplot_kw=dict(aspect="equal"))


# Format the autopct labels with the percentage and the absolute count
def func(pct, allvals):
    absolute = int(np.round(pct / 100. * np.sum(allvals)))
    return f"{pct:.1f} %\n({absolute})"

# Create the pie chart
wedges, texts, autotexts = ax.pie(top_heat_sources.values, autopct=lambda pct: func(pct, top_heat_sources.values),
                                textprops=dict(color="w"))
# Add legend with the top 8 heat sources and adjust font size
ax.legend(wedges, top_heat_sources.index,
        title="Top 8 heat sources",
        loc="center left",
        bbox_to_anchor=(1, 0, 0.5, 1),
        prop={'size': 14}
        )


# Adjust font size and color for the autopct labels
plt.setp(autotexts, size=13, weight="bold", color="black")
# Add title to the plot
ax.set_title("Top 8 heat sources",weight="bold",size=15)

# convert to HTML and save
html = mpld3.fig_to_html(fig)
with open('heat_sources_plot.html', 'w') as f:
    f.write(html)
    

plt.show()

_________________________________________________________________________________________________________________________________

# Part 2

## 1. Bar chart

This code calculates the time difference between the "Arrival DtTm" and "Alarm DtTm" columns, converts it to minutes, and saves the result in a new column called "Arrive time_minutes". The value represents the time it took for responders to arrive at the incident after the alarm was triggered. Uncommenting the df.head() line displays the first few rows of the DataFrame, including the new column.
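A worked example of the conversion with invented timestamps: subtracting two datetimes gives a timedelta, and dividing its `total_seconds()` by 60 gives minutes. The pandas version, `(a - b).dt.total_seconds() / 60`, applies the same arithmetic to whole columns.

```python
from datetime import datetime

alarm = datetime(2022, 12, 3, 14, 2, 10)    # invented alarm time
arrival = datetime(2022, 12, 3, 14, 6, 40)  # invented arrival time

delta = arrival - alarm                     # a timedelta of 4 min 30 s
minutes = round(delta.total_seconds() / 60.0, 2)
print(minutes)  # 4.5
```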

In [None]:

# Author- s172858
# Ali Dadayev 



# Time from alarm to arrival, converted from a timedelta to minutes
df['Arrive time_minutes'] = (df['Arrival DtTm'] - df['Alarm DtTm']).dt.total_seconds() / 60.0

# df.head()

This code calculates the mean value of the "Arrive time_minutes" column in the DataFrame df, which was previously calculated by subtracting the "Alarm DtTm" column from the "Arrival DtTm" column. The resulting value represents the average time it takes for emergency responders to arrive at the scene after an alarm is triggered.

In [None]:

# Author- s172858
# Ali Dadayev 


# This line calculates the average (mean) value of the "Arrive time_minutes" column of the pandas DataFrame df, and stores the result in the variable average_arrival_time.
average_arrival_time = df['Arrive time_minutes'].mean()

# This line prints out a message to the console that includes the average arrival time.
# print("The average arrival time is:", average_arrival_time)



This code computes the arrival time in minutes for each incident by subtracting the "Alarm DtTm" column from the "Arrival DtTm" column, converting the difference to minutes, and rounding to two decimal places. The result is stored in the "Arrive time_minutes" column. It then groups the incidents by neighborhood district and calculates the average arrival time in minutes for each district.
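A toy version of the grouping step with invented arrival times: each district's value is the plain mean of its incidents' arrival times, rounded to two decimals.

```python
import pandas as pd

# Invented arrival times in minutes
arrivals = pd.DataFrame({
    "neighborhood_district": ["Mission", "Mission", "Sunset"],
    "Arrive time_minutes": [4.0, 6.0, 5.5],
})

avg_by_district = (arrivals
                   .groupby("neighborhood_district")["Arrive time_minutes"]
                   .mean()
                   .round(2))
print(avg_by_district)
```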

In [None]:

# Author- s172858
# Ali Dadayev 


# Arrival time in minutes (rounded to two decimals), stored in "Arrive time_minutes"
df['Arrive time_minutes'] = ((df['Arrival DtTm'] - df['Alarm DtTm']).dt.total_seconds() / 60.0).round(2)

# Average arrival time for each neighborhood district
#df = df[df['Battalion'] != 'B99']
avg_arrival_time_by_neighborhood = df.groupby('neighborhood_district')['Arrive time_minutes'].mean().round(2)

# print(avg_arrival_time_by_neighborhood)


This code filters the data to incidents that occurred between December 1 and December 31, 2022 and that have one of the four primary situations shown on the map. It then calculates the average arrival time for each neighborhood district and creates a bar chart of the results, with the color of each bar reflecting its average arrival time. A red dashed line marks the mean of the district averages. The plot is saved as an HTML file and also displayed in the notebook.
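The dashed reference line is the mean of the district averages, so every district counts equally regardless of how many incidents it had; it can therefore differ from the mean over all incidents. A small invented example of the difference:

```python
from statistics import mean

# Invented arrival times (minutes) for two districts
times = {
    "District A": [4.0, 4.0, 4.0, 4.0],  # many incidents, fast
    "District B": [8.0],                 # one incident, slow
}

district_means = {name: mean(values) for name, values in times.items()}
mean_of_means = mean(district_means.values())                  # each district weighted equally
overall_mean = mean(t for values in times.values() for t in values)  # each incident weighted equally

print(mean_of_means)  # 6.0
print(overall_mean)   # 4.8
```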

In [None]:

# Author-s172858
# Ali Dadayev 

# Use the same period as the map in Salim's part (December 2022).
# A separate dataframe is used so that df keeps all years for the line chart below.
df_dec = df[(df['Incident Date'] >= '2022-12-01') &
            (df['Incident Date'] <= '2022-12-31')]

# include only the primary situations that are shown on the map
df_dec = df_dec[df_dec['focuse_Situation_by_number'].isin(['111', '700', '113', '150'])].copy()

# Convert to minutes and add a new column
df_dec['Arrive time_minutes'] = ((df_dec['Arrival DtTm'] - df_dec['Alarm DtTm']).dt.total_seconds() / 60.0).round(2)

# Average arrival time per neighborhood, sorted from fastest to slowest
avg_by_neighborhood = df_dec.groupby('neighborhood_district')['Arrive time_minutes'].mean().round(2).reset_index()
avg_by_neighborhood = avg_by_neighborhood.sort_values('Arrive time_minutes')
# mean of the district averages, drawn as the dashed reference line
mean_arrival_time = avg_by_neighborhood['Arrive time_minutes'].mean()

# Create a bar chart with a color gradient
data = [go.Bar(
    x=avg_by_neighborhood['neighborhood_district'],
    y=avg_by_neighborhood['Arrive time_minutes'],
    marker=dict(color=avg_by_neighborhood['Arrive time_minutes'],
                colorscale='Reds',
                cmin=1,   # minimum of the color scale
                cmax=9,   # maximum of the color scale
                reversescale=False
                ),
    text=avg_by_neighborhood['Arrive time_minutes'],
    textposition='auto'
)]

# Set layout options
layout = go.Layout(
    title='Average Arrival Time by Neighborhood (December 2022)',
    xaxis=dict(title='neighborhood'),
    yaxis=dict(title='Average Arrival Time (Minutes)', range=[1, 7]),
    hovermode='closest',
    width=1300,
    height=800,
    shapes=[dict(type='line', x0=-0.5, y0=mean_arrival_time,
                 x1=len(avg_by_neighborhood) - 0.5, y1=mean_arrival_time,
                 line=dict(color='red', width=2, dash='dash'))]
)

# Create the figure and save it to an HTML file (without opening a browser tab)
fig = go.Figure(data=data, layout=layout)
pyo.plot(fig, filename='plot_for_neighborhood.html', auto_open=False)

# Display the plot in the notebook
pyo.iplot(fig)

## 2. Line chart 

This code calculates and visualizes the average arrival time of each fire department battalion over the years. It first computes the arrival time in minutes by subtracting the alarm time from the arrival time. Then it groups the data by battalion and year, calculates the mean arrival time for each group, and draws one line per battalion with Plotly, whose hover labels show the battalion, the year and the average arrival time. Finally, it saves the plot to an HTML file and displays it in the notebook.
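Grouping on two keys yields one row per (battalion, year) pair, the long format that the plotting loop then splits into one line per battalion. The battalion labels and times below are invented for illustration.

```python
import pandas as pd

# Invented arrival times per battalion and year
rows = pd.DataFrame({
    "Battalion": ["B01", "B01", "B01", "B02"],
    "Incident year": [2021, 2021, 2022, 2022],
    "Arrive time_minutes": [4.0, 5.0, 6.0, 3.0],
})

df_grouped = (rows
              .groupby(["Battalion", "Incident year"])["Arrive time_minutes"]
              .mean()
              .reset_index())

# One series per battalion, as in the Plotly loop
for battalion, group in df_grouped.groupby("Battalion"):
    print(battalion, list(zip(group["Incident year"], group["Arrive time_minutes"])))
```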

In [None]:

# Author- s172858
# Ali Dadayev 


# Convert to minutes and add a new column
df['Arrive time_minutes'] = ((df['Arrival DtTm'] - df['Alarm DtTm']).dt.total_seconds() / 60.0).round(2)

# Average arrival time per battalion and year
df_grouped = df.groupby(['Battalion', 'Incident year'])['Arrive time_minutes'].mean().reset_index()

# Create one line per battalion using Plotly; the hover label shows battalion, year and time
# (Plotly figures are interactive on their own, so mplcursors, a Matplotlib tool, is not needed)
fig = go.Figure()
for battalion in df_grouped['Battalion'].unique():
    data = df_grouped[df_grouped['Battalion'] == battalion]
    fig.add_trace(go.Scatter(
        x=data['Incident year'], y=data['Arrive time_minutes'], name=battalion, line=dict(width=2),
        hovertemplate=f"{battalion}<br>Year: %{{x}}<br>Avg. Arrival Time: %{{y:.2f}} minutes<extra></extra>"))

# Set layout for the plot
fig.update_layout(
    title="Average Arrival Time by Battalion and Year",
    xaxis_title="Year",
    yaxis_title="Average Arrival Time (Minutes)",
    font=dict(
        family="Arial",
        size=16,
        color="#7f7f7f"
    ),
    legend=dict(
        title="Battalion",
        font=dict(
            family="Arial",
            size=12,
            color="#7f7f7f"
        ),
        yanchor="top",
        y=1,
        xanchor="right",
        x=1
    ),
    plot_bgcolor="#f2f2f2",
    xaxis=dict(
        tickmode='linear',
        tick0=2010,  # the data starts in 2010
        dtick=1
    )
)

# Save the plot to an HTML file and display it in the browser
pyo.plot(fig, filename='battalion_arrival_time.html', auto_open=True)

# Display the plot in the notebook
pyo.iplot(fig)
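As a quick sanity check on the arrival-time arithmetic, the sketch below runs the same subtraction and group-by on a few made-up rows. The battalions, years and timestamps are invented; only the column names come from the cell above. It shows that an incident with no arrival time produces `NaN`, which `mean()` skips, so a battalion's average is based only on its recorded arrivals.

```python
import pandas as pd

# Made-up incidents: column names mirror the cell above, values are invented
toy = pd.DataFrame({
    "Battalion": ["B01", "B01", "B02", "B02"],
    "Incident year": [2021, 2021, 2021, 2021],
    "Alarm DtTm": pd.to_datetime(["2021-01-01 10:00:00", "2021-01-01 12:00:00",
                                  "2021-01-02 08:00:00", "2021-01-02 09:00:00"]),
    "Arrival DtTm": pd.to_datetime(["2021-01-01 10:04:00", "2021-01-01 12:06:00",
                                    "2021-01-02 08:05:30", None]),  # last arrival is missing (NaT)
})

# Minutes between alarm and arrival; the missing arrival gives NaN
toy["Arrive time_minutes"] = ((toy["Arrival DtTm"] - toy["Alarm DtTm"])
                              .dt.total_seconds() / 60.0).round(2)

# groupby().mean() skips NaN, so B02's average uses only its one recorded arrival
avg = toy.groupby(["Battalion", "Incident year"])["Arrive time_minutes"].mean().reset_index()
print(avg)
```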

## 3. Polar bar chart.

This code generates a polar bar plot showing the number of alarms in each hour of the day in 2022. It converts the 'Alarm DtTm' column of the DataFrame 'df' to datetimes with Pandas' to_datetime() method and keeps only the 2022 alarms in a new DataFrame called 'df2'. It then counts the alarms per hour and draws one bar per hour around a 24-hour dial, with midnight at the top and the hours running clockwise.

In [None]:
# Author- s172858
# Ali Dadayev 


# Convert the alarm datetime column to datetime
df['Alarm DtTm'] = pd.to_datetime(df['Alarm DtTm'])

# Filter for the year 2022
df2 = df[df['Alarm DtTm'].dt.year == 2022]

# Group by hour and count number of alarms
hour_counts = df2.groupby(df2['Alarm DtTm'].dt.hour).size().reset_index(name='counts')

fig, ax = plt.subplots(subplot_kw={'projection': 'polar'}, figsize=(10,10))

colors = plt.cm.Set2(np.linspace(0, 1, len(hour_counts)))

bars = ax.bar(hour_counts['Alarm DtTm'] * 2 * np.pi / 24, hour_counts['counts'], 
              width=2*np.pi/24, align='edge', color=colors, alpha=0.8)

hours = np.arange(0, 24)
tick_labels = ['{}:00'.format(h) for h in range(24)]
ax.set_xticks(np.linspace(0, 2*np.pi, 24, endpoint=False))
ax.set_xticklabels(tick_labels, fontsize=12, color='black', fontweight='bold')
ax.set_title('Alarm Hourly Counts in 2022', fontsize=20, pad=25, fontweight='bold')


# Set the starting angle and direction
ax.set_theta_offset(np.pi/2)
ax.set_theta_direction(-1)

# Customize the grid and background
ax.grid(color='gray', alpha=0.2)
ax.set_facecolor('whitesmoke')

# Remove unnecessary borders
ax.spines['polar'].set_visible(False)
ax.spines['start'].set_visible(False)
ax.spines['end'].set_visible(False)
ax.spines['inner'].set_visible(False)


plt.show()
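The chart places hour `h` at the angle `2*pi*h/24`, so every bar is `2*pi/24` radians wide. With `set_theta_offset(np.pi/2)` and `set_theta_direction(-1)`, angle 0 (midnight) sits at the top and the hours run clockwise, which puts 6:00 on the right and 12:00 at the bottom. Below is a minimal sketch of that mapping. `hour_to_angle` is a hypothetical helper written for this illustration; it is not used in the cell above.

```python
import numpy as np

def hour_to_angle(hour):
    """Hypothetical helper: angle in radians where an hour's bar starts on a 24-hour dial."""
    return np.asarray(hour, dtype=float) * 2 * np.pi / 24

# Width of one bar: the full circle divided into 24 hours
bar_width = 2 * np.pi / 24

# Midnight, 6:00, noon and 18:00 are a quarter turn apart
angles = hour_to_angle([0, 6, 12, 18])
print(angles)
```

Because the cell uses `align='edge'`, each hour label sits at the start of its bar rather than at its centre.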




# Part 3

This code generates a map showing the locations of fire stations in a city, specifically those under the jurisdiction of the Fire Department. The map is created using the Python library Folium and is centered on the city of San Francisco. Each fire station is represented by a marker on the map, and clicking on a marker displays the common name of the corresponding fire station in a popup.

This visualization can be useful for identifying the locations of fire stations within a city and their proximity to different areas. It may also be useful for emergency response planning or for residents to locate the nearest fire station in case of an emergency. The resulting HTML file can be opened in a web browser for further exploration and interaction with the map.

In [None]:
# Author Thomas Arildtoft - S193564

# Load the data
fire_stations = pd.read_csv('../geo_map_data/City-owned_Facilities_-_Fire_and_Police.csv')

# Keep only facilities with "Fire Department" as jurisdiction that also have coordinates
# (folium.Marker rejects NaN locations)
san_francisco_fire_stations = (fire_stations[fire_stations['jurisdiction'] == 'Fire Department']
                               .dropna(subset=['latitude', 'longitude']))

# Create a folium map centered on San Francisco
m = folium.Map(location=[37.773972, -122.431297], zoom_start=13)

# Add a marker for each fire station, with its common name as popup
for _, row in san_francisco_fire_stations.iterrows():
    folium.Marker(location=[row['latitude'], row['longitude']], popup=row['common_name']).add_to(m)

# Save the map to an HTML file
m.save('Fire_stations_map.html')
m



This visualization shows the total number of alarms per year as a stacked bar chart, split by battalion. The data is filtered to battalions B01 to B10 and to incidents that occurred between April 1st, 2010 and April 1st, 2023. It is then grouped by battalion and year, and the incidents for each battalion in each year are counted. The resulting table is pivoted so that each year is a row and each battalion is a column. Each bar therefore shows one year's total number of alarms, and each battalion's contribution has its own color. The legend is placed outside the plot area for clarity. When the figure is shown with an interactive matplotlib backend, hovering over a segment displays its battalion, year and number of alarms. These mplcursors tooltips are not carried over when the plot is finally converted to HTML with mpld3 and saved to a file.

In [None]:
# Author Thomas Arildtoft - S193564

# Filter the data by Battalion and Incident Date (.copy() makes df_filtered an independent DataFrame)
df_filtered = df[(df['Battalion'].isin(['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B09', 'B10'])) &
                 (df['Incident Date'] >= '2010-04-01') &
                 (df['Incident Date'] <= '2023-04-01')].copy()

# Create a new column with the year of the incident
df_filtered['Year'] = pd.to_datetime(df_filtered['Incident Date']).dt.year

# Group the data by Battalion and Year and count the number of incidents
df_grouped = df_filtered.groupby(['Battalion', 'Year'])['Incident Number'].count().reset_index()

# Pivot the data to create a table with Year as rows and Battalion as columns
df_pivoted = df_grouped.pivot(index='Year', columns='Battalion', values='Incident Number')

# Create a stacked bar chart: one bar per year, one colored segment per battalion
fig, ax = plt.subplots(figsize=(10, 6))

df_pivoted.plot(kind='bar', stacked=True, ax=ax)

# Set the title and axis labels
ax.set_title('Total number of Alarms for all Battalions per year')
ax.set_xlabel('Year')
ax.set_ylabel('Number of Alarms')

# Move the legend outside the plot area (its upper-left corner is anchored just right of the axes)
ax.legend(bbox_to_anchor=(1.01, 1), loc='upper left', borderaxespad=0)

# Add hover effects (active in an interactive matplotlib backend; not exported by mpld3)
cursor = mplcursors.cursor(ax, hover=True)
@cursor.connect('add')
def on_add(sel):
    battalion = sel.artist.get_label()   # each stacked series (bar container) is labelled with its battalion
    year = df_pivoted.index[sel.index]   # sel.index is the position of the hovered bar, not the year itself
    count = df_pivoted.loc[year, battalion]
    sel.annotation.set_text(f'Battalion: {battalion}\nYear: {year}\nCount: {count:.0f}')
    sel.annotation.set_position((-20, 20))
    sel.annotation.set_fontsize(12)
    sel.annotation.set_fontstyle('italic')
    sel.annotation.set_backgroundcolor('white')
    sel.annotation.set_bbox({'boxstyle': 'round', 'edgecolor': 'gray', 'alpha': 0.7})

# Convert the plot to HTML
html_fig = mpld3.fig_to_html(fig)

# Output the HTML
with open('Total_Number_Of_alarms_Battalion.html', 'w') as f:
    f.write(html_fig)
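The count-then-pivot steps can also be done in one call with `pd.crosstab`. The minimal sketch below uses a few made-up incidents (the incident numbers, battalions and years are invented). Both approaches produce the same year-by-battalion table. The only difference is that combinations with no incidents come out as 0 from `crosstab` and as NaN from `pivot`.

```python
import pandas as pd

# Made-up incidents; 'Battalion' and 'Year' mirror the columns used above
toy = pd.DataFrame({
    "Incident Number": [1, 2, 3, 4, 5],
    "Battalion": ["B01", "B01", "B02", "B01", "B02"],
    "Year": [2020, 2021, 2021, 2021, 2022],
})

# groupby + count + pivot: missing (year, battalion) pairs become NaN
via_pivot = (toy.groupby(["Battalion", "Year"])["Incident Number"].count()
             .reset_index()
             .pivot(index="Year", columns="Battalion", values="Incident Number"))

# crosstab counts the same pairs in one step: missing pairs become 0
via_crosstab = pd.crosstab(toy["Year"], toy["Battalion"])
print(via_crosstab)
```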


This code analyzes the number of fire incidents in San Francisco over time. The data is visualized as a calendar plot where each square represents a day, and the color of the square indicates the number of incidents that occurred on that day. With the 'cool' colormap, the colors run from cyan for few incidents to magenta for many. The plot shows the data from April 1st, 2010 to April 1st, 2023.

The plot helps to identify trends and patterns in the occurrence of fire incidents over time. The figure is also exported to an HTML file with mpld3 so that it can be opened in a web browser.

In [None]:
# Author Thomas Arildtoft - S193564

# Count incidents per day; groupby only returns dates with at least one incident
counts = df.groupby(pd.to_datetime(df['Incident Date'])).size()

# Reindex over the full period; days without incidents become NaN
theRange = pd.date_range(start="2010-04-01", end="2023-04-01", freq='D')
events = counts.reindex(theRange)

# Set the colormap to 'cool'
custom_cmap = plt.get_cmap('cool')
fig, axes = calplot.calplot(events, cmap=custom_cmap)

# Convert the plot to an interactive HTML format
html_fig = mpld3.fig_to_html(fig)

# Save the HTML file
with open('calplot.html', 'w') as f:
    f.write(html_fig)
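Because `groupby(...).size()` only returns dates that actually occur in the data, every daily count is at least 1. The empty days in the calendar come from reindexing over the full date range. The minimal sketch below uses made-up dates to show the two options: a plain `reindex` leaves days without incidents as NaN, while `fill_value=0` turns them into zero-incident days. Which one fits better depends on whether such a day should read as "no data" or as "no incidents".

```python
import pandas as pd

# Made-up incident dates: two incidents on Jan 1, one on Jan 3, none on Jan 2 or Jan 4
incident_dates = pd.Series(pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-03"]))

# Counting per date only produces dates that occur in the data
counts = incident_dates.groupby(incident_dates).size()

full_range = pd.date_range(start="2021-01-01", end="2021-01-04", freq="D")
with_gaps = counts.reindex(full_range)                  # days without incidents -> NaN
with_zeros = counts.reindex(full_range, fill_value=0)   # days without incidents -> 0
print(pd.DataFrame({"with_gaps": with_gaps, "with_zeros": with_zeros}))
```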


This code analyzes emergency response times for the 10 most frequent situations (by 'Primary Situation') in the dataset. For each situation, it calculates the average response time in minutes, where the response time is the arrival time minus the alarm time. Incidents without a recorded arrival time are left out of the average. The code then draws a bar chart of the average response time for each of the 10 situations and exports it to an interactive HTML file with mpld3. This information can help identify situations where response times could be reduced.

In [None]:
# Author Thomas Arildtoft - S193564

# Parse the timestamps without modifying df; missing arrival times become NaT
alarm_time = pd.to_datetime(df['Alarm DtTm'])
arrival_time = pd.to_datetime(df['Arrival DtTm'], errors='coerce')
response_minutes = (arrival_time - alarm_time).dt.total_seconds() / 60.0

# Get top 10 most frequent unique values in 'Primary Situation' column
top_situations = df['Primary Situation'].value_counts().nlargest(10).index

# Average response time per situation; mean() skips incidents without an arrival time
in_top = df['Primary Situation'].isin(top_situations)
avg_response_times = (response_minutes[in_top]
                      .groupby(df.loc[in_top, 'Primary Situation'])
                      .mean()
                      .reindex(top_situations)   # keep the frequency order
                      .dropna())                 # drop situations with no recorded arrivals

# Plot bar chart of average response times for top 10 situations
fig, ax = plt.subplots()
ax.bar(range(len(avg_response_times)), avg_response_times.values, align='center')
ax.set_xticks(range(len(avg_response_times)))
ax.set_xticklabels(avg_response_times.index, rotation='vertical')
ax.set_ylabel('Average response time (minutes)')
ax.set_title('Top 10 situations by frequency')

# Create HTML file with interactive chart using mpld3
html = mpld3.fig_to_html(fig)
with open('response_times.html', 'w') as f:
    f.write(html)



This code produces a line chart of the average response times for the top 10 situations in a dataset from the years 2010 to 2023. The chart shows the trend of average response times over the years for each situation. Each line in the chart represents a situation and the x-axis represents years while the y-axis shows the average response time in minutes. The chart has a hover effect that displays the situation and its corresponding average response time when hovering over a point on the chart. The chart also has an information box that informs the viewer about the hover effect.

In [None]:
# Author Thomas Arildtoft - S193564

# Parse the timestamps without modifying df; missing arrival times become NaT
alarm_time = pd.to_datetime(df['Alarm DtTm'])
arrival_time = pd.to_datetime(df['Arrival DtTm'], errors='coerce')
response_minutes = (arrival_time - alarm_time).dt.total_seconds() / 60.0

# Get top 10 most frequent unique values in 'Primary Situation' column
top_situations = df['Primary Situation'].value_counts().nlargest(10).index
years = list(range(2010, 2024))

# Average response time per (year, situation); combinations without data become NaN
mask = df['Primary Situation'].isin(top_situations) & alarm_time.dt.year.isin(years)
situation_data = (response_minutes[mask]
                  .groupby([alarm_time.dt.year[mask], df.loc[mask, 'Primary Situation']])
                  .mean()
                  .unstack()
                  .reindex(index=years, columns=top_situations))

# Plot line chart of average response times for top 10 situations by year
fig, ax = plt.subplots(figsize=(10, 6))

for situation in situation_data.columns:
    ax.plot(situation_data.index, situation_data[situation], label=situation)

# Add legend
ax.legend()

# Add x-axis label and tick labels
ax.set_xlabel('Year')
ax.set_xticks(years)
ax.set_xticklabels(years, rotation=90)

# Add y-axis label
ax.set_ylabel('Average response time (minutes)')

# Add hover effects using mplcursors (hover=True shows the label on mouse-over instead of on click)
mplcursors.cursor(ax, hover=True).connect('add', lambda sel: sel.annotation.set_text(f'{sel.artist.get_label()}: {sel.target[1]:.2f} minutes'))

# Add information box outside the chart
info_text = 'Hover over a point to see details'
plt.text(1.05, 0.5, info_text, transform=ax.transAxes,
         bbox=dict(boxstyle='round', facecolor='white', edgecolor='gray'),
         fontsize=12, ha='left', va='center')

# Show plot
plt.show()
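The core of the per-year averaging is a group-by on (year, situation) followed by `unstack`, which gives one column per situation. The minimal sketch below uses made-up rows (the years, situation labels "A" and "B", and response times are invented). Any year with no recorded arrivals for a situation comes out as NaN, and matplotlib draws that as a gap in the situation's line.

```python
import pandas as pd

# Made-up response times in minutes; the situation labels are invented
toy = pd.DataFrame({
    "year": [2010, 2010, 2011, 2012],
    "situation": ["A", "B", "A", "A"],
    "minutes": [4.0, 6.0, 5.0, 7.0],
})

years = [2010, 2011, 2012]
yearly = (toy.groupby(["year", "situation"])["minutes"].mean()
          .unstack()                                # one column per situation
          .reindex(index=years, columns=["A", "B"]))  # fixed year range and column order
print(yearly)
```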


This code creates a new Pandas DataFrame called neighborhoods_df that contains the unique values from the "neighborhood_district" column of an existing DataFrame called df. The pd.DataFrame() function is used to create the new DataFrame, passing in the array of unique values from the "neighborhood_district" column as the first argument, and specifying the name of the new column as "neighborhood_district" using the columns parameter.

Finally, the print() function is used to display the new DataFrame to the console. This will output the unique values from the "neighborhood_district" column of the original DataFrame df, with each unique value appearing in its own row under the "neighborhood_district" column header.

In [None]:
# Author Thomas Arildtoft - S193564

# Create a new DataFrame with unique values in the "neighborhood_district" column
neighborhoods_df = pd.DataFrame(df["neighborhood_district"].unique(), columns=["neighborhood_district"])

# Print the new DataFrame
print(neighborhoods_df)


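This code analyzes emergency incidents in San Francisco in 2022 whose primary situation is "500 Service Call, other" or "700 False alarm or false call, other". It shows how the incidents with above-average response times are distributed across neighborhoods. The data is visualized as a bar chart where each bar represents a neighborhood. The height of a bar is that neighborhood's share, in percent, of all incidents with above-average response times, so the bars add up to 100%. The chart also includes a red dashed line marking the average share. Neighborhoods with many incidents naturally get a larger share, so the bars should be read together with each neighborhood's total number of incidents.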

The chart helps to identify areas where improvements could be made to reduce response times and improve emergency services. The interactive HTML file created allows for further exploration of the data and interaction with the chart.

In [None]:
# Author Thomas Arildtoft - S193564

# Filter for incidents in the year 2022 and with Primary Situation of "500 Service Call, other" or "700 False alarm or false call, other"
filter_condition = (df["Alarm DtTm"].astype(str).str.startswith("2022")) & ((df["Primary Situation"] == "500 Service Call, other") | (df["Primary Situation"] == "700 False alarm or false call, other"))
filtered_df = df.loc[filter_condition]

# Calculate the response time for each incident and find the average
# (errors="coerce" turns missing arrival times into NaT, which mean() and the comparison below skip)
response_time = pd.to_datetime(filtered_df["Arrival DtTm"], errors="coerce") - pd.to_datetime(filtered_df["Alarm DtTm"])
average_response_time = response_time.mean()

# Keep incidents slower than the average and compute each neighborhood's share of them
above_avg_condition = (response_time > average_response_time)
above_avg_df = filtered_df.loc[above_avg_condition]
percentage_by_neighborhood = above_avg_df["neighborhood_district"].value_counts(normalize=True) * 100

# Bar chart of each neighborhood's share of all above-average incidents
plt.figure(figsize=(10, 6))
sns.barplot(x=percentage_by_neighborhood.index, y=percentage_by_neighborhood.values)
plt.title("Share of Incidents with Above-Average Response Time by Neighborhood (2022)")
plt.xlabel("Neighborhood")
plt.ylabel("Share of above-average incidents (%)")
plt.xticks(rotation=90)
plt.axhline(y=percentage_by_neighborhood.mean(), color="red", linestyle="--", label="Average share")
plt.legend()

# Save the plot as an HTML file using mpld3
html_fig = mpld3.fig_to_html(plt.gcf())
with open('Percentage_of_Incidents_Above_Average_Response_Time.html', 'w') as f:
    f.write(html_fig)
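"Percentage of incidents with above-average response time by neighborhood" can mean two different numbers. `value_counts(normalize=True)`, as used in the cell above, gives each neighborhood's share of all above-average incidents, and these shares sum to 100. The per-neighborhood rate instead divides by the neighborhood's own number of incidents; the heatmap cell below computes this as `Pct Above Avg`. The minimal sketch below uses made-up data (neighborhoods "X" and "Y" are invented) to show how far apart the two can be.

```python
import pandas as pd

# Made-up incidents: neighborhood X has 4 incidents, Y has 1
toy = pd.DataFrame({
    "neighborhood_district": ["X", "X", "X", "X", "Y"],
    "above_avg": [True, True, False, False, True],
})

above = toy[toy["above_avg"]]

# Share of all above-average incidents that fall in each neighborhood (sums to 100)
share = above["neighborhood_district"].value_counts(normalize=True) * 100

# Percentage of each neighborhood's own incidents that were above average
within = toy.groupby("neighborhood_district")["above_avg"].mean() * 100

print(pd.DataFrame({"share_of_all": share, "within_neighborhood": within}))
```

X accounts for two thirds of all slow incidents, yet only half of its own incidents are slow. Y has a small share, but every one of its incidents is slow.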


The code is creating a heatmap using Folium to visualize the response time of emergency incidents in San Francisco by neighborhood.

The dataset includes information on the incident location, type, and response time, among other things. The code first filters the dataset to include only incidents from 2022 with a primary situation of "500 Service Call, other" or "700 False alarm or false call, other."

The code then calculates the response time by subtracting the time the alarm was received from the time the emergency responders arrived at the scene. It keeps only the incidents with above-average response times and groups them by neighborhood. For each neighborhood, it calculates the average response time of these slow incidents and the percentage of all the neighborhood's incidents that had an above-average response time.

The code then merges this data with latitude and longitude information for each neighborhood and selects only the top five neighborhoods with the highest percentage of incidents with above-average response times.

Finally, the code creates a Folium map centered on San Francisco and adds a heatmap layer at the coordinates of these neighborhoods, weighted by their average response time.

In [None]:
# Author Thomas Arildtoft - S193564

# Extract longitude and latitude from the "POINT (longitude latitude)" text in the "point" column.
# One regex with str.extract is used because, since pandas 2.0, str.replace treats the pattern
# as a literal string by default, so "POINT \(" would not match.
coords = df["point"].str.extract(r"POINT \((?P<longitude>[-\d.]+) (?P<latitude>[-\d.]+)\)").astype(float)
df["longitude"] = coords["longitude"]
df["latitude"] = coords["latitude"]

# Extract the incident year from the "Incident Date" column
df["Incident Year"] = pd.to_datetime(df["Incident Date"]).dt.year

# Filter the DataFrame to only include incidents from 2022 with primary situation of "500 Service Call, other" or "700 False alarm or false call, other"
filtered_df = df[(df["Incident Year"] == 2022) & (df["Primary Situation"].isin(["500 Service Call, other", "700 False alarm or false call, other"]))].copy()

# Calculate the response time by subtracting "Alarm DtTm" from "Arrival DtTm" (missing arrivals become NaT)
filtered_df["Response DtTm"] = pd.to_datetime(filtered_df["Arrival DtTm"], errors="coerce") - pd.to_datetime(filtered_df["Alarm DtTm"])
response_time = filtered_df["Response DtTm"].dt.total_seconds() / 60

# Calculate the average response time
average_response_time = response_time.mean()

# Filter the DataFrame to only include incidents with above average response time
above_avg_condition = (response_time > average_response_time)
above_avg_df = filtered_df.loc[above_avg_condition]
response_time_by_neighborhood = above_avg_df.groupby("neighborhood_district")["Response DtTm"].mean().dt.total_seconds() / 60

# Create a new DataFrame with the average response time by neighborhood
neighborhoods_df = pd.DataFrame(response_time_by_neighborhood).reset_index()

# Calculate the percentage of incidents with above average response time by neighborhood
neighborhood_pct = (above_avg_df.groupby("neighborhood_district")["Incident Number"].count() / filtered_df.groupby("neighborhood_district")["Incident Number"].count() * 100).reset_index()
neighborhood_pct = neighborhood_pct.rename(columns={"Incident Number": "Pct Above Avg"})

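# Give each neighborhood one representative coordinate: the mean of its incident locations
# (an approximation of the neighborhood's centre, not a polygon centroid)
neighborhood_coords = df.groupby("neighborhood_district")[["latitude", "longitude"]].mean().reset_index()

# Merge the average response time by neighborhood with the coordinates and the neighborhood percentage DataFrame
merged_df = pd.merge(neighborhoods_df, neighborhood_coords, on="neighborhood_district")
merged_df = pd.merge(merged_df, neighborhood_pct, on="neighborhood_district")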

# Sort by percentage of above-average response times and select only the top 5 neighborhoods
merged_df = merged_df.sort_values(by="Pct Above Avg", ascending=False).head(5)

# Create a folium map centered on San Francisco
m = folium.Map(location=[37.7749, -122.4194], zoom_start=12)

# Add a heatmap layer to the map using only the data for the top 5 neighborhoods
HeatMap(data=merged_df[["latitude", "longitude", "Response DtTm"]].values.tolist(), radius=10, blur=5).add_to(m)

# Display the map
m.save('Response_time_map.html')
m
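The 'point' column stores locations as text of the form `POINT (longitude latitude)`. The minimal sketch below parses that text with a single regular expression and then reduces each neighborhood to one point by averaging its incident coordinates. The sample rows, coordinates and neighborhood names are made up. An average of incident locations is only a rough stand-in for a neighborhood's centre.

```python
import pandas as pd

# Made-up rows in the "POINT (longitude latitude)" text format of the 'point' column
toy = pd.DataFrame({
    "neighborhood_district": ["Mission", "Mission", "Marina"],
    "point": ["POINT (-122.4194 37.7749)", "POINT (-122.41 37.78)", "POINT (-122.436 37.803)"],
})

# Named groups become the column names; astype(float) makes them numeric
coords = toy["point"].str.extract(r"POINT \((?P<longitude>[-\d.]+) (?P<latitude>[-\d.]+)\)").astype(float)
toy = toy.join(coords)

# One representative point per neighborhood: the mean of its incident coordinates
centroids = toy.groupby("neighborhood_district")[["latitude", "longitude"]].mean()
print(centroids)
```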

# Text


## Data Analysis

   We used various data analysis techniques to gain insights into the fire incidents dataset. We performed exploratory data analysis to identify patterns and trends in the data, and created summary statistics and visualizations to communicate these insights to the user. We also used machine learning techniques to predict the cause of fire incidents based on other variables in the dataset.

- ### Describe your data analysis and explain what you've learned about the dataset.

- ### If relevant, talk about your machine-learning.

## Genre

- ### Which genre of data story did you use?

   We used the "Magazine Style" and "Annotated Graph / Map" genres for our data story, since we focused on exploring a specific dataset with the help of a variety of plots, with some effects and animations, to gain insights into the characteristics of fire incidents in San Francisco.
- ### Which tools did you use from each of the 3 categories of Visual Narrative (Figure 7 in Segal and Heer). Why?

    Visual Structuring:

     - Consistent Visual Platform (mainly, everything happens on the same page)

     - Progress Bar (the webpage has a scrollbar)


    Highlighting:

     - Zooming and panning: We used zooming and panning to allow the user to focus on specific areas of the visualizations and explore them in more detail.
        
        
    Transition Guidance:

     - Familiar Objects (Several similar graph types)




- ### Which tools did you use from each of the 3 categories of Narrative Structure (Figure 7 in Segal and Heer). Why?

    Ordering:

     - Linear 

    Interactivity:

     - Hover Highlighting / Details 

    Messaging:

     - Introductory Text 

     - Multi-Messaging 

     - Captions / Headlines 

     - Summary 

     - Accompanying Article 
     
         

    We used these tools to help users understand the data better and to tell a compelling story about fire incidents in San Francisco. The data-driven sequences and interactive components help users to explore the data and draw their own conclusions, while the context and annotations provide additional information and insights.

## Visualizations.

- ### Explain the visualizations you've chosen.

    - Bar plots - are a useful tool for visualizing and comparing differences between categories or groups of data. They are easy to interpret and can display a wide range of information, making them a versatile tool for data analysis and visualization.

    - Line charts - are useful because they show changes in data over time, allow for easy comparison of multiple data sets, and are simple and easy to interpret.

    - Bokeh charts - are useful for visualizing and exploring data interactively. Bokeh can create line, scatter, bar and other charts with customizable axes, grids and legends. Its zoom, pan and hover tools let users explore large datasets and inspect individual data points, which helps uncover patterns and trends that a static chart might hide.

    - Maps - are useful because they place data in a spatial context. Plotting data on a map reveals geographic patterns, such as clusters and areas of high or low values, that are hard to see in a table or in text. This makes maps the natural choice for data tied to locations, such as demographic or environmental data.

    - Donut charts - are a visually appealing and useful tool for comparing the proportions of different categories or groups within a dataset. They can display multiple categories in a single chart and can be customized to show additional information, making them a valuable tool for data visualization.

    - Polar charts - are useful for data that follows a cycle, such as the hour of the day. The circular layout keeps the end of the cycle next to its start, which makes recurring patterns over time easy to spot.

    - Pie charts - are useful for displaying proportions and percentages within a dataset in a clear and easy-to-understand way. They are particularly useful for showing data that can be divided into categories or groups.

- ### Why are they right for the story you want to tell?

    - Bar plots: Can be used to compare the frequency of different alarm types across neighborhoods or the average response times of different neighborhoods.
    - Line charts: Can be used to visualize trends in response times over time within a neighborhood or across neighborhoods.
    - Bokeh charts: Bokeh is a data visualization library that can create interactive and dynamic visualizations, such as scatter plots or heatmaps. These could be used to show the relationship between alarm types and response times across different neighborhoods.
    - Maps: Can be used to provide a geographic context to the data, allowing viewers to see how alarm types and response times vary across different neighborhoods in San Francisco.
    - Donut charts: Can be used to show the relative distribution of different alarm types in a single neighborhood or across neighborhoods.
    - Polar charts: Can be used to show how alarms are distributed over a daily cycle. Our polar bar chart shows the number of alarms in each hour of the day in 2022.
    - Pie charts: Can be used to show the relative distribution of different alarm types in a single neighborhood or across neighborhoods.

## Discussion. Think critically about your creation

- ### What went well?
    Looking at our plots, we think we succeeded in presenting the idea behind each of them in an informative way that is, hopefully, understandable for people with little or no knowledge of data visualization. We wanted to stick to what we learned in class while also trying something new and different, and we think we succeeded with that, at least with our maps.

- ### What is still missing? What could be improved?, Why?

    We would have liked to provide more data on the differences between neighborhoods. For example, we would have liked to examine whether neighborhoods with lower incomes had longer response times from the fire stations, or how the alarm types received by the fire department differ between neighborhoods with high and low crime rates.

    Our project lacks a clear direction. After creating multiple charts and plots, we realized that the project had no overall direction, so we contacted Sune for clarification. He gave us that clarification, and we tried to make the required changes.


## Contributions. Who did what?

- ### You should write (just briefly) which group member was the main responsible for which elements of the assignment. (I want you guys to understand every part of the assignment, but usually there is someone who took lead role on certain portions of the work. That's what you should explain).

    We started out by assigning certain plots to each group member. This is documented inside the Jupyter notebook with author tags, and we have also grouped the plots and diagrams under "Part" headings, which should make each team member's contribution easy to identify.

    Part 1 = Salim
    Part 2 = Ali
    Part 3 = Thomas

    We then started on the explainer notebook. This was a team effort, since every member was online on Discord contributing to the assignment, so it would be difficult to specify who wrote which section of this notebook.

    We all contributed to the website, from the HTML layout and CSS to the text written on it.

## Make sure that you use references when they're needed and follow academic standards.

Relevant links have been inserted in the text itself.