All the data in this dataset is from "Montgomery County, Pennsylvania".


In this kernel, we will analyze and visualize the 911 calls data across several variables. I try to keep my kernels easy to follow for anyone who is just starting out on Kaggle and is here to learn, because I am a novice at the start of my own Kaggle journey too. I hope you like the kernel; if you do, please give it an upvote.


I wish everyone good luck on their learning journey on Kaggle.

In [None]:
import numpy as np 
import pandas as pd
import seaborn as sb
import plotly.express as px
import matplotlib.pyplot as plt
from matplotlib import rcParams
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import datetime as dt
import folium
from folium.plugins import MarkerCluster
from plotly.offline import iplot
import cufflinks
cufflinks.go_offline()
cufflinks.set_config_file(world_readable=True, theme='pearl')
plt.style.use('ggplot')





import warnings
warnings.filterwarnings('ignore')

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Loading Data

In [None]:
data = pd.read_csv('../input/montcoalert/911.csv')

# Quick Look at Data

In [None]:
data.shape

In [None]:
data.head()

In [None]:
data.tail()

In [None]:
data.describe()

In [None]:
data.info()

In [None]:
data.sample(frac=0.00002, random_state=1)

# Handling Missing Data

In [None]:
data.isnull().sum()

Let's leave `zip` as it is for now; we will only use it to find the top 10 zip codes with the most 911 calls.

In [None]:
data.dropna(subset=['twp'], inplace=True)

In [None]:
data.isnull().sum()
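
A minimal sketch of what the cells above do, using a hypothetical three-row frame rather than the real file: `dropna(subset=['twp'])` removes only the rows with a missing township, so rows that merely lack a zip code survive.

```python
import numpy as np
import pandas as pd

# Hypothetical rows (not taken from 911.csv): one missing zip, one missing township
calls = pd.DataFrame({
    'zip': [19401.0, np.nan, 19446.0],
    'twp': ['NORRISTOWN', 'LOWER MERION', np.nan],
})

missing_before = calls.isnull().sum()

# Drop only the rows where 'twp' is missing; a missing 'zip' is tolerated
cleaned = calls.dropna(subset=['twp'])
missing_after = cleaned.isnull().sum()

print(missing_before)
print(missing_after)
```

Returning a new frame instead of using `inplace=True` keeps the original around for comparison; either style gives the same rows.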

Alright, let's move forward!

# Feature Engineering

In [None]:
data['timeStamp'].head(3)

Our 'timeStamp' column is read in as plain strings (pandas `object` dtype), so we must convert it to the 'Datetime' format before we can extract dates and times from it.

In [None]:
data['timeStamp'] = pd.to_datetime(data['timeStamp'])

In [None]:
data['timeStamp'].head(3)
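
Here is a small, self-contained sketch of the conversion on a hypothetical two-row sample written in the same `YYYY-MM-DD HH:MM:SS` layout. It also shows that `dt.dayofweek` counts from Monday = 0.

```python
import pandas as pd

# Hypothetical sample; the strings only mimic the layout of the timeStamp column
sample = pd.DataFrame({'timeStamp': ['2015-12-10 17:10:52', '2016-01-01 00:05:00']})
dtype_before = sample['timeStamp'].dtype  # a string-like dtype, not datetime

sample['timeStamp'] = pd.to_datetime(sample['timeStamp'])
dtype_after = sample['timeStamp'].dtype

# The .dt accessor only works once the column is a real datetime
hours = sample['timeStamp'].dt.hour.tolist()
weekdays = sample['timeStamp'].dt.dayofweek.tolist()  # Monday = 0 ... Sunday = 6

print(dtype_before, '->', dtype_after)
print(hours, weekdays)
```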

## Adding new datetime columns

In [None]:
data['year'] = data['timeStamp'].dt.year
data['day'] = data['timeStamp'].dt.day
data['month'] = data['timeStamp'].dt.month
data['dayofweek'] = data['timeStamp'].dt.dayofweek
data['dayofyear'] = data['timeStamp'].dt.dayofyear
data['hour'] = data['timeStamp'].dt.hour

In [None]:
data.head(3)

Adding one column to our data : 'reason_cat' for the category of reason. This will enable us to better understand and visualize the dataset.

### Category for different reasons

In [None]:
data['reason_cat'] = data['title'].apply(lambda x:x.split(':')[0])
data['reason_cat'].unique()

### All the reasons for 911 calls

In [None]:
data['title'] = data['title'].apply(lambda x:x.split(':')[1])

data['title'] = data['title'].apply(lambda x:x.split('-')[0]).apply(lambda x:x.strip())

# Now our data looks more badass
data['title'].unique()
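
The same split can be sketched with pandas string methods on a few hypothetical titles. One difference worth knowing: `split('-')[0]` cuts at the first hyphen anywhere in the text, while `str.rstrip(' -')` only trims a trailing hyphen, so a title with an inner hyphen stays whole. Treat this as an alternative to try, not as what the cells above do.

```python
import pandas as pd

# Hypothetical titles in the 'CATEGORY: REASON' layout
titles = pd.Series([
    'EMS: BACK PAINS/INJURY',
    'Traffic: VEHICLE ACCIDENT -',
    'Fire: GAS-ODOR/LEAK',
])

# Split once on the first colon: column 0 is the category, column 1 the reason
parts = titles.str.split(':', n=1, expand=True)
reason_cat = parts[0].str.strip()

# Trim surrounding spaces, then any trailing ' -' left at the end of the reason
reason = parts[1].str.strip().str.rstrip(' -')

print(reason_cat.tolist())
print(reason.tolist())
```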

## How many different reasons?

In [None]:
data['title'].nunique()

# Data Visualizations

Let's get our hands dirty.

In [None]:
data.columns

## Different reason category for 911 calls

In [None]:
# Count the calls per category from the data instead of hard-coding the totals
reason = data['reason_cat'].value_counts().rename_axis('reason').reset_index(name='count')
reason.style.background_gradient(cmap='Blues', subset=['count'])

In [None]:
# Bar Chart
# color='reason' gives each bar its own pastel colour; the palette is passed as a flat list
fig1 = px.bar(reason, x='reason', y='count', color='reason', text='count', color_discrete_sequence=px.colors.qualitative.Pastel)

fig1.update_layout(title={
                  'text': "Category of Reasons for 911 Calls",
                  'y':0.98,
                  'x':0.5,
                  'xanchor': 'center',
                  'yanchor': 'top'},
                  xaxis_title='Reason Category',
                  yaxis_title='Count',
                  showlegend=False,
                  template='ggplot2')

# -----------------------------------------------------------

# Pie Chart
fig2 = px.pie(reason, reason['reason'], reason['count'], 
              color_discrete_sequence=px.colors.qualitative.Pastel, hole=0.5)

fig2.update_layout(title={
                  'text': "Category of Reasons for 911 Calls (Pie Chart)",
                  'y':0.98,
                  'x':0.5,
                  'xanchor': 'center',
                  'yanchor': 'top'},
                   height=600,
                  template='plotly_white')

fig2.update_traces(textposition='inside', textinfo='percent+label', pull=[0.2, 0, 0])

fig2.data[0].marker.line.width = 2
fig2.data[0].marker.line.color = "black"

# -----------------------------------------------------------

fig1.show()
fig2.show()

* EMS (Emergency Medical Services) calls are the most frequent.
* Traffic calls come second, and Fire calls are the least frequent.

## Different reasons for 911 calls

Top 10 reasons for 911 calls in "Montgomery County, Pennsylvania".

In [None]:
data['title'].value_counts()\
             .head(10)\
             .to_frame(name='count')\
             .reset_index()\
             .style.background_gradient(cmap='Reds', subset=['count'])

In [None]:
data['title'].value_counts().head(10).iplot(kind='bar', 
                 color='red',
                 gridcolor='white',
                 linecolor='black',
                 theme='pearl',
                 title='Top 10 Reasons for 911 Calls',
                 yTitle='Number of 911 Calls',
                 bargap=0.4,
                 xTitle='Reason'
                 )


# Seaborn Chart
# plt.subplots(figsize=(20, 7))

# ax = sb.countplot(data['title'], data=data)
# ax.set(xlabel='Reasons', ylabel='Count')
# plt.title('Different Reasons For 911 Calls')

# plt.xticks(rotation=90)

# plt.show()

In [None]:
plt.subplots(figsize=(20, 7))

# Pass column names as keywords; recent seaborn versions reject positional x data
ax = sb.countplot(x='title', hue='reason_cat', data=data)
ax.set(xlabel='Reasons', ylabel='Count')
plt.title('Different Reasons For 911 Calls')

plt.xticks(rotation=90)

plt.show()

In [None]:
top_10_reasons = data['title'].value_counts().head(10).sort_values().rename_axis('reason').reset_index(name='count')
# Horizontal bars: counts go on the x-axis, reasons on the y-axis, one pastel colour per bar
fig = px.bar(top_10_reasons, x='count', y='reason', color='reason', orientation='h', text='count', color_discrete_sequence=px.colors.qualitative.Pastel)

fig.update_layout(title={
                  'text': "Top 10 Reasons for 911 Calls",
                  'y':0.98,
                  'x':0.5,
                  'xanchor': 'center',
                  'yanchor': 'top'},
                  xaxis_title='Count',
                  yaxis_title='Reasons',
                  showlegend=False,
                  template='plotly_white')

fig.show()

* Approximately 28% of all calls are for vehicle accidents.
* Disabled vehicle calls follow, making up nearly 7% of all calls.
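
Percentages like these can be read straight off `value_counts(normalize=True)`. The sketch below uses ten hypothetical titles, so its numbers are illustrative only and are not the shares in the real dataset.

```python
import pandas as pd

# Hypothetical titles: 5 vehicle accidents, 3 disabled vehicles, 2 fire alarms
titles = pd.Series(
    ['VEHICLE ACCIDENT'] * 5 + ['DISABLED VEHICLE'] * 3 + ['FIRE ALARM'] * 2
)

# normalize=True returns fractions of the total; scale to percent and round
share = titles.value_counts(normalize=True).mul(100).round(1)

print(share)
```

On the notebook's own frame the equivalent call would be `data['title'].value_counts(normalize=True).head(10)`.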

## 911 calls based on different Zip codes

In [None]:
data['zip'].value_counts()\
           .head(20)\
           .to_frame(name='count')\
           .reset_index()\
           .style.background_gradient(cmap='Greens', subset=['count'])

In [None]:
top_10_zip = data['zip'].value_counts().to_frame(name='count').head(10).sort_values(by='count')

In [None]:
plt.subplots(figsize=(18, 8))
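# Keyword arguments are required by recent seaborn; zip codes are shown as labels, not floats
ax = sb.barplot(x=top_10_zip.index.astype(int).astype(str), y=top_10_zip['count'].to_numpy())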
plt.xticks(rotation=90)
ax.set(xlabel='Zip Codes', ylabel='Number of 911 Calls')
plt.title('Top 10 Zip Codes For 911 Calls')

plt.show()

## 911 calls based on different Township

In [None]:
data['twp'].value_counts()\
           .head(20)\
           .to_frame(name='count')\
           .reset_index()\
           .style.background_gradient(cmap='Blues', subset=['count'])

In [None]:
top_10_twp = data['twp'].value_counts().to_frame(name='count').head(10).sort_values(by='count') 

top_10_twp.iplot(kind='bar', 
                 color='red',
                 gridcolor='white',
                 linecolor='black',
                 theme='pearl',
                 title='Top 10 Townships for 911 Calls',
                 yTitle='Number of 911 Calls',
                 bargap=0.3,
                 xTitle='Township'
                 )

# Seaborn Way
# sb.set_style('whitegrid')
# plt.subplots(figsize=(14, 7))
# ax = sb.barplot(top_10_twp.index, top_10_twp['count'])
# plt.xticks(rotation=90)
# ax.set(xlabel='Township', ylabel='Number of 911 Calls')
# plt.title('Top 10 Township For 911 Calls')
# plt.show()

## Datetime

In [None]:
print(f"Minimum year in data: {data['year'].min()}")
print(f"Most recent year in data: {data['year'].max()}")
print(f"Number of calendar years covered: {data['year'].max() - data['year'].min() + 1}")

# Analysis on Time: 2015-2020

In [None]:
data.drop(data[data['lng']>-73].index, inplace=True)
data.drop(data[data['lng']<-76].index, inplace=True)
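
The two `drop` calls above keep only the calls whose longitude lies between -76 and -73. A minimal sketch of the same filter as a single boolean mask, on hypothetical coordinates:

```python
import pandas as pd

# Hypothetical calls: two inside the longitude band, one far outside it
calls = pd.DataFrame({
    'twp': ['NORRISTOWN', 'POTTSTOWN', 'UNKNOWN'],
    'lat': [40.12, 40.25, 32.38],
    'lng': [-75.35, -75.65, -86.27],
})

# between() is inclusive on both ends, so -76 <= lng <= -73 is kept
in_band = calls['lng'].between(-76, -73)
filtered = calls[in_band].copy()

print(filtered)
```

One difference to keep in mind: a row with a missing `lng` fails the `between` test and is dropped by this mask, whereas the two `drop` calls would leave it in place.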

In [None]:
data_15 = data[data['year']==2015]
data_16 = data[data['year']==2016]
data_17 = data[data['year']==2017]
data_18 = data[data['year']==2018]
data_19 = data[data['year']==2019]
data_20 = data[data['year']==2020]

## Top 10 townships from 2015-2020 for 911 calls

In [None]:
top_twp_15 = data_15['twp'].value_counts().to_frame(name='count').head(10).sort_values(by='count')
top_twp_16 = data_16['twp'].value_counts().to_frame(name='count').head(10).sort_values(by='count')
top_twp_17 = data_17['twp'].value_counts().to_frame(name='count').head(10).sort_values(by='count')
top_twp_18 = data_18['twp'].value_counts().to_frame(name='count').head(10).sort_values(by='count')
top_twp_19 = data_19['twp'].value_counts().to_frame(name='count').head(10).sort_values(by='count')
top_twp_20 = data_20['twp'].value_counts().to_frame(name='count').head(10).sort_values(by='count')
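
The six near-identical lines above can also be written as one loop over `groupby('year')`. This is only a sketch: `top_n_by_year` is a hypothetical helper name, and the sample frame is made up.

```python
import pandas as pd

def top_n_by_year(df, column, n=10):
    """Return {year: frame with the n most frequent values of `column`, ascending}."""
    return {
        year: (group[column].value_counts()
                            .head(n)
                            .to_frame(name='count')
                            .sort_values(by='count'))
        for year, group in df.groupby('year')
    }

# Hypothetical calls spread over two years
calls = pd.DataFrame({
    'year': [2015, 2015, 2015, 2016, 2016],
    'twp':  ['A', 'A', 'B', 'B', 'B'],
})

top_twp = top_n_by_year(calls, 'twp')
print(top_twp[2015])
print(top_twp[2016])
```

With the notebook's frame, `top_n_by_year(data, 'twp')[2015]` would play the role of `top_twp_15`, and the same helper works for the `title` column.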

In [None]:
fig = make_subplots(rows=3, cols=2, 
                   subplot_titles=("Year 2015","Year 2016", "Year 2017", "Year 2018", "Year 2019", "Year 2020"))

fig.add_trace(go.Bar(x = top_twp_15.index, y=top_twp_15['count'], name='2015'), row=1, col=1) 

fig.add_trace(go.Bar(x = top_twp_16.index, y=top_twp_16['count'], name='2016'), row=1, col=2)

fig.add_trace(go.Bar(x = top_twp_17.index, y=top_twp_17['count'], name='2017'), row=2, col=1)

fig.add_trace(go.Bar(x = top_twp_18.index, y=top_twp_18['count'], name='2018'), row=2, col=2)

fig.add_trace(go.Bar(x = top_twp_19.index, y=top_twp_19['count'], name='2019'), row=3, col=1)

fig.add_trace(go.Bar(x = top_twp_20.index, y=top_twp_20['count'], name='2020'), row=3, col=2)

fig.update_layout(title_text='Top Townships for 911 Calls', height=1500, template='plotly_white')

fig.show()

In [None]:
top_twp_20 = data_20['twp'].value_counts().to_frame(name='count').head(10).sort_values(by='count', ascending=False)
top_twp_20.style.background_gradient(cmap='Blues', subset=['count'])

In [None]:
sb.lmplot(x='month', y='twp', data=data.groupby('month').count().reset_index(), height=6, aspect=2)

## Top 10 Reasons from 2015-2020 for 911 calls

In [None]:
top_title_15 = data_15['title'].value_counts().to_frame(name='count').head(10).sort_values(by='count')
top_title_16 = data_16['title'].value_counts().to_frame(name='count').head(10).sort_values(by='count')
top_title_17 = data_17['title'].value_counts().to_frame(name='count').head(10).sort_values(by='count')
top_title_18 = data_18['title'].value_counts().to_frame(name='count').head(10).sort_values(by='count')
top_title_19 = data_19['title'].value_counts().to_frame(name='count').head(10).sort_values(by='count')
top_title_20 = data_20['title'].value_counts().to_frame(name='count').head(10).sort_values(by='count')

In [None]:
fig = make_subplots(rows=3, cols=2, 
                   subplot_titles=("Year 2015","Year 2016", "Year 2017", "Year 2018", "Year 2019", "Year 2020"))

fig.add_trace(go.Bar(x = top_title_15.index, y=top_title_15['count'], name='2015'), row=1, col=1) 

fig.add_trace(go.Bar(x = top_title_16.index, y=top_title_16['count'], name='2016'), row=1, col=2)

fig.add_trace(go.Bar(x = top_title_17.index, y=top_title_17['count'], name='2017'), row=2, col=1)

fig.add_trace(go.Bar(x = top_title_18.index, y=top_title_18['count'], name='2018'), row=2, col=2)

fig.add_trace(go.Bar(x = top_title_19.index, y=top_title_19['count'], name='2019'), row=3, col=1)

fig.add_trace(go.Bar(x = top_title_20.index, y=top_title_20['count'], name='2020'), row=3, col=2)

fig.update_layout(title_text='Top Reasons for 911 Calls', height=1500, template='plotly_white')

fig.show()

# Folium Map (Location)

In [None]:
map_plot_20 = data[['lat', 'lng', 'twp']].groupby('twp').mean().reset_index()
map_plot_20.head(15).style.background_gradient(cmap='Reds', subset=['lat'])\
                    .background_gradient(cmap='Greens', subset=['lng'])

In [None]:
map_data = folium.Map(location=(40.2547, -75.3405), zoom_start = 11, width='70%', max_zoom=11, min_zoom=11)

for lat, lng, twp in zip(map_plot_20.lat, map_plot_20.lng, map_plot_20.twp):
    # A Marker takes its colour from the icon; radius and color belong to CircleMarker
    folium.Marker(
        location=[lat, lng],
        popup=twp,
        icon=folium.Icon(color='red'),
    ).add_to(map_data)



#  display interactive map
display(map_data)

In [None]:
map_data = folium.Map(location=(40.2547, -75.3405), zoom_start = 11, width='70%', max_zoom=11, min_zoom=11)

# Deduplicate (lat, lng) as pairs so that every marker is a real call location
for lat, lng in data_20[['lat', 'lng']].drop_duplicates().itertuples(index=False):
    folium.CircleMarker(
        radius=0.7,
        location=[lat, lng],
        color='crimson',
        fill=True,
        fill_color='crimson'
    ).add_to(map_data)



#  display interactive map
display(map_data)
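
When plotting one marker per distinct location, latitude and longitude have to be deduplicated together. Calling `unique()` on each column separately and zipping the results pairs values by position, which can invent points that never occur in the data. A small sketch with hypothetical coordinates:

```python
import pandas as pd

# Hypothetical locations; the first and last rows are the same point
points = pd.DataFrame({
    'lat': [40.1, 40.1, 40.2, 40.1],
    'lng': [-75.3, -75.4, -75.3, -75.3],
})

# Pairs the i-th unique latitude with the i-th unique longitude
mismatched = list(zip(points['lat'].unique(), points['lng'].unique()))

# Deduplicates whole rows, so every pair really exists in the data
coords = list(points[['lat', 'lng']].drop_duplicates().itertuples(index=False, name=None))

print(mismatched)
print(coords)
```

Here the zipped version produces the pair (40.2, -75.4), which is not a row of `points`, and it misses two of the three real locations.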

# It's About Time!

In [None]:
dmap={0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}

mmap={1:'Jan',2:'Feb',3:'Mar',4:'Apr',5:'May',6:'Jun',7:'Jul',8:'Aug',9:'Sep',10:'Oct',11:'Nov',12:'Dec'}

data['month']=data['month'].map(mmap)

data['dayofweek']=data['dayofweek'].map(dmap)
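
Once the numbers are replaced by names, anything that sorts the labels (for example `groupby` or `sort_index`) orders them alphabetically, so 'Fri' comes before 'Mon'. One way around this, sketched here on a hypothetical series, is an ordered categorical that remembers the calendar order:

```python
import pandas as pd

day_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

# Hypothetical day labels, as they look after mapping numbers to names
days = pd.Series(['Wed', 'Mon', 'Sun', 'Mon', 'Fri'])

# Plain strings sort alphabetically
alphabetical = days.value_counts().sort_index().index.tolist()

# An ordered categorical sorts in the order given by `categories`
ordered_days = pd.Series(pd.Categorical(days, categories=day_order, ordered=True))
chronological = ordered_days.value_counts().sort_index()

print(alphabetical)
print(chronological)
```

Days that never occur still appear with a count of 0, which keeps the axis complete. The same idea applies to the month names.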

## 911 Calls During Week

In [None]:
data['dayofweek'].value_counts()

In [None]:
data['dayofweek'].value_counts().iplot(kind='bar', 
                                       color='red',
                                       gridcolor='white',
                                       linecolor='black',
                                       theme='pearl',
                                       title='Calls During Week',
                                       yTitle='Count',
                                       bargap=0.4,
                                       opacity=0.7,
                                       xTitle='Day of Week'
                                      )

In [None]:
plt.style.use('ggplot')
plt.subplots(figsize=(18, 8))
sb.countplot(x='dayofweek', hue='reason_cat', data=data)

In [None]:
dayHour=data.groupby(by=['dayofweek','hour']).count()['reason_cat'].unstack()

plt.figure(figsize=(18,5))
sb.heatmap(dayHour,cmap='viridis',linewidths=0.1, linecolor='#0f0f0f')

## 911 Calls During Month

In [None]:
data['month'].value_counts()

In [None]:
data['month'].value_counts().iplot(kind='bar', 
                                       color='green',
                                       gridcolor='white',
                                       linecolor='black',
                                       theme='pearl',
                                       bargap=0.3,
                                       opacity=0.7,
                                       title='Calls During Month',
                                       yTitle='Count',
                                       xTitle='Month'
                                      )

In [None]:
plt.style.use('ggplot')
plt.subplots(figsize=(18, 8))
sb.countplot(x='month', hue='reason_cat', data=data)

In [None]:
monthDay=data.groupby(by=['month','dayofweek']).count()['reason_cat'].unstack()

plt.figure(figsize=(18,5))
sb.heatmap(monthDay,cmap='viridis',linewidths=0.1, linecolor='#0f0f0f')

## 911 Calls Yearly

In [None]:
data['year'].value_counts()

In [None]:
data['year'].value_counts().iplot(kind='bar', 
                                       gridcolor='white',
                                       linecolor='black',
                                       theme='pearl',
                                       bargap=0.3,
                                       opacity=0.7,
                                       title='Yearly Calls',
                                       yTitle='Count',
                                       xTitle='Year'
                                      )

In [None]:
yearMonth=data.groupby(by=['year','month']).count()['reason_cat'].unstack()

plt.figure(figsize=(18,5))
sb.heatmap(yearMonth,cmap='viridis',linewidths=0.1, linecolor='#0f0f0f')

Here, the grey boxes mark year/month combinations for which there is no data.
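
The empty cells come from `unstack()`: a year/month combination with no calls becomes `NaN`, and the heatmap leaves `NaN` cells unpainted. A minimal sketch on a hypothetical frame:

```python
import pandas as pd

# Hypothetical calls: 2015 has December only, 2016 has January and December
calls = pd.DataFrame({
    'year':       [2015, 2016, 2016, 2016],
    'month':      [12, 1, 1, 12],
    'reason_cat': ['EMS', 'EMS', 'Fire', 'Traffic'],
})

# Rows are years, columns are months; missing combinations become NaN
year_month = calls.groupby(['year', 'month'])['reason_cat'].count().unstack()
missing_cells = year_month.isna()

print(year_month)
print(missing_cells)
```

If zeros are preferred over gaps, `unstack(fill_value=0)` fills them in, at the cost of no longer telling "no data" apart from "no calls".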

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#5642C5;
           font-size:110%;
           font-family:Verdana;
           letter-spacing:0.9px">
    <p style="padding: 10px;
              color:white;
              font-size:110%">
        If you like this notebook, please give it an <span style="color:#F28835;"><b><i>upvote</i></b></span> as it keeps me motivated to create more quality kernels.<br>Keep on Learning !!
    </p>
</div>