# 911 Calls Capstone Project

For this capstone project we will be analyzing some 911 call data from [Kaggle](https://www.kaggle.com/mchirico/montcoalert). The data contains the following fields:

* lat: String variable, Latitude
* lng: String variable, Longitude
* desc: String variable, Description of the Emergency Call
* zip: String variable, Zipcode
* title: String variable, Title
* timeStamp: String variable, YYYY-MM-DD HH:MM:SS
* twp: String variable, Township
* addr: String variable, Address
* e: String variable, Dummy variable (always 1)

Work through this notebook and complete the instructions or answer the questions in bold using your Python and data science skills.

## Data and Setup

____
**Import numpy, pandas, and the visualization libraries**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
df = pd.read_csv('911.csv')

FileNotFoundError: [Errno 2] No such file or directory: '911.csv'

In [None]:
df.info()

In [None]:
df.head()

In [None]:
df.isnull().sum()

In [None]:
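# Fill missing values with each column's most frequent value.
# Assigning back is safer than inplace=True on a selected column.
df['zip'] = df['zip'].fillna(df['zip'].mode()[0])
df['twp'] = df['twp'].fillna(df['twp'].mode()[0])
df['addr'] = df['addr'].fillna(df['addr'].mode()[0])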

In [None]:
df['zip'].value_counts().head(5)

In [None]:
df['twp'].value_counts().head(5)

In [None]:
df['title'].nunique()

In [None]:
df['reason'] = df['title'].apply(lambda x: x.split(':')[0])
df['reason']

## Data Loading and Preparation

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

df = pd.read_csv('911.csv')

df.info()
df.head()
df.isnull().sum()

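# Fill missing values with each column's most frequent value.
# Assigning back is safer than inplace=True on a selected column.
df['zip'] = df['zip'].fillna(df['zip'].mode()[0])
df['twp'] = df['twp'].fillna(df['twp'].mode()[0])
df['addr'] = df['addr'].fillna(df['addr'].mode()[0])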
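
The mode-based fill above can be tried on a small frame before running it on the full data. This is a minimal sketch on made-up rows (the `sample` frame and its values are assumptions, not taken from `911.csv`); it shows that `mode()` returns a Series, so `[0]` selects the first mode, and that `fillna` accepts a per-column dictionary.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the 911 data; these values are invented for illustration.
sample = pd.DataFrame({
    "zip": [19401.0, np.nan, 19401.0, 19464.0],
    "twp": ["NORRISTOWN", "POTTSTOWN", None, "NORRISTOWN"],
})

# mode() ignores missing values and returns a Series (ties are possible),
# so [0] picks the first most frequent value of each column.
fill_values = {col: sample[col].mode()[0] for col in ["zip", "twp"]}

# fillna with a dict fills each column with its own value and returns a new frame.
filled = sample.fillna(value=fill_values)

# Total number of missing cells left after the fill.
missing_after = int(filled.isnull().sum().sum())
print(filled)
print(missing_after)
```

Filling with the mode keeps every row but inflates the count of the most common zip code and township, which is worth keeping in mind when reading the top-5 tables.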

## Analyzing Call Reasons

In [None]:
df['zip'].value_counts().head(5)
df['twp'].value_counts().head(5)
df['title'].nunique()

df['reason'] = df['title'].apply(lambda x: x.split(':')[0])
df['reason'].value_counts()

sns.countplot(x='reason',data=df)
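
As an alternative to the `apply` call above, the same split can be written with the pandas string accessor. The sketch below uses hypothetical titles in the `<category>: <detail>` shape that the split relies on; the specific title strings are assumptions for illustration.

```python
import pandas as pd

# Hypothetical titles; only the "<category>: <detail>" shape matters here.
titles = pd.Series([
    "EMS: BACK PAINS/INJURY",
    "Fire: GAS-ODOR/LEAK",
    "Traffic: VEHICLE ACCIDENT -",
    "EMS: CARDIAC EMERGENCY",
])

# Vectorized equivalent of titles.apply(lambda x: x.split(':')[0]).
reason = titles.str.split(":").str[0]

# Count how often each category occurs.
counts = reason.value_counts()
print(counts)
```

Unlike the lambda, the `.str` accessor passes missing titles through as missing values instead of raising an error.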

## Time-based Analysis

In [None]:
type(df['timeStamp'].iloc[0])
df['timeStamp'] = pd.to_datetime(df['timeStamp'])
type(df['timeStamp'].iloc[0])

time = df['timeStamp'].iloc[0]
time.hour

df['hour'] = df['timeStamp'].apply(lambda time: time.hour)
df['month'] = df['timeStamp'].apply(lambda time: time.month)
df['day of week'] = df['timeStamp'].apply(lambda time: time.dayofweek)
dmap = ({0:'Mon', 1:'Tue', 2:'Wed', 3:'Thu', 4:'Fri', 5:'Sat', 6:'Sun'})
df['day of week'] = df['day of week'].map(dmap)

df.head()
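
The three `apply` calls above can also be expressed with the `.dt` accessor once the column holds datetimes. This is a small sketch on two invented timestamps in the `YYYY-MM-DD HH:MM:SS` format listed in the field description; the `stamps` frame is an assumption for illustration.

```python
import pandas as pd

# Two made-up timestamps in the documented YYYY-MM-DD HH:MM:SS format.
stamps = pd.DataFrame({"timeStamp": ["2015-12-10 17:40:00", "2015-12-13 08:05:00"]})
stamps["timeStamp"] = pd.to_datetime(stamps["timeStamp"], format="%Y-%m-%d %H:%M:%S")

# .dt exposes datetime components for the whole column at once.
stamps["hour"] = stamps["timeStamp"].dt.hour
stamps["month"] = stamps["timeStamp"].dt.month

# dayofweek is 0 for Monday through 6 for Sunday, matching the dmap used above.
dmap = {0: "Mon", 1: "Tue", 2: "Wed", 3: "Thu", 4: "Fri", 5: "Sat", 6: "Sun"}
stamps["day of week"] = stamps["timeStamp"].dt.dayofweek.map(dmap)
print(stamps)
```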

## Visualizing Time-based Data

In [None]:
sns.countplot(x='day of week', data=df, hue='reason')

# Start a new figure so the second plot is not drawn over the first
plt.figure()
sns.countplot(x='month', data=df, hue='reason', palette='viridis')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

## Analyzing Calls per Month and Date

In [None]:
byMonth = df.groupby('month').count()
byMonth.head()
byMonth['lat'].plot()

sns.lmplot(x='month',y='twp',data=byMonth.reset_index())
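
One detail of the aggregation above: `groupby(...).count()` reports the number of non-null values per column, so the column chosen afterwards (`lat` or `twp`) can change the result if that column has gaps. The sketch below contrasts it with `size()`, which counts rows per group. The `calls` frame is synthetic and only meant to show the difference.

```python
import numpy as np
import pandas as pd

# Synthetic rows with deliberate gaps in 'lat' and 'twp'.
calls = pd.DataFrame({
    "month": [1, 1, 2, 2, 2],
    "lat": [40.1, 40.2, 40.3, np.nan, 40.5],
    "twp": ["A", None, "B", "B", "C"],
})

# count(): non-null values per column within each month.
per_column = calls.groupby("month").count()

# size(): number of rows within each month, regardless of missing values.
per_group = calls.groupby("month").size()

print(per_column)
print(per_group)
```

If the goal is simply the number of calls per month, `size()` avoids depending on any particular column being complete.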

In [None]:
df['Date']=df['timeStamp'].apply(lambda time: time.date())
df['Date'].head()

byDate = df.groupby('Date').count()
byDate['lat'].plot()
plt.tight_layout()

## Analyzing Specific Call Reasons Over Time

In [None]:
# Open a new figure for each reason so the three series are not overlaid on the same axes
plt.figure()
df[df['reason'] == 'Traffic'].groupby('Date').count()['lat'].plot()
plt.title('Traffic')
plt.tight_layout()

plt.figure()
df[df['reason'] == 'Fire'].groupby('Date').count()['lat'].plot()
plt.title('Fire')
plt.tight_layout()

plt.figure()
df[df['reason'] == 'EMS'].groupby('Date').count()['lat'].plot()
plt.title('EMS')
plt.tight_layout()
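
The three filtered group-bys above can also be computed in one step as a date-by-reason table. The following is a sketch on invented rows (the `calls` frame and its dates are assumptions), using `pd.crosstab` to count calls per date and reason.

```python
import datetime as dt

import pandas as pd

# Invented calls: three on one day, two on the next.
calls = pd.DataFrame({
    "Date": [
        dt.date(2015, 12, 10), dt.date(2015, 12, 10), dt.date(2015, 12, 10),
        dt.date(2015, 12, 11), dt.date(2015, 12, 11),
    ],
    "reason": ["EMS", "EMS", "Fire", "Traffic", "EMS"],
})

# Rows are dates, columns are reasons, cells are call counts (0 where none occurred).
daily = pd.crosstab(calls["Date"], calls["reason"])
print(daily)
```

Each column of `daily` is then a per-reason time series, so `daily['Traffic'].plot()` would give the same kind of line as the filtered version, with zero-call days included explicitly.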

## Heatmaps and Clustermaps

In [None]:
dayHour = df.groupby(by=['day of week','hour']).count()['reason'].unstack()
dayHour.head()

sns.heatmap(dayHour,cmap='viridis')
sns.clustermap(dayHour,cmap='viridis')

In [None]:
dayMonth = df.groupby(by=['day of week','month']).count()['reason'].unstack()
dayMonth.head()

sns.heatmap(dayMonth,cmap='viridis')
sns.clustermap(dayMonth,cmap='viridis')
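
Because `day of week` holds strings, the unstacked tables above come out in alphabetical row order (Fri, Mon, Sat, ...). If calendar order is preferred for the heatmap, the rows can be reindexed first. This is a minimal sketch on synthetic rows; the `calls` frame and the `order` list are assumptions for illustration.

```python
import pandas as pd

# Synthetic calls covering only some weekdays and hours.
calls = pd.DataFrame({
    "day of week": ["Wed", "Mon", "Sun", "Mon", "Fri", "Wed"],
    "hour": [9, 9, 17, 17, 9, 9],
})

# Desired calendar order for the heatmap rows.
order = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Rows per (day, hour) pair, reshaped so days are rows and hours are columns.
day_hour = calls.groupby(["day of week", "hour"]).size().unstack(fill_value=0)

# Reorder the rows; days with no calls in this sample become rows of zeros.
day_hour = day_hour.reindex(order, fill_value=0)
print(day_hour)
```

The reordered frame can be passed to `sns.heatmap` as before. `sns.clustermap` reorders rows and columns by similarity, so the calendar order only carries over to the plain heatmap.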
