# 911 Calls Capstone Project

This project analyzes 911 call data from [Kaggle](https://www.kaggle.com/mchirico/montcoalert). The data contains the following fields:

* lat: String variable, Latitude
* lng: String variable, Longitude
* desc: String variable, Description of the Emergency Call
* zip: String variable, Zipcode
* title: String variable, Title
* timeStamp: String variable, YYYY-MM-DD HH:MM:SS
* twp: String variable, Township
* addr: String variable, Address
* e: String variable, Dummy variable (always 1)

## Data and Setup

____
**Import numpy and pandas**

In [None]:
import numpy as np
import pandas as pd

**Import visualization libraries and set %matplotlib inline.**

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

**Read in the csv file as a dataframe called df**

In [None]:
df=pd.read_csv("911.csv")

**Check the info() of the df**

In [None]:
df.info()

**Check the head of df**

In [None]:
df.head()

Top 5 zipcodes for 911 calls

In [None]:
df['zip'].value_counts().head(5)

Top 5 townships (twp) for 911 calls

In [None]:
df['twp'].value_counts().head(5)

Number of unique title codes

In [None]:
df['title'].nunique()

## Creating new features

Used a lambda expression to create a new column called "Reason" that contains the category prefix of the title, i.e. the text before the colon.

**For example, if the title column value is EMS: BACK PAINS/INJURY, the Reason column value would be EMS.**

In [None]:
df['Reason']=df['title'].apply(lambda x: x.split(':')[0])
df
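
As a possible alternative to the lambda, the same split can be done with pandas' vectorized string methods, which also pass missing titles through as missing values instead of raising an error. This is a minimal sketch on made-up titles (the `titles` Series is an assumption standing in for `df['title']`):

```python
import pandas as pd

# Made-up titles in the same "CATEGORY: detail" shape as the title column.
titles = pd.Series([
    "EMS: BACK PAINS/INJURY",
    "Fire: GAS-ODOR/LEAK",
    "Traffic: VEHICLE ACCIDENT -",
    None,  # a missing title, to show that it stays missing
])

# Split at the first colon only and keep the part before it.
reason = titles.str.split(":", n=1).str[0]
print(reason)
```

On the real data this would read `df['Reason'] = df['title'].str.split(':', n=1).str[0]`.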

Most common Reason for a 911 call based off of this new column

In [None]:
df['Reason'].value_counts()

Used seaborn to create a countplot of 911 calls by Reason.

In [None]:
sns.set_style("darkgrid")
sns.countplot(x='Reason',data=df)
sns.despine()

Data type of the objects in the timeStamp column

In [None]:
type(df['timeStamp'].iloc[0])

Used [pd.to_datetime](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html) to convert the column from strings to DateTime objects.

In [None]:
df['timeStamp']=pd.to_datetime(df['timeStamp'])

Used .apply() to create 3 new columns called Hour, Month, and Day of Week.

In [None]:
df['Hour']=df['timeStamp'].apply(lambda time: time.hour)
df['Month']=df['timeStamp'].apply(lambda time: time.month)
df['Day of Week']=df['timeStamp'].apply(lambda time: time.dayofweek)
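
Once the column holds datetimes, the same three features can also be read through the `.dt` accessor, without a lambda per row. A small sketch on made-up timestamps (the `ts` Series is an assumption standing in for `df['timeStamp']`):

```python
import pandas as pd

# Made-up timestamps in the YYYY-MM-DD HH:MM:SS format described above.
ts = pd.to_datetime(pd.Series([
    "2015-12-10 17:40:00",  # a Thursday
    "2016-01-02 08:15:30",  # a Saturday
]))

# Each .dt attribute returns a whole column at once.
features = pd.DataFrame({
    "Hour": ts.dt.hour,
    "Month": ts.dt.month,
    "Day of Week": ts.dt.dayofweek,  # Monday=0 ... Sunday=6
})
print(features)
```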

Used .map() with this dictionary to replace the day-of-week integers with their string names:

In [None]:
dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}

In [None]:
df['Day of Week']=df['Day of Week'].map(dmap)
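
For comparison, pandas can produce the day names directly from a datetime column with `dt.day_name()`. The sketch below, on made-up timestamps, checks that truncating those names to three letters gives the same labels as the dictionary mapping:

```python
import pandas as pd

dmap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}

# Made-up timestamps: a Thursday, a Saturday and a Sunday.
ts = pd.to_datetime(pd.Series([
    "2015-12-10 17:40:00",
    "2016-01-02 08:15:30",
    "2016-01-03 23:05:00",
]))

via_map = ts.dt.dayofweek.map(dmap)      # integer -> name through the dictionary
via_name = ts.dt.day_name().str[:3]      # "Thursday" -> "Thu", etc.
print(via_map.tolist(), via_name.tolist())
```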

Used seaborn to create a countplot of the Day of Week column with the hue based off of the Reason column.

In [None]:
sns.countplot(x='Day of Week',data=df, hue='Reason',palette='viridis')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

Created the same countplot for Month.

In [None]:
sns.countplot(x='Month',data=df, hue='Reason',palette='viridis')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

Created a DataFrame called byMonth by grouping df by the Month column and aggregating with the count() method. Each cell holds the number of non-null values in that column for the month.

In [None]:
byMonth=df.groupby('Month').count()
byMonth.head()
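
Because `count()` skips missing values, the per-month numbers can differ from column to column, whereas `size()` counts rows regardless of missing data. A minimal sketch on a made-up frame (the `calls` data is hypothetical) shows the difference:

```python
import numpy as np
import pandas as pd

# Made-up calls with some missing township and zip values.
calls = pd.DataFrame({
    "Month": [1, 1, 1, 2, 2],
    "twp": ["A", None, "B", "A", "C"],
    "zip": [19401.0, np.nan, np.nan, 19446.0, 19401.0],
})

by_month_count = calls.groupby("Month").count()  # non-null values per column
by_month_size = calls.groupby("Month").size()    # rows per month

print(by_month_count)
print(by_month_size)
```

Here month 1 has three rows, but `count()` reports 2 for `twp` and 1 for `zip`, so the column chosen for plotting affects the totals.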

Created a simple line plot from the DataFrame showing the count of calls per month.

In [None]:
byMonth['twp'].plot()

Used seaborn's lmplot() to create a linear fit on the number of calls per month.

In [None]:
sns.lmplot(x='Month',y='twp',data=byMonth.reset_index())
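
`lmplot()` draws the fitted line but does not report its coefficients. One way to get comparable numbers, assuming an ordinary least-squares straight line is what is wanted, is `np.polyfit` with degree 1. The monthly counts below are made up and exactly linear so the result is easy to check:

```python
import numpy as np

# Made-up monthly call counts that fall by 4 calls each month.
months = np.array([1, 2, 3, 4, 5, 6])
counts = np.array([100, 96, 92, 88, 84, 80])

# Degree-1 polynomial fit: returns the slope first, then the intercept.
slope, intercept = np.polyfit(months, counts, deg=1)
print(slope, intercept)
```

On the real data the inputs would be `byMonth.index` and `byMonth['twp']`.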

Created a new column called 'Date' that contains the date from the timeStamp column.

In [None]:
df['Date']=df['timeStamp'].apply(lambda dte: dte.date())

Used groupby on the Date column with the count() aggregate and created a plot of the number of 911 calls per day.

In [None]:
byDate=df.groupby('Date').count()
byDate['twp'].plot()
plt.tight_layout()

Recreated this plot with 3 separate plots with each plot representing a Reason for the 911 call.

In [None]:
df[df['Reason']=='Traffic'].groupby('Date').count()['twp'].plot()
plt.tight_layout()
plt.title("Traffic")

In [None]:
df[df["Reason"]=='Fire'].groupby('Date').count()['twp'].plot()
plt.tight_layout()
plt.title("Fire")

In [None]:
df[df["Reason"]=='EMS'].groupby('Date').count()['twp'].plot()
plt.tight_layout()
plt.title("EMS")
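
The three filtered groupbys can also be collapsed into one table with a column per Reason, which avoids repeating the filter and title by hand. A sketch on made-up data (the `calls` frame is hypothetical and only mirrors the Date and Reason columns):

```python
import pandas as pd

# Made-up calls across two dates.
calls = pd.DataFrame({
    "Date": pd.to_datetime([
        "2016-01-01", "2016-01-01", "2016-01-01", "2016-01-02", "2016-01-02",
    ]).date,
    "Reason": ["EMS", "EMS", "Fire", "Traffic", "EMS"],
})

# Count rows per (Date, Reason) pair, then spread the reasons into columns.
# fill_value=0 marks dates on which a reason had no calls.
per_reason = calls.groupby(["Date", "Reason"]).size().unstack(fill_value=0)
print(per_reason)
```

Calling `per_reason.plot(subplots=True)` on such a table should then draw one panel per reason, each labelled with its own column name.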

Created heatmaps with seaborn. First, restructured the DataFrame so that the columns are the Hours and the index is the Day of Week, using groupby followed by the [unstack](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.unstack.html) method.

In [None]:
dayHour = df.groupby(by=['Day of Week','Hour']).count()['Reason'].unstack()
dayHour.head()
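
Since the day names are strings, groupby sorts the index alphabetically (Fri, Mon, Sat, ...), and a heatmap built from it lists the days in that order. If calendar order is preferred, the rows can be reindexed first. A minimal sketch on made-up data (the `calls` frame and `day_order` list are assumptions for illustration):

```python
import pandas as pd

# Made-up calls on three different weekdays.
calls = pd.DataFrame({
    "Day of Week": ["Mon", "Fri", "Fri", "Wed", "Mon"],
    "Hour": [8, 8, 17, 17, 8],
    "Reason": ["EMS", "Fire", "EMS", "Traffic", "EMS"],
})

day_hour = calls.groupby(by=["Day of Week", "Hour"]).count()["Reason"].unstack()

# Reorder the rows from alphabetical to calendar order,
# keeping only the days that actually occur in the data.
day_order = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
present = [d for d in day_order if d in day_hour.index]
day_hour_ordered = day_hour.reindex(present)
print(day_hour_ordered)
```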

Created a heatmap using this new DataFrame.

In [None]:
sns.heatmap(dayHour,linecolor='white',linewidths=1)

Created a clustermap using this DataFrame.

In [None]:
sns.clustermap(dayHour)

Repeated the same operations and plots for a DataFrame that has the Month as the columns.

In [None]:
dayMonth=df.groupby(by=['Day of Week','Month']).count()['Reason'].unstack()
dayMonth.head()

In [None]:
sns.heatmap(dayMonth, linecolor='white',linewidths=1)

In [None]:
sns.clustermap(dayMonth)

**Continue exploring the Data however you see fit!**
# Great Job!