# 911 Calls Exploratory Data Analysis

*If you have any comments/suggestions feel free to comment below.*

*As the saying goes, 'There is always scope for improvement' (This saying is wholly true for me because I'm new to data science 😃)*

The data contains:

* lat: String variable, Latitude
* lng: String variable, Longitude
* desc: String variable, Description of the Emergency Call
* zip: String variable, Zipcode
* title: String variable, Title
* timeStamp: String variable, YYYY-MM-DD HH:MM:SS
* twp: String variable, Township
* addr: String variable, Address
* e: String variable, Dummy variable (always 1)
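
The list above describes every column as a string, but `pd.read_csv` infers a type for each column when it loads the file. Here is a minimal sketch using two made-up rows in the same column layout (the values are illustrative, not taken from the dataset) to show which columns are likely to come back as numbers:

```python
import io
import pandas as pd

# Two hypothetical rows laid out like 911.csv; the second row has no zipcode.
csv_text = """lat,lng,desc,zip,title,timeStamp,twp,addr,e
40.2978,-75.5812,Sample description A,19525,EMS: BACK PAINS/INJURY,2015-12-10 17:40:00,NEW HANOVER,SAMPLE RD,1
40.2580,-75.2646,Sample description B,,Fire: GAS-ODOR/LEAK,2015-12-10 17:45:00,HATFIELD,EXAMPLE AVE,1
"""
sample = pd.read_csv(io.StringIO(csv_text))

# lat/lng are parsed as floats, zip becomes float because of the missing value,
# e is an integer, and timeStamp stays as text until it is converted explicitly.
print(sample.dtypes)
```

`df.info()` further down shows the types pandas actually inferred for the real file.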

## Data and Setup

Importing necessary visualization libraries and reading the dataset.

In [None]:
import numpy as np
import pandas as pd

In [None]:
%matplotlib inline
import matplotlib.pyplot as plot
import seaborn as sns
sns.set_style('whitegrid')


In [None]:
plot.rcParams["figure.figsize"]=(30,15)
SMALL_SIZE = 20
MEDIUM_SIZE = 22
BIG_SIZE = 25

plot.rc('font', size=BIG_SIZE)          # controls default text sizes
plot.rc('axes', titlesize=BIG_SIZE)     # fontsize of the axes title
plot.rc('axes', labelsize=BIG_SIZE)    # fontsize of the x and y labels
plot.rc('xtick', labelsize=SMALL_SIZE)    # fontsize of the tick labels
plot.rc('ytick', labelsize=SMALL_SIZE)    # fontsize of the tick labels
plot.rc('legend', fontsize=MEDIUM_SIZE)    # legend fontsize
plot.rc('figure', titlesize=BIG_SIZE)  # fontsize of the figure title

In [None]:
df=pd.read_csv("../input/montcoalert/911.csv")

In [None]:
df.info()

In [None]:
df.head()

## Exploring the basics of the dataset

**The top 5 zipcodes for 911 calls (and crimes ☠️)**

In [None]:
df['zip'].value_counts().head(5)

**The top 5 townships (twp) for 911 calls (and crimes ☠️)**

In [None]:
df['twp'].value_counts().head(5)

**The top 5 addresses (addr) for 911 calls (and crimes ☠️)**

In [None]:
df['addr'].value_counts().head(5)
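
`value_counts()` sorts by frequency in descending order, which is why `head(5)` gives the top 5. A small sketch on a made-up Series (the township labels are placeholders) that also shows `normalize=True`, in case shares are more useful than raw counts:

```python
import pandas as pd

# Hypothetical township labels, only to illustrate the call.
twp = pd.Series(['A', 'B', 'A', 'C', 'A', 'B'])

top = twp.value_counts().head(2)                   # raw counts, most frequent first
share = twp.value_counts(normalize=True).head(2)   # same ranking, as fractions of all rows

print(top)
print(share)
```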

## Creating new features from the dataset

At a quick glance, we can see that the department and the injury type (the reason for the call) are both in the 'title' column, structured as **Department:Injury type**.

Let's extract Dept of call from the 'title' column.

In [None]:
def extract(Title):
   new=Title.split(':',1)
   return new[0]

df['Dept']=df.apply(lambda x:extract(x["title"]),axis=1)

Now, let's extract the injury type from the 'title' column.

In [None]:
def extract2(Title):
   new=Title.split(':',1)
   return new[1]

df['ExactR']=df.apply(lambda x:extract2(x["title"]),axis=1) # extract2 returns the part after the colon
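
Both helpers split each title on the first colon. As an alternative sketch (not what the notebook uses), pandas can do the same split for the whole column at once with `str.split`. The sample titles below are made up in the **Department:Injury type** shape, and the `str.strip()` call is an extra step to drop the space after the colon:

```python
import pandas as pd

# Hypothetical titles in the "Department: reason" shape.
titles = pd.Series([
    'EMS: BACK PAINS/INJURY',
    'Fire: GAS-ODOR/LEAK',
    'Traffic: VEHICLE ACCIDENT -',
])

# Split on the first colon only; expand=True returns one column per part.
parts = titles.str.split(':', n=1, expand=True)
dept = parts[0]
reason = parts[1].str.strip()  # remove the leading space left by the split

print(dept.tolist())
print(reason.tolist())
```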

## Let's plot these new features now

**A countplot of 911 calls by Departments**

In [None]:
fig,ax = plot.subplots()
b=sns.countplot(x='Dept',data=df,palette='Set1',ax=ax) # pass the column by keyword; newer seaborn versions reject it as a positional argument
b.axes.set_title("Dept vs Number of calls")
b.set_xlabel("Department")
b.set_ylabel("Number of calls")
plot.savefig('noofcalls.jpg', format='jpeg', dpi=70)


**A countplot of top 5 reasons for 911 calls**

In [None]:
fig,ax = plot.subplots()
b=sns.countplot(x=df['ExactR'],palette='Set1',data=df,order=df.ExactR.value_counts().iloc[:5].index,ax=ax)
b.axes.set_title("Reasons vs Number of calls")
b.set_xlabel("Reason")
b.set_ylabel("Number of calls")
plot.savefig('noofcallsbyreason.jpg', format='jpeg', dpi=70)

___
**Now let us begin to focus on the time of these calls. What is the data type of the objects in the timeStamp column?**

In [None]:
type(df['timeStamp'].iloc[0])

**Let's convert these values (which are strings stored under the object dtype) to the datetime format**

In [None]:
df['timeStamp']=pd.to_datetime(df['timeStamp'])
df['timeStamp']
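
To see what the conversion buys us, here is a short sketch on two made-up timestamps in the same `YYYY-MM-DD HH:MM:SS` format. Once the column is a datetime column, the `.dt` accessor exposes the hour, month and weekday name for every row:

```python
import pandas as pd

# Hypothetical timestamps in the dataset's text format.
stamps = pd.Series(['2015-12-10 17:40:00', '2016-01-02 03:05:00'])
parsed = pd.to_datetime(stamps)

hours = parsed.dt.hour        # hour of day, 0-23
months = parsed.dt.month      # month number, 1-12
days = parsed.dt.day_name()   # full weekday name

print(hours.tolist(), months.tolist(), days.tolist())
```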

**You can now grab specific attributes from a datetime object by accessing them. For example:**

    time = df['timeStamp'].iloc[0]
    time.hour

**Creating 3 new columns called Hour, Month, and Day (day of the week) to better compare and analyze the data. The day, month and hour of each call matter if we want to analyze the data better and work towards reducing the number of calls (and mishaps/accidents).**

In [None]:
df['Hour']=df['timeStamp'].dt.hour
df['Month']=df['timeStamp'].dt.month
df['Day']=df['timeStamp'].dt.day_name()

In [None]:
df.info()

I have converted weekdays to their shorter forms for better labelling in plots.

In [None]:
df['Day'] = df['Day'].map({'Monday': 'Mon', 'Tuesday': 'Tue','Wednesday': 'Wed','Thursday': 'Thu','Friday': 'Fri','Saturday': 'Sat','Sunday': 'Sun'}).astype(str)
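
The abbreviated names are plain strings, so pandas sorts them alphabetically (Fri, Mon, Sat, ...) rather than in calendar order. If a Monday-to-Sunday order is wanted for a table or a plot, one option is to reindex the counts against an explicit list. This is a sketch on made-up values, and `week_order` is a name introduced here for illustration:

```python
import pandas as pd

# Hypothetical abbreviated weekdays.
days = pd.Series(['Fri', 'Mon', 'Sun', 'Mon', 'Fri', 'Fri'])
week_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

# Count the calls per day, then put the rows in calendar order.
# fill_value=0 keeps weekdays that never appear in the data.
counts = days.value_counts().reindex(week_order, fill_value=0)
print(counts)
```

The same `week_order` list could be passed as `order=` to `sns.countplot` if a calendar ordering is preferred over ordering by count.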

**911 calls on days of the week**

In [None]:
fig,ax=plot.subplots()
sns.countplot(x='Day',hue='Dept',data=df,palette='Set1',order=df['Day'].value_counts().index,ax=ax) # Weekdays ordered by number of calls, highest first
plot.legend(bbox_to_anchor=(1.05, 1),loc=2, borderaxespad=0)

Interesting, Fridays are at the number 1 position for 911 calls. (Freaky Friday for real 😨.)

Let's now see whether this trend continues if we split the data into different seasons of the year.

In [None]:
fig,ax=plot.subplots(ncols=3)
sns.countplot(x=df['Day'],palette='Set1',order = df['Day'].value_counts().index, ax=ax[0]).set_title('All seasons')
sns.countplot(x=df['Day'][df['Month'].isin([10,11,12,1,2,3])],palette='Set1',order = df['Day'].value_counts().index,ax=ax[1]).set_title('Winter')
sns.countplot(x=df['Day'][df['Month'].isin([4,5,6,7,8,9])],palette='Set1',order = df['Day'].value_counts().index,ax=ax[2]).set_title('Summer')

Even after separating the data into seasons (summer/winter), Fridays still have a higher than usual count of 911 calls.
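
The season split relies on `isin` to build a True/False mask from the month numbers. A minimal sketch of the same idea on made-up months, using the notebook's own definition of winter (October to March) and summer (April to September); the `season` labels are added here only for illustration:

```python
import pandas as pd

# Hypothetical month numbers.
months = pd.Series([1, 4, 7, 10, 12, 9])
winter_months = [10, 11, 12, 1, 2, 3]

# True where the month falls in the winter half of the year.
is_winter = months.isin(winter_months)

# Turn the mask into a readable label per row.
season = is_winter.map({True: 'Winter', False: 'Summer'})
print(season.tolist())
```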

*The data suggests that if the police want to boost patrols on one chosen weekday and get the most out of them (crimes vs patrols), that day should be Friday.*

**911 calls by Month:**

In [None]:
fig,ax=plot.subplots()
sns.countplot(x='Month',hue='Dept',data=df,palette='Set1',ax=ax)
plot.legend(bbox_to_anchor=(1.05,1), loc=2, borderaxespad=0)

Let's try another way of analyzing the data and plot it according to months.

**We group the DataFrame by the Month column, use the count() method for aggregation, and store the result in bymonth.**

In [None]:
bymonth=df.groupby('Month')['Dept'].count() # Using only one column to save space and reduce redundancy
bymonth
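
One detail worth keeping in mind: `count()` only counts non-missing values, so the result depends on which column is picked. A sketch on a made-up frame where the zipcode is sometimes missing, comparing `count()` with `size()`, which counts rows regardless of missing values:

```python
import pandas as pd

# Hypothetical rows; NaN marks a missing zipcode.
calls = pd.DataFrame({
    'Month': [1, 1, 2, 2, 2],
    'zip': [19401.0, float('nan'), 19446.0, float('nan'), float('nan')],
})

by_count = calls.groupby('Month')['zip'].count()  # non-missing zip values per month
by_size = calls.groupby('Month').size()           # all rows per month

print(by_count.tolist(), by_size.tolist())
```

Counting a column without missing values, as done with 'Dept' above, gives the same numbers as `size()`.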

**A simple plot of the count of calls per month**

In [None]:
bymonth.plot()

The number of calls goes up as the season changes from summer to winter and falls back to its lowest around September (the 9th month).

**Let's use seaborn's lmplot() to create a linear fit on the number of calls per month. We need to reset the index because Month is the index of bymonth, not a column yet.**

In [None]:
sns.lmplot(x='Month',y='Dept',data=bymonth.reset_index(),height=8,aspect=2)
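
The plot draws a fitted straight line but does not report its slope. If the number itself is of interest, a least-squares line can be fitted directly with `np.polyfit`. This is a sketch on made-up monthly counts that fall by exactly 20 calls per month, so the expected slope is easy to check by eye; with the real data the input would be the month numbers and the values of `bymonth`:

```python
import numpy as np

# Hypothetical counts: 1000 calls minus 20 for each month number.
months = np.arange(1, 13)
calls = 1000 - 20 * months

# Degree-1 polynomial fit returns the slope first, then the intercept.
slope, intercept = np.polyfit(months, calls, deg=1)
print(round(slope, 2), round(intercept, 2))
```

A negative slope means fewer calls in later months of the year, and a positive one means more.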

**The linear regression model suggests a similar story.** 

"*In winters not only the weather becomes cold, people do too.*"

Let's split the dates from 'timeStamp' for further analysis.

**Created a new column called 'Date' that contains the date from the timeStamp column.** 

In [None]:
df['Date']=df['timeStamp'].dt.date

**Grouping by this Date column with the count() aggregate and creating a plot of counts of 911 calls.**

In [None]:
df.groupby('Date').count()['title'].plot()
plot.title('All calls to 911')
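
What the grouping does is easier to see on a handful of rows. A sketch with three made-up timestamps, two of them on the same day:

```python
import pandas as pd

# Hypothetical call times: two on 10 December, one just after midnight on the 11th.
ts = pd.to_datetime(pd.Series([
    '2015-12-10 17:40:00',
    '2015-12-10 23:59:00',
    '2015-12-11 00:01:00',
]))

# Drop the time of day, then count the rows that share a calendar date.
per_day = ts.groupby(ts.dt.date).size()
print(per_day)
```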

**Calls due to Traffic, according to date**

In [None]:
df[df['Dept']=='Traffic'].groupby('Date').count()['Dept'].plot()
plot.title('911 calls due to traffic')

**Calls due to Fire, according to date**

In [None]:
df[df['Dept']=='Fire'].groupby('Date').count()['Dept'].plot()
plot.title('911 calls due to Fire')

**Calls to EMS, according to date**

In [None]:
df[df['Dept']=='EMS'].groupby('Date').count()['Dept'].plot()
plot.title('911 calls to Emergency Medical Services')
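
The three plots above filter the DataFrame once per department. As an alternative sketch (not the approach used in this notebook), `pd.crosstab` can build a single Date by Dept table of counts, from which each department is just a column. The rows below are made up:

```python
import pandas as pd

# Hypothetical calls with a date and a department each.
calls = pd.DataFrame({
    'Date': ['2015-12-10', '2015-12-10', '2015-12-11', '2015-12-11'],
    'Dept': ['EMS', 'Fire', 'EMS', 'EMS'],
})

# Rows are dates, columns are departments, cells are call counts (0 if none).
table = pd.crosstab(calls['Date'], calls['Dept'])
print(table)
```

With the real data, `table['EMS'].plot()` would give a per-day line similar to the EMS plot above.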

____
**Now let's move on to creating heatmaps with seaborn and our data. We'll first need to restructure the dataframe so that the columns become the Hours and the Index becomes the Day of the Week. We do this by combining groupby with the [unstack](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.unstack.html) method.**

Note that without unstack, the result is a single column of counts with every (Day, Hour) pair stacked along axis 0. For example:

In [None]:
df.groupby(by=['Day','Hour']).count()['Dept']

## Heatmaps and Clustermaps for a fresh perspective on the data

Create dataframe using groupby on Days and Hours and using the unstack() method.

In [None]:
de=df.groupby(by=['Day','Hour']).count()['Dept'].unstack()
de.head()
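
Here is the same reshaping on four made-up calls, small enough to check by hand. `size()` is used instead of `count()['Dept']` to keep the sketch short, and `fill_value=0` is an extra choice so that Day/Hour pairs with no calls show 0 rather than NaN:

```python
import pandas as pd

# Hypothetical calls: three on Friday (two at 9h, one at 17h), one on Monday at 17h.
calls = pd.DataFrame({
    'Day': ['Fri', 'Fri', 'Mon', 'Fri'],
    'Hour': [9, 9, 17, 17],
})

# Long form: one count per (Day, Hour) pair, stacked along the index.
stacked = calls.groupby(['Day', 'Hour']).size()

# Wide form: Hour moves from the index to the columns.
wide = stacked.unstack(fill_value=0)
print(wide)
```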

**A HeatMap of Days vs Hours**

In [None]:

sns.heatmap(de,cmap='summer_r')

**A clustermap using this DataFrame**

In [None]:
sns.clustermap(de,cmap='summer_r')

Creating a dataframe using groupby on Days and Months

In [None]:
dm=df.groupby(by=['Day','Month']).count()['Dept'].unstack()
dm.head()

**Heatmap of Days vs Months.**

In [None]:
sns.heatmap(dm,cmap='summer_r')

**A cluster map according to months.**

In [None]:
sns.clustermap(dm,cmap='summer_r')

**Thank you for your time. I hope this was a useful exploratory data analysis.**



*As the saying goes, 'There is always scope for improvement' (Haha this saying is wholly true for me because I'm spanking new to this ☺️)*