<a href="https://colab.research.google.com/github/Sriram-Sudharsan/911DataAnalytics/blob/main/EDA_911.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [70]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline 
import seaborn as sns


In [71]:
from google.colab import drive
drive.mount("/content/gdrive")

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [72]:
#Reading the data from the csv file
df=pd.read_csv("/content/gdrive/My Drive/Dataset/911.csv")
df

Unnamed: 0,lat,lng,desc,zip,title,timeStamp,twp,addr,e
0,40.297876,-75.581294,REINDEER CT & DEAD END; NEW HANOVER; Station ...,19525.0,EMS: BACK PAINS/INJURY,2015-12-10 17:10:52,NEW HANOVER,REINDEER CT & DEAD END,1
1,40.258061,-75.264680,BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP...,19446.0,EMS: DIABETIC EMERGENCY,2015-12-10 17:29:21,HATFIELD TOWNSHIP,BRIAR PATH & WHITEMARSH LN,1
2,40.121182,-75.351975,HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...,19401.0,Fire: GAS-ODOR/LEAK,2015-12-10 14:39:21,NORRISTOWN,HAWS AVE,1
3,40.116153,-75.343513,AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;...,19401.0,EMS: CARDIAC EMERGENCY,2015-12-10 16:47:36,NORRISTOWN,AIRY ST & SWEDE ST,1
4,40.251492,-75.603350,CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S...,,EMS: DIZZINESS,2015-12-10 16:56:52,LOWER POTTSGROVE,CHERRYWOOD CT & DEAD END,1
...,...,...,...,...,...,...,...,...,...
663517,40.157956,-75.348060,SUNSET AVE & WOODLAND AVE; EAST NORRITON; 2020...,19403.0,Traffic: VEHICLE ACCIDENT -,2020-07-29 15:46:51,EAST NORRITON,SUNSET AVE & WOODLAND AVE,1
663518,40.136306,-75.428697,EAGLEVILLE RD & BUNTING CIR; LOWER PROVIDENCE...,19403.0,EMS: GENERAL WEAKNESS,2020-07-29 15:52:19,LOWER PROVIDENCE,EAGLEVILLE RD & BUNTING CIR,1
663519,40.013779,-75.300835,HAVERFORD STATION RD; LOWER MERION; Station 3...,19041.0,EMS: VEHICLE ACCIDENT,2020-07-29 15:52:52,LOWER MERION,HAVERFORD STATION RD,1
663520,40.121603,-75.351437,MARSHALL ST & HAWS AVE; NORRISTOWN; 2020-07-29...,19401.0,Fire: BUILDING FIRE,2020-07-29 15:54:08,NORRISTOWN,MARSHALL ST & HAWS AVE,1


# **Understanding the Data**

### **Analyzing the shape of the 911 dataset**

In [73]:
df.shape
# Returns (number of rows, number of columns)

(663522, 9)

### **Analyzing the column information**

In [74]:
len(df.columns)

9

In [75]:
df.columns

Index(['lat', 'lng', 'desc', 'zip', 'title', 'timeStamp', 'twp', 'addr', 'e'], dtype='object')

In [76]:
df.describe()

Unnamed: 0,lat,lng,zip,e
count,663522.0,663522.0,583323.0,663522.0
mean,40.158162,-75.300105,19236.055791,1.0
std,0.220641,1.672884,298.222637,0.0
min,0.0,-119.698206,1104.0,1.0
25%,40.100344,-75.392735,19038.0,1.0
50%,40.143927,-75.305143,19401.0,1.0
75%,40.229008,-75.211865,19446.0,1.0
max,51.33539,87.854975,77316.0,1.0


**Categorical Data - Statistical Description**

In [77]:
# Select the object-dtype (text) columns; timeStamp is still a string at this point
categorical_data=df.dtypes[df.dtypes==object].index
df[categorical_data].describe()

Unnamed: 0,desc,title,timeStamp,twp,addr
count,663522,663522,663522,663229,663522
unique,663282,148,640754,68,41292
top,CITY AVE & CARDINAL AVE; LOWER MERION; Statio...,Traffic: VEHICLE ACCIDENT -,2018-10-06 19:26:38,LOWER MERION,SHANNONDELL DR & SHANNONDELL BLVD
freq,5,148372,9,55490,7285
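As a minimal sketch of the same idea on a toy frame (the values below are made up and only stand in for the 911 data), `select_dtypes` can pick out the non-numeric columns without comparing dtypes by hand. The names `toy`, `categorical_cols` and `summary` are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

# Toy frame standing in for the 911 data (values are made up for illustration).
toy = pd.DataFrame({
    "lat": [40.1, 40.2, 40.3],
    "twp": ["NORRISTOWN", "LOWER MERION", "NORRISTOWN"],
    "title": ["EMS: DIZZINESS", "Fire: BUILDING FIRE", "EMS: DIZZINESS"],
})

# Keep every column that is not numeric.
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

# For text columns, describe() reports count, unique, top and freq.
summary = toy[categorical_cols].describe()
print(summary)
```

Reading the summary: `unique` is the number of distinct values, `top` is the most frequent value, and `freq` is how often `top` occurs.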


### **To check for NULL values**

In [78]:
df.isnull().sum()

lat              0
lng              0
desc             0
zip          80199
title            0
timeStamp        0
twp            293
addr             0
e                0
dtype: int64
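Raw counts are easier to judge as a share of all rows. A small sketch on a toy frame (made-up values; `toy` and `missing_pct` are hypothetical names) shows one way to get the percentage of missing entries per column:

```python
import numpy as np
import pandas as pd

# Toy frame with gaps (values are made up for illustration).
toy = pd.DataFrame({
    "zip": [19401.0, np.nan, np.nan, 19446.0],
    "twp": ["NORRISTOWN", None, "HATFIELD", "HATFIELD"],
})

# isnull() gives a boolean frame; the mean of booleans is the fraction of True values.
missing_pct = toy.isnull().mean() * 100
print(missing_pct)
```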

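### Observation: the zip and twp columns have null values

zip is missing in 80,199 of the 663,522 rows (about 12%) and twp in 293. We fill both columns by forward filling, which copies the most recent non-null value from an earlier row into each gap.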

In [79]:
columns_fill=["zip","twp"]
df.loc[:,columns_fill]=df.loc[:,columns_fill].ffill()

In [80]:
df.isnull().sum()

lat          0
lng          0
desc         0
zip          0
title        0
timeStamp    0
twp          0
addr         0
e            0
dtype: int64

All null values have now been filled, so every column reports zero missing entries.
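To see what forward filling does row by row, here is a small sketch on made-up data (`toy`, `filled` and `leading` are illustrative names). It also shows one edge case: a gap at the very start has no earlier value to copy, so it stays null.

```python
import numpy as np
import pandas as pd

# Toy frame with gaps in both columns (values are made up for illustration).
toy = pd.DataFrame({
    "zip": [19401.0, np.nan, np.nan, 19446.0],
    "twp": ["NORRISTOWN", None, "HATFIELD", None],
})

# Each gap takes the last valid value seen above it in the same column.
filled = toy.ffill()
remaining = int(filled.isnull().sum().sum())
print(filled)

# A leading gap has nothing above it, so ffill leaves it as NaN.
leading = pd.Series([np.nan, 5.0, np.nan]).ffill()
print(leading)
```

Because each filled value comes from the previous row, a filled zip or twp belongs to a different call. This works best when neighbouring rows are likely to share a location.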

In [81]:
df["e"].unique()

array([1])

Dropping the column "e": it is 1 in every row, so it carries no information

In [82]:
df.drop("e",axis=1,inplace=True)

In [83]:
df.columns
#confirms column "e" has been removed

Index(['lat', 'lng', 'desc', 'zip', 'title', 'timeStamp', 'twp', 'addr'], dtype='object')
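A column that holds a single value can also be found programmatically rather than by inspection. The sketch below uses a toy frame with made-up values; `toy`, `constant_cols` and `trimmed` are hypothetical names.

```python
import pandas as pd

# Toy frame: "e" is constant, "lat" is not (values are made up for illustration).
toy = pd.DataFrame({
    "lat": [40.1, 40.2, 40.3],
    "e": [1, 1, 1],
})

# nunique() counts distinct values per column; a count of 1 means the column is constant.
unique_counts = toy.nunique()
constant_cols = unique_counts[unique_counts == 1].index.tolist()

# Drop the constant columns without modifying the original frame.
trimmed = toy.drop(columns=constant_cols)
print(constant_cols, list(trimmed.columns))
```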


### DATA MANIPULATION


Extracting the hour, month, day of week, year and date from timeStamp, then mapping each day-of-week number to its name

In [84]:
df['timeStamp'] = pd.to_datetime(df['timeStamp'])
time = df['timeStamp'].iloc[31]

print('Hour:',time.hour)
print('Month:',time.month)
print('Day of Week:',time.dayofweek)

df['Month'] = df['timeStamp'].dt.month
df['Day'] = df['timeStamp'].dt.dayofweek
df['Hour'] = df['timeStamp'].dt.hour
#Extracting year from timeStamp
df["Year"]=df["timeStamp"].dt.year
#Extracting date
df["Date"]=df["timeStamp"].dt.date

# pandas numbers weekdays from Monday=0 to Sunday=6
datemap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}

df['Day'] = df['Day'].map(datemap)

Hour: 18
Month: 12
Day of Week: 3


In [85]:
#df.drop("Day of Week",axis=1,inplace=True)

In [86]:
df.head(1)

Unnamed: 0,lat,lng,desc,zip,title,timeStamp,twp,addr,Month,Day,Hour,Year,Date
0,40.297876,-75.581294,REINDEER CT & DEAD END; NEW HANOVER; Station ...,19525.0,EMS: BACK PAINS/INJURY,2015-12-10 17:10:52,NEW HANOVER,REINDEER CT & DEAD END,12,Thu,17,2015,2015-12-10
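For reference, pandas numbers weekdays from Monday = 0 to Sunday = 6. The sketch below, on three hand-picked timestamps, shows the vectorized `.dt` accessors and uses `day_name()` as one possible alternative to a hand-written lookup dictionary. The names `stamps`, `toy` and `DayNum` are illustrative assumptions.

```python
import pandas as pd

# Three hand-picked timestamps with known weekdays.
stamps = pd.to_datetime(pd.Series([
    "2015-12-10 17:10:52",  # a Thursday
    "2015-12-13 09:00:00",  # a Sunday
    "2015-12-14 23:59:59",  # a Monday
]))

toy = pd.DataFrame({"timeStamp": stamps})
toy["Hour"] = toy["timeStamp"].dt.hour
toy["DayNum"] = toy["timeStamp"].dt.dayofweek        # Monday=0 ... Sunday=6
toy["Day"] = toy["timeStamp"].dt.day_name().str[:3]  # first three letters of the English name
print(toy)
```

Deriving the label from `day_name()` avoids any mismatch between the numbering and a manually typed mapping.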


### **Part 2: Splitting the title column into the emergency category and the reason**

In [87]:
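# Split "Category: Reason" once and reuse the result for both columns
title_parts=df["title"].str.split(":",expand=True)
df["Category"]=title_parts[0]
df["Reason"]=title_parts[1].str.strip()   # drop the space that follows the colon

df.drop("title",axis=1,inplace=True)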


In [88]:
df.columns  

Index(['lat', 'lng', 'desc', 'zip', 'timeStamp', 'twp', 'addr', 'Month', 'Day',
       'Hour', 'Year', 'Date', 'Category', 'Reason'],
      dtype='object')
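The split can be checked on a few titles taken from the preview of the data. This is a sketch under two assumptions: every title contains a colon, and only the first colon separates category from reason (hence `n=1`). The names `titles`, `parts`, `category` and `reason` are illustrative.

```python
import pandas as pd

# Titles copied from the rows shown in the data preview.
titles = pd.Series([
    "EMS: BACK PAINS/INJURY",
    "Fire: GAS-ODOR/LEAK",
    "Traffic: VEHICLE ACCIDENT -",
])

# n=1 splits on the first colon only, so any later colon would stay in the reason.
parts = titles.str.split(":", n=1, expand=True)
category = parts[0].str.strip()
reason = parts[1].str.strip()  # removes the space that follows the colon
print(pd.DataFrame({"Category": category, "Reason": reason}))
```

Note that `strip()` only removes surrounding whitespace, so the trailing " -" in the traffic title is kept as part of the reason.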