# Capstone Project - Predicting Vehicle Collision Severity
### Applied Data Science Capstone by Carlos Fortin

## Introduction: Business Problem <a name="introduction"></a>

### Background

Today, many people in the developed world drive a motor vehicle daily. They drive to and from work, take their loved ones on vacation trips, take their children to school, and run daily errands such as going to the supermarket. Unfortunately, more vehicles on the road means more accidents. Some of these accidents are severe and can cost lives.
Many factors can contribute to an accident: traffic violations (speeding, running a red light, etc.), mechanical failures such as a flat tire that causes the driver to lose control, and poor weather or road conditions that make it hard for the driver to keep control.


### Problem 
The purpose of this report is to identify key features that increase the likelihood of an accident and to use them to build a model that predicts the severity of that accident, so that drivers can understand the risk associated with driving under specific conditions.

## Data Acquisition and Cleaning
### Source 
For this analysis and the model, the data from https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/CognitiveClass/DP0701EN/version-2/Data-Collisions.csv was used. The raw data contains 194,673 rows and 38 columns (features).
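As a minimal sketch of the acquisition step, the file can be read straight from the URL above with pandas. The helper name `load_collisions` and the `low_memory=False` setting are assumptions made for illustration and are not part of the original notebook. The demonstration at the end uses a two-row, made-up stand-in so that it runs without network access.

```python
import io
import pandas as pd

# URL quoted in the text above; a local path or file-like object also works.
DATA_URL = (
    "https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/"
    "CognitiveClass/DP0701EN/version-2/Data-Collisions.csv"
)

def load_collisions(source=DATA_URL):
    """Read the collision CSV into a DataFrame (hypothetical helper).

    low_memory=False lets pandas infer each column's dtype from the whole
    file, which avoids mixed-type warnings on wide CSV files.
    """
    return pd.read_csv(source, low_memory=False)

# Offline demonstration on a two-row stand-in (made-up values, not real data).
sample = io.StringIO("SEVERITYCODE,WEATHER\n1,Clear\n2,Raining\n")
demo = load_collisions(sample)
print(demo.shape)
```

Calling `load_collisions()` with no argument would download the full file instead of the stand-in.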

### Data Cleaning
Before any analysis and modeling, the data was studied and cleaned. At first glance, several features seemed redundant or had missing values, so those were removed. Additionally, extreme outliers and values that seemed to have been entered by mistake were also removed.
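One way to decide which features have too many missing values is to compute the share of missing entries per column. The sketch below is illustrative only: the helper `missing_share` and the toy values are assumptions, and the original notebook does not state which criterion was applied.

```python
import numpy as np
import pandas as pd

def missing_share(frame):
    """Return the fraction of missing values per column, highest first."""
    return frame.isna().mean().sort_values(ascending=False)

# Toy frame with made-up values, using column names from the dataset.
demo = pd.DataFrame({
    "SEVERITYCODE": [1, 2, 1, 2],
    "INTKEY": [37475.0, np.nan, np.nan, np.nan],
    "WEATHER": ["Clear", None, "Raining", "Overcast"],
})
shares = missing_share(demo)
print(shares)
```

Columns at the top of this ranking are candidates for removal, while columns with only a few gaps may be better handled by dropping or filling individual rows.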

In [3]:
import pandas as pd
import numpy as np 
import os


In [4]:
# Read the collision data
df = pd.read_csv("Data-Collisions.csv")
df.head(5)



| | SEVERITYCODE | X | Y | OBJECTID | INCKEY | COLDETKEY | REPORTNO | STATUS | ADDRTYPE | INTKEY | ... | ROADCOND | LIGHTCOND | PEDROWNOTGRNT | SDOTCOLNUM | SPEEDING | ST_COLCODE | ST_COLDESC | SEGLANEKEY | CROSSWALKKEY | HITPARKEDCAR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | -122.323148 | 47.70314 | 1 | 1307 | 1307 | 3502005 | Matched | Intersection | 37475.0 | ... | Wet | Daylight | | | | 10 | Entering at angle | 0 | 0 | N |
| 1 | 1 | -122.347294 | 47.647172 | 2 | 52200 | 52200 | 2607959 | Matched | Block | | ... | Wet | Dark - Street Lights On | | 6354039.0 | | 11 | From same direction - both going straight - bo... | 0 | 0 | N |
| 2 | 1 | -122.33454 | 47.607871 | 3 | 26700 | 26700 | 1482393 | Matched | Block | | ... | Dry | Daylight | | 4323031.0 | | 32 | One parked--one moving | 0 | 0 | N |
| 3 | 1 | -122.334803 | 47.604803 | 4 | 1144 | 1144 | 3503937 | Matched | Block | | ... | Dry | Daylight | | | | 23 | From same direction - all others | 0 | 0 | N |
| 4 | 2 | -122.306426 | 47.545739 | 5 | 17700 | 17700 | 1807429 | Matched | Intersection | 34387.0 | ... | Wet | Daylight | | 4028032.0 | | 10 | Entering at angle | 0 | 0 | N |


In [21]:
df.shape

(194673, 38)

In [22]:
df.describe()

| | SEVERITYCODE | X | Y | OBJECTID | INCKEY | COLDETKEY | INTKEY | SEVERITYCODE.1 | PERSONCOUNT | PEDCOUNT | PEDCYLCOUNT | VEHCOUNT | SDOT_COLCODE | SDOTCOLNUM | SEGLANEKEY | CROSSWALKKEY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 194673.0 | 189339.0 | 189339.0 | 194673.0 | 194673.0 | 194673.0 | 65070.0 | 194673.0 | 194673.0 | 194673.0 | 194673.0 | 194673.0 | 194673.0 | 114936.0 | 194673.0 | 194673.0 |
| mean | 1.298901 | -122.330518 | 47.619543 | 108479.36493 | 141091.45635 | 141298.811381 | 37558.450576 | 1.298901 | 2.444427 | 0.037139 | 0.028391 | 1.92078 | 13.867768 | 7972521.0 | 269.401114 | 9782.452 |
| std | 0.457778 | 0.029976 | 0.056157 | 62649.722558 | 86634.402737 | 86986.54211 | 51745.990273 | 0.457778 | 1.345929 | 0.19815 | 0.167413 | 0.631047 | 6.868755 | 2553533.0 | 3315.776055 | 72269.26 |
| min | 1.0 | -122.419091 | 47.495573 | 1.0 | 1001.0 | 1001.0 | 23807.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1007024.0 | 0.0 | 0.0 |
| 25% | 1.0 | -122.348673 | 47.575956 | 54267.0 | 70383.0 | 70383.0 | 28667.0 | 1.0 | 2.0 | 0.0 | 0.0 | 2.0 | 11.0 | 6040015.0 | 0.0 | 0.0 |
| 50% | 1.0 | -122.330224 | 47.615369 | 106912.0 | 123363.0 | 123363.0 | 29973.0 | 1.0 | 2.0 | 0.0 | 0.0 | 2.0 | 13.0 | 8023022.0 | 0.0 | 0.0 |
| 75% | 2.0 | -122.311937 | 47.663664 | 162272.0 | 203319.0 | 203459.0 | 33973.0 | 2.0 | 3.0 | 0.0 | 0.0 | 2.0 | 14.0 | 10155010.0 | 0.0 | 0.0 |
| max | 2.0 | -122.238949 | 47.734142 | 219547.0 | 331454.0 | 332954.0 | 757580.0 | 2.0 | 81.0 | 6.0 | 2.0 | 12.0 | 69.0 | 13072020.0 | 525241.0 | 5239700.0 |
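The summary above shows a PERSONCOUNT maximum of 81 against a 75th percentile of 3, which is the kind of extreme value the cleaning step refers to. Whether such a record is an entry error cannot be determined from the summary alone. The sketch below shows one possible filter, a quantile cutoff. Both the helper `drop_extreme` and the thresholds are assumptions, since the original notebook does not show how outliers were removed.

```python
import pandas as pd

def drop_extreme(frame, column, upper_quantile=0.999):
    """Keep rows whose value in `column` is at or below the given quantile.

    Rows with a missing value in `column` are also dropped, because the
    comparison evaluates to False for NaN.
    """
    cutoff = frame[column].quantile(upper_quantile)
    return frame[frame[column] <= cutoff]

# Toy data (made-up) with a single extreme value.
demo = pd.DataFrame({"PERSONCOUNT": [2, 2, 3, 2, 1, 2, 3, 2, 2, 81]})
trimmed = drop_extreme(demo, "PERSONCOUNT", upper_quantile=0.9)
print(len(trimmed), trimmed["PERSONCOUNT"].max())
```

A quantile cutoff is simple to explain, but it always removes some rows, so the threshold should be checked against what is plausible for the feature.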


In [26]:
# Drop the columns that are not used in this analysis and create a new DataFrame
unused_columns = [
    'ADDRTYPE', 'INTKEY', 'X', 'Y', 'INCDATE', 'PEDROWNOTGRNT',
    'INATTENTIONIND', 'ST_COLCODE', 'ST_COLDESC', 'HITPARKEDCAR',
    'JUNCTIONTYPE', 'COLDETKEY', 'EXCEPTRSNCODE', 'EXCEPTRSNDESC',
    'SEVERITYCODE.1', 'SDOT_COLCODE', 'SDOT_COLDESC', 'SDOTCOLNUM',
    'SEGLANEKEY', 'CROSSWALKKEY', 'STATUS', 'REPORTNO',
]
clean_df = df.drop(columns=unused_columns)

In [28]:
# Show the columns that remain in the DataFrame
list(clean_df.columns)


['SEVERITYCODE',
 'OBJECTID',
 'INCKEY',
 'LOCATION',
 'SEVERITYDESC',
 'COLLISIONTYPE',
 'PERSONCOUNT',
 'PEDCOUNT',
 'PEDCYLCOUNT',
 'VEHCOUNT',
 'INCDTTM',
 'UNDERINFL',
 'WEATHER',
 'ROADCOND',
 'LIGHTCOND',
 'SPEEDING']

In [29]:
clean_df.shape

(194673, 16)
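Before dropping a column as redundant, it can help to confirm that it really duplicates another one. In the summary statistics, SEVERITYCODE and SEVERITYCODE.1 have identical values in every row, which suggests that one is a copy of the other. The sketch below shows how that could be checked directly. The helper `is_duplicate_column` and the toy data are assumptions for illustration.

```python
import pandas as pd

def is_duplicate_column(frame, original, candidate):
    """True when `candidate` holds exactly the same values as `original`."""
    return frame[original].equals(frame[candidate])

# Toy frame with made-up values, using column names from the dataset.
demo = pd.DataFrame({
    "SEVERITYCODE": [2, 1, 1, 2],
    "SEVERITYCODE.1": [2, 1, 1, 2],
    "PERSONCOUNT": [3, 2, 4, 2],
})
same = is_duplicate_column(demo, "SEVERITYCODE", "SEVERITYCODE.1")
different = is_duplicate_column(demo, "SEVERITYCODE", "PERSONCOUNT")
print(same, different)
```

`Series.equals` treats missing values in matching positions as equal, so it is safer here than an element-wise `==` comparison.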

In [30]:
# Replace blank entries in the weather, road condition, light condition,
# speeding and under-the-influence columns with NaN, then drop the rows
# where any of those columns is missing.
required_cols = ['WEATHER', 'ROADCOND', 'LIGHTCOND', 'SPEEDING', 'UNDERINFL']
clean_df[required_cols] = clean_df[required_cols].replace('', np.nan)
clean_df = clean_df.dropna(subset=required_cols)

clean_df

| | SEVERITYCODE | OBJECTID | INCKEY | LOCATION | SEVERITYDESC | COLLISIONTYPE | PERSONCOUNT | PEDCOUNT | PEDCYLCOUNT | VEHCOUNT | INCDTTM | UNDERINFL | WEATHER | ROADCOND | LIGHTCOND | SPEEDING |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 24 | 2 | 33 | 1268 | RAINIER AVE S AND S BRANDON ST | Injury Collision | Rear Ended | 3 | 0 | 0 | 2 | 3/31/2013 10:05:00 AM | N | Clear | Dry | Daylight | Y |
| 43 | 2 | 53 | 56100 | OLSON PL SW BETWEEN 2ND AVE SW AND 3RD AVE SW | Injury Collision | Other | 1 | 0 | 0 | 1 | 9/13/2006 10:46:00 PM | 0 | Raining | Wet | Dark - Street Lights On | Y |
| 62 | 1 | 74 | 32000 | 35TH AVE SW BETWEEN 37TH AVE SW AND MARINE VIE... | Property Damage Only Collision | Parked Car | 4 | 0 | 0 | 4 | 6/24/2004 7:43:00 PM | 0 | Clear | Dry | Daylight | Y |
| 123 | 1 | 140 | 29700 | MARION ST BETWEEN 2ND AVE AND 3RD AVE | Property Damage Only Collision | Rear Ended | 2 | 0 | 0 | 2 | 3/5/2004 | 0 | Raining | Wet | Daylight | Y |
| 124 | 2 | 141 | 1135 | HARVARD AVE AND E DENNY WAY | Injury Collision | Angles | 2 | 0 | 0 | 2 | 3/29/2013 4:34:00 PM | Y | Clear | Dry | Daylight | Y |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 194414 | 1 | 219238 | 309651 | S HOLLY ST BETWEEN 30TH AVE S AND 31ST AVE S | Property Damage Only Collision | Parked Car | 2 | 0 | 0 | 2 | 1/4/2019 12:24:00 AM | N | Raining | Wet | Dark - Street Lights On | Y |
| 194428 | 2 | 219255 | 309595 | 10TH AVE E AND E ROY E ST | Injury Collision | Pedestrian | 2 | 1 | 0 | 1 | 12/22/2018 3:15:00 PM | N | Overcast | Dry | Dusk | Y |
| 194481 | 2 | 219317 | 308340 | AIRPORT WAY S BETWEEN S HARDY ST AND S OTHELLO ST | Injury Collision | Other | 3 | 0 | 0 | 2 | 12/11/2018 9:15:00 AM | N | Raining | Wet | Daylight | Y |
| 194492 | 1 | 219329 | 308810 | BATTERY ST TUNNEL SB BETWEEN AURORA AVE N AND ... | Property Damage Only Collision | Rear Ended | 4 | 0 | 0 | 3 | 10/25/2018 8:52:00 PM | N | Raining | Wet | Dark - Street Lights On | Y |


In [31]:
clean_df.shape


(9319, 16)
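The drop from 194,673 to 9,319 rows is worth a second look. Every row shown in the output above has SPEEDING equal to Y, which suggests that the column is filled in only when speeding was recorded, so dropping its missing values keeps only speeding-related collisions. The same output also shows UNDERINFL mixing the codes N, 0 and Y. As a hedged alternative, the sketch below fills and recodes these flags instead of dropping rows. The helper `normalize_flags`, the reading of a blank SPEEDING entry as "N", and the mapping of "0"/"1" to "N"/"Y" are all assumptions that should be checked against the dataset's documentation.

```python
import numpy as np
import pandas as pd

def normalize_flags(frame):
    """Return a copy with SPEEDING and UNDERINFL recoded to 'Y'/'N'."""
    out = frame.copy()
    # Assumption: a blank SPEEDING entry means speeding was not recorded.
    out["SPEEDING"] = out["SPEEDING"].fillna("N")
    # Assumption: '0' and '1' are alternate codes for 'N' and 'Y'.
    out["UNDERINFL"] = out["UNDERINFL"].replace({"0": "N", "1": "Y"})
    return out

# Toy frame with made-up values that mimic the mixed codes seen above.
demo = pd.DataFrame({
    "SPEEDING": ["Y", np.nan, np.nan, "Y"],
    "UNDERINFL": ["N", "0", "Y", "1"],
})
fixed = normalize_flags(demo)
print(fixed["SPEEDING"].tolist())
print(fixed["UNDERINFL"].tolist())
```

With this approach, rows would be dropped only for missing WEATHER, ROADCOND, LIGHTCOND or UNDERINFL values, which would retain far more of the original data for modeling.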