# Cars Accident Severity Prediction

By: Angga Bayu Prakhosha

Goal: Build a classifier for accident severity

Data: https://data-seattlecitygis.opendata.arcgis.com/datasets/5b5c745e0f1f48e7a53acec63a0022ab_0

Last Updated: 
 
This project is part of the Data Science Capstone course on Coursera. In it, I classify car accidents by severity. The severity of an accident falls into one of 5 categories:

1. 0 = Unknown
2. 1 = Property damage
3. 2 = Injury
4. 2b = Serious injury
5. 3 = Fatality
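
The code-to-label mapping above can be kept as a small lookup table, for example to make value counts easier to read. This is a minimal sketch, assuming the severity codes are stored as strings (the `2b` code rules out a purely numeric column). The names `SEVERITY_LABELS`, `sample_codes`, and `sample_labels` are hypothetical, and the sample values are invented for illustration rather than taken from the dataset.

```python
import pandas as pd

# Hypothetical lookup table built from the category list above.
SEVERITY_LABELS = {
    '0': 'Unknown',
    '1': 'Property damage',
    '2': 'Injury',
    '2b': 'Serious injury',
    '3': 'Fatality',
}

# Illustrative sample only, not real records from the dataset.
sample_codes = pd.Series(['1', '2', '2b', '1', '3', '0'])

# Translate each code into its human-readable label.
sample_labels = sample_codes.map(SEVERITY_LABELS)
print(sample_labels.value_counts())
```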

In this notebook, I use the following variables as features for the classifier:

1. ADDRTYPE : Collision address type: Alley, Block, Intersection.
2. PERSONCOUNT : The total number of people involved in the collision.
3. PEDCOUNT : The number of pedestrians involved in the collision.
4. PEDCYLCOUNT : The number of bicycles involved in the collision.
5. VEHCOUNT : The number of vehicles involved in the collision. 
6. INJURIES : The number of total injuries in the collision.
7. SERIOUSINJURIES : The number of serious injuries in the collision.
8. FATALITIES : The number of fatalities in the collision.
9. JUNCTIONTYPE : Category of junction at which collision took place.
10. INATTENTIONIND : Whether or not collision was due to inattention.
11. UNDERINFL : Whether or not a driver involved was under the influence of drugs or alcohol.

The rest of the attributes are described on this website: https://data-seattlecitygis.opendata.arcgis.com/datasets/5b5c745e0f1f48e7a53acec63a0022ab_0

In [59]:
import pandas as pd
import numpy as np

In [60]:
print('Hello Capstone Project Course!')

Hello Capstone Project Course!


In [61]:
data = pd.read_csv('Data-Collisions.csv')
data.dtypes

X                  float64
Y                  float64
OBJECTID             int64
INCKEY               int64
COLDETKEY            int64
REPORTNO            object
STATUS              object
ADDRTYPE            object
INTKEY             float64
LOCATION            object
EXCEPTRSNCODE       object
EXCEPTRSNDESC       object
SEVERITYCODE        object
SEVERITYDESC        object
COLLISIONTYPE       object
PERSONCOUNT          int64
PEDCOUNT             int64
PEDCYLCOUNT          int64
VEHCOUNT             int64
INJURIES             int64
SERIOUSINJURIES      int64
FATALITIES           int64
INCDATE             object
INCDTTM             object
JUNCTIONTYPE        object
SDOT_COLCODE       float64
SDOT_COLDESC        object
INATTENTIONIND      object
UNDERINFL           object
WEATHER             object
ROADCOND            object
LIGHTCOND           object
PEDROWNOTGRNT       object
SDOTCOLNUM         float64
SPEEDING            object
ST_COLCODE          object
ST_COLDESC          object
S

In [62]:
data.head()

| | X | Y | OBJECTID | INCKEY | COLDETKEY | REPORTNO | STATUS | ADDRTYPE | INTKEY | LOCATION | ... | ROADCOND | LIGHTCOND | PEDROWNOTGRNT | SDOTCOLNUM | SPEEDING | ST_COLCODE | ST_COLDESC | SEGLANEKEY | CROSSWALKKEY | HITPARKEDCAR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -122.356511 | 47.517361 | 1 | 327920 | 329420 | 3856094 | Matched | Intersection | 34911.0 | 17TH AVE SW AND SW ROXBURY ST | ... | Dry | Daylight | NaN | NaN | NaN | 10 | Entering at angle | 0 | 0 | N |
| 1 | -122.361405 | 47.702064 | 2 | 46200 | 46200 | 1791736 | Matched | Block | NaN | HOLMAN RD NW BETWEEN 4TH AVE NW AND 3RD AVE NW | ... | Wet | Dusk | NaN | 5101020.0 | NaN | 13 | From same direction - both going straight - bo... | 0 | 0 | N |
| 2 | -122.317414 | 47.664028 | 3 | 1212 | 1212 | 3507861 | Matched | Block | NaN | ROOSEVELT WAY NE BETWEEN NE 47TH ST AND NE 50T... | ... | Dry | Dark - Street Lights On | NaN | NaN | NaN | 30 | From opposite direction - all others | 0 | 0 | N |
| 3 | -122.318234 | 47.619927 | 4 | 327909 | 329409 | EA03026 | Matched | Intersection | 29054.0 | 11TH AVE E AND E JOHN ST | ... | Wet | Dark - Street Lights On | NaN | NaN | NaN | 0 | Vehicle going straight hits pedestrian | 0 | 0 | N |
| 4 | -122.351724 | 47.560306 | 5 | 104900 | 104900 | 2671936 | Matched | Block | NaN | WEST MARGINAL WAY SW BETWEEN SW ALASKA ST AND ... | ... | Ice | Dark - Street Lights On | NaN | 9359012.0 | Y | 50 | Fixed object | 0 | 0 | N |


## Data Preparation

In [79]:
features = ['ADDRTYPE', 'PERSONCOUNT', 'PEDCOUNT', 'PEDCYLCOUNT', 'VEHCOUNT', 'INJURIES', 'SERIOUSINJURIES', 'FATALITIES', 'JUNCTIONTYPE', 'INATTENTIONIND', 'UNDERINFL', 'SEVERITYCODE']

df = data[features]
df.head()

| | ADDRTYPE | PERSONCOUNT | PEDCOUNT | PEDCYLCOUNT | VEHCOUNT | INJURIES | SERIOUSINJURIES | FATALITIES | JUNCTIONTYPE | INATTENTIONIND | UNDERINFL | SEVERITYCODE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Intersection | 2 | 0 | 0 | 2 | 0 | 0 | 0 | At Intersection (intersection related) | NaN | N | 1 |
| 1 | Block | 2 | 0 | 0 | 2 | 0 | 0 | 0 | Mid-Block (not related to intersection) | Y | 0 | 1 |
| 2 | Block | 2 | 0 | 0 | 2 | 1 | 0 | 0 | Mid-Block (not related to intersection) | NaN | N | 2 |
| 3 | Intersection | 3 | 1 | 0 | 1 | 1 | 0 | 0 | At Intersection (intersection related) | NaN | N | 2 |
| 4 | Block | 2 | 0 | 0 | 1 | 1 | 0 | 0 | Mid-Block (not related to intersection) | NaN | 0 | 2 |


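Before removing missing data, convert the indicator columns to integers. In INATTENTIONIND, NaN becomes 0 and 'Y' becomes 1. In UNDERINFL, 'N' becomes 0 and 'Y' becomes 1; the column also contains the strings '0' and '1', which are converted to the corresponding integers.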

In [80]:
# Work on an explicit copy so the assignments below do not trigger SettingWithCopyWarning
df = df.copy()

# INATTENTIONIND: NaN -> 0, 'Y' -> 1
df['INATTENTIONIND'] = df['INATTENTIONIND'].fillna(0).replace('Y', 1)

# UNDERINFL mixes 'Y'/'N' with the strings '1'/'0'; convert all of them to integers
df['UNDERINFL'] = df['UNDERINFL'].replace({'Y': 1, 'N': 0, '1': 1, '0': 0})
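
The flag conversion described above can be checked on a small hand-made frame before running it on the full data. This is a minimal sketch, not the notebook's own code: the `toy` frame and its values are invented for illustration. It uses `Series.map` with explicit dictionaries, which turns any value missing from the dictionary into NaN, so it is only equivalent to `replace` when every value in the column is covered.

```python
import numpy as np
import pandas as pd

# Invented sample, not taken from the dataset.
toy = pd.DataFrame({
    'INATTENTIONIND': ['Y', np.nan, 'Y', np.nan],
    'UNDERINFL': ['N', 'Y', '0', '1'],
})

# 'Y' -> 1; NaN stays NaN after map, then becomes 0.
toy['INATTENTIONIND'] = toy['INATTENTIONIND'].map({'Y': 1}).fillna(0).astype(int)

# All four spellings collapse onto the integers 0 and 1.
toy['UNDERINFL'] = toy['UNDERINFL'].map({'Y': 1, 'N': 0, '1': 1, '0': 0})

print(toy)
```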


Drop the rows whose JUNCTIONTYPE is 'Unknown', then remove any rows that still contain missing values.

In [81]:
# Filter out 'Unknown' junctions, then drop rows with NaN; reassigning avoids in-place edits on a slice
df = df[df['JUNCTIONTYPE'] != 'Unknown'].dropna()
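
Before dropping rows, it can help to see how many would be lost. The sketch below runs the same two steps on an invented frame (`toy`, `missing_per_column`, and `cleaned` are hypothetical names, and the rows are not real records). It counts missing values per column and compares the row count before and after cleaning.

```python
import numpy as np
import pandas as pd

# Invented sample, not taken from the dataset.
toy = pd.DataFrame({
    'ADDRTYPE': ['Block', np.nan, 'Intersection', 'Block'],
    'JUNCTIONTYPE': [
        'Unknown',
        'At Intersection (intersection related)',
        np.nan,
        'Mid-Block (not related to intersection)',
    ],
})

# Number of missing values in each column.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# NaN != 'Unknown' evaluates to True, so the filter alone keeps NaN rows;
# dropna() is the step that removes them.
cleaned = toy[toy['JUNCTIONTYPE'] != 'Unknown'].dropna()
print(len(toy), '->', len(cleaned))
```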

Next, encode the categorical columns as integers so that they can be fed into the model.

In [82]:
from sklearn.preprocessing import LabelEncoder

ADDRTYPE_encoder = LabelEncoder()
JUNCTIONTYPE_encoder = LabelEncoder()
df['ADDRTYPE'] = ADDRTYPE_encoder.fit_transform(df['ADDRTYPE'])
df['JUNCTIONTYPE'] = JUNCTIONTYPE_encoder.fit_transform(df['JUNCTIONTYPE'])
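
`LabelEncoder` assigns integers in sorted order of the labels, so the resulting codes are identifiers rather than meaningful quantities. The sketch below shows how to inspect that mapping, using the three address types named in the feature list; the variable names `encoder`, `codes`, and `mapping` are illustrative.

```python
from sklearn.preprocessing import LabelEncoder

# Address types as listed in the feature description.
encoder = LabelEncoder()
codes = encoder.fit_transform(['Intersection', 'Block', 'Alley', 'Block'])

# classes_ holds the sorted original labels; each position is the integer code.
mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform recovers the original labels from the codes.
print(encoder.inverse_transform(codes))
```

Keeping the fitted encoders around, as done here with `ADDRTYPE_encoder` and `JUNCTIONTYPE_encoder`, makes it possible to translate encoded values back to their original labels later.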

The final data looks like this:

In [83]:
df.head()

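| | ADDRTYPE | PERSONCOUNT | PEDCOUNT | PEDCYLCOUNT | VEHCOUNT | INJURIES | SERIOUSINJURIES | FATALITIES | JUNCTIONTYPE | INATTENTIONIND | UNDERINFL | SEVERITYCODE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 2 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 0.0 | 1 |
| 1 | 1 | 2 | 0 | 0 | 2 | 0 | 0 | 0 | 4 | 1 | 0.0 | 1 |
| 2 | 1 | 2 | 0 | 0 | 2 | 1 | 0 | 0 | 4 | 0 | 0.0 | 2 |
| 3 | 2 | 3 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0.0 | 2 |
| 4 | 1 | 2 | 0 | 0 | 1 | 1 | 0 | 0 | 4 | 0 | 0.0 | 2 |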


## Model Creation

In [86]:
# Decision Tree