# Car Accident Severity Prediction using Machine Learning




#### By: Ahmad Khairi Bin Ahmad Khir

### Business Understanding

Many factors and conditions can influence the severity of a car accident. Given variables such as weather, road and light conditions, a machine learning algorithm can predict how severe an accident is likely to be. The model is built from a historical dataset that records the attributes associated with past accidents. Applied to this problem, machine learning can help a local government or a GPS service provider determine which conditions and factors contribute to car accidents, and take the necessary action to reduce both the frequency of accidents in a given area and the probability that a road user is involved in one.

### Data Understanding

In this project, we used the dataset from the Seattle SDOT Traffic Management Division titled Collisions – All years. The ‘SEVERITYCODE’ column serves as the target variable, with severity graded from 0 to 5. The predictor variables (features) used in this project are ‘JUNCTIONTYPE’, ‘WEATHER’, ‘ROADCOND’, ‘LIGHTCOND’ and ‘SPEEDING’. Features originally stored with the object datatype were converted to numerical datatypes for analysis. The dataset was also balanced, because a disparity between the severity categories may result in inaccurate predictions by the model.
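The two preparation steps described above, converting object columns to numbers and balancing the severity classes, can be sketched as follows. This is only a minimal illustration on a made-up toy frame, not the project's actual pipeline: the sample rows, the use of category codes for encoding, and downsampling as the balancing method are all assumptions.

```python
import pandas as pd

# Toy data (hypothetical rows, for illustration only).
toy = pd.DataFrame({
    "WEATHER": ["Clear", "Raining", "Clear", "Overcast", "Clear", "Raining"],
    "ROADCOND": ["Dry", "Wet", "Dry", "Dry", "Dry", "Wet"],
    "SEVERITYCODE": [1, 1, 1, 1, 2, 2],
})

# Encode each object column as integer category codes.
encoded = toy.copy()
for col in ["WEATHER", "ROADCOND"]:
    encoded[col] = encoded[col].astype("category").cat.codes

# Balance by downsampling every severity class to the size of the smallest one
# (assumed method; the notebook does not show how balancing was done).
smallest = encoded["SEVERITYCODE"].value_counts().min()
balanced = pd.concat(
    [group.sample(n=smallest, random_state=0)
     for _, group in encoded.groupby("SEVERITYCODE")]
)

print(balanced["SEVERITYCODE"].value_counts())
```

Downsampling discards rows from the larger class; oversampling the smaller class would be an alternative if losing data is a concern.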

In [None]:
import pandas as pd
import numpy as np

In [None]:
dataurl = 'https://raw.githubusercontent.com/astradrel/Capstone-Project/main/Data-Collisions.csv'
df = pd.read_csv(dataurl)

In [20]:
df.head()

Unnamed: 0,SEVERITYCODE,X,Y,OBJECTID,INCKEY,COLDETKEY,REPORTNO,STATUS,ADDRTYPE,INTKEY,...,ROADCOND,LIGHTCOND,PEDROWNOTGRNT,SDOTCOLNUM,SPEEDING,ST_COLCODE,ST_COLDESC,SEGLANEKEY,CROSSWALKKEY,HITPARKEDCAR
0,2,-122.323148,47.70314,1,1307,1307,3502005,Matched,Intersection,37475.0,...,Wet,Daylight,,,,10,Entering at angle,0,0,N
1,1,-122.347294,47.647172,2,52200,52200,2607959,Matched,Block,,...,Wet,Dark - Street Lights On,,6354039.0,,11,From same direction - both going straight - bo...,0,0,N
2,1,-122.33454,47.607871,3,26700,26700,1482393,Matched,Block,,...,Dry,Daylight,,4323031.0,,32,One parked--one moving,0,0,N
3,1,-122.334803,47.604803,4,1144,1144,3503937,Matched,Block,,...,Dry,Daylight,,,,23,From same direction - all others,0,0,N
4,2,-122.306426,47.545739,5,17700,17700,1807429,Matched,Intersection,34387.0,...,Wet,Daylight,,4028032.0,,10,Entering at angle,0,0,N


In [111]:
# Select the features and the target; copy so later edits do not touch df
df_accident = df[['JUNCTIONTYPE', 'ROADCOND', 'WEATHER', 'LIGHTCOND', 'SPEEDING', 'UNDERINFL', 'OBJECTID', 'SEVERITYCODE']].copy()
df_accident.head()

Unnamed: 0,JUNCTIONTYPE,ROADCOND,WEATHER,LIGHTCOND,SPEEDING,UNDERINFL,OBJECTID,SEVERITYCODE
0,At Intersection (intersection related),Wet,Overcast,Daylight,,N,1,2
1,Mid-Block (not related to intersection),Wet,Raining,Dark - Street Lights On,,0,2,1
2,Mid-Block (not related to intersection),Dry,Overcast,Daylight,,0,3,1
3,Mid-Block (not related to intersection),Dry,Clear,Daylight,,N,4,1
4,At Intersection (intersection related),Wet,Raining,Daylight,,0,5,2
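Before cleaning, it can help to see how many values are missing in each selected column and which labels occur. The sketch below shows one way to do that on a hypothetical toy frame; the rows are made up, and only the column names come from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for the selected columns.
sample = pd.DataFrame({
    "ROADCOND": ["Wet", "Dry", np.nan, "Unknown"],
    "SPEEDING": [np.nan, "Y", np.nan, np.nan],
})

# Number of missing values in each column.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Label frequencies; dropna=False lists missing values as their own entry.
roadcond_counts = sample["ROADCOND"].value_counts(dropna=False)
print(roadcond_counts)
```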


In [112]:
# Cleaning the column JUNCTIONTYPE

df_accident = df_accident[df_accident['JUNCTIONTYPE'].notna()]

df_accident = df_accident[df_accident['JUNCTIONTYPE'] != 'Unknown']

# Merge the detailed junction labels into four broad groups.
# Assign the result back instead of using inplace=True on a selected column.
df_accident['JUNCTIONTYPE'] = df_accident['JUNCTIONTYPE'].replace({
    "Mid-Block (not related to intersection)": "Mid-Block",
    "Mid-Block (but intersection related)": "Mid-Block",
    "At Intersection (but not related to intersection)": "Intersection",
    "At Intersection (intersection related)": "Intersection",
    "Driveway Junction": "Driveway",
    "Ramp Junction": "Ramp",
})

df_accident['JUNCTIONTYPE'].value_counts()

Mid-Block       112590
Intersection     64908
Driveway         10671
Ramp               166
Name: JUNCTIONTYPE, dtype: int64

In [113]:
# Cleaning the column ROADCOND

df_accident = df_accident[df_accident['ROADCOND'].notna()]

df_accident = df_accident[df_accident['ROADCOND'] != 'Unknown']
df_accident = df_accident[df_accident['ROADCOND'] != 'Other']

# Reduce the road conditions to two groups, Wet and Dry
df_accident['ROADCOND'] = df_accident['ROADCOND'].replace({
    "Ice": "Wet",
    "Snow/Slush": "Wet",
    "Standing Water": "Wet",
    "Sand/Mud/Dirt": "Dry",
    "Oil": "Wet",
})

df_accident['ROADCOND'].value_counts()

Dry    122530
Wet     49146
Name: ROADCOND, dtype: int64

In [114]:
# Cleaning the column WEATHER

df_accident = df_accident[df_accident['WEATHER'].notna()]

# Group the rarer weather labels, including Unknown, under Other
df_accident['WEATHER'] = df_accident['WEATHER'].replace({
    "Unknown": "Other",
    "Snowing": "Other",
    "Fog/Smog/Smoke": "Other",
    "Sleet/Hail/Freezing Rain": "Other",
    "Blowing Sand/Dirt": "Other",
    "Severe Crosswind": "Other",
    "Partly Cloudy": "Other",
})

df_accident['WEATHER'].value_counts()

Clear       108778
Raining      32659
Overcast     26899
Other         3240
Name: WEATHER, dtype: int64

In [115]:
# Cleaning the column LIGHTCOND

df_accident = df_accident[df_accident['LIGHTCOND'].notna()]

# Merge all "Dark - ..." variants into Dark and relabel Unknown as Other
df_accident['LIGHTCOND'] = df_accident['LIGHTCOND'].replace({
    "Dark - Street Lights On": "Dark",
    "Unknown": "Other",
    "Dark - No Street Lights": "Dark",
    "Dark - Street Lights Off": "Dark",
    "Dark - Unknown Lighting": "Dark",
})

df_accident['LIGHTCOND'].value_counts()

Daylight    111622
Dark         48970
Dusk          5618
Other         2804
Dawn          2399
Name: LIGHTCOND, dtype: int64
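The four cleaning cells above follow the same pattern: drop missing or unwanted labels, then merge the remaining labels through a mapping. As a sketch only, that pattern could be factored into one helper. The function name `clean_category` and its parameters are hypothetical and not part of the notebook, and the toy rows are made up.

```python
import numpy as np
import pandas as pd

def clean_category(frame, column, mapping, drop_values=()):
    """Drop missing/unwanted labels in `column`, then merge labels via `mapping`."""
    keep = frame[column].notna() & ~frame[column].isin(list(drop_values))
    cleaned = frame.loc[keep].copy()
    cleaned[column] = cleaned[column].replace(mapping)
    return cleaned

# Hypothetical toy rows using labels that appear in ROADCOND.
toy = pd.DataFrame({"ROADCOND": ["Dry", "Ice", "Unknown", np.nan, "Oil", "Wet"]})

result = clean_category(
    toy,
    "ROADCOND",
    mapping={"Ice": "Wet", "Oil": "Wet"},
    drop_values=["Unknown", "Other"],
)
print(result["ROADCOND"].value_counts())
```

The helper returns a new frame rather than modifying its input, so each cleaning step can be checked before the next one is applied.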

In [116]:
# Cleaning the column SPEEDING

# Treat a missing SPEEDING value as 'N'
df_accident['SPEEDING'] = df_accident['SPEEDING'].fillna('N')

In [117]:
# Cleaning the column UNDERINFL

# UNDERINFL mixes '0'/'1' with 'N'/'Y'; normalise everything to 'N'/'Y'
df_accident['UNDERINFL'] = df_accident['UNDERINFL'].replace({"0": "N", "1": "Y"})

In [130]:
# Resetting the index
df_accident = df_accident.reset_index(drop=True)
df_accident = df_accident.rename_axis('ID')

In [138]:
# Display the cleaned and formatted datasets
df_accident

ID,JUNCTIONTYPE,ROADCOND,WEATHER,LIGHTCOND,SPEEDING,UNDERINFL,OBJECTID,SEVERITYCODE
0,Intersection,Wet,Overcast,Daylight,N,N,1,2
1,Mid-Block,Wet,Raining,Dark,N,N,2,1
2,Mid-Block,Dry,Overcast,Daylight,N,N,3,1
3,Mid-Block,Dry,Clear,Daylight,N,N,4,1
4,Intersection,Wet,Raining,Daylight,N,N,5,2
...,...,...,...,...,...,...,...,...
171408,Mid-Block,Dry,Clear,Daylight,N,N,219543,2
171409,Mid-Block,Wet,Raining,Daylight,N,N,219544,1
171410,Intersection,Dry,Clear,Daylight,N,N,219545,2
171411,Intersection,Dry,Clear,Dusk,N,N,219546,2
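A quick sanity check after cleaning can confirm that no unexpected labels or missing values remain. The sketch below is an assumption about how such a check might look; the helper `unexpected_labels`, the `allowed` dictionary and the toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical cleaned rows.
cleaned = pd.DataFrame({
    "ROADCOND": ["Wet", "Dry", "Dry"],
    "SPEEDING": ["N", "Y", "N"],
    "UNDERINFL": ["N", "N", "Y"],
})

# Labels each column is expected to contain after cleaning.
allowed = {
    "ROADCOND": {"Dry", "Wet"},
    "SPEEDING": {"Y", "N"},
    "UNDERINFL": {"Y", "N"},
}

def unexpected_labels(frame, allowed_labels):
    """Return {column: labels outside the allowed set}, for offending columns only."""
    problems = {}
    for column, labels in allowed_labels.items():
        extra = set(frame[column].dropna().unique()) - labels
        if extra:
            problems[column] = extra
    return problems

print(unexpected_labels(cleaned, allowed))
print(int(cleaned.isna().sum().sum()))  # total number of missing cells
```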


In [140]:
# A raw string already keeps backslashes literal, so they are not doubled
df_accident.to_csv(r'C:\Users\Ahmad Khairi\Desktop\Data Science Capstone\Capstone-Project\Exported-Collisions-Data.csv', index=False)
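The export above writes to an absolute path on one machine. As a hedged alternative, a path built with `pathlib` relative to the working directory would let the notebook run elsewhere. The `exports` folder name and the toy frame below are assumptions for illustration.

```python
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for the cleaned frame.
frame = pd.DataFrame({"ROADCOND": ["Wet", "Dry"], "SEVERITYCODE": [2, 1]})

# Hypothetical output folder, relative to the current working directory.
output_path = Path("exports") / "Exported-Collisions-Data.csv"
output_path.parent.mkdir(parents=True, exist_ok=True)

frame.to_csv(output_path, index=False)

# Read the file back to confirm the export round-trips.
reloaded = pd.read_csv(output_path)
print(reloaded.shape)
```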