# IBM Data Science Capstone Project #

This notebook is used for the data science capstone project.

## Introduction ##

In this project, we aim to predict the severity of traffic accidents from road conditions. Although this is a rudimentary modelling project, when coupled with a map, this type of tool could be used to plan travel routes and times that avoid traffic jams caused by collisions, or even avoid being in a collision ourselves. Hence, in addition to individual drivers, this project may be of interest to online map and navigation service providers as an extended feature.

## Data ##

The data set we will use is one on __[collisions in the Seattle area](https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/CognitiveClass/DP0701EN/version-2/Data-Collisions.csv)__, collected by the Seattle Police Department (SPD), with metadata available __[here](https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/CognitiveClass/DP0701EN/version-2/Metadata.pdf)__.

The collisions are classified into 5 levels of severity: fatality, serious injury, injury, property damage and unknown. The data also provides details of each crash, including the type of collision, the parties involved and the number of people killed and injured. The circumstances around each collision are recorded in terms of the road condition, the weather, the light condition, and whether drugs or alcohol, speeding or driver inattention was involved.

Given this data set, the most straightforward model uses the circumstances around each collision as attributes to predict its severity, disregarding the parties involved and the type of collision in this first attempt for simplicity. Since the data is labeled, supervised learning will be used to obtain this model.
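As a minimal sketch of that supervised setup, assuming the circumstance columns have already been encoded as small integers (as the Methodology section does), we can hold out part of the labeled rows for testing and fit about the simplest model possible: a lookup table that predicts the most frequent severity code for each combination of circumstances. The helper names (`holdout_split`, `fit_lookup_baseline`, `predict_lookup`) and the toy rows are hypothetical illustrations, not the model this notebook ends up using.

```python
import pandas as pd


def holdout_split(df, test_frac=0.2, seed=0):
    """Randomly set aside a fraction of the labeled rows as a test set."""
    test = df.sample(frac=test_frac, random_state=seed)
    train = df.drop(test.index)
    return train, test


def fit_lookup_baseline(train, features, target):
    """Record the most frequent target value for each combination of
    feature values, plus the overall most frequent value as a fallback."""
    table = train.groupby(features)[target].agg(lambda s: s.mode().iloc[0])
    fallback = train[target].mode().iloc[0]
    return table.rename("pred").reset_index(), fallback


def predict_lookup(model, df, features):
    """Look up each row's combination; unseen combinations get the fallback."""
    table, fallback = model
    # A left merge keeps the rows of df in their original order.
    merged = df[features].merge(table, on=features, how="left")
    return merged["pred"].fillna(fallback).to_numpy()


# Hypothetical, already-encoded rows (0 = clear, 1 = obstructive).
toy = pd.DataFrame({
    "WEATHER":      [0, 0, 1, 1, 0, 1, 0, 1],
    "ROADCOND":     [0, 0, 1, 1, 0, 1, 1, 0],
    "SEVERITYCODE": [1, 1, 2, 2, 1, 2, 1, 1],
})
features = ["WEATHER", "ROADCOND"]

train, test = holdout_split(toy, test_frac=0.25, seed=0)
model = fit_lookup_baseline(train, features, "SEVERITYCODE")
pred = predict_lookup(model, test, features)
accuracy = (pred == test["SEVERITYCODE"].to_numpy()).mean()
print(f"hold-out accuracy: {accuracy:.2f}")
```

Any real classifier trained later should at least beat a baseline like this one on the held-out rows.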


In [2]:
#importing the data from data asset

import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

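#Leftover from the Watson Studio-generated loader: this stub is attached to the object-storage stream so that pandas accepts it as a file-like object
def __iter__(self): return 0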

In [3]:
# The code was removed by Watson Studio for sharing.



| | SEVERITYCODE | X | Y | OBJECTID | INCKEY | COLDETKEY | REPORTNO | STATUS | ADDRTYPE | INTKEY | ... | ROADCOND | LIGHTCOND | PEDROWNOTGRNT | SDOTCOLNUM | SPEEDING | ST_COLCODE | ST_COLDESC | SEGLANEKEY | CROSSWALKKEY | HITPARKEDCAR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | -122.323148 | 47.70314 | 1 | 1307 | 1307 | 3502005 | Matched | Intersection | 37475.0 | ... | Wet | Daylight | | | | 10 | Entering at angle | 0 | 0 | N |
| 1 | 1 | -122.347294 | 47.647172 | 2 | 52200 | 52200 | 2607959 | Matched | Block | | ... | Wet | Dark - Street Lights On | | 6354039.0 | | 11 | From same direction - both going straight - bo... | 0 | 0 | N |
| 2 | 1 | -122.33454 | 47.607871 | 3 | 26700 | 26700 | 1482393 | Matched | Block | | ... | Dry | Daylight | | 4323031.0 | | 32 | One parked--one moving | 0 | 0 | N |
| 3 | 1 | -122.334803 | 47.604803 | 4 | 1144 | 1144 | 3503937 | Matched | Block | | ... | Dry | Daylight | | | | 23 | From same direction - all others | 0 | 0 | N |
| 4 | 2 | -122.306426 | 47.545739 | 5 | 17700 | 17700 | 1807429 | Matched | Intersection | 34387.0 | ... | Wet | Daylight | | 4028032.0 | | 10 | Entering at angle | 0 | 0 | N |
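The loading cell above had its credential-based code removed before sharing. A minimal way to rebuild `df_collision` without IBM Cloud credentials, assuming the public CSV linked in the Data section is still reachable, is to read it directly with pandas. `DATA_URL` and `load_collisions` are names introduced here for illustration. Passing `low_memory=False` asks pandas to infer each column's type from the whole file rather than chunk by chunk, which avoids the mixed-type `DtypeWarning` that chunked parsing can raise.

```python
import io

import pandas as pd

# Public copy of the SPD collision data linked in the Data section.
DATA_URL = ("https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/"
            "CognitiveClass/DP0701EN/version-2/Data-Collisions.csv")


def load_collisions(source=DATA_URL):
    """Read the collision CSV from a URL, a path, or a file-like object."""
    # low_memory=False: infer each column's dtype in one pass over the whole
    # file, avoiding mixed-type DtypeWarnings from chunked parsing.
    return pd.read_csv(source, low_memory=False)


# Offline check with a tiny in-memory CSV (illustrative rows):
sample = io.StringIO("SEVERITYCODE,WEATHER,ROADCOND\n2,Overcast,Wet\n1,Raining,Wet\n")
df_sample = load_collisions(sample)
print(df_sample.shape)

# With network access: df_collision = load_collisions()
```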


## Methodology ##



In [4]:
#For importing libraries
import numpy as np


In [26]:
#First we create a dataframe containing only the attributes we will use; the target (severity code) is extracted in the next cell.
#.copy() makes X independent of df_collision, so the edits below do not raise SettingWithCopyWarning.
X = df_collision[['INATTENTIONIND','UNDERINFL','WEATHER','ROADCOND','LIGHTCOND','SPEEDING']].copy()
X.head()

| | INATTENTIONIND | UNDERINFL | WEATHER | ROADCOND | LIGHTCOND | SPEEDING |
|---|---|---|---|---|---|---|
| 0 | | N | Overcast | Wet | Daylight | |
| 1 | | 0 | Raining | Wet | Dark - Street Lights On | |
| 2 | | 0 | Overcast | Dry | Daylight | |
| 3 | | N | Clear | Dry | Daylight | |
| 4 | | 0 | Raining | Wet | Daylight | |


In [27]:
Y=df_collision[['SEVERITYCODE']]
Y.head()

| | SEVERITYCODE |
|---|---|
| 0 | 2 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
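Before fitting anything, it is worth checking how the severity codes are distributed, since a model that always predicts the most common code already scores that code's share as its accuracy. A minimal sketch, assuming `Y` is the single-column DataFrame above (so the Series is `Y['SEVERITYCODE']`); `class_balance` is a hypothetical helper, and the toy Series only reuses the five codes displayed above, not the full column.

```python
import pandas as pd


def class_balance(target):
    """Return each class's share of rows and the accuracy of always
    predicting the most frequent class."""
    shares = target.value_counts(normalize=True).sort_index()
    return shares, shares.max()


# The five codes shown above, standing in for Y['SEVERITYCODE'].
y_head = pd.Series([2, 1, 1, 1, 2], name="SEVERITYCODE")
shares, majority_accuracy = class_balance(y_head)
print(shares)
print(f"majority-class accuracy: {majority_accuracy:.2f}")  # 0.60 for these five rows
```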


In [36]:
##Now we need to pre-process X

#Let's start with the INATTENTIONIND column
#X['INATTENTIONIND'].unique()
#The output is [nan, 'Y'], but it should be Y/N only, so we convert Y->1 and nan->0
X['INATTENTIONIND'] = X['INATTENTIONIND'].fillna(0).replace('Y', 1)

#X['UNDERINFL'].unique()
#The output is ['N', '0', nan, '1', 'Y']. We only want yes/no, without distinguishing drugs from alcohol, so we convert everything to 0 and 1
X['UNDERINFL'] = X['UNDERINFL'].replace({'Y': 1, '1': 1, 'N': 0, '0': 0})
#Check how many values are missing
#X['UNDERINFL'].isna().sum() gives 4884 rows, a lot. We assume no drugs or alcohol were involved: innocent until proven guilty.
X['UNDERINFL'] = X['UNDERINFL'].fillna(0)

#Now let's work on SPEEDING
#X['SPEEDING'].unique()
#Again we have [nan, 'Y'] instead of Y/N
X['SPEEDING'] = X['SPEEDING'].fillna(0).replace('Y', 1)
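The three yes/no columns follow the same rule: a recorded "yes" becomes 1 and everything else, including missing values, becomes 0. A minimal sketch of that rule as one reusable function, assuming the spellings listed in the comments above are the only ones meaning "yes"; `to_flag` and `YES_LABELS` are hypothetical names. Unlike the replacements above, this maps any unexpected label to 0 as well, so it is worth inspecting `unique()` before relying on it.

```python
import numpy as np
import pandas as pd

# Spellings treated as "yes"; following the notebook's assumption, everything
# else ('N', '0', NaN, or any other label) is treated as "no".
YES_LABELS = ["Y", "1", 1]


def to_flag(column):
    """Collapse a Y/N-style column with mixed encodings into 0/1 integers."""
    return column.isin(YES_LABELS).astype(int)


# The UNDERINFL spellings listed above:
underinfl = pd.Series(["N", "0", np.nan, "1", "Y"])
print(to_flag(underinfl).tolist())  # [0, 0, 0, 1, 1]
```

Applied to the notebook, this would read `X['SPEEDING'] = to_flag(X['SPEEDING'])`, and likewise for the other two columns.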


In [40]:
#Now we look into how to process the columns with text input

#X['WEATHER'].unique()
#The result is
#['Overcast', 'Raining', 'Clear', nan, 'Unknown', 'Other', 'Snowing',
#       'Fog/Smog/Smoke', 'Sleet/Hail/Freezing Rain', 'Blowing Sand/Dirt',
#       'Severe Crosswind', 'Partly Cloudy']

#We can group these inputs into 3 categories: not an obstruction to driving (0), an obstruction to driving (1), and unknown (2)
# (2) nan, 'Unknown', 'Other'
# (0) 'Clear', 'Overcast', 'Partly Cloudy'
# (1) 'Snowing', 'Fog/Smog/Smoke', 'Sleet/Hail/Freezing Rain', 'Blowing Sand/Dirt', 'Severe Crosswind', 'Raining'
weather_groups = {
    #Unknown category (nan is handled by fillna below)
    'Unknown': 2, 'Other': 2,
    #Clear category
    'Clear': 0, 'Overcast': 0, 'Partly Cloudy': 0,
    #Obstructive category
    'Snowing': 1, 'Fog/Smog/Smoke': 1, 'Sleet/Hail/Freezing Rain': 1,
    'Blowing Sand/Dirt': 1, 'Severe Crosswind': 1, 'Raining': 1,
}
X['WEATHER'] = X['WEATHER'].fillna(2).replace(weather_groups)

X.head()

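| | INATTENTIONIND | UNDERINFL | WEATHER | ROADCOND | LIGHTCOND | SPEEDING |
|---|---|---|---|---|---|---|
| 0 | 0 | 0.0 | 0 | Wet | Daylight | 0 |
| 1 | 0 | 0.0 | 1 | Wet | Dark - Street Lights On | 0 |
| 2 | 0 | 0.0 | 0 | Dry | Daylight | 0 |
| 3 | 0 | 0.0 | 0 | Dry | Daylight | 0 |
| 4 | 0 | 0.0 | 1 | Wet | Daylight | 0 |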
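WEATHER and ROADCOND get the same three-way treatment, so the grouping can be written once and reused. A minimal sketch, assuming the 0/1/2 codes used above; `encode_condition` and the list names are hypothetical. One deliberate difference from the cell above: a label missing from both lists falls into the unknown code 2 instead of staying as text, which keeps the column numeric but can hide a label we forgot to classify.

```python
import numpy as np
import pandas as pd

UNKNOWN = 2  # code for NaN, 'Unknown', 'Other' and any label not listed


def encode_condition(column, clear, obstructive):
    """Map labels to 0 (clear), 1 (obstructive) or 2 (unknown)."""
    mapping = {label: 0 for label in clear}
    mapping.update({label: 1 for label in obstructive})
    # map() turns unlisted labels and NaN into NaN, which fillna sends to UNKNOWN.
    return column.map(mapping).fillna(UNKNOWN).astype(int)


WEATHER_CLEAR = ["Clear", "Overcast", "Partly Cloudy"]
WEATHER_OBSTRUCTIVE = ["Snowing", "Fog/Smog/Smoke", "Sleet/Hail/Freezing Rain",
                       "Blowing Sand/Dirt", "Severe Crosswind", "Raining"]

weather = pd.Series(["Overcast", "Raining", "Clear", np.nan, "Unknown", "Other"])
print(encode_condition(weather, WEATHER_CLEAR, WEATHER_OBSTRUCTIVE).tolist())
# [0, 1, 0, 2, 2, 2]
```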


In [43]:
#X['ROADCOND'].unique()
#The result is
# ['Wet', 'Dry', nan, 'Unknown', 'Snow/Slush', 'Ice', 'Other',
#       'Sand/Mud/Dirt', 'Standing Water', 'Oil']

#Again, let's divide these into 3 categories
# (2) nan, 'Unknown', 'Other'
# (0) 'Dry'
# (1) 'Wet', 'Snow/Slush', 'Ice', 'Sand/Mud/Dirt', 'Standing Water', 'Oil'
roadcond_groups = {
    #Unknown category (nan is handled by fillna below)
    'Unknown': 2, 'Other': 2,
    #Dry (not obstructive) category
    'Dry': 0,
    #Obstructive category
    'Wet': 1, 'Snow/Slush': 1, 'Ice': 1, 'Sand/Mud/Dirt': 1,
    'Standing Water': 1, 'Oil': 1,
}
X['ROADCOND'] = X['ROADCOND'].fillna(2).replace(roadcond_groups)

X.head()

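| | INATTENTIONIND | UNDERINFL | WEATHER | ROADCOND | LIGHTCOND | SPEEDING |
|---|---|---|---|---|---|---|
| 0 | 0 | 0.0 | 0 | 1 | Daylight | 0 |
| 1 | 0 | 0.0 | 1 | 1 | Dark - Street Lights On | 0 |
| 2 | 0 | 0.0 | 0 | 0 | Daylight | 0 |
| 3 | 0 | 0.0 | 0 | 0 | Daylight | 0 |
| 4 | 0 | 0.0 | 1 | 1 | Daylight | 0 |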
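Because every label in these columns is classified by hand, it helps to confirm that no raw label slipped through unclassified before encoding, or, after encoding, that only the codes 0, 1 and 2 remain. A minimal sketch, assuming the raw column holds strings plus missing values; `unmapped_labels` is a hypothetical helper, and 'Black Ice' is an invented label used only to show a miss.

```python
import numpy as np
import pandas as pd


def unmapped_labels(column, known):
    """Distinct non-missing labels in `column` that are not in `known`."""
    return sorted(set(column.dropna().unique()) - set(known))


ROADCOND_KNOWN = ["Unknown", "Other", "Dry", "Wet", "Snow/Slush", "Ice",
                  "Sand/Mud/Dirt", "Standing Water", "Oil"]

raw = pd.Series(["Wet", "Dry", np.nan, "Ice", "Black Ice"])  # 'Black Ice' is invented
print(unmapped_labels(raw, ROADCOND_KNOWN))  # ['Black Ice']

# After encoding, the equivalent check on the numeric codes:
encoded = pd.Series([1, 0, 2, 1, 2])
assert encoded.isin([0, 1, 2]).all()
```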


In [46]:
X['LIGHTCOND'].unique()

#The result is ['Daylight', 'Dark - Street Lights On', 'Dark - No Street Lights',
#       nan, 'Unknown', 'Dusk', 'Dawn', 'Dark - Street Lights Off',
#       'Other', 'Dark - Unknown Lighting']

# (2) nan, 'Unknown',
# (0) 'Daylight',
# (1)

array(['Daylight', 'Dark - Street Lights On', 'Dark - No Street Lights',
       nan, 'Unknown', 'Dusk', 'Dawn', 'Dark - Street Lights Off',
       'Other', 'Dark - Unknown Lighting'], dtype=object)
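The obstructive group (1) for LIGHTCOND is still open in the cell above, and lighting labels such as 'Dusk' or 'Dark - Street Lights On' arguably do not split into clear/obstructive as neatly as weather does. An alternative worth considering (it is not what this notebook does) is one-hot encoding, which gives each label its own 0/1 column and leaves it to the model to learn which labels behave alike. A minimal sketch with `pandas.get_dummies`; the sample values come from the labels listed above, and the `LIGHT` prefix is an arbitrary choice.

```python
import numpy as np
import pandas as pd

light = pd.Series(["Daylight", "Dark - Street Lights On", np.nan, "Dusk", "Daylight"],
                  name="LIGHTCOND")

# One indicator column per label; dummy_na=True adds an extra column for
# missing values, so a NaN row is flagged instead of being all zeros.
light_dummies = pd.get_dummies(light, prefix="LIGHT", dummy_na=True, dtype=int)
print(light_dummies.columns.tolist())
print(light_dummies.sum(axis=1).tolist())  # [1, 1, 1, 1, 1]: one label per row
```

The trade-off is width: every distinct label becomes its own column, which stays manageable here (about ten labels including missing values) but grows with the number of categories.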