# Capstone Project - Car accident severity

### Applied Data Science Capstone by IBM/Coursera

## Table of contents
1. Introduction: Business Problem
2. Data

## Introduction: Business Problem and Background

The Seattle government wants to prevent avoidable car accidents by alerting drivers, the health system, and the police so that they take extra care in critical situations. In most cases, accidents are caused by inattention while driving, drug or alcohol abuse, or excessive speed. These causes can be reduced by enacting stricter regulations.

Besides these causes, weather, visibility, and road conditions are the major factors that drivers cannot control. Their effects can still be mitigated by revealing hidden patterns in the data and issuing warnings to the local government, police, and drivers on the targeted roads.

The target audience of the project is the local Seattle government, police, rescue groups, and, last but not least, car insurance companies. The model and its results are intended to help this audience make informed decisions that reduce the number of accidents and injuries in the city.

## Data

We chose the unbalanced dataset provided by the Seattle Department of Transportation Traffic Management Division, with 194673 rows (accidents) and 37 columns (features), in which each accident is assigned a severity code. It covers accidents from January 2004 to May 2020. The features include, but are not limited to, the severity code, the location/address of the accident, the weather conditions at the incident site, the driver state (whether under the influence or not), and the collision type. We therefore consider it a good general-purpose dataset for building an accurate predictive model. The imbalance with respect to the severity code is as follows.

| SEVERITY CODE | Count  |
|---------------|--------|
| 1             | 136485 |
| 2             | 58188  |
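To make the imbalance concrete, the two counts above can be turned into class shares and a majority/minority ratio. This is a minimal sketch that uses only the counts reported in this notebook; the variable names are illustrative and not part of the original analysis.

```python
# Class counts as reported above for the severity code.
counts = {1: 136485, 2: 58188}

total = sum(counts.values())
# Share of each class among all rows.
shares = {code: n / total for code, n in counts.items()}
# How many severity-1 rows exist for each severity-2 row.
imbalance_ratio = counts[1] / counts[2]

for code, share in shares.items():
    print(f"severity {code}: {counts[code]} rows ({share:.1%})")
print(f"majority/minority ratio: {imbalance_ratio:.2f}")
```

Roughly 70% of the rows belong to severity code 1, so a model that always predicts code 1 would already reach about 70% accuracy. Any later evaluation should be read against that baseline.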

Other important variables include:

1. ADDRTYPE: Collision address type (Alley, Block, Intersection)
2. LOCATION: Description of the general location of the collision
3. PERSONCOUNT: The total number of people involved in the collision; helps identify the severity level
4. PEDCOUNT: The number of pedestrians involved in the collision; helps identify the severity level
5. PEDCYLCOUNT: The number of bicycles involved in the collision; helps identify the severity level
6. VEHCOUNT: The number of vehicles involved in the collision; helps identify the severity level
7. JUNCTIONTYPE: Category of junction at which the collision took place; helps identify where most collisions occur
8. WEATHER: A description of the weather conditions at the time of the collision
9. ROADCOND: The condition of the road during the collision
10. LIGHTCOND: The light conditions during the collision
11. SPEEDING: Whether or not speeding was a factor in the collision (Y/N)
12. SEGLANEKEY: A key for the lane segment in which the collision occurred
13. CROSSWALKKEY: A key for the crosswalk at which the collision occurred
14. HITPARKEDCAR: Whether or not the collision involved hitting a parked car


## Getting Data

In [3]:
import pandas as pd
import numpy as np
df = pd.read_csv("C:/Users/DELL/Downloads/Data-Collisions.csv")
df.head()


Unnamed: 0,SEVERITYCODE,X,Y,OBJECTID,INCKEY,COLDETKEY,REPORTNO,STATUS,ADDRTYPE,INTKEY,...,ROADCOND,LIGHTCOND,PEDROWNOTGRNT,SDOTCOLNUM,SPEEDING,ST_COLCODE,ST_COLDESC,SEGLANEKEY,CROSSWALKKEY,HITPARKEDCAR
0,2,-122.323148,47.70314,1,1307,1307,3502005,Matched,Intersection,37475.0,...,Wet,Daylight,,,,10,Entering at angle,0,0,N
1,1,-122.347294,47.647172,2,52200,52200,2607959,Matched,Block,,...,Wet,Dark - Street Lights On,,6354039.0,,11,From same direction - both going straight - bo...,0,0,N
2,1,-122.33454,47.607871,3,26700,26700,1482393,Matched,Block,,...,Dry,Daylight,,4323031.0,,32,One parked--one moving,0,0,N
3,1,-122.334803,47.604803,4,1144,1144,3503937,Matched,Block,,...,Dry,Daylight,,,,23,From same direction - all others,0,0,N
4,2,-122.306426,47.545739,5,17700,17700,1807429,Matched,Intersection,34387.0,...,Wet,Daylight,,4028032.0,,10,Entering at angle,0,0,N


In [4]:
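# Show the class distribution of the target before removing it from the frame
print(df["SEVERITYCODE"].value_counts())
print('-'*50)
# Keep the severity code as the label vector
y = df["SEVERITYCODE"].values
# Drop the label and its duplicate column (SEVERITYCODE.1) from the features
df.drop(["SEVERITYCODE", "SEVERITYCODE.1"], axis=1, inplace=True)
print("Number of data points in data", df.shape)
print("Number of data points in label", y.shape)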

1    136485
2     58188
Name: SEVERITYCODE, dtype: int64
--------------------------------------------------
Number of data points in data (194673, 36)
Number of data points in label (194673,)
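
The cell above separates the label from the features and discards `SEVERITYCODE.1`. pandas appends a `.1` suffix when a CSV header contains the same column name twice, so that column is presumably a copy of the label. The sketch below shows the same split on a toy frame with made-up values, and checks the duplicate before dropping it. The names `toy`, `X_toy`, and `y_toy` are hypothetical.

```python
import pandas as pd

# Toy frame standing in for the collisions data (values are made up).
toy = pd.DataFrame({
    "SEVERITYCODE": [2, 1, 1],
    "SEVERITYCODE.1": [2, 1, 1],
    "WEATHER": ["Overcast", "Raining", "Clear"],
})

label_cols = ["SEVERITYCODE", "SEVERITYCODE.1"]

# Confirm the duplicate column really mirrors the label before discarding it.
assert toy["SEVERITYCODE"].equals(toy["SEVERITYCODE.1"])

# Label vector and feature frame; drop() without inplace leaves `toy` untouched.
y_toy = toy["SEVERITYCODE"].to_numpy()
X_toy = toy.drop(columns=label_cols)

print(X_toy.shape, y_toy.shape)
```

Returning a new frame instead of using `inplace=True` keeps the original data available, which makes the cell safe to re-run.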


In [5]:
df.columns

Index(['X', 'Y', 'OBJECTID', 'INCKEY', 'COLDETKEY', 'REPORTNO', 'STATUS',
       'ADDRTYPE', 'INTKEY', 'LOCATION', 'EXCEPTRSNCODE', 'EXCEPTRSNDESC',
       'SEVERITYDESC', 'COLLISIONTYPE', 'PERSONCOUNT', 'PEDCOUNT',
       'PEDCYLCOUNT', 'VEHCOUNT', 'INCDATE', 'INCDTTM', 'JUNCTIONTYPE',
       'SDOT_COLCODE', 'SDOT_COLDESC', 'INATTENTIONIND', 'UNDERINFL',
       'WEATHER', 'ROADCOND', 'LIGHTCOND', 'PEDROWNOTGRNT', 'SDOTCOLNUM',
       'SPEEDING', 'ST_COLCODE', 'ST_COLDESC', 'SEGLANEKEY', 'CROSSWALKKEY',
       'HITPARKEDCAR'],
      dtype='object')

## Data Preprocessing and Data Cleaning

### Removal of unnecessary columns

In [6]:
# Drop identifier, bookkeeping, and other columns treated as unnecessary here
df.drop(columns=["OBJECTID", "INCKEY", "COLDETKEY", "REPORTNO", "STATUS",
                 "INTKEY", "EXCEPTRSNCODE", "EXCEPTRSNDESC", "INATTENTIONIND",
                 "UNDERINFL", "PEDROWNOTGRNT", "SDOT_COLDESC", "LOCATION"],
        inplace=True)

In [8]:
df.shape

(194673, 23)
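
As a sanity check on the shape above, the column arithmetic can be reproduced without the data file. The sketch builds an empty frame from the 36 column names printed by `df.columns` and drops the same 13 columns. The names `all_cols`, `cols_to_drop`, and `remaining` are illustrative.

```python
import pandas as pd

# The 36 column names printed by df.columns earlier in the notebook.
all_cols = [
    "X", "Y", "OBJECTID", "INCKEY", "COLDETKEY", "REPORTNO", "STATUS",
    "ADDRTYPE", "INTKEY", "LOCATION", "EXCEPTRSNCODE", "EXCEPTRSNDESC",
    "SEVERITYDESC", "COLLISIONTYPE", "PERSONCOUNT", "PEDCOUNT",
    "PEDCYLCOUNT", "VEHCOUNT", "INCDATE", "INCDTTM", "JUNCTIONTYPE",
    "SDOT_COLCODE", "SDOT_COLDESC", "INATTENTIONIND", "UNDERINFL",
    "WEATHER", "ROADCOND", "LIGHTCOND", "PEDROWNOTGRNT", "SDOTCOLNUM",
    "SPEEDING", "ST_COLCODE", "ST_COLDESC", "SEGLANEKEY", "CROSSWALKKEY",
    "HITPARKEDCAR",
]

# The 13 columns removed in the cell above.
cols_to_drop = [
    "OBJECTID", "INCKEY", "COLDETKEY", "REPORTNO", "STATUS", "INTKEY",
    "EXCEPTRSNCODE", "EXCEPTRSNDESC", "INATTENTIONIND", "UNDERINFL",
    "PEDROWNOTGRNT", "SDOT_COLDESC", "LOCATION",
]

# An empty frame is enough to check the column bookkeeping.
remaining = pd.DataFrame(columns=all_cols).drop(columns=cols_to_drop)
print(len(all_cols), len(cols_to_drop), remaining.shape[1])
```

The 36 original columns minus the 13 dropped ones leave 23, which matches the reported shape.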

In [9]:
df.head()

Unnamed: 0,X,Y,ADDRTYPE,SEVERITYDESC,COLLISIONTYPE,PERSONCOUNT,PEDCOUNT,PEDCYLCOUNT,VEHCOUNT,INCDATE,...,WEATHER,ROADCOND,LIGHTCOND,SDOTCOLNUM,SPEEDING,ST_COLCODE,ST_COLDESC,SEGLANEKEY,CROSSWALKKEY,HITPARKEDCAR
0,-122.323148,47.70314,Intersection,Injury Collision,Angles,2,0,0,2,2013/03/27 00:00:00+00,...,Overcast,Wet,Daylight,,,10,Entering at angle,0,0,N
1,-122.347294,47.647172,Block,Property Damage Only Collision,Sideswipe,2,0,0,2,2006/12/20 00:00:00+00,...,Raining,Wet,Dark - Street Lights On,6354039.0,,11,From same direction - both going straight - bo...,0,0,N
2,-122.33454,47.607871,Block,Property Damage Only Collision,Parked Car,4,0,0,3,2004/11/18 00:00:00+00,...,Overcast,Dry,Daylight,4323031.0,,32,One parked--one moving,0,0,N
3,-122.334803,47.604803,Block,Property Damage Only Collision,Other,3,0,0,3,2013/03/29 00:00:00+00,...,Clear,Dry,Daylight,,,23,From same direction - all others,0,0,N
4,-122.306426,47.545739,Intersection,Injury Collision,Angles,2,0,0,2,2004/01/28 00:00:00+00,...,Raining,Wet,Daylight,4028032.0,,10,Entering at angle,0,0,N


In [11]:
# Treat blank strings, "Unknown", and "Other" as missing values
df.replace(r'^\s*$', np.nan, regex=True, inplace=True)
df.replace("Unknown", np.nan, inplace=True)
df.replace("Other", np.nan, inplace=True)
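
One detail of this step is easy to miss: `DataFrame.replace` returns a new frame unless `inplace=True` is passed, so a call whose result is not stored has no effect. The sketch below applies the same three replacements to a toy frame with made-up values and counts the resulting missing entries. The names `toy`, `cleaned`, and `missing_per_column` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame with the kinds of placeholder values handled above.
toy = pd.DataFrame({
    "WEATHER": ["Clear", "Unknown", "  ", "Other"],
    "ROADCOND": ["Dry", "Wet", "", "Unknown"],
})

# replace() returns a new frame unless inplace=True, so keep the result.
cleaned = toy.replace(r"^\s*$", np.nan, regex=True)
cleaned = cleaned.replace(["Unknown", "Other"], np.nan)

# Number of missing values per column after the replacements.
missing_per_column = cleaned.isna().sum()
print(missing_per_column)
```

Assigning the result keeps the untouched frame around for comparison, at the cost of holding two copies in memory.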

In [12]:
df["WEATHER"].value_counts()

Clear                       111135
Raining                      33145
Overcast                     27714
Snowing                        907
Fog/Smog/Smoke                 569
Sleet/Hail/Freezing Rain       113
Blowing Sand/Dirt               56
Severe Crosswind                25
Partly Cloudy                    5
Name: WEATHER, dtype: int64

In [13]:
df["ROADCOND"].value_counts()

Dry               124510
Wet                47474
Ice                 1209
Snow/Slush          1004
Standing Water       115
Sand/Mud/Dirt         75
Oil                   64
Name: ROADCOND, dtype: int64

In [14]:
df["LIGHTCOND"].value_counts()

Daylight                    116137
Dark - Street Lights On      48507
Dusk                          5902
Dawn                          2502
Dark - No Street Lights       1537
Dark - Street Lights Off      1199
Dark - Unknown Lighting         11
Name: LIGHTCOND, dtype: int64

In [15]:
df["SEVERITYDESC"].value_counts()

Property Damage Only Collision    136485
Injury Collision                   58188
Name: SEVERITYDESC, dtype: int64