# **Car Accident Warning System :: Seattle City**
---

## Introduction / Business Understanding

Anyone who commutes to work daily knows that travel can be stressful and time-wasting if it is not planned properly. Travel time is one of the main factors that determine how pleasant a commute is, and it often grows because of congestion caused by an accident on the route you take. The aim of this project is to see whether we can build a model that predicts the severity and/or probability of an accident, taking into account environmental, traffic, and geographical factors. Such a model could help both law enforcement agencies and daily commuters.

* **Law enforcement agencies** stand to gain by proactively averting accidents when a risky set of conditions arises, and by taking appropriate action when an accident does occur, so that the impact on traffic flow is kept to a minimum.

* **Commuters** stand to gain by being forewarned about accidents and planning or rerouting their journeys accordingly. They can also be more vigilant in conditions that are prone to accidents.

In the end, we all stand to gain collectively as a society: fewer accidents, safer roads, less noise and air pollution due to fewer traffic jams, and an overall improvement in the daily commute in terms of both time and stress.

## Data Analysis

To solve the problem at hand, we need a data source that records past traffic accidents, the conditions in which they took place, and their outcomes. We use the collisions data published on the Seattle government's open data website (https://data.seattle.gov/Land-Base/Collisions/9kas-rb8d), which contains the latest dataset for us to analyze and use to build a predictive model.

There are a total of **40 Variables** and **221389 Data points/Observations** in the copy loaded below. Looking at the data set, several columns look useful, including:

* **LOCATION** - Description of the general location of the collision
* **SEVERITYCODE** - A code that corresponds to the severity of the collision (3 = fatality, 2b = serious injury, 2 = injury, 1 = property damage, 0 = unknown)
* **SEVERITYDESC** - A detailed description of the severity of the collision
* **JUNCTIONTYPE** - Category of junction at which collision took place
* **UNDERINFL** - Whether or not a driver involved was under the influence of drugs or alcohol
* **INCDTTM** - The date and time of the incident (Time of the day might be of importance here)
* **WEATHER** - A description of the weather conditions during the time of the collision
* **ROADCOND** - The condition of the road during the collision
* **LIGHTCOND** - The light conditions during the collision
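
Because **SEVERITYCODE** mixes numeric codes with the string `2b`, it loads as an `object` column. One possible way to make the severity order explicit is an ordered categorical, sketched below. The sample values and the names `SEVERITY_ORDER` and `SEVERITYRANK` are hypothetical, and ranking `0` (unknown) lowest is an assumption; dropping unknown rows instead may be more appropriate.

```python
import pandas as pd

# Severity codes from the list above, ordered from least to most severe.
# Assumption: "0" (unknown) is placed lowest; it could also be dropped.
SEVERITY_ORDER = ["0", "1", "2", "2b", "3"]

# Hypothetical sample values; the real column comes from the Seattle CSV.
sample = pd.Series(["1", "2b", "0", "3", "2"], name="SEVERITYCODE")

# Ordered categorical: comparisons and sorting now follow SEVERITY_ORDER.
severity = pd.Categorical(sample, categories=SEVERITY_ORDER, ordered=True)

# Integer rank of each code (0 = unknown ... 4 = fatality).
ranks = pd.Series(severity.codes, index=sample.index, name="SEVERITYRANK")
print(ranks.tolist())
```

The integer ranks can serve as a numeric target, while the original string codes stay available for reporting.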

Our dependent (predicted) variable will be **SEVERITYCODE**. During data processing and the subsequent stages, we will analyze in depth how each independent variable relates to it.
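
As a rough illustration of how a categorical independent variable can be compared against the dependent variable, a row-normalised cross-tabulation gives the share of each severity code within each category. This is only a sketch on hypothetical toy rows (the frame `toy` is not real Seattle data); the same call could be applied to any of the columns listed above.

```python
import pandas as pd

# Hypothetical toy rows, not real records from the Seattle data.
toy = pd.DataFrame({
    "ROADCOND": ["Dry", "Dry", "Wet", "Wet", "Wet", "Dry"],
    "SEVERITYCODE": ["1", "2", "1", "2", "2", "1"],
})

# Row-normalised crosstab: share of each severity code within each road condition.
severity_share = pd.crosstab(toy["ROADCOND"], toy["SEVERITYCODE"], normalize="index")
print(severity_share)
```

Each row of the result sums to 1, so categories with very different observation counts can still be compared directly.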

Note that data filtering will be needed to remove unwanted columns/variables and any null, empty, or otherwise unwanted observations. We will also need other data processing steps such as type casting, standardization, and dummy variable creation.
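
A minimal sketch of these processing steps on a toy frame is shown here. The rows are hypothetical, and the code assumes that **UNDERINFL** holds the four string spellings `N`, `0`, `Y` and `1`; in the real CSV some of those values may be parsed differently. Standardization is not shown because the toy columns are categorical.

```python
import pandas as pd

# Hypothetical toy rows that mimic mixed encodings and missing values.
raw = pd.DataFrame({
    "UNDERINFL": ["N", "0", "Y", "1", None],
    "WEATHER": ["Clear", "Raining", "Clear", None, "Overcast"],
})

# 1. Drop rows with missing values (returns a new frame, no in-place mutation).
clean = raw.dropna(axis=0).copy()

# 2. Type casting: map the four UNDERINFL spellings onto integers 0/1.
clean["UNDERINFL"] = clean["UNDERINFL"].map({"N": 0, "0": 0, "Y": 1, "1": 1}).astype(int)

# 3. Dummy variables: one indicator column per WEATHER category.
features = pd.get_dummies(clean, columns=["WEATHER"], prefix="WEATHER")
print(features)
```

Working on an explicit `.copy()` keeps the original frame untouched and avoids pandas' chained-assignment warning when columns are overwritten.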

In [12]:
import numpy as np
import pandas as pd
#!conda install -c anaconda xlrd --yes

import matplotlib as mpl
import matplotlib.pyplot as plt
import pylab as pl

# Render plots inline in the notebook
%matplotlib inline

mpl.style.use('ggplot')

In [21]:
DataSet = pd.read_csv("http://data-seattlecitygis.opendata.arcgis.com/datasets/5b5c745e0f1f48e7a53acec63a0022ab_0.csv")

In [22]:
DataSet.head()

| | X | Y | OBJECTID | INCKEY | COLDETKEY | REPORTNO | STATUS | ADDRTYPE | INTKEY | LOCATION | ... | ROADCOND | LIGHTCOND | PEDROWNOTGRNT | SDOTCOLNUM | SPEEDING | ST_COLCODE | ST_COLDESC | SEGLANEKEY | CROSSWALKKEY | HITPARKEDCAR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -122.320757 | 47.609408 | 1 | 328476 | 329976 | EA08706 | Matched | Block | | BROADWAY BETWEEN E COLUMBIA ST AND BOYLSTON AVE | ... | Wet | Dark - Street Lights On | | | | 11.0 | From same direction - both going straight - bo... | 0 | 0 | N |
| 1 | -122.319561 | 47.662221 | 2 | 328142 | 329642 | EA06882 | Matched | Block | | 8TH AVE NE BETWEEN NE 45TH E ST AND NE 47TH ST | ... | Dry | Daylight | | | | 32.0 | One parked--one moving | 0 | 0 | Y |
| 2 | -122.327525 | 47.604393 | 3 | 20700 | 20700 | 1181833 | Unmatched | Block | | JAMES ST BETWEEN 6TH AVE AND 7TH AVE | ... | | | | 4030032.0 | | | | 0 | 0 | N |
| 3 | -122.327525 | 47.708622 | 4 | 332126 | 333626 | M16001640 | Unmatched | Block | | NE NORTHGATE WAY BETWEEN 1ST AVE NE AND NE NOR... | ... | | | | | | | | 0 | 0 | N |
| 4 | -122.29212 | 47.559009 | 5 | 328238 | 329738 | 3857118 | Unmatched | Block | | M L KING JR ER WAY S BETWEEN S ANGELINE ST AND... | ... | | | | | | | | 0 | 0 | N |


In [23]:
DataSet.describe()

| | X | Y | OBJECTID | INCKEY | COLDETKEY | INTKEY | PERSONCOUNT | PEDCOUNT | PEDCYLCOUNT | VEHCOUNT | INJURIES | SERIOUSINJURIES | FATALITIES | SDOT_COLCODE | SDOTCOLNUM | SEGLANEKEY | CROSSWALKKEY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 213918.0 | 213918.0 | 221389.0 | 221389.0 | 221389.0 | 71884.0 | 221389.0 | 221389.0 | 221389.0 | 221389.0 | 221389.0 | 221389.0 | 221389.0 | 221388.0 | 127205.0 | 221389.0 | 221389.0 |
| mean | -122.330756 | 47.620199 | 110695.0 | 144708.701914 | 144936.934541 | 37612.330964 | 2.227161 | 0.038136 | 0.02735 | 1.731057 | 0.373962 | 0.015209 | 0.001685 | 13.383115 | 7971063.0 | 261.29632 | 9583.127 |
| std | 0.030055 | 0.056043 | 63909.64371 | 89126.729589 | 89501.31292 | 51886.084219 | 1.47019 | 0.201815 | 0.164508 | 0.829259 | 0.732158 | 0.158072 | 0.044701 | 7.301668 | 2611523.0 | 3247.953616 | 71483.11 |
| min | -122.419091 | 47.495573 | 1.0 | 1001.0 | 1001.0 | 23807.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1007024.0 | 0.0 | 0.0 |
| 25% | -122.34928 | 47.577151 | 55348.0 | 71634.0 | 71634.0 | 28652.75 | 2.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 11.0 | 6007029.0 | 0.0 | 0.0 |
| 50% | -122.330363 | 47.616053 | 110695.0 | 127184.0 | 127184.0 | 29973.0 | 2.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 11.0 | 8033011.0 | 0.0 | 0.0 |
| 75% | -122.311998 | 47.66429 | 166042.0 | 209783.0 | 210003.0 | 33984.0 | 3.0 | 0.0 | 0.0 | 2.0 | 1.0 | 0.0 | 0.0 | 14.0 | 10181010.0 | 0.0 | 0.0 |
| max | -122.238949 | 47.734142 | 221389.0 | 333843.0 | 335343.0 | 757580.0 | 93.0 | 6.0 | 2.0 | 15.0 | 78.0 | 41.0 | 5.0 | 87.0 | 13072020.0 | 525241.0 | 5239700.0 |


In [56]:
#IntData = DataSet[['LOCATION','SEVERITYCODE','WEATHER','ROADCOND','LIGHTCOND']]

IntData = DataSet[['LOCATION','SEVERITYCODE','JUNCTIONTYPE','UNDERINFL','WEATHER','ROADCOND','LIGHTCOND']]

In [46]:
IntData.head()

| | LOCATION | SEVERITYCODE | JUNCTIONTYPE | UNDERINFL | WEATHER | ROADCOND | LIGHTCOND |
|---|---|---|---|---|---|---|---|
| 0 | BROADWAY BETWEEN E COLUMBIA ST AND BOYLSTON AVE | 1 | Mid-Block (not related to intersection) | N | Raining | Wet | Dark - Street Lights On |
| 1 | 8TH AVE NE BETWEEN NE 45TH E ST AND NE 47TH ST | 1 | Mid-Block (not related to intersection) | N | Clear | Dry | Daylight |
| 2 | JAMES ST BETWEEN 6TH AVE AND 7TH AVE | 0 | Mid-Block (but intersection related) | | | | |
| 3 | NE NORTHGATE WAY BETWEEN 1ST AVE NE AND NE NOR... | 0 | Mid-Block (not related to intersection) | | | | |
| 4 | M L KING JR ER WAY S BETWEEN S ANGELINE ST AND... | 0 | Mid-Block (not related to intersection) | | | | |


In [47]:
IntData.dtypes

LOCATION        object
SEVERITYCODE    object
JUNCTIONTYPE    object
UNDERINFL       object
WEATHER         object
ROADCOND        object
LIGHTCOND       object
dtype: object

In [45]:
#IntData.LOCATION.value_counts()

In [31]:
IntData.SEVERITYCODE.value_counts()

1     137596
2      58747
0      21594
2b      3102
3        349
Name: SEVERITYCODE, dtype: int64

In [32]:
IntData.JUNCTIONTYPE.value_counts()

Mid-Block (not related to intersection)              101632
At Intersection (intersection related)                69178
Mid-Block (but intersection related)                  24405
Driveway Junction                                     11496
At Intersection (but not related to intersection)      2495
Ramp Junction                                           190
Unknown                                                  21
Name: JUNCTIONTYPE, dtype: int64

In [33]:
IntData.UNDERINFL.value_counts()

N    103874
0     81676
Y      5399
1      4230
Name: UNDERINFL, dtype: int64

In [35]:
IntData.WEATHER.value_counts()

Clear                       114694
Raining                      34036
Overcast                     28543
Unknown                      15131
Snowing                        919
Other                          860
Fog/Smog/Smoke                 577
Sleet/Hail/Freezing Rain       116
Blowing Sand/Dirt               56
Severe Crosswind                26
Partly Cloudy                   10
Blowing Snow                     1
Name: WEATHER, dtype: int64

In [36]:
IntData.ROADCOND.value_counts()

Dry               128535
Wet                48734
Unknown            15139
Ice                 1232
Snow/Slush          1014
Other                136
Standing Water       119
Sand/Mud/Dirt         77
Oil                   64
Name: ROADCOND, dtype: int64

In [37]:
IntData.LIGHTCOND.value_counts()

Daylight                    119448
Dark - Street Lights On      50125
Unknown                      13532
Dusk                          6082
Dawn                          2608
Dark - No Street Lights       1579
Dark - Street Lights Off      1239
Other                          244
Dark - Unknown Lighting         23
Name: LIGHTCOND, dtype: int64

In [57]:
IntData.isnull().sum()

LOCATION         4588
SEVERITYCODE        1
JUNCTIONTYPE    11972
UNDERINFL       26210
WEATHER         26420
ROADCOND        26339
LIGHTCOND       26509
dtype: int64

In [58]:
IntData.shape

(221389, 7)

In [59]:
# Reassign instead of dropping in place: IntData is a slice of DataSet,
# so inplace=True triggers pandas' SettingWithCopyWarning.
IntData = IntData.dropna(axis=0)


In [60]:
IntData.shape

(187982, 7)

In [61]:
IntData.isnull().sum()

LOCATION        0
SEVERITYCODE    0
JUNCTIONTYPE    0
UNDERINFL       0
WEATHER         0
ROADCOND        0
LIGHTCOND       0
dtype: int64