# Predict The Primary Contributory Cause of Car Accidents

## Business Understanding

This project aims to support the Vehicle Safety Board by using data from the Chicago Police Department's E-Crash system to identify patterns in traffic accidents and their primary contributory causes. Through predictive modeling and data analysis, it seeks to generate actionable insights that enable data-driven interventions to improve road safety, optimize resource allocation, and inform policy decisions.

The transportation sector plays a pivotal role in any country's social and economic growth. However, the increasing number of vehicles on the roads has heightened the challenges of traffic management and safety. By analyzing traffic crash data, this project aims to inform road safety measures, public awareness efforts, policy-making, and resource deployment, with the goal of safer streets and proactive accident prevention.

##### Problem Statement

Traffic accidents in Chicago remain a significant public health and economic challenge, despite advancements in vehicle safety technology, traffic management systems, and the availability of data from the Chicago Police Department’s E-Crash system. The lack of detailed analysis and predictive capabilities hinders effective mitigation efforts and limits understanding of the primary contributory causes. This project aims to develop a predictive model to identify key factors driving traffic crashes, uncover actionable insights, and enable data-driven strategies to enhance road safety. By supporting the Vehicle Safety Board, the project seeks to reduce accidents, optimize resource allocation, and make Chicago’s roads safer for everyone.


##### Objectives

1. Predict the primary contributory cause of a car accident.
2. Identify high-risk locations for traffic accidents to prioritize interventions like increased patrols, signage, or infrastructure improvements.
3. Analyze the role of driver behaviors, such as speeding or distracted driving, to guide public awareness campaigns and enforcement strategies.
4. Understand how the timing of accidents impacts crash severity to optimize resource deployment during high-risk periods.

##### Research Questions

1. What are the most common primary contributory causes of traffic accidents?
2. Which locations in Chicago experience the highest frequency of traffic accidents, and what patterns contribute to their high risk?
3. How do driver behaviors, such as speeding or distracted driving, influence the likelihood and severity of traffic accidents?
4. How does the timing of accidents (e.g., time of day, day of the week, or season) affect crash severity, and what insights can be drawn to optimize resource deployment?


##### Stakeholders

- Government and Public Agencies, e.g., the Vehicle Safety Board
- Policymakers and Urban Planners
- General Public and Road Users
- Public Health and Safety Organizations

## Data Understanding

##### Importing Libraries

In [62]:
# Import necessary libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [64]:
# Import the dataset to be used and set the option to display all columns

df = pd.read_csv('Traffic_Crashes_-_Crashes.csv')
pd.set_option('display.max_columns', None)

In [65]:
# Inspect the data

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 794956 entries, 0 to 794955
Data columns (total 48 columns):
 #   Column                         Non-Null Count   Dtype  
---  ------                         --------------   -----  
 0   CRASH_RECORD_ID                794956 non-null  object 
 1   CRASH_DATE_EST_I               59606 non-null   object 
 2   CRASH_DATE                     794956 non-null  object 
 3   POSTED_SPEED_LIMIT             794956 non-null  int64  
 4   TRAFFIC_CONTROL_DEVICE         794956 non-null  object 
 5   DEVICE_CONDITION               794956 non-null  object 
 6   WEATHER_CONDITION              794956 non-null  object 
 7   LIGHTING_CONDITION             794956 non-null  object 
 8   FIRST_CRASH_TYPE               794956 non-null  object 
 9   TRAFFICWAY_TYPE                794956 non-null  object 
 10  LANE_CNT                       199006 non-null  float64
 11  ALIGNMENT                      794956 non-null  object 
 12  ROADWAY_SURFACE_COND          
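
The non-null counts above already hint at which columns are sparse: `CRASH_DATE_EST_I` has 59,606 values and `LANE_CNT` has 199,006 out of 794,956 rows, so roughly 92.5% and 75% of their entries are missing. A minimal sketch of how that share could be computed for every column is shown below. It runs on a small hypothetical frame (`toy`) rather than the real file, and the 50% cutoff is an illustrative assumption, not a threshold taken from this project.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame standing in for the full crash data.
toy = pd.DataFrame({
    "POSTED_SPEED_LIMIT": [30, 50, 30, 15],
    "LANE_CNT": [np.nan, np.nan, 2.0, np.nan],
    "CRASH_DATE_EST_I": [np.nan, np.nan, np.nan, np.nan],
})

# Fraction of missing entries per column, largest first.
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)

# Columns above an assumed 50% cutoff (the threshold is illustrative only).
mostly_missing = missing_share[missing_share > 0.5].index.tolist()
print(mostly_missing)
```

Applied to the real data, the same two lines would operate on `df` instead of `toy`.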

In [66]:
# Inspect the stats for numerical columns

df.describe()

Unnamed: 0,POSTED_SPEED_LIMIT,LANE_CNT,STREET_NO,BEAT_OF_OCCURRENCE,NUM_UNITS,INJURIES_TOTAL,INJURIES_FATAL,INJURIES_INCAPACITATING,INJURIES_NON_INCAPACITATING,INJURIES_REPORTED_NOT_EVIDENT,INJURIES_NO_INDICATION,INJURIES_UNKNOWN,CRASH_HOUR,CRASH_DAY_OF_WEEK,CRASH_MONTH,LATITUDE,LONGITUDE
count,794956.0,199006.0,794956.0,794951.0,794956.0,793215.0,793215.0,793215.0,793215.0,793215.0,793215.0,793215.0,794956.0,794956.0,794956.0,789567.0,789567.0
mean,28.402547,13.33032,3688.085435,1242.916397,2.034919,0.189448,0.001195,0.020032,0.107025,0.061195,2.004313,0.0,13.2028,4.122344,6.787978,41.854788,-87.67345
std,6.179272,2961.623,2887.990566,705.268208,0.452475,0.565654,0.037456,0.165673,0.422095,0.318451,1.158715,0.0,5.568727,1.980411,3.405398,0.337523,0.686153
min,0.0,0.0,0.0,111.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,-87.936193
25%,30.0,2.0,1248.0,714.0,2.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,9.0,2.0,4.0,41.782429,-87.721709
50%,30.0,2.0,3201.0,1211.0,2.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,14.0,4.0,7.0,41.874743,-87.674094
75%,30.0,4.0,5600.0,1822.0,2.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,17.0,6.0,10.0,41.924336,-87.6333
max,99.0,1191625.0,451100.0,6100.0,18.0,21.0,4.0,10.0,21.0,15.0,61.0,0.0,23.0,7.0,12.0,42.02278,0.0
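
Some extremes in this summary look like placeholders rather than measurements: `LATITUDE` has a minimum of 0.0 and `LONGITUDE` a maximum of 0.0, far from the Chicago-area quartiles, and `LANE_CNT` peaks at 1,191,625. One possible way to treat the coordinates is sketched below on a hypothetical frame (`toy`). It assumes the zeros occur as (0, 0) pairs, which the summary statistics alone do not confirm.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the second mimics a (0, 0) placeholder coordinate.
toy = pd.DataFrame({
    "LATITUDE": [41.854120, 0.0, 41.923618],
    "LONGITUDE": [-87.665902, 0.0, -87.656457],
})

# Treat exact (0, 0) pairs as missing rather than as real locations.
zero_coords = (toy["LATITUDE"] == 0) & (toy["LONGITUDE"] == 0)
toy.loc[zero_coords, ["LATITUDE", "LONGITUDE"]] = np.nan

print(int(zero_coords.sum()))   # number of rows flagged
print(toy["LATITUDE"].min())    # minimum after masking (NaN is skipped)
```

Masking instead of dropping keeps the rows available for features that do not depend on location.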


In [67]:
# Preview the first 5 rows

df.head()

Unnamed: 0,CRASH_RECORD_ID,CRASH_DATE_EST_I,CRASH_DATE,POSTED_SPEED_LIMIT,TRAFFIC_CONTROL_DEVICE,DEVICE_CONDITION,WEATHER_CONDITION,LIGHTING_CONDITION,FIRST_CRASH_TYPE,TRAFFICWAY_TYPE,LANE_CNT,ALIGNMENT,ROADWAY_SURFACE_COND,ROAD_DEFECT,REPORT_TYPE,CRASH_TYPE,INTERSECTION_RELATED_I,NOT_RIGHT_OF_WAY_I,HIT_AND_RUN_I,DAMAGE,DATE_POLICE_NOTIFIED,PRIM_CONTRIBUTORY_CAUSE,SEC_CONTRIBUTORY_CAUSE,STREET_NO,STREET_DIRECTION,STREET_NAME,BEAT_OF_OCCURRENCE,PHOTOS_TAKEN_I,STATEMENTS_TAKEN_I,DOORING_I,WORK_ZONE_I,WORK_ZONE_TYPE,WORKERS_PRESENT_I,NUM_UNITS,MOST_SEVERE_INJURY,INJURIES_TOTAL,INJURIES_FATAL,INJURIES_INCAPACITATING,INJURIES_NON_INCAPACITATING,INJURIES_REPORTED_NOT_EVIDENT,INJURIES_NO_INDICATION,INJURIES_UNKNOWN,CRASH_HOUR,CRASH_DAY_OF_WEEK,CRASH_MONTH,LATITUDE,LONGITUDE,LOCATION
0,23a79931ef555d54118f64dc9be2cf2dbf59636ce253f7...,,09/05/2023 07:05:00 PM,30,TRAFFIC SIGNAL,FUNCTIONING PROPERLY,CLEAR,DUSK,ANGLE,"FIVE POINT, OR MORE",,STRAIGHT AND LEVEL,DRY,NO DEFECTS,ON SCENE,INJURY AND / OR TOW DUE TO CRASH,Y,,,"OVER $1,500",09/05/2023 07:05:00 PM,UNABLE TO DETERMINE,NOT APPLICABLE,5500,S,WENTWORTH AVE,225.0,,,,,,,2,INCAPACITATING INJURY,3.0,0.0,1.0,2.0,0.0,2.0,0.0,19,3,9,,,
1,2675c13fd0f474d730a5b780968b3cafc7c12d7adb661f...,,09/22/2023 06:45:00 PM,50,NO CONTROLS,NO CONTROLS,CLEAR,"DARKNESS, LIGHTED ROAD",REAR END,DIVIDED - W/MEDIAN BARRIER,,STRAIGHT AND LEVEL,DRY,NO DEFECTS,ON SCENE,NO INJURY / DRIVE AWAY,,,,"OVER $1,500",09/22/2023 06:50:00 PM,FOLLOWING TOO CLOSELY,FOLLOWING TOO CLOSELY,7900,S,CHICAGO SKYWAY OB,411.0,,,,,,,2,NO INDICATION OF INJURY,0.0,0.0,0.0,0.0,0.0,2.0,0.0,18,6,9,,,
2,5f54a59fcb087b12ae5b1acff96a3caf4f2d37e79f8db4...,,07/29/2023 02:45:00 PM,30,TRAFFIC SIGNAL,FUNCTIONING PROPERLY,CLEAR,DAYLIGHT,PARKED MOTOR VEHICLE,DIVIDED - W/MEDIAN (NOT RAISED),,STRAIGHT AND LEVEL,DRY,NO DEFECTS,ON SCENE,NO INJURY / DRIVE AWAY,,,Y,"OVER $1,500",07/29/2023 02:45:00 PM,FAILING TO REDUCE SPEED TO AVOID CRASH,"OPERATING VEHICLE IN ERRATIC, RECKLESS, CARELE...",2101,S,ASHLAND AVE,1235.0,,,,,,,4,NO INDICATION OF INJURY,0.0,0.0,0.0,0.0,0.0,1.0,0.0,14,7,7,41.85412,-87.665902,POINT (-87.665902342962 41.854120262952)
3,7ebf015016f83d09b321afd671a836d6b148330535d5df...,,08/09/2023 11:00:00 PM,30,NO CONTROLS,NO CONTROLS,CLEAR,"DARKNESS, LIGHTED ROAD",SIDESWIPE SAME DIRECTION,NOT DIVIDED,,STRAIGHT AND LEVEL,DRY,NO DEFECTS,ON SCENE,NO INJURY / DRIVE AWAY,,,,"OVER $1,500",08/09/2023 11:40:00 PM,FAILING TO YIELD RIGHT-OF-WAY,NOT APPLICABLE,10020,W,BALMORAL AVE,1650.0,,,,,,,2,NO INDICATION OF INJURY,0.0,0.0,0.0,0.0,0.0,2.0,0.0,23,4,8,,,
4,6c1659069e9c6285a650e70d6f9b574ed5f64c12888479...,,08/18/2023 12:50:00 PM,15,OTHER,FUNCTIONING PROPERLY,CLEAR,DAYLIGHT,REAR END,OTHER,,STRAIGHT AND LEVEL,DRY,NO DEFECTS,ON SCENE,INJURY AND / OR TOW DUE TO CRASH,,,,"OVER $1,500",08/18/2023 12:55:00 PM,FOLLOWING TOO CLOSELY,DISTRACTION - FROM INSIDE VEHICLE,700,W,OHARE ST,1654.0,,,,,,,2,NONINCAPACITATING INJURY,1.0,0.0,0.0,1.0,0.0,1.0,0.0,12,6,8,,,
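
Because `PRIM_CONTRIBUTORY_CAUSE` is the prediction target, its class distribution is worth checking early; the preview already shows one row labeled `UNABLE TO DETERMINE`. The sketch below tabulates the five target values from the preview rows only, so the shares are illustrative and say nothing about the full dataset. The names `preview_causes` and `cause_share` are hypothetical.

```python
import pandas as pd

# Target values copied from the five preview rows above.
preview_causes = pd.Series([
    "UNABLE TO DETERMINE",
    "FOLLOWING TOO CLOSELY",
    "FAILING TO REDUCE SPEED TO AVOID CRASH",
    "FAILING TO YIELD RIGHT-OF-WAY",
    "FOLLOWING TOO CLOSELY",
], name="PRIM_CONTRIBUTORY_CAUSE")

# Share of rows per cause, most common first.
cause_share = preview_causes.value_counts(normalize=True)
print(cause_share)
```

On the full frame, the equivalent call would be `df['PRIM_CONTRIBUTORY_CAUSE'].value_counts(normalize=True)`.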


In [68]:
# Drop columns with too many missing values or irrelevant information.
# drop() returns a new DataFrame, so assign the result back to df.
df = df.drop(columns=['LOCATION', 'CRASH_RECORD_ID', 'STREET_NO', 'BEAT_OF_OCCURRENCE', 'INJURIES_INCAPACITATING',
                      'INJURIES_NON_INCAPACITATING', 'INJURIES_REPORTED_NOT_EVIDENT', 'INJURIES_NO_INDICATION',
                      'INJURIES_UNKNOWN', 'CRASH_DATE_EST_I', 'LANE_CNT', 'INTERSECTION_RELATED_I', 'NOT_RIGHT_OF_WAY_I',
                      'HIT_AND_RUN_I', 'PHOTOS_TAKEN_I', 'STATEMENTS_TAKEN_I', 'DOORING_I', 'WORK_ZONE_I', 'WORK_ZONE_TYPE',
                      'WORKERS_PRESENT_I'])
df

Unnamed: 0,CRASH_DATE,POSTED_SPEED_LIMIT,TRAFFIC_CONTROL_DEVICE,DEVICE_CONDITION,WEATHER_CONDITION,LIGHTING_CONDITION,FIRST_CRASH_TYPE,TRAFFICWAY_TYPE,ALIGNMENT,ROADWAY_SURFACE_COND,ROAD_DEFECT,REPORT_TYPE,CRASH_TYPE,DAMAGE,DATE_POLICE_NOTIFIED,PRIM_CONTRIBUTORY_CAUSE,SEC_CONTRIBUTORY_CAUSE,STREET_DIRECTION,STREET_NAME,NUM_UNITS,MOST_SEVERE_INJURY,INJURIES_TOTAL,INJURIES_FATAL,CRASH_HOUR,CRASH_DAY_OF_WEEK,CRASH_MONTH,LATITUDE,LONGITUDE
0,09/05/2023 07:05:00 PM,30,TRAFFIC SIGNAL,FUNCTIONING PROPERLY,CLEAR,DUSK,ANGLE,"FIVE POINT, OR MORE",STRAIGHT AND LEVEL,DRY,NO DEFECTS,ON SCENE,INJURY AND / OR TOW DUE TO CRASH,"OVER $1,500",09/05/2023 07:05:00 PM,UNABLE TO DETERMINE,NOT APPLICABLE,S,WENTWORTH AVE,2,INCAPACITATING INJURY,3.0,0.0,19,3,9,,
1,09/22/2023 06:45:00 PM,50,NO CONTROLS,NO CONTROLS,CLEAR,"DARKNESS, LIGHTED ROAD",REAR END,DIVIDED - W/MEDIAN BARRIER,STRAIGHT AND LEVEL,DRY,NO DEFECTS,ON SCENE,NO INJURY / DRIVE AWAY,"OVER $1,500",09/22/2023 06:50:00 PM,FOLLOWING TOO CLOSELY,FOLLOWING TOO CLOSELY,S,CHICAGO SKYWAY OB,2,NO INDICATION OF INJURY,0.0,0.0,18,6,9,,
2,07/29/2023 02:45:00 PM,30,TRAFFIC SIGNAL,FUNCTIONING PROPERLY,CLEAR,DAYLIGHT,PARKED MOTOR VEHICLE,DIVIDED - W/MEDIAN (NOT RAISED),STRAIGHT AND LEVEL,DRY,NO DEFECTS,ON SCENE,NO INJURY / DRIVE AWAY,"OVER $1,500",07/29/2023 02:45:00 PM,FAILING TO REDUCE SPEED TO AVOID CRASH,"OPERATING VEHICLE IN ERRATIC, RECKLESS, CARELE...",S,ASHLAND AVE,4,NO INDICATION OF INJURY,0.0,0.0,14,7,7,41.854120,-87.665902
3,08/09/2023 11:00:00 PM,30,NO CONTROLS,NO CONTROLS,CLEAR,"DARKNESS, LIGHTED ROAD",SIDESWIPE SAME DIRECTION,NOT DIVIDED,STRAIGHT AND LEVEL,DRY,NO DEFECTS,ON SCENE,NO INJURY / DRIVE AWAY,"OVER $1,500",08/09/2023 11:40:00 PM,FAILING TO YIELD RIGHT-OF-WAY,NOT APPLICABLE,W,BALMORAL AVE,2,NO INDICATION OF INJURY,0.0,0.0,23,4,8,,
4,08/18/2023 12:50:00 PM,15,OTHER,FUNCTIONING PROPERLY,CLEAR,DAYLIGHT,REAR END,OTHER,STRAIGHT AND LEVEL,DRY,NO DEFECTS,ON SCENE,INJURY AND / OR TOW DUE TO CRASH,"OVER $1,500",08/18/2023 12:55:00 PM,FOLLOWING TOO CLOSELY,DISTRACTION - FROM INSIDE VEHICLE,W,OHARE ST,2,NONINCAPACITATING INJURY,1.0,0.0,12,6,8,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
794951,11/06/2023 07:00:00 AM,30,NO CONTROLS,NO CONTROLS,CLEAR,DAYLIGHT,PARKED MOTOR VEHICLE,ONE-WAY,STRAIGHT AND LEVEL,DRY,NO DEFECTS,NOT ON SCENE (DESK REPORT),NO INJURY / DRIVE AWAY,"$501 - $1,500",11/06/2023 02:12:00 PM,IMPROPER BACKING,UNABLE TO DETERMINE,W,BELDEN AVE,2,NO INDICATION OF INJURY,0.0,0.0,7,2,11,41.923618,-87.656457
794952,10/31/2023 06:30:00 PM,30,NO CONTROLS,NO CONTROLS,UNKNOWN,UNKNOWN,SIDESWIPE SAME DIRECTION,OTHER,STRAIGHT AND LEVEL,UNKNOWN,UNKNOWN,NOT ON SCENE (DESK REPORT),NO INJURY / DRIVE AWAY,"$501 - $1,500",11/07/2023 07:14:00 PM,UNABLE TO DETERMINE,NOT APPLICABLE,S,STONY ISLAND AVE,2,NO INDICATION OF INJURY,0.0,0.0,18,3,10,41.722257,-87.585276
794953,11/04/2023 11:00:00 PM,30,NO CONTROLS,NO CONTROLS,CLEAR,DAYLIGHT,PARKED MOTOR VEHICLE,ONE-WAY,STRAIGHT AND LEVEL,DRY,NO DEFECTS,NOT ON SCENE (DESK REPORT),NO INJURY / DRIVE AWAY,"OVER $1,500",11/07/2023 04:50:00 PM,IMPROPER BACKING,IMPROPER BACKING,E,104TH ST,3,NO INDICATION OF INJURY,0.0,0.0,23,7,11,41.705477,-87.621737
794954,11/06/2023 06:35:00 PM,30,NO CONTROLS,NO CONTROLS,CLEAR,"DARKNESS, LIGHTED ROAD",TURNING,NOT DIVIDED,STRAIGHT AND LEVEL,DRY,NO DEFECTS,ON SCENE,INJURY AND / OR TOW DUE TO CRASH,"OVER $1,500",11/06/2023 06:40:00 PM,UNABLE TO DETERMINE,NOT APPLICABLE,W,63RD ST,2,NO INDICATION OF INJURY,0.0,0.0,18,2,11,41.778165,-87.750264
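
`CRASH_DATE` and `DATE_POLICE_NOTIFIED` are still stored as strings in the form `09/05/2023 07:05:00 PM`. If time-based features beyond `CRASH_HOUR`, `CRASH_DAY_OF_WEEK`, and `CRASH_MONTH` are needed, the column could be parsed with an explicit format, as in this minimal sketch. The frame `toy` is hypothetical and holds two timestamps copied from the output above.

```python
import pandas as pd

# Two timestamps copied from the output; the real column has one per crash.
toy = pd.DataFrame({
    "CRASH_DATE": ["09/05/2023 07:05:00 PM", "11/06/2023 07:00:00 AM"],
})

# Explicit format: month/day/year, 12-hour clock with an AM/PM marker.
toy["CRASH_DATE"] = pd.to_datetime(toy["CRASH_DATE"], format="%m/%d/%Y %I:%M:%S %p")

print(toy["CRASH_DATE"].dt.hour.tolist())
```

The parsed hours (19 and 7) agree with the `CRASH_HOUR` values in the corresponding rows, which is a quick consistency check on the format string.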
