# Deciphering Road Accidents: A Deep Dive into Hidden Patterns and Risk Factors


## Business Understanding


### Problem statement
Our primary objective is to uncover hidden patterns and risk factors contributing to road accidents beyond commonly recognized factors like speeding and impaired driving. By identifying these additional risk elements, we aim to enhance road safety strategies, improve accident prevention measures, and reduce the severity of road accidents.
### Why this Topic
Road accidents are a major concern worldwide, causing injuries and deaths that have a profound impact on families and communities. In Kenya, road accidents remain a significant public health issue, with thousands of people losing their lives each year in road-related incidents. Our dataset, however, is not specific to Kenya. According to the World Health Organization (WHO), road traffic injuries are among the top 10 leading causes of death worldwide, highlighting the urgent need for effective prevention strategies. By studying the reasons behind these accidents and identifying hidden patterns and risk factors, we can develop targeted interventions to improve road safety and reduce the number of accidents, injuries, and fatalities on roads worldwide.

### Domain
This project applies to the transportation and public safety domain, with potential implications for government agencies, transportation companies, and advocacy groups.
### Target Audience
Government policymakers, transportation experts, safety agencies, and the general public will benefit from insights and recommendations generated by this analysis.
### Real-World Impact
Implementation of findings could lead to a reduction in road accidents, injuries, fatalities, and associated economic costs on a global scale.
### Pre-existing Projects
Our project aims to uncover additional hidden factors and patterns using advanced data analysis techniques, complementing existing research on road accident causes and prevention strategies.

## Objectives

1. To identify Key Predictive Factors: Investigate the influence of various roadway and environmental factors (e.g., road surface, lighting conditions, weather) on crash occurrence and severity to identify key predictive attributes.
2. To assess Driver Behavior Impact: Analyze the correlation between driver behavior attributes (e.g., speed, alcohol/drug involvement) and crash severity to understand the role of driver actions in accidents.
3. To develop Early Warning Systems: Create predictive models to anticipate high-risk areas and times for accidents, enabling proactive measures such as enhanced law enforcement or targeted road safety campaigns.
4. To evaluate Infrastructure Vulnerability: Assess the impact of road infrastructure features (e.g., intersections, road markings) on crash frequency and severity to prioritize infrastructure improvements for accident prevention.
5. To optimize Resource Allocation: Utilize predictive analytics to optimize resource allocation for emergency response and medical services by forecasting the likelihood and severity of future accidents in specific regions or road segments.

## Data Understanding
* Data Collection:  Comprehensive road accident data will be collected from international sources, including government databases, police reports, healthcare records, and insurance databases.
* Source of Data: The dataset utilized for this analysis was obtained from the official New Zealand Government data website. It encompasses comprehensive information regarding road accidents spanning from the year 2000 through April 2024.

# Exploratory Data Analysis

### Understanding the data

In [3]:
import pandas as pd




In [12]:
# Load the dataset
df = pd.read_csv("data.csv")


In [13]:
# Check the first 10 rows
df.head(10)

Unnamed: 0,X,Y,OBJECTID,advisorySpeed,areaUnitID,bicycle,bridge,bus,carStationWagon,cliffBank,...,train,tree,truck,unknownVehicleType,urban,vanOrUtility,vehicle,waterRiver,weatherA,weatherB
0,178.03184,-38.669793,1,,544801.0,0.0,,0.0,2.0,,...,,,0.0,0.0,Urban,0.0,,,Fine,Null
1,175.264695,-37.785862,2,,528900.0,0.0,,0.0,2.0,,...,,,0.0,0.0,Urban,0.0,,,Fine,Null
2,174.751715,-36.708328,3,,507000.0,0.0,,0.0,0.0,,...,,,0.0,0.0,Urban,1.0,,,Fine,Null
3,172.394398,-43.609495,4,,597513.0,0.0,0.0,0.0,2.0,0.0,...,0.0,0.0,0.0,0.0,Open,0.0,0.0,0.0,Fine,Null
4,168.385299,-46.417826,5,,611500.0,0.0,,0.0,1.0,,...,,,0.0,0.0,Urban,1.0,,,Fine,Null
5,169.739754,-46.262517,6,30.0,607300.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,Open,0.0,0.0,0.0,Fine,Null
6,174.982435,-41.159414,7,,568101.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,Open,0.0,0.0,0.0,Light rain,Null
7,174.937477,-37.062911,8,,525420.0,0.0,,0.0,1.0,,...,,,1.0,0.0,Urban,0.0,,,Fine,Null
8,175.235256,-37.751316,9,,528403.0,0.0,,0.0,2.0,,...,,,0.0,0.0,Open,0.0,,,Fine,Null
9,174.874776,-36.964072,10,,523601.0,0.0,0.0,0.0,2.0,0.0,...,0.0,0.0,0.0,0.0,Urban,0.0,1.0,0.0,Fine,Null


In [15]:
# Check the data shape (rows, columns)
df.shape

(821744, 72)

In [16]:
# Checking out the data types in the columns
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 821744 entries, 0 to 821743
Data columns (total 72 columns):
 #   Column                     Non-Null Count   Dtype  
---  ------                     --------------   -----  
 0   X                          821744 non-null  float64
 1   Y                          821744 non-null  float64
 2   OBJECTID                   821744 non-null  int64  
 3   advisorySpeed              31344 non-null   float64
 4   areaUnitID                 821647 non-null  float64
 5   bicycle                    821739 non-null  float64
 6   bridge                     332913 non-null  float64
 7   bus                        821739 non-null  float64
 8   carStationWagon            821739 non-null  float64
 9   cliffBank                  332913 non-null  float64
 10  crashDirectionDescription  821744 non-null  object 
 11  crashFinancialYear         821744 non-null  object 
 12  crashLocation1             821744 non-null  object 
 13  crashLocation2             82

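**Observations**
* Some columns have no data at all, and others are mostly empty: `advisorySpeed` has only 31,344 non-null values out of 821,744 rows.
* Some rows have missing data: for example, `bridge` and `cliffBank` are populated in only 332,913 rows.
* Some columns appear to have the wrong data type: for example, the identifier `areaUnitID` and vehicle counts such as `bicycle` and `bus` are stored as `float64`.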
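One way to put numbers on these observations is a per-column missingness report. The sketch below is illustrative only: `missing_summary` is a hypothetical helper, and the toy frame uses made-up values (including a hypothetical `emptyColumn`) so the block runs without `data.csv`. On the real data, the same function could be called as `missing_summary(df)`.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing count, missing share (%), and dtype, worst first."""
    summary = pd.DataFrame(
        {
            "missing": frame.isna().sum(),
            "missing_pct": frame.isna().mean() * 100,
            "dtype": frame.dtypes.astype(str),
        }
    )
    return summary.sort_values("missing_pct", ascending=False)


# Toy stand-in for the crash data; the values are made up for illustration.
toy = pd.DataFrame(
    {
        "OBJECTID": [1, 2, 3, 4],
        "advisorySpeed": [np.nan, np.nan, np.nan, 30.0],
        "bridge": [np.nan, 0.0, np.nan, 0.0],
        "emptyColumn": [np.nan] * 4,  # hypothetical fully empty column
    }
)

report = missing_summary(toy)

# Columns whose missing count equals the row count contain no data at all.
empty_columns = report.index[report["missing"] == len(toy)].tolist()

print(report)
print("Columns with no data:", empty_columns)
```

Sorting by `missing_pct` puts the emptiest columns first, which makes it easier to decide which ones to drop and which ones to impute during data cleaning.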

In [7]:
# Summary statistics
df.describe()

Unnamed: 0,X,Y,OBJECTID,advisorySpeed,areaUnitID,bicycle,bridge,bus,carStationWagon,cliffBank,...,tlaId,trafficIsland,trafficSign,train,tree,truck,unknownVehicleType,vanOrUtility,vehicle,waterRiver
count,821744.0,821744.0,821744.0,31344.0,821647.0,821739.0,332913.0,821739.0,821739.0,332913.0,...,818556.0,332913.0,332913.0,332913.0,332913.0,821739.0,821739.0,821739.0,332913.0,332913.0
mean,174.268497,-39.324055,654113.2,54.437851,546241.601791,0.028963,0.013724,0.01587,1.311054,0.106319,...,52.4099,0.028815,0.048709,0.001511,0.101555,0.080399,0.003057,0.175788,0.025046,0.009967
std,5.565359,2.970823,379702.4,18.175564,32537.949166,0.171136,0.117603,0.126396,0.78449,0.309588,...,24.000807,0.168825,0.21697,0.038841,0.305681,0.283694,0.056963,0.410052,0.158098,0.099696
min,-176.760762,-46.904849,1.0,15.0,500100.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,174.198951,-41.231382,329628.8,40.0,519400.0,0.0,0.0,0.0,1.0,0.0,...,31.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,174.784685,-37.889268,655346.5,55.0,536642.0,0.0,0.0,0.0,1.0,0.0,...,60.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,175.229901,-36.90811,988178.2,65.0,573523.0,0.0,0.0,0.0,2.0,0.0,...,76.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,178.544357,-34.430214,1318963.0,95.0,626801.0,5.0,4.0,3.0,11.0,3.0,...,76.0,4.0,4.0,1.0,3.0,5.0,3.0,6.0,4.0,2.0
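The summary shows that vehicle-count columns such as `bicycle` and `carStationWagon` only take whole-number values but are stored as `float64`, most likely because missing values force a float dtype. A minimal sketch of one possible fix, assuming these columns really are counts, is to cast them to the nullable integer dtype `Int64` in pandas. The toy values and the `count_columns` list below are assumptions for illustration, not the full set of columns in the dataset.

```python
import numpy as np
import pandas as pd

# Toy rows using column names from the dataset; the values are made up.
toy = pd.DataFrame(
    {
        "bicycle": [0.0, 1.0, np.nan],
        "carStationWagon": [2.0, 1.0, 0.0],
        "advisorySpeed": [np.nan, 30.0, 55.0],
    }
)

# Assumed subset of vehicle-count columns to convert.
count_columns = ["bicycle", "carStationWagon"]

# "Int64" (capital I) is the nullable integer dtype, so missing
# values are kept as <NA> instead of forcing the column back to float.
converted = toy.astype({col: "Int64" for col in count_columns})

print(converted.dtypes)
print(converted)
```

Columns that are not listed, such as `advisorySpeed`, keep their original dtype, so the conversion can be applied step by step as each column is reviewed.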


### Data cleaning

# Modelling

All modelling will be done here

# Conclusion

Conclusion will be on this part