<h1>DATA ANALYTICS OF UK ROAD ACCIDENTS</h1>

<h3>Analysis: Carla Mae Biñas</h3>

In [1]:
import numpy as np 
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
from scipy.stats import f_oneway

<h3>Importing Dataset as DataFrame</h3>

In [2]:
# Load the raw accident records; a forward-slash path works on every platform
accident = pd.read_csv('datasets/uk_road_accident.csv')

In [3]:
accident

,Index,Accident_Severity,Accident Date,Latitude,Light_Conditions,District Area,Longitude,Number_of_Casualties,Number_of_Vehicles,Road_Surface_Conditions,Road_Type,Urban_or_Rural_Area,Weather_Conditions,Vehicle_Type
0,200701BS64157,Serious,5/6/2019,51.506187,Darkness - lights lit,Kensington and Chelsea,-0.209082,1,2,Dry,Single carriageway,Urban,Fine no high winds,Car
1,200701BS65737,Serious,2/7/2019,51.495029,Daylight,Kensington and Chelsea,-0.173647,1,2,Wet or damp,Single carriageway,Urban,Raining no high winds,Car
2,200701BS66127,Serious,26-08-2019,51.517715,Darkness - lighting unknown,Kensington and Chelsea,-0.210215,1,3,Dry,,Urban,,Taxi/Private hire car
3,200701BS66128,Serious,16-08-2019,51.495478,Daylight,Kensington and Chelsea,-0.202731,1,4,Dry,Single carriageway,Urban,Fine no high winds,Bus or coach (17 or more pass seats)
4,200701BS66837,Slight,3/9/2019,51.488576,Darkness - lights lit,Kensington and Chelsea,-0.192487,1,2,Dry,,Urban,,Other vehicle
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
660674,201091NM01760,Slight,18-02-2022,57.374005,Daylight,Highland,-3.467828,2,1,Dry,Single carriageway,Rural,Fine no high winds,Car
660675,201091NM01881,Slight,21-02-2022,57.232273,Darkness - no lighting,Highland,-3.809281,1,1,Frost or ice,Single carriageway,Rural,Fine no high winds,Car
660676,201091NM01935,Slight,23-02-2022,57.585044,Daylight,Highland,-3.862727,1,3,Frost or ice,Single carriageway,Rural,Fine no high winds,Car
660677,201091NM01964,Serious,23-02-2022,57.214898,Darkness - no lighting,Highland,-3.823997,1,2,Wet or damp,Single carriageway,Rural,Fine no high winds,Motorcycle over 500cc


<h3>Summary statistics with describe()</h3>

In [4]:
accident.describe()

,Latitude,Longitude,Number_of_Casualties,Number_of_Vehicles
count,660654.0,660653.0,660679.0,660679.0
mean,52.553866,-1.43121,1.35704,1.831255
std,1.406922,1.38333,0.824847,0.715269
min,49.91443,-7.516225,1.0,1.0
25%,51.49069,-2.332291,1.0,1.0
50%,52.315641,-1.411667,1.0,2.0
75%,53.453452,-0.232869,1.0,2.0
max,60.757544,1.76201,68.0,32.0


<h3>Cleaning the Null Values</h3>

In [5]:
accident.isnull().sum()

Index                          0
Accident_Severity              0
Accident Date                  0
Latitude                      25
Light_Conditions               0
District Area                  0
Longitude                     26
Number_of_Casualties           0
Number_of_Vehicles             0
Road_Surface_Conditions      726
Road_Type                   4520
Urban_or_Rural_Area           15
Weather_Conditions         14128
Vehicle_Type                   0
dtype: int64

In [6]:
accident['Latitude'].mean()

np.float64(52.553865761110956)

In [7]:
accident['Longitude'].mean()

np.float64(-1.431210368502073)

In [8]:
accident['Urban_or_Rural_Area'].mode()

0    Urban
Name: Urban_or_Rural_Area, dtype: object

In [9]:
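# Fill only the missing cells: compute the mean/mode first, then pass it to fillna().
# Calling .mean() or .mode()[0] after fillna() would overwrite the whole column with one value.
accident['Latitude'] = accident['Latitude'].fillna(accident['Latitude'].mean())
accident['Longitude'] = accident['Longitude'].fillna(accident['Longitude'].mean())
accident['Urban_or_Rural_Area'] = accident['Urban_or_Rural_Area'].fillna(accident['Urban_or_Rural_Area'].mode()[0])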

In [10]:
accident.isnull().sum()

Index                          0
Accident_Severity              0
Accident Date                  0
Latitude                       0
Light_Conditions               0
District Area                  0
Longitude                      0
Number_of_Casualties           0
Number_of_Vehicles             0
Road_Surface_Conditions      726
Road_Type                   4520
Urban_or_Rural_Area            0
Weather_Conditions         14128
Vehicle_Type                   0
dtype: int64
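
Mean and mode imputation only works as intended when the fill value is computed first and then passed to `fillna`. The following is a minimal sketch on a toy frame; the `toy` data and the names `lat_mean`, `area_mode`, `filled`, and `collapsed` are illustrative assumptions, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the accident table (values are illustrative only).
toy = pd.DataFrame({
    "Latitude": [51.0, np.nan, 57.0, 54.0],
    "Urban_or_Rural_Area": ["Urban", None, "Rural", "Urban"],
})

# Compute the fill values first; mean() skips NaN by default.
lat_mean = toy["Latitude"].mean()
area_mode = toy["Urban_or_Rural_Area"].mode()[0]

# Fill only the missing cells; observed values are left unchanged.
filled = toy.copy()
filled["Latitude"] = filled["Latitude"].fillna(lat_mean)
filled["Urban_or_Rural_Area"] = filled["Urban_or_Rural_Area"].fillna(area_mode)

# Pitfall: calling .mean() after fillna() returns a single number, and
# assigning it back would overwrite every row of the column with it.
collapsed = toy["Latitude"].fillna(toy["Latitude"]).mean()

print(filled)
print(collapsed)
```

Checking `filled["Latitude"].nunique()` after imputation is a quick way to confirm that the column still holds its original spread of values.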

<h3>Labeling the remaining missing values as "Unaccounted"</h3>

In [11]:
accident["Road_Surface_Conditions"] = accident["Road_Surface_Conditions"].fillna("Unaccounted")
accident["Road_Type"] = accident["Road_Type"].fillna("Unaccounted")
accident["Weather_Conditions"] = accident["Weather_Conditions"].fillna("Unaccounted")

In [12]:
accident.isnull().sum()

Index                      0
Accident_Severity          0
Accident Date              0
Latitude                   0
Light_Conditions           0
District Area              0
Longitude                  0
Number_of_Casualties       0
Number_of_Vehicles         0
Road_Surface_Conditions    0
Road_Type                  0
Urban_or_Rural_Area        0
Weather_Conditions         0
Vehicle_Type               0
dtype: int64

<h3>Changing the DataType</h3>

In [13]:
accident.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 660679 entries, 0 to 660678
Data columns (total 14 columns):
 #   Column                   Non-Null Count   Dtype  
---  ------                   --------------   -----  
 0   Index                    660679 non-null  object 
 1   Accident_Severity        660679 non-null  object 
 2   Accident Date            660679 non-null  object 
 3   Latitude                 660679 non-null  float64
 4   Light_Conditions         660679 non-null  object 
 5   District Area            660679 non-null  object 
 6   Longitude                660679 non-null  float64
 7   Number_of_Casualties     660679 non-null  int64  
 8   Number_of_Vehicles       660679 non-null  int64  
 9   Road_Surface_Conditions  660679 non-null  object 
 10  Road_Type                660679 non-null  object 
 11  Urban_or_Rural_Area      660679 non-null  object 
 12  Weather_Conditions       660679 non-null  object 
 13  Vehicle_Type             660679 non-null  object 
dtypes: f

In [14]:
accident['Index']=accident['Index'].astype('category')
accident['Accident_Severity']=accident['Accident_Severity'].astype('category')
accident['Accident Date']=accident['Accident Date'].astype('datetime64[ns]')
accident['Light_Conditions']=accident['Light_Conditions'].astype('category')
accident['District Area']=accident['District Area'].astype('category')
accident['Number_of_Casualties']=accident['Number_of_Casualties'].astype('category')
accident['Number_of_Vehicles']=accident['Number_of_Vehicles'].astype('category')
accident['Urban_or_Rural_Area']=accident['Urban_or_Rural_Area'].astype('category')
accident['Vehicle_Type']=accident['Vehicle_Type'].astype('category')
accident["Road_Surface_Conditions"] = accident["Road_Surface_Conditions"].astype("category")
accident["Road_Type"] = accident["Road_Type"].astype("category")
accident["Weather_Conditions"] = accident["Weather_Conditions"].astype("category")
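
The `Accident Date` strings in the preview mix two layouts (`5/6/2019` and `26-08-2019`), so it can help to parse each layout explicitly instead of relying on automatic inference. The sketch below assumes both layouts are day-first, which the source does not state; the `raw` sample and the names `slash`, `dash`, and `parsed` are hypothetical.

```python
import pandas as pd

# A few date strings in the two layouts seen in the preview.
raw = pd.Series(["5/6/2019", "2/7/2019", "26-08-2019", "18-02-2022"])

# Assumption: both layouts are day-first (day, month, year).
# errors="coerce" turns strings that do not match the format into NaT.
slash = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")
dash = pd.to_datetime(raw, format="%d-%m-%Y", errors="coerce")

# Combine the two passes: take the slash result and fall back to the dash result.
parsed = slash.fillna(dash)

print(parsed)
print("unparsed rows:", int(parsed.isna().sum()))
```

Counting the remaining `NaT` values shows whether a third, unexpected layout is present before any year, month, or weekday columns are derived.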

<h3>Adding date-part columns from Accident Date</h3>

In [15]:
accident['Accident_Year']=accident['Accident Date'].dt.year
accident['Accident_Month']=accident['Accident Date'].dt.month_name()  # month_name is a method, so it needs parentheses
accident['Accident_Day']=accident['Accident Date'].dt.day
accident['Day_of_Week']=accident['Accident Date'].dt.dayofweek
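
The `.dt` accessor exposes some date parts as attributes (`year`, `day`, `dayofweek`) and others as methods (`month_name()`, `day_name()`). A small sketch with two made-up dates, where the `dates` and `parts` names are illustrative only:

```python
import pandas as pd

# Two example timestamps (not taken from the dataset).
dates = pd.Series(pd.to_datetime(["2019-06-05", "2022-02-18"]))

parts = pd.DataFrame({
    "Accident_Year": dates.dt.year,
    "Accident_Month": dates.dt.month_name(),  # method call: returns "June", "February"
    "Accident_Day": dates.dt.day,
    "Day_of_Week": dates.dt.dayofweek,         # attribute: 0 = Monday ... 6 = Sunday
    "Day_Name": dates.dt.day_name(),           # readable weekday label
})

print(parts)
```

Keeping a readable weekday label next to the numeric `Day_of_Week` makes later group-by tables easier to interpret.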

In [16]:
accident.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 660679 entries, 0 to 660678
Data columns (total 18 columns):
 #   Column                   Non-Null Count   Dtype         
---  ------                   --------------   -----         
 0   Index                    660679 non-null  category      
 1   Accident_Severity        660679 non-null  category      
 2   Accident Date            660679 non-null  datetime64[ns]
 3   Latitude                 660679 non-null  float64       
 4   Light_Conditions         660679 non-null  category      
 5   District Area            660679 non-null  category      
 6   Longitude                660679 non-null  float64       
 7   Number_of_Casualties     660679 non-null  category      
 8   Number_of_Vehicles       660679 non-null  category      
 9   Road_Surface_Conditions  660679 non-null  category      
 10  Road_Type                660679 non-null  category      
 11  Urban_or_Rural_Area      660679 non-null  category      
 12  Weather_Conditio

<h1>DATA ANALYSIS</h1>

<h3>Univariate</h3>

<h2>1. What are the most common vehicles involved in accidents, and how many accidents involve each vehicle type?</h2>

In [17]:
accident['Vehicle_Type'].value_counts()

Vehicle_Type
Car                                      497992
Van / Goods 3.5 tonnes mgw or under       34160
Bus or coach (17 or more pass seats)      25878
Motorcycle over 500cc                     25657
Goods 7.5 tonnes mgw and over             17307
Motorcycle 125cc and under                15269
Taxi/Private hire car                     13294
Motorcycle over 125cc and up to 500cc      7656
Motorcycle 50cc and under                  7603
Goods over 3.5t. and under 7.5t            6096
Other vehicle                              5637
Minibus (8 - 16 passenger seats)           1976
Agricultural vehicle                       1947
Pedal cycle                                 197
Data missing or out of range                  6
Ridden horse                                  4
Name: count, dtype: int64

<h3>Insight#1: Most road accidents involve cars. With 497,992 cases, cars are by far the most common vehicle type in the dataset.</h3>
<h3>Insight#2: Cars account for about 75% of all accidents. Heavy vehicles such as vans, buses, and goods trucks are involved in around 12 to 13%; although far less common than cars, they still represent a significant portion of the total.</h3>
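
The percentages quoted above can be reproduced from the counts. This is a sketch that rebuilds the `value_counts()` output by hand; the `heavy` grouping is an assumption about which categories count as "heavy vehicles", and in the notebook itself `accident['Vehicle_Type'].value_counts(normalize=True)` would return the shares directly.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
vehicle_counts = pd.Series({
    "Car": 497992,
    "Van / Goods 3.5 tonnes mgw or under": 34160,
    "Bus or coach (17 or more pass seats)": 25878,
    "Motorcycle over 500cc": 25657,
    "Goods 7.5 tonnes mgw and over": 17307,
    "Motorcycle 125cc and under": 15269,
    "Taxi/Private hire car": 13294,
    "Motorcycle over 125cc and up to 500cc": 7656,
    "Motorcycle 50cc and under": 7603,
    "Goods over 3.5t. and under 7.5t": 6096,
    "Other vehicle": 5637,
    "Minibus (8 - 16 passenger seats)": 1976,
    "Agricultural vehicle": 1947,
    "Pedal cycle": 197,
    "Data missing or out of range": 6,
    "Ridden horse": 4,
})

total = vehicle_counts.sum()
car_share = vehicle_counts["Car"] / total * 100

# Assumed grouping for "heavy vehicles": vans, buses, and both goods categories.
heavy = [
    "Van / Goods 3.5 tonnes mgw or under",
    "Bus or coach (17 or more pass seats)",
    "Goods 7.5 tonnes mgw and over",
    "Goods over 3.5t. and under 7.5t",
]
heavy_share = vehicle_counts[heavy].sum() / total * 100

print(f"total accidents: {total}")
print(f"car share: {car_share:.1f}%")
print(f"heavy vehicle share: {heavy_share:.1f}%")
```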

<h2>2. How many accidents fall under each severity level? Compare each of them to each other. </h2>

In [18]:
accident['Accident_Severity'].value_counts()

Accident_Severity
Slight     563801
Serious     88217
Fatal        8661
Name: count, dtype: int64

<h3>Insight#3: Most accidents are slight (over 85%), but more than 13% are serious. Many crashes are minor, yet a large number still cause injuries that need medical attention.</h3>
<h3>Insight#4: Only about 1% of accidents are fatal, but because the dataset contains 660,679 accidents, even this small percentage amounts to 8,661 fatal accidents, which shows why road safety measures are important.</h3>

<h2>3. How many accidents happened under each light condition? Are accidents more common during the day or at night? </h2>

In [19]:
accident['Light_Conditions'].value_counts()

Light_Conditions
Daylight                       484880
Darkness - lights lit          129335
Darkness - no lighting          37437
Darkness - lighting unknown      6484
Darkness - lights unlit          2543
Name: count, dtype: int64

<h3>Insight#5: About 73% of accidents happen in daylight, so good visibility alone does not prevent crashes. Heavier traffic and busier roads during the day are a likely contributor.</h3>
<h3>Insight#6: Accidents in darkness still make up about 27% of the total, so poor visibility and lighting conditions remain a substantial road safety risk.</h3>
<h3>Insight#7: Although fewer accidents happen at night, they may be more dangerous, since limited visibility and unlit roads can increase the chance of severe outcomes.</h3>

<h2>4. Which district area has the highest number of accidents? </h2>

In [20]:
accident['District Area'].value_counts()

District Area
Birmingham            13491
Leeds                  8898
Manchester             6720
Bradford               6212
Sheffield              5710
                      ...  
Berwick-upon-Tweed      153
Teesdale                142
Shetland Islands        133
Orkney Islands          117
Clackmannanshire         91
Name: count, Length: 422, dtype: int64

<h3>Insight#8: Within the five districts with the most accidents, Birmingham accounts for 32.9% of the accidents, far ahead of Leeds (21.7%) and Manchester (16.4%). This suggests that larger urban centers record more accidents.</h3>
<h3>Insight#9: Smaller or rural areas such as Clackmannanshire (91 accidents) and the Orkney Islands (117 accidents) report very few accidents compared to big cities. This shows a strong urban-rural gap in accident frequency.</h3>
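
The district percentages depend on the denominator: a share within the top five districts is very different from a share of all accidents. A brief sketch using the counts shown above; the names `top5`, `share_within_top5`, and `share_of_all` are illustrative, and the overall total of 660,679 is taken from the earlier `info()` output.

```python
import pandas as pd

# Top five districts copied from the value_counts() output above.
top5 = pd.Series({
    "Birmingham": 13491,
    "Leeds": 8898,
    "Manchester": 6720,
    "Bradford": 6212,
    "Sheffield": 5710,
})
all_accidents = 660679  # total rows reported by accident.info()

# Share of each district within the top five only.
share_within_top5 = (top5 / top5.sum() * 100).round(1)

# Share of each district across the whole dataset.
share_of_all = (top5 / all_accidents * 100).round(2)

print(pd.DataFrame({"within_top5_%": share_within_top5, "of_all_%": share_of_all}))
```

Stating which denominator is used avoids reading the top-five share as a share of all accidents in the dataset.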

<h2>5. How does the number of accidents vary across different years? Has the total number of accidents increased or decreased over the years? </h2>

In [21]:
accident['Accident_Year'].value_counts()

Accident_Year
2019    182115
2020    170591
2021    163554
2022    144419
Name: count, dtype: int64

<h3>Insight#10: 2019 had the most accidents in the dataset, 182,115 (27.6% of the total). By 2022 the number had dropped to 144,419, a decrease of about 21% overall.</h3>
<h3>Insight#11: From 2019 to 2020, accidents fell from 182,115 to 170,591, a drop of about 6%.</h3>
<h3>Insight#12: From 2020 to 2021, accidents fell by a further 4%, from 170,591 to 163,554.</h3>
<h3>Insight#13: The largest year-over-year drop came between 2021 and 2022, when accidents fell by about 12%, from 163,554 to 144,419.</h3>
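
The year-over-year figures follow from a percent-change calculation on the yearly counts. A minimal sketch with the counts copied from the output above; `yearly`, `yoy`, and `overall` are illustrative names.

```python
import pandas as pd

# Yearly counts from the value_counts() output, keyed by year.
yearly = pd.Series({2019: 182115, 2020: 170591, 2021: 163554, 2022: 144419})

# value_counts() sorts by count, not by year, so sort by year before comparing
# consecutive rows (here the two orders happen to coincide).
yearly = yearly.sort_index()

# Percent change relative to the previous year; the first year has no predecessor.
yoy = (yearly.pct_change() * 100).round(1)

# Change from the first to the last year.
overall = (yearly.iloc[-1] / yearly.iloc[0] - 1) * 100

print(yoy)
print(f"2019 to 2022: {overall:.1f}%")
```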

<h3>Bivariate</h3>

<h2>6. How has accident severity changed from 2019 to 2022?</h2>

In [25]:
# Count accidents for each severity level and year
severity_year = accident.groupby(["Accident_Severity", "Accident_Year"]).size().unstack()
severity_year

Accident_Year,2019,2020,2021,2022
Accident_Severity,,,,
Fatal,2714,2341,2057,1549
Serious,24322,23121,21997,18777
Slight,155079,145129,139500,124093


<h3>Insight#14: Fatal accidents dropped from 2,714 in 2019 to 1,549 in 2022, a 43% decrease, which suggests that road safety improved.</h3>
<h3>Insight#15: Serious accidents also decreased, from 24,322 in 2019 to 18,777 in 2022, a 23% decrease. The count is still high, but the decline is a positive sign for safety.</h3>
<h3>Insight#16: Slight accidents remain the most common by far, though they also declined, from 155,079 in 2019 to 124,093 in 2022, a 20% decrease.</h3>
<h3>Insight#17: Across all severity levels, accidents decreased every year from 2019 to 2022, showing a clear downward trend in total accidents.</h3>
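
The per-severity decreases and the "every year" claim can both be checked against the table. The sketch below rebuilds the table by hand; `sev`, `change_2019_2022`, and `falls_every_year` are hypothetical names.

```python
import pandas as pd

# Severity-by-year counts copied from the table above.
sev = pd.DataFrame(
    {
        2019: [2714, 24322, 155079],
        2020: [2341, 23121, 145129],
        2021: [2057, 21997, 139500],
        2022: [1549, 18777, 124093],
    },
    index=["Fatal", "Serious", "Slight"],
)

# Percent change from 2019 to 2022 for each severity level.
change_2019_2022 = ((sev[2022] - sev[2019]) / sev[2019] * 100).round(1)

# True only if every severity level falls in every consecutive pair of years.
# diff(axis=1) leaves the first column empty, so it is skipped.
falls_every_year = bool((sev.diff(axis=1).iloc[:, 1:] < 0).all().all())

print(change_2019_2022)
print("decreases every year:", falls_every_year)
```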

<h2>7. Do accidents that happen in daylight tend to be less severe compared to those in darkness?</h2>

In [24]:
light_severity = accident.groupby(["Light_Conditions", "Accident_Severity"]).size().unstack(fill_value=0)
light_severity

Accident_Severity,Fatal,Serious,Slight
Light_Conditions,,,
Darkness - lighting unknown,68,794,5622
Darkness - lights lit,1860,19130,108345
Darkness - lights unlit,45,360,2138
Darkness - no lighting,1612,7174,28651
Daylight,5076,60759,419045


<h3>Insight#18: The majority of accidents occur in daylight (484,880 of 660,679), which suggests that high daytime traffic volume contributes more to the accident count than poor visibility at night.</h3>
<h3>Insight#19: Although total accidents are lower at night, the share of fatal accidents is higher in darkness, especially where there is no street lighting: about 4.3% of accidents in "Darkness - no lighting" are fatal, compared with about 1.0% in daylight.</h3>
<h3>Insight#20: Lit streets record more accidents overall than unlit ones, but their fatal share is lower (about 1.4% with lights lit versus about 4.3% with no lighting), suggesting that lighting plays a key role in reducing accident severity.</h3>
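
Comparing severity across light conditions requires row percentages rather than raw counts, because daylight dominates the totals. A minimal sketch that rebuilds the table above and normalizes each row; `light` and `light_share` are illustrative names. On the raw data, `pd.crosstab(..., normalize="index")` would give the same shares.

```python
import pandas as pd

# Light condition by severity, copied from the table above.
light = pd.DataFrame(
    {
        "Fatal": [68, 1860, 45, 1612, 5076],
        "Serious": [794, 19130, 360, 7174, 60759],
        "Slight": [5622, 108345, 2138, 28651, 419045],
    },
    index=[
        "Darkness - lighting unknown",
        "Darkness - lights lit",
        "Darkness - lights unlit",
        "Darkness - no lighting",
        "Daylight",
    ],
)

# Divide each row by its own total so every row sums to 100%.
light_share = light.div(light.sum(axis=1), axis=0).mul(100).round(2)

print(light_share)
print("highest fatal share:", light_share["Fatal"].idxmax())
```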

<h2>8. On which day of the week are accidents most likely to result in severe outcomes? </h2>

In [33]:
accident.groupby(["Day_of_Week", "Accident_Severity"]).size().unstack()

Accident_Severity,Fatal,Serious,Slight
Day_of_Week,,,
0,1338,12023,66470
1,1171,12271,82095
2,1155,12430,83484
3,1166,12535,84178
4,1138,12598,82853
5,1265,13364,87136
6,1428,12996,77585


<h3>Insight#21: Monday (day 0 in pandas' dayofweek numbering, where 0 is Monday and 6 is Sunday) has the highest fatal accident share in the table above, about 1.68%, followed by Sunday at about 1.55%.</h3>
<h3>Insight#22: Monday also has the highest share of serious accidents, about 15.06%, with Sunday second at about 14.12%; the other days range from about 12.8% to 13.1%.</h3>
<h3>Insight#23: Slight accidents dominate every day (about 83 to 86%), but Monday and Sunday have the lowest slight shares (83.26% and 84.32%), indicating that accidents on these two days are more likely to be severe. Saturday, at 85.62%, is close to the midweek days.</h3>
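
Because `dt.dayofweek` returns integers (0 for Monday through 6 for Sunday), labeling the rows with day names before computing shares makes the table easier to read. The sketch below rebuilds the table above by hand; `dow`, `day_names`, and `dow_share` are hypothetical names.

```python
import pandas as pd

# Day-of-week by severity, copied from the table above (rows 0..6).
dow = pd.DataFrame(
    {
        "Fatal": [1338, 1171, 1155, 1166, 1138, 1265, 1428],
        "Serious": [12023, 12271, 12430, 12535, 12598, 13364, 12996],
        "Slight": [66470, 82095, 83484, 84178, 82853, 87136, 77585],
    },
    index=range(7),
)

# pandas convention for dayofweek: 0 = Monday, 6 = Sunday.
day_names = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
dow = dow.rename(index=dict(enumerate(day_names)))

# Row percentages: severity mix within each day.
dow_share = dow.div(dow.sum(axis=1), axis=0).mul(100).round(2)

print(dow_share)
print("highest fatal share:", dow_share["Fatal"].idxmax())
print("highest serious share:", dow_share["Serious"].idxmax())
```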

<h2>9. Do wet or icy road surfaces contribute to higher accident severity compared to dry conditions?</h2>

In [34]:
road_accidents = accident.groupby(["Road_Surface_Conditions", "Accident_Severity"]).size().unstack()
road_accidents

Accident_Severity,Fatal,Serious,Slight
Road_Surface_Conditions,,,
Dry,5788,61638,380395
Flood over 3cm. deep,23,152,842
Frost or ice,193,2007,16317
Snow,35,565,5290
Unaccounted,2,70,654
Wet or damp,2620,23785,160303


<h3>Insight#24: Wet roads have a higher fatal share than dry roads: about 1.40% of accidents on wet or damp roads are fatal, compared with 1.29% on dry roads, even though dry roads have far more accidents in total.</h3>
<h3>Insight#25: Icy and snowy roads mostly lead to slight accidents. Frost or ice had 88.12% slight accidents out of 18,517 total, and snow had 89.81% out of 5,890 total, so most accidents on these surfaces are minor.</h3>
<h3>Insight#26: Wet and flooded surfaces are associated with a higher fatal share than dry roads (about 1.40% and 2.26% versus 1.29%), but icy and snowy surfaces are not (about 1.04% for frost or ice and 0.59% for snow). Only flooded roads also show a higher serious share than dry roads (about 14.95% versus 13.76%).</h3>
<h3>Insight#27: Flooded roads are the most dangerous condition. They have the highest fatal accident share, 2.26%, although they account for only 1,017 accidents.</h3>
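
One way to compare surfaces directly is to express each fatal share relative to the dry-road baseline, alongside the number of accidents behind it. This is a sketch under that framing, not the notebook's own method; `surface`, `summary`, and the `ratio_vs_dry` column are illustrative names.

```python
import pandas as pd

# Road surface by severity, copied from the table above.
surface = pd.DataFrame(
    {
        "Fatal": [5788, 23, 193, 35, 2, 2620],
        "Serious": [61638, 152, 2007, 565, 70, 23785],
        "Slight": [380395, 842, 16317, 5290, 654, 160303],
    },
    index=["Dry", "Flood over 3cm. deep", "Frost or ice",
           "Snow", "Unaccounted", "Wet or damp"],
)

totals = surface.sum(axis=1)
fatal_share = surface["Fatal"] / totals * 100

summary = pd.DataFrame({
    "accidents": totals,                       # sample size behind each share
    "fatal_%": fatal_share.round(2),
    # Values above 1 mean a higher fatal share than on dry roads.
    "ratio_vs_dry": (fatal_share / fatal_share["Dry"]).round(2),
}).sort_values("fatal_%", ascending=False)

print(summary)
```

Showing the accident count next to each share is a reminder that the flooded-road figure rests on far fewer accidents than the dry or wet figures.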