<H1>Activity 2: United Kingdom Road Accident Data Analysis</H1>

<h2>•••••Import Necessary Libraries•••••</h2>

In [11]:
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
from scipy.stats import f_oneway

<h2>•••••Making DataFrame•••••</h2>

In [2]:
accident = pd.read_csv("data\\accident_data.csv")

<h2>•••••Checking DataFrame•••••</h2>

In [3]:
accident

Unnamed: 0,Index,Accident_Severity,Accident Date,Latitude,Light_Conditions,District Area,Longitude,Number_of_Casualties,Number_of_Vehicles,Road_Surface_Conditions,Road_Type,Urban_or_Rural_Area,Weather_Conditions,Vehicle_Type
0,200701BS64157,Serious,5/6/2019,51.506187,Darkness - lights lit,Kensington and Chelsea,-0.209082,1,2,Dry,Single carriageway,Urban,Fine no high winds,Car
1,200701BS65737,Serious,2/7/2019,51.495029,Daylight,Kensington and Chelsea,-0.173647,1,2,Wet or damp,Single carriageway,Urban,Raining no high winds,Car
2,200701BS66127,Serious,26-08-2019,51.517715,Darkness - lighting unknown,Kensington and Chelsea,-0.210215,1,3,Dry,,Urban,,Taxi/Private hire car
3,200701BS66128,Serious,16-08-2019,51.495478,Daylight,Kensington and Chelsea,-0.202731,1,4,Dry,Single carriageway,Urban,Fine no high winds,Bus or coach (17 or more pass seats)
4,200701BS66837,Slight,3/9/2019,51.488576,Darkness - lights lit,Kensington and Chelsea,-0.192487,1,2,Dry,,Urban,,Other vehicle
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
660674,201091NM01760,Slight,18-02-2022,57.374005,Daylight,Highland,-3.467828,2,1,Dry,Single carriageway,Rural,Fine no high winds,Car
660675,201091NM01881,Slight,21-02-2022,57.232273,Darkness - no lighting,Highland,-3.809281,1,1,Frost or ice,Single carriageway,Rural,Fine no high winds,Car
660676,201091NM01935,Slight,23-02-2022,57.585044,Daylight,Highland,-3.862727,1,3,Frost or ice,Single carriageway,Rural,Fine no high winds,Car
660677,201091NM01964,Serious,23-02-2022,57.214898,Darkness - no lighting,Highland,-3.823997,1,2,Wet or damp,Single carriageway,Rural,Fine no high winds,Motorcycle over 500cc


In [4]:
accident.isnull().sum()

Index                          0
Accident_Severity              0
Accident Date                  0
Latitude                      25
Light_Conditions               0
District Area                  0
Longitude                     26
Number_of_Casualties           0
Number_of_Vehicles             0
Road_Surface_Conditions      726
Road_Type                   4520
Urban_or_Rural_Area           15
Weather_Conditions         14128
Vehicle_Type                   0
dtype: int64

In [5]:
accident["Latitude"] = accident["Latitude"].fillna(accident["Latitude"].mean())
accident["Longitude"] = accident["Longitude"].fillna(accident["Longitude"].mean())
accident["Road_Surface_Conditions"] = accident["Road_Surface_Conditions"].fillna('unaccounted')
accident["Road_Type"] = accident["Road_Type"].fillna(accident["Road_Type"].mode()[0])
accident["Urban_or_Rural_Area"] = accident["Urban_or_Rural_Area"].fillna(accident["Urban_or_Rural_Area"].mode()[0])
accident["Weather_Conditions"] = accident["Weather_Conditions"].fillna(accident["Weather_Conditions"].mode()[0])
accident.isnull().sum()

Index                      0
Accident_Severity          0
Accident Date              0
Latitude                   0
Light_Conditions           0
District Area              0
Longitude                  0
Number_of_Casualties       0
Number_of_Vehicles         0
Road_Surface_Conditions    0
Road_Type                  0
Urban_or_Rural_Area        0
Weather_Conditions         0
Vehicle_Type               0
dtype: int64

<h3>Adjusting Data types</h3>

In [6]:
accident.dtypes

Index                       object
Accident_Severity           object
Accident Date               object
Latitude                   float64
Light_Conditions            object
District Area               object
Longitude                  float64
Number_of_Casualties         int64
Number_of_Vehicles           int64
Road_Surface_Conditions     object
Road_Type                   object
Urban_or_Rural_Area         object
Weather_Conditions          object
Vehicle_Type                object
dtype: object

<h2>•••••Extracting date information using pandas date time•••••</h2>

In [7]:
# Parse the date strings day-first; values pandas cannot parse become NaT.
accident['Accident Date'] = pd.to_datetime(accident['Accident Date'], dayfirst=True, errors='coerce')
accident['Year'] = accident['Accident Date'].dt.year
accident['Month'] = accident['Accident Date'].dt.month
accident['Day'] = accident['Accident Date'].dt.day
accident['DayOfWeek'] = accident['Accident Date'].dt.dayofweek  # Monday=0 ... Sunday=6

In [8]:
accident.isnull().sum()

Index                           0
Accident_Severity               0
Accident Date              395672
Latitude                        0
Light_Conditions                0
District Area                   0
Longitude                       0
Number_of_Casualties            0
Number_of_Vehicles              0
Road_Surface_Conditions         0
Road_Type                       0
Urban_or_Rural_Area             0
Weather_Conditions              0
Vehicle_Type                    0
Year                       395672
Month                      395672
Day                        395672
DayOfWeek                  395672
dtype: int64
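The 395,672 missing dates are consistent with the column mixing two string layouts (`5/6/2019` and `26-08-2019`, both visible in the preview). A minimal sketch of one way to parse both, assuming every date is day-first and only the separator differs; the sample values and the name `raw_dates` are illustrative and not part of the notebook:

```python
import pandas as pd

# Illustrative sample mixing the two layouts seen in the preview (hypothetical name).
raw_dates = pd.Series(["5/6/2019", "2/7/2019", "26-08-2019", "16-08-2019"])

# Normalise the separator so one explicit format covers every row.
normalised = raw_dates.str.replace("-", "/", regex=False)

# An explicit day-first format avoids relying on format inference.
parsed = pd.to_datetime(normalised, format="%d/%m/%Y", errors="coerce")

print(parsed.isna().sum())
```

If the assumption holds for the full column, the same two steps applied to `accident['Accident Date']` before extracting `Year`, `Month`, and `Day` would keep far more rows in the date-based questions.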

<h2>•••••Categorical Data Fields•••••</h2>

In [9]:
accident["Index"] = accident["Index"].astype("category")
accident["Accident_Severity"] = accident["Accident_Severity"].astype("category")
accident["Light_Conditions"] = accident["Light_Conditions"].astype("category")
accident["District Area"] = accident["District Area"].astype("category")
accident["Road_Surface_Conditions"] = accident["Road_Surface_Conditions"].astype("category")
accident["Road_Type"] = accident["Road_Type"].astype("category")
accident["Urban_or_Rural_Area"] = accident["Urban_or_Rural_Area"].astype("category")
accident["Weather_Conditions"] = accident["Weather_Conditions"].astype("category")
accident["Vehicle_Type"] = accident["Vehicle_Type"].astype("category")
accident.dtypes

Index                            category
Accident_Severity                category
Accident Date              datetime64[ns]
Latitude                          float64
Light_Conditions                 category
District Area                    category
Longitude                         float64
Number_of_Casualties                int64
Number_of_Vehicles                  int64
Road_Surface_Conditions          category
Road_Type                        category
Urban_or_Rural_Area              category
Weather_Conditions               category
Vehicle_Type                     category
Year                              float64
Month                             float64
Day                               float64
DayOfWeek                         float64
dtype: object

<h2>QUESTIONS & INSIGHTS:</h2>

<h2>1. Which districts show increasing accident trends year-over-year (top growth)?</h2>

In [13]:
accident.groupby(['District Area','Year']).size()

District Area  Year  
Aberdeen City  2019.0    134
               2020.0    203
               2021.0    204
               2022.0      0
Aberdeenshire  2019.0    201
                        ... 
Wyre Forest    2022.0     93
York           2019.0    194
               2020.0    172
               2021.0    215
               2022.0    171
Length: 1688, dtype: int64

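<h3>Insight: Some districts (e.g., Wiltshire, Central Bedfordshire) show rising accident counts year-over-year. In the rows shown, Aberdeen City rises from 134 (2019) to 204 (2021); the counts cover only rows whose date parsed.</h3>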
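The grouped counts can be turned into a growth ranking by pivoting years into columns. This is a minimal sketch on made-up data; the districts "A" and "B" and the frame `toy` are hypothetical stand-ins for `accident[['District Area', 'Year']]`:

```python
import pandas as pd

# Toy stand-in for accident[['District Area', 'Year']]; districts "A" and "B" are made up.
toy = pd.DataFrame({
    "District Area": ["A"] * 6 + ["B"] * 5,
    "Year": [2019, 2020, 2020, 2021, 2021, 2021,
             2019, 2019, 2019, 2020, 2021],
})

# One row per district, one column per year.
counts = toy.groupby(["District Area", "Year"]).size().unstack(fill_value=0)

# Year-over-year change; the first year has no predecessor, so drop it.
yoy_change = counts.diff(axis=1).iloc[:, 1:]

# A district is "rising" if every year-over-year change is positive.
rising = yoy_change.gt(0).all(axis=1)

# Rank districts by the change between the first and last year.
total_growth = (counts.iloc[:, -1] - counts.iloc[:, 0]).sort_values(ascending=False)

print(rising)
print(total_growth)
```

Requiring every yearly change to be positive is a strict definition of "increasing"; ranking by `total_growth` alone is a looser alternative.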

<h2>2. Is there spatial autocorrelation (clusters) of high-severity accidents compared to all accidents?</h2>

In [62]:
accident['Accident_Severity'].value_counts()

Accident_Severity
Slight     563801
Serious     88217
Fatal        8661
Name: count, dtype: int64

<h3>Insight: Serious (88,217) and fatal (8,661) accidents are far fewer than slight ones (563,801). These counts alone do not show whether severe accidents cluster in particular locations.</h3>

<h2>3. Are accidents involving vulnerable road users (pedestrians, cyclists) more likely to be severe?</h2>

In [17]:
accident['Vulnerable'] = accident['Vehicle_Type'].astype(str).str.contains('Pedal|Pedestrian', case=False)
accident.groupby('Vulnerable')['Accident_Severity'].value_counts()

Vulnerable  Accident_Severity
False       Slight               563649
            Serious               88178
            Fatal                  8655
True        Slight                  152
            Serious                  39
            Fatal                     6
Name: count, dtype: int64

<h3>Insight: Only pedal cycles match the filter (197 accidents), since Vehicle_Type has no pedestrian category. About 22.8% of them are serious or fatal, compared with about 14.7% of all other accidents.</h3>
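Raw counts are hard to compare when one group has 197 rows and the other more than 660,000. A short sketch that converts the counts printed above into serious-plus-fatal shares; the row labels are chosen here for readability and are not column values from the notebook:

```python
import pandas as pd

# Counts copied from the output above; "Other vehicles" is the Vulnerable == False group.
counts = pd.DataFrame(
    {"Slight": [563649, 152], "Serious": [88178, 39], "Fatal": [8655, 6]},
    index=["Other vehicles", "Pedal cycle"],
)

# Share of accidents in each group that are serious or fatal, in percent.
severe_share = (counts["Serious"] + counts["Fatal"]) / counts.sum(axis=1) * 100

print(severe_share.round(1))
```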

<h2>4. How does the number of accidents vary across different years?</h2>

In [19]:
accident['Year'] = pd.to_datetime(accident['Accident Date'], dayfirst=True, errors='coerce').dt.year
accident['Year'].value_counts().sort_index()

Year
2019.0    71867
2020.0    70163
2021.0    66172
2022.0    56805
Name: count, dtype: int64

<h3>Insight: Among the 265,007 rows with a parsed date, the number of accidents declines every year, from 71,867 in 2019 to 56,805 in 2022.</h3>

<h2>5. What are the most accident-prone months?</h2>

In [20]:
accident['Month'] = pd.to_datetime(accident['Accident Date'], dayfirst=True, errors='coerce').dt.month_name()
accident['Month'].value_counts()

Month
November     24240
December     24156
October      23962
July         22939
September    22558
February     22264
June         22196
March        21824
May          21723
August       21106
April        19787
January      18252
Name: count, dtype: int64

<h3>Insight: November (24,240), December (24,156), and October (23,962) are the most accident-prone months; January (18,252) has the fewest.</h3>

<h2>6. Which day of the week has the highest accident frequency?</h2>

In [21]:
accident['Day'] = pd.to_datetime(accident['Accident Date'], dayfirst=True, errors='coerce').dt.day_name()
accident['Day'].value_counts()

Day
Saturday     43164
Wednesday    40037
Friday       39822
Thursday     39641
Tuesday      38714
Sunday       35065
Monday       28564
Name: count, dtype: int64

<h3>Insight: Saturday has the highest accident frequency.</h3>

<h2>7. What time of day do most accidents occur (based on Light_Conditions)?</h2>

In [22]:
accident['Light_Conditions'].value_counts()

Light_Conditions
Daylight                       484880
Darkness - lights lit          129335
Darkness - no lighting          37437
Darkness - lighting unknown      6484
Darkness - lights unlit          2543
Name: count, dtype: int64

<h3>Insight: Most accidents occur during daylight conditions.</h3>

<h2>8. What percentage of accidents fall into each severity category?</h2>

In [23]:
accident['Accident_Severity'].value_counts(normalize=True) * 100

Accident_Severity
Slight     85.336601
Serious    13.352475
Fatal       1.310924
Name: proportion, dtype: float64

<h3>Insight: About 85% of accidents are slight, 13% serious, and 1% fatal. </h3>

<h2>9. Does accident severity vary across different weather conditions? </h2>

In [24]:
accident.groupby('Weather_Conditions')['Accident_Severity'].value_counts()

Weather_Conditions     Accident_Severity
Fine + high winds      Slight                 7134
                       Serious                1245
                       Fatal                   175
Fine no high winds     Slight               454521
                       Serious               73285
                       Fatal                  7207
Fog or mist            Slight                 2963
                       Serious                 483
                       Fatal                    82
Other                  Slight                15184
                       Serious                1801
                       Fatal                   165
Raining + high winds   Slight                 8209
                       Serious                1261
                       Fatal                   145
Raining no high winds  Slight                69380
                       Serious                9468
                       Fatal                   848
Snowing + high winds   Slight            

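<h3>Insight: In the rows shown, the serious-plus-fatal share is highest in fine weather with high winds (about 16.6%) and in fog or mist (about 16.0%). It is about 15.0% in fine weather without high winds, and lower in rain (about 14.6% with high winds, about 12.9% without). The snow rows are truncated in the output.</h3>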
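Comparing severity across weather categories is easier with row-normalised shares than with raw counts. A minimal sketch using `pd.crosstab` on made-up rows; the frame `toy` and its shortened category labels are hypothetical:

```python
import pandas as pd

# Toy rows standing in for accident[['Weather_Conditions', 'Accident_Severity']].
toy = pd.DataFrame({
    "Weather_Conditions": ["Fine"] * 4 + ["Fog"] * 4,
    "Accident_Severity": ["Slight", "Slight", "Slight", "Serious",
                          "Slight", "Slight", "Serious", "Fatal"],
})

# Row-normalised table: the shares within each weather category sum to 1.
shares = pd.crosstab(toy["Weather_Conditions"], toy["Accident_Severity"], normalize="index")

# Combined serious-plus-fatal share per weather category.
# reindex guards against a severity level that never occurs in the data.
severe = shares.reindex(columns=["Serious", "Fatal"], fill_value=0).sum(axis=1)

print(severe)
```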

<h2>10. Is accident severity higher in rural areas compared to urban areas?</h2>

In [25]:
accident.groupby('Urban_or_Rural_Area')['Accident_Severity'].value_counts()

Urban_or_Rural_Area  Accident_Severity
Rural                Slight               196077
                     Serious               37312
                     Fatal                  5601
Unallocated          Slight                   10
                     Serious                   1
                     Fatal                     0
Urban                Slight               367714
                     Serious               50904
                     Fatal                  3060
Name: count, dtype: int64

<h3>Insight: Rural areas have fewer accidents overall but a higher share of severe ones: about 18.0% are serious or fatal, against about 12.8% in urban areas, and the fatal share is about 2.3% against 0.7%.</h3>

<h2> 11. How does accident frequency differ by road surface condition (dry, wet, icy)?</h2>

In [26]:
accident['Road_Surface_Conditions'].value_counts()

Road_Surface_Conditions
Dry                     447821
Wet or damp             186708
Frost or ice             18517
Snow                      5890
Flood over 3cm. deep      1017
unaccounted                726
Name: count, dtype: int64

<h3>Insight: Most accidents occur on dry roads (about 67.8%), followed by wet or damp surfaces (about 28.3%); frost or ice accounts for about 2.8%.</h3>

<h2>12. Which road type (single carriageway, dual carriageway, etc.) has the highest number of accidents?</h2>

In [27]:
accident['Road_Type'].value_counts()

Road_Type
Single carriageway    496663
Dual carriageway       99424
Roundabout             43992
One way street         13559
Slip road               7041
Name: count, dtype: int64

<h3>Insight: Single carriageways have the highest number of accidents.</h3>

<h2> 13. Do certain districts report consistently higher accident counts?</h2>

In [28]:
accident['Number_of_Casualties'].describe()

count    660679.000000
mean          1.357040
std           0.824847
min           1.000000
25%           1.000000
50%           1.000000
75%           1.000000
max          68.000000
Name: Number_of_Casualties, dtype: float64

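<h3>Insight: The summary above describes casualties per accident (mean 1.36, median 1, maximum 68) rather than district counts. Answering the district question requires counting accidents per District Area and year.</h3>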
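One way to check whether the same districts lead every year is to rank districts within each year and keep those that stay in the top N. This is a sketch on made-up data; the districts "A", "B", "C", the frame `toy`, and the cutoff `top_n` are all hypothetical:

```python
import pandas as pd

# Toy stand-in for accident[['District Area', 'Year']].
toy = pd.DataFrame({
    "District Area": ["A", "A", "A", "B", "B", "C",
                      "A", "A", "A", "B", "C", "C"],
    "Year": [2019] * 6 + [2020] * 6,
})

# Accident counts with one row per district and one column per year.
counts = toy.groupby(["District Area", "Year"]).size().unstack(fill_value=0)

# Rank districts within each year (1 = most accidents).
ranks = counts.rank(ascending=False, method="min")

# A district is "consistently high" if it is in the top N in every year.
top_n = 2
consistent = ranks.le(top_n).all(axis=1)

print(consistent)
```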

<h2> 14. Which vehicle types are most commonly involved in accidents?</h2>

In [29]:
accident['Vehicle_Type'].value_counts()

Vehicle_Type
Car                                      497992
Van / Goods 3.5 tonnes mgw or under       34160
Bus or coach (17 or more pass seats)      25878
Motorcycle over 500cc                     25657
Goods 7.5 tonnes mgw and over             17307
Motorcycle 125cc and under                15269
Taxi/Private hire car                     13294
Motorcycle over 125cc and up to 500cc      7656
Motorcycle 50cc and under                  7603
Goods over 3.5t. and under 7.5t            6096
Other vehicle                              5637
Minibus (8 - 16 passenger seats)           1976
Agricultural vehicle                       1947
Pedal cycle                                 197
Data missing or out of range                  6
Ridden horse                                  4
Name: count, dtype: int64

<h3>Insight: Cars are by far the most commonly involved vehicle type in accidents.</h3>

<h2> 15. Do motorcycles and bicycles show higher accident severity compared to cars?</h2>

In [40]:
accident.groupby('Vehicle_Type')['Accident_Severity'].value_counts()

Vehicle_Type                           Accident_Severity
Agricultural vehicle                   Slight                 1644
                                       Serious                 282
                                       Fatal                    21
Bus or coach (17 or more pass seats)   Slight                22180
                                       Serious                3373
                                       Fatal                   325
Car                                    Slight               424954
                                       Serious               66461
                                       Fatal                  6577
Data missing or out of range           Slight                    6
                                       Fatal                     0
                                       Serious                   0
Goods 7.5 tonnes mgw and over          Slight                14770
                                       Serious                2321
     

<h3> Insight: Motorcycles and bicycles have higher average accident severity than cars.</h3>

<h2> 16. Is there a correlation between light conditions and accident severity?</h2>

In [42]:
accident.groupby('Light_Conditions')['Accident_Severity'].value_counts()


Light_Conditions             Accident_Severity
Darkness - lighting unknown  Slight                 5622
                             Serious                 794
                             Fatal                    68
Darkness - lights lit        Slight               108345
                             Serious               19130
                             Fatal                  1860
Darkness - lights unlit      Slight                 2138
                             Serious                 360
                             Fatal                    45
Darkness - no lighting       Slight                28651
                             Serious                7174
                             Fatal                  1612
Daylight                     Slight               419045
                             Serious               60759
                             Fatal                  5076
Name: count, dtype: int64

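<h3>Insight: Severe accidents are relatively more common in darkness than in daylight: about 23.5% of accidents in darkness with no lighting and about 16.2% in darkness with lights lit are serious or fatal, against about 13.6% in daylight.</h3>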
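Because both variables are categorical, association is usually measured with a chi-square statistic rather than a correlation coefficient. The sketch below computes Pearson's chi-square and Cramér's V by hand from the counts printed above; it is an illustration in plain NumPy, not a step from the notebook:

```python
import numpy as np

# Rows: light conditions in the order printed above; columns: Slight, Serious, Fatal.
observed = np.array([
    [5622, 794, 68],        # Darkness - lighting unknown
    [108345, 19130, 1860],  # Darkness - lights lit
    [2138, 360, 45],        # Darkness - lights unlit
    [28651, 7174, 1612],    # Darkness - no lighting
    [419045, 60759, 5076],  # Daylight
], dtype=float)

total = observed.sum()

# Expected counts if severity were independent of light conditions.
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / total

# Pearson chi-square statistic and Cramér's V (0 = no association, 1 = perfect).
chi2 = ((observed - expected) ** 2 / expected).sum()
cramers_v = np.sqrt(chi2 / (total * (min(observed.shape) - 1)))

# Serious-plus-fatal share per light condition, in percent.
severe_share = observed[:, 1:].sum(axis=1) / observed.sum(axis=1) * 100

print(chi2, cramers_v)
print(severe_share.round(1))
```

With more than 660,000 rows, almost any difference is statistically significant, so the effect size (Cramér's V) and the per-category shares are the more informative numbers.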

<h2> 17. Are certain latitude/longitude regions (hotspots) more prone to accidents?</h2>

In [58]:
accident.groupby(['Latitude','Longitude']).size().sort_values(ascending=False).head(10)


Latitude   Longitude
52.949719  -0.977611    45
52.458798  -1.871043    35
53.083165  -0.816789    33
52.967634  -1.190861    31
52.938860  -1.216694    29
52.944347  -1.190402    28
52.989857  -1.234393    27
51.496389  -3.143767    27
52.940243  -1.181848    26
52.553866  -1.431210    25
dtype: int64

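<h3>Insight: Certain exact coordinate pairs recur far more often than others, with up to 45 accidents at one point, and seven of the top ten lie between latitudes 52.9 and 53.1. The last entry (25 accidents) matches the number of rows with missing latitude, so it may be an artifact of filling missing coordinates with the mean.</h3>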
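Grouping on exact coordinates only finds accidents recorded at an identical point. A sketch of a coarser alternative that rounds coordinates into bins before counting; the toy coordinates and the choice of two decimals (roughly 1.1 km in latitude) are assumptions for illustration:

```python
import pandas as pd

# Made-up coordinates: three points near one location, two near another.
toy = pd.DataFrame({
    "Latitude":  [52.9497, 52.9512, 52.9489, 51.4964, 51.5031],
    "Longitude": [-0.9776, -0.9801, -0.9755, -3.1438, -3.1402],
})

# Round each coordinate to 2 decimals so nearby points share a bin.
binned = toy.round({"Latitude": 2, "Longitude": 2})

# Count accidents per bin, largest first.
hotspots = binned.groupby(["Latitude", "Longitude"]).size().sort_values(ascending=False)

print(hotspots)
```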

<h2> 18. How do accident counts differ between urban and rural areas?</h2>

In [46]:
accident['Urban_or_Rural_Area'].value_counts()


Urban_or_Rural_Area
Urban          421678
Rural          238990
Unallocated        11
Name: count, dtype: int64

<h3>Insight: Urban areas record more accidents overall, while rural areas show fewer but often more severe cases.</h3>

<h2>  19. Can accident severity be predicted using weather, road type, and light conditions?</h2>

In [49]:
accident.groupby(['Weather_Conditions','Road_Type','Light_Conditions'])['Accident_Severity'].value_counts()

Weather_Conditions     Road_Type         Light_Conditions             Accident_Severity
Fine + high winds      Dual carriageway  Darkness - lighting unknown  Slight                 9
                                                                      Serious                3
                                                                      Fatal                  1
                                         Darkness - lights lit        Slight               281
                                                                      Serious               50
                                                                                          ... 
Snowing no high winds  Slip road         Darkness - no lighting       Serious                4
                                                                      Fatal                  0
                                         Daylight                     Slight                30
                                                         

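<h3>Insight: Severity counts differ across combinations of weather, road type, and light conditions, with poor weather, rural road types, and dark conditions showing higher severity. This suggests the factors carry some predictive signal, although grouped counts alone do not measure how well severity can be predicted.</h3>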
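A simple way to gauge predictive signal is a lookup-table baseline: predict the most frequent severity for each feature combination and compare its accuracy with always predicting the overall majority class. The sketch uses made-up rows and shortened labels; `toy`, `features`, and the accuracy variables are hypothetical names:

```python
import pandas as pd

# Toy rows standing in for the notebook's weather, light, and severity columns.
toy = pd.DataFrame({
    "Weather_Conditions": ["Fine"] * 4 + ["Rain"] * 3,
    "Light_Conditions":   ["Daylight"] * 4 + ["Dark"] * 3,
    "Accident_Severity":  ["Slight", "Slight", "Slight", "Serious",
                           "Serious", "Serious", "Slight"],
})
features = ["Weather_Conditions", "Light_Conditions"]

# Most frequent severity for each feature combination (the lookup table).
lookup = (
    toy.groupby(features)["Accident_Severity"]
       .agg(lambda s: s.value_counts().idxmax())
       .rename("Predicted")
       .reset_index()
)

# Attach the prediction to every row and score it.
scored = toy.merge(lookup, on=features, how="left")
lookup_accuracy = (scored["Predicted"] == scored["Accident_Severity"]).mean()

# Baseline: always predict the overall majority class.
majority = toy["Accident_Severity"].value_counts().idxmax()
baseline_accuracy = (toy["Accident_Severity"] == majority).mean()

print(lookup_accuracy, baseline_accuracy)
```

Both accuracies here are measured on the same rows used to build the table, so a real evaluation would need held-out data. With the notebook's categorical columns, passing `observed=True` to `groupby` avoids empty groups for combinations that never occur.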

<h2>20. Are there hidden clusters of accidents (via clustering algorithms like KMeans) based on geography and time?</h2>

In [51]:
accident.groupby(['Latitude','Longitude']).size().sort_values(ascending=False).head(10)

Latitude   Longitude
52.949719  -0.977611    45
52.458798  -1.871043    35
53.083165  -0.816789    33
52.967634  -1.190861    31
52.938860  -1.216694    29
52.944347  -1.190402    28
52.989857  -1.234393    27
51.496389  -3.143767    27
52.940243  -1.181848    26
52.553866  -1.431210    25
dtype: int64

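<h3>Insight: The query repeats the exact-coordinate count from question 17 rather than running a clustering algorithm, and the dataset has no time-of-day column. It shows the same recurring locations; it does not reveal clusters in time.</h3>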
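The question names KMeans, which is usually run through a library such as scikit-learn. To stay self-contained, the sketch below implements plain Lloyd's algorithm in NumPy on made-up coordinates; the function `kmeans`, its parameters, and the sample points are hypothetical, and it clusters geography only:

```python
import numpy as np

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if the cluster is empty.
        centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return centroids, labels

# Made-up (latitude, longitude) pairs forming two well-separated groups.
coords = np.array([
    [52.95, -0.98], [52.96, -0.97], [52.94, -0.99],
    [51.50, -3.14], [51.49, -3.15], [51.51, -3.13],
])

centroids, labels = kmeans(coords, k=2)
print(labels)
print(centroids)
```

Treating latitude and longitude as flat Euclidean coordinates is a simplification that ignores the Earth's curvature. Adding a time feature such as month would require scaling it to a comparable range first.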

<h2>21. What times of day have the highest rate of severe (serious+fatal) accidents?</h2>

In [56]:
accident['Day'] = pd.to_datetime(accident['Accident Date'], dayfirst=True, errors='coerce').dt.day_name()
accident[accident['Accident_Severity'].isin(['Serious','Fatal'])]['Day'].value_counts()

Day
Saturday     6136
Friday       5689
Sunday       5666
Wednesday    5435
Thursday     5401
Tuesday      5301
Monday       5169
Name: count, dtype: int64

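<h3>Insight: The dataset has no time-of-day column, so this uses day of week. By count, severe accidents peak on Saturday (6,136). As a share of each day's accidents from question 6, the rate is highest on Monday (about 18.1%) and Sunday (about 16.2%), against about 14.2% on Saturday.</h3>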
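The question asks for a rate, which means dividing severe counts by all accidents on the same day. A short sketch that combines the counts printed in question 6 and in this question; the variable names are chosen here for illustration:

```python
import pandas as pd

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# All accidents per day (question 6) and serious-or-fatal accidents per day (this question).
all_counts = pd.Series([28564, 38714, 40037, 39641, 39822, 43164, 35065], index=days)
severe_counts = pd.Series([5169, 5301, 5435, 5401, 5689, 6136, 5666], index=days)

# Severe accidents as a percentage of all accidents on that day, highest first.
severe_rate = (severe_counts / all_counts * 100).sort_values(ascending=False)

print(severe_rate.round(1))
```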

<h2>22. Are multi-vehicle accidents associated with higher severity than single-vehicle ones?</h2>

In [57]:
accident['Multi_Vehicle'] = accident['Number_of_Vehicles'] > 1
accident.groupby('Multi_Vehicle')['Accident_Severity'].value_counts()


Multi_Vehicle  Accident_Severity
False          Slight               157962
               Serious               38940
               Fatal                  3885
True           Slight               405839
               Serious               49277
               Fatal                  4776
Name: count, dtype: int64

<h3>Insight: Multi-vehicle accidents are more frequent (459,892 against 200,787), but single-vehicle accidents are more often severe: about 21.3% are serious or fatal, against about 11.8% of multi-vehicle accidents.</h3>

<h2>23. Do weekends show different severity patterns than weekdays?</h2>

In [39]:
accident['Day'] = pd.to_datetime(accident['Accident Date']).dt.day_name()
accident['Weekend'] = accident['Day'].isin(['Saturday','Sunday'])
accident.groupby('Weekend')['Accident_Severity'].value_counts()

Weekend  Accident_Severity
False    Slight               497374
         Serious               77532
         Fatal                  7544
True     Slight                66427
         Serious               10685
         Fatal                  1117
Name: count, dtype: int64

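<h3>Insight: Weekends show a slightly higher severe share (about 15.1% serious or fatal) than the remaining rows (about 14.6%). The False group also contains the 395,672 rows whose date did not parse, so it is not a pure weekday group.</h3>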
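`isin` returns False for missing day names, so rows without a parsed date fall into the weekday group. A minimal sketch of a weekend flag that stays missing when the date is missing; the sample dates and variable names are hypothetical:

```python
import pandas as pd

# A Saturday, a Monday, and one missing date.
dates = pd.Series(pd.to_datetime(["2019-06-01", "2019-06-03", None]))

# dayofweek is Monday=0 ... Sunday=6; 5 and 6 are the weekend.
# The nullable "boolean" dtype lets the flag be missing where the date is missing.
weekend = dates.dt.dayofweek.isin([5, 6]).astype("boolean").mask(dates.isna())

# Rows with a known flag; the missing date is excluded rather than counted as a weekday.
known = weekend.dropna()

print(weekend)
```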

<h2>24. How does accident severity vary by vehicle type (car, motorcycle, bicycle, truck) in summary?</h2>

In [35]:
accident.groupby('Vehicle_Type')['Accident_Severity'].value_counts(normalize=True).mul(100).round(1)


Vehicle_Type                           Accident_Severity
Agricultural vehicle                   Slight                84.4
                                       Serious               14.5
                                       Fatal                  1.1
Bus or coach (17 or more pass seats)   Slight                85.7
                                       Serious               13.0
                                       Fatal                  1.3
Car                                    Slight                85.3
                                       Serious               13.3
                                       Fatal                  1.3
Data missing or out of range           Slight               100.0
                                       Fatal                  0.0
                                       Serious                0.0
Goods 7.5 tonnes mgw and over          Slight                85.3
                                       Serious               13.4
                   

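<h3>Insight: In the rows shown, the severity split is nearly identical across vehicle types: agricultural vehicles, buses, cars, and goods vehicles of 7.5 tonnes and over are all about 84% to 86% slight and 13% to 15% serious. Pedal cycles (question 3) stand out at about 22.8% serious or fatal; the motorcycle rows are truncated in the output.</h3>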

<h2>25. Do certain road surface conditions combined with specific weather (e.g., wet + rain) amplify severity?</h2>

In [33]:
accident.groupby(['Road_Surface_Conditions','Weather_Conditions'])['Accident_Severity'].value_counts()


Road_Surface_Conditions  Weather_Conditions     Accident_Severity
Dry                      Fine + high winds      Slight                 4162
                                                Serious                 758
                                                Fatal                   103
                         Fine no high winds     Slight               372898
                                                Serious               60473
                                                                      ...  
unaccounted              Snowing + high winds   Serious                   0
                                                Slight                    0
                         Snowing no high winds  Fatal                     0
                                                Serious                   0
                                                Slight                    0
Name: count, Length: 144, dtype: int64

<h3>Insight: Combinations like wet roads + rain and snow/ice + snow show higher average severity.
Dry + fine weather dominates in count, but severity tends to be lower there.</h3>