<h1>Project: United Kingdom Road Accident Data Analysis</h1>
<h3>Inclusive Years: 2019-2022</h3>
<p>Analyst: Jomarie Roperez</p>

<h1><strong>STEP 1:</strong> Import necessary libraries</h1>

In [1]:
import pandas as pd
import numpy as np
from scipy.stats import f_oneway

import matplotlib.pyplot as plt
import seaborn as sns


<h1><strong>STEP 1.1:</strong> Load dataset</h1>

In [2]:
uk_accident_data = pd.read_csv('datasets/accident_data.csv')  # forward slashes work on Windows, macOS and Linux

<h1><strong>STEP 1.2:</strong> Create a copy for EDA (preserve the original dataset)</h1>

In [3]:
eda_data = uk_accident_data.copy()


In [4]:
eda_data

,Index,Accident_Severity,Accident Date,Latitude,Light_Conditions,District Area,Longitude,Number_of_Casualties,Number_of_Vehicles,Road_Surface_Conditions,Road_Type,Urban_or_Rural_Area,Weather_Conditions,Vehicle_Type
0,200701BS64157,Serious,5/6/2019,51.506187,Darkness - lights lit,Kensington and Chelsea,-0.209082,1,2,Dry,Single carriageway,Urban,Fine no high winds,Car
1,200701BS65737,Serious,2/7/2019,51.495029,Daylight,Kensington and Chelsea,-0.173647,1,2,Wet or damp,Single carriageway,Urban,Raining no high winds,Car
2,200701BS66127,Serious,26-08-2019,51.517715,Darkness - lighting unknown,Kensington and Chelsea,-0.210215,1,3,Dry,,Urban,,Taxi/Private hire car
3,200701BS66128,Serious,16-08-2019,51.495478,Daylight,Kensington and Chelsea,-0.202731,1,4,Dry,Single carriageway,Urban,Fine no high winds,Bus or coach (17 or more pass seats)
4,200701BS66837,Slight,3/9/2019,51.488576,Darkness - lights lit,Kensington and Chelsea,-0.192487,1,2,Dry,,Urban,,Other vehicle
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
660674,201091NM01760,Slight,18-02-2022,57.374005,Daylight,Highland,-3.467828,2,1,Dry,Single carriageway,Rural,Fine no high winds,Car
660675,201091NM01881,Slight,21-02-2022,57.232273,Darkness - no lighting,Highland,-3.809281,1,1,Frost or ice,Single carriageway,Rural,Fine no high winds,Car
660676,201091NM01935,Slight,23-02-2022,57.585044,Daylight,Highland,-3.862727,1,3,Frost or ice,Single carriageway,Rural,Fine no high winds,Car
660677,201091NM01964,Serious,23-02-2022,57.214898,Darkness - no lighting,Highland,-3.823997,1,2,Wet or damp,Single carriageway,Rural,Fine no high winds,Motorcycle over 500cc


In [5]:
eda_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 660679 entries, 0 to 660678
Data columns (total 14 columns):
 #   Column                   Non-Null Count   Dtype  
---  ------                   --------------   -----  
 0   Index                    660679 non-null  object 
 1   Accident_Severity        660679 non-null  object 
 2   Accident Date            660679 non-null  object 
 3   Latitude                 660654 non-null  float64
 4   Light_Conditions         660679 non-null  object 
 5   District Area            660679 non-null  object 
 6   Longitude                660653 non-null  float64
 7   Number_of_Casualties     660679 non-null  int64  
 8   Number_of_Vehicles       660679 non-null  int64  
 9   Road_Surface_Conditions  659953 non-null  object 
 10  Road_Type                656159 non-null  object 
 11  Urban_or_Rural_Area      660664 non-null  object 
 12  Weather_Conditions       646551 non-null  object 
 13  Vehicle_Type             660679 non-null  object 
dtypes: f

In [6]:
eda_data.describe().T

,count,mean,std,min,25%,50%,75%,max
Latitude,660654.0,52.553866,1.406922,49.91443,51.49069,52.315641,53.453452,60.757544
Longitude,660653.0,-1.43121,1.38333,-7.516225,-2.332291,-1.411667,-0.232869,1.76201
Number_of_Casualties,660679.0,1.35704,0.824847,1.0,1.0,1.0,1.0,68.0
Number_of_Vehicles,660679.0,1.831255,0.715269,1.0,1.0,2.0,2.0,32.0


<h1><strong>STEP 2:</strong> Check for Missing Values</h1>

In [7]:
eda_data.isnull().sum()

Index                          0
Accident_Severity              0
Accident Date                  0
Latitude                      25
Light_Conditions               0
District Area                  0
Longitude                     26
Number_of_Casualties           0
Number_of_Vehicles             0
Road_Surface_Conditions      726
Road_Type                   4520
Urban_or_Rural_Area           15
Weather_Conditions         14128
Vehicle_Type                   0
dtype: int64

<h1><strong>STEP 3:</strong> Handle Missing Values (still in progress)</h1>

In [10]:
# Label missing surface conditions explicitly instead of guessing a value
eda_data['Road_Surface_Conditions'] = eda_data['Road_Surface_Conditions'].fillna('Unknown road condition')

# Only a handful of rows are missing here (25-26 coordinates, 15 area labels), so use the most frequent value
eda_data['Latitude'] = eda_data['Latitude'].fillna(eda_data['Latitude'].mode()[0])
eda_data['Longitude'] = eda_data['Longitude'].fillna(eda_data['Longitude'].mode()[0])
eda_data['Urban_or_Rural_Area'] = eda_data['Urban_or_Rural_Area'].fillna(eda_data['Urban_or_Rural_Area'].mode()[0])

# Larger gaps (4,520 road types, 14,128 weather records) get a placeholder category
eda_data['Road_Type'] = eda_data['Road_Type'].fillna('Unaccounted')
eda_data['Weather_Conditions'] = eda_data['Weather_Conditions'].fillna('Unaccounted')


In [11]:
eda_data.isnull().sum()

Index                      0
Accident_Severity          0
Accident Date              0
Latitude                   0
Light_Conditions           0
District Area              0
Longitude                  0
Number_of_Casualties       0
Number_of_Vehicles         0
Road_Surface_Conditions    0
Road_Type                  0
Urban_or_Rural_Area        0
Weather_Conditions         0
Vehicle_Type               0
dtype: int64
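Filling `Latitude` and `Longitude` with the mode places every row that lacked coordinates (25 and 26 rows respectively) at one identical point, which could show up as a false hotspot on a map. A minimal sketch, on a hypothetical toy frame, of flagging those rows before imputing them; the `coords_imputed` column is an illustrative name, not part of the original dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with the same kinds of gaps as eda_data
toy = pd.DataFrame({
    'Latitude': [51.50, np.nan, 51.50, 57.37],
    'Longitude': [-0.20, np.nan, -0.20, -3.47],
    'Road_Type': ['Single carriageway', None, 'Roundabout', 'Single carriageway'],
})

# Flag rows with missing coordinates BEFORE imputing, so they can be excluded from maps later
toy['coords_imputed'] = toy[['Latitude', 'Longitude']].isna().any(axis=1)

# Same strategy as In [10]: mode for the coordinates, a placeholder label for the category
for col in ['Latitude', 'Longitude']:
    toy[col] = toy[col].fillna(toy[col].mode()[0])
toy['Road_Type'] = toy['Road_Type'].fillna('Unaccounted')

print(toy)
```

With the same flag on `eda_data`, a filter such as `eda_data[~eda_data['coords_imputed']]` would drop the imputed points from any map.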

<h1><strong>STEP 4:</strong> Exploring Categorical Columns</h1>

In [12]:
# # for object type
# categorical_columns = eda_data.select_dtypes(include=['object']).columns
# categorical_columns


In [13]:
# # unique values for each categorical column
# for col in categorical_columns:
#     print(f"Unique values in '{col}':\n", eda_data[col].unique(), "\n")


In [14]:
# # Count unique values for each categorical column
# for col in categorical_columns:
#     print(f"Value counts for '{col}':\n", eda_data[col].value_counts(), "\n")


In [15]:
eda_data.dtypes

Index                       object
Accident_Severity           object
Accident Date               object
Latitude                   float64
Light_Conditions            object
District Area               object
Longitude                  float64
Number_of_Casualties         int64
Number_of_Vehicles           int64
Road_Surface_Conditions     object
Road_Type                   object
Urban_or_Rural_Area         object
Weather_Conditions          object
Vehicle_Type                object
dtype: object

<h1><strong>STEP 5:</strong> Data Type Conversion</h1>

In [16]:
# eda_data['Accident Date'] = pd.to_datetime(
#     eda_data['Accident Date'], 
#     dayfirst=True, 
#     errors='coerce'
#     )

In [17]:
# eda_data.isnull().sum()

In [18]:
eda_data['Accident Date'].unique()

array(['5/6/2019', '2/7/2019', '26-08-2019', ..., '26-12-2022',
       '25-07-2022', '25-12-2022'], dtype=object)

In [19]:
eda_data['Accident Date'] = eda_data['Accident Date'].astype('str')
eda_data.dtypes

Index                       object
Accident_Severity           object
Accident Date               object
Latitude                   float64
Light_Conditions            object
District Area               object
Longitude                  float64
Number_of_Casualties         int64
Number_of_Vehicles           int64
Road_Surface_Conditions     object
Road_Type                   object
Urban_or_Rural_Area         object
Weather_Conditions          object
Vehicle_Type                object
dtype: object

In [20]:
eda_data['Accident Date'] = eda_data['Accident Date'].str.strip()

In [21]:
eda_data['Accident Date'] = eda_data['Accident Date'].str.replace('/','-')

In [22]:
eda_data['Accident Date'] = pd.to_datetime(
    eda_data['Accident Date'], 
    dayfirst=True, 
    errors='coerce'
    )

In [23]:
eda_data.isnull().sum()

Index                      0
Accident_Severity          0
Accident Date              0
Latitude                   0
Light_Conditions           0
District Area              0
Longitude                  0
Number_of_Casualties       0
Number_of_Vehicles         0
Road_Surface_Conditions    0
Road_Type                  0
Urban_or_Rural_Area        0
Weather_Conditions         0
Vehicle_Type               0
dtype: int64
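The cells above normalise the separator and then let pandas infer the layout via `dayfirst=True`. A minimal sketch of the same step with an explicit `format`, assuming every date is day-month-year once `/` is replaced by `-` (the same assumption `dayfirst=True` makes). With `errors='coerce'`, any string that does not fit the format becomes `NaT`, so the `NaT` count is a direct check that nothing was silently lost. The sample strings are taken from the `unique()` output in In [18]; the padded last value is hypothetical.

```python
import pandas as pd

# Mixed formats as seen in In [18], plus stray whitespace
raw = pd.Series(['5/6/2019', '2/7/2019', '26-08-2019', ' 25-12-2022 '])

normalised = raw.astype(str).str.strip().str.replace('/', '-', regex=False)
parsed = pd.to_datetime(normalised, format='%d-%m-%Y', errors='coerce')

# Rows that did not match the expected format would show up as NaT here
n_unparsed = parsed.isna().sum()
print(parsed)
print('Unparsed rows:', n_unparsed)
```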

<h1><strong>STEP 6:</strong> Categorization of Columns</h1>

In [24]:
# category_columns = [
#     "Index",
#     "Accident_Severity",
#     "Light_Conditions",
#     "Weather_Conditions",
#     "Road_Surface_Conditions",
#     "Road_Type",
#     "Urban_or_Rural_Area",
#     "Vehicle_Type"
# ]
# eda_data[category_columns] = eda_data[category_columns].astype('category')


In [25]:
eda_data.dtypes

Index                              object
Accident_Severity                  object
Accident Date              datetime64[ns]
Latitude                          float64
Light_Conditions                   object
District Area                      object
Longitude                         float64
Number_of_Casualties                int64
Number_of_Vehicles                  int64
Road_Surface_Conditions            object
Road_Type                          object
Urban_or_Rural_Area                object
Weather_Conditions                 object
Vehicle_Type                       object
dtype: object

<h1>Accidents</h1>

In [26]:
eda_data['Year'] = eda_data['Accident Date'].dt.year

In [27]:
eda_data['Year'].value_counts()

Year
2019    182115
2020    170591
2021    163554
2022    144419
Name: count, dtype: int64

In [28]:
eda_data['Month'] = eda_data['Accident Date'].dt.month

In [29]:
eda_data['Month'].value_counts()

Month
11    60424
10    59580
7     57445
6     56481
9     56455
5     56352
3     54086
8     53913
1     52872
12    51836
4     51744
2     49491
Name: count, dtype: int64

In [30]:
eda_data['Day'] = eda_data['Accident Date'].dt.day

In [31]:
eda_data['Day'].value_counts().head(5)

Day
1     22606
12    22536
11    22503
5     22409
10    22328
Name: count, dtype: int64

In [32]:
eda_data.columns


Index(['Index', 'Accident_Severity', 'Accident Date', 'Latitude',
       'Light_Conditions', 'District Area', 'Longitude',
       'Number_of_Casualties', 'Number_of_Vehicles', 'Road_Surface_Conditions',
       'Road_Type', 'Urban_or_Rural_Area', 'Weather_Conditions',
       'Vehicle_Type', 'Year', 'Month', 'Day'],
      dtype='object')

In [33]:
eda_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 660679 entries, 0 to 660678
Data columns (total 17 columns):
 #   Column                   Non-Null Count   Dtype         
---  ------                   --------------   -----         
 0   Index                    660679 non-null  object        
 1   Accident_Severity        660679 non-null  object        
 2   Accident Date            660679 non-null  datetime64[ns]
 3   Latitude                 660679 non-null  float64       
 4   Light_Conditions         660679 non-null  object        
 5   District Area            660679 non-null  object        
 6   Longitude                660679 non-null  float64       
 7   Number_of_Casualties     660679 non-null  int64         
 8   Number_of_Vehicles       660679 non-null  int64         
 9   Road_Surface_Conditions  660679 non-null  object        
 10  Road_Type                660679 non-null  object        
 11  Urban_or_Rural_Area      660679 non-null  object        
 12  Weather_Conditio

In [34]:
# Find object-typed columns (candidates for the category dtype)

eda_data.select_dtypes(include=['object']).columns

Index(['Index', 'Accident_Severity', 'Light_Conditions', 'District Area',
       'Road_Surface_Conditions', 'Road_Type', 'Urban_or_Rural_Area',
       'Weather_Conditions', 'Vehicle_Type'],
      dtype='object')

In [35]:
eda_data['Accident_Severity'].unique()

array(['Serious', 'Slight', 'Fatal'], dtype=object)

In [36]:
cat_cols = ['Index', 'Accident_Severity', 'Light_Conditions', 'District Area',
       'Road_Surface_Conditions', 'Road_Type', 'Urban_or_Rural_Area',
       'Weather_Conditions', 'Vehicle_Type']


In [37]:
# Check unique values to decide whether each column really is categorical
for col in cat_cols:
    print(f"Column: {col}")
    print(eda_data[col].unique())
    print('\n',"=" * 100,'\n')


Column: Index
['200701BS64157' '200701BS65737' '200701BS66127' ... '201091NM01935'
 '201091NM01964' '201091NM02142']


Column: Accident_Severity
['Serious' 'Slight' 'Fatal']


Column: Light_Conditions
['Darkness - lights lit' 'Daylight' 'Darkness - lighting unknown'
 'Darkness - lights unlit' 'Darkness - no lighting']


Column: District Area
['Kensington and Chelsea' 'Westminster' 'Richmond upon Thames'
 'Hammersmith and Fulham' 'Hounslow' 'Tower Hamlets' 'City of London'
 'Southwark' 'Camden' 'Hackney' 'Islington' 'Barnet' 'Brent' 'Haringey'
 'Merton' 'Ealing' 'Enfield' 'Greenwich' 'Newham'
 'London Airport (Heathrow)' 'Hillingdon' 'Waltham Forest' 'Redbridge'
 'Barking and Dagenham' 'Bromley' 'Havering' 'Croydon' 'Lambeth'
 'Wandsworth' 'Sutton' 'Bexley' 'Lewisham' 'Harrow' 'Kingston upon Thames'
 'Barrow-in-Furness' 'South Lakeland' 'Carlisle' 'Eden' 'Allerdale'
 'Copeland' 'Fylde' 'Blackpool' 'Wyre' 'Lancaster' 'Chorley'
 'West Lancashire' 'South Ribble' 'Preston' 'Blackburn with D

In [38]:
eda_data[cat_cols] = eda_data[cat_cols].astype('category')


In [39]:
eda_data.dtypes

Index                            category
Accident_Severity                category
Accident Date              datetime64[ns]
Latitude                          float64
Light_Conditions                 category
District Area                    category
Longitude                         float64
Number_of_Casualties                int64
Number_of_Vehicles                  int64
Road_Surface_Conditions          category
Road_Type                        category
Urban_or_Rural_Area              category
Weather_Conditions               category
Vehicle_Type                     category
Year                                int32
Month                               int32
Day                                 int32
dtype: object
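Converting to `category` stores each distinct label once plus a small integer code per row, so it saves memory when a column has few distinct values. A small sketch on synthetic data, comparing `memory_usage(deep=True)` for a low-cardinality column (like `Accident_Severity`) and a column of unique IDs (like `Index`, assuming its values are mostly distinct). For the unique IDs the category version is expected to be larger, because it keeps every label and adds the codes on top.

```python
import pandas as pd

# Low cardinality: three labels repeated, similar to Accident_Severity
severity = pd.Series(['Slight', 'Serious', 'Fatal'] * 10_000, dtype=object)
sev_object = severity.memory_usage(deep=True)
sev_category = severity.astype('category').memory_usage(deep=True)

# High cardinality: every value distinct, similar to an ID column
ids = pd.Series([f'ID{i:06d}' for i in range(30_000)], dtype=object)
ids_object = ids.memory_usage(deep=True)
ids_category = ids.astype('category').memory_usage(deep=True)

print(f'severity: object={sev_object:,} B, category={sev_category:,} B')
print(f'ids:      object={ids_object:,} B, category={ids_category:,} B')
```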

# Q1: What is the most common road surface condition during accidents?


In [40]:
# most_common_rfc = eda_data['Road_Surface_Conditions'].value_counts()
# most_common_rfc

<h1><b>Insights on Road Surface Conditions and Accidents</b></h1>

<ol>
    <li><b>Most Accidents Occur on Dry Roads: <u>447,821 cases</u></b></li>
    <p>Insights: Dry conditions are the most common, so most driving, and therefore most accidents, happens on dry roads; driver overconfidence may also play a part.</p>
    <li><b>Wet or Damp Roads Rank Second: <u>186,708 cases</u></b></li>
    <p>Insights: Reduced traction on wet surfaces plausibly raises accident likelihood, although raw counts alone cannot confirm a higher risk without knowing how much driving happens in wet conditions.</p>
    <li><b>Snow and Ice Are Less Frequent but Potentially Dangerous: <u>18,517 (Frost/Ice) + 5,890 (Snow)</u></b></li>
    <p>Insights: Fewer vehicles are on the road in these conditions, but slippery surfaces may make the accidents that do occur more severe.</p>
    <li><b>Flooded Roads Have the Fewest Accidents: <u>1,017 cases</u></b></li>
    <p>Insights: Flooding is rare, and drivers tend to avoid flooded roads, which keeps accident counts low.</p>
    <li><b>Unknown Road Conditions: <u>726 cases</u></b></li>
    <p>Insights: These are the rows with no recorded surface condition, filled as "Unknown road condition" in STEP 3; they likely reflect incomplete reporting.</p>
</ol>



# Q2: How does accident severity vary between urban and rural areas?


In [41]:
# as_in_urban_rural = eda_data.groupby('Urban_or_Rural_Area')['Accident_Severity'].value_counts().unstack()

In [42]:
# as_in_urban_rural

<h1>Insights on Accident Severity in Urban vs. Rural Areas</h1>

<ol>
    <li><b>Rural Areas Have More Fatal Accidents: <u>5,601 cases</u></b></li>
    <p>Insights: Rural areas record more fatal accidents despite having fewer accidents overall (about 2.3% of rural accidents are fatal, versus about 0.7% of urban ones). Higher speeds and fewer safety measures may contribute.</p>
    <li><b>Serious Accidents Are More Common in Urban Areas: <u>50,904 cases</u></b></li>
    <p>Insights: Higher traffic density and more interactions with pedestrians likely lead to more serious injuries.</p>
    <li><b>Slight Accidents Are Most Frequent in Urban Areas: <u>367,714 cases</u></b></li>
    <p>Insights: Lower urban speeds may mean that more collisions result in minor injuries rather than fatalities.</p>
    <li><b>Unallocated Data Is Minimal: <u>11 cases</u></b></li>
    <p>Insights: Possibly reporting errors, or accidents in areas not assigned to either category.</p>
</ol>
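Urban areas record more accidents of every kind, so counts alone do not show where an accident is more likely to be fatal. A short sketch turning the counts above into fatal shares per area. The area totals are the column sums of the Road_Surface_Conditions by Urban_or_Rural_Area table in Q9, and the urban fatal count is derived by subtraction, since it is not printed in the notebook.

```python
# Counts quoted in the Q2 insights
rural_fatal = 5_601
urban_serious = 50_904
urban_slight = 367_714

# Area totals: column sums of the Q9 table (Road_Surface_Conditions x Urban_or_Rural_Area)
rural_total = 238_990
urban_total = 421_678

# Every urban accident is Fatal, Serious or Slight, so the fatal count is the remainder
urban_fatal = urban_total - urban_serious - urban_slight

rural_fatal_rate = rural_fatal / rural_total
urban_fatal_rate = urban_fatal / urban_total
print(f'Rural: {rural_fatal_rate:.2%} fatal, Urban: {urban_fatal_rate:.2%} fatal ({urban_fatal:,} cases)')
```

On the full data, `pd.crosstab(eda_data['Urban_or_Rural_Area'], eda_data['Accident_Severity'], normalize='index')` should give these shares directly.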


# Q3: Which weather condition has the highest number of accidents?


In [43]:
# wc_to_as = eda_data['Weather_Conditions'].value_counts()

In [44]:
# wc_to_as

<h2>Insights on Weather Conditions and Number of Accidents</h2>


<ul>
    <li><b>Most Accidents Occur in Clear Weather: <u>520,885 cases</u></b></li>
    <p>Insights: This is the "Fine no high winds" category (fine weather with high winds adds another 8,554 cases). Clear weather is the most common condition, so higher traffic volume, and possibly driver complacency, produce the most accidents.</p>
    <li><b>Rain Is the Second Most Common Condition: <u>89,311 cases</u></b></li>
    <p>Insights: This combines rain with and without high winds. Wet roads reduce traction and visibility, which plausibly makes each journey riskier, although raw counts alone cannot confirm this.</p>
    <li><b>Other or Unaccounted Weather Conditions: <u>31,278 cases</u></b></li>
    <p>Insights: This includes the 14,128 records with no weather recorded, filled as "Unaccounted" in STEP 3; the rest are mixed or unclassified conditions.</p>
    <li><b>Snow and Fog Are Less Common but Still Risky: <u>10,651 cases</u></b></li>
    <p>Insights: Fewer vehicles are on the road during extreme weather, which lowers total accidents, but these conditions make driving more difficult.</p>
</ul>
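Several of the groups above merge raw `Weather_Conditions` labels; for example, "Rain" combines rain with and without high winds. A sketch of that grouping as an explicit mapping, so the totals can be reproduced. The per-label counts are the column sums of the Road_Type by Weather_Conditions table in Q14, and the group names are illustrative.

```python
import pandas as pd

# Column sums of the Road_Type x Weather_Conditions table (Q14)
weather_counts = pd.Series({
    'Fine no high winds': 520_885,
    'Fine + high winds': 8_554,
    'Raining no high winds': 79_696,
    'Raining + high winds': 9_615,
    'Snowing no high winds': 6_238,
    'Snowing + high winds': 885,
    'Fog or mist': 3_528,
    'Other': 17_150,
    'Unaccounted': 14_128,
})

# Illustrative mapping from raw labels to the buckets used in the insights
weather_group = {
    'Fine no high winds': 'Clear',
    'Fine + high winds': 'Fine + high winds',
    'Raining no high winds': 'Rain',
    'Raining + high winds': 'Rain',
    'Snowing no high winds': 'Snow or fog',
    'Snowing + high winds': 'Snow or fog',
    'Fog or mist': 'Snow or fog',
    'Other': 'Other or unaccounted',
    'Unaccounted': 'Other or unaccounted',
}

# A dict passed to groupby maps each index label to its group
grouped = weather_counts.groupby(weather_group).sum().sort_values(ascending=False)
print(grouped)
```

The same dictionary could be applied to the notebook's data, e.g. `eda_data['Weather_Conditions'].value_counts().groupby(weather_group).sum()`.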



# Q4: What is the distribution of accidents per month/year?

In [45]:
# accident_perMonth_everyYear = eda_data.groupby(['Year', 'Month']).size().unstack()

In [46]:
# accident_perMonth_everyYear

In [47]:
# def plot_accident_heatmap(accident_perMonth_everyYear):    
#     month_list = [
#         'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
#         'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'
#     ]
    
#     plt.figure(figsize=(12, 6))  
    
#     sns.heatmap(accident_perMonth_everyYear, cmap="Reds", annot=True, fmt="d", linewidths=0.5, cbar_kws={'label': 'Accident Count'})
    
#     plt.title("Monthly Accident Distribution (2019-2022)", fontsize=14, fontweight="bold")
#     plt.xlabel("Month", fontsize=12) 
#     plt.ylabel("Year", fontsize=12)

#     plt.xticks(ticks=np.arange(12) + 0.5, labels=month_list, rotation=45)  # heatmap cells are centred at 0.5, 1.5, ...
#     plt.yticks(rotation=0)  

#     plt.tight_layout()  
#     plt.show()


# Wrapped in a function because the laptop was lagging

In [48]:
# plot_accident_heatmap(accident_perMonth_everyYear)

<h2>Insights on Monthly Accident Trends in the UK (2019-2022)</h2>

<ul>
    <li><b>Autumn Peak:</b> October (59,580) and November (60,424) consistently report the highest accident counts across all years.</li>
    <p>Insights: Shorter daylight hours and wetter road conditions likely contribute to higher risks.</p>
    <li><b>Lowest Accidents in 2022:</b> 144,419 accidents, about 21% fewer than in 2019 (182,115).</li>
    <p>Insights: Post-pandemic changes in travel behavior and road safety improvements may be factors.</p>
    <li><b>February, April & December Are Lower:</b> Over 2019-2022 combined, February (49,491), April (51,744) and December (51,836) have the fewest accidents. February's low total partly reflects its length (113 days across the four years, versus 120-124 for other months); winter weather reducing travel may also play a role.</li>
</ul>
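Monthly totals partly reflect month length. A sketch dividing the aggregate counts from In [29] by the number of days each month contributes, assuming the data covers the full calendar years 2019-2022. On this per-day basis December has the lowest average and November the highest, while February moves to the middle of the ranking.

```python
import calendar

import pandas as pd

# Aggregate accidents per calendar month across 2019-2022 (output of In [29])
month_counts = pd.Series({
    1: 52_872, 2: 49_491, 3: 54_086, 4: 51_744, 5: 56_352, 6: 56_481,
    7: 57_445, 8: 53_913, 9: 56_455, 10: 59_580, 11: 60_424, 12: 51_836,
})

# Days each month contributes over 2019-2022 (2020 is a leap year, so February has 113)
days_in_month = pd.Series({
    month: sum(calendar.monthrange(year, month)[1] for year in range(2019, 2023))
    for month in range(1, 13)
})

accidents_per_day = (month_counts / days_in_month).round(1)
print(accidents_per_day.sort_values())
```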


# Q5: Do accidents occur more frequently in daylight or darkness?

In [49]:
# daylight_vs_darkness = eda_data['Light_Conditions'].value_counts()

In [50]:
# daylight_vs_darkness

<h2>Impact of Light Conditions on Accidents</h2>

<ul>
    <li><b>Most Accidents Occur in Daylight: <u>484,880 cases</u></b></li>
    <p>Insights: Most traffic moves during the day, so daylight accounts for most accidents despite better visibility.</p>
    <li><b>Accidents in Darkness With Lights Lit: <u>129,335 cases</u></b></li>
    <p>Insights: Street lighting helps but does not eliminate risk; driver fatigue at night may contribute.</p>
    <li><b>Darkness Without Street Lighting: <u>37,437 cases</u></b></li>
    <p>Insights: Poor visibility likely raises risk; these accidents are overwhelmingly rural (35,517 of 37,437, see Q11).</p>
    <li><b>Unknown Lighting Conditions: <u>6,484 cases</u></b></li>
    <p>Insights: Possibly reporting inconsistencies, or accidents in areas with intermittent lighting.</p>
    <li><b>Darkness With Unlit Street Lights: <u>2,543 cases</u></b></li>
    <p>Insights: The count is small, but failed street lights may create unexpected hazards for drivers expecting a lit road.</p>
</ul>



<h1>Correlation-Based Questions</h1>
<ol>
    <li>Is there a relationship between number of casualties and number of vehicles involved?</li>
    <li>Is there a relationship between the number of vehicles involved and the month?</li>

</ol>

# Q6: Is there a relationship between number of casualties and number of vehicles involved?

In [51]:
rs_in_nc_and_nv = np.round(eda_data['Number_of_Casualties'].corr(eda_data['Number_of_Vehicles']), 2)  # Pearson correlation, 2 d.p.

In [52]:
# rs_in_nc_and_nv

<h2>Correlation Between Number of Casualties and Number of Vehicles</h2>

<p><b>Correlation: <u>0.23 (Weak Positive)</u></b></p>
<p>A correlation of 0.23 suggests a weak relationship between the number of vehicles and casualties. This means that while accidents involving more vehicles can result in more casualties, it's not a strong or consistent pattern. Other factors like speed, collision type, and safety measures likely have a greater influence.</p>
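Both count columns are discrete and strongly right-skewed (In [6] shows a median of 1 casualty but a maximum of 68), and Pearson's coefficient is sensitive to such extreme values. A rank-based Spearman coefficient is a common robustness check. The sketch computes it as the Pearson correlation of average ranks and runs on hypothetical toy data, not on the notebook's results; `pearson_and_spearman` is an illustrative helper name.

```python
import pandas as pd


def pearson_and_spearman(x: pd.Series, y: pd.Series) -> tuple[float, float]:
    """Return (Pearson r, Spearman rho); Spearman is Pearson on average ranks."""
    pearson = x.corr(y)  # method='pearson' is the default
    spearman = x.rank().corr(y.rank())  # ties receive their average rank
    return pearson, spearman


# Toy data: monotonic but non-linear, so the rank correlation is perfect while Pearson is not
vehicles = pd.Series([1, 2, 3, 4, 5])
casualties = pd.Series([1, 4, 9, 16, 25])
r, rho = pearson_and_spearman(vehicles, casualties)
print(round(r, 3), round(rho, 3))
```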



# Q7: Is there a relationship between the number of vehicles involved and the month?


In [53]:
month_nov_corr = eda_data['Number_of_Vehicles'].corr(eda_data['Month'])
month_nov_corr


0.0032017632046674733

<ul>
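    <li><b>No Linear Relationship Between Month and Vehicles Involved: <u>0.0032 correlation</u></b></li>
    <p>Insights: The number of vehicles involved shows no linear trend from January to December. Because month is cyclical, a Pearson correlation on month numbers cannot detect seasonal or holiday effects, so it does not rule them out; comparing the mean number of vehicles per month would test that directly.</p>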
</ul>
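A Pearson correlation on month numbers only captures a steady rise or fall from January to December. To test whether the mean number of vehicles differs between months at all, one option is a one-way ANOVA with `scipy.stats.f_oneway`, which STEP 1 already imports. The sketch uses hypothetical toy data. On 660,679 rows even tiny differences tend to come out statistically significant, so the per-month means are worth reading alongside the p-value.

```python
import pandas as pd
from scipy.stats import f_oneway

# Hypothetical toy data: month 2 has clearly more vehicles per accident
toy = pd.DataFrame({
    'Month': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'Number_of_Vehicles': [1, 1, 1, 2, 3, 3, 4, 3, 1, 2, 1, 1],
})

# One array of vehicle counts per month, passed to the ANOVA as separate samples
groups = [g.to_numpy() for _, g in toy.groupby('Month')['Number_of_Vehicles']]
result = f_oneway(*groups)

monthly_means = toy.groupby('Month')['Number_of_Vehicles'].mean()
print(result.statistic, result.pvalue)
print(monthly_means)
```

On the notebook data, the same list comprehension over `eda_data.groupby('Month')['Number_of_Vehicles']` would give twelve groups.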


# Q8: How has the number of accidents changed over the years?


In [54]:
# Accidents per year, split by severity level
severity_over_the_years = eda_data.groupby(['Year', 'Accident_Severity']).size().unstack()

In [55]:
severity_over_the_years

Year,Fatal,Serious,Slight
2019,2714,24322,155079
2020,2341,23121,145129
2021,2057,21997,139500
2022,1549,18777,124093


<p><strong>Overall accidents have declined</strong> from 2019 to 2022 across all severity levels.</p>
    <ul>
        <li><strong>Fatal accidents:</strong> Dropped by <b>~43%</b> (from <b>2,714</b> in 2019 to <b>1,549</b> in 2022).</li>
        <li><strong>Serious accidents:</strong> Decreased by <b>~23%</b> (from <b>24,322</b> in 2019 to <b>18,777</b> in 2022).</li>
        <li><strong>Slight accidents:</strong> Reduced by <b>~20%</b> (from <b>155,079</b> in 2019 to <b>124,093</b> in 2022).</li>
    </ul>
    <p>Insights: <strong>Fatalities are decreasing faster</strong> than serious/slight accidents, possibly due to improved road safety measures, emergency response, or vehicle safety.</p>
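The percentages above can be reproduced from the `severity_over_the_years` table, re-created here from the printed output of In [55]. The sketch also computes fatal accidents as a share of all accidents each year, one way to check that fatalities fell faster than other accidents: the share drops from about 1.49% in 2019 to about 1.07% in 2022.

```python
import pandas as pd

# Values copied from the output of In [55]
severity_over_the_years = pd.DataFrame(
    {'Fatal': [2714, 2341, 2057, 1549],
     'Serious': [24322, 23121, 21997, 18777],
     'Slight': [155079, 145129, 139500, 124093]},
    index=pd.Index([2019, 2020, 2021, 2022], name='Year'),
)

# Percentage change per severity level between the first and last year
pct_change = (severity_over_the_years.loc[2022] / severity_over_the_years.loc[2019] - 1) * 100

# Fatal accidents as a percentage of all accidents in each year
fatal_share = severity_over_the_years['Fatal'] / severity_over_the_years.sum(axis=1) * 100

print(pct_change.round(1))
print(fatal_share.round(2))
```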

# Q9: How do road surface conditions impact severity in urban vs. rural areas?

In [56]:
# Note: .size() counts all accidents in each group; it does not break them down by severity
severity_rsc_urban_rural = eda_data.groupby(['Road_Surface_Conditions', 'Urban_or_Rural_Area'])['Accident_Severity'].size().unstack()

In [57]:
severity_rsc_urban_rural

Road_Surface_Conditions,Rural,Unallocated,Urban
Dry,144861,5,302955
Flood over 3cm. deep,785,1,231
Frost or ice,10953,0,7564
Snow,3102,0,2788
Unknown road condition,284,0,442
Wet or damp,79005,5,107698


<h4>🌍 Urban vs. Rural Accidents by Road Condition</h4>
    <p><b>Dry roads:</b> Most accidents occur in urban areas (302,955), followed by rural areas (144,861).</p>
    <p><b>Wet roads:</b> Urban areas (107,698) still lead, but rural areas also record a high count (79,005).</p>
    <p><b>Ice & snow:</b> Rural areas have more ice-related accidents (10,953 vs. 7,564 urban) and more snow-related ones (3,102 vs. 2,788), even though rural areas have fewer accidents overall.</p>
    <p><b>Insights:</b> Rural roads appear riskier in winter conditions, while urban areas see more accidents overall. Note that these figures are accident counts, not severity levels.</p>
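`.size()` counts rows per group and ignores the values of `Accident_Severity`, so the Q9 table shows accident volumes rather than severity. One way to compare severity directly is `pd.crosstab` with `normalize='index'`, which turns each (surface, area) row into severity proportions that sum to 1. A sketch on a hypothetical toy frame:

```python
import pandas as pd

# Hypothetical toy rows using the notebook's column names
toy = pd.DataFrame({
    'Road_Surface_Conditions': ['Dry', 'Dry', 'Dry', 'Frost or ice', 'Frost or ice', 'Dry'],
    'Urban_or_Rural_Area': ['Urban', 'Urban', 'Rural', 'Rural', 'Rural', 'Rural'],
    'Accident_Severity': ['Slight', 'Serious', 'Fatal', 'Slight', 'Serious', 'Slight'],
})

# Rows: (surface, area) pairs; columns: severity levels; each row sums to 1
severity_share = pd.crosstab(
    [toy['Road_Surface_Conditions'], toy['Urban_or_Rural_Area']],
    toy['Accident_Severity'],
    normalize='index',
)
print(severity_share)
```

Passing the same three columns of `eda_data` should answer Q9 directly.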


# Q10: Accident severity on different road types (single vs. dual carriageway)

In [58]:
severity_road_type = eda_data.groupby(['Road_Type', 'Urban_or_Rural_Area'])['Accident_Severity'].size().unstack()

In [59]:
severity_road_type

Road_Type,Rural,Unallocated,Urban
Dual carriageway,48715,1,50708
One way street,1193,0,12366
Roundabout,15545,1,28446
Single carriageway,168010,9,324124
Slip road,4294,0,2747
Unaccounted,1233,0,3287


<h4>Accidents by Road Type & Location</h4>
<ol>
    <li><b>Single carriageways:</b> Highest accidents in both urban (324,124) & rural (168,010) areas.</li>
    <li><b>Dual carriageways:</b> Urban (50,708) and rural (48,715) accidents are nearly equal.</li>
    <li><b>Roundabouts:</b> More urban accidents (28,446) than rural (15,545).</li>
</ol>

<p><b>Insight:</b> Single carriageways are the most accident-prone in both settings.</p>



# Q11: How light conditions impact accident severity in urban vs. rural areas

In [60]:
severity_lc_urban_rural = eda_data.groupby(['Light_Conditions', 'Urban_or_Rural_Area'])['Accident_Severity'].size().unstack()

In [61]:
severity_lc_urban_rural

Light_Conditions,Rural,Unallocated,Urban
Darkness - lighting unknown,2467,0,4017
Darkness - lights lit,24695,2,104638
Darkness - lights unlit,961,0,1582
Darkness - no lighting,35517,0,1920
Daylight,175350,9,309521


<h4>Accidents by Light Conditions & Location</h4>
<ol>
    <li><b>Daylight:</b> Most accidents occur in both urban (309,521) & rural (175,350) areas.</li>
    <li><b>Darkness - lights lit:</b> Higher in urban areas (104,638) than rural (24,695).</li>
    <li><b>Darkness - no lighting:</b> Rural areas (35,517) have far more accidents than urban (1,920).</li>
</ol>

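<p><b>Insight:</b> Accidents in darkness with no lighting are heavily concentrated in rural areas (35,517 vs. 1,920 urban), where street lighting is less common, so poor lighting appears to be a larger concern on rural roads.</p>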
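Rural and urban areas record very different totals, so comparing shares within each area is fairer than comparing counts. A sketch that re-creates the Q11 table from its printed output (leaving out the small `Unallocated` column) and divides each column by its total: about 15% of rural accidents happened in darkness with no lighting, against under 0.5% of urban ones.

```python
import pandas as pd

# Values copied from the output of In [61], without the Unallocated column
light_area = pd.DataFrame(
    {'Rural': [2467, 24695, 961, 35517, 175350],
     'Urban': [4017, 104638, 1582, 1920, 309521]},
    index=['Darkness - lighting unknown', 'Darkness - lights lit',
           'Darkness - lights unlit', 'Darkness - no lighting', 'Daylight'],
)

# Divide each column by its own total so the two areas are on the same scale
share_within_area = light_area.div(light_area.sum(axis=0), axis=1)
print((share_within_area * 100).round(2))
```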


In [62]:
# Vehicle type involvement in serious vs. fatal accidents (urban vs. rural)

# eda_data.groupby(['Vehicle_Type', 'Urban_or_Rural_Area'])['Accident_Severity'].size().unstack()
# 💡 Reveals if motorcycles, trucks, or cars have higher fatality rates in rural vs. urban areas.

# 6️⃣ How weather affects accident severity for different vehicle types

# eda_data.groupby(['Weather_Conditions', 'Vehicle_Type'])['Accident_Severity'].size().unstack()

# Q12: Vehicle type involvement in serious vs. fatal accidents (urban vs. rural)

In [63]:
severity_v_urban_rural = eda_data.groupby(['Vehicle_Type', 'Urban_or_Rural_Area'])['Accident_Severity'].size().unstack()

In [64]:
severity_v_urban_rural

Vehicle_Type,Rural,Unallocated,Urban
Agricultural vehicle,675,0,1272
Bus or coach (17 or more pass seats),9025,2,16851
Car,181922,8,316062
Data missing or out of range,0,0,6
Goods 7.5 tonnes mgw and over,6156,0,11151
Goods over 3.5t. and under 7.5t,2232,0,3864
Minibus (8 - 16 passenger seats),718,0,1258
Motorcycle 125cc and under,5023,0,10246
Motorcycle 50cc and under,2710,0,4893
Motorcycle over 125cc and up to 500cc,2674,0,4982


<h4>Accidents by Vehicle Type & Location</h4>
<ol>
    <li><b>Cars:</b> Most involved in accidents in both urban (316,062) & rural (181,922) areas.</li>
    <li><b>Motorcycles over 500cc:</b> More accidents in urban (16,700) than rural (8,957) areas.</li>
    <li><b>Buses & coaches:</b> More accidents in urban (16,851) than rural (9,025).</li>
</ol>

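<p><b>Insight:</b> Cars dominate accident counts. Motorcycles and buses are mostly involved in urban accidents, but so is every vehicle type listed, in roughly the same proportion as accidents overall, because urban areas record more accidents in total.</p>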
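Urban areas record about 1.76 accidents for every rural one overall (421,678 vs. 238,990, the column totals of the Q9 table), so a vehicle type is only unusually "urban" if its ratio is well above that. A sketch computing the urban-to-rural ratio for the rows printed by In [64]; that output is truncated, so types such as motorcycles over 500cc are not included. The listed ratios fall between about 1.73 and 2.04.

```python
import pandas as pd

# Rows printed by In [64] (the output is truncated, so some vehicle types are missing)
vehicle_area = pd.DataFrame(
    {'Rural': [675, 9025, 181922, 6156, 2232, 718, 5023, 2710, 2674],
     'Urban': [1272, 16851, 316062, 11151, 3864, 1258, 10246, 4893, 4982]},
    index=['Agricultural vehicle', 'Bus or coach (17 or more pass seats)', 'Car',
           'Goods 7.5 tonnes mgw and over', 'Goods over 3.5t. and under 7.5t',
           'Minibus (8 - 16 passenger seats)', 'Motorcycle 125cc and under',
           'Motorcycle 50cc and under', 'Motorcycle over 125cc and up to 500cc'],
)

# Urban accidents per rural accident, for each vehicle type
urban_to_rural = (vehicle_area['Urban'] / vehicle_area['Rural']).round(2).sort_values()

# Overall urban:rural split, from the column totals of the Q9 table
overall_ratio = 421_678 / 238_990

print(urban_to_rural)
print(f'Overall: {overall_ratio:.2f}')
```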

# Q13: How does accident frequency vary by district/region?

In [79]:
fre_dis_reg = eda_data.groupby('District Area').size().sort_values(ascending=False)

In [85]:
fre_dis_reg.head(5)

District Area
Birmingham    13491
Leeds          8898
Manchester     6720
Bradford       6212
Sheffield      5710
dtype: int64

<h4>Insights on Accident Frequency by District</h4>

<ol>
    <li><b>Birmingham:</b> Highest accident count (13,491), about 1.5 times that of Leeds (8,898), making it the clearest hotspot by raw count.</li>
</ol>

# Q14: Are certain road types more prone to accidents under different weather conditions?

In [103]:
road_weather_conditions = eda_data.groupby(['Road_Type', 'Weather_Conditions']).size().unstack()


In [104]:
road_weather_conditions

Road_Type,Fine + high winds,Fine no high winds,Fog or mist,Other,Raining + high winds,Raining no high winds,Snowing + high winds,Snowing no high winds,Unaccounted
Dual carriageway,1519,76916,682,2264,2033,13044,185,1020,1761
One way street,158,11057,29,309,153,1429,14,78,332
Roundabout,540,34667,222,1109,563,5347,30,290,1224
Single carriageway,6178,389830,2541,13156,6703,58581,642,4751,9761
Slip road,109,5520,33,180,107,865,10,69,148
Unaccounted,50,2895,21,132,56,430,4,30,902


<h4>Insight on Weather Impact on Accidents by Road Type</h4>
<ol>
    <li><b>Clear weather dominates:</b> Most accidents occur in fine weather with no high winds, especially on single carriageways (389,830).</li>
    <li><b>Rain:</b> 58,581 accidents on single carriageways happened in rain without high winds, plus 6,703 in rain with high winds.</li>
    <li><b>Snow-related accidents:</b> Single carriageways have the most snow-related accidents (4,751 while snowing without high winds, 642 while snowing with high winds).</li>
</ol>

<p><b>Insight:</b> Most accidents occur in clear weather; wet and snowy conditions account for smaller but notable counts, concentrated on single carriageways, which also carry the most accidents overall. Raw counts alone cannot show whether these conditions raise risk.</p>
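Raw counts cannot show that rain raises risk, since they are not divided by how much driving happens in each weather condition. What the Q14 table can show is whether rain makes up a larger share of accidents on some road types than on others. A sketch using three rows copied from the output of In [104]: rain, with or without high winds, accounts for about 15% of dual-carriageway accidents and about 13% of single-carriageway ones.

```python
import pandas as pd

columns = ['Fine + high winds', 'Fine no high winds', 'Fog or mist', 'Other',
           'Raining + high winds', 'Raining no high winds',
           'Snowing + high winds', 'Snowing no high winds', 'Unaccounted']

# Three rows of the Road_Type x Weather_Conditions table (output of In [104])
road_weather = pd.DataFrame(
    [[1519, 76916, 682, 2264, 2033, 13044, 185, 1020, 1761],
     [540, 34667, 222, 1109, 563, 5347, 30, 290, 1224],
     [6178, 389830, 2541, 13156, 6703, 58581, 642, 4751, 9761]],
    index=['Dual carriageway', 'Roundabout', 'Single carriageway'],
    columns=columns,
)

# Share of each road type's accidents that happened under each weather condition
row_share = road_weather.div(road_weather.sum(axis=1), axis=0)
rain_share = row_share[['Raining + high winds', 'Raining no high winds']].sum(axis=1)
print((rain_share * 100).round(1))
```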


In [105]:
# road_weather_conditions1 = eda_data.groupby(['Road_Type', 'Weather_Conditions']).size().unstack()
# road_weather_conditions1

# Q15: Which vehicle types are more involved in accidents under different road surface conditions?

In [92]:
vehicle_road_surface = eda_data.groupby(['Vehicle_Type', 'Road_Surface_Conditions']).size().unstack()

In [93]:
vehicle_road_surface

Vehicle_Type,Dry,Flood over 3cm. deep,Frost or ice,Snow,Unknown road condition,Wet or damp
Agricultural vehicle,1303,8,59,24,1,552
Bus or coach (17 or more pass seats),17604,40,646,214,19,7355
Car,337311,777,14108,4483,549,140764
Data missing or out of range,3,0,1,0,0,2
Goods 7.5 tonnes mgw and over,11690,32,432,147,18,4988
Goods over 3.5t. and under 7.5t,4136,6,162,51,2,1739
Minibus (8 - 16 passenger seats),1355,1,40,16,1,563
Motorcycle 125cc and under,10485,14,420,128,30,4192
Motorcycle 50cc and under,5189,13,224,77,11,2089
Motorcycle over 125cc and up to 500cc,5225,9,204,71,8,2139


<h4>Insight on Vehicle Type & Road Surface Conditions</h4>
<ol>
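    <li><b>Cars dominate accidents:</b> Cars are involved in the most accidents, especially on dry (337,311) and wet or damp roads (140,764).</li>
    <li><b>Motorcycles &amp; poor surfaces:</b> Motorcycles record 204 to 689 accidents on frost or ice and 2,089 to 7,233 on wet or damp roads, depending on engine class. As a share of each class's total, however, wet or damp roads account for about 27% of accidents for motorcycles 125cc and under, close to about 28% for cars, so the counts alone do not show greater vulnerability.</li>
    <li><b>Buses &amp; larger vehicles:</b> Buses and heavy goods vehicles also record notable counts on wet or damp roads (7,355 for buses and coaches, 4,988 for goods vehicles of 7.5 tonnes and over).</li>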
</ol>

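<p><b>Insight:</b> While most accidents occur on dry roads, wet and icy conditions still account for a large share: wet or damp roads alone make up more than a quarter of accidents for every vehicle type listed.</p>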
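Raw counts favour cars simply because cars appear in far more accidents than any other vehicle type. A minimal sketch that divides each vehicle row by its own total, using the Car and Motorcycle 125cc and under counts from the table above, makes surface conditions comparable across vehicle types. On the full table, `vehicle_road_surface.div(vehicle_road_surface.sum(axis=1), axis=0)` applies the same step to every row.

```python
import pandas as pd

# Counts copied from two rows of the vehicle type x road surface table above.
surfaces = ["Dry", "Flood over 3cm. deep", "Frost or ice", "Snow",
            "Unknown road condition", "Wet or damp"]
counts = pd.DataFrame(
    [
        [337311, 777, 14108, 4483, 549, 140764],
        [10485, 14, 420, 128, 30, 4192],
    ],
    index=["Car", "Motorcycle 125cc and under"],
    columns=surfaces,
)

# Row percentages: each vehicle type's accidents split across surface conditions.
shares = counts.div(counts.sum(axis=1), axis=0) * 100

print(shares[["Frost or ice", "Wet or damp"]].round(1))
```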

# Q16: What is the relationship between vehicle type and light conditions during accidents?

In [106]:
vehicle_light = eda_data.groupby(['Vehicle_Type', 'Light_Conditions']).size().unstack()

In [107]:
vehicle_light

Vehicle_Type,Darkness - lighting unknown,Darkness - lights lit,Darkness - lights unlit,Darkness - no lighting,Daylight
Agricultural vehicle,19,365,7,113,1443
Bus or coach (17 or more pass seats),267,5142,112,1427,18930
Car,4914,96994,1933,28385,365766
Data missing or out of range,0,1,0,0,5
Goods 7.5 tonnes mgw and over,185,3440,67,963,12652
Goods over 3.5t. and under 7.5t,59,1192,16,367,4462
Minibus (8 - 16 passenger seats),20,347,4,121,1484
Motorcycle 125cc and under,136,3074,50,794,11215
Motorcycle 50cc and under,68,1494,26,424,5591
Motorcycle over 125cc and up to 500cc,86,1480,25,447,5618
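One way to read this table for a relationship is to collapse the four darkness categories into a single share of each vehicle type's accidents. The sketch below does that for three rows of the table above, using the counts as printed; on the notebook's data, the same column filter works on `vehicle_light` directly.

```python
import pandas as pd

# Counts copied from three rows of the vehicle type x light conditions table above.
light_cols = ["Darkness - lighting unknown", "Darkness - lights lit",
              "Darkness - lights unlit", "Darkness - no lighting", "Daylight"]
counts = pd.DataFrame(
    [
        [4914, 96994, 1933, 28385, 365766],
        [267, 5142, 112, 1427, 18930],
        [136, 3074, 50, 794, 11215],
    ],
    index=["Car", "Bus or coach (17 or more pass seats)", "Motorcycle 125cc and under"],
    columns=light_cols,
)

# Sum the four "Darkness - ..." columns and express them as a percentage of each row total.
dark_cols = [col for col in counts.columns if col.startswith("Darkness")]
dark_share = counts[dark_cols].sum(axis=1) / counts.sum(axis=1) * 100

print(dark_share.round(1))
```

For these three rows the darkness share falls between about 26.5% and 27%, so the counts alone do not point to a strong link between vehicle type and light conditions.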


In [115]:
test_eda_data = eda_data.copy()

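severity_district_road = test_eda_data.groupby(['District Area', 'Road_Type'])['Accident_Severity'].size().unstack(fill_value=0)  # Accident counts per district and road type; missing combinations become 0 instead of NaN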
severity_district_road['Total'] = severity_district_road.sum(axis=1)  # Create a total count column
severity_district_road = severity_district_road.sort_values(by='Total', ascending=False)  # Sort by total accidents


In [117]:
severity_district_road.head(10)

District Area,Dual carriageway,One way street,Roundabout,Single carriageway,Slip road,Unaccounted,Total
Birmingham,3039,321,1008,9018,98,7,13491
Leeds,1900,207,432,6152,161,46,8898
Manchester,1181,170,189,4908,59,213,6720
Bradford,707,53,231,5155,36,30,6212
Sheffield,1035,137,387,4060,41,50,5710
Westminster,830,367,141,4351,6,11,5706
Liverpool,1801,154,173,3403,34,22,5587
Glasgow City,1585,366,133,2700,88,70,4942
"Bristol, City of",441,177,353,3797,40,11,4819
Kirklees,574,82,127,3855,37,15,4690


<h4>Insight on Accidents by District & Road Type</h4>
<ol>
    <li><b>Birmingham leads with the most accidents (13,491):</b> Single carriageways account for the majority (9,018).</li>
    <li><b>Leeds (8,898) and Manchester (6,720) follow:</b> Single carriageways again account for most of their accidents.</li>
    <li><b>Westminster stands out on one-way streets:</b> It records the most one-way-street accidents among the top ten districts (367), narrowly ahead of Glasgow City (366).</li>
</ol>

<p><b>Insight:</b> Single carriageways have the most accidents in every one of the top ten districts, with Birmingham recording the highest count.</p>
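The `Total` column ranks districts by volume, but the road-type mix is easier to compare as a share. A minimal sketch using three rows from the top-ten table computes each district's single-carriageway share and sorts it, mirroring the `sort_values` step in the cell above.

```python
import pandas as pd

# Counts copied from three rows of the district x road type table above.
road_cols = ["Dual carriageway", "One way street", "Roundabout",
             "Single carriageway", "Slip road", "Unaccounted"]
district_road = pd.DataFrame(
    [
        [3039, 321, 1008, 9018, 98, 7],
        [830, 367, 141, 4351, 6, 11],
        [1585, 366, 133, 2700, 88, 70],
    ],
    index=["Birmingham", "Westminster", "Glasgow City"],
    columns=road_cols,
)

# Row totals match the Total column; the share shows how much each district relies on single carriageways.
totals = district_road.sum(axis=1)
single_share = (district_road["Single carriageway"] / totals * 100).sort_values(ascending=False)

print(single_share.round(1))
```

Among these three, Westminster has the highest single-carriageway share (about 76%) even though Birmingham has the largest count, while Glasgow City's is lowest (about 55%) because dual carriageways make up a larger part of its total.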


# Q17: Are accidents more severe on weekends or weekdays?


In [124]:
eda_data.columns

Index(['Index', 'Accident_Severity', 'Accident Date', 'Latitude',
       'Light_Conditions', 'District Area', 'Longitude',
       'Number_of_Casualties', 'Number_of_Vehicles', 'Road_Surface_Conditions',
       'Road_Type', 'Urban_or_Rural_Area', 'Weather_Conditions',
       'Vehicle_Type', 'Year', 'Month', 'Day'],
      dtype='object')

In [130]:
# .dt requires 'Accident Date' to be a datetime column; pandas numbers days Monday=0 ... Sunday=6
eda_data['dayofweek'] = eda_data['Accident Date'].dt.dayofweek
eda_data.groupby(['dayofweek', 'Accident_Severity']).size().unstack()


dayofweek,Fatal,Serious,Slight
0,1385,11664,59631
1,1105,11918,81527
2,1113,12488,85957
3,1097,12440,85974
4,1113,12633,84154
5,1326,14000,91852
6,1522,13074,74706


<h4>Insight on Accidents by Day of the Week</h4>
<ol>
    <li><b>Fatal accidents:</b> Sunday (day 6) records the most fatal accidents (1,522), followed by Monday (1,385) and Saturday (1,326).</li>
    <li><b>Saturday:</b> Saturday (day 5) has the most serious (14,000) and slight (91,852) accidents, and the highest daily total (107,178).</li>
    <li><b>Weekdays:</b> From Tuesday to Friday, slight accidents range from 81,527 to 85,974, peaking on Thursday; Monday has the fewest slight accidents of any day (59,631).</li>
</ol>

<p><b>Insight:</b> Fatal accidents peak on Sunday, and Saturday records the most accidents of any day. Weekdays record more accidents in total (464,199 vs. 196,480 on weekends), largely because they cover five days rather than two.</p>
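Weekdays outnumber weekend days five to two, so comparing raw totals favours weekdays. The sketch below rebuilds the day-of-week table from the counts above, names pandas' day numbers (Monday=0 to Sunday=6), and compares per-day averages and the pooled fatal share for each group. The `day_type` grouping key is a hypothetical helper; on the notebook's data, an equivalent key is `eda_data['dayofweek'] >= 5`, which is `True` for Saturday and Sunday.

```python
import pandas as pd

# Counts copied from the day-of-week table above; pandas numbers days Monday=0 ... Sunday=6.
by_day = pd.DataFrame(
    {
        "Fatal": [1385, 1105, 1113, 1097, 1113, 1326, 1522],
        "Serious": [11664, 11918, 12488, 12440, 12633, 14000, 13074],
        "Slight": [59631, 81527, 85957, 85974, 84154, 91852, 74706],
    },
    index=["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"],
)

# Hypothetical grouping key: label each day as a weekday or a weekend day.
day_type = pd.Series(
    ["Weekend" if day in ("Saturday", "Sunday") else "Weekday" for day in by_day.index],
    index=by_day.index,
    name="day_type",
)

# Average accidents per day, so five weekdays are not compared directly with two weekend days.
avg_per_day = by_day.groupby(day_type).mean()

# Pool each group's counts and compute the percentage of accidents that were fatal.
group_totals = by_day.groupby(day_type).sum()
fatal_share = group_totals["Fatal"] / group_totals.sum(axis=1) * 100

print(avg_per_day.round(1))
print(fatal_share.round(2))
```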


# Q18: