<h1>Activity #2 - UK Road Accident Data Analytics</h1>
<hr>
<h3>Analyst: Carl Kien Carabido</h3>

<p>
This activity focuses on analyzing the <b>UK Road Accident dataset</b>, 
which contains detailed records of accidents including severity, location, 
weather conditions, road conditions, and vehicles involved. 
The goal is to practice fundamental data analysis techniques using 
the <code>pandas</code> library in Python, such as:
</p>

<ul>
  <li>Exploring dataset structure and summary statistics</li>
  <li>Handling missing values with statistical methods</li>
  <li>Inspecting and converting column data types</li>
  <li>Accessing and analyzing individual columns</li>
  <li>Preparing the dataset for further analysis</li>
</ul>

<p>
Through this exercise, I aim to reinforce my skills in 
<b>data cleaning, preparation, and basic exploration</b>, 
which are key steps in any real-world data analytics workflow.
</p>


<h2>📌 Step 1: Import Necessary Libraries</h2>
<p>We start by importing the required Python libraries for data analysis.</p>

In [1]:
import numpy as np
import pandas as pd
import warnings
# Note: this silences every warning, including pandas deprecation warnings.
warnings.filterwarnings('ignore')

<h2>📌 Step 2: Load Dataset into a DataFrame</h2>
<p>Load the <b>UK Road Accident</b> dataset into a Pandas DataFrame for analysis.</p>

In [2]:
# Forward slashes work on Windows, macOS, and Linux alike.
uk_accidents = pd.read_csv('datasets/uk_road_accident.csv')
uk_accidents

Unnamed: 0,Index,Accident_Severity,Accident Date,Latitude,Light_Conditions,District Area,Longitude,Number_of_Casualties,Number_of_Vehicles,Road_Surface_Conditions,Road_Type,Urban_or_Rural_Area,Weather_Conditions,Vehicle_Type
0,200701BS64157,Serious,5/6/2019,51.506187,Darkness - lights lit,Kensington and Chelsea,-0.209082,1,2,Dry,Single carriageway,Urban,Fine no high winds,Car
1,200701BS65737,Serious,2/7/2019,51.495029,Daylight,Kensington and Chelsea,-0.173647,1,2,Wet or damp,Single carriageway,Urban,Raining no high winds,Car
2,200701BS66127,Serious,26-08-2019,51.517715,Darkness - lighting unknown,Kensington and Chelsea,-0.210215,1,3,Dry,,Urban,,Taxi/Private hire car
3,200701BS66128,Serious,16-08-2019,51.495478,Daylight,Kensington and Chelsea,-0.202731,1,4,Dry,Single carriageway,Urban,Fine no high winds,Bus or coach (17 or more pass seats)
4,200701BS66837,Slight,3/9/2019,51.488576,Darkness - lights lit,Kensington and Chelsea,-0.192487,1,2,Dry,,Urban,,Other vehicle
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
660674,201091NM01760,Slight,18-02-2022,57.374005,Daylight,Highland,-3.467828,2,1,Dry,Single carriageway,Rural,Fine no high winds,Car
660675,201091NM01881,Slight,21-02-2022,57.232273,Darkness - no lighting,Highland,-3.809281,1,1,Frost or ice,Single carriageway,Rural,Fine no high winds,Car
660676,201091NM01935,Slight,23-02-2022,57.585044,Daylight,Highland,-3.862727,1,3,Frost or ice,Single carriageway,Rural,Fine no high winds,Car
660677,201091NM01964,Serious,23-02-2022,57.214898,Darkness - no lighting,Highland,-3.823997,1,2,Wet or damp,Single carriageway,Rural,Fine no high winds,Motorcycle over 500cc


<h2>📌 Step 3: Check DataFrame Information</h2>
<p>Get a quick overview of the dataset:</p>
<ul>
  <li>Number of rows and columns</li>
  <li>Column names</li>
  <li>Data types</li>
  <li>Memory usage</li>
</ul>

In [3]:
uk_accidents.info()
uk_accidents.shape
uk_accidents.columns

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 660679 entries, 0 to 660678
Data columns (total 14 columns):
 #   Column                   Non-Null Count   Dtype  
---  ------                   --------------   -----  
 0   Index                    660679 non-null  object 
 1   Accident_Severity        660679 non-null  object 
 2   Accident Date            660679 non-null  object 
 3   Latitude                 660654 non-null  float64
 4   Light_Conditions         660679 non-null  object 
 5   District Area            660679 non-null  object 
 6   Longitude                660653 non-null  float64
 7   Number_of_Casualties     660679 non-null  int64  
 8   Number_of_Vehicles       660679 non-null  int64  
 9   Road_Surface_Conditions  659953 non-null  object 
 10  Road_Type                656159 non-null  object 
 11  Urban_or_Rural_Area      660664 non-null  object 
 12  Weather_Conditions       646551 non-null  object 
 13  Vehicle_Type             660679 non-null  object 
dtypes: f

Index(['Index', 'Accident_Severity', 'Accident Date', 'Latitude',
       'Light_Conditions', 'District Area', 'Longitude',
       'Number_of_Casualties', 'Number_of_Vehicles', 'Road_Surface_Conditions',
       'Road_Type', 'Urban_or_Rural_Area', 'Weather_Conditions',
       'Vehicle_Type'],
      dtype='object')
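
<p>In a notebook cell only the last expression is displayed automatically, which is why the shape does not appear in the output above. A minimal sketch that prints each piece explicitly, using a small stand-in frame (the name <code>df</code> and its values are hypothetical, not taken from the dataset):</p>

```python
import pandas as pd

# Toy stand-in for uk_accidents (hypothetical values; the real frame has 660,679 rows).
df = pd.DataFrame({
    "Accident_Severity": ["Slight", "Serious", "Slight"],
    "Number_of_Casualties": [1, 2, 1],
    "Latitude": [51.5, None, 57.2],
})

# Print each summary explicitly so none of them is swallowed by the cell.
n_rows, n_cols = df.shape
print(f"rows={n_rows}, columns={n_cols}")
print(list(df.columns))
print(df.dtypes)
```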

<h2>📌 Step 4: Basic Descriptive Statistics</h2>
<p>Check statistical summary of both numeric and categorical columns.</p>

In [4]:
uk_accidents.describe(include="all")

Unnamed: 0,Index,Accident_Severity,Accident Date,Latitude,Light_Conditions,District Area,Longitude,Number_of_Casualties,Number_of_Vehicles,Road_Surface_Conditions,Road_Type,Urban_or_Rural_Area,Weather_Conditions,Vehicle_Type
count,660679.0,660679,660679,660654.0,660679,660679,660653.0,660679.0,660679.0,659953,656159,660664,646551,660679
unique,421020.0,3,1461,,5,422,,,,5,5,3,8,16
top,2010000000000.0,Slight,30-11-2019,,Daylight,Birmingham,,,,Dry,Single carriageway,Urban,Fine no high winds,Car
freq,239478.0,563801,704,,484880,13491,,,,447821,492143,421663,520885,497992
mean,,,,52.553866,,,-1.43121,1.35704,1.831255,,,,,
std,,,,1.406922,,,1.38333,0.824847,0.715269,,,,,
min,,,,49.91443,,,-7.516225,1.0,1.0,,,,,
25%,,,,51.49069,,,-2.332291,1.0,1.0,,,,,
50%,,,,52.315641,,,-1.411667,1.0,2.0,,,,,
75%,,,,53.453452,,,-0.232869,1.0,2.0,,,,,


<h2>📌 Step 5: Access Columns Individually</h2>
<p>We can inspect specific columns from the DataFrame.</p>

In [5]:
uk_accidents["Accident_Severity"]
uk_accidents["Number_of_Casualties"]

0         1
1         1
2         1
3         1
4         1
         ..
660674    2
660675    1
660676    1
660677    1
660678    1
Name: Number_of_Casualties, Length: 660679, dtype: int64

<h2>📌 Step 6: Check for Missing Values</h2>
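<p>Identify which columns have null values and how many are missing. The cell below was executed after the fill step (<code>In [31]</code> follows <code>In [30]</code>), so it reports zero everywhere. The counts before imputation can be derived from the non-null counts in Step 3, for example 14,128 missing values in <code>Weather_Conditions</code> and 4,520 in <code>Road_Type</code>.</p>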

In [31]:
uk_accidents.isnull().sum()

Index                      0
Accident_Severity          0
Accident Date              0
Latitude                   0
Light_Conditions           0
District Area              0
Longitude                  0
Number_of_Casualties       0
Number_of_Vehicles         0
Road_Surface_Conditions    0
Road_Type                  0
Urban_or_Rural_Area        0
Weather_Conditions         0
Vehicle_Type               0
dtype: int64
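
<p>The missing-value counts before imputation can be recovered from the non-null counts printed by <code>info()</code> in Step 3. A minimal sketch using those printed figures (the variable names are illustrative only):</p>

```python
import pandas as pd

# Figures copied from the info() output in Step 3.
total_rows = 660_679
non_null = pd.Series({
    "Latitude": 660_654,
    "Longitude": 660_653,
    "Road_Surface_Conditions": 659_953,
    "Road_Type": 656_159,
    "Urban_or_Rural_Area": 660_664,
    "Weather_Conditions": 646_551,
})

# Missing count and percentage per column, largest gap first.
missing = total_rows - non_null
summary = pd.DataFrame({
    "missing": missing,
    "percent": (missing / total_rows * 100).round(3),
}).sort_values("missing", ascending=False)
print(summary)
```

<p>On the live DataFrame, running <code>uk_accidents.isnull().sum()</code> before the fill step should give the same counts.</p>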

<h2>📌 Step 7: Handle Missing Values</h2>
<p>Fill missing numeric values with the column mean and missing categorical values with the column mode.</p>

In [30]:
# Assign the filled Series back instead of calling fillna(inplace=True) on a column
# selection, which may operate on a temporary copy in recent pandas versions.
numeric_cols = ["Latitude", "Longitude"]
mode_cols = ["Road_Surface_Conditions", "Road_Type", "Urban_or_Rural_Area",
             "Weather_Conditions", "Accident Date"]
for col in numeric_cols:
    uk_accidents[col] = uk_accidents[col].fillna(uk_accidents[col].mean())
for col in mode_cols:
    uk_accidents[col] = uk_accidents[col].fillna(uk_accidents[col].mode()[0])
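
<p>The mean and mode imputation can be checked on a small frame before it is applied to the full dataset. A minimal sketch, assuming a toy frame (<code>toy</code> and its values are hypothetical) and assigning the filled column back explicitly:</p>

```python
import pandas as pd

# Hypothetical toy frame with one missing numeric and one missing categorical value.
toy = pd.DataFrame({
    "Latitude": [51.5, None, 52.5],
    "Road_Type": ["Single carriageway", None, "Single carriageway"],
})

# Numeric column: fill with the mean of the observed values.
toy["Latitude"] = toy["Latitude"].fillna(toy["Latitude"].mean())

# Categorical column: fill with the most frequent observed value.
toy["Road_Type"] = toy["Road_Type"].fillna(toy["Road_Type"].mode()[0])

print(toy)
print(toy.isnull().sum())
```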

<h2>📌 Step 8: Check & Adjust Data Types</h2>
<p>Inspect column data types and adjust if necessary.</p>

In [39]:
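# Convert "Accident Date" column to datetime.
# Caution: the raw column mixes "d/m/Y" and "d-m-Y" strings (see Step 2). When pandas infers
# a single format, rows in the other format can fail to parse and errors="coerce" turns them into NaT.
uk_accidents["Accident Date"] = pd.to_datetime(uk_accidents["Accident Date"], dayfirst=True, errors="coerce")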
categorical_cols = [
    "Accident_Severity",
    "Light_Conditions",
    "District Area",
    "Road_Surface_Conditions",
    "Road_Type",
    "Urban_or_Rural_Area",
    "Weather_Conditions",
    "Vehicle_Type"
]

for col in categorical_cols:
    uk_accidents[col] = uk_accidents[col].astype("category")

In [40]:
uk_accidents.dtypes

Index                              object
Accident_Severity                category
Accident Date              datetime64[ns]
Latitude                          float64
Light_Conditions                 category
District Area                    category
Longitude                         float64
Number_of_Casualties                int64
Number_of_Vehicles                  int64
Road_Surface_Conditions          category
Road_Type                        category
Urban_or_Rural_Area              category
Weather_Conditions               category
Vehicle_Type                     category
Year                                int32
Month                               int32
Day                                 int32
Day_of_week                         int32
dtype: object
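
<p>The sample rows in Step 2 show dates written both as <code>5/6/2019</code> and as <code>26-08-2019</code>. A minimal sketch of one way to parse such a column, assuming the separator is the only difference between the two styles (an assumption based on those sample rows, not verified against the full dataset):</p>

```python
import pandas as pd

# Date strings copied from the first five rows shown in Step 2.
raw = pd.Series(["5/6/2019", "2/7/2019", "26-08-2019", "16-08-2019", "3/9/2019"])

# Normalize the separator, then parse with one explicit day-first format.
normalized = raw.str.replace("-", "/", regex=False)
parsed = pd.to_datetime(normalized, format="%d/%m/%Y", errors="coerce")

print(parsed)
print("unparsed rows:", parsed.isna().sum())
```

<p>Checking <code>parsed.isna().sum()</code> right after conversion shows how many rows failed to parse before any value is filled in.</p>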

<h2>📌 Step 9: Extract Date Information Using Pandas Date/Time</h2>

In [41]:
uk_accidents['Year'] = uk_accidents['Accident Date'].dt.year
uk_accidents['Month'] = uk_accidents['Accident Date'].dt.month
uk_accidents['Day'] = uk_accidents['Accident Date'].dt.day
uk_accidents['Day_of_week'] = uk_accidents['Accident Date'].dt.dayofweek

In [42]:
uk_accidents.isnull().sum()

Index                      0
Accident_Severity          0
Accident Date              0
Latitude                   0
Light_Conditions           0
District Area              0
Longitude                  0
Number_of_Casualties       0
Number_of_Vehicles         0
Road_Surface_Conditions    0
Road_Type                  0
Urban_or_Rural_Area        0
Weather_Conditions         0
Vehicle_Type               0
Year                       0
Month                      0
Day                        0
Day_of_week                0
dtype: int64

In [46]:
uk_accidents.groupby('Year')['Number_of_Casualties'].mean()

Year
2019    1.358092
2020    1.349002
2021    1.358972
2022    1.349934
Name: Number_of_Casualties, dtype: float64


<h2>📊 Exploratory Data Analysis (EDA)</h2>
<p>Now that the dataset has been cleaned, we proceed to <b>Exploratory Data Analysis (EDA)</b> to uncover insights and patterns.</p>
<hr>

<h2>1. How many total accidents are in the dataset?</h2>

In [10]:
len(uk_accidents)

660679

<h3>Result:</h3>
<p>660,679 recorded accidents.</p>

<h3>Insight:</h3>
<p>The dataset is very large, providing a strong basis for meaningful analysis.</p>
<hr>

<h2>2. What are the different categories of accident severity, and how many accidents fall into each?</h2>

In [11]:
uk_accidents["Accident_Severity"].value_counts()

Accident_Severity
Slight     563801
Serious     88217
Fatal        8661
Name: count, dtype: int64

<h3>Result:</h3>
<p>
Slight     563,801<br>
Serious     88,217<br>
Fatal        8,661
</p>

<h3>Insight:</h3>
<p>Most accidents are <b>slight</b>, while fatal accidents are rare.</p>
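
<p>The counts can also be expressed as shares of all accidents. A minimal sketch using the three counts from the output above (the variable names are illustrative):</p>

```python
import pandas as pd

# Counts copied from the value_counts() output above.
severity_counts = pd.Series({"Slight": 563_801, "Serious": 88_217, "Fatal": 8_661})

# Convert to percentages of all recorded accidents.
severity_share = (severity_counts / severity_counts.sum() * 100).round(2)
print(severity_share)
```

<p>On the DataFrame itself, <code>value_counts(normalize=True)</code> returns the same proportions directly.</p>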
<hr>

<h2>3. What is the most common weather condition during accidents?</h2>

In [12]:
uk_accidents["Weather_Conditions"].value_counts()

Weather_Conditions
Fine no high winds       535013
Raining no high winds     79696
Other                     17150
Raining + high winds       9615
Fine + high winds          8554
Snowing no high winds      6238
Fog or mist                3528
Snowing + high winds        885
Name: count, dtype: int64

<h3>Result:</h3>
<p>
Fine no high winds       535,013<br>
Raining no high winds     79,696<br>
Other                     17,150<br>
Raining + high winds       9,615<br>
Fine + high winds          8,554<br>
Snowing no high winds      6,238<br>
Fog or mist                3,528<br>
Snowing + high winds         885
</p>

<h3>Insight:</h3>
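<p>Most recorded accidents happened in <b>fine weather</b>. This count includes the 14,128 missing values that were filled with this category (520,885 before the fill). Fine weather is presumably also the most common driving condition, so the counts alone do not show whether driver behavior matters more than poor weather.</p>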
<hr>


<h2>4. Which road surface condition is most associated with accidents?</h2>

In [13]:
uk_accidents["Road_Surface_Conditions"].value_counts()

Road_Surface_Conditions
Dry                     448547
Wet or damp             186708
Frost or ice             18517
Snow                      5890
Flood over 3cm. deep      1017
Name: count, dtype: int64

<h3>Result:</h3>
<p>
Dry                     448,547<br>
Wet or damp             186,708<br>
Frost or ice             18,517<br>
Snow                      5,890<br>
Flood over 3cm. deep      1,017
</p>

<h3>Insight:</h3>
<p>Most accidents occurred on <b>dry roads</b> (about 68% of records). This is consistent with human error playing a large role, although the counts alone cannot separate that from how often roads are dry in the first place.</p>
<hr>

<h2>5. What proportion of accidents occur in urban vs rural areas?</h2>

In [14]:
uk_accidents["Urban_or_Rural_Area"].value_counts(normalize=True) * 100

Urban_or_Rural_Area
Urban          63.824944
Rural          36.173391
Unallocated     0.001665
Name: proportion, dtype: float64

<h3>Result:</h3>
<p>
Urban          63.82%<br>
Rural          36.17%<br>
Unallocated     0.002%
</p>

<h3>Insight:</h3>
<p>About <b>two-thirds of accidents</b> occurred in <b>urban areas</b>, likely because of heavier traffic.</p>
<hr>

<h2>6. Which road type has the highest number of accidents?</h2>

In [15]:
uk_accidents["Road_Type"].value_counts()

Road_Type
Single carriageway    496663
Dual carriageway       99424
Roundabout             43992
One way street         13559
Slip road               7041
Name: count, dtype: int64

<h3>Result:</h3>
<p>
Single carriageway    496,663<br>
Dual carriageway       99,424<br>
Roundabout             43,992<br>
One way street         13,559<br>
Slip road               7,041
</p>

<h3>Insight:</h3>
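<p>Most accidents occurred on <b>single carriageways</b> (about 75% of records). Without traffic-volume data per road type, this shows where accidents are recorded most often, not necessarily which road type is riskiest per journey.</p>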
<hr>

<h2>7. What is the average number of vehicles involved in accidents?</h2>

In [16]:
uk_accidents["Number_of_Vehicles"].mean()

np.float64(1.8312554205597575)

<h3>Result:</h3>
<p>Average number of vehicles involved: 1.83</p>

<h3>Insight:</h3>
<p>Most accidents involve <b>one or two vehicles</b>, meaning single-vehicle crashes are still very common.</p>
<hr>

<h2>8. What is the average number of casualties per accident?</h2>

In [17]:
uk_accidents["Number_of_Casualties"].mean()


np.float64(1.357040257068864)

<h3>Result:</h3>
<p>Average number of casualties per accident: 1.36</p>

<h3>Insight:</h3>
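<p>The mean is about <b>1.36 casualties</b> per accident. The median and 75th percentile in Step 4 are both 1, so at least three-quarters of accidents involve a single casualty, and the mean is pulled up by a minority of multi-casualty events.</p>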
<hr>

<h2>9. What is the maximum number of casualties recorded in a single accident?</h2>

In [18]:
uk_accidents["Number_of_Casualties"].max()

np.int64(68)

<h3>Result:</h3>
<p>Maximum number of casualties in a single accident: 68</p>

<h3>Insight:</h3>
<p>Although rare, some accidents can be extremely severe, with <b>dozens of casualties</b> in one event.</p>
<hr>

<h2>10. Which weather condition has the highest average number of casualties?</h2>

In [54]:
uk_accidents.groupby("Weather_Conditions")["Number_of_Casualties"].mean()

Weather_Conditions
Fine + high winds        1.386018
Fine no high winds       1.347397
Fog or mist              1.452948
Other                    1.354869
Raining + high winds     1.416641
Raining no high winds    1.408214
Snowing + high winds     1.418079
Snowing no high winds    1.341776
Name: Number_of_Casualties, dtype: float64

<h3>Result:</h3>
<p>
Fine + high winds        1.39<br>
Fine no high winds       1.35<br>
Fog or mist              1.45<br>
Other                    1.35<br>
Raining + high winds     1.42<br>
Raining no high winds    1.41<br>
Snowing + high winds     1.42<br>
Snowing no high winds    1.34
</p>

<h3>Insight:</h3>
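<p>The <b>highest average casualties per accident</b> occur in <b>fog or mist</b> (1.45), which is consistent with low visibility being more dangerous. The gap to the lowest category (1.34) is small, and fog or mist accounts for only 3,528 records, so the difference should be read with caution.</p>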
<hr>

<h2>11. Which road surface condition has the highest accident severity?</h2>

In [20]:
uk_accidents.groupby("Road_Surface_Conditions")["Accident_Severity"].value_counts()

Road_Surface_Conditions  Accident_Severity
Dry                      Slight               381049
                         Serious               61708
                         Fatal                  5790
Flood over 3cm. deep     Slight                  842
                         Serious                 152
                         Fatal                    23
Frost or ice             Slight                16317
                         Serious                2007
                         Fatal                   193
Snow                     Slight                 5290
                         Serious                 565
                         Fatal                    35
Wet or damp              Slight               160303
                         Serious               23785
                         Fatal                  2620
Name: count, dtype: int64

<h3>Result:</h3>
<p>
<b>Dry</b> → Slight: 381,049 | Serious: 61,708 | Fatal: 5,790<br>
<b>Wet or damp</b> → Slight: 160,303 | Serious: 23,785 | Fatal: 2,620<br>
<b>Frost or ice</b> → Slight: 16,317 | Serious: 2,007 | Fatal: 193<br>
<b>Snow</b> → Slight: 5,290 | Serious: 565 | Fatal: 35<br>
<b>Flood over 3cm deep</b> → Slight: 842 | Serious: 152 | Fatal: 23
</p>

<h3>Insight:</h3>
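<p>Accidents on <b>dry roads</b> dominate every severity level in absolute terms. As a share of accidents on each surface, serious or fatal outcomes are highest on <b>flooded roads</b> (about 17.2%, based on only 1,017 records), followed by dry (15.0%) and wet or damp (14.1%), and lowest on frost or ice (11.9%) and snow (10.2%). These counts therefore do not show that wet or icy surfaces raise the share of severe accidents.</p>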
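
<p>Comparing severity across surfaces is easier with row-wise shares than with raw counts. A minimal sketch that row-normalizes the counts from the output above (the frame and column names <code>counts</code>, <code>shares</code>, and <code>Serious or fatal</code> are illustrative):</p>

```python
import pandas as pd

# Counts copied from the groupby output above.
counts = pd.DataFrame(
    {
        "Slight": [381_049, 160_303, 16_317, 5_290, 842],
        "Serious": [61_708, 23_785, 2_007, 565, 152],
        "Fatal": [5_790, 2_620, 193, 35, 23],
    },
    index=["Dry", "Wet or damp", "Frost or ice", "Snow", "Flood over 3cm. deep"],
)

# Divide each row by its own total to get the percentage per severity level.
shares = counts.div(counts.sum(axis=1), axis=0) * 100
shares["Serious or fatal"] = shares["Serious"] + shares["Fatal"]
print(shares.round(2).sort_values("Serious or fatal", ascending=False))
```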
<hr>

<h2>12. How many accidents happened each year?</h2>

In [21]:
uk_accidents.groupby(uk_accidents["Accident Date"].dt.year).size()

Accident Date
2019.0    71867
2020.0    70163
2021.0    66172
2022.0    56805
dtype: int64

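<h3>Result:</h3>
<p>
Counts from the output above (rows with a parsed date only):<br>
2019 → 71,867<br>
2020 → 70,163<br>
2021 → 66,172<br>
2022 → 56,805<br>
With the remaining rows assigned to 2021, that year rises to 461,844.
</p>

<h3>Insight:</h3>
<p>The four yearly counts sum to 265,007, so 395,672 of the 660,679 rows have no parsed date at this point (the float labels such as <code>2019.0</code> are a sign of missing values). The <b>abnormally high 2021 count of 461,844</b> equals 66,172 + 395,672, which is what results if those rows are filled with a single date in 2021, most likely by the mode-based fill used for <code>Accident Date</code>. The spike is therefore an artifact of date parsing and imputation rather than a real surge in accidents.</p>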
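
<p>The yearly counts in the output can be reconciled with the dataset size using simple arithmetic. A minimal sketch using only figures that appear in this notebook:</p>

```python
# Figures copied from this notebook's outputs.
total_rows = 660_679
parsed_by_year = {2019: 71_867, 2020: 70_163, 2021: 66_172, 2022: 56_805}

# Rows that have a usable date versus rows that do not.
parsed_total = sum(parsed_by_year.values())
unparsed = total_rows - parsed_total
print("rows with a parsed date:", parsed_total)
print("rows without a parsed date:", unparsed)

# Adding the undated rows to 2021 reproduces the 461,844 figure in the Result.
print("2021 plus undated rows:", parsed_by_year[2021] + unparsed)
```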
<hr>

<h2>13. In which month do accidents occur most frequently?</h2>

In [22]:
uk_accidents.groupby(uk_accidents["Accident Date"].dt.month).size()

Accident Date
1.0     22606
2.0     21815
3.0     21540
4.0     21699
5.0     22409
6.0     21974
7.0     21431
8.0     21914
9.0     22252
10.0    22328
11.0    22503
12.0    22536
dtype: int64

<h3>Result:</h3>
<p>
Jan: 22,606<br>
Feb: 21,815<br>
Mar: 21,540<br>
Apr: 21,699<br>
May: 22,409<br>
Jun: 21,974<br>
Jul: 21,431<br>
Aug: 21,914<br>
Sep: 22,252<br>
Oct: 22,328<br>
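Nov: 22,503 (418,175 once rows without a parsed date are assigned to November)<br>
Dec: 22,536
</p>

<h3>Insight:</h3>
<p>Among rows with a parsed date, monthly counts are fairly consistent (about 21,400 to 22,600). <b>November's spike</b> to 418,175 equals 22,503 plus the 395,672 rows that have no parsed date, which were most likely filled with a single November date. It is a <b>data quality problem</b> introduced during preparation, not a real seasonal effect.</p>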
<hr>

<h2>14. On which day of the week do most accidents occur?</h2>

In [52]:
uk_accidents.groupby(uk_accidents["Accident Date"].dt.day_name()).size()

Accident Date
Friday        38511
Monday        35715
Saturday      37751
Sunday        37772
Thursday      38009
Tuesday      435373
Wednesday     37548
dtype: int64

<h3>Result:</h3>
<p>
Monday → 35,715<br>
Tuesday → 435,373<br>
Wednesday → 37,548<br>
Thursday → 38,009<br>
Friday → 38,511<br>
Saturday → 37,751<br>
Sunday → 37,772
</p>

<h3>Insight:</h3>
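<p><b>Tuesdays</b> have an unrealistic spike. The excess matches the 395,672 rows that are missing from the yearly and monthly outputs (435,373 − 395,672 = 39,701, in line with the other days), which suggests the spike comes from filling missing dates with one date that falls on a Tuesday. Apart from this, accident counts are evenly distributed across the week.</p>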
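
<p>Grouping by <code>day_name()</code> sorts the days alphabetically, as in the output above. A minimal sketch that puts them in calendar order with <code>reindex</code>, using the counts from that output (<code>week_order</code> is an illustrative name):</p>

```python
import pandas as pd

# Counts copied from the groupby output above (alphabetical order).
by_day = pd.Series({
    "Friday": 38_511, "Monday": 35_715, "Saturday": 37_751, "Sunday": 37_772,
    "Thursday": 38_009, "Tuesday": 435_373, "Wednesday": 37_548,
})

# Reorder the index from Monday to Sunday.
week_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
              "Friday", "Saturday", "Sunday"]
by_day_ordered = by_day.reindex(week_order)
print(by_day_ordered)
```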
<hr>

<h2>15. During which light conditions do accidents occur most?</h2>

In [24]:
uk_accidents["Light_Conditions"].value_counts()

Light_Conditions
Daylight                       484880
Darkness - lights lit          129335
Darkness - no lighting          37437
Darkness - lighting unknown      6484
Darkness - lights unlit          2543
Name: count, dtype: int64

<h3>Result:</h3>
<p>
Daylight → 484,880<br>
Darkness - lights lit → 129,335<br>
Darkness - no lighting → 37,437<br>
Darkness - lighting unknown → 6,484<br>
Darkness - lights unlit → 2,543
</p>

<h3>Insight:</h3>
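<p>Most accidents happened during <b>daylight</b> (about 73% of records), when most driving presumably takes place. Accidents on <b>dark roads with no lights</b> form the third-largest group (37,437), but whether poor lighting raises accident risk cannot be judged from counts alone.</p>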
<hr>

<h2>16. Which district area has the most accidents recorded?</h2>

In [25]:
uk_accidents["District Area"].value_counts()

District Area
Birmingham            13491
Leeds                  8898
Manchester             6720
Bradford               6212
Sheffield              5710
                      ...  
Berwick-upon-Tweed      153
Teesdale                142
Shetland Islands        133
Orkney Islands          117
Clackmannanshire         91
Name: count, Length: 422, dtype: int64

<h3>Result:</h3>
<p>
Top districts by accident count:<br>
Birmingham → 13,491<br>
Leeds → 8,898<br>
Manchester → 6,720<br>
Bradford → 6,212<br>
Sheffield → 5,710<br>
... (422 districts total)
</p>

<h3>Insight:</h3>
<p>Big cities like <b>Birmingham, Leeds, and Manchester</b> dominate accident counts, reflecting higher traffic density and population.</p>
<hr>

<h2>17. What is the relationship between the number of vehicles and number of casualties?</h2>

In [26]:
uk_accidents.groupby("Number_of_Vehicles")["Number_of_Casualties"].mean()

Number_of_Vehicles
1      1.170932
2      1.374880
3      1.711169
4      1.995575
5      2.315341
6      2.612137
7      3.064189
8      3.401361
9      3.350877
10     3.629630
11     4.000000
12     2.285714
13     7.833333
14     5.444444
15     5.000000
16     6.000000
19    13.000000
28    16.000000
32     5.000000
Name: Number_of_Casualties, dtype: float64

<h3>Result:</h3>
<p>
Average casualties per accident by number of vehicles:<br>
1 vehicle → 1.17<br>
2 vehicles → 1.37<br>
3 vehicles → 1.71<br>
4 vehicles → 2.00<br>
5 vehicles → 2.32<br>
... up to 32 vehicles → 5.00<br>
Max avg casualties observed: 28 vehicles → 16.00.
</p>

<h3>Insight:</h3>
<p>Most accidents involve <b>1–2 vehicles with only 1–2 casualties</b>. However, rare multi-vehicle pileups cause very high casualty numbers.</p>
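
<p>Group means for high vehicle counts can rest on very few accidents, so it helps to show the group size next to each mean. A minimal sketch on a toy frame (the values in <code>toy</code> are hypothetical, chosen only to illustrate the pattern):</p>

```python
import pandas as pd

# Hypothetical toy data: many small accidents and one large pileup.
toy = pd.DataFrame({
    "Number_of_Vehicles":   [1, 1, 1, 2, 2, 2, 2, 3, 3, 28],
    "Number_of_Casualties": [1, 1, 2, 1, 1, 2, 1, 2, 1, 16],
})

# Mean casualties and number of accidents for each vehicle count.
summary = (
    toy.groupby("Number_of_Vehicles")["Number_of_Casualties"]
       .agg(["mean", "count"])
)
print(summary)

# Pearson correlation as a single-number summary of the relationship.
corr = toy["Number_of_Vehicles"].corr(toy["Number_of_Casualties"])
print("correlation:", round(corr, 3))
```

<p>A mean based on a single accident, like the 28-vehicle row here, says little about the typical case.</p>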
<hr>

<h2>18. Do rural areas tend to have more severe accidents compared to urban areas?</h2>

In [27]:
uk_accidents.groupby("Urban_or_Rural_Area")["Accident_Severity"].value_counts(normalize=True) * 100

Urban_or_Rural_Area  Accident_Severity
Rural                Slight               82.044019
                     Serious              15.612369
                     Fatal                 2.343613
Unallocated          Slight               90.909091
                     Serious               9.090909
                     Fatal                 0.000000
Urban                Slight               87.202557
                     Serious              12.071770
                     Fatal                 0.725672
Name: proportion, dtype: float64

<h3>Result:</h3>
<p>
<b>Urban</b> → Slight: 87.2% | Serious: 12.1% | Fatal: 0.7%<br>
<b>Rural</b> → Slight: 82.0% | Serious: 15.6% | Fatal: 2.3%<br>
<b>Unallocated</b> → Slight: 90.9% | Serious: 9.1% | Fatal: 0.0%
</p>

<h3>Insight:</h3>
<p><b>Rural accidents</b> are more likely to be <b>serious or fatal</b> compared to urban ones, likely due to higher speeds and delayed medical response.</p>
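
<p>The same comparison can be laid out as a table with <code>pd.crosstab</code> and <code>normalize="index"</code>, which gives one row per area type. A minimal sketch on a toy frame (the rows in <code>toy</code> are hypothetical):</p>

```python
import pandas as pd

# Hypothetical toy records: four urban and four rural accidents.
toy = pd.DataFrame({
    "Urban_or_Rural_Area": ["Urban"] * 4 + ["Rural"] * 4,
    "Accident_Severity": ["Slight", "Slight", "Slight", "Serious",
                          "Slight", "Slight", "Serious", "Fatal"],
})

# Each row sums to 100, so area types of different sizes are comparable.
table = pd.crosstab(
    toy["Urban_or_Rural_Area"],
    toy["Accident_Severity"],
    normalize="index",
) * 100
print(table.round(1))
```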
<hr>

<h2>19. What is the most common vehicle type involved in accidents?</h2>

In [28]:
uk_accidents["Vehicle_Type"].value_counts()

Vehicle_Type
Car                                      497992
Van / Goods 3.5 tonnes mgw or under       34160
Bus or coach (17 or more pass seats)      25878
Motorcycle over 500cc                     25657
Goods 7.5 tonnes mgw and over             17307
Motorcycle 125cc and under                15269
Taxi/Private hire car                     13294
Motorcycle over 125cc and up to 500cc      7656
Motorcycle 50cc and under                  7603
Goods over 3.5t. and under 7.5t            6096
Other vehicle                              5637
Minibus (8 - 16 passenger seats)           1976
Agricultural vehicle                       1947
Pedal cycle                                 197
Data missing or out of range                  6
Ridden horse                                  4
Name: count, dtype: int64

<h3>Result:</h3>
<p>
Cars → 497,992<br>
Vans (≤3.5t) → 34,160<br>
Buses/coaches (17+ seats) → 25,878<br>
Motorcycles (>500cc) → 25,657<br>
Heavy goods vehicles (≥7.5t) → 17,307<br>
Other types (cycles, taxis, horses, etc.) much lower.
</p>

<h3>Insight:</h3>
<p><b>Cars dominate accident involvement</b>, making up the vast majority, followed by <b>vans, buses, and motorcycles</b>.</p>
<hr>


<h2>20. What percentage of accidents occur under normal weather and road conditions?</h2>

In [29]:
normal_accidents = uk_accidents[
    (uk_accidents["Weather_Conditions"] == uk_accidents["Weather_Conditions"].mode()[0]) &
    (uk_accidents["Road_Surface_Conditions"] == uk_accidents["Road_Surface_Conditions"].mode()[0])
]
(len(normal_accidents) / len(uk_accidents)) * 100

66.55773832678199

<h3>Result:</h3>
<p>About <b>66.6%</b> of accidents happen when the weather is fine with no high winds and the road surface is dry.</p>

<h3>Insight:</h3>
<p>Most accidents occur under normal conditions rather than extreme ones. This is consistent with driver behavior, traffic volume, and other human factors playing a major role. Two caveats apply: normal conditions are presumably also the most common ones, and missing values in both columns were filled with these same modal categories.</p>
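
<p>The filter above defines "normal" through each column's mode. Naming the categories explicitly makes the definition easier to read and keeps it stable if the mode changes. A minimal sketch on a toy frame (the rows in <code>toy</code> are hypothetical; the category labels are the ones used in the dataset):</p>

```python
import pandas as pd

# Hypothetical toy records using the dataset's category labels.
toy = pd.DataFrame({
    "Weather_Conditions": ["Fine no high winds", "Fine no high winds",
                           "Raining no high winds", "Fine no high winds",
                           "Fog or mist"],
    "Road_Surface_Conditions": ["Dry", "Wet or damp", "Wet or damp", "Dry", "Dry"],
})

# Boolean mask for fine weather on a dry road.
is_normal = (
    (toy["Weather_Conditions"] == "Fine no high winds")
    & (toy["Road_Surface_Conditions"] == "Dry")
)

# The mean of a boolean mask is the share of True values.
normal_pct = is_normal.mean() * 100
print(f"{normal_pct:.1f}% of accidents under normal conditions")
```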
<hr>