📁 Dataset Description

The Road Accident Dataset contains detailed records of road traffic accidents, capturing information related to accident severity, location, time, weather conditions, road surface, lighting, vehicles involved, and casualties.
The dataset is designed to support exploratory data analysis and visualization, helping identify patterns, trends, and high-risk factors associated with road accidents.

This data is suitable for:
	•	Accident trend analysis
	•	Risk factor identification
	•	Traffic safety studies
	•	Data visualization and machine learning projects


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import folium
from folium.plugins import HeatMap
import numpy as np
from datetime import datetime

In [None]:
df = pd.read_csv("Road Accident Data.csv")
df.head()

In [None]:
df.info()


In [None]:
df.columns = (
    df.columns
    .str.strip()
    .str.lower()
    .str.replace(" ", "_")
)

df.columns
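
To make the effect of this normalization concrete, here is a minimal sketch on a small made-up frame. The raw header names and the `normalize_columns` helper are illustrative assumptions, not taken from the dataset.

```python
import pandas as pd

def normalize_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with stripped, lower-cased, underscore-separated column names."""
    out = frame.copy()
    out.columns = (
        out.columns
        .str.strip()
        .str.lower()
        .str.replace(" ", "_", regex=False)
    )
    return out

# Hypothetical raw headers with stray spaces and mixed case
sample = pd.DataFrame(columns=[" Accident Date", "Speed Limit ", "Weather_Conditions"])
cleaned = normalize_columns(sample)
print(list(cleaned.columns))
```

Returning a copy rather than renaming in place keeps the raw frame available for comparison.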

In [None]:
df["accident_date"] = pd.to_datetime(
    df["accident_date"], errors="coerce"
)

In [None]:
# Check missing values
df.isnull().sum()

In [None]:
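# Assign the result back: calling fillna(inplace=True) on a selected column
# is deprecated in recent pandas and may not update df
for col in df.select_dtypes(include="object").columns:
    df[col] = df[col].fillna("Unknown")

for col in df.select_dtypes(include="number").columns:
    df[col] = df[col].fillna(df[col].median())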
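
Before filling gaps it can help to see how much of each column is missing. The sketch below works on a small made-up frame (the values are invented for illustration) and shows the share of missing values followed by the same placeholder and median strategy.

```python
import numpy as np
import pandas as pd

# Made-up sample with gaps in one text column and one numeric column
toy = pd.DataFrame({
    "weather_conditions": ["Fine", None, "Rain", "Fine"],
    "speed_limit": [30.0, 60.0, np.nan, 30.0],
})

# Share of missing values per column, as a percentage
missing_pct = toy.isnull().mean() * 100

# Fill text columns with a placeholder and numeric columns with the median
filled = toy.copy()
for col in filled.columns:
    if pd.api.types.is_numeric_dtype(filled[col]):
        filled[col] = filled[col].fillna(filled[col].median())
    else:
        filled[col] = filled[col].fillna("Unknown")

print(missing_pct)
print(filled)
```

A column with a very high missing share may be better dropped or analyzed separately than imputed.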

In [None]:
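numeric_cols = [
    "number_of_vehicles",
    "number_of_casualties",
    "speed_limit"
]

# Coerce to numeric; unparseable entries (including any "Unknown"
# placeholders from the previous step) become NaN, so fill those with the median
for col in numeric_cols:
    if col in df.columns:
        df[col] = pd.to_numeric(df[col], errors="coerce")
        df[col] = df[col].fillna(df[col].median())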

In [None]:
df["year"] = df["accident_date"].dt.year
df["month"] = df["accident_date"].dt.month
df["day"] = df["accident_date"].dt.day
df["weekday"] = df["accident_date"].dt.day_name()

In [None]:
df.drop_duplicates(inplace=True)

In [None]:
df.describe()

In [None]:
sns.set_theme(style="whitegrid")

📊 Accident Severity Distribution

Description:
This chart shows the distribution of road accidents based on their severity.
It helps identify how frequently slight, serious, and fatal accidents occur, highlighting the overall risk level on roads.

In [None]:
plt.figure(figsize=(8,5))
sns.countplot(data=df, x="accident_severity")
plt.title("Accident Severity Distribution")
plt.xlabel("Severity")
plt.ylabel("Count")
plt.show()
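
The count plot shows absolute numbers; percentage shares are often easier to compare. A minimal sketch, using made-up severity labels in place of `df["accident_severity"]`:

```python
import pandas as pd

# Made-up severity labels standing in for df["accident_severity"]
severity = pd.Series(
    ["Slight", "Slight", "Serious", "Slight", "Fatal", "Slight", "Serious", "Slight"],
    name="accident_severity",
)

# Absolute counts and percentage share per severity level
summary = pd.DataFrame({
    "count": severity.value_counts(),
    "share_pct": (severity.value_counts(normalize=True) * 100).round(1),
})
print(summary)
```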

📅 Accidents by Day of the Week

Description:
This visualization analyzes the frequency of road accidents across different days of the week.
It helps determine whether accidents are more common on weekdays or weekends due to traffic patterns.

In [None]:
plt.figure(figsize=(10,5))
sns.countplot(
    data=df,
    x="weekday",
    order=["Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday"]
)
plt.title("Accidents by Day of Week")
plt.xlabel("Day")
plt.ylabel("Count")
plt.show()
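
Raw totals compare five weekdays against two weekend days, so a per-day average gives a fairer weekday versus weekend comparison. The sketch below uses a handful of made-up dates in place of `df["accident_date"]`.

```python
import pandas as pd

# Made-up dates standing in for df["accident_date"]
dates = pd.Series(pd.to_datetime([
    "2021-03-01",  # Monday
    "2021-03-02",  # Tuesday
    "2021-03-05",  # Friday
    "2021-03-06",  # Saturday
    "2021-03-07",  # Sunday
]))

# dayofweek is 0 for Monday through 6 for Sunday, so 5 and 6 are the weekend
is_weekend = dates.dt.dayofweek >= 5
day_type = is_weekend.map({True: "Weekend", False: "Weekday"})

# Divide each group's total by the number of days of the week it covers
counts = day_type.value_counts()
per_day = counts / pd.Series({"Weekday": 5, "Weekend": 2})
print(per_day)
```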

🚗 Speed Limit vs Accident Frequency

Description:
This histogram shows how accident frequency varies across speed limits.
Peaks at lower speed limits often correspond to urban roads, where traffic is densest.

In [None]:
plt.figure(figsize=(8,5))
sns.histplot(df["speed_limit"], bins=15, kde=True)
plt.title("Accidents by Speed Limit")
plt.xlabel("Speed Limit")
plt.ylabel("Frequency")
plt.show()
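
Because speed limits take only a few distinct values, a grouped table can complement the histogram. A minimal sketch on made-up records (standing in for the `speed_limit` and `number_of_casualties` columns) that counts accidents and averages casualties per posted limit:

```python
import pandas as pd

# Made-up records standing in for df[["speed_limit", "number_of_casualties"]]
toy = pd.DataFrame({
    "speed_limit": [30, 30, 30, 40, 60, 60, 70],
    "number_of_casualties": [1, 2, 1, 1, 3, 1, 4],
})

# One row per posted limit: how many accidents, and how many casualties on average
by_limit = toy.groupby("speed_limit")["number_of_casualties"].agg(
    accidents="count",
    mean_casualties="mean",
)
print(by_limit)
```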

🚑 Casualties Distribution

Description:
This box plot represents the spread and outliers in the number of casualties per accident.
It provides insight into typical accident impact and rare but severe cases.

In [None]:
plt.figure(figsize=(8,5))
sns.boxplot(x=df["number_of_casualties"])
plt.title("Casualties Distribution")
plt.show()
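
By default the box plot marks points more than 1.5 times the interquartile range beyond the quartiles as outliers. The sketch below reproduces that rule on made-up casualty counts so the threshold can be inspected directly.

```python
import pandas as pd

# Made-up casualty counts standing in for df["number_of_casualties"]
casualties = pd.Series([1, 1, 1, 2, 1, 3, 1, 2, 1, 12])

q1 = casualties.quantile(0.25)
q3 = casualties.quantile(0.75)
iqr = q3 - q1

# Values above this fence are the ones a default box plot draws as outliers
upper_fence = q3 + 1.5 * iqr
outliers = casualties[casualties > upper_fence]
print(upper_fence, outliers.tolist())
```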

📈 Year-wise Accident Trend

Description:
This interactive bar chart displays the trend of road accidents over the years.
It helps identify whether accident rates are increasing, decreasing, or stable over time.

In [None]:
fig = px.bar(
    df.groupby("year").size().reset_index(name="count"),
    x="year",
    y="count",
    title="Year-wise Road Accidents"
)
fig.show()
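
To put a number on whether accidents are rising or falling, the yearly counts can be turned into year-over-year percentage changes. A minimal sketch with made-up counts in place of `df.groupby("year").size()`:

```python
import pandas as pd

# Made-up yearly counts standing in for df.groupby("year").size()
yearly = pd.Series({2019: 1000, 2020: 800, 2021: 880}, name="count")

# Percentage change relative to the previous year; the first year has no baseline
yoy_pct = (yearly.pct_change() * 100).round(1)
print(yoy_pct)
```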

🌦️ Accidents by Weather Conditions

Description:
This pie chart visualizes the proportion of accidents occurring under different weather conditions.
If most accidents turn out to occur in fine weather, this points to the role of human factors rather than weather alone.

In [None]:
fig = px.pie(
    df,
    names="weather_conditions",
    title="Accidents by Weather Conditions"
)
fig.show()
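
The pie chart shows how often each weather type appears, not whether it relates to severity. One way to probe that is a row-normalized cross-tabulation, sketched here on made-up records standing in for the `weather_conditions` and `accident_severity` columns.

```python
import pandas as pd

# Made-up records; the real columns come from the cleaned df
toy = pd.DataFrame({
    "weather_conditions": ["Fine", "Fine", "Fine", "Fine", "Rain", "Rain"],
    "accident_severity":  ["Slight", "Slight", "Slight", "Serious", "Slight", "Serious"],
})

# Row-normalized table: within each weather type, the share of each severity level
severity_share = pd.crosstab(
    toy["weather_conditions"],
    toy["accident_severity"],
    normalize="index",
)
print(severity_share.round(2))
```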

💡 Accidents by Light Conditions

Description:
This chart compares accident counts under different lighting conditions such as daylight and darkness.
Because the counts are not adjusted for traffic volume, differences reflect both visibility and how much traffic is on the road under each condition.


In [None]:
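# Aggregate first so Plotly draws one bar per category rather than one segment per row
light_counts = df.groupby("light_conditions").size().reset_index(name="count")
fig = px.bar(
    light_counts,
    x="light_conditions",
    y="count",
    title="Accidents by Light Conditions"
)
fig.show()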

🏙️ Urban vs Rural Accident Distribution

Description:
This visualization compares accident occurrences in urban and rural areas.
A higher share in urban areas would be consistent with denser traffic and congestion there.


In [None]:
fig = px.pie(
    df,
    names="urban_or_rural_area",
    title="Urban vs Rural Accident Distribution"
)
fig.show()

🗺️ Accident Location Map

Description:
This interactive map plots a random sample of accident locations using their geographic coordinates.
It lets users visually explore where accidents are concentrated across regions.




In [None]:
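# Keep only rows with coordinates; folium rejects NaN locations
geo = df.dropna(subset=["latitude", "longitude"])

m = folium.Map(
    location=[geo["latitude"].mean(), geo["longitude"].mean()],
    zoom_start=6
)

# Plot a reproducible random sample (capped at the rows available) to keep the map responsive
for _, row in geo.sample(n=min(500, len(geo)), random_state=42).iterrows():
    folium.CircleMarker(
        location=[row["latitude"], row["longitude"]],
        radius=2,
        fill=True
    ).add_to(m)

m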

🔥 Accident Density Heatmap

Description:
This heatmap represents accident density across locations.
Brighter areas indicate a higher concentration of recorded accidents, helping identify accident-prone regions.

In [None]:
heat_data = df[["latitude", "longitude"]].dropna().values.tolist()

heat_map = folium.Map(
    location=[df["latitude"].mean(), df["longitude"].mean()],
    zoom_start=6
)

HeatMap(heat_data).add_to(heat_map)

heat_map
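
The heatmap is visual only; a simple grid count gives a ranked list of the densest cells. The sketch below uses made-up coordinates in place of `df[["latitude", "longitude"]]`, and the 0.1 degree `cell_size` is an assumed resolution rather than a value from the notebook.

```python
import pandas as pd

# Made-up coordinates standing in for df[["latitude", "longitude"]]
points = pd.DataFrame({
    "latitude":  [51.501, 51.504, 51.507, 53.480, 53.482, 55.953],
    "longitude": [-0.142, -0.139, -0.128, -2.243, -2.244, -3.188],
})

# Snap each point to the nearest grid cell and count points per cell
cell_size = 0.1  # assumed resolution in degrees; tune to the area covered
cells = (points / cell_size).round().mul(cell_size).round(1)
hotspots = (
    cells.groupby(["latitude", "longitude"])
    .size()
    .sort_values(ascending=False)
    .reset_index(name="accidents")
)
print(hotspots.head())
```

A smaller `cell_size` resolves individual junctions but needs more data per cell to be meaningful.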

