📌 Project Title:
"Road Accident Analysis and Risk Factor Identification in Urban Areas"

📝 Introduction:
Road safety remains a critical public health and infrastructure challenge in urban regions across the globe. With the increasing number of vehicles and dense city traffic, understanding the root causes of road accidents has become essential to formulating data-driven safety strategies.

This project focuses on analyzing real-world accident data to uncover key patterns and contributing factors associated with road incidents. Using data preprocessing, visualization, and exploratory data analysis (EDA), we aim to identify:

The most dangerous types of junctions

The influence of weather and lighting conditions on accident severity

Road types that are more prone to accidents

Temporal trends in accident occurrences

By uncovering these insights, the project seeks to assist traffic management authorities and urban planners in making informed decisions that enhance road safety and reduce accident risks.
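
As a minimal sketch of how these questions translate into pandas operations, the example below uses a small made-up table. The rows are hypothetical and are not taken from the project's dataset; only the column names are borrowed from the data loaded later. It contrasts raw accident counts per weather condition with the share of each severity level within a condition, since a condition can show many accidents simply because it is common.

```python
import pandas as pd

# Hypothetical toy rows for illustration only; not the project's data.
toy = pd.DataFrame({
    "Weather_Conditions": ["Fine", "Fine", "Fine", "Fine", "Rain", "Rain"],
    "Accident_Severity": ["Slight", "Slight", "Slight", "Serious", "Serious", "Slight"],
})

# Raw counts: how many accidents occurred under each condition.
counts = toy["Weather_Conditions"].value_counts()

# Row-normalized crosstab: within each condition, the share of each severity level.
severity_share = pd.crosstab(
    toy["Weather_Conditions"], toy["Accident_Severity"], normalize="index"
)

print(counts)
print(severity_share)
```

In this toy table "Fine" has more accidents overall, while "Rain" has the larger share of serious ones, which is the kind of distinction the analysis needs to keep in view.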



In [None]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Load dataset (raw string so the backslashes in the Windows path are not read as escapes)
df = pd.read_excel(r"C:\Users\DELL\Downloads\Book1.xlsx")

# Fill missing values
for col in ['Road_Surface_Conditions', 'Road_Type', 'Weather_Conditions']:
    df[col] = df[col].fillna(df[col].mode()[0])

# Encode all object columns safely
categorical_columns = df.select_dtypes(include='object').columns
label_encoders = {}

for col in categorical_columns:
    le = LabelEncoder()
    df[col] = df[col].astype(str)  # Fix mixed types
    df[col] = le.fit_transform(df[col])
    label_encoders[col] = le

# Check result
print(df.head())

   Accident_Index Accident Date  Month  Year  Junction_Control  \
0               0    2021-01-01      4  2021                 4   
1               1    2021-01-05      4  2021                 4   
2               2    2021-01-04      4  2021                 4   
3               3    2021-01-05      4  2021                 2   
4               4    2021-01-06      4  2021                 2   

   Junction_Detail  Accident_Severity   Latitude  Light_Conditions  \
0                8                  1  51.512273                 4   
1                0                  1  51.514399                 4   
2                8                  2  51.486668                 4   
3                8                  1  51.507804                 4   
4                0                  1  51.482076                 1   

   Local_Authority_(District)  ...  Longitude  Number_of_Casualties  \
0                         115  ...  -0.201349                     1   
1                         115  ...  -0.1
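
The integer codes in the output above are hard to read on their own. A minimal sketch of how they could be mapped back to the original labels is shown below, using a hypothetical toy column rather than the real dataset. The same two calls (`classes_` and `inverse_transform`) should apply to each encoder stored in `label_encoders`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy column for illustration; the real labels come from the dataset.
toy = pd.DataFrame({"Light_Conditions": ["Daylight", "Darkness", "Daylight"]})

le = LabelEncoder()
toy["Light_Conditions"] = le.fit_transform(toy["Light_Conditions"].astype(str))

# classes_ holds the original labels in sorted order; position i is the label for code i.
code_to_label = dict(enumerate(le.classes_))

# inverse_transform maps the integer codes back to the original strings.
decoded = le.inverse_transform(toy["Light_Conditions"])

print(code_to_label)
print(list(decoded))
```

Because the codes follow the sorted order of the labels, they carry no ranking: a larger code does not mean a more severe or more dangerous category.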

In [None]:
# Import libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset (raw string so the backslashes in the Windows path are not read as escapes)
df = pd.read_excel(r"C:\Users\DELL\Downloads\Book1.xlsx")

# Display basic info (df.info() prints directly and returns None, so it needs no print() wrapper)
print("Basic Info:")
df.info()

# Handle missing values
for col in ['Road_Surface_Conditions', 'Road_Type', 'Weather_Conditions']:
    df[col] = df[col].fillna(df[col].mode()[0])

# -------------------------------
# 1. Junctions with Most Accidents
# -------------------------------
plt.figure(figsize=(10, 6))
sns.countplot(data=df, y='Junction_Detail', order=df['Junction_Detail'].value_counts().index)
plt.title("Most Accident-Prone Junction Types")
plt.xlabel("Number of Accidents")
plt.ylabel("Junction Type")
plt.tight_layout()
plt.show()

# -------------------------------
# 2. Accident Severity vs Weather
# -------------------------------
plt.figure(figsize=(10, 6))
sns.countplot(data=df, x='Accident_Severity', hue='Weather_Conditions')
plt.title("Accident Severity by Weather Condition")
plt.xlabel("Severity")
plt.ylabel("Count")
plt.legend(title="Weather")
plt.tight_layout()
plt.show()

# -------------------------------
# 3. Accidents by Light Conditions
# -------------------------------
plt.figure(figsize=(8, 6))
sns.countplot(data=df, x='Light_Conditions')
plt.title("Accidents by Light Conditions")
plt.xticks(rotation=30)
plt.tight_layout()
plt.show()

# -------------------------------
# 4. Road Types with Most Accidents
# -------------------------------
plt.figure(figsize=(10, 6))
sns.countplot(data=df, y='Road_Type', order=df['Road_Type'].value_counts().index)
plt.title("Accidents by Road Type")
plt.xlabel("Number of Accidents")
plt.ylabel("Road Type")
plt.tight_layout()
plt.show()

# -------------------------------
# 5. Accident Pattern Over Time
# -------------------------------
# Convert date if not already
df['Accident Date'] = pd.to_datetime(df['Accident Date'])

# Group by month
monthly_accidents = df.groupby(df['Accident Date'].dt.to_period('M')).size()

plt.figure(figsize=(12, 6))
monthly_accidents.plot()
plt.title("Monthly Accident Trend")
plt.xlabel("Month")
plt.ylabel("Number of Accidents")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

# -------------------------------
# 6. Accident Distribution by Road Type (Pie Chart)
# -------------------------------
road_counts = df['Road_Type'].value_counts()
plt.figure(figsize=(8, 8))
plt.pie(road_counts, labels=road_counts.index, autopct='%1.1f%%', startangle=140)
plt.title("🛣️ Accident Distribution by Road Type")
plt.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
plt.show()
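
# -------------------------------
# 7. Accident Severity vs Light Conditions
# -------------------------------
# A violin plot needs one numeric axis. This cell reloads the raw data, where
# Light_Conditions is assumed to hold text labels (the previous cell had to encode it),
# so the labels are converted to integer category codes for the y-axis.
light_codes = df['Light_Conditions'].astype('category').cat.codes
plt.figure(figsize=(10, 6))
sns.violinplot(x=df['Accident_Severity'], y=light_codes, inner="quart", palette="muted")
plt.title("💡 Accident Severity vs Light Conditions (Violin Plot)")
plt.xlabel("Accident Severity")
plt.ylabel("Light Conditions (category code)")
plt.tight_layout()
plt.show()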
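
The monthly trend plotted earlier in this cell could be complemented by a day-of-week breakdown. The sketch below shows one way to compute it, using hypothetical toy dates rather than the project's data; `weekday_order` and `weekday_counts` are names introduced here for illustration.

```python
import pandas as pd

# Hypothetical toy dates for illustration only; not the project's data.
toy = pd.DataFrame({
    "Accident Date": pd.to_datetime(
        ["2021-01-01", "2021-01-04", "2021-01-05", "2021-01-05", "2021-01-08"]
    )
})

weekday_order = [
    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"
]

# Count accidents per weekday name, then reindex so the days appear in calendar
# order and days with no accidents show up as 0.
weekday_counts = (
    toy["Accident Date"]
    .dt.day_name()
    .value_counts()
    .reindex(weekday_order, fill_value=0)
)

print(weekday_counts)
```

On the real data the same expression would be applied to `df['Accident Date']` after the `pd.to_datetime` conversion, and the result could be passed to a bar plot.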


