TASK **5**

Analyze traffic accident data to identify patterns related to road conditions, weather, and time of day. Visualize accident hotspots and contributing factors.

In [None]:
# ======================================================
# 📦 STEP 1: Import Required Libraries
# ======================================================
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import folium
import geopandas as gpd
import warnings

# Hide only FutureWarning noise (e.g. seaborn/pandas deprecations); keep other warnings visible
warnings.filterwarnings("ignore", category=FutureWarning)
sns.set(style="whitegrid")


In [None]:
# ======================================================
# 📥 STEP 2: Load Dataset
# ======================================================
file_path = "/content/sampled_US_Accidents.csv"
df = pd.read_csv(file_path)
print(f"✅ Dataset loaded with shape: {df.shape}")
print("📌 Columns:", df.columns.tolist())

In [None]:
# ======================================================
# 🧹 STEP 3: Data Preprocessing
# ======================================================

# 3.1 Drop duplicates
df.drop_duplicates(inplace=True)

# 3.2 Convert Start_Time to datetime
df['Start_Time'] = pd.to_datetime(df['Start_Time'], errors='coerce')

# 3.3 Drop rows with missing key info
df.dropna(subset=['Start_Time', 'Start_Lat', 'Start_Lng'], inplace=True)

# 3.4 Extract time features
df['Hour'] = df['Start_Time'].dt.hour
df['Weekday'] = df['Start_Time'].dt.day_name()
df['Month'] = df['Start_Time'].dt.month_name()

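# 3.5 Fill missing environmental values
# Categorical gaps get an explicit 'Unknown' label; numeric gaps get the column median
df['Weather_Condition'] = df['Weather_Condition'].fillna('Unknown')
df['Visibility(mi)'] = df['Visibility(mi)'].fillna(df['Visibility(mi)'].median())
df['Temperature(F)'] = df['Temperature(F)'].fillna(df['Temperature(F)'].median())
# Assumption: a missing precipitation reading means no precipitation was recorded
df['Precipitation(in)'] = df['Precipitation(in)'].fillna(0)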

In [None]:
# ======================================================
# 📊 STEP 4: Exploratory Analysis
# ======================================================

# 4.1 Accidents by Hour
plt.figure(figsize=(10,5))
sns.countplot(x='Hour', data=df, palette='plasma')
plt.title("Accidents by Hour of Day", fontsize=14)
plt.xlabel("Hour")
plt.ylabel("Number of Accidents")
plt.tight_layout()
plt.show()

# 4.2 Accidents by Day of the Week
plt.figure(figsize=(10,5))
order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
sns.countplot(x='Weekday', data=df, order=order, palette='Set2')
plt.title("Accidents by Weekday", fontsize=14)
plt.tight_layout()
plt.show()

# 4.3 Weather Conditions
top_weather = df['Weather_Condition'].value_counts().nlargest(10).index
plt.figure(figsize=(12,6))
sns.countplot(y='Weather_Condition', data=df[df['Weather_Condition'].isin(top_weather)],
              order=top_weather, palette='coolwarm')
plt.title("Top 10 Weather Conditions During Accidents", fontsize=14)
plt.tight_layout()
plt.show()

# 4.4 Accidents by Month
month_order = ['January','February','March','April','May','June',
               'July','August','September','October','November','December']
plt.figure(figsize=(12,5))
sns.countplot(x='Month', data=df, order=month_order, palette='Spectral')
plt.title("Monthly Accident Distribution", fontsize=14)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


In [None]:
# ======================================================
# 🗺️ STEP 5: Visualize Accident Hotspots
# ======================================================

# 5.1 Create GeoDataFrame
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df['Start_Lng'], df['Start_Lat']),
    crs="EPSG:4326"
)

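# 5.2 Static Map (sample for speed)
# Pass an explicit Axes: GeoDataFrame.plot() otherwise opens its own figure,
# leaving an empty one behind and ignoring the figsize set here.
fig, ax = plt.subplots(figsize=(10, 6))
gdf.sample(min(1000, len(gdf)), random_state=42).plot(ax=ax, markersize=2, color='red', alpha=0.4)
ax.set_title("US Accident Hotspots (Sample)", fontsize=14)
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
plt.tight_layout()
plt.show()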

# 5.3 Interactive Map
print("🌍 Creating interactive map...")
m = folium.Map(location=[39.5, -98.35], zoom_start=5)
for _, row in df.sample(min(500, len(df)), random_state=42).iterrows():
    folium.CircleMarker(
        location=[row['Start_Lat'], row['Start_Lng']],
        radius=2,
        color='crimson',
        fill=True,
        fill_opacity=0.4
    ).add_to(m)

m.save("us_accident_hotspots_map.html")
print("✅ Interactive map saved as 'us_accident_hotspots_map.html'")

In [None]:
# ======================================================
# 🔬 STEP 6: Environmental Factor Distributions
# ======================================================

# 6.1 Visibility
plt.figure(figsize=(10,5))
sns.histplot(df['Visibility(mi)'], bins=40, kde=True, color='skyblue')
plt.title("Visibility Distribution", fontsize=14)
plt.tight_layout()
plt.show()

# 6.2 Temperature
plt.figure(figsize=(10,5))
sns.histplot(df['Temperature(F)'], bins=40, kde=True, color='orange')
plt.title("Temperature Distribution", fontsize=14)
plt.tight_layout()
plt.show()

# 6.3 Precipitation
plt.figure(figsize=(10,5))
sns.histplot(df['Precipitation(in)'], bins=40, kde=True, color='seagreen')
plt.title("Precipitation Distribution", fontsize=14)
plt.tight_layout()
plt.show()


In [None]:
# ======================================================
# 💾 STEP 7: Save Cleaned Dataset
# ======================================================
df.to_csv("cleaned_sampled_US_Accidents.csv", index=False)
print("✅ Cleaned dataset saved as 'cleaned_sampled_US_Accidents.csv'")


✅ Conclusion
This study analyzed a sample of U.S. traffic accident data to uncover patterns related to time, weather, and environmental conditions. The key insights are as follows:

Time-of-Day and Week Trends: Accidents are most common during morning and evening rush hours, with Fridays showing the highest frequency. This pattern is consistent with commuting and end-of-week traffic surges.
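One way to check the rush-hour and Friday pattern numerically is to tabulate the counts directly rather than read them off the bar charts. The sketch below is illustrative only: `peak_hours_and_weekdays` is a hypothetical helper, and the small `toy` frame stands in for the real `df`. Only the `Start_Time` column name comes from the notebook.

```python
import pandas as pd

def peak_hours_and_weekdays(df: pd.DataFrame, time_col: str = "Start_Time"):
    """Return accident counts per hour and per weekday, most frequent first."""
    # Unparseable timestamps become NaT and are dropped, as in step 3.2/3.3
    times = pd.to_datetime(df[time_col], errors="coerce").dropna()
    by_hour = times.dt.hour.value_counts()           # sorted descending by default
    by_weekday = times.dt.day_name().value_counts()
    return by_hour, by_weekday

# Toy data for illustration only; not taken from the accident dataset
toy = pd.DataFrame({"Start_Time": [
    "2021-03-05 08:10:00",  # Friday
    "2021-03-05 08:45:00",  # Friday
    "2021-03-05 17:30:00",  # Friday
    "2021-03-08 08:05:00",  # Monday
    "not a date",           # dropped after coercion
]})

by_hour, by_weekday = peak_hours_and_weekdays(toy)
print(by_hour.head(3))
print(by_weekday.head(3))
```

On the real data, calling `peak_hours_and_weekdays(df)` and inspecting the first few rows of each result would show whether the peaks fall where the charts suggest.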

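Weather Conditions: Clear weather was the most frequently recorded condition. Severe weather types such as fog, rain, and snow may be associated with increased accident severity, but the count plot in step 4.3 shows frequency only, so that link needs a separate check against a severity measure.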
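The count plot in step 4.3 shows how often each weather condition appears, not how severe the accidents were. A minimal sketch of a severity comparison follows, assuming the dataset has a numeric `Severity` column (the notebook never references one, so this is an assumption). The helper `severity_by_weather`, its `min_count` threshold, and the toy rows are all hypothetical.

```python
import pandas as pd

def severity_by_weather(df: pd.DataFrame,
                        weather_col: str = "Weather_Condition",
                        severity_col: str = "Severity",   # assumed column name
                        min_count: int = 2) -> pd.DataFrame:
    """Mean severity per weather condition, ignoring rarely seen conditions."""
    summary = df.groupby(weather_col)[severity_col].agg(
        accidents="count", mean_severity="mean"
    )
    # Conditions with very few records give unstable averages, so drop them
    summary = summary[summary["accidents"] >= min_count]
    return summary.sort_values("mean_severity", ascending=False)

# Toy data for illustration only; not taken from the accident dataset
toy = pd.DataFrame({
    "Weather_Condition": ["Clear", "Clear", "Clear", "Clear", "Rain", "Rain", "Fog"],
    "Severity":          [2,       2,       2,       2,       3,      2,      4],
})

result = severity_by_weather(toy)
print(result)
```

A difference in mean severity between conditions would still be descriptive rather than causal, since traffic volume, location, and time of day also vary with the weather.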

Monthly Trends: Accident occurrences showed seasonal variation, with notable peaks in the winter months, possibly due to icy roads and limited visibility.

Environmental Impact: Low visibility and extreme temperatures (both hot and cold) were common during accident events. Precipitation, although often zero, may contribute to higher risk on rainy or snowy days.
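A rough way to quantify the visibility statement is to bin `Visibility(mi)` into bands and report the share of accident records in each. The band edges below (1 and 5 miles) and the helper `share_by_band` are arbitrary choices for illustration, and the values are toy data. Such shares describe the accident records only; without exposure data (how often each condition occurs overall) they do not measure risk.

```python
import pandas as pd

def share_by_band(values: pd.Series, edges: list, labels: list) -> pd.Series:
    """Share of records falling in each half-open band [edge_i, edge_{i+1})."""
    bands = pd.cut(values, bins=edges, labels=labels, right=False)
    # sort=False keeps the bands in their natural low-to-high order
    return bands.value_counts(normalize=True, sort=False)

# Toy data for illustration only; not taken from the accident dataset
toy_visibility = pd.Series([0.5, 2.0, 10.0, 10.0], name="Visibility(mi)")

shares = share_by_band(
    toy_visibility,
    edges=[0, 1, 5, float("inf")],            # hypothetical band edges, in miles
    labels=["<1 mi", "1-5 mi", ">=5 mi"],
)
print(shares)
```

Because step 3.5 fills missing visibility and temperature with the median, running this on the cleaned `df` would slightly inflate the band that contains the median.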

Geographic Hotspots: Visualizations of geographic data highlighted urban clusters and highways as major accident-prone regions, supporting the need for targeted urban traffic policies.
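The maps in step 5 show hotspots visually. A simple numeric counterpart is to snap each accident to a latitude/longitude grid cell and count records per cell. The sketch below assumes a 0.1 degree cell size (an arbitrary choice) and uses a hypothetical helper, `top_grid_cells`, with toy coordinates; only the `Start_Lat` and `Start_Lng` column names come from the notebook.

```python
import numpy as np
import pandas as pd

def top_grid_cells(df: pd.DataFrame,
                   lat_col: str = "Start_Lat",
                   lng_col: str = "Start_Lng",
                   cell_deg: float = 0.1,     # assumed grid resolution in degrees
                   n: int = 5) -> pd.DataFrame:
    """Count accidents per grid cell and return the n busiest cells."""
    cells = pd.DataFrame({
        # Floor each coordinate to the south-west corner of its cell
        "lat_cell": np.floor(df[lat_col] / cell_deg) * cell_deg,
        "lng_cell": np.floor(df[lng_col] / cell_deg) * cell_deg,
    }).round(4)  # remove floating-point noise so equal cells group together
    counts = cells.groupby(["lat_cell", "lng_cell"]).size()
    return (counts.sort_values(ascending=False)
                  .head(n)
                  .rename("accidents")
                  .reset_index())

# Toy coordinates for illustration only; not taken from the accident dataset
toy = pd.DataFrame({
    "Start_Lat": [34.05, 34.08, 34.02, 40.71],
    "Start_Lng": [-118.24, -118.21, -118.29, -74.01],
})

hot = top_grid_cells(toy)
print(hot)
```

Cells defined in degrees cover less ground at higher latitudes, so for a more careful comparison the points could first be projected to an equal-area coordinate system.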
