# Exploratory Data Analysis (EDA)
## Step 1: Load Cleaned Data
- Load the cleaned sample of the US Accidents dataset (8,812 rows and 51 columns, per the `df.info()` output below).
- Confirm the dataset structure and key columns.

In [1]:
import pandas as pd

# Load cleaned dataset
df = pd.read_csv('../data/usa_traffic_accidents_cleaned.csv')
print("Dataset Info:")
# df.info() prints its own summary and returns None, so it is not wrapped in print()
df.info()
print("\nFirst 5 rows:")
print(df.head())

Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8812 entries, 0 to 8811
Data columns (total 51 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   ID                     8812 non-null   object 
 1   Source                 8812 non-null   object 
 2   Severity               8812 non-null   int64  
 3   Start_Time             8812 non-null   object 
 4   End_Time               8812 non-null   object 
 5   Start_Lat              8812 non-null   float64
 6   Start_Lng              8812 non-null   float64
 7   End_Lat                4530 non-null   float64
 8   End_Lng                4530 non-null   float64
 9   Distance(mi)           8812 non-null   float64
 10  Description            8812 non-null   object 
 11  Street                 8799 non-null   object 
 12  City                   8812 non-null   object 
 13  County                 8812 non-null   object 
 14  State                  8812 non-null   object 
[output truncated]
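
Because the printed summary is cut off before the columns used later appear, a quick programmatic check can confirm they exist. The sketch below is illustrative only: the `REQUIRED_COLUMNS` list and the `missing_columns` helper are hypothetical names, and the list is an assumption based on the columns the later cells reference.

```python
import pandas as pd

# Columns the later cells rely on. This list is an assumption drawn from the
# analysis cells in this notebook, not from the dataset's documentation.
REQUIRED_COLUMNS = [
    "Hour", "Day_of_Week", "Month",
    "Weather_Condition", "Road_Condition", "Light_Condition",
]

def missing_columns(frame: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Return the required column names that are absent from the frame."""
    return [col for col in required if col not in frame.columns]

# Tiny stand-in frame for illustration; in the notebook, pass `df` instead.
demo = pd.DataFrame({"Hour": [7.0], "Day_of_Week": ["Friday"], "Month": ["May"]})
missing = missing_columns(demo)
print("Missing columns:", missing)
```

An empty list means every column needed by Step 2 is present.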

## Step 2: Analyze Patterns
- Analyze accidents by time (hour, day of week, month), weather, road conditions (inferred), and lighting.
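
The time columns used in this step (`Hour`, `Day_of_Week`, `Month`) are not part of the raw columns listed in Step 1, so they were presumably derived during cleaning. The sketch below shows one plausible way to derive them from `Start_Time`; it is an assumption about the cleaning step, not a record of it, and the two sample timestamps are made up for illustration.

```python
import pandas as pd

# Stand-in rows for illustration; the notebook would use the loaded `df`.
demo = pd.DataFrame({"Start_Time": ["2019-12-06 07:45:00", "2020-07-19 16:10:00"]})

# Parse the timestamps; values that cannot be parsed become NaT instead of raising.
start = pd.to_datetime(demo["Start_Time"], errors="coerce")

demo["Hour"] = start.dt.hour               # 0-23
demo["Day_of_Week"] = start.dt.day_name()  # e.g. "Friday"
demo["Month"] = start.dt.month_name()      # e.g. "December"

print(demo[["Hour", "Day_of_Week", "Month"]])
```

If any timestamp fails to parse, `Hour` becomes a float column (integers cannot hold missing values), which may be why the hours in the output print as `0.0`, `1.0`, and so on.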

In [2]:
# Accidents by Hour
print("Accidents by Hour:")
print(df['Hour'].value_counts().sort_index())

Accidents by Hour:
Hour
0.0     126
1.0      91
2.0     105
3.0      91
4.0     179
5.0     279
6.0     472
7.0     669
8.0     675
9.0     394
10.0    396
11.0    405
12.0    420
13.0    415
14.0    504
15.0    594
16.0    686
17.0    646
18.0    492
19.0    358
20.0    262
21.0    213
22.0    181
23.0    159
Name: count, dtype: int64
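
To put a number on the two visible peaks, the hourly counts above can be summed over commuting windows. The windows used here (hours 6 to 8 and 15 to 17) are an assumption chosen for illustration, not a definition taken from the dataset.

```python
# Hourly counts copied from the output above; list position = hour of day.
hourly = [126, 91, 105, 91, 179, 279, 472, 669, 675, 394, 396, 405,
          420, 415, 504, 594, 686, 646, 492, 358, 262, 213, 181, 159]

# Rush-hour windows are assumptions for illustration.
morning_rush = sum(hourly[6:9])    # hours 6, 7, 8
evening_rush = sum(hourly[15:18])  # hours 15, 16, 17

total = sum(hourly)
rush_share = (morning_rush + evening_rush) / total
print(f"Morning: {morning_rush}, evening: {evening_rush}, share: {rush_share:.1%}")
```

Under these windows, six of the 24 hours account for about 42% of the accidents in the sample.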


In [3]:
# Accidents by Day of Week
print("Accidents by Day of Week:")
print(df['Day_of_Week'].value_counts())

Accidents by Day of Week:
Day_of_Week
Friday       1609
Wednesday    1558
Thursday     1497
Tuesday      1452
Monday       1345
Saturday      713
Sunday        638
Name: count, dtype: int64
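
The weekday and weekend totals cover different numbers of days, so a per-day average is a fairer comparison. A minimal worked example using the counts above:

```python
# Day-of-week counts copied from the output above.
day_counts = {
    "Friday": 1609, "Wednesday": 1558, "Thursday": 1497, "Tuesday": 1452,
    "Monday": 1345, "Saturday": 713, "Sunday": 638,
}
weekend_days = {"Saturday", "Sunday"}

weekday_values = [n for day, n in day_counts.items() if day not in weekend_days]
weekend_values = [n for day, n in day_counts.items() if day in weekend_days]

# Average number of accidents per day within each group.
weekday_avg = sum(weekday_values) / len(weekday_values)
weekend_avg = sum(weekend_values) / len(weekend_values)

print(f"Weekday average: {weekday_avg:.1f}")
print(f"Weekend average: {weekend_avg:.1f}")
print(f"Ratio: {weekday_avg / weekend_avg:.2f}")
```

In this sample an average weekday has roughly 2.2 times as many recorded accidents as an average weekend day.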


In [4]:
# Accidents by Month
print("Accidents by Month:")
print(df['Month'].value_counts())

Accidents by Month:
Month
December     949
November     839
January      795
October      782
September    781
February     752
August       724
April        684
June         659
March        618
July         615
May          614
Name: count, dtype: int64
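
`value_counts()` sorts by frequency, which makes seasonal trends harder to read. One way to restore calendar order is to reindex the result, sketched here with the counts above; the same approach would work for `Day_of_Week` with a list of day names.

```python
import pandas as pd

# Monthly counts copied from the output above (sorted by frequency).
month_counts = pd.Series({
    "December": 949, "November": 839, "January": 795, "October": 782,
    "September": 781, "February": 752, "August": 724, "April": 684,
    "June": 659, "March": 618, "July": 615, "May": 614,
})

CALENDAR_ORDER = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# reindex returns the same values arranged in the requested label order.
ordered = month_counts.reindex(CALENDAR_ORDER)
print(ordered)
```

In the notebook this would be `df['Month'].value_counts().reindex(CALENDAR_ORDER)`.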


In [5]:
# Accidents by Weather
print("Accidents by Weather:")
print(df['Weather_Condition'].value_counts())

Accidents by Weather:
Weather_Condition
Fair                            2851
Mostly Cloudy                   1217
Clear                            998
Cloudy                           953
Partly Cloudy                    842
Overcast                         471
Light Rain                       394
Scattered Clouds                 272
Light Snow                       138
Fog                              111
Rain                             103
Haze                              98
Heavy Rain                        37
Fair / Windy                      33
Mostly Cloudy / Windy             23
Light Drizzle                     23
T-Storm                           21
Wintry Mix                        20
Thunder in the Vicinity           18
Light Rain with Thunder           16
Cloudy / Windy                    16
Thunder                           14
Smoke                             12
Partly Cloudy / Windy             11
Snow                              11
Heavy T-Storm                     1
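
With this many fine-grained labels, grouping them into a few broad categories can make the distribution easier to compare. The sketch below uses simple keyword matching; the group names and keywords in the hypothetical `weather_group` helper are assumptions for illustration, not a scheme defined by the dataset.

```python
from collections import Counter

def weather_group(label) -> str:
    """Map a raw Weather_Condition label to a coarse group (illustrative rules)."""
    if not isinstance(label, str):
        return "Unknown"  # covers missing values such as NaN
    text = label.lower()
    # Order matters: "Light Rain with Thunder" should count as a thunderstorm.
    if "thunder" in text or "t-storm" in text:
        return "Thunderstorm"
    if "snow" in text or "wintry" in text:
        return "Snow/Ice"
    if "rain" in text or "drizzle" in text:
        return "Rain"
    if any(word in text for word in ("fog", "haze", "smoke")):
        return "Low visibility"
    return "Clear/Cloudy"

# A few label counts copied from the output above.
sample_counts = {"Fair": 2851, "Light Rain": 394, "Rain": 103, "Heavy Rain": 37,
                 "Light Snow": 138, "Fog": 111, "T-Storm": 21}

grouped = Counter()
for label, count in sample_counts.items():
    grouped[weather_group(label)] += count
print(grouped)
```

In the notebook, the same function could be applied with `df['Weather_Condition'].map(weather_group).value_counts()`.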

In [6]:
# Accidents by Road Condition (inferred from Weather_Condition and Precipitation(in))
print("Accidents by Road Condition:")
print(df['Road_Condition'].value_counts())

Accidents by Road Condition:
Road_Condition
Dry        6866
Unknown    1036
Wet         910
Name: count, dtype: int64
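
The rule used to infer `Road_Condition` is not shown in this notebook. The sketch below is one plausible rule built on the two inputs named in the cell comment, `Weather_Condition` and `Precipitation(in)`. The keyword list, the threshold, and the `infer_road_condition` helper are all assumptions, so the counts it produces may differ from those above.

```python
import math

# Keywords treated as implying a wet surface (an assumption for illustration).
WET_KEYWORDS = ("rain", "drizzle", "snow", "storm", "wintry")

def infer_road_condition(weather, precipitation_in) -> str:
    """Classify a row as 'Wet', 'Dry', or 'Unknown' from weather and precipitation."""
    has_weather = isinstance(weather, str) and weather.strip() != ""
    has_precip = precipitation_in is not None and not math.isnan(precipitation_in)

    if not has_weather and not has_precip:
        return "Unknown"  # nothing to infer from
    if has_precip and precipitation_in > 0:
        return "Wet"      # any measured precipitation
    if has_weather and any(word in weather.lower() for word in WET_KEYWORDS):
        return "Wet"      # wet-weather label without a measurement
    return "Dry"

examples = [("Fair", 0.0), ("Light Rain", float("nan")),
            (None, float("nan")), ("Cloudy", 0.02)]
results = [infer_road_condition(w, p) for w, p in examples]
print(results)
```

Applied row by row, this could look like `df.apply(lambda r: infer_road_condition(r['Weather_Condition'], r['Precipitation(in)']), axis=1)`.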


In [7]:
# Accidents by Lighting
print("Accidents by Lighting:")
print(df['Light_Condition'].value_counts())

Accidents by Lighting:
Light_Condition
Daylight    6090
Dark        2701
Unknown       21
Name: count, dtype: int64
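
Raw counts are easier to compare as shares. `value_counts(normalize=True)` returns proportions directly; the sketch below rebuilds a stand-in column from the counts above to show the call.

```python
import pandas as pd

# Stand-in column rebuilt from the counts above; the notebook would use
# df['Light_Condition'] directly.
light = pd.Series(["Daylight"] * 6090 + ["Dark"] * 2701 + ["Unknown"] * 21)

# normalize=True returns proportions that sum to 1; convert to percentages.
shares = light.value_counts(normalize=True).mul(100).round(1)
print(shares)
```

The same call works for `Road_Condition` or any other categorical column.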


## Step 3: Preliminary Observations
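- **Time Patterns**: Accidents peak during commuting hours, at hours 7 and 8 (669 and 675) and again at hours 15 to 17 (594, 686, and 646). Weekday counts range from 1,345 (Monday) to 1,609 (Friday), roughly double Saturday (713) and Sunday (638). By month, December is highest (949) and May lowest (614); the five highest months run from September through January.
- **Weather**: Most accidents occur in fair or cloudy conditions: Fair (2,851), Mostly Cloudy (1,217), Clear (998), Cloudy (953), and Partly Cloudy (842). Light Rain (394) is the most common precipitation category. These counts reflect how often each condition occurs, not the relative risk of driving in it.
- **Road Conditions**: Of the inferred road conditions, Dry accounts for 6,866 accidents (77.9%), Unknown for 1,036 (11.8%), and Wet for 910 (10.3%).
- **Lighting**: Daylight accounts for 6,090 accidents (69.1%) and Dark for 2,701 (30.7%); only 21 (0.2%) are Unknown.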