<p align="center">
  <img src="Images/US_Traffic_Accident.jpg" width="900">
</p>



# US Traffic Accidents Analysis and Insights


## Overview

This project analyzes U.S. traffic accident data to identify key patterns related to time, location, weather, and road conditions. The analysis reveals that accidents are more frequent during rush hours, under adverse weather conditions, and in high-density urban areas. These insights highlight opportunities for improving traffic safety through targeted interventions and data-driven decision-making.

## Business Understanding

### Business Context
Traffic accidents are a major public safety issue in the United States, leading to injuries, fatalities, and economic losses. The dataset used in this project provides information on accident severity, location, time, and environmental conditions, making it suitable for analyzing patterns and risk factors associated with traffic accidents.

### Analytical Questions
This project aims to answer the following questions:
1. When do traffic accidents occur most frequently?
2. Where are accidents most concentrated geographically?
3. How do factors such as time, weather, and road conditions relate to accident severity?
4. What patterns can be identified that may help reduce accident risks?

### Business Goals
The goal of this analysis is to generate insights that support improved road safety and data-driven decision-making. Findings from this project can help identify high-risk conditions and inform preventive strategies.

### Stakeholders
1. Transportation and traffic safety agencies.
2. City and state policymakers.
3. Urban planners and engineers.
4. Insurance and risk assessment teams.

## Data Understanding

In this step, the dataset is loaded and examined to understand its structure, key variables, and basic statistics. The data is reviewed to identify important features related to traffic accidents and to assess data quality issues, such as missing values and potential inconsistencies, before further analysis.


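```python
# Load the required library and the dataset
import pandas as pd

df = pd.read_csv("Data/acc_20.csv")

# Preview the first rows, then check dimensions, dtypes, and summary statistics.
# print() is used so each result is shown, not only the last expression.
print(df.head())
print(df.shape)
df.info()
print(df.describe())
```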

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 54745 entries, 0 to 54744
Data columns (total 80 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   CASENUM         54745 non-null  int64  
 1   STRATUM         54745 non-null  int64  
 2   STRATUMNAME     54745 non-null  object 
 3   REGION          54745 non-null  int64  
 4   REGIONNAME      54745 non-null  object 
 5   PSU             54745 non-null  int64  
 6   PJ              54745 non-null  int64  
 7   PSU_VAR         54745 non-null  int64  
 8   URBANICITY      54745 non-null  int64  
 9   URBANICITYNAME  54745 non-null  object 
 10  VE_TOTAL        54745 non-null  int64  
 11  VE_FORMS        54745 non-null  int64  
 12  PVH_INVL        54745 non-null  int64  
 13  PEDS            54745 non-null  int64  
 14  PERMVIT         54745 non-null  int64  
 15  PERNOTMVIT      54745 non-null  int64  
 16  NUM_INJ         54745 non-null  int64  
 17  NUM_INJNAME     54745 non-null 
 ...
```

| | CASENUM | STRATUM | REGION | PSU | PJ | PSU_VAR | URBANICITY | VE_TOTAL | VE_FORMS | PVH_INVL | ... | MANCOL_IM | RELJCT1_IM | RELJCT2_IM | LGTCON_IM | WEATHR_IM | MAXSEV_IM | NO_INJ_IM | ALCHL_IM | PSUSTRAT | WEIGHT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 54745.0 | 54745.0 | 54745.0 | 54745.0 | 54745.0 | 54745.0 | 54745.0 | 54745.0 | 54745.0 | 54745.0 | ... | 54745.0 | 54745.0 | 54745.0 | 54745.0 | 54745.0 | 54745.0 | 54745.0 | 54745.0 | 54745.0 | 54745.0 |
| mean | 202002600000.0 | 7.050726 | 2.73815 | 48.196803 | 2325.352087 | 52.35121 | 1.252498 | 1.789734 | 1.730167 | 0.059567 | ... | 2.587414 | 0.066508 | 2.82497 | 1.647493 | 2.410266 | 1.06638 | 0.815161 | 1.922751 | 12.628021 | 95.914463 |
| std | 215186.6 | 2.596944 | 0.879934 | 19.900415 | 1666.926528 | 32.705451 | 0.434449 | 0.655057 | 0.641039 | 0.296503 | ... | 2.92049 | 0.249171 | 3.541586 | 1.062263 | 3.091939 | 1.180133 | 0.979352 | 0.266989 | 6.204774 | 66.419435 |
| min | 202002100000.0 | 2.0 | 1.0 | 10.0 | 45.0 | 10.0 | 1.0 | 1.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 8.230572 |
| 25% | 202002500000.0 | 5.0 | 2.0 | 32.0 | 573.0 | 32.0 | 1.0 | 1.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 2.0 | 9.0 | 32.376107 |
| 50% | 202002600000.0 | 8.0 | 3.0 | 48.0 | 1800.0 | 48.0 | 1.0 | 2.0 | 2.0 | 0.0 | ... | 1.0 | 0.0 | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 13.0 | 95.296227 |
| 75% | 202002800000.0 | 9.0 | 3.0 | 65.0 | 4142.0 | 67.0 | 2.0 | 2.0 | 2.0 | 0.0 | ... | 6.0 | 0.0 | 3.0 | 2.0 | 2.0 | 2.0 | 1.0 | 2.0 | 16.0 | 162.82931 |
| max | 202003000000.0 | 10.0 | 4.0 | 83.0 | 4153.0 | 214.0 | 2.0 | 15.0 | 15.0 | 6.0 | ... | 11.0 | 1.0 | 20.0 | 7.0 | 12.0 | 8.0 | 14.0 | 2.0 | 25.0 | 483.948635 |
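
Several columns in the listing above come in pairs, a numeric code and a text label with the same name plus a `NAME` suffix (for example `REGION` and `REGIONNAME`). For coded columns like these, the mean and standard deviation are hard to interpret, and a frequency table of the label column is usually more informative. The sketch below shows one way to find such pairs and tabulate them. It runs on a small made-up frame named `toy`, not the real dataset; the label values and the helper `code_label_pairs` are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Toy stand-in for the accident table. The rows and label values are made up
# for illustration and are NOT taken from the real data.
toy = pd.DataFrame({
    "CASENUM": [1, 2, 3, 4],
    "REGION": [1, 3, 3, 4],
    "REGIONNAME": ["Label A", "Label C", "Label C", "Label D"],
    "URBANICITY": [1, 1, 2, 1],
    "URBANICITYNAME": ["Urban", "Urban", "Rural", "Urban"],
})


def code_label_pairs(frame):
    """Return (code, label) column pairs where the label column is code + 'NAME'."""
    cols = set(frame.columns)
    return [(c, c + "NAME") for c in frame.columns if c + "NAME" in cols]


pairs = code_label_pairs(toy)

# Count how often each label occurs, one frequency table per label column.
label_counts = {label: toy[label].value_counts() for _, label in pairs}

print(pairs)
print(label_counts["URBANICITYNAME"])
```

On the real data, the same helper could be called as `code_label_pairs(df)`, assuming the naming pattern holds for the columns not shown in the truncated listing.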


```python
# Assess data quality: count missing values per column, largest first
df.isnull().sum().sort_values(ascending=False)
```

```text
WRK_ZONENAME     53750
CASENUM              0
STRATUMNAME          0
STRATUM              0
REGIONNAME           0
                 ...  
NO_INJ_IMNAME        0
ALCHL_IM             0
ALCHL_IMNAME         0
PSUSTRAT             0
WEIGHT               0
Length: 80, dtype: int64
```

The missing-value analysis shows that most variables have no missing values. One variable, `WRK_ZONENAME`, is missing in 53,750 of 54,745 rows (about 98%), which may be due to incomplete reporting or to the variable not being applicable to all accidents. This column will be considered for exclusion or further investigation during data preparation.
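
As a possible follow-up, the missing counts can be turned into shares, and the "not applicable" explanation can be checked against a companion code column. The sketch below is a minimal illustration on made-up data. It assumes a numeric `WRK_ZONE` column exists alongside `WRK_ZONENAME`, which the output above does not confirm; the helper `missing_summary`, the 0.5 threshold, and the toy values are hypothetical.

```python
import pandas as pd


def missing_summary(frame):
    """Missing-value count and share per column, largest first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({"missing": counts, "share": counts / len(frame)})
    return summary.sort_values("missing", ascending=False)


# Toy stand-in for the accident table. WRK_ZONE is an ASSUMED companion column,
# and all values here are made up for illustration.
toy = pd.DataFrame({
    "WRK_ZONE": [0, 0, 0, 1],
    "WRK_ZONENAME": [None, None, None, "Construction"],
    "WEIGHT": [10.0, 20.0, 30.0, 40.0],
})

summary = missing_summary(toy)

# Flag columns whose missing share exceeds an arbitrary illustrative threshold.
THRESHOLD = 0.5
mostly_missing = summary.index[summary["share"] > THRESHOLD].tolist()

# Share of missing labels within each code value: if labels are missing only
# for one code, the gap likely reflects "not applicable", not lost data.
missing_by_code = toy["WRK_ZONENAME"].isnull().groupby(toy["WRK_ZONE"]).mean()

print(summary)
print(mostly_missing)
print(missing_by_code)
```

If the real data showed the same pattern, recoding the missing labels to an explicit category might be preferable to dropping the column. That decision belongs to the data preparation step.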

## Data Preparation
Text here

## Analysis

Text here

## Evaluation

### Business Insight/Recommendation 1

### Business Insight/Recommendation 2

### Business Insight/Recommendation 3

### Tableau Dashboard link

## Conclusion and Next Steps
Text here