# Traffic Collision Severity (Workshop 1)

**Team Members:**
- Andrew Silveira (Student ID: 5077086)
- Rohit Krishnamurthy Iyer (Student ID: 8993045)
- Sabrina Ronnie George Karippatt (Student ID: 8991911)

**Course:** Applied Artificial Intelligence & Machine Learning  

**Workshop:** Problem Analysis Workshop 1  

**Field of Inquiry:** Public Safety

**Topic:** Which road & environmental features are most associated with severe injuries in collisions?

**Research Question:** Will the data be one-time or ongoing?  
The dataset is historical and covers Arizona accidents from 2012 to 2016. It is a one-time dataset rather than an ongoing feed, but it contains enough records across multiple years to analyze how road, weather, and lighting conditions contribute to accident severity.

**Reasoning:**
- The dataset has a clearly defined time span (2012–2016).
- It is static, not live or streaming.
- It ties directly into our research question about severity factors.
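
One way to confirm the defined time span is to recover the year from each crash ID. In the `accident.csv` preview, `accident_id` looks like the four-digit year followed by the six-digit `ST_CASE` (`2012040001` versus `40001`). The sketch below assumes that pattern holds for every row and runs on a few hypothetical IDs rather than the real file; `crash_years` is an illustrative helper name.

```python
import pandas as pd

def crash_years(accident_ids: pd.Series) -> pd.Series:
    """Extract the crash year, ASSUMING accident_id = year * 1_000_000 + ST_CASE."""
    return accident_ids // 1_000_000

# Hypothetical IDs following the pattern seen in the preview (not real records).
ids = pd.Series([2012040001, 2013040015, 2016040999])
years = crash_years(ids)
st_case = ids % 1_000_000  # under the same assumption this should equal ST_CASE

# A static, one-time dataset should fall inside a fixed, known span.
print(years.min(), years.max())          # 2012 2016
print(years.between(2012, 2016).all())   # True
```

On the real data, `crash_years(accident["accident_id"]).value_counts()` would show how many crashes fall in each year, provided the assumed encoding holds.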

**Essay:** We expect analysis of Arizona crash records (2012–2016) to show that environmental and roadway conditions play a key role in injury severity. Specifically, we hypothesize that collisions in dark, unlit conditions or in adverse weather such as rain or snow are more likely to result in severe injuries than crashes in daylight or clear weather, and that high-speed roads, rural functional classes, and complex intersections also increase the likelihood of severe outcomes. Conversely, well-lit urban roads with moderate traffic should be associated with lower severity. If confirmed, these findings would show how weather, lighting, and roadway design jointly influence public safety in traffic collisions.
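
Comparisons such as "dark versus daylight" reduce to the share of severe crashes within each condition. The sketch below computes that share with a pandas `groupby`; the column names `light_condition` and `severe` and the eight toy rows are hypothetical placeholders, not values from the Arizona data.

```python
import pandas as pd

# Toy crash table: one row per crash, severe = 1 for a severe outcome (hypothetical data).
crashes = pd.DataFrame({
    "light_condition": ["Daylight", "Daylight", "Daylight", "Daylight",
                        "Dark - Not Lighted", "Dark - Not Lighted",
                        "Dark - Not Lighted", "Dark - Not Lighted"],
    "severe": [0, 0, 1, 0, 1, 1, 0, 1],
})

# The mean of a 0/1 column within each group is the proportion of severe crashes.
severe_rate = (
    crashes.groupby("light_condition")["severe"]
    .agg(n_crashes="count", severe_rate="mean")
    .sort_values("severe_rate", ascending=False)
)
print(severe_rate)
```

The same pattern applies to weather or road functional class by swapping the grouping column; keeping `n_crashes` next to the rate makes it easy to spot conditions with too few crashes to compare reliably.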



## Data Source
The dataset was downloaded from the FARS (Fatality Analysis Reporting System) Traffic Accident Data, via Kaggle / the official FARS source.  
It contains accident-level (`accident.csv`), person-level (`person.csv`), and vehicle-level (`vehicle.csv`) information.
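
Because the three files sit at different levels, a useful first check is that the crash key is unique in the accident file and repeats in the person and vehicle files. This sketch assumes all three files share an `accident_id` column (visible in `accident.csv`, not confirmed here for the other two) and uses small toy frames in place of the real files.

```python
import pandas as pd

# Toy stand-ins for accident.csv, person.csv and vehicle.csv (hypothetical rows).
toy_accident = pd.DataFrame({"accident_id": [2012040001, 2012040002]})
toy_person = pd.DataFrame({"accident_id": [2012040001, 2012040002, 2012040002]})
toy_vehicle = pd.DataFrame({"accident_id": [2012040001, 2012040002]})

# Accident level: exactly one row per crash.
print(toy_accident["accident_id"].is_unique)

# Person level: several rows per crash are expected.
persons_per_crash = toy_person.groupby("accident_id").size()
print(persons_per_crash.to_dict())

# Every person and vehicle row should point at a known crash.
print(toy_person["accident_id"].isin(toy_accident["accident_id"]).all(),
      toy_vehicle["accident_id"].isin(toy_accident["accident_id"]).all())
```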

## Data Loading and Initial Exploration

In this step, we load the accident, person, and vehicle datasets into pandas DataFrames.  
We then print the shape of each dataset to confirm that it loaded successfully.  
Finally, we display the first 10 rows of the accident dataset to preview its structure (which also makes the DataFrame available to the **Data Wrangler** extension in VS Code).  


```python
import pandas as pd
print("pandas:", pd.__version__)

# Load datasets with explicit file paths
accident = pd.read_csv("../data/accident.csv")
person   = pd.read_csv("../data/person.csv")
vehicle  = pd.read_csv("../data/vehicle.csv")

# Quick sanity check: print dataset shapes
print(accident.shape, person.shape, vehicle.shape)

# Peek at first few rows (avoid huge printouts)
accident.head(10)
```




```
pandas: 2.2.3
(1860, 80) (10611, 97) (5889, 132)
```




| Unnamed: 0 | index | accident_id | ST_CASE | VE_TOTAL | VE_FORMS | PEDS | PERSONS | COUNTY | county_name | CITY | ... | A_POLPUR | a_polour_lit | A_POSBAC | a_posbac_lit | A_DIST | a_dist_lit | A_DROWSY | a_drowsy_lit | INDIAN_RES | indian_res_lit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2012040001 | 40001 | 1 | 1 | 0 | 1 | 19 | PIMA | 0 | ... | 2 | Other Crash | 1 | Driver With Positive BAC Testing Crash | 2 | Other Crash | 2 | Other Crash | 0 | Not Tribal Lands |
| 1 | 1 | 2012040002 | 40002 | 1 | 1 | 0 | 2 | 25 | YAVAPAI | 90 | ... | 2 | Other Crash | 1 | Driver With Positive BAC Testing Crash | 2 | Other Crash | 2 | Other Crash | 0 | Not Tribal Lands |
| 2 | 2 | 2012040003 | 40003 | 1 | 1 | 1 | 0 | 13 | MARICOPA | 370 | ... | 2 | Other Crash | 2 | All Drivers With ZERO BAC Testing Crash | 2 | Other Crash | 2 | Other Crash | 0 | Not Tribal Lands |
| 3 | 3 | 2012040004 | 40004 | 2 | 2 | 0 | 2 | 13 | MARICOPA | 190 | ... | 2 | Other Crash | 2 | All Drivers With ZERO BAC Testing Crash | 2 | Other Crash | 2 | Other Crash | 0 | Not Tribal Lands |
| 4 | 4 | 2012040005 | 40005 | 2 | 2 | 0 | 4 | 19 | PIMA | 530 | ... | 2 | Other Crash | 3 | Unknown BAC Crash | 2 | Other Crash | 2 | Other Crash | 0 | Not Tribal Lands |
| 5 | 5 | 2012040006 | 40006 | 1 | 1 | 1 | 1 | 13 | MARICOPA | 290 | ... | 2 | Other Crash | 3 | Unknown BAC Crash | 2 | Other Crash | 2 | Other Crash | 0 | Not Tribal Lands |
| 6 | 6 | 2012040010 | 40010 | 1 | 1 | 0 | 1 | 25 | YAVAPAI | 0 | ... | 2 | Other Crash | 2 | All Drivers With ZERO BAC Testing Crash | 2 | Other Crash | 2 | Other Crash | 0 | Not Tribal Lands |
| 7 | 7 | 2012040013 | 40013 | 1 | 1 | 0 | 2 | 13 | MARICOPA | 290 | ... | 2 | Other Crash | 2 | All Drivers With ZERO BAC Testing Crash | 2 | Other Crash | 2 | Other Crash | 0 | Not Tribal Lands |
| 8 | 8 | 2012040014 | 40014 | 1 | 1 | 1 | 1 | 9 | GRAHAM | 0 | ... | 2 | Other Crash | 3 | Unknown BAC Crash | 2 | Other Crash | 2 | Other Crash | 0 | Not Tribal Lands |
| 9 | 9 | 2012040017 | 40017 | 2 | 2 | 0 | 2 | 21 | PINAL | 0 | ... | 2 | Other Crash | 3 | Unknown BAC Crash | 2 | Other Crash | 2 | Other Crash | 0 | Not Tribal Lands |
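
The preview shows that many coded columns come with a text companion ending in `_lit` (for example `A_POSBAC` and `a_posbac_lit`). When standardizing codes, one option is to derive a code-to-label lookup from each pair and confirm that every code maps to a single label. The sketch below uses code/label pairs copied from the preview rows; `code_lookup` is a hypothetical helper, not part of pandas or the dataset.

```python
import pandas as pd

# Code/label pairs copied from preview rows 0, 2, 4, 3 and 5 of accident.csv.
preview = pd.DataFrame({
    "A_POSBAC": [1, 2, 3, 2, 3],
    "a_posbac_lit": [
        "Driver With Positive BAC Testing Crash",
        "All Drivers With ZERO BAC Testing Crash",
        "Unknown BAC Crash",
        "All Drivers With ZERO BAC Testing Crash",
        "Unknown BAC Crash",
    ],
})

def code_lookup(df: pd.DataFrame, code_col: str, label_col: str) -> dict:
    """Map each code to its label, raising if any code carries more than one label."""
    labels_per_code = df.groupby(code_col)[label_col].nunique()
    inconsistent = labels_per_code[labels_per_code > 1]
    if not inconsistent.empty:
        raise ValueError(f"codes with multiple labels: {list(inconsistent.index)}")
    # Keep the first row seen for each code and turn code -> label into a dict.
    return df.drop_duplicates(code_col).set_index(code_col)[label_col].to_dict()

print(code_lookup(preview, "A_POSBAC", "a_posbac_lit"))
```

Once a lookup is confirmed consistent, the analysis can keep either the numeric code or the label column rather than carrying both.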


## Planned Data Cleansing  
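Before analysis, we will need to merge the accident, person, and vehicle datasets on a shared crash key, remove duplicate rows, standardize coded values, and handle missing values.  
We also plan to create a binary target variable (`severe = 1 if FATALS > 0`). Because FARS only records crashes in which at least one person died, every accident record should have `FATALS >= 1`, which would make this target constant; if that holds here, a finer-grained measure such as person-level injury severity (`INJ_SEV` in the FARS person file) will be needed instead.  
These steps are planned but not yet implemented in this notebook.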
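
A minimal sketch of the merge, deduplication, missing-value check, and target steps, run on toy frames. It assumes `accident_id` is the shared crash key in all three files (it is only visible in `accident.csv` here); joining on it rather than `ST_CASE` matters because FARS numbers `ST_CASE` within each year, so the same value can recur across 2012–2016. The helper `build_person_table`, the `person_no` column, and all toy values are hypothetical.

```python
import pandas as pd

def build_person_table(accident: pd.DataFrame, person: pd.DataFrame) -> pd.DataFrame:
    """Attach crash-level fields to each person row (ASSUMES a shared accident_id key)."""
    accident = accident.drop_duplicates()  # remove exact duplicate rows
    person = person.drop_duplicates()
    merged = person.merge(
        accident, on="accident_id", how="left",
        validate="many_to_one",  # raises if a crash key appears twice in accident
        indicator=True,          # adds a _merge column showing match status
    )
    unmatched = int((merged["_merge"] != "both").sum())
    if unmatched:
        raise ValueError(f"{unmatched} person rows have no matching crash")
    return merged.drop(columns="_merge")

# Toy frames (hypothetical values); the last person row is an exact duplicate.
toy_accident = pd.DataFrame({"accident_id": [2012040001, 2012040002], "FATALS": [1, 2]})
toy_person = pd.DataFrame({"accident_id": [2012040001, 2012040002, 2012040002, 2012040002],
                           "person_no": [1, 1, 2, 2]})

merged = build_person_table(toy_accident, toy_person)

# Missing values: isna() only catches true NaNs; FARS often encodes "unknown"
# with 9/99-style codes, which need separate handling.
print(merged.isna().sum().to_dict())

# Planned target; in FARS-style data every crash has FATALS >= 1.
merged["severe"] = (merged["FATALS"] > 0).astype(int)
if merged["severe"].nunique() < 2:
    print("Warning: 'severe' has a single class; consider a person-level severity field")
```

In this toy run the target comes out as all ones, which is exactly the situation the planned FARS target is likely to hit on the real accident file.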
