# Traffic Collision Severity (Workshop 1)

**Team Members:**
- Andrew Silveria (Student ID: )
- Rohit Krishnamurthy Iyer (Student ID: )
- Sabrina Ronnie George Karippatt (Student ID: 8991911)

**Course:** Applied Artificial Intelligence & Machine Learning  
**Workshop:** Problem Analysis Workshop 1  

```python
import sys

import matplotlib
import numpy as np
import pandas as pd
import sklearn

# Report interpreter and library versions for reproducibility
print("Python:", sys.version)
print("pandas:", pd.__version__, "| numpy:", np.__version__, "| matplotlib:", matplotlib.__version__, "| sklearn:", sklearn.__version__)
```

## Data Source
The dataset was downloaded from the FARS (Fatality Analysis Reporting System) Traffic Accident Data (Kaggle/FARS official source).  
It contains accident-level (`accident.csv`), person-level (`person.csv`), and vehicle-level (`vehicle.csv`) information.
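
Because the three tables sit at different levels (one row per crash, per vehicle, and per person), they are normally combined through a shared crash key. The sketch below uses small toy DataFrames, not the real files. It assumes `ST_CASE` is present in all three tables (the source only confirms it for `accident.csv`), and the `VEH_NO` and `PER_NO` columns are hypothetical placeholders for illustration.

```python
import pandas as pd

# Toy stand-ins for the three tables; the real files have many more columns.
# ASSUMPTION: ST_CASE identifies a crash in every table.
# VEH_NO and PER_NO are hypothetical columns used only for this sketch.
accident = pd.DataFrame({"ST_CASE": [40001, 40002], "FATALS": [1, 2]})
vehicle = pd.DataFrame({"ST_CASE": [40001, 40002, 40002], "VEH_NO": [1, 1, 2]})
person = pd.DataFrame({"ST_CASE": [40001, 40002, 40002, 40002], "PER_NO": [1, 1, 2, 1]})

# validate="one_to_many" raises MergeError if ST_CASE repeats in accident,
# which guards against silently duplicating crash rows.
acc_veh = accident.merge(vehicle, on="ST_CASE", how="left", validate="one_to_many")

# Aggregate the person table to crash level before joining,
# so the result keeps one row per crash.
persons_per_crash = (
    person.groupby("ST_CASE").size().rename("n_persons").reset_index()
)
summary = accident.merge(persons_per_crash, on="ST_CASE", how="left")

print(acc_veh)
print(summary)
```

Aggregating the lower-level table first keeps the accident table at one row per crash, which matters when `FATALS` is later used as a crash-level target.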

## Data Loading and Initial Exploration

In this step, we load the accident, person, and vehicle datasets into pandas DataFrames.  
We then print the dataset shapes, the first ten accident column names, and value counts for four selected variables (`FATALS`, `WEATHER`, `LGT_COND`, `ROAD_FNC`).  
Finally, we display the first 50 rows of the accident dataset to preview its structure and to enable **Data Wrangler** in VS Code.  


```python
"""
Data Loading and Initial Exploration
-----------------------------------
This cell:
1. Loads the accident, person, and vehicle datasets into pandas DataFrames.
2. Prints shapes and initial columns for an overview.
3. Summarizes selected variables relevant to the research question:
   - FATALS (target variable)
   - WEATHER
   - LGT_COND (light condition)
   - ROAD_FNC (road function)
4. Displays the first 50 rows of the accident dataset for inspection and Data Wrangler.
"""

import pandas as pd
from pathlib import Path

# Define data directory
DATA_DIR = Path("..") / "data"

# Load datasets
accident = pd.read_csv(DATA_DIR / "accident.csv")
person   = pd.read_csv(DATA_DIR / "person.csv")
vehicle  = pd.read_csv(DATA_DIR / "vehicle.csv")

# Quick overview: dataset shapes and sample columns
print("Shapes -> accident:", accident.shape,
      "| person:", person.shape,
      "| vehicle:", vehicle.shape)

print("\nColumns in accident table:", accident.columns[:10].tolist(), "...")

# Summarize target + key predictors
summary = {
    "Fatalities": accident["FATALS"].value_counts(),
    "Weather": accident["WEATHER"].value_counts().head(),
    "Light Condition": accident["LGT_COND"].value_counts().head(),
    "Road Function": accident["ROAD_FNC"].value_counts().head()
}

for key, val in summary.items():
    print(f"\n--- {key} ---")
    print(val)

# Display sample rows (triggers Data Wrangler in VS Code)
accident.head(50)
```



```text
Shapes -> accident: (1860, 80) | person: (10611, 97) | vehicle: (5889, 132)

Columns in accident table: ['index', 'accident_id', 'ST_CASE', 'VE_TOTAL', 'VE_FORMS', 'PEDS', 'PERSONS', 'COUNTY', 'county_name', 'CITY'] ...

--- Fatalities ---
FATALS
1    1726
2     103
3      20
4       8
5       3
Name: count, dtype: int64

--- Weather ---
WEATHER
1     1305
99     221
10     167
98     120
2       40
Name: count, dtype: int64

--- Light Condition ---
LGT_COND
1    815
2    336
3    335
9    130
6    128
Name: count, dtype: int64

--- Road Function ---
ROAD_FNC
14.0    315
4.0     190
6.0     161
16.0    148
15.0    133
Name: count, dtype: int64
```
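
The `FATALS` counts printed above are heavily skewed toward single-fatality crashes, which matters if `FATALS` is used as a classification target. As a minimal sketch, the snippet below re-enters those printed counts by hand (rather than reading the CSV) and converts them to shares. The variable names are illustrative only.

```python
import pandas as pd

# Counts copied from the FATALS value_counts() output above
fatals_counts = pd.Series({1: 1726, 2: 103, 3: 20, 4: 8, 5: 3}, name="count")
fatals_counts.index.name = "FATALS"

# The counts should add up to the number of rows in the accident table
total = int(fatals_counts.sum())

# Share of crashes at each fatality count, rounded for display
shares = (fatals_counts / total).round(4)

# Share of crashes with more than one fatality
multi_fatality_share = round(1 - fatals_counts.loc[1] / total, 4)

print("Total crashes:", total)
print(shares)
print("Share with 2+ fatalities:", multi_fatality_share)
```

With roughly 93% of crashes in the single-fatality class, plain accuracy would be a weak metric for any later model. Working directly on the loaded data, `accident["FATALS"].value_counts(normalize=True)` gives the same shares.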




|  | index | accident_id | ST_CASE | VE_TOTAL | VE_FORMS | PEDS | PERSONS | COUNTY | county_name | CITY | ... | A_POLPUR | a_polour_lit | A_POSBAC | a_posbac_lit | A_DIST | a_dist_lit | A_DROWSY | a_drowsy_lit | INDIAN_RES | indian_res_lit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2012040001 | 40001 | 1 | 1 | 0 | 1 | 19 | PIMA | 0 | ... | 2 | Other Crash | 1 | Driver With Positive BAC Testing Crash | 2 | Other Crash | 2 | Other Crash | 0 | Not Tribal Lands |
| 1 | 1 | 2012040002 | 40002 | 1 | 1 | 0 | 2 | 25 | YAVAPAI | 90 | ... | 2 | Other Crash | 1 | Driver With Positive BAC Testing Crash | 2 | Other Crash | 2 | Other Crash | 0 | Not Tribal Lands |
| 2 | 2 | 2012040003 | 40003 | 1 | 1 | 1 | 0 | 13 | MARICOPA | 370 | ... | 2 | Other Crash | 2 | All Drivers With ZERO BAC Testing Crash | 2 | Other Crash | 2 | Other Crash | 0 | Not Tribal Lands |
| 3 | 3 | 2012040004 | 40004 | 2 | 2 | 0 | 2 | 13 | MARICOPA | 190 | ... | 2 | Other Crash | 2 | All Drivers With ZERO BAC Testing Crash | 2 | Other Crash | 2 | Other Crash | 0 | Not Tribal Lands |
| 4 | 4 | 2012040005 | 40005 | 2 | 2 | 0 | 4 | 19 | PIMA | 530 | ... | 2 | Other Crash | 3 | Unknown BAC Crash | 2 | Other Crash | 2 | Other Crash | 0 | Not Tribal Lands |
| 5 | 5 | 2012040006 | 40006 | 1 | 1 | 1 | 1 | 13 | MARICOPA | 290 | ... | 2 | Other Crash | 3 | Unknown BAC Crash | 2 | Other Crash | 2 | Other Crash | 0 | Not Tribal Lands |
| 6 | 6 | 2012040010 | 40010 | 1 | 1 | 0 | 1 | 25 | YAVAPAI | 0 | ... | 2 | Other Crash | 2 | All Drivers With ZERO BAC Testing Crash | 2 | Other Crash | 2 | Other Crash | 0 | Not Tribal Lands |
| 7 | 7 | 2012040013 | 40013 | 1 | 1 | 0 | 2 | 13 | MARICOPA | 290 | ... | 2 | Other Crash | 2 | All Drivers With ZERO BAC Testing Crash | 2 | Other Crash | 2 | Other Crash | 0 | Not Tribal Lands |
| 8 | 8 | 2012040014 | 40014 | 1 | 1 | 1 | 1 | 9 | GRAHAM | 0 | ... | 2 | Other Crash | 3 | Unknown BAC Crash | 2 | Other Crash | 2 | Other Crash | 0 | Not Tribal Lands |
| 9 | 9 | 2012040017 | 40017 | 2 | 2 | 0 | 2 | 21 | PINAL | 0 | ... | 2 | Other Crash | 3 | Unknown BAC Crash | 2 | Other Crash | 2 | Other Crash | 0 | Not Tribal Lands |
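
The preview rows show that several coded columns come paired with a `_lit` column holding the text label (for example `A_POSBAC` and `a_posbac_lit`). A code-to-label lookup can be recovered from such a pair. The sketch below re-enters the ten preview values by hand, and `code_lookup` is a hypothetical helper written for this illustration, not part of pandas or the dataset.

```python
import pandas as pd

# Values copied from the A_POSBAC / a_posbac_lit columns of the ten preview rows
preview = pd.DataFrame({
    "A_POSBAC": [1, 1, 2, 2, 3, 3, 2, 2, 3, 3],
    "a_posbac_lit": [
        "Driver With Positive BAC Testing Crash",
        "Driver With Positive BAC Testing Crash",
        "All Drivers With ZERO BAC Testing Crash",
        "All Drivers With ZERO BAC Testing Crash",
        "Unknown BAC Crash",
        "Unknown BAC Crash",
        "All Drivers With ZERO BAC Testing Crash",
        "All Drivers With ZERO BAC Testing Crash",
        "Unknown BAC Crash",
        "Unknown BAC Crash",
    ],
})


def code_lookup(df: pd.DataFrame, code_col: str, label_col: str) -> dict:
    """Build a {code: label} dict from a paired code/label column (hypothetical helper)."""
    pairs = df[[code_col, label_col]].drop_duplicates()
    # Each code should map to exactly one label; fail loudly otherwise
    if pairs[code_col].duplicated().any():
        raise ValueError(f"{code_col} maps to more than one label")
    return pairs.set_index(code_col)[label_col].sort_index().to_dict()


lookup = code_lookup(preview, "A_POSBAC", "a_posbac_lit")
for code, label in lookup.items():
    print(code, "->", label)
```

On the full table the same call would cover every code that actually occurs in the data. Columns such as `WEATHER` and `LGT_COND` may not have a label column, in which case their codes need to be checked against the FARS documentation.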
