# OKI Traffic Safety Data Pipeline (Python-first)

## Objective
Build a spatial dataset from Ohio Department of Transportation (ODOT) traffic count station data and public boundary layers, structured so it can be regenerated when the source data are updated.
Outputs include:
- Clean station point layer with county assignment (and optional nearest-road info)
- County-level summary table (station count and, where available, AADT statistics; AADT = annual average daily traffic)
- QA report documenting missingness, coordinate validity, and join success
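The first two outputs reduce to a point-in-polygon join followed by a group-by, plus an optional nearest-line join. Below is a minimal GeoPandas sketch on toy data. The column names (`station_id`, `aadt`, `county`, `road_name`), the square "counties", the straight "roads", and the choice of EPSG:3857 are all hypothetical. A projected CRS is used only so `sjoin_nearest` distances come out in linear units rather than degrees. Real ODOT and boundary layers would need their own field names and a shared CRS via `to_crs`.

```python
import geopandas as gpd
from shapely.geometry import LineString, box

# Toy layers in a projected CRS so nearest-road distances are in linear
# units rather than degrees. All names and geometries here are hypothetical.
CRS = "EPSG:3857"

counties = gpd.GeoDataFrame(
    {"county": ["Alpha", "Beta"]},
    geometry=[box(0, 0, 1000, 1000), box(1000, 0, 2000, 1000)],
    crs=CRS,
)

# S3 lies outside both counties to show how unmatched stations are kept.
stations = gpd.GeoDataFrame(
    {"station_id": ["S1", "S2", "S3"], "aadt": [12000, 8000, None]},
    geometry=gpd.points_from_xy([500, 1500, 5000], [500, 500, 5000]),
    crs=CRS,
)

roads = gpd.GeoDataFrame(
    {"road_name": ["Road X", "Road Y"]},
    geometry=[
        LineString([(0, 600), (2000, 600)]),
        LineString([(4000, 0), (4000, 6000)]),
    ],
    crs=CRS,
)

# County assignment: a left join keeps every station; no match -> county is NaN.
# Drop the join's index column so a second spatial join can run afterwards.
assigned = gpd.sjoin(stations, counties, how="left", predicate="within").drop(
    columns="index_right"
)

# County summary: groupby skips NaN keys, so unmatched stations are excluded.
summary = assigned.groupby("county").agg(
    station_count=("station_id", "count"),
    mean_aadt=("aadt", "mean"),
)

# Join success rate for the QA report: share of stations that received a county.
join_rate = assigned["county"].notna().mean()

# Optional nearest-road info, with the distance in CRS units.
with_road = gpd.sjoin_nearest(assigned, roads, how="left", distance_col="road_dist")

print(summary)
print(f"join success: {join_rate:.0%}")
print(with_road[["station_id", "county", "road_name", "road_dist"]])
```

Because the county join is a left join, the share of non-null `county` values doubles as the join-success figure for the QA report. On real data, `sjoin_nearest` returns every equidistant match, so deduplicating on the station ID may be needed.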

## Why this matters
Transportation planning and safety analysis require reliable spatial datasets. This notebook demonstrates:
- Data ingestion (direct downloads / provided datasets)
- Cleaning and validation (QA)
- Spatial joins and aggregation
- Reproducible outputs suitable for future updates
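For the cleaning and validation step, a tabular QA summary can be computed with plain pandas before any spatial work. The sketch below is illustrative only. The function name `qa_report`, the column names (`station_id`, `lat`, `lon`), and the rule that treats (0, 0) as a placeholder coordinate are assumptions, not fields or rules taken from the ODOT data.

```python
import pandas as pd


def qa_report(df, id_col="station_id", lat_col="lat", lon_col="lon"):
    """Summarize missingness, coordinate validity, and duplicate IDs."""
    # Coerce to numbers; non-numeric text becomes NaN and fails the range check.
    lat = pd.to_numeric(df[lat_col], errors="coerce")
    lon = pd.to_numeric(df[lon_col], errors="coerce")
    in_range = lat.between(-90, 90) & lon.between(-180, 180)
    # Assumption: (0, 0) is a placeholder value, not a real station location.
    placeholder = (lat == 0) & (lon == 0)
    valid = in_range & ~placeholder
    return {
        "n_rows": len(df),
        "missing_by_column": df.isna().sum().to_dict(),
        "invalid_coords": int((~valid).sum()),
        "duplicate_ids": int(df[id_col].duplicated().sum()),
    }


# Made-up rows that exercise each check.
sample = pd.DataFrame(
    {
        "station_id": ["S1", "S2", "S2", "S4"],
        "lat": [39.1, None, 95.0, 0.0],
        "lon": [-84.5, -84.4, -84.3, 0.0],
        "aadt": [15000, None, 5000, 200],
    }
)
report = qa_report(sample)
print(report)
```

Returning a plain dict keeps the report easy to write out as JSON or a one-row table alongside the processed layers.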


In [1]:
```python
# %pip installs into the environment of the running kernel (safer than !pip)
%pip install -q geopandas pyogrio shapely pandas matplotlib
```

In [2]:
```python
from pathlib import Path
import re
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

# -----------------------
# Project root detection
# -----------------------
ROOT = Path.cwd()

if not (ROOT / "data").exists() and (ROOT / "oki-traffic-safety-arcgis" / "data").exists():
    ROOT = ROOT / "oki-traffic-safety-arcgis"

RAW_DIR = ROOT / "data" / "raw"
PROCESSED_DIR = ROOT / "data" / "processed"
MAPS_DIR = ROOT / "maps"

PROCESSED_DIR.mkdir(parents=True, exist_ok=True)
MAPS_DIR.mkdir(parents=True, exist_ok=True)

print("ROOT:", ROOT)
print("RAW_DIR exists:", RAW_DIR.exists())
print("Files in RAW_DIR:", len(list(RAW_DIR.glob("*"))))
```


```text
ROOT: C:\Users\attafuro\Desktop\oki-traffic-safety-arcgis
RAW_DIR exists: True
Files in RAW_DIR: 99
```
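The count above includes every top-level entry (files and subfolders), so it does not say how many datasets are present. If the raw folder holds shapefiles, each dataset is several files (`.shp`, `.shx`, `.dbf`, and usually `.prj`), so the number of layers may be much smaller than the entry count. A small stdlib sketch that breaks the folder down by extension; the helper names `inventory_by_suffix` and `shapefile_datasets` are hypothetical.

```python
from collections import Counter
from pathlib import Path


def inventory_by_suffix(folder: Path) -> Counter:
    """Count top-level files in `folder` by lowercase extension (subfolders skipped)."""
    return Counter(
        p.suffix.lower() or "<no extension>"
        for p in folder.iterdir()
        if p.is_file()
    )


def shapefile_datasets(folder: Path) -> list[str]:
    """One name per .shp file; sidecar files (.shx, .dbf, .prj) are not counted."""
    return sorted(
        p.stem for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".shp"
    )
```

In the notebook these could be called as `inventory_by_suffix(RAW_DIR).most_common()` and `shapefile_datasets(RAW_DIR)`.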
