# NYC Motor Vehicle Collisions — Time of Day Analysis (Preliminary Project Update)

**Authors:**
Adan Valadez
Liam O'Herlihy
Tasneem Khokha

**Goal:** Investigate whether *time of day* correlates with crash frequency and severity in NYC.
This analysis-focused notebook includes:
- dataset acquisition (NYC Open Data)
- clear problem statement and target measures
- feature selection (temporal & contextual features)
- preprocessing and cleaning steps
- exploratory data analysis and visualizations focused on time-of-day patterns
- preliminary aggregation and statistical checks 

In [2]:
# Cell 1 — Setup: imports and download parameters
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display

sns.set(style='whitegrid')

SAMPLE_LIMIT = 50000  # rows to request from the API; set to None to request the full dataset
DATA_URL = "https://data.cityofnewyork.us/resource/h9gi-nx95.csv"

print('Notebook ready. SAMPLE_LIMIT =', SAMPLE_LIMIT)

Notebook ready. SAMPLE_LIMIT = 50000


## Problem statement

We will analyze whether **time of day** correlates with the **frequency** and **severity** of motor vehicle collisions in NYC.  
Specifically:

- Primary question (descriptive / inferential): *How does crash frequency vary by hour of the day and day of the week?*  
- Secondary question (severity): *Do crashes occurring at certain hours have higher injury or fatality rates?*  

**Target measures used in this analysis:**

- Crash **count** aggregated by hour / weekday (main descriptive target)
- Injury rate (mean number of persons injured per crash) by hour
- Fatality rate (proportion of crashes with >=1 person killed) by hour
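Written out for the set of crashes $C_h$ that occur in hour $h$, with $n_h = |C_h|$, the three target measures are shown below. The same definitions apply per weekday-hour cell by replacing $C_h$ with $C_{d,h}$. The injury rate is a mean count, so it can exceed 1. The fatality rate is a proportion in $[0, 1]$.

```latex
\begin{aligned}
\text{count}_h &= n_h = |C_h| \\
\text{injury rate}_h &= \frac{1}{n_h} \sum_{i \in C_h} \text{injured}_i \\
\text{fatality rate}_h &= \frac{1}{n_h} \sum_{i \in C_h} \mathbf{1}\left[\text{killed}_i \ge 1\right]
\end{aligned}
```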

In [4]:
# Cell 2 — Download / Load the data 
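The sketch below fills this cell with a minimal loader. It assumes the Socrata (SODA) CSV endpoint from Cell 1, which returns only 1,000 rows unless a `$limit` parameter is given. It also assumes the count columns `number_of_persons_injured` and `number_of_persons_killed`. The helper names `build_query_url` and `load_collisions` are hypothetical. The rows returned are not guaranteed to be a random sample of all crashes.

```python
from typing import Optional

import pandas as pd

# Same endpoint as DATA_URL in Cell 1 (NYC Open Data resource h9gi-nx95).
DATA_URL = "https://data.cityofnewyork.us/resource/h9gi-nx95.csv"

# Count columns this sketch assumes the CSV export provides (SODA field names).
NUMERIC_COLS = ["number_of_persons_injured", "number_of_persons_killed"]


def build_query_url(base_url: str, limit: Optional[int]) -> str:
    """Append a SODA `$limit` parameter (the API returns 1,000 rows without one)."""
    if limit is None:
        # Assumption: a very large limit stands in for "all rows". Paging with
        # `$offset` is the more robust option for the full multi-million-row dataset.
        limit = 10_000_000
    return f"{base_url}?$limit={int(limit)}"


def load_collisions(base_url: str = DATA_URL, limit: Optional[int] = 50_000) -> pd.DataFrame:
    """Download the collisions CSV and coerce the count columns to numbers."""
    df = pd.read_csv(build_query_url(base_url, limit), low_memory=False)
    for col in NUMERIC_COLS:
        if col in df.columns:
            # Unparseable entries become NaN instead of raising.
            df[col] = pd.to_numeric(df[col], errors="coerce")
    return df
```

In the notebook this would be called as `df = load_collisions(DATA_URL, SAMPLE_LIMIT)`, followed by `display(df.head())` to inspect the columns that were actually returned.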


In [6]:
# Cell 3 - Preprocessing & feature engineering (temporal focus)
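One possible version of the temporal feature engineering step derives hour, weekday, a weekend flag, and a coarse time-of-day bin from the raw date and time columns. This sketch assumes that the API serves `crash_date` as an ISO timestamp (for example `"2021-09-11T00:00:00.000"`) and `crash_time` as `"H:MM"` (for example `"2:39"`). The four bin edges are a hypothetical choice, not a standard.

```python
import pandas as pd

WEEKDAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def add_temporal_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add `hour`, `weekday`, `is_weekend` and `time_of_day`; drop rows that fail to parse."""
    out = df.copy()
    out["crash_date"] = pd.to_datetime(out["crash_date"], errors="coerce")
    # The hour is the part of "H:MM" before the colon; bad values become NaN.
    out["hour"] = pd.to_numeric(
        out["crash_time"].astype(str).str.split(":").str[0], errors="coerce"
    )
    out = out.dropna(subset=["crash_date", "hour"])
    out["hour"] = out["hour"].astype(int)

    # Ordered categorical so groupbys and plots list Monday..Sunday.
    out["weekday"] = pd.Categorical(
        out["crash_date"].dt.day_name(), categories=WEEKDAY_ORDER, ordered=True
    )
    out["is_weekend"] = out["crash_date"].dt.dayofweek >= 5  # Saturday=5, Sunday=6

    # Hypothetical bins: night [0,6), morning [6,12), afternoon [12,18), evening [18,24).
    out["time_of_day"] = pd.cut(
        out["hour"], bins=[0, 6, 12, 18, 24], right=False,
        labels=["night", "morning", "afternoon", "evening"],
    )
    return out
```

Rows with an unparseable date or time are dropped rather than imputed. Reporting how many rows this removes (`len(df) - len(out)`) keeps the cleaning step transparent.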

In [7]:
# Cell 4 - Exploratory plots: crash frequency by hour and day of week
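For the primary question, a minimal plotting sketch builds a 7 x 24 weekday-by-hour count table. It then shows the hourly totals as bars next to the full grid as a heatmap. The sketch assumes that Cell 3 has added an integer `hour` column and a weekday-name `weekday` column. The function names are hypothetical. The heatmap uses plain matplotlib `imshow`, although `seaborn.heatmap` would work equally well.

```python
import matplotlib.pyplot as plt
import pandas as pd

WEEKDAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def weekday_hour_counts(df: pd.DataFrame) -> pd.DataFrame:
    """7 x 24 table of crash counts (rows: weekday, columns: hour 0-23)."""
    counts = pd.crosstab(df["weekday"].astype(str), df["hour"].astype(int))
    # Fill weekday/hour cells with no crashes so the grid is always complete.
    return counts.reindex(index=WEEKDAY_ORDER, columns=list(range(24)), fill_value=0)


def plot_time_of_day(df: pd.DataFrame):
    """Bar chart of crashes per hour next to a weekday-by-hour heatmap."""
    counts = weekday_hour_counts(df)
    fig, (ax_bar, ax_heat) = plt.subplots(1, 2, figsize=(16, 5))

    # Left panel: total crashes in each hour, summed over all weekdays.
    counts.sum(axis=0).plot.bar(ax=ax_bar, color="steelblue")
    ax_bar.set(xlabel="Hour of day", ylabel="Crashes", title="Crashes by hour")

    # Right panel: one cell per weekday/hour combination.
    im = ax_heat.imshow(counts.to_numpy(), aspect="auto", cmap="viridis")
    ax_heat.set_xticks(range(24))
    ax_heat.set_yticks(range(len(WEEKDAY_ORDER)))
    ax_heat.set_yticklabels(WEEKDAY_ORDER)
    ax_heat.set(xlabel="Hour of day", title="Crashes by weekday and hour")
    fig.colorbar(im, ax=ax_heat, label="Crashes")

    fig.tight_layout()
    return fig
```

Because these are raw counts, the peaks partly reflect how much traffic is on the road at each hour. The counts describe frequency only and say nothing on their own about per-trip risk.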

In [None]:
# Cell 5 — Injury and fatality rates by hour
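The severity measures from the problem statement can be computed with one `groupby` per hour. The sketch assumes an integer `hour` column and treats missing injury or death counts as missing values, not as zero. That treatment is a judgment call. It also attaches a normal-approximation standard error to each hourly fatality proportion, because deaths are rare and the hourly rates can be noisy. The function name `rates_by_hour` is hypothetical.

```python
import numpy as np
import pandas as pd


def rates_by_hour(df: pd.DataFrame) -> pd.DataFrame:
    """Crash count, injury rate and fatality rate for each hour of the day."""
    injured = pd.to_numeric(df["number_of_persons_injured"], errors="coerce")
    killed = pd.to_numeric(df["number_of_persons_killed"], errors="coerce")
    # 1.0 if at least one person was killed, 0.0 if none, NaN if the value is missing.
    any_killed = killed.ge(1).astype(float).where(killed.notna())

    work = pd.DataFrame({"hour": df["hour"], "injured": injured, "any_killed": any_killed})
    out = work.groupby("hour").agg(
        crashes=("injured", "size"),           # all rows, including missing counts
        injury_rate=("injured", "mean"),       # mean persons injured per crash (NaN skipped)
        fatality_rate=("any_killed", "mean"),  # share of crashes with >= 1 death
        fatality_n=("any_killed", "count"),    # crashes with a non-missing killed value
    )
    # Normal-approximation standard error of each hourly fatality proportion.
    p = out["fatality_rate"]
    out["fatality_se"] = np.sqrt(p * (1 - p) / out["fatality_n"])
    return out
```

An hour whose fatality rate is 0 also gets a standard error of 0. With a 50,000-row sample that usually signals too few crashes in that hour, not a truly zero risk.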

In [None]:
# Cell 6 — Simple statistical check: are late-night hours riskier?
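One simple way to frame "riskier" is to ask whether the share of crashes with at least one death is higher in late-night hours than in the rest of the day. The sketch below uses a pooled two-proportion z-test, which is a swapped-in choice and not necessarily the authors' planned method. It relies only on `math.erfc` for the two-sided p-value. The late-night window of hours 0 to 4 and both function names are hypothetical. Fatal crashes are rare, so when any expected cell count falls below about 5, an exact test such as `scipy.stats.fisher_exact` on the 2 x 2 table is the safer alternative.

```python
import math

import pandas as pd

LATE_NIGHT_HOURS = {0, 1, 2, 3, 4}  # hypothetical definition of "late night"


def two_proportion_ztest(k1: int, n1: int, k2: int, n2: int):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one crash")
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both groups have identical all-0 or all-1 outcomes: no evidence of a difference.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2))


def late_night_fatality_test(df: pd.DataFrame, late_hours=LATE_NIGHT_HOURS) -> dict:
    """Compare the fatal-crash share in `late_hours` against all other hours."""
    killed = pd.to_numeric(df["number_of_persons_killed"], errors="coerce")
    mask = killed.notna()                   # ignore rows with a missing death count
    fatal = killed[mask] >= 1
    late = df.loc[mask, "hour"].isin(late_hours)
    k1, n1 = int(fatal[late].sum()), int(late.sum())
    k2, n2 = int(fatal[~late].sum()), int((~late).sum())
    z, p_value = two_proportion_ztest(k1, n1, k2, n2)
    return {"late_rate": k1 / n1, "other_rate": k2 / n2, "z": z, "p_value": p_value}
```

The same comparison could be repeated with other windows, for example 22:00 to 04:00. Each additional split is another test, so p-values from several windows should be read with a multiple-comparison correction in mind.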