<a href="https://colab.research.google.com/github/brycenmillette/ML_Final/blob/main/ML_Final.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

We will train a model to estimate how likely a car accident is under given driving conditions, and therefore when it is safer to drive. The model will learn from correlations between crash records and conditions such as weather, time of day, and trip length. We will use the Chicago car crash dataset covering 2013-2023: https://www.kaggle.com/datasets/nathaniellybrand/chicago-car-crash-dataset/data



# Section 1: Dataset and Problem
## 1.1 Learning Problem Description
- Problem Statement: Predicting accident likelihood based on environmental and temporal factors
- Type: Binary classification (High Risk/Low Risk) or Regression (Risk Score)
- Input Features: Weather conditions, time variables, location metadata, trip characteristics
- Target: Accident probability/risk score

## 1.2 Data Loading & Initial Exploration
- Import necessary libraries (pandas, numpy, matplotlib, seaborn)
- Load Chicago Car Crash Dataset (2013-2023)
- Display basic dataset info: shape, columns, data types
- Check for missing values and duplicates
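
The loading and inspection steps above could be sketched as follows. This is a minimal sketch, not the project's actual loader: the column names (`CRASH_DATE`, `WEATHER_CONDITION`, `INJURIES_TOTAL`) and the inline sample rows are assumptions for illustration, and the helpers `load_crashes` and `summarize` are hypothetical. With the real data, the path to the downloaded Kaggle CSV would be passed instead of the in-memory sample.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the Kaggle CSV; column names are assumed.
SAMPLE_CSV = """CRASH_DATE,WEATHER_CONDITION,INJURIES_TOTAL
2019-01-05 08:15:00,SNOW,1
2019-01-05 08:15:00,SNOW,1
2020-07-12 17:40:00,CLEAR,0
2021-03-02 23:05:00,,2
"""


def load_crashes(source):
    """Read crash records from a path or file-like object and parse the timestamp column."""
    return pd.read_csv(source, parse_dates=["CRASH_DATE"])


def summarize(df):
    """Collect shape, data types, missing-value counts, and the duplicate-row count."""
    return {
        "shape": df.shape,
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isna().sum().to_dict(),
        "duplicates": int(df.duplicated().sum()),
    }


df = load_crashes(io.StringIO(SAMPLE_CSV))
summary = summarize(df)
print(summary)
```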

## 1.3 Data Cleaning & Feature Engineering
- Handle missing values (imputation or removal)
- Feature extraction from timestamps: hour, day of week, month, season
- Weather condition encoding (if available)
- Location feature engineering (neighborhood, street type)
- Create target variable (e.g., severity-based risk score)
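
One possible way to derive the time features and a binary target is sketched below. The column names `CRASH_DATE` and `INJURIES_TOTAL` are assumptions, and the labelling rule (any reported injury counts as high risk, missing counts treated as zero) is only one illustrative choice for a severity-based target.

```python
import pandas as pd


def season_of(month):
    """Map a month number (1-12) to a meteorological season name."""
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "fall"


def add_time_features(df, ts_col="CRASH_DATE"):
    """Return a copy with hour, day-of-week, month, and season columns added."""
    out = df.copy()
    ts = pd.to_datetime(out[ts_col])
    out["hour"] = ts.dt.hour
    out["day_of_week"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6
    out["month"] = ts.dt.month
    out["season"] = out["month"].map(season_of)
    return out


def add_target(df, injuries_col="INJURIES_TOTAL"):
    """Return a copy with a binary high_risk column (assumed rule: any injury = 1)."""
    out = df.copy()
    # Assumption: a missing injury count is treated as zero injuries.
    out["high_risk"] = (out[injuries_col].fillna(0) > 0).astype(int)
    return out


# Hypothetical rows for illustration only.
crashes = pd.DataFrame({
    "CRASH_DATE": ["2019-01-05 08:15:00", "2020-07-12 17:40:00"],
    "INJURIES_TOTAL": [1, 0],
})
features = add_target(add_time_features(crashes))
print(features)
```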

## 1.4 Exploratory Data Analysis (EDA)
- Visualize accident distribution by time variables
- Analyze weather impact on accident frequency
- Location hotspots mapping
- Correlation analysis between features
- Class imbalance check (if classification)
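
For the class imbalance check, a small helper like the one below may be enough. It is a sketch using only the standard library; the function name `class_balance` and the 90/10 example labels are made up for illustration.

```python
from collections import Counter


def class_balance(labels):
    """Return each class's share of the data and the majority/minority count ratio."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    total = sum(counts.values())
    shares = {cls: n / total for cls, n in counts.items()}
    ratio = max(counts.values()) / min(counts.values())
    return shares, ratio


# Hypothetical labels: 90 low-risk (0) and 10 high-risk (1) records.
labels = [0] * 90 + [1] * 10
shares, ratio = class_balance(labels)
print(shares, ratio)
```

A large ratio suggests that accuracy alone will be misleading and that the imbalance handling listed in Section 2.1 is worth considering.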


# Section 2: Model Development & Training
## 2.1 Data Preprocessing for ML
- Train/validation/test split (70/15/15 recommended)
- Feature scaling/normalization (StandardScaler, MinMaxScaler)
- Categorical variable encoding (OneHot, Label encoding)
- Handling class imbalance (SMOTE, class weights)
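
The 70/15/15 split and the scaling step could be sketched with scikit-learn as below. The synthetic feature matrix, the helper name `split_70_15_15`, and the fixed seed are assumptions for illustration. Stratifying on the label keeps the class ratio similar across the three parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def split_70_15_15(X, y, seed=42):
    """Split into 70% train, 15% validation, 15% test, stratified on the label."""
    # First hold out 30% of the data, then cut that held-out part in half.
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed
    )
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=seed
    )
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)


# Synthetic stand-in data: 200 rows, 4 numeric features, 20% positive class.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = np.array([0] * 160 + [1] * 40)

train, val, test = split_70_15_15(X, y)

# Fit the scaler on the training part only, then reuse it for validation and test.
scaler = StandardScaler().fit(train[0])
X_train_scaled = scaler.transform(train[0])
X_val_scaled = scaler.transform(val[0])
X_test_scaled = scaler.transform(test[0])
print(len(train[0]), len(val[0]), len(test[0]))
```

Fitting the scaler on the training part alone avoids leaking validation and test statistics into training. The same reasoning applies to resampling such as SMOTE, which would normally be applied to the training part only.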

## 2.2 Baseline Model Implementation
- Implement traditional ML models as baselines:
  * Logistic Regression (for classification)
  * Random Forest
  * Gradient Boosting (XGBoost/LightGBM)
- Establish performance metrics: Accuracy, Precision, Recall, F1, ROC-AUC
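
To make the threshold-based metrics concrete, the sketch below computes accuracy, precision, recall, and F1 from confusion counts in plain Python. The function names and the example predictions are hypothetical. ROC-AUC is left out because it needs predicted scores rather than hard labels.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, and true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = high risk)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels and predictions for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```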

## 2.3 Neural Network Architecture Design
- Design 2-3 neural network architectures:
  * Simple Feedforward Network (baseline NN)
  * More complex architecture with dropout/batch normalization
  * Optional: Hybrid model (CNN for spatial features if using location grids)
- Model size considerations (keep under 100K parameters)
- Activation functions, loss functions, optimizers
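
The 100K-parameter budget can be checked before building any network. The sketch below counts weights and biases for a fully connected network. The layer sizes are a hypothetical example, not a chosen architecture.

```python
def count_dense_params(layer_sizes):
    """Count weights plus biases for a fully connected network with these layer widths."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Hypothetical architecture: 40 input features, hidden layers of 128 and 64, 1 output.
sizes = [40, 128, 64, 1]
total = count_dense_params(sizes)
print(total, total < 100_000)
```

Dropout adds no parameters. Batch normalization adds two trainable parameters per normalized feature, so those would need to be added to the count if such layers are used.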

## 2.4 Training Pipeline
- Training loops with validation monitoring
- Early stopping implementation
- Learning rate scheduling
- Cross-validation setup (time-series aware if needed)
- Hyperparameter tuning (grid/random search)
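
Early stopping with validation monitoring can be expressed independently of any deep learning framework. The sketch below is one common patience-based variant. The class name, the default values, and the validation losses are made up for illustration.

```python
class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs in a row."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical per-epoch validation losses standing in for a real training loop.
val_losses = [0.70, 0.62, 0.60, 0.61, 0.63, 0.60, 0.64]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)
```

In a real loop, the model weights from the best epoch would typically be saved whenever `best` improves and restored after stopping.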




# Section 3: Model Evaluation & Deployment
## 3.1 Model Evaluation & Comparison
## 3.2 Deployment Implementation
## 3.3 Demo Application
## 3.4 Presentation Materials