# 🧪 Notebooks Directory – Pipe Break Prediction Project

This folder contains all Jupyter notebooks related to the development of the water main break prediction pipeline. Each notebook focuses on a specific phase of the project, from raw data ingestion to final model visualization.

## 📁 Notebook Index

| Notebook | Description |
|----------|-------------|
| `00_data_ingestion.ipynb` | Load and inspect the raw water main break datasets (2004–2019, 2021, etc.) from the Syracuse Open Data Portal. |
| `01_data_preprocessing.ipynb` | Standardize schema, merge datasets, and perform initial cleanup (deduplication, format alignment). |
| `02_data_cleaning.ipynb` | Handle missing data, convert datatypes, extract temporal features, and generate engineered variables (e.g., pipe age). |
| `03_exploratory_analysis.ipynb` | Analyze break trends, distributions, and spatial/temporal correlations. Generate plots and geospatial maps. |
| `04_model_training.ipynb` | Train baseline classification models (e.g., Random Forest). Evaluate with accuracy, precision, recall, etc. |
| `05_model_tuning.ipynb` | Optimize models using hyperparameter search (GridSearchCV, RandomizedSearchCV). |
| `06_results_visualization.ipynb` | Visualize model outputs: confusion matrix, feature importance, predicted break probabilities. |
| `07_dashboard_prep.ipynb` | Prepare final cleaned datasets and export model files for Flask dashboard integration. |

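As an illustration of the kind of work `02_data_cleaning.ipynb` describes (type conversion, temporal features, pipe age), here is a minimal sketch in plain Python. The field names `break_date` and `install_year`, the December to February definition of winter, and the choice to leave a missing install year as `None` are all assumptions made for the example; the notebook itself may use different columns and may well do this with pandas.

```python
from datetime import date
from typing import Optional


def engineer_features(record: dict) -> dict:
    """Return a copy of `record` with typed and derived fields added.

    `break_date` and `install_year` are hypothetical column names.
    """
    out = dict(record)

    # Convert the ISO date string (YYYY-MM-DD) into a date object.
    break_date = date.fromisoformat(record["break_date"])
    out["break_date"] = break_date

    # Temporal features. Treating Dec-Feb as winter is an assumption.
    out["break_year"] = break_date.year
    out["break_month"] = break_date.month
    out["is_winter"] = break_date.month in (12, 1, 2)

    # A missing or blank install year stays missing instead of becoming 0.
    raw_year = record.get("install_year")
    install_year: Optional[int] = (
        int(raw_year) if raw_year not in (None, "") else None
    )
    out["install_year"] = install_year

    # Engineered variable: pipe age in years at the time of the break.
    out["pipe_age"] = (
        (break_date.year - install_year) if install_year is not None else None
    )
    return out
```

Keeping a missing install year as `None` rather than imputing it here leaves the imputation decision to a later, explicit step.
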
## 📤 Export Logic

| Step | Output File | Used In |
|------|-------------|---------|
| Cleaning → Modeling | `cleaned_breaks.csv` | Model training |
| Modeling → Dashboard | `random_forest_model.pkl` | Flask dashboard |
| Visualization | `feature_importance.png`, `risk_map.html` | Dashboard or whitepaper/blog |

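Before starting a downstream step, it can help to confirm that the files in the table above were actually exported. The sketch below is one way to do that with the standard library. The mapping of each file to a directory is an assumption: this README names the output directories but does not say which file lands where, so adjust `EXPECTED_EXPORTS` to match the real layout.

```python
from pathlib import Path
from typing import List

# Assumed mapping of exported files to output directories (relative to the
# project root). The README does not state which file goes where.
EXPECTED_EXPORTS = {
    "cleaned_breaks.csv": "data/clean",
    "random_forest_model.pkl": "models",
    "feature_importance.png": "dashboard",
    "risk_map.html": "dashboard",
}


def missing_exports(project_root: Path) -> List[str]:
    """Return the expected export paths that do not exist under project_root."""
    missing = []
    for filename, folder in EXPECTED_EXPORTS.items():
        relative = Path(folder) / filename
        if not (project_root / relative).is_file():
            missing.append(relative.as_posix())
    return missing
```

From inside the notebooks folder, `missing_exports(Path(".."))` returns an empty list once every expected file is in place.
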
## 🔖 Notes

- Each notebook can be opened and read on its own, but the notebooks are designed to be run **in order**, because later notebooks read files exported by earlier ones.
- Exported files are saved in `../data/clean/`, `../models/`, or `../dashboard/` (paths relative to this folder) for downstream use.
- Archived or exploratory notebooks are stored in `notebooks/archive/`.

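Since the numeric prefixes define the run order, that order can be recovered programmatically. The sketch below is a hypothetical helper, not part of the project: it assumes every pipeline notebook follows the two-digit `NN_name.ipynb` pattern used in the index, and it ignores anything else. Each returned notebook could then be executed in turn, for example with `jupyter nbconvert --to notebook --execute`.

```python
import re
from pathlib import Path
from typing import List

# Matches names such as "03_exploratory_analysis.ipynb".
NUMBERED_NOTEBOOK = re.compile(r"^(\d{2})_.+\.ipynb$")


def notebooks_in_run_order(names: List[str]) -> List[str]:
    """Keep only numbered notebooks and sort them by their numeric prefix."""
    numbered = [name for name in names if NUMBERED_NOTEBOOK.match(name)]
    return sorted(
        numbered,
        key=lambda name: int(NUMBERED_NOTEBOOK.match(name).group(1)),
    )


def list_pipeline_notebooks(folder: Path) -> List[str]:
    """List numbered notebooks directly inside `folder`, in run order."""
    # glob("*.ipynb") is not recursive, so an archive/ subfolder is skipped.
    return notebooks_in_run_order([path.name for path in folder.glob("*.ipynb")])
```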
