# 📘 ML Workflow Roadmap — v4


> **v4 Enhancements**  
> - Robust local CSV loader with fallback (`titanic.csv` or `train.csv`)  
> - EDA-first template with clear "What/Why" notes  
> - Version-agnostic metrics (manual RMSE), safe ROC plotting  
> - Target NaN handling (drop before split)  
> - "What we infer" summary cells at the end  
> - Reproducible `random_state=42`  

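The loader fallback and the "drop target NaNs before the split" rule listed above could look roughly like the sketch below. It assumes pandas and scikit-learn are installed; the function names, the `data_dir` argument, and the 80/20 split are hypothetical choices for illustration, not something this roadmap prescribes.

```python
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split

# File names come from the enhancement list; the search directory is an assumption.
CANDIDATES = ("titanic.csv", "train.csv")


def load_local_csv(data_dir=".", candidates=CANDIDATES):
    """Return the first candidate CSV found in data_dir as a DataFrame."""
    for name in candidates:
        path = Path(data_dir) / name
        if path.is_file():
            return pd.read_csv(path)
    # Fail loudly with the names that were tried and where.
    raise FileNotFoundError(
        f"None of {list(candidates)} found in {Path(data_dir).resolve()}"
    )


def split_after_dropping_missing_target(df, target, test_size=0.2, seed=42):
    """Drop rows whose target is NaN, then make a reproducible train/test split."""
    df = df.dropna(subset=[target])
    X = df.drop(columns=[target])
    y = df[target]
    return train_test_split(X, y, test_size=test_size, random_state=seed)
```

A call might read `X_train, X_test, y_train, y_test = split_after_dropping_missing_target(load_local_csv(), target="Survived")`, where `"Survived"` is assumed to be the target column of the Titanic data. Dropping the rows before splitting keeps unlabeled examples out of both the training and the test set.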

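"Manual RMSE" presumably means computing the metric directly instead of relying on scikit-learn helpers whose signatures have changed between releases (for example the `squared` argument of `mean_squared_error`). A minimal sketch, assuming only NumPy; the function name `rmse` is a hypothetical choice:

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error computed by hand, without scikit-learn."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    # Square the errors, average them, then take the square root.
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Errors are 1, 0 and -2, so the result is sqrt(5 / 3).
print(rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))
```

Because it uses only array arithmetic, this version should behave the same regardless of which scikit-learn release is installed.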

## End-to-End Steps (What / Why / When to Skip)

1. **Problem Definition**: Always.
2. **EDA**: Always, at least a basic pass.
3. **Cleaning**: Almost always; validate the data even if it looks clean.
4. **Feature Engineering**: Often; scaling can be skipped for tree-based models.
5. **Split**: Always (train/test or cross-validation).
6. **Model Selection**: Usually (a baseline plus a more advanced model).
7. **Training & Tuning**: Always train; tuning is optional in a teaching setting.
8. **Evaluation**: Always; choose metrics appropriate to the task.
9. **Interpretation**: Encouraged, for trust and explainability.
10. **Deployment/Reporting**: Optional in class; essential in production.

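The steps above can be sketched in a few lines with scikit-learn. This is an illustrative outline under stated assumptions, not a reference implementation: the data is synthetic, and the column names (`age`, `fare`, `sex`, `label`), the two models, and the accuracy metric are hypothetical choices. Preprocessing is wrapped in a `Pipeline` so that its statistics are learned from the training rows only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Step 1: a toy binary-classification table (synthetic, for illustration only).
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "age": rng.normal(35, 10, n),
    "fare": rng.exponential(30, n),
    "sex": rng.choice(["female", "male"], n),
})
df["label"] = ((df["fare"] > 25) ^ (df["sex"] == "male")).astype(int)
df.loc[df.index % 20 == 0, "age"] = np.nan  # inject gaps to exercise cleaning

# Step 2 (minimal EDA): count missing values per column.
print(df.isna().sum())

# Step 5: reproducible, stratified train/test split.
X, y = df.drop(columns=["label"]), df["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

numeric, categorical = ["age", "fare"], ["sex"]


def make_preprocessor(scale):
    """Steps 3-4: impute, optionally scale numerics, one-hot encode categoricals."""
    num_steps = [("impute", SimpleImputer(strategy="median"))]
    if scale:
        num_steps.append(("scale", StandardScaler()))
    return ColumnTransformer([
        ("num", Pipeline(num_steps), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])


# Steps 6-7: a scaled linear baseline and an unscaled tree ensemble.
models = {
    "logreg": Pipeline([("prep", make_preprocessor(scale=True)),
                        ("clf", LogisticRegression(max_iter=1000))]),
    "forest": Pipeline([("prep", make_preprocessor(scale=False)),
                        ("clf", RandomForestClassifier(random_state=42))]),
}

# Step 8: evaluate each model on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

The `scale` flag mirrors the note in step 4: the linear baseline gets a `StandardScaler`, while the random forest skips it.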


```mermaid
flowchart TD
    A[Problem Definition] --> B[EDA]
    B --> C[Data Cleaning]
    C --> D[Feature Engineering]
    D --> E[Train/Test Split]
    E --> F[Model Selection]
    F --> G[Model Training]
    G --> H[Model Evaluation]
    H --> I[Interpretation]
    I --> J[Deployment / Reporting]

    A:::always
    B:::always
    E:::always
    H:::always

    classDef always fill:#fdd,stroke:#333,stroke-width:1px;
```
