# Stage 16 — Lifecycle Review (Filled)

This notebook extends the starter with a complete **Lifecycle Framework Guide**, a concise reflection,
and a repo-polish checklist aligned to the homework rubric. Everything is self-contained
in this notebook; it creates no external files.

## Lifecycle Framework Guide — Final

| Lifecycle Stage | What You Did | Challenges | Solutions / Decisions | Future Improvements |
|---|---|---|---|---|
| **1. Problem Framing & Scoping** | Clarified objective, success metric, constraints; wrote acceptance criteria. | Ambiguity on tolerable error and latency. | Single-sentence problem statement; explicit guardrails for latency/cost. | Align earlier on must-have vs nice-to-have; add decision log. |
| **2. Tooling Setup** | Python, scikit-learn, Matplotlib; notebooks for EDA. | Env/version drift. | Pinned requirements; recorded commands in README. | Add Dockerfile/lockfile; CI for env checks. |
| **3. Python Fundamentals** | Reusable helpers; clean functions. | Readability vs performance trade-offs. | Docstrings/type hints; simple profiling. | Add unit tests, linting, pre-commit hooks. |
| **4. Data Acquisition / Ingestion** | Loaded CSV/synthetic fallback; documented schema. | Missing columns and schema changes. | Centralized ingestion; schema checks. | Automate fetch; add schema registry. |
| **5. Data Storage** | Organized `data/raw`, `data/processed`, `models`, `deliverables`. | Versioning & lineage. | Timestamped directories; recorded hashes. | Add lightweight DVC or registry. |
| **6. Data Preprocessing** | Imputation (mean/median), scaling when needed. | Skew/outliers affected imputers. | Compared mean vs median; robust defaults per feature. | Add robust scalers and feature-specific strategies. |
| **7. Outlier Analysis** | Residual checks, boxplots by segment. | Distinguishing noise vs events. | Flagged but retained informative outliers; documented. | Add automated flags (robust z-score, isolation forest). |
| **8. EDA** | Trends, distributions, correlations; consistent axes. | Subgroup variance. | Standard plot templates; subgroup overlays. | Add interactive facets/tooltips. |
| **9. Feature Engineering** | Basic ratios/lags; polynomial trial. | Overfitting, collinearity. | Validate via held-out error; prune noisy terms. | Explore domain-driven features; regularization. |
| **10. Modeling (Reg/TS/Class)** | Linear baseline; compared polynomial; TimeSeriesSplit where relevant. | Simplicity vs flexibility. | Chose simplest model hitting acceptance criteria. | Try regularized or lightweight tree models. |
| **11. Evaluation & Risk Communication** | RMSE with **bootstrap 95% CI**; scenario comparisons with CI error bars; subgroup residuals. | Communicating uncertainty. | Standard CI visuals; stakeholder-friendly bullets. | Add calibration checks; compare parametric vs bootstrap CIs. |
| **12. Results & Stakeholder Communication** | One-pager/MD report; executive summary; assumptions/risks. | Translating to decisions. | Headline takeaways + decision implications. | Add short deck; align visuals to brand. |
| **13. Productization** | Packaged model (`model.pkl`); Flask API (`POST /predict`, `/plot`); repo structure + README. | Contract clarity, maintainability. | Modular `src/`, schema examples, pinned deps. | Add auth/tests; containerize; model registry. |
| **14. Deployment & Monitoring** | 4-layer monitoring (**Data/Model/System/Business**); thresholds for PSI, AUC, p95 latency; runbook. | Threshold tuning & alerts. | Concrete alert routes to Slack/on-call; runbook first step is freeze & snapshot. | Add dashboards & drills; automate retrain triggers. |
| **15. Orchestration & System Design** | Tasks: Ingest→Validate→Clean→Train→Evaluate→Package→Report; checkpoints & retries; DAG documented. | Dependencies & idempotency. | `.ok` checkpoints; content-hash artifacts; backoff retries. | Adopt orchestrator (Airflow/Prefect) with calendars/backfills. |
| **16. Lifecycle Review** | Consolidated decisions; repo aligned to lifecycle and rubric. | Consistency across docs/code. | Final pass: README, report, figures, framework guide, checklist. | CI/CD for checks; periodic audits; post-mortems. |
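
Stage 11 reports RMSE with a bootstrap 95% CI. As a rough illustration of that idea (not the notebook's actual implementation), the sketch below computes a percentile bootstrap interval by resampling prediction/target pairs. The function names, the synthetic data, and the 2,000-resample default are assumptions made for this example; it uses only the standard library, although NumPy would be the more typical choice in a scikit-learn project.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error over paired observations."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def bootstrap_rmse_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for RMSE, resampling (y_true, y_pred) pairs."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Draw n row indices with replacement and score that resample
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(rmse([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return rmse(y_true, y_pred), lo, hi


# Hypothetical data: a linear signal plus unit-variance Gaussian noise
noise = random.Random(42)
y_true = [0.5 * x for x in range(100)]
y_pred = [y + noise.gauss(0, 1.0) for y in y_true]

point, lo, hi = bootstrap_rmse_ci(y_true, y_pred)
print(f"RMSE = {point:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

The percentile method is the simplest variant; for time-series data a block bootstrap would likely be more appropriate, since resampling individual rows ignores autocorrelation.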

## Reflection (Short Answers)

- **Most difficult:** Balancing model simplicity with flexibility while keeping communication crisp.  
- **Most rewarding:** Using uncertainty (bootstrap CI) and scenario analysis to drive decisions.  
- **Stage connections:** Early scoping and schema choices constrained which features could be built; monitoring thresholds informed deployment gates.  
- **Do differently next time:** Lock environment earlier, add tests/CI before modeling, agree on retrain triggers with business owners up front.  
- **Skills to strengthen:** Drift detection, calibration, and production-grade observability.
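
Stage 14 sets a threshold on PSI, and drift detection is listed above as a skill to strengthen. The sketch below shows one common formulation of the Population Stability Index, assuming quantile bins derived from the baseline sample. The function name `psi`, the bin count, the synthetic samples, and the 0.1 / 0.25 cut-offs (a widely quoted rule of thumb) are illustrative assumptions, not values taken from this project.

```python
import bisect
import math
import random


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index using quantile bins taken from `expected`."""
    ref = sorted(expected)
    # Interior bin edges at the baseline quantiles (n_bins - 1 edges)
    edges = [ref[int(len(ref) * k / n_bins)] for k in range(1, n_bins)]

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            # Number of edges <= v gives the bin index in 0..n_bins-1
            counts[bisect.bisect_right(edges, v)] += 1
        # Floor at eps so empty bins do not produce log(0)
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical feature samples: baseline, a fresh draw, and a shifted draw
rng = random.Random(0)
baseline = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same_dist = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

psi_same = psi(baseline, same_dist)
psi_shifted = psi(baseline, shifted)

# Rule-of-thumb reading (assumption): < 0.1 stable, > 0.25 significant drift
print(f"PSI (same distribution): {psi_same:.4f}")
print(f"PSI (mean shifted by 1 sd): {psi_shifted:.4f}")
```

Quantile bins keep every baseline bin populated, which avoids unstable log ratios; the actual alert threshold should be tuned per feature rather than copied from the rule of thumb.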

## Repo Polish Checklist (Aligned to Rubric)

- [ ] Clean structure: `data/`, `src/`, `notebooks/`, `models/`, `deliverables/`  
- [ ] README: lifecycle mapping, how to run, where deliverables live  
- [ ] Stakeholder summary: one-pager or Markdown report in `/deliverables/`  
- [ ] Figures/tables: exported PNG/CSV with clear names  
- [ ] Requirements pinned (`requirements.txt`) and/or Dockerfile  
- [ ] Reproducibility: seeds set; scripts or steps to regenerate artifacts  
- [ ] Monitoring & runbook: thresholds and steps captured (from Stage 14)  
- [ ] Orchestration: DAG + checkpoints/retries (from Stage 15)  
- [ ] Framework Guide complete (this section)  
- [ ] Final commit & push with clear message
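
The orchestration item refers to the Stage 15 decisions: `.ok` checkpoints, content-hash artifacts, and backoff retries. One way these could fit together is sketched below. The helper names `content_hash` and `run_task`, the marker layout, and the flaky demo task are hypothetical and not taken from the project's code.

```python
import hashlib
import tempfile
import time
from pathlib import Path


def content_hash(path):
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def run_task(func, out_path, retries=3, base_delay=0.01):
    """Run func(out_path) unless a valid .ok marker exists; retry with backoff."""
    out_path = Path(out_path)
    marker = out_path.with_name(out_path.name + ".ok")
    # Idempotency: skip when the artifact exists and matches the recorded hash
    if out_path.exists() and marker.exists() and marker.read_text() == content_hash(out_path):
        return "skipped"
    for attempt in range(retries):
        try:
            func(out_path)
            break
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the failure
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    # Write the checkpoint only after the task has succeeded
    marker.write_text(content_hash(out_path))
    return "ran"


# Hypothetical task that fails once, then writes its artifact
calls = {"n": 0}


def flaky_clean(out_path):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient failure")
    out_path.write_text("id,value\n1,10\n")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "clean.csv"
    first = run_task(flaky_clean, target)   # fails once, retries, succeeds
    second = run_task(flaky_clean, target)  # marker matches, so it is skipped

print(first, second, calls["n"])  # ran skipped 2
```

Storing the hash inside the marker means a manually edited or truncated artifact invalidates its checkpoint, so the task reruns instead of passing stale output downstream.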

```python
# Optional: print a compact "next steps" to use as commit message notes
steps = [
    "Polish README (lifecycle mapping + run instructions)",
    "Verify deliverables live under /deliverables",
    "Re-run notebooks and export final figures/tables",
    "Confirm monitoring & orchestration docs exist (S14/S15)",
    "Commit and push"
]
print("Next Steps:")
for i, s in enumerate(steps, 1):
    print(f"{i}. {s}")
```