# Stage 16 Homework Starter

This notebook is a starting point for polishing your final repo and lifecycle mapping.

## Checklist Template
- Add checklist elements, as in the examples below, to make sure you cover everything you would like to accomplish.
- Update this checklist as you finalize your repo.

```python
# Stage 16: repo checklist helpers (auto-check + render)
import re
from pathlib import Path

import pandas as pd

ROOT = Path(".").resolve()

def file_exists(*relpath) -> bool:
    return ROOT.joinpath(*relpath).exists()

def under_any(p: Path, dirs) -> bool:
    # True if p lives under one of the given top-level folders of ROOT.
    # Comparing path parts avoids false matches such as "dataset/" for "data".
    return p.relative_to(ROOT).parts[0] in dirs

def has_readme() -> bool:
    # case-insensitive, so README.md, readme.md and Readme.md all count
    return any(p.name.lower() == "readme.md" for p in ROOT.iterdir())

def has_lifecycle_map() -> bool:
    # accept lifecycle_map.md at the root, in docs/, or in handoff/
    for p in [ROOT, ROOT / "docs", ROOT / "handoff"]:
        if (p / "lifecycle_map.md").exists():
            return True
    return False

def has_summary_doc() -> bool:
    for name in ["summary.md", "stakeholder_summary.md", "reflection.md"]:
        for p in [ROOT, ROOT / "handoff", ROOT / "reports", ROOT / "docs"]:
            if (p / name).exists():
                return True
    return False

def has_framework_table() -> bool:
    # look for a markdown file with a header row that includes Data | Model | System | Business
    for c in ROOT.rglob("*.md"):
        try:
            txt = c.read_text(encoding="utf-8", errors="ignore")
        except Exception:
            continue
        if re.search(r"\|?\s*Data\s*\|\s*Model\s*\|\s*System\s*\|\s*Business\s*\|", txt, flags=re.I):
            return True
    return False

def repo_is_clean() -> bool:
    # simple heuristics:
    # 1) .gitignore exists
    # 2) no large files (>50 MB) outside data/ or reports/
    # 3) no obvious secrets (api_key=, secret=, AWS keys) in committed text files
    has_gitignore = file_exists(".gitignore")
    large = []
    for p in ROOT.rglob("*"):
        if not p.is_file() or under_any(p, {"data", "reports", ".git", ".ipynb_checkpoints"}):
            continue
        try:
            if p.stat().st_size > 50 * 1024 * 1024:
                large.append(str(p))
        except Exception:
            pass
    # quick secret scan; skip binary artifacts and git internals
    binary = {".png", ".jpg", ".jpeg", ".gif", ".pdf", ".zip", ".gz", ".pkl", ".joblib"}
    secret_hits = []
    for p in ROOT.rglob("*"):
        if not p.is_file() or under_any(p, {".git", ".ipynb_checkpoints"}):
            continue
        if p.suffix.lower() in binary:
            continue
        try:
            txt = p.read_text(encoding="utf-8", errors="ignore")
        except Exception:
            continue
        if re.search(r"(api[_-]?key|secret|aws_access_key_id|aws_secret_access_key)=", txt, flags=re.I):
            secret_hits.append(str(p))
            break  # one hit is enough to fail the check
    return has_gitignore and not large and not secret_hits

def repo_complete() -> bool:
    # minimal: README.md, requirements.txt, src/ or notebooks/, and model/ or app.py or dashboard.py
    must = [
        has_readme(),
        file_exists("requirements.txt"),
        file_exists("model") or file_exists("app.py") or file_exists("dashboard.py"),
        file_exists("src") or file_exists("notebooks"),
    ]
    return all(must)

# Auto-check: each item maps to a boolean result
checklist = {
    "repo_clean": repo_is_clean(),
    "repo_complete": repo_complete(),
    "has_readme": has_readme(),
    "has_lifecycle_map": has_lifecycle_map(),
    "has_summary_doc": has_summary_doc(),
    "has_framework_table": has_framework_table(),
}

# Render: the last expression in a cell is displayed as a table in Jupyter
checklist_df = pd.DataFrame({"item": list(checklist), "done": list(checklist.values())})
checklist_df
```
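One optional way to carry the auto-check results into the README or a handoff doc is a GitHub-style markdown task list. The helper name `render_markdown_checklist` is hypothetical, and the sketch assumes the checklist is a dict mapping item names to booleans.

```python
def render_markdown_checklist(checklist: dict) -> str:
    """Render {item: done} pairs as a GitHub-style markdown task list."""
    lines = []
    for item, done in checklist.items():
        box = "[x]" if done else "[ ]"  # GitHub renders these as checkboxes
        lines.append(f"- {box} {item}")
    return "\n".join(lines)


# Usage with hard-coded example values
example = {"repo_clean": True, "has_lifecycle_map": False}
print(render_markdown_checklist(example))
# - [x] repo_clean
# - [ ] has_lifecycle_map
```

Pasting the printed output into a `.md` file keeps the written checklist in sync with what the helpers actually detected.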

## Reflection Prompts
- What stage of the lifecycle was hardest for you, and why?
- Which part of your repo is most reusable in a future project?
- If a teammate had to pick up your repo tomorrow, what would help them most?

### Reflection

**1) Which stage of the lifecycle was hardest for me, and why?**  
The hardest stage was **deployment and monitoring**. Training the model and cleaning data were straightforward, but exposing the model as a reliable service required careful design of error handling, validation, and monitoring. It was challenging to make the API both robust and easy for others to consume, while also planning for long-term risks like data drift and model decay.
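As an illustration of planning for data drift, a simple per-feature check could compare the live distribution of a feature against the training distribution. The sketch below uses the population stability index (PSI) with equal-width bins; PSI is a swapped-in example technique, not necessarily what this project monitors, and the function name and bin count are assumptions.

```python
import math
from typing import Sequence


def population_stability_index(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """PSI between a reference sample (e.g. training data) and a new sample of one numeric feature.

    Bins are equal-width over the reference sample's range; values outside
    that range are clipped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid division by zero for a constant feature

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # floor empty bins at a tiny value so log() stays defined
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    # every term is >= 0, so PSI is 0 for identical binned distributions and grows with shift
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


train = [float(x) for x in range(100)]
live = [x + 50 for x in train]  # simulated shift in the live feature
print(population_stability_index(train, train))  # 0.0
print(population_stability_index(train, live) > 1.0)  # True
```

In a monitoring job, a check like this would run per feature on a schedule, with alert thresholds tuned to how much shift the model tolerates.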

**2) Which part of my repo is most reusable in a future project?**  
The most reusable part is the **core utilities in `src/`**: the data I/O helpers, preprocessing functions, and the `save_model/load_model` persistence pattern. In addition, the Flask API template (`/predict`, `/meta`, `/plot`) is generic enough to be adapted quickly to other models by changing only the feature names and evaluation metrics. These components can save significant setup time for future projects.
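The `save_model/load_model` persistence pattern could look roughly like the sketch below. It assumes plain `pickle` from the standard library; the actual helpers in `src/` may use `joblib` or different signatures, so treat these function bodies as illustrative.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path) -> Path:
    """Write any picklable model object to `path`, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(model, f)
    return path


def load_model(path):
    """Read back an object written by save_model. Only unpickle files you trust."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


# Round-trip a stand-in "model" (a plain dict) through a temporary model/ folder
with tempfile.TemporaryDirectory() as tmp:
    saved = save_model({"coef": [0.5, -1.2], "features": ["x1", "x2"]}, Path(tmp) / "model" / "model.pkl")
    print(load_model(saved))  # {'coef': [0.5, -1.2], 'features': ['x1', 'x2']}
```

Keeping persistence behind two small functions means the API and notebooks never touch the serialization format directly, so switching to another backend later only changes these helpers.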

**3) If a teammate had to pick up my repo tomorrow, what would help them most?**  
Clear documentation is the most helpful asset. The `README.md` explains setup and usage, while `docs/lifecycle_map.md` shows the full pipeline from ingest to monitoring. The `docs/framework_guide_table.md` outlines what to monitor and who owns each step. Together with the sample requests and screenshots in `reports/`, these documents let a new teammate understand how to run, test, and extend the project without guesswork. This reduces handoff friction and ensures continuity.
