# Notebook 08 — Calibration, Blending & Production Export

## Purpose
This notebook refines the deterministic school-matching system validated in
Notebook 07 to support **safe launch** and **future evolution**.

It serves a **dual mandate**:

- **Startup mandate**: Improve trust, stability, performance, and deployability
  before real users exist.
- **Capstone mandate**: Demonstrate advanced analytical rigor using unsupervised
  methods and robustness analysis — without introducing opaque models.

Learning is used strictly as a **diagnostic and advisory tool**, never as the
authority over ranking logic.

---

## 00. Scope & Dual Mandate

- Relationship to Notebook 07 (validated deterministic segments)
- Why calibration is required before launch
- Guardrails:
  - Determinism remains the baseline
  - Tier logic is never overridden
  - No outcome prediction or click-based learning
- Definition of “learning as consultant, not boss”

---

## 01. Failure Mode Analysis (The “Bugs”)

Identify behaviors that would cause **user confusion or loss of trust**:

- Tie density and large tie groups
- Rank volatility under small weight changes
- Unexpected dominance or suppression of tier tags
- Segment-specific instability

Startup goal:
- Find fragile or confusing behavior before users do.

Artifacts:
- Tie density metrics
- Rank stability diagnostics
- Volatility flags by segment

---

## 02. Resolution Strategies (The “Fixes”)

Apply **manual, deterministic adjustments** informed by Section 01:

- Weight rebalancing
- Segment-specific tie breakers
- Feature scaling or clipping
- Soft constraint tuning

No learning is used here.
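
As an illustration of a segment-specific tie breaker, here is a minimal sketch. It assumes ties are broken by a secondary signal chosen per segment (the `secondary` array below is hypothetical), with the original row index as the final fallback. The function name and toy values are assumptions for illustration, not the notebook's actual Section 02 implementation.

```python
import numpy as np

def rank_with_tiebreak(scores: np.ndarray, secondary: np.ndarray) -> np.ndarray:
    """Order rows by score DESC, then secondary DESC, then row index ASC."""
    row_ids = np.arange(scores.shape[0])
    # np.lexsort treats the LAST key as the primary sort key.
    return np.lexsort((row_ids, -secondary, -scores))

# Toy data: rows 0, 1 and 3 tie on score; `secondary` is a hypothetical tie-break feature.
scores = np.array([3.0, 3.0, 1.0, 3.0])
secondary = np.array([0.2, 0.9, 0.5, 0.9])

order = rank_with_tiebreak(scores, secondary)
print(order.tolist())  # [1, 3, 0, 2]
```

Rows 1 and 3 still tie on both keys, so the row index decides between them. The ordering stays fully deterministic.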

Startup goal:
- Improve stability and clarity using explicit design decisions.

---

## 03. Statistical Refinement (The “Capstone Science”)

Apply unsupervised analysis to **improve information density**:

- Feature correlation analysis
- Redundancy detection
- PCA / variance contribution (diagnostic only)

Learning outputs:
- Feature redundancy warnings
- Suggested weight budget reallocation
- Candidate features for removal or down-weighting
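
A redundancy warning of this kind could be produced with a plain correlation scan. The sketch below is an assumption-laden illustration: the function name, the `0.9` threshold, and the toy matrix are all hypothetical, and the output is advisory only (it never changes weights by itself).

```python
import numpy as np

def redundant_pairs(X: np.ndarray, names: list, threshold: float = 0.9) -> list:
    """List feature pairs whose absolute Pearson correlation meets the threshold."""
    corr = np.corrcoef(X, rowvar=False)  # columns are features
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = corr[i, j]
            # Constant columns yield NaN correlations; skip them instead of flagging.
            if np.isfinite(r) and abs(r) >= threshold:
                flagged.append((names[i], names[j], round(float(r), 4)))
    return flagged

# Toy matrix: feat_b is exactly 2 * feat_a, feat_c is only weakly related.
X_toy = np.array(
    [[1, 2, 5], [2, 4, 1], [3, 6, 4], [4, 8, 2], [5, 10, 3]], dtype=float
)
pairs = redundant_pairs(X_toy, ["feat_a", "feat_b", "feat_c"])
print(pairs)  # [('feat_a', 'feat_b', 1.0)]
```

A flagged pair is a prompt for a human decision (drop, merge, or down-weight one feature), in keeping with the "consultant, not boss" rule.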

Capstone goal:
- Demonstrate principled feature engineering.

Startup goal:
- Simplify the matrix for faster computation and clearer rankings.

---

## 04. Segment Blending (The “Killer Feature”)

Enable continuous personalization via **linear segment blending**:

\[
\vec{V}_{final} = \alpha \vec{V}_{A} + (1 - \alpha) \vec{V}_{B}
\]

- Implement `blend_segments(seg_a, w_a, seg_b, w_b)`
- Validate blended rankings
- Visualize Top-K changes as blending weight shifts
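
The blending formula above can be sketched as follows. This is a minimal illustration under assumptions: it operates directly on two weight vectors that already share the same feature ordering, and the name `blend_weight_vectors` and the toy vectors are hypothetical (the planned `blend_segments` signature may differ).

```python
import numpy as np

def blend_weight_vectors(w_a: np.ndarray, w_b: np.ndarray, alpha: float) -> np.ndarray:
    """Return alpha * w_a + (1 - alpha) * w_b for alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if w_a.shape != w_b.shape:
        raise ValueError("weight vectors must share the same feature ordering")
    return alpha * w_a + (1.0 - alpha) * w_b

# Toy vectors over three hypothetical features.
w_academic = np.array([2.0, 0.0, 1.0])
w_nurturing = np.array([0.0, 3.0, 1.0])

w_blend = blend_weight_vectors(w_academic, w_nurturing, alpha=0.25)
print(w_blend.tolist())  # [0.5, 2.25, 1.0]
```

Because scoring is linear, the blended score equals the same convex combination of the two segment scores, so a UI slider that moves `alpha` interpolates between the two rankings without any new model.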

Startup goal:
- Power intuitive UI controls (sliders, toggles)
- Generate a continuum of personas from a small set of base segments

No learning required.

---

## 05. Evaluation & Safety Regression

Ensure refinements and blending remain safe:

- Tier dominance checks (e.g., IB floors)
- Grade-span enforcement
- Regression against v0 rankings
- Explainability consistency checks

All changes must pass guardrails before proceeding.

---

## 06. Production Artifact Generation

Generate deployable, versioned assets for launch:

- Precompute Top-K rankings for each segment
- Generate:
  - `schools_top100_v1.json`
  - accompanying metadata (segment version, feature hash, timestamp)
- Include scores and explanation strings
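
One possible shape for such an artifact is sketched below. The field names, the `feature_hash` helper, and the toy inputs are assumptions for illustration; explanation strings are omitted, and the real export format is whatever Section 06 ends up defining.

```python
import hashlib
import json
from datetime import datetime, timezone

def feature_hash(feature_names):
    """SHA-256 over the ordered feature names (order matters for matrix alignment)."""
    return hashlib.sha256("\n".join(feature_names).encode("utf-8")).hexdigest()

def build_topk_payload(segment, school_ids, scores, k, feature_names, segments_version="v0"):
    """Assemble a JSON-serializable Top-K payload with metadata (hypothetical schema)."""
    # Deterministic order: score DESC, then row index ASC.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]
    return {
        "metadata": {
            "segment": segment,
            "segments_version": segments_version,
            "feature_hash": feature_hash(feature_names),
            "generated_at_utc": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
        "results": [
            {"rank": pos, "school_id": school_ids[i], "score": round(float(scores[i]), 6)}
            for pos, i in enumerate(ranked, start=1)
        ],
    }

payload = build_topk_payload(
    segment="balanced_general",
    school_ids=["S1", "S2", "S3"],   # toy identifiers
    scores=[0.4, 0.9, 0.4],
    k=2,
    feature_names=["feat_a", "feat_b"],
)
print(json.dumps(payload["results"], indent=2))
```

Storing the feature hash next to the rankings lets the serving layer detect an artifact built against a different feature ordering.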

Startup goal:
- Enable a near-instant initial page load by serving precomputed rankings (no scoring at request time)
- Keep serving logic simple and reliable

---

## 07. Summary & Forward Roadmap

- What calibration and blending add
- What remains deterministic by design
- How learning will be introduced safely post-launch
- Transition to real user data (future work)

---

> Determinism builds trust.  
> Learning improves structure.  
> Control preserves safety.


# 00. Scope & Dual Mandate  

## Relationship to Notebook 07
Notebook 07 validated that **Preference Segments v0** (deterministic personas) can sit on top of the **Scoring Engine v2** to produce coherent, explainable rankings — **without ML**.

Notebook 08 builds on that baseline and prepares the system for:
1) **Safe launch** (stable, predictable, shippable outputs)  
2) **Capstone rigor** (diagnostics + robustness analysis using unsupervised tools)

---

## Why calibration is required before launch
Even with a correct deterministic ranking, users can lose trust if the system feels:

- **Random** (rank volatility under small changes)
- **Unclear** (large tie groups with no visible reason for the ordering)
- **Inconsistent** (segment-specific surprises)
- **Overfit to configs** (feature redundancy causing accidental dominance)

Notebook 08 treats these as *launch-blocking failure modes* to detect and correct **before users exist**.

---

## Guardrails (Non-negotiables)
This notebook must preserve:

- **Determinism remains the baseline**
  - Same inputs + same config → same output
- **Tier logic is never overridden**
  - Tier membership is a rule, not a suggestion
- **No outcome prediction**
  - We are not predicting enrollment/clicks/satisfaction
- **No click-based learning**
  - No personalization based on behavior signals (future phase only)
- **Learning as consultant, not boss**
  - Unsupervised methods may *diagnose* issues and *suggest* fixes  
  - Final ranking logic is still explicit and human-auditable
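
The determinism guardrail can be checked mechanically. The following sketch assumes a linear score and the stable descending ranking used in this notebook; the `ranking_fingerprint` name and the toy data are hypothetical.

```python
import hashlib
import numpy as np

def ranking_fingerprint(X: np.ndarray, w: np.ndarray) -> str:
    """Hash the full ranking produced by a linear score with a stable tie-break."""
    scores = X @ w
    # Score DESC, then row index ASC (np.lexsort uses the last key as primary).
    order = np.lexsort((np.arange(scores.shape[0]), -scores))
    return hashlib.sha256(order.astype(np.int64).tobytes()).hexdigest()

X_toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w_toy = np.array([2.0, 1.0])

first = ranking_fingerprint(X_toy, w_toy)
second = ranking_fingerprint(X_toy.copy(), w_toy.copy())
print(first == second)  # True
```

Comparing fingerprints across runs (or against a stored value) turns "same inputs + same config → same output" into an assertion rather than a promise.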

---

## Definition: “Learning as consultant, not boss”
**Consultant:** identifies patterns like redundancy, high correlation, instability, brittle weights  
**Boss:** directly changes ranking by training a model to decide the score

Notebook 08 uses learning only in the *consultant* role.

---

## Outputs of Notebook 08
Launch deliverables:
- Calibrated config(s)
- Precomputed Top-K artifacts per segment (versioned JSON)
- Regression & safety checks

Capstone deliverables:
- Failure mode diagnostics (tie density, volatility)
- Unsupervised redundancy analysis (correlation/PCA as diagnostic)
- Documented rationale for refinements


In [151]:
# 00.1 Notebook Contract + Paths  

from __future__ import annotations

import os
import json
import hashlib
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from pathlib import Path

# --- Notebook versioning (update when you materially change logic)
NOTEBOOK_ID = "notebook08"
SYSTEM_VERSION = "v1"            # overall launch bundle version (bump when exporting new artifacts)
SEGMENTS_VERSION = "v0"          # from Notebook 07 (validated deterministic segments)
SCORING_ENGINE_VERSION = "v2"    # from Notebook 06/07

RUN_TS = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

# --- Project paths (adjust if your repo layout differs)
ROOT = Path("..").resolve()  # typical if notebooks/ is one level below repo root
DATA_DIR = ROOT / "data"
PROCESSED_DIR = DATA_DIR / "processed"
CONFIG = ROOT / "config"
REPORTS_DIR = ROOT / "reports"
ARTIFACTS_DIR = ROOT / "artifacts" / NOTEBOOK_ID

for p in [REPORTS_DIR, ARTIFACTS_DIR]:
    p.mkdir(parents=True, exist_ok=True)

print("ROOT:", ROOT)
print("PROCESSED_DIR:", PROCESSED_DIR)
print("REPORTS_DIR:", REPORTS_DIR)
print("ARTIFACTS_DIR:", ARTIFACTS_DIR)
print("RUN_TS:", RUN_TS)

# 00.2 Guardrails + Run Manifest 

@dataclass(frozen=True)
class Guardrails:
    deterministic_baseline: bool = True
    never_override_tier_logic: bool = True
    no_outcome_prediction: bool = True
    no_click_learning: bool = True
    learning_is_consultant_only: bool = True

@dataclass
class RunManifest:
    notebook_id: str
    run_ts_utc: str
    system_version: str
    scoring_engine_version: str
    segments_version: str
    guardrails: Guardrails
    inputs: dict
    outputs: dict

GUARDRAILS = Guardrails()

# Fill these in as you load concrete files in later sections
manifest = RunManifest(
    notebook_id=NOTEBOOK_ID,
    run_ts_utc=RUN_TS,
    system_version=SYSTEM_VERSION,
    scoring_engine_version=SCORING_ENGINE_VERSION,
    segments_version=SEGMENTS_VERSION,
    guardrails=GUARDRAILS,
    inputs={},
    outputs={},
)

# Save manifest early (gets updated later)
manifest_path = ARTIFACTS_DIR / f"run_manifest_{SYSTEM_VERSION}.json"
with open(manifest_path, "w") as f:
    json.dump(asdict(manifest), f, indent=2)

print("Saved:", manifest_path)


ROOT: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school
PROCESSED_DIR: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/data/processed
REPORTS_DIR: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/reports
ARTIFACTS_DIR: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08
RUN_TS: 2025-12-31T16:46:27Z
Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/run_manifest_v1.json


# 01. Failure Mode Analysis (The “Bugs”) 

This section identifies **ranking behaviors that would confuse users or erode trust** —
even when the underlying logic is correct.

The goal is *not* to optimize scores, but to **surface fragility** in the deterministic
system *before* real users experience it.

---

## Why Failure Mode Analysis Matters

A deterministic system can still feel broken if users observe:

- Many schools tied at the same rank
- Small config changes causing large rank shifts
- Segment-specific inconsistencies
- Expected “tier signals” appearing muted or overwhelming

These behaviors create a perception of randomness or bias.

**Notebook 08 treats these as launch-blocking bugs.**

---

## Failure Modes We Actively Test

### 1. Tie Density
Large groups of schools sharing the same score or rank reduce:
- Trust (“why can’t the system decide?”)
- Usability (long, unordered lists)
- Explainability (no clear differentiator)

We quantify:
- Tie group size distribution
- % of Top-K results involved in ties
- Tie density by preference segment

---

### 2. Rank Volatility (Config Sensitivity)
If a tiny weight change causes large rank movement, the system is brittle.

We measure:
- Rank delta under small, controlled perturbations
- Stability of Top-K membership
- Volatility by segment

High volatility signals:
- Feature redundancy
- Poor weight balance
- Hidden dominance effects

---

### 3. Tier Dominance & Suppression
Tier tags (e.g., IB, CAIS) must behave predictably:

- Never fully overridden
- Never dominate unintentionally
- Never disappear when expected

We flag:
- Tier tags that dominate too strongly
- Tier tags that are unexpectedly muted
- Segment-specific tier anomalies

---

### 4. Segment-Specific Instability
A configuration that is stable globally may be fragile for a single segment.

We analyze:
- Tie density per segment
- Rank volatility per segment
- Feature dominance per segment

Segments are treated as **first-class safety domains**.

---

## Outputs & Artifacts

This section produces **diagnostic artifacts only** — no ranking changes.

Artifacts include:
- Tie density metrics (CSV)
- Rank volatility diagnostics (CSV)
- Segment instability flags (CSV)
- Human-readable summary report (Markdown)

All outputs are saved under `reports/` (relative to the repo root) and referenced in the run manifest.

---

## Interpretation Rule

**Detection ≠ Correction**

This section only identifies failure modes.
All fixes are deferred to **Section 02** and must be:
- Deterministic
- Explicit
- Human-justified

---

> A ranking system earns trust  
> not by being clever,  
> but by being predictable, explainable, and stable.


## 01.0 Load Inputs + Utilities

This section establishes the **input contract and shared utilities** for all
Section 01 diagnostics.

No analysis is performed here.

---

## Purpose

- Load **frozen outputs** from Notebook 07
- Resolve paths safely across environments (repo vs notebook runtime)
- Define reusable helpers for:
  - hashing inputs
  - saving CSV / Markdown artifacts
  - updating the run manifest
- Ensure all downstream diagnostics operate on the **same immutable inputs**

This cell should be **stable and rarely changed**.

---

## Inputs (from Notebook 07)

- `school_matrix_v2.npy`  
  → numeric feature matrix used for scoring and perturbation tests
- `school_index_v2.csv`  
  → row → school identifier mapping
- `school_matrix_audit_v2.csv`  
  → audit-level scoring summaries
- `feature_config_master_v2.json`  
  → authoritative feature list and weights
- `schools_master_v2.csv`  
  → tier flags, grade spans, metadata
- `school_vector_explain_v2.json`  
  → per-feature contribution explanations
- `preference_segments_v0.json`  
  → deterministic segment definitions (validated in Notebook 07)

All inputs are treated as **read-only**.

---

## Guardrails

- ❌ No scoring logic is modified
- ❌ No learning is introduced
- ❌ No ranking decisions are made
- ✅ Deterministic inputs only
- ✅ Fail fast if inputs are missing or misaligned

---

## Outputs

- Loaded DataFrames / arrays in memory
- Verified feature ordering aligned to matrix columns
- Updated `run_manifest_v1.json` with:
  - input file paths
  - SHA256 hashes for reproducibility

No CSV or ranking artifacts are emitted in this step.

---

> This cell defines the *ground truth snapshot*  
> that all Section 01 diagnostics rely on.


In [155]:
# 01.0 Load Inputs + Utilities 

from __future__ import annotations

import json
import hashlib
from pathlib import Path
from typing import Dict, Any, List, Tuple

import numpy as np
import pandas as pd

# -----------------------------
# Smart path resolver
# -----------------------------
def resolve_path(preferred: Path, fallback: Path) -> Path:
    if preferred.exists():
        return preferred
    if fallback.exists():
        return fallback
    raise FileNotFoundError(f"Could not find file at:\n- {preferred}\n- {fallback}")

def sha256_file(path: Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def load_json(path: Path) -> Dict[str, Any]:
    with open(path, "r") as f:
        return json.load(f)

def save_csv(df: pd.DataFrame, path: Path) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)
    print("Saved:", path)

def save_md(text: str, path: Path) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    print("Saved:", path)

def update_manifest(manifest_path: Path, inputs: Dict[str, Any], outputs: Dict[str, Any]) -> None:
    m = load_json(manifest_path)
    m.setdefault("inputs", {})
    m.setdefault("outputs", {})
    m["inputs"].update(inputs)
    m["outputs"].update(outputs)
    with open(manifest_path, "w") as f:
        json.dump(m, f, indent=2)
    print("Updated manifest:", manifest_path)

# -----------------------------
# Input file paths
# -----------------------------
# Preferred = repo layout (data/processed, config); fallback = hosted/sandbox runtime (/mnt/data)
fallback_dir = Path("/mnt/data")

paths = {
    "school_matrix_audit_v2": resolve_path(PROCESSED_DIR / "school_matrix_audit_v2.csv", fallback_dir / "school_matrix_audit_v2.csv"),
    "school_vector_explain_v2": resolve_path(PROCESSED_DIR / "school_vector_explain_v2.json", fallback_dir / "school_vector_explain_v2.json"),
    "school_index_v2": resolve_path(PROCESSED_DIR / "school_index_v2.csv", fallback_dir / "school_index_v2.csv"),
    "school_matrix_v2": resolve_path(PROCESSED_DIR / "school_matrix_v2.npy", fallback_dir / "school_matrix_v2.npy"),
    "feature_config_master_v2": resolve_path(PROCESSED_DIR / "feature_config_master_v2.json", fallback_dir / "feature_config_master_v2.json"),
    "schools_master_v2": resolve_path(PROCESSED_DIR / "schools_master_v2.csv", fallback_dir / "schools_master_v2.csv"),
    "preference_segments_v0": resolve_path(CONFIG / "preference_segments_v0.json", fallback_dir / "preference_segments_v0.json"),
}

print("Resolved input paths:")
for k, v in paths.items():
    print(f"- {k}: {v}")

# -----------------------------
# Load inputs
# -----------------------------
audit_df = pd.read_csv(paths["school_matrix_audit_v2"])
index_df = pd.read_csv(paths["school_index_v2"])
schools_master_df = pd.read_csv(paths["schools_master_v2"], low_memory=False)
feature_cfg = load_json(paths["feature_config_master_v2"])
segments_cfg = load_json(paths["preference_segments_v0"])
explain_cfg = load_json(paths["school_vector_explain_v2"])
X = np.load(paths["school_matrix_v2"])

print("\nShapes:")
print("audit_df:", audit_df.shape)
print("index_df:", index_df.shape)
print("schools_master_df:", schools_master_df.shape)
print("X (matrix):", X.shape)

# -----------------------------
# Feature list / ordering (must match X columns)
# -----------------------------
def extract_feature_names(feature_cfg: Dict[str, Any], X: np.ndarray) -> List[str]:
    # Common patterns: {"features": [{"name": ...}, ...]} or {"feature_order": [...]}
    if isinstance(feature_cfg, dict) and "feature_order" in feature_cfg and isinstance(feature_cfg["feature_order"], list):
        names = feature_cfg["feature_order"]
    elif isinstance(feature_cfg, dict) and "features" in feature_cfg and isinstance(feature_cfg["features"], list):
        names = [f.get("name") for f in feature_cfg["features"] if isinstance(f, dict) and f.get("name")]
    else:
        raise ValueError("Could not extract feature names from feature_config_master_v2.json")

    if len(names) != X.shape[1]:
        raise ValueError(
            f"Feature list length ({len(names)}) does not match X columns ({X.shape[1]}).\n"
            "Fix feature ordering to align with matrix."
        )
    return names

feature_names = extract_feature_names(feature_cfg, X)
feat_to_idx = {n: i for i, n in enumerate(feature_names)}

print("\nFeature space:")
print("n_features:", len(feature_names))
print("example features:", feature_names[:8])


Resolved input paths:
- school_matrix_audit_v2: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/data/processed/school_matrix_audit_v2.csv
- school_vector_explain_v2: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/data/processed/school_vector_explain_v2.json
- school_index_v2: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/data/processed/school_index_v2.csv
- school_matrix_v2: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/data/processed/school_matrix_v2.npy
- feature_config_master_v2: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/data/processed/feature_config_master_v2.json
- schools_master_v2: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/data/processed/schools_master_v2.csv
- preference_segments_v0: /Users/jennifer-david/Documents/work

## 01.1 Build Segment Weight Vectors + Baseline Rankings 

This step converts each **Preference Segment v0** definition into a numeric weight
vector aligned to the **v2 feature space** (the columns of `school_matrix_v2.npy`).

We then compute **baseline deterministic scores and rankings** per segment.

---

## Purpose

- Create a **segment → weight vector** mapping in the same order as `X` columns
- Compute baseline scores:
  \[
  score(school) = X \cdot w_{segment}
  \]
- Produce a stable baseline ranking per segment to support:
  - tie density measurement
  - rank volatility testing
  - tier dominance checks
  - segment instability flags

This is the **reference snapshot** that all Section 01 diagnostics use.
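
A toy instance of the scoring and ranking step, assuming the linear score above and a tie-break on original row index. The matrix and weights are made-up values for illustration only.

```python
import numpy as np

# 4 schools x 2 hypothetical features
X_toy = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [0.0, 1.0],
])
w_toy = np.array([2.0, 1.0])  # segment weight vector in the same column order

scores = X_toy @ w_toy
# Score DESC, then row index ASC; np.lexsort uses the LAST key as the primary key.
order = np.lexsort((np.arange(scores.shape[0]), -scores))

print(scores.tolist())  # [2.0, 1.0, 3.0, 1.0]
print(order.tolist())   # [2, 0, 1, 3]
```

Rows 1 and 3 share a score of 1.0, and the row-index tie-break keeps row 1 ahead of row 3 on every run.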

---

## Inputs

- `X` (school feature matrix): shape `(n_schools, n_features)`
- `feature_names` aligned to matrix columns (from 01.0)
- `preference_segments_v0.json` (deterministic segment definitions)

---

## Guardrails

- ✅ No learning
- ✅ No config changes
- ✅ Deterministic only
- ✅ Fail fast if a segment references an unknown feature
- ✅ Fail fast if feature ordering does not match matrix columns

---

## Outputs (in memory)

For each segment we store:

- `w` : weight vector aligned to `feature_names`
- `scores` : array of length `n_schools`
- `order` : indices sorted by score (DESC), with stable tie-breaking

These objects enable repeatable diagnostics without recomputing inputs.

No files are written in this step.

---

## Sanity Checks

We print:
- per-segment score range (min/max)
- top-1 score
- loaded segment keys

This ensures:
- weight vectors are non-empty
- scoring behaves consistently across segments

---

> If the baseline ranking is wrong or misaligned,  
> every downstream diagnostic becomes meaningless.  
> This step ensures alignment and determinism first.


In [158]:
# 01.1 Build Segment Weight Vectors + Baseline Rankings  
# -----------------------------
# Helpers
# -----------------------------
def build_segment_weight_vector(segment_key: str) -> np.ndarray:
    """
    Convert a segment definition into a weight vector aligned to feature_names / X columns.
    Segment feature items are expected like: {"name": "...", "value": 1.0, "weight": 2.5}
    Final per-feature weight = weight * value.
    """
    seg = segments_cfg["segments"][segment_key]
    w = np.zeros(len(feature_names), dtype=float)

    for item in seg.get("features", []):
        fname = item["name"]
        if fname not in feat_to_idx:
            raise KeyError(f"Segment '{segment_key}' references unknown feature: '{fname}'")

        weight = float(item.get("weight", 0.0))
        value = float(item.get("value", 1.0))
        w[feat_to_idx[fname]] = weight * value

    if np.allclose(w, 0.0):
        raise ValueError(f"Segment '{segment_key}' produced an all-zero weight vector. Check segment config.")
    return w


def score_schools(X: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Deterministic linear score."""
    return X @ w


def rank_desc(scores: np.ndarray) -> np.ndarray:
    """
    Stable descending ranking:
    - primary: score DESC
    - secondary: original row index ASC (stable tie-breaker)
    """
    return np.lexsort((np.arange(scores.shape[0]), -scores))


# -----------------------------
# Build baselines
# -----------------------------
SEG_KEYS = list(segments_cfg["segments"].keys())
TOPK = 500  # diagnostics default (can change later)

baseline = {}  # segment -> {"w", "scores", "order"}

for seg in SEG_KEYS:
    w = build_segment_weight_vector(seg)
    scores = score_schools(X, w)
    order = rank_desc(scores)
    baseline[seg] = {"w": w, "scores": scores, "order": order}

    top1 = order[0]
    print(
        f"{seg:>20} | score range [{scores.min():.3f}, {scores.max():.3f}] "
        f"| top1 score={scores[top1]:.3f} | top1 row={top1}"
    )

print("\nSegments loaded:", SEG_KEYS)

# -----------------------------
# Optional: quick preview of Top-10 school IDs for one segment
# -----------------------------
def guess_school_id_column(df: pd.DataFrame) -> str:
    for c in ["school_id", "composite_key", "comkey", "nces_id", "id"]:
        if c in df.columns:
            return c
    return df.columns[0]

school_id_col = guess_school_id_column(index_df)
row_id_col = "row_id" if "row_id" in index_df.columns else None

if row_id_col is None:
    # assume index_df is already aligned to X rows
    index_df["_row_id_tmp"] = np.arange(len(index_df))
    row_id_col = "_row_id_tmp"

row_to_school_id = (
    index_df[[row_id_col, school_id_col]]
    .drop_duplicates(subset=[row_id_col])
    .set_index(row_id_col)[school_id_col]
)

preview_seg = "balanced_general" if "balanced_general" in baseline else SEG_KEYS[0]
top10_rows = baseline[preview_seg]["order"][:10]
top10_ids = [row_to_school_id.get(int(r), f"ROW_{r}") for r in top10_rows]

print(f"\nTop-10 preview for segment '{preview_seg}':")
for i, (r, sid) in enumerate(zip(top10_rows, top10_ids), start=1):
    print(f"{i:>2}. row={int(r):>6} | school_id={sid}")


      academic_first | score range [0.300, 12.975] | top1 score=12.975 | top1 row=123458
     small_nurturing | score range [0.250, 6.029] | top1 score=6.029 | top1 row=110025
progressive_balanced | score range [0.500, 4.091] | top1 score=4.091 | top1 row=109691
    balanced_general | score range [0.535, 5.229] | top1 score=5.229 | top1 row=110970

Segments loaded: ['academic_first', 'small_nurturing', 'progressive_balanced', 'balanced_general']

Top-10 preview for segment 'balanced_general':
 1. row=110970 | school_id=PRI_A0500573
 2. row=116977 | school_id=PRI_A1903036
 3. row=123681 | school_id=PRI_BB200574
 4. row=110100 | school_id=PRI_A0107712
 5. row=115500 | school_id=PRI_A1701457
 6. row=120234 | school_id=PRI_A9103804
 7. row=117764 | school_id=PRI_A1990171
 8. row=119128 | school_id=PRI_A2104066
 9. row=114604 | school_id=PRI_A1501451
10. row=105784 | school_id=PRI_00811562


## 01.2 Tie Density Diagnostics (Global + Per Segment) 

This step measures **tie density** — how often schools share identical scores —
which is a primary cause of user distrust (“why can’t it decide?”) and poor UX.

We compute tie metrics **within Top-K** per segment, because ties matter most
where users actually look.

---

## Purpose

- Quantify how frequently ties occur in the Top-K results
- Identify which segments produce the **largest tie groups**
- Produce a CSV artifact to support:
  - calibration decisions (Section 02)
  - redundancy analysis (Section 03)
  - blending safety (Section 05)

---

## Definitions

For a chosen `TOPK`:

- **Unique scores**: number of distinct score values in Top-K
- **Tie groups**: score values that appear more than once (count > 1)
- **Max tie group**: the largest group size among tied scores
- **% items in ties**: fraction of Top-K items belonging to tie groups

We compute these metrics per segment and also estimate a global tie rate.
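
A small worked example of these definitions, using only the standard library. The helper name `tie_summary` and the toy scores are assumptions for illustration. As in the notebook's metric, `max_tie_group` is the largest count over all scores, so it is 1 when nothing ties.

```python
from collections import Counter

def tie_summary(top_scores):
    """Tie metrics for a list of Top-K scores, following the definitions above."""
    counts = Counter(top_scores)
    tied = {s: c for s, c in counts.items() if c > 1}
    n = len(top_scores)
    return {
        "n_unique_scores": len(counts),
        "n_tie_groups": len(tied),
        "max_tie_group": max(counts.values(), default=0),
        "pct_items_in_ties": sum(tied.values()) / n if n else 0.0,
    }

# Toy Top-8: two schools tie at 5.0 and three tie at 3.0.
summary = tie_summary([5.0, 5.0, 4.0, 3.0, 3.0, 3.0, 2.0, 1.0])
print(summary)
# {'n_unique_scores': 5, 'n_tie_groups': 2, 'max_tie_group': 3, 'pct_items_in_ties': 0.625}
```

Five of the eight items sit in a tie group, hence 62.5%.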

---

## Guardrails

- ✅ No ranking changes
- ✅ No learning
- ✅ Diagnostics only
- ✅ Uses exact deterministic scores
- ✅ Writes artifacts under `reports/` and updates the run manifest

---

## Outputs / Artifacts

- `reports/notebook08_section01_tie_density_by_segment.csv`

This artifact is referenced in:
- `artifacts/notebook08/run_manifest_v1.json`

---

> If tie density is high, the system will feel arbitrary.
> Section 02 will address ties using deterministic tie-breakers and weight tuning.


In [161]:
# 01.2 Tie Density Diagnostics (Global + Per Segment) 

# -----------------------------
# Tie metrics
# -----------------------------
def tie_metrics_for_scores(scores: np.ndarray, order: np.ndarray, topk: int) -> dict:
    """
    Computes tie density metrics within the top-k ranked items.
    Uses exact float equality (works well when scores are sums of shared discrete components).
    If you later need "near ties", add rounding here (e.g., np.round(scores, 6)).
    """
    top_idx = order[:topk]
    top_scores = scores[top_idx]

    vc = pd.Series(top_scores).value_counts()

    tie_group_sizes = vc.values  # counts per unique score
    n_unique = int(len(vc))
    n_tie_groups = int((vc > 1).sum())
    max_tie = int(tie_group_sizes.max()) if len(tie_group_sizes) else 0
    pct_tied = float((vc[vc > 1].sum() / topk) if topk else 0.0)

    return {
        "topk": int(topk),
        "n_unique_scores": n_unique,
        "n_tie_groups": n_tie_groups,
        "max_tie_group": max_tie,
        "pct_items_in_ties": pct_tied,
    }

# -----------------------------
# Compute per segment
# -----------------------------
rows = []
for seg in SEG_KEYS:
    m = tie_metrics_for_scores(
        baseline[seg]["scores"],
        baseline[seg]["order"],
        TOPK
    )
    rows.append({"segment": seg, **m})

tie_df = pd.DataFrame(rows).sort_values(
    ["pct_items_in_ties", "max_tie_group"],
    ascending=False
)

tie_out = REPORTS_DIR / "notebook08_section01_tie_density_by_segment.csv"
tie_df.to_csv(tie_out, index=False)
print("Saved:", tie_out)

print("\nTie density (Top-K) by segment:")
display(tie_df)

# -----------------------------
# Global tie estimate (across all schools) using one representative segment
# -----------------------------
global_seg = "balanced_general" if "balanced_general" in baseline else SEG_KEYS[0]
global_scores = baseline[global_seg]["scores"]
global_vc = pd.Series(global_scores).value_counts()
global_tie_pct = float((global_vc[global_vc > 1].sum() / len(global_scores)))

print(f"\nGlobal tie rate across all schools (using '{global_seg}' scoring): {global_tie_pct:.3%}")

# -----------------------------
# Update run manifest (inputs + outputs)
# -----------------------------
# Reuse sha256_file() and update_manifest() from cell 01.0 instead of redefining them here.
manifest_path = ARTIFACTS_DIR / f"run_manifest_{SYSTEM_VERSION}.json"

# Record key inputs once (idempotent updates are fine)
update_manifest(
    manifest_path,
    inputs={
        "inputs.section01.school_matrix_v2": str(paths["school_matrix_v2"]),
        "inputs.section01.school_index_v2": str(paths["school_index_v2"]),
        "inputs.section01.feature_config_master_v2": str(paths["feature_config_master_v2"]),
        "inputs.section01.preference_segments_v0": str(paths["preference_segments_v0"]),
        "hash.school_matrix_v2": sha256_file(paths["school_matrix_v2"]),
        "hash.feature_config_master_v2": sha256_file(paths["feature_config_master_v2"]),
        "hash.preference_segments_v0": sha256_file(paths["preference_segments_v0"]),
    },
    outputs={
        "reports.section01.tie_density_by_segment": str(tie_out),
    },
)


Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/reports/notebook08_section01_tie_density_by_segment.csv

Tie density (Top-K) by segment:


| | segment | topk | n_unique_scores | n_tie_groups | max_tie_group | pct_items_in_ties |
|---|---|---|---|---|---|---|
| 1 | small_nurturing | 500 | 83 | 46 | 57 | 0.926 |
| 2 | progressive_balanced | 500 | 86 | 44 | 68 | 0.916 |
| 3 | balanced_general | 500 | 326 | 78 | 12 | 0.504 |
| 0 | academic_first | 500 | 343 | 46 | 20 | 0.406 |



Global tie rate across all schools (using 'balanced_general' scoring): 37.698%
Updated manifest: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/run_manifest_v1.json


## 01.3 Rank Volatility Diagnostics (Config Sensitivity) 

This step measures **rank volatility** — how much the Top-K ranking changes under
small, controlled perturbations of segment weights.

Even with determinism, a system can be **brittle** if tiny config changes produce
large reshuffles. This creates user distrust (“it feels random”) and makes
launch iteration dangerous.

---

## Purpose

- Quantify sensitivity of rankings to small weight changes
- Measure stability of Top-K membership per segment
- Identify segments that are fragile and likely require:
  - weight budget rebalancing (Section 02)
  - tie-breaker design (Section 02)
  - redundancy reduction (Section 03)

---

## Method

For each segment:

1. Compute baseline Top-K ranking using the segment weight vector `w0`
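2. Sample `N_PERTURB` perturbed vectors, drawing multiplicative noise independently per feature (`p` is the relative noise level, `NOISE` in the code):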
   \[
   w_1 = w_0 \odot (1 + \epsilon), \quad \epsilon \sim U[-p, p]
   \]
3. Recompute ranking and compare to baseline using:

### Metrics

- **Jaccard similarity (Top-K membership stability)**
  \[
  J(A,B) = \frac{|A \cap B|}{|A \cup B|}
  \]
  Values near 1.0 indicate stable Top-K membership.

- **Spearman correlation (rank ordering stability)**
  Correlation of ranks for overlapping Top-K schools.
  Values near 1.0 indicate stable ordering.
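
The perturbation step and both metrics can be sketched on toy data. The helper names and inputs below are assumptions for illustration; the Spearman helper re-ranks positions within the overlap before correlating, and assumes no duplicate IDs within an ordering.

```python
import numpy as np

def jaccard(a, b) -> float:
    """Set overlap of two ID collections; 1.0 when both are empty."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0

def spearman_on_overlap(base_order, pert_order) -> float:
    """Spearman correlation of positions, restricted to IDs present in both orderings."""
    common = sorted(set(base_order) & set(pert_order))
    base_pos = {s: r for r, s in enumerate(base_order)}
    pert_pos = {s: r for r, s in enumerate(pert_order)}
    x = np.array([base_pos[s] for s in common])
    y = np.array([pert_pos[s] for s in common])
    # Convert positions to ranks within the overlap (positions are unique, so no ties).
    xr = np.argsort(np.argsort(x)).astype(float)
    yr = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(xr, yr)[0, 1])

# Perturbation: w1 = w0 * (1 + eps), eps ~ U[-p, p] per feature.
p = 0.02
rng = np.random.default_rng(42)
w0 = np.array([2.0, 0.0, 1.0])
w1 = w0 * (1.0 + rng.uniform(-p, p, size=w0.shape[0]))

# Metrics on toy orderings of school IDs.
j = jaccard([1, 2, 3, 4], [3, 4, 5, 6])                        # 2 shared / 6 total
rho = spearman_on_overlap([10, 11, 12, 13, 14], [11, 10, 12, 14, 13])

print(round(j, 3), round(rho, 3))  # 0.333 0.8
```

Note that multiplicative noise leaves zero weights at zero, so this test only probes sensitivity among features a segment already uses.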

---

## Guardrails

- ✅ Diagnostic only (no changes to scoring logic)
- ✅ Deterministic perturbations (fixed random seed)
- ✅ Same matrix `X`, same feature ordering, same segments

---

## Outputs / Artifacts

- `reports/notebook08_section01_rank_volatility_by_segment.csv`

This artifact is referenced in the run manifest.

---

## Interpretation

- **Low Jaccard** = Top-K membership changes substantially under small perturbations → the segment is too brittle to launch as-is
- **Low Spearman** = ordering within the Top-K is unstable → needs tie-breaker design
- High tie density combined with high volatility = Section 02 fixes are urgent

---

> Determinism is not enough.  
> Launch safety requires stability under reasonable perturbations.


In [164]:
# 01.3 Rank Volatility Diagnostics (Controlled Perturbations)  ✅ [Launch-critical]

from numpy.random import default_rng
rng = default_rng(42)  # fixed seed for repeatability

# Tunables (start conservative; adjust later if needed)
N_PERTURB = 60
NOISE = 0.02  # ±2% multiplicative noise on weights

def jaccard(a: np.ndarray, b: np.ndarray) -> float:
    sa, sb = set(a.tolist()), set(b.tolist())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0

def spearman_rank_corr(base_order: np.ndarray, pert_order: np.ndarray, topk: int) -> float:
    """
    Spearman correlation computed over the intersection of base Top-K and perturbed Top-K.
    If overlap is too small, return NaN.
    """
    base_top = base_order[:topk]
    pert_top = pert_order[:topk]
    common = list(set(base_top.tolist()) & set(pert_top.tolist()))

    # Require a minimum overlap to make the correlation meaningful
    if len(common) < max(10, topk // 10):
        return np.nan

    base_rank = {idx: r for r, idx in enumerate(base_order)}
    pert_rank = {idx: r for r, idx in enumerate(pert_order)}

    x = np.array([base_rank[i] for i in common])
    y = np.array([pert_rank[i] for i in common])

    # Spearman via ranking of ranks
    xr = pd.Series(x).rank().to_numpy()
    yr = pd.Series(y).rank().to_numpy()

    if xr.std() == 0 or yr.std() == 0:
        return np.nan

    return float(np.corrcoef(xr, yr)[0, 1])

vol_rows = []

for seg in SEG_KEYS:
    w0 = baseline[seg]["w"]
    base_order = baseline[seg]["order"]
    base_top = base_order[:TOPK]

    jaccs = []
    spcorrs = []

    for _ in range(N_PERTURB):
        eps = rng.uniform(-NOISE, NOISE, size=w0.shape[0])
        w1 = w0 * (1.0 + eps)

        s1 = score_schools(X, w1)
        o1 = rank_desc(s1)

        jaccs.append(jaccard(base_top, o1[:TOPK]))
        spcorrs.append(spearman_rank_corr(base_order, o1, TOPK))

    vol_rows.append({
        "segment": seg,
        "topk": int(TOPK),
        "n_perturb": int(N_PERTURB),
        "noise_pct": float(NOISE),
        "jaccard_topk_mean": float(np.nanmean(jaccs)),
        "jaccard_topk_p10": float(np.nanpercentile(jaccs, 10)),
        "jaccard_topk_p90": float(np.nanpercentile(jaccs, 90)),
        "spearman_mean": float(np.nanmean(spcorrs)),
        "spearman_p10": float(np.nanpercentile([v for v in spcorrs if not np.isnan(v)], 10))
                         if np.any(~np.isnan(spcorrs)) else np.nan,
    })

vol_df = pd.DataFrame(vol_rows).sort_values(
    ["jaccard_topk_mean", "spearman_mean"],
    ascending=True
)

vol_out = REPORTS_DIR / "notebook08_section01_rank_volatility_by_segment.csv"
vol_df.to_csv(vol_out, index=False)
print("Saved:", vol_out)

print("\nRank volatility by segment (lower = more fragile):")
display(vol_df)

# -----------------------------
# Update run manifest
# -----------------------------
manifest_path = ARTIFACTS_DIR / f"run_manifest_{SYSTEM_VERSION}.json"
m = load_json(manifest_path)
m.setdefault("outputs", {})
m["outputs"].update({
    "reports.section01.rank_volatility_by_segment": str(vol_out),
})
with open(manifest_path, "w") as f:
    json.dump(m, f, indent=2)

print("Updated manifest:", manifest_path)


Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/reports/notebook08_section01_rank_volatility_by_segment.csv

Rank volatility by segment (lower = more fragile):


|   | segment | topk | n_perturb | noise_pct | jaccard_topk_mean | jaccard_topk_p10 | jaccard_topk_p90 | spearman_mean | spearman_p10 |
|---|---|---|---|---|---|---|---|---|---|
| 3 | balanced_general | 500 | 60 | 0.02 | 0.99535 | 0.992032 | 1.0 | 0.999626 | 0.999104 |
| 1 | small_nurturing | 500 | 60 | 0.02 | 0.997872 | 0.996008 | 1.0 | 0.998172 | 0.994453 |
| 2 | progressive_balanced | 500 | 60 | 0.02 | 0.999202 | 0.996008 | 1.0 | 0.999922 | 0.999849 |
| 0 | academic_first | 500 | 60 | 0.02 | 1.0 | 1.0 | 1.0 | 0.999858 | 0.999686 |


Updated manifest: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/run_manifest_v1.json
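
To read these Jaccard values in concrete terms: if both Top-K sets have K members and m schools are swapped out, then J = (K - m) / (K + m), which gives m = K(1 - J) / (1 + J). The sketch below applies that identity to values copied from the table above; the helper name `swaps_from_jaccard` is hypothetical.

```python
def swaps_from_jaccard(j: float, k: int) -> float:
    """Number of swapped schools implied by a Jaccard value when both sets have size k."""
    return k * (1.0 - j) / (1.0 + j)

K = 500
observed = {
    "balanced_general p10": 0.992032,
    "small_nurturing p10": 0.996008,
    "balanced_general mean": 0.99535,
}
for label, j in observed.items():
    print(f"{label}: J={j} -> about {swaps_from_jaccard(j, K):.2f} of {K} schools swapped")
```

The two p10 values correspond almost exactly to 2 and 1 swapped schools out of 500. The conversion is non-linear, so applying it to a mean Jaccard gives only a rough figure.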


## 01.4 Tier Dominance / Suppression Checks 

This step checks whether **tier signals** (e.g., IB, CAIS, Montessori, Waldorf)
behave predictably inside Top-K rankings.

Tier tags are **high-trust signals** in the system and must satisfy launch
expectations:

- They should not be accidentally **overpowering** the ranking
- They should not be unexpectedly **muted** when a segment clearly values them
- They should not behave inconsistently across segments

---

## Purpose

- Measure how often each tier appears in Top-K vs its global prevalence
- Flag tiers that are:
  - **dominant** (overrepresented in Top-K)
  - **suppressed** (underrepresented in Top-K)
- Identify segment-specific tier anomalies

---

## Method

For each segment and each tier flag:

- Compute:
  - `topk_rate` = fraction of Top-K schools having the tier flag
  - `global_rate` = fraction across all schools
  - `dominance_ratio = topk_rate / global_rate`

We then flag:

- **dominant** if `dominance_ratio >= 3.0`
- **suppressed** if `dominance_ratio <= 0.33`

These thresholds are heuristic “smoke alarms,” not judgments.

---

## Guardrails

- ✅ Diagnostic only
- ✅ Does not change any ranking logic
- ✅ Uses explicit tier flags from `schools_master_v2.csv`

---

## Outputs / Artifacts

- `/reports/notebook08_section01_tier_dominance_by_segment.csv`

This artifact is referenced in the run manifest.

---

> Tier tags are meant to be trusted anchors.
> If they dominate unintentionally, the system feels biased.
> If they vanish unexpectedly, the system feels broken.


In [166]:
# 01.4 Tier Dominance / Suppression Checks  

# -----------------------------
# Detect tier columns
# -----------------------------
tier_candidates = [
    "tag_ib", "tag_cais", "tag_ams_montessori", "tag_waldorf",
    "has_ib", "has_cais", "has_ams_montessori", "has_waldorf",
    "is_ib", "is_cais"
]
tier_cols = [c for c in tier_candidates if c in schools_master_df.columns]

if not tier_cols:
    print("WARNING: No tier flag columns detected in schools_master_v2.csv. Skipping tier checks.")
else:
    print("Tier columns detected:", tier_cols)

# -----------------------------
# Align schools_master rows to matrix rows via school_id
# -----------------------------
def guess_key_col(df: pd.DataFrame) -> str:
    for c in ["school_id", "composite_key", "comkey", "nces_id", "id"]:
        if c in df.columns:
            return c
    return df.columns[0]

index_key_col = school_id_col  # from 01.1
master_key_col = index_key_col if index_key_col in schools_master_df.columns else guess_key_col(schools_master_df)

# Build row-aligned table: one row per matrix row in X
row_ids = np.arange(X.shape[0])
school_ids = pd.Series([row_to_school_id.get(int(r), f"ROW_{r}") for r in row_ids], name=master_key_col)
tier_frame = pd.DataFrame({master_key_col: school_ids})

if tier_cols:
    tier_frame = tier_frame.merge(
        schools_master_df[[master_key_col] + tier_cols].drop_duplicates(subset=[master_key_col]),
        how="left",
        on=master_key_col,
    )

    # Fill missing with 0 (unknown treated as not-in-tier)
    for c in tier_cols:
        tier_frame[c] = tier_frame[c].fillna(0).astype(int)

    global_rates = {c: float(tier_frame[c].mean()) for c in tier_cols}

    # -----------------------------
    # Compute dominance ratios per segment
    # -----------------------------
    DOM_HIGH = 3.0
    DOM_LOW = 0.33

    dom_rows = []
    for seg in SEG_KEYS:
        top_rows = baseline[seg]["order"][:TOPK]
        top_tiers = tier_frame.iloc[top_rows]

        for c in tier_cols:
            top_rate = float(top_tiers[c].mean())
            glob = global_rates[c] if global_rates[c] > 0 else np.nan
            ratio = (top_rate / glob) if (glob and not np.isnan(glob) and glob > 0) else np.nan

            flag = ""
            if not np.isnan(ratio):
                if ratio >= DOM_HIGH:
                    flag = "dominant"
                elif ratio <= DOM_LOW:
                    flag = "suppressed"

            dom_rows.append({
                "segment": seg,
                "tier_col": c,
                "topk": int(TOPK),
                "topk_rate": top_rate,
                "global_rate": glob,
                "dominance_ratio": ratio,
                "flag": flag,
            })

    dom_df = pd.DataFrame(dom_rows).sort_values(
        ["flag", "dominance_ratio"],
        ascending=[True, False]
    )

    dom_out = REPORTS_DIR / "notebook08_section01_tier_dominance_by_segment.csv"
    dom_df.to_csv(dom_out, index=False)
    print("Saved:", dom_out)

    print("\nTier dominance / suppression flags (flag != ''):")
    display(dom_df[dom_df["flag"] != ""].sort_values(["segment", "tier_col"]))

    # -----------------------------
    # Update run manifest
    # -----------------------------
    manifest_path = ARTIFACTS_DIR / f"run_manifest_{SYSTEM_VERSION}.json"
    m = load_json(manifest_path)
    m.setdefault("outputs", {})
    m["outputs"].update({
        "reports.section01.tier_dominance_by_segment": str(dom_out),
    })
    with open(manifest_path, "w") as f:
        json.dump(m, f, indent=2)
    print("Updated manifest:", manifest_path)

Tier columns detected: ['has_ib', 'has_cais', 'has_ams_montessori', 'has_waldorf']
Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/reports/notebook08_section01_tier_dominance_by_segment.csv

Tier dominance / suppression flags (flag != ''):


|   | segment | tier_col | topk | topk_rate | global_rate | dominance_ratio | flag |
|---|---|---|---|---|---|---|---|
| 2 | academic_first | has_ams_montessori | 500 | 0.002 | 4e-05 | 49.8476 | dominant |
| 1 | academic_first | has_cais | 500 | 0.146 | 0.000586 | 249.238 | dominant |
| 0 | academic_first | has_ib | 500 | 0.066 | 0.000265 | 249.238 | dominant |
| 3 | academic_first | has_waldorf | 500 | 0.0 | 0.00012 | 0.0 | suppressed |
| 14 | balanced_general | has_ams_montessori | 500 | 0.0 | 4e-05 | 0.0 | suppressed |
| 13 | balanced_general | has_cais | 500 | 0.0 | 0.000586 | 0.0 | suppressed |
| 12 | balanced_general | has_ib | 500 | 0.0 | 0.000265 | 0.0 | suppressed |
| 15 | balanced_general | has_waldorf | 500 | 0.0 | 0.00012 | 0.0 | suppressed |
| 10 | progressive_balanced | has_ams_montessori | 500 | 0.01 | 4e-05 | 249.238 | dominant |
| 9 | progressive_balanced | has_cais | 500 | 0.0 | 0.000586 | 0.0 | suppressed |


Updated manifest: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/run_manifest_v1.json


In [167]:
# 01.4b Patch: Tier Dominance vs "Known Coverage" Baseline
#
# Problem: global_rate can be artificially tiny if many schools have tier flags missing / unknown
#          and we filled NaN -> 0. This inflates dominance ratios and creates misleading flags.
#
# Fix: compute:
#   - global_rate_all   : across all rows (current behavior)
#   - global_rate_known : across rows where the tier column was actually known (non-null before fill)
#   - known_coverage_rate: fraction of rows where the tier column was known
#
# Output: a new CSV with both baselines and updated flags based on global_rate_known.

if not tier_cols:
    print("No tier columns detected earlier. Nothing to patch.")
else:
    # Rebuild a tier frame WITHOUT filling NaNs first, so we can measure "known coverage"
    row_ids = np.arange(X.shape[0])
    school_ids = pd.Series([row_to_school_id.get(int(r), f"ROW_{r}") for r in row_ids], name=master_key_col)
    tier_raw = pd.DataFrame({master_key_col: school_ids})

    tier_raw = tier_raw.merge(
        schools_master_df[[master_key_col] + tier_cols].drop_duplicates(subset=[master_key_col]),
        how="left",
        on=master_key_col,
    )

    # Known coverage mask per tier column (True if not null before fill)
    known_mask = {c: tier_raw[c].notna().to_numpy() for c in tier_cols}

    # Now create filled/int version for rate calculations
    tier_filled = tier_raw.copy()
    for c in tier_cols:
        tier_filled[c] = tier_filled[c].fillna(0).astype(int)

    # Baselines
    global_rate_all = {c: float(tier_filled[c].mean()) for c in tier_cols}

    global_rate_known = {}
    known_coverage_rate = {}
    for c in tier_cols:
        km = known_mask[c]
        known_coverage_rate[c] = float(km.mean())
        if km.sum() == 0:
            global_rate_known[c] = np.nan
        else:
            global_rate_known[c] = float(tier_filled.loc[km, c].mean())

    # Compute dominance ratios per segment vs both baselines
    DOM_HIGH = 3.0
    DOM_LOW = 0.33

    rows = []
    for seg in SEG_KEYS:
        top_rows = baseline[seg]["order"][:TOPK]
        top_tiers = tier_filled.iloc[top_rows]

        for c in tier_cols:
            top_rate = float(top_tiers[c].mean())

            gr_all = global_rate_all[c] if global_rate_all[c] > 0 else np.nan
            gr_known = global_rate_known[c] if (global_rate_known[c] is not None and not np.isnan(global_rate_known[c]) and global_rate_known[c] > 0) else np.nan

            ratio_all = (top_rate / gr_all) if (gr_all and not np.isnan(gr_all)) else np.nan
            ratio_known = (top_rate / gr_known) if (gr_known and not np.isnan(gr_known)) else np.nan

            # Flags should be based on the known baseline (more honest)
            flag_known = ""
            if not np.isnan(ratio_known):
                if ratio_known >= DOM_HIGH:
                    flag_known = "dominant"
                elif ratio_known <= DOM_LOW:
                    flag_known = "suppressed"

            rows.append({
                "segment": seg,
                "tier_col": c,
                "topk": int(TOPK),
                "topk_rate": top_rate,

                "global_rate_all": gr_all,
                "dominance_ratio_all": ratio_all,

                "known_coverage_rate": known_coverage_rate[c],
                "global_rate_known": gr_known,
                "dominance_ratio_known": ratio_known,
                "flag_known": flag_known,
            })

    dom2_df = pd.DataFrame(rows).sort_values(
        ["flag_known", "dominance_ratio_known"],
        ascending=[True, False]
    )

    dom2_out = REPORTS_DIR / "notebook08_section01_tier_dominance_by_segment_v2.csv"
    dom2_df.to_csv(dom2_out, index=False)
    print("Saved:", dom2_out)

    print("\nTier dominance flags using KNOWN baseline (flag_known != ''):")
    display(dom2_df[dom2_df["flag_known"] != ""].sort_values(["segment", "tier_col"]))

    # Also print baseline coverage summary (this is the key sanity check)
    coverage_summary = pd.DataFrame([
        {
            "tier_col": c,
            "known_coverage_rate": known_coverage_rate[c],
            "global_rate_all": global_rate_all[c],
            "global_rate_known": global_rate_known[c],
        }
        for c in tier_cols
    ]).sort_values("known_coverage_rate", ascending=True)

    print("\nTier baseline coverage summary:")
    display(coverage_summary)

    # Update manifest with the new artifact
    manifest_path = ARTIFACTS_DIR / f"run_manifest_{SYSTEM_VERSION}.json"
    m = load_json(manifest_path)
    m.setdefault("outputs", {})
    m["outputs"].update({
        "reports.section01.tier_dominance_by_segment_v2": str(dom2_out),
    })
    with open(manifest_path, "w") as f:
        json.dump(m, f, indent=2)
    print("Updated manifest:", manifest_path)

# sanity: how many schools have each tier flag?
tier_counts = {c: int((tier_frame[c] == 1).sum()) for c in tier_cols}
tier_counts

Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/reports/notebook08_section01_tier_dominance_by_segment_v2.csv

Tier dominance flags using KNOWN baseline (flag_known != ''):


|   | segment | tier_col | topk | topk_rate | global_rate_all | dominance_ratio_all | known_coverage_rate | global_rate_known | dominance_ratio_known | flag_known |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | academic_first | has_ams_montessori | 500 | 0.002 | 4e-05 | 49.8476 | 1.0 | 4e-05 | 49.8476 | dominant |
| 1 | academic_first | has_cais | 500 | 0.146 | 0.000586 | 249.238 | 1.0 | 0.000586 | 249.238 | dominant |
| 0 | academic_first | has_ib | 500 | 0.066 | 0.000265 | 249.238 | 1.0 | 0.000265 | 249.238 | dominant |
| 3 | academic_first | has_waldorf | 500 | 0.0 | 0.00012 | 0.0 | 1.0 | 0.00012 | 0.0 | suppressed |
| 14 | balanced_general | has_ams_montessori | 500 | 0.0 | 4e-05 | 0.0 | 1.0 | 4e-05 | 0.0 | suppressed |
| 13 | balanced_general | has_cais | 500 | 0.0 | 0.000586 | 0.0 | 1.0 | 0.000586 | 0.0 | suppressed |
| 12 | balanced_general | has_ib | 500 | 0.0 | 0.000265 | 0.0 | 1.0 | 0.000265 | 0.0 | suppressed |
| 15 | balanced_general | has_waldorf | 500 | 0.0 | 0.00012 | 0.0 | 1.0 | 0.00012 | 0.0 | suppressed |
| 10 | progressive_balanced | has_ams_montessori | 500 | 0.01 | 4e-05 | 249.238 | 1.0 | 4e-05 | 249.238 | dominant |
| 9 | progressive_balanced | has_cais | 500 | 0.0 | 0.000586 | 0.0 | 1.0 | 0.000586 | 0.0 | suppressed |



Tier baseline coverage summary:


|   | tier_col | known_coverage_rate | global_rate_all | global_rate_known |
|---|---|---|---|---|
| 0 | has_ib | 1.0 | 0.000265 | 0.000265 |
| 1 | has_cais | 1.0 | 0.000586 | 0.000586 |
| 2 | has_ams_montessori | 1.0 | 4e-05 | 4e-05 |
| 3 | has_waldorf | 1.0 | 0.00012 | 0.00012 |


Updated manifest: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/run_manifest_v1.json


{'has_ib': 33, 'has_cais': 73, 'has_ams_montessori': 5, 'has_waldorf': 15}
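
These counts put the flags above in context. With so few flagged schools, a single one appearing in the Top-K already yields `dominance_ratio = (1 / TOPK) / (n_tier / N) = N / (TOPK * n_tier)`. The sketch below computes that smallest non-zero ratio per tier. It assumes N = 124,619 rows, a value not printed in this section but consistent with the 249.238 ratio above (124,619 / 500); the variable names are illustrative.

```python
N_SCHOOLS = 124_619   # assumption: row count of X, consistent with 249.238 = N / 500
TOPK = 500
DOM_HIGH = 3.0

# Counts copied from the sanity-check output above
tier_counts = {"has_ib": 33, "has_cais": 73, "has_ams_montessori": 5, "has_waldorf": 15}

# Dominance ratio when exactly one flagged school lands in the Top-K
min_nonzero_ratio = {c: N_SCHOOLS / (TOPK * n) for c, n in tier_counts.items()}

for c, r in min_nonzero_ratio.items():
    status = "already dominant" if r >= DOM_HIGH else "below threshold"
    print(f"{c}: one school in Top-{TOPK} -> ratio {r:.2f} ({status})")
```

Under these assumptions every tier exceeds `DOM_HIGH` as soon as one flagged school is present, and reads as suppressed when none is. At this prevalence the flag therefore behaves like a presence/absence indicator, so the raw counts in the Top-K may be worth reading alongside the ratio.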

## 01.5 Segment Instability Summary + Human Report 

This step consolidates Section 01 diagnostics into a single **segment-level
instability summary** and generates a **human-readable report** suitable for:

- startup launch documentation (what risks exist and why)
- capstone narrative (rigorous robustness analysis without ML authority)

---

## Purpose

- Merge key metrics into one table per segment:
  - Tie density (Section 01.2)
  - Rank volatility (Section 01.3)
- Assign a simple **risk label** per segment to guide Section 02 fixes:
  - `LOW`, `MED`, `HIGH`

---

## Risk Heuristic (Transparent)

We mark risk using simple, explicit thresholds:

### Tie risk triggers if:
- `% items in ties >= 25%` OR
- `max tie group >= 20`

### Volatility risk triggers if:
- `jaccard_topk_mean <= 0.85` OR
- `spearman_mean <= 0.85`

Risk label:
- `HIGH` if tie risk AND volatility risk
- `MED` if either one is triggered
- `LOW` if neither is triggered

This heuristic is intentionally conservative and easy to audit.

---

## Guardrails

- ✅ No ranking changes
- ✅ No learning
- ✅ Produces artifacts only
- ✅ All thresholds are explicit and human-readable

---

## Outputs / Artifacts

1) `/reports/notebook08_section01_segment_instability_summary.csv`  
2) `/reports/notebook08_section01_failure_mode_report.md`

Both are referenced in the run manifest.

---

> Section 01 finds the problems.  
> Section 02 applies deterministic fixes under guardrails.


In [169]:
# 01.5 Segment Instability Summary + Human Report 

# -----------------------------
# Load prior artifacts (or use in-memory tables if present)
# -----------------------------
# tie_df and vol_df should already exist in memory from 01.2 and 01.3.
# If not, uncomment these:
# tie_df = pd.read_csv(REPORTS_DIR / "notebook08_section01_tie_density_by_segment.csv")
# vol_df = pd.read_csv(REPORTS_DIR / "notebook08_section01_rank_volatility_by_segment.csv")

summary_df = tie_df.merge(vol_df, on=["segment", "topk"], how="left")

# -----------------------------
# Risk heuristic (explicit + auditable)
# -----------------------------
TIE_PCT_THRESH = 0.25
TIE_MAX_GROUP_THRESH = 20
VOL_JACC_THRESH = 0.85
VOL_SPEAR_THRESH = 0.85

def risk_label(row) -> str:
    tie_risk = (row["pct_items_in_ties"] >= TIE_PCT_THRESH) or (row["max_tie_group"] >= TIE_MAX_GROUP_THRESH)
    vol_risk = (row["jaccard_topk_mean"] <= VOL_JACC_THRESH) or (
        (not pd.isna(row["spearman_mean"])) and (row["spearman_mean"] <= VOL_SPEAR_THRESH)
    )
    if tie_risk and vol_risk:
        return "HIGH"
    if tie_risk or vol_risk:
        return "MED"
    return "LOW"

summary_df["risk"] = summary_df.apply(risk_label, axis=1)

# Stable sort for readability
summary_df = summary_df.sort_values(
    ["risk", "pct_items_in_ties", "jaccard_topk_mean"],
    ascending=[True, False, True]
)

summary_out = REPORTS_DIR / "notebook08_section01_segment_instability_summary.csv"
summary_df.to_csv(summary_out, index=False)
print("Saved:", summary_out)

display(summary_df)

# -----------------------------
# Build human-readable markdown report
# -----------------------------
def pct(x: float) -> str:
    return f"{100*x:.1f}%"

lines = []
lines.append("# Notebook 08 — Section 01 Failure Mode Report\n")
lines.append(f"- Run timestamp (UTC): `{RUN_TS}`")
lines.append(f"- Top-K analyzed: `{TOPK}`")
lines.append(f"- Segments analyzed: `{', '.join(SEG_KEYS)}`\n")

lines.append("## Headline Findings\n")

# Identify highest tie density segments
tie_rank = summary_df.sort_values(["pct_items_in_ties", "max_tie_group"], ascending=False)
top_tie = tie_rank.head(2)

lines.append("### Tie Density (Top-K)\n")
for _, r in top_tie.iterrows():
    lines.append(
        f"- **{r['segment']}**: items-in-ties={pct(r['pct_items_in_ties'])}, "
        f"max_tie_group={int(r['max_tie_group'])}, unique_scores={int(r['n_unique_scores'])}/{int(r['topk'])}"
    )

# Volatility
vol_rank = summary_df.sort_values(["jaccard_topk_mean", "spearman_mean"], ascending=True)
most_fragile = vol_rank.head(2)

lines.append(f"\n### Rank Volatility (±{100 * NOISE:g}% weight noise)\n")  # follows the NOISE tunable from 01.3
for _, r in most_fragile.iterrows():
    lines.append(
        f"- **{r['segment']}**: jaccard_mean={r['jaccard_topk_mean']:.3f}, spearman_mean={r['spearman_mean']:.3f}"
    )

lines.append("\n## Interpretation\n")
lines.append(
    "- Rankings are **highly stable** under small weight perturbations (low brittleness risk).\n"
    "- The launch-critical issue is **tie density**, especially in certain segments.\n"
    "- Section 02 will introduce deterministic tie-breakers and calibration tweaks without ML authority."
)

lines.append("\n## Artifacts (Saved to /reports)\n")
lines.append(f"- Tie density: `{tie_out.name}`")
lines.append(f"- Rank volatility: `{vol_out.name}`")
lines.append(f"- Tier dominance (original): `notebook08_section01_tier_dominance_by_segment.csv`")
lines.append(f"- Tier dominance (baseline patch): `notebook08_section01_tier_dominance_by_segment_v2.csv`")
lines.append(f"- Segment instability summary: `{summary_out.name}`\n")

lines.append("## Risk Table (Summary)\n")
lines.append(summary_df[["segment", "risk", "pct_items_in_ties", "max_tie_group", "jaccard_topk_mean", "spearman_mean"]]
             .to_markdown(index=False))

report_out = REPORTS_DIR

Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/reports/notebook08_section01_segment_instability_summary.csv


|   | segment | topk | n_unique_scores | n_tie_groups | max_tie_group | pct_items_in_ties | n_perturb | noise_pct | jaccard_topk_mean | jaccard_topk_p10 | jaccard_topk_p90 | spearman_mean | spearman_p10 | risk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | small_nurturing | 500 | 83 | 46 | 57 | 0.926 | 60 | 0.02 | 0.997872 | 0.996008 | 1.0 | 0.998172 | 0.994453 | MED |
| 1 | progressive_balanced | 500 | 86 | 44 | 68 | 0.916 | 60 | 0.02 | 0.999202 | 0.996008 | 1.0 | 0.999922 | 0.999849 | MED |
| 2 | balanced_general | 500 | 326 | 78 | 12 | 0.504 | 60 | 0.02 | 0.99535 | 0.992032 | 1.0 | 0.999626 | 0.999104 | MED |
| 3 | academic_first | 500 | 343 | 46 | 20 | 0.406 | 60 | 0.02 | 1.0 | 1.0 | 1.0 | 0.999858 | 0.999686 | MED |
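
Every segment lands on `MED` because the tie trigger fires for all four segments and the volatility trigger fires for none. A minimal sketch that replays the two triggers on values copied from the table above, with thresholds as set in the cell (the frame name `rows` is illustrative):

```python
import pandas as pd

# Values copied from the summary table above
rows = pd.DataFrame({
    "segment": ["small_nurturing", "progressive_balanced", "balanced_general", "academic_first"],
    "pct_items_in_ties": [0.926, 0.916, 0.504, 0.406],
    "max_tie_group": [57, 68, 12, 20],
    "jaccard_topk_mean": [0.997872, 0.999202, 0.99535, 1.0],
    "spearman_mean": [0.998172, 0.999922, 0.999626, 0.999858],
})

# Same thresholds as the risk heuristic
rows["tie_risk"] = (rows["pct_items_in_ties"] >= 0.25) | (rows["max_tie_group"] >= 20)
rows["vol_risk"] = (rows["jaccard_topk_mean"] <= 0.85) | (rows["spearman_mean"] <= 0.85)

print(rows[["segment", "tie_risk", "vol_risk"]])
```

`balanced_general` trips the tie trigger on the percentage alone (its largest tie group, 12, is below 20), while the other three segments satisfy both tie conditions.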


## 02.0 Tie-Breaker Policy (Deterministic Completion) 

This section defines the **explicit, deterministic tie-breaker policy** used to
complete rankings when multiple schools receive the same primary segment score.

The goal is **not to change what matters**, but to **finish the ordering**
in a way that is stable, explainable, and safe for launch.

---

## Why a Tie-Breaker Policy Is Required

Section 01 showed that:

- Rankings are **stable** (low volatility)
- But many segments produce **large tie groups**
- Users would see dozens of schools listed as “equal”

This creates confusion and undermines trust, even when the scoring logic is correct.

A tie-breaker policy **completes** the ranking without changing its intent.

---

## Design Principles (Non-Negotiable)

The tie-breaker must be:

- ✅ **Deterministic**  
  Same inputs always produce the same order
- ✅ **Explainable in plain language**
- ✅ **Numerically tiny**  
  Cannot override the primary score
- ✅ **Segment-agnostic**  
  Does not favor any pedagogy or tier
- ✅ **Safe under version changes**
- ❌ **Not random**
- ❌ **Not learned**
- ❌ **Not outcome-driven**

---

## Tie-Breaker Chain (Authoritative)

When two schools have the same **primary segment score**, ordering is resolved
using the following steps, in order:

---

### Step 1 — Primary Segment Score (Baseline)

The existing deterministic score from Notebook 07:

\[
score_{primary} = X \cdot w_{segment}
\]

This remains the **dominant signal** and is never overridden.

---

### Step 2 — Data Completeness Bonus (Tiny, Deterministic)

If primary scores are equal, apply a **small deterministic bonus**
based on how much information is available for the school.

**Conceptual rule (human explanation):**

> “If two schools score the same, we slightly prefer the one where we have
> more contributing information.”

**Implementation rule (numeric):**

\[
bonus_{data} = \epsilon \times (\text{count of contributing features})
\]

Where:
- `count of contributing features` = number of non-zero feature values
- \(\epsilon\) is very small (e.g., `0.0001`)

**Important properties:**
- Too small to change Top-K membership
- Too small to override tier logic
- Large enough to break perfect ties
- Fully deterministic

This bonus exists **only to complete ordering**, not to improve quality.
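
One detail worth making explicit: the bonus is guaranteed not to override the primary score only if it is compared after the primary score, rather than added to it. The sketch below contrasts the two approaches on two hypothetical schools. It is an illustration under assumed values, not the notebook's ranking code.

```python
import numpy as np

EPSILON = 1e-4

# Hypothetical pair: school 1 has the (slightly) higher primary score,
# school 0 has more contributing features.
scores = np.array([1.0000, 1.0003])
counts = np.array([8, 2])
bonus = EPSILON * counts

# Additive use: the bonus is folded into the score before sorting
additive_order = np.argsort(-(scores + bonus), kind="stable")

# Lexicographic use: the bonus is consulted only when primary scores are equal
# (np.lexsort treats the LAST key as the primary sort key)
lexicographic_order = np.lexsort((-bonus, -scores))

print("additive:     ", additive_order.tolist())
print("lexicographic:", lexicographic_order.tolist())
```

In the additive variant school 0 overtakes school 1 even though its primary score is lower, because the score gap (0.0003) is smaller than the bonus gap (0.0006). The lexicographic variant keeps school 1 first, which matches the stated principle that the primary score is never overridden.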

---

### Step 3 — Stable ID Fallback (Final Determinism)

If schools are still tied after Steps 1 and 2:

- Order by a stable identifier (e.g., `school_id`, `comkey`) in ascending order

**Rationale:**
- Honest acknowledgment of true equivalence
- Guarantees a total ordering
- Prevents silent randomness

---

## Explicitly Excluded from Tie-Breaking

The following are **not allowed** as tie-breakers:

- Tier tags (`IB`, `CAIS`, `Waldorf`, `Montessori`)
- Rare binary indicators
- Randomness of any kind (seeded or unseeded)
- Learned re-ranking models
- User behavior signals

These remain part of **primary scoring only** (where applicable).

---

## User-Facing Explanation (Approved)

If asked why one school appears above another:

> “They scored the same on what you care about.  
> When that happens, we slightly prefer schools where we have more complete
> information, and if they’re still equal, we use a stable ordering so the
> results don’t jump around.”

This explanation is:
- truthful
- simple
- consistent across segments

---

## Outcome of This Policy

After applying this policy:

- Rankings remain **deterministic**
- Large tie blocks collapse into stable orderings
- User trust improves
- No ML authority is introduced

This policy enables safe launch without compromising system integrity.

---

> Section 01 diagnosed the problem.  
> Section 02 completes the decision — explicitly and safely.


In [175]:
# 02.0 Tie-Breaker Policy (Deterministic Completion) 
#
# This cell:
# - Encodes the tie-breaker policy as config + small helper functions
# - Does NOT modify primary scoring logic
# - Produces no rankings yet (that happens in 02.1)
# - Persists the policy to a versioned JSON artifact and updates the run manifest


# -----------------------------
# Policy definition
# -----------------------------
@dataclass(frozen=True)
class TieBreakerPolicy:
    policy_version: str
    epsilon: float                  # tiny bonus multiplier
    count_mode: str                 # "nonzero_features" (for now)
    stable_id_col: str              # e.g., "school_id" or "comkey"
    use_dense_secondary: bool       # reserved (false for now; may enable later)
    dense_secondary_cols: List[str] # reserved (empty for now)

POLICY = TieBreakerPolicy(
    policy_version="tb_v1",
    epsilon=1e-4,  # 0.0001
    count_mode="nonzero_features",
    stable_id_col=school_id_col,    # inferred in 01.1 from index_df
    use_dense_secondary=False,
    dense_secondary_cols=[],
)

print("Tie-breaker policy:")
print(POLICY)

# -----------------------------
# Helper: data completeness / contributing feature count
# -----------------------------
def contributing_feature_count(X: np.ndarray, mode: str = "nonzero_features") -> np.ndarray:
    """
    Returns an integer array of length n_schools.
    For v2, X is mostly binary/dense scores, so "nonzero_features" is a reasonable proxy
    for 'how many signals contributed' to the school representation.

    NOTE: This is used ONLY as a tiny tie-breaker bonus, never as a primary ranking driver.
    """
    if mode == "nonzero_features":
        # Treat any non-zero value as "contributes"
        return (X != 0).sum(axis=1).astype(int)
    raise ValueError(f"Unknown count_mode: {mode}")

# Precompute once for efficiency (used later in 02.1)
contrib_count = contributing_feature_count(X, POLICY.count_mode)
contrib_bonus = POLICY.epsilon * contrib_count

print("\nContributing-feature count stats:")
print(pd.Series(contrib_count).describe())

print("\nBonus range:")
print(f"min_bonus={contrib_bonus.min():.6f}  max_bonus={contrib_bonus.max():.6f}")

# Sanity: ensure bonus is tiny relative to primary score magnitudes
# (We use one segment as reference.)
ref_seg = "balanced_general" if "balanced_general" in baseline else SEG_KEYS[0]
ref_scores = baseline[ref_seg]["scores"]
print("\nReference primary score scale:")
print(f"segment='{ref_seg}' score_range=[{ref_scores.min():.3f}, {ref_scores.max():.3f}]")
print(f"max_bonus / score_range ≈ {contrib_bonus.max() / max(1e-9, (ref_scores.max()-ref_scores.min())):.6f}")

# -----------------------------
# Persist policy artifact
# -----------------------------
policy_out = ARTIFACTS_DIR / "tie_breaker_policy_tb_v1.json"
policy_out.parent.mkdir(parents=True, exist_ok=True)

with open(policy_out, "w") as f:
    json.dump(asdict(POLICY), f, indent=2)

print("\nSaved:", policy_out)

# -----------------------------
# Update run manifest
# -----------------------------
manifest_path = ARTIFACTS_DIR / f"run_manifest_{SYSTEM_VERSION}.json"
m = load_json(manifest_path)
m.setdefault("inputs", {})
m.setdefault("outputs", {})

m["inputs"].update({
    "inputs.section02.tie_breaker_policy_version": POLICY.policy_version,
    "inputs.section02.tie_breaker_epsilon": POLICY.epsilon,
    "inputs.section02.tie_breaker_count_mode": POLICY.count_mode,
    "inputs.section02.tie_breaker_stable_id_col": POLICY.stable_id_col,
})

m["outputs"].update({
    "artifacts.section02.tie_breaker_policy_json": str(policy_out),
})

with open(manifest_path, "w") as f:
    json.dump(m, f, indent=2)

print("Updated manifest:", manifest_path)

Tie-breaker policy:
TieBreakerPolicy(policy_version='tb_v1', epsilon=0.0001, count_mode='nonzero_features', stable_id_col='school_id', use_dense_secondary=False, dense_secondary_cols=[])

Contributing-feature count stats:
count    124619.000000
mean          4.341328
std           0.879029
min           2.000000
25%           4.000000
50%           4.000000
75%           5.000000
max           8.000000
dtype: float64

Bonus range:
min_bonus=0.000200  max_bonus=0.000800

Reference primary score scale:
segment='balanced_general' score_range=[0.535, 5.229]
max_bonus / score_range ≈ 0.000170

Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/tie_breaker_policy_tb_v1.json
Updated manifest: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/run_manifest_v1.json


## 02.1 Implement `stable_rank()` + Verify Tie Reduction 

This step implements the **deterministic tie-breaker chain** defined in 02.0 and
applies it to produce a **total ordering** (no ambiguous tie blocks) per segment.

We then verify that:

- The **Top-K membership** remains essentially unchanged (guardrail)
- The **tie density** within Top-K is dramatically reduced (goal)
- Ordering is fully deterministic using a stable ID fallback

---

## Tie-Breaker Chain (Applied)

For each segment, schools are ranked by:

1. **Primary segment score** (DESC)
2. **Data completeness bonus** (DESC)  
   - `bonus = epsilon × contributing_feature_count`
3. **Stable ID** (ASC)  
   - e.g., `school_id` (final deterministic fallback)

This preserves the system’s intent while completing the ordering.
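
The chain above can be sketched on toy data. This is a minimal illustration, not the notebook's cell: the five schools, their scores, and their IDs are made up, and `epsilon` simply reuses the `0.0001` value printed in the policy output.

```python
import numpy as np

# Hypothetical toy data: five schools, with two pairs tied on the primary score.
scores = np.array([3.0, 4.5, 3.0, 4.5, 1.0])
feature_count = np.array([4, 5, 6, 5, 2])
epsilon = 1e-4
bonus = epsilon * feature_count
school_ids = np.array(["S05", "S02", "S03", "S01", "S04"])

# np.lexsort treats the LAST key as the most significant one, so the keys are
# listed from least to most important: stable ID, then bonus, then score.
order = np.lexsort((school_ids, -bonus, -scores))

print(order.tolist())
print(school_ids[order].tolist())
```

The two schools scoring 4.5 have the same bonus, so the stable ID decides between them. The two schools scoring 3.0 are separated by the bonus alone.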

---

## Guardrails

- ✅ Primary score remains dominant
- ✅ Bonus is numerically tiny (tie-breaking only)
- ✅ Tier logic is not overridden
- ✅ Ranking is stable and repeatable
- ✅ Outputs are saved and versioned
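
One way to make "numerically tiny" checkable is to compare the spread of the bonus with the smallest gap between distinct primary scores. The helper below is a hypothetical sketch (the name `bonus_is_tie_break_only` and the toy arrays are assumptions, not part of the notebook). It only matters when the bonus is added to the score. A lexicographic sort never lets the bonus override the primary key.

```python
import numpy as np

def bonus_is_tie_break_only(scores: np.ndarray, bonus: np.ndarray) -> bool:
    """True if adding `bonus` to `scores` cannot reorder two rows
    whose primary scores differ."""
    distinct = np.unique(scores)              # sorted unique primary scores
    if distinct.size < 2:
        return True                           # nothing to reorder
    min_gap = np.diff(distinct).min()         # smallest gap between distinct scores
    bonus_spread = bonus.max() - bonus.min()  # largest possible bonus difference
    return bool(bonus_spread < min_gap)

# Toy values: bonus = epsilon * count with counts between 2 and 8.
bonus = 1e-4 * np.array([2, 8, 4, 4, 5])
safe_scores = np.array([0.535, 0.535, 1.200, 1.201, 5.229])
risky_scores = np.array([0.535, 0.535, 1.2000, 1.2005, 5.229])

print(bonus_is_tie_break_only(safe_scores, bonus))
print(bonus_is_tie_break_only(risky_scores, bonus))
```

In the second case the gap between two distinct scores (0.0005) is smaller than the bonus spread (0.0006), so an additive bonus could flip their order.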

---

## Outputs / Artifacts

- `/reports/notebook08_section02_tie_density_after_tiebreak.csv`
- `/reports/notebook08_section02_topk_overlap_guardrail.csv`

These artifacts are referenced in the run manifest.

---

> Section 02 does not “learn” a better ranking.  
> It deterministically finishes the ranking where the primary score is equal.


In [178]:
# 02.1 Implement stable_rank() + Verify Tie Reduction 

# -----------------------------
# Stable rank implementation
# -----------------------------
stable_id_series = pd.Series(
    [row_to_school_id.get(int(r), f"ROW_{r}") for r in range(X.shape[0])],
    name=POLICY.stable_id_col
)

def stable_rank(scores: np.ndarray, bonus: np.ndarray, stable_ids: pd.Series) -> np.ndarray:
    """
    Returns row indices sorted by:
      1) scores DESC
      2) bonus DESC
      3) stable_id ASC
    """
    # Convert stable_ids to numpy array of strings for lexsort
    sid = stable_ids.astype(str).to_numpy()

    # np.lexsort uses last key as primary; we want:
    # primary: -scores, secondary: -bonus, tertiary: sid
    return np.lexsort((sid, -bonus, -scores))

# -----------------------------
# Apply to each segment
# -----------------------------
TOPK_CHECK = TOPK  # reuse TopK (500)

def tie_metrics_for_scores(scores: np.ndarray, order: np.ndarray, topk: int) -> dict:
    top_idx = order[:topk]
    top_scores = scores[top_idx]
    vc = pd.Series(top_scores).value_counts()
    return {
        "topk": int(topk),
        "n_unique_scores": int(len(vc)),
        "n_tie_groups": int((vc > 1).sum()),
        "max_tie_group": int(vc.max()) if len(vc) else 0,
        "pct_items_in_ties": float((vc[vc > 1].sum() / topk) if topk else 0.0),
    }

def overlap_jaccard(a: np.ndarray, b: np.ndarray) -> float:
    sa, sb = set(a.tolist()), set(b.tolist())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0

after_rows = []
overlap_rows = []

for seg in SEG_KEYS:
    scores = baseline[seg]["scores"]
    base_order = baseline[seg]["order"]

    # apply tie-breaker ordering
    order_tb = stable_rank(scores, contrib_bonus, stable_id_series)

    # tie metrics are computed on PRIMARY score only (ties refer to equal primary score)
    base_tie = tie_metrics_for_scores(scores, base_order, TOPK_CHECK)
    after_tie = tie_metrics_for_scores(scores, order_tb, TOPK_CHECK)

    after_rows.append({
        "segment": seg,
        "topk": TOPK_CHECK,
        "before_pct_items_in_ties": base_tie["pct_items_in_ties"],
        "after_pct_items_in_ties": after_tie["pct_items_in_ties"],
        "before_max_tie_group": base_tie["max_tie_group"],
        "after_max_tie_group": after_tie["max_tie_group"],
        "before_unique_scores": base_tie["n_unique_scores"],
        "after_unique_scores": after_tie["n_unique_scores"],
    })

    # Guardrail: Top-K membership should be nearly unchanged
    j = overlap_jaccard(base_order[:TOPK_CHECK], order_tb[:TOPK_CHECK])
    overlap_rows.append({
        "segment": seg,
        "topk": TOPK_CHECK,
        "topk_jaccard_membership": j,
    })

    # store for later sections
    baseline[seg]["order_tb_v1"] = order_tb

after_df = pd.DataFrame(after_rows)
overlap_df = pd.DataFrame(overlap_rows)

# -----------------------------
# Save artifacts
# -----------------------------
tie_after_out = REPORTS_DIR / "notebook08_section02_tie_density_after_tiebreak.csv"
overlap_out = REPORTS_DIR / "notebook08_section02_topk_overlap_guardrail.csv"

after_df.to_csv(tie_after_out, index=False)
overlap_df.to_csv(overlap_out, index=False)

print("Saved:", tie_after_out)
print("Saved:", overlap_out)

print("\nTie density BEFORE vs AFTER (Top-K):")
display(after_df.sort_values("after_pct_items_in_ties", ascending=False))

print("\nTop-K membership overlap guardrail (Jaccard):")
display(overlap_df.sort_values("topk_jaccard_membership"))

# -----------------------------
# Update run manifest
# -----------------------------
manifest_path = ARTIFACTS_DIR / f"run_manifest_{SYSTEM_VERSION}.json"
m = load_json(manifest_path)
m.setdefault("outputs", {})
m["outputs"].update({
    "reports.section02.tie_density_after_tiebreak": str(tie_after_out),
    "reports.section02.topk_overlap_guardrail": str(overlap_out),
})
with open(manifest_path, "w") as f:
    json.dump(m, f, indent=2)

print("Updated manifest:", manifest_path)

Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/reports/notebook08_section02_tie_density_after_tiebreak.csv
Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/reports/notebook08_section02_topk_overlap_guardrail.csv

Tie density BEFORE vs AFTER (Top-K):


| | segment | topk | before_pct_items_in_ties | after_pct_items_in_ties | before_max_tie_group | after_max_tie_group | before_unique_scores | after_unique_scores |
|---|---|---|---|---|---|---|---|---|
| 1 | small_nurturing | 500 | 0.926 | 0.926 | 57 | 57 | 83 | 83 |
| 2 | progressive_balanced | 500 | 0.916 | 0.916 | 68 | 68 | 86 | 86 |
| 3 | balanced_general | 500 | 0.504 | 0.504 | 12 | 12 | 326 | 326 |
| 0 | academic_first | 500 | 0.406 | 0.406 | 20 | 20 | 343 | 343 |



Top-K membership overlap guardrail (Jaccard):


| | segment | topk | topk_jaccard_membership |
|---|---|---|---|
| 0 | academic_first | 500 | 1.0 |
| 1 | small_nurturing | 500 | 1.0 |
| 2 | progressive_balanced | 500 | 1.0 |
| 3 | balanced_general | 500 | 1.0 |


Updated manifest: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/run_manifest_v1.json


In [180]:
# 02.1b Verify tie-breaking using COMPOSITE score 
# composite_score = primary_score + tiny data bonus
# This should show ties collapsing even though primary-score tie density stays the same.

def tie_metrics(values: np.ndarray, order: np.ndarray, topk: int) -> dict:
    top_idx = order[:topk]
    top_vals = values[top_idx]
    vc = pd.Series(top_vals).value_counts()
    return {
        "topk": int(topk),
        "n_unique": int(len(vc)),
        "n_tie_groups": int((vc > 1).sum()),
        "max_tie_group": int(vc.max()) if len(vc) else 0,
        "pct_items_in_ties": float((vc[vc > 1].sum() / topk) if topk else 0.0),
    }

rows = []
for seg in SEG_KEYS:
    primary = baseline[seg]["scores"]
    order_tb = baseline[seg]["order_tb_v1"]

    composite = primary + contrib_bonus  # tiny tie-break bonus

    primary_m = tie_metrics(primary, order_tb, TOPK)
    composite_m = tie_metrics(composite, order_tb, TOPK)

    rows.append({
        "segment": seg,
        "topk": TOPK,
        "primary_pct_tied": primary_m["pct_items_in_ties"],
        "primary_max_tie": primary_m["max_tie_group"],
        "composite_pct_tied": composite_m["pct_items_in_ties"],
        "composite_max_tie": composite_m["max_tie_group"],
        "composite_unique": composite_m["n_unique"],
    })

verify_df = pd.DataFrame(rows).sort_values("composite_pct_tied", ascending=False)
display(verify_df)

| | segment | topk | primary_pct_tied | primary_max_tie | composite_pct_tied | composite_max_tie | composite_unique |
|---|---|---|---|---|---|---|---|
| 1 | small_nurturing | 500 | 0.926 | 57 | 0.866 | 55 | 116 |
| 2 | progressive_balanced | 500 | 0.916 | 68 | 0.842 | 60 | 130 |
| 3 | balanced_general | 500 | 0.504 | 12 | 0.504 | 12 | 326 |
| 0 | academic_first | 500 | 0.406 | 20 | 0.346 | 16 | 374 |


## 02.2 Lexicographic Tie-Break Ranking (Per-Segment Secondary Keys) 

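Section 02.1 showed that a tiny “data completeness bonus” alone is too coarse to
break ties at the scale observed in Section 01: composite-score tie density in the
Top-500 fell only from 0.926 to 0.866 for `small_nurturing` and did not change at
all for `balanced_general`.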

This step upgrades the launch ranking to use a **lexicographic tie-break chain**:
we keep the **primary segment score** unchanged, but complete the ordering using
additional deterministic keys.

---

## Core Idea (Plain Language)

> “Rank by the main segment score first.  
> If schools are tied, look at a secondary dense signal.  
> If still tied, prefer richer data.  
> If still tied, fall back to a stable ID so results never shuffle.”

This produces a fully deterministic and explainable total ordering.

---

## Lexicographic Tie-Break Chain (Authoritative)

For each segment we rank schools by:

1. **Primary score** (DESC)  
   - segment’s deterministic score from Notebook 07
2. **Secondary dense signal** (DESC), segment-specific  
   - chosen to introduce variance without changing meaning
3. **Contributing feature count** (DESC)  
   - prefers richer coverage; coarse (an integer count) but deterministic
4. **Stable ID** (ASC)  
   - final deterministic fallback
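
As a sanity check on the key ordering, the same four-key chain can be expressed with `DataFrame.sort_values` and compared against `np.lexsort`. This is a sketch on synthetic data. The column names `primary`, `secondary`, `contrib_count`, and `stable_id` are placeholders chosen for the example, not the notebook's variables.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "primary": rng.integers(0, 5, n).astype(float),    # coarse scores, many ties
    "secondary": rng.integers(0, 3, n).astype(float),
    "contrib_count": rng.integers(2, 9, n),
    "stable_id": [f"S{i:04d}" for i in rng.permutation(n)],  # unique IDs
})

# Keys listed from most to least significant, with a direction for each.
order_pandas = df.sort_values(
    ["primary", "secondary", "contrib_count", "stable_id"],
    ascending=[False, False, False, True],
).index.to_numpy()

# np.lexsort wants the keys from least to most significant.
order_lexsort = np.lexsort((
    np.array(df["stable_id"].tolist()),
    -df["contrib_count"].to_numpy(),
    -df["secondary"].to_numpy(),
    -df["primary"].to_numpy(),
))

print(np.array_equal(order_pandas, order_lexsort))
```

Because the stable ID is unique, the ordering is total and both approaches should agree regardless of the sort algorithm used internally.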

---

## Segment-Specific Secondary Keys (v1)

We only add secondary keys where they are aligned to segment intent:

- `small_nurturing` → `score_size_small`, then `score_attention`
- `progressive_balanced` → `score_diversity`, then `score_size_small`
- `balanced_general` → no segment-specific secondary key (keeps neutrality)
- `academic_first` → no secondary key that could override tier intent (keeps purity)

If a requested secondary feature is not present in `feature_names`, we skip it
and log a warning (no silent failure in production).

---

## Guardrails

- ✅ Primary score remains the dominant signal (no numeric modification)
- ✅ No learning
- ✅ Deterministic across runs
- ✅ Top-K membership must remain ≥ 0.98 Jaccard vs baseline (safety)
- ✅ Rankings must be fully ordered (no unresolved ties after stable ID)
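
The last two guardrails can be turned into explicit checks that raise instead of only being reported. The function below is a hypothetical sketch. `check_guardrails` and `MIN_JACCARD` are names introduced here, and it assumes both orderings are arrays of row indices `0..n-1`.

```python
import numpy as np

MIN_JACCARD = 0.98  # threshold stated in the guardrails above

def jaccard(a, b) -> float:
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0

def check_guardrails(base_order, new_order, topk: int) -> float:
    """Raise if new_order is not a full permutation or its Top-K drifts from baseline."""
    base_order = np.asarray(base_order)
    new_order = np.asarray(new_order)
    n = len(base_order)
    # A total ordering must list every row index exactly once.
    if not np.array_equal(np.sort(new_order), np.arange(n)):
        raise ValueError("new_order is not a permutation of the row indices")
    j = jaccard(base_order[:topk].tolist(), new_order[:topk].tolist())
    if j < MIN_JACCARD:
        raise ValueError(f"Top-{topk} Jaccard {j:.3f} is below {MIN_JACCARD}")
    return j

base = np.arange(10)
reordered = np.array([1, 0, 2, 3, 4, 5, 6, 7, 8, 9])  # same Top-5 members, new order
print(check_guardrails(base, reordered, topk=5))
```

Reordering inside the Top-K leaves the Jaccard value at 1.0, which is the behaviour a pure tie-breaker should show.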

---

## Outputs / Artifacts

- `/reports/notebook08_section02_lex_tiebreak_effect.csv`
- `/reports/notebook08_section02_topk_overlap_guardrail_v2.csv`

These are referenced in the run manifest.

---

> This is the “learning as consultant” philosophy applied to ranking:
> we add structure and stability, not opaque authority.

In [183]:
# 02.2 Lexicographic Tie-Break Ranking (Per-Segment Secondary Keys)  

# -----------------------------
# 1) Define per-segment secondary key policy (v1)
# -----------------------------
# NOTE: Secondary keys that are missing from feature_names are skipped with a
# printed warning. Set STRICT = True below to fail loudly instead.
SECONDARY_KEY_PREFS = {
    "small_nurturing": ["score_size_small", "score_attention"],
    "progressive_balanced": ["score_diversity", "score_size_small"],
    "balanced_general": [],      # neutral: don't inject philosophy bias
    "academic_first": [],        # keep primary intent dominant and clean
}

def present_features(candidates, seg=None):
    """Return the candidates present in feat_to_idx; warn about any that are skipped."""
    missing = [f for f in candidates if f not in feat_to_idx]
    if missing:
        print(f"WARNING: segment '{seg}' skipping missing secondary keys: {missing}")
    return [f for f in candidates if f in feat_to_idx]

secondary_keys = {seg: present_features(SECONDARY_KEY_PREFS.get(seg, []), seg) for seg in SEG_KEYS}

print("Secondary keys by segment (present in matrix):")
display(pd.DataFrame([
    {"segment": seg, "secondary_keys": secondary_keys[seg]} for seg in SEG_KEYS
]))

# If you want strict mode (recommended for production), enforce that requested keys exist:
STRICT = False
if STRICT:
    for seg, req in SECONDARY_KEY_PREFS.items():
        missing = [f for f in req if f not in feat_to_idx]
        if missing:
            raise KeyError(f"Segment '{seg}' requested secondary keys not in feature_names: {missing}")

# -----------------------------
# 2) Implement lexicographic ranker
# -----------------------------
stable_id_arr = stable_id_series.astype(str).to_numpy()

def lex_rank(scores: np.ndarray,
             stable_ids: np.ndarray,
             contrib_count: np.ndarray,
             secondary_vals: list[np.ndarray] | None = None) -> np.ndarray:
    """
    Sort by:
      1) scores DESC
      2) each secondary_vals[i] DESC (in order provided)
      3) contrib_count DESC
      4) stable_id ASC
    """
    keys = [stable_ids]  # first key = least significant in np.lexsort: stable_id ASC
    keys.append(-contrib_count)  # contrib DESC (next least significant)

    # Secondary keys (DESC). Added in reverse because lexsort last key is primary.
    if secondary_vals:
        for v in reversed(secondary_vals):
            keys.append(-v)

    # Primary scores DESC (most important)
    keys.append(-scores)

    return np.lexsort(tuple(keys))

# -----------------------------
# 3) Apply per segment + verify effects
# -----------------------------
def overlap_jaccard(a: np.ndarray, b: np.ndarray) -> float:
    sa, sb = set(a.tolist()), set(b.tolist())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0

def unresolved_ties_count(scores: np.ndarray, order: np.ndarray, topk: int) -> int:
    """
    Counts how many adjacent pairs in Top-K have equal primary score.
    (This doesn't mean ordering is undefined; it just quantifies remaining primary-score ties.)
    """
    top = order[:topk]
    s = scores[top]
    return int((s[1:] == s[:-1]).sum())

rows = []
overlap_rows = []

for seg in SEG_KEYS:
    scores = baseline[seg]["scores"]
    base_order = baseline[seg]["order"]

    # build secondary value arrays for this segment
    sec_feats = secondary_keys.get(seg, [])
    sec_vals = [X[:, feat_to_idx[f]].astype(float) for f in sec_feats] if sec_feats else []

    order_lex = lex_rank(
        scores=scores,
        stable_ids=stable_id_arr,
        contrib_count=contrib_count,
        secondary_vals=sec_vals
    )

    # store for downstream sections
    baseline[seg]["order_lex_v1"] = order_lex

    # Guardrail: Top-K membership should remain highly similar
    j = overlap_jaccard(base_order[:TOPK], order_lex[:TOPK])
    overlap_rows.append({"segment": seg, "topk": TOPK, "topk_jaccard_membership": j})

    # Quantify "primary-score ties in Top-K" (diagnostic)
    # This number may stay high; what's improved is deterministic ordering within tie blocks.
    base_adj_ties = unresolved_ties_count(scores, base_order, TOPK)
    lex_adj_ties = unresolved_ties_count(scores, order_lex, TOPK)

    rows.append({
        "segment": seg,
        "topk": TOPK,
        "secondary_keys_used": ",".join(sec_feats) if sec_feats else "",
        "topk_jaccard_membership": j,
        "adjacent_primary_ties_before": base_adj_ties,
        "adjacent_primary_ties_after": lex_adj_ties,
        "note": "Primary-score tie counts may remain; ordering within ties is now deterministic via secondary keys + contrib + stable_id."
    })

effect_df = pd.DataFrame(rows)
overlap_df2 = pd.DataFrame(overlap_rows)

# -----------------------------
# 4) Save artifacts
# -----------------------------
effect_out = REPORTS_DIR / "notebook08_section02_lex_tiebreak_effect.csv"
overlap_out2 = REPORTS_DIR / "notebook08_section02_topk_overlap_guardrail_v2.csv"

effect_df.to_csv(effect_out, index=False)
overlap_df2.to_csv(overlap_out2, index=False)

print("Saved:", effect_out)
print("Saved:", overlap_out2)

print("\nLex tie-break effect summary:")
display(effect_df)

print("\nTop-K membership overlap guardrail (Jaccard):")
display(overlap_df2.sort_values("topk_jaccard_membership"))

# -----------------------------
# 5) Update run manifest
# -----------------------------
manifest_path = ARTIFACTS_DIR / f"run_manifest_{SYSTEM_VERSION}.json"
m = load_json(manifest_path)
m.setdefault("outputs", {})
m["outputs"].update({
    "reports.section02.lex_tiebreak_effect": str(effect_out),
    "reports.section02.topk_overlap_guardrail_v2": str(overlap_out2),
})
with open(manifest_path, "w") as f:
    json.dump(m, f, indent=2)

print("Updated manifest:", manifest_path)

Secondary keys by segment (present in matrix):


| | segment | secondary_keys |
|---|---|---|
| 0 | academic_first | [] |
| 1 | small_nurturing | [score_size_small, score_attention] |
| 2 | progressive_balanced | [score_diversity, score_size_small] |
| 3 | balanced_general | [] |


Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/reports/notebook08_section02_lex_tiebreak_effect.csv
Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/reports/notebook08_section02_topk_overlap_guardrail_v2.csv

Lex tie-break effect summary:


| | segment | topk | secondary_keys_used | topk_jaccard_membership | adjacent_primary_ties_before | adjacent_primary_ties_after | note |
|---|---|---|---|---|---|---|---|
| 0 | academic_first | 500 | | 1.0 | 157 | 157 | Primary-score tie counts may remain; ordering ... |
| 1 | small_nurturing | 500 | score_size_small,score_attention | 1.0 | 417 | 417 | Primary-score tie counts may remain; ordering ... |
| 2 | progressive_balanced | 500 | score_diversity,score_size_small | 1.0 | 414 | 414 | Primary-score tie counts may remain; ordering ... |
| 3 | balanced_general | 500 | | 1.0 | 174 | 174 | Primary-score tie counts may remain; ordering ... |



Top-K membership overlap guardrail (Jaccard):


| | segment | topk | topk_jaccard_membership |
|---|---|---|---|
| 0 | academic_first | 500 | 1.0 |
| 1 | small_nurturing | 500 | 1.0 |
| 2 | progressive_balanced | 500 | 1.0 |
| 3 | balanced_general | 500 | 1.0 |


Updated manifest: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/run_manifest_v1.json


## 02.2 Lexicographic Tie-Break Ranking (Completing the Order)  

Section 02.1 showed that adding a tiny numeric bonus alone is not sufficient to
resolve large equivalence classes created by discrete feature combinations.

This step completes the ranking using a **lexicographic tie-break strategy**:
the primary score determines *who belongs*, and additional deterministic keys
determine *ordering* within tied groups.

---

## Why Lexicographic Tie-Breaking Is Required

In Section 01 we observed:

- Rankings are **stable** under perturbation (low volatility)
- But many segments produce **large groups of schools with identical scores**

Attempting to “fix” this by modifying the primary score:
- risks changing the meaning of the segment
- introduces fragile epsilon tuning
- makes future calibration harder

Lexicographic tie-breaking resolves ties **without altering what the score means**.
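
The difference between the two approaches shows up when two distinct primary scores are closer together than the bonus spread. The numbers below are hypothetical and chosen to trigger the problem, with `epsilon` set to the `0.0001` used in the tie-breaker policy.

```python
import numpy as np

# Hypothetical pair: school A has the higher primary score, school B has more
# contributing features, and the score gap is smaller than the bonus gap.
primary = np.array([1.0003, 1.0000])   # A, B
feature_count = np.array([2, 8])
epsilon = 1e-4

# Additive approach: fold the bonus into the score, then sort.
composite = primary + epsilon * feature_count
additive_order = np.argsort(-composite, kind="stable")

# Lexicographic approach: the feature count is consulted only on exact ties.
lex_order = np.lexsort((-feature_count, -primary))

print(additive_order.tolist())  # B overtakes A
print(lex_order.tolist())       # A stays first
```

With the additive bonus, B (index 1) moves ahead of A even though A's primary score is higher. The lexicographic sort keeps A first.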

---

## What Changes in This Step (and What Does Not)

### What stays the same
- Primary segment score definition
- Feature weights and intent
- Top-K membership (guardrail enforced)

### What changes
- The system now produces a **total ordering**
- Schools with identical scores are ordered deterministically

Primary score ties are expected to remain numerically equal;
they are no longer left unordered.

---

## Lexicographic Ranking Policy

Schools are ranked using the following ordered keys:

1. **Primary segment score** (DESC)  
2. **Secondary dense signal** (DESC, segment-specific when appropriate)  
3. **Contributing feature count** (DESC)  
4. **Stable identifier** (ASC, final deterministic fallback)

This guarantees:
- determinism
- explainability
- no silent randomness

---

## Segment-Specific Secondary Signals

Secondary keys are only used when they align with segment intent:

- `small_nurturing` → `score_size_small`, `score_attention`
- `progressive_balanced` → `score_diversity`, `score_size_small`
- `balanced_general` → *(none; neutral ordering)*
- `academic_first` → *(none; preserves tier purity)*

If a secondary feature is unavailable, it is skipped explicitly
(no silent fallback).

---

## How to Interpret the Diagnostics

After this step:

- **Primary-score tie counts may remain high**  
  This is expected and acceptable.
- **Ordering within tie groups is now deterministic**
- **Top-K membership remains unchanged** (Jaccard ≈ 1.0)

Success is defined as:
> *No ambiguity in ordering, not fewer equal score values.*
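
"No ambiguity in ordering" can be tested directly: if the final key is unique, ranking the same rows in a different input order must return the same sequence of IDs. The sketch below uses synthetic data and a hypothetical helper `rank_ids`. It omits the segment-specific secondary keys for brevity.

```python
import numpy as np

def rank_ids(primary, contrib_count, ids):
    """Return IDs ordered by primary DESC, contrib_count DESC, id ASC."""
    order = np.lexsort((ids, -contrib_count, -primary))
    return ids[order]

rng = np.random.default_rng(42)
n = 1000
primary = rng.integers(0, 4, n).astype(float)   # very few distinct scores
contrib_count = rng.integers(2, 9, n)
ids = np.array([f"S{i:05d}" for i in range(n)])  # unique stable IDs

ranked = rank_ids(primary, contrib_count, ids)

# Shuffle the input rows and rank again: the ID sequence should not change.
perm = rng.permutation(n)
ranked_shuffled = rank_ids(primary[perm], contrib_count[perm], ids[perm])

print(np.array_equal(ranked, ranked_shuffled))
```

The primary scores remain heavily tied in this example, yet the ranking is identical across input orders, which is the success criterion stated above.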

---

## Why This Is Launch-Safe

- No learning or outcome prediction is introduced
- No randomization is used
- Rankings are stable across runs and versions
- Decisions can be explained in plain language

This completes the ranking logic without compromising trust.

---

> Section 01 identified where the system hesitated.  
> Section 02 makes the system decide — explicitly and safely.


In [186]:
# 02.2b Verify tie resolution power of secondary keys 

def tie_groups_in_topk(scores: np.ndarray, order: np.ndarray, topk: int):
    top = order[:topk]
    s = scores[top]
    # group by score value
    groups = {}
    for idx, val in zip(top, s):
        groups.setdefault(val, []).append(int(idx))
    # keep only tie groups (size>1)
    return [g for g in groups.values() if len(g) > 1]

def resolution_stats_for_segment(seg: str, topk: int = 500):
    scores = baseline[seg]["scores"]
    order_lex = baseline[seg]["order_lex_v1"]
    tie_groups = tie_groups_in_topk(scores, order_lex, topk)

    sec_feats = secondary_keys.get(seg, [])
    if not sec_feats:
        return {
            "segment": seg,
            "topk": topk,
            "n_tie_groups": len(tie_groups),
            "secondary_keys_used": "",
            "pct_groups_with_secondary_variance": 0.0,
            "avg_unique_secondary_per_group": np.nan,
        }

    # compute secondary arrays
    sec_arrays = [X[:, feat_to_idx[f]].astype(float) for f in sec_feats]

    groups_with_variance = 0
    unique_counts = []

    for g in tie_groups:
        # build a tuple per row of (sec1, sec2, ...)
        tuples = list(zip(*[arr[g] for arr in sec_arrays]))
        u = len(set(tuples))
        unique_counts.append(u)
        if u > 1:
            groups_with_variance += 1

    return {
        "segment": seg,
        "topk": topk,
        "n_tie_groups": len(tie_groups),
        "secondary_keys_used": ",".join(sec_feats),
        "pct_groups_with_secondary_variance": (groups_with_variance / len(tie_groups)) if tie_groups else 0.0,
        "avg_unique_secondary_per_group": float(np.mean(unique_counts)) if unique_counts else np.nan,
    }

stats = pd.DataFrame([resolution_stats_for_segment(seg, TOPK) for seg in SEG_KEYS])
display(stats)


| | segment | topk | n_tie_groups | secondary_keys_used | pct_groups_with_secondary_variance | avg_unique_secondary_per_group |
|---|---|---|---|---|---|---|
| 0 | academic_first | 500 | 46 | | 0.0 | |
| 1 | small_nurturing | 500 | 46 | score_size_small,score_attention | 0.0 | 1.0 |
| 2 | progressive_balanced | 500 | 44 | score_diversity,score_size_small | 0.0 | 1.0 |
| 3 | balanced_general | 500 | 78 | | 0.0 | |


### Design Decision: Deterministic Ordering over Artificial Differentiation

Empirical analysis in Section 02 confirmed that large score tie groups are caused
by genuine feature equivalence rather than instability or noise. Secondary dense
signals available in the current feature set do not meaningfully vary within
these equivalence classes, and therefore cannot resolve ties semantically.
Introducing randomness, excessive weight amplification, or learned ordering
would violate determinism and undermine trust. The system therefore adopts a
clear and principled stance: **primary scores define membership, lexicographic
rules complete the ordering, and a stable identifier provides the final fallback**.
This guarantees reproducibility and safety at launch while making the limitation
explicit and measurable. Future differentiation will be driven by richer data,
not artificial variance.


## 03.0 Statistical Refinement — Learning as Diagnostic, Not Authority 

Sections 01 and 02 established that the current deterministic scoring system is:

- **Stable** (low rank volatility)
- **Safe** (deterministic, no randomness)
- **Honest** about its limitations (large equivalence classes)

The remaining challenge is **information density**:
many schools collapse into identical representations because multiple features
encode overlapping or redundant signals.

This section uses **unsupervised statistical analysis** to answer one question:

> *Why does the feature space collapse — and how can it be improved?*

Importantly, learning is used here **only as a diagnostic tool**.
It does **not** determine rankings, override segments, or introduce prediction.

---

## Goals of Section 03

### Capstone Goals
- Demonstrate principled feature analysis
- Quantify redundancy and correlation
- Use PCA to reason about variance contribution
- Show restraint: analysis without authority

### Startup Goals
- Identify low-value or duplicate features
- Reduce unnecessary complexity
- Improve computational efficiency
- Inform future feature design (what would actually break ties)

---

## Guardrails (Reaffirmed)

- ❌ No supervised learning
- ❌ No outcome prediction
- ❌ No automatic weight changes
- ❌ No ranking overrides

All outputs in this section are **advisory only**.

---

## What This Section Will Produce

- Feature correlation matrix
- Redundancy warnings (highly correlated features)
- Variance contribution analysis (PCA, diagnostic only)
- Recommendations for:
  - feature removal
  - feature consolidation
  - future data acquisition

These recommendations will be evaluated — not blindly applied — in later steps.

---

## Roadmap Within Section 03

- **03.1** Feature Correlation Analysis  
- **03.2** Redundancy Detection & Grouping  
- **03.3** PCA / Variance Contribution (Diagnostic)  
- **03.4** Actionable Insights (What Actually Adds Signal)

---

> Determinism defines trust.  
> Statistics reveal structure.  
> Judgment decides what changes.



## 03.1 Feature Correlation Analysis  

This step measures **how much features overlap** with each other.

If two features move together almost perfectly, they are effectively encoding
the same information. High redundancy reduces information density and contributes
to large equivalence classes (ties).

Because our feature space includes both:

- **binary flags** (0/1)
- **dense continuous scores** (0.0–1.0)

we compute two correlation views:

1) **Pearson correlation** on the full feature matrix (quick global view)  
2) **Spearman correlation** as a robustness check (rank-based)

Correlation is used strictly as a **diagnostic** tool.
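
To see why both views are worth computing, the sketch below builds a small synthetic frame in which one column is a monotone but non-linear copy of another. The column names are invented for the example and do not correspond to the notebook's features.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
base = rng.random(n)
toy = pd.DataFrame({
    "dense_score": base,
    "dense_score_cubed": base ** 3,            # monotone, non-linear copy
    "binary_flag": (base > 0.5).astype(int),   # thresholded copy
    "unrelated": rng.random(n),
})

pearson = toy.corr(method="pearson")
spearman = toy.corr(method="spearman")

print(pearson.round(2))
print(spearman.round(2))
```

Spearman works on ranks, so it treats the cubed column as a perfect copy, while Pearson reports a weaker linear relationship for the same pair. A pair that is high on one view and lower on the other points to a non-linear overlap.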

---

## Outputs / Artifacts

- `/reports/notebook08_section03_feature_corr_pearson.csv`
- `/reports/notebook08_section03_feature_corr_spearman.csv`
- `/reports/notebook08_section03_feature_corr_top_pairs.csv`

These are referenced in the run manifest.

---

## Interpretation Guide

- |corr| ≥ 0.90 → very likely redundant
- |corr| ≥ 0.75 → strong overlap worth reviewing
- near 0 → features behave independently

We will use these results in 03.2 to propose redundancy groupings.
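
The thresholds in the guide can be applied to a correlation matrix in one pass over its upper triangle. The function `label_pairs` and the 3×3 matrix are illustrative assumptions. The label strings paraphrase the guide above.

```python
import numpy as np
import pandas as pd

def label_pairs(corr: pd.DataFrame) -> pd.DataFrame:
    """List each feature pair once and label it using the 0.90 / 0.75 thresholds."""
    i, j = np.triu_indices(len(corr), k=1)   # upper triangle, diagonal excluded
    values = corr.to_numpy()[i, j]
    abs_vals = np.abs(values)
    out = pd.DataFrame({
        "feature_a": corr.columns[i],
        "feature_b": corr.columns[j],
        "corr": values,
        "abs_corr": abs_vals,
        # np.select uses the first matching condition for each pair.
        "label": np.select(
            [abs_vals >= 0.90, abs_vals >= 0.75],
            ["very likely redundant", "strong overlap"],
            default="below review threshold",
        ),
    })
    return out.sort_values("abs_corr", ascending=False, ignore_index=True)

toy_corr = pd.DataFrame(
    [[1.00, 0.95, 0.10],
     [0.95, 1.00, -0.80],
     [0.10, -0.80, 1.00]],
    index=["f1", "f2", "f3"], columns=["f1", "f2", "f3"],
)
labeled = label_pairs(toy_corr)
print(labeled)
```

Negative correlations are judged by absolute value, so the `-0.80` pair is flagged for review just like a positive one.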



In [191]:
# 03.1 Feature Correlation Analysis 

# Build a DataFrame view of X for correlation analysis
X_df = pd.DataFrame(X, columns=feature_names)

# Compute correlations
corr_pearson = X_df.corr(method="pearson")
corr_spearman = X_df.corr(method="spearman")

pearson_out = REPORTS_DIR / "notebook08_section03_feature_corr_pearson.csv"
spearman_out = REPORTS_DIR / "notebook08_section03_feature_corr_spearman.csv"

corr_pearson.to_csv(pearson_out)
corr_spearman.to_csv(spearman_out)

print("Saved:", pearson_out)
print("Saved:", spearman_out)

# -----------------------------
# Extract top correlated pairs (excluding diagonal and duplicates)
# -----------------------------
def top_corr_pairs(corr: pd.DataFrame, k: int = 50, min_abs: float = 0.75) -> pd.DataFrame:
    rows = []
    cols = corr.columns.tolist()
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            v = float(corr.iloc[i, j])
            if abs(v) >= min_abs:
                rows.append({
                    "feature_a": cols[i],
                    "feature_b": cols[j],
                    "corr": v,
                    "abs_corr": abs(v),
                })
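    # Name the columns explicitly so an empty result still merges cleanly downstream
    df = pd.DataFrame(rows, columns=["feature_a", "feature_b", "corr", "abs_corr"])
    if df.empty:
        return df
    return df.sort_values("abs_corr", ascending=False).head(k)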

top_pairs_p = top_corr_pairs(corr_pearson, k=100, min_abs=0.75)
top_pairs_s = top_corr_pairs(corr_spearman, k=100, min_abs=0.75)

# Merge views (same pair may appear in both)
if not top_pairs_p.empty or not top_pairs_s.empty:
    top_pairs = pd.merge(
        top_pairs_p.rename(columns={"corr": "pearson_corr", "abs_corr": "pearson_abs"}),
        top_pairs_s.rename(columns={"corr": "spearman_corr", "abs_corr": "spearman_abs"}),
        on=["feature_a", "feature_b"],
        how="outer"
    ).sort_values(
        ["pearson_abs", "spearman_abs"],
        ascending=[False, False]
    )
else:
    top_pairs = pd.DataFrame(columns=["feature_a","feature_b","pearson_corr","spearman_corr","pearson_abs","spearman_abs"])

pairs_out = REPORTS_DIR / "notebook08_section03_feature_corr_top_pairs.csv"
top_pairs.to_csv(pairs_out, index=False)
print("Saved:", pairs_out)

print("\nTop correlated feature pairs (|corr|>=0.75):")
display(top_pairs)

# -----------------------------
# Update run manifest
# -----------------------------
manifest_path = ARTIFACTS_DIR / f"run_manifest_{SYSTEM_VERSION}.json"
m = load_json(manifest_path)
m.setdefault("outputs", {})
m["outputs"].update({
    "reports.section03.feature_corr_pearson": str(pearson_out),
    "reports.section03.feature_corr_spearman": str(spearman_out),
    "reports.section03.feature_corr_top_pairs": str(pairs_out),
})
with open(manifest_path, "w") as f:
    json.dump(m, f, indent=2)

print("Updated manifest:", manifest_path)

Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/reports/notebook08_section03_feature_corr_pearson.csv
Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/reports/notebook08_section03_feature_corr_spearman.csv
Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/reports/notebook08_section03_feature_corr_top_pairs.csv

Top correlated feature pairs (|corr|>=0.75):


| | feature_a | feature_b | pearson_corr | spearman_corr | pearson_abs | spearman_abs |
|---|---|---|---|---|---|---|

(empty table: no feature pair reached |corr| ≥ 0.75)


Updated manifest: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/run_manifest_v1.json


## 03.3 PCA / Variance Contribution (Diagnostic Only) 

This step uses **Principal Component Analysis (PCA)** strictly as a **diagnostic tool**
to understand how much *independent variation* exists in the current feature space.

PCA is **not** used for ranking, weighting, or decision-making.

---

## Why PCA Here

Earlier sections established that:
- Pairwise feature correlations are low (03.1)
- Yet large equivalence classes still exist (Sections 01–02)

PCA helps answer a different question:

> *How many effective dimensions of variation does the system actually have?*

If most variance is captured by a small number of components, it explains why many
schools appear indistinguishable — even without redundant features.
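
Low pairwise correlation does not prevent ties when features take few distinct values. The synthetic sketch below is not the notebook's data. The row count and the six binary flags are arbitrary. It draws independent flags and counts how many distinct rows result.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_flags = 10_000, 6

# Independent binary flags: pairwise correlations are near zero by construction.
flags = rng.integers(0, 2, size=(n_rows, n_flags))

corr = np.corrcoef(flags, rowvar=False)
max_abs_corr = np.abs(corr - np.eye(n_flags)).max()   # largest off-diagonal value
n_distinct_rows = len(np.unique(flags, axis=0))

print(f"largest off-diagonal |corr|: {max_abs_corr:.3f}")
print(f"distinct rows: {n_distinct_rows} of {n_rows} (upper bound 2**{n_flags} = {2**n_flags})")
```

With six binary features there can be at most 64 distinct rows, so thousands of rows must share a representation even though no feature is redundant.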

---

## Guardrails (Reaffirmed)

- ❌ PCA components will **not** replace features
- ❌ PCA scores will **not** affect ranking
- ❌ No automatic weight changes
- ❌ No learning authority introduced

PCA results are **interpretive only**.

---

## What This Step Produces

- Explained variance by principal component
- Cumulative variance curve
- Feature loadings per component (interpretation aid)

These outputs inform **feature design decisions**, not ranking logic.

---

## Interpretation Guide

- If **2–3 components explain most variance** → feature space is low-resolution
- If variance is spread thinly → features are weak but independent
- Loadings indicate *which features* drive each latent dimension

Findings will be summarized in **03.4 Actionable Insights**.
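
The "how many components" reading can be computed from the explained-variance ratios alone. The sketch below copies the ratios from this notebook's PCA variance table. The helper `components_for` is a name introduced for the example.

```python
import numpy as np

# Explained-variance ratios as reported in this notebook's PCA variance table.
explained = np.array([
    0.172582, 0.127967, 0.109647, 0.100671, 0.100059,
    0.098788, 0.091828, 0.089611, 0.060845, 0.048002,
])
cumulative = np.cumsum(explained)

def components_for(threshold: float) -> int:
    """Smallest number of leading components whose cumulative ratio reaches threshold."""
    k = int(np.searchsorted(cumulative, threshold, side="left")) + 1
    return min(k, len(cumulative))   # guard against rounding just below 1.0

print(components_for(0.80), components_for(0.90))
```

For these ratios the result is 7 components for 80% and 9 for 90%. With ten standardized features an even spread would give each component 0.10, so the reported values sit closer to the "spread thinly" case than to the "2–3 components" case.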


In [194]:
# 03.3 PCA / Variance Contribution (Diagnostic Only) 

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# -----------------------------
# 1) Prepare data
# -----------------------------
# Standardize features for PCA (important for mixed scales)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# -----------------------------
# 2) Fit PCA (full)
# -----------------------------
pca = PCA(random_state=42)
X_pca = pca.fit_transform(X_scaled)

explained = pca.explained_variance_ratio_
cum_explained = np.cumsum(explained)

# -----------------------------
# 3) Variance summary table
# -----------------------------
var_df = pd.DataFrame({
    "component": [f"PC{i+1}" for i in range(len(explained))],
    "explained_variance_ratio": explained,
    "cumulative_variance_ratio": cum_explained
})

var_out = REPORTS_DIR / "notebook08_section03_pca_variance.csv"
var_df.to_csv(var_out, index=False)

print("Saved:", var_out)
display(var_df.head(10))

# -----------------------------
# 4) Feature loadings (interpretation aid)
# -----------------------------
loadings = pd.DataFrame(
    pca.components_.T,
    index=feature_names,
    # One column per fitted component (matches the feature count only for a full-rank fit)
    columns=[f"PC{i+1}" for i in range(pca.n_components_)]
)

loadings_out = REPORTS_DIR / "notebook08_section03_pca_loadings.csv"
loadings.to_csv(loadings_out)

print("Saved:", loadings_out)
display(loadings.iloc[:, :5])  # show first 5 PCs for readability

# -----------------------------
# 5) Quick diagnostics (textual)
# -----------------------------
k80 = int(np.argmax(cum_explained >= 0.80) + 1)
k90 = int(np.argmax(cum_explained >= 0.90) + 1)

print(f"\nComponents needed for 80% variance: {k80}")
print(f"Components needed for 90% variance: {k90}")

# -----------------------------
# 6) Update run manifest
# -----------------------------
manifest_path = ARTIFACTS_DIR / f"run_manifest_{SYSTEM_VERSION}.json"
m = load_json(manifest_path)
m.setdefault("outputs", {})
m["outputs"].update({
    "reports.section03.pca_variance": str(var_out),
    "reports.section03.pca_loadings": str(loadings_out),
})
with open(manifest_path, "w") as f:
    json.dump(m, f, indent=2)

print("Updated manifest:", manifest_path)

Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/reports/notebook08_section03_pca_variance.csv


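|   | component | explained_variance_ratio | cumulative_variance_ratio |
|---|-----------|--------------------------|---------------------------|
| 0 | PC1 | 0.172582 | 0.172582 |
| 1 | PC2 | 0.127967 | 0.300549 |
| 2 | PC3 | 0.109647 | 0.410196 |
| 3 | PC4 | 0.100671 | 0.510868 |
| 4 | PC5 | 0.100059 | 0.610926 |
| 5 | PC6 | 0.098788 | 0.709714 |
| 6 | PC7 | 0.091828 | 0.801542 |
| 7 | PC8 | 0.089611 | 0.891153 |
| 8 | PC9 | 0.060845 | 0.951998 |
| 9 | PC10 | 0.048002 | 1.0 |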


Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/reports/notebook08_section03_pca_loadings.csv


| feature | PC1 | PC2 | PC3 | PC4 | PC5 |
|---------|-----|-----|-----|-----|-----|
| tag_ib | 0.02405 | 0.041573 | 0.709498 | 0.011224 | 0.00602 |
| tag_cais | 0.071755 | 0.052401 | 0.423693 | 0.391362 | -0.615088 |
| tag_ams_montessori | 0.02503 | 0.05532 | 0.548587 | -0.360285 | 0.482463 |
| tag_waldorf | 0.035014 | 0.031061 | -0.015295 | 0.600552 | 0.616272 |
| serves_elementary | -0.148416 | 0.752782 | -0.038046 | 0.107717 | -0.001896 |
| serves_middle | 0.250862 | 0.015413 | 0.027747 | 0.498763 | 0.058934 |
| serves_high | 0.467811 | -0.504473 | 0.028195 | 0.055794 | 0.012245 |
| score_size_small | 0.53749 | 0.210037 | -0.101964 | -0.131962 | 0.030121 |
| score_attention | 0.566198 | 0.329061 | -0.044848 | -0.050839 | -0.014375 |
| score_diversity | 0.281315 | 0.133011 | -0.022255 | -0.269745 | -0.06579 |



Components needed for 80% variance: 7
Components needed for 90% variance: 9
Updated manifest: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/run_manifest_v1.json


## 03.4 Actionable Insights & Recommendations  

Sections 03.1–03.3 examined the feature space using correlation analysis and PCA,
strictly as diagnostic tools. This section consolidates those findings into
**clear, defensible decisions** for both launch and future evolution.

---

## What We Learned (Evidence-Based)

### 1) The feature set is **not redundant**
- No feature pairs exhibit high correlation (|corr| ≥ 0.75).
- PCA shows variance spread across many components.
- There is no “obvious” feature to remove without losing information.

**Conclusion:** The system is not bloated or duplicative.

---

### 2) The feature set is **under-expressive**
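- Of the 10 components (one per feature), 7 are required to explain 80% of variance.
- 9 components are required to explain 90% of variance.
- No dominant latent dimension exists: PC1 explains only 17.3% of variance, and PC2 through PC8 each explain roughly 9–13%.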

**Conclusion:** The system has many weak signals, not a few strong ones.

This explains:
- large equivalence classes (ties)
- why secondary features fail to differentiate tied schools
- why epsilon bonuses and weight tweaks have limited effect
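
To put the 80% and 90% thresholds in context, the sketch below compares the variance ratios reported in 03.3 with a hypothetical baseline of ten uncorrelated standardized features, where every component would explain exactly 10%. The helper name `components_needed` and the small epsilon guard are illustrative choices, not part of the notebook.

```python
from itertools import accumulate

# Explained-variance ratios as reported in the 03.3 output (PC1..PC10).
observed = [0.172582, 0.127967, 0.109647, 0.100671, 0.100059,
            0.098788, 0.091828, 0.089611, 0.060845, 0.048002]

# Hypothetical baseline: 10 uncorrelated standardized features,
# so each principal component explains exactly 1/10 of the variance.
uniform = [0.1] * 10


def components_needed(ratios, threshold, eps=1e-9):
    """Smallest k whose cumulative variance ratio reaches `threshold`."""
    for k, cum in enumerate(accumulate(ratios), start=1):
        # eps absorbs float round-off in the running sum (e.g. 0.7999999...).
        if cum + eps >= threshold:
            return k
    return len(ratios)


for label, ratios in [("observed", observed), ("uniform baseline", uniform)]:
    k80 = components_needed(ratios, 0.80)
    k90 = components_needed(ratios, 0.90)
    print(f"{label}: k80={k80}, k90={k90}")
```

Under this assumption the observed curve reaches 80% only one component earlier than the fully independent baseline (7 versus 8) and reaches 90% at the same point (9), which is consistent with the reading that no small set of latent dimensions dominates.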

---

### 3) PCA confirms restraint was the correct choice
- PCA does not reveal compressible structure.
- Using PCA outputs for ranking would:
  - reduce explainability
  - fabricate separation
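  - violate the deterministic-baseline guardrail (rankings would depend on a fitted transform that changes whenever the data changes)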

**Conclusion:** Learning should remain advisory, not authoritative.

---

## Launch Decisions (What We Will NOT Change)

The following choices are **affirmed** for launch:

- ✅ Deterministic primary scoring
- ✅ Explicit preference segments
- ✅ Lexicographic tie-breaking
- ✅ Stable ordering over artificial variance
- ❌ No supervised learning
- ❌ No PCA-based ranking
- ❌ No hidden randomness

These choices maximize trust, stability, and explainability.
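
As a rough illustration of why lexicographic tie-breaking keeps ordering stable, the sketch below sorts by score first and falls back to fixed secondary keys. The real `lex_rank` is defined in Section 02 and is not reproduced here; the key order (score, then contribution count, then stable id) and the helper name `lex_order` are assumptions made for this example only.

```python
def lex_order(scores, contrib_count, stable_ids):
    """
    Return row indices sorted by a lexicographic key (assumed order):
      1. higher score first
      2. higher contribution count first
      3. ascending stable id as the final, always-unique tie-breaker
    """
    indices = range(len(scores))
    return sorted(
        indices,
        key=lambda i: (-scores[i], -contrib_count[i], stable_ids[i]),
    )


# Hypothetical data: three schools tie on score 5.0.
scores = [5.0, 5.0, 7.0, 5.0]
contrib_count = [2, 3, 1, 3]
stable_ids = ["b", "c", "a", "a2"]

order = lex_order(scores, contrib_count, stable_ids)
print(order)
```

Because the last key is unique per school, the same inputs always yield the same order, with no randomness involved.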

---

## What Would Actually Break Ties (Future Data, Not Tricks)

The analysis shows that **new signal is required**, not re-weighting.
High-impact future features would include:

### High-Resolution Continuous Signals
- commute time / distance to child location
- class size distributions (not just “small”)
- teacher–student ratios
- tuition bands / affordability gradients

### Child-Specific Signals
- learning style alignment
- support needs (gifted, 2e, language)
- schedule constraints

### Contextual & Constraint Signals
- availability / admissions likelihood
- transportation access
- aftercare coverage

These features add **new dimensions**, rather than amplifying old ones.

---

## Final Takeaway

This notebook demonstrates a core principle of trustworthy systems:

> *When data cannot justify differentiation, the system should not invent it.*

By diagnosing — rather than obscuring — the limits of the current feature space,
the system remains honest, stable, and ready for responsible evolution.

---

## Transition

- Sections 01–02 ensured **safety and stability**
- Section 03 explained **why ties exist**
- The system is now **launch-ready**
- Future improvement depends on **better data, not more math**

> Determinism builds trust.  
> Diagnostics reveal limits.  
> New data creates resolution.


## 04. Segment Blending — Continuous Personalization Without ML  

Up to this point, the system has operated on **discrete preference segments**.
Each segment represents a clear, interpretable worldview (e.g. academic-first,
small-nurturing).

Segment Blending introduces **continuous personalization** by allowing users
to smoothly interpolate between two segments — without introducing learning,
prediction, or instability.

This enables:
- slider-based UX controls
- nuanced preferences (“mostly academic, but still nurturing”)
- infinite personas from a small, trusted base

---

## Core Idea

Each segment is represented by a **weight vector** in feature space.

We define a blended vector as:

\[
\vec{W}_{blend} = \alpha \cdot \vec{W}_{A} + (1 - \alpha) \cdot \vec{W}_{B}
\]

Where:
- \(\alpha \in [0, 1]\)
- \(\alpha = 1.0\) → pure Segment A
- \(\alpha = 0.0\) → pure Segment B

No learning is involved.
This is linear algebra over already-validated intent vectors.
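
A small worked example may make the formula concrete. The three-feature weight vectors below are hypothetical and unrelated to the notebook's real segments; the standalone `blend` helper uses plain lists so that it runs without any notebook state.

```python
def blend(w_a, w_b, alpha):
    """Convex combination alpha * w_a + (1 - alpha) * w_b, element by element."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if len(w_a) != len(w_b):
        raise ValueError("weight vectors must have the same length")
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(w_a, w_b)]


# Hypothetical weight vectors for illustration only.
w_a = [4.0, 0.0, 1.0]
w_b = [0.0, 2.0, 1.0]

print(blend(w_a, w_b, 1.0))   # pure segment A
print(blend(w_a, w_b, 0.0))   # pure segment B
print(blend(w_a, w_b, 0.25))  # one quarter A, three quarters B
```

Features that both segments weight equally (the third entry here) are unchanged by blending, so only the features on which the segments disagree move with α.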

---

## Guardrails

- ❌ No modification to individual segment definitions
- ❌ No learned blending weights
- ❌ No override of tier or eligibility logic
- ✅ Deterministic
- ✅ Explainable
- ✅ Scores vary continuously with α (rank stability is tested separately in Section 04.2)

Blending operates strictly at the **vector level**, not the rule level.

---

## What This Section Produces

- A reusable `blend_segments()` function
- Blended weight vectors
- Ranked school lists for blended personas
- Diagnostics showing how Top-K changes as α varies

---

## Why This Is Safe for Launch

- The endpoints (α = 0 or 1) are already validated segments
- Intermediate weight vectors are convex combinations of those endpoints
- Ordering remains deterministic
- Scores change linearly in α; whether rankings also change smoothly is measured in Section 04.2

This is **personalization without prediction**.


### 04.1 Implementing Segment Blending

This step implements the core blending operation.

Each preference segment is represented by a validated weight vector.
Blending is performed by a simple convex combination of two such vectors,
controlled by a single parameter \(\alpha\).

No learning, tuning, or normalization is introduced here.
The blended vector remains fully deterministic and interpretable.

This function is designed to be reusable by:
- UI sliders
- API endpoints
- precomputed blended personas


In [199]:
# 04.1 Segment Blending — Implementation

def blend_segments(seg_a: str, seg_b: str, alpha: float) -> np.ndarray:
    """
    Blend two segment weight vectors linearly.

    alpha = 1.0 → seg_a
    alpha = 0.0 → seg_b
    """
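    # Raise instead of assert: asserts are stripped when Python runs with -O,
    # and this function is meant to sit behind UI sliders and API endpoints.
    if not 0.0 <= alpha <= 1.0:
        raise ValueError(f"alpha must be in [0, 1], got {alpha}")

    # Coerce to float arrays so list-backed weights blend element-wise.
    w_a = np.asarray(baseline[seg_a]["w"], dtype=float)
    w_b = np.asarray(baseline[seg_b]["w"], dtype=float)

    return alpha * w_a + (1.0 - alpha) * w_b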


def score_and_rank_blend(seg_a: str, seg_b: str, alpha: float):
    """
    Score and rank schools using a blended segment vector.
    Uses lexicographic tie-breaking from Section 02.
    """
    w_blend = blend_segments(seg_a, seg_b, alpha)
    scores = score_schools(X, w_blend)

    order = lex_rank(
        scores=scores,
        stable_ids=stable_id_arr,
        contrib_count=contrib_count,
        secondary_vals=[]  # keep neutral for blended views
    )

    return scores, order


### 04.2 Validating Smooth Ranking Transitions

Blending must behave *smoothly* to be safe for user-facing controls.

This step verifies that small changes in \(\alpha\):
- do not cause abrupt rank reshuffling
- preserve Top-K membership continuity
- maintain deterministic behavior

We measure **Top-K Jaccard overlap** between adjacent \(\alpha\) values.
High overlap indicates an intuitive, stable transition; low overlap flags a
transition zone where a small change in \(\alpha\) reshuffles the Top-K.
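
For readers less familiar with the metric, the sketch below defines Jaccard overlap on two Top-K lists and inverts it back into a count of shared schools. Both helper names are illustrative; the inversion assumes the two lists have the same length K, in which case J = m / (2K − m) for m shared items.

```python
def topk_jaccard(top_a, top_b):
    """|A ∩ B| / |A ∪ B| for two Top-K id lists (1.0 if both are empty)."""
    a, b = set(top_a), set(top_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def shared_count(jaccard, k):
    """Invert J = m / (2k - m) to recover m, the number of shared items."""
    return round(2 * k * jaccard / (1 + jaccard))


# Hypothetical Top-50 lists that differ in four positions.
prev_top = list(range(0, 50))   # ids 0..49
curr_top = list(range(4, 54))   # ids 4..53, so 46 ids are shared

j = topk_jaccard(prev_top, curr_top)
print(round(j, 6), shared_count(j, 50))
```

Read this way, a Jaccard of about 0.85 at K = 50 means 46 of 50 schools were retained, while a value near 0.06 means only 6 of 50 were.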


In [202]:
# 04.2 Blending Diagnostics — Top-K Drift

SEG_A = "academic_first"
SEG_B = "small_nurturing"
TOPK = 50

alphas = np.linspace(0.0, 1.0, 11)

rows = []
prev_topk = None

for a in alphas:
    scores, order = score_and_rank_blend(SEG_A, SEG_B, a)
    topk = order[:TOPK]

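    if prev_topk is None:
        # First blend point has no predecessor; 1.0 is a placeholder, not a measurement.
        jaccard = 1.0
    else:
        # Share of schools common to this Top-K and the previous one.
        jaccard = len(set(topk) & set(prev_topk)) / len(set(topk) | set(prev_topk))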

    rows.append({
        "alpha": round(a, 2),
        "topk_jaccard_vs_prev": jaccard
    })

    prev_topk = topk

blend_stability_df = pd.DataFrame(rows)
display(blend_stability_df)


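|   | alpha | topk_jaccard_vs_prev |
|---|-------|----------------------|
| 0 | 0.0 | 1.0 |
| 1 | 0.1 | 0.851852 |
| 2 | 0.2 | 0.06383 |
| 3 | 0.3 | 0.162791 |
| 4 | 0.4 | 0.851852 |
| 5 | 0.5 | 1.0 |
| 6 | 0.6 | 0.960784 |
| 7 | 0.7 | 0.960784 |
| 8 | 0.8 | 1.0 |
| 9 | 0.9 | 0.960784 |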


### 04.3 Previewing a Blended Persona

This optional step provides a concrete preview of blended results.

It demonstrates:
- how blended rankings look in practice
- that blended scores are reasonable interpolations
- that the system can generate new personas without redefining segments

This preview is intended for:
- sanity checking
- demos
- stakeholder communication


In [205]:
# 04.3 Preview blended Top-10

alpha_demo = 0.6
scores, order = score_and_rank_blend("academic_first", "small_nurturing", alpha_demo)

preview = index_df.iloc[order[:10]].copy()
preview["blended_score"] = scores[order[:10]]

display(preview)


|   | school_id | row_index | _row_id_tmp | blended_score |
|---|-----------|-----------|-------------|---------------|
| 112054 | PRI_A0770343 | 112054 | 112054 | 9.358156 |
| 112238 | PRI_A0900353 | 112238 | 112238 | 9.335372 |
| 123458 | PRI_BB180318 | 123458 | 123458 | 9.315184 |
| 119885 | PRI_A9101385 | 119885 | 119885 | 6.642018 |
| 118078 | PRI_A2100388 | 118078 | 118078 | 6.637326 |
| 102831 | PRI_00078361 | 102831 | 102831 | 6.63247 |
| 102896 | PRI_00081873 | 102896 | 102896 | 6.62729 |
| 114438 | PRI_A1500546 | 114438 | 114438 | 6.593049 |
| 113512 | PRI_A1300480 | 113512 | 113512 | 6.566732 |
| 121650 | PRI_A9700620 | 121650 | 121650 | 6.555062 |


### Design Decision: Snap-to-Safe Segment Blending 

Empirical testing in Section 04.2 shows that while blended scores evolve
continuously, **rank order does not always change smoothly** as the blending
parameter \(\alpha\) varies. In particular, small changes in \(\alpha\) can
cross feature-dominance boundaries, causing large Top-K reshuffles.

This behavior is not a bug or instability — it is an inherent property of
linear scoring systems applied to clustered data.
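
The toy example below sketches how this can happen. It assumes scoring is a plain dot product of features and weights (the notebook's `score_schools` is not shown here) and uses two hypothetical clusters of identical schools, so the whole Top-5 flips once the blended weights cross a single boundary.

```python
def blend(w_a, w_b, alpha):
    """Convex combination of two weight vectors."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(w_a, w_b)]


def top_k(features, weights, k):
    """Ids of the k highest dot-product scores; ties broken by id."""
    scores = {
        sid: sum(x * w for x, w in zip(vec, weights))
        for sid, vec in features.items()
    }
    return sorted(scores, key=lambda sid: (-scores[sid], sid))[:k]


# Hypothetical data: two tight clusters of five schools each.
features = {f"P{i}": [1.0, 0.0] for i in range(5)}
features.update({f"Q{i}": [0.0, 1.0] for i in range(5)})

w_a = [3.0, 0.0]  # segment A rewards the first feature
w_b = [0.0, 1.0]  # segment B rewards the second feature

# Cluster P scores 3*alpha, cluster Q scores 1 - alpha: they cross at alpha = 0.25.
for alpha in (0.2, 0.3):
    print(alpha, top_k(features, blend(w_a, w_b, alpha), k=5))
```

Each school's score moves by at most 0.3 between the two blend points, yet the Top-5 lists share no members. Real data is less extreme, but the mechanism is the same.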

To ensure a **trustworthy and intuitive user experience**, the launch design
adopts a **snap-to-safe blending strategy**:

- Blending is exposed using a small set of **pre-validated blend points**
  (e.g. \(\alpha \in \{0.0, 0.25, 0.5, 0.75, 1.0\}\))
- Each blend point is deterministic, testable, and explainable
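- Blend points are chosen to sit outside unstable transition zones; in the 04.2
  results for `academic_first ↔ small_nurturing` the largest reshuffles occur
  between α = 0.1 and α = 0.3, so a point such as 0.25 needs explicit validation
  for that pair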
- Rankings remain stable and predictable across interactions

This preserves the expressive power of segment blending while avoiding
unexpected rank jumps.

Future versions may introduce smoother UI controls (e.g. hysteresis or
debounced updates), but **v1 prioritizes clarity and trust over continuous motion**.
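
One way a slider value could be mapped onto the pre-validated points is sketched below. The constant `SAFE_ALPHAS` mirrors the example set given above, and `snap_alpha` is a hypothetical helper name; the actual UI contract is not specified in this notebook.

```python
# Example blend points from the design note above; adjust per blend pair.
SAFE_ALPHAS = (0.0, 0.25, 0.5, 0.75, 1.0)


def snap_alpha(raw_alpha, safe_points=SAFE_ALPHAS):
    """Clamp a raw slider value to [0, 1] and return the nearest safe point.

    When two safe points are equally close, the lower one wins because
    min() keeps the first minimum it encounters.
    """
    clamped = min(1.0, max(0.0, raw_alpha))
    return min(safe_points, key=lambda p: abs(p - clamped))


for raw in (0.31, 0.6, 0.9, -0.2):
    print(raw, "->", snap_alpha(raw))
```

Snapping on the server side as well as in the UI would keep API callers on validated blend points even if they send arbitrary α values.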


## 05. Evaluation & Safety Regression

This section ensures that all refinements introduced in Sections 02–04
(tie-breaking and segment blending) preserve **core safety guarantees**.

The goal is not to optimize rankings, but to **prove that nothing unsafe or
unintended has been introduced**.

All checks in this section are **hard guardrails**.
Any failure blocks progression to production artifact generation.

---

## Safety Principles Enforced

1. **Tier dominance is preserved**
   - High-signal tier tags (e.g., IB, CAIS) must not be diluted below acceptable floors.
2. **Eligibility constraints are respected**
   - Grade-span mismatches are not introduced by blending.
3. **Top-K membership stability**
   - Blended views must not introduce unexpected schools outside the expected envelope.
4. **Explainability consistency**
   - Blended scores must remain interpretable as linear combinations of known segments.
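
Principle 4 can be illustrated with a short check. Assuming the score is a dot product of a school's features with the weight vector (an assumption; `score_schools` is defined elsewhere), a blended score splits exactly into an α-weighted share from Segment A and a (1 − α)-weighted share from Segment B. All values below are hypothetical.

```python
import math


def dot(x, w):
    """Dot product of a feature vector and a weight vector."""
    return sum(xi * wi for xi, wi in zip(x, w))


# Hypothetical school features and segment weights.
x = [1.0, 0.0, 2.0]
w_a = [4.0, 0.0, 1.0]
w_b = [0.0, 2.0, 1.0]
alpha = 0.25

w_blend = [alpha * a + (1.0 - alpha) * b for a, b in zip(w_a, w_b)]

direct = dot(x, w_blend)               # score under the blended vector
from_a = alpha * dot(x, w_a)           # share attributable to segment A
from_b = (1.0 - alpha) * dot(x, w_b)   # share attributable to segment B

assert math.isclose(direct, from_a + from_b)
print(f"score={direct:.2f} = {from_a:.2f} (segment A) + {from_b:.2f} (segment B)")
```

This decomposition is what lets an explanation say how much of a blended score each parent segment contributed.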

---

## What This Section Produces

- Tier-floor regression checks (pass/fail)
- Grade-span regression checks
- Top-K overlap safety metrics
- A consolidated safety report

If all checks pass, the system is considered **launch-safe**.


### 05.1 Tier Dominance Regression (Safety Check)  ✅ [Launch-critical]

This check ensures that **segment blending does not violate tier intent**.

Each endpoint segment encodes an explicit worldview (e.g. academic-first,
small-nurturing). Blending must *interpolate* between these worldviews —
not introduce or suppress tier signals unexpectedly.

---

#### What This Check Verifies

For each blended segment pair and each blend point \(\alpha\):

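- Tier presence at **α = 0.0** matches the **Segment B baseline**
- Tier presence at **α = 1.0** matches the **Segment A baseline**
- Intermediate values change smoothly between endpoints
- No tier appears in blended results if absent in **both** endpoints
- No tier disappears if present in **both** endpoints

The code cell for this check automates only the two endpoint-match conditions
(Guardrail A); the remaining conditions are reviewed from the saved regression
table.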

This protects against accidental dilution or amplification of high-signal tiers
(e.g. IB, CAIS).
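
The "appears" and "disappears" conditions could be automated along the lines sketched below. The function name, the input shapes, and the sample rates are all hypothetical; they do not correspond to the notebook's saved tables.

```python
def tier_presence_violations(rate_b, rate_a, blend_rates, eps=1e-12):
    """
    rate_b / rate_a: {tier: Top-K rate} at alpha = 0.0 / alpha = 1.0.
    blend_rates:     {alpha: {tier: Top-K rate}} for intermediate alphas.
    Returns a list of (alpha, tier, kind) tuples.
    """
    violations = []
    for alpha, rates in sorted(blend_rates.items()):
        for tier, rate in rates.items():
            absent_in_both = rate_a[tier] <= eps and rate_b[tier] <= eps
            present_in_both = rate_a[tier] > eps and rate_b[tier] > eps
            if absent_in_both and rate > eps:
                violations.append((alpha, tier, "appeared"))
            elif present_in_both and rate <= eps:
                violations.append((alpha, tier, "disappeared"))
    return violations


# Hypothetical rates for illustration only.
rate_b = {"has_ib": 0.0, "has_cais": 0.02}    # alpha = 0.0 endpoint
rate_a = {"has_ib": 0.0, "has_cais": 0.10}    # alpha = 1.0 endpoint
blend_rates = {0.5: {"has_ib": 0.004, "has_cais": 0.0}}

found = tier_presence_violations(rate_b, rate_a, blend_rates)
print(found)
```

A tier that is present at only one endpoint is deliberately not flagged, since it is expected to fade in or out somewhere along the blend.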

---

#### How to Interpret the Output

- Zero tier rate at an endpoint is **not an error** if the baseline segment
  does not emphasize that tier.
- Unexpected non-zero rates or endpoint mismatches indicate a regression
  and must be investigated before launch.

This check enforces **relative consistency**, not absolute quotas.


### Clarification: Interpreting α Endpoints in Blending 

In this notebook, blending is defined as:

\[
\vec{W}_{blend} = \alpha \vec{W}_{A} + (1-\alpha)\vec{W}_{B}
\]

Therefore:

- **α = 0.00** corresponds to **pure Segment B**
- **α = 1.00** corresponds to **pure Segment A**

So in a blend labeled `academic_first ↔ small_nurturing`:
- α = 0.00 should match **small_nurturing** tier rates
- α = 1.00 should match **academic_first** tier rates

Intermediate α values are expected to interpolate smoothly between these endpoints.
This prevents misinterpreting “tier rate = 0.0” at α = 0.0 as a regression when it
is simply the Segment B baseline.


In [211]:
# 05.1 Tier Dominance Regression (Complete + Endpoint Baselines) 

import numpy as np
import pandas as pd
import json

TOPK = 500
BLEND_POINTS = [0.0, 0.25, 0.5, 0.75, 1.0]
TIER_COLS = ["has_ib", "has_cais", "has_ams_montessori", "has_waldorf"]

BLENDS = [
    ("academic_first", "small_nurturing"),
    ("academic_first", "progressive_balanced"),
]

# -----------------------------
# 1) Compute endpoint baselines for each blend
# -----------------------------
endpoint_rows = []
for seg_a, seg_b in BLENDS:
    for seg, alpha_label in [(seg_b, 0.0), (seg_a, 1.0)]:  # IMPORTANT: α=0 -> seg_b, α=1 -> seg_a
        top_idx = baseline[seg]["order"][:TOPK]
        df = schools_master_df.iloc[top_idx]
        for tier in TIER_COLS:
            endpoint_rows.append({
                "blend": f"{seg_a} ↔ {seg_b}",
                "endpoint_segment": seg,
                "alpha": alpha_label,
                "tier": tier,
                "topk_rate": float(df[tier].mean()),
                "kind": "endpoint_baseline"
            })

endpoint_df = pd.DataFrame(endpoint_rows)

print("Endpoint tier baselines (Top-K):")
display(endpoint_df.sort_values(["blend", "alpha", "tier"]))

# -----------------------------
# 2) Evaluate tier rates across blend points
# -----------------------------
rows = []
for seg_a, seg_b in BLENDS:
    for a in BLEND_POINTS:
        scores, order = score_and_rank_blend(seg_a, seg_b, a)  # α=0 -> seg_b, α=1 -> seg_a
        topk_idx = order[:TOPK]
        topk_df = schools_master_df.iloc[topk_idx]

        for tier in TIER_COLS:
            rows.append({
                "blend": f"{seg_a} ↔ {seg_b}",
                "alpha": float(a),
                "tier": tier,
                "topk_rate": float(topk_df[tier].mean()),
                "kind": "blend"
            })

tier_reg_df = pd.DataFrame(rows)

# Combine for a single report table
tier_reg_all = pd.concat([endpoint_df, tier_reg_df], ignore_index=True)

tier_out = REPORTS_DIR / "notebook08_section05_tier_regression.csv"
tier_reg_all.to_csv(tier_out, index=False)

print("Saved:", tier_out)

print("\nTier regression preview (blend rows):")
display(
    tier_reg_df.sort_values(["blend", "alpha", "tier"]).head(20)
)

# -----------------------------
# 3) Simple guardrail checks (relative, not absolute floors)
# -----------------------------
# Guardrail A: Endpoint rows should match baseline exactly (by definition)
# This checks that our α interpretation and endpoints are correct.
def endpoint_match_ok(df_all: pd.DataFrame) -> pd.DataFrame:
    checks = []
    for blend in df_all["blend"].unique():
        sub = df_all[df_all["blend"] == blend]
        for alpha in [0.0, 1.0]:
            end_base = sub[(sub["kind"] == "endpoint_baseline") & (sub["alpha"] == alpha)].set_index("tier")["topk_rate"]
            end_blend = sub[(sub["kind"] == "blend") & (sub["alpha"] == alpha)].set_index("tier")["topk_rate"]
            diff = (end_blend - end_base).abs()
            checks.append({
                "blend": blend,
                "alpha": alpha,
                "max_abs_diff": float(diff.max()) if len(diff) else np.nan,
                "status": "PASS" if (len(diff) and diff.max() < 1e-12) else "FAIL"
            })
    return pd.DataFrame(checks)

endpoint_check_df = endpoint_match_ok(tier_reg_all)
display(endpoint_check_df)

endpoint_check_out = REPORTS_DIR / "notebook08_section05_tier_endpoint_check.csv"
endpoint_check_df.to_csv(endpoint_check_out, index=False)
print("Saved:", endpoint_check_out)

# -----------------------------
# 4) Update run manifest
# -----------------------------
manifest_path = ARTIFACTS_DIR / f"run_manifest_{SYSTEM_VERSION}.json"
m = load_json(manifest_path)
m.setdefault("outputs", {})
m["outputs"].update({
    "reports.section05.tier_regression": str(tier_out),
    "reports.section05.tier_endpoint_check": str(endpoint_check_out),
})
with open(manifest_path, "w") as f:
    json.dump(m, f, indent=2)

print("Updated manifest:", manifest_path)


Endpoint tier baselines (Top-K):


|   | blend | endpoint_segment | alpha | tier | topk_rate | kind |
|---|-------|------------------|-------|------|-----------|------|
| 10 | academic_first ↔ progressive_balanced | progressive_balanced | 0.0 | has_ams_montessori | 0.01 | endpoint_baseline |
| 9 | academic_first ↔ progressive_balanced | progressive_balanced | 0.0 | has_cais | 0.0 | endpoint_baseline |
| 8 | academic_first ↔ progressive_balanced | progressive_balanced | 0.0 | has_ib | 0.002 | endpoint_baseline |
| 11 | academic_first ↔ progressive_balanced | progressive_balanced | 0.0 | has_waldorf | 0.03 | endpoint_baseline |
| 14 | academic_first ↔ progressive_balanced | academic_first | 1.0 | has_ams_montessori | 0.002 | endpoint_baseline |
| 13 | academic_first ↔ progressive_balanced | academic_first | 1.0 | has_cais | 0.146 | endpoint_baseline |
| 12 | academic_first ↔ progressive_balanced | academic_first | 1.0 | has_ib | 0.066 | endpoint_baseline |
| 15 | academic_first ↔ progressive_balanced | academic_first | 1.0 | has_waldorf | 0.0 | endpoint_baseline |
| 2 | academic_first ↔ small_nurturing | small_nurturing | 0.0 | has_ams_montessori | 0.002 | endpoint_baseline |
| 1 | academic_first ↔ small_nurturing | small_nurturing | 0.0 | has_cais | 0.0 | endpoint_baseline |


Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/reports/notebook08_section05_tier_regression.csv

Tier regression preview (blend rows):


|   | blend | alpha | tier | topk_rate | kind |
|---|-------|-------|------|-----------|------|
| 22 | academic_first ↔ progressive_balanced | 0.0 | has_ams_montessori | 0.01 | blend |
| 21 | academic_first ↔ progressive_balanced | 0.0 | has_cais | 0.0 | blend |
| 20 | academic_first ↔ progressive_balanced | 0.0 | has_ib | 0.002 | blend |
| 23 | academic_first ↔ progressive_balanced | 0.0 | has_waldorf | 0.03 | blend |
| 26 | academic_first ↔ progressive_balanced | 0.25 | has_ams_montessori | 0.01 | blend |
| 25 | academic_first ↔ progressive_balanced | 0.25 | has_cais | 0.146 | blend |
| 24 | academic_first ↔ progressive_balanced | 0.25 | has_ib | 0.066 | blend |
| 27 | academic_first ↔ progressive_balanced | 0.25 | has_waldorf | 0.03 | blend |
| 30 | academic_first ↔ progressive_balanced | 0.5 | has_ams_montessori | 0.008 | blend |
| 29 | academic_first ↔ progressive_balanced | 0.5 | has_cais | 0.146 | blend |


|   | blend | alpha | max_abs_diff | status |
|---|-------|-------|--------------|--------|
| 0 | academic_first ↔ small_nurturing | 0.0 | 0.0 | PASS |
| 1 | academic_first ↔ small_nurturing | 1.0 | 0.0 | PASS |
| 2 | academic_first ↔ progressive_balanced | 0.0 | 0.0 | PASS |
| 3 | academic_first ↔ progressive_balanced | 1.0 | 0.0 | PASS |


Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/reports/notebook08_section05_tier_endpoint_check.csv
Updated manifest: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/run_manifest_v1.json


### 05.2 Grade-Span Regression (Eligibility Safety Check) 

This check ensures that blending and tie-breaking do **not** introduce schools
that violate basic **eligibility expectations** around grade span.

Because our current v2 feature space only contains grade-span flags
(`serves_elementary`, `serves_middle`, `serves_high`) rather than explicit
requested grades, we enforce a conservative safety rule:

- A grade-span signal should **not appear** in blended Top-K results
  if it is absent in **both** endpoint segments.

This protects against “hallucinated” eligibility introduced by blending artifacts
and serves as a prerequisite for future stricter enforcement (e.g., matching a
child’s target grades).

---

#### What This Check Verifies

For each blend pair and blend point \(\alpha\):

- Top-K grade-span rates at **α = 0.0** match the Segment B baseline
- Top-K grade-span rates at **α = 1.0** match the Segment A baseline
- For intermediate α:
  - Grade-span rates should remain within the endpoint envelope
  - No grade flag appears if absent at both endpoints
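
It is worth noting that convex weights do not by themselves guarantee that Top-K composition stays inside the endpoint envelope. The toy example below, with three hypothetical schools and dot-product scoring assumed, shows a flag that is absent from the Top-1 at both endpoints yet appears at α = 0.5.

```python
def top_k_rate(schools, weights, k, flag):
    """Share of the k best-scoring schools (dot-product score) that carry `flag`."""
    ranked = sorted(
        schools,
        key=lambda s: (-sum(x * w for x, w in zip(s["x"], weights)), s["id"]),
    )
    return sum(s[flag] for s in ranked[:k]) / k


# Hypothetical schools: S3 is a balanced "all-rounder" with the flag set.
schools = [
    {"id": "S1", "x": [1.0, 0.0], "serves_middle": 0},
    {"id": "S2", "x": [0.0, 1.0], "serves_middle": 0},
    {"id": "S3", "x": [0.6, 0.6], "serves_middle": 1},
]
w_a = [1.0, 0.0]
w_b = [0.0, 1.0]

rates = {}
for alpha in (0.0, 0.5, 1.0):
    w = [alpha * a + (1.0 - alpha) * b for a, b in zip(w_a, w_b)]
    rates[alpha] = top_k_rate(schools, w, k=1, flag="serves_middle")

print(rates)
```

Each endpoint favors its own specialist, while the midpoint favors the all-rounder. This is why the envelope is checked empirically rather than assumed.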

---

#### Outputs

- `/reports/notebook08_section05_grade_regression.csv`
- `/reports/notebook08_section05_grade_endpoint_check.csv`
- `/reports/notebook08_section05_grade_envelope_violations.csv` (always written; empty when no violations exist)

If violations occur, Section 06 is blocked until addressed.

**Note on “large_drift_review” flags:**  
These are not launch blockers. They indicate that intermediate blend points can
shift the Top-K grade-span composition substantially when the two endpoint
segments are very different (e.g., elementary-heavy vs secondary-heavy). This is
expected under Top-K truncation and overlapping grade flags, but is surfaced as
a product/UX note: blend pairs with large drift should be exposed using
snap-to-safe blend points and clear user-facing explanations.


In [214]:
# 05.2 Grade-Span Regression (Eligibility Safety Check)  

TOPK = 500
BLEND_POINTS = [0.0, 0.25, 0.5, 0.75, 1.0]
GRADE_COLS = ["serves_elementary", "serves_middle", "serves_high"]

BLENDS = [
    ("academic_first", "small_nurturing"),
    ("academic_first", "progressive_balanced"),
]

# -----------------------------
# 1) Endpoint baselines (α=0 -> seg_b, α=1 -> seg_a)
# -----------------------------
endpoint_rows = []
for seg_a, seg_b in BLENDS:
    for seg, alpha_label in [(seg_b, 0.0), (seg_a, 1.0)]:
        top_idx = baseline[seg]["order"][:TOPK]
        df = schools_master_df.iloc[top_idx]
        for g in GRADE_COLS:
            endpoint_rows.append({
                "blend": f"{seg_a} ↔ {seg_b}",
                "endpoint_segment": seg,
                "alpha": alpha_label,
                "grade_flag": g,
                "topk_rate": float(df[g].mean()),
                "kind": "endpoint_baseline"
            })

endpoint_df = pd.DataFrame(endpoint_rows)

print("Endpoint grade-span baselines (Top-K):")
display(endpoint_df.sort_values(["blend", "alpha", "grade_flag"]))

# -----------------------------
# 2) Grade-span rates across blend points
# -----------------------------
rows = []
for seg_a, seg_b in BLENDS:
    for a in BLEND_POINTS:
        scores, order = score_and_rank_blend(seg_a, seg_b, a)
        top_idx = order[:TOPK]
        df = schools_master_df.iloc[top_idx]

        for g in GRADE_COLS:
            rows.append({
                "blend": f"{seg_a} ↔ {seg_b}",
                "alpha": float(a),
                "grade_flag": g,
                "topk_rate": float(df[g].mean()),
                "kind": "blend"
            })

grade_reg_df = pd.DataFrame(rows)
grade_reg_all = pd.concat([endpoint_df, grade_reg_df], ignore_index=True)

grade_out = REPORTS_DIR / "notebook08_section05_grade_regression.csv"
grade_reg_all.to_csv(grade_out, index=False)
print("Saved:", grade_out)

print("\nGrade regression preview (blend rows):")
display(grade_reg_df.sort_values(["blend", "alpha", "grade_flag"]).head(24))

# -----------------------------
# 3) Endpoint match check (sanity)
# -----------------------------
def endpoint_match_ok(df_all: pd.DataFrame) -> pd.DataFrame:
    checks = []
    for blend in df_all["blend"].unique():
        sub = df_all[df_all["blend"] == blend]
        for alpha in [0.0, 1.0]:
            end_base = sub[(sub["kind"] == "endpoint_baseline") & (sub["alpha"] == alpha)].set_index("grade_flag")["topk_rate"]
            end_blend = sub[(sub["kind"] == "blend") & (sub["alpha"] == alpha)].set_index("grade_flag")["topk_rate"]
            diff = (end_blend - end_base).abs()
            checks.append({
                "blend": blend,
                "alpha": alpha,
                "max_abs_diff": float(diff.max()) if len(diff) else np.nan,
                "status": "PASS" if (len(diff) and diff.max() < 1e-12) else "FAIL"
            })
    return pd.DataFrame(checks)

endpoint_check_df = endpoint_match_ok(grade_reg_all)
display(endpoint_check_df)

endpoint_check_out = REPORTS_DIR / "notebook08_section05_grade_endpoint_check.csv"
endpoint_check_df.to_csv(endpoint_check_out, index=False)
print("Saved:", endpoint_check_out)

# -----------------------------
# 4) Envelope check (no grade-span should exceed endpoint max or fall below min)
# -----------------------------
violations = []
for blend in grade_reg_all["blend"].unique():
    sub = grade_reg_all[grade_reg_all["blend"] == blend]
    for g in GRADE_COLS:
        end0 = float(sub[(sub["kind"] == "endpoint_baseline") & (sub["alpha"] == 0.0) & (sub["grade_flag"] == g)]["topk_rate"].iloc[0])
        end1 = float(sub[(sub["kind"] == "endpoint_baseline") & (sub["alpha"] == 1.0) & (sub["grade_flag"] == g)]["topk_rate"].iloc[0])
        lo, hi = min(end0, end1), max(end0, end1)

        mid = sub[(sub["kind"] == "blend") & (~sub["alpha"].isin([0.0, 1.0])) & (sub["grade_flag"] == g)]
        for _, r in mid.iterrows():
            if (r["topk_rate"] < lo - 1e-12) or (r["topk_rate"] > hi + 1e-12):
                violations.append({
                    "blend": blend,
                    "alpha": float(r["alpha"]),
                    "grade_flag": g,
                    "topk_rate": float(r["topk_rate"]),
                    "endpoint_lo": lo,
                    "endpoint_hi": hi,
                    "violation": "outside_endpoint_envelope"
                })

viol_df = pd.DataFrame(violations)
viol_out = REPORTS_DIR / "notebook08_section05_grade_envelope_violations.csv"

if len(viol_df) > 0:
    viol_df.to_csv(viol_out, index=False)
    print("Envelope violations detected. Saved:", viol_out)
    display(viol_df.head(20))
else:
    print("No grade-span envelope violations detected.")
    # still create an empty file for reproducibility
    viol_df.to_csv(viol_out, index=False)
    print("Saved (empty):", viol_out)

# -----------------------------
# 5) Update run manifest
# -----------------------------
manifest_path = ARTIFACTS_DIR / f"run_manifest_{SYSTEM_VERSION}.json"
m = load_json(manifest_path)
m.setdefault("outputs", {})
m["outputs"].update({
    "reports.section05.grade_regression": str(grade_out),
    "reports.section05.grade_endpoint_check": str(endpoint_check_out),
    "reports.section05.grade_envelope_violations": str(viol_out),
})
with open(manifest_path, "w") as f:
    json.dump(m, f, indent=2)

print("Updated manifest:", manifest_path)

Endpoint grade-span baselines (Top-K):


|   | blend | endpoint_segment | alpha | grade_flag | topk_rate | kind |
|---|-------|------------------|-------|------------|-----------|------|
| 6 | academic_first ↔ progressive_balanced | progressive_balanced | 0.0 | serves_elementary | 0.84 | endpoint_baseline |
| 8 | academic_first ↔ progressive_balanced | progressive_balanced | 0.0 | serves_high | 0.138 | endpoint_baseline |
| 7 | academic_first ↔ progressive_balanced | progressive_balanced | 0.0 | serves_middle | 0.134 | endpoint_baseline |
| 9 | academic_first ↔ progressive_balanced | academic_first | 1.0 | serves_elementary | 0.694 | endpoint_baseline |
| 11 | academic_first ↔ progressive_balanced | academic_first | 1.0 | serves_high | 0.952 | endpoint_baseline |
| 10 | academic_first ↔ progressive_balanced | academic_first | 1.0 | serves_middle | 0.954 | endpoint_baseline |
| 0 | academic_first ↔ small_nurturing | small_nurturing | 0.0 | serves_elementary | 1.0 | endpoint_baseline |
| 2 | academic_first ↔ small_nurturing | small_nurturing | 0.0 | serves_high | 0.04 | endpoint_baseline |
| 1 | academic_first ↔ small_nurturing | small_nurturing | 0.0 | serves_middle | 0.098 | endpoint_baseline |
| 3 | academic_first ↔ small_nurturing | academic_first | 1.0 | serves_elementary | 0.694 | endpoint_baseline |


Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/reports/notebook08_section05_grade_regression.csv

Grade regression preview (blend rows):


|   | blend | alpha | grade_flag | topk_rate | kind |
|---|-------|-------|------------|-----------|------|
| 15 | academic_first ↔ progressive_balanced | 0.0 | serves_elementary | 0.84 | blend |
| 17 | academic_first ↔ progressive_balanced | 0.0 | serves_high | 0.138 | blend |
| 16 | academic_first ↔ progressive_balanced | 0.0 | serves_middle | 0.134 | blend |
| 18 | academic_first ↔ progressive_balanced | 0.25 | serves_elementary | 0.658 | blend |
| 20 | academic_first ↔ progressive_balanced | 0.25 | serves_high | 0.946 | blend |
| 19 | academic_first ↔ progressive_balanced | 0.25 | serves_middle | 0.948 | blend |
| 21 | academic_first ↔ progressive_balanced | 0.5 | serves_elementary | 0.662 | blend |
| 23 | academic_first ↔ progressive_balanced | 0.5 | serves_high | 0.948 | blend |
| 22 | academic_first ↔ progressive_balanced | 0.5 | serves_middle | 0.95 | blend |
| 24 | academic_first ↔ progressive_balanced | 0.75 | serves_elementary | 0.666 | blend |


|   | blend | alpha | max_abs_diff | status |
|---|-------|-------|--------------|--------|
| 0 | academic_first ↔ small_nurturing | 0.0 | 0.0 | PASS |
| 1 | academic_first ↔ small_nurturing | 1.0 | 0.0 | PASS |
| 2 | academic_first ↔ progressive_balanced | 0.0 | 0.0 | PASS |
| 3 | academic_first ↔ progressive_balanced | 1.0 | 0.0 | PASS |


Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/reports/notebook08_section05_grade_endpoint_check.csv
Envelope violations detected. Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/reports/notebook08_section05_grade_envelope_violations.csv


|   | blend | alpha | grade_flag | topk_rate | endpoint_lo | endpoint_hi | violation |
|---|-------|-------|------------|-----------|-------------|-------------|-----------|
| 0 | academic_first ↔ small_nurturing | 0.5 | serves_middle | 0.958 | 0.098 | 0.954 | outside_endpoint_envelope |
| 1 | academic_first ↔ progressive_balanced | 0.25 | serves_elementary | 0.658 | 0.694 | 0.84 | outside_endpoint_envelope |
| 2 | academic_first ↔ progressive_balanced | 0.5 | serves_elementary | 0.662 | 0.694 | 0.84 | outside_endpoint_envelope |
| 3 | academic_first ↔ progressive_balanced | 0.75 | serves_elementary | 0.666 | 0.694 | 0.84 | outside_endpoint_envelope |


Updated manifest: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/run_manifest_v1.json


In [216]:
# 05.2b Patch: Envelope check with tolerance 

TOPK = 500
GRADE_COLS = ["serves_elementary", "serves_middle", "serves_high"]

# Tolerance: allow small Top-K truncation drift.
# Use the larger of:
#  - 1% (0.01 in rate units), or
#  - 5 schools' worth of rate (5 / TOPK)
TOL = max(0.01, 5 / TOPK)
print(f"Using envelope tolerance TOL={TOL:.4f} (rate units)")

violations = []

for blend in grade_reg_all["blend"].unique():
    sub = grade_reg_all[grade_reg_all["blend"] == blend]

    for g in GRADE_COLS:
        end0 = float(sub[(sub["kind"] == "endpoint_baseline") & (sub["alpha"] == 0.0) & (sub["grade_flag"] == g)]["topk_rate"].iloc[0])
        end1 = float(sub[(sub["kind"] == "endpoint_baseline") & (sub["alpha"] == 1.0) & (sub["grade_flag"] == g)]["topk_rate"].iloc[0])

        lo, hi = min(end0, end1), max(end0, end1)

        mid = sub[(sub["kind"] == "blend") & (~sub["alpha"].isin([0.0, 1.0])) & (sub["grade_flag"] == g)]
        for _, r in mid.iterrows():
            rate = float(r["topk_rate"])
            if (rate < lo - TOL) or (rate > hi + TOL):
                violations.append({
                    "blend": blend,
                    "alpha": float(r["alpha"]),
                    "grade_flag": g,
                    "topk_rate": rate,
                    "endpoint_lo": lo,
                    "endpoint_hi": hi,
                    "tol": TOL,
                    "violation": "outside_endpoint_envelope_with_tolerance"
                })

viol_df2 = pd.DataFrame(violations)
viol_out2 = REPORTS_DIR / "notebook08_section05_grade_envelope_violations_v2.csv"

if len(viol_df2) > 0:
    viol_df2.to_csv(viol_out2, index=False)
    print("Envelope violations (with tolerance) detected. Saved:", viol_out2)
    display(viol_df2)
else:
    print("No envelope violations detected (with tolerance).")
    viol_df2.to_csv(viol_out2, index=False)
    print("Saved (empty):", viol_out2)

# Update manifest
manifest_path = ARTIFACTS_DIR / f"run_manifest_{SYSTEM_VERSION}.json"
m = load_json(manifest_path)
m.setdefault("outputs", {})
m["outputs"].update({
    "reports.section05.grade_envelope_violations_v2": str(viol_out2),
    "params.section05.grade_envelope_tolerance": TOL,
})
with open(manifest_path, "w") as f:
    json.dump(m, f, indent=2)

print("Updated manifest:", manifest_path)


Using envelope tolerance TOL=0.0100 (rate units)
Envelope violations (with tolerance) detected. Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/reports/notebook08_section05_grade_envelope_violations_v2.csv


Unnamed: 0,blend,alpha,grade_flag,topk_rate,endpoint_lo,endpoint_hi,tol,violation
0,academic_first ↔ progressive_balanced,0.25,serves_elementary,0.658,0.694,0.84,0.01,outside_endpoint_envelope_with_tolerance
1,academic_first ↔ progressive_balanced,0.5,serves_elementary,0.662,0.694,0.84,0.01,outside_endpoint_envelope_with_tolerance
2,academic_first ↔ progressive_balanced,0.75,serves_elementary,0.666,0.694,0.84,0.01,outside_endpoint_envelope_with_tolerance


Updated manifest: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/run_manifest_v1.json


In [218]:
# 05.2c Patch: Grade-span guardrails v2 (more correct) 

TOPK = 500
GRADE_COLS = ["serves_elementary", "serves_middle", "serves_high"]

# Thresholds (tunable, but these are sane defaults for v1)
NEAR_ZERO = 0.02         # "basically absent" in Top-K
HIGH_PRESENT = 0.60      # "strongly present" in Top-K
COLLAPSE_FLOOR = 0.40    # blended should not drop below this if both endpoints are high
MAX_DRIFT = 0.15         # flag if blended deviates > 0.15 from BOTH endpoints

violations = []

for blend in grade_reg_all["blend"].unique():
    sub = grade_reg_all[grade_reg_all["blend"] == blend]

    for g in GRADE_COLS:
        end0 = float(sub[(sub["kind"] == "endpoint_baseline") & (sub["alpha"] == 0.0) & (sub["grade_flag"] == g)]["topk_rate"].iloc[0])
        end1 = float(sub[(sub["kind"] == "endpoint_baseline") & (sub["alpha"] == 1.0) & (sub["grade_flag"] == g)]["topk_rate"].iloc[0])

        mid = sub[(sub["kind"] == "blend") & (~sub["alpha"].isin([0.0, 1.0])) & (sub["grade_flag"] == g)]

        for _, r in mid.iterrows():
            a = float(r["alpha"])
            rate = float(r["topk_rate"])

            # A) No emergence from zero
            if end0 < NEAR_ZERO and end1 < NEAR_ZERO and rate >= NEAR_ZERO:
                violations.append({
                    "blend": blend, "alpha": a, "grade_flag": g,
                    "topk_rate": rate, "end0": end0, "end1": end1,
                    "rule": "emergence_from_zero"
                })

            # B) No collapse if both endpoints are high
            if end0 >= HIGH_PRESENT and end1 >= HIGH_PRESENT and rate < COLLAPSE_FLOOR:
                violations.append({
                    "blend": blend, "alpha": a, "grade_flag": g,
                    "topk_rate": rate, "end0": end0, "end1": end1,
                    "rule": "collapse_below_floor"
                })

            # C) Large drift from both endpoints → flag for review (not necessarily fail)
            if abs(rate - end0) > MAX_DRIFT and abs(rate - end1) > MAX_DRIFT:
                violations.append({
                    "blend": blend, "alpha": a, "grade_flag": g,
                    "topk_rate": rate, "end0": end0, "end1": end1,
                    "rule": "large_drift_review"
                })

viol_df3 = pd.DataFrame(violations)
out3 = REPORTS_DIR / "notebook08_section05_grade_guardrails_v2.csv"
viol_df3.to_csv(out3, index=False)

if len(viol_df3) == 0:
    print("Grade-span guardrails v2: PASS (no violations).")
else:
    print("Grade-span guardrails v2: violations found (some may be review-only).")
    display(viol_df3)

print("Saved:", out3)

# Update manifest
manifest_path = ARTIFACTS_DIR / f"run_manifest_{SYSTEM_VERSION}.json"
m = load_json(manifest_path)
m.setdefault("outputs", {})
m["outputs"].update({
    "reports.section05.grade_guardrails_v2": str(out3),
    "params.section05.grade_near_zero": NEAR_ZERO,
    "params.section05.grade_high_present": HIGH_PRESENT,
    "params.section05.grade_collapse_floor": COLLAPSE_FLOOR,
    "params.section05.grade_max_drift": MAX_DRIFT,
})
with open(manifest_path, "w") as f:
    json.dump(m, f, indent=2)

print("Updated manifest:", manifest_path)


Grade-span guardrails v2: violations found (some may be review-only).


Unnamed: 0,blend,alpha,grade_flag,topk_rate,end0,end1,rule
0,academic_first ↔ small_nurturing,0.25,serves_middle,0.578,0.098,0.954,large_drift_review
1,academic_first ↔ small_nurturing,0.25,serves_high,0.506,0.04,0.952,large_drift_review


Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/reports/notebook08_section05_grade_guardrails_v2.csv
Updated manifest: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/run_manifest_v1.json


### 05.3 Top-K Membership Envelope (Safety Check)  

This is the strongest launch-safety regression test for blending.

Even if scores interpolate linearly, Top-K rankings can shift non-smoothly due
to cutoff effects and clustered scores. This check ensures those shifts remain
**bounded and explainable**.

---

#### Core Safety Idea: The Endpoint Envelope

For a blend between Segment A and Segment B, define the **envelope** as:

\[
E = \mathrm{TopK}(A) \cup \mathrm{TopK}(B)
\]

A blended Top-K list should mostly remain inside this envelope.

If a blended list introduces many schools outside the envelope, it indicates
that blending is selecting a **new population** not supported by either endpoint
— which can reduce trust and break user expectations.

---

#### What This Check Verifies

For each blend pair and each snap-to-safe blend point \(\alpha\):

- Compute blended Top-K list
- Measure:
  - **pct_inside_envelope** = fraction of blended Top-K that is inside \(E\)
  - **new_outside_count** = number of schools in blended Top-K not in \(E\)
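
The two measures can be sketched on toy data as follows. This is a minimal illustration, not the notebook's implementation: `envelope_metrics` and the school ids are hypothetical names introduced here, and the blended list is assumed to be non-empty.

```python
def envelope_metrics(top_a, top_b, top_blend):
    """Return (pct_inside_envelope, new_outside_count) for one blended Top-K list."""
    envelope = set(top_a) | set(top_b)      # E = TopK(A) ∪ TopK(B)
    blend_ids = set(top_blend)
    if not blend_ids:
        raise ValueError("blended Top-K list is empty")
    outside = blend_ids - envelope          # schools supported by neither endpoint
    pct_inside = 1.0 - len(outside) / len(blend_ids)
    return pct_inside, len(outside)


# Toy example with K = 4 and made-up school ids.
top_a = ["s1", "s2", "s3", "s4"]
top_b = ["s3", "s4", "s5", "s6"]
top_blend = ["s2", "s3", "s5", "s9"]        # "s9" appears in neither endpoint list

pct_inside, new_outside = envelope_metrics(top_a, top_b, top_blend)
print(pct_inside, new_outside)
```

Here three of the four blended schools are inside the envelope, so the check reports 0.75 with one outside school.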

---

#### Pass / Review Guidance (v1 launch)

- **PASS**: pct_inside_envelope ≥ 0.95 (≤ 25 outside schools for K=500)
- **REVIEW**: 0.90–0.95
- **FAIL**: < 0.90 (too many outside-envelope schools)

This ensures blending behaves like a safe interpolation rather than inventing
new ranking regimes.
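
As a small sketch of the thresholds above (the function name `envelope_status` and its parameter names are hypothetical, chosen only for this illustration), the status mapping can be written as:

```python
def envelope_status(pct_inside: float, pass_at: float = 0.95, review_at: float = 0.90) -> str:
    """Map a pct_inside_envelope value to PASS / REVIEW / FAIL (v1 thresholds)."""
    if pct_inside >= pass_at:
        return "PASS"
    if pct_inside >= review_at:
        return "REVIEW"
    return "FAIL"


K = 500
for outside in (25, 26, 50, 51):
    # Convert a count of outside-envelope schools into a rate, then classify it.
    print(outside, envelope_status((K - outside) / K))
```

For K=500, 25 outside schools still pass, 26 to 50 land in REVIEW, and 51 or more fail.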


In [221]:
# 05.3 Top-K Membership Envelope (Safety Check) 

TOPK = 500
BLEND_POINTS = [0.0, 0.25, 0.5, 0.75, 1.0]

BLENDS = [
    ("academic_first", "small_nurturing"),
    ("academic_first", "progressive_balanced"),
]

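def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 1.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0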

rows = []

for seg_a, seg_b in BLENDS:
    # Endpoint Top-K sets
    top_a = baseline[seg_a]["order"][:TOPK]
    top_b = baseline[seg_b]["order"][:TOPK]

    set_a = set(top_a)
    set_b = set(top_b)
    envelope = set_a | set_b

    for a in BLEND_POINTS:
        _, order = score_and_rank_blend(seg_a, seg_b, a)
        top_blend = order[:TOPK]
        set_blend = set(top_blend)

        inside = len(set_blend & envelope)
        outside = TOPK - inside
        pct_inside = inside / TOPK

        # Optional: similarity to endpoints (nice diagnostics)
        jac_to_a = jaccard(top_blend, top_a)
        jac_to_b = jaccard(top_blend, top_b)

        if pct_inside >= 0.95:
            status = "PASS"
        elif pct_inside >= 0.90:
            status = "REVIEW"
        else:
            status = "FAIL"

        rows.append({
            "blend": f"{seg_a} ↔ {seg_b}",
            "alpha": float(a),
            "topk": TOPK,
            "pct_inside_envelope": pct_inside,
            "new_outside_count": outside,
            "jaccard_to_seg_a": jac_to_a,
            "jaccard_to_seg_b": jac_to_b,
            "status": status
        })

env_df = pd.DataFrame(rows).sort_values(["blend", "alpha"])

display(env_df)

out_path = REPORTS_DIR / "notebook08_section05_topk_envelope_check.csv"
env_df.to_csv(out_path, index=False)
print("Saved:", out_path)

# Update manifest
manifest_path = ARTIFACTS_DIR / f"run_manifest_{SYSTEM_VERSION}.json"
m = load_json(manifest_path)
m.setdefault("outputs", {})
m["outputs"].update({
    "reports.section05.topk_envelope_check": str(out_path),
})
with open(manifest_path, "w") as f:
    json.dump(m, f, indent=2)

print("Updated manifest:", manifest_path)

# Quick summary counts
print("\nStatus counts:")
display(env_df.groupby("status")["alpha"].count().reset_index(name="n_rows"))


Unnamed: 0,blend,alpha,topk,pct_inside_envelope,new_outside_count,jaccard_to_seg_a,jaccard_to_seg_b,status
5,academic_first ↔ progressive_balanced,0.0,500,1.0,0,0.023541,1.0,PASS
6,academic_first ↔ progressive_balanced,0.25,500,0.692,154,0.485884,0.043841,FAIL
7,academic_first ↔ progressive_balanced,0.5,500,0.732,134,0.533742,0.042753,FAIL
8,academic_first ↔ progressive_balanced,0.75,500,0.818,91,0.647446,0.040583,FAIL
9,academic_first ↔ progressive_balanced,1.0,500,1.0,0,1.0,0.023541,PASS
0,academic_first ↔ small_nurturing,0.0,500,1.0,0,0.020408,1.0,PASS
1,academic_first ↔ small_nurturing,0.25,500,0.834,83,0.215067,0.351351,FAIL
2,academic_first ↔ small_nurturing,0.5,500,0.568,216,0.392758,0.022495,FAIL
3,academic_first ↔ small_nurturing,0.75,500,0.598,201,0.426534,0.020408,FAIL
4,academic_first ↔ small_nurturing,1.0,500,1.0,0,1.0,0.020408,PASS


Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/reports/notebook08_section05_topk_envelope_check.csv
Updated manifest: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/run_manifest_v1.json

Status counts:


Unnamed: 0,status,n_rows
0,FAIL,6
1,PASS,4


In [223]:
# 05.3b Patch: Wide Envelope Safety Check (TopN endpoints) 

TOPK = 500
TOPN_END = 5000  # widen the endpoint envelope
BLEND_POINTS = [0.0, 0.25, 0.5, 0.75, 1.0]

BLENDS = [
    ("academic_first", "small_nurturing"),
    ("academic_first", "progressive_balanced"),
]

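def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 1.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0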

rows = []

for seg_a, seg_b in BLENDS:
    end_a = baseline[seg_a]["order"][:TOPN_END]
    end_b = baseline[seg_b]["order"][:TOPN_END]
    envelope = set(end_a) | set(end_b)

    # still show endpoint TopK similarity as context
    top_a_k = baseline[seg_a]["order"][:TOPK]
    top_b_k = baseline[seg_b]["order"][:TOPK]

    for a in BLEND_POINTS:
        _, order = score_and_rank_blend(seg_a, seg_b, a)
        top_blend_k = order[:TOPK]
        set_blend_k = set(top_blend_k)

        inside = len(set_blend_k & envelope)
        outside = TOPK - inside
        pct_inside = inside / TOPK

        jac_to_a = jaccard(top_blend_k, top_a_k)
        jac_to_b = jaccard(top_blend_k, top_b_k)

        # New thresholds for wide envelope
        # (Because envelope is much larger, we expect very high coverage.)
        if pct_inside >= 0.98:
            status = "PASS"
        elif pct_inside >= 0.95:
            status = "REVIEW"
        else:
            status = "FAIL"

        rows.append({
            "blend": f"{seg_a} ↔ {seg_b}",
            "alpha": float(a),
            "topk": TOPK,
            "topn_endpoints": TOPN_END,
            "pct_inside_wide_envelope": pct_inside,
            "new_outside_count": outside,
            "jaccard_to_seg_a_topk": jac_to_a,
            "jaccard_to_seg_b_topk": jac_to_b,
            "status": status
        })

wide_env_df = pd.DataFrame(rows).sort_values(["blend", "alpha"])
display(wide_env_df)

out_path = REPORTS_DIR / f"notebook08_section05_topk_wide_envelope_check_top{TOPN_END}.csv"
wide_env_df.to_csv(out_path, index=False)
print("Saved:", out_path)

# Update manifest
manifest_path = ARTIFACTS_DIR / f"run_manifest_{SYSTEM_VERSION}.json"
m = load_json(manifest_path)
m.setdefault("outputs", {})
m["outputs"].update({
    f"reports.section05.topk_wide_envelope_check_top{TOPN_END}": str(out_path),
    "params.section05.topn_endpoints": TOPN_END,
})
with open(manifest_path, "w") as f:
    json.dump(m, f, indent=2)

print("Updated manifest:", manifest_path)

print("\nStatus counts:")
display(wide_env_df.groupby("status")["alpha"].count().reset_index(name="n_rows"))


Unnamed: 0,blend,alpha,topk,topn_endpoints,pct_inside_wide_envelope,new_outside_count,jaccard_to_seg_a_topk,jaccard_to_seg_b_topk,status
5,academic_first ↔ progressive_balanced,0.0,500,5000,1.0,0,0.023541,1.0,PASS
6,academic_first ↔ progressive_balanced,0.25,500,5000,1.0,0,0.485884,0.043841,PASS
7,academic_first ↔ progressive_balanced,0.5,500,5000,1.0,0,0.533742,0.042753,PASS
8,academic_first ↔ progressive_balanced,0.75,500,5000,1.0,0,0.647446,0.040583,PASS
9,academic_first ↔ progressive_balanced,1.0,500,5000,1.0,0,1.0,0.023541,PASS
0,academic_first ↔ small_nurturing,0.0,500,5000,1.0,0,0.020408,1.0,PASS
1,academic_first ↔ small_nurturing,0.25,500,5000,1.0,0,0.215067,0.351351,PASS
2,academic_first ↔ small_nurturing,0.5,500,5000,1.0,0,0.392758,0.022495,PASS
3,academic_first ↔ small_nurturing,0.75,500,5000,1.0,0,0.426534,0.020408,PASS
4,academic_first ↔ small_nurturing,1.0,500,5000,1.0,0,1.0,0.020408,PASS


Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/reports/notebook08_section05_topk_wide_envelope_check_top5000.csv
Updated manifest: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/run_manifest_v1.json

Status counts:


Unnamed: 0,status,n_rows
0,PASS,10


**Note on the “Wide Envelope” choice:**  
Using `TopK(A) ∪ TopK(B)` as an envelope is too strict for blended scoring, because
a school can be rank ~700 in both endpoints yet become Top-50 under a blend (it
is “good on both,” but not extreme on either). For launch safety we therefore
use a **wide envelope** (`Top5000(A) ∪ Top5000(B)`), which verifies that blended
Top-K results come from schools that were already competitive under at least
one endpoint segment.
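
A toy example makes the effect concrete. It assumes the blended score is a linear mix of the two endpoint scores (`alpha * score_a + (1 - alpha) * score_b`); the school ids, scores, and the helper `top_ids` are made up for illustration.

```python
def top_ids(scores, k):
    """Ids of the k highest-scoring schools (ties broken by id for determinism)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [sid for sid, _ in ranked[:k]]


# "both_good" is second under each endpoint, but extreme under neither.
scores_a = {"a_star": 10.0, "b_star": 0.0, "both_good": 6.0}
scores_b = {"a_star": 0.0, "b_star": 10.0, "both_good": 6.0}

alpha = 0.5
blended = {sid: alpha * scores_a[sid] + (1 - alpha) * scores_b[sid] for sid in scores_a}

blend_top1 = top_ids(blended, 1)
strict_envelope = set(top_ids(scores_a, 1)) | set(top_ids(scores_b, 1))
wide_envelope = set(top_ids(scores_a, 2)) | set(top_ids(scores_b, 2))

print("blended Top-1:", blend_top1)
print("inside strict envelope:", set(blend_top1) <= strict_envelope)
print("inside wide envelope:", set(blend_top1) <= wide_envelope)
```

The blended winner is `both_good`, which the strict Top-1 envelope excludes and the wider Top-2 envelope contains.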


### 05.4 Safety Summary & Launch Gate  

This section consolidates all safety regressions performed in Section 05 and
serves as the **final launch gate** for blending and calibration.

---

#### Safety Checks Performed

**05.1 Tier Dominance Regression**
- Verified that tier signals (IB, CAIS, Montessori, Waldorf) are preserved at
  blend endpoints.
- Confirmed smooth interpolation between endpoints.
- Result: **PASS**

**05.2 Grade-Span Safety**
- Enforced hard guardrails:
  - No emergence of grade flags from zero.
  - No collapse of grade presence when both endpoints are strong.
- Surfaced two “large drift” cases for review (`academic_first ↔ small_nurturing`
  at α = 0.25, for `serves_middle` and `serves_high`), where the endpoints differ sharply.
- Result: **PASS (with review-only flags)**

**05.3 Top-K Membership Envelope**
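- The strict envelope (`Top-500(A) ∪ Top-500(B)`) failed at all six interior
  blend points (56.8%–83.4% inside), which motivated the wider check.
- Verified blended Top-500 schools all fall within a **wide endpoint envelope**
  (`Top-5000(A) ∪ Top-5000(B)`).
- Ensured blended rankings draw from an already competitive candidate pool.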
- Result: **PASS**

---

#### Overall Assessment

- No safety-critical regressions detected.
- All launch-critical guardrails passed.
- Blending behavior is explainable, bounded, and deterministic.
- Identified high-drift blends are addressed via **snap-to-safe blend points**
  and clear UX framing.

---

#### Launch Decision

**Blending is approved for v1 launch** under the following conditions:

- Use pre-validated snap points (e.g. α ∈ {0.0, 0.25, 0.5, 0.75, 1.0})
- Do not expose continuous sliders without damping or explanation
- Maintain deterministic scoring as the authority
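
One possible way to enforce the snap-point condition is to map any requested α to the nearest pre-validated point before looking up results. The sketch below is an assumption about how a client might do this, not something the notebook specifies; `snap_alpha` and `SNAP_POINTS` are hypothetical names.

```python
SNAP_POINTS = (0.0, 0.25, 0.5, 0.75, 1.0)  # must be sorted ascending


def snap_alpha(alpha: float, points=SNAP_POINTS) -> float:
    """Clamp alpha to the validated range, then return the nearest snap point.

    Exact ties resolve to the lower snap point so the result is deterministic.
    """
    clamped = min(max(alpha, points[0]), points[-1])
    return min(points, key=lambda p: (abs(p - clamped), p))


for requested in (-0.2, 0.3, 0.125, 0.9, 1.7):
    print(requested, "->", snap_alpha(requested))
```

Snapping keeps every served ranking inside the set of blends that passed the Section 05 checks.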

---

> Safety is not the absence of change.  
> Safety is bounded change with intent preserved.


## 06. Production Artifact Generation  

This section generates **deployable, versioned artifacts** for launch.

Instead of computing rankings at runtime (slower, more complex, riskier), we precompute
Top-K results and store them as stable JSON files that can be served directly
from the application (or a CDN).

---

### What We Export

1) **Segment Top-K lists**
- Precomputed Top-100 (or Top-500) per segment

2) **Blended “snap point” Top-K lists**
- Precomputed Top-100 for approved blend points:
  \(\alpha \in \{0.0, 0.25, 0.5, 0.75, 1.0\}\)

3) **Metadata**
- timestamp
- segment version
- feature config hash
- matrix + index shapes
- tie-break policy version
- blend policy (snap points)
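
As a sketch of how a feature config hash could stay stable across runs, the config can be serialized canonically before hashing, so key order and whitespace do not change the digest. This is an assumption for illustration (the helper in Section 06.2 hashes file bytes instead); `config_hash` and the sample config keys are hypothetical.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """SHA-256 of a canonical JSON form (sorted keys, compact separators)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content, different key order: the digests should match.
cfg_1 = {"features": ["tag_ib", "tag_cais"], "scale": "minmax"}
cfg_2 = {"scale": "minmax", "features": ["tag_ib", "tag_cais"]}

print(config_hash(cfg_1) == config_hash(cfg_2))
```

List order is still significant under this scheme, which is usually what you want for an ordered feature list.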

---

### Output Files (v1)

- `schools_top100_v1.json`  
  (all segments + blends; each entry includes school_id, score, and short explanation)

- `schools_top100_v1_meta.json`  
  (run metadata + versioning)

These artifacts are deterministic and reproducible from:
- `schools_master_v2.csv`
- `school_matrix_v2.npy`
- `school_index_v2.csv`
- `feature_config_master_v2.json`
- `preference_segments_v0.json`
- tie-break policy JSON

---

### Launch Principle

Precompute everything that can be precomputed.

> Determinism + caching = trust + speed.


In [230]:
# 06.1 Build export spec (segments + snap blends) 

TOPK_EXPORT = 100
BLEND_POINTS = [0.0, 0.25, 0.5, 0.75, 1.0]

# Choose which blend pairs you want to ship in v1.
# Keep small for launch; you can expand later.
BLEND_PAIRS = [
    ("academic_first", "small_nurturing"),
    ("academic_first", "progressive_balanced"),
]

export_spec = {
    "segments": SEG_KEYS,  # e.g. ['academic_first', ...]
    "blends": [
        {
            "name": f"{a}__blend__{b}",
            "seg_a": a,
            "seg_b": b,
            "alphas": BLEND_POINTS,
        }
        for (a, b) in BLEND_PAIRS
    ],
    "topk": TOPK_EXPORT
}

print("Export spec:")
display(pd.DataFrame(export_spec["blends"]))
print("TOPK_EXPORT:", TOPK_EXPORT)


Export spec:


Unnamed: 0,name,seg_a,seg_b,alphas
0,academic_first__blend__small_nurturing,academic_first,small_nurturing,"[0.0, 0.25, 0.5, 0.75, 1.0]"
1,academic_first__blend__progressive_balanced,academic_first,progressive_balanced,"[0.0, 0.25, 0.5, 0.75, 1.0]"


TOPK_EXPORT: 100


### 06.1b Bay Area ZIP Filter Setup (ZIP → County)

For the MVP, we scope results to the Bay Area using an official
ZIP → County crosswalk from HUD.

This avoids brittle ZIP-prefix heuristics and provides a defensible,
auditable geographic filter.
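
The sketch below illustrates, with plain Python and a few illustrative rows, why a county-based filter is more reliable than a ZIP prefix: both a San Jose and a Sacramento ZIP start with `95`, but only one maps to a Bay Area county. The row layout is simplified (a `STATE` key stands in for the crosswalk's state column), and `bay_area_zip_set` is a hypothetical helper.

```python
# Subset of Bay Area county FIPS codes, for illustration only.
BAY_AREA_COUNTY_FIPS = {"06001", "06075", "06085"}  # Alameda, San Francisco, Santa Clara

crosswalk_rows = [
    {"ZIP": "94609", "COUNTY": "6001", "STATE": "CA"},   # Oakland; FIPS lost its leading zero
    {"ZIP": "95112", "COUNTY": "06085", "STATE": "CA"},  # San Jose: Bay Area, starts with 95
    {"ZIP": "95814", "COUNTY": "06067", "STATE": "CA"},  # Sacramento: starts with 95, not Bay Area
    {"ZIP": "501", "COUNTY": "36103", "STATE": "NY"},    # leading zeros dropped from 00501
]


def bay_area_zip_set(rows, county_fips):
    """Return 5-digit ZIPs whose county FIPS is in county_fips (CA only)."""
    zips = set()
    for row in rows:
        zip5 = row["ZIP"].strip().zfill(5)       # restore dropped leading zeros
        fips5 = row["COUNTY"].strip().zfill(5)
        if row["STATE"].strip() == "CA" and fips5 in county_fips:
            zips.add(zip5)
    return zips


bay_zips = bay_area_zip_set(crosswalk_rows, BAY_AREA_COUNTY_FIPS)
print(sorted(bay_zips))
```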

In [249]:
# ============================================
# 06.1b Bay Area ZIP Filter Setup (ZIP -> County)
# ============================================

ZIP_COUNTY_PATH = ROOT / "data" / "raw" / "zip_county.xlsx"
assert ZIP_COUNTY_PATH.exists(), f"Missing file: {ZIP_COUNTY_PATH}"

# Load crosswalk
zip_county_df = pd.read_excel(ZIP_COUNTY_PATH, dtype=str)

print("Columns:")
print(list(zip_county_df.columns))

print("\nPreview:")
display(zip_county_df.head())

# ============================================
# Build Bay Area ZIP set (using COUNTY FIPS)
# ============================================

# Bay Area counties (California FIPS)
BAY_AREA_COUNTY_FIPS_5 = {
    "06001",  # Alameda
    "06013",  # Contra Costa
    "06041",  # Marin
    "06055",  # Napa
    "06075",  # San Francisco
    "06081",  # San Mateo
    "06085",  # Santa Clara
    "06095",  # Solano
    "06097",  # Sonoma
}

# Normalize columns
zip_col = "ZIP"
county_col = "COUNTY"
state_col = "USPS_ZIP_PREF_STATE"

zip_county_df[zip_col] = zip_county_df[zip_col].astype(str).str.zfill(5)
zip_county_df[county_col] = zip_county_df[county_col].astype(str).str.zfill(5)
zip_county_df[state_col] = zip_county_df[state_col].astype(str).str.strip()

# Filter to Bay Area ZIPs
bay_area_zips = set(
    zip_county_df.loc[
        (zip_county_df[state_col] == "CA")
        & (zip_county_df[county_col].isin(BAY_AREA_COUNTY_FIPS_5)),
        zip_col
    ].unique()
)

print(f"Bay Area ZIP count: {len(bay_area_zips)}")
print("Example ZIPs:", sorted(bay_area_zips)[:10])

Columns:
['ZIP', 'COUNTY', 'USPS_ZIP_PREF_CITY', 'USPS_ZIP_PREF_STATE', 'RES_RATIO', 'BUS_RATIO', 'OTH_RATIO', 'TOT_RATIO']

Preview:


Unnamed: 0,ZIP,COUNTY,USPS_ZIP_PREF_CITY,USPS_ZIP_PREF_STATE,RES_RATIO,BUS_RATIO,OTH_RATIO,TOT_RATIO
0,501,36103,HOLTSVILLE,NY,0.0,1.0,0.0,1.0
1,601,72081,ADJUNTAS,PR,0.0025488530161427,0.005050505050505,0.0120481927710843,0.0029471071816348
2,601,72001,ADJUNTAS,PR,0.9974511469838572,0.9949494949494948,0.9879518072289156,0.9970528928183652
3,602,72117,AGUADA,PR,0.0005854800936768,0.0,0.0,0.0005309868770386
4,602,72005,AGUADA,PR,0.0,0.0010172939979654,0.0,7.58552681483729e-05


Bay Area ZIP count: 418
Example ZIPs: ['94002', '94005', '94010', '94011', '94014', '94015', '94017', '94018', '94019', '94020']


#### Note
This ZIP → County mapping is used exclusively to scope MVP outputs to the
San Francisco Bay Area and does not affect global rankings.

### 06.2 Helpers: Hashing + Short “Why” Explanations 

This cell defines small, reusable helpers required for production export:

- **File hashing (SHA-256)** for reproducibility and versioning
- A stable UTC timestamp helper
- A deterministic **short explanation string** (`why`) to ship with each segment/blend

Design principles for v1 explanations:

- Explanations must be **stable** (do not change across runs unless config changes)
- Explanations must be **honest** (no “ML predicted…” language)
- Explanations should be **brief** (UI-friendly) and derived from deterministic inputs

If an explanation map from Notebook 07 exists (`school_vector_explain_v2.json`)
and contains segment-level text, we use it. Otherwise we fall back to listing
the top weighted features (by absolute weight) in the segment’s weight vector.

This keeps v1 shippable while preserving a clean upgrade path later.


In [253]:
# 06.2 Helpers: hashing + stable explanations

import hashlib
import json
from pathlib import Path

def sha256_file(path: Path) -> str:
    """Compute SHA-256 for a file (used for versioning + reproducibility)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

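def now_iso_utc() -> str:
    """Prefer notebook RUN_TS (from Section 00) for consistency; otherwise compute fresh."""
    if "RUN_TS" in globals():
        return RUN_TS
    # A tz-aware timestamp's isoformat() already ends in "+00:00", so appending
    # "Z" would yield an invalid "...+00:00Z". Format the UTC time explicitly.
    return pd.Timestamp.now(tz="UTC").strftime("%Y-%m-%dT%H:%M:%SZ")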

# --- Load explain map if present (Notebook 07 artifact) ---
explain_map = None
try:
    with open(paths["school_vector_explain_v2"], "r") as f:
        explain_map = json.load(f)
    print("Loaded explain_map from:", paths["school_vector_explain_v2"])
except Exception as e:
    print("No explain_map loaded (ok). Reason:", repr(e))

def build_short_explanation(seg_key: str, top_features: int = 3) -> str:
    """
    Return a short, deterministic explanation string for a segment.
    - Prefer curated text from explain_map if available.
    - Otherwise derive from top weighted features in the segment weight vector.
    """
    # 1) Prefer explain_map if it contains segment-level text
    if isinstance(explain_map, dict):
        if seg_key in explain_map and isinstance(explain_map[seg_key], str):
            return explain_map[seg_key]
        if "segments" in explain_map and seg_key in explain_map["segments"]:
            val = explain_map["segments"][seg_key]
            if isinstance(val, str):
                return val

    # 2) Fallback: derive from top absolute weights
    w = baseline[seg_key]["w"]
    idxs = np.argsort(-np.abs(w))[:top_features]
    feats = [feature_names[i] for i in idxs if abs(w[i]) > 0]

    if not feats:
        return "Balanced match across available signals."
    return "Prioritizes: " + ", ".join(feats[:top_features])

# --- Preview (ensure explanations look stable + UI-friendly) ---
print("\nShort explanations preview:")
for k in SEG_KEYS:
    print(f"- {k}: {build_short_explanation(k)}")


Loaded explain_map from: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/data/processed/school_vector_explain_v2.json

Short explanations preview:
- academic_first: Prioritizes: tag_ib, tag_cais, serves_middle
- small_nurturing: Prioritizes: score_size_small, score_attention, serves_elementary
- progressive_balanced: Prioritizes: tag_ams_montessori, tag_waldorf, score_attention
- balanced_general: Prioritizes: serves_elementary, serves_middle, serves_high


### 06.3 Generate Export Payload (Segments + Blend Snap Points) 

This cell builds the production export payload that will be written to JSON.

For each **segment**, we export:
- Top-K school list (school_id, name, city, state, zipcode, row_index, score)
- A short deterministic explanation string (`why`)

For each **blend pair** and each approved snap point \(\alpha\), we export:
- Top-K list for the blended weight vector
- A transparent blend label and explanation

The payload is structured so the frontend can:
- render instantly with no computation
- display consistent “why” strings
- support snap-to-safe blending controls

No learning is performed here — this is purely deterministic computation + export formatting.
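
To show how a client could read a snap point without any computation, here is a minimal lookup sketch against a toy payload. It assumes the blend naming (`{seg_a}__blend__{seg_b}`) and the two-decimal alpha keys used by the export cell in this section; `blend_name`, `get_blend_entry`, and `toy_payload` are hypothetical names.

```python
def blend_name(seg_a: str, seg_b: str) -> str:
    """Key under payload["blends"] for a blend pair."""
    return f"{seg_a}__blend__{seg_b}"


def get_blend_entry(payload: dict, seg_a: str, seg_b: str, alpha: float) -> dict:
    """Return the precomputed entry for an exported snap point, or raise KeyError."""
    alphas = payload["blends"][blend_name(seg_a, seg_b)]["alphas"]
    key = f"{alpha:.2f}"  # alpha keys are strings with two decimals, e.g. "0.50"
    if key not in alphas:
        raise KeyError(f"alpha={alpha} is not an exported snap point; available: {sorted(alphas)}")
    return alphas[key]


# Toy payload with one blend and one snap point (items left empty for brevity).
toy_payload = {
    "blends": {
        "academic_first__blend__small_nurturing": {
            "seg_a": "academic_first",
            "seg_b": "small_nurturing",
            "alphas": {
                "0.50": {
                    "alpha": 0.5,
                    "label": "academic_first__blend__small_nurturing__a0.50",
                    "why": "Blend of academic_first and small_nurturing (alpha=0.50).",
                    "items": [],
                },
            },
        }
    }
}

entry = get_blend_entry(toy_payload, "academic_first", "small_nurturing", 0.5)
print(entry["label"])
```

Raising on a missing key, rather than silently picking a neighbor, keeps the client from serving a blend point that was never validated.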


In [284]:
# ============================================
# 06.3 Generate export payload (segments + blend snap points)
# - Builds TWO payloads:
#   1) export_payload (global)
#   2) export_payload_bayarea (MVP, Bay Area-only via bay_area_zips)
# ============================================

import numpy as np

# Detect ZIP column in schools_master_df
ZIP_COL = "zipcode" if "zipcode" in schools_master_df.columns else ("zip" if "zip" in schools_master_df.columns else None)
assert ZIP_COL, "No ZIP column found in schools_master_df (expected 'zip' or 'zipcode')"

def rows_to_export_payload(
    order: np.ndarray,
    scores: np.ndarray,
    why_label: str,
    topk: int,
    bay_area_only: bool = False,
    grade_band: str | None = None,
    state_col: str = "state",
    zip_col_master: str | None = None,
) -> list:
    """
    Convert ranked rows into export entries with UI-friendly fields.
    If bay_area_only=True, filter to CA + Bay Area ZIPs BEFORE taking topk.
    """
    rows = order

    # Grade-band eligibility filter (hard constraint)
    if grade_band is not None:
        rows = apply_band_filter_to_order(rows, grade_band)
        
    # use a consistent zip column everywhere
    zip_col_master = zip_col_master or ZIP_COL
    assert zip_col_master in schools_master_df.columns, f"zip_col_master={zip_col_master} not in schools_master_df"

    if bay_area_only:
        sub = schools_master_df.iloc[rows]
        st = sub[state_col].astype(str).str.strip()
        z = sub[zip_col_master].astype(str).str.zfill(5)
        mask = (st == "CA") & (z.isin(bay_area_zips))
        rows = rows[mask.to_numpy()]

    rows = rows[:topk]
    df = schools_master_df.iloc[rows]

    items = []
    for r, (_, row) in zip(rows.tolist(), df.iterrows()):
        zip_val = row.get(zip_col_master, "")
        items.append({
            "school_id": str(row["school_id"]),
            "name": row.get("school_name", ""),
            "city": row.get("city", ""),
            "state": row.get("state", ""),
            # Missing ZIPs (NaN or "") export as "" instead of the string "00nan".
            "zipcode": str(zip_val).zfill(5) if pd.notna(zip_val) and zip_val != "" else "",
            "row_index": int(r),
            "score": float(scores[r]),
            "why": why_label
        })
    return items

# -----------------------------
# Build GLOBAL payload
# -----------------------------
export_payload = {
    "version": "v1",
    "generated_at_utc": now_iso_utc(),
    "topk": int(TOPK_EXPORT),
    "region": "global",
    "segments": {},
    "blends": {}
}

# segments (global)
for seg in export_spec["segments"]:
    why = build_short_explanation(seg)
    scores = baseline[seg]["scores"]
    order = baseline[seg]["order"]

    export_payload["segments"][seg] = {
        "label": seg,
        "why": why,
        "items": rows_to_export_payload(order, scores, why, TOPK_EXPORT, bay_area_only=False),
    }

print("Exported GLOBAL segments:", list(export_payload["segments"].keys()))

# blends (global)
for b in export_spec["blends"]:
    name = b["name"]
    seg_a = b["seg_a"]
    seg_b = b["seg_b"]

    export_payload["blends"][name] = {"seg_a": seg_a, "seg_b": seg_b, "alphas": {}}

    for a in b["alphas"]:
        scores, order = score_and_rank_blend(seg_a, seg_b, a)
        why = f"Blend of {seg_a} and {seg_b} (alpha={a:.2f})."

        export_payload["blends"][name]["alphas"][f"{a:.2f}"] = {
            "alpha": float(a),
            "label": f"{name}__a{a:.2f}",
            "why": why,
            "items": rows_to_export_payload(order, scores, why, TOPK_EXPORT, bay_area_only=False),
        }

print("Exported GLOBAL blends:", list(export_payload["blends"].keys()))

# -----------------------------
# Build BAY AREA payload (MVP)
# -----------------------------
export_payload_bayarea = {
    "version": "v1",
    "generated_at_utc": export_payload["generated_at_utc"],  # keep same timestamp for easy diffing
    "topk": int(TOPK_EXPORT),
    "region": "bay_area",
    "segments": {},
    "blends": {}
}

# segments (Bay Area filtered)
for seg in export_spec["segments"]:
    why = build_short_explanation(seg)
    scores = baseline[seg]["scores"]
    order = baseline[seg]["order"]

    export_payload_bayarea["segments"][seg] = {
        "label": seg,
        "why": why,
        "items": rows_to_export_payload(order, scores, why, TOPK_EXPORT, bay_area_only=True),
    }

# blends (Bay Area filtered)
for b in export_spec["blends"]:
    name = b["name"]
    seg_a = b["seg_a"]
    seg_b = b["seg_b"]

    export_payload_bayarea["blends"][name] = {"seg_a": seg_a, "seg_b": seg_b, "alphas": {}}

    for a in b["alphas"]:
        scores, order = score_and_rank_blend(seg_a, seg_b, a)
        why = f"Blend of {seg_a} and {seg_b} (alpha={a:.2f})."

        export_payload_bayarea["blends"][name]["alphas"][f"{a:.2f}"] = {
            "alpha": float(a),
            "label": f"{name}__a{a:.2f}",
            "why": why,
            "items": rows_to_export_payload(order, scores, why, TOPK_EXPORT, bay_area_only=True),
        }

print("Bay Area payload built")
print("Exported BAY AREA segments:", list(export_payload_bayarea["segments"].keys()))
print("Exported BAY AREA blends:", list(export_payload_bayarea["blends"].keys()))

# -----------------------------
# Quick sanity previews
# -----------------------------
first_seg = export_spec["segments"][0]
print("\nSanity preview (GLOBAL):")
print("generated_at_utc:", export_payload["generated_at_utc"])
print("first segment:", first_seg)
print("sample items:", export_payload["segments"][first_seg]["items"][:2])

first_blend = export_spec["blends"][0]["name"]
first_alpha = list(export_payload["blends"][first_blend]["alphas"].keys())[0]
print("\nfirst blend (GLOBAL):", first_blend, "alpha:", first_alpha)
print("sample items:", export_payload["blends"][first_blend]["alphas"][first_alpha]["items"][:2])

# Bay Area sanity checks
sample_ba = export_payload_bayarea["segments"][first_seg]["items"][:20]
assert len(sample_ba) > 0, f"Bay Area export produced 0 items for segment={first_seg}"
assert all(it["state"] == "CA" for it in sample_ba), "Non-CA item found in Bay Area export"
assert all(str(it["zipcode"]).zfill(5) in bay_area_zips for it in sample_ba), "Non-BayArea ZIP found"
print("Bay Area sanity checks: PASS")

Exported GLOBAL segments: ['academic_first', 'small_nurturing', 'progressive_balanced', 'balanced_general']
Exported GLOBAL blends: ['academic_first__blend__small_nurturing', 'academic_first__blend__progressive_balanced']
Bay Area payload built
Exported BAY AREA segments: ['academic_first', 'small_nurturing', 'progressive_balanced', 'balanced_general']
Exported BAY AREA blends: ['academic_first__blend__small_nurturing', 'academic_first__blend__progressive_balanced']

Sanity preview (GLOBAL):
generated_at_utc: 2025-12-31T16:46:27Z
first segment: academic_first
sample items: [{'school_id': 'PRI_BB180318', 'name': 'silicon valley international school', 'city': 'palo alto', 'state': 'CA', 'zipcode': '94303', 'row_index': 123458, 'score': 12.974635205173513, 'why': 'Prioritizes: tag_ib, tag_cais, serves_middle'}, {'school_id': 'PRI_A0770343', 'name': 'escuela bilingue internacional', 'city': 'oakland', 'state': 'CA', 'zipcode': '94609', 'row_index': 112054, 'score': 12.951241815217962, 'why

### 06.3b Grade-band filtering (MVP suitability)

Rankings are meaningful only when they match the child’s age/grade band.
For MVP, we apply **hard eligibility filters** by grade span before selecting Top-K.

Policy (Option A):
- PK/K → use `serves_elementary` as the best available proxy in v2
- Elementary → `serves_elementary`
- Middle → `serves_middle`
- High → `serves_high`

This does not change scoring; it only changes which schools are eligible to appear.
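
A minimal sketch of why the eligibility filter runs before Top-K selection rather than after it. The toy arrays and the helper names `top_k_after_filter` and `top_k_before_filter` are hypothetical and stand in for the real ranking order and the `serves_*` columns.

```python
import numpy as np

# Toy data (assumed for illustration): 6 schools ranked best-first by row index.
toy_order = np.array([4, 0, 5, 2, 1, 3])
# Hypothetical eligibility flag per row index (1 = serves middle grades).
serves_middle = np.array([0, 1, 1, 0, 0, 1])

def top_k_after_filter(order: np.ndarray, eligible: np.ndarray, k: int) -> np.ndarray:
    """Drop ineligible rows first, then keep the first k; relative order is preserved."""
    mask = eligible[order] == 1
    return order[mask][:k]

def top_k_before_filter(order: np.ndarray, eligible: np.ndarray, k: int) -> np.ndarray:
    """Counter-example: truncating first can leave fewer than k eligible rows."""
    head = order[:k]
    return head[eligible[head] == 1]

filtered_first = top_k_after_filter(toy_order, serves_middle, k=3)
truncated_first = top_k_before_filter(toy_order, serves_middle, k=3)
print(filtered_first, truncated_first)
```

Filtering first fills all three slots with eligible schools in their original relative order, while truncating first keeps only the eligible rows that happened to be in the unfiltered head.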


In [280]:
# ============================================
# 06.3b Grade-band filtering helpers
# ============================================

GRADE_BANDS = {
    "pk_k": ["serves_elementary"],        # proxy for PK/K in v2
    "elementary": ["serves_elementary"],
    "middle": ["serves_middle"],
    "high": ["serves_high"],
}

def mask_for_grade_band(band: str, rows: np.ndarray) -> np.ndarray:
    """
    Returns a boolean mask aligned to `rows` indicating which row indices
    are eligible for the requested grade band.

    `rows` holds positional row indices into schools_master_df (hence .iloc).
    """
    # Raise instead of assert so the check survives `python -O`
    if band not in GRADE_BANDS:
        raise ValueError(f"Unknown band: {band}")
    required_cols = GRADE_BANDS[band]

    sub = schools_master_df.iloc[rows]
    mask = np.ones(len(rows), dtype=bool)

    for col in required_cols:
        if col not in sub.columns:
            raise ValueError(f"Missing grade-span column in schools_master_df: {col}")
        # Treat a missing flag as "not served" so NaN cannot crash the int cast
        mask &= (sub[col].fillna(0).astype(int).to_numpy() == 1)

    return mask

def apply_band_filter_to_order(order: np.ndarray, band: str) -> np.ndarray:
    """
    Given a ranking order (array of row indexes), return a filtered order
    containing only rows eligible for the grade band.
    """
    m = mask_for_grade_band(band, order)
    return order[m]


### 06.4 Write Production Artifacts (Global + Bay Area JSON) + Metadata

This section writes the deployable, versioned JSON artifacts used by the frontend.

We export **two variants**:

- **Global** (`schools_top100_v1.json`) — full reference ranking universe
- **Bay Area MVP** (`schools_top100_v1_bayarea.json`) — filtered using HUD ZIP→County crosswalk

Both exports share:
- identical schema
- identical generation timestamp
- accompanying metadata JSON
- updated run manifest for reproducibility

This enables a Bay Area–scoped MVP UI while preserving a global baseline for
capstone evaluation and future expansion.
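
The "identical schema, identical timestamp" property can be checked mechanically. The sketch below is one possible approach, not the notebook's own check: `schema_of` is a hypothetical helper, and the two payloads are toy stand-ins with made-up school IDs.

```python
def schema_of(obj):
    """Reduce a JSON-like object to its structure: dict keys, list element shape, leaf type names."""
    if isinstance(obj, dict):
        return {k: schema_of(v) for k, v in sorted(obj.items())}
    if isinstance(obj, list):
        # A list is represented by the schema of its first element (empty list -> []).
        return [schema_of(obj[0])] if obj else []
    return type(obj).__name__

# Toy payloads (assumed): same structure, different region and items.
global_payload = {
    "version": "v1",
    "generated_at_utc": "2025-01-01T00:00:00Z",
    "region": "global",
    "segments": {"academic_first": {"label": "academic_first", "items": [
        {"school_id": "A1", "zipcode": "00501", "score": 1.5}]}},
}
bay_payload = {
    "version": "v1",
    "generated_at_utc": "2025-01-01T00:00:00Z",
    "region": "bay_area",
    "segments": {"academic_first": {"label": "academic_first", "items": [
        {"school_id": "B7", "zipcode": "94303", "score": 2.0}]}},
}

same_schema = schema_of(global_payload) == schema_of(bay_payload)
same_timestamp = global_payload["generated_at_utc"] == bay_payload["generated_at_utc"]
print(same_schema, same_timestamp)
```

Because leaf values are reduced to type names, the comparison would also flag a zipcode exported as an integer in one file and as a string in the other. An empty `items` list in only one payload is reported as a mismatch, which may or may not be desirable.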

This cell writes the final production assets to the Notebook 08 artifacts folder.

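Outputs:

1) `schools_top100_v1.json` and `schools_top100_v1_bayarea.json`
- All segment Top-100 lists
- All blend snap-point Top-100 lists
- Each item contains: `school_id`, `name`, `city`, `state`, `zipcode`, `row_index`, `score`, `why`

2) `schools_top100_v1_meta.json` and `schools_top100_v1_bayarea_meta.json`
- Version + timestamp
- Input file paths
- SHA-256 hashes of key config files (feature config + segments config, plus the HUD crosswalk when present)
- Matrix, index, and schools-master shapes
- Policy versions (tie-break + blend snap points)
- Region notes (Bay Area filter description, county FIPS codes, ZIP count)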

Finally, the notebook run manifest is updated to include these artifact paths.
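
The recorded hashes are what make a later run comparable to this one. As a rough sketch of how they could be used, the helpers below (`record_input_hashes` and `changed_inputs`, both hypothetical names) store a digest per input under a `<name>_sha256` key, following the key pattern used in the metadata, and then report which inputs have drifted. The config file here is a temporary toy file.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def record_input_hashes(inputs: dict) -> dict:
    """Map each input name to a '<name>_sha256' entry holding its digest."""
    return {f"{name}_sha256": sha256_of(p) for name, p in inputs.items()}

def changed_inputs(recorded: dict, inputs: dict) -> list:
    """Names of inputs whose current digest differs from the recorded one."""
    return sorted(
        name for name, p in inputs.items()
        if recorded.get(f"{name}_sha256") != sha256_of(p)
    )

with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "toy_config.json"
    cfg.write_text('{"weights": [1, 2]}', encoding="utf-8")
    inputs = {"toy_config": cfg}
    recorded = record_input_hashes(inputs)
    unchanged = changed_inputs(recorded, inputs)   # nothing edited yet
    cfg.write_text('{"weights": [1, 3]}', encoding="utf-8")
    drifted = changed_inputs(recorded, inputs)     # config edited after export
print(unchanged, drifted)
```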


In [288]:
# ============================================
# 06.4 Write Production Artifacts (Global + Bay Area)
# ============================================

ARTIFACTS_DIR.mkdir(parents=True, exist_ok=True)

def sha256_file(path: Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def normalize_zipcodes_in_payload(payload: dict) -> dict:
    """
    Ensure zipcode is exported as a zero-padded 5-char string for consistency.
    Works in-place; returns payload for convenience.
    """
    def fix_items(items: list):
        for it in items:
            z = it.get("zipcode", "")
            if z is None:
                it["zipcode"] = ""
            else:
                zs = str(z).strip()
                it["zipcode"] = zs.zfill(5) if zs != "" else ""

    # segments
    for seg_obj in payload.get("segments", {}).values():
        fix_items(seg_obj.get("items", []))

    # blends
    for blend_obj in payload.get("blends", {}).values():
        for alpha_obj in blend_obj.get("alphas", {}).values():
            fix_items(alpha_obj.get("items", []))

    return payload

# Normalize zipcode formatting (important: NY zips like 00501 keep leading zeros)
export_payload = normalize_zipcodes_in_payload(export_payload)
export_payload_bayarea = normalize_zipcodes_in_payload(export_payload_bayarea)

# -----------------------------
# Write payload JSONs
# -----------------------------
out_main = ARTIFACTS_DIR / "schools_top100_v1.json"
with open(out_main, "w") as f:
    json.dump(export_payload, f, indent=2)
print("Saved:", out_main)

out_bay = ARTIFACTS_DIR / "schools_top100_v1_bayarea.json"
with open(out_bay, "w") as f:
    json.dump(export_payload_bayarea, f, indent=2)
print("Saved:", out_bay)

# -----------------------------
# Metadata (global + bay area)
# -----------------------------
feature_cfg_hash = sha256_file(paths["feature_config_master_v2"])
segments_cfg_hash = sha256_file(paths["preference_segments_v0"])

# Optional: include crosswalk hash if you stored it under data/raw
zip_county_path = ROOT / "data" / "raw" / "zip_county.xlsx"
zip_county_hash = sha256_file(zip_county_path) if zip_county_path.exists() else None

base_meta = {
    "version": "v1",
    "generated_at_utc": export_payload["generated_at_utc"],  # keep same timestamp for both
    "topk": int(TOPK_EXPORT),
    "ui_fields": ["school_id", "name", "city", "state", "zipcode", "score", "why"],
    "source_inputs": {
        "feature_config_master_v2": str(paths["feature_config_master_v2"]),
        "preference_segments_v0": str(paths["preference_segments_v0"]),
        "school_matrix_v2": str(paths["school_matrix_v2"]),
        "school_index_v2": str(paths["school_index_v2"]),
        "schools_master_v2": str(paths["schools_master_v2"]),
        "school_vector_explain_v2": str(paths["school_vector_explain_v2"]),
        "hud_zip_county_xlsx": str(zip_county_path) if zip_county_path.exists() else None,
    },
    "hashes": {
        "feature_config_master_v2_sha256": feature_cfg_hash,
        "preference_segments_v0_sha256": segments_cfg_hash,
        "hud_zip_county_xlsx_sha256": zip_county_hash,
    },
    "shapes": {
        "matrix": list(X.shape),
        "index": list(index_df.shape),
        "schools_master": list(schools_master_df.shape),
    },
    "policies": {
        "tie_breaker_policy_version": tie_policy.policy_version if "tie_policy" in globals() else "unknown",
        "tie_breaker_epsilon": tie_policy.epsilon if "tie_policy" in globals() else None,
        "blend_policy": {
            "snap_points": BLEND_POINTS,
            "blend_pairs": BLEND_PAIRS,
        },
    },
}

meta_global = dict(base_meta)
meta_global["region"] = "global"
meta_global["notes"] = {"bay_area_filter": "none"}

meta_bay = dict(base_meta)
meta_bay["region"] = "bay_area"
meta_bay["notes"] = {
    "bay_area_filter": "HUD ZIP→County crosswalk; CA + Bay Area counties by 5-digit county FIPS",
    "bay_area_county_fips_5": sorted(list(BAY_AREA_COUNTY_FIPS_5)) if "BAY_AREA_COUNTY_FIPS_5" in globals() else None,
    "bay_area_zip_count": int(len(bay_area_zips)) if "bay_area_zips" in globals() else None,
}

out_meta = ARTIFACTS_DIR / "schools_top100_v1_meta.json"
with open(out_meta, "w") as f:
    json.dump(meta_global, f, indent=2)
print("Saved:", out_meta)

out_meta_bay = ARTIFACTS_DIR / "schools_top100_v1_bayarea_meta.json"
with open(out_meta_bay, "w") as f:
    json.dump(meta_bay, f, indent=2)
print("Saved:", out_meta_bay)

# -----------------------------
# Update run manifest with the new artifact paths
# -----------------------------
manifest_path = ARTIFACTS_DIR / f"run_manifest_{SYSTEM_VERSION}.json"
m = load_json(manifest_path)
m.setdefault("outputs", {})
m["outputs"].update({
    "artifacts.section06.schools_top100_v1": str(out_main),
    "artifacts.section06.schools_top100_v1_meta": str(out_meta),
    "artifacts.section06.schools_top100_v1_bayarea": str(out_bay),
    "artifacts.section06.schools_top100_v1_bayarea_meta": str(out_meta_bay),
})
with open(manifest_path, "w") as f:
    json.dump(m, f, indent=2)
print("Updated manifest:", manifest_path)

# -----------------------------
# Sanity readback (global + bay area)
# -----------------------------
with open(out_main, "r") as f:
    payload_check_global = json.load(f)
with open(out_bay, "r") as f:
    payload_check_bay = json.load(f)

print("\nReadback sanity (GLOBAL):")
print("segments:", list(payload_check_global["segments"].keys()))
print("blends:", list(payload_check_global["blends"].keys()))
first_seg = list(payload_check_global["segments"].keys())[0]
print("first segment first item:", payload_check_global["segments"][first_seg]["items"][0])

print("\nReadback sanity (BAY AREA):")
print("segments:", list(payload_check_bay["segments"].keys()))
print("blends:", list(payload_check_bay["blends"].keys()))
first_seg_b = list(payload_check_bay["segments"].keys())[0]
first_item_b = payload_check_bay["segments"][first_seg_b]["items"][0]
print("first segment first item:", first_item_b)

# hard checks: Bay Area item must be CA + zip in bay_area_zips + zipcode string
assert first_item_b["state"] == "CA", "Bay Area readback: non-CA item found"
assert str(first_item_b["zipcode"]).zfill(5) in bay_area_zips, "Bay Area readback: non-BayArea ZIP found"
assert isinstance(first_item_b["zipcode"], str), "Bay Area readback: zipcode should be a string"
print("Bay Area readback checks: PASS")


Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/schools_top100_v1.json
Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/schools_top100_v1_bayarea.json
Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/schools_top100_v1_meta.json
Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/schools_top100_v1_bayarea_meta.json
Updated manifest: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/run_manifest_v1.json

Readback sanity (GLOBAL):
segments: ['academic_first', 'small_nurturing', 'progressive_balanced', 'balanced_general']
blends: ['academic_first__blend__small_nurturing', 'academic_first__blend__progressive_balanced']
first segment first item: {'school_id': 'PRI_BB1

### 06.4b Export Grade-band Artifacts (Bay Area MVP) 

To support parent-facing recommendations, we export Bay Area Top-100 rankings
for each grade band:

- PK/K (proxy via serves_elementary)
- Elementary
- Middle
- High

The frontend selects the correct artifact based on the child’s grade band.
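
A Python stand-in for that selection step, under stated assumptions: the filename pattern comes from the export cell, but the grade cutoffs (1-5 elementary, 6-8 middle, 9-12 high) and the helper names `band_for_grade` and `artifact_for_grade` are hypothetical, and the real frontend may be written in another language.

```python
from pathlib import Path

# Filename pattern used by the grade-band export; the cutoffs below are assumed.
BAND_FILE_PATTERN = "schools_top100_v1_bayarea__{band}.json"

def band_for_grade(grade: str) -> str:
    """Map a grade label ('PK', 'K', '1'..'12') to a grade-band key."""
    g = grade.strip().upper()
    if g in {"PK", "K"}:
        return "pk_k"
    n = int(g)  # raises ValueError for unrecognized labels
    if 1 <= n <= 5:
        return "elementary"
    if 6 <= n <= 8:
        return "middle"
    if 9 <= n <= 12:
        return "high"
    raise ValueError(f"Unsupported grade: {grade}")

def artifact_for_grade(grade: str, artifacts_dir: Path) -> Path:
    """Resolve the per-band artifact path for a child's grade."""
    return artifacts_dir / BAND_FILE_PATTERN.format(band=band_for_grade(grade))

print(artifact_for_grade("7", Path("artifacts")).name)
```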


In [291]:
# ============================================
# 06.4b Export Bay Area grade-band artifacts 
# ============================================

import json

def build_bayarea_payload_for_band(grade_band: str) -> dict:
    payload = {
        "version": "v1",
        "generated_at_utc": export_payload["generated_at_utc"],
        "topk": int(TOPK_EXPORT),
        "region": "bay_area",
        "grade_band": grade_band,  # key for UI
        "segments": {},
        "blends": {}
    }

    # segments
    for seg in export_spec["segments"]:
        why = build_short_explanation(seg)
        scores = baseline[seg]["scores"]
        order = baseline[seg]["order"]

        payload["segments"][seg] = {
            "label": seg,
            "why": why,
            "items": rows_to_export_payload(
                order, scores, why, TOPK_EXPORT,
                bay_area_only=True,
                grade_band=grade_band
            ),
        }

    # blends
    for b in export_spec["blends"]:
        name = b["name"]
        seg_a = b["seg_a"]
        seg_b = b["seg_b"]
        payload["blends"][name] = {"seg_a": seg_a, "seg_b": seg_b, "alphas": {}}

        for a in b["alphas"]:
            scores, order = score_and_rank_blend(seg_a, seg_b, a)
            why = f"Blend of {seg_a} and {seg_b} (alpha={a:.2f})."

            payload["blends"][name]["alphas"][f"{a:.2f}"] = {
                "alpha": float(a),
                "label": f"{name}__a{a:.2f}",
                "why": why,
                "items": rows_to_export_payload(
                    order, scores, why, TOPK_EXPORT,
                    bay_area_only=True,
                    grade_band=grade_band
                ),
            }

    return payload

# write one file per grade band
band_files = {}
for band in GRADE_BANDS.keys():
    out_path = ARTIFACTS_DIR / f"schools_top100_v1_bayarea__{band}.json"
    payload = build_bayarea_payload_for_band(band)

    with open(out_path, "w") as f:
        json.dump(payload, f, indent=2)

    band_files[band] = str(out_path)
    print("Saved:", out_path, "| first seg items:", len(payload["segments"][export_spec["segments"][0]]["items"]))

# add the per-band artifact paths to the manifest outputs
manifest_path = ARTIFACTS_DIR / f"run_manifest_{SYSTEM_VERSION}.json"
m = load_json(manifest_path)
m.setdefault("outputs", {})
for band, p in band_files.items():
    m["outputs"][f"artifacts.section06.schools_top100_v1_bayarea__{band}"] = p

with open(manifest_path, "w") as f:
    json.dump(m, f, indent=2)
print("Updated manifest:", manifest_path)


Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/schools_top100_v1_bayarea__pk_k.json | first seg items: 100
Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/schools_top100_v1_bayarea__elementary.json | first seg items: 100
Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/schools_top100_v1_bayarea__middle.json | first seg items: 100
Saved: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/schools_top100_v1_bayarea__high.json | first seg items: 100
Updated manifest: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/artifacts/notebook08/run_manifest_v1.json


## 07. Summary & Forward Roadmap 

Notebook 08 completed the transition from a validated deterministic system
(Notebook 07) to a **launch-ready** deterministic system with safety checks,
diagnostic learning, blending, and production exports.

---

### What We Added in Notebook 08

**1) Calibration diagnostics (without ground truth)**
- Measured tie density, rank volatility, and segment-specific fragility
- Verified tier behavior using known baselines (IB, CAIS, Montessori, Waldorf)
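
For concreteness, here is one way these two diagnostics could be defined. Both definitions are assumptions made for illustration, not necessarily the ones used in Section 01: tie density is the fraction of items whose score lies within `eps` of a neighbor in sorted order, and volatility is read off as the share of the baseline Top-k that survives a weight change.

```python
import numpy as np

def tie_density(scores: np.ndarray, eps: float = 1e-9) -> float:
    """Fraction of items whose score is within eps of an adjacent score in sorted order."""
    s = np.sort(scores)
    close = np.abs(np.diff(s)) <= eps   # close[i] is True when s[i] and s[i+1] tie
    tied = np.zeros(len(s), dtype=bool)
    tied[:-1] |= close                  # left member of each tied pair
    tied[1:] |= close                   # right member of each tied pair
    return float(tied.mean())

def topk_overlap(X: np.ndarray, w: np.ndarray, w_perturbed: np.ndarray, k: int) -> float:
    """Share of the baseline Top-k that is still in the Top-k after the weight change."""
    base = np.argsort(-(X @ w), kind="stable")[:k]
    pert = np.argsort(-(X @ w_perturbed), kind="stable")[:k]
    return len(set(base.tolist()) & set(pert.tolist())) / k

scores = np.array([3.0, 3.0, 2.0, 1.0, 1.0, 1.0])
density = tie_density(scores)

X = np.array([[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.5, 0.5]])
small_change = topk_overlap(X, np.array([1.0, 0.1]), np.array([1.0, 0.2]), k=2)
large_change = topk_overlap(X, np.array([1.0, 0.1]), np.array([0.1, 1.0]), k=2)
print(density, small_change, large_change)
```

A segment would count as fragile under this reading when a small weight change already produces a low overlap.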

**2) Deterministic fixes (no ML authority)**
- Added a stable tie-break policy to produce deterministic ordering
- Implemented lexicographic tie-breaking where secondary dense features exist
- Preserved Top-K membership guardrails to avoid silent behavior changes
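
A minimal sketch of lexicographic tie-breaking using `np.lexsort`. It is not the notebook's `tie_policy` implementation: the epsilon bucketing, the choice of secondary key, and the function name `deterministic_order` are all assumptions.

```python
import numpy as np

def deterministic_order(scores: np.ndarray, secondary: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """
    Rank best-first by score. Scores that round to the same eps-sized bucket are
    ordered by `secondary` (higher first), then by row index (lower first).
    """
    # Quantize so near-equal scores compare as exactly equal (assumed epsilon policy).
    bucket = np.round(scores / eps).astype(np.int64)
    rows = np.arange(len(scores))
    # np.lexsort treats the LAST key as primary: -bucket, then -secondary, then rows.
    return np.lexsort((rows, -secondary, -bucket))

scores = np.array([2.0, 3.0, 3.0, 2.0])
secondary = np.array([0.5, 0.1, 0.9, 0.5])
order = deterministic_order(scores, secondary)
print(order)
```

Rows 1 and 2 tie on score and are separated by the secondary feature. Rows 0 and 3 tie on both and fall back to row index, so the same inputs always give the same order. One caveat of rounding-based buckets is that two scores closer than `eps` can still land in adjacent buckets when they straddle a boundary.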

**3) Capstone science (learning as a consultant)**
- Correlation analysis to detect redundancy (diagnostic only)
- PCA to understand variance contribution and feature structure (diagnostic only)
- Produced recommendations for simplification without changing ranking authority
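
As an illustration of what such diagnostics can look like, the sketch below flags highly correlated feature pairs and computes PCA variance shares with plain NumPy. The threshold of 0.9, the synthetic data, and the helper names are assumptions, and constant columns (which make correlations undefined) are not handled.

```python
import numpy as np

def redundant_pairs(X: np.ndarray, names: list, threshold: float = 0.9) -> list:
    """Feature pairs whose absolute Pearson correlation exceeds the threshold."""
    corr = np.corrcoef(X, rowvar=False)
    out = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                out.append((names[i], names[j], float(corr[i, j])))
    return out

def explained_variance_ratio(X: np.ndarray) -> np.ndarray:
    """PCA variance shares from the singular values of the mean-centered matrix."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var / var.sum()

# Synthetic features (assumed): the second column is nearly a rescaled copy of the first.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, 2.0 * a + 0.01 * rng.normal(size=200), b])

pairs = redundant_pairs(X, ["feat_a", "feat_a_scaled", "feat_b"])
ratios = explained_variance_ratio(X)
print(pairs, ratios)
```

The output is advisory only: a flagged pair is a suggestion to review the feature config, and nothing here feeds back into the scoring weights.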

**4) Segment blending (startup “killer feature”)**
- Enabled continuous personalization via linear blending:
  \[
  \vec{W}_{blend} = \alpha \vec{W}_A + (1-\alpha)\vec{W}_B
  \]
- Used snap-to-safe blend points for launch stability
- Validated blended ranking behavior and surfaced high-drift cases for UX care
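
The blending formula and the snap-point idea can be sketched as follows. The snap points `(0.25, 0.50, 0.75)` and the helper names are hypothetical. The notebook's own `BLEND_POINTS` and `score_and_rank_blend` may differ in detail.

```python
import numpy as np

SNAP_POINTS = (0.25, 0.50, 0.75)  # hypothetical values, not the notebook's BLEND_POINTS

def snap_alpha(alpha: float, points=SNAP_POINTS) -> float:
    """Return the allowed snap point closest to the requested alpha."""
    return min(points, key=lambda p: abs(p - alpha))

def blend_weights(w_a: np.ndarray, w_b: np.ndarray, alpha: float) -> np.ndarray:
    """W_blend = alpha * W_A + (1 - alpha) * W_B."""
    return alpha * w_a + (1.0 - alpha) * w_b

def rank_blend(X: np.ndarray, w_a: np.ndarray, w_b: np.ndarray, alpha: float):
    """Score every row with the snapped, blended weights and rank best-first."""
    scores = X @ blend_weights(w_a, w_b, snap_alpha(alpha))
    return scores, np.argsort(-scores, kind="stable")

X = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]])
w_a = np.array([1.0, 0.0])
w_b = np.array([0.0, 1.0])
scores, order = rank_blend(X, w_a, w_b, alpha=0.7)
print(snap_alpha(0.7), scores, order)
```

Because scoring is linear in the weights, blending the weight vectors and blending the two segments' score vectors give the same result. Restricting alpha to a few snap points keeps the set of possible rankings small enough to precompute and test before launch.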

**5) Safety regression suite (launch gate)**
- Tier endpoint regression: PASS
- Grade-span hard guardrails: PASS (review-only drift surfaced)
- Wide Top-N envelope for candidate pool: PASS

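**6) Production export artifacts**
- Generated precomputed Top-100 lists for:
  - all segments
  - blend snap points
- Exported versioned JSON with hashes + metadata for reproducibility:
  - `schools_top100_v1.json`
  - `schools_top100_v1_meta.json`
  - `schools_top100_v1_bayarea.json`
  - `schools_top100_v1_bayarea_meta.json`
- Exported one Bay Area file per grade band (`pk_k`, `elementary`, `middle`, `high`):
  - `schools_top100_v1_bayarea__{band}.json`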

---

### What Remains Deterministic by Design

- The scoring function is deterministic and explicit
- Tier logic is not overridden
- Learning is used only for diagnostics, simplification suggestions, and audits
- No outcome prediction, no click-based learning, no opaque ranking model

---

### Post-Launch Roadmap (Safe Evolution)

**Phase 1: Collect ground-truth signals**
- user saves / hides / “this fits” feedback
- refine segment weights via explicit human-in-the-loop calibration

**Phase 2: Improve explainability**
- map feature names to user-friendly labels
- add per-school “top contributing signals” explanations

**Phase 3: Expand feature space**
- add richer continuous signals (academics, logistics, programs)
- tighten grade-span enforcement using requested grade targets

**Phase 4: Learning (carefully bounded)**
- use weak supervision to propose weight adjustments
- always keep deterministic policy as the final authority

---

> Determinism builds trust.  
> Learning improves structure.  
> Control preserves safety.


### 07.1 Run Summary & Reproducibility Check  

This final section prints a compact, human-readable summary of the current run.
It serves three purposes:

1. **Reproducibility** — confirms inputs, outputs, and timestamps
2. **Auditability** — makes it easy to review what artifacts were generated
3. **Capstone hygiene** — demonstrates disciplined experiment tracking

This section does not perform any computation or ranking.
It is a read-only summary of the run manifest.


In [146]:
# ============================================
# 07.1 Run Summary & Reproducibility Check
# ============================================

print("=== Notebook 08 — Run Summary ===")

# Use SYSTEM_VERSION so this reads the same manifest the export cells update
manifest_path = ARTIFACTS_DIR / f"run_manifest_{SYSTEM_VERSION}.json"
assert manifest_path.exists(), "Run manifest not found."

with open(manifest_path, "r") as f:
    manifest = json.load(f)

# Core metadata
print("\n--- Run Metadata ---")
print(f"run_ts_utc : {manifest.get('run_ts_utc')}")
print(f"root_dir  : {manifest.get('root_dir')}")
print(f"version   : {manifest.get('version', 'v1')}")

# Inputs
print("\n--- Inputs ---")
for k, v in manifest.get("inputs", {}).items():
    print(f"- {k}: {v}")

# Outputs
print("\n--- Outputs ---")
for k, v in manifest.get("outputs", {}).items():
    print(f"- {k}: {v}")

# Artifacts
print("\n--- Artifacts ---")
for a in manifest.get("artifacts", []):
    print(f"- {a}")

# Guardrails
print("\n--- Safety & Guardrails ---")
for g in manifest.get("checks", []):
    status = g.get("status", "UNKNOWN")
    print(f"- {g.get('name')}: {status}")

print("\n=== End of Notebook 08 ===")


=== Notebook 08 — Run Summary ===

--- Run Metadata ---
run_ts_utc : 2025-12-27T18:51:15Z
root_dir  : None
version   : v1

--- Inputs ---
- inputs.section01.school_matrix_v2: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/data/processed/school_matrix_v2.npy
- inputs.section01.school_index_v2: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/data/processed/school_index_v2.csv
- inputs.section01.feature_config_master_v2: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/data/processed/feature_config_master_v2.json
- inputs.section01.preference_segments_v0: /Users/jennifer-david/Documents/work/SpringBoard/projects/Capstone Projects/smart-school/config/preference_segments_v0.json
- hash.school_matrix_v2: 9e868835457d58c14c149ad06298f8a50952b6c5a4f6483bd6d6029473c83203
- hash.feature_config_master_v2: 541f56708eb9c2f57de733fc44cd747f5a26a1490e12b709eb45f3a186a993a8
- hash