### 02b_matrix_coverage_fast_ops — end-to-end, sparse-matrix engine 

**Notebook purpose (plain language)**  
This notebook swaps the *engine* that finds minimum times in `02a_coverage`.  
Instead of pandas group-bys on the long travel table, we build two *sparse* matrices and use fast, vectorised reductions. All inputs, thresholds, blue-light factors, KPIs, and maps stay the same—only the internals get faster and more scalable, enabling instant “what-if” scenarios.

---

#### What this notebook does

- **Builds matrices (reshape only, no routing):**  
  - **R** (response): rows = demand LSOAs, cols = station LSOAs, values = minutes station→LSOA.  
  - **C** (conveyance): rows = demand LSOAs, cols = acute LSOAs, values = minutes LSOA→acute.
- **Computes nearest times (vectorised, no Python loops):**  
  - `t_resp = rowmin_csr(R[:, active_stations])`  
  - `t_conv = rowmin_csr(C[:, active_acutes])`  
  - `rowmin_csr` reduces over *stored* entries only, so a missing pair counts as ∞, not 0.
- **Applies business rules:** blue-light factors applied *after* minima (per-leg), thresholds as in 02a.
- **Outputs unchanged:** coverage KPIs (% population within 7/15 and 18/40 min for response, and 30/45/60 min for conveyance), binary coverage columns, maps.
- **Adds optional diagnostic:** `t_total = t_resp + on_scene_buffer + t_conv` (three-leg view).
- **Enables scenarios:** select different station sets by column subset—no re-grouping or re-reading.
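
The matrix idea can be seen on a toy example. This is a minimal sketch with made-up codes and minutes (not project data). `to_csr` and `rowmin_stored` are hypothetical helpers that loosely mirror Step 2's `build_csr` and Step 3's row-wise minimum. The sketch pivots a long table into CSR, takes the nearest time over a column subset (a "scenario"), and shows why scipy's built-in `min(axis=1)` is avoided: it counts a missing pair as 0 minutes.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Toy long table: demand LSOA (row) x site LSOA (col) -> minutes. Codes are made up.
long = pd.DataFrame({
    "demand":  ["A", "A", "B", "B", "C"],
    "site":    ["S1", "S2", "S1", "S2", "S1"],
    "minutes": [12.0, 5.0, 8.0, 20.0, 30.0],
})
rows_idx = pd.Index(["A", "B", "C"])
cols_idx = pd.Index(["S1", "S2"])

def to_csr(df: pd.DataFrame) -> sparse.csr_matrix:
    """Reshape long -> CSR (rows = demand, cols = sites); missing pairs stay unstored."""
    r = rows_idx.get_indexer(df["demand"])
    c = cols_idx.get_indexer(df["site"])
    return sparse.csr_matrix((df["minutes"].to_numpy(np.float32), (r, c)),
                             shape=(len(rows_idx), len(cols_idx)))

def rowmin_stored(mat: sparse.csr_matrix) -> np.ndarray:
    """Minimum over stored entries only; rows with nothing stored -> +inf."""
    out = np.full(mat.shape[0], np.inf, dtype=np.float32)
    nonempty = np.diff(mat.indptr) > 0
    if nonempty.any():
        out[nonempty] = np.minimum.reduceat(mat.data[: mat.indptr[-1]], mat.indptr[:-1][nonempty])
    return out

M = to_csr(long)
all_sites = rowmin_stored(M)                   # nearest over S1 and S2
only_s1 = rowmin_stored(M[:, np.array([0])])   # scenario: S1 only (column subset)
naive = M.min(axis=1).toarray().ravel()        # scipy treats the missing C->S2 pair as 0
print(all_sites, only_s1, naive)
```

Row C has only one stored pair, so the naive reduction reports 0 minutes for it, which would wrongly mark it as covered at every threshold.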

---

#### Inputs (same as 02a)

- LSOA universe (`lsoa_index`), centroids, populations (+ optional IMD / rural-urban).  
- Station & acute site files resolved to LSOA codes.  
- Long-form travel-time table already in the repo (response & conveyance legs).  
- Thresholds & blue-light factors (ARP, handover, conveyance) defined up front.

---

#### Outputs

- KPIs for response & conveyance at configured thresholds (overall and, optionally, by IMD/rural-urban).  
- Binary coverage columns per threshold for mapping.  
- (Optional) End-to-end time columns for transparency in pathway discussions.

---

#### Performance & storage

- **Sparse CSR** matrices for R and C; optional “has-edge” masks to distinguish true zeros from missing pairs.  
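- **Radius thinning** (e.g., drop times > 60 min) to shrink matrices. Coverage flags at thresholds at or below the radius are unchanged, but an LSOA whose nearest site lies beyond the radius becomes **∞**, which blanks it from the continuous maps and `t_total`. On the Cornwall slice the conveyance p90 is ≈ 74.5 min, so a 60-min radius would do this to more than 10% of LSOAs.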
- **Caching:** save matrices + ordered labels to `data/.../matrices/*.npz` for instant reloads.
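
Radius thinning can be checked on a toy matrix. This is a minimal sketch, assuming a hypothetical 60-minute radius and made-up times. Dropping stored pairs above the radius leaves every flag at a threshold ≤ 60 unchanged, because a pair over 60 min can never satisfy `t <= thr` when `thr <= 60`. However, a row whose *nearest* site lies beyond the radius turns into ∞.

```python
import numpy as np
from scipy import sparse

RADIUS = 60.0  # hypothetical radius in minutes
dense = np.array([[12.0, 75.0],
                  [65.0, 90.0],   # nearest site is beyond the radius
                  [40.0, 55.0]], dtype=np.float32)
full = sparse.csr_matrix(dense)

# Thin: keep only stored pairs within the radius
coo = full.tocoo()
keep = coo.data <= RADIUS
thin = sparse.csr_matrix((coo.data[keep], (coo.row[keep], coo.col[keep])), shape=full.shape)

def rowmin_stored(mat):
    """Minimum over stored entries only; rows with nothing stored -> +inf."""
    out = np.full(mat.shape[0], np.inf, dtype=np.float32)
    nonempty = np.diff(mat.indptr) > 0
    if nonempty.any():
        out[nonempty] = np.minimum.reduceat(mat.data[: mat.indptr[-1]], mat.indptr[:-1][nonempty])
    return out

t_full, t_thin = rowmin_stored(full), rowmin_stored(thin)
for thr in (15, 30, 45, 60):          # thresholds <= radius: flags identical
    assert np.array_equal(t_full <= thr, t_thin <= thr)
print(full.nnz, thin.nnz, t_full, t_thin)   # row 1 becomes inf after thinning
```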

---

#### Scenario selector

Define scenarios as sets of active station/acute columns (e.g., `baseline`, `baseline + Site X`).  
Switching scenario = re-taking per-row minima → near-instant “what-if” diffs and coverage deltas.
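
One way to express "baseline + Site X" as a column subset is to include candidate sites as extra columns when R is built. This sketch assumes that approach, and the codes and times are made up. The baseline selects only the current stations, and the scenario adds the candidate's column. Because the minimum is then taken over a superset of columns, no row can get slower.

```python
import numpy as np
from scipy import sparse

current = ["S1", "S2"]
candidates = ["X"]                        # hypothetical new site
cols = current + candidates               # R's columns = current + candidates
R_toy = sparse.csr_matrix(np.array([
    [10.0, 25.0,  4.0],
    [30.0,  8.0, 12.0],
    [22.0, 35.0,  9.0],
], dtype=np.float32))
col_of = {code: j for j, code in enumerate(cols)}

def nearest(mat, col_idx):
    """Row-wise minimum over the selected columns."""
    sub = mat[:, np.asarray(col_idx, dtype=np.int32)]
    # every pair is stored in this toy, so a dense row-wise min is safe here
    return sub.toarray().min(axis=1)

baseline = [col_of[c] for c in current]
with_x = sorted(baseline + [col_of["X"]])
t_base, t_x = nearest(R_toy, baseline), nearest(R_toy, with_x)
saved = t_base - t_x                       # minutes saved per demand LSOA
print(t_base, t_x, saved)
```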

---

#### Validation (first run)

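- **Parity check vs 02a:** with `MAX_RADIUS_MIN = None`, per-LSOA times should match to floating-point tolerance, with ∞ in exactly the same LSOAs. KPIs should then agree to the reported 2 decimal places on the Cornwall slice.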
- After validation, retire the legacy `min_time_from_any_origin` calls in this notebook.
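
A parity check can be sketched by computing the same minima both ways on one long table. This sketch assumes the 02a engine is equivalent to a pandas group-by minimum per demand LSOA. `groupby_nearest` below is a hypothetical stand-in, not the actual 02a function, and the data are made up. The key points are to align both results to the same LSOA order and to require ∞ in exactly the same places.

```python
import numpy as np
import pandas as pd
from scipy import sparse

long = pd.DataFrame({
    "origin_lsoa":  ["S1", "S2", "S1", "S2", "S1"],   # stations
    "dest_lsoa":    ["A",  "A",  "B",  "B",  "C"],    # demand LSOAs
    "time_car_min": [9.0, 6.5, 14.0, 11.0, 31.0],
})
lsoas = pd.Index(["A", "B", "C", "D"])   # D has no travel rows -> unreachable
stations = pd.Index(["S1", "S2"])

def groupby_nearest(df):
    """Hypothetical 02a-style engine: min time per destination, aligned to lsoas."""
    s = df[df["origin_lsoa"].isin(stations)].groupby("dest_lsoa")["time_car_min"].min()
    return s.reindex(lsoas).fillna(np.inf).to_numpy()

def matrix_nearest(df):
    """02b-style engine: CSR (rows = demand, cols = stations), min over stored entries."""
    r = lsoas.get_indexer(df["dest_lsoa"])
    c = stations.get_indexer(df["origin_lsoa"])
    ok = (r >= 0) & (c >= 0)
    # note: this constructor sums duplicate pairs, so real data is de-duplicated with a min first
    mat = sparse.csr_matrix((df["time_car_min"].to_numpy()[ok], (r[ok], c[ok])),
                            shape=(len(lsoas), len(stations)))
    out = np.full(len(lsoas), np.inf)
    nonempty = np.diff(mat.indptr) > 0
    if nonempty.any():
        out[nonempty] = np.minimum.reduceat(mat.data[: mat.indptr[-1]], mat.indptr[:-1][nonempty])
    return out

a, b = groupby_nearest(long), matrix_nearest(long)
assert np.array_equal(np.isinf(a), np.isinf(b)), "unreachable LSOAs differ"
fin = np.isfinite(a)
np.testing.assert_allclose(a[fin], b[fin], atol=1e-4)
print(a)
```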

---

#### Notes & cautions

- Any change to travel-time inputs **invalidates caches** → rebuild matrices.  
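- LSOAs with no reachable station/acute remain at **∞**. They count as not covered in the KPIs and are excluded from the percentile summaries, so report them explicitly.  
- Keep LSOA codes in one dtype (`string` throughout this notebook) and the order of `lsoa_index` stable. The matrices store row/column positions, not codes, so any reordering silently misaligns results.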
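
A cheap guard against misalignment is to map codes to positions with `pandas.Index.get_indexer` and fail loudly on any `-1` (a code missing from the universe), rather than trusting row order. This is a minimal sketch with made-up codes, and `positions_or_fail` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

universe = pd.Index(["E01A", "E01B", "E01C"], dtype="string", name="lsoa_code")

def positions_or_fail(codes, universe_index: pd.Index) -> np.ndarray:
    """Integer positions of `codes` in `universe_index`; raise on any unknown code."""
    target = pd.Index(list(codes), dtype="string")
    pos = universe_index.get_indexer(target)      # -1 marks a code not in the universe
    if (pos < 0).any():
        raise KeyError(f"codes not in universe: {list(target[pos < 0])}")
    return pos.astype(np.int32)

site_pos = positions_or_fail(["E01C", "E01A"], universe)
try:
    positions_or_fail(["E01Z"], universe)
    unknown_caught = False
except KeyError:
    unknown_caught = True
print(site_pos, unknown_caught)
```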

---

#### Quick explainer (02a → 02b)

| Area / Step | 02a does now | 02b change | Benefit |
|---|---|---|---|
| Min times | pandas group-by on long table | rowwise min on sparse matrices | Much faster; scalable; no loops |
| Scenarios | re-filter + re-group | column subset on R/C | Instant “what-if” |
| Blue-light | applied in KPIs | apply once per leg, after minima | Same nearest site for a single positive factor; one multiply per LSOA |
| Outputs | KPIs, maps | same | No UX change |
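
Applying a blue-light factor after the minimum is safe because a single factor $f > 0$ applied to every pair on a leg preserves the ordering of times, so the nearest site does not change. With $x_{ij}$ the minutes between demand LSOA $i$ and site $j$, and $S$ the set of active sites:

```latex
\min_{j \in S} \left( f \, x_{ij} \right) \;=\; f \, \min_{j \in S} x_{ij}, \qquad f > 0 .
```

If factors ever differed by site ($f_j$), this identity would no longer hold, and the factors would have to be applied to the matrix before the row-wise minimum.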



In [1]:
# Setup: imports & file paths

from pathlib import Path
import platform
from datetime import datetime

import pandas as pd
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt

# Project root for this ICB slice
DATA_ROOT = Path(
    "/Users/rosstaylor/Downloads/Code Repositories/REACH Map (NHS SW)/"
    "GitHub Repo/REACH-Map-NHS-SW/data/raw/test_data_ICB_level"
)

# Inputs
LOOKUP_CSV   = DATA_ROOT / "cornwall_icb_lsoa_lookup.csv"
AGE_GPKG     = DATA_ROOT / "demographics_age_continuous_icb.gpkg"
AGE_LAYER    = "LSOA_continuous_age_icb"
TRAVEL_CSV   = DATA_ROOT / "travel_matrix_lsoa_icb.csv"
STATIONS_CSV = DATA_ROOT / "ambulance_stations_icb.csv"
ACUTE_CSV    = DATA_ROOT / "acute_hospitals_icb.csv"   # optional: conveyance leg + map overlay

# Outputs
MATRICES_DIR = DATA_ROOT / "matrices"
MAPS_DIR     = DATA_ROOT / "maps"
TABLES_DIR   = DATA_ROOT / "tables"
for d in (MATRICES_DIR, MAPS_DIR, TABLES_DIR):
    d.mkdir(parents=True, exist_ok=True)

# Display prefs
pd.set_option("display.width", 120)
pd.set_option("display.max_columns", 120)


In [2]:
# Parameters: ARP thresholds, handover bands, blue-light factors

# ARP response standards (minutes)
RESP = {
    "cat1": {"mean": 7,  "p90": 15},
    "cat2": {"mean": 18, "p90": 40},
    # optional completeness
    "cat3": {"p90": 120},
    "cat4": {"p90": 180},
}

# Convenience tuple for KPI lookups in this notebook
RESPONSE_THRESHOLDS = (
    RESP["cat1"]["mean"], RESP["cat1"]["p90"],
    RESP["cat2"]["mean"], RESP["cat2"]["p90"]
)  # -> (7, 15, 18, 40)

# Hospital handover/turnaround (NOT scene→A&E drive time)
HANDOVER = {"target": 15, "breach": 30, "severe": 60}
HANDOVER_THRESHOLDS = (HANDOVER["target"], HANDOVER["breach"], HANDOVER["severe"])

# Scene→A&E conveyance bands (geographic potential; no national target)
SCENE_TO_AE_THRESHOLDS = (30, 45, 60)

# Optional blue-light multipliers (applied AFTER minima per leg)
BLUE_LIGHT_FACTOR_RESPONSE = 1.0
BLUE_LIGHT_FACTOR_CONVEY   = 1.0

# End-to-end diagnostic buffer (minutes) for t_total (set >0 if you want to show it)
ON_SCENE_BUFFER_MIN = 0.0

# CRS for mapping
TARGET_CRS = "EPSG:27700"


In [4]:
# Step 1 — Load & align core data
# What this does:
# - Loads the LSOA universe, population & geometry, long travel table, and site lists.
# - Cleans obvious time issues (negative, off-diagonal zeros).
# - Produces aligned indices, population vector, and site code arrays for later steps.

from __future__ import annotations
from typing import Iterable, Dict
import re

# ---- small helpers ----
def _ok(msg: str) -> None:   print(f"[OK] {msg}")
def _warn(msg: str) -> None: print(f"[WARN] {msg}")
def _fail(msg: str) -> None: raise AssertionError(msg)
def _expect_columns(df: pd.DataFrame, cols: Iterable[str], label: str) -> None:
    missing = [c for c in cols if c not in df.columns]
    if missing: _fail(f"{label}: missing columns {missing}")

# ---- 1.1 LSOA universe ----
lookup = pd.read_csv(LOOKUP_CSV, dtype={"lsoa_code": "string"})
_expect_columns(lookup, ["lsoa_code"], "LSOA lookup")
lookup = lookup.drop_duplicates(subset=["lsoa_code"]).copy()
lsoa_index = pd.Index(lookup["lsoa_code"].astype("string"), name="lsoa_code")
if lsoa_index.empty or not lsoa_index.is_unique:
    _fail("LSOA lookup must provide a non-empty, unique list of LSOA codes.")
_ok(f"Universe: {len(lsoa_index):,} LSOAs")

# ---- 1.2 Population & geometry ----
lsoa_g = gpd.read_file(AGE_GPKG, layer=AGE_LAYER)
_expect_columns(lsoa_g, ["lsoa_code", "geometry"], "Age GPKG")
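lsoa_g["lsoa_code"] = lsoa_g["lsoa_code"].astype("string")
lsoa_g = lsoa_g.drop_duplicates(subset="lsoa_code")  # duplicate codes would make population.reindex() below fail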

# Prefer 'population_total' if present, else sum continuous age columns
pop_col = next((c for c in ("population_total", "population") if c in lsoa_g.columns), None)
if pop_col:
    population = lsoa_g.set_index("lsoa_code")[pop_col].astype("float64")
else:
    age_cols = [
        c for c in lsoa_g.columns
        if c not in ("lsoa_code", "geometry")
        and (re.fullmatch(r"\d{1,3}\+?", str(c)) or str(c).startswith("age_"))
        # is_numeric_dtype also handles pandas extension dtypes (e.g. Int64), unlike np.issubdtype
        and pd.api.types.is_numeric_dtype(lsoa_g[c])
    ]
    if not age_cols:
        _fail("No 'population_total' nor numeric age columns found.")
    population = lsoa_g.set_index("lsoa_code")[age_cols].sum(axis=1).astype("float64")

# Align population & geometry to universe
population = population.reindex(lsoa_index).fillna(0.0)
lsoa_g = (
    lsoa_g[["lsoa_code", "geometry"]]
    .drop_duplicates("lsoa_code")
    .set_index("lsoa_code")
    .reindex(lsoa_index)
)
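# Project to the mapping CRS (metres) so Step 6 centroids are meaningful; raises if the source CRS is unset
lsoa_g = gpd.GeoDataFrame(lsoa_g, geometry="geometry", crs=lsoa_g.crs).to_crs(TARGET_CRS)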
_ok(f"Population sum={int(population.sum()):,}; CRS={lsoa_g.crs}")

# ---- 1.3 Travel table (load → normalise → clean) ----
travel = pd.read_csv(
    TRAVEL_CSV,
    dtype={"origin_lsoa": "string", "dest_lsoa": "string"},
)
# Normalise minutes column to 'time_car_min'
time_col = next((c for c in ("time_car_min", "time_min", "minutes", "drive_min", "t_min") if c in travel.columns), None)
if time_col is None:
    _fail("Travel CSV must include a minutes column (e.g., 'time_car_min').")
travel = travel.rename(columns={time_col: "time_car_min"})
_expect_columns(travel, ["origin_lsoa", "dest_lsoa", "time_car_min"], "Travel CSV")

# Types & NA
travel["origin_lsoa"] = travel["origin_lsoa"].astype("string")
travel["dest_lsoa"]   = travel["dest_lsoa"].astype("string")
travel["time_car_min"] = pd.to_numeric(travel["time_car_min"], errors="coerce").astype("float32")
travel = travel.dropna(subset=["origin_lsoa", "dest_lsoa", "time_car_min"]).copy()

# Keep only rows inside universe
in_uni = travel["origin_lsoa"].isin(lsoa_index) & travel["dest_lsoa"].isin(lsoa_index)
dropped = int((~in_uni).sum())
if dropped: _warn(f"Dropping {dropped:,} travel rows outside universe.")
travel = travel.loc[in_uni].copy()

# Clean times (policy: drop negative minutes; floor off-diagonal zeros to 0.5 min)
neg = travel["time_car_min"] < 0
neg_rows = int(neg.sum())
if neg_rows:
    travel = travel.loc[~neg].copy()
    _warn(f"Dropped {neg_rows:,} negative-minute rows.")
# Build the zero mask after dropping negatives, so the count and .loc labels refer to surviving rows only
offdiag_zero = (travel["time_car_min"] == 0) & (travel["origin_lsoa"] != travel["dest_lsoa"])
floored = int(offdiag_zero.sum())
if floored:
    travel.loc[offdiag_zero, "time_car_min"] = np.float32(0.5)
    _warn(f"Floored {floored:,} off-diagonal zero-minute rows to 0.5 min.")
_ok(f"Travel rows={len(travel):,}; origins={travel['origin_lsoa'].nunique():,}; dests={travel['dest_lsoa'].nunique():,}")

# ---- 1.4 Sites (stations & acutes → LSOA codes) ----
def _load_site_lsoas(csv_path: Path, label: str) -> pd.Index:
    if not csv_path.exists():
        if label.lower().startswith("acute"):
            _warn("Acute CSV not found; conveyance leg optional.")
            return pd.Index([], dtype="string", name="lsoa_code")
        _fail(f"{label} CSV not found: {csv_path}")
    df = pd.read_csv(csv_path)
    df.columns = [c.strip().lower() for c in df.columns]
    code_col = next((c for c in ("lsoa_code", "lsoa21cd") if c in df.columns), None)
    if code_col is None: _fail(f"{label}: expected 'lsoa_code' or 'lsoa21cd'.")
    codes = pd.Index(df[code_col].astype("string"), name="lsoa_code").dropna().drop_duplicates()
    codes = codes[codes.isin(lsoa_index)]
    if codes.empty: _warn(f"{label}: no valid LSOAs after filtering to universe.")
    return codes

station_lsoas = _load_site_lsoas(STATIONS_CSV, "Ambulance stations")
acute_lsoas   = _load_site_lsoas(ACUTE_CSV,    "Acute hospitals")
_ok(f"Stations={len(station_lsoas)} LSOAs; Acutes={len(acute_lsoas)} LSOAs")

# ---- 1.5 Integer maps for later matrix ops ----
lsoa_to_idx: Dict[str, int] = {code: i for i, code in enumerate(lsoa_index)}
idx_to_lsoa = lsoa_index.to_numpy()
station_idx = np.array([lsoa_to_idx[c] for c in station_lsoas], dtype=np.int32) if len(station_lsoas) else np.array([], dtype=np.int32)
acute_idx   = np.array([lsoa_to_idx[c] for c in acute_lsoas],   dtype=np.int32) if len(acute_lsoas)   else np.array([], dtype=np.int32)

# ---- 1.6 Quick readout ----
print("\n== STEP 1 SUMMARY ==")
print(pd.Series({
    "n_lsoas": len(lsoa_index),
    "population_sum": int(population.sum()),
    "travel_rows": len(travel),
    "unique_origins": travel["origin_lsoa"].nunique(),
    "unique_dests": travel["dest_lsoa"].nunique(),
    "n_station_lsoas": len(station_lsoas),
    "n_acute_lsoas": len(acute_lsoas),
}).to_string())
_ok("Step 1 complete — data aligned and cleaned.")


[OK] Universe: 336 LSOAs
[OK] Population sum=575,628; CRS=EPSG:27700
[WARN] Floored 41 off-diagonal zero-minute rows to 0.5 min.
[OK] Travel rows=112,560; origins=336; dests=336
[OK] Stations=14 LSOAs; Acutes=3 LSOAs

== STEP 1 SUMMARY ==
n_lsoas               336
population_sum     575628
travel_rows        112560
unique_origins        336
unique_dests          336
n_station_lsoas        14
n_acute_lsoas           3
[OK] Step 1 complete — data aligned and cleaned.
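
The Step 1 readout suggests that the travel table has no self-pairs. 336 × 335 = 112,560 rows, which is every ordered pair of distinct LSOAs, and R and C each lack exactly one pair per column (4,690 = 14 × 335; 1,005 = 3 × 335). If so, an LSOA that hosts a station takes its response time from the nearest *other* station. The sketch below runs the check on toy data. The optional self-pair fill uses a hypothetical `INTRA_MIN` value, which is a policy assumption that neither notebook defines. Adopting it would also break parity with 02a unless 02a did the same.

```python
import pandas as pd

# Toy long table: every ordered pair of distinct LSOAs, no origin == destination rows
travel_toy = pd.DataFrame({
    "origin_lsoa":  ["A", "A", "B", "B", "C", "C"],
    "dest_lsoa":    ["B", "C", "A", "C", "A", "B"],
    "time_car_min": [7.0, 12.0, 7.5, 6.0, 11.0, 6.5],
})
n = 3  # LSOAs in the toy universe
n_pairs = len(travel_toy.drop_duplicates(["origin_lsoa", "dest_lsoa"]))
n_self = int((travel_toy["origin_lsoa"] == travel_toy["dest_lsoa"]).sum())
print(f"distinct pairs={n_pairs}, n*(n-1)={n * (n - 1)}, self-pairs={n_self}")

# Optional, hypothetical policy: give each site LSOA a self-pair with an assumed intra-LSOA time
INTRA_MIN = 0.5        # assumption, not a project parameter
site_codes = ["B"]     # toy "station" LSOA
self_rows = pd.DataFrame({"origin_lsoa": site_codes, "dest_lsoa": site_codes, "time_car_min": INTRA_MIN})
travel_filled = pd.concat([travel_toy, self_rows], ignore_index=True)

# Nearest time from the site(s) to each LSOA; without the self-pair, B itself would have no entry
from_sites = travel_filled[travel_filled["origin_lsoa"].isin(site_codes)]
nearest_from_sites = from_sites.groupby("dest_lsoa")["time_car_min"].min()
print(nearest_from_sites.to_dict())
```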


In [5]:
# Step 2 — Build & cache sparse matrices (R: station→LSOA, C: LSOA→acute)
# What this does:
# - Reshapes the cleaned long travel table into CSR sparse matrices.
# - Optionally thins by a max-radius (minutes).
# - Caches matrices + metadata for instant reloads next time.

from __future__ import annotations
from typing import Dict, Tuple
from scipy import sparse

# Optional: set a max radius to thin pairs (e.g., 60.0). Use None to keep all pairs.
MAX_RADIUS_MIN: float | None = None  # set to 60.0 if you want to prune long pairs

def build_csr(
    travel_df: pd.DataFrame,
    row_codes: pd.Index,       # demand LSOAs (rows)
    col_codes: pd.Index,       # site LSOAs for this leg (columns)
    row_key: str,              # column in travel_df for rows
    col_key: str,              # column in travel_df for cols
    value_key: str = "time_car_min",
    max_radius: float | None = MAX_RADIUS_MIN,
) -> sparse.csr_matrix:
    if len(col_codes) == 0:
        return sparse.csr_matrix((len(row_codes), 0), dtype=np.float32)
    # Filter to needed pairs
    m = travel_df[row_key].isin(row_codes) & travel_df[col_key].isin(col_codes)
    df = travel_df.loc[m, [row_key, col_key, value_key]].dropna(subset=[value_key]).copy()
    if max_radius is not None:
        df = df.loc[df[value_key] <= float(max_radius)].copy()
    # Group duplicates to min
    df = df.groupby([row_key, col_key], observed=True, sort=False)[value_key].min().reset_index()
    # Integer maps for this matrix
    rmap: Dict[str, int] = {c: i for i, c in enumerate(row_codes.astype("string"))}
    cmap: Dict[str, int] = {c: j for j, c in enumerate(col_codes.astype("string"))}
    rows = df[row_key].map(rmap).to_numpy(dtype=np.int32, na_value=-1)
    cols = df[col_key].map(cmap).to_numpy(dtype=np.int32, na_value=-1)
    vals = df[value_key].astype(np.float32).to_numpy()
    good = (rows >= 0) & (cols >= 0) & np.isfinite(vals)
    rows, cols, vals = rows[good], cols[good], vals[good]
    mat = sparse.coo_matrix((vals, (rows, cols)), shape=(len(row_codes), len(col_codes)), dtype=np.float32).tocsr()
    return mat

# Build matrices
R = build_csr(  # station → demand LSOA
    travel_df=travel, row_codes=lsoa_index, col_codes=station_lsoas,
    row_key="dest_lsoa", col_key="origin_lsoa", value_key="time_car_min",
)
C = build_csr(  # demand LSOA → acute
    travel_df=travel, row_codes=lsoa_index, col_codes=acute_lsoas,
    row_key="origin_lsoa", col_key="dest_lsoa", value_key="time_car_min",
)

# Report shapes and densities
def _report(name: str, mat: sparse.csr_matrix):
    m, n, nnz = mat.shape[0], mat.shape[1], mat.nnz
    dens = (nnz / (m * n)) if (m > 0 and n > 0) else 0.0
    print(f"[OK] {name}: shape={mat.shape}, nnz={nnz:,}, density={dens:.4f}")
_report("R (station→LSOA)", R)
_report("C (LSOA→acute)",  C)

# Cache matrices & metadata
sparse.save_npz(MATRICES_DIR / "R_response_csr.npz", R)
sparse.save_npz(MATRICES_DIR / "C_convey_csr.npz",   C)
np.savez(
    MATRICES_DIR / "matrix_metadata.npz",
    lsoa_index=lsoa_index.to_numpy(),
    station_lsoas=station_lsoas.to_numpy(),
    acute_lsoas=acute_lsoas.to_numpy(),
    response_thresholds=np.array(RESPONSE_THRESHOLDS, dtype=np.int32),
    convey_thresholds=np.array(SCENE_TO_AE_THRESHOLDS, dtype=np.int32),
    blue_light=np.array([BLUE_LIGHT_FACTOR_RESPONSE, BLUE_LIGHT_FACTOR_CONVEY], dtype=np.float32),
    max_radius=np.array([np.nan if MAX_RADIUS_MIN is None else float(MAX_RADIUS_MIN)], dtype=np.float32),
)
_ok(f"Cached matrices & metadata → {MATRICES_DIR}")


[OK] R (station→LSOA): shape=(336, 14), nnz=4,690, density=0.9970
[OK] C (LSOA→acute): shape=(336, 3), nnz=1,005, density=0.9970
[OK] Cached matrices & metadata → /Users/rosstaylor/Downloads/Code Repositories/REACH Map (NHS SW)/GitHub Repo/REACH-Map-NHS-SW/data/raw/test_data_ICB_level/matrices
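
The cache is only useful if it is reloaded together with its labels and checked against the current run. This is a minimal sketch, assuming the file names written in Step 2 (`R_response_csr.npz`, `C_convey_csr.npz`, `matrix_metadata.npz`), and `load_cached_matrices` is a hypothetical helper. The label arrays come from pandas `string` indexes, which `.to_numpy()` turns into object arrays. `np.savez` pickles these, so `np.load` needs `allow_pickle=True`.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
from scipy import sparse

def load_cached_matrices(matrices_dir: Path, expected_lsoas: pd.Index):
    """Reload R, C and their labels; raise if the cache no longer matches the current universe."""
    R_c = sparse.load_npz(matrices_dir / "R_response_csr.npz").tocsr()
    C_c = sparse.load_npz(matrices_dir / "C_convey_csr.npz").tocsr()
    # Object-dtype label arrays were pickled by np.savez, hence allow_pickle=True
    with np.load(matrices_dir / "matrix_metadata.npz", allow_pickle=True) as meta:
        lsoas = [str(x) for x in meta["lsoa_index"]]
        stations = [str(x) for x in meta["station_lsoas"]]
        acutes = [str(x) for x in meta["acute_lsoas"]]
    if lsoas != [str(x) for x in expected_lsoas]:
        raise ValueError("Cached LSOA order differs from the current universe; rebuild matrices.")
    if R_c.shape != (len(lsoas), len(stations)) or C_c.shape != (len(lsoas), len(acutes)):
        raise ValueError("Cached matrix shapes do not match cached labels; rebuild matrices.")
    return R_c, C_c, stations, acutes

# Round-trip demo with toy data in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    toy_lsoas = pd.Index(["E01A", "E01B"], dtype="string")
    sparse.save_npz(d / "R_response_csr.npz", sparse.csr_matrix(np.array([[3.0], [9.0]], dtype=np.float32)))
    sparse.save_npz(d / "C_convey_csr.npz", sparse.csr_matrix(np.array([[20.0], [35.0]], dtype=np.float32)))
    np.savez(d / "matrix_metadata.npz",
             lsoa_index=toy_lsoas.to_numpy(),
             station_lsoas=pd.Index(["E01A"], dtype="string").to_numpy(),
             acute_lsoas=pd.Index(["E01B"], dtype="string").to_numpy())
    R_back, C_back, st, ac = load_cached_matrices(d, toy_lsoas)
    try:
        load_cached_matrices(d, pd.Index(["E01B", "E01A"]))   # reordered universe
        mismatch_caught = False
    except ValueError:
        mismatch_caught = True

print(R_back.shape, C_back.shape, st, ac, mismatch_caught)
```

In the notebook this might be called as `load_cached_matrices(MATRICES_DIR, lsoa_index)`. Note that the returned station and acute labels are plain lists, not pandas indexes.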


In [6]:
# Step 3 — Vectorised minima, KPIs, and baseline exports
# What this does:
# - Computes nearest response/convey times (row-wise mins, no loops).
# - Applies blue-light AFTER minima.
# - Produces population-weighted coverage KPIs and exports tidy tables.

from __future__ import annotations
from typing import Sequence
from scipy.sparse import csr_matrix

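# Row-wise minimum in CSR without densifying (vectorised: one np.minimum.reduceat over the stored values)
def rowmin_csr(mat: csr_matrix) -> np.ndarray:
    """Return per-row minima over *stored* entries; rows with no entries → +inf.

    scipy's mat.min(axis=1) is not used because it counts a missing pair as 0 minutes.
    """
    m = mat.shape[0]
    mins = np.full(m, np.inf, dtype=np.float32)
    if mat.shape[1] == 0 or mat.nnz == 0:
        return mins
    nonempty = np.diff(mat.indptr) > 0
    # Each reduceat segment runs from one non-empty row's start to the next one's (empty rows hold no data)
    mins[nonempty] = np.minimum.reduceat(mat.data[: mat.indptr[-1]], mat.indptr[:-1][nonempty])
    return mins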

# Nearest times (use all columns for baseline)
baseline_station_cols = np.arange(R.shape[1], dtype=np.int32)
baseline_acute_cols   = np.arange(C.shape[1], dtype=np.int32)

t_resp = rowmin_csr(R[:, baseline_station_cols]) * np.float32(BLUE_LIGHT_FACTOR_RESPONSE)
t_conv = rowmin_csr(C[:, baseline_acute_cols])   * np.float32(BLUE_LIGHT_FACTOR_CONVEY)
t_total = (t_resp + ON_SCENE_BUFFER_MIN + t_conv).astype(np.float32)

# Pack into a tidy frame
out_df = pd.DataFrame({
    "lsoa_code": lsoa_index,
    "t_resp_min": t_resp,
    "t_conv_min": t_conv,
    "t_total_min": t_total,
}).set_index("lsoa_code")

# Boolean coverage flags for mapping/summary
for thr in RESPONSE_THRESHOLDS:
    out_df[f"resp_le_{thr}"] = (out_df["t_resp_min"] <= thr).astype("uint8")
for thr in SCENE_TO_AE_THRESHOLDS:
    out_df[f"conv_le_{thr}"] = (out_df["t_conv_min"] <= thr).astype("uint8")

# KPI helper
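def coverage_table(times: pd.Series | np.ndarray, thresholds: Sequence[int], weights: pd.Series) -> pd.DataFrame:
    """% of population (weights) with time <= each threshold; ∞/NaN count as not covered."""
    # Align a Series to lsoa_index by label; a bare array must already be in lsoa_index order
    if isinstance(times, pd.Series):
        arr = times.reindex(lsoa_index).to_numpy(dtype=np.float64)
    else:
        arr = np.asarray(times, dtype=np.float64)
    w = weights.reindex(lsoa_index).fillna(0.0).to_numpy(dtype=np.float64)
    tot = w.sum() if w.sum() > 0 else 1.0
    rows = []
    for thr in thresholds:
        covered = arr <= thr
        rows.append({"threshold_min": int(thr), "pct_population": round(float((w * covered).sum() / tot * 100.0), 2)})
    return pd.DataFrame(rows)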

# Compute KPIs
resp_kpis = coverage_table(out_df["t_resp_min"], RESPONSE_THRESHOLDS, population)
conv_kpis = coverage_table(out_df["t_conv_min"], SCENE_TO_AE_THRESHOLDS, population)

# Exports
times_path = TABLES_DIR / "times_baseline.csv"
resp_kpi_path = TABLES_DIR / "coverage_response_baseline.csv"
conv_kpi_path = TABLES_DIR / "coverage_conveyance_baseline.csv"
by_lsoa_path = TABLES_DIR / "coverage_by_lsoa_baseline.csv"

out_df.reset_index().to_csv(times_path, index=False)
resp_kpis.to_csv(resp_kpi_path, index=False)
conv_kpis.to_csv(conv_kpi_path, index=False)
by_lsoa = out_df.copy()
by_lsoa.insert(0, "lsoa_code", by_lsoa.index)
by_lsoa["population"] = population.reindex(by_lsoa.index).astype(int).to_numpy()
by_lsoa.to_csv(by_lsoa_path, index=False)

# Simple readout
def _summ(name: str, arr: np.ndarray) -> str:
    finite = np.isfinite(arr)
    if not finite.any(): return f"{name}: all inf"
    q = np.percentile(arr[finite], [0,25,50,90,95,100]).round(2)
    return f"{name}: min={q[0]}, p25={q[1]}, med={q[2]}, p90={q[3]}, p95={q[4]}, max={q[5]}"

print("\n== STEP 3 SUMMARY ==")
print(_summ("t_resp", t_resp))
print(_summ("t_conv", t_conv))
print("Response KPIs:\n", resp_kpis.to_string(index=False))
print("Conveyance KPIs:\n", conv_kpis.to_string(index=False))
_ok("Step 3 complete — minima computed and KPIs exported.")



== STEP 3 SUMMARY ==
t_resp: min=0.5, p25=5.7, med=11.22, p90=24.64, p95=28.76, max=92.22
t_conv: min=0.5, p25=16.95, med=29.28, p90=74.47, p95=80.82, max=129.12
Response KPIs:
  threshold_min  pct_population
             7           30.77
            15           62.33
            18           75.15
            40           99.26
Conveyance KPIs:
  threshold_min  pct_population
            30           51.77
            45           69.35
            60           79.03
[OK] Step 3 complete — minima computed and KPIs exported.
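
Step 3's summary drops non-finite times from the percentiles, and `coverage_table` counts them as not covered, so unreachable LSOAs can go unnoticed. The sketch below produces an explicit report using made-up times and populations, and `unreachable_report` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def unreachable_report(times: pd.Series, population: pd.Series, leg: str) -> pd.DataFrame:
    """List LSOAs whose nearest time is non-finite (no stored pair), with their population."""
    bad = ~np.isfinite(times.to_numpy(dtype=float))
    codes = times.index[bad]
    out = pd.DataFrame({"lsoa_code": codes,
                        "population": population.reindex(codes).fillna(0).to_numpy()})
    print(f"[{leg}] unreachable LSOAs: {int(bad.sum())}, population: {int(out['population'].sum()):,}")
    return out

idx = pd.Index(["E01A", "E01B", "E01C"], name="lsoa_code")
t_conv_toy = pd.Series([12.0, np.inf, 48.5], index=idx)
pop_toy = pd.Series([1500.0, 1200.0, 1800.0], index=idx)
report = unreachable_report(t_conv_toy, pop_toy, "conveyance")
```

In the notebook the equivalent call would be `unreachable_report(out_df["t_conv_min"], population, "conveyance")`, and likewise for `t_resp_min`.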


In [7]:
# Step 4 — Scenario scaffold (define → run → export)
# What this does:
# - Lets you create “what-if” station sets (add/remove bases) via column selection.
# - Reuses the same vectorised minima & KPI functions for instant diffs.

from __future__ import annotations
from dataclasses import dataclass

# Lookups for station/acute columns
station_col_lookup: Dict[str, int] = {code: j for j, code in enumerate(station_lsoas)}
acute_col_lookup:   Dict[str, int] = {code: j for j, code in enumerate(acute_lsoas)}

@dataclass(frozen=True)
class Scenario:
    name: str
    station_cols: np.ndarray  # indices into R's columns
    acute_cols:   np.ndarray  # indices into C's columns

def run_scenario(scn: Scenario) -> dict[str, pd.DataFrame]:
    t_r = rowmin_csr(R[:, scn.station_cols]) * np.float32(BLUE_LIGHT_FACTOR_RESPONSE)
    t_c = rowmin_csr(C[:, scn.acute_cols])   * np.float32(BLUE_LIGHT_FACTOR_CONVEY)
    times = pd.DataFrame({"lsoa_code": lsoa_index, "t_resp_min": t_r, "t_conv_min": t_c})
    return {
        "times": times,
        "resp_kpis": coverage_table(times.set_index("lsoa_code")["t_resp_min"], RESPONSE_THRESHOLDS, population),
        "conv_kpis": coverage_table(times.set_index("lsoa_code")["t_conv_min"], SCENE_TO_AE_THRESHOLDS, population),
    }

# Baseline scenario (all current columns)
SCENARIOS = [
    Scenario("baseline",
             station_cols=np.arange(R.shape[1], dtype=np.int32),
             acute_cols=np.arange(C.shape[1], dtype=np.int32)),
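    # Example: drop a station by LSOA code (uncomment & replace the code):
    # Scenario("drop_E01XXXXXX",
    #          station_cols=np.setdiff1d(np.arange(R.shape[1]), [station_col_lookup["E01XXXXXX"]]).astype(np.int32),
    #          acute_cols=np.arange(C.shape[1], dtype=np.int32)),
    # Adding a *new* site only works if its LSOA is already a column of R: build R in Step 2 with
    # col_codes = current stations plus candidate sites, and give the baseline only the current ones.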
]

# Run & export
rows = []
for scn in SCENARIOS:
    res = run_scenario(scn)
    res["times"].to_csv(TABLES_DIR / f"times_{scn.name}.csv", index=False)
    for _, r in res["resp_kpis"].iterrows():
        rows.append({"scenario": scn.name, "leg": "response", "threshold_min": int(r["threshold_min"]), "pct_population": float(r["pct_population"])})
    for _, r in res["conv_kpis"].iterrows():
        rows.append({"scenario": scn.name, "leg": "conveyance", "threshold_min": int(r["threshold_min"]), "pct_population": float(r["pct_population"])})

scen_kpis = pd.DataFrame(rows).sort_values(["scenario", "leg", "threshold_min"])
scen_kpis.to_csv(TABLES_DIR / "scenario_kpis.csv", index=False)
print("\n== STEP 4 SUMMARY ==")
print(scen_kpis.to_string(index=False))
_ok("Step 4 complete — scenarios run and KPIs exported.")



== STEP 4 SUMMARY ==
scenario        leg  threshold_min  pct_population
baseline conveyance             30           51.77
baseline conveyance             45           69.35
baseline conveyance             60           79.03
baseline   response              7           30.77
baseline   response             15           62.33
baseline   response             18           75.15
baseline   response             40           99.26
[OK] Step 4 complete — scenarios run and KPIs exported.
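
Because a scenario is just a column subset, a leave-one-out sweep (drop each station in turn and re-take the minima) is a short loop over columns. This is a minimal sketch on a made-up 4×3 matrix with every pair stored. `pct_covered` is a hypothetical helper that mirrors `coverage_table` for a single threshold. In the notebook, the same idea would pass `np.setdiff1d(np.arange(R.shape[1]), [j])` as `station_cols` to `Scenario`.

```python
import numpy as np
from scipy import sparse

R_toy = sparse.csr_matrix(np.array([
    [ 5.0, 14.0, 30.0],
    [16.0,  6.0, 25.0],
    [28.0, 19.0,  4.0],
    [ 9.0, 11.0, 40.0],
], dtype=np.float32))
pop_toy = np.array([1000.0, 2000.0, 1500.0, 500.0])
station_codes = ["S1", "S2", "S3"]

def rowmin_stored(mat):
    """Minimum over stored entries only; rows with nothing stored -> +inf."""
    out = np.full(mat.shape[0], np.inf, dtype=np.float32)
    nonempty = np.diff(mat.indptr) > 0
    if nonempty.any():
        out[nonempty] = np.minimum.reduceat(mat.data[: mat.indptr[-1]], mat.indptr[:-1][nonempty])
    return out

def pct_covered(times, weights, thr):
    """Population-weighted % of rows with time <= thr."""
    return round(float((weights * (times <= thr)).sum() / weights.sum() * 100.0), 2)

all_cols = np.arange(R_toy.shape[1])
base = pct_covered(rowmin_stored(R_toy), pop_toy, 15)
results = {}
for j, code in enumerate(station_codes):
    keep_cols = np.setdiff1d(all_cols, [j])          # drop station j, keep the rest
    results[code] = pct_covered(rowmin_stored(R_toy[:, keep_cols]), pop_toy, 15)
    print(f"drop {code}: {results[code]:.2f}% within 15 min (baseline {base:.2f}%)")
```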


In [8]:
# Step 5 — Scenario diffs, exports, and QA
# What this does:
# - Compares every non-baseline scenario against baseline (minute-saved deltas).
# - Reports population-weighted avg minutes saved and KPI percentage-point shifts.
# - Flags any LSOAs where response gets worse (shouldn’t when adding stations).

from __future__ import annotations
from typing import Dict, Sequence

# Helper: compute times for a given set of columns
def _scenario_times(station_cols: np.ndarray, acute_cols: np.ndarray) -> pd.DataFrame:
    t_r = rowmin_csr(R[:, station_cols]) * np.float32(BLUE_LIGHT_FACTOR_RESPONSE)
    t_c = rowmin_csr(C[:, acute_cols])   * np.float32(BLUE_LIGHT_FACTOR_CONVEY)
    t_tot = (t_r + ON_SCENE_BUFFER_MIN + t_c).astype(np.float32)
    return pd.DataFrame(
        {"lsoa_code": lsoa_index, "t_resp_min": t_r, "t_conv_min": t_c, "t_total_min": t_tot}
    )

# KPI helper (wraps coverage_table from Step 3)
def _coverage_kpis(times_df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    r = coverage_table(times_df.set_index("lsoa_code")["t_resp_min"], RESPONSE_THRESHOLDS, population)
    c = coverage_table(times_df.set_index("lsoa_code")["t_conv_min"], SCENE_TO_AE_THRESHOLDS, population)
    return r, c

# Baseline artefacts
baseline_times = _scenario_times(
    station_cols=np.arange(R.shape[1], dtype=np.int32),
    acute_cols=np.arange(C.shape[1], dtype=np.int32),
)
base_resp_kpi, base_conv_kpi = _coverage_kpis(baseline_times)

# Filter to non-baseline scenarios
scenarios_to_run = [s for s in SCENARIOS if s.name != "baseline"]

rows_summary: list[Dict] = []
if len(scenarios_to_run) == 0:
    # Write an empty summary so downstream steps don’t break
    empty_summary = pd.DataFrame(
        columns=["scenario", "w_mean_resp_minutes_saved", "w_mean_total_minutes_saved", "n_worsened_resp"]
    )
    empty_summary.to_csv(TABLES_DIR / "scenario_delta_summary.csv", index=False)
    print("[WARN] No non-baseline scenarios defined. Wrote empty scenario_delta_summary.csv.")
else:
    for scn in scenarios_to_run:
        scn_times = _scenario_times(scn.station_cols, scn.acute_cols)

        # Per-LSOA deltas (positive = minutes saved vs baseline)
        merged = baseline_times.merge(scn_times, on="lsoa_code", suffixes=("_base", "_scn"))
        merged["d_resp_min"]  = merged["t_resp_min_base"]  - merged["t_resp_min_scn"]
        merged["d_conv_min"]  = merged["t_conv_min_base"]  - merged["t_conv_min_scn"]
        merged["d_total_min"] = merged["t_total_min_base"] - merged["t_total_min_scn"]

        # QA: response should not worsen when adding stations
        worsened = int((merged["d_resp_min"] < -1e-6).sum())
        if worsened:
            print(f"[WARN] {scn.name}: {worsened} LSOAs have worse response times than baseline.")

        # KPI deltas (percentage-point change)
        scn_resp_kpi, scn_conv_kpi = _coverage_kpis(scn_times)
        resp_delta = scn_resp_kpi.merge(base_resp_kpi, on="threshold_min", suffixes=("_scn", "_base"))
        resp_delta["delta_pp"] = resp_delta["pct_population_scn"] - resp_delta["pct_population_base"]
        conv_delta = scn_conv_kpi.merge(base_conv_kpi, on="threshold_min", suffixes=("_scn", "_base"))
        conv_delta["delta_pp"] = conv_delta["pct_population_scn"] - conv_delta["pct_population_base"]

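        # Population-weighted average minutes saved, over LSOAs finite in both runs
        # (inf - inf = nan and inf - x = inf would otherwise poison the mean)
        pop = population.reindex(merged["lsoa_code"]).to_numpy(dtype=float)
        d_resp = merged["d_resp_min"].to_numpy(dtype=float)
        d_tot  = merged["d_total_min"].to_numpy(dtype=float)
        ok_r, ok_t = np.isfinite(d_resp), np.isfinite(d_tot)
        tot_r, tot_t = pop[ok_r].sum(), pop[ok_t].sum()
        w_mean_resp_save  = float((pop[ok_r] * d_resp[ok_r]).sum() / tot_r) if tot_r > 0 else 0.0
        w_mean_total_save = float((pop[ok_t] * d_tot[ok_t]).sum() / tot_t) if tot_t > 0 else 0.0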

        rows_summary.append(
            {
                "scenario": scn.name,
                "w_mean_resp_minutes_saved": round(w_mean_resp_save, 3),
                "w_mean_total_minutes_saved": round(w_mean_total_save, 3),
                "n_worsened_resp": worsened,
            }
        )

        # Exports per scenario
        merged.to_csv(TABLES_DIR / f"delta_by_lsoa_{scn.name}.csv", index=False)
        resp_delta.to_csv(TABLES_DIR / f"delta_kpi_response_{scn.name}.csv", index=False)
        conv_delta.to_csv(TABLES_DIR / f"delta_kpi_convey_{scn.name}.csv", index=False)

        print(f"[OK] {scn.name}: wrote delta tables (by_lsoa / kpis).")

    # Consolidated scenario summary
    summary_df = pd.DataFrame(rows_summary).sort_values("scenario")
    summary_df.to_csv(TABLES_DIR / "scenario_delta_summary.csv", index=False)
    print("[OK] Scenario delta summary →", TABLES_DIR / "scenario_delta_summary.csv")

print("[OK] Step 5 complete — scenario diffs & QA exported.")


[WARN] No non-baseline scenarios defined. Wrote empty scenario_delta_summary.csv.
[OK] Step 5 complete — scenario diffs & QA exported.
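
Minutes-saved averages need care with ∞. `inf - inf` is `nan` and `inf - finite` is `inf`, and either one turns a population-weighted mean into `nan` or `inf`. This is a minimal sketch that averages over LSOAs reachable in both runs and counts newly reachable LSOAs separately. The toy values and the `w_mean_saved` helper are hypothetical.

```python
import numpy as np

base_toy = np.array([12.0, np.inf, np.inf, 20.0])
scn_toy  = np.array([ 9.0, np.inf, 35.0,  20.0])
pop_w    = np.array([1000.0, 300.0, 700.0, 2000.0])

def w_mean_saved(base, scn, weights):
    """Population-weighted mean minutes saved over LSOAs finite in both runs."""
    with np.errstate(invalid="ignore"):        # silence the inf - inf warning
        d = base - scn
    ok = np.isfinite(base) & np.isfinite(scn)
    newly_reachable = int((~np.isfinite(base) & np.isfinite(scn)).sum())
    w = weights[ok]
    mean = float((w * d[ok]).sum() / w.sum()) if w.sum() > 0 else float("nan")
    return mean, newly_reachable

mean_saved, newly = w_mean_saved(base_toy, scn_toy, pop_w)
print(f"weighted mean saved: {mean_saved:.3f} min; newly reachable LSOAs: {newly}")
```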


In [9]:
# Step 6 — Maps (binary coverage + continuous layers)
# What this does:
# - Builds simple choropleths for response/convey thresholds and continuous time surfaces.
# - Overlays station/acute centroids (derived from LSOA polygons).
# - Saves PNGs under maps/ for slides or dashboards.

from __future__ import annotations
import contextlib
from matplotlib.patches import Patch
from matplotlib.lines import Line2D

# Join mapping geometry with times/flags from Step 3
gmap = lsoa_g.join(out_df, how="left")

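# Centroids for station/acute LSOAs (assumes a projected CRS such as EPSG:27700, so centroids are meaningful)
centroids = gmap.geometry.centroid
# reset_index(drop=True) so the geometry lines up with the lsoa_code column, which has a RangeIndex
station_pts = gpd.GeoDataFrame({"lsoa_code": station_lsoas}, geometry=centroids.reindex(station_lsoas).reset_index(drop=True), crs=gmap.crs)
acute_pts   = gpd.GeoDataFrame({"lsoa_code": acute_lsoas},   geometry=centroids.reindex(acute_lsoas).reset_index(drop=True),   crs=gmap.crs)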

# Colours
COVERED_COLOUR = "#2ca25f"
UNCOVERED_COLOUR = "#de2d26"
BORDER_COLOUR = "#ffffff"
BG_COLOUR = "#f7f7f7"
PTS_STATION_COLOUR = "#1f78b4"
PTS_ACUTE_COLOUR = "#6a3d9a"

def _legend_binary(ax, covered_label="Covered", uncovered_label="Not covered"):
    patches = [
        Patch(facecolor=COVERED_COLOUR, edgecolor=BORDER_COLOUR, label=covered_label),
        Patch(facecolor=UNCOVERED_COLOUR, edgecolor=BORDER_COLOUR, label=uncovered_label),
        Line2D([0], [0], marker="o", color="w", markerfacecolor=PTS_STATION_COLOUR, markersize=8, label="Station"),
        Line2D([0], [0], marker="o", color="w", markerfacecolor=PTS_ACUTE_COLOUR, markersize=8, label="Acute"),
    ]
    ax.legend(handles=patches, loc="lower left", frameon=True, framealpha=0.9)

def _plot_binary(layer_col: str, title: str, outfile: Path):
    fig, ax = plt.subplots(figsize=(8.5, 9), dpi=150, facecolor="white")
    ax.set_facecolor(BG_COLOUR)
    if layer_col not in gmap.columns:
        ax.text(0.5, 0.5, f"Column '{layer_col}' not found.", ha="center", va="center", transform=ax.transAxes)
    else:
        covered = gmap[gmap[layer_col] == 1]
        not_cov = gmap[gmap[layer_col] != 1]
        # Plot each class only if non-empty; no blanket exception suppression, so real errors surface
        if not not_cov.empty:
            not_cov.plot(ax=ax, color=UNCOVERED_COLOUR, edgecolor=BORDER_COLOUR, linewidth=0.2, rasterized=True)
        if not covered.empty:
            covered.plot(ax=ax, color=COVERED_COLOUR, edgecolor=BORDER_COLOUR, linewidth=0.2, rasterized=True)
        if not station_pts.empty:
            station_pts.plot(ax=ax, markersize=10, color=PTS_STATION_COLOUR, alpha=0.9)
        if not acute_pts.empty:
            acute_pts.plot(ax=ax, markersize=10, color=PTS_ACUTE_COLOUR, alpha=0.9)
        _legend_binary(ax)
    ax.set_title(title, fontsize=13, pad=10)
    ax.set_axis_off()
    plt.tight_layout()
    plt.savefig(outfile, bbox_inches="tight")
    plt.close(fig)

def _plot_continuous(value_col: str, title: str, outfile: Path, vmin: float | None = None, vmax: float | None = None):
    fig, ax = plt.subplots(figsize=(8.5, 9), dpi=150, facecolor="white")
    ax.set_facecolor(BG_COLOUR)
    if value_col not in gmap.columns:
        ax.text(0.5, 0.5, f"Column '{value_col}' not found.", ha="center", va="center", transform=ax.transAxes)
    else:
        # Replace ±inf with NaN so unreachable LSOAs don't break the colour scale
        data = gmap[value_col].replace([np.inf, -np.inf], np.nan)
        has_finite = bool(data.notna().any())
        if vmin is None: vmin = float(np.nanpercentile(data, 2)) if has_finite else 0.0
        if vmax is None: vmax = float(np.nanpercentile(data, 98)) if has_finite else 1.0
        vmin, vmax = (min(vmin, vmax), max(vmin, vmax))
        if has_finite:
            # Plot the inf-free copy; NaN (unreachable or missing) LSOAs are drawn in light grey
            gmap.assign(_plot_val=data).plot(
                column="_plot_val", ax=ax, cmap="viridis", vmin=vmin, vmax=vmax,
                edgecolor=BORDER_COLOUR, linewidth=0.2, legend=True,
                legend_kwds={"label": "Minutes", "shrink": 0.6},
                missing_kwds={"color": "lightgrey"}, rasterized=True,
            )
        if not station_pts.empty:
            station_pts.plot(ax=ax, markersize=10, color=PTS_STATION_COLOUR, alpha=0.9)
        if not acute_pts.empty:
            acute_pts.plot(ax=ax, markersize=10, color=PTS_ACUTE_COLOUR, alpha=0.9)
    ax.set_title(title, fontsize=13, pad=10)
    ax.set_axis_off()
    plt.tight_layout()
    plt.savefig(outfile, bbox_inches="tight")
    plt.close(fig)

# Generate maps
written = []
for thr in RESPONSE_THRESHOLDS:
    f = MAPS_DIR / f"map_response_le_{thr}min.png"
    _plot_binary(f"resp_le_{thr}", f"Response coverage ≤{thr} min", f)
    written.append(f)
for thr in SCENE_TO_AE_THRESHOLDS:
    f = MAPS_DIR / f"map_conveyance_le_{thr}min.png"
    _plot_binary(f"conv_le_{thr}", f"Conveyance coverage ≤{thr} min", f)
    written.append(f)

f = MAPS_DIR / "map_t_resp_min.png"
_plot_continuous("t_resp_min", "Nearest response time (min)", f); written.append(f)
f = MAPS_DIR / "map_t_conv_min.png"
_plot_continuous("t_conv_min", "Nearest conveyance time (min)", f); written.append(f)
if "t_total_min" in gmap.columns:
    f = MAPS_DIR / "map_t_total_min.png"
    _plot_continuous("t_total_min", "End-to-end (resp + scene + convey)", f); written.append(f)

print("[OK] Step 6 complete — maps written:")
for p in written:
    print(" -", p)


[OK] Step 6 complete — maps written:
 - /Users/rosstaylor/Downloads/Code Repositories/REACH Map (NHS SW)/GitHub Repo/REACH-Map-NHS-SW/data/raw/test_data_ICB_level/maps/map_response_le_7min.png
 - /Users/rosstaylor/Downloads/Code Repositories/REACH Map (NHS SW)/GitHub Repo/REACH-Map-NHS-SW/data/raw/test_data_ICB_level/maps/map_response_le_15min.png
 - /Users/rosstaylor/Downloads/Code Repositories/REACH Map (NHS SW)/GitHub Repo/REACH-Map-NHS-SW/data/raw/test_data_ICB_level/maps/map_response_le_18min.png
 - /Users/rosstaylor/Downloads/Code Repositories/REACH Map (NHS SW)/GitHub Repo/REACH-Map-NHS-SW/data/raw/test_data_ICB_level/maps/map_response_le_40min.png
 - /Users/rosstaylor/Downloads/Code Repositories/REACH Map (NHS SW)/GitHub Repo/REACH-Map-NHS-SW/data/raw/test_data_ICB_level/maps/map_conveyance_le_30min.png
 - /Users/rosstaylor/Downloads/Code Repositories/REACH Map (NHS SW)/GitHub Repo/REACH-Map-NHS-SW/data/raw/test_data_ICB_level/maps/map_conveyance_le_45min.png
 - /Users/rosstayl
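The colour scaling in `_plot_continuous` ignores NaN and ±∞ (no-route sentinels), clips to the 2nd to 98th percentile of what remains, and falls back to 0 to 1 minutes when nothing is finite. Pulling that rule into a pure function makes it easy to test in isolation. The sketch below is a hypothetical refactor: `robust_colour_limits` is not part of the notebook, and it assumes NumPy's default linear percentile interpolation, as `np.nanpercentile` uses above.

```python
import numpy as np


def robust_colour_limits(values, lo_pct=2.0, hi_pct=98.0, fallback=(0.0, 1.0)):
    """Return (vmin, vmax) from finite values only, clipped to percentiles.

    Hypothetical helper mirroring the vmin/vmax logic in _plot_continuous.
    """
    arr = np.asarray(values, dtype=float)
    finite = arr[np.isfinite(arr)]  # drops NaN and +/-inf sentinels
    if finite.size == 0:
        return fallback
    vmin = float(np.percentile(finite, lo_pct))
    vmax = float(np.percentile(finite, hi_pct))
    # Guard against inverted limits, as the notebook does.
    return (min(vmin, vmax), max(vmin, vmax))
```

Used in the notebook, something like `vmin, vmax = robust_colour_limits(gmap["t_resp_min"])` would yield the same limits that `_plot_continuous` computes internally when `vmin` and `vmax` are left as `None`.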

In [10]:
# Step 7 — Validation (parity vs group-by) and data sentinels
# What this does:
# - Recomputes nearest times using plain pandas group-bys (02a style).
# - Confirms the matrix minima match (within tiny tolerance).
# - Reports any +∞ rows (no-route) per leg.

from __future__ import annotations

ATOL = 1e-6  # numeric tolerance for parity

# 7.1 Response parity (station→LSOA)
# LSOAs with no station pair come back as NaN after reindex; fill them with +inf
# so the group-by path uses the same no-route sentinel as the matrix path.
resp_groupby = (
    travel[travel["origin_lsoa"].isin(station_lsoas)]
    .groupby("dest_lsoa", observed=True)["time_car_min"]
    .min()
    .reindex(lsoa_index)
    .fillna(np.inf)
    .astype("float32")
    .to_numpy()
) * np.float32(BLUE_LIGHT_FACTOR_RESPONSE)

# 7.2 Conveyance parity (LSOA→acute), with the same +inf sentinel for no-route rows
conv_groupby = (
    travel[travel["dest_lsoa"].isin(acute_lsoas)]
    .groupby("origin_lsoa", observed=True)["time_car_min"]
    .min()
    .reindex(lsoa_index)
    .fillna(np.inf)
    .astype("float32")
    .to_numpy()
) * np.float32(BLUE_LIGHT_FACTOR_CONVEY)

# 7.3 Compare to matrix-derived
resp_diff = np.nanmax(np.abs(resp_groupby - out_df["t_resp_min"].to_numpy()))
conv_diff = np.nanmax(np.abs(conv_groupby - out_df["t_conv_min"].to_numpy()))
resp_ok = bool(np.allclose(resp_groupby, out_df["t_resp_min"].to_numpy(), atol=ATOL, equal_nan=True))
conv_ok = bool(np.allclose(conv_groupby, out_df["t_conv_min"].to_numpy(), atol=ATOL, equal_nan=True))

print("\n== STEP 7 — Parity check ==")
print(f"Response parity: {'PASS' if resp_ok else 'FAIL'} (max abs diff = {resp_diff:.6f} min)")
print(f"Conveyance parity: {'PASS' if conv_ok else 'FAIL'} (max abs diff = {conv_diff:.6f} min)")

# 7.4 Sentinel report for +∞ (no-route) rows
n_inf_resp = int(np.isinf(out_df['t_resp_min'].to_numpy()).sum())
n_inf_conv = int(np.isinf(out_df['t_conv_min'].to_numpy()).sum())
print(f"No-route sentinels → response: {n_inf_resp}, conveyance: {n_inf_conv}")
print("[OK] Step 7 complete — validation run finished.")



== STEP 7 — Parity check ==
Response parity: PASS (max abs diff = 0.000000 min)
Conveyance parity: PASS (max abs diff = 0.000000 min)
No-route sentinels → response: 0, conveyance: 0
[OK] Step 7 complete — validation run finished.
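The parity idea in Step 7 can be reproduced end to end on a toy table. The sketch below is illustrative only: the data, LSOA codes and helper names (`csr_rowwise_min`, `groupby_min`) are hypothetical, and it uses a NumPy-only CSR row-wise minimum via `np.minimum.reduceat` rather than whatever reduction the notebook's earlier cells use. It shows two details that matter for parity: rows with no stored pair become `+inf` on both paths, and a genuine 0-minute pair stays 0 because only stored entries are reduced. (By contrast, SciPy's sparse `min(axis=1)` takes implicit zeros into account, so a missing pair would read as 0 minutes unless masked.) Blue-light factors are omitted here; since they are positive scalars applied after the minimum, they do not affect which pair wins.

```python
import numpy as np
import pandas as pd

# Toy long-form response leg: origin = station LSOA, dest = demand LSOA.
# "B" has a genuine 0-minute pair; "D" has no pairs at all (no route).
travel = pd.DataFrame({
    "origin_lsoa":  ["S1", "S2", "S1", "S2", "S3"],
    "dest_lsoa":    ["A",  "A",  "B",  "C",  "C"],
    "time_car_min": [9.0,  6.0,  0.0,  12.0, 8.0],
})
lsoa_index = ["A", "B", "C", "D"]   # rows of R
stations = ["S1", "S2", "S3"]        # columns of R

# Reshape only (no routing): CSR arrays storing every observed pair explicitly.
rows = travel["dest_lsoa"].map({c: i for i, c in enumerate(lsoa_index)}).to_numpy()
cols = travel["origin_lsoa"].map({c: j for j, c in enumerate(stations)}).to_numpy()
order = np.argsort(rows, kind="stable")
data = travel["time_car_min"].to_numpy(dtype=float)[order]
indices = cols[order]
indptr = np.concatenate(([0], np.cumsum(np.bincount(rows, minlength=len(lsoa_index)))))


def csr_rowwise_min(data, indices, indptr, active_cols, n_cols):
    """Row-wise min over stored entries in active columns; +inf where none exist."""
    active = np.zeros(n_cols, dtype=bool)
    active[active_cols] = True
    masked = np.where(active[indices], data, np.inf)  # inactive columns never win
    out = np.full(len(indptr) - 1, np.inf)
    nonempty = np.diff(indptr) > 0
    if nonempty.any():
        # Starts of non-empty rows are strictly increasing, so each reduceat
        # segment covers exactly one row's stored entries.
        out[nonempty] = np.minimum.reduceat(masked, indptr[:-1][nonempty])
    return out


def groupby_min(active_codes):
    """02a-style reference: group-by min, with no-route rows set to +inf."""
    return (
        travel[travel["origin_lsoa"].isin(active_codes)]
        .groupby("dest_lsoa")["time_car_min"].min()
        .reindex(lsoa_index)
        .fillna(np.inf)
        .to_numpy()
    )


t_all = csr_rowwise_min(data, indices, indptr, [0, 1, 2], len(stations))
t_no_s3 = csr_rowwise_min(data, indices, indptr, [0, 1], len(stations))
assert np.allclose(t_all, groupby_min(stations))        # infs must line up too
assert np.allclose(t_no_s3, groupby_min(["S1", "S2"]))  # scenario = column subset
```

Dropping station `S3` is just a smaller column set; nothing is re-grouped, which is the scenario property the notebook relies on.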


In [11]:
# Step 8 — Convenience: scenario helpers (add/remove stations by LSOA code)
# What this does:
# - Adds ergonomic helpers to build scenarios by LSOA code(s).
# - Avoids manual index math; validates input codes.

from __future__ import annotations
from typing import Sequence

# Column lookups: LSOA code → column index in R (stations) and C (acutes)
station_col_lookup = {code: j for j, code in enumerate(station_lsoas)}
acute_col_lookup   = {code: j for j, code in enumerate(acute_lsoas)}

def _station_cols_for(codes: Sequence[str]) -> np.ndarray:
    """Map station LSOA codes to column indices of R; reject unknown codes."""
    missing = [c for c in codes if c not in station_col_lookup]
    if missing:
        raise ValueError(f"Unknown station LSOA codes: {missing}")
    return np.array([station_col_lookup[c] for c in codes], dtype=np.int32)

def _base_cols(base_station_cols: np.ndarray | None) -> np.ndarray:
    # Default baseline: every column of R is an active station.
    if base_station_cols is None:
        return np.arange(R.shape[1], dtype=np.int32)
    return np.asarray(base_station_cols, dtype=np.int32)

def make_add_station_scenario(name: str, add_station_codes: Sequence[str],
                              base_station_cols: np.ndarray | None = None) -> Scenario:
    # Additions only change results when the baseline is a strict subset of R's columns;
    # with the default (all columns active) this scenario equals the baseline.
    new_cols = np.union1d(_base_cols(base_station_cols), _station_cols_for(add_station_codes)).astype(np.int32)
    return Scenario(name=name, station_cols=new_cols, acute_cols=np.arange(C.shape[1], dtype=np.int32))

def make_remove_station_scenario(name: str, remove_station_codes: Sequence[str],
                                 base_station_cols: np.ndarray | None = None) -> Scenario:
    keep_cols = np.setdiff1d(_base_cols(base_station_cols), _station_cols_for(remove_station_codes)).astype(np.int32)
    return Scenario(name=name, station_cols=keep_cols, acute_cols=np.arange(C.shape[1], dtype=np.int32))

print("[OK] Step 8 ready — use make_add_station_scenario(...) or make_remove_station_scenario(...) to extend SCENARIOS.")


[OK] Step 8 ready — use make_add_station_scenario(...) or make_remove_station_scenario(...) to extend SCENARIOS.
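The scenario helpers select columns of R by LSOA code. The toy sketch below shows how an add or remove scenario then moves the "% population within threshold" KPI. Everything in it is hypothetical: `Scenario` is a minimal stand-in for the notebook's own class (response leg only), R is a small dense array with `np.inf` for missing pairs rather than the notebook's sparse matrix, and the populations are made up. Additions and removals are taken relative to an explicit baseline, which is what makes "add" meaningful when R also carries candidate sites that are not active in the baseline.

```python
from dataclasses import dataclass
from typing import Sequence

import numpy as np


@dataclass
class Scenario:
    """Hypothetical stand-in for the notebook's Scenario (response leg only)."""
    name: str
    station_cols: np.ndarray


# Toy response matrix: rows = demand LSOAs, cols = candidate station LSOAs.
# Values are minutes after the blue-light factor; np.inf marks a missing pair.
station_lsoas = ["E01000001", "E01000002", "E01000003"]
R = np.array([
    [5.0, 12.0, np.inf],
    [9.0, 4.0, 20.0],
    [25.0, np.inf, 6.0],
])
pop = np.array([1000, 2000, 1500])
station_col_lookup = {code: j for j, code in enumerate(station_lsoas)}


def cols_for(codes: Sequence[str]) -> np.ndarray:
    """Map LSOA codes to R column indices, rejecting unknown codes."""
    missing = [c for c in codes if c not in station_col_lookup]
    if missing:
        raise ValueError(f"Unknown station LSOA codes: {missing}")
    return np.array([station_col_lookup[c] for c in codes], dtype=np.int32)


def add_stations(name: str, base_cols: np.ndarray, codes: Sequence[str]) -> Scenario:
    return Scenario(name, np.union1d(base_cols, cols_for(codes)).astype(np.int32))


def remove_stations(name: str, base_cols: np.ndarray, codes: Sequence[str]) -> Scenario:
    return Scenario(name, np.setdiff1d(base_cols, cols_for(codes)).astype(np.int32))


def pct_pop_within(scn: Scenario, threshold_min: float) -> float:
    # Column subset + row-wise min; initial=np.inf keeps an empty station set valid.
    t_resp = R[:, scn.station_cols].min(axis=1, initial=np.inf)
    return 100.0 * pop[t_resp <= threshold_min].sum() / pop.sum()


baseline = Scenario("baseline", np.array([0, 1], dtype=np.int32))
plus_e3 = add_stations("add E01000003", baseline.station_cols, ["E01000003"])
minus_e2 = remove_stations("remove E01000002", baseline.station_cols, ["E01000002"])
for scn in (baseline, plus_e3, minus_e2):
    print(f"{scn.name}: {pct_pop_within(scn, 7):.1f}% of population within 7 min")
```

Because each scenario is only an index array, evaluating many of them costs one column selection and one row-wise minimum each, with no re-reading of the long travel table.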
