# 🧭 SpectraMind V50 — 02 · Diagnostics & Explainability

**NeurIPS 2025 Ariel Data Challenge** · *NASA‑grade reproducibility w/ Hydra + DVC + CLI*

This notebook focuses on diagnostics and explainability artifacts produced by the SpectraMind V50 pipeline.
It is *CLI‑first*, meaning: wherever possible, we **call the `spectramind` CLI** to generate artifacts,
then **render them here**. Each section also includes a *graceful fallback* path that reads existing files
(e.g., `diagnostic_summary.json`, `*.html`, `*.csv`, `*.npy`) directly if present.

> Golden rule: **No hidden analytics**. The GUI/notebook **reflects** what the CLI generated, which keeps results faithful
> and reproducible. Cells print what they do, and the final cell appends run metadata to `v50_debug_log.md` for later auditing.


## ✅ What you can do here

1. Run *lightweight* CLI diagnostics (UMAP/t‑SNE, SHAP overlays, symbolic rule analysis, FFT/smoothness, calibration).
2. Render previously generated HTML dashboards and plots.
3. Inspect metrics from `diagnostic_summary.json` across planets and configs.
4. Export a refreshed diagnostics dashboard and append run metadata to `v50_debug_log.md`.

> Tip: Start with the **Environment & Paths** cell below, then try **Quick CLI sanity** to verify your setup.


## 🔧 Environment & Paths

In [None]:
# This cell sets up common paths and logs environment context.
# Adjust ROOT to your repository root if needed.
from pathlib import Path
import json, platform, shutil, sys
from datetime import datetime

# ---- Repository root: defaults to the current working directory; edit ROOT to override ----
ROOT = Path.cwd()
ARTEFACTS = ROOT / "artifacts"
DIAG_DIR = ARTEFACTS / "diagnostics"
HTML_DIR = DIAG_DIR / "html"
PLOTS_DIR = DIAG_DIR / "plots"

# Common artifact inputs (created by CLI)
DIAG_SUMMARY = DIAG_DIR / "diagnostic_summary.json"
SHAP_JSON = DIAG_DIR / "shap_symbolic_fusion_topk_bins.json"
SYMB_JSON = DIAG_DIR / "symbolic_violation_summary.json"
COREL_JSON = DIAG_DIR / "corel_calibration_summary.json"
UMAP_HTML = HTML_DIR / "umap_v50.html"
TSNE_HTML = HTML_DIR / "tsne_interactive.html"
DASHBOARD_HTML = HTML_DIR / "diagnostic_report_v1.html"
LOG_MD = ROOT / "v50_debug_log.md"

# Optional latent/label fallbacks
LATENTS_NPY = DIAG_DIR / "latents.npy"
LATENTS_CSV = DIAG_DIR / "latents.csv"
LABELS_CSV = DIAG_DIR / "labels.csv"

# Create folders if missing (no-op if existing)
for d in [ARTEFACTS, DIAG_DIR, HTML_DIR, PLOTS_DIR]:
    d.mkdir(parents=True, exist_ok=True)

env = {
    "timestamp": datetime.now().isoformat(timespec="seconds"),
    "python": sys.version.replace("\n", " "),
    "platform": platform.platform(),
    "cwd": str(ROOT),
    "paths": {
        "ARTEFACTS": str(ARTEFACTS),
        "DIAG_DIR": str(DIAG_DIR),
        "HTML_DIR": str(HTML_DIR),
        "PLOTS_DIR": str(PLOTS_DIR),
        "DIAG_SUMMARY": str(DIAG_SUMMARY),
        "SHAP_JSON": str(SHAP_JSON),
        "SYMB_JSON": str(SYMB_JSON),
        "COREL_JSON": str(COREL_JSON),
        "UMAP_HTML": str(UMAP_HTML),
        "TSNE_HTML": str(TSNE_HTML),
        "DASHBOARD_HTML": str(DASHBOARD_HTML),
        "LOG_MD": str(LOG_MD),
        "LATENTS_NPY": str(LATENTS_NPY),
        "LATENTS_CSV": str(LATENTS_CSV),
        "LABELS_CSV": str(LABELS_CSV),
    },
}
print(json.dumps(env, indent=2))

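If the kernel is not started at the repository root, `Path.cwd()` points at the wrong place. A minimal sketch of a root finder that walks upward until it sees a marker; the marker names (`.git`, `.dvc`, `dvc.yaml`) are assumptions based on the Git + DVC setup mentioned above, and `find_repo_root` is a hypothetical helper, not part of the SpectraMind CLI.

```python
from pathlib import Path

def find_repo_root(start=None, markers=(".git", ".dvc", "dvc.yaml")):
    """Walk upward from `start` until a directory containing any marker is found.

    Returns the first matching directory, or `start` itself if none matches.
    The default marker names are assumptions about the repository layout.
    """
    start = (Path.cwd() if start is None else Path(start)).resolve()
    # Check the starting directory first, then each ancestor up to the filesystem root
    for candidate in (start, *start.parents):
        if any((candidate / m).exists() for m in markers):
            return candidate
    return start
```

Usage: set `ROOT = find_repo_root()` and re-run the path definitions above so every derived path follows the new root.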

## 🩺 Quick CLI sanity (optional)

In [None]:
# This cell *optionally* checks CLI availability. It's safe to skip if your CLI isn't on PATH.
# Notebook kernels often inherit a different PATH than your shell, so a miss here is not fatal.
import shutil, subprocess

def check_cli(cmd="spectramind", args=("--version",)):
    """Report whether `cmd` is on PATH and what `cmd args` prints (tuple default avoids a mutable default)."""
    exe = shutil.which(cmd)
    if not exe:
        print(f"⚠️ '{cmd}' CLI not found on PATH. Skipping CLI sanity check.")
        return {"available": False}
    try:
        out = subprocess.check_output([exe, *args], stderr=subprocess.STDOUT, text=True, timeout=30)
        print(out)
        return {"available": True, "output": out}
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError) as e:
        print(f"⚠️ CLI call failed: {e}")
        return {"available": True, "error": str(e)}

cli_info = check_cli()
cli_info


## 🌈 UMAP Diagnostics — generate or load

In [None]:
# Attempt to call the CLI to generate UMAP HTML; otherwise fall back to existing HTML if present.
import shutil, subprocess

def run_umap_cli():
    try:
        exe = shutil.which("spectramind")
        if not exe:
            return False, "CLI not found"
        # Minimal example — adjust flags to your configs
        cmd = ["spectramind", "diagnose", "umap",
               "--html-out", str(UMAP_HTML),
               "--log-file", str(LOG_MD),
               "--open-browser", "false"]
        print("Running:", " ".join(cmd))
        subprocess.check_call(cmd, timeout=600)
        return True, "OK"
    except Exception as e:
        return False, str(e)

ok, msg = run_umap_cli()
if ok:
    print(f"✅ UMAP generated at: {UMAP_HTML}")
elif UMAP_HTML.exists():
    print("⚠️ CLI skipped/failed; using existing:", UMAP_HTML)
else:
    print("❌ UMAP not available; create latents or run CLI separately.")

# Inline iframes do not render in every notebook environment, so also print the path for manual opening.
print("To open UMAP HTML manually if needed:", str(UMAP_HTML))

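The UMAP, t-SNE, and dashboard cells repeat the same try/except pattern. A minimal sketch of a shared helper, assuming the `diagnose` subcommands accept exactly the flags used in those cells (`--html-out`, `--log-file`, `--open-browser`); `run_diagnose` itself is a hypothetical notebook helper, and the flags are not checked against the CLI's `--help` here.

```python
import shutil
import subprocess

def run_diagnose(subcommand, html_out, log_file, timeout=600, exe_name="spectramind"):
    """Run `<exe_name> diagnose <subcommand>` with this notebook's flags.

    Returns (ok, message). The flag names mirror the notebook cells and are assumptions.
    """
    exe = shutil.which(exe_name)
    if not exe:
        return False, f"{exe_name} not found on PATH"
    cmd = [exe, "diagnose", subcommand,
           "--html-out", str(html_out),
           "--log-file", str(log_file),
           "--open-browser", "false"]
    print("Running:", " ".join(cmd))
    try:
        # check=True raises CalledProcessError on a non-zero exit status
        subprocess.run(cmd, check=True, timeout=timeout)
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError) as e:
        return False, str(e)
    return True, "OK"
```

Usage: `ok, msg = run_diagnose("umap", UMAP_HTML, LOG_MD)`, or `run_diagnose("dashboard", DASHBOARD_HTML, LOG_MD, timeout=1200)` for the longer dashboard build.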

## 🌀 t‑SNE Diagnostics — generate or load

In [None]:
# Attempt to call CLI t‑SNE; otherwise load existing HTML if present.
def run_tsne_cli():
    try:
        exe = shutil.which("spectramind")
        if not exe:
            return False, "CLI not found"
        cmd = ["spectramind", "diagnose", "tsne-latents",
               "--html-out", str(TSNE_HTML),
               "--log-file", str(LOG_MD),
               "--open-browser", "false"]
        print("Running:", " ".join(cmd))
        subprocess.check_call(cmd, timeout=600)
        return True, "OK"
    except Exception as e:
        return False, str(e)

ok, msg = run_tsne_cli()
if ok:
    print(f"✅ t‑SNE generated at: {TSNE_HTML}")
elif TSNE_HTML.exists():
    print("⚠️ CLI skipped/failed; using existing:", TSNE_HTML)
else:
    print("❌ t‑SNE not available; ensure latents exist or run CLI separately.")

print("To open t‑SNE HTML manually if needed:", str(TSNE_HTML))

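Where the front-end supports it, the generated HTML can be rendered inline instead of opened by hand. A minimal sketch using `IPython.display.IFrame`; it assumes the HTML file lives under the directory Jupyter serves relative to the notebook (the kernel's CWD by default), and behaviour differs between classic Notebook, JupyterLab, and VS Code. `embed_html` is a hypothetical helper.

```python
import os
from pathlib import Path

def embed_html(path, width="100%", height=600):
    """Return an IFrame for an existing HTML artifact, or None if it can't be shown.

    The iframe src is made relative to the CWD because browsers cannot load
    absolute local file paths through the Jupyter server.
    """
    path = Path(path)
    if not path.exists():
        return None
    try:
        from IPython.display import IFrame
    except ImportError:  # e.g. running as a plain script
        return None
    return IFrame(src=os.path.relpath(path, Path.cwd()), width=width, height=height)
```

Usage: make `embed_html(TSNE_HTML)` the last expression of a cell so Jupyter displays it; a `None` result means falling back to the printed path.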

### 🔁 Fallback: quickscatter for latents (matplotlib)

In [None]:
# If the HTML files are not present, try to render a simple 2D scatter from CSV/NPY latents.
# Matplotlib only (no seaborn; one chart per figure, per project constraints).
import numpy as np
import matplotlib.pyplot as plt

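def load_matrix(path):
    """Load a numeric matrix from .csv (numeric columns only) or .npy."""
    if path.suffix == ".csv":
        import pandas as pd
        # Drop non-numeric columns (e.g. string planet IDs) so the result is a float matrix
        return pd.read_csv(path).select_dtypes(include="number").to_numpy(dtype=float)
    elif path.suffix == ".npy":
        return np.load(path)
    else:
        raise ValueError(f"Unsupported format for {path}")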

latent_matrix = None
for p in [LATENTS_CSV, LATENTS_NPY]:
    if p.exists():
        try:
            latent_matrix = load_matrix(p)
            print(f"Loaded latents from {p} with shape {latent_matrix.shape}")
            break
        except Exception as e:
            print(f"Failed to load {p}: {e}")

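if latent_matrix is not None:
    # Promote a 1-D array to a single column so the 2-D indexing below works
    if latent_matrix.ndim == 1:
        latent_matrix = latent_matrix[:, None]
    # Use the first 2 columns as a crude projection (no dimensionality reduction)
    x = latent_matrix[:, 0]
    y = latent_matrix[:, 1] if latent_matrix.shape[1] > 1 else np.zeros_like(x)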
    plt.figure(figsize=(6, 5))
    plt.scatter(x, y, s=10, alpha=0.7)
    plt.title("Latent Quickscatter (fallback)")
    plt.xlabel("Dim 1")
    plt.ylabel("Dim 2")
    plt.show()
else:
    print("No latents found for fallback quickscatter.")

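Plotting the first two raw columns ignores most of the latent structure. A slightly better fallback, sketched here with plain NumPy, projects onto the first two principal components via SVD; this is PCA, a swapped-in technique, not the UMAP/t-SNE the CLI produces, and `pca_2d` is a hypothetical helper.

```python
import numpy as np

def pca_2d(X):
    """Project rows of X onto their first two principal components via SVD.

    Returns an (n, 2) array; the second column is zero if only one component exists.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(2, Vt.shape[0])
    proj = Xc @ Vt[:k].T
    if k < 2:
        proj = np.hstack([proj, np.zeros((proj.shape[0], 2 - k))])
    return proj
```

Usage: `xy = pca_2d(latent_matrix)` followed by `plt.scatter(xy[:, 0], xy[:, 1], s=10, alpha=0.7)` in place of the raw-column scatter.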

## 🔍 SHAP × Symbolic Overlay Inspection

In [None]:
# Load SHAP/symbolic overlay JSON if present; show top-K bins per planet or aggregate stats.
import json
from collections import Counter

def safe_load_json(path):
    if not path.exists():
        return None
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except Exception as e:
        print(f"Failed to parse {path}: {e}")
        return None

overlay = safe_load_json(SHAP_JSON)
if overlay is None:
    print(f"⚠️ No overlay JSON at {SHAP_JSON}. Generate via CLI or scripts first.")
else:
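    # Assumed schema: {"planets": {"planet_id": {"top_bins": [...], "scores": {...}}}, ...}
    root = (overlay.get("planets") or overlay) if isinstance(overlay, dict) else {}
    # Tolerate a flat schema, but keep only dict records so metadata keys are skipped
    planets = {pid: rec for pid, rec in root.items() if isinstance(rec, dict)} if isinstance(root, dict) else {}
    print(f"Loaded overlay for {len(planets)} planets.")
    # Simple aggregate: most common bins across planets (if present)
    bin_counts = Counter()
    for rec in planets.values():
        top_bins = rec.get("top_bins") or []
        # Counter needs hashable keys; skip nested entries such as [bin, score] pairs
        bin_counts.update(b for b in top_bins if isinstance(b, (int, str)))
    most_common = bin_counts.most_common(10)
    print("Top 10 recurrent bins across planets:", most_common)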

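Counting how often a bin appears in each planet's top-K says nothing about how strongly it matters. A minimal sketch that ranks bins by mean absolute score across planets instead; it assumes each planet record carries a `"scores"` dict mapping bin ID to a float SHAP value, which matches the schema comment above but is unverified, and `mean_abs_scores` is a hypothetical helper.

```python
from collections import defaultdict

def mean_abs_scores(planets, top_k=10):
    """Rank bins by mean |score| across planets.

    Assumes each record has a "scores" dict mapping bin id -> float (hypothetical schema).
    Returns up to top_k (bin_id, mean_abs_score) pairs, largest first.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in planets.values():
        scores = rec.get("scores") if isinstance(rec, dict) else None
        if not isinstance(scores, dict):
            continue  # skip metadata entries and planets without scores
        for bin_id, value in scores.items():
            if isinstance(value, (int, float)):
                totals[bin_id] += abs(value)
                counts[bin_id] += 1
    ranked = sorted(((b, totals[b] / counts[b]) for b in totals), key=lambda t: t[1], reverse=True)
    return ranked[:top_k]
```

Usage: `mean_abs_scores(planets)` after the overlay cell has run.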

## 🧩 Symbolic Rule Violations — summary view

In [None]:
# Inspect symbolic violation summaries to find dominant rules and hotspots.
symb = safe_load_json(SYMB_JSON)
if symb is None:
    print(f"⚠️ No symbolic violation summary at {SYMB_JSON}.")
else:
    # Tolerate {"rules": {...}}, a bare rule dict, or a list of records
    rules = symb["rules"] if isinstance(symb, dict) and "rules" in symb else symb

    if isinstance(rules, dict):
        print("Symbolic rules summary keys:", list(rules)[:10])
        # Sort rules by mean violation; each value may be a bare number or a {"mean": ...} dict
        agg = []
        for rname, vals in rules.items():
            if isinstance(vals, dict):
                v = vals.get("mean")
            elif isinstance(vals, (int, float)):
                v = vals
            else:
                v = None
            if v is not None:
                agg.append((rname, float(v)))
        agg.sort(key=lambda x: x[1], reverse=True)
        print("Top 10 rules by mean violation:")
        for r, v in agg[:10]:
            print(f"  {r:40s}  {v:.4f}")
    else:
        print(f"Symbolic summary is a {type(rules).__name__}; no per-rule aggregate computed here.")

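The cell above only aggregates dict-shaped summaries and does not surface hotspots. If the summary is instead a list of per-planet records, a sketch like the one below ranks rules by mean violation and reports the worst planet for each rule. The record fields `rule`, `planet_id`, and `violation` are hypothetical; rename them to match the actual JSON.

```python
from collections import defaultdict

def rank_rule_records(records, top_k=10):
    """Aggregate list-format violation records into per-rule stats.

    Assumes records like {"rule": str, "planet_id": str, "violation": float}
    (hypothetical field names). Returns (rule, mean, max, worst_planet) tuples
    sorted by mean violation; worst_planet marks the hotspot for that rule.
    """
    by_rule = defaultdict(list)
    for rec in records:
        if not isinstance(rec, dict):
            continue
        rule, pid, v = rec.get("rule"), rec.get("planet_id"), rec.get("violation")
        if rule is None or not isinstance(v, (int, float)):
            continue
        by_rule[rule].append((float(v), pid))
    rows = []
    for rule, vals in by_rule.items():
        mean = sum(v for v, _ in vals) / len(vals)
        worst_v, worst_pid = max(vals, key=lambda t: t[0])
        rows.append((rule, mean, worst_v, worst_pid))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:top_k]
```

Usage: when `rules` is a list, `for r, mean, worst, pid in rank_rule_records(rules): print(f"{r:40s} {mean:.4f} max={worst:.4f} @ {pid}")`.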

## 📈 FFT & Smoothness — quick checks on μ spectra

In [None]:
# Try a lightweight smoothness/FFT visualization from diagnostic_summary.json if available.
import numpy as np
import matplotlib.pyplot as plt

diag = safe_load_json(DIAG_SUMMARY)
if not diag:
    print(f"⚠️ No diagnostic summary at {DIAG_SUMMARY}. Generate via CLI or scripts first.")
else:
    # Heuristic: look for one planet entry with "mu" or "spectrum" field
    # and draw its FFT magnitude and finite-difference smoothness.
    # The exact schema varies; we attempt to be permissive.
    candidates = []
    # Possible structures: {"planets": {...}}, {"items": [...]}, a flat dict, or a bare list
    root = (diag.get("planets") or diag.get("items") or diag) if isinstance(diag, dict) else diag
    if isinstance(root, dict):
        candidates = list(root.values())
    elif isinstance(root, list):
        candidates = root
    if not candidates:
        print("Could not find planet entries in diagnostic summary.")
    else:
        # Find first record with a 1D mu-like array
        mu = None
        for rec in candidates:
            if not isinstance(rec, dict):
                continue  # skip scalar metadata entries
            arr = rec.get("mu") or rec.get("spectrum") or rec.get("mu_mean")
            if isinstance(arr, list) and len(arr) >= 16:
                mu = np.array(arr, dtype=float)
                break
        if mu is None:
            print("No suitable μ array found in diagnostic summary.")
        else:
            # FFT magnitude of the mean-subtracted spectrum, so the DC term does not dominate the plot
            fft_mag = np.abs(np.fft.rfft(mu - mu.mean()))
            plt.figure(figsize=(6,4))
            plt.plot(fft_mag)
            plt.title("FFT magnitude of μ (example planet)")
            plt.xlabel("Frequency bin")
            plt.ylabel("|FFT|")
            plt.show()

            # Smoothness (finite differences)
            grad = np.diff(mu)
            curv = np.diff(mu, n=2)
            plt.figure(figsize=(6,4))
            plt.plot(np.abs(grad), label="|∂μ|")
            plt.plot(np.abs(curv), label="|∂²μ|")
            plt.title("Smoothness diagnostics of μ (example planet)")
            plt.xlabel("Spectral bin")
            plt.ylabel("Magnitude")
            plt.legend()
            plt.show()

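The plots above are qualitative. To compare planets or configs, the same ideas can be reduced to scalars. A minimal sketch, assuming a 1-D μ array as in the cell above: mean absolute first difference, mean squared second difference (a discrete curvature penalty), and the fraction of mean-subtracted spectral power in the upper half of the `rfft` frequency bins. The 50% cutoff is an arbitrary choice, and `smoothness_metrics` is a hypothetical helper, not the output format of `spectramind diagnose smoothness`.

```python
import numpy as np

def smoothness_metrics(mu, hf_frac=0.5):
    """Scalar smoothness summaries for a 1-D spectrum.

    mean_abs_grad : mean |first difference|
    mean_sq_curv  : mean squared second difference
    hf_energy_frac: share of mean-subtracted power in the top `hf_frac` of rfft bins
    """
    mu = np.asarray(mu, dtype=float)
    power = np.abs(np.fft.rfft(mu - mu.mean())) ** 2
    cut = int(len(power) * (1.0 - hf_frac))  # first bin counted as "high frequency"
    total = power.sum()
    return {
        "mean_abs_grad": float(np.mean(np.abs(np.diff(mu)))),
        "mean_sq_curv": float(np.mean(np.diff(mu, n=2) ** 2)),
        "hf_energy_frac": float(power[cut:].sum() / total) if total > 0 else 0.0,
    }
```

A smooth spectrum should give an `hf_energy_frac` near zero, while white noise spreads power roughly evenly and lands near `hf_frac`.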

## 🎯 Calibration (σ) — COREL overview

In [None]:
# Load calibration summary if available and show simple histograms.
import numpy as np
import matplotlib.pyplot as plt

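corel = safe_load_json(COREL_JSON)
if not isinstance(corel, dict):
    print(f"⚠️ No COREL calibration summary (JSON object) at {COREL_JSON}.")
else:
    # Heuristic: expect per-bin coverage and/or residual arrays under one of these keys
    cov = corel.get("coverage") or corel.get("per_bin_coverage")
    res = corel.get("residuals") or corel.get("per_bin_residual")
    if not isinstance(cov, list) and not isinstance(res, list):
        print("No coverage/residual arrays found; top-level keys:", list(corel)[:10])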
    if isinstance(cov, list) and len(cov) > 0:
        cov = np.array(cov, dtype=float)
        plt.figure(figsize=(6,4))
        plt.hist(cov, bins=30, alpha=0.9)
        plt.title("Coverage histogram (COREL)")
        plt.xlabel("Coverage")
        plt.ylabel("Count")
        plt.show()
    if isinstance(res, list) and len(res) > 0:
        res = np.array(res, dtype=float)
        plt.figure(figsize=(6,4))
        plt.hist(res, bins=30, alpha=0.9)
        plt.title("Residual histogram (COREL)")
        plt.xlabel("Residual")
        plt.ylabel("Count")
        plt.show()

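Histograms show the spread of coverage but not whether σ is calibrated. A standard check, sketched here under a Gaussian-predictive assumption (not the COREL method itself), compares the empirical fraction of targets inside μ ± z·σ with the nominal value erf(z/√2), roughly 0.683, 0.954, and 0.997 for z = 1, 2, 3. It needs per-sample `y`, `mu`, and `sigma` arrays, which this notebook does not load; `gaussian_coverage` is a hypothetical helper.

```python
import math
import numpy as np

def gaussian_coverage(y, mu, sigma, zs=(1.0, 2.0, 3.0)):
    """Compare empirical vs nominal coverage of the intervals mu ± z*sigma.

    Assumes Gaussian predictive distributions. Returns (z, nominal, empirical) tuples;
    empirical < nominal means sigma is too small (overconfident).
    """
    y, mu, sigma = (np.asarray(a, dtype=float) for a in (y, mu, sigma))
    abs_z = np.abs(y - mu) / sigma  # standardized absolute residuals
    return [(z, math.erf(z / math.sqrt(2.0)), float(np.mean(abs_z <= z))) for z in zs]
```

For a well-calibrated model the empirical column tracks the nominal one; a consistent shortfall at every z points to underestimated σ.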

## 🧪 Build Diagnostics Dashboard (optional)

In [None]:
# Call the CLI to generate the full HTML diagnostics dashboard, or load an existing one.
def run_dashboard_cli():
    try:
        exe = shutil.which("spectramind")
        if not exe:
            return False, "CLI not found"
        cmd = ["spectramind", "diagnose", "dashboard",
               "--html-out", str(DASHBOARD_HTML),
               "--log-file", str(LOG_MD),
               "--open-browser", "false"]
        print("Running:", " ".join(cmd))
        subprocess.check_call(cmd, timeout=1200)
        return True, "OK"
    except Exception as e:
        return False, str(e)

ok, msg = run_dashboard_cli()
if ok:
    print(f"✅ Diagnostics dashboard generated at: {DASHBOARD_HTML}")
elif DASHBOARD_HTML.exists():
    print("⚠️ CLI skipped/failed; using existing:", DASHBOARD_HTML)
else:
    print("❌ Dashboard not available. Generate via CLI when ready.")

print("To open manually if needed:", str(DASHBOARD_HTML))


## 🧾 Append run metadata to `v50_debug_log.md`

In [None]:
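# Append a structured entry to v50_debug_log.md for auditability.
# Note: the flags below record whether each artifact exists on disk right now,
# not whether a CLI call in this session succeeded.
import platform
from datetime import datetime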
entry = f"""### Notebook: 02_diagnostics_explainability.ipynb
- timestamp: {datetime.now().isoformat(timespec="seconds")}
- cwd: {ROOT}
- python: {platform.python_version()}
- actions:
  - env_init
  - umap_try_cli: {'exists' if UMAP_HTML.exists() else 'missing'}
  - tsne_try_cli: {'exists' if TSNE_HTML.exists() else 'missing'}
  - shap_overlay_loaded: {SHAP_JSON.exists()}
  - symbolic_summary_loaded: {SYMB_JSON.exists()}
  - corel_summary_loaded: {COREL_JSON.exists()}
  - dashboard: {'exists' if DASHBOARD_HTML.exists() else 'missing'}
"""

try:
    with open(LOG_MD, "a", encoding="utf-8") as f:
        f.write(entry + "\n")
    print(f"Appended notebook log entry to {LOG_MD}")
except Exception as e:
    print(f"⚠️ Could not append to {LOG_MD}: {e}")

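Existence flags cannot tell whether an artifact changed between runs. A minimal sketch that records a SHA-256 digest per artifact so later audits can compare log entries; `artifact_digests` is a hypothetical helper, and which files to hash is left to you.

```python
import hashlib
from pathlib import Path

def artifact_digests(paths, chunk_size=1 << 20):
    """Map each path to its SHA-256 hex digest, or None if the file is missing."""
    digests = {}
    for p in map(Path, paths):
        if not p.is_file():
            digests[str(p)] = None
            continue
        h = hashlib.sha256()
        with open(p, "rb") as f:
            # Read in chunks so large HTML dashboards don't need to fit in memory
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        digests[str(p)] = h.hexdigest()
    return digests
```

Usage: build lines such as `f"  - {name}: {digest}"` from `artifact_digests([UMAP_HTML, TSNE_HTML, DASHBOARD_HTML, DIAG_SUMMARY])` and append them to `entry` before writing.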

## 📚 References & Next Steps

- `spectramind diagnose umap` · Generate UMAP HTML with symbolic overlays and links.
- `spectramind diagnose tsne-latents` · Interactive t‑SNE with confidence/links.
- `spectramind diagnose smoothness` · Produce smoothness maps and CSV summaries.
- `spectramind diagnose dashboard` · Unified HTML diagnostics dashboard.

**Pro tip:** Pair this notebook with `00_quickstart.ipynb` and `03_ablation_and_tuning.ipynb` for the full pipeline flow.
