# Gaussian — Reproducibility & Similarity to Healthy (complementary provenance)

**Goal.** Mirror the KSG reproducibility analysis for the **Gaussian** estimator using session-wise binary adjacencies (per-edge permutation threshold, **no across-edge FDR**):
- Build **edge-presence** matrices per group (fraction of sessions with the edge).
- Define **robust sets** at **70%** and **90%** presence.
- Quantify similarity to Healthy via the **Jaccard index** between robust sets.
- Produce **presence-difference maps** *(Healthy − PD)* with a **robust overlay** (opaque where either group is ≥ threshold).
- Write a CSV with robust-edge **counts** and **Jaccard** values.
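
The steps above can be sketched on toy data. This is a minimal illustration, not the analysis code: the 3-node matrices and the names `toy_healthy` and `toy_pd` are invented for the example, and only the presence, robust-set, and Jaccard logic mirrors the notebook.

```python
import numpy as np

def presence_fraction(mats):
    """Fraction of sessions in which each directed edge is present."""
    P = np.stack(mats, axis=0).mean(axis=0)
    np.fill_diagonal(P, 0.0)  # self-edges are ignored
    return P

def robust_set(P, thr):
    """Boolean mask of edges with presence >= thr (diagonal excluded)."""
    return (P >= thr) & ~np.eye(P.shape[0], dtype=bool)

def jaccard(A, B):
    """Intersection over union for boolean masks; NaN when both are empty."""
    union = (A | B).sum()
    return float((A & B).sum()) / float(union) if union else float("nan")

# Hypothetical toy data: 4 sessions per group, 3 nodes.
edge01 = np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]], dtype=np.uint8)
edge12 = np.array([[0, 0, 0], [0, 0, 1], [0, 0, 0]], dtype=np.uint8)
both = edge01 | edge12

toy_healthy = [both, both, both, edge01]   # 0->1 in 4/4 sessions, 1->2 in 3/4
toy_pd = [edge01, edge01, edge01, edge12]  # 0->1 in 3/4 sessions, 1->2 in 1/4

P_h, P_pd = presence_fraction(toy_healthy), presence_fraction(toy_pd)
R_h, R_pd = robust_set(P_h, 0.70), robust_set(P_pd, 0.70)

print("Healthy presence 0->1, 1->2:", P_h[0, 1], P_h[1, 2])
print("PD presence 0->1, 1->2:", P_pd[0, 1], P_pd[1, 2])
print("Robust counts:", int(R_h.sum()), int(R_pd.sum()))
print("Jaccard:", jaccard(R_h, R_pd))
```

In this toy case the Healthy robust set holds both edges and the PD robust set holds only `0->1`, so the Jaccard index is one shared edge out of two in the union.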

> **Note:** This Gaussian section is provided for **completeness/appendix transparency**. The main Results narrative bases inferential claims on the **KSG** estimator.

**Inputs**
- `Results/gauss_results/sub-XXX_ses-YYY_combined_gauss.pkl` (one per session; IDTxl `ResultsNetworkInference`).
- `subject_session_metadata.csv` (`subject`, `session`, `group`).
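
Each result file has to be matched to a metadata row through its `subject` and `session` parts. The sketch below shows one way to derive that key from a file name; the helper `session_key`, the subject and session IDs, and the toy metadata rows are hypothetical, and only the file-name pattern and group labels come from this notebook.

```python
from pathlib import Path

SUFFIX = "_combined_gauss.pkl"

def session_key(path):
    """Return the 'sub-XXX_ses-YYY' key encoded in a result file name."""
    name = Path(path).name
    if not name.endswith(SUFFIX):
        raise ValueError(f"unexpected file name: {name}")
    return name[: -len(SUFFIX)]

# Hypothetical metadata rows (subject, session, group), not real data.
toy_meta = [
    ("sub-001", "ses-01", "healthy"),
    ("sub-002", "ses-01", "PD-off"),
    ("sub-002", "ses-02", "PD-on"),
]
key_to_group = {f"{sub}_{ses}": grp for sub, ses, grp in toy_meta}

key = session_key("Results/gauss_results/sub-002_ses-02_combined_gauss.pkl")
print(key, "->", key_to_group.get(key))
```

Files whose key is missing from the metadata map come back as `None` from `.get`, which is how the notebook's loop skips sessions that belong to no listed group.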

**Outputs (for Appendix)**
- PNGs:  
  `.../Step-wise/figs/groups_comp/Healthy_vs_PDoff_Gaussian_diffmap.png`  
  `.../Step-wise/figs/groups_comp/Healthy_vs_PDon_Gaussian_diffmap.png`
- CSV: `.../Step-wise/gauss_repro_counts_jaccard.csv`
- (Optionally cached presence arrays in `Results/gauss_results/<group>_gauss_edge_presence.npy`.)

**Key analysis choices**
- Binary adjacency via `get_adjacency_matrix('binary', fdr=False)`; diagonal zeroed.  
- Robust thresholds emphasized: **70%** and **90%** (50% may be inspected but not reported).  
- Jaccard compares **Healthy robust set** with each PD robust set.  
- Diff maps show *(Healthy − PD)*; edges robust in **either** group at the threshold are drawn **opaque**.
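
The overlay rule in the last bullet can be illustrated without plotting. The sketch below uses invented presence values for a 3-node network and computes the difference matrix together with the per-cell opacity; the `1.0` and `0.15` alpha values match those used in the notebook code, everything else is a toy assumption.

```python
import numpy as np

# Hypothetical presence fractions for a 3-node network (not real data).
P_healthy = np.array([[0.0, 0.9, 0.2],
                      [0.1, 0.0, 0.5],
                      [0.0, 0.3, 0.0]])
P_pd = np.array([[0.0, 0.4, 0.8],
                 [0.1, 0.0, 0.5],
                 [0.0, 0.3, 0.0]])
thr = 0.70
off_diag = ~np.eye(3, dtype=bool)

# Positive values: edge is more frequent in Healthy; negative: more frequent in PD.
D = P_healthy - P_pd

# An edge is drawn opaque if it reaches the threshold in either group.
robust_any = ((P_healthy >= thr) | (P_pd >= thr)) & off_diag
alpha = np.where(robust_any, 1.0, 0.15)

print(np.round(D, 2))
print(alpha)
```

Here `0->1` is robust only in Healthy and `0->2` only in the PD group, so both cells are opaque while every other cell is faded.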

**Design caveat**
- Presence matrices summarize sessions **across subjects** within each group. PD-off and PD-on are the **same individuals** measured twice, which can inflate ON–OFF similarity relative to either condition's similarity to Healthy. We therefore benchmark each PD condition against the Healthy robust set and emphasize the stricter thresholds.
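
One way to make this caveat concrete is to check how many subjects contribute to more than one group. The sketch below does this on invented metadata rows; the subject and session IDs are hypothetical, and only the column meaning (`subject`, `session`, `group`) and the group labels are taken from this notebook.

```python
# Hypothetical metadata rows (subject, session, group), not real data.
toy_meta = [
    ("sub-001", "ses-01", "healthy"),
    ("sub-002", "ses-01", "PD-off"),
    ("sub-002", "ses-02", "PD-on"),
    ("sub-003", "ses-01", "PD-off"),
    ("sub-003", "ses-02", "PD-on"),
]

# Collect the set of subjects that contribute sessions to each group.
subjects = {}
for sub, _ses, grp in toy_meta:
    subjects.setdefault(grp, set()).add(sub)

shared_on_off = subjects["PD-off"] & subjects["PD-on"]
shared_h_off = subjects["healthy"] & subjects["PD-off"]

print("Subjects in both PD-off and PD-on:", sorted(shared_on_off))
print("Subjects in both Healthy and PD-off:", sorted(shared_h_off))
```

A non-empty PD-off/PD-on overlap alongside an empty Healthy/PD overlap is the paired structure the caveat describes.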


In [1]:
# ============================
# Gaussian — Reproducibility & Similarity to Healthy (complementary)
# Mirrors the KSG Step 2 analysis using Gaussian estimator results.
# ============================
from pathlib import Path
import pickle
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

BASE      = Path("/lustre/majlepy2/myproject")
RESULTS   = BASE / "Results" / "gauss_results"
META_CSV  = BASE / "subject_session_metadata.csv"

# Figures are written here; copy/symlink to LaTeX path afterwards if desired.
OUTROOT   = Path("/home/majlepy2/myproject/Step-wise")
FIGDIR    = OUTROOT / "figs" / "groups_comp"
FIGDIR.mkdir(parents=True, exist_ok=True)

# ---------- Load metadata ----------
# Build a (sub_ses -> group) map to collect sessions per group.
meta = pd.read_csv(META_CSV)
meta["sub_ses"] = meta["subject"] + "_" + meta["session"]
subses_to_group = dict(zip(meta["sub_ses"], meta["group"]))

# ---------- Helper: consistent binary extraction ----------
# IMPORTANT: fdr=False (no across-edge FDR at extraction), diagonal → 0.
def get_binary(res, fdr=False):
    A = np.array(res.get_adjacency_matrix("binary", fdr=fdr)).astype(np.uint8)
    np.fill_diagonal(A, 0)
    return A

# ---------- Presence arrays: load if available, else compute ----------
def maybe_load_presence(name):
    f = RESULTS / f"{name}_gauss_edge_presence.npy"
    return np.load(f) if f.exists() else None

def save_presence(name, P):
    np.save(RESULTS / f"{name}_gauss_edge_presence.npy", P)

group_names = ["healthy", "PD-off", "PD-on"]
presence = {g: maybe_load_presence(g) for g in group_names}

if any(P is None for P in presence.values()):
    print("[INFO] Computing group presence from session pickles (Gaussian, fdr=False)...")
    # Gather session files
    session_pkls = sorted(RESULTS.glob("sub-*_*_combined_gauss.pkl"))
    group_to_mats = {g: [] for g in group_names}
    N = None

    for pkl in session_pkls:
        # The file stem is already the metadata key, e.g. sub-XXX_ses-YYY.
        sub_ses = pkl.name.replace("_combined_gauss.pkl", "")
        g = subses_to_group.get(sub_ses)
        if g not in group_to_mats:
            continue

        with open(pkl, "rb") as f:
            res = pickle.load(f)
        A = get_binary(res, fdr=False)
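        if N is None:
            N = A.shape[0]
        elif A.shape[0] != N:
            # All sessions must share one node count, otherwise stacking fails later.
            raise ValueError(f"{pkl.name}: expected {N} nodes, got {A.shape[0]}")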
        group_to_mats[g].append(A)

    # Stack to (S,N,N) and average → presence fraction in [0,1]
    for g in group_names:
        mats = group_to_mats[g]
        if not mats:
            raise RuntimeError(f"No sessions found for group '{g}'.")
        M = np.stack(mats, axis=0)  # (S,N,N), uint8
        P = M.mean(axis=0)
        np.fill_diagonal(P, 0.0)
        presence[g] = P
        save_presence(g, P)
        print(f"[OK] {g}: {len(mats)} sessions -> presence saved, shape={P.shape}")
else:
    for g in group_names:
        print(f"[OK] Loaded saved presence: {g} -> {presence[g].shape}")

P_H   = presence["healthy"]
P_OFF = presence["PD-off"]
P_ON  = presence["PD-on"]
N = P_H.shape[0]
diag = np.eye(N, dtype=bool)

# ---------- Robust masks, Jaccard, and figures ----------
def robust_mask(P, thr):
    """Boolean mask of edges with presence >= thr, excluding diagonal."""
    return (P >= thr) & (~diag)

def jaccard(A, B):
    """Jaccard index between two boolean masks."""
    inter = (A & B).sum()
    union = (A | B).sum()
    return float(inter)/float(union) if union else np.nan

def diffmap_with_overlay(PA, PB, thr, title, outfile):
    """
    Presence-difference map D = PA - PB (Healthy - PD).
    Robust edges (in either group at 'thr') are drawn opaque; others faded.
    """
    D = PA - PB
    robust_any = (robust_mask(PA, thr) | robust_mask(PB, thr))
    fig, ax = plt.subplots(figsize=(7,6), dpi=200)
    vmax = max(abs(D.min()), abs(D.max()))
    im = ax.imshow(D, vmin=-vmax, vmax=+vmax, cmap="bwr", interpolation="nearest")
    im.set_alpha(np.where(robust_any, 1.0, 0.15))
    ax.set_title(f"{title}\nPresence difference (Healthy − PD); robust (≥{int(thr*100)}%) opaque")
    ax.set_xlabel("Target"); ax.set_ylabel("Source")
    cbar = plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04)
    cbar.set_label("Presence difference")
    plt.tight_layout(); plt.savefig(outfile, bbox_inches="tight"); plt.close(fig)
    print(f"Saved: {outfile}")

# Compute counts + Jaccard at 70% and 90%.
rows = []
for thr in (0.70, 0.90):
    R_H   = robust_mask(P_H, thr)
    R_OFF = robust_mask(P_OFF, thr)
    R_ON  = robust_mask(P_ON, thr)

    cnt_H, cnt_OFF, cnt_ON = int(R_H.sum()), int(R_OFF.sum()), int(R_ON.sum())
    J_H_OFF, J_H_ON = jaccard(R_H, R_OFF), jaccard(R_H, R_ON)

    rows.append({
        "threshold": f"{int(thr*100)}%",
        "robust_edges_Healthy": cnt_H,
        "robust_edges_PD_off": cnt_OFF,
        "robust_edges_PD_on": cnt_ON,
        "Jaccard_H_vs_PDoff": None if np.isnan(J_H_OFF) else round(J_H_OFF, 3),
        "Jaccard_H_vs_PDon":  None if np.isnan(J_H_ON)  else round(J_H_ON,  3),
    })

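    # Two difference maps per threshold (H−PDoff and H−PDon), with robust overlay.
    # NOTE: the file names do not include the threshold, so the maps written at 90%
    # overwrite those written at 70%; only the last threshold's figures remain on disk.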
    diffmap_with_overlay(P_H, P_OFF, thr,
                         title="Healthy vs PD-off (Gaussian)",
                         outfile=FIGDIR / "Healthy_vs_PDoff_Gaussian_diffmap.png")
    diffmap_with_overlay(P_H, P_ON, thr,
                         title="Healthy vs PD-on (Gaussian)",
                         outfile=FIGDIR / "Healthy_vs_PDon_Gaussian_diffmap.png")

# ---------- Save summary CSV for Appendix ----------
df = pd.DataFrame(rows)
csv_out = OUTROOT / "gauss_repro_counts_jaccard.csv"
df.to_csv(csv_out, index=False)
print("Saved:", csv_out)
print(df)


[OK] Loaded saved presence: healthy -> (23, 23)
[OK] Loaded saved presence: PD-off -> (23, 23)
[OK] Loaded saved presence: PD-on -> (23, 23)
Saved: /home/majlepy2/myproject/Step-wise/figs/groups_comp/Healthy_vs_PDoff_Gaussian_diffmap.png
Saved: /home/majlepy2/myproject/Step-wise/figs/groups_comp/Healthy_vs_PDon_Gaussian_diffmap.png
Saved: /home/majlepy2/myproject/Step-wise/figs/groups_comp/Healthy_vs_PDoff_Gaussian_diffmap.png
Saved: /home/majlepy2/myproject/Step-wise/figs/groups_comp/Healthy_vs_PDon_Gaussian_diffmap.png
Saved: /home/majlepy2/myproject/Step-wise/gauss_repro_counts_jaccard.csv
  threshold  robust_edges_Healthy  robust_edges_PD_off  robust_edges_PD_on  \
0       70%                     6                    9                  12   
1       90%                     1                    1                   2   

   Jaccard_H_vs_PDoff  Jaccard_H_vs_PDon  
0               0.071              0.125  
1               0.000              0.500  
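
The table reports counts and Jaccard values but not the overlap sizes themselves. Since J = x / (|A| + |B| − x), the number of shared edges x can be recovered from each row. The sketch below does this arithmetic on the values printed above; the helper `implied_intersection` is hypothetical, and because the Jaccard values are rounded to three decimals the result is rounded to the nearest integer.

```python
def implied_intersection(n_a, n_b, j):
    """Solve J = x / (n_a + n_b - x) for the overlap x, rounded to an integer."""
    return round(j * (n_a + n_b) / (1.0 + j))

# (label, Healthy count, PD count, reported Jaccard) copied from the table above.
rows = [
    ("70% Healthy vs PD-off", 6, 9, 0.071),
    ("70% Healthy vs PD-on", 6, 12, 0.125),
    ("90% Healthy vs PD-off", 1, 1, 0.000),
    ("90% Healthy vs PD-on", 1, 2, 0.500),
]

overlaps = {}
for label, n_h, n_pd, j in rows:
    x = implied_intersection(n_h, n_pd, j)
    overlaps[label] = (x, n_h + n_pd - x)
    print(f"{label}: {x} shared edge(s) out of {n_h + n_pd - x} in the union")
```

Read this way, the reported values correspond to at most two shared robust edges in any comparison, so single-edge changes move the Jaccard index substantially at these set sizes.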
