## Evaluating discovered n-plets on held-out subjects

This notebook evaluates the n-plets found in the previous step.

Each n-plet was *discovered* using a specific pair of subjects: one conscious
and one non-responsive. Here, we test how informative that same n-plet is for
separating the two groups when evaluated on all other subjects, excluding
the original discovery pair.

For every discovered n-plet, we compute subject-level values under several
metrics:
- high-order interaction (HOI) measures computed from the covariance matrices
  (TC, DTC, O-information, S, and a normalized O-information), and
- a classical functional connectivity summary: the mean Fisher z-transformed
  correlation within the n-plet.
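
As a rough illustration of the quantities listed above, the sketch below computes TC, DTC, O-information, and S from a covariance matrix under a Gaussian assumption, plus the mean Fisher z-transformed correlation. The function names `gaussian_entropy`, `gaussian_hoi`, and `fc_mean_z` are hypothetical and are not part of `src.hoi_anesthesia`; the estimator actually used by `evaluate_nplet_batched` is not shown in this notebook and may differ. The normalized O-information is omitted because its normalization is not specified here.

```python
import numpy as np


def gaussian_entropy(cov: np.ndarray) -> float:
    """Differential entropy (in nats) of a Gaussian with covariance `cov`."""
    n = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (n * np.log(2.0 * np.pi * np.e) + logdet)


def gaussian_hoi(cov: np.ndarray, nplet: list) -> dict:
    """TC, DTC, O and S for the variables in `nplet` (Gaussian assumption)."""
    sub = cov[np.ix_(nplet, nplet)]
    n = len(nplet)
    h_joint = gaussian_entropy(sub)
    # Entropy of each single variable
    h_single = sum(gaussian_entropy(sub[[i]][:, [i]]) for i in range(n))
    # Entropy of the n-plet with variable i left out
    h_rest = sum(
        gaussian_entropy(np.delete(np.delete(sub, i, axis=0), i, axis=1))
        for i in range(n)
    )
    tc = h_single - h_joint            # total correlation
    dtc = h_rest - (n - 1) * h_joint   # dual total correlation
    return {"TC": tc, "DTC": dtc, "O": tc - dtc, "S": tc + dtc}


def fc_mean_z(cov: np.ndarray, nplet: list) -> float:
    """Mean Fisher z-transformed correlation over all pairs in `nplet`."""
    sub = cov[np.ix_(nplet, nplet)]
    std = np.sqrt(np.diag(sub))
    corr = sub / np.outer(std, std)
    upper = corr[np.triu_indices(len(nplet), k=1)]
    return float(np.mean(np.arctanh(upper)))


# Toy example: three equally correlated variables (not real data)
toy_cov = np.full((3, 3), 0.5) + 0.5 * np.eye(3)
toy_values = gaussian_hoi(toy_cov, [0, 1, 2])
print(toy_values, fc_mean_z(toy_cov, [0, 1, 2]))
```

With the sign convention used in this sketch (O = TC - DTC), positive O-information indicates a redundancy-dominated n-plet and negative values a synergy-dominated one.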

We then quantify how well each metric separates conscious vs. non-responsive
subjects using:
- ANOVA F-score, and
- Area Under the Precision–Recall Curve (PRAUC), reported both for the metric and for its sign-inverted version (Cpos and NRpos contexts).
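
To make the two separation scores concrete, here is a minimal sketch on toy values. It assumes a one-way ANOVA with two groups and approximates the area under the precision-recall curve by average precision, treating conscious subjects as the positive class. The helper names `anova_f` and `average_precision` are hypothetical; the exact computation inside `evaluate_nplet_batched` is not shown in this notebook and may differ (for example in how ties or the positive class are handled).

```python
import numpy as np


def anova_f(group_a, group_b) -> float:
    """One-way ANOVA F statistic for two groups of subject-level values."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    grand_mean = np.concatenate([a, b]).mean()
    ss_between = len(a) * (a.mean() - grand_mean) ** 2 + len(b) * (b.mean() - grand_mean) ** 2
    ss_within = ((a - a.mean()) ** 2).sum() + ((b - b.mean()) ** 2).sum()
    df_between, df_within = 1, len(a) + len(b) - 2
    return float((ss_between / df_between) / (ss_within / df_within))


def average_precision(scores, labels) -> float:
    """Mean precision at the rank of each positive (assumes no tied scores)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="stable")  # highest score first
    hits = labels[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits == 1].mean())


# Toy metric values (not real data): conscious subjects score higher
conscious = [3.0, 4.0, 5.0]
nonresponsive = [0.0, 1.0, 2.0]
scores = np.array(conscious + nonresponsive)
labels = np.array([1, 1, 1, 0, 0, 0])  # assumption: conscious = positive class

f_score = anova_f(conscious, nonresponsive)
pr_auc = average_precision(scores, labels)       # metric as-is
pr_auc_inv = average_precision(-scores, labels)  # sign-inverted metric
print(f_score, pr_auc, pr_auc_inv)
```

The F-score is insensitive to the direction of the group difference, which is why the PR-based score is reported for both the metric and its sign-inverted version: a metric that ranks one group perfectly in one direction ranks it poorly in the other.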

Because the discovery subjects are held out from this evaluation, this is a form
of leave-one-pair-out (holdout-pair) assessment. It reduces circularity, i.e.,
the bias that arises from testing on the same subjects used to select the n-plet.
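
The holdout idea can be sketched as a simple filter over (state, subject) entries. This is only an illustration: `held_out_subjects` and the `subjects_by_state` layout are hypothetical, the actual exclusion happens inside `evaluate_nplet_batched`, and whether a discovery subject is also excluded from other states is not specified in this notebook.

```python
def held_out_subjects(subjects_by_state: dict, discovery_pair: list) -> dict:
    """Return the subjects per state with the discovery (state, subject) entries removed."""
    excluded = set(discovery_pair)
    return {
        state: [s for s in subjects if (state, s) not in excluded]
        for state, subjects in subjects_by_state.items()
    }


# Toy layout (assumption): subject ids available for each state
subjects_by_state = {"MA_awake": [0, 1, 2], "ketamine": [0, 1, 2]}

# The n-plet was discovered on conscious subject 0 and non-responsive subject 1
evaluation_subjects = held_out_subjects(
    subjects_by_state, [("MA_awake", 0), ("ketamine", 1)]
)
print(evaluation_subjects)
```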


**Note**: This notebook should be run from the `high-order-anesthesia` folder to ensure the correct imports and file paths are used.

In [1]:
from pathlib import Path
import os
def ensure_project_root(target_name: str = "high-order-anesthesia") -> Path:
    cwd = Path.cwd().resolve()
    if cwd.name == target_name:
        return cwd
    for parent in cwd.parents:
        if parent.name == target_name:
            os.chdir(parent)
            return parent
    raise RuntimeError(
        f"Could not find '{target_name}' in current path or parents. "
        f"Please run the notebook from inside the project."
    )
ROOT = ensure_project_root("high-order-anesthesia")
print(f"Now in: {ROOT.name}")


Now in: high-order-anesthesia


In [2]:
import pandas as pd
from tqdm.notebook import tqdm, trange
import ast
from torch import cuda

#### Custom libraries

In [18]:
from src.hoi_anesthesia.io import load_covariance_dict
from src.hoi_anesthesia.utils import evaluate_nplet_batched

#### Data loading and preparation

In [4]:
results_path = "results"
data_path = "data"

# Load covariance matrices
all_covs = load_covariance_dict(f"{data_path}/covariance_matrices_gc.h5")

# States for each dataset; MA: Multi-anesthesia - DBS: Deep Brain Stimulation
conscious_states = {
    "MA": ["MA_awake"],
    "DBS": ["DBS_awake", "ts_on_5V"],
}
nonresponsive_states = {
    "MA": ["ts_selv2", "ts_selv4", "moderate_propofol", "deep_propofol", "ketamine"],
    "DBS": ["ts_off", "ts_on_3V_control", "ts_on_5V_control"],
}


In [5]:
# Discovery results; the CSVs use ";" as separator and "," as decimal mark
results_df_MA = pd.read_csv(
    f"{results_path}/R1_A_max_O_diff_MA_all_orders.csv",
    encoding="utf-8-sig",
    sep=";",
    decimal=",",
)

results_df_DBS = pd.read_csv(
    f"{results_path}/R1_A_max_O_diff_DBS_all_orders.csv",
    encoding="utf-8-sig",
    sep=";",
    decimal=",",
)

In [6]:
print("MA", results_df_MA.value_counts(["order"]))
print("DBS", results_df_DBS.value_counts(["order"]))

MA order
3        4560
4        4560
5        4560
Name: count, dtype: int64
DBS order
3        7808
4        7808
5        7808
Name: count, dtype: int64


In [7]:
print("MA", results_df_MA.value_counts(["state_c", "state_nr"]))
print("DBS", results_df_DBS.value_counts(["state_c", "state_nr"]))

MA state_c   state_nr         
MA_awake  deep_propofol        3312
          ketamine             3168
          moderate_propofol    3024
          ts_selv2             2592
          ts_selv4             1584
Name: count, dtype: int64
DBS state_c    state_nr        
DBS_awake  ts_off              6048
ts_on_5V   ts_off              4200
DBS_awake  ts_on_3V_control    3888
           ts_on_5V_control    3888
ts_on_5V   ts_on_3V_control    2700
           ts_on_5V_control    2700
Name: count, dtype: int64


In [8]:
results_dict = {
    "DBS": results_df_DBS,
    "MA": results_df_MA,
}
device = "cuda" if cuda.is_available() else "cpu"
cuda.empty_cache()

### Evaluation loop and checkpointing

We iterate over all discovered n-plets and compute evaluation metrics in batches.
To avoid losing progress in long runs, intermediate results are periodically merged back into the original table and saved to CSV.
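
The merge-and-save step can be illustrated on a toy table. The sketch below assumes, as the loop does, that each evaluated n-plet yields several metric records (one per measure) tagged with a `row_idx` that matches the index of the discovery table. All values are made up for illustration, and the CSV is written to a string rather than to `results/`.

```python
import ast

import pandas as pd

# Toy discovery table: one row per discovered n-plet (values are made up)
toy_results_df = pd.DataFrame(
    {"order": [3, 3], "optimal_nplet": ["[24, 28, 68]", "[28, 39, 72]"]}
)

# The n-plet is stored as text in the CSV and parsed back into a list
toy_nplet = ast.literal_eval(toy_results_df.loc[0, "optimal_nplet"])

# Toy evaluation records: several measures per n-plet, keyed by row_idx
toy_eval_results = [
    {"row_idx": 0, "measure": "TC", "F_score": 1.5},
    {"row_idx": 0, "measure": "O", "F_score": 2.5},
    {"row_idx": 1, "measure": "TC", "F_score": 3.5},
    {"row_idx": 1, "measure": "O", "F_score": 4.5},
]

# Each discovery row is repeated once per measure after the merge
toy_merged = toy_results_df.merge(
    pd.DataFrame(toy_eval_results),
    left_index=True,
    right_on="row_idx",
).drop(columns=["row_idx"])

# Same CSV conventions as the notebook: ";" separator and "," decimal mark
toy_csv = toy_merged.to_csv(index=False, sep=";", decimal=",")
print(toy_csv)
```

Because the merge is an inner join, a checkpoint written mid-run contains only the n-plets evaluated so far; the final save after the loop overwrites it with the complete table.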

In [9]:
for result_name, results_df in results_dict.items():
    eval_results = []
    for idx, row in tqdm(
        results_df.iterrows(),
        total=len(results_df),
        desc="Evaluating n-plets",
    ):

        metrics = evaluate_nplet_batched(
            idx,
            all_covs,
            conscious_states,
            nonresponsive_states,
            result_name,
            ast.literal_eval(row["optimal_nplet"]),
            row["state_c"],
            row["state_nr"],
            row["subject_c"],
            row["subject_nr"],
            device=device,
        )
        eval_results.extend(metrics)
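        # Checkpoint every 10,000 rows (this also fires at idx == 0):
        # merge the metrics computed so far into the table and save them
        if idx % 10000 == 0: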
            metrics_df = pd.DataFrame(eval_results)
            results_eval_df = results_df.merge(
                metrics_df,
                left_index=True,  # results_df row index matches metrics_df row_idx groups
                right_on="row_idx",  # match on metrics_df row_idx
            )
            # Drop the helper column if not needed
            results_eval_df = results_eval_df.drop(columns=["row_idx"])
            results_eval_df.to_csv(
                f"{results_path}/R1_B_nplet_eval_{result_name}.csv",
                index=False,
                encoding="utf-8-sig",
                sep=";",
                decimal=",",
            )
    metrics_df = pd.DataFrame(eval_results)
    results_eval_df = results_df.merge(
        metrics_df,
        left_index=True,  # results_df row index matches metrics_df row_idx groups
        right_on="row_idx",  # match on metrics_df row_idx
    )

    # Drop the helper column if not needed
    results_eval_df = results_eval_df.drop(columns=["row_idx"])
    results_eval_df.to_csv(
        f"{results_path}/R1_B_nplet_eval_{result_name}.csv",
        index=False,
        encoding="utf-8-sig",
        sep=";",
        decimal=",",
    )


Evaluating n-plets:   0%|          | 0/23424 [00:00<?, ?it/s]



Evaluating n-plets:   0%|          | 0/13680 [00:00<?, ?it/s]



In [10]:
results_eval_df.head(10)

Unnamed: 0,order,task,state_c,state_nr,subject_c,subject_nr,optimal_nplet,optimal_score,measure,F_score,PR_AUC,PR_AUC_inv
0,3,Cpos,MA_awake,ts_selv2,0,0,"[24, 28, 68]",0.436837,TC,85.149014,0.775192,0.115692
1,3,Cpos,MA_awake,ts_selv2,0,0,"[24, 28, 68]",0.436837,DTC,51.661191,0.666045,0.121674
2,3,Cpos,MA_awake,ts_selv2,0,0,"[24, 28, 68]",0.436837,O,70.233159,0.824582,0.128064
3,3,Cpos,MA_awake,ts_selv2,0,0,"[24, 28, 68]",0.436837,S,69.878844,0.733989,0.117648
4,3,Cpos,MA_awake,ts_selv2,0,0,"[24, 28, 68]",0.436837,norm_O,47.616938,0.724365,0.124643
5,3,Cpos,MA_awake,ts_selv2,0,0,"[24, 28, 68]",0.436837,FC_mean_z,125.000646,0.880448,0.111413
6,3,Cpos,MA_awake,ts_selv2,0,1,"[28, 39, 72]",0.574011,TC,193.26739,0.941805,0.110207
7,3,Cpos,MA_awake,ts_selv2,0,1,"[28, 39, 72]",0.574011,DTC,160.193007,0.890943,0.110991
8,3,Cpos,MA_awake,ts_selv2,0,1,"[28, 39, 72]",0.574011,O,139.817569,0.972403,0.109869
9,3,Cpos,MA_awake,ts_selv2,0,1,"[28, 39, 72]",0.574011,S,182.653524,0.921997,0.110487


Best n-plets by ANOVA F-score for the O-information measure

In [19]:
results_eval_df.query("measure=='O'").sort_values(by='F_score',ascending=False).head(10)

Unnamed: 0,order,task,state_c,state_nr,subject_c,subject_nr,optimal_nplet,optimal_score,measure,F_score,PR_AUC,PR_AUC_inv
60860,5,Cpos,MA_awake,ts_selv4,14,5,"[26, 28, 59, 68, 69]",0.76831,O,488.257275,0.987047,0.109786
9530,3,Cpos,MA_awake,moderate_propofol,9,7,"[27, 29, 70]",0.540952,O,418.502869,0.994631,0.10973
50426,4,Cpos,MA_awake,ketamine,15,10,"[27, 28, 68, 70]",0.482542,O,417.593759,0.998188,0.109715
5474,3,Cpos,MA_awake,ts_selv4,4,4,"[27, 31, 70]",0.423926,O,406.331996,0.989384,0.109758
17684,3,Cpos,MA_awake,deep_propofol,23,18,"[27, 70, 72]",0.38521,O,400.169247,0.990585,0.109762
14594,3,Cpos,MA_awake,deep_propofol,1,9,"[27, 28, 69]",0.413995,O,399.067945,0.98687,0.10977
33416,4,Cpos,MA_awake,ts_selv4,13,2,"[27, 29, 69, 80]",0.771467,O,397.650665,0.981168,0.109812
21566,3,Cpos,MA_awake,ketamine,4,2,"[27, 70, 72]",0.275797,O,392.787702,0.990585,0.109762
6026,3,Cpos,MA_awake,ts_selv4,12,8,"[27, 70, 72]",0.354387,O,392.182869,0.990585,0.109762
2048,3,Cpos,MA_awake,ts_selv2,18,17,"[27, 28, 69]",0.413394,O,391.485344,0.98687,0.10977


Best n-plets by area under the precision-recall curve when conscious states show higher O-information

In [20]:
results_eval_df.query("measure=='O' and task=='Cpos'").sort_values(by='PR_AUC',ascending=False).head(10)

Unnamed: 0,order,task,state_c,state_nr,subject_c,subject_nr,optimal_nplet,optimal_score,measure,F_score,PR_AUC,PR_AUC_inv
50426,4,Cpos,MA_awake,ketamine,15,10,"[27, 28, 68, 70]",0.482542,O,417.593759,0.998188,0.109715
10592,3,Cpos,MA_awake,moderate_propofol,17,16,"[27, 29, 70]",0.405592,O,361.271821,0.994631,0.10973
17468,3,Cpos,MA_awake,deep_propofol,22,5,"[27, 29, 70]",0.388672,O,379.545112,0.994631,0.10973
9530,3,Cpos,MA_awake,moderate_propofol,9,7,"[27, 29, 70]",0.540952,O,418.502869,0.994631,0.10973
69686,5,Cpos,MA_awake,deep_propofol,4,2,"[27, 28, 69, 72, 80]",0.644347,O,363.823293,0.992251,0.109757
22304,3,Cpos,MA_awake,ketamine,9,15,"[27, 29, 69]",0.326651,O,375.935567,0.992236,0.109736
21566,3,Cpos,MA_awake,ketamine,4,2,"[27, 70, 72]",0.275797,O,392.787702,0.990585,0.109762
22274,3,Cpos,MA_awake,ketamine,9,10,"[27, 70, 72]",0.391874,O,388.36342,0.990585,0.109762
1850,3,Cpos,MA_awake,ts_selv2,17,2,"[27, 70, 72]",0.398838,O,390.941541,0.990585,0.109762
1448,3,Cpos,MA_awake,ts_selv2,13,7,"[27, 70, 72]",0.429599,O,384.638613,0.990585,0.109762


Best n-plets by area under the precision-recall curve when non-responsive states show higher O-information

In [21]:
results_eval_df.query("measure=='O' and task=='NRpos'").sort_values(by='PR_AUC_inv',ascending=False).head(10)

Unnamed: 0,order,task,state_c,state_nr,subject_c,subject_nr,optimal_nplet,optimal_score,measure,F_score,PR_AUC,PR_AUC_inv
27290,3,NRpos,MA_awake,ketamine,23,10,"[38, 40, 81]",0.314849,O,137.739219,0.110068,0.954124
12392,3,NRpos,MA_awake,moderate_propofol,8,1,"[27, 62, 73]",0.323249,O,251.555851,0.110067,0.952574
11876,3,NRpos,MA_awake,moderate_propofol,3,20,"[27, 62, 73]",0.183692,O,201.455527,0.110067,0.952574
25286,3,NRpos,MA_awake,ketamine,8,6,"[27, 62, 73]",0.330455,O,250.848834,0.110067,0.952574
25706,3,NRpos,MA_awake,ketamine,11,10,"[27, 62, 73]",0.231854,O,203.533364,0.110067,0.952574
20030,3,NRpos,MA_awake,deep_propofol,16,18,"[23, 34, 81]",0.21477,O,159.546667,0.11141,0.948632
12542,3,NRpos,MA_awake,moderate_propofol,9,5,"[23, 34, 81]",0.369757,O,161.766129,0.111442,0.944585
13232,3,NRpos,MA_awake,moderate_propofol,14,15,"[29, 38, 40]",0.176652,O,168.855796,0.115634,0.944211
13580,3,NRpos,MA_awake,moderate_propofol,17,10,"[23, 34, 81]",0.367117,O,158.712371,0.111456,0.94394
8294,3,NRpos,MA_awake,ts_selv4,23,1,"[23, 34, 81]",0.429813,O,166.978731,0.111456,0.94394
