# MorphoXAI – Stage 2: Slide-level Explanations for Independent Test Slides – Starter Notebook

This notebook assumes that:

1. You have already trained a **full-data MIL model** for WSI-based prediction.
2. You have completed **Stage 1 – Morphologic Spectrum Construction**, and obtained a **morphologic spectrum** that summarizes the key histomorphologic patterns captured by the model.

Building on this spectrum, the code in this notebook generates **slide-level (local) explanations** for predictions made by the full-data model on **independent test slides**. These local explanations:

- highlight **which regions of the slide the model relies on most heavily**, and  
- specify **which morphologic-spectrum patterns these regions correspond to**.

The resulting explanations are exported in **GeoJSON format**, which can be imported into **MorphoExplainer** for interactive visualization directly on the whole-slide image.


In [None]:
from pathlib import Path
import sys

# -----------------------
# 1. Project root
# -----------------------
PROJECT_ROOT = Path("..").resolve()
print("Project root:", PROJECT_ROOT)

# Make sure PROJECT_ROOT is on sys.path so we can import project modules
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

# -----------------------
# 2. Global config file
# -----------------------
# Configuration file storing label mappings, algorithm hyperparameters,
# and other static settings shared across Stage 2 scripts.
CONFIG_PATH = PROJECT_ROOT / "Interpretation_of_Individual_Slide" / "config.yaml"
print("Config path:", CONFIG_PATH)
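
Because `PROJECT_ROOT` is resolved relative to the notebook's working directory, a quick existence check can catch a wrong launch location before any step runs. The helper below is a minimal sketch; `find_missing` is a hypothetical name and is not part of the project.

```python
from pathlib import Path


def find_missing(paths):
    """Return the names whose paths do not exist on disk.

    `paths` maps a descriptive name to a path-like object.
    `find_missing` is an illustrative helper, not a project function.
    """
    return [name for name, p in paths.items() if not Path(p).exists()]
```

For example, `find_missing({"project root": PROJECT_ROOT, "config": CONFIG_PATH})` returns an empty list when both locations exist.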

## Step 1 – High Contribution Patch Extraction for Independent Test Slide

In [None]:
from Interpretation_of_Individual_Slide import run_high_contri_extraction

# Path to the metadata CSV for independent test slides.
manifest_path = PROJECT_ROOT / "Data_Split/independent_svs_file_mapping.csv"

# Path to the full-data model checkpoint
attn_checkpoint = PROJECT_ROOT / "path/to/final_model.pt"

# Directory containing the feature bags for test slides
h5_base_path = PROJECT_ROOT / "feature_bags"

# Output directory for Step 1.
output_dir = PROJECT_ROOT / "Stage2/high_contri_independent"

coords_csv = run_high_contri_extraction(
    config_path=CONFIG_PATH,
    manifest_path=manifest_path,
    attn_checkpoint=attn_checkpoint,
    h5_base_path=h5_base_path,
    output_dir=output_dir,
    topk_ratio=0.9,
    slide_ids=None,
)
coords_csv

The outputs are written to `output_dir`. The primary output is `high_attened_patches_indep_data.csv`, a unified table that aggregates the high-contribution patches extracted from all independent test slides across all prediction classes.

In addition, this unified CSV is automatically split by ground-truth subtype into per-label subset CSVs. Each subtype-specific file contains only the patches from slides that belong to that ground-truth subtype.
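
The per-label split described above can be pictured with the following minimal sketch. It is not the package's implementation: the label column name `name_label` and the `independent_{label}.csv` naming are assumptions (the latter inferred from the `independent_Endo.csv` path used in Step 2), and `split_csv_by_label` is a hypothetical helper.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_csv_by_label(unified_csv, out_dir, label_col="name_label"):
    """Write one CSV per distinct value of `label_col`; return {label: path}.

    Assumption: the unified table carries the ground-truth label in
    `label_col`. Adjust the name to match the real file.
    """
    unified_csv, out_dir = Path(unified_csv), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group rows by their ground-truth label.
    with open(unified_csv, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        groups = defaultdict(list)
        for row in reader:
            groups[row[label_col]].append(row)

    # Write each group to its own file, keeping the original header.
    written = {}
    for label, rows in groups.items():
        out_path = out_dir / f"independent_{label}.csv"
        with open(out_path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written[label] = out_path
    return written
```

Step 1 already performs this split, so the sketch is only useful for understanding the file layout or for re-splitting a filtered table.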

## Step 2 – Spectrum Mapping for Independent Slides

**Note:** Spectrum mapping requires a slide-level summary of the full-data model's predictions on the independent test set (`each_slide_result`). This CSV contains one row per slide and must include the following columns:

- `slide_id`
- `name_label` (ground-truth subtype)
- `name_pred` (model-predicted subtype)
- `correctness` (True/False)

This file is used solely to determine each slide's true subtype, so that spectrum-mapping results can be grouped and saved accordingly.
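
Before calling the mapping step, the header of this CSV can be checked against the required columns listed above. This is a minimal sketch using only the standard library; `missing_columns` is a hypothetical helper, not a project function.

```python
import csv

# Column names taken from the requirement list for each_slide_result.
REQUIRED_COLUMNS = {"slide_id", "name_label", "name_pred", "correctness"}


def missing_columns(csv_path, required=REQUIRED_COLUMNS):
    """Return the sorted list of required columns absent from the CSV header."""
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f), [])  # empty file -> empty header
    return sorted(set(required) - {h.strip() for h in header})
```

An empty list means the header is complete. Any names returned need to be added or renamed in the CSV.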

In [None]:
import yaml
from Interpretation_of_Individual_Slide import run_spectrum_mapping

with open(CONFIG_PATH) as f:
    CFG = yaml.safe_load(f)

# 1) Path to the morphologic spectrum stats file. Exported in Stage 1 Step 8 using `export_spectrum_stats`.
SPECTRUM_STATS_PATH = PROJECT_ROOT / "morphologic_spectrum_stats.pkl"

# 2) Slide-level prediction summary of the full-data model
each_slide_result = Path("/path/to/each_slide_result.csv")

# 3) Output directory for spectrum mapping results
mapping_output_dir = PROJECT_ROOT / "Stage2/spectrum_assign"

# 4) High-contribution patches extracted with the full-data model on the independent test set
#    (Step 1 outputs). These are the per-label subset CSVs.
test_embeds = {
    "Endo":   PROJECT_ROOT / "Stage2/high_contri_independent/independent_Endo.csv",
}

slide_matrix_csv = run_spectrum_mapping(
    config=CFG,
    spectrum_stats_path=SPECTRUM_STATS_PATH,
    test_embeds=test_embeds,
    each_slide_result=each_slide_result,
    output_dir=mapping_output_dir,
)
slide_matrix_csv
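
When several subtypes are present, `test_embeds` can be assembled from the Step 1 output directory instead of being typed by hand. The sketch below assumes the per-label files follow the `independent_{label}.csv` pattern seen in the `"Endo"` entry; `collect_test_embeds` is a hypothetical helper.

```python
from pathlib import Path


def collect_test_embeds(step1_dir, prefix="independent_"):
    """Map each label to its per-label CSV found in `step1_dir`.

    Assumption: files are named f"{prefix}{label}.csv". Files that do not
    start with `prefix` (such as the unified table) are ignored.
    """
    step1_dir = Path(step1_dir)
    return {
        p.stem[len(prefix):]: p
        for p in sorted(step1_dir.glob(f"{prefix}*.csv"))
    }
```

With the paths used in this notebook, `collect_test_embeds(PROJECT_ROOT / "Stage2/high_contri_independent")` would produce a dictionary shaped like the hand-written `test_embeds`, provided the naming assumption holds.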


## Step 3 – Attention Heatmap + Spectrum-based Local Explanation Generation (Single Slide)

In [None]:
import yaml

from Interpretation_of_Individual_Slide.spectrum_atten_geojson_gen import run_single_slide_explanation

with open(CONFIG_PATH) as f:
    CFG = yaml.safe_load(f)

# 1) path to the input WSI
slide_path = Path("/path/to/wsi.svs")
# 2) directory containing Step 2 mapping results
mapping_root   = PROJECT_ROOT / "Stage2/spectrum_assign"
# 3) directory containing feature bags
h5_base        = PROJECT_ROOT / "feature_bags"
# 4) full-data model checkpoint
attn_ckpt      = PROJECT_ROOT / "path/to/final_model.pt"
# 5) slide-level metadata for independent test slides
manifest_path  = PROJECT_ROOT / "Data_Split/independent_svs_file_mapping.csv"
# 6) output directory for patch tiles + coordinates
tiles_root     = PROJECT_ROOT / "Stage2/spectrum_tiles"
# 7) output directory for final MorphoXAI GeoJSON files
geojson_root   = PROJECT_ROOT / "Stage2/spectrum_geojson"


out_paths = run_single_slide_explanation(
    config=CFG,
    slide_path=slide_path,
    mapping_root=mapping_root,
    h5_base=h5_base,
    tiles_root=tiles_root,
    geojson_root=geojson_root,
    attn_checkpoint=attn_ckpt,
    manifest_path=manifest_path,
    device="cuda",
    save_jpg=True,  # or False if you don't need JPG overlays
)
out_paths


After running the script:

- The file `{slide_id}_ATTN+SPECTRUM.geojson` is written to `geojson_root`; it contains the final MorphoXAI slide-level explanation.
- If `save_jpg=True`, the script also outputs attention heatmap JPGs for the slide in the same directory.
- The `tiles_root` directory stores the patch-coordinate table `{slide_id}_spectrum_tiles.csv`, which lists all extracted tiles and their spatial locations.
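
Before importing the GeoJSON into MorphoExplainer, a quick look at its contents can confirm that the export succeeded. The sketch below assumes the file is a standard GeoJSON `FeatureCollection` (or a bare list of features). The property names MorphoXAI stores are not specified here, so the hypothetical helper `summarize_geojson` only reports whichever keys it finds.

```python
import json
from collections import Counter


def summarize_geojson(path):
    """Count features and geometry types and list the property keys present."""
    with open(path) as f:
        data = json.load(f)

    # Accept either a FeatureCollection dict or a bare list of features.
    features = data.get("features", []) if isinstance(data, dict) else data

    geometry_types = Counter(
        (feat.get("geometry") or {}).get("type") for feat in features
    )
    property_keys = sorted(
        {key for feat in features for key in (feat.get("properties") or {})}
    )
    return {
        "n_features": len(features),
        "geometry_types": dict(geometry_types),
        "property_keys": property_keys,
    }
```

A feature count of zero, or an unexpected geometry type, suggests re-checking the Step 2 mapping results for that slide.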