# Multi-Pathogen Antimicrobial XAI Predictor

**Streamlined notebook for antimicrobial activity prediction with explainable AI**

---

## Overview

This notebook provides **ensemble-averaged predictions and explanations** for antimicrobial activity against:

| Pathogen | Code | Threshold | Description |
|----------|------|-----------|-------------|
| **S. aureus** | `SA` | 0.4164 | Gram-positive bacteria |
| **E. coli** | `EC` | 0.4610 | Gram-negative bacteria |
| **C. albicans** | `CA` | 0.5230 | Fungal pathogen |

## Key Features

- **Ensemble Prediction**: Averages predicted probabilities across the 5 best models for each pathogen
- **Ensemble Attribution**: Averages XAI explanations across all ensemble members
- **Internal Consistency**: A binary AGREE/DISAGREE check of whether the sign of the mean attribution matches the predicted class
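
The two ensemble steps listed above can be sketched as follows. This is a minimal illustration, not the helper module's implementation: the function names `ensemble_predict` and `ensemble_attribution` are hypothetical, the member probabilities are made up, and the use of a population standard deviation is an assumption (the notebook does not state which estimator it reports).

```python
from statistics import mean, pstdev

# Decision thresholds copied from the pathogen table above.
THRESHOLDS = {"SA": 0.4164, "EC": 0.4610, "CA": 0.5230}


def ensemble_predict(member_probs, pathogen_code):
    """Average per-model probabilities and classify against the threshold."""
    avg = mean(member_probs)
    # Assumption: population standard deviation across ensemble members.
    spread = pstdev(member_probs)
    label = "ACTIVE" if avg >= THRESHOLDS[pathogen_code] else "INACTIVE"
    return avg, spread, label


def ensemble_attribution(member_attrs):
    """Average attribution vectors element-wise across ensemble members."""
    return [mean(column) for column in zip(*member_attrs)]


# Hypothetical outputs from five ensemble members (illustrative values only).
avg, spread, label = ensemble_predict([0.62, 0.55, 0.48, 0.70, 0.65], "EC")
print(f"{avg:.4f} +/- {spread:.4f} -> {label}")

# Two hypothetical members, each scoring the same two explanatory units.
print(ensemble_attribution([[0.2, -0.4], [0.4, -0.2]]))
```

Averaging probabilities before thresholding means a single confident outlier model cannot flip the class on its own, and the spread gives a rough sense of how much the members disagree.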

## Internal Consistency (Prediction-Explanation Agreement)

Internal consistency assesses whether the explanatory evidence logically supports the corresponding prediction, using a **simple sign-matching criterion**.

We compute the **mean attribution** across all explanatory units and check whether the sign of this net explanatory stance matches the direction of the predicted class:

| Prediction | Mean Attribution | Agreement |
|------------|------------------|------------|
| ACTIVE (prob >= threshold) | Positive | **AGREE** |
| ACTIVE (prob >= threshold) | Negative | **DISAGREE** |
| INACTIVE (prob < threshold) | Negative | **AGREE** |
| INACTIVE (prob < threshold) | Positive | **DISAGREE** |
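
The table above reduces to a short sign comparison. The sketch below is an illustration under stated assumptions rather than the helper module's code: `internal_consistency` is a hypothetical name, the attribution values are made up, and a mean attribution of exactly zero (a case the table does not cover) is treated here as DISAGREE.

```python
from statistics import mean


def internal_consistency(prob, threshold, attributions):
    """Return 'AGREE' when the mean attribution sign matches the predicted class."""
    predicted_active = prob >= threshold
    net = mean(attributions)
    if net == 0:
        # Assumption: the table does not define this case; be conservative.
        return "DISAGREE"
    return "AGREE" if predicted_active == (net > 0) else "DISAGREE"


# ACTIVE prediction with net positive attribution (hypothetical attributions).
print(internal_consistency(0.9968, 0.4164, [0.8, 0.4, -0.1]))

# INACTIVE prediction with net positive attribution (hypothetical attributions).
print(internal_consistency(0.20, 0.4164, [0.3, 0.1]))
```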

## Color Scheme

- **Cyan/Blue**: Positive attribution (increases predicted activity)
- **Orange**: Negative attribution (decreases predicted activity)
- **Intensity**: A deeper shade indicates a larger attribution magnitude
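
One way such a scheme could be implemented is a linear blend from white toward the sign-specific hue. This is only a sketch: the exact RGB values, the white baseline, and the function name `attribution_to_rgb` are assumptions, and the notebook's actual rendering may differ.

```python
# Assumed hues; the notebook's exact colors are not specified.
CYAN = (0, 174, 239)
ORANGE = (255, 127, 14)
WHITE = (255, 255, 255)


def attribution_to_rgb(value, max_abs):
    """Blend from white to cyan (positive) or orange (negative) by magnitude."""
    base = CYAN if value >= 0 else ORANGE
    # Normalize the magnitude to [0, 1]; guard against an all-zero molecule.
    strength = min(abs(value) / max_abs, 1.0) if max_abs > 0 else 0.0
    return tuple(round(w + (c - w) * strength) for w, c in zip(WHITE, base))


# Hypothetical attributions for three scaffolds.
attributions = [0.70, -0.35, 0.05]
scale = max(abs(a) for a in attributions)
for a in attributions:
    print(a, attribution_to_rgb(a, scale))
```

Normalizing by the largest magnitude within a molecule makes the shades comparable within one figure but not across molecules.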

---

## How to Use

1. **Run Cell 1** (Setup) - Loads models
2. **Go to Cell 2** - Enter your SMILES and select pathogens
3. **Run Cell 2** - Get predictions with visualizations

---

## ⚠️ Model Checkpoint Compatibility Note

The provided model checkpoints were trained with an earlier implementation of the graph construction pipeline. The current scripts have since been updated and can still load these legacy checkpoints. However, attribution values may differ quantitatively (e.g., -0.31 vs -0.44) from those produced in the original training environment.

**Important:** The qualitative results remain consistent: attribution rankings, substructure importance patterns, and overall interpretations are preserved. Only the absolute magnitudes of attribution scores may vary.

For production use with new datasets, we recommend retraining models using the current codebase to ensure full numerical consistency.
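
To check that rankings are preserved for a given molecule, one option is to compare attribution vectors from the two environments by rank correlation and by sign. The sketch below uses Spearman's rho in its no-ties form. The function names are hypothetical, and apart from the -0.31 and -0.44 pair quoted above, the attribution values are invented for illustration.

```python
def ranks(values):
    """Rank positions (0 = largest value). Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position
    return result


def spearman_rho(a, b):
    """Spearman rank correlation for equal-length vectors (n >= 2) without ties."""
    n = len(a)
    d_squared = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def same_signs(a, b):
    """True when every pair of attributions points in the same direction."""
    return all((x > 0) == (y > 0) for x, y in zip(a, b))


# Hypothetical per-scaffold attributions from the two environments.
original = [0.52, -0.31, 0.12, -0.05]
current = [0.61, -0.44, 0.20, -0.02]
print(spearman_rho(original, current), same_signs(original, current))
```

A rho near 1 with matching signs would support the claim that only magnitudes shifted; a lower value would be a reason to retrain with the current codebase.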


In [4]:
# =============================================================================
# CELL 1: SETUP (Run this first)
# =============================================================================

import warnings
warnings.filterwarnings('ignore')

import logging
logging.getLogger().setLevel(logging.ERROR)

print("="*70)
print(" MULTI-PATHOGEN ANTIMICROBIAL XAI PREDICTOR")
print(" Ensemble Prediction + Attribution")
print("="*70)
print("\n Loading helper module...")

# Import the helper module
from xai_antimicrobial_helper import (
    EnsembleModelManager,
    analyze_molecule,
    quick_predict,
    batch_predict,
    PATHOGEN_CONFIGS
)

print(" Initializing ensemble manager...")

# Create the manager
manager = EnsembleModelManager()

# Pre-load all models
print("\n Loading ensemble models (5 best per pathogen):")
for code in ['SA', 'EC', 'CA']:
    config = PATHOGEN_CONFIGS[code]
    print(f"  {config.name}...", end=" ")
    success = manager.load_pathogen_model(code)
    if not success:
        # Report the failure explicitly instead of printing an empty line
        print("FAILED to load")

print("\n" + "="*70)
print(" SETUP COMPLETE!")
print("="*70)
print("\n Go to Cell 2 to enter your SMILES and run predictions.")
print(" Pathogen codes: 'SA' (S. aureus), 'EC' (E. coli), 'CA' (C. albicans)")

2025-11-30 14:30:30 - INFO - config - Starting configuration initialization...
2025-11-30 14:30:30 - INFO - config - Logger initialized for Configuration class.
2025-11-30 14:30:30 - INFO - config - All SMARTS patterns are valid.
2025-11-30 14:30:30 - INFO - model - Initialized for classification with 2 classes


 MULTI-PATHOGEN ANTIMICROBIAL XAI PREDICTOR
 Ensemble Prediction + Attribution

 Loading helper module...
 Initializing ensemble manager...

 Loading ensemble models (5 best per pathogen):
  S. aureus... 

2025-11-30 14:30:30 - INFO - model - Initialized for classification with 2 classes
2025-11-30 14:30:30 - INFO - model - Initialized for classification with 2 classes
2025-11-30 14:30:30 - INFO - model - Initialized for classification with 2 classes
2025-11-30 14:30:31 - INFO - model - Initialized for classification with 2 classes
2025-11-30 14:30:31 - INFO - config - Starting configuration initialization...
2025-11-30 14:30:31 - INFO - config - Logger initialized for Configuration class.
2025-11-30 14:30:31 - INFO - config - All SMARTS patterns are valid.


  Loaded 5 models for S. aureus
  E. coli... 

2025-11-30 14:30:31 - INFO - model - Initialized for classification with 2 classes
2025-11-30 14:30:31 - INFO - model - Initialized for classification with 2 classes
2025-11-30 14:30:32 - INFO - model - Initialized for classification with 2 classes
2025-11-30 14:30:32 - INFO - model - Initialized for classification with 2 classes
2025-11-30 14:30:32 - INFO - model - Initialized for classification with 2 classes
2025-11-30 14:30:32 - INFO - config - Starting configuration initialization...
2025-11-30 14:30:32 - INFO - config - Logger initialized for Configuration class.
2025-11-30 14:30:32 - INFO - c

  Loaded 5 models for E. coli
  C. albicans... 

2025-11-30 14:30:32 - INFO - model - Initialized for classification with 2 classes
2025-11-30 14:30:33 - INFO - model - Initialized for classification with 2 classes
2025-11-30 14:30:33 - INFO - model - Initialized for classification with 2 classes
2025-11-30 14:30:33 - INFO - model - Initialized for classification with 2 classes
2025-11-30 14:30:33 - INFO - model - Initialized for classification with 2 classes


  Loaded 5 models for C. albicans

 SETUP COMPLETE!

 Go to Cell 2 to enter your SMILES and run predictions.
 Pathogen codes: 'SA' (S. aureus), 'EC' (E. coli), 'CA' (C. albicans)


---

## Cell 2: Enter Your Molecule

**Instructions:**

1. Set `smiles` to your molecule's SMILES string
2. Modify `pathogens` to select which pathogen(s) to test:
   - Single: `['SA']` or `['EC']` or `['CA']`
   - Dual: `['SA', 'EC']`
   - All: `['SA', 'EC', 'CA']`
3. Run the cell
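
Before running the cell, it can help to validate the pathogen selection so that a typo fails early with a clear message. The helper below is a hypothetical sketch and is not part of `xai_antimicrobial_helper`; whether `analyze_molecule` performs its own validation is not shown in this notebook.

```python
# Codes taken from the pathogen table in the Overview.
VALID_CODES = ("SA", "EC", "CA")


def normalize_pathogens(pathogens):
    """Upper-case, de-duplicate (keeping order), and validate pathogen codes."""
    cleaned = []
    for code in pathogens:
        code = code.strip().upper()
        if code not in VALID_CODES:
            raise ValueError(
                f"Unknown pathogen code {code!r}; expected one of {VALID_CODES}"
            )
        if code not in cleaned:
            cleaned.append(code)
    if not cleaned:
        raise ValueError("Select at least one pathogen code")
    return cleaned


print(normalize_pathogens(["sa", "EC", "SA"]))
```

The cleaned list could then be assigned to `pathogens` before calling `analyze_molecule`.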

---

In [5]:
# =============================================================================
# CELL 2: USER INPUT
# =============================================================================

# ┌─────────────────────────────────────────────────────────────────────────┐
# │  EDIT THESE VALUES                                                       │
# └─────────────────────────────────────────────────────────────────────────┘

# Your molecule's SMILES string
smiles = 'O=C(O)C1=CN(CC(O)CN2C=NC([N+](=O)[O-])=C2)C2=C(F)C=C(F)C=C2C1=O'

# Select pathogen(s): ['SA'], ['EC'], ['CA'], or combinations 
pathogens = ['SA', 'EC', 'CA']

# Show scaffold attribution visualizations?
show_attributions = True

# ┌─────────────────────────────────────────────────────────────────────────┐
# │  RUN ANALYSIS                                                            │
# └─────────────────────────────────────────────────────────────────────────┘

results = analyze_molecule(
    smiles=smiles,
    pathogens=pathogens,
    show_attributions=show_attributions,
    manager=manager
)

# =============================================================================
# INTERPRETATION
# =============================================================================
#
# PREDICTION:
#   ACTIVE: prob >= threshold | INACTIVE: prob < threshold
#
# AGREEMENT (Internal Consistency):
#   AGREE: Prediction and explanation align (trustworthy)
#   DISAGREE: Explanation contradicts prediction (use caution)
#
# COLORS:
#   CYAN: Positive attribution | ORANGE: Negative attribution
#   Deeper shade = higher magnitude
# =============================================================================

 SMILES: O=C(O)C1=CN(CC(O)CN2C=NC([N+](=O)[O-])=C2)C2=C(F)C=C(F)C=C2C1=O
 Atoms: 28 | Bonds: 30



 Making ensemble predictions...

 PREDICTION RESULTS (Ensemble Averaged)



 Summary Table:


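| Pathogen | Prediction | Std Dev | Classification | Threshold | Agreement | Mean Attr |
|----------|------------|---------|----------------|-----------|-----------|-----------|
| S. aureus | 0.9968 | 0.0012 | ACTIVE | 0.4164 | AGREE | 0.5138 |
| E. coli | 0.9704 | 0.0116 | ACTIVE | 0.461 | AGREE | 0.7001 |
| C. albicans | 0.791 | 0.1707 | ACTIVE | 0.523 | AGREE | 0.5864 |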



----------------------------------------------------------------------
 Attribution Analysis: S. aureus (Ensemble Averaged)
----------------------------------------------------------------------

 Top 2 Scaffolds by Attribution:



----------------------------------------------------------------------
 Attribution Analysis: E. coli (Ensemble Averaged)
----------------------------------------------------------------------

 Top 2 Scaffolds by Attribution:



----------------------------------------------------------------------
 Attribution Analysis: C. albicans (Ensemble Averaged)
----------------------------------------------------------------------

 Top 2 Scaffolds by Attribution:
