# Comprehensive Benchmarking Pipeline with Excel Export

This notebook provides a complete workflow for:
1. Running comprehensive benchmarks across multiple models and datasets
2. Generating standardized summary files
3. Creating formatted Excel reports with performance rankings

**Models included:**
- AbLangPDB (cosine similarity)
- AbLangRBD (cosine similarity)
- AbLangPre (cosine similarity)
- SEQID (sequence identity)
- CDRH3ID (CDRH3 identity)
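
To make "cosine similarity" concrete, here is a minimal sketch of how two embedding vectors could be scored, written in plain Python for readability. The names `cosine_similarity` and `pairwise_cosine` are illustrative assumptions; the scoring this notebook actually relies on lives in `calculate_metrics` and may be implemented differently (for example, vectorized on tensors).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_cosine(queries, references):
    """Score every query embedding against every reference embedding."""
    return [[cosine_similarity(q, r) for r in references] for q in queries]
```

Cosine scores fall in the range [-1, 1], so a decision threshold for a cosine-based model can legitimately be negative.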

**Datasets:**
- SAbDab (structural antibody database)
- DMS (deep mutational scanning)

**Features:**
- Automated testing to ensure Excel generation works
- Optional embedding re-calculation toggle
- Optional threshold reuse for faster re-runs
- Comprehensive error handling and logging

In [1]:
import pandas as pd
import numpy as np

## Configuration Flags

In [1]:
# =============================================================================
# CONFIGURATION FLAGS - MODIFY THESE AS NEEDED
# =============================================================================

# Set to False to use existing parquet files (faster), True to re-calculate embeddings
RECALCULATE_EMBEDDINGS = False

# Model paths - update these paths as needed
MODEL_PATHS = {
    "AbLangPDB": "../../../huggingface/AbLangPDB1/ablangpdb_model.safetensors",
    "AbLangRBD": "../../../huggingface/AbLangRBD1/model.safetensors"
}

# Batch size for embedding generation
BATCH_SIZE = 256

# Output configuration
OUTPUT_FOLDER = "output_csvs"
EXCEL_FILENAME = "comprehensive_benchmarking_results.xlsx"

print(f"Configuration:")
print(f"  • Recalculate embeddings: {RECALCULATE_EMBEDDINGS}")
print(f"  • Output folder: {OUTPUT_FOLDER}")
print(f"  • Excel filename: {EXCEL_FILENAME}")
print(f"  • Batch size: {BATCH_SIZE}")

Configuration:
  • Recalculate embeddings: False
  • Output folder: output_csvs
  • Excel filename: comprehensive_benchmarking_results.xlsx
  • Batch size: 256


## Setup and Testing

In [None]:
import pandas as pd
import torch
import numpy as np
import os
import sys
from pathlib import Path
import warnings
import typing as T
from torch.utils.data import DataLoader, TensorDataset
warnings.filterwarnings('ignore')

# Add parent directory to path for imports
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

# Import local modules
import calculate_metrics
import calculate_metrics_dms
import models
from excel_generator import generate_results_excel, print_summary_stats
from ablangpaired_model import AbLangPairedConfig, AbLangPaired

# Device setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Ensure output directory exists
os.makedirs(OUTPUT_FOLDER, exist_ok=True)

Using device: cuda


## Helper Functions

In [3]:
def check_file_exists(filepath, description=""):
    """Check if a file exists and return status."""
    exists = os.path.exists(filepath)
    status = "✅" if exists else "❌"
    print(f"  {status} {description}: {filepath}")
    return exists


def embed_with_ablangpaired(input_path: str, output_path: str, model_path: str, model_name: str):
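    """Generate embeddings with an AbLangPaired model and save them to parquet.

    Args:
        input_path: Parquet file containing the sequences to embed.
        output_path: Destination parquet file; it gains an "EMBEDDING" column.
        model_path: Checkpoint path for the model (callers pass "" for "ablangpre").
        model_name: Label used in log messages. If "ablangpre", the model is
            built with use_pretrained=True, i.e. without the mixer layer.
    """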
    print(f"\n🔄 Generating {model_name} embeddings...")
    
    # Load data
    df = pd.read_parquet(input_path)
    if "EMBEDDING" in df.columns:
        df = df.drop(columns=["EMBEDDING"])
    
    # Setup model
    model_config = AbLangPairedConfig(checkpoint_filename=model_path)
    is_ablangpre = model_name == "ablangpre"
    model = AbLangPaired(model_config, device=device, use_pretrained=is_ablangpre)
    
    # Tokenize and embed using enhanced methods
    tokenized_dataloader = models.tokenize_data(df, model_config, batch_size=BATCH_SIZE)
    all_embeds = models.embed_dataloader(tokenized_dataloader, model, device)
    
    # Save
    df['EMBEDDING'] = list(all_embeds.cpu().numpy())
    df.to_parquet(output_path)
    
    print(f"✅ {model_name} embeddings saved to {output_path}")
    return df

print("Helper functions loaded successfully.")

Helper functions loaded successfully.


## Data Preparation and Embedding Generation

### Check Required Base Files

In [4]:
# Check that base dataset files exist
print("📋 Checking base dataset files...")

base_files = {
    "SAbDab base dataset": "ablangpdb_renameddatasets.parquet",
    "DMS base dataset": "ablangrbd_renameddatasets.parquet",
    "SAbDab validation labels": "ablangpdb_train_val_label_mat.pt",
    "SAbDab test labels": "ablangpdb_train_test_label_mat.pt",
    "DMS validation labels": "dms_train_val_label_mat.pt",
    "DMS test labels": "dms_train_test_label_mat.pt"
}

missing_base_files = []
for desc, filepath in base_files.items():
    if not check_file_exists(filepath, desc):
        missing_base_files.append(filepath)

if missing_base_files:
    raise FileNotFoundError(f"Missing required base files: {missing_base_files}")

print("\n✅ All base files found!")

📋 Checking base dataset files...
  ✅ SAbDab base dataset: ablangpdb_renameddatasets.parquet
  ✅ DMS base dataset: ablangrbd_renameddatasets.parquet
  ✅ SAbDab validation labels: ablangpdb_train_val_label_mat.pt
  ✅ SAbDab test labels: ablangpdb_train_test_label_mat.pt
  ✅ DMS validation labels: dms_train_val_label_mat.pt
  ✅ DMS test labels: dms_train_test_label_mat.pt

✅ All base files found!


### Generate Embeddings (if needed)

In [5]:
# Define all embedding files needed
embedding_files = {
    # SAbDab dataset embeddings
    "sabdab_embeddedby_ablangrbd.parquet": ("ablangpdb_renameddatasets.parquet", MODEL_PATHS["AbLangRBD"], "AbLangRBD"),
    "sabdab_embeddedby_ablangpre.parquet": ("ablangpdb_renameddatasets.parquet", None, "AbLangPre"),
    
    # DMS dataset embeddings
    "dms_embeddedby_ablangpdb.parquet": ("ablangrbd_renameddatasets.parquet", MODEL_PATHS["AbLangPDB"], "AbLangPDB"),
    "dms_embeddedby_ablangpre.parquet": ("ablangrbd_renameddatasets.parquet", None, "AbLangPre")
}

print(f"\n{'='*60}")
print("EMBEDDING GENERATION")
print(f"{'='*60}")

if RECALCULATE_EMBEDDINGS:
    print("🔄 Recalculating all embeddings...")
    force_generate = True
else:
    print("📂 Checking for existing embedding files...")
    force_generate = False

for output_file, (input_file, model_path, model_name) in embedding_files.items():
    if force_generate or not os.path.exists(output_file):
        print(f"\n🔄 Generating: {output_file}")
        
        try:
            if model_name == "AbLangPre":
                embed_with_ablangpaired(input_file, output_file, "", "ablangpre")
            else:
                embed_with_ablangpaired(input_file, output_file, model_path, model_name)
        except Exception as e:
            print(f"❌ Error generating {output_file}: {str(e)}")
            continue
    else:
        print(f"✅ Using existing: {output_file}")

print(f"\n✅ Embedding generation complete!")


EMBEDDING GENERATION
📂 Checking for existing embedding files...
✅ Using existing: sabdab_embeddedby_ablangrbd.parquet
✅ Using existing: sabdab_embeddedby_ablangpre.parquet
✅ Using existing: dms_embeddedby_ablangpdb.parquet
✅ Using existing: dms_embeddedby_ablangpre.parquet

✅ Embedding generation complete!


### Verify All Required Files

In [6]:
# Check that all required files now exist
print("\n📋 Verifying all required files...")

all_required_files = {
    # Base datasets
    "SAbDab base (AbLangPDB embeddings)": "ablangpdb_renameddatasets.parquet",
    "DMS base": "ablangrbd_renameddatasets.parquet",
    
    # Generated embedding files
    "SAbDab + AbLangRBD": "sabdab_embeddedby_ablangrbd.parquet",
    "SAbDab + AbLangPre": "sabdab_embeddedby_ablangpre.parquet",
    "DMS + AbLangPDB": "dms_embeddedby_ablangpdb.parquet",
    "DMS + AbLangPre": "dms_embeddedby_ablangpre.parquet",
    
    # Label matrices
    "SAbDab validation labels": "ablangpdb_train_val_label_mat.pt",
    "SAbDab test labels": "ablangpdb_train_test_label_mat.pt",
    "DMS validation labels": "dms_train_val_label_mat.pt",
    "DMS test labels": "dms_train_test_label_mat.pt"
}

missing_files = []
for desc, filepath in all_required_files.items():
    if not check_file_exists(filepath, desc):
        missing_files.append(filepath)

if missing_files:
    print(f"\n⚠️ Warning: {len(missing_files)} files are missing:")
    for file in missing_files:
        print(f"  - {file}")
    print("\nProceeding with available files only.")
else:
    print("\n✅ All required files are available!")


📋 Verifying all required files...
  ✅ SAbDab base (AbLangPDB embeddings): ablangpdb_renameddatasets.parquet
  ✅ DMS base: ablangrbd_renameddatasets.parquet
  ✅ SAbDab + AbLangRBD: sabdab_embeddedby_ablangrbd.parquet
  ✅ SAbDab + AbLangPre: sabdab_embeddedby_ablangpre.parquet
  ✅ DMS + AbLangPDB: dms_embeddedby_ablangpdb.parquet
  ✅ DMS + AbLangPre: dms_embeddedby_ablangpre.parquet
  ✅ SAbDab validation labels: ablangpdb_train_val_label_mat.pt
  ✅ SAbDab test labels: ablangpdb_train_test_label_mat.pt
  ✅ DMS validation labels: dms_train_val_label_mat.pt
  ✅ DMS test labels: dms_train_test_label_mat.pt

✅ All required files are available!


## Comprehensive Model Configuration

In [7]:
# Complete configuration for all model/dataset/metric combinations
CONFIGS = {
    # SAbDab Dataset Configurations
    "ablangpdb_sabdab_cosine": {
        "df_path": "ablangpdb_renameddatasets.parquet",
        "labels_val": "ablangpdb_train_val_label_mat.pt",
        "labels_test": "ablangpdb_train_test_label_mat.pt",
        "model_name": "AbLangPDB",
        "score_type": "cosine",
        "function": calculate_metrics.get_metrics,
        "dataset_type": "sabdab"
    },
    "ablangrbd_sabdab_cosine": {
        "df_path": "sabdab_embeddedby_ablangrbd.parquet",
        "labels_val": "ablangpdb_train_val_label_mat.pt",
        "labels_test": "ablangpdb_train_test_label_mat.pt",
        "model_name": "AbLangRBD",
        "score_type": "cosine",
        "function": calculate_metrics.get_metrics,
        "dataset_type": "sabdab"
    },
    "ablangpre_sabdab_cosine": {
        "df_path": "sabdab_embeddedby_ablangpre.parquet",
        "labels_val": "ablangpdb_train_val_label_mat.pt",
        "labels_test": "ablangpdb_train_test_label_mat.pt",
        "model_name": "AbLangPre",
        "score_type": "cosine",
        "function": calculate_metrics.get_metrics,
        "dataset_type": "sabdab"
    },
    "seqid_sabdab": {
        "df_path": "ablangpdb_renameddatasets.parquet",
        "labels_val": "ablangpdb_train_val_label_mat.pt",
        "labels_test": "ablangpdb_train_test_label_mat.pt",
        "model_name": "SEQID",
        "score_type": "seq_identity",
        "function": calculate_metrics.get_metrics,
        "dataset_type": "sabdab"
    },
    "cdrh3id_sabdab": {
        "df_path": "ablangpdb_renameddatasets.parquet",
        "labels_val": "ablangpdb_train_val_label_mat.pt",
        "labels_test": "ablangpdb_train_test_label_mat.pt",
        "model_name": "CDRH3ID",
        "score_type": "cdrh3_identity",
        "function": calculate_metrics.get_metrics,
        "dataset_type": "sabdab"
    },
    
    # DMS Dataset Configurations
    "ablangpdb_dms_cosine": {
        "df_path": "dms_embeddedby_ablangpdb.parquet",
        "labels_val": "dms_train_val_label_mat.pt",
        "labels_test": "dms_train_test_label_mat.pt",
        "model_name": "AbLangPDB",
        "score_type": "cosine",
        "function": calculate_metrics_dms.get_metrics_dms,
        "dataset_type": "dms"
    },
    "ablangrbd_dms_cosine": {
        "df_path": "ablangrbd_renameddatasets.parquet",
        "labels_val": "dms_train_val_label_mat.pt",
        "labels_test": "dms_train_test_label_mat.pt",
        "model_name": "AbLangRBD",
        "score_type": "cosine",
        "function": calculate_metrics_dms.get_metrics_dms,
        "dataset_type": "dms"
    },
    "ablangpre_dms_cosine": {
        "df_path": "dms_embeddedby_ablangpre.parquet",
        "labels_val": "dms_train_val_label_mat.pt",
        "labels_test": "dms_train_test_label_mat.pt",
        "model_name": "AbLangPre",
        "score_type": "cosine",
        "function": calculate_metrics_dms.get_metrics_dms,
        "dataset_type": "dms"
    },
    "seqid_dms": {
        "df_path": "ablangrbd_renameddatasets.parquet",
        "labels_val": "dms_train_val_label_mat.pt",
        "labels_test": "dms_train_test_label_mat.pt",
        "model_name": "SEQID",
        "score_type": "seq_identity",
        "function": calculate_metrics_dms.get_metrics_dms,
        "dataset_type": "dms"
    },
    "cdrh3id_dms": {
        "df_path": "ablangrbd_renameddatasets.parquet",
        "labels_val": "dms_train_val_label_mat.pt",
        "labels_test": "dms_train_test_label_mat.pt",
        "model_name": "CDRH3ID",
        "score_type": "cdrh3_identity",
        "function": calculate_metrics_dms.get_metrics_dms,
        "dataset_type": "dms"
    }
}

print(f"Configured {len(CONFIGS)} model/dataset/metric combinations:")
for name, config in CONFIGS.items():
    print(f"  • {name}: {config['model_name']} on {config['dataset_type']} using {config['score_type']}")

Configured 10 model/dataset/metric combinations:
  • ablangpdb_sabdab_cosine: AbLangPDB on sabdab using cosine
  • ablangrbd_sabdab_cosine: AbLangRBD on sabdab using cosine
  • ablangpre_sabdab_cosine: AbLangPre on sabdab using cosine
  • seqid_sabdab: SEQID on sabdab using seq_identity
  • cdrh3id_sabdab: CDRH3ID on sabdab using cdrh3_identity
  • ablangpdb_dms_cosine: AbLangPDB on dms using cosine
  • ablangrbd_dms_cosine: AbLangRBD on dms using cosine
  • ablangpre_dms_cosine: AbLangPre on dms using cosine
  • seqid_dms: SEQID on dms using seq_identity
  • cdrh3id_dms: CDRH3ID on dms using cdrh3_identity


### Filter Available Configurations

In [8]:
# Check which configurations can actually run based on available files
available_configs = {}
missing_configs = []

for config_name, config in CONFIGS.items():
    files_to_check = [config["df_path"], config["labels_val"], config["labels_test"]]
    missing_files = [f for f in files_to_check if not os.path.exists(f)]
    
    if not missing_files:
        available_configs[config_name] = config
        print(f"✅ {config_name}: Ready to run")
    else:
        missing_configs.append(config_name)
        print(f"❌ {config_name}: Missing files - {missing_files}")

print(f"\n📊 Summary:")
print(f"  • Available configurations: {len(available_configs)}/{len(CONFIGS)}")
print(f"  • Missing configurations: {len(missing_configs)}")

if missing_configs:
    print(f"\n⚠️ Configurations that will be skipped: {', '.join(missing_configs)}")

if not available_configs:
    raise RuntimeError("❌ No configurations are available to run!")

✅ ablangpdb_sabdab_cosine: Ready to run
✅ ablangrbd_sabdab_cosine: Ready to run
✅ ablangpre_sabdab_cosine: Ready to run
✅ seqid_sabdab: Ready to run
✅ cdrh3id_sabdab: Ready to run
✅ ablangpdb_dms_cosine: Ready to run
✅ ablangrbd_dms_cosine: Ready to run
✅ ablangpre_dms_cosine: Ready to run
✅ seqid_dms: Ready to run
✅ cdrh3id_dms: Ready to run

📊 Summary:
  • Available configurations: 10/10
  • Missing configurations: 0


## Optional: Precomputed Thresholds

In [None]:
# Optional: precomputed thresholds that let a run skip threshold optimization.
# Comment out an entry to recompute its thresholds, or empty the dict to recompute all.

PRECOMPUTED_THRESHOLDS = {
    "ablangpdb_sabdab_cosine": {
        "epitope_threshold": 0.5037,
        "antigen_threshold": 0.2697
    },
    "ablangrbd_sabdab_cosine": {
        "epitope_threshold": 0.7912,
        "antigen_threshold": -0.2969
    },
    "ablangpre_sabdab_cosine": {
        "epitope_threshold": 0.6941,
        "antigen_threshold": 0.5851
    },
    "seqid_sabdab": {
        "epitope_threshold": 0.6684,
        "antigen_threshold": 0.3380
    },
    "cdrh3id_sabdab": {
        "epitope_threshold": 0.2727,
        "antigen_threshold": 0.0000
    },
    "ablangpdb_dms_cosine": {
        "epitope_threshold": -0.0419
    },
    "ablangrbd_dms_cosine": {
        "epitope_threshold": 0.8493
    },
    "ablangpre_dms_cosine": {
        "epitope_threshold": 0.6608
    },
    "seqid_dms": {
        "epitope_threshold": 0.6497
    },
    "cdrh3id_dms": {
        "epitope_threshold": 0.1905
    }
}

use_precomputed = len(PRECOMPUTED_THRESHOLDS) > 0
if use_precomputed:
    print(f"🔄 Using precomputed thresholds for {len(PRECOMPUTED_THRESHOLDS)} configurations")
    for config_name, thresholds in PRECOMPUTED_THRESHOLDS.items():
        print(f"  • {config_name}: {thresholds}")
else:
    print("🆕 Computing fresh thresholds for all configurations")

🔄 Using precomputed thresholds for 8 configurations
  • ablangpdb_sabdab_cosine: {'epitope_threshold': 0.5037, 'antigen_threshold': 0.2697}
  • ablangrbd_sabdab_cosine: {'epitope_threshold': 0.7912, 'antigen_threshold': -0.2969}
  • seqid_sabdab: {'epitope_threshold': 0.6684, 'antigen_threshold': 0.338}
  • cdrh3id_sabdab: {'epitope_threshold': 0.2727, 'antigen_threshold': 0.0}
  • ablangpdb_dms_cosine: {'epitope_threshold': -0.0419}
  • ablangrbd_dms_cosine: {'epitope_threshold': 0.8493}
  • seqid_dms: {'epitope_threshold': 0.6497}
  • cdrh3id_dms: {'epitope_threshold': 0.1905}
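
For context, one common way to derive such a threshold is to sweep candidate cut-offs over validation scores and keep the one with the highest F1. The sketch below illustrates that idea under that assumption only: `f1_at_threshold` and `best_f1_threshold` are hypothetical helpers, the toy data is made up, and the criterion actually used inside `calculate_metrics` is not shown in this notebook.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 when every score >= threshold is predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(scores, labels):
    """Try each observed score as a cut-off; ties go to the higher threshold."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: (f1_at_threshold(scores, labels, t), t))


# Made-up validation scores and binary labels, for illustration only
toy_scores = [0.1, 0.4, 0.6, 0.9]
toy_labels = [0, 0, 1, 1]
print(best_f1_threshold(toy_scores, toy_labels))  # 0.6
```

A threshold chosen this way on validation data can then be reused on later runs, which is what the dictionary above enables.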


## Run Comprehensive Benchmarks

In [10]:
# Execute all available configurations
execution_results = {}
failed_configs = []

print(f"\n{'='*70}")
print(f"COMPREHENSIVE BENCHMARK EXECUTION")
print(f"{'='*70}")
print(f"Total configurations to run: {len(available_configs)}")

for i, (config_name, config) in enumerate(available_configs.items(), 1):
    print(f"\n{'='*70}")
    print(f"[{i}/{len(available_configs)}] Running: {config_name}")
    print(f"Model: {config['model_name']}, Dataset: {config['dataset_type']}, Score: {config['score_type']}")
    print(f"{'='*70}")
    
    try:
        # Prepare arguments
        args = {
            "df_path": config["df_path"],
            "labels_file_val": config["labels_val"],
            "labels_file_test": config["labels_test"],
            "score_type": config["score_type"],
            "model_name": config["model_name"],
            "output_folder": OUTPUT_FOLDER
        }
        
        # Add precomputed thresholds if available
        if config_name in PRECOMPUTED_THRESHOLDS:
            thresholds = PRECOMPUTED_THRESHOLDS[config_name]
            if config["dataset_type"] == "sabdab":
                if "epitope_threshold" in thresholds:
                    args["epitope_threshold"] = thresholds["epitope_threshold"]
                if "antigen_threshold" in thresholds:
                    args["antigen_threshold"] = thresholds["antigen_threshold"]
            elif config["dataset_type"] == "dms":
                if "epitope_threshold" in thresholds:
                    args["epitope_threshold"] = thresholds["epitope_threshold"]
        
        # Execute the benchmark
        config["function"](**args)
        
        execution_results[config_name] = "✅ Success"
        print(f"\n✅ [{i}/{len(available_configs)}] {config_name} completed successfully!")
        
    except Exception as e:
        error_msg = f"❌ Error: {str(e)}"
        execution_results[config_name] = error_msg
        failed_configs.append(config_name)
        print(f"\n❌ [{i}/{len(available_configs)}] {config_name} failed: {str(e)}")
        continue

print(f"\n{'='*70}")
print("COMPREHENSIVE BENCHMARK EXECUTION SUMMARY")
print(f"{'='*70}")

for config_name, result in execution_results.items():
    print(f"{config_name:30} {result}")

successful_configs = len(available_configs) - len(failed_configs)
print(f"\n📊 Results:")
print(f"  • Successful: {successful_configs}/{len(available_configs)}")
print(f"  • Failed: {len(failed_configs)}")

if failed_configs:
    print(f"\n⚠️ Failed configurations: {', '.join(failed_configs)}")
else:
    print("\n🎉 All available configurations completed successfully!")


COMPREHENSIVE BENCHMARK EXECUTION
Total configurations to run: 10

[1/10] Running: ablangpdb_sabdab_cosine
Model: AbLangPDB, Dataset: sabdab, Score: cosine
Using provided Epitope threshold: 0.5037
Using provided Antigen threshold: 0.2697
Preparing data with score_type: 'cosine' for TRAIN vs TEST...

--- Analyzing Epitope-level performance on TEST data (Positive label >= 0.5) ---
ROC-AUC: 0.8090, Average Precision: 0.5419, F1 Score: 0.5567

--- Analyzing Antigen-level performance on TEST data (Positive label >= 0.2) ---
ROC-AUC: 0.7887, Average Precision: 0.5084, F1 Score: 0.5044

Results saved to output_csvs
Summary metrics saved to output_csvs/AbLangPDB_sabdab_ep_summarymetrics.txt and output_csvs/AbLangPDB_sabdab_ag_summarymetrics.txt

✅ [1/10] ablangpdb_sabdab_cosine completed successfully!

[2/10] Running: ablangrbd_sabdab_cosine
Model: AbLangRBD, Dataset: sabdab, Score: cosine
Using provided Epitope threshold: 0.7912
Using provided Antigen threshold: -0.2969
Preparing data with s

## Generate Comprehensive Excel Report

In [11]:
# Generate summary statistics
print("\n📊 Generating summary statistics...")
print_summary_stats(OUTPUT_FOLDER)

print("\n" + "="*70)
print("GENERATING COMPREHENSIVE EXCEL REPORT")
print("="*70)

try:
    # Generate the Excel file
    excel_path = generate_results_excel(
        output_folder=OUTPUT_FOLDER,
        excel_filename=EXCEL_FILENAME
    )
    
    print(f"\n🎉 Comprehensive Excel report generated successfully!")
    print(f"📁 File location: {excel_path}")
    print(f"📏 File size: {os.path.getsize(excel_path):,} bytes")
    
    # Provide usage instructions
    print(f"\n📖 Excel Report Contents:")
    print(f"  • Models as rows (AbLangPDB, AbLangRBD, AbLangPre, SEQID, CDRH3ID)")
    print(f"  • Datasets grouped as column headers (SAbDab, DMS)")
    print(f"  • Metrics: ROC-AUC, Average Precision, F1 Score")
    print(f"  • Best performance: Bold formatting")
    print(f"  • Second best: Italic formatting")
    print(f"  • Values rounded to 4 decimal places")
    
except Exception as e:
    print(f"❌ Error generating Excel report: {str(e)}")
    print("\nDebugging information:")
    print(f"  • Output folder: {OUTPUT_FOLDER}")
    print(f"  • Files in folder: {len(os.listdir(OUTPUT_FOLDER))}")
    
    # List summary files found
    import glob
    summary_files = glob.glob(os.path.join(OUTPUT_FOLDER, "*summarymetrics.txt"))
    print(f"  • Summary files found: {len(summary_files)}")
    for f in summary_files[:5]:  # Show first 5
        print(f"    - {os.path.basename(f)}")
    if len(summary_files) > 5:
        print(f"    - ... and {len(summary_files)-5} more")


📊 Generating summary statistics...

=== Summary Statistics ===
Total summary files found: 15
Unique models: 5 (AbLangPDB, AbLangPre, AbLangRBD, CDRH3ID, SEQID)
Unique datasets: 3 (dms, sabdab_ag, sabdab_ep)
Unique score types: 3 (cdrh3_identity, cosine, seq_identity)
Best ROC_AUC: AbLangRBD on dms (0.8442)
Best Average_Precision: AbLangRBD on dms (0.6401)
Best F1_Score: AbLangRBD on dms (0.5926)

GENERATING COMPREHENSIVE EXCEL REPORT
Collecting summary metrics from output_csvs...
Found 15 summary files
Creating pivot table...
Pivot table created with 5 models and 9 metric columns
Ranking values for formatting...
Exporting to Excel: output_csvs/comprehensive_benchmarking_results.xlsx
✅ Excel file generated successfully: output_csvs/comprehensive_benchmarking_results.xlsx

🎉 Comprehensive Excel report generated successfully!
📁 File location: output_csvs/comprehensive_benchmarking_results.xlsx
📏 File size: 5,435 bytes

📖 Excel Report Contents:
  • Models as rows (AbLangPDB, AbLangRBD, Ab
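
The bold and italic convention described in the report contents amounts to finding the top two values in each metric column. Here is a minimal sketch of that ranking step, assuming higher is better for every metric. `top_two` is a hypothetical helper and the scores are made-up placeholders, not results from this run; how `generate_results_excel` handles ranking internally is not shown here.

```python
def top_two(column):
    """Return (best, second_best) model names for one metric column.

    `column` maps model name -> score; higher scores are assumed better.
    Positions that cannot be filled are returned as None.
    """
    ranked = sorted(column, key=column.get, reverse=True)
    padded = ranked + [None, None]
    return padded[0], padded[1]


# Made-up placeholder scores, for illustration only
example_column = {"model_a": 0.61, "model_b": 0.74, "model_c": 0.69}
best, second = top_two(example_column)
print(best, second)  # model_b model_c
```

The best entry would be rendered in bold and the second-best in italics when the sheet is written.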

## Final Summary and Analysis

In [12]:
print("\n" + "="*70)
print("COMPREHENSIVE PIPELINE COMPLETION SUMMARY")
print("="*70)

print(f"\n🔧 Configuration:")
print(f"  • Recalculated embeddings: {RECALCULATE_EMBEDDINGS}")
print(f"  • Used precomputed thresholds: {use_precomputed}")
print(f"  • Batch size: {BATCH_SIZE}")

print(f"\n📈 Benchmarking Results:")
print(f"  • Total configurations possible: {len(CONFIGS)}")
print(f"  • Configurations attempted: {len(available_configs)}")
print(f"  • Successful runs: {successful_configs}")
print(f"  • Failed runs: {len(failed_configs)}")

if os.path.exists(os.path.join(OUTPUT_FOLDER, EXCEL_FILENAME)):
    print(f"\n📊 Excel Report:")
    print(f"  • Status: ✅ Generated successfully")
    print(f"  • Location: {os.path.join(OUTPUT_FOLDER, EXCEL_FILENAME)}")
    print(f"  • Ready for analysis and sharing")
else:
    print(f"\n📊 Excel Report:")
    print(f"  • Status: ❌ Generation failed")
    print(f"  • Check error messages above")

print(f"\n🔬 Models Benchmarked:")
models_run = set()
for config_name, result in execution_results.items():
    if "Success" in result:
        config = available_configs[config_name]
        models_run.add(f"{config['model_name']} ({config['score_type']})")
        
for model in sorted(models_run):
    print(f"  • {model}")

print(f"\n📊 Datasets Analyzed:")
datasets_run = set()
for config_name, result in execution_results.items():
    if "Success" in result:
        config = available_configs[config_name]
        datasets_run.add(config['dataset_type'].upper())
        
for dataset in sorted(datasets_run):
    print(f"  • {dataset}")

print(f"\n🎯 Next Steps:")
print(f"  1. 📊 Open the Excel report for comprehensive performance comparison")
print(f"  2. 🔍 Identify best-performing models for each dataset")
print(f"  3. 📈 Analyze performance patterns across different similarity metrics")
print(f"  4. 📋 Share results with your research team")
print(f"  5. 📝 Consider additional analyses based on findings")

if failed_configs:
    print(f"\n⚠️ Failed Configurations to Investigate:")
    for config in failed_configs:
        print(f"  • {config}: {execution_results[config]}")

if missing_configs:
    print(f"\n❓ Configurations Not Attempted (Missing Files):")
    for config in missing_configs:
        print(f"  • {config}")

print(f"\n🏁 Comprehensive benchmarking pipeline completed!")
print(f"\n📄 Report: {os.path.join(OUTPUT_FOLDER, EXCEL_FILENAME)}")


COMPREHENSIVE PIPELINE COMPLETION SUMMARY

🔧 Configuration:
  • Recalculated embeddings: False
  • Used precomputed thresholds: True
  • Batch size: 256

📈 Benchmarking Results:
  • Total configurations possible: 10
  • Configurations attempted: 10
  • Successful runs: 10
  • Failed runs: 0

📊 Excel Report:
  • Status: ✅ Generated successfully
  • Location: output_csvs/comprehensive_benchmarking_results.xlsx
  • Ready for analysis and sharing

🔬 Models Benchmarked:
  • AbLangPDB (cosine)
  • AbLangPre (cosine)
  • AbLangRBD (cosine)
  • CDRH3ID (cdrh3_identity)
  • SEQID (seq_identity)

📊 Datasets Analyzed:
  • DMS
  • SABDAB

🎯 Next Steps:
  1. 📊 Open the Excel report for comprehensive performance comparison
  2. 🔍 Identify best-performing models for each dataset
  3. 📈 Analyze performance patterns across different similarity metrics
  4. 📋 Share results with your research team
  5. 📝 Consider additional analyses based on findings

🏁 Comprehensive benchmarking pipeline completed!

📄 