# 🫀 Clinical-Grade ECG Model Evaluation  
<span style="color:red">by Ridwan Oladipo, MD | Medical AI Specialist</span>  

Comprehensive validation of a **multimodal ECG AI** model built on **PTB-XL (21,837 recordings)**: trained on stratified folds 1-8 and tested on **4,396 held-out cases** (folds 9-10) across 5 diagnostic classes (NORM, MI, STTC, CD, HYP).  

### 📈 Clinical-Grade Performance  
- **MI Sensitivity (Recall):** 96.2% (exceeds the 95% clinical goal)  
- **MI Specificity:** 99.97%  
- **MI Precision (PPV):** 99.9%  
- **MI NPV:** 98.7% (exceeds the 98% safety threshold)  
- **MI AUC:** 0.999 (near-perfect discriminative power)  
- **Calibration (Brier Score, MI):** 0.008 (excellent reliability)  
- **Macro AUC (All Classes):** 0.95  
- **Macro F1 Score:** 0.81  
- **Overall Accuracy:** 87.4%  
- **Cohen's Kappa:** 0.82 (almost perfect agreement on the Landis-Koch scale, where 0.61-0.80 is "substantial")  
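
The MI figures above are one-vs-rest quantities that all derive from the same 2x2 confusion table. The sketch below shows how the four screening metrics relate to each other. It assumes binary MI-vs-rest labels at a fixed decision threshold, uses toy labels rather than the notebook's predictions, and `binary_metrics` is a hypothetical helper name that does not appear in the original notebook.

```python
def binary_metrics(y_true, y_pred):
    """One-vs-rest screening metrics from binary labels (1 = MI, 0 = not MI)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # recall: MI cases that were caught
        "specificity": ratio(tn, tn + fp),  # non-MI cases correctly cleared
        "ppv": ratio(tp, tp + fp),          # precision: flagged cases that are MI
        "npv": ratio(tn, tn + fn),          # cleared cases that are truly non-MI
    }


# Toy labels for illustration only (not results from this notebook).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and NPV are the two quantities that bound missed infarctions, which is why the summary attaches explicit targets to them. PPV and NPV also depend on how common MI is in the test set, so they may not transfer to a population with a different prevalence.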

### 📊 Deployment Readiness Validation  
- **Calibration analysis** → probability-risk alignment with reliability curves  
- **Demographic slice testing** → age/sex/device subgroup performance  
- **Robustness validation** → stable under ECG noise perturbations  
- **Generalization testing** → cross-site performance validation  
- **Clinical documentation** → model card with intended use and limitations  
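
To make the calibration item concrete: a Brier score is the mean squared gap between predicted probability and outcome, and a reliability curve compares the mean predicted probability with the observed event rate inside each probability bin. The following is a dependency-free sketch under those definitions, using toy data. `brier_score` and `reliability_bins` are hypothetical helper names; the notebook itself imports `brier_score_loss` and `calibration_curve` from scikit-learn, which would normally be used instead.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def reliability_bins(y_true, y_prob, n_bins=5):
    """Return (mean predicted prob, observed rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for t, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((t, p))
    summary = []
    for members in bins:
        if members:
            mean_prob = sum(p for _, p in members) / len(members)
            observed = sum(t for t, _ in members) / len(members)
            summary.append((mean_prob, observed, len(members)))
    return summary


# Toy probabilities for illustration only (not results from this notebook).
y_true = [0, 0, 0, 1, 1, 1]
y_prob = [0.1, 0.1, 0.3, 0.7, 0.9, 0.9]

brier = brier_score(y_true, y_prob)
curve = reliability_bins(y_true, y_prob)
print(f"Brier score: {brier:.4f}")
for mean_prob, observed, count in curve:
    print(f"predicted {mean_prob:.2f} -> observed {observed:.2f} (n={count})")
```

A well-calibrated model has observed rates close to the mean predicted probability in every bin. A low Brier score alone does not guarantee this when the positive class is rare, which is why the two are reported together.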

>🎯 **Clinical impact**: Turns ECG interpretation from **minutes of manual review** into **seconds of AI-powered screening**, with metrics aimed at **regulatory-grade reliability** for cardiac emergency detection.

## 🧪 Environment Setup and Model Loading

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from sklearn.metrics import roc_curve, auc, precision_recall_curve, brier_score_loss, confusion_matrix
from sklearn.calibration import calibration_curve
import shap
import pickle
import os
import json
from scipy.ndimage import zoom
from scipy.signal import find_peaks

# Load trained model and preprocessed data
base_dir = "/kaggle/input/ecg-evaluation"
model = tf.keras.models.load_model(f"{base_dir}/ecg_resnet_multimodal_final.keras")

all_signals = np.load(f"{base_dir}/all_signals.npy", allow_pickle=True)
y_labels = np.load(f"{base_dir}/y_labels.npy", allow_pickle=True)
all_features = pd.read_parquet(f"{base_dir}/all_features.parquet")
model_df_with_labels = pd.read_parquet(f"{base_dir}/model_df_with_labels.parquet")

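# Train/test split on PTB-XL's stratified folds: 1-8 for training, 9-10 held out
train_idx = (model_df_with_labels['strat_fold'] < 9).to_numpy()
test_idx = (model_df_with_labels['strat_fold'] >= 9).to_numpy()

# The masks are positional, so signals, labels, and features must share row order
X_ecg_test = all_signals[test_idx]
X_tab_test = all_features.loc[test_idx]
y_test = y_labels[test_idx]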

class_names = ['NORM', 'MI', 'STTC', 'CD', 'HYP']

print("=== ECG Evaluation Environment Initialized ===")
print("✓ Model loaded successfully")
print(f"✓ Test set: {len(X_ecg_test):,} samples")
print(f"✓ Feature dimensions: {X_tab_test.shape}")
print(f"✓ Classes: {class_names}")


class NpEncoder(json.JSONEncoder):
    """JSON encoder that converts NumPy scalars and arrays to native Python types."""
    def default(self, obj):
        if isinstance(obj, (np.integer, np.int64)):
            return int(obj)
        if isinstance(obj, (np.floating, np.float32, np.float64)):
            return float(obj)
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        return super().default(obj)

2025-10-01 07:44:44.485549: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1759304684.721131      36 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1759304684.795345      36 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-10-01 07:45:08.446906: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:152] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)


=== ECG Evaluation Environment Initialized ===
✓ Model loaded successfully
✓ Test set: 4,396 samples
✓ Feature dimensions: (4396, 190)
✓ Classes: ['NORM', 'MI', 'STTC', 'CD', 'HYP']
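
The fold-based split in the cell above applies one boolean mask to several separately loaded objects, so it is only correct if they all have the same number of rows in the same order. Below is a minimal pure-Python sketch of that split with an explicit length check. `split_by_fold` and its `first_test_fold` parameter are hypothetical names for illustration and are not part of the original notebook.

```python
def split_by_fold(folds, *aligned, first_test_fold=9):
    """Split row-aligned sequences into train/test parts using fold ids.

    Rows whose fold id is >= first_test_fold go to the test part,
    mirroring the `strat_fold >= 9` rule used in the cell above.
    """
    n_rows = len(folds)
    for position, seq in enumerate(aligned):
        if len(seq) != n_rows:
            raise ValueError(
                f"sequence {position} has {len(seq)} rows, expected {n_rows}"
            )
    test_mask = [fold >= first_test_fold for fold in folds]
    train = [[x for x, m in zip(seq, test_mask) if not m] for seq in aligned]
    test = [[x for x, m in zip(seq, test_mask) if m] for seq in aligned]
    return train, test


# Toy data for illustration only: ten rows, one per fold.
folds = list(range(1, 11))
signals = [f"sig{i}" for i in range(10)]
labels = list(range(10))

train_part, test_part = split_by_fold(folds, signals, labels)
print(len(train_part[0]), "train rows;", len(test_part[0]), "test rows")
```

A length check catches truncated or mismatched files but not a reordering, so comparing a shared record identifier across the inputs would be a stronger safeguard where one is available.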
