# 🫀 Clinical-Grade Multimodal ECG Training Pipeline  
<span style="color:red">by Ridwan Oladipo, MD | Medical AI Specialist</span>  

Production-ready training pipeline for **12-lead ECG classification**, implementing a **ResNet-1D + tabular fusion network** with:  

- **ResNet-1D signal branch** → temporal P–QRS–T wave & rhythm morphology modeling  
- **Clinical metadata branch** → HR/HRV + age/sex + device harmonization  
- **Late fusion** → integrated ECG + tabular decision space
- **Binary cross-entropy loss** for the multilabel setting  
- **Recall-optimized callbacks** → early stopping & checkpointing to maximize **myocardial infarction sensitivity**  
- **Reproducible training** with fixed seeds & official PTB-XL stratified folds (preventing patient leakage)
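
To make the multilabel loss concrete, here is a minimal NumPy sketch of binary cross-entropy applied independently to each of the five labels. It is an illustration only, not the training code: the helper name `multilabel_bce` and the toy batch are assumptions, and in the pipeline itself the loss would come from Keras.

```python
import numpy as np

def multilabel_bce(y_true, logits):
    """Mean binary cross-entropy over samples and labels, computed from logits."""
    y_true = np.asarray(y_true, dtype=np.float64)
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable form of -[y*log(s) + (1-y)*log(1-s)], with s = sigmoid(logits)
    per_label = (np.maximum(logits, 0.0)
                 - logits * y_true
                 + np.log1p(np.exp(-np.abs(logits))))
    return float(per_label.mean())

# Hypothetical batch: 2 ECGs x 5 superclasses (NORM, MI, STTC, CD, HYP)
y_true = np.array([[1, 0, 0, 0, 0],
                   [0, 1, 1, 0, 0]])  # the second record carries two labels at once

confident = np.where(y_true == 1, 6.0, -6.0)  # logits that agree with every label
uninformed = np.zeros_like(confident)         # sigmoid(0) = 0.5 for every label

loss_confident = multilabel_bce(y_true, confident)
loss_uninformed = multilabel_bce(y_true, uninformed)
print(f"confident: {loss_confident:.4f} | uninformed: {loss_uninformed:.4f}")
```

Because each label gets its own sigmoid, a record can be positive for several diagnoses at once, which a softmax with categorical cross-entropy would not allow.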

🚀 Trains on **~17.4k ECGs** (17,441 records in PTB-XL folds 1–8) with structured logging & TensorBoard monitoring.  
>⚕️ **Clinically-aligned optimization**: tuning for **sensitivity and NPV in myocardial infarction detection**, the metrics that determine how many infarctions are missed and how far a negative result can be trusted.
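
For reference, sensitivity and NPV can be computed from binary MI predictions as sketched below. This is a small illustration under stated assumptions: the function name `sensitivity_and_npv` and the ten toy labels are hypothetical, not results from this pipeline.

```python
import numpy as np

def sensitivity_and_npv(y_true, y_pred):
    """Return (sensitivity, NPV) for one binary label."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))    # MI present, flagged
    fn = int(np.sum(y_true & ~y_pred))   # MI present, missed
    tn = int(np.sum(~y_true & ~y_pred))  # MI absent, cleared
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return sensitivity, npv

# Hypothetical MI column: 1 = MI present, 0 = MI absent
mi_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
mi_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0])

sens, npv = sensitivity_and_npv(mi_true, mi_pred)
print(f"Sensitivity: {sens:.3f} | NPV: {npv:.3f}")
```

Both metrics share the false-negative count in their denominator, so reducing missed infarctions improves them together.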

## 🧩 Environment Setup and Data Loading

In [1]:
# Essential libraries for deep learning and model training
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Conv1D, BatchNormalization, Activation, Add
from tensorflow.keras.layers import MaxPooling1D, GlobalAveragePooling1D, Dropout, concatenate
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau, TensorBoard

# For monitoring and evaluation
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
import datetime
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Load Preprocessed Data
base_dir = "/kaggle/input/ecg-preprocessed"
all_signals = np.load(f"{base_dir}/all_signals.npy", allow_pickle=True)
y_labels = np.load(f"{base_dir}/y_labels.npy", allow_pickle=True)
all_features = pd.read_parquet(f"{base_dir}/all_features.parquet")
model_df_with_labels = pd.read_parquet(f"{base_dir}/model_df_with_labels.parquet")

# Reproducibility
np.random.seed(42)
tf.random.set_seed(42)
print("Random seeds set for reproducibility")

print("=== Training Environment Initialized ===")
print(f"TensorFlow version: {tf.__version__}")
print(f"GPU available: {tf.config.list_physical_devices('GPU')}")

print("\n=== Preprocessed Data Verification ===")
print(f"Signals shape: {all_signals.shape}")
print(f"Features shape: {all_features.shape}")
print(f"Labels shape: {y_labels.shape}")
print(f"Classes: {y_labels.shape[1]}")


Random seeds set for reproducibility
=== Training Environment Initialized ===
TensorFlow version: 2.18.0
GPU available: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

=== Preprocessed Data Verification ===
Signals shape: (21837, 1000, 12)
Features shape: (21837, 190)
Labels shape: (21837, 5)
Classes: 5
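
The cell above seeds NumPy and TensorFlow. A slightly broader seeding helper might look like the sketch below, which also seeds Python's built-in `random` module. The name `set_global_seeds` is hypothetical, and TensorFlow is imported lazily so the snippet still runs where it is not installed. Fixed seeds alone may not give bit-identical results on a GPU, since some kernels can be nondeterministic.

```python
import os
import random

import numpy as np

def set_global_seeds(seed=42):
    """Seed the Python, NumPy and (if installed) TensorFlow random generators."""
    # Only affects Python processes started after this point, not the current one
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)
    except ImportError:
        pass  # TensorFlow not available; Python and NumPy are still seeded

# Re-seeding should reproduce the same draws
set_global_seeds(42)
first = (random.random(), np.random.rand(3))
set_global_seeds(42)
second = (random.random(), np.random.rand(3))
print("Reproducible draws:", first[0] == second[0] and np.array_equal(first[1], second[1]))
```

In recent TensorFlow releases, `tf.keras.utils.set_random_seed(seed)` seeds all three generators in a single call and could replace the body of this helper.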


## 🔀 Train/Test Split

In [2]:
# Using PTB-XL Official strat_fold for Train/Test Split
print("\n=== Using PTB-XL Official strat_fold for Train/Test Split ===")

train_idx = model_df_with_labels['strat_fold'] < 9  # folds 1–8 = train
test_idx = model_df_with_labels['strat_fold'] >= 9  # folds 9–10 = test

X_ecg_train, X_ecg_test = all_signals[train_idx], all_signals[test_idx]
X_tab_train, X_tab_test = all_features.loc[train_idx], all_features.loc[test_idx]
y_train, y_test = y_labels[train_idx], y_labels[test_idx]

print(f"✓ Training set: {len(X_ecg_train):,} samples")
print(f"✓ Test set: {len(X_ecg_test):,} samples")

# Class Distribution Verification
class_names = ['NORM', 'MI', 'STTC', 'CD', 'HYP']
train_class_dist = y_train.mean(axis=0)
test_class_dist = y_test.mean(axis=0)

print("\n=== Class Distribution Verification ===")
for i, cls in enumerate(class_names):
    diff = abs(train_class_dist[i] - test_class_dist[i])
    print(f"{cls}: Train {train_class_dist[i]:.3f} | Test {test_class_dist[i]:.3f} | Diff {diff:.3f}")


=== Using PTB-XL Official strat_fold for Train/Test Split ===
✓ Training set: 17,441 samples
✓ Test set: 4,396 samples

=== Class Distribution Verification ===
NORM: Train 0.436 | Test 0.437 | Diff 0.001
MI: Train 0.252 | Test 0.250 | Diff 0.002
STTC: Train 0.240 | Test 0.240 | Diff 0.000
CD: Train 0.224 | Test 0.226 | Diff 0.002
HYP: Train 0.122 | Test 0.121 | Diff 0.000
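
The class balance above shows that the folds are stratified. The absence of patient overlap can be checked separately, along the lines of the sketch below. It assumes a `patient_id` value is available for each record (PTB-XL's metadata provides one, but whether `model_df_with_labels` retains it is not shown here). The helper names and toy arrays are hypothetical.

```python
import numpy as np

def split_by_fold(strat_fold, test_folds=(9, 10)):
    """Boolean train/test masks from PTB-XL-style fold ids (1-10)."""
    strat_fold = np.asarray(strat_fold)
    test_mask = np.isin(strat_fold, test_folds)
    return ~test_mask, test_mask

def leaked_patients(patient_id, train_mask, test_mask):
    """Patient ids present on both sides of the split (ideally none)."""
    patient_id = np.asarray(patient_id)
    return np.intersect1d(patient_id[train_mask], patient_id[test_mask])

# Hypothetical toy metadata: 8 records from 5 patients, each patient in a single fold
patient_id = np.array([101, 101, 102, 103, 103, 104, 105, 105])
strat_fold = np.array([1, 1, 3, 9, 9, 8, 10, 10])

train_mask, test_mask = split_by_fold(strat_fold)
overlap = leaked_patients(patient_id, train_mask, test_mask)
print(f"Train: {train_mask.sum()} | Test: {test_mask.sum()} | Leaked patients: {overlap.tolist()}")

# Counter-example: moving one record of patient 103 into fold 1 creates leakage
leaky_fold = strat_fold.copy()
leaky_fold[3] = 1
leaky_train, leaky_test = split_by_fold(leaky_fold)
overlap_leaky = leaked_patients(patient_id, leaky_train, leaky_test)
print(f"Leaked patients after the change: {overlap_leaky.tolist()}")
```

On the real metadata, the same check would take the fold and patient columns as inputs. An empty result is what the official folds are expected to produce.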
