# ResNet Model
1. Multi-Label Stratified K-Fold
    - Multi-label imbalance handled: All rare diseases are proportionally represented.
    - View imbalance handled: Each fold has close to the same Frontal/Lateral distribution.
    - Sex & Age: Since they’re balanced already, we don’t need to augment stratification with them.

2. Data Generator: loads images in batches rather than all at once, since the training data is >100 GB

3. ResNet Model: ResNet-50 pre-trained on ImageNet, with a new classifier head trained on top (backbone frozen for now)

4. Train Model & Evaluate

Problems and Solutions:
- Kaggle notebooks had no internet access at first, so `pip install` could not fetch the iterstrat package (needed for `MultilabelStratifiedKFold`) and Keras could not download the ImageNet pretrained weights for the ResNet model. **Workaround**: load the iterstrat source manually from GitHub, and upload the ResNet-50 pretrained weights from Keras to Kaggle and load them into the model manually.
- **Fix**: verifying my phone number on Kaggle resolved this; iterstrat and the ImageNet pretrained weights now load normally.


In [1]:
# Imports
import os
import sys
import random
import warnings
from io import StringIO

import cv2
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
import tensorflow.keras.applications.resnet50 as resnet
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers, models
from tensorflow.keras.layers import (Conv2D, Dense, Dropout, Flatten,
                                     GlobalAveragePooling2D, MaxPooling2D)
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import Sequence

warnings.filterwarnings('ignore')

2025-10-05 18:51:15.994104: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1759690276.231613      36 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1759690276.294811      36 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


In [2]:
print(os.listdir("/kaggle/input"))

# Path to competition dataset
data_dir = "/kaggle/input/grand-xray-slam-division-b"
# Check what files are inside
print('Filenames of the data', os.listdir(data_dir))

['resnet-pretrainedmodel', 'newresnet', 'iterstat', 'grand-xray-slam-division-b']
Filenames of the data ['test2', 'sample_submission_2.csv', 'train2.csv', 'train2']


In [3]:
# Load the training CSV metadata with labels
train = pd.read_csv("/kaggle/input/grand-xray-slam-division-b/train2.csv")

print('Metadata shape:',train.shape)
train.head()

Metadata shape: (108494, 21)


Unnamed: 0,Image_name,Patient_ID,Study,Sex,Age,ViewCategory,ViewPosition,Atelectasis,Cardiomegaly,Consolidation,...,Enlarged Cardiomediastinum,Fracture,Lung Lesion,Lung Opacity,No Finding,Pleural Effusion,Pleural Other,Pneumonia,Pneumothorax,Support Devices
0,00000003_001_001.jpg,3,1,Male,41.0,Frontal,AP,0,1,0,...,1,0,0,1,0,0,0,0,0,0
1,00000004_001_001.jpg,4,1,Female,20.0,Frontal,PA,0,0,0,...,0,0,0,0,1,0,0,0,0,0
2,00000004_001_002.jpg,4,1,Female,20.0,Lateral,Lateral,0,0,0,...,0,0,0,0,1,0,0,0,0,0
3,00000006_001_001.jpg,6,1,Female,42.0,Frontal,AP,0,0,0,...,0,0,0,0,1,0,0,0,0,0
4,00000010_001_001.jpg,10,1,Female,50.0,Frontal,PA,0,0,0,...,0,0,0,0,1,0,0,0,0,0


# **Multi-Label Stratified K-Fold**
- Split into train and validation data using `MultilabelStratifiedKFold`
    - Multi-label imbalance handled: All rare diseases are proportionally represented.
    - View imbalance handled: Each fold has close to the same Frontal/Lateral distribution.
    - Sex & Age: Since they’re balanced already, we don’t need to augment stratification with them.

**Note**: the iterstrat package could not be downloaded on Kaggle without internet access.
1. Workaround: fetch the multilabel stratified K-fold source and load it in the script manually. This gives the same stratification behavior without needing internet access.
 https://github.com/trent-b/iterative-stratification/blob/master/iterstrat/ml_stratifiers.py
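One way to check that a split really is stratified is to compare per-label positive rates between the train and validation indices. In this notebook's fold printout, for example, Atelectasis is 30890/86795 in train and 7723/21699 in validation, both about 0.356. The sketch below is a minimal illustration on made-up toy data; `prevalence_gap` and `y_toy` are hypothetical names, not part of iterstrat.

```python
import numpy as np

def prevalence_gap(y, train_idx, val_idx):
    """Largest absolute train-vs-val difference in per-column positive rate."""
    y = np.asarray(y, dtype=float)
    train_rate = y[train_idx].mean(axis=0)
    val_rate = y[val_idx].mean(axis=0)
    return float(np.abs(train_rate - val_rate).max())

# Toy target matrix (hypothetical): 2 label columns + 1 view column
y_toy = np.array([
    [1, 0, 1], [1, 0, 1], [0, 1, 1], [0, 1, 1], [0, 0, 0],
    [1, 0, 1], [1, 0, 1], [0, 1, 1], [0, 1, 1], [0, 0, 0],
])

# A split that mirrors the label mix on both sides
balanced = prevalence_gap(y_toy, np.arange(0, 5), np.arange(5, 10))
# A split that puts every positive of the first label into train
skewed = prevalence_gap(y_toy, np.array([0, 1, 5, 6]), np.array([2, 3, 4, 7, 8, 9]))

print("balanced gap:", balanced)
print("skewed gap:", skewed)
```

A gap near zero for every fold suggests the labels (and the appended view columns) are proportionally represented; a large gap points to a column the split failed to balance.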

In [4]:
# 1. Feature & Target Preparation
# Define labels
conditions = [
    'Atelectasis', 'Cardiomegaly', 'Consolidation', 'Edema', 'Enlarged Cardiomediastinum',
    'Fracture', 'Lung Lesion', 'Lung Opacity', 'No Finding', 'Pleural Effusion',
    'Pleural Other', 'Pneumonia', 'Pneumothorax', 'Support Devices'
]
# Features you want
features = ["ViewCategory", "ViewPosition", "Age", "Sex"]

# Encode categorical features
from sklearn.preprocessing import LabelEncoder

train_enc = train.copy()   # train data encoded
for col in ["ViewCategory", "ViewPosition", "Sex"]:  # features that can be encoded
    le = LabelEncoder()
    train_enc[col] = le.fit_transform(train[col].astype(str))

X = train_enc[features].values
y = train[conditions].values

In [5]:
print(X.shape) # 4 features (ViewCategory, ViewPosition, Age, Sex)
print(y.shape)  # 14 conditions

(108494, 4)
(108494, 14)


In [6]:
# 2. Adding ViewBalancing for Stratification: ViewCategory= Frontal, Lateral; since ViewCategory is unbalanced

# One-hot encode ViewCategory and append to y
view_onehot = pd.get_dummies(train["ViewCategory"], prefix="view").values

y_aug = np.hstack([y, view_onehot])  # augmented target matrix (added ViewCategory as y to stratify and reduce bias)

In [9]:
!pip install iterative-stratification

Collecting iterative-stratification
  Downloading iterative_stratification-0.1.9-py3-none-any.whl.metadata (1.3 kB)
Downloading iterative_stratification-0.1.9-py3-none-any.whl (8.5 kB)
Installing collected packages: iterative-stratification
Successfully installed iterative-stratification-0.1.9


In [10]:
# Offline fallback: load iterstrat from an uploaded Kaggle dataset (used when pip install was unavailable on Kaggle)
# import sys
# sys.path.append("/kaggle/input/iterstat/keras/default/1")
# from ml_stratifiers import MultilabelStratifiedKFold
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold

# 3. Multilabel Stratified K-Fold Split
mskf = MultilabelStratifiedKFold(n_splits=5, shuffle=True, random_state=42)

for fold, (train_idx, val_idx) in enumerate(mskf.split(X, y_aug)):
    print(f"Fold {fold}")
    print(" Train:", len(train_idx), " Val:", len(val_idx))

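    # train_df / val_df are overwritten on every iteration, so after the loop
    # they hold the split from the last fold, which is what the cells below use.
    train_df = train.iloc[train_idx].reset_index(drop=True)
    val_df   = train.iloc[val_idx].reset_index(drop=True)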

    # Check condition + view balance
    print("  Train views:", train_df["ViewCategory"].value_counts().to_dict())
    print("  Val views:", val_df["ViewCategory"].value_counts().to_dict())
    print("  Train labels sum:", train_df[conditions].sum().to_dict())
    print("  Val labels sum:", val_df[conditions].sum().to_dict())
    print("-"*60)


Fold 0
 Train: 86795  Val: 21699
  Train views: {'Frontal': 76011, 'Lateral': 10784}
  Val views: {'Frontal': 19003, 'Lateral': 2696}
  Train labels sum: {'Atelectasis': 30890, 'Cardiomegaly': 27984, 'Consolidation': 23715, 'Edema': 21254, 'Enlarged Cardiomediastinum': 30053, 'Fracture': 11662, 'Lung Lesion': 9885, 'Lung Opacity': 39217, 'No Finding': 27391, 'Pleural Effusion': 27655, 'Pleural Other': 5544, 'Pneumonia': 11453, 'Pneumothorax': 6991, 'Support Devices': 29908}
  Val labels sum: {'Atelectasis': 7723, 'Cardiomegaly': 6996, 'Consolidation': 5929, 'Edema': 5313, 'Enlarged Cardiomediastinum': 7513, 'Fracture': 2916, 'Lung Lesion': 2472, 'Lung Opacity': 9805, 'No Finding': 6974, 'Pleural Effusion': 6914, 'Pleural Other': 1387, 'Pneumonia': 2863, 'Pneumothorax': 1747, 'Support Devices': 7477}
------------------------------------------------------------
Fold 1
 Train: 86795  Val: 21699
  Train views: {'Frontal': 76011, 'Lateral': 10784}
  Val views: {'Frontal': 19003, 'Lateral': 

In [11]:
print(train_df.shape)
print(val_df.shape)

(86796, 21)
(21698, 21)


# Data Generator
To handle large datasets, we use a custom generator to load images in batches.

Each image is preprocessed with ResNet-50’s preprocessing function and resized to 224×224 (the default input size for ResNet-50).
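The batching arithmetic behind such a generator can be sketched in plain Python. This is a minimal illustration, assuming the row count of 86796 and batch size of 32 used in this notebook; `num_batches` and `batch_bounds` are hypothetical helper names.

```python
def num_batches(n_rows, batch_size):
    """Ceiling division: the last batch may be smaller than batch_size."""
    return (n_rows + batch_size - 1) // batch_size

def batch_bounds(n_rows, batch_size, idx):
    """Row slice [start, end) covered by batch number idx."""
    start = idx * batch_size
    end = min(start + batch_size, n_rows)
    return start, end

n_train = 86796                      # rows in train_df for this notebook's split
steps = num_batches(n_train, 32)     # steps per epoch
last_start, last_end = batch_bounds(n_train, 32, steps - 1)

print("steps per epoch:", steps)
print("last batch rows:", last_end - last_start)
```

The step count agrees with the `2713` shown in the training progress bar, and the final batch holds only the leftover rows.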

In [12]:
class XRayDataGenerator(Sequence):
    def __init__(self, dataframe, batch_size=32, img_size=(224, 224), is_test=False, **kwargs):
        super().__init__(**kwargs)
        self.dataframe = dataframe.reset_index(drop=True)
        self.batch_size = batch_size
        self.img_size = img_size
        self.is_test = is_test
        self.image_dir = '/kaggle/input/grand-xray-slam-division-b/train2/' if not is_test else '/kaggle/input/grand-xray-slam-division-b/test2/'
        self.conditions = conditions
        
        if not os.path.exists(self.image_dir):
            print(f"Error: Directory {self.image_dir} not found.")
            raise FileNotFoundError(f"Directory {self.image_dir} missing.")
    
    def __len__(self):
        return (len(self.dataframe) + self.batch_size - 1) // self.batch_size
    
    def __getitem__(self, idx):
        start = idx * self.batch_size
        end = min(start + self.batch_size, len(self.dataframe))
        batch_data = self.dataframe.iloc[start:end]
        
        images, labels = [], []
        
        for _, row in batch_data.iterrows():
            img_path = os.path.join(self.image_dir, row['Image_name'])
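            img = cv2.imread(img_path, cv2.IMREAD_COLOR)
            
            # NOTE: unreadable images are skipped, so in test mode the predictions
            # would no longer line up one-to-one with the dataframe rows.
            if img is not None and img.shape[0] > 0 and img.shape[1] > 0:
                img = cv2.resize(img, self.img_size)
                # OpenCV loads BGR, but resnet.preprocess_input expects RGB input
                img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32)
                img = resnet.preprocess_input(img)
                images.append(img)
                
                if not self.is_test:
                    labels.append(row[self.conditions].values.astype(np.float32))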
        
        if not images:
            dummy_img = np.zeros((*self.img_size, 3), dtype=np.float32)
            images.append(dummy_img)
            if not self.is_test:
                labels.append(np.zeros(len(self.conditions), dtype=np.float32))
        
        if not self.is_test:
            return np.array(images), np.array(labels)
        else:
            return np.array(images)

# Create generators
batch_size = 32
train_generator = XRayDataGenerator(train_df, batch_size=batch_size)
val_generator = XRayDataGenerator(val_df, batch_size=batch_size)
print("Data generators created.")

Data generators created.


# Build ResNet-50 Model
We load a pretrained ResNet-50 model with weights from ImageNet.
The convolutional base is frozen to retain pretrained features, and we add a custom classifier head:

- Global Average Pooling to reduce feature maps.
- Dense layer for feature learning.
- Dropout to reduce overfitting.
- Sigmoid output for multi-label classification across 14 chest conditions.
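As a sanity check on the head described above, the number of trainable parameters can be worked out by hand. The sketch below assumes the 256-unit dense layer and 14 outputs used in this notebook, and that ResNet-50's pooled feature vector has 2048 channels; `dense_params` is a hypothetical helper.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

backbone_features = 2048   # ResNet-50 channels after global average pooling
hidden_units = 256         # Dense(256) layer in the custom head
num_classes = 14           # one sigmoid output per condition

# Pooling and dropout have no parameters, so only the two Dense layers count
head_params = dense_params(backbone_features, hidden_units) + dense_params(hidden_units, num_classes)
print(f"Trainable head parameters: {head_params:,}")
```

With the backbone frozen, this matches the trainable parameter count the notebook reports after building the model.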

In [13]:
!ls -lh /kaggle/input/newresnet

total 91M
-rw-r--r-- 1 nobody nogroup 91M Oct  5 17:08 resnet50_weights_tf_dim_ordering_tf_kernels_notop-2.h5


In [14]:
# from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.applications import ResNet50

# weights_path = '/kaggle/input/newresnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop-2.h5'

def build_resnet_model(num_classes=14):
    # Load ResNet-50 with ImageNet weights (downloaded by Keras; requires internet access)
    base_model = ResNet50(
        weights="imagenet",
        # weights=None,  # offline fallback: build without weights, then load them manually below
        include_top=False, 
        input_shape=(224, 224, 3)
    )
    # Offline fallback: load the pretrained weights from the uploaded Kaggle dataset
    # print("Loading ResNet50 weights...")
    # base_model.load_weights(weights_path)
    print("✅ Weights loaded successfully.")
    base_model.trainable = False   # freeze backbone for now
    
    # add custom head
    inputs = base_model.input
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(256, activation="relu")(x)
    x = Dropout(0.5)(x)
    outputs = Dense(num_classes, activation="sigmoid")(x)  # multilabel
    
    model = Model(inputs, outputs)
    return model
    
model = build_resnet_model()
model.compile(
    optimizer=Adam(learning_rate=0.0001),
    loss="binary_crossentropy",
    metrics=["AUC"]
)

print("Model Architecture: ResNet50 + Custom Head")
print(f"Total parameters: {model.count_params():,}")
trainable_params = sum([tf.size(v).numpy() for v in model.trainable_variables])
print(f"Trainable parameters: {trainable_params:,}")
print(f"Non-trainable parameters: {model.count_params() - trainable_params:,}")
print("Model compiled successfully!")

I0000 00:00:1759690502.051856      36 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 15513 MB memory:  -> device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0


Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
94765736/94765736 ━━━━━━━━━━━━━━━━━━━━ 3s 0us/step
✅ Weights loaded successfully.
Model Architecture: ResNet50 + Custom Head
Total parameters: 24,115,854
Trainable parameters: 528,142
Non-trainable parameters: 23,587,712
Model compiled successfully!


# Train the Model
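- Now we train the ResNet model for 3 epochs using the training and validation generators.
- Performance is tracked with AUC-ROC. Note that the Keras `"AUC"` metric, with its default `multi_label=False`, pools the predictions for all 14 conditions into a single AUC rather than averaging one AUC per condition.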
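To make the per-condition view of AUC-ROC concrete, here is a small reference calculation in NumPy. It is a sketch on made-up toy data, not how Keras computes its metric (Keras approximates AUC over a fixed set of thresholds); `binary_auc` and `macro_auc` are hypothetical helpers that use the pairwise-ranking definition of AUC.

```python
import numpy as np

def binary_auc(y_true, y_score):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")  # undefined when a label has only one class
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def macro_auc(Y_true, Y_score):
    """One AUC per label column, plus their mean over the defined labels."""
    Y_true = np.asarray(Y_true)
    Y_score = np.asarray(Y_score, dtype=float)
    per_label = [binary_auc(Y_true[:, j], Y_score[:, j]) for j in range(Y_true.shape[1])]
    return float(np.nanmean(per_label)), per_label

# Toy example (hypothetical): 4 images, 2 conditions
Y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
Y_score = np.array([[0.9, 0.2], [0.1, 0.8], [0.8, 0.3], [0.3, 0.6]])

macro, per_label = macro_auc(Y_true, Y_score)
print("per-label AUC:", per_label)
print("macro AUC:", macro)
```

Reporting the per-label list alongside the average makes it easier to see which conditions drag the score down, which a single pooled number hides.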

In [None]:
# Training for 3 epochs takes a long time
history = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=3,
    verbose=1
)

# Display final validation AUC
val_auc = history.history['val_AUC'][-1] if 'val_AUC' in history.history else 0.0
print(f"Final Validation AUC-ROC: {val_auc:.4f}")

Epoch 1/3
 710/2713 ━━━━━━━━━━━━━━━━━━━━ 1:49:00 3s/step - AUC: 0.7333 - loss: 0.5164

# Make Predictions or Try Better Hyperparameters and Submit

In [15]:
import tensorflow as tf
print("GPUs Available:", len(tf.config.list_physical_devices('GPU')))


GPUs Available: 1


In [None]:
# ============================================
# 🧠 GRAND X-RAY SLAM - RESNET50 FINAL TRAINING CELL
# ============================================
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau, LambdaCallback
from tensorflow.keras import mixed_precision
import os

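# ----------------------------------------
# ⚙️ Enable mixed precision
# NOTE: the policy only applies to layers created after this call, so the
# model built earlier keeps float32. The Tesla P100 (compute capability 6.0)
# also has no Tensor Cores, so any speedup here is likely limited.
# ----------------------------------------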
mixed_precision.set_global_policy('mixed_float16')
print("✅ Mixed precision enabled for faster GPU performance.")

# ----------------------------------------
# 🧩 Define callbacks for monitoring training
# ----------------------------------------
# The model was compiled with metrics=["AUC"], so Keras logs the metric as
# "AUC" / "val_AUC" (see the progress bar); the monitor names must match.
checkpoint = ModelCheckpoint(
    "best_model.h5",
    monitor="val_AUC",       # monitor validation AUC
    mode="max",
    save_best_only=True,
    verbose=1
)

early_stop = EarlyStopping(
    monitor="val_AUC",
    mode="max",
    patience=5,
    restore_best_weights=True,
    verbose=1
)

reduce_lr = ReduceLROnPlateau(
    monitor="val_AUC",
    mode="max",
    factor=0.5,
    patience=2,
    min_lr=1e-6,
    verbose=1
)

# ----------------------------------------
# 🪄 Live plot callback (updates after each epoch)
# ----------------------------------------
history_data = {'auc': [], 'val_auc': [], 'loss': [], 'val_loss': []}

def on_epoch_end(epoch, logs):
    # Log keys follow the compiled metric name ("AUC"), not lowercase "auc"
    history_data['auc'].append(logs.get('AUC'))
    history_data['val_auc'].append(logs.get('val_AUC'))
    history_data['loss'].append(logs.get('loss'))
    history_data['val_loss'].append(logs.get('val_loss'))

    # Clear previous plots
    plt.clf()
    plt.figure(figsize=(10,4))

    # Plot AUC
    plt.subplot(1,2,1)
    plt.plot(history_data['auc'], label='Train AUC')
    plt.plot(history_data['val_auc'], label='Val AUC')
    plt.xlabel('Epochs'); plt.ylabel('AUC'); plt.legend(); plt.title('Training vs Validation AUC')

    # Plot Loss
    plt.subplot(1,2,2)
    plt.plot(history_data['loss'], label='Train Loss')
    plt.plot(history_data['val_loss'], label='Val Loss')
    plt.xlabel('Epochs'); plt.ylabel('Loss'); plt.legend(); plt.title('Training vs Validation Loss')

    plt.tight_layout()
    plt.show()

live_plot = LambdaCallback(on_epoch_end=on_epoch_end)

# ----------------------------------------
# 🧠 Model was already built and compiled above (see build_resnet_model cell)
# ----------------------------------------
print("✅ Model compiled successfully.")
print(f"Total parameters: {model.count_params():,}")

# ----------------------------------------
# 🚀 Train model
# ----------------------------------------
EPOCHS = 3   # increase later if needed

history = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=EPOCHS,
    callbacks=[checkpoint, early_stop, reduce_lr, live_plot],
    verbose=1,
)

# ----------------------------------------
# 🧾 Evaluate and visualize results
# ----------------------------------------
val_auc = history.history.get('val_AUC', [0])[-1]  # key matches the "AUC" metric name
print(f"\n✅ Final Validation AUC: {val_auc:.4f}")

# Save submission-ready model
model.save("final_resnet_model.h5")
print("✅ Model saved as final_resnet_model.h5")

# ----------------------------------------
# 🔮 Predict on test set and create submission
# ----------------------------------------
print("\nGenerating predictions for submission...")
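model.load_weights("best_model.h5")  # best model from callbacks
# test_df was never defined above; build it from the test image folder
# (assumption: the submission needs one row per file in test2)
test_df = pd.DataFrame({"Image_name": sorted(os.listdir(os.path.join(data_dir, "test2")))})
test_generator = XRayDataGenerator(test_df, batch_size=32, is_test=True)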
preds = model.predict(test_generator, verbose=1)

submission = pd.DataFrame(preds, columns=conditions)
submission.insert(0, "Image_name", test_df["Image_name"].values)
submission.to_csv("submission.csv", index=False)

print("✅ Submission file created successfully: submission.csv")
print(submission.head())


✅ Mixed precision enabled for faster GPU performance.
✅ Model compiled successfully.
Total parameters: 24,115,854
Epoch 1/3


I0000 00:00:1759690698.423282      91 service.cc:148] XLA service 0x7f0d08002b20 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1759690698.426310      91 service.cc:156]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
I0000 00:00:1759690700.025611      91 cuda_dnn.cc:529] Loaded cuDNN version 90300


   1/2713 ━━━━━━━━━━━━━━━━━━━━ 12:24:52 16s/step - AUC: 0.3602 - loss: 1.2275

I0000 00:00:1759690704.364511      91 device_compiler.h:188] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.


 231/2713 ━━━━━━━━━━━━━━━━━━━━ 1:03:22 2s/step - AUC: 0.6361 - loss: 0.6197