# Pneumonia Detection using X-ray Images with Transfer Learning (VGG16)

In this notebook, we build a binary classifier that detects pneumonia from chest X-ray images. The model uses transfer learning with the VGG16 network from Keras, pre-trained on ImageNet.

## Step 1: Import Required Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from tensorflow.keras.preprocessing.image import ImageDataGenerator, array_to_img
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.applications import VGG16
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping
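import warnings
from datetime import datetime
warnings.filterwarnings('ignore')  # Suppress library warnings to keep the output readable
start_time = datetime.now()  # Start of the run; used by the timing report in the final cell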

## Step 2: Hyperparameter Configuration

In [2]:
# Hyperparameters
hyper_dimension = 224  # Input height and width in pixels; 224 is VGG16's standard ImageNet input size
hyper_epochs = 30  # Maximum number of epochs; early stopping may end training sooner
hyper_batch_size = 32  # Images per batch
hyper_channels = 3  # RGB Images for pre-trained models like VGG16
hyper_mode = 'rgb'  # Use RGB color mode for VGG16
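
The batch size also determines how many batches make up one epoch, which is the value `len(train_set)` reports in Step 5. A minimal sketch of that arithmetic, assuming a hypothetical training set of 5,000 images (the real count depends on the dataset directory):

```python
import math

def batches_per_epoch(num_images, batch_size):
    """Batches needed to cover every image once; the last batch may be smaller."""
    return math.ceil(num_images / batch_size)

hyper_batch_size = 32        # same value as in the cell above
example_num_images = 5000    # hypothetical count, for illustration only

steps = batches_per_epoch(example_num_images, hyper_batch_size)
print(steps)  # 157
```

A larger batch size lowers the number of steps per epoch but raises memory use per step.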

## Step 3: Data Augmentation and Preprocessing

In [3]:
# Data Augmentation Settings
train_gen = ImageDataGenerator(rescale=1./255,
                               shear_range=0.2,
                               zoom_range=0.2,
                               rotation_range=20,
                               horizontal_flip=True,
                               fill_mode='nearest')

val_gen = ImageDataGenerator(rescale=1./255)

# Creating training and validation image flows
train_set = train_gen.flow_from_directory('../input/pneumonia-xray-images/train',
                                          target_size=(hyper_dimension, hyper_dimension),
                                          batch_size=hyper_batch_size,
                                          class_mode='binary',
                                          color_mode=hyper_mode)

val_set = val_gen.flow_from_directory('../input/pneumonia-xray-images/val',
                                      target_size=(hyper_dimension, hyper_dimension),
                                      batch_size=hyper_batch_size,
                                      class_mode='binary',
                                      color_mode=hyper_mode)

# Visualizing the first 16 images of a training batch (the 2 x 8 grid has 16 slots)
image_batch = train_set[0][0]
plt.figure(figsize=(20, 5))
for i in range(min(16, len(image_batch))):
    plt.subplot(2, 8, i + 1)
    pil_img = array_to_img(image_batch[i])
    plt.imshow(pil_img)  # The image is RGB, so no colormap is needed
    plt.axis('off')
plt.tight_layout()
plt.show()
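
The `rescale=1./255` argument multiplies every pixel by 1/255, so 8-bit intensities in [0, 255] end up in [0, 1]. A small NumPy sketch of that single step, using a made-up 2x2 image rather than a real X-ray:

```python
import numpy as np

# Hypothetical 2x2 single-channel image with 8-bit intensities
raw_pixels = np.array([[0, 51], [102, 255]], dtype=np.uint8)

# Same operation as rescale=1./255: convert to float, then scale
scaled_pixels = raw_pixels.astype(np.float32) * (1.0 / 255)

print(scaled_pixels.min(), scaled_pixels.max())
```

The random shear, zoom, rotation, and flip are applied only by `train_gen`; the validation generator performs the rescaling alone.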

## Step 4: Model Architecture - Transfer Learning with VGG16

In [4]:
# Initialize the VGG16 model for transfer learning
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(hyper_dimension, hyper_dimension, hyper_channels))
base_model.trainable = False  # Freeze the base model layers

# Building the new top layer for our specific task
classifier = Sequential()
classifier.add(base_model)  # Add pre-trained VGG16
classifier.add(GlobalAveragePooling2D())  # Global Pooling to reduce dimensions
classifier.add(Dense(512, activation='relu'))  # Fully connected layer
classifier.add(Dropout(0.5))  # Dropout for regularization
classifier.add(Dense(1, activation='sigmoid'))  # Sigmoid output for binary classification

# Compile the model
classifier.compile(optimizer=Adam(), loss='binary_crossentropy', metrics=['accuracy'])
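
Because the VGG16 base is frozen, only the new head is trained. A rough parameter count for that head, assuming the standard VGG16 convolutional base whose last block outputs 512 channels, shows how small the trainable part is; `classifier.summary()` can be used to confirm the numbers for the installed Keras version.

```python
def dense_params(num_inputs, num_units):
    """Parameters of a Dense layer: one weight per input-unit pair, plus one bias per unit."""
    return num_inputs * num_units + num_units

# Channels left after GlobalAveragePooling2D (assumes the standard VGG16 base)
vgg16_output_channels = 512

hidden_params = dense_params(vgg16_output_channels, 512)  # Dense(512, activation='relu')
output_params = dense_params(512, 1)                      # Dense(1, activation='sigmoid')

# Pooling and dropout layers have no parameters
trainable_params = hidden_params + output_params

print(trainable_params)  # 263169
```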

## Step 5: Training the Model

In [5]:
# Early stopping to prevent overfitting
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

# Fitting the model
history = classifier.fit(train_set,
                         epochs=hyper_epochs,
                         validation_data=val_set,
                         steps_per_epoch=len(train_set),
                         validation_steps=len(val_set),
                         callbacks=[early_stopping])
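
The effect of `patience=5` can be illustrated without training anything. The sketch below is a simplified pure-Python imitation of the stopping rule, not the Keras implementation; the function name and the loss values are hypothetical.

```python
def simulate_early_stopping(val_losses, patience=5):
    """Return (stopped_epoch, best_epoch), both 0-based, for a list of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Training ran through every epoch without triggering the stop
    return len(val_losses) - 1, best_epoch

# Made-up validation losses: the minimum is at epoch 2, followed by five worse epochs
example_losses = [0.90, 0.70, 0.60, 0.65, 0.61, 0.62, 0.63, 0.64, 0.50]
stopped_epoch, best_epoch = simulate_early_stopping(example_losses, patience=5)
print(stopped_epoch, best_epoch)  # 7 2
```

In this example, training stops at epoch 7 and never reaches the lower loss at epoch 8. With `restore_best_weights=True`, the weights from epoch 2 would be the ones kept.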

## Step 6: Evaluating the Model

In [6]:
# Create test image flow
test_gen = ImageDataGenerator(rescale=1./255)
test_set = test_gen.flow_from_directory('../input/pneumonia-xray-images/test',
                                        target_size=(hyper_dimension, hyper_dimension),
                                        batch_size=1,
                                        class_mode=None,
                                        color_mode=hyper_mode,
                                        shuffle=False)

# Making predictions (sigmoid probabilities for the class with index 1)
probabilities = classifier.predict(test_set, verbose=1).ravel()

# Convert probabilities to binary labels with a 0.5 threshold
predictions = (probabilities > 0.5).astype(int)

# Confusion Matrix
cm = confusion_matrix(test_set.classes, predictions)
cm_df = pd.DataFrame(cm, index=["Actual Normal", "Actual Pneumonia"], columns=["Predicted Normal", "Predicted Pneumonia"])
print("\nConfusion Matrix:\n", cm_df)

# Classification Report
print("\nClassification Report:\n", classification_report(test_set.classes, predictions))

# ROC AUC Score, computed from the probabilities rather than the thresholded labels
auc_score = roc_auc_score(test_set.classes, probabilities)
print("\nAUC Score: ", auc_score)
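
For a screening task, sensitivity and specificity are often easier to interpret than accuracy, and both can be read directly off a 2x2 confusion matrix. A minimal sketch with made-up counts, assuming rows are actual classes, columns are predicted classes, and the class order is [Normal, Pneumonia]:

```python
import numpy as np

# Hypothetical confusion matrix; the counts are illustrative, not results from this model
example_cm = np.array([[80, 20],
                       [10, 190]])

# Row-major order of a binary confusion matrix: TN, FP, FN, TP
tn, fp, fn, tp = example_cm.ravel()

sensitivity = tp / (tp + fn)  # share of pneumonia cases that were detected
specificity = tn / (tn + fp)  # share of normal cases that were correctly cleared

print(f"Sensitivity: {sensitivity:.2f}, Specificity: {specificity:.2f}")
```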

## Step 7: Visualizing Model Performance

In [7]:
# Plotting Training & Validation Accuracy & Loss
plt.figure(figsize=(12, 5))

# Accuracy plot
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

# Loss plot
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

plt.tight_layout()
plt.show()
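
The curves can also be summarized numerically by locating the epoch with the lowest validation loss. A short sketch, using a hypothetical dictionary in place of `history.history`:

```python
import numpy as np

# Hypothetical stand-in for history.history; the values are made up
example_history = {
    'val_loss': [0.52, 0.41, 0.38, 0.40, 0.43],
    'val_accuracy': [0.75, 0.81, 0.88, 0.86, 0.85],
}

# Index of the epoch with the lowest validation loss (0-based)
best_epoch = int(np.argmin(example_history['val_loss']))

print(f"Lowest validation loss at epoch {best_epoch + 1}: "
      f"{example_history['val_loss'][best_epoch]:.2f} "
      f"(validation accuracy {example_history['val_accuracy'][best_epoch]:.2f})")
```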

## Step 8: Conclusion

In this project, we built a binary classification model that detects pneumonia from chest X-ray images using transfer learning with VGG16. Its performance on the test set is summarized by the confusion matrix, classification report, and ROC AUC score from Step 6. Any clinical use would require further validation beyond these metrics.

In [8]:
# End time; the difference covers the whole run (data loading, training, and evaluation)
end_time = datetime.now()
print(f"\nTotal Run Time: {end_time - start_time}")