# Pneumonia Classification

## Project Overview

This project documents the process of creating a machine learning model to classify pneumonia.

Pneumonia has a high mortality rate, and rapid, accurate diagnosis is critical for timely treatment. This project aims to create a reliable automated tool to assist medical professionals.


## Content
1. Data Preprocessing
2. Model Training
3. Model Evaluation


**Dataset:** https://www.kaggle.com/datasets/pcbreviglieri/pneumonia-xray-images

In [1]:
import kagglehub

path = kagglehub.dataset_download("pcbreviglieri/pneumonia-xray-images")

print("Path to dataset files:", path)

Downloading from https://www.kaggle.com/api/v1/datasets/download/pcbreviglieri/pneumonia-xray-images?dataset_version_number=1...


100%|██████████| 1.14G/1.14G [00:54<00:00, 22.6MB/s]

Extracting files...





Path to dataset files: /root/.cache/kagglehub/datasets/pcbreviglieri/pneumonia-xray-images/versions/1


In [2]:
%load_ext tensorboard

In [3]:
import tensorflow as tf
import numpy as np
import random
import os

seed_value = 24
os.environ['PYTHONHASHSEED'] = str(seed_value)
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

BATCH_SIZE = 32
IMAGE_SIZE = (150, 150)
train_dir = f'{path}/train'
valid_dir = f'{path}/val'
test_dir = f'{path}/test'


# Data Preprocessing

### Data Augmentation
To prevent overfitting and build a model that is robust to the real world variations in X-ray imaging, the training images were past through a data augmentation pipeline.
* Horizontal Flips: To ensure the model is not biased towards a specific side (left or right lung).
* Rotations (up to 40 degrees): To account for slight variations in patient positioning during the X-ray procedure.
* Zooms and Translations (up to 20%): To handle differences in patient distance from the scanner and centering within the frame.
* Contrast Adjustments (up to 20%): To make the model resilient to variations in X-ray exposure levels.



### Class Imbalance
Initial analysis and training runs revealed a significant class imbalance, with pneumonia cases outnumbering normal cases by nearly 3-to-1. This was addressed by calculating and applying `class_weight` during model training.

### Training and Evaluation
A robust training and evaluation framework was created:
* **Callbacks:** A comprehensive suite of callbacks was used for every training run:
    * `ModelCheckpoint`: To save only the best version of the model based on validation loss.
    * `EarlyStopping`: To prevent wasting time and energy by stopping training when performance plateaus.
    * `ReduceLROnPlateau`: To automatically stabilize training by reducing the learning rate.
    * `TensorBoard`: For detailed logging and visualization of the training process.

In [4]:
train_data = tf.keras.utils.image_dataset_from_directory(
    train_dir,
    labels="inferred",
    label_mode="binary",
    image_size=IMAGE_SIZE,
    interpolation="nearest",
    batch_size=BATCH_SIZE,
    shuffle=True
)

valid_data = tf.keras.utils.image_dataset_from_directory(
    valid_dir,
    labels="inferred",
    label_mode="binary",
    image_size=IMAGE_SIZE,
    interpolation="nearest",
    batch_size=BATCH_SIZE,
    shuffle=False
)

test_data = tf.keras.utils.image_dataset_from_directory(
    test_dir,
    labels="inferred",
    label_mode="binary",
    image_size=IMAGE_SIZE,
    interpolation="nearest",
    batch_size=BATCH_SIZE,
    shuffle=False
)

Found 4192 files belonging to 2 classes.
Found 1040 files belonging to 2 classes.
Found 624 files belonging to 2 classes.


In [5]:
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(factor=(-40/360, 40/360)),
    tf.keras.layers.RandomZoom(height_factor=0.2, width_factor=0.2),
    tf.keras.layers.RandomTranslation(height_factor=0.2, width_factor=0.2),
    tf.keras.layers.RandomContrast(factor=0.2),
])

In [6]:
train_data = train_data.map(
    lambda x, y: (data_augmentation(x, training=True), y),
    num_parallel_calls=tf.data.AUTOTUNE
)

In [7]:
AUTOTUNE = tf.data.AUTOTUNE

def configure_dataset(ds):
    ds = ds.map(lambda x, y: (tf.cast(x, tf.float32) / 255.0, y), num_parallel_calls=AUTOTUNE)
    ds = ds.cache()
    ds = ds.prefetch(buffer_size=AUTOTUNE)
    return ds

train_data = configure_dataset(train_data)
valid_data = configure_dataset(valid_data)
test_data = configure_dataset(test_data)

In [8]:
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, TensorBoard, ReduceLROnPlateau
import datetime

def create_callbacks(model_name):
    log_dir = f"logs/fit/{model_name}_{datetime.datetime.now().strftime('%Y%m%d-%H%M%S')}"

    callbacks = [
        EarlyStopping(monitor='val_loss', patience=10),
        ModelCheckpoint(
            filepath=f'best_{model_name}.keras',
            monitor='val_loss',
            save_best_only=True,
            verbose=1
        ),
        ReduceLROnPlateau(
            monitor='val_loss',
            factor=0.2,
            patience=3,
            min_lr=0.00001
        ),
        TensorBoard(log_dir=log_dir),
    ]
    return callbacks

In [9]:
from tensorflow.keras.models import load_model

def evaluate_best_model(model_name):
  model = load_model(f"best_{model_name}.keras")
  test_results = model.evaluate(test_data)
  print("Test Set Evaluation:")
  print(f"Loss: {test_results[0]}")
  print(f"Accuracy: {test_results[1]}")
  print(f"Precision: {test_results[2]}")
  print(f"Recall: {test_results[3]}")
  print(f"F1-Score: {test_results[4]}")

In [10]:
total_samples = 4192
num_normal = 1082
num_opacity = 3110

weight_for_0 = (1 / num_normal) * (total_samples / 2.0)
weight_for_1 = (1 / num_opacity) * (total_samples / 2.0)

class_weights_dict = {0: weight_for_0, 1: weight_for_1}

print(f"Weight for class 0 (Normal): {weight_for_0:.2f}")
print(f"Weight for class 1 (Opacity): {weight_for_1:.2f}")

Weight for class 0 (Normal): 1.94
Weight for class 1 (Opacity): 0.67


# Models

Several model architectures were developed and tested to find the best-performing solution.

1.  **Custom CNN (`model_1`):** A custom CNN using a `Flatten` layer. This served as the first baseline.
2.  **Custom CNN with GAP (`model_2`):** An improved version of the custom model that replaced the `Flatten` layer with `GlobalAveragePooling2D` to reduce parameters and combat overfitting.
3.  **Transfer Learning (`transfer_model_1`):** A `EfficientNetB0` model was used as a feature extractor.
4.  **Fine-Tuned Transfer Learning:** The final step, where the top layers of the `EfficientNetB0` model were unfrozen and trained on the dataset to maximize performance.

In [11]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense, BatchNormalization, Dropout, Activation
from tensorflow.keras.optimizers import AdamW
from tensorflow.keras.losses import BinaryCrossentropy

model_1 = Sequential([

    Conv2D(64, 3, padding='same', use_bias=False),
    BatchNormalization(),
    Activation('relu'),
    Conv2D(64, 3, padding='same', use_bias=False),
    BatchNormalization(),
    Activation('relu'),
    MaxPool2D(2),

    Conv2D(128, 3, padding='same', use_bias=False),
    BatchNormalization(),
    Activation('relu'),
    Conv2D(128, 3, padding='same', use_bias=False),
    BatchNormalization(),
    Activation('relu'),
    MaxPool2D(2),
    Dropout(0.2),

    Conv2D(256, 3, padding='same', use_bias=False),
    BatchNormalization(),
    Activation('relu'),
    Conv2D(256, 3, padding='same', use_bias=False),
    BatchNormalization(),
    Activation('relu'),
    MaxPool2D(2),
    Dropout(0.3),


    Flatten(),
    Dense(128, use_bias=False),
    BatchNormalization(),
    Activation('relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])

model_1.compile(
    optimizer=AdamW(),
    loss=BinaryCrossentropy(),
    metrics=[
        'accuracy',
        tf.keras.metrics.Precision(name='precision'),
        tf.keras.metrics.Recall(name='recall'),
        tf.keras.metrics.F1Score(name='f1_score', threshold=0.5)
    ]
)

model_1.summary()

model_1_callbacks = create_callbacks("model_1")

In [12]:
model_1.fit(train_data, epochs=25, validation_data=test_data, class_weight=class_weights_dict, callbacks=model_1_callbacks)

Epoch 1/25
[1m131/131[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 353ms/step - accuracy: 0.8065 - f1_score: 0.8568 - loss: 0.4956 - precision: 0.9432 - recall: 0.7854
Epoch 1: val_loss improved from inf to 2.73412, saving model to best_model_1.keras
[1m131/131[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m87s[0m 425ms/step - accuracy: 0.8066 - f1_score: 0.8569 - loss: 0.4952 - precision: 0.9432 - recall: 0.7856 - val_accuracy: 0.6250 - val_f1_score: 0.7692 - val_loss: 2.7341 - val_precision: 0.6250 - val_recall: 1.0000 - learning_rate: 0.0010
Epoch 2/25
[1m131/131[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 153ms/step - accuracy: 0.8700 - f1_score: 0.9068 - loss: 0.2876 - precision: 0.9640 - recall: 0.8560
Epoch 2: val_loss improved from 2.73412 to 2.38209, saving model to best_model_1.keras
[1m131/131[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m22s[0m 165ms/step - accuracy: 0.8700 - f1_score: 0.9068 - loss: 0.2877 - precision: 0.9640 - recall: 0.8560 - va

<keras.src.callbacks.history.History at 0x7d08416faed0>

In [13]:
evaluate_best_model("model_1")

[1m20/20[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 78ms/step - accuracy: 0.8603 - f1_score: 0.5437 - loss: 0.4777 - precision: 0.5177 - recall: 0.5857
Test Set Evaluation:
Loss: 0.400118887424469
Accuracy: 0.8733974099159241
Precision: 0.9018087983131409
Recall: 0.8948717713356018
F1-Score: 0.8983268737792969


In [14]:
from tensorflow.keras.layers import GlobalAveragePooling2D

model_2 = Sequential([
    Conv2D(64, 3, padding='same', use_bias=False),
    BatchNormalization(),
    Activation('relu'),
    Conv2D(64, 3, padding='same', use_bias=False),
    BatchNormalization(),
    Activation('relu'),
    MaxPool2D(2),

    Conv2D(128, 3, padding='same', use_bias=False),
    BatchNormalization(),
    Activation('relu'),
    Conv2D(128, 3, padding='same', use_bias=False),
    BatchNormalization(),
    Activation('relu'),
    MaxPool2D(2),
    Dropout(0.2),

    Conv2D(256, 3, padding='same', use_bias=False),
    BatchNormalization(),
    Activation('relu'),
    Conv2D(256, 3, padding='same', use_bias=False),
    BatchNormalization(),
    Activation('relu'),
    MaxPool2D(2),
    Dropout(0.3),


    GlobalAveragePooling2D(),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])

model_2.compile(
    optimizer=AdamW(),
    loss=BinaryCrossentropy(),
    metrics=[
        'accuracy',
        tf.keras.metrics.Precision(name='precision'),
        tf.keras.metrics.Recall(name='recall'),
        tf.keras.metrics.F1Score(name='f1_score', threshold=0.5)
    ]
)

model_2.summary()

model_2_callbacks = create_callbacks("model_2")

In [15]:
model_2.fit(train_data, epochs=25, validation_data=test_data, class_weight=class_weights_dict, callbacks=model_2_callbacks)

Epoch 1/25
[1m131/131[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 149ms/step - accuracy: 0.8099 - f1_score: 0.8625 - loss: 0.4866 - precision: 0.9258 - recall: 0.8075
Epoch 1: val_loss improved from inf to 1.81958, saving model to best_model_2.keras
[1m131/131[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m33s[0m 170ms/step - accuracy: 0.8100 - f1_score: 0.8626 - loss: 0.4862 - precision: 0.9259 - recall: 0.8076 - val_accuracy: 0.6250 - val_f1_score: 0.7692 - val_loss: 1.8196 - val_precision: 0.6250 - val_recall: 1.0000 - learning_rate: 0.0010
Epoch 2/25
[1m131/131[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 152ms/step - accuracy: 0.8604 - f1_score: 0.9000 - loss: 0.3359 - precision: 0.9557 - recall: 0.8505
Epoch 2: val_loss improved from 1.81958 to 0.77853, saving model to best_model_2.keras
[1m131/131[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m21s[0m 159ms/step - accuracy: 0.8604 - f1_score: 0.9000 - loss: 0.3361 - precision: 0.9557 - recall: 0.8504 - va

<keras.src.callbacks.history.History at 0x7d07cd3ff4d0>

In [16]:
evaluate_best_model("model_2")

[1m20/20[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 77ms/step - accuracy: 0.8636 - f1_score: 0.5532 - loss: 0.3500 - precision: 0.5155 - recall: 0.6140
Test Set Evaluation:
Loss: 0.28856104612350464
Accuracy: 0.8814102411270142
Precision: 0.8969849348068237
Recall: 0.9153845906257629
F1-Score: 0.9060912728309631


In [17]:
IMAGE_SIZE = (224, 224)

train_data = tf.keras.utils.image_dataset_from_directory(
    train_dir,
    labels="inferred",
    label_mode="binary",
    image_size=IMAGE_SIZE,
    interpolation="nearest",
    batch_size=BATCH_SIZE,
    shuffle=True
)

valid_data = tf.keras.utils.image_dataset_from_directory(
    valid_dir,
    labels="inferred",
    label_mode="binary",
    image_size=IMAGE_SIZE,
    interpolation="nearest",
    batch_size=BATCH_SIZE,
    shuffle=False
)

test_data = tf.keras.utils.image_dataset_from_directory(
    test_dir,
    labels="inferred",
    label_mode="binary",
    image_size=IMAGE_SIZE,
    interpolation="nearest",
    batch_size=BATCH_SIZE,
    shuffle=False
)

Found 4192 files belonging to 2 classes.
Found 1040 files belonging to 2 classes.
Found 624 files belonging to 2 classes.


In [18]:
from tensorflow.keras.applications import EfficientNetB0



base_model = EfficientNetB0(
    weights="imagenet",
    include_top=False,
    input_shape=(*IMAGE_SIZE, 3)
)

base_model.trainable = False

transfer_model_1 = Sequential([
    base_model,
    GlobalAveragePooling2D(),
    Dropout(0.5),
    Dense(1, activation="sigmoid")
])

transfer_model_1.compile(
    optimizer=AdamW(),
    loss=BinaryCrossentropy(),
    metrics=[
        'accuracy',
        tf.keras.metrics.Precision(name='precision'),
        tf.keras.metrics.Recall(name='recall'),
        tf.keras.metrics.F1Score(name='f1_score', threshold=0.5)
    ]
)


transfer_model_1_callbacks = create_callbacks("transfer_model_1")

transfer_model_1.summary

Downloading data from https://storage.googleapis.com/keras-applications/efficientnetb0_notop.h5
[1m16705208/16705208[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 0us/step


In [19]:
history = transfer_model_1.fit(train_data, epochs=25, validation_data=test_data, class_weight=class_weights_dict, callbacks=transfer_model_1_callbacks)

Epoch 1/25
[1m131/131[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 198ms/step - accuracy: 0.7203 - f1_score: 0.7922 - loss: 0.5318 - precision: 0.8826 - recall: 0.7198
Epoch 1: val_loss improved from inf to 0.36029, saving model to best_transfer_model_1.keras
[1m131/131[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m77s[0m 359ms/step - accuracy: 0.7209 - f1_score: 0.7927 - loss: 0.5309 - precision: 0.8830 - recall: 0.7204 - val_accuracy: 0.8317 - val_f1_score: 0.8736 - val_loss: 0.3603 - val_precision: 0.8231 - val_recall: 0.9308 - learning_rate: 0.0010
Epoch 2/25
[1m131/131[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 218ms/step - accuracy: 0.8802 - f1_score: 0.9150 - loss: 0.2793 - precision: 0.9692 - recall: 0.8665
Epoch 2: val_loss improved from 0.36029 to 0.34152, saving model to best_transfer_model_1.keras
[1m131/131[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m34s[0m 263ms/step - accuracy: 0.8803 - f1_score: 0.9150 - loss: 0.2792 - precision: 0.9692 - r

In [20]:
evaluate_best_model("transfer_model_1")

[1m20/20[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m17s[0m 328ms/step - accuracy: 0.7888 - f1_score: 0.5159 - loss: 0.4486 - precision: 0.4576 - recall: 0.6176
Test Set Evaluation:
Loss: 0.3244757652282715
Accuracy: 0.8493589758872986
Precision: 0.8410138487815857
Recall: 0.9358974099159241
F1-Score: 0.8859223127365112


In [21]:
best_transfer_model = tf.keras.models.load_model('best_transfer_model_1.keras')

best_transfer_model.get_layer('efficientnetb0').trainable = True
for layer in best_transfer_model.get_layer('efficientnetb0').layers[:-20]:
    layer.trainable = False


best_transfer_model.compile(
    optimizer=AdamW(learning_rate=1e-5),
    loss=BinaryCrossentropy(),
    metrics=[
        'accuracy',
        tf.keras.metrics.Precision(name='precision'),
        tf.keras.metrics.Recall(name='recall'),
        tf.keras.metrics.F1Score(name='f1_score', threshold=0.5)
    ]
)

In [22]:
initial_epochs = len(history.history['loss'])

fine_tune_epochs = 10
total_epochs = initial_epochs + fine_tune_epochs

history_fine_tune = best_transfer_model.fit(
    train_data,
    epochs=total_epochs,
    initial_epoch=history.epoch[-1],
    validation_data=valid_data,
    class_weight=class_weights_dict,
    callbacks=create_callbacks("transfer_model_1_finetuned")
)

Epoch 13/23
[1m131/131[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 195ms/step - accuracy: 0.8258 - f1_score: 0.8690 - loss: 0.3177 - precision: 0.9902 - recall: 0.7743
Epoch 13: val_loss improved from inf to 0.27134, saving model to best_transfer_model_1_finetuned.keras
[1m131/131[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m77s[0m 375ms/step - accuracy: 0.8259 - f1_score: 0.8691 - loss: 0.3176 - precision: 0.9902 - recall: 0.7745 - val_accuracy: 0.9048 - val_f1_score: 0.9317 - val_loss: 0.2713 - val_precision: 0.9985 - val_recall: 0.8732 - learning_rate: 1.0000e-05
Epoch 14/23
[1m131/131[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 207ms/step - accuracy: 0.8716 - f1_score: 0.9068 - loss: 0.2533 - precision: 0.9945 - recall: 0.8337
Epoch 14: val_loss improved from 0.27134 to 0.20698, saving model to best_transfer_model_1_finetuned.keras
[1m131/131[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m38s[0m 291ms/step - accuracy: 0.8717 - f1_score: 0.9069 - loss: 0.

In [23]:
evaluate_best_model("transfer_model_1_finetuned")

[1m20/20[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m18s[0m 365ms/step - accuracy: 0.7776 - f1_score: 0.5189 - loss: 0.5175 - precision: 0.4450 - recall: 0.6539
Test Set Evaluation:
Loss: 0.3387572467327118
Accuracy: 0.8589743375778198
Precision: 0.8254310488700867
Recall: 0.9820512533187866
F1-Score: 0.8969554901123047


# Final Results and Conclusion


### Definitive Model Comparison

| Metric | Model_1 (CNN w/ Flatten) | Model_2 (CNN w/ GAP) | Transfer Learning Model_1 | Fine-Tuned Transfer Model |
| :--- | :--- | :--- | :--- | :--- |
| **Accuracy** | 87.3% | 88.1% | 84.9% | **85.9%** |
| **Precision**| 90.2% | 89.7% | 84.1% | **82.5%** |
| **Recall** | 89.5% | 91.5% | 93.6% | **98.2%** |
| **F1-Score**| 89.8% | 90.6% | 88.6% | **89.7%** |


### Conclusion

Model_2 (CNN w/ Global Average Pooling) obtained the most balance performance, achieving the highest F1-score (90.6%) and accuracy (88.1%).

In contrast, the Fine-Tuned Transfer Model obtained the highest recall (98.2%). This high sensitivity minimises the chance of a false negative, making it an ideal model for a screening tool, which would be used to flag potentially problematic X-ray scans for an expert to review.
