### Summary
<pre>
Author           : Amulya Tandur
Project Name     : Detection of Pulmonary Infection using Deep Convolutional Neural Network & Transfer Learning
Method           : 
Tools/Library    : Python, Keras, PyTorch, TensorFlow
Last Update      : 30.08.2025
Comments         : Please use Anaconda editor for convenience of visualization.
</pre>

#### Code
<pre>
GitHub Link      : <a href=https://github.com/AmulyaTandur/Pneumonia_Detection-using-RESNET50>Detection of Pneumonia from Chest X-Ray Images(GitHub)</a>
</pre>

#### Dataset
<pre>
Dataset Name     : Chest X-Ray Images (Pneumonia)
Dataset Link     : <a href=https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia/data>Chest X-Ray Images (Pneumonia) Dataset (Kaggle)</a>
                 
Original Paper   : <a href=https://www.cell.com/cell/fulltext/S0092-8674(18)30154-5>Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning</a>
                   (Daniel S. Kermany, Michael Goldbaum, Wenjia Cai, M. Anthony Lewis, Huimin Xia, Kang Zhang)
                   https://www.cell.com/cell/fulltext/S0092-8674(18)30154-5
</pre>

<!---
#### Library/Tools Version
- Python - v3.6.7
- argparse
- random
- numpy
- shutil
- gc
- re
- Keras - 2.2.4
- Keras-preprocessing - v1.0.5
- TensorFlow - 1.12
- PIL/Pillow - 5.1.0
- Matplotlib - 2.2.2
- scikit-learn - 0.19.1
- mlxtend - 0.14.0
-->

#### Commands / Running Instruction
<pre>
tensorboard --logdir=logs
%config IPCompleter.greedy=True
</pre>

<pre>
<b>Dataset Details</b>
Dataset Name            : Chest X-Ray Images (Pneumonia)
Number of Class         : 2
Number/Size of Images   : Total      : 5856 (1.15 Gigabyte (GB))
                          Training   : 5216 (1.07 Gigabyte (GB))
                          Validation : 320  (42.8 Megabyte (MB))
                          Testing    : 320  (35.4 Megabyte (MB))

<b>Model Parameters</b>
Machine Learning Library: Keras
Base Model              : RESNET50 & Custom Deep Convolutional Neural Network
Optimizers              : Adam
Loss Function           : categorical_crossentropy

<b>For Custom Deep Convolutional Neural Network : </b>
<b>Training Parameters</b>
Batch Size              : 32
Number of Epochs        : 15
Training Time           : 2.5 Hours

<b>Output (Prediction/ Recognition / Classification Metrics)</b>
<!--<b>Validation</b>-->
<b>Testing</b>
Accuracy                : 88%
F1 Score (Pneumonia)    : 90.2%
Loss                    : 0.28
Precision               : 88.0 %
Recall (Pneumonia)      : 92.6 % (For positive class)
<!--Specificity             : -->
</pre>

### Detailed Classification Report
<pre>
              precision    recall  f1-score   support

      NORMAL       0.86      0.79      0.83       234
   PNEUMONIA       0.88      0.93      0.90       390

    accuracy                           0.88       624
   macro avg       0.87      0.86      0.86       624
weighted avg       0.87      0.88      0.87       624
    
The model reaches 88% accuracy on the 624 test images, with an F1-score of about 90.2% for the PNEUMONIA class.
</pre>
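
The figures in the report can be cross-checked by hand. The sketch below recomputes the F1-score from the precision and recall quoted in the summary, and the macro and weighted recall averages from the per-class rows. The helper names are illustrative, and because the inputs are already rounded, the recomputed values can differ from the report in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_average(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


def weighted_average(values, supports):
    """Mean over classes, weighted by the number of true samples per class."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


# Precision and recall for the PNEUMONIA class, as quoted in the summary
pneumonia_f1 = f1_score(0.88, 0.926)

# Per-class recall and support from the report rows (NORMAL, PNEUMONIA)
recalls = [0.79, 0.93]
supports = [234, 390]
macro_recall = macro_average(recalls)
weighted_recall = weighted_average(recalls, supports)

print(f"F1 (PNEUMONIA)  : {pneumonia_f1:.3f}")
print(f"Macro recall    : {macro_recall:.4f}")
print(f"Weighted recall : {weighted_recall:.4f}")
```

The F1 value comes out at roughly 0.902, in line with the 90.2% in the summary. The weighted recall is about 0.8775, which the report shows as 0.88; weighted recall is by definition the same quantity as overall accuracy.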

## Import Libraries

In [1]:
import os,shutil 
import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
import tensorflow as tf
import seaborn as sns 

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.callbacks import Callback, EarlyStopping
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.models import load_model
from tensorflow.keras.layers import Dense,Conv2D,MaxPooling2D,Flatten, GlobalAveragePooling2D, Dropout
from tensorflow.keras.optimizers import Adam

from sklearn.metrics import classification_report, confusion_matrix,roc_curve, auc

from keras.preprocessing import image


In [2]:
# Load every readable image of one split (train, test or val) into NumPy arrays
import cv2

def get_data(data_dir):     # User defined function to fetch data from test, train or val
    data = []
    labels = ['PNEUMONIA', 'NORMAL']
    img_size = 128

    for label in labels: 
        # e.g. 'chest_xray/train/PNEUMONIA' when data_dir points at the train split
        path = os.path.join(data_dir, label)
        class_num = labels.index(label)    # PNEUMONIA -> 0, NORMAL -> 1 (opposite of flow_from_directory's alphabetical order)
        
        for img in os.listdir(path):             
            try:
                img_path = os.path.join(path, img)
                img_array = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)

                # Check if image was read successfully
                if img_array is None:
                    print(f"Warning: Unable to read image: {img_path}")
                    continue

                resized_arr = cv2.resize(img_array, (img_size, img_size))
                data.append([resized_arr, class_num])
            except Exception as e:                   
                print(f"Error: {e} on image {img_path}")

    X = []
    y = []
    for feature, label in data:
        X.append(feature)
        y.append(label)

    X = np.array(X).reshape(-1, img_size, img_size, 1) / 255.0
    y = np.array(y)
    return X, y

### Exploratory Data Analysis
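
The plotting cells in this section refer to `pneumonia_dir`, `pneumonia`, `normal_dir` and `normal`, which are not defined in the cells shown here. A minimal sketch of how such listings could be built is given below. `list_images` and `class_listing` are hypothetical helpers, and the folder layout (`train/PNEUMONIA`, `train/NORMAL`) is assumed from the dataset structure used elsewhere in this notebook.

```python
import os

IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')


def list_images(directory):
    """Return the sorted image file names found directly inside `directory`."""
    return sorted(
        name for name in os.listdir(directory)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )


def class_listing(split_dir, class_name):
    """Return (class folder path, image file names) for one class of a split."""
    class_dir = os.path.join(split_dir, class_name)
    return class_dir, list_images(class_dir)


# Hypothetical usage (train_dir is the training folder path used later on):
# pneumonia_dir, pneumonia = class_listing(train_dir, 'PNEUMONIA')
# normal_dir, normal = class_listing(train_dir, 'NORMAL')
```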

Visualize images of Pneumonia patients

In [6]:
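plt.figure(figsize=(20,10))

# pneumonia_dir and pneumonia (its list of file names) must be defined in an earlier cell
for i in range(9):
    plt.subplot(3,3,i+1)    # i+1 -> position in the 3x3 grid
    img = plt.imread(os.path.join(pneumonia_dir, pneumonia[i]))
    plt.imshow(img, cmap='gray')    # draw the image; without this the subplots stay empty
    plt.axis('off')
    plt.title('X-ray of Pneumonia Patients')
plt.tight_layout()
plt.show()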

Visualize images of Normal Individuals

In [None]:
plt.figure (figsize=(20,10))

for i in range(9):
    plt.subplot(3,3,i+1)   
    image_2= plt.imread(os.path.join(normal_dir, normal[i]))  
    plt.imshow(image_2, cmap='gray')
    plt.axis('off')
    plt.title('X-ray of Normal Individual ')
plt.tight_layout()
plt.show()

 Identify classes present in the Training directory and Visualize

In [None]:
# Path to training folder
train_dir = r"D:\Deep Learning\Kaggle_Pneumonia detection\chest_xray\chest_xray\train"

# Get list of class names (folders)
class_names = [folder for folder in os.listdir(train_dir) if os.path.isdir(os.path.join(train_dir, folder))]

class_counts = {}   

for class_name in class_names:
    class_path = os.path.join(train_dir, class_name)
    image_files = [img for img in os.listdir(class_path) if img.lower().endswith(('.jpg', '.jpeg', '.png'))]
    class_counts[class_name] = len(image_files)

print("Classes found and their counts:")
for label, count in class_counts.items():
    print(f"{label}: {count}")

In [None]:
import matplotlib.pyplot as plt

# Plotting class distribution
plt.figure(figsize=(6, 4))
plt.bar(class_counts.keys(), class_counts.values(), color=['skyblue', 'salmon'])
plt.title('Class Distribution in Training Data')
plt.xlabel('Class')
plt.ylabel('Number of Images')
plt.grid(False)
plt.tight_layout()
plt.show()


### Image Preprocessing / Augmentation / Transformation for the Training, Validation and Testing Datasets

In [2]:
# batch_size = 32
# target_size = (224, 224)
# color_mode = "rgb"

train_datagen = ImageDataGenerator(
    rescale=1./255,
    width_shift_range = 0.1,      
    height_shift_range = 0.1,
    zoom_range=0.2,
    shear_range=0.2,
    horizontal_flip=True,
    validation_split = 0.2     # has no effect unless subset='training'/'validation' is passed to flow_from_directory
)

test_datagen = ImageDataGenerator(rescale=1./255) # For test set

validation_datagen  = ImageDataGenerator(rescale=1./255)

In [None]:
train_generator = train_datagen.flow_from_directory(r"D:\Deep Learning\Kaggle_Pneumonia detection\chest_xray\chest_xray\train",
    target_size=(224, 224),   # Resizes all images to 224x224 pixels
    batch_size= 32,
    class_mode='categorical',
    shuffle =True,
    seed =42 ,
    color_mode ='rgb'
)

validation_generator = validation_datagen.flow_from_directory(r'D:\Deep Learning\Kaggle_Pneumonia detection\chest_xray\chest_xray\val',
    target_size=(224, 224),
    batch_size= 32,
    class_mode='categorical',
    shuffle =False,
    seed =42,
    color_mode ='rgb'
)

test_generator = test_datagen.flow_from_directory(r'D:\Deep Learning\Kaggle_Pneumonia detection\chest_xray\chest_xray\test',
    target_size=(224, 224),
    batch_size= 32,
    class_mode='categorical',
    shuffle=False,
    seed =42,
    color_mode ='rgb'                                              
)

Compute Class weights 

In [16]:
from sklearn.utils import class_weight

class_weights = class_weight.compute_class_weight(
    class_weight='balanced',
    classes=np.unique(train_generator.classes),
    y=train_generator.classes
)

class_weight_dict = dict(enumerate(class_weights))  
print("Class weights:", class_weight_dict)      
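
For reference, the `'balanced'` option weights each class by `n_samples / (n_classes * class_count)`, so the rarer class gets the larger weight. The pure-Python sketch below reproduces that formula on a made-up label list. `balanced_class_weights` is an illustrative name, and the toy labels are not taken from the dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * count of that class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        cls: n_samples / (n_classes * count)
        for cls, count in sorted(counts.items())
    }


# Toy labels (invented): one sample of class 0, three of class 1
toy_labels = [0, 1, 1, 1]
toy_weights = balanced_class_weights(toy_labels)
print(toy_weights)
```

In this toy case the minority class 0 receives a weight of 2.0 and the majority class 1 a weight of about 0.67, so each class contributes equally to the loss overall.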


### Transfer learning using ResNet50 Architecture

In [None]:
from tensorflow.keras.layers import BatchNormalization, Activation

# 1) Load base model
base_model = ResNet50(weights='imagenet', 
                      include_top=False, 
                      input_shape=(224, 224, 3))  

# 2) Freeze it initially
for layer in base_model.layers:
    layer.trainable = False

# 3) Add custom layers on top
x = base_model.output
x = GlobalAveragePooling2D()(x)

x = BatchNormalization()(x)             
x = Dense(128)(x)                       
x = BatchNormalization()(x)             
x = Activation('relu')(x)    
x = Dropout(0.3)(x)      

# 4) Output layer for 2 classes 
out = Dense(2, activation='softmax')(x)

# 5) Final model 
model_1 = Model(inputs=base_model.input,outputs= out)

# 6) Compile
model_1.compile(optimizer=Adam(learning_rate=1e-4),   # learning_rate=0.0001
              loss='categorical_crossentropy',
              metrics=['accuracy'] )


model_1.summary()
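
As a rough sanity check on `model_1.summary()`, the size of the custom head can be worked out by hand. The sketch assumes the standard Keras accounting (a `Dense` layer has `inputs * units + units` parameters; a `BatchNormalization` layer has four per feature, of which only the scale and offset are trainable) and that the pooled ResNet50 output has 2048 features. The function names are illustrative.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def batchnorm_params(n_features):
    """Return (trainable, non_trainable): gamma/beta vs. moving mean/variance."""
    return 2 * n_features, 2 * n_features


POOLED_FEATURES = 2048  # channels of the ResNet50 output after global average pooling

bn1_train, bn1_frozen = batchnorm_params(POOLED_FEATURES)
dense1 = dense_params(POOLED_FEATURES, 128)
bn2_train, bn2_frozen = batchnorm_params(128)
output = dense_params(128, 2)

# Pooling, activation and dropout layers carry no parameters
head_trainable = bn1_train + dense1 + bn2_train + output
head_non_trainable = bn1_frozen + bn2_frozen

print("Trainable head parameters    :", head_trainable)
print("Non-trainable head parameters:", head_non_trainable)
```

With the base network frozen, these head weights are the only ones updated during training.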

### Callbacks 

In [None]:
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau

filepath ='model_1.h5'

early_stopping = EarlyStopping(monitor="val_loss",
                               patience=3,
                               restore_best_weights=True )

checkpoint =ModelCheckpoint(filepath,
                            monitor="val_loss",    
                            save_best_only=True,
                            mode="auto",
                            save_freq="epoch")

reduce_lr = ReduceLROnPlateau(monitor="val_accuracy",  
                              patience=3,
                              verbose=1, 
                              factor=0.5,    
                              min_lr=0.0001)   # note: same value as the initial learning rate (1e-4)
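
One detail worth checking in this configuration: `ReduceLROnPlateau` never lowers the rate below `min_lr`, and here `min_lr` equals the optimizer's starting rate of 1e-4. The sketch below mimics only the clamping step, new rate = max(old rate * factor, min_lr), in plain Python. It is a simplification rather than the Keras implementation, and the alternative floor of 1e-6 is a hypothetical value chosen for comparison.

```python
def reduced_lr(current_lr, factor, min_lr):
    """Learning rate after one plateau-triggered reduction, clamped at min_lr."""
    return max(current_lr * factor, min_lr)


# Values from the callback above: the rate cannot move
as_configured = reduced_lr(1e-4, factor=0.5, min_lr=1e-4)

# Hypothetical lower floor: the rate is halved
with_lower_floor = reduced_lr(1e-4, factor=0.5, min_lr=1e-6)

print(as_configured)
print(with_lower_floor)
```

Under this simplification the configured callback leaves the learning rate unchanged, so a floor below the starting rate would be needed for the reduction to take effect.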

In [None]:
model_1.compile(
    optimizer=Adam(learning_rate=1e-4),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
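
The cells shown here do not include the training call itself, although a later cell plots a `history_ft` object. A minimal sketch of what that step might look like is given below. `train_model` is a hypothetical wrapper, the default of 15 epochs is borrowed from the training parameters listed in the summary, and wiring in the class weights and all three callbacks is an assumption about the intended setup.

```python
def train_model(model, train_data, val_data, callbacks, class_weight=None, epochs=15):
    """Fit a compiled Keras model and return its History object."""
    return model.fit(
        train_data,
        validation_data=val_data,
        epochs=epochs,
        callbacks=callbacks,
        class_weight=class_weight,
    )


# Hypothetical usage with the objects defined in this notebook:
# history = train_model(model_1, train_generator, validation_generator,
#                       [early_stopping, checkpoint, reduce_lr],
#                       class_weight=class_weight_dict)
```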

In [None]:
# Save the full model (architecture + weights) in the .keras format
save_path = r"D:\Deep Learning\Kaggle_Pneumonia detection\model_weights"

os.makedirs(save_path, exist_ok=True)    # create the folder if it does not exist yet

model_1.save(os.path.join(save_path, "resnet50_model_1.keras"), overwrite=True)

Evaluation Metrics for the model_1

In [None]:
from keras.models import load_model
model_1.load_weights(r"D:\Deep Learning\Kaggle_Pneumonia detection\model_weights\resnet50_model_1.keras")

#Evaluation metrics for model.01
resnet_val_eval_01 = model_1.evaluate(validation_generator)
resnet_test_eval_01 = model_1.evaluate(test_generator) 

In [None]:
# Printing out the output 

print(f"Validation Loss: {resnet_val_eval_01[0]}")
print(f"Validation Accuracy: {resnet_val_eval_01[1]}")
print(f"Test Loss: {resnet_test_eval_01[0]}")
print(f"Test Accuracy: {resnet_test_eval_01[1]}")    

### Training/Fine-Tuning Base Model

In [None]:
base_model = ResNet50(weights='imagenet', 
                      include_top=False, 
                      input_shape=(224, 224, 3))  

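# 2) Freeze everything except the last 30 layers, which are fine-tuned
for layer in base_model.layers[:-30]:
    layer.trainable = False
for layer in base_model.layers[-30:]:
    layer.trainable = True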

# 3) Add custom layers on top
x = base_model.output
x = GlobalAveragePooling2D()(x)

x = BatchNormalization()(x)             
x = Dense(128)(x)                       
x = BatchNormalization()(x)             
x = Activation('relu')(x)    
x = Dropout(0.3)(x)      

# 4) Output layer for 2 classes 
out = Dense(2, activation='softmax')(x)    # softmax pairs with categorical_crossentropy, as in model_1

# 5) Final model 
model_2 = Model(inputs=base_model.input,outputs= out)

# 6) Compile
model_2.compile(optimizer=Adam(learning_rate=1e-4),   # learning_rate=0.0001
              loss='categorical_crossentropy',
              metrics=['accuracy'] )


model_2.summary()
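
Partial unfreezing hinges on slicing the layer list: `layers[-30]` is a single layer, while `layers[-30:]` is the last 30. The sketch below shows the pattern on stand-in objects so it runs without TensorFlow. `FakeLayer` and `unfreeze_last` are hypothetical names, and the count of 50 stand-in layers is arbitrary.

```python
class FakeLayer:
    """Stand-in for a Keras layer; only the `trainable` flag matters here."""

    def __init__(self, name):
        self.name = name
        self.trainable = True


def unfreeze_last(layers, n):
    """Freeze every layer, then re-enable training for the last `n` of them."""
    for layer in layers:
        layer.trainable = False
    for layer in layers[-n:]:
        layer.trainable = True


fake_layers = [FakeLayer(f"layer_{i}") for i in range(50)]
unfreeze_last(fake_layers, 30)

n_trainable = sum(layer.trainable for layer in fake_layers)
print(n_trainable, "of", len(fake_layers), "layers are trainable")
```

The same two loops should apply to `base_model.layers`, since Keras layers expose a `trainable` attribute in the same way.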

In [None]:
# Save the full model (architecture + weights) in the .keras format
save_path = r"D:\Deep Learning\Kaggle_Pneumonia detection\model_weights"

os.makedirs(save_path, exist_ok=True)    # create the folder if it does not exist yet

model_2.save(os.path.join(save_path, "resnet50_model_2.keras"), overwrite=True)

Evaluation Metrics for the model_2

In [None]:
from keras.models import load_model
model_2.load_weights(r"D:\Deep Learning\Kaggle_Pneumonia detection\model_weights\resnet50_model_2.keras")

#Evaluation metrics for model.02
resnet_val_eval_02 = model_2.evaluate(validation_generator)
resnet_test_eval_02 = model_2.evaluate(test_generator) 

In [None]:
print(f"Validation Loss: {resnet_val_eval_02[0]}")
print(f"Validation Accuracy: {resnet_val_eval_02[1]}")
print(f"Test Loss: {resnet_test_eval_02[0]}")
print(f"Test Accuracy: {resnet_test_eval_02[1]}")

### Model Performance Visualization over the Epochs

In [None]:
# history_ft is assumed to be the History object returned by the fine-tuning model_2.fit(...) call
plt.figure(figsize=(8, 6))
plt.plot(history_ft.history['accuracy'], label='Training Accuracy')
plt.plot(history_ft.history['val_accuracy'], label='Validation Accuracy')
plt.title('Training and Validation Accuracy over Epochs')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)
plt.show()

### Confusion matrix

In [149]:
true_labels = test_generator.classes
pred_probs = model_2.predict(test_generator, steps=len(test_generator), verbose=1)
pred_labels = np.argmax(pred_probs, axis=1)
cm = confusion_matrix(true_labels, pred_labels)
plt.figure(figsize=(6, 4))
# fmt='d' prints integer counts in each cell; cmap sets the color scale
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=list(test_generator.class_indices.keys()),
            yticklabels=list(test_generator.class_indices.keys()))
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()
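
To make the heatmap easier to read, the sketch below builds a 2x2 confusion matrix by hand from a few made-up predictions, using the same convention as `sklearn.metrics.confusion_matrix` (rows are actual classes, columns are predicted classes). The toy probabilities are invented for illustration, and the class order NORMAL=0, PNEUMONIA=1 is assumed to follow the alphabetical folder order that `flow_from_directory` uses.

```python
def argmax(row):
    """Index of the largest value in a list of class probabilities."""
    return max(range(len(row)), key=lambda i: row[i])


def confusion_counts(true_labels, pred_labels, n_classes=2):
    """Count samples per (actual class, predicted class) pair."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in zip(true_labels, pred_labels):
        matrix[actual][predicted] += 1
    return matrix


# Toy data (invented): two NORMAL and three PNEUMONIA samples
toy_true = [0, 0, 1, 1, 1]
toy_probs = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]]

toy_pred = [argmax(p) for p in toy_probs]
toy_cm = confusion_counts(toy_true, toy_pred)
print(toy_cm)
```

Here the bottom-right cell holds the correctly detected pneumonia cases, and the bottom-left cell holds the missed ones, which is the number that drives recall for the positive class.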



### Classification Report

In [None]:
print(classification_report(true_labels, pred_labels, target_names=test_generator.class_indices.keys()))

In [126]:
# Model.predict accepts generators directly; predict_generator is deprecated in TensorFlow 2.x
y_pred = model_2.predict(test_generator, steps=len(test_generator), verbose=1)
y_pred = y_pred.argmax(axis=-1)
y_true = test_generator.classes


### Plot ROC Curve and Compute AUC

In [None]:
from sklearn.metrics import roc_curve, auc


test_generator.shuffle = False


true_labels = test_generator.classes

pred_probs = model_2.predict(test_generator, steps=len(test_generator), verbose=1)
pred_pneumonia_probs = pred_probs[:, 1]  # column 1 = PNEUMONIA (class folders are indexed alphabetically: NORMAL=0, PNEUMONIA=1)


fpr, tpr, thresholds = roc_curve(true_labels, pred_pneumonia_probs)
roc_auc = auc(fpr, tpr)


plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='darkorange', label=f'ROC curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='gray', linestyle='--')  # Diagonal reference line
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc='lower right')
plt.grid(True)
plt.show()

The model’s AUC of 0.94 indicates a strong ability to distinguish between the two classes (Normal vs Pneumonia).
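
An AUC value can also be read as the probability that a randomly chosen pneumonia image receives a higher score than a randomly chosen normal image. The sketch below computes it that way in plain Python on four made-up scores. `pairwise_auc` is an illustrative name, and this is an alternative to, not a copy of, the `roc_curve`/`auc` route used above.

```python
def pairwise_auc(true_labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(true_labels, scores) if y == 1]
    negatives = [s for y, s in zip(true_labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (invented): two negatives, two positives
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]

toy_auc = pairwise_auc(toy_labels, toy_scores)
print(toy_auc)
```

In the toy case three of the four positive/negative pairs are ordered correctly, giving 0.75; by the same reading, an AUC of 0.94 would mean about 94% of such pairs are ranked correctly.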