<a href="https://colab.research.google.com/github/Sachin20010517/pneumonet-cnn-classifier/blob/main/PneumoNet_Complete_Pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Name: **Rammini Sachin Ayeshmantha De Silva Dharmawickrama**

UoW id: **w1953261**

# ***PneumoNet: Deep Learning for Pneumonia Detection***

<h2>Part A – Application area review.</h2>
<h4><u>Literature Review: AI Applications in Medical Image Analysis for Pneumonia Detection</u></h4>
<br/>
<h4><b>Introduction</b></h4>
<br/>
<p style="text-align:justify">
Artificial intelligence (AI) has revolutionised medical diagnostics, particularly in radiology, where deep learning algorithms demonstrate remarkable capabilities in detecting diseases from medical images. Pneumonia, a leading cause of mortality globally with approximately 2.5 million deaths annually (World Health Organization, 2023), presents a significant diagnostic challenge, especially in resource-constrained settings. This literature review explores how AI, specifically deep learning, has been applied to pneumonia detection from chest X-rays, examining the evolution of approaches, their effectiveness, and their clinical implications.
<br/><br/>
</p>
<h4><b>Evolution of AI in Medical Image Analysis</b></h4>
<br/>
<p style="text-align:justify">
The application of AI to medical imaging has progressed significantly since the introduction of computer-aided diagnosis (CAD) systems in the 1980s. Traditional machine learning approaches relied heavily on manual feature engineering, requiring domain experts to define relevant image characteristics (Suzuki, 2017). However, the advent of deep learning, particularly Convolutional Neural Networks (CNNs), has transformed medical image analysis by enabling automatic feature extraction directly from raw image data (Litjens et al., 2017).
<br/><br/>
CNNs have become the predominant architecture for medical image classification due to their ability to learn hierarchical representations of visual patterns. The breakthrough work by Krizhevsky, Sutskever and Hinton (2012) with AlexNet demonstrated that deep CNNs could achieve superior performance in image recognition tasks, paving the way for their application in medical domains. Subsequent architectures such as ResNet (He et al., 2016), DenseNet (Huang et al., 2017), and EfficientNet (Tan and Le, 2019) have further enhanced classification accuracy whilst addressing challenges such as vanishing gradients and computational efficiency.
<br/><br/>
</p>


Import Libraries

In [3]:
# data processing, CSV & image file I/O
import os
import re
import requests
from PIL import Image
import pandas as pd
import numpy as np

# libraries for data visualization
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# preprocessing, modeling & evaluation
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.layers import Conv2D, BatchNormalization, MaxPool2D, Dropout, Flatten, Dense
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.preprocessing import image

import warnings
warnings.filterwarnings('ignore')

In [2]:
print(tf.__version__)

2.19.0


In [4]:
np.random.seed(42)
tf.random.set_seed(42)
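
The two calls above seed NumPy and TensorFlow only. As a minimal sketch, assuming one also wants Python's built-in `random` module to be repeatable, the seeding could be gathered into a single helper. `set_global_seed` is a hypothetical name, not part of this notebook, and NumPy and TensorFlow are seeded only if they can be imported.

```python
import random


def set_global_seed(seed: int = 42) -> None:
    """Seed the common sources of randomness (hypothetical helper)."""
    # Python's built-in pseudo-random generator
    random.seed(seed)

    # NumPy's legacy global generator, if NumPy is installed
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

    # TensorFlow's global seed, if TensorFlow is installed
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)
    except ImportError:
        pass


# Re-seeding with the same value reproduces the same draw
set_global_seed(42)
first = random.random()
set_global_seed(42)
second = random.random()
print(first == second)
```

Seeding alone may not make GPU training fully deterministic, so results can still differ slightly between runs.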

Data Loading

In [None]:
main_path = '/kaggle/input/chest-xray-pneumonia/chest_xray'
os.listdir(main_path)

In [None]:
train_dir = os.path.join(main_path, 'train')
val_dir = os.path.join(main_path, 'val')
test_dir = os.path.join(main_path, 'test')
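
The dataset path above is specific to the Kaggle environment, so it may not exist when the notebook runs elsewhere. A minimal sketch of an early check, assuming the same three split folders; `missing_dirs` is a hypothetical helper, not part of this notebook.

```python
import os


def missing_dirs(paths):
    """Return the paths that are not existing directories (hypothetical helper)."""
    return [path for path in paths if not os.path.isdir(path)]


# Same layout as the cells above: one folder per split under the dataset root
main_path = '/kaggle/input/chest-xray-pneumonia/chest_xray'
split_dirs = [os.path.join(main_path, split) for split in ('train', 'val', 'test')]

missing = missing_dirs(split_dirs)
if missing:
    print('Missing dataset folders:', missing)
else:
    print('All dataset folders found.')
```

Running such a check before counting files gives a clearer message than the `FileNotFoundError` that `os.listdir` would otherwise raise.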

In [None]:
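def count_file(directory, labels):
    """Print the number of entries in each label subfolder of `directory`."""
    for label in labels:
        # Each class lives in its own subfolder, e.g. train/PNEUMONIA
        num_data = len(os.listdir(os.path.join(directory, label)))
        print(f'number of {label} : {num_data}')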

labels = ['PNEUMONIA', 'NORMAL']

print('Train Set: \n' + '='*50)
count_file(train_dir, labels)

print('\nValidation Set: \n' + '='*50)
count_file(val_dir, labels)

print('\nTest Set: \n' + '='*50)
count_file(test_dir, labels)


Notice that images labelled as pneumonia significantly outnumber those labelled as normal, so the dataset is imbalanced. We will address this imbalance later in this notebook.
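
One common way to compensate for this kind of imbalance is to weight each class inversely to its frequency during training. The sketch below is an assumption about how that could look, not necessarily the approach this notebook takes later. `compute_class_weights` is a hypothetical helper, and the counts are made-up illustrative numbers, not the real dataset sizes. The formula, total / (number of classes × class count), matches the "balanced" heuristic used by scikit-learn.

```python
def compute_class_weights(counts):
    """Map class index -> weight, using total / (n_classes * count).

    Indices follow the alphabetical order of the labels (an assumption
    that matches how Keras directory loaders usually assign them).
    Every count must be greater than zero.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {
        index: total / (n_classes * count)
        for index, (_, count) in enumerate(sorted(counts.items()))
    }


# Illustrative counts only (NOT the actual dataset sizes)
example_counts = {'NORMAL': 100, 'PNEUMONIA': 300}
class_weights = compute_class_weights(example_counts)
print(class_weights)
```

With these example counts the minority class (index 0, `NORMAL`) receives the larger weight. A dictionary of this shape can be passed to Keras through the `class_weight` argument of `model.fit`.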

In [None]:
def get_file_sizes(directory):
    """Collect the size (in KB) of every file within a directory tree."""

    sizes = []

    # Traverse all subdirectories and files
    for root, _, files in os.walk(directory):
        for filename in files:
            path = os.path.join(root, filename)
            size_kb = os.path.getsize(path) / 1024  # convert bytes to KB
            sizes.append({
                'file': path,
                'size_kb': round(size_kb, 3)
            })

    return sizes