<a href="https://colab.research.google.com/github/AndyMDH/pneumonia_detection_cnn/blob/main/main.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CSCK506 End of Module: Pneumonia Detection through Convolutional Neural Network (CNN)



## Table of Contents
1. [Introduction](#section-1-introduction)
2. [Data Exploration & Analysis](#section-2-data-exploration--analysis)
3. [Data Preparation](#section-3-data-preparation)
4. [Create Vocabulary](#section-4-create-vocabulary)
5. [Feature Extraction](#section-5-feature-extraction)
6. [Seq2Seq Model Development](#section-6-seq2seq-model-development)
7. [Model Evaluation](#section-7-model-evaluation)
8. [Chatbot Implementation and Manual Testing](#section-8-chatbot-implementation-and-manual-testing)

---
## Introduction

Pneumonia is a potentially life-threatening infection that typically affects one or both lungs and poses a severe threat to human health. It is frequently caused by bacteria, most notably *Streptococcus pneumoniae*. According to the World Health Organization (WHO), pneumonia is responsible for one in three deaths in India (Varshni et al., 2019). Medical practitioners often rely on chest X-ray scans to diagnose pneumonia and to distinguish between its bacterial and viral forms.

This Jupyter notebook explores automated pneumonia detection using Convolutional Neural Networks (CNNs). Specifically, it trains a CNN model to differentiate between healthy lung scans and those showing signs of pneumonia. The dataset used is the Kaggle dataset *Chest X-Ray Images (Pneumonia)*, a collection of chest X-ray images labelled as either pneumonia-positive or normal.


**This task involves, but is not limited to:**

a. CNN Model Development:

- Write code to train a CNN model using the provided dataset.
- Objective: achieve the best possible performance in distinguishing between healthy and pneumonia-infected lung images.
- **Key considerations:**
  - Define the CNN architecture, including convolution-pooling blocks.
  - Tune parameters such as strides, padding, and activation functions to improve accuracy.
  - Implement strategies to prevent overfitting and ensure the model generalises.

b. Training and Evaluation:

- Train the CNN model using the provided training dataset.
- Fine-tune hyperparameters using validation data to enhance performance.
- Evaluate the model's accuracy on a separate test dataset to validate pneumonia detection in chest X-ray images.
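For the evaluation step, the sketch below shows one way the model's sigmoid outputs could be turned into the test-set metrics imported from scikit-learn. It is an illustration, not the authors' evaluation code: the helper name `evaluate_binary`, the 0.5 decision threshold and the label encoding 0 = NORMAL / 1 = PNEUMONIA are all assumptions. Recall on the pneumonia class (sensitivity) is reported alongside accuracy because, in a screening context, a missed pneumonia case is arguably more costly than a false alarm.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score


def evaluate_binary(y_true, y_prob, threshold=0.5):
    """Threshold sigmoid probabilities and compute test-set metrics.

    Assumes labels 0 = NORMAL and 1 = PNEUMONIA (hypothetical encoding).
    """
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    # With labels=[0, 1] the flattened matrix is ordered tn, fp, fn, tp
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        'accuracy': accuracy_score(y_true, y_pred),
        'precision': precision_score(y_true, y_pred, zero_division=0),
        # Recall on class 1 = sensitivity: share of pneumonia cases detected
        'recall': recall_score(y_true, y_pred, zero_division=0),
        'confusion': tuple(int(v) for v in (tn, fp, fn, tp)),
    }


# Toy example with made-up labels and probabilities
metrics = evaluate_binary([0, 0, 1, 1, 1], [0.1, 0.6, 0.8, 0.4, 0.9])
print(metrics)  # accuracy 0.6, precision 2/3, recall 2/3, confusion (1, 1, 1, 2)
```

With a trained Keras model, `y_prob` would typically come from `model.predict(...)` on the test images, flattened to one probability per image.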

This Jupyter Notebook was collaboratively prepared by:

- Minh-Dat Andy Ho Huu
- Santiago Fernandez Blanco
- Ismael Saumtally
- Chi Chuen Wan
- Chui Yi Wong

### Import Dependencies

In [None]:
# Standard library imports
import itertools
import logging
import os
import re
import unicodedata
import urllib.request
from collections import defaultdict
from typing import List, Optional, Set, Tuple
from zipfile import ZipFile

# Third-party imports
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    Activation, BatchNormalization, Conv2D, Dense, Dropout, Flatten, MaxPooling2D
)
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.metrics import (
    accuracy_score, confusion_matrix, precision_recall_curve,
    precision_score, recall_score, roc_curve
)
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
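The Keras layers imported above map directly onto the convolution-pooling blocks and overfitting safeguards listed in the task. The following is a minimal sketch of how they could be assembled, not the final architecture used in this notebook: the helper name `build_cnn`, the 150x150 grayscale input size, the filter counts, the dropout rate, the learning rate and the `EarlyStopping` patience are all illustrative assumptions.

```python
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import (
    BatchNormalization, Conv2D, Dense, Dropout, Flatten, Input, MaxPooling2D
)
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam


def build_cnn(input_shape=(150, 150, 1), dropout_rate=0.3):
    """Hypothetical baseline: three conv-pool blocks and a dense head for binary output."""
    model = Sequential([
        Input(shape=input_shape),
        # Block 1: 'same' padding with stride 1 keeps 150x150; 2x2 pooling gives 75x75
        Conv2D(32, kernel_size=3, strides=1, padding='same', activation='relu'),
        BatchNormalization(),
        MaxPooling2D(pool_size=2),
        # Block 2: 75x75 -> 37x37
        Conv2D(64, kernel_size=3, padding='same', activation='relu'),
        BatchNormalization(),
        MaxPooling2D(pool_size=2),
        # Block 3: 37x37 -> 18x18
        Conv2D(128, kernel_size=3, padding='same', activation='relu'),
        BatchNormalization(),
        MaxPooling2D(pool_size=2),
        Flatten(),
        Dropout(dropout_rate),  # randomly drops units during training to curb overfitting
        Dense(128, activation='relu'),
        Dropout(dropout_rate),
        Dense(1, activation='sigmoid'),  # probability of the PNEUMONIA class
    ])
    model.compile(optimizer=Adam(learning_rate=1e-3),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model


model = build_cnn()
# Stop when validation loss stops improving and roll back to the best epoch
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
model.summary()
```

With `padding='same'` and stride 1 each convolution preserves the spatial size, while each 2x2 max-pool (stride 2, valid padding) produces floor((n - 2) / 2) + 1 outputs per side, so the feature maps shrink 150 → 75 → 37 → 18 before flattening. Passing validation data and `callbacks=[early_stop]` to `model.fit` would then let the validation loss decide when training ends.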

In [None]:
# Check if TensorFlow is using GPU
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Num GPUs Available:  0


In [None]:
import warnings

# Silence noisy deprecation notices emitted by TensorFlow and scikit-learn
warnings.filterwarnings(action='ignore', category=DeprecationWarning)
warnings.filterwarnings(action='ignore', category=FutureWarning)

### Download Pneumonia Dataset

The dataset can be downloaded from Kaggle: [Chest X-Ray Images (Pneumonia)](https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia?resource=download)

In [None]:
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

In [None]:
def download_file(url, destination):
    try:
        urllib.request.urlretrieve(url, destination)
        logger.info(f'Downloaded file from {url} to {destination}')
    except Exception as e:
        logger.error(f'Error downloading file: {e}')

def extract_zip(zip_path, extract_path):
    """Extract a zip archive. Returns True on success, False on failure."""
    try:
        with ZipFile(zip_path, 'r') as zip_ref:
            zip_ref.extractall(extract_path)
        logger.info(f'Extracted {zip_path} to {extract_path}')
        return True
    except Exception as e:
        logger.error(f'Error extracting zip file: {e}')
        return False

def create_directory(directory):
    if not os.path.exists(directory):
        os.makedirs(directory)
        logger.info(f'Created directory: {directory}')

CORPUS_NAME = 'Chest_XRay'
CORPUS_URL = 'https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia?resource=download'
CORPUS_DIR = CORPUS_NAME
DATASET_ZIP = os.path.join(CORPUS_DIR, 'Chest_XRay.zip')

# Check for the zip first: it lives inside CORPUS_DIR, so checking the
# directory first would report "already exists" and never extract it
if os.path.exists(DATASET_ZIP):
    if extract_zip(DATASET_ZIP, CORPUS_DIR):
        os.remove(DATASET_ZIP)  # delete the archive only after a successful extraction
        print(f'{CORPUS_NAME} extracted')
elif os.path.isdir(CORPUS_DIR) and os.listdir(CORPUS_DIR):
    print(f'{CORPUS_NAME} already exists')
else:
    print(f'To obtain the "{CORPUS_NAME}" dataset, please follow these steps:')
    print(f'1. Manually download the Chest X-Ray Images (Pneumonia) dataset from: {CORPUS_URL}')
    print(f'2. Rename the downloaded zip file to "Chest_XRay.zip" and place it in the "{CORPUS_DIR}" folder.')
    print('3. Rerun this cell after placing the file in the correct location.')

To obtain the "Chest_XRay" dataset, please follow these steps:
1. Manually download the Chest X-Ray Images (Pneumonia) dataset from: https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia?resource=download
2. Rename the downloaded zip file to "Chest_XRay.zip" and place it in the "Chest_XRay" folder.
3. Rerun this cell after placing the file in the correct location.
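The task calls for tuning hyperparameters on validation data. The validation folder shipped with the Kaggle dataset contains only a few images, so one common option, sketched here as an assumption rather than the authors' method, is to carve a stratified validation split out of the training images with `train_test_split`. The `train/NORMAL` and `train/PNEUMONIA` sub-folders follow the Kaggle release; the helper `collect_image_paths`, the 20% validation fraction and the exact path below `Chest_XRay` after extraction are hypothetical.

```python
import os

from sklearn.model_selection import train_test_split

CLASS_NAMES = ('NORMAL', 'PNEUMONIA')  # mapped to labels 0 and 1 respectively


def collect_image_paths(split_dir):
    """Return (paths, labels) for images under split_dir/NORMAL and split_dir/PNEUMONIA."""
    paths, labels = [], []
    for label, class_name in enumerate(CLASS_NAMES):
        class_dir = os.path.join(split_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # tolerate a missing class folder
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith(('.jpeg', '.jpg', '.png')):
                paths.append(os.path.join(class_dir, fname))
                labels.append(label)
    return paths, labels


def stratified_split(paths, labels, val_fraction=0.2, seed=42):
    """Split into train/validation while preserving the NORMAL:PNEUMONIA ratio."""
    return train_test_split(paths, labels, test_size=val_fraction,
                            stratify=labels, random_state=seed)


# Hypothetical real usage after extraction (path is an assumption):
# paths, labels = collect_image_paths(os.path.join('Chest_XRay', 'chest_xray', 'train'))

# Synthetic demo: 25 NORMAL and 75 PNEUMONIA file names
demo_paths = [f'img_{i}.jpeg' for i in range(100)]
demo_labels = [0] * 25 + [1] * 75
train_p, val_p, train_y, val_y = stratified_split(demo_paths, demo_labels)
print(len(train_p), len(val_p), val_y.count(0), val_y.count(1))  # 80 20 5 15
```

Stratifying keeps the class imbalance identical in both splits, so validation accuracy is not skewed by an unlucky draw of mostly one class.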


---
### References:

Varshni, D., Thakral, K., Agarwal, L., Nijhawan, R. and Mittal, A. (2019). Pneumonia Detection Using CNN based Feature Extraction. [online] IEEE Xplore. doi:https://doi.org/10.1109/ICECCT.2019.8869364.