## BUSINESS UNDERSTANDING

This project develops a deep learning model for medical image classification using a large dataset of labeled Optical Coherence Tomography (OCT) and Chest X-Ray images. It addresses a real-world problem faced by medical professionals: interpreting medical images accurately and quickly is time-consuming and requires specialized training. The specific goal is to diagnose medical conditions from OCT and Chest X-Ray images, which carry vital information that can support clinical decision-making.

The stakeholders in this application of deep learning are healthcare professionals, patients, hospitals and medical centers, medical device manufacturers, and insurance companies. Healthcare professionals, such as radiologists, ophthalmologists, and other specialists, could use the model to help identify conditions such as pneumonia in medical images, supporting more accurate diagnoses and treatment decisions. Patients would benefit from accurate, timely diagnoses and appropriate treatment, leading to improved health outcomes. Hospitals and medical centers could use the model as a diagnostic aid, leading to better patient outcomes and more efficient use of medical resources. Medical device manufacturers could incorporate deep learning models into their products to diagnose medical conditions more accurately and efficiently. Insurance companies could benefit from cost savings and improved health outcomes for their customers as diagnoses and treatments become more accurate.

The project's value lies in its potential to improve diagnostic accuracy and treatment outcomes. When medical images are classified accurately, medical professionals can make better-informed decisions about patient care. The project could also make diagnosis more efficient and reduce the need for invasive procedures, such as biopsies. In particular, image classification with deep learning can identify pneumonia cases in chest X-rays, helping healthcare professionals reach accurate diagnoses and treatment decisions.


In summary, this project aims to develop a deep learning model for medical image classification to assist in the accurate and timely diagnosis of medical conditions using OCT and Chest X-Ray images. The project's stakeholders include healthcare professionals, patients, hospitals and medical centers, medical device manufacturers, and insurance companies. The project's value lies in its potential to improve medical diagnosis accuracy, treatment outcomes, and the efficiency of medical diagnosis, ultimately leading to better patient outcomes and lower healthcare costs.

## Technical Objectives
1. Build a deep learning model that can classify whether a given patient has pneumonia based on a chest x-ray image.
2. Optimize the model architecture and hyperparameters to achieve the highest possible accuracy on the validation set.
3. Use data augmentation techniques such as rotation, scaling, and flipping to increase the size of the training dataset and improve the model's ability to generalize.
4. Experiment with different optimization algorithms, learning rates, and batch sizes to improve the speed and stability of model training.
5. Evaluate the model's performance using appropriate metrics such as accuracy, precision, recall, and F1 score.
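
The evaluation metrics named in the last objective can be illustrated with a small, library-free sketch. The function name `binary_metrics` and the toy labels are assumptions made for illustration; in the notebook itself, scikit-learn's `classification_report` would presumably be used instead.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy example (hypothetical labels): 1 = pneumonia, 0 = normal.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

For a screening use case, recall on the pneumonia class is often the metric to watch, because a missed case is usually costlier than a false alarm.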

## Business Objectives
1. Provide pediatricians with a tool that can quickly and accurately diagnose pneumonia in children, potentially reducing the number of unnecessary hospital visits and improving patient outcomes.
2. Increase the accessibility of pneumonia diagnosis in low-resource settings where trained medical professionals may not be readily available.
3. Potentially reduce healthcare costs by allowing for earlier diagnosis and treatment of pneumonia in pediatric patients.
4. Contribute to the development of a larger dataset for pneumonia diagnosis that can be used for further research and model development.
5. Develop a model that can be easily integrated into existing hospital or clinic workflows, allowing for streamlined and efficient diagnosis.

## DATA UNDERSTANDING

The data source for this project is Kermany, Daniel; Zhang, Kang; Goldbaum, Michael (2018), “Large Dataset of Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images”, Mendeley Data, V3.

The dataset contains 5,856 Chest X-Ray images from 2,839 patients, with 3,955 images labeled as "normal" and 1,901 images labeled as "pneumonia" ([Dataset](https://data.mendeley.com/datasets/rscbjbr9sj/3)).

The data is suitable for the project because it contains labeled medical images that can be used to train a deep learning model to accurately classify medical conditions. Medical imaging is an important tool for diagnosing and treating diseases, and accurate and timely diagnoses are critical for improving patient outcomes.

Conventional descriptive statistics for tabular features do not apply here, because the inputs are images rather than a table of named variables. Instead, image pre-processing techniques (for example, resizing and rescaling pixel values) transform each image into a numerical array that can be used to train a deep learning model.
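
As a rough illustration of that pre-processing step, the sketch below turns a grayscale image into a model-ready array. It assumes NumPy only; the helper name `to_model_input`, the tiny target size, and the random stand-in image are hypothetical and are not the notebook's actual pipeline.

```python
import numpy as np


def to_model_input(image_uint8, target_size=(4, 4)):
    """Resize a 2-D uint8 image by nearest-neighbour sampling,
    rescale pixel values to [0, 1], and add a channel axis."""
    height, width = image_uint8.shape
    # Pick evenly spaced source rows/columns (nearest-neighbour resize).
    rows = np.arange(target_size[0]) * height // target_size[0]
    cols = np.arange(target_size[1]) * width // target_size[1]
    resized = image_uint8[np.ix_(rows, cols)]
    scaled = resized.astype(np.float32) / 255.0
    return scaled[..., np.newaxis]  # shape: (H, W, 1)


# Stand-in for a loaded X-ray: an 8x8 array of random pixel intensities.
fake_xray = np.random.default_rng(0).integers(0, 256, size=(8, 8), dtype=np.uint8)
x = to_model_input(fake_xray)
print(x.shape, x.dtype)
```

A real pipeline would use a proper interpolation routine (such as `skimage.transform.resize`) and a much larger target size; nearest-neighbour sampling is used here only to keep the sketch dependency-free.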

The inclusion of features in this project is based on the relevance of the medical images to accurately diagnosing medical conditions. The OCT portion of the dataset contains images of normal retinas as well as retinas with choroidal neovascularization (CNV), diabetic macular edema (DME), and drusen (DRUSEN), which are conditions that affect the retina of the eye. Because these conditions are labeled, a deep learning model can be trained to identify and classify them from the images.



## Dataset Limitations
One limitation of the dataset is that it may not be representative of all Chest X-Ray images, as the images were obtained from a specific hospital and may not be generalizable to other populations. 

Additionally, the dataset is imbalanced: there are fewer pneumonia images than normal images, which could reduce the model's ability to classify pneumonia cases accurately. Another limitation is that the dataset provides no information about the patients' demographics or medical histories, which may be relevant for predicting pneumonia.
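
One common way to compensate for class imbalance is to weight each class inversely to its frequency during training. The sketch below is an assumption about how that could be done, not a description of this project's method; it uses the image counts quoted in the data description, and the function name `balanced_class_weights` is hypothetical.

```python
def balanced_class_weights(counts):
    """Weight each class by total / (n_classes * class_count),
    so rarer classes contribute more to the loss."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Image counts as stated in the data description above.
counts = {"normal": 3955, "pneumonia": 1901}
weights = balanced_class_weights(counts)
print(weights)
```

In Keras, a dictionary like this (keyed by integer class index rather than by name) can be passed to `model.fit` through its `class_weight` argument.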

## Related Work
There has been a significant amount of related work on using deep learning models for pneumonia diagnosis from chest x-ray images. Here are a few examples:

- Wang et al. (2017) developed a deep learning model based on the Inception architecture to diagnose pneumonia from chest x-ray images. Their model achieved an area under the receiver operating characteristic curve (AUC) of 0.92 on a test set of 279 images, outperforming several other models.
- Rajpurkar et al. (2017) developed CheXNet, a 121-layer DenseNet trained on a large dataset of chest x-ray images labeled with various pathologies, including pneumonia. They reported state-of-the-art performance on the task of pneumonia detection.
- Wang et al. (2018) developed a deep learning model based on the DenseNet architecture that could classify chest x-ray images into various pathologies, including pneumonia. Their model achieved an AUC of 0.887 on a test set of 420 images, outperforming several other models.
- Chouhan et al. (2020) developed a deep learning model based on the EfficientNet architecture to diagnose pneumonia from chest x-ray images. Their model achieved an accuracy of 95.8% on a test set of 234 images, outperforming several other models.

These studies demonstrate the effectiveness of deep learning models for pneumonia diagnosis from chest x-ray images, as well as the potential for further improvement in accuracy and performance. They also highlight the importance of having access to large and diverse datasets for model training and evaluation.
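
Several of the results above are reported as AUC. As a reminder of what that number measures, the sketch below computes it directly from its pairwise definition: the probability that a randomly chosen positive image receives a higher score than a randomly chosen negative one. The function name `roc_auc` and the toy scores are illustrative assumptions; for real evaluations, `sklearn.metrics.roc_auc_score` is the usual choice.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as half a correct pair."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores): 1 = pneumonia, 0 = normal.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
print(auc)
```

Unlike accuracy, AUC does not depend on a particular decision threshold, which is one reason it is widely reported for imbalanced medical datasets.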

## DATA PREPARATION

In [1]:
# Import necessary libraries for image classification with deep learning

# OS and File System Libraries
import os # to interact with the operating system
from os import listdir, makedirs, getcwd, remove # for file system operations
from os.path import isfile, join, abspath, exists, isdir, expanduser # for file system path operations

# Suppress future warnings from libraries
import warnings
warnings.filterwarnings('ignore', category=FutureWarning)

# Randomization
import random # for randomization functions

# Path Manipulation
from pathlib import Path # to be able to use functions using path

# Data Manipulation Libraries
import pandas as pd # data processing
import numpy as np # linear algebra

# Deep Learning Libraries
import tensorflow as tf # deep learning library
from tensorflow.compat.v1 import Session, ConfigProto, set_random_seed # to set random seeds and configure sessions
from tensorflow.python.client import device_lib # to check the devices available for training

# Import Keras through TensorFlow throughout, so standalone `keras` and `tensorflow.keras` objects are not mixed
from tensorflow import keras # high-level API for deep learning models
from tensorflow.keras.models import Sequential, Model # Sequential is the simplest way to build models in Keras, and Model allows you to build complex models
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input # pre-trained deep learning model for image classification
from tensorflow.keras.layers import Conv2D, MaxPool2D, MaxPooling2D, Dense, Flatten, Dropout, BatchNormalization # layers used in building models (MaxPool2D is an alias of MaxPooling2D)
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array # tools for image preprocessing and data augmentation
from tensorflow.keras.callbacks import ReduceLROnPlateau # to reduce the learning rate when the training reaches a plateau
from tensorflow.keras.applications.inception_v3 import InceptionV3 # pre-trained deep learning model for image classification
from tensorflow.keras.constraints import MaxNorm as maxnorm # weight constraint to avoid overfitting; aliased so the older `maxnorm` name still works
from tensorflow.keras import backend as K # to handle backend operations

# OpenCV - computer vision library
import cv2

# Image Manipulation Libraries
from skimage.io import imread # to read images from disk
from skimage.transform import resize # to resize images

# Scikit-learn library for evaluation and data preprocessing
from sklearn.metrics import classification_report, confusion_matrix # tools for evaluation of classification models
from sklearn.preprocessing import LabelEncoder # for data preprocessing

# Image I/O and Augmentation Libraries
from PIL import Image # Python Imaging Library - for opening, manipulating and saving images
import imgaug as aug # library for image augmentation
import imgaug.augmenters as iaa # augmentation techniques

# Visualization Libraries
import matplotlib.pyplot as plt # for visualization of graphs and charts
import matplotlib.image as mimg # for visualization of images
%matplotlib inline 
import seaborn as sns # for visualization of statistical data
import plotly.express as px # for visualization of graphs and charts
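
The imports above include `ImageDataGenerator` and `imgaug`, both of which are used for data augmentation. To make the idea concrete without depending on either library, the following NumPy-only sketch applies the three transformations named in the technical objectives (flipping, rotation, scaling) to a toy image. The helper `augment` is hypothetical, and it simplifies heavily: rotation is limited to 90-degree steps and scaling is a centred crop resampled by nearest neighbour, whereas a real chest X-ray pipeline would normally use small rotation angles and proper interpolation.

```python
import numpy as np


def augment(image, rng):
    """Return a randomly flipped, rotated and zoomed copy of a square 2-D image."""
    out = image
    if rng.random() < 0.5:
        out = np.fliplr(out)                          # horizontal flip
    out = np.rot90(out, k=int(rng.integers(0, 4)))    # rotate by 0/90/180/270 degrees

    # Scaling: crop a centred window (80-100% of the side) and
    # resample it back to the original size (nearest neighbour).
    size = out.shape[0]
    crop = int(rng.integers(int(0.8 * size), size + 1))
    start = (size - crop) // 2
    window = out[start:start + crop, start:start + crop]
    idx = np.arange(size) * crop // size
    return window[np.ix_(idx, idx)]


rng = np.random.default_rng(0)
image = np.arange(100, dtype=np.uint8).reshape(10, 10)  # toy 10x10 "image"
batch = np.stack([augment(image, rng) for _ in range(4)])
print(batch.shape)
```

Each call produces a different variant of the same source image, which is how augmentation enlarges the effective training set without collecting new data.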


### LOADING DATA