# <a id='toc1_'></a>[AI Lung Classification Project](#toc0_)

**Table of contents**<a id='toc0_'></a>    
- [AI Lung Classification Project](#toc1_)    
  - [About the Project](#toc1_1_)    
    - [Dataset Used](#toc1_1_1_)    
  - [Managing Libraries](#toc1_2_)    
    - [Install Libraries](#toc1_2_1_)    
    - [Import Libraries](#toc1_2_2_)    
  - [Pre-training Requirements](#toc1_3_)    
    - [Importing Datasets](#toc1_3_1_)    
    - [Calculating Steps](#toc1_3_2_)    
  - [Implementing a DenseNet Model](#toc1_4_)    
  - [Implementing a VGG16 Model](#toc1_5_)    
  - [Implementing a ResNet50 Model](#toc1_6_)    
  - [Implementing an InceptionV3 Model](#toc1_7_)    
  - [Testing the Models](#toc1_8_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

## <a id='toc1_1_'></a>[About the Project](#toc0_)

This project develops an image classification system that separates chest X-rays of normal lungs from chest X-rays of lungs with pneumonia. It compares four CNN architectures pretrained on ImageNet (DenseNet121, VGG16, ResNet50, and InceptionV3), each adapted for binary classification by freezing the pretrained layers and adding a new dense head with a single sigmoid output.
To limit overfitting and keep the best-performing weights, every training run uses early stopping and model checkpointing. Both monitor validation loss: early stopping halts training after 10 consecutive epochs without improvement and restores the best weights, while the checkpoint saves the model to disk each time validation loss reaches a new minimum.
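
The stopping rule described above can be illustrated without training anything. The sketch below is a simplified stand-in for the Keras callbacks, not their implementation: `simulate_early_stopping` is a hypothetical helper that walks a list of validation losses and reports which epoch would have been kept as "best" and at which epoch training would have stopped.

```python
def simulate_early_stopping(val_losses, patience=10):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Epochs are 0-indexed. Training stops once `patience` consecutive epochs
    pass without a new lowest loss. If that never happens, stop_epoch is the
    final epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    stop_epoch = len(val_losses) - 1

    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New minimum: this is the point where a checkpoint would be saved.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                stop_epoch = epoch
                break

    return best_epoch, stop_epoch


# Illustrative losses only, not results from this project.
print(simulate_early_stopping([0.9, 0.7, 0.6, 0.65, 0.62, 0.61], patience=2))
```

With `patience=2`, the made-up loss curve above reaches its minimum at epoch 2 and stops two epochs later, so the weights from epoch 2 are the ones that would be kept.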

### <a id='toc1_1_1_'></a>[Dataset Used](#toc0_)

"The dataset is organized into 3 folders (train, test, val) and contains subfolders for each image category (Pneumonia/Normal). There are 5,863 X-Ray images (JPEG) and 2 categories (Pneumonia/Normal).

Chest X-ray images (anterior-posterior) were selected from retrospective cohorts of pediatric patients of one to five years old from Guangzhou Women and Children’s Medical Center, Guangzhou. All chest X-ray imaging was performed as part of patients’ routine clinical care.

For the analysis of chest x-ray images, all chest radiographs were initially screened for quality control by removing all low quality or unreadable scans. The diagnoses for the images were then graded by two expert physicians before being cleared for training the AI system. In order to account for any grading errors, the evaluation set was also checked by a third expert."

The dataset is available on Kaggle: https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia

**Acknowledgements**

Data: https://data.mendeley.com/datasets/rscbjbr9sj/2

License: CC BY 4.0

Citation: http://www.cell.com/cell/fulltext/S0092-8674(18)30154-5

## <a id='toc1_2_'></a>[Managing Libraries](#toc0_)

### <a id='toc1_2_1_'></a>[Install Libraries](#toc0_)

In [None]:
# tensorflow and numpy are imported below, so install them alongside streamlit
!pip install tensorflow numpy streamlit

### <a id='toc1_2_2_'></a>[Import Libraries](#toc0_)

In [None]:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.preprocessing import image
import numpy as np
import json
import os
from tensorflow.keras.applications import DenseNet121
from tensorflow.keras.models import Model
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.models import load_model
from tensorflow.keras.applications import InceptionV3

## <a id='toc1_3_'></a>[Pre-training Requirements](#toc0_)

### <a id='toc1_3_1_'></a>[Importing Datasets](#toc0_)

In [None]:
# Import datasets
train_dir = 'datasets/train'
validation_dir = 'datasets/val'
test_dir = 'datasets/test'
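
Because `flow_from_directory` infers class labels from subfolder names, it can help to confirm that every split exposes the same class folders before training. The following is a minimal sketch under that assumption. `class_folders` and `check_splits` are hypothetical helpers, not part of the original notebook.

```python
import os


def class_folders(split_dir):
    """Return the sorted names of the class subfolders inside one split directory."""
    return sorted(
        entry
        for entry in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, entry))
    )


def check_splits(split_dirs):
    """Return the shared class list, or raise ValueError if the splits disagree."""
    if not split_dirs:
        raise ValueError("no split directories given")
    layouts = {d: class_folders(d) for d in split_dirs}
    reference = layouts[split_dirs[0]]
    for directory, classes in layouts.items():
        if classes != reference:
            raise ValueError(
                f"{directory} has classes {classes}, expected {reference}"
            )
    return reference
```

In this notebook the call would look like `check_splits([train_dir, validation_dir, test_dir])`, which should return the two class folder names if the dataset was unpacked as described.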

### <a id='toc1_3_2_'></a>[Calculating Steps](#toc0_)

In [None]:
# Count every file under a directory (assumes the folders hold only images); used later to calculate steps
def count_files(directory):
    return sum(len(files) for _, _, files in os.walk(directory))

train_images = count_files(train_dir)
val_images = count_files(validation_dir)
batch_size = 20
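
Two details of the step calculation are worth checking. `count_files` counts every file, so stray non-image files (for example a hidden `.DS_Store`) would inflate the totals, and the floor division used in the training cells drops the final partial batch and yields zero steps whenever a folder holds fewer images than one batch (16 // 20 = 0). The sketch below shows one possible alternative. `count_image_files`, `steps_for`, and the extension list are assumptions for illustration, not part of the original code.

```python
import math
import os

# Assumed set of image suffixes; adjust to match the files actually present.
IMAGE_EXTENSIONS = (".jpeg", ".jpg", ".png")


def count_image_files(directory):
    """Count files under `directory` whose names end in an image extension."""
    return sum(
        1
        for _, _, files in os.walk(directory)
        for name in files
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )


def steps_for(num_images, batch_size):
    """Number of batches needed to cover every image, including a final partial batch."""
    return math.ceil(num_images / batch_size)


print(steps_for(16, 20))   # a folder smaller than one batch still needs one step
print(steps_for(41, 20))   # two full batches plus one partial batch
```

Ceiling division matches the number of batches a directory iterator yields per pass, so every image is seen once per epoch.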

## <a id='toc1_4_'></a>[Implementing a DenseNet Model](#toc0_)

In [None]:
# Rescale pixel values to [0, 1] and stream batches from a class-per-folder directory
def create_densenet_datagen(directory):
    datagen = ImageDataGenerator(rescale=1./255)
    return datagen.flow_from_directory(directory, target_size=(224, 224), batch_size=batch_size, class_mode='binary')

train_generator = create_densenet_datagen(train_dir)
validation_generator = create_densenet_datagen(validation_dir)
test_generator = create_densenet_datagen(test_dir)

base_model = DenseNet121(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

for layer in base_model.layers:
    layer.trainable = False

x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(1, activation='sigmoid')(x)

model_densenet = Model(inputs=base_model.input, outputs=predictions)

model_densenet.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Implement early stopping

early_stopping_dn = EarlyStopping(
    monitor='val_loss',
    patience=10,
    verbose=1,
    restore_best_weights=True
)

# Saves checkpoint file

model_checkpoint_dn = ModelCheckpoint(
    filepath='lung_classifier_dn.keras',
    monitor='val_loss',
    save_best_only=True,
    verbose=1
)

# Train model

history_densenet = model_densenet.fit(
    train_generator,
    steps_per_epoch=train_images // batch_size,
    epochs=50,
    validation_data=validation_generator,
    validation_steps=val_images // batch_size,
    callbacks=[early_stopping_dn, model_checkpoint_dn]
)

# Save model training history

with open('model_densenet_history.json', 'w') as f:
    json.dump(history_densenet.history, f)
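
Once the history is on disk, it can be reloaded to find the epoch with the lowest validation loss without retraining. This is a small sketch that assumes the JSON file has the shape written above, a dictionary of metric names mapped to per-epoch lists that includes a `val_loss` key. `best_epoch_from_history` is a hypothetical helper name.

```python
import json


def best_epoch_from_history(path, metric="val_loss"):
    """Return (epoch_index, value) for the lowest recorded value of `metric`.

    Epoch indices are 0-based. Ties resolve to the earliest epoch.
    """
    with open(path) as f:
        history = json.load(f)
    values = history[metric]
    best = min(range(len(values)), key=values.__getitem__)
    return best, values[best]
```

For example, `best_epoch_from_history('model_densenet_history.json')` would return the epoch whose weights the checkpoint callback saved last.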

## <a id='toc1_5_'></a>[Implementing a VGG16 Model](#toc0_)

In [None]:
# Rescale pixel values to [0, 1] and stream 224x224 batches for VGG16
def create_data_generator(directory):
    datagen = ImageDataGenerator(rescale=1./255)
    return datagen.flow_from_directory(directory, target_size=(224, 224), batch_size=batch_size, class_mode='binary')

train_generator = create_data_generator(train_dir)
validation_generator = create_data_generator(validation_dir)
test_generator = create_data_generator(test_dir)

vgg16_base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

for layer in vgg16_base.layers:
    layer.trainable = False

x = vgg16_base.output
x = Flatten()(x)
x = Dense(512, activation='relu')(x) 
predictions_layer = Dense(1, activation='sigmoid')(x)

vgg16_model = Model(inputs=vgg16_base.input, outputs=predictions_layer)
vgg16_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

early_stopping_vgg = EarlyStopping(
    monitor='val_loss',
    patience=10,
    verbose=1,
    restore_best_weights=True
)

model_checkpoint_vgg = ModelCheckpoint(
    filepath='lung_classifier_vgg.keras',
    monitor='val_loss',
    save_best_only=True,
    verbose=1
)

history_vgg16 = vgg16_model.fit(
    train_generator,
    steps_per_epoch=train_images // batch_size,
    epochs=50,
    validation_data=validation_generator,
    validation_steps=val_images // batch_size,
    callbacks=[early_stopping_vgg, model_checkpoint_vgg]
)

# Save the VGG16 training history (not the DenseNet history)
with open('model_vgg16_history.json', 'w') as f:
    json.dump(history_vgg16.history, f)

## <a id='toc1_6_'></a>[Implementing a ResNet50 Model](#toc0_)

In [None]:
# Rescale pixel values to [0, 1] and stream 224x224 batches for ResNet50
def create_data_generator(directory):
    datagen = ImageDataGenerator(rescale=1./255)
    return datagen.flow_from_directory(directory, target_size=(224, 224), batch_size=batch_size, class_mode='binary')

train_generator = create_data_generator(train_dir)
validation_generator = create_data_generator(validation_dir)
test_generator = create_data_generator(test_dir)

resnet50_base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

for layer in resnet50_base.layers:
    layer.trainable = False

x = resnet50_base.output
x = GlobalAveragePooling2D()(x)
x = Dense(512, activation='relu')(x)
predictions_layer = Dense(1, activation='sigmoid')(x)

resnet50_model = Model(inputs=resnet50_base.input, outputs=predictions_layer)
resnet50_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

early_stopping_resnet = EarlyStopping(
    monitor='val_loss',
    patience=10,
    verbose=1,
    restore_best_weights=True
)

model_checkpoint_resnet = ModelCheckpoint(
    filepath='lung_classifier_rn.keras',
    monitor='val_loss',
    save_best_only=True,
    verbose=1
)

history_resnet50 = resnet50_model.fit(
    train_generator,
    steps_per_epoch=train_images // batch_size,
    epochs=50,
    validation_data=validation_generator,
    validation_steps=val_images // batch_size,
    callbacks=[early_stopping_resnet, model_checkpoint_resnet]
)

# Save the ResNet50 training history (not the DenseNet history)
with open('model_resnet_history.json', 'w') as f:
    json.dump(history_resnet50.history, f)

## <a id='toc1_7_'></a>[Implementing an InceptionV3 Model](#toc0_)

In [None]:
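# Rescale pixel values to [0, 1] and stream 299x299 batches, the input size used for InceptionV3
def create_data_generator(directory):
    datagen = ImageDataGenerator(rescale=1./255)
    return datagen.flow_from_directory(directory, target_size=(299, 299), batch_size=batch_size, class_mode='binary')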

train_generator = create_data_generator(train_dir)
validation_generator = create_data_generator(validation_dir)
test_generator = create_data_generator(test_dir)

inceptionv3_base = InceptionV3(weights='imagenet', include_top=False, input_shape=(299, 299, 3))

for layer in inceptionv3_base.layers:
    layer.trainable = False

x = inceptionv3_base.output
x = GlobalAveragePooling2D()(x)
x = Dense(512, activation='relu')(x)
predictions_layer = Dense(1, activation='sigmoid')(x)

inceptionv3_model = Model(inputs=inceptionv3_base.input, outputs=predictions_layer)
inceptionv3_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

early_stopping_inception = EarlyStopping(
    monitor='val_loss',
    patience=10,
    verbose=1,
    restore_best_weights=True
)

model_checkpoint_inception = ModelCheckpoint(
    filepath='lung_classifier_inceptionv3.keras',
    monitor='val_loss',
    save_best_only=True,
    verbose=1
)

history_inceptionv3 = inceptionv3_model.fit(
    train_generator,
    steps_per_epoch=train_images // batch_size,
    epochs=50,
    validation_data=validation_generator,
    validation_steps=val_images // batch_size,
    callbacks=[early_stopping_inception, model_checkpoint_inception]
)

with open('model_inceptionv3_history.json', 'w') as f:
    json.dump(history_inceptionv3.history, f)

## <a id='toc1_8_'></a>[Testing the Models](#toc0_)

In [None]:
model_paths = {
    'ResNet50': 'lung_classifier_rn.keras',
    'VGG16': 'lung_classifier_vgg.keras',
    'DenseNet': 'lung_classifier_dn.keras',
    'InceptionV3': 'lung_classifier_inceptionv3.keras'
}

for model_name, model_path in model_paths.items():
    if model_name == 'InceptionV3':
        target_size = (299, 299)
    else:
        target_size = (224, 224)
    
    test_datagen = ImageDataGenerator(rescale=1./255)
    test_generator = test_datagen.flow_from_directory(
        test_dir,
        target_size=target_size,
        batch_size=20,
        class_mode='binary',
        shuffle=False
    )
    
    model = load_model(model_path)
    
    loss, accuracy = model.evaluate(test_generator)
    
    print(f"{model_name} Test Loss: {loss:.4f}")
    print(f"{model_name} Test Accuracy: {accuracy:.4f}\n")
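
Accuracy alone does not show which kind of mistake a model makes, which matters when one class is a disease. As a hedged sketch, the helper below derives confusion counts, precision, and recall from true labels and predicted probabilities. `binary_report` and the 0.5 threshold are assumptions for illustration. In this notebook the inputs could come from `test_generator.classes` and `model.predict(test_generator).ravel()`, which line up only because the test generator is created with `shuffle=False`.

```python
def binary_report(labels, probabilities, threshold=0.5):
    """Summarise binary predictions; label 1 is treated as the positive class."""
    predictions = [1 if p > threshold else 0 for p in probabilities]
    pairs = list(zip(labels, predictions))

    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)

    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        # Guard against division by zero when a class is never predicted or present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels and scores, purely to show the output shape.
print(binary_report([1, 1, 0, 0, 1], [0.9, 0.4, 0.2, 0.7, 0.8]))
```

Which folder maps to label 1 depends on the generator's `class_indices`, so check that mapping before reading recall as "pneumonia cases caught".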