<a href="https://colab.research.google.com/github/abhilb/DicomImageClassification/blob/main/Build_a_VGG16_Model_Milestone_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# COVID Chest X-Ray Image Classification Using VGG16
### Objective
Train a VGG16 model in a real clinical setting using DICOM images (X-ray) with Keras. Use a custom data generator to input images in DICOM format to a Keras deep learning model. Build the VGG16 model from basic building blocks and train it with the X-ray data to achieve good (> 70%) validation accuracy.

## Step 1. Use a custom image data generator which takes in DICOM images.

* Load the CSV file you saved in Part 1 into a pandas DataFrame
* Split the DataFrame into train and test DataFrames using the function train_test_split(test_size=0.2) from the sklearn.model_selection library
* Add train/test/validation data augmentation parameters in dictionary form, or use the Keras preprocessing function
* Set training/test/validation parameters such as BATCH_SIZE, CLASS_MODE, COLOR_MODE, TARGET_SIZE, and EPOCHS
* Create a data generator class for reading in DICOM images, or use the class provided. With this custom data generator class, create a train and a validation generator
* Build a VGG16 model from scratch and train it using X-ray images

## Step 2. Build the VGG16 model from scratch using correct layers and activations.
* Compile the model and check the model summary.
* Using the model.fit_generator function of Keras (superseded by model.fit in TensorFlow 2.x), train the model using the train_generator and validation_generator you built.
* Plot the training loss and accuracy and the validation loss and accuracy against epochs.
* Load a set of 9 random images from the test_generator, run model.predict on them, and visualize the prediction scores along with the test images.

In [2]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [3]:
import tensorflow as tf
from tensorflow.keras.layers import (
    Conv2D,
    MaxPooling2D,
    Flatten,
    Dense,
    Dropout,
    InputLayer
)
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [11]:
import numpy as np
import pandas as pd
from pathlib import Path
from sklearn.model_selection import train_test_split
!pip install pydicom
from pydicom import dcmread
import matplotlib.pyplot as plt

Collecting pydicom
  Downloading https://files.pythonhosted.org/packages/f4/15/df16546bc59bfca390cf072d473fb2c8acd4231636f64356593a63137e55/pydicom-2.1.2-py3-none-any.whl (1.9MB)

In [7]:
normal_data_dir = Path('./gdrive/MyDrive/Colab Notebooks/content/normal').absolute()
covid_data_dir = Path('./gdrive/MyDrive/Colab Notebooks/content/covid').absolute()
dataset_dir = Path('./gdrive/MyDrive/Colab Notebooks/content/')

In [8]:
normal_data = [x for x in normal_data_dir.rglob("*.dcm")]
covid_data = [x for x in covid_data_dir.rglob("*.dcm")]

### Load the CSV file you saved in Part 1 in a pandas DataFrame

In [9]:
dataset = pd.read_csv(dataset_dir / "dataset.csv", header=None, names=["Path", "Label"])
dataset.head()

| | Path | Label |
|---|---|---|
| 0 | /content/gdrive/MyDrive/Colab Notebooks/conten... | COVID |
| 1 | /content/gdrive/MyDrive/Colab Notebooks/conten... | COVID |
| 2 | /content/gdrive/MyDrive/Colab Notebooks/conten... | COVID |
| 3 | /content/gdrive/MyDrive/Colab Notebooks/conten... | COVID |
| 4 | /content/gdrive/MyDrive/Colab Notebooks/conten... | COVID |


### Split the DataFrame into train and test DataFrames using the function train_test_split(test_size=0.2) from the sklearn.model_selection library

In [12]:
train, test = train_test_split(dataset, test_size=0.2, random_state=42)
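Step 1 asks for a validation generator as well, so a validation subset has to be carved out of `train` at some point. The sketch below shows the mechanics of a class-balanced (stratified) split using NumPy only. The helper name `stratified_split`, the 30/70 toy label mix, and the `NORMAL` label are illustrative assumptions, not values taken from this dataset.

```python
import numpy as np

def stratified_split(labels, holdout_fraction, seed=42):
    """Return (keep_idx, holdout_idx), splitting every class in the same proportion."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    keep, holdout = [], []
    for cls in np.unique(labels):
        # Shuffle the row positions of this class, then cut off the holdout share
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_holdout = int(round(len(idx) * holdout_fraction))
        holdout.extend(idx[:n_holdout])
        keep.extend(idx[n_holdout:])
    return np.sort(keep), np.sort(holdout)

# Toy labels standing in for the "Label" column (hypothetical class mix)
labels = np.array(["COVID"] * 30 + ["NORMAL"] * 70)
train_idx, val_idx = stratified_split(labels, holdout_fraction=0.2)
print(len(train_idx), len(val_idx))
```

In the notebook the returned positions could be applied with `train.iloc[train_idx]` and `train.iloc[val_idx]`. Passing `stratify=train["Label"]` to a second `train_test_split` call is a one-line alternative that achieves the same balance.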

In [14]:
print(f"Train: {train.shape}, Test: {test.shape}")
print(f"{type(train)}")

Train: (11046, 2), Test: (2762, 2)
<class 'pandas.core.frame.DataFrame'>


### Add train/test/validation data augmentation parameters in dictionary form or use the Keras preprocessing function.

In [106]:
data_gen_args = dict(featurewise_center=True,
                     featurewise_std_normalization=True,
                     rotation_range=90,
                     width_shift_range=0.1,
                     height_shift_range=0.1,
                     zoom_range=0.2)
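The `featurewise_center` and `featurewise_std_normalization` options depend on dataset-wide statistics, which an `ImageDataGenerator` only has after its `fit` method has been called on a sample of training images. The NumPy sketch below mimics that computation on random stand-in data to show what the two flags do. The helper names, the sample array, and the small epsilon are assumptions for illustration rather than a copy of the Keras internals.

```python
import numpy as np

def fit_featurewise(images):
    """Per-channel mean and std over a batch shaped (N, H, W, C)."""
    mean = images.mean(axis=(0, 1, 2), keepdims=True)
    std = images.std(axis=(0, 1, 2), keepdims=True)
    return mean, std

def standardize(images, mean, std, eps=1e-6):
    """Centre on the dataset mean and scale by the dataset std."""
    return (images - mean) / (std + eps)

# Random stand-in for a sample of grayscale training images (hypothetical data)
rng = np.random.default_rng(0)
sample = rng.uniform(0.0, 255.0, size=(8, 32, 32, 1))

mean, std = fit_featurewise(sample)
standardized = standardize(sample, mean, std)
print(float(standardized.mean()), float(standardized.std()))
```

If the statistics are never fitted, Keras warns and leaves the images unstandardized, so these two flags are only useful when a representative sample can be loaded up front.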

### Set training/test/validation parameters such as BATCH_SIZE, CLASS_MODE, COLOR_MODE, TARGET_SIZE, and EPOCHS.

In [102]:
BATCH_SIZE = 32
EPOCHS = 100
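CLASS_MODE = "binary"      # one 0/1 label per image, matching the single-unit output layer
COLOR_MODE = "grayscale"   # Keras spells the single-channel mode "grayscale"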
TARGET_SIZE = (224, 224)
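To make the parameters concrete, here is a minimal sketch of how one DICOM pixel array could be brought to `TARGET_SIZE` with a single gray channel. It assumes single-frame, 2-D pixel data and uses a simple nearest-neighbour resize written in NumPy. The helper names are hypothetical, and a real pipeline might prefer a library resize and handle photometric interpretation as well.

```python
import numpy as np

TARGET_SIZE = (224, 224)

def resize_nearest(image, target_size):
    """Nearest-neighbour resize of a 2-D array to (height, width)."""
    rows = np.arange(target_size[0]) * image.shape[0] // target_size[0]
    cols = np.arange(target_size[1]) * image.shape[1] // target_size[1]
    return image[rows[:, None], cols[None, :]]

def preprocess_pixels(pixels, target_size=TARGET_SIZE):
    """Scale a 2-D pixel array to [0, 1], resize it, and add a channel axis."""
    pixels = pixels.astype(np.float32)
    lo, hi = pixels.min(), pixels.max()
    if hi > lo:
        pixels = (pixels - lo) / (hi - lo)
    else:
        pixels = np.zeros_like(pixels)  # constant image: avoid dividing by zero
    return resize_nearest(pixels, target_size)[..., np.newaxis]

def load_dicom(path, target_size=TARGET_SIZE):
    """Read one DICOM file; pydicom is imported lazily so the helpers above run without it."""
    from pydicom import dcmread
    return preprocess_pixels(dcmread(str(path)).pixel_array, target_size)

# Random 12-bit array standing in for a decoded X-ray (hypothetical data)
fake_scan = np.random.default_rng(0).integers(0, 4096, size=(512, 400))
image = preprocess_pixels(fake_scan)
print(image.shape, image.dtype)
```

The resulting shape `(224, 224, 1)` matches the `InputLayer` of the model defined later in the notebook.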

### Create a data generator class for reading in DICOM images or use the class provided. With this custom data generator class, create a train and a validation generator.

In [109]:
class DataGenerator:
  def __init__(self, batch_size, train_data_frame, test_data_frame):
    self.batch_size = batch_size
    self.train_df = train_data_frame
    self.test_df = test_data_frame

  



In [88]:
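# np.str was removed from recent NumPy releases; the built-in str is equivalent here
files = dataset.Path.to_numpy().astype(str)
# Instantiate the DataGenerator class defined above (its constructor takes the
# batch size followed by the train and test DataFrames)
dg = DataGenerator(BATCH_SIZE, train, test)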
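The `DataGenerator` class above stops short of producing batches. As one possible way to finish the idea, the sketch below uses a plain Python generator that yields `(images, labels)` pairs indefinitely, which is a form that Keras `model.fit` accepts when `steps_per_epoch` is given. The function name `dicom_batches`, the injectable `loader` argument, and the stand-in `fake_loader` are assumptions made so the example runs without DICOM files.

```python
import numpy as np

def dicom_batches(paths, labels, batch_size, loader, shuffle=True, seed=0):
    """Yield (images, labels) batches forever, reshuffling at the start of each pass."""
    paths = np.asarray(paths)
    labels = np.asarray(labels, dtype=np.float32)
    rng = np.random.default_rng(seed)
    while True:
        order = rng.permutation(len(paths)) if shuffle else np.arange(len(paths))
        for start in range(0, len(paths), batch_size):
            batch = order[start:start + batch_size]
            images = np.stack([loader(p) for p in paths[batch]])
            yield images, labels[batch]

def fake_loader(path):
    """Stand-in loader; a real one would read pydicom.dcmread(path).pixel_array and resize it."""
    return np.zeros((224, 224, 1), dtype=np.float32)

# Hypothetical file names and 0/1 labels
paths = [f"scan_{i}.dcm" for i in range(10)]
labels = [i % 2 for i in range(10)]

gen = dicom_batches(paths, labels, batch_size=4, loader=fake_loader)
x, y = next(gen)
print(x.shape, y.shape)
```

With ten files and a batch size of four, one pass consists of three batches, the last holding the two remaining images, so `steps_per_epoch` would be 3 in this toy case.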

### Build a VGG16 model from scratch and train it using X-ray images

In [95]:
def VGG_16():
    model = Sequential()

    model.add(InputLayer(input_shape=(224, 224, 1)))
    # Block 1
    model.add(Conv2D(64, (3, 3),
                     activation='relu',
                     padding='same',
                     name='block1_conv1'))
    model.add(Conv2D(64, (3, 3),
                     activation='relu',
                     padding='same',
                     name='block1_conv2'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool'))

    # Block 2
    model.add(Conv2D(128, (3, 3),
                     activation='relu',
                     padding='same',
                     name='block2_conv1'))
    model.add(Conv2D(128, (3, 3),
                     activation='relu',
                     padding='same',
                     name='block2_conv2'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool'))

    # Block 3
    model.add(Conv2D(256, (3, 3),
                     activation='relu',
                     padding='same',
                     name='block3_conv1'))
    model.add(Conv2D(256, (3, 3),
                     activation='relu',
                     padding='same',
                     name='block3_conv2'))    
    model.add(Conv2D(256, (3, 3),
                     activation='relu',
                     padding='same',
                     name='block3_conv3'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool'))

    # Block 4
    model.add(Conv2D(512, (3, 3),
                     activation='relu',
                     padding='same',
                     name='block4_conv1'))
    model.add(Conv2D(512, (3, 3),
                     activation='relu',
                     padding='same',
                     name='block4_conv2'))
    model.add(Conv2D(512, (3, 3),
                     activation='relu',
                     padding='same',
                     name='block4_conv3'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool'))

    # Block 5
    model.add(Conv2D(512, (3, 3),
                     activation='relu',
                     padding='same',
                     name='block5_conv1'))
    model.add(Conv2D(512, (3, 3),
                     activation='relu',
                     padding='same',
                     name='block5_conv2'))
    model.add(Conv2D(512, (3, 3),
                     activation='relu',
                     padding='same',
                     name='block5_conv3'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool'))

    model.add(Flatten())
    model.add(Dense(64, activation='relu'))
    model.add(Dense(64, activation='relu'))
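    # A single-unit softmax always outputs 1.0; sigmoid yields a usable probability
    # and matches the binary_crossentropy loss used at compile time.
    model.add(Dense(1, activation='sigmoid'))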
    return model
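The final layer has a single unit, so the choice of activation decides whether the network can express a probability at all. Softmax normalises across the units of a layer, which means it returns 1.0 for every input when there is only one unit, whereas sigmoid maps the logit into the interval (0, 1). The NumPy sketch below illustrates the difference on three made-up logits.

```python
import numpy as np

def softmax(logits):
    """Softmax along the last axis, shifted by the row maximum for stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def sigmoid(logits):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-logits))

# Three samples, one output unit each (hypothetical logits)
logits = np.array([[-3.0], [0.0], [2.5]])

softmax_out = softmax(logits).ravel()
sigmoid_out = sigmoid(logits).ravel()
print(softmax_out)  # constant, regardless of the logit
print(sigmoid_out)  # varies with the logit
```

A two-unit softmax output with one-hot labels and `categorical_crossentropy` would be an equally valid alternative to a one-unit sigmoid with `binary_crossentropy`.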

In [96]:
model = VGG_16()
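As a sanity check on the architecture, the parameter counts reported by `model.summary()` can be reproduced by hand. A `Conv2D` layer with a k×k kernel has k·k·(input channels) weights plus one bias per filter. The short sketch below applies that formula to the first two blocks and also tracks how five 2×2 poolings shrink the 224×224 input. The helper name `conv2d_params` is illustrative.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return (kernel * kernel * in_channels + 1) * filters

# (name, input channels, filters) for the first two blocks with a 1-channel input
layers = [
    ("block1_conv1", 1, 64),
    ("block1_conv2", 64, 64),
    ("block2_conv1", 64, 128),
    ("block2_conv2", 128, 128),
]
counts = {name: conv2d_params(3, c_in, c_out) for name, c_in, c_out in layers}
for name, value in counts.items():
    print(name, value)  # 640, 36928, 73856, 147584

# Each 2x2 max-pooling with stride 2 halves the spatial side length
side = 224
for _ in range(5):
    side //= 2
flatten_units = side * side * 512
print(side, flatten_units)  # 7 25088
```

The 640 parameters of `block1_conv1` reflect the single-channel input. The standard three-channel VGG16 has 1,792 parameters in that layer.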

In [99]:
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor='val_accuracy',
    mode='max',
    patience=6
)
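With `patience=6` and `mode='max'`, training halts once validation accuracy has gone six consecutive epochs without exceeding its best value so far. The pure-Python sketch below simulates that rule on a made-up accuracy curve. It is a simplified model of the callback's behaviour (no `min_delta`, no weight restoration), and the function name and history values are hypothetical.

```python
def stopping_epoch(val_accuracy, patience):
    """Return the 1-based epoch at which training would halt, or None if it never does."""
    best = float("-inf")
    wait = 0
    for epoch, value in enumerate(val_accuracy, start=1):
        if value > best:
            # Strict improvement resets the patience counter
            best, wait = value, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation accuracies: the best value (0.71) is first reached at epoch 3
history = [0.60, 0.68, 0.71, 0.70, 0.71, 0.69, 0.70, 0.71, 0.70, 0.72]
stopped_at = stopping_epoch(history, patience=6)
print(stopped_at)  # 9
```

In this toy run the stop at epoch 9 comes one epoch before the curve would have reached 0.72, which illustrates the trade-off between a short patience and a noisy validation metric.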

In [100]:
model.compile(
    loss='binary_crossentropy',
    optimizer=tf.keras.optimizers.Adam(),
    metrics=['accuracy']
)
model.summary()

Model: "sequential_15"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
block1_conv1 (Conv2D)        (None, 224, 224, 64)      640       
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)     