<a href="https://colab.research.google.com/github/anandababugudipudi/COVID-19-Detection-using-Chest-Xray/blob/master/COVID_19_Detection_using_Chest_XRays.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**COVID-19 Detection using Chest X-Rays**

The pandemic caused by the novel coronavirus disease 2019 (COVID-19) continues to have a devastating effect on the health, well-being, and economy of the global population. The main challenge in the current scenario is to detect and diagnose COVID-19 early, and to accurately separate infected cases, at the lowest possible cost. Early detection of COVID-19 in the human body is a critical step in restraining the pandemic, because it limits exposure and helps control the spread of the virus. Chest X-Rays are a non-invasive tool for detecting the disease, whereas the manual PCR diagnostic process is tedious and time-consuming. Given the novelty of the disease, diagnostic methods based on radiological images still have shortcomings despite their many applications in diagnostic centers. Accordingly, medical and computer science researchers have turned to machine-learning models to analyze radiology images.
<br><br>
![COVID-19](https://eyewire.news/wp-content/uploads/sites/2/2020/03/banner.png)
<br><br>

In this project, we attempt to develop an automated COVID-19 classifier using publicly available COVID and non-COVID Chest X-Ray datasets.

###**The following are the steps involved in this project:**
- Importing the necessary packages
- Data Collection and Preprocessing
- Building a CNN based model in Keras
- Compiling the Model
- Processing the Training and Testing Images
- Training the Model
- Evaluating the Model
- Saving the Model
- Creating a Classification Method


###**Let's start implementing the above steps one by one:**

###**Importing the necessary packages:**

In [None]:
# Import packages (duplicate and unused imports removed)
import os
import numpy as np
from keras.models import Sequential, load_model
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
# ImageDataGenerator, load_img and img_to_array live in this module in Keras 2.x
from keras.preprocessing import image

###**Data Collection and Preprocessing:**
####**Data Collection:**
This project mainly relies on two types of Chest X-Rays:
1. Chest X-Rays of **COVID**-infected patients, and
2. Chest X-Rays of **Non-COVID** patients, such as Pneumonia or Tuberculosis cases.

For **COVID** X-Rays, we downloaded the Chest X-Rays from [GitHub](https://github.com/ieee8023/covid-chestxray-dataset) (a mix of COVID-positive cases and other diseases).

For **Non-COVID** X-Rays, we downloaded the data from [Kaggle](https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia) (Chest X-Rays of Pneumonia patients).

####**Data Preprocessing:**
At the time of this project, a total of 930 Chest X-Rays were available, of which 196 are labelled as **COVID Positive**. We extracted these images from the complete dataset using its metadata.

From the Pneumonia Chest X-Rays dataset, we selected an equal number of **Normal Chest X-Rays** (196) and labelled them as **COVID Negative (Non-COVID)**.

####**Data Split:**
Each class of the 392 available X-Rays (196 COVID, 196 Non-COVID) is split the same way: 25% (49 X-Rays) are held out for validation and the remaining 75% (147 X-Rays) are used for training the model. This gives 294 training images and 98 validation images in total.

We separated them into training and validation sets and organised them in folders as follows:
- Dataset
  - Train
    - COVID
    - Normal
  - Val
    - COVID
    - Normal

We then uploaded the segregated dataset to Dropbox for future use. It is available [here](https://www.dropbox.com/s/tlsdn617iymz3bf/CovidDataset.zip) in `.zip` format.
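The 75/25 per-class split described above can be sketched as follows. This is a minimal illustration, not the script used to build the Dropbox archive: the helper name `split_per_class`, the fixed seed, and the placeholder file names are assumptions.

```python
import math
import random

def split_per_class(files_by_class, val_fraction=0.25, seed=42):
    """Hypothetical helper: shuffle each class's files and hold out val_fraction for validation.

    Returns two dicts mapping class name -> list of file names (train, val).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = {}, {}
    for cls, files in files_by_class.items():
        files = list(files)
        rng.shuffle(files)
        n_val = math.floor(len(files) * val_fraction)  # 196 * 0.25 = 49
        val[cls] = files[:n_val]
        train[cls] = files[n_val:]
    return train, val

# Placeholder file names standing in for the 196 images per class
files = {
    "COVID": [f"covid_{i}.png" for i in range(196)],
    "Normal": [f"normal_{i}.png" for i in range(196)],
}
train, val = split_per_class(files)
for cls in files:
    print(cls, len(train[cls]), len(val[cls]))
```

Splitting each class separately keeps the validation set balanced; copying each list into the corresponding training and validation class folders (for example with `shutil.copy`) would reproduce the layout above.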


In [None]:
# Downloading the dataset zip file from Dropbox (skipped if it is already present)
if not os.path.exists("CovidDataset.zip"):
  # Download the dataset file from Dropbox
  !wget https://www.dropbox.com/s/tlsdn617iymz3bf/CovidDataset.zip
if not os.path.exists("Dataset/"):
  # Unzip the file
  !unzip CovidDataset.zip

In [None]:
# Declaring the path variables (paths are case-sensitive and must match the extracted folders)
TRAIN_PATH = "Dataset/Train"
VAL_PATH = "Dataset/Val"

###**Building a CNN based model in Keras:**

In this project we use Convolutional Neural Networks (CNNs or ConvNets), a class of feed-forward neural networks that perform very well on image classification. A CNN is hierarchical: stacked convolution and pooling layers progressively condense the image into feature maps, like a funnel, and finally feed fully connected layers, in which every neuron is connected to all the neurons of the previous layer, to produce the output.
<br>
<br>
![CNN](https://www.researchgate.net/profile/Md-Mahin/publication/332407214/figure/fig2/AS:747438860156930@1555214719430/Proposed-adopted-Convolutional-Neural-Network-CNN-model.png)
<br>
<br>
We added all the layers to a `Sequential` model. Since we are classifying images, we used `Conv2D` layers stacked on top of each other.

**Conv2D** is a 2D convolution layer that creates a convolution kernel which is convolved with the layer's input to produce a tensor of outputs. In image processing, a kernel is a convolution matrix (or mask) that can be used for blurring, sharpening, embossing, edge detection and more, by convolving the kernel with the image. The number of filters sets how many feature maps the layer learns to extract from its input; by convention it is usually a power of 2 (32, 64, 128), although this is not required.
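To make the kernel operation concrete, here is a minimal NumPy sketch of what a single filter computes with the Keras defaults used in this notebook (`padding='valid'`, stride 1). Like most deep-learning libraries, Keras actually computes a cross-correlation, so the kernel is not flipped. The function name `conv2d_single` and the hand-written edge kernel are illustrative assumptions; a real `Conv2D` layer also sums over all input channels, adds a bias, and learns its kernel values.

```python
import numpy as np

def conv2d_single(img, kernel):
    """Valid (no padding), stride-1 2D cross-correlation of one channel with one kernel."""
    kh, kw = kernel.shape
    out_h = img.shape[0] - kh + 1
    out_w = img.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the kernel element-wise with the image patch under it, then sum
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 5x5 image with a vertical edge between columns 1 and 2
img = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
# Simple vertical-edge detection kernel
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)
out = conv2d_single(img, kernel)
print(out)
```

The response is large where the kernel straddles the edge and zero over the flat region. With `'valid'` padding each 3 × 3 convolution trims one pixel from every border, so a 224 × 224 input becomes 222 × 222.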

**MaxPooling2D** layers reduce the spatial dimensions of the feature maps by keeping only the maximum value in each pooling window. This reduces the amount of computation performed in the network and the number of parameters in the layers that follow (the pooling layer itself has no trainable parameters). In effect, the pooling layer summarises the features present in each region of the feature map generated by a convolution layer.
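A minimal NumPy sketch of 2 × 2 max pooling with stride 2, which is what `MaxPooling2D(pool_size = (2, 2))` does by default on each channel. The helper name `max_pool_2x2` is illustrative, and it handles a single channel only.

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on one channel; like padding='valid', an odd last row/column is dropped."""
    h, w = fmap.shape[0] // 2, fmap.shape[1] // 2
    # Group the map into non-overlapping 2x2 blocks and keep the maximum of each block
    blocks = fmap[:h * 2, :w * 2].reshape(h, 2, w, 2)
    return blocks.max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [0, 1, 7, 2],
                 [6, 2, 3, 1]], dtype=float)
pooled = max_pool_2x2(fmap)
print(pooled)  # -> [[4., 5.], [6., 7.]]
```

Each output value is the strongest activation in its 2 × 2 region, and the map shrinks from 4 × 4 to 2 × 2 without any learned weights.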

**Dropout** layers help prevent the model from overfitting, which in neural networks is partly caused by the co-adaptation of individual neurons. During training, a Dropout layer randomly sets a fraction of its input units (the dropout rate, a value between 0 and 1) to 0 at each update and scales the remaining units by 1 / (1 - rate); at inference time it passes its input through unchanged.
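The "inverted dropout" scheme described above can be sketched in a few lines of NumPy. The function name and signature are illustrative assumptions, not the Keras API; the Keras layer itself takes only `rate` and decides whether it is training from the call context.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the survivors during training."""
    if not training or rate == 0.0:
        return x  # at inference time dropout is the identity
    keep = rng.random(x.shape) >= rate             # each unit is kept with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)   # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
x = np.ones(100_000)
y = dropout(x, rate=0.5, training=True, rng=rng)
print((y == 0).mean())  # fraction of dropped units, close to 0.5
print(y.mean())         # close to 1.0 thanks to the rescaling
```

Rescaling during training means the layer needs no adjustment at inference, which is why the same model can be used for prediction directly.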

The **Flatten** layer converts the pooled feature maps into a one-dimensional vector per sample, which is passed to the fully connected layers. For example, if Flatten is applied to an input of shape (batch_size, 2, 2), the output shape will be (batch_size, 4).

A **Dense** layer connects every neuron to all the neurons of the previous layer and receives inputs from all of them: it computes `activation(dot(input, kernel) + bias)`. Combined with a non-linear activation such as ReLU, stacked Dense layers can model a wide range of functions. Dense layers are among the most commonly used layers and typically form the classification head of a CNN.
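The Dense computation `activation(dot(input, kernel) + bias)` is easy to reproduce with NumPy. This sketch uses hand-picked weights purely for illustration; in the model they are learned during training.

```python
import numpy as np

def dense(x, kernel, bias, activation=None):
    """Forward pass of a fully connected layer: activation(x @ kernel + bias)."""
    z = x @ kernel + bias
    return activation(z) if activation is not None else z

def relu(z):
    return np.maximum(z, 0.0)

# Batch of 2 samples with 3 features each, mapped to 4 units
x = np.array([[1.0, 2.0, 3.0],
              [-1.0, 0.0, 1.0]])
kernel = np.full((3, 4), 0.5)              # every input feeds every unit
bias = np.array([0.0, -1.0, -2.0, -10.0])
out = dense(x, kernel, bias, activation=relu)
print(out)
```

A Dense layer has `inputs × units + units` trainable parameters (here 3 × 4 + 4 = 16), which is why a Dense layer placed directly after `Flatten` usually dominates a CNN's parameter count.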

The first layer of the model is a `Conv2D` layer with `32 filters`, a kernel size of `(3, 3)`, an input shape of `(224, 224, 3)` and the `'relu'` activation function.

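It is followed by a stack of `Conv2D`, `MaxPooling2D` and `Dropout` layers for feature extraction: three blocks, each made of a `Conv2D` layer (with 64, 64 and 128 filters respectively), a `MaxPooling2D` layer and a `Dropout` layer. The extracted features are then flattened and passed through a 64-unit `Dense` layer with `Dropout` before the output layer.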

The output layer is a single-unit `Dense` layer with the `sigmoid` activation function, because predicting whether an X-Ray is COVID or Non-COVID is a binary classification problem; the sigmoid squashes the output into a probability between 0 and 1.
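A small pure-Python sketch of how the single sigmoid output becomes a class decision. The 0.5 threshold is the usual convention and an assumption here; which class is index 1 is decided by `flow_from_directory`, which assigns indices to the class folders in alphanumeric order. The helper names are illustrative.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def to_label(prob, threshold=0.5):
    """Turn the sigmoid output into a 0/1 class index."""
    return 1 if prob >= threshold else 0

for logit in (-3.0, 0.0, 2.0):
    p = sigmoid(logit)
    print(f"logit={logit:+.1f}  p={p:.3f}  class={to_label(p)}")
```

Because the output is a probability rather than a hard 0/1 value, it must be thresholded (not truncated with `int()`) to obtain a class.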

In [None]:
# Build CNN Based Model in Keras
model = Sequential()
model.add(Conv2D(32, kernel_size = (3,3), activation = 'relu', input_shape = (224, 224, 3)))
  
model.add(Conv2D(64, (3, 3), activation = 'relu'))
model.add(MaxPooling2D(pool_size = (2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), activation = 'relu'))
model.add(MaxPooling2D(pool_size = (2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(128, (3, 3), activation = 'relu'))
model.add(MaxPooling2D(pool_size = (2, 2)))
model.add(Dropout(0.5))

model.add(Flatten())
model.add(Dense(64, activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation = 'sigmoid'))

###**Compiling the Model:**

After creating the model, we compile it with three arguments: `loss`, `optimizer`, and `metrics`.

- We use `binary_crossentropy` as the loss because this is a binary classification problem.
- Among optimizers such as `adam`, `adagrad`, `sgd`, and `rmsprop`, we selected `adam` with its default learning rate (0.001).
- We track `accuracy` as the evaluation metric.
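Binary cross-entropy averages -[y·log(p) + (1 - y)·log(1 - p)] over the samples, where y is the true label (0 or 1) and p the predicted probability. The pure-Python sketch below illustrates the formula; the clipping epsilon mirrors the small constant Keras uses to avoid log(0), but the function is an illustration, not the Keras implementation.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over paired labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

print(binary_crossentropy([1, 0], [0.9, 0.1]))  # confident and correct: small loss
print(binary_crossentropy([1, 0], [0.1, 0.9]))  # confident and wrong: large loss
```

The loss grows sharply as a confident prediction moves toward the wrong class, which is what drives the sigmoid output toward the correct label during training.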

In [None]:
# Compile the model
model.compile(loss = "binary_crossentropy", 
              optimizer = 'adam', 
              metrics = ['accuracy'])

In [None]:
# Summary of the Model
model.summary()
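As a cross-check on what `model.summary()` reports, the output shapes and trainable parameter counts of the model above can be worked out by hand. This pure-Python sketch assumes the Keras defaults used in the model (`padding='valid'`, stride 1 for convolutions, stride equal to the pool size for pooling); `Dropout` and `Flatten` add no parameters.

```python
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' (unpadded), stride-1 convolution."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after 'valid' max pooling with stride equal to the pool size."""
    return size // pool

size, channels, total = 224, 3, 0
# (filters, followed_by_pooling) for each Conv2D layer, in model order
for filters, pooled in [(32, False), (64, True), (64, True), (128, True)]:
    params = 3 * 3 * channels * filters + filters  # kernel weights + one bias per filter
    total += params
    size, channels = conv_out(size), filters
    if pooled:
        size = pool_out(size)
    print(f"Conv2D({filters}){' + MaxPooling2D' if pooled else ''} -> {size}x{size}x{channels}, params={params}")

flat = size * size * channels   # Flatten
dense1 = flat * 64 + 64         # Dense(64)
dense2 = 64 * 1 + 1             # Dense(1)
total += dense1 + dense2
print(f"Flatten -> {flat}, Dense(64) params={dense1}, Dense(1) params={dense2}")
print(f"Total trainable params: {total:,}")
```

The count shows that the `Dense(64)` layer after `Flatten` holds about 98% of the model's roughly 5.7 million parameters, so that layer is where most of the model's capacity (and overfitting risk) sits.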

###**Processing the Training and Testing Images:**

Since we have relatively few images, we apply data augmentation to the training set: random transformations create varied versions of each image on the fly, which helps the network generalise and reduces overfitting. For this we use Keras' `ImageDataGenerator`, which can rescale, shear, zoom, flip and apply many other transformations to our images. The validation images are only rescaled.
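To show what two of these transformations do to the pixel array, here is a toy NumPy version of 1/255 rescaling and a random horizontal flip. The function `augment` is hypothetical and only illustrates the idea; `ImageDataGenerator` has its own implementation and also applies the shear and zoom transformations, which are omitted here.

```python
import numpy as np

def augment(img, rng, flip_prob=0.5):
    """Hypothetical toy augmentation: 1/255 rescaling plus a random left-right flip."""
    x = img.astype(np.float32) / 255.0   # rescale pixel values from [0, 255] to [0, 1]
    if rng.random() < flip_prob:
        x = x[:, ::-1, :]                # mirror left-right (axis 1 is the image width)
    return x

rng = np.random.default_rng(0)
img = np.arange(6, dtype=np.uint8).reshape(2, 3, 1)  # tiny 2x3 single-channel "image"
out = augment(img, rng, flip_prob=1.0)                # force the flip to show its effect
print(out[..., 0])
```

Because a new random transformation is drawn each time an image is served, the network rarely sees exactly the same training input twice.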

We then create training and validation generators that read the images from their folders, resize them to 224 × 224, and yield batches of 32 images with binary labels.
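With a batch size of 32, the number of batches needed to see every image once follows from simple arithmetic. This sketch assumes 147 training and 49 validation images per class, as in the data split above.

```python
import math

batch_size = 32
n_train, n_val = 2 * 147, 2 * 49   # 294 training and 98 validation images

# Batches needed to see every image once (the last batch is partial)
train_steps = math.ceil(n_train / batch_size)
val_steps = math.ceil(n_val / batch_size)
print(train_steps, val_steps)

# Images actually drawn per epoch with steps_per_epoch = 8 and validation_steps = 2
print(8 * batch_size, 2 * batch_size)
```

A full pass needs 10 training and 4 validation batches, so the `model.fit` settings used in this notebook sample 256 training and 64 validation images per epoch; the generators loop indefinitely, so the remaining images are picked up in later epochs.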

In [None]:
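# Processing the images
# Training images: rescale pixels to [0, 1] and apply random shear, zoom and horizontal flips
train_datagen = image.ImageDataGenerator(
    rescale = 1/255.,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True
)

# Validation images: only rescale, so they are evaluated without augmentation
val_datagen = image.ImageDataGenerator(rescale = 1/255.)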

In [None]:
train_generator = train_datagen.flow_from_directory(
    'Dataset/Train',
    target_size = (224, 224),
    batch_size = 32,
    class_mode = 'binary'
)

In [None]:
val_generator = val_datagen.flow_from_directory(
    'Dataset/Val',
    target_size = (224, 224),
    batch_size = 32,
    class_mode = 'binary'  
)

###**Training the Model:**



In [None]:
history = model.fit(
    train_generator,
    steps_per_epoch = 8,
    epochs = 20,
    validation_data = val_generator,
    validation_steps = 2
)

###**Evaluating the Model:**

In [None]:
# Evaluate the model
model.evaluate(val_generator)

In [None]:
# Final-epoch accuracies as percentages, rounded to two decimal places
train_acc = round(history.history['accuracy'][-1] * 100, 2)
val_acc = round(history.history['val_accuracy'][-1] * 100, 2)

In [None]:
print(f"The Training accuracy is {train_acc}%")
print(f"The validation accuracy is {val_acc}%")

###**Saving the Model:**

In [None]:
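# Batch size and number of epochs used in the generators and in model.fit above
batch_size, epochs = 32, 20
model.save(f"cnn-cxr-acc-{val_acc}_bs-{batch_size}_epochs-{epochs}.h5")
print(f"Model saved with {val_acc}% validation accuracy.")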

###**Creating a Classification Method:**

In [None]:
def classify_cxr(img_path, model_path = 'cnn-cxr-acc-98.44_bs-32_epochs-20.h5'):
  # Dimensions of input image (must match the training target_size)
  img_width, img_height = 224, 224

  # Load the saved model (compiling is not needed for prediction)
  model = load_model(model_path)

  # Load the image and apply the same 1/255 rescaling used during training
  img = image.load_img(img_path, target_size = (img_width, img_height))
  x = image.img_to_array(img) / 255.
  # Add a batch dimension: (224, 224, 3) -> (1, 224, 224, 3)
  x = np.expand_dims(x, axis = 0)

  # Predicting on the X-Ray image
  predict_class = model.predict(x)
  # Returns the sigmoid output: a probability between 0 and 1 (close to 1 means class index 1)
  return predict_class[0][0]

# Selecting an image from our Dataset
img_base_path = "Dataset/Train/Normal/"
img_names = os.listdir(img_base_path)
img_path = os.path.join(img_base_path, img_names[10])

# Obtaining the classification result (a sigmoid probability)
classified = classify_cxr(img_path)

# Printing out the result: flow_from_directory assigns class indices in alphanumeric
# order of the folder names, so COVID -> 0 and Normal -> 1; threshold at 0.5
if classified < 0.5:
  print("Result: COVID Chest X-Ray")
else:
  print("Result: Non-COVID Chest X-Ray")
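Rather than hard-coding which class is 0 and which is 1, the mapping can be read from the generator: the iterator returned by `flow_from_directory` exposes a `class_indices` dict (folder name to index). The helper below is a hypothetical sketch of using it; the example dict is an assumption, so print `train_generator.class_indices` to confirm the real one.

```python
def label_from_probability(prob, class_indices, threshold=0.5):
    """Hypothetical helper: map a sigmoid output to a class-folder name.

    class_indices is a dict such as train_generator.class_indices, e.g. {'COVID': 0, 'Normal': 1}.
    """
    index_to_class = {idx: name for name, idx in class_indices.items()}
    return index_to_class[1] if prob >= threshold else index_to_class[0]

class_indices = {'COVID': 0, 'Normal': 1}  # assumed; confirm with train_generator.class_indices
print(label_from_probability(0.93, class_indices))
print(label_from_probability(0.08, class_indices))
```

Deriving the label from `class_indices` keeps the prediction code correct even if the class folders are renamed.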

In [None]:
# Record the installed package versions for reproducibility
!pip freeze > requirements.txt