<a href="https://colab.research.google.com/github/txusser/Master_IA_Sanidad/blob/main/Modulo_2/2_3_5_Proyecto_COVID_Tensorflow_Keras.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# COVID-19 Patient Classification

In this guided project, we will implement a deep neural network for the classification or diagnosis of patients suspected of COVID-19 infection using medical imaging data.


## Modify Execution Environment and Select GPU Support

Before starting to write code, we will modify the Google Colab execution environment to work with GPU hardware. To do this, go to the 'Runtime' menu, select the option 'Change runtime type,' and in the panel that opens, choose GPU from the 'Hardware accelerator' dropdown.

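Once the runtime has been changed, it can be worth confirming that TensorFlow actually sees the GPU. The following is a minimal sketch: the helper name `gpu_devices` is hypothetical, and it returns an empty list both when no GPU is attached and when TensorFlow is not installed.

```python
def gpu_devices():
    """Return the GPUs visible to TensorFlow (empty list if there are none)."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is preinstalled on Colab; this branch only covers other environments
        return []
    return list(tf.config.list_physical_devices('GPU'))

devices = gpu_devices()
print("GPUs visible to TensorFlow:", devices)
```

An empty list on Colab usually means the runtime type is still set to CPU.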

# A) Dataset Import

Download the medical images as a .zip file from [this link](https://drive.google.com/file/d/1C6nEoNFr8PmqEddHHYGnXGAWNynz22qm/view?usp=sharing), then use the left-hand panel to make them available on the virtual machine that will execute this notebook.

The Data folder contains over two thousand 2D chest X-ray images in jpg format.


Options for Loading the Data:

1. If you have a Google Drive account, mount the drive and load the Data folder from there.

In [None]:
from google.colab import drive
drive.mount('/content/drive')
# The base directory with the Data folder in Google Drive
base_dir = '/content/drive/MyDrive/Documentos/Master IA/Datasets/Datos'

2. Upload the .zip file and unzip it in the VM

In [None]:
!unzip "/content/Datos.zip" -d "/content/Datos/"
base_dir = '/content/Datos'

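As an alternative to the `!unzip` shell command, the archive can also be extracted from Python with the standard `zipfile` module. This is only a sketch: the helper name `extract_dataset` is hypothetical, and the paths in the commented call are the ones used in option 2.

```python
import zipfile
from pathlib import Path

def extract_dataset(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the top-level entries found there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return sorted(p.name for p in dest.iterdir())

# Example call (only valid once Datos.zip has been uploaded to the VM):
# print(extract_dataset('/content/Datos.zip', '/content/Datos'))
```

The returned list shows which folders ended up at the top level, which helps when setting `base_dir`.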
# B) Importing Necessary Libraries
We import the Sequential class, used to configure the network, together with the layers that compose it: convolutional layers, 2D max pooling layers, dropout layers, and flatten and dense layers.


In [None]:
from tensorflow.keras.models import Sequential  # Model class that stacks layers in a linear sequence
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
# Conv2D -> Applies learned filters to extract local features such as edges and textures
# MaxPooling2D -> Downsamples the feature maps, reducing their spatial size
# Dropout -> Randomly deactivates units during training to reduce overfitting
# Flatten -> Converts the feature maps into a 1D vector
# Dense -> Fully connected layer; combines the features to produce the label prediction
from tensorflow.keras.optimizers import Adam
# Adam will be used as the optimizer
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Allows for data augmentation tasks
import numpy as np
import matplotlib.pyplot as plt

print("\n - B) Imported necessary libraries\n")


In [None]:
# Define the paths to the folders containing training and validation images
import os
import os.path as op

# Base directory with 'train' and 'test' folders containing images
# for training and testing the model.
# Note: this overrides the base_dir set in section A; adjust it to your own path
base_dir = '/content/datasets/Data'
train_dir = op.join(base_dir, 'train')
test_dir = op.join(base_dir, 'test')

# Within each set, there are two subdirectories with chest X-rays:
# one for COVID and one for Normal cases
train_covid_dir = op.join(train_dir, 'COVID19')
train_normal_dir = op.join(train_dir, 'NORMAL')
test_covid_dir = op.join(test_dir, 'COVID19')
test_normal_dir = op.join(test_dir, 'NORMAL')


In [None]:
# Preview the images
# Training set
train_covid_names = sorted(os.listdir(train_covid_dir))
print("\n - First 10 training images (COVID):", train_covid_names[0:10])
train_normal_names = sorted(os.listdir(train_normal_dir))
print("\n - First 10 training images (NORMAL):", train_normal_names[0:10])

# Test set
test_covid_names = sorted(os.listdir(test_covid_dir))
print("\n - First 10 test images (COVID):", test_covid_names[0:10])
test_normal_names = sorted(os.listdir(test_normal_dir))
print("\n - First 10 test images (NORMAL):", test_normal_names[0:10])


In [None]:
# How many images are in our datasets?
print(" => Images in the training set:", len(train_covid_names) + len(train_normal_names))
print(" => Images in the test set:", len(test_covid_names) + len(test_normal_names))

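The totals printed in the previous cell do not show whether the two classes are balanced. The sketch below counts images per class folder; the helper name `count_images_per_class` and the list of accepted extensions are assumptions (the dataset is described as containing jpg files).

```python
from pathlib import Path

def count_images_per_class(split_dir, extensions=('.jpg', '.jpeg', '.png')):
    """Map each class subfolder of split_dir to the number of image files it contains."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in extensions
            )
    return counts

# Example calls (require the dataset to be in place):
# print(count_images_per_class(train_dir))
# print(count_images_per_class(test_dir))
```

A strong imbalance between COVID19 and NORMAL would make plain accuracy harder to interpret later on.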

# C) Viewing the Images

In [None]:
# Let's visualize some images from the dataset on a 4x4 grid
import matplotlib.gridspec as gridspec
import matplotlib.image as mpimg

fig = plt.figure(1, figsize=(12, 12))
gs = gridspec.GridSpec(4, 4, figure=fig)
covid_pics = [op.join(train_covid_dir, filename) for filename in train_covid_names[0:8]]
normal_pics = [op.join(train_normal_dir, filename) for filename in train_normal_names[0:8]]
merger_pics = covid_pics + normal_pics
for i, pic_path in enumerate(merger_pics):
  pic_name = op.basename(merger_pics[i])
  ax = fig.add_subplot(gs[i])
  pic_data = mpimg.imread(pic_path)
  ax.imshow(pic_data, cmap='gray')
  ax.set_title(pic_name)
plt.show()


# D) Pre-processing and Data Augmentation

In [None]:
# Generate training, evaluation, and validation batches
dgen_train = ImageDataGenerator(rescale=1./255,
                                validation_split=0.2,
                                zoom_range=0.2,
                                horizontal_flip=True)
dgen_test = ImageDataGenerator(rescale=1./255)
dgen_val = ImageDataGenerator(rescale=1./255)

# Apply rescaling for normalization (0-1 range) and data augmentation on the training set
# zoom_range -> maximum percentage of zoom applied to the image
# horizontal_flip -> apply horizontal flipping to the image

# Create batch generators for training (80%) and validation (20%)
train_generator = dgen_train.flow_from_directory(train_dir,
                                                 target_size=(150, 150),
                                                 subset='training',
                                                 batch_size=32,
                                                 class_mode='binary')
# target_size -> resize images to 150x150 pixels
# subset -> selects the 'training' or 'validation' part defined by validation_split
# batch_size -> number of images per batch
# class_mode -> 'binary' for binary classification (COVID / Normal), 'categorical' for multi-class labels

val_generator = dgen_train.flow_from_directory(train_dir,
                                               target_size=(150, 150),
                                               subset='validation',
                                               batch_size=32,
                                               class_mode='binary')

# The test set uses dgen_test: rescaling only, no augmentation
test_generator = dgen_test.flow_from_directory(test_dir,
                                               target_size=(150, 150),
                                               batch_size=32,
                                               class_mode='binary')

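To make the effect of `rescale` and `horizontal_flip` concrete, here is a small NumPy sketch on a made-up 2x3 single-channel "image". It is an illustration only: `ImageDataGenerator` applies the flip at random to some images, whereas this example applies it unconditionally.

```python
import numpy as np

# A tiny fake image: 2 rows x 3 columns x 1 channel, 8-bit intensities
image = np.array([[[0], [128], [255]],
                  [[64], [32], [16]]], dtype=np.uint8)

# rescale=1./255 multiplies every pixel by 1/255, mapping [0, 255] to [0, 1]
rescaled = image.astype(np.float32) * (1.0 / 255)

# A horizontal flip mirrors the image left to right (reverses the width axis)
flipped = rescaled[:, ::-1, :]

print(rescaled[..., 0])
print(flipped[..., 0])
```

The flipped array contains the same pixel values with the column order reversed, so the label of the image is unchanged.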

In [None]:
# The two classes of our problem are COVID and Normal; let's check how they map to numeric labels:
train_generator.class_indices

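`flow_from_directory` assigns class indices following the alphanumeric order of the class folder names. The sketch below reproduces that rule in plain Python; the helper name `class_indices_from_folders` is hypothetical and is not part of Keras.

```python
def class_indices_from_folders(folder_names):
    """Mimic the alphanumeric class-to-index mapping used by flow_from_directory."""
    return {name: index for index, name in enumerate(sorted(folder_names))}

mapping = class_indices_from_folders(['NORMAL', 'COVID19'])
print(mapping)  # {'COVID19': 0, 'NORMAL': 1}
```

With this mapping, label 0 means COVID19 and label 1 means NORMAL, which matters when interpreting the output of the network.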
In [None]:
# Let's check the shape of each input image
train_generator.image_shape
# The image size is 150x150, and the third dimension (3) indicates that the images are loaded
# in RGB format, where the color of each pixel is a combination of red, green, and blue.


# E) Construction of the Convolutional Neural Network

In [None]:
# At this point, we can define our convolutional neural network (CNN) model
# that will learn from the batches of images we prepared earlier.
# We will build the model by adding layers to an instance of the Sequential class.

model = Sequential()

# The first layer used to extract features from the image is a convolutional layer
# that slides small square filters (here 5x5 pixels) over the input image.
# We use 32 filters, so the layer outputs 32 feature maps.
model.add(Conv2D(filters=32, kernel_size=(5, 5), padding='SAME', activation='relu', input_shape=(150, 150, 3))) 
# Add the Pooling layer
model.add(MaxPooling2D(pool_size=(2, 2)))
# Add a Dropout layer to reduce overfitting
model.add(Dropout(0.5))

# Add a second convolutional layer
model.add(Conv2D(filters=64, kernel_size=(5, 5), padding='SAME', activation='relu'))
# Add subsequent Pooling and Dropout layers
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))

# Now, add the Flatten layer to transform the tensor into a 1D vector
model.add(Flatten())

# Add a densely connected (Dense) layer specifying the number of nodes and the activation function
model.add(Dense(256, activation='relu'))

# Add another Dropout layer, which randomly deactivates half of the units during training
model.add(Dropout(0.5))

# Finally, connect the nodes to create the output with a single node.
# Since this is a binary classification problem, we use the sigmoid activation function,
# which maps the output to a value between 0 and 1.
model.add(Dense(1, activation='sigmoid')) 

# Model summary (model.summary() prints the table itself and returns None)
print("Model Summary:")
model.summary()
# In this summary, the first dimension of each output shape is None because it refers
# to the batch size dimension, which can take any value for flexibility.

# With padding='SAME', each convolutional layer keeps the spatial size of its input
# (150x150 for the first one) and outputs one channel per filter (32). It is the
# 2x2 max pooling layer that follows which halves the spatial size (to 75x75).

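The output shapes and parameter counts reported by `model.summary()` can be reproduced by hand. The sketch below does the arithmetic for the layers defined in this section; the helper names `conv_same`, `max_pool`, and `dense` are hypothetical, and default strides are assumed (1 for the convolutions, equal to the pool size for pooling).

```python
def conv_same(shape, filters, kernel):
    """Output shape and parameter count of a stride-1 Conv2D with padding='SAME'."""
    height, width, channels = shape
    params = kernel * kernel * channels * filters + filters  # weights + biases
    return (height, width, filters), params

def max_pool(shape, pool=2):
    """Output shape of MaxPooling2D with pool_size=(pool, pool) and default strides."""
    height, width, channels = shape
    return (height // pool, width // pool, channels)

def dense(units_in, units_out):
    """Parameter count of a Dense layer: weights + biases."""
    return units_in * units_out + units_out

shape = (150, 150, 3)
shape, p1 = conv_same(shape, 32, 5)    # (150, 150, 32)
shape = max_pool(shape)                # (75, 75, 32)
shape, p2 = conv_same(shape, 64, 5)    # (75, 75, 64)
shape = max_pool(shape)                # (37, 37, 64)
flat = shape[0] * shape[1] * shape[2]  # 87616 values after Flatten
p3 = dense(flat, 256)
p4 = dense(256, 1)
total = p1 + p2 + p3 + p4
print(shape, flat, total)
```

Almost all of the parameters sit in the first Dense layer, because it connects every one of the flattened values to each of its 256 units.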

# F) Compile and Train the Model

In [None]:
# To compile our CNN model we need to:
# 1) Define the optimization method (Adam)
# 2) Set the learning rate value
# 3) Choose the loss function: binary cross-entropy is a good choice for binary classification tasks
# 4) Set the evaluation metric: we will use accuracy
model.compile(optimizer=Adam(learning_rate=0.001), loss='binary_crossentropy', metrics=['accuracy'])

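To see what the binary cross-entropy loss measures, the sketch below computes it in plain Python for a handful of made-up predictions. The function name and the clipping constant `eps` are illustrative choices, not the Keras implementation.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

good = binary_cross_entropy([1, 0], [0.9, 0.1])  # confident and correct: small loss
bad = binary_cross_entropy([1, 0], [0.1, 0.9])   # confident and wrong: large loss
print(good, bad)
```

The loss grows quickly when the model assigns a high probability to the wrong class, which is what drives training toward well-calibrated outputs.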
In [None]:
# Once the model is compiled, we can start the training process
history = model.fit(train_generator,
                    epochs=30,
                    validation_data=val_generator)
# The history object records the progress during training, capturing the loss function value
# and the evaluation metric at the end of each epoch

# G) Performance Evaluation

In [None]:
# Let's confirm that the loss and accuracy values have been stored in history
history.history.keys()

In [None]:
# To evaluate the model's performance, we plot the values of the metrics of interest as a function of the
# training epochs
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.legend(['loss (training)', 'loss (validation)'])
plt.title("Evolution of the loss function value")
plt.xlabel('epoch')

In [None]:
# A look at the evolution of accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.legend(['Accuracy (training)', 'Accuracy (validation)'])
plt.title("Evolution of accuracy value")
plt.xlabel('epoch')

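Besides inspecting the curves visually, the epoch with the lowest validation loss can be read directly from the history dictionary. The sketch below uses a hypothetical helper `best_epoch` and made-up numbers, not results from this notebook.

```python
def best_epoch(history_dict, metric='val_loss'):
    """Return (1-based epoch, value) where the given metric is lowest."""
    values = history_dict[metric]
    index = min(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]

# Made-up numbers for illustration only
example_history = {'loss': [0.60, 0.40, 0.30, 0.20],
                   'val_loss': [0.55, 0.42, 0.45, 0.50]}
print(best_epoch(example_history))  # (2, 0.42)
```

A training loss that keeps falling while the validation loss rises after some epoch, as in this toy example, is a typical sign of overfitting. With a trained model, the call would be `best_epoch(history.history)`.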
In [None]:
# Finally, we evaluate the model on the test set
test_loss, test_accuracy = model.evaluate(test_generator)
# We simply need to evaluate the model we just trained on the dataset that we had
# reserved for testing
print("\n=> Results on the test set:")
print(" - Loss: {:.2f}, Accuracy: {:.2f} ".format(test_loss, test_accuracy))

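Accuracy alone does not distinguish between missed COVID-19 cases and false alarms. As a complement, the sketch below computes sensitivity and specificity from lists of true and predicted labels. The helper name is hypothetical, the labels are made up, and label 0 is assumed to be the COVID19 class (alphanumeric folder order).

```python
def sensitivity_specificity(y_true, y_pred, positive=0):
    """Compute sensitivity and specificity for binary labels.

    positive is the label treated as the disease class; 0 is assumed here
    because COVID19 sorts before NORMAL in the class index mapping.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else float('nan')
    specificity = tn / (tn + fp) if (tn + fp) else float('nan')
    return sensitivity, specificity

# Made-up labels for illustration only
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]
print(sensitivity_specificity(y_true, y_pred))
```

Sensitivity is the fraction of COVID-19 cases that are detected, and specificity is the fraction of normal cases that are correctly identified as normal.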
# H) Results (Predictions) on Unseen Data

In [None]:
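# In this last step, we run the trained model on chest X-ray images uploaded by the user
# to obtain a prediction for each patient: COVID-19 infection or healthy
from google.colab import files
from tensorflow.keras.preprocessing import image

uploads = files.upload()  # Uploaded files are saved to the working directory (/content)
for filename in uploads.keys():
  img_path = '/content/' + filename
  img = image.load_img(img_path, target_size=(150, 150))
  data = image.img_to_array(img) / 255.0  # Same 0-1 rescaling used during training
  data = np.expand_dims(data, axis=0)     # Add the batch dimension: (1, 150, 150, 3)
  probability = float(model.predict(data)[0][0])  # Sigmoid output: probability of class 1
  print("\nX-ray Image:", filename)

  # Class indices follow folder name order (COVID19 -> 0, NORMAL -> 1; see
  # train_generator.class_indices), so outputs below 0.5 are read as COVID-19
  if probability < 0.5:
    print(" => COVID-19 Detected")
  else:
    print(" => Normal Status")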
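
The sigmoid output of the network is a value between 0 and 1 that can be read as the probability of the class with index 1. The sketch below turns such a value into a class name; the helper `label_from_probability`, the 0.5 threshold, and the example probabilities are illustrative assumptions.

```python
def label_from_probability(probability, class_indices, threshold=0.5):
    """Translate a sigmoid output into a class name.

    The sigmoid output is read as the probability of the class with index 1.
    """
    index_to_class = {index: name for name, index in class_indices.items()}
    predicted_index = 1 if probability >= threshold else 0
    return index_to_class[predicted_index]

class_indices = {'COVID19': 0, 'NORMAL': 1}  # assumed mapping (alphanumeric folder order)
print(label_from_probability(0.08, class_indices))  # COVID19
print(label_from_probability(0.93, class_indices))  # NORMAL
```

Passing `train_generator.class_indices` instead of the hard-coded dictionary keeps the labels consistent with the mapping used during training.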