# Exercise - Prediction of medical conditions

In today's exercise we will deal with an illness you have probably heard of: Covid-19. We will try to diagnose a patient's condition by classifying X-ray images as pneumonia, Covid-19, or no illness. There is an archive called **Covid19-dataset.zip**, which should be **unpacked in the same folder as this exercise** for the import to work without problems. The unpacked folder should be called **Covid19-dataset** and contain a **train** and a **test** subfolder.

In [1]:
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.preprocessing.image import ImageDataGenerator

from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import os



### Covid19 X-ray image dataset

First we will handle the import of the data. As we haven't dealt with this so far, I will provide you with a solution using the TensorFlow ImageDataGenerator, which supplies a constant stream of images as we need them. Some of the data handling differs when using this generator; this will be pointed out where it matters. The generator also gives us some handy tools to generate data from data: we can manipulate the training images by zooming, rotating, and shifting them horizontally and vertically, which should help the model generalize and work against overfitting.

In [2]:
TRAIN_PATH = os.path.abspath("Covid19-dataset/train")
TEST_PATH = os.path.abspath("Covid19-dataset/test")
BATCH_SIZE = 8

training_data_generator = ImageDataGenerator(
    rescale=1.0/256, 
    zoom_range=0.2, 
    rotation_range=15, 
    width_shift_range=0.05, 
    height_shift_range=0.05
)
test_data_generator = ImageDataGenerator(
    rescale=1.0/256
)

training_iterator = training_data_generator.flow_from_directory(
    TRAIN_PATH, 
    class_mode="categorical",
    color_mode="grayscale", 
    target_size=(256,256), 
    batch_size=BATCH_SIZE,
    shuffle=False)
test_iterator = test_data_generator.flow_from_directory(
    TEST_PATH, 
    class_mode="categorical", 
    color_mode="grayscale", 
    target_size=(256,256),
    batch_size=BATCH_SIZE,
    shuffle=False)


Here are some example pictures:

In [None]:
# Show one example image per class, side by side
fig, axs = plt.subplots(1, 3)
plt.subplots_adjust(top=1.5, wspace=0.3)

# (class folder, image number) pairs to display
examples = [("Normal", 1), ("Pneumonia", 0), ("Covid", 2)]
for ax, (condition, num) in zip(axs, examples):
    img_path = os.path.abspath(f"Covid19-dataset/train/{condition}/{num}.png")
    img = mpimg.imread(img_path)
    ax.imshow(img, cmap="gray")
    ax.set_title(condition)
plt.show()

### Creating the model

The images are scaled to 256x256 pixels and have one color channel (grayscale). The first layer of our Sequential model will therefore be an InputLayer with an input_shape of (256,256,1). The following layers for the convolutional operations should be:
- a Conv2D-layer with 3 filters, a 3x3 filter-size, a stride of 1 and a relu-activation-function
- a MaxPooling2D-layer with a pool-size of 3x3 and a stride of 3
- a Conv2D-layer with 3 filters, a 3x3 filter-size, a stride of 1 and a relu-activation-function
- a MaxPooling2D-layer with a pool-size of 3x3 and a stride of 3
- a Flatten-layer to switch to a regular fully-connected network

The fully-connected part should have:
- a Dense-layer with 50 neurons and a relu-activation
- a Dense-layer with 20 neurons and a relu-activation
- a Dense-layer with an appropriate number of neurons and a suitable activation-function for our 3-class classification problem

Print a summary of the model.
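
As a cross-check for the summary, the output sizes of the convolution and pooling layers can be worked out by hand. The sketch below assumes the Keras default `padding="valid"` for both layer types (the exercise does not specify a padding) and uses a hypothetical helper called `output_size`. It only does the arithmetic and is not the model definition itself.

```python
def output_size(size, window, stride):
    """Output length along one axis for a sliding window without padding."""
    return (size - window) // stride + 1


size = 256  # input height and width in pixels
sizes = []
# (window, stride) for Conv2D, MaxPooling2D, Conv2D, MaxPooling2D
for window, stride in [(3, 1), (3, 3), (3, 1), (3, 3)]:
    size = output_size(size, window, stride)
    sizes.append(size)

n_filters = 3  # filters in the last Conv2D layer
flattened = size * size * n_filters  # length of the vector after Flatten

print("Side length after each layer:", sizes)
print("Inputs to the first Dense layer:", flattened)
```

If the shapes in your own summary differ from these numbers, the padding or stride settings are a likely place to look first.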


Create a Sequential model called `model` with the specified layers to pass the test cell. The test checks the type of every layer, but only the settings of the last layer (number of neurons and activation) are checked, as this is the most important one and responsible for results that make sense.



In [None]:
model = None

In [None]:
assert isinstance(model, Sequential), "The variable model should be assigned to a Sequential model"
assert len(model.layers) == 8, "The model should have 8 layers without the InputLayer"

# Expected layer types in order, using the public Keras classes imported above
# instead of version-dependent internal module paths
target_layers = [
    (layers.Conv2D, "Conv2D"),
    (layers.MaxPooling2D, "MaxPooling2D"),
    (layers.Conv2D, "Conv2D"),
    (layers.MaxPooling2D, "MaxPooling2D"),
    (layers.Flatten, "Flatten"),
    (layers.Dense, "Dense"),
    (layers.Dense, "Dense"),
    (layers.Dense, "Dense"),
]

for i, (layer, (layer_type, layer_name)) in enumerate(zip(model.layers, target_layers)):
    assert isinstance(layer, layer_type), f"Layer {i} should be a {layer_name} layer"

assert model.layers[-1].get_config()["units"] == 3, "The final number of neurons should be 3, as we have 3 classes (Normal, Pneumonia, Covid)."
assert model.layers[-1].get_config()["activation"] == "softmax", "The final activation function should be the softmax function."


Compile the model with the Adam optimizer with a learning_rate of 0.0005 and the CategoricalCrossentropy loss, and keep track of the accuracy metric to pass the test cell.
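
To get a feeling for what the categorical cross-entropy measures, here is a minimal pure-Python sketch of a softmax output followed by the loss for a one-hot target. The function names and the example numbers are made up for illustration; Keras computes this for whole batches and guards against taking the logarithm of zero, which this sketch does not.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_target, probabilities):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probabilities))


# Hypothetical raw outputs of the last layer for one image and 3 classes
probs = softmax([2.0, 0.5, -1.0])

loss_if_first_class_is_true = categorical_crossentropy([1, 0, 0], probs)
loss_if_third_class_is_true = categorical_crossentropy([0, 0, 1], probs)

print("Probabilities:", probs)
print("Loss if class 0 is correct:", loss_if_first_class_is_true)
print("Loss if class 2 is correct:", loss_if_third_class_is_true)
```

The loss is small when the model puts a high probability on the true class and grows quickly when that probability is low.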

In [None]:
assert model.optimizer is not None and model.loss is not None, "The model is not compiled yet"
assert isinstance(model.loss, CategoricalCrossentropy) or model.loss == "categorical_crossentropy", "The loss should be the categorical cross-entropy loss"
assert isinstance(model.optimizer, Adam), "The optimizer should be the Adam optimizer"
assert np.isclose(0.0005, model.optimizer.learning_rate.numpy()), "The Adam optimizer should have a learning rate of 0.0005"

### Training the model

Start the training! It should last for 40 epochs. As you supply the training_iterator as training data, you don't have to specify the output data separately. The batch_size is also determined by the iterator, and shuffling the data is not possible. We won't provide any validation data. If you want, set the verbosity to 0 to suppress the information about the progress of the training, or to 2 to print minimal information each epoch.

Make sure to save the history-object to a variable called history and to use 40 epochs to pass the test cell.

In [None]:
history = None

In [None]:
assert len(history.history["loss"]) == 40, "You should use 40 epochs for now"
assert history.history["accuracy"][-1] > 0.75, "The training does not appear to be particularly successful as the accuracy is low"

### Evaluating the model

Plot the loss and the accuracy during the training. How is the convergence?
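
One possible way to set up such a plot is sketched here. The helper `plot_history` is hypothetical, and the numbers in `example_history` are invented placeholders, not real training results; with a trained model you would pass `history.history` instead.

```python
import matplotlib.pyplot as plt


def plot_history(history_dict):
    """Plot loss and accuracy per epoch from a Keras-style history dictionary."""
    epochs = range(1, len(history_dict["loss"]) + 1)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

    ax_loss.plot(epochs, history_dict["loss"])
    ax_loss.set_xlabel("Epoch")
    ax_loss.set_ylabel("Loss")

    ax_acc.plot(epochs, history_dict["accuracy"])
    ax_acc.set_xlabel("Epoch")
    ax_acc.set_ylabel("Accuracy")

    return fig


# Invented placeholder values for illustration only
example_history = {
    "loss": [1.10, 0.80, 0.60, 0.50],
    "accuracy": [0.40, 0.60, 0.70, 0.78],
}
fig = plot_history(example_history)
```

In a notebook the figure is usually displayed at the end of the cell; in a plain script, call `plt.show()` afterwards.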

Determine the loss and the accuracy on the test set with your trained model.

Finally, plot the confusion matrix for the predictions.

Hint 1: You can get the true classes of your test_iterator by accessing test_iterator.classes

Hint 2: Which number belongs to which class is saved in test_iterator.class_indices
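
The two hints can be combined as in the following pure-Python sketch, which turns predicted probabilities into class numbers and counts them against the true classes. All values are hypothetical stand-ins: `class_indices` for test_iterator.class_indices, `true_classes` for test_iterator.classes, and `predicted_probabilities` for the model output on the test set. Check the actual mapping in your own iterator instead of relying on the one shown here.

```python
def confusion_matrix_counts(true_classes, predicted_probabilities, n_classes):
    """Count predictions: rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_class, probs in zip(true_classes, predicted_probabilities):
        # The predicted class is the index of the largest probability (argmax)
        predicted_class = max(range(n_classes), key=lambda k: probs[k])
        matrix[true_class][predicted_class] += 1
    return matrix


# Hypothetical stand-ins, not real data or results
class_indices = {"Covid": 0, "Normal": 1, "Pneumonia": 2}
true_classes = [0, 0, 1, 2, 2]
predicted_probabilities = [
    [0.8, 0.1, 0.1],
    [0.2, 0.3, 0.5],
    [0.1, 0.7, 0.2],
    [0.3, 0.1, 0.6],
    [0.6, 0.2, 0.2],
]

# Invert the mapping so that class numbers can be printed as names
index_to_class = {index: name for name, index in class_indices.items()}

matrix = confusion_matrix_counts(true_classes, predicted_probabilities, n_classes=3)
for index, row in enumerate(matrix):
    print(index_to_class[index], row)
```

The rows-are-true, columns-are-predicted layout matches the convention of the `confusion_matrix` function imported from sklearn at the top. Comparing predictions with test_iterator.classes only works because the test iterator was created with `shuffle=False`, so the prediction order matches the order of the stored classes.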

That's it from our side. We hope you gained some insight into machine learning methods and picked up some Python along the way!