Welcome to this notebook on chest X-ray image classification using the fastai library. This notebook aims to provide a step-by-step guide on how to build and train a deep learning model to classify chest X-ray images into three categories: normal, bacterial pneumonia, and viral pneumonia. Accurate classification of chest X-ray images is crucial for early diagnosis and treatment of various respiratory diseases.

In this notebook, we will use the fastai library, which is built on top of PyTorch, to construct our deep learning model. We will start by exploring the data and then pre-processing it using various data augmentation techniques to improve the performance of our model. We will then construct a deep learning model using a pre-trained convolutional neural network (CNN) and fine-tune it on our dataset. Finally, we will evaluate the performance of our model and visualize the results using a confusion matrix.

This notebook assumes that you have some basic knowledge of deep learning and the Python programming language. However, we will provide explanations and examples for each step to make it accessible to beginners. So, let's get started!

## **Set Up**
Let's install fastai and import everything we need from its vision module. fastai is a popular open-source deep learning library built on top of PyTorch that simplifies and streamlines the process of building and training deep learning models.

In [None]:
!pip install --upgrade fastai
from fastai.vision.all import *

## **Loading the data**
First, we'll point to our Kaggle dataset using a Path object (made available by the fastai import) and define a small labeller function of our own. The labeller function labels each image as normal, bacterial pneumonia, or viral pneumonia based on its file name.

In [None]:
path = Path('/kaggle/input/chest-xray-pneumonia/chest_xray/')
def labeller(file_path):
    if 'virus' in file_path.name:
        return 'viral'
    elif 'bacteria' in file_path.name:
        return 'bacterial'
    else:
        return 'normal'
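
To see how the labelling rule behaves, here is a minimal sketch that applies the same labeller to a few file paths. The paths in sample_files are hypothetical and exist only for illustration; they are not read from the dataset.

```python
from collections import Counter
from pathlib import Path


def labeller(file_path):
    # Same rule as the cell above: the label depends on the file name only.
    if 'virus' in file_path.name:
        return 'viral'
    elif 'bacteria' in file_path.name:
        return 'bacterial'
    else:
        return 'normal'


# Hypothetical file paths, used only to illustrate the rule.
sample_files = [
    Path('train/PNEUMONIA/person1_virus_6.jpeg'),
    Path('train/PNEUMONIA/person2_bacteria_3.jpeg'),
    Path('train/NORMAL/IM-0001-0001.jpeg'),
    Path('test/PNEUMONIA/person3_bacteria_10.jpeg'),
]

# Count how many sample files fall into each class.
label_counts = Counter(labeller(f) for f in sample_files)
print(label_counts)
```

Note that any file whose name contains neither 'virus' nor 'bacteria' is labelled 'normal', so counting labels like this on the real file list is a quick way to spot unexpected files.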


 

## **Creating the DataBlock**
Next, we'll create a DataBlock object to prepare the data for our neural network. It describes how to load and transform the images:

* blocks: a tuple containing the types of data blocks we will be working with, in this case an ImageBlock for our images and a CategoryBlock for our labels.
* get_items: a function that gets the image files from the path defined in the previous cell.
* splitter: a RandomSplitter that randomly splits the data into training and validation sets with a 20% validation set size and a random seed of 42.
* batch_tfms: a list of data augmentations that will be applied to our images during training, in this case including random flips (horizontal and vertical), rotations, and zooms, as well as normalization based on the ImageNet statistics.
* get_y: the labeller function defined in the previous cell that will be used to label the images.
* item_tfms: a resizing transformation that will resize all images to 128x128 pixels.

In [None]:
tfms = aug_transforms(do_flip=True, flip_vert=True, max_rotate=10.0, max_zoom=1.1)

dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter= RandomSplitter(valid_pct=0.2, seed=42),
    batch_tfms=[*tfms, Normalize.from_stats(*imagenet_stats)],
    get_y=labeller,
    item_tfms=Resize(128))

dls = dblock.dataloaders(path)
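
To make the splitter setting concrete, here is a minimal pure-Python sketch of a seeded 80/20 random split. It is not fastai's implementation; the function random_split and the item count of 1000 are assumptions chosen for illustration.

```python
import random


def random_split(n_items, valid_pct=0.2, seed=42):
    # Illustrative helper (not part of fastai): shuffle the indices with a
    # fixed seed, then cut off the first valid_pct share for validation.
    rng = random.Random(seed)
    idxs = list(range(n_items))
    rng.shuffle(idxs)
    cut = int(valid_pct * n_items)
    return idxs[cut:], idxs[:cut]


# Hypothetical dataset size, used only for illustration.
train_idxs, valid_idxs = random_split(1000)
print(len(train_idxs), len(valid_idxs))

# The same seed reproduces the same split.
train_again, valid_again = random_split(1000)
print(valid_idxs == valid_again)
```

Fixing the seed means the same images land in the validation set on every run, which keeps results comparable between experiments.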

## **Visualize and verify**
Now, let's visualize the images and verify that they have been loaded correctly. The one_batch() method returns a single mini-batch of size bs (the batch size), which is 64 by default. The show_batch() method then takes that batch as input and displays a grid of images with their labels.

In [None]:
batch = dls.train.one_batch()
dls.train.show_batch(b=batch)

## **Training the Model**
After setting up our data with the appropriate transformations, we can now proceed with training our model. We will first create a learner with the vision_learner function, specifying the architecture we want to use (resnet34) and the metrics we want to monitor during training (accuracy and error_rate).

Now, we will use the fine_tune method to adapt the pre-trained ResNet34 model to our pneumonia chest X-ray dataset. By default, fine_tune first trains only the newly added classification head for one epoch while the pre-trained body stays frozen, then unfreezes the whole network and trains all layers for the requested number of epochs (10 here). This approach lets us reuse the pre-trained weights, which typically improves performance on a relatively small dataset like ours.

We also pass an early stopping callback to fine_tune. It monitors the validation loss and stops training if the loss has not improved for a set number of consecutive epochs (patience=2 in this case).

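During training, the training loss, validation loss, and metrics are displayed for each epoch, along with the duration of the epoch. Note that the cell below does not save the model; to keep it for future use, export it afterwards (for example with learn.export) to a writable location.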

In [None]:
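# CrossEntropyLossFlat is fastai's cross-entropy wrapper; unlike a plain
# nn.CrossEntropyLoss, it provides the activation and decoding steps that
# fastai's interpretation tools use later in this notebook.
learn = vision_learner(dls, resnet34, metrics=[accuracy, error_rate], loss_func=CrossEntropyLossFlat())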
learn.fine_tune(10, cbs=[EarlyStoppingCallback(patience=2)])
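
The patience rule can be sketched in plain Python as follows. This is an illustration of the idea rather than fastai's code: the function early_stop_epoch and the validation losses are made up for the example.

```python
def early_stop_epoch(val_losses, patience=2, min_delta=0.0):
    # Illustrative helper (not part of fastai): return the index of the epoch
    # at which training would stop, or None if it runs to the end.
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # The loss improved: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            # No improvement: count this epoch against the patience budget.
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses, one per epoch.
losses = [0.60, 0.45, 0.40, 0.42, 0.41, 0.39]
stop_at = early_stop_epoch(losses, patience=2)
print(stop_at)
```

With these made-up numbers, the best loss (0.40) is reached at epoch index 2, the next two epochs fail to beat it, and training stops at index 4, so the final 0.39 is never reached. A larger patience trades longer training for a lower risk of stopping too early.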

## **Evaluating the Model**
Finally, we'll evaluate the performance of our neural network by creating a confusion matrix with the ClassificationInterpretation class.
The matrix summarizes the model's predictions on the validation set: each row is an actual class and each column a predicted class. This lets us see how well the model classifies images from each class and helps identify which classes it confuses with one another.

In [None]:
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
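
To show how to read such a matrix, here is a minimal pure-Python sketch that builds one and derives per-class recall from it. The labels in y_true and y_pred are invented for illustration and are not results from this model; the helper functions are likewise assumptions, not fastai APIs.

```python
def confusion_matrix(y_true, y_pred, classes):
    # Rows are actual classes, columns are predicted classes.
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def per_class_recall(matrix):
    # Recall for a class = correct predictions / all actual items of that class.
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]


classes = ['bacterial', 'normal', 'viral']

# Hypothetical labels, used only for illustration.
y_true = ['normal', 'normal', 'bacterial', 'bacterial', 'bacterial',
          'viral', 'viral', 'viral']
y_pred = ['normal', 'normal', 'bacterial', 'bacterial', 'viral',
          'viral', 'bacterial', 'viral']

matrix = confusion_matrix(y_true, y_pred, classes)
recall = per_class_recall(matrix)
for name, row, r in zip(classes, matrix, recall):
    print(f'{name:>9}: {row}  recall={r:.2f}')
```

In this made-up example the off-diagonal counts sit between the bacterial and viral rows, which is the kind of pattern to look for when deciding where the model needs more data or a different augmentation strategy.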