frankfletcher/Pneumonia-XRay-Differentiation-from-Kaggle-Dataset

Pneumonia-XRay-Differentiation-from-Kaggle-Dataset


DATASET

The dataset for this notebook is available here: https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia

The data contains chest X-ray images with three labels:

  1. NORMAL
  2. BACTERIAL (pneumonia)
  3. VIRAL (pneumonia)

IMPORTANT NOTE

Nothing in this project has clinical significance without clinical trials. These models should never be used diagnostically.


TASK

The original task was to differentiate between Normal and Pneumonia. This turned out to be quite simple to accomplish.
I chose the harder task of distinguishing Bacterial pneumonia from Viral pneumonia.


CHALLENGES

Several challenges presented themselves. The first was the lack of data. The training set I used contained the following number of images per category:

    len(bac_fnames)   : 2772
    len(viral_fnames) : 1493

The delta was 1279 images, or slightly less than half the larger set (about 46%). I used oversampling to balance the dataset.
(See "A systematic study of the class imbalance problem in convolutional neural networks" by Buda et al. (arXiv:1710.05381v2) for why oversampling is often the best method for correcting imbalance issues.)
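The oversampling step can be illustrated with a minimal sketch. It assumes plain random oversampling (duplicating randomly chosen minority-class files until both classes are the same size), which may differ from what the notebook actually does. The `oversample` helper and the placeholder file names are hypothetical; only the list names and counts come from the figures above.

```python
import random


def oversample(minority, target_size, seed=0):
    """Pad `minority` with randomly chosen duplicates until it has `target_size` items."""
    if not minority:
        raise ValueError("cannot oversample an empty list")
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    extra = max(0, target_size - len(minority))
    # Sampling with replacement: the same image may be duplicated more than once.
    return list(minority) + rng.choices(minority, k=extra)


# Placeholder file names standing in for the real training lists (assumption).
bac_fnames = [f"bacteria_{i}.jpeg" for i in range(2772)]
viral_fnames = [f"virus_{i}.jpeg" for i in range(1493)]

delta = len(bac_fnames) - len(viral_fnames)
viral_balanced = oversample(viral_fnames, len(bac_fnames))

print(delta, len(viral_balanced))  # prints "1279 2772"
```

Oversampling should be applied to the training set only. Duplicating images before the train/validation split would leak copies of the same image into both sets.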

The second challenge was overfitting. I used a ResNet-50 architecture, which tended to fit the training set much better than the validation set.
To overcome this, I applied the standard set of augmentation transforms provided by FastAI, with the exception of flipping (because the left and right lungs need to remain distinguishable).
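As a rough sketch of the augmentation setup described above: the README does not say which FastAI version was used, so this assumes fastai v2, where the standard transforms come from `aug_transforms` (fastai v1 exposes the same `do_flip` argument on `get_transforms`). The names `AUG_KWARGS` and `build_batch_tfms` are hypothetical.

```python
# Only do_flip departs from the library defaults; the other standard
# transforms (rotation, zoom, lighting, warp) are left as they are.
AUG_KWARGS = {"do_flip": False}


def build_batch_tfms():
    """Return the standard fastai augmentations with flipping disabled."""
    # Imported lazily so this file can be read without fastai installed.
    # Assumption: fastai v2. In fastai v1 the equivalent call is
    # get_transforms(do_flip=False) from fastai.vision.
    from fastai.vision.all import aug_transforms

    return aug_transforms(**AUG_KWARGS)
```

The returned list would typically be passed as `batch_tfms` when building the data loaders.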


CURRENT RESULTS

At this point, I am only able to reach about 83% accuracy.


FUTURE PROJECTS

Recombine the NORMAL category with the two pneumonia categories.
