Visit my blog for a more in-depth tutorial on how to create a machine learning/AI model that predicts pneumonia from a lung X-ray image.
- Kaggle CoronaHack
- GitHub COVID Dataset
- Google Drive (contains the combined datasets, pickles, and a CSV with image paths/labels)
All of the train, test, and pickle files can be downloaded from my Google Drive.
After splitting the train and test set, the class frequencies were as follows:
The model began overfitting at about 30 epochs and had F1, AUC, and ROC scores of 0.935, 0.972, and 0.994, respectively. Code
To test the model on outside data, I randomly gathered 17 images from Google and used a probability threshold of 0.65. On these images, the model had a 100% true positive rate and a 56% true negative rate. If you download the code in my repo, there is a folder where you can try a prediction yourself. All you have to do is download an image, name the file normal or non-normal so you remember its label, then run the code.
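For readers who want to see how a probability threshold turns into true positive and true negative rates, here is a minimal sketch. It is not the code from my repo: the function names, the example probabilities, and the labels are made up for illustration, and it assumes that pneumonia ("non-normal") is the positive class, that the model outputs the probability of that class, and that a probability equal to the threshold counts as positive.

```python
def classify(probabilities, threshold=0.65):
    """Label a case positive (1) when its probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def true_positive_and_negative_rates(y_true, y_pred):
    """Return (TPR, TNR) for binary labels where 1 = positive, 0 = negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Guard against division by zero when one class is absent.
    tpr = tp / positives if positives else float("nan")
    tnr = tn / negatives if negatives else float("nan")
    return tpr, tnr


# Hypothetical model outputs and labels, NOT the 17 images described above.
example_probs = [0.91, 0.70, 0.66, 0.40, 0.80, 0.10]
example_labels = [1, 1, 1, 0, 0, 0]

example_preds = classify(example_probs)
tpr, tnr = true_positive_and_negative_rates(example_labels, example_preds)
print(f"TPR = {tpr:.2f}, TNR = {tnr:.2f}")
```

Raising the threshold generally trades true positives for true negatives, so the value is worth tuning on a validation set rather than on the outside images themselves.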
- Classify the type of Pneumonia (Viral, Fungal, etc.)
- Use grey-scale (1D array)
- Apply SMOTE to address the class imbalance
- Use larger images with dimensions of (96, 96, 3) or (204, 204, 3). I initially tried larger dimensions, but my computer did not have enough computing power. Larger images can make detection more accurate because they contain more detail than smaller ones.
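To make the SMOTE idea from the list above a bit more concrete, here is a rough sketch of its core step: a synthetic minority sample is created by interpolating between an existing minority sample and one of its nearest minority neighbours. This is a simplified illustration and not a full implementation. The function name `smote_sketch`, its parameters, and the toy data are hypothetical, and for images each sample would presumably be a flattened pixel array. In a real project, a maintained implementation such as the `SMOTE` class in imbalanced-learn is likely the better choice.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (needs >= 2 samples)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [s for s in minority if s is not base]
        others.sort(key=lambda s: math.dist(base, s))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy 2-D minority class standing in for flattened image features.
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(toy_minority, n_new=3)
print(new_points)
```

Because each new point lies on a line segment between two real minority samples, the synthetic data stays inside the region the minority class already occupies.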