# Can we diagnose COVID, pneumonia, or normal chest X-rays with computer vision?

I am going to use TensorFlow for this project.  

## What will this notebook cover?
I am going to show you:
* EDA
* Preprocessing
* Model Building
* Model Evaluation
* Model Enhancement
* Plotting our loss curves

## Get the data for your own models!
The data is from Kaggle.  Click [here](https://www.kaggle.com/tawsifurrahman/covid19-radiography-database) to view it!

Are you ready?  Let's go.

# 1 Import the necessary libraries

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import random

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Import layers and the optimizer from tf.keras (not standalone keras) so every Keras object comes from the same package
from tensorflow.keras.layers import Dense, Conv2D, Flatten, MaxPool2D
from tensorflow.keras.optimizers import Adam

import os
from google.colab import drive
```

# 2 Import the data

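> Note: The data is downloaded again every time the Colab runtime restarts. It happens automatically, but it is good to know :). Within a session, the Kaggle CLI skips the download when it finds a more recently modified local copy.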

```python
from google.colab import drive
drive.mount("/content/gdrive/")

# Point the Kaggle CLI at the folder that contains kaggle.json
os.environ["KAGGLE_CONFIG_DIR"] = "/content/"

# Download the dataset zip into /content
!kaggle datasets download -d tawsifurrahman/covid19-radiography-database -p /content

# Extract the zip (-q: quiet, -n: never overwrite files that are already extracted)
!unzip -qn /content/covid19-radiography-database.zip -d /content

# Navigate to the directory the data is in
os.chdir("/content/COVID-19_Radiography_Dataset/")
```

```
Drive already mounted at /content/gdrive/; to attempt to forcibly remount, call drive.mount("/content/gdrive/", force_remount=True).
covid19-radiography-database.zip: Skipping, found more recently modified local copy (use --force to force download)
```


# 3.1 Explore the data

* What type of data are we dealing with?
  - Images
* What are the image shapes?
* Are there other ways for us to learn about the data?
  - Metadata files
* Are there any imbalances?

```python
# Import all of the metadata files to view
covid_metadata = pd.read_excel("COVID.metadata.xlsx")
normal_metadata = pd.read_excel("Normal.metadata.xlsx")
viral_pneumonia_metadata = pd.read_excel("Viral Pneumonia.metadata.xlsx")
```

## What can we learn before we plot?
* We can see the first few rows to confirm our data loaded correctly.
  - `head()`
* We can see the last few rows to get a sense of the size of the data (with the default index, the last index value is the row count minus one).
  - `tail()`
* We can see summary statistics of the numeric columns.
  - `describe()`
* We can see a concise summary of the dataframe: column names, non-null counts, and data types.
  - `info()`
* We can see the column names of the dataframe.
  - `.columns`
* We can see the data type of each column.
  - `.dtypes` (an attribute, not a method)
* We can see if we have missing values.
  - `isna()` (chain `.sum()` to count missing values per column)
* We can see the number of rows and columns.
  - `.shape`
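The checks above can be collected into one helper so every metadata file gets the same treatment. This is a minimal sketch that uses only standard pandas attributes and methods (`shape`, `columns`, `dtypes`, `isna()`); the toy DataFrame and its column names are hypothetical stand-ins, not the real metadata columns.

```python
import pandas as pd

def quick_eda(df):
    """Collect shape, column names, dtypes, and missing-value counts in one dict."""
    return {
        "shape": df.shape,
        "columns": list(df.columns),
        "dtypes": df.dtypes.astype(str).to_dict(),
        # Convert to plain ints so the result prints cleanly
        "missing": {col: int(n) for col, n in df.isna().sum().items()},
    }

# Hypothetical toy frame standing in for one metadata file
toy = pd.DataFrame({
    "file_name": ["COVID-1", "COVID-2", "COVID-3"],
    "format": ["PNG", "PNG", None],
    "size": ["256*256", "256*256", "256*256"],
})
summary = quick_eda(toy)
print(summary["shape"])    # (3, 3)
print(summary["missing"])  # {'file_name': 0, 'format': 1, 'size': 0}
```

On the real data, `quick_eda(covid_metadata)` would run the same checks on the loaded file.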

## Let's get plot happy.

I am a big visual learner.  I learn by seeing and doing.  We are going to use a lot of graphs to become one with the data!

What type of graphs are we making here?
- Correlations (to help see which numeric columns move together)
- Count plots (to help visualize the value counts)
- Distplots (to help see the data distribution and see if we have outliers)
- Violin plots (to help see the data distribution in a different light)


### Correlations

What are correlation plots? A correlation plot is a heatmap of the correlation coefficients between pairs of numeric columns, ranging from -1 (the columns move in opposite directions) to 1 (they move together). It is super helpful to have a feel for the data before we do any preprocessing and modeling experiments.
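A minimal sketch of the numbers behind a correlation heatmap, using pandas' `DataFrame.corr()` (Pearson by default). Text columns such as file names cannot be correlated, so this sketch keeps only numeric columns with `select_dtypes`. The toy DataFrame is hypothetical.

```python
import pandas as pd

def numeric_correlations(df):
    """Pearson correlation matrix of the numeric columns only (text columns are dropped)."""
    return df.select_dtypes(include="number").corr()

# Hypothetical toy data: b rises with a, c falls as a rises, label is text
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [4.0, 3.0, 2.0, 1.0],
    "label": ["x", "y", "x", "y"],
})
corr = numeric_correlations(toy)
print(corr.round(2))
```

With seaborn installed, `sns.heatmap(corr, annot=True, vmin=-1, vmax=1)` turns this matrix into the heatmap itself.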

### Countplots

What are countplots? A countplot is a bar chart of how many times each category appears. It is the visual version of the pandas `value_counts()` method. What if you do not know about `value_counts()`? Do not worry: in plain terms, it shows whether some classes have far more examples than others (an imbalance).
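For image folders, the countplot bars are just per-class file counts. A minimal sketch, assuming a hypothetical layout where each class has its own folder of image files (`root_dir/<class_name>/`): it counts files per folder with `os.listdir` and returns them largest-first, the same order `value_counts()` uses. The folder names are assumptions; if your copy stores images in a subfolder such as `<class_name>/images`, pass that path instead.

```python
import os
import pandas as pd

def class_counts(root_dir, class_names):
    """Count the files in each class folder (hypothetical layout root_dir/<class_name>/)."""
    counts = {}
    for name in class_names:
        folder = os.path.join(root_dir, name)
        # Count regular files only, ignoring any subfolders
        counts[name] = sum(
            os.path.isfile(os.path.join(folder, f)) for f in os.listdir(folder)
        )
    # Largest class first, matching value_counts() ordering
    return pd.Series(counts, name="count").sort_values(ascending=False)

def imbalance_ratio(counts):
    """Largest class size divided by the smallest; 1.0 means perfectly balanced."""
    return counts.max() / counts.min()
```

A ratio well above 1 means one class dominates; `class_counts(...).plot(kind="bar")` draws the countplot bars.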

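### Distplots

What are distplots? A distplot shows a histogram with a kernel density estimate (KDE) curve drawn over it. Together they show the distribution of the data, which makes it easier to see how many outliers we may be dealing with. (`seaborn.distplot` is deprecated in recent seaborn releases; `seaborn.histplot(..., kde=True)` draws the same view.)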
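The histogram bars count values per bin, and one common way to flag outliers on top of that is Tukey's fences: anything below Q1 - 1.5·IQR or above Q3 + 1.5·IQR. This rule is a swapped-in technique for illustration, not something the notebook defines, and the numbers below are hypothetical (they could be pixel intensities or any numeric column).

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

# Hypothetical data with one extreme value
data = np.array([1, 2, 3, 4, 5, 100])

# The bar heights a histogram would draw with 5 equal-width bins
counts, edges = np.histogram(data, bins=5)
print(counts)              # [5 0 0 0 1]
print(iqr_outliers(data))  # [100.]
```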

### Violin plots

What are violin plots? Like distplots, they show the distribution of the data, but instead of histogram bars they draw the density curve mirrored on both sides of a center line, often with a small box plot inside. It provides a new visual. As I mentioned before, I am a visual learner, so having multiple ways to see the same thing helps.
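The outline of each violin is a kernel density estimate. A minimal sketch of a Gaussian KDE in NumPy, with the bandwidth picked by a normal-reference rule of thumb (1.06 · std · n^(-1/5)); that rule and the toy values are assumptions for illustration, and seaborn's own defaults may differ. A violin plot draws `+density` and `-density` around a center line.

```python
import numpy as np

def gaussian_kde(values, grid, bandwidth=None):
    """Gaussian kernel density estimate of `values`, evaluated at every point of `grid`."""
    values = np.asarray(values, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = values.size
    if bandwidth is None:
        # Normal-reference rule of thumb: 1.06 * std * n^(-1/5)
        bandwidth = 1.06 * values.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per data point, averaged at every grid position
    z = (grid[:, None] - values[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))

# Hypothetical values for one class
values = [-1.0, 0.0, 1.0]
grid = np.linspace(-10, 10, 2001)
density = gaussian_kde(values, grid)
```

In practice, `sns.violinplot(x=..., y=...)` computes and mirrors this curve for each category.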

# 3.2 Explore random images

```python
def view_random_image(target_dir, target_class):
  """
  This is from a course I am taking on Zero to Mastery.
  Here is a link: https://academy.zerotomastery.io/p/learn-tensorflow
  """
  # Set up the target directory (os.path.join works with or without a trailing slash)
  target_folder = os.path.join(target_dir, target_class)

  # Get a random image path
  random_image = random.sample(os.listdir(target_folder), 1)

  # Read in the image and plot it using matplotlib
  img = mpimg.imread(os.path.join(target_folder, random_image[0]))
  plt.imshow(img, cmap="gray")  # cmap only affects single-channel (grayscale) images
  plt.title(target_class)
  plt.axis("off")

  print(f"Image shape: {img.shape}")

  return img
```
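`view_random_image` prints the shape of one image at a time. To answer "What are the image shapes?" for a whole folder, here is a sketch that reads only each file's PNG header instead of decoding pixels. It assumes the images are PNG files; the helper names are hypothetical. Note that it returns `(width, height)`, while NumPy's `img.shape` is `(height, width[, channels])`.

```python
import os
import struct
from collections import Counter

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_size(path):
    """Return (width, height) from a PNG file's IHDR header without decoding pixels."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Bytes 0-7: signature, 8-11: chunk length, 12-15: b"IHDR", 16-23: width, height (big-endian)
    if header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} does not look like a PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height

def shape_tally(folder):
    """Count how many PNG images in `folder` have each (width, height)."""
    return Counter(
        png_size(os.path.join(folder, name))
        for name in os.listdir(folder)
        if name.lower().endswith(".png")
    )
```

A single key in the returned `Counter` means every image in the folder has the same size.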

# Preprocessing

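Let's make our ImageDataGenerators with no augmentation for now. The only preprocessing is rescaling pixel values from 0-255 to 0-1.

```python
# No augmentation yet: only rescale pixel values from [0, 255] to [0, 1]
datagen = ImageDataGenerator(rescale=1/255.)
```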
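If your copy of the dataset has no separate train and validation folders, one option is to split the file paths yourself so each class keeps the same proportion in both sets (a stratified split). A minimal sketch using only the standard library; the input dictionary, the 20% validation fraction, and the seed are assumptions.

```python
import random

def stratified_split(paths_by_class, val_fraction=0.2, seed=42):
    """Split each class's file paths into train/validation lists, keeping class proportions.

    paths_by_class: dict mapping a class name to a list of file paths (hypothetical input).
    Returns two lists of (path, class_name) pairs.
    """
    rng = random.Random(seed)  # fixed seed -> the same split every run
    train, val = [], []
    for label in sorted(paths_by_class):
        paths = sorted(paths_by_class[label])  # sort first so the shuffle is reproducible
        rng.shuffle(paths)
        n_val = int(len(paths) * val_fraction)
        val.extend((p, label) for p in paths[:n_val])
        train.extend((p, label) for p in paths[n_val:])
    return train, val

# Hypothetical example: 10 COVID and 5 Normal file names
example = {
    "COVID": [f"COVID-{i}.png" for i in range(1, 11)],
    "Normal": [f"Normal-{i}.png" for i in range(1, 6)],
}
train, val = stratified_split(example)
print(len(train), len(val))  # 12 3
```

The resulting pairs can go into a DataFrame with `filename` and `class` columns and be passed to an `ImageDataGenerator`'s `flow_from_dataframe` method. Alternatively, `ImageDataGenerator(validation_split=0.2)` with `subset="training"` / `subset="validation"` in `flow_from_directory` lets Keras make the split.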

# Models!

## Model 1 of 10

* This is the baseline.  Let's build it and then beat it with model 2!
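A minimal sketch of a baseline CNN, built from the same layer types imported at the top (`Conv2D`, `MaxPool2D`, `Flatten`, `Dense`) and the `Adam` optimizer. The filter counts, kernel size, and default `input_shape=(224, 224, 3)` are assumptions, not the notebook's settings: the input shape must match the `target_size` (plus channels) the generators produce, and `num_classes=3` assumes the three classes in the title with one-hot labels (`class_mode="categorical"`).

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def build_baseline(input_shape=(224, 224, 3), num_classes=3):
    """Small CNN baseline (hypothetical architecture): two conv/pool blocks and a softmax head."""
    model = tf.keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(10, 3, activation="relu"),
        layers.MaxPool2D(pool_size=2),
        layers.Conv2D(10, 3, activation="relu"),
        layers.MaxPool2D(pool_size=2),
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        loss="categorical_crossentropy",  # expects one-hot labels
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        metrics=["accuracy"],
    )
    return model

model = build_baseline()
# Sanity check: a dummy batch of 2 images gives 2 rows of 3 class probabilities
probs = model(np.zeros((2, 224, 224, 3), dtype="float32"))
print(probs.shape)
```

Training would then be `model.fit(...)` on the training and validation generators; the returned `History` object's `history` dictionary holds the per-epoch loss values for the loss curves.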

## Model 2 of 10
* How can we improve on the baseline?