<a href="https://colab.research.google.com/github/google/applied-machine-learning-intensive/blob/master/content/05_deep_learning/05_image_classification_project/colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### Copyright 2020 Google LLC.

In [None]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Image Classification Project

In this project we will build an image classification model and use the model to identify if the lungs pictured indicate that the patient has pneumonia. The outcome of the model will be true or false for each image.

The [data is hosted on Kaggle](https://www.kaggle.com/rob717/pneumonia-dataset) and consists of 5,863 x-ray images. Each image is classified as 'pneumonia' or 'normal'.

## Ethical Considerations

We will frame the problem as:

> *A hospital is having issues correctly diagnosing patients with pneumonia. Their current solution is to have two trained technicians examine every patient scan. Unfortunately, there are many times when two technicians are not available, and the scans have to wait for multiple days to be interpreted.*
>
> *They hope to fix this issue by creating a model that can identify if a patient has pneumonia. They will have one technician and the model both examine the scans and make a prediction. If the two agree, then the diagnosis is accepted. If the two disagree, then a second technician is brought in to provide their analysis and break the tie.*

Discuss some of the ethical considerations of building and using this model. 

* Consider potential bias in the data that we have been provided. 
* Should this model err toward precision or accuracy?
* What are the implications of massively over-classifying patients as having pneumonia?
* What are the implications of massively under-classifying patients as having pneumonia?
* Are there any concerns with having only one technician make the initial call?

The questions above are prompts. Feel free to bring in other considerations you might have.
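
To make the precision and accuracy questions above concrete, here is a minimal sketch that computes the usual scores from confusion-matrix counts. The counts are made up for illustration and are not results from this notebook; the point is only to show how over-classification (false positives) and under-classification (false negatives) pull the scores in different directions.

```python
# Hypothetical confusion-matrix counts for a pneumonia classifier
# (made-up numbers, not results from this notebook).
tp = 80   # pneumonia scans correctly flagged
fp = 30   # healthy scans incorrectly flagged (over-classification)
fn = 20   # pneumonia scans missed (under-classification)
tn = 70   # healthy scans correctly cleared

accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of all scans classified correctly
precision = tp / (tp + fp)                   # share of flagged scans that are truly pneumonia
recall = tp / (tp + fn)                      # share of pneumonia scans that were flagged
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Raising `fp` lowers precision while leaving recall unchanged, and raising `fn` lowers recall while leaving precision unchanged, so the two kinds of error can be weighed separately.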

### **Student Solution**

> *The model could be biased toward smokers and older people. The model should err toward accuracy rather than precision, because what matters most is how accurately it detects patients who have pneumonia.
The main concern with having only one technician make the initial call is that it leaves room for error by that technician, due to factors such as age, experience, and bias.*

---

## Modeling

In this section of the lab, you will build, train, test, and validate a model or models. The data is the ["Detecting Pneumonia" dataset](https://www.kaggle.com/rob717/pneumonia-dataset). You will build a binary classifier that determines if an x-ray image has pneumonia or not.

You'll need to:

* Download the dataset
* Perform EDA on the dataset
* Build a model that can classify the data
* Train the model using the training portion of the dataset. (It is already split out.)
* Test at least three different models or model configurations using the testing portion of the dataset. This step can include changing model types, adding and removing layers or nodes from a neural network, or any other parameter tuning that you find potentially useful. Score the model (using accuracy, precision, recall, F1, or some other relevant score(s)) for each configuration.
* After finding the "best" model and parameters, use the validation portion of the dataset to perform one final sanity check by scoring the model once more with the hold-out data.
* If you train a neural network (or other model that you can get epoch-per-epoch performance), graph that performance over each epoch.

Explain your work!

> *Note: You'll likely want to [enable GPU in this lab](https://colab.research.google.com/notebooks/gpu.ipynb) if it is not already enabled.*

If you get to a working solution you're happy with and want another challenge, you'll find pre-trained models on the [landing page of the dataset](https://www.kaggle.com/paultimothymooney/detecting-pneumonia-in-x-ray-images). Try to load one of those and see how it compares to your best model.

Use as many text and code cells as you need to for your solution.

### **Student Solution**

In [None]:
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import numpy as np
from sklearn.linear_model import LogisticRegression

In [None]:
# Download the dataset
! chmod 600 kaggle.json && (ls ~/.kaggle 2>/dev/null || mkdir ~/.kaggle) && cp kaggle.json ~/.kaggle/ && echo 'Done'
! kaggle datasets download paultimothymooney/chest-xray-pneumonia
! unzip chest-xray-pneumonia.zip

# create a data generator
datagen = ImageDataGenerator()

# load and iterate training dataset
train_it = datagen.flow_from_directory('chest_xray/train/', class_mode='binary', 
                                       batch_size=32, shuffle = True, target_size=(256, 256))

# load and iterate validation dataset
# val_it = datagen.flow_from_directory('chest_xray/val/', class_mode='binary', batch_size=64)

# load and iterate test dataset
test_it = datagen.flow_from_directory('chest_xray/test/', class_mode='binary', target_size=(256, 256))

In [None]:
# EDA
images, labels = train_it[0] # Load the first batch once and reuse it
print(images.shape) # A batch of training data
# We can see our data is of size 256x256
print(labels) # That batch's corresponding labels

# Let's look at some samples with and without pneumonia
for i in range(5):
  # Label 1 is pneumonia, label 0 is normal
  print("With Pneumonia: " if labels[i] == 1 else "Without Pneumonia: ")
  plt.imshow(images[i].astype(int))
  plt.show()
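
Beyond looking at individual samples, EDA for a binary classifier usually includes a check of class balance. The sketch below counts labels per class and derives "balanced" class weights. It uses stand-in labels so that it runs on its own; with the iterator above, the real labels would presumably come from `train_it.classes`, and the resulting dictionary could be passed to `fit` through its `class_weight` argument. The 25/75 split is hypothetical.

```python
import numpy as np

# Stand-in labels (0 = normal, 1 = pneumonia); hypothetical 25/75 split.
labels = np.array([0] * 25 + [1] * 75)

# Number of samples in each class
counts = np.bincount(labels, minlength=2)

# "Balanced" weighting: n_samples / (n_classes * count_per_class),
# so the rarer class receives the larger weight.
weights = len(labels) / (len(counts) * counts)
class_weight = {cls: float(w) for cls, w in enumerate(weights)}

print(counts, class_weight)
```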


In [None]:
# Model 1

# This is a fairly large network that follows each convolutional layer with a
# Max pooling layer
model1 = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu',
                           input_shape=(256, 256, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    # A single output unit needs sigmoid; softmax over one unit always returns 1
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model1.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])


# Training
history1 = model1.fit(
    train_it,
    epochs=3,
)

# Scoring 
print("model 1", model1.evaluate(test_it))
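
One detail worth checking in a binary classifier with a single output unit is the final activation. Softmax normalizes across the units of a layer, so applied to one unit it returns 1.0 for every input, while a sigmoid maps each logit to a probability between 0 and 1. The NumPy sketch below illustrates this with hand-written `softmax` and `sigmoid` helpers (stand-ins for illustration, not the Keras implementations).

```python
import numpy as np

def softmax(z):
    # Normalize exponentials across the last axis (the units of a layer).
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    # Map each logit independently to the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Four example logits from a layer with a single output unit: shape (4, 1).
logits = np.array([[-3.0], [-0.5], [0.5], [3.0]])

softmax_out = softmax(logits)   # 1.0 in every row, regardless of the logit
sigmoid_out = sigmoid(logits)   # varies with the logit

print(softmax_out.ravel())
print(sigmoid_out.ravel())
```

A model whose output is always 1.0 predicts the positive class for every image, so its accuracy simply equals the share of positive examples in the data it is scored on.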


In [None]:
# Model 2

# The structure of this model is pretty similar to the first model,
# but I removed a convolutional layer along with a max pooling layer
# I also swapped out most of the activation functions for the sigmoid function
model2 = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, padding='same', activation='sigmoid',
                           input_shape=(256, 256, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, padding='same', activation='sigmoid'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='sigmoid'),
    # A single output unit needs sigmoid; softmax over one unit always returns 1
    tf.keras.layers.Dense(1, activation='sigmoid')
])


model2.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Training
history2 = model2.fit(
    train_it,
    epochs=3,
)

# Scoring
print("model 2", model2.evaluate(test_it))

In [None]:
# Model 3

# For this model, I decided to go with a logistic regression. I flatten out each
# of the images in order to do so.
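# Note: only the first batch (32 images) of each iterator is used here, so the
# model is trained on 32 images and scored on 32 images.
train_images, y_train = train_it[0]
test_images, y_test = test_it[0]

# Flatten each image from (256, 256, 3) into a single feature vector
X_train = train_images.reshape(len(train_images), -1)
X_test = test_images.reshape(len(test_images), -1)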

# Training
model3 = LogisticRegression(random_state=0).fit(X_train, y_train)

# Scoring
model3.score(X_test,y_test)
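
The logistic regression above sees only a single batch from each iterator. One possible way to use more data is to stack several batches into one feature matrix before fitting. The sketch below does this with a hypothetical helper, `batches_to_matrix`, and random stand-in images so that it runs on its own; with the notebook's iterator, the batches could presumably be supplied as `(train_it[i] for i in range(len(train_it)))`, although holding every 256x256 image in memory at once may be too much for a small runtime.

```python
import numpy as np

def batches_to_matrix(batches):
    """Stack (images, labels) batches into a 2-D feature matrix and a label vector."""
    X_parts, y_parts = [], []
    for images, labels in batches:
        # Flatten each image: (N, H, W, C) -> (N, H*W*C)
        X_parts.append(images.reshape(len(images), -1))
        y_parts.append(labels)
    return np.concatenate(X_parts), np.concatenate(y_parts)

# Stand-in data: 3 batches of 4 random 8x8 RGB "images" with binary labels.
rng = np.random.default_rng(0)
fake_batches = [(rng.random((4, 8, 8, 3)), rng.integers(0, 2, size=4))
                for _ in range(3)]

X, y = batches_to_matrix(fake_batches)
print(X.shape, y.shape)  # (12, 192) (12,)
```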

In [None]:
# Graphs

# Model 1 and Model 2 graphs: training accuracy and loss per epoch
for name, hist in [('Model 1', history1), ('Model 2', history2)]:
  for metric in ['accuracy', 'loss']:
    plt.plot(hist.history[metric])
    plt.title(f'{name} training {metric}')
    plt.xlabel('epoch')
    plt.show()

As you can see, all three models performed fairly similarly. Surprisingly, in terms of accuracy, the third model, which was just a logistic regression, performed the best. For me, it gave an accuracy of 81.25% (scored on a single batch of 32 test images), whereas the other two models gave accuracies in the 60s.
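
Accuracies like these are easier to interpret next to a trivial baseline: the accuracy obtained by always predicting the most frequent class. The sketch below computes that baseline with a hypothetical helper, `majority_baseline_accuracy`, on stand-in labels. The 3/5 split is made up for illustration; with the notebook's iterator, the real test labels would presumably come from `test_it.classes`.

```python
import numpy as np

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = np.bincount(labels)
    return counts.max() / counts.sum()

# Stand-in test labels (hypothetical split: 3 normal, 5 pneumonia).
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 1])

baseline = majority_baseline_accuracy(y_true)
print(f"majority-class baseline accuracy: {baseline:.3f}")
```

A model that scores close to this baseline has learned little beyond the class frequencies, even if its accuracy looks reasonable in isolation.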

In [None]:
# Pre-trained model

# include_top=False drops the ImageNet classification head, so no
# classifier_activation is needed
xception_base = tf.keras.applications.Xception(
    include_top=False,
    weights="imagenet",
    input_shape=(256,256,3),
)

model = tf.keras.Sequential([
  xception_base,
  tf.keras.layers.GlobalAveragePooling2D(),
  tf.keras.layers.Dense(1, activation='sigmoid'),
])

model.compile(
  loss=tf.keras.losses.BinaryCrossentropy(),
  metrics=['accuracy']
)

# Training
history = model.fit(
    train_it,
    epochs=3,
)

# Scoring
print("Xception", model.evaluate(test_it))

In [None]:
# pre-trained model graphs

plt.plot(history.history['accuracy'])
plt.title('Xception training accuracy')
plt.xlabel('epoch')
plt.show()
plt.plot(history.history['loss'])
plt.title('Xception training loss')
plt.xlabel('epoch')
plt.show()

The pre-trained model overfit heavily, but it was still pretty decent. I was getting training accuracies in the high 90s, but my final testing accuracy was only 82%.

---