# Imports

In [1]:
import numpy as np
from tensorflow.keras.utils import set_random_seed
import os

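# setting the random seeds for reproducibility; approach taken from: https://stackoverflow.com/questions/51249811/reproducible-results-in-tensorflow-with-tf-set-random-seed
seed = 42
# note: setting PYTHONHASHSEED at runtime only affects child processes, not the already-running interpreter
os.environ['PYTHONHASHSEED'] = str(seed)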
np.random.seed(seed)
set_random_seed(seed)


import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

from sklearn.model_selection import train_test_split

from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from tensorflow.keras.regularizers import l2
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import layers

___________________________________________________________________________________________________________________________________________________________________________________________________
# Neural Network

Here we build a basic neural network to classify our images. The code below repurposes many aspects of previous projects I worked on with colleagues. Most of the neural network and metrics take inspiration from [a plant disease classification project](https://github.com/DerikVo/DSI_project_4_plant_disease) and a single-day [hackathon](https://github.com/DerikVo/NN_hackathon) project that classified whether an object was a hotdog or not a hotdog.

We opted to use a convolutional neural network (CNN) because it can capture important features by scanning small segments of an image at a time. These features can be shapes and textures that distinguish one type of tumor from another. Additionally, CNNs can be used for transfer learning, which can improve accuracy and reduce the time spent training the model. Furthermore, because pre-trained models are trained on large, diverse datasets, building on one could make our model more robust to unseen data.
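To make the "scanning segments of an image" idea concrete, here is a minimal NumPy sketch of what a single 3x3 filter in a `Conv2D` layer computes: it slides over the image with stride 1 and takes a weighted sum of each 3x3 patch. With `padding='valid'` (the Keras default, used below), a 256x256 input becomes 254x254. The tiny 5x5 image and the vertical-edge kernel are hypothetical, chosen only for illustration; a real `Conv2D` layer also adds a bias, applies the activation, and learns its kernel weights during training.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the
    weighted sum of each patch; bias and activation are omitted."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# hypothetical 5x5 grayscale image: dark left columns, bright right columns
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# hypothetical vertical-edge kernel: responds where brightness rises left to right
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3)
print(feature_map)        # each row is [3. 3. 0.]: strongest where the window straddles the edge
```

The output is largest only where the window covers the dark-to-bright boundary, which is the sense in which a learned filter "detects" a local shape or texture.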

In [2]:
# set the training and testing paths
training_folder_path = '../Images/Training'
testing_folder_path = '../Images/Testing'

In [3]:
# manually list out the class names
class_names = ['glioma', 'meningioma', 'notumor', 'pituitary']
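When a `classes` list is passed to `flow_from_directory`, Keras assigns each class an integer index in the order of that list (exposed afterwards as `train_ds.class_indices`), and with `class_mode='categorical'` each label is returned as a one-hot vector. A minimal pure-Python sketch of that mapping, assuming the list above:

```python
class_names = ['glioma', 'meningioma', 'notumor', 'pituitary']

# index assignment follows the order of the list passed as `classes`
class_indices = {name: idx for idx, name in enumerate(class_names)}

def one_hot(label, n_classes):
    """One-hot vector, like the labels produced with class_mode='categorical'."""
    vec = [0.0] * n_classes
    vec[label] = 1.0
    return vec

print(class_indices)  # {'glioma': 0, 'meningioma': 1, 'notumor': 2, 'pituitary': 3}
print(one_hot(class_indices['notumor'], len(class_names)))  # [0.0, 0.0, 1.0, 0.0]
```

This ordering also fixes the meaning of the model's output columns: column 2 of `model.predict(...)` is the predicted probability of `notumor`.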

In [4]:
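# hold out 30% of each class folder in the training set for validation
# (no rescale is set, so pixel values stay in the 0-255 range)
datagen = ImageDataGenerator(validation_split=0.30)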
# Get the training data
train_ds = datagen.flow_from_directory(
    training_folder_path,
    target_size=(256, 256),
    color_mode='grayscale',
    batch_size=32,
    classes=class_names,
    class_mode='categorical',
    subset='training',  # Set as training data
    seed=42
)

# Get the validation data
val_ds = datagen.flow_from_directory(
    training_folder_path,
    target_size=(256, 256),
    color_mode='grayscale',
    batch_size=32,
    classes=class_names,
    class_mode='categorical',
    subset='validation',  # Set as validation data
    seed=42,
    shuffle=False
)

# Get the test data
test_ds = datagen.flow_from_directory(
    testing_folder_path,
    target_size=(256, 256),
    color_mode='grayscale',
    class_mode='categorical',
    seed=42,
    shuffle=False
)


Found 4000 images belonging to 4 classes.
Found 1712 images belonging to 4 classes.
Found 1311 images belonging to 4 classes.
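The printed counts are consistent with the 30% `validation_split`: the training folder holds 4,000 + 1,712 = 5,712 images, of which 1,712 (about 30%) were held out for validation. The arithmetic can be checked with a few lines of Python, using only the counts printed above:

```python
train_count = 4000  # "Found 4000 images" (subset='training')
val_count = 1712    # "Found 1712 images" (subset='validation')
test_count = 1311   # "Found 1311 images" (Testing folder)

total_training_folder = train_count + val_count
val_fraction = val_count / total_training_folder

print(total_training_folder)               # 5712
print(round(val_fraction, 3))              # 0.3
print(total_training_folder + test_count)  # 7023
```

Keras takes the 30% cut separately within each class subfolder and rounds down, so the validation fraction comes out slightly under 0.30 (1,712 rather than 0.3 x 5,712 = 1,713.6).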


## First model
This model reuses many aspects of a prior project on [plant disease classification](https://github.com/DerikVo/DSI_project_4_plant_disease/tree/main).

In [5]:
early_stopping = EarlyStopping(patience=5)
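`EarlyStopping(patience=5)` monitors `val_loss` by default and stops training once five consecutive epochs pass without a new best value. With the default `restore_best_weights=False`, the model keeps the weights from the final epoch, not the best one. A minimal pure-Python sketch of that patience rule, run on a hypothetical list of validation losses (not this notebook's actual values):

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None if it
    runs to completion. The counter resets on improvement, otherwise grows."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# hypothetical losses: best value at epoch 2, no improvement afterwards
losses = [0.90, 0.70, 0.75, 0.80, 0.85, 0.95, 1.00, 1.10, 1.20, 1.30]
print(early_stopping_epoch(losses))  # 7
```

Under this rule, a run with `epochs=10` that ends at `Epoch 7/10`, as the first model's log does, suggests its best validation loss came at epoch 2 and that the saved model holds the epoch-7 weights rather than the best ones.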

In [6]:
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(256, 256, 1)))
model.add(MaxPooling2D((2, 2)))

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))

model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))

model.add(Flatten())

model.add(Dense(128, activation='relu'))
model.add(Dense(4, activation='softmax'))
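Almost all of this model's trainable parameters sit in the first `Dense` layer. With the Keras defaults used here (`padding='valid'` and stride 1 for `Conv2D`; 2x2 pooling with stride 2), each 3x3 convolution trims 2 pixels from each spatial dimension and each pooling step halves it, rounding down. The sketch below reproduces that arithmetic; `model.summary()` should report the same totals.

```python
def conv_out(size, kernel=3):
    return size - kernel + 1   # 'valid' padding, stride 1

def pool_out(size, pool=2):
    return size // pool        # 2x2 max pooling, stride 2, rounds down

size, channels = 256, 1
conv_params = 0
for filters in (32, 64, 128):
    conv_params += (3 * 3 * channels) * filters + filters  # kernel weights + biases
    size = pool_out(conv_out(size))
    channels = filters

flat = size * size * channels   # length of the Flatten() output
dense1 = flat * 128 + 128       # Dense(128)
dense2 = 128 * 4 + 4            # Dense(4) softmax output
total = conv_params + dense1 + dense2

print(size, flat)  # 30 115200
print(dense1)      # 14745728
print(total)       # 14838916
```

Over 99% of the roughly 14.8 million parameters are in `Dense(128)`, which is one plausible contributor to the overfitting seen with only 4,000 training images.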

In [7]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
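With one-hot labels and a softmax output, `categorical_crossentropy` reduces to the negative log of the probability the model assigns to the true class, averaged over the batch. A minimal NumPy sketch with hypothetical logits (the clipping constant is an assumption for numerical safety):

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(y_true * np.log(y_prob), axis=-1).mean()

# hypothetical batch of two images over (glioma, meningioma, notumor, pituitary)
logits = np.array([[2.0, 0.5, 0.1, -1.0],
                   [0.0, 0.0, 0.0, 0.0]])
y_true = np.array([[1, 0, 0, 0],
                   [0, 0, 1, 0]], dtype=float)

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
print(probs.round(3))
print(round(loss, 4))
```

The second image gets a uniform 0.25 for every class, so it contributes -log(0.25), about 1.386, to the sum; a confident correct prediction like the first contributes much less.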

In [8]:
history = model.fit(train_ds, validation_data=val_ds, epochs=10, callbacks=[early_stopping])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10


In [9]:
model.save('../Models/CNN_base.h5')

### Interpretation:
Here we see that our training accuracy is about 99% while our validation accuracy is 74%, which suggests our model is heavily overfit. We will need to either reduce the number of features or add some regularization. The validation score is higher than our baseline (46%), but lower than the 87% accuracy reported for radiologists in the study by [Munir et al.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7794124/) (N=154). However, this model is intended only as a supportive tool to assist radiologists, and their feedback would continue to be used to train the model.
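Accuracies like these can be read from the `History` object returned by `model.fit`, whose `history` attribute is a dict with keys such as `'accuracy'` and `'val_accuracy'` when the model is compiled with `metrics=['accuracy']`. A small helper, sketched below under that assumption, reports the final train/validation gap and the epoch with the best validation accuracy. The example dict is hypothetical, shaped like `history.history` and ending at the rounded 99%/74% figures quoted above.

```python
def summarize_history(history_dict):
    """Final train/val accuracy, their gap, and the 1-based epoch with the
    best validation accuracy, from a Keras `history.history` dict."""
    train_acc = history_dict['accuracy'][-1]
    val_acc = history_dict['val_accuracy'][-1]
    val_curve = history_dict['val_accuracy']
    best_epoch = max(range(len(val_curve)), key=lambda i: val_curve[i]) + 1
    return {'train_acc': train_acc, 'val_acc': val_acc,
            'gap': train_acc - val_acc, 'best_val_epoch': best_epoch}

# hypothetical 3-epoch history (not this notebook's actual log)
example = {'accuracy': [0.80, 0.95, 0.99],
           'val_accuracy': [0.70, 0.76, 0.74]}
print(summarize_history(example))
```

In the notebook this would be called as `summarize_history(history.history)`; a large `gap` is the overfitting signal discussed above.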

For our next iteration, let's try adding some regularization to see whether it reduces overfitting so our model generalizes better.

## Second model
This model uses regularization to try to combat overfitting. The model uses techniques learned from the [General Assembly Data Science Immersive bootcamp](https://generalassemb.ly/education/data-science), which included a lab on regularization with convolutional neural networks.
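`kernel_regularizer=l2(0.01)` adds a penalty of 0.01 times the sum of that layer's squared kernel weights to the training loss, which pushes the network toward smaller weights. A minimal NumPy sketch of the penalty for one hypothetical weight matrix:

```python
import numpy as np

def l2_penalty(weights, factor=0.01):
    """Term an L2 kernel regularizer adds to the loss: factor * sum(w ** 2)."""
    return factor * np.sum(np.square(weights))

# hypothetical 2x2 kernel
w = np.array([[0.5, -1.0],
              [2.0, 0.0]])
penalty = l2_penalty(w)
print(penalty)  # 0.01 * (0.25 + 1.0 + 4.0 + 0.0) = 0.0525
```

Because this penalty is part of the loss, the second model's reported `loss` and `val_loss` are not directly comparable to the first model's; accuracy is unaffected.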

In [10]:
model2 = Sequential()
model2.add(Conv2D(32, (3, 3), activation='relu', input_shape=(256, 256, 1), kernel_regularizer=l2(0.01)))
model2.add(MaxPooling2D((2, 2)))

model2.add(Conv2D(64, (3, 3), activation='relu', kernel_regularizer=l2(0.01)))
model2.add(MaxPooling2D((2, 2)))
model2.add(Dropout(0.25))

model2.add(Conv2D(128, (3, 3), activation='relu', kernel_regularizer=l2(0.01)))
model2.add(MaxPooling2D((2, 2)))
model2.add(Dropout(0.25))

model2.add(Flatten())
model2.add(Dense(128, activation='relu', kernel_regularizer=l2(0.01)))
model2.add(Dropout(0.5))

model2.add(Dense(4, activation='softmax'))
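The `Dropout` layers randomly zero a fraction of their inputs during training only, and scale the surviving values by 1 / (1 - rate) so the expected activation is unchanged; at inference they pass inputs through untouched. A minimal NumPy sketch of this "inverted dropout" behaviour, using a hypothetical fixed seed and an all-ones activation array:

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero each element with probability `rate` during
    training and rescale the rest by 1 / (1 - rate); identity at inference."""
    if not training or rate == 0.0:
        return x
    keep_mask = rng.random(x.shape) >= rate
    return np.where(keep_mask, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
activations = np.ones((1000, 128))

dropped = dropout(activations, rate=0.5, training=True, rng=rng)
print(dropped.mean())      # close to 1.0
print(np.unique(dropped))  # [0. 2.]
print(dropout(activations, 0.5, training=False, rng=rng) is activations)  # True
```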

In [11]:
model2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

In [12]:
history2 = model2.fit(train_ds, validation_data=val_ds, epochs=10, callbacks=[early_stopping])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [13]:
model2.save('../Models/CNN_regularization.h5')

### Interpretation:
Here we see that our training accuracy is about 91% while our validation accuracy is 74%, which suggests our model is still overfit, but less so. The model does better than our baseline model's accuracy of 46%, but worse than the 87% accuracy of the two radiologists in [Munir et al.'s](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7794124/) study.

Since the two models' validation accuracies are nearly identical, we will have to evaluate them on other metrics, such as precision, to see which model suits our needs.
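Because `val_ds` was created with `shuffle=False`, predictions from `model.predict(val_ds)` line up with the labels in `val_ds.classes`, so per-class precision can be computed directly. A minimal NumPy sketch, with hypothetical integer labels standing in for `val_ds.classes` and `model.predict(val_ds).argmax(axis=1)`:

```python
import numpy as np

def per_class_precision(y_true, y_pred, n_classes):
    """Precision per class: correct predictions of class c divided by all
    predictions of class c. Classes never predicted get NaN."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    precision = np.full(n_classes, np.nan)
    for c in range(n_classes):
        predicted_c = y_pred == c
        if predicted_c.any():
            precision[c] = np.mean(y_true[predicted_c] == c)
    return precision

class_names = ['glioma', 'meningioma', 'notumor', 'pituitary']
# hypothetical labels, not results from this notebook
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 0, 3, 3]
for name, p in zip(class_names, per_class_precision(y_true, y_pred, 4)):
    print(f'{name}: {p:.2f}')
```

`sklearn.metrics.precision_score(y_true, y_pred, average=None)` returns the same per-class values.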

__________________________________________________________________________________________________________________________________________________________________________________________________________________
# Conclusion:

Our two neural networks have similar scores. Both score better than our baseline of 46%, but lower than the accuracy (87%) of the radiologists in the [Munir et al. (2021)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7794124/) study. It should be noted that their sample size was 154 patients, while this dataset has over 7,000 images; however, we need to keep in mind that multiple images could come from the same patient.

A neural network using data augmentation was attempted, but it ran out of memory. We also tried saving the augmented images to disk instead, but that caused conflicts as well, so augmentation was dropped. We also tested pre-trained models, specifically MobileNet and NASNetMobile, which we chose because we wanted a lightweight pre-trained model for this classification problem; however, those models did not work with our grayscale images, so that idea was dropped too. In the future, more research is needed on which pre-trained models could be combined with our model to improve accuracy, but due to team constraints that work is on hold.
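One commonly used workaround for ImageNet-pretrained models such as MobileNet, which expect three input channels, is to replicate the single grayscale channel three times; alternatively, `flow_from_directory(..., color_mode='rgb')` loads grayscale files as three identical channels. We did not test this approach here; the sketch below only shows the channel replication on a hypothetical random batch, using NumPy.

```python
import numpy as np

def grayscale_to_three_channels(batch):
    """(N, H, W, 1) grayscale batch -> (N, H, W, 3) by repeating the channel."""
    if batch.shape[-1] != 1:
        raise ValueError('expected a single-channel batch')
    return np.repeat(batch, 3, axis=-1)

# hypothetical batch of 2 grayscale 256x256 images
gray = np.random.default_rng(42).random((2, 256, 256, 1))
rgb = grayscale_to_three_channels(gray)
print(rgb.shape)  # (2, 256, 256, 3)
```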

We will now proceed to our [Modeling Evaluation Notebook](../Notebooks/04_Model_evaluation.ipynb) to evaluate our models on other metrics such as their precision scores.