Importing Libraries


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


**TensorFlow** is a library for fast numerical computing, created and released by Google. It serves as a foundation for building deep learning models.


**tqdm** is a Python library for displaying progress bars around loops and other iterables.

The **os** module in Python provides functions for interacting with the operating system, such as building file paths and listing directories.

The **sklearn** (scikit-learn) library provides efficient tools for machine learning and statistical modeling, including classification, regression, and clustering.

In [None]:
import matplotlib.pyplot as plt 
import numpy as np
import pandas as pd
import seaborn as sns
import cv2
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tqdm import tqdm
import os
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, TensorBoard, ModelCheckpoint
from sklearn.metrics import classification_report,confusion_matrix
import ipywidgets as widgets
import io
from PIL import Image
from IPython.display import display,clear_output
from warnings import filterwarnings

In [None]:
labels = ['glioma_tumor','no_tumor','meningioma_tumor','pituitary_tumor']

In [None]:
X_train = []
y_train = []
image_size = 150

# Both the 'Training' and 'Testing' folders are pooled here;
# a held-out test set is split off again later with train_test_split.
for split in ('Training', 'Testing'):
    for label in labels:
        folderPath = os.path.join('/content/drive/MyDrive/archive', split, label)
        for fname in tqdm(os.listdir(folderPath)):
            img = cv2.imread(os.path.join(folderPath, fname))
            if img is None:  # cv2.imread returns None for unreadable files
                continue
            img = cv2.resize(img, (image_size, image_size))
            X_train.append(img)
            y_train.append(label)

X_train = np.array(X_train)
y_train = np.array(y_train)

100%|██████████| 826/826 [04:33<00:00,  3.02it/s]
100%|██████████| 395/395 [02:06<00:00,  3.12it/s]
100%|██████████| 822/822 [04:29<00:00,  3.04it/s]
100%|██████████| 827/827 [04:27<00:00,  3.09it/s]
100%|██████████| 100/100 [00:30<00:00,  3.26it/s]
100%|██████████| 105/105 [00:26<00:00,  3.97it/s]
100%|██████████| 115/115 [00:37<00:00,  3.05it/s]
100%|██████████| 74/74 [00:20<00:00,  3.53it/s]


In [None]:
X_train, y_train = shuffle(X_train,y_train, random_state=101)

Image Data Augmentation: Image data augmentation artificially expands a training dataset by creating modified versions of the images it already contains. Typical transformations include flipping, zooming, padding, and cropping.

To do this in Keras, we use the ImageDataGenerator class.
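
To make the idea concrete, here is a minimal NumPy-only sketch of two such transformations: a horizontal flip and a horizontal shift with zero padding. It illustrates the concept and is not what ImageDataGenerator does internally. The helper names `flip_horizontal` and `shift_right` and the toy 3x3 image are assumptions made for this example.

```python
import numpy as np

def flip_horizontal(img):
    """Mirror an (H, W, C) image left to right."""
    return img[:, ::-1, :]

def shift_right(img, pixels):
    """Shift an (H, W, C) image right by `pixels`, padding the left edge with zeros."""
    if pixels <= 0:
        return img.copy()
    shifted = np.zeros_like(img)
    shifted[:, pixels:, :] = img[:, :-pixels, :]
    return shifted

# Toy 3x3 single-channel "image" (hypothetical data)
toy = np.arange(9).reshape(3, 3, 1)
flipped = flip_horizontal(toy)
shifted = shift_right(toy, 1)

print(toy[:, :, 0])
print(flipped[:, :, 0])
print(shifted[:, :, 0])
```

Each transformed copy keeps the original label, which is why augmentation adds training variety without new annotation work.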

In [None]:
datagen = ImageDataGenerator(
    rotation_range=30,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.2,
    horizontal_flip=True)

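# fit() is only required for the feature-wise options (featurewise_center,
# featurewise_std_normalization, zca_whitening); none of them are enabled above.
datagen.fit(X_train)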
X_train.shape

(3264, 150, 150, 3)

In [None]:
X_train,X_test,y_train,y_test = train_test_split(X_train,y_train, test_size=0.1,random_state=101)

**One-hot encoding** the labels after converting them to numerical values:
One-hot encoding is appropriate for categorical data where no ordinal relationship exists between the categories.

It represents each label as a binary vector with one element per unique class, setting the element for the true class to 1 and all other elements to 0.

Many machine learning algorithms cannot work with categorical data directly, so the categories must be converted to numbers. One-hot encoding does this without implying any order among the classes.

https://machinelearningmastery.com/how-to-one-hot-encode-sequence-data-in-python/

https://towardsdatascience.com/building-a-one-hot-encoding-layer-with-tensorflow-f907d686bf39
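
As a small illustration of the encoding described above, the following sketch builds the binary vectors by hand in plain Python. The helper `one_hot` is a hypothetical name introduced for this example. The notebook itself uses `tf.keras.utils.to_categorical` for this step.

```python
labels = ['glioma_tumor', 'no_tumor', 'meningioma_tumor', 'pituitary_tumor']

def one_hot(label, classes):
    """Return a binary vector with a 1 at the position of `label` in `classes`."""
    vec = [0] * len(classes)
    vec[classes.index(label)] = 1
    return vec

encoded = [one_hot(name, labels) for name in ['no_tumor', 'pituitary_tumor']]
print(encoded)  # [[0, 1, 0, 0], [0, 0, 0, 1]]
```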

In [None]:
y_train_new = []
for i in y_train:
    y_train_new.append(labels.index(i))
y_train = y_train_new
y_train = tf.keras.utils.to_categorical(y_train)


y_test_new = []
for i in y_test:
    y_test_new.append(labels.index(i))
y_test = y_test_new
y_test = tf.keras.utils.to_categorical(y_test)

Transfer Learning

Deep convolutional neural network models may take days or even weeks to train on very large datasets.

A way to short-cut this process is to re-use the model weights from pre-trained models that were developed for standard computer vision benchmark datasets, such as the ImageNet image recognition tasks. Top performing models can be downloaded and used directly, or integrated into a new model for your own computer vision problems.

We use the EfficientNetB0 model, initialized with weights pre-trained on the ImageNet dataset.

The include_top parameter is set to False so that the network does not include the top (output) layer of the pre-built model. This lets us add our own output layer to suit our use case.

https://www.tensorflow.org/api_docs/python/tf/keras/applications/efficientnet/EfficientNetB0

In [None]:
effnet = EfficientNetB0(weights='imagenet',include_top=False,input_shape=(image_size,image_size,3))

Downloading data from https://storage.googleapis.com/keras-applications/efficientnetb0_notop.h5


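GlobalAveragePooling2D -> This layer averages each feature map over its spatial dimensions, producing a single value per channel. This greatly reduces the number of values passed to the next layer, which lowers the computational load during training.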

https://adventuresinmachinelearning.com/global-average-pooling-convolutional-neural-networks/
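
The pooling step can be sketched in NumPy as a mean over the height and width axes. The feature-map shape below is a hypothetical example, not the actual output shape of EfficientNetB0.

```python
import numpy as np

# Hypothetical batch of feature maps: 2 images, 5x5 spatial grid, 8 channels
features = np.arange(2 * 5 * 5 * 8, dtype=np.float32).reshape(2, 5, 5, 8)

# Global average pooling: average over height and width (axes 1 and 2)
pooled = features.mean(axis=(1, 2))

print(features.shape)  # (2, 5, 5, 8)
print(pooled.shape)    # (2, 8)
```

In this toy case, each image goes from 200 values to 8, one per channel.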

Dropout -> This layer omits some of the neurons at each training step, which makes the neurons less dependent on their neighbouring neurons and helps avoid overfitting. The neurons to be omitted are selected at random. The rate parameter is the likelihood of a neuron's activation being set to 0, thus dropping out the neuron.

https://media.geeksforgeeks.org/wp-content/cdn-uploads/20190523171258/overfitting_2.png
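
A minimal NumPy sketch of dropout at training time follows. It uses the common "inverted dropout" formulation, in which the surviving activations are scaled by 1 / (1 - rate). The function name `dropout`, the seed, and the all-ones input are assumptions chosen for illustration.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero each activation with probability `rate`,
    then scale the survivors by 1 / (1 - rate)."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)   # fixed seed, chosen only for repeatability
activations = np.ones((1000,))   # hypothetical layer output
dropped = dropout(activations, rate=0.5, rng=rng)

zero_fraction = float(np.mean(dropped == 0))
print(zero_fraction)             # close to 0.5
print(np.unique(dropped))        # only 0.0 and 2.0 remain
```

The scaling keeps the expected sum of the activations unchanged, so nothing needs to be rescaled at prediction time, when dropout is switched off.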

Dense -> This is the output layer, which classifies the image into 1 of the 4 possible classes. It uses the softmax function, which generalizes the sigmoid function to more than two classes.
https://i.stack.imgur.com/0rewJ.png
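
The relationship between softmax and sigmoid can be checked with a few lines of plain Python. The scores below are hypothetical, and the helper names are introduced for this sketch only.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

probs = softmax([2.0, 1.0, 0.1, -1.0])  # hypothetical scores for the 4 classes
print(probs, sum(probs))

# With two classes, softmax over [x, 0] matches sigmoid(x)
two_class = softmax([1.5, 0.0])
print(two_class[0], sigmoid(1.5))
```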

In [None]:
model = effnet.output
model = tf.keras.layers.GlobalAveragePooling2D()(model)
model = tf.keras.layers.Dropout(rate=0.5)(model)
model = tf.keras.layers.Dense(4,activation='softmax')(model)
model = tf.keras.models.Model(inputs=effnet.input, outputs = model)

In [None]:
model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 150, 150, 3) 0                                            
__________________________________________________________________________________________________
rescaling (Rescaling)           (None, 150, 150, 3)  0           input_1[0][0]                    
__________________________________________________________________________________________________
normalization (Normalization)   (None, 150, 150, 3)  7           rescaling[0][0]                  
__________________________________________________________________________________________________
stem_conv_pad (ZeroPadding2D)   (None, 151, 151, 3)  0           normalization[0][0]              
______________________________________________________________________________________________

We finally compile our model.

categorical_crossentropy: A loss function for multi-class classification with two or more output classes, where the labels are one-hot encoded.
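
For a single sample, this loss reduces to the negative log of the probability assigned to the true class. The sketch below computes it by hand. The `eps` clipping value and the example probabilities are assumptions for illustration, not values taken from Keras or from this model.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one sample: -sum(y_true * log(y_pred))."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0, 1, 0, 0]                 # one-hot label (class 1)
confident = [0.05, 0.90, 0.03, 0.02]  # hypothetical softmax outputs
unsure = [0.30, 0.40, 0.20, 0.10]

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

The confident, correct prediction receives the smaller loss, which is the behaviour training tries to encourage.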

Optimizers are algorithms that adjust the attributes of the neural network, such as its weights, to reduce the loss. In other words, they solve an optimization problem by minimizing the loss function. Here we use the Adam optimizer.
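
To show what "minimizing the function" means in practice, here is a sketch of plain gradient descent on a toy one-parameter loss. This is deliberately simpler than Adam, which also adapts the step size per parameter. The loss function, starting weight, and learning rate are all assumptions for illustration.

```python
def loss(w):
    return (w - 3.0) ** 2         # toy loss with its minimum at w = 3

def grad(w):
    return 2.0 * (w - 3.0)        # derivative of the toy loss

w = 0.0                           # initial weight
learning_rate = 0.1
for step in range(50):
    w -= learning_rate * grad(w)  # move against the gradient

print(w, loss(w))
```

After 50 steps the weight sits very close to 3, the value that minimizes the toy loss.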

metrics=['accuracy']: Calculates how often predictions equal labels. A metric is a function used to judge the performance of your model.

In [None]:
model.compile(loss='categorical_crossentropy',optimizer = 'Adam', metrics= ['accuracy'])

Callbacks -> Callbacks can help you fix bugs more quickly, and can help you build better models. They can help you visualize how your model’s training is going, and can even help prevent overfitting by implementing early stopping or customizing the learning rate on each iteration.

By definition, "A callback is a set of functions to be applied at given stages of the training procedure. You can use callbacks to get a view on internal states and statistics of the model during training."

We use the TensorBoard, ModelCheckpoint, and ReduceLROnPlateau callbacks.

https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/TensorBoard

https://keras.io/api/callbacks/model_checkpoint/

https://keras.io/api/callbacks/reduce_lr_on_plateau/
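
The plateau rule behind ReduceLROnPlateau can be sketched in plain Python. This is a simplified model of the idea and not the Keras implementation, which also supports options such as a cooldown period and a minimum learning rate. The function name, the validation accuracies, and the starting rate of 0.001 are assumptions for illustration.

```python
def simulate_reduce_lr(val_accuracies, lr, factor=0.3, patience=2, min_delta=0.001):
    """Simplified plateau rule: shrink lr after `patience` epochs without improvement."""
    best = float('-inf')
    wait = 0
    history = []
    for acc in val_accuracies:
        if acc > best + min_delta:  # counts as an improvement
            best = acc
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr *= factor
                wait = 0
        history.append(lr)
    return history

# Hypothetical validation accuracies with plateaus at epochs 3-4 and 6-7
accs = [0.63, 0.89, 0.88, 0.87, 0.97, 0.96, 0.95, 0.97]
lrs = simulate_reduce_lr(accs, lr=0.001)
for epoch, lr in enumerate(lrs, start=1):
    print(epoch, lr)
```

With factor = 0.3 and patience = 2, each plateau of two epochs multiplies the learning rate by 0.3.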

In [None]:
tensorboard = TensorBoard(log_dir = 'logs')
checkpoint = ModelCheckpoint("effnet.h5",monitor="val_accuracy",save_best_only=True,mode="auto",verbose=1)
reduce_lr = ReduceLROnPlateau(monitor = 'val_accuracy', factor = 0.3, patience = 2, min_delta = 0.001,mode='auto',verbose=1)

Training The Model

In [None]:
history = model.fit(X_train,y_train,validation_split=0.1, epochs =12, verbose=1, batch_size=32,callbacks=[tensorboard,checkpoint,reduce_lr])



Epoch 1/12

Epoch 00001: val_accuracy improved from -inf to 0.63265, saving model to effnet.h5
Epoch 2/12

Epoch 00002: val_accuracy improved from 0.63265 to 0.89456, saving model to effnet.h5
Epoch 3/12

Epoch 00003: val_accuracy did not improve from 0.89456
Epoch 4/12

Epoch 00004: val_accuracy did not improve from 0.89456

Epoch 00004: ReduceLROnPlateau reducing learning rate to 0.0003000000142492354.
Epoch 5/12

Epoch 00005: val_accuracy improved from 0.89456 to 0.96599, saving model to effnet.h5
Epoch 6/12

Epoch 00006: val_accuracy did not improve from 0.96599
Epoch 7/12

Epoch 00007: val_accuracy did not improve from 0.96599

Epoch 00007: ReduceLROnPlateau reducing learning rate to 9.000000427477062e-05.
Epoch 8/12

Epoch 00008: val_accuracy improved from 0.96599 to 0.96939, saving model to effnet.h5
Epoch 9/12

Epoch 00009: val_accuracy improved from 0.96939 to 0.97279, saving model to effnet.h5
Epoch 10/12

Epoch 00010: val_accuracy improved from 0.97279 to 0.97619, saving mod

Prediction

We use the argmax function because each row of the prediction array contains four values, one for each label. The largest value in a row marks the predicted class among the 4 possible outcomes, so argmax returns the index of the predicted class.
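
A small worked example, using hypothetical probabilities in place of real model output, shows how argmax maps each row to a class index and then to a label name:

```python
import numpy as np

labels = ['glioma_tumor', 'no_tumor', 'meningioma_tumor', 'pituitary_tumor']

# Hypothetical softmax outputs for 3 images (rows) over the 4 classes (columns)
probs = np.array([
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.70, 0.10, 0.10],
    [0.05, 0.05, 0.10, 0.80],
])

pred_idx = np.argmax(probs, axis=1)  # index of the largest value in each row
pred_names = [labels[i] for i in pred_idx]

print(pred_idx)    # [0 1 3]
print(pred_names)  # ['glioma_tumor', 'no_tumor', 'pituitary_tumor']
```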

In [None]:
pred = model.predict(X_test)
pred = np.argmax(pred,axis=1)

Evaluation

In the report below, the class indices map to labels as follows:

0 - Glioma Tumor
1 - No Tumor
2 - Meningioma Tumor
3 - Pituitary Tumor

https://medium.com/@kohlishivam5522/understanding-a-classification-report-for-your-machine-learning-model-88815e2ce397
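
The per-class numbers in a classification report can be reproduced by counting true positives, false positives, and false negatives. The sketch below does this in plain Python on made-up labels and predictions that are not taken from this notebook's run. The helper `precision_recall_f1` is a hypothetical name.

```python
def precision_recall_f1(y_true, y_pred, cls):
    """Per-class precision, recall, and F1 from two lists of class indices."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels and predictions (one class-2 image misread as class 0)
y_true = [0, 0, 1, 2, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 0, 3, 3]

for cls in range(4):
    print(cls, precision_recall_f1(y_true, y_pred, cls))
```

The single mistake lowers the precision of class 0 (it gained a false positive) and the recall of class 2 (it lost a true positive).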

In [None]:
print(classification_report(y_test_new,pred))

              precision    recall  f1-score   support

           0       0.96      1.00      0.98        93
           1       1.00      1.00      1.00        51
           2       1.00      0.95      0.97        96
           3       0.99      1.00      0.99        87

    accuracy                           0.98       327
   macro avg       0.99      0.99      0.99       327
weighted avg       0.99      0.98      0.98       327



In [None]:
# Save the trained model to an HDF5 file.
# By default this overwrites an existing file and includes the optimizer state.
model.save('tumor_prediction.h5')



In [None]:
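def img_pred(upload):
    # upload.value is expected to be a dict of {filename: file info};
    # if several files were uploaded, the last one is used.
    for name, file_info in upload.value.items():
        img = Image.open(io.BytesIO(file_info['content'])).convert('RGB')
    # Match the training pipeline: cv2.imread loads images in BGR order
    opencvImage = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)
    img = cv2.resize(opencvImage, (image_size, image_size))
    img = img.reshape(1, image_size, image_size, 3)
    p = model.predict(img)
    p = int(np.argmax(p, axis=1)[0])

    # Index order follows the `labels` list defined earlier
    if p == 1:
        print('The model predicts that there is no tumor')
    else:
        names = {0: 'Glioma Tumor', 2: 'Meningioma Tumor', 3: 'Pituitary Tumor'}
        print(f'The Model predicts that it is a {names[p]}')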

In [None]:
uploader = widgets.FileUpload()
display(uploader)

FileUpload(value={}, description='Upload')

In [None]:
button = widgets.Button(description='Predict')
out = widgets.Output()
def on_button_clicked(_):
    with out:
        clear_output()
        try:
            img_pred(uploader)
        except Exception:
            # Any failure (no upload, unreadable file) ends up here
            print('No Image Uploaded/Invalid Image File')
button.on_click(on_button_clicked)
widgets.VBox([button,out])

VBox(children=(Button(description='Predict', style=ButtonStyle()), Output()))