#### Overview

 - This notebook uses the [BreakHis dataset](https://web.inf.ufpr.br/vri/databases/breast-cancer-histopathological-database-breakhis/).
  - The dataset has 7,909 histopathological images, classified as benign (2,480 images) or malignant (5,429 images).
  - The images come in 4 zoom levels, but zoom level is not taken into consideration while training in this notebook.
 
 - The dataset is split 80-20 into training and validation sets.
 
|              	| Benign 	| Malignant	| Total |
|--------------	| ----------:	| ----------:	| ------------:	|
| Training 	| 1984 	| 4343 	| 6327   	| 
| Validation 	| 496 	| 1086 	| 1582   	| 
| Total |  2480 | 5429 | 7909 |
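
The counts in the table can be reproduced with a few lines of integer arithmetic. The sketch below assumes the training share of each class is rounded down and the remainder goes to validation; that assumption is consistent with the numbers above but is not taken from the `split-folders` source. The names `totals` and `split` are illustrative.

```python
# Class totals as reported in the overview.
totals = {"benign": 2480, "malignant": 5429}
TRAIN_PERCENT = 80  # 80-20 split

split = {}
for name, n in totals.items():
    n_train = n * TRAIN_PERCENT // 100  # assumption: training count is rounded down
    split[name] = (n_train, n - n_train)  # (training, validation)

train_total = sum(train for train, _ in split.values())
val_total = sum(val for _, val in split.values())

for name, (train, val) in split.items():
    print(name, train, val)
print("total", train_total, val_total)
```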
  
 
 - **Transfer learning** is applied to the pretrained models Inception V3 and NasNet Large.
  - The top classification layer of each model is dropped.
  - Weights: 'imagenet'.
  - The models are loaded via the [Keras applications module](https://keras.io/applications/).
  - Inception V3:
    - image_size 224x224.
  - NasNet Large:
    - image_size 331x331.
 
 - Custom classification layers attached are:
  - GlobalAveragePooling2D layer
  - Dropout layer with drop probability 0.4 to reduce overfitting.
  - Dense layer with 2 units (one per class) and softmax activation.
  
 
 - Training parameters:
  - epochs: 150
  - batch size: 16
 
 - Results after 150 epochs:

| Model        	| Training Acc 	| Training Loss 	| Validation Acc 	| Validation Loss 	|
|--------------	|----------	|----------	|------------	|----------	|
| Inception V3 	| 0.810279 	| 0.432034 	| 0.690051   	| 1.480263 	|
| NasNet Large 	| 0.863103 	| 0.337798 	| 0.750638   	| 0.878199 	|
      
  - Graphs:
    - Accuracy
    - Loss
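
As a rough reading of the results table, the gap between training and validation metrics can be computed directly. This is a minimal sketch that uses only the reported numbers; the `results` dictionary and its key names are illustrative and do not appear elsewhere in the notebook.

```python
# Final-epoch metrics copied from the results table.
results = {
    "Inception V3": {"train_acc": 0.810279, "train_loss": 0.432034,
                     "val_acc": 0.690051, "val_loss": 1.480263},
    "NasNet Large": {"train_acc": 0.863103, "train_loss": 0.337798,
                     "val_acc": 0.750638, "val_loss": 0.878199},
}

gaps = {}
for name, r in results.items():
    gaps[name] = {
        "acc_gap": r["train_acc"] - r["val_acc"],     # positive: better on training data
        "loss_gap": r["val_loss"] - r["train_loss"],  # positive: worse on validation data
    }
    print(name, round(gaps[name]["acc_gap"], 6), round(gaps[name]["loss_gap"], 6))
```

Both models score lower on validation than on training, and the validation loss is well above the training loss, which may indicate overfitting despite the dropout layer.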

### Imports

In [0]:
!pip install split-folders

In [0]:
from __future__ import absolute_import, division, print_function, unicode_literals
from datetime import datetime
import os

import split_folders

# google drive
from google.colab import drive

# plots
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns

# tested on TF 1.14.0
import tensorflow as tf
from tensorflow import keras
print("TensorFlow version is ", tf.__version__)

# TF hub
import tensorflow_hub as hub

tf.enable_eager_execution()

### Define Constants

In [0]:
DATA_ROOT = "/content/breakhis/" #@param {"type": "string"}
DRIVE_ROOT = "/content/gdrive/My Drive/colab_data/" #@param {"type": "string"}

EPOCHS = 150 #@param {"type": "number"}
BATCH_SIZE = 8 #@param {"type": "number"}
VALID_FREQ = 1 #@param {"type": "number"}
WORKERS = 4 #@param {"type": "number"}

SAVE_MODELS = True #@param {"type": "boolean"}
SAVE_FREQ = 25 #@param {"type": "number"}
SAVE_HISTORY = True #@param {"type": "boolean"}

FINE_TUNE = False

NASNET= "nasnet"
INC_V3 = "inc_v3"

MODEL = "nasnet" #@param ["nasnet", "inc_v3"]

In [0]:
!mkdir model_dump

### Dataset

Mount Google Drive and extract the files:

In [0]:
drive.mount('/content/gdrive')

In [0]:
!tar -xzf "{DRIVE_ROOT}breakhis_all.tar.gz"

In [0]:
TRAIN_P = 0.8
TEST_P = 1 - TRAIN_P

split_folders.ratio("breakhis_all/", 
                      output="breakhis/", 
                      seed=6482, 
                      ratio=(TRAIN_P, TEST_P)) # 80/20 train/val split

In [0]:
!rm -r breakhis_all/

### Model based on Inception V3 or NASNet

The base model is selected by the `MODEL` constant defined above:

In [0]:
# Create the model.
stamp = datetime.now().strftime("%d-%m-%Y_%H-%M")
# Inception V3
if MODEL == INC_V3:
  base_model = keras.applications.InceptionV3(weights="imagenet", 
                                              include_top=False)

  MODEL_PATH = "/content/model_dump/model-v3-{epoch}.hdf5"
  history_f = DRIVE_ROOT + "history-v3-" + stamp + ".pk"
  models_f = DRIVE_ROOT + "models-v3-" + stamp + ".gz"

  # use the tf.keras version to match the tf.keras model built above
  from tensorflow.keras.applications.inception_v3 import preprocess_input

  train_datagen = keras.preprocessing.image.ImageDataGenerator(
                    preprocessing_function=preprocess_input)

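  train_generator = train_datagen.flow_from_directory(
                      DATA_ROOT + "train",
                      target_size=(224,224),  # image size stated in the overview
                      batch_size=BATCH_SIZE
                    )

  test_datagen = keras.preprocessing.image.ImageDataGenerator(
                    preprocessing_function=preprocess_input)

  # validation data comes from test_datagen, with the same size and batch size
  test_generator = test_datagen.flow_from_directory(
                      DATA_ROOT + "val",
                      target_size=(224,224),
                      batch_size=BATCH_SIZE
                    )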
# NASNET
elif MODEL == NASNET:
  base_model = keras.applications.NASNetLarge(
                                              weights="imagenet", 
                                              include_top=False)


  MODEL_PATH = "/content/model_dump/model-nasnet-{epoch}.hdf5"
  history_f = DRIVE_ROOT + "history-nasnet-" + stamp + ".pk"
  models_f = DRIVE_ROOT + "models-nasnet-" + stamp + ".gz"

  # use the tf.keras version to match the tf.keras model built above
  from tensorflow.keras.applications.nasnet import preprocess_input

  train_datagen = keras.preprocessing.image.ImageDataGenerator(
                    preprocessing_function=preprocess_input)

  train_generator = train_datagen.flow_from_directory(
                      DATA_ROOT + "train",
                      target_size=(331,331),
                      batch_size=BATCH_SIZE
                    )

  test_datagen = keras.preprocessing.image.ImageDataGenerator(
                    preprocessing_function=preprocess_input)

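  # validation data comes from test_datagen, with the same batch size as training
  test_generator = test_datagen.flow_from_directory(
                      DATA_ROOT + "val",
                      target_size=(331,331),
                      batch_size=BATCH_SIZE
                    )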

### Add layers and train

In [0]:
# Model Save callback
callbacks = []
if SAVE_MODELS:
  callbacks = [keras.callbacks.ModelCheckpoint(MODEL_PATH,
                                  period=SAVE_FREQ)]

x = base_model.output
x = keras.layers.GlobalAveragePooling2D(name='avg_pool')(x)
x = keras.layers.Dropout(0.4)(x)
predictions = keras.layers.Dense(train_generator.num_classes, activation='softmax')(x)
model = keras.Model(inputs=base_model.input, outputs=predictions)

In [0]:
# Freeze only the pretrained base; the newly added head stays trainable.
for layer in base_model.layers:
    layer.trainable = FINE_TUNE

In [0]:
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

In [0]:
model.summary()

In [0]:
# train the added classifiers
history = model.fit_generator(train_generator, 
                    epochs=EPOCHS,
                    steps_per_epoch=train_generator.n//BATCH_SIZE,
                    workers=1,
                    validation_data=test_generator,
                    # one full pass over the validation set, whatever its batch size
                    validation_steps=len(test_generator),
                    validation_freq=VALID_FREQ,
                    callbacks=callbacks
                  ).history

Plot the training history:

In [0]:
acc = history['acc']
val_acc = history['val_acc']

loss = history['loss']
val_loss = history['val_loss']

plt.figure(figsize=(8, 8))
plt.subplot(2, 1, 1)
plt.plot(acc, label='Training Accuracy')
plt.plot(val_acc, label='Validation Accuracy')
plt.legend(loc='upper right')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.ylim([min(plt.ylim()),1])
plt.title('Training and Validation Accuracy')


plt.subplot(2, 1, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.legend(loc='lower right')
plt.ylabel('Cross Entropy')
plt.xlabel('Epoch')
plt.ylim([0,max(plt.ylim())])
plt.title('Training and Validation Loss')
plt.show()

## Export model & history

Compress the saved checkpoints and copy them to Google Drive:

In [0]:
if SAVE_MODELS:
  !tar -czf models.gz model_dump
  !cp models.gz "{models_f}"

Export training history:

In [0]:
if SAVE_HISTORY:
  import pickle
  with open("history.pk", 'wb') as f:
    pickle.dump(history, f)
  !cp history.pk "{history_f}"

## Confusion Matrix

In [0]:
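# Predictions only line up with `.classes` when the generator is not shuffled,
# so predict from unshuffled copies of the generators.
train_eval = train_datagen.flow_from_directory(
    DATA_ROOT + "train", target_size=train_generator.target_size,
    batch_size=BATCH_SIZE, shuffle=False)
Y_pred_t = model.predict_generator(train_eval)
y_pred_t = np.argmax(Y_pred_t, axis=1)
len(y_pred_t)

In [0]:
test_eval = test_datagen.flow_from_directory(
    DATA_ROOT + "val", target_size=test_generator.target_size,
    batch_size=BATCH_SIZE, shuffle=False)
Y_pred_v = model.predict_generator(test_eval)
y_pred_v = np.argmax(Y_pred_v, axis=1)
len(y_pred_v)

In [0]:
y_pred = np.concatenate((y_pred_t, y_pred_v))
# `.classes` is in file order, matching the unshuffled predictions
classes = np.concatenate((train_eval.classes, test_eval.classes))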

cnf = confusion_matrix(classes, y_pred)
# Confusion Matrix and Classification Report
print('Confusion Matrix')
print(cnf)
print('Classification Report')
target_names = ["Benign", "Malignant"]
print(classification_report(classes, y_pred, target_names=target_names))

In [0]:
sns.heatmap(cnf, annot=True, xticklabels=["Benign", "Malignant"], yticklabels=["Benign", "Malignant"], fmt="d")
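
For a two-class problem like this one, sensitivity and specificity can be read off the confusion matrix directly. The sketch below assumes the layout returned by `sklearn.metrics.confusion_matrix` (rows are true labels, columns are predictions) with index 0 for benign and index 1 for malignant. The helper `binary_rates` and the counts in `example_cnf` are made up for illustration and are not results from this notebook.

```python
import numpy as np

def binary_rates(cnf):
    """Return (sensitivity, specificity, accuracy) for a 2x2 confusion matrix.

    Rows are true labels and columns are predicted labels;
    index 0 = benign (negative), index 1 = malignant (positive).
    """
    cnf = np.asarray(cnf, dtype=float)
    tn, fp = cnf[0]  # true benign: predicted benign / predicted malignant
    fn, tp = cnf[1]  # true malignant: predicted benign / predicted malignant
    sensitivity = tp / (tp + fn)  # share of malignant images that were detected
    specificity = tn / (tn + fp)  # share of benign images that were cleared
    accuracy = (tp + tn) / cnf.sum()
    return sensitivity, specificity, accuracy

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example_cnf = [[40, 10],
               [5, 45]]
sens, spec, acc = binary_rates(example_cnf)
print(sens, spec, acc)
```

With the notebook's own matrix, the call would be `binary_rates(cnf)`.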

In [0]:
#to export model as png
#keras.utils.plot_model(model, to_file='nasnet.png', show_shapes=True, show_layer_names=True)