# Transfer Learning (EfficientNetB0)

Instead of extracting high-level features, one can use pretrained models directly for classification by adding a few output layers. It is also possible to fine-tune such a model, though this is very resource-intensive.

Here I use the original images, because data augmentation is done by two added input layers: one for random horizontal flipping and one for random rotation.
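
As a rough illustration of what these two augmentation steps mean, the sketch below mirrors a tiny nested-list "image" and converts a rotation factor into degrees. It is plain Python, not Keras code: `flip_horizontal` and `max_rotation_degrees` are hypothetical helper names, and `0.1` is the factor passed to `RandomRotation` later in this notebook. In Keras, this factor is a fraction of a full turn (2π), and the layer picks a random angle between minus and plus that maximum.

```python
def flip_horizontal(image):
    """Mirror an image (given as a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]


def max_rotation_degrees(factor):
    """Convert a rotation factor (fraction of a full turn) into degrees."""
    return factor * 360.0


tiny_image = [[1, 2, 3],
              [4, 5, 6]]
print(flip_horizontal(tiny_image))   # each row is reversed
print(max_rotation_degrees(0.1))     # largest rotation angle in either direction
```

A factor of 0.1 therefore allows rotations of up to 36 degrees clockwise or counter-clockwise.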

## 1. Models used for high-level feature extraction

 **Model**         | **Size (MB)** | **Top-1 Accuracy** | **Top-5 Accuracy** | **Parameters** | **Depth** | **Time (ms) per inference step (CPU)** | **Time (ms) per inference step (GPU)** 
------------------:|--------------:|-------------------:|-------------------:|---------------:|----------:|---------------------------------------:|---------------------------------------:
 InceptionV3       | 92            | 0.779              | 0.937              | 23,851,784     | 159       | 42.25                                  | 6.86                                   
 *EfficientNetB0*    | 29            | -                  | -                  | 5,330,571      | -         | 46                                     | 4.91                                   
 ResNet50          | 98            | 0.749              | 0.921              | 25,636,712     | -         | 58.2                                   | 4.55                                   
 VGG16             | 528           | 0.713              | 0.901              | 138,357,544    | 23        | 69.5                                   | 4.16                                   
 DenseNet121       | 33            | 0.75               | 0.923              | 8,062,504      | 121       | 77.14                                  | 5.38                                   
 Xception          | 88            | 0.79               | 0.945              | 22,910,480     | 126       | 109.42                                 | 8.06                                   
 InceptionResNetV2 | 215           | 0.803              | 0.953              | 55,873,736     | 572       | 130.19                                 | 10.02                                  


> Data source: https://keras.io/api/applications/#available-models  
> Table converter: https://tableconvert.com/excel-to-markdown

For transfer learning I will use EfficientNetB0, as it is the most lightweight of the models listed above (smallest size, fewest parameters).
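
The choice can be double-checked with a few lines of plain Python. This is only a sketch: the tuples are copied by hand from the size and parameter columns of the table above, and the variable names are made up for this example.

```python
# (name, size in MB, number of parameters), copied from the table above
models = [
    ("InceptionV3", 92, 23_851_784),
    ("EfficientNetB0", 29, 5_330_571),
    ("ResNet50", 98, 25_636_712),
    ("VGG16", 528, 138_357_544),
    ("DenseNet121", 33, 8_062_504),
    ("Xception", 88, 22_910_480),
    ("InceptionResNetV2", 215, 55_873_736),
]

# Pick the model with the smallest value in each column
smallest_by_size = min(models, key=lambda m: m[1])[0]
fewest_parameters = min(models, key=lambda m: m[2])[0]
print(smallest_by_size, fewest_parameters)
```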

## 2. Import packages

In [1]:
import numpy as np
import os
import pandas as pd
from tensorflow.keras.preprocessing.image import ImageDataGenerator

## 3. Structure of `data/split` directory

```
data/split
└── 40X
    ├── test
    │   ├── adenosis
    │   ├── ductal_carcinoma
    │   ├── fibroadenoma
    │   ├── lobular_carcinoma
    │   ├── mucinous_carcinoma
    │   ├── papillary_carcinoma
    │   ├── phyllodes_tumor
    │   └── tubular_adenoma
    ├── train
    │   ├── adenosis
    │   ├── ductal_carcinoma
    │   ├── fibroadenoma
    │   ├── lobular_carcinoma
    │   ├── mucinous_carcinoma
    │   ├── papillary_carcinoma
    │   ├── phyllodes_tumor
    │   └── tubular_adenoma
    └── val
        ├── adenosis
        ├── ductal_carcinoma
        ├── fibroadenoma
        ├── lobular_carcinoma
        ├── mucinous_carcinoma
        ├── papillary_carcinoma
        ├── phyllodes_tumor
        └── tubular_adenoma
```

## 4. Define image data generator

In [2]:
# data generator for train
image_generator_train = ImageDataGenerator()

In [3]:
# data generator for validation and test
image_generator_valtest = ImageDataGenerator()

In [4]:
# train
image40Xtrain = image_generator_train.flow_from_directory(
    os.path.join('data','split','40X','train'),
    batch_size=4, # very small batch size to preserve RAM
    target_size=(460, 700),
    class_mode = 'sparse',
    shuffle=True
)

Found 1594 images belonging to 8 classes.


In [5]:
# validation
image40Xval = image_generator_valtest.flow_from_directory(
    os.path.join('data','split','40X','val'),
    batch_size=4, 
    target_size=(460, 700),
    class_mode = 'sparse',
    shuffle=True
)

Found 195 images belonging to 8 classes.


In [6]:
# test
image40Xtest = image_generator_valtest.flow_from_directory(
    os.path.join('data','split','40X','test'),
    batch_size=4, 
    target_size=(460, 700),
    class_mode = 'sparse',
    shuffle=True
)

Found 206 images belonging to 8 classes.


#### Print shape of images and labels

In [7]:
imgs, labels = image40Xtrain.next()
print('Images:', imgs.shape)
print('Labels:', labels.shape)

Images: (4, 460, 700, 3)
Labels: (4,)
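
The batch shape gives a feel for why a small batch size helps to preserve RAM. The following back-of-the-envelope sketch assumes 4-byte `float32` pixel values (the default dtype of `ImageDataGenerator`) and counts only the raw input batch, not the activations inside the network. `batch_megabytes` and `steps_per_epoch` are hypothetical helper names.

```python
import math


def batch_megabytes(batch_size, height, width, channels, bytes_per_value=4):
    """Memory of one raw input batch in MB (1 MB = 10**6 bytes)."""
    return batch_size * height * width * channels * bytes_per_value / 1e6


def steps_per_epoch(n_images, batch_size):
    """Number of batches needed to see every image once."""
    return math.ceil(n_images / batch_size)


print(batch_megabytes(4, 460, 700, 3))   # one batch of four 460x700 RGB images
print(steps_per_epoch(1594, 4))          # training set of 1594 images
```

One batch of four images takes roughly 15 MB, and a full pass over the 1594 training images needs 399 such batches.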


#### Print range of pixel values

In [8]:
print('lowest pixel value:',np.min(imgs), '\nhighest pixel value:', np.max(imgs))

lowest pixel value: 38.0 
highest pixel value: 255.0


The pixel values lie in the range between 0 and 255, i.e. the images have not been rescaled.

## 5. Number of images per class for magnification 40x

Check the number of images for each class in each set (train, validation, test).

In [9]:
nperclass = []
for imgset, imgset_title in zip([image40Xtrain, image40Xval, image40Xtest], ['train','val','test']):
    #print('\n', imgset_title)
    for i in range(8):
        lb = list(imgset.class_indices)[i]
        sumclass = sum(imgset.labels==i)
        #print(lb, '\n   n:', sumclass, '\n   percentage:', '{:.2%}'.format(sumclass/imgset.n))
        nperclass.append({
            'set': imgset_title,
            'class': lb,
            'sumclass': sumclass,
            'proportion': '{:.2%}'.format(sumclass/imgset.n)
        })
    #print(imgset.n,': Total images with magnitude 40x',)

In [10]:
ncl = pd.DataFrame(nperclass)
# set_index returns a new frame, which avoids modifying a slice of ncl in place
tr_df = ncl.iloc[:8].set_index('class')
val_df = ncl.iloc[8:16].set_index('class')
te_df = ncl.iloc[16:24].set_index('class')

In [11]:
pd.concat([tr_df, val_df, te_df], axis=1)

 **class**           | **set** | **sumclass** | **proportion** | **set** | **sumclass** | **proportion** | **set** | **sumclass** | **proportion**
:--------------------|--------:|-------------:|---------------:|--------:|-------------:|---------------:|--------:|-------------:|---------------:
 adenosis            | train   | 91           | 5.71%          | val     | 11           | 5.64%          | test    | 12           | 5.83%
 ductal_carcinoma    | train   | 691          | 43.35%         | val     | 86           | 44.10%         | test    | 87           | 42.23%
 fibroadenoma        | train   | 202          | 12.67%         | val     | 25           | 12.82%         | test    | 26           | 12.62%
 lobular_carcinoma   | train   | 124          | 7.78%          | val     | 15           | 7.69%          | test    | 17           | 8.25%
 mucinous_carcinoma  | train   | 164          | 10.29%         | val     | 20           | 10.26%         | test    | 21           | 10.19%
 papillary_carcinoma | train   | 116          | 7.28%          | val     | 14           | 7.18%          | test    | 15           | 7.28%
 phyllodes_tumor     | train   | 87           | 5.46%          | val     | 10           | 5.13%          | test    | 12           | 5.83%
 tubular_adenoma     | train   | 119          | 7.47%          | val     | 14           | 7.18%          | test    | 16           | 7.77%


As we can see, the proportion of each class was preserved when splitting the data into training, validation, and test sets.
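
To put a number on "preserved", the sketch below recomputes the class shares from the counts in the table and reports the largest gap between any two sets. The counts are copied by hand from the table above; `max_proportion_gap` is a hypothetical helper written for this illustration.

```python
# Class counts copied from the table above, in the order (train, val, test)
counts = {
    "adenosis": (91, 11, 12),
    "ductal_carcinoma": (691, 86, 87),
    "fibroadenoma": (202, 25, 26),
    "lobular_carcinoma": (124, 15, 17),
    "mucinous_carcinoma": (164, 20, 21),
    "papillary_carcinoma": (116, 14, 15),
    "phyllodes_tumor": (87, 10, 12),
    "tubular_adenoma": (119, 14, 16),
}

# Total number of images per set
totals = [sum(c[i] for c in counts.values()) for i in range(3)]


def max_proportion_gap(counts, totals):
    """Largest difference (in percentage points) between two sets for any class."""
    gaps = []
    for per_set in counts.values():
        shares = [100 * n / total for n, total in zip(per_set, totals)]
        gaps.append(max(shares) - min(shares))
    return max(gaps)


print(totals)
print(round(max_proportion_gap(counts, totals), 2))
```

For these counts, no class share differs by more than two percentage points between the three sets.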

## 6. Transfer learning using EfficientNetB0

> **The typical transfer-learning workflow**

> 1. Instantiate a base model and load pre-trained weights into it.
> 1. Freeze all layers in the base model by setting trainable = False.
> 1. Create a new model on top of the output of one (or several) layers from the base model.
> 1. Train your new model on your new dataset.

> See [The typical transfer-learning workflow](https://keras.io/guides/transfer_learning/#the-typical-transferlearning-workflow)

#### Import packages

In [29]:
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.efficientnet import EfficientNetB0
from tensorflow.keras.applications.efficientnet import preprocess_input

The following workflow is adapted from [An end-to-end example: fine-tuning an image classification model on a cats vs. dogs dataset](https://keras.io/guides/transfer_learning/#an-endtoend-example-finetuning-an-image-classification-model-on-a-cats-vs-dogs-dataset).

#### 1. Instantiate a base model and load pre-trained weights into it

In [13]:
# Random data augmentation
data_augmentation = keras.Sequential(
    [layers.RandomFlip("horizontal"), layers.RandomRotation(0.1),]
)



In [14]:
# base model is EfficientNetB0
base_model = keras.applications.EfficientNetB0(
    weights="imagenet",  # Load weights pre-trained on ImageNet.
    input_shape=(460, 700, 3),
    include_top=False,
)  # Do not include the ImageNet classifier at the top.

#### 2. Freeze all layers in the base model by setting `trainable = False`

In [15]:
# Freeze the base_model
base_model.trainable = False

#### 3. Create a new model on top of the output of one (or several) layers from the base model

In [16]:
# Create new model on top
inputs = keras.Input(shape=(460, 700, 3))
x = data_augmentation(inputs)  # Apply random data augmentation
#x = inputs

In [17]:
# Pre-trained EfficientNetB0 weights require inputs in the range (0, 255),
# because the Keras EfficientNet models do the rescaling inside the model.
# Therefore skip the following lines:

# Pre-trained Xception weights require that the input be scaled
# from (0, 255) to a range of (-1., +1.); the rescaling layer
# outputs: `(inputs * scale) + offset`
#scale_layer = keras.layers.Rescaling(scale=1 / 127.5, offset=-1)
#x = scale_layer(x)

In [18]:
# The base model contains batchnorm layers. We want to keep them in inference mode
# when we unfreeze the base model for fine-tuning, so we make sure that the
# base_model is running in inference mode here.

# inference mode: _using_ the model (for prediction), in contrast to _train_ a model
# batchnorm layer: 
#   Batch normalization applies a transformation that maintains the mean output close to 0 
#   and the output standard deviation close to 1.
#   Importantly, batch normalization works differently during training and during inference.
#   During training (training=True), the layer normalizes its output using the mean and standard deviation 
#      of the current batch of inputs. 
#   During inference (training=False), the layer normalizes its output using a moving average of the mean and standard deviation 
#      of the batches it has seen during training. 
#   See https://keras.io/api/layers/normalization_layers/batch_normalization/

x = base_model(x, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
x = keras.layers.Dropout(0.2)(x)  # Regularize with dropout
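# softmax yields class probabilities, as expected by the loss used in model.compile
outputs = keras.layers.Dense(8, activation='softmax')(x)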
model = keras.Model(inputs, outputs)

model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         [(None, 460, 700, 3)]     0         
_________________________________________________________________
sequential (Sequential)      (None, 460, 700, 3)       0         
_________________________________________________________________
efficientnetb0 (Functional)  (None, 15, 22, 1280)      4049571   
_________________________________________________________________
global_average_pooling2d (Gl (None, 1280)              0         
_________________________________________________________________
dropout (Dropout)            (None, 1280)              0         
_________________________________________________________________
dense (Dense)                (None, 8)                 10248     
Total params: 4,059,819
Trainable params: 10,248
Non-trainable params: 4,049,571
______________________________________________
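
The difference between training and inference mode described in the comments of the cell above can be sketched in plain Python. This is a simplified illustration, not the Keras implementation: it handles a single feature and omits the learned scale and offset. The helper names `batchnorm_train` and `batchnorm_inference`, as well as the momentum and epsilon values, are assumptions chosen for this example.

```python
import math

EPSILON = 1e-3  # small constant that avoids division by zero


def batchnorm_train(batch, moving_mean, moving_var, momentum=0.99):
    """Normalize with the current batch statistics and update the moving averages."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    normalized = [(v - mean) / math.sqrt(var + EPSILON) for v in batch]
    new_mean = momentum * moving_mean + (1 - momentum) * mean
    new_var = momentum * moving_var + (1 - momentum) * var
    return normalized, new_mean, new_var


def batchnorm_inference(batch, moving_mean, moving_var):
    """Normalize with the stored moving averages; nothing is updated."""
    return [(v - moving_mean) / math.sqrt(moving_var + EPSILON) for v in batch]


batch = [2.0, 4.0, 6.0, 8.0]
train_out, new_mean, new_var = batchnorm_train(batch, moving_mean=0.0, moving_var=1.0)
infer_out = batchnorm_inference(batch, moving_mean=0.0, moving_var=1.0)
print(train_out)   # centered around 0, because the batch mean was subtracted
print(infer_out)   # depends only on the stored statistics, not on the batch
```

Calling the base model with `training=False` corresponds to the second function: the stored statistics are used and left untouched.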

#### 4. Train your new model on your new dataset

In [19]:
model.compile(
    optimizer=keras.optimizers.Adam(),
    #optimizer='sgd',
    loss='sparse_categorical_crossentropy',
    metrics=['acc']
)
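
With `class_mode='sparse'` the labels are integer class indices, which is the label format `sparse_categorical_crossentropy` expects. When the loss is given as a string, it keeps its default `from_logits=False` and therefore treats the model output as class probabilities. The sketch below shows the computation for a single sample in plain Python; `softmax` and `sparse_cce` are hypothetical helper names, not Keras functions.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = [z - max(logits) for z in logits]  # shift for numerical stability
    exps = [math.exp(z) for z in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_cce(probabilities, label):
    """Cross-entropy for one sample with an integer class label."""
    return -math.log(probabilities[label])


logits = [0.0] * 8                   # a model that prefers none of the 8 classes
probs = softmax(logits)
print(sparse_cce(probs, label=1))    # equals log(8) for a uniform prediction
```

A loss close to log(8) ≈ 2.08 at the start of training thus indicates that the model is still guessing uniformly among the eight classes.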

In [20]:
# End training when the validation loss stops improving (optional)
early_stopping = keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
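
The idea behind `patience` can be sketched without Keras. The function below is a simplified model of the stopping rule, written under the assumption that any decrease of the monitored loss counts as an improvement; `stopped_epoch` and the loss values are made up for this illustration.

```python
def stopped_epoch(val_losses, patience):
    """Return the 1-based epoch at which training stops, or None if it runs to the end."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


losses = [1.9, 1.5, 1.4, 1.45, 1.41, 1.43, 1.42, 1.44, 1.2]
print(stopped_epoch(losses, patience=5))      # stops before the last value is reached
print(stopped_epoch(losses[:2], patience=5))  # too few epochs for patience to matter
```

The second call illustrates that a run of only two epochs can never be cut short with `patience=5`, so the callback only matters for the longer runs with 20 epochs.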

In [21]:
# Train model with a subsample
image40Xtrain.reset()
for i in range(10):
    print('batch number:',i)
    train_imgs, train_lbs = image40Xtrain.next()
    val_imgs, val_lbs = image40Xval.next()
    epochs = 2 #20
    model.fit(
        x=train_imgs, 
        y=train_lbs, 
        epochs=epochs, 
        validation_data=(val_imgs, val_lbs),
        callbacks=[early_stopping]
    )

batch number: 0




Epoch 1/2
Epoch 2/2
batch number: 1
Epoch 1/2
Epoch 2/2
batch number: 2
Epoch 1/2
Epoch 2/2
batch number: 3
Epoch 1/2
Epoch 2/2
batch number: 4
Epoch 1/2
Epoch 2/2
batch number: 5
Epoch 1/2
Epoch 2/2
batch number: 6
Epoch 1/2
Epoch 2/2
batch number: 7
Epoch 1/2
Epoch 2/2
batch number: 8
Epoch 1/2
Epoch 2/2
batch number: 9
Epoch 1/2
Epoch 2/2


In [28]:
# Train model with whole sample
if False:
    epochs = 20
    history = model.fit(
        x=image40Xtrain, 
        validation_data=image40Xval, 
        epochs=epochs, 
        callbacks=[early_stopping]
    )

#### 5. (Additional step) Do a round of fine-tuning of the entire model

In [23]:
# Unfreeze the base_model. Note that it keeps running in inference mode
# since we passed `training=False` when calling it. This means that
# the batchnorm layers will not update their batch statistics.
# This prevents the batchnorm layers from undoing all the training
# we've done so far.
base_model.trainable = True
model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         [(None, 460, 700, 3)]     0         
_________________________________________________________________
sequential (Sequential)      (None, 460, 700, 3)       0         
_________________________________________________________________
efficientnetb0 (Functional)  (None, 15, 22, 1280)      4049571   
_________________________________________________________________
global_average_pooling2d (Gl (None, 1280)              0         
_________________________________________________________________
dropout (Dropout)            (None, 1280)              0         
_________________________________________________________________
dense (Dense)                (None, 8)                 10248     
Total params: 4,059,819
Trainable params: 4,017,796
Non-trainable params: 42,023
______________________________________________

Note the number of trainable parameters compared to the model above: after unfreezing the base model it rose from 10,248 to 4,017,796.
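
The numbers in the two summaries can be reproduced with simple arithmetic. The values below are copied from the summaries; the downsampling factor of 32 used for the feature-map size is an assumption about EfficientNetB0 that happens to match the reported output shape `(None, 15, 22, 1280)`.

```python
import math

# Values copied from the two model summaries above
feature_channels = 1280
n_classes = 8
base_params = 4_049_571
total_params = 4_059_819
trainable_after_unfreezing = 4_017_796

# Dense layer: one weight per (input feature, class) pair plus one bias per class
head_params = feature_channels * n_classes + n_classes
print(head_params)

# Parameters that stay non-trainable even after unfreezing the base model
non_trainable = total_params - trainable_after_unfreezing
print(non_trainable)

# Spatial size of the feature map, assuming an overall downsampling factor of 32
print(math.ceil(460 / 32), math.ceil(700 / 32))
```

The 42,023 parameters that remain non-trainable are presumably the statistics of the normalization layers (such as the moving mean and variance of batch normalization), which are not updated by gradient descent.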

In [24]:
model.compile(
    optimizer=keras.optimizers.Adam(1e-5),  # Low learning rate
    loss='sparse_categorical_crossentropy',
    metrics=['acc']
)

In [25]:
epochs = 2 #10
model.fit(
    x=train_imgs, 
    y=train_lbs, 
    epochs=epochs, 
    validation_data=(val_imgs, val_lbs),
    callbacks=[early_stopping]
)

Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x17c8b30a0>

In [27]:
# Train model with whole sample
if False:
    epochs = 20
    history = model.fit(
        x=image40Xtrain, 
        validation_data=image40Xval, 
        epochs=epochs, 
        callbacks=[early_stopping]
    )