# Exercise 1
Q: When does transfer learning make sense?

* Many low-level features, such as edge or curve detectors, can be learned from large image recognition datasets.
    * These low-level features should also be helpful for the problem you are transferring to.
* Knowledge of the structure and nature of images can be useful for a radiology diagnosis dataset: because the network already knows basic image features, it needs less task-specific data to train on.
* Transfer learning makes sense when you have a lot of data for the problem you are transferring from and relatively little data for the problem you are transferring to.
* Both tasks should have the same type of input x (for example, images).

Q: Does it make sense to do transfer learning from ImageNet to the Patch-CAMELYON dataset?

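Yes. All the reasons listed above apply to the transfer from ImageNet to Patch-CAMELYON: both tasks take images as input, ImageNet is a very large dataset, and the low-level features learned on it (edges, curves) should also be useful for the Patch-CAMELYON images.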

# Exercise 2
Run the example in transfer.py. Then, modify the code so that the MobileNetV2 model is not initialized from the ImageNet weights, but randomly (you can do that by setting the weights parameter to None). 

### Run with imagenet weights

In [5]:
# disable overly verbose tensorflow logging
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # or any {'0', '1', '2'}   
import tensorflow as tf


import numpy as np

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, GlobalAveragePooling2D, Dropout
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2, preprocess_input

In [6]:
def get_pcam_generators(base_dir, train_batch_size=32, val_batch_size=32):

    # dataset parameters
    train_path = os.path.join(base_dir, 'train+val', 'train')
    valid_path = os.path.join(base_dir, 'train+val', 'valid')

    # instantiate data generators
    datagen = ImageDataGenerator(preprocessing_function=preprocess_input)

    train_gen = datagen.flow_from_directory(train_path,
                                            target_size=(IMAGE_SIZE, IMAGE_SIZE),
                                            batch_size=train_batch_size,
                                            class_mode='binary')

    val_gen = datagen.flow_from_directory(valid_path,
                                          target_size=(IMAGE_SIZE, IMAGE_SIZE),
                                          batch_size=val_batch_size,
                                          class_mode='binary')

    return train_gen, val_gen

In [7]:
# the size of the images in the PCAM dataset
IMAGE_SIZE = 96

input_shape = (IMAGE_SIZE, IMAGE_SIZE, 3)


input = Input(input_shape)

In [8]:
# get the pretrained model, cut out the top layer
pretrained = MobileNetV2(input_shape=input_shape, include_top=False, weights='imagenet')

# if the pretrained model is to be used as a feature extractor, and not for
# fine-tuning, the weights of the model can be frozen in the following way
# for layer in pretrained.layers:
#    layer.trainable = False

output = pretrained(input)
output = GlobalAveragePooling2D()(output)
output = Dropout(0.5)(output)
output = Dense(1, activation='sigmoid')(output)

model = Model(input, output)

# note the lower lr compared to the cnn example
model.compile(SGD(learning_rate=0.001, momentum=0.95), loss = 'binary_crossentropy', metrics=['accuracy'])

# print a summary of the model on screen
model.summary()

Model: "model_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_4 (InputLayer)        [(None, 96, 96, 3)]       0         
                                                                 
 mobilenetv2_1.00_96 (Functi  (None, 3, 3, 1280)       2257984   
 onal)                                                           
                                                                 
 global_average_pooling2d_2   (None, 1280)             0         
 (GlobalAveragePooling2D)                                        
                                                                 
 dropout_2 (Dropout)         (None, 1280)              0         
                                                                 
 dense_2 (Dense)             (None, 1)                 1281      
                                                                 
Total params: 2,259,265
Trainable params: 2,225,153
Non-tra
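
The parameter counts in the summary can be checked by hand. The sketch below is plain Python arithmetic and not part of transfer.py; the variable names are chosen here for illustration, and the backbone count is copied from the summary rather than derived.

```python
# Parameter bookkeeping for the classification head (illustrative names).
backbone_params = 2_257_984   # MobileNetV2 without top, copied from the summary
features = 1280               # channels left after GlobalAveragePooling2D

# GlobalAveragePooling2D and Dropout have no weights.
# Dense(1) has one weight per input feature plus one bias.
dense_params = features * 1 + 1
total_params = backbone_params + dense_params

print(dense_params)   # 1281
print(total_params)   # 2259265
```

Both values match the `dense_2` row and the "Total params" line of the summary.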

In [9]:
# get the data generators
train_gen, val_gen = get_pcam_generators(r'C:\Users\20192236\Documents\Project_Imaging')


# save the model and weights
model_name = 'my_first_transfer_model'
model_filepath = model_name + '.json'
weights_filepath = model_name + '_weights.hdf5'

model_json = model.to_json() # serialize model to JSON
with open(model_filepath, 'w') as json_file:
    json_file.write(model_json)


# define the model checkpoint and Tensorboard callbacks
checkpoint = ModelCheckpoint(weights_filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='min')
tensorboard = TensorBoard(os.path.join('logs', model_name))
callbacks_list = [checkpoint, tensorboard]


# train the model, note that we define "mini-epochs"
train_steps = train_gen.n//train_gen.batch_size//20
val_steps = val_gen.n//val_gen.batch_size//20

# the model is trained for only 10 "mini-epochs" of 1/20 of the data each,
# i.e. half of the data is not used during training
history = model.fit(train_gen, steps_per_epoch=train_steps,
                    validation_data=val_gen,
                    validation_steps=val_steps,
                    epochs=10,
                    callbacks=callbacks_list)

Found 144000 images belonging to 2 classes.
Found 16000 images belonging to 2 classes.
Epoch 1/10
Epoch 1: val_loss improved from inf to 2.36042, saving model to my_first_transfer_model_weights.hdf5
Epoch 2/10
Epoch 2: val_loss did not improve from 2.36042
Epoch 3/10
Epoch 3: val_loss improved from 2.36042 to 0.99772, saving model to my_first_transfer_model_weights.hdf5
Epoch 4/10
Epoch 4: val_loss improved from 0.99772 to 0.40459, saving model to my_first_transfer_model_weights.hdf5
Epoch 5/10
Epoch 5: val_loss did not improve from 0.40459
Epoch 6/10
Epoch 6: val_loss did not improve from 0.40459
Epoch 7/10
Epoch 7: val_loss did not improve from 0.40459
Epoch 8/10
Epoch 8: val_loss did not improve from 0.40459
Epoch 9/10
Epoch 9: val_loss improved from 0.40459 to 0.33520, saving model to my_first_transfer_model_weights.hdf5
Epoch 10/10
Epoch 10: val_loss improved from 0.33520 to 0.27567, saving model to my_first_transfer_model_weights.hdf5
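
The "half of the data" remark in the training cell follows from the step arithmetic. A minimal sketch in plain Python, using the image counts printed by `flow_from_directory` above; it counts image draws only, and whether all of those draws are distinct images depends on how the generator shuffles between epochs, which is not checked here.

```python
# Mini-epoch arithmetic, using the image counts reported in the output above.
n_train, n_val = 144_000, 16_000
batch_size = 32
epochs = 10

train_steps = n_train // batch_size // 20   # same expression as in the training cell
val_steps = n_val // batch_size // 20

images_seen = train_steps * batch_size * epochs
fraction_seen = images_seen / n_train

print(train_steps, val_steps)   # 225 25
print(images_seen)              # 72000
print(fraction_seen)            # 0.5
```

Each mini-epoch therefore covers 1/20 of the training set, and 10 of them add up to half of it.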


### Run without imagenet weights

In [10]:
# disable overly verbose tensorflow logging
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # or any {'0', '1', '2'}   
import tensorflow as tf


import numpy as np

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, GlobalAveragePooling2D, Dropout
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2, preprocess_input

In [11]:
def get_pcam_generators(base_dir, train_batch_size=32, val_batch_size=32):

    # dataset parameters
    train_path = os.path.join(base_dir, 'train+val', 'train')
    valid_path = os.path.join(base_dir, 'train+val', 'valid')

    # instantiate data generators
    datagen = ImageDataGenerator(preprocessing_function=preprocess_input)

    train_gen = datagen.flow_from_directory(train_path,
                                            target_size=(IMAGE_SIZE, IMAGE_SIZE),
                                            batch_size=train_batch_size,
                                            class_mode='binary')

    val_gen = datagen.flow_from_directory(valid_path,
                                          target_size=(IMAGE_SIZE, IMAGE_SIZE),
                                          batch_size=val_batch_size,
                                          class_mode='binary')

    return train_gen, val_gen

In [12]:
# the size of the images in the PCAM dataset
IMAGE_SIZE = 96

input_shape = (IMAGE_SIZE, IMAGE_SIZE, 3)


input = Input(input_shape)

In [18]:
# get the MobileNetV2 architecture with random initialization (weights=None), cut out the top layer
pretrained = MobileNetV2(input_shape=input_shape, include_top=False, weights=None)

# if the pretrained model is to be used as a feature extractor, and not for
# fine-tuning, the weights of the model can be frozen in the following way
# for layer in pretrained.layers:
#    layer.trainable = False

output = pretrained(input)
output = GlobalAveragePooling2D()(output)
output = Dropout(0.5)(output)
output = Dense(1, activation='sigmoid')(output)

model = Model(input, output)

# note the lower lr compared to the cnn example
model.compile(SGD(learning_rate=0.001, momentum=0.95), loss = 'binary_crossentropy', metrics=['accuracy'])

# print a summary of the model on screen
model.summary()

Model: "model_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_6 (InputLayer)        [(None, 96, 96, 3)]       0         
                                                                 
 mobilenetv2_1.00_96 (Functi  (None, 3, 3, 1280)       2257984   
 onal)                                                           
                                                                 
 global_average_pooling2d_3   (None, 1280)             0         
 (GlobalAveragePooling2D)                                        
                                                                 
 dropout_3 (Dropout)         (None, 1280)              0         
                                                                 
 dense_3 (Dense)             (None, 1)                 1281      
                                                                 
Total params: 2,259,265
Trainable params: 2,225,153
Non-tra

In [19]:
# get the data generators
train_gen, val_gen = get_pcam_generators(r'C:\Users\20192236\Documents\Project_Imaging')


# save the model and weights
model_name = 'Weights_None_transfer_model'
model_filepath = model_name + '.json'
weights_filepath = model_name + '_weights.hdf5'

model_json = model.to_json() # serialize model to JSON
with open(model_filepath, 'w') as json_file:
    json_file.write(model_json)


# define the model checkpoint and Tensorboard callbacks
checkpoint = ModelCheckpoint(weights_filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='min')
tensorboard = TensorBoard(os.path.join('logs', model_name))
callbacks_list = [checkpoint, tensorboard]


# train the model, note that we define "mini-epochs"
train_steps = train_gen.n//train_gen.batch_size//20
val_steps = val_gen.n//val_gen.batch_size//20

# the model is trained for only 10 "mini-epochs" of 1/20 of the data each,
# i.e. half of the data is not used during training
history = model.fit(train_gen, steps_per_epoch=train_steps,
                    validation_data=val_gen,
                    validation_steps=val_steps,
                    epochs=10,
                    callbacks=callbacks_list)

Found 144000 images belonging to 2 classes.
Found 16000 images belonging to 2 classes.
Epoch 1/10
Epoch 1: val_loss improved from inf to 0.69436, saving model to Weights_None_transfer_model_weights.hdf5
Epoch 2/10
Epoch 2: val_loss did not improve from 0.69436
Epoch 3/10
Epoch 3: val_loss improved from 0.69436 to 0.69311, saving model to Weights_None_transfer_model_weights.hdf5
Epoch 4/10
Epoch 4: val_loss did not improve from 0.69311
Epoch 5/10
Epoch 5: val_loss did not improve from 0.69311
Epoch 6/10
Epoch 6: val_loss improved from 0.69311 to 0.69303, saving model to Weights_None_transfer_model_weights.hdf5
Epoch 7/10
Epoch 7: val_loss did not improve from 0.69303
Epoch 8/10
Epoch 8: val_loss did not improve from 0.69303
Epoch 9/10
Epoch 9: val_loss did not improve from 0.69303
Epoch 10/10
Epoch 10: val_loss did not improve from 0.69303


Q: Analyze the results from both runs and compare them to the CNN example in assignment 3.

* Transfer model with ImageNet weights: val_loss: 0.2757 - val_accuracy: 0.9062
* Transfer model without ImageNet weights: val_loss: 0.6932 - val_accuracy: 0.5025
* CNN model from assignment 3: val_loss: 0.3280 - val_accuracy: 0.8565

The most accurate model is the transfer model with ImageNet weights, followed by the CNN model from assignment 3, and then the transfer model without ImageNet weights.

This is to be expected: the transfer model with ImageNet initialization was pretrained on a very large dataset and therefore already has useful low-level features. The model without the ImageNet weights starts from random weights and sees too little Patch-CAMELYON data in the 10 mini-epochs to converge to an accurate model: its validation accuracy of about 50% is chance level for a binary problem, and its validation loss barely moves from 0.693. The model from assignment 3 has a somewhat lower accuracy than the transfer model with ImageNet initialization. This could be the result of not having enough data to train on or of the difference in model structure.
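
The validation loss of the run without ImageNet weights, about 0.693, is the value binary cross-entropy takes when a model outputs 0.5 for every image, which equals ln 2. The sketch below checks this with a hand-written loss for a single sample; the function name is illustrative and this is not the Keras implementation.

```python
import math

def binary_crossentropy(y_true, p):
    """Binary cross-entropy for a single label y_true in {0, 1} and prediction p."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# A model that always outputs 0.5 gets the same loss for either label.
loss_pos = binary_crossentropy(1, 0.5)
loss_neg = binary_crossentropy(0, 0.5)

print(round(loss_pos, 4))  # 0.6931
print(round(loss_neg, 4))  # 0.6931
```

A loss that stays at this value together with an accuracy near 0.5 suggests the network has not yet learned anything beyond guessing.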


# Exercise 3
The model in transfer.py uses a dropout layer. How does dropout work and what is the effect of adding dropout layers to the network architecture? What is the observed effect when removing the dropout layer from this model? Hint: check out the Keras documentation for this layer.

 <b>not answered yet</b>