Training notebook for the deeptail task with data augmentation. If you haven't done so already, please first run the image preprocessing code in `train_no_aug.ipynb`.

In [1]:
import tensorflow
import keras
keras.__version__

Using TensorFlow backend.


'2.1.2'

## Getting started 
- Download data from: https://www.kaggle.com/c/whale-categorization-playground
- Rename train.csv to targets.csv
- Rename the train directory to kaggle_train
- Run image processing cells in `train_no_aug.ipynb`


### Using a pre-trained convolutional base


In [2]:
from keras.applications import Xception
image_size = (180,180) #adjustable parameter for processed image_size. Run time should 
classes_count = 4250 # There are 4250 classes, not including new_whale

conv_base = Xception(weights='imagenet',
                  include_top=False,
                  input_shape=(image_size[0], image_size[1], 3))

In [11]:
conv_base.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_2 (InputLayer)            (None, 180, 180, 3)  0                                            
__________________________________________________________________________________________________
block1_conv1 (Conv2D)           (None, 89, 89, 32)   864         input_2[0][0]                    
__________________________________________________________________________________________________
block1_conv1_bn (BatchNormaliza (None, 89, 89, 32)   128         block1_conv1[0][0]               
__________________________________________________________________________________________________
block1_conv1_act (Activation)   (None, 89, 89, 32)   0           block1_conv1_bn[0][0]            
__________________________________________________________________________________________________
block1_con

The final feature map has shape `(6, 6, 2048)`. That's the feature on top of which we will stick a densely-connected classifier.
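The 6x6 spatial size can be checked by hand. The sketch below traces the input size through the layers of Xception that change it. It assumes the layout of the Keras implementation: two unpadded 3x3 convolutions in block 1 (the first with stride 2), followed by four stride-2 max-pooling layers with 'same' padding in blocks 2, 3, 4 and 13. The helper names are illustrative and not part of Keras.

```python
import math


def valid_out(size, kernel, stride):
    """Output size of a convolution with 'valid' (no) padding."""
    return (size - kernel) // stride + 1


def same_out(size, stride):
    """Output size of a pooling or convolution layer with 'same' padding."""
    return math.ceil(size / stride)


def xception_feature_size(input_size):
    """Spatial size of the final feature map (assumed layer layout, see text)."""
    size = valid_out(input_size, kernel=3, stride=2)  # block1_conv1
    size = valid_out(size, kernel=3, stride=1)        # block1_conv2
    for _ in range(4):                                # pooling in blocks 2, 3, 4, 13
        size = same_out(size, stride=2)
    return size


print(valid_out(180, kernel=3, stride=2))  # 89, matches block1_conv1 in the summary
print(xception_feature_size(180))          # 6
```

Changing `image_size` changes this value, and with it the size of the `Flatten` output that feeds the classifier.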


In [3]:
from keras import models
from keras import layers
from keras import optimizers

model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(classes_count, activation='softmax')) 

In [4]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
xception (Model)             (None, 6, 6, 2048)        20861480  
_________________________________________________________________
flatten_1 (Flatten)          (None, 73728)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               9437312   
_________________________________________________________________
dense_2 (Dense)              (None, 4250)              548250    
Total params: 30,847,042
Trainable params: 30,792,514
Non-trainable params: 54,528
_________________________________________________________________
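As a sanity check, the parameter counts in this summary can be reproduced by hand. The sketch below assumes standard fully connected layers (one weight per input/output pair plus one bias per output); `dense_params` is an illustrative helper, not a Keras function.

```python
def dense_params(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


flat_size = 6 * 6 * 2048                  # Flatten output: 73728
dense_1 = dense_params(flat_size, 128)    # 9437312
dense_2 = dense_params(128, 4250)         # 548250
total = 20861480 + dense_1 + dense_2      # Xception base + classifier

print(flat_size, dense_1, dense_2, total)  # 73728 9437312 548250 30847042
```

Almost all of the classifier's parameters sit in the first `Dense` layer, because it is connected to every element of the flattened feature map.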


Before we compile and train our model, a very important thing to do is to freeze the convolutional base. "Freezing" a layer or set of layers means preventing their weights from getting updated during training. If we don't do this, then the representations that were previously learned by the convolutional base would get modified during training. Since the Dense layers on top are randomly initialized, very large weight updates would be propagated through the network, effectively destroying the representations previously learned.

In Keras, freezing a network is done by setting its `trainable` attribute to `False`:

In [6]:
conv_base.trainable = False
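The effect of freezing on the trainable parameter count can be worked out from the figures in the model summary above. This is plain arithmetic under the assumption that only the two `Dense` layers remain trainable; the variable names are illustrative.

```python
# Parameter counts copied from model.summary() above.
conv_base_params = 20861480
classifier_params = 9437312 + 548250
total_params = conv_base_params + classifier_params

# With conv_base frozen, only the two Dense layers are updated.
trainable_after = classifier_params
non_trainable_after = total_params - trainable_after

print(trainable_after)      # 9985562
print(non_trainable_after)  # 20861480
```

Keras picks up the `trainable` flag when the model is compiled, so the flag has to be set before `model.compile` is called, as it is in this notebook.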

In [9]:
from keras.preprocessing.image import ImageDataGenerator
import os

home_dir = os.getcwd()
train_dir = os.path.join(home_dir, 'train')
validation_dir = os.path.join(home_dir, 'validation')

train_image_count = 8709
validation_image_count = 331

batch_size = 128
# data augmentation settings
train_datagen = ImageDataGenerator(
      rescale=1./255,
      #rotation_range=20,
      shear_range=0.2,
      #zoom_range=0.2,
      horizontal_flip=True,
      fill_mode='nearest')

# Note that the validation data should not be augmented!
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        # This is the target directory
        train_dir,
        color_mode = 'rgb',
        # All images will be resized to image_size (180x180)
        target_size=image_size,
        batch_size=batch_size,
        class_mode='categorical')

validation_generator = test_datagen.flow_from_directory(
        validation_dir,
        color_mode = 'rgb',
        target_size=image_size,
        batch_size=batch_size,
        class_mode='categorical')

model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.Adam(lr=2e-4),
              metrics=['acc'])

# Despite its name, test_steps is the number of training steps per epoch.
test_steps = int(round(train_image_count/batch_size))
print('test steps: ' + str(test_steps))
validation_steps = int(round(validation_image_count/batch_size))
print('validation_steps: ' + str(validation_steps))

Found 8709 images belonging to 4250 classes.
Found 331 images belonging to 4250 classes.
test steps: 68
validation_steps: 3
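Rounding to the nearest integer gives 68 training steps, which at 128 images per step covers 8704 images, 5 fewer than the training set. A possible alternative, sketched below, rounds up so that one epoch covers every image at least once; `steps_for` is a hypothetical helper, not part of Keras.

```python
import math


def steps_for(image_count, batch_size):
    """Number of batches needed to see every image at least once."""
    return math.ceil(image_count / batch_size)


train_steps = steps_for(8709, 128)
val_steps = steps_for(331, 128)

print(train_steps)  # 69
print(val_steps)    # 3
```

For the validation set the two approaches agree, since 331 / 128 rounds to 3 either way.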


In [10]:
history = model.fit_generator(
      train_generator,
      steps_per_epoch=test_steps,
      epochs=10,
      validation_data=validation_generator,
      validation_steps=validation_steps)

Epoch 1/10
 5/68 [=>............................] - ETA: 22:36 - loss: 8.3711 - acc: 0.0000e+00

KeyboardInterrupt: 

In [None]:
model.save('weights/data_aug_no_fine_tune_0.h5')

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

#plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()

# Fine-Tuning the convolutional base

In [7]:
conv_base.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_2 (InputLayer)            (None, 180, 180, 3)  0                                            
__________________________________________________________________________________________________
block1_conv1 (Conv2D)           (None, 89, 89, 32)   864         input_2[0][0]                    
__________________________________________________________________________________________________
block1_conv1_bn (BatchNormaliza (None, 89, 89, 32)   128         block1_conv1[0][0]               
__________________________________________________________________________________________________
block1_conv1_act (Activation)   (None, 89, 89, 32)   0           block1_conv1_bn[0][0]            
__________________________________________________________________________________________________
block1_con

We will unfreeze the layers in blocks 12-14 and keep all earlier layers frozen.

In [9]:
conv_base.trainable = True

# Freeze every layer before block12_sepconv1_act;
# unfreeze that layer and every layer after it.
set_trainable = False
for layer in conv_base.layers:
    if layer.name == "block12_sepconv1_act":
        set_trainable = True
    layer.trainable = set_trainable
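The loop relies on the order of `conv_base.layers`: everything before the named layer stays frozen, and that layer and everything after it becomes trainable. The sketch below reproduces the same logic on plain strings so it can be checked without loading the network. The list is an abbreviated, illustrative subset of layer names, and `trainable_flags` is a hypothetical helper.

```python
def trainable_flags(layer_names, first_trainable):
    """Map each layer name to True once first_trainable has been reached."""
    flags = {}
    unfrozen = False
    for name in layer_names:
        if name == first_trainable:
            unfrozen = True
        flags[name] = unfrozen
    return flags


# Abbreviated, illustrative subset of layer names, in network order.
names = [
    "block1_conv1",
    "block11_sepconv3_bn",
    "block12_sepconv1_act",
    "block13_sepconv1",
    "block14_sepconv2_act",
]

flags = trainable_flags(names, "block12_sepconv1_act")
print([name for name, on in flags.items() if on])
# ['block12_sepconv1_act', 'block13_sepconv1', 'block14_sepconv2_act']
```

Note that if the name never matches, for example because of a typo, every layer stays frozen and no error is raised, so it is worth checking the flags before training.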


Now we can start fine-tuning our network. We will do this with the RMSprop optimizer, using a very low learning rate (`1e-5`). The reason for using a low learning rate is that we want to limit the magnitude of the modifications we make to the representations in the three blocks that we are fine-tuning. Updates that are too large may harm these representations.

Let's proceed with fine-tuning:

In [None]:
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-5),
              metrics=['acc'])

test_steps = int(round(train_image_count/batch_size))
print('test steps: ' + str(test_steps))
validation_steps = int(round(validation_image_count/batch_size))
print('validation_steps: ' + str(validation_steps))

history = model.fit_generator(
      train_generator,
      steps_per_epoch=test_steps,
      epochs=150,
      validation_data=validation_generator,
      validation_steps=validation_steps)

In [None]:
model.save('weights/data_aug_fine_tuned_0.h5')

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

#plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()