# Landmark recognition with deep learning
- Andrew J. Graves
- 04/05/21
- Run on Google Colab with GPUs

# Load Packages

In [1]:
# Import modules
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras

# Use the same seed everywhere to keep the training/validation/testing splits consistent
np.random.seed(49)
tf.random.set_seed(49)

# Import Data

In [2]:
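# Download the dataset from Firebase Storage; quote the URL so the shell does not treat `&` as a command separator
!wget -O "dataset.zip?alt=media" "https://firebasestorage.googleapis.com/v0/b/uva-landmark-images.appspot.com/o/dataset.zip?alt=media&token=e1403951-30d6-42b8-ba4e-394af1a2ddb7"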

--2021-04-05 00:51:51--  https://firebasestorage.googleapis.com/v0/b/uva-landmark-images.appspot.com/o/dataset.zip?alt=media
Resolving firebasestorage.googleapis.com (firebasestorage.googleapis.com)... 74.125.20.95, 2607:f8b0:400e:c09::5f
Connecting to firebasestorage.googleapis.com (firebasestorage.googleapis.com)|74.125.20.95|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 397174132 (379M) [application/zip]
Saving to: ‘dataset.zip?alt=media’


2021-04-05 00:51:54 (155 MB/s) - ‘dataset.zip?alt=media’ saved [397174132/397174132]



In [3]:
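# Extract the archive; piping "A" answers unzip's overwrite prompt with [A]ll
# in case the files were already extracted by an earlier run
!echo "A" | unzip "/content/dataset.zip?alt=media"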
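The archive also contains macOS metadata (a `__MACOSX/` folder and `._*` resource-fork files alongside the images). If you would rather extract from Python and skip those entries, here is a minimal sketch using the standard-library `zipfile` module. The helper name `extract_without_macosx` is hypothetical, and this is an alternative to the `unzip` call above rather than what the notebook ran.

```python
import zipfile
from pathlib import Path

def extract_without_macosx(zip_path, dest_dir):
    """Extract a zip archive, skipping macOS metadata (__MACOSX/ and ._* files).

    Existing files are overwritten, similar to answering "A" (All) at unzip's prompt.
    Returns the list of archive member names that were extracted.
    """
    dest = Path(dest_dir)
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.infolist():
            name = member.filename
            parts = Path(name).parts
            # Skip the __MACOSX folder and AppleDouble "._" resource-fork files
            if parts and (parts[0] == "__MACOSX" or Path(name).name.startswith("._")):
                continue
            zf.extract(member, dest)
            extracted.append(name)
    return extracted

# Usage with the paths from the cell above (hypothetical call, not run in this notebook):
# extract_without_macosx("/content/dataset.zip?alt=media", "/content")
```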

In [4]:
data_dir = '/content/dataset/'
batch_size = 32
img_height, img_width = 224, 224

# Training Dataset
train_ds = keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset='training',
    seed=49,
    image_size=(img_height, img_width),
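    batch_size=batch_size
)
# Class labels are inferred from the subdirectory names; the model head below uses this list
class_names = train_ds.class_names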

# Validation Dataset
validation_ds = keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset='validation',
    seed=49,
    image_size=(img_height, img_width),
    batch_size=batch_size
)        

Found 14286 files belonging to 18 classes.
Using 11429 files for training.
Found 14286 files belonging to 18 classes.
Using 2857 files for validation.
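The reported counts are consistent with reserving `int(0.2 * 14286) = 2857` files for validation and keeping the remaining 11,429 for training. Because both calls pass `seed=49`, they shuffle the file list the same way, so the two subsets do not overlap. The NumPy sketch below illustrates that idea; the helper `split_file_indices` is hypothetical and only approximates what Keras does internally.

```python
import numpy as np

def split_file_indices(num_files, validation_split, seed):
    """Shuffle file indices with a fixed seed, then take the last
    `validation_split` fraction as the validation subset."""
    indices = np.arange(num_files)
    rng = np.random.RandomState(seed)
    rng.shuffle(indices)
    num_val = int(validation_split * num_files)  # int(2857.2) -> 2857
    return indices[:-num_val], indices[-num_val:]

train_idx, val_idx = split_file_indices(14286, 0.2, seed=49)
print(len(train_idx), len(val_idx))  # 11429 2857
```

Calling the function twice with the same seed yields the same permutation, which is why the training and validation calls must share a seed.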


# Design and fit the model

In [6]:
# Build a learning rate schedule
def lr_schedule(epoch, lr):
    if epoch < 2:
        return lr
    else:
        # Exponentially decay the learning rate
        return lr*tf.math.exp(-0.2)
lr_sched = keras.callbacks.LearningRateScheduler(lr_schedule)

# Specify optimizer and start learning rate low
opt = keras.optimizers.Adam(learning_rate=1e-4, epsilon=1e-9)

# Apply early stopping
early_stop = keras.callbacks.EarlyStopping(monitor='val_accuracy', 
                                           patience=10, 
                                           restore_best_weights=True)

# Use ResNet 152 for transfer learning
base_model = keras.applications.ResNet152(weights='imagenet', 
                                          include_top=False, 
                                          pooling='avg')

# Freeze the layers before this index; layers from index 200 onward are fine-tuned
layer_idx = 200
for layer in base_model.layers[:layer_idx]:
    # Leave BatchNorm layers trainable (those after layer_idx are trainable by default)
    if not isinstance(layer, keras.layers.BatchNormalization):
        layer.trainable = False

# Specify input dimensions
inputs = keras.Input(shape=(img_height, img_width, 3))
# Preprocess for ResNet 152
preproc = keras.applications.resnet.preprocess_input(inputs)
# Feed preprocessed inputs into ResNet 152
res_net = base_model(preproc)
# Apply a very small amount of dropout (rate 0.001) for regularization
drop = keras.layers.Dropout(1e-3)(res_net)
# Apply softmax layer on output
output = keras.layers.Dense(len(class_names), activation='softmax')(drop)

# Build the model
model = keras.Model(inputs=inputs, outputs=output)
# Compile with sparse categorical cross-entropy (integer labels, one per landmark class)
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])
# Fit the model; early stopping restores the weights from the best epoch
model.fit(train_ds,
          validation_data=validation_ds,
          epochs=40,
          callbacks=[lr_sched, early_stop])

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet152_weights_tf_dim_ordering_tf_kernels_notop.h5
Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40
Epoch 13/40
Epoch 14/40
Epoch 15/40
Epoch 16/40
Epoch 17/40
Epoch 18/40
Epoch 19/40
Epoch 20/40
Epoch 21/40
Epoch 22/40
Epoch 23/40
Epoch 24/40


<tensorflow.python.keras.callbacks.History at 0x7f6e9a4436d0>

In [7]:
# Print the model summary
model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         [(None, 224, 224, 3)]     0
_________________________________________________________________
tf.__operators__.getitem (Sl (None, 224, 224, 3)       0
_________________________________________________________________
tf.nn.bias_add (TFOpLambda)  (None, 224, 224, 3)       0
_________________________________________________________________
resnet152 (Functional)       (None, 2048)              58370944
_________________________________________________________________
dropout (Dropout)            (None, 2048)              0
_________________________________________________________________
dense (Dense)                (None, 18)                36882
=================================================================
Total params: 58,407,826
Trainable params: 46,379,922
Non-trainable params: 12,027,904
_________________________________________________________________
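The parameter counts in the summary can be checked by hand: the dense head has one weight per (feature, class) pair plus one bias per class, and the totals add up from there. Note that Keras also reports BatchNorm moving means and variances as non-trainable, so the 12,027,904 non-trainable parameters are not all frozen convolution weights.

```python
features, num_classes = 2048, 18         # pooled ResNet 152 features, landmark classes
dense_params = features * num_classes + num_classes
print(dense_params)                      # 36882

resnet_params = 58_370_944               # resnet152 row of the summary
total = resnet_params + dense_params
print(f"{total:,}")                      # 58,407,826

non_trainable = 12_027_904               # from the summary
print(f"{total - non_trainable:,}")      # 46,379,922 trainable
```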

In [8]:
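# Evaluate on the validation set. It was also monitored for early stopping,
# so this is an optimistic estimate rather than an untouched test-set score.
loss, acc = model.evaluate(validation_ds, verbose=0)
print(f'Final model accuracy: {acc}')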

Final model accuracy: 0.9716485738754272


In terms of preprocessing, I resized each image to $224 \times 224$ pixels, the standard ImageNet input size that ResNet 152 was trained on. I then applied the preprocessing function designed specifically for ResNet (`keras.applications.resnet.preprocess_input`). It converts the images from RGB to BGR (i.e., reverses the order of the channels) and then zero-centers each color channel using the ImageNet channel means, without rescaling. I tried adding data augmentation by introducing small random rotations and random flips, but this tended to reduce my accuracy by a percentage point or two. This may be because the data already contain instances viewed from various angles and rotations, as can be seen in the plots displayed above. Additional rotations did not help on this specific held-out validation set; however, they could make the model more robust in real-world applications.
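To make the preprocessing step concrete, here is a NumPy sketch of what `keras.applications.resnet.preprocess_input` does in its default "caffe" mode, assuming channels-last input: reverse the channels, then subtract the ImageNet per-channel means (103.939, 116.779, 123.68 in BGR order). These two operations appear to be the `tf.__operators__.getitem` and `tf.nn.bias_add` rows in the model summary. The helper name `resnet_style_preprocess` is hypothetical; in the notebook, Keras performs this step.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by Keras's "caffe" preprocessing mode
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def resnet_style_preprocess(images_rgb):
    """Approximate resnet.preprocess_input for channels-last input:
    RGB -> BGR, then subtract the ImageNet channel means (no scaling)."""
    x = np.asarray(images_rgb, dtype=np.float32)
    x = x[..., ::-1]                 # reverse channel order: RGB -> BGR
    return x - IMAGENET_BGR_MEAN     # broadcast mean subtraction per channel

red = np.zeros((1, 2, 2, 3), dtype=np.float32)
red[..., 0] = 255.0                  # a pure-red 2x2 image in RGB
out = resnet_style_preprocess(red)
print(out[0, 0, 0])                  # approximately [-103.939, -116.779, 131.32]
```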

At first, I experimented with adding several of my own custom layers (e.g., additional convolutions, pooling, and batch normalization) on top of ResNet 152. However, I got better accuracy simply by fine-tuning the ResNet 152 weights with Adam and a small learning rate ($10^{-4}$). Importantly, I did not train the entire network: the first 200 layers of `base_model.layers` were frozen, and only the layers from index 200 onward were trained. This froze approximately 12,000,000 parameters within ResNet 152. I found this index through experimentation with different values. I also left all BatchNorm layers trainable in case the activation statistics of the landmark images differed significantly from those of ImageNet. Finally, I added a small amount of dropout (rate 0.001) between ResNet 152 and the output layer for some regularization.

For callbacks, I kept the learning rate constant for the first two epochs and then decayed it exponentially (multiplying it by $e^{-0.2}$ each epoch), and I applied early stopping on validation accuracy with a patience of 10 epochs. The model achieved its best performance around epoch 14; training halted at epoch 24, and the best weights were restored.
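The learning rate schedule compounds: `LearningRateScheduler` passes the previous epoch's rate back into `lr_schedule`, so from the third epoch on each epoch multiplies by $e^{-0.2}$, giving $\eta_e = 10^{-4}\, e^{-0.2\max(0,\, e-1)}$ for 0-indexed epoch $e$. The pure-Python sketch below reproduces that trajectory without TensorFlow (`lr_schedule_py` and `lr_trajectory` are hypothetical names), and checks the early-stopping arithmetic of best epoch plus patience.

```python
import math

BASE_LR = 1e-4  # starting Adam learning rate from the model cell

def lr_schedule_py(epoch, lr):
    """Pure-Python copy of the notebook's schedule: hold for 2 epochs, then decay."""
    if epoch < 2:
        return lr
    return lr * math.exp(-0.2)

def lr_trajectory(num_epochs, base_lr=BASE_LR):
    """Rate used in each 0-indexed epoch. As with LearningRateScheduler, the
    previous epoch's rate is fed back in, so the decay compounds."""
    lrs, lr = [], base_lr
    for epoch in range(num_epochs):
        lr = lr_schedule_py(epoch, lr)
        lrs.append(lr)
    return lrs

lrs = lr_trajectory(24)              # the training log shows 24 epochs were run
print(lrs[0], lrs[1])                # 0.0001 0.0001
print(round(lrs[13] / BASE_LR, 4))   # 0.0907, i.e. exp(-2.4) at epoch 14 (1-indexed)

# Early stopping halts after `patience` epochs without improvement
best_epoch, patience = 14, 10
print(best_epoch + patience)         # 24
```

By the best epoch the learning rate has already fallen to roughly 9% of its starting value, which is one reason later epochs change the weights only slightly.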