# Residual Net for the CIFAR-10 dataset

In this notebook, I implement a 20-layer residual network, following the architecture from the paper:
- Kaiming He et al., 2015, Deep Residual Learning for Image Recognition

The main novelty of residual networks is the shortcut connection, which lets a block represent the identity transformation easily when needed. In theory, this means that adding a layer should only improve the network or leave its performance unchanged, since the extra layer can fall back to the identity, whereas plain networks see their performance degrade beyond a certain depth.
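
To make the identity argument concrete, here is a minimal NumPy sketch of a toy residual unit acting on a vector. It is not the Keras model built below: `toy_residual_unit`, `w1` and `w2` are illustrative names, and dense matrices stand in for the convolutions. The input is taken to be non-negative, as it would be after a previous ReLU.

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def toy_residual_unit(x, w1, w2):
    """Toy dense analogue of a residual unit: relu(F(x) + x)."""
    f = w2 @ relu(w1 @ x)   # residual branch F(x)
    return relu(f + x)      # the shortcut adds the input back

# Non-negative input, as produced by a previous ReLU.
x = np.array([0.5, 0.0, 2.0, 1.5])

# With all-zero weights the residual branch vanishes,
# so the unit reduces to the identity.
zeros = np.zeros((4, 4))
y = toy_residual_unit(x, zeros, zeros)
print(np.array_equal(y, x))  # True
```

Without the shortcut, the same zero weights would map every input to zero, so a plain layer has to learn the identity explicitly, while a residual unit only has to drive its weights towards zero.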

In [1]:
import numpy as np
import keras
from keras.layers import Input, Activation, Conv2D, AvgPool2D
from keras.layers import Dense, Flatten, BatchNormalization
from keras.models import Model
from keras.preprocessing.image import ImageDataGenerator
from keras.datasets import cifar10
from keras.callbacks import ModelCheckpoint
import time

Using TensorFlow backend.


# Data Preparation

In [2]:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

print('X shape :', x_train.shape)
print(len(x_train), 'train samples')
print(len(x_test), 'test samples')

X shape : (50000, 32, 32, 3)
50000 train samples
10000 test samples


In [3]:
x_train_mean = np.mean(x_train, axis=0)
x_train_std = np.std(x_train, axis=0)

x_train = (x_train - x_train_mean)/x_train_std
x_test = (x_test - x_train_mean)/x_train_std

n_y = 10
y_train = keras.utils.to_categorical(y_train, n_y)
y_test = keras.utils.to_categorical(y_test, n_y)
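
The cell above standardizes every pixel position and channel with statistics computed on the training set only, then reuses them for the test set. One thing to watch with this scheme is a pixel position whose standard deviation is zero, which would make the division produce NaN or infinite values. The sketch below shows a common guard on synthetic data; the `standardize` helper, the `eps` value and the `x_fake_*` arrays are assumptions made for illustration, not part of the notebook.

```python
import numpy as np

def standardize(x, mean, std, eps=1e-7):
    """Per-pixel standardization; eps guards against zero variance."""
    return (x - mean) / (std + eps)

rng = np.random.RandomState(0)
# Synthetic stand-in for image data: small "images" of shape (4, 4, 3).
x_fake_train = rng.randint(0, 256, size=(100, 4, 4, 3)).astype('float32')
x_fake_test = rng.randint(0, 256, size=(20, 4, 4, 3)).astype('float32')
x_fake_train[:, 0, 0, 0] = 7.0  # force one constant (zero-variance) pixel

# Statistics come from the training data only.
mean = x_fake_train.mean(axis=0)
std = x_fake_train.std(axis=0)

x_fake_train_n = standardize(x_fake_train, mean, std)
x_fake_test_n = standardize(x_fake_test, mean, std)  # reuse training statistics

print(np.isfinite(x_fake_train_n).all())  # True
```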

# Model

In [4]:
l2_reg = keras.regularizers.l2(0.0001)

def residual_unit(x_input, filters, stride=1, regularizer=l2_reg):
    """Two 3x3 convolutions with a shortcut connection added before the final ReLU."""
    # Main path: conv -> BN -> ReLU -> conv -> BN
    x = Conv2D(filters, kernel_size=(3,3), padding='same', use_bias=False,
               strides=stride, kernel_regularizer=regularizer)(x_input)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(filters, kernel_size=(3,3), padding='same', use_bias=False,
               kernel_regularizer=regularizer)(x)
    x = BatchNormalization()(x)

    # Shortcut path: when the unit downsamples, pool and project the input
    # (1x1 convolution) so that its shape matches the main path
    if stride != 1:
        x_input = AvgPool2D(pool_size=stride)(x_input)
        x_input = Conv2D(filters, kernel_size=(1,1), use_bias=False,
                         kernel_regularizer=regularizer)(x_input)
        x_input = BatchNormalization()(x_input)

    # Merge the two paths, then apply the final activation
    x_res = keras.layers.add([x, x_input])
    x_out = Activation('relu')(x_res)

    return x_out
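
The addition in `residual_unit` only works if both paths have the same shape, which is why the shortcut is pooled and projected when `stride != 1`. The plain-Python sketch below reproduces the shape arithmetic, assuming the usual Keras rules: a `padding='same'` convolution outputs `ceil(size / stride)`, and `AvgPool2D` with its default `'valid'` padding and strides equal to the pool size outputs `floor(size / pool)`. The helper names are hypothetical.

```python
import math

def conv_same_size(size, stride):
    # Conv2D with padding='same': output = ceil(input / stride)
    return math.ceil(size / stride)

def avgpool_valid_size(size, pool):
    # AvgPool2D with padding='valid' and strides == pool_size
    return (size - pool) // pool + 1

def residual_unit_shapes(height, width, filters, stride):
    """Return (main_path_shape, shortcut_shape) for a downsampling unit."""
    main = (conv_same_size(height, stride),
            conv_same_size(width, stride), filters)
    shortcut = (avgpool_valid_size(height, stride),
                avgpool_valid_size(width, stride), filters)
    return main, shortcut

# The two downsampling units of the network: 32x32 -> 16x16 -> 8x8
for size, filters in [(32, 32), (16, 64)]:
    main, shortcut = residual_unit_shapes(size, size, filters, stride=2)
    print(size, '->', main, shortcut)
# 32 -> (16, 16, 32) (16, 16, 32)
# 16 -> (8, 8, 64) (8, 8, 64)
```

With an odd input size such as 15, the two rules would give 8 and 7 and the addition would fail. The feature maps here are 32, 16 and 8 pixels wide, so the case does not arise.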

In [5]:
x_input = Input(shape=(32, 32, 3))

x = Conv2D(16, (3,3), padding='same', use_bias=False, 
           kernel_regularizer=l2_reg)(x_input)
x = BatchNormalization()(x)
x = Activation('relu')(x)

x = residual_unit(x, 16)
x = residual_unit(x, 16)
x = residual_unit(x, 16)

x = residual_unit(x, 32, stride=2)
x = residual_unit(x, 32)
x = residual_unit(x, 32)

x = residual_unit(x, 64, stride=2)
x = residual_unit(x, 64)
x = residual_unit(x, 64)

x = AvgPool2D((8,8))(x)
x = Flatten()(x)
output = Dense(10, activation='softmax')(x)

model = Model(x_input, output)

model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 32, 32, 3)    0                                            
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 32, 32, 16)   432         input_1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 32, 32, 16)   64          conv2d_1[0][0]                   
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 32, 32, 16)   0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
conv2d_2 (
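
The rows visible in the summary can be checked by hand. The sketch below is plain Python with hypothetical helper names. It assumes that a bias-free convolution holds `kernel * kernel * c_in * c_out` weights, that `BatchNormalization` holds four values per channel (gamma, beta, moving mean and moving variance), and that depth is counted with the paper's 6n+2 convention, which ignores the 1x1 projection convolutions on the shortcuts.

```python
def conv_params(kernel, c_in, c_out, use_bias=False):
    """Number of weights in a square 2D convolution."""
    return kernel * kernel * c_in * c_out + (c_out if use_bias else 0)

def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance for each channel."""
    return 4 * channels

print(conv_params(3, 3, 16))   # 432, matches conv2d_1
print(batchnorm_params(16))    # 64, matches batch_normalization_1

# Depth: first conv + 3 stages x n units x 2 convs + final dense layer
n_units_per_stage = 3
depth = 1 + 3 * n_units_per_stage * 2 + 1
print(depth)  # 20
```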

# Model training

In [6]:
# Set optimizer to train the model
optimizer = keras.optimizers.SGD(0.1, momentum=0.9, nesterov=True)
model.compile(optimizer, keras.losses.categorical_crossentropy, ['accuracy'])

# Data augmentation
shift = 4/32
generator = ImageDataGenerator(width_shift_range=shift, height_shift_range=shift, 
                               horizontal_flip=True)

batch_size = 64
n_steps = x_train.shape[0]//batch_size # training steps per epoch


save_path = './Model_trained/ResNet20_cifar10.h5'
checkpoint = ModelCheckpoint(save_path, monitor='val_acc', save_best_only=True, verbose=1)
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='loss', factor=np.sqrt(0.1), patience=5, 
                                              min_delta=0.01, min_lr=1e-6)
early_stopping = keras.callbacks.EarlyStopping(monitor='val_acc', patience=20)

t0 = time.time()
model.fit_generator(generator.flow(x_train, y_train, batch_size=batch_size), steps_per_epoch=n_steps, 
                    epochs=200, validation_data=(x_test, y_test), 
                    callbacks=[checkpoint, reduce_lr, early_stopping])
print('Total training time : %.3f s' %(time.time()-t0))

Epoch 1/200

Epoch 00001: val_acc improved from -inf to 0.31990, saving model to ./Model_trained/ResNet20_cifar10.h5
Epoch 2/200

Epoch 00002: val_acc improved from 0.31990 to 0.35010, saving model to ./Model_trained/ResNet20_cifar10.h5
Epoch 3/200

Epoch 00003: val_acc improved from 0.35010 to 0.61700, saving model to ./Model_trained/ResNet20_cifar10.h5
Epoch 4/200

Epoch 00004: val_acc improved from 0.61700 to 0.68080, saving model to ./Model_trained/ResNet20_cifar10.h5
Epoch 5/200

Epoch 00005: val_acc did not improve from 0.68080
Epoch 6/200

Epoch 00006: val_acc improved from 0.68080 to 0.73520, saving model to ./Model_trained/ResNet20_cifar10.h5
Epoch 7/200

Epoch 00007: val_acc improved from 0.73520 to 0.75530, saving model to ./Model_trained/ResNet20_cifar10.h5
Epoch 8/200

Epoch 00008: val_acc improved from 0.75530 to 0.79580, saving model to ./Model_trained/ResNet20_cifar10.h5
Epoch 9/200

Epoch 00009: val_acc did not improve from 0.79580
Epoch 10/200

Epoch 00010: val_acc di


Epoch 00041: val_acc did not improve from 0.89930
Epoch 42/200

Epoch 00042: val_acc did not improve from 0.89930
Epoch 43/200

Epoch 00043: val_acc did not improve from 0.89930
Epoch 44/200

Epoch 00044: val_acc did not improve from 0.89930
Epoch 45/200

Epoch 00045: val_acc did not improve from 0.89930
Epoch 46/200

Epoch 00046: val_acc did not improve from 0.89930
Epoch 47/200

Epoch 00047: val_acc did not improve from 0.89930
Epoch 48/200

Epoch 00048: val_acc did not improve from 0.89930
Epoch 49/200

Epoch 00049: val_acc did not improve from 0.89930
Epoch 50/200

Epoch 00050: val_acc did not improve from 0.89930
Epoch 51/200

Epoch 00051: val_acc did not improve from 0.89930
Epoch 52/200

Epoch 00052: val_acc did not improve from 0.89930
Epoch 53/200

Epoch 00053: val_acc did not improve from 0.89930
Epoch 54/200

Epoch 00054: val_acc did not improve from 0.89930
Epoch 55/200

Epoch 00055: val_acc did not improve from 0.89930
Epoch 56/200

Epoch 00056: val_acc did not improve fr


Epoch 00083: val_acc did not improve from 0.91000
Epoch 84/200

Epoch 00084: val_acc did not improve from 0.91000
Epoch 85/200

Epoch 00085: val_acc did not improve from 0.91000
Epoch 86/200

Epoch 00086: val_acc did not improve from 0.91000
Epoch 87/200

Epoch 00087: val_acc did not improve from 0.91000
Epoch 88/200

Epoch 00088: val_acc did not improve from 0.91000
Total training time : 2353.824 s
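
A few of the numbers in the training cell and its log are easier to read once worked out. The plain-Python sketch below only redoes the arithmetic on values taken from the notebook. The variable names `leftover`, `reductions_to_floor` and `seconds_per_epoch` are mine, and the time per epoch assumes that all 88 logged epochs ran to completion.

```python
import math

# Steps per epoch: full batches only
n_train, batch_size = 50000, 64
n_steps = n_train // batch_size
leftover = n_train - n_steps * batch_size
print(n_steps, leftover)  # 781 16

# Augmentation shift, expressed as a fraction of the 32-pixel image width
shift = 4 / 32
print(shift * 32)  # 4.0 pixels

# ReduceLROnPlateau: a factor of sqrt(0.1) means two reductions divide the rate by 10
factor = math.sqrt(0.1)
print(math.isclose(0.1 * factor ** 2, 0.01))  # True

# Number of reductions needed to go from 0.1 down to min_lr = 1e-6
reductions_to_floor = round(math.log(1e-6 / 0.1, factor))
print(reductions_to_floor)  # 10

# Average wall-clock time per epoch over the 88 epochs shown in the log
seconds_per_epoch = 2353.824 / 88
print(round(seconds_per_epoch, 1))  # 26.7
```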


# Final evaluation

In [8]:
best_model = keras.models.load_model(save_path)
print('Best model test accuracy :', best_model.evaluate(x_test, y_test, batch_size=64)[1])

Best model test accuracy : 0.91


# Conclusion

This 20-layer residual network is, despite its depth, quite simple in its architecture. The shortcut connection, the vital part of the residual block, lets us train a deep (and thin) network that reaches $91 \%$ accuracy on the CIFAR-10 test set. This is in line with the original paper, *Deep Residual Learning for Image Recognition*, whose 20-layer model reaches a very close $91.25 \%$ accuracy.