# Wide Residual Net for the CIFAR-10 dataset

In this notebook, I implement a wide residual network, inspired by the paper:
- Sergey Zagoruyko & Nikos Komodakis, 2017, Wide Residual Networks

The paper studies the effect of depth versus width in residual networks. Its conclusion is that widening the residual networks presented in *Deep Residual Learning for Image Recognition*, while limiting their depth, can significantly improve performance and reduce training time.

In [1]:
import numpy as np
import keras
from keras.layers import Input, Activation, Conv2D, AvgPool2D
from keras.layers import Dense, Flatten, BatchNormalization, Dropout
from keras.models import Model
from keras.preprocessing.image import ImageDataGenerator
from keras.datasets import cifar10
from keras.callbacks import ModelCheckpoint
import time

Using TensorFlow backend.


# Data Preparation

In [2]:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

print('X shape :', x_train.shape)
print(len(x_train), 'train samples')
print(len(x_test), 'test samples')

X shape : (50000, 32, 32, 3)
50000 train samples
10000 test samples


In [3]:
x_train_mean = np.mean(x_train, axis=0)
x_train_std = np.std(x_train, axis=0)

x_train = (x_train - x_train_mean)/x_train_std
x_test = (x_test - x_train_mean)/x_train_std

n_y = 10
y_train = keras.utils.to_categorical(y_train, n_y)
y_test = keras.utils.to_categorical(y_test, n_y)
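
The per-pixel standardization above can be checked on a small synthetic array. The following is a minimal sketch, assuming random data in place of CIFAR-10; the `standardize` helper and its `eps` guard against zero variance are hypothetical additions and are not part of the notebook.

```python
import numpy as np

def standardize(train, test, eps=1e-7):
    """Standardize both sets with per-pixel statistics of the training set only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    # eps is an assumption (not in the notebook): it avoids a division by zero
    # for pixels that are constant across the training set.
    return (train - mean) / (std + eps), (test - mean) / (std + eps)

# Synthetic stand-in for the CIFAR-10 arrays
rng = np.random.RandomState(0)
fake_train = rng.randint(0, 256, size=(100, 32, 32, 3)).astype('float32')
fake_test = rng.randint(0, 256, size=(20, 32, 32, 3)).astype('float32')

train_n, test_n = standardize(fake_train, fake_test)
print(train_n.shape, test_n.shape)
print(np.abs(train_n.mean(axis=0)).max())  # should be close to 0
```

The test set is deliberately normalized with the training statistics, as in the cell above, so that both sets go through exactly the same transformation.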

# Model

In [4]:
l2_reg = keras.regularizers.l2(0.0005)

def residual_unit(x_input, filters, stride=1, regularizer=l2_reg, dropout_rate=None):
    # Main branch: 3x3 conv (optionally strided) -> BN -> ReLU -> dropout -> 3x3 conv -> BN
    x = Conv2D(filters, kernel_size=(3,3), padding='same', use_bias=False, 
               strides=stride, kernel_regularizer=regularizer)(x_input)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    
    if dropout_rate:
        x = Dropout(dropout_rate)(x)
    
    x = Conv2D(filters, kernel_size=(3,3), padding='same', use_bias=False, 
               kernel_regularizer=regularizer)(x)
    x = BatchNormalization()(x)
    
    # Shortcut branch: when the unit downsamples, pool and project the input
    # with a 1x1 conv so that its shape matches the main branch
    if stride!=1:
        x_input = AvgPool2D(pool_size=stride)(x_input)
        x_input = Conv2D(filters, kernel_size=(1,1), use_bias=False, 
                         kernel_regularizer=regularizer)(x_input)
        x_input = BatchNormalization()(x_input)
    
    # Merge both branches, then BN -> ReLU
    x_res = keras.layers.add([x, x_input])
    x_res = BatchNormalization()(x_res)
    x_out = Activation('relu')(x_res)
    
    return x_out
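
In a downsampling unit (`stride != 1`), the shortcut must end up with the same spatial size as the main branch before the two can be added. The sketch below checks this using the usual output-size rules for a `'same'` convolution and a `'valid'` pooling layer. The helper names are hypothetical, and only the spatial size is modelled, not the channel count handled by the 1x1 convolution.

```python
import math

def conv_same_size(size, stride):
    # Output size of a convolution with padding='same': ceil(size / stride)
    return math.ceil(size / stride)

def pool_valid_size(size, pool, stride=None):
    # Output size of a pooling layer with padding='valid';
    # the stride defaults to the pool size
    stride = stride or pool
    return (size - pool) // stride + 1

def branch_sizes(size, stride):
    """Return the spatial sizes of (main branch, shortcut branch)."""
    main = conv_same_size(conv_same_size(size, stride), 1)  # two 3x3 convs
    shortcut = pool_valid_size(size, stride) if stride != 1 else size
    return main, shortcut

for size in (32, 16, 7):
    print(size, branch_sizes(size, stride=2))
```

For the even sizes met on CIFAR-10 (32 and 16) both branches agree. For an odd size such as 7 they would differ, so this shortcut design would need adjusting for inputs whose size is not divisible by the stride.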

In [5]:
def WideResidualNet(k=1, n=3, dropout_rate=None, n_map=16, regularizer=l2_reg):
    # Widen the base number of feature maps by the factor k
    n_map = k*n_map
    
    x_input = Input(shape=(32, 32, 3))

    x = Conv2D(n_map, (3,3), padding='same', use_bias=False, 
               kernel_regularizer=regularizer)(x_input)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    
    if dropout_rate:
        x = Dropout(dropout_rate)(x)
    
    # Block 1
    for i in range(n):
        x = residual_unit(x, n_map, regularizer=regularizer, dropout_rate=dropout_rate)
    
    # Block 2
    x = residual_unit(x, n_map*2, stride=2, regularizer=regularizer, dropout_rate=dropout_rate)
    for i in range(n-1):
        x = residual_unit(x, n_map*2, regularizer=regularizer, dropout_rate=dropout_rate)
    
    # Block 3
    x = residual_unit(x, n_map*4, stride=2, regularizer=regularizer, dropout_rate=dropout_rate)
    for i in range(n-1):
        x = residual_unit(x, n_map*4, regularizer=regularizer, dropout_rate=dropout_rate)

    x = AvgPool2D((8,8))(x)
    x = Flatten()(x)
    output = Dense(10, activation='softmax')(x)

    return Model(x_input, output)
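
To make the architecture easier to follow, the sketch below traces the spatial size and number of feature maps after each stage, and counts the $3 \times 3$ convolutions in the residual units. It is plain arithmetic that mirrors the function above and does not build a Keras model; `wrn_trace` and `count_3x3_convs` are hypothetical helper names.

```python
def wrn_trace(k=1, n_map=16, input_size=32):
    """Return (stage name, spatial size, channels) after each stage."""
    width = k * n_map
    stages = [('stem', input_size, width)]
    size = input_size
    # (channel multiplier, stride of the first unit) for blocks 1, 2 and 3
    for block, (mult, stride) in enumerate([(1, 1), (2, 2), (4, 2)], start=1):
        size //= stride
        stages.append(('block %d' % block, size, width * mult))
    return stages

def count_3x3_convs(n):
    # 3 blocks, n residual units per block, two 3x3 convolutions per unit
    return 3 * n * 2

for name, size, channels in wrn_trace(k=2):
    print('%-8s %2dx%-2d x %d' % (name, size, size, channels))
print('3x3 convs in residual units:', count_3x3_convs(2))
```

With `k=2`, block 3 ends on 8x8 maps with 128 channels, so the final `AvgPool2D((8,8))` reduces each map to a single value and the `Dense` layer receives 128 features.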

In [6]:
model = WideResidualNet(k=2, n=2, dropout_rate=0.3)
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 32, 32, 3)    0                                            
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 32, 32, 32)   864         input_1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 32, 32, 32)   128         conv2d_1[0][0]                   
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 32, 32, 32)   0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
dropout_1 

# Model training

In [7]:
# Set optimizer to train the model
optimizer = keras.optimizers.SGD(0.1, momentum=0.9, nesterov=True)
model.compile(optimizer, keras.losses.categorical_crossentropy, ['accuracy'])

# Data augmentation
shift = 4/32
generator = ImageDataGenerator(width_shift_range=shift, height_shift_range=shift, 
                               horizontal_flip=True)

batch_size = 64
n_steps = x_train.shape[0]//batch_size # training steps per epoch


save_path = './Model_trained/WideResNet20_cifar10.h5'
checkpoint = ModelCheckpoint(save_path, monitor='val_acc', save_best_only=True, verbose=1)
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='loss', factor=np.sqrt(0.1), patience=5, 
                                              min_delta=0.01, min_lr=1e-6)
early_stopping = keras.callbacks.EarlyStopping(monitor='val_acc', patience=20)

t0 = time.time()
model.fit_generator(generator.flow(x_train, y_train, batch_size=batch_size), steps_per_epoch=n_steps, 
                    epochs=200, validation_data=(x_test, y_test), 
                    callbacks=[checkpoint, reduce_lr, early_stopping])
print('Total training time : %.3f s' %(time.time()-t0))

Epoch 1/200

Epoch 00001: val_acc improved from -inf to 0.42600, saving model to ./Model_trained/WideResNet20_cifar10.h5
Epoch 2/200

Epoch 00002: val_acc improved from 0.42600 to 0.65360, saving model to ./Model_trained/WideResNet20_cifar10.h5
Epoch 3/200

Epoch 00003: val_acc did not improve from 0.65360
Epoch 4/200

Epoch 00004: val_acc improved from 0.65360 to 0.65490, saving model to ./Model_trained/WideResNet20_cifar10.h5
Epoch 5/200

Epoch 00005: val_acc did not improve from 0.65490
Epoch 6/200

Epoch 00006: val_acc did not improve from 0.65490
Epoch 7/200

Epoch 00007: val_acc improved from 0.65490 to 0.69220, saving model to ./Model_trained/WideResNet20_cifar10.h5
Epoch 8/200

Epoch 00008: val_acc did not improve from 0.69220
Epoch 9/200

Epoch 00009: val_acc improved from 0.69220 to 0.74600, saving model to ./Model_trained/WideResNet20_cifar10.h5
Epoch 10/200

Epoch 00010: val_acc did not improve from 0.74600
Epoch 11/200

Epoch 00011: val_acc did not improve from 0.74600
[...]


Epoch 00041: val_acc improved from 0.89940 to 0.90250, saving model to ./Model_trained/WideResNet20_cifar10.h5
Epoch 42/200

Epoch 00042: val_acc did not improve from 0.90250
Epoch 43/200

Epoch 00043: val_acc did not improve from 0.90250
Epoch 44/200

Epoch 00044: val_acc did not improve from 0.90250
Epoch 45/200

Epoch 00045: val_acc improved from 0.90250 to 0.90650, saving model to ./Model_trained/WideResNet20_cifar10.h5
Epoch 46/200

Epoch 00046: val_acc did not improve from 0.90650
Epoch 47/200

Epoch 00047: val_acc did not improve from 0.90650
Epoch 48/200

Epoch 00048: val_acc did not improve from 0.90650
Epoch 49/200

Epoch 00049: val_acc did not improve from 0.90650
Epoch 50/200

Epoch 00050: val_acc did not improve from 0.90650
Epoch 51/200

Epoch 00051: val_acc did not improve from 0.90650
Epoch 52/200

Epoch 00052: val_acc did not improve from 0.90650
Epoch 53/200

Epoch 00053: val_acc improved from 0.90650 to 0.91310, saving model to ./Model_trained/WideResNet20_cifar10.h


Epoch 00082: val_acc improved from 0.92140 to 0.92150, saving model to ./Model_trained/WideResNet20_cifar10.h5
Epoch 83/200

Epoch 00083: val_acc did not improve from 0.92150
Epoch 84/200

Epoch 00084: val_acc improved from 0.92150 to 0.92170, saving model to ./Model_trained/WideResNet20_cifar10.h5
Epoch 85/200

Epoch 00085: val_acc did not improve from 0.92170
Epoch 86/200

Epoch 00086: val_acc did not improve from 0.92170
Epoch 87/200

Epoch 00087: val_acc did not improve from 0.92170
Epoch 88/200

Epoch 00088: val_acc did not improve from 0.92170
Epoch 89/200

Epoch 00089: val_acc did not improve from 0.92170
Epoch 90/200

Epoch 00090: val_acc improved from 0.92170 to 0.92180, saving model to ./Model_trained/WideResNet20_cifar10.h5
Epoch 91/200

Epoch 00091: val_acc improved from 0.92180 to 0.92190, saving model to ./Model_trained/WideResNet20_cifar10.h5
Epoch 92/200

Epoch 00092: val_acc did not improve from 0.92190
Epoch 93/200

Epoch 00093: val_acc did not improve from 0.92190
[...]
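
The `ReduceLROnPlateau` callback multiplies the learning rate by `factor` each time the monitored loss stops improving, and never goes below `min_lr`. With `factor=np.sqrt(0.1)`, two reductions amount to a division by 10. The sketch below only illustrates this arithmetic with the values used in the training cell; it does not model the patience logic, and `lr_after` is a hypothetical helper.

```python
import math

initial_lr = 0.1
factor = math.sqrt(0.1)
min_lr = 1e-6

def lr_after(m):
    """Learning rate after m reductions, clipped at min_lr."""
    return max(initial_lr * factor ** m, min_lr)

for m in range(0, 11, 2):
    print(m, lr_after(m))
```

Under these settings the learning rate reaches `min_lr` after about ten reductions, and each reduction needs at least `patience=5` epochs without sufficient improvement.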

# Final evaluation

In [8]:
best_model = keras.models.load_model(save_path)
print('Best model test accuracy :', best_model.evaluate(x_test, y_test, batch_size=64)[1])

Best model test accuracy : 0.922


# Conclusion

Reducing the number of $3 \times 3$ convolutional layers from 18 to 12 and doubling the number of kernels, together with the addition of dropout, proved to be an effective way to improve accuracy (by approximately $1\%$). The price is a significant increase in the number of parameters, but despite that, the computation time per training step remained similar.
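
The increase in parameters can be estimated by counting the weights of the $3 \times 3$ convolutions in the residual units. The sketch below assumes that the narrower baseline corresponds to `k=1, n=3` in the `WideResidualNet` function of this notebook; it ignores the stem convolution, the 1x1 shortcut projections, batch normalization and the dense layer. `conv3x3_weights` is a hypothetical helper.

```python
def conv3x3_weights(k, n, n_map=16):
    """Count the weights of the 3x3 convolutions in the residual units (no biases)."""
    width = k * n_map
    total = 0
    c_in = width  # channels coming out of the stem convolution
    for mult in (1, 2, 4):
        c_out = width * mult
        for unit in range(n):
            total += 9 * c_in * c_out   # first 3x3 conv of the unit
            total += 9 * c_out * c_out  # second 3x3 conv of the unit
            c_in = c_out
    return total

narrow = conv3x3_weights(k=1, n=3)  # 18 layers with 16/32/64 filters
wide = conv3x3_weights(k=2, n=2)    # 12 layers with 32/64/128 filters
print(narrow, wide, wide / narrow)
```

Under these assumptions the wider network has about 2.5 times as many $3 \times 3$ convolution weights, even though it has fewer layers.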