# Exploring Different ConvNet Models
## Motivation

After studying convolutional neural networks (CNNs) in class, we know that AlexNet is widely used in this area. In lab 8 and its demos, we mainly used VGG16 and Xception as the models. This made us wonder whether CNNs with other architectures would also work, which is the initial motivation for our project. In this project, we recreate four small ConvNets described in published papers. Each network (model) is trained on the CIFAR-10 dataset, and we judge the models' performance by their error rate on the test set.

## Import the data

In [1]:
import numpy as np
from keras.models import Model, Input
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dropout, Activation, Average
from keras.utils import to_categorical
from keras.losses import categorical_crossentropy
from keras.callbacks import ModelCheckpoint, TensorBoard
from keras.optimizers import Adam
from keras.datasets import cifar10

Using TensorFlow backend.


Here we import the dataset. Both the training and test images are normalized by scaling the pixel values to the range [0, 1].

In [2]:
(x_train, y_train),(x_test, y_test) = cifar10.load_data()
x_train = x_train / 255.
x_test = x_test / 255.
y_train = to_categorical(y_train, num_classes=10)

CIFAR-10 consists of 60,000 32x32 RGB images from 10 classes: 50,000 images form the training set and the other 10,000 the test set. Now we can verify the shapes of x_train, y_train, x_test and y_test.

In [3]:
print('x_train shape: {} | y_train shape: {}\nx_test shape : {} | y_test shape : {}'
      .format(x_train.shape, y_train.shape, x_test.shape, y_test.shape))

x_train shape: (50000, 32, 32, 3) | y_train shape: (50000, 10)
x_test shape : (10000, 32, 32, 3) | y_test shape : (10000, 1)
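
The shapes above differ because only `y_train` was passed through `to_categorical`, while `y_test` keeps its integer labels in an (N, 1) layout. As a rough illustration of what one-hot encoding does, here is a minimal NumPy sketch; the helper `one_hot` and the toy labels are hypothetical and are not part of the notebook.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer labels of shape (N, 1) or (N,) into an (N, num_classes) array."""
    labels = np.asarray(labels).reshape(-1)          # flatten (N, 1) -> (N,)
    encoded = np.zeros((labels.size, num_classes))   # one row per sample
    encoded[np.arange(labels.size), labels] = 1.0    # mark the column of the true class
    return encoded

# Hypothetical toy labels in the same (N, 1) layout that y_test has.
toy_labels = np.array([[3], [0], [9]])
toy_one_hot = one_hot(toy_labels, num_classes=10)

print(toy_one_hot.shape)            # one-hot layout, like y_train
print(toy_one_hot.argmax(axis=1))   # argmax recovers the integer labels
```

Taking the argmax of a one-hot row recovers the integer label, which is why predictions can later be compared against the unencoded `y_test`.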


We need to define a single model input because all of our models share the same input.

In [4]:
input_shape = x_train[0,:,:,:].shape
model_input = Input(shape=input_shape)
print(input_shape)

(32, 32, 3)


## Model 1: Strided-CNN-C

The first model is Strided-CNN-C \[[Springenberg et al., 2015, Striving for Simplicity: The All Convolutional Net](https://arxiv.org/abs/1412.6806)\]. In this model, max-pooling is removed and the stride of the convolution layers preceding the max-pool layers is increased by 1.

The last convolutional layer `Conv2D(10, (1, 1))` outputs 10 feature maps corresponding to the ten output classes. The `GlobalAveragePooling2D()` layer then computes the spatial average of each of these 10 feature maps, so its output is just a vector of length 10. After that, a softmax activation is applied to that vector.
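
To make the last two steps concrete, the following NumPy sketch averages ten feature maps over their spatial positions and applies a softmax. It only illustrates the arithmetic and is not the Keras implementation; the helper names and the random 8x8 maps are assumptions.

```python
import numpy as np

def global_average_pool(feature_maps):
    """Average each channel over its spatial positions: (H, W, C) -> (C,)."""
    return feature_maps.mean(axis=(0, 1))

def softmax(logits):
    """Numerically stable softmax over a 1-D vector."""
    shifted = logits - logits.max()   # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()

rng = np.random.default_rng(0)
maps = rng.normal(size=(8, 8, 10))    # hypothetical 8x8 feature maps, one per class

scores = global_average_pool(maps)    # vector of length 10
probs = softmax(scores)               # class probabilities that sum to 1

print(scores.shape, probs.sum())
```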

In [5]:
def strided_cnn(model_input):
    
    x = Conv2D(96, kernel_size=(3, 3), activation='relu', padding = 'same')(model_input)
    x = Conv2D(96, (3, 3), activation='relu', padding = 'same', strides = 2)(x)
    x = Conv2D(192, (3, 3), activation='relu', padding = 'same')(x)
    x = Conv2D(192, (3, 3), activation='relu', padding = 'same', strides = 2)(x)
    x = Conv2D(192, (3, 3), activation='relu', padding = 'same')(x)
    x = Conv2D(192, (1, 1), activation='relu')(x)
    x = Conv2D(10, (1, 1))(x)
    x = GlobalAveragePooling2D()(x)
    x = Activation(activation='softmax')(x)
        
    model = Model(model_input, x, name='strided_cnn')
    
    return model

In [6]:
strided_cnn_model = strided_cnn(model_input)

We are going to list this model's layers.

In [7]:
strided_cnn_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 32, 32, 96)        2688      
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 16, 16, 96)        83040     
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 16, 16, 192)       166080    
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 8, 8, 192)         331968    
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 8, 8, 192)         331968    
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 8, 8, 192)         37056     
__________
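
The output shapes and parameter counts listed above can be checked by hand. The sketch below assumes the usual rules: a 'same'-padded convolution with stride s maps a spatial size n to ceil(n / s), and a convolution layer has (k * k * c_in + 1) * c_out parameters, including one bias per output channel. The helper names are ours.

```python
import math

def same_conv_output(size, stride=1):
    """Spatial size after a 'same'-padded convolution."""
    return math.ceil(size / stride)

def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per output channel for a square kernel."""
    return (kernel * kernel * in_channels + 1) * out_channels

# (kernel, out_channels, stride) for the first six Conv2D layers of strided_cnn.
# The 1x1 layer uses the default padding, which leaves the size unchanged at stride 1.
layers = [(3, 96, 1), (3, 96, 2), (3, 192, 1), (3, 192, 2), (3, 192, 1), (1, 192, 1)]

size, channels = 32, 3
trace = []
for kernel, out_channels, stride in layers:
    size = same_conv_output(size, stride)
    trace.append((size, out_channels, conv_params(kernel, channels, out_channels)))
    channels = out_channels

for row in trace:
    print(row)   # (spatial size, channels, parameters) per layer
```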

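We train for a fixed number of epochs with a batch size of 32 (1250 steps per epoch) to reach some local minimum. One fifth of the training set is held out for validation; note that Keras's `validation_split` takes the last 20% of the samples rather than a random subset. We use Adam as the optimizer. In addition, a checkpoint of the weights is written to the 'weights' directory whenever the training loss improves.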
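
The numbers in this setup follow from simple arithmetic. The sketch below imitates, in plain NumPy, how a 20% validation split that takes the tail of the data leaves 40,000 training samples and 1250 steps per epoch; it is an illustration of the idea and not the Keras code itself.

```python
import math
import numpy as np

num_samples, validation_split, batch_size = 50000, 0.2, 32

# The validation part is sliced off the end of the arrays, before any shuffling.
split_at = round(num_samples * (1 - validation_split))
indices = np.arange(num_samples)
train_idx, val_idx = indices[:split_at], indices[split_at:]

# One step processes one batch, so an epoch needs this many steps.
steps_per_epoch = math.ceil(len(train_idx) / batch_size)

print(len(train_idx), len(val_idx), steps_per_epoch)
```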

In [9]:
def compile_and_train(model, num_epochs): 
    
    model.compile(loss=categorical_crossentropy, optimizer=Adam(), metrics=['acc']) 
    filepath = 'weights/' + model.name + '.{epoch:02d}-{loss:.2f}.hdf5'
    checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=0, save_weights_only=True,
                                                 save_best_only=True, mode='auto', period=1)
    tensor_board = TensorBoard(log_dir='logs/', histogram_freq=0, batch_size=32)
    history = model.fit(x=x_train, y=y_train, batch_size=32, 
                     epochs=num_epochs, verbose=1, callbacks=[checkpoint, tensor_board], validation_split=0.2)
    return history
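
The `filepath` passed to `ModelCheckpoint` is a format template: the callback fills in the epoch number and the logged metrics when it saves a file. The sketch below expands the same template by hand with `str.format`; the loss value is a made-up example.

```python
# Same template as in compile_and_train, with the model name 'strided_cnn' filled in.
template = 'weights/' + 'strided_cnn' + '.{epoch:02d}-{loss:.2f}.hdf5'

# Hypothetical values: epoch 19 with a training loss of 0.0841.
example_path = template.format(epoch=19, loss=0.0841)

print(example_path)
```

The epoch is zero-padded to two digits and the loss is rounded to two decimals, which is the naming pattern used later when the saved weights are loaded.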

Since there are 50,000 training images and 10,000 test images, we train on a Tesla K80 GPU. To save time, we set the number of epochs to 20.

In [10]:
_ = compile_and_train(strided_cnn_model, num_epochs=20)

Train on 40000 samples, validate on 10000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [11]:
def evaluate_error(model):
    pred = model.predict(x_test, batch_size = 32)
    pred = np.argmax(pred, axis=1)
    pred = np.expand_dims(pred, axis=1) # make same shape as y_test
    error = np.sum(np.not_equal(pred, y_test)) / y_test.shape[0]    
    return error

To evaluate this model, we calculate the error rate on the test set.

In [12]:
evaluate_error(strided_cnn_model)

0.25629999999999997
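
The `expand_dims` call in `evaluate_error` matters more than it may seem. The toy example below, with made-up scores and labels, shows that comparing predictions of shape (N,) against labels of shape (N, 1) would broadcast to an (N, N) matrix, whereas matching shapes give the intended per-sample comparison.

```python
import numpy as np

# Hypothetical toy data: integer labels in (N, 1) layout and softmax-like scores.
toy_y_test = np.array([[0], [2], [1], [2]])
toy_scores = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.3, 0.6],
                       [0.5, 0.4, 0.1],
                       [0.2, 0.2, 0.6]])

pred = np.argmax(toy_scores, axis=1)             # shape (4,)

# Without expand_dims, (4,) against (4, 1) broadcasts to a (4, 4) comparison.
wrong_shape = np.not_equal(pred, toy_y_test).shape

pred = np.expand_dims(pred, axis=1)              # shape (4, 1), same as the labels
error = np.sum(np.not_equal(pred, toy_y_test)) / toy_y_test.shape[0]

print(wrong_shape, error)
```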

## Model 2: ConvPool-CNN-C

The second model that we are going to use is ConvPool-CNN-C \[[Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units](https://arxiv.org/pdf/1603.05201)\]. This model is composed of convolution, ReLU, and pooling layers, without fully connected layers.

This model is pretty straightforward since it has a clearly outlined network architecture made only of convolution, pooling, and ReLU. It features a common pattern where several convolutional layers are followed by a pooling layer. Instead of several fully-connected layers, a global average pooling layer is used as the final layer.


In [13]:
def conv_pool_cnn(model_input):
    
    x = Conv2D(96, kernel_size=(3, 3), activation='relu', padding = 'same')(model_input)
    x = Conv2D(96, (3, 3), activation='relu', padding = 'same')(x)
    x = Conv2D(96, (3, 3), activation='relu', padding = 'same')(x)
    x = MaxPooling2D(pool_size=(3, 3), strides = 2)(x)
    x = Conv2D(192, (3, 3), activation='relu', padding = 'same')(x)
    x = Conv2D(192, (3, 3), activation='relu', padding = 'same')(x)
    x = Conv2D(192, (3, 3), activation='relu', padding = 'same')(x)
    x = MaxPooling2D(pool_size=(3, 3), strides = 2)(x)
    x = Conv2D(192, (3, 3), activation='relu', padding = 'same')(x)
    x = Conv2D(192, (1, 1), activation='relu')(x)
    x = Conv2D(10, (1, 1))(x)
    x = GlobalAveragePooling2D()(x)
    x = Activation(activation='softmax')(x)
    
    model = Model(model_input, x, name='conv_pool_cnn')
    
    return model

In [14]:
conv_pool_cnn_model = conv_pool_cnn(model_input)

Here we list the layers of this model.

In [15]:
conv_pool_cnn_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 32, 32, 96)        2688      
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 32, 32, 96)        83040     
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 32, 32, 96)        83040     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 15, 15, 96)        0         
_________________________________________________________________
conv2d_11 (Conv2D)           (None, 15, 15, 192)       166080    
_________________________________________________________________
conv2d_12 (Conv2D)           (None, 15, 15, 192)       331968    
__________
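
Note that the pooling layer outputs 15x15 maps, not 16x16. `MaxPooling2D` uses 'valid' padding by default, so a window of size p with stride s maps a spatial size n to floor((n - p) / s) + 1. A small sketch of that rule, with a helper name of our own:

```python
def valid_pool_output(size, pool=3, stride=2):
    """Spatial size after pooling with 'valid' padding."""
    return (size - pool) // stride + 1

first_pool = valid_pool_output(32)            # after the first MaxPooling2D
second_pool = valid_pool_output(first_pool)   # after the second MaxPooling2D

print(first_pool, second_pool)
```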

We use the K80 GPU again to train this model.

In [16]:
_ = compile_and_train(conv_pool_cnn_model, num_epochs=20)

Train on 40000 samples, validate on 10000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [17]:
evaluate_error(conv_pool_cnn_model)

0.22070000000000001

## Model 3: ALL-CNN-C

Next, we use ALL-CNN-C, which also comes from the paper \[[Springenberg et al., 2015, Striving for Simplicity: The All Convolutional Net](https://arxiv.org/abs/1412.6806)\]. This model is similar to the previous one (ConvPool-CNN-C). The only difference is that convolutional layers with a stride of 2 are used in place of the max pooling layers.
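
One way to see why a stride-2 convolution can stand in for pooling is that it equals an ordinary convolution followed by keeping every second output, so it downsamples with learned weights. The NumPy sketch below checks this for a single channel with 'valid' padding; the helper `conv2d_valid` and the random inputs are hypothetical, and it is far slower than a real convolution layer.

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Single-channel cross-correlation of a square image with 'valid' padding."""
    k = kernel.shape[0]
    out = (image.shape[0] - k) // stride + 1
    result = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            # Window whose top-left corner sits at (i * stride, j * stride).
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k]
            result[i, j] = np.sum(patch * kernel)
    return result

rng = np.random.default_rng(0)
image = rng.normal(size=(9, 9))
kernel = rng.normal(size=(3, 3))

strided = conv2d_valid(image, kernel, stride=2)
subsampled = conv2d_valid(image, kernel, stride=1)[::2, ::2]

print(strided.shape, np.allclose(strided, subsampled))
```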

In [18]:
def all_cnn(model_input):
    
    x = Conv2D(96, kernel_size=(3, 3), activation='relu', padding = 'same')(model_input)
    x = Conv2D(96, (3, 3), activation='relu', padding = 'same')(x)
    x = Conv2D(96, (3, 3), activation='relu', padding = 'same', strides = 2)(x)
    x = Conv2D(192, (3, 3), activation='relu', padding = 'same')(x)
    x = Conv2D(192, (3, 3), activation='relu', padding = 'same')(x)
    x = Conv2D(192, (3, 3), activation='relu', padding = 'same', strides = 2)(x)
    x = Conv2D(192, (3, 3), activation='relu', padding = 'same')(x)
    x = Conv2D(192, (1, 1), activation='relu')(x)
    x = Conv2D(10, (1, 1))(x)
    x = GlobalAveragePooling2D()(x)
    x = Activation(activation='softmax')(x)
        
    model = Model(model_input, x, name='all_cnn')
    
    return model

In [19]:
all_cnn_model = all_cnn(model_input)

As before, we list the layers of this model.

In [20]:
all_cnn_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
conv2d_17 (Conv2D)           (None, 32, 32, 96)        2688      
_________________________________________________________________
conv2d_18 (Conv2D)           (None, 32, 32, 96)        83040     
_________________________________________________________________
conv2d_19 (Conv2D)           (None, 16, 16, 96)        83040     
_________________________________________________________________
conv2d_20 (Conv2D)           (None, 16, 16, 192)       166080    
_________________________________________________________________
conv2d_21 (Conv2D)           (None, 16, 16, 192)       331968    
_________________________________________________________________
conv2d_22 (Conv2D)           (None, 8, 8, 192)         331968    
__________

The K80 GPU is used here, too.

In [21]:
_ = compile_and_train(all_cnn_model, num_epochs=20)

Train on 40000 samples, validate on 10000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [22]:
evaluate_error(all_cnn_model)

0.26079999999999998

## Model 4: Network In Network CNN

The last CNN we will use is the Network in Network CNN \[[Lin et al., 2013, Network In Network](https://arxiv.org/abs/1312.4400)\]. It uses three multilayer perceptrons with 'relu' activation. The overall structure of NIN is a stack of multiple mlpconv layers. Instead of adopting the traditional fully connected layers for classification, NIN directly outputs the spatial average of the feature maps from the last mlpconv layer as the confidence of the categories via a global average pooling layer, and the resulting vector is then fed into the softmax layer.
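
The link between mlpconv layers and 1x1 convolutions can be illustrated as follows: a 1x1 convolution applies the same small dense layer to the channel vector at every pixel. The NumPy sketch below checks this equivalence on random data; the shapes and weights are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(28, 28, 32))   # hypothetical (H, W, C_in) activation
weights = rng.normal(size=(32, 32))        # 1x1 kernel viewed as a (C_in, C_out) matrix
bias = rng.normal(size=32)

# 1x1 convolution: mix the channels at every pixel independently.
conv_1x1 = np.einsum('hwc,cd->hwd', features, weights) + bias

# The same computation written as a dense layer applied to each pixel's channel vector.
dense_per_pixel = (features.reshape(-1, 32) @ weights + bias).reshape(28, 28, 32)

relu = np.maximum(conv_1x1, 0.0)           # the 'relu' activation used in the model

print(np.allclose(conv_1x1, dense_per_pixel), relu.shape)
```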

In [23]:
def nin_cnn(model_input):
    
    # multilayer perceptron 1
    x = Conv2D(32, (5, 5), activation='relu',padding='valid')(model_input)
    x = Conv2D(32, (1, 1), activation='relu')(x)
    x = Conv2D(32, (1, 1), activation='relu')(x)
    x = MaxPooling2D((2,2))(x)
    x = Dropout(0.5)(x)
    
    # multilayer perceptron 2
    x = Conv2D(64, (3, 3), activation='relu',padding='valid')(x)
    x = Conv2D(64, (1, 1), activation='relu')(x)
    x = Conv2D(64, (1, 1), activation='relu')(x)
    x = MaxPooling2D((2,2))(x)
    x = Dropout(0.5)(x)
    
    # multilayer perceptron 3
    x = Conv2D(128, (3, 3), activation='relu',padding='valid')(x)
    x = Conv2D(32, (1, 1), activation='relu')(x)
    x = Conv2D(10, (1, 1))(x)
    
    x = GlobalAveragePooling2D()(x)
    x = Activation(activation='softmax')(x)
    
    model = Model(model_input, x, name='nin_cnn')
    
    return model

In [24]:
nin_cnn_model = nin_cnn(model_input)

Let us output the layers.

In [25]:
nin_cnn_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
conv2d_26 (Conv2D)           (None, 28, 28, 32)        2432      
_________________________________________________________________
conv2d_27 (Conv2D)           (None, 28, 28, 32)        1056      
_________________________________________________________________
conv2d_28 (Conv2D)           (None, 28, 28, 32)        1056      
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 14, 14, 32)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 14, 14, 32)        0         
_________________________________________________________________
conv2d_29 (Conv2D)           (None, 12, 12, 64)        18496     
__________

From the model summary above, we can see that this model contains fewer trainable parameters than the previous three models. Since the model is smaller, we expect its training time to be shorter than that of the previous models.
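
The size difference can also be estimated directly from the layer definitions. The sketch below sums (k * k * c_in + 1) * c_out over the convolution layers as written in `strided_cnn` and `nin_cnn`; the helper functions are ours, and pooling and dropout layers are left out because they have no parameters.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per output channel for a square kernel."""
    return (kernel * kernel * in_channels + 1) * out_channels

def total_params(layers, in_channels=3):
    """Sum parameters over a list of (kernel, out_channels) convolution layers."""
    total = 0
    for kernel, out_channels in layers:
        total += conv_params(kernel, in_channels, out_channels)
        in_channels = out_channels   # the next layer sees this layer's channels
    return total

# (kernel, out_channels) for each Conv2D layer, in the order used in the functions.
strided_layers = [(3, 96), (3, 96), (3, 192), (3, 192), (3, 192), (1, 192), (1, 10)]
nin_layers = [(5, 32), (1, 32), (1, 32), (3, 64), (1, 64), (1, 64),
              (3, 128), (1, 32), (1, 10)]

strided_total = total_params(strided_layers)
nin_total = total_params(nin_layers)

print(strided_total, nin_total)
```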

In [26]:
_ = compile_and_train(nin_cnn_model, num_epochs=20)

Train on 40000 samples, validate on 10000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


We expect the error rate to be somewhat higher than for the other three models since this network is simpler.

In [27]:
evaluate_error(nin_cnn_model)

0.3543

## Improvement

We find that the error rates of these four models are higher than we expected. We therefore try combining them to see whether the combination reduces the error rate. This technique is called 'ensembling'. Here, we load the best weights saved for each model in the 'weights' directory.

In [29]:
strided_cnn_model = strided_cnn(model_input)
conv_pool_cnn_model = conv_pool_cnn(model_input)
all_cnn_model = all_cnn(model_input)
nin_cnn_model = nin_cnn(model_input)

strided_cnn_model.load_weights('weights/strided_cnn.19-0.08.hdf5')
conv_pool_cnn_model.load_weights('weights/conv_pool_cnn.29-0.10.hdf5')
all_cnn_model.load_weights('weights/all_cnn.30-0.08.hdf5')
nin_cnn_model.load_weights('weights/nin_cnn.30-0.93.hdf5')

models = [strided_cnn_model, conv_pool_cnn_model, all_cnn_model, nin_cnn_model]

The combined model uses the same input layer, which is shared between all previous models. At the top, it computes the average of the four models' outputs using an `Average()` merge layer.

In [30]:
def combine(models, model_input):
    
    outputs = [model.outputs[0] for model in models]
    y = Average()(outputs)
    
    model = Model(model_input, y, name='combine')
    
    return model
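
The averaging step itself is simple. The NumPy sketch below averages made-up softmax outputs from three hypothetical models, element by element, and then takes the argmax; it mirrors what the `Average()` layer does to the model outputs but does not use Keras.

```python
import numpy as np

# Hypothetical softmax outputs of three models for two samples and three classes.
model_outputs = np.array([
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    [[0.4, 0.5, 0.1], [0.1, 0.2, 0.7]],
    [[0.5, 0.2, 0.3], [0.2, 0.3, 0.5]],
])

ensemble = model_outputs.mean(axis=0)     # element-wise average over the models
ensemble_pred = ensemble.argmax(axis=1)   # predicted class per sample

print(ensemble)
print(ensemble_pred)
```

In this toy case the models disagree on both samples, and the averaged probabilities settle the vote while each row still sums to 1.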

In [31]:
combine_model = combine(models, model_input)

Now we list the layers of the combined model.

In [32]:
combine_model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
input_1 (InputLayer)             (None, 32, 32, 3)     0                                            
____________________________________________________________________________________________________
conv2d_94 (Conv2D)               (None, 28, 28, 32)    2432        input_1[0][0]                    
____________________________________________________________________________________________________
conv2d_95 (Conv2D)               (None, 28, 28, 32)    1056        conv2d_94[0][0]                  
____________________________________________________________________________________________________
conv2d_76 (Conv2D)               (None, 32, 32, 96)    2688        input_1[0][0]                    
___________________________________________________________________________________________

Let us print out the error rate to see whether combining the models helps.

In [33]:
evaluate_error(combine_model)

0.1915

## Conclusion

Even when combined, the models we mentioned above do not reach the accuracy we hoped for: the ensemble's error rate of 0.1915 is lower than that of each individual model (0.2207 to 0.3543) but still high. Nevertheless, it was a worthwhile exercise to build CNNs from the architectures described in each paper. There should be more sophisticated models that achieve better performance, and we will keep studying CNNs to find them. Besides, we only averaged the models' outputs; there may be a better way to combine them that gives a lower error rate.