# CSE5ML Lab 5A: Build a Convolutional Neural Network


## Part 1: Create a simple CNN

When we build a Convolutional Neural Network model, we need convolutional layers, max pooling layers and dense layers. To improve performance, we also include dropout. Below is a simple CNN model for the CIFAR-10 dataset.

Dropout is a regularization method proposed by Srivastava et al. in 2014. It is a simple yet effective way to prevent neural networks from overfitting. Dropout randomly selects a given percentage of neurons and ignores them during training. This means that their contribution to the activation of downstream neurons is temporarily removed on the forward pass, and no weight updates are applied to them on the backward pass.
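
The forward-pass behaviour described above can be sketched in plain Python. This is only an illustration, not the Keras implementation: the helper `dropout` and the 10-unit `layer_output` are hypothetical, and the sketch assumes the common "inverted dropout" variant, in which the surviving activations are scaled by `1 / (1 - rate)` during training so that nothing needs to change at inference time.

```python
import random

def dropout(activations, rate, training, rng):
    """Apply inverted dropout to a list of activations.

    During training each value is zeroed with probability `rate`, and the
    survivors are scaled by 1 / (1 - rate) so the expected sum is unchanged.
    At inference time the input is returned untouched.
    """
    if not training or rate == 0.0:
        return list(activations)
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(1)            # fixed seed so the run is repeatable
layer_output = [1.0] * 10         # hypothetical activations of a 10-unit layer

train_out = dropout(layer_output, rate=0.2, training=True, rng=rng)
test_out = dropout(layer_output, rate=0.2, training=False, rng=rng)

print(train_out)   # each entry is either dropped (0.0) or scaled up
print(test_out)    # identical to layer_output
```

With `rate=0.2`, as in the model below, roughly one activation in five is dropped on each training pass.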

### Load dataset and Preprocess data

In [5]:
# packages
import numpy as np
import tensorflow as tf
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.constraints import maxnorm
from keras.optimizers import SGD
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.utils import np_utils

# load data
(Inputs, Labels), (Test_Data, Test_Label) = cifar10.load_data() # notice the first line of importing packages
#Check the shape of the data


# normalize inputs from 0-255 to 0.0-1.0
# Neural networks process inputs using small weight values, and inputs with large integer values can disrupt or slow down the learning process. As such it is good practice to normalize the pixel values so that each pixel value has a value between 0 and 1.
Inputs = Inputs.astype('float32')
Test_Data = Test_Data.astype('float32')
Inputs = Inputs / 255.0
Test_Data = Test_Data / 255.0

# Encode the outputs with one hot coding
Labels = np_utils.to_categorical(Labels) #Converts a class vector (integers) to binary class matrix.
Test_Label = np_utils.to_categorical(Test_Label)
num_classes = Test_Label.shape[1]

print(Inputs.shape)
print(Labels.shape)
print(Test_Data.shape)
print(Test_Label.shape)

(50000, 32, 32, 3)
(50000, 10)
(10000, 32, 32, 3)
(10000, 10)
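
The label shape changes from one integer per image to a row of 10 values because of the one-hot encoding step. A minimal sketch of what that conversion does, using a hypothetical helper `to_one_hot` and made-up labels rather than the Keras utility itself:

```python
def to_one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows (lists of 0.0/1.0)."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0          # mark the position of the true class
        rows.append(row)
    return rows

# Hypothetical labels; CIFAR-10 has 10 classes, indexed 0..9.
example_labels = [3, 0, 9]
encoded = to_one_hot(example_labels, num_classes=10)
for label, row in zip(example_labels, encoded):
    print(label, row)
```

Each row contains a single 1.0, which is the form that the `categorical_crossentropy` loss used later expects.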


### Build a convolutional neural network model

More information about parameters settings in Conv2D can be found here: https://keras.io/api/layers/convolution_layers/convolution2d/
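
Two of those settings, `padding` and `strides`, decide the height and width of each layer's output. The sketch below applies the usual size rules for `'same'` and `'valid'` padding along one dimension; `conv_output_size` is a hypothetical helper written for this illustration, not part of Keras.

```python
import math

def conv_output_size(size, kernel, stride=1, padding="valid"):
    """Spatial output size of a convolution or pooling along one dimension."""
    if padding == "same":
        # The input is padded so that only the stride shrinks the output.
        return math.ceil(size / stride)
    if padding == "valid":
        # No padding: the window must fit entirely inside the input.
        return (size - kernel) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")

same = conv_output_size(32, kernel=3, padding="same")
valid = conv_output_size(32, kernel=3, padding="valid")
pooled = conv_output_size(32, kernel=2, stride=2, padding="valid")
print(same, valid, pooled)
```

This is why the 3x3 convolutions with `padding='same'` keep the 32x32 image size, while a 2x2 max pooling with stride 2 halves it to 16x16.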

In [11]:
# Build the model
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3), padding='same', activation='relu', kernel_constraint=maxnorm(3)))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same', kernel_constraint=maxnorm(3)))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(512, activation='relu', kernel_constraint=maxnorm(3)))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))

### Compile the model
Define the loss function, the optimizer and any additional evaluation metrics.
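
As a rough illustration of what the loss and the accuracy metric measure, here is a plain-Python sketch for one small batch. The two functions and the 3-class example data are hypothetical and only mirror the idea behind `categorical_crossentropy` and `accuracy`; they are not the Keras implementations.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    total = 0.0
    for target, probs in zip(y_true, y_pred):
        # Only the probability assigned to the true class contributes.
        total += -sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))
    return total / len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the target."""
    hits = sum(
        target.index(max(target)) == probs.index(max(probs))
        for target, probs in zip(y_true, y_pred)
    )
    return hits / len(y_true)

# Hypothetical 3-class batch: two samples, one-hot targets and softmax outputs.
y_true = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
y_pred = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]

print(categorical_crossentropy(y_true, y_pred))
print(accuracy(y_true, y_pred))
```

The first sample is classified correctly and the second is not, so the accuracy is 0.5, while the loss also reflects how confident each prediction was.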

#### Some additional information about optimizers

To understand the concept of optimizers, one usually begins with the most basic and popular one, Gradient Descent (used in the example below). The key to understanding the Gradient Descent algorithm (and optimizers in general) is the gradient, which indicates what a small change in a given parameter (here, a weight) would do to the loss function. Gradients are a measure of change: they are the connection between the loss function and the weights. In simple terms, they tell us what specific operation should be applied to the weights (examples: add 2.1, subtract 0.07, etc.) to reduce the loss (which usually increases the accuracy).
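
The idea can be sketched with a single weight and a toy loss. Everything in this block is a made-up illustration (the quadratic `loss`, the starting weight and the step size are assumptions chosen for clarity), not what Keras does internally for a full network.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

w = 0.0                 # hypothetical starting weight
learning_rate = 0.1     # hypothetical step size
for step in range(50):
    # The gradient says how the loss changes if w grows, so we step the other way.
    w -= learning_rate * gradient(w)

print(w, loss(w))
```

After the loop, `w` is very close to 3, where the loss is smallest. Stochastic gradient descent applies the same update to every weight, using gradients estimated from one mini-batch at a time.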

In [12]:
# Define optimizer
lrate = 0.002
epochs = 5
decay = lrate/epochs
sgd = SGD(learning_rate=lrate, momentum=0.7, decay=decay, nesterov=False) # Stochastic gradient descent optimizer; `lr` is a deprecated alias of `learning_rate`

# Compile model
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
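
The `decay` argument above shrinks the learning rate as training proceeds. The sketch below shows the effect under two assumptions: that the optimizer uses the time-based schedule of the legacy Keras optimizers, `lr / (1 + decay * iterations)`, applied per batch update rather than per epoch, and that training uses the `batch_size=60` from the training cell of this lab. The helper `decayed_lr` is hypothetical, not a Keras function.

```python
import math

lrate = 0.002
epochs = 5
decay = lrate / epochs                      # same value as in the cell above
steps_per_epoch = math.ceil(50000 / 60)     # 50,000 training images, batch_size=60

def decayed_lr(initial_lr, decay, iteration):
    """Time-based decay: the rate shrinks as the update count grows."""
    return initial_lr / (1.0 + decay * iteration)

for epoch in range(epochs + 1):
    iteration = epoch * steps_per_epoch
    print(epoch, iteration, decayed_lr(lrate, decay, iteration))
```

Under these assumptions the learning rate falls to well under half of its initial value by the end of the fifth epoch, so later updates are smaller than early ones.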

### Plot the model
It can help us understand the model structure, the output shape of each layer and the number of parameters in the model.
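
The values in the `Param #` column that `model.summary()` reports for this model can be reproduced by hand. The sketch below uses two hypothetical helpers, `conv2d_params` and `dense_params`, and assumes the layer settings from the model built above (3x3 kernels, 32 filters, a 512-unit dense layer and 10 classes).

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return (inputs + 1) * units

conv1 = conv2d_params(3, 3, 3, 32)        # first Conv2D on the RGB input
conv2 = conv2d_params(3, 3, 32, 32)       # second Conv2D on 32 feature maps
flat = 16 * 16 * 32                       # Flatten after the 2x2 max pooling
dense1 = dense_params(flat, 512)          # Dense(512)
dense2 = dense_params(512, 10)            # Dense(num_classes) with 10 classes

print(conv1, conv2, flat, dense1, dense2)
```

Almost all of the parameters sit in the first dense layer, because every one of the 8192 flattened features is connected to each of its 512 units.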

In [13]:
model.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_8 (Conv2D)           (None, 32, 32, 32)        896       
                                                                 
 conv2d_9 (Conv2D)           (None, 32, 32, 32)        9248      
                                                                 
 max_pooling2d_4 (MaxPooling  (None, 16, 16, 32)       0         
 2D)                                                             
                                                                 
 flatten_2 (Flatten)         (None, 8192)              0         
                                                                 
 dense_5 (Dense)             (None, 512)               4194816   
                                                                 
 dropout_2 (Dropout)         (None, 512)               0         
                                                      

### Train the model

In [14]:
tf.random.set_seed(1)
np.random.seed(1)

epochs = 5
# Fit the model
model.fit(Inputs, Labels, validation_data=(Test_Data, Test_Label), epochs=epochs, batch_size=60, verbose=1)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x206035bc4c0>

### Evaluate the trained model with testing dataset

In [15]:
# Final evaluation of the model
scores = model.evaluate(Test_Data, Test_Label, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

Accuracy: 47.51%


### Deeper CNN network and optimization
We can add more layers to build a more complex model. Below is an example of a deeper CNN model for the CIFAR-10 dataset.


In [6]:
# Packages
import numpy as np
import tensorflow as tf
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.constraints import maxnorm
from keras.optimizers import SGD
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.utils import np_utils

tf.random.set_seed(1)
np.random.seed(1)

# load data
(Inputs, Labels), (Test_Data, Test_Label) = cifar10.load_data()
# normalize inputs (so all pixel values are transformed from [0, 255] to [0.0, 1.0])
Inputs = Inputs.astype('float32')
Test_Data = Test_Data.astype('float32')
Inputs = Inputs / 255.0
Test_Data = Test_Data / 255.0
# Encode outputs
Labels = np_utils.to_categorical(Labels)
Test_Label = np_utils.to_categorical(Test_Label)
num_classes = Test_Label.shape[1]

#Print out the shapes
print(Inputs.shape)
print(Labels.shape)
print(Test_Data.shape)
print(Test_Label.shape)

# Build a deeper CNN model
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3), activation='relu', padding='same'))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D())
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D())
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(1024, activation='relu', kernel_constraint=maxnorm(3)))
model.add(Dense(512, activation='relu', kernel_constraint=maxnorm(3)))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))
# Compile model
epochs = 20
lrate = 0.001
decay = lrate/epochs
sgd = SGD(learning_rate=lrate, momentum=0.9, decay=decay, nesterov=False)  # `lr` is a deprecated alias of `learning_rate`
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
model.summary()
# Fit the model
model.fit(Inputs, Labels, validation_data=(Test_Data, Test_Label), epochs=epochs, batch_size=64)
# Final evaluation of the model
scores = model.evaluate(Test_Data, Test_Label, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

(50000, 32, 32, 3)
(50000, 10)
(10000, 32, 32, 3)
(10000, 10)
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 32, 32, 32)        896       
                                                                 
 conv2d_1 (Conv2D)           (None, 32, 32, 32)        9248      
                                                                 
 max_pooling2d (MaxPooling2D  (None, 16, 16, 32)       0         
 )                                                               
                                                                 
 conv2d_2 (Conv2D)           (None, 16, 16, 64)        18496     
                                                                 
 conv2d_3 (Conv2D)           (None, 16, 16, 64)        36928     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 8, 8, 64)         0    



Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Accuracy: 68.34%


## Part 2: Some popular network structures for image classification and how to load them

VGGNet, ResNet, Inception and Xception are four popular families of neural networks that have proven effective in image classification tasks. Keras ships with, among others, five popular model structures from these families, namely VGG16, VGG19, ResNet50, Inception V3 and Xception. You can train these models from scratch with newly initialized weights, or you can load weights pretrained on the ImageNet dataset (another popular benchmark dataset in image classification, which is much larger than the CIFAR10 dataset and whose images are typically resized to 224*224 for models such as VGG16). A pretrained model is often very helpful when the input data is similar and we do not want to spend a lot of time retraining the model from scratch.

If you are interested in more information about these models and implementation with Keras, you can check this link: https://www.pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception-xception-keras/

Here we use VGG16 as an example. The same steps can be applied in other models.

The model architecture of the original VGG16 is shown below:

    # Block 1
    x = layers.Conv2D(64, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block1_conv1')(img_input)
    x = layers.Conv2D(64, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block1_conv2')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)

    # Block 2
    x = layers.Conv2D(128, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block2_conv1')(x)
    x = layers.Conv2D(128, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block2_conv2')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)

    # Block 3
    x = layers.Conv2D(256, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block3_conv1')(x)
    x = layers.Conv2D(256, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block3_conv2')(x)
    x = layers.Conv2D(256, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block3_conv3')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)

    # Block 4
    x = layers.Conv2D(512, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block4_conv1')(x)
    x = layers.Conv2D(512, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block4_conv2')(x)
    x = layers.Conv2D(512, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block4_conv3')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x)

    # Block 5
    x = layers.Conv2D(512, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block5_conv1')(x)
    x = layers.Conv2D(512, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block5_conv2')(x)
    x = layers.Conv2D(512, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block5_conv3')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x)


    # Classification block
    x = layers.Flatten(name='flatten')(x)
    x = layers.Dense(4096, activation='relu', name='fc1')(x)
    x = layers.Dense(4096, activation='relu', name='fc2')(x)
    x = layers.Dense(classes, activation='softmax', name='predictions')(x)
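
To get a feel for the size of this architecture, the number of trainable parameters can be counted block by block. The sketch below assumes the standard ImageNet configuration, which the listing above does not state explicitly: a 224x224x3 `img_input` and `classes=1000`. `BLOCKS` and `vgg16_param_counts` are hypothetical names introduced only for this illustration.

```python
# (filters, number of conv layers) for the five blocks listed above
BLOCKS = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]

def vgg16_param_counts(height=224, width=224, channels=3, classes=1000):
    """Return (convolutional parameters, classifier parameters)."""
    conv_total = 0
    in_channels = channels
    for filters, n_layers in BLOCKS:
        for _ in range(n_layers):
            # 3x3 kernel weights for every input channel, plus one bias per filter
            conv_total += (3 * 3 * in_channels + 1) * filters
            in_channels = filters
        height, width = height // 2, width // 2    # 2x2 max pooling, stride 2
    flat = height * width * in_channels            # size of the Flatten output
    dense_total = (flat + 1) * 4096 + (4096 + 1) * 4096 + (4096 + 1) * classes
    return conv_total, dense_total

conv_total, dense_total = vgg16_param_counts()
print(conv_total, dense_total, conv_total + dense_total)
```

Under these assumptions the three dense layers of the classification block hold far more parameters than all thirteen convolutional layers together, which is one reason the classification block is often replaced with a smaller one for a new dataset.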

### Apply a pre-defined network structure for a task

### Load dataset and Preprocess data

In [17]:
# packages
import numpy as np
import tensorflow as tf
from keras.datasets import cifar10
from keras.utils import np_utils

# load data
(Inputs, Labels), (Test_Data, Test_Label) = cifar10.load_data() # notice the first line of importing packages

# normalize inputs from 0-255 to 0.0-1.0
# Neural networks process inputs using small weight values, and inputs with large integer values can disrupt or slow down the learning process. As such it is good practice to normalize the pixel values so that each pixel value has a value between 0 and 1.
Inputs = Inputs.astype('float32')
Test_Data = Test_Data.astype('float32')
Inputs = Inputs / 255.0
Test_Data = Test_Data / 255.0

# Encode the outputs with one hot coding
Labels = np_utils.to_categorical(Labels) #Converts a class vector (integers) to binary class matrix.
Test_Label = np_utils.to_categorical(Test_Label)
num_classes = Test_Label.shape[1]

### Load Model
Here we give two ways to load the model structure and train the model on the CIFAR10 dataset. First, if you do not mind how much time or how many resources training takes, you can train the model from scratch with newly initialized weights. The other way is to start from weights pretrained on a similar dataset. The first few layers trained on different datasets tend to learn similar things, which is why we can adopt weights trained on another dataset and then fine-tune them on our own. This can speed up training, so fewer epochs/iterations are needed.

Note: when you run the code below, you may find that your computer does not have enough resources to train these models. For that reason, I also show how to load pretrained weights for a model that has already been trained on CIFAR10, so you can see how this more complex network structure improves classification performance.

In [18]:
# train model from scratch

# load vgg model
from keras.applications.vgg16 import VGG16

# load the model
model = VGG16(weights=None, include_top=False, input_shape=(32, 32, 3))
model.summary()

Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 32, 32, 3)]       0         
                                                                 
 block1_conv1 (Conv2D)       (None, 32, 32, 64)        1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 32, 32, 64)        36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 16, 16, 64)        0         
                                                                 
 block2_conv1 (Conv2D)       (None, 16, 16, 128)       73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 16, 16, 128)       147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 8, 8, 128)         0     

In [20]:
# load pretrained model and finetune
# Note that we drop the 3 fully-connected layers at the top of the network which mainly act like classifiers to classify the extracted features from the convolutional layers, because we have a new dataset and we want to train a new classifier.

# load vgg model
from keras.applications.vgg16 import VGG16

# load the model
model = VGG16(weights="imagenet", include_top=False, input_shape=(32, 32, 3))
model.summary()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 32, 32, 3)]       0         
                                                                 
 block1_conv1 (Conv2D)       (None, 32, 32, 64)        1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 32, 32, 64)        36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 16, 16, 64)        0         
                                                                 
 block2_conv1 (Conv2D)       (None, 16, 16, 128)       73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 16, 16, 128)      

In [23]:
# load a pretrained model which has been fine-tuned on CIFAR10; note that this model uses only the first 3 blocks of the VGG16 model
# load vgg model
from keras.applications.vgg16 import VGG16
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
#from keras.engine import Model
from tensorflow.keras.models import Model

# load the model
model = VGG16(weights=None, include_top=False, input_shape=(32, 32, 3))
# Extract the last layer from third block of vgg16 model
last = model.get_layer('block3_pool').output
# Add classification layers on top of it
x = Flatten()(last)
x = Dense(256, activation='relu')(x)
x = Dropout(0.5)(x)
pred = Dense(10, activation='softmax')(x)
model = Model(model.input, pred)

# load pretrained weights
model.load_weights('cifar10-vgg16_model.h5')

# summarize the model
model.summary()

Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_4 (InputLayer)        [(None, 32, 32, 3)]       0         
                                                                 
 block1_conv1 (Conv2D)       (None, 32, 32, 64)        1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 32, 32, 64)        36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 16, 16, 64)        0         
                                                                 
 block2_conv1 (Conv2D)       (None, 16, 16, 128)       73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 16, 16, 128)       147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 8, 8, 128)         0   
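
The size of the new classification layers added on top of `block3_pool` can be checked by hand. This sketch assumes the 32x32 input and the layer sizes from the cell above (256 filters at the end of block 3, then `Dense(256)` and `Dense(10)`); the variable names are made up for the illustration.

```python
size = 32
for block in range(3):          # block1_pool, block2_pool, block3_pool
    size //= 2                  # each 2x2 pooling halves height and width

flat_features = size * size * 256          # block 3 ends with 256 filters
dense_256 = (flat_features + 1) * 256      # Dense(256) weights and biases
dense_10 = (256 + 1) * 10                  # Dense(10) weights and biases

print(size, flat_features, dense_256, dense_10)
```

Stopping after the third block leaves a 4x4 feature map, so the new classifier stays small compared with the 4096-unit dense layers of the original VGG16.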

### Compile the model

In [24]:
# Compile model
model.compile(loss='categorical_crossentropy', optimizer="adam", metrics=['accuracy'])

### Train the model
This will take some time because of the large model and the large amount of data. Since we already loaded the trained weights with `model.load_weights('cifar10-vgg16_model.h5')`, you do not need to train it again using the code below. It is included only to show how you can train or further fine-tune a model.

In [25]:
tf.random.set_seed(1)
np.random.seed(1)

epochs = 10
# Fit the model
model.fit(Inputs, Labels, validation_data=(Test_Data, Test_Label), epochs=epochs, batch_size=32, verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x2063043bc10>

You can save the model weights to reuse them next time.

In [26]:
model.save_weights('cifar10_vgg16_new_model.h5')

### Evaluate the model with testing dataset

In [27]:
# Final evaluation of the model
scores = model.evaluate(Test_Data, Test_Label, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

Accuracy: 78.76%
