<a href="https://colab.research.google.com/github/WesleyAldridge/HW2_MachineLearning/blob/master/HW2_MachineLearning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## HW2 - Convolutional Neural Network For CIFAR10 Data Set

#### Instructions from professor:

"The goal of this homework is to create a convolutional neural network for the CIFAR10 data set.

You should not use any pretrained convnets that come with Keras. You have to create and train your own convnets with Keras from scratch.

Make sure that the data is divided into:

- training set (80%)
- validation set (20%)
- test set.

Use the training set to train your neural networks. Evaluate their performance on the validation data set.

After trying several different architectures, choose the one that performs best on the validation set. Try at least four different architectures by using data augmentation, using dropout, varying the number of layers, the number of filters, etc.

Train this final architecture on the data from the training set and validation set and evaluate its performance on the test set.

Reevaluate your best architecture using k-fold validation with k=5, that is, the size of the validation fold is 20%. Does the accuracy/loss obtained by k-fold validation differ from the accuracy/loss obtained by simple hold-out validation?"

## Loading the CIFAR10 data set

**train_images.shape = (50000, 32, 32, 3)**

**test_images.shape = (10000, 32, 32, 3)**

50,000 training images (train_images), each 32x32 pixels with 3 color channels (R, G, B)

10,000 test images (test_images), each 32x32 pixels with 3 color channels (R, G, B)

In [2]:
from keras import layers
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.optimizers import RMSprop
from keras.datasets import cifar10
from keras.utils import to_categorical
import numpy as np
import matplotlib.pyplot as mpl
from math import ceil
from keras.preprocessing.image import ImageDataGenerator

(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()

train_images = train_images.astype('float32') / 255
test_images = test_images.astype('float32') / 255

train_labels = to_categorical(train_labels, 10)
test_labels = to_categorical(test_labels, 10)

train_images.shape, test_images[0][0][0]  # training-set shape, and the scaled RGB values of the first test image's top-left pixel

Using TensorFlow backend.


Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


((50000, 32, 32, 3),
 array([0.61960787, 0.4392157 , 0.19215687], dtype=float32))
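The loading cell scales pixel values to [0, 1] and one-hot encodes the labels with `to_categorical`. To show what those two steps produce, here is a NumPy-only sketch; random stand-in arrays replace the real download, and the helper names `one_hot` and `scale_pixels` are made up for illustration. It assumes labels arrive shaped (N, 1), as `cifar10.load_data()` returns them.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Turn integer class ids of shape (N,) or (N, 1) into float32 one-hot rows of shape (N, num_classes)."""
    ids = np.asarray(labels).reshape(-1)
    return np.eye(num_classes, dtype='float32')[ids]

def scale_pixels(images):
    """Map uint8 pixel values in [0, 255] to float32 values in [0, 1]."""
    return images.astype('float32') / 255

# Stand-in data with the same layout as cifar10.load_data(): labels are shaped (N, 1)
fake_labels = np.array([[6], [9], [0]])
fake_images = np.random.randint(0, 256, size=(3, 32, 32, 3), dtype=np.uint8)

print(one_hot(fake_labels))
print(scale_pixels(fake_images).shape, scale_pixels(fake_images).max() <= 1.0)
```

Each label row gets a single 1.0 in the column of its class, which is the target format `categorical_crossentropy` expects.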

## Making Basic CNN, Architecture 1

Example CNN from class notes:

Input Images > Conv2d (ReLU) > MaxPool > Conv2d (ReLU) > MaxPool > Fully Connected > Fully Connected (output) 

This consists of two convolution modules (a convolution with ReLU, followed by pooling) for feature extraction, and two fully connected layers for classification.
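To make the two operations inside each convolution module concrete, here is a naive NumPy sketch of a 'valid' convolution with stride 1 followed by ReLU, and a 2x2 max pool with stride 2. It is written for clarity, not speed, assumes the Keras defaults used below for `Conv2D` and `MaxPooling2D` (no padding), and is not how Keras computes these layers internally. The function names are hypothetical.

```python
import numpy as np

def conv2d_valid_relu(x, w, b):
    """x: (H, W, C_in) input, w: (k, k, C_in, C_out) filters, b: (C_out,) biases.
    Stride 1, no padding, then ReLU."""
    k = w.shape[0]
    h_out, w_out = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.empty((h_out, w_out, w.shape[3]), dtype=np.result_type(x, w))
    for i in range(h_out):
        for j in range(w_out):
            tile = x[i:i + k, j:j + k, :]                     # one k x k x C_in tile
            out[i, j, :] = np.tensordot(tile, w, axes=3) + b  # every filter applied to the tile
    return np.maximum(out, 0)                                 # ReLU

def max_pool_2x2(x):
    """2x2 max pooling with stride 2; an odd last row/column is dropped, as with 'valid' pooling."""
    h, w, c = x.shape
    h2, w2 = h // 2, w // 2
    x = x[:h2 * 2, :w2 * 2, :]
    return x.reshape(h2, 2, w2, 2, c).max(axis=(1, 3))

# One 32x32 RGB image through the first module of Architecture 1 (32 random 3x3 filters)
image = np.random.rand(32, 32, 3).astype('float32')
filters = np.random.randn(3, 3, 3, 32) * 0.1
feature_map = conv2d_valid_relu(image, filters, np.zeros(32))
pooled = max_pool_2x2(feature_map)
print(feature_map.shape, pooled.shape)   # (30, 30, 32) (15, 15, 32)
```

The convolution trims one pixel from each border (32 to 30) because no padding is used, and pooling halves each spatial dimension while keeping the depth set by the number of filters.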

In [0]:
CNN_basic = Sequential()

# "To start, the CNN receives an input feature map: a 3-dimensional matrix, where
# the size of the first two dimensions corresponds to the length and width of the
# images in pixels, and the size of the third dimension is 3 (corresponding to the
# 3 channels of a color image: red, green, and blue)." - class notes

# "A convolution extracts tiles of the input feature map, and applies filters to them
# to compute new features, producing an output feature map, or convolved feature
# (which may have a different size and depth than the input feature map)
# Convolutions are defined by two parameters:
# Size of the tiles that are extracted (typically 3x3 or 5x5 pixels).
# The depth of the output feature map, which corresponds to the number of filters
# that are applied." - class notes

# "After each convolution operation, the CNN applies a Rectified Linear Unit (ReLU)
# transformation to the convolved feature, in order to introduce nonlinearity into
# the model" - class notes
CNN_basic.add(layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu', input_shape=(32, 32, 3)))

# "After ReLU comes a pooling step, in which the CNN downsamples the convolved feature
# (to save on processing time), reducing the number of dimensions of the feature map,
# while still preserving the most critical feature information." - class notes

# "size of the max-pooling filter is typically 2x2 pixels" - from class notes
# the class notes talk about using a stride of 2, so I will use that here as well:
CNN_basic.add(layers.MaxPooling2D(pool_size=(2, 2), strides=2))

CNN_basic.add(layers.Conv2D(64, (3, 3), activation='relu'))
CNN_basic.add(layers.MaxPooling2D(pool_size=(2, 2), strides=2) )

CNN_basic.add(layers.Flatten())
CNN_basic.add(layers.Dense(512, activation='relu'))
# "Typically, the final fully-connected layer contains a softmax activation function,
# which outputs a probability value from 0 to 1 for each of the classification labels
# the model is trying to predict" - class notes
CNN_basic.add(layers.Dense(10, activation='softmax')) 
# CNN_basic.summary()

### Evaluating on validation set:
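The next cell compiles the model with RMSprop at its default settings. For reference, one RMSprop update written in NumPy, following the form used by Keras 2.x: keep a running average of squared gradients with decay `rho`, then divide the step by its square root (`epsilon=None` falls back to Keras's default of 1e-7). This is a sketch to show what `lr` and `rho` control, not Keras's own code; `rmsprop_step` is a hypothetical name.

```python
import numpy as np

def rmsprop_step(w, grad, avg_sq, lr=0.001, rho=0.9, eps=1e-7):
    """Return (updated weights, updated running average of squared gradients)."""
    avg_sq = rho * avg_sq + (1.0 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(avg_sq) + eps)
    return w, avg_sq

w = np.array([0.5, -0.5])
avg_sq = np.zeros_like(w)
grad = np.array([2.0, -0.01])   # one large and one tiny gradient
w_new, avg_sq = rmsprop_step(w, grad, avg_sq)
# On the first step from avg_sq = 0, both weights move by about lr / sqrt(1 - rho) ~ 0.0032,
# regardless of how large their gradients are
print(w - w_new)
```

Dividing by the root of the running average is what makes the step size roughly independent of gradient scale, which is why the default `lr=0.001` works across layers with very different gradient magnitudes.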

In [54]:
rmsprop = RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)  # default values: RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)
CNN_basic.compile(optimizer=rmsprop, loss='categorical_crossentropy', metrics=['accuracy'])

CNN_basic.fit(x=train_images, y=train_labels, batch_size=32, epochs=50, verbose=1, validation_split=0.2, shuffle=True)

Train on 40000 samples, validate on 10000 samples
Epoch 1/50
...
Epoch 50/50


<keras.callbacks.History at 0x7f753f334630>

There is overfitting: accuracy is much higher on the training data (97%) than on the validation data (69%).

## Network with Dropout added, Architecture 2
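Architecture 2 writes each dropout rate as `1 - .9`, i.e. keep 90% of activations and drop 10% (the argument to Keras's `Dropout` is the fraction to drop). Below is a NumPy sketch of inverted dropout, the scheme Keras's `Dropout` layer follows: during training, zero a random fraction `rate` of activations and scale the survivors by 1 / (1 - rate); at inference, pass activations through unchanged. The helper name is hypothetical.

```python
import numpy as np

def inverted_dropout(x, rate, rng, training=True):
    """Zero roughly `rate` of x during training and rescale the rest so the expected value is unchanged."""
    if not training or rate == 0.0:
        return x                                   # inference: identity
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob         # True for activations that survive
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
acts = np.ones(100_000)
dropped = inverted_dropout(acts, rate=1 - .9, rng=rng)
print((dropped == 0).mean(), dropped.mean())       # roughly 0.1 and 1.0
```

Because of the rescaling, no correction is needed at test time, which is why `evaluate` and `predict` can simply skip the dropout layers.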

In [0]:
CNN_dropout = Sequential()

CNN_dropout.add(layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu', input_shape=(32, 32, 3)))

CNN_dropout.add(layers.MaxPooling2D(pool_size=(2, 2), strides=2))
CNN_dropout.add(Dropout(1 - .9))  # rate = 0.1: randomly drop 10% of activations during training

CNN_dropout.add(layers.Conv2D(64, (3, 3), activation='relu'))
CNN_dropout.add(layers.MaxPooling2D(pool_size=(2, 2), strides=2) )
CNN_dropout.add(Dropout(1 - .9))

CNN_dropout.add(layers.Flatten())
CNN_dropout.add(layers.Dense(512, activation='relu'))
CNN_dropout.add(Dropout(1 - .9))

CNN_dropout.add(layers.Dense(10, activation='softmax')) 
# CNN_dropout.summary()

In [56]:
rmsprop = RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)  # default values: RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)
CNN_dropout.compile(optimizer=rmsprop, loss='categorical_crossentropy', metrics=['accuracy'])

CNN_dropout.fit(x=train_images, y=train_labels, batch_size=32, epochs=50, verbose=2, validation_split=0.2, shuffle=True)

Train on 40000 samples, validate on 10000 samples
Epoch 1/50
 - 12s - loss: 1.4741 - acc: 0.4752 - val_loss: 1.1561 - val_acc: 0.5927
Epoch 2/50
 - 11s - loss: 1.0974 - acc: 0.6166 - val_loss: 1.0064 - val_acc: 0.6481
Epoch 3/50
 - 11s - loss: 0.9479 - acc: 0.6729 - val_loss: 0.9147 - val_acc: 0.6875
Epoch 4/50
 - 11s - loss: 0.8412 - acc: 0.7107 - val_loss: 0.9151 - val_acc: 0.6948
Epoch 5/50
 - 11s - loss: 0.7555 - acc: 0.7396 - val_loss: 0.9307 - val_acc: 0.7013
Epoch 6/50
 - 11s - loss: 0.6920 - acc: 0.7681 - val_loss: 0.9372 - val_acc: 0.6861
Epoch 7/50
 - 11s - loss: 0.6390 - acc: 0.7849 - val_loss: 0.9116 - val_acc: 0.7148
Epoch 8/50
 - 11s - loss: 0.5982 - acc: 0.7997 - val_loss: 0.9397 - val_acc: 0.7080
Epoch 9/50
 - 11s - loss: 0.5690 - acc: 0.8103 - val_loss: 0.8930 - val_acc: 0.7197
Epoch 10/50
 - 11s - loss: 0.5454 - acc: 0.8187 - val_loss: 0.9343 - val_acc: 0.7060
Epoch 11/50
 - 11s - loss: 0.5315 - acc: 0.8261 - val_loss: 0.9405 - val_acc: 0.7181
Epoch 12/50
 - 11s - los

<keras.callbacks.History at 0x7f753f334b38>

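There is still overfitting: training accuracy (83%) is much higher than validation accuracy (66%), although the gap is smaller than for Architecture 1 (97% vs. 69%). Lower training accuracy is expected with dropout, since dropout is a regularizer and Keras reports training accuracy with dropout active. However, validation accuracy also dropped compared with the first architecture, and the partial log above already shows val_acc near 72% by epoch 9, so the later epochs appear to mostly add overfitting. So maybe dropout at this rate isn't the solution on its own?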
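The train/validation gap can be read directly off the `History` object that `fit` returns (e.g. `history = CNN_dropout.fit(...)`): its `history` attribute is a dict of per-epoch lists, and with this Keras version the keys are `acc` and `val_acc`, matching the log. A small hypothetical helper, applied here to the first 11 epochs printed above:

```python
def overfitting_report(hist, acc_key='acc', val_key='val_acc'):
    """hist: a dict shaped like History.history.
    Returns (best epoch, 1-based; best validation accuracy; train-minus-val gap at the last epoch)."""
    val = hist[val_key]
    best_idx = max(range(len(val)), key=val.__getitem__)
    final_gap = hist[acc_key][-1] - val[-1]
    return best_idx + 1, val[best_idx], final_gap

# Values copied from epochs 1-11 of the Architecture 2 log above
arch2 = {
    'acc':     [0.4752, 0.6166, 0.6729, 0.7107, 0.7396, 0.7681, 0.7849, 0.7997, 0.8103, 0.8187, 0.8261],
    'val_acc': [0.5927, 0.6481, 0.6875, 0.6948, 0.7013, 0.6861, 0.7148, 0.7080, 0.7197, 0.7060, 0.7181],
}
best_epoch, best_val, gap = overfitting_report(arch2)
print(best_epoch, best_val, round(gap, 4))   # 9 0.7197 0.108
```

Tracking the best epoch this way makes it easier to tell whether more epochs are helping or only widening the gap.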

## Network with additional layers and more filters on each layer, Architecture 3
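Adding a third convolution module shrinks the feature map that reaches `Flatten` considerably. The plain-Python sketch below tracks spatial size and parameter count, assuming the Keras defaults used throughout this notebook ('valid' padding and stride 1 for `Conv2D`, 2x2 pools with stride 2). Dropout layers have no weights, so they are left out. It is applied to Architecture 1 and to the Architecture 3 defined in the next cell; the helper names are hypothetical.

```python
def conv_out(size, kernel):
    """Spatial size after a Conv2D with 'valid' padding and stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2, stride=2):
    """Spatial size after a MaxPooling2D with 'valid' padding."""
    return (size - pool) // stride + 1

def summarize(input_hw, in_channels, conv_specs, dense_units):
    """conv_specs: list of (filters, kernel), each followed by a 2x2 max pool with stride 2.
    dense_units: sizes of the Dense layers after Flatten, output layer included.
    Returns (final spatial size, flattened length, total trainable parameters)."""
    hw, ch, params = input_hw, in_channels, 0
    for filters, kernel in conv_specs:
        params += (kernel * kernel * ch + 1) * filters   # weights + one bias per filter
        hw = pool_out(conv_out(hw, kernel))
        ch = filters
    flat = hw * hw * ch
    prev = flat
    for units in dense_units:
        params += (prev + 1) * units                     # weights + one bias per unit
        prev = units
    return hw, flat, params

arch1 = summarize(32, 3, [(32, 3), (64, 3)], [512, 10])
arch3 = summarize(32, 3, [(64, 3), (128, 3), (256, 3)], [512, 256, 128, 10])
print(arch1)   # (6, 2304, 1204682)
print(arch3)   # (2, 1024, 1061130)
```

Under these assumptions Architecture 3 ends with a 2x2x256 map and has somewhat fewer parameters than Architecture 1, because its first Dense layer receives 1,024 inputs instead of 2,304; `model.summary()` should report the same totals.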

In [0]:
CNN_expanded = Sequential()

CNN_expanded.add(layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu', input_shape=(32, 32, 3)))
CNN_expanded.add(layers.MaxPooling2D(pool_size=(2, 2), strides=2))
CNN_expanded.add(Dropout(1 - 0.95))  # rate = 0.05: randomly drop 5% of activations during training

CNN_expanded.add(layers.Conv2D(128, (3, 3), activation='relu'))
CNN_expanded.add(layers.MaxPooling2D(pool_size=(2, 2), strides=2) )
CNN_expanded.add(Dropout(1 - 0.95))

CNN_expanded.add(layers.Conv2D(256, (3, 3), activation='relu'))
CNN_expanded.add(layers.MaxPooling2D(pool_size=(2, 2), strides=2) )
CNN_expanded.add(Dropout(1 - 0.95))

CNN_expanded.add(layers.Flatten())
CNN_expanded.add(layers.Dense(512, activation='relu'))
CNN_expanded.add(Dropout(1 - 0.95))

CNN_expanded.add(layers.Dense(256, activation='relu'))
CNN_expanded.add(Dropout(1 - 0.95))

CNN_expanded.add(layers.Dense(128, activation='relu'))

CNN_expanded.add(layers.Dense(10, activation='softmax')) 
# CNN_expanded.summary()

In [83]:
rmsprop = RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)  # default values: RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)
CNN_expanded.compile(optimizer=rmsprop, loss='categorical_crossentropy', metrics=['accuracy'])

CNN_expanded.fit(x=train_images, y=train_labels, batch_size=32, epochs=20, verbose=2, validation_split=0.2, shuffle=True)

Train on 40000 samples, validate on 10000 samples
Epoch 1/20
 - 25s - loss: 1.6340 - acc: 0.4077 - val_loss: 1.2870 - val_acc: 0.5361
Epoch 2/20
 - 23s - loss: 1.2158 - acc: 0.5747 - val_loss: 1.1573 - val_acc: 0.5919
Epoch 3/20
 - 23s - loss: 1.0403 - acc: 0.6438 - val_loss: 0.9968 - val_acc: 0.6697
Epoch 4/20
 - 23s - loss: 0.9566 - acc: 0.6776 - val_loss: 1.0593 - val_acc: 0.6494
Epoch 5/20
 - 23s - loss: 0.9159 - acc: 0.6970 - val_loss: 0.9529 - val_acc: 0.6851
Epoch 6/20
 - 23s - loss: 0.9112 - acc: 0.7039 - val_loss: 0.9401 - val_acc: 0.6896
Epoch 7/20
 - 23s - loss: 0.9109 - acc: 0.7072 - val_loss: 0.8655 - val_acc: 0.7260
Epoch 8/20
 - 23s - loss: 0.9147 - acc: 0.7105 - val_loss: 1.0572 - val_acc: 0.6397
Epoch 9/20
 - 23s - loss: 0.9025 - acc: 0.7160 - val_loss: 1.1887 - val_acc: 0.6373
Epoch 10/20
 - 23s - loss: 0.9063 - acc: 0.7194 - val_loss: 0.9254 - val_acc: 0.7091
Epoch 11/20
 - 23s - loss: 0.9135 - acc: 0.7181 - val_loss: 0.9058 - val_acc: 0.7175
Epoch 12/20
 - 23s - los

<keras.callbacks.History at 0x7f753a18b6d8>

## Network with augmented images, Architecture 4
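The next cell configures Keras's `ImageDataGenerator` with random shifts, zoom, channel shift and horizontal flips. As a rough illustration of just two of these (flip and shift), here is a NumPy sketch. The integer shift of up to 3 pixels (roughly 10% of the 32-pixel width) and the edge-replicating fill are simplifying assumptions that approximate, rather than reproduce, the generator's `width_shift_range=0.1` and its default `fill_mode='nearest'`. `flip_and_shift` is a hypothetical name.

```python
import numpy as np

def flip_and_shift(img, rng, max_shift=3):
    """Randomly mirror an (H, W, C) image left-right, then translate it by up to max_shift pixels."""
    if rng.random() < 0.5:
        img = img[:, ::-1, :]                           # horizontal flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    m = max_shift
    # Pad by repeating edge pixels (close to fill_mode='nearest'), then crop a shifted window
    padded = np.pad(img, ((m, m), (m, m), (0, 0)), mode='edge')
    h, w = img.shape[:2]
    return padded[m + dy:m + dy + h, m + dx:m + dx + w, :]

rng = np.random.default_rng(42)
image = np.random.rand(32, 32, 3).astype('float32')
augmented = flip_and_shift(image, rng)
print(augmented.shape)   # (32, 32, 3)
```

Each epoch therefore sees a slightly different version of every training image, while the label stays the same.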

In [0]:
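aug_train = train_images[0:40000]          # first 40,000 images: the same training part that validation_split=0.2 uses
aug_validate = train_images[40000:50000]   # last 10,000 images, held out for validation

#datagen = ImageDataGenerator(rotation_range=10, width_shift_range=0.1, height_shift_range=0.1, shear_range=0.15, zoom_range=0.1, channel_shift_range=10., horizontal_flip=True)
datagen = ImageDataGenerator(width_shift_range=0.1, height_shift_range=0.1, zoom_range=0.1, channel_shift_range=0.1, horizontal_flip=True)

# fit() only computes statistics needed by featurewise_center, featurewise_std_normalization
# or zca_whitening. None of these is enabled here, so this call does not change the augmentation.
datagen.fit(aug_train)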

In [0]:
CNN_augmented = Sequential()

CNN_augmented.add(layers.Conv2D(filters=32, kernel_size=(5, 5), activation='relu', input_shape=(32, 32, 3)))

CNN_augmented.add(layers.MaxPooling2D(pool_size=(2, 2), strides=2))
#CNN_augmented.add(Dropout(1 - .9))

CNN_augmented.add(layers.Conv2D(64, (3, 3), activation='relu'))
CNN_augmented.add(layers.MaxPooling2D(pool_size=(2, 2), strides=2) )
#CNN_augmented.add(Dropout(1 - .9))

CNN_augmented.add(layers.Flatten())
CNN_augmented.add(layers.Dense(512, activation='relu'))
#CNN_augmented.add(Dropout(1 - .9))

CNN_augmented.add(layers.Dense(10, activation='softmax')) 
# CNN_augmented.summary()

In [81]:
rmsprop = RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)  # default values: RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)
CNN_augmented.compile(optimizer=rmsprop, loss='categorical_crossentropy', metrics=['accuracy'])

# Train on augmented batches drawn from the first 40,000 images and validate on the
# unaugmented last 10,000, the same hold-out split that validation_split=0.2 gives
CNN_augmented.fit_generator(datagen.flow(train_images[0:40000], train_labels[0:40000], batch_size=32),
                    steps_per_epoch=ceil(40000 / 32), epochs=20, verbose=1,
                    validation_data=(train_images[40000:50000], train_labels[40000:50000]))

Epoch 1/20
...
Epoch 20/20


<keras.callbacks.History at 0x7f753ac675f8>
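Architectures 1 to 3 use `validation_split=0.2`, while Architecture 4 slices the arrays by hand. Both hold out the same images: Keras takes the validation data from the last `validation_split` fraction of `x` and `y` before any shuffling (`shuffle=True` only reorders the training part). A sketch of that split, with the hypothetical helper name `holdout_split`:

```python
import numpy as np

def holdout_split(x, y, validation_split=0.2):
    """Hold out the last `validation_split` fraction of samples, as fit(validation_split=...) does."""
    split_at = int(len(x) * (1.0 - validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

# Stand-in arrays with the length of CIFAR10's training set
x = np.arange(50000)
y = np.arange(50000)
(x_tr, y_tr), (x_val, y_val) = holdout_split(x, y)
print(len(x_tr), len(x_val), x_val[0])   # 40000 10000 40000
```

So the validation accuracies of all four architectures are measured on the same 10,000 images, which keeps the comparison between them fair.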

### "Train this final architecture on the data from the training set and validation set and evaluate its performance on the test set."

In [87]:
CNN_expanded = Sequential()

CNN_expanded.add(layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu', input_shape=(32, 32, 3)))
CNN_expanded.add(layers.MaxPooling2D(pool_size=(2, 2), strides=2))
CNN_expanded.add(Dropout(1 - 0.95))

CNN_expanded.add(layers.Conv2D(128, (3, 3), activation='relu'))
CNN_expanded.add(layers.MaxPooling2D(pool_size=(2, 2), strides=2) )
CNN_expanded.add(Dropout(1 - 0.95))

CNN_expanded.add(layers.Conv2D(256, (3, 3), activation='relu'))
CNN_expanded.add(layers.MaxPooling2D(pool_size=(2, 2), strides=2) )
CNN_expanded.add(Dropout(1 - 0.95))

CNN_expanded.add(layers.Flatten())
CNN_expanded.add(layers.Dense(512, activation='relu'))
CNN_expanded.add(Dropout(1 - 0.95))

CNN_expanded.add(layers.Dense(256, activation='relu'))
CNN_expanded.add(Dropout(1 - 0.95))

CNN_expanded.add(layers.Dense(128, activation='relu'))

CNN_expanded.add(layers.Dense(10, activation='softmax')) 


rmsprop = RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)  # default values: RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)
CNN_expanded.compile(optimizer=rmsprop, loss='categorical_crossentropy', metrics=['accuracy'])


CNN_expanded.fit(x=train_images, y=train_labels, batch_size=32, epochs=20, verbose=1, shuffle=True)
score = CNN_expanded.evaluate(test_images, test_labels, batch_size=32)
print()
print(str(score[1] * 100) + "% accuracy")

Epoch 1/20
...
Epoch 20/20

67.85% accuracy
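`evaluate` returns `[loss, accuracy]` for the metrics passed to `compile`, so `score[1]` is the test accuracy. With one-hot labels that accuracy is the fraction of images whose highest-probability class matches the labelled class. Here is a NumPy sketch with made-up probabilities; for the real model, `probs` would come from `CNN_expanded.predict(test_images)`. The helper name is hypothetical.

```python
import numpy as np

def categorical_accuracy(probs, one_hot_labels):
    """Fraction of rows whose argmax prediction equals the argmax of the one-hot label."""
    return float(np.mean(np.argmax(probs, axis=1) == np.argmax(one_hot_labels, axis=1)))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.3, 0.4, 0.3],
                  [0.5, 0.1, 0.4]])
labels = np.eye(3)[[0, 2, 0, 2]]
print("{:.2f}% accuracy".format(categorical_accuracy(probs, labels) * 100))   # 50.00% accuracy
```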


### Reevaluate your best architecture using k-fold validation with k=5, that is, the size of the validation fold is 20%. Does the accuracy/loss obtained by k-fold validation differ from the accuracy/loss obtained by simple hold-out validation?
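The next cell builds the five contiguous 10,000-image folds by hand. The same splits can be generated in a loop. Here is a NumPy sketch (the helper name `contiguous_kfold` is hypothetical) that yields training and validation indices for each fold and reproduces, for example, `train2 = np.concatenate((train_images[0:10000], train_images[20000:50000]))`. It assumes the sample count divides evenly by k, as 50,000 does by 5.

```python
import numpy as np

def contiguous_kfold(n_samples, k=5):
    """Yield (train_idx, val_idx) for k contiguous, equally sized folds (no shuffling)."""
    fold_size = n_samples // k
    all_idx = np.arange(n_samples)
    for i in range(k):
        val_idx = all_idx[i * fold_size:(i + 1) * fold_size]
        train_idx = np.concatenate((all_idx[:i * fold_size], all_idx[(i + 1) * fold_size:]))
        yield train_idx, val_idx

splits = list(contiguous_kfold(50000, k=5))
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx), val_idx[0])
```

The indices can then select data directly, e.g. `train_images[train_idx]` and `train_labels[val_idx]`. scikit-learn's `KFold(n_splits=5)` without shuffling produces the same contiguous folds.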

In [4]:
# Folds (validation sets):
fold1 = train_images[0:10000]
fold2 = train_images[10000:20000]
fold3 = train_images[20000:30000]
fold4 = train_images[30000:40000]
fold5 = train_images[40000:50000]
folds = [fold1,fold2,fold3,fold4,fold5] 

fold_label_set1 = train_labels[0:10000]
fold_label_set2 = train_labels[10000:20000]
fold_label_set3 = train_labels[20000:30000]
fold_label_set4 = train_labels[30000:40000]
fold_label_set5 = train_labels[40000:50000]
fold_labels = [fold_label_set1,fold_label_set2,fold_label_set3,fold_label_set4,fold_label_set5] 

# Train sets
train1 = train_images[10000:50000]
train2 = np.concatenate((train_images[0:10000], train_images[20000:50000]))
train3 = np.concatenate((train_images[0:20000], train_images[30000:50000]))
train4 = np.concatenate((train_images[0:30000], train_images[40000:50000]))
train5 = train_images[0:40000]
train_sets = [train1,train2,train3,train4,train5]

trainlabel1 = train_labels[10000:50000]
trainlabel2 = np.concatenate((train_labels[0:10000], train_labels[20000:50000]))
trainlabel3 = np.concatenate((train_labels[0:20000], train_labels[30000:50000]))
trainlabel4 = np.concatenate((train_labels[0:30000], train_labels[40000:50000]))
trainlabel5 = train_labels[0:40000]
train_label_sets = [trainlabel1,trainlabel2,trainlabel3,trainlabel4,trainlabel5]


fold_scores = []  # [val_loss, val_acc] for each of the 5 folds

for fold, fold_label_set, train_set, train_label_set in zip(folds, fold_labels, train_sets, train_label_sets):

    # Unlike the hold-out runs of Architecture 3 above, this loop also applies data augmentation
    datagen = ImageDataGenerator(width_shift_range=0.1, height_shift_range=0.1, zoom_range=0.1, channel_shift_range=0.1, horizontal_flip=True)
    datagen.fit(train_set)

    # Build a fresh, untrained copy of Architecture 3 for every fold
    CNN_final = Sequential()
    CNN_final.add(layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu', input_shape=(32, 32, 3)))
    CNN_final.add(layers.MaxPooling2D(pool_size=(2, 2), strides=2))
    CNN_final.add(Dropout(1 - 0.95))
    CNN_final.add(layers.Conv2D(128, (3, 3), activation='relu'))
    CNN_final.add(layers.MaxPooling2D(pool_size=(2, 2), strides=2) )
    CNN_final.add(Dropout(1 - 0.95))
    CNN_final.add(layers.Conv2D(256, (3, 3), activation='relu'))
    CNN_final.add(layers.MaxPooling2D(pool_size=(2, 2), strides=2) )
    CNN_final.add(Dropout(1 - 0.95))
    CNN_final.add(layers.Flatten())
    CNN_final.add(layers.Dense(512, activation='relu'))
    CNN_final.add(Dropout(1 - 0.95))
    CNN_final.add(layers.Dense(256, activation='relu'))
    CNN_final.add(Dropout(1 - 0.95))
    CNN_final.add(layers.Dense(128, activation='relu'))
    CNN_final.add(layers.Dense(10, activation='softmax')) 

    rmsprop = RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)
    CNN_final.compile(optimizer=rmsprop, loss='categorical_crossentropy', metrics=['accuracy'])

    CNN_final.fit_generator(datagen.flow(train_set, train_label_set, batch_size=32), steps_per_epoch=ceil(len(train_set) / 32), epochs=20, verbose=1, validation_data=(fold, fold_label_set))
    # Record this fold's final validation loss/accuracy so the folds can be averaged
    fold_scores.append(CNN_final.evaluate(fold, fold_label_set, batch_size=32, verbose=0))
    print()

fold_scores = np.array(fold_scores)
print("5-fold mean val_loss: %.4f, mean val_acc: %.4f" % (fold_scores[:, 0].mean(), fold_scores[:, 1].mean()))

Epoch 1/20
...
Epoch 20/20

Epoch 1/20
...
Epoch 20/20

Epoch 1/20
...
Epoch 20/20

Epoch 1/20
...
Epoch 20/20

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 