In [15]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.constraints import maxnorm
from keras.optimizers import SGD, RMSprop

from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.utils import np_utils
import tensorflow
import numpy
from keras import backend as K
# Use channels-first ('th', Theano-style) image ordering: (channels, rows, cols)
K.set_image_dim_ordering('th')
# CIFAR-10 dataset loader and plotting library
from keras.datasets import cifar10
from matplotlib import pyplot
# init random seed for reproducibility
seed = 7
numpy.random.seed(seed)

## Restricting GPU memory usage

The code here should be added to any work you do on Volta.  If you don't, then your code will monopolize all available memory on each of the 4 GPUs on the machine, preventing others from working on it.  If you do **that**, you will be frowned upon.

The code in the next cell has the effect that:
1. Memory use will start off with some small fraction of the memory on each GPU.
1. It will grow if necessary (since `allow_growth` is set to `True`).
1. It will max out at 5% of the memory on each GPU.  Given the GPUs we have, this gives you 4 x 808 MB, which should be sufficient here.

In [4]:
########################
# Limit TensorFlow GPU use.
config = tensorflow.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.05
K.tensorflow_backend.set_session(tensorflow.Session(config=config))
########################
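
As a rough check of the arithmetic in the list above, here is a minimal sketch. It assumes each GPU reports about 16160 MB in total; that figure is an assumption, back-computed from 808 MB / 0.05, and is not stated anywhere else in this notebook.

```python
# Hypothetical per-GPU total, chosen so that 5% matches the 808 MB quoted above.
total_memory_mb = 16160
memory_fraction = 0.05   # same value as per_process_gpu_memory_fraction
num_gpus = 4

# The cap applies to each GPU separately, so the overall cap is the sum.
cap_per_gpu_mb = total_memory_mb * memory_fraction
cap_total_mb = cap_per_gpu_mb * num_gpus

print("Cap per GPU: %.0f MB" % cap_per_gpu_mb)
print("Cap across %d GPUs: %.0f MB" % (num_gpus, cap_total_mb))
```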

## Loading our data

The training and testing data are loaded with the dataset tools built into `keras`.
1. Normalizing the inputs divides each pixel value by the maximum possible value (255), so every input lies between 0.0 and 1.0.
1. This keeps the neural network from picking up on patterns based on the raw magnitude of the values rather than on connected patterns of similar color in a region.

In [5]:
# load data
(X_train, Y_train), (X_test, Y_test) = cifar10.load_data()

# normalizing inputs from 0-255 to 0.0-1.0
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train = X_train / 255.0
X_test = X_test / 255.0
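
To see what the normalization does on a small scale, here is a sketch that applies the same two steps to a tiny made-up array. The `pixels` array is hypothetical and stands in for one channel of one image.

```python
import numpy

# Hypothetical 2x2 single-channel "image" stored as unsigned 8-bit integers.
pixels = numpy.array([[0, 51], [102, 255]], dtype='uint8')

# Same two steps as the cell above: cast to float32, then divide by 255.
scaled = pixels.astype('float32') / 255.0

print(scaled.dtype)
print(scaled.min(), scaled.max())
```

Casting before dividing keeps the result in `float32`, which is half the size of the default `float64`.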

## Converting classification training and testing data to a one-hot vector form
`to_categorical` converts the column of integer class labels into a matrix with as many columns as there are classes. The number of rows stays the same.

In [6]:
# one hot encode the outputs
Y_train = np_utils.to_categorical(Y_train)
Y_test = np_utils.to_categorical(Y_test)
num_classes = Y_test.shape[1]
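
The effect of one-hot encoding can be illustrated with plain `numpy`. This is only a sketch of the result `to_categorical` produces, not the Keras implementation, and the `labels` array is made up for the example.

```python
import numpy

# Hypothetical integer labels shaped (n, 1), the same layout cifar10.load_data() returns.
labels = numpy.array([[0], [2], [1], [2]])
num_label_classes = int(labels.max()) + 1

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing the identity matrix by label gives one row per sample.
one_hot = numpy.eye(num_label_classes, dtype='float32')[labels.reshape(-1)]

print(one_hot.shape)   # (4, 3)
print(one_hot)
```

Taking `numpy.argmax(one_hot, axis=1)` recovers the original integer labels.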

## Building and evaluating our model

The initial model configuration was: 6 convolutions of size 3x3 with 32, 64, and 128 filters; 3 max pooling layers of size 2x2; 3 dropout layers of 0.2; 2 hidden layers of size 1024 and 512 with a maxnorm kernel constraint; 3 dropout layers into the hidden layers; categorical cross entropy loss with sgd as the optimizer. The cell below builds a smaller variant with 4 convolutions, 2 max pooling layers, and one hidden layer of size 512.

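The stochastic gradient descent (SGD) optimizer updates the weights in the direction that reduces the loss on each batch, with the step size set by the learning rate of the network.

The RMSprop optimizer is a different but related approach: it also takes gradient steps, but it scales the learning rate separately for each weight using a running average of recent squared gradients.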
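
To get a feel for how the `decay` argument of `SGD` shrinks the learning rate over training, here is a small sketch. It assumes the time-based rule used by the legacy Keras 2.x `SGD` optimizer, `lr / (1 + decay * iterations)`, where one iteration is one batch update. The numbers come from the compile and fit cells in this notebook; `decayed_lr` is a helper name made up for the example.

```python
import math

# Values taken from the compile and fit cells in this notebook.
lrate = 0.01
epochs = 25
decay = lrate / epochs
batch_size = 32
num_train_samples = 50000

# One iteration is one batch update, so an epoch has ceil(samples / batch) iterations.
steps_per_epoch = math.ceil(num_train_samples / batch_size)

def decayed_lr(iteration):
    """Assumed time-based decay: lr / (1 + decay * iteration)."""
    return lrate / (1.0 + decay * iteration)

for epoch in (0, 1, 5, 25):
    print("after epoch %2d: lr = %.6f" % (epoch, decayed_lr(epoch * steps_per_epoch)))
```

Because the decay is applied per batch rather than per epoch, the learning rate falls much faster than `lrate/epochs` might suggest at first glance.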

The loss function is categorical cross entropy, so the accuracy we measure is the matching categorical accuracy: the fraction of images for which the category chosen by our network is the correct one. The model is trained for 25 epochs with a batch size of 32.

1. Categorical cross entropy is the loss function used to compute the updates for the weights in the network. It is meant for classification problems in which each example belongs to exactly one of several categories.
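
As an illustration of the loss described above, here is a minimal `numpy` sketch of categorical cross entropy on one-hot targets. It is not the Keras implementation; the function name, the clipping constant `eps`, and the example predictions are all assumptions made for the example.

```python
import numpy

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    # Clip so that log() never sees an exact zero.
    y_pred = numpy.clip(y_pred, eps, 1.0)
    return float(numpy.mean(-numpy.sum(y_true * numpy.log(y_pred), axis=1)))

# Two samples, three classes, one-hot targets.
y_true = numpy.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
# Predictions that put most of the probability on the correct class.
confident = numpy.array([[0.05, 0.90, 0.05], [0.80, 0.10, 0.10]])
# Predictions that are close to uniform and favour the wrong class.
unsure = numpy.array([[0.40, 0.30, 0.30], [0.30, 0.40, 0.30]])

print(categorical_crossentropy(y_true, confident))
print(categorical_crossentropy(y_true, unsure))
```

Only the probability assigned to the correct class contributes to the loss, so the confident predictions score lower than the unsure ones.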

In [24]:
# Create the model
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(3, 32, 32), activation='relu', padding='same'))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

# Compile model
epochs = 25
lrate = 0.01
decay = lrate/epochs
sgd = SGD(lr=lrate, momentum=0.9, decay=decay, nesterov=False)
# opt = RMSprop(lr=0.0001, decay=1e-6)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
# summary() prints the table itself and returns None, so no print() is needed
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_21 (Conv2D)           (None, 32, 32, 32)        896       
_________________________________________________________________
conv2d_22 (Conv2D)           (None, 32, 32, 32)        9248      
_________________________________________________________________
max_pooling2d_10 (MaxPooling (None, 32, 16, 16)        0         
_________________________________________________________________
dropout_18 (Dropout)         (None, 32, 16, 16)        0         
_________________________________________________________________
conv2d_23 (Conv2D)           (None, 64, 16, 16)        18496     
_________________________________________________________________
conv2d_24 (Conv2D)           (None, 64, 16, 16)        36928     
_________________________________________________________________
max_pooling2d_11 (MaxPooling (None, 64, 8, 8)          0         
__________
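
The parameter counts in the summary above can be reproduced by hand. The sketch below assumes the usual count for a convolution layer with a bias term: one weight per kernel position per input channel, plus one bias, for each filter. `conv2d_param_count` is a helper name made up for the example.

```python
def conv2d_param_count(kernel_height, kernel_width, in_channels, filters):
    # Each filter has one weight per kernel position per input channel, plus one bias.
    return (kernel_height * kernel_width * in_channels + 1) * filters

print(conv2d_param_count(3, 3, 3, 32))    # conv2d_21: 896
print(conv2d_param_count(3, 3, 32, 32))   # conv2d_22: 9248
print(conv2d_param_count(3, 3, 32, 64))   # conv2d_23: 18496
print(conv2d_param_count(3, 3, 64, 64))   # conv2d_24: 36928
```

The pooling and dropout layers have no weights, which is why they show 0 parameters.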

## Training the model epoch by epoch

Resetting the random seed keeps the randomness applied to our network consistent between runs, so we can monitor how changes to the model affect the accuracy of the classifications.

The model is fit to the training data and validated against the testing data after each epoch, with a fixed batch size of 32.

In [25]:
numpy.random.seed(seed)
model.fit(X_train, Y_train, validation_data=(X_test, Y_test), epochs=epochs, batch_size=32)

Train on 50000 samples, validate on 10000 samples
Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<keras.callbacks.History at 0x7f8fe0745dd0>

## Evaluation of models on training data
Training data accuracy on the initial configuration (6 convolutions of size 3x3 with 32, 64, and 128 filters; 3 max pooling layers of size 2x2; 3 dropout layers of 0.2; 2 hidden layers of size 1024 and 512 with a maxnorm kernel constraint; 3 dropout layers into the hidden layers; categorical cross entropy with sgd as the optimizer) is 92.60%

Changing 2 of the convolutions to size 5x5, with no other changes, doesn't make much of an impact on the training accuracy, which comes to 92.57%

Training data accuracy on a different model with the RMSprop optimizer (4 convolutions of size 3x3 with 32 and 64 filters; 2 max pooling layers of size 2x2; 2 dropout layers of 0.25; a hidden layer of size 512 and an output layer of size num_classes; 1 dropout layer of 0.5 after the hidden layer; categorical cross entropy loss; RMSprop optimizer; 100 epochs; batch size of 32) is 84.80%

Training data accuracy on a similar model with the SGD optimizer (4 convolutions of size 3x3 with 32 and 64 filters; 2 max pooling layers of size 2x2; 2 dropout layers of 0.25; a hidden layer of size 512 and an output layer of size num_classes; 1 dropout layer of 0.5 after the hidden layer; categorical cross entropy loss; sgd optimizer; 50 epochs; batch size of 32) is 99.85%

Training data accuracy on the same model as above, trained for fewer epochs to avoid the overfitting found in the last iteration, is 92.49%

In [26]:
# Final evaluation of the model on training data
scores = model.evaluate(X_train, Y_train, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

Accuracy: 92.49%
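
The accuracy reported here counts a prediction as correct when the highest-scoring class matches the one-hot target. A minimal `numpy` sketch of that calculation follows; it is an illustration rather than the Keras implementation, and the arrays are made up for the example.

```python
import numpy

def categorical_accuracy(y_true, y_pred):
    """Fraction of rows where the predicted class equals the target class."""
    predicted = numpy.argmax(y_pred, axis=1)
    expected = numpy.argmax(y_true, axis=1)
    return float(numpy.mean(predicted == expected))

# Hypothetical one-hot targets and softmax outputs for four samples.
targets = numpy.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
outputs = numpy.array([[0.7, 0.2, 0.1],
                       [0.1, 0.8, 0.1],
                       [0.3, 0.4, 0.3],   # wrong: class 1 chosen, class 2 expected
                       [0.2, 0.5, 0.3]])

print("Accuracy: %.2f%%" % (categorical_accuracy(targets, outputs) * 100))
```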


## Evaluation of models on testing data
Testing data accuracy on the initial configuration (6 convolutions of size 3x3 with 32, 64, and 128 filters; 3 max pooling layers of size 2x2; 3 dropout layers of 0.2; 2 hidden layers of size 1024 and 512 with a maxnorm kernel constraint; 3 dropout layers into the hidden layers; categorical cross entropy with sgd as the optimizer) is 80.17%

Changing 2 of the convolutions to size 5x5, with no other changes, makes the categorical classification slightly worse at 79.77%

Testing data accuracy on a different model with the RMSprop optimizer (4 convolutions of size 3x3 with 32 and 64 filters; 2 max pooling layers of size 2x2; 2 dropout layers of 0.25; a hidden layer of size 512 and an output layer of size num_classes; 1 dropout layer of 0.5 after the hidden layer; categorical cross entropy loss; RMSprop optimizer; 100 epochs; batch size of 32) is 78.30%

Testing data accuracy on a similar model with the SGD optimizer (4 convolutions of size 3x3 with 32 and 64 filters; 2 max pooling layers of size 2x2; 2 dropout layers of 0.25; a hidden layer of size 512 and an output layer of size num_classes; 1 dropout layer of 0.5 after the hidden layer; categorical cross entropy loss; sgd optimizer; 50 epochs; batch size of 32) is 80.25%

Testing data accuracy on the same model as above, trained for fewer epochs to avoid the overfitting found in the last iteration, is 78.45%

In [27]:
# Final evaluation of the model on testing data
scores = model.evaluate(X_test, Y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

Accuracy: 78.45%
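
Putting the training and testing accuracies side by side makes the overfitting easier to see. The sketch below uses only the numbers reported in the two evaluation sections; the short configuration labels are shorthand made up for the example, and the 25-epoch figure for the last run is taken from the compile cell.

```python
# (label, training accuracy %, testing accuracy %) as reported in this notebook.
results = [
    ("initial, 3x3 convolutions, SGD", 92.60, 80.17),
    ("two 5x5 convolutions, SGD", 92.57, 79.77),
    ("smaller model, RMSprop, 100 epochs", 84.80, 78.30),
    ("smaller model, SGD, 50 epochs", 99.85, 80.25),
    ("smaller model, SGD, 25 epochs", 92.49, 78.45),
]

# The gap between training and testing accuracy is a simple overfitting indicator.
gaps = [(name, train_acc - test_acc) for name, train_acc, test_acc in results]

for (name, train_acc, test_acc), (_, gap) in zip(results, gaps):
    print("%-36s train %6.2f%%  test %6.2f%%  gap %5.2f" % (name, train_acc, test_acc, gap))
```

The 50-epoch SGD run has the largest gap even though it also has the best testing accuracy, which is consistent with the overfitting noted above.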
