# CIFAR Convolutional Neural Network
### CST 463<br>Jose Mijangos <br>Kirk Worley<br>Victoria Zamora

## Imported Modules
The CIFAR data set is popular for testing new image classification learning algorithms because the images are only 32 by 32 pixels and belong to ten classes, so training on this data set is relatively quick. Like many others have done, we will construct a convolutional neural network that classifies the CIFAR data set.

In [73]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

## Utility Functions
unpickle allows us to open the CIFAR file and access a dictionary containing the data and labels. 
shuffle_batches returns a random sample from given batch data.

In [74]:
def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return dict

def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx, :], y[batch_idx]
        yield X_batch, y_batch

## Load Data

The data was retrieved from Dr. Bruns Google Drive. It contains five training batches of 10000 instances, one test batch of 10000, and a ReadMe file describing the data.

In [75]:
base_path = 'C:/Users/jose/Desktop/cifar-10-batches-py/data_batch_'
filenames = [base_path + str(i) for i in range(1, 6)]
batches = [unpickle(batch) for batch in filenames]

cf_test_batch = unpickle('C:/Users/jose/Desktop/cifar-10-batches-py/test_batch')
X_test = cf_test_batch[b'data']
y_test = np.array(cf_test_batch[b'labels'])

first_done = False
for batch in batches:
    if not first_done:
        X_train = batch[b'data']
        y_train = np.array(batch[b'labels'])
        first_done = True
    else:
        X_train = np.vstack((X_train, batch[b'data']))
        y_train = np.concatenate((y_train, batch[b'labels']))

## Convolutional Neural Network Structure
Here we define a layout for our convolutional neural network.

In [76]:
height = 32
width = 32
channels = 3
n_inputs = height * width * channels

conv1_fmaps = 48
conv1_ksize = 2
conv1_stride = 1
conv1_pad = "SAME"

pool1_fmaps = conv1_fmaps

conv2_fmaps = 96
conv2_ksize = 2
conv2_stride = 2
conv2_pad = "SAME"

pool2_fmaps = conv2_fmaps

conv3_fmaps = 192
conv3_ksize = 2
conv3_stride = 1
conv3_pad = "SAME"

pool3_fmaps = conv3_fmaps

n_fc1 = 1024
n_fc2 = 512
n_fc3 = 256
n_fc4 = 128
n_fc5 = 64

n_outputs = 10

## Construction Phase
Now we actually wire the convolutional neural network together.

In [None]:
tf.reset_default_graph()
he_init = tf.contrib.layers.variance_scaling_initializer(mode="FAN_AVG")
training = tf.placeholder_with_default(False, shape=(), name='training')

with tf.name_scope("inputs"):
    X = tf.placeholder(tf.float32, shape=[None, n_inputs], name="X")
    X_reshaped = tf.reshape(X, shape=[-1, height, width, channels])
    y = tf.placeholder(tf.int32, shape=[None], name="y")

conv1 = tf.layers.conv2d(X_reshaped, filters=conv1_fmaps, kernel_size=conv1_ksize,
                         strides=conv1_stride, padding=conv1_pad,
                         activation=tf.nn.relu, name="conv1")

conv2 = tf.layers.conv2d(conv1, filters=conv1_fmaps, kernel_size=conv1_ksize,
                         strides=conv1_stride, padding=conv1_pad,
                         activation=tf.nn.relu, name="conv2")

norm1 = tf.nn.lrn(conv2, 3, bias=1.0, alpha=0.001/9.0, beta=0.75)

with tf.name_scope("pool1"):
    pool1 = tf.nn.max_pool(norm1, ksize=[1, 2, 2, 1], strides=[1, 1, 1, 1], padding="VALID")

conv3 = tf.layers.conv2d(pool1, filters=conv2_fmaps, kernel_size=conv2_ksize,
                         strides=conv2_stride, padding=conv2_pad,
                         activation=tf.nn.relu, name="conv3")

conv4 = tf.layers.conv2d(conv3, filters=conv2_fmaps, kernel_size=conv2_ksize,
                         strides=conv2_stride, padding=conv2_pad,
                         activation=tf.nn.relu, name="conv4")

norm2 = tf.nn.lrn(conv4, 3, bias=1.0, alpha=0.001/9.0, beta=0.75)

with tf.name_scope("pool2"):
    pool2 = tf.nn.max_pool(norm2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="VALID")

conv5 = tf.layers.conv2d(pool2, filters=conv3_fmaps, kernel_size=conv3_ksize,
                         strides=conv3_stride, padding=conv3_pad,
                         activation=tf.nn.relu, name="conv5")

norm5 = tf.nn.lrn(conv5, 3, bias=1.0, alpha=0.001/9.0, beta=0.75)

with tf.name_scope("pool3"):
    pool3 = tf.nn.max_pool(norm5, ksize=[1, 1, 1, 1], strides=[1, 2, 2, 1], padding="VALID")
    pool3_flat = tf.reshape(pool3, shape=[-1, pool3_fmaps * 2 * 2])

with tf.name_scope("fc1"):
    fc1 = tf.layers.dense(pool3_flat, n_fc1, activation=tf.nn.elu, name="fc1", kernel_initializer=he_init)
    fc1 = tf.layers.batch_normalization(fc1, True)
    fc1_drop = tf.layers.dropout(fc1, 0.3, training=training)
    
    fc2 = tf.layers.dense(fc1, n_fc2, activation=tf.nn.elu, name="fc2", kernel_initializer=he_init)
    fc2 = tf.layers.batch_normalization(fc2, True)
    fc2_drop = tf.layers.dropout(fc2, 0.3, training=training)
    
    fc3 = tf.layers.dense(fc2, n_fc3, activation=tf.nn.elu, name="fc3", kernel_initializer=he_init)
    fc3 = tf.layers.batch_normalization(fc3, True)
    fc3_drop = tf.layers.dropout(fc3, 0.3, training=training)
    
    fc4 = tf.layers.dense(fc3, n_fc4, activation=tf.nn.relu, name="fc4", kernel_initializer=he_init)
    fc4 = tf.layers.batch_normalization(fc4, True)
    fc4_drop = tf.layers.dropout(fc4, 0.3, training=training)
    
    fc5 = tf.layers.dense(fc4, n_fc5, activation=tf.nn.relu, name="fc5", kernel_initializer=he_init)
    fc5 = tf.layers.batch_normalization(fc5, True)
    fc5_drop = tf.layers.dropout(fc5, 0.3, training=training)

with tf.name_scope("output"):
    logits = tf.layers.dense(fc3_drop, n_outputs, name="output")
    Y_proba = tf.nn.softmax(logits, name="Y_proba")

with tf.name_scope("train"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y)
    loss = tf.reduce_mean(xentropy)
    optimizer = tf.train.AdamOptimizer()
    training_op = optimizer.minimize(loss)

with tf.name_scope("eval"):
    predict = tf.arg_max(logits, 1)
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.name_scope("init_and_save"):
    init = tf.global_variables_initializer()
    saver = tf.train.Saver() 

## Setup Training and Test Batches
The test data is takes up too much space and causes out of memory errors. Hence, we took only a subset of the test batch to be used in the execution phase. During validation, we also took a subset of the training batch to speed up training but since we are presenting our final model we are using all the training data.

In [None]:
for X_batch, y_batch in shuffle_batch(X_test, y_test, 5000):
    X_test_new, y_test_new = X_batch, y_batch
    break

for X_batch, y_batch in shuffle_batch(X_train, y_train, 50000):
    X_train_new, y_train_new = X_batch, y_batch
    break

## Execution Phase
Now let us see how well this convolutional neural network preforms during training.

In [None]:
from datetime import datetime
now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir = ""
logdir = "{}/run-{}/".format(root_logdir, now)

file_writer = tf.summary.FileWriter('logdir', tf.get_default_graph())
saver = tf.train.Saver()

n_epochs = 20
batch_size = 1000

trains = []
accs = []

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train_new, y_train_new, batch_size):
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch, training: True})
        
        acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
        print(epoch, "Train accuracy:", acc_train)

        acc_test = accuracy.eval(feed_dict={X: X_test_new, y: y_test_new})
        print(epoch, "Test accuracy:", acc_test)
        
        trains.append(acc_train)
        accs.append(acc_test)

    save_path = saver.save(sess, "./CIFAR_final_model")

file_writer.close()

0 Train accuracy: 0.375
0 Test accuracy: 0.3616
1 Train accuracy: 0.434
1 Test accuracy: 0.4306
2 Train accuracy: 0.475
2 Test accuracy: 0.468
3 Train accuracy: 0.518
3 Test accuracy: 0.469
4 Train accuracy: 0.496
4 Test accuracy: 0.4978
5 Train accuracy: 0.572
5 Test accuracy: 0.5102
6 Train accuracy: 0.557
6 Test accuracy: 0.521
7 Train accuracy: 0.582
7 Test accuracy: 0.5264
8 Train accuracy: 0.609
8 Test accuracy: 0.5288
9 Train accuracy: 0.613
9 Test accuracy: 0.5464
10 Train accuracy: 0.623
10 Test accuracy: 0.5374
11 Train accuracy: 0.653
11 Test accuracy: 0.5614
12 Train accuracy: 0.663
12 Test accuracy: 0.5714
13 Train accuracy: 0.683
13 Test accuracy: 0.573
14 Train accuracy: 0.696
14 Test accuracy: 0.5676
15 Train accuracy: 0.71
15 Test accuracy: 0.5794


We got nearly 56% accuracy on the test data on the final epoch, which is pretty good considering there are ten classes and the images are low resolution.
## Learning Curve for CNN

In [None]:
xs = [i for i in range(n_epochs)]

plt.plot(xs, trains, 'g', label='training')
plt.plot(xs, accs, 'r', label='validation')
plt.title('Training and Test Accuracy Over Time')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

From this learning curve it appears that our nueral network begins to overfit the training data after the second epoch. Also, the test accuracy plateaus after the eighth epoch. Perhaps more regularization is needed to prevent the model from overfitting.
## Model Prediction Examples

In [None]:
with tf.Session() as sess:
    saver.restore(sess, "./CIFAR_final_model")
    prediction = predict.eval(feed_dict={X: X_test_new, y: y_test_new})
    
print(prediction[:20])
print(y_test_new[:20])

For this example our model seems to confuse horses and deer as well as tucks and automobiles, which as understandable.

## Tuning Attempts
To try and tune the model, we did a multitude of things. We began by adding some dense layers close to the output layer of the model, which improved accuracy very slightly, but not very noticeably. Then we began tweaking the activation functions to use ELU, but this did not really offer any significant change.<br>
Next, we began adding convolutional layers to the model, which resulted in a decent increase to test accuracy, bringing it up to about .55 percent. We began to tweak the number of filter maps that each convolutional layer was using, which resulted in more consistent accuracies. Before, accuracies could vary wildly between runs of the net, but they were beginning to get more stable.<br>
We continued to tweak the layers further, by adjusting kernels and strides, resulting in the model doing fairly well on training data. Then we began to normalize our dense hidden layers with dropout and batch normalization to prevent the model from overfitting the training data. Our final model measures 63% accuracy on training data and 56% accuracy on the test data with early stopping.