<a href="https://colab.research.google.com/github/dangrenell/CIFAR/blob/master/grenell_cifar.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

I'm using TensorFlow 2 here because it was recently released and I want to learn the conventions of the latest release. I don't think there's much, if anything, here that won't work with previous versions of TensorFlow. I grabbed the CIFAR-10 data directly from Keras because it was easiest. I experimented with other libraries (many of which were deleted) and left a few in for later experimentation.

In [1]:
!pip install tensorflow-gpu==2.0.0-alpha0
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
# from tensorflow.keras import utils as np_utils
# from keras.preprocessing.image import ImageDataGenerator

Collecting tensorflow-gpu==2.0.0-alpha0
  Downloading https://files.pythonhosted.org/packages/1a/66/32cffad095253219d53f6b6c2a436637bbe45ac4e7be0244557210dc3918/tensorflow_gpu-2.0.0a0-cp36-cp36m-manylinux1_x86_64.whl (332.1MB)
    100% |████████████████████████████████| 332.1MB 68kB/s
Collecting tb-nightly<1.14.0a20190302,>=1.14.0a20190301 (from tensorflow-gpu==2.0.0-alpha0)
  Downloading https://files.pythonhosted.org/packages/a9/51/aa1d756644bf4624c03844115e4ac4058eff77acd786b26315f051a4b195/tb_nightly-1.14.0a20190301-py3-none-any.whl (3.0MB)
    100% |████████████████████████████████| 3.0MB 8.4MB/s
Collecting tf-estimator-nightly<1.14.0.dev2019030116,>=1.14.0.dev2019030115 (from tensorflow-gpu==2.0.0-alpha0)
  Downloading https://files.pythonhosted.org/packages/13/82/f16063b4eed210dc2ab057930ac1da4fbe1e91b7b051a6c8370b401e6ae7/tf_estimator_nightly-1.14.0.dev2019030115-py2.py3-none-any.whl (411kB)
    100% |████████████████████████████████| 419kB 12.3MB/s

Using TensorFlow backend.


I'm just grabbing the data here. Since each pixel in each channel ranges from 0 to 255, I centered the data at its mean and scaled it by its standard deviation. The training and test sets are each standardized with their own statistics.

In [0]:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# x_train, x_test = x_train / 255.0, x_test / 255.0

x_train = (x_train - x_train.mean()) / (x_train.std() + 1e-8)
x_test = (x_test - x_test.mean()) / (x_test.std() + 1e-8)

# y_train, y_test = tf.one_hot(y_train, 10), tf.one_hot(y_test, 10)

# y_train = np_utils.to_categorical(y_train, 10)
# y_test = np_utils.to_categorical(y_test, 10)

# datagen = ImageDataGenerator(
#     featurewise_center=True,
#     featurewise_std_normalization=True,
#     rotation_range=20,
#     width_shift_range=0.2,
#     height_shift_range=0.2,
#     horizontal_flip=True)

# # compute quantities required for featurewise normalization
# # (std, mean, and principal components if ZCA whitening is applied)
# datagen.fit(x_train)

# # fits the model on batches with real-time data augmentation:
# model.fit_generator(datagen.flow(x_train, y_train, batch_size=32),
#                     steps_per_epoch=len(x_train) / 32, epochs=10)
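The cell above standardizes the test set with its own mean and standard deviation. A common alternative is to reuse the training statistics so that both splits go through exactly the same transform. The sketch below shows that variant with NumPy on synthetic stand-in arrays; the `standardize` helper and the random data are illustrative assumptions, not part of the original notebook, and the reported accuracy was not produced this way.

```python
import numpy as np

def standardize(train, test, eps=1e-8):
    """Scale both splits with the mean and std of the training split only."""
    train = train.astype("float32")
    test = test.astype("float32")
    mean = train.mean()
    std = train.std()
    return (train - mean) / (std + eps), (test - mean) / (std + eps)

# Synthetic stand-ins with the CIFAR-10 image shape (32x32 RGB, values 0..255).
rng = np.random.default_rng(0)
fake_train = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)
fake_test = rng.integers(0, 256, size=(20, 32, 32, 3), dtype=np.uint8)

train_scaled, test_scaled = standardize(fake_train, fake_test)
print(train_scaled.mean(), train_scaled.std())
```

With the real data, the same helper would be called as `standardize(x_train, x_test)`.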

The model is built, compiled, and fit in this cell. A ReLU activation was chosen both for speed and because experiments with other activation functions (leaky ReLU and ELU) led to negligible improvements.
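For reference, the three activations mentioned above differ only in how they treat negative inputs. This is a minimal NumPy sketch of the standard definitions, not the Keras implementations; the leaky ReLU slope of 0.1 matches the commented-out line in the cell below, and the ELU `alpha=1.0` is an assumed default.

```python
import numpy as np

def relu(x):
    # Negative inputs are clipped to zero.
    return np.maximum(x, 0.0)

def leaky_relu(x, alpha=0.1):
    # Negative inputs keep a small linear slope instead of being zeroed.
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # Negative inputs saturate smoothly toward -alpha.
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

x = np.array([-2.0, -0.5, 0.0, 1.5])
for fn in (relu, leaky_relu, elu):
    print(fn.__name__, fn(x))
```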

A CNN was chosen because convolutions capture two-dimensional features of the image. The final model was selected after experimenting with the following hyperparameters: number of convolutional layers before pooling, kernel size, number of filters, number of max pooling layers, batch normalization vs. dropout, dropout rate, number of convolutional-pooling blocks, number of fully connected layers, and width of fully connected layers.

The VGG network architecture has been successful using only 3x3 convolutional windows. After experimenting with various window sizes and orders, I adopted that practice. The number of filters increases in each convolutional-pooling block: the lower layers capture simpler features, and as complexity rises there may be more features to capture.
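A bit of arithmetic helps make the 3x3 choice and the block structure concrete. The sketch below uses plain Python with a hypothetical `conv_params` helper. It first compares the weight count of two stacked 3x3 layers (which together see a 5x5 window) against a single 5x5 layer, then traces the feature-map size through the three conv-conv-pool blocks of the model in the next cell, assuming `'same'` padding and 2x2 pooling as used there. Batch normalization parameters are left out of the count.

```python
def conv_params(kernel, in_ch, out_ch):
    """Weights plus one bias per filter for a square 2D convolution."""
    return kernel * kernel * in_ch * out_ch + out_ch

# Two stacked 3x3 layers cover a 5x5 window with fewer weights than one 5x5 layer.
channels = 64
stacked_3x3 = 2 * conv_params(3, channels, channels)
single_5x5 = conv_params(5, channels, channels)
print("two 3x3 layers:", stacked_3x3, "one 5x5 layer:", single_5x5)

# Trace feature-map sizes through three conv-conv-pool blocks.
size, in_ch = 32, 3          # CIFAR-10 input: 32x32 pixels, 3 channels
total_conv_params = 0
for filters in (32, 64, 128):
    total_conv_params += conv_params(3, in_ch, filters)    # first conv in block
    total_conv_params += conv_params(3, filters, filters)  # second conv in block
    size //= 2                                             # 2x2 max pooling
    in_ch = filters
    print(f"after block with {filters} filters: {size}x{size}x{filters}")

flattened = size * size * in_ch
print("flattened features:", flattened)
print("conv parameters:", total_conv_params)
```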

A high dropout rate was used after the fully connected layer to prevent overreliance on any particular output of the dense layer. Batch normalization is used after the convolutional layers to speed up training. Because batch normalization itself introduces noise, no dropout was included at these points. All other parameters were set experimentally.
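To illustrate what a dropout rate of 0.5 does during training, here is a minimal NumPy sketch of inverted dropout: units are zeroed at random and the survivors are scaled by `1 / (1 - rate)` so the expected activation stays the same. The `dropout` function is a hypothetical stand-in written for illustration, not the Keras layer itself.

```python
import numpy as np

def dropout(x, rate, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((1000, 512))   # stand-in for the 512-unit dense layer output
dropped = dropout(activations, rate=0.5, rng=rng)
print("fraction zeroed:", np.mean(dropped == 0))
print("mean activation:", dropped.mean())
```

At inference time no units are dropped and no rescaling is needed, which is why the scaling is applied during training.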

Adam is used as the optimizer rather than plain stochastic gradient descent because the literature indicates that it is a strong performer. Training was limited to five epochs to minimize overfitting on the training data. After five epochs the accuracy is already well above the loss.
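For readers unfamiliar with how Adam differs from plain gradient descent, the sketch below applies the standard Adam update rule to a one-dimensional toy problem in plain Python. It keeps running averages of the gradient and the squared gradient, corrects their startup bias, and scales each step accordingly. The `adam_minimize` helper, the quadratic objective, and the learning rate of 0.1 are assumptions chosen for the toy example; they are not the settings Keras uses for this model.

```python
import math

def adam_minimize(grad_fn, x, steps=1000, lr=0.1,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    m = 0.0  # running average of the gradient (first moment)
    v = 0.0  # running average of the squared gradient (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)   # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x=0.0)
print(x_min)
```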

In [15]:
# activation = tf.keras.layers.LeakyReLU(alpha=0.1)
activation = 'relu'

model = tf.keras.models.Sequential([
    # First set of convolutional layers, followed by max pooling.
    tf.keras.layers.Conv2D(input_shape=(32, 32, 3),
                           kernel_size = 3,
                           filters = 32,
                           padding = 'same',
                           activation = activation),
    tf.keras.layers.BatchNormalization(),
    
    tf.keras.layers.Conv2D(kernel_size = 3,
                           filters = 32,
                           padding = 'same',
                           activation = activation),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.BatchNormalization(),



    # Convolutional layers with more filters, followed by max pooling.
    tf.keras.layers.Conv2D(kernel_size = 3,
                           filters = 64,
                           padding = 'same',
                           activation = activation),
    tf.keras.layers.BatchNormalization(),
    
    tf.keras.layers.Conv2D(kernel_size = 3,
                           filters = 64,
                           padding = 'same',
                           activation = activation),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.BatchNormalization(),

    
    # Convolutional layers with even more filters, followed by max pooling.
    tf.keras.layers.Conv2D(kernel_size = 3,
                           filters = 128,
                           padding = 'same',
                           activation = activation),
    tf.keras.layers.BatchNormalization(),
    
    tf.keras.layers.Conv2D(kernel_size = 3,
                           filters = 128,
                           padding = 'same',
                           activation = activation),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.BatchNormalization(),
    
#     # Convolutional layers with even MORE filters, followed by max pooling.
#     tf.keras.layers.Conv2D(kernel_size = 3,
#                            filters = 254,
#                            padding = 'same',
#                            activation = activation),
#     tf.keras.layers.BatchNormalization(),
    
#     tf.keras.layers.Conv2D(kernel_size = 3,
#                            filters = 254,
#                            padding = 'same',
#                            activation = activation),
#     tf.keras.layers.MaxPooling2D(pool_size=2),
#     tf.keras.layers.BatchNormalization(),
    
    # Flattened fully connected layer
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512,
                          activation = activation),
    tf.keras.layers.Dropout(0.5),


    # Output layer
    tf.keras.layers.Dense(10, activation = 'softmax')
])

model.compile(optimizer = 'adam',
             loss = 'sparse_categorical_crossentropy',
#              loss = 'categorical_crossentropy',
             metrics = ['accuracy'])

model.fit(x_train, y_train, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7fdadc1aecc0>
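The cell above compiles the model with `sparse_categorical_crossentropy` and leaves `categorical_crossentropy` commented out. The two compute the same quantity and differ only in the label format they expect: integer class indices versus one-hot vectors. The NumPy sketch below shows this on made-up probabilities; the helper functions are simplified stand-ins written for illustration, not the Keras implementations.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs):
    """Mean negative log-probability of the true class; labels are integers."""
    labels = np.asarray(labels).reshape(-1)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

def categorical_crossentropy(one_hot, probs):
    """Same loss, with labels given as one-hot rows."""
    return -np.mean(np.sum(one_hot * np.log(probs), axis=1))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])
labels = np.array([[0], [1], [2]])        # column of integer labels, like y_train
one_hot = np.eye(3)[labels.reshape(-1)]   # equivalent one-hot encoding

sparse_loss = sparse_categorical_crossentropy(labels, probs)
dense_loss = categorical_crossentropy(one_hot, probs)
print(sparse_loss, dense_loss)
```

Using the sparse form avoids materializing a one-hot array for the labels, which is why the `to_categorical` lines in the data cell can stay commented out.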

Finally, the model is evaluated on the test set. Accuracy is approximately 79%.

In [16]:
model.evaluate(x_test, y_test)[1]



0.7907
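Because the model was compiled with `metrics = ['accuracy']`, `model.evaluate` returns the loss followed by the accuracy, so index `[1]` selects the accuracy. As a rough sketch of what that number means, the snippet below computes accuracy from softmax-style outputs with NumPy; the `accuracy` helper and the small arrays are made up for illustration.

```python
import numpy as np

def accuracy(probs, labels):
    """Fraction of rows whose highest-probability class matches the label."""
    predicted = np.argmax(probs, axis=1)
    return float(np.mean(predicted == np.asarray(labels).reshape(-1)))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.5, 0.3, 0.2],
                  [0.2, 0.2, 0.6]])
labels = np.array([[0], [1], [2], [2]])   # third example is misclassified
print(accuracy(probs, labels))
```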