
# Deep CNN for CIFAR-10

*Last Updated: September 1, 2017*

On an [NVIDIA® Tesla® K80](http://www.nvidia.com/object/tesla-k80.html) GPU, the convolutional neural network (CNN) built in the [last session](http://realai.org/course/tensorflow/#deep-models) for [MNIST handwritten digits](http://yann.lecun.com/exdb/mnist/) trains in less than 2 minutes. Let's use the GPU for a harder dataset called [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html), which contains 60000 images, divided into a 50000-image training set and a 10000-image test set, with exactly 10 class labels: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck. The following tech report describes it in more detail:

* 2009 April 8, Alex Krizhevsky. [Learning Multiple Layers of Features from Tiny Images](https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf).

CIFAR-10 is harder than MNIST because its data are 32x32 color images rather than 28x28 grayscale ones. Moreover, we cannot load the data as easily as before and need to do a bit of extra work. First we manually [download](https://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz) the binary version of the dataset, then store and unpack it in a persistent directory. Here we use `CIFAR_10_data/`:

In [1]:
%ls -R CIFAR_10_data/

CIFAR_10_data/:
[0m[01;34mcifar-10-batches-bin[0m/  [01;31mcifar-10-binary.tar.gz[0m

CIFAR_10_data/cifar-10-batches-bin:
batches.meta.txt  data_batch_2.bin  data_batch_4.bin  readme.html
data_batch_1.bin  data_batch_3.bin  data_batch_5.bin  test_batch.bin
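
As a quick sanity check on the unpacked files, the sketch below computes the size each batch file should have. It assumes the record layout described on the CIFAR-10 dataset page for the binary version (one label byte followed by 3072 pixel bytes, 10000 records per file). The helper name `check_batch_sizes` is made up for illustration.

```python
import os

RECORD_BYTES = 1 + 3 * 32 * 32                          # 1 label byte + 3072 pixel bytes
RECORDS_PER_FILE = 10000
EXPECTED_FILE_BYTES = RECORD_BYTES * RECORDS_PER_FILE   # 30730000


def check_batch_sizes(data_dir):
    """Hypothetical helper: return the .bin files whose size is unexpected."""
    bad = []
    for name in sorted(os.listdir(data_dir)):
        if name.endswith(".bin"):
            size = os.path.getsize(os.path.join(data_dir, name))
            if size != EXPECTED_FILE_BYTES:
                bad.append(name)
    return bad
```

Calling `check_batch_sizes("CIFAR_10_data/cifar-10-batches-bin")` should return an empty list if the archive was downloaded and unpacked completely.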


Since the CIFAR-10 dataset has a similar structure to MNIST, we intend to load it into our program so that it can be used just like the "MNIST" dataset in the previous [exercise](http://nbviewer.jupyter.org/url/realai.org/course/tensorflow/solving-MNIST-by-convolution.ipynb), only with more interesting images. For that we import the modules `base` and `mnist`. We also need `numpy` and `os` for data processing. Finally we define some constants in capital letters, including `LOGDIR` for `tf.summary.FileWriter` and TensorBoard to use later, and put them below so that they're easier to find:

In [2]:
import numpy as np
import os
import tensorflow as tf

from tensorflow.contrib.learn.python.learn.datasets import base
from tensorflow.contrib.learn.python.learn.datasets import mnist

DATA_DIR = "CIFAR_10_data/cifar-10-batches-bin"
LOGDIR = "/tmp/CIFAR_10"
VALIDATION_SIZE = 5000

The next two cells define functions to process the data files. The first function parses a single CIFAR-10 data file. The second function combines the outputs from the first function:

In [3]:
def read_cifar10_one_file(filename):
  """Reads and parses examples from a single CIFAR-10 binary data file
  
  Args:
    filename: A string, the name of the data file to read from

  Returns:
    images: 4D numpy.uint8 array of size [num_samples, height (32), width (32), depth (3)]
    labels: 2D numpy.float64 array of one-hot labels, of size [num_samples, 10]
  """
  
  with open(filename, 'rb') as f:
    data = f.read()
  data = np.frombuffer(data, dtype=np.uint8)
  data = data.reshape(10000, 3073)
  labels, images = np.split(data, (1,), axis=1)
  
  images = images.reshape(10000, 3, 32, 32)
  images = np.transpose(images, (0, 2, 3, 1))
  
  num_samples = labels.shape[0]
  one_hot = np.zeros((num_samples, 10))
  one_hot.flat[np.arange(num_samples)*10 + labels.ravel()] = 1
  
  return images, one_hot
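
To see why the reshape and transpose are needed, the following sketch builds two synthetic records in the same layout (a label byte, then the red, green and blue planes, each in row-major order) and parses them the same way. The data is made up and only NumPy is assumed. The one-hot step uses `np.eye` indexing, which gives the same result as the flat-index assignment above.

```python
import numpy as np

# Build 2 fake records: label, then 1024 red, 1024 green, 1024 blue bytes.
num_records = 2
records = np.zeros((num_records, 3073), dtype=np.uint8)
records[:, 0] = [3, 7]          # labels
records[:, 1:1025] = 10         # red plane
records[:, 1025:2049] = 20      # green plane
records[:, 2049:3073] = 30      # blue plane

labels, pixels = np.split(records, (1,), axis=1)

# Channel-first (N, 3, 32, 32) -> channel-last (N, 32, 32, 3)
images = pixels.reshape(num_records, 3, 32, 32).transpose(0, 2, 3, 1)

# One-hot encode the labels by picking rows of a 10x10 identity matrix
one_hot = np.eye(10)[labels.ravel()]

print(images.shape)       # (2, 32, 32, 3)
print(images[0, 0, 0])    # [10 20 30]: the R, G, B values of the top-left pixel
print(one_hot.shape)      # (2, 10)
```

Without the transpose, the last axis would index image columns rather than color channels, and the convolution layers would see scrambled images.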

In [4]:
def read_cifar10_files(data_dir, test_data=False):
  """Combines CIFAR-10 data returned by the function read_cifar10_one_file
  
  Args:
    data_dir: Path to the CIFAR-10 data directory
    test_data: bool, True to read the test file, False to read the five training files

  Returns: (same as read_cifar10_one_file)
    images: 4D numpy.uint8 array of size [num_samples, height (32), width (32), depth (3)]
    labels: 2D numpy.float64 array of one-hot labels, of size [num_samples, 10]
  """
  if not test_data:
    filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i)
                 for i in range(1, 6)]
  else:
    filenames = [os.path.join(data_dir, 'test_batch.bin')]

  images = np.empty((0, 32, 32, 3), dtype=np.uint8)
  labels = np.empty((0, 10), dtype=np.float64)

  for f in filenames:
    if not tf.gfile.Exists(f):
      raise ValueError('Failed to find file: ' + f)
    data = read_cifar10_one_file(f)
    images = np.concatenate((images, data[0]))
    labels = np.concatenate((labels, data[1]))

  return images, labels

Using these two helper functions, we can feed the data into a "Datasets" object called "CIFAR":

In [5]:
images, labels = read_cifar10_files(DATA_DIR)
validation = mnist.DataSet(images[:VALIDATION_SIZE], labels[:VALIDATION_SIZE], reshape=False)
train = mnist.DataSet(images[VALIDATION_SIZE:], labels[VALIDATION_SIZE:], reshape=False)

images, labels = read_cifar10_files(DATA_DIR, test_data=True)
test = mnist.DataSet(images, labels, reshape=False)

CIFAR = base.Datasets(train=train, validation=validation, test=test)
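
As we understand the TF 1.x implementation, `mnist.DataSet` with its default `dtype` of `tf.float32` converts `uint8` images to `float32` and rescales them from [0, 255] to [0, 1], so no separate normalization step is needed here. The sketch below mimics that rescaling and the train/validation split in plain NumPy on made-up data. It is an illustration, not the library code, and `to_unit_range` is a hypothetical name.

```python
import numpy as np

VALIDATION_SIZE = 5000

# Made-up stand-in for the 50000 training images (tiny 2x2 "images" to save memory)
rng = np.random.RandomState(0)
fake_images = rng.randint(0, 256, size=(50000, 2, 2, 3)).astype(np.uint8)


def to_unit_range(uint8_images):
    """Rescale uint8 pixel values from [0, 255] to float32 values in [0, 1]."""
    return uint8_images.astype(np.float32) * (1.0 / 255.0)


validation_images = to_unit_range(fake_images[:VALIDATION_SIZE])
train_images = to_unit_range(fake_images[VALIDATION_SIZE:])

print(validation_images.shape[0], train_images.shape[0])   # 5000 45000
```

The first 5000 training images become the validation set, which leaves 45000 images for training.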

Now it's possible to follow almost exactly the steps for MNIST, noting a few differences: (1) the CIFAR-10 images are already in shape (batch_size, 32, 32, 3), so we no longer need the `tf.reshape` operation on the input; (2) after two rounds of max pooling, the resulting tensor is in shape (batch_size, 8, 8, 64) and should be flattened into `8*8*64` features instead of `7*7*64`; and (3) the old variable name "MNIST" changes to "CIFAR". If we ran this experiment using the MNIST model at this point, it would again train in around 2 minutes, but the validation error would be around 30%.
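
The flattened sizes quoted above follow from simple arithmetic, sketched below. The helper `flat_size` is a made-up name, and it assumes "same"-padded convolutions (which keep height and width) and pooling with a stride equal to the pool size.

```python
def flat_size(side, num_pools, channels, pool=2):
    """Feature count after `num_pools` rounds of pool x pool max pooling.

    Assumes square inputs whose side divides evenly at every pooling round.
    """
    for _ in range(num_pools):
        side //= pool
    return side * side * channels


print(flat_size(28, 2, 64))    # MNIST:    7 * 7 * 64 = 3136
print(flat_size(32, 2, 64))    # CIFAR-10: 8 * 8 * 64 = 4096
print(flat_size(32, 3, 128))   # a third pooling round with 128 channels: 2048
```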

Only a little bit of extra work is needed to build a better model. We know that sometimes the construct of two successive 3x3 convnet layers followed by a max pooling layer can improve model performance. Such a style is referred to as "VGG", based on research from the following paper:

* 2015 April 11, Karen Simonyan and Andrew Zisserman. [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556). *arXiv:1409.1556*.

The next cell contains extra convolution and max pooling layers to build a deep CNN in VGG style:

![](http://realai.org/course/tensorflow/deep-cnn-for-CIFAR-10-1.png)

In [6]:
# Start with CIFAR-10 input
images = tf.placeholder(tf.float32, (None, 32, 32, 3), name="Images")

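# A regularizer attached to all conv layers. tf.layers only collects its terms in
# tf.GraphKeys.REGULARIZATION_LOSSES; they affect training only if added to the loss.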
regularizer = tf.contrib.layers.l2_regularizer(scale=0.1)

# Add two convolution layers with max pooling
conv1_1 = tf.layers.conv2d(
  images, 32, 3, padding="same", activation=tf.nn.relu, kernel_regularizer=regularizer,
  name="Conv1_1")
conv1_2 = tf.layers.conv2d(
  conv1_1, 32, 3, padding="same", activation=tf.nn.relu, kernel_regularizer=regularizer,
  name="Conv1_2")
pool1 = tf.layers.max_pooling2d(conv1_2, 2, 2, name="Pool1")

conv2_1 = tf.layers.conv2d(
  pool1, 64, 3, padding="same", activation=tf.nn.relu, kernel_regularizer=regularizer,
  name="Conv2_1")
conv2_2 = tf.layers.conv2d(
  conv2_1, 64, 3, padding="same", activation=tf.nn.relu, kernel_regularizer=regularizer,
  name="Conv2_2")
pool2 = tf.layers.max_pooling2d(conv2_2, 2, 2, name="Pool2")

# Add a third block of two convolution layers with max pooling
conv3_1 = tf.layers.conv2d(
  pool2, 128, 3, padding="same", activation=tf.nn.relu, kernel_regularizer=regularizer,
  name="Conv3_1")
conv3_2 = tf.layers.conv2d(
  conv3_1, 128, 3, padding="same", activation=tf.nn.relu, kernel_regularizer=regularizer,
  name="Conv3_2")
pool3 = tf.layers.max_pooling2d(conv3_2, 2, 2, name="Pool3")

# Flatten the 4D tensor (batch, 4, 4, 128) to 2D (batch, 2048) to be fed into "Dense"
pool3_flat = tf.reshape(pool3, (-1, 4*4*128), name="Pool3_Flat")

# Two dense layers ("Dense" and "Logits") with one dropout in between
dense = tf.layers.dense(pool3_flat, 512, activation=tf.nn.relu, name="Dense")
keep_prob = tf.placeholder(tf.float32, name="Keep_Probability")
dropout = tf.nn.dropout(dense, keep_prob, name="Dropout")

# The original dense layer to compute logits that are later used for classification
logits = tf.layers.dense(dropout, 10, activation=None, name="Logits")
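
For a sense of the model's size, the sketch below counts trainable parameters layer by layer. It assumes every layer has a bias term, which is the default for `tf.layers.conv2d` and `tf.layers.dense`. The helper names are made up for illustration.

```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of a kernel x kernel convolution layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(in_units, out_units):
    """Weights plus biases of a fully connected layer."""
    return in_units * out_units + out_units


layers = [
    ("Conv1_1", conv_params(3, 32)),
    ("Conv1_2", conv_params(32, 32)),
    ("Conv2_1", conv_params(32, 64)),
    ("Conv2_2", conv_params(64, 64)),
    ("Conv3_1", conv_params(64, 128)),
    ("Conv3_2", conv_params(128, 128)),
    ("Dense", dense_params(4 * 4 * 128, 512)),
    ("Logits", dense_params(512, 10)),
]

total = sum(count for _, count in layers)
for name, count in layers:
    print("{:8s} {:>9d}".format(name, count))
print("Total    {:>9d}".format(total))   # 1341226
```

Roughly 78% of the parameters sit in the `Dense` layer, which is also the layer that dropout is applied to.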

For training and testing, we simply follow the MNIST exercise:

In [7]:
# A few more tensors for training and reporting
labels = tf.placeholder(tf.float32, (None, 10), name="Labels")

with tf.name_scope("Loss"):
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits),
        name="Mean")

with tf.name_scope("Optimizer"):
    train = tf.train.AdamOptimizer(learning_rate=0.001, name="Adam").minimize(loss)

with tf.name_scope("Error"):
    error = tf.reduce_mean(
        tf.cast(tf.not_equal(tf.argmax(labels, 1), tf.argmax(logits, 1)), tf.float32), name="Mean")

tf.summary.image("Images", images, max_outputs=4)
tf.summary.scalar("Loss", loss)
tf.summary.scalar("Error", error)
summ = tf.summary.merge_all()
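
The loss and error tensors can be mirrored in NumPy to check what they compute. The sketch below is a plain re-implementation under our own function names, not TensorFlow's code. It uses the usual trick of subtracting the row maximum before exponentiating for numerical stability.

```python
import numpy as np


def softmax_cross_entropy(labels, logits):
    """Per-example cross entropy between one-hot labels and softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)


def error_rate(labels, logits):
    """Fraction of examples whose predicted class differs from the true class."""
    return np.mean(labels.argmax(axis=1) != logits.argmax(axis=1))


labels_np = np.eye(10)[[0, 1, 2, 3]]   # four one-hot labels: classes 0..3
logits_np = np.zeros((4, 10))          # an "untrained" model: equal logits

print(softmax_cross_entropy(labels_np, logits_np).mean())   # ln(10), about 2.3026
print(error_rate(labels_np, logits_np))                     # 0.75
```

A model that assigns equal logits to all ten classes has a loss of ln(10), about 2.3026, which is a useful reference value for the loss at the very first training step.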

Session, FileWriter and variable initialization:

In [8]:
if tf.gfile.Exists(LOGDIR):
    tf.gfile.DeleteRecursively(LOGDIR)
tf.gfile.MakeDirs(LOGDIR)

sess = tf.Session()
writer = tf.summary.FileWriter(LOGDIR, sess.graph)
sess.run(tf.global_variables_initializer())

Full training using a cloud [GPU](https://cloud.google.com/compute/pricing#gpus) on an n1-standard-2 (2 vCPUs, 7.5 GB memory) machine should take less than 10 minutes:

In [9]:
%%time
for i in range(10000):
    batch = CIFAR.train.next_batch(128)
    
    if i % 10 == 0:
        Error, Loss, Summ = sess.run((error, loss, summ), feed_dict={images: batch[0], labels: batch[1], keep_prob: 1.0})
        writer.add_summary(Summ, i)
        
        if i % 500 == 0:
            print("Step {}: Training loss is {:.5f}, error is {:.2f}%".format(i, Loss, Error * 100))

    sess.run(train, feed_dict={images: batch[0], labels: batch[1], keep_prob: 0.5})

Step 0: Training loss is 2.30075, error is 87.50%
Step 500: Training loss is 1.02283, error is 35.16%
Step 1000: Training loss is 0.86207, error is 30.47%
Step 1500: Training loss is 0.59323, error is 21.88%
Step 2000: Training loss is 0.50439, error is 19.53%
Step 2500: Training loss is 0.26510, error is 9.38%
Step 3000: Training loss is 0.29858, error is 7.81%
Step 3500: Training loss is 0.24689, error is 8.59%
Step 4000: Training loss is 0.18905, error is 9.38%
Step 4500: Training loss is 0.13786, error is 3.91%
Step 5000: Training loss is 0.11263, error is 3.91%
Step 5500: Training loss is 0.07144, error is 1.56%
Step 6000: Training loss is 0.03573, error is 0.00%
Step 6500: Training loss is 0.06624, error is 1.56%
Step 7000: Training loss is 0.07402, error is 2.34%
Step 7500: Training loss is 0.05901, error is 2.34%
Step 8000: Training loss is 0.06364, error is 3.12%
Step 8500: Training loss is 0.03966, error is 1.56%
Step 9000: Training loss is 0.01623, error is 0.00%
Step 9500: 
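
The loop feeds `keep_prob: 0.5` for the training step and `keep_prob: 1.0` when measuring loss and error. `tf.nn.dropout` scales the kept units by `1 / keep_prob`, so the expected activation is the same in both modes. The NumPy sketch below imitates that behaviour. The function `dropout` and the random generator are assumptions for illustration only, not TensorFlow's implementation.

```python
import numpy as np


def dropout(x, keep_prob, rng):
    """Keep each element with probability keep_prob and scale it by 1/keep_prob."""
    mask = rng.uniform(size=x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)


rng = np.random.RandomState(0)
x = np.ones((1000, 512))

train_out = dropout(x, 0.5, rng)   # about half the units zeroed, the rest doubled
eval_out = dropout(x, 1.0, rng)    # nothing is dropped

print(train_out.mean())               # close to 1.0
print(np.array_equal(eval_out, x))    # True
```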

Print the validation and test results:

In [10]:
Error = sess.run(error, feed_dict={images: CIFAR.validation.images, labels: CIFAR.validation.labels, keep_prob: 1.0})
print("Validation error is {:.2f}%".format(Error * 100))

Validation error is 20.30%


In [11]:
Error = sess.run(error, feed_dict={images: CIFAR.test.images, labels: CIFAR.test.labels, keep_prob: 1.0})
print("Test error is {:.2f}%".format(Error * 100))

Test error is 21.27%
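
The two cells above push 5000 and 10000 images through the network in a single `sess.run` call. If that does not fit in GPU memory, the error can be computed batch by batch and combined by counting misclassified examples. The sketch below shows the bookkeeping in NumPy. `predict_fn` is a hypothetical stand-in for a call that returns logits for a batch, and the toy data is made up.

```python
import numpy as np


def batched_error(predict_fn, images, labels, batch_size=1000):
    """Error rate over a dataset, evaluated one batch at a time."""
    wrong = 0
    for start in range(0, len(images), batch_size):
        batch_logits = predict_fn(images[start:start + batch_size])
        batch_labels = labels[start:start + batch_size]
        wrong += np.sum(batch_logits.argmax(axis=1) != batch_labels.argmax(axis=1))
    return wrong / float(len(images))


# Toy check with a fake "model" that always predicts class 0
rng = np.random.RandomState(0)
fake_images = rng.rand(2500, 4)
fake_labels = np.eye(10)[rng.randint(0, 10, size=2500)]


def always_zero(batch):
    return np.eye(10)[np.zeros(len(batch), dtype=int)]


print(batched_error(always_zero, fake_images, fake_labels))
```

Counting errors instead of averaging per-batch error rates keeps the result exact even when the last batch is smaller than the others.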


Our test and validation errors are fairly close, but a lot higher than the training error, which suggests that our model [overfits](https://en.wikipedia.org/wiki/Overfitting). General strategies to reduce overfitting include getting more training data and designing simpler models. They're outside the scope of this exercise and will be covered elsewhere.

Close FileWriter and Session:

In [12]:
writer.close()
sess.close()