# CNN from scratch

In this notebook, we're going to build a convolutional neural network that recognizes handwritten digits, from scratch. By "from scratch", I mean without using TensorFlow's almighty neural network functions like `tf.nn.conv2d`. This way, you can open up the black box and understand more clearly how a CNN works. We'll use TensorFlow interactively, so you can check the intermediate results along the way, which will also help your understanding.


### Outline
Here are some functions we will implement from scratch in this notebook.

1. Convolutional layer
2. ReLU
3. Max Pooling
4. Affine layer (Fully connected layer)
5. Softmax
6. Cross entropy error

First things first, let's import TensorFlow.

In [1]:
# GPUs or CPU
import tensorflow as tf

# Check TensorFlow Version
print('TensorFlow Version: {}'.format(tf.__version__))

# Check for a GPU
print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

TensorFlow Version: 1.4.1
Default GPU Device: /device:GPU:0


These two lines of code will download and read in the handwritten digits data automatically.

In [4]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/home/arasdar/datasets/MNIST_data/", one_hot=True, reshape=False)

Extracting /home/arasdar/datasets/MNIST_data/train-images-idx3-ubyte.gz
Extracting /home/arasdar/datasets/MNIST_data/train-labels-idx1-ubyte.gz
Extracting /home/arasdar/datasets/MNIST_data/t10k-images-idx3-ubyte.gz
Extracting /home/arasdar/datasets/MNIST_data/t10k-labels-idx1-ubyte.gz


We're going to look at only 100 examples at a time.

In [5]:
batch_size = 100

Here is the first example in the batch. Each image is represented as a 28 × 28 × 1 array of numbers: height, width, and a single grayscale channel.

In [6]:
example_X, example_ys = mnist.train.next_batch(batch_size)
example_X[0].shape

(28, 28, 1)

We use the convenient `InteractiveSession` so that we can check intermediate results along the way. An `InteractiveSession` installs itself as the default session, so you can call `Tensor.eval()` and `Operation.run()` without specifying a session explicitly.

In [5]:
session = tf.InteractiveSession()

We start building the computation graph by creating placeholders for the input images (`X`) and the target labels (`t`).

In [6]:
X = tf.placeholder('float', [batch_size, 28, 28, 1])
t = tf.placeholder('float', [batch_size, 10])

Below is an overview of the model we will build. It starts with a convolutional layer, passes the result through a ReLU and a max-pooling layer, then through an affine layer, another ReLU, and a second affine layer, and finally applies the softmax function. Keep this architecture in mind while you follow the notebook.

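$$ \text{conv} \rightarrow \text{ReLU} \rightarrow \text{pool} \rightarrow \text{affine} \rightarrow \text{ReLU} \rightarrow \text{affine} \rightarrow \text{softmax} $$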
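
One way to keep the architecture in mind is to trace a batch's shape through it. The plain-Python sketch below does that with the output-size formula $\lfloor (h + 2p - f)/s \rfloor + 1$ used by the `convolution` and `max_pool` functions. The helper names `output_size` and `trace_shapes` are hypothetical, and the hyperparameters are the ones this notebook picks later (30 filters of 5 × 5 with padding 2 and stride 1, 2 × 2 pooling with stride 2, 100 hidden units, 10 outputs).

```python
def output_size(size, window, padding, stride):
    """Output length along one spatial axis of a sliding window (conv or pool)."""
    return (size + 2 * padding - window) // stride + 1


def trace_shapes(n=100, h=28, w=28):
    """List (stage, shape) pairs for conv-relu-pool-affine-relu-affine-softmax."""
    shapes = [("input", (n, h, w, 1))]
    h, w = output_size(h, 5, 2, 1), output_size(w, 5, 2, 1)  # 5x5 conv, pad 2, stride 1
    shapes += [("conv", (n, h, w, 30)), ("relu", (n, h, w, 30))]
    h, w = output_size(h, 2, 0, 2), output_size(w, 2, 0, 2)  # 2x2 pool, stride 2
    shapes.append(("pool", (n, h, w, 30)))
    # affine flattens h * w * 30 features per example into 100 hidden units
    shapes += [("affine", (n, 100)), ("relu", (n, 100)),
               ("affine", (n, 10)), ("softmax", (n, 10))]
    return shapes


for stage, shape in trace_shapes():
    print(f"{stage:<8}{shape}")
```

The conv and pool shapes it produces, (100, 28, 28, 30) and (100, 14, 14, 30), agree with the TensorFlow shapes printed further down.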

## Convolutional layer

In [7]:
filter_h, filter_w, filter_c, filter_n = 5, 5, 1, 30

In [8]:
W1 = tf.Variable(tf.random_normal([filter_h, filter_w, filter_c, filter_n], stddev=0.01))
b1 = tf.Variable(tf.zeros([filter_n]))

In [9]:
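def convolution(X, W, b, padding, stride):
    # input shape: [n, h, w, c]; filter shape: [filter_h, filter_w, filter_c, filter_n]
    n, h, w, c = [d.value for d in X.get_shape()]
    filter_h, filter_w, filter_c, filter_n = [d.value for d in W.get_shape()]
    
    # spatial size of the output feature map
    out_h = (h + 2*padding - filter_h)//stride + 1
    out_w = (w + 2*padding - filter_w)//stride + 1

    # X_flat: [out_h*out_w*n, filter_h*filter_w*filter_c], one row per window
    X_flat = flatten(X, filter_h, filter_w, filter_c, out_h, out_w, stride, padding)
    # W_flat: [filter_h*filter_w*filter_c, filter_n], one column per filter
    W_flat = tf.reshape(W, [filter_h*filter_w*filter_c, filter_n])
    
    z = tf.matmul(X_flat, W_flat) + b     # b: [filter_n], broadcast over rows
    
    # rows of z are ordered (out_h, out_w, n); move the batch axis to the front
    return tf.transpose(tf.reshape(z, [out_h, out_w, n, filter_n]), [2, 0, 1, 3])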

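To compute the convolution easily, we use a simple trick called flattening (often called im2col). Flattening copies each window of the input that a filter would slide over into one row of a 2D matrix. The filters are reshaped into a 2D matrix as well, with one column per filter, so the whole convolution becomes a single matrix multiplication.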

In [10]:
def flatten(X, window_h, window_w, window_c, out_h, out_w, stride=1, padding=0):
    # zero-pad the height and width dimensions only
    X_padded = tf.pad(X, [[0,0], [padding, padding], [padding, padding], [0,0]])

    # cut out one window per output position; each has shape [n, window_h, window_w, c]
    windows = []
    for y in range(out_h):
        for x in range(out_w):
            window = tf.slice(X_padded, [0, y*stride, x*stride, 0], [-1, window_h, window_w, -1])
            windows.append(window)
    stacked = tf.stack(windows) # shape: [out_h*out_w, n, window_h, window_w, c]

    # one row per (output position, sample) pair
    return tf.reshape(stacked, [-1, window_c*window_w*window_h])
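
To check that flattening really reproduces a convolution, here is a minimal NumPy sketch (NumPy is an assumption; the notebook itself uses only TensorFlow). The hypothetical `im2col` and `conv_im2col` mirror `flatten` and `convolution`, and `conv_naive` is a reference that loops over every sample, output position, and filter. Like the TensorFlow code, both slide the filter without flipping it.

```python
import numpy as np


def im2col(X, fh, fw, out_h, out_w, stride=1, padding=0):
    """Copy every fh x fw window of X (shape n, h, w, c) into one row.

    Rows are ordered by output position first, then by sample, matching
    the tf.stack/tf.reshape order used by `flatten`.
    """
    n, _, _, c = X.shape
    Xp = np.pad(X, [(0, 0), (padding, padding), (padding, padding), (0, 0)], mode="constant")
    windows = [Xp[:, y * stride:y * stride + fh, x * stride:x * stride + fw, :]
               for y in range(out_h) for x in range(out_w)]
    # stacked shape: (out_h * out_w, n, fh, fw, c)
    return np.stack(windows).reshape(-1, fh * fw * c)


def conv_im2col(X, W, b, stride=1, padding=0):
    n, h, w, _ = X.shape
    fh, fw, _, fn = W.shape
    out_h = (h + 2 * padding - fh) // stride + 1
    out_w = (w + 2 * padding - fw) // stride + 1
    z = im2col(X, fh, fw, out_h, out_w, stride, padding) @ W.reshape(-1, fn) + b
    # rows were (position, sample); move the sample axis to the front
    return z.reshape(out_h, out_w, n, fn).transpose(2, 0, 1, 3)


def conv_naive(X, W, b, stride=1, padding=0):
    """Reference: explicit loops over samples, output positions and filters."""
    n, h, w, _ = X.shape
    fh, fw, _, fn = W.shape
    Xp = np.pad(X, [(0, 0), (padding, padding), (padding, padding), (0, 0)], mode="constant")
    out_h = (h + 2 * padding - fh) // stride + 1
    out_w = (w + 2 * padding - fw) // stride + 1
    out = np.zeros((n, out_h, out_w, fn))
    for i in range(n):
        for y in range(out_h):
            for x in range(out_w):
                patch = Xp[i, y * stride:y * stride + fh, x * stride:x * stride + fw, :]
                for k in range(fn):
                    out[i, y, x, k] = np.sum(patch * W[..., k]) + b[k]
    return out


rng = np.random.default_rng(0)
X = rng.standard_normal((2, 6, 6, 3))
W = rng.standard_normal((3, 3, 3, 4))
b = rng.standard_normal(4)
print(np.allclose(conv_im2col(X, W, b, padding=1), conv_naive(X, W, b, padding=1)))  # True
```

The im2col version trades memory (every window is copied) for one large matrix multiplication, which is the same trade-off the TensorFlow `flatten` makes.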

In [11]:
conv_layer = convolution(X, W1, b1, padding=2, stride=1)
conv_layer

<tf.Tensor 'transpose:0' shape=(100, 28, 28, 30) dtype=float32>

## ReLU

In [12]:
def relu(X):
    return tf.maximum(X, tf.zeros_like(X))

In [13]:
conv_activation_layer = relu(conv_layer)
conv_activation_layer

<tf.Tensor 'Maximum:0' shape=(100, 28, 28, 30) dtype=float32>

## Max pooling

In [14]:
def max_pool(X, pool_h, pool_w, padding, stride):
    n, h, w, c = [d.value for d in X.get_shape()]
    
    out_h = (h + 2*padding - pool_h)//stride + 1
    out_w = (w + 2*padding - pool_w)//stride + 1

    X_flat = flatten(X, pool_h, pool_w, c, out_h, out_w, stride, padding)

    pool = tf.reduce_max(tf.reshape(X_flat, [out_h, out_w, n, pool_h*pool_w, c]), axis=3)
    return tf.transpose(pool, [2, 0, 1, 3])
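
For the non-overlapping case used here (2 × 2 windows, stride 2, no padding), max pooling can also be written with a single reshape. The NumPy sketch below (NumPy assumed; `max_pool_windows` and `max_pool_2x2` are hypothetical names) checks that this shortcut agrees with the window-by-window approach that `max_pool` takes.

```python
import numpy as np


def max_pool_windows(X, ph, pw, stride):
    """Window-by-window max, like `max_pool` (no padding)."""
    n, h, w, c = X.shape
    out_h, out_w = (h - ph) // stride + 1, (w - pw) // stride + 1
    out = np.empty((n, out_h, out_w, c), dtype=X.dtype)
    for y in range(out_h):
        for x in range(out_w):
            window = X[:, y * stride:y * stride + ph, x * stride:x * stride + pw, :]
            out[:, y, x, :] = window.max(axis=(1, 2))  # max over the spatial window
    return out


def max_pool_2x2(X):
    """Shortcut for 2x2 windows with stride 2; assumes even height and width."""
    n, h, w, c = X.shape
    return X.reshape(n, h // 2, 2, w // 2, 2, c).max(axis=(2, 4))


X = np.random.default_rng(0).standard_normal((2, 28, 28, 30))
print(max_pool_2x2(X).shape)                                          # (2, 14, 14, 30)
print(np.array_equal(max_pool_2x2(X), max_pool_windows(X, 2, 2, 2)))  # True
```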

In [15]:
pooling_layer = max_pool(conv_activation_layer, pool_h=2, pool_w=2, padding=0, stride=2)
pooling_layer

<tf.Tensor 'transpose_1:0' shape=(100, 14, 14, 30) dtype=float32>

## Affine layer 1

In [16]:
batch_size, pool_output_h, pool_output_w, filter_n = [d.value for d in pooling_layer.get_shape()]

In [17]:
# number of nodes in the hidden layer
hidden_size = 100

In [18]:
W2 = tf.Variable(tf.random_normal([pool_output_h*pool_output_w*filter_n, hidden_size], stddev=0.01))
b2 = tf.Variable(tf.zeros([hidden_size]))

In [19]:
def affine(X, W, b):
    n = X.get_shape()[0].value # number of samples
    X_flat = tf.reshape(X, [n, -1])
    return tf.matmul(X_flat, W) + b
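
The `reshape` in `affine` is what connects the 4D pooling output to the 2D weight matrix: each example's 14 × 14 × 30 feature map is unrolled into a vector of 5,880 numbers. A NumPy sketch of the same computation (NumPy assumed; the random data is for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
pool_out = rng.standard_normal((100, 14, 14, 30))     # same shape as pooling_layer
W = rng.standard_normal((14 * 14 * 30, 100)) * 0.01  # same shape as W2: 5880 x 100
b = np.zeros(100)                                     # same shape as b2

X_flat = pool_out.reshape(pool_out.shape[0], -1)  # (100, 5880): one row per example
hidden = X_flat @ W + b                            # (100, 100): b is broadcast over rows
print(X_flat.shape, hidden.shape)                  # (100, 5880) (100, 100)

# row i of the output depends only on example i's unrolled feature map
print(np.allclose(hidden[0], pool_out[0].ravel() @ W + b))  # True
```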

In [20]:
affine_layer1 = affine(pooling_layer, W2, b2)
affine_layer1

<tf.Tensor 'add_1:0' shape=(100, 100) dtype=float32>

In [21]:
init = tf.global_variables_initializer()
init.run()
affine_layer1.eval({X:example_X, t:example_ys})[0]

array([ 0.0007422 ,  0.00012643, -0.00155881,  0.01961509, -0.00859066,
       -0.00033119,  0.00343027, -0.00958689,  0.00936494,  0.00826222,
       -0.01193877,  0.00906352, -0.00564169,  0.02327353,  0.00395058,
        0.00056864,  0.02171693, -0.00322108, -0.00098383, -0.01839806,
       -0.002252  ,  0.00477656, -0.01111817, -0.01797424, -0.01943753,
        0.02208962,  0.00555941, -0.00505572,  0.00183791, -0.00555582,
        0.01254911, -0.0074481 ,  0.02289553, -0.02554271, -0.01616727,
       -0.00184879, -0.00088189,  0.01034675,  0.00582147,  0.01361462,
       -0.02374761,  0.00341097,  0.01175604,  0.00232195, -0.00484839,
       -0.01782613, -0.00154479, -0.0191502 , -0.00408084,  0.00103437,
       -0.02297377, -0.01956699,  0.00276269, -0.01176494, -0.00460986,
        0.01078459, -0.0049027 ,  0.00941862, -0.00548418, -0.00375865,
        0.01463018, -0.00229884,  0.01189869, -0.01217241,  0.0034869 ,
        0.02009911,  0.00302576,  0.00205424, -0.01815282,  0.01

The result above shows the first example represented as a 100-dimensional vector in the hidden layer.

In [22]:
affine_activation_layer1 = relu(affine_layer1)
affine_activation_layer1

<tf.Tensor 'Maximum_1:0' shape=(100, 100) dtype=float32>

In [23]:
affine_activation_layer1.eval({X:example_X, t:example_ys})[0]

array([0.0007422 , 0.00012643, 0.        , 0.01961509, 0.        ,
       0.        , 0.00343027, 0.        , 0.00936494, 0.00826222,
       0.        , 0.00906352, 0.        , 0.02327353, 0.00395058,
       0.00056864, 0.02171693, 0.        , 0.        , 0.        ,
       0.        , 0.00477656, 0.        , 0.        , 0.        ,
       0.02208962, 0.00555941, 0.        , 0.00183791, 0.        ,
       0.01254911, 0.        , 0.02289553, 0.        , 0.        ,
       0.        , 0.        , 0.01034675, 0.00582147, 0.01361462,
       0.        , 0.00341097, 0.01175604, 0.00232195, 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.00103437,
       0.        , 0.        , 0.00276269, 0.        , 0.        ,
       0.01078459, 0.        , 0.00941862, 0.        , 0.        ,
       0.01463018, 0.        , 0.01189869, 0.        , 0.0034869 ,
       0.02009911, 0.00302576, 0.00205424, 0.        , 0.01093012,
       0.00721059, 0.00286405, 0.00022887, 0.        , 0.01679

This is the same representation after applying ReLU: every negative entry has been set to 0, and the positive entries are unchanged.

## Affine layer 2

In [24]:
output_size = 10

In [25]:
W3 = tf.Variable(tf.random_normal([hidden_size, output_size], stddev=0.01))
b3 = tf.Variable(tf.zeros([output_size]))
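
With W3 and b3 defined, all of the model's trainable parameters now exist. A quick count in plain Python, using the shapes from the `tf.Variable` definitions above, shows where they live:

```python
from math import prod

# shapes copied from the tf.Variable definitions in this notebook
params = {
    "W1": (5, 5, 1, 30), "b1": (30,),
    "W2": (14 * 14 * 30, 100), "b2": (100,),
    "W3": (100, 10), "b3": (10,),
}

for name, shape in params.items():
    print(f"{name}: {prod(shape):>7,}")

total = sum(prod(shape) for shape in params.values())
print(f"total: {total:,}")  # total: 589,890
```

W2 alone accounts for 588,000 of the 589,890 parameters, because it connects every one of the 5,880 pooled features to each of the 100 hidden units. The convolutional filters need only 780.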

In [26]:
affine_layer2 = affine(affine_activation_layer1, W3, b3)

In [27]:
# W3 and b3 are new variables, so they must be initialized. Rerunning the
# initializer also gives W1, b1, W2 and b2 fresh initial values.
init = tf.global_variables_initializer()
init.run()

In [28]:
affine_layer2.eval({X:example_X, t:example_ys})[0]

array([-2.8602933e-04,  8.0156862e-04, -1.8080018e-04, -3.0994401e-05,
        7.0591079e-05,  4.7835533e-04,  1.7216617e-03, -3.0160867e-04,
        3.4927719e-04,  7.8509547e-05], dtype=float32)

## Softmax

In [29]:
def softmax(X):
    # subtract each row's max (this doesn't change the result) to avoid overflow in exp
    X_centered = X - tf.reshape(tf.reduce_max(X, axis=1), [-1, 1])
    X_exp = tf.exp(X_centered)
    exp_sum = tf.reduce_sum(X_exp, axis=1)
    return tf.transpose(tf.transpose(X_exp) / exp_sum)
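
Subtracting a maximum before `exp` changes nothing mathematically, because softmax gives the same result when the same constant is added to every logit in a row, but it keeps `exp` from overflowing. The NumPy sketch below (NumPy assumed; the helper names are hypothetical) compares a naive and a stabilized softmax on large logits:

```python
import numpy as np


def softmax_naive(x):
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)


def softmax_stable(x):
    # subtract each row's max so the largest exponent is exp(0) = 1
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


small = np.array([[1.0, 2.0, 3.0]])
large = small + 1000.0  # same differences between logits, much bigger values

print(np.allclose(softmax_stable(small), softmax_naive(small)))  # True
print(np.allclose(softmax_stable(large), softmax_stable(small)))  # True
with np.errstate(over="ignore", invalid="ignore"):
    print(softmax_naive(large))  # [[nan nan nan]]: exp(1000) overflows to inf
```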

In [30]:
softmax_layer = softmax(affine_layer2)

In [31]:
softmax_layer.eval({X:example_X, t:example_ys})[0]

array([0.09994438, 0.10005315, 0.0999549 , 0.09996988, 0.09998003,
       0.10002081, 0.10014524, 0.09994283, 0.1000079 , 0.09998082],
      dtype=float32)

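The probabilities are spread almost evenly over the 10 digits, each close to 0.1. This is expected, because we haven't trained the model yet: the small random weights keep every output of the second affine layer close to 0, so softmax assigns the digits nearly equal probabilities.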

## Cross entropy error

In [32]:
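def cross_entropy_error(y, t):
    # t is one-hot, so y * t keeps only the probability of the correct class;
    # the small constant keeps tf.log away from log(0) = -inf
    return -tf.reduce_mean(tf.log(tf.reduce_sum(y * t, axis=1) + 1e-7))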
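
With one-hot targets, `y * t` zeroes out every probability except the one assigned to the correct class, so the sum over `axis=1` picks out exactly that probability, and the loss is the average of its negative log. A small NumPy illustration (NumPy assumed; the numbers are made up):

```python
import numpy as np

y = np.array([[0.7, 0.2, 0.1],    # predicted probabilities, one row per example
              [0.1, 0.1, 0.8]])
t = np.array([[1, 0, 0],          # one-hot targets: class 0, then class 1
              [0, 1, 0]])

p_correct = np.sum(y * t, axis=1)  # probability given to the true class
loss = -np.mean(np.log(p_correct))
print(p_correct)  # [0.7 0.1]
print(loss)       # equals (-log 0.7 - log 0.1) / 2
```

The second example is confidently wrong (it gives the true class only 0.1), so it contributes most of the loss.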

In [33]:
loss = cross_entropy_error(softmax_layer, t)

In [34]:
loss.eval({X:example_X, t:example_ys})

2.3024163
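
This first loss value is a useful sanity check. `cross_entropy_error` computes the formula below, where $N$ is the batch size, $y$ the softmax output and $t$ the one-hot labels. When the untrained model gives each of the ten digits a probability of about $1/10$, the loss should be close to $\ln 10$:

$$ L = -\frac{1}{N}\sum_{i=1}^{N} \log \sum_{k=1}^{10} t_{ik}\, y_{ik} \approx -\log\frac{1}{10} = \ln 10 \approx 2.3026 $$

That agrees with the 2.3024163 printed above.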

In [35]:
learning_rate = 0.1
trainer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

In [36]:
# number of times to iterate over training data
training_epochs = 2

In [37]:
# number of batches
num_batch = int(mnist.train.num_examples/batch_size)
num_batch

550

In [38]:
from tqdm import tqdm_notebook

In [39]:
for epoch in range(training_epochs):
    avg_cost = 0
    for _ in tqdm_notebook(range(num_batch)):
        train_X, train_ys = mnist.train.next_batch(batch_size)
        trainer.run(feed_dict={X:train_X, t:train_ys})
        avg_cost += loss.eval(feed_dict={X:train_X, t:train_ys}) / num_batch

    print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(avg_cost), flush=True)


Epoch: 0001 cost= 0.801890662



Epoch: 0002 cost= 0.109185281


In [41]:
test_x = mnist.test.images[:batch_size]
test_t = mnist.test.labels[:batch_size]

In [46]:
def accuracy(network, t):
    
    t_predict = tf.argmax(network, axis=1)
    t_actual = tf.argmax(t, axis=1)

    return tf.reduce_mean(tf.cast(tf.equal(t_predict, t_actual), tf.float32))
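
`accuracy` compares the index of the largest softmax output with the index of the 1 in each one-hot label. The same computation in NumPy (NumPy assumed; toy numbers):

```python
import numpy as np

probs = np.array([[0.1, 0.8, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.3, 0.5],
                  [0.4, 0.4, 0.2]])
labels = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [0, 0, 1],
                   [1, 0, 0]])

predicted = np.argmax(probs, axis=1)  # [1 0 2 0]; np.argmax breaks ties by first index
actual = np.argmax(labels, axis=1)    # [1 2 2 0]
print(np.mean(predicted == actual))   # 0.75
```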

In [47]:
accuracy(softmax_layer, t).eval(feed_dict={X:test_x, t:test_t})

0.98000002

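We got an accuracy of 98% on these 100 test images. Awesome! Because the placeholders were built with a fixed batch size of 100, evaluating the full 10,000-image test set would mean feeding it in batches of 100 and averaging the results.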

In [48]:
session.close()

---
**dreamgonfly@gmail.com**
