# Convolutional Neural Network (CNN)

Name: Yuantong Ding

### Convolutional Neural Network

A 2-layer CNN for MNIST digit classfication:

Image -> convolution (32 5x5 filters) -> nonlinearity (ReLU) ->  (2x2 max pool) -> convolution (64 5x5 filters) -> nonlinearity (ReLU) -> (2x2 max pool) -> fully connected (256 hidden units) -> nonlinearity (ReLU) -> fully connected (10 hidden units) -> softmax


In [2]:

import tensorflow as tf
from tqdm import trange
from tensorflow.examples.tutorials.mnist import input_data
import numpy as np

# Import data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
print(mnist.train.images.shape)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
(55000, 784)


In [3]:
# Reshape data
#mnist_train_images_reshaped = tf.reshape(mnist.train.images, [-1,28,28,1])
#print(mnist_train_images_reshaped.shape)

(55000, 28, 28, 1)


In [25]:
# Create the model
tf.reset_default_graph()
g = tf.get_default_graph()

# Create image input placeholder
X = tf.placeholder(tf.float32,[None,784])
x_cnn = tf.reshape(X, [-1, 28, 28, 1])
#x_cnn = tf.placeholder(tf.float32,[None,28,28,1]) # height, width, color

y = tf.placeholder(tf.float32,[None,10]) # 10 classes of digits

# Convolutional layer 1
W1 = tf.Variable(tf.truncated_normal([5,5,1,32],stddev=0.1)) # 5x5 filter, 32 filters
b1 = tf.Variable(tf.zeros([32])) # one per filter
conv1_preact = tf.nn.conv2d(x_cnn, W1, strides=[1,1,1,1], padding="SAME" ) + b1

# ReLu 1
conv1 = tf.nn.relu(conv1_preact)

# 2x2 max pool 1
max_pool1 = tf.nn.max_pool(conv1, ksize=[1,2,2,1],strides=[1,2,2,1],padding="SAME") #only to height and width, usually strides are the same with ksize
print("Shape of conv1 feature maps after max pooling:{0}".format(max_pool1.shape))

# Convolutional layer 2
W2 = tf.Variable(tf.truncated_normal([5,5,32,64],stddev=0.1)) # 5x5 filter, 32 features, 64 filters
b2 = tf.Variable(tf.zeros([64])) # one per filter

# Apply 2nd convolutional layer
conv2_preact = tf.nn.conv2d(max_pool1, W2, strides=[1,1,1,1], padding="SAME" ) + b2
conv2 = tf.nn.relu(conv2_preact)

# 2x2 max pool 2
max_pool2 = tf.nn.max_pool(conv2, ksize=[1,2,2,1],strides=[1,2,2,1],padding="SAME") #only to height and width, usually strides are the same with ksize
print("Shape of conv2 feature maps after max pooling:{0}".format(max_pool2.shape))

# fully connected 1
flat = tf.reshape(max_pool2, [-1, 7*7*64])
print("Shape of flattened feature maps after max pooling 2:{0}".format(flat.shape))
W3 = tf.Variable(tf.truncated_normal([7*7*64, 256], dtype=tf.float32, stddev=1e-1), name='weights')
b3 = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32), trainable=True, name='biases')
latent_scores_1 = tf.nn.relu(tf.add(tf.matmul(flat,W3),b3)) 
print("Shape of latent_scores_1:{0}".format(latent_scores_1.shape))

# fully connected 2
W4 = tf.Variable(tf.truncated_normal([256, 10], dtype=tf.float32, stddev=1e-1), name='weights')
b4 = tf.Variable(tf.constant(0.0, shape=[10], dtype=tf.float32), trainable=True, name='biases')
scores = tf.nn.relu(tf.add(tf.matmul(latent_scores_1,W4),b4)) 
print("Shape of final scores:{0}".format(scores.shape))

loss = tf.nn.softmax_cross_entropy_with_logits_v2(logits=scores, labels=y) # softmax
avg_loss = tf.reduce_mean(loss)

train_step = tf.train.AdamOptimizer(0.0001).minimize(loss)


Shape of conv1 feature maps after max pooling:(?, 14, 14, 32)
Shape of conv2 feature maps after max pooling:(?, 7, 7, 64)
Shape of flattened feature maps after max pooling 2:(?, 3136)
Shape of latent_scores_1:(?, 256)
Shape of final scores:(?, 10)


In [39]:
sess = tf.Session()
sess.run(tf.global_variables_initializer())

In [40]:

for iter in trange(10):
    for i in range(550):
        batch_xs = mnist.train.images[i*100:(i+1)*100]
        batch_ys = mnist.train.labels[i*100:(i+1)*100]
        sess.run(train_step, feed_dict={X: batch_xs, y: batch_ys})

# sess.run(train_step, feed_dict={X:mnist.train.images[0:100],
#               y:mnist.test.labels[0:100]})



100%|██████████| 10/10 [09:32<00:00, 57.23s/it]


In [41]:
# Accuracy
computed_scores = sess.run(scores, feed_dict={X:mnist.test.images,
                      y:mnist.test.labels})
np.argmax(computed_scores,axis=1)

sum(np.argmax(computed_scores,axis=1)==np.argmax(mnist.test.labels,axis=1))



9891

### Short answer

1\. How does the CNN compare in accuracy with yesterday's logistic regression and MLP models? How about training time?

CNN has slightly higher accuracy (98.8%) than MLP (97.8%). However, CNN takes much longer time (~10min) to trian for the same number of iterations than MLP (~10s)

2\. How many trainable parameters are there in the CNN you built for this assignment?

*Note: By trainable parameters, I mean individual scalars. For example, a weight matrix that is 10x5 has 50.*

Convolution layer 1: weights + bias <br>
5 \* 5 \* 32 + 32 = 832

Convolution layer 2: weights + bias<br>
5 \* 5 \* 32 \* 64 + 64 = 51,264

Fully connected layer 1: weights + bias<br>
7 \* 7 \* 64 \* 256 + 256 = 803,072

Fully connected layer 1: weights + bias<br>
256 \* 10 + 10 = 2570

Total: 857,738

3\. When would you use a CNN versus a logistic regression model or an MLP?

If there are some repeated/shared sturctures exist in the data, like an image, it may be more reseanable to chose CNN over a simple MLP. Because CNN has translational invariance. Also, for large input, MLP model parameters would become impractically large while CNN's limited connections and weight sharing mean it can scale up much better than a pure MLP.