# TensorFlow Assignment: Multilayer Perceptron (MLP) and Convolutional Neural Network (CNN)

**[Duke Community Standard](http://integrity.duke.edu/standard.html): By typing your name below, you are certifying that you have adhered to the Duke Community Standard in completing this assignment.**

Name: [Long Tian]

Now that you've run through a simple logistic regression model on MNIST, let's see if we can do better (Hint: we can). For this assignment, you'll build a multilayer perceptron (MLP) and a convolutional neural network (CNN), two popular types of neural networks, and compare their performance. Some potentially useful code:

### Multilayer Perceptron

Build a multilayer perceptron for MNIST digit classification. Feel free to play around with the model architecture and see how the training time/performance changes, but to begin, try the following:

Image -> fully connected (500 hidden units) -> nonlinearity (Sigmoid/ReLU) -> fully connected (10 hidden units) -> softmax
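
To get a feel for the size of this architecture, the following sketch counts its trainable parameters. It assumes flattened 28x28 MNIST inputs and one bias per unit; the helper name `dense_params` is made up for this illustration.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

n_input = 28 * 28   # flattened MNIST image (784 pixels)
n_hidden = 500      # hidden units suggested in the assignment
n_class = 10        # one output per digit

hidden_layer = dense_params(n_input, n_hidden)
output_layer = dense_params(n_hidden, n_class)
total = hidden_layer + output_layer

print("hidden layer parameters:", hidden_layer)
print("output layer parameters:", output_layer)
print("total parameters:", total)
```

Almost all of the parameters sit in the first layer, because every hidden unit is connected to every input pixel.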

Skeleton framework for you to fill in (Code you need to provide is marked by `###`):

Using the ReLU activation function:

In [25]:
import tensorflow as tf
# Load the MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

n_sample = mnist.train.images.shape[0]
n_input = mnist.train.images.shape[1]
n_hidden = 500
n_class = mnist.train.labels.shape[1]

x = tf.placeholder('float', [None, mnist.train.images.shape[1]]) ### MNIST images enter graph here ###
y_ = tf.placeholder('float', [None, mnist.train.labels.shape[1]]) ### MNIST labels enter graph here ###

learning_rate = 1
training_epochs = 4000  # number of minibatch updates (steps), not passes over the full training set
batch_size = 50

weight = {
    'h1': tf.Variable(tf.truncated_normal([n_input, n_hidden], stddev = 0.1)),
    'h2': tf.Variable(tf.truncated_normal([n_hidden, n_class], stddev = 0.1))
}
bias = {
    'h1': tf.Variable(tf.constant(0.1, shape = [n_hidden, ])),
    'h2': tf.Variable(tf.constant(0.1, shape = [n_class, ]))
}
# Define the graph
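def multilayer_perceptron(x, weight, bias):
    """Two-layer MLP: fully connected -> ReLU -> fully connected -> softmax."""
    # Hidden layer: affine transform followed by ReLU
    layerin = tf.add(tf.matmul(x, weight['h1']), bias['h1'])
    layerout = tf.nn.relu(layerin)

    # Output layer: affine transform to one score per class
    layerin = tf.add(tf.matmul(layerout, weight['h2']), bias['h2'])
    # NOTE: the loss below passes y_mlp as `logits` to
    # softmax_cross_entropy_with_logits, which applies softmax itself, so
    # softmax ends up applied twice. Returning `layerin` (the raw scores)
    # would avoid this; the argmax used for accuracy is the same either way.
    y_mlp = tf.nn.softmax(layerin)

    return y_mlp


### Create your MLP here##
### Make sure to name your MLP output as y_mlp ###
y_mlp = multilayer_perceptron(x, weight, bias)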

# Loss 
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_mlp))

# Optimizer
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)

# Evaluation
correct_prediction = tf.equal(tf.argmax(y_mlp, 1), tf.argmax(y_, 1))  # evaluated on whichever data is fed (validation or test below)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    # Initialize all variables
    sess.run(tf.global_variables_initializer())
    
    # Training regimen
    for i in range(training_epochs):
        # Validate every 250th batch
        if i % 250 == 0:
            validation_accuracy = 0
            for v in range(10):
                batch = mnist.validation.next_batch(100)
                validation_accuracy += (1/10) * accuracy.eval(feed_dict={x: batch[0], y_: batch[1]})
            print('step %d, validation accuracy %g' % (i, validation_accuracy))
        
        # Train    
        batch = mnist.train.next_batch(batch_size)
        train_step.run(feed_dict={x: batch[0], y_: batch[1]})

    print('test accuracy %g' % accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
step 0, validation accuracy 0.097
step 250, validation accuracy 0.909
step 500, validation accuracy 0.912
step 750, validation accuracy 0.942
step 1000, validation accuracy 0.96
step 1250, validation accuracy 0.963
step 1500, validation accuracy 0.964
step 1750, validation accuracy 0.961
step 2000, validation accuracy 0.955
step 2250, validation accuracy 0.97
step 2500, validation accuracy 0.966
step 2750, validation accuracy 0.962
step 3000, validation accuracy 0.965
step 3250, validation accuracy 0.948
step 3500, validation accuracy 0.975
step 3750, validation accuracy 0.975
test accuracy 0.9673
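
One detail of the cell above is worth illustrating: `y_mlp` is already a softmax output, yet it is passed as `logits` to `tf.nn.softmax_cross_entropy_with_logits`, which applies softmax again. The following plain-Python sketch (no TensorFlow; the scores are made-up values) shows what a second softmax does to the probabilities and to the loss.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label_index):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label_index])

# Hypothetical raw scores for a 3-class problem; class 0 is the true label.
logits = [5.0, 1.0, -2.0]
label = 0

once = softmax(logits)   # what the loss sees when it is given raw scores
twice = softmax(once)    # what the loss sees when it is given softmax outputs

loss_once = cross_entropy(once, label)
loss_twice = cross_entropy(twice, label)

# With 10 classes, the inputs to the second softmax lie in [0, 1], so the
# true-class probability is at most e / (e + 9), which bounds the loss below.
floor_10 = -math.log(math.e / (math.e + 9))

print("softmax once :", [round(p, 3) for p in once], "loss", round(loss_once, 3))
print("softmax twice:", [round(p, 3) for p in twice], "loss", round(loss_twice, 3))
print("lowest possible 10-class loss with double softmax:", round(floor_10, 3))
```

The predicted class does not change, so the accuracy computation is unaffected, but the second softmax flattens the distribution and keeps the loss from approaching zero, which changes the gradients the optimizer sees.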


Using the sigmoid activation function:

In [2]:
import tensorflow as tf
# Load the MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

n_sample = mnist.train.images.shape[0]
n_input = mnist.train.images.shape[1]
n_hidden = 500
n_class = mnist.train.labels.shape[1]

x = tf.placeholder('float', [None, mnist.train.images.shape[1]]) ### MNIST images enter graph here ###
y_ = tf.placeholder('float', [None, mnist.train.labels.shape[1]]) ### MNIST labels enter graph here ###

learning_rate = 1
training_epochs = 4000  # number of minibatch updates (steps), not passes over the full training set
batch_size = 50

weight = {
    'h1': tf.Variable(tf.truncated_normal([n_input, n_hidden], stddev = 0.1)),
    'h2': tf.Variable(tf.truncated_normal([n_hidden, n_class], stddev = 0.1))
}
bias = {
    'h1': tf.Variable(tf.constant(0.1, shape = [n_hidden, ])),
    'h2': tf.Variable(tf.constant(0.1, shape = [n_class, ]))
}
# Define the graph
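def multilayer_perceptron(x, weight, bias):
    """Two-layer MLP: fully connected -> sigmoid -> fully connected -> softmax."""
    # Hidden layer: affine transform followed by sigmoid
    layerin = tf.add(tf.matmul(x, weight['h1']), bias['h1'])
    layerout = tf.nn.sigmoid(layerin)

    # Output layer: affine transform to one score per class
    layerin = tf.add(tf.matmul(layerout, weight['h2']), bias['h2'])
    # NOTE: the loss below passes y_mlp as `logits` to
    # softmax_cross_entropy_with_logits, which applies softmax itself, so
    # softmax ends up applied twice. Returning `layerin` (the raw scores)
    # would avoid this; the argmax used for accuracy is the same either way.
    y_mlp = tf.nn.softmax(layerin)

    return y_mlp


### Create your MLP here##
### Make sure to name your MLP output as y_mlp ###
y_mlp = multilayer_perceptron(x, weight, bias)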

# Loss 
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_mlp))

# Optimizer
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)

# Evaluation
correct_prediction = tf.equal(tf.argmax(y_mlp, 1), tf.argmax(y_, 1))  # evaluated on whichever data is fed (validation or test below)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    # Initialize all variables
    sess.run(tf.global_variables_initializer())
    
    # Training regimen
    for i in range(training_epochs):
        # Validate every 250th batch
        if i % 250 == 0:
            validation_accuracy = 0
            for v in range(10):
                batch = mnist.validation.next_batch(100)
                validation_accuracy += (1/10) * accuracy.eval(feed_dict={x: batch[0], y_: batch[1]})
            print('step %d, validation accuracy %g' % (i, validation_accuracy))
        
        # Train    
        batch = mnist.train.next_batch(batch_size)
        train_step.run(feed_dict={x: batch[0], y_: batch[1]})

    print('test accuracy %g' % accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
step 0, validation accuracy 0.109
step 250, validation accuracy 0.682
step 500, validation accuracy 0.746
step 750, validation accuracy 0.757
step 1000, validation accuracy 0.794
step 1250, validation accuracy 0.852
step 1500, validation accuracy 0.846
step 1750, validation accuracy 0.816
step 2000, validation accuracy 0.823
step 2250, validation accuracy 0.834
step 2500, validation accuracy 0.841
step 2750, validation accuracy 0.845
step 3000, validation accuracy 0.853
step 3250, validation accuracy 0.849
step 3500, validation accuracy 0.847
step 3750, validation accuracy 0.846
test accuracy 0.8484


#### Comparison

How do the sigmoid and rectified linear unit (ReLU) compare?

***

The sigmoid network converges more slowly than the ReLU network: after 250 steps, ReLU already reaches about 91% validation accuracy, while sigmoid is at about 68%. The final accuracy also differs substantially: ReLU reaches almost 97% on the test set, while sigmoid reaches only about 85%.
This may be caused by the different properties of the two activation functions. ReLU produces a much sparser feature representation than sigmoid, and such features may be sufficient for recognizing MNIST digits. When inputs are far from 0, the sigmoid gradient shrinks toward 0, while the ReLU gradient stays at 1 for positive inputs, which can help ReLU converge faster than sigmoid.
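
The gradient argument can be checked numerically. This is a small sketch in plain Python, with sample inputs chosen only for illustration, that evaluates the derivative of each activation at a few points.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid: s(z) * (1 - s(z)), at most 0.25."""
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

for z in [-6.0, -2.0, 0.5, 2.0, 6.0]:
    print("z = %5.1f  sigmoid' = %.4f  relu' = %.1f"
          % (z, sigmoid_grad(z), relu_grad(z)))
```

The sigmoid derivative never exceeds 0.25 and is close to zero for large positive or negative inputs, whereas ReLU passes the gradient through unchanged for any positive input and blocks it for negative ones.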

***

### Convolutional Neural Network

Build a simple 2-layer CNN for MNIST digit classification. Feel free to play around with the model architecture and see how the training time/performance changes, but to begin, try the following:

Image -> CNN (32 5x5 filters) -> nonlinearity (ReLU) ->  (2x2 max pool) -> CNN (64 5x5 filters) -> nonlinearity (ReLU) -> (2x2 max pool) -> fully connected (1024 hidden units) -> nonlinearity (ReLU) -> fully connected (10 hidden units) -> softmax
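
The sketch below traces the tensor shape through this architecture and counts its parameters. It assumes stride-1 convolutions with 'SAME' padding and 2x2 pooling with stride 2, which is what the helper functions in this section use; the function names here are made up for the illustration.

```python
import math

def same_conv(h, w, n_filters):
    """A stride-1 'SAME' convolution keeps height and width."""
    return h, w, n_filters

def pool_2x2(h, w, c):
    """2x2 max pool with stride 2 and 'SAME' padding halves h and w (rounding up)."""
    return math.ceil(h / 2), math.ceil(w / 2), c

def conv_params(k, c_in, c_out):
    """k x k filters over c_in channels, plus one bias per filter."""
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

shape = (28, 28, 1)                          # MNIST image
shape = same_conv(shape[0], shape[1], 32)    # first conv layer
shape = pool_2x2(*shape)                     # first max pool
shape = same_conv(shape[0], shape[1], 64)    # second conv layer
shape = pool_2x2(*shape)                     # second max pool
flat = shape[0] * shape[1] * shape[2]        # input size of the first dense layer

params = {
    "conv1": conv_params(5, 1, 32),
    "conv2": conv_params(5, 32, 64),
    "fc1": dense_params(flat, 1024),
    "fc2": dense_params(1024, 10),
}
total = sum(params.values())

print("shape after second pool:", shape, "flattened:", flat)
for name, count in params.items():
    print(name, count)
print("total parameters:", total)
```

Most of the parameters belong to the first fully connected layer; the two convolutional layers together are small by comparison.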

Some additional functions that you might find helpful:

In [1]:
import tensorflow as tf
# Load the MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Model Inputs
x = tf.placeholder('float', [None, mnist.train.images.shape[1]])### MNIST images enter graph here ###
y_ = tf.placeholder('float', [None, mnist.train.labels.shape[1]])### MNIST labels enter graph here ###

# Helper functions for creating weight variables
def weight_variable(shape):
    """weight_variable generates a weight variable of a given shape."""
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    """bias_variable generates a bias variable of a given shape."""
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
                             
# Convolutional neural network functions
def conv2d(x, W):
    """conv2d returns a 2d convolution layer with full stride."""
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    """max_pool_2x2 downsamples a feature map by 2X."""
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
  
x_image = tf.reshape(x, [-1, 28, 28, 1])    
                             
# Define the graph
def CNN(x_image):
    # First convolutional layer: 32 5x5 filters over the single input channel
    W_conv1 = weight_variable([5, 5, 1, 32])  
    b_conv1 = bias_variable([32])
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)  
    h_pool1 = max_pool_2x2(h_conv1)  

    W_conv2 = weight_variable([5, 5, 32, 64])  
    b_conv2 = bias_variable([64])  

    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)  
    h_pool2 = max_pool_2x2(h_conv2)

    # Now image size is reduced to 7*7  
    W_fc1 = weight_variable([7 * 7 * 64, 1024])  
    b_fc1 = bias_variable([1024])  

    h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])  
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)  

    W_fc2 = weight_variable([1024, 10])  
    b_fc2 = bias_variable([10])

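    # NOTE: the loss below passes y_conv as `logits` to
    # softmax_cross_entropy_with_logits, which applies softmax itself, so
    # softmax is applied twice; returning the raw scores would avoid this.
    y_conv = tf.nn.softmax(tf.matmul(h_fc1, W_fc2) + b_fc2)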
    return y_conv

### Create your CNN here##
### Make sure to name your CNN output as y_conv ###
y_conv = CNN(x_image)

# Loss 
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))

# Optimizer
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

# Evaluation
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    # Initialize all variables
    sess.run(tf.global_variables_initializer())
    
    # Training regimen
    for i in range(10000):
        # Validate every 250th batch
        if i % 250 == 0:
            validation_accuracy = 0
            for v in range(10):
                batch = mnist.validation.next_batch(50)
                validation_accuracy += (1/10) * accuracy.eval(feed_dict={x: batch[0], y_: batch[1]})
            print('step %d, validation accuracy %g' % (i, validation_accuracy))
        
        # Train    
        batch = mnist.train.next_batch(50)
        train_step.run(feed_dict={x: batch[0], y_: batch[1]})

    print('test accuracy %g' % accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
step 0, validation accuracy 0.106
step 250, validation accuracy 0.732
step 500, validation accuracy 0.808
step 750, validation accuracy 0.766
step 1000, validation accuracy 0.81
step 1250, validation accuracy 0.784
step 1500, validation accuracy 0.874
step 1750, validation accuracy 0.848
step 2000, validation accuracy 0.862
step 2250, validation accuracy 0.9
step 2500, validation accuracy 0.886
step 2750, validation accuracy 0.872
step 3000, validation accuracy 0.898
step 3250, validation accuracy 0.882
step 3500, validation accuracy 0.862
step 3750, validation accuracy 0.902
step 4000, validation accuracy 0.9
step 4250, validation accuracy 0.892
step 4500, validation accuracy 0.888
step 4750, validation accuracy 0.892
step 5000, validation accuracy 0.896
step 5250, validation accuracy 0.868
ste

Some differences from the logistic regression model to note:

- The CNN model might take a while to train. Depending on your machine, you might expect this to take up to half an hour. If you see your validation performance start to plateau, you can kill the training.

- The logistic regression model we used previously was pretty basic, and as such, we were able to get away with using the GradientDescentOptimizer, which implements the gradient descent algorithm. For more difficult optimization spaces (such as the ones deep networks pose), we might want to use more sophisticated algorithms. Prof David Carlson has a lecture on this later.
    
- Because of the larger size of our network, notice that our minibatch size has shrunk.
    
- We've added a validation step every 250 minibatches. This lets us see how our model is doing during the training process, rather than sit around twiddling our thumbs and hoping for the best when training finishes. This becomes especially significant as training regimens start approaching days and weeks in length. Normally, we validate on the entire validation set, but for the sake of time we'll just stick to 10 validation minibatches (500 images) for this homework assignment.
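
The CNN cell above uses `tf.train.AdamOptimizer` as its more sophisticated optimizer. As a rough illustration of how Adam differs from plain gradient descent, here is a sketch of the standard Adam update rule on a toy one-dimensional problem. The objective, starting point, learning rate, and step count are assumptions chosen for the demo, not values from the assignment.

```python
import math

def adam_minimize(grad_fn, w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a 1-D function with the standard Adam update rule."""
    m = 0.0  # running mean of gradients (first moment)
    v = 0.0  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)   # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

def sgd_minimize(grad_fn, w, lr=0.1, steps=500):
    """Plain gradient descent with a fixed learning rate, for comparison."""
    for _ in range(steps):
        w -= lr * grad_fn(w)
    return w

# Toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
def toy_grad(w):
    return 2.0 * (w - 3.0)

w_adam = adam_minimize(toy_grad, w=0.0)
w_sgd = sgd_minimize(toy_grad, w=0.0)
print("Adam result:", round(w_adam, 4))
print("SGD result: ", round(w_sgd, 4))
```

On a problem this simple both methods reach the minimum. The difference is that Adam rescales each step by a running estimate of the gradient magnitude, so its step size depends much less on the scale of the gradients.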

#### Comparison

How do the MLP and CNN compare in accuracy? Training time? Why would you use one vs the other? Is there a problem you see with MLPs when applied to other image datasets?

***

Training time: the MLP takes less than 1 minute; the CNN takes about half of the class time, around 30 to 35 minutes.


***

***

Accuracy: the MLP with ReLU reaches 97.7%; the CNN reaches 98.9%, just over one percentage point higher than the MLP.


***

***

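I also tried an MLP on the CIFAR10 dataset, and it performed awfully, whereas CNNs do well on that dataset according to other people's blog posts. The likely reason is an inherent difference between the two models: a CNN is much better suited to 2D data such as images, because convolution shares weights across positions and exploits local spatial structure, whereas an MLP flattens the image and discards that structure.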
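
One way to see the difference is to compare how the first layer grows with image size. The sketch below counts first-layer weights (biases omitted) for a 500-unit dense layer and for 32 5x5 filters, on MNIST-sized (28x28x1) and CIFAR-10-sized (32x32x3) inputs. The layer sizes are taken from this assignment; applying them to CIFAR-10 is an assumption made for illustration.

```python
def dense_weights(height, width, channels, n_hidden):
    """A dense layer connects every input value to every hidden unit."""
    return height * width * channels * n_hidden

def conv_weights(kernel, channels, n_filters):
    """A conv layer's weight count is independent of the image height and width."""
    return kernel * kernel * channels * n_filters

datasets = {"MNIST": (28, 28, 1), "CIFAR-10": (32, 32, 3)}
counts = {}
for name, (h, w, c) in datasets.items():
    counts[name] = (dense_weights(h, w, c, 500), conv_weights(5, c, 32))
    print("%-8s dense: %9d   conv: %5d" % (name, counts[name][0], counts[name][1]))
```

The dense layer grows with the number of pixels and treats each pixel position independently, while the convolutional layer grows only with the number of input channels.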


***