# Solution of logistic regression model for MNIST in TensorFlow
MNIST dataset: yann.lecun.com/exdb/mnist/<br/>
Author: Chip Huyen<br/>
Jupyter scribe: Jiageng Liu<br/>
Prepared for the class CS 20SI: "TensorFlow for Deep Learning Research"<br/>
[cs20si.stanford.edu](https://cs20si.stanford.edu)<br/>

In [1]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'

import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import time

Define parameters for the model

In [2]:
learning_rate = 0.01
batch_size = 128
n_epochs = 10

## Phase 1: build the graph

**Step 1**: read in data.<br/>
Use TF Learn's built-in function to load the MNIST data into the folder /data/mnist.

In [3]:
mnist = input_data.read_data_sets('/data/mnist', one_hot=True) 

Extracting /data/mnist\train-images-idx3-ubyte.gz
Extracting /data/mnist\train-labels-idx1-ubyte.gz
Extracting /data/mnist\t10k-images-idx3-ubyte.gz
Extracting /data/mnist\t10k-labels-idx1-ubyte.gz


**Step 2**: create placeholders for features and labels.<br/>
Each image in the MNIST data is 28x28 pixels, i.e. 28*28 = 784 values.<br/>
Therefore, each image is represented with a 1x784 tensor.<br/>
There are 10 classes for each image, corresponding to digits 0 - 9.<br/>
Features are of the type float, and labels are of the type int, one-hot vectors.<br/>

In [4]:
X = tf.placeholder(shape=[batch_size, 784], dtype=tf.float32, name='image')
Y = tf.placeholder(shape=[batch_size, 10], dtype=tf.int16, name='label')
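
To make these shapes concrete, the NumPy sketch below flattens a 28x28 image into one 784-element row and one-hot encodes a digit label. The helper `one_hot` and the all-zero dummy image are assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np

def one_hot(label, n_classes=10):
    """Return a one-hot row vector for an integer class label (hypothetical helper)."""
    vec = np.zeros((1, n_classes), dtype=np.int16)
    vec[0, label] = 1
    return vec

# A dummy 28x28 "image" standing in for a real MNIST digit.
image = np.zeros((28, 28), dtype=np.float32)
flat = image.reshape(1, 28 * 28)   # shape (1, 784), like one row of X

label = one_hot(3)                 # shape (1, 10), like one row of Y
print(flat.shape, label.shape, label)
```

A batch of 128 such rows stacked together gives the [batch_size, 784] and [batch_size, 10] shapes used by the placeholders.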

**Step 3**: create weights and bias.<br/>
w is initialized to 0 here (random values with mean of 0 and stddev of 0.01 would also work).<br/>
b is initialized to 0.<br/>
Shape of w depends on the dimension of X and Y so that Y = tf.matmul(X, w).<br/>
Shape of b depends on Y.<br/>

In [5]:
w = tf.Variable(tf.zeros([784, 10]), dtype=tf.float32, name='weights')
b = tf.Variable(tf.zeros([1, 10]), dtype=tf.float32, name='bias')
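
The shape rule can be checked outside TensorFlow. The sketch below uses NumPy stand-ins (an all-ones batch, which is an assumption for illustration) to show that a [128, 784] batch times a [784, 10] weight matrix plus a [1, 10] bias gives one row of 10 scores per image.

```python
import numpy as np

batch_size, n_features, n_classes = 128, 784, 10

X = np.ones((batch_size, n_features), dtype=np.float32)   # stand-in batch of images
w = np.zeros((n_features, n_classes), dtype=np.float32)   # same shape as `weights`
b = np.zeros((1, n_classes), dtype=np.float32)            # same shape as `bias`

# (128, 784) @ (784, 10) -> (128, 10); b of shape (1, 10) broadcasts over the rows.
logits = X @ w + b
print(logits.shape)
```

The inner dimensions (784) must match for the product to be defined, and the bias needs only a trailing dimension of 10 because broadcasting repeats it for every row.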

**Step 4**: build model.<br/>
The model returns the logits.<br/>
These logits will later be passed through a softmax layer<br/>
to get the probability distribution over the possible labels of the image.<br/>
DO NOT DO SOFTMAX HERE.<br/>

In [6]:
logits = tf.matmul(X, w) + b

**Step 5**: define loss function.<br/>
Use the cross entropy between the real labels and the softmax of the logits.<br/>
Use *tf.nn.softmax_cross_entropy_with_logits*.<br/>
Then use *tf.reduce_mean* to get the mean loss of the batch.<br/>

In [7]:
entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y, name='loss')
loss = tf.reduce_mean(entropy)
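
As a rough illustration of what this loss computes, here is a NumPy sketch of softmax cross entropy taken directly from logits. It is not TensorFlow's implementation; the function name and the toy inputs are assumptions. Working from logits lets the computation subtract the row maximum first, which is one reason softmax is not applied in Step 4.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-example cross entropy between one-hot labels and softmax(logits)."""
    # Subtract the row-wise max before exponentiating to avoid overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

entropy = softmax_cross_entropy(logits, labels)  # one value per example
loss = entropy.mean()                            # mean loss over the batch
print(entropy, loss)
```

When every logit in a row is zero, the softmax is uniform and the loss for that row is the log of the number of classes. With the all-zero w and b from Step 3 and 10 classes, the loss before any training is therefore ln 10, about 2.30.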

**Step 6**: define training op.<br/>
Use the Adam optimizer to minimize the loss.

In [8]:
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss)
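
For intuition about what one Adam step does, the sketch below applies the textbook update rule to a single scalar parameter. It is an illustration under stated assumptions, not TensorFlow's code: the function `adam_step` and the toy objective are hypothetical, and *tf.train.AdamOptimizer* may differ in details such as how epsilon is applied.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update for a single scalar parameter."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for step t (1-based)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Minimize f(x) = x**2 starting from x = 1.0; the gradient is 2 * x.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 201):
    x, m, v = adam_step(x, 2 * x, m, v, t)
print(x)
```

On the very first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly the learning rate regardless of the gradient's scale.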

**Optional**: visualize with TensorBoard.
![TensorBoard](https://github.com/jiagengliu/stanford-tensorflow-tutorials/blob/master/examples/graphs/tensorboard_logistic_regression.png?raw=true)

In [9]:
from show_tf_graph import show_graph
show_graph(tf.get_default_graph())

## Phase 2: train the model

**Step 7**: Run the optimizer and fetch loss_batch<br/>
**Step 8**: test the model on the test dataset

In [10]:
with tf.Session() as sess:
    
    # Step 7: training
    start_time = time.time()
    sess.run(tf.global_variables_initializer())
    n_batches = int(mnist.train.num_examples/batch_size)
    for i in range(n_epochs):
        total_loss = 0
        
        for _ in range(n_batches):
            X_batch, Y_batch = mnist.train.next_batch(batch_size)
            # Step 7: run optimizer + fetch loss_batch
            _, loss_batch = sess.run([optimizer, loss], feed_dict={X: X_batch, Y: Y_batch})
            
            total_loss += loss_batch
        print('Average loss epoch {0}: {1}'.format(i, total_loss/n_batches))
        
    print('Total time: {0} seconds'.format(time.time() - start_time))
    
    print('Optimization Finished!') # average loss reaches about 0.26 after 10 epochs in the run below
    
    # Step 8: testing
    preds = tf.nn.softmax(logits)
    correct_preds = tf.equal(tf.argmax(preds, 1), tf.argmax(Y, 1))
    accuracy = tf.reduce_sum(tf.cast(correct_preds, tf.float32)) # number of correct predictions in the batch (a count, not a ratio)

    n_batches = int(mnist.test.num_examples/batch_size)
    total_correct_preds = 0

    for i in range(n_batches):
        X_batch, Y_batch = mnist.test.next_batch(batch_size)
        accuracy_batch = sess.run(accuracy, feed_dict={X: X_batch, Y:Y_batch}) 
        total_correct_preds += accuracy_batch

    print('Accuracy {0}'.format(total_correct_preds/mnist.test.num_examples))

Average loss epoch 0: 0.36556799174883425
Average loss epoch 1: 0.2955341628277218
Average loss epoch 2: 0.28100448987492316
Average loss epoch 3: 0.27937978964585525
Average loss epoch 4: 0.27470361113131464
Average loss epoch 5: 0.27102239622727975
Average loss epoch 6: 0.26880325427352686
Average loss epoch 7: 0.2672448804675838
Average loss epoch 8: 0.26855715437160504
Average loss epoch 9: 0.26259008433782693
Total time: 5.927762746810913 seconds
Optimization Finished!
Accuracy 0.9234
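
The counting logic in Step 8 can be sketched in NumPy as follows. The helper `count_correct` and the toy arrays are assumptions for illustration. The second half works through the batch arithmetic for the test loop, assuming the standard 10,000-image MNIST test set.

```python
import numpy as np

def count_correct(logits, labels):
    """Count rows where the highest-scoring class matches the one-hot label."""
    # softmax is monotonic, so argmax over logits equals argmax over softmax(logits).
    return int(np.sum(np.argmax(logits, axis=1) == np.argmax(labels, axis=1)))

logits = np.array([[0.1, 2.0, 0.3],
                   [1.5, 0.2, 0.1],
                   [0.0, 0.1, 3.0]])
labels = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [0, 0, 1]])
n_correct = count_correct(logits, labels)   # rows 0 and 2 match

# Fixed-size batches drop the remainder: 10000 test images, batch size 128.
n_test, batch_size = 10000, 128
n_batches = n_test // batch_size            # full batches only
n_evaluated = n_batches * batch_size        # examples actually scored
print(n_correct, n_batches, n_evaluated)
```

Because the placeholders have a fixed batch dimension, only full batches are scored while the total is divided by the whole test set size, so the printed accuracy is a slight underestimate of the accuracy on the examples actually evaluated.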
