# Homework 3
In this homework you will train your first deep network on actual high-dimensional data, namely images from supertux. You will extend the non-linear multi-layer perceptron trained in the previous assignment.

Development notes: 

1) If you are doing your homework in a Jupyter/iPython notebook you may need to 'Restart & Clear Output' after making a change and re-running a cell.  TensorFlow will not allow you to create multiple variables with the same name, which is what you are doing when you run a cell that creates a variable twice.<br/><br/>
2) Be careful with your calls to global_variables_initializer(). If you call it after training one network it will re-initialize your variables erasing your training.  In general, double check the outputs of your model after all training and before turning your model in. Ending a session will discard all your variable values.

## Part 0: Setup

In [1]:
import tensorflow as tf
import numpy as np
import util

# Load the data we are giving you
def load(filename, W=64, H=64):
    data = np.fromfile(filename, dtype=np.uint8).reshape((-1, W*H*3+1))
    images, labels = data[:, :-1].reshape((-1,H,W,3)), data[:, -1]
    return images, labels

image_data, label_data = load('tux_train.dat')

print('Input shape: ' + str(image_data.shape))
print('Labels shape: ' + str(label_data.shape))

num_classes = 6

# Set up your input placeholder
inputs = tf.placeholder(tf.float32, (None,64,64,3), name='input')

# Whenever you deal with image data it's important to mean center it first and subtract the standard deviation
white_inputs = (inputs - 100.) / 72.

# Next let's flatten the inputs
flat_inputs = tf.contrib.layers.flatten(white_inputs)


# Set up your label placeholders
labels = tf.placeholder(tf.int64, (None), name='labels')
float_labels = tf.cast(labels, tf.float32)
onehot_labels = tf.one_hot(labels, num_classes, name='onehot_labels')

Input shape: (12257, 64, 64, 3)
Labels shape: (12257,)


## Part 1: Regression to Scalar
Set up a compute graph that does regression from the inputs to
the scalar value of the label.

i.e. frame$_i$ --> y, where y is some value [0,5]

In [2]:
# The scope allows you to copy the network architecture between different parts without accidentially sharing some weights
with tf.name_scope('sr'):
    # Step 1: define the compute graph of your regressor from images to
    #     scalar value. The input should be 'flat_inputs'
    # Hint: The first fully_connected layer should have a ReLU activation, the second should not have any activation_fn=None
    ### Your code here ###
    hidden = tf.contrib.layers.fully_connected(flat_inputs, 100)
    hidden2 = tf.contrib.layers.fully_connected(hidden, 100, activation_fn=None)
    output = tf.contrib.layers.fully_connected(hidden2, 1, activation_fn=None)
    output = tf.identity(output, name='output') # due to the scope this variable is called 'sr/output'
    
    # Step 2: use a loss function over your predictions and the ground truth, penalize the mean squared distance between output and 'float_labels'.
    ### Your code here ###
    sr_loss = tf.reduce_mean(tf.squared_difference(output, float_labels))
    
    # Step 3: create an optimizer (MomentumOptimizer with learning rate 0.0001 and momentum 0.9 works well)
    ### Your code here ###
    sr_optimizer = tf.train.MomentumOptimizer(0.0001, 0.9).minimize(sr_loss)

    # Step 4: use that optimizer on your loss function
    ### Your code here ###

    correct = tf.abs(output-float_labels) < 0.5
    sr_accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

## Part 2: Regression to One-Hot Vector
Set up a compute graph that does regresion from the inputs to a 
one-hot representation of the labels.  Use the onehot_labels variable we have provided you.

i.e. frame$_i$ --> y, where y is [0,...,1,..0] with 1 in the index of the label value.  
ex: (frame$_i$, 2) becomes (frame$_i$, [0, 0, 1, 0, 0, 0])

In [3]:
with tf.name_scope('ohr'):
    # Step 1: define the compute graph of your regressor from images
    #     to one-hot labels (python variable onehot_labels with TF
    #     name 'onehot_lables')
    # Note: use the same input as your first regressor 'flat_inputs',
    #       and different labels, python variable onehot_labels.
    #       You will have three branches off the input, your two regressors
    #           and a classifier.
    ### Your code here ###
    hidden = tf.contrib.layers.fully_connected(flat_inputs, 100)
    hidden2 = tf.contrib.layers.fully_connected(hidden, 100, activation_fn=None)
    output = tf.contrib.layers.fully_connected(hidden2, 6, activation_fn=None)
    output = tf.identity(output, name='output') # due to the scope this variable is called 'sr/output'
    
    # Step 2: use a loss function over your predictions and the ground truth.
    ### Your code here ###
    ohr_loss = tf.reduce_mean(tf.squared_difference(output, onehot_labels))
    
    # Step 3: create an optimizer (MomentumOptimizer with learning rate 0.0001 and momentum 0.9 works well)
    ### Your code here ###
    ohr_optimizer = tf.train.MomentumOptimizer(0.0001, 0.9).minimize(ohr_loss)

    # Step 4: use that optimizer on your loss function
    ### Your code here ###
    correct = tf.equal(tf.argmax(output, 1), labels)
    ohr_accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

## Part 3: Softmax + Log-Likelihood

In [4]:
with tf.name_scope('ll'):
    # Step 1: define your compute graph
    ### Your code here ###
    hidden = tf.contrib.layers.fully_connected(flat_inputs, 100)
    hidden2 = tf.contrib.layers.fully_connected(hidden, 100, activation_fn=None)
    output = tf.contrib.layers.fully_connected(hidden2, 6, activation_fn=None)
    output = tf.identity(output, name='output')

    # Step 2: use a classification loss function
    ### Your code here ###
    l1_loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=output, labels=labels))
    
    # Step 3: create an optimizer
    ### Your code here ###
    l1_optimizer = tf.train.MomentumOptimizer(0.0001, 0.9).minimize(l1_loss)

    # Step 4: use that optimizer on your loss function
    ### Your code here ###
    correct = tf.equal(tf.argmax(output, 1), labels)
    l1_accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

## Part 4: Softmax + L2-Regression

In [None]:
with tf.name_scope('l2'):
    # Step 1: define your compute graph
    ### Your code here ###
    hidden = tf.contrib.layers.fully_connected(flat_inputs, 100)
    hidden2 = tf.contrib.layers.fully_connected(hidden, 100, activation_fn=None)
    output = tf.contrib.layers.fully_connected(hidden2, 6, activation_fn=None)
    output = tf.identity(output, name='output')

    # Step 2: use a classification loss function
    ### Your code here ###
    l2_loss = tf.reduce_mean(tf.squared_difference(tf.nn.softmax(output), onehot_labels))
    
    # Step 3: create an optimizer
    ### Your code here ###
    l2_optimizer = tf.train.MomentumOptimizer(0.0001, 0.9).minimize(l2_loss)

    # Step 4: use that optimizer on your loss function
    ### Your code here ###
    correct = tf.equal(tf.argmax(output, 1), labels)
    l2_accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

## Part 5: Training

In [None]:
# Batch size
BS = 32

# Start a session
sess = tf.Session()

# Set up training (initialize the variables)
### Your code here ###
init = tf.global_variables_initializer()
sess.run(init)


# This is a helper function that trains your model for several epochs un shuffled data
# train_func should take a single step in the optmimzation and return accuracy and loss
#   accuracy, loss = train_func(batch_images, batch_labels)
# HINT: train_func should call sess.run
def train(train_func):
    # An epoch is a single pass over the training data
    for epoch in range(20):
        # Let's shuffle the data every epoch
        np.random.seed(epoch)
        np.random.shuffle(image_data)
        np.random.seed(epoch)
        np.random.shuffle(label_data)
        # Go through the entire dataset once
        accs, losss = [], []
        for i in range(0, image_data.shape[0]-BS+1, BS):
            # Train a single batch
            batch_images, batch_labels = image_data[i:i+BS], label_data[i:i+BS]
            acc, loss = train_func(batch_images, batch_labels)
            accs.append(acc)
            losss.append(loss)
        print('[%3d] Accuracy: %0.3f  \t  Loss: %0.3f'%(epoch, np.mean(accs), np.mean(losss)))

# Train scalar regressor network
print('Scalar regressor')
### Your code here ###
def sr_trainer(img, lbl):
    a, l, _ = sess.run([sr_accuracy, sr_loss, sr_optimizer], feed_dict={inputs:img, labels:lbl}) 
    return a, l
train(sr_trainer)

# Train onehot regressor network
print('\nOnehot regressor')
### Your code here ###
def ohr_trainer(img, lbl):
    a, l, _ = sess.run([ohr_accuracy, ohr_loss, ohr_optimizer], feed_dict={inputs:img, labels:lbl}) 
    return a, l
train(ohr_trainer)

# Train classifier
print('\nSoftmax+ll regressor')
### Your code here ###
def l1_trainer(img, lbl):
    a, l, _ = sess.run([l1_accuracy, l1_loss, l1_optimizer], feed_dict={inputs:img, labels:lbl}) 
    return a, l
train(l1_trainer)

# Train classifier
print('\nSoftmax+L2 regressor')
### Your code here ###
def l2_trainer(img, lbl):
    a, l, _ = sess.run([l2_accuracy, l2_loss, l2_optimizer], feed_dict={inputs:img, labels:lbl}) 
    return a, l
train(l2_trainer)


Scalar regressor
[  0] Accuracy: 0.218  	  Loss: 2.524
[  1] Accuracy: 0.220  	  Loss: 2.140
[  2] Accuracy: 0.221  	  Loss: 2.123
[  3] Accuracy: 0.222  	  Loss: 2.109
[  4] Accuracy: 0.221  	  Loss: 2.109
[  5] Accuracy: 0.221  	  Loss: 2.103
[  6] Accuracy: 0.221  	  Loss: 2.103
[  7] Accuracy: 0.221  	  Loss: 2.094
[  8] Accuracy: 0.221  	  Loss: 2.091
[  9] Accuracy: 0.221  	  Loss: 2.093
[ 10] Accuracy: 0.221  	  Loss: 2.090
[ 11] Accuracy: 0.221  	  Loss: 2.089
[ 12] Accuracy: 0.221  	  Loss: 2.088
[ 13] Accuracy: 0.221  	  Loss: 2.090
[ 14] Accuracy: 0.221  	  Loss: 2.084
[ 15] Accuracy: 0.221  	  Loss: 2.087
[ 16] Accuracy: 0.221  	  Loss: 2.083
[ 17] Accuracy: 0.221  	  Loss: 2.083
[ 18] Accuracy: 0.221  	  Loss: 2.086
[ 19] Accuracy: 0.221  	  Loss: 2.088

Onehot regressor
[  0] Accuracy: 0.634  	  Loss: 0.258
[  1] Accuracy: 0.813  	  Loss: 0.104
[  2] Accuracy: 0.861  	  Loss: 0.080
[  3] Accuracy: 0.884  	  Loss: 0.068
[  4] Accuracy: 0.900  	  Loss: 0.060
[  5] Accuracy:

## Part 6: Evaluation

### See your model

In [None]:
# Show the current graph
util.show_graph(tf.get_default_graph().as_graph_def())

### Compute the valiation accuracy
You'll see some massive overfitting here, but don't worry we will deal with that in the coming weeks.

In [None]:
image_val, label_val = load('tux_val.dat')

print('Input shape: ' + str(image_data.shape))
print('Labels shape: ' + str(label_data.shape))

print("Scalar regressor", sess.run([sr_accuracy, sr_loss], feed_dict={inputs: image_val, labels: label_val}))

# Train onehot regressor network
print("Onehot regressor", sess.run([ohr_accuracy, ohr_loss], feed_dict={inputs: image_val, labels: label_val}))

# Train classifier
print("Softmax+ll regressor", sess.run([l1_accuracy, l1_loss], feed_dict={inputs: image_val, labels: label_val}))

# Train classifier
print("Softmax+l2 regressor", sess.run([l2_accuracy, l2_loss], feed_dict={inputs: image_val, labels: label_val}))

## Part 7: Save Model
Like homework 1 you are turning in your TensorFlow graph.  This time, however, you are saving the trained weights along with the structure.

In [None]:
util.save('assignment3.tfg', session=sess)