# Homework 3
In this homework you will train your first deep network on actual high-dimensional data, namely images from supertux. You will extend the non-linear multi-layer perceptron trained in the previous assignment.

Development notes: 

1) If you are doing your homework in a Jupyter/iPython notebook you may need to 'Restart & Clear Output' after making a change and re-running a cell.  TensorFlow will not allow you to create multiple variables with the same name, which is what you are doing when you run a cell that creates a variable twice.<br/><br/>
2) Be careful with your calls to global_variables_initializer(). If you call it after training one network it will re-initialize your variables erasing your training.  In general, double check the outputs of your model after all training and before turning your model in. Ending a session will discard all your variable values.

## Part 0: Setup

In [1]:
import tensorflow as tf
import numpy as np
import util

# Load the data we are giving you
def load(filename, W=64, H=64):
    data = np.fromfile(filename, dtype=np.uint8).reshape((-1, W*H*3+1))
    images, labels = data[:, :-1].reshape((-1,H,W,3)), data[:, -1]
    return images, labels

image_data, label_data = load('tux_train.dat')

print('Input shape: ' + str(image_data.shape))
print('Labels shape: ' + str(label_data.shape))

num_classes = 6

# Set up your input placeholder
inputs = tf.placeholder(tf.float32, (None,64,64,3), name='input')

# Whenever you deal with image data it's important to mean center it first and subtract the standard deviation
white_inputs = (inputs - 100.) / 72.

# Next let's flatten the inputs
flat_inputs = tf.contrib.layers.flatten(white_inputs)


# Set up your label placeholders
labels = tf.placeholder(tf.int64, (None), name='labels')
float_labels = tf.cast(labels, tf.float32)
onehot_labels = tf.one_hot(labels, num_classes, name='onehot_labels',dtype='int32')

Input shape: (12257, 64, 64, 3)
Labels shape: (12257,)


## Part 1: Regression to Scalar
Set up a compute graph that does regression from the inputs to
the scalar value of the label.

i.e. frame$_i$ --> y, where y is some value [0,5]

In [2]:
# The scope allows you to copy the network architecture between different parts without accidentially sharing some weights
with tf.name_scope('sr'):
    # Step 1: define the compute graph of your regressor from images to
    #     scalar value. The input should be 'flat_inputs'
    # Hint: The first fully_connected layer should have a ReLU activation, the second should not have any activation_fn=None
    ### Your code here ###
    
    h1= tf.layers.dense(flat_inputs,100,activation=tf.nn.relu)
    h2= tf.layers.dense(h1,1,activation=None)
    
    output = tf.identity(h2, name='output') # due to the scope this variable is called 'sr/output'

    # Step 2: use a loss function over your predictions and the ground truth, penalize the mean squared distance between output and 'float_labels'.
    ### Your code here ###
    sr_loss = tf.reduce_mean(tf.losses.mean_squared_error(float_labels,output))
    # Step 3: create an optimizer (MomentumOptimizer with learning rate 0.0001 and momentum 0.9 works well)
    ### Your code here ###
    sr_optimizer= tf.train.MomentumOptimizer(0.0001,0.9)
    # Step 4: use that optimizer on your loss function
    ### Your code here ###
    sr_minimizer= sr_optimizer.minimize(sr_loss)
    
    correct = tf.abs(output-float_labels) < 0.5
    sr_accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

## Part 2: Regression to One-Hot Vector
Set up a compute graph that does regresion from the inputs to a 
one-hot representation of the labels.  Use the onehot_labels variable we have provided you.

i.e. frame$_i$ --> y, where y is [0,...,1,..0] with 1 in the index of the label value.  
ex: (frame$_i$, 2) becomes (frame$_i$, [0, 0, 1, 0, 0, 0])

In [3]:
with tf.name_scope('ohr'):
    # Step 1: define the compute graph of your regressor from images
    #     to one-hot labels (python variable onehot_labels with TF
    #     name 'onehot_lables')
    # Note: use the same input as your first regressor 'flat_inputs',
    #       and different labels, python variable onehot_labels.
    #       You will have three branches off the input, your two regressors
    #           and a classifier.
    ### Your code here ###
    h1= tf.layers.dense(flat_inputs,100,activation=tf.nn.relu)
    h2= tf.layers.dense(h1,num_classes,activation =None)    
    output = tf.identity(h2, name='output')

    # Step 2: use a loss function over your predictions and the ground truth.
    ### Your code here ###
    ohr_loss = tf.reduce_mean(tf.losses.mean_squared_error(onehot_labels,output))
    # Step 3: create an optimizer (MomentumOptimizer with learning rate 0.0001 and momentum 0.9 works well)
    ### Your code here ###
    ohr_optimizer = tf.train.MomentumOptimizer(0.0001,0.9)
    # Step 4: use that optimizer on your loss function
    ### Your code here ###
    ohr_minimizer = ohr_optimizer.minimize(ohr_loss)
    correct = tf.equal(tf.argmax(output, 1), labels)
    ohr_accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
    

## Part 3: Softmax + Log-Likelihood

In [4]:
with tf.name_scope('ll'):
    # Step 1: define your compute graph
    ### Your code here ###
    h1= tf.contrib.layers.fully_connected(flat_inputs,100,activation_fn=tf.nn.relu)
    h2= tf.contrib.layers.fully_connected(h1,num_classes,activation_fn=None)
    output = tf.identity(h2, name='output')

    # Step 2: use a classification loss function
    ### Your code here ###
    ll_loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,logits=output))
    # Step 3: create an optimizer
    ### Your code here ###
    ll_optimizer = tf.train.MomentumOptimizer(0.0001,0.9)
    # Step 4: use that optimizer on your loss function
    ll_minimizer = ll_optimizer.minimize(ll_loss)
    ### Your code here ###
    correct = tf.equal(tf.argmax(output, 1), labels)
    ll_accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

## Part 4: Softmax + L2-Regression

In [5]:
with tf.name_scope('l2'):
    # Step 1: define your compute graph
    ### Your code here ###
    h1= tf.contrib.layers.fully_connected(flat_inputs,100,activation_fn=tf.nn.relu)
    h2= tf.contrib.layers.fully_connected(h1,num_classes,activation_fn=tf.nn.softmax)
    output = tf.identity(h2, name='output')

    # Step 2: use a classification loss function
    ### Your code here ###
    l2_loss= tf.reduce_mean(tf.losses.mean_squared_error(onehot_labels,output))
    # Step 3: create an optimizer
    ### Your code here ###
    l2_optimizer = tf.train.MomentumOptimizer(0.0001,0.9)
    # Step 4: use that optimizer on your loss function
    l2_minimizer = l2_optimizer.minimize(l2_loss)
    ### Your code here ###
    correct = tf.equal(tf.argmax(output, 1), labels)
    l2_accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

## Part 5: Training

In [6]:
def train_func_l2(batch_images,batch_labels):
    _,acc, loss = sess.run([l2_minimizer,l2_accuracy,l2_loss],feed_dict={inputs:batch_images,labels: batch_labels})
    return acc,loss

def train_func_sr(batch_images,batch_labels):
    _,acc, loss = sess.run([sr_minimizer,sr_accuracy,sr_loss],feed_dict={inputs:batch_images,labels: batch_labels})
    return acc,loss

def train_func_ll(batch_images,batch_labels):
    _,acc, loss = sess.run([ll_minimizer,ll_accuracy,ll_loss],feed_dict={inputs:batch_images,labels: batch_labels})
    return acc,loss

def train_func_ohr(batch_images,batch_labels):
    _,acc, loss = sess.run([ohr_minimizer,ohr_accuracy,ohr_loss],feed_dict={inputs:batch_images,labels: batch_labels})
    return acc,loss

In [7]:
# Batch size
BS = 32

# Start a session
sess = tf.Session()

# Set up training (initialize the variables)
### Your code here ###
sess.run(tf.global_variables_initializer())

# This is a helper function that trains your model for several epochs un shuffled data
# train_func should take a single step in the optmimzation and return accuracy and loss
#   accuracy, loss = train_func(batch_images, batch_labels)
# HINT: train_func should call sess.run
def train(train_func):
    # An epoch is a single pass over the training data
    for epoch in range(20):
        # Let's shuffle the data every epoch
        np.random.seed(epoch)
        np.random.shuffle(image_data)
        np.random.seed(epoch)
        np.random.shuffle(label_data)
        # Go through the entire dataset once
        accs, losss = [], []
        for i in range(0, image_data.shape[0]-BS+1, BS):
            # Train a single batch
            batch_images, batch_labels = image_data[i:i+BS], label_data[i:i+BS]
            acc, loss = train_func(batch_images, batch_labels)
            accs.append(acc)
            losss.append(loss)
        print('[%3d] Accuracy: %0.3f  \t  Loss: %0.3f'%(epoch, np.mean(accs), np.mean(losss)))


# Train scalar regressor network
print('Scalar regressor')
### Your code here ###
train(train_func_sr)

# Train onehot regressor network
print('\nOnehot regressor')
### Your code here ###
train(train_func_ohr)

# Train classifier
print('\nSoftmax+ll regressor')
### Your code here ###
train(train_func_ll)

# Train classifier
print('\nSoftmax+L2 regressor')
### Your code here ###
train(train_func_l2)

Scalar regressor

Onehot regressor

Softmax+ll regressor
[  0] Accuracy: 0.830  	  Loss: 0.520
[  1] Accuracy: 0.925  	  Loss: 0.253
[  2] Accuracy: 0.940  	  Loss: 0.202
[  3] Accuracy: 0.948  	  Loss: 0.173
[  4] Accuracy: 0.953  	  Loss: 0.154
[  5] Accuracy: 0.958  	  Loss: 0.139
[  6] Accuracy: 0.962  	  Loss: 0.127
[  7] Accuracy: 0.966  	  Loss: 0.117
[  8] Accuracy: 0.968  	  Loss: 0.110
[  9] Accuracy: 0.971  	  Loss: 0.103
[ 10] Accuracy: 0.973  	  Loss: 0.096
[ 11] Accuracy: 0.975  	  Loss: 0.090
[ 12] Accuracy: 0.977  	  Loss: 0.085
[ 13] Accuracy: 0.978  	  Loss: 0.081
[ 14] Accuracy: 0.980  	  Loss: 0.078
[ 15] Accuracy: 0.982  	  Loss: 0.074
[ 16] Accuracy: 0.983  	  Loss: 0.070
[ 17] Accuracy: 0.984  	  Loss: 0.068
[ 18] Accuracy: 0.985  	  Loss: 0.065
[ 19] Accuracy: 0.987  	  Loss: 0.062

Softmax+L2 regressor


## Part 6: Evaluation

### See your model

In [11]:
# Show the current graph
util.show_graph(tf.get_default_graph().as_graph_def())

### Compute the valiation accuracy
You'll see some massive overfitting here, but don't worry we will deal with that in the coming weeks.

In [12]:
image_val, label_val = load('tux_val.dat')

print('Input shape: ' + str(image_data.shape))
print('Labels shape: ' + str(label_data.shape))

print("Scalar regressor", sess.run([sr_accuracy, sr_loss], feed_dict={inputs: image_val, labels: label_val}))

# Train onehot regressor network
print("Onehot regressor", sess.run([ohr_accuracy, ohr_loss], feed_dict={inputs: image_val, labels: label_val}))

# Train classifier
print("Softmax+ll regressor", sess.run([ll_accuracy, ll_loss], feed_dict={inputs: image_val, labels: label_val}))

# Train classifier
print("Softmax+l2 regressor", sess.run([l2_accuracy, l2_loss], feed_dict={inputs: image_val, labels: label_val}))

Input shape: (12257, 64, 64, 3)
Labels shape: (12257,)
Scalar regressor [0.09586674, 8.3760815]
Onehot regressor [0.18456033, 1.4744716]
Softmax+ll regressor [0.93481594, 0.20853794]
Softmax+l2 regressor [0.16411042, 0.19269691]


## Part 7: Save Model
Like homework 1 you are turning in your TensorFlow graph.  This time, however, you are saving the trained weights along with the structure.

In [13]:
util.save('assignment3.tfg', session=sess)