# Building an Architectural Classifier - Notebook 2 - DNNs

The goal in this notebook is to take our pre-processed dataset of interior architectural imagery (kitchens, bathrooms, bedrooms, living rooms, etc.) and build a machine learning model that can accurately recognize when it is looking at an image of a kitchen. This is the second notebook in a series, so I'll omit some of the explanatory notes on the boilerplate covered before.
#### Model:
Having tried logistic regression and not being satisfied with the results, we'll now look at using deep neural nets (DNNs).

In [1]:
import math
import time
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

%matplotlib inline

# Load in the data

In [2]:
x = np.load('./all_X_shuffled_85_x4686.npy')
y = np.load('./all_Y_shuffled_2_x4686.npy')

print('x: %s | y: %s' % (x.shape, y.shape))

x: (4686, 85, 85, 3) | y: (4686, 2)


### Split the data up into train / test / validation (80% / 10% / 10%)

In [3]:
def split(x, y, test=0.1, train=0.8, validation=0.1):
    # The data is already shuffled, so we can split by position.
    assert len(x) == len(y)
    # The test set is whatever remains after the train and validation slices,
    # so the three fractions are expected to sum to 1.
    assert abs(test + train + validation - 1.0) < 1e-9
    train_size = int(len(x) * train)
    valid_size = int(len(x) * validation)

    x_train = np.array(x[:train_size])
    y_train = np.array(y[:train_size])
    x_val = np.array(x[train_size:train_size + valid_size])
    y_val = np.array(y[train_size:train_size + valid_size])
    x_test = np.array(x[train_size + valid_size:])
    y_test = np.array(y[train_size + valid_size:])

    return (x_train, y_train, x_val, y_val, x_test, y_test)

In [4]:
x_train, y_train, x_val, y_val, x_test, y_test = split(x,y)

print('x_train: ', x_train.shape)
print('y_train: ', y_train.shape)
print('x_val:   ', x_val.shape)
print('y_val:   ', y_val.shape)
print('x_test:  ', x_test.shape)
print('y_test:  ', y_test.shape)

x_train:  (3748, 85, 85, 3)
y_train:  (3748, 2)
x_val:    (468, 85, 85, 3)
y_val:    (468, 2)
x_test:   (470, 85, 85, 3)
y_test:   (470, 2)
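
The test set ends up two images larger than the validation set because `int()` truncates the train and validation sizes, and the final slice takes whatever is left. A quick sketch of that arithmetic, using only the dataset size printed above (the helper name `split_sizes` is my own, not part of the notebook):

```python
def split_sizes(n, train=0.8, validation=0.1):
    """Return (train, validation, test) sizes the way split() slices them."""
    train_size = int(n * train)               # fractional part is truncated
    valid_size = int(n * validation)          # fractional part is truncated
    test_size = n - train_size - valid_size   # the remainder goes to test
    return train_size, valid_size, test_size

sizes = split_sizes(4686)
print(sizes)
```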


### Balancing
The numpy files we just imported were built in a separate notebook, where they were already shuffled randomly and balanced across classes, but let's check just to be sure, and to establish a baseline error:

In [5]:
print('Training balance:   ', np.sum(y_train, axis=0)[1] / len(y_train))
print('Validation balance: ', np.sum(y_val, axis=0)[1] / len(y_val))
print('Testing balance:    ', np.sum(y_test, axis=0)[1] / len(y_test))

Training balance:    0.482924226254
Validation balance:  0.480769230769
Testing balance:     0.436170212766
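
These fractions give us the baseline: a model that always predicts the more common class is right exactly as often as that class appears. A minimal sketch (the function name `majority_baseline` is hypothetical, and the inputs are the balances printed above):

```python
def majority_baseline(positive_fraction):
    """Accuracy of always guessing the more common of two classes."""
    return max(positive_fraction, 1.0 - positive_fraction)

balances = {
    'train': 0.482924226254,
    'validation': 0.480769230769,
    'test': 0.436170212766,
}
for name, fraction in balances.items():
    print('%-10s baseline accuracy: %.4f' % (name, majority_baseline(fraction)))
```

So a model has to beat roughly 52% on the training and validation sets, and about 56% on the test set, before it is doing better than a constant guess.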


# Define a model
## Let's start with a fairly small model: 2 hidden layers with 100 and 20 neurons

First, placeholder variables for our inputs to TF

In [6]:
x_input = tf.placeholder(tf.float32, [None, 85, 85, 3], name='x_input')
y_input = tf.placeholder(tf.float32, [None, 2], name='y_input')
# We'll use keep prob to feed in the dropout hyper-parameter
keep_prob = tf.placeholder(tf.float32, name='keep_prob')

Now variables to hold the weights and biases

In [7]:
# Hidden layer one
W1 = tf.get_variable("W1", [85 * 85 * 3, 100], initializer= tf.contrib.layers.xavier_initializer())
b1 = tf.get_variable("b1", [100], initializer= tf.zeros_initializer())

# Hidden layer two
W2 = tf.get_variable('W2', [100, 20], initializer= tf.contrib.layers.xavier_initializer())
b2 = tf.get_variable('b2', [20], initializer= tf.zeros_initializer())

# Output layer
W3 = tf.get_variable('W3', [20, 2], initializer= tf.contrib.layers.xavier_initializer())
b3 = tf.get_variable('b3', [2], initializer= tf.zeros_initializer())

The model is mathematically very similar to the logistic regression model. This time we apply the same matmul operation across multiple layers in succession, feeding the output of one layer into the input of the next. Notice that our first weight matrix is much larger (85 * 85 * 3 inputs by 100 neurons), reflecting the full connection of every input to every neuron. Finally, note the tf.nn.relu() function: it applies a non-linearity to each neuron's output, which lets the network model more flexible decision boundaries. Each layer computes the same affine step as before, with the final layer producing the logits:
$$ \text{logits} = x \cdot W + b $$
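
To make the layer stacking concrete, here is a rough NumPy sketch of the same forward pass, separate from the TensorFlow graph. The layer sizes follow the variables defined above; the small random weights and the names `init_params`, `forward` and `count_params` are placeholders of mine, not the notebook's Xavier initialization.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def init_params(rng, sizes=(85 * 85 * 3, 100, 20, 2)):
    """Small random weights and zero biases for each consecutive pair of layer sizes."""
    return [(rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x, params):
    """Flatten the images, then apply x . W + b layer by layer (ReLU on hidden layers only)."""
    a = x.reshape(len(x), -1)
    for W, b in params[:-1]:
        a = relu(a @ W + b)
    W, b = params[-1]
    return a @ W + b  # raw logits, no activation on the output layer

def count_params(params):
    return sum(W.size + b.size for W, b in params)

rng = np.random.default_rng(0)
params = init_params(rng)
batch = rng.random((4, 85, 85, 3))  # four fake "images", just to check shapes
logits = forward(batch, params)
print(logits.shape, count_params(params))
```

Almost all of the roughly 2.17 million parameters sit in the first weight matrix, which maps the 21,675 flattened pixel values to 100 neurons.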

In [8]:
# flatten the input
flat_input = tf.reshape(x_input, [-1, 85 * 85 * 3])

# Hidden layer one
activations_one = tf.nn.relu(tf.add(tf.matmul(flat_input, W1), b1), name='activations_one')
dropout_one = tf.nn.dropout(activations_one, keep_prob, name='dropout_one')

# Hidden layer two
activations_two = tf.nn.relu(tf.add(tf.matmul(dropout_one, W2), b2), name='activations_two')
dropout_two = tf.nn.dropout(activations_two, keep_prob, name='dropout_two')

# Output layer
logits = tf.add(tf.matmul(dropout_two, W3), b3)
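
For intuition about what `tf.nn.dropout` does with `keep_prob`: each activation is kept with probability `keep_prob`, and the survivors are scaled up by `1 / keep_prob` so the expected activation is unchanged. A minimal NumPy sketch of that idea (my own illustration, not TensorFlow's implementation):

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Inverted dropout: zero out units at random, rescale the rest by 1 / keep_prob."""
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
acts = np.ones((2, 6))
dropped = dropout(acts, 0.5, rng)    # every entry is either 0.0 or 2.0
untouched = dropout(acts, 1.0, rng)  # keep_prob = 1.0 leaves activations as they are
print(dropped)
```

This is why the training loop further down feeds `keep_prob: 0.5` for the training step and `keep_prob: 1.0` when measuring accuracy.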

We'll still be using softmax cross-entropy for our cost function. Since we're using dropout regularization, we'll drop the L2 penalty. This isn't a rule, and L2 may come back into play later, but it's easier to manage one hyper-parameter at a time.
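
As a reference for what softmax cross-entropy computes per example, here is a hedged NumPy sketch (the function name is mine; subtracting the row maximum is a common trick for numerical stability, not something taken from the notebook):

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between one-hot labels and softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

logits = np.array([[0.0, 0.0],    # undecided between the two classes
                   [4.0, -4.0]])  # confident in class 0
labels = np.array([[0.0, 1.0],
                   [1.0, 0.0]])
losses = softmax_cross_entropy(logits, labels)
print(losses, losses.mean())
```

With two classes, an undecided model (equal logits) scores ln 2, about 0.693, which is a useful number to compare the early training loss against.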

In [9]:
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = logits, labels = y_input))

In [10]:
# The argument to Adam here is the learning rate and it can (and should) be experimented with
training_step = tf.train.AdamOptimizer(1e-5).minimize(cross_entropy)

Last, I'll set up an accuracy metric to validate on and add a summary so we can watch it train on TensorBoard.

In [11]:
predictions = tf.argmax(logits, axis=1)
truths = tf.argmax(y_input, axis=1)
correct_predictions = tf.equal(predictions, truths)
accuracy = tf.reduce_mean(tf.cast(correct_predictions, tf.float32))
tf.summary.scalar('accuracy', accuracy)

<tf.Tensor 'accuracy:0' shape=() dtype=string>
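
The same accuracy calculation on a toy batch, written in plain NumPy so the argmax-and-compare logic is easy to follow (the toy values and the function name are made up for illustration):

```python
import numpy as np

def accuracy(logits, one_hot_labels):
    """Fraction of rows where the largest logit lines up with the one-hot label."""
    predictions = np.argmax(logits, axis=1)
    truths = np.argmax(one_hot_labels, axis=1)
    return np.mean(predictions == truths)

toy_logits = np.array([[2.0, 1.0], [0.0, 3.0], [1.0, 0.0]])
toy_labels = np.array([[1, 0], [0, 1], [0, 1]])
toy_acc = accuracy(toy_logits, toy_labels)  # two of the three rows match
print(toy_acc)
```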

## Finally, let's start a session and begin training

In [12]:
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
merged = tf.summary.merge_all()
train_writer = tf.summary.FileWriter('logs/deep_softmax/train', sess.graph)
valid_writer = tf.summary.FileWriter('logs/deep_softmax/valid', sess.graph)

In [13]:
num_epochs = 5000

start_time = time.time()
for epoch in range(num_epochs):
    if epoch % 10 == 0:
        # every 10th epoch write out accuracy on training and validation
        summary_train, train_acc = sess.run([merged, accuracy], {x_input: x_train, y_input: y_train, keep_prob: 1.0})
        summary_valid, valid_acc = sess.run([merged, accuracy], {x_input: x_val, y_input: y_val, keep_prob: 1.0})
        train_writer.add_summary(summary_train, epoch)
        valid_writer.add_summary(summary_valid, epoch)
        if epoch != 0: 
            time_taken = round((time.time() - start_time) / epoch, 3)
            print('Epoch %s | Time per epoch: %s | Train Accuracy: %s | Validation Accuracy: %s' % (epoch, time_taken, train_acc, valid_acc))
    sess.run([training_step], {x_input: x_train, y_input: y_train, keep_prob: 0.5})

train_writer.close()
valid_writer.close()
print('Time Taken: ', time.time() - start_time)

Epoch 10 | Time per epoch: 0.726 | Train Accuracy: 0.525347 | Validation Accuracy: 0.508547
Epoch 20 | Time per epoch: 0.689 | Train Accuracy: 0.532818 | Validation Accuracy: 0.504274
Epoch 30 | Time per epoch: 0.674 | Train Accuracy: 0.537353 | Validation Accuracy: 0.482906
Epoch 40 | Time per epoch: 0.667 | Train Accuracy: 0.534685 | Validation Accuracy: 0.529915
Epoch 50 | Time per epoch: 0.664 | Train Accuracy: 0.541889 | Validation Accuracy: 0.547009

...Edited for length in this gist, see tensorboard summary below for full training details...
Epoch 4950 | Time per epoch: 0.647 | Train Accuracy: 1.0 | Validation Accuracy: 0.619658
Epoch 4960 | Time per epoch: 0.647 | Train Accuracy: 1.0 | Validation Accuracy: 0.621795
Epoch 4970 | Time per epoch: 0.647 | Train Accuracy: 1.0 | Validation Accuracy: 0.619658
Epoch 4980 | Time per epoch: 0.647 | Train Accuracy: 1.0 | Validation Accuracy: 0.619658
Epoch 4990 | Time per epoch: 0.647 | Train Accuracy: 1.0 | Validation Accuracy: 0.621795


### Looking better! :)
[You can view the TensorBoard summary here.](https://raw.githubusercontent.com/McCulloughRT/machine-learning/master/dnn_training.png) Purple is the validation accuracy and blue is the training accuracy.

We reached 100% training accuracy, so the model is easily complex enough. However, while our validation accuracy did improve significantly, it didn't keep pace with the training accuracy: it plateaued around 62%, which points to a big overfitting issue even with dropout regularization.
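
One simple way to put a number on the overfitting is the gap between training and validation accuracy. The sketch below uses a few rows copied from the log above (the elided middle of the run is not included), and the helper names are hypothetical:

```python
# (epoch, train accuracy, validation accuracy), copied from the printed log.
history = [
    (10, 0.525347, 0.508547),
    (50, 0.541889, 0.547009),
    (4950, 1.0, 0.619658),
    (4990, 1.0, 0.621795),
]

def generalization_gap(train_acc, valid_acc):
    """How far training accuracy has pulled ahead of validation accuracy."""
    return train_acc - valid_acc

def best_validation(rows):
    """Return the (epoch, train, valid) row with the highest validation accuracy."""
    return max(rows, key=lambda row: row[2])

for epoch, train_acc, valid_acc in history:
    print('Epoch %4d | gap: %.3f' % (epoch, generalization_gap(train_acc, valid_acc)))
print('Best validation row:', best_validation(history))
```

The gap grows from under 2 points at epoch 10 to about 38 points by the end of training, which is the overfitting described above.
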
#### Let's check the model's accuracy against the test set

In [15]:
sess.run(accuracy, {x_input: x_test, y_input: y_test, keep_prob:1.0})

0.6510638

In [16]:
sess.close()