# Homework 3, Part 2: Neural Networks

In this homework, we're given pre-trained weights for a feed-forward (forward propagation) neural network. The network consists of an input layer, a single hidden layer, and an output layer. The input layer has 400 units, one per pixel of a 20 x 20 training image. The hidden layer has 25 units. The output layer has 10 units, one per digit class.

Since this is a multi-class classification problem, we view the output layer as a vector with one entry per class. Ideally it contains a single 1, and the position of that 1 (in practice, the position of the largest entry) is our prediction.
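
To make the "single 1 in a vector" idea concrete, here is a minimal sketch of encoding a label as such a vector and decoding it again. The helper names `one_hot` and `decode` are made up for illustration. The sketch assumes the 1-based labels used in this dataset, where label 10 stands for the digit 0.

```python
import numpy as np

def one_hot(label, num_classes=10):
    """Return a vector of zeros with a single 1 at position label - 1."""
    vec = np.zeros(num_classes, dtype=int)
    vec[label - 1] = 1
    return vec

def decode(vec):
    """Recover the 1-based label as the position of the largest entry."""
    return int(np.argmax(vec)) + 1

encoded_seven = one_hot(7)
encoded_zero = one_hot(10)  # assumption: label 10 stands for the digit 0
print(encoded_seven, decode(encoded_seven))
```

Decoding with `argmax` also works when the network's outputs are not exactly 0 or 1, which is why the prediction cells further down use it.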

In [3]:
from scipy.io import loadmat
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [11]:
raw_data = loadmat("ex3weights.mat")

Theta1 = raw_data["Theta1"]
print(Theta1.shape)
Theta2 = raw_data["Theta2"]

(25, 401)


In [5]:
# Visualizing Theta1 and Theta2
print("ϴ1 -- Pre-trained weights mapping the 400 pixels of a 20 x 20 image (plus a bias unit) to 25 hidden units\nshape: {}\ncontents:\n{}\n\n".format(Theta1.shape, Theta1))
print("ϴ2 -- Pre-trained weights mapping the 25 hidden units (plus a bias unit) to 10 output classes\nshape: {}\ncontents:\n{}".format(Theta2.shape, Theta2))

ϴ1 -- Pre-trained weights mapping the 400 pixels of a 20 x 20 image (plus a bias unit) to 25 hidden units
shape: (25, 401)
contents:
[[ -2.25623899e-02  -1.05624163e-08   2.19414684e-09 ...,  -1.30529929e-05
   -5.04175101e-06   2.80464449e-09]
 [ -9.83811294e-02   7.66168682e-09  -9.75873689e-09 ...,  -5.60134007e-05
    2.00940969e-07   3.54422854e-09]
 [  1.16156052e-01  -8.77654466e-09   8.16037764e-09 ...,  -1.20951657e-04
   -2.33669661e-06  -7.50668099e-09]
 ..., 
 [ -1.83220638e-01  -8.89272060e-09  -9.81968100e-09 ...,   2.35311186e-05
   -3.25484493e-06   9.02499060e-09]
 [ -7.02096331e-01   3.05178374e-10   2.56061008e-09 ...,  -8.61759744e-04
    9.43449909e-05   3.83761998e-09]
 [ -3.50933229e-01   8.85876862e-09  -6.57515140e-10 ...,  -1.80365926e-06
   -8.14464807e-06   8.79454531e-09]]


ϴ2 -- Pre-trained weights mapping the 25 hidden units (plus a bias unit) to 10 output classes
shape: (10, 26)
contents:
[[-0.76100352 -1.21244498 -0.10187131 -2.36850085 -1.05778129 -2.2082362

In [8]:
# Extracting the training data (we are actually predicting on it, since we were given weights)
training_data = loadmat("ex3data1.mat")

X = training_data["X"]
y = training_data["y"]

print(X.shape)
print(y.shape)

(5000, 400)
(5000, 1)


In [56]:
y[3812]

array([7], dtype=uint8)

## Predicting a Single Example

Instead of vectorizing and predicting all of our examples in one fell swoop, let's first look at one specific training example from the 5,000-example dataset.

In [57]:
data_point = np.append(np.array([1.0]), X[3812])
data_point = np.expand_dims(data_point, axis=0).T
print(data_point.shape)

(401, 1)


In [63]:
# Pass the input features through the hidden layer -- (25 x 401) (401 x 1)
print("Dotting the following: Theta {} by X {}".format(Theta1.shape, data_point.shape), end="\n\n")

hidden_layer_vals = np.dot(Theta1, data_point)

# Add the bias unit
hidden_layer_vals = np.append(np.array([[1.0]]), hidden_layer_vals, axis=0)
print("a(2) shape: {}".format(hidden_layer_vals.shape))
print(hidden_layer_vals)

Dotting the following: Theta (25, 401) by X (401, 1)

a(2) shape: (26, 1)
[[ 1.        ]
 [ 2.31599094]
 [-2.19745992]
 [-4.05277214]
 [-7.95914969]
 [-1.89579245]
 [-3.13976114]
 [-1.15582907]
 [-7.84729598]
 [ 7.10932691]
 [ 3.85164199]
 [ 2.91201551]
 [ 5.13771229]
 [-9.68106941]
 [ 1.56192124]
 [ 4.00918859]
 [ 2.07040675]
 [ 3.04964752]
 [-1.46478672]
 [-8.94944647]
 [ 1.65808403]
 [ 2.49795108]
 [ 0.68739891]
 [ 2.59157764]
 [ 5.61723477]
 [ 1.06699791]]


In [59]:
# Here, the row with the highest value is our prediction.
# Note that index 0 --> NUMBER 1, index 1 --> NUMBER 2, and so forth, up to index 9 --> NUMBER 0
print("Theta2 shape: {}".format(Theta2.shape))
print("a(2) shape: {}".format(hidden_layer_vals.shape))

print(np.dot(Theta2, hidden_layer_vals))

prediction_index = np.argmax(np.dot(Theta2, hidden_layer_vals))


if prediction_index == 9:
    print("Prediction, the number is: 0")
else:
    print("Prediction, the number is: {}".format(prediction_index + 1))

Theta2 shape: (10, 26)
a(2) shape: (26, 1)
[[-33.02020298]
 [-21.32414679]
 [  6.58689058]
 [-28.02536415]
 [-44.6335533 ]
 [-73.31043753]
 [ 78.82747973]
 [ 57.07325691]
 [ 38.72332937]
 [ -0.98641174]]
Prediction, the number is: 7
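
The index-to-number rule in the comment above can be written without the `if`/`else`. This is a small sketch with two hypothetical helpers (`index_to_digit` and `index_to_label` are not part of the homework code). One returns the digit a human would read, the other returns the 1-to-10 label that `y` stores.

```python
def index_to_digit(index):
    """Map a 0-based output index to the digit shown in the image (index 9 -> 0)."""
    return (index + 1) % 10

def index_to_label(index):
    """Map a 0-based output index to the dataset's 1..10 label (index 9 -> 10)."""
    return index + 1

for idx in (0, 6, 9):
    print(idx, index_to_digit(idx), index_to_label(idx))
```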


## Vectorized Implementation

In [146]:
X_mod = np.append(np.ones((5000, 1)), X, axis=1).T
print(X_mod.shape)

(401, 5000)
[[ 1.  1.  1. ...,  1.  1.  1.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 ..., 
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]]


In [145]:
print(Theta1.shape)
hidden_layer_vec = np.dot(Theta1, X_mod)
print(hidden_layer_vec)
print(hidden_layer_vec.shape)

(25, 401)
[[ -2.93684669  -4.81302157  -4.24056958 ...,  -0.86267303   1.74408423
    3.55683614]
 [ -2.45058587  -2.92257775  -3.68698052 ...,   1.00939507  -0.58216518
  -12.11330792]
 [  4.95510333   2.6445065    5.99656398 ...,  -1.67526051  -1.49164167
    5.01096205]
 ..., 
 [  3.56635593   2.10497303   1.54599347 ...,   1.8185898    4.17481481
    7.17585008]
 [  2.81388641   4.69948787   3.08971226 ...,  -3.18203449  -0.96739536
    2.15484114]
 [ -2.1195223   -2.76096862  -2.32990819 ...,  -1.72539781  -3.08906563
   -2.9424052 ]]
(25, 5000)
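
As a sanity check on the vectorization, each column of the matrix product should equal the product for that single example computed on its own. The sketch below shows this on small made-up arrays (the `demo_*` names and sizes are assumptions for illustration, not the homework data).

```python
import numpy as np

rng = np.random.default_rng(0)
demo_Theta = rng.normal(size=(3, 5))   # 3 hidden units, 4 features + 1 bias
demo_X = rng.normal(size=(6, 4))       # 6 examples, 4 features each

# Prepend a column of ones (bias), then transpose so examples are columns
X_bias = np.append(np.ones((6, 1)), demo_X, axis=1).T   # shape (5, 6)
all_at_once = np.dot(demo_Theta, X_bias)                 # shape (3, 6)

# The same computation for example j alone, kept as a (5, 1) column
j = 2
single = np.dot(demo_Theta, X_bias[:, [j]])              # shape (3, 1)
columns_match = np.allclose(all_at_once[:, [j]], single)
print(all_at_once.shape, columns_match)
```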


In [151]:
# Add a bias unit to every single example we want to predict (vectorized)
hidden_layer_vec_mod = np.append(np.ones((1, 5000)), hidden_layer_vec, axis=0)

print("Dotting Theta2 {} by a2 {}".format(Theta2.shape, hidden_layer_vec_mod.shape))

# Multiply Theta2 by the bias-augmented hidden-layer values (the first column of Theta2 holds the bias weights)
predicted_indices = np.dot(Theta2, hidden_layer_vec_mod).T

# Row index i corresponds to label i + 1, so index 9 becomes label 10 (the label y uses for the digit 0)
final_predictions = np.argmax(predicted_indices, axis=1) + 1

[ 1.         -0.03136708 -3.22581328  5.29932574  4.22742674  3.58295805
  3.2346937  -2.82338367  2.29061189 -2.19212986 -4.02366895  1.96137975
  2.75790319  2.63538245 -3.9115225  -4.76091198  1.09641258 -4.67803377
 -3.21945004  0.71602053 -3.98601791 -2.95028002  2.36358227  1.60855604
 -3.54186751 -0.48885916]
Dotting Theta2 (10, 26) by a2 (26, 5000)


In [116]:
pri

array([1], dtype=uint8)

In [127]:
np.mean(np.expand_dims(np.array(final_predictions), axis=0) == y.T)

0.69620000000000004

In [124]:
y.shape

(5000, 1)

In [162]:
truth_count = 0
false_count = 0

for i in range(len(final_predictions)):
    if final_predictions[i] == y[i][0]:
        truth_count += 1
    else:
        false_count += 1

print(truth_count / (truth_count + false_count))

0.6962
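
Both accuracy computations above take care to line up the shapes of the predictions and `y`. Here is a small sketch of why that matters, using made-up arrays (`demo_predictions` and `demo_y` are illustrative, not the real data). Comparing a flat array against a column vector broadcasts to a full grid instead of comparing element by element.

```python
import numpy as np

demo_predictions = np.array([1, 2, 10, 4])                # shape (4,)
demo_y = np.array([[1], [2], [3], [4]], dtype=np.uint8)   # shape (4, 1), like y

# Flatten y so the comparison is element by element
accuracy = np.mean(demo_predictions == demo_y.ravel())

# Without flattening, broadcasting compares every prediction with every label
wrong_shape = (demo_predictions == demo_y).shape

print(accuracy, wrong_shape)
```

Taking the mean of the broadcast grid would still return a number, just not the accuracy, so printing the shape of the comparison is a cheap check.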


## Note: 69.62% is lower than the expected accuracy of 97.5%.
Later, I will investigate what could be causing this.
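
One possible cause worth checking (a hypothesis, not something verified against this dataset here): the cells above pass the raw products of the weights and the inputs from layer to layer without applying an activation function. Feed-forward networks like this one are commonly trained with a sigmoid activation on the hidden and output layers. If these weights were trained that way, skipping the sigmoid in the hidden layer could change which output comes out largest. Below is a minimal sketch on random made-up weights. The names `sigmoid` and `predict_with_sigmoid` and the `demo_*` arrays are introduced here for illustration only.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_with_sigmoid(Theta1, Theta2, X):
    """Forward pass with a sigmoid after each layer; returns 1-based labels."""
    m = X.shape[0]
    a1 = np.hstack([np.ones((m, 1)), X])     # add bias column: (m, n + 1)
    a2 = sigmoid(a1 @ Theta1.T)              # hidden activations: (m, hidden)
    a2 = np.hstack([np.ones((m, 1)), a2])    # add bias column: (m, hidden + 1)
    a3 = sigmoid(a2 @ Theta2.T)              # output activations: (m, classes)
    return np.argmax(a3, axis=1) + 1         # index i -> label i + 1

# Tiny random stand-ins for the real weights and data (assumed shapes)
rng = np.random.default_rng(0)
demo_Theta1 = rng.normal(size=(4, 6))   # 4 hidden units, 5 features + bias
demo_Theta2 = rng.normal(size=(3, 5))   # 3 classes, 4 hidden units + bias
demo_X = rng.normal(size=(8, 5))        # 8 examples, 5 features
demo_labels = predict_with_sigmoid(demo_Theta1, demo_Theta2, demo_X)
print(demo_labels)
```

If this is the cause, calling the same function with the real `Theta1`, `Theta2`, and `X` and comparing against `y.ravel()` should show whether the accuracy moves toward the expected value.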

In [159]:
y[2][0]

10

In [175]:
predicted_indices[321]

array([ -89.31046982,  -36.26828031,    3.72578413,  -41.22298479,
         15.80455727,    7.55327607,  -59.98363457,   65.5423274 ,
          4.19458733,  109.90894172])

In [171]:
y[3823]

array([7], dtype=uint8)

In [178]:
z = predicted_indices[321]
z

array([ -89.31046982,  -36.26828031,    3.72578413,  -41.22298479,
         15.80455727,    7.55327607,  -59.98363457,   65.5423274 ,
          4.19458733,  109.90894172])