"In the previous part of this exercise, you implemented multi-class logistic regression to recognize handwritten digits. However, logistic regression cannot form more complex hypotheses as it is only a linear classifier.
In this part of the exercise, you will implement a neural network to recognize handwritten digits using the same training set as before. The neural network will be able to represent complex models that form non-linear hypotheses. For this week, you will be using parameters from a neural network that we have already trained. Your goal is to implement the feedforward propagation algorithm to use our weights for prediction."

"Neural Network has 3 layers – an input layer, a hidden layer and an output layer. ...You have been provided with a set of network parameters ($\theta^1$, $\theta^2$) already trained by us. These are stored in ex3weights.mat"

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [10]:
import scipy.io as sio
weights = sio.loadmat('../data/ex3weights.mat')
theta1, theta2 = weights['Theta1'], weights['Theta2']
theta = [theta1, theta2]

Let's check the dimensions of theta1 and theta2:

In [11]:
theta1.shape, theta2.shape

((25, 401), (10, 26))

Also import the data itself as before:

In [56]:
data = sio.loadmat('../data/ex3data1.mat')
X = np.float64(data["X"])
y = np.float64(data["y"])
X.shape, y.shape

((5000, 400), (5000, 1))

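So we have three layers. The first is the input layer with 400 units, one per pixel of the image. The hidden layer has 25 units, and the output layer has 10 nodes, one per digit class. Each weight matrix has one extra column (401 and 26 instead of 400 and 25) for the bias unit. Prediction is then a simple matter of computing the output activations, which we treat as probabilities.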
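
Written out for a single example $x$, the computation looks roughly as follows. The symbols $a^{(l)}$ for the activations of layer $l$ and $g$ for the sigmoid function, $g(z) = 1/(1+e^{-z})$, are notation assumed here for illustration, along with the usual convention that a bias unit of 1 is prepended to each layer's input:

$$
\begin{aligned}
a^{(1)} &= x \in \mathbb{R}^{400} \\
a^{(2)} &= g\left(\theta^1 \begin{bmatrix} 1 \\ a^{(1)} \end{bmatrix}\right) \in \mathbb{R}^{25} \\
a^{(3)} &= g\left(\theta^2 \begin{bmatrix} 1 \\ a^{(2)} \end{bmatrix}\right) \in \mathbb{R}^{10}
\end{aligned}
$$

Here $\theta^1$ is $25 \times 401$ and $\theta^2$ is $10 \times 26$, matching the shapes checked earlier. The first column of each matrix multiplies the bias unit.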

In [58]:
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, X):
    """
    Hypothesis function, where
    X is an n x k_prev array of explanatory variables
    theta is a k_prev x k_new array of weights
    The result is an n x k_new array of activations
    """
    return sigmoid(np.dot(X, theta))

Define the forward propagation mechanism. It takes in theta, which has a dimensionality of $k_{new} \times (k_{prev} + 1)$; the extra column holds the bias weights.

In [68]:
def forward_propagation(X, theta):
    """
    A function that takes the input matrix X and moves it one layer
    forward in the neural network.
    X is an n x k_prev matrix: n entries, k_prev properties
    theta is a k_new x (k_prev + 1) matrix; the extra column holds the
    bias weights. The function transposes it before passing it to the
    hypothesis function.
    """
    # prepend a column of ones (the bias unit) to the data matrix
    X = np.insert(X, 0, 1, axis=1)
    # transpose theta to feed it into the hypothesis function
    theta = theta.T
    return h(theta, X)
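
To see how the shapes line up, here is a minimal sketch of the same step on a toy network. The sizes (3 inputs, 5 hidden units, 2 outputs), the random weights, and the names `forward_one_layer`, `A0`, `Theta_a` and `Theta_b` are made up for illustration; they are not part of the exercise data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_one_layer(A, Theta):
    """Move activations A (n x k_prev) one layer forward.

    Theta is k_new x (k_prev + 1); its first column multiplies the bias unit.
    """
    A = np.insert(A, 0, 1, axis=1)   # prepend the bias column of ones
    return sigmoid(A @ Theta.T)      # result is n x k_new

# Hypothetical toy data: 4 samples with 3 features each
rng = np.random.default_rng(0)
A0 = rng.normal(size=(4, 3))
Theta_a = rng.normal(size=(5, 4))    # 5 hidden units, 3 inputs + 1 bias
Theta_b = rng.normal(size=(2, 6))    # 2 outputs, 5 hidden units + 1 bias

A1 = forward_one_layer(A0, Theta_a)
A2 = forward_one_layer(A1, Theta_b)
print(A1.shape, A2.shape)            # (4, 5) (4, 2)
```

Each layer gains one column from the bias unit before the multiplication, which is why every weight matrix has one more column than the previous layer has units.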

Compute the hidden and output layer activations and store them in a list called Xs:

In [70]:
Xs = [X]
for i in range(2):
    iterated = forward_propagation(Xs[i], theta[i])
    Xs.append(iterated)

The last element of Xs holds the output probabilities. Let's locate the class with the maximum probability for each row:

In [71]:
Prob_argmax = np.argmax(Xs[-1], axis=1)

Because of the way the data is arranged, the column indices of the final element of Xs do not match the labels in y. Label 1 in the dataset corresponds to index 0, label 2 to index 1, label 3 to index 2, and so on. The digit 0 in the images, which is labeled 10 in the dataset, corresponds to index 9. The good news is that adding one to each index solves the problem.

In [72]:
Prob_argmax += 1
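
The mapping can be checked on a small hand-made example. The array `probs` below is invented for illustration, and the final `% 10` step, which turns the dataset label 10 back into the digit 0, is an optional extra that the accuracy calculation does not need.

```python
import numpy as np

# Hypothetical output activations for 3 samples and 10 classes
probs = np.zeros((3, 10))
probs[0, 0] = 0.9   # highest in column 0 -> dataset label 1
probs[1, 4] = 0.7   # highest in column 4 -> dataset label 5
probs[2, 9] = 0.8   # highest in column 9 -> dataset label 10 (the digit 0)

labels = np.argmax(probs, axis=1) + 1   # shift 0-based indices to labels 1..10
digits = labels % 10                    # optional: map label 10 to the digit 0

print(labels)   # [ 1  5 10]
print(digits)   # [1 5 0]
```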

Finally, calculate the accuracy rate:

In [73]:
total_corrects = np.sum( (y.flatten() == np.float64(Prob_argmax)) )
total_dpoints = X.shape[0]

accuracy_rate = total_corrects/total_dpoints
accuracy_rate

0.9752

We have a 97.5% accuracy rate, which is a slight improvement over our logistic regression. In Andrew Ng's calculations, logistic regression gave only 94.5% accuracy; the gap between his figure and mine is due to the different optimization libraries we used. On the other hand, a real test should be done on a separate test set. Since the accuracy here is measured on the training data itself, we are probably overfitting.
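
One way to set up such a held-out evaluation is sketched below. It is only an illustration: the 80/20 ratio, the seed, and the names `train_idx` and `test_idx` are assumptions. The split is also only meaningful if the network is retrained on the training part, since the provided weights were presumably fit on the full data set.

```python
import numpy as np

n_samples = 5000                      # number of rows in X
rng = np.random.default_rng(42)       # fixed seed so the split is reproducible
perm = rng.permutation(n_samples)     # shuffled row indices

n_test = n_samples // 5               # hold out 20% of the rows (assumed ratio)
test_idx, train_idx = perm[:n_test], perm[n_test:]

# With the real arrays, one would then train on X[train_idx], y[train_idx]
# and report accuracy on X[test_idx], y[test_idx] only.
print(len(train_idx), len(test_idx))  # 4000 1000
```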