## Machine Learning Online Class - Exercise 4: Neural Network Learning

### Instructions
In this exercise, you will implement the backpropagation algorithm for neural
networks and apply it to the task of hand-written digit recognition. Before
starting on the programming exercise, we strongly recommend watching the
video lectures and completing the review questions for the associated topics.

To get started with the exercise, you will need to download the starter
code and unzip its contents to the directory where you wish to complete the
exercise. If needed, use the `cd` command in Octave/MATLAB to change to
this directory before starting this exercise.

You can also find instructions for installing Octave/MATLAB in the “Environment Setup Instructions” of the course website.

```
------------
Files included in this exercise ([*] marks the files you will need to complete)
    ex4.m - Octave/MATLAB script that steps you through the exercise
    ex4data1.mat - Training set of hand-written digits
    ex4weights.mat - Neural network parameters for exercise 4
    submit.m - Submission script that sends your solutions to our servers
    displayData.m - Function to help visualize the dataset
    fmincg.m - Function minimization routine (similar to fminunc)
    sigmoid.m - Sigmoid function
    computeNumericalGradient.m - Numerically compute gradients
    checkNNGradients.m - Function to help check your gradients
    debugInitializeWeights.m - Function for initializing weights
    predict.m - Neural network prediction function
    [*] sigmoidGradient.m - Compute the gradient of the sigmoid function
    [*] randInitializeWeights.m - Randomly initialize weights
    [*] nnCostFunction.m - Neural network cost function
```

Set up the parameters you will use for this exercise:
```
input_layer_size  = 400;  % 20x20 Input Images of Digits
hidden_layer_size = 25;   % 25 hidden units
num_labels = 10;          % 10 labels, from 1 to 10   
                          % (note that we have mapped "0" to label 10)
```                          

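```python
input_layer_size  = 400  # 20x20 input images of digits
hidden_layer_size = 25   # 25 hidden units
num_labels = 10          # 10 labels, from 1 to 10 ("0" is mapped to label 10)
```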

## ======== Part 1: Loading and Visualizing Data ==========
```
%  We start the exercise by first loading and visualizing the dataset. 
%  You will be working with a dataset that contains handwritten digits.
%

% Load Training Data
fprintf('Loading and Visualizing Data ...\n')

load('ex4data1.mat');
m = size(X, 1);

% Randomly select 100 data points to display
sel = randperm(size(X, 1));
sel = sel(1:100);

displayData(X(sel, :));

fprintf('Program paused. Press enter to continue.\n');
pause;
```

```python
import numpy as np
import scipy.io

data = scipy.io.loadmat('ex4data1.mat')
```

```python
X = data['X']
y = data['y']
```

```python
X.shape
```

```
(5000, 400)
```

```python
y.shape
```

```
(5000, 1)
```
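The Python cells skip the `displayData` step. A minimal sketch of what that step does, assuming each row of `X` is a 20x20 image unrolled column by column (MATLAB's column-major order), so it is reshaped with `order='F'`. `tile_digits` is a hypothetical helper, not one of the exercise files; it only builds the image grid as an array, which you could then pass to a plotting call such as `matplotlib.pyplot.imshow`.

```python
import numpy as np

def tile_digits(images, rows, cols, side=20, pad=1):
    """Arrange rows * cols unrolled square images into one 2-D grid.

    Assumption: each row of `images` is a side x side image stored in
    column-major (MATLAB) order, hence reshape(..., order='F').
    """
    grid = np.zeros((rows * (side + pad) + pad, cols * (side + pad) + pad))
    for idx, img in enumerate(images[: rows * cols]):
        r, c = divmod(idx, cols)          # grid cell for this image
        top = pad + r * (side + pad)
        left = pad + c * (side + pad)
        grid[top:top + side, left:left + side] = img.reshape(side, side, order='F')
    return grid

# Usage with the loaded data, mirroring randperm + displayData:
# sel = np.random.default_rng().permutation(X.shape[0])[:100]
# grid = tile_digits(X[sel], 10, 10)   # 211 x 211 array
```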

## ============= Part 2: Loading Parameters =============
```
% In this part of the exercise, we load some pre-initialized 
% neural network parameters.

fprintf('\nLoading Saved Neural Network Parameters ...\n')

% Load the weights into variables Theta1 and Theta2
load('ex4weights.mat');

% Unroll parameters 
nn_params = [Theta1(:) ; Theta2(:)];
```

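```python
weights = scipy.io.loadmat('ex4weights.mat')
```

```python
Theta1 = weights['Theta1']
Theta2 = weights['Theta2']
```

```python
Theta1.shape
```

```
(25, 401)
```

```python
Theta2.shape
```

```
(10, 26)
```

```python
# Unroll parameters. ravel() flattens row by row (C order), unlike MATLAB's
# column-major Theta1(:); nnCostFunction and the reshape in the training
# step must therefore use the same C order.
nn_params = np.concatenate((Theta1.ravel(), Theta2.ravel()))
```

```python
nn_params.shape
```

```
(10285,)
```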
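The length 10285 is just the two weight matrices laid end to end: 25 × 401 + 10 × 26 = 10025 + 260 = 10285. A small sketch of the unroll/reroll pair (the names `unroll` and `reroll` are hypothetical), showing that flattening and reshaping in NumPy's default C (row-major) order round-trips exactly. A vector unrolled in Octave with `Theta1(:)` would instead need `order='F'` on both sides.

```python
import numpy as np

def unroll(*mats):
    """Flatten each matrix row by row (C order) and concatenate them."""
    return np.concatenate([m.ravel() for m in mats])

def reroll(params, input_size, hidden_size, num_labels):
    """Split an unrolled vector back into Theta1 and Theta2 (C order)."""
    n1 = hidden_size * (input_size + 1)
    theta1 = params[:n1].reshape(hidden_size, input_size + 1)
    theta2 = params[n1:].reshape(num_labels, hidden_size + 1)
    return theta1, theta2

# Round trip on random matrices with the exercise's shapes
rng = np.random.default_rng(0)
t1 = rng.standard_normal((25, 401))
t2 = rng.standard_normal((10, 26))
p = unroll(t1, t2)
r1, r2 = reroll(p, 400, 25, 10)
print(p.shape)  # (10285,)
```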

## ============ Part 3: Compute Cost (Feedforward) ============
```
%  For the neural network, you should first start by implementing the
%  feedforward part of the neural network that returns the cost only. You
%  should complete the code in nnCostFunction.m to return cost. After
%  implementing the feedforward to compute the cost, you can verify that
%  your implementation is correct by verifying that you get the same cost
%  as us for the fixed debugging parameters.
%
%  We suggest implementing the feedforward cost *without* regularization
%  first so that it will be easier for you to debug. Later, in part 4, you
%  will get to implement the regularized cost.
%
fprintf('\nFeedforward Using Neural Network ...\n')

% Weight regularization parameter (we set this to 0 here).
lambda = 0;

J = nnCostFunction(nn_params, input_layer_size, hidden_layer_size, ...
                   num_labels, X, y, lambda);

fprintf(['Cost at parameters (loaded from ex4weights): %f '...
         '\n(this value should be about 0.287629)\n'], J);

fprintf('\nProgram paused. Press enter to continue.\n');
pause;
```
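For reference, the unregularized cost that this part asks for (as defined in the original exercise) is the cross-entropy summed over the $K = 10$ output units and averaged over the $m$ training examples, with each label $y^{(i)}$ recoded as a one-hot vector and $h_\Theta(x^{(i)})_k$ the activation of the $k$-th output unit:

$$
J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[-y_k^{(i)}\log\left(h_\Theta(x^{(i)})_k\right) - \left(1-y_k^{(i)}\right)\log\left(1-h_\Theta(x^{(i)})_k\right)\right]
$$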

```python
from nnCostFunction import nnCostFunction
```

```python
# Weight regularization parameter (we set this to 0 here)
lbd = 0

J, grad = nnCostFunction(nn_params, input_layer_size, hidden_layer_size,
                         num_labels, X, y, lbd)
print('Cost at parameters (loaded from ex4weights): {:.6f}\n'
      '(this value should be about 0.287629)'.format(J))
```

```
Cost at parameters (loaded from ex4weights): 0.287629
(this value should be about 0.287629)
```
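The `nnCostFunction` module itself is not shown in this notebook. A minimal vectorized sketch of its feedforward cost (no regularization, no gradient), assuming the C-order unrolling used above and labels 1 to K; `feedforward_cost` is a hypothetical name, not the module's actual function.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward_cost(nn_params, input_size, hidden_size, num_labels, X, y):
    """Unregularized cross-entropy cost of a 3-layer network (sketch)."""
    m = X.shape[0]
    n1 = hidden_size * (input_size + 1)
    theta1 = nn_params[:n1].reshape(hidden_size, input_size + 1)
    theta2 = nn_params[n1:].reshape(num_labels, hidden_size + 1)

    # Forward pass, prepending a bias column of ones at each layer
    a1 = np.hstack([np.ones((m, 1)), X])
    a2 = np.hstack([np.ones((m, 1)), sigmoid(a1 @ theta1.T)])
    h = sigmoid(a2 @ theta2.T)                     # m x num_labels

    # One-hot encode labels 1..num_labels (label k -> column k-1)
    Y = np.eye(num_labels)[y.ravel().astype(int) - 1]

    return np.sum(-Y * np.log(h) - (1 - Y) * np.log(1 - h)) / m
```

With the loaded weights, `feedforward_cost(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y)` should come out at about 0.287629 if these assumptions hold. With all-zero weights every output is 0.5, so the cost is exactly K·log 2, a handy sanity check.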


## ============ Part 4: Implement Regularization ============
```
%  Once your cost function implementation is correct, you should now
%  continue to implement the regularization with the cost.
%

fprintf('\nChecking Cost Function (w/ Regularization) ... \n')

% Weight regularization parameter (we set this to 1 here).
lambda = 1;

J = nnCostFunction(nn_params, input_layer_size, hidden_layer_size, ...
                   num_labels, X, y, lambda);

fprintf(['Cost at parameters (loaded from ex4weights): %f '...
         '\n(this value should be about 0.383770)\n'], J);

fprintf('Program paused. Press enter to continue.\n');
pause;
```

```python
lbd = 1
J, grad = nnCostFunction(nn_params, input_layer_size, hidden_layer_size,
                         num_labels, X, y, lbd)

print('Cost at parameters (loaded from ex4weights): {:.6f}\n'
      '(this value should be about 0.383770)'.format(J))
```

```
Cost at parameters (loaded from ex4weights): 0.383770
(this value should be about 0.383770)
```
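Part 4 adds a weight-decay penalty $\frac{\lambda}{2m}\sum_{l}\sum_{j}\sum_{k \ge 1}\left(\Theta^{(l)}_{j,k}\right)^2$ to the cost, where column $k = 0$ of each matrix holds the bias weights and is skipped. A minimal sketch of just that term, with the hypothetical name `reg_cost_term`; with λ = 1 and m = 5000 it should account for the difference between the two printed costs (0.383770 - 0.287629).

```python
import numpy as np

def reg_cost_term(theta1, theta2, lam, m):
    """lambda / (2m) times the sum of squared non-bias weights (sketch).

    Column 0 of each Theta multiplies the bias unit and is not regularized.
    """
    penalty = np.sum(theta1[:, 1:] ** 2) + np.sum(theta2[:, 1:] ** 2)
    return lam / (2 * m) * penalty
```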


## ============ Part 5: Sigmoid Gradient  =============
```
%  Before you start implementing the neural network, you will first
%  implement the gradient for the sigmoid function. You should complete the
%  code in the sigmoidGradient.m file.
%

fprintf('\nEvaluating sigmoid gradient...\n')

g = sigmoidGradient([-1 -0.5 0 0.5 1]);
fprintf('Sigmoid gradient evaluated at [-1 -0.5 0 0.5 1]:\n  ');
fprintf('%f ', g);
fprintf('\n\n');

fprintf('Program paused. Press enter to continue.\n');
pause;
```

```python
from sigmoidGradient import sigmoidGradient
```

```python
g = sigmoidGradient(np.array([-1, -0.5, 0, 0.5, 1]))
print('Sigmoid gradient evaluated at [-1 -0.5 0 0.5 1]:\n  ')
print(g)
```

```
Sigmoid gradient evaluated at [-1 -0.5 0 0.5 1]:
  
[0.19661193 0.23500371 0.25       0.23500371 0.19661193]
```
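The printed values follow from $g'(z) = g(z)\,(1 - g(z))$ with $g(z) = 1/(1+e^{-z})$: at $z = 0$, $g(0) = 0.5$, so $g'(0) = 0.25$, the maximum, and the curve is symmetric about zero. A minimal sketch of what `sigmoidGradient` presumably computes (its source is not shown; `sigmoid_gradient` is a hypothetical name):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_gradient(z):
    """Elementwise derivative of the sigmoid: g(z) * (1 - g(z))."""
    g = sigmoid(np.asarray(z, dtype=float))
    return g * (1.0 - g)

print(sigmoid_gradient([-1, -0.5, 0, 0.5, 1]))  # same values as printed above
```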


## ============= Part 6: Initializing Parameters =============
```
%  In this part of the exercise, you will be starting to implement a two
%  layer neural network that classifies digits. You will start by
%  implementing a function to initialize the weights of the neural network
%  (randInitializeWeights.m)

fprintf('\nInitializing Neural Network Parameters ...\n')

initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);

% Unroll parameters
initial_nn_params = [initial_Theta1(:) ; initial_Theta2(:)];
```

```python
from randInitializeWeights import randInitializeWeights
```

```python
initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size)
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels)

# Unroll parameters
initial_nn_params = np.concatenate((initial_Theta1.ravel(), initial_Theta2.ravel()))
```

```python
initial_nn_params.shape
```

```
(10285,)
```
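Random initialization breaks the symmetry between hidden units: if all weights started equal, every hidden unit would compute the same function and receive the same update. The original Octave template draws weights uniformly from $[-\epsilon_{init}, \epsilon_{init}]$ with $\epsilon_{init} = 0.12$, close to $\sqrt{6}/\sqrt{L_{in}+L_{out}}$ for the 400-to-25 layer. A minimal sketch, with the hypothetical name `rand_initialize_weights`:

```python
import numpy as np

def rand_initialize_weights(l_in, l_out, epsilon_init=0.12, rng=None):
    """Return an l_out x (1 + l_in) matrix drawn uniformly from
    [-epsilon_init, epsilon_init]; column 0 holds the bias weights."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.uniform(-epsilon_init, epsilon_init, size=(l_out, 1 + l_in))

theta1 = rand_initialize_weights(400, 25, rng=np.random.default_rng(0))
print(theta1.shape)                    # (25, 401)
print(np.sqrt(6) / np.sqrt(400 + 25))  # about 0.1188
```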

## ============ Part 7: Implement Backpropagation ============
```
%  Once your cost matches up with ours, you should proceed to implement the
%  backpropagation algorithm for the neural network. You should add to the
%  code you've written in nnCostFunction.m to return the partial
%  derivatives of the parameters.
%
fprintf('\nChecking Backpropagation... \n');

%  Check gradients by running checkNNGradients
checkNNGradients;

fprintf('\nProgram paused. Press enter to continue.\n');
pause;
```
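A minimal vectorized sketch of the backpropagation step that Part 7 asks you to add to `nnCostFunction`. It returns the unregularized cost and the gradient unrolled in the same C order used for `nn_params`. The name `nn_cost_grad` is hypothetical, and the actual module may be organized differently.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost_grad(nn_params, input_size, hidden_size, num_labels, X, y):
    """Unregularized cost and backpropagated gradient (sketch)."""
    m = X.shape[0]
    n1 = hidden_size * (input_size + 1)
    theta1 = nn_params[:n1].reshape(hidden_size, input_size + 1)
    theta2 = nn_params[n1:].reshape(num_labels, hidden_size + 1)

    # Forward pass
    a1 = np.hstack([np.ones((m, 1)), X])            # m x (input+1)
    z2 = a1 @ theta1.T                              # m x hidden
    g2 = sigmoid(z2)
    a2 = np.hstack([np.ones((m, 1)), g2])           # m x (hidden+1)
    h = sigmoid(a2 @ theta2.T)                      # m x num_labels
    Y = np.eye(num_labels)[y.ravel().astype(int) - 1]
    J = np.sum(-Y * np.log(h) - (1 - Y) * np.log(1 - h)) / m

    # Backward pass: output error, then hidden error (bias column dropped)
    d3 = h - Y                                      # m x num_labels
    d2 = (d3 @ theta2[:, 1:]) * g2 * (1 - g2)       # m x hidden

    # Accumulate over all examples and average
    theta1_grad = d2.T @ a1 / m
    theta2_grad = d3.T @ a2 / m
    return J, np.concatenate([theta1_grad.ravel(), theta2_grad.ravel()])
```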

```python
from checkNNGradients import checkNNGradients
```

```python
checkNNGradients(lambda_reg=0)
```

```
Numerical Gradient       Analytical Gradient
2.8383186911895564e-05   2.838318795382669e-05
0.00015401021569161344   0.00015401021555703823
0.00044129794485314733   0.00044129794658824615
0.00032285838003076606   0.000322858380675971
0.0008000839057942244    0.0008000839034693133
0.0002456484216040167    0.00024564842185084484
0.0003892400224358994    0.000389240023517967
0.00017496614379552966   0.00017496614263500808
-0.002644809895535616    -0.0026448098991406865
-0.0002919238517584688   -0.0002919238506639142
0.00024311816337885261   0.00024311816212281044
0.0005546384551635697    0.0005546384578506715
-0.002312306377483253    -0.002312306379408663
-0.00021125534210852948  -0.00021125534212601012
0.00034176947716346717   0.00034176947746973407
0.0005805730163288558    0.0005805730156305033
-0.0017960278420048326   -0.0017960278440286234
-0.0001138126459743205   -0.00011381264664858063
0.00041312829379691607   0.00041312829143803616
0.0005602409847149659    0.0005602409836152521
0.1
```
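`checkNNGradients` compares the two columns above. The numerical column comes from a central difference, $\frac{\partial J}{\partial \theta_i} \approx \frac{J(\theta + \epsilon e_i) - J(\theta - \epsilon e_i)}{2\epsilon}$, and the original Octave version then reports the relative difference $\lVert n - a \rVert / \lVert n + a \rVert$, which should be below about 1e-9 for a correct backpropagation. Because each parameter costs two cost evaluations, the check is run on a small debug network rather than on all 10285 parameters. A minimal sketch of both pieces, with hypothetical names `numerical_gradient` and `relative_difference` and ε = 1e-4 as in the original `computeNumericalGradient.m`:

```python
import numpy as np

def numerical_gradient(J, theta, eps=1e-4):
    """Central-difference estimate of the gradient of scalar function J at theta."""
    grad = np.zeros_like(theta, dtype=float)
    perturb = np.zeros_like(theta, dtype=float)
    for i in range(theta.size):
        perturb[i] = eps
        grad[i] = (J(theta + perturb) - J(theta - perturb)) / (2 * eps)
        perturb[i] = 0.0
    return grad

def relative_difference(num_grad, grad):
    """||num - grad|| / ||num + grad||; tiny when the two gradients agree."""
    return np.linalg.norm(num_grad - grad) / np.linalg.norm(num_grad + grad)

# Example on J(theta) = sum(theta**2), whose exact gradient is 2 * theta
theta = np.array([1.0, -2.0, 0.5])
num = numerical_gradient(lambda t: np.sum(t ** 2), theta)
print(relative_difference(num, 2 * theta))
```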

## ============ Part 8: Implement Regularization ============
```
%  Once your backpropagation implementation is correct, you should now
%  continue to implement the regularization with the cost and gradient.
%

fprintf('\nChecking Backpropagation (w/ Regularization) ... \n')

%  Check gradients by running checkNNGradients
lambda = 3;
checkNNGradients(lambda);

% Also output the costFunction debugging values
debug_J  = nnCostFunction(nn_params, input_layer_size, ...
                          hidden_layer_size, num_labels, X, y, lambda);

fprintf(['\n\nCost at (fixed) debugging parameters (w/ lambda = %f): %f ' ...
         '\n(for lambda = 3, this value should be about 0.576051)\n\n'], lambda, debug_J);

fprintf('Program paused. Press enter to continue.\n');
pause;
```

```python
# Check gradients by running checkNNGradients
lbd = 3
checkNNGradients(lbd)

# Also output the costFunction debugging values
debug_J, grad = nnCostFunction(nn_params, input_layer_size, hidden_layer_size,
                               num_labels, X, y, lbd)

print('\nCost at (fixed) debugging parameters (w/ lambda = {}): {:.6f}'
      '\n(for lambda = 3, this value should be about 0.576051)'.format(lbd, debug_J))
```

```
Numerical Gradient       Analytical Gradient
2.8383186911895564e-05   2.838318795382669e-05
-0.0452541395024042      -0.04525413950291865
0.059802792742313926     0.05980279274399115
-0.031871516699144564    -0.03187151669935012
0.0008000839057942244    0.0008000839034693133
0.050733907510647214     0.05073390751032463
-0.05714621645669382     -0.05714621645627034
0.024902075257404732     0.0249020752571404
-0.002644809895535616    -0.0026448098991406865
-0.05797577336430493     -0.057975773363437316
0.054800963773526945     0.05480096377166371
-0.016210291438056856    -0.01621029143408488
-0.002312306377483253    -0.002312306379408663
0.05922518599987825      0.059225185999566214
-0.04471746532841436     -0.04471746532883083
0.009047773499304412     0.009047773499222535
-0.0017960278420048326   -0.0017960278440286234
-0.06011322503995942     -0.06011322503969079
0.039430398703910186     0.03943039870086505
0.00955287356552148      0.009552873563392392
0.11405284848109432      0.114052
```
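With regularization, each gradient entry gains an extra $\frac{\lambda}{m}\Theta^{(l)}_{j,k}$ for every non-bias weight, while the bias column is left alone. That is visible in the two tables: rows 1, 5, 9, 13 and 17 are identical for λ = 0 and λ = 3. Assuming the Python `checkNNGradients` keeps the original debug network (3 inputs and 5 hidden units, so a 5 × 4 `Theta1`) and unrolls it row by row, those rows are exactly the bias weights of `Theta1`. A minimal sketch of the extra term, with the hypothetical name `add_reg_gradient`:

```python
import numpy as np

def add_reg_gradient(theta_grad, theta, lam, m):
    """Return theta_grad + (lam / m) * theta, leaving the bias column (0) unchanged."""
    reg = (lam / m) * theta     # new array, so theta itself is not modified
    reg[:, 0] = 0.0             # bias weights are not regularized
    return theta_grad + reg

theta = np.ones((2, 3))
print(add_reg_gradient(np.zeros((2, 3)), theta, lam=3, m=3))
# [[0. 1. 1.]
#  [0. 1. 1.]]
```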

## ================ Part 9: Training NN ================
```
%  You have now implemented all the code necessary to train a neural 
%  network. To train your neural network, we will now use "fmincg", which
%  is a function which works similarly to "fminunc". Recall that these
%  advanced optimizers are able to train our cost functions efficiently as
%  long as we provide them with the gradient computations.
%
fprintf('\nTraining Neural Network... \n')

%  After you have completed the assignment, change the MaxIter to a larger
%  value to see how more training helps.
options = optimset('MaxIter', 50);

%  You should also try different values of lambda
lambda = 1;

% Create "short hand" for the cost function to be minimized
costFunction = @(p) nnCostFunction(p, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, X, y, lambda);

% Now, costFunction is a function that takes in only one argument (the
% neural network parameters)
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);

% Obtain Theta1 and Theta2 back from nn_params
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

fprintf('Program paused. Press enter to continue.\n');
pause;
```

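```python
from scipy.optimize import minimize
```

```python
maxiter = 50
lambda_reg = 1

# Extra arguments passed to nnCostFunction after the parameter vector
myargs = (input_layer_size, hidden_layer_size, num_labels, X, y, lambda_reg)

# jac=True: nnCostFunction returns (cost, gradient).
# Note: the Octave script starts from the random initial_nn_params and uses fmincg;
# this cell starts from the pre-trained nn_params loaded from ex4weights.mat and
# uses L-BFGS-B, so its cost and accuracy are not directly comparable to the Octave run.
results = minimize(nnCostFunction, x0=nn_params, args=myargs, method="L-BFGS-B",
                   jac=True, options={'disp': True, 'maxiter': maxiter})

nn_params = results["x"]

# Recover Theta1 (25x401) and Theta2 (10x26) from the unrolled vector
Theta1 = nn_params[:hidden_layer_size * (input_layer_size + 1)].reshape(
    hidden_layer_size, input_layer_size + 1)
Theta2 = nn_params[hidden_layer_size * (input_layer_size + 1):].reshape(
    num_labels, hidden_layer_size + 1)
```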

```python
results
```

```
      fun: 0.31722116453066795
 hess_inv: <10285x10285 LbfgsInvHessProduct with dtype=float64>
      jac: array([ 2.31088680e-04, -6.79088496e-13,  1.41068089e-13, ...,
        1.73932611e-04,  1.31729249e-04,  2.21663569e-05])
  message: b'STOP: TOTAL NO. of ITERATIONS REACHED LIMIT'
     nfev: 55
      nit: 50
   status: 1
  success: False
        x: array([-6.38865427e-02, -3.39544248e-09,  7.05340443e-10, ...,
       -3.51186836e-01,  1.95303935e+00, -1.74870182e+00])
```
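`success: False` here only means the run stopped at `maxiter = 50` (`nit: 50`, with the "ITERATIONS REACHED LIMIT" message), not that training broke; raising `maxiter` lets the optimizer keep going, just as the Octave comment suggests for `MaxIter`. The call works because `nnCostFunction` returns a `(cost, gradient)` tuple and `jac=True` tells `minimize` to take the gradient from that tuple. A toy illustration of the same contract on a quadratic, independent of the network (`cost_and_grad` is a hypothetical stand-in):

```python
import numpy as np
from scipy.optimize import minimize

def cost_and_grad(theta, target):
    """Return (cost, gradient) like nnCostFunction does: 0.5 * ||theta - target||^2."""
    diff = theta - target
    return 0.5 * diff @ diff, diff

target = np.array([3.0, -1.0])
res = minimize(cost_and_grad, x0=np.zeros(2), args=(target,),
               method="L-BFGS-B", jac=True, options={'maxiter': 50})
print(res.x)  # close to [3, -1]
```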

## =============== Part 10: Implement Predict ===============
```
%  After training the neural network, we would like to use it to predict
%  the labels. You will now implement the "predict" function to use the
%  neural network to predict the labels of the training set. This lets
%  you compute the training set accuracy.

pred = predict(Theta1, Theta2, X);

fprintf('\nTraining Set Accuracy: %f\n', mean(double(pred == y)) * 100);
```

```python
from predict import predict
```

```python
pred = predict(Theta1, Theta2, X)
print('Training Set Accuracy:', np.mean(pred == y.flatten()) * 100)
```

```
Training Set Accuracy: 99.24
```
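`predict` runs the same forward pass as the cost function and, for each example, picks the output unit with the largest activation. Since the labels run from 1 to 10 (with 10 standing for the digit 0), the 0-based `argmax` index is shifted by one so it can be compared directly with `y.flatten()`. A minimal sketch of that logic (hypothetical name `predict_labels`; the repository's `predict` module is not shown):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_labels(theta1, theta2, X):
    """Return 1-based predicted labels for each row of X (sketch)."""
    m = X.shape[0]
    a1 = np.hstack([np.ones((m, 1)), X])
    a2 = np.hstack([np.ones((m, 1)), sigmoid(a1 @ theta1.T)])
    h = sigmoid(a2 @ theta2.T)          # m x num_labels output activations
    return np.argmax(h, axis=1) + 1     # shift 0-based index to labels 1..K
```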
