# Putting It All Together
### by Long Nguyen

Congratulations on making it this far! In this notebook, we combine the functions implemented in the previous notebooks and finally implement gradient descent to train a network that recognizes handwritten digits.

Feel free to copy and paste from your work on previous notebooks. 

In [35]:
from mnist_loader import load_data_wrapper
import numpy as np
import random
import matplotlib.pyplot as plt

In [36]:
training_data, validation_data, test_data = load_data_wrapper()

In [37]:
def plot_images(images):
    "Plot a list of MNIST images."
    fig, axes = plt.subplots(nrows=1, ncols=len(images))
    for j, ax in enumerate(axes):
        ax.matshow(images[j][0].reshape(28,28), cmap = plt.cm.binary)
        ax.set_xticks([])
        ax.set_yticks([])
    plt.show()

#### Implement $\sigma(x)$. 

In [38]:
def sigmoid(x):
    """The sigmoid function."""


#### Implement the derivative of $\sigma$. (Hint: $\sigma'(x)=\sigma(x)(1-\sigma(x))$)

In [39]:
def sigmoid_prime(x):
    """Derivative of the sigmoid function."""
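
As a reference point for the two exercises above, here is a minimal sketch of both functions, assuming the input is a NumPy array or a scalar. It follows the hint $\sigma'(x)=\sigma(x)(1-\sigma(x))$ directly; it is one possible solution, not necessarily the one from the earlier notebooks.

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    """Derivative of the sigmoid, using sigma'(x) = sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Both functions act element-wise, so they also work on whole matrices.
values = sigmoid_prime(np.zeros((3, 2)))
```

At $x=0$ the sigmoid equals $0.5$ and its derivative equals $0.25$, which makes a quick sanity check.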


#### Implement the score function.

In [40]:
def f(x, W1, W2, B1, B2):
    """Return the output of the network if ``x`` is input image and
    W1, W2, B1 and B2 are the learnable weights. """
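
One way to write the score function is sketched below. It assumes that each image is a column vector of shape (784, 1), that the hidden layer has 30 units, that the output has 10 units, and that both layers use the sigmoid activation. The random parameters are placeholders for illustration only.

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def f(x, W1, W2, B1, B2):
    """Forward pass of a 2-layer network for one column-vector input x."""
    Z1 = np.dot(W1, x) + B1   # hidden pre-activation
    A1 = sigmoid(Z1)          # hidden activation
    Z2 = np.dot(W2, A1) + B2  # output pre-activation
    return sigmoid(Z2)        # output activation, one score per digit

# Placeholder input and parameters (assumed shapes, not trained values).
rng = np.random.default_rng(0)
x = rng.random((784, 1))
W1, B1 = rng.standard_normal((30, 784)), rng.standard_normal((30, 1))
W2, B2 = rng.standard_normal((10, 30)), rng.standard_normal((10, 1))
out = f(x, W1, W2, B1, B2)
print(out.shape)  # (10, 1)
```

The predicted digit is then `np.argmax(out)`, which is how `predict` uses the result.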
    


In [41]:
def predict(images, W1, W2, B1, B2):
    predictions = []
    for im in images:
        a = f(im[0], W1, W2, B1, B2)
        predictions.append(np.argmax(a))
    return predictions

#### Implement vectorize_mini_batch.

In [42]:
def vectorize_mini_batch(mini_batch, size):
    """Given a mini-batch of (image, label) tuples of a certain size,
    return the tuple X, Y, where X contains all of the images and Y contains
    all of the labels stacked horizontally."""
   

    return X, Y
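
A possible implementation is sketched here with `np.hstack`, assuming each image is a (784, 1) column and each label is a (10, 1) one-hot column. The `fake_batch` data is a hypothetical stand-in for a slice of `training_data`.

```python
import numpy as np

def vectorize_mini_batch(mini_batch, size):
    """Stack the first `size` (image, label) column pairs side by side."""
    X = np.hstack([image for image, _ in mini_batch[:size]])
    Y = np.hstack([label for _, label in mini_batch[:size]])
    return X, Y

# Hypothetical stand-in data: 5 random "images" with one-hot labels.
rng = np.random.default_rng(0)
fake_batch = [(rng.random((784, 1)), np.eye(10)[:, [k % 10]]) for k in range(5)]
X, Y = vectorize_mini_batch(fake_batch, 5)
print(X.shape, Y.shape)  # (784, 5) (10, 5)
```

Column `j` of `X` is the `j`-th image and column `j` of `Y` is its label, so a single matrix product processes the whole mini-batch at once.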

Suppose we have an $L$-layer neural network. For a matrix $A$, let $A[i]$ denote the $i$-th column of $A$.

Let $\cdot$ denote matrix multiplication and $\odot$ denote element-wise multiplication. 

These are the four equations of backpropagation, where $m$ is the number of examples in the mini-batch and $A_0=X$. The factor $\frac{1}{m}$ appears only in the first equation; the chain rule carries it back to the earlier layers.

\begin{align}
\frac{\partial J}{\partial Z_L}&=\frac{1}{m}(A_L-Y)\odot\sigma'(Z_L)\\
\frac{\partial J}{\partial Z_i}&=\left(W_{i+1}^T\cdot \frac{\partial J}{\partial Z_{i+1}}\right)\odot\sigma'(Z_i)\\
\frac{\partial J}{\partial W_i}&=\frac{\partial J}{\partial Z_i}\cdot A_{i-1}^T\\
\frac{\partial J}{\partial B_i}&=\sum_{j=1}^{m} \frac{\partial J}{\partial Z_i}[j]
\end{align}
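
To make the equations concrete, the sketch below applies them to a 2-layer network and compares one entry of the result with a finite-difference estimate. It assumes $J$ is the quadratic cost $\frac{1}{2m}\sum_j \lVert A_L[j]-Y[j]\rVert^2$, which is consistent with the first equation but is not stated in this notebook. The function names `cost` and `backprop` and the tiny layer sizes are illustrative choices. The factor $\frac{1}{m}$ is applied once, in `dZ2`, and reaches the earlier layer through the chain rule.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def cost(X, Y, W1, B1, W2, B2):
    """Assumed quadratic cost: squared error summed over the batch, over 2m."""
    A2 = sigmoid(W2 @ sigmoid(W1 @ X + B1) + B2)
    return np.sum((A2 - Y) ** 2) / (2 * X.shape[1])

def backprop(X, Y, W1, B1, W2, B2):
    """Gradients of the cost for a 2-layer network, one column per example."""
    m = X.shape[1]
    Z1 = W1 @ X + B1
    A1 = sigmoid(Z1)
    Z2 = W2 @ A1 + B2
    A2 = sigmoid(Z2)
    dZ2 = (A2 - Y) * sigmoid_prime(Z2) / m      # output-layer equation
    dW2 = dZ2 @ A1.T                            # weight equation
    dB2 = np.sum(dZ2, axis=1, keepdims=True)    # bias equation: sum of columns
    dZ1 = (W2.T @ dZ2) * sigmoid_prime(Z1)      # hidden-layer equation
    dW1 = dZ1 @ X.T
    dB1 = np.sum(dZ1, axis=1, keepdims=True)
    return dW1, dB1, dW2, dB2

# Tiny random problem: 4 inputs, 3 hidden units, 2 outputs, 5 examples.
rng = np.random.default_rng(0)
X, Y = rng.random((4, 5)), rng.random((2, 5))
W1, B1 = rng.standard_normal((3, 4)), rng.standard_normal((3, 1))
W2, B2 = rng.standard_normal((2, 3)), rng.standard_normal((2, 1))
dW1, dB1, dW2, dB2 = backprop(X, Y, W1, B1, W2, B2)

# Central finite difference for the single entry W1[0, 0].
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (cost(X, Y, W1_plus, B1, W2, B2)
           - cost(X, Y, W1_minus, B1, W2, B2)) / (2 * eps)
difference = abs(numeric - dW1[0, 0])
```

A very small `difference` suggests the analytic gradient agrees with the numerical one. The same check can be repeated for any entry of `W2`, `B1`, or `B2`.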

#### Implement gradient descent. 

In [43]:
def SGD(training_data, epochs, mini_batch_size, eta, test_data):
    """Mini-batch stochastic gradient descent.
    epochs: the number of times the entire training_data is examined.
    mini_batch_size: the number of images used to approximate the gradient
    in each step of gradient descent.
    eta: the learning rate, or step size.
    test_data: data used to check the accuracy of the model after every epoch.
    """
    n = len(training_data)
    n_test = len(test_data)
    
    # randomize the learnable parameters
    # use np.random.randn(m,n) for appropriate (m,n)
    # use 2-layer neural network with 30-dimensional hidden layer
    W1 = 
    W2 = 
    B1 = 
    B2 = 
    
    for j in range(epochs):
        random.shuffle(training_data)
        for k in range(0, n, mini_batch_size):
            # mini_batch of size mini_batch_size
            mini_batch = 
            
            # create vectorized input X and labels Y
            X, Y = 
            
            
            #feed forward(vectorized)
            Z1 = 
            A1 =  
            Z2 = 
            A2 = 
                    
            # backpropagate(vectorized) 
            # use the four equations of backpropagation
            dZ2 = 
            dW2 = 
            
            # for dB1, dB2 use np.sum with axis=1 and the keyword argument
            # keepdims=True so that the dimensions do not collapse.
            dB2 = 
            dZ1 = 
            dW1 = 
            dB1 =
            # update parameters by making a gradient step
            W1 = 
            W2 = 
            B1 =
            B2 = 
            
            
        # after every epoch, check the accuracy of the model    
        test_results = [(np.argmax(f(x, W1, W2, B1, B2)), y) for (x, y) in test_data]
        num_correct = sum(int(x == y) for (x, y) in test_results)
        print("Epoch {} : {} / {}".format(j, num_correct, n_test))
    return W1, B1, W2, B2
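
The shuffle-then-slice pattern used by the two loops can be tried in isolation. The sketch below runs it on a toy list of integers instead of images; the helper name `iterate_mini_batches` is hypothetical and not part of the notebook.

```python
import random

def iterate_mini_batches(data, mini_batch_size):
    """Shuffle `data` in place, then yield consecutive slices of it."""
    random.shuffle(data)
    for k in range(0, len(data), mini_batch_size):
        yield data[k:k + mini_batch_size]

# Toy data: 10 items split into mini-batches of 4.
toy_data = list(range(10))
batches = list(iterate_mini_batches(toy_data, 4))
print([len(b) for b in batches])  # [4, 4, 2]
```

The last slice is shorter whenever the mini-batch size does not divide the number of examples. Passing `len(mini_batch)` as the `size` argument of `vectorize_mini_batch` is one way to handle that case.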


In [44]:
W1, B1, W2, B2 = SGD(training_data, 30, 10, 3, test_data)

Epoch 0 : 7333 / 10000
Epoch 1 : 8562 / 10000
Epoch 2 : 8764 / 10000
Epoch 3 : 8894 / 10000
Epoch 4 : 8991 / 10000


KeyboardInterrupt: 