## Lecture 1: Introduction to Neural Networks

### by Long Nguyen

This homework notebook is supplemental to Lecture 1 of the series "Image Recognition with Neural Networks".


### The MNIST Dataset

In [1]:
from mnist_loader import load_data_wrapper
import numpy as np
import matplotlib.pyplot as plt

In [2]:
training_data, validation_data, test_data = load_data_wrapper()

#### The variables training_data, validation_data and test_data above are each a list of (image, label) tuples where image and label are numpy arrays. We won't use the validation_data for this course. How many images are in training_data? 


#### Unpack the first image into two variables img1, lb1. What is the shape of each image? What is the shape of each label? 

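Both the image and its label are rank-2 numpy arrays, of shape (784,1) and (10,1), respectively. The image is a flattened 28×28 picture (28·28 = 784 pixels). A label is a one-hot encoding of the digit: a vector of zeros with a single 1 at the index of the digit.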
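
As a small illustration of one-hot encoding, the sketch below builds a label of shape (10,1) for a given digit and recovers the digit with `np.argmax`. The helper name `one_hot` is hypothetical and is not part of `mnist_loader`.

```python
import numpy as np

def one_hot(digit, num_classes=10):
    """Hypothetical helper: return a (num_classes, 1) one-hot column vector."""
    label = np.zeros((num_classes, 1))
    label[digit] = 1.0  # single 1 at the index of the digit
    return label

label = one_hot(5)
print(label.shape)       # (10, 1)
print(label.ravel())     # zeros except for a 1 at index 5
print(np.argmax(label))  # 5
```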

#### Print out lb1. What digit is the first image of the training set? The 100th image? (Answers: 5 and 1.) 

The function plot_image below draws an MNIST image using the matplotlib library.  

In [5]:
def plot_image(image):
    """ Plot a single MNIST image."""
    image = image.reshape(28,28)
    fig, axes = plt.subplots()
    axes.matshow(image, cmap=plt.cm.binary)
    plt.show()

#### Call the function to plot the first image.

In [None]:
def plot_images(images):
    """Plot a list of MNIST (image, label) tuples side by side."""
    fig, axes = plt.subplots(nrows=1, ncols=len(images))
    for j, ax in enumerate(axes):
        ax.matshow(images[j][0].reshape(28,28), cmap = plt.cm.binary)
        ax.set_xticks([])
        ax.set_yticks([])
    plt.show()

#### Call the function above to plot the first 10 images.

#### The sigmoid function is defined as $$\sigma(x)=\frac{1}{1+e^{-x}}.$$ Implement the sigmoid function. Hint: Use np.exp() for the exponential function. 

In [None]:
def sigmoid(x):
    """Returns the output of the sigmoid or logistic function."""

    

Given a vector $\vec{x}\in\mathbb{R}^n$, the sigmoid function $\sigma:\mathbb{R}\rightarrow\mathbb{R}$ can be extended to a vector-valued function $\sigma:\mathbb{R}^n\rightarrow\mathbb{R}^n$ by applying $\sigma$ elementwise. 
That is, if
$$\vec{x}=\left[ \begin{array}{cccc}
x_{1} \\
x_{2} \\
\vdots \\
x_{n} 
\end{array} \right]$$
then
$$\sigma(\vec{x})=\left[ \begin{array}{cccc}
\sigma(x_{1}) \\
\sigma(x_{2}) \\
\vdots \\
\sigma(x_{n}) 
\end{array} \right].$$

Similarly, $\sigma$ can be applied to an $m\times n$ matrix elementwise. 
For example, if $$\vec{x}=\left[ \begin{array}{cccc}
1 \\
2 \\
3 
\end{array} \right]$$
then
$$\sigma(\vec{x})=\left[ \begin{array}{cccc}
\sigma(1) \\
\sigma(2) \\
\sigma(3) 
\end{array} \right]\approx\left[ \begin{array}{cccc}
0.73 \\
0.88 \\
0.95 
\end{array} \right]$$
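
The worked example above can be checked numerically. The sketch below is one possible elementwise implementation, not necessarily the one intended for the exercise; the name `sigmoid_sketch` is an assumption chosen to avoid clashing with your own `sigmoid`. Because `np.exp` acts on every entry of an array, the same function works for scalars, vectors, and matrices.

```python
import numpy as np

def sigmoid_sketch(x):
    """One possible elementwise sigmoid; np.exp acts on every entry of x."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([[1.0], [2.0], [3.0]])     # column vector of shape (3, 1)
print(np.round(sigmoid_sketch(x), 2))   # entries approx. 0.73, 0.88, 0.95

M = np.zeros((2, 3))                    # the same function applied to a matrix
print(sigmoid_sketch(M))                # every entry is sigma(0) = 0.5
```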

Define $p_1(\vec{x})=W_1\vec{x}+\vec{b}_1$ and $p_2(\vec{x})=W_2\vec{x}+\vec{b}_2$ for some weight matrices $W_1, W_2$ and bias vectors $\vec{b}_1, \vec{b}_2.$ 

Consider the classifier or score function $f=\sigma\circ p_2\circ\sigma\circ p_1:\mathbb{R}^{784}\rightarrow\mathbb{R}^{10}.$ This is a two-layer neural network. Assume that the hidden layer is 30-dimensional. The score function takes a flattened MNIST image of shape `(784,1)` and outputs a vector of 10 class scores of shape `(10,1)`. Unlike a label, this output is generally not one-hot: each entry is a score between 0 and 1. The class with the highest score is the label predicted by the classifier. 
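
To see how the composition $\sigma\circ p_2\circ\sigma\circ p_1$ fits together, here is a minimal sketch on a toy network. The sizes (4 inputs, 3 hidden units, 2 outputs) are deliberately not the MNIST sizes, the parameters are random rather than trained, and all names ending in `_toy` are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # random, untrained parameters (illustration only)

# Toy sizes, NOT the MNIST sizes: 4 inputs, 3 hidden units, 2 outputs.
n_in, n_hidden, n_out = 4, 3, 2
W1_toy = rng.standard_normal((n_hidden, n_in))
b1_toy = rng.standard_normal((n_hidden, 1))
W2_toy = rng.standard_normal((n_out, n_hidden))
b2_toy = rng.standard_normal((n_out, 1))

def sigma(z):
    """Elementwise sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

x_toy = rng.standard_normal((n_in, 1))        # one input column vector
hidden = sigma(W1_toy @ x_toy + b1_toy)       # sigma(p_1(x)), shape (3, 1)
scores = sigma(W2_toy @ hidden + b2_toy)      # sigma(p_2(hidden)), shape (2, 1)

print(hidden.shape, scores.shape)             # (3, 1) (2, 1)
predicted_class = int(np.argmax(scores))      # index of the highest score
print(predicted_class)
```

Each matrix has one row per output of its layer and one column per input, which is what makes the products `W1_toy @ x_toy` and `W2_toy @ hidden` well defined.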

Training a neural network amounts to finding a set of parameters $W_1, W_2, \vec{b}_1, \text{and } \vec{b}_2$ whose score function $f(x; W_1, W_2, \vec{b}_1, \vec{b}_2)$ can accurately classify unseen images. 


#### What are the dimensions of $W_1,W_2,b_1,b_2$? Write your answer in this cell using markdown.

Answer:

To demonstrate an example of such a score function, let's load up a set of parameters that has been trained. 

In [None]:
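with open("parameters.npy", mode="rb") as r:
    # The four arrays have different shapes, so the file presumably stores them
    # as an object array; recent NumPy versions need allow_pickle=True to load it.
    parameters = np.load(r, allow_pickle=True)
    W1, B1, W2, B2 = parameters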

#### Implement the score function with this set of parameters. 

In [None]:
def f(x, W1, W2, B1, B2):
    """Return the output of the network if ``x`` is input image and
    W1, W2, B1 and B2 are the learnable weights. """
    #Z1 = W1*x+B1  (* represents matrix multiplication, np.dot())
    Z1 = 
    #A1 = sigmoid(Z1)
    A1 = 
    #Z2 = W2*A1+B2 (* represents matrix multiplication)
    Z2 = 
    #A2 = sigmoid(Z2)
    A2 = 
    return A2


#### Apply your score function above to the first two images. Does it classify them correctly?  

#### The predict function below predicts the labels of a list of images. It is missing one line of code. Fill in the code. 

In [None]:
def predict(images, W1, W2, B1, B2):
    predictions = []  #empty list
    for im in images:
        # fill in one line of code here
        # this line calls f above with the correct parameters
        
        predictions.append(np.argmax(a)) # add prediction to predictions list
    return predictions

#### Call predict above with the first 10 images of the training data. Does it classify them correctly? (Answer: 10 out of 10.) This is not surprising, since the algorithm was trained on these images; that is, it has already seen them. Repeat this problem with the first 10 images of the test data, which contains images that the algorithm has NOT seen. 

#### Find the first TWO images in the test dataset that are classified incorrectly by the score function above. Write code to do this; do not use trial and error. Remember that, unlike the training data, the test data labels are NOT one-hot encoded: each test label is already a digit, so you don't need to call np.argmax on it.

#### Run the following cell to see the difference in label encoding. 

In [None]:
print(training_data[0][1]) # label for the first image of training data (5)
print(test_data[0][1])     # label for the first image of testing data (7)

#### In the first 100 images of the test data, the function misclassifies the images at indices 8 and 38. For example, call the predict function on the image at index 8. The algorithm thinks that this image is a 6, but it is actually a 5. Plot the image to see why you can forgive the algorithm for making this mistake. 