# Neural Networks
In the previous part of this exercise, you implemented multi-class logistic regression to recognize handwritten digits. However, logistic regression cannot form more complex hypotheses, as it is only a linear classifier.<br><br>

In this part of the exercise, you will implement a neural network to recognize handwritten digits using the same training set as before. The <strong>neural network</strong> will be able to represent complex models that form <strong>non-linear hypotheses</strong>. For this week, you will be using parameters from <strong>a neural network that we have already trained</strong>. Your goal is to implement the <strong>feedforward propagation algorithm to use our weights for prediction</strong>. In next week’s exercise, you will write the backpropagation algorithm for learning the neural network parameters.<br><br>

The file <strong><em>ex3data1</em></strong> contains the training set.<br>
The structure of the dataset is described below:<br>
1. X array = <strong>5000 samples, each with 400 columns holding the pixel values of a flattened 20*20 image</strong>
2. y array = <strong>The digit shown in the image (a number between 0 and 9)</strong>


<br><br>
<strong>
Our assignment has these sections:
1. Visualizing the Data
    1. Converting .mat to .csv
    2. Loading Dataset and Trained Neural Network Weights
    3. Plotting Data
2. Model Representation
3. Feedforward Propagation and Prediction
</strong>

Each section comes with a full description.

## 1. Visualizing the Dataset
Before starting on any task, it is often useful to understand the data by visualizing it.<br>

### 1.A Converting .mat to .csv
In this assignment, the instructor provides the training set and the weights of the trained neural network as .mat files. We convert them to .csv so that they can be loaded with pandas in Python.<br>
After that, we are ready to import the new .csv files into pandas dataframes, preprocess them, and prepare them for the next steps.

In [2]:
# import libraries
import scipy.io
import numpy as np

data = scipy.io.loadmat("ex3data1")
weights = scipy.io.loadmat('ex3weights')

Now we extract the X and y variables from the .mat file and save them as .csv files for later use; the weights Theta1 and Theta2 are saved the same way. After running the code below, <strong>you should see X.csv and y.csv files</strong> (along with Theta1.csv and Theta2.csv) in your directory.

In [3]:
for i in data:
    if '__' not in i and 'readme' not in i:
        np.savetxt((i+".csv"),data[i],delimiter=',')
        
for i in weights:
    if '__' not in i and 'readme' not in i:
        np.savetxt((i+".csv"),weights[i],delimiter=',')
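The `'__' not in i` filter above skips the metadata entries that `scipy.io.loadmat` places in the returned dictionary (`__header__`, `__version__`, `__globals__`). The sketch below illustrates one detail of the CSV round trip that is easy to miss: a single-column file such as y.csv comes back as a 1-D array from `np.loadtxt` unless `ndmin=2` is passed. It uses a hypothetical three-label array and an in-memory buffer instead of the real files, so treat it as an illustration rather than part of the assignment.

```python
import io
import numpy as np

# Hypothetical stand-in for the (m, 1) label column stored in the .mat file
y_demo = np.array([[10.0], [1.0], [2.0]])

# Write to an in-memory buffer instead of a real file on disk
buf = io.StringIO()
np.savetxt(buf, y_demo, delimiter=",")

# A single-column file is squeezed to 1-D by default
buf.seek(0)
flat = np.loadtxt(buf, delimiter=",")

# ndmin=2 keeps the column-vector shape
buf.seek(0)
column = np.loadtxt(buf, delimiter=",", ndmin=2)

print(flat.shape, column.shape)
```

The notebook sidesteps this by reading the files with pandas, which always returns a two-dimensional frame.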

### 1.B Loading Dataset and Trained Neural Network Weights
First we load the .csv files into pandas dataframes, then convert them to numpy arrays.<br><br>
There are <strong>5000 training examples</strong> in ex3data1.mat, where each training example is a <strong>20 pixel by 20 pixel <em>grayscale</em> image of a digit</strong>. Each pixel is represented by a floating point number indicating the <strong>grayscale intensity</strong> at that location. The 20 by 20 grid of pixels is <strong>"flattened" into a 400-dimensional vector</strong>. <strong>Each of these training examples becomes a single row in our data matrix X</strong>. This gives us a 5000 by 400 matrix X where every row is a training example for a handwritten digit image.<br><br>
The second part of the training set is a <strong>5000-dimensional vector y that contains the labels</strong> for the training set.<br><br>
<strong>Notice: In the dataset, the digit zero is mapped to the value ten. Therefore, a "0" digit is labeled as "10", while the digits "1" to "9" are labeled as "1" to "9" in their natural order.<br></strong>
This makes the labels harder to work with in Python, so we map "10" back to 0.
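As a small illustration of the two ideas above, the sketch below flattens a toy 3 by 3 "image" (standing in for a 20 by 20 digit) and remaps a few made-up labels. One assumption worth flagging: .mat files come from MATLAB/Octave, which stores arrays column by column, so a row reshaped with NumPy's default row-major order may come out transposed. If the plotted digits look rotated, `order="F"` is worth trying.

```python
import numpy as np

# Hypothetical 3x3 "image" standing in for a 20x20 digit
image = np.arange(9).reshape(3, 3)

row_c = image.flatten()            # row-major order (NumPy default)
row_f = image.flatten(order="F")   # column-major order (MATLAB/Octave convention)

# Reshaping with the same order that was used for flattening restores the image
restored_c = row_c.reshape(3, 3)
restored_f = row_f.reshape(3, 3, order="F")

# Mixing the two conventions yields the transposed image
mixed = row_f.reshape(3, 3)

# Made-up labels: "10" stands for the digit 0, so modulo 10 maps it back
labels = np.array([10, 1, 2, 10, 9])
remapped = labels % 10
print(remapped)
```

The modulo form is equivalent to the `y[y==10] = 0` assignment used in the notebook, because every other label is already smaller than 10.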

In [4]:
# import library
import pandas as pd

# loading .csv files into pandas dataframes
x_df = pd.read_csv('X.csv',names= np.arange(0,400))
y_df = pd.read_csv('y.csv',names=['label'])

In [5]:
# loading the weight .csv files into pandas dataframes
Theta1_df = pd.read_csv('Theta1.csv',names = np.arange(0,401))
Theta2_df = pd.read_csv('Theta2.csv',names = np.arange(0,26))

In [6]:
# saving x_df and y_df into numpy arrays
x = x_df.iloc[:,:].values
y = y_df.iloc[:,:].values

m, n = x.shape

# map the label "10" back to the digit 0
y = y.reshape(m,)
y[y==10] = 0
y = y.reshape(m,1)

print('#{} Number of training samples, #{} features per sample'.format(m,n))

#5000 Number of training samples, #400 features per sample


In [7]:
# saving Theta1_df and Theta2_df into numpy arrays
theta1 = Theta1_df.iloc[:,:].values
theta2 = Theta2_df.iloc[:,:].values

### 1.C Plotting Data
You will begin by visualizing a subset of the training set. In the first part, the code <strong>randomly selects 100 rows from X</strong> and passes those rows to the <strong>display_data</strong> function. This function maps each row to a 20 pixel by 20 pixel grayscale image and displays the images together.<br>
After plotting, you should see an image like this:<img src='img/plot.jpg'>

In [8]:
import numpy as np
import matplotlib.pyplot as plt
import random

amount = 100
lines = 10
columns = 10
image = np.zeros((amount, 20, 20))
number = np.zeros(amount)

for i in range(amount):
    rnd = random.randint(0,4999)
    image[i] = x[rnd].reshape(20, 20)
    y_temp = y.reshape(m,)
    number[i] = y_temp[rnd]
fig = plt.figure(figsize=(8,8))

for i in range(amount):
    ax = fig.add_subplot(lines, columns, 1 + i)
    
    # Turn off tick labels
    ax.set_yticklabels([])
    ax.set_xticklabels([])
    plt.imshow(image[i], cmap='binary')
plt.show()
print(number)

<Figure size 800x800 with 100 Axes>

[ 5.  0.  3.  1.  4.  2.  3.  3.  6.  5.  3.  0.  1.  2.  4.  1.  9.  0.
  6.  5.  4.  2.  2.  4.  4.  7.  6.  7.  4.  5.  6.  5.  2.  4.  5.  9.
  3.  2.  8.  4.  1.  4.  0.  0.  8.  5.  7.  4.  6.  4.  1.  8.  9.  7.
  0.  1.  2.  9.  5.  8.  2.  7.  5.  5.  8.  7.  6.  6.  8.  4.  3.  4.
  9.  2.  1.  1.  3.  3.  7.  0.  8.  6.  2.  3.  8.  1.  6.  1.  9.  0.
  1.  6.  6.  5.  0.  8.  5.  6.  4.  7.]
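An alternative to creating 100 separate subplots is to tile the selected rows into one large array and draw it with a single `imshow` call. The sketch below is one possible way to do that. The helper name `tile_images` and the random stand-in data are assumptions made for illustration, and the rows are sampled without replacement so that no digit appears twice.

```python
import numpy as np

def tile_images(rows, n_rows, n_cols, side):
    """Arrange flattened side x side images into one (n_rows*side, n_cols*side) grid."""
    rows = np.asarray(rows)
    images = rows.reshape(n_rows, n_cols, side, side)
    # Reorder axes to (grid_row, pixel_row, grid_col, pixel_col) before merging them
    return images.transpose(0, 2, 1, 3).reshape(n_rows * side, n_cols * side)

rng = np.random.default_rng(0)
x_demo = rng.random((5000, 400))   # hypothetical stand-in for the real x matrix

# Pick 100 distinct row indices
picked = rng.choice(x_demo.shape[0], size=100, replace=False)
grid = tile_images(x_demo[picked], n_rows=10, n_cols=10, side=20)
print(grid.shape)
```

With the real data, `plt.imshow(grid, cmap='binary')` would then show all 100 digits in one axes.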


## 2. Model Representation
Our neural network is shown in the figure below. It has <strong>3 layers: an input layer, a hidden layer and an output layer</strong>. Recall that our <strong>inputs are pixel</strong> values of digit images. Since the images are of <strong>size 20×20</strong>, this gives us <strong>400 input layer units</strong> (excluding the extra bias unit which always outputs +1).<br><br><img src='img/nn.jpg'><br>
You have been provided with a set of <strong>network parameters (Θ<sup>(1)</sup>; Θ<sup>(2)</sup>)</strong> already trained by the instructor.<br><br>
<strong>The parameters Theta1 and Theta2 have dimensions that are sized for a neural network with 25 units in the second layer and 10 output units (corresponding to the 10 digit classes).</strong>

In [9]:
print('theta1 shape = {}, theta2 shape = {}'.format(theta1.shape,theta2.shape))

theta1 shape = (25, 401), theta2 shape = (10, 26)


The weights are stored with one unit per row, while our data matrix x holds one example per row. To compute the activations for all examples with a single np.dot(x, theta), we transpose the weights.

In [10]:
theta1 = theta1.transpose()
theta2 = theta2.transpose()
print('theta1 shape = {}, theta2 shape = {}'.format(theta1.shape,theta2.shape))

theta1 shape = (401, 25), theta2 shape = (26, 10)
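The shapes printed above follow directly from the layer sizes: a layer with n_in inputs and n_out units needs an (n_out, n_in + 1) matrix, the extra column being the bias weight. The sketch below derives those shapes and checks, on small random data, that multiplying a row-per-example matrix by the transposed weights gives the same result as applying the untransposed weights to each example in turn. The toy sizes and variable names are illustrative assumptions.

```python
import numpy as np

# Layer sizes of the network described above: input, hidden, output
layer_sizes = [400, 25, 10]
shapes = [(n_out, n_in + 1)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
print(shapes)

# Toy check of the two conventions (hypothetical sizes: 4 inputs + bias, 3 units)
rng = np.random.default_rng(0)
theta = rng.normal(size=(3, 5))    # one unit per row, as stored in the .mat file
X = rng.normal(size=(6, 5))        # six examples in rows, bias column included

per_example = np.stack([theta @ x for x in X])   # one column vector at a time
batched = X @ theta.T                            # all examples at once
print(np.allclose(per_example, batched))
```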


## 3. Feedforward Propagation and Prediction
Now you will implement feedforward propagation for the neural network.<br>
You should implement the <strong>feedforward computation</strong> that computes <strong>h<sub>θ</sub>(x<sup>(i)</sup>)</strong> for every example i and returns the associated predictions. Similar to the one-vs-all classification strategy, the prediction from the neural network will be the <strong>label</strong> that has the <strong>largest output h<sub>θ</sub>(x)<sub>k</sub></strong>.

<strong>Implementation Note:</strong> The matrix X contains the examples in rows. When you complete the code, <strong>you will need to add the column of 1’s</strong> to the matrix. The matrices <strong>Theta1 and Theta2 contain the parameters for each unit in rows.</strong> Specifically, the first row of Theta1 corresponds to the first hidden unit in the second layer. <br>
You must get <strong>a<sup>(l)</sup></strong> as a column vector.<br><br>
You should see that the <strong>accuracy is about 97.5%</strong>.
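For a single example x written as a column vector, the feedforward computation described above can be summarized as follows. This is the standard formulation for a three-layer network with sigmoid activations, stated here under the assumption that each Θ<sup>(l)</sup> holds one unit per row and that g is applied element-wise:

$$
\begin{aligned}
a^{(1)} &= \begin{bmatrix} 1 \\ x \end{bmatrix} \\
z^{(2)} &= \Theta^{(1)} a^{(1)}, \qquad a^{(2)} = \begin{bmatrix} 1 \\ g(z^{(2)}) \end{bmatrix} \\
z^{(3)} &= \Theta^{(2)} a^{(2)}, \qquad h_\theta(x) = a^{(3)} = g(z^{(3)}) \\
g(z) &= \frac{1}{1 + e^{-z}}
\end{aligned}
$$

The predicted class is then the index k that maximizes h<sub>θ</sub>(x)<sub>k</sub>.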

In [11]:
# adding column of 1's to x
x = np.append(np.ones(shape=(m,1)),x,axis = 1)

<strong>h = lr_hypothesis(x,theta)</strong> computes the <strong>sigmoid</strong> function of <strong>θ<sup>T</sup>X</strong> and returns a number <strong>0 ≤ h ≤ 1</strong>.<br>
You can also use <a href='https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.special.expit.html'>scipy.special.expit</a> to calculate the sigmoid.

In [12]:
def sigmoid(z):
    return 1/(1+np.exp(-z))
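The direct formula works for this exercise, but `np.exp(-z)` overflows (with a runtime warning) when z is a large negative number. The sketch below shows one common way to avoid that by choosing the algebraically equivalent form that only exponentiates non-positive values. The name `stable_sigmoid` is an illustrative assumption, and `scipy.special.expit` is a ready-made alternative.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that never exponentiates a large positive number.

    Always returns an array (scalars become arrays of length 1).
    """
    z = np.atleast_1d(np.asarray(z, dtype=float))
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) is at most 1, so this form is safe
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, use exp(z) / (1 + exp(z)), where exp(z) is below 1
    exp_z = np.exp(z[~pos])
    out[~pos] = exp_z / (1.0 + exp_z)
    return out

# Raise on overflow to confirm that none occurs, even for extreme inputs
with np.errstate(over="raise"):
    values = stable_sigmoid([-1000.0, 0.0, 1000.0])
print(values)
```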

In [13]:
def lr_hypothesis(x,theta):
    # sigmoid of the linear combination, so the result lies between 0 and 1
    return sigmoid(np.dot(x,theta))

<strong>predict(theta1, theta2, x):</strong> outputs the predicted label of x given the trained weights of a neural network (theta1, theta2).

In [14]:
layers = 3
num_labels = 10

<strong>Because the original dataset maps the digit 0 to the label "10", the weights were trained with that mapping too: the last output unit corresponds to the digit 0. So we rotate the output columns one step to the right to predict the correct values.<br>
Recall that we changed the label of the digit 0 from "10" to "0" in y, but we cannot change this mapping inside the weights of the neural network. So we have to apply this rotation to the final output probabilities.</strong>

In [34]:
def rotate_column(array):
    # shift every column one step to the right; the last column
    # (the output unit for digit 0) wraps around to column 0
    return np.roll(array, 1, axis=1)
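To see what the rotation does, the sketch below applies it to two made-up rows of output activations using `np.roll`, and compares the result with an equivalent index shift. The toy values are assumptions chosen so that the expected digits are obvious.

```python
import numpy as np

# Hypothetical output activations for 2 examples; the column order follows the
# original labels 1, 2, ..., 9, 10 (where "10" stands for the digit 0)
a3_toy = np.zeros((2, 10))
a3_toy[0, 9] = 0.9   # first example: strongest output is label "10", the digit 0
a3_toy[1, 2] = 0.8   # second example: strongest output is label "3", the digit 3

# Rotate one step to the right: the last column moves to the front
rotated = np.roll(a3_toy, 1, axis=1)
digits = rotated.argmax(axis=1)

# Equivalent shortcut without rotating: shift the index, then wrap 10 to 0
digits_alt = (a3_toy.argmax(axis=1) + 1) % 10
print(digits, digits_alt)
```

After the rotation, column k holds the score for digit k, so `argmax` returns the digit directly.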

In [35]:
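def predict(theta1,theta2,x):
    z2 = np.dot(x,theta1) # hidden layer pre-activation
    a2 = sigmoid(z2) # hidden layer activation

    # adding column of 1's (bias unit) to a2
    a2 = np.append(np.ones(shape=(a2.shape[0],1)),a2,axis = 1)
    z3 = np.dot(a2,theta2) # output layer pre-activation
    a3 = sigmoid(z3) # output layer activation
    
    # label mapping: rotate columns one step to the right so column k scores digit k
    y_prob = rotate_column(a3)
    
    # prediction: index of the largest output activation (a3) for each example
    y_pred = np.argmax(y_prob, axis=1).reshape(-1,1)
    return y_pred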

In [36]:
y_pred = predict(theta1,theta2,x)
y_pred.shape

(5000, 1)
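The same forward pass can be written as a loop over weight matrices, which works for any number of layers and does not depend on the global `m`. The sketch below is a minimal version under two stated assumptions that differ from the cells above: each theta keeps one unit per row (the untransposed layout), and the input has no bias column yet. The function name `feedforward` and the toy weights are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(X, thetas):
    """Return the output activations for examples stored in the rows of X.

    Assumes X has no bias column and each theta has shape (units_out, units_in + 1).
    """
    a = np.asarray(X, dtype=float)
    for theta in thetas:
        a = np.hstack([np.ones((a.shape[0], 1)), a])   # prepend the bias unit
        a = sigmoid(a @ theta.T)                       # activations of the next layer
    return a

# Hypothetical toy network: 2 inputs, 2 hidden units, 2 outputs
theta_a = np.array([[0.0, 10.0, -10.0],
                    [0.0, -10.0, 10.0]])
theta_b = np.array([[-5.0, 10.0, 0.0],
                    [-5.0, 0.0, 10.0]])
X_toy = np.array([[1.0, 0.0],
                  [0.0, 1.0]])

outputs = feedforward(X_toy, [theta_a, theta_b])
predictions = outputs.argmax(axis=1)
print(predictions)
```

In this toy network the first output unit fires for the first example and the second for the second, so the predictions are 0 and 1.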

Now we will compare our predicted results to the true labels with the <a href='http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html'>confusion_matrix</a> function from the scikit-learn library.

In [37]:
from sklearn.metrics import confusion_matrix

# Function for accuracy
def acc(confusion_matrix):
    t = 0
    for i in range(num_labels):
        t += confusion_matrix[i][i]
    f = m-t
    ac = t/(m)
    return (t,f,ac)

In [38]:
# confusion matrix: rows are true labels, columns are predicted labels
cm_train = confusion_matrix(y.reshape(m,),y_pred.reshape(m,))
t,f,ac = acc(cm_train)
print('With #{} correct, #{} wrong ==========> accuracy = {}%'
          .format(t,f,ac*100))



In [39]:
cm_train

array([[496,   0,   0,   0,   1,   0,   1,   0,   1,   1],
       [  0, 491,   1,   1,   2,   0,   0,   1,   3,   1],
       [  3,   1, 485,   0,   3,   1,   3,   1,   2,   1],
       [  0,   2,   2, 480,   0,   8,   1,   4,   1,   2],
       [  0,   2,   2,   0, 484,   0,   3,   0,   1,   8],
       [  0,   0,   1,   4,   1, 492,   2,   0,   0,   0],
       [  2,   2,   0,   0,   0,   3, 493,   0,   0,   0],
       [  1,   3,   2,   1,   4,   0,   0, 485,   0,   4],
       [  0,   4,   1,   1,   2,   1,   0,   0, 491,   0],
       [  3,   2,   0,   4,   2,   1,   1,   5,   3, 479]], dtype=int64)
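The confusion matrix above contains everything needed to recompute the accuracy and to see which digits are hardest for the network. The sketch below copies the matrix shown above and derives the overall accuracy and the per-class recall with NumPy; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true digits 0-9, columns: predicted digits 0-9)
cm = np.array([
    [496,   0,   0,   0,   1,   0,   1,   0,   1,   1],
    [  0, 491,   1,   1,   2,   0,   0,   1,   3,   1],
    [  3,   1, 485,   0,   3,   1,   3,   1,   2,   1],
    [  0,   2,   2, 480,   0,   8,   1,   4,   1,   2],
    [  0,   2,   2,   0, 484,   0,   3,   0,   1,   8],
    [  0,   0,   1,   4,   1, 492,   2,   0,   0,   0],
    [  2,   2,   0,   0,   0,   3, 493,   0,   0,   0],
    [  1,   3,   2,   1,   4,   0,   0, 485,   0,   4],
    [  0,   4,   1,   1,   2,   1,   0,   0, 491,   0],
    [  3,   2,   0,   4,   2,   1,   1,   5,   3, 479],
])

correct = int(np.trace(cm))        # sum of the diagonal: correct predictions
total = int(cm.sum())
accuracy = correct / total

# Per-class recall: correct predictions divided by the number of true examples
recall = cm.diagonal() / cm.sum(axis=1)
hardest = int(recall.argmin())
print(correct, total - correct, accuracy, hardest)
```

For this matrix, 4876 of the 5000 examples are classified correctly (97.52%), in line with the expected accuracy of about 97.5%, and the digit 9 has the lowest recall (479 of 500).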