<a href="https://colab.research.google.com/github/PranjalShukla2004/NN_from_Scratch/blob/main/NeuralNetwork_from_scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
data = pd.read_csv('train.csv')

In [None]:
data.head()

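| | label | pixel0 | pixel1 | pixel2 | pixel3 | pixel4 | pixel5 | pixel6 | pixel7 | pixel8 | ... | pixel774 | pixel775 | pixel776 | pixel777 | pixel778 | pixel779 | pixel780 | pixel781 | pixel782 | pixel783 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |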


In [None]:
data = np.array(data)
m,n = data.shape
np.random.shuffle(data)

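data_dev = data[0:1000].T        # First 1000 shuffled rows form the validation set; .T puts one example per column
Y_dev = data_dev[0]              # First row holds the labels
X_dev = data_dev[1:n] / 255.     # Remaining rows are pixel values, scaled from [0, 255] to [0, 1]

data_train = data[1000:m].T      # The remaining rows form the training set
Y_train = data_train[0]          # First row, which is our target
X_train = data_train[1:n] / 255. # Features of the training set, scaled to [0, 1]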

### Forward Propagation
In forward propagation we pass an image through the neural network and compute the output it produces. The quantities involved are:



*   $A^{[0]} = X$ (the input layer: one 784-pixel image per column)
*   $Z^{[1]} = W^{[1]} A^{[0]} + b^{[1]}$ (the hidden layer of the network, before activation)

*   $A^{[1]} = \text{ReLU}(Z^{[1]})$
(the activated hidden layer; $\text{ReLU}(z) = \max(0, z)$ passes positive values through unchanged and sets negative values to 0)
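The hidden-layer step can be sketched in NumPy as follows. This is a minimal illustration, assuming the shapes used in `init_params` (10 hidden units, 784 input pixels) and a hypothetical batch of `m = 5` random images whose pixels are already scaled to [0, 1]. It shows how the bias column `b1` of shape (10, 1) is broadcast across all `m` columns and that ReLU zeroes out every negative entry.

```python
import numpy as np

rng = np.random.default_rng(0)

m = 5                                   # hypothetical batch size
X = rng.random((784, m))                # A[0]: one image per column, pixels in [0, 1)
W1 = rng.random((10, 784)) - 0.5        # same initialisation scheme as init_params
b1 = rng.random((10, 1)) - 0.5

Z1 = W1 @ X + b1                        # (10, 784) @ (784, m) + (10, 1) -> (10, m); b1 is broadcast over columns
A1 = np.maximum(0, Z1)                  # ReLU: keep positive entries, replace negatives with 0

print(Z1.shape, A1.shape)               # (10, 5) (10, 5)
print((A1 >= 0).all())                  # True
```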

Moving on to the output layer, we compute:

*   $Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$
*   $A^{[2]} = \text{softmax}(Z^{[2]})$

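The softmax function (`softMax` in the code) is $\text{softmax}(z)_i = \dfrac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}}$, where $k$ is the number of classes (10 here). It is applied to each column of $Z^{[2]}$ and turns the raw scores into a probability distribution: every entry lies in $(0, 1)$ and the entries of each column sum to 1.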
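Taken literally, this formula overflows in floating point when the scores are large: in float64, `np.exp` of anything above roughly 709 is `inf`, and `inf / inf` is `nan`. A common remedy, sketched below under the assumption that `Z` holds one example per column as in this notebook, is to subtract each column's maximum before exponentiating. The result is mathematically unchanged because the factor $e^{-\max_j z_j}$ cancels between numerator and denominator. The sketch also shows that `np.argmax` returns index 0 for a column that is entirely `nan`, which is one plausible explanation for the all-zero predictions in the training log at the end of this notebook. The function names `softmax_naive` and `softmax_stable` are illustrative only.

```python
import numpy as np

def softmax_naive(Z):
    # Same formula as softMax in this notebook, with the sum taken over each column
    return np.exp(Z) / np.sum(np.exp(Z), axis=0)

def softmax_stable(Z):
    # Shifting each column by its maximum leaves the result unchanged,
    # but keeps every exponent <= 0, so np.exp cannot overflow
    expZ = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return expZ / np.sum(expZ, axis=0, keepdims=True)

# Two examples (columns), three classes (rows)
Z_small = np.array([[1.0, 0.0],
                    [2.0, 0.0],
                    [3.0, 0.0]])
Z_large = np.array([[1000.0, -5.0],
                    [2000.0, 10.0],
                    [3000.0, 0.0]])

print(np.allclose(softmax_naive(Z_small), softmax_stable(Z_small)))  # True
print(softmax_stable(Z_large).sum(axis=0))                           # [1. 1.]

with np.errstate(over="ignore", invalid="ignore"):
    naive = softmax_naive(Z_large)          # first column becomes inf / inf
print(np.isnan(naive[:, 0]).any())          # True
print(np.argmax(np.full((3, 1), np.nan), axis=0))  # [0]: an all-NaN column "predicts" class 0
```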





In [None]:
def init_params():
    W1 = np.random.rand(10, 784) - 0.5
    b1 = np.random.rand(10, 1) - 0.5
    W2 = np.random.rand(10, 10) - 0.5
    b2 = np.random.rand(10, 1) - 0.5
    return W1, b1, W2, b2

def ReLU(Z):
    return np.maximum(0, Z)

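def softMax(Z):
    # Subtract each column's max before exponentiating to avoid overflow in np.exp;
    # the result is unchanged because the shift cancels in the ratio
    expZ = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return expZ / np.sum(expZ, axis=0, keepdims=True)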

def forward_prop(W1, b1, W2, b2, X):
    Z1 = W1.dot(X) + b1           # Z1 has shape (n_hidden, m)
    A1 = ReLU(Z1)                     # A1 has shape (n_hidden, m)
    Z2 = W2.dot(A1) + b2          # Z2 has shape (n_output, m)
    A2 = softMax(Z2)                  # A2 has shape (n_output, m)
    return Z1, A1, Z2, A2

### Backward Propagation

In backward propagation we measure the difference between our prediction and the target, work out how much each weight and bias contributed to that error, and then adjust the parameters accordingly.

`Basically, we go through the network in the opposite direction: from the output layer back to the input.`

The quantities we compute are:


*   $dZ^{[2]} = A^{[2]} - Y$ <br><br>
$dZ^{[2]}$ is the difference between our prediction and the actual target $Y$. To make this subtraction possible, we one-hot encode the labels into a matrix with one column per example and a 1 in the row of the correct digit.<br><br>

*   $dW^{[2]} = \frac{1}{m} dZ^{[2]} A^{[1]T}$<br><br>
$dW^{[2]}$ is the gradient of the loss with respect to the second-layer weights $W^{[2]}$. It measures how much a small change in those weights would change the overall error of the network.<br><br>

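*   $db^{[2]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[2](i)}$, where the sum runs over the $m$ examples (the columns of $dZ^{[2]}$), so $db^{[2]}$ has the same shape as $b^{[2]}$<br><br>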


*   $dZ^{[1]} = W^{[2]T} dZ^{[2]} \odot \text{ReLU}'(Z^{[1]})$<br><br>

Here $\text{ReLU}'$ is the derivative of ReLU: it is 1 where $Z^{[1]} > 0$ and 0 elsewhere, and $\odot$ denotes element-wise multiplication. $dZ^{[1]}$ is the error at the first (hidden) layer.<br><br>


*   $dW^{[1]} = \frac{1}{m} dZ^{[1]} X^T$<br><br>
*   $db^{[1]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[1](i)}$ (again summed over the examples, giving the shape of $b^{[1]}$)

After this, we update the parameters with gradient descent.
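One way to check the backward-propagation formulas is to compare them with a finite-difference estimate of the gradient. The sketch below assumes the loss is the average cross-entropy $J = -\frac{1}{m}\sum_{i}\sum_{k} Y_{ki}\log A^{[2]}_{ki}$, which is the loss for which these formulas (with the $\frac{1}{m}$ factor applied in $dW$ and $db$) are exact gradients. It uses a hypothetical tiny network (4 inputs, 3 hidden units, 3 classes, 6 examples) so the check runs quickly, and it sums the bias gradients across examples (per row) so that `db1` and `db2` keep the shapes of `b1` and `b2`. The helper names `forward`, `loss`, `backward` and `numerical_grad` are illustrative, not part of this notebook.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical tiny problem: 4 input features, 3 hidden units, 3 classes, 6 examples
n_in, n_hid, n_out, m = 4, 3, 3, 6
X = rng.random((n_in, m))
Y = np.eye(n_out)[:, rng.integers(0, n_out, size=m)]   # one-hot labels, shape (n_out, m)
params = {
    "W1": rng.standard_normal((n_hid, n_in)),
    "b1": rng.standard_normal((n_hid, 1)),
    "W2": rng.standard_normal((n_out, n_hid)),
    "b2": rng.standard_normal((n_out, 1)),
}

def forward(p):
    Z1 = p["W1"] @ X + p["b1"]
    A1 = np.maximum(0, Z1)                               # ReLU
    Z2 = p["W2"] @ A1 + p["b2"]
    E = np.exp(Z2 - Z2.max(axis=0, keepdims=True))       # stable softmax
    A2 = E / E.sum(axis=0, keepdims=True)
    return Z1, A1, A2

def loss(p):
    # Average cross-entropy over the m examples (assumed loss)
    _, _, A2 = forward(p)
    return -np.sum(Y * np.log(A2)) / m

def backward(p):
    # The backward-propagation formulas, with bias sums taken per row (over examples)
    Z1, A1, A2 = forward(p)
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (p["W2"].T @ dZ2) * (Z1 > 0)                   # element-wise product with ReLU'
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

def numerical_grad(p, name, eps=1e-6):
    # Central differences: perturb one entry at a time, then restore it
    grad = np.zeros_like(p[name])
    for idx in np.ndindex(p[name].shape):
        old = p[name][idx]
        p[name][idx] = old + eps
        plus = loss(p)
        p[name][idx] = old - eps
        minus = loss(p)
        p[name][idx] = old
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

analytic = backward(params)
for name in ["W1", "b1", "W2", "b2"]:
    err = np.max(np.abs(analytic[name] - numerical_grad(params, name)))
    print(name, err)   # expected to be far below 1e-6 if the formulas are right
```

Replacing the per-row sums with a plain `np.sum(dZ2)` would turn `db2` into a single scalar, and the comparison for `b1` and `b2` would then fail.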





In [None]:
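def one_hot(Y, num_classes=10):
    # Build a (num_classes, m) matrix with a 1 in the row of each example's label;
    # a fixed num_classes avoids a wrong shape if the largest digit is missing from Y
    one_hot_Y = np.zeros((Y.size, num_classes))
    one_hot_Y[np.arange(Y.size), Y] = 1
    one_hot_Y = one_hot_Y.T
    return one_hot_Y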

def der_ReLU(Z):
    return Z > 0

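def backward_prop(Z1, A1, Z2, A2, W1, W2, X, Y):
    m = Y.size                      # examples in this batch (the global m counts every row, including the dev set)
    one_hot_Y = one_hot(Y)
    dZ2 = A2 - one_hot_Y
    dW2 = 1 / m * dZ2.dot(A1.T)
    db2 = 1 / m * np.sum(dZ2, axis=1, keepdims=True)   # sum over examples -> shape (10, 1), like b2
    dZ1 = W2.T.dot(dZ2) * der_ReLU(Z1)
    dW1 = 1 / m * dZ1.dot(X.T)
    db1 = 1 / m * np.sum(dZ1, axis=1, keepdims=True)   # shape (10, 1), like b1
    return dW1, db1, dW2, db2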

def update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha):
    W1 = W1 - alpha * dW1
    b1 = b1 - alpha * db1
    W2 = W2 - alpha * dW2
    b2 = b2 - alpha * db2
    return W1, b1, W2, b2

$W^{[1]}$ = $W^{[1]}$ - α*$dW^{[1]}$<br>
$W^{[2]}$ = $W^{[2]}$ - α*$dW^{[2]}$<br>
$b^{[1]}$ = $b^{[1]}$ - α*$db^{[1]}$<br>
$b^{[2]}$ = $b^{[2]}$ - α*$db^{[2]}$<br>

Here `α` is the learning rate, a hyperparameter we choose: it sets the size of each step taken against the gradient.
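To see what the learning rate does, here is a hypothetical one-parameter example rather than the network itself: minimising $f(w) = w^2$, whose gradient is $2w$, with the same update rule $w \leftarrow w - \alpha \, dw$. The function name `gradient_descent_1d` is illustrative only.

```python
def gradient_descent_1d(alpha, steps=20, w=1.0):
    # Minimise f(w) = w**2 (gradient 2*w) with the update rule w <- w - alpha * dw
    for _ in range(steps):
        dw = 2 * w
        w = w - alpha * dw
    return w

print(gradient_descent_1d(alpha=0.1))   # about 0.0115: each step multiplies w by 0.8
print(gradient_descent_1d(alpha=1.1))   # about 38.3: each step multiplies w by -1.2, so it diverges
```

For this function each step multiplies $w$ by $1 - 2\alpha$, so the iteration shrinks toward the minimum only when $0 < \alpha < 1$ and grows without bound beyond that. The network has no such simple closed-form bound, so in practice α is chosen by trying a few values.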

In [None]:
def get_preds(A2):
    return np.argmax(A2, 0)

def get_accuracy(preds, Y):
    print(preds, Y)
    return np.sum(preds == Y) / Y.size

def gradient_descent(X, Y, alpha, iterations):
    W1, b1, W2, b2 = init_params()
    for i in range(iterations):
        Z1, A1, Z2, A2 = forward_prop(W1, b1, W2, b2, X)
        dW1, db1, dW2, db2 = backward_prop(Z1, A1, Z2, A2, W1, W2, X, Y)
        W1, b1, W2, b2 = update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha)
        if i % 10 == 0:
            print("iteration:", i)
            print("Accuracy", get_accuracy(get_preds(A2), Y))
    return W1, b1, W2, b2

In [None]:
W1, b1, W2, b2 = gradient_descent(X_train, Y_train, 0.10, 500)


iteration: 0
[7 0 2 ... 2 0 2] [9 6 3 ... 6 2 6]
Accuracy 0.12197560975609756
iteration: 10
[0 0 0 ... 0 0 0] [9 6 3 ... 6 2 6]
Accuracy 0.09875609756097561
iteration: 20
[0 0 0 ... 0 0 0] [9 6 3 ... 6 2 6]
Accuracy 0.09875609756097561
iteration: 30
[0 0 0 ... 0 0 0] [9 6 3 ... 6 2 6]
Accuracy 0.09875609756097561
iteration: 40
[0 0 0 ... 0 0 0] [9 6 3 ... 6 2 6]
Accuracy 0.09875609756097561
