
An attempt at classifying 28×28-pixel handwritten digits using only NumPy. TensorFlow and Keras are not used, apart from loading the MNIST dataset. The guide I followed is
https://towardsdatascience.com/mnist-handwritten-digits-classification-from-scratch-using-python-numpy-b08e401c4dab

I calculated the derivatives of the loss with respect to the weights, $\frac{\partial L}{\partial w_i}$, by hand for practice. The steps are as follows:

1. $\frac{\partial L}{\partial \hat{y}}=\frac{\hat{y}-y}{\hat{y}(1-\hat{y})}$

2. $\frac{\partial \hat{y}}{\partial z}=\sigma(z)(1-\sigma(z))=\hat{y}(1-\hat{y})$
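
By the chain rule, the two factors above multiply to the prediction minus the label, $\hat{y}-y$. The sketch below checks this numerically. It assumes the loss is binary cross-entropy on a sigmoid output, which is the loss whose derivative matches step 1; the names `bce`, `z`, and `eps` are illustrative and not part of the notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(z, y):
    # Binary cross-entropy of the sigmoid output (assumed loss)
    y_hat = sigmoid(z)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

z, y = 0.3, 1.0
y_hat = sigmoid(z)

# Steps 1 and 2 multiplied together (chain rule)
step1 = (y_hat - y) / (y_hat * (1 - y_hat))
step2 = y_hat * (1 - y_hat)
analytic = step1 * step2

# Central finite difference of the loss with respect to z
eps = 1e-6
numeric = (bce(z + eps, y) - bce(z - eps, y)) / (2 * eps)

print(analytic, y_hat - y, numeric)
```

The three printed values should agree to several decimal places.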

In [108]:
from keras.datasets import mnist
import numpy as np
from matplotlib import pyplot as plt
import keras

In [109]:
(X,y), (X_test, y_test) = mnist.load_data()
X.shape

(60000, 28, 28)
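
The first weight matrix has 28*28 rows, so each image has to be flattened into a 784-element row before the forward pass. One way to do this is sketched below on a synthetic stand-in array; `images` and `flat` are illustrative names, and the random pixels are not MNIST data.

```python
import numpy as np

# Stand-in for a few MNIST images: shape (n, 28, 28), values 0..255
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# Flatten each image into one row of 784 pixels
flat = images.reshape(-1, 28 * 28)

print(flat.shape)
```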

In [110]:
def init(x,y):
    #initialize weight matrices for the 1st and 2nd layer
    layer = np.random.uniform(-1,1, size=(x,y))/np.sqrt(x*y)
    return layer.astype(np.float32)

l1 = init(28*28, 128)
l2 = init(128, 10)

In [111]:
sigmoid = lambda z: 1 / (1+np.e**-z)
dsigmoid = lambda z: sigmoid(z)*(1-sigmoid(z))

In [112]:
#softmax e^z/sum_1_k(e^z); the max is subtracted first for numerical stability
def softmax(x):
    #normalize over the class axis (last axis) so that each row sums to 1
    exp_element=np.exp(x-x.max(axis=-1,keepdims=True))
    return exp_element/np.sum(exp_element,axis=-1,keepdims=True)

#elementwise derivative of softmax, s*(1-s) (diagonal of the Jacobian)
def d_softmax(x):
    s=softmax(x)
    return s*(1-s)
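
For reference, the full derivative of softmax is a Jacobian matrix, and the elementwise expression `s*(1-s)` used in `d_softmax` is only its diagonal. A minimal sketch for a single vector follows; `softmax_vec` and `softmax_jacobian` are hypothetical helper names, not functions from the notebook or the guide.

```python
import numpy as np

def softmax_vec(z):
    # Subtracting the max leaves the result unchanged but avoids overflow
    e = np.exp(z - z.max())
    return e / e.sum()

def softmax_jacobian(z):
    # J[i, j] = s_i * (delta_ij - s_j)
    s = softmax_vec(z)
    return np.diag(s) - np.outer(s, s)

z = np.array([1.0, 2.0, 3.0])
s = softmax_vec(z)
J = softmax_jacobian(z)

# Compare the Jacobian's diagonal with the elementwise formula
print(np.allclose(np.diag(J), s * (1 - s)))
```

The off-diagonal entries, $-s_i s_j$, are the part the elementwise version leaves out.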

In [113]:
def forward_backward_pass(x,y):
    #1-hot encode labels
    targets = np.zeros((len(y), 10), np.float32)
    #set one entry per row: row i gets a 1 in column y[i]
    targets[range(targets.shape[0]), y] = 1

    #forward pass: input-->hidden_layer-->output
    #hidden layer activated with sigmoid,
    #output_layer activated(normalized) with softmax
    x_l1 = x@l1
    x_sigmoid = sigmoid(x_l1)
    x_l2 = x_sigmoid@l2
    out = softmax(x_l2)

    #backprop: calculate error y_hat-y, then the gradients for both weight matrices

    #derivative of the squared-error loss w/r to x_l2: d(loss)/d(out) * d(out)/d(x_l2)
    error = 2*(out-targets)/out.shape[0]*d_softmax(x_l2)
    #matrix multiply to get matrix of same dimensions as l2, which can be 
    #subtracted from actual l2 weights matrix to update its weights
    update_l2 = x_sigmoid.T@error

    #derivative of x_sigmoid, i.e. layer1, w/r to x_l1
    error = (l2@error.T).T*dsigmoid(x_l1)
    #same idea as above
    update_l1 = x.T@error

    return out, update_l1, update_l2
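
The shape bookkeeping in the backward pass can be checked on a tiny network. The sketch below mirrors the structure of `forward_backward_pass` with made-up sizes (4 inputs, 3 hidden units, 2 classes) and random data. The names `w1`, `w2`, `grad_w1`, `grad_w2` and the sizes are illustrative assumptions, and softmax is assumed to be taken per row.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_in, n_hidden, n_out = 5, 4, 3, 2

w1 = rng.uniform(-1, 1, size=(n_in, n_hidden))
w2 = rng.uniform(-1, 1, size=(n_hidden, n_out))
x = rng.normal(size=(batch, n_in))
labels = rng.integers(0, n_out, size=batch)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def softmax(z):
    # Row-wise softmax with the max subtracted for stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# One-hot targets, one row per sample
targets = np.zeros((batch, n_out))
targets[np.arange(batch), labels] = 1

# Forward pass: input --> hidden (sigmoid) --> output (softmax)
h = sigmoid(x @ w1)
out = softmax(h @ w2)

# Backward pass, same structure as the function above
err2 = 2 * (out - targets) / out.shape[0] * out * (1 - out)
grad_w2 = h.T @ err2                    # shape (n_hidden, n_out)
err1 = (w2 @ err2.T).T * h * (1 - h)
grad_w1 = x.T @ err1                    # shape (n_in, n_hidden)

print(grad_w1.shape == w1.shape, grad_w2.shape == w2.shape)
```

Because each gradient has the same shape as its weight matrix, it can be scaled by a learning rate and subtracted directly.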



In [114]:
ex = y[2].reshape(1,1)
ex_out = np.zeros((1, 10), np.float32)
ex_out[0,ex] = 1
ex_out

array([[0., 0., 0., 0., 1., 0., 0., 0., 0., 0.]], dtype=float32)
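
The same encoding extends to a whole batch of labels. A small sketch, using hand-written example labels rather than values read from the dataset; `labels` and `one_hot` are illustrative names.

```python
import numpy as np

labels = np.array([5, 0, 4])  # example labels, not read from the dataset
one_hot = np.zeros((len(labels), 10), np.float32)

# Pair each row index with its label so exactly one entry per row is set
one_hot[np.arange(len(labels)), labels] = 1

# argmax along each row recovers the original labels
print(one_hot.argmax(axis=1))
```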