# Neural Quest Assignment-1
*  In this assignment, we will build a classifier for MNIST from scratch using just [NumPy](https://numpy.org/)

*  [MNIST](http://yann.lecun.com/exdb/mnist/) dataset contains images of handwritten digits of size 28x28

*  The dataset that you are expected to use for training can be found [here](https://drive.google.com/file/d/1DF-OWSP803x34FrvaJ4XeDm_QZUevu32/view?usp=sharing)

*   Our model will have 1 hidden layer, like the one below (not our recommendation to use 256 in the hidden layer though, try various values out)

**Feel free to redefine any function signatures below, just make sure the final cell remains the same.**

<center>
<img src="https://user-images.githubusercontent.com/81357954/166119893-4ca347b8-b1a4-40b8-9e0a-2e92b5f164ae.png">
</center>

## Import libraries here
NumPy, Matplotlib, ...

Also remember to initialize the seed for reproducibility of results

In [18]:
import numpy as np
import math
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import pickle
from sklearn.preprocessing import MinMaxScaler
np.random.seed(0)

## Load *Dataset*
Load data from the given pickle file

In [24]:
# mount Google Drive to access the dataset
from google.colab import drive
drive.mount('/content/drive')

# load the data set
with open('/content/drive/MyDrive/train_data.pkl', 'rb') as fuk:
    dataset = pickle.load(fuk)

X = dataset['X']
y = dataset['y']

# normalize
MMS = MinMaxScaler()
X_trans = MMS.fit_transform(X)

# Split into X_train, y_train, X_test, y_test
# you can use stratified splitting from sklearn library

X_tr, X_te, y_tr, y_te = train_test_split(X_trans, y, test_size=0.2, random_state=666)


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# display a 4x4 grid,
grid = (4,4)
# choose 16 images randomly, display the images as well as corresponding labels
indices = np.random.choice(X_tr.shape[0], size=grid[0]*grid[1], replace=False)
images = X_tr[indices]
labels = y_tr[indices]

fig, axes = plt.subplots(grid[0], grid[1], figsize=(8, 8))
for i, ax in enumerate(axes.flat):
    ax.imshow(images[i].reshape(28, 28), cmap='gray')
    ax.set_axis_off()
    ax.set_title(f"Label : {labels[i]}")

fig.tight_layout

plt.show()

In [None]:
print(X_tr.shape, X_te.shape, y_tr.shape, y_te.shape)

In [None]:
X_tr = X_tr.T
y_tr = y_tr.T
X_te = X_te.T
y_te = y_te.T

## Building up parts of our classifier

**Activation functions**

In [None]:
def relu(z):
    return np.maximum(0,z)

def softmax(Z):
    A = np.exp(Z) / sum(np.exp(Z))
    return A

**Notes about the Neural Network**
*   Input size is (784,) because 28x28 = 784
*   Output size will be 10, each element represeting probability of the image representing that digit
*   Size of the hidden layer is a hyperparameter



**Initialize the layers weights**

Generally, we follow the convention that weights are drawn from a standard normal distribution, while the bias vectors are initialized to zero. But you can try everything out :)

In [None]:
def init_params(l2):
    W1 = np.random.rand(l2,784) -0.5
    W2 = np.random.rand(10,l2) - 0.5
    b1 = np.zeros((l2,1))
    b2 = np.zeros((10,1))
    return W1,b1,W2,b2

**Forward Propagation**

In [None]:
def forward_prop(W1,b1,W2,b2,X):
    Z1 = W1.dot(X) + b1
    A1 = relu(Z1)
    Z2 = W2.dot(Z1) + b2
    A2 = softmax(Z2)
    return Z1,A1,Z2,A2

**Backward Propagation**


You may use stochastic gradient descent or batch gradient descent here. Feel free to use any loss function.

In [None]:
def y_data(Y):
    one_hot_Y = np.zeros((Y.size, Y.max() + 1))
    one_hot_Y[np.arange(Y.size), Y] = 1
    one_hot_Y = one_hot_Y.T
    return one_hot_Y

def d_relu(Z):
    return Z>0

def back_prop(Z1,A1,Z2,A2,W2,X,Y):
    y= y_data(Y)
    dz2 = A2 - y
    dw2 = dz2.dot(A1.T)/Y.size
    db2 = np.sum(dz2)/Y.size
    dz1 = W2.T.dot(dz2)*d_relu(Z1)
    dw1 = dz1.dot(X.T)/Y.size
    db1 = np.sum(dz1)/Y.size
    return dw1,db1,dw2,db2

def update_params(w1,dw1,w2,dw2,b1,db1,b2,db2,a):
    w1 = w1 - a*dw1
    b1 = b1 - a*db1
    w2 = w2 - a*dw2
    b2 = b2 - a*db2
    return w1,b1,w2,b2

In [None]:
def cost_func(y_pred, y):
    epsilon = 1e-8
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    loss = -np.mean(y * np.log(y_pred))
    return loss


## Integrate everything

In [None]:
def pred(A2):
    return np.argmax(A2,axis=0)

def accuracy(predict,y):
    return (np.sum(predict==y)/y.size)

def train(X,Y,epoch,a,hid_dim):
    w1,b1,w2,b2 = init_params(hid_dim)
    for i in range(epoch):
        z1,a1,z2,a2 = forward_prop(w1,b1,w2,b2,X)
        dw1,db1,dw2,db2 = back_prop(z1,a1,z2,a2,w2,X,Y)
        w1,b1,w2,b2 = update_params(w1,dw1,w2,dw2,b1,db1,b2,db2,a)
        if i%50 ==0:
            print("Iteration: ",i)
            print("Loss: ",cost_func(pred(a2),Y.reshape(-1)))
    return w1,b1,w2,b2



In [17]:
w1,b1,w2,b2 = train(X_tr,y_tr,1001,0.05,256)

Iteration:  0
Loss:  29.56097121655411
Iteration:  50
Loss:  1.872769252513908
Iteration:  100
Loss:  1.3266728215886117
Iteration:  150
Loss:  1.1175213424218609
Iteration:  200
Loss:  1.026569231297971
Iteration:  250
Loss:  0.945594988904972
Iteration:  300
Loss:  0.8653882748758878
Iteration:  350
Loss:  0.8504214717795514
Iteration:  400
Loss:  0.8201041014049215
Iteration:  450
Loss:  0.7966944863055236
Iteration:  500
Loss:  0.7775062772076565
Iteration:  550
Loss:  0.7625394741113203
Iteration:  600
Loss:  0.7475726710149839
Iteration:  650
Loss:  0.7364435097382211
Iteration:  700
Loss:  0.7157202439125248
Iteration:  750
Loss:  0.7049748468177192
Iteration:  800
Loss:  0.6999859124522738
Iteration:  850
Loss:  0.6919268646311696
Iteration:  900
Loss:  0.6807977033544066
Iteration:  950
Loss:  0.6723548913513452
Iteration:  1000
Loss:  0.6681334853498144


In [20]:
def predict(X, w1,b1,w2,b2):
    _,_,_,a2 = forward_prop(w1,b1,w2,b2,X)
    return np.argmax(a2, axis=0)

In [21]:
def accuracy(predictions, y):
    num_correct = np.sum(predictions == y)
    accuracy = 100 * num_correct / y.shape[0]
    print(f"Accuracy: {accuracy}%")

In [22]:
prediction=predict(X_te,w1,b1,w2,b2)
accuracy(prediction,y_te.reshape(-1))

Accuracy: 86.78333333333333%


### Save as pickle

In [23]:
import pickle
import random
from google.colab import files

roll_num = "22B0973" # enter ldap
hidden_dim = 256 # replace with your own hidden dimension

model_dict = {
    'z': hidden_dim, # hidden dimension of your model
    'layer_0_wt': w1, # layer 0 weight (784, z)
    'layer_0_bias': b1, # layer 0 bias (z, 1)
    'layer_1_wt': w2, # layer 1 weight (z, 10)
    'layer_1_bias': b2 # layer 1 bias (10, 1)
}

assert model_dict['layer_0_wt'].shape == (hidden_dim, 784)
assert model_dict['layer_0_bias'].shape == (hidden_dim, 1)
assert model_dict['layer_1_wt'].shape == (10, hidden_dim)
assert model_dict['layer_1_bias'].shape == (10, 1)

predictions = predict(X_te,w1,b1,w2,b2 )
accuracy(predictions, y_te.reshape(-1))

with open(f'model_{roll_num}.pkl', 'wb') as f:
    pickle.dump(model_dict, f)
    files.download(f'model_{roll_num}.pkl')

Accuracy: 86.78333333333333%


<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>