# Neural Quest Assignment-1
*  In this assignment, we will build a classifier for MNIST from scratch using just [NumPy](https://numpy.org/)

*  [MNIST](http://yann.lecun.com/exdb/mnist/) dataset contains images of handwritten digits of size 28x28

*  The dataset that you are expected to use for training can be found [here](https://drive.google.com/file/d/1DF-OWSP803x34FrvaJ4XeDm_QZUevu32/view?usp=sharing)

*   Our model will have 1 hidden layer, like the one shown below (the 256 hidden units in the figure are not a recommendation, so try out various sizes)

**Feel free to redefine any function signatures below, just make sure the final cell remains the same.**

<center>
<img src="https://user-images.githubusercontent.com/81357954/166119893-4ca347b8-b1a4-40b8-9e0a-2e92b5f164ae.png">
</center>

## Import libraries here
NumPy, Matplotlib, ...

Also remember to initialize the seed for reproducibility of results

In [None]:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)  # Setting random seed for reproducibility

In [None]:
import os

# Get the current working directory
current_dir = os.getcwd()

# Join the current directory with the file name or relative file path
file_path = os.path.join(current_dir, 'train_data.pkl')

# Print the file path
print(file_path)


/content/train_data.pkl


## Load *Dataset*
Load data from the given pickle file

In [None]:
import pickle

# Path to the dataset file; this assumes train_data.pkl sits in the current
# working directory (e.g. /content on Colab, as printed by the cell above)
dataset_path = 'train_data.pkl'

# Load the dataset
with open(dataset_path, 'rb') as f:
    data = pickle.load(f)

# Access the data
X, y = data['X'], data['y']

# Scale pixel values from [0, 255] to [0, 1]
X = X / 255.0

# Split into train and test sets (stratified so both keep the same digit proportions)
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)



In [None]:
# Display a 4x4 grid of 16 randomly chosen training images with their labels
fig, axes = plt.subplots(4, 4, figsize=(8, 8))
idxs = np.random.choice(len(X_train), size=16, replace=False)  # 16 distinct images

for ax, idx in zip(axes.flat, idxs):
    ax.imshow(X_train[idx].reshape(28, 28), cmap='gray')
    ax.set_title(f"Label: {y_train[idx]}")
    ax.axis('off')

plt.tight_layout()
plt.show()




## Building up parts of our classifier

**Activation functions**

In [None]:
def relu(z):
    return np.maximum(0, z)

def softmax(z):
    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)
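A quick check of the two activations, as a minimal sketch (the test values are arbitrary): subtracting the row-wise maximum inside `softmax` keeps `np.exp` from overflowing on large logits, and because softmax is unchanged when a constant is added to every entry of a row, the two rows below should give the same probabilities.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    # Subtracting the row-wise max leaves the result unchanged but prevents overflow in np.exp
    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

# A naive np.exp(1000.0) would overflow to inf; the shifted version stays finite
z = np.array([[1000.0, 1001.0, 1002.0],
              [-1.0, 0.0, 1.0]])
p = softmax(z)
print(p.sum(axis=1))              # each row sums to 1
print(np.allclose(p[0], p[1]))    # the rows differ only by a constant shift
print(relu(np.array([-2.0, 0.0, 3.0])))
```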


**Notes about the Neural Network** 
*   Input size is (784,) because 28x28 = 784
*   Output size will be 10, each element representing the probability that the image shows that digit
*   Size of the hidden layer is a hyperparameter
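To make these sizes concrete, here is a sketch that pushes a batch of 5 random inputs through the same layer structure as `forward_prop` (one flattened image per row). The hidden size of 64 is an arbitrary example, not a recommendation.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, input_dim, hidden_dim, output_dim = 5, 784, 64, 10  # hidden_dim=64 is only an example

X = rng.random((batch_size, input_dim))           # 5 flattened 28x28 "images"
W1 = rng.standard_normal((input_dim, hidden_dim)) * 0.01
b1 = np.zeros((1, hidden_dim))
W2 = rng.standard_normal((hidden_dim, output_dim)) * 0.01
b2 = np.zeros((1, output_dim))

Z1 = X @ W1 + b1                  # (5, 784) @ (784, 64) + (1, 64) -> (5, 64)
A1 = np.maximum(0, Z1)            # ReLU keeps the shape (5, 64)
Z2 = A1 @ W2 + b2                 # (5, 64) @ (64, 10) + (1, 10) -> (5, 10)
exp_z = np.exp(Z2 - Z2.max(axis=1, keepdims=True))
A2 = exp_z / exp_z.sum(axis=1, keepdims=True)  # softmax: one probability row per image

print(Z1.shape, Z2.shape, A2.shape)
print(A2.argmax(axis=1))          # predicted digit for each of the 5 inputs
```

The biases broadcast across the batch dimension, which is why they are stored with shape `(1, hidden_dim)` and `(1, output_dim)`.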



**Initialize the layers weights**

A common convention is to draw the weights from a standard normal distribution, often scaled down by a small factor (such as the 0.01 used below), and to initialize the bias vectors to zero. But you can try everything out :)
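One alternative worth trying is He initialization, sketched below as a hypothetical `init_params_he` with the same dictionary layout as `init_params`. It scales the standard normal draws by `sqrt(2 / fan_in)`, which is designed to keep the variance of ReLU activations roughly stable from layer to layer; whether it helps on this assignment is something to check experimentally.

```python
import numpy as np

def init_params_he(input_dim, hidden_dim, output_dim, seed=None):
    """Hypothetical alternative to init_params using He initialization.

    Weights are drawn from N(0, 2 / fan_in); biases start at zero.
    """
    rng = np.random.default_rng(seed)
    return {
        'W1': rng.standard_normal((input_dim, hidden_dim)) * np.sqrt(2.0 / input_dim),
        'b1': np.zeros((1, hidden_dim)),
        'W2': rng.standard_normal((hidden_dim, output_dim)) * np.sqrt(2.0 / hidden_dim),
        'b2': np.zeros((1, output_dim)),
    }

params = init_params_he(784, 128, 10, seed=0)
print({k: v.shape for k, v in params.items()})
print(params['W1'].std())   # should be close to sqrt(2 / 784), about 0.0505
```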

In [None]:
def init_params(input_dim, hidden_dim, output_dim):
    params = {}

    params['W1'] = np.random.randn(input_dim, hidden_dim) * 0.01
    params['b1'] = np.zeros((1, hidden_dim))

    params['W2'] = np.random.randn(hidden_dim, output_dim) * 0.01
    params['b2'] = np.zeros((1, output_dim))

    return params


**Forward Propagation**

In [None]:
def forward_prop(X, params):
    Z1 = np.dot(X, params['W1']) + params['b1']
    A1 = relu(Z1)

    Z2 = np.dot(A1, params['W2']) + params['b2']
    A2 = softmax(Z2)

    return A2, {'Z1': Z1, 'A1': A1, 'Z2': Z2, 'A2': A2}


**Backward Propagation**


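You may use stochastic gradient descent or batch gradient descent here. Feel free to use any loss function, but note that `dZ2 = A2 - y` in `backward_prop` below is the gradient for softmax outputs combined with the mean cross-entropy loss of `cost_func` and one-hot labels; a different loss needs a different `dZ2`.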
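The two options differ only in how many rows feed each update: `batch_size = m` gives batch gradient descent, `batch_size = 1` gives stochastic gradient descent, and anything in between is mini-batch gradient descent. Below is a sketch of a shuffling batch iterator (the name `iterate_minibatches` is hypothetical); each `(X_batch, Y_batch)` pair would go through `forward_prop`, `backward_prop` and one parameter update.

```python
import numpy as np

def iterate_minibatches(X, Y, batch_size, rng):
    """Yield shuffled (X_batch, Y_batch) pairs that cover the dataset once (one epoch)."""
    idx = rng.permutation(X.shape[0])
    for start in range(0, X.shape[0], batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], Y[batch]

rng = np.random.default_rng(0)
X = np.arange(10 * 3, dtype=float).reshape(10, 3)   # 10 toy samples, 3 features
Y = np.eye(2)[np.arange(10) % 2]                     # toy one-hot labels
sizes = [xb.shape[0] for xb, _ in iterate_minibatches(X, Y, 4, rng)]
print(sizes)   # [4, 4, 2]: the last batch holds the leftover samples
```

Reshuffling every epoch keeps the batches from always grouping the same samples together.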

In [None]:
def backward_prop(X, y, params, cache):
    # y must be one-hot encoded, shape (m, 10)
    m = X.shape[0]
    # Gradient of the mean cross-entropy w.r.t. the softmax input Z2
    dZ2 = cache['A2'] - y
    dW2 = np.dot(cache['A1'].T, dZ2) / m
    db2 = np.sum(dZ2, axis=0, keepdims=True) / m
    # Backpropagate through W2, then through ReLU (derivative is 1 where Z1 > 0, else 0)
    dA1 = np.dot(dZ2, params['W2'].T)
    dZ1 = dA1 * (cache['Z1'] > 0)
    dW1 = np.dot(X.T, dZ1) / m
    db1 = np.sum(dZ1, axis=0, keepdims=True) / m
    return {'dW1': dW1, 'db1': db1, 'dW2': dW2, 'db2': db2}
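Hand-derived gradients are easy to get subtly wrong, so a numerical gradient check is a useful test. The sketch below repeats the forward and backward passes on a tiny made-up problem (4 samples, 5 features, 3 classes), estimates each gradient with central differences, and reports the relative error against the analytic gradient; values far below 1e-5 suggest the backward pass matches the loss. It assumes the mean cross-entropy loss and one-hot labels.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

def forward_prop(X, params):
    Z1 = X @ params['W1'] + params['b1']
    A1 = relu(Z1)
    Z2 = A1 @ params['W2'] + params['b2']
    A2 = softmax(Z2)
    return A2, {'Z1': Z1, 'A1': A1, 'Z2': Z2, 'A2': A2}

def backward_prop(X, Y, params, cache):
    m = X.shape[0]
    dZ2 = cache['A2'] - Y
    dZ1 = (dZ2 @ params['W2'].T) * (cache['Z1'] > 0)
    return {'dW1': X.T @ dZ1 / m, 'db1': dZ1.sum(axis=0, keepdims=True) / m,
            'dW2': cache['A1'].T @ dZ2 / m, 'db2': dZ2.sum(axis=0, keepdims=True) / m}

def loss(X, Y, params):
    A2, _ = forward_prop(X, params)
    return -np.sum(Y * np.log(A2)) / X.shape[0]

def numerical_grad(X, Y, params, key, eps=1e-6):
    """Central-difference estimate of d(loss)/d(params[key]), one entry at a time."""
    grad = np.zeros_like(params[key])
    for i in np.ndindex(params[key].shape):
        old = params[key][i]
        params[key][i] = old + eps
        plus = loss(X, Y, params)
        params[key][i] = old - eps
        minus = loss(X, Y, params)
        params[key][i] = old          # restore the original value
        grad[i] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5))                   # tiny problem so the check runs instantly
Y = np.eye(3)[rng.integers(0, 3, size=4)]         # one-hot labels for 3 classes
params = {'W1': rng.standard_normal((5, 6)), 'b1': rng.standard_normal((1, 6)) * 0.1,
          'W2': rng.standard_normal((6, 3)), 'b2': np.zeros((1, 3))}

_, cache = forward_prop(X, params)
grads = backward_prop(X, Y, params, cache)
errors = {}
for key in ['W1', 'b1', 'W2', 'b2']:
    num = numerical_grad(X, Y, params, key)
    ana = grads['d' + key]
    errors[key] = np.linalg.norm(num - ana) / (np.linalg.norm(num) + np.linalg.norm(ana) + 1e-12)
    print(key, f"relative error = {errors[key]:.2e}")
```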


In [None]:
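def cost_func(A2, y):
    # Mean cross-entropy; y is one-hot, shape (m, 10)
    m = y.shape[0]
    # Clip probabilities so np.log never sees exactly 0
    loss = -np.sum(np.multiply(y, np.log(np.clip(A2, 1e-12, 1.0)))) / m
    return loss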
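Because `cost_func` expects one-hot labels, integer labels need converting first. Here is a short worked example using a hypothetical `one_hot` helper and a clipped variant of `cost_func`: a network that outputs a uniform distribution over the 10 digits has loss ln(10) ≈ 2.30, so a freshly initialized network (whose small weights give nearly uniform softmax outputs) should report a first loss close to that value.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Turn integer labels of shape (m,) into a one-hot matrix of shape (m, num_classes)."""
    return np.eye(num_classes)[np.asarray(labels, dtype=int).ravel()]

def cost_func(A2, Y, eps=1e-12):
    # Mean categorical cross-entropy; clipping keeps np.log away from log(0) = -inf
    m = Y.shape[0]
    return -np.sum(Y * np.log(np.clip(A2, eps, 1.0))) / m

Y = one_hot([3, 7], num_classes=10)
uniform = np.full((2, 10), 0.1)              # a model that has learned nothing
print(cost_func(uniform, Y))                 # ln(10), about 2.3026
confident = 0.001 * np.ones((2, 10))
confident[0, 3] = confident[1, 7] = 0.991    # each row still sums to 1
print(cost_func(confident, Y))               # -ln(0.991), about 0.0090
```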



## Integrate everything

In [None]:
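def train(X, y, hidden_nodes, epochs=1000, lr=0.1):
    # y holds integer labels 0-9; convert them to one-hot rows for the loss and gradients
    Y = np.eye(10)[np.asarray(y, dtype=int).ravel()]
    weights = init_params(X.shape[1], hidden_nodes, 10)
    
    for i in range(epochs):
        y_pred, cache = forward_prop(X, weights)
        grads = backward_prop(X, Y, weights, cache)
        # Full-batch gradient descent update
        for key in ('W1', 'b1', 'W2', 'b2'):
            weights[key] -= lr * grads['d' + key]
        
        if i % 100 == 0:
            loss = cost_func(y_pred, Y)
            print(f"Epoch {i}: Loss = {loss:.4f}")
    
    return weights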
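Before running on all of MNIST, it can help to smoke-test the full-batch loop on data where the answer is obvious. The self-contained sketch below repeats the notebook's layer functions and trains on three synthetic 2-D Gaussian blobs; `num_classes`, `seed`, `lr=0.5`, and the returned loss history are assumptions for this toy setup, not tuned values for MNIST. The loss should fall from about ln(3) ≈ 1.10 and training accuracy should approach 1.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

def init_params(input_dim, hidden_dim, output_dim, rng):
    return {'W1': rng.standard_normal((input_dim, hidden_dim)) * 0.01, 'b1': np.zeros((1, hidden_dim)),
            'W2': rng.standard_normal((hidden_dim, output_dim)) * 0.01, 'b2': np.zeros((1, output_dim))}

def forward_prop(X, params):
    Z1 = X @ params['W1'] + params['b1']
    A1 = relu(Z1)
    A2 = softmax(A1 @ params['W2'] + params['b2'])
    return A2, {'Z1': Z1, 'A1': A1, 'A2': A2}

def backward_prop(X, Y, params, cache):
    m = X.shape[0]
    dZ2 = cache['A2'] - Y
    dZ1 = (dZ2 @ params['W2'].T) * (cache['Z1'] > 0)
    return {'dW1': X.T @ dZ1 / m, 'db1': dZ1.sum(axis=0, keepdims=True) / m,
            'dW2': cache['A1'].T @ dZ2 / m, 'db2': dZ2.sum(axis=0, keepdims=True) / m}

def cost_func(A2, Y):
    return -np.sum(Y * np.log(np.clip(A2, 1e-12, 1.0))) / Y.shape[0]

def train(X, y, hidden_nodes, epochs=1000, lr=0.5, num_classes=3, seed=0):
    rng = np.random.default_rng(seed)
    Y = np.eye(num_classes)[y]                     # integer labels -> one-hot
    params = init_params(X.shape[1], hidden_nodes, num_classes, rng)
    history = []
    for _ in range(epochs):
        A2, cache = forward_prop(X, params)
        history.append(cost_func(A2, Y))
        grads = backward_prop(X, Y, params, cache)
        for key in params:
            params[key] -= lr * grads['d' + key]   # plain full-batch gradient descent step
    return params, history

# Toy data: three well-separated blobs in 2-D standing in for MNIST
rng = np.random.default_rng(1)
centers = np.array([[0.0, 1.0], [1.0, -0.7], [-1.0, -0.7]])
y = np.repeat(np.arange(3), 50)
X = centers[y] + rng.standard_normal((150, 2)) * 0.2

params, history = train(X, y, hidden_nodes=16)
pred = np.argmax(forward_prop(X, params)[0], axis=1)
print(f"loss {history[0]:.3f} -> {history[-1]:.3f}, train accuracy {np.mean(pred == y):.2f}")
```

If the loss does not drop here, the bug is in the loop or the gradients rather than in the data.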

In [None]:
def predict(X, weights):
    # Return the most probable digit for each row of X
    y_pred, _ = forward_prop(X, weights)
    return np.argmax(y_pred, axis=1)

In [None]:

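def accuracy(y_true, y_pred):
    # ravel() guards against (m, 1) label arrays broadcasting against (m,) predictions
    return np.mean(np.ravel(y_true) == np.ravel(y_pred))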
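The final cell fills `model_dict` with random placeholder weights. A hypothetical helper for dropping trained parameters into that layout is sketched below; it assumes the `params` keys produced by `init_params` and the shapes stated in the final cell's comments, where the biases are column vectors `(z, 1)` while `init_params` stores them as rows `(1, z)`.

```python
import numpy as np

def to_model_dict(params):
    """Hypothetical helper: map the params dict used in this notebook onto the submission layout."""
    hidden_dim = params['W1'].shape[1]
    return {
        'z': hidden_dim,
        'layer_0_wt': params['W1'],          # (784, z)
        'layer_0_bias': params['b1'].T,      # (1, z) -> (z, 1)
        'layer_1_wt': params['W2'],          # (z, 10)
        'layer_1_bias': params['b2'].T,      # (1, 10) -> (10, 1)
    }

# Example with freshly initialized parameters; in practice pass the output of train(...)
rng = np.random.default_rng(0)
params = {'W1': rng.standard_normal((784, 128)) * 0.01, 'b1': np.zeros((1, 128)),
          'W2': rng.standard_normal((128, 10)) * 0.01, 'b2': np.zeros((1, 10))}
model_dict = to_model_dict(params)
print({k: getattr(v, 'shape', v) for k, v in model_dict.items()})
```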

### Save as pickle

In [None]:
import pickle
import random
from google.colab import files

roll_num = "22b0654" # enter your ldap
hidden_dim = 128 # replace with your own hidden dimension

model_dict = {
    'z': hidden_dim, # hidden dimension of your model
    'layer_0_wt': np.random.randn(784, hidden_dim), # layer 0 weight (784, z)
    'layer_0_bias': np.zeros((hidden_dim, 1)), # layer 0 bias (z, 1)
    'layer_1_wt': np.random.randn(hidden_dim, 10), # layer 1 weight (z, 10)
    'layer_1_bias': np.zeros((10, 1)) # layer 1 bias (10, 1)
}

assert model_dict['layer_0_wt'].shape == (784, hidden_dim)
assert model_dict['layer_0_bias'].shape == (hidden_dim, 1)
assert model_dict['layer_1_wt'].shape == (hidden_dim, 10)
assert model_dict['layer_1_bias'].shape == (10, 1)

with open(f'model_{roll_num}.pkl', 'wb') as f:
    pickle.dump(model_dict, f)

# Download only after the with-block has closed (and flushed) the file
files.download(f'model_{roll_num}.pkl') # download the file from the Colab session for submission

