# Neural Networks Implementation from Scratch

In this notebook, we implement a simple 2-layer multi-layer perceptron (MLP) from scratch, using only NumPy for the model itself. We focus on the core components: parameter initialization, forward propagation, backpropagation, and gradient-descent parameter updates.

## Network Architecture
- **Input Layer**: 784 neurons (for 28x28 images)
- **Hidden Layer**: 64 neurons with ReLU activation
- **Output Layer**: 10 neurons with Softmax activation (for 10 classes)
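
As a quick sanity check on the architecture above, the following sketch counts the trainable parameters. It assumes each layer has a weight matrix of shape (outputs, inputs) plus a bias column vector; the helper name `count_params` is hypothetical and not part of the notebook's code.

```python
def count_params(layer_sizes):
    """Count weights and biases of a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_out * n_in  # weight matrix of shape (n_out, n_in)
        total += n_out         # bias vector of shape (n_out, 1)
    return total

n_params = count_params([784, 64, 10])
print(n_params)  # 50890
```

Almost all of these parameters (784 × 64 = 50,176) sit in the first weight matrix.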

## Mathematical Equations
### Forward Propagation
1. $Z_1 = W_1 X + b_1$
2. $A_1 = \text{ReLU}(Z_1)$
3. $Z_2 = W_2 A_1 + b_2$
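4. $A_2 = \text{softmax}(Z_2)$, applied column-wise

Here $X$ has shape $(784, m)$ with one example per column, so each column of $A_2$ holds the 10 class probabilities for one example.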
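
The four steps above can be traced on a toy batch to confirm the shapes. This is a minimal sketch: the batch size `m = 5`, the random inputs, and the seed are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5                      # toy batch size (assumed)
X = rng.random((784, m))   # one example per column

W1 = rng.standard_normal((64, 784)) * 0.01
b1 = np.zeros((64, 1))
W2 = rng.standard_normal((10, 64)) * 0.01
b2 = np.zeros((10, 1))

Z1 = W1 @ X + b1           # (64, m); b1 broadcasts across columns
A1 = np.maximum(0, Z1)     # ReLU
Z2 = W2 @ A1 + b2          # (10, m)
# Softmax per column; subtracting the column max avoids overflow in exp
exp_Z2 = np.exp(Z2 - Z2.max(axis=0, keepdims=True))
A2 = exp_Z2 / exp_Z2.sum(axis=0, keepdims=True)

print(Z1.shape, A2.shape)                 # (64, 5) (10, 5)
print(np.allclose(A2.sum(axis=0), 1.0))   # True
```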

### Backpropagation
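1. $dZ_2 = A_2 - Y$ (assuming a softmax output with cross-entropy loss and one-hot labels $Y$)
2. $dW_2 = \frac{1}{m} dZ_2 A_1^T$
3. $db_2 = \frac{1}{m} \sum_{i=1}^{m} dZ_2^{(i)}$ (sum over the $m$ example columns)
4. $dZ_1 = (W_2^T dZ_2) \odot \text{ReLU}'(Z_1)$ (element-wise product)
5. $dW_1 = \frac{1}{m} dZ_1 X^T$
6. $db_1 = \frac{1}{m} \sum_{i=1}^{m} dZ_1^{(i)}$ (sum over the $m$ example columns)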
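
One way to gain confidence in these formulas is a numerical gradient check: compare the analytic gradients against central finite differences of the mean cross-entropy loss. The sketch below does this on a tiny network (4 inputs, 3 hidden units, 2 classes). The sizes, the seed, and all function names are assumptions made for illustration; none of this code belongs to the notebook's class.

```python
import numpy as np

def forward(params, X):
    W1, b1, W2, b2 = params
    Z1 = W1 @ X + b1
    A1 = np.maximum(0, Z1)
    Z2 = W2 @ A1 + b2
    E = np.exp(Z2 - Z2.max(axis=0, keepdims=True))
    return Z1, A1, E / E.sum(axis=0, keepdims=True)

def loss(params, X, Y):
    # Mean cross-entropy over the m example columns
    A2 = forward(params, X)[2]
    return -np.sum(Y * np.log(A2)) / X.shape[1]

def analytic_gradients(params, X, Y):
    # Direct transcription of the six backpropagation equations
    W1, b1, W2, b2 = params
    m = X.shape[1]
    Z1, A1, A2 = forward(params, X)
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (Z1 > 0)
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    return [dW1, db1, dW2, db2]

def numerical_gradients(params, X, Y, eps=1e-6):
    # Central differences, perturbing one parameter entry at a time
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            plus = loss(params, X, Y)
            p[idx] = old - eps
            minus = loss(params, X, Y)
            p[idx] = old  # restore the original value
            g[idx] = (plus - minus) / (2 * eps)
        grads.append(g)
    return grads

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 6))
Y = np.eye(2)[:, rng.integers(0, 2, size=6)]  # one-hot labels, shape (2, 6)
params = [rng.standard_normal((3, 4)), rng.standard_normal((3, 1)),
          rng.standard_normal((2, 3)), rng.standard_normal((2, 1))]

analytic = analytic_gradients(params, X, Y)
numeric = numerical_gradients(params, X, Y)
max_err = max(np.max(np.abs(a - n)) for a, n in zip(analytic, numeric))
print(max_err)  # should be tiny if the formulas match the loss
```

A large discrepancy would point to a sign, transpose, or scaling mistake in one of the equations. The check can be unreliable if some entry of $Z_1$ lies extremely close to zero, where ReLU is not differentiable.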

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
%matplotlib inline

## Neural Network Class Implementation

In [None]:
class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Small random Gaussian weights (scaled by 0.01); this is not Xavier/He scaling
        self.W1 = np.random.randn(hidden_size, input_size) * 0.01
        self.b1 = np.zeros((hidden_size, 1))
        self.W2 = np.random.randn(output_size, hidden_size) * 0.01
        self.b2 = np.zeros((output_size, 1))

    def relu(self, Z):
        return np.maximum(0, Z)

    def softmax(self, Z):
        exp_Z = np.exp(Z - np.max(Z, axis=0, keepdims=True))
        return exp_Z / np.sum(exp_Z, axis=0, keepdims=True)

    def relu_derivative(self, Z):
        return Z > 0

    def forward(self, X):
        self.Z1 = np.dot(self.W1, X) + self.b1
        self.A1 = self.relu(self.Z1)
        self.Z2 = np.dot(self.W2, self.A1) + self.b2
        self.A2 = self.softmax(self.Z2)
        return self.A2

    def backward(self, X, Y, A2, m):
        # Output layer: softmax + cross-entropy gives dZ2 = A2 - Y (Y is one-hot)
        dZ2 = A2 - Y
        dW2 = (1 / m) * np.dot(dZ2, self.A1.T)
        db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)

        # Hidden layer: backpropagate through W2, then mask element-wise by ReLU'(Z1)
        dZ1 = np.dot(self.W2.T, dZ2) * self.relu_derivative(self.Z1)
        dW1 = (1 / m) * np.dot(dZ1, X.T)
        db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)

        return dW1, db1, dW2, db2

    def update_params(self, dW1, db1, dW2, db2, alpha):
        self.W1 -= alpha * dW1
        self.b1 -= alpha * db1
        self.W2 -= alpha * dW2
        self.b2 -= alpha * db2

## Training and Evaluation Utilities

In [None]:
def one_hot(Y, classes):
    one_hot_Y = np.zeros((classes, Y.size))
    one_hot_Y[Y.astype(int), np.arange(Y.size)] = 1
    return one_hot_Y

def get_predictions(A2):
    return np.argmax(A2, axis=0)

def get_accuracy(predictions, Y):
    return np.sum(predictions == Y) / Y.size

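def train(X, Y, iterations, alpha, hidden_size):
    # X: (input_size, m) with one example per column; Y: (m,) class labels
    Y = np.asarray(Y).astype(int)  # labels loaded via fetch_openml can be strings
    input_size = X.shape[0]
    output_size = len(np.unique(Y))  # assumes labels are 0..K-1
    nn = NeuralNetwork(input_size, hidden_size, output_size)
    m = X.shape[1]
    Y_one_hot = one_hot(Y, output_size)

    for i in range(iterations):
        A2 = nn.forward(X)
        dW1, db1, dW2, db2 = nn.backward(X, Y_one_hot, A2, m)
        nn.update_params(dW1, db1, dW2, db2, alpha)
        if i % 100 == 0:
            # Training accuracy, measured before this iteration's update
            predictions = get_predictions(A2)
            print(f"Iteration {i}: accuracy = {get_accuracy(predictions, Y):.4f}")
    return nn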
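
The training loop reports accuracy only. Since the backpropagation equations assume a cross-entropy loss, it can also be useful to track that loss directly. The helper below is a minimal sketch: the name `cross_entropy` and the small `eps` guard against `log(0)` are assumptions, not part of the original code.

```python
import numpy as np

def cross_entropy(A2, Y_one_hot, eps=1e-12):
    """Mean cross-entropy; rows are classes, columns are examples."""
    m = Y_one_hot.shape[1]
    return -np.sum(Y_one_hot * np.log(A2 + eps)) / m

# Toy check: 3 classes, 2 examples
A2 = np.array([[0.7, 0.1],
               [0.2, 0.8],
               [0.1, 0.1]])
Y_one_hot = np.array([[1, 0],
                      [0, 1],
                      [0, 0]])
loss_value = cross_entropy(A2, Y_one_hot)
print(loss_value)  # roughly 0.29
```

Inside `train`, a call such as `cross_entropy(A2, Y_one_hot)` next to the accuracy printout would show whether the loss is decreasing even when accuracy plateaus.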