<a href="https://colab.research.google.com/github/Rohitd922/Coding-Neural-Networks-from-Scratch/blob/master/Neural_Network_from_Scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **This notebook implements a multi-layer perceptron (MLP) from scratch using numpy, with Leaky ReLU activation for hidden layers and a linear output layer.**

### **1. Constants Explanation**
- `LEARNING_RATE = 0.1`: Controls how much the weights are adjusted during training.
- `ALPHA = 0.01`: The slope of **Leaky ReLU** when $z \leq 0$.
- `NUM_SAMPLES = 3`: We have **3 training examples**.


In [31]:
import numpy as np

# Constants
LEARNING_RATE = 0.1  # Controls how much weights are adjusted during training
ALPHA = 0.01  # Leaky ReLU coefficient (slope for negative values)
NUM_SAMPLES = 3  # Number of training examples

### **2. Training Data Explanation**
- `x_train.shape = (1,3)`:  
  - **1 feature (input dimension)**.
  - **3 samples (columns represent different training samples)**.
- `y_train.shape = (1,3)`:  
  - **1 output per sample**.

### **Why Use This Shape?**
- Ensures **correct matrix multiplication** with weight matrices during forward propagation.


In [32]:
# Training Data (features x samples)
x_train = np.array([[2], [-1], [3]]).T  # Shape (1, 3) - 1 feature, 3 samples
y_train = np.array([[3], [1], [5]]).T  # Shape (1, 3) - 1 output per sample
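
As a quick check of the `(features, samples)` convention, the sketch below multiplies a hypothetical weight matrix `W` (4 neurons, 1 input feature; not part of the notebook) by `x_train`. The result keeps one column per sample, and the bias broadcasts across those columns.

```python
import numpy as np

x_train = np.array([[2], [-1], [3]]).T   # shape (1, 3): features x samples

# Hypothetical layer parameters, chosen only to illustrate the shapes
W = np.ones((4, 1))    # (neurons in next layer, neurons in current layer)
b = np.zeros((4, 1))   # one bias per neuron, broadcast over the 3 samples

Z = np.dot(W, x_train) + b

print(x_train.shape)  # (1, 3)
print(Z.shape)        # (4, 3)
```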


### **3. Activation Functions**
- **Leaky ReLU Function**:
  - If $ z > 0 $ → return $ z $ (linear behavior).
  - If $ z \leq 0 $ → return $ ALPHA \times z $ (small slope instead of 0).
  
  **Because the slope is never exactly zero, negative inputs still pass a small gradient, which avoids "dead" units that stop learning.**

- **Leaky ReLU Derivative**:
  - If $ z > 0 $ → derivative = **1**.
  - If $ z \leq 0 $ → derivative = **ALPHA** (small positive value).


In [33]:
# Leaky ReLU Activation Function
def leaky_relu(z):
    """Applies the Leaky ReLU activation function element-wise.

    Args:
        z: Input value or array.

    Returns:
        The result of applying Leaky ReLU to the input.
    """
    return np.where(z > 0, z, ALPHA * z)  # If z > 0, return z; else, return ALPHA * z

# Derivative of Leaky ReLU
def leaky_relu_derivative(z):
    """Calculates the derivative of the Leaky ReLU function element-wise.

    Args:
        z: Input value or array.

    Returns:
        The derivative of Leaky ReLU at the input.
    """
    return np.where(z > 0, 1, ALPHA)  # If z > 0, derivative is 1; else, it's ALPHA
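
A small usage sketch may help here. It repeats the two functions so the block runs on its own, then evaluates them on a hypothetical input array containing a negative value, zero, and a positive value.

```python
import numpy as np

ALPHA = 0.01  # same slope as the notebook constant

def leaky_relu(z):
    return np.where(z > 0, z, ALPHA * z)

def leaky_relu_derivative(z):
    return np.where(z > 0, 1, ALPHA)

z = np.array([-2.0, 0.0, 3.0])       # hypothetical pre-activations
out = leaky_relu(z)                  # [-0.02, 0.0, 3.0]
grad = leaky_relu_derivative(z)      # [0.01, 0.01, 1.0]

print(out)
print(grad)
```

Note that `z = 0` falls into the `z <= 0` branch, so its derivative is `ALPHA`, as stated above.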


### **4. Neural Network Class**
- This `NeuralNetwork` class initializes and trains a **multi-layer perceptron (MLP)**.
- The **layer sizes** are passed as a list.  
  - Example: `[1, 4, 3, 1]` means:
    - **1 input neuron**.
    - **4 neurons in the first hidden layer**.
    - **3 neurons in the second hidden layer**.
    - **1 output neuron**.

### **Weight & Bias Initialization**
1. **Weight Initialization**:
   - Each **weight matrix** has shape `(neurons in next layer, neurons in current layer)`.
   - **Small random values** (standard normal draws scaled by `0.1`) break the symmetry between neurons and keep the initial activations small.
2. **Bias Initialization**:
   - Each **bias vector** has shape `(neurons in next layer, 1)`.
   - Biases start at **zero**.
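
The shape rules above can be sketched for the example `[1, 4, 3, 1]` network. The seeded `np.random.default_rng(0)` generator is an assumption made for reproducibility; the notebook itself calls `np.random.randn`.

```python
import numpy as np

layer_sizes = [1, 4, 3, 1]
rng = np.random.default_rng(0)  # assumption: seeded generator, not used in the notebook

num_layers = len(layer_sizes) - 1  # one weight matrix per pair of adjacent layers

# Weight matrix i maps layer i (columns) to layer i+1 (rows)
weights = [rng.standard_normal((layer_sizes[i + 1], layer_sizes[i])) * 0.1
           for i in range(num_layers)]
# One zero-initialized bias per neuron in the receiving layer
biases = [np.zeros((layer_sizes[i + 1], 1)) for i in range(num_layers)]

weight_shapes = [W.shape for W in weights]
bias_shapes = [b.shape for b in biases]

print(weight_shapes)  # [(4, 1), (3, 4), (1, 3)]
print(bias_shapes)    # [(4, 1), (3, 1), (1, 1)]
```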


### **5. Forward Propagation**
1. **Stores input features** as the first activation.
2. **Loop through hidden layers**:
   - Computes: $ Z = W \cdot A + B $
   - Applies **Leaky ReLU** activation for hidden layers.
3. **Output Layer (No activation function)**:
   - Linear activation (for regression).
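
The steps above can be traced by hand on a tiny network. This is a minimal sketch with one hidden layer of two neurons and hand-picked weights (`W1`, `W2`, `b1`, `b2` are hypothetical values, not the notebook's trained parameters).

```python
import numpy as np

ALPHA = 0.01

def leaky_relu(z):
    return np.where(z > 0, z, ALPHA * z)

X = np.array([[2.0, -1.0, 3.0]])     # (1, 3): 1 feature, 3 samples

# Hypothetical parameters for a [1, 2, 1] network
W1 = np.array([[1.0], [-1.0]])       # (2, 1)
b1 = np.zeros((2, 1))
W2 = np.array([[1.0, 1.0]])          # (1, 2)
b2 = np.zeros((1, 1))

Z1 = W1 @ X + b1       # [[ 2, -1,  3], [-2,  1, -3]]
A1 = leaky_relu(Z1)    # [[ 2, -0.01, 3], [-0.02, 1, -0.03]]
Z2 = W2 @ A1 + b2      # linear output: [[1.98, 0.99, 2.97]]

print(Z2)
```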


### **6. Compute Loss (Mean Squared Error)**
- Uses **Mean Squared Error (MSE)**, halved and averaged over the $ N $ training samples:
  $$
  Loss = \frac{1}{2N} \sum (y - \hat{y})^2
  $$
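
As a worked example, the snippet below evaluates the loss the same way `compute_loss` does in the notebook (the mean of the halved squared errors over the samples). The predictions in `y_hat` are hypothetical values picked to keep the arithmetic easy to follow.

```python
import numpy as np

y = np.array([[3.0, 1.0, 5.0]])       # targets from the notebook
y_hat = np.array([[2.0, 1.0, 4.0]])   # hypothetical predictions

# Errors are (1, 0, 1); halved squares are (0.5, 0, 0.5); their mean is 1/3
loss = np.mean(0.5 * (y - y_hat) ** 2)

print(loss)  # 0.3333...
```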


## **7. Backpropagation – Detailed Explanation**
Backpropagation is the process of computing the **gradients** of the loss function with respect to the weights and biases, and then **updating them using gradient descent**.

### **Goal:**  
- Adjust weights and biases to **minimize the loss function** by propagating errors backward through the network.

---

#### **Step 1: Compute Gradients for Output Layer**
For the output neuron, we use **Mean Squared Error (MSE)** as the loss function:

$$
L = \frac{1}{2N} \sum (y - a_2)^2
$$

where:
- $ N $ is the number of samples.
- $ y $ is the true label.
- $ a_2 $ (or $ \hat{y} $) is the predicted output.

#### **Derivative of Loss with Respect to Output Activation**
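For each sample, the gradient of the loss w.r.t. the output activation $ a_2 $ is given below. The $ \frac{1}{N} $ factor from the loss is applied later, when the parameter gradients are averaged over the samples: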

$$
\frac{dL}{da_2} = (a_2 - y)
$$

Since the **output layer activation is linear**, we also get:

$$
\frac{dL}{dz_2} = \frac{dL}{da_2} = (a_2 - y)
$$

Now, we compute the gradients for the **output layer parameters** $ w_2 $ and $ b_2 $:

$$
\frac{dL}{dw_2} = \frac{1}{N} \sum (a_2 - y) \cdot a_1
$$

$$
\frac{dL}{db_2} = \frac{1}{N} \sum (a_2 - y)
$$
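
The two output-layer formulas can be checked numerically. In this sketch, `a1` and `a2` are hypothetical activations for a single hidden neuron and a single output neuron; only `y` comes from the notebook.

```python
import numpy as np

a1 = np.array([[1.0, 2.0, 3.0]])   # hypothetical hidden activations, shape (1, 3)
a2 = np.array([[2.0, 1.0, 4.0]])   # hypothetical predictions, shape (1, 3)
y = np.array([[3.0, 1.0, 5.0]])    # targets from the notebook
N = y.shape[1]

dZ2 = a2 - y                                 # [[-1, 0, -1]] since the output is linear
dW2 = np.dot(dZ2, a1.T) / N                  # (-1*1 + 0*2 + -1*3) / 3 = -4/3
dB2 = np.mean(dZ2, axis=1, keepdims=True)    # (-1 + 0 - 1) / 3 = -2/3

print(dW2)  # [[-1.3333...]]
print(dB2)  # [[-0.6667...]]
```

The dot product with `a1.T` performs the sum over samples, so dividing by `N` gives the averaged weight gradient.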

---

#### **Step 2: Compute Gradients for Hidden Layer**
The hidden layer uses **Leaky ReLU activation**:

$$
a_1 = \text{LeakyReLU}(z_1) = \max(\alpha z_1, z_1)
$$

The gradient of loss w.r.t. hidden layer activation is:

$$
\frac{dL}{da_1} = \frac{dL}{dz_2} \cdot w_2
$$

Applying the **Leaky ReLU derivative**:

$$
\frac{dL}{dz_1} = \frac{dL}{da_1} \cdot \text{LeakyReLU}'(z_1)
$$

where:

$$
\text{LeakyReLU}'(z_1) =
\begin{cases}
1, & \text{if } z_1 > 0 \\
\alpha, & \text{if } z_1 \leq 0
\end{cases}
$$

Now, we compute the gradients for the **hidden layer parameters** $ w_1 $ and $ b_1 $:

$$
\frac{dL}{dw_1} = \frac{1}{N} \sum \frac{dL}{dz_1} \cdot x
$$

$$
\frac{dL}{db_1} = \frac{1}{N} \sum \frac{dL}{dz_1}
$$
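
One way to gain confidence in these hidden-layer formulas is a finite-difference check. The sketch below uses a scalar 1-1-1 network with hypothetical parameter values (`w1`, `b1`, `w2`, `b2`) and compares the analytic gradients with central differences of the loss $ L = \frac{1}{2N} \sum (y - a_2)^2 $.

```python
import numpy as np

ALPHA = 0.01

def leaky_relu(z):
    return np.where(z > 0, z, ALPHA * z)

def leaky_relu_derivative(z):
    return np.where(z > 0, 1.0, ALPHA)

x = np.array([[2.0, -1.0, 3.0]])
y = np.array([[3.0, 1.0, 5.0]])
N = x.shape[1]

def loss(w1, b1, w2, b2):
    a1 = leaky_relu(w1 * x + b1)
    a2 = w2 * a1 + b2              # linear output
    return np.sum((y - a2) ** 2) / (2 * N)

# Hypothetical scalar parameters (chosen so no z1 sits at the kink z = 0)
w1, b1, w2, b2 = 0.5, 0.1, -0.3, 0.2

# Analytic gradients, following Steps 1 and 2
z1 = w1 * x + b1
a1 = leaky_relu(z1)
a2 = w2 * a1 + b2
dz2 = a2 - y
dz1 = dz2 * w2 * leaky_relu_derivative(z1)
dw1 = np.sum(dz1 * x) / N
db1 = np.sum(dz1) / N

# Numerical gradients via central differences
eps = 1e-6
dw1_num = (loss(w1 + eps, b1, w2, b2) - loss(w1 - eps, b1, w2, b2)) / (2 * eps)
db1_num = (loss(w1, b1 + eps, w2, b2) - loss(w1, b1 - eps, w2, b2)) / (2 * eps)

print(dw1, dw1_num)
print(db1, db1_num)
```

If the two columns agree to several decimal places, the backpropagation formulas and the loss definition are consistent with each other.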

---

#### **Step 3: Update Weights & Biases Using Gradient Descent**
Once we have the gradients, we update the weights and biases using **gradient descent**:

$$
w_1 = w_1 - \eta \cdot \frac{dL}{dw_1}
$$

$$
b_1 = b_1 - \eta \cdot \frac{dL}{db_1}
$$

$$
w_2 = w_2 - \eta \cdot \frac{dL}{dw_2}
$$

$$
b_2 = b_2 - \eta \cdot \frac{dL}{db_2}
$$

where:
- $ \eta $ (learning rate) determines the **step size** for the update.
- A **larger** $ \eta $ makes learning **faster** but can cause instability.
- A **smaller** $ \eta $ makes learning **slower** but more stable.
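
The effect of the learning rate can be illustrated on a toy problem. This sketch minimizes the hypothetical one-parameter loss $ L(w) = \frac{1}{2}(w - 3)^2 $, whose gradient is $ w - 3 $; the function `run_gd` and the two step sizes are illustrative choices, not part of the notebook.

```python
def run_gd(eta, steps=20, w=0.0):
    """Runs plain gradient descent on L(w) = 0.5 * (w - 3)^2 and records the loss."""
    losses = []
    for _ in range(steps):
        losses.append(0.5 * (w - 3.0) ** 2)
        grad = w - 3.0          # dL/dw
        w -= eta * grad         # same update rule as above
    return losses

small = run_gd(eta=0.1)   # each step shrinks the error by a factor of 0.9
large = run_gd(eta=2.5)   # each step multiplies the error by -1.5, so it grows

print(f"eta=0.1: first loss {small[0]:.3f}, last loss {small[-1]:.3f}")
print(f"eta=2.5: first loss {large[0]:.3f}, last loss {large[-1]:.3f}")
```

With the small step size the loss falls steadily, while the large step size overshoots the minimum on every update and the loss increases.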

---

### **Summary: What Happens in Backpropagation?**
1. **Compute loss derivative w.r.t output activation** $ a_2 $.
2. **Propagate gradient backward to compute gradients** for $ w_2 $ and $ b_2 $.
3. **Propagate further back** to compute gradients for $ w_1 $ and $ b_1 $ using Leaky ReLU.
4. **Update all parameters** using **gradient descent**.

Each update moves the weights and biases in the **direction that locally reduces the loss**, so the network's predictions improve over successive epochs. 🚀



### **8. Training Loop**
- Runs forward and backward pass for `epochs` iterations.
- Prints loss every 5 epochs.


In [34]:
# Neural Network Class
class NeuralNetwork:
    def __init__(self, layer_sizes):
        """Initializes the neural network.

        Args:
            layer_sizes: A list specifying the number of neurons in each layer.
                         Example: [1, 4, 3, 1] represents a network with:
                             - 1 input neuron
                             - 4 neurons in the first hidden layer
                             - 3 neurons in the second hidden layer
                             - 1 output neuron
        """
        self.num_layers = len(layer_sizes) - 1  # Number of weight matrices (layers - 1)
        self.weights = []  # List to store weight matrices
        self.biases = []  # List to store bias vectors

        for i in range(self.num_layers):
            # Initialize weights with small random values (breaks symmetry, keeps early activations small)
            self.weights.append(np.random.randn(layer_sizes[i+1], layer_sizes[i]) * 0.1)
            # Initialize biases to zero
            self.biases.append(np.zeros((layer_sizes[i+1], 1)))




    def forward_pass(self, X):
        """Performs the forward propagation step.

        Args:
            X: Input data (features x samples).

        Returns:
            activations: A list of activations for each layer.
            zs: A list of pre-activation values (Z = W*A + b) for each layer.
        """
        activations = [X]  # Input is the first activation
        zs = []  # Store pre-activation values

        for i in range(self.num_layers - 1):  # Loop through hidden layers
            Z = np.dot(self.weights[i], activations[-1]) + self.biases[i]  # Calculate pre-activation
            A = leaky_relu(Z)  # Apply Leaky ReLU activation
            zs.append(Z)
            activations.append(A)

        # Output layer (linear activation)
        Z_out = np.dot(self.weights[-1], activations[-1]) + self.biases[-1]
        zs.append(Z_out)
        activations.append(Z_out)

        return activations, zs




    def compute_loss(self, predictions):
        """Calculates the Mean Squared Error (MSE) loss.

        Args:
            predictions: The model's predictions.

        Returns:
            The MSE loss.
        """
        return np.mean(0.5 * (y_train - predictions) ** 2)




    def backward_pass(self, activations, zs):
        """Performs the backpropagation step to update weights and biases.

        Args:
            activations: A list of activations for each layer.
            zs: A list of pre-activation values for each layer.
        """
        dW = [None] * self.num_layers  # List to store weight gradients
        dB = [None] * self.num_layers  # List to store bias gradients

        # Gradient of the output (MSE derivative)
        dA = activations[-1] - y_train

        for i in reversed(range(self.num_layers)):  # Loop backwards through layers
            current_z = zs[i]  # Current layer's pre-activation

            if i == self.num_layers - 1:
                # Output layer has linear activation, derivative is 1
                dZ = dA
            else:
                # Hidden layers use Leaky ReLU derivative
                dZ = dA * leaky_relu_derivative(current_z)

            # Compute gradients
            dW[i] = np.dot(dZ, activations[i].T) / NUM_SAMPLES
            dB[i] = np.mean(dZ, axis=1, keepdims=True)

            # Propagate gradient to previous layer
            if i > 0:
                dA = np.dot(self.weights[i].T, dZ)

        # Update parameters (weights and biases) using gradient descent
        for i in range(self.num_layers):
            self.weights[i] -= LEARNING_RATE * dW[i]
            self.biases[i] -= LEARNING_RATE * dB[i]




    def train(self, epochs=50):
        """Trains the neural network for a specified number of epochs.

        Args:
            epochs: The number of training iterations.
        """
        for epoch in range(epochs):
            activations, zs = self.forward_pass(x_train)  # Forward pass
            loss = self.compute_loss(activations[-1])  # Calculate loss
            self.backward_pass(activations, zs)  # Backward pass (update parameters)

            if epoch % 5 == 0:
                print(f"Epoch {epoch} - Loss: {loss:.6f}")  # Print loss every 5 epochs


### **9. Train the Model**
- Initializes the network with `[1, 1, 1]` layers.
- Trains for `100` epochs.


In [39]:
# Define network structure and train the model
layer_sizes = [1, 1, 1]  # 1 input, 1 hidden, 1 output
nn = NeuralNetwork(layer_sizes)
nn.train(epochs=100)


Epoch 0 - Loss: 5.833616
Epoch 5 - Loss: 2.902166
Epoch 10 - Loss: 1.879783
Epoch 15 - Loss: 1.522089
Epoch 20 - Loss: 1.394850
Epoch 25 - Loss: 1.344758
Epoch 30 - Loss: 1.313780
Epoch 35 - Loss: 1.271630
Epoch 40 - Loss: 1.189018
Epoch 45 - Loss: 1.035178
Epoch 50 - Loss: 0.815966
Epoch 55 - Loss: 0.610937
Epoch 60 - Loss: 0.482205
Epoch 65 - Loss: 0.412919
Epoch 70 - Loss: 0.375028
Epoch 75 - Loss: 0.353981
Epoch 80 - Loss: 0.342316
Epoch 85 - Loss: 0.335883
Epoch 90 - Loss: 0.332349
Epoch 95 - Loss: 0.330413


In [38]:
# Define network structure and train the model
layer_sizes = [1, 4, 3, 1]  # 1 input, hidden layers of 4 and 3 neurons, 1 output
nn = NeuralNetwork(layer_sizes)
nn.train(epochs=100)

Epoch 0 - Loss: 5.823733
Epoch 5 - Loss: 2.717339
Epoch 10 - Loss: 0.823156
Epoch 15 - Loss: 0.320812
Epoch 20 - Loss: 0.202949
Epoch 25 - Loss: 0.134669
Epoch 30 - Loss: 0.105323
Epoch 35 - Loss: 0.088645
Epoch 40 - Loss: 0.076021
Epoch 45 - Loss: 0.066310
Epoch 50 - Loss: 0.058694
Epoch 55 - Loss: 0.052586
Epoch 60 - Loss: 0.047570
Epoch 65 - Loss: 0.043348
Epoch 70 - Loss: 0.039710
Epoch 75 - Loss: 0.036507
Epoch 80 - Loss: 0.033634
Epoch 85 - Loss: 0.031017
Epoch 90 - Loss: 0.028605
Epoch 95 - Loss: 0.026363
