# Modular Neural Network Engine
**Author:** MD Saifullah Baig.A
<br>
**Version:** 3.0

## Overview
This notebook implements a modular, scratch-built Deep Learning framework in Python. It is designed to demystify the internal mechanics of deep learning by implementing backpropagation, optimizers (SGD, Adam), and dynamic layer management without relying on auto-differentiation libraries.

**Key Features:**
* **Modular Architecture:** Dynamic stacking of layers.
* **Advanced Optimizers:** Custom implementation of Adam and SGD.
* **Vectorized Operations:** High-performance matrix computations using NumPy.

## 1. Dependencies
We utilize `NumPy` for high-performance matrix operations (Linear Algebra), which form the backbone of the tensor computations. `Pandas` and `Matplotlib` are employed for data management and visualizing the loss convergence during training.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## 2. Activation Functions & Derivatives
This static registry defines the non-linear activation functions ($\phi$) and their derivatives ($\phi'$). These functions are crucial for enabling the network to learn complex, non-linear patterns.

**Implemented Mathematics:**
* **Sigmoid:** $\sigma(x) = \frac{1}{1+e^{-x}}$
* **Tanh:** $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
* **ReLU:** $f(x) = \max(0, x)$
* **MSE Loss:** $L = \frac{1}{n} \sum (y_{true} - y_{pred})^2$

In [2]:
class Activation():
    @staticmethod
    def sigmoid(x):
        return 1/(1+np.exp(-x))
    @staticmethod
    def tanh(x):
        # np.tanh avoids the overflow (inf/inf = nan) of the explicit exponential form
        return np.tanh(x)
    @staticmethod
    def relu(x):
        return np.maximum(0,x)
    @staticmethod
    def mse(true,predicted):
        return np.mean((true-predicted)**2)
    

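    # NOTE: the sigmoid and tanh derivatives below expect the activation OUTPUT
    # (s = sigmoid(z), t = tanh(z)), not the pre-activation input z.
    @staticmethod
    def del_sigmoid(x):
        return x*(1-x)
    @staticmethod
    def del_tanh(x):
        return 1 - x**2
    @staticmethod
    def del_relu(x):
        # Valid for either input or output, since relu(z) > 0 exactly when z > 0
        return np.where(x > 0, 1, 0)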
    @staticmethod
    def del_mse(true,predicted):
        return 2*(predicted - true)/true.size
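
The derivative helpers above are written in terms of the activation's output rather than its input. The following minimal sketch checks this for the sigmoid with a central finite difference. The function names, the sample points, and the step size `h` are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def del_sigmoid_from_output(s):
    # Derivative expressed through the activation output s = sigmoid(x)
    return s * (1.0 - s)

x = np.linspace(-3.0, 3.0, 7)
h = 1e-5  # finite-difference step (assumed value)

# Central finite-difference approximation of d(sigmoid)/dx
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

# Intended usage: pass the activation output
analytic = del_sigmoid_from_output(sigmoid(x))

# Misuse: pass the raw pre-activation input
wrong = del_sigmoid_from_output(x)

max_err_correct = np.max(np.abs(numeric - analytic))
max_err_wrong = np.max(np.abs(numeric - wrong))
print(max_err_correct, max_err_wrong)
```

The first error should be negligible, while the second is large. This is why the backward pass has to feed these helpers the cached output.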

## 3. Optimization Algorithms
The `Optimiser` base class defines the interface for parameter updates. We implement two distinct strategies to minimize the loss function:

### 3.1. Stochastic Gradient Descent (SGD)
The baseline update rule: each parameter moves along the negative gradient, scaled by the learning rate $\eta$.
$$\theta_{t+1} = \theta_t - \eta \cdot \nabla J(\theta)$$

### 3.2. Adam (Adaptive Moment Estimation)
An optimizer that adapts the step size for each parameter by tracking exponential moving averages of the gradient (first moment, $m_t$) and of the squared gradient (second moment, $v_t$). Bias correction counteracts the pull toward zero that results from initializing both moments at zero.
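
For reference, the Adam update can be written in the same notation as the SGD rule. Here $g_t = \nabla J(\theta_t)$ is shorthand introduced for this summary, and $\beta_1$, $\beta_2$, $\epsilon$ correspond to the `beta1`, `beta2`, and `epsilon` arguments of the class below:

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2$$

$$\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t}$$

$$\theta_{t+1} = \theta_t - \eta \cdot \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$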

In [None]:
class Optimiser():
    def update(self,parameters,gradient):
        raise NotImplementedError
class SGD(Optimiser):
    def __init__(self,learning_rate=0.01):
        self.learning_rate=learning_rate
    def update(self,parameters,gradient):
        return parameters-self.learning_rate*gradient
class Adam(Optimiser):
    def __init__(self,learning_rate=0.1,beta1=0.9,beta2=0.999,epsilon=1e-8):
        self.learning_rate=learning_rate
        self.beta1=beta1
        self.beta2=beta2
        self.epsilon=epsilon
        self.m=None
        self.v=None
        self.t=0
    def update(self,parameters,gradient):
        if self.m is None:
            self.m=np.zeros_like(parameters)
            self.v=np.zeros_like(parameters)
        self.t+=1
        self.m=self.m*self.beta1+(1-self.beta1)*gradient
        self.v=self.v*self.beta2+(1-self.beta2)*(gradient**2)
        m_corrected=self.m/(1-self.beta1**self.t)
        v_corrected=self.v/(1-self.beta2**self.t)

        return parameters - self.learning_rate * m_corrected / (np.sqrt(v_corrected) + self.epsilon)
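
One consequence of bias correction is that the very first Adam step has a magnitude close to the learning rate, regardless of the gradient's scale (as long as the gradient is much larger than `epsilon`). The sketch below illustrates this with a standalone helper, `adam_first_step`, which is a hypothetical function written for this example and mirrors the update above for $t = 1$ only.

```python
import numpy as np

def adam_first_step(gradient, learning_rate=0.1, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Return the parameter change produced by the first Adam update (t = 1)."""
    m = (1 - beta1) * gradient          # moments start at zero
    v = (1 - beta2) * gradient**2
    m_hat = m / (1 - beta1**1)          # bias correction recovers the gradient
    v_hat = v / (1 - beta2**1)          # bias correction recovers gradient**2
    return -learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)

small = adam_first_step(np.array([1e-3]))
large = adam_first_step(np.array([1e+3]))

# Plain SGD steps with the same learning rate, for comparison
sgd_small = -0.1 * np.array([1e-3])
sgd_large = -0.1 * np.array([1e+3])

print(small, large, sgd_small, sgd_large)
```

Both Adam steps come out near `-0.1`, whereas the SGD steps differ by six orders of magnitude.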

## 4. The Layer Interface
The abstract `Layer` class enforces a strict contract for all network components. To maintain modularity, every layer must implement:

* **`forward(input)`**: Computes the output tensor $Y$.
* **`backward(output_error)`**: Computes the gradient with respect to the input ($\frac{\partial L}{\partial X}$) and updates internal weights if applicable.

In [None]:
class Layer:
    def __init__(self):
        self.input=None
        self.output=None
    def forward(self,input):
        raise NotImplementedError
    def backward(self,output_error):
        raise NotImplementedError
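
To illustrate the contract, here is a minimal sketch of a parameter-free layer. `Scale` is a hypothetical class invented for this example and is not part of the engine. The `Layer` base class is repeated so that the snippet runs on its own.

```python
import numpy as np

class Layer:
    def __init__(self):
        self.input = None
        self.output = None
    def forward(self, input):
        raise NotImplementedError
    def backward(self, output_error):
        raise NotImplementedError

class Scale(Layer):
    """Hypothetical layer computing Y = factor * X, with no trainable weights."""
    def __init__(self, factor):
        super().__init__()
        self.factor = factor
    def forward(self, input):
        self.input = input
        self.output = self.factor * input
        return self.output
    def backward(self, output_error):
        # dL/dX = dL/dY * dY/dX = output_error * factor
        return self.factor * output_error

layer = Scale(2.0)
y = layer.forward(np.array([[1.0, -3.0]]))
dx = layer.backward(np.ones_like(y))
print(y, dx)
```

Because the layer has no weights, `backward` only needs to return the gradient with respect to its input.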

## 5. Fully Connected (Dense) Layer
The core computational unit performing the affine transformation $Y = XW + B$.

**Implementation Details:**
* **Xavier-style Initialization:** Weights are drawn uniformly from $\pm \sqrt{\frac{1}{n_{in}}}$, a fan-in-only simplification of Xavier (Glorot) initialization, to keep activation variance roughly consistent across layers and reduce the risk of vanishing gradients.
* **Independent Optimizers:** Each layer instance maintains its own optimizer states for weights and biases, ensuring correct momentum tracking for the Adam algorithm.
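
The effect of the initialization limit can be checked numerically. The sketch below draws a weight matrix with the same limit and compares its empirical variance with the closed form for a uniform distribution on $[-a, a]$, which is $a^2/3$. The layer sizes and the seed are arbitrary assumptions. The Glorot uniform limit $\sqrt{6/(n_{in}+n_{out})}$ is computed only for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
n_in, n_out = 256, 128          # illustrative layer sizes

limit = np.sqrt(1 / n_in)
weights = rng.uniform(-limit, limit, (n_in, n_out))

# A uniform distribution on [-a, a] has variance a**2 / 3
expected_var = limit**2 / 3
empirical_var = weights.var()

# For comparison: the Glorot (Xavier) uniform limit, which also uses fan-out
glorot_limit = np.sqrt(6 / (n_in + n_out))

print(limit, glorot_limit, expected_var, empirical_var)
```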

In [None]:
class Connected_Layers(Layer):
    def __init__(self,input_size,output_size,learning_rate=0.01,optimizer="adam"):
        limit=np.sqrt(1/input_size)
        self.weights=np.random.uniform(-limit,limit,(input_size,output_size))
        self.bias=np.zeros((1,output_size))

        self.optimizer_w=self._get_optimizer(optimizer,learning_rate)
        self.optimizer_b=self._get_optimizer(optimizer,learning_rate)
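    def _get_optimizer(self,name,learning_rate):
        if name.lower()=="adam":
            return Adam(learning_rate=learning_rate)
        elif name.lower()=="sgd":
            return SGD(learning_rate=learning_rate)
        # Fail early instead of returning None, which would crash later in backward()
        raise ValueError(f"Unknown optimizer: {name!r}")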
    def forward(self,input):
        self.input=input
        self.output=np.dot(self.input,self.weights)+self.bias
        return self.output
    def backward(self,output_error):
        input_error = np.dot(output_error, self.weights.T)
        weight_gradient=np.dot(self.input.T,output_error)

        self.weights=self.optimizer_w.update(self.weights,weight_gradient)
        self.bias=self.optimizer_b.update(self.bias,np.sum(output_error, axis=0, keepdims=True))
        return input_error
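
The gradient formulas used in `backward` can be verified against finite differences. The following sketch assumes an MSE loss and small, arbitrary shapes. All variable names are local to the example, and it reimplements the formulas rather than calling the class.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))      # batch of 4 samples, 3 features
W = rng.normal(size=(3, 2))      # 3 inputs -> 2 outputs
B = np.zeros((1, 2))
T = rng.normal(size=(4, 2))      # arbitrary targets

def loss(W_):
    Y = X @ W_ + B
    return np.mean((T - Y) ** 2)

# Analytic gradients, following the same formulas as the backward pass
Y = X @ W + B
delta = 2 * (Y - T) / T.size                 # dL/dY for the MSE loss
grad_W = X.T @ delta                         # shape (3, 2), same as W
grad_B = delta.sum(axis=0, keepdims=True)    # shape (1, 2), same as B
grad_X = delta @ W.T                         # shape (4, 3), same as X

# Central finite differences over every entry of W
h = 1e-6
numeric_W = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        step = np.zeros_like(W)
        step[i, j] = h
        numeric_W[i, j] = (loss(W + step) - loss(W - step)) / (2 * h)

print(np.max(np.abs(grad_W - numeric_W)))
```

Each gradient has the same shape as the quantity it differentiates with respect to, which is a quick sanity check when debugging a new layer.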

## 6. Activation Wrapper
A pass-through layer that applies non-linearity element-wise to the input tensor.

* **Forward:** Applies the function chosen from the `Activation` registry.
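* **Backward:** Applies the derivative via the chain rule: $\delta_{in} = \delta_{out} \odot \phi'(input)$. The derivative is evaluated from the cached forward output, because each derivative in the `Activation` registry is written in terms of the activation's output.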

In [None]:
class Activation_Layer(Layer):
    def __init__(self,activation):
        self.activation={
                            "tanh":(Activation.tanh,Activation.del_tanh),
                            "sigmoid":(Activation.sigmoid,Activation.del_sigmoid),
                            "relu":(Activation.relu,Activation.del_relu)
                        }
        self.activation_forward,self.activation_backward=self.activation[activation]
    def forward(self,input):
        self.input=input
        self.output = self.activation_forward(self.input)
        return self.output  # reuse the cached result instead of recomputing it
    def backward(self,error):
        return self.activation_backward(self.output)*error

## 7. Neural Network Engine
The `Neural_Network` class serves as the container and trainer. It orchestrates the flow of tensors through the stack.

**Training Loop Architecture:**
1.  **Forward Propagation:** Sequential processing from Input $\to$ Output.
2.  **Loss Computation:** Evaluating model performance via MSE.
3.  **Backpropagation:** Reverse-mode differentiation to propagate error gradients from Output $\to$ Input.

<br>

### 🚀 Major Upgrades in V3.0:
1.  **Vectorized Prediction (`Predict`):**
    * **Old Way:** Looped through input samples one by one (slow).
    * **New Way:** Processes the entire input matrix $X$ in a single pass using vectorized NumPy matrix operations.

2.  **Mini-Batch Gradient Descent (`Training_model`):**
    * **Stochastic Gradient Descent (SGD):** Updates weights after *every* sample. Noisy and slow in Python.
    * **Mini-Batch:** Updates weights after a small batch (32 samples by default). This leverages the speed of matrix multiplication and gives more stable convergence.
    * **Shuffling:** Data is shuffled every epoch to prevent the model from memorizing the order of samples.

3.  **GUI Hooks (`callback`):**
    * The training loop now accepts a `callback` function, called once per epoch with the epoch index and the epoch loss. This lets a GUI monitor progress, update progress bars, and request a safe stop by returning `True`.
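
A callback of this kind might look like the following minimal sketch. `make_early_stop` and `run_epochs` are hypothetical helpers written for illustration. `run_epochs` only replays a list of precomputed losses and stands in for the real training loop.

```python
def make_early_stop(threshold):
    """Hypothetical callback factory: stop once the epoch loss drops below threshold."""
    log = []
    def callback(epoch, epoch_loss):
        log.append((epoch, epoch_loss))
        # Cast to a plain bool so an identity check against True also works
        return bool(epoch_loss < threshold)
    return callback, log

def run_epochs(losses, callback=None):
    """Stand-in for the training loop: replays precomputed losses, one per epoch."""
    completed = 0
    for epoch, epoch_loss in enumerate(losses):
        completed += 1
        if callback is not None and callback(epoch, epoch_loss) is True:
            break
    return completed

callback, log = make_early_stop(threshold=0.05)
completed = run_epochs([0.9, 0.4, 0.1, 0.04, 0.01], callback)
print(completed, log)
```

The explicit `bool(...)` cast matters when the loss is a NumPy scalar: a comparison then yields a NumPy boolean, which is not the same object as Python's `True`.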

In [None]:
class Neural_Network:
    def __init__(self):
        self.layers=[]
        self.loss=Activation.mse
        self.delta_loss=Activation.del_mse
        self.loss_history=[]
        
    def Add(self,user_layer):
        self.layers.append(user_layer)

    def Predict(self,input_data):
        output = input_data
        for layer in self.layers:
            output = layer.forward(output)
        return output
    
    def Training_model(self,x_train,y_train,epochs,callback=None,batch_size=32):
        self.loss_history = []
        samples = len(x_train)
        
        for epoch in range(epochs):
            epoch_loss = 0
            idx= np.arange(samples)
            np.random.shuffle(idx)
            x_train = x_train[idx]
            y_train = y_train[idx]
            for j in range(0,samples,batch_size):
                x_batch=x_train[j:j+batch_size]
                y_batch=y_train[j:j+batch_size]
                output = x_batch
                for layer in self.layers:
                    output = layer.forward(output)
                
                y_true =y_batch
                epoch_loss += self.loss(y_true, output)

                error = self.delta_loss(y_true, output)
                for layer in reversed(self.layers):
                    error = layer.backward(error)
                    
            # Count the final partial batch too (samples need not divide evenly)
            num_batches = int(np.ceil(samples/batch_size))
            epoch_loss/=num_batches
            self.loss_history.append(epoch_loss)
            
            if callback is not None:
                stop=callback(epoch,epoch_loss)
                # Truthiness check so NumPy booleans (e.g. epoch_loss < tol) also stop training
                if stop:
                    break
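
The shuffling and batching logic can be isolated into a small generator, sketched below. `iterate_minibatches` is a hypothetical helper, not part of the engine, and the data is synthetic. With 10 samples and a batch size of 4, the loop yields three batches, the last holding two samples, so the number of batches is the ceiling of `samples / batch_size` rather than the raw quotient.

```python
import numpy as np

def iterate_minibatches(x, y, batch_size, rng):
    """Hypothetical helper: yield shuffled (x_batch, y_batch) pairs covering every sample once."""
    idx = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        batch_idx = idx[start:start + batch_size]
        yield x[batch_idx], y[batch_idx]

rng = np.random.default_rng(0)   # fixed seed, chosen only for reproducibility
x = np.arange(10, dtype=float).reshape(10, 1)
y = 2 * x                        # synthetic targets

batches = list(iterate_minibatches(x, y, batch_size=4, rng=rng))
batch_sizes = [len(xb) for xb, _ in batches]
num_batches = int(np.ceil(len(x) / 4))

# Shuffling x and y with the same index keeps every pair aligned
pairs_aligned = all(np.array_equal(yb, 2 * xb) for xb, yb in batches)
print(batch_sizes, num_batches, pairs_aligned)
```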