# Modular Neural Network Main File
**Author:** MD Saifullah Baig.A
**Version:** 1.0

## 1. Import Dependencies
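We import standard libraries for mathematics (`numpy`) and visualization (`matplotlib`), along with `load_diabetes` from scikit-learn to supply the dataset. `pandas` is also imported, although none of the cells below use it.
Crucially, we import our custom `Neural_Network_Engine` module, which contains the `Neural_Network`, `Connected_Layers`, and `Activation_Layer` classes built from scratch.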

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_diabetes
from Neural_Network_Engine import Neural_Network,Connected_Layers,Activation_Layer

## 2. Preprocessing Helper Functions

### Standard Scaler (Z-Score Normalization)
Neural networks converge faster and more stably when input features are on a similar scale. This function normalizes the data to have a **Mean ($\mu$) of 0** and a **Standard Deviation ($\sigma$) of 1**.

$$z = \frac{x - \mu}{\sigma + \epsilon}$$

The term $\epsilon$ (set to `1e-8` in the code) is added for **Numerical Stability**.

1.  **Prevents Division by Zero:**
    * Standard Deviation ($\sigma$) measures the spread of data.
    * If a feature column contains **constant values** (e.g., every row has `Age = 25`), the variance and standard deviation will be **0**.
    * Without $\epsilon$, the computer would evaluate $\frac{0}{0}$ for that column (the numerator $x - \mu$ is also 0). NumPy does not crash here: it emits a runtime warning and returns `NaN` (Not a Number), which then propagates through every later calculation.

2.  **Safety Net:**
    * By adding a tiny number like $0.00000001$, the denominator becomes slightly larger than zero ($0 + \epsilon$), allowing the calculation to proceed safely even on "flat" data features.

**Returns:**
* `scaled`: The normalized data.
* `mean`, `std`: Stored to inverse-transform predictions later.
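
The behavior described above can be checked on a tiny array. This is a minimal sketch, not the notebook's own cell: the helper name `zscore` and the sample values are made up for illustration, and it assumes the same `1e-8` epsilon as the formula.

```python
import numpy as np

EPS = 1e-8  # same epsilon as in the formula above


def zscore(data, eps=EPS):
    """Column-wise z-score; returns the scaled data plus the statistics used."""
    mean = np.mean(data, axis=0)
    std = np.std(data, axis=0) + eps
    return (data - mean) / std, mean, std


# Hypothetical data: the second column is constant, so its std is exactly 0.
data = np.array([[1.0, 25.0],
                 [2.0, 25.0],
                 [3.0, 25.0]])

# Without epsilon: 0/0 in the constant column gives NaN (NumPy warns, it does not raise).
with np.errstate(divide="ignore", invalid="ignore"):
    unsafe = (data - data.mean(axis=0)) / data.std(axis=0)

# With epsilon: the constant column becomes all zeros instead of NaN.
scaled, mean, std = zscore(data)

# The stored statistics invert the transform.
restored = scaled * std + mean

print(unsafe[:, 1])   # NaN in every row
print(scaled[:, 1])   # zeros
print(np.allclose(restored, data))
```

Note that the returned `std` already includes the epsilon, so multiplying by it recovers the original values.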

### ❓ Why is Scaling Necessary?

In Deep Learning, **Standard Scaling** is rarely optional in practice: gradient-based training depends heavily on the scale of the inputs. Here are the three main reasons why:

#### 1. Prevents Feature Dominance ("Apples vs. Oranges")
Neural networks use matrix multiplication (`Input * Weight`). If features have vastly different ranges, the larger numbers will dominate the learning process.
* **Example:**
    * *BMI:* Range 18–35
    * *Income:* Range 20,000–100,000
* **Result without Scaling:** With weights of similar size, "Income" contributes roughly 1000x more to each neuron's weighted sum than "BMI" simply because its values are larger. The network effectively ignores the smaller feature, at least early in training.
* **With Scaling:** Both features are brought into a similar range (approx. -3 to +3), so neither dominates purely because of its units.
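
To make the imbalance concrete, the sketch below measures how much each feature contributes to a weighted sum before and after scaling. The four sample rows, the equal weights, and the helper `contribution_share` are all hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical samples: columns are [BMI, income].
X = np.array([[18.0, 20_000.0],
              [25.0, 50_000.0],
              [30.0, 75_000.0],
              [35.0, 100_000.0]])
w = np.array([0.5, 0.5])  # identical weights for both features


def contribution_share(X, w):
    """Average share of |x_j * w_j| in each row's total absolute contribution."""
    contrib = np.abs(X * w)
    return (contrib / contrib.sum(axis=1, keepdims=True)).mean(axis=0)


raw_share = contribution_share(X, w)

# Z-score each column, as Standard_Scaler does.
X_scaled = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)
scaled_share = contribution_share(X_scaled, w)

print("raw    [BMI, income]:", raw_share)
print("scaled [BMI, income]:", scaled_share)
```

On the raw data, income accounts for almost all of the weighted sum; after scaling, the two shares are close to equal.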

#### 2. Faster Convergence (The "Bowl" Shape)
The optimizer (Gradient Descent) tries to find the lowest error.
* **Unscaled Data:** The error surface looks like a long, narrow valley. The optimizer zig-zags back and forth, taking a long time to reach the bottom.
* **Scaled Data:** The error surface looks much closer to a symmetrical bowl. The optimizer can take a more direct path to the minimum, so it needs fewer steps to converge.
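
One way to quantify the "valley vs. bowl" picture is the condition number of the loss surface's curvature. The sketch below uses a plain linear model with MSE loss as a stand-in for the network (an assumption made to keep the Hessian easy to write down), with synthetic BMI-like and income-like features. A condition number near 1 corresponds to a round bowl; a very large one corresponds to a long, narrow valley.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical, independently drawn features on very different scales.
X_raw = np.column_stack([
    rng.uniform(18, 35, n),            # BMI-like
    rng.uniform(20_000, 100_000, n),   # income-like
])
X_scaled = (X_raw - X_raw.mean(axis=0)) / (X_raw.std(axis=0) + 1e-8)


def loss_surface_condition(X):
    """Condition number of the MSE Hessian (2/n) X^T X for a linear model y = Xw."""
    hessian = 2.0 * X.T @ X / X.shape[0]
    return np.linalg.cond(hessian)


cond_raw = loss_surface_condition(X_raw)
cond_scaled = loss_surface_condition(X_scaled)

print(f"raw: {cond_raw:.3g}, scaled: {cond_scaled:.3g}")
```

Gradient descent with a single learning rate has to be tuned for the steepest direction, so a large condition number forces tiny steps along the shallow direction.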

#### 3. Avoids Vanishing Gradients (Activation Saturation)
Activation functions like `Tanh` and `Sigmoid` are sensitive to large inputs.
* **The Problem:** `Tanh(100)` evaluates to `1.0` in floating point. The slope (gradient), $1 - \tanh^2(x)$, is effectively **Zero** at this point.
* **The Consequence:** If you feed raw large numbers (like 150) into the network, the gradients become zero immediately. The weights stop updating, and the network stops learning (Vanishing Gradient Problem).
* **The Solution:** Scaling keeps inputs close to 0 (e.g., -1 to 1), where the activation functions have the steepest slope and strongest gradients.
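
The saturation effect is easy to check numerically. This short sketch evaluates the derivative of tanh, $1 - \tanh^2(x)$, at a few inputs; the helper name `tanh_grad` is illustrative and not part of `Neural_Network_Engine`.

```python
import numpy as np


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - np.tanh(x) ** 2


for x in (0.0, 1.0, 3.0, 100.0):
    print(f"x = {x:>6}: tanh = {np.tanh(x):.6f}, gradient = {tanh_grad(x):.6f}")
```

The gradient is largest at `x = 0` and shrinks quickly as the input grows, which is why inputs near zero keep the weights updating.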

| Property | Raw Data | Scaled Data |
| :--- | :--- | :--- |
| **Range** | Wildly different (e.g., 0.001 to 1,000,000) | Standardized (~ -3 to +3) |
| **Training Speed** | Very Slow | Fast |
| **Stability** | Prone to NaN / Infinity errors | Stable |

In [None]:
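def Standard_Scaler(data):
    # Column-wise statistics (axis=0): one mean and one std per feature
    mean=np.mean(data,axis=0)
    # Epsilon keeps the denominator non-zero for constant columns
    std=np.std(data,axis=0)+1e-8
    scaled=(data-mean)/std
    # Note: the returned std already includes the epsilon
    return scaled,mean,std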
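
`Standard_Scaler` computes its statistics from whatever array it receives. A common variant, sketched here as an alternative and not as what this notebook does, computes the mean and standard deviation on the training rows only and reuses them for the test rows, so no information from the test set reaches the scaling step. The function names `fit_scaler` and `apply_scaler` and the synthetic data are hypothetical.

```python
import numpy as np


def fit_scaler(train_data, eps=1e-8):
    """Compute per-column mean and std (plus epsilon) from the training rows only."""
    mean = np.mean(train_data, axis=0)
    std = np.std(train_data, axis=0) + eps
    return mean, std


def apply_scaler(data, mean, std):
    """Scale any array with previously fitted statistics."""
    return (data - mean) / std


# Synthetic stand-in data: 100 rows, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
X_train, X_test = X[:80], X[80:]

mean, std = fit_scaler(X_train)
X_train_scaled = apply_scaler(X_train, mean, std)
X_test_scaled = apply_scaler(X_test, mean, std)   # reuses the training statistics

print(X_train_scaled.mean(axis=0))  # close to 0 for every column
print(X_test_scaled.mean(axis=0))   # near 0, but not exactly
```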

### Train-Test Split
To evaluate the model fairly, we must test it on data it has never seen before.
This function:
1. Generates a list of indices.
2. **Shuffles** them randomly to remove any ordering bias.
3. Splits the data into **Training (80%)** and **Testing (20%)** sets.

In [None]:
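def train_test_split(X,Y,test_size=0.2):
    # Shuffle row indices in place (unseeded, so the split differs between runs)
    idx=np.arange(X.shape[0])
    np.random.shuffle(idx)
    # The first (1-test_size) fraction of shuffled rows is used for training
    split_range=int(X.shape[0]*(1-test_size))
    train_idx,test_idx=idx[:split_range],idx[split_range:]
    return X[train_idx],X[test_idx],Y[train_idx],Y[test_idx]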
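
Because the function above shuffles without a seed, each run produces a different split. If reproducible results are wanted, one option is a seeded variant like the sketch below. The name `seeded_train_test_split` and the `seed` parameter are assumptions added for illustration; the splitting logic is otherwise the same.

```python
import numpy as np


def seeded_train_test_split(X, Y, test_size=0.2, seed=0):
    """Same 3-step split as above, but with a reproducible shuffle."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[0])              # shuffled row indices
    split = int(X.shape[0] * (1 - test_size))      # number of training rows
    train_idx, test_idx = idx[:split], idx[split:]
    return X[train_idx], X[test_idx], Y[train_idx], Y[test_idx]


# Toy data: 10 rows, 2 features, one target column.
X = np.arange(20).reshape(10, 2)
Y = np.arange(10).reshape(-1, 1)

X_train, X_test, Y_train, Y_test = seeded_train_test_split(X, Y, test_size=0.2, seed=42)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)

# The same seed reproduces the same split.
X_train_again, _, _, _ = seeded_train_test_split(X, Y, test_size=0.2, seed=42)
print(np.array_equal(X_train, X_train_again))  # True
```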

## 3. Visualization
This function generates two plots to evaluate performance:
1. **Training Convergence:** Plots the MSE Loss over epochs (should decrease).
2. **Prediction Accuracy:** A scatter plot comparing True values vs. Predicted values. A perfect model would align all points on the diagonal line.

In [None]:
def plot(loss_history,true,prediction):
    plt.figure(figsize=(12,5))

    plt.subplot(1,2,1)
    plt.plot(loss_history,label="Training Loss",color="blue")
    plt.title("Training Convergence")
    plt.xlabel("Epochs")
    plt.ylabel("MSE Loss")
    plt.grid(True,linestyle="--",alpha=0.6)
    plt.legend()

    plt.subplot(1,2,2)
    plt.scatter(true,prediction,alpha=0.6,color='red',edgecolors='k')
    least=min(true.min(),prediction.min())
    highest=max(true.max(),prediction.max())
    plt.plot([least,highest],[least,highest],'k--',lw=2,label="Perfect Fit")

    plt.title("True VS Predicted Values")
    plt.xlabel("True labels")
    plt.ylabel("Predicted Value")
    plt.legend()
    plt.grid(True,linestyle='--',alpha=0.6)
    plt.tight_layout()
    plt.show()

## 4. Main Execution Pipeline (`Build`)

This block orchestrates the entire workflow:

1.  **Loading:** Loads the Diabetes dataset with `load_diabetes()`.
2.  **Preprocessing:** Applies our custom `Standard_Scaler` to inputs ($X$) and targets ($y$).
3.  **Splitting:** Divides data into Train/Test sets.
4.  **Architecture:** Defines the Neural Network topology:
    * Input Layer: 10 Features
    * Hidden Layer: Fully Connected (10 -> 5) -> ReLU
    * Output Layer: Fully Connected (5 -> 1) -> Tanh (optional; it bounds scaled predictions to the range -1 to 1)
5.  **Training:** Optimizes weights using Backpropagation over 100 epochs.
6.  **Evaluation:** Predicts on test data and performs **Inverse Scaling** to get real-world values.
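
Two consequences of steps 4 and 6 are worth seeing in numbers. First, a Tanh output can only produce scaled values between -1 and 1, so after inverse scaling every prediction lies within one standard deviation of the target mean. Second, the MSE in original units equals the MSE in scaled units multiplied by $\sigma_y^2$. The sketch below uses made-up target values, not the diabetes data, and models the best a Tanh output could do by clipping the scaled targets to that range.

```python
import numpy as np

# Hypothetical targets in original units.
y = np.array([[25.0], [90.0], [150.0], [220.0], [346.0]])
mean_y = y.mean()
std_y = y.std() + 1e-8
y_scaled = (y - mean_y) / std_y

# Closest value a Tanh output could reach for each scaled target (limiting case).
best_tanh_output = np.clip(y_scaled, -1.0, 1.0)

# Inverse scaling, as in the evaluation step.
best_pred = best_tanh_output * std_y + mean_y

# Predictions cannot leave the interval [mean - std, mean + std].
lower, upper = mean_y - std_y, mean_y + std_y
print(f"reachable range: {lower:.1f} to {upper:.1f}")
print(f"target range:    {y.min():.1f} to {y.max():.1f}")

# MSE in original units is the scaled MSE times std_y squared.
mse_scaled = np.mean((y_scaled - best_tanh_output) ** 2)
mse_actual = np.mean((y - best_pred) ** 2)
print(np.isclose(mse_actual, mse_scaled * std_y ** 2))
```

If targets more than one standard deviation from the mean matter, removing the final `Activation_Layer('tanh')` leaves a linear output that is not bounded in this way.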

In [None]:
def Build():
    # Load the dataset; reshape the target into a column vector of shape (n_samples, 1)
    diabetes=load_diabetes()
    X_raw=diabetes.data
    y_raw=diabetes.target.reshape(-1, 1)

    # Standardize inputs and targets; keep the target statistics for inverse scaling
    X_scaled,mean_x,std_x=Standard_Scaler(X_raw)
    y_scaled,mean_y,std_y=Standard_Scaler(y_raw)

    X_train,X_test,y_train,y_test=train_test_split(X_scaled,y_scaled,test_size=0.2)

    # Topology: 10 inputs -> 5 hidden units (ReLU) -> 1 output (Tanh)
    model=Neural_Network()

    model.Add(Connected_Layers(10,5,learning_rate=0.01))
    model.Add(Activation_Layer('relu'))
    model.Add(Connected_Layers(5,1,learning_rate=0.01))
    model.Add(Activation_Layer('tanh'))

    model.Training_model(X_train,y_train,epochs=100)

    preds_scaled=model.Predict(X_test)
    preds_scaled=np.array(preds_scaled).reshape(-1, 1)

    # Undo the target scaling so predictions and labels are in the original units
    preds_actual=(preds_scaled*std_y)+mean_y
    y_test_actual=(y_test*std_y)+mean_y

    # Test-set MSE in original units
    mse=np.mean((y_test_actual-preds_actual)**2)
    print(f"Test MSE: {mse:.2f}")

    plot(model.loss_history,y_test_actual,preds_actual)


### Run the Model
Execute the function below to start training and visualize the results.

In [None]:
if __name__=="__main__":
    Build()