# Section 1 : Gradient Checking

Before training our network, it is important to verify that the **backpropagation implementation** is correct.  

Gradient checking is a method to ensure that the **analytical gradients** computed by backpropagation match the **numerical gradients** estimated using finite differences.  

We use the following approach:

1. **Analytical Gradients:**  
   - Computed using the `backward()` methods of our layers.  

2. **Numerical Gradients:**  
   - Approximated with the formula:  
     \[
     \frac{\partial L}{\partial W} \approx \frac{L(W + \epsilon) - L(W - \epsilon)}{2 \epsilon}
     \]  
   - Here, \( \epsilon \) is a very small number (default \( 1e-5 \)).

3. **Comparison:**  
   - We calculate the maximum difference between analytical and numerical gradients.  
   - If the difference is very small (e.g., less than \(1e-6\)), our backpropagation is likely correct.

This step ensures the **core of our neural network library is functioning properly** before training on the XOR problem.


In [None]:
import sys
import os

# Add the lib folder to Python path
project_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
sys.path.append(project_root)

# Test import
from AROGO.layers import Dense
from AROGO.activations import Tanh, Sigmoid
from AROGO.losses import MSELoss
from AROGO.optimizer import SGD
from AROGO.network import Sequential

import numpy as np
import matplotlib.pyplot as plt


def numerical_gradient(model, X, Y, loss_fn, eps=1e-5):
    grads_num = []

    for layer in model.layers:
        if hasattr(layer, "W"):
            dW_num = np.zeros_like(layer.W)

            for i in range(layer.W.shape[0]):
                for j in range(layer.W.shape[1]):
                    W_old = layer.W[i, j]

                    layer.W[i, j] = W_old + eps
                    loss_plus = loss_fn.forward(Y, model.forward(X))

                    layer.W[i, j] = W_old - eps
                    loss_minus = loss_fn.forward(Y, model.forward(X))

                    dW_num[i, j] = (loss_plus - loss_minus) / (2 * eps)
                    layer.W[i, j] = W_old

            grads_num.append(dW_num)

    return grads_num


# analytical gradients
y_pred = model.forward(X)
loss_fn.forward(Y, y_pred)
grad = loss_fn.backward()
model.backward(grad)

# numerical gradients
num = numerical_gradient(model, X, Y, loss_fn)

dense1 = model.layers[0]
print("Analytical:\n", dense1.dW)
print("Numerical:\n", num[0])
print("Max Diff:", np.max(np.abs(dense1.dW - num[0])))

Analytical:
 [[-0.00232373 -0.00274231 -0.00977339  0.01306674]
 [-0.00434791  0.01133584  0.002093    0.01870656]]
Numerical:
 [[-0.00232373 -0.00274231 -0.00977339  0.01306674]
 [-0.00434791  0.01133584  0.002093    0.01870656]]
Max Diff: 2.9750407866402373e-12


# Section 2 : XOR Problem Using Our Custom Neural Network Library

In this section, we demonstrate how to use the neural network library we implemented from scratch to solve the **XOR problem**.  
  
We train the network using:

- **Mean Squared Error (MSE)** as the loss function  
- **Stochastic Gradient Descent (SGD)** as the optimizer  

The goal is to have the network correctly predict all four XOR outputs after training.


In [None]:

import sys
import os

# Add the lib folder to Python path
project_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
sys.path.append(project_root)

# Test import
from AROGO.layers import Dense
from AROGO.activations import Tanh, Sigmoid
from AROGO.losses import MSELoss
from AROGO.optimizer import SGD
from AROGO.network import Sequential

import numpy as np
import matplotlib.pyplot as plt

# XOR dataset
X = np.array([[0,0],[0,1],[1,0],[1,1]])
Y = np.array([[0],[1],[1],[0]])

model = Sequential()
model.add(Dense(2, 4))
model.add(Tanh())
model.add(Dense(4, 1))
model.add(Sigmoid())

loss_fn = MSELoss()
optimizer = SGD(lr=0.1)

model.train(X, Y, loss_fn, optimizer, epochs=5000)

print("Predictions after training:")
print(model.forward(X))

Epoch 0, Loss: 0.2539001108002241
Epoch 2000, Loss: 0.006986212600061806
Epoch 4000, Loss: 0.002426433168836391
Epoch 6000, Loss: 0.00142500829139407
Epoch 8000, Loss: 0.0009993669268076304
Predictions after training:
[[0.01332864]
 [0.97127211]
 [0.97049072]
 [0.03451367]]


# Section 5 : Baseline Comparison with TensorFlow/Keras

To validate our custom neural network library, we implement the same **2-4-1 XOR network** using TensorFlow/Keras.  

We will compare:

1. **Ease of Implementation** – How quickly the network can be built and trained.  
2. **Training Time** – Approximate time to converge.  
3. **Final Predictions** – The output values for the XOR inputs.


In [None]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
import numpy as np

# XOR dataset
X = np.array([[0,0],[0,1],[1,0],[1,1]], dtype=np.float32)
Y = np.array([[0],[1],[1],[0]], dtype=np.float32)

# Define the Keras model
tf_model = Sequential([
    Dense(4, input_dim=2, activation='tanh'),
    Dense(1, activation='sigmoid')
])

# Compile the model
tf_model.compile(optimizer=SGD(learning_rate=0.1), loss='mse')

# Train the model
tf_model.fit(X, Y, epochs=5000, verbose=0)

# Predictions
preds = tf_model.predict(X)
print("Predictions (Keras XOR):")
print(preds)


[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 59ms/step
Predictions (Keras XOR):
[[0.01457623]
 [0.9690686 ]
 [0.97115225]
 [0.03383316]]


# comparison 
1- ease of implementaion :
our lib has more ease of implementaion than tensor flow (keras) 

2-training time :
our lib has less training time than tensor flow (keras) 

3-final predections :
 
our lib results :
[[0.01332864]
 [0.97127211]
 [0.97049072]
 [0.03451367]]

 tensor flow (keras) :
 [[0.01457623]
 [0.9690686 ]
 [0.97115225]
 [0.03383316]]