# Project Demo: Neural Network Library (Part 1)

This notebook demonstrates the core functionality of our NumPy-based neural network library by implementing and validating two key tasks:
1. Gradient checking to verify backpropagation correctness
2. Training a network to solve the XOR problem

---

## Section 1: Gradient Checking

Demonstrating that the backpropagation implementation matches numerical gradient approximations.

The gradient check compares our analytical gradients (computed via backpropagation) with numerical gradients (approximated via central finite differences). Agreement to within a small relative tolerance is strong evidence that the backward pass is implemented correctly, which makes this a critical sanity check before any training.
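For reference, the numerical estimate in the cell below is the central difference quotient. For each weight entry $W_{ij}$, with $E_{ij}$ denoting the matrix that is 1 at position $(i, j)$ and 0 elsewhere:

$$
\frac{\partial J}{\partial W_{ij}} \approx \frac{J(W + \epsilon E_{ij}) - J(W - \epsilon E_{ij})}{2\epsilon}
$$

A Taylor expansion of $J$ around $W$ shows that the truncation error of this two-sided estimate is $O(\epsilon^2)$, compared with $O(\epsilon)$ for the one-sided forward difference $\left(J(W + \epsilon E_{ij}) - J(W)\right) / \epsilon$. This is why the symmetric form is the usual choice for gradient checks.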

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import sys
sys.path.append('..')
from lib.layers import Dense
from lib.activations import Tanh, Sigmoid
from lib.losses import MSE
from lib.network import Sequential

def gradient_check():
    print("----- Gradient Check -----")
    # Setup
    np.random.seed(15)  # For reproducibility
    x_sample = np.random.rand(1, 2)
    y_sample = np.array([[1]])
    layer = Dense(2, 1)
    loss_func = MSE()
    epsilon = 1e-5

    # Get analytic gradient
    output = layer.forward(x_sample)
    loss_func.loss(y_sample, output)
    error_grad = loss_func.loss_prime(y_sample, output)
    layer.backward(error_grad)  # No learning_rate parameter
    analytic_grad = layer.grad_weights.copy()

    # Get numerical gradient
    numerical_grad = np.zeros_like(layer.weights)
    it = np.nditer(layer.weights, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        ix = it.multi_index
        
        # J(W + e)
        original_w = layer.weights[ix]
        layer.weights[ix] = original_w + epsilon
        output_plus = layer.forward(x_sample)
        J_plus = loss_func.loss(y_sample, output_plus)

        # J(W - e)
        layer.weights[ix] = original_w - epsilon
        output_minus = layer.forward(x_sample)
        J_minus = loss_func.loss(y_sample, output_minus)
        
        # (J(W+e) - J(W-e)) / 2e
        numerical_grad[ix] = (J_plus - J_minus) / (2 * epsilon)
        layer.weights[ix] = original_w  # Restore
        it.iternext()
    
    # Print gradients
    print("\n📊 Gradient Comparison:")
    print(f"\nAnalytic Gradient (from backprop):\n{analytic_grad}")
    print(f"\nNumerical Gradient (finite difference):\n{numerical_grad}")
    print(f"\nElement-wise difference:\n{analytic_grad - numerical_grad}")
    
    # Compare
    diff = np.linalg.norm(analytic_grad - numerical_grad) / np.linalg.norm(analytic_grad + numerical_grad)
    print(f"\n📈 Relative Difference (norm-based): {diff:.2e}")
    assert diff < 1e-4, "Gradient check failed!"
    print("✅ Gradient check PASSED!")

gradient_check()

----- Gradient Check -----

📊 Gradient Comparison:

Analytic Gradient (from backprop):
[[-2.07469049]
 [-0.43725958]]

Numerical Gradient (finite difference):
[[-2.07469049]
 [-0.43725958]]

Element-wise difference:
[[ 1.13935528e-11]
 [-3.50836027e-12]]

📈 Relative Difference (norm-based): 2.81e-12
✅ Gradient check PASSED!
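The check above covers only the weight gradient of a single `Dense` layer. The same technique extends to the bias. Below is a minimal, library-independent NumPy sketch that checks both. It assumes a dense layer computes `x @ W + b` and that MSE is `mean((y_pred - y_true)**2)`; the helper names (`dense_analytic_grads`, `numerical_grad`, `rel_diff`) are hypothetical and not part of `lib`. `rel_diff` uses the common symmetric form ‖a − n‖ / (‖a‖ + ‖n‖).

```python
import numpy as np


def mse(y_true, y_pred):
    # Assumed loss: mean of squared errors over all elements
    return np.mean((y_pred - y_true) ** 2)


def dense_analytic_grads(x, y_true, W, b):
    # Forward pass of an assumed dense layer: y_pred = x @ W + b
    y_pred = x @ W + b
    d_out = 2.0 * (y_pred - y_true) / y_true.size  # dL/dy_pred for the MSE above
    grad_W = x.T @ d_out                           # dL/dW
    grad_b = d_out.sum(axis=0, keepdims=True)      # dL/db (bias is broadcast over the batch)
    return grad_W, grad_b


def numerical_grad(loss_fn, param, eps=1e-5):
    # Central differences, perturbing `param` in place one entry at a time
    grad = np.zeros_like(param)
    for ix in np.ndindex(param.shape):
        original = param[ix]
        param[ix] = original + eps
        j_plus = loss_fn()
        param[ix] = original - eps
        j_minus = loss_fn()
        param[ix] = original  # restore the entry before moving on
        grad[ix] = (j_plus - j_minus) / (2 * eps)
    return grad


def rel_diff(a, n):
    # Symmetric relative error ||a - n|| / (||a|| + ||n||)
    return np.linalg.norm(a - n) / (np.linalg.norm(a) + np.linalg.norm(n))


rng = np.random.default_rng(15)
x_batch = rng.random((3, 2))   # batch of 3 samples, 2 features
y_batch = rng.random((3, 1))   # arbitrary regression targets
W = rng.standard_normal((2, 1))
b = rng.standard_normal((1, 1))


def loss_fn():
    # Reads the current W and b, so in-place perturbations are picked up
    return mse(y_batch, x_batch @ W + b)


grad_W, grad_b = dense_analytic_grads(x_batch, y_batch, W, b)
print(f"weights rel. diff: {rel_diff(grad_W, numerical_grad(loss_fn, W)):.2e}")
print(f"bias rel. diff:    {rel_diff(grad_b, numerical_grad(loss_fn, b)):.2e}")
```

Because the loss is quadratic in `W` and `b`, the central difference is exact up to floating-point rounding here, so both relative differences should be far below the `1e-4` threshold used in the notebook.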


---

## Section 2: The XOR Problem

Training a 2-layer network to solve the non-linear XOR gate with 100% accuracy.

The XOR problem is a classic benchmark for neural networks. It is not linearly separable: no single straight line divides the inputs whose output is 1 from those whose output is 0, so a single linear layer cannot solve it. We use a 2-layer network whose Tanh hidden layer supplies the non-linearity needed to learn such a decision boundary.
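A quick way to see why the hidden layer is needed: the best least-squares linear fit to the four XOR points predicts 0.5 for every input, while a hand-wired hidden layer of two threshold units (computing OR and AND) solves the problem exactly. The step activation and hand-picked weights below are illustrative only; the trained network learns its own smooth Tanh/Sigmoid solution.

```python
import numpy as np

X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_xor = np.array([0, 1, 1, 0], dtype=float)

# 1) Best linear model y ≈ w1*x1 + w2*x2 + b, fitted by least squares
A = np.hstack([X_xor, np.ones((4, 1))])         # append a bias column
coef, *_ = np.linalg.lstsq(A, y_xor, rcond=None)
print("linear fit predictions:", A @ coef)       # all (numerically) 0.5


# 2) Hand-wired hidden layer with a step activation (illustrative weights)
def step(z):
    return (z > 0).astype(float)


h_or = step(X_xor @ np.array([1.0, 1.0]) - 0.5)   # fires if x1 OR x2
h_and = step(X_xor @ np.array([1.0, 1.0]) - 1.5)  # fires if x1 AND x2
xor = step(h_or - h_and - 0.5)                    # OR but not AND
print("hidden-layer predictions:", xor)
```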

**Network Architecture:** Input (2) → Dense(16) → Tanh → Dense(1) → Sigmoid → Output

**Training Configuration:**
- Learning Rate: 1.0
- Epochs: 10,000
- Weight Initialization: He initialization
- Optimizer: Stochastic Gradient Descent (SGD)
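To make the configuration concrete, here is a self-contained NumPy sketch of the same setup: Dense(2→16) → Tanh → Dense(16→1) → Sigmoid, MSE loss, He-normal initialization, and per-sample SGD with learning rate 1.0. It illustrates the technique under assumptions and does not reproduce `lib`'s internals: it assumes He initialization means weights drawn from N(0, 2/fan_in) with zero biases, that SGD updates after every sample in a fixed order, and `train_xor` is a hypothetical helper name.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def he_init(rng, fan_in, fan_out):
    # Assumed He (Kaiming) normal initialization: std = sqrt(2 / fan_in)
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))


def train_xor(epochs=10000, lr=1.0, hidden=16, seed=0):
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float64)
    y = np.array([[0], [1], [1], [0]], dtype=np.float64)
    W1, b1 = he_init(rng, 2, hidden), np.zeros((1, hidden))
    W2, b2 = he_init(rng, hidden, 1), np.zeros((1, 1))

    for _ in range(epochs):
        for i in range(len(X)):  # one sample per update (SGD), fixed order
            x, t = X[i:i + 1], y[i:i + 1]
            # Forward: Dense -> Tanh -> Dense -> Sigmoid
            h = np.tanh(x @ W1 + b1)
            out = sigmoid(h @ W2 + b2)
            # Backward for MSE = mean((out - t)^2)
            d_out = 2.0 * (out - t) / t.size
            d_z2 = d_out * out * (1.0 - out)  # through the sigmoid
            d_h = d_z2 @ W2.T                 # uses W2 before it is updated
            d_z1 = d_h * (1.0 - h ** 2)       # through the tanh
            # Plain SGD update
            W2 -= lr * h.T @ d_z2
            b2 -= lr * d_z2
            W1 -= lr * x.T @ d_z1
            b1 -= lr * d_z1

    return sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)


sketch_preds = train_xor()
print(np.round(sketch_preds).ravel())
```

Because the seed, initialization details, and sample order differ from the notebook, the loss values will not match the training log; the qualitative outcome (all four inputs classified correctly after rounding) is what the sketch is meant to show.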

In [2]:
# Reload modules and setup
import sys
for module_name in list(sys.modules.keys()):
    if module_name.startswith('lib'):
        del sys.modules[module_name]

from lib.network import Sequential
from lib.layers import Dense
from lib.activations import Tanh, Sigmoid
from lib.losses import MSE

np.random.seed(15)  # For reproducibility

# XOR Data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y = np.array([[0], [1], [1], [0]], dtype=np.float32)

# Build model
model = Sequential()
model.add(Dense(2, 16))
model.add(Tanh())
model.add(Dense(16, 1))
model.add(Sigmoid())
model.use_loss(MSE())

print("XOR Problem Setup Complete")
print(f"Input shape: {X.shape}, Output shape: {y.shape}")

XOR Problem Setup Complete
Input shape: (4, 2), Output shape: (4, 1)


In [3]:
print("Training XOR Network")
print("=" * 50)
model.train(X, y, epochs=10000, learning_rate=1.0)
print("=" * 50)
print("Training Complete")


Training XOR Network
Epoch 1/10000   error=0.27088240466551206
Epoch 101/10000   error=0.015723148486261444
Epoch 201/10000   error=0.004237517087118354
Epoch 301/10000   error=0.002213345401600802
Epoch 401/10000   error=0.001449367307188018
Epoch 501/10000   error=0.001060875756746421
Epoch 601/10000   error=0.0008293014499887652
Epoch 701/10000   error=0.0006769302574281667
Epoch 801/10000   error=0.0005696886690966046
Epoch 901/10000   error=0.0004904328516777413
Epoch 1001/10000   error=0.0004296509477596682
Epoch 1101/10000   error=0.00038166460133589805
Epoch 1201/10000   error=0.00034288529660422037
Epoch 1301/10000   error=0.00031093941830725275
Epoch 1401/10000   error=0.0002841972000571237
Epoch 1501/10000   error=0.0002615042006613833
Epoch 1601/10000   error=0.0002420208045432319
Epoch 1701/10000   error=0.00022512237999513938
Epoch 1801/10000   error=0.00021033501341900644
Epoch 1901/10000   error=0.00019729290299562585
Epoch 2001/10000   error=0.00018570937349583842
Epoc

In [4]:
# Final Results
print("\n" + "=" * 50)
print("FINAL RESULTS")
print("=" * 50)

predictions = model.predict(X)
pred_values = np.array([p.flatten()[0] for p in predictions])
rounded_preds = np.round(pred_values)

print("\nRaw Predictions:")
for inp, pred, rounded in zip(X, pred_values, rounded_preds):
    print(f"  Input {inp} → {pred:.6f} → {int(rounded)}")

print(f"\nFinal Prediction Vector: {rounded_preds}")
print("Expected XOR Output:    [0. 1. 1. 0.]")
print(f"✓ MATCH: {np.array_equal(rounded_preds, np.array([0., 1., 1., 0.]))}")

# Calculate final loss
final_loss = 0
for i in range(len(X)):
    output = model.predict([X[i]])[0]
    final_loss += model.loss.loss(np.array([[y[i][0]]]), output)
final_loss /= len(X)

print(f"\nFinal Loss: {final_loss:.9f}")
print("=" * 50)


FINAL RESULTS

Raw Predictions:
  Input [0. 0.] → 0.003440 → 0
  Input [0. 1.] → 0.994145 → 1
  Input [1. 0.] → 0.994501 → 1
  Input [1. 1.] → 0.006622 → 0

Final Prediction Vector: [0. 1. 1. 0.]
Expected XOR Output:    [0. 1. 1. 0.]
✓ MATCH: True

Final Loss: 0.000030051
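The per-sample loop that accumulates `final_loss` can be replaced by a single vectorized expression, assuming the library's `MSE` is the plain mean of squared errors. As a check on that assumption, plugging in the rounded predictions printed in the results above reproduces the reported loss:

```python
import numpy as np

# Rounded predictions copied from the FINAL RESULTS output
printed_preds = np.array([0.003440, 0.994145, 0.994501, 0.006622])
targets = np.array([0.0, 1.0, 1.0, 0.0])

# Assumes MSE = mean((y - y_hat)^2) averaged over all samples
vectorized_loss = np.mean((targets - printed_preds) ** 2)
print(f"Final Loss: {vectorized_loss:.9f}")  # prints "Final Loss: 0.000030051"
```

In the notebook itself, the same expression applies directly to the existing arrays: `np.mean((y.ravel() - pred_values) ** 2)`.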
