# Task 5.4: Testing Suite

**Module:** 5 - Phase 1 Capstone: MicroGrad+  
**Time:** 1.5 hours  
**Difficulty:** ⭐⭐⭐

---

## 🎯 Learning Objectives

By the end of this notebook, you will:
- [ ] Understand why testing is crucial for neural network code
- [ ] Write unit tests for tensor operations
- [ ] Implement gradient checking to verify backpropagation
- [ ] Test neural network layers and training loops

---

## 📚 Prerequisites

- Completed: Tasks 5.1-5.3
- Knowledge of: Python testing basics

---

## 🌍 Real-World Context

Testing neural network code is notoriously difficult because:
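1. **Silent failures**: Wrong gradients don't crash - the model just trains slowly or converges to a worse solution
2. **Randomness**: Random initialization and stochastic training make results vary from run to run unless seeds are fixed
3. **Numerical issues**: Floating-point rounding rules out exact comparisons, so small real errors can hide inside a tolerance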

Good tests catch bugs before they waste days of training time!
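
To make the first point concrete, here is a minimal sketch in plain Python (no MicroGrad+ required). The names `train`, `correct_grad`, and `buggy_grad` are hypothetical and exist only for this illustration; the bug is a wrong constant factor in the derivative of $(x - 3)^2$.

```python
def train(grad_fn, lr=0.1, steps=20):
    """Run plain gradient descent on f(x) = (x - 3)^2, starting from x = 0."""
    x = 0.0
    for _ in range(steps):
        x -= lr * grad_fn(x)
    return x

def correct_grad(x):
    return 2.0 * (x - 3.0)   # true derivative of (x - 3)^2

def buggy_grad(x):
    return 0.5 * (x - 3.0)   # hypothetical bug: wrong constant factor

x_correct = train(correct_grad)
x_buggy = train(buggy_grad)

# Neither run raises an error; the buggy one simply ends further from x = 3.
print(f"correct gradient: x = {x_correct:.4f}")
print(f"buggy gradient:   x = {x_buggy:.4f}")
```

Both runs finish without an exception and both move toward the optimum, so only a comparison against a known-good gradient reveals that the second one is wrong.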

---

## 🧒 ELI5: Why Test Neural Networks?

> **Imagine you're building a very complicated LEGO spaceship** from a 1000-page instruction book.
>
> - **Without tests**: You build the whole thing, then discover it doesn't fly because one tiny piece on page 347 was wrong. Now you have to check everything!
>
> - **With tests**: After each section, you test that part works. "Wheels spin? Check! Wings attach? Check!" If something breaks, you know exactly which section to fix.
>
> **In AI terms:** Neural network bugs often don't cause errors - they just make the model learn slowly or learn wrong things. Tests help us catch these subtle bugs early.

In [None]:
# Setup
import numpy as np
import sys
from pathlib import Path

# Robust path resolution - works regardless of working directory
def _find_module_root():
    """Find the module root directory containing micrograd_plus."""
    current = Path.cwd()
    for parent in [current] + list(current.parents):
        if (parent / 'micrograd_plus' / '__init__.py').exists():
            return str(parent)
    return str(Path.cwd().parent)

sys.path.insert(0, _find_module_root())

from micrograd_plus import (
    Tensor, Linear, ReLU, Sigmoid, Softmax, Dropout,
    MSELoss, CrossEntropyLoss, SGD, Adam, Sequential
)
from micrograd_plus.utils import set_seed, numerical_gradient

set_seed(42)

---

## Part 1: Testing Framework

We'll build a simple testing framework. In production, you'd use `pytest`, but this shows the concepts.
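
For comparison, here is a rough sketch of what a `pytest`-style test file might look like. By default `pytest` collects functions whose names start with `test_` from files named `test_*.py` or `*_test.py` and reports any failing `assert`. The `add` function is a hypothetical stand-in for the code under test, not part of MicroGrad+.

```python
import math

def add(a, b):
    """Stand-in for the code under test (hypothetical helper)."""
    return a + b

# With pytest, a plain `assert` is enough: the runner records passes and
# failures itself, so no TestResult-style bookkeeping is needed.
def test_add_integers():
    assert add(2, 3) == 5

def test_add_floats():
    # Compare floats with a tolerance rather than with ==
    assert math.isclose(add(0.1, 0.2), 0.3, abs_tol=1e-9)

if __name__ == "__main__":
    # Lets the file run directly even when pytest is not installed
    test_add_integers()
    test_add_floats()
    print("ok")
```

The hand-rolled framework in this notebook does the same job as the runner: it calls each test function, catches `AssertionError`, and tallies the results.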

In [None]:
class TestResult:
    """Store results of a test run."""
    def __init__(self):
        self.passed = 0
        self.failed = 0
        self.errors = []
    
    def add_pass(self, name):
        self.passed += 1
        print(f"  ✅ {name}")
    
    def add_fail(self, name, message):
        self.failed += 1
        self.errors.append((name, message))
        print(f"  ❌ {name}: {message}")
    
    def summary(self):
        total = self.passed + self.failed
        print(f"\n{'='*50}")
        print(f"Results: {self.passed}/{total} tests passed")
        if self.failed > 0:
            print("\nFailed tests:")
            for name, msg in self.errors:
                print(f"  - {name}: {msg}")
        return self.failed == 0

def assert_close(actual, expected, name, atol=1e-5):
    """Assert two values are close."""
    if isinstance(actual, Tensor):
        actual = actual.data
    if isinstance(expected, Tensor):
        expected = expected.data
    
    actual = np.array(actual)
    expected = np.array(expected)
    
    if not np.allclose(actual, expected, atol=atol):
        max_diff = np.max(np.abs(actual - expected))
        # Include the check's name so a failure points at the exact assertion
        raise AssertionError(f"{name}: max diff = {max_diff:.2e}, expected < {atol}")

def run_test(test_fn, result):
    """Run a single test function."""
    name = test_fn.__name__
    try:
        test_fn()
        result.add_pass(name)
    except AssertionError as e:
        result.add_fail(name, str(e))
    except Exception as e:
        result.add_fail(name, f"Error: {type(e).__name__}: {e}")

---

## Part 2: Testing Tensor Operations

Let's test that basic tensor operations work correctly.

In [None]:
def test_tensor_creation():
    """Test tensor can be created from various inputs."""
    # From list
    t1 = Tensor([1, 2, 3])
    assert t1.shape == (3,)
    
    # From numpy array
    t2 = Tensor(np.array([[1, 2], [3, 4]]))
    assert t2.shape == (2, 2)
    
    # From scalar
    t3 = Tensor(5.0)
    assert t3.shape == ()

def test_tensor_addition():
    """Test element-wise addition."""
    a = Tensor([1, 2, 3])
    b = Tensor([4, 5, 6])
    c = a + b
    assert_close(c, [5, 7, 9], "addition result")

def test_tensor_multiplication():
    """Test element-wise multiplication."""
    a = Tensor([1, 2, 3])
    b = Tensor([4, 5, 6])
    c = a * b
    assert_close(c, [4, 10, 18], "multiplication result")

def test_tensor_matmul():
    """Test matrix multiplication."""
    a = Tensor([[1, 2], [3, 4]])
    b = Tensor([[5, 6], [7, 8]])
    c = a @ b
    expected = np.array([[19, 22], [43, 50]])
    assert_close(c, expected, "matmul result")

def test_tensor_broadcasting():
    """Test broadcasting in operations."""
    a = Tensor([[1, 2, 3], [4, 5, 6]])  # (2, 3)
    b = Tensor([10, 20, 30])  # (3,) broadcasts to (2, 3)
    c = a + b
    expected = np.array([[11, 22, 33], [14, 25, 36]])
    assert_close(c, expected, "broadcast addition")

def test_tensor_sum():
    """Test sum reduction."""
    a = Tensor([[1, 2], [3, 4]])
    assert_close(a.sum(), 10, "total sum")
    assert_close(a.sum(axis=0), [4, 6], "sum axis 0")
    assert_close(a.sum(axis=1), [3, 7], "sum axis 1")

def test_tensor_mean():
    """Test mean reduction."""
    a = Tensor([[1, 2], [3, 4]])
    assert_close(a.mean(), 2.5, "mean")

# Run tensor tests
print("Testing Tensor Operations:")
print("-" * 50)
result = TestResult()

for test in [test_tensor_creation, test_tensor_addition, test_tensor_multiplication,
             test_tensor_matmul, test_tensor_broadcasting, test_tensor_sum, test_tensor_mean]:
    run_test(test, result)

result.summary()

---

## Part 3: Gradient Checking

The most important tests for an autograd system verify that **analytical gradients match numerical gradients**.

### The Idea

Numerical gradient using finite differences:
$$\frac{\partial f}{\partial x} \approx \frac{f(x + \epsilon) - f(x - \epsilon)}{2\epsilon}$$

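This is slow, since it needs two evaluations of $f$ for every element of $x$, but it relies only on the forward pass. That makes it an independent reference for our fast analytical gradient.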
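
As a small worked example of the formula, the sketch below applies it to $f(x) = x^3$ at $x = 2$, where the exact derivative is $3x^2 = 12$. It also evaluates a one-sided (forward) difference for contrast. The helper names `central_difference` and `forward_difference` are chosen for this illustration and are not the `numerical_gradient` utility from `micrograd_plus.utils`.

```python
def f(x):
    return x ** 3

def central_difference(f, x, eps=1e-5):
    """Two-sided estimate: truncation error shrinks like eps^2."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def forward_difference(f, x, eps=1e-5):
    """One-sided estimate: truncation error shrinks only like eps."""
    return (f(x + eps) - f(x)) / eps

x = 2.0
analytical = 3 * x ** 2  # exact derivative of x^3 at x = 2

central_err = abs(central_difference(f, x) - analytical)
forward_err = abs(forward_difference(f, x) - analytical)

print(f"central difference error: {central_err:.2e}")
print(f"forward difference error: {forward_err:.2e}")
```

With the same `eps`, the central estimate is several orders of magnitude closer to the exact value, which is why gradient checks typically use the two-sided form.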

In [None]:
def gradient_check(f, x, eps=1e-5, atol=1e-4):
    """
    Check that analytical gradient matches numerical gradient.
    
    Args:
        f: Function that takes a Tensor and returns a scalar Tensor
        x: Input tensor with requires_grad=True
        eps: Perturbation for numerical gradient
        atol: Absolute tolerance for comparison
    
    Returns:
        (passed: bool, max_error: float)
    """
    # Compute analytical gradient
    x.zero_grad()
    y = f(x)
    y.backward()
    analytical = x.grad.copy()
    
    # Compute numerical gradient
    def numpy_f(arr):
        return f(Tensor(arr)).data.item()
    
    numerical = numerical_gradient(numpy_f, x.data.copy(), eps)
    
    # Compare
    max_error = np.max(np.abs(analytical - numerical))
    passed = np.allclose(analytical, numerical, atol=atol)
    
    return passed, max_error

In [None]:
def test_gradient_addition():
    """Test gradient of addition."""
    x = Tensor([1.0, 2.0, 3.0], requires_grad=True)
    passed, error = gradient_check(lambda t: (t + 5).sum(), x)
    assert passed, f"gradient error: {error:.2e}"

def test_gradient_multiplication():
    """Test gradient of element-wise multiplication."""
    x = Tensor([1.0, 2.0, 3.0], requires_grad=True)
    y = Tensor([4.0, 5.0, 6.0])
    passed, error = gradient_check(lambda t: (t * y).sum(), x)
    assert passed, f"gradient error: {error:.2e}"

def test_gradient_power():
    """Test gradient of power operation."""
    x = Tensor([1.0, 2.0, 3.0], requires_grad=True)
    passed, error = gradient_check(lambda t: (t ** 2).sum(), x)
    assert passed, f"gradient error: {error:.2e}"

def test_gradient_matmul():
    """Test gradient of matrix multiplication."""
    np.random.seed(42)
    # float64 inputs: float32 rounding (~1e-7) divided by eps=1e-5 can exceed atol
    x = Tensor(np.random.randn(3, 4).astype(np.float64), requires_grad=True)
    w = Tensor(np.random.randn(4, 2).astype(np.float64))
    passed, error = gradient_check(lambda t: (t @ w).sum(), x)
    assert passed, f"gradient error: {error:.2e}"

def test_gradient_relu():
    """Test gradient of ReLU."""
    # Avoid x=0 where gradient is undefined
    x = Tensor([-2.0, -0.5, 0.5, 2.0], requires_grad=True)
    passed, error = gradient_check(lambda t: t.relu().sum(), x)
    assert passed, f"gradient error: {error:.2e}"

def test_gradient_sigmoid():
    """Test gradient of sigmoid."""
    x = Tensor([-2.0, 0.0, 2.0], requires_grad=True)
    passed, error = gradient_check(lambda t: t.sigmoid().sum(), x)
    assert passed, f"gradient error: {error:.2e}"

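def test_gradient_softmax():
    """Test gradient of softmax."""
    x = Tensor([[1.0, 2.0, 3.0]], requires_grad=True)
    # Softmax outputs always sum to 1, so a plain .sum() has zero gradient
    # everywhere; weight the outputs so the check exercises the real Jacobian
    w = Tensor([[1.0, 2.0, 3.0]])
    passed, error = gradient_check(lambda t: (t.softmax() * w).sum(), x)
    assert passed, f"gradient error: {error:.2e}"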

def test_gradient_log():
    """Test gradient of log."""
    x = Tensor([1.0, 2.0, 3.0], requires_grad=True)
    passed, error = gradient_check(lambda t: t.log().sum(), x)
    assert passed, f"gradient error: {error:.2e}"

def test_gradient_mean():
    """Test gradient of mean."""
    x = Tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
    passed, error = gradient_check(lambda t: t.mean(), x)
    assert passed, f"gradient error: {error:.2e}"

# Run gradient tests
print("\nTesting Gradients:")
print("-" * 50)
result = TestResult()

for test in [test_gradient_addition, test_gradient_multiplication, test_gradient_power,
             test_gradient_matmul, test_gradient_relu, test_gradient_sigmoid,
             test_gradient_softmax, test_gradient_log, test_gradient_mean]:
    run_test(test, result)

result.summary()
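
The ReLU test above deliberately avoids `x = 0`. The plain-Python sketch below shows why: at the kink, the central difference straddles both branches and returns a value that matches neither one-sided slope. The `relu` and `central_difference` helpers are defined here for illustration only.

```python
def relu(x):
    return max(0.0, x)

def central_difference(f, x, eps=1e-5):
    return (f(x + eps) - f(x - eps)) / (2 * eps)

at_kink = central_difference(relu, 0.0)    # straddles the kink
positive = central_difference(relu, 0.5)   # slope 1 on the positive side
negative = central_difference(relu, -0.5)  # slope 0 on the negative side

print(f"x =  0.0: {at_kink}")
print(f"x =  0.5: {positive:.6f}")
print(f"x = -0.5: {negative:.6f}")
```

At `x = 0` the estimate is 0.5, so whichever subgradient the implementation picks (0 or 1), a gradient check at that point would report a spurious failure. The same caution applies to any test input that lies within `eps` of a kink.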

---

## Part 4: Testing Layers

In [None]:
def test_linear_layer_shape():
    """Test Linear layer output shape."""
    layer = Linear(10, 5)
    x = Tensor(np.random.randn(3, 10).astype(np.float32))
    y = layer(x)
    assert y.shape == (3, 5), f"expected (3, 5), got {y.shape}"

def test_linear_layer_gradient():
    """Test Linear layer gradients."""
    np.random.seed(42)
    layer = Linear(4, 3)
    x = Tensor(np.random.randn(2, 4).astype(np.float32), requires_grad=True)
    
    # Forward and backward
    y = layer(x)
    loss = y.sum()
    loss.backward()
    
    # Check gradients exist
    assert layer.weight.grad is not None, "weight gradient is None"
    assert layer.bias.grad is not None, "bias gradient is None"
    assert x.grad is not None, "input gradient is None"
    
    # Check gradient shapes
    assert layer.weight.grad.shape == (4, 3), f"weight grad shape wrong: {layer.weight.grad.shape}"
    assert layer.bias.grad.shape == (3,), f"bias grad shape wrong: {layer.bias.grad.shape}"
    assert x.grad.shape == (2, 4), f"input grad shape wrong: {x.grad.shape}"

def test_dropout_training_mode():
    """Test Dropout drops values in training mode."""
    set_seed(42)
    dropout = Dropout(p=0.5)
    dropout.train()
    
    x = Tensor(np.ones((100, 100)))
    y = dropout(x)
    
    # Some values should be 0
    num_zeros = np.sum(y.data == 0)
    assert num_zeros > 0, "no values dropped"
    
    # Non-zero values should be scaled (approximately 2x for p=0.5)
    non_zero_vals = y.data[y.data != 0]
    mean_non_zero = np.mean(non_zero_vals)
    assert 1.8 < mean_non_zero < 2.2, f"scaling wrong: {mean_non_zero}"

def test_dropout_eval_mode():
    """Test Dropout is identity in eval mode."""
    dropout = Dropout(p=0.5)
    dropout.eval()
    
    x = Tensor(np.ones((10, 10)))
    y = dropout(x)
    
    assert_close(y, x, "dropout in eval mode should be identity")

def test_sequential_forward():
    """Test Sequential container forward pass."""
    model = Sequential(
        Linear(10, 5),
        ReLU(),
        Linear(5, 2)
    )
    
    x = Tensor(np.random.randn(3, 10).astype(np.float32))
    y = model(x)
    
    assert y.shape == (3, 2), f"expected (3, 2), got {y.shape}"

def test_sequential_parameters():
    """Test Sequential collects all parameters."""
    model = Sequential(
        Linear(10, 5),  # 10*5 + 5 = 55 params
        ReLU(),  # 0 params
        Linear(5, 2)  # 5*2 + 2 = 12 params
    )
    
    params = model.parameters()
    total = sum(p.size for p in params)
    
    assert total == 67, f"expected 67 params, got {total}"

# Run layer tests
print("\nTesting Layers:")
print("-" * 50)
result = TestResult()

for test in [test_linear_layer_shape, test_linear_layer_gradient,
             test_dropout_training_mode, test_dropout_eval_mode,
             test_sequential_forward, test_sequential_parameters]:
    run_test(test, result)

result.summary()
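
The dropout test above expects surviving values near 2 when `p=0.5`, which suggests that MicroGrad+ uses *inverted dropout*: survivors are scaled by $1/(1-p)$ during training so that the expected output equals the input. The following plain-Python sketch illustrates that idea under this assumption. `inverted_dropout` is a hypothetical helper, not the library's implementation.

```python
import random

def inverted_dropout(values, p, rng):
    """Zero each value with probability p; scale survivors by 1 / (1 - p)."""
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]

rng = random.Random(0)          # fixed seed keeps the example reproducible
x = [1.0] * 10_000
y = inverted_dropout(x, p=0.5, rng=rng)

kept = [v for v in y if v != 0.0]
drop_fraction = 1 - len(kept) / len(y)
mean_output = sum(y) / len(y)

print(f"fraction dropped: {drop_fraction:.3f}")
print(f"survivor value:   {kept[0]}")
print(f"mean of output:   {mean_output:.3f}")
```

Every survivor equals exactly 2.0, about half of the values are zeroed, and the mean of the output stays close to 1. Checking the drop fraction as well as the scaling would make the dropout test stricter.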

---

## Part 5: Testing Loss Functions

In [None]:
def test_mse_loss_value():
    """Test MSE loss computes correct value."""
    pred = Tensor([1.0, 2.0, 3.0])
    target = Tensor([1.5, 2.0, 2.5])
    
    loss_fn = MSELoss()
    loss = loss_fn(pred, target)
    
    expected = np.mean([0.25, 0, 0.25])  # (0.5^2 + 0 + 0.5^2) / 3
    assert_close(loss, expected, "MSE value")

def test_mse_loss_gradient():
    """Test MSE loss gradient."""
    pred = Tensor([1.0, 2.0, 3.0], requires_grad=True)
    target = Tensor([1.5, 2.0, 2.5])
    
    loss_fn = MSELoss()
    passed, error = gradient_check(
        lambda p: loss_fn(p, target), 
        pred
    )
    assert passed, f"gradient error: {error:.2e}"

def test_cross_entropy_loss_gradient():
    """Test cross-entropy loss gradient."""
    logits = Tensor([[2.0, 1.0, 0.1], [0.5, 2.0, 0.3]], requires_grad=True)
    targets = Tensor([0, 1])
    
    loss_fn = CrossEntropyLoss()
    
    # Gradient check
    passed, error = gradient_check(
        lambda l: loss_fn(l, targets),
        logits,
        atol=1e-3  # Slightly higher tolerance for CE
    )
    assert passed, f"gradient error: {error:.2e}"

# Run loss tests
print("\nTesting Loss Functions:")
print("-" * 50)
result = TestResult()

for test in [test_mse_loss_value, test_mse_loss_gradient, test_cross_entropy_loss_gradient]:
    run_test(test, result)

result.summary()
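
To see what the cross-entropy gradient check is comparing, here is a self-contained sketch for a single sample in plain Python. It uses the standard result that the gradient of softmax cross-entropy with respect to the logits is `softmax(logits) - one_hot(target)`. The helper names are illustrative, and the sketch assumes an unbatched loss, whereas `CrossEntropyLoss` in MicroGrad+ may average over the batch.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability assigned to the target class."""
    return -math.log(softmax(logits)[target])

def analytical_grad(logits, target):
    """d(loss)/d(logits) = softmax(logits) - one_hot(target)."""
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]

def numerical_grad(logits, target, eps=1e-5):
    """Central difference, perturbing one logit at a time."""
    grad = []
    for i in range(len(logits)):
        plus = list(logits)
        minus = list(logits)
        plus[i] += eps
        minus[i] -= eps
        diff = cross_entropy(plus, target) - cross_entropy(minus, target)
        grad.append(diff / (2 * eps))
    return grad

logits = [2.0, 1.0, 0.1]  # first row of the logits used in the test above
target = 0

analytical = analytical_grad(logits, target)
numerical = numerical_grad(logits, target)
max_error = max(abs(a - n) for a, n in zip(analytical, numerical))
print(f"max error: {max_error:.2e}")
```

The gradient components sum to zero because the softmax probabilities sum to one, which is a cheap extra invariant to assert in a test.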

---

## Part 6: Testing Optimizers

In [None]:
def test_sgd_basic_step():
    """Test SGD performs correct update."""
    x = Tensor([1.0], requires_grad=True)
    optimizer = SGD([x], lr=0.1)
    
    # Simulate gradient
    x.grad = np.array([2.0])  # Gradient of 2
    
    # Step
    optimizer.step()
    
    # x = x - lr * grad = 1.0 - 0.1 * 2.0 = 0.8
    assert_close(x.data, [0.8], "SGD step")

def test_sgd_momentum():
    """Test SGD with momentum accumulates velocity."""
    x = Tensor([1.0], requires_grad=True)
    optimizer = SGD([x], lr=0.1, momentum=0.9)
    
    # Two steps with same gradient
    x.grad = np.array([1.0])
    optimizer.step()  # v = 1.0, x = 1.0 - 0.1 * 1.0 = 0.9
    
    x.grad = np.array([1.0])
    optimizer.step()  # v = 0.9 * 1.0 + 1.0 = 1.9, x = 0.9 - 0.1 * 1.9 = 0.71
    
    assert_close(x.data, [0.71], "SGD with momentum", atol=1e-4)

def test_adam_converges():
    """Test Adam converges on simple problem."""
    x = Tensor([0.0], requires_grad=True)
    optimizer = Adam([x], lr=0.1)
    
    # Minimize (x - 3)^2
    for _ in range(100):
        loss = (x - 3) ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    
    assert abs(x.item() - 3.0) < 0.1, f"Adam didn't converge: x = {x.item()}"

def test_optimizer_zero_grad():
    """Test zero_grad resets gradients."""
    x = Tensor([1.0], requires_grad=True)
    optimizer = SGD([x], lr=0.1)
    
    # Set gradient
    x.grad = np.array([5.0])
    
    # Zero it
    optimizer.zero_grad()
    
    assert_close(x.grad, [0.0], "zero_grad")

# Run optimizer tests
print("\nTesting Optimizers:")
print("-" * 50)
result = TestResult()

for test in [test_sgd_basic_step, test_sgd_momentum, test_adam_converges, test_optimizer_zero_grad]:
    run_test(test, result)

result.summary()
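
The arithmetic in the comments of `test_sgd_momentum` can be reproduced in plain Python. This sketch assumes the update convention those comments imply, `v = momentum * v + g` followed by `x = x - lr * v`. `sgd_momentum_steps` is a hypothetical helper written for this walkthrough.

```python
def sgd_momentum_steps(x, grads, lr, momentum):
    """Apply v = momentum * v + g, then x = x - lr * v, once per gradient."""
    v = 0.0
    history = []
    for g in grads:
        v = momentum * v + g
        x = x - lr * v
        history.append((v, x))
    return history

history = sgd_momentum_steps(x=1.0, grads=[1.0, 1.0], lr=0.1, momentum=0.9)
for step, (v, x) in enumerate(history, start=1):
    print(f"step {step}: v = {v:.2f}, x = {x:.2f}")
# step 1: v = 1.00, x = 0.90
# step 2: v = 1.90, x = 0.71
```

An implementation that uses a dampened update such as `v = momentum * v + (1 - momentum) * g` would produce a different value after two steps, so the expected `0.71` encodes a specific convention rather than a universal property of momentum.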

---

## Part 7: End-to-End Training Test

In [None]:
def test_end_to_end_training():
    """Test full training loop works and improves loss."""
    set_seed(42)
    
    # Simple XOR problem
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
    y = np.array([0, 1, 1, 0], dtype=np.int32)  # XOR
    
    # Model
    model = Sequential(
        Linear(2, 8),
        ReLU(),
        Linear(8, 2)
    )
    
    loss_fn = CrossEntropyLoss()
    optimizer = Adam(model.parameters(), lr=0.1)
    
    # Training
    X_tensor = Tensor(X, requires_grad=True)
    y_tensor = Tensor(y)
    
    initial_loss = loss_fn(model(X_tensor), y_tensor).item()
    
    for _ in range(500):
        logits = model(X_tensor)
        loss = loss_fn(logits, y_tensor)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    
    final_loss = loss_fn(model(X_tensor), y_tensor).item()
    
    # Check loss decreased
    assert final_loss < initial_loss * 0.1, \
        f"loss didn't decrease enough: {initial_loss:.4f} -> {final_loss:.4f}"
    
    # Check accuracy
    model.eval()
    predictions = np.argmax(model(X_tensor).data, axis=1)
    accuracy = np.mean(predictions == y)
    
    assert accuracy == 1.0, f"didn't solve XOR: accuracy = {accuracy}"

# Run end-to-end test
print("\nTesting End-to-End Training:")
print("-" * 50)
result = TestResult()
run_test(test_end_to_end_training, result)
result.summary()

---

## Part 8: Running All Tests

Let's run the complete test suite.

In [None]:
# Complete test suite
all_tests = [
    # Tensor operations
    test_tensor_creation, test_tensor_addition, test_tensor_multiplication,
    test_tensor_matmul, test_tensor_broadcasting, test_tensor_sum, test_tensor_mean,
    
    # Gradients
    test_gradient_addition, test_gradient_multiplication, test_gradient_power,
    test_gradient_matmul, test_gradient_relu, test_gradient_sigmoid,
    test_gradient_softmax, test_gradient_log, test_gradient_mean,
    
    # Layers
    test_linear_layer_shape, test_linear_layer_gradient,
    test_dropout_training_mode, test_dropout_eval_mode,
    test_sequential_forward, test_sequential_parameters,
    
    # Loss functions
    test_mse_loss_value, test_mse_loss_gradient, test_cross_entropy_loss_gradient,
    
    # Optimizers
    test_sgd_basic_step, test_sgd_momentum, test_adam_converges, test_optimizer_zero_grad,
    
    # End-to-end
    test_end_to_end_training,
]

print("="*60)
print("FULL TEST SUITE")
print("="*60)

result = TestResult()
for test in all_tests:
    run_test(test, result)

all_passed = result.summary()

if all_passed:
    print("\n🎉 All tests passed! No bugs detected in your MicroGrad+ implementation.")
else:
    print("\n⚠️ Some tests failed. Review the errors above.")

---

## ✋ Try It Yourself: Write Your Own Tests

Write tests for:
1. Tensor `reshape` operation
2. Tensor `tanh` activation gradient
3. `BatchNorm` layer (if you implemented it)

In [None]:
# YOUR CODE HERE
def test_tensor_reshape():
    """Test reshape operation."""
    # TODO: Create a tensor, reshape it, verify shape and values
    pass

def test_gradient_tanh():
    """Test tanh gradient."""
    # TODO: Use gradient_check on tanh
    pass

# Run your tests
# result = TestResult()
# run_test(test_tensor_reshape, result)
# run_test(test_gradient_tanh, result)
# result.summary()

---

## 🎉 Checkpoint

You've learned:
- ✅ How to structure tests for neural network code
- ✅ How to verify tensor operations work correctly
- ✅ How to use gradient checking to validate backpropagation
- ✅ How to test layers, loss functions, and optimizers
- ✅ How to run end-to-end training tests

---

## 🧹 Cleanup

In [None]:
# Cleanup - release memory
from micrograd_plus.utils import cleanup_notebook
cleanup_notebook(globals())