# Lab 1.7.6 Solutions: Documentation and Benchmarks

This notebook contains notes and additional examples for the documentation and benchmarking concepts from Lab 1.7.6.

---

## Note on This Notebook

Lab 1.7.6 (Documentation and Benchmarks) is primarily an educational walkthrough that:

1. Reviews the API documentation of MicroGrad+
2. Compares MicroGrad+ performance against PyTorch
3. Explains why PyTorch is faster
4. Summarizes what was accomplished in the capstone

Unlike the previous notebooks, Lab 1.7.6 does not have specific exercises to solve. Instead, this solution notebook provides additional context and examples for students who want to explore further.

In [None]:
import numpy as np
import sys
from pathlib import Path

def _find_module_root():
    current = Path.cwd()
    for parent in [current] + list(current.parents):
        if (parent / 'micrograd_plus' / '__init__.py').exists():
            return str(parent)
    return str(Path.cwd().parent)

sys.path.insert(0, _find_module_root())

from micrograd_plus import (
    Tensor, Linear, ReLU, Dropout, Sequential,
    CrossEntropyLoss, MSELoss, Adam, SGD
)
from micrograd_plus.utils import set_seed

set_seed(42)

---

## Additional Benchmarking Examples

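The helper below times a single callable. It runs a few untimed warmup iterations first, then reports the mean, standard deviation, minimum, and maximum over the timed runs, all in milliseconds.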

In [None]:
import time

def benchmark_operation(name, operation, n_runs=100, warmup=5):
    """
    Benchmark a single operation with proper warmup and timing.
    
    Args:
        name: Name of the operation for display
        operation: Callable that performs the operation
        n_runs: Number of timed runs
        warmup: Number of warmup runs (not timed)
    
    Returns:
        Dictionary with timing statistics
    """
    # Warmup runs
    for _ in range(warmup):
        operation()
    
    # Timed runs
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        operation()
        elapsed = time.perf_counter() - start
        times.append(elapsed)
    
    times = np.array(times)
    
    return {
        'name': name,
        'mean_ms': times.mean() * 1000,
        'std_ms': times.std() * 1000,
        'min_ms': times.min() * 1000,
        'max_ms': times.max() * 1000
    }

# Example: Benchmark matrix multiplication
A = Tensor(np.random.randn(256, 512).astype(np.float32))
B = Tensor(np.random.randn(512, 256).astype(np.float32))

result = benchmark_operation(
    'MatMul (256x512 @ 512x256)',
    lambda: A @ B,
    n_runs=100
)

print(f"Operation: {result['name']}")
print(f"  Mean: {result['mean_ms']:.2f} ms")
print(f"  Std:  {result['std_ms']:.2f} ms")
print(f"  Min:  {result['min_ms']:.2f} ms")
print(f"  Max:  {result['max_ms']:.2f} ms")
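
As a cross-check on the hand-rolled timer above, the standard library's `timeit` module can time the same kind of callable. The following is a minimal sketch under stated assumptions: it uses plain NumPy arrays instead of MicroGrad+ `Tensor` objects so that it runs on its own, and `summarize_timings`, `A_np`, and `B_np` are hypothetical names chosen for illustration, not part of MicroGrad+.

```python
import timeit

import numpy as np


def summarize_timings(operation, repeat=5, number=20):
    """Time `operation` with timeit and return per-call statistics in ms.

    `summarize_timings` is a hypothetical helper for illustration only.
    """
    # timeit.repeat returns one total (in seconds) per repeat;
    # each total covers `number` consecutive calls of `operation`.
    totals = timeit.repeat(operation, repeat=repeat, number=number)
    per_call_ms = np.array(totals) / number * 1000
    return {
        'best_ms': float(per_call_ms.min()),
        'median_ms': float(np.median(per_call_ms)),
        'worst_ms': float(per_call_ms.max()),
    }


# Assumption: NumPy arrays stand in for Tensor objects of the same shape.
rng = np.random.default_rng(42)
A_np = rng.standard_normal((256, 512)).astype(np.float32)
B_np = rng.standard_normal((512, 256)).astype(np.float32)

stats = summarize_timings(lambda: A_np @ B_np)
print(f"Best:   {stats['best_ms']:.3f} ms")
print(f"Median: {stats['median_ms']:.3f} ms")
print(f"Worst:  {stats['worst_ms']:.3f} ms")
```

The median is reported here because it is less sensitive than the mean to an occasional slow run caused by other activity on the machine. To time MicroGrad+ itself, the same helper could be passed `lambda: A @ B` with the `Tensor` objects defined earlier.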

---

## Writing Good Documentation

Here's an example of comprehensive documentation following Google-style docstrings.

In [None]:
def example_documented_function(input_tensor, learning_rate=0.01, momentum=0.9,
                                 epsilon=1e-8, verbose=False):
    """
    Perform an example computation with comprehensive documentation.
    
    This function demonstrates how to write good documentation following
    Google-style docstring conventions. The docstring should include:
    
    1. A brief one-line summary
    2. A longer description if needed
    3. Arguments with types and descriptions
    4. Return value with type and description
    5. Exceptions that might be raised
    6. Example usage
    
    Args:
        input_tensor (Tensor): Input tensor of shape (batch_size, features).
            Must be a Tensor object with requires_grad=True for
            gradient computation.
        learning_rate (float): Step size for the computation. Defaults to 0.01.
            Typical values are in the range [1e-4, 1e-1].
        momentum (float): Momentum factor for smoothing. Defaults to 0.9.
            Set to 0 to disable momentum.
        epsilon (float): Small constant for numerical stability. Defaults to 1e-8.
        verbose (bool): Whether to print debug information. Defaults to False.
    
    Returns:
        Tensor: The computed output tensor of the same shape as input.
            The output maintains gradient tracking if input has gradients.
    
    Raises:
        ValueError: If input_tensor has fewer than 2 dimensions.
        TypeError: If input_tensor is not a Tensor object.
    
    Example:
        >>> x = Tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
        >>> output = example_documented_function(x, learning_rate=0.01)
        >>> print(output.shape)
        (2, 2)
    
    Note:
        This is a demonstration function and doesn't perform any
        meaningful computation. See the MicroGrad+ source code for
        real implementations.
    """
    if not isinstance(input_tensor, Tensor):
        raise TypeError(f"Expected Tensor, got {type(input_tensor)}")
    
    if len(input_tensor.shape) < 2:
        raise ValueError(f"Input must have at least 2 dimensions, got {len(input_tensor.shape)}")
    
    if verbose:
        print(f"Input shape: {input_tensor.shape}")
        print(f"Learning rate: {learning_rate}")
    
    # Example computation (just returns scaled input)
    return input_tensor * learning_rate

# Display the rendered docstring
help(example_documented_function)

---

## Summary

Key documentation best practices:

1. **Module docstring**: Explain what the module does and provide a quick example
2. **Class docstring**: Describe purpose, args, attributes, and usage
3. **Method docstring**: Document args, returns, raises, and provide examples
4. **Type hints**: Use them for better IDE support and clarity
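
The practices above can be sketched on a small class. This is an illustrative example only: `RunningMean` is a hypothetical class invented for this sketch, not part of MicroGrad+. It combines a Google-style class docstring, type hints, and a docstring example that the standard library's `doctest` module can check.

```python
import doctest


class RunningMean:
    """Track the arithmetic mean of a stream of numbers.

    Hypothetical class used only to illustrate docstring layout.

    Attributes:
        count: Number of values seen so far.
        total: Sum of the values seen so far.

    Example:
        >>> tracker = RunningMean()
        >>> tracker.update(2.0)
        2.0
        >>> tracker.update(4.0)
        3.0
    """

    def __init__(self) -> None:
        self.count: int = 0
        self.total: float = 0.0

    def update(self, value: float) -> float:
        """Add a value and return the updated mean.

        Args:
            value: The new observation.

        Returns:
            The mean of all values seen so far, including `value`.

        Raises:
            TypeError: If `value` is not an int or float.
        """
        if not isinstance(value, (int, float)):
            raise TypeError(f"Expected int or float, got {type(value)}")
        self.count += 1
        self.total += value
        return self.total / self.count


# Runs the examples in the class docstring; prints nothing if they all pass.
doctest.run_docstring_examples(RunningMean, globals(), name="RunningMean")
```

Because the types live in the signature, the `Args` entries here describe meaning only. Putting the expected output on the line after each `>>>` prompt is what lets `doctest` verify that the example still matches the code.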

Key benchmarking best practices:

1. **Warmup runs**: Always run a few iterations before timing
2. **Multiple runs**: Average over many runs for reliable results
3. **Report statistics**: Include mean, std, min, max
4. **Fair comparison**: Compare at the same batch sizes and configurations
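
One way to apply these points when comparing two implementations is sketched below. It is a self-contained illustration rather than the lab's actual MicroGrad+ versus PyTorch benchmark: a pure-Python loop matrix multiply is compared with NumPy's `@` on identical inputs, and `time_ms`, `matmul_loops`, `lhs`, and `rhs` are hypothetical names introduced for this sketch.

```python
import time

import numpy as np


def time_ms(operation, n_runs=5, warmup=1):
    """Return the median wall-clock time of `operation` in milliseconds."""
    for _ in range(warmup):  # untimed warmup runs
        operation()
    samples = []
    for _ in range(n_runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000)
    return float(np.median(samples))


def matmul_loops(a, b):
    """Reference matrix multiply written with explicit Python loops."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((n, m), dtype=a.dtype)
    for i in range(n):
        for j in range(m):
            total = 0.0
            for p in range(k):
                total += a[i, p] * b[p, j]
            out[i, j] = total
    return out


# Same shapes and dtype for both implementations, so the comparison is fair.
rng = np.random.default_rng(0)
lhs = rng.standard_normal((32, 48)).astype(np.float32)
rhs = rng.standard_normal((48, 32)).astype(np.float32)

# Confirm both implementations agree before comparing their speed.
outputs_match = np.allclose(matmul_loops(lhs, rhs), lhs @ rhs, rtol=1e-4, atol=1e-4)

loop_ms = time_ms(lambda: matmul_loops(lhs, rhs))
numpy_ms = time_ms(lambda: lhs @ rhs)

print(f"Outputs match: {outputs_match}")
print(f"Python loops:  {loop_ms:.3f} ms")
print(f"NumPy matmul:  {numpy_ms:.3f} ms")
```

Checking that the outputs agree first guards against timing two operations that do not compute the same thing. The matrices are kept small so the loop version finishes quickly; absolute timings will vary by machine.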

---

## Congratulations!

You've completed all the notebooks in the Domain 1 Capstone: MicroGrad+ project!

**What you've accomplished:**

- Built a complete autograd engine from scratch
- Implemented neural network layers (Linear, ReLU, Sigmoid, Softmax, Dropout)
- Created loss functions (MSE, CrossEntropy)
- Built optimizers (SGD with momentum, Adam)
- Trained a real model on MNIST
- Learned to test and benchmark your implementations

**Next steps:**

In Domain 2, you'll use PyTorch to:

- Work with real GPU hardware on DGX Spark
- Build advanced architectures (CNNs, RNNs, Transformers)
- Handle real-world datasets
- Scale to larger models