# Neural Networks from Scratch

This chapter implements a small feed-forward neural network from first principles to make the fundamental concepts behind deep learning concrete. We'll build the network itself with NumPy alone (Matplotlib is used only for plotting), avoiding high-level frameworks like TensorFlow or PyTorch.

## Chapter Metadata

- **Title**: Neural Networks from Scratch: A First Principles Approach
- **Chapter**: Understanding and Implementing Neural Networks
- **Key Topics**: 
  - Neural Network Fundamentals
  - Backpropagation
  - Gradient Descent
  - Activation Functions
  - Forward and Backward Pass
  - Training Loop Implementation

## Key Concepts

1. **Neural Network Architecture**
   - Layers, neurons, and weights
   - Forward propagation
   - Activation functions

2. **Training Process**
   - Loss functions
   - Backpropagation
   - Gradient descent optimization

3. **Implementation Components**
   - Layer implementation
   - Forward and backward pass
   - Weight updates and learning

In [None]:
# Import required libraries
import numpy as np
from typing import List, Tuple
import matplotlib.pyplot as plt

# Set random seed for reproducibility
np.random.seed(42)

## Implementation: Neural Network Components

Let's implement the core components of our neural network. We'll start with activation functions and then build our layer and network classes.

In [None]:
class Activation:
    """Base class for activation functions"""
    
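    @staticmethod
    def forward(x: np.ndarray) -> np.ndarray:
        """Apply the activation elementwise to the pre-activation x"""
        raise NotImplementedError
        
    @staticmethod
    def backward(x: np.ndarray) -> np.ndarray:
        """Return the elementwise derivative of the activation evaluated at x"""
        raise NotImplementedError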

class ReLU(Activation):
    """Rectified Linear Unit activation function"""
    
    @staticmethod
    def forward(x: np.ndarray) -> np.ndarray:
        return np.maximum(0, x)
    
    @staticmethod
    def backward(x: np.ndarray) -> np.ndarray:
        # Derivative is 1 for x > 0 and 0 otherwise (the subgradient 0 is used at x = 0)
        return np.where(x > 0, 1.0, 0.0)

class Sigmoid(Activation):
    """Sigmoid activation function"""
    
    @staticmethod
    def forward(x: np.ndarray) -> np.ndarray:
        # Clip the input so np.exp cannot overflow for large negative x
        return 1 / (1 + np.exp(-np.clip(x, -500, 500)))
    
    @staticmethod
    def backward(x: np.ndarray) -> np.ndarray:
        s = Sigmoid.forward(x)
        return s * (1 - s)
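
A quick way to gain confidence in the `backward` methods is to compare them against a numerical derivative. The following is a minimal, self-contained sketch: it re-declares the two activations as plain functions, and the helper `numerical_grad` is a hypothetical name introduced here for illustration, not part of the classes above.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    return np.where(x > 0, 1.0, 0.0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)

def numerical_grad(f, x, eps=1e-6):
    """Central-difference estimate of the elementwise derivative of f at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Test points avoid x = 0, where ReLU is not differentiable
x = np.array([-2.0, -0.5, 0.5, 2.0])

relu_err = np.max(np.abs(relu_grad(x) - numerical_grad(relu, x)))
sigmoid_err = np.max(np.abs(sigmoid_grad(x) - numerical_grad(sigmoid, x)))

print(f"ReLU max error:    {relu_err:.2e}")
print(f"Sigmoid max error: {sigmoid_err:.2e}")
```

Both errors should be tiny compared with the derivative values themselves. A large discrepancy would point to a mistake in the analytic derivative.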

In [None]:
class Layer:
    """Neural network layer implementation"""
    
    def __init__(self, input_size: int, output_size: int, activation: Activation):
        self.weights = np.random.randn(input_size, output_size) * 0.01
        self.bias = np.zeros((1, output_size))
        self.activation = activation
        
        # Cache for backpropagation
        self.input = None
        self.output = None
        self.activation_input = None
        
    def forward(self, x: np.ndarray) -> np.ndarray:
        """Forward pass through the layer"""
        self.input = x
        self.activation_input = x @ self.weights + self.bias
        self.output = self.activation.forward(self.activation_input)
        return self.output
    
    def backward(self, grad_output: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
        """Backward pass: returns gradients w.r.t. the layer input, weights, and bias"""
        # Gradient w.r.t. the pre-activation (chain rule through the activation)
        grad_activation = grad_output * self.activation.backward(self.activation_input)
        
        # Gradients of weights and bias
        grad_weights = self.input.T @ grad_activation
        grad_bias = np.sum(grad_activation, axis=0, keepdims=True)
        
        # Gradient w.r.t. the layer input, passed back to the previous layer
        grad_input = grad_activation @ self.weights.T
        
        return grad_input, grad_weights, grad_bias
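
The backward pass above can be verified the same way as any hand-written gradient: perturb one weight at a time and compare the change in a scalar loss with the analytic result. The sketch below is self-contained and uses standalone functions (`layer_forward`, `layer_backward`, both hypothetical names) that mirror the `Layer` logic with a sigmoid activation. The loss is assumed to be the sum of the layer outputs, so the incoming gradient is all ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def layer_forward(x, W, b):
    return sigmoid(x @ W + b)

def layer_backward(x, W, b, grad_output):
    """Mirror of Layer.backward for a sigmoid activation."""
    s = sigmoid(x @ W + b)
    grad_z = grad_output * s * (1 - s)              # through the activation
    grad_W = x.T @ grad_z                           # shape (input_size, output_size)
    grad_b = np.sum(grad_z, axis=0, keepdims=True)  # shape (1, output_size)
    grad_x = grad_z @ W.T                           # shape (batch, input_size)
    return grad_x, grad_W, grad_b

x = rng.normal(size=(5, 3))   # batch of 5 samples, 3 features
W = rng.normal(size=(3, 2))
b = rng.normal(size=(1, 2))

# Scalar loss = sum of outputs, so dLoss/dOutput is a matrix of ones
grad_output = np.ones((5, 2))
grad_x, grad_W, grad_b = layer_backward(x, W, b, grad_output)

# Central-difference gradient with respect to each weight
eps = 1e-6
num_grad_W = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        diff = layer_forward(x, W_plus, b).sum() - layer_forward(x, W_minus, b).sum()
        num_grad_W[i, j] = diff / (2 * eps)

max_err_W = np.max(np.abs(grad_W - num_grad_W))
print(f"Max weight-gradient error: {max_err_W:.2e}")
```

Note that each gradient has the same shape as the quantity it belongs to, which is a cheap sanity check before running a full numerical comparison.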

In [None]:
class NeuralNetwork:
    """Simple neural network implementation"""
    
    def __init__(self, layer_sizes: List[int], activations: List[Activation]):
        assert len(layer_sizes) >= 2, "Need at least input and output layers"
        assert len(layer_sizes) - 1 == len(activations), "Need activation for each layer except input"
        
        self.layers = []
        for i in range(len(layer_sizes) - 1):
            layer = Layer(layer_sizes[i], layer_sizes[i + 1], activations[i])
            self.layers.append(layer)
    
    def forward(self, x: np.ndarray) -> np.ndarray:
        """Forward pass through the network"""
        current_input = x
        for layer in self.layers:
            current_input = layer.forward(current_input)
        return current_input
    
    def backward(self, grad_output: np.ndarray, learning_rate: float = 0.01):
        """Backward pass through the network, applying a gradient descent update to each layer"""
        current_gradient = grad_output
        for layer in reversed(self.layers):
            grad_input, grad_weights, grad_bias = layer.backward(current_gradient)
            # Update weights and biases
            layer.weights -= learning_rate * grad_weights
            layer.bias -= learning_rate * grad_bias
            current_gradient = grad_input

## Example: Training on a Simple Dataset

Let's train our neural network on XOR, a four-point binary classification problem. XOR is not linearly separable, so it cannot be solved without a hidden layer.

In [None]:
# Generate a simple dataset: XOR problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Create a neural network with one hidden layer
network = NeuralNetwork(
    layer_sizes=[2, 4, 1],  # Input layer: 2, Hidden layer: 4, Output layer: 1
    activations=[ReLU(), Sigmoid()]  # ReLU for hidden layer, Sigmoid for output
)

# Training loop
epochs = 10000
learning_rate = 0.1
losses = []

for epoch in range(epochs):
    # Forward pass
    output = network.forward(X)
    
    # Compute loss (binary cross-entropy); the 1e-15 offset guards against log(0)
    loss = -np.mean(y * np.log(output + 1e-15) + (1 - y) * np.log(1 - output + 1e-15))
    losses.append(loss)
    
    # Gradient of the mean loss with respect to the output (hence the division by len(X))
    grad_output = -(y / (output + 1e-15) - (1 - y) / (1 - output + 1e-15)) / len(X)
    
    # Backward pass
    network.backward(grad_output, learning_rate)
    
    # Print progress every 1000 epochs
    if epoch % 1000 == 0:
        print(f"Epoch {epoch}, Loss: {loss:.4f}")
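
The training loop computes the loss gradient with respect to the sigmoid output, and the output layer then multiplies by the sigmoid derivative. For binary cross-entropy these two steps collapse algebraically to `(output - y) / N` with respect to the pre-activation. The sketch below checks this numerically; the pre-activation values `z` are made up for illustration, and the `1e-15` offset is omitted because the outputs here stay well away from 0 and 1.

```python
import numpy as np

y = np.array([[0.0], [1.0], [1.0], [0.0]])
z = np.array([[-1.2], [0.3], [2.0], [0.7]])  # hypothetical pre-activations
output = 1 / (1 + np.exp(-z))
n = len(y)

# Two-step chain rule, as in the training loop plus the sigmoid layer
grad_output = -(y / output - (1 - y) / (1 - output)) / n
grad_z_chain = grad_output * output * (1 - output)

# Closed form for sigmoid combined with binary cross-entropy
grad_z_direct = (output - y) / n

print(np.allclose(grad_z_chain, grad_z_direct))
```

The closed form avoids dividing by values close to zero, which is one reason many implementations fuse the sigmoid and the loss into a single step.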

In [None]:
# Plot the learning curve
plt.figure(figsize=(10, 6))
plt.plot(losses)
plt.title('Training Loss over Time')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid(True)
plt.show()

# Test the network
predictions = network.forward(X)
print("\nFinal predictions:")
print("Input -> Output (Expected)")
for x_i, y_i, pred in zip(X, y, predictions):
    print(f"{x_i} -> {pred[0]:.4f} ({y_i[0]})")

## Insights and Further Exploration

1. **Architecture Decisions**
   - We used a simple architecture with one hidden layer of four neurons
   - ReLU activation in the hidden layer provides the non-linearity that XOR requires
   - Sigmoid activation in the output layer constrains values to between 0 and 1, so they can be read as probabilities

2. **Training Process**
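   - Binary cross-entropy loss is appropriate for binary classification
   - The learning rate affects training stability, and the number of epochs determines how far training can converge
   - The network can learn the XOR pattern, but the result depends on the random initialization, so check the final predictions against the expected outputs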

3. **Potential Improvements**
   - Add regularization to prevent overfitting
   - Implement momentum or adaptive learning rates
   - Add batch processing for larger datasets
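
As one illustration of the improvements listed above, here is a minimal sketch of a gradient descent step with classical momentum and optional L2 regularization (weight decay). The function `momentum_step` and its parameters `beta` and `weight_decay` are assumptions introduced for this example and do not exist in the `NeuralNetwork` class. The toy objective is a simple quadratic, chosen only so the behavior is easy to inspect.

```python
import numpy as np

def momentum_step(param, grad, velocity, learning_rate=0.1, beta=0.9, weight_decay=0.0):
    """One SGD-with-momentum update; returns the new parameter and velocity."""
    # L2 regularization adds weight_decay * param to the gradient
    grad = grad + weight_decay * param
    # The velocity is an exponentially decaying sum of past gradient steps
    velocity = beta * velocity - learning_rate * grad
    return param + velocity, velocity

# Toy objective f(w) = 0.5 * ||w||^2, whose gradient is simply w
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(200):
    w, v = momentum_step(w, w, v)

print(np.linalg.norm(w))
```

To use this idea in the network, each layer would need to keep one velocity array per parameter (weights and bias) and apply the step in place of the plain `-= learning_rate * grad` update.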

### Questions for Further Exploration
1. How would the network perform with different activation functions?
2. What's the minimum number of hidden neurons needed to solve XOR?
3. How does the initialization of weights affect training?
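
As a starting point for the third question, the sketch below compares how the activation scale evolves through a stack of ReLU layers under two initializations: the fixed `0.01` scale used in `Layer`, and He initialization (`sqrt(2 / fan_in)`), a commonly used alternative for ReLU that is not part of the code above. The helper `forward_std`, the layer width, and the depth are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_std(scale_fn, n_layers=5, width=256):
    """Push random inputs through ReLU layers; return the final activation std."""
    x = rng.normal(size=(1000, width))
    for _ in range(n_layers):
        W = rng.normal(size=(width, width)) * scale_fn(width)
        x = np.maximum(0, x @ W)
    return x.std()

small_std = forward_std(lambda fan_in: 0.01)
he_std = forward_std(lambda fan_in: np.sqrt(2.0 / fan_in))

print(f"Fixed 0.01 scale: {small_std:.2e}")
print(f"He scale:         {he_std:.2e}")
```

With the fixed small scale the activations shrink at every layer, which also shrinks the gradients flowing back. The He scale is designed to keep the activation magnitude roughly constant across ReLU layers.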

### References and Further Reading
- Neural Networks and Deep Learning by Michael Nielsen
- Deep Learning by Goodfellow, Bengio, and Courville
- CS231n: Convolutional Neural Networks for Visual Recognition