# Neural Networks Practical Workshop - Part 5: Custom vs. PyTorch Comparison

In this notebook, we'll directly compare a neural network built from scratch with an equivalent PyTorch model. Both models will have the same architecture and will be trained and tested on the MNIST dataset.

## 1. Setup and Imports

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import time
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.metrics import accuracy_score

# For reproducibility
np.random.seed(42)
torch.manual_seed(42)

# Check if GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

## 2. Load MNIST Data

We'll use the preprocessed subset of the MNIST dataset to ensure a fair comparison.

In [None]:
# Load the preprocessed dataset
try:
    # Load data from file
    data = np.load('./processed_data/mnist_subset.npz')
    X_train = data['X_train']
    y_train = data['y_train']
    X_test = data['X_test']
    y_test = data['y_test']
    
    print("Loaded preprocessed MNIST subset:")
except FileNotFoundError:
    print("Preprocessed data not found. Run the '03_mnist_dataset.ipynb' notebook first.")
    # Dummy data for demonstration
    print("Creating small dummy dataset for demonstration...")
    X_train = np.random.randn(1000, 784)
    y_train = np.random.randint(0, 10, 1000)
    X_test = np.random.randn(200, 784)
    y_test = np.random.randint(0, 10, 200)

print(f"Training data shape: {X_train.shape}")
print(f"Training labels shape: {y_train.shape}")
print(f"Test data shape: {X_test.shape}")
print(f"Test labels shape: {y_test.shape}")

# Convert data to PyTorch tensors
X_train_tensor = torch.FloatTensor(X_train).to(device)
y_train_tensor = torch.LongTensor(y_train).to(device)
X_test_tensor = torch.FloatTensor(X_test).to(device)
y_test_tensor = torch.LongTensor(y_test).to(device)

# Create PyTorch datasets and data loaders
batch_size = 64
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size)

## 3. Custom Neural Network Implementation

Let's create a neural network from scratch with the same architecture as our PyTorch model.

In [None]:
class MNISTNeuralNetworkFromScratch:
    def __init__(self, input_size=784, hidden1_size=128, hidden2_size=64, output_size=10):
        # Initialize weights with variance 1/fan_in (a Xavier/LeCun-style scaling); biases start at zero
        self.W1 = np.random.randn(input_size, hidden1_size) * np.sqrt(1 / input_size)
        self.b1 = np.zeros((1, hidden1_size))
        self.W2 = np.random.randn(hidden1_size, hidden2_size) * np.sqrt(1 / hidden1_size)
        self.b2 = np.zeros((1, hidden2_size))
        self.W3 = np.random.randn(hidden2_size, output_size) * np.sqrt(1 / hidden2_size)
        self.b3 = np.zeros((1, output_size))
    
    def relu(self, x):
        # ReLU activation function
        return np.maximum(0, x)
    
    def relu_derivative(self, x):
        # Derivative of ReLU for backpropagation
        return np.where(x > 0, 1, 0)
    
    def softmax(self, x):
        # Softmax activation for output layer
        exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
        return exp_x / np.sum(exp_x, axis=1, keepdims=True)
    
    def forward(self, X):
        # Forward propagation
        self.z1 = np.dot(X, self.W1) + self.b1
        self.a1 = self.relu(self.z1)
        self.z2 = np.dot(self.a1, self.W2) + self.b2
        self.a2 = self.relu(self.z2)
        self.z3 = np.dot(self.a2, self.W3) + self.b3
        self.a3 = self.softmax(self.z3)
        return self.a3
    
    def backward(self, X, y, output, learning_rate=0.01):
        # Backpropagation
        batch_size = X.shape[0]
        
        # Convert y to one-hot encoding
        y_one_hot = np.zeros((y.size, output.shape[1]))
        y_one_hot[np.arange(y.size), y] = 1
        
        # Output layer gradients
        dz3 = output - y_one_hot
        dW3 = (1/batch_size) * np.dot(self.a2.T, dz3)
        db3 = (1/batch_size) * np.sum(dz3, axis=0, keepdims=True)
        
        # Second hidden layer gradients
        da2 = np.dot(dz3, self.W3.T)
        # a2 = relu(z2), so a2 > 0 exactly where z2 > 0 and can stand in for z2 here
        dz2 = da2 * self.relu_derivative(self.a2)
        dW2 = (1/batch_size) * np.dot(self.a1.T, dz2)
        db2 = (1/batch_size) * np.sum(dz2, axis=0, keepdims=True)
        
        # First hidden layer gradients
        da1 = np.dot(dz2, self.W2.T)
        dz1 = da1 * self.relu_derivative(self.a1)
        dW1 = (1/batch_size) * np.dot(X.T, dz1)
        db1 = (1/batch_size) * np.sum(dz1, axis=0, keepdims=True)
        
        # Update parameters with gradient descent
        self.W3 -= learning_rate * dW3
        self.b3 -= learning_rate * db3
        self.W2 -= learning_rate * dW2
        self.b2 -= learning_rate * db2
        self.W1 -= learning_rate * dW1
        self.b1 -= learning_rate * db1
    
    def compute_loss(self, y_true, y_pred):
        # Cross-entropy loss
        # Convert y_true to one-hot encoding
        y_one_hot = np.zeros((y_true.size, y_pred.shape[1]))
        y_one_hot[np.arange(y_true.size), y_true] = 1
        
        # Calculate cross-entropy loss
        loss = -np.mean(np.sum(y_one_hot * np.log(y_pred + 1e-8), axis=1))
        return loss
    
    def train(self, X, y, batch_size=64, epochs=10, learning_rate=0.01):
        n_samples = X.shape[0]
        # Round up so the final partial batch is trained on and counted in the epoch metrics
        n_batches = int(np.ceil(n_samples / batch_size))
        
        # Keep track of metrics
        losses = []
        accuracies = []
        
        start_time = time.time()
        
        for epoch in range(epochs):
            epoch_loss = 0
            epoch_correct = 0
            indices = np.random.permutation(n_samples)
            X_shuffled = X[indices]
            y_shuffled = y[indices]
            
            for i in range(n_batches):
                # Get batch
                start_idx = i * batch_size
                end_idx = min((i + 1) * batch_size, n_samples)
                X_batch = X_shuffled[start_idx:end_idx]
                y_batch = y_shuffled[start_idx:end_idx]
                
                # Forward pass
                y_pred = self.forward(X_batch)
                
                # Compute loss and accuracy
                batch_loss = self.compute_loss(y_batch, y_pred)
                epoch_loss += batch_loss * (end_idx - start_idx)
                
                # Count correct predictions
                batch_preds = np.argmax(y_pred, axis=1)
                epoch_correct += np.sum(batch_preds == y_batch)
                
                # Backward pass
                self.backward(X_batch, y_batch, y_pred, learning_rate)
            
            # Calculate epoch metrics
            epoch_loss /= n_samples
            epoch_accuracy = epoch_correct / n_samples * 100
            
            losses.append(epoch_loss)
            accuracies.append(epoch_accuracy)
            
            # Print progress
            print(f"Epoch {epoch+1}/{epochs} - Loss: {epoch_loss:.4f} - Accuracy: {epoch_accuracy:.2f}%")
        
        training_time = time.time() - start_time
        print(f"Training completed in {training_time:.2f} seconds")
        
        return losses, accuracies, training_time
    
    def predict(self, X):
        # Make predictions
        output = self.forward(X)
        return np.argmax(output, axis=1)
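
The output-layer gradient in `backward` relies on the identity that, for softmax followed by cross-entropy, the gradient with respect to the logits is `softmax - one_hot` (divided by the batch size for a mean loss). A minimal sketch of how that identity could be checked with finite differences is shown below. The helper names (`mean_cross_entropy`, `analytic_grad`, `numeric_grad`) are hypothetical and not part of the class above, and the random logits stand in for real network outputs.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with max subtraction for numerical stability
    e = np.exp(z - np.max(z, axis=1, keepdims=True))
    return e / np.sum(e, axis=1, keepdims=True)

def mean_cross_entropy(z, y):
    # Mean cross-entropy of integer labels y given logits z
    p = softmax(z)
    return -np.mean(np.log(p[np.arange(y.size), y]))

def analytic_grad(z, y):
    # (softmax - one_hot) / batch_size: gradient of the mean loss w.r.t. the logits
    g = softmax(z)
    g[np.arange(y.size), y] -= 1.0
    return g / y.size

def numeric_grad(z, y, eps=1e-6):
    # Central finite differences, perturbing one logit at a time
    g = np.zeros_like(z)
    for idx in np.ndindex(*z.shape):
        z_plus, z_minus = z.copy(), z.copy()
        z_plus[idx] += eps
        z_minus[idx] -= eps
        g[idx] = (mean_cross_entropy(z_plus, y) - mean_cross_entropy(z_minus, y)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10))      # stand-in for network outputs (assumption)
labels = rng.integers(0, 10, size=4)   # stand-in for class labels (assumption)

max_abs_diff = np.max(np.abs(analytic_grad(logits, labels) - numeric_grad(logits, labels)))
print(f"max |analytic - numeric| = {max_abs_diff:.2e}")
```

The same idea extends to `dW1`, `dW2`, and `dW3`: perturb a single weight, recompute the loss, and compare the slope with the hand-derived gradient.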

## 4. Equivalent PyTorch Model

In [None]:
class EquivalentPyTorchModel(nn.Module):
    def __init__(self, input_size=784, hidden1_size=128, hidden2_size=64, output_size=10):
        super(EquivalentPyTorchModel, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden1_size)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(hidden1_size, hidden2_size)
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(hidden2_size, output_size)
        # Not including softmax here as CrossEntropyLoss includes it
    
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.fc3(x)
        return x

# Initialize the PyTorch model
pytorch_model = EquivalentPyTorchModel().to(device)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
# Plain SGD without momentum, to match the update rule of the custom network
py_optimizer = optim.SGD(pytorch_model.parameters(), lr=0.01)
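
The model above returns raw logits because `nn.CrossEntropyLoss` applies a log-softmax internally before taking the negative log-likelihood. The NumPy sketch below illustrates why that is equivalent to the custom network's "softmax, then log" approach. It does not call PyTorch; the function names `ce_from_probs` and `ce_from_logits` are hypothetical, and the random inputs are placeholders.

```python
import numpy as np

def ce_from_probs(logits, labels):
    # Custom-network style: explicit softmax, then log of the true-class probability
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    probs = np.exp(shifted) / np.sum(np.exp(shifted), axis=1, keepdims=True)
    return -np.mean(np.log(probs[np.arange(labels.size), labels]))

def ce_from_logits(logits, labels):
    # Fused style: log-softmax via log-sum-exp, never forming probabilities
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(labels.size), labels])

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 10))      # placeholder logits (assumption)
labels = rng.integers(0, 10, size=8)   # placeholder labels (assumption)

loss_a = ce_from_probs(logits, labels)
loss_b = ce_from_logits(logits, labels)
print(np.isclose(loss_a, loss_b))
```

Working in log space avoids taking the log of a probability that has underflowed to zero, which is why the custom `compute_loss` needs the `1e-8` guard while the fused form does not.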

## 5. PyTorch Model Training Functions

In [None]:
def train_pytorch_model(model, dataloader, criterion, optimizer, device, num_epochs=10):
    losses = []
    accuracies = []
    
    start_time = time.time()
    
    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        correct = 0
        total = 0
        
        for inputs, labels in dataloader:
            inputs, labels = inputs.to(device), labels.to(device)
            
            # Zero the gradients
            optimizer.zero_grad()
            
            # Forward pass
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            
            # Backward pass and optimize
            loss.backward()
            optimizer.step()
            
            # Update statistics
            running_loss += loss.item() * inputs.size(0)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
        
        epoch_loss = running_loss / total
        epoch_accuracy = correct / total * 100
        
        losses.append(epoch_loss)
        accuracies.append(epoch_accuracy)
        
        print(f"Epoch {epoch+1}/{num_epochs} - Loss: {epoch_loss:.4f} - Accuracy: {epoch_accuracy:.2f}%")
    
    training_time = time.time() - start_time
    print(f"Training completed in {training_time:.2f} seconds")
    
    return losses, accuracies, training_time

def evaluate_pytorch_model(model, dataloader, device):
    model.eval()
    correct = 0
    total = 0
    all_preds = []
    all_labels = []
    
    with torch.no_grad():
        for inputs, labels in dataloader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            
            all_preds.extend(predicted.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())
    
    accuracy = correct / total * 100
    print(f"Test Accuracy: {accuracy:.2f}%")
    
    return accuracy, all_preds, all_labels

## 6. Train the Custom Neural Network

In [None]:
print("Training custom neural network from scratch...")
custom_nn = MNISTNeuralNetworkFromScratch(
    input_size=784, hidden1_size=128, hidden2_size=64, output_size=10)

custom_losses, custom_accuracies, custom_time = custom_nn.train(
    X_train, y_train, batch_size=64, epochs=10, learning_rate=0.01)

# Evaluate custom model on test data
custom_predictions = custom_nn.predict(X_test)
custom_test_accuracy = np.mean(custom_predictions == y_test) * 100
print(f"Custom model test accuracy: {custom_test_accuracy:.2f}%")

## 7. Train the PyTorch Model

In [None]:
print("\nTraining PyTorch model...")
pytorch_losses, pytorch_accuracies, pytorch_time = train_pytorch_model(
    pytorch_model, train_loader, criterion, py_optimizer, device, num_epochs=10)

# Evaluate PyTorch model on test data
pytorch_test_accuracy, pytorch_predictions, test_labels = evaluate_pytorch_model(
    pytorch_model, test_loader, device)

## 8. Compare Performance

In [None]:
# Compare the results
print("\nPerformance Comparison:")
print(f"{'Model':<20} {'Training Time (s)':<20} {'Test Accuracy (%)':<20}")
print(f"{'-'*60}")
print(f"{'Custom Neural Net':<20} {custom_time:<20.2f} {custom_test_accuracy:<20.2f}")
print(f"{'PyTorch Model':<20} {pytorch_time:<20.2f} {pytorch_test_accuracy:<20.2f}")
print(f"{'PyTorch speedup':<20} {custom_time/pytorch_time:.2f}x")

## 9. Visualize Training Metrics

In [None]:
# Plot training metrics comparison
plt.figure(figsize=(12, 10))

# Plot training loss
plt.subplot(2, 1, 1)
plt.plot(range(1, len(custom_losses) + 1), custom_losses, 'b-', label='Custom Implementation')
plt.plot(range(1, len(pytorch_losses) + 1), pytorch_losses, 'r-', label='PyTorch Implementation')
plt.title('Training Loss Comparison')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)

# Plot training accuracy
plt.subplot(2, 1, 2)
plt.plot(range(1, len(custom_accuracies) + 1), custom_accuracies, 'b-', label='Custom Implementation')
plt.plot(range(1, len(pytorch_accuracies) + 1), pytorch_accuracies, 'r-', label='PyTorch Implementation')
plt.title('Training Accuracy Comparison')
plt.xlabel('Epoch')
plt.ylabel('Accuracy (%)')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()

## 10. Compare Predictions

In [None]:
# Compare predictions visually
plt.figure(figsize=(15, 10))

# Select a few random test examples
num_samples = 5
random_indices = np.random.choice(len(X_test), num_samples, replace=False)

for i, idx in enumerate(random_indices):
    plt.subplot(2, num_samples, i + 1)
    plt.imshow(X_test[idx].reshape(28, 28), cmap='gray')
    plt.title(f"True: {y_test[idx]}")
    plt.axis('off')
    
    plt.subplot(2, num_samples, i + 1 + num_samples)
    plt.bar(['Custom', 'PyTorch'], [custom_predictions[idx], pytorch_predictions[idx]], color=['blue', 'red'])
    plt.title(f"Custom: {custom_predictions[idx]}, PyTorch: {pytorch_predictions[idx]}")

plt.tight_layout()
plt.show()

## 11. Analyze Prediction Differences

In [None]:
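# Find examples where predictions differ between models
# evaluate_pytorch_model returns a plain list; convert it so NumPy index arrays work below
pytorch_predictions = np.asarray(pytorch_predictions)
different_indices = np.where(custom_predictions != pytorch_predictions)[0]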

if len(different_indices) > 0:
    print(f"Found {len(different_indices)} examples where models predict differently")
    
    # Visualize a few of these examples
    num_to_display = min(5, len(different_indices))
    display_indices = different_indices[:num_to_display]
    
    plt.figure(figsize=(15, 6))
    for i, idx in enumerate(display_indices):
        plt.subplot(2, num_to_display, i + 1)
        plt.imshow(X_test[idx].reshape(28, 28), cmap='gray')
        plt.title(f"True: {y_test[idx]}")
        plt.axis('off')
        
        plt.subplot(2, num_to_display, i + 1 + num_to_display)
        plt.bar(['Custom', 'PyTorch'], [custom_predictions[idx], pytorch_predictions[idx]], color=['blue', 'red'])
        plt.title(f"Custom: {custom_predictions[idx]}, PyTorch: {pytorch_predictions[idx]}")
    
    plt.suptitle("Examples with Different Predictions")
    plt.tight_layout()
    plt.show()
    
    # Analyze which predictions are correct
    custom_correct = custom_predictions[different_indices] == y_test[different_indices]
    pytorch_correct = pytorch_predictions[different_indices] == y_test[different_indices]
    
    print(f"Of the {len(different_indices)} differing predictions:")
    print(f"  - Custom model is correct in {np.sum(custom_correct)} cases")
    print(f"  - PyTorch model is correct in {np.sum(pytorch_correct)} cases")
    print(f"  - Both models are incorrect in {len(different_indices) - np.sum(custom_correct) - np.sum(pytorch_correct)} cases")
else:
    print("Both models make identical predictions on all test examples!")
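
The analysis above only looks at examples where the two models disagree. A slightly fuller picture, sketched here under the assumption that both prediction vectors and the labels have the same length, is a four-way breakdown over the whole test set. The helper `agreement_breakdown` is hypothetical, and the toy arrays are illustrative values, not real model output.

```python
import numpy as np

def agreement_breakdown(preds_a, preds_b, y_true):
    # Coerce to arrays so plain lists (e.g. collected batch by batch) also work
    a, b, y = (np.asarray(v) for v in (preds_a, preds_b, y_true))
    a_ok, b_ok = a == y, b == y
    return {
        "both_correct": int(np.sum(a_ok & b_ok)),
        "only_a_correct": int(np.sum(a_ok & ~b_ok)),
        "only_b_correct": int(np.sum(~a_ok & b_ok)),
        "both_wrong": int(np.sum(~a_ok & ~b_ok)),
        "agree": int(np.sum(a == b)),
    }

# Toy data for illustration only
y = [0, 1, 2, 3, 4]
a = [0, 1, 2, 0, 0]
b = [0, 1, 0, 3, 1]
summary = agreement_breakdown(a, b, y)
print(summary)
```

In the notebook this could be called as `agreement_breakdown(custom_predictions, pytorch_predictions, y_test)`.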

## 12. Discussion: Custom vs. PyTorch Implementations

### Key Observations from the Comparison

From the experiment above, we can observe several key points:

1. **Training Speed**: On a GPU, PyTorch usually trains faster thanks to its optimized kernels. On a CPU with a network this small, the gap can be much narrower, because the NumPy implementation is also vectorized.

2. **Implementation Complexity**: Our custom implementation required explicit coding of the forward and backward passes, while PyTorch handled this automatically through its autograd system.

3. **Performance**: With the same initialization, optimizer, and hyperparameters, both implementations should converge to similar results. Differences in the training curves usually come from optimizer settings (such as momentum) and weight initialization rather than from the framework itself.

4. **Memory Usage**: Custom implementations may use less memory for small models but don't scale as efficiently for larger networks.

5. **Code Length**: The custom implementation required significantly more code to achieve the same functionality.
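
A single wall-clock measurement, as used for the training-time comparison, can be noisy. One possible way to make timing comparisons more robust is to repeat the run and take the median, as in the sketch below. The helper `median_runtime` and the `dummy_workload` function are hypothetical stand-ins; in practice each repeat would need a freshly initialized model, and GPU timings would need a synchronization call such as `torch.cuda.synchronize()` before reading the clock.

```python
import statistics
import time

def median_runtime(fn, repeats=5):
    # Run fn several times and return the median wall-clock duration in seconds
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)

def dummy_workload():
    # Stand-in for a training run; replace with the real call
    return sum(i * i for i in range(100_000))

elapsed = median_runtime(dummy_workload, repeats=3)
print(f"median runtime: {elapsed:.4f} s")
```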

### Advantages of Custom Implementation

1. **Educational Value**: Building from scratch provides deep understanding of neural network mechanics.
2. **Transparency**: Every operation is explicit, making it easier to understand what's happening at each step.
3. **Customization**: Complete control over every aspect of the implementation.
4. **Minimal Dependencies**: Relies only on NumPy, a lightweight dependency.

### Advantages of PyTorch Implementation

1. **Performance**: Significantly faster training, especially with GPU acceleration.
2. **Automatic Differentiation**: No need to manually implement backpropagation.
3. **Ecosystem**: Rich set of tools, layers, and pre-built components.
4. **Scalability**: Efficiently handles large models and datasets.
5. **Production-Ready**: Optimized for real-world applications.

### When to Use Each Approach

**Use Custom Implementation When:**
- Learning the fundamentals of neural networks
- Teaching or demonstrating neural network concepts
- Implementing a novel algorithm not available in standard libraries
- Working in environments with limited dependencies

**Use PyTorch (or similar frameworks) When:**
- Building practical applications
- Working with large datasets
- Implementing complex architectures
- Focusing on results rather than implementation details
- Needing GPU acceleration and performance optimization

## 13. Key Insights from Building Neural Networks from Scratch

- **Understanding the Fundamentals**: Implementing neural networks from scratch gives a deep understanding of the core concepts and mathematics behind them.

- **Debugging Skills**: Knowledge of the underlying operations makes it easier to diagnose and fix issues in more complex models later.

- **Appreciation for Frameworks**: Building from scratch helps you appreciate the convenience and optimizations that frameworks like PyTorch provide.

- **Gradient Flow Insights**: Implementing backpropagation manually gives insights into how gradients flow through the network.

- **Architectural Understanding**: Deep knowledge of the fundamentals makes it easier to understand and implement new neural network architectures.

- **Educational Value**: The exercise of building from scratch is invaluable for education and gaining intuition about how neural networks work.

- **Computational Awareness**: Implementing optimizations by hand creates awareness of the computational challenges in neural networks.

## 14. Workshop Conclusion

In this workshop, we've:

1. **Built Neural Networks from Scratch**: We implemented both a simple and a multi-layer neural network using only NumPy, understanding every step of the process.

2. **Explored the MNIST Dataset**: We examined a classic dataset for image classification and prepared it for neural network training.

3. **Used PyTorch**: We built neural networks using PyTorch, leveraging its powerful optimizations and automatic differentiation.

4. **Compared Implementations**: We conducted a direct comparison of custom and framework-based implementations, understanding the trade-offs.

5. **Visualized Results**: We created visualizations to help understand the behavior of our models and their predictions.

This workshop served as both an educational exploration of neural network fundamentals and a practical introduction to using modern deep learning frameworks. Whether you continue to build models from scratch or leverage tools like PyTorch, the understanding gained here will be valuable in your future machine learning endeavors.

## Next Steps for Further Learning

1. **Explore More Complex Architectures**: Try implementing or using CNNs, RNNs, or transformer networks.
2. **Experiment with Different Datasets**: Apply your knowledge to other datasets beyond MNIST.
3. **Dive Deeper into PyTorch**: Learn more advanced features like custom datasets, data augmentation, and model deployment.
4. **Implement Advanced Optimization Techniques**: Explore regularization, batch normalization, and advanced optimization algorithms.
5. **Contribute to Open Source**: Apply your understanding by contributing to machine learning libraries or projects.