# Module 13: PyTorch Introduction (Comparison with TensorFlow)

**Difficulty**: ⭐⭐ (Intermediate)

**Estimated Time**: 45-60 minutes

**Prerequisites**: 
- [Module 04: Introduction to TensorFlow/Keras](04_introduction_to_tensorflow_keras.ipynb)
- [Module 05: Feed-Forward Neural Networks with Keras](05_feedforward_neural_networks_keras.ipynb)

## Learning Objectives

By the end of this notebook, you will be able to:
1. Understand PyTorch tensors and operations
2. Use automatic differentiation with PyTorch's autograd
3. Build neural networks using nn.Module
4. Implement training loops in PyTorch
5. Compare PyTorch and TensorFlow/Keras workflows
6. Decide when to use each framework based on 2025 best practices

## 1. Introduction to PyTorch

**PyTorch** is a deep learning framework developed by Meta (Facebook). It's known for:
- **Dynamic computation graphs**: Build graphs on-the-fly
- **Pythonic API**: Feels natural to Python developers
- **Research-friendly**: Easy to experiment and debug
- **Strong ecosystem**: Wide adoption in research community
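
To make "dynamic computation graphs" concrete, here is a minimal sketch in which ordinary Python control flow decides how many operations are recorded. The helper `repeated_double` and its `limit` parameter are hypothetical names chosen for illustration, not part of PyTorch.

```python
import torch

def repeated_double(x: torch.Tensor, limit: float) -> torch.Tensor:
    """Double x until its value exceeds `limit` (hypothetical helper)."""
    y = x
    # Plain Python loop: the number of recorded operations depends on the data.
    while y.item() <= limit:
        y = y * 2
    return y

x = torch.tensor(1.5, requires_grad=True)
y = repeated_double(x, limit=10.0)  # 1.5 -> 3 -> 6 -> 12, i.e. three doublings
y.backward()

print(y.item())       # 12.0
print(x.grad.item())  # 8.0, because y = 2**3 * x for this input
```

The graph is rebuilt on every call, so a different input can take a different number of loop iterations and still be differentiated correctly.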

### PyTorch vs TensorFlow (2025 Perspective)

| Aspect | PyTorch | TensorFlow/Keras |
|--------|---------|------------------|
| **Learning Curve** | Steeper initially | Easier with Keras high-level API |
| **Flexibility** | Very flexible, explicit | More abstraction, less boilerplate |
| **Research** | Widely used in published research code | Less common in recent research code |
| **Production** | Improving (TorchServe, TorchScript) | Mature (TF Serving, TFLite) |
| **Debugging** | Easy (native Python debugging) | Improved with eager execution |
| **Community** | Strong research community | Strong production community |

### When to Use Each (2025 Guidelines)

**Use PyTorch when**:
- Research and experimentation
- Custom architectures and training loops
- Need maximum flexibility
- Working with research papers (most use PyTorch)

**Use TensorFlow/Keras when**:
- Production deployment at scale
- Mobile/edge deployment (TFLite)
- Quick prototyping with standard architectures
- Need extensive ecosystem (TF Extended, TF Data)

## 2. Setup and Imports

In [None]:
# PyTorch imports
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

# TensorFlow for comparison
import tensorflow as tf
from tensorflow import keras

# Standard libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from time import time

# For reproducibility
torch.manual_seed(42)
np.random.seed(42)
tf.random.set_seed(42)

# Plotting configuration
%matplotlib inline
plt.style.use('seaborn-v0_8-darkgrid')

print(f"PyTorch version: {torch.__version__}")
print(f"TensorFlow version: {tf.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")

## 3. PyTorch Tensors

**Tensors** are the fundamental data structure in PyTorch, similar to NumPy arrays but with GPU support.
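
As a rough illustration of how tensors differ from NumPy arrays, the sketch below checks inferred dtypes, moves a tensor to whichever device is available, and shows which conversions share memory. The variable names are illustrative only.

```python
import numpy as np
import torch

# dtype is inferred from the data: Python ints -> int64, Python floats -> float32.
ints = torch.tensor([1, 2, 3])
floats = torch.tensor([1.0, 2.0, 3.0])
print(ints.dtype, floats.dtype)  # torch.int64 torch.float32

# Pick a device once and move tensors to it; falls back to CPU without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
on_device = floats.to(device)
print(on_device.device.type)

# torch.from_numpy() shares memory with the source array; torch.tensor() copies.
array = np.zeros(3, dtype=np.float32)
shared = torch.from_numpy(array)
copied = torch.tensor(array)
array[0] = 7.0
print(shared[0].item(), copied[0].item())  # 7.0 0.0
```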

In [None]:
# Creating tensors
print("=" * 60)
print("Creating Tensors")
print("=" * 60)

# From Python list
tensor_from_list = torch.tensor([1, 2, 3, 4])
print(f"From list: {tensor_from_list}")

# From NumPy array
numpy_array = np.array([[1, 2], [3, 4]])
tensor_from_numpy = torch.from_numpy(numpy_array)
print(f"From NumPy:\n{tensor_from_numpy}")

# Special tensors
zeros = torch.zeros(2, 3)
ones = torch.ones(2, 3)
random = torch.randn(2, 3)  # Random normal distribution

print(f"\nZeros:\n{zeros}")
print(f"\nOnes:\n{ones}")
print(f"\nRandom (normal):\n{random}")

# Tensor properties
print(f"\nShape: {random.shape}")
print(f"Data type: {random.dtype}")
print(f"Device: {random.device}")

In [None]:
# Tensor operations
print("\n" + "=" * 60)
print("Tensor Operations")
print("=" * 60)

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[5., 6.], [7., 8.]])

# Element-wise operations
print(f"Addition:\n{a + b}")
print(f"\nMultiplication (element-wise):\n{a * b}")

# Matrix multiplication
print(f"\nMatrix multiplication:\n{torch.matmul(a, b)}")
# Or using @ operator
print(f"\nUsing @ operator:\n{a @ b}")

# Reshaping
x = torch.randn(4, 3)
print(f"\nOriginal shape: {x.shape}")
print(f"Reshaped to (2, 6): {x.view(2, 6).shape}")
print(f"Flattened: {x.view(-1).shape}")  # -1 infers dimension

In [None]:
# PyTorch vs NumPy comparison
print("\n" + "=" * 60)
print("PyTorch vs NumPy")
print("=" * 60)

# NumPy way
numpy_array = np.random.randn(3, 3)
print(f"NumPy array:\n{numpy_array}")

# PyTorch way
pytorch_tensor = torch.randn(3, 3)
print(f"\nPyTorch tensor:\n{pytorch_tensor}")

# Converting between PyTorch and NumPy
tensor = torch.tensor([1, 2, 3])
numpy_from_tensor = tensor.numpy()
print(f"\nTensor to NumPy: {numpy_from_tensor}")

array = np.array([4, 5, 6])
tensor_from_numpy = torch.from_numpy(array)
print(f"NumPy to Tensor: {tensor_from_numpy}")

print("\n⚠️  Note: tensor.numpy() and torch.from_numpy() share memory with their source (CPU tensors only); torch.tensor() makes a copy.")

## 4. Automatic Differentiation with Autograd

**Autograd** is PyTorch's automatic differentiation engine. It tracks operations on tensors and computes gradients automatically.

**Key Concept**: Set `requires_grad=True` to track computations for backpropagation.
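
A small sketch of how gradient tracking propagates, and of the two usual ways to switch it off (`torch.no_grad()` and `detach()`). The tensor names are illustrative.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# Results of operations on a tracked tensor are tracked too.
tracked = (w * 3).sum()
print(tracked.requires_grad)  # True

# Inside torch.no_grad() no graph is recorded.
with torch.no_grad():
    untracked = (w * 3).sum()
print(untracked.requires_grad)  # False

# detach() returns a tensor that shares data but is cut off from the graph.
print(w.detach().requires_grad)  # False

# d/dw of sum(3 * w) is 3 for every element.
tracked.backward()
print(w.grad)  # tensor([3., 3.])
```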

In [None]:
# Simple autograd example
print("Simple Autograd Example")
print("=" * 60)

# Create a tensor with gradient tracking
x = torch.tensor([2.0], requires_grad=True)
print(f"x = {x}")

# Perform operations
y = x ** 2 + 3 * x + 1
print(f"y = x² + 3x + 1 = {y}")

# Compute gradient dy/dx
y.backward()  # Compute gradients

print(f"\nGradient dy/dx = 2x + 3 = {x.grad}")
print(f"Analytical result at x=2: {2*2 + 3} ✓")

In [None]:
# More complex example with multiple variables
print("\n" + "=" * 60)
print("Autograd with Multiple Variables")
print("=" * 60)

# Create tensors
a = torch.tensor([3.0], requires_grad=True)
b = torch.tensor([4.0], requires_grad=True)

# Complex operation
c = a * b
d = c + a ** 2
e = torch.sigmoid(d)  # Sigmoid activation

print(f"a = {a.item()}, b = {b.item()}")
print(f"c = a * b = {c.item()}")
print(f"d = c + a² = {d.item()}")
print(f"e = sigmoid(d) = {e.item()}")

# Backpropagate
e.backward()

print(f"\nGradients:")
print(f"de/da = {a.grad.item():.4f}")
print(f"de/db = {b.grad.item():.4f}")

In [None]:
# Gradient accumulation and zeroing
print("\n" + "=" * 60)
print("Gradient Accumulation")
print("=" * 60)

x = torch.tensor([1.0], requires_grad=True)

# First computation
y1 = x ** 2
y1.backward()
print(f"After first backward: x.grad = {x.grad}")

# Second computation (gradients accumulate!)
y2 = x ** 3
y2.backward()
print(f"After second backward (accumulated): x.grad = {x.grad}")

# Zero gradients before next computation
x.grad.zero_()
print(f"After zeroing: x.grad = {x.grad}")

print("\n⚠️  Important: Always zero gradients before backward pass in training!")

## 5. Building Neural Networks with nn.Module

PyTorch uses **nn.Module** as the base class for all neural networks.
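
One thing the base class does for you is parameter bookkeeping: layers assigned as attributes in `__init__` are registered automatically. The sketch below uses a hypothetical `TinyNet` class to show this.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # hypothetical example class
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 3)
        self.out = nn.Linear(3, 2)

    def forward(self, x):
        return self.out(torch.relu(self.hidden(x)))

net = TinyNet()

# Parameters are discovered through attribute assignment in __init__.
for name, param in net.named_parameters():
    print(name, tuple(param.shape))

# Calling the module runs forward() (plus any registered hooks).
batch = torch.randn(5, 4)
print(net(batch).shape)  # torch.Size([5, 2])
```

This is why `model.parameters()` can be handed straight to an optimizer without listing the weights by hand.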

In [None]:
# Define a simple neural network
class SimpleNN(nn.Module):
    """
    Simple feed-forward neural network.
    """
    
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        
        # Define layers
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, output_size)
        
        # Dropout for regularization
        self.dropout = nn.Dropout(0.5)
    
    def forward(self, x):
        """
        Forward pass through the network.
        
        Args:
            x: Input tensor
        
        Returns:
            Output tensor
        """
        # First hidden layer
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout(x)
        
        # Second hidden layer
        x = self.fc2(x)
        x = F.relu(x)
        x = self.dropout(x)
        
        # Output layer (no activation, will use CrossEntropyLoss)
        x = self.fc3(x)
        
        return x

# Create model instance
model_pytorch = SimpleNN(input_size=784, hidden_size=128, output_size=10)

print("PyTorch Model Architecture:")
print(model_pytorch)

# Count parameters
total_params = sum(p.numel() for p in model_pytorch.parameters())
trainable_params = sum(p.numel() for p in model_pytorch.parameters() if p.requires_grad)
print(f"\nTotal parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")

In [None]:
# Equivalent Keras model for comparison
def create_keras_model():
    """Create equivalent model in Keras."""
    model = keras.Sequential([
        keras.Input(shape=(784,)),  # works in both tf.keras 2.x and Keras 3
        keras.layers.Dense(128, activation='relu'),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(128, activation='relu'),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(10, activation='softmax')
    ])
    return model

model_keras = create_keras_model()

print("\nKeras Model Architecture:")
model_keras.summary()

print("\n" + "=" * 60)
print("Key Differences:")
print("=" * 60)
print("PyTorch:")
print("  - Explicit forward() method")
print("  - Activations typically in forward(), not layer definition")
print("  - More control, more code")
print("\nKeras:")
print("  - Sequential API is more concise")
print("  - Activations specified in layer definition")
print("  - Less code, less flexibility")

## 6. Training Loop in PyTorch

Unlike Keras's `model.fit()`, PyTorch requires you to write explicit training loops.

**Typical Training Loop**:
1. Zero gradients
2. Forward pass
3. Compute loss
4. Backward pass (compute gradients)
5. Update weights
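
The five steps above can be sketched end to end on a toy problem before moving to MNIST. The synthetic data (y = 2x + 1), the learning rate, and the step count are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic data for y = 2x + 1 (illustrative only).
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 2 * x + 1

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

initial_loss = criterion(model(x), y).item()
for step in range(200):
    optimizer.zero_grad()            # 1. zero gradients
    prediction = model(x)            # 2. forward pass
    loss = criterion(prediction, y)  # 3. compute loss
    loss.backward()                  # 4. backward pass
    optimizer.step()                 # 5. update weights

final_loss = loss.item()
print(f"loss: {initial_loss:.4f} -> {final_loss:.6f}")
```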

In [None]:
# Load and prepare MNIST data
from tensorflow.keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Normalize and flatten
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
X_train = X_train.reshape(-1, 784)
X_test = X_test.reshape(-1, 784)

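# Convert to PyTorch tensors (torch.from_numpy replaces the legacy FloatTensor/LongTensor constructors)
X_train_torch = torch.from_numpy(X_train)         # already float32
y_train_torch = torch.from_numpy(y_train).long()  # CrossEntropyLoss expects int64 class indices
X_test_torch = torch.from_numpy(X_test)
y_test_torch = torch.from_numpy(y_test).long()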

# Create PyTorch dataset and dataloader
train_dataset = TensorDataset(X_train_torch, y_train_torch)
train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)

print(f"Training samples: {len(X_train_torch)}")
print(f"Test samples: {len(X_test_torch)}")
print(f"Batches per epoch: {len(train_loader)}")

In [None]:
# Define training function
def train_pytorch_model(model, train_loader, epochs=5, learning_rate=0.001):
    """
    Train PyTorch model.
    
    Args:
        model: PyTorch model
        train_loader: DataLoader for training data
        epochs: Number of epochs
        learning_rate: Learning rate
    
    Returns:
        Training history
    """
    # Define loss function and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    
    history = {'loss': [], 'accuracy': []}
    
    # Training loop
    for epoch in range(epochs):
        model.train()  # Set model to training mode
        
        epoch_loss = 0.0
        correct = 0
        total = 0
        
        for batch_idx, (data, target) in enumerate(train_loader):
            # 1. Zero gradients
            optimizer.zero_grad()
            
            # 2. Forward pass
            outputs = model(data)
            
            # 3. Compute loss
            loss = criterion(outputs, target)
            
            # 4. Backward pass
            loss.backward()
            
            # 5. Update weights
            optimizer.step()
            
            # Track metrics
            epoch_loss += loss.item()
            predicted = outputs.argmax(dim=1)  # class with the highest logit
            total += target.size(0)
            correct += (predicted == target).sum().item()
        
        # Calculate epoch metrics
        avg_loss = epoch_loss / len(train_loader)
        accuracy = correct / total
        
        history['loss'].append(avg_loss)
        history['accuracy'].append(accuracy)
        
        print(f"Epoch {epoch+1}/{epochs} - Loss: {avg_loss:.4f}, Accuracy: {accuracy:.4f}")
    
    return history

# Train PyTorch model
print("Training PyTorch model...\n")
pytorch_model = SimpleNN(784, 128, 10)
pytorch_history = train_pytorch_model(pytorch_model, train_loader, epochs=5)

In [None]:
# Train equivalent Keras model for comparison
print("\nTraining Keras model...\n")

keras_model = create_keras_model()
keras_model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

keras_history = keras_model.fit(
    X_train, y_train,
    epochs=5,
    batch_size=128,
    verbose=1
)

In [None]:
# Compare training curves
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Loss comparison
axes[0].plot(pytorch_history['loss'], label='PyTorch', linewidth=2, marker='o')
axes[0].plot(keras_history.history['loss'], label='Keras', linewidth=2, marker='s')
axes[0].set_xlabel('Epoch', fontsize=12)
axes[0].set_ylabel('Loss', fontsize=12)
axes[0].set_title('Training Loss Comparison', fontsize=13, fontweight='bold')
axes[0].legend(fontsize=11)
axes[0].grid(True, alpha=0.3)

# Accuracy comparison
axes[1].plot(pytorch_history['accuracy'], label='PyTorch', linewidth=2, marker='o')
axes[1].plot(keras_history.history['accuracy'], label='Keras', linewidth=2, marker='s')
axes[1].set_xlabel('Epoch', fontsize=12)
axes[1].set_ylabel('Accuracy', fontsize=12)
axes[1].set_title('Training Accuracy Comparison', fontsize=13, fontweight='bold')
axes[1].legend(fontsize=11)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nBoth frameworks achieve similar results!")

## 7. Evaluation in PyTorch

In [None]:
def evaluate_pytorch_model(model, X_test, y_test):
    """
    Evaluate PyTorch model.
    
    Args:
        model: Trained PyTorch model
        X_test: Test data
        y_test: Test labels
    
    Returns:
        Test accuracy
    """
    model.eval()  # Set to evaluation mode (disables dropout, etc.)
    
    with torch.no_grad():  # Disable gradient computation for efficiency
        outputs = model(X_test)
        predicted = outputs.argmax(dim=1)  # class with the highest logit
        correct = (predicted == y_test).sum().item()
        accuracy = correct / len(y_test)
    
    return accuracy

# Evaluate PyTorch model
pytorch_test_acc = evaluate_pytorch_model(pytorch_model, X_test_torch, y_test_torch)
print(f"PyTorch Test Accuracy: {pytorch_test_acc:.4f}")

# Evaluate Keras model
keras_test_loss, keras_test_acc = keras_model.evaluate(X_test, y_test, verbose=0)
print(f"Keras Test Accuracy: {keras_test_acc:.4f}")

print("\n⚠️  Important PyTorch concepts:")
print("  - model.eval(): Switch to evaluation mode")
print("  - torch.no_grad(): Disable gradient tracking for inference")
print("  - model.train(): Switch back to training mode")

## 8. Saving and Loading Models

In [None]:
import os

# Create directory for models
os.makedirs('models', exist_ok=True)

# PyTorch: Save entire model
torch.save(pytorch_model, 'models/pytorch_model.pth')
print("PyTorch model saved (entire model).")

# PyTorch: Save only state dict (recommended)
torch.save(pytorch_model.state_dict(), 'models/pytorch_model_state.pth')
print("PyTorch model state dict saved (recommended approach).")

# Loading in PyTorch
loaded_model = SimpleNN(784, 128, 10)
loaded_model.load_state_dict(torch.load('models/pytorch_model_state.pth', weights_only=True))
loaded_model.eval()
print("\nPyTorch model loaded successfully.")

# Verify loaded model works
loaded_acc = evaluate_pytorch_model(loaded_model, X_test_torch, y_test_torch)
print(f"Loaded model accuracy: {loaded_acc:.4f}")

print("\n" + "=" * 60)
print("PyTorch vs Keras: Model Saving")
print("=" * 60)
print("PyTorch:")
print("  - torch.save(model.state_dict(), path): Save weights")
print("  - model.load_state_dict(torch.load(path)): Load weights")
print("  - Requires model architecture to be defined separately")
print("\nKeras:")
print("  - model.save(path): Save entire model (architecture + weights)")
print("  - keras.models.load_model(path): Load complete model")
print("  - Easier but less flexible")

## 9. Converting Between Frameworks

Sometimes you need to convert models between PyTorch and TensorFlow.
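
One detail that any manual transfer has to handle is weight layout: `nn.Linear` stores its weight as `(out_features, in_features)`, while a Keras `Dense` kernel is `(input_dim, units)`. The sketch below, which uses only PyTorch and NumPy, transposes a PyTorch weight into the Keras layout and checks the result numerically. It is a minimal illustration for a single dense layer, not a general converter.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 3)

# PyTorch layout: weight is (out_features, in_features).
weight = layer.weight.detach().numpy()
bias = layer.bias.detach().numpy()
print(weight.shape)  # (3, 4)

# Keras Dense layout: kernel is (input_dim, units), so transpose.
kernel = weight.T
print(kernel.shape)  # (4, 3)

# Check with plain NumPy that x @ kernel + bias reproduces the layer output.
x = torch.randn(2, 4)
expected = layer(x).detach().numpy()
recomputed = x.numpy() @ kernel + bias
print(np.allclose(expected, recomputed, atol=1e-5))
```

In an actual transfer, the `[kernel, bias]` pair would be passed to the matching Keras layer's `set_weights()`.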

In [None]:
# Simple weight transfer example
def transfer_weights_pytorch_to_keras(pytorch_model, keras_model):
    """
    Transfer weights from PyTorch to Keras model.
    Note: This is a simplified example for demonstration.
    Real conversion may require handling layer differences.
    """
    # Get PyTorch weights
    pytorch_weights = []
    for param in pytorch_model.parameters():
        pytorch_weights.append(param.detach().numpy())
    
    print(f"PyTorch model has {len(pytorch_weights)} weight tensors")
    print(f"Keras model has {len(keras_model.weights)} weight tensors")
    
    # For demonstration, show weight shapes
    print("\nPyTorch weight shapes:")
    for i, w in enumerate(pytorch_weights):
        print(f"  {i}: {w.shape}")
    
    print("\nKeras weight shapes:")
    for i, w in enumerate(keras_model.weights):
        print(f"  {i}: {w.shape}")
    
    print("\n⚠️  Note: Full conversion requires careful alignment of layers!")
    print("Tools like ONNX can help with automatic conversion.")

transfer_weights_pytorch_to_keras(pytorch_model, keras_model)

## 10. Advanced PyTorch Features

In [None]:
# Custom loss function
class CustomLoss(nn.Module):
    """Example custom loss function."""
    
    def __init__(self):
        super().__init__()
    
    def forward(self, predictions, targets):
        # Example: MSE plus an L1 penalty on the predictions (not on the model weights)
        mse_loss = F.mse_loss(predictions, targets)
        l1_reg = torch.abs(predictions).mean()
        return mse_loss + 0.1 * l1_reg

# Custom layer
class CustomLayer(nn.Module):
    """Example custom layer."""
    
    def __init__(self, input_size, output_size):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(input_size, output_size))
        self.bias = nn.Parameter(torch.zeros(output_size))
    
    def forward(self, x):
        # Custom computation
        return torch.matmul(x, self.weight) + self.bias

# Learning rate scheduler
model = SimpleNN(784, 128, 10)
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Reduce the LR when a monitored metric plateaus; call scheduler.step(val_loss) once per epoch
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, 
    mode='min',
    factor=0.5,
    patience=2
)

print("Advanced PyTorch Features Demonstrated:")
print("  ✓ Custom loss function")
print("  ✓ Custom layer")
print("  ✓ Learning rate scheduler")
print("\nKeras supports all three as well; in PyTorch they are plain Python classes and objects.")

## 11. Exercise 1: Implement a CNN in PyTorch

**Task**: Create a Convolutional Neural Network for MNIST using PyTorch.

**Requirements**:
1. Define a CNN class with at least 2 convolutional layers
2. Include MaxPooling and Dropout layers
3. Implement the training loop
4. Achieve >98% test accuracy
5. Compare with Keras CNN implementation
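
As a starting hint rather than a solution, the sketch below traces tensor shapes through two convolution and pooling stages on a dummy MNIST-sized batch. The channel counts (16 and 32) and `padding=1` are arbitrary assumptions; the point is to find the flattened size that the first fully connected layer would need.

```python
import torch
import torch.nn as nn

batch = torch.zeros(8, 1, 28, 28)  # (batch, channels, height, width)

conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)   # padding=1 keeps 28x28
conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
pool = nn.MaxPool2d(2)                               # halves height and width

with torch.no_grad():
    h = pool(conv1(batch))
    print(h.shape)     # torch.Size([8, 16, 14, 14])
    h = pool(conv2(h))
    print(h.shape)     # torch.Size([8, 32, 7, 7])
    flat = h.flatten(start_dim=1)
    print(flat.shape)  # torch.Size([8, 1568])
```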

In [None]:
# YOUR CODE HERE
# Hint: Use nn.Conv2d for convolutional layers
# Remember to reshape data: X_train.reshape(-1, 1, 28, 28)

pass  # Replace with your implementation

## 12. Exercise 2: Custom Training Loop with Validation

**Task**: Extend the training loop to include validation and early stopping.

**Requirements**:
1. Split training data into train and validation sets
2. Evaluate on validation set after each epoch
3. Implement early stopping (stop if val loss doesn't improve for N epochs)
4. Save the best model based on validation loss
5. Plot training and validation curves
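
The early-stopping rule in requirement 3 comes down to a little bookkeeping that is independent of PyTorch. The sketch below uses a hypothetical `EarlyStopper` helper and made-up validation losses. Wiring it into a real training loop, and saving the best model, is left to the exercise.

```python
class EarlyStopper:
    """Hypothetical helper: signal a stop after `patience` epochs without improvement."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def update(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Made-up validation losses, purely for illustration.
val_losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.68, 0.50]
stopper = EarlyStopper(patience=3)
stopped_at = None
for epoch, val_loss in enumerate(val_losses):
    if stopper.update(val_loss):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_loss)  # 5 0.65
```

With `patience=3` the loop stops at epoch index 5 and never sees the later 0.50, which is the usual trade-off when choosing the patience value.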

In [None]:
# YOUR CODE HERE
# Hint: Track validation loss and compare with best so far
# Use torch.save() to save best model

pass  # Replace with your implementation

## 13. Exercise 3: Gradient Clipping and Monitoring

**Task**: Implement gradient clipping and monitor gradient norms during training.

**Requirements**:
1. Create a deep network (5+ layers) prone to gradient issues
2. Monitor gradient norms for each layer during training
3. Implement gradient clipping using `torch.nn.utils.clip_grad_norm_`
4. Compare training with and without gradient clipping
5. Visualize gradient norms over epochs
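
As a hint for requirements 2 and 3, the sketch below inspects per-parameter gradient norms after one backward pass and then clips them. The tiny model and the exaggerated targets are assumptions made only to produce large gradients. `clip_grad_norm_` returns the total norm measured before clipping.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 10), nn.Tanh(), nn.Linear(10, 1))

# Large targets to provoke large gradients (illustrative data).
inputs = torch.randn(32, 10)
targets = 100 * torch.randn(32, 1)

loss = nn.functional.mse_loss(model(inputs), targets)
loss.backward()

# Per-parameter gradient norms, read after backward() and before any optimizer step.
for name, param in model.named_parameters():
    print(f"{name}: {param.grad.norm().item():.3f}")

max_norm = 1.0
norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

# Recompute the total norm to confirm the gradients were rescaled in place.
norm_after = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))
print(f"total norm before: {norm_before.item():.3f}, after: {norm_after.item():.3f}")
```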

In [None]:
# YOUR CODE HERE
# Hint: After loss.backward(), check gradients before optimizer.step()
# Use: torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

pass  # Replace with your implementation

## 14. Summary

### Key Concepts Covered:

1. **PyTorch Tensors**
   - Similar to NumPy arrays with GPU support
   - Easy conversion between NumPy and PyTorch
   - Supports all standard mathematical operations

2. **Automatic Differentiation (Autograd)**
   - `requires_grad=True` enables gradient tracking
   - `backward()` computes gradients
   - Gradients accumulate (must zero them manually)

3. **nn.Module**
   - Base class for all neural networks
   - Define layers in `__init__`, logic in `forward()`
   - More explicit than Keras but more flexible

4. **Training Loop**
   - Explicit steps: zero_grad → forward → loss → backward → step
   - More code than Keras but complete control
   - Better for research and custom training procedures

5. **PyTorch vs TensorFlow/Keras**
   - PyTorch: Research-friendly, flexible, explicit
   - Keras: Production-ready, simple, high-level
   - Both can achieve same results

### PyTorch Best Practices (2025):

- **Use DataLoader** for efficient batching and shuffling
- **Always call model.train()/model.eval()** appropriately
- **Use torch.no_grad()** during inference
- **Zero gradients** before each backward pass
- **Save state_dict** instead of entire model
- **Use GPU** when available: move both the model and every batch to the same device, e.g. `model.to('cuda')` and `data.to('cuda')`
- **Profile code** with PyTorch Profiler for optimization
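
The `state_dict` recommendation above extends naturally to full training checkpoints. The sketch below saves model and optimizer state together in a temporary directory and restores both. The dictionary keys (`epoch`, `model_state`, `optimizer_state`) are a naming convention assumed here, not a PyTorch API.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# A checkpoint is a plain dict; the key names are a convention, not an API.
checkpoint = {
    "epoch": 3,
    "model_state": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "checkpoint.pth")
    torch.save(checkpoint, path)
    # map_location lets a checkpoint saved on GPU load on a CPU-only machine.
    restored = torch.load(path, map_location="cpu")

new_model = nn.Linear(4, 2)
new_model.load_state_dict(restored["model_state"])
new_optimizer = torch.optim.Adam(new_model.parameters(), lr=1e-3)
new_optimizer.load_state_dict(restored["optimizer_state"])
new_model.eval()  # switch to evaluation mode before inference

print(restored["epoch"])                            # 3
print(torch.equal(model.weight, new_model.weight))  # True
```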

### When to Use Each Framework (2025 Update):

**Choose PyTorch for**:
- Research and paper implementations
- Custom architectures and training loops
- Natural language processing (dominant in NLP)
- Computer vision research
- Learning deep learning fundamentals

**Choose TensorFlow/Keras for**:
- Production deployment at scale
- Mobile and edge devices (TFLite)
- JavaScript deployment (TensorFlow.js)
- Quick prototyping with standard models
- Integration with Google Cloud ecosystem

**Good News**: You can learn both! Core concepts transfer easily.

### Code Comparison Summary:

| Task | PyTorch | Keras |
|------|---------|-------|
| **Model Definition** | Class + forward() | Sequential or Functional |
| **Training** | Manual loop | model.fit() |
| **Evaluation** | Manual with no_grad | model.evaluate() |
| **Prediction** | model(x) | model.predict(x) |
| **Saving** | torch.save(state_dict) | model.save() |
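
One caveat on the Prediction row: the `SimpleNN` in this notebook returns raw logits, whereas the Keras model ends in a softmax layer, so `model(x)` and `model.predict(x)` are not directly comparable. The sketch below shows one way to get probabilities on the PyTorch side, using a single `nn.Linear` as a stand-in for a trained classifier.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(784, 10)   # stand-in for a trained classifier
batch = torch.rand(4, 784)

model.eval()
with torch.no_grad():
    logits = model(batch)
    probabilities = F.softmax(logits, dim=1)  # each row sums to 1
    predicted = probabilities.argmax(dim=1)   # same classes as logits.argmax(dim=1)

print(probabilities.shape)       # torch.Size([4, 10])
print(probabilities.sum(dim=1))  # approximately all ones
print(predicted.shape)           # torch.Size([4])
```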

### What's Next?

- [Module 14: Final Project - Deep Learning Pipeline](14_final_project_dl_pipeline.ipynb)
- Advanced PyTorch: DataParallel, DistributedDataParallel, TorchScript
- Framework-specific features: PyTorch Lightning, tf.data, etc.

### Additional Resources:

1. PyTorch Documentation: https://pytorch.org/docs/
2. PyTorch Tutorials: https://pytorch.org/tutorials/
3. "Deep Learning with PyTorch" (official book): https://pytorch.org/deep-learning-with-pytorch
4. PyTorch Forums: https://discuss.pytorch.org/
5. Papers With Code: https://paperswithcode.com/ (most use PyTorch)