# PyTorch Deep Learning Tutorial

This notebook introduces PyTorch step by step, from basic tensor operations to building, training, and saving neural networks.

## What You'll Learn
1. **PyTorch Fundamentals** - Tensors, operations, and autograd
2. **Neural Networks** - Building models with nn.Module
3. **Training Loops** - Complete training pipeline
4. **CNNs** - Convolutional networks for image data
5. **RNNs** - Recurrent networks for sequences
6. **Advanced Topics** - Transfer learning, optimizers, saving and loading models

Let's start by importing the necessary libraries:

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

## 1. PyTorch Fundamentals

### Tensors: The Building Blocks

Tensors are multi-dimensional arrays, similar to NumPy arrays but with additional capabilities for GPU computation and automatic differentiation.

In [None]:
# Creating tensors
print("=== Creating Tensors ===")

# From Python lists
x = torch.tensor([1, 2, 3, 4, 5])
print(f"From list: {x}")
print(f"Data type: {x.dtype}")

# Special tensors
zeros = torch.zeros(3, 4)
ones = torch.ones(2, 3)
random = torch.randn(2, 3)  # Normal distribution
uniform = torch.rand(2, 3)   # Uniform distribution [0, 1)

print(f"\nZeros tensor shape {zeros.shape}:")
print(zeros)
print(f"\nRandom tensor:")
print(random)

# From NumPy arrays
numpy_array = np.array([1, 2, 3, 4, 5])
tensor_from_numpy = torch.from_numpy(numpy_array)
print(f"\nFrom NumPy: {tensor_from_numpy}")
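
One detail worth knowing about the NumPy conversion above: `torch.from_numpy` shares memory with the source array, while `torch.tensor` makes a copy. The following is a minimal sketch of that difference; the variable names `arr`, `shared`, and `copied` are illustrative and not part of the notebook.

```python
import numpy as np
import torch

arr = np.array([1, 2, 3])

shared = torch.from_numpy(arr)  # Shares the underlying buffer with `arr`
copied = torch.tensor(arr)      # Copies the data into a new tensor

# Modify the NumPy array in place
arr[0] = 100

print(f"shared: {shared}")  # Reflects the change made to `arr`
print(f"copied: {copied}")  # Still holds the original values
```

If the tensor should be independent of the NumPy array, `torch.tensor(arr)` or `torch.from_numpy(arr).clone()` are both reasonable choices.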

In [None]:
# Tensor operations
print("=== Tensor Operations ===")

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

print(f"a = {a}")
print(f"b = {b}")

# Element-wise operations
print(f"\nElement-wise operations:")
print(f"a + b = {a + b}")
print(f"a * b = {a * b}")
print(f"a ** 2 = {a ** 2}")

# Reduction operations
print(f"\nReduction operations:")
print(f"Sum: {torch.sum(a)}")
print(f"Mean: {torch.mean(a)}")
print(f"Dot product: {torch.dot(a, b)}")

# Matrix operations
A = torch.randn(3, 4)
B = torch.randn(4, 2)
C = torch.mm(A, B)  # Matrix multiplication

print(f"\nMatrix multiplication:")
print(f"A shape: {A.shape}, B shape: {B.shape}")
print(f"C = A @ B, shape: {C.shape}")
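
The cell above computes the product with `torch.mm` but labels it `A @ B`. For 2-D tensors these are equivalent, along with `torch.matmul`. A short sketch of the relationship follows; the `batch` tensor is an illustrative assumption used to show where they differ.

```python
import torch

torch.manual_seed(0)
A = torch.randn(3, 4)
B = torch.randn(4, 2)

C_mm = torch.mm(A, B)          # Strictly 2-D matrix multiplication
C_at = A @ B                   # Operator form of torch.matmul
C_matmul = torch.matmul(A, B)  # General form that also supports batches

print(torch.allclose(C_mm, C_at), torch.allclose(C_mm, C_matmul))

# torch.matmul (and @) broadcast over leading batch dimensions;
# torch.mm would raise an error for this 3-D input.
batch = torch.randn(5, 3, 4)
batched_out = batch @ B
print(f"Batched result shape: {batched_out.shape}")
```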

### Automatic Differentiation (Autograd)

PyTorch's autograd system automatically computes gradients for backpropagation. This is the foundation of neural network training.

In [None]:
print("=== Automatic Differentiation ===")

# Enable gradient computation
x = torch.tensor([2.0], requires_grad=True)
print(f"x = {x.item()}, requires_grad = {x.requires_grad}")

# Define a function: y = x² + 3x + 1
y = x**2 + 3*x + 1
print(f"y = x² + 3x + 1 = {y.item()}")

# Compute gradient dy/dx
y.backward()
print(f"dy/dx = 2x + 3 = {x.grad.item()}")
print(f"At x=2: dy/dx = 2(2) + 3 = {2*2 + 3}")

# Multiple variables
print("\n=== Multiple Variables ===")
x = torch.tensor([1.0], requires_grad=True)
y = torch.tensor([2.0], requires_grad=True)
z = x**2 + y**3 + x*y

print(f"z = x² + y³ + xy = {z.item()}")
z.backward()
print(f"∂z/∂x = 2x + y = {x.grad.item()}")
print(f"∂z/∂y = 3y² + x = {y.grad.item()}")
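
To build some trust in autograd, we can compare its result against a numerical estimate. This sketch reuses the polynomial from the cell above and adds a central finite difference; the helper `f` and the step size `h` are illustrative choices. It also shows that gradients accumulate across `backward()` calls, which is why training loops clear them every iteration.

```python
import torch

def f(x):
    # Same polynomial as above: y = x^2 + 3x + 1
    return x**2 + 3*x + 1

# float64 keeps the finite-difference estimate accurate
x = torch.tensor([2.0], dtype=torch.float64, requires_grad=True)

y = f(x)
y.backward()
autograd_grad = x.grad.item()  # Analytic value: 2*2 + 3 = 7

# Central finite difference: (f(x + h) - f(x - h)) / (2h)
h = 1e-6
with torch.no_grad():
    numeric_grad = ((f(x + h) - f(x - h)) / (2 * h)).item()

print(f"autograd: {autograd_grad:.6f}, numeric: {numeric_grad:.6f}")

# A second backward pass adds to x.grad instead of replacing it
f(x).backward()
accumulated_grad = x.grad.item()  # 7 + 7 = 14
print(f"After a second backward(): {accumulated_grad}")

x.grad.zero_()  # Reset before reusing x
```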

## 2. Building Neural Networks

### Creating a Simple Neural Network

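Neural networks in PyTorch are defined as subclasses of `nn.Module`. The base class registers layers and their parameters, and provides methods such as `parameters()`, `train()`, and `eval()` that the training code relies on.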

In [None]:
class SimpleNN(nn.Module):
    """Simple feedforward neural network"""
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, output_size)
        self.dropout = nn.Dropout(0.2)
        
    def forward(self, x):
        # Forward pass through the network
        x = F.relu(self.fc1(x))  # First layer + ReLU activation
        x = self.dropout(x)      # Dropout for regularization
        x = F.relu(self.fc2(x))  # Second layer + ReLU activation
        x = self.dropout(x)      # More dropout
        x = self.fc3(x)          # Output layer (no activation)
        return x

# Create a model instance
model = SimpleNN(input_size=10, hidden_size=64, output_size=3)
print("Model architecture:")
print(model)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"\nTotal parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")
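
The parameter count printed above can be checked by hand: an `nn.Linear(in, out)` layer holds `in * out` weights plus `out` biases. Here is a small sketch of that arithmetic for the 10 → 64 → 64 → 3 network; `linear_param_count` is a hypothetical helper, and the `nn.Sequential` stack is a stand-in with the same layer sizes as `SimpleNN`.

```python
import torch.nn as nn

def linear_param_count(in_features, out_features):
    # Weight matrix plus bias vector
    return in_features * out_features + out_features

expected = (
    linear_param_count(10, 64)    # fc1: 640 + 64 = 704
    + linear_param_count(64, 64)  # fc2: 4096 + 64 = 4160
    + linear_param_count(64, 3)   # fc3: 192 + 3 = 195
)

# Same layer sizes as SimpleNN (dropout and ReLU have no parameters)
layers = nn.Sequential(nn.Linear(10, 64), nn.Linear(64, 64), nn.Linear(64, 3))
actual = sum(p.numel() for p in layers.parameters())

print(f"By hand: {expected:,}, from PyTorch: {actual:,}")  # Both 5,059
```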

### Testing the Model

Let's test our model with some random input to make sure it works:

In [None]:
# Test the model with random input
test_input = torch.randn(5, 10)  # Batch of 5 samples, 10 features each
print(f"Input shape: {test_input.shape}")

# Forward pass
model.eval()  # Set to evaluation mode
with torch.no_grad():  # Disable gradient computation for inference
    output = model(test_input)
    
print(f"Output shape: {output.shape}")
print(f"Output:")
print(output)

# Apply softmax to get probabilities
probabilities = F.softmax(output, dim=1)
print(f"\nProbabilities (after softmax):")
print(probabilities)
print(f"\nSum of probabilities for each sample: {probabilities.sum(dim=1)}")
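
The cell above calls `model.eval()` before inference. One reason this matters is dropout: in training mode it zeroes elements at random and rescales the survivors by `1 / (1 - p)`, while in evaluation mode it passes the input through unchanged. A minimal sketch with a standalone `nn.Dropout` layer (the input of ones is just for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.2)
x = torch.ones(1000)

dropout.train()
train_out = dropout(x)  # Each element is either 0 or 1 / (1 - 0.2) = 1.25

dropout.eval()
eval_out = dropout(x)   # Identity in evaluation mode

dropped_fraction = (train_out == 0).float().mean().item()
print(f"Fraction dropped in train mode: {dropped_fraction:.3f}")
print(f"Unchanged in eval mode: {torch.equal(eval_out, x)}")
```

The rescaling keeps the expected activation the same in both modes, so no extra correction is needed at inference time.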

## 3. Training a Neural Network

### Creating Training Data

Let's create a classification dataset to train our model:

In [None]:
# Generate sample classification data
X, y = make_classification(n_samples=1000, n_features=10, n_informative=8, 
                          n_redundant=2, n_classes=3, random_state=42)

# Preprocessing
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Training data shape: {X_train.shape}")
print(f"Test data shape: {X_test.shape}")
print(f"Number of classes: {len(np.unique(y))}")
print(f"Class distribution: {np.bincount(y)}")

# Convert to PyTorch tensors
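# torch.tensor with an explicit dtype replaces the legacy FloatTensor/LongTensor constructors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.long)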

# Create data loaders for batch processing
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

test_dataset = TensorDataset(X_test_tensor, y_test_tensor)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

### Complete Training Loop

Now let's implement a complete training loop with loss computation, backpropagation, and optimization:

In [None]:
# Initialize model, loss function, and optimizer
model = SimpleNN(input_size=10, hidden_size=64, output_size=3)
criterion = nn.CrossEntropyLoss()  # For multi-class classification
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training parameters
num_epochs = 100
train_losses = []
train_accuracies = []

print("Starting training...")
print(f"Model: {model.__class__.__name__}")
print(f"Loss function: {criterion.__class__.__name__}")
print(f"Optimizer: {optimizer.__class__.__name__}")
print(f"Learning rate: {optimizer.param_groups[0]['lr']}")
print("-" * 50)

# Training loop
model.train()  # Set model to training mode

for epoch in range(num_epochs):
    epoch_loss = 0
    correct_predictions = 0
    total_samples = 0
    
    for batch_X, batch_y in train_loader:
        # Forward pass
        outputs = model(batch_X)
        loss = criterion(outputs, batch_y)
        
        # Backward pass and optimization
        optimizer.zero_grad()  # Clear gradients from previous iteration
        loss.backward()        # Compute gradients
        optimizer.step()       # Update parameters
        
        # Track metrics
        epoch_loss += loss.item()
        predicted = outputs.argmax(dim=1)  # Predicted class = index of the largest logit
        total_samples += batch_y.size(0)
        correct_predictions += (predicted == batch_y).sum().item()
    
    # Calculate average loss and accuracy for the epoch
    avg_loss = epoch_loss / len(train_loader)
    accuracy = correct_predictions / total_samples
    
    train_losses.append(avg_loss)
    train_accuracies.append(accuracy)
    
    # Print progress
    if epoch % 20 == 0 or epoch == num_epochs - 1:
        print(f"Epoch [{epoch + 1:3d}/{num_epochs}] - Loss: {avg_loss:.4f}, Accuracy: {accuracy:.4f}")

print("Training completed!")
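
Note that the loop above feeds raw logits to `nn.CrossEntropyLoss`, with no softmax in the model. That works because the loss applies log-softmax internally. The sketch below checks this equivalence on made-up logits and targets (both are illustrative values, not data from the notebook):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)            # 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 2])  # Class index per sample

# Built-in cross-entropy on raw logits
loss_ce = F.cross_entropy(logits, targets)

# Same value as log-softmax followed by negative log-likelihood
log_probs = F.log_softmax(logits, dim=1)
loss_nll = F.nll_loss(log_probs, targets)

# And by hand: mean of -log p(correct class)
loss_by_hand = -log_probs[torch.arange(4), targets].mean()

print(loss_ce.item(), loss_nll.item(), loss_by_hand.item())
```

Adding a softmax layer before this loss would apply the normalization twice, which usually slows learning.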

### Model Evaluation

Let's evaluate our trained model on the test set:

In [None]:
# Evaluate the model
model.eval()  # Set to evaluation mode
test_loss = 0
correct_predictions = 0
total_samples = 0
all_predictions = []
all_targets = []

with torch.no_grad():  # Disable gradient computation for efficiency
    for batch_X, batch_y in test_loader:
        outputs = model(batch_X)
        loss = criterion(outputs, batch_y)
        
        test_loss += loss.item()
        predicted = outputs.argmax(dim=1)  # Predicted class = index of the largest logit
        total_samples += batch_y.size(0)
        correct_predictions += (predicted == batch_y).sum().item()
        
        all_predictions.extend(predicted.cpu().numpy())
        all_targets.extend(batch_y.cpu().numpy())

test_accuracy = correct_predictions / total_samples
avg_test_loss = test_loss / len(test_loader)

print(f"Test Results:")
print(f"Average Loss: {avg_test_loss:.4f}")
print(f"Accuracy: {test_accuracy:.4f} ({correct_predictions}/{total_samples})")

# Confusion matrix
from sklearn.metrics import confusion_matrix, classification_report

cm = confusion_matrix(all_targets, all_predictions)
print(f"\nConfusion Matrix:")
print(cm)

print(f"\nClassification Report:")
print(classification_report(all_targets, all_predictions))

### Visualizing Training Progress

Let's plot the training curves to see how our model learned:

In [None]:
# Plot training curves
plt.figure(figsize=(15, 5))

# Loss curve
plt.subplot(1, 3, 1)
plt.plot(train_losses)
plt.title('Training Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid(True)

# Accuracy curve
plt.subplot(1, 3, 2)
plt.plot(train_accuracies)
plt.title('Training Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.grid(True)

# Confusion matrix
plt.subplot(1, 3, 3)
plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
plt.title('Confusion Matrix')
plt.colorbar()
tick_marks = np.arange(len(np.unique(all_targets)))
plt.xticks(tick_marks, range(len(np.unique(all_targets))))
plt.yticks(tick_marks, range(len(np.unique(all_targets))))
plt.xlabel('Predicted Label')
plt.ylabel('True Label')

# Add text annotations to confusion matrix
thresh = cm.max() / 2.
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        plt.text(j, i, format(cm[i, j], 'd'),
                ha="center", va="center",
                color="white" if cm[i, j] > thresh else "black")

plt.tight_layout()
plt.show()

print(f"Final training accuracy: {train_accuracies[-1]:.4f}")
print(f"Test accuracy: {test_accuracy:.4f}")

## 4. Convolutional Neural Networks (CNNs)

CNNs are specialized for processing grid-like data such as images. Let's build a simple CNN:

In [None]:
class SimpleCNN(nn.Module):
    """Simple Convolutional Neural Network"""
    def __init__(self, num_classes=10):
        super(SimpleCNN, self).__init__()
        # Convolutional layers
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        
        # Fully connected layers
        # Note: Input size calculation depends on image dimensions
        # For 28x28 images: after two 2x2 pooling operations -> 7x7
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, num_classes)
        self.dropout = nn.Dropout(0.5)
        
    def forward(self, x):
        # First convolutional block
        x = self.pool(F.relu(self.conv1(x)))  # 28x28 -> 14x14
        # Second convolutional block
        x = self.pool(F.relu(self.conv2(x)))  # 14x14 -> 7x7
        
        # Flatten for fully connected layers
        x = x.view(-1, 64 * 7 * 7)
        
        # Fully connected layers
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

# Create CNN model
cnn_model = SimpleCNN(num_classes=10)
print("CNN Architecture:")
print(cnn_model)

# Test with synthetic image data
batch_size = 4
test_images = torch.randn(batch_size, 1, 28, 28)  # Grayscale 28x28 images
print(f"\nInput shape: {test_images.shape}")

cnn_model.eval()
with torch.no_grad():
    output = cnn_model(test_images)
    
print(f"Output shape: {output.shape}")
print(f"Output logits:")
print(output)

### Understanding CNN Layers

Let's examine what each layer in our CNN does:

In [None]:
# Analyze CNN layer outputs
test_image = torch.randn(1, 1, 28, 28)  # Single grayscale image
print(f"Input image shape: {test_image.shape}")

# Forward pass through each layer
with torch.no_grad():
    # First conv layer
    conv1_out = F.relu(cnn_model.conv1(test_image))
    print(f"After conv1 + ReLU: {conv1_out.shape}")
    
    # First pooling
    pool1_out = cnn_model.pool(conv1_out)
    print(f"After pool1: {pool1_out.shape}")
    
    # Second conv layer
    conv2_out = F.relu(cnn_model.conv2(pool1_out))
    print(f"After conv2 + ReLU: {conv2_out.shape}")
    
    # Second pooling
    pool2_out = cnn_model.pool(conv2_out)
    print(f"After pool2: {pool2_out.shape}")
    
    # Flatten
    flattened = pool2_out.view(-1, 64 * 7 * 7)
    print(f"After flattening: {flattened.shape}")

# Visualize some filters from the first conv layer
plt.figure(figsize=(12, 3))
filters = cnn_model.conv1.weight.data
print(f"\nFirst layer has {filters.shape[0]} filters of size {filters.shape[2]}x{filters.shape[3]}")

for i in range(8):  # Show first 8 filters
    plt.subplot(2, 4, i+1)
    plt.imshow(filters[i, 0], cmap='gray')
    plt.title(f'Filter {i+1}')
    plt.axis('off')

plt.suptitle('First Layer CNN Filters')
plt.tight_layout()
plt.show()
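
The shapes printed above follow the standard output-size formula for convolution and pooling layers: `out = floor((in + 2 * padding - kernel_size) / stride) + 1`. Here is a sketch that applies it to the 28x28 case; `conv_output_size` is a hypothetical helper, and the standalone `conv` and `pool` layers mirror the first block of `SimpleCNN`.

```python
import torch
import torch.nn as nn

def conv_output_size(size, kernel_size, stride=1, padding=0):
    # Spatial size after a convolution or pooling layer (dilation = 1)
    return (size + 2 * padding - kernel_size) // stride + 1

after_conv1 = conv_output_size(28, kernel_size=3, padding=1)          # 28
after_pool1 = conv_output_size(after_conv1, kernel_size=2, stride=2)  # 14
after_conv2 = conv_output_size(after_pool1, kernel_size=3, padding=1) # 14
after_pool2 = conv_output_size(after_conv2, kernel_size=2, stride=2)  # 7

print(f"28 -> {after_conv1} -> {after_pool1} -> {after_conv2} -> {after_pool2}")
print(f"Flattened features: {64 * after_pool2 * after_pool2}")  # 64 * 7 * 7 = 3136

# Cross-check the first block against real layers
conv = nn.Conv2d(1, 32, kernel_size=3, padding=1)
pool = nn.MaxPool2d(2, 2)
with torch.no_grad():
    block_out = pool(conv(torch.randn(1, 1, 28, 28)))
print(f"Actual shape after first block: {block_out.shape}")
```

This is why `fc1` expects `64 * 7 * 7` inputs, and why a different image size would require changing that number.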

## 5. Recurrent Neural Networks (RNNs)

RNNs are designed for sequential data. Let's create a simple LSTM network:

In [None]:
class SimpleLSTM(nn.Module):
    """Simple LSTM for sequence classification"""
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(SimpleLSTM, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        
        # LSTM layer
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, 
                           batch_first=True, dropout=0.2)
        
        # Fully connected layer
        self.fc = nn.Linear(hidden_size, num_classes)
        
    def forward(self, x):
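        # Initialize hidden and cell states with zeros, matching the input's device and dtype
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size,
                         device=x.device, dtype=x.dtype)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size,
                         device=x.device, dtype=x.dtype)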
        
        # Forward propagate LSTM
        lstm_out, _ = self.lstm(x, (h0, c0))
        
        # Use output from last time step
        output = self.fc(lstm_out[:, -1, :])
        return output

# Create LSTM model
lstm_model = SimpleLSTM(input_size=10, hidden_size=64, 
                       num_layers=2, num_classes=3)
print("LSTM Architecture:")
print(lstm_model)

# Test with synthetic sequence data
batch_size = 4
seq_length = 20
input_size = 10

test_sequences = torch.randn(batch_size, seq_length, input_size)
print(f"\nInput shape: {test_sequences.shape}")
print(f"(batch_size, sequence_length, input_features)")

lstm_model.eval()
with torch.no_grad():
    output = lstm_model(test_sequences)
    
print(f"Output shape: {output.shape}")
print(f"Output logits:")
print(output)
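
It can help to look at what `nn.LSTM` actually returns. The sketch below uses a standalone LSTM with the same sizes as above (dropout omitted for determinism) and shows that, for a unidirectional LSTM with `batch_first=True`, the last time step of the output equals the final hidden state of the top layer. It also relies on the fact that omitting `(h0, c0)` defaults both to zeros.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=10, hidden_size=64, num_layers=2, batch_first=True)
sequences = torch.randn(4, 20, 10)  # (batch, seq_len, features)

with torch.no_grad():
    # Initial states default to zeros when not passed explicitly
    lstm_out, (h_n, c_n) = lstm(sequences)

print(f"lstm_out: {lstm_out.shape}")  # (batch, seq_len, hidden_size)
print(f"h_n:      {h_n.shape}")       # (num_layers, batch, hidden_size)
print(f"c_n:      {c_n.shape}")       # (num_layers, batch, hidden_size)

# Last time step of the output == final hidden state of the last layer
same = torch.allclose(lstm_out[:, -1, :], h_n[-1])
print(f"lstm_out[:, -1, :] matches h_n[-1]: {same}")
```

Note this shortcut assumes all sequences in the batch have the same length; padded batches need the true last step of each sequence.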

## 6. Advanced Topics

### Transfer Learning

Transfer learning allows us to use pre-trained models and adapt them for our specific tasks:

In [None]:
# Transfer learning example (requires torchvision)
try:
    import torchvision.models as models
    
    # Load pre-trained ResNet-18
    print("Loading pre-trained ResNet-18...")
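    # `pretrained=True` is deprecated since torchvision 0.13; use the weights enum instead
    resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)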
    
    print(f"Original final layer: {resnet.fc}")
    
    # Freeze all parameters
    for param in resnet.parameters():
        param.requires_grad = False
    
    # Replace final layer for our number of classes
    num_classes = 5
    resnet.fc = nn.Linear(resnet.fc.in_features, num_classes)
    
    print(f"\nModified final layer: {resnet.fc}")
    print(f"Modified for {num_classes} classes")
    
    # Show which parameters will be updated
    params_to_update = []
    for name, param in resnet.named_parameters():
        if param.requires_grad:
            params_to_update.append(param)
            print(f"Parameter to update: {name}")
    
    print(f"\nParameter tensors to update: {len(params_to_update)}")
    print("Only the final classification layer will be trained!")
    
except ImportError:
    print("torchvision not available - transfer learning example skipped")
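
To see the effect of freezing without downloading ResNet weights, here is a small sketch on a toy model. The `backbone` and `head` modules, their sizes, and the random data are all illustrative assumptions standing in for a pre-trained network and a real dataset. Only parameters with `requires_grad=True` are handed to the optimizer, and one training step confirms that the frozen part does not move.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)

# Toy stand-in for a pre-trained feature extractor plus a new classifier head
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU())
head = nn.Linear(16, 5)
toy_model = nn.Sequential(backbone, head)

# Freeze the backbone
for param in backbone.parameters():
    param.requires_grad = False

# Pass only trainable parameters to the optimizer
trainable = [p for p in toy_model.parameters() if p.requires_grad]
optimizer = optim.SGD(trainable, lr=0.1)

backbone_before = backbone[0].weight.clone()
head_before = head.weight.clone()

# One training step on random data
inputs = torch.randn(32, 8)
labels = torch.randint(0, 5, (32,))
loss = F.cross_entropy(toy_model(inputs), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

backbone_unchanged = torch.equal(backbone_before, backbone[0].weight)
head_changed = not torch.equal(head_before, head.weight)
print(f"Backbone unchanged: {backbone_unchanged}, head updated: {head_changed}")
```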

### Different Optimizers

PyTorch provides various optimization algorithms. Let's compare their behavior:

In [None]:
# Compare different optimizers on a simple optimization problem
def rosenbrock(x, y):
    """Rosenbrock function - a classic optimization test function"""
    return (1 - x)**2 + 100 * (y - x**2)**2

# Test different optimizers
optimizers_to_test = {
    'SGD': lambda params: optim.SGD(params, lr=0.001),
    'Adam': lambda params: optim.Adam(params, lr=0.01),
    'RMSprop': lambda params: optim.RMSprop(params, lr=0.01),
}

plt.figure(figsize=(15, 5))

for idx, (opt_name, opt_func) in enumerate(optimizers_to_test.items()):
    # Initialize parameters
    x = torch.tensor([-1.5], requires_grad=True)
    y = torch.tensor([2.0], requires_grad=True)
    
    # Create optimizer
    optimizer = opt_func([x, y])
    
    # Track optimization path
    loss_history = []
    
    # Optimization loop
    for i in range(1000):
        optimizer.zero_grad()
        loss = rosenbrock(x, y)
        loss.backward()
        optimizer.step()
        
        loss_history.append(loss.item())
    
    # Plot results
    plt.subplot(1, 3, idx + 1)
    plt.plot(loss_history)
    plt.title(f'{opt_name} Optimization')
    plt.xlabel('Iteration')
    plt.ylabel('Loss')
    plt.yscale('log')
    plt.grid(True)
    
    print(f"{opt_name:8} - Final loss: {loss_history[-1]:.6f}, x: {x.item():.3f}, y: {y.item():.3f}")

plt.tight_layout()
plt.show()

print("\nOptimal solution is at x=1, y=1 with loss=0")

### Model Saving and Loading

It's important to know how to save and load trained models:

In [None]:
# Model saving and loading
print("=== Model Saving and Loading ===")

# Create a simple model
original_model = SimpleNN(10, 64, 3)

# Method 1: Save only the state dictionary (recommended)
torch.save(original_model.state_dict(), 'model_state_dict.pth')
print("Model state dict saved as 'model_state_dict.pth'")

# Load the state dictionary
loaded_model = SimpleNN(10, 64, 3)  # Must create model with same architecture
loaded_model.load_state_dict(torch.load('model_state_dict.pth', weights_only=True))
loaded_model.eval()
print("Model state dict loaded successfully")

# Test that models are identical
test_input = torch.randn(1, 10)

original_model.eval()
with torch.no_grad():
    output1 = original_model(test_input)
    output2 = loaded_model(test_input)
    
print(f"\nOriginal model output: {output1}")
print(f"Loaded model output:   {output2}")
print(f"Models produce identical outputs: {torch.allclose(output1, output2)}")

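# Method 2: Save the complete model (less flexible: the file depends on the class definition)
torch.save(original_model, 'complete_model.pth')
# weights_only=False unpickles arbitrary objects, so only load files you trust
loaded_complete = torch.load('complete_model.pth', weights_only=False)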
print("\nComplete model saved and loaded")

# Save training checkpoint (including optimizer state)
optimizer = optim.Adam(original_model.parameters(), lr=0.001)
checkpoint = {
    'epoch': 100,
    'model_state_dict': original_model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': 0.05,
}
torch.save(checkpoint, 'checkpoint.pth')
print("Training checkpoint saved")

# Load checkpoint (it holds only tensors and plain Python values, so weights_only=True suffices)
checkpoint = torch.load('checkpoint.pth', weights_only=True)
model_from_checkpoint = SimpleNN(10, 64, 3)
optimizer_from_checkpoint = optim.Adam(model_from_checkpoint.parameters(), lr=0.001)

model_from_checkpoint.load_state_dict(checkpoint['model_state_dict'])
optimizer_from_checkpoint.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']

print(f"Checkpoint loaded: epoch {epoch}, loss {loss}")
print("Ready to resume training!")

## Summary

In this notebook, we've covered:

1. **PyTorch Fundamentals**
   - Creating and manipulating tensors
   - Automatic differentiation with autograd

2. **Neural Networks**
   - Building models with nn.Module
   - Forward pass and parameter counting

3. **Training**
   - Complete training loop with loss, backpropagation, and optimization
   - Model evaluation and metrics
   - Visualizing training progress

4. **Advanced Architectures**
   - Convolutional Neural Networks for images
   - Recurrent Neural Networks for sequences

5. **Advanced Topics**
   - Transfer learning with pre-trained models
   - Comparing different optimizers
   - Model saving and loading

### Next Steps

- **Experiment**: Try modifying the architectures and hyperparameters
- **Real Data**: Apply these techniques to real datasets (MNIST, CIFAR-10, etc.)
- **Advanced Topics**: Explore GANs, Transformers, and other advanced architectures
- **Production**: Learn about model deployment and optimization

### Key Mathematical Concepts Covered

- **Gradient Descent**: θ = θ - α∇J(θ)
- **Chain Rule**: Essential for backpropagation
- **Convolution**: Feature extraction in images
- **LSTM Gates**: Managing information flow in sequences
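
The gradient descent rule θ = θ - α∇J(θ) listed above is what `optimizer.step()` does for plain SGD. As a minimal sketch, here is the same update written by hand on a toy objective J(θ) = (θ - 3)², chosen only for illustration; the learning rate and step count are likewise arbitrary.

```python
import torch

theta = torch.tensor([0.0], requires_grad=True)
alpha = 0.1  # Learning rate

for step in range(100):
    J = ((theta - 3.0) ** 2).sum()  # Toy objective with its minimum at theta = 3
    J.backward()                    # Autograd computes dJ/dtheta via the chain rule

    with torch.no_grad():
        theta -= alpha * theta.grad  # theta = theta - alpha * grad J(theta)

    theta.grad.zero_()               # Clear the gradient for the next step

print(f"theta after 100 steps: {theta.item():.4f}")  # Close to 3
```

The `torch.no_grad()` block keeps the parameter update itself out of the computation graph, which is the same thing the built-in optimizers do internally.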

Happy deep learning with PyTorch! 🔥🧠