# PyTorch Optimizers Guide

This notebook covers two common PyTorch optimizers, SGD and Adam, and shows how to create each one for a small neural network.

Imports

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim

Define a Simple Model

We'll create a simple neural network to demonstrate optimizers.

In [2]:
# Define a simple neural network model
model = nn.Sequential(
    nn.Linear(10, 64),
    nn.ReLU(),
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Linear(32, 2)
)

print("Model:")
print(model)
print("\nTotal parameters:", sum(p.numel() for p in model.parameters()))

Model:
Sequential(
  (0): Linear(in_features=10, out_features=64, bias=True)
  (1): ReLU()
  (2): Linear(in_features=64, out_features=32, bias=True)
  (3): ReLU()
  (4): Linear(in_features=32, out_features=2, bias=True)
)

Total parameters: 2850
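
To see where the parameter total comes from, the sketch below rebuilds the same architecture, lists the size of each weight and bias tensor, and runs one forward pass. The batch of 4 random samples is an assumption made purely for illustration.

```python
import torch
import torch.nn as nn

# Same architecture as the model defined above
model = nn.Sequential(
    nn.Linear(10, 64),
    nn.ReLU(),
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Linear(32, 2),
)

# Each weight holds out_features * in_features values; each bias holds out_features
param_counts = {name: p.numel() for name, p in model.named_parameters()}
for name, count in param_counts.items():
    print(f"{name}: {count}")

# Forward pass on a hypothetical batch of 4 samples with 10 features each
batch = torch.randn(4, 10)
logits = model(batch)
print(logits.shape)
```

The three linear layers contribute 704, 2080, and 66 parameters respectively, and the model returns one row of 2 values per input sample.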


Stochastic Gradient Descent (SGD)

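SGD is a foundational optimization algorithm: each step moves the parameters against the gradient, scaled by the learning rate. With momentum enabled, the optimizer instead steps along a running accumulation of past gradients, which smooths out updates and can help training converge faster.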

In [3]:
# momentum=0.9 smooths out updates and can help training
optimizer_sgd = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
print("SGD Optimizer:")
print(optimizer_sgd)

SGD Optimizer:
SGD (
Parameter Group 0
    dampening: 0
    differentiable: False
    foreach: None
    fused: None
    lr: 0.01
    maximize: False
    momentum: 0.9
    nesterov: False
    weight_decay: 0
)
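
As a rough illustration of what the momentum setting does, the sketch below applies the momentum update by hand and compares it with optim.SGD. It assumes the settings printed above (dampening of 0, nesterov off, no weight decay); the single parameter tensor and the sum-of-squares loss are hypothetical and chosen only so that the gradient is easy to write down.

```python
import torch
import torch.optim as optim

lr, momentum = 0.01, 0.9

# A single hypothetical parameter tensor, updated two ways
w_ref = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
w_manual = w_ref.detach().clone()
velocity = torch.zeros_like(w_manual)

optimizer = optim.SGD([w_ref], lr=lr, momentum=momentum)

for step in range(3):
    # Toy loss: sum of squares, so the gradient is 2 * w
    optimizer.zero_grad()
    loss = (w_ref ** 2).sum()
    loss.backward()
    optimizer.step()

    # Hand-written update, assuming dampening=0 and nesterov=False
    grad = 2 * w_manual
    velocity = momentum * velocity + grad
    w_manual = w_manual - lr * velocity

print(w_ref.detach())
print(w_manual)
```

Both tensors should agree up to floating-point rounding. The velocity term carries 90% of the previous step's direction forward, which is the smoothing effect described above.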


Adam Optimizer

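Adam combines per-parameter adaptive learning rates with momentum-style running averages of the gradient. It typically requires less learning-rate tuning than SGD. Note that lr=0.01 here is ten times Adam's default of 0.001.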

In [4]:
# Adam with the default betas and eps; only the learning rate is set explicitly
optimizer_adam = optim.Adam(model.parameters(), lr=0.01)
print("Adam Optimizer:")
print(optimizer_adam)

Adam Optimizer:
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 0.01
    maximize: False
    weight_decay: 0
)
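
Creating an optimizer is only half of the picture, so here is a minimal sketch of how one might be used in a training loop. The synthetic data, the labeling rule, the choice of nn.CrossEntropyLoss, and the 50-step count are all assumptions for illustration and are not part of the notebook above.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Same architecture as the model defined earlier
model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 2),
)
optimizer = optim.Adam(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()  # assumed loss for the 2-class output

# Synthetic data (hypothetical): the label is 1 when the first feature is positive
inputs = torch.randn(128, 10)
targets = (inputs[:, 0] > 0).long()

losses = []
for step in range(50):
    optimizer.zero_grad()                     # clear gradients from the previous step
    loss = criterion(model(inputs), targets)  # forward pass and loss
    loss.backward()                           # compute gradients
    optimizer.step()                          # update parameters
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

Swapping in optim.SGD(model.parameters(), lr=0.01, momentum=0.9) leaves the loop body unchanged, since both optimizers share the zero_grad() and step() interface.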
