# Understanding Optimizers in Machine Learning

In this notebook, we look at what optimizers are, focusing on two popular ones: **SGD (Stochastic Gradient Descent)** and **Adam**. We'll explore how they work and walk through a simple code example that compares them.

## What are Optimizers?

**Definition:** Algorithms that adjust a model's weights to minimize the loss.

**Analogy:** Like a smart GPS that finds the fastest route to your destination! 🗺️

- They use feedback from the loss function
- Gradually improve predictions
- Different strategies for different problems
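
To make the feedback loop concrete, here is a minimal pure-Python sketch with a single weight `w`. The toy loss `(w - 3)^2`, the learning rate, and the function names are illustrative assumptions, not part of any library.

```python
def loss(w):
    """Toy loss with its minimum at w = 3 (illustrative assumption)."""
    return (w - 3.0) ** 2


def grad(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)


def minimize(w=0.0, lr=0.1, steps=50):
    """Repeatedly nudge w against the gradient and record the loss."""
    history = []
    for _ in range(steps):
        history.append(loss(w))  # loss before this update
        w -= lr * grad(w)        # feedback from the loss decides the update
    return w, history


w_final, history = minimize()
print(f"final w = {w_final:.4f}, first loss = {history[0]:.2f}, last loss = {history[-1]:.6f}")
```

Each pass shrinks the distance to the minimum, so the recorded loss falls step by step. Real optimizers apply the same idea to many weights at once.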

## 🚶 Stochastic Gradient Descent (SGD)

**The Classic Approach:**

- Takes small steps in the direction that lowers the loss, using the gradient from a random mini-batch of data
- Simple and reliable
- Like walking downhill to find the bottom

**Pros:** Simple, cheap per step, and well understood
**Cons:** Can be slow to converge, is sensitive to the learning rate, and may stall in flat regions
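
As a rough illustration of the bullets above, the following from-scratch sketch fits a line with mini-batch SGD. The function name `sgd_fit`, the toy data, and the hyperparameter values are assumptions chosen for the example. This is not how `torch.optim.SGD` is implemented internally.

```python
import random


def sgd_fit(data, lr=0.05, epochs=200, batch_size=4, seed=0):
    """Fit y = w * x + b with mini-batch SGD on mean squared error."""
    rng = random.Random(seed)
    data = list(data)  # copy so shuffling does not touch the caller's list
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # "stochastic": visit samples in random order
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of the mean squared error over this mini-batch only
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            # Same learning rate for every parameter, every step
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Noise-free toy data from y = 2x + 1 (illustrative assumption)
points = [(x / 10, 2 * (x / 10) + 1) for x in range(-10, 11)]
w, b = sgd_fit(points)
print(f"w = {w:.3f}, b = {b:.3f}")
```

The learning rate `lr` is the only knob on the step size here, which is why SGD is simple but sensitive to how that one value is chosen.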

## 🚀 Adam Optimizer

**The Smart Choice:**

- Adapts the step size for each parameter automatically
- Keeps running averages of past gradients and their squares
- Like a smart car with adaptive cruise control

**Pros:** Adaptive, often converges quickly with little tuning
**Cons:** More complex, stores two extra values per parameter
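
The update rule behind these bullets can be sketched for a single scalar parameter as follows. The helper name `adam_step` is hypothetical. The default values mirror the commonly used Adam defaults (`lr=0.001`, `beta1=0.9`, `beta2=0.999`, `eps=1e-8`). Treat it as a teaching sketch, not a replacement for `torch.optim.Adam`.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter.

    m and v are running averages of the gradient and the squared gradient;
    t is the 1-based step count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad          # "remembers" past gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # tracks gradient magnitude
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)  # adaptive step size
    return param, m, v


# Two first steps whose gradients differ by a factor of 1000
for grad in (0.01, 10.0):
    new_param, m, v = adam_step(param=1.0, grad=grad, m=0.0, v=0.0, t=1)
    print(f"grad = {grad:>5}: step = {1.0 - new_param:.6f}")
```

On the first step, both updates come out at roughly `lr` even though the gradients differ by three orders of magnitude, because Adam divides by the gradient's running magnitude. The two state values `m` and `v` are the extra memory Adam needs per parameter.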

## Real-World Example

**Training a Chatbot:**

- 🎯 Goal: Generate human-like responses
- 📊 Loss: Measures how far the generated responses are from the target responses
- ⚡ SGD: Slow but steady improvement
- 🚀 Adam: Faster adaptation to patterns

*Adam often gets good results in fewer steps, though the outcome depends on the task and tuning! ⏰*

## Let's Compare Optimizers in Code!

Time to see SGD vs Adam in action!
*We'll train a simple model with each optimizer and watch the difference.*

In [None]:
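import copy
import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducible weights and data

# Toy regression data (an assumption for this demo): 64 samples, 10 features
X = torch.randn(64, 10)
y = X.sum(dim=1, keepdim=True)

# One model per optimizer, so the two runs do not update the same weights
sgd_model = nn.Linear(10, 1)  # Simple model
adam_model = copy.deepcopy(sgd_model)  # identical starting weights

# SGD Optimizer
sgd_optimizer = torch.optim.SGD(sgd_model.parameters(), lr=0.01)

# Adam Optimizer
adam_optimizer = torch.optim.Adam(adam_model.parameters(), lr=0.01)

# Training loop (full batch each epoch, for simplicity)
loss_fn = nn.MSELoss()
runs = [("SGD", sgd_model, sgd_optimizer), ("Adam", adam_model, adam_optimizer)]
for epoch in range(100):
    for name, model, optimizer in runs:
        optimizer.zero_grad()        # clear gradients from the previous step
        loss = loss_fn(model(X), y)  # forward pass, calculate loss
        loss.backward()              # compute gradients
        optimizer.step()             # SGD: lr * gradient; Adam: step rescaled per parameter
        if (epoch + 1) % 25 == 0:
            print(f"epoch {epoch + 1:3d}  {name:4s}  loss = {loss.item():.4f}")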


[🚀 Try in Colab](https://colab.research.google.com/github/Roopesht/codeexamples/blob/main/genai/python_easy/2/optimizers_comparison.ipynb)

## Optimizers Made Simple

- **SGD:** Like walking downhill with consistent steps 🚶
- **Adam:** Like having a smart guide that adjusts pace 🧭

- Both try to minimize loss
- Adam often converges faster and needs less tuning
- SGD is simpler and more predictable

## Visualizing Optimizer Comparison

Watch SGD vs Adam navigate to the optimal solution!
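
One way to trace the two paths without a plotting library is to record each optimizer's position on a small 2D loss surface. The sketch below assumes a toy loss `0.5 * (x**2 + 50 * y**2)`, a start point of `(-4, 1)`, and hand-picked learning rates. The "SGD" path here is plain gradient descent with no mini-batch noise. What it prints depends on all of those choices, so read it as an illustration and not as a benchmark.

```python
import math


def grad(x, y):
    """Gradient of the toy loss 0.5 * (x**2 + 50 * y**2), minimum at (0, 0)."""
    return x, 50.0 * y


def sgd_path(start, lr=0.03, steps=100):
    """Plain gradient descent: one shared learning rate for both coordinates."""
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
        path.append((x, y))
    return path


def adam_path(start, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=100):
    """Adam: each coordinate keeps its own running averages m and v."""
    pos = list(start)
    m, v = [0.0, 0.0], [0.0, 0.0]
    path = [tuple(pos)]
    for t in range(1, steps + 1):
        g = grad(*pos)
        for i in range(2):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)  # bias-corrected averages
            v_hat = v[i] / (1 - beta2 ** t)
            pos[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
        path.append(tuple(pos))
    return path


start = (-4.0, 1.0)
for name, path in (("SGD", sgd_path(start)), ("Adam", adam_path(start))):
    for step in (0, 10, 50, 100):
        x, y = path[step]
        print(f"{name:4s} step {step:3d}: x = {x:8.4f}, y = {y:8.4f}")
```

The returned `path` lists can be handed to any plotting tool to draw the two trajectories over the loss surface.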

## Different Angle: Learning to Drive

- **SGD Approach:** Consistent practice, same routine daily
- **Adam Approach:** Adapts to traffic, weather, and road conditions

- 🚗 SGD: Steady progress, might be slower
- 🏎️ Adam: Smart adjustments, usually faster learning

*Both get you there, but Adam is often the smarter choice! 🎯*

## Time to Think!

**Optimizers are like navigation systems for AI learning.**

**Question:** When learning a new skill like playing guitar, would you prefer consistent daily practice (SGD) or adapting your practice based on your progress (Adam)? 🎸