# Optimizers and Their Types

### 1️⃣ WHAT is an Optimizer?

**An optimizer is an algorithm that updates the model’s parameters (weights & biases) so that the model’s error (loss) is minimized.**
Think of training as a journey down a hill:
- Your height on the hill = the loss (error)
- Your position on the hill = the model’s weights
- The optimizer = the strategy you use to find the bottom (minimum error)
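
To make the hill picture concrete, here is a minimal sketch of the simplest strategy, plain gradient descent, on a one-weight toy problem. The loss `(w - 3)^2`, the function names, and the hyperparameter values are assumptions chosen for illustration only.

```python
# Toy loss with its minimum at w = 3 (assumed for illustration).
def loss(w):
    return (w - 3.0) ** 2

# Slope of the hill at position w.
def grad(w):
    return 2.0 * (w - 3.0)

def gradient_descent(w0, learning_rate=0.1, steps=100):
    w = w0
    for _ in range(steps):
        # Step downhill: move against the gradient.
        w = w - learning_rate * grad(w)
    return w

w_final = gradient_descent(w0=0.0)
print(w_final, loss(w_final))
```

Each pass through the loop is one step down the hill; the optimizers described later change how that step is computed.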

### 2️⃣ WHY do we need Optimizers?

**Because deep learning models have millions of weights, and we can’t manually adjust them.**

We need a smart algorithm that can:

- Learn which direction to move (increase or decrease weights)

- Decide how big each step should be (learning rate)

- Avoid overshooting the minimum

- Handle complex “mountain landscapes” (local minima, saddle points)
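
The step-size point can be illustrated with a small experiment. This is a sketch only: the loss `w^2`, the helper name `run_gd`, and the three learning rates are hypothetical choices, picked so that one run is too cautious, one works well, and one overshoots.

```python
# Toy loss w^2 has gradient 2w and its minimum at w = 0 (assumed for illustration).
def grad(w):
    return 2.0 * w

def run_gd(learning_rate, steps=20, w0=1.0):
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

too_small = run_gd(0.01)   # creeps toward 0, still far away after 20 steps
reasonable = run_gd(0.4)   # lands very close to 0
too_large = run_gd(1.1)    # jumps over the minimum each step and moves away

print(too_small, reasonable, too_large)
```

With the largest learning rate every step lands farther from the minimum than the one before, which is the overshooting problem in its simplest form.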

### Momentum Optimizer

**What: Adds a fraction of the previous update to the current one, which builds up speed in directions where the gradient stays consistent.**

Why: Dampens oscillations and speeds up convergence.

Analogy:
Like pushing a heavy ball downhill: it may start slow, but momentum keeps it rolling past small bumps.
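
A minimal sketch of the momentum update, assuming the classical form in which a `velocity` term stores the previous update. The two-weight loss (one shallow direction, one steep direction), the function names, and the hyperparameters are assumptions for illustration; setting `beta=0.0` recovers plain gradient descent for comparison.

```python
# Toy loss 0.5 * (x^2 + 25 * y^2): shallow along x, steep along y (assumed).
def loss(w):
    x, y = w
    return 0.5 * (x * x + 25.0 * y * y)

def grad(w):
    x, y = w
    return (x, 25.0 * y)

def momentum_descent(w0, learning_rate=0.03, beta=0.9, steps=200):
    w = list(w0)
    velocity = [0.0, 0.0]
    for _ in range(steps):
        g = grad(w)
        for i in range(2):
            # Keep a fraction (beta) of the previous update, then add the new step.
            velocity[i] = beta * velocity[i] - learning_rate * g[i]
            w[i] += velocity[i]
    return tuple(w)

plain = momentum_descent((1.0, 1.0), beta=0.0)          # no momentum
with_momentum = momentum_descent((1.0, 1.0), beta=0.9)  # with momentum

print(loss(plain), loss(with_momentum))
```

On this toy problem the run with momentum ends at a lower loss after the same number of steps, because the accumulated velocity speeds up progress along the shallow direction.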


### RMSProp (Root Mean Square Propagation)

**What: Adapts the learning rate for each weight based on the size of its recent gradients: weights with consistently large gradients take smaller steps, and weights with small gradients take larger ones.**
**Why: Keeps training stable when gradient magnitudes differ widely between weights or change over time, as often happens in RNNs.**
**How: Keeps an exponential moving average of squared gradients and divides each update by its square root.**

Analogy:
Imagine you’re walking downhill: if the ground is steep in one direction, you take smaller steps there; if it is gentle in another, you take longer steps.
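
The moving-average idea can be sketched as a single update step. This is an illustrative version under assumed names and defaults (`rmsprop_step`, `decay=0.9`, `eps=1e-8`), not the code of any particular library.

```python
import math

def rmsprop_step(weights, grads, avg_sq, learning_rate=0.01, decay=0.9, eps=1e-8):
    new_weights, new_avg_sq = [], []
    for w, g, s in zip(weights, grads, avg_sq):
        # Moving average of squared gradients, tracked per weight.
        s = decay * s + (1.0 - decay) * g * g
        # Divide the step by the root of that average (eps avoids division by zero).
        w = w - learning_rate * g / (math.sqrt(s) + eps)
        new_weights.append(w)
        new_avg_sq.append(s)
    return new_weights, new_avg_sq

# Two weights whose gradients differ by a factor of 10,000 (made-up values).
weights = [0.0, 0.0]
grads = [0.01, 100.0]
new_weights, avg_sq = rmsprop_step(weights, grads, [0.0, 0.0])

step_sizes = [abs(new - old) for new, old in zip(new_weights, weights)]
print(step_sizes)
```

Despite the very different gradient magnitudes, both weights move by almost the same amount, which is the per-weight scaling described above.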

Example:
Historically popular for training RNNs and other deep sequence models.

### Adam (Adaptive Moment Estimation)

**What: Combines Momentum + RMSProp**
It keeps track of:

- The average of gradients (like momentum)

- The average of squared gradients (like RMSProp)

**Why: One of the most widely used optimizers. It adapts the learning rate individually for each parameter and usually works well with little tuning.**

Analogy:
You’re a smart hiker — you remember the last few steps (momentum) and how rough the path was (RMSProp) to choose the best next move.
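
The two running averages can be combined in a short sketch of one Adam step. The names and defaults here are assumptions for illustration. The bias-correction lines come from the standard formulation of Adam rather than from the description above; they compensate for both averages starting at zero.

```python
import math

def adam_step(w, g, m, v, t, learning_rate=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Average of gradients (the momentum part).
    m = beta1 * m + (1.0 - beta1) * g
    # Average of squared gradients (the RMSProp part).
    v = beta2 * v + (1.0 - beta2) * g * g
    # Bias correction: t is the step count, starting at 1.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    w = w - learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Toy loss (w - 3)^2 with its minimum at w = 3 (assumed for illustration).
def grad(w):
    return 2.0 * (w - 3.0)

w, m, v = 0.0, 0.0, 0.0
for t in range(1, 201):
    w, m, v = adam_step(w, grad(w), m, v, t, learning_rate=0.1)

print(w)
```

On the very first step the two corrected averages cancel the gradient's magnitude, so the weight moves by roughly the learning rate regardless of how large the gradient is.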

Example:
Adam and its variants are used to train many modern deep learning models, including CNNs, RNNs, GANs, and Transformers such as BERT and GPT.

When: A common default choice for most deep learning tasks.