# 🔹 Optimizers in Deep Learning – Overview & Types

## 1. Introduction

Optimizers are algorithms used to **update the weights** of a neural network to **minimize the cost function**.  
They determine **how** the model learns during training by adjusting weights based on gradients.

---

## 2. Why Are Optimizers Important?

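- ✅ Control the **speed** of convergence.  
- ✅ Help training move past **saddle points and poor local minima** (for example, through momentum or gradient noise), although no optimizer guarantees this.  
- ✅ Partly mitigate **vanishing/exploding gradients**, since adaptive methods rescale each parameter's update.  
- ✅ Influence **generalization** and final **model performance**.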

---

## 3. How Do Optimizers Work?

- Optimizers use the **gradients** computed by **backpropagation** to update each weight:
  $$
  w := w - \eta \frac{\partial J}{\partial w}
  $$
- Where:
  - \( w \) → weight
  - \( \eta \) → learning rate
  - \( \frac{\partial J}{\partial w} \) → gradient of the cost function \( J \) with respect to \( w \)
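
The update rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the toy cost \( J(w) = (w - 3)^2 \), the function names `grad_J` and `gradient_descent`, and the values `eta=0.1` and `steps=100` are all assumptions chosen for the example.

```python
def grad_J(w):
    """Gradient of the assumed toy cost J(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)


def gradient_descent(w0, eta=0.1, steps=100):
    """Repeatedly apply w := w - eta * dJ/dw, starting from w0."""
    w = w0
    for _ in range(steps):
        w = w - eta * grad_J(w)
    return w


w_final = gradient_descent(w0=0.0)
print(w_final)  # approaches the minimizer w = 3
```

With this cost, each step shrinks the distance to the minimum by a constant factor of \( 1 - 2\eta \), so a learning rate that is too large (here, above 1.0) would make the iterates diverge instead.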

---

## 4. Types of Optimizers (Names Only)

### ✅ **A. Basic First-Order Optimizers (Gradient Descent by Batch Size)**
1. **Gradient Descent (GD)**
2. **Stochastic Gradient Descent (SGD)**
3. **Mini-Batch Gradient Descent**
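
The three methods above differ mainly in how many samples are used per weight update. The sketch below is a hedged illustration of that idea: the helper `minibatch_gd`, the one-parameter model \( y \approx w x \), the tiny noiseless dataset, and the hyperparameters are all assumptions made for the example.

```python
import random


def minibatch_gd(xs, ys, batch_size, eta=0.05, epochs=200, seed=0):
    """Fit y ~ w * x by minimizing mean squared error.

    batch_size == len(xs) -> full-batch gradient descent
    batch_size == 1       -> stochastic gradient descent
    anything in between   -> mini-batch gradient descent
    """
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of mean((w * x - y)**2) over the current batch
            grad = sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= eta * grad
    return w


xs = [0.5, 1.0, 1.5, 2.0]
ys = [2.0 * x for x in xs]  # noiseless data, so the true weight is 2.0

w_gd = minibatch_gd(xs, ys, batch_size=len(xs))
w_sgd = minibatch_gd(xs, ys, batch_size=1)
w_mb = minibatch_gd(xs, ys, batch_size=2)
print(w_gd, w_sgd, w_mb)
```

On this noiseless toy problem all three settings reach the same weight. On real data, smaller batches give noisier but cheaper updates, which is the usual trade-off between the three.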

---

### ✅ **B. Momentum-Based Gradient Descent Variants**
4. **SGD with Momentum**
5. **Nesterov Accelerated Gradient (NAG)**
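
As a rough sketch of how these two variants differ, the code below uses one common formulation: a velocity term accumulates past gradients, and the Nesterov version evaluates the gradient at a look-ahead point. The toy cost \( J(w) = (w - 3)^2 \), the function name `sgd_momentum`, and the values `eta=0.1` and `mu=0.9` are assumptions for illustration; libraries may use slightly different but related update forms.

```python
def grad_J(w):
    """Gradient of the assumed toy cost J(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)


def sgd_momentum(w0, eta=0.1, mu=0.9, steps=500, nesterov=False):
    """Gradient descent with classical or Nesterov momentum."""
    w, v = w0, 0.0
    for _ in range(steps):
        # NAG evaluates the gradient where the velocity is about to carry w
        lookahead = w + mu * v if nesterov else w
        v = mu * v - eta * grad_J(lookahead)  # decay old velocity, add new step
        w = w + v
    return w


w_momentum = sgd_momentum(w0=0.0)
w_nag = sgd_momentum(w0=0.0, nesterov=True)
print(w_momentum, w_nag)
```

Setting `mu=0.0` recovers plain gradient descent, which makes it easy to compare the three behaviours on the same problem.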

---

### ✅ **C. Adaptive Learning Rate Optimizers**
6. **AdaGrad** (Adaptive Gradient)
7. **RMSProp** (Root Mean Square Propagation)
8. **Adam** (Adaptive Moment Estimation)
9. **AdaMax** (Adam variant based on the infinity norm)
10. **Nadam** (Adam with Nesterov momentum)
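
To make the idea of an adaptive learning rate concrete, here is a minimal single-parameter sketch of the Adam update with bias correction. The toy cost \( J(w) = (w - 3)^2 \), the function name `adam`, and the choice `eta=0.1` are assumptions for this example; `beta1=0.9`, `beta2=0.999`, and `eps=1e-8` are the commonly quoted defaults.

```python
import math


def grad_J(w):
    """Gradient of the assumed toy cost J(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)


def adam(w0, eta=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Adam on a single scalar weight."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_J(w)
        m = beta1 * m + (1.0 - beta1) * g      # running mean of gradients
        v = beta2 * v + (1.0 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1.0 - beta1 ** t)         # correct the bias from m starting at 0
        v_hat = v / (1.0 - beta2 ** t)         # correct the bias from v starting at 0
        w = w - eta * m_hat / (math.sqrt(v_hat) + eps)
    return w


w_adam = adam(w0=0.0)
print(w_adam)
```

Because the update divides by \( \sqrt{\hat{v}} \), the very first step has a magnitude of roughly `eta` regardless of how large the gradient is. This per-parameter rescaling is what the other optimizers in this group also rely on, each with a different way of accumulating squared gradients.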

---

### ✅ **D. Advanced/Other Optimizers**
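11. **AMSGrad** (Adam variant that keeps the running maximum of the second-moment estimate)
12. **Lion** (EvoLved Sign Momentum; a sign-based, memory-efficient optimizer proposed in 2023 and evaluated largely on transformer models)
13. **L-BFGS** (Limited-memory BFGS; a quasi-Newton method, mostly used for smaller networks or full-batch training)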

---

## ✅ Conclusion
- Optimizers are crucial for effective training.
- **Adam** is one of the most widely used default choices in deep learning.
- Different optimizers are chosen based on **task, dataset size, and network architecture**.
