<a href="https://colab.research.google.com/github/ShubhamP1028/DeepLearning/blob/main/LearningRates.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Learning Rates

Learning rate scheduling can help the model converge faster and settle into better minima. Instead of keeping the learning rate (LR) constant, we reduce it gradually or vary it cyclically during training.

<b><u>What is the Learning Rate?</u></b>
*  The learning rate is like the “step size” your model takes when learning.
*  If it’s too big, the model jumps around and misses the best spot.
*  If it’s too small, the model takes forever to reach the best spot.
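
The three cases above can be seen in a tiny experiment. This is a minimal sketch, assuming a toy loss f(x) = x² (gradient 2x) rather than a real model; the function name and the specific learning rates are illustrative choices, not part of the original notebook.

```python
def gradient_descent(lr, steps=20, x0=5.0):
    """Run plain gradient descent on f(x) = x**2, whose minimum is at x = 0."""
    x = x0
    for _ in range(steps):
        grad = 2 * x          # derivative of x**2
        x = x - lr * grad     # one step of size lr along the negative gradient
    return x

too_big = gradient_descent(lr=1.1)       # overshoots further on every step
too_small = gradient_descent(lr=0.001)   # barely moves away from x0
reasonable = gradient_descent(lr=0.1)    # ends close to the minimum

print(f"lr=1.1   -> x = {too_big:.3f}")
print(f"lr=0.001 -> x = {too_small:.3f}")
print(f"lr=0.1   -> x = {reasonable:.3f}")
```

With the large step the value of x grows in magnitude instead of shrinking, with the tiny step it is still near the starting point after 20 steps, and with the moderate step it lands near 0.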

---
<b><u>Why change the learning rate?</u></b>

Think of climbing down a mountain:
*  At the start, you want big steps to quickly cover distance.
*  Near the bottom (the optimal point), you want small careful steps so you don’t overshoot.

Learning rate schedules do exactly this:
*  High LR at start → fast learning.
*  Low LR later → fine-tuning.

---
### <u>Different Types of Learning Rate Schedules</u>
<b>1. Step Decay</b>
*  Learning rate drops suddenly after a fixed number of epochs.
*  Example: Start at 0.1 → after 10 epochs, reduce to 0.01 → after 20 epochs, reduce to 0.001.
*  🔎 Simple, but the sudden drops make the schedule a bit “jumpy”.
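
The example above (0.1 → 0.01 → 0.001, dropping every 10 epochs) can be sketched as a small function. The names `drop_factor` and `epochs_per_drop` are hypothetical, chosen only for this illustration.

```python
def step_decay(epoch, initial_lr=0.1, drop_factor=0.1, epochs_per_drop=10):
    """Multiply the LR by drop_factor once every epochs_per_drop epochs."""
    num_drops = epoch // epochs_per_drop   # how many drops have happened so far
    return initial_lr * drop_factor ** num_drops

# Print the LR at a few epochs on either side of each drop.
for epoch in (0, 9, 10, 19, 20, 29):
    print(f"epoch {epoch:2d}: lr = {step_decay(epoch):.4f}")
```

The LR stays flat within each block of 10 epochs and changes only at the block boundaries.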

<b>2. Exponential Decay</b>
*  Learning rate decreases smoothly over time, not suddenly.
*  Formula:
$$
\mathrm{LR}(t) = \mathrm{LR}_0 \times e^{-\mathrm{decay\_rate} \times t}
$$
*  Example: with LR₀ = 0.1 and decay_rate = 0.1 (t counted in epochs), the LR shrinks gradually: about 0.090 after 1 epoch and about 0.037 after 10 epochs.
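
The formula above translates directly into code. A minimal sketch, assuming t is the epoch index and using the example values LR₀ = 0.1 and decay_rate = 0.1 as defaults:

```python
import math

def exponential_decay(epoch, initial_lr=0.1, decay_rate=0.1):
    """LR(t) = LR_0 * exp(-decay_rate * t), with t measured in epochs."""
    return initial_lr * math.exp(-decay_rate * epoch)

# Print the LR at a few epochs to see the smooth decline.
for epoch in (0, 1, 5, 10, 20):
    print(f"epoch {epoch:2d}: lr = {exponential_decay(epoch):.4f}")
```

Unlike step decay, the value changes a little at every epoch and never drops abruptly.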

<b>3. Polynomial Decay</b>
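*  Decreases the LR along a polynomial curve, commonly LR₀ × (1 − t/T)^p, where T is the total number of epochs and p is the power.
*  More flexible: p controls how quickly the LR falls (p = 1 gives a straight-line decay).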
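
The text gives no exact formula for polynomial decay, so the sketch below assumes one widely used form: the LR falls from `initial_lr` to `end_lr` as (1 − t/T)^power over `total_epochs` epochs. All parameter names and default values are illustrative.

```python
def polynomial_decay(epoch, total_epochs=50, initial_lr=0.1, end_lr=0.0, power=2.0):
    """Assumed form: (initial_lr - end_lr) * (1 - t/T)**power + end_lr."""
    epoch = min(epoch, total_epochs)             # hold at end_lr after the last epoch
    fraction_left = 1 - epoch / total_epochs     # goes from 1 down to 0
    return (initial_lr - end_lr) * fraction_left ** power + end_lr

# Compare a linear schedule (power=1) with a quadratic one (power=2).
for epoch in (0, 10, 25, 40, 50):
    linear = polynomial_decay(epoch, power=1.0)
    quadratic = polynomial_decay(epoch, power=2.0)
    print(f"epoch {epoch:2d}: power=1 -> {linear:.4f}, power=2 -> {quadratic:.4f}")
```

A larger `power` makes the LR fall faster early on and flatten out near the end, which is the sense in which this schedule is adjustable.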

<b>4. Cosine Annealing</b>
*  The learning rate goes down along half of a cosine wave.
*  Starts high → drops slowly at first, faster in the middle, then levels off near zero.
*  Looks smooth and natural.
*  Sometimes the LR is reset to its starting value (warm restarts) to encourage the model to explore new paths.
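
A minimal sketch of the cosine curve and of warm restarts, assuming the commonly used form min_lr + ½(max_lr − min_lr)(1 + cos(πt/T)). The function names, the 50-epoch horizon, and the 10-epoch restart cycle are hypothetical values picked for the example.

```python
import math

def cosine_annealing(epoch, total_epochs=50, max_lr=0.1, min_lr=0.0):
    """Follow half a cosine wave from max_lr (epoch 0) down to min_lr (last epoch)."""
    progress = min(epoch, total_epochs) / total_epochs   # 0.0 -> 1.0
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

def cosine_with_restarts(epoch, cycle_length=10, max_lr=0.1, min_lr=0.0):
    """Restart the cosine curve from max_lr every cycle_length epochs."""
    return cosine_annealing(epoch % cycle_length, cycle_length, max_lr, min_lr)

# Show the plain schedule next to the restarting one.
for epoch in (0, 5, 9, 10, 15, 25, 50):
    plain = cosine_annealing(epoch)
    restart = cosine_with_restarts(epoch)
    print(f"epoch {epoch:2d}: plain = {plain:.4f}, with restarts = {restart:.4f}")
```

In the restart variant the LR jumps back to `max_lr` at epochs 10, 20, 30 and so on, then decays again within each cycle.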

<b>5. Cyclical Learning Rates (CLR)</b>
*  Instead of only going down, LR goes up and down in cycles.
*  Idea: periodically raising the LR can help the model escape poor local minima and saddle points and find better solutions.
*  Example: oscillates between 0.001 and 0.01.
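
One simple way to get this up-and-down behaviour is a triangular wave. The sketch below assumes the triangular policy, with the bounds 0.001 and 0.01 from the example; `step_size` (the number of iterations in half a cycle) is an illustrative value.

```python
import math

def triangular_clr(iteration, base_lr=0.001, max_lr=0.01, step_size=100):
    """Rise linearly from base_lr to max_lr over step_size iterations, then fall back."""
    cycle = math.floor(1 + iteration / (2 * step_size))   # 1-based index of the current cycle
    x = abs(iteration / step_size - 2 * cycle + 1)        # 1 at cycle edges, 0 at the peak
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)

# One full cycle is 2 * step_size = 200 iterations.
for iteration in (0, 50, 100, 150, 200, 300):
    print(f"iteration {iteration:3d}: lr = {triangular_clr(iteration):.4f}")
```

Here the schedule is indexed by training iteration (batch) rather than by epoch, which is a common choice for cyclical schedules but still an assumption of this sketch.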

<b>6. Warm-up</b>
*  Start with a very small LR (so training doesn’t diverge at the beginning, when the weights are still random).
*  Gradually increase it to a bigger value.
*  Often used with Transformers and very large networks.
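
A minimal sketch of warm-up, assuming a linear ramp. The text does not specify the shape of the increase, so the linear form, `warmup_steps`, and `target_lr` are illustrative assumptions.

```python
def linear_warmup(step, warmup_steps=1000, target_lr=1e-3, start_lr=0.0):
    """Increase the LR linearly from start_lr to target_lr, then hold it there."""
    if step >= warmup_steps:
        return target_lr
    return start_lr + (target_lr - start_lr) * step / warmup_steps

# The LR climbs during the first 1000 steps and is constant afterwards.
for step in (0, 250, 500, 1000, 5000):
    print(f"step {step:4d}: lr = {linear_warmup(step):.6f}")
```

In practice warm-up is usually followed by one of the decay schedules above rather than a constant LR; the constant tail here just keeps the example short.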

---
<b>Summary in Simple Words</b>

*  Learning rate schedule = changing the learning speed over time.
*  High at start → learn fast.
*  Low at end → learn carefully.

The different strategies (step, exponential, polynomial, cosine, cyclical, warm-up) are just different ways of controlling how the “speed” changes.

---
We’ll implement common learning rate schedulers in PyTorch on your dataset:

*  StepLR → decreases LR every fixed number of epochs.
*  ExponentialLR → multiplies LR by a decay factor at each step.
*  ReduceLROnPlateau → reduces LR when validation loss stops improving.
*  CosineAnnealingLR → smooth cosine decay, popular in deep learning.
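
Before wiring these into a real training loop, it can help to see how each scheduler changes the LR on its own. The sketch below assumes PyTorch is installed and uses a placeholder `nn.Linear` model with no data; the helper names (`make_optimizer`, `lr_trace`), the hyperparameters, and the validation losses are all made up for illustration.

```python
from torch import nn, optim
from torch.optim import lr_scheduler

def make_optimizer(lr=0.1):
    """Build SGD over a tiny placeholder model (stands in for the real network)."""
    model = nn.Linear(4, 1)
    return optim.SGD(model.parameters(), lr=lr)

def lr_trace(make_scheduler, epochs=30):
    """Record the LR used in each epoch under the given scheduler."""
    optimizer = make_optimizer()
    scheduler = make_scheduler(optimizer)
    history = []
    for _ in range(epochs):
        history.append(optimizer.param_groups[0]["lr"])
        optimizer.step()     # in real training this follows loss.backward()
        scheduler.step()     # advance the schedule once per epoch
    return history

step_lrs = lr_trace(lambda opt: lr_scheduler.StepLR(opt, step_size=10, gamma=0.1))
exp_lrs = lr_trace(lambda opt: lr_scheduler.ExponentialLR(opt, gamma=0.9))
cos_lrs = lr_trace(lambda opt: lr_scheduler.CosineAnnealingLR(opt, T_max=30))

# ReduceLROnPlateau is driven by a metric, so it gets made-up validation losses.
optimizer = make_optimizer()
plateau = lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=2)
for val_loss in [1.0, 0.8, 0.8, 0.8, 0.8]:   # loss stops improving after the 2nd value
    plateau.step(val_loss)
plateau_lr = optimizer.param_groups[0]["lr"]

print("StepLR at epochs 0/10/20:", step_lrs[0], step_lrs[10], step_lrs[20])
print("ExponentialLR at epochs 0/1:", exp_lrs[0], exp_lrs[1])
print("CosineAnnealingLR at epochs 0/15:", cos_lrs[0], cos_lrs[15])
print("ReduceLROnPlateau after the plateau:", plateau_lr)
```

The first three schedulers are stepped without arguments once per epoch, while `ReduceLROnPlateau.step` needs the monitored value (here a validation loss) passed in.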


