# RMSProp

While AdaGrad adapts the learning rate based on the gradient accumulation for each parameter, it tends to reduce the learning rate too drastically over time, causing the model to stop learning effectively. RMSProp (Root Mean Square Propagation) is a modification of AdaGrad that tries to resolve this issue by changing the way the learning rate is adapted.

RMSProp **keeps track** of an exponentially decaying average of squared gradients. Instead of accumulating all past squared gradients as AdaGrad does, RMSProp uses a moving average. This gives more weight to recent gradients, so the effective learning rate can keep adapting without shrinking toward zero as training goes on.

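You will be surprised to see how simple the RMSProp algorithm is. It is just a small modification to the AdaGrad algorithm: the running sum of squared gradients is replaced with an exponentially weighted moving average, the same kind of averaging that Momentum applies to the gradients themselves: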

$$
v_t = \beta v_{t - 1} + (1 - \beta) \nabla_{\theta} J(\theta_{t - 1})^2
$$

Where:
- $v_t$ is the moving average of squared gradients at time step $t$ (the square is taken element-wise, one value per parameter)
- $\beta$ is the decay rate, typically set to 0.9
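To see why this average favors recent gradients, the recursion can be unrolled. This is a sketch that assumes the average starts at $v_0 = 0$ and writes $g_k$ as shorthand (introduced here only for brevity) for $\nabla_{\theta} J(\theta_{k - 1})$:

$$
v_t = (1 - \beta) \sum_{k = 1}^{t} \beta^{t - k} g_k^2
$$

With $\beta = 0.9$, a squared gradient from 10 steps back is weighted by $0.9^{10} \approx 0.35$ relative to the newest one, and one from 50 steps back by about $0.005$. Old gradients fade out instead of piling up as they do in AdaGrad.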

And the update rule is:

$$
\theta_{t} = \theta_{t - 1} - \frac{\eta}{\sqrt{v_t} + \epsilon} \cdot \nabla_{\theta} J(\theta_{t - 1})
$$
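The two equations above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `rmsprop_step`, the toy objective in `grad_fn`, and the default values for `lr` (standing in for $\eta$) and `eps` (standing in for $\epsilon$) are all assumptions made for this example.

```python
import math

def rmsprop_step(theta, grad, v, lr=0.01, beta=0.9, eps=1e-8):
    """Apply one RMSProp update to a list of parameters.

    theta, grad and v are lists of floats of equal length.
    Returns the updated parameters and the updated moving average.
    """
    # v_t = beta * v_{t-1} + (1 - beta) * grad^2  (element-wise)
    new_v = [beta * v_i + (1 - beta) * g_i ** 2 for v_i, g_i in zip(v, grad)]
    # theta_t = theta_{t-1} - lr / (sqrt(v_t) + eps) * grad
    new_theta = [
        t_i - lr / (math.sqrt(v_i) + eps) * g_i
        for t_i, v_i, g_i in zip(theta, new_v, grad)
    ]
    return new_theta, new_v

def grad_fn(theta):
    # Hypothetical toy objective: J(theta) = theta_0^2 + 10 * theta_1^2,
    # chosen so that the two coordinates have very different gradient scales.
    return [2 * theta[0], 20 * theta[1]]

theta = [5.0, 5.0]
v = [0.0, 0.0]  # moving average initialised to zero (an assumption)
for _ in range(2000):
    theta, v = rmsprop_step(theta, grad_fn(theta), v)
```

Because each gradient is divided by the square root of its own moving average, both coordinates end up taking steps of similar size even though the second gradient is ten times larger than the first.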

Crazy, right? Just a small modification to AdaGrad and we have a new optimizer.

This optimizer is valuable in many scenarios, especially when you are training deep neural networks, for a few reasons:
- Efficient with Non-Convex Problems: RMSProp performs well with complex, non-convex optimization problems common in deep learning, unlike AdaGrad, which works best in convex settings.
- Stable Learning Rate: RMSProp prevents the learning rate from diminishing too quickly, allowing the optimizer to move steadily towards the minimum.
- Empirically Proven Performance: RMSProp is widely used in practice due to its robustness and stability across a variety of neural network architectures.
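The "stable learning rate" point can be checked numerically. The sketch below feeds the same constant gradient to an AdaGrad-style running sum and to the RMSProp moving average, then compares the resulting effective learning rates for a single parameter. The helper name `effective_rates` and the values of `lr` and `eps` are assumptions for illustration, and the AdaGrad rate is written in the same $\eta / (\sqrt{\cdot} + \epsilon)$ form as the RMSProp update above.

```python
import math

def effective_rates(grads, lr=0.01, beta=0.9, eps=1e-8):
    """Return per-step (AdaGrad, RMSProp) effective learning rates."""
    adagrad_sum = 0.0  # AdaGrad: running sum of all squared gradients
    rmsprop_avg = 0.0  # RMSProp: exponentially decaying average
    rates = []
    for g in grads:
        adagrad_sum += g ** 2
        rmsprop_avg = beta * rmsprop_avg + (1 - beta) * g ** 2
        rates.append((
            lr / (math.sqrt(adagrad_sum) + eps),
            lr / (math.sqrt(rmsprop_avg) + eps),
        ))
    return rates

# A constant gradient of 1.0 for 1000 steps
rates = effective_rates([1.0] * 1000)
adagrad_rate, rmsprop_rate = rates[-1]
```

With a constant gradient, the AdaGrad rate keeps falling in proportion to $1 / \sqrt{t}$, while the RMSProp rate settles close to `lr`, because its moving average stops growing once it matches the squared gradient.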

Despite all of these advantages, RMSProp is not the best optimizer available, which leads us to Adam, a very popular optimizer in the deep learning community.