# RMSprop

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Mitchell-Mirano/sorix/blob/develop/docs/learn/optimizers/03-RMSprop.ipynb)
[![Open in GitHub](https://img.shields.io/badge/Open%20in-GitHub-black?logo=github)](https://github.com/Mitchell-Mirano/sorix/blob/develop/docs/learn/optimizers/03-RMSprop.ipynb)
[![Open in Docs](https://img.shields.io/badge/Open%20in-Docs-blue?logo=readthedocs)](http://127.0.0.1:8000/sorix/learn/optimizers/03-RMSprop)


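**RMSprop** adapts the learning rate of each parameter individually. It divides the learning rate by the square root of an exponentially decaying average of squared gradients. Because old gradients are gradually forgotten instead of being accumulated indefinitely, the effective learning rate does not shrink toward zero as training progresses.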
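To make the "vanishing learning rate" point concrete, the sketch below compares the effective step size $\eta / (\sqrt{\cdot} + \epsilon)$ under two accumulation schemes for a constant gradient of 1: a running sum of squared gradients (the Adagrad-style rule, used here only as a point of comparison) and the decaying average used by RMSprop. The function names and hyperparameter values are illustrative assumptions, not part of Sorix.

```python
import math

def effective_lr_running_sum(grads, lr=0.1, eps=1e-8):
    """Effective step size when squared gradients are summed forever (Adagrad-style)."""
    acc, out = 0.0, []
    for g in grads:
        acc += g * g
        out.append(lr / (math.sqrt(acc) + eps))
    return out

def effective_lr_rmsprop(grads, lr=0.1, rho=0.9, eps=1e-8):
    """Effective step size when squared gradients are averaged with decay rho."""
    v, out = 0.0, []
    for g in grads:
        v = rho * v + (1 - rho) * g * g
        out.append(lr / (math.sqrt(v) + eps))
    return out

grads = [1.0] * 200  # hypothetical constant gradient
summed = effective_lr_running_sum(grads)
averaged = effective_lr_rmsprop(grads)
print(f"running sum:    {summed[-1]:.4f}")
print(f"decaying mean:  {averaged[-1]:.4f}")
```

With the running sum, the step size keeps falling roughly as $1/\sqrt{t}$, while the decaying average settles near $\eta / |g|$.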

## Mathematical definition

The update rules for RMSprop are:

$$
v_t = \rho \cdot v_{t-1} + (1 - \rho) \cdot (\nabla \mathcal{L}(\theta_t))^2
$$
$$
\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t} + \epsilon} \cdot \nabla \mathcal{L}(\theta_t)
$$

where:
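- $v_t$: Exponential moving average of the squared gradients at step $t$ (the square and the square root are applied element-wise).
- $\rho$: Decay rate of the moving average (often 0.9).
- $\epsilon$: Small constant added to the denominator for numerical stability.
- $\eta$: Learning rate (`lr`).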
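As a worked instance of the two update rules, take a single parameter $\theta = 5$ with gradient $10$, $v_0 = 0$, $\rho = 0.9$ and $\eta = 0.1$. Then $v_1 = 0.1 \cdot 10^2 = 10$ and the step is $0.1 \cdot 10 / \sqrt{10} \approx 0.3162$. The sketch below writes the same arithmetic as a small function; `rmsprop_step` is a hypothetical helper for illustration, and the defaults (including $v_0 = 0$ and `eps=1e-8`) are assumptions.

```python
import math

def rmsprop_step(theta, grad, v, lr=0.01, rho=0.9, eps=1e-8):
    """Apply one RMSprop update element-wise to lists of floats."""
    # v_t = rho * v_{t-1} + (1 - rho) * grad^2
    new_v = [rho * vi + (1 - rho) * gi * gi for vi, gi in zip(v, grad)]
    # theta_{t+1} = theta_t - lr / (sqrt(v_t) + eps) * grad
    new_theta = [
        ti - lr / (math.sqrt(vi) + eps) * gi
        for ti, gi, vi in zip(theta, grad, new_v)
    ]
    return new_theta, new_v

theta, v = [5.0], [0.0]
grad = [10.0]
theta, v = rmsprop_step(theta, grad, v, lr=0.1)
print(f"theta = {theta[0]:.4f}, v = {v[0]:.4f}")
```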

## Implementation details

Sorix's `RMSprop` stores the moving averages of squared gradients ($v_t$) in the `vts` **list**. Because these averages track recent gradient magnitudes and forget old ones, the method is particularly useful for recurrent neural networks and for non-stationary objectives.
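One way to picture this state is an optimizer that keeps one running average per parameter in a list. The sketch below is not Sorix's source code: `ToyParam` and `ToyRMSprop` are hypothetical classes, and the one-entry-per-parameter layout of `vts`, the zero initialization and the default values are all assumptions made for illustration.

```python
import math

class ToyParam:
    """Hypothetical stand-in for a trainable tensor: a value and its gradient."""
    def __init__(self, data):
        self.data = float(data)
        self.grad = 0.0

class ToyRMSprop:
    """Illustrative optimizer that keeps its state in a `vts` list."""
    def __init__(self, params, lr=0.01, rho=0.9, eps=1e-8):
        self.params = list(params)
        self.lr, self.rho, self.eps = lr, rho, eps
        # Assumed layout: one running average per parameter, starting at zero.
        self.vts = [0.0 for _ in self.params]

    def zero_grad(self):
        for p in self.params:
            p.grad = 0.0

    def step(self):
        for i, p in enumerate(self.params):
            self.vts[i] = self.rho * self.vts[i] + (1 - self.rho) * p.grad ** 2
            p.data -= self.lr / (math.sqrt(self.vts[i]) + self.eps) * p.grad

w = ToyParam(5.0)
opt = ToyRMSprop([w], lr=0.1)
for _ in range(3):
    opt.zero_grad()
    w.grad = 2.0 * w.data  # gradient of f(w) = w^2, computed by hand
    opt.step()
print(f"w = {w.data:.4f}, vts = {opt.vts}")
```

Keeping the state in a list indexed like the parameter list means `step` can pair each parameter with its own average without any lookup by name.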


In [1]:
# Uncomment the next line and run this cell to install sorix
#!pip install 'sorix @ git+https://github.com/Mitchell-Mirano/sorix.git@develop'

In [2]:
import numpy as np
from sorix import tensor
from sorix.optim import RMSprop
import sorix

In [3]:
# Minimize an anisotropic function: f(x, y) = x^2 + 10*y^2
# RMSprop normalizes the update using the moving average of squared gradients,
# effectively equalizing the step sizes across parameters with different gradient magnitudes.
x = tensor([5.0], requires_grad=True)
y = tensor([5.0], requires_grad=True)
optimizer = RMSprop([x, y], lr=0.1)

for epoch in range(10):
    loss = x * x + tensor([10.0]) * y * y
    
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    print(f"Epoch {epoch+1}: x = {x.data[0]:.4f}, y = {y.data[0]:.4f}, loss = {loss.data[0]:.4f}")


Epoch 1: x = 4.6838, y = 4.6838, loss = 275.0000
Epoch 2: x = 4.4616, y = 4.4616, loss = 241.3149
Epoch 3: x = 4.2793, y = 4.2793, loss = 218.9631
Epoch 4: x = 4.1201, y = 4.1201, loss = 201.4354
Epoch 5: x = 3.9762, y = 3.9762, loss = 186.7233
Epoch 6: x = 3.8433, y = 3.8433, loss = 173.9078
Epoch 7: x = 3.7189, y = 3.7189, loss = 162.4813
Epoch 8: x = 3.6011, y = 3.6011, loss = 152.1306
Epoch 9: x = 3.4887, y = 3.4887, loss = 142.6466
Epoch 10: x = 3.3808, y = 3.3808, loss = 133.8827
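
In the output, `x` and `y` follow the same trajectory even though the gradient with respect to `y` is ten times larger. This follows from the update rule: scaling a gradient by a constant $c$ scales $\sqrt{v_t}$ by the same $c$, so the ratio $\nabla \mathcal{L} / \sqrt{v_t}$ is unchanged (up to the effect of $\epsilon$). The plain-Python sketch below repeats the run with hand-written gradients to check this. It assumes $\rho = 0.9$, $\epsilon = 10^{-8}$ and $v_0 = 0$; these values are not taken from Sorix's source and were chosen because they are consistent with the first printed epochs.

```python
import math

def run_rmsprop(steps=10, lr=0.1, rho=0.9, eps=1e-8):
    """Minimize f(x, y) = x^2 + 10*y^2 with hand-written gradients."""
    x, y = 5.0, 5.0
    vx, vy = 0.0, 0.0  # assumed initial state
    history = []
    for _ in range(steps):
        loss = x * x + 10.0 * y * y  # loss before the update, as in the cell above
        gx, gy = 2.0 * x, 20.0 * y   # analytic gradients
        vx = rho * vx + (1 - rho) * gx * gx
        vy = rho * vy + (1 - rho) * gy * gy
        x -= lr / (math.sqrt(vx) + eps) * gx
        y -= lr / (math.sqrt(vy) + eps) * gy
        history.append((x, y, loss))
    return history

history = run_rmsprop()
for epoch, (x, y, loss) in enumerate(history, start=1):
    print(f"Epoch {epoch}: x = {x:.4f}, y = {y:.4f}, loss = {loss:.4f}")
```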
