# SGDMomentum

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Mitchell-Mirano/sorix/blob/feature/docs_learn/docs/learn/optimizers/02-SGDMomentum.ipynb)
[![Open in GitHub](https://img.shields.io/badge/Open%20in-GitHub-black?logo=github)](https://github.com/Mitchell-Mirano/sorix/blob/feature/docs_learn/docs/learn/optimizers/02-SGDMomentum.ipynb)


**SGD with Momentum** extends standard SGD by accumulating information from previous gradients into a velocity term. In high-curvature regions of the loss landscape, this damps oscillations across steep directions and speeds up progress along shallow ones.

## Mathematical definition

Let $\theta$ represent the parameters and $\nabla \mathcal{L}(\theta_t)$ the gradient at time $t$. SGDMomentum maintains a velocity vector $v_t$:

$$
v_{t} = \mu \cdot v_{t-1} + \nabla \mathcal{L}(\theta_t)
$$
$$
\theta_{t+1} = \theta_t - \eta \cdot v_t
$$

where:

- $v_t$: Accumulated velocity at time $t$.
- $\mu$: Momentum coefficient (typically 0.9).
- $\eta$: Learning rate ($lr$).
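
As a minimal sketch of the two update equations above, the following plain-Python snippet applies them to the quadratic $f(x, y) = x^2 + 10y^2$. The helper names `momentum_step` and `grad_f`, and the zero initial velocity, are assumptions made for illustration; they are not part of Sorix.

```python
def momentum_step(params, grads, velocities, lr=0.01, momentum=0.9):
    """Apply one SGD-with-momentum update; returns (new_params, new_velocities)."""
    # v_t = mu * v_{t-1} + grad
    new_velocities = [momentum * v + g for v, g in zip(velocities, grads)]
    # theta_{t+1} = theta_t - lr * v_t
    new_params = [p - lr * v for p, v in zip(params, new_velocities)]
    return new_params, new_velocities


def grad_f(params):
    """Gradient of f(x, y) = x^2 + 10*y^2."""
    x, y = params
    return [2.0 * x, 20.0 * y]


params = [5.0, 5.0]
velocities = [0.0, 0.0]  # assumption: the velocity starts at zero
history = []
for step in range(3):
    params, velocities = momentum_step(params, grad_f(params), velocities)
    history.append(tuple(params))
    print(f"Step {step + 1}: x = {params[0]:.4f}, y = {params[1]:.4f}")
```

Because the velocity carries over between steps, each update moves further than the current gradient alone would suggest as long as successive gradients point the same way.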

## Implementation details

In Sorix, the `SGDMomentum` optimizer keeps track of the velocity vectors in a **list** (`vts`). These vectors are stored on the same device as the parameters, ensuring consistency across CPU and GPU setups.
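
To make the bookkeeping concrete, here is a rough sketch of an optimizer that keeps one velocity buffer per parameter in a list named `vts`. This is not the Sorix source: `Param` and `MomentumSketch` are hypothetical stand-ins, and plain Python lists replace device-aware tensors.

```python
from dataclasses import dataclass, field


@dataclass
class Param:
    """Hypothetical stand-in for a tensor: values plus a gradient buffer."""
    data: list
    grad: list = field(default_factory=list)


class MomentumSketch:
    """Illustrative optimizer only; NOT the Sorix implementation."""

    def __init__(self, params, lr=0.01, momentum=0.9):
        self.params = list(params)
        self.lr = lr
        self.momentum = momentum
        # One zero-initialised velocity buffer per parameter, matching its size.
        self.vts = [[0.0] * len(p.data) for p in self.params]

    def zero_grad(self):
        # Reset every gradient buffer to zeros.
        for p in self.params:
            p.grad = [0.0] * len(p.data)

    def step(self):
        for p, vt in zip(self.params, self.vts):
            for i, g in enumerate(p.grad):
                vt[i] = self.momentum * vt[i] + g  # v_t = mu * v_{t-1} + grad
                p.data[i] -= self.lr * vt[i]       # theta_{t+1} = theta_t - lr * v_t


w = Param(data=[1.0, -2.0])
opt = MomentumSketch([w], lr=0.1, momentum=0.9)
opt.zero_grad()
w.grad = [0.5, 0.5]  # pretend a backward pass produced this gradient
opt.step()
print(w.data, opt.vts)
```

Keeping the buffers in a list that parallels the parameter list means `step()` can pair each parameter with its velocity by position, with no lookup table.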


```python
# Uncomment the next line and run this cell to install sorix
#!pip install 'sorix @ git+https://github.com/Mitchell-Mirano/sorix.git@feature/docs_learn'
```

```python
import numpy as np
from sorix import tensor
from sorix.optim import SGDMomentum
import sorix
```

```python
# Same minimization problem as the SGD example: f(x, y) = x^2 + 10*y^2
# Momentum speeds up progress along the shallow x direction,
# while y overshoots the minimum and oscillates around 0
x = tensor([5.0], requires_grad=True)
y = tensor([5.0], requires_grad=True)
optimizer = SGDMomentum([x, y], lr=0.01, momentum=0.9)

for epoch in range(10):
    loss = x * x + tensor([10.0]) * y * y

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # loss was computed before the update, so it lags the printed x and y by one step
    print(f"Epoch {epoch+1}: x = {x.data[0]:.4f}, y = {y.data[0]:.4f}, loss = {loss.data[0]:.4f}")
```


```
Epoch 1: x = 4.9000, y = 4.0000, loss = 275.0000
Epoch 2: x = 4.7120, y = 2.3000, loss = 184.0100
Epoch 3: x = 4.4486, y = 0.3100, loss = 75.1030
Epoch 4: x = 4.1225, y = -1.5430, loss = 20.7507
Epoch 5: x = 3.7466, y = -2.9021, loss = 40.8034
Epoch 6: x = 3.3333, y = -3.5449, loss = 98.2587
Epoch 7: x = 2.8947, y = -3.4144, loss = 136.7721
Epoch 8: x = 2.4421, y = -2.6141, loss = 124.9600
Epoch 9: x = 1.9859, y = -1.3710, loss = 74.2980
Epoch 10: x = 1.5356, y = 0.0220, loss = 22.7398
```
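
To put the numbers above in context, the following plain-Python sketch runs the same update rule on the same problem with and without momentum. The `run` helper is hypothetical and does not use Sorix. It assumes a zero initial velocity and treats `momentum=0.0` as plain SGD, which holds for the update equations given earlier.

```python
def run(momentum, lr=0.01, steps=10):
    """Minimize f(x, y) = x^2 + 10*y^2 from (5, 5); returns the final (x, y)."""
    x, y = 5.0, 5.0
    vx, vy = 0.0, 0.0  # assumption: the velocity starts at zero
    for _ in range(steps):
        gx, gy = 2.0 * x, 20.0 * y  # analytic gradient of f
        vx = momentum * vx + gx
        vy = momentum * vy + gy
        x -= lr * vx
        y -= lr * vy
    return x, y


plain = run(momentum=0.0)     # reduces to plain SGD
with_mom = run(momentum=0.9)  # same settings as the example above

print(f"plain SGD: x = {plain[0]:.4f}, y = {plain[1]:.4f}")
print(f"momentum:  x = {with_mom[0]:.4f}, y = {with_mom[1]:.4f}")
```

Under these assumptions, plain SGD leaves `x` near 4.09 after ten steps, while the momentum run reaches about 1.54. The price is the overshoot in `y` visible in the output above.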
