# Nesterov Accelerated Gradient (NAG)

Nesterov Accelerated Gradient (NAG) is a refinement of momentum-based optimization, designed to further smooth and speed up convergence. It does so by anticipating where the momentum step will land and correcting the update accordingly, which reduces overshooting and improves stability.

In simple terms, NAG calculates the gradient at a **“look-ahead”** position rather than the current position. This look-ahead step is what often makes NAG more effective than plain momentum at navigating tricky error surfaces, including those of non-convex optimization problems.

![NAG](./img/nesterov.jpeg)

To understand NAG, let’s look at how it differs from simple momentum:
- **Momentum update**: Traditional momentum calculates an update based on the current gradient and a weighted sum of past gradients, creating a “velocity” term that guides the update direction.
- **NAG update**: Instead of calculating the gradient at the current point, NAG takes a look-ahead step using the momentum term. It then computes the gradient at this look-ahead position and adjusts the update based on that gradient. This makes it easier to detect when the optimizer is about to overshoot, allowing for more controlled updates.
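
For comparison, classical momentum is commonly written as follows, with $\gamma$ the momentum coefficient and $\eta$ the learning rate (notation assumed here to match the NAG rule in this section):

$$
\begin{aligned}
v_t &= \gamma v_{t-1} + \eta \nabla_{\theta} J(\theta_{t-1}) \\
\theta_t &= \theta_{t-1} - v_t
\end{aligned}
$$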

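The update rule for NAG is:

$$
\begin{aligned}
v_t &= \gamma v_{t-1} + \eta \nabla_{\theta} J(\theta_{t-1} - \gamma v_{t-1}) \\
\theta_t &= \theta_{t-1} - v_t
\end{aligned}
$$

Here $v_t$ is the velocity, $\gamma$ is the momentum coefficient, $\eta$ is the learning rate, and $\theta_{t-1} - \gamma v_{t-1}$ is the look-ahead position.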

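The rule looks simple: it is the momentum update with the gradient evaluated at the look-ahead position instead of the current one.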
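
The update rule can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `nag_step`, the quadratic objective, and the values `lr=0.05` and `gamma=0.9` are assumptions chosen for the example.

```python
import numpy as np

def nag_step(theta, v, grad_fn, lr=0.1, gamma=0.9):
    """One NAG update: evaluate the gradient at the look-ahead point."""
    lookahead = theta - gamma * v                 # theta_{t-1} - gamma * v_{t-1}
    v_new = gamma * v + lr * grad_fn(lookahead)   # v_t
    theta_new = theta - v_new                     # theta_t
    return theta_new, v_new

# Hypothetical objective for illustration: J(theta) = 0.5 * theta^T A theta
A = np.diag([1.0, 10.0])

def grad_fn(theta):
    return A @ theta

theta = np.array([2.0, 2.0])
v = np.zeros_like(theta)
for _ in range(200):
    theta, v = nag_step(theta, v, grad_fn, lr=0.05, gamma=0.9)

print(theta)  # should be close to the minimum at the origin
```

Keeping the step as a pure function of `(theta, v)` makes it easy to swap in a different `grad_fn` or to log the trajectory for plotting.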

NAG improves on plain momentum in two ways:
- **Reduced Oscillations**: By peeking ahead, NAG reduces oscillations around the minimum, which can be especially useful in error surfaces with multiple peaks and valleys.
- **Faster Convergence**: NAG often converges faster than simple momentum-based methods because it reduces unnecessary steps around the optimum.
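
One way to check these claims on a small problem is to run both optimizers with identical settings and compare their losses. The sketch below assumes a toy ill-conditioned quadratic and hypothetical hyperparameters (`lr=0.05`, `gamma=0.9`); results on other problems or settings may differ.

```python
import numpy as np

A = np.diag([1.0, 10.0])  # hypothetical ill-conditioned quadratic

def loss(theta):
    return 0.5 * theta @ A @ theta

def grad(theta):
    return A @ theta

def run(use_nesterov, steps=150, lr=0.05, gamma=0.9):
    theta = np.array([2.0, 2.0])
    v = np.zeros_like(theta)
    losses = []
    for _ in range(steps):
        # The only difference between the two methods:
        # where the gradient is evaluated.
        point = theta - gamma * v if use_nesterov else theta
        v = gamma * v + lr * grad(point)
        theta = theta - v
        losses.append(loss(theta))
    return np.array(losses)

momentum_losses = run(use_nesterov=False)
nag_losses = run(use_nesterov=True)

# Compare the worst loss over the final 50 steps, which is less
# sensitive to momentary dips caused by oscillation.
print("momentum tail max:", momentum_losses[-50:].max())
print("NAG tail max:     ", nag_losses[-50:].max())
```

Plotting `momentum_losses` and `nag_losses` on a log scale is a quick way to see both the oscillation and the convergence rate of each method.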

Here is an example of NAG compared to momentum:

***Momentum:***

![Momentum](./img/LR_momentum_contours.gif)

***NAG:***

![NAG](./img/LR_nag_contours.gif)