# RMSprop Optimizer Tutorial: Intuition and Implementation in Python

## Introduction

__Root mean square propagation__ (RMSprop) is an optimization algorithm used in machine learning to find the model parameters that minimize the gap between model predictions and actual values. It is widely used in deep learning, in combination with backpropagation, to train neural networks.

In this tutorial, you will learn:
- The intuition behind the RMSprop optimizer
- How to use RMSprop in PyTorch and TensorFlow
- How to implement it in pure NumPy for deeper understanding
- Its differences from other optimization algorithms such as SGD and Adam

## What is RMSprop? The Short Answer

RMSprop (Root Mean Square Propagation) is an adaptive learning rate optimization algorithm designed to accelerate the convergence of gradient descent. Key features of RMSprop include:

1. Adaptive learning rates: It adjusts the learning rate for each parameter based on the historical gradient information.

2. Moving average of squared gradients: RMSprop maintains a moving average of squared gradients for each parameter, which helps to normalize the gradient updates.

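3. Smoothed step sizes: Because the moving average changes gradually, each parameter's effective step size varies smoothly from one update to the next. This smoothing acts on the gradient magnitude only; unlike a momentum term, it does not accumulate gradient direction.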

4. Improved performance on non-stationary problems: RMSprop is particularly effective for optimizing non-convex loss functions and handling non-stationary objectives.

5. Hyperparameter sensitivity reduction: Compared to standard SGD, RMSprop reduces the need for manual tuning of the learning rate hyperparameter.
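
Features 1 and 2 can be written as a two-line update rule. The symbols are notation chosen for this sketch rather than taken from the text above: $g_t$ is the gradient at step $t$, $v_t$ is the moving average of squared gradients, $\beta$ is the decay rate, $\eta$ is the learning rate, and $\epsilon$ is a small constant that prevents division by zero.

$$
v_t = \beta \, v_{t-1} + (1 - \beta) \, g_t^2
$$

$$
\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t} + \epsilon} \, g_t
$$

Both lines apply element-wise, so each parameter is divided by its own $\sqrt{v_t}$. Frameworks differ in small details, such as whether $\epsilon$ is added inside or outside the square root.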

In essence, RMSprop addresses AdaGrad's problem of ever-shrinking learning rates while still providing adaptive per-parameter learning rates, which makes it a popular choice for training deep neural networks.
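
The contrast with AdaGrad can be sketched in a few lines of plain Python. This is an illustrative toy, not a reference implementation of either algorithm: the helper `effective_steps`, the constant gradient of 1.0, and the hyperparameter values are all assumptions made for the example. AdaGrad divides by the square root of a running sum of squared gradients, while RMSprop divides by the square root of a moving average.

```python
import math

def effective_steps(grads, lr=0.1, decay=0.9, eps=1e-8):
    """Return the step sizes AdaGrad and RMSprop would take for a gradient sequence.

    Hypothetical helper for illustration only.
    """
    adagrad_sum = 0.0   # AdaGrad: sum of all squared gradients so far
    rmsprop_avg = 0.0   # RMSprop: exponential moving average of squared gradients
    adagrad_steps, rmsprop_steps = [], []
    for g in grads:
        adagrad_sum += g * g
        rmsprop_avg = decay * rmsprop_avg + (1.0 - decay) * g * g
        adagrad_steps.append(lr * abs(g) / (math.sqrt(adagrad_sum) + eps))
        rmsprop_steps.append(lr * abs(g) / (math.sqrt(rmsprop_avg) + eps))
    return adagrad_steps, rmsprop_steps

# A constant gradient of 1.0 for 1000 updates (an assumed toy input)
adagrad_steps, rmsprop_steps = effective_steps([1.0] * 1000)

print(f"AdaGrad step after 1000 updates: {adagrad_steps[-1]:.5f}")
print(f"RMSprop step after 1000 updates: {rmsprop_steps[-1]:.5f}")
```

With a constant gradient, the AdaGrad step keeps shrinking, roughly as `lr / sqrt(t)`, whereas the RMSprop step settles near `lr`.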

## The Intuition Behind RMSprop Optimizer

It helps to think of optimization as finding the lowest point in hilly terrain while blindfolded. Because you can rely only on touch, the only way to tell which way is down is to feel the ground immediately around you.



## Using RMSprop in PyTorch and TensorFlow

In practice, you rarely have to implement RMSprop manually. Since it is a widely used algorithm, it is available in popular frameworks such as PyTorch and TensorFlow.

### RMSprop in PyTorch

In PyTorch, the algorithm is implemented under the `optim` module:

```python
import torch

torch.optim.RMSprop
```

```
torch.optim.rmsprop.RMSprop
```

Here is how you can use it to minimize a function `f(x)`:

```python
import torch
import torch.optim as optim

# Define the function: f(x) = x / log(x)
def f(x):
    return x / torch.log(x)

# Create a tensor with requires_grad=True (364 is an arbitrary starting point)
x = torch.tensor([364.0], requires_grad=True)

# Create an RMSprop optimizer
optimizer = optim.RMSprop([x], lr=0.3)

# Optimization loop
for i in range(1500):
    # Forward pass: compute the loss
    loss = f(x)

    # Backward pass: compute the gradients
    loss.backward()

    # Update the parameter
    optimizer.step()

    # Zero the gradients so they do not accumulate across iterations
    optimizer.zero_grad()

    if i % 100 == 0:
        print(f'Iteration {i}: x = {x.item():.4f}')

print(f'Final result: x = {x.item():.4f}')
```

```
Iteration 0: x = 361.0000
Iteration 100: x = 302.5840
Iteration 200: x = 268.0156
Iteration 300: x = 236.2579
Iteration 400: x = 205.2928
Iteration 500: x = 174.5349
Iteration 600: x = 143.7643
Iteration 700: x = 112.8619
Iteration 800: x = 81.7036
Iteration 900: x = 50.0471
Iteration 1000: x = 17.0591
Iteration 1100: x = 2.7183
Iteration 1200: x = 2.7183
Iteration 1300: x = 2.7183
Iteration 1400: x = 2.7183
Final result: x = 2.7183
```


In this case, we are minimizing the function `x / log(x)`, which has a minimum at _e_ (about 2.7183). The starting point, 364, is an arbitrary value assigned to the tensor `x`, which is then passed to the `RMSprop` optimizer. After roughly 1100 iterations, the optimizer has found the minimum.
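
As a quick check of where the minimum lies, take the function defined in the code, $f(x) = x / \ln x$ for $x > 1$, and differentiate it with the quotient rule:

$$
f'(x) = \frac{\ln x - 1}{(\ln x)^2}
$$

The derivative is zero only when $\ln x = 1$, that is, at $x = e \approx 2.7183$. It is negative for $1 < x < e$ and positive for $x > e$, so $x = e$ is the minimum on $x > 1$, which matches the printed result.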

To use `RMSprop` in supervised learning problems, you can refer to our [Introduction to PyTorch course](https://www.datacamp.com/courses/introduction-to-deep-learning-with-pytorch).

### RMSprop in TensorFlow and Keras

Let's see how to minimize the same function with RMSprop in TensorFlow and Keras:

```python
import tensorflow as tf

# Define the function: f(x) = x / log(x)
def f(x):
    return x / tf.math.log(x)

# Create a variable
x = tf.Variable([364.0])

# Create an RMSprop optimizer
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.3)

# Optimization loop
for i in range(1500):
    with tf.GradientTape() as tape:
        loss = f(x)

    # Compute gradients
    gradients = tape.gradient(loss, [x])

    # Apply gradients
    optimizer.apply_gradients(zip(gradients, [x]))

    if i % 100 == 0:
        print(f'Iteration {i}: x = {x.numpy()[0]:.4f}')

print(f'Final result: x = {x.numpy()[0]:.4f}')
```

```
Iteration 0: x = 363.0513
Iteration 100: x = 330.9281
Iteration 200: x = 300.8935
Iteration 300: x = 270.8544
Iteration 400: x = 240.8105
Iteration 500: x = 210.7600
Iteration 600: x = 180.7007
Iteration 700: x = 150.6295
Iteration 800: x = 120.5406
Iteration 900: x = 90.4234
Iteration 1000: x = 60.2539
Iteration 1100: x = 29.9579
Iteration 1200: x = 2.7183
Iteration 1300: x = 2.7183
Iteration 1400: x = 2.7183
Final result: x = 2.7183
```


This time, the optimizer class lives in the `tf.keras.optimizers` module. As you can see, the algorithm converged to _e_ just as it did in PyTorch, but it needed roughly 100 more iterations to get there.
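
One plausible reason for the gap is that the two frameworks use different default decay rates for the moving average. The plain-Python sketch below replays the first 101 updates under two assumptions that are not stated in the text above: that PyTorch defaults to a decay of 0.99 (`alpha`) and Keras to 0.9 (`rho`), and that both start the moving average at zero. The helper `run_rmsprop` and the hand-derived gradient `grad_f` are hypothetical names, and the sketch ignores framework details such as float32 precision and where epsilon is added.

```python
import math

def grad_f(x):
    """Hand-derived derivative of f(x) = x / log(x)."""
    log_x = math.log(x)
    return (log_x - 1.0) / log_x**2

def run_rmsprop(decay, steps, x0=364.0, lr=0.3, eps=1e-8):
    """Run plain RMSprop on f and return x after each update."""
    x, avg_sq = x0, 0.0  # the moving average starts at zero
    trajectory = []
    for _ in range(steps):
        g = grad_f(x)
        avg_sq = decay * avg_sq + (1.0 - decay) * g * g
        x -= lr * g / (math.sqrt(avg_sq) + eps)
        trajectory.append(x)
    return trajectory

traj_099 = run_rmsprop(decay=0.99, steps=101)  # assumed PyTorch default
traj_09 = run_rmsprop(decay=0.9, steps=101)    # assumed Keras default

print(f"decay=0.99: first update -> {traj_099[0]:.4f}, after 101 updates -> {traj_099[-1]:.4f}")
print(f"decay=0.90: first update -> {traj_09[0]:.4f}, after 101 updates -> {traj_09[-1]:.4f}")
```

The first update moves `x` by about `lr / sqrt(1 - decay)`: 3.0 for a decay of 0.99 and roughly 0.95 for 0.9, consistent with the 361.0000 and 363.0513 printed at iteration 0. Because the average starts at zero and is not bias-corrected, a larger decay gives larger early steps, which lets the run with decay 0.99 get ahead.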

## Implementing RMSprop in Python Step-by-Step
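
A minimal NumPy sketch of the algorithm, applied to the same function as in the framework examples. The names `rmsprop_step`, `grad_f`, and `sq_avg` are hypothetical, the gradient is derived by hand instead of with automatic differentiation, and the defaults (`decay=0.99`, `eps=1e-8`) are assumed to mirror PyTorch's. Treat it as one possible implementation rather than a reference one.

```python
import numpy as np

def f(x):
    """Objective: f(x) = x / log(x), minimized at x = e for x > 1."""
    return x / np.log(x)

def grad_f(x):
    """Hand-derived gradient: f'(x) = (log(x) - 1) / log(x)^2."""
    log_x = np.log(x)
    return (log_x - 1.0) / log_x**2

def rmsprop_step(params, grads, sq_avg, lr=0.01, decay=0.99, eps=1e-8):
    """Apply one RMSprop update and return the new parameters and moving average."""
    # Step 1: update the moving average of squared gradients (element-wise)
    sq_avg = decay * sq_avg + (1.0 - decay) * grads**2
    # Step 2: scale each gradient by the root of its own average, then step
    params = params - lr * grads / (np.sqrt(sq_avg) + eps)
    return params, sq_avg

x = np.array([364.0])        # same arbitrary starting point as above
sq_avg = np.zeros_like(x)    # one moving average per parameter
history = []

for i in range(1500):
    g = grad_f(x)
    # Stop once the gradient is numerically zero (tolerance is an assumption)
    if np.all(np.abs(g) < 1e-10):
        break
    x, sq_avg = rmsprop_step(x, g, sq_avg, lr=0.3)
    history.append(x[0])

print(f"Stopped after {len(history)} updates: x = {x[0]:.4f}, f(x) = {f(x)[0]:.4f}")
```

Returning the updated state from `rmsprop_step` rather than mutating it in place keeps the function easy to test; the early stop on a tiny gradient is a convenience of this sketch, not part of RMSprop itself.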

## RMSprop vs. SGD vs. Adam

## Conclusion