# RMSprop

RMSprop (Root Mean Square Propagation) is a popular optimization algorithm used in deep learning to update the parameters of a neural network. It was introduced by Geoffrey Hinton in 2012 as an improvement over the standard stochastic gradient descent (SGD) optimizer.

The main idea behind RMSprop is to adjust the learning rate for each weight based on the average of the squared gradients for that weight. The algorithm keeps track of an exponential moving average of the squared gradients, which is then used to normalize the learning rate for each weight. This normalization prevents the learning rate from becoming too small or too large, which can slow down the training process or cause it to diverge.

RMSprop copes well with gradients whose magnitude differs greatly from one weight to another, including the sparse gradients that are common in deep neural networks. It has become a popular choice for optimizing neural networks because, in practice, it often converges faster than plain SGD, especially in deep architectures.

Overall, RMSprop is a powerful optimization algorithm that can help accelerate the training of deep neural networks and improve their performance.
$$\mathrm{calculated\_step\_size} = \frac{\mathrm{step\_size}}{10^{-8} + \sqrt{s}}$$
$$s_{t+1} = \rho \, s_t + (1 - \rho) \, f'(x_t)^2$$
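
As a minimal sketch of how the two formulas fit together, the snippet below applies a single update to one parameter. The helper name `rmsprop_step`, its default values, and the toy objective $f(x) = x^2$ are assumptions made here for illustration only.

```python
import numpy as np

def rmsprop_step(x, s, grad, step_size=0.01, rho=0.9, eps=1e-8):
    """Apply one RMSprop-style update (hypothetical helper for illustration)."""
    # Update the running average of squared gradients.
    s = rho * s + (1 - rho) * grad ** 2
    # Scale the base step size by the root of the running average.
    calculated_step_size = step_size / (eps + np.sqrt(s))
    # Move the parameter against the gradient.
    x = x - calculated_step_size * grad
    return x, s

x, s = 1.0, 0.0
grad = 2 * x  # derivative of the assumed toy objective f(x) = x^2
x, s = rmsprop_step(x, s, grad)
print(x, s)
```

The parameter moves a small distance toward zero, and `s` now remembers a fraction of the squared gradient for the next step.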


In [None]:
import numpy as np 

Our goal is to build this function: $$\mathrm{calculated\_step\_size} = \frac{\mathrm{step\_size}}{10^{-8} + \sqrt{s}}$$ But what are these values? We will build each one step by step and then plug them into the function. First, let's write a skeleton of it (its inputs are defined in the next cell):

In [None]:
calculated_step_size = step_sizes / (1e-8 + np.sqrt(s))

Let's initialize these values with simple placeholders for a quick test:

In [None]:
step_sizes = 1
s = 1
calculated_step_size = step_sizes / (1e-8 + np.sqrt(s))

In [None]:
calculated_step_size

0.9999999900000002

Now we will build these values one at a time, starting with `s`. `s` stores the accumulated `squared partial derivatives` (squared gradients) that plain gradient descent would have computed on a particular set of data, kept as a running average.

To get that accumulated value, we first need `all the squared partial derivatives`; we could collect them in a list with a for loop. For that we need a way to compute the `partial derivative` for a single step, and for that we need to set up `gradient descent` itself.

Let's consider the most basic dataset, with one column (feature) and one target. The equation of the line that tries to predict the target is $line = \beta_{1}x_1 + \beta_0$. It has two parameters, $\beta_1$ and $\beta_0$, so we need two random starting values:

In [None]:
np.random.randn(2)

array([ 1.24475101, -0.2307477 ])

Values drawn this way can be fairly large in magnitude. Since they are only a random starting point for gradient descent, it is good practice to keep them small, inside $(-1 , 1)$.

One way to do this is to multiply the values by $0.1$ (that is, divide them by $10$) at initialization:

In [None]:
np.random.randn(2) * 0.1

array([-0.02019788,  0.15916401])

And this is perfect

These values act as the initial weight and bias for the training process.
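
Scaling by $0.1$ does not strictly guarantee values inside $(-1, 1)$, because a normal distribution has no hard bounds, but it makes values outside that range extremely unlikely. A quick check, assuming a seeded generator from `np.random.default_rng` (chosen here only so the result is reproducible):

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
raw = rng.standard_normal(1000)
scaled = raw * 0.1

# Fraction of samples that fall strictly inside (-1, 1).
frac_raw = np.mean(np.abs(raw) < 1)
frac_scaled = np.mean(np.abs(scaled) < 1)
print(frac_raw, frac_scaled)
```

Roughly a third of the unscaled samples fall outside the range, while the scaled ones stay inside it.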

In [None]:
params = np.random.randn(2) * 0.1

In [None]:
params 

array([-0.07061944,  0.10155017])

Now let's calculate the gradient for these values. The gradient is just the derivative of the function being minimized. Here we treat each value as if it enters that function as $x^2$, so its derivative is $2x$:

In [None]:
gradient = [params[0] * 2 , params[1] * 2]

In [None]:
gradient 

[-0.14123887758702727, 0.20310033563751598]
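
The $2x$ shortcut treats each parameter as if the objective were simply $x^2$. For an actual one-feature dataset, the gradient would come from a loss function. A minimal sketch, assuming mean squared error as the loss and a small made-up dataset (both are assumptions, not something this walkthrough specifies):

```python
import numpy as np

def mse_gradient(params, x, y):
    """Gradient of mean squared error for line = beta_1 * x + beta_0.

    params is ordered as [beta_1, beta_0]; this ordering is an assumption.
    """
    beta_1, beta_0 = params
    error = (beta_1 * x + beta_0) - y
    grad_beta_1 = 2 * np.mean(error * x)
    grad_beta_0 = 2 * np.mean(error)
    return np.array([grad_beta_1, grad_beta_0])

# Made-up data lying exactly on the line y = 3x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 3 * x + 1

print(mse_gradient(np.array([0.0, 0.0]), x, y))  # far from the true line
print(mse_gradient(np.array([3.0, 1.0]), x, y))  # exactly on the true line
```

The gradient is large when the parameters are far from the true line and vanishes when they match it, which is the signal an optimizer such as RMSprop would consume.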

Normally we would now collect the squared gradients in a list and accumulate them. But so far we have taken only $1$ step, so there is nothing to accumulate yet. For now, the gradient we just computed serves as our `s`. Next we need the `step_size`.

The step size is the initial value we start from; here we initialize it randomly:

In [None]:
step_size = np.random.rand(2) * 0.1

In [None]:
step_size

array([0.04709335, 0.08997532])

Now apply the formula:

In [None]:
step_size = step_size / (1e-8 + gradient)

TypeError: ignored

This error occurs because `1e-8` is a float and `gradient` is a Python list, and Python cannot add the two. Converting the list to a NumPy array fixes it:

In [None]:
step_size = step_size / (1e-8 + np.array(gradient))

In [None]:
step_size

array([-0.33343056,  0.4430092 ])
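
The cell above divides by the gradient itself, which is why one of the resulting step sizes is negative. The formula calls for $\sqrt{s}$, where `s` holds squared gradients, so the denominator should never be negative. A minimal sketch of that variant, reusing the values printed earlier in this walkthrough (rounded):

```python
import numpy as np

# Values copied (rounded) from the outputs shown earlier.
gradient = np.array([-0.14123888, 0.20310034])
step_size = np.array([0.04709335, 0.08997532])

# With a single step, s is just the squared gradient.
s = gradient ** 2
calculated_step_size = step_size / (1e-8 + np.sqrt(s))
print(calculated_step_size)
```

Both step sizes now come out positive, with the same magnitudes as before; the direction of the update is supplied separately by the sign of the gradient.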

That is a first, single-step version of RMSprop. Let's put it in a function so it is easier to reuse:

In [None]:
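def rms_prop():
    # Two small random starting values: one weight and one bias.
    params = np.random.randn(2) * 0.1
    # Toy gradient: the derivative of x^2 is 2x.
    gradient = [params[0] * 2 , params[1] * 2]
    step_size = np.random.rand(2) * 0.1

    # Same computation as in the cells above.
    return step_size / (1e-8 + np.array(gradient))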


But this function still lacks some abilities, such as:
* What if we run this more than once?
* What if the user has a list of columns instead of one?

# What if we run this more than once

To run the update more than once, we need one more hyperparameter, `rho`. It is the decay rate from the formula for `s`, which we skipped earlier:

In [None]:
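def rms_prop(rho = 0.999 , lr = 0.01 , epochs = 100):
    # Two small random starting values: one weight and one bias.
    params = np.random.randn(2) * 0.1
    # Running average of squared gradients, one entry per parameter.
    s = np.zeros(2)
    for _ in range(epochs):
        # Toy gradient: the derivative of x^2 is 2x.
        gradient = params * 2
        # s_{t+1} = rho * s_t + (1 - rho) * gradient^2
        s = (s * rho) + ((gradient ** 2) * (1 - rho))
        # Scale the base learning rate by the root of the running average.
        step_size = lr / (1e-8 + np.sqrt(s))
        # Move each parameter against its gradient.
        params = params - step_size * gradient
    return step_size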
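
To see what `rho` does, the sketch below feeds a constant gradient into the update rule for `s` and tracks the result. The constant gradient of `2.0` and `rho = 0.9` are illustrative choices, not values from this notebook.

```python
rho = 0.9          # decay rate (illustrative value)
gradient = 2.0     # constant gradient, so gradient ** 2 == 4.0
s = 0.0
history = []
for _ in range(50):
    s = rho * s + (1 - rho) * gradient ** 2
    history.append(s)

# s starts small and climbs toward gradient ** 2.
print(history[0], history[-1])
```

A `rho` closer to $1$ makes `s` change more slowly, so older gradients keep their influence for longer.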

# What if the user has a list of columns instead of one

With several columns, each parameter keeps its own running average `s`, updated by the formula $$s_{t+1} = \rho \, s_t + (1 - \rho) \, f'(x_t)^2$$

In [None]:
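def rms_prop(columns , rho = 0.999 , lr = 0.01 , epochs = 100):
    # One small random starting value per column.
    params = np.random.randn(len(columns)) * 0.1
    # Running average of squared gradients, one entry per parameter.
    s = np.zeros(len(columns))
    for _ in range(epochs):
        # Toy gradient: the derivative of x^2 is 2x.
        gradient = params * 2
        # s_{t+1} = rho * s_t + (1 - rho) * gradient^2
        s = (s * rho) + ((gradient ** 2) * (1 - rho))
        # Scale the base learning rate by the root of the running average.
        step_size = lr / (1e-8 + np.sqrt(s))
        # Move each parameter against its gradient.
        params = params - step_size * gradient
    return step_size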

Our function is now complete. Let's tidy it up a little:

In [None]:
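def rms_prop(columns , rho = 0.999 , lr = 0.01 , epochs = 100):

    # One small random starting value per column.
    params = np.random.randn(len(columns)) * 0.1

    # Running average of squared gradients, one entry per parameter.
    s = np.zeros(len(columns))

    for _ in range(epochs):

        # Toy gradient: the derivative of x^2 is 2x.
        gradient = params * 2

        # s_{t+1} = rho * s_t + (1 - rho) * gradient^2
        s = (s * rho) + ((gradient ** 2) * (1 - rho))

        # Scale the base learning rate by the root of the running average.
        step_size = lr / (1e-8 + np.sqrt(s))

        # Move each parameter against its gradient.
        params = params - step_size * gradient

    return step_size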
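
As a quick sanity check, the sketch below runs an RMSprop loop on the toy objective $f(x) = x^2$ and confirms that the parameters move toward zero. It is a standalone variant with a hypothetical name, `rms_prop_demo`, that takes an explicit starting point, returns the parameters rather than the step size, and uses `rho = 0.9`; all of these are assumptions made here so that the behaviour is easy to verify.

```python
import numpy as np

def rms_prop_demo(start, rho=0.9, lr=0.01, epochs=500):
    """Minimize sum(x^2) with RMSprop-style updates (illustrative sketch)."""
    params = np.array(start, dtype=float)
    s = np.zeros_like(params)
    for _ in range(epochs):
        gradient = 2 * params                      # derivative of x^2
        s = rho * s + (1 - rho) * gradient ** 2    # running average
        params = params - lr / (1e-8 + np.sqrt(s)) * gradient
    return params

start = np.array([1.0, -0.5])
final = rms_prop_demo(start)
print(np.sum(start ** 2), np.sum(final ** 2))
```

With a fixed learning rate the parameters do not settle exactly at zero; they end up in a small band around it whose width depends on `lr` and `rho`.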