
# What is Momentum-based Gradient Descent?
Momentum gradient descent is a variant of standard gradient descent, the optimization algorithm widely used in machine learning. It adds a "momentum" (velocity) term to the weight update: an exponentially weighted moving average of past gradients. Along directions where successive gradients point the same way, the velocity builds up and the updates accelerate; along directions where the gradient keeps changing sign, the contributions partly cancel, which damps oscillations. In practice this often speeds up convergence compared with plain gradient descent. Each update combines the current gradient with a fraction of the previous velocity:

v(t) = beta * v(t-1) + (1-beta) * gradient
weight(t) = weight(t-1) - learning_rate * v(t)

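where v(t) is the velocity at step t (with v(0) = 0), gradient is the gradient of the loss with respect to the weights evaluated at weight(t-1), beta is a hyperparameter in [0, 1) that sets how much of the previous velocity is retained, and learning_rate is the step size. The code below uses the classical form v(t) = gamma * v(t-1) + eta * gradient, weight(t) = weight(t-1) - v(t), which matches the rule above when gamma = beta and eta = learning_rate * (1 - beta).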
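
As a small numerical illustration, the sketch below applies the rule to a toy one-dimensional loss L(w) = 0.5 * w**2 (an assumption, chosen only because its gradient is simply w) and runs it next to the classical form `v = gamma * v + eta * gradient; w = w - v` that `do_momentum_gradient_descent` uses later in this notebook. The helper names `ema_momentum` and `classical_momentum` are hypothetical. With gamma = beta and eta = learning_rate * (1 - beta), the two should trace the same path.

```python
import numpy as np

def grad(w):
    # Gradient of the assumed toy loss L(w) = 0.5 * w**2
    return w

def ema_momentum(w0, learning_rate, beta, steps):
    # Moving-average form: v = beta * v + (1 - beta) * g; w = w - learning_rate * v
    w, v = w0, 0.0
    path = []
    for _ in range(steps):
        v = beta * v + (1 - beta) * grad(w)
        w = w - learning_rate * v
        path.append(w)
    return np.array(path)

def classical_momentum(w0, eta, gamma, steps):
    # Classical form: v = gamma * v + eta * g; w = w - v
    w, v = w0, 0.0
    path = []
    for _ in range(steps):
        v = gamma * v + eta * grad(w)
        w = w - v
        path.append(w)
    return np.array(path)

learning_rate, beta = 0.5, 0.9
ema_path = ema_momentum(5.0, learning_rate, beta, steps=50)
classical_path = classical_momentum(5.0, learning_rate * (1 - beta), beta, steps=50)

# True when the two trajectories agree up to floating-point rounding
print(np.allclose(ema_path, classical_path))
```

The practical consequence is that a learning rate tuned for one form has to be rescaled by (1 - beta) before it is reused with the other.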

In [None]:
import numpy as np

# Toy dataset: two (input, target) pairs for a single sigmoid neuron
X = [0.5, 2.5]
Y = [0.2, 0.9]

In [None]:
def f(w, x, b):
  # Sigmoid (logistic) neuron: 1 / (1 + exp(-(w*x + b)))
  return 1.0 / (1.0 + np.exp(-(w*x + b)))
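
The model here is a single sigmoid neuron. As a quick sanity check, and assuming the intended function is the standard logistic 1 / (1 + exp(-z)), the sketch below uses a hypothetical helper `sigmoid` to verify three properties: the value at z = 0 is 0.5, outputs lie strictly between 0 and 1, and sigmoid(-z) = 1 - sigmoid(z).

```python
import numpy as np

def sigmoid(z):
    # Standard logistic function; the denominator must be parenthesized
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5.0, 5.0, 11)
s = sigmoid(z)

value_at_zero = sigmoid(0.0)
in_open_unit_interval = bool(np.all((s > 0.0) & (s < 1.0)))
is_symmetric = bool(np.allclose(sigmoid(-z), 1.0 - s))

print(value_at_zero)           # 0.5
print(in_open_unit_interval)   # True
print(is_symmetric)            # True
```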

In [None]:
def error(w, b):
  # Half the sum of squared errors over the dataset
  err = 0.0
  for x, y in zip(X, Y):
    fx = f(w, x, b)
    err += 0.5 * (fx - y)**2
  return err
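
The same loss can be computed without an explicit Python loop. This is a minimal sketch, assuming the standard logistic model and the two data points above; `error_vectorized` is a hypothetical name, not part of the original notebook.

```python
import numpy as np

X = np.array([0.5, 2.5])
Y = np.array([0.2, 0.9])

def error_vectorized(w, b):
    # Predictions for every x at once via NumPy broadcasting
    fx = 1.0 / (1.0 + np.exp(-(w * X + b)))
    # Half the sum of squared residuals, matching the loop version
    return 0.5 * np.sum((fx - Y) ** 2)

# At w = 0, b = 0 every prediction is 0.5, so the loss is
# 0.5 * (0.3**2 + 0.4**2), which is approximately 0.125
initial_error = error_vectorized(0.0, 0.0)
print(initial_error)
```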

In [None]:
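def grad_w(w, b, x, y):
  # d/dw of 0.5 * (f - y)**2, using the sigmoid derivative f * (1 - f)
  fx = f(w, x, b)
  return (fx - y) * fx * (1 - fx) * x

def grad_b(w, b, x, y):
  # d/db of 0.5 * (f - y)**2; same as grad_w without the factor x
  fx = f(w, x, b)
  return (fx - y) * fx * (1 - fx)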
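
These expressions follow from the chain rule applied to 0.5 * (f - y)**2, together with the sigmoid derivative f * (1 - f). One way to gain confidence in them is a finite-difference check. The sketch below is self-contained and assumes the standard logistic function; `sigmoid`, `loss`, and `numeric_grads` are hypothetical helpers, and the test point (w, b) = (0.3, -0.1) is arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    # Squared-error loss for a single example
    return 0.5 * (sigmoid(w * x + b) - y) ** 2

def grad_w(w, b, x, y):
    fx = sigmoid(w * x + b)
    return (fx - y) * fx * (1 - fx) * x

def grad_b(w, b, x, y):
    fx = sigmoid(w * x + b)
    return (fx - y) * fx * (1 - fx)

def numeric_grads(w, b, x, y, h=1e-6):
    # Central differences in w and in b
    dw = (loss(w + h, b, x, y) - loss(w - h, b, x, y)) / (2 * h)
    db = (loss(w, b + h, x, y) - loss(w, b - h, x, y)) / (2 * h)
    return dw, db

w, b = 0.3, -0.1  # arbitrary test point
checks = []
for x, y in zip([0.5, 2.5], [0.2, 0.9]):
    num_dw, num_db = numeric_grads(w, b, x, y)
    checks.append(abs(num_dw - grad_w(w, b, x, y)) < 1e-8)
    checks.append(abs(num_db - grad_b(w, b, x, y)) < 1e-8)

# True when the analytic and numeric gradients agree at every point
print(all(checks))
```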
  

In [None]:
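def do_momentum_gradient_descent():
  # eta: learning rate, gamma: momentum coefficient
  w, b, eta, max_epochs = 0, 0, 1.0, 100
  prev_v_w, prev_v_b, gamma = 0, 0, 0.9
  for i in range(max_epochs):
    # Accumulate the gradient over the whole dataset (batch gradient)
    dw, db = 0, 0
    for x, y in zip(X, Y):
      dw += grad_w(w, b, x, y)
      db += grad_b(w, b, x, y)

    # Velocity: decayed previous velocity plus the scaled current gradient
    v_w = gamma * prev_v_w + eta * dw
    v_b = gamma * prev_v_b + eta * db
    # Step against the velocity
    w = w - v_w
    b = b - v_b
    prev_v_w = v_w
    prev_v_b = v_b
    print(w, b)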
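
For plotting or for comparing runs, it can be more convenient to return the trajectory than to print it. The following is a self-contained sketch of that idea, assuming the standard logistic model and the same two data points; `momentum_gd_history`, `sigmoid`, and `total_error` are hypothetical names, not part of the original notebook.

```python
import numpy as np

X = [0.5, 2.5]
Y = [0.2, 0.9]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def total_error(w, b):
    # Half the sum of squared errors over the dataset
    return sum(0.5 * (sigmoid(w * x + b) - y) ** 2 for x, y in zip(X, Y))

def momentum_gd_history(eta=1.0, gamma=0.9, max_epochs=100):
    w, b = 0.0, 0.0
    v_w, v_b = 0.0, 0.0
    # First entry is the state before any update
    history = [(w, b, total_error(w, b))]
    for _ in range(max_epochs):
        dw = db = 0.0
        for x, y in zip(X, Y):
            fx = sigmoid(w * x + b)
            common = (fx - y) * fx * (1 - fx)
            dw += common * x
            db += common
        # Classical momentum: decay the old velocity, add the scaled gradient
        v_w = gamma * v_w + eta * dw
        v_b = gamma * v_b + eta * db
        w, b = w - v_w, b - v_b
        history.append((w, b, total_error(w, b)))
    return history

history = momentum_gd_history()
print(history[1])   # (w, b, error) after the first update
print(history[-1])  # (w, b, error) after the last epoch
```

Calling `momentum_gd_history(gamma=0.0)` reduces the update to plain batch gradient descent, which gives a baseline to compare against.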

In [None]:
do_momentum_gradient_descent()

7.3 5.800000000000001
13.870031481408173 11.020062962801983
19.78305982104742 15.718119642067553
25.104785326725764 19.946370653412604
29.894338281836276 23.751796563623156
34.204935941435735 27.176679882812653
38.08447383507525 30.2590748700832
41.57605793935082 33.033230358626696
44.71848363319883 35.52997029831584
47.54666675766204 37.77703624403607
50.09203156967892 39.79939559518427
52.38285990049412 41.619519011217655
54.44460539822779 43.257630085647705
56.3001763461881 44.731930052634745
57.97019019935238 46.05880002292308
59.47320266720023 47.25298299618258
60.8259138882633 48.32774767211613
62.04335398722006 49.29503588045633
63.139050076281144 50.16559526796251
64.12517655643612 50.94909871671807
65.01269038857559 51.65425182059808
65.81145283750112 52.28888961409008
66.5303390415341 52.860063628232886
67.17733662516379 53.37412024096141
67.7596344504305 53.83677119241708
68.28370249317054 54.25315704872719
68.75536373163658 54.627904319406284
69.17985884625601 54.9651768630