# Gradient Descent

Gradient descent is an optimization method for finding a local minimum of a differentiable function $f:X\subseteq\mathbb{R}^{n} \rightarrow Y\subseteq\mathbb{R}$. The key idea is that the gradient of a function points in the direction of steepest ascent, so moving in the opposite direction, $-\nabla f(\boldsymbol x)$, locally decreases the value of $f$ the fastest.

Given an initial guess $\boldsymbol{a}_{0}$ for the vector that minimizes $f$, and an aggression parameter $\gamma > 0$ that specifies the "step size", gradient descent is defined by the iterative rule:
$$\boldsymbol{a}_{n+1} = \boldsymbol{a}_{n} - \gamma\nabla f\left(\boldsymbol{a}_{n}\right)$$

In [1]:
import numpy as np

In [2]:
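def gradientDescent(delF, a0, gamma, n):
    # delF: gradient of f, a0: initial guess, gamma: step size, n: number of iterations
    a = a0
    for i in range(n):
        # Step against the gradient, scaled by gamma
        a = a - gamma*delF(a)
    return a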
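
To get a feel for how the update rule behaves, here is a minimal sketch that applies it to a simple quadratic for a few step sizes. The test function $f(x, y) = (x-1)^2 + 2(y+2)^2$, the helper names `grad_f` and `distance_after`, and the chosen values of $\gamma$ are assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np

def grad_f(a):
    # Gradient of the illustrative function f(x, y) = (x - 1)^2 + 2 (y + 2)^2
    return np.array([2.0 * (a[0] - 1.0), 4.0 * (a[1] + 2.0)])

def distance_after(gamma, n=50):
    # Apply a_{k+1} = a_k - gamma * grad f(a_k) for n steps, starting at the origin,
    # and return the distance to the known minimizer (1, -2)
    a = np.array([0.0, 0.0])
    for _ in range(n):
        a = a - gamma * grad_f(a)
    return np.linalg.norm(a - np.array([1.0, -2.0]))

# Hypothetical step sizes: very small, moderate, and too large for this function
for gamma in (0.001, 0.1, 0.6):
    print(gamma, distance_after(gamma))
```

With these particular numbers, the smallest step size has barely moved after 50 iterations, the moderate one ends up close to the minimizer, and the largest one moves further away on every step.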

For gradient descent to work effectively, $\gamma$ must be "sufficiently" small. If it is too small, however, the method takes too long to converge; if it is too big, the iterates can overshoot the minimum. One way to determine an effective aggression parameter is to use the inverse of the Hessian matrix in place of $\gamma$ (this is Newton's method). The iterative rule then becomes:
$$\boldsymbol{a}_{n+1} = \boldsymbol{a}_{n} - \mathcal{H}^{-1}\left(\boldsymbol{a}_{n}\right)\nabla f\left(\boldsymbol{a}_{n}\right),$$

where $\mathcal{H}$ is the Hessian matrix: $\mathcal{H}_{ij}(\boldsymbol{x}) = \partial_{x_{i}x_{j}}^2 f$.
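
As a quick check of the rule above, consider the illustrative function $f(x, y) = x^2 + 3y^2$ (an assumption chosen for simplicity, not taken from the original notebook). Starting from any point $\boldsymbol{a}_{0} = (x_0, y_0)$:

$$\nabla f = \begin{pmatrix}2x\\ 6y\end{pmatrix},\qquad \mathcal{H} = \begin{pmatrix}2 & 0\\ 0 & 6\end{pmatrix},\qquad \boldsymbol{a}_{1} = \begin{pmatrix}x_0\\ y_0\end{pmatrix} - \begin{pmatrix}1/2 & 0\\ 0 & 1/6\end{pmatrix}\begin{pmatrix}2x_0\\ 6y_0\end{pmatrix} = \begin{pmatrix}0\\ 0\end{pmatrix}$$

For a quadratic function with a positive definite Hessian, a single step lands exactly on the minimum, whatever the starting point.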

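However, the Hessian step is not guaranteed to point downhill. Where the Hessian is not positive definite, $-\mathcal{H}^{-1}\nabla f$ can point uphill, so the iteration may converge to a maximum or a saddle point instead of a minimum.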

In [4]:
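def hessianDescent(delF, a0, H, n):
    # delF: gradient of f, a0: initial guess, H: function returning the Hessian, n: number of iterations
    a = a0
    for i in range(n):
        # Solve H(a) s = delF(a) rather than forming the inverse explicitly
        a = a - np.linalg.solve(H(a), delF(a))
    return a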
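
The caveat about ending up at a maximum can be seen in one dimension. The sketch below assumes the illustrative function $f(x) = x^4 - 2x^2$, which has minima at $x = \pm 1$ and a local maximum at $x = 0$; the function, the helper names `df`, `d2f`, `newton_1d`, and the starting points are hypothetical choices, not part of the original notebook.

```python
def df(x):
    # First derivative of the illustrative function f(x) = x^4 - 2 x^2
    return 4.0 * x**3 - 4.0 * x

def d2f(x):
    # Second derivative (the 1-D "Hessian")
    return 12.0 * x**2 - 4.0

def newton_1d(x0, steps):
    # Scalar version of a_{n+1} = a_n - H^{-1}(a_n) * grad f(a_n)
    x = x0
    for _ in range(steps):
        x = x - df(x) / d2f(x)
    return x

x_from_concave = newton_1d(0.3, 20)  # starts where f'' < 0
x_from_convex = newton_1d(0.9, 20)   # starts where f'' > 0
print(x_from_concave, d2f(x_from_concave))
print(x_from_convex, d2f(x_from_convex))
```

Started at $x = 0.3$, where the second derivative is negative, the iteration settles on the local maximum at $x = 0$; started at $x = 0.9$, it settles on the minimum at $x = 1$.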

To get the best of both worlds, we can use a method that first tries the Hessian step $- \mathcal{H}^{-1}\left(\boldsymbol{a}_{n}\right)\nabla f\left(\boldsymbol{a}_{n}\right)$ and falls back to steepest descent when that step is too big in magnitude or points backwards, i.e. has a non-positive inner product with $-\nabla f\left(\boldsymbol{a}_{n}\right)$.

In [5]:
def hybridDescent(delF, a0, H, n):
    gamma = 0.2
    a = a0
    for i in range(n):
        grad = delF(a)
        # Try the Hessian step first
        step = -np.linalg.solve(H(a), grad)
        # Fall back to steepest descent if the step points backwards or is too long
        if step @ -grad <= 0 or np.linalg.norm(step) > 2:
            step = -gamma * grad
        a = a + step
    return a
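
As a rough check of the hybrid idea, the sketch below runs it on the illustrative function $f(x) = x^4 - 2x^2$, starting at $x = 0.3$, a point where the second derivative is negative and a pure Hessian step points uphill. The function, the starting point, and the names `grad_f`, `hess_f`, `hybrid_sketch`, and `max_norm` are assumptions made for this example; the values $\gamma = 0.2$ and the step-length limit of 2 are the ones used in the notebook.

```python
import numpy as np

def grad_f(a):
    # Gradient of the illustrative function f(x) = x^4 - 2 x^2, as a length-1 vector
    return 4.0 * a**3 - 4.0 * a

def hess_f(a):
    # 1x1 Hessian matrix of the same function
    return np.array([[12.0 * a[0]**2 - 4.0]])

def hybrid_sketch(grad, hess, a0, n, gamma=0.2, max_norm=2.0):
    # Try the Hessian step; fall back to steepest descent when the step
    # points backwards (non-positive inner product with -grad) or is too long
    a = a0
    for _ in range(n):
        g = grad(a)
        step = -np.linalg.solve(hess(a), g)
        if step @ -g <= 0 or np.linalg.norm(step) > max_norm:
            step = -gamma * g
        a = a + step
    return a

a_final = hybrid_sketch(grad_f, hess_f, np.array([0.3]), 50)
print(a_final)
```

In this run the first two iterations use the steepest-descent fallback, after which the Hessian step is accepted and the iterate converges to the minimum at $x = 1$.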