<a href="https://colab.research.google.com/github/Wiickz/MAT421/blob/main/HW6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Section 3.2: Continuity and Differentiation

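The optimization algorithm discussed in this module (steepest descent) relies heavily on the gradient, which extends differentiation to functions of several variables. Steepest descent therefore requires the function to be differentiable, and a differentiable function must also be continuous, meaning its graph has no jumps or breaks in value. The converse does not hold: a continuous function can still fail to be differentiable at a sharp corner, where its slope changes abruptly. The plot below compares the three cases:

```python
import numpy as np
import matplotlib.pyplot as plt

continuous = lambda x: x**2                     # continuous and differentiable everywhere
kinked = lambda x: np.abs(2*x - 5)              # continuous, but not differentiable at x = 2.5 (sharp corner)
jump = lambda x: np.where(x < 2.5, -5.0, 5.0)   # discontinuous: the value jumps at x = 2.5

x = np.linspace(-6, 10, 1000)
left, right = x < 2.5, x >= 2.5

plt.plot(x, continuous(x), label='Continuous')
plt.plot(x, kinked(x), label='Continuous, not differentiable at x = 2.5')
# Plot each side of the jump separately so matplotlib does not draw a connecting vertical segment
plt.plot(x[left], jump(x[left]), color='C2', label='Discontinuous at x = 2.5')
plt.plot(x[right], jump(x[right]), color='C2')
plt.legend()
plt.show()
```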

The notes give a more precise definition: a function $f$ is continuous at a point $a$ if the limit of $f(x)$ as $x$ approaches $a$ exists (the limits from the left and from the right agree) and equals $f(a)$. A function is continuous if this holds at every point of its domain.
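The two-sided limit in this definition can be probed numerically by evaluating a function just to the left and just to the right of a point. This is a heuristic check under arbitrary choices of offsets, not a proof; the helper `one_sided_values` and the step function `jump` are hypothetical examples introduced only for illustration.

```python
import numpy as np

def one_sided_values(f, a, offsets=(1e-2, 1e-4, 1e-6)):
    '''Hypothetical helper: values of f just left and just right of a, for shrinking offsets.'''
    left = [f(a - d) for d in offsets]
    right = [f(a + d) for d in offsets]
    return left, right

kinked = lambda x: abs(2*x - 5)             # continuous at 2.5 (only a corner there)
jump = lambda x: -5.0 if x < 2.5 else 5.0   # hypothetical step function that jumps at 2.5

for name, f in [('kinked', kinked), ('jump', jump)]:
    left, right = one_sided_values(f, 2.5)
    print(f'{name}: left -> {left[-1]:.6f}, right -> {right[-1]:.6f}, f(2.5) = {f(2.5):.6f}')
```

For `kinked`, both one-sided values shrink toward f(2.5) = 0; for `jump`, they stay at -5 and 5, so the two-sided limit does not exist at 2.5.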

### Derivatives

There is a reason continuity is defined this way: the derivative is also defined by a limit, $f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$. Evaluating this difference quotient with a shrinking offset $h$ at every point approximates the derivative function:

```python
import numpy as np
import matplotlib.pyplot as plt

base = lambda x: x**2 - 4*x + 4

def derivative(f, x, h):
    '''Forward-difference approximation of f'(x) with step size h'''
    return (f(x+h) - f(x)) / h

x = np.linspace(-6, 10, 1000)
h = 100

plt.plot(x, base(x), label='Base')

# Step sizes h/10, h/100, h/1000 = 10, 1, 0.1; for this quadratic the
# forward difference equals 2x - 4 + h, so each curve is offset by exactly h
for i in range(1, 4):
    plt.plot(x, derivative(base, x, h/(10**i)), label=f'Derivative Approximation {i} (h = {h/(10**i):g})')

# Approximation 3 (h = 0.1) is already very close, so the true derivative
# overlaps it; comment this line out to see the difference
true_df = lambda x: 2*x - 4
plt.plot(x, true_df(x), label='True Derivative')

plt.legend()
plt.show()
```
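The remaining gap in the plot comes from the forward difference itself: for this quadratic, (f(x+h) - f(x))/h equals 2x - 4 + h exactly, so its error is h. A common alternative, swapped in here only for comparison and not used elsewhere in this notebook, is the central difference (f(x+h) - f(x-h))/(2h), whose error shrinks like h² in general and vanishes (up to rounding) for quadratics. A minimal sketch, with `forward_diff` and `central_diff` as illustrative names:

```python
import numpy as np

base = lambda x: x**2 - 4*x + 4
true_df = lambda x: 2*x - 4

def forward_diff(f, x, h):
    return (f(x + h) - f(x)) / h            # error O(h)

def central_diff(f, x, h):
    return (f(x + h) - f(x - h)) / (2*h)    # error O(h^2)

x = np.linspace(-6, 10, 1000)
errors = {}
for h in (10.0, 1.0, 0.1):
    fwd = np.max(np.abs(forward_diff(base, x, h) - true_df(x)))
    ctr = np.max(np.abs(central_diff(base, x, h) - true_df(x)))
    errors[h] = (fwd, ctr)
    print(f'h = {h:<4}: forward error {fwd:.2e}, central error {ctr:.2e}')
```

The forward error tracks h one-for-one, while the central error stays at rounding level for every step size.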

Derivatives also extend to functions of several variables, like those often found in linear algebra settings. For a function of two variables, whose graph is a surface in three dimensions, the most useful tool is the gradient: at each point it is a two-dimensional vector of partial derivatives that points in the direction in which the function increases fastest, and its length is that rate of increase. Drawing these vectors as a map viewed from above the surface shows how the function slopes at various points:

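```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d  # registers the '3d' projection (only required on older Matplotlib versions)

# Surface z = -x^3 + y^2 on a 30 x 30 grid
x = np.outer(np.linspace(-3, 3, 30), np.ones(30))
y = x.copy().T
z = -x**3 + y**2

fig = plt.figure()

ax = plt.axes(projection='3d')
ax.plot_surface(x, y, z, cmap='viridis')
plt.show()

# Analytic gradient of f(x, y) = -x^3 + y^2
gradient = lambda x, y: (-3*x**2, 2*y)

x = np.linspace(-3, 3, 30)
y = np.linspace(-3, 3, 30)
X, Y = np.meshgrid(x, y)
U, V = gradient(X, Y)

# Top-down view: each arrow points in the direction of steepest increase of f
plt.quiver(X, Y, U, V)
plt.show()
```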
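The claim that the gradient points in the direction of greatest increase can be checked numerically: at a sample point, estimate the rate of change of f along many unit directions and compare the best one with the normalized gradient. The sample point (1, 1), the step `h`, and the 0.1° angular resolution below are arbitrary choices for illustration.

```python
import numpy as np

f = lambda x, y: -x**3 + y**2
grad = lambda x, y: np.array([-3*x**2, 2*y])

p = np.array([1.0, 1.0])   # sample point (arbitrary choice)
h = 1e-6                   # small step for the finite-difference rate

# 3600 unit vectors, one every 0.1 degrees
angles = np.linspace(0, 2*np.pi, 3600, endpoint=False)
dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# Approximate directional derivative D_u f(p) = (f(p + h u) - f(p)) / h for each unit vector u
rates = np.array([(f(*(p + h*u)) - f(*p)) / h for u in dirs])

best = dirs[np.argmax(rates)]
g = grad(*p)
print('steepest direction found:', best)
print('normalized gradient:     ', g / np.linalg.norm(g))
print('largest rate:', rates.max(), ' |grad f(p)|:', np.linalg.norm(g))
```

Both printed directions should agree to within the angular resolution, and the largest rate should be close to the gradient's length. This matches the identity D_u f = ∇f · u, which for unit vectors u is largest when u points along ∇f.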

## Section 3.2.3: Taylor's Theorem

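The Taylor series of a function represents it as a power series built from its derivatives at a single point $a$, the center. Truncating the series after the degree-$n$ term gives the Taylor polynomial $P_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x-a)^k$, which approximates the function near $a$: each derivative evaluated at $a$ is divided by the corresponding factorial, multiplied by the matching power of $(x-a)$, and the terms are summed. The code below estimates the derivatives numerically: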

```python
import numpy as np
import math
import matplotlib.pyplot as plt

f = lambda x: np.sin(x)
h = 0.01     # good enough for a decent approximation of the derivative
center = 0   # center (a) of the Taylor approximation

def derivative(f, x, h, n):
    '''n-th derivative of f at x via repeated forward differences.
    Makes 2**n calls to f, and rounding error grows quickly with n.'''
    if n == 0:
        return f(x)
    return (derivative(f, x+h, h, n-1) - derivative(f, x, h, n-1)) / h

def taylor(f, x, a, n, h=h):
    '''Degree-n Taylor polynomial of f about a, evaluated at x'''
    return sum(derivative(f, a, h, i) * (x-a)**i / math.factorial(i) for i in range(n+1))

x = np.linspace(-2*np.pi + center, 2*np.pi + center, 1000)

fig = plt.figure()
plt.ylim(-3, 3)
plt.xlim(-2*np.pi + center, 2*np.pi + center)

plt.plot(x, f(x), label='sin(x)')
for i in range(1, 9, 2):  # odd degrees 1, 3, 5, 7
    plt.plot(x, taylor(f, x, center, i), label=f'Taylor approximation (degree {i})')

plt.legend()
plt.show()
```
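Taylor's theorem also bounds the approximation error. With the Lagrange form of the remainder, the error of the degree-n polynomial is at most max|f^(n+1)| · |x - a|^(n+1) / (n+1)!, and every derivative of sin is bounded by 1. The sketch below swaps the finite-difference derivatives above for sin's exact derivatives (which cycle through sin, cos, -sin, -cos), so the comparison is not affected by numerical-differentiation error; `sin_taylor` and `SIN_DERIVS` are hypothetical helpers, not part of the original code.

```python
import math
import numpy as np

# Derivatives of sin cycle through sin, cos, -sin, -cos
SIN_DERIVS = [np.sin, np.cos, lambda t: -np.sin(t), lambda t: -np.cos(t)]

def sin_taylor(x, n, a=0.0):
    '''Degree-n Taylor polynomial of sin about a, using exact derivatives'''
    return sum(SIN_DERIVS[k % 4](a) * (x - a)**k / math.factorial(k) for k in range(n + 1))

x = np.linspace(-2*np.pi, 2*np.pi, 1001)
results = {}
for n in (1, 3, 5, 7):
    err = np.max(np.abs(np.sin(x) - sin_taylor(x, n)))
    # Lagrange remainder bound with a = 0: |R_n(x)| <= |x|^(n+1) / (n+1)!, since |sin^(k)| <= 1
    bound = np.max(np.abs(x))**(n + 1) / math.factorial(n + 1)
    results[n] = (err, bound)
    print(f'degree {n}: max error {err:.3f}, remainder bound {bound:.3f}')
```

The bound holds at every degree but is loose. Far from the center the polynomial terms grow quickly, so the maximum error over all of [-2π, 2π] stays large at these degrees even though the error near a = 0 drops rapidly.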

As seen above, as the degree of the Taylor approximation increases, the interval around the center on which the polynomial closely matches the true function grows. This particular series is centered at a = 0; a Taylor series centered at 0 is also called a Maclaurin series. Taylor approximations about other points can be plotted by changing `center` in the code above to a different value.

## Section 3.3.3: Optimization by Gradient Descent

There are various ways to find a local minimum of a differentiable function. One is gradient descent, which repeatedly steps in the direction of the negative gradient (the direction of steepest decrease) until it reaches a stationary point, where the gradient is zero; in practice this is usually a local minimum. The variant below chooses each step length with a one-dimensional line search along that direction and is known as the method of steepest descent. It can be implemented with NumPy and SciPy:

```python
### Excerpt from own work in MAT 423: Numerical Analysis I
### Method of steepest descent to find a local minimum of a function

import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent_2(f, grad, x0, N=100, tol=1e-5):
    '''Steepest descent method for a function of two variables'''

    x = np.asarray(x0, dtype=float)

    for k in range(N):
        u = grad(x[0], x[1])          # gradient at the current point
        if np.linalg.norm(u) < tol:   # (near-)stationary point reached
            break
        # Line search: choose the step t in [0, 1] that minimizes f along -u
        g = lambda t: f(x[0] - t*u[0], x[1] - t*u[1])
        t = minimize_scalar(g, bounds=(0, 1), method='bounded').x
        x = x - t*u                   # move against the gradient

    return x, k

if __name__ == '__main__':
    f = lambda x1, x2: np.sin(x1) + np.cos(x2)
    grad = lambda x1, x2: np.array([np.cos(x1), -np.sin(x2)])

    x0 = np.array([-1, 1]) # initial guess

    x, n = steepest_descent_2(f, grad, x0)

    print('Steepest Descent (k = %d)' % n)
    print('x1         x2')
    print('%.8f %.8f' % (x[0], x[1]))
```

As a bonus, the code here can also be adjusted to find a local maximum: by negating the output of `f` within the declaration of `g` and adding `t*u` instead of subtracting (`g = lambda t: -f(x[0] + t*u[0], x[1] + t*u[1])`, `x = x + t*u`), the algorithm will instead perform steepest ascent optimization.
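The method above picks each step length with a line search. A simpler variant, which is often what "gradient descent" refers to in practice, uses a fixed step size (learning rate) instead. The NumPy-only sketch below assumes a step size of 0.1 and uses the hypothetical name `fixed_step_descent`; its `ascent` flag moves along +∇f instead, mirroring the steepest-ascent adjustment described above.

```python
import numpy as np

def fixed_step_descent(grad, x0, step=0.1, N=1000, tol=1e-8, ascent=False):
    '''Hypothetical fixed-step gradient descent (or ascent) for a function of two variables'''
    x = np.asarray(x0, dtype=float)
    sign = 1.0 if ascent else -1.0
    for k in range(N):
        u = grad(x[0], x[1])
        if np.linalg.norm(u) < tol:   # stationary point reached
            break
        x = x + sign * step * u       # no line search: constant step length
    return x, k

f = lambda x1, x2: np.sin(x1) + np.cos(x2)
grad = lambda x1, x2: np.array([np.cos(x1), -np.sin(x2)])

x_min, k_min = fixed_step_descent(grad, [-1, 1])
x_max, k_max = fixed_step_descent(grad, [-1, 1], ascent=True)
print('local min near', x_min, 'after', k_min, 'steps, f =', f(*x_min))
print('local max near', x_max, 'after', k_max, 'steps, f =', f(*x_max))
```

Unlike the line-search version, the step size here must be tuned by hand: too large and the iterates can overshoot or diverge, too small and convergence is slow.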