### Visualizing the Gradient
https://rsokl.github.io/CogWeb/Math_Materials/Multivariable_Calculus.html#Visualizing-the-Gradient

The following code generates a surface plot of the function $f(x,y) = 2x^2 + xy$ and a quiver plot of its gradient, $\vec{\nabla} f(x, y) = [4x + y, \; x]$. Run the code in a Jupyter Notebook to see the plot. Change the line `Z = 2 * X ** 2 + X * Y` to plot some of the other functions we have discussed; note that you should use MyGrad functions (e.g. `mg.exp` and `mg.cos`) if needed.

Imagine that you are standing at some point on the surface; notice that the derivative of f, or the slope of the graph where you’re standing, is different depending on the direction you are facing. For example, at the point $(1, -4)$, if you are facing the +x direction the surface is flat ($\partial f / \partial x = 4x + y = 0$), whereas facing the +y direction the slope is $\partial f / \partial y = x = 1$. The gradient of f, or ∇⃗ f, points in the direction of steepest ascent at the point where it is evaluated.
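
To make the direction dependence concrete, here is a minimal sketch that evaluates the slope of $f(x,y) = 2x^2 + xy$ along a chosen direction as the dot product of the gradient with a unit vector. The helper names `grad_f` and `slope_along` and the sample point $(1, 2)$ are illustrative choices, not part of the original material.

```python
import math

def grad_f(x, y):
    # analytic gradient of f(x, y) = 2x^2 + xy
    return (4 * x + y, x)

def slope_along(x, y, direction):
    # directional derivative: the gradient dotted with the
    # unit vector pointing along `direction`
    dx, dy = direction
    norm = math.hypot(dx, dy)
    gx, gy = grad_f(x, y)
    return (gx * dx + gy * dy) / norm

point = (1.0, 2.0)  # arbitrary sample point (an assumption for illustration)
slope_x = slope_along(*point, (1, 0))            # facing +x
slope_y = slope_along(*point, (0, 1))            # facing +y
slope_max = slope_along(*point, grad_f(*point))  # facing along the gradient
```

At this point the slope is 6 facing +x and 1 facing +y, while facing along the gradient gives $\sqrt{37} \approx 6.08$, which is larger than either.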



In [None]:
import numpy as np
import mygrad as mg
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import axes3d

%matplotlib inline

fig = plt.figure(figsize=(14, 10))
ax1 = fig.add_subplot(111, projection="3d")

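_range = (-5, 5, 0.05)
# build full 2D grids so that every grid point gets its own gradient entry;
# broadcasting a 1D X against a column-shaped Y would instead sum the
# gradient contributions along each axis
_x = np.arange(*_range)
X, Y = (mg.tensor(a) for a in np.meshgrid(_x, _x))

###
Z = 2 * X ** 2 + X * Y
###

# compute ∂Z/∂X and ∂Z/∂Y at the
# sampled points in X and Y
mg.sum(Z).backward()

_cmap = plt.get_cmap("GnBu")
plt.xlabel("x")
plt.ylabel("y")
ax1.set_title("Surface Plot of $f(x,y)=2x^2+xy$")

# get underlying numpy arrays for plotting
U = X.grad[::20, ::20]
V = Y.grad[::20, ::20]
X = X.data
Y = Y.data
Z = Z.data
# height at which the arrows are drawn: just below the surface's minimum
zeros = np.full_like(Z[::20, ::20], Z.min() - 1e-2)

# reduce the sampling for the quiver plot so that arrows are distinguishable;
# the arrows lie in the x-y plane, so their z-component is 0
ax1.quiver(
    X[::20, ::20], Y[::20, ::20], zeros,
    U, V, np.zeros_like(U),
    length=0.4, normalize=True,
)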

surf = ax1.plot_surface(X, Y, Z, cmap=_cmap, alpha=0.75)
fig.colorbar(surf, ax=ax1, shrink=0.5, aspect=5)

### Minimizing a Function Using Gradient Descent
https://rsokl.github.io/CogWeb/Math_Materials/Multivariable_Calculus.html#Minimizing-a-Function-Using-Gradient-Descent

Write a program that performs gradient descent on the function $f(x) = \frac{1}{2}x^2 + e^{-x}$. Your program should take a starting coordinate `x` and a number of iterations `n`. Try running your algorithm for a few hundred iterations to see if you end up near the minimum around `x = 0.567`. Experiment with the step size δ to see if there is a value that is small enough to avoid overshooting the minimum but large enough to efficiently narrow in on it (avoiding an excessive number of iterations).
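
For reference, the derivative and the update step can be written out explicitly. This is the standard gradient-descent step, with $\delta$ denoting the step size:

$$
\frac{df}{dx} = x - e^{-x}, \qquad x_{\text{new}} = x_{\text{old}} - \delta \left. \frac{df}{dx} \right|_{x = x_{\text{old}}}
$$

The minimum sits where $\frac{df}{dx} = 0$, that is, where $x = e^{-x}$, which gives $x \approx 0.567$.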

In [2]:
import numpy as np

# perform gradient descent on our function given
# a starting value of x_start for n iterations
def grad_descent(x_start, n):
    # defining the gradient of our function
    def grad(x):
        return x - np.exp(-x)  # df/dx @ the values in `x`

    delta = 0.1  # step size; experiment with this value
    x = x_start
    for _ in range(n):
        # step against the gradient
        x = x - delta * grad(x)
    # for n = 0 this simply returns the starting value
    return x

In [None]:
grad_descent(-1, 100)

In [None]:
grad_descent(-10, 400)

In [None]:
grad_descent(20, 400)
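
One way to experiment with δ is to count how many updates are needed before the derivative is close to zero. The sketch below does this for a few step sizes; the helper `steps_to_converge`, its tolerance, and the list of step sizes are assumptions made for illustration.

```python
import math

def grad(x):
    # derivative of f(x) = x**2 / 2 + exp(-x)
    return x - math.exp(-x)

def steps_to_converge(x_start, delta, tol=1e-8, max_iter=10_000):
    # hypothetical helper: count updates until |df/dx| < tol
    x = float(x_start)
    for i in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            return i, x
        x -= delta * g
    return max_iter, x

for delta in (0.01, 0.1, 0.5, 1.0):
    n_steps, x_final = steps_to_converge(-1.0, delta)
    print(f"delta={delta}: {n_steps} steps, x={x_final:.6f}")
```

Starting from `x = -1`, all four step sizes reach the same minimum, but the smallest one needs far more iterations than the others.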

### Autodifferentiation with Multivariable Functions
https://rsokl.github.io/CogWeb/Math_Materials/Multivariable_Calculus.html#Autodifferentiation-with-Multivariable-Functions

Autodifferentiation libraries, like MyGrad, can be used to compute the partial derivatives of multivariable functions.



In [4]:
# using mygrad to compute the derivatives of a multivariable function

import mygrad as mg

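# use floats: MyGrad treats integer-valued tensors as constants,
# which do not store gradients
x = mg.tensor(3.0)
y = mg.tensor(4.0)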

f = 2 * (x ** 2) + x * y
f.backward()


In [5]:
# stores ∂f/∂x @ x=3, y=4
x.grad

#array(16.)


In [6]:
# stores ∂f/∂y @ x=3, y=4
y.grad

#array(3.)
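
These values can be sanity-checked without an autodifferentiation library by approximating each partial derivative with a central finite difference. This is a minimal sketch; `central_difference` and the step `h` are hypothetical choices for illustration.

```python
def f(x, y):
    return 2 * x ** 2 + x * y

def central_difference(func, x, y, h=1e-6):
    # hypothetical helper: approximates (∂f/∂x, ∂f/∂y) at (x, y)
    df_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    df_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return df_dx, df_dy

df_dx, df_dy = central_difference(f, 3.0, 4.0)
print(df_dx, df_dy)  # should be close to 16 and 3
```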


### Programming Multivariable Gradient Descent

Write a program that performs gradient descent on the function $f(x,y) = 2x^2 + 4y^2 + e^{-3x} + 3e^{-2y}$. Your program should take starting coordinates `x` and `y` and a number of iterations `n`. Try running your algorithm for a few hundred iterations to see if you end up near the minimum around `x=0.3026, y=0.3629`. Experiment with δ to see if there is a value that is small enough to avoid overshooting the minimum but large enough to efficiently narrow in on it (avoiding an excessive number of iterations).

Warning: Make sure that when you are updating the value of `Tensor`s, you perform the update to `Tensor.data` and not to the `Tensor` itself, to avoid back-propagating through the operation.



In [None]:
import numpy as np
import mygrad as mg

# perform gradient descent on our function
# given starting values of x_start and y_start for n iterations
def multi_grad_descent(x_start, y_start, n):

    # convert x_start and y_start to Tensors so that
    # we can compute their partial derivatives
    x = mg.tensor(x_start, dtype=np.float64)
    y = mg.tensor(y_start, dtype=np.float64)

    # defining our function; we use MyGrad operations
    # instead of NumPy so that we can compute derivatives
    # through these functions
    def f(x, y):
        return 2 * (x ** 2) + 4 * (y ** 2) + mg.exp(-3 * x) + 3 * mg.exp(-2 * y)


    # step size; experiment with this value
    delta = 0.1

    for _ in range(n):
        # calculating the gradient and updating the parameters
        z = f(x, y)
        z.backward()
        x.data -= delta * x.grad # x.grad stores ∂f/∂x @ current x and y value
        y.data -= delta * y.grad # y.grad stores ∂f/∂y @ current x and y value

    return x.item(), y.item()

In [None]:
multi_grad_descent(14, -53, 1)

In [None]:
multi_grad_descent(14, -53, 10)

In [None]:
multi_grad_descent(14, -53, 100)
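
As a cross-check that does not rely on autodifferentiation, the same descent can be run with hand-derived partial derivatives, $\partial f/\partial x = 4x - 3e^{-3x}$ and $\partial f/\partial y = 8y - 6e^{-2y}$. This is a sketch only; `grad_f` and `analytic_grad_descent` are hypothetical names, and the step size mirrors the `delta = 0.1` used above.

```python
import math

def grad_f(x, y):
    # hand-derived partial derivatives of
    # f(x, y) = 2x^2 + 4y^2 + exp(-3x) + 3 exp(-2y)
    df_dx = 4 * x - 3 * math.exp(-3 * x)
    df_dy = 8 * y - 6 * math.exp(-2 * y)
    return df_dx, df_dy

def analytic_grad_descent(x_start, y_start, n, delta=0.1):
    # hypothetical helper mirroring multi_grad_descent without autodiff
    x, y = float(x_start), float(y_start)
    for _ in range(n):
        df_dx, df_dy = grad_f(x, y)
        x -= delta * df_dx
        y -= delta * df_dy
    return x, y

x_min, y_min = analytic_grad_descent(14, -53, 100)
print(x_min, y_min)
```

If the two approaches agree, that is good evidence that both the autodiff-based program and the hand-derived gradient are correct.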