In [0]:
import numpy as np
import matplotlib.pyplot as plt

# Derivatives, Partial Derivatives, and Gradients

We are making our way towards automatic differentiation or "autodiff". But before we get there, let's explore some related but distinct concepts.

## Finite differences

Derivatives can be approximated numerically using finite differences. The one-sided (forward) version is just the difference quotient evaluated at a small but finite step $h$:

$$
\frac{\partial f(\mathbf{x})}{\partial x_i} = \frac{\partial f(x_1, x_2, \ldots, x_N)}{\partial x_i} \approx \frac{f(x_1, \ldots, x_i + h, \ldots, x_N) - f(x_1, \ldots, x_i, \ldots, x_N)}{h}
$$

where we have perturbed the input of interest by a tiny amount, $h$.

There is also a two-sided (central) version, which is more accurate for the same $h$:

$$
\frac{\partial f(x_1, x_2, \ldots, x_N)}{\partial x_i} \approx \frac{f(x_1, \ldots, x_i + h, \ldots, x_N) - f(x_1, \ldots, x_i - h, \ldots, x_N)}{2h}
$$
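
One way to compare the two formulas is through a Taylor expansion. The following is a sketch for a function of a single variable, assuming $f$ is smooth enough for the expansion to hold:

$$
f(x \pm h) = f(x) \pm h f'(x) + \frac{h^2}{2} f''(x) \pm \frac{h^3}{6} f'''(x) + O(h^4)
$$

Substituting into each difference formula gives

$$
\frac{f(x+h) - f(x)}{h} = f'(x) + \frac{h}{2} f''(x) + O(h^2), \qquad \frac{f(x+h) - f(x-h)}{2h} = f'(x) + \frac{h^2}{6} f'''(x) + O(h^4)
$$

so the leading error term shrinks like $h$ for the one-sided formula and like $h^2$ for the two-sided one.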

Let's apply the method of finite differences to the softplus function, which shows up in neural networks. It has a convenient analytical derivative.

In [0]:
# softplus function
def f(x):
  return np.log(1 + np.exp(x))

# derivative of softplus is the sigmoid
def fprime(x):
  return 1 / (1 + np.exp(-x))

### Exercise

Show (on paper) that the derivative of the softplus is the sigmoid.

In [0]:
x = np.linspace(-5, 5, 20)
y = f(x) 
plt.plot(x, y)
plt.title('Softplus')
plt.xlabel('x')
plt.ylabel('y')

Let's use an off-the-shelf implementation of finite differences. We can adjust the method ('central', 'forward', 'backward') and the amount of perturbation, $h$.

In [0]:
#Source: https://www.math.ubc.ca/~pwalls/math-python/differentiation/
def derivative(f, a, method='central', h=0.01):
    '''Compute the difference formula for f'(a) with step size h.

    Parameters
    ----------
    f : function
        Vectorized function of one variable
    a : number
        Compute derivative at x = a
    method : string
        Difference formula: 'forward', 'backward' or 'central'
    h : number
        Step size in difference formula

    Returns
    -------
    float
        Difference formula:
            central: (f(a+h) - f(a-h))/(2h)
            forward: (f(a+h) - f(a))/h
            backward: (f(a) - f(a-h))/h
    '''
    if method == 'central':
        return (f(a + h) - f(a - h)) / (2 * h)
    elif method == 'forward':
        return (f(a + h) - f(a)) / h
    elif method == 'backward':
        return (f(a) - f(a - h)) / h
    else:
        raise ValueError("Method must be 'central', 'forward' or 'backward'.")

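We see that finite differences are pretty accurate here, even with the default step size. Note that when checking hand-derived analytical gradients in practice, we typically use a much smaller step size, e.g. $1 \times 10^{-8}$ or so. The step cannot shrink indefinitely, though: once $h$ is too small, floating-point round-off dominates the estimate.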

In [0]:
print(fprime(0))
print(derivative(f, 0, method='forward'))
print(derivative(f, 0, method='central'))

Let's plot the true derivative, and the finite difference estimates at several points along the domain of $x$.

In [0]:
plt.plot(x, y, label='softplus')
plt.plot(x, fprime(x), label='true derivative')
plt.plot(x, derivative(f, x), 'g.', label='central difference')
plt.legend()

### Exercise

Write a replacement function for `derivative` that works on functions of multiple variables. Plot the central difference estimate for a function of two variables, for each partial derivative.

## Symbolic Differentiation

Symbolic differentiation libraries take a mathematical expression for a function and return a mathematical expression for its derivative. This is convenient for simple expressions but can become unwieldy for more complex ones.

The SymPy library provides symbolic differentiation functionality.

In [0]:
import sympy
from sympy import init_printing

# Get pretty printing to work correctly in Colab: https://stackoverflow.com/a/52959734
def custom_latex_printer(exp,**options):
    from google.colab.output._publish import javascript
    url = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=default"
    javascript(url=url)
    return sympy.printing.latex(exp,**options)
init_printing(use_latex="mathjax",latex_printer=custom_latex_printer)

In [0]:
x = sympy.symbols('x')
f = sympy.log(1 + sympy.exp(x))
sympy.diff(f)

Notice that SymPy didn't simplify the result into the more common expression for the sigmoid $1/\left(1 + \exp(-x)\right)$. That's one of the downsides of this technique.
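
Even when the printed form looks different, we can check that the two expressions agree. The sketch below is one possible way to do so: it asks `sympy.simplify` to reduce the difference between the symbolic derivative and the sigmoid, then uses `sympy.lambdify` to turn the symbolic derivative into a NumPy function and compares it numerically. The variable names (`softplus_expr`, `dsoftplus`, `max_gap`) and the grid of test points are assumptions made for this example.

```python
import numpy as np
import sympy

x = sympy.symbols('x')
softplus_expr = sympy.log(1 + sympy.exp(x))
dsoftplus_expr = sympy.diff(softplus_expr, x)

# Symbolic check: the difference from the sigmoid should reduce to zero
# if SymPy manages to simplify it.
sigmoid_expr = 1 / (1 + sympy.exp(-x))
print(sympy.simplify(dsoftplus_expr - sigmoid_expr))

# Numeric check: compile the symbolic derivative into a NumPy function
# and compare it with the sigmoid on a small grid.
dsoftplus = sympy.lambdify(x, dsoftplus_expr, 'numpy')
xs = np.linspace(-5, 5, 11)
max_gap = np.max(np.abs(dsoftplus(xs) - 1 / (1 + np.exp(-xs))))
print(max_gap)
```

The numeric comparison does not depend on how well the simplifier performs, so it is a useful fallback when a symbolic check is inconclusive.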

SymPy can take derivatives of functions of multiple variables:

In [0]:
x1, x2 = sympy.symbols('x1, x2')
f = x1**3 + 2 * x2**2
display(sympy.diff(f, x1))
display(sympy.diff(f, x2))

SymPy can also take higher-order and mixed partial derivatives. Just pass the variables to differentiate with respect to, in order:

In [0]:
display(sympy.diff(f, x1, x1))  # d^2f/dx1^2
display(sympy.diff(f, x1, x2))  # d^2f/(dx1 dx2)
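
Collecting these partial derivatives gives the gradient (all first-order partials) and the Hessian (all second-order partials). As a small sketch, assuming the same function as above, the gradient can be assembled with a list comprehension and the Hessian with SymPy's `hessian` helper. The names `f_expr`, `gradient`, and `hessian` are chosen for this example only.

```python
import sympy

x1, x2 = sympy.symbols('x1, x2')
f_expr = x1**3 + 2 * x2**2

# Gradient: one first-order partial derivative per variable.
gradient = sympy.Matrix([sympy.diff(f_expr, v) for v in (x1, x2)])

# Hessian: matrix of all second-order partial derivatives.
hessian = sympy.hessian(f_expr, (x1, x2))

print(gradient)
print(hessian)
```

The off-diagonal Hessian entries are the mixed partials, which are zero here because $f$ has no term involving both $x_1$ and $x_2$.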

### Exercise

Use SymPy to evaluate $\frac{df}{dt}$ for $f(x_1, x_2) = x_1^2 + 2x_2$, where $x_1 = \sin t$ and $x_2 = \cos t$. Check your answer with Example 5.8 in [Mathematics for Machine Learning](https://mml-book.github.io/).