In [1]:
import numpy as np
import scipy.linalg as la
import matplotlib.pyplot as plt
import scipy.optimize as opt

> The Jacobian

The Jacobian of a vector-valued function $f:\mathbb{R}^{m}\to\mathbb{R}^{n}$ is the $n\times m$ matrix $J_f$ with entries

$$\begin{equation} J_f[i,j] = \frac{\partial f_i}{\partial x_j}. \end{equation}$$

Previously, we passed only the function to the solver. If you can also provide the Jacobian, the solver typically needs fewer function evaluations, because it has better information about how to decrease the function value. For a scalar objective ($n=1$), which is what `opt.minimize` works with, the Jacobian is simply the gradient. If you don't provide the Jacobian, many solvers approximate it with finite differences, which is typically less accurate (and slower) than an explicit formula.


> The Hessian

The Hessian of a scalar-valued multivariate function $f:\mathbb{R}^{m}\to\mathbb{R}$ is the $m\times m$ matrix of second derivatives: \begin{equation} H[i,j] = \frac{\partial^2 f}{\partial x_i \partial x_j} \end{equation}

The Hessian provides information about the curvature of the function, which can be used to accelerate convergence of the optimization algorithm. If you don’t provide the Hessian, many solvers will approximate it numerically, which typically does not work as well as an explicit Hessian.

> Example 1: Consider $f(x)=\cos(x_1)+\sin(x_2)$
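
As a quick reference, the derivatives of this function can be worked out by hand from the definitions above. The following is a short sketch of that calculation:

$$\nabla f(x) = \begin{bmatrix} -\sin(x_1) \\ \cos(x_2) \end{bmatrix}, \qquad H_f(x) = \begin{bmatrix} -\cos(x_1) & 0 \\ 0 & -\sin(x_2) \end{bmatrix}.$$

Setting the gradient to zero gives $\sin(x_1)=0$ and $\cos(x_2)=0$. The Hessian is positive definite where $\cos(x_1)=-1$ and $\sin(x_2)=-1$, so the local minima sit at $x_1=(2k+1)\pi$, $x_2=-\pi/2+2\pi l$ for integers $k, l$, each with $f(x)=-2$.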

In [2]:
# function f
def f(x):
    return np.cos(x[0]) + np.sin(x[1])

# Jacobian of function f
def Jf(x):
    return np.array([-np.sin(x[0]), np.cos(x[1])])

# Hessian of function f
def Hf(x):
    return np.array([[-np.cos(x[0]), 0], [0, -np.sin(x[1])]])
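
Before handing hand-derived derivatives to a solver, it can be worth checking them against finite differences. The sketch below repeats the definitions of `f`, `Jf` and `Hf` so that it runs on its own. It uses `scipy.optimize.check_grad` for the gradient and a small central-difference helper, `fd_hessian`, for the Hessian. The helper, the test point `x`, and the step size `h` are illustrative choices, not part of SciPy.

```python
import numpy as np
import scipy.optimize as opt

def f(x):
    return np.cos(x[0]) + np.sin(x[1])

def Jf(x):
    return np.array([-np.sin(x[0]), np.cos(x[1])])

def Hf(x):
    return np.array([[-np.cos(x[0]), 0], [0, -np.sin(x[1])]])

def fd_hessian(grad, x, h=1e-5):
    """Hypothetical helper: central differences of the gradient, column by column."""
    n = x.size
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        H[:, j] = (grad(x + e) - grad(x - e)) / (2 * h)
    return H

x = np.array([0.3, 0.7])  # arbitrary test point

# 2-norm of the difference between Jf(x) and a finite-difference gradient of f
grad_err = opt.check_grad(f, Jf, x)

# Finite-difference Hessian built from the analytic gradient
H_fd = fd_hessian(Jf, x)
hess_err = np.max(np.abs(H_fd - Hf(x)))

print("gradient error:", grad_err)
print("Hessian error: ", hess_err)
```

Both errors should be small compared with the size of the derivatives themselves. A large value usually points to a sign or indexing mistake in the hand-written formula.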

In [3]:
# Only the function f

x0 = np.random.rand(2)

%time sol1 = opt.minimize(f, x0)
sol1

CPU times: user 9.36 ms, sys: 0 ns, total: 9.36 ms
Wall time: 19.2 ms


  message: Optimization terminated successfully.
  success: True
   status: 0
      fun: -1.9999999999996474
        x: [ 1.571e+01 -7.854e+00]
      nit: 8
      jac: [-7.451e-08 -8.196e-07]
 hess_inv: [[ 9.993e-01  2.874e-04]
            [ 2.874e-04  1.000e+00]]
     nfev: 72
     njev: 24

In [4]:
# Using the function f and Jf

x0 = np.random.rand(2)

%time sol2 = opt.minimize(f, x0, jac=Jf)
sol2

CPU times: user 2.76 ms, sys: 0 ns, total: 2.76 ms
Wall time: 7.46 ms


  message: Optimization terminated successfully.
  success: True
   status: 0
      fun: -1.99999999999897
        x: [ 3.142e+00 -1.571e+00]
      nit: 10
      jac: [ 1.384e-06 -3.815e-07]
 hess_inv: [[ 1.021e+00 -5.837e-03]
            [-5.837e-03  1.002e+00]]
     nfev: 13
     njev: 13
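
The two runs above each draw a fresh random `x0`, so their evaluation counts are not strictly comparable. A minimal sketch of a like-for-like comparison, assuming an arbitrary fixed starting point, is shown below. The names `sol_fd` and `sol_an` are illustrative.

```python
import numpy as np
import scipy.optimize as opt

def f(x):
    return np.cos(x[0]) + np.sin(x[1])

def Jf(x):
    return np.array([-np.sin(x[0]), np.cos(x[1])])

x0 = np.array([0.5, 0.5])  # fixed start (an arbitrary choice), shared by both runs

sol_fd = opt.minimize(f, x0)          # gradient approximated by finite differences
sol_an = opt.minimize(f, x0, jac=Jf)  # analytic gradient supplied

print("finite differences:", sol_fd.nfev, "function evaluations")
print("analytic gradient: ", sol_an.nfev, "function evaluations")
```

Without `jac`, each gradient estimate costs extra calls to `f`, which is why `nfev` is several times `njev` in the first run above, while the two counts are equal once `Jf` is supplied.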

In [5]:
# Using the function f, Jf and Hf
x0 = np.random.rand(2)

%time sol3 = opt.minimize(f, x0, jac=Jf, hess=Hf)
sol3

CPU times: user 4.48 ms, sys: 0 ns, total: 4.48 ms
Wall time: 9.42 ms


  warn('Method %s does not use Hessian information (hess).' % method,


  message: Optimization terminated successfully.
  success: True
   status: 0
      fun: -1.9999999999999956
        x: [ 3.142e+00 -1.571e+00]
      nit: 9
      jac: [ 9.275e-08  4.045e-09]
 hess_inv: [[ 1.021e+00  9.685e-04]
            [ 9.685e-04  1.000e+00]]
     nfev: 19
     njev: 19

We see that the default method, BFGS, does not use the Hessian. Let’s choose one that does, such as Newton-CG.

In [6]:
%time sol4 = opt.minimize(f, x0, jac=Jf, hess=Hf, method='Newton-CG')
sol4

CPU times: user 2.23 ms, sys: 0 ns, total: 2.23 ms
Wall time: 3.36 ms


 message: Optimization terminated successfully.
 success: True
  status: 0
     fun: -2.0
       x: [ 3.142e+00 -1.571e+00]
     nit: 7
     jac: [ 1.925e-06 -1.889e-06]
    nfev: 11
    njev: 11
    nhev: 7
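
The reported solution `x ≈ [3.142, -1.571]` looks like $(\pi, -\pi/2)$. One way to confirm that this point is a local minimum is to check that the gradient vanishes there and that the Hessian has only positive eigenvalues. The sketch below does this at the exact point `x_star`, a name introduced here for illustration.

```python
import numpy as np

def Jf(x):
    return np.array([-np.sin(x[0]), np.cos(x[1])])

def Hf(x):
    return np.array([[-np.cos(x[0]), 0], [0, -np.sin(x[1])]])

x_star = np.array([np.pi, -np.pi / 2])  # candidate minimizer

grad_norm = np.linalg.norm(Jf(x_star))   # should be zero up to rounding
eigvals = np.linalg.eigvalsh(Hf(x_star)) # eigenvalues of the symmetric Hessian

print("gradient norm:", grad_norm)
print("Hessian eigenvalues:", eigvals)
print("local minimum:", grad_norm < 1e-12 and np.all(eigvals > 0))
```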