# How to calculate second derivatives

In this guide, we show you how to compute second derivatives with optimagic, while
introducing some core concepts.

In [None]:
import numpy as np
import optimagic as om
import pandas as pd

## Introduction

Instead of the sphere function, let's now look at an ellipse $$f(x) = x^\top W x,$$
with a weighting matrix $W$.

The second derivative of $f$ is given by $f''(x) = W + W^\top$. With numerical
derivatives, we have to specify the value of $x$ at which we want to compute the
derivative. Note that in this case the second derivative should be independent of the
value of $x$.
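
The formula $f''(x) = W + W^\top$ can be checked without optimagic. The following is a minimal sketch that assumes a hand-written central-difference scheme (`central_diff_hessian`) and a made-up, deliberately non-symmetric matrix `W`. It is not how optimagic computes derivatives; it only illustrates that the result does not depend on $x$.

```python
import numpy as np


def ellipse(x, W):
    """Quadratic form f(x) = x' W x."""
    return x @ W @ x


def central_diff_hessian(f, x, h=1e-3):
    """Approximate the Hessian of a scalar-valued f at x with central differences."""
    n = len(x)
    eye = np.eye(n)
    hess = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = h * eye[i], h * eye[j]
            hess[i, j] = (
                f(x + ei + ej) - f(x + ei - ej) - f(x - ei + ej) + f(x - ei - ej)
            ) / (4 * h**2)
    return hess


# Hypothetical weighting matrix, chosen to be non-symmetric on purpose.
W = np.array([[1.0, 2.0], [0.0, 3.0]])
analytic = W + W.T

# Evaluate at two different points; both should reproduce W + W'.
hess_at_origin = central_diff_hessian(lambda x: ellipse(x, W), np.zeros(2))
hess_elsewhere = central_diff_hessian(lambda x: ellipse(x, W), np.array([1.0, -2.0]))
```

Because $f$ is quadratic, the central-difference formula is exact up to rounding error, so both approximations agree with `analytic`.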

In [None]:
def fun_scalar(params):
    weight = 1
    return weight * params**2

Let's first consider two **scalar** points $x = 0$ and $x=1$. Since the second
derivative here is constant, we have $f''(0) = f''(1) = 2$.

To compute the derivative using optimagic, we simply pass the function `fun_scalar`
and `params` to the function `second_derivative`:

In [None]:
sd = om.second_derivative(func=fun_scalar, params=0)
sd.derivative

In [None]:
sd = om.second_derivative(func=fun_scalar, params=1)
sd.derivative

Notice that the output of `second_derivative` is an object containing the derivative
under the attribute `derivative`. We discuss the output in more detail below.

## Hessian and Batch-Hessian

The scalar case from above extends directly to the multivariate case. Let's consider two
cases: 

| Second derivative | Function |
|:--------|:------------------------------------|
| Hessian | $f_1: \mathbb{R}^N \to \mathbb{R}$  |
| Batch-Hessian | $f_2: \mathbb{R}^N \to \mathbb{R}^M$|


The second derivative of $f_1$ is usually referred to as the Hessian, while the second
derivative of $f_2$ is usually called a Batch-Hessian.

### Hessian

Let's again use the ellipse function, but this time with a vector input. The Hessian is
a 2-dimensional object of shape (N, N).

In [None]:
def fun_vector(params):
    weight = np.arange(len(params) ** 2).reshape(len(params), len(params))
    return params @ weight @ params

In [None]:
sd = om.second_derivative(fun_vector, params=np.arange(4))
sd.derivative.round(2)
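
To judge the numerical result, it helps to have the analytical Hessian at hand. The sketch below uses only NumPy and rebuilds the weighting matrix from `fun_vector` for `N = 4`. Since `weight[i, j] = 4 * i + j`, entry `(i, j)` of $W + W^\top$ equals `5 * (i + j)`.

```python
import numpy as np

n = 4  # length of the params vector used above

# Same weighting matrix as inside fun_vector.
weight = np.arange(n**2).reshape(n, n)

# Analytical Hessian of x' W x.
expected_hessian = weight + weight.T
```

The rounded numerical Hessian should be close to `expected_hessian`, up to the approximation error of the finite-difference scheme.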

### Batch-Hessian

As an example, let's now use the function
$$f(x) = (x^\top W x) \begin{pmatrix}0\\1\\2 \end{pmatrix},$$
with $f: \mathbb{R}^N \to \mathbb{R}^3$ and the same weighting matrix $W$ as in the
Hessian example. The Batch-Hessian is now a 3-dimensional object of shape (M, N, N),
where M is the output dimension.

In [None]:
def fun_multivariate(params):
    weight = np.arange(len(params) ** 2).reshape(len(params), len(params))
    return (params @ weight @ params) * np.arange(3)

In [None]:
sd = om.second_derivative(fun_multivariate, params=np.arange(4))
sd.derivative.round(2)
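
For `fun_multivariate` as coded above, output component `k` is `k` times the quadratic form, so slice `k` of the Batch-Hessian should be `k * (W + W.T)`. The following NumPy-only sketch builds this expected array via broadcasting; the names `scale` and `expected_batch_hessian` are illustrative and not part of optimagic.

```python
import numpy as np

n = 4  # length of the params vector used above

# Same weighting matrix and output scaling as inside fun_multivariate.
weight = np.arange(n**2).reshape(n, n)
scale = np.arange(3)

# Broadcasting (3, 1, 1) against (4, 4) stacks one Hessian per output component.
expected_batch_hessian = scale[:, None, None] * (weight + weight.T)
```

The first axis indexes the output component, matching the shape (M, N, N) described above. The slice for `k = 0` is all zeros because that output component is constant.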

## The output of `second_derivative`

The output of `second_derivative` has the following attributes:

- `derivative`: The computed numerical derivative.
- `func_value`: The function value at the params vector, if `return_func_value` is True.

In [None]:
sd = om.second_derivative(fun_scalar, params=0, return_func_value=True)

In [None]:
sd.derivative

In [None]:
assert sd.func_value == fun_scalar(0)

## The ``params`` argument

Above we used scalars and ``numpy.ndarray``s as the ``params`` argument. In optimagic, params can be arbitrary [pytrees](https://jax.readthedocs.io/en/latest/pytrees.html). Examples are (nested) dictionaries of numbers, arrays, and pandas objects. Let's look at a few cases.

### pandas

In [None]:
params = pd.DataFrame(
    [["time_pref", "delta", 0.9], ["time_pref", "beta", 0.6], ["price", "price", 2]],
    columns=["category", "name", "value"],
).set_index(["category", "name"])

params

In [None]:
def fun_pandas(params):
    weight = np.arange(len(params) ** 2).reshape(len(params), len(params))
    return params["value"] @ weight @ params["value"]

In [None]:
sd = om.second_derivative(fun_pandas, params)
sd.derivative

### nested dicts

In [None]:
params = {"a": 0, "b": 1, "c": pd.Series([2, 3, 4])}

params

In [None]:
def fun_dict(params):
    return params["a"] ** 2 + params["b"] ** 2 + (params["c"] ** 2).sum()

In [None]:
sd = om.second_derivative(
    func=fun_dict,
    params=params,
)

sd.derivative

### Description of the output

> Note. Understanding the output of the first and second derivative requires some
> pytree terminology. Please refer to the
> [JAX documentation of pytrees](https://jax.readthedocs.io/en/latest/pytrees.html).

The output tree is a product of the params tree with itself. This is equivalent to the
numpy case, where the Hessian is a matrix of shape `(len(params), len(params))`. If,
however, the params tree contains non-scalar entries such as a `numpy.ndarray`, a
`pandas.Series`, or a `pandas.DataFrame`, the output is not expanded but a block is
created instead. In the above example, the entry `params["c"]` is a `pandas.Series` of
length 3. Thus, the second derivative output contains the corresponding 3x3-block
of the Hessian at the position `["c"]["c"]`:

In [None]:
sd.derivative["c"]["c"].round(3)
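
The block structure can be illustrated by hand. The sketch below assumes the flattened parameter order `a, b, c[0], c[1], c[2]` and slices the analytical Hessian of `fun_dict` (which is `2 * I`) into a dict of dicts. The helper names (`leaf_sizes`, `offsets`, `blocks`) are hypothetical, and the container types and block shapes that optimagic actually returns may differ (for example, scalar leaves may not be kept as 2-dimensional arrays).

```python
import numpy as np

# Number of scalar entries per leaf of params = {"a": 0, "b": 1, "c": Series of length 3}.
leaf_sizes = {"a": 1, "b": 1, "c": 3}

# Analytical Hessian of a**2 + b**2 + sum(c**2) with respect to the 5 flattened entries.
flat_hessian = 2 * np.eye(5)

# Position of each leaf inside the flattened parameter vector.
offsets = {}
start = 0
for name, size in leaf_sizes.items():
    offsets[name] = slice(start, start + size)
    start += size

# Product of the params tree with itself: one block per pair of leaves.
blocks = {
    row: {col: flat_hessian[offsets[row], offsets[col]] for col in leaf_sizes}
    for row in leaf_sizes
}
```

In this hand-built version, `blocks["c"]["c"]` is the 3x3 block `2 * I`, while off-diagonal blocks such as `blocks["a"]["c"]` contain only zeros because `fun_dict` has no cross terms.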

## Parallelization

Function evaluations can be run in parallel by setting the `n_cores` argument. For
example, if we wish to evaluate the function on `2` cores, we simply write

In [None]:
sd = om.second_derivative(fun_scalar, params=0, n_cores=2)
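
Parallelization works well here because the function evaluations needed for a finite-difference approximation are independent of each other. The following sketch shows this idea with the standard library's `ThreadPoolExecutor` and a simple three-point stencil. It is not how optimagic implements `n_cores`; the step size `h` and the evaluation point `x0` are arbitrary choices for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fun_scalar(params):
    weight = 1
    return weight * params**2


h = 1e-3  # illustrative step size
x0 = 1.0  # illustrative evaluation point

# The three evaluation points do not depend on each other's results.
points = [x0 - h, x0, x0 + h]

# Evaluate the function at all points using two workers.
with ThreadPoolExecutor(max_workers=2) as pool:
    f_minus, f_center, f_plus = pool.map(fun_scalar, points)

# Central second-difference formula.
second_derivative_estimate = (f_plus - 2 * f_center + f_minus) / h**2
```

For a cheap function like `fun_scalar`, the overhead of parallel execution outweighs any gain; parallel evaluation pays off mainly when a single function evaluation is expensive.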