# Bilinear scalar deep nets

Test the Hessian expression for a scalar deep net of the form

$$z_\ell = f_\ell(z_{\ell-1}; x_\ell) = z_{\ell-1} \cdot x_\ell.$$

For such a model,

$$z_\ell = z_0 \prod_{i=1}^\ell x_i.$$
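As a quick sanity check of this closed form, the sketch below applies the layer recursion one step at a time and compares the result with the cumulative product. The names `xs_demo` and `z0_demo` and the seed are illustrative choices for this sketch only.

```python
import numpy as np

rng = np.random.default_rng(0)      # seed chosen only for reproducibility
xs_demo = rng.standard_normal(5)    # hypothetical layer parameters x_1..x_L
z0_demo = rng.random()              # hypothetical input z_0

# Forward pass: apply z_l = z_{l-1} * x_l one layer at a time.
z = z0_demo
zs_recursive = []
for x in xs_demo:
    z = z * x
    zs_recursive.append(z)
zs_recursive = np.array(zs_recursive)

# Closed form: z_l = z_0 * prod_{i<=l} x_i.
zs_closed = z0_demo * np.cumprod(xs_demo)

assert np.allclose(zs_recursive, zs_closed)
```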

In [1]:
import numpy as np

num_layers = 5
xs = np.random.randn(num_layers)
z0 = np.random.rand()

The following identities hold as a result:

$$\begin{align}
b_\ell &\equiv \frac{\partial z_L}{\partial z_\ell} = \frac{\partial}{\partial z_\ell} z_\ell \prod_{i=\ell+1}^L x_i = \prod_{i=\ell+1}^L x_i. \\
\nabla_x f_\ell &= z_{\ell-1} \\
\nabla_z f_\ell &= x_\ell \\
\nabla_{xx} f_\ell &= 0 \\
\nabla_{xz} f_\ell &= 1 \\
\nabla_{zx} f_\ell &= 1 \\
\nabla_{zz} f_\ell &= 0
\end{align}$$
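The first identity can be checked numerically. The sketch below perturbs each intermediate activation $z_\ell$, re-runs the remaining layers, and compares the central finite difference with $\prod_{i>\ell} x_i$. The helper `run_from` and the `*_demo` names are hypothetical and exist only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)      # seed chosen only for reproducibility
xs_demo = rng.standard_normal(5)    # hypothetical x_1..x_L
z0_demo = rng.random()              # hypothetical z_0
L = len(xs_demo)


def run_from(z_l, l):
    """Apply layers l+1..L (1-indexed) to the activation z_l and return z_L."""
    z = z_l
    for x in xs_demo[l:]:
        z = z * x
    return z


zs_demo = z0_demo * np.cumprod(xs_demo)  # z_1..z_L
eps = 1e-6

# Central finite difference of z_L with respect to each z_l.
b_fd = np.array([
    (run_from(zs_demo[l - 1] + eps, l) - run_from(zs_demo[l - 1] - eps, l))
    / (2 * eps)
    for l in range(1, L + 1)
])

# Closed form: b_l = prod_{i>l} x_i, with an empty product (b_L) equal to 1.
b_closed = np.array([np.prod(xs_demo[l:]) for l in range(1, L + 1)])

assert np.allclose(b_fd, b_closed)
```

Because $z_L$ is linear in $z_\ell$, the finite difference is exact up to round-off, so the default `np.allclose` tolerances are sufficient here.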

Compute the gradient and the Hessian the generic way:

$$
\frac{\partial z_L}{\partial x} = e_L^\top M^{-1} D_x,
$$
and
$$
H \equiv \frac{\partial^2 z_L}{\partial x^2} 
= D_D \left(D_{xx} + D_{zx} PM^{-1} D_x\right) + D_x^\top M^{-T}P^\top D_M \left(D_{xz}+D_{zz}P M^{-1}D_x\right)
$$

In [2]:
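# Down-shift matrix: (P @ v)[i] == v[i - 1], with (P @ v)[0] == 0.
P = np.diag(np.ones(num_layers - 1), k=-1)

# b_l = prod_{i > l} x_i for l = 1..L; the last entry (b_L) is 1.
backwards = np.hstack([np.cumprod(xs[:0:-1])[::-1], 1])
D_D = np.diag(backwards)
D_M = D_D

# M is the identity minus the layer Jacobians df_l/dz_{l-1} = x_l
# placed on the sub-diagonal.
M = np.eye(num_layers) - np.diag(xs[1:], k=-1)
Minv = np.linalg.inv(M)

# Activations z_1..z_L, and D_x = diag(df_l/dx_l) = diag(z_{l-1}).
zs = z0 * np.cumprod(xs)
D_x = np.diag(np.hstack([z0, zs[:-1]]))

# Second derivatives of the bilinear layer f(z, x) = z * x.
D_xx = np.zeros((num_layers, num_layers))
D_xz = np.eye(num_layers)
D_zx = D_xz
D_zz = np.zeros((num_layers, num_layers))

# Gradient: the last row of M^{-1} D_x.
grad_est = (Minv @ D_x)[-1]

H_est = D_D @ (D_xx + D_zx @ P @ Minv @ D_x) + D_x.T @ Minv.T @ P.T @ D_M @ (
    D_xz + D_zz @ P @ Minv @ D_x
)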
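To make the structure of `Minv` concrete, the sketch below builds the same kind of matrix for a hypothetical 3-layer net with hand-picked parameters. Entry $(i, j)$ of the inverse is the product of the $x_k$ with $j < k \le i$, so the last row reproduces the backward products $b_\ell$. The values in `xs_demo` are arbitrary and chosen only so the products are easy to read.

```python
import numpy as np

# Hypothetical 3-layer parameters x_1, x_2, x_3.
xs_demo = np.array([2.0, 3.0, 5.0])
n = len(xs_demo)

# Same construction as in the cell above: identity minus the layer
# Jacobians df_l/dz_{l-1} = x_l on the sub-diagonal.
M_demo = np.eye(n) - np.diag(xs_demo[1:], k=-1)
Minv_demo = np.linalg.inv(M_demo)

expected = np.array([
    [1.0, 0.0, 0.0],
    [3.0, 1.0, 0.0],    # x_2, 1
    [15.0, 5.0, 1.0],   # x_2 * x_3, x_3, 1
])
assert np.allclose(Minv_demo, expected)
```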

For this simple model, the gradient and the Hessian also have a simple closed form. Because $z_L$ is linear in each $x_\ell$, the diagonal of the Hessian vanishes:
$$\begin{align}
\frac{\partial z_L}{\partial x_\ell} &= z_0 \prod_{i\neq \ell} x_i = z_L / x_\ell \\
\frac{\partial^2 z_L}{\partial x_\ell \partial x_k} &= \frac{z_L}{x_\ell x_k} \left(1 - \delta_{\ell k}\right)
\end{align}$$
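A small worked example may help here. With hypothetical values $z_0 = 2$ and $x = (1, 2, 4)$, we get $z_L = 16$. Since $z_L$ is linear in each $x_\ell$, every diagonal second derivative is zero, and each off-diagonal entry is $z_0$ times the one remaining factor. The sketch below evaluates both forms; the `*_demo` names are illustrative.

```python
import numpy as np

z0_demo = 2.0
xs_demo = np.array([1.0, 2.0, 4.0])     # hypothetical x_1, x_2, x_3
zL_demo = z0_demo * np.prod(xs_demo)    # 2 * 1 * 2 * 4 = 16

# Gradient: z_L / x_l = [16, 8, 4].
grad_demo = zL_demo / xs_demo

# Hessian: z_L / (x_l x_k) off the diagonal, zero on the diagonal.
H_demo = zL_demo / np.outer(xs_demo, xs_demo)
np.fill_diagonal(H_demo, 0.0)

# The same matrix written out by hand: entry (l, k) is z_0 times the
# single factor that is neither x_l nor x_k.
H_by_hand = np.array([
    [0.0, 8.0, 4.0],
    [8.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
])
assert np.allclose(grad_demo, [16.0, 8.0, 4.0])
assert np.allclose(H_demo, H_by_hand)
```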

Compare the two ways to compute the derivatives:

In [3]:
assert np.allclose(backwards, Minv[-1])

In [4]:
zL = z0 * np.prod(xs)
grad_actual = zL / xs
assert np.allclose(grad_actual, grad_est)


In [5]:
H_actual = zL / xs[:, None] / xs[None, :]
H_actual[np.diag_indices(num_layers)] = 0
assert np.allclose(H_actual, H_actual.T), "Simple Hessian is not symmetric."
assert np.allclose(H_est, H_actual), "Simple and complicated Hessian mismatch."

# Quadratic scalar deep nets

Test the Hessian expression for a deep net of the form

$$z_\ell = f_\ell(z_{\ell-1}; x_\ell) = z_{\ell-1} \cdot x_\ell^2.$$

For such a model,

$$z_\ell = z_0 \prod_{i=1}^\ell x_i^2.$$ 

The following identities hold as a result:

$$\begin{align}
b_\ell &\equiv \frac{\partial z_L}{\partial z_\ell} = \frac{\partial}{\partial z_\ell} z_\ell \prod_{i=\ell+1}^L x_i^2 = \prod_{i=\ell+1}^L x_i^2. \\
\nabla_x f_\ell &= 2 z_{\ell-1} \cdot x_\ell \\
\nabla_z f_\ell &= x_\ell^2 \\
\nabla_{xx} f_\ell &= 2 z_{\ell-1} \\
\nabla_{xz} f_\ell &= 2 x_\ell \\
\nabla_{zx} f_\ell &= 2 x_\ell \\
\nabla_{zz} f_\ell &= 0
\end{align}$$
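The per-layer derivatives above can be checked with central finite differences on a single layer $f(z, x) = z x^2$. This is only a sketch: the evaluation point `(z, x)` and the step `eps` are arbitrary choices made for this illustration.

```python
import numpy as np


def f(z, x):
    """One quadratic layer: f(z, x) = z * x**2."""
    return z * x**2


z, x = 0.8, 1.3   # arbitrary evaluation point
eps = 1e-4        # finite-difference step

# First derivatives.
f_x = (f(z, x + eps) - f(z, x - eps)) / (2 * eps)
f_z = (f(z + eps, x) - f(z - eps, x)) / (2 * eps)

# Second derivatives.
f_xx = (f(z, x + eps) - 2 * f(z, x) + f(z, x - eps)) / eps**2
f_zz = (f(z + eps, x) - 2 * f(z, x) + f(z - eps, x)) / eps**2
f_xz = (
    f(z + eps, x + eps) - f(z + eps, x - eps)
    - f(z - eps, x + eps) + f(z - eps, x - eps)
) / (4 * eps**2)

assert np.isclose(f_x, 2 * z * x)
assert np.isclose(f_z, x**2)
assert np.isclose(f_xx, 2 * z, atol=1e-6)
assert np.isclose(f_xz, 2 * x, atol=1e-6)
assert np.isclose(f_zz, 0.0, atol=1e-6)
```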

In [6]:
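# Down-shift matrix: (P @ v)[i] == v[i - 1], with (P @ v)[0] == 0.
P = np.diag(np.ones(num_layers - 1), k=-1)

# b_l = prod_{i > l} x_i^2 for l = 1..L; the last entry (b_L) is 1.
backwards = np.hstack([np.cumprod(xs[:0:-1] ** 2)[::-1], 1])
D_D = np.diag(backwards)
D_M = D_D

# M is the identity minus the layer Jacobians df_l/dz_{l-1} = x_l^2
# placed on the sub-diagonal.
M = np.eye(num_layers) - np.diag(xs[1:] ** 2, k=-1)
Minv = np.linalg.inv(M)

# Activations z_1..z_L, and D_x = diag(df_l/dx_l) = diag(2 z_{l-1} x_l).
zs = z0 * np.cumprod(xs**2)
D_x = np.diag(2 * np.hstack([z0, zs[:-1]]) * xs)

# Second derivatives of the quadratic layer f(z, x) = z * x^2.
D_xx = np.diag(2 * np.hstack([z0, zs[:-1]]))
D_xz = np.diag(2 * xs)
D_zx = D_xz
D_zz = np.zeros((num_layers, num_layers))

# Gradient: the last row of M^{-1} D_x.
grad_est = (Minv @ D_x)[-1]

H_est = D_D @ (D_xx + D_zx @ P @ Minv @ D_x) + D_x.T @ Minv.T @ P.T @ D_M @ (
    D_xz + D_zz @ P @ Minv @ D_x
)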

For this simple model, the gradient and the Hessian also have a simple closed form:
$$\begin{align}
\frac{\partial z_L}{\partial x_\ell} &= z_0 \left(\prod_{i\neq \ell} x_i^2\right)2x_\ell = 2 z_L / x_\ell \\
\frac{\partial^2 z_L}{\partial x_\ell \partial x_k} &= \begin{cases}
   \frac{4z_L}{x_\ell x_k}, & \ell \neq k \\
   \frac{4z_L}{x_\ell^2}-\frac{2z_L}{x_\ell^2} = \frac{2z_L}{x_\ell^2}, & \ell = k
 \end{cases}
\end{align}$$
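As an independent check that does not rely on the matrix expression, the closed-form Hessian of the quadratic model can be compared with a central finite-difference Hessian of $z_L(x)$. The helper `fd_hessian`, the `*_demo` names, and the parameter range (kept away from zero to avoid dividing by tiny values) are assumptions of this sketch.

```python
import numpy as np


def fd_hessian(fun, x, eps=1e-4):
    """Central finite-difference Hessian of a scalar function fun at x."""
    n = len(x)
    E = eps * np.eye(n)   # row i is a step of size eps along coordinate i
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (
                fun(x + E[i] + E[j]) - fun(x + E[i] - E[j])
                - fun(x - E[i] + E[j]) + fun(x - E[i] - E[j])
            ) / (4 * eps**2)
    return H


rng = np.random.default_rng(3)              # seed chosen only for reproducibility
xs_demo = rng.uniform(0.5, 1.5, size=4)     # hypothetical x_1..x_L
z0_demo = 0.7                               # hypothetical z_0


def z_L(x):
    """Output of the quadratic net: z_0 * prod_i x_i^2."""
    return z0_demo * np.prod(x**2)


# Closed form: 4 z_L / (x_l x_k) off the diagonal, 2 z_L / x_l^2 on it.
zL_demo = z_L(xs_demo)
H_closed = 4 * zL_demo / np.outer(xs_demo, xs_demo)
H_closed[np.diag_indices(len(xs_demo))] /= 2

H_fd = fd_hessian(z_L, xs_demo)
assert np.allclose(H_fd, H_closed, atol=1e-5)
```

Because $z_L$ is a polynomial of degree two in each $x_\ell$, these central differences carry no truncation error, so the only discrepancy is floating-point round-off.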

Compare the two ways to compute the derivatives:

In [7]:
zL = z0 * np.prod(xs**2)
grad_actual = 2 * zL / xs
assert np.allclose(grad_actual, grad_est)

In [8]:
H_actual = 4 * zL / xs[:, None] / xs[None, :]
H_actual[np.diag_indices(num_layers)] /= 2
assert np.allclose(H_actual, H_actual.T), "Simple Hessian is not symmetric."

In [9]:
assert np.allclose(H_est, H_actual), "Simple and complicated Hessian mismatch."