# How to get the $\nabla^2$ operator in TensorFlow

In TensorFlow, the default way of taking a second derivative (calling `tape.gradient` on the first derivative) contracts over the Hessian matrix in an undesirable way. For example:

In [6]:
import tensorflow  as tf

In [25]:
def generate_inputs(nwalkers, n_particles, dimension):
    x = tf.random.uniform(shape=[nwalkers, n_particles, dimension])
    return x

This function generates walkers in the same format as Metropolis walkers, filled with random values.

In [26]:
inputs = generate_inputs(4,1,2)

The function below computes one scalar value per walker: $f(x, y) = \alpha x^2 + \beta y^2 + \gamma x y$

In [32]:
def wavefunction(inputs, alpha=0.1, beta=0.2, gamma=0.5):
    # return alpha*x^2 + beta*y^2 + gamma*x*y
    ret = alpha * inputs[:,:,0]**2
    ret += beta * inputs[:,:,1]**2
    ret += gamma * inputs[:,:,0]*inputs[:,:,1]
    
    return tf.squeeze(ret)


In [33]:
output = wavefunction(inputs)

In [34]:
output.shape

TensorShape([4])

Here is the forward pass, telling TensorFlow to watch the inputs, since they are what we want to differentiate with respect to:

In [96]:
inputs = generate_inputs(4,1,2)

with tf.GradientTape(persistent=True) as outer_tape:
    outer_tape.watch(inputs)
    with tf.GradientTape(persistent=True) as inner_tape:
        inner_tape.watch(inputs)
        outputs = wavefunction(inputs)
    # Compute the first derivative with respect to the inputs
    dw_dx = inner_tape.gradient(outputs, inputs)

Note that the first derivative must be computed inside the `outer_tape` block, as above, so that the outer tape records that computation and can differentiate it a second time.
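To illustrate why the placement of the first `gradient` call matters, here is a minimal sketch on a toy function ($\sum_i x_i^3$, chosen only for illustration and not part of the original notebook). When the first derivative is taken after the outer tape has closed, the outer tape has no record of it and the second `gradient` call returns `None`.

```python
import tensorflow as tf

x = tf.constant([1.0, 2.0])

# Case 1: first derivative computed INSIDE the outer tape's block.
with tf.GradientTape() as outer:
    outer.watch(x)
    with tf.GradientTape() as inner:
        inner.watch(x)
        y = tf.reduce_sum(x ** 3)
    dy_dx = inner.gradient(y, x)        # these ops are recorded by `outer`
d2_inside = outer.gradient(dy_dx, x)    # 6 * x

# Case 2: first derivative computed OUTSIDE the outer tape's block.
with tf.GradientTape() as outer:
    outer.watch(x)
    with tf.GradientTape() as inner:
        inner.watch(x)
        y = tf.reduce_sum(x ** 3)
dy_dx = inner.gradient(y, x)            # `outer` is no longer recording
d2_outside = outer.gradient(dy_dx, x)   # None: nothing connects dy_dx to x

print(d2_inside)
print(d2_outside)
```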

`dw_dx` should have the same shape as the inputs, and its values are analytically computable (and thus checkable):

In [97]:
assert dw_dx.shape == inputs.shape

In [98]:
def analytic_derivative(inputs, alpha=0.1, beta=0.2, gamma=0.5):
    _x = inputs[:,:,0]
    _y = inputs[:,:,1]
    
    x = 2*alpha*_x + gamma*_y
    y = 2*beta*_y  + gamma*_x
    
    return tf.stack([x,y], axis=2)
    

In [99]:
dw_dx - analytic_derivative(inputs)

<tf.Tensor: shape=(4, 1, 2), dtype=float32, numpy=
array([[[0., 0.]],

       [[0., 0.]],

       [[0., 0.]],

       [[0., 0.]]], dtype=float32)>

The challenge, then, is the second derivative. We know analytically what each term of $\nabla^2$ *should* be ($\partial^2 f/\partial x^2 = 2\alpha$ and $\partial^2 f/\partial y^2 = 2\beta$), but this is not what TensorFlow computes:

In [100]:
def nabla_squared(inputs, alpha=0.1, beta=0.2, gamma=0.5):
    _x = tf.constant(2 * alpha, shape=inputs[:,:,0].shape)
    _y = tf.constant(2*beta, shape=inputs[:,:,1].shape)
    
    return tf.stack([_x, _y], axis=2)

In [101]:
nabla_squared(inputs)

<tf.Tensor: shape=(4, 1, 2), dtype=float32, numpy=
array([[[0.2, 0.4]],

       [[0.2, 0.4]],

       [[0.2, 0.4]],

       [[0.2, 0.4]]], dtype=float32)>

In [102]:
d2w_dx2 = outer_tape.gradient(dw_dx, inputs)

In [103]:
print(d2w_dx2)

tf.Tensor(
[[[0.7 0.9]]

 [[0.7 0.9]]

 [[0.7 0.9]]

 [[0.7 0.9]]], shape=(4, 1, 2), dtype=float32)


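If you look closely, each value here is off by $\gamma$. `tape.gradient` implicitly sums over the components of a non-scalar target, so TensorFlow is contracting the Hessian over one index (returning its row sums) instead of returning its diagonal!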
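As a check on the numbers above, the Hessian of $f$ can be written out by hand. With the defaults $\alpha = 0.1$, $\beta = 0.2$, $\gamma = 0.5$ used in this notebook:

$$
H = \begin{pmatrix} \partial_x^2 f & \partial_x \partial_y f \\ \partial_y \partial_x f & \partial_y^2 f \end{pmatrix}
  = \begin{pmatrix} 2\alpha & \gamma \\ \gamma & 2\beta \end{pmatrix}
  = \begin{pmatrix} 0.2 & 0.5 \\ 0.5 & 0.4 \end{pmatrix}
$$

Calling `gradient` on the vector `dw_dx` differentiates the sum of its components, which gives

$$
\sum_i \frac{\partial}{\partial x_j}\frac{\partial f}{\partial x_i} = \sum_i H_{ij} = (2\alpha + \gamma,\; 2\beta + \gamma) = (0.7,\; 0.9),
$$

whereas the terms of $\nabla^2$ are the diagonal entries $H_{jj} = (2\alpha,\; 2\beta) = (0.2,\; 0.4)$.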

In [104]:
# Slicing here creates a new tensor that the tape never recorded, so this returns None
outer_tape.gradient(dw_dx, inputs[:,:,0])

In [119]:
d2wdx2 = outer_tape.jacobian(dw_dx, inputs)
print(d2wdx2)

tf.Tensor(
[[[[[[0.2 0.5]]

    [[0.  0. ]]

    [[0.  0. ]]

    [[0.  0. ]]]


   [[[0.5 0.4]]

    [[0.  0. ]]

    [[0.  0. ]]

    [[0.  0. ]]]]]




 [[[[[0.  0. ]]

    [[0.2 0.5]]

    [[0.  0. ]]

    [[0.  0. ]]]


   [[[0.  0. ]]

    [[0.5 0.4]]

    [[0.  0. ]]

    [[0.  0. ]]]]]




 [[[[[0.  0. ]]

    [[0.  0. ]]

    [[0.2 0.5]]

    [[0.  0. ]]]


   [[[0.  0. ]]

    [[0.  0. ]]

    [[0.5 0.4]]

    [[0.  0. ]]]]]




 [[[[[0.  0. ]]

    [[0.  0. ]]

    [[0.  0. ]]

    [[0.2 0.5]]]


   [[[0.  0. ]]

    [[0.  0. ]]

    [[0.  0. ]]

    [[0.5 0.4]]]]]], shape=(4, 1, 2, 4, 1, 2), dtype=float32)


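The Jacobian comes out with shape `(4, 1, 2, 4, 1, 2)`, which is much bigger than we need: it holds every cross term between walkers, particles, and dimensions. The second derivatives we want sit on its diagonal, which we can extract with `einsum` by repeating the index labels: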

In [110]:
tf.einsum("wpdwpd->wpd",d2wdx2)

<tf.Tensor: shape=(4, 1, 2), dtype=float32, numpy=
array([[[0.2, 0.4]],

       [[0.2, 0.4]],

       [[0.2, 0.4]],

       [[0.2, 0.4]]], dtype=float32)>

And now, those are the correct values for the terms of $\nabla^2$ of this function! Summing them over the last axis gives the Laplacian itself, $2\alpha + 2\beta$.
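The role of the repeated labels in the `einsum` string may be easier to see on a single 2x2 matrix. The sketch below hard-codes the Hessian of $f$ for the default constants (an illustrative stand-in, not something computed by the tapes) and compares extracting the diagonal with contracting one index.

```python
import tensorflow as tf

# Hessian of f for alpha=0.1, beta=0.2, gamma=0.5 (hard-coded for illustration)
hessian = tf.constant([[0.2, 0.5],
                       [0.5, 0.4]])

# A label repeated on the input side and kept in the output selects the diagonal.
diagonal = tf.einsum("ii->i", hessian)

# A label that is dropped from the output is summed over (a contraction).
contracted = tf.einsum("ij->j", hessian)

print(diagonal)    # the per-coordinate second derivatives
print(contracted)  # what the plain second `gradient` call returned
```

The string `"wpdwpd->wpd"` applies the same diagonal selection to all three index pairs at once.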

How long does the full Jacobian plus `einsum` take to run?

In [114]:
%timeit d2wdx2 = tf.einsum("wpdwpd->wpd", outer_tape.jacobian(dw_dx, inputs))

312 ms ± 6.49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


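This is an equivalent implementation using `batch_jacobian`, which treats the first axis (the walkers) as a batch dimension and so never computes the cross-walker terms: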

In [113]:
tf.einsum("wpdpd->wpd",outer_tape.batch_jacobian(dw_dx, inputs))

<tf.Tensor: shape=(4, 1, 2), dtype=float32, numpy=
array([[[0.2, 0.4]],

       [[0.2, 0.4]],

       [[0.2, 0.4]],

       [[0.2, 0.4]]], dtype=float32)>

How long does this take?

In [116]:
%timeit tf.einsum("wpdpd->wpd",outer_tape.batch_jacobian(dw_dx, inputs))

339 ms ± 5.93 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


Let's also time the incorrect gradient:

In [117]:
%timeit outer_tape.gradient(dw_dx, inputs)

1.31 ms ± 12.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


As you can see, the correct second derivative is a good bit slower, by a factor of roughly 250 in these timings. But speed means nothing if the result is incorrect!
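For reuse, the whole computation can be packaged into one function. The following is a minimal sketch, not the notebook's own code: the helper name `second_derivatives` is hypothetical, the wavefunction is restated so the block is self-contained, and wrapping it in `tf.function` is an assumption about how one might reduce per-call overhead. No timing is claimed for it here.

```python
import tensorflow as tf

def wavefunction(inputs, alpha=0.1, beta=0.2, gamma=0.5):
    # Same quadratic form as in the notebook; squeezing axis 1 assumes n_particles == 1.
    x = inputs[:, :, 0]
    y = inputs[:, :, 1]
    return tf.squeeze(alpha * x**2 + beta * y**2 + gamma * x * y, axis=1)

@tf.function  # assumption: graph mode may lower per-call overhead; not measured here
def second_derivatives(inputs):
    """Hypothetical helper: d^2 f / d x_i^2 for every walker, particle, and dimension."""
    with tf.GradientTape(persistent=True) as outer_tape:
        outer_tape.watch(inputs)
        with tf.GradientTape() as inner_tape:
            inner_tape.watch(inputs)
            outputs = wavefunction(inputs)
        # First derivative, taken inside the outer tape so that it is recorded.
        dw_dx = inner_tape.gradient(outputs, inputs)
    # Per-walker Jacobian of the gradient: shape [walkers, p, d, p, d].
    hessian = outer_tape.batch_jacobian(dw_dx, inputs)
    # Keep only the diagonal entries of each walker's Hessian.
    return tf.einsum("wpdpd->wpd", hessian)

inputs = tf.random.uniform(shape=[4, 1, 2])
d2 = second_derivatives(inputs)
# Summing over particles and dimensions gives the Laplacian for each walker.
laplacian = tf.reduce_sum(d2, axis=[1, 2])
print(d2)
print(laplacian)
```

Whether this is faster than the eager version depends on the TensorFlow version and hardware, so it is worth timing with `%timeit` in the same way as the cells above.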