# 02 • Neural Hamiltonian

Consider again the Hamiltonian of the harmonic oscillator in one dimension:

$$H(q, p) = T(p) + V(q) = \frac{p^2}{2m} + \frac{k}{2} (q-q_0)^2$$

- $T(p)$ and $V(q)$ are the kinetic and potential energies,
- $q$ and $p$ are the generalized coordinate and momentum,
- $m$ is the mass, $k$ the spring constant, and $q_0$ the equilibrium position.
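As a quick illustration of the formula above, the following sketch evaluates $H(q, p)$ along the textbook closed-form solution of the harmonic oscillator and checks that the energy stays constant. The parameter values ($m = k = 1$, $q_0 = 0$) and the function names are assumptions chosen for this example; they are not taken from the `hamiltonian` module.

```python
import math


def hamiltonian(q, p, m=1.0, k=1.0, q0=0.0):
    """Total energy H = T(p) + V(q) of the 1D harmonic oscillator."""
    kinetic = p**2 / (2.0 * m)
    potential = 0.5 * k * (q - q0) ** 2
    return kinetic + potential


def exact_solution(t, amplitude=1.0, m=1.0, k=1.0, q0=0.0):
    """Closed-form trajectory for a start at rest, displaced by `amplitude`."""
    omega = math.sqrt(k / m)  # angular frequency
    q = q0 + amplitude * math.cos(omega * t)
    p = -m * omega * amplitude * math.sin(omega * t)
    return q, p


# The energy along the exact trajectory stays at k * amplitude**2 / 2.
energies = [hamiltonian(*exact_solution(0.1 * n)) for n in range(100)]
```

With the assumed parameters, every entry of `energies` equals $k A^2 / 2 = 0.5$ up to floating-point rounding.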

In notebook 01, we used the `HarmonicOscillator` Hamiltonian to evaluate and simulate the above system. Here, we use a `NeuralHamiltonian` to learn the same system from observations.

In [None]:
import matplotlib.pyplot as plt
import torch

from hamiltonian import HarmonicOscillator, NeuralHamiltonian

## Ground truth

The `HarmonicOscillator` acts as our ground truth, $H_{\text{exact}}$, for generating training data and evaluating the trained models.

In [None]:
H_exact = HarmonicOscillator()

## Neural Hamiltonian

The `NeuralHamiltonian` inherits from our base `Hamiltonian` class, but instead of implementing kinetic and potential energy ($T$, $V$) analytically, it uses a neural network to model their sum ($H=T+V$). It is a generic Hamiltonian that can compute the equations of motion in canonical form, but it needs to be trained on external data to adapt to a specific behavior.
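The actual `NeuralHamiltonian` class lives in the `hamiltonian` module and is not shown in this notebook. The following is a minimal sketch of how such a class might be built, assuming a small fully connected network that maps $(q, p)$ to a scalar energy and automatic differentiation with respect to the inputs. The name `TinyNeuralHamiltonian` and its layer layout are hypothetical.

```python
import torch


class TinyNeuralHamiltonian(torch.nn.Module):
    """Hypothetical stand-in: an MLP mapping (q, p) to a scalar energy."""

    def __init__(self, hidden=16):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(2, hidden),
            torch.nn.Tanh(),
            torch.nn.Linear(hidden, 1),
        )

    def forward(self, q, p):
        return self.net(torch.cat([q, p], dim=-1))

    def time_derivatives(self, q, p):
        # Gradients are needed with respect to the inputs, not only the weights.
        if not q.requires_grad:
            q = q.clone().requires_grad_(True)
        if not p.requires_grad:
            p = p.clone().requires_grad_(True)
        energy = self(q, p).sum()
        # create_graph=True keeps the derivatives differentiable for training.
        dH_dq, dH_dp = torch.autograd.grad(energy, (q, p), create_graph=True)
        return dH_dp, -dH_dq  # dq/dt, dp/dt in canonical form


torch.manual_seed(0)
model = TinyNeuralHamiltonian()
dq_dt, dp_dt = model.time_derivatives(torch.rand(5, 1), torch.rand(5, 1))
```

Because the derivatives are built with `create_graph=True`, a loss defined on `dq_dt` and `dp_dt` can be backpropagated to the network weights.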

Training can be done in several ways. Here we look at two options: gradient matching and trajectory matching.

## Gradient matching

We train our neural Hamiltonian on the gradients

$$\frac{\text{d}q}{\text{d}t}(q, p) = \frac{\partial H}{\partial p}(q, p) \quad \text{and} \quad \frac{\text{d}p}{\text{d}t}(q, p) = -\frac{\partial H}{\partial q}(q, p).$$

In detail, we sample random generalized coordinates and momenta

$$(q_n, p_n), \quad n=1,\dots,N$$

and compute the true time derivatives according to the exact Hamiltonian $H_{\text{exact}}$:

$$\frac{\text{d}q_{\text{true}}}{\text{d}t}(q_n, p_n) = \frac{\partial H_{\text{exact}}}{\partial p}(q_n, p_n) \quad \text{and} \quad \frac{\text{d}p_{\text{true}}}{\text{d}t}(q_n, p_n) = -\frac{\partial H_{\text{exact}}}{\partial q}(q_n, p_n).$$
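For the harmonic oscillator Hamiltonian given at the top of this notebook, both partial derivatives can be worked out by hand, which makes the training targets explicit:

$$\frac{\text{d}q_{\text{true}}}{\text{d}t}(q_n, p_n) = \frac{\partial}{\partial p} \frac{p^2}{2m} \bigg|_{p_n} = \frac{p_n}{m} \quad \text{and} \quad \frac{\text{d}p_{\text{true}}}{\text{d}t}(q_n, p_n) = -\frac{\partial}{\partial q} \frac{k}{2} (q-q_0)^2 \bigg|_{q_n} = -k (q_n - q_0).$$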

In [None]:
def sample(batch, amplitude=1.0):
    """Draw `batch` points (q, p) uniformly from [-amplitude, amplitude)."""
    q = torch.rand(batch, 1).sub(0.5).mul(2 * amplitude)
    p = torch.rand(batch, 1).sub(0.5).mul(2 * amplitude)
    return q, p


q_sample, p_sample = sample(5)
H_exact.time_derivatives(q_sample, p_sample)

We then create a neural Hamiltonian, $H_{\text{neural}}$, and compare its predicted time derivatives

$$\frac{\text{d}q_{\text{pred}}}{\text{d}t}(q_n, p_n) = \frac{\partial H_{\text{neural}}}{\partial p}(q_n, p_n) \quad \text{and} \quad \frac{\text{d}p_{\text{pred}}}{\text{d}t}(q_n, p_n) = -\frac{\partial H_{\text{neural}}}{\partial q}(q_n, p_n)$$

against the true time derivatives with a mean squared error loss:

$$\mathcal{L} = N^{-1} \sum\limits_{n=1}^{N} \left( \left( \frac{\text{d}q_{\text{true}}}{\text{d}t}(q_n, p_n) - \frac{\text{d}q_{\text{pred}}}{\text{d}t}(q_n, p_n) \right)^2 + \left( \frac{\text{d}p_{\text{true}}}{\text{d}t}(q_n, p_n) - \frac{\text{d}p_{\text{pred}}}{\text{d}t}(q_n, p_n) \right)^2  \right).$$

Thus, the neural Hamiltonian learns to replicate the time derivatives of the exact Hamiltonian.
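To get a feeling for the loss, consider a hypothetical model that is itself a harmonic oscillator but with a slightly wrong spring constant. The sketch below uses the closed-form Hamilton's equations of the harmonic oscillator ($\text{d}q/\text{d}t = p/m$, $\text{d}p/\text{d}t = -k(q - q_0)$) in plain Python. The values $m = k = 1$, $q_0 = 0$ and the mis-specified $k = 1.2$ are assumptions made for this example.

```python
import random


def time_derivatives(q, p, m=1.0, k=1.0, q0=0.0):
    """Hamilton's equations for the harmonic oscillator: (dq/dt, dp/dt)."""
    return p / m, -k * (q - q0)


def gradient_matching_loss(samples, k_model, k_true=1.0):
    """Mean squared error between true and model time derivatives."""
    total = 0.0
    for q, p in samples:
        dq_true, dp_true = time_derivatives(q, p, k=k_true)
        dq_pred, dp_pred = time_derivatives(q, p, k=k_model)
        total += (dq_true - dq_pred) ** 2 + (dp_true - dp_pred) ** 2
    return total / len(samples)


random.seed(0)
samples = [(random.uniform(-1.5, 1.5), random.uniform(-1.5, 1.5)) for _ in range(50)]

loss_exact = gradient_matching_loss(samples, k_model=1.0)  # model equals truth
loss_off = gradient_matching_loss(samples, k_model=1.2)  # spring constant 20 % too large
```

The loss vanishes only when the model reproduces the true derivatives. For the mis-specified spring constant it reduces to the mean of $(0.2\, q_n)^2$ over the samples, since only $\text{d}p/\text{d}t$ depends on $k$.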

In [None]:
H_neural = NeuralHamiltonian(shape=(1,), hidden=16)
optimizer = torch.optim.Adam(H_neural.parameters(), lr=1e-3)

for epoch in range(1, 10001):
    optimizer.zero_grad()
    q, p = sample(50, amplitude=1.5)
    dq_dt_true, dp_dt_true = H_exact.time_derivatives(q, p)
    dq_dt_pred, dp_dt_pred = H_neural.time_derivatives(q, p)
    loss = (
        (dq_dt_true - dq_dt_pred).pow(2) + (dp_dt_true - dp_dt_pred).pow(2)
    ).mean()
    loss.backward()
    optimizer.step()
    if epoch == 1 or epoch % 500 == 0:
        print(f"epoch: {epoch:5d} | loss: {loss.item():.5e}")

To evaluate our trained model, we compare simulated trajectories (2000 steps) and measure deviations in the propagated generalized coordinates and momenta.
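The simulation relies on the `step(q, p, delta_t)` method of the Hamiltonian, whose integration scheme is not shown in this notebook. As an illustration only, the sketch below assumes a velocity Verlet (leapfrog) scheme applied to the harmonic oscillator with $m = k = 1$ and $q_0 = 0$; the function names are hypothetical, and the real `step` may use a different integrator.

```python
def leapfrog_step(q, p, delta_t, m=1.0, k=1.0, q0=0.0):
    """One velocity Verlet (leapfrog) step for the harmonic oscillator."""
    p_half = p - 0.5 * delta_t * k * (q - q0)  # half kick: dp/dt = -dH/dq
    q_new = q + delta_t * p_half / m  # drift: dq/dt = dH/dp
    p_new = p_half - 0.5 * delta_t * k * (q_new - q0)  # second half kick
    return q_new, p_new


def energy(q, p, m=1.0, k=1.0, q0=0.0):
    """Total energy H = T(p) + V(q)."""
    return p**2 / (2.0 * m) + 0.5 * k * (q - q0) ** 2


q, p = 1.0, 0.0  # same initial condition as in the evaluation
initial_energy = energy(q, p)
max_drift = 0.0
for _ in range(2000):
    q, p = leapfrog_step(q, p, delta_t=0.01)
    max_drift = max(max_drift, abs(energy(q, p) - initial_energy))
```

With this scheme, the energy error stays bounded over the 2000 steps instead of accumulating, which is why symplectic integrators of this kind are a common choice for Hamiltonian dynamics.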

In [None]:
def simulate(H, q, p, delta_t, steps):
    qt, pt = [q.unsqueeze(0)], [p.unsqueeze(0)]
    for _ in range(steps):
        q, p = H.step(q, p, delta_t)
        qt.append(q.unsqueeze(0))
        pt.append(p.unsqueeze(0))
    return torch.cat(qt, dim=0).detach(), torch.cat(pt, dim=0).detach()


q = torch.tensor([1.0])  # initial coordinate
p = torch.tensor([0.0])  # initial momentum

steps = 2000  # make 2000 steps
delta_t = 0.01  # time step

qt_true, pt_true = simulate(H_exact, q, p, delta_t, steps)
qt_pred, pt_pred = simulate(H_neural, q, p, delta_t, steps)

fig, axes = plt.subplots(ncols=2, figsize=(8, 3))

axes[0].plot(qt_true, color="C0", label="q (true)")
axes[0].plot(qt_pred, "--", color="C0", label="q (pred)")
axes[0].plot(pt_true, color="C1", label="p (true)")
axes[0].plot(pt_pred, "--", color="C1", label="p (pred)")
axes[0].set_ylabel("q, p / a.u.")

axes[1].plot(qt_true - qt_pred, label="q")
axes[1].plot(pt_true - pt_pred, label="p")
axes[1].set_ylabel("error / a.u.")

for ax in axes.flat:
    ax.set_xlabel("step")
    ax.legend()
fig.tight_layout()

The true and predicted trajectories $(q(t), p(t))$ stay very close to each other. The error oscillates around zero with an amplitude that increases over time.
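A growing error envelope is what one would expect if the learned dynamics oscillate at a slightly different frequency than the true dynamics, so that the two trajectories slowly drift out of phase. This is a plausible explanation rather than something verified in this notebook. The sketch below illustrates the effect with two cosine trajectories; the 0.1 % frequency error is a hypothetical value.

```python
import math

omega_true = 1.0
omega_pred = 1.001  # hypothetical 0.1 % frequency error
delta_t, steps = 0.01, 2000

# Difference between two oscillations that start in phase.
errors = [
    math.cos(omega_true * n * delta_t) - math.cos(omega_pred * n * delta_t)
    for n in range(steps + 1)
]

# Largest absolute error in the first and the second half of the trajectory.
first_half = max(abs(e) for e in errors[: steps // 2])
second_half = max(abs(e) for e in errors[steps // 2 :])
```

The difference of the two cosines is a fast oscillation multiplied by the slowly growing envelope $2 \left| \sin(\Delta\omega\, t / 2) \right|$, so `second_half` comes out larger than `first_half`.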

## Trajectory matching

In another approach, we train the neural Hamiltonian on observed trajectory data, i.e., trajectories $(q_{\text{true}}(t_n), p_{\text{true}}(t_n))$, $n=0,\dots,N$, generated by the exact Hamiltonian.

The aim is that the neural Hamiltonian learns to correctly propagate

$$H_{\textrm{neural}}:(q_{\text{true}}(t_{n-1}), p_{\text{true}}(t_{n-1})) \mapsto (q_{\text{pred}}(t_n), p_{\text{pred}}(t_n))$$

such that the mean squared error on the generalized coordinates and momenta becomes minimal:

$$\mathcal{L} = N^{-1} \sum\limits_{n=1}^{N} \left( \left( q_{\text{true}}(t_{n}) - q_{\text{pred}}(t_{n}) \right)^2 + \left( p_{\text{true}}(t_{n}) - p_{\text{pred}}(t_{n}) \right)^2  \right).$$

In this particular example, we

- sample 20 random initial coordinates and momenta
- propagate those simultaneously over 200 steps with $H_{\text{exact}}$
- subsample by keeping only every 50th step ($t_0, t_{50}, t_{100}, t_{150}, t_{200}$), which gives us five snapshots and thus four propagations per initial condition
- replicate the 50-step propagations with $H_{\text{neural}}$
- minimize the mean squared error between the true and predicted propagated coordinates and momenta
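The bookkeeping behind the subsampling step can be sketched in plain Python. The variable names below are chosen for this example; the numbers (200 steps, every 50th step, 20 initial conditions) are those listed above.

```python
steps, subsample, n_initial = 200, 50, 20

# Indices of the steps that survive the subsampling.
snapshot_steps = list(range(0, steps + 1, subsample))  # [0, 50, 100, 150, 200]

# Each consecutive pair of snapshots is one training example: (start, target).
pairs_per_trajectory = list(zip(snapshot_steps[:-1], snapshot_steps[1:]))

# Total number of (start, target) pairs across all initial conditions.
n_training_pairs = n_initial * len(pairs_per_trajectory)
```

Each trajectory contributes four pairs, so the training set consists of 80 start states and 80 target states, each separated by 50 integration steps.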

In [None]:
q_ini, p_ini = sample(20, amplitude=1.5)
subsample = 50
delta_t = 0.01

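# Propagate all initial conditions at once; step n is stored in row n.
qt, pt = simulate(H_exact, q_ini.squeeze(), p_ini.squeeze(), delta_t, 200)
# Keep every 50th step and transpose: one row per trajectory.
qt = qt[::subsample].T
pt = pt[::subsample].T

# Pair each snapshot (input) with the following snapshot (target).
q0, q_true = qt[:, :-1].reshape(-1, 1), qt[:, 1:].reshape(-1, 1)
p0, p_true = pt[:, :-1].reshape(-1, 1), pt[:, 1:].reshape(-1, 1)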

H_neural = NeuralHamiltonian(shape=(1,), hidden=16)
optimizer = torch.optim.Adam(H_neural.parameters(), lr=1e-3)

for epoch in range(1, 2001):
    q_pred, p_pred = q0.clone().detach(), p0.clone().detach()
    optimizer.zero_grad()
    for _ in range(subsample):
        q_pred, p_pred = H_neural.step(q_pred, p_pred, delta_t)
    loss = ((q_true - q_pred).pow(2) + (p_true - p_pred).pow(2)).mean()
    loss.backward()
    optimizer.step()
    if epoch == 1 or epoch % 100 == 0:
        print(f"epoch: {epoch:5d} | loss: {loss.item():.5e}")

We again compare simulated trajectories (2000 steps) and measure deviations in the propagated generalized coordinates and momenta.

In [None]:
q = torch.tensor([1.0])  # initial coordinate
p = torch.tensor([0.0])  # initial momentum

steps = 2000  # make 2000 steps
delta_t = 0.01  # time step

qt_true, pt_true = simulate(H_exact, q, p, delta_t, steps)
qt_pred, pt_pred = simulate(H_neural, q, p, delta_t, steps)

fig, axes = plt.subplots(ncols=2, figsize=(8, 3))

axes[0].plot(qt_true, color="C0", label="q (true)")
axes[0].plot(qt_pred, "--", color="C0", label="q (pred)")
axes[0].plot(pt_true, color="C1", label="p (true)")
axes[0].plot(pt_pred, "--", color="C1", label="p (pred)")
axes[0].set_ylabel("q, p / a.u.")

axes[1].plot(qt_true - qt_pred, label="q")
axes[1].plot(pt_true - pt_pred, label="p")
axes[1].set_ylabel("error / a.u.")

for ax in axes.flat:
    ax.set_xlabel("step")
    ax.legend()
fig.tight_layout()

We observe the same behavior again:
- the true and predicted trajectories $(q(t), p(t))$ stay very close to each other
- the error oscillates around zero with an amplitude that increases over time

To summarize, a neural Hamiltonian can learn the dynamics of an exact Hamiltonian via gradient or trajectory matching.

In the next notebook, we will look at a double-well potential and thermostatting.