# Knowledge-guided ML for Dynamical Systems

This part of the homework focuses on knowledge-guided machine learning (KGML) methods for modeling dynamical systems. You will implement simplified versions of Neural ODE (NODE), Hamiltonian Neural Network (HNN), and Symplectic ODE-Net (SymODEN), which are methods we discussed in class. The last two methods blend physical knowledge with neural networks to enhance model interpretability and generalizability when modeling systems with inherent physical properties (like conservation of energy).


## Part 1: Neural ODE

The following libraries will be necessary for your implementation. Ensure you have them installed before proceeding.

In [1]:
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import scipy.integrate # SciPy module for integrating ODEs
solve_ivp = scipy.integrate.solve_ivp
from torchdiffeq import odeint  # Solver for ODEs integrated with PyTorch

# Custom functions and datasets for the pendulum problem
from data_pend import get_dataset, get_field, get_trajectory, dynamics_fn, hamiltonian_fn

# Set a random seed for reproducibility
seed = 888
torch.manual_seed(seed)

# Plotting parameters
LINE_SEGMENTS = 10
ARROW_SCALE = 40
ARROW_WIDTH = 6e-3
LINE_WIDTH = 2
DPI = 300
R = 2.7

### Data preparation

You will load the numerical solution of the ideal pendulum equations for the Hamiltonian $$H(q, p) = 1.5 p^2 + 5(1-\cos q).$$

Here the inverse mass is $3$ and the potential energy is $V(q) = 5(1 - \cos q)$. As depicted in the figure below, $q$ is the angular position and $p$ is the momentum. 

Your models will be trained on synthetic data generated from this Hamiltonian. The data consist of training and test sets of 25 trajectories each, with Gaussian noise of standard deviation $\sigma = 0.1$ added to every data point. Each trajectory has 30 observations, and each observation is the concatenation $(q, p)$. Note that our methods assume the exact Hamiltonian above is unknown.
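As a concrete check of these formulas, the NumPy sketch below evaluates $H$ and the vector field it induces through Hamilton's equations, $\dot q = \partial H/\partial p = 3p$ and $\dot p = -\partial H/\partial q = -5\sin q$, and adds i.i.d. Gaussian noise of standard deviation 0.1 to a batch of states. The function names are hypothetical: they are not the helpers in `data_pend`, whose implementation is not shown here, and the noise model used by `get_dataset` may differ in detail.

```python
import numpy as np

def pendulum_hamiltonian(q, p):
    """H(q, p) = 1.5 p^2 + 5 (1 - cos q): inverse mass 3, V(q) = 5 (1 - cos q)."""
    return 1.5 * p**2 + 5.0 * (1.0 - np.cos(q))

def pendulum_field(q, p):
    """Hamilton's equations: dq/dt = dH/dp, dp/dt = -dH/dq.

    Along this field dH/dt = (5 sin q)(3p) + (3p)(-5 sin q) = 0, so H is conserved.
    """
    dq_dt = 3.0 * p
    dp_dt = -5.0 * np.sin(q)
    return dq_dt, dp_dt

def add_observation_noise(states, std=0.1, seed=0):
    """Add i.i.d. Gaussian noise with the given standard deviation to every entry."""
    rng = np.random.default_rng(seed)
    return states + std * rng.standard_normal(states.shape)

# Example: at (q, p) = (0, 1) all of the energy is kinetic.
print(pendulum_hamiltonian(0.0, 1.0))  # 1.5
```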

In [2]:
# Use function to generate data
data = get_dataset(seed=seed)
# Convert data to PyTorch tensors
x = torch.tensor(data['x'], dtype=torch.float32)
test_x = torch.tensor(data['test_x'], dtype=torch.float32)
dx_dt = torch.tensor(data['dx'], dtype=torch.float32)
test_dx_dt = torch.tensor(data['test_dx'], dtype=torch.float32)

In [None]:
# visualize
field = get_field(xmin=-R, xmax=R, ymin=-R, ymax=R, gridsize=15)

###### PLOT ######
fig = plt.figure(figsize=(8, 2.6), facecolor='white', dpi=DPI)

# plot physical system
fig.add_subplot(1, 4, 1, frameon=True) 
plt.xticks([]) ;  plt.yticks([])
schema = mpimg.imread('pendulum.png')
plt.imshow(schema)
plt.title("Pendulum system", pad=10)

# plot a single trajectory for a given initial condition
y0 = np.asarray([2.1, 0])
fig.add_subplot(1, 4, 2, frameon=True)
x_traj, y_traj, dx_traj, dy_traj, t_traj = get_trajectory(t_span=[0,4], radius=2.1, y0=y0)
N = len(x_traj)
point_colors = [(i/N, 0, 1-i/N) for i in range(N)]
plt.quiver(field['x'][:,0], field['x'][:,1], field['dx'][:,0], field['dx'][:,1],
           cmap='gray_r', color=(.5,.5,.5))
plt.scatter(x_traj, y_traj, s=14, label='data', c=point_colors)
plt.xlabel("$q$", fontsize=14)
plt.ylabel("$p$", rotation=0, fontsize=14)
plt.title("One trajectory", pad=10)

# plot all trajectories (training data), each starting from a different initial condition
fig.add_subplot(1, 4, 3, frameon=True)
plt.quiver(field['x'][:,0], field['x'][:,1], field['dx'][:,0], field['dx'][:,1],
           cmap='gray_r', color=(.5,.5,.5))
plt.scatter(data['x'][:,0],data['x'][:,1],s=10, c='b',alpha=0.3)
plt.xlabel("$q$", fontsize=14)
plt.ylabel("$p$", rotation=0, fontsize=14)
plt.title("Many trajectories \n(our training data)", pad=10)

plt.tight_layout() ; plt.show()

### Neural ODE architecture

A basic architecture for your Neural ODE is provided. 

**Task 1: Complete the `simulate` method**
* Use the function `odeint` from `torchdiffeq`.
* The input `x0` is the initial condition.
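`odeint(func, y0, t, method='dopri5')` from `torchdiffeq` repeatedly calls `func(t, y)`, which is exactly the signature of `NeuralODE.forward`, and returns the solution at every time in `t` stacked along a new first dimension: the result has shape `(len(t), *y0.shape)` and its first entry is `y0`. The NumPy sketch below mimics that interface with a fixed-step classical Runge-Kutta (RK4) scheme to make the calling convention concrete. It is a stand-in only: `dopri5` is an adaptive Dormand-Prince method with error control, and `rk4_odeint` is a hypothetical name, not part of `torchdiffeq`.

```python
import numpy as np

def rk4_odeint(func, y0, t, substeps=10):
    """Integrate dy/dt = func(t, y) and return y at every time in t.

    Mirrors the odeint(func, y0, t) convention: the output has shape
    (len(t), *y0.shape) and its first row equals y0. Uses fixed-step RK4
    with `substeps` steps inside each interval of t.
    """
    y = np.asarray(y0, dtype=float)
    out = [y.copy()]
    for t_start, t_end in zip(t[:-1], t[1:]):
        h = (t_end - t_start) / substeps
        tk = t_start
        for _ in range(substeps):
            k1 = func(tk, y)
            k2 = func(tk + h / 2, y + h / 2 * k1)
            k3 = func(tk + h / 2, y + h / 2 * k2)
            k4 = func(tk + h, y + h * k3)
            y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
            tk = tk + h
        out.append(y.copy())
    return np.stack(out)

# Usage: the true pendulum field, written with the same (t, x) signature as NeuralODE.forward.
def pendulum_rhs(t, x):
    q, p = x
    return np.array([3.0 * p, -5.0 * np.sin(q)])

t_demo = np.linspace(0.0, 4.0, 50)
traj_demo = rk4_odeint(pendulum_rhs, np.array([2.1, 0.0]), t_demo)
print(traj_demo.shape)  # (50, 2)
```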

In [4]:
class NeuralODE(nn.Module):
    ''' NeuralODE on canonical coordinates'''
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()

        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, output_dim, bias=False)
        )

        # Initialize weights
        self._initialize_weights(self.net)

    def _initialize_weights(self, module):
        # Xavier (Glorot) initialization for all linear layers
        for layer in module:
            if isinstance(layer, nn.Linear):
                nn.init.xavier_uniform_(layer.weight)  # Initialize weights
                if layer.bias is not None:
                    nn.init.zeros_(layer.bias)  # Initialize biases to zero

    def forward(self, t, x):
        '''Compute the time derivative of x, dx/dt, using the neural network'''
        dx_dt = self.net(x)
        return dx_dt
    
    def simulate(self, x0, t):
        '''
        Simulate the system dynamics over time using the neural network output.
        - x0: Initial state of the system
        - t: Time points for simulation
        '''
        # TODO: Integrate using output of the neural network, use method dopri5
        return ...

**Task 2: Instantiate the Neural ODE Model**
* Define the dimensions (input_dim, hidden_dim, output_dim) for the network and instantiate the NODE model.
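The network in `NeuralODE` maps a state to its time derivative, so its input and output widths both equal the state dimension; for the pendulum the state is $(q, p)$. The hidden width is a free hyperparameter. The NumPy sketch below reproduces the same Linear-tanh-Linear-tanh-Linear stack, including Xavier (Glorot) uniform initialization with bound $\sqrt{6/(\text{fan\_in}+\text{fan\_out})}$, zero biases, and a bias-free last layer, purely to make the tensor shapes explicit. `hidden_dim=64` is an arbitrary illustrative value, not a recommended setting.

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng):
    """Glorot uniform init with gain 1; shape (out, in) like nn.Linear.weight."""
    bound = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-bound, bound, size=(fan_out, fan_in))

def init_mlp(input_dim, hidden_dim, output_dim, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "W1": xavier_uniform(input_dim, hidden_dim, rng), "b1": np.zeros(hidden_dim),
        "W2": xavier_uniform(hidden_dim, hidden_dim, rng), "b2": np.zeros(hidden_dim),
        "W3": xavier_uniform(hidden_dim, output_dim, rng),  # last layer has no bias
    }

def mlp_forward(params, x):
    """Apply the MLP row-wise to x of shape (N, input_dim); returns (N, output_dim)."""
    h = np.tanh(x @ params["W1"].T + params["b1"])
    h = np.tanh(h @ params["W2"].T + params["b2"])
    return h @ params["W3"].T

params = init_mlp(input_dim=2, hidden_dim=64, output_dim=2)
states_demo = np.random.default_rng(1).normal(size=(5, 2))  # five (q, p) pairs
print(mlp_forward(params, states_demo).shape)  # (5, 2)
```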

In [None]:
# TODO: Instantiate the model
node_model = NeuralODE(
    input_dim=...,  
    hidden_dim=..., 
    output_dim=..., 
)

# Display the neural network architecture
print(node_model)

### Train the Neural ODE 

**Task 3: Forward pass and loss computation**
* Do this for both the training and the test data.
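Both losses compare the model's predicted derivatives on the observed states with the observed derivatives (`dx_dt` for training, `test_dx_dt` for testing), and `nn.MSELoss()` averages the squared error over all elements. The NumPy sketch below shows the same derivative-matching loop on a deliberately simple model: a linear vector field $\dot{x} = A x$ fitted to derivatives of the small-angle pendulum ($\dot q = 3p$, $\dot p = -5q$). The linear model, the synthetic noise-free data, and plain gradient descent in place of Adam are all simplifications for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A_true = np.array([[0.0, 3.0],
                   [-5.0, 0.0]])                # small-angle pendulum: dq/dt = 3p, dp/dt = -5q
x_demo = rng.uniform(-1.0, 1.0, size=(200, 2))  # observed states (q, p)
dx_dt_demo = x_demo @ A_true.T                  # observed time derivatives

def mse(pred, target):
    """Mean over all elements, like nn.MSELoss() with its default reduction='mean'."""
    return np.mean((pred - target) ** 2)

A = np.zeros((2, 2))  # learnable parameters of the linear vector field
lr = 0.5
for iteration in range(2001):
    dx_dt_hat = x_demo @ A.T                     # forward pass: predicted derivatives
    loss = mse(dx_dt_hat, dx_dt_demo)            # compare with observed derivatives
    grad = 2.0 * (dx_dt_hat - dx_dt_demo).T @ x_demo / dx_dt_demo.size  # d(loss)/dA
    A -= lr * grad                               # plain gradient-descent update
    if iteration % 500 == 0:
        print(f"iteration {iteration}, train_loss {loss:.4e}")
```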


In [None]:
def train_model(model):
    """ Function to train our neural dynamical models """
    # Set up the optimizer
    optim = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
    # Set maximum training iterations and print frequency for logging
    max_iterations = 2000
    print_every = 100
    # Loss
    loss_fcn = nn.MSELoss()

    # Training loop
    for iteration in range(max_iterations+1):
        
        # TODO: Perform a forward pass on the training data and compute the loss
        # dx_dt_hat is the predicted time derivatives (model output)
        dx_dt_hat = ...
        loss = loss_fcn(...)
        loss.backward()
        optim.step()
        optim.zero_grad()

        if iteration % print_every == 0:
            # TODO: Same as before, now use test data
            test_dx_dt_hat = ...
            test_loss = loss_fcn(...)
            # logging
            print("iteration {}, train_loss {:.4e}, test_loss {:.4e}".format(iteration, loss.item(), test_loss.item()))

train_model(node_model)

## Part 2: Hamiltonian Neural Networks

### Hamiltonian Neural Network Architecture

A basic architecture for the HNN is provided. 

**Task 4: Compute derivatives based on Hamiltonian**
* Use Hamilton's equations, which give $dq/dt$ and $dp/dt$ in terms of the partial derivatives of $H$ with respect to $p$ and $q$.

**Task 5: Construct dx_dt**
* Put together the two derivatives for your canonical coordinates
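Hamilton's equations give the dynamics from the gradient of a scalar $H(q, p)$: $\dot q = \partial H / \partial p$ and $\dot p = -\partial H / \partial q$, i.e. $\dot{x} = J \nabla H$ with $J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$. Because $\nabla H^\top J \nabla H = 0$, $H$ stays constant along the resulting flow, which is the property an HNN builds in. The NumPy sketch below uses central finite differences as a stand-in for `torch.autograd.grad` and checks the result on the true pendulum Hamiltonian; the helper names are hypothetical.

```python
import numpy as np

def pendulum_H(x):
    """True pendulum Hamiltonian for a batch x of shape (N, 2) with columns (q, p)."""
    q, p = x[:, 0], x[:, 1]
    return 1.5 * p**2 + 5.0 * (1.0 - np.cos(q))

def grad_fd(H, x, eps=1e-6):
    """Central-difference gradient dH/dx of shape (N, 2); stands in for autograd."""
    grad = np.zeros_like(x)
    for j in range(x.shape[1]):
        step = np.zeros_like(x)
        step[:, j] = eps
        grad[:, j] = (H(x + step) - H(x - step)) / (2 * eps)
    return grad

def hamiltonian_field(H, x):
    """Return dx/dt = (dH/dp, -dH/dq) for each row of x."""
    dH_dx = grad_fd(H, x)
    dH_dq, dH_dp = dH_dx[:, 0], dH_dx[:, 1]
    dq_dt = dH_dp
    dp_dt = -dH_dq
    return np.stack([dq_dt, dp_dt], axis=1)

states = np.array([[1.0, 0.5], [-2.0, 1.2]])
fd_field = hamiltonian_field(pendulum_H, states)
analytic = np.stack([3 * states[:, 1], -5 * np.sin(states[:, 0])], axis=1)  # (3p, -5 sin q)
print(np.allclose(fd_field, analytic, atol=1e-6))  # True
```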

In [7]:
class HamiltonianNN(NeuralODE):
    ''' Hamiltonian neural net on canonical coordinates
        This class extends the NeuralODE class and incorporates the Hamiltonian formalism.

        It predicts the system's Hamiltonian and uses its gradient to compute 
        time derivatives of the system's canonical coordinates (position and momentum).
    '''
    def __init__(self, 
                 input_dim, hidden_dim, output_dim):
        super().__init__(input_dim, hidden_dim, output_dim)

    def hamiltonian(self, x):
        # Compute the Hamiltonian of the system using the neural network.
        H = self.net(x)
        return H

    def forward(self, t, x):
        """
        Compute the time derivative of the input state using the Hamiltonian formulation
        It returns the time derivative of the input state, containing dq/dt and dp/dt.
        """
        
        # Remember the original input shape, e.g. (2,) inside odeint or (N, 2) during training
        input_shape = x.shape

        # Reshape input to ensure it is of shape (-1, 2) for pairs of (q, p)
        x = x.reshape(-1, 2)

        # Enable gradient tracking for the input tensor to compute derivatives
        x = x.requires_grad_(True)

        # TODO: Predict the Hamiltonian for the current state
        H = ...

        # TODO: Compute the gradient of the Hamiltonian with respect to the input coordinates
        # (torch.autograd.grad needs a scalar output or grad_outputs, and create_graph=True
        #  so that the training loss can backpropagate through this gradient)
        dH_dx = ...

        # Reshape gradient to match the expected dimensions
        dH_dx = dH_dx.reshape(-1, 2)

        # TODO: Compute time derivatives using the Hamiltonian gradients
        dq_dt = ...
        dp_dt = ...

        # TODO: Put them together
        dx_dt = ...

        # Make sure the output has the same shape as the original input
        dx_dt = dx_dt.reshape(input_shape)

        return dx_dt



**Task 6: Instantiate the HNN Model**
* Define the dimensions (input_dim, hidden_dim, output_dim) for the network and instantiate the HNN model.

In [None]:
# TODO: Instantiate the HNN model
hnn_model = HamiltonianNN(
    input_dim=...,  
    hidden_dim=..., 
    output_dim=..., 
)
print(hnn_model)

In [None]:
# train model
train_model(hnn_model)

### Plot the solution

Run the code below to visualize the fields learned by the two models.

In [None]:
def get_vector_field(model, **kwargs):
    field = get_field(**kwargs)
    np_mesh_x = field['x']
    # run model
    mesh_x = torch.tensor(np_mesh_x, requires_grad=True, dtype=torch.float32)
    mesh_dx = model.forward(None, mesh_x)
    return mesh_dx.detach().numpy()

def energy_loss(true_x, integrated_x):
    true_energy = (true_x**2).sum(1)
    integration_energy = (integrated_x**2).sum(1)
    return np.mean((true_energy - integration_energy)**2)

# get their vector fields
R = 2.6
field = get_field(xmin=-R, xmax=R, ymin=-R, ymax=R, gridsize=10)
data = get_dataset(radius=2.0)
node_field = get_vector_field(node_model, xmin=-R, xmax=R, ymin=-R, ymax=R, gridsize=10)
hnn_field = get_vector_field(hnn_model, xmin=-R, xmax=R, ymin=-R, ymax=R, gridsize=10)

# integrate along those fields starting from the point (2.1, 0)
t_span = torch.linspace(0, 28, 1000, dtype=torch.float32)
x0 = torch.tensor([2.1, 0], dtype=torch.float32)
node_ivp = node_model.simulate(x0, t_span)
hnn_ivp = hnn_model.simulate(x0, t_span)


###### PLOT ######
fig = plt.figure(figsize=(11.3, 3.2), facecolor='white', dpi=DPI)

# plot physical system
fig.add_subplot(1, 4, 1, frameon=True) 
plt.xticks([]) ;  plt.yticks([])
schema = mpimg.imread('pendulum.png')
plt.imshow(schema)
plt.title("Pendulum system", pad=10)

# plot dynamics
fig.add_subplot(1, 4, 2, frameon=True)
x_traj, y_traj, dx_traj, dy_traj, t_traj = get_trajectory(t_span=[0,4], radius=2.1, y0=x0.detach().numpy())
N = len(x_traj)
point_colors = [(i/N, 0, 1-i/N) for i in range(N)]
plt.scatter(x_traj, y_traj, s=14, label='data', c=point_colors)

plt.quiver(field['x'][:,0], field['x'][:,1], field['dx'][:,0], field['dx'][:,1],
        cmap='gray_r', scale=ARROW_SCALE, width=ARROW_WIDTH, color=(.2,.2,.2))  
plt.xlabel("$q$", fontsize=14)
plt.ylabel("$p$", rotation=0, fontsize=14)
plt.title("Data", pad=10)

# plot neural ODE
fig.add_subplot(1, 4, 3, frameon=True)
plt.quiver(field['x'][:,0], field['x'][:,1], node_field[:,0], node_field[:,1],
        cmap='gray_r', scale=ARROW_SCALE, width=ARROW_WIDTH, color=(.2,.2,.2))

for i, l in enumerate(np.split(node_ivp.detach().numpy(), LINE_SEGMENTS)):
    color = (float(i)/LINE_SEGMENTS, 0, 1-float(i)/LINE_SEGMENTS)
    plt.plot(l[:,0],l[:,1],color=color, linewidth=LINE_WIDTH)
    
plt.xlabel("$q$", fontsize=14)
plt.ylabel("$p$", rotation=0, fontsize=14)
plt.title("Neural ODE", pad=10)

# plot HNN
fig.add_subplot(1, 4, 4, frameon=True)
plt.quiver(field['x'][:,0], field['x'][:,1], hnn_field[:,0], hnn_field[:,1],
        cmap='gray_r', scale=ARROW_SCALE, width=ARROW_WIDTH, color=(.2,.2,.2))

for i, l in enumerate(np.split(hnn_ivp.detach().numpy(), LINE_SEGMENTS)):
    color = (float(i)/LINE_SEGMENTS, 0, 1-float(i)/LINE_SEGMENTS)
    plt.plot(l[:,0],l[:,1],color=color, linewidth=LINE_WIDTH)

plt.xlabel("$q$", fontsize=14)
plt.ylabel("$p$", rotation=0, fontsize=14)
plt.title("Hamiltonian NN", pad=10)

plt.tight_layout() ; plt.show()

Now, let’s examine the error over time and the total energy. Recall that in our ideal pendulum system, there is no energy loss, so our goal is to learn dynamics that accurately conserve energy throughout the motion.
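One way to quantify "conserves energy" is the drift $\max_t |H(x(t)) - H(x(0))|$ along an integrated trajectory. The NumPy sketch below computes it for the true pendulum field with two generic fixed-step integrators chosen for illustration: classical RK4 keeps the drift tiny at this step size, whereas explicit Euler, which is not energy-preserving, lets the energy grow steadily. The same diagnostic can be applied to trajectories returned by `simulate`; the step size and horizon are illustrative choices.

```python
import numpy as np

def pend_H(state):
    q, p = state
    return 1.5 * p**2 + 5.0 * (1.0 - np.cos(q))

def pend_f(state):
    q, p = state
    return np.array([3.0 * p, -5.0 * np.sin(q)])

def euler_step(state, h):
    return state + h * pend_f(state)

def rk4_step(state, h):
    k1 = pend_f(state)
    k2 = pend_f(state + h / 2 * k1)
    k3 = pend_f(state + h / 2 * k2)
    k4 = pend_f(state + h * k3)
    return state + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def energy_drift(step, x0, h=0.01, n_steps=2000):
    """Largest deviation of H from its initial value along the trajectory."""
    state = np.asarray(x0, dtype=float)
    H0 = pend_H(state)
    drift = 0.0
    for _ in range(n_steps):
        state = step(state, h)
        drift = max(drift, abs(pend_H(state) - H0))
    return drift

x0_demo = [2.1, 0.0]  # same initial condition as above, integrated over t in [0, 20]
rk4_drift = energy_drift(rk4_step, x0_demo)
euler_drift = energy_drift(euler_step, x0_demo)
print(f"RK4 drift:   {rk4_drift:.2e}")
print(f"Euler drift: {euler_drift:.2e}")
```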

In [None]:
def integrate_models(dynamics_fn, model, x0, t_span, t_eval):
    kwargs = {'t_eval': t_eval, 'rtol': 1e-12}
    true_path = solve_ivp(fun=dynamics_fn, t_span=t_span, y0=x0, **kwargs)
    true_x = true_path['y'].T

    t_span_tensor = torch.tensor(t_eval, dtype=torch.float32)
    x0_tensor = torch.tensor(x0, dtype=torch.float32)
    model_x = model.simulate(x0_tensor, t_span_tensor).detach().numpy()

    return true_x, model_x

def evaluate_models(x0, t_span, model1, model1_name, model2, model2_name, dynamics_fn, hamiltonian_fn):

    t_eval = np.linspace(t_span[0], t_span[1], 2000)

    # Integrate the models
    true_x, model1_x = integrate_models(dynamics_fn, model1, x0, t_span, t_eval)
    _, model2_x = integrate_models(dynamics_fn, model2, x0, t_span, t_eval)

    # Plotting
    tpad = 7
    fig = plt.figure(figsize=[9, 3], dpi=100)

    plt.subplot(1, 3, 1)
    plt.title("Predictions", pad=tpad)
    plt.xlabel('$q$')
    plt.ylabel('$p$')
    plt.plot(true_x[:, 0], true_x[:, 1], 'k-', label='Ground truth', linewidth=2)
    plt.plot(model1_x[:, 0], model1_x[:, 1], 'r-', label=model1_name, linewidth=2)
    plt.plot(model2_x[:, 0], model2_x[:, 1], 'b-', label=model2_name, linewidth=2)
    plt.xlim(-2.5, 4)
    plt.ylim(-2.5, 4)
    plt.legend(fontsize=7)

    plt.subplot(1, 3, 2)
    plt.title("MSE between coordinates", pad=tpad)
    plt.xlabel('Time')
    plt.plot(t_eval, ((true_x - model1_x) ** 2).mean(-1), 'r-', label=model1_name, linewidth=2)
    plt.plot(t_eval, ((true_x - model2_x) ** 2).mean(-1), 'b-', label=model2_name, linewidth=2)
    plt.legend(fontsize=7)

    plt.subplot(1, 3, 3)
    plt.title("Total energy", pad=tpad)
    plt.xlabel('Time')
    true_e = np.stack([hamiltonian_fn(c) for c in true_x])
    model1_e = np.stack([hamiltonian_fn(c) for c in model1_x])
    model2_e = np.stack([hamiltonian_fn(c) for c in model2_x])
    plt.plot(t_eval, true_e, 'k-', label='Ground truth', linewidth=2)
    plt.plot(t_eval, model1_e, 'r-', label=model1_name, linewidth=2)
    plt.plot(t_eval, model2_e, 'b-', label=model2_name, linewidth=2)
    plt.legend(fontsize=7)

    plt.tight_layout()
    plt.show()

x0 = np.asarray([2.1, 0])
t_span = [0, 20]
evaluate_models(x0, t_span, node_model, 'Neural ODE', hnn_model, 'Hamiltonian NN', dynamics_fn, hamiltonian_fn)

## Part 3: Symplectic ODE-Net (SymODEN) 

Unlike HNNs, which learn the Hamiltonian as an unstructured function of $(q, p)$, SymODEN imposes a Hamiltonian structure that separates kinetic and potential energy. As with the earlier methods, we assume that system parameters such as the mass are unknown. However, we exploit the fact that, for mechanical systems in canonical coordinates, the Hamiltonian takes the general form:
$$H_{\theta_1, \theta_2}(\mathbf{q}, \mathbf{p}) = \frac{1}{2} \mathbf{p}^T \mathbf{M}_{\theta_1}^{-1}(\mathbf{q}) \mathbf{p} + V_{\theta_2}(\mathbf{q}),$$
where $\mathbf{q}$ and $\mathbf{p}$ represent position and momentum, respectively. In your implementation of SymODEN, you will build two neural networks: one to predict the inverse mass matrix $\mathbf{M}_{\theta_1}^{-1}(\mathbf{q})$ and another to predict the potential energy $V_{\theta_2}(\mathbf{q})$. This approach allows the model to capture the underlying structure of the physical system without explicitly knowing the system’s parameters, improving both interpretability and generalization.
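For a system with $n$ degrees of freedom, $\mathbf{M}^{-1}_{\theta_1}(\mathbf{q})$ is an $n \times n$ matrix and $V_{\theta_2}(\mathbf{q})$ is a scalar. For the pendulum ($n = 1$) both reduce to scalar functions of $q$, and choosing $M^{-1}(q) = 3$ and $V(q) = 5(1-\cos q)$ recovers $H(q,p) = 1.5p^2 + 5(1-\cos q)$. The batched NumPy sketch below evaluates the structured Hamiltonian for arbitrary $n$ with `np.einsum`; `M_inv_fn` and `V_fn` are hypothetical stand-ins for the two networks.

```python
import numpy as np

def structured_hamiltonian(q, p, M_inv_fn, V_fn):
    """H(q, p) = 0.5 * p^T M^{-1}(q) p + V(q), batched over the first axis.

    q, p: arrays of shape (N, n); M_inv_fn(q) -> (N, n, n); V_fn(q) -> (N,).
    """
    M_inv = M_inv_fn(q)
    kinetic = 0.5 * np.einsum("bi,bij,bj->b", p, M_inv, p)
    return kinetic + V_fn(q)

# Pendulum (n = 1): constant inverse mass 3 and V(q) = 5 (1 - cos q).
def M_inv_pendulum(q):
    return np.full((q.shape[0], 1, 1), 3.0)

def V_pendulum(q):
    return 5.0 * (1.0 - np.cos(q[:, 0]))

rng = np.random.default_rng(0)
q_demo = rng.uniform(-2.0, 2.0, size=(6, 1))
p_demo = rng.uniform(-2.0, 2.0, size=(6, 1))
H_structured = structured_hamiltonian(q_demo, p_demo, M_inv_pendulum, V_pendulum)
H_direct = 1.5 * p_demo[:, 0] ** 2 + 5.0 * (1.0 - np.cos(q_demo[:, 0]))
print(np.allclose(H_structured, H_direct))  # True
```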

**Task 7: Compute the Hamiltonian**
* Use your two neural networks and the equation above.

In [12]:
class SymODEN(HamiltonianNN):
    ''' 
    Symplectic ODE-Net (SymODEN)

    This class extends HamiltonianNN by introducing two neural networks to predict 
    the inverse mass matrix and potential energy. It leverages the 
    symplectic structure of physical systems to accurately capture the system dynamics 
    without direct knowledge of parameters like mass. 
    '''
    def __init__(self, 
                 input_dim, hidden_dim, output_dim):
        super().__init__(input_dim, hidden_dim, output_dim)

        # Neural network to predict the inverse mass matrix M^{-1}(q)
        self.M_net = nn.Sequential(
            nn.Linear(input_dim//2, hidden_dim//2),
            nn.Tanh(),
            nn.Linear(hidden_dim//2, hidden_dim//2),
            nn.Tanh(),
            nn.Linear(hidden_dim//2, output_dim, bias=False)
        )

        # Neural network to predict the potential energy V(q)
        self.V_net = nn.Sequential(
            nn.Linear(input_dim//2, hidden_dim//2),
            nn.Tanh(),
            nn.Linear(hidden_dim//2, hidden_dim//2),
            nn.Tanh(),
            nn.Linear(hidden_dim//2, output_dim, bias=False)
        )

        # Set the net attribute to None since SymODEN uses two separate networks
        self.net = None

        # Initialize the weights of both neural networks
        self._initialize_weights(self.M_net)
        self._initialize_weights(self.V_net)

    def hamiltonian(self, x):
        # Split the input tensor into position (q) and momentum (p)
        q, p = torch.chunk(x, 2, dim=1)
        
        # TODO: Compute the Hamiltonian using your two neural networks
        V_q = ...
        M_q_inv = ...
        H = ...
        return H
    


**Task 8: Instantiate the Model**
* Define the dimensions (input_dim, hidden_dim, output_dim) for the network and instantiate the model.

In [None]:
# TODO: Instantiate the model
symODE_model = SymODEN(
    input_dim=...,  
    hidden_dim=..., 
    output_dim=..., 
)
print(symODE_model)

In [2]:
# train model
train_model(symODE_model)

Now, let's compare this new model with the HNN.

In [4]:
x0 = np.asarray([2.1, 0])
t_span = [0, 20]
evaluate_models(x0, t_span, hnn_model, 'Hamiltonian NN', symODE_model, 'SymODEN', dynamics_fn, hamiltonian_fn)

### Plotting the Learned Functions for Inverse Mass and Potential Energy

Even if the total energy closely matches the ground truth, you may observe a non-trivial discrepancy (bias) between the predicted and true potential energy. This difference highlights subtle challenges in accurately learning individual components of the energy function. For a detailed explanation of this phenomenon, refer to the SymODEN paper.
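One possible source of such a bias can be checked numerically: Hamilton's equations involve only the gradient of $H$, so adding a constant $c$ to $V$ shifts $H$ by $c$ but leaves the vector field, and hence any loss on $dx/dt$, unchanged. The potential is therefore identifiable at best up to an additive constant. The NumPy sketch below compares finite-difference vector fields of the true pendulum Hamiltonian and of a copy shifted by $c = 4$, an arbitrary value; the helper names are hypothetical.

```python
import numpy as np

def shifted_pendulum_H(x, c=0.0):
    """Pendulum Hamiltonian with the potential shifted by a constant c; x has shape (N, 2)."""
    q, p = x[:, 0], x[:, 1]
    return 1.5 * p**2 + 5.0 * (1.0 - np.cos(q)) + c

def fd_vector_field(H, x, eps=1e-6):
    """(dH/dp, -dH/dq) via central differences, for x of shape (N, 2)."""
    e_q = np.array([eps, 0.0])
    e_p = np.array([0.0, eps])
    dH_dq = (H(x + e_q) - H(x - e_q)) / (2 * eps)
    dH_dp = (H(x + e_p) - H(x - e_p)) / (2 * eps)
    return np.stack([dH_dp, -dH_dq], axis=1)

grid = np.random.default_rng(0).uniform(-2.6, 2.6, size=(100, 2))
field_true = fd_vector_field(shifted_pendulum_H, grid)
field_shifted = fd_vector_field(lambda x: shifted_pendulum_H(x, c=4.0), grid)
print(np.abs(field_true - field_shifted).max())  # only floating-point round-off remains
```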

In [None]:
fig = plt.figure(figsize=(5, 2.5), dpi=DPI)
q = np.linspace(-R, R, 40)
q_tensor = torch.tensor(q, dtype=torch.float32).view(40, 1)

M_q_inv = symODE_model.M_net(q_tensor)
plt.subplot(1, 2, 1)
plt.plot(q, 3 * np.ones_like(q), label='Ground Truth', color='k', linewidth=2)
plt.plot(q, M_q_inv.detach().cpu().numpy(), 'b--', linewidth=3, label=r'SymODEN $M^{-1}_{\theta_1}(q)$')
plt.xlabel("$q$", fontsize=14)
plt.title("$M^{-1}(q)$", pad=10, fontsize=14)
plt.xlim(-R, R)
plt.ylim(0, 4)
plt.legend(fontsize=10)

V_q = symODE_model.V_net(q_tensor)
plt.subplot(1, 2, 2)
plt.plot(q, 5*(1-np.cos(q)), label='Ground Truth', color='k', linewidth=2)
plt.plot(q, V_q.detach().cpu().numpy(), 'b--', linewidth=3, label=r'SymODEN $V_{\theta_2}(q)$')
plt.xlabel("$q$", fontsize=14)
plt.title("$V(q)$", pad=10, fontsize=14)
plt.xlim(-R, R)
plt.ylim(-15, 20)
plt.legend(fontsize=10)
plt.tight_layout() ; plt.show()

**Task 9: Comment on the results**
* Compare methods quantitatively and qualitatively.
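For the quantitative part, two summary numbers per model follow naturally from the plots above: the time-averaged MSE between predicted and true coordinates, and the time-averaged squared deviation of the predicted energy from the true energy. The sketch below computes both from arrays shaped like `true_x` and `model1_x` in `evaluate_models`, i.e. `(T, 2)`. The function name and the choice of metrics are suggestions, not part of the assignment.

```python
import numpy as np

def summarize_rollout(true_x, model_x, hamiltonian):
    """Return (coordinate MSE, energy MSE) averaged over all time steps.

    true_x, model_x: arrays of shape (T, 2) with columns (q, p).
    hamiltonian: callable mapping one (q, p) row to a scalar energy.
    """
    coord_mse = np.mean((true_x - model_x) ** 2)
    true_e = np.array([hamiltonian(c) for c in true_x])
    model_e = np.array([hamiltonian(c) for c in model_x])
    energy_mse = np.mean((true_e - model_e) ** 2)
    return coord_mse, energy_mse

def pend_energy(c):
    """True pendulum energy of one (q, p) row, used here only for the toy check."""
    return 1.5 * c[1] ** 2 + 5.0 * (1.0 - np.cos(c[0]))

# Toy check: a hypothetical rollout that is off by 0.1 in p at every step.
true_toy = np.zeros((10, 2))
model_toy = np.tile([0.0, 0.1], (10, 1))
coord_mse, energy_mse = summarize_rollout(true_toy, model_toy, pend_energy)
print(f"coordinate MSE {coord_mse:.4e}, energy MSE {energy_mse:.4e}")
```

With the notebook's own objects, this could be called as `summarize_rollout(*integrate_models(dynamics_fn, hnn_model, x0, t_span, t_eval), hamiltonian_fn)`, with `t_eval` built as in `evaluate_models`.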