# Training a model on forces and energies

In addition to the energy, machine learning models can also be used to model response properties such as forces.
These are $N_\mathrm{atoms} \times 3$ arrays describing the Cartesian force acting on each atom due to the overall (potential) energy.
They are formally defined as the negative gradient of the energy $E_\mathrm{pot}$ with respect to the nuclear positions $\mathbf{R}$
\begin{equation}
\mathbf{F}^{(\alpha)} = -\frac{\partial E_\mathrm{pot}}{\partial \mathbf{R}^{(\alpha)}},
\end{equation}
where $\alpha$ is the index of the nucleus.

The above expression offers a straightforward way to include forces in machine learning models by simply defining a model for the energy and taking the appropriate derivatives. 
The resulting model can directly be trained on energies and forces.
Moreover, energy conservation and the correct behaviour under rotations of the molecule are guaranteed in this manner.
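As a small illustration of this idea, the following sketch uses a toy energy (harmonic springs between all atom pairs, an assumption made purely for this example and not the SchNet model) and derives forces from it. It checks one force component against a finite difference of the energy, and verifies that rotating the molecule rotates the forces accordingly. All function names are hypothetical.

```python
import math

def pair_energy(positions, k=1.0, r0=1.0):
    """Toy energy: harmonic springs between all atom pairs (depends on distances only)."""
    energy = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            r = math.dist(positions[i], positions[j])
            energy += 0.5 * k * (r - r0) ** 2
    return energy

def pair_forces(positions, k=1.0, r0=1.0):
    """Analytic forces F = -dE/dR of the toy energy."""
    forces = [[0.0, 0.0, 0.0] for _ in positions]
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            r = math.dist(positions[i], positions[j])
            for c in range(3):
                # dE/dR_i = k (r - r0) (R_i - R_j) / r; atom j gets the opposite sign
                grad = k * (r - r0) * (positions[i][c] - positions[j][c]) / r
                forces[i][c] -= grad
                forces[j][c] += grad
    return forces

def rotate_z90(vec):
    """Rotate a 3D vector by 90 degrees about the z axis."""
    x, y, z = vec
    return [-y, x, z]

positions = [[0.0, 0.0, 0.0], [1.2, 0.1, 0.0], [0.3, 0.9, 0.4]]
forces = pair_forces(positions)

# Check one component against a central finite difference of the energy
h = 1e-5
shifted_up = [p[:] for p in positions]
shifted_down = [p[:] for p in positions]
shifted_up[0][0] += h
shifted_down[0][0] -= h
fd_force = -(pair_energy(shifted_up) - pair_energy(shifted_down)) / (2 * h)

# Forces of the rotated molecule equal the rotated forces
forces_of_rotated = pair_forces([rotate_z90(p) for p in positions])
rotated_forces = [rotate_z90(f) for f in forces]

# Translational invariance of the energy implies a vanishing net force
net_force = [sum(f[c] for f in forces) for c in range(3)]
```

In a neural network model the derivative is obtained by automatic differentiation rather than by hand, but the same properties follow as long as the energy depends only on interatomic distances.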

Using forces in addition to energies to construct a machine learning model offers several advantages.
Accurate force predictions are important for molecular dynamics simulations, which will be covered in the subsequent tutorial.
Forces also encode a greater wealth of information than the energies. 
For every molecule, only one energy is present, while there are $3N_\mathrm{atoms}$ force entries.
This property, combined with the fact that reference forces can be computed at the same cost as energies, makes models trained on forces and energies very data efficient.
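To make the difference in label count concrete, here is the arithmetic for ethanol (9 atoms) and a training set of 1000 structures, the size used later in this tutorial:

```python
n_atoms = 9           # ethanol, C2H5OH
n_structures = 1000   # training set size used in this tutorial

energy_labels = n_structures               # one energy per structure
force_labels = n_structures * 3 * n_atoms  # three Cartesian components per atom

print("energy labels:", energy_labels)
print("force labels: ", force_labels)
```

The force labels of one structure are not statistically independent of each other, so the factor of 27 should be read as an upper bound on the additional information rather than as an exact gain.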

In the following, we will show how to train such force models and how to use them in practical applications.

## Preparing the data

The process of preparing the data is similar to the tutorial on [QM9](tutorial_02_qm9.ipynb). We begin by importing all relevant packages and generating a directory for the tutorial experiments.

In [12]:
import os
import torch
import torchmetrics
import schnetpack as spk
import schnetpack.transform as trn
import pytorch_lightning as pl
import schnetpack.properties as prp

forcetut = './forcetut'
if not os.path.exists(forcetut):
    os.makedirs(forcetut)

Next, the data needs to be loaded from a suitable dataset. 
For convenience, we use the MD17 dataset class provided in SchNetPack, which automatically downloads and builds suitable databases containing energies and forces for a range of small organic molecules. 
In this case, we use the ethanol molecule as an example.

In [2]:
from schnetpack.datasets import MD17

%rm split.npz

ethanol_data = MD17(
    os.path.join(forcetut,'ethanol.db'),
    molecule='ethanol',
    batch_size=100,
    num_train=1000,
    num_val=1000,
    transforms=[
        trn.TorchNeighborList(cutoff=5.),
        trn.RemoveOffsets(MD17.energy, remove_mean=True, remove_atomrefs=False),
        trn.CastTo32()
    ],
    property_units={MD17.energy: 'kcal/mol'},
    num_workers=1,
    split_file=os.path.join(forcetut, "split.npz")
)
ethanol_data.prepare_data()
ethanol_data.setup()

rm: cannot remove 'split.npz': No such file or directory


100%|██████████| 10/10 [00:15<00:00,  1.57s/it]


As in the last tutorial, downloading and data splitting are carried out directly by the data module.
Once again, we want to use the mean and standard deviation of the energies in the training data to precondition our model.
This only needs to be done for the energies, since the forces are obtained as derivatives and automatically capture the scale of the data.
Unlike in the case of QM9, the subtraction of atomic reference energies is not necessary, since only configurations of the same molecule are loaded.
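Conceptually, removing the mean before training and adding it back after prediction amounts to the following. This is a minimal sketch with made-up energies, not the actual implementation of the `RemoveOffsets` and `AddOffsets` transforms; the helper `to_absolute` is hypothetical.

```python
from statistics import mean

# Hypothetical training energies in kcal/mol (made-up numbers for illustration)
train_energies = [-97208.4, -97195.1, -97201.7, -97190.3]

mean_energy = mean(train_energies)

# Training targets: energies centred around zero
targets = [e - mean_energy for e in train_energies]

def to_absolute(offset_free_energy):
    """Add the stored mean back to recover an absolute energy."""
    return offset_free_energy + mean_energy

restored = [to_absolute(t) for t in targets]
```

Since a constant shift of the energy does not change its derivative, the forces are unaffected by this step.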

For custom datasets, the data would have to be loaded via the SchNetPack `ASEAtomsData` class.
In this case, one needs to make sure that the naming of properties is kept consistent. The `schnetpack.properties`
module provides standard names for a wide range of properties.
Here, we use the definitions provided with the `MD17` class.

In order to train force models, forces need to be included in the reference data.
Once the dataset has been loaded, this can be checked as follows:

In [3]:
properties = ethanol_data.train_dataset[0]
print('Loaded properties:\n', *['{:s}\n'.format(i) for i in properties.keys()])

Loaded properties:
 _idx
 energy
 forces
 _n_atoms
 _atomic_numbers
 _positions
 _cell
 _pbc
 _idx_i
 _idx_j
 _Rij



As you see, `energy` and `forces` are included in the properties dictionary. To have a look at the `forces` array
and check whether it has the expected dimensions, we can call:

In [4]:
print('Forces:\n', properties[MD17.forces])
print('Shape:\n', properties[MD17.forces].shape)


Forces:
 tensor([[-26.1604, -30.9251, -63.7383],
        [ -1.6113,  -9.5043, 106.5988],
        [ 23.3516,  10.0741, -34.2052],
        [ -6.7063,  15.5720,  -2.0589],
        [  7.5514, -18.2026,  -6.1878],
        [ -2.9997, -18.9138,   4.3195],
        [-32.5186,  26.9276, -32.8140],
        [ 19.7319,  22.4136,  -4.4714],
        [ 19.4675,   2.5799,  32.6996]])
Shape:
 torch.Size([9, 3])
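A further quick sanity check, sketched here in plain Python with the values copied from the output above: for an isolated molecule the energy does not change under a rigid translation, so the forces on all atoms should sum to approximately zero.

```python
# Force components copied from the printed tensor above (kcal/mol/Å)
forces = [
    [-26.1604, -30.9251, -63.7383],
    [-1.6113, -9.5043, 106.5988],
    [23.3516, 10.0741, -34.2052],
    [-6.7063, 15.5720, -2.0589],
    [7.5514, -18.2026, -6.1878],
    [-2.9997, -18.9138, 4.3195],
    [-32.5186, 26.9276, -32.8140],
    [19.7319, 22.4136, -4.4714],
    [19.4675, 2.5799, 32.6996],
]

n_atoms = len(forces)
n_components = len(forces[0])

# Sum each Cartesian component over all atoms
net_force = [sum(row[c] for row in forces) for c in range(n_components)]

print("shape:", (n_atoms, n_components))
print("net force:", [round(f, 4) for f in net_force])
```

For this structure every component of the net force is well below 1 kcal/mol/Å, which is small compared to the individual entries.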


## Building the model

After preparing the data, we can now build and train the force model.
This is done in the same two steps as described in the [QM9 tutorial](tutorial_02_qm9.ipynb):

1. Building the representation
2. Defining an output module

For the representation we can use the same `SchNet` layer as in the previous tutorial:

In [5]:
n_features = 128

cutoff = 5.
n_atom_basis = 30

radial_basis = spk.nn.GaussianRBF(n_rbf=20, cutoff=cutoff)
schnet = spk.representation.SchNet(
    n_atom_basis=n_atom_basis, n_interactions=3,
    radial_basis=radial_basis,
    cutoff_fn=spk.nn.CosineCutoff(cutoff)
)

Since we aim to model forces, we will use the `Forces` module that will calculate the forces as derivatives of the
energies predicted by the `Atomwise` module.
The corresponding `ModelOutput` objects, which specify the losses, loss weights and metrics to be logged for energies and forces, are defined afterwards.

In [6]:
pred_energy = spk.atomistic.Atomwise(n_in=n_atom_basis, output_key=MD17.energy)
pred_forces = spk.atomistic.Forces(energy_key=MD17.energy, force_key=MD17.forces)

To train the model on energies and forces, we need to update the loss function to include the latter.
This combined loss function is:
\begin{equation}
\mathcal{L}(E_\mathrm{ref},\mathbf{F}_\mathrm{ref},E_\mathrm{pred}, \mathbf{F}_\mathrm{pred}) = \frac{1}{n_\text{train}} \sum_{n=1}^{n_\text{train}} \left[  \rho_\text{energy} \left( E_\mathrm{ref} - E_\mathrm{pred} \right)^2  +  \frac{\rho_\text{forces}}{3N_\mathrm{atoms}} \sum^{N_\mathrm{atoms}}_{\alpha=1} \left\| \mathbf{F}_\mathrm{ref}^{(\alpha)} - \mathbf{F}_\mathrm{pred}^{(\alpha)} \right\|^2 \right].
\end{equation}

We have introduced the loss weights $\rho_\text{energy}$ and $\rho_\text{forces}$ in order to control the tradeoff between energy and force loss.
By varying these weights, the accuracy on energies and forces can be tuned.
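Written out in plain Python, the combined loss looks as follows. This is a sketch for illustration only: the function name `combined_loss` is hypothetical, and the default weights are the values 0.05 and 0.95 used in this tutorial.

```python
def combined_loss(e_ref, e_pred, f_ref, f_pred, rho_energy=0.05, rho_forces=0.95):
    """Weighted sum of squared energy and force errors, averaged over structures.

    e_ref, e_pred: lists with one energy per structure
    f_ref, f_pred: lists with one (n_atoms x 3) nested list of forces per structure
    """
    total = 0.0
    for er, ep, fr, fp in zip(e_ref, e_pred, f_ref, f_pred):
        energy_term = (er - ep) ** 2
        n_atoms = len(fr)
        # Squared norm of the force error, summed over atoms and components
        force_term = sum(
            (a - b) ** 2
            for atom_ref, atom_pred in zip(fr, fp)
            for a, b in zip(atom_ref, atom_pred)
        )
        total += rho_energy * energy_term + rho_forces * force_term / (3 * n_atoms)
    return total / len(e_ref)

# One structure with two atoms: energy error of 1, a single force component off by 1
e_ref, e_pred = [1.0], [0.0]
f_ref = [[[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]
f_pred = [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]

loss = combined_loss(e_ref, e_pred, f_ref, f_pred)
print(loss)
```

In this example the energy contributes 0.05 and the forces contribute 0.95/6, since the single squared error is averaged over the six force components.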

We define the model outputs:

In [7]:
output_energy = spk.atomistic.ModelOutput(
    name=MD17.energy,
    loss_fn=torch.nn.MSELoss(),
    loss_weight=0.05,
    metrics={
        "MAE": torchmetrics.MeanAbsoluteError()
    }
)

output_forces = spk.atomistic.ModelOutput(
    name=MD17.forces,
    loss_fn=torch.nn.MSELoss(),
    loss_weight=0.95,
    metrics={
        "MAE": torchmetrics.MeanAbsoluteError()
    }
)

All components are then assembled into the final model. It is important that the energy output module comes first
in the list, since the `Forces` module needs to access the energy predictions. The output modules are always
called in the order they are passed.

In [8]:
model = spk.atomistic.AtomisticModel(
    representation=schnet,
    output_modules=[pred_energy, pred_forces],
    outputs=[output_energy, output_forces],
    optimizer_cls=torch.optim.AdamW,
    optimizer_args={"lr":5e-4},
    postprocess=[trn.CastTo64(), trn.AddOffsets(MD17.energy, add_mean=True, add_atomrefs=False)]
)

## Training the model

Training the model now works in the same way as in the last tutorial:

In [9]:
logger = pl.loggers.TensorBoardLogger(save_dir=forcetut)
callbacks = [
    spk.train.ModelCheckpoint(
        inference_path=os.path.join(forcetut, "best_inference_model"),
        save_top_k=1,
        monitor="val_loss"
    )
]

trainer = pl.Trainer(
    callbacks=callbacks,
    logger=logger,
    default_root_dir=forcetut,
    max_epochs=3, # for testing, we restrict the number of epochs
)
trainer.fit(model, datamodule=ethanol_data)

GPU available: True, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs

  | Name           | Type       | Params
----------------------------------------------
0 | representation | SchNet     | 16.0 K
1 | outputs        | ModuleList | 0     
2 | input_modules  | ModuleList | 0     
3 | output_modules | ModuleList | 481   
4 | postprocessors | ModuleList | 0     
----------------------------------------------
16.4 K    Trainable params
0         Non-trainable params
16.4 K    Total params
0.066     Total estimated model params size (MB)


Validation sanity check: 0it [00:00, ?it/s]



Training: -1it [00:00, ?it/s]

Validating: 0it [00:00, ?it/s]

Validating: 0it [00:00, ?it/s]

Validating: 0it [00:00, ?it/s]

As in the previous tutorial, training will produce several files in the model directory `forcetut`.
A copy of the best model is stored in `best_inference_model`.
You can have a look at the training log using Tensorboard:
```
tensorboard --logdir=forcetut/default
```

Note that the model trained here serves demonstration purposes only.
Accordingly, its size and the training time have been reduced, which strongly limits the accuracy that can be obtained.
For practical applications, one would e.g. increase the number of features and interaction layers,
use a learning rate scheduler and train until convergence (e.g. by increasing `max_epochs` and using early stopping).

## Interface to ASE

As was shown in the [QM9 tutorial](tutorial_02_qm9.ipynb), one can also use the `AtomsConverter` to directly
operate on ASE atoms objects.

Having access to molecular forces also makes it possible to perform a variety of different simulations.
The `SpkCalculator` offers a simple way to perform all computations available in the ASE package.
Below, we create an ASE calculator from the trained model and attach it to an `atoms` object built from one of the test configurations.
An important point is that the MD17 dataset uses kcal/mol and kcal/mol/Å as units for energies and forces.
For the ASE interface, these need to be converted to the standard internal ASE units eV and eV/Å.
This can be done by passing either the conversion factor or a string denoting the unit to the keywords `energy_units` and `forces_units`. 

In [23]:
best_model = torch.load(os.path.join(forcetut, 'best_inference_model'))
converter = spk.interfaces.AtomsConverter(neighbor_list=trn.ASENeighborList(cutoff=5.), dtype=torch.float32)
calculator = spk.interfaces.SpkCalculator(model=best_model, converter=converter, energy=MD17.energy,
    forces=MD17.forces,
    energy_units='kcal/mol',
    forces_units='kcal/mol/A')
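For reference, the conversion implied by these unit strings can be written out by hand. The sketch below assumes the thermochemical calorie (4.184 J) and uses the Faraday constant to relate J/mol to eV; the helper names are hypothetical and not part of SchNetPack or ASE.

```python
# 1 kcal/mol in eV: 4184 J/mol divided by the Faraday constant (J/mol per eV)
KCAL_PER_MOL_IN_EV = 4184.0 / 96485.33212

def energy_to_ev(energy_kcal_per_mol):
    """Convert an energy from kcal/mol to eV."""
    return energy_kcal_per_mol * KCAL_PER_MOL_IN_EV

def forces_to_ev_per_angstrom(forces_kcal_per_mol_angstrom):
    """Convert forces from kcal/mol/Å to eV/Å.

    The length unit is the same on both sides, so the energy factor applies unchanged.
    """
    return [[c * KCAL_PER_MOL_IN_EV for c in atom] for atom in forces_kcal_per_mol_angstrom]

print("1 kcal/mol =", energy_to_ev(1.0), "eV")
print(forces_to_ev_per_angstrom([[10.0, 0.0, -10.0]]))
```

One kcal/mol corresponds to roughly 0.0434 eV, so energies and forces reported by the calculator are numerically much smaller than the raw MD17 values.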

The calculator can then be set for an ASE atoms object. After that, all ASE operations can be carried out
while using it as an interface to SchNetPack.

In [24]:
from ase import Atoms
props = ethanol_data.test_dataset[0]
atoms = Atoms(numbers=props[prp.Z], positions=props[prp.position])
atoms.set_calculator(calculator)

print('Prediction:')
print('energy:', atoms.get_total_energy())
print('forces:', atoms.get_forces())

print('Truth:')
print('energy:', props[MD17.energy])
print('forces:', props[MD17.forces])
print(best_model.postprocessors)

Prediction:
energy: -0.19704322751237244
forces: [[ 8.7245922e-07  1.7629648e-06  6.5420005e-07]
 [-8.6583617e-07  1.1417425e-06 -5.8072220e-07]
 [ 4.7002676e-08  6.0385139e-07  1.1046465e-06]
 [-1.4309309e-06 -3.2397054e-06  8.4199087e-08]
 [-1.1446546e-06  1.8793406e-06 -8.9261255e-07]
 [-2.5736330e-07  7.9135998e-07 -1.7031572e-06]
 [ 5.3865847e-07 -3.3917049e-06  1.6135522e-06]
 [ 1.2972332e-06  1.3266040e-06  1.5048033e-06]
 [ 9.4343136e-07 -8.7445295e-07 -1.7849093e-06]]
Truth:
energy: tensor([6.0858])
forces: tensor([[ 90.4475,  25.0147, -31.5468],
        [-60.5194,  33.3531, -29.9104],
        [ 23.3405,  16.5439,  38.4760],
        [-18.8251, -49.9789,   1.9826],
        [-23.6364,  20.5905,   1.6389],
        [  4.5124,   4.9655, -26.1553],
        [ 19.3486, -57.7242,  32.3682],
        [ -0.5251,   8.9837,  27.1222],
        [-34.1095,  -1.6611, -13.9780]])
ModuleList(
  (0): CastTo64()
  (1): AddOffsets()
)


Among the simulations which can be done by using ASE and a force model are geometry optimisation, normal mode analysis and simple molecular dynamics simulations.

The `AseInterface` of SchNetPack offers a convenient way to perform basic versions of these computations.
Only a file specifying the geometry of the molecule and a pretrained model are needed.

We will first generate an XYZ file containing an ethanol configuration:

In [None]:
from ase import io

# Generate a directory for the ASE computations
ase_dir = os.path.join(forcetut, 'ase_calcs')

if not os.path.exists(ase_dir):
    os.mkdir(ase_dir)

# Write a sample molecule
molecule_path = os.path.join( ase_dir, 'ethanol.xyz')
io.write(molecule_path, atoms, format='xyz')

The `AseInterface` is initialized by passing the path to the molecule, the model and a computation directory.
In addition, the computation device for the force model, the names of energies and forces in the model output, and their units need to be provided.

In [None]:
# `device` was not defined earlier; the model above was trained on the CPU
device = 'cpu'

ethanol_ase = spk.interfaces.AseInterface(
    molecule_path,
    best_model,
    ase_dir,
    device,
    energy=MD17.energy,
    forces=MD17.forces,
    energy_units='kcal/mol',
    forces_units='kcal/mol/A'
)

### Geometry optimization

For some applications it is necessary to relax a molecule to an energy minimum.
In order to perform this optimization of the molecular geometry, we can simply call

In [None]:
ethanol_ase.optimize(fmax=1e-4)

Since we trained only a reduced model, the accuracy of energies and forces is not optimal and several steps are needed to optimize the geometry.
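The idea behind such an optimization can be sketched with a simple steepest descent on a one-dimensional Morse potential. Both the potential and the fixed step size are assumptions made for this example; the optimizers used through ASE are more sophisticated, but they share the same stopping criterion based on the largest remaining force (`fmax`).

```python
import math

def morse_force(r, depth=4.0, width=2.0, r_eq=1.0):
    """Force along the bond, F = -dE/dr, for E = depth * (1 - exp(-width (r - r_eq)))^2."""
    x = math.exp(-width * (r - r_eq))
    return -2.0 * depth * width * (1.0 - x) * x

def relax(r, fmax=1e-4, step=0.01, max_steps=10000):
    """Follow the force until its magnitude drops below fmax."""
    for n_steps in range(max_steps):
        force = morse_force(r)
        if abs(force) < fmax:
            return r, n_steps
        r += step * force
    raise RuntimeError("optimization did not converge")

# Start from a stretched bond and relax towards the equilibrium distance r_eq = 1
r_opt, n_steps = relax(1.5)
print("optimized bond length:", r_opt)
print("steps needed:", n_steps)
```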

### Normal mode analysis

Once the geometry has been optimized, normal mode frequencies can be obtained from the Hessian (matrix of second derivatives) of the molecule.
The Hessian is a measure of the curvature of the potential energy surface, and normal mode frequencies are useful for determining whether an optimization has reached a minimum.
Using the `AseInterface`, normal mode frequencies can be obtained via:

In [None]:
ethanol_ase.compute_normal_modes()

Imaginary frequencies indicate that the geometry optimisation has not yet reached a minimum.
The `AseInterface` also creates a `normal_modes.xyz` file which can be used to visualize the vibrations with jmol.
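Why a negative curvature shows up as an imaginary frequency can be seen in a one-dimensional toy example. The double-well potential and all names below are assumptions for illustration; a real normal mode analysis diagonalizes the full mass-weighted Hessian instead.

```python
import cmath

def energy(x):
    """Toy double-well potential with minima at x = -1 and x = 1 and a barrier top at x = 0."""
    return x ** 4 - 2.0 * x ** 2

def curvature(x, h=1e-3):
    """Second derivative of the energy from central finite differences."""
    return (energy(x + h) - 2.0 * energy(x) + energy(x - h)) / h ** 2

def frequency(x, mass=1.0):
    """Harmonic frequency sqrt(k / m); imaginary if the curvature k is negative."""
    return cmath.sqrt(curvature(x) / mass)

freq_minimum = frequency(1.0)  # stationary point that is a minimum
freq_barrier = frequency(0.0)  # stationary point that is a maximum

print("at the minimum:", freq_minimum)
print("at the barrier:", freq_barrier)
```

At the minimum the frequency is real, while at the barrier top the square root of the negative curvature yields a purely imaginary value.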

### Molecular dynamics

Finally, it is also possible to run basic molecular dynamics simulations using this interface.
To do so, we first need to prepare the system, where we specify the name of the simulation file.
This routine automatically initializes the velocities of the atoms to random values corresponding to a certain average kinetic energy.

In [None]:
ethanol_ase.init_md(
    'simulation'
)
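A common way to draw such random velocities is to sample them from a Maxwell-Boltzmann distribution. The sketch below illustrates this in plain Python; it is not the code used by `init_md`, the function names are hypothetical, and the units are left generic (velocities come out in units of sqrt(eV/amu)).

```python
import math
import random

K_B = 8.617333262e-5  # Boltzmann constant in eV/K

def sample_velocities(masses, temperature, rng):
    """Draw each velocity component from a Gaussian with variance k_B T / m."""
    velocities = []
    for m in masses:
        sigma = math.sqrt(K_B * temperature / m)
        velocities.append([rng.gauss(0.0, sigma) for _ in range(3)])
    return velocities

def instantaneous_temperature(masses, velocities):
    """T = 2 E_kin / (3 N k_B), counting three degrees of freedom per atom."""
    e_kin = sum(0.5 * m * sum(c * c for c in v) for m, v in zip(masses, velocities))
    return 2.0 * e_kin / (3.0 * len(masses) * K_B)

rng = random.Random(0)

# Ethanol: two carbon, one oxygen and six hydrogen atoms (masses in amu)
masses = [12.011, 12.011, 15.999] + [1.008] * 6

# A single draw can deviate noticeably from the requested 300 K ...
t_single = instantaneous_temperature(masses, sample_velocities(masses, 300.0, rng))

# ... while the average over many independent draws approaches it
draws = [
    instantaneous_temperature(masses, sample_velocities(masses, 300.0, rng))
    for _ in range(2000)
]
t_mean = sum(draws) / len(draws)

print("single draw:", t_single)
print("mean over draws:", t_mean)
```

With only nine atoms, the temperature of an individual draw fluctuates strongly around the target value.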

The actual simulation is performed by calling the function `run_md` with a certain number of steps:

In [None]:
ethanol_ase.run_md(1000)

During simulation, energies and geometries are logged to `simulation.log` and `simulation.traj`, respectively.

We can, for example, visualize the evolution of the system's total and potential energies as follows:

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Load logged results
results = np.loadtxt(os.path.join(ase_dir, 'simulation.log'), skiprows=1)

# Determine time axis
time = results[:,0]

# Load energies
energy_tot = results[:,1]
energy_pot = results[:,2]
energy_kin = results[:,3]

# Construct figure
plt.figure(figsize=(14,6))

# Plot energies
plt.subplot(2,1,1)
plt.plot(time, energy_tot, label='Total energy')
plt.plot(time, energy_pot, label='Potential energy')
plt.ylabel('E [eV]')
plt.legend()

plt.subplot(2,1,2)
plt.plot(time, energy_kin, label='Kinetic energy')
plt.ylabel('E [eV]')
plt.xlabel('Time [ps]')
plt.legend()

temperature = results[:,4]
print('Average temperature: {:10.2f} K'.format(np.mean(temperature)))

plt.show()

As can be seen, the potential and kinetic energies fluctuate, while the total energy (sum of potential and kinetic energy) remains approximately constant.
This demonstrates the energy conservation obtained by modeling forces as energy derivatives.
Since this holds for any such model, however inaccurate, energy conservation alone is not a sufficient measure of the quality of the potential.
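Both points can be reproduced with a harmonic oscillator and a velocity Verlet integrator. In this sketch, which uses reduced units and hypothetical names, the spring constant `k=1.0` plays the role of the "true" potential and `k=3.0` that of an inaccurate model. Because the force is the exact derivative of the respective energy in both cases, the total energy stays nearly constant either way.

```python
def velocity_verlet(x, v, k, dt=0.05, n_steps=2000, m=1.0):
    """Integrate a harmonic oscillator and record the total energy after every step."""
    energies = []
    f = -k * x  # force as negative derivative of E = 0.5 k x^2
    for _ in range(n_steps):
        v += 0.5 * dt * f / m
        x += dt * v
        f = -k * x
        v += 0.5 * dt * f / m
        energies.append(0.5 * m * v ** 2 + 0.5 * k * x ** 2)
    return energies

def energy_spread(energies):
    """Difference between the largest and smallest total energy of a trajectory."""
    return max(energies) - min(energies)

spread_reference = energy_spread(velocity_verlet(1.0, 0.0, k=1.0))
spread_wrong = energy_spread(velocity_verlet(1.0, 0.0, k=3.0))

print("energy spread, reference potential:", spread_reference)
print("energy spread, wrong potential:    ", spread_wrong)
```

The remaining small fluctuations stem from the finite time step of the integrator, not from the potential.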

However, frequently one is interested in simulations where the system is coupled to an external heat bath.
This is the same as saying that we wish to keep the average kinetic energy of the system, and hence its temperature, close to a certain value.
Currently, the average temperature only depends on the random velocities drawn during the initialization of the dynamics.
Keeping the average temperature constant can be achieved by using a so-called thermostat.
In the `AseInterface`, simulations with a thermostat (to be precise, a Langevin thermostat) can be carried out by providing the `temp_bath` keyword.
A simulation with a target temperature of e.g. 300 K is performed via:

In [None]:
ethanol_ase.init_md(
    'simulation_300K',
    temp_bath=300,
    reset=True
)
ethanol_ase.run_md(20000)
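The effect of a Langevin thermostat can be illustrated in isolation: each velocity is damped by a friction term and perturbed by random noise whose strength is matched to the target temperature. The sketch below leaves out the interatomic forces entirely and uses reduced units (k_B = m = 1), so it only shows the thermostat part of the dynamics. All names and parameter values are assumptions for this example.

```python
import math
import random

def langevin_temperatures(n_particles=200, n_steps=500, target=300.0,
                          initial=50.0, friction=0.05, dt=1.0, seed=0):
    """Temperature trace of free 1D particles coupled to a Langevin heat bath.

    In reduced units (k_B = m = 1) the temperature is the mean of v^2.
    """
    rng = random.Random(seed)
    v = [rng.gauss(0.0, math.sqrt(initial)) for _ in range(n_particles)]
    damping = math.exp(-friction * dt)
    # Noise amplitude chosen such that the stationary variance of v equals the target
    noise = math.sqrt((1.0 - damping ** 2) * target)
    temperatures = []
    for _ in range(n_steps):
        v = [damping * vi + noise * rng.gauss(0.0, 1.0) for vi in v]
        temperatures.append(sum(vi * vi for vi in v) / n_particles)
    return temperatures

temperatures = langevin_temperatures()

# Average over the second half of the run, after the initial relaxation
late = temperatures[len(temperatures) // 2:]
mean_late = sum(late) / len(late)

print("first step:", temperatures[0])
print("late average:", mean_late)
```

Starting from a much colder system, the temperature relaxes towards the target and then fluctuates around it.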

We can now once again plot total and potential energies.
Instead of the kinetic energy, we now plot the temperature (both quantities are directly related).

In [None]:
# Load logged results
results = np.loadtxt(os.path.join(ase_dir, 'simulation_300K.log'), skiprows=1)

# Determine time axis
time = results[:,0]
# Load energies
energy_tot = results[:,1]
energy_pot = results[:,2]

# Construct figure
plt.figure(figsize=(14,6))

# Plot energies
plt.subplot(2,1,1)
plt.plot(time, energy_tot, label='Total energy')
plt.plot(time, energy_pot, label='Potential energy')
plt.ylabel('Energies [eV]')
plt.legend()

# Plot Temperature
temperature = results[:,4]

# Compute average temperature
print('Average temperature: {:10.2f} K'.format(np.mean(temperature)))

plt.subplot(2,1,2)
plt.plot(time, temperature, label='Simulation')
plt.ylabel('Temperature [K]')
plt.xlabel('Time [ps]')
plt.plot(time, np.ones_like(temperature)*300, label='Target')
plt.legend()
plt.show()

Since our molecule is now subjected to external influences via the thermostat, the total energy is no longer conserved.
However, the simulation temperature now fluctuates around the requested 300 K.
This can also be seen by computing the temperature average over time, which, in contrast to the previous simulation, is now close to the desired value.

## Summary

In this tutorial, we have trained a SchNet model on energies and forces using the MD17 ethanol dataset as an example. 
We have then evaluated the performance of the model and performed geometry optimisation, normal mode analysis and basic molecular dynamics simulations using the SchNetPack ASE interface.

While these simulations can already be useful for practical applications, SchNetPack also comes with its own molecular dynamics package.
This package makes it possible to run efficient simulations on GPU and also offers access to advanced techniques, such as ring polymer dynamics.
In the next tutorial, we will cover how to perform molecular dynamics simulations directly with SchNetPack.