# Force Fields for Materials

Machine learning force fields have become increasingly popular for studying materials. However, the large size of the physical systems associated with most materials requires some tricks to keep the compute cost and the amount of required data sufficiently low.

In this tutorial, we will describe some of those tricks and how they can be implemented in the SchNetPack pipeline, namely:
- **periodic boundary conditions (PBC)**: PBC make it possible to reduce the number of simulated particles to only a fraction of the actual system's size. This is achieved by considering a relatively small simulation box, which is periodically repeated at its boundaries. In most cases, the resulting simulated periodic structure is a good approximation of the system under consideration.
- **cached neighbor lists**: For large systems, the computation of all neighbors is expensive. During training, this problem can be circumvented by caching the neighbor lists. This way, the neighbors only need to be computed in the first epoch. In the subsequent epochs the cached neighbor lists are loaded from the cache, which can reduce the training time substantially.
- **neighbor lists with skin buffer**: In molecular dynamics simulations or structure relaxations, caching neighbor lists is not useful since the neighborhoods change with each integration step. Hence, it is recommended to use a neighbor list with a so-called skin buffer. The latter is a thin layer that extends the cutoff region. It allows the calculated neighbor lists to be reused for samples with similar atomic positions. The neighbor list is recalculated only when at least one atom penetrates the skin buffer.
- **filtering out neighbors (neighbor list postprocessing)**: In the forward pass of the network, a large number of neighbors, and thus interactions, can also result in slow inference and training. In some scenarios it is crucial to have as few operations as possible in the model to ensure fast inference. This can be achieved, e.g., by filtering out some neighbors from the neighbor list.
- **prediction target emphasizing**: On some occasions it may be useful to exclude the properties of some atoms from the training procedure. For example, you might want to focus the training on the forces of only some atoms and neglect the rest. An exemplary use case is a model for structure optimization where some atoms are fixed during the simulation (zero forces). Likewise, when filtering out neighbors of certain atoms, it might be reasonable to exclude the corresponding atomic properties from the training loss.


In this tutorial, we will first describe how the dataset must be prepared so that the above-mentioned tricks can be used. Subsequently, we explain how the configs in the SchNetPack framework must be adapted accordingly for training and inference. The dataset preparation part is based on the tutorial script "tutorial_01_preparing_data". Please make sure you have understood the concepts introduced there before continuing with this tutorial.

## Preparing the Dataset

First we will demonstrate how to prepare the dataset to enable the use of periodic boundary conditions (PBC). "tutorial_01_preparing_data" describes how to prepare your own data for SchNetPack. For this purpose, it shows how to convert the publicly available dataset of uracil in vacuum to a database that SchNetPack can read (ASE format). Here we expand on that tutorial by describing how to add periodic boundary conditions to your system. This is achieved by setting the periodic boundary conditions on the ASE Atoms object before storing it in the database.

The periodic boundary of an ASE Atoms object can be specified as follows:

In [None]:
from ase import Atoms
import numpy as np


periodic_cell_x = 28.
periodic_cell_y = 24.
periodic_cell_z = 72.

ats = Atoms()
ats.pbc = np.array([True, True, True])
ats.set_cell(np.array([periodic_cell_x, periodic_cell_y, periodic_cell_z]))
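
To illustrate what the periodic cell implies for interatomic distances, the following minimal sketch computes the distance between two points under the minimum-image convention. It assumes an orthorhombic box that is periodic in all three directions. The helper `minimum_image_distance` is hypothetical and not part of ASE or SchNetPack; in practice the neighbor list providers take care of periodic images internally.

```python
import math


def minimum_image_distance(pos_a, pos_b, cell_lengths):
    """Distance between two points under the minimum-image convention.

    Assumes an orthorhombic cell that is periodic in all three directions.
    """
    squared = 0.0
    for a, b, length in zip(pos_a, pos_b, cell_lengths):
        delta = b - a
        # shift the component by whole cell lengths to the nearest periodic image
        delta -= length * round(delta / length)
        squared += delta * delta
    return math.sqrt(squared)


cell = (28.0, 24.0, 72.0)  # same box lengths as in the cell above
atom_a = (1.0, 1.0, 1.0)
atom_b = (27.0, 1.0, 1.0)

# the direct distance is 26, but the nearest periodic image is only 2 away
distance = minimum_image_distance(atom_a, atom_b, cell)
```

Two atoms that sit close to opposite faces of the box are therefore neighbors, which is why the cell and the pbc flags must be stored with every structure.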

To filter out certain neighbors from the neighbor list, one has to specify a list of atom indices. All interactions between the corresponding atoms are neglected. This set of atom indices must be stored in the dataset in the form of a numpy array. In our exemplary system we neglect all interactions between the atoms with indices 4, 10, and 15.

In [None]:
filtered_out_neighbors = np.array([4, 10, 15])
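
The effect of such an index list on a neighbor list can be sketched as follows. This is an illustration of the idea only, assuming a neighbor list in pair format and assuming that a pair is dropped when both of its atoms are in the index list; the function `filter_pairs` is hypothetical and is not the SchNetPack implementation.

```python
def filter_pairs(idx_i, idx_j, filtered_atoms):
    """Drop every pair (i, j) for which both atoms are in filtered_atoms."""
    excluded = set(filtered_atoms)
    kept = [
        (i, j)
        for i, j in zip(idx_i, idx_j)
        if not (i in excluded and j in excluded)
    ]
    kept_i = [i for i, _ in kept]
    kept_j = [j for _, j in kept]
    return kept_i, kept_j


# toy neighbor list in pair format: atom idx_i[k] interacts with atom idx_j[k]
idx_i = [4, 4, 10, 10, 15, 3]
idx_j = [10, 3, 4, 15, 7, 4]

kept_i, kept_j = filter_pairs(idx_i, idx_j, [4, 10, 15])
```

Interactions between a listed atom and an atom outside the list, such as the pair (4, 3), are kept in this sketch.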

To specify the atoms whose targets should be considered in the model optimization, one must define an array of booleans that marks each atom as considered (True) or neglected (False). This boolean array should be stored in the database along with other sample properties such as energy, forces, and the array of filtered out neighbors.

For an exemplary system of 20 atoms, the array of considered atoms could be defined as follows:

In [None]:
n_atoms = 20 # number of total atoms in the system
# initialize array specifying considered atoms
considered_atoms = np.ones(n_atoms, dtype=bool)

# atoms 4, 10, and 15 should be neglected in the model optimization
considered_atoms[[4, 10, 15]] = False
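
How such a boolean array restricts the training signal can be sketched with a masked mean squared error on the forces. This is a minimal pure Python illustration under the assumption that neglected atoms simply do not enter the loss; the function `masked_mse` is hypothetical and does not reproduce how SchNetPack applies its constraints internally.

```python
def masked_mse(predicted, target, considered):
    """Mean squared error over the force components of considered atoms only."""
    total = 0.0
    count = 0
    for pred_vec, target_vec, keep in zip(predicted, target, considered):
        if not keep:
            continue  # neglected atoms contribute nothing to the loss
        for p, t in zip(pred_vec, target_vec):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


# toy forces for three atoms; the second atom is neglected
predicted_forces = [(1.0, 0.0, 0.0), (5.0, 5.0, 5.0), (0.0, 2.0, 0.0)]
target_forces = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
considered = [True, False, True]

loss = masked_mse(predicted_forces, target_forces, considered)
```

The large error on the second atom has no influence on `loss`, so the optimization focuses on the remaining atoms.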

For the example of the uracil dataset, the data preparation with pbc, where some neighbors are filtered out and the targets of the corresponding atoms are neglected in the training procedure, would look as follows:

In [None]:
from ase import Atoms
import numpy as np

# load atoms from npz file. The loop below parses all structures in the file
data = np.load('./uracil_dft.npz')

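numbers = data["z"]
# uracil has 12 atoms, so all filtered indices must be smaller than 12;
# the indices used here are example values only
filtered_out_neighbors = np.array([4, 10])
atoms_list = []
property_list = []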
for positions, energies, forces in zip(data["R"], data["E"], data["F"]):
    ats = Atoms(positions=positions, numbers=numbers)

    # pbc
    ats.pbc = np.array([True, True, True])
    ats.set_cell(np.array([periodic_cell_x, periodic_cell_y, periodic_cell_z]))    
    
    # prediction target emphasizing
    n_atoms = len(numbers)
    considered_atoms = np.ones(n_atoms, dtype=bool)
    considered_atoms[filtered_out_neighbors.tolist()] = False
    
    properties = {"energy": energies, 
                  "forces": forces, 
                  "considered_atoms": considered_atoms, 
                  "filtered_out_neighbors": filtered_out_neighbors}
    property_list.append(properties)
    atoms_list.append(ats)
    
print('Properties:', property_list[0])

Subsequently, the dataset is created the same way as for uracil in vacuum, with the additional unitless properties "considered_atoms" and "filtered_out_neighbors".

In [None]:
from schnetpack.data import ASEAtomsData

new_dataset = ASEAtomsData.create(
    './new_dataset.db', 
    distance_unit='Ang',
    property_unit_dict={
        'energy':'kcal/mol', 'forces':'kcal/mol/Ang', "considered_atoms": "", "filtered_out_neighbors": ""
    }
)
new_dataset.add_systems(property_list, atoms_list)

## Adapting the Configs in the SchNetPack Framework

Now we will cover how to adapt the config files to enable the use of the above-mentioned tricks in SchNetPack's training procedure and MD framework.

### SchNetPack Training

Provided that an appropriate neighbor list provider (ASE or MatScipy) is used, storing the cell and the pbc flags in the database as described above is sufficient for using PBC in the SchNet/PaiNN framework.

Neighbor list caching is implemented in the SchNetPack transform ``schnetpack.transform.CachedNeighborList``. It essentially functions as a wrapper around a regular neighbor list. For further information regarding the CachedNeighborList please refer to the corresponding docstring in the SchNetPack code.
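
The caching idea itself can be sketched without SchNetPack: the neighbor list of a sample is computed on first access, written to disk, and loaded on every later access. The functions `brute_force_neighbors` and `cached_neighbors` below are hypothetical helpers for illustration and do not mirror the interface of ``CachedNeighborList``. The sketch assumes that the positions and the cutoff of a sample do not change between epochs, and it ignores periodic images for brevity.

```python
import os
import pickle
import tempfile


def brute_force_neighbors(positions, cutoff):
    """All pairs (i, j), i != j, closer than cutoff (no periodic images)."""
    pairs = []
    for i, pos_i in enumerate(positions):
        for j, pos_j in enumerate(positions):
            if i == j:
                continue
            dist_sq = sum((a - b) ** 2 for a, b in zip(pos_i, pos_j))
            if dist_sq < cutoff ** 2:
                pairs.append((i, j))
    return pairs


def cached_neighbors(sample_idx, positions, cutoff, cache_dir):
    """Load the neighbor list of a sample from disk, computing it only once."""
    cache_file = os.path.join(cache_dir, f"nbh_{sample_idx}.pkl")
    if os.path.exists(cache_file):
        with open(cache_file, "rb") as handle:
            return pickle.load(handle), True  # cache hit
    pairs = brute_force_neighbors(positions, cutoff)
    with open(cache_file, "wb") as handle:
        pickle.dump(pairs, handle)
    return pairs, False  # cache miss


positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
with tempfile.TemporaryDirectory() as cache_dir:
    # first epoch: the neighbor list is computed and stored
    first_epoch, hit_first = cached_neighbors(0, positions, 2.0, cache_dir)
    # second epoch: the stored neighbor list is loaded instead
    second_epoch, hit_second = cached_neighbors(0, positions, 2.0, cache_dir)
```

Because the cache is keyed by the sample index only, it must be discarded whenever the dataset or the cutoff changes.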

Neighbors can be filtered out by using the neighbor list postprocessing transform ``schnetpack.transform.FilterNeighbors``.

To ensure that only the specified atoms are considered for the training on a certain property, the respective ModelOutput object has to be adapted. This is achieved by using so-called constraints. Each ModelOutput object takes a list of constraints. For a precise explanation of how to use ``schnetpack.task.ModelOutput`` please refer to the notebook "tutorial_02_qm9". To specify the selection of atoms for training we use the constraint transform ``schnetpack.task.ConsiderOnlySelectedAtoms``. It has the attribute selection_name, which should be a string that matches the name of the array of specified atoms stored in the database.

The following is an example of an experiment config file that utilizes the above mentioned tricks.

### SchNetPack MD and Structure Relaxation

For MD simulations and structure relaxations it is preferable to use neighbor lists with skin buffers. The corresponding class in SchNetPack is called ``schnetpack.transform.SkinNeighborList``. It takes as arguments a conventional neighbor list such as ASENeighborList, post-processing transforms for manipulating the neighbor lists, and the cutoff skin, which defines the size of the skin buffer around the actual cutoff region. Please choose a sufficiently large cutoff skin so that, between two subsequent samples, no atom can pass through the skin into the cutoff sphere of another atom without already being in the neighbor list of that atom.
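
The reuse logic behind a skin buffer can be sketched as follows. The class `SkinBufferTracker` is a hypothetical illustration, not the SchNetPack implementation. It assumes the criterion commonly used for Verlet lists: the neighbor list, built with radius cutoff plus skin, is rebuilt as soon as any atom has moved by more than half the skin since the last build. Whether ``SkinNeighborList`` uses exactly this criterion is an assumption.

```python
import math


class SkinBufferTracker:
    """Decides when a neighbor list built with cutoff + skin must be rebuilt."""

    def __init__(self, cutoff_skin):
        self.cutoff_skin = cutoff_skin
        self.reference_positions = None
        self.n_rebuilds = 0

    def needs_rebuild(self, positions):
        if self.reference_positions is None:
            return True  # nothing has been built yet
        max_displacement = max(
            math.dist(new, old)
            for new, old in zip(positions, self.reference_positions)
        )
        # two atoms approaching each other can close a gap of twice their
        # individual displacement, hence the factor 0.5
        return max_displacement > 0.5 * self.cutoff_skin

    def update(self, positions):
        """Return True if the neighbor list has to be recomputed for positions."""
        if self.needs_rebuild(positions):
            self.reference_positions = [tuple(p) for p in positions]
            self.n_rebuilds += 1
            return True
        return False


tracker = SkinBufferTracker(cutoff_skin=0.4)
rebuilt = [
    tracker.update([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),  # first call: build
    tracker.update([(0.1, 0.0, 0.0), (1.0, 0.0, 0.0)]),  # moved 0.1: reuse
    tracker.update([(0.3, 0.0, 0.0), (1.0, 0.0, 0.0)]),  # moved 0.3: rebuild
]
```

A larger skin leads to fewer rebuilds but to more pairs per neighbor list, so the value is a trade-off between the two costs.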