# __ML NNP model comparison__
This notebook collects references for different ML-based interatomic potentials used to perform atomistic simulations. For each model, an introduction and a detailed description are provided, including the molecular descriptors and approximations used, as well as the architecture (if relevant).

Furthermore, a comparison of the models is provided, which is useful when evaluating the models and when comparing and **interpreting** their predictions.

### __Quick literature and references for each model__
- **MACE**
    - [Repo](https://github.com/ACEsuit/mace)
    - [Documentation](https://mace-docs.readthedocs.io/en/latest/)

    - Papers
        - https://colab.research.google.com/drive/10F257UXRmnxi9tL4LoYtJ64K9e_Bjra9?authuser=1#scrollTo=ow8aZiGNnPGG
        - https://arxiv.org/abs/2205.06643
        - https://arxiv.org/abs/2206.07697
        - https://pubs.aip.org/aip/jcp/article/159/4/044118/2904837/Evaluation-of-the-MACE-force-field-architecture
        - https://arxiv.org/abs/2401.00096
        - https://arxiv.org/abs/2308.14920
        - https://arxiv.org/abs/2312.15211


- **TorchANI**
    - [Repo](https://github.com/aiqm/torchani)
    - [Documentation](https://aiqm.github.io/torchani/)

    - Papers
        - https://arxiv.org/abs/2410.22570
        - ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost
        - Extending the Applicability of the ANI Deep Learning Molecular Potential to Sulfur and Halogens
        - TorchANI: A Free and Open Source PyTorch-Based Deep Learning Implementation of the ANI Neural Network Potentials

- **Orbital Materials** ORB NNP
    - [Repo](https://github.com/orbital-materials/orb-models)
    - ~~[Documentation]()~~ (non existent)

    - Papers
        - a
    
- **SchNetPack**
    - [Repo](https://github.com/atomistic-machine-learning/schnetpack/tree/master)
    - [Documentation](https://schnetpack.readthedocs.io/en/latest/)

    - Papers
        - a

- **Other NNPs**
    - [MatterSim](https://github.com/microsoft/mattersim). Note that Microsoft MatterSim is not supported with Apple Silicon, and no ASE interface is provided. Due to the lack of convenience, it will not be tested. 
    - [FeNNol](https://github.com/thomasple/FeNNol). Does provide an [ASE calculator](https://thomasple.github.io/FeNNol/fennol/ase.html).
    - [CHGNet](https://github.com/CederGroupHub/chgnet). [Webpage](https://chgnet.lbl.gov) and [Article](https://www.nature.com/articles/s42256-023-00716-3)


# __To classify__
A neural network potential (NNP) uses the regression capabilities of neural networks (NNs) to predict molecular potential energy surfaces.


- GNNs references:
    - https://www.youtube.com/watch?v=GXhBEj1ZtE8&t=224s
    - https://www.youtube.com/watch?v=ijmxpItkRjc
    - https://www.youtube.com/watch?v=lLlIOTtnMW8
    - https://www.youtube.com/watch?v=zCEYiCxrL_0
    - https://www.youtube.com/watch?v=cka4Fa4TTI4

# __MACE-MP__

## __1. Overview of the model__


## __2. In depth exploration__

### __2.1 Descriptors__

### __2.2 Approximations__

### __2.3 DL Architecture__

### __2.4 Dataset__

# __MACE-OFF__

## __1. Overview of the model__


## __2. In depth exploration__

### __2.1 Descriptors__

### __2.2 Approximations__

### __2.3 DL Architecture__

### __2.4 Dataset [TODO]__

# __ANI models. TorchANI.__

## __1. Overview of the model__

The ANI family of models aims to provide a transferable NNP that uses a highly modified version of the Behler and Parrinello symmetry functions to build single-atom atomic environment vectors (AEVs) as a molecular representation. An AEV probes an atom's radial and angular chemical environment and is a numerical representation of its local chemical environment. AEVs provide a way to exploit chemical locality and make it possible to train neural networks on data that spans both configurational and conformational space.

As discussed by Behler, there are three criteria that such representations must adhere to in order to ensure energy conservation and be useful for ML models: they must be rotationally and translationally invariant, the exchange of two identical atoms must yield the same result, and given a set of atomic positions and types the representation must describe a molecule's conformation in a unique way.

In 2007, Behler and Parrinello (BP) developed an approximate molecular representation, called symmetry functions (SFs), that takes advantage of chemical locality in order to make neural network potentials (NNPs) transferable. Bartók et al. also suggested an alternative representation called smooth overlap of atomic positions (SOAP), where the similarity between two neighborhood environments is directly defined.

BPSFs have been employed in numerous studies where neural network potentials (NNPs) are trained on molecular total energies sampled from MD data to produce a function that can predict total energies of molecular conformations outside of the training set. In general, the NNPs developed in these studies are non-transferable.

The original SFs lack the functional form to create recognizable features (spatial arrangements of atoms found in common organic molecules, e.g. a benzene ring, alkenes, functional groups) in the molecular representation and have limited atomic number differentiation, which empirically hinders training in complex chemical environments.

## __2. In depth exploration__

### __2.1 Descriptors__
ANI uses a modified version of the original SFs to build single-atom atomic environment vectors (AEVs) as a molecular representation. AEVs are a form of Behler and Parrinello-type descriptors. These descriptors must be invariant under translation, rotation, and permutation of atoms of the same type (atomic number). BPSFs satisfy translational and rotational invariance because they depend only on distances and internal coordinates.

A transferable model must rely on chemical locality and invariant descriptors, and the NN size must be independent of the number of atoms. To meet these conditions, the total energy $E_T$ is computed as a sum of atom-differentiated contributions $E_i^X$, where $X$ is the atomic number and $i$ the index of the atom. Permutation symmetry is satisfied if the same NN is applied to the AEVs of all atoms of a given type.

Typical inputs, such as internal coordinates or Coulomb matrices, lack transferability to different molecules because the input size of a neural network must remain constant.

Atomic numbers and coordinates are used as inputs to compute the AEVs ($G_i^X$, the descriptor of the chemical environment of each atom), and the same NN is used for atoms of the same type, satisfying all three invariance conditions. For an atom $i$, $G_i^X$ is designed to give a numerical representation, accounting for both radial and angular features, of $i$'s local chemical environment: $G_i^X = \{G_1, G_2, G_3, \dots, G_M\}$, where each $G_k$ probes a specific region of the atom's radial and angular environment. The local atomic environment approximation is achieved with a piecewise cutoff function $f_C(R_{ij})$.
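
As an illustration of how a cutoff function localizes the descriptor, the radial part of an AEV can be sketched as below. The cosine form of the cutoff and the Gaussian radial terms follow the usual Behler and Parrinello-type definition; the function names and the parameter values (`eta`, `shifts`, `r_cut`) are assumptions made for this sketch, not the settings of any released ANI model, and species differentiation is ignored for brevity.

```python
import numpy as np

def cosine_cutoff(r, r_cut):
    """Piecewise cutoff: decays smoothly to zero at r_cut, zero beyond it."""
    r = np.asarray(r, dtype=float)
    return np.where(r <= r_cut, 0.5 * np.cos(np.pi * r / r_cut) + 0.5, 0.0)

def radial_aev(coords, i, shifts, eta=16.0, r_cut=4.6):
    """Radial descriptor of atom i: one Gaussian-weighted neighbor sum per shift."""
    coords = np.asarray(coords, dtype=float)
    shifts = np.asarray(shifts, dtype=float)
    r_ij = np.linalg.norm(coords - coords[i], axis=1)
    r_ij = np.delete(r_ij, i)                       # drop the self-distance
    gauss = np.exp(-eta * (r_ij[None, :] - shifts[:, None]) ** 2)
    return (gauss * cosine_cutoff(r_ij, r_cut)[None, :]).sum(axis=1)

# Hypothetical water-like geometry (angstrom) and 8 evenly spaced shifts
coords = np.array([[0.00, 0.00, 0.00],
                   [0.96, 0.00, 0.00],
                   [-0.24, 0.93, 0.00]])
shifts = np.linspace(0.5, 4.0, 8)
g_oxygen = radial_aev(coords, 0, shifts)
```

Because only interatomic distances enter the sum, the resulting vector does not change when the molecule is rotated or translated, or when neighbors are reordered.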

Finally, the total energy of a molecule, $E_T$, is computed from the outputs, $E_i$, all 'atomic contributions' to the total energy.
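
A minimal sketch of this atomic decomposition is given below: one small network per element, and $E_T$ obtained by summing the per-atom outputs. The layer sizes, the random weights, the stand-in AEVs and all function names are assumptions for illustration only; the Gaussian hidden activation is also an assumption of the sketch.

```python
import numpy as np

def gaussian_activation(x):
    return np.exp(-x ** 2)

def init_mlp(sizes, rng):
    """Random weights and zero biases for a small fully connected network."""
    return [(rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def atomic_energy(aev, params):
    """Forward pass: Gaussian hidden activations, linear output node."""
    h = aev
    for w, b in params[:-1]:
        h = gaussian_activation(h @ w + b)
    w, b = params[-1]
    return (h @ w + b).item()

def total_energy(species, aevs, networks):
    """E_T = sum over atoms of E_i^X, using the network of each atom's species."""
    return sum(atomic_energy(aev, networks[x]) for x, aev in zip(species, aevs))

rng = np.random.default_rng(0)
aev_len = 16                                    # illustrative, not a real AEV size
networks = {x: init_mlp([aev_len, 8, 8, 1], rng) for x in "HCNO"}
species = ["O", "H", "H"]
aevs = rng.normal(size=(3, aev_len))            # stand-in AEVs
e_total = total_energy(species, aevs, networks)
```

Swapping the two hydrogen atoms (together with their AEVs) leaves `e_total` unchanged, since both are routed through the same hydrogen network and the sum does not depend on atom order.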



**Summary.** Given a certain conformation (distances and angles), the AEVs $G_i^X$ are computed for each atom $i$ with atomic number $X$; their elements are the modified Behler and Parrinello symmetry functions $G_m^R$ (radial) and $G_m^A$ (angular). From the AEV, the contribution of atom $i$ to the energy, $E_i^X$, is predicted using an MLP specific to $X$. ANI-1x considers H, C, N, and O, so there are only 4 NNs, each with its own weights and biases. Atoms with the same atomic number use the same NN.



### __2.2 Approximations__
Modified BPSFs are used to generate local atomic descriptors. These functions have two contributions, radial ($G_m^R$) and angular ($G_m^A$), and include a cutoff function $f_C(R_{ij})$. Because locality is assumed, long-range contributions are missing: only the short-range interactions are learnt by the model.

AEVs are type differentiated (discriminate between atomic number), assigning different $m$ parameters to $G_m^R$ and $G_m^A$. This allows for better training as interactions can be modeled better. 

The radial AEV is further divided into subAEVs according to atom species. Similarly, the angular AEV is further divided into subAEVs according to pairs of atom species. Each subAEV only cares about neighbor atoms of its corresponding species/pair of species. Loosely speaking, we can think of an AEV as counting the number of atoms for different species/pairs of species, at different distances and angles.


### __2.3 DL Architecture__


For the ANI-1 potential, 32 evenly spaced radial shifting parameters are used for the radial part of $G_i^X$ and a total of 8 radial and 8 angular shifting parameters are used for the angular part.

BPSF cutoff radii of 4.6 Å for the radial and 3.1 Å for the angular symmetry functions are used.

The network is a fully connected NN (MLP) with a 768 : 128 : 128 : 64 : 1 architecture (3 hidden layers) and 124 033 optimizable parameters per atomic-number NN. It uses a Gaussian activation function and a linear (not ReLU) output node. The cost function is exponential (to penalize large energy errors more strongly) and is computed with respect to DFT reference energies. Training uses the ADAM optimizer and max-norm regularization.
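
The input width of 768 can be recovered from the numbers above, assuming (as described in section 2.2) one radial subAEV per species and one angular subAEV per unordered pair of species, for the four ANI-1 elements. The variable names are only for this sketch.

```python
from math import comb

n_species = 4                  # H, C, N, O
n_radial_shifts = 32           # radial part
n_ang_radial_shifts = 8        # angular part: radial shifts
n_ang_angular_shifts = 8       # angular part: angular shifts

# Unordered species pairs, including same-species pairs (HH, HC, ..., OO)
n_pairs = comb(n_species, 2) + n_species

radial_len = n_species * n_radial_shifts
angular_len = n_pairs * n_ang_radial_shifts * n_ang_angular_shifts
aev_len = radial_len + angular_len
print(radial_len, angular_len, aev_len)   # 128 640 768
```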

### __2.4 Dataset [TODO]__

The ANI dataset is generated using Normal Mode Sampling (NMS). The idea is to generate a set of data points on the potential energy surface, or a window, around a minimum-energy structure of a molecule out to some maximum energy. The normal modes are computed and the equilibrium structure is perturbed along the (orthonormal) normal mode vectors by a given displacement, out to a maximum energy. Once the displacement is known, a single-point calculation is performed. In this way, NMS generates a set of structures on which electronic structure calculations are run.
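
The displacement step can be sketched as follows. This is a simplified harmonic picture written for these notes, not the exact formula of the ANI papers: a chosen energy is split randomly among the modes, and each mode is displaced by the amplitude that stores its share in a harmonic well. The function name, the placeholder modes and the force constants are all assumptions; in practice the modes and force constants come from a frequency calculation, and the energy would be drawn up to the chosen maximum.

```python
import numpy as np

def sample_normal_modes(eq_coords, modes, force_consts, energy, rng):
    """Displace an equilibrium geometry along its normal modes.

    A random share c_k of `energy` goes to mode k, which in the harmonic
    approximation gives an amplitude |R_k| = sqrt(2 * c_k * energy / K_k).
    """
    n_modes = len(force_consts)
    c = rng.dirichlet(np.ones(n_modes))               # random shares, sum to 1
    signs = rng.choice([-1.0, 1.0], size=n_modes)
    amplitudes = signs * np.sqrt(2.0 * c * energy / force_consts)
    displacement = np.tensordot(amplitudes, modes, axes=1)    # (n_atoms, 3)
    return eq_coords + displacement, amplitudes

rng = np.random.default_rng(0)
eq_coords = np.zeros((3, 3))                          # placeholder geometry
modes = rng.normal(size=(3, 3, 3))                    # placeholder mode vectors
modes /= np.linalg.norm(modes.reshape(3, -1), axis=1)[:, None, None]
force_consts = np.array([0.5, 1.0, 2.0])              # placeholder force constants
new_coords, amps = sample_normal_modes(eq_coords, modes, force_consts, 0.1, rng)
harmonic_energy = 0.5 * np.sum(force_consts * amps ** 2)
```

Each sampled structure would then be passed to a single-point electronic structure calculation to obtain its reference energy.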


Training data is generated with the Normal Mode Sampling (NMS) method for generating molecular conformations.

Training on trajectories can bias the fitness of a model to the specific trajectory used for training.


The accuracy of any empirical potential, especially an ANI potential, is highly dependent on the amount, quality of, and types of interactions included in the data used to train the model. 

The data set contains organic molecules with four atom types: H, C, N, and O. It is also restricted to near-equilibrium conformations.

Data sets have been developed with a similar search of chemical space; however, these data sets only cover configurational space and not conformational space, which is a requirement for training an ANI-class potential.

The ωB97X hybrid meta-GGA DFT functional with the 6-31G(d) basis set is chosen to produce the reference QM data.

Molecules with up to 8 heavy atoms (C, N, and O) have been computed.

The potential surface of each molecule is sampled around its equilibrium structure.

**ANI-2x** uses an automated active learning algorithm to extend the model to the elements S, F, and Cl.

To generate nonequilibrium conformations, we employed dimer sampling, normal-mode sampling, N trajectory molecular dynamics sampling, and ML-driven torsion sampling in all iterations of active learning.

RDKit can be used to generate conformers, which can then be optimized.


# __ORB Forcefield__

## __1. Overview of the model__
A family of universal interatomic potentials for atomistic modelling of materials. It is based on a scalable graph neural network architecture which learns the complexity of atomic interactions and their invariances from data, rather than using architecturally constrained models which respect rotational equivariance or particular group symmetries.

Additionally, it includes a model with learned dispersion corrections, making it suitable for accurately modeling materials where Van der Waals forces play a significant role — such as layered materials, molecular crystals, or systems with weak intermolecular interactions.

## __2. In depth exploration__

### __2.1 Descriptors__
The model takes only the atomic coordinates as geometric input. It is a GNN which, similarly to a CNN, is independent of the input size. Each node is an atom, with edges connecting it to the other atoms within a certain cutoff.

### __2.2 Approximations__
Orb is non-conservative: it directly predicts forces rather than computing the gradient of an energy function. Conservative models like MACE automatically generate force predictions that have zero net force and (for non-periodic systems) zero net torque. Strictly speaking, being conservative is not sufficient for these properties; zero net force also requires translation invariance and zero net torque requires rotational equivariance. Computing gradients of energy functions is not intrinsically necessary for stable MD; it is just one strategy to ensure certain desirable properties for the force predictions.

Forces are not computed as analytical derivatives of the energy. We do not implement our forcefield as a conservative vector field; forces are predicted 'as is' from the geometric configuration. This requires two adjustments to the predicted forces: net force removal and net torque removal (the latter is only applied to non-PBC systems).

Invariance/equivariance with respect to rotation is often considered to be a critical property of such models. Many popular universal interatomic potentials (UIPs), such as MACE, respect these symmetries by construction, but this requires a unique set of design rules. Learned equivariance has been demonstrated to be a function of model size, accuracy and architecture. Unconstrained models can learn fundamental invariances from data whilst also being more performant.

The idea is to approximate behaviors and properties of both conservative forcefields and models which are equivariant by construction, whilst exploiting the substantial benefits that come with relaxing these assumptions.

NTR (net torque removal) is inexpensive and has a great impact on MD quality. 

By removing the mean force vector from force predictions (Eq. 1), we ensure the net force acting on the system is zero, a property of a conservative vector field. By removing torque from predicted forces (Section 6.3), we implement an equivalent constraint to an irrotational forcefield, again a property of conservative forcefields.
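
Both adjustments can be sketched with a few lines of NumPy. Net force removal is the mean subtraction described above. For the torque, the sketch subtracts a rotation-like correction $a \times r_i$ chosen so that the total torque vanishes; this particular construction is an assumption made for these notes and is not necessarily the procedure used in the Orb paper. The function name is hypothetical.

```python
import numpy as np

def remove_net_force_and_torque(positions, forces):
    """Return forces adjusted to have zero net force and zero net torque."""
    r = positions - positions.mean(axis=0)           # positions relative to centroid
    f = forces - forces.mean(axis=0)                 # net force removal
    torque = np.cross(r, f).sum(axis=0)
    # Unit-mass inertia-like tensor: sum_i (|r_i|^2 * I - r_i r_i^T)
    inertia = (r ** 2).sum() * np.eye(3) - r.T @ r
    a = np.linalg.pinv(inertia) @ torque             # pinv tolerates collinear atoms
    return f - np.cross(a, r)                        # subtract a x r_i from each atom

rng = np.random.default_rng(0)
positions = rng.normal(size=(5, 3))
raw_forces = rng.normal(size=(5, 3))                 # stand-in for predicted forces
fixed = remove_net_force_and_torque(positions, raw_forces)
net_force = fixed.sum(axis=0)
net_torque = np.cross(positions, fixed).sum(axis=0)
```

Since the correction is taken relative to the centroid, it sums to zero over all atoms and therefore does not reintroduce a net force.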

We produce models which are approximately roto-invariant/equivariant. Importantly, we can achieve these properties (or good approximations to these properties) without complex architectural constraints or expensive gradient computations.


### __2.3 DL Architecture__
The architecture used is a non-equivariant (non-conservative) Graph Network Simulator augmented with a smoothed graph attention mechanism, where messages between nodes are updated based on both attention weights and distance-based cutoff functions.

- Nodes.
    - Vector embeddings of the atomic type. Node embeddings explicitly do not contain absolute position information to preserve translational invariance
- Edges
    - Only atoms within a pbc-aware cutoff-radius from each other are connected
- PBCs
    - We do not provide the unit cell as an input to the model; it is only implicit through the edge construction
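
The edge construction in the list above can be sketched as follows. The minimum-image handling for an orthorhombic cell is an assumption of this sketch (the Orb code may build periodic neighbor lists differently), and the function name and test geometry are hypothetical.

```python
import numpy as np

def build_edges(positions, cutoff, box=None):
    """Directed edges (i, j) for all atom pairs closer than cutoff.

    box: optional edge lengths of an orthorhombic periodic cell. The
    minimum-image convention used here is only valid when
    cutoff < min(box) / 2.
    """
    diff = positions[None, :, :] - positions[:, None, :]      # r_j - r_i
    if box is not None:
        diff -= box * np.round(diff / box)                    # wrap to nearest image
    dist = np.linalg.norm(diff, axis=-1)
    mask = (dist < cutoff) & ~np.eye(len(positions), dtype=bool)
    senders, receivers = np.nonzero(mask)
    return senders, receivers, dist[mask]

positions = np.array([[0.5, 0.5, 0.5],
                      [9.5, 0.5, 0.5],      # neighbor of atom 0 through the boundary
                      [5.0, 5.0, 5.0]])
box = np.array([10.0, 10.0, 10.0])
senders, receivers, lengths = build_edges(positions, cutoff=2.0, box=box)
```

With the periodic box, atoms 0 and 1 are connected across the cell boundary even though the cell itself is never passed to the network; without it, the same call returns no edges.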

The NNP model has three main parts: Encoder, Processor, and Decoder. The decoder includes 3 independent MLPs, one each for energies, forces and the stress tensor.

Training consists of two steps. First, a denoising diffusion model is trained. Second, this model is used to initialize a neural network potential that predicts the energy, forces and stress of systems and is trained in a supervised manner. This second step involves finetuning the base diffusion model on energies, forces and stresses of DFT optimization trajectories.
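
To make the pretraining step concrete, a denoising training example can be built from coordinates alone, as sketched below. The noise scale, the choice of the applied noise as regression target and the function names are assumptions for illustration; the notes do not specify the noise schedule or target parameterization actually used.

```python
import numpy as np

def make_denoising_example(positions, sigma, rng):
    """Corrupt positions with Gaussian noise; the noise is the regression target."""
    noise = rng.normal(scale=sigma, size=positions.shape)
    return positions + noise, noise

def denoising_loss(predicted_noise, true_noise):
    """Mean squared error between predicted and applied noise."""
    return float(np.mean((predicted_noise - true_noise) ** 2))

rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 5.0, size=(4, 3))       # stand-in relaxed structure
noisy, target = make_denoising_example(positions, sigma=0.1, rng=rng)
# A perfect denoiser would recover exactly the noise that was added
perfect = denoising_loss(noisy - positions, target)
```

No energies, forces or stresses appear anywhere in this objective, which is why structures from otherwise incompatible datasets can be mixed during pretraining.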

The loss function is a sum of an energy loss, a force loss and a stress tensor loss. The energy loss is optionally scaled by a weight $\lambda_E$ to balance its contribution.
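
A minimal sketch of such a combined loss is shown below. Using the mean absolute error for every term is an assumption of the sketch (the notes do not state which per-term loss is used), as are the dictionary keys and function names.

```python
import numpy as np

def mae(pred, ref):
    """Mean absolute error between prediction and reference."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(ref))))

def total_loss(pred, ref, lambda_e=1.0):
    """Weighted energy loss plus force loss plus stress loss."""
    return (lambda_e * mae(pred["energy"], ref["energy"])
            + mae(pred["forces"], ref["forces"])
            + mae(pred["stress"], ref["stress"]))

ref = {"energy": -10.0, "forces": np.zeros((2, 3)), "stress": np.zeros(6)}
pred = {"energy": -9.0, "forces": np.full((2, 3), 0.5), "stress": np.full(6, 0.25)}
loss = total_loss(pred, ref, lambda_e=0.1)    # 0.1 * 1.0 + 0.5 + 0.25
```

Lowering `lambda_e` reduces the influence of the energy term relative to the force and stress terms.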

### __2.4 Dataset__
The pretraining dataset is comprised of minimum energy configurations of materials, regardless of exchange-correlation functional, DFT software implementation or correction. Our motivation for pretraining is to provide an approximate base model which has broad materials coverage in terms of atomic type, materials class, symmetry groups and usage domain. Combining disparate datasets in this way is possible because all we require are atomic positions and unit cells; all other properties and metadata are ignored, and it is usually these properties (such as potential energy) that are incompatible across datasets.

The first step, diffusion pretraining, only requires the coordinates, not the energies.

For finetuning a UIP (universal interatomic potential), data quality and consistency are of critical importance. MPtraj and Alexandria can be used together because both use PBE exchange-correlation functionals. PBE is a semi-local functional which does not capture long-range dispersion interactions. To account for long-range intermolecular interactions, additional corrections such as D3 can be applied (D3 is an additive correction, so it can be applied at inference time). Orb-D3 models are instead trained on D3-corrected data, rather than performing the D3 correction a posteriori.

Our use of generative pretraining alleviates this issue by supporting the combined use of all available DFT datasets, irrespective of level of theory.

# __SchNetPack__

## __1. Overview of the model__


## __2. In depth exploration__

### __2.1 Descriptors__

### __2.2 Approximations__

### __2.3 DL Architecture__

### __2.4 Dataset [TODO]__

# __Final model comparison__
Schematic overview of the architecture, 