Merged
22 changes: 13 additions & 9 deletions lambench/metrics/results/README.md
@@ -1,6 +1,6 @@
# Overview

-Large atomic models (LAM), also known as machine learning interatomic potentials (MLIPs), are considered foundation models that predict atomic interactions across diverse systems using data-driven approaches. LAMBench is a benchmark designed to evaluate the performance of such models. It provides a comprehensive suite of tests and metrics to help developers and researchers understand the accuracy and generalizability of their machine learning models.
+Large atomistic models (LAM), also known as machine learning interatomic potentials (MLIPs), are considered foundation models that predict atomic interactions across diverse systems using data-driven approaches. **LAMBench** is a benchmark designed to evaluate the performance of such models. It provides a comprehensive suite of tests and metrics to help developers and researchers understand the accuracy and generalizability of their machine learning models.

## Our mission includes

@@ -26,7 +26,7 @@ Figure 1: Generalizability on force field prediction tasks, 1 - $\bar{M}^m_{FF}$
<!-- scatter plot -->
Figure 2: Accuracy-Efficiency Trade-off, $\bar{M}^m_{FF}$ vs $M_E^m$.

-# LAMBench Metrics Calculations
+# LAMBench Metrics Calculation

## Generalizability

@@ -81,12 +81,16 @@ In contrast, an ideal model that perfectly matches Density Functional Theory (DF

### Domain Specific Property Calculation

-For the domain-specific property tasks, we adopt the MAE as the error metric.
-In the Inorganic Materials domain, the MDR phonon benchmark predicts maximum phonon
-frequency, entropy, free energy, and heat capacity at constant volume, with each prediction type assigned a weight of 0.25.
-In the Molecules domain, the TorsionNet500 benchmark predicts the torsion profile energy, torsion barrier height, and the number of molecules for which the model's prediction of the torsional barrier height has an error exceeding 1 kcal/mol.
-Each prediction type in this domain is assigned a weight of $\frac{1}{3}$.
-The resulting score is denoted as $\bar M^{m}_{PC}$.
+For the domain-specific property calculation tasks, we adopt the MAE as the primary error metric.
+
+In the Inorganic Materials domain, the MDR phonon benchmark predicts the maximum phonon frequency, entropy, free energy, and heat capacity at constant volume, while the elasticity benchmark evaluates the shear and bulk moduli. Each prediction type
+is assigned an equal weight of $\frac{1}{6}$.
+
+In the Molecules domain, the TorsionNet500 benchmark evaluates the torsion profile energy, torsional barrier height, and the number of molecules for which the predicted torsional barrier height error exceeds 1 kcal/mol. The Wiggle150 benchmark assesses the relative conformer energy profile. Each prediction type in this domain is assigned a weight of 0.25.
+
+In the Catalysis domain, the OC20NEB-OOD benchmark evaluates the energy barrier, reaction energy change (delta energy), and the percentage of reactions with predicted energy barrier errors exceeding 0.1 eV for three reaction types: transfer, dissociation, and desorption. Each prediction type in this domain is assigned a weight of 0.2.
+
+The resulting error metric after averaging over all domains is denoted as $\bar M^{m}_{PC}$.

## Applicability

@@ -122,4 +126,4 @@ The final instability metric is computed as the average over all nine structures

$$M^m_{\mathrm{IS}} = \frac{1}{9}\sum_{i=1}^{9} M^m_{\mathrm{IS},i}$$

-This result is bounded within the range $[0, +\infty]$, where a lower value signifies greater stability.
+This result is bounded within the range $[0, +\infty)$, where a lower value signifies greater stability.
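The per-domain weighting added in the README hunk above can be sketched in a few lines of Python. This is a minimal illustration, not LAMBench code: every MAE value below is a hypothetical placeholder, and treating the Catalysis domain as five equally weighted prediction types is an assumption inferred from the stated 0.2 weight. Only the weights ($\frac{1}{6}$, 0.25, 0.2) and the final unweighted average over domains follow the text.

```python
# Sketch of the domain-averaged property-calculation metric described above.
# All MAE values are hypothetical placeholders, not LAMBench results.

def domain_score(maes, weights):
    """Weighted average of per-prediction-type error metrics for one domain."""
    assert len(maes) == len(weights)
    assert abs(sum(weights) - 1.0) < 1e-9, "per-domain weights must sum to 1"
    return sum(m * w for m, w in zip(maes, weights))

# Inorganic Materials: six prediction types (phonon + elasticity), weight 1/6 each
inorganic = domain_score([0.12, 0.08, 0.10, 0.09, 0.15, 0.11], [1 / 6] * 6)

# Molecules: four prediction types (TorsionNet500 + Wiggle150), weight 0.25 each
molecules = domain_score([0.20, 0.18, 0.25, 0.22], [0.25] * 4)

# Catalysis: assumed five prediction types (OC20NEB-OOD), weight 0.2 each
catalysis = domain_score([0.30, 0.28, 0.26, 0.27, 0.29], [0.2] * 5)

# Final metric: unweighted average over the three domains
m_pc = (inorganic + molecules + catalysis) / 3
```

The same pattern generalizes to any number of domains, as long as each domain's weights sum to one before the cross-domain average is taken.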
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -33,7 +33,7 @@ classifiers = [
]

[project.optional-dependencies]
-deepmd = ["deepmd-kit @ git+https://github.com/iProzd/deepmd-kit.git@P0131_dpa3_optim_angle#egg=deepmd-kit", "torch>=2.1"]
+deepmd = ["deepmd-kit==3.1.0", "torch>=2.1"]
fairchem = ["fairchem-core", "torch_geometric", "pyg_lib", "torch_scatter", "torch_sparse", "torch_cluster", "torch_spline_conv"] # See README
mace = ["mace-torch"]
mattersim = ["mattersim >= 1.1.1"]