MLFF Framework

Overview

This repository provides a framework to train and evaluate machine learning force field (MLFF) models. MLFFs have gained significant attention as candidates for large-scale molecular dynamics (MD) simulations. They aim to achieve precision comparable to DFT-based simulations while reducing the computational cost.

This repository also introduces two rich datasets for important semiconductor thin-film materials, silicon nitride (SiN) and hafnium oxide (HfO), to foster the development of MLFF for semiconductors. We ran DFT simulations under various conditions, covering initial structures, stoichiometry, temperature, strain, and defects, at a cost of 2.6k GPU days. Even so, MD simulations explore an enormously wide range of atomic configurations with many degrees of freedom. To properly evaluate the simulation performance of MLFF models, it is therefore important to assess their extrapolation capability, that is, whether they yield reliable predictions for configurations that are absent from the training dataset. To this end, we evaluate models on out-of-distribution datasets.

We present benchmark results for 10 MLFF models evaluated with six metrics, five of which assess simulation performance. Our paper, "Benchmark of Machine Learning Force Fields for Semiconductor Simulations: Datasets, Metrics, and Comparative Analysis", was accepted to the NeurIPS 2023 Datasets and Benchmarks Track.

Datasets

Download

Our semiconductor datasets (SiN and HfO) can be downloaded from the following links.

The two raw files contain all the snapshots observed in the DFT-based simulations run with VASP. The other files contain snapshots sampled from the raw files according to the rule described in our paper. In this benchmark, models were trained using the latter two files, SiN.tar and HfO.tar.

We welcome feedback on the dataset split and any insights into our semiconductor datasets.

# extract tar files at the datasets directory
cd datasets
tar xf SiN.tar
tar xf HfO.tar

# optional
rm SiN.tar
rm HfO.tar

The extracted dataset files have the extended-xyz format, whose file extension is .xyz.
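To get a feel for the file layout before preprocessing, the following is a minimal sketch of a frame reader for xyz-style files. It assumes each frame consists of an atom-count line, a comment line holding the extended-xyz key=value metadata, and then one line per atom whose first four columns are the species and the x, y, z coordinates. That column order is an assumption: in extended-xyz the actual layout is declared by the Properties key on the comment line. The function name iter_xyz_frames and the sample frame are hypothetical and not part of this repository.

```python
import io


def iter_xyz_frames(stream):
    """Yield (comment_line, atoms) for each frame in an xyz-style text stream.

    Assumption: the first four columns of every atom line are
    species, x, y, z. Extra columns (e.g. forces) are ignored here.
    """
    while True:
        header = stream.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between frames
        n_atoms = int(header)
        comment = stream.readline().rstrip("\n")
        atoms = []
        for _ in range(n_atoms):
            fields = stream.readline().split()
            symbol = fields[0]
            x, y, z = (float(v) for v in fields[1:4])
            atoms.append((symbol, x, y, z))
        yield comment, atoms


# Hypothetical two-atom frame, made up for illustration only.
sample = (
    "2\n"
    'Lattice="5.0 0.0 0.0 0.0 5.0 0.0 0.0 0.0 5.0" '
    "Properties=species:S:1:pos:R:3:forces:R:3 energy=-10.5\n"
    "Si 0.0 0.0 0.0 0.1 0.0 0.0\n"
    "N 1.5 0.0 0.0 -0.1 0.0 0.0\n"
)
frames = list(iter_xyz_frames(io.StringIO(sample)))
print(len(frames), [atom[0] for atom in frames[0][1]])
```

For real work, a full parser such as ASE (ase.io.read with index=":") is a better choice, since it honors the Properties declaration and the lattice metadata.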

Preprocessing

Data preprocessing becomes available once the framework is installed as described below.
As in OCP, datasets must be in .lmdb format for training models.
This link explains how to convert .xyz files into .lmdb.

How to Use the Framework

First, we explain how to set up the environment for this framework, and then give guidelines for running the five functions available through main.py: fit-scale, train, validate, run-md, and evaluate. For additional arguments to these functions, see scripts/ and configs/.

1. Installation

Follow the instructions below to install the framework and run MLFF benchmarks.

git clone https://github.com/SAITPublic/MLFF-Framework.git
cd MLFF-Framework

From now on, the base working directory is MLFF-Framework/.

The following commands download (via git clone) the packages related to MLFF models and MD simulation.

git submodule init
git submodule update

We make minor modifications to OCP and auto-FOX, which are located in codebases/.
To apply these modifications, use the two patch files we provide and the following instructions.

# auto-FOX
cd codebases/auto-FOX
git apply ../patches/auto-FOX-custom.patch
pip install .

# OCP
cd ../ocp
git apply ../patches/ocp-scn-custom.patch

Note: Any other MLFF package can be made compatible with our framework if the following requirements are satisfied.

The wrapper for models supported by the package should be implemented (see src/common/models).
If the package is located at codebases/, sys.path should include its path (see main.py).
If data format used by models is different from that of OCP, data that is loaded from .lmdb (prepared by our script) should be converted into the data format of the package (see src/common/collaters/).
If some training conditions need to be handled, a tailored trainer class should be implemented (see src/common/trainers/).
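The sys.path requirement above can be sketched as follows. This is a minimal illustration, not the code in main.py: the helper name add_codebases_to_sys_path and the package name my_new_package are hypothetical, and the only assumption taken from the text is that packages live under codebases/.

```python
import sys
from pathlib import Path


def add_codebases_to_sys_path(repo_root, package_dirs):
    """Prepend existing codebases/<name> directories to sys.path.

    Returns the list of paths that were actually added, so callers can
    see which packages were found.
    """
    added = []
    for name in package_dirs:
        path = (Path(repo_root) / "codebases" / name).resolve()
        if path.is_dir() and str(path) not in sys.path:
            sys.path.insert(0, str(path))
            added.append(str(path))
    return added


# "my_new_package" is a placeholder for a package cloned into codebases/.
added = add_codebases_to_sys_path(".", ["ocp", "my_new_package"])
print(added)
```

Directories that do not exist are skipped silently, so the same call works whether or not an optional package has been cloned.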

2. Training MLFF Models

Before training and validating models, please prepare the .lmdb files. For detailed guidelines, please refer to the links provided below.

Fit scale (optional for GemNet)
Train
Validate

3. Evaluating Trained Models

After training, users can evaluate models with the six metrics described in our paper, which are important for properly assessing model performance in simulations: energy and force prediction errors, radial distribution function, angular distribution function, bulk modulus, equilibrium volume, and potential energy curves.

Evaluate Prediction Performance of Energy and Force: This works the same as Validate, except that the input is in extended-xyz format (.xyz) and inference runs on the snapshots one by one.
Run MD simulation
Evaluate using Simulation Indicators
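As a rough illustration of the first metric, the sketch below computes energy and force prediction errors over a set of snapshots. It assumes mean absolute error, with the energy error normalized per atom. The exact error definitions used by the framework are given in the paper and its evaluate function, so treat both the choice of MAE and the function names here as assumptions.

```python
def energy_mae_per_atom(pred_energies, ref_energies, n_atoms):
    """Mean absolute error of total energy per atom, averaged over snapshots."""
    errors = [
        abs(pred - ref) / n
        for pred, ref, n in zip(pred_energies, ref_energies, n_atoms)
    ]
    return sum(errors) / len(errors)


def force_mae(pred_forces, ref_forces):
    """Mean absolute error over every force component of every atom.

    Each argument is a list of snapshots; each snapshot is a list of
    per-atom force vectors [fx, fy, fz].
    """
    total, count = 0.0, 0
    for snap_pred, snap_ref in zip(pred_forces, ref_forces):
        for f_pred, f_ref in zip(snap_pred, snap_ref):
            for a, b in zip(f_pred, f_ref):
                total += abs(a - b)
                count += 1
    return total / count


# Made-up numbers for two snapshots (energies) and one snapshot (forces).
e_err = energy_mae_per_atom([-10.0, -20.5], [-10.2, -20.0], [2, 5])
f_err = force_mae(
    [[[0.1, 0.0, 0.0], [0.0, 0.0, 0.0]]],
    [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.3]]],
)
print(e_err, f_err)
```

Normalizing the energy error per atom keeps snapshots of different sizes comparable, which matters when the test set mixes cell sizes.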

4. Helpful Scripts For NequIP and Allegro

The Materials Intelligence Group at Harvard University provides pair code that makes NequIP and Allegro compatible with LAMMPS, one of the most popular MD simulation tools. To prepare NequIP and Allegro models trained with the framework for use with this pair code, we provide some helper scripts.

Acknowledgments and Reference Code

OCP github
NequIP github
Allegro github
MACE github
SIMPLE-NN github

Citing Our Datasets and MLFF Framework

If you use these datasets and/or this repository for research purposes, please cite our work as follows:

@article{sait_mlff_framework_sin_hfo,
    title = {Benchmark of Machine Learning Force Fields for Semiconductor Simulations: Datasets, Metrics, and Comparative Analysis},
    author = {Geonu Kim and Byunggook Na and Gunhee Kim and Hyuntae Cho and Seungjin Kang and Hee Sun Lee and Saerom Choi and Heejae Kim and Seungwon Lee and Yongdeok Kim},
    journal = {Advances in Neural Information Processing Systems},
    year = {2023},
}
