ASTEROID: Machine Learning Force Fields with Data Cost Aware Training

Reference implementation of Machine Learning Force Fields with Data Cost Aware Training, accepted to ICML 2023. This codebase is based on GemNet.


Preparing the data

Create a directory called raw_data, then download RMD17 and the relevant CCSD(T) data from SGDML into it.

To process the data, we provide helper functions in process.py:
python process.py
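
Before running process.py, it can help to confirm that raw_data exists and contains the files you downloaded. The following is a minimal sketch using only the Python standard library. The function name check_raw_data and any file names passed to it are illustrative assumptions, not names taken from process.py, so substitute the files you actually downloaded.

```python
from pathlib import Path


def check_raw_data(raw_dir, expected_files):
    """Create raw_dir if needed and return the expected files that are missing.

    `expected_files` is supplied by the caller; this sketch does not know
    which file names process.py actually reads.
    """
    root = Path(raw_dir)
    # Mirrors the "make a directory called raw_data" step; harmless if it exists.
    root.mkdir(parents=True, exist_ok=True)
    # Keep only the names that are not present as regular files.
    return [name for name in expected_files if not (root / name).is_file()]
```

Calling check_raw_data("raw_data", [...]) with your own list of file names returns an empty list once every file is in place.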

We also provide data that we generated through MD simulations using empirical force fields. These datasets, which cover each of the MD17 molecules, can be downloaded from here.

Pre-training

To pre-train GemNet with the simplest version of ASTEROID, we can run:

mkdir model_dir

bash scripts/aspirin_pretrain_simple.sh

The model_path argument must be changed to match the checkpoint you use. Note that the model_name in get_predictions.py and the load_name in pretrain_asteroid.py should correspond to one another.
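
A mismatch between model_name and load_name is easy to miss, so a quick check before launching a run may be worthwhile. The sketch below assumes two things that the repository does not confirm: that each variable is assigned a plain string literal on its own line, and that "correspond" means the two strings are identical. The helper names read_string_assignment and names_match are hypothetical.

```python
import re
from pathlib import Path


def read_string_assignment(source_path, variable):
    """Return the first string literal assigned to `variable` in a file, or None.

    Assumption: the assignment looks like `variable = "value"` on one line.
    """
    text = Path(source_path).read_text()
    pattern = rf"^\s*{re.escape(variable)}\s*=\s*['\"]([^'\"]*)['\"]"
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.group(1) if match else None


def names_match(predictions_script, pretrain_script):
    """Compare model_name in one script with load_name in the other.

    Assumption: the two names correspond when they are exactly equal.
    """
    model_name = read_string_assignment(predictions_script, "model_name")
    load_name = read_string_assignment(pretrain_script, "load_name")
    return model_name is not None and model_name == load_name
```

For example, names_match("get_predictions.py", "pretrain_asteroid.py") returns False if either assignment cannot be found or the two values differ.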

Fine-tuning

To fine-tune a randomly initialized model, run:
bash scripts/aspirin_base_200.sh

To fine-tune a GNN pre-trained with ASTEROID, run:
bash scripts/aspirin_finetune.sh
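
The commands in this README can be chained into a single run. The sketch below is one possible ordering for the aspirin scripts, assuming it is launched from the repository root. The names STEPS, build_pipeline, and run_pipeline are hypothetical, and the sketch does not create model_dir or edit the model_path argument, which still need to be handled as described under Pre-training.

```python
import subprocess

# Commands copied from this README; the dictionary keys are illustrative labels.
STEPS = {
    "process": ["python", "process.py"],
    "pretrain": ["bash", "scripts/aspirin_pretrain_simple.sh"],
    "finetune": ["bash", "scripts/aspirin_finetune.sh"],
    "baseline": ["bash", "scripts/aspirin_base_200.sh"],
}


def build_pipeline(use_pretraining=True):
    """Return the ordered list of commands for one aspirin run."""
    if use_pretraining:
        order = ["process", "pretrain", "finetune"]
    else:
        # Fine-tune from a randomly initialized model instead.
        order = ["process", "baseline"]
    return [STEPS[name] for name in order]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)  # stop at the first failure
```

With dry_run left at its default, run_pipeline(build_pipeline()) only prints the commands, which makes it easy to review the order before executing anything.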

Cite

Please cite our paper and GemNet if you use the model or this code in your own work:

@inproceedings{gasteiger_gemnet_2021,
  title = {GemNet: Universal Directional Graph Neural Networks for Molecules},
  author = {Gasteiger, Johannes and Becker, Florian and G{\"u}nnemann, Stephan},
  booktitle={Conference on Neural Information Processing Systems (NeurIPS)},
  year = {2021}
}

@article{bukharin2023machine,
  title={Machine Learning Force Fields with Data Cost Aware Training},
  author={Bukharin, Alexander and Liu, Tianyi and Wang, Shengjie and Zuo, Simiao and Gao, Weihao and Yan, Wen and Zhao, Tuo},
  journal={arXiv preprint arXiv:2306.03109},
  year={2023}
}
