# Learning many-body Bayesian force fields on the fly: A tutorial introduction to the FLARE code

Materials Intelligence Research Group

Tutorial created by Cameron Owen, Yu Xie, and Jonathan Vandermause

<center><img src="https://github.com/mir-group/FLARE-Tutorials/raw/master/OTF/logo.png" width="60%"></center>

<img src="https://github.com/mir-group/FLARE-Tutorials/raw/master/OTF/Team.png" width="100%">

## What will you learn in this tutorial?

  * **Theory review**: what is FLARE? ACE descriptors and sparse Gaussian Processes (SGPs).
  * **Active learning (online training) of a Bayesian force field**: use the uncertainties of the SGP to train a force field on the fly using the [FLARE](https://github.com/mir-group/flare) code with the ASE MD engine (a LAMMPS tutorial can be found [here](https://colab.research.google.com/drive/1qgGlfu1BlXQgSrnolS4c4AYeZ-2TaX5Y?usp=sharing)).
  * **Restart an active-learning run**: continue an interrupted or finished on-the-fly training.
  * **Warm start**: start a new on-the-fly training from a previously trained SGP model.
  * **Model instantiation parameter testing**: discuss testing static model parameters prior to training the final force field.
  * **Offline training**: train the final force field "offline" using previously generated DFT training frames, without running MD. Also create files to be used for uncertainty quantification during production MD.


## An example training workflow


In this tutorial, we use aluminum (Al) as a simple example to demonstrate all of the training cases. These scripts can then be adapted to train on more complex systems of your own. To minimize confusion, each script includes detailed comments on how to include more species.

To understand the overall workflow, an example is provided below for training a reactive Bayesian force field for the H/Pt system, as presented in our publications [1](https://arxiv.org/abs/2106.01949) and [2](https://arxiv.org/abs/2204.12573).

![](https://mermaid.ink/svg/pako:eNp90MFqwzAMBuBXET4lkDLYMYdBt6bNYYey7eiLmsiJmOMUW2GMpO8-p14vPdQngT9J6J9VM7akStV5PPfw_qEdxLfN6mfoMOSw2bwsxlPoIQh6WaDKttZCi4J5sq_ZUeA02e8rhjutnRYtb6sJFk8PzC6rn6JiJ-QNNvSAVulrNMayIxCP7BbYz-xYGC2Y0TcEhsm2l9SxTx0_6IfbrEPa13PXk4evAo55soerrWfD7m6WKtRAfkBuY2Lzer5W0tNAWpWxjJSCaKXdKqdzDImqlmX0qjRoAxUKJxk_f12jSvET3dCOMeY__KvLHwOth6I)

# 1 Installation

> **NOTE: if you want to install flare on your machine, please follow our [documentation](https://mir-group.github.io/flare/installation/install.html) instead of the steps below!**

Let's begin by installing `flare` and its dependencies. This will take a few minutes, so make sure to run the blocks right away. While they are running, you can proceed to the next section, which introduces the underlying theory of the FLARE model.

## 1.1 Install conda

We need `conda` to install some dependencies required to compile the code. Google Colab does not ship with `conda`, so we install it first.
> When you run the cell below, you will see the warning "Your session crashed for an unknown reason" in the lower left corner. This is normal.

If `conda` is already installed on your machine, you can go directly to the next cell. If it is not, you can follow the [instructions here](https://docs.conda.io/projects/conda/en/stable/user-guide/install/index.html) to install it.

In [None]:
# We don't need mkl, and the preinstalled mkl breaks our code installation, so we uninstall it
!pip uninstall -y mkl mkl-devel mkl-include
!sudo apt remove *mkl*

# Install conda in this Colab environment
!pip install -q condacolab
import condacolab
condacolab.install()

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Note, selecting 'libmkl-intel-thread' for glob '*mkl*'
Note, selecting 'libmkl-gf-lp64' for glob '*mkl*'
Note, selecting 'libmkl-blacs-intelmpi-lp64' for glob '*mkl*'
Note, selecting 'intel-mkl-linktool' for glob '*mkl*'
Note, selecting 'libmkl-scalapack-lp64' for glob '*mkl*'
Note, selecting 'libmkl-gf-ilp64' for glob '*mkl*'
Note, selecting 'libmkl-gnu-thread' for glob '*mkl*'
Note, selecting 'libmkl-avx2' for glob '*mkl*'
Note, selecting 'libmkl-full-dev' for glob '*mkl*'
Note, selecting 'libmkl-vml-mc' for glob '*mkl*'
Note, selecting 'intel-mkl-cluster' for glob '*mkl*'
Note, selecting 'libmkl-mc' for glob '*mkl*'
Note, selecting 'libmkl-rt' for glob '*mkl*'
Note, selecting 'libmkldnn-dev' for glob '*mkl*'
Note, selecting 'php-srmklive-flysystem-dropbox-v2' for glob '*mkl*'
Note, selecting 'libmkl-pgi-thread' for glob '*mkl*'
Note, selecting 'libmkl-cluster-dev' for glob '*mkl*'
Note, 

## 1.2 Install FLARE and dependencies

We suggest installing flare in a new conda environment, which you can create with `conda create --name flare`. However, since Google Colab cannot create a new conda environment, here we install it in the `base` environment.

We use `conda` to install the compilers and dependencies for flare, including `gcc`, `cmake`, etc. We then download the flare code from the GitHub repository and run `pip install`.

<!-- Install a working copy of lapack/lapacke.
! pip uninstall -y mkl
! sudo apt install liblapacke liblapacke-dev
Install ase
! pip install ase
Switch the Colab C++ compiler to g++-9.
! sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test
! sudo apt update
! sudo apt install gcc-9 g++-9
! update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 50
! update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 50
!git clone -b development https://github.com/mir-group/flare.git
!cd flare && pip install . -->

In [None]:
# Create a new conda environment to install flare
#!conda create --name flare python=3.8
#!conda activate flare

# Use conda to install compilers and dependencies for flare
!conda install -y gcc gxx cmake liblapacke openblas -c conda-forge

# Download flare code from github repo
!git clone -b 1.3.3 https://github.com/mir-group/flare.git

# Pip install flare
%cd flare
!pip install -U ipython
!pip install .
%cd /content/
# !pip install mir-flare

Collecting package metadata (current_repodata.json): done
Solving environment: done


  current version: 23.3.1
  latest version: 23.9.0

Please update conda by running

    $ conda update -n base -c conda-forge conda

Or to minimize the number of packages updated during conda update use

     conda install conda=23.9.0



## Package Plan ##

  environment location: /usr/local

  added / updated specs:
    - cmake
    - gcc
    - gxx
    - liblapacke
    - openblas


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openssl-3.1.4              |       hd590300_0         2.5 MB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following packages will be UPDATED:

  openssl                                  3.1.3-hd590300_0 --> 3.1.4-hd590300_0 



Downloading and Extracting Packages
             

In [1]:
import flare
import flare.bffs.sgp
print(flare.__file__)

/home/andi/Flare/MyScript/flare/__init__.py




# 2 How to set up the machine learning force field of FLARE?

## 2.1 Descriptors

In FLARE, we consider a local model in which the total energy is decomposed into atomic energies: $E_{total}=\sum_i E_i$. The atomic energy $E_i$ of atom $i$ depends only on the geometry of its neighboring atoms within a hard cutoff $r_{cut}$.

To describe the local environment around an atom $i$, we choose a set of descriptors, which are functions of the bonds between atoms.
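
The locality assumption can be sketched in a few lines of Python. This is only an illustration, not FLARE's implementation: the function names and toy coordinates are hypothetical, and periodic images are ignored for simplicity.

```python
import math

def neighbors_within_cutoff(positions, i, r_cut):
    """Return indices j != i whose distance to atom i is below r_cut.

    Periodic boundary conditions are deliberately ignored in this sketch.
    """
    return [
        j for j, pos in enumerate(positions)
        if j != i and math.dist(positions[i], pos) < r_cut
    ]

def total_energy(atomic_energies):
    """Local model: E_total is the sum of the atomic energies E_i."""
    return sum(atomic_energies)

# Toy linear chain of four atoms spaced 2.5 A apart (illustrative values only).
positions = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0), (5.0, 0.0, 0.0), (7.5, 0.0, 0.0)]

# Atom 1 sees atoms 0 and 2; atom 3 sits exactly on the 5.0 A cutoff and is excluded.
print(neighbors_within_cutoff(positions, 1, r_cut=5.0))
```

Only the atoms returned for atom $i$ would enter its descriptor, so $E_i$ is unaffected by anything outside the cutoff sphere.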

Our approach follows the Gaussian Approximation Potential framework first introduced in Ref. [4] (see [5] for an excellent introduction), with a multi-element generalization of the Atomic Cluster Expansion [6] used to build rotationally-invariant many-body descriptors of local atomic environments.

<img src="https://github.com/mir-group/FLARE-Tutorials/raw/master/APS-2020/mb_descriptors2.png" width="100%">

We'll use the $B_2$ descriptor from the Atomic Cluster Expansion, which requires us to define:

*   The cutoff function and radius.
*   The number of radial basis functions (radial resolution).
*   The number of spherical harmonics (angular resolution).

These are chosen by the user, and it's generally a good idea to check how different choices influence the model accuracy. We provide a script at the end of this tutorial to do just this prior to final training of the model, see [the last section](#scrollTo=5_Evaluate_Model_Instantiation_Parameters).

The parameters below give reasonable results for most systems. For systems with more chemical species, the only change is the dimension of the cutoff matrix; a two-species example is given in the commented line.

```yaml
    descriptors:
        - name: B2                    # Atomic Cluster Expansion (ACE) descriptor from R. Drautz (2019). FLARE can only go from B1 up to B3 currently.
          nmax: 8                     # Radial fidelity of the descriptor (higher value = higher cost)
          lmax: 3                     # Angular fidelity of the descriptor (higher value = higher cost)
          cutoff_function: quadratic  # Cutoff behavior
          radial_basis: chebyshev     # Formalism for the radial basis functions
          cutoff_matrix: [[5.0]]      # In angstroms. NxN array for N_species in a system.
#         cutoff_matrix: [[5.0,5.0],[5.0,5.0]]      # the order of the matrix corresponds to the list of chemical species (e.g., matrix element 1,1 is the cutoff for element 1 interacting with itself)
```
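
For systems with several species, writing the cutoff matrix by hand becomes error-prone. The helper below is a hypothetical convenience function (not part of FLARE) that builds an N x N matrix in the order of the species list. It assumes you want a symmetric matrix, i.e. the same cutoff for the pairs (a, b) and (b, a).

```python
def build_cutoff_matrix(species, default_cutoff=5.0, overrides=None):
    """Build an N x N cutoff matrix (in angstroms) for a list of species.

    species        : atomic numbers, in the order used in the yaml file
    default_cutoff : cutoff applied to every pair unless overridden
    overrides      : optional {(Z_a, Z_b): cutoff} for specific pairs
    """
    n = len(species)
    index = {z: k for k, z in enumerate(species)}
    matrix = [[default_cutoff] * n for _ in range(n)]
    for (a, b), r_cut in (overrides or {}).items():
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = r_cut  # keep the matrix symmetric
    return matrix

# Single species (Al), as in this tutorial.
print(build_cutoff_matrix([13]))

# Two species (H, Pt) with a shorter H-Pt cutoff; the values are illustrative.
print(build_cutoff_matrix([1, 78], overrides={(1, 78): 4.0}))
```

The printed nested lists can be pasted directly into the `cutoff_matrix` entry of the yaml file.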

## 2.2 Kernel

Next, we define the kernel function, which quantifies the similarity between two atomic environments. We'll use a simple normalized dot product kernel:
\begin{equation}
k(\vec{d}_1, \vec{d}_2) = \sigma^2 \left(\frac{\vec{d}_1 \cdot \vec{d}_2}{d_1 d_2}\right)^2.
\end{equation}

This kernel has proven reliable for a variety of systems, but users are welcome to select from the other kernels provided in the `flare/kernels` folder of the GitHub repository, or to implement their own.
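
As a minimal sketch, the kernel formula above can be written directly in plain Python. This is an illustration of the equation only, not FLARE's compiled implementation; the function name and the toy descriptor vectors are assumptions.

```python
import math

def normalized_dot_product_kernel(d1, d2, sigma=2.0, power=2):
    """k(d1, d2) = sigma^2 * (d1 . d2 / (|d1| |d2|))^power."""
    dot = sum(a * b for a, b in zip(d1, d2))
    norm1 = math.sqrt(sum(a * a for a in d1))
    norm2 = math.sqrt(sum(b * b for b in d2))
    return sigma**2 * (dot / (norm1 * norm2)) ** power

# Toy descriptor vectors (illustrative only).
same = normalized_dot_product_kernel([1.0, 0.0], [1.0, 0.0])        # identical environments
orthogonal = normalized_dot_product_kernel([1.0, 0.0], [0.0, 1.0])  # completely dissimilar
print(same, orthogonal)
```

Because the descriptors are normalized, the kernel depends only on the angle between them: identical environments give the maximum value $\sigma^2$, and rescaling a descriptor leaves the kernel unchanged.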

```yaml
    kernels:
        - name: NormalizedDotProduct                                            # select kernel for comparison of atomic environments
          sigma: 2.0                                                            # signal variance, this hyperparameter will be trained, and is typically between 1 and 10.
          power: 2                                                              # power of the kernel, influences body-order
```


## 2.3 Sparse Gaussian Process Regression (SGP)

Unlike linear regression or neural networks, Gaussian process regression
- is **a non-parametric method**, i.e. it does not have a huge number of parameters to optimize,
- is **data efficient**, owing to its smooth interpolation, and
- **provides an inherent uncertainty metric** through the variance of the Gaussian distribution.

<img src="https://github.com/mir-group/FLARE-Tutorials/raw/master/APS-2020/gpff2.png" width="100%">

With the kernel object defined, we can construct a sparse GP object. To do this, we need to choose noise values for each type of label that we will learn from: energies, forces, and stresses. It's a good idea to initialize these values to the expected error level for each quantity.
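
To see how the noise value and the kernel combine into an uncertainty, the sketch below evaluates the textbook GP predictive variance for a single training environment, $\mathrm{var}(x_*) = k(x_*, x_*) - k(x_*, x)^2 / (k(x, x) + \sigma_n^2)$. This is a deliberately simplified full-GP formula with one training point, not FLARE's sparse GP; the function names and descriptor values are illustrative assumptions.

```python
import math

def kernel(d1, d2, sigma=2.0, power=2):
    """Normalized dot product kernel, as defined in Section 2.2."""
    dot = sum(a * b for a, b in zip(d1, d2))
    norm1 = math.sqrt(sum(a * a for a in d1))
    norm2 = math.sqrt(sum(b * b for b in d2))
    return sigma**2 * (dot / (norm1 * norm2)) ** power

def predictive_variance(d_test, d_train, sigma=2.0, noise=0.05):
    """GP predictive variance given one training environment (textbook formula)."""
    k_tt = kernel(d_test, d_test, sigma)
    k_tx = kernel(d_test, d_train, sigma)
    k_xx = kernel(d_train, d_train, sigma)
    return k_tt - k_tx**2 / (k_xx + noise**2)

d_train = [1.0, 0.0]
var_seen = predictive_variance([1.0, 0.0], d_train)    # same environment as the training point
var_unseen = predictive_variance([0.0, 1.0], d_train)  # environment unlike anything seen
print(var_seen, var_unseen)
```

The variance is small for an environment that resembles the training data and returns to the prior value $\sigma^2$ for an unfamiliar one, which is the signal the active learning loop relies on.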

```yaml
    energy_noise: 0.1                                                           # Energy noise hyperparameter (eV), will be trained later. Typically set to 1 meV * N_atoms.
    forces_noise: 0.05                                                          # Force noise hyperparameter (eV/A), will be trained later. System dependent, typically between 0.05 eV/A and 0.2 eV/A.
    stress_noise: 0.001                                                         # Stress noise hyperparameter (eV/A^3), will be trained later. Typically set to 0.001 eV/A^3.
    species:
        - 13                                                                    # Atomic number of your species (here, 13 = Al). To add additional species, insert a new line below this one in the same format (e.g., - A#), where A# is the atomic number of the second species.
    single_atom_energies:
        - 0                                                                     # Single atom energies to bias the energy prediction of the model. Can help in systems with poor initial energy estimations. Length must equal the number of species.
    variance_type: local                                                        # Calculate atomic uncertainties.
    max_iterations: 20                                                          # Maximum steps taken during each hyperparameter optimization call. Can sometimes be helpful to increase this value if hyperparameter optimization is unstable.
    use_mapping: True                                                           # Print mapped model (ready for use in LAMMPS) during trajectory. Model is re-mapped and replaced if new DFT calls are made throughout the trajectory.
```
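
The rule of thumb for the energy noise (about 1 meV per atom) can be turned into a small helper. This is a hypothetical convenience function, not part of FLARE, and it assumes the noise values are expressed in ASE's eV-based units.

```python
def initial_noise(n_atoms, energy_noise_per_atom=0.001,
                  forces_noise=0.05, stress_noise=0.001):
    """Suggest starting noise values for the SGP.

    energy_noise_per_atom is in eV/atom (0.001 eV = 1 meV); the force and
    stress defaults simply repeat the values used in the yaml block above.
    """
    return {
        "energy_noise": energy_noise_per_atom * n_atoms,
        "forces_noise": forces_noise,
        "stress_noise": stress_noise,
    }

# A cell of roughly 100 atoms gives an energy noise of about 0.1 eV.
print(initial_noise(100))
```

These are only starting points, since all three values are re-optimized during training.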

We now compute the descriptors and descriptor gradients of the training and validation structures and assign force labels to the training structures.

> Note: `sigma`, `energy_noise`, `forces_noise` and `stress_noise` are trainable hyperparameters, optimized by maximizing the log marginal likelihood.

Finally, we train the SGP and check its performance on the validation set as more data is added. When we add structures to the SGP, we need to choose which environments get added to the sparse set. In this example, we'll use the SGP uncertainties to select the atomic environments in an online fashion during molecular dynamics.

# 3 How to set up an active learning workflow

<center><img src="https://github.com/mir-group/FLARE-Tutorials/raw/master/OTF/Active.png" width="30%"></center>

The active learning framework in FLARE is shown above, where we can observe the coupled MD and DFT loops, driven by uncertainty quantification of atomic environments. Briefly,

1. MD is run with the FLARE force field. At each MD step, each atomic environment is assigned an uncertainty.

2. The maximal atomic uncertainty is compared to a threshold predetermined by the user.

3. If any atomic environment lies above this threshold, the MD is interrupted and DFT is called on the current configuration.

    Otherwise, the MD continues and we go back to step (1).

4. After the DFT calculation is done, high-uncertainty environments are added to the training (sparse) set. The SGP model is then retrained.

5. Resume MD, go back to step (1).

```yaml
otf: # On-the-fly training and MD
    mode: fresh                                                                 # Start from an empty SGP
    md_engine: VelocityVerlet                                                   # Define MD engine, here we use the Velocity Verlet engine from ASE. LAMMPS examples can be found in the `flare/examples` directory in the repo
    md_kwargs: {}                                                               # Define MD kwargs
    initial_velocity: 1000                                                      # Initialize the velocities (units of Kelvin)
    dt: 0.001                                                                   # Set the time step in picoseconds (1 fs here)
    number_of_steps: 100                                                        # Total number of MD steps to be taken
    output_name: Al                                                             # Name of output
    init_atoms: [0]                          # Initial atoms to be added to the sparse set
    std_tolerance_factor: -0.01                                                 # The uncertainty threshold above which the DFT will be called. This value is typically scaled with the number of species (e.g., -0.05 for 2 species, -0.1 for 3, etc.)
    max_atoms_added: -1                                                         # Allow for all atoms in a given frame to be added to the sparse set if uncertainties permit
    train_hyps: [20,200]                                                        # Range of DFT calls in which hyperparameters are optimized. Here, hyps are optimized at each DFT call between the 20th and the 200th.
    write_model: 4                                                              # Verbosity of model output.
    update_style: threshold                                                     # Sparse set update style. Atoms above a defined "threshold" will be added using this method
    update_threshold: 0.001                                                     # Threshold for adding atoms if "update_style = threshold". Threshold represents relative uncertainty to mean atomic uncertainty, where atoms above are added to sparse set. This value is typically set to be 0.1*std_tolerance_factor.
    force_only: False
    wandb_log: <project_name>                                                   # Monitor the training with Weights & Biases (https://wandb.ai). Default is None, and results will not be uploaded to wandb.
```
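
Before launching a run, it can help to sanity-check a few of these settings. The sketch below copies four values from the `otf` block into a plain dictionary and derives the total simulated time and the `update_threshold` suggested by the rule of thumb in the comments. The helper is hypothetical, and taking the absolute value of `std_tolerance_factor` is our reading of that rule of thumb.

```python
# Values copied from the otf block above.
otf_settings = {
    "dt": 0.001,                    # time step in ps
    "number_of_steps": 100,         # total MD steps
    "std_tolerance_factor": -0.01,  # DFT-call threshold
    "update_threshold": 0.001,      # sparse-set update threshold
}

def summarize_otf(settings):
    """Return (total simulated time in ps, suggested update_threshold)."""
    total_time_ps = settings["dt"] * settings["number_of_steps"]
    suggested_update = 0.1 * abs(settings["std_tolerance_factor"])
    return total_time_ps, suggested_update

total_time_ps, suggested_update = summarize_otf(otf_settings)
print(f"simulated time: {total_time_ps:.3f} ps")
print(f"suggested update_threshold: {suggested_update:.4f}")
```

With the values used here the run covers 0.1 ps, and the configured `update_threshold` matches the suggested one.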

This methodological advancement allows for a drastic increase in computational efficiency relative to *ab initio* methods, where DFT would typically be computed at every timestep.

After the training is finished, we map the SGP model onto an equivalent but much faster polynomial model, making FLARE models superior in accuracy relative to empirical methods, and competitive in speed and scaling.

> **Note**: Here we use only the MD engines provided by ASE: VelocityVerlet, NVTBerendsen, NPTBerendsen, NPT and Langevin. If you want a wider variety of more robust MD engines (`fix/nvt`, `fix/npt`...) and operations on atoms (`group`, `region`, `compute`...), check out our tutorial on [FLARE active learning with LAMMPS](https://colab.research.google.com/drive/1qgGlfu1BlXQgSrnolS4c4AYeZ-2TaX5Y#scrollTo=Bayesian_active_learning_with_FLARE_and_LAMMPS_MD).

## 3.1 Preprocessing

To start an on-the-fly training from scratch, you only need three files:
1. A data file containing the initial atomic structure, which can be .xyz, POSCAR, LAMMPS data, etc.
2. A yaml file that sets up the parameters of the GP model and the training.
3. A job submission script, if you are running on a supercomputer.

### Prepare an initial structure

We can now import everything else we will need for the tutorial, where we will first perform "on-the-fly" active learning on a slab of pure aluminum using ASE dynamics.

Note: in order to visualize the structure, zoom out in the window below this code block.

In [2]:
# Import numpy and matplotlib
import numpy as np
import matplotlib.pyplot as plt
import matplotlib

matplotlib.rc('font', size=12)
matplotlib.rcParams['figure.dpi'] = 100

# ASE imports
from ase import Atoms, units
from ase.build import supercells
from ase.visualize import view
from ase.build import fcc111, add_adsorbate
from ase.io import read, write

# Create a slab with an adatom.
atoms = fcc111("Al", (4, 4, 6), vacuum=10.0)
add_adsorbate(atoms, "Al", 2.5, "ontop")
n_atoms = len(atoms)

write("init_Al.xyz", atoms)

view(atoms, viewer='x3d')



### Set up training with a yaml file

Now that we have all of our necessary imports and have built and visualized our initial geometry, we can now construct our yaml input script for FLARE "on-the-fly" active-learning using the code snippets provided above. This training starts from an empty SGP model, where no atomic environments are initially in the sparse set, and we are using an aluminum EAM potential as our surrogate DFT method that will provide our ground-truth labels.

In [4]:
# Copy the initial structure and the training yaml into a working folder.
# (Clone the FLARE-Tutorials repo first if it is not already present.)
# ! git clone https://github.com/mir-group/FLARE-Tutorials.git

# ! git clone https://github.com/andim53/FLARE-Tutorials.git
! mkdir Al_otf/
! cp init_Al.xyz Al_otf/
! cp FLARE-Tutorials/OTF/otf_train.yaml Al_otf/
! cat Al_otf/otf_train.yaml

mkdir: cannot create directory ‘Al_otf/’: File exists
# Super cell is read from a file such as POSCAR, xyz, lammps-data
# or any format that ASE supports
supercell: 
    file: init_Al.xyz
    format: extxyz
    replicate: [1, 1, 1]                                                        # supercell creation. Be mindful of DFT limitations and periodicity of your cell.
    jitter: 0.1                                                                 # perturb the initial atomic positions by 0.1 A, so initial atomic environments added to the sparse set are not the same

# Set up FLARE calculator with (sparse) Gaussian process
flare_calc:
    gp: SGP_Wrapper
    kernels:
        - name: NormalizedDotProduct                                            # select kernel for comparison of atomic environments
          sigma: 2.0                                                            # signal variance, this hyperparameter will be trained, and is typically between 1 and 10.
          power: 2    

**Here is an example of setting VASP as the DFT calculator**

```yaml
dft_calc:
    name: Vasp
    kwargs:
        command: "mpirun -np <n_cpus> <vasp_executable>"
        # pseudo-potential
        xc: PBE
        # k points
        kpts: [1, 1, 1]
        # INCAR
        istart: 0
        npar: 8
        ediff: 1.0e-5
        encut: 450
        ismear: 1
        sigma: 0.2
        lreal: Auto
        prec: Accurate
        algo: Very_Fast
        lscalapack: False
    params: {}
```

**Here is an example of setting Quantum ESPRESSO as the DFT calculator**

```yaml
dft_calc:
    name: Espresso
    kwargs:
        command: "mpirun -np <n_cpus> <qe_executable> -in C.pwi -out C.pwo"
        pseudopotentials:
            C: C.pz-rrkjus.UPF
        label: C
        tstress: True
        tprnfor: True
        nosym: True
        kpts: [8, 8, 8]
        input_data:
            control:
                prefix: C
                pseudo_dir: ./
                outdir: ./out
                calculation: scf
            system:
                ibrav: 0
                ecutwfc: 60
                ecutrho: 360
            electrons:
                conv_thr: 1.0e-9
                electron_maxstep: 100
                mixing_beta: 0.7
    params: {}
```

## 3.2 Training

We can now run active learning with the `flare-otf` command, pointing it to our yaml input script.

In [5]:
%cd Al_otf/
! flare-otf otf_train.yaml

[Errno 2] No such file or directory: 'Al_otf/'
/home/andi/Flare/MyScript/Al_otf
ERROR: The install method you used for conda--probably either `pip install conda`
or `easy_install conda`--is not compatible with using conda as an application.
If your intention is to install conda as a standalone application, currently
supported install methods include the Anaconda installer and the miniconda
installer.  You can download the miniconda installer from
https://conda.io/miniconda.html.

/bin/bash: flare-otf: command not found


### Output files
The folder in which you run `flare-otf` will contain multiple output files written during training. The table below explains each of them.

| File name | Description |
| --------- | ----------- |
|`<output_name>.out` | log file of the OTF training |
|`<output_name>_dft.xyz` | all the DFT computed frames and the corresponding DFT energy/forces/stress |
|  `<output_name>_md.xyz` | complete MD trajectory from the on-the-fly training |
| `<output_name>_dft.pickle`| ASE DFT calculator (used for restarting OTF) |
| `<output_name>_flare.json`| FLARE calculator with training data collected from OTF (used for restarting) |
| `<output_name>_thermo.txt`| thermal outputs from LAMMPS of the complete MD trajectory |
| `<output_name>_atoms.json`| atomic structure at the current step (ASE Atoms object) |
| `<output_name>_checkpt.json`| checkpoint file that saved the OTF information at the current step, and can be used to restart an OTF training |
| `<output_name>_ckpt_<n>` | (optional) If you set `write_model: 4`, these folders back up the checkpoint files at step n |
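
As a small convenience, the file names in the table can be generated from `output_name` and checked against a training folder. The helper functions below are hypothetical and only encode the naming pattern listed above. Some files may legitimately be absent depending on your settings, so a missing file is a hint rather than an error.

```python
from pathlib import Path

OUTPUT_SUFFIXES = (
    ".out", "_dft.xyz", "_md.xyz", "_dft.pickle",
    "_flare.json", "_thermo.txt", "_atoms.json", "_checkpt.json",
)

def expected_otf_outputs(output_name):
    """List the output file names from the table for a given output_name."""
    return [output_name + suffix for suffix in OUTPUT_SUFFIXES]

def missing_outputs(folder, output_name):
    """Return the expected files that are not present in folder."""
    folder = Path(folder)
    return [name for name in expected_otf_outputs(output_name)
            if not (folder / name).exists()]

print(expected_otf_outputs("Al"))
```

Calling `missing_outputs("Al_otf", "Al")` after a run shows at a glance which files were not produced.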

## 3.3 Postprocessing

### Analyze the active learning trajectory
In the same folder that our active learning simulation is run, there will be a set of FLARE output files that contain a variety of important information which we will parse here.

The frequency of DFT calls, simulation time, and hyperparameters are recorded in the `<output_name>.out` file. You can parse this information and visualize the number of DFT calls as a function of simulation time. Plotting the FLARE and DFT potential energies against simulation time in the same figure then gives a glimpse into the performance of the model.
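
A quick way to find out how many DFT calls were made, without parsing the log, is to count the frames in `<output_name>_dft.xyz`. The sketch below is a minimal pure-Python frame counter that assumes one frame was written per DFT call and that the file follows the usual xyz layout (atom count, comment line, then one line per atom). The embedded sample text is made up for illustration.

```python
def count_xyz_frames(text):
    """Count frames in (ext)xyz-formatted text."""
    lines = text.splitlines()
    n_frames = 0
    i = 0
    while i < len(lines):
        if not lines[i].strip():  # skip stray blank lines between frames
            i += 1
            continue
        n_atoms = int(lines[i])   # first line of a frame: number of atoms
        i += n_atoms + 2          # skip the comment line and the atom lines
        n_frames += 1
    return n_frames

# Two made-up frames of a two-atom cell (values are illustrative only).
sample = """2
Lattice="10 0 0 0 10 0 0 0 10" Properties=species:S:1:pos:R:3 energy=-6.7
Al 0.0 0.0 0.0
Al 2.5 0.0 0.0
2
Lattice="10 0 0 0 10 0 0 0 10" Properties=species:S:1:pos:R:3 energy=-6.6
Al 0.1 0.0 0.0
Al 2.4 0.0 0.0
"""
print(count_xyz_frames(sample))
```

For a real run you would pass the contents of the file instead, e.g. `count_xyz_frames(open("Al_dft.xyz").read())`, adjusting the file name to your `output_name`.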

We can also visualize the atomic uncertainties as the simulation progresses, provided below, where atoms labeled as red denote high uncertainty (and thus a call to DFT) whereas blue atoms are low uncertainty.

- To realize the visualization, we can parse the trajectory from the `<output_name>.out` log file through the `flare.io.otf_parser`. An example script is shown here: [FLARE tutorial of Python API: Analyzing the OTF trajectory](https://colab.research.google.com/drive/18_pTcWM19AUiksaRyCgg9BCpVyw744xv#scrollTo=Analyzing_the_simulation).

- If the OTF is done with `md_engine: PyLAMMPS` or an MD is simulated by LAMMPS with uncertainty, you can directly plot them from the xyz or LAMMPS dumped file:
[FLARE tutorial of LAMMPS: Color Atoms by Uncertainty](https://colab.research.google.com/drive/1qgGlfu1BlXQgSrnolS4c4AYeZ-2TaX5Y#scrollTo=Color_Atoms_by_Uncertainty).

After obtaining a trajectory with atomic uncertainty information, we can use OVITO's `color coding` to assign colors based on uncertainties.

<img src="https://github.com/mir-group/FLARE-Tutorials/raw/master/APS-2020/al.gif" width="100%">

### Get coefficient files for LAMMPS

After the training is done, you will probably want to deploy the force field in large-scale molecular dynamics with LAMMPS. To get the coefficient files for the FLARE pair style in LAMMPS, you need the json file of the sparse GP calculator, `<output_name>_flare.json`, which is in the same folder as the training (for example, `Al_flare.json` for `output_name: Al`; the cell below loads `Al_otf_flare.json`). Then run the short script below, where `lmp.flare` is the name of the resulting coefficient file and the second argument is the contributor name, which you can set to your own.

> **NOTE**: This script is useful for constructing potential files. Keep it at hand, since you can reuse it after restarting an active learning trajectory, after offline training, etc.

There will be three files generated in the current directory:

- `lmp.flare`: coefficient file for pair_flare energy/forces/stress
- `sparse_desc_lmp.flare` and `L_inv_lmp.flare`: coefficient files for uncertainty calculation

In [9]:
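from flare.bffs.sgp.calculator import SGP_Calculator

# Load the trained SGP calculator; the file name follows `<output_name>_flare.json`.
sgp_calc, _ = SGP_Calculator.from_file("Al_otf_flare.json")
# Write the LAMMPS coefficient files; the second argument is the contributor name.
sgp_calc.build_map("lmp.flare", "name")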



NameError: name 'NormalizedDotProduct' is not defined

To use FLARE in LAMMPS (including using LAMMPS for active learning and pure MD), please check our [FLARE - LAMMPS tutorial](https://colab.research.google.com/drive/1qgGlfu1BlXQgSrnolS4c4AYeZ-2TaX5Y).

## 3.4 Restart an active learning trajectory

Here, we provide an example script for restarting an active-learning simulation. This is useful if you encounter machine issues, or if you simply want to extend your training trajectory because the first attempt was not sufficient.

> **NOTE**: Before you restart, we suggest that you either back up the previous training folder or run the restarted training in a new folder, because some files will be overwritten.

Suppose a new folder `Al_restart` is created for restarting the training. You will need to copy the following files from the previous training (`Al_otf` folder) to the new folder:
- Al_atoms.json
- Al_checkpt.json
- Al_dft.pickle
- Al_flare.json

(Optional, only if you use LAMMPS MD in the training)
- lmp.flare
- sparse_desc_lmp.flare
- L_inv_lmp.flare

If you want to restart from an earlier checkpoint (not the latest), for example step 10, you just need to copy the four `Al_*.*` files from the `Al_ckpt_10` folder into `Al_restart`.
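
If you prefer to stage the restart files from Python rather than with shell commands, a sketch along these lines could be used. The function is a hypothetical helper, not part of FLARE. It copies the four files listed above and raises an error if any of them is missing, so an incomplete checkpoint folder is caught before the restart is launched.

```python
import shutil
from pathlib import Path

# The four files needed for a restart, following the list above.
RESTART_SUFFIXES = ("_atoms.json", "_checkpt.json", "_dft.pickle", "_flare.json")

def stage_restart_files(source_dir, restart_dir, output_name):
    """Copy the restart files for output_name from source_dir to restart_dir."""
    source_dir, restart_dir = Path(source_dir), Path(restart_dir)
    restart_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for suffix in RESTART_SUFFIXES:
        src = source_dir / f"{output_name}{suffix}"
        if not src.is_file():
            raise FileNotFoundError(f"missing restart file: {src}")
        shutil.copy2(src, restart_dir / src.name)
        copied.append(src.name)
    return copied
```

For example, `stage_restart_files("Al_otf/Al_ckpt_10", "Al_restart", "Al")` would stage the files from step 10, assuming `output_name` was `Al` and the checkpoint folder sits inside the training folder.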

A very simple yaml file is used for restarting.

In [None]:
%cd /content/
! mkdir Al_restart/
! cp FLARE-Tutorials/OTF/restart.yaml Al_restart/
! cp Al_otf/*.json Al_otf/*.pickle Al_restart/
%cd Al_restart/
! cat restart.yaml

/content
mkdir: cannot create directory ‘Al_restart/’: File exists
/content/Al_restart
# Set up On-the-fly training and MD
otf: # On-the-fly training and MD
    mode: restart                                                               # Restart an active learning trajectory.
    number_of_steps: 20                                                         # set the maximum number of MD steps. This is not additive on previous steps. E.g., if you already ran a simulation for 10 steps, this restart set-up will stop at 20 total steps, not 10 + 20.
    checkpoint: Al_otf_checkpt.json                                   # Point to the checkpoint file from which you want to restart. The lmp.flare file also needs to be present in this directory.


In [None]:
!pwd
!flare-otf restart.yaml

/content/Al_restart
Precomputing KnK for hyps optimization
Done precomputing. Time: 0.03399157524108887
Hyperparameters:
[2.0e+00 9.6e-02 5.0e-02 1.0e-03]
Likelihood gradient:
[   -32.65788668    227.6963764  -27502.89431043 -28623.69076946]
Likelihood:
3039.338992420928


Hyperparameters:
[ 1.99916907  0.10179335 -0.64976511 -0.7272819 ]
Likelihood gradient:
[ -24.42265345  -55.32602544 2202.98136416   41.2479367 ]
Likelihood:
-765.8418275150224


Hyperparameters:
[ 1.99976305  0.09765209 -0.14955201 -0.20668414]
Likelihood gradient:
[ 6.12736635e+00 -3.50421565e+01  9.39601599e+03  1.45139258e+02]
Likelihood:
1357.0532647611772


Hyperparameters:
[ 1.99993442  0.09645722 -0.00522669 -0.05647728]
Likelihood gradient:
[-3.06459214e+01 -1.04573435e+02  1.41375089e+05  5.31183599e+02]
Likelihood:
5684.492664654028


Hyperparameters:
[ 1.99998114  0.09613149  0.03411729 -0.01552996]
Likelihood gradient:
[-7.06395617e+00 -2.36952695e+02 -3.97797622e+04  1.92939275e+03]
Likelihood:
3480.823

## 3.5 Warm-start a training from a previous GP

Another important capability is continuing an active learning trajectory from a previously trained GP. This previous GP could come from another active-learning trajectory or from an offline training. Sequential training in this manner reduces the number of non-unique atomic environments that may arise from running several active-learning trajectories in parallel. An example `yaml` script is provided below.

In [None]:
%cd /content/
! mkdir Al_warm_start/
! cp FLARE-Tutorials/OTF/warm-start.yaml Al_warm_start/Al_warm_start.yaml
! cp Al_otf/*.flare Al_warm_start/
! cp Al_otf/*flare.json Al_otf/Al_otf_dft.xyz Al_warm_start/
%cd Al_warm_start
! cat Al_warm_start.yaml

/content
mkdir: cannot create directory ‘Al_warm_start/’: File exists
/content/Al_warm_start
# Super cell is read from a file such as vasp, extxyz, lammps-data
# or any format that ASE supports
# to start from previous run, needs 3 .flare files
supercell: 
    file: Al_otf_dft.xyz 
    format: extxyz
    index: -1

# Set up FLARE calculator with (sparse) Gaussian process
flare_calc:
    gp: SGP_Wrapper
    file: Al_otf_flare.json

# call dft calculator
dft_calc:
    name: LennardJones
    kwargs: {}

# for MD
otf: # On-the-fly training and MD
    mode: fresh                                                                 # Start from an empty SGP
    md_engine: VelocityVerlet                                                   # Define MD engine, here we use the Velocity Verlet engine from ASE. LAMMPS examples can be found in the `flare/examples` directory in the repo
    md_kwargs: {}                                                               # Define MD kwargs
    initial_velocity: 

In [None]:
!pwd
!flare-otf Al_warm_start.yaml

/content/Al_warm_start
Traceback (most recent call last):
  File "/usr/local/bin/flare-otf", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/flare/scripts/otf_train.py", line 378, in main
    fresh_start_otf(config)
  File "/usr/local/lib/python3.10/site-packages/flare/scripts/otf_train.py", line 345, in fresh_start_otf
    otf.run()
  File "/usr/local/lib/python3.10/site-packages/flare/learners/otf.py", line 413, in run
    self.backup_checkpoint()
  File "/usr/local/lib/python3.10/site-packages/flare/learners/otf.py", line 936, in backup_checkpoint
    os.mkdir(dir_name)
FileExistsError: [Errno 17] File exists: 'Al_warm_ckpt_2'
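
The `FileExistsError` in the traceback above is raised by `os.mkdir` because the checkpoint directory `Al_warm_ckpt_2` already exists, most likely left over from an earlier run of the same cell. Running in a fresh directory avoids this; alternatively, stale checkpoint directories can be cleared first. Below is a minimal sketch of the latter. The helper name `clear_stale_checkpoints` and the `*_ckpt_*` glob pattern are assumptions inferred from the directory name in the traceback and are not part of FLARE.

```python
import shutil
from pathlib import Path


def clear_stale_checkpoints(run_dir, pattern="*_ckpt_*", dry_run=True):
    """Find (and optionally delete) leftover checkpoint directories.

    The glob pattern is an assumption based on the directory name seen
    in the traceback ('Al_warm_ckpt_2'); adjust it to your output name.
    Only directories are matched, so model and data files are untouched.
    """
    stale = sorted(p for p in Path(run_dir).glob(pattern) if p.is_dir())
    for path in stale:
        if dry_run:
            print(f"would remove {path}")
        else:
            shutil.rmtree(path)
    return [p.name for p in stale]


# Inspect first, then delete once the list looks right:
# clear_stale_checkpoints("/content/Al_warm_start")
# clear_stale_checkpoints("/content/Al_warm_start", dry_run=False)
```

The default `dry_run=True` only lists what would be removed, since deleting checkpoints is irreversible.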


# 4 Offline training from available DFT data
We will train an SGP model from the DFT labels generated by our previous "on-the-fly" active-learning trajectory, as an example of using FLARE with existing DFT labels (e.g., if you want to build an MLFF from an existing AIMD trajectory or relaxation pathway). An "offline" training yaml config file is provided here, in which we perform "fake MD" over the already computed ground-truth DFT frames and build our SGP model from them. We also provide an example script that converts previously generated VASP `OUTCAR` files to `.extxyz` for easy use with the provided yaml script.

> **Note:**
>
> - The offline training syntax has also changed slightly. If there are multiple `xx_dft.xyz` files, list them all under `filenames` in `md_kwargs`, as in the `otf` block below.
>
> - Here we only display and explain the yaml blocks that differ from the online training.

In the yaml file, you can keep the same settings as in the on-the-fly training and change only the `supercell` block, the `dft_calc` block, and the MD settings (`md_engine` and `md_kwargs`) in the `otf` block:
```yaml
supercell:
    file: aimd_trajectory1.xyz
    format: extxyz
    index: 0
```

```yaml
dft_calc:
    name: FakeDFT                                                               # We are going to perform "FakeDFT" since our extxyz input already contains the DFT labels
    kwargs: {}
    params: {}
```

```yaml
otf:
    # ... same as on-the-fly
    md_engine: Fake                                                             # Do not perform MD, just read frames sequentially
    md_kwargs:
        filenames: [aimd_trajectory1.xyz, aimd_trajectory2.xyz]
        format: extxyz
        index: ":"
        io_kwargs: {}
    train_hyps: [inf, inf]
    # ... same as on-the-fly
```

> **Note:**
>
> Here we can set `train_hyps` to `[inf, inf]`, so that the hyperparameters are not optimized during the offline training. This makes the offline training much faster and does not affect which sparse atomic environments are selected. The MAEs reported in the output file will differ from those obtained with optimized hyperparameters, but the hyperparameters can be optimized with a short Python script after the offline training is done.

In [None]:
%cd /content/
! mkdir Al_offline/
! cp FLARE-Tutorials/OTF/offline.yaml Al_offline/Al_offline.yaml
! cp Al_restart/Al_otf_dft.xyz Al_offline/
%cd Al_offline/
! cat Al_offline.yaml

/content
mkdir: cannot create directory ‘Al_offline/’: File exists
/content/Al_offline
# Super cell is read from a file such as POSCAR, xyz, lammps-data
# # or any format that ASE supports
supercell: 
    file: Al_otf_dft.xyz                                              # Use previously generated DFT frames as input
    format: extxyz
    index: 0
    replicate: [1, 1, 1]                                                        # Do not replicate periodically
    jitter: 0.0                                                                 # Do not jitter atoms, since we our input is DFT frames

# Set up FLARE calculator with (sparse) Gaussian process                        # This section stays the same as previous
flare_calc:
    gp: SGP_Wrapper
    kernels:
        - name: NormalizedDotProduct
          sigma: 2
          power: 2
    descriptors:
        - name: B2
          nmax: 8
          lmax: 3
          cutoff_function: quadratic
          radial_basis: chebyshev
          cutoff

In [None]:
!flare-otf Al_offline.yaml

Precomputing KnK for hyps optimization
Done precomputing. Time: 5.7220458984375e-05
Hyperparameters:
[2.e+00 1.e-01 5.e-02 1.e-03]
Likelihood gradient:
[-9.99062599e-01  3.75229658e-02 -5.06231488e+03  1.33615569e+03]
Likelihood:
607.4043287671393


Hyperparameters:
[ 1.99980727  0.10000724 -0.92655663  0.25875396]
Likelihood gradient:
[-9.83190337e-01 -7.98528447e-06  3.12906439e+02 -2.31870762e+01]
Likelihood:
-250.51598182320626


Hyperparameters:
[ 1.99995028  0.10000187 -0.2019443   0.06749859]
Likelihood gradient:
[-9.98504703e-01 -2.58727565e-05  1.42489390e+03 -8.88643608e+01]
Likelihood:
198.29880650929476


Hyperparameters:
[ 1.99998855  0.10000043 -0.00799475  0.01630723]
Likelihood gradient:
[-9.99274121e-01  3.44912290e+01 -1.42627390e+05 -3.66181138e+02]
Likelihood:
427.54264151941567


Hyperparameters:
[1.99999980e+00 1.00000007e-01 4.90107121e-02 1.26111427e-03]
Likelihood gradient:
[-9.99129648e-01  3.32087356e-02 -5.13587293e+03 -1.03271904e+03]
Likelihood:
612.431140

Hence, we have built our final FLARE potential from our previously generated DFT training labels, and can now perform production-level MD.
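
The optimizer output printed above has a regular structure (`Hyperparameters:`, `Likelihood gradient:`, `Likelihood:`, each followed by its value), so the likelihood history can be extracted with a few lines of Python. The sketch below assumes that this output has been captured as text and that each hyperparameter array fits on a single line, as it does above. The function name `parse_likelihood_log` is hypothetical and not part of FLARE.

```python
def parse_likelihood_log(text):
    """Extract (hyperparameters, likelihood) pairs from optimizer output.

    Assumes the layout printed above: a header line followed by its value
    on the next line, with each array on a single line.
    """
    lines = [ln.strip() for ln in text.splitlines()]
    records = []
    hyps = None
    for i, line in enumerate(lines[:-1]):
        if line == "Hyperparameters:":
            hyps = [float(x) for x in lines[i + 1].strip("[]").split()]
        elif line == "Likelihood:" and hyps is not None:
            records.append((hyps, float(lines[i + 1])))
            hyps = None
    return records


# Two optimizer steps copied from the output above
sample = """Hyperparameters:
[2.e+00 1.e-01 5.e-02 1.e-03]
Likelihood gradient:
[-9.99062599e-01  3.75229658e-02 -5.06231488e+03  1.33615569e+03]
Likelihood:
607.4043287671393

Hyperparameters:
[ 1.99980727  0.10000724 -0.92655663  0.25875396]
Likelihood gradient:
[-9.83190337e-01 -7.98528447e-06  3.12906439e+02 -2.31870762e+01]
Likelihood:
-250.51598182320626
"""

records = parse_likelihood_log(sample)
best_hyps, best_likelihood = max(records, key=lambda r: r[1])
print(best_hyps, best_likelihood)
```

Picking the record with the largest likelihood gives the hyperparameters of the best optimizer step in the captured output.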

# 5 Evaluate Model Instantiation Parameters
Prior to the final training of any SGP model, the user should evaluate the model instantiation parameters, i.e., the parameters that are not optimized during training. These parameters define the descriptor representation of atomic environments in the FLARE framework, namely the cutoff, $n_{max}$, and $l_{max}$. We recommend using a bash script to build the directory structure, and tools like `sed` to replace the strings of interest in the `yaml` files. We recommend scanning the following ranges, written as (start, stop, step), which have been found to be sufficient for a variety of systems: cutoff = (3.0, 9.0, 1.0), $n_{max}$ = (2, 14, 2), and $l_{max}$ = (0, 6, 1). The best model parameters are typically selected using the highest marginal log likelihood together with the corresponding force, energy, and stress MAEs, all of which can be parsed from the training files once training has completed. Since force labels outnumber energy and stress labels, the force MAE is usually given the most weight when selecting the best-performing model. One should also keep in mind the total cost of the model when sweeping through model instantiation parameters; the cost grows with the dimension of the descriptor, which is given by:

$n_d = (n_{max} \cdot n_{species} + 1) \cdot n_{max} \cdot \frac{n_{species}}{2} \cdot (l_{max} + 1)$.
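
To get a feel for how quickly the descriptor grows, the expression for $n_d$ can be evaluated directly. The sketch below is a plain transcription of that formula; the function name `descriptor_dimension` is hypothetical and not part of FLARE.

```python
def descriptor_dimension(n_max, l_max, n_species):
    """Descriptor dimension n_d from the formula above."""
    n_radial = n_max * n_species
    # n_radial * (n_radial + 1) is always even, so integer division is exact.
    return n_radial * (n_radial + 1) // 2 * (l_max + 1)


# Settings used in the Al example (nmax = 8, lmax = 3, one species)
print(descriptor_dimension(8, 3, 1))  # 144
# The same basis with two species
print(descriptor_dimension(8, 3, 2))  # 544
```

Because $n_d$ scales quadratically with both $n_{max}$ and $n_{species}$, adding a second species at fixed basis size roughly quadruples the descriptor dimension.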

We can then select model instantiation parameters that minimize the force MAE while keeping the cost of the descriptor low. With these parameters, we perform another offline training on the entire dataset generated from our consecutive active-learning trajectories to yield the final FLARE potential. This is the final step in creating a FLARE potential for your workflow; the subsequent production MD and uncertainty quantification steps are covered elsewhere.
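
As an alternative to bash and `sed`, the parameter sweep can be set up in Python. The sketch below writes one config per (cutoff, $n_{max}$, $l_{max}$) combination by substituting placeholders in a template. The placeholder tokens (`__CUTOFF__`, `__NMAX__`, `__LMAX__`), the directory naming, the file name `offline.yaml`, and the helper names are all assumptions made for illustration, and the recommended ranges are treated as inclusive of both endpoints.

```python
import itertools
from pathlib import Path


def inclusive_range(start, stop, step):
    """Values start, start + step, ... up to and including stop."""
    count = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(count)]


def build_sweep(template_text, out_dir, cutoffs, n_maxes, l_maxes):
    """Write one config file per (cutoff, n_max, l_max) combination."""
    out_dir = Path(out_dir)
    written = []
    for cutoff, n_max, l_max in itertools.product(cutoffs, n_maxes, l_maxes):
        run_dir = out_dir / f"cutoff_{cutoff}_nmax_{n_max}_lmax_{l_max}"
        run_dir.mkdir(parents=True, exist_ok=True)
        text = (template_text
                .replace("__CUTOFF__", str(cutoff))
                .replace("__NMAX__", str(n_max))
                .replace("__LMAX__", str(l_max)))
        config = run_dir / "offline.yaml"
        config.write_text(text)
        written.append(config)
    return written


# Recommended ranges from the text, as (start, stop, step)
cutoffs = inclusive_range(3.0, 9.0, 1.0)
n_maxes = inclusive_range(2, 14, 2)
l_maxes = inclusive_range(0, 6, 1)
print(len(cutoffs) * len(n_maxes) * len(l_maxes))  # 343 combinations

# template = Path("Al_offline.yaml").read_text()  # with placeholders inserted
# build_sweep(template, "sweep", cutoffs, n_maxes, l_maxes)
```

The full grid contains 343 trainings, so in practice it may be worth scanning a coarser grid first and refining around the best region.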


# 6 Validation and Test

After a model is trained, we want to know its accuracy. Note that the mean absolute errors reported during on-the-fly/offline training are not true test errors, since each one is evaluated with the model as it stood at that step of training rather than with the final model.

Here are a few approaches that we can use for testing the potential.

1. Use the LAMMPS coefficient file of the trained potential to run MD with LAMMPS, and randomly pick a few frames for DFT calculations. The error can then be calculated by directly comparing the LAMMPS and DFT energies, forces, and stresses.

2. With extra AIMD data that were not used for training, there are two options:
  - Warm start an offline training with a very large `std_tolerance_factor`, such that no frame is added to the model during the offline training, but each frame is predicted and the MAE is computed.
  - Use the [ASE LAMMPS Calculator](https://wiki.fysik.dtu.dk/ase/ase/calculators/lammps.html) or the [LAMMPS Python API](https://docs.lammps.org/Python_head.html) together with the coefficient file of the FLARE potential to make predictions on the frames.

We can compute the MAE and make the parity plot as shown in [this example](https://colab.research.google.com/drive/1VzbIPmx1z-uygKstOYTj2Nqr53AMC5NL#scrollTo=Plot_the_Predictions).
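
Once reference and predicted values are available from any of the approaches above, the MAE itself is simple to compute. The sketch below uses only the standard library; the helper names and the force values are hypothetical and serve only to show the bookkeeping (forces are flattened into individual Cartesian components before averaging).

```python
def mean_absolute_error(reference, predicted):
    """MAE between two equal-length sequences of numbers."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    if not reference:
        raise ValueError("need at least one value")
    return sum(abs(r - p) for r, p in zip(reference, predicted)) / len(reference)


def flatten(frames):
    """Flatten per-frame, per-atom force vectors into one list of components."""
    return [c for frame in frames for atom in frame for c in atom]


# Hypothetical data: two frames, two atoms each, forces in eV/A
dft_forces = [[[0.10, 0.00, -0.20], [-0.10, 0.00, 0.20]],
              [[0.05, 0.01, 0.00], [-0.05, -0.01, 0.00]]]
model_forces = [[[0.12, 0.00, -0.20], [-0.10, 0.02, 0.20]],
                [[0.05, 0.01, 0.00], [-0.05, -0.01, 0.00]]]

force_mae = mean_absolute_error(flatten(dft_forces), flatten(model_forces))
print(f"force MAE: {force_mae:.4f} eV/A")
```

The same function applies to energies and stress components, provided the reference and predicted values are listed in the same order.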


# References

[1] Jonathan Vandermause, Yu Xie, Jin Soo Lim, Cameron J. Owen, Boris Kozinsky. Active learning of reactive Bayesian force fields: Application to heterogeneous hydrogen-platinum catalysis dynamics. https://arxiv.org/abs/2106.01949

[2] Anders Johansson, Yu Xie, Cameron J. Owen, Jin Soo Lim, Lixin Sun, Jonathan Vandermause, Boris Kozinsky. Micron-scale heterogeneous catalysis with Bayesian force fields from first principles and active learning. https://arxiv.org/abs/2204.12573

[1] S. Chmiela, A. Tkatchenko, H. E. Sauceda, I. Poltavsky, K. T. Schütt, K.-R. Müller. Sci. Adv. 3(5), e1603015, 2017.

[2] K. T. Schütt, F. Arbabzadah, S. Chmiela, K.-R. Müller, A. Tkatchenko. Nat. Commun. 8, 13890, 2017.

[3] S. Chmiela, H. E. Sauceda, K.-R. Müller, A. Tkatchenko. Nat. Commun. 9, 3887, 2018.

[4] Bartók, A. P., Payne, M. C., Kondor, R., & Csányi, G. (2010). Physical review letters, 104(13), 136403.

[5] Bartók, A. P., & Csányi, G. (2015). International Journal of Quantum Chemistry, 115(16), 1051-1057.

[6] Drautz, R. (2019). Physical Review B, 99(1), 014104.