# Model Ensemble Calculator

The accuracy of machine learning (ML) force field predictions depends strongly on the training distribution. For structures outside this distribution, ML models may give inaccurate predictions. During molecular dynamics simulations or structure relaxations, the structures passed to the model change continuously and may reach regions of chemical space where the model is not confident.

Model ensembles make it possible to estimate the confidence of ML models for a given structure and property, while also reducing the effect of random-like predictions outside the training distribution.
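
To illustrate the idea, here is a minimal sketch of one common way to turn several model predictions into a single value plus a confidence measure: average the predictions and use their standard deviation as the uncertainty. It is not necessarily how `SpkEnsembleCalculator` combines its models. The helper `ensemble_statistics` and all numbers are hypothetical and only serve the illustration.

```python
import numpy as np


def ensemble_statistics(predictions):
    """Return mean and standard deviation across the model axis (axis 0).

    Hypothetical helper for illustration; not part of SchNetPack.
    """
    stacked = np.stack([np.asarray(p, dtype=float) for p in predictions])
    return stacked.mean(axis=0), stacked.std(axis=0)


# Made-up energies (kcal/mol) predicted by three models for one structure
energies = [-97.21, -97.18, -97.25]
mean_energy, energy_std = ensemble_statistics(energies)

# Made-up forces, shape (n_atoms, 3), predicted by the same three models
forces = [
    np.array([[0.10, 0.00, -0.20], [-0.10, 0.00, 0.20]]),
    np.array([[0.12, 0.01, -0.18], [-0.12, -0.01, 0.18]]),
    np.array([[0.08, -0.01, -0.22], [-0.08, 0.01, 0.22]]),
]
mean_forces, forces_std = ensemble_statistics(forces)

print(f"energy: {mean_energy:.3f} +/- {energy_std:.3f} kcal/mol")
print("largest force uncertainty:", forces_std.max())
```

A large standard deviation means the models disagree, which suggests that the structure lies outside the region covered by the training data.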

Here we introduce a calculator that uses model ensembles to compute properties together with confidence measures.

In [None]:
import os
import torch
import random
import shutil
import time
import numpy as np
from tqdm import tqdm 
from matplotlib import pyplot as plt
from typing import Optional, List, Union, Dict

import ase
from ase.io import read, write
from ase import Atoms
from ase.optimize.lbfgs import LBFGS
from ase.optimize import QuasiNewton

import schnetpack as spk
from schnetpack import properties
from schnetpack.interfaces.ase_interface import SpkEnsembleCalculator, SpkCalculator
from schnetpack.interfaces.batchwise_optimization import ASEBatchwiseLBFGS, BatchwiseCalculator

First, we define the calculator. It requires a list of models (paths to saved models or already loaded models) and a neighbor list.

In [None]:
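# NOTE: both entries point to the same test model, so the ensemble spread
# will be (close to) zero. Use separately trained models for meaningful
# confidence estimates.
model_path_0 = "../../tests/testdata/md_ethanol.model"
model_path_1 = "../../tests/testdata/md_ethanol.model"

# set device (fall back to CPU if no GPU is available)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# define neighbor list
cutoff = 5.0
nbh_list = spk.transform.MatScipyNeighborList(cutoff=cutoff)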

# Set up calculator
calculator = SpkEnsembleCalculator(
    model=[model_path_0, model_path_1],
    neighbor_list=nbh_list,
    energy_unit="kcal/mol",
    position_unit="Angstrom",
    device=device,
)

Subsequently, we define an initial ethanol structure by randomly distorting a relaxed structure loaded from `input_structure_file`. Each atom's position vector is scaled by a random factor between 0.95 and 1.05.

In [None]:
input_structure_file = "../../tests/testdata/md_ethanol.xyz"
random.seed(42)

# set up a clean working directory (os.makedirs also creates the parent folder)
working_dir = "howto_ensemble_relaxations_outputs/relax"
if os.path.exists(working_dir):
    shutil.rmtree(working_dir)
os.makedirs(working_dir)

# load initial structure
mol = read(input_structure_file)
pos = mol.get_positions()
# distort the structure: scale each atom's position by a random factor
for n in range(pos.shape[0]):
    pos[n] = pos[n] * random.uniform(0.95, 1.05)
molecule = Atoms(positions=pos, numbers=mol.get_atomic_numbers())
molecule.calc = calculator

Finally, we run the relaxation with ASE's LBFGS optimizer.

In [None]:
# run relaxation
optimize_file = os.path.join(working_dir, "optimization")
optimizer = LBFGS(
    molecule,
    trajectory=f"{optimize_file}.traj",
    restart=f"{optimize_file}.pkl",
)
optimizer.run(fmax=0.001, steps=1000)
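
The `fmax` argument sets the stopping criterion for the relaxation. By default, ASE optimizers treat a structure as converged once the largest per-atom force norm falls below `fmax`; `steps` caps the number of iterations. The sketch below reproduces that check on made-up forces. The helpers `max_force_norm` and `is_converged` are hypothetical names used for illustration and are not ASE functions.

```python
import numpy as np


def max_force_norm(forces):
    """Largest Euclidean norm among the per-atom force vectors."""
    forces = np.asarray(forces, dtype=float)
    return float(np.sqrt((forces ** 2).sum(axis=1).max()))


def is_converged(forces, fmax):
    """True once every atom's force norm is below fmax."""
    return max_force_norm(forces) < fmax


# Made-up forces on three atoms (same units as fmax)
forces = np.array([
    [0.0003, 0.0000, -0.0004],   # norm 0.0005
    [0.0000, 0.0006, 0.0000],    # norm 0.0006
    [-0.0002, 0.0001, 0.0002],   # norm 0.0003
])

print("max force norm:", max_force_norm(forces))
print("converged at fmax=0.001: ", is_converged(forces, fmax=0.001))
print("converged at fmax=0.0005:", is_converged(forces, fmax=0.0005))
```

Because the criterion uses the maximum over atoms, a single atom with a large residual force is enough to keep the optimizer running.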

In [None]:
# TODO: uncertainty evaluation example