In [1]:
import numpy as np

from sklearn.gaussian_process import GaussianProcessRegressor

from autocat.adsorption import place_adsorbate
from autocat.surface import generate_surface_structures
from autocat.perturbations import generate_perturbed_dataset
from autocat.learning.predictors import AutoCatStructureCorrector

In this tutorial, we show how to use the AutoCatStructureCorrector both to train on relaxed structures and to predict corrections to initial structure guesses.

# Creating an AutoCatStructureCorrector Object

Let's start by creating our AutoCatStructureCorrector object.

In [2]:
acsc = AutoCatStructureCorrector(
    model_class=GaussianProcessRegressor,  # regressor model class
    structure_featurizer="sine_matrix",
    adsorbate_featurizer="soap",
    adsorbate_featurization_kwargs={"rcut": 5.0, "nmax": 8, "lmax": 6},
    refine_structures=True,
    maximum_structure_size=None,
    maximum_adsorbate_size=None,
)

The model class provided should implement fit and predict methods, with predict accepting return_std. The AutoCatStructureCorrector class behaves like a scikit-learn regressor, exposing fit, predict, and score methods.
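
As a rough illustration of the interface described above, the sketch below defines a hypothetical `MeanRegressor` (not part of AutoCat or scikit-learn) with a `fit` method and a `predict` method that accepts `return_std`. It only shows the duck-typed shape such a model class might take; whether AutoCatStructureCorrector requires anything beyond these two methods is an open assumption here.

```python
import statistics


class MeanRegressor:
    """Hypothetical toy model: predicts the training mean for every input."""

    def fit(self, X, y):
        # Store the mean and spread of the training targets.
        self.mean_ = statistics.fmean(y)
        self.std_ = statistics.pstdev(y)
        return self

    def predict(self, X, return_std=False):
        # One prediction per input row; optionally a constant uncertainty too.
        preds = [self.mean_ for _ in X]
        if return_std:
            return preds, [self.std_ for _ in X]
        return preds


model = MeanRegressor().fit([[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0])
preds, stds = model.predict([[0.5], [1.5]], return_std=True)
```

The two-value return when `return_std=True` mirrors the convention used by scikit-learn's GaussianProcessRegressor.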

# Generating Perturbed Structure Datasets

Before we can demonstrate the capabilities of the AutoCatStructureCorrector class, we need to create a perturbed dataset from a set of base structures. These base structures would typically already be relaxed with DFT. For simplicity, let's consider a single base structure generated directly by AutoCat, which corresponds to an interpolative prediction on that surface (in practice, this structure would ideally be relaxed before use).

In [3]:
sub = generate_surface_structures(["Pd"], facets={"Pd": ["100"]})["Pd"]["fcc100"][
    "structure"
]
base_struct = place_adsorbate(sub, "CO")["custom"]["structure"]

Now that we have the base structure, we can perturb it to create the dataset.

In [4]:
train_set = generate_perturbed_dataset(
    [base_struct],
    num_of_perturbations=20,
    minimum_perturbation_distance=0.01,
    maximum_perturbation_distance=0.75,
    maximum_adsorbate_size=None,
)

# separate out the collected (perturbed) structures and the corresponding corrections
train_structures = train_set["collected_structures"]
train_correction_matrix = train_set["correction_matrix"]  # matrix, which has padding
train_corrections_list = train_set["corrections_list"]  # list of corrections (variable length)
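
To make the difference between the padded matrix and the variable-length list more concrete, here is a minimal sketch of how such padding could work. The helper `pad_corrections`, the flat x, y, z layout per atom, and the zero pad value are all assumptions made for illustration; they are not taken from AutoCat's implementation.

```python
def pad_corrections(corrections_list, pad_value=0.0):
    """Pad variable-length flat correction vectors into a rectangular matrix."""
    width = max(len(c) for c in corrections_list)
    return [list(c) + [pad_value] * (width - len(c)) for c in corrections_list]


# Two hypothetical structures: a 1-atom adsorbate and a 2-atom adsorbate,
# each correction flattened as (x, y, z) per adsorbate atom.
corrections = [
    [0.1, -0.2, 0.05],
    [0.0, 0.3, -0.1, 0.2, 0.0, -0.05],
]
matrix = pad_corrections(corrections)
```

Every row of `matrix` then has the same length, which is the shape a typical regressor expects for its targets.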

**A note on directionality:** Perturbations are controlled via the tags associated with each atom in the Atoms object. By default, slabs generated with ASE/AutoCat have tags running from 1 (top layer) to the number of layers (bottom layer), and adsorbate atoms have a tag of 0. Atoms with a tag of 0 are free to move, while atoms with tags >= 1 are held fixed. Constraints on the direction of movement can be specified via specially assigned tag values (see the documentation for further details).
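
The tag convention can be sketched in plain Python as follows. This is an illustrative assumption about how a perturbation might be drawn (a random direction with a magnitude between the minimum and maximum distance), not AutoCat's actual algorithm. The function `perturb_positions` is hypothetical, and the special directional tags are deliberately not handled.

```python
import math
import random


def perturb_positions(positions, tags, min_dist, max_dist, rng):
    """Displace atoms tagged 0 by a random vector; keep atoms tagged >= 1 fixed."""
    perturbed, displacements = [], []
    for pos, tag in zip(positions, tags):
        if tag == 0:
            # Random direction (normalized Gaussian sample), random magnitude.
            direction = [rng.gauss(0.0, 1.0) for _ in range(3)]
            norm = math.sqrt(sum(d * d for d in direction))
            magnitude = rng.uniform(min_dist, max_dist)
            disp = [magnitude * d / norm for d in direction]
        elif tag >= 1:
            disp = [0.0, 0.0, 0.0]
        else:
            raise NotImplementedError("directional tags are outside this sketch")
        perturbed.append([p + d for p, d in zip(pos, disp)])
        displacements.append(disp)
    return perturbed, displacements


# Hypothetical toy system: two slab atoms (tags 2 and 1) and one adsorbate atom (tag 0).
positions = [[0.0, 0.0, 0.0], [1.4, 1.4, 2.0], [0.0, 0.0, 3.8]]
tags = [2, 1, 0]
new_positions, disps = perturb_positions(positions, tags, 0.01, 0.75, random.Random(0))
```

Only the atom tagged 0 moves; the slab atoms keep their original coordinates.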

For future use, let's also make a test set.

In [5]:
test_set = generate_perturbed_dataset(
    [base_struct],
    num_of_perturbations=20,
    minimum_perturbation_distance=0.01,
    maximum_perturbation_distance=0.75,
    maximum_adsorbate_size=None,
)

# separate out the collected (perturbed) structures and the corresponding corrections list
test_structures = test_set["collected_structures"]
test_corrections_list = test_set["corrections_list"]

# Fitting to Perturbation Data

To fit our model to the generated perturbation data, we can use the fit method.

In [6]:
acsc.fit(
    train_structures,
    correction_matrix=train_correction_matrix,
)

Alternatively, we can provide the corrections as a list.

In [7]:
acsc.fit(
    train_structures,
    corrections_list=train_corrections_list,
)

To check whether our model has been fit to data, we can use the is_fit attribute. Note that changing any of the model or featurizer settings automatically resets this value to False.

In [8]:
print(f"Has the model been fit? {acsc.is_fit}")

acsc.structure_featurizer = "elemental_property"
print(f"Is the model still fit? {acsc.is_fit}")

Has the model been fit? True
Is the model still fit? False
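
One common way to get this kind of reset behavior is a property setter that clears the fitted flag. The sketch below uses a hypothetical `ToyCorrector` class to show the pattern; it is an assumption about the general technique, not a copy of AutoCat's internals.

```python
class ToyCorrector:
    """Hypothetical stand-in showing how a setter can invalidate a fitted state."""

    def __init__(self, structure_featurizer="sine_matrix"):
        self._structure_featurizer = structure_featurizer
        self.is_fit = False

    @property
    def structure_featurizer(self):
        return self._structure_featurizer

    @structure_featurizer.setter
    def structure_featurizer(self, value):
        # A new featurizer changes the feature space, so any earlier fit is stale.
        self._structure_featurizer = value
        self.is_fit = False

    def fit(self, structures, corrections):
        # Placeholder for real training; only the state flag matters here.
        self.is_fit = True


toy = ToyCorrector()
toy.fit([], [])
fit_before = toy.is_fit
toy.structure_featurizer = "elemental_property"
fit_after = toy.is_fit
```

Resetting the flag in the setter prevents predictions from being made with a model trained on a different featurization.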


Let's refit the model for use in the next section.

In [9]:
acsc.structure_featurizer = "sine_matrix"

acsc.fit(
    train_structures,
    corrections_list=train_corrections_list,
)

# Structure Correction Predictions and Performance Metrics

Now that we have a fitted model, we can make predictions on the test set we made earlier using the predict method.

In [10]:
predicted_corrections, corrected_structures, unc = acsc.predict(test_structures)

When we make predictions we are given three quantities: a list of the predicted corrections for each structure, Atoms objects of the corrected structures, and the uncertainties attributed to each prediction.
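
To illustrate how the predicted corrections and the corrected structures might relate, the sketch below adds a per-atom correction vector to the free atoms of a perturbed guess. The helper `apply_correction` is hypothetical, and both the additive sign convention and the per-atom (x, y, z) layout are assumptions rather than confirmed AutoCat behavior.

```python
def apply_correction(positions, free_indices, correction):
    """Add a per-atom correction vector to the positions of the free atoms."""
    corrected = [list(p) for p in positions]  # copy so the input stays untouched
    for idx, corr in zip(free_indices, correction):
        corrected[idx] = [p + c for p, c in zip(corrected[idx], corr)]
    return corrected


# Hypothetical perturbed guess: one fixed slab atom and one free adsorbate atom.
perturbed = [[0.0, 0.0, 0.0], [0.25, -0.5, 4.0]]
predicted = [[-0.25, 0.5, -0.25]]  # one (x, y, z) correction per free atom
corrected = apply_correction(perturbed, free_indices=[1], correction=predicted)
```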

Since we know what the corrections need to be for each of the test structures, we can evaluate performance metrics on the predictions. At present, both MAE and RMSE are implemented within the score method; MAE is used when no metric is specified.

In [11]:
MAE = acsc.score(test_structures, test_corrections_list)
print(f"MAE = {MAE}")

RMSE = acsc.score(test_structures, test_corrections_list, metric="rmse")
print(f"RMSE = {RMSE}")

MAE = 0.7263375790848622
RMSE = 0.5777037271424246
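
For reference, the standard element-wise definitions of the two metrics can be sketched as below. The functions `mae` and `rmse` are illustrative helpers, and the numbers are made up for the example. How the score method aggregates errors over atoms and structures is not shown in this tutorial, so its values need not follow these exact definitions.

```python
import math


def mae(true_vals, pred_vals):
    """Mean absolute error over paired values."""
    return sum(abs(t - p) for t, p in zip(true_vals, pred_vals)) / len(true_vals)


def rmse(true_vals, pred_vals):
    """Root-mean-square error over paired values."""
    squared = sum((t - p) ** 2 for t, p in zip(true_vals, pred_vals))
    return math.sqrt(squared / len(true_vals))


# Hypothetical flattened correction components (not taken from the run above).
true_corrections = [0.0, 0.5, -0.5, 1.0]
pred_corrections = [0.0, 0.0, 0.0, 0.0]
mae_value = mae(true_corrections, pred_corrections)
rmse_value = rmse(true_corrections, pred_corrections)
```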
