In [10]:
import numpy as np

from autocat.adsorption import generate_rxn_structures
from autocat.surface import generate_surface_structures
from autocat.learning.featurizers import full_structure_featurization
from autocat.learning.featurizers import adsorbate_featurization
from autocat.learning.featurizers import catalyst_featurization
from autocat.learning.featurizers import _get_number_of_features
from autocat.learning.featurizers import get_X

# Adsorbate Featurization

In this example, we show how to featurize a catalyst structure based solely on local features of the adsorbate. First, let's build our adsorbate+slab structure.

In [11]:
# generate the surface structure to place an adsorbate on
surf = generate_surface_structures(["Fe"])["Fe"]["bcc100"]["structure"]

# place an adsorbate onto the surface
ads_struct = generate_rxn_structures(surf, ads=["Li"])["Li"]["ontop"]["0.0_0.0"][
    "structure"
]

Now that we have our structure, we can featurize the Li adsorbate using a variety of techniques. Let's use SOAP:

In [12]:
soap_feat = adsorbate_featurization(
    ads_struct,
    featurizer="soap",
    species_list=["Fe", "Li", "H"],  # species that should be accounted for by the representation
    rcut=6,  # cutoff radius
    nmax=4,  # number of radial basis functions
    lmax=4,  # maximum order of spherical harmonic
    maximum_adsorbate_size=5,  # maximum adsorbate size that can be handled by the representation
    refine_structure=True,  # this refines the structure to only include surface and adsorbate atoms
)
print(f"Adsorbate featurized into a vector of shape: {soap_feat.shape}")

Adsorbate featurized into a vector of shape: (1950,)
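
The length of 1950 can be checked by hand. The sketch below assumes the standard uncompressed SOAP power-spectrum size, where each atomic center contributes (S * nmax)(S * nmax + 1)/2 * (lmax + 1) features for S species, and that the adsorbate block is padded out to `maximum_adsorbate_size` centers. The helper name `soap_length` is hypothetical and is not part of autocat.

```python
def soap_length(n_species, nmax, lmax, max_adsorbate_size):
    """Expected SOAP vector length (hypothetical helper, not an autocat function)."""
    # number of (species, radial basis) channels
    n_basis = n_species * nmax
    # unique channel pairs, times one block per angular degree 0..lmax
    per_center = n_basis * (n_basis + 1) // 2 * (lmax + 1)
    # assumption: one block per possible adsorbate atom, zero-padded if unused
    return per_center * max_adsorbate_size


# values taken from the cell above: 3 species, nmax=4, lmax=4, up to 5 adsorbate atoms
length = soap_length(n_species=3, nmax=4, lmax=4, max_adsorbate_size=5)
print(length)
```

Under these assumptions each center contributes 390 features, so five centers give 1950, matching the printed shape.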


Similarly, we can use the same function with a different featurization technique. For example, we can use the order parameter site fingerprint:

In [13]:
opsf_feat = adsorbate_featurization(
    ads_struct,
    featurizer="op_sitefingerprint",
    maximum_adsorbate_size=5,
    refine_structure=False,
)
print(f"Adsorbate featurized into a vector of shape: {opsf_feat.shape}")

Adsorbate featurized into a vector of shape: (185,)


Note that different featurization techniques produce representations of different sizes, and they will also differ in how well a model learns from them. We encourage the user to try different approaches for their specific application.

# Long-range full structure featurization

In the previous section, we looked at featurizing the adsorbate by its local environment. Now we can look at how to capture longer-range phenomena via full structure featurization. Again, we start by building an example structure.

In [14]:
# generate the surface structure to place an adsorbate on
surf = generate_surface_structures(["Cu"])["Cu"]["fcc111"]["structure"]

# place an adsorbate onto the surface
ads_struct = generate_rxn_structures(surf, ads=["OH"])["OH"]["ontop"]["0.0_0.0"][
    "structure"
]

One of the simpler forms of larger-scale featurization is to take elemental properties based on the structure's composition:

In [20]:
elem_prop = full_structure_featurization(
    ads_struct,
    featurizer="elemental_property",
    elementalproperty_preset="magpie", # here we can choose other presets as implemented within `matminer`
    refine_structure = False,
    maximum_structure_size = None, # by default we will just take the max size as the size of the slab+adsorbate
)
print(f"Full structure featurized into representation of shape {elem_prop.shape}")

[1.00000000e+00 4.40000000e+01 4.30000000e+01 4.08461538e+01
 5.82248521e+00 4.40000000e+01 5.60000000e+01 9.20000000e+01
 3.60000000e+01 5.85128205e+01 4.63905325e+00 5.60000000e+01
 1.00794000e+00 1.01070000e+02 1.00062060e+02 9.37062200e+01
 1.35946708e+01 1.01070000e+02 1.40100000e+01 2.60700000e+03
 2.59299000e+03 2.40879667e+03 3.65913846e+02 2.60700000e+03
 1.00000000e+00 1.50000000e+01 1.40000000e+01 7.82051282e+00
 6.99539776e-01 8.00000000e+00 1.00000000e+00 5.00000000e+00
 4.00000000e+00 4.71794872e+00 5.20710059e-01 5.00000000e+00
 3.10000000e+01 1.46000000e+02 1.15000000e+02 1.38179487e+02
 1.44378698e+01 1.46000000e+02 2.20000000e+00 3.04000000e+00
 8.40000000e-01 2.22153846e+00 4.19723866e-02 2.20000000e+00
 1.00000000e+00 2.00000000e+00 1.00000000e+00 1.02564103e+00
 4.99671269e-02 1.00000000e+00 0.00000000e+00 3.00000000e+00
 3.00000000e+00 7.69230769e-02 1.49901381e-01 0.00000000e+00
 0.00000000e+00 7.00000000e+00 7.00000000e+00 6.46153846e+00
 9.94082840e-01 7.000000

We can also featurize the full structure using a sine matrix, an extension of the Coulomb matrix:

In [18]:
sine_matrix = full_structure_featurization(
    ads_struct,
    featurizer="sine_matrix",
    maximum_structure_size=None,
    refine_structure=True,  # keep only surface and adsorbate atoms to reduce the number of features
)
print(f"Full structure featurized into representation of shape {sine_matrix.shape}")

Full structure featurized into representation of shape (1600,)


Note the use of `refine_structure` for the sine matrix. The sine matrix consists of pairwise interactions, and we are most interested in the adsorbate + slab interactions, so considering only the surface and adsorbate atoms is one way to reduce the length of the representation.

# Catalyst Featurization (Adsorbate + Full Structure)

As a means to incorporate both the short-range and longer-range features into a single representation, we can combine them in a consistent, generalizable manner using autocat. First, let's make our structure:

In [21]:
# generate the surface structure to place an adsorbate on
surf = generate_surface_structures(["Ru"])["Ru"]["hcp0001"]["structure"]

# place an adsorbate onto the surface
ads_struct = generate_rxn_structures(surf, ads=["NH2"])["NH2"]["ontop"]["0.0_0.0"][
    "structure"
]

Here we have the freedom to combine any of autocat's local featurization techniques with any of its longer-range features. As an example, let's combine the sine matrix with SOAP:

In [24]:
cat = catalyst_featurization(
    ads_struct,
    maximum_structure_size=40,
    maximum_adsorbate_size=4,
    structure_featurizer = "sine_matrix",
    adsorbate_featurizer = "soap",
    adsorbate_featurization_kwargs={"rcut": 6.0, "nmax": 4, "lmax": 4},
    refine_structure=False,
)
print(f"Catalyst featurized into a representation of shape {cat.shape}")

Catalyst featurized into a representation of shape (3160,)


This is where the maximum structure size and maximum adsorbate size become most useful: they give consistently sized vectors when structures and adsorbates of different sizes are incorporated into a single representation. This is done by zero-padding both the full-structure and short-range components of the representation. This leads into our next section, featurizing multiple structures into an input matrix.
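
The zero-padding idea can be illustrated with a minimal sketch in plain Python. The helper `zero_pad`, the toy feature values, and the choice to append the zeros at the end of the vector are all assumptions made for illustration; autocat's internal layout may differ.

```python
def zero_pad(features, target_length):
    """Right-pad a feature list with zeros (hypothetical helper, not autocat's)."""
    if len(features) > target_length:
        raise ValueError("feature vector is longer than the target length")
    return list(features) + [0.0] * (target_length - len(features))


# toy descriptors with 2 features per adsorbate atom (made-up numbers)
co_features = [0.3, 0.1, 0.7, 0.2]                        # 2 atoms
nh3_features = [0.5, 0.4, 0.9, 0.6, 0.2, 0.8, 0.1, 0.3]   # 4 atoms

max_adsorbate_size = 5
features_per_atom = 2
target = max_adsorbate_size * features_per_atom

# both adsorbates now map to vectors of the same length
padded = [zero_pad(f, target) for f in (co_features, nh3_features)]
print([len(p) for p in padded])
```

If we assume that the flattened sine matrix has `maximum_structure_size` squared entries and that each SOAP center has 390 features (three species with `nmax = lmax = 4`), the shape printed above decomposes as 40² + 4 × 390 = 1600 + 1560 = 3160.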

# Featurizing many structures at once

In practice, we are generally most interested in featurizing multiple structures at once rather than a single structure in isolation. If the user opts to use AutoCat's structure corrector predictor, this is done under the hood, but we showcase the feature here to show how the resulting matrix can be fed into alternate ML software.

In [25]:
# generate multiple slab+adsorbate structures
structs = []
surf1 = generate_surface_structures(["Pt"])["Pt"]["fcc111"]["structure"]
ads1 = generate_rxn_structures(
    surf1,
    ads=["NH3", "CO"],
    all_sym_sites=False,
    sites={"origin": [(0.0, 0.0)]},
    height={"CO": 1.5},
    rots={"NH3": [[180.0, "x"], [90.0, "z"]], "CO": [[180.0, "y"]]},
)
structs.append(ads1["NH3"]["origin"]["0.0_0.0"]["structure"])
structs.append(ads1["CO"]["origin"]["0.0_0.0"]["structure"])
surf2 = generate_surface_structures(["Ru"])["Ru"]["hcp0001"]["structure"]
ads2 = generate_rxn_structures(
    surf2, ads=["N"], all_sym_sites=False, sites={"origin": [(0.0, 0.0)]},
)
structs.append(ads2["N"]["origin"]["0.0_0.0"]["structure"])

In [27]:
X = get_X(
    structs,
    maximum_structure_size=50,
    maximum_adsorbate_size=5,
    structure_featurizer = "sine_matrix",
    adsorbate_featurizer = "soap",
    refine_structures = True,
    adsorbate_featurization_kwargs={"rcut": 5.0, "nmax": 4, "lmax": 4},
    write_to_disk = False, # we can write the matrix to disk as a json
    write_location = ".",
)
print(f"Input matrix, X of shape {X.shape} generated")

Input matrix, X of shape (3, 10000) generated


We note that the shape of the generated matrix X is (# of structures, full_structure_feat(max_structure_size) + adsorbate_feat(max_adsorbate_size)).
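
As a rough check of this shape rule, the sketch below recomputes the number of columns from the arguments passed to `get_X`. It assumes a flattened sine matrix of `maximum_structure_size` squared entries, the standard uncompressed SOAP size per center, and a species list collected from all three structures (Pt, Ru, N, H, C, O). The function `expected_columns` is a hypothetical helper, not part of autocat.

```python
def expected_columns(max_structure_size, max_adsorbate_size, n_species, nmax, lmax):
    """Expected number of columns of X under the stated assumptions (hypothetical helper)."""
    # assumption: the sine matrix is an N x N matrix flattened to N**2 entries
    sine_matrix_length = max_structure_size ** 2
    # standard SOAP power-spectrum size per atomic center
    n_basis = n_species * nmax
    soap_per_center = n_basis * (n_basis + 1) // 2 * (lmax + 1)
    return sine_matrix_length + max_adsorbate_size * soap_per_center


# species present across the three structures built above (assumed to be what the featurizer sees)
species = {"Pt", "Ru", "N", "H", "C", "O"}
n_structures = 3

n_cols = expected_columns(
    max_structure_size=50,
    max_adsorbate_size=5,
    n_species=len(species),
    nmax=4,
    lmax=4,
)
print((n_structures, n_cols))
```

Under these assumptions the full-structure part contributes 2500 columns and the adsorbate part 5 × 1500 = 7500, which agrees with the (3, 10000) reported above.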