This notebook aims to compute a descriptor for a combination of two "center" atoms.

# Create a test system

In [1]:
import ase
import numpy as np
atoms = ase.Atoms("SSNO", positions=[[0, 0, 0], [0, 0, 0.1], [0, 0, 1], [0, 0, 2]])
frames = [atoms]

# Common Hyperparameters

In [2]:
r_cut = 4
n_max = 12
l_max = 6 
sigma = 0.3

In [3]:
# These indices refer to a single frame for now (with multiple frames, we can assume a nested list with one list of indices per frame)

list_S = [1, 2]  # list of all indices we label as "start" atom
list_M = [2, 3]  # list of all indices we label as "middle" atom
list_E = [3, 1]  # list of all indices we label as "end" atom

assert len(list_S) == len(list_M)
assert len(list_S) == len(list_E)

# `dscribe` descriptor

In [4]:
from dscribe.descriptors import SOAP

soaper = SOAP(
    r_cut=r_cut,
    n_max=n_max,
    l_max=l_max,
    sigma=sigma,
    sparse=False,
    species=["S", "O", "N"],
)

ModuleNotFoundError: No module named 'dscribe'

In [None]:
soap_water = soaper.create(frames[0], centers=list_S)

# Pair descriptor

The code for the descriptor calculations is extracted from 

https://github.com/curiosity54/mlelec

In [5]:
from utils.acdc import pair_features

In [6]:
hypers = {
    "cutoff": r_cut,
    "max_radial": n_max,
    "max_angular": l_max,
    "atomic_gaussian_width": sigma,
    "center_atom_weight": 1,
    "radial_basis": {"Gto": {}},
    "cutoff_function": {"ShiftedCosine": {"width": 0.1}},
}

hypers_pair = {
    "cutoff": 10,  # a larger cutoff can be specified here to find pairs that are much farther apart
                   # than the cutoff used for describing local densities, as in SOAP
    "max_radial": n_max,
    "max_angular": l_max,
    "atomic_gaussian_width": sigma,
    "center_atom_weight": 1,
    "radial_basis": {"Gto": {}},
    "cutoff_function": {"ShiftedCosine": {"width": 0.1}},
}
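
Since the two dictionaries differ only in `cutoff`, one option is to derive `hypers_pair` from `hypers` instead of repeating every key. This is a minimal sketch with the values from the cells above hard-coded so that it runs on its own; `copy.deepcopy` is used so that the nested `radial_basis` and `cutoff_function` dictionaries are not shared between the two.

```python
import copy

# Values copied from the hyperparameter cells above
hypers = {
    "cutoff": 4,
    "max_radial": 12,
    "max_angular": 6,
    "atomic_gaussian_width": 0.3,
    "center_atom_weight": 1,
    "radial_basis": {"Gto": {}},
    "cutoff_function": {"ShiftedCosine": {"width": 0.1}},
}

# Deep copy so the nested dictionaries stay independent, then override the cutoff
hypers_pair = copy.deepcopy(hypers)
hypers_pair["cutoff"] = 10

# Keys whose values differ between the two dictionaries
differing = [k for k in hypers if hypers[k] != hypers_pair[k]]
print(differing)
```

Any later change to the local hyperparameters then has to be made in one place only, as long as the copy is taken after that change.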

The pair feature combines a local feature $\rho_i^{\otimes \nu}$ (SOAP corresponds to $\nu=2$) with $g_{ij}$ through the expression $\rho_i^{\otimes \nu} \otimes g_{ij}$. We usually use $\nu=1$, so that the feature resulting from the tensor product has SOAP-like behavior instead.
One can also create a pair feature of the form $\rho_i^{\otimes \nu} \otimes g_{ij} \otimes \rho_j^{\otimes \nu}$; for $\nu=1$, this is similar in dimensions to the bispectrum.
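
To get a feel for why the two-center form is more expensive, the toy sketch below counts features for plain outer products of flat vectors. It only illustrates how the dimensions grow: the vectors `rho_i`, `g_ij`, and `rho_j` and their lengths are made up, and the angular (Clebsch-Gordan) coupling used in the actual calculation is ignored entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical flattened features; the lengths are arbitrary
n_rho, n_g = 5, 3
rho_i = rng.normal(size=n_rho)  # stand-in for the density around atom i
rho_j = rng.normal(size=n_rho)  # stand-in for the density around atom j
g_ij = rng.normal(size=n_g)     # stand-in for the pair term g_ij

# rho_i (x) g_ij: one center combined with the pair term
one_center = np.einsum("a,b->ab", rho_i, g_ij).ravel()

# rho_i (x) g_ij (x) rho_j: both centers combined with the pair term
two_centers = np.einsum("a,b,c->abc", rho_i, g_ij, rho_j).ravel()

print(one_center.size, two_centers.size)
```

In this simplified picture, the two-center product is larger than the one-center product by a factor of `n_rho`.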

In [17]:

both_centers = False  # whether to compute the pair feature as (rho_i)^\nu \otimes g_ij (False) or (rho_i)^\nu \otimes g_ij \otimes (rho_j)^\nu (True)
# The latter is more informative, as it has local-environment information on both atoms, but it is also more costly to compute
all_pairs = False  # when True, this resets the cutoff so that the resulting environment captures all pairs in the system
# NOTE: all_pairs is defined here but is not passed to pair_features below

rhoij = pair_features(
    frames=frames,
    hypers=hypers,
    hypers_pair=hypers_pair,  # if not specified, hypers are used instead
    cg=None,
    order_nu=1, # specifies what kind of local densities to combine to create pair features, (i.
    both_centers=both_centers,
    lcut=0,  # so that the resulting features are always scalar (indexed by spherical_harmonics_l = 0)
    # CAUTION: you might want to change this value if computing features with both_centers=True or if
    # using these features to learn non-scalar properties. A reasonable value is ~3 or 4.
)

The pair feature returned here contains all possible pairs in the system (those allowed by the cutoffs). To match how the centers are accessed in the DScribe framework, we create a list of the pairs for which we want the pair descriptor.

In [14]:
# frame_index, i, j
# We are creating pairs between S and E atoms here. Please change this if you want to construct S-M pairs instead.
list_ij = np.array([[0, atom1, atom2] for atom1, atom2 in zip(list_S, list_E)])
# Change this to a list of [ifr, atom1, atom2] when working with multiple frames
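
For several frames, one way to build the same kind of array is to keep one list of start and end indices per frame and prepend the frame index to every pair. This is a minimal sketch; the nested lists `lists_S` and `lists_E`, the name `list_ij_multi`, and the indices of the second frame are hypothetical.

```python
import numpy as np

# Hypothetical per-frame index lists: lists_S[ifr][k] is paired with lists_E[ifr][k]
lists_S = [[1, 2], [0]]
lists_E = [[3, 1], [2]]

rows = []
for ifr, (starts, ends) in enumerate(zip(lists_S, lists_E)):
    # Each frame must provide the same number of start and end atoms
    assert len(starts) == len(ends)
    rows.extend([ifr, i, j] for i, j in zip(starts, ends))

# Columns: frame_index, i, j
list_ij_multi = np.array(rows)
print(list_ij_multi)
```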

The resulting pair descriptor is not a NumPy array; it is stored in the Metatensor format (a specialized storage format that also stores metadata), which organizes features in _blocks_. For each desired pair of atoms, we must look up the block corresponding to the chemical species of the two atoms in the pair. So, below, we do some "preprocessing" to organize the desired pairs by their species.

In [10]:
# convert list of indices to list of species
species_ij = []
for ifr, i, j in list_ij:
    atomic_species = frames[ifr].numbers
    species_i = atomic_species[i]
    species_j = atomic_species[j]

    species_ij.append((species_i, species_j))

In [11]:
species_ij
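unique_species_ij, inverse = np.unique(species_ij, return_inverse=True, axis=0)
inverse = inverse.ravel()  # keep the inverse 1-D; some NumPy 2.0 releases return it 2-D when `axis` is given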
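
It helps to see what `np.unique` returns for these pairs. The sketch below hard-codes the atomic numbers of the two S-E pairs defined above (S = 16, O = 8, N = 7) so that it runs without the earlier cells; the names `species_pairs`, `unique_pairs`, and `inverse_idx` are used only for this illustration. The unique rows come back sorted, so their order need not follow `list_ij`.

```python
import numpy as np

# (species_i, species_j) for the pairs (1, 3) and (2, 1) of the "SSNO" test system
species_pairs = [(16, 8), (7, 16)]

unique_pairs, inverse_idx = np.unique(species_pairs, return_inverse=True, axis=0)
# Flatten the inverse in case the installed NumPy returns it 2-D when `axis` is given
inverse_idx = np.asarray(inverse_idx).ravel()

print(unique_pairs)  # unique rows, sorted lexicographically
print(inverse_idx)   # position of each input pair among the unique rows

# Indexing the unique rows with the inverse recovers the input
assert np.array_equal(unique_pairs[inverse_idx], np.array(species_pairs))
```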

In [12]:
pair_soap_features = []

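for inverse_index, (species_i, species_j) in enumerate(unique_species_ij):
    # Block holding the scalar (l = 0, sigma = +1) features for this species pair
    block = rhoij.block(spherical_harmonics_l=0, inversion_sigma=1, species_center=species_i, species_neighbor=species_j)

    values = block.values
    sample_values = block.samples.values

    # Requested pairs whose species match this block
    mask = inverse == inverse_index
    selected_samples = list_ij[mask]

    # Rows of the block whose sample labels equal each requested [frame_index, i, j]
    value_indices = np.array([np.where(np.all(sample_values == s, axis=1)) for s in selected_samples])

    values_selected = values[value_indices]

    # Convert to NumPy and flatten: all selected pairs of this species pair end up in one 1-D array
    pair_soap_features.append(values_selected.numpy().flatten())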
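
The row lookup above compares every requested sample against the whole sample table. An alternative sketch builds a dictionary from sample tuple to row index once and then looks each pair up directly. The `sample_table` and `feature_rows` arrays below are made up, and the sketch assumes that the samples have exactly the three columns (frame_index, i, j); the real `block.samples` may carry additional columns.

```python
import numpy as np

# Hypothetical sample table of one block: columns are frame_index, i, j
sample_table = np.array([[0, 0, 3], [0, 1, 3], [1, 0, 2]])
# Hypothetical feature matrix with one row per sample
feature_rows = np.arange(12.0).reshape(3, 4)

# Map each (frame_index, i, j) tuple to its row in the block
row_of = {tuple(int(v) for v in s): r for r, s in enumerate(sample_table)}

requested = [(1, 0, 2), (0, 1, 3)]
# A KeyError here means the requested pair is not present in this block
rows = [row_of[s] for s in requested]

selected = feature_rows[rows]  # shape: (number of requested pairs, n_features)
print(rows, selected.shape)
```

Because `rows` follows the order of `requested`, the selected features keep one row per pair in the order in which the pairs were asked for.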

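We have now collected the pair features corresponding to the pairs between the specified atoms. Please be careful about the *order* of the pairs that the features correspond to, and when matching these features to targets: `pair_soap_features` holds one entry per unique species pair, in the sorted order returned by `np.unique`, which need not be the order of `list_ij`. For this example, the number of feature rows should be the same for the DScribe and pair features.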
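
One way to avoid bookkeeping mistakes is to write each group's rows back into a single array at the positions of the original pairs. This is a sketch with made-up numbers: `inverse` plays the role of the array returned by `np.unique`, and the hypothetical `group_features` stands in for the per-species-pair feature rows.

```python
import numpy as np

n_pairs, n_features = 3, 2
# Group index of each requested pair, in the original pair order (hypothetical)
inverse = np.array([1, 0, 1])

# Hypothetical features per group; rows follow the order in which the
# pairs of that group appear in the original list
group_features = {
    0: np.array([[10.0, 11.0]]),
    1: np.array([[20.0, 21.0], [30.0, 31.0]]),
}

ordered = np.empty((n_pairs, n_features))
for group, feats in group_features.items():
    # The boolean mask selects the original positions belonging to this group
    ordered[inverse == group] = feats

print(ordered)
```

Row `k` of `ordered` then belongs to the `k`-th requested pair, which makes it straightforward to line the features up with per-pair targets.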

In [25]:
len(pair_soap_features)

2

In [None]:
len(soap_water)