# Adsorbate Fingerprints Setup

In this tutorial we will try the adsorbate fingerprint generator, which is useful for converting adsorbates on extended surfaces into fingerprints for predicting their chemisorption energies, bond lengths or other properties.

In most machine learning codes, the data comes as a matrix where rows represent training examples or unexplored data points, and columns represent features or properties of those data points. The CatLearn fingerprinters follow the same convention: they take atoms objects as input and return the data as an array.

In [None]:
# Import packages.
import numpy as np
import ase.io
from ase.data import atomic_numbers
from ase.build import fcc111, add_adsorbate
from ase.constraints import FixAtoms
from catlearn.fingerprint.setup import FeatureGenerator
from catlearn.fingerprint.periodic_table_data import get_radius, default_catlearn_radius
from catlearn.fingerprint.adsorbate_prep import autogen_info
try:
    import matplotlib.pyplot as plt
    import seaborn as sns
    import pandas as pd
    plot = True
except ImportError:
    # Without the plotting stack the rest of the tutorial still runs.
    plot = False
    print('Matplotlib, pandas and seaborn are needed for the plots in this tutorial.')

### Generate some adsorbate/surface systems from ASE.

We return the atoms objects in a list, which is the simplest format and easily transferable to CatLearn.

In [None]:
"""Make a list of atoms objects."""
adsorbates = ['H', 'O', 'C', 'N', 'S', 'Cl', 'F']
symbols = ['Ag', 'Au', 'Cu', 'Pt', 'Pd', 'Ir', 'Rh', 'Ni', 'Co']
images = []
for s in symbols:
    rs = get_radius(atomic_numbers[s])
    a = 2 * rs * 2 ** 0.5
    for ads in adsorbates:
        atoms = fcc111(s, (2, 2, 3), a=a)
        atoms.center(vacuum=6, axis=2)
        c_atoms = [atom.index for atom in atoms if
                   atom.z < atoms.cell[2, 2] / 2. + 0.1]
        atoms.set_constraint(FixAtoms(c_atoms))
        h = (default_catlearn_radius(
            atomic_numbers[ads]) + rs) / 2 ** 0.5
        add_adsorbate(atoms, ads, h, 'bridge')
        images.append(atoms)
print(len(images), ' atoms objects created.')

Here we have our list of atoms stored in `images`.
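
The geometry in the loop above rests on simple hard-sphere arithmetic: in an fcc crystal the nearest-neighbour distance is `a / sqrt(2)`, so touching spheres of radius `rs` give `a = 2 * rs * sqrt(2)`. The sketch below reproduces that arithmetic and the height heuristic from the loop. The two radii are made-up illustrative values, not the ones returned by `get_radius` or `default_catlearn_radius`, and the helper names are hypothetical.

```python
import math


def fcc_lattice_constant(radius):
    """Lattice constant of an fcc crystal built from touching hard spheres."""
    return 2 * radius * math.sqrt(2)


def initial_height(r_ads, r_surf):
    """Height heuristic used in the tutorial loop: (r_ads + r_surf) / sqrt(2)."""
    return (r_ads + r_surf) / math.sqrt(2)


r_surf = 1.45  # hypothetical metal radius in Angstrom
r_ads = 0.31   # hypothetical adsorbate radius in Angstrom

a = fcc_lattice_constant(r_surf)
nearest_neighbour = a / math.sqrt(2)  # equals 2 * r_surf by construction
h = initial_height(r_ads, r_surf)

print('lattice constant:', round(a, 3))
print('nearest-neighbour distance:', round(nearest_neighbour, 3))
print('initial adsorbate height:', round(h, 3))
```

The height is only a starting guess for placing the adsorbate; it is not a relaxed bond length.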

### Attach metadata automatically.

The adsorbate fingerprinter generates fingerprints based on the connectivity of atoms in the adsorbate/slab system. It therefore uses certain metadata as intermediates between the atoms object and the fingerprint. The connectivity matrix is one such piece of metadata. It can sometimes be computationally expensive to generate, so it should be made only once.

A list of raw atoms objects without the metadata can be fed through `autogen_info` to attach the connectivity matrix and the other metadata.

In [None]:
images = autogen_info(images)
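
To make the idea of a connectivity matrix concrete, here is a minimal sketch, assuming two atoms count as connected when their distance is below the sum of their radii times a tolerance. The function name, the tolerance, the radii and the positions are all illustrative assumptions. This is not necessarily how `autogen_info` builds `atoms.connectivity`, and the sketch ignores periodic boundary conditions.

```python
import math


def connectivity_matrix(positions, radii, tolerance=1.1):
    """Return a symmetric 0/1 matrix; 1 marks a pair closer than the cutoff."""
    n = len(positions)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            distance = math.dist(positions[i], positions[j])
            if distance < tolerance * (radii[i] + radii[j]):
                matrix[i][j] = matrix[j][i] = 1
    return matrix


# Two neighbouring surface atoms, an adsorbate above their midpoint,
# and one distant atom that should not be bonded to anything.
positions = [(0.0, 0.0, 0.0), (2.9, 0.0, 0.0), (1.45, 0.0, 1.0), (10.0, 0.0, 0.0)]
radii = [1.45, 1.45, 0.31, 1.45]  # hypothetical radii in Angstrom

conn = connectivity_matrix(positions, radii)
for row in conn:
    print(row)
```

The double loop scales quadratically with the number of atoms, which is one reason to build such a matrix once and store it with the atoms object.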

Now let's go ahead and generate our fingerprint matrix.

First we instantiate the FeatureGenerator object and define the fingerprinting functions we want to call. These define what information we retrieve and include in our fingerprints.

In [None]:
# Get the fingerprint generator.
fingerprint_generator = FeatureGenerator()

# List of functions to call.
feature_functions = [fingerprint_generator.mean_site,
                     fingerprint_generator.mean_surf_ligands]
# There are many more available.

# Run the fingerprinter.
training_data = fingerprint_generator.return_vec(images, feature_functions)

# Get a list of names of the features.
feature_names = fingerprint_generator.return_names(feature_functions)

print(np.shape(training_data), ' data matrix created.')

The data matrix is now stored in `training_data`.
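
The pattern used above, a list of functions that each return a few numbers per atoms object, can be sketched without CatLearn. Everything in this block is hypothetical: the dictionaries standing in for atoms objects, the two feature functions and `build_matrix`. It only illustrates how rows and columns line up and is not the implementation of `return_vec`.

```python
def mean_radius(system):
    """One feature: the average radius of all atoms in the system."""
    radii = system['radii']
    return [sum(radii) / len(radii)]


def size_features(system):
    """Two features: the number of atoms and the largest radius."""
    radii = system['radii']
    return [len(radii), max(radii)]


def build_matrix(systems, functions):
    """One row per system; each function appends its features as columns."""
    return [[value for func in functions for value in func(system)]
            for system in systems]


systems = [{'radii': [1.45, 1.45, 0.31]},
           {'radii': [1.44, 1.44, 0.66]}]  # made-up numbers
matrix = build_matrix(systems, [mean_radius, size_features])

print(len(matrix), 'rows,', len(matrix[0]), 'columns')
```

The column order follows the order of the function list, which is why the list of names has to be generated from the same list of functions.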

### Let's analyse the output.

First, let's see what features were returned by the `feature_functions`:

In [None]:
for index, name in enumerate(feature_names):
    print(index, name)

Let's compare some of the features related to atomic radii using violin plots.

In [None]:
# Select some features to plot.
selection = [10, 11, 14]

# Plot the selected feature distributions.
data = {}
traint = np.transpose(training_data[:, selection])
for i, j in zip(traint, selection):
    data[j] = i
df = pd.DataFrame(data)
fig = plt.figure(figsize=(20, 10))
ax = sns.violinplot(data=df, inner=None)
plt.title('Feature distributions', fontsize=20)
plt.xlabel('Feature No.', fontsize=20)
plt.ylabel('Distribution.', fontsize=20)

string = 'Plotting:'
for s in selection:
    string += '\n' + str(s) + ' ' + feature_names[s]
print(string)
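
If the plotting libraries are unavailable, the same columns can be summarised numerically. This sketch uses only the standard library and a small made-up matrix in place of `training_data`; the numbers are placeholders and `summarise_columns` is a hypothetical helper.

```python
import statistics


def summarise_columns(matrix, columns):
    """Return (min, mean, max) for each selected column index."""
    summary = {}
    for column in columns:
        values = [row[column] for row in matrix]
        summary[column] = (min(values), statistics.mean(values), max(values))
    return summary


toy_data = [[1.0, 10.0, 0.5],
            [2.0, 10.0, 0.7],
            [3.0, 10.0, 0.9]]  # placeholder values, not real fingerprints

summary = summarise_columns(toy_data, [0, 1])
for column, (low, mean, high) in summary.items():
    print(column, low, mean, high)
```

A column whose minimum equals its maximum, like column 1 here, is constant across the data set and carries no information for a model.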

### Analysis of metadata.

The fingerprinter needs to know which atoms belong to the adsorbate, and this information is attached to the atoms objects.
It was generated automatically by `autogen_info`, but we can take a closer look at how this metadata is formatted:

In [None]:
# Look at metadata for the first atoms object.
images[0].subsets

For example, the atomic indices of atoms belonging to the adsorbate are stored in `atoms.subsets['ads_atoms']`.
There is only one index in that subset, which shows that this system has a monoatomic adsorbate.

In [None]:
# Let's see which one it was.
print('adsorbate:', images[0].get_chemical_symbols()[12])

# What was the site?
print('site:', np.array(images[0].get_chemical_symbols())[images[0].subsets['site_atoms']])

It was an H* sitting on an Ag-Ag bridge site.

As a user, you can always attach this information yourself instead of relying on `autogen_info`. There are various reasons why the accuracy of `autogen_info` may not always be optimal.

`autogen_info` will respect any subsets already present.

Furthermore, `autogen_info` builds the subsets using information from a connectivity matrix that is stored in `atoms.connectivity`. If the atoms object already has `atoms.connectivity`, that matrix is kept and used; otherwise a new one is created using default cutoffs for neighbor distances.
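
The "keep what is already attached" behaviour described above amounts to a fill-in-the-gaps pattern. The sketch below illustrates it with a plain stand-in class. `StandInAtoms`, `attach_missing`, the two default builders and the site indices are all hypothetical; only the attribute names `subsets` and `connectivity` and the keys `'ads_atoms'` and `'site_atoms'` come from the tutorial. It shows the idea, not CatLearn's source.

```python
class StandInAtoms:
    """Hypothetical stand-in for an atoms object (not an ASE class)."""

    def __init__(self, n_atoms):
        self.n_atoms = n_atoms


def default_connectivity(atoms):
    """Placeholder builder: an empty n-by-n matrix."""
    return [[0] * atoms.n_atoms for _ in range(atoms.n_atoms)]


def default_subsets(atoms):
    """Placeholder builder: treat the last atom as the adsorbate."""
    return {'ads_atoms': [atoms.n_atoms - 1], 'site_atoms': []}


def attach_missing(atoms, make_connectivity, make_subsets):
    """Attach metadata only where the object does not already carry it."""
    if not hasattr(atoms, 'connectivity'):
        atoms.connectivity = make_connectivity(atoms)
    if not hasattr(atoms, 'subsets'):
        atoms.subsets = make_subsets(atoms)
    return atoms


user_atoms = StandInAtoms(13)
# User-supplied subsets; the site indices are made up for illustration.
user_atoms.subsets = {'ads_atoms': [12], 'site_atoms': [8, 9]}

attach_missing(user_atoms, default_connectivity, default_subsets)
print(user_atoms.subsets)
print(len(user_atoms.connectivity))
```

The user-supplied subsets survive untouched, while the missing connectivity matrix is filled in by the default builder.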

In [None]:
# Let's look at a connectivity matrix.
images[0].connectivity
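
A connectivity matrix is easier to read once it is turned into neighbor lists. The sketch below does this for a small hand-written matrix, assuming nonzero entries mark bonded pairs. The matrix and the helper `neighbor_lists` are illustrative; the actual contents of `images[0].connectivity` may differ.

```python
def neighbor_lists(connectivity):
    """Map each atom index to the indices it is connected to."""
    return {i: [j for j, bonded in enumerate(row) if bonded and j != i]
            for i, row in enumerate(connectivity)}


# Toy system: atoms 0 and 1 are surface atoms, atom 2 is an adsorbate
# bridging them, atom 3 is a subsurface atom bonded to both surface atoms.
toy_connectivity = [[0, 1, 1, 1],
                    [1, 0, 1, 1],
                    [1, 1, 0, 0],
                    [1, 1, 0, 0]]

neighbors = neighbor_lists(toy_connectivity)
for index, bonded_to in neighbors.items():
    print(index, bonded_to)
```

Here the neighbors of the adsorbate, atom 2, are exactly the two atoms that form its bridge site.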