# Adsorbate Group Fingerprints Setup

In this tutorial we try out the adsorbate fingerprint generator, which converts adsorbates on extended surfaces into fingerprints (feature vectors) that can be used to predict their chemisorption energies.

The fingerprinter needs to know which atoms belong to the adsorbate, and this information must be attached to each atoms object.
You can either list the atomic indices of the adsorbate in `atoms.info['ads_atoms']`, or give its chemical formula in `atoms.info['key_value_pairs']['species']`.

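```python
# Option 1: list the indices of the adsorbate atoms.
atoms.info['ads_atoms'] = dictionary[f]['ads_index']
# Option 2: give the chemical formula of the adsorbate.
# 'key_value_pairs' may not exist yet, so create it if needed.
atoms.info.setdefault('key_value_pairs', {})['species'] = 'CH3'
structures.append(atoms)
```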
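Because both ways of describing the adsorbate should agree, it can help to check that the atoms listed in `ads_atoms` actually have the composition given by `species`. The sketch below does this with the standard library only. The helpers `parse_formula` and `species_matches_indices` are hypothetical names, not part of atoml, and the parser assumes simple formulas such as `CH3` or `OH` with no brackets.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'CH3' or 'OH' (no brackets).

    Hypothetical helper for illustration; not part of atoml.
    """
    counts = Counter()
    for symbol, number in re.findall(r'([A-Z][a-z]?)(\d*)', formula):
        counts[symbol] += int(number) if number else 1
    return counts


def species_matches_indices(symbols, ads_atoms, species):
    """Return True if the atoms at ads_atoms have the composition of species.

    Hypothetical helper. symbols: chemical symbols of the whole structure,
    e.g. the output of atoms.get_chemical_symbols() in ASE.
    """
    return Counter(symbols[i] for i in ads_atoms) == parse_formula(species)


# Toy example: four Cu atoms followed by a CH3 adsorbate.
symbols = ['Cu'] * 4 + ['C', 'H', 'H', 'H']
print(species_matches_indices(symbols, [4, 5, 6, 7], 'CH3'))  # True
print(species_matches_indices(symbols, [3, 4, 5, 6], 'CH3'))  # False
```

With a real atoms object you would pass `atoms.get_chemical_symbols()`, `atoms.info['ads_atoms']` and `atoms.info['key_value_pairs']['species']`.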

```python
import numpy as np
import ase.io
from atoml.fingerprint.setup import FeatureGenerator
from atoml.fingerprint.adsorbate_prep import autogen_info

# Plotting is optional: only plot if these imports succeed.
try:
    import matplotlib.pyplot as plt
    import seaborn as sns
    import pandas as pd
    plot = True
except ImportError:
    plot = False
    print('matplotlib, pandas and seaborn are needed for plotting in this tutorial.')
```

```python
# Data in the form of a dictionary: for each metal, the target energy 'E'
# and the indices of the adsorbate atoms in the structure.
dictionary = {'Ag': {'E': 1.44, 'ads_index': [30, 31, 32, 33]},
              'Au': {'E': 1.16, 'ads_index': [30, 31, 32, 33]},
              'Cu': {'E': 1.11, 'ads_index': [30, 31, 32, 33]}}

# We first create a list of atoms objects from a simple dataset.
structures = []
targets = []
for i, f in enumerate(dictionary):
    # Load the atoms object from a traj file named after the metal, e.g. Ag.traj.
    atoms = ase.io.read(f + '.traj')
    # Attach the indices of the adsorbate atoms to the info dict under the key 'ads_atoms'.
    atoms.info['ads_atoms'] = dictionary[f]['ads_index']
    atoms.info['dbid'] = i
    # Append the atoms object and its target energy to the lists.
    structures.append(atoms)
    targets.append(dictionary[f]['E'])

# Get other information about the surface/adsorbate nearest neighbors.
structures = autogen_info(structures)
```

```python
# Get the fingerprint generator.
fingerprint_generator = FeatureGenerator()

# List of functions to call. There are many more available.
feature_functions = [fingerprint_generator.mean_site,
                     fingerprint_generator.mean_surf_ligands]

# Generate the fingerprint matrix: one row per structure, one column per feature.
training_data = fingerprint_generator.return_vec(structures, feature_functions)

# Get a list of names of the features.
feature_names = fingerprint_generator.return_names(feature_functions)

for index, name in enumerate(feature_names):
    print(index, name)
```
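Before plotting or fitting, it can be worth sanity-checking the fingerprint matrix. This sketch assumes, as the indexing `training_data[:, selection]` in this tutorial implies, that `return_vec` returns one row per structure and one column per feature. `check_feature_matrix` is a hypothetical helper, not part of atoml: it raises if the shape does not match and returns the names of features containing NaN or infinite values. The matrix below is dummy data.

```python
import numpy as np


def check_feature_matrix(matrix, n_structures, names):
    """Hypothetical sanity check for a fingerprint matrix.

    Assumes rows are structures and columns are features. Raises ValueError
    on a shape mismatch; returns names of features with NaN or inf values.
    """
    matrix = np.asarray(matrix, dtype=float)
    expected = (n_structures, len(names))
    if matrix.shape != expected:
        raise ValueError(f'expected shape {expected}, got {matrix.shape}')
    # A column is bad if any of its entries is not finite.
    bad = ~np.isfinite(matrix).all(axis=0)
    return [names[j] for j in np.flatnonzero(bad)]


# Dummy 3 x 2 matrix with a missing value in the second feature.
X = np.array([[1.0, 2.0], [1.5, np.nan], [2.0, 3.0]])
print(check_feature_matrix(X, 3, ['f0', 'f1']))  # ['f1']
```

In the tutorial this would be called as `check_feature_matrix(training_data, len(structures), feature_names)`.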

```python
if plot:
    # Select some features to plot; adjust to the indices printed above.
    selection = [10, 11, 14]

    # Plot the distributions of the selected features.
    df = pd.DataFrame(training_data[:, selection], columns=selection)
    plt.figure(figsize=(20, 10))
    sns.violinplot(data=df, inner=None)
    plt.title('Feature distributions')
    plt.xlabel('Feature No.')
    plt.ylabel('Distribution')
    plt.show()

    # Print the names of the plotted features.
    print('Plotting:\n' + '\n'.join(feature_names[s] for s in selection))
```
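Different fingerprint features can have very different scales, so their distributions can be hard to compare when drawn on a shared axis, as in the violin plot above. A common remedy is to standardize each feature to zero mean and unit standard deviation. A minimal numpy sketch; the helper name `standardize` is hypothetical and the matrix is dummy data.

```python
import numpy as np


def standardize(matrix):
    """Scale each column to zero mean and unit (population) standard deviation.

    Hypothetical helper. Constant columns (zero standard deviation) are only
    centred, to avoid dividing by zero.
    """
    matrix = np.asarray(matrix, dtype=float)
    mean = matrix.mean(axis=0)
    std = matrix.std(axis=0)
    std[std == 0.0] = 1.0
    return (matrix - mean) / std


# Dummy matrix: the third column is constant.
X = np.array([[1.0, 10.0, 5.0], [2.0, 20.0, 5.0], [3.0, 40.0, 5.0]])
Z = standardize(X)
print(Z.mean(axis=0), Z.std(axis=0))
```

To plot standardized features you could build the frame as `pd.DataFrame(standardize(training_data[:, selection]), columns=selection)`. Note that `np.std` uses the population standard deviation (`ddof=0`), whereas pandas `DataFrame.std` defaults to `ddof=1`.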