# Analyse bond distances, check cutoffs

In this notebook we will plot pair distribution functions (pdf) over our dataset or a subset of it. This is necessary to convince ourselves that we can rely on connectivities to fingerprint the atomic structures.

In [None]:
# Import modules.
import numpy as np

from ase.visualize import view
from ase.data import chemical_symbols, covalent_radii

from catlearn.utilities.distribution import pair_distribution, pair_deviation
from catlearn.api.ase_atoms_api import images_connectivity, database_to_list
from catlearn.featurize.adsorbate_prep import autogen_info, auto_layers, layers_termination, check_reconstructions
from catlearn.featurize.periodic_table_data import get_radius, default_catlearn_radius
from catlearn import __path__
from matplotlib import pyplot as plt

### Import data
Select a dataset and import it to a list of atoms objects.

In [None]:
selection = []
work_path = __path__[0]
fname = work_path + '/../data/ads_example.db'
print('Importing from database.')
images = database_to_list(fname, selection=selection)
print(len(images), 'structures imported.')

### Pair distribution function
The pair distribution function is a histogram over distances between the atoms in our dataset. Our pdf utility in catlearn can optionally restrict the analysis to one or two elements.
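To make the idea concrete, here is a minimal sketch of such a histogram in plain NumPy. It is not the catlearn implementation. It assumes non-periodic structures passed as hypothetical `(positions, numbers)` tuples, ignores the minimum image convention, and returns raw counts without normalization. The `element` argument mimics the int/tuple convention used for `element` in this notebook.

```python
import numpy as np


def pair_distances(positions, numbers, element=None):
    """Distances between atom pairs of one structure (sketch: no periodicity).

    element=None keeps all pairs, an int keeps pairs involving that atomic
    number, and a tuple (A, B) keeps only A-B pairs.
    """
    positions = np.asarray(positions, dtype=float)
    numbers = np.asarray(numbers)
    # Upper-triangle indices: each unordered pair i < j is counted once.
    i, j = np.triu_indices(len(numbers), k=1)
    d = np.linalg.norm(positions[i] - positions[j], axis=1)
    if isinstance(element, int):
        mask = (numbers[i] == element) | (numbers[j] == element)
    elif isinstance(element, tuple):
        a, b = element
        mask = ((numbers[i] == a) & (numbers[j] == b)) | (
            (numbers[i] == b) & (numbers[j] == a))
    else:
        mask = np.ones(len(d), dtype=bool)
    return d[mask]


def simple_pdf(structures, bins=257, bounds=(0.3, 3.0), element=None):
    """Hypothetical helper: histogram of distances pooled over structures."""
    dists = [pair_distances(p, z, element) for p, z in structures]
    all_d = np.concatenate(dists) if dists else np.empty(0)
    counts, edges = np.histogram(all_d, bins=bins, range=bounds)
    centers = 0.5 * (edges[:-1] + edges[1:])  # bin centers for plotting
    return counts, centers


# Toy CO2 molecule (hypothetical coordinates in Angstrom, C at the origin).
co2 = (np.array([[0.0, 0.0, 0.0], [1.16, 0.0, 0.0], [-1.16, 0.0, 0.0]]),
       [6, 8, 8])
counts, centers = simple_pdf([co2], element=6)
print(counts.sum())  # the two C-O pairs; the O-O pair is excluded
```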

In [None]:
# int for bonds between a single element and all other atoms. 
# tuple (A, B) for bonds between A and B only.
element = 6

# Generate pdf.
pdf, x = pair_distribution(images, bins=257, bounds=(0.3, 3.), element=element)

In [None]:
# Plot pdf.
plt.plot(x, pdf)
plt.xlabel('$r$ [$10^{-10}$ m]')

We can see on this plot that carbon

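The pdf does not directly show us the appropriate cutoff unless we select a specific pair of elements to count bonds between. For a single element, the pdf mixes bonds to all other elements, each with its own typical bond length.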

### Set cutoffs

Let's set some cutoff radii in order to evaluate them.

In [None]:
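# Map atomic number -> padded cutoff radius for elements 1 to 103.
# Each radius is scaled by 10% and then padded by 0.1 Angstrom.
cutoff_dictionary = {}
for z, s in enumerate(chemical_symbols[:104]):
    if z == 0:
        # Index 0 is ASE's placeholder symbol 'X', not an element.
        continue
    elif z == 6:
        # Carbon: use the ASE covalent radius.
        radius = covalent_radii[z] * 1.1 + 0.1
    else:
        # All other elements: use the catlearn radius.
        radius = get_radius(z) * 1.1 + 0.1
    cutoff_dictionary[z] = radius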
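
As a worked example of the padding above: a pair A-B is compared against the sum of the two padded radii, $r_a + r_b$, the same sum used in the cells below. The sketch assumes ASE's tabulated covalent radius for carbon, 0.76 Å; the helper names `padded_radius` and `pair_cutoff` are hypothetical.

```python
def padded_radius(r, scale=1.1, offset=0.1):
    """Hypothetical helper: pad a radius (Angstrom) as in the cell above."""
    return r * scale + offset


def pair_cutoff(r_a, r_b):
    """Hypothetical helper: A-B bond cutoff as the sum of padded radii."""
    return padded_radius(r_a) + padded_radius(r_b)


# Assumed value: ASE covalent radius of carbon, 0.76 Angstrom.
r_c = 0.76
print(round(padded_radius(r_c), 3))     # 0.936
print(round(pair_cutoff(r_c, r_c), 3))  # 1.872
```

A typical C-C single bond (about 1.54 Å) therefore falls well inside the C-C cutoff of about 1.87 Å.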

In [None]:
# int for bonds between a single element and all other atoms. 
# tuple (A, B) for bonds between A and B only.
element = 78

# Generate pdf.
pdf, x = pair_distribution(images, bins=257, bounds=(0.3, 3.), element=element)

In [None]:
# Plot pdf.
plt.plot(x, pdf)

# Print the cutoff radii and plot the bond cutoff.
bond = 0.
if isinstance(element, int):
    print('radius', cutoff_dictionary[element])
elif isinstance(element, tuple):
    for z in element:
        print('radius', cutoff_dictionary[z])
        bond += cutoff_dictionary[z]
    print('bond', bond)
# A bond cutoff r_a + r_b is only defined for a pair of elements.
if bond > 0:
    plt.axvline(bond, color='0.5')

# Axis label.
plt.xlabel('$r$ [$10^{-10}$ m]')

If the line lies after the first peak and clear of any other peaks, the cutoff will clearly distinguish first nearest neighbors.
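The visual criterion can also be checked programmatically. The following hypothetical helper is a sketch under a simplifying assumption: the first peak is the first contiguous run of bins whose value exceeds `threshold`. It returns True when the cutoff lies after that run and before the next populated bin. A noisy histogram may need a nonzero `threshold` or coarser bins.

```python
import numpy as np


def cutoff_in_first_gap(x, pdf, cutoff, threshold=0.0):
    """Hypothetical check: does cutoff fall between the first and second peak?"""
    x = np.asarray(x, dtype=float)
    occupied = np.asarray(pdf) > threshold
    idx = np.flatnonzero(occupied)
    if idx.size == 0:
        return False  # empty histogram: nothing to separate
    # Walk to the end of the first contiguous run of occupied bins.
    end = idx[0]
    while end + 1 < len(occupied) and occupied[end + 1]:
        end += 1
    # Position of the next occupied bin after the first peak, if any.
    later = idx[idx > end]
    upper = x[later[0]] if later.size else np.inf
    return bool(x[end] < cutoff < upper)
```

With the arrays from the cells above, a call would look like `cutoff_in_first_gap(x, pdf, bond)`.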

### Check cutoffs

When our dataset contains many elements, we don't want to evaluate every pair of elements as shown above. Instead, we can plot a histogram of bond distances from which the element-specific cutoff radii have been subtracted, $r - (r_a + r_b)$.
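A minimal sketch of the quantity being histogrammed follows. It is not catlearn's `pair_deviation`. The hypothetical `bond_deviations` assumes a single non-periodic structure, ignores the minimum image convention, and takes a `cutoffs` mapping shaped like `cutoff_dictionary`. Negative values are pairs that the cutoffs count as bonded.

```python
import numpy as np


def bond_deviations(positions, numbers, cutoffs, bounds=(-0.5, 0.5)):
    """Hypothetical helper: r - (r_a + r_b) for all pairs of one structure.

    Sketch only: no periodic boundary conditions. cutoffs maps atomic
    number -> padded radius. Values outside bounds are dropped, mirroring
    the plotted range.
    """
    positions = np.asarray(positions, dtype=float)
    numbers = np.asarray(numbers)
    i, j = np.triu_indices(len(numbers), k=1)
    r = np.linalg.norm(positions[i] - positions[j], axis=1)
    radii = np.array([cutoffs[int(z)] for z in numbers])
    dev = r - (radii[i] + radii[j])
    lo, hi = bounds
    return dev[(dev >= lo) & (dev <= hi)]


# Toy CO2 molecule with illustrative (not tabulated) radii in Angstrom.
co2_pos = [[0.0, 0.0, 0.0], [1.16, 0.0, 0.0], [-1.16, 0.0, 0.0]]
dev = bond_deviations(co2_pos, [6, 8, 8], {6: 0.9, 8: 0.8}, bounds=(-1, 1))
hist, edges = np.histogram(dev, bins=257, range=(-1, 1))
```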

In [None]:
deviation, xd = pair_deviation(images, bins=257, bounds=(-.5, 0.5), cutoffs=cutoff_dictionary)

In [None]:
plt.plot(xd, deviation)
plt.xlabel('$r - (r_a + r_b)$ [$10^{-10}$ m]')

If the distribution is zero at $r - (r_a + r_b) = 0$, we can unambiguously represent structures by their connectivity. If it is not, we may be able to tune our cutoff radii to obtain more accurate connectivities, depending on the dataset.
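The link between the deviation histogram and connectivity can be sketched as follows. This is not catlearn's `images_connectivity`. It assumes a non-periodic structure, and the `margin` parameter is a hypothetical tolerance. Atoms i and j are bonded when $r_{ij} < r_i + r_j$, and pairs whose deviation lies within `margin` of zero are flagged as ambiguous, since a small change in the cutoff radii would flip them.

```python
import numpy as np


def connectivity(positions, numbers, cutoffs, margin=0.05):
    """Hypothetical helper: connectivity matrix plus ambiguous pairs.

    Sketch only: no periodic boundary conditions. A pair is ambiguous when
    |r_ij - (r_i + r_j)| < margin.
    """
    positions = np.asarray(positions, dtype=float)
    radii = np.array([cutoffs[int(z)] for z in numbers])
    # All pairwise distances via broadcasting.
    diff = positions[:, None, :] - positions[None, :, :]
    r = np.linalg.norm(diff, axis=-1)
    dev = r - (radii[:, None] + radii[None, :])
    bonded = dev < 0
    np.fill_diagonal(bonded, False)  # an atom is not bonded to itself
    i, j = np.triu_indices(len(radii), k=1)
    close = np.abs(dev[i, j]) < margin
    ambiguous = list(zip(i[close].tolist(), j[close].tolist()))
    return bonded.astype(int), ambiguous


# Toy CO2 molecule with illustrative radii in Angstrom.
co2_pos = [[0.0, 0.0, 0.0], [1.16, 0.0, 0.0], [-1.16, 0.0, 0.0]]
cm, ambiguous = connectivity(co2_pos, [6, 8, 8], {6: 0.9, 8: 0.8})
```

An empty `ambiguous` list for every structure is the per-structure counterpart of a deviation histogram that is zero around the origin.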