# Classifying Defect Sites on TiO Surfaces

## Imports

Import the packages needed for the analysis and load the trajectory from a `.lammpstrj` file.

This notebook requires the [ASE](https://wiki.fysik.dtu.dk/ase/), [dscribe](https://singroup.github.io/dscribe/latest/) and [sklearn](https://scikit-learn.org/stable/index.html) Python libraries, which may not be part of your Python installation.
ASE handles the trajectory and the neighbor lists, dscribe builds the atomic descriptors, and sklearn is used to analyse them.

Alternatively, [ASAP](https://github.com/BingqingCheng/ASAP) could handle building and analysing the descriptors in the future.

In [None]:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd

from ase.neighborlist import NeighborList, natural_cutoffs
from ase.io import read as ase_read, write as ase_write
from dscribe.descriptors import SOAP
from sklearn.decomposition import PCA, KernelPCA

In [None]:
trajectory_path = "defect_test_out.lammpstrj"
trajectory = ase_read(trajectory_path, index=':100')

Get indices of all individual elements for later use.

In [None]:
ti_indices, o_indices, h_indices = [], [], []

for ii_symbol, symbol in enumerate(trajectory[0].get_chemical_symbols()):
    if symbol == 'Ti':
        ti_indices.append(ii_symbol)
    elif symbol == 'H':
        h_indices.append(ii_symbol)
    elif symbol == 'O':
        o_indices.append(ii_symbol)
    else:
        # f-string so that the offending symbol and index appear in the message
        raise ValueError(f"Undefined element {symbol} found in initial snapshot at index {ii_symbol}.")


## Building Atomic Descriptors

`r_cut` is currently set to an arbitrary value; in the future it should be based on an educated guess, for example some multiple of the mean nearest-neighbour distance.
The number of radial basis functions `n_max` and the maximum spherical-harmonics degree `l_max` are likewise chosen arbitrarily for now.

In [None]:
soap = SOAP(r_cut=5., n_max=12, l_max=12, sigma=0.4, species=['Ti', 'H', 'O'])
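# dscribe 2.x (which introduced the `r_cut`/`n_max`/`l_max` names) calls this argument `centers`
soaps = soap.create(trajectory[0], centers=o_indices)  # one SOAP vector per oxygen atom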
print(soaps.shape)
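
The nearest-neighbour idea for choosing `r_cut` could be sketched roughly as follows. This is only an illustration, not a validated recipe: the helper names and the factor `multiple=2.5` are hypothetical, and the minimum-image step assumes an orthorhombic periodic box.

```python
import numpy as np


def mean_nearest_neighbor_distance(positions, box_lengths=None):
    """Mean distance from each atom to its closest neighbour.

    positions: (N, 3) Cartesian coordinates.
    box_lengths: optional (3,) edge lengths of an orthorhombic periodic box;
        if given, the minimum-image convention is applied.
    """
    positions = np.asarray(positions, dtype=float)
    deltas = positions[:, None, :] - positions[None, :, :]
    if box_lengths is not None:
        box = np.asarray(box_lengths, dtype=float)
        deltas -= box * np.round(deltas / box)
    distances = np.linalg.norm(deltas, axis=-1)
    np.fill_diagonal(distances, np.inf)  # ignore self-distances
    return distances.min(axis=1).mean()


def guess_r_cut(positions, box_lengths=None, multiple=2.5):
    # `multiple` is a hypothetical tuning factor, not a recommended value.
    return multiple * mean_nearest_neighbor_distance(positions, box_lengths)


# Toy check: a 3x3x3 simple cubic lattice with spacing 2.0
grid = np.arange(3) * 2.0
toy_positions = np.array([[x, y, z] for x in grid for y in grid for z in grid])
toy_r_cut = guess_r_cut(toy_positions, multiple=2.5)
```

In this notebook the inputs could come from `trajectory[0].get_positions()` and `trajectory[0].cell.lengths()`. For a non-orthorhombic cell, ASE's `get_all_distances(mic=True)` would be the safer way to obtain the distance matrix.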

## Get Atomic Configurations via Neighbor Lists

Use ASE's `ase.neighborlist` module to gain a rough idea of the bonding environment of each oxygen atom.

In [None]:
neighbor_list = NeighborList(cutoffs=natural_cutoffs(trajectory[0], mult=1.), self_interaction=False, bothways=True)
neighbor_list.update(trajectory[0])

o_neighbors = []
for o_index in o_indices:
    o_neighbors.append(np.append([o_index], neighbor_list.get_neighbors(o_index)[0]))

In [None]:
o_symbols = []
for o_neighbor in o_neighbors:
    o_symbols.append(trajectory[0][o_neighbor].get_chemical_formula(mode='hill'))

configs = pd.Series(o_symbols).value_counts()
print(configs)

## Analyse Atomic Descriptors

Use kernel PCA or PCA to build a low-dimensional representation of the SOAP descriptors.
Ideally, the defect site should show up separately in this representation.

In [None]:
is_bulk = np.asarray(['Ti' in o_symbol for o_symbol in o_symbols], dtype=bool)
bulk_symbols = [o_symbol for ii_symbol, o_symbol in enumerate(o_symbols) if is_bulk[ii_symbol]]

pca = PCA(n_components=2)
reduction = pca.fit_transform(soaps)
# KernelPCA defaults to kernel='linear'; pass kernel='rbf' for a nonlinear map
kernel_pca = KernelPCA(n_components=2)
kernel_reduction = kernel_pca.fit_transform(soaps)
bulk_pca = PCA(n_components=2)
bulk_reduction = bulk_pca.fit_transform(soaps[is_bulk])
bulk_kernel_reduction = KernelPCA(n_components=2).fit_transform(soaps[is_bulk])

In [None]:
def draw_2d_scatter(fig, ax, reduction, color_dict, o_symbols):
    colors = [color_dict[config] for config in o_symbols]
    ax.scatter(reduction[:, 0], reduction[:, 1], c=colors)
    return fig, ax

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 8))

cmap = plt.get_cmap('tab20', len(configs))  # plt.cm.get_cmap was removed in recent matplotlib releases
norm = plt.Normalize(vmin=-0.5, vmax=len(configs)-0.5)
sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)

color_dict = {config: sm.to_rgba(ii_config) for ii_config, config in enumerate(configs.keys())}
colors = [color_dict[config] for config in o_symbols]

draw_2d_scatter(fig, axes[0, 0], reduction, color_dict, o_symbols)
draw_2d_scatter(fig, axes[0, 1], kernel_reduction, color_dict, o_symbols)
draw_2d_scatter(fig, axes[1, 0], bulk_reduction, color_dict, bulk_symbols)
draw_2d_scatter(fig, axes[1, 1], bulk_kernel_reduction, color_dict, bulk_symbols)

axes[0, 0].set_title("PCA Map of all O SOAPs")
axes[0, 1].set_title("Kernel PCA Map of all O SOAPs")
axes[1, 0].set_title("PCA Map of bulk O SOAPs")
axes[1, 1].set_title("Kernel PCA Map of bulk O SOAPs")

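# The ScalarMappable is not attached to any Axes, so tell the colorbar where to take space from
cb = fig.colorbar(mappable=sm, ax=axes)
cb.set_ticks(range(len(configs)), labels=list(configs.keys()))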
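
Whether an environment "shows up separately" can also be checked numerically instead of by eye. The sketch below is one possible approach, not the method used in this notebook: it projects descriptors with a plain SVD-based PCA and ranks points by the distance to their nearest neighbour in the reduced space. The helper names and the toy data are hypothetical.

```python
import numpy as np


def pca_project(features, n_components=2):
    """Project the rows of `features` onto their leading principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def rank_by_isolation(projection):
    """Indices sorted from most to least isolated in the reduced space."""
    deltas = projection[:, None, :] - projection[None, :, :]
    distances = np.linalg.norm(deltas, axis=-1)
    np.fill_diagonal(distances, np.inf)  # ignore self-distances
    return np.argsort(distances.min(axis=1))[::-1]


# Toy data: 30 similar environments plus one that differs strongly
rng = np.random.default_rng(0)
toy_features = rng.normal(scale=0.05, size=(31, 6))
toy_features[17] += 3.0
toy_projection = pca_project(toy_features)
most_isolated = rank_by_isolation(toy_projection)[0]
```

Applied to the notebook's data, `rank_by_isolation(reduction)` would return positions within `o_indices`, so the most isolated oxygen would be `o_indices[rank_by_isolation(reduction)[0]]`.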

## Building SOAPs and Projections along the whole TiO Surface

The previous dimensionality reductions show that bulk oxygen separates from surface oxygen.
A definition of the defect site might be found by laying a grid over the TiO surface and calculating SOAPs at regularly spaced points.
The site could then be located by optimising over this plane, e.g. for the largest difference to the previously computed SOAPs.
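
A minimal sketch of the grid idea, under stated assumptions: the cell is orthorhombic with the surface normal along z, and "largest difference" is taken to mean the Euclidean distance to the closest reference SOAP vector. Both function names are hypothetical, and small stand-in arrays replace real SOAP vectors so the example runs on its own.

```python
import numpy as np


def surface_grid(box_x, box_y, z_probe, spacing):
    """Cartesian probe points on a regular xy grid at height z_probe."""
    xs = np.arange(0.0, box_x, spacing)
    ys = np.arange(0.0, box_y, spacing)
    grid_x, grid_y = np.meshgrid(xs, ys, indexing="ij")
    grid_z = np.full_like(grid_x, z_probe)
    return np.column_stack([grid_x.ravel(), grid_y.ravel(), grid_z.ravel()])


def novelty(probe_descriptors, reference_descriptors):
    """Distance from each probe descriptor to its closest reference descriptor."""
    deltas = probe_descriptors[:, None, :] - reference_descriptors[None, :, :]
    return np.linalg.norm(deltas, axis=-1).min(axis=1)


points = surface_grid(box_x=4.0, box_y=2.0, z_probe=10.0, spacing=1.0)

# Stand-in descriptors (assumption) so the sketch runs without dscribe
toy_reference = np.array([[0.0, 0.0], [1.0, 0.0]])
toy_probes = np.array([[0.1, 0.0], [1.0, 0.2], [3.0, 4.0]])
scores = novelty(toy_probes, toy_reference)
candidate = int(np.argmax(scores))  # probe least similar to any reference
```

In the notebook, `points` could be passed to `soap.create` in place of `o_indices`, since dscribe accepts Cartesian positions as well as atom indices, and `soaps` could serve as the reference set. With full-size SOAP vectors the pairwise difference array becomes large, so the distances may need to be computed in chunks.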

However, non-surface molecules (such as water) could affect this process, so the atoms should perhaps first be filtered for being part of the "bulk".
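
One possible filter, sketched under an assumed rule: an oxygen belongs to the oxide if it has at least one Ti neighbour (the same criterion as `is_bulk` above), and a hydrogen belongs to it if it is bonded to such an oxygen. Everything else is treated as a free molecule. The function name and the toy system are hypothetical.

```python
def split_surface_and_molecules(symbols, neighbors):
    """Separate atoms attached to the oxide from free molecules.

    symbols: chemical symbol of each atom.
    neighbors: for each atom, the indices of its bonded neighbours.
    """
    # Oxygens with at least one Ti neighbour count as part of the oxide.
    oxide_o = {
        i for i, symbol in enumerate(symbols)
        if symbol == "O" and any(symbols[j] == "Ti" for j in neighbors[i])
    }
    surface, molecular = [], []
    for i, symbol in enumerate(symbols):
        if symbol == "Ti" or i in oxide_o:
            surface.append(i)
        elif symbol == "H" and any(j in oxide_o for j in neighbors[i]):
            surface.append(i)  # hydroxyl hydrogen on an oxide oxygen
        else:
            molecular.append(i)
    return surface, molecular


# Toy system: a Ti-O-H hydroxyl plus one free water molecule
toy_symbols = ["Ti", "O", "H", "O", "H", "H"]
toy_neighbors = [[1], [0, 2], [1], [4, 5], [3], [3]]
surface_atoms, molecular_atoms = split_surface_and_molecules(toy_symbols, toy_neighbors)
```

For the real trajectory, `neighbors` could be built by calling `neighbor_list.get_neighbors(i)[0]` for every atom index `i`. Note that under this rule a water molecule whose oxygen sits within bonding distance of a Ti atom is counted as part of the surface, which may or may not be desired.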