# Featurize structures using aenet from Python

**Note: This requires the ænet executables to be set up correctly.**

In [1]:
import glob
import aenet.io.structure
from aenet.featurize import AenetAUCFeaturizer
from aenet.trainset import TrnSet

## Example 1

This example shows how to quickly featurize an atomic structure that may or may not be labeled with an energy and atomic forces.

First, we read an atomic structure from a file.  The file can be in any of the supported structure formats.

In [2]:
struc = aenet.io.structure.read('water.xyz')

Next, we configure the featurizer.  The atom types are taken from the structure above.

In [3]:
fzer = AenetAUCFeaturizer(struc.typenames,
                          rad_cutoff=4.0, rad_order=10,
                          ang_cutoff=1.5, ang_order=3)

A featurized version of the structure can then be obtained with the following:

In [4]:
featurized_structure = fzer.featurize_structure(struc)

There is also a similar method named `featurize_structures()` (with an additional **s**) that can be used to featurize a list of structures.
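As a minimal sketch of how the list variant might be combined with `glob`, the helper below reads every file matching a pattern and passes the resulting list to `featurize_structures()` in a single call. The helper name `featurize_files` is hypothetical, and the sketch assumes that `featurize_structures()` returns one result per input structure in the same order. The reader and the featurizer are passed in as arguments so that the sketch also runs without ænet installed.

```python
import glob


def featurize_files(pattern, read_structure, featurizer):
    """Read all files matching `pattern` and featurize them in one call.

    `read_structure` is any callable that maps a path to a structure
    (for example aenet.io.structure.read), and `featurizer` is any object
    with a featurize_structures() method.  Assumption: that method returns
    one result per input structure, in input order.
    """
    paths = sorted(glob.glob(pattern))  # sorted for a reproducible order
    structures = [read_structure(path) for path in paths]
    features = featurizer.featurize_structures(structures)
    return list(zip(paths, features))


# Intended use with the objects defined in this notebook (not run here):
# results = featurize_files('./xsf/*.xsf', aenet.io.structure.read, fzer)
```

Each entry of the returned list pairs a file path with its featurized structure, which makes it easy to trace a result back to its source file.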

The featurized structure includes the atom-site feature vectors.  For example, the following yields the feature vector of the first atomic site (index 0, since indexing starts at 0), which is the oxygen atom of the water molecule.

In [7]:
featurized_structure.atom_features[0]

array([ 1.73009825, -0.90147023, -0.79067349,  1.72543339, -1.00740569,
       -0.67561297,  1.71146394, -1.10790861, -0.55690914,  1.68826525,
       -1.20243702,  0.0835817 , -0.02038995, -0.07363335,  0.05631601,
        1.73009825, -0.90147023, -0.79067349,  1.72543339, -1.00740569,
       -0.67561297,  1.71146394, -1.10790861, -0.55690914,  1.68826525,
       -1.20243702,  0.0835817 , -0.02038995, -0.07363335,  0.05631601])
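To give a feeling for where such numbers come from, here is a rough sketch of a radial Chebyshev expansion in the spirit of the method cited as Artrith 2017. It is an illustration under stated assumptions, not ænet's implementation: the function names are hypothetical, the distance is rescaled as `x = 2 r / r_cut - 1`, and a cosine cutoff function is used; the normalization and rescaling inside ænet may differ.

```python
import math


def cosine_cutoff(r, r_cut):
    """Smoothly decay from 1 at r = 0 to 0 at r = r_cut (assumed form)."""
    if r >= r_cut:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)


def chebyshev_values(x, order):
    """Return [T_0(x), ..., T_order(x)] using the standard recurrence."""
    values = [1.0, x][:order + 1]
    while len(values) < order + 1:
        values.append(2.0 * x * values[-1] - values[-2])
    return values


def radial_coefficients(distances, r_cut=4.0, order=10):
    """Sum cutoff-weighted Chebyshev polynomials over neighbor distances."""
    coeffs = [0.0] * (order + 1)
    for r in distances:
        if r >= r_cut:
            continue  # neighbors beyond the cutoff do not contribute
        x = 2.0 * r / r_cut - 1.0  # map [0, r_cut] onto [-1, 1]
        weight = cosine_cutoff(r, r_cut)
        for n, t_n in enumerate(chebyshev_values(x, order)):
            coeffs[n] += t_n * weight
    return coeffs


# Two neighbors at roughly the O-H distance in water (illustrative values).
print(radial_coefficients([0.96, 0.96], r_cut=4.0, order=10))
```

With `rad_order=10` this yields 11 radial coefficients, and an analogous angular expansion with `ang_order=3` would add 4 more. The 30 entries printed above are consistent with these 15 coefficients appearing twice, once unweighted and once species-weighted, although that reading is an assumption based on the cited method rather than something this notebook states.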

A feature vector for the entire structure can be computed with the moment-expansion approach:

In [11]:
featurized_structure.global_moment_fingerprint(outer_moment=2, inner_moment=2)

array([ 1.64126377e+00, -7.60152079e-01, -8.95586635e-01,  1.52611707e+00,
       -5.66464507e-01, -8.31235559e-01,  1.25323378e+00, -4.91554510e-01,
       -5.53306944e-01,  9.93156224e-01, -6.32657483e-01,  4.17908515e-02,
       -1.01949774e-02, -3.68166736e-02,  2.81580073e-02,  7.76214644e-01,
       -3.09416966e-01, -5.00249888e-01,  6.63400375e-01, -6.27616633e-02,
       -4.93429072e-01,  3.97501810e-01,  6.23997947e-02, -2.74852374e-01,
        1.49023597e-01, -3.14389735e-02,  4.17908515e-02, -1.01949774e-02,
       -3.68166736e-02,  2.81580073e-02,  0.00000000e+00,  3.08148791e-33,
        0.00000000e+00,  0.00000000e+00,  1.38666956e-32,  4.00593428e-32,
        3.08148791e-33,  3.85185989e-32,  9.86076132e-32,  6.16297582e-33,
        7.54964538e-32,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  1.54074396e-33,  3.46667390e-33,
        0.00000000e+00,  6.16297582e-33,  3.85185989e-32,  3.08148791e-33,
        6.16297582e-32,  
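The general idea behind a moment expansion can be sketched in a few lines: a variable number of per-atom feature vectors is pooled into one fixed-length vector by concatenating the mean and higher central moments of each feature. The function below is a generic illustration with a single hypothetical `max_moment` parameter. It does not reproduce how ænet nests `outer_moment` and `inner_moment`, and the choice of central rather than raw moments is an assumption.

```python
from math import fsum


def moment_fingerprint(atom_features, max_moment=2):
    """Pool per-atom feature vectors into one fixed-length vector.

    The result holds the per-feature mean, followed by the central
    moments of order 2..max_moment (order 2 is the population variance).
    """
    n_atoms = len(atom_features)
    if n_atoms == 0:
        raise ValueError("at least one atomic feature vector is required")
    columns = list(zip(*atom_features))  # one tuple per feature
    means = [fsum(column) / n_atoms for column in columns]
    fingerprint = list(means)
    for order in range(2, max_moment + 1):
        for column, mean in zip(columns, means):
            fingerprint.append(
                fsum((value - mean) ** order for value in column) / n_atoms
            )
    return fingerprint


# Two atoms with two features each: the result has 2 * max_moment entries.
print(moment_fingerprint([[1.0, 0.0], [3.0, 0.0]], max_moment=2))
# -> [2.0, 0.0, 1.0, 0.0]
```

The length of the result depends only on the number of features and the highest moment, not on the number of atoms, which is what makes such a vector usable as a descriptor for whole structures of different sizes.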

## Example 2

This example dives a bit deeper and shows a workflow that is applicable to large structure databases.
Here, we assume that a set of atomic structures in XSF format is already available in `./xsf/`.

In [12]:
# the AUC featurizer uses the Chebyshev method (Artrith 2017)
fzer = AenetAUCFeaturizer(['Li', 'Mo', 'Ni', 'Ti', 'O'],
                          rad_cutoff=4.0, rad_order=10, 
                          ang_cutoff=1.5, ang_order=3)

# aenet's generate.x will be run in the specified subdirectory ('run').
# If no work directory is given, a temporary directory is created and
# removed after completion.
fzer.run_aenet_generate(glob.glob("./xsf/*.xsf"), 
                        atomic_energies={
                            'Li': -2.5197301758568920,
                            'Mo': -0.6299325439642232,
                            'Ni': -2.2047639038747695,
                            'O': -10.0789207034275830,
                            'Ti': -2.2047639038747695},
                        workdir='run')

The above creates the files `generate.out` and `features.h5`, which contain the output written by `generate.x` and the data set in HDF5 format, respectively.  Because we specified the work directory `run`, that directory is kept together with all files used to run `generate.x`.
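The work-directory behavior described here (keep a named directory, but clean up an automatically created one) is a common pattern. The sketch below shows one way it could be written with the standard library. It only illustrates the pattern: the name `work_directory` is hypothetical, and this is not taken from the ænet Python code.

```python
import contextlib
import shutil
import tempfile
from pathlib import Path


@contextlib.contextmanager
def work_directory(workdir=None):
    """Yield a directory to run in.

    If `workdir` is None, a temporary directory is created and removed
    on exit.  Otherwise the given directory is created if needed and
    left in place afterwards.
    """
    if workdir is None:
        path = Path(tempfile.mkdtemp(prefix="generate-"))
        try:
            yield path
        finally:
            shutil.rmtree(path, ignore_errors=True)  # clean up temp files
    else:
        path = Path(workdir)
        path.mkdir(parents=True, exist_ok=True)
        yield path  # the caller's directory is kept


# Temporary directory: exists inside the block, gone afterwards.
with work_directory() as tmp:
    (tmp / "generate.in").write_text("placeholder input\n")
print(tmp.exists())
```

Keeping a named directory is useful for debugging, since the input files and logs of a failed run stay available for inspection.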

To access the featurized structures, the data set can be read with the `TrnSet` class.

In [13]:
with TrnSet.from_file('features.h5') as ts:
    print(ts)


Training set info:
  Name           : run/data.train
  Atom types     : Li Mo Ni Ti O
  Atomic energies: -2.520 -0.630 -2.205 -2.205 -10.079
  #atom, #struc. : 46144 824
  E_min, max, av : -4.587 -4.548 -4.568
  File (format)  : features.h5 (hdf5)



The information for each atomic structure (including the atomic features) can be accessed with `ts.read_structure(i)` where `i` is the index of the structure.  However, for large data sets it is more efficient to access all structures sequentially.  This can be done using the method `ts.read_next_structure()` or by iterating over the training set object.

In [14]:
with TrnSet.from_file('features.h5') as ts:
    for i, s in enumerate(ts):
        print(i, s.path)

0 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/structure001.xsf
1 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/structure002.xsf
2 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/structure003.xsf
3 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/structure004.xsf
4 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/structure005.xsf
5 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/structure006.xsf
6 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/structure007.xsf
7 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/structure008.xsf
8 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/structure009.xsf
9 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/structure010.xsf
10 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/structure011.xsf
11 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/structure012.xsf
12 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/structure013.xsf
13 /data/home/au2229/c