# Featurize structures using aenet from Python

**Note: This requires the ænet executables to be set up correctly.**

```python
import glob
import aenet.io.structure
from aenet.featurize import AenetAUCFeaturizer
from aenet.trainset import TrnSet
```

## Example 1

This example shows how to quickly featurize an atomic structure that may or may not be labeled with an energy and atomic forces.

First, we read an atomic structure from a file.  This can be in any of the supported structure formats.

```python
struc = aenet.io.structure.read('water.xyz')
```
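If no structure file is at hand, a minimal XYZ file can be written with plain Python. The sketch below writes a hypothetical `water_example.xyz` (a different name, so an existing `water.xyz` is not overwritten). It uses an approximate gas-phase geometry with an O-H distance of 0.9572 Å and an H-O-H angle of 104.52°. This is not the file used to produce the outputs in this notebook. We also assume, without having checked, that aenet's XYZ reader accepts this plain format.

```python
import math


def water_geometry(r_oh=0.9572, angle_deg=104.52):
    """Approximate water geometry in Angstrom: O at the origin, H atoms in the xy-plane."""
    half = math.radians(angle_deg) / 2.0
    x, y = r_oh * math.sin(half), r_oh * math.cos(half)
    return ["O", "H", "H"], [(0.0, 0.0, 0.0), (x, y, 0.0), (-x, y, 0.0)]


def write_xyz(path, symbols, coords, comment=""):
    """Write a plain XYZ file: atom count, comment line, then 'symbol x y z' rows."""
    with open(path, "w") as f:
        f.write(f"{len(symbols)}\n{comment}\n")
        for s, (x, y, z) in zip(symbols, coords):
            f.write(f"{s:<2s} {x:12.6f} {y:12.6f} {z:12.6f}\n")


# Hypothetical file name, chosen so that no existing structure file is overwritten.
symbols, coords = water_geometry()
write_xyz("water_example.xyz", symbols, coords, comment="hypothetical water molecule")
```

The resulting file could then be passed to `aenet.io.structure.read()` in place of `'water.xyz'`. The feature values would differ from the ones printed below.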

Next, we configure the featurizer.  The atom types are taken from the structure above.

```python
fzer = AenetAUCFeaturizer(struc.typenames,
                          rad_cutoff=4.0, rad_order=10,
                          ang_cutoff=1.5, ang_order=3)
```
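These settings determine the length of each atom-site feature vector. The output of `atom_features` further down contains 30 components for water with `rad_order=10` and `ang_order=3`. Judging from that output, each vector holds rad_order + 1 radial and ang_order + 1 angular Chebyshev coefficients. This set appears twice when more than one species is present, once unweighted and once weighted by atom type. For single-species systems, we assume only one copy is used. The hypothetical helper below encodes that bookkeeping. It reflects our reading of the output and is not part of the aenet API.

```python
def descriptor_length(rad_order: int, ang_order: int, n_species: int) -> int:
    """Hypothetical helper: expected length of one atom-site feature vector.

    Assumes Chebyshev orders 0..rad_order (radial) and 0..ang_order (angular),
    plus a second, type-weighted copy when more than one species is present.
    """
    n_terms = (rad_order + 1) + (ang_order + 1)
    return n_terms * (2 if n_species > 1 else 1)


# Water (O and H) with the settings above: 2 * (11 + 4)
print(descriptor_length(rad_order=10, ang_order=3, n_species=2))  # 30
```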

A featurized version of the structure can then be obtained with the following:

```python
featurized_structure = fzer.featurize_structure(struc)
```

There is also a similar method named `featurize_structures()` (with an additional **s**) that can be used to featurize a list of structures.

The featurized structure includes the atom-site feature vectors, stored as the rows of `atom_features`.  The first row (index 0) is the feature vector of the oxygen atom of the water molecule, and the remaining two rows belong to the hydrogen atoms.

```python
featurized_structure.atom_features
```

```
array([[ 1.73009825, -0.90147023, -0.79067349,  1.72543339, -1.00740569,
        -0.67561297,  1.71146394, -1.10790861, -0.55690914,  1.68826525,
        -1.20243702,  0.0835817 , -0.02038995, -0.07363335,  0.05631601,
         1.73009825, -0.90147023, -0.79067349,  1.72543339, -1.00740569,
        -0.67561297,  1.71146394, -1.10790861, -0.55690914,  1.68826525,
        -1.20243702,  0.0835817 , -0.02038995, -0.07363335,  0.05631601],
       [ 1.55242929, -0.61883393, -1.00049978,  1.32680075, -0.12552333,
        -0.98685814,  0.79500362,  0.12479959, -0.54970475,  0.29804719,
        -0.06287795,  0.        ,  0.        ,  0.        ,  0.        ,
        -0.17766897,  0.28263629, -0.20982628, -0.39863264,  0.88188236,
        -0.31124517, -0.91646032,  1.2327082 ,  0.00720439, -1.39021806,
         1.13955907,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 1.55242929, -0.61883393, -1.00049978,  1.32680075, -0.12552333,
        -0.98685814,  0.79500362,  0.12479959, ...
```
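The rows above suggest how each vector is organized. The hydrogen rows are zero at positions 11 to 14 and 26 to 29. This is consistent with a layout of 11 radial and 4 angular terms, first unweighted and then type-weighted. The sketch below splits a row into named blocks under that assumed layout. The helper `split_features` is hypothetical, and the layout is our inference from the output rather than documented behavior. For the oxygen row, the unweighted and weighted halves happen to be identical.

```python
import numpy as np

# Oxygen row (index 0) of atom_features above, copied verbatim.
o_features = np.array([
    1.73009825, -0.90147023, -0.79067349, 1.72543339, -1.00740569,
    -0.67561297, 1.71146394, -1.10790861, -0.55690914, 1.68826525,
    -1.20243702, 0.0835817, -0.02038995, -0.07363335, 0.05631601,
    1.73009825, -0.90147023, -0.79067349, 1.72543339, -1.00740569,
    -0.67561297, 1.71146394, -1.10790861, -0.55690914, 1.68826525,
    -1.20243702, 0.0835817, -0.02038995, -0.07363335, 0.05631601,
])

# Assumed layout: [radial | angular] unweighted, then [radial | angular] type-weighted.
N_RAD = 10 + 1  # rad_order + 1
N_ANG = 3 + 1   # ang_order + 1


def split_features(v, n_rad=N_RAD, n_ang=N_ANG):
    """Hypothetical helper: split an atom-site vector into blocks (assumed layout)."""
    half = n_rad + n_ang
    if v.shape[-1] != 2 * half:
        raise ValueError(f"expected {2 * half} components, got {v.shape[-1]}")
    blocks = {}
    for name, part in (("unweighted", v[..., :half]), ("weighted", v[..., half:])):
        blocks[f"{name}_radial"] = part[..., :n_rad]
        blocks[f"{name}_angular"] = part[..., n_rad:]
    return blocks


blocks = split_features(o_features)
for name, values in blocks.items():
    print(name, values.shape)
```

Because the slicing uses `...`, the same function also accepts the whole `atom_features` matrix and splits every row at once.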

A feature vector for the entire structure can be computed with the moment-expansion approach:

```python
print(featurized_structure.global_moment_fingerprint(outer_moment=1, inner_moment=1,
                                                     weighted=True, append_weighted=True))
```

```
[ 1.64126377 -0.76015208 -0.89558663  1.52611707 -0.56646451 -0.83123556
  1.25323378 -0.49155451 -0.55330694  0.99315622 -0.63265748  0.04179085
 -0.01019498 -0.03681667  0.02815801  0.77621464 -0.30941697 -0.50024989
  0.66340037 -0.06276166 -0.49342907  0.39750181  0.06239979 -0.27485237
  0.1490236  -0.03143897  0.04179085 -0.01019498 -0.03681667  0.02815801
 -0.08883448  0.14131815 -0.10491314 -0.19931632  0.44094118 -0.15562259
 -0.45823016  0.6163541   0.0036022  -0.69510903  0.56977954 -0.04179085
  0.01019498  0.03681667 -0.02815801 -0.95388361  0.59205326  0.29042361
 -1.06203301  0.94464402  0.1821839  -1.31396213  1.1703084   0.28205677
 -1.53924166  1.17099805 -0.04179085  0.01019498  0.03681667 -0.02815801]
```
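A quick check shows how the first 15 entries of this fingerprint relate to the atom-site features. The first entry, 1.64126377, equals the average of the oxygen value 1.73009825 and the hydrogen value 1.55242929. In other words, it is the mean over species of the per-species mean. The sketch below reproduces the first 15 entries this way from the unweighted parts of the rows printed earlier. It illustrates only that one block, under our reading of the output. It is not a reimplementation of `global_moment_fingerprint`, whose `outer_moment`, `inner_moment`, `weighted`, and `append_weighted` options are not modeled here. The helper name is hypothetical. It assumes both hydrogen rows are identical, since the visible part of the truncated third row matches the second.

```python
import numpy as np

# Unweighted parts (first 15 components) of the atom_features rows above.
o_row = np.array([1.73009825, -0.90147023, -0.79067349, 1.72543339, -1.00740569,
                  -0.67561297, 1.71146394, -1.10790861, -0.55690914, 1.68826525,
                  -1.20243702, 0.0835817, -0.02038995, -0.07363335, 0.05631601])
h_row = np.array([1.55242929, -0.61883393, -1.00049978, 1.32680075, -0.12552333,
                  -0.98685814, 0.79500362, 0.12479959, -0.54970475, 0.29804719,
                  -0.06287795, 0.0, 0.0, 0.0, 0.0])

# Assumption: both hydrogen rows are identical.
features = np.vstack([o_row, h_row, h_row])
species = ["O", "H", "H"]


def mean_of_species_means(features, species):
    """Hypothetical helper: average each species' rows, then average over species."""
    labels = np.array(species)
    means = [features[labels == s].mean(axis=0) for s in dict.fromkeys(species)]
    return np.mean(means, axis=0)


pooled = mean_of_species_means(features, species)
print(pooled.shape)  # (15,)
```

Because the result is an average, its length stays the same however many atoms the structure contains. This is what makes such pooled vectors usable as whole-structure fingerprints.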


## Example 2

This example dives a bit deeper and shows a workflow that is applicable to large structure databases.
Here, we assume that a set of atomic structures in XSF format is already available in `./xsf/`.

```python
# the AUC featurizer uses the Chebyshev method (Artrith 2017)
fzer = AenetAUCFeaturizer(['Li', 'Mo', 'Ni', 'Ti', 'O'],
                          rad_cutoff=4.0, rad_order=10,
                          ang_cutoff=1.5, ang_order=3)

# aenet's generate.x will be run in the specified subdirectory ('run').
# If no work directory is given, a temporary directory is created and
# removed after completion.
fzer.run_aenet_generate(glob.glob("./xsf/*.xsf"),
                        atomic_energies={
                            'Li': -2.5197301758568920,
                            'Mo': -0.6299325439642232,
                            'Ni': -2.2047639038747695,
                            'O': -10.0789207034275830,
                            'Ti': -2.2047639038747695},
                        workdir='run')
```
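The `atomic_energies` passed above are the energies of the isolated atoms of each species. To our understanding, aenet subtracts these references from each structure's total energy, so the model is trained on energies relative to free atoms. The sketch below shows that arithmetic with a hypothetical helper `relative_energy`. The composition and total energy in the example are made up and do not come from the data set above.

```python
from collections import Counter

# Reference energies copied from the cell above.
ATOMIC_ENERGIES = {
    'Li': -2.5197301758568920,
    'Mo': -0.6299325439642232,
    'Ni': -2.2047639038747695,
    'O': -10.0789207034275830,
    'Ti': -2.2047639038747695,
}


def relative_energy(total_energy, symbols, atomic_energies=ATOMIC_ENERGIES):
    """Hypothetical helper: total energy minus the sum of isolated-atom energies."""
    counts = Counter(symbols)
    reference = sum(n * atomic_energies[s] for s, n in counts.items())
    return total_energy - reference


# Made-up example: three atoms (Li, Li, O) with a made-up total energy of -20.0.
symbols = ['Li', 'Li', 'O']
e_rel = relative_energy(-20.0, symbols)
print(e_rel, e_rel / len(symbols))  # total and per-atom relative energy
```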

The call above creates the files `generate.out` and `features.h5`, containing the output written by `generate.x` and the data set in HDF5 format, respectively.  Because we specified the work directory `run`, that directory is also kept, with all files used to run `generate.x`.

To access the featurized structures, the data set can be read with the `TrnSet` class.

```python
with TrnSet.from_file('features.h5') as ts:
    print(ts)
```

```
Training set info:
  Name           : run/data.train
  Atom types     : Li Mo Ni Ti O
  Atomic energies: -2.520 -0.630 -2.205 -2.205 -10.079
  #atom, #struc. : 46144 824
  E_min, max, av : -4.587 -4.548 -4.568
  File (format)  : features.h5 (hdf5)
```
The information for each atomic structure (including the atomic features) can be accessed with `ts.read_structure(i)`, where `i` is the index of the structure.  However, for large data sets it is more efficient to access all structures sequentially.  This can be done using the method `ts.read_next_structure()` or by iterating over the training set object.

```python
with TrnSet.from_file('features.h5') as ts:
    for i, s in enumerate(ts):
        print(i, s.path)
```
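Iterating over the data set holds only one structure in memory at a time, so statistics over all atoms can be accumulated in a single pass. The sketch below computes the mean atom-site feature vector this way. It works with any iterable of objects that have an `atom_features` array. We assume, without having verified it, that structures yielded by `TrnSet` expose their features under that name, as the featurized structure in Example 1 does. A hypothetical stand-in class lets the sketch run without aenet.

```python
from dataclasses import dataclass
from typing import Iterable

import numpy as np


@dataclass
class StandInStructure:
    """Hypothetical stand-in for a featurized structure (one row per atom)."""
    atom_features: np.ndarray


def mean_atom_features(structures: Iterable) -> np.ndarray:
    """Mean feature vector over all atoms, accumulated one structure at a time."""
    total, n_atoms = None, 0
    for s in structures:
        f = np.asarray(s.atom_features, dtype=float)
        row_sum = f.sum(axis=0)
        total = row_sum if total is None else total + row_sum
        n_atoms += f.shape[0]
    if n_atoms == 0:
        raise ValueError("no atoms in the data set")
    return total / n_atoms


# Generator of structures with 1, 2 and 3 atoms whose features all equal the atom count.
structures = (StandInStructure(np.full((n, 3), float(n))) for n in (1, 2, 3))
print(mean_atom_features(structures))
```

With aenet available, the same function could be applied inside the `with TrnSet.from_file('features.h5') as ts:` block as `mean_atom_features(ts)`, provided the assumption above holds.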