# Featurize structures using aenet from Python

This notebook requires that the ænet executables, in particular `generate.x`, are installed and set up correctly.

In [1]:
import glob
from aenet.featurize import AenetAUCFeaturizer
from aenet.trainset import TrnSet

We assume here that a set of atomic structures in XSF format is already available in `./xsf/`.
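
Before running the featurizer, it can help to confirm that the directory actually contains XSF files, since an empty `glob` result would otherwise be passed on silently. The following is a minimal sketch using only the standard library; the helper name `list_xsf_files` is hypothetical and not part of the aenet Python package.

```python
import glob
import os


def list_xsf_files(directory="./xsf"):
    """Return the sorted XSF paths in `directory`; raise if none are found.

    Hypothetical helper for illustration, not part of aenet.
    """
    # glob.escape guards against special characters in the directory name.
    pattern = os.path.join(glob.escape(directory), "*.xsf")
    # Sorting makes the structure order reproducible across runs.
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no .xsf files found in {directory!r}")
    return paths
```

The returned list could then be passed to the featurizer in place of the bare `glob.glob("./xsf/*.xsf")` call.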

In [2]:
# The AUC featurizer uses the Chebyshev method (Artrith 2017).
fzer = AenetAUCFeaturizer(['H', 'C', 'O'],
                          rad_cutoff=4.0, rad_order=10,
                          ang_cutoff=1.5, ang_order=3)

# aenet's generate.x will be run in the specified subdirectory ('run').
# If no work directory is given, a temporary directory is created and
# removed after completion.
fzer.run_aenet_generate(glob.glob("./xsf/*.xsf"), workdir='run')

The call above creates two files: `generate.out`, which contains the output written by `generate.x`, and `features.h5`, which contains the featurized data set in HDF5 format. Because we specified the work directory `run`, that directory is kept along with all files used to run `generate.x`.
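
A quick way to confirm that the run produced both files is to check for them explicitly. This is a small sketch, assuming the files are written to the current working directory (the notebook later opens `features.h5` by a relative path); the helper name `missing_outputs` is hypothetical.

```python
from pathlib import Path


def missing_outputs(directory=".", names=("generate.out", "features.h5")):
    """Return the expected output files that are absent from `directory`.

    Hypothetical helper for illustration, not part of aenet.
    """
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]
```

An empty list from `missing_outputs()` indicates that both files are present.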

To access the featurized structures, the data set can be read with the `TrnSet` class.

In [3]:
ts = TrnSet.from_file('features.h5')
print(ts)


Training set info:
  Name           : run/data.train
  Atom types     : H C O
  Atomic energies: 0.000 0.000 0.000
  #atom, #struc. : 2480 248
  E_min, max, av : -932.146 -923.571 -932.009
  File (format)  : features.h5 (hdf5)



The information for each atomic structure (including the atomic features) can be accessed with `ts.read_structure(i)`, where `i` is the index of the structure. For large data sets, however, it is more efficient to access all structures sequentially, either by calling `ts.read_next_structure()` repeatedly or by iterating over the training set object.

In [4]:
for i, s in enumerate(ts):
    print(i, s.path)

0 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/frame-001.xsf
1 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/frame-002.xsf
2 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/frame-003.xsf
3 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/frame-004.xsf
4 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/frame-005.xsf
5 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/frame-006.xsf
6 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/frame-007.xsf
7 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/frame-008.xsf
8 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/frame-009.xsf
9 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/frame-010.xsf
10 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/frame-011.xsf
11 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/frame-012.xsf
12 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/frame-013.xsf
13 /data/home/au2229/code/aenet/aenet-python/notebooks/xsf/fr
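
For a data set with hundreds of structures, printing every path is rarely needed. The sketch below uses `itertools.islice` to stop sequential iteration after the first few items. It relies only on the training set being iterable and on each structure having a `path` attribute, both of which are shown above; the helper name `first_paths` is hypothetical.

```python
from itertools import islice


def first_paths(structures, n=5):
    """Return the `path` attribute of the first `n` items of an iterable.

    Hypothetical helper for illustration, not part of aenet.
    """
    # islice stops after n items, so later structures are never read.
    return [s.path for s in islice(structures, n)]
```

With an open training set, `first_paths(ts, 3)` would return the paths of the first three structures.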

Make sure to close the training set file once you are done with it.

In [5]:
ts.close()
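
If an exception is raised while the structures are being processed, an explicit `ts.close()` at the end is never reached. One way to guard against this is `contextlib.closing` from the standard library, which calls `close()` on exit from the `with` block regardless of how it ends. The sketch below assumes only that the object is iterable and has a `close()` method, as shown above; whether `TrnSet` itself supports the `with` statement is not assumed, and the helper name `count_structures` is hypothetical.

```python
from contextlib import closing


def count_structures(trnset):
    """Count the structures in an open training set, then close it.

    Hypothetical helper for illustration, not part of aenet.
    """
    # closing() calls trnset.close() even if iteration raises.
    with closing(trnset) as ts:
        return sum(1 for _ in ts)
```

A call such as `count_structures(TrnSet.from_file('features.h5'))` would then leave no open file behind.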