# Voronoi Feature Generators <a name="head"></a>

In this tutorial, we generate features from a database of organic donor-acceptor molecules from the [Computational Materials Repository](https://cmrdb.fysik.dtu.dk/?project=solar). The data has been downloaded in the [ase-db](https://wiki.fysik.dtu.dk/ase/ase/db/db.html#module-ase.db) format, so we first load the atoms objects and get a target property. We then convert the atoms objects into a feature array and test a couple of different models.

This tutorial shows one way to handle atoms objects of different sizes. In particular, we focus on a feature set that scales with the number of atoms, so the raw feature vectors differ in length from one structure to the next. We pad the feature vectors to a constant size to overcome this problem.
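The padding idea can be sketched in plain numpy. This is only an illustration under stated assumptions: the helper `pad_features` and the toy per-atom values are hypothetical and are not part of AtoML, and zero is just one possible fill value.

```python
import numpy as np


def pad_features(per_atom_features, max_atoms, fill_value=0.0):
    """Flatten an (n_atoms, n_features) array and pad it to a fixed length.

    Hypothetical helper for illustration; not an AtoML function.
    """
    arr = np.asarray(per_atom_features, dtype=float)
    n_atoms, n_features = arr.shape
    if n_atoms > max_atoms:
        raise ValueError('structure has more atoms than max_atoms')
    # Start from a vector of fill values, then copy the real features in.
    padded = np.full(max_atoms * n_features, fill_value)
    padded[:n_atoms * n_features] = arr.ravel()
    return padded


# Two toy "structures" with 2 and 3 atoms and 2 features per atom.
small = [[1.0, 2.0], [3.0, 4.0]]
large = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

max_atoms = max(len(small), len(large))
feature_matrix = np.array([pad_features(s, max_atoms) for s in (small, large)])
print(feature_matrix.shape)
```

After padding, every structure contributes a row of the same length, so the rows can be stacked into a single feature matrix. The fill value should be chosen so that it cannot be mistaken for a real feature value.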

## Table of Contents
[(Back to top)](#head)

-   [Requirements](#requirements)
-   [Data Setup](#data-setup)
-   [Feature Generation](#feature-generation)
-   [Predictions](#predictions)
-   [Cross-validation](#cross-validation)

## Requirements <a name="requirements"></a>
[(Back to top)](#head)

-   [AtoML](https://gitlab.com/atoml/AtoML)
-   [ASE](https://wiki.fysik.dtu.dk/ase/)
-   [numpy](http://www.numpy.org/)
-   [matplotlib](https://matplotlib.org/)
-   [pandas](http://pandas.pydata.org/)
-   [seaborn](http://seaborn.pydata.org/index.html)

## Data Setup <a name="data-setup"></a>
[(Back to top)](#head)

First, we need to import some functions.

In [25]:
import ase.db  # import the submodule explicitly so that ase.db.connect is available
import random
from atoml.fingerprint.voro_fingerprint import VoronoiFingerprintGenerator

In [26]:
# Connect the ase-db.
db = ase.db.connect('../../data/cubic_perovskites.db')
atoms = list(db.select(combination='ABO3'))
random.shuffle(atoms)

# Compile a list of atoms objects, skipping rows that cannot be converted.
alist = []
for row in atoms:
    try:
        alist.append(row.toatoms())
    except AttributeError:
        continue

# Analyze the size of molecules in the db.
print('pulled {} molecules from db'.format(len(alist)))
size = [len(a) for a in alist]

print('min: {0}, mean: {1:.0f}, max: {2} molecule size'.format(
    min(size), sum(size)/len(size), max(size)))

pulled 2704 molecules from db
min: 5, mean: 5, max: 5 molecule size


In [27]:
voro = VoronoiFingerprintGenerator(alist)

In [28]:
voro.generate()

Generate Voronoi fingerprint of 2704 structures


|  | mean_EffectiveCoordination | var_EffectiveCoordination | min_EffectiveCoordination | max_EffectiveCoordination | var_MeanBondLength | min_MeanBondLength | max_MeanBondLength | mean_BondLengthVariation | var_BondLengthVariation | min_BondLengthVariation | ... | min_SpaceGroupNumber | most_SpaceGroupNumber | frac_sValence | frac_pValence | frac_dValence | frac_fValence | CanFormIonic | MaxIonicChar | MeanIonicChar | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  |  |  |  |  |  |  |  |  |  | ... | 12.0 | 12.0 | 0.277778 | 0.416667 | 0.305556 | 0.000000 | 1.0 | 0.660947 | 0.249630 | 0.0 |
| 1 |  |  |  |  |  |  |  |  |  |  | ... | 12.0 | 12.0 | 0.400000 | 0.560000 | 0.040000 | 0.000000 | 0.0 | 0.660947 | 0.271599 | 1.0 |
| 2 |  |  |  |  |  |  |  |  |  |  | ... | 12.0 | 12.0 | 0.169492 | 0.254237 | 0.338983 | 0.237288 | 1.0 | 0.404527 | 0.189077 | 2.0 |
| 3 |  |  |  |  |  |  |  |  |  |  | ... | 12.0 | 12.0 | 0.256410 | 0.307692 | 0.435897 | 0.000000 | 0.0 | 0.551131 | 0.242710 | 3.0 |
| 4 |  |  |  |  |  |  |  |  |  |  | ... | 12.0 | 12.0 | 0.285714 | 0.428571 | 0.285714 | 0.000000 | 0.0 | 0.774266 | 0.287965 | 4.0 |
| 5 |  |  |  |  |  |  |  |  |  |  | ... | 12.0 | 12.0 | 0.200000 | 0.266667 | 0.222222 | 0.311111 | 0.0 | 0.660947 | 0.242443 | 5.0 |
| 6 |  |  |  |  |  |  |  |  |  |  | ... | 12.0 | 12.0 | 0.204082 | 0.244898 | 0.265306 | 0.285714 | 1.0 | 0.455779 | 0.188003 | 6.0 |
| 7 |  |  |  |  |  |  |  |  |  |  | ... | 12.0 | 12.0 | 0.200000 | 0.260000 | 0.260000 | 0.280000 | 0.0 | 0.609724 | 0.264711 | 7.0 |
| 8 |  |  |  |  |  |  |  |  |  |  | ... | 12.0 | 12.0 | 0.281250 | 0.375000 | 0.343750 | 0.000000 | 0.0 | 0.660947 | 0.269146 | 8.0 |
| 9 |  |  |  |  |  |  |  |  |  |  | ... | 12.0 | 12.0 | 0.312500 | 0.375000 | 0.312500 | 0.000000 | 0.0 | 0.678329 | 0.294025 | 9.0 |
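Several of the leading columns in the output above are empty for the rows shown. Only the first ten rows are displayed, so whether those columns are empty throughout is not known from this output. If a column turns out to contain no values at all, one possible cleaning step is to drop it before fitting a model. The sketch below uses pandas on a toy table; the column names are taken from the output above, but the values are made up and the approach is an assumption, not the AtoML workflow.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the generated fingerprint table (values are made up).
features = pd.DataFrame({
    'mean_EffectiveCoordination': [np.nan, np.nan, np.nan],
    'frac_sValence': [0.28, 0.40, 0.17],
    'CanFormIonic': [1.0, 0.0, 1.0],
})

# Drop only the columns in which every entry is missing.
cleaned = features.dropna(axis=1, how='all')

# Record which columns were removed so the step stays traceable.
dropped = sorted(set(features.columns) - set(cleaned.columns))
print(dropped)
```

Using `how='all'` keeps any column that has at least one valid entry; partially filled columns would need a separate decision, such as imputation.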
