In this section, we show how to prepare the dataset for fitting. The dataset must always be provided as a pandas DataFrame with (at least) the columns `ase_atoms` (the structures as ASE `Atoms` objects), `forces` and `energy_corrected` (the total energy of a structure minus a reference energy). A raw `energy` column may be present as well, but we always use `energy_corrected` rather than `energy` in our fitting. 

In [2]:
import pandas as pd 

df = pd.read_pickle('../Cu_df1_A1_A2_A3_EV_elast_phon.pckl.gzip', compression="gzip")

df

|  | name | ase_atoms | energy_corrected | forces | energy_corrected_per_atom |
|---|---|---|---|---|---|
| 0 | A1:FHI-aims/PBE/tight:elastic:s_e_0 | (Atom('Cu', [0.0, 0.0, 0.0], index=0)) | -3.699843 | [[0.0, 0.0, 0.0]] | -3.699843 |
| 1 | A1:FHI-aims/PBE/tight:murnaghan:strain_1_0 | (Atom('Cu', [0.0, 0.0, 0.0], index=0)) | -3.699841 | [[0.0, 0.0, 0.0]] | -3.699841 |
| 2 | A1:FHI-aims/PBE/tight:phonon:supercell_phonon_0 | (Atom('Cu', [4.3368086899420173e-19, 0.0070709... | -236.789603 | [[-1.13852957740976e-06, -0.0464638907314277, ... | -3.699838 |
| 3 | A1:FHI-aims/PBE/tight:murnaghan:strain_1_02 | (Atom('Cu', [0.0, 0.0, 0.0], index=0)) | -3.697932 | [[0.0, 0.0, 0.0]] | -3.697932 |
| 4 | A1:FHI-aims/PBE/tight:murnaghan:strain_0_98 | (Atom('Cu', [0.0, 0.0, 0.0], index=0)) | -3.697559 | [[0.0, 0.0, 0.0]] | -3.697559 |
| ... | ... | ... | ... | ... | ... |
| 100 | A2:FHI-aims/PBE/tight:elastic:s_01_e_0_05000 | (Atom('Cu', [0.0, 0.0, 0.0], index=0)) | -3.573436 | [[0.0, 0.0, 0.0]] | -3.573436 |
| 101 | A1:FHI-aims/PBE/tight:elastic:s_01_e_m0_05000 | (Atom('Cu', [0.0, 0.0, 0.0], index=0)) | -3.546222 | [[0.0, 0.0, 0.0]] | -3.546222 |
| 102 | A3:FHI-aims/PBE/tight:elastic:s_01_e_m0_05000 | (Atom('Cu', [1.2150849, 0.70152958, 0.99981864... | -7.079448 | [[-3.5446112180968e-23, -1.13427558979097e-22,... | -3.539724 |
| 103 | A2:FHI-aims/PBE/tight:elastic:s_01_e_m0_05000 | (Atom('Cu', [0.0, 0.0, 0.0], index=0)) | -3.513068 | [[0.0, 0.0, 0.0]] | -3.513068 |
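The `energy_corrected_per_atom` column appears to be `energy_corrected` divided by the number of atoms in each structure: for the phonon supercell at index 2, $-236.789603 / 64 \approx -3.699838$. A minimal sketch of recomputing it, using a few values copied from the table above. The `n_atoms` column is an assumption inferred from those values (64 for the supercell, 2 for the A3 structure) and stands in for `df['ase_atoms'].map(len)` on the real data, since `len()` of an ASE `Atoms` object gives its number of atoms.

```python
import pandas as pd

# A few rows copied from the table above. `n_atoms` is inferred, NOT a column of
# the original dataset; on the real DataFrame use df['ase_atoms'].map(len).
sample = pd.DataFrame(
    {
        "energy_corrected": [-3.699843, -236.789603, -7.079448],
        "n_atoms": [1, 64, 2],
        "energy_corrected_per_atom": [-3.699843, -3.699838, -3.539724],
    },
    index=[0, 2, 102],
)

# Recompute the per-atom energy and compare it with the stored column
# (the displayed values are rounded to 6 decimals, hence the tolerance).
recomputed = sample["energy_corrected"] / sample["n_atoms"]
max_dev = (recomputed - sample["energy_corrected_per_atom"]).abs().max()
print(recomputed.round(6).tolist(), max_dev)
```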


Each row of this dataset contains a `name`, the structure (`ase_atoms`), its `energy_corrected` and `forces`, and the `energy_corrected_per_atom`. Let's take a look at the third row (index 2). 

In [9]:
row = df.iloc[2]
row

name                           A1:FHI-aims/PBE/tight:phonon:supercell_phonon_0
ase_atoms                    (Atom('Cu', [4.3368086899420173e-19, 0.0070709...
energy_corrected                                                   -236.789603
forces                       [[-1.13852957740976e-06, -0.0464638907314277, ...
energy_corrected_per_atom                                            -3.699838
Name: 2, dtype: object
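The `name` field seems to pack several colon-separated pieces of information: the structure label (the Strukturbericht designations A1 = fcc, A2 = bcc, A3 = hcp), the DFT code, functional and numerical settings (`FHI-aims/PBE/tight`), the type of calculation (`elastic`, `murnaghan`, `phonon`) and the specific configuration. A minimal sketch of splitting it into separate columns, e.g. to select only the phonon calculations. The column names `structure`, `method`, `calc_type` and `config` are hypothetical labels chosen here, not part of the dataset.

```python
import pandas as pd

# Names copied from the dataset above; on the real data use df['name'] instead.
names = pd.Series(
    [
        "A1:FHI-aims/PBE/tight:elastic:s_e_0",
        "A1:FHI-aims/PBE/tight:phonon:supercell_phonon_0",
        "A3:FHI-aims/PBE/tight:elastic:s_01_e_m0_05000",
    ],
    name="name",
)

# Split on ':' into four columns (hypothetical column names).
parts = names.str.split(":", expand=True)
parts.columns = ["structure", "method", "calc_type", "config"]

# Example filter: keep only the phonon calculations.
phonon = parts[parts["calc_type"] == "phonon"]
print(phonon)
```

On the full DataFrame, the resulting boolean mask can be used directly as `df[mask]` to build training subsets per structure or calculation type.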

Next, we generate the atomic neighborhood of every atom by calling the `generate_tp_atoms` function on the `Atoms` object. Suppose we have a configuration $\mathcal{N} = \{a_i\}_{i=0}^{N-1}$ of $N$ atoms, indexed from $0$ as in the code. For each (center) atom $a_i$, we denote by $\mathcal{N}_i \subset \mathbb{N}$ the indices $j$ of the atoms $a_j$ within its atomic environment, i.e. within the cutoff radius of $a_i$. For periodic structures, periodic images count as separate neighbors, so the same index $j$ may occur several times in $\mathcal{N}_i$ (a single-atom cell, for instance, has only neighbors with index $0$). The environment of $a_i$ thus consists of the atoms $\{a_j\}_{j \in \mathcal{N}_i}$, and we denote their number by $N_i$. 

This returns a dictionary with the following relevant keys: 
- `_ind_i` : An array of the center-atom indices of length $\sum_i N_i$: $N_0$ zeros, followed by $N_1$ ones, $N_2$ twos, and so on, up to $N_{N-1}$ copies of $N-1$. 
$$[\underbrace{0 \; 0 \ldots 0 \;0}_{N_0} \; \underbrace{1 \;1 \ldots 1 \;1}_{N_1} \; \underbrace{2 \; 2 \ldots 2 \;2}_{N_2} \; \ldots \; \underbrace{N-1 \; N-1 \ldots N-1}_{N_{N-1}}]$$
- `_ind_j` : An array of length $\sum_i N_i$ holding the indices of all atoms within the atomic neighborhood (determined by the cutoff radius) of each center atom, grouped by center atom in the same order as `_ind_i`. It is of the form 
$$[\mathcal{N}_0 \; \mathcal{N}_1 \; \mathcal{N}_2 \; \ldots \; \mathcal{N}_{N-1}]$$
- `_mu_i` : An array of the atomic numbers of the center atoms. 
- `_mu_j` : An array of the atomic numbers of all the atoms within the atomic neighborhood of each center atom. 
- `_eweights` : Weight of the energy (?), which defaults to $1$. 
- `_fweights` : Weights of the forces (?), which default to an array of ones of length $N$. 
- `_offsets` 
- `_energy` 
- `_forces` 
- `_positions` 
- `_cell`
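The `_ind_i` / `_ind_j` layout described above can be illustrated with a brute-force neighbor search. This is a minimal sketch, NOT how `generate_tp_atoms` is implemented: the helper `brute_force_neighbor_list` is hypothetical, scales as $O(N^2)$, and only tries cell shifts in $[-1, 1]^3$ by default, which is complete only when the cutoff is small compared with the cell widths. The fcc lattice constant $a = 3.631259$ Å is the spacing that appears in the supercell positions of this dataset.

```python
import itertools

import numpy as np


def brute_force_neighbor_list(positions, cell, cutoff, n_images=1):
    """Hypothetical O(N^2) neighbor list (not tensorpotential's implementation).

    Returns (ind_i, ind_j, shifts) with all pairs grouped by center atom i,
    mimicking the `_ind_i` / `_ind_j` layout. Periodic images are found by
    trying every integer cell shift in [-n_images, n_images]^3.
    """
    positions = np.asarray(positions, dtype=float)
    cell = np.asarray(cell, dtype=float)  # rows are the lattice vectors
    shifts = list(itertools.product(range(-n_images, n_images + 1), repeat=3))
    ind_i, ind_j, pair_shifts = [], [], []
    for i, r_i in enumerate(positions):
        for j, r_j in enumerate(positions):
            for shift in shifts:
                if i == j and shift == (0, 0, 0):
                    continue  # an atom is not its own neighbor
                r_ij = r_j + np.dot(shift, cell) - r_i  # vector to (image of) atom j
                if np.linalg.norm(r_ij) < cutoff:
                    ind_i.append(i)
                    ind_j.append(j)
                    pair_shifts.append(shift)
    return np.array(ind_i), np.array(ind_j), np.array(pair_shifts)


# One-atom fcc Cu primitive cell; a cutoff of 3.0 Å lies between the first
# (a / sqrt(2) ~ 2.57 Å) and second (a ~ 3.63 Å) neighbor shells.
a = 3.631259
cell = 0.5 * a * np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
ind_i, ind_j, shifts = brute_force_neighbor_list([[0.0, 0.0, 0.0]], cell, cutoff=3.0)
print(len(ind_i))  # 12 nearest neighbors, all periodic images of atom 0
```

Note how every entry of `ind_j` is `0` here: in a one-atom cell, all neighbors are periodic images of the same atom.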

In [22]:
at = row['ase_atoms'] 
at.get_positions()

array([[4.33680869e-19, 7.07100000e-03, 7.07100000e-03],
       [3.33066907e-16, 1.81563000e+00, 1.81563000e+00],
       [7.26251800e+00, 1.08937770e+01, 3.63125900e+00],
       [7.26251800e+00, 1.27094070e+01, 5.44688900e+00],
       [1.81563000e+00, 0.00000000e+00, 1.81563000e+00],
       [1.81563000e+00, 1.81563000e+00, 3.63125900e+00],
       [9.07814800e+00, 1.08937770e+01, 5.44688900e+00],
       [1.81563000e+00, 5.44688900e+00, 7.26251800e+00],
       [3.63125900e+00, 0.00000000e+00, 3.63125900e+00],
       [1.08937770e+01, 9.07814800e+00, 5.44688900e+00],
       [1.08937770e+01, 1.08937770e+01, 7.26251800e+00],
       [1.08937770e+01, 1.27094070e+01, 9.07814800e+00],
       [5.44688900e+00, 0.00000000e+00, 5.44688900e+00],
       [5.44688900e+00, 1.81563000e+00, 7.26251800e+00],
       [1.27094070e+01, 1.08937770e+01, 9.07814800e+00],
       [5.44688900e+00, 5.44688900e+00, 1.08937770e+01],
       [1.81563000e+00, 1.81563000e+00, 0.00000000e+00],
       [1.81563000e+00, 3.63125

In [21]:
from tensorpotential.utils.utilities import generate_tp_atoms
from ase.calculators.singlepoint import SinglePointCalculator

cutoff = 6  # neighbor cutoff radius, in Å (ASE length units)

t = generate_tp_atoms(at, cutoff=cutoff)

t['_positions']

array([[9.73629256e-04, 8.95724188e-20, 0.00000000e+00],
       [2.50000069e-01, 6.87916176e-17, 0.00000000e+00],
       [5.00000000e-01, 1.22296209e-16, 1.00000000e+00],
       [7.50000069e-01, 3.66888627e-16, 1.00000000e+00],
       [1.52870261e-17, 2.50000069e-01, 0.00000000e+00],
       [2.50000000e-01, 2.50000000e-01, 6.88466452e-08],
       [5.00000000e-01, 2.50000069e-01, 1.00000000e+00],
       [7.50000000e-01, 2.50000000e-01, 6.88466450e-08],
       [3.05740523e-17, 5.00000000e-01, 0.00000000e+00],
       [2.50000069e-01, 5.00000000e-01, 1.00000000e+00],
       [5.00000000e-01, 5.00000000e-01, 1.00000000e+00],
       [7.50000069e-01, 5.00000000e-01, 1.00000000e+00],
       [6.11481045e-17, 7.50000069e-01, 1.00000000e+00],
       [2.50000000e-01, 7.50000000e-01, 6.88466451e-08],
       [5.00000000e-01, 7.50000069e-01, 1.00000000e+00],
       [7.50000000e-01, 7.50000000e-01, 6.88466449e-08],
       [0.00000000e+00, 0.00000000e+00, 2.50000069e-01],
       [2.50000000e-01, 6.88466
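Comparing the two outputs, `_positions` does not hold the Cartesian coordinates returned by `at.get_positions()`; the numbers are consistent with fractional (scaled) coordinates with respect to the simulation cell. A minimal sketch of that conversion with NumPy, assuming (our inference from the positions, not stated in the source) that the 64-atom supercell is made of 4×4×4 fcc primitive cells with lattice constant $a = 3.631259$ Å. For an ASE `Atoms` object, `at.get_scaled_positions(wrap=False)` performs the same conversion.

```python
import numpy as np

# Assumed cell: 4x4x4 fcc primitive cells, a = 3.631259 Å (inferred, not given).
a = 3.631259
cell = 4 * 0.5 * a * np.array([[0.0, 1.0, 1.0],
                               [1.0, 0.0, 1.0],
                               [1.0, 1.0, 0.0]])  # rows are lattice vectors

# First five Cartesian positions printed by at.get_positions() above.
cartesian = np.array([
    [4.33680869e-19, 7.07100000e-03, 7.07100000e-03],
    [3.33066907e-16, 1.81563000e+00, 1.81563000e+00],
    [7.26251800e+00, 1.08937770e+01, 3.63125900e+00],
    [7.26251800e+00, 1.27094070e+01, 5.44688900e+00],
    [1.81563000e+00, 0.00000000e+00, 1.81563000e+00],
])

# r = s @ cell, so s solves cell.T @ s = r for each row r.
fractional = np.linalg.solve(cell.T, cartesian.T).T

# First five rows of t['_positions'] printed above.
tp_positions = np.array([
    [9.73629256e-04, 8.95724188e-20, 0.00000000e+00],
    [2.50000069e-01, 6.87916176e-17, 0.00000000e+00],
    [5.00000000e-01, 1.22296209e-16, 1.00000000e+00],
    [7.50000069e-01, 3.66888627e-16, 1.00000000e+00],
    [1.52870261e-17, 2.50000069e-01, 0.00000000e+00],
])

print(np.allclose(fractional, tp_positions, atol=1e-6))
```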

In [23]:
t['_ind_i']

array([ 0,  0,  0, ..., 63, 63, 63])
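Because `_ind_i` is sorted by center atom, the neighbor counts $N_i$ and the per-atom neighbor lists $\mathcal{N}_i$ can be recovered from `_ind_i` and `_ind_j` with NumPy. A minimal sketch on small hand-made arrays (hypothetical values, not taken from the dataset); on the real output you would pass `t['_ind_i']` and `t['_ind_j']` instead.

```python
import numpy as np

# Hypothetical toy arrays in the `_ind_i` / `_ind_j` layout for N = 3 atoms.
# Atom 1 sees atom 2 twice, as happens with periodic images.
ind_i = np.array([0, 0, 1, 1, 1, 2])
ind_j = np.array([1, 2, 0, 2, 2, 0])
n_atoms = 3

# N_i: number of neighbors per center atom (minlength keeps atoms with none).
counts = np.bincount(ind_i, minlength=n_atoms)

# Split `ind_j` into the consecutive blocks N_0, N_1, ..., N_{N-1}.
neighbors = np.split(ind_j, np.cumsum(counts)[:-1])

for i, nbrs in enumerate(neighbors):
    print(i, counts[i], nbrs.tolist())
```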

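We want to extract this information from the `Atoms` object in every row, so we define a function that processes a single row, apply it to every row, and store the results in a new column `tp_atoms`. The function first attaches a `SinglePointCalculator` holding the row's `energy_corrected` and `forces` to the `Atoms` object, so that these reference values are available to `generate_tp_atoms`. 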

In [11]:
def do_tp_atoms(row):
    at = row['ase_atoms']
    en = row['energy_corrected']
    f = row['forces']
    calc = SinglePointCalculator(at, energy=en, forces=f)
    at.calc = calc     # attach the calculator (Atoms.set_calculator() is deprecated in ASE)
    
    return generate_tp_atoms(at, cutoff=cutoff)


# Apply to entire dataframe
df['tp_atoms'] = df.apply(do_tp_atoms, axis=1)

df

|  | name | ase_atoms | energy_corrected | forces | energy_corrected_per_atom | tp_atoms |
|---|---|---|---|---|---|---|
| 0 | A1:FHI-aims/PBE/tight:elastic:s_e_0 | (Atom('Cu', [0.0, 0.0, 0.0], index=0)) | -3.699843 | [[0.0, 0.0, 0.0]] | -3.699843 | {'_ind_i': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0... |
| 1 | A1:FHI-aims/PBE/tight:murnaghan:strain_1_0 | (Atom('Cu', [0.0, 0.0, 0.0], index=0)) | -3.699841 | [[0.0, 0.0, 0.0]] | -3.699841 | {'_ind_i': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0... |
| 2 | A1:FHI-aims/PBE/tight:phonon:supercell_phonon_0 | (Atom('Cu', [4.3368086899420173e-19, 0.0070709... | -236.789603 | [[-1.13852957740976e-06, -0.0464638907314277, ... | -3.699838 | {'_ind_i': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0... |
| 3 | A1:FHI-aims/PBE/tight:murnaghan:strain_1_02 | (Atom('Cu', [0.0, 0.0, 0.0], index=0)) | -3.697932 | [[0.0, 0.0, 0.0]] | -3.697932 | {'_ind_i': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0... |
| 4 | A1:FHI-aims/PBE/tight:murnaghan:strain_0_98 | (Atom('Cu', [0.0, 0.0, 0.0], index=0)) | -3.697559 | [[0.0, 0.0, 0.0]] | -3.697559 | {'_ind_i': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0... |
| ... | ... | ... | ... | ... | ... | ... |
| 100 | A2:FHI-aims/PBE/tight:elastic:s_01_e_0_05000 | (Atom('Cu', [0.0, 0.0, 0.0], index=0)) | -3.573436 | [[0.0, 0.0, 0.0]] | -3.573436 | {'_ind_i': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0... |
| 101 | A1:FHI-aims/PBE/tight:elastic:s_01_e_m0_05000 | (Atom('Cu', [0.0, 0.0, 0.0], index=0)) | -3.546222 | [[0.0, 0.0, 0.0]] | -3.546222 | {'_ind_i': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0... |
| 102 | A3:FHI-aims/PBE/tight:elastic:s_01_e_m0_05000 | (Atom('Cu', [1.2150849, 0.70152958, 0.99981864... | -7.079448 | [[-3.5446112180968e-23, -1.13427558979097e-22,... | -3.539724 | {'_ind_i': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0... |
| 103 | A2:FHI-aims/PBE/tight:elastic:s_01_e_m0_05000 | (Atom('Cu', [0.0, 0.0, 0.0], index=0)) | -3.513068 | [[0.0, 0.0, 0.0]] | -3.513068 | {'_ind_i': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0... |
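Before fitting, it may be worth checking that every `tp_atoms` entry is internally consistent with the layout described earlier. A minimal sketch of such a check; the helper `check_tp_atoms` is hypothetical and only inspects the `_ind_i` and `_ind_j` keys.

```python
import numpy as np


def check_tp_atoms(tp, n_atoms):
    """Hypothetical consistency check for one `generate_tp_atoms` dictionary.

    Returns a list of problems found (empty if none).
    """
    ind_i = np.asarray(tp["_ind_i"])
    ind_j = np.asarray(tp["_ind_j"])
    problems = []
    if ind_i.shape != ind_j.shape:
        problems.append("_ind_i and _ind_j have different lengths")
    if np.any(np.diff(ind_i) < 0):
        problems.append("_ind_i is not grouped by center atom")
    for name, idx in (("_ind_i", ind_i), ("_ind_j", ind_j)):
        # Every index must refer to an atom of the structure: 0 <= idx < n_atoms.
        if idx.size and (idx.min() < 0 or idx.max() >= n_atoms):
            problems.append(f"{name} has indices outside 0..{n_atoms - 1}")
    return problems


# Toy dictionaries (hypothetical), one valid and one with two problems.
good = {"_ind_i": [0, 0, 1], "_ind_j": [1, 1, 0]}
bad = {"_ind_i": [1, 0], "_ind_j": [0, 5]}
print(check_tp_atoms(good, n_atoms=2))  # []
print(check_tp_atoms(bad, n_atoms=2))
```

On the real data, one could loop over `zip(df['tp_atoms'], df['ase_atoms'])` and call `check_tp_atoms(tp, len(at))` for each row.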


Now our dataset is ready to be fitted. Move on to building the ACE model. 