# Computing Energies with a Second Calculator
As part of this assignment, our team is using the [ISO17 dataset](http://quantum-machine.org/datasets/), which already provides DFT energies and forces for a large set of molecular conformations (isomers of C7O2H10).

Here, we show how to read those structures with ASE and compute a new set of energies for them.

In [1]:
from ase.db import connect
from random import random
from tqdm import tqdm
import pandas as pd
import torchani

We only process a random subset of the entries. `select_fraction` sets the approximate fraction to keep.

In [2]:
select_fraction = 0.1  # Approximate fraction of database entries to keep
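To keep roughly `select_fraction` of the entries, one can draw a uniform random number for each row and keep the row only when that number falls below the fraction. A minimal sketch, assuming you want the same subset on every run: it seeds a dedicated `random.Random` instance. The helper name `keep_entry`, the seed value 0, and the row count are hypothetical, chosen only for illustration.

```python
import random

def keep_entry(rng: random.Random, fraction: float) -> bool:
    """Return True with probability `fraction` (hypothetical helper)."""
    return rng.random() < fraction

# A seeded generator makes the selected subset identical between runs
rng = random.Random(0)
n_demo = 10_000  # stand-in for the number of database rows
kept = [i for i in range(n_demo) if keep_entry(rng, 0.1)]
print(f'Kept {len(kept)} of {n_demo} rows ({len(kept) / n_demo:.1%})')
```

In the notebook, `keep_entry(rng, select_fraction)` could replace the bare `random()` comparison so the sampled molecules are reproducible.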

## Connect to the database
The training data is held in an ASE database, from which we can read each entry as an ASE Atoms object.

In [3]:
db = connect('iso17/reference.db')
print(f'Connected to a database with {len(db)} entries')

Connected to a database with 404000 entries


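You can run queries on this database and loop through the matching rows. Here we simply loop over the first 10 entries, which leaves `row` holding the last of them.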

In [4]:
for row in db.select(limit=10):
    pass  # Only iterate; afterwards `row` is the 10th entry
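The loop above exists only to leave `row` pointing at the tenth entry. Since `db.select` yields rows lazily, `itertools.islice` can pick out one specific entry from any such iterator without an empty loop body. In this sketch a plain `range` stands in for `db.select()`, and `nth` is a hypothetical helper name (it follows the `nth` recipe from the `itertools` documentation).

```python
from itertools import islice

def nth(iterable, n, default=None):
    """Return the n-th item (0-based) of an iterable, or `default` if it is too short."""
    return next(islice(iterable, n, None), default)

# With the database this would be, e.g., row = nth(db.select(limit=10), 9)
rows = iter(range(100))  # stand-in for db.select()
print(nth(rows, 9))  # prints 9
```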

Each row contains information about a specific conformation of a molecule, such as its atomic positions (in Å) and its total energy (in eV).

In [5]:
row.positions

array([[-3.11646501,  1.89986285,  3.0495241 ],
       [-2.21817165,  1.97908439,  1.99830164],
       [-0.78832964,  1.38052355,  2.12453068],
       [-0.68239071,  1.35022847,  0.59146194],
       [-0.2802637 ,  0.01290723, -0.06247984],
       [-0.13454049,  2.60224892, -0.10084665],
       [-1.05225141,  3.67823066,  0.31274451],
       [-2.31316433,  2.91253363,  0.70081334],
       [-2.28285394,  1.40649469,  0.52789212],
       [-2.71081245,  1.2190427 ,  3.61083382],
       [-0.06270836,  1.91015879,  2.8383502 ],
       [-0.84189361,  0.39963048,  2.46140193],
       [ 0.59547256,  0.09324091, -0.41709544],
       [ 0.841095  ,  2.85777003,  0.14659712],
       [-0.36648051,  2.44966582, -1.20104078],
       [-0.68128343,  4.15640562,  1.1964438 ],
       [-1.29154858,  4.38039313, -0.47507581],
       [-3.24327275,  3.43178577,  0.65829838],
       [-2.81851911,  0.66727706, -0.10936832]])
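Each line of `row.positions` holds one atom's Cartesian coordinates in Å, in the same order as the chemical symbols of the structure. As a quick sanity check, here is a sketch that computes the distance between the first two atoms (an O and a C in this structure) from the coordinates printed above. With ASE, `atoms.get_distance(0, 1)` gives the same quantity directly. The sketch uses only the standard library.

```python
import math

# First two rows of row.positions printed above, in Å
pos_0 = (-3.11646501, 1.89986285, 3.0495241)
pos_1 = (-2.21817165, 1.97908439, 1.99830164)

# Euclidean distance between the two atoms
bond = math.dist(pos_0, pos_1)
print(f'{bond:.3f} Å')  # prints 1.385 Å
```

A value around 1.4 Å is in the usual range for a C–O bond, which suggests the coordinates are being read correctly.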

In [6]:
row.total_energy

-11504.0223943466

It is often convenient to combine these into an [Atoms object](https://wiki.fysik.dtu.dk/ase/ase/atoms.html) that we can manipulate easily.

In [7]:
atoms = row.toatoms()
atoms

Atoms(symbols='OC3OC4H10', pbc=False)
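The `symbols` string in the Atoms repr keeps the atom order and only contracts consecutive runs of the same element. That means `OC3OC4H10` lists the atoms as O, three C, O, four C, ten H, for 19 atoms in total. ASE's `atoms.get_chemical_formula()` reports the overall formula directly (Hill order by default, here `C7H10O2`). The stdlib sketch below only illustrates how the run-length string maps to element counts, and `count_elements` is a hypothetical helper.

```python
import re
from collections import Counter

def count_elements(symbols: str) -> Counter:
    """Count atoms per element in a run-length formula such as 'OC3OC4H10'."""
    counts = Counter()
    # Element symbol: one uppercase letter plus an optional lowercase one, then an optional count
    for element, number in re.findall(r'([A-Z][a-z]?)(\d*)', symbols):
        counts[element] += int(number) if number else 1
    return counts

print(count_elements('OC3OC4H10'))
```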

## Computing new Energies
We can use the Atoms object from each row to compute new properties. For example, we can use the ANI-2x interatomic potential from TorchANI to compute a different energy.
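Conceptually, an interatomic potential is a function from atomic positions (and element types) to an energy. An ASE calculator exposes such a function through `get_potential_energy(atoms)`. ANI is a neural-network potential. To illustrate only the positions-to-energy mapping, the sketch below swaps in a much simpler Lennard-Jones pair potential. The function name and the parameters `epsilon` and `sigma` are hypothetical values in arbitrary units, unrelated to TorchANI.

```python
import itertools
import math

def lennard_jones_energy(positions, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy summed over all unique atom pairs (toy model)."""
    energy = 0.0
    for a, b in itertools.combinations(positions, 2):
        sr6 = (sigma / math.dist(a, b)) ** 6
        energy += 4 * epsilon * (sr6 ** 2 - sr6)
    return energy

# Two atoms at the potential minimum r = 2**(1/6) * sigma have energy -epsilon
r_min = 2 ** (1 / 6)
print(lennard_jones_energy([(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]))
```

Unlike this pair potential, ANI builds per-atom descriptors of each atom's whole neighborhood. The interface idea (geometry in, energy out) is the same.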

In [8]:
calc = torchani.models.ANI2x().ase()


In [9]:
new_energy = calc.get_potential_energy(atoms)
print(f'The DFT energy for this geometry is {row.total_energy:.2f} eV and the ANI energy is {new_energy:.2f} eV')

The DFT energy for this geometry is -11504.02 eV and the ANI energy is -11509.91 eV
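The two numbers should not be expected to agree exactly. ISO17 was computed at a different DFT level of theory than the one ANI-2x was trained on, so a systematic offset in absolute energies is normal, and differences between conformations are usually the more meaningful comparison. A small sketch converting the gap printed above into kcal/mol, assuming the approximate conversion factor 1 eV ≈ 23.0605 kcal/mol:

```python
EV_TO_KCAL_PER_MOL = 23.0605  # approximate conversion factor

dft_energy = -11504.02  # eV, rounded value printed above
ani_energy = -11509.91  # eV, rounded value printed above

gap_ev = ani_energy - dft_energy
print(f'ANI - DFT = {gap_ev:.2f} eV = {gap_ev * EV_TO_KCAL_PER_MOL:.1f} kcal/mol')
```

With ASE installed, `ase.units` provides `kcal` and `mol`, so `gap_ev / (kcal / mol)` gives the same conversion without hard-coding the factor.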


## Combining it together
Let's loop over the database, compute the ANI energy for a random subset of the entries, and store it alongside the DFT energy in a list.

In [10]:
data = []
limit = 1000  # Set to None to read the whole dataset
n_rows = len(db) if limit is None else min(limit, len(db))
for row in tqdm(db.select(limit=limit), total=n_rows):
    # Keep roughly `select_fraction` of the entries and skip the rest
    if random() >= select_fraction:
        continue
        
    # For those we don't skip, compute a new energy
    atoms = row.toatoms()
    ani_energy = calc.get_potential_energy(atoms)
    
    # Save them to a list
    data.append({
        'atoms': atoms,
        'ani_energy': ani_energy,
        'dft_energy': row.total_energy
    })


In [11]:
data = pd.DataFrame(data)
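With both energies in one table, a natural next step is to compare them. Because of the constant offset between the two methods, one common choice is to subtract the mean difference before computing a mean absolute error (MAE). The sketch below does that. This offset-corrected MAE is an assumption about how one might compare the energies, not something the dataset prescribes, and the numbers are toy values, not ISO17 results.

```python
from statistics import mean

def offset_corrected_mae(reference, predicted):
    """Return (mean offset, MAE after removing that offset) for two energy sequences."""
    diffs = [p - r for r, p in zip(reference, predicted)]
    offset = mean(diffs)
    return offset, mean(abs(d - offset) for d in diffs)

# Toy values in eV, for illustration only (not ISO17 data)
dft = [-10.0, -11.0, -12.0]
ani = [-15.0, -16.2, -16.8]
offset, mae = offset_corrected_mae(dft, ani)
print(f'offset = {offset:.2f} eV, MAE after offset = {mae:.2f} eV')
```

Because pandas columns are iterable, the same helper should accept `data['dft_energy']` and `data['ani_energy']` directly.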

Now save it to disk in pickle format so that we can read it back in later.

In [12]:
data.to_pickle('train_data.pkl.gz')  # ".gz" means that Pandas will compress this data for you (it's really nice!)
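`pd.read_pickle('train_data.pkl.gz')` reads the file back, again inferring gzip compression from the extension. Such a file is a gzip-compressed pickle. The following stdlib-only sketch performs the same round trip, writing to a temporary directory so it does not touch `train_data.pkl.gz`. The file name and records are hypothetical.

```python
import gzip
import os
import pickle
import tempfile

records = [{'ani_energy': -1.5, 'dft_energy': -1.0}]  # toy records for illustration

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'example.pkl.gz')  # hypothetical file name
    # Write: pickle the object into a gzip-compressed stream
    with gzip.open(path, 'wb') as f:
        pickle.dump(records, f)
    # Read: decompress and unpickle
    with gzip.open(path, 'rb') as f:
        loaded = pickle.load(f)

print(loaded == records)  # prints True
```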