# Heats of Formation Calculator

## This notebook calculates the heat of formation for each molecule in the QM9 dataset and writes a CSV file containing the dataset, the counts of C, H, O, N, and F atoms, and the heats of formation.

## Installing requirements

In [None]:
!curl -O https://bootstrap.pypa.io/get-pip.py
!python get-pip.py

!pip install pysmiles
!pip install pandas

## Setting up a DataFrame with the QM9 dataset

Load the QM9 CSV file from GitHub into a DataFrame.

In [None]:
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/OpenDrugAI/AttentiveFP/master/data/qm9.csv')
display(df)

## Testing pysmiles

pysmiles is a lightweight SMILES reader and writer. It allows us to get molecular information (like the constituent elements and their count) from a SMILES string.

In [None]:
from pysmiles import read_smiles

smiles = 'C#C'
mol = read_smiles(smiles, explicit_hydrogen=True)

print(mol.nodes(data='element'))

### Getting data about constituent atoms from the compound we retrieved

In [None]:
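def get_elements(smiles):
    """Count the C, H, O, N, and F atoms in the molecule described by a SMILES string."""
    elements = {'C': 0, 'H': 0, 'O': 0, 'N': 0, 'F': 0}

    # explicit_hydrogen=True adds the hydrogens as nodes so they are counted too
    mol = read_smiles(smiles, explicit_hydrogen=True)

    # Each entry is a (node_index, element_symbol) pair
    for _, element in mol.nodes(data='element'):
        elements[element] += 1

    return elements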

elements = get_elements('C')
print(elements)
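QM9 molecules contain only C, H, O, N, and F, so the fixed dictionary in "get_elements()" is sufficient here. On other data, a lookup into that dictionary raises a `KeyError` for any element outside those five. The sketch below shows one possible way to count elements that reports untracked ones instead of failing. The `nodes` list is a hypothetical stand-in for the output of `mol.nodes(data='element')`, and `count_elements` and `TRACKED` are illustrative names that are not part of the notebook.

```python
from collections import Counter

# Hypothetical stand-in for mol.nodes(data='element'): (node_index, element) pairs
# for a molecule containing sulfur, which is not one of the five tracked elements.
nodes = [(0, 'C'), (1, 'S'), (2, 'H'), (3, 'H'), (4, 'H'), (5, 'H')]

TRACKED = ('C', 'H', 'O', 'N', 'F')

def count_elements(node_elements):
    """Return counts for the tracked elements plus a sorted list of untracked ones."""
    counts = Counter(element for _, element in node_elements)
    tracked = {symbol: counts.get(symbol, 0) for symbol in TRACKED}
    untracked = sorted(set(counts) - set(TRACKED))
    return tracked, untracked

tracked, untracked = count_elements(nodes)
print(tracked)
print(untracked)
```

A molecule with a non-empty `untracked` list could then be skipped or flagged, since the table of atomic enthalpies in this notebook has no entry for it.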


## Standard Enthalpies of Formation

Values for the enthalpy of formation of each gaseous atom were taken from Wired Chemist, "Standard Enthalpies of Formation of Gaseous Atoms," www.wiredchemist.com/chemistry/data/enthalpies.

In [None]:
def get_std_form_energy(elements):
    # in kcal/mol
    std_form_energies = {
        'C': 171.367,
        'H': 52.1033,
        'O': 59.5124,
        'N': 113.05,
        'F': 18.8815
        }
    
    energy = 0

    for e in elements:
        energy = energy + std_form_energies[e] * elements[e]
    
    return energy

print(get_std_form_energy(elements))


### Calculating counts of C, H, O, N, F and the heats of formation

We call "get_elements()" to count the constituent elements of the molecule described by each SMILES string. We then pass those counts to "get_std_form_energy()" to get the sum of the experimental enthalpies of formation of the constituent gaseous atoms.

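Equation used to calculate the heat of formation:

$$\Delta_f H^\circ(\mathrm{C}_m\mathrm{H}_n;\ 298\ \mathrm{K}) = m\,\Delta_f H^\circ_{\mathrm{exptl}}(\mathrm{C};\ 298\ \mathrm{K}) + n\,\Delta_f H^\circ_{\mathrm{exptl}}(\mathrm{H};\ 298\ \mathrm{K}) - \left[ m\,H^\circ_{\mathrm{calcd}}(\mathrm{C};\ 298\ \mathrm{K}) + n\,H^\circ_{\mathrm{calcd}}(\mathrm{H};\ 298\ \mathrm{K}) - H^\circ_{\mathrm{calcd}}(\mathrm{C}_m\mathrm{H}_n;\ 298\ \mathrm{K}) \right]$$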
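As a rough worked example of this arithmetic, the sketch below evaluates the equation for methane (m = 1, n = 4). It assumes, as the notebook's code does, that the dataset's `u298_atom` value is the calculated energy of the molecule minus that of its separated atoms in kcal/mol, which is the negative of the bracketed term. The `u298_atom` number used here is an illustrative placeholder, not a value read from the dataset.

```python
# Atomic enthalpies of formation in kcal/mol, copied from get_std_form_energy()
STD_FORM_ENERGIES = {'C': 171.367, 'H': 52.1033, 'O': 59.5124, 'N': 113.05, 'F': 18.8815}

# Methane, CH4: m = 1 carbon atom, n = 4 hydrogen atoms
counts = {'C': 1, 'H': 4}

# ASSUMPTION: placeholder atomization energy in kcal/mol (molecule minus atoms),
# chosen for illustration only and not taken from the QM9 file.
u298_atom = -398.64

# First two terms of the equation: m * dHf(C) + n * dHf(H)
exp_enthalpy = sum(STD_FORM_ENERGIES[symbol] * n for symbol, n in counts.items())

# Subtracting the bracketed term is the same as adding u298_atom
hof = u298_atom + exp_enthalpy

print(round(exp_enthalpy, 4), round(hof, 4))
```

The atomic term is large and positive and the atomization term is large and negative, so the heat of formation is the small difference between them.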

In [None]:
import sys

new_data = {'carbon': [], 'hydrogen': [], 'oxygen': [], 'nitrogen': [], 'fluorine': [], 'exp_enthalpy': [], 'hof': []}

# Total number of molecules; used by the loop and the progress bar
MAX_POINTS = len(df)

for i in range(MAX_POINTS):
    molecule = df.loc[i, 'smiles']
    elements = get_elements(molecule)
    exp_enthalpy = get_std_form_energy(elements)

    new_data['carbon'].append(elements['C'])
    new_data['hydrogen'].append(elements['H'])
    new_data['oxygen'].append(elements['O'])
    new_data['nitrogen'].append(elements['N'])
    new_data['fluorine'].append(elements['F'])
    new_data['exp_enthalpy'].append(exp_enthalpy)

    # Calculating the HoF using internal energy at 298.15 K
    u298_atom = df.loc[i, 'u298_atom']
    new_data['hof'].append(u298_atom + exp_enthalpy)

    # Progress bar, 20 characters wide, redrawn in place on each iteration
    done = i + 1
    sys.stdout.write('\r')
    sys.stdout.write("[%-20s] %d%% %d/%d" % ('#' * (20 * done // MAX_POINTS), 100 * done // MAX_POINTS, done, MAX_POINTS))
    sys.stdout.flush()

new_df = pd.DataFrame(new_data)    
display(new_df)

### Append the new information to the QM9 DataFrame

In [None]:
df = pd.concat([df, new_df], axis=1)
display(df)

### Convert the updated QM9 DataFrame into a CSV file

In [None]:
df.to_csv('qm9_HoF.csv')
