# Demo - Storing information in EEX 

## Requirements for Molecular Dynamics Simulations

Molecular dynamics (MD) simulations are widely used in chemistry and materials science research. Many software packages exist to perform these simulations, each using its own file formats. However, the basic information required by each program is the same: equations and constants that specify the interaction energy between atoms.

In general, the energy expression for MD programs is a sum over all bonds, angles, dihedrals, and nonbonded interactions:

\begin{equation*} 
\ U_{total} = \sum_{bonds}{E_{bonds}} + \sum_{angles}{E_{angles}} + \sum_{dihedrals}{E_{dihedrals}} + \sum_{i=1}^{N-1}{\sum_{j=i+1}^{N}{E_{ij} }}
\end{equation*}

where the forms of $E$ vary depending on the force field.

The parameters which fully define the energy expression are:

1. Topology of system (lists of atoms, bonds, angles, dihedrals and impropers)
2. The functional form of the terms of the energy expression
    e.g., bonds can be defined by a harmonic equation:
    <center>
    $E_{bonds} = {k_{b}(r-r_{0})^2}$
    </center>
    <br>
3. The numerical value of constants associated with the functional forms  
   e.g., keeping with the above example, we might define $k_{b}$ to have a value of $2.0\ \mathrm{kcal\ mol^{-1}}\ \mathring{A}^{-2}$ for carbon-carbon bonds, with an equilibrium distance of $1.54\ \mathring{A}$. The equation defining the bond would become <br>
   <center>
    $E_{bonds} = {2.0(r-1.54)^2}$
   </center>
   <br><br>
4. Atomic Cartesian coordinates

Specifying items 1 through 3 defines an energy expression in which the internal and external degrees of freedom are variables. The atomic Cartesian coordinates (item 4) then fix the values of those degrees of freedom, so the total energy can be computed. Typically, in a classical simulation only item 4 changes as the simulation progresses.
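
As a minimal sketch of how a functional form, its constants, and a geometry combine into an energy, the harmonic bond example from the list can be evaluated directly. The function name `harmonic_bond_energy` is illustrative and not part of EEX, and the constants are the example values quoted above ($k_b = 2.0$, $r_0 = 1.54$).

```python
def harmonic_bond_energy(r, k_b=2.0, r_0=1.54):
    """Return E = k_b * (r - r_0)**2 for a bond of length r (angstrom).

    k_b is in kcal mol^-1 angstrom^-2 and r_0 in angstrom. Both defaults are
    the illustrative carbon-carbon values from the text, not EEX defaults.
    """
    return k_b * (r - r_0) ** 2


# At the equilibrium distance the bond contributes no energy.
energy_at_equilibrium = harmonic_bond_energy(1.54)

# Stretching the bond by 0.1 angstrom raises the energy quadratically.
energy_stretched = harmonic_bond_energy(1.64)

print(energy_at_equilibrium, energy_stretched)
```

Only the argument `r` changes during a simulation; the functional form and its constants stay fixed.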

Converting the energy expression of a system between different MD programs essentially requires reading this information and writing it in the correct format for the desired software.

## EEX's Approach

The Energy Expression eXchange (EEX) is a Python program that converts energy expressions between different molecular dynamics programs. EEX uses a plugin architecture that makes the application modular, customizable, and extensible: instead of writing on the order of $N^2$ pairwise translators for $N$ programs, we write only on the order of $N$ plugins. There are four key elements in the EEX program:

1. __Reader Plugins.__ Python modules that parse the relevant energy expression information from different MD programs and populate EEX's internal representation of the energy expression.
2. __Data layer (Host application).__ Contains EEX's internal representation of the energy expression and methods that define the interface of such representation with the external world (i.e. the various reader/writer plugins or Python scripts that use EEX's API).   
3. __Writer Plugins.__ Python modules that write EEX's internal energy expression representation out to different MD formats.
4. __Internal metadata.__ Relevant information that helps to define the functional forms for all the degrees of freedom (two, three, four body terms and non-bonded terms) as well as other important simulation parameters, such as which electrostatic method to use, cut-offs, box information, etc.
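
The scaling argument behind the plugin design can be sketched with a quick count. This is an illustration under stated assumptions, not EEX code: it assumes each direct converter works in one direction only, and that the plugin approach needs one reader and one writer per program. Both function names are hypothetical.

```python
def pairwise_translator_count(n_programs):
    """Direct converters needed when every program converts to every other one."""
    return n_programs * (n_programs - 1)


def hub_plugin_count(n_programs):
    """Plugins needed with a shared internal representation (one reader, one writer each)."""
    return 2 * n_programs


# Compare how the two approaches grow with the number of supported programs.
for n in (3, 5, 10):
    print(n, pairwise_translator_count(n), hub_plugin_count(n))
```

The pairwise count grows quadratically while the plugin count grows linearly, which is why supporting one more program only requires one more reader and writer.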

In this demo, we will demonstrate how to create a system (united atom butane molecule) in EEX. We first specify atomic positions and topology (bonded atoms), then give bond, angle, and dihedral functional forms and parameters. 

To verify that two system specifications match, we compare their calculated energies. Identical systems will have identical energies.

In [1]:
import eex
import os
import pandas as pd
import numpy as np

In [2]:
# Create empty data layer
dl = eex.datalayer.DataLayer("butane", backend="Memory")
dl.summary()

EEX DataLayer Object

System name: butane
----------------------------------------------
Atom Count:                 0
Bond Count:                 0
Angle Count:                0
Dihedral Count:             0
----------------------------------------------
Number of bond parameters:     0
Number of angle parameters:    0
Number of dihedral parameters: 0
----------------------------------------------


In [3]:
"""
First, we add atoms to the system. Atoms have associated metadata. The possible atom metadata is listed here.

"""

dl.list_valid_atom_properties()


['molecule_index',
 'atom_name',
 'atom_type',
 'atomic_number',
 'charge',
 'xyz',
 'mass',
 'residue_index',
 'residue_name']

In [4]:
"""
TOPOLOGY:

Information can be added to the datalayer in the form of pandas dataframes. Here, we add atom metadata. 

The name of the column corresponds to the atom property.

Populate empty dataframe with relevant information and add to EEX datalayer

"""
# Create empty dataframe
atom_df = pd.DataFrame()

# Create atomic system using pandas dataframe
atom_df["atom_index"] = np.arange(0,4)
atom_df["molecule_index"] = np.zeros(4, dtype=int)
atom_df["residue_index"] = np.zeros(4, dtype=int)
atom_df["atom_name"] = ["C1", "C2", "C3", "C4"]
atom_df["charge"] = np.zeros(4)
atom_df["atom_type"] = [1, 2, 2, 1]
atom_df["X"] = [0, 0, 0, -1.474]
atom_df["Y"] = [-0.4597, 0, 1.598, 1.573]
atom_df["Z"] = [-1.5302, 0, 0, -0.6167]
atom_df["mass"] = [15.0452, 14.02658, 14.02658, 15.0452]
# Add atoms to datalayer
dl.add_atoms(atom_df, by_value=True)

# Print datalayer information
dl.summary()

# Print stored atom properties
dl.get_atoms(properties=None, by_value=True)

EEX DataLayer Object

System name: butane
----------------------------------------------
Atom Count:                 4
Bond Count:                 0
Angle Count:                0
Dihedral Count:             0
----------------------------------------------
Number of bond parameters:     0
Number of angle parameters:    0
Number of dihedral parameters: 0
----------------------------------------------


| atom_index | molecule_index | atom_name | atom_type | charge | X | Y | Z | mass | residue_index |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | C1 | 1 | 0.0 | 0.0 | -0.4597 | -1.5302 | 15.0452 | 0 |
| 1 | 0 | C2 | 2 | 0.0 | 0.0 | 0.0 | 0.0 | 14.02658 | 0 |
| 2 | 0 | C3 | 2 | 0.0 | 0.0 | 1.598 | 0.0 | 14.02658 | 0 |
| 3 | 0 | C4 | 1 | 0.0 | -1.474 | 1.573 | -0.6167 | 15.0452 | 0 |


In [5]:
"""
TOPOLOGY:

The EEX datalayer now contains four nonbonded atoms. To create butane, atoms must be bonded
to one another.

Add bonds to system

"""

# Create empty dataframes for bonds
bond_df = pd.DataFrame()

# Create column names. Here, "term_index" refers to the bond type index,
# i.e., if all bonds are the same, they will have the same term index.
bond_column_names = ["atom1", "atom2", "term_index"]

# Create corresponding data. The first row specifies that atom 0 is bonded
# to atom 1 with term index 0.
bond_data = np.array([[0, 1, 0,],
        [1, 2, 0],
        [2, 3, 0]])

for num, name in enumerate(bond_column_names):
    bond_df[name] = bond_data[:,num]

dl.add_bonds(bond_df)

In [6]:
dl.summary()

EEX DataLayer Object

System name: butane
----------------------------------------------
Atom Count:                 4
Bond Count:                 3
Angle Count:                0
Dihedral Count:             0
----------------------------------------------
Number of bond parameters:     0
Number of angle parameters:    0
Number of dihedral parameters: 0
----------------------------------------------


In [7]:
"""
TOPOLOGY:

Add angles and dihedrals to system.
"""

# Follow similar procedure as for bonds

angle_df = pd.DataFrame()
dihedral_df = pd.DataFrame()

angle_column_names = ["atom1", "atom2", "atom3", "term_index"]
dihedral_column_names = ["atom1", "atom2", "atom3", "atom4", "term_index"]

angle_data = np.array([[0, 1, 2, 0,], 
                       [1, 2, 3, 0],])

dihedral_data = np.array([[0, 1, 2, 3, 0,]])

for num, name in enumerate(angle_column_names):
    angle_df[name] = angle_data[:,num]

dl.add_angles(angle_df)
    
for num, name in enumerate(dihedral_column_names):
    dihedral_df[name] = dihedral_data[:,num]
    
dl.add_dihedrals(dihedral_df)

In [8]:
dl.summary()

EEX DataLayer Object

System name: butane
----------------------------------------------
Atom Count:                 4
Bond Count:                 3
Angle Count:                2
Dihedral Count:             1
----------------------------------------------
Number of bond parameters:     0
Number of angle parameters:    0
Number of dihedral parameters: 0
----------------------------------------------


## Storing force field information

So far, only the topology and coordinates of the system are specified, and we are not able to calculate an energy.
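
What the topology and coordinates do provide is the geometry: every bond length and angle can already be computed. The following sketch uses only the Python standard library and the coordinates and connectivity entered above; the helper names `bond_length` and `angle_degrees` are illustrative and not part of EEX.

```python
import math

# Coordinates (angstrom) copied from the atom dataframe: C1, C2, C3, C4.
xyz = [
    (0.0, -0.4597, -1.5302),
    (0.0, 0.0, 0.0),
    (0.0, 1.598, 0.0),
    (-1.474, 1.573, -0.6167),
]

# Connectivity matching the bond and angle tables added to the datalayer.
bonds = [(0, 1), (1, 2), (2, 3)]
angles = [(0, 1, 2), (1, 2, 3)]


def bond_length(coords, i, j):
    """Euclidean distance between atoms i and j."""
    return math.dist(coords[i], coords[j])


def angle_degrees(coords, i, j, k):
    """Angle i-j-k in degrees, with atom j at the vertex."""
    v1 = [a - b for a, b in zip(coords[i], coords[j])]
    v2 = [a - b for a, b in zip(coords[k], coords[j])]
    dot = sum(a * b for a, b in zip(v1, v2))
    cos_theta = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


lengths = [bond_length(xyz, i, j) for i, j in bonds]
thetas = [angle_degrees(xyz, i, j, k) for i, j, k in angles]
print(lengths)
print(thetas)
```

These geometric values are the inputs to the energy terms; the force field supplies the functions and constants that turn them into energies.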

To calculate the energy, we need to define the functional form of bond, angle, dihedral, and nonbonded interactions and the associated constants.

In this demo, we store the parameters for the TraPPE United Atom forcefield with harmonic bonds.

\begin{equation*}
\ U_{total} = \sum_{bonds}{k_{b}(r-r_{0})^2} + \sum_{angles}{k_{\theta} (\theta - \theta_{eq} )^{2}} + \sum_{dihedrals}{\left\{ c_{1}[1 + \cos(\phi)] + c_{2}[1 - \cos(2\phi)] + c_{3}[1 + \cos(3\phi)] \right\}} + \sum_{i=1}^{N-1}{\sum_{j=i+1}^{N}{ 4\epsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^6\right] }}
\end{equation*}
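
The nonbonded term above is written in epsilon/sigma form, while the parameter cell in this demo passes both `"AB"` and `"epsilon/sigma"` as `nb_model`. As a hedged sketch, assuming the conventional mapping $A = 4\epsilon\sigma^{12}$ and $B = 4\epsilon\sigma^{6}$ (EEX's own definition of the "AB" model is not shown in this demo and should be checked against its documentation), the two forms give the same pair energy. The function names and numerical values are illustrative and not part of EEX.

```python
import math


def lj_epsilon_sigma(r, epsilon, sigma):
    """12-6 Lennard-Jones pair energy in epsilon/sigma form."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)


def epsilon_sigma_to_ab(epsilon, sigma):
    """Convert (epsilon, sigma) to (A, B), assuming A = 4*eps*sigma^12, B = 4*eps*sigma^6."""
    return 4.0 * epsilon * sigma ** 12, 4.0 * epsilon * sigma ** 6


def lj_ab(r, a, b):
    """12-6 Lennard-Jones pair energy in A/B form: A / r^12 - B / r^6."""
    return a / r ** 12 - b / r ** 6


# Illustrative values only; these are not TraPPE parameters.
epsilon, sigma, r = 0.2, 3.5, 4.0
a, b = epsilon_sigma_to_ab(epsilon, sigma)

e_es = lj_epsilon_sigma(r, epsilon, sigma)
e_ab = lj_ab(r, a, b)
print(e_es, e_ab, math.isclose(e_es, e_ab, rel_tol=1e-9))
```

Because the two parameterizations describe the same curve, a translator can store one form internally and convert on read or write.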

In [9]:
"""
FORCE FIELD PARAMETERS
"""

# In add_term_parameter, the first argument is the term order: 2 for
# two-body (bond) terms, 3 for angles, and 4 for dihedrals.

dl.add_term_parameter(2, "harmonic", {'K': 300.9, 'R0': 1.540}, uid=0, utype={'K':"kcal * mol **-1 * angstrom ** -2",
                                                                         'R0': "angstrom"})

# Add harmonic angle parameters
dl.add_term_parameter(3, "harmonic", {'K': 62.100, 'theta0': 114}, uid=0, utype={'K':'kcal * mol ** -1 * radian ** -2',
                                                                             'theta0': 'degree'})

# Add OPLS dihedral parameter
dl.add_term_parameter(4, "opls", {'K_1': 1.41103414, 'K_2': -0.27101489, 
                                  'K_3': 3.14502869, 'K_4': 0}, uid=0, utype={'K_1': 'kcal * mol ** -1',
                                                                               'K_2': 'kcal * mol ** -1',
                                                                               'K_3': 'kcal * mol ** -1',
                                                                               'K_4': 'kcal * mol ** -1'})

# Add NB parameters
dl.add_nb_parameter(atom_type=1, atom_type2=1, nb_name="LJ", 
                    nb_model="AB", nb_parameters=[1.0, 1.0])

dl.add_nb_parameter(atom_type=1, atom_type2=2, nb_name="LJ", 
                    nb_model="epsilon/sigma", nb_parameters=[1.0, 1.0])

dl.add_nb_parameter(atom_type=2, atom_type2=2, nb_name="LJ", 
                    nb_model="epsilon/sigma", nb_parameters=[1.0, 1.0])

dl.add_nb_parameter(atom_type=3, atom_type2=1, nb_name="LJ", 
                    nb_model="epsilon/sigma", nb_parameters=[1.0, 1.0])

True

In [10]:
dl.summary()

EEX DataLayer Object

System name: butane
----------------------------------------------
Atom Count:                 4
Bond Count:                 3
Angle Count:                2
Dihedral Count:             1
----------------------------------------------
Number of bond parameters:     1
Number of angle parameters:    1
Number of dihedral parameters: 1
----------------------------------------------


In [11]:
# Evaluate system energy
energy_system1 = dl.evaluate(utype="kcal * mol ** -1")

## Reading from MD input files

Typically, this storage would not be done by hand as shown above. Instead, readers and writers for specific software packages are used.

Below, we use the EEX plugin for AMBER to read in AMBER files describing a butane molecule equivalent to the one created in the first datalayer. The AMBER reader stores all information from the AMBER prmtop and inpcrd files in the datalayer.
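
To give a feel for what a reader plugin has to do, here is a minimal sketch that splits prmtop text into its `%FLAG` sections. It is not the EEX AMBER reader: the function name `split_prmtop_sections` is hypothetical, and splitting on whitespace is a simplification, since the real format is fixed-width as declared by each `%FORMAT` line. The sample text is the beginning of the butane file used in this demo.

```python
def split_prmtop_sections(text):
    """Group the data lines of a prmtop file under their %FLAG names."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("%FLAG"):
            current = line.split()[1]
            sections[current] = []
        elif line.startswith("%"):
            # %VERSION and %FORMAT lines describe layout, not data.
            continue
        elif current is not None:
            sections[current].append(line)
    return sections


sample = """\
%VERSION  VERSION_STAMP = V0001.000  DATE = 11/13/17  15:35:01
%FLAG TITLE
%FORMAT(20a4)
Cpptraj
%FLAG POINTERS
%FORMAT(10I8)
       4       2       0       3       0       2       0       4       0       0
       7       1       3       2       4       2       1       4       2       0
       0       0       0       0       0       0       0       0       4       0
       0
%FLAG ATOM_NAME
%FORMAT(20a4)
C1  C2  C3  C4
"""

sections = split_prmtop_sections(sample)

# Whitespace splitting is a simplification that works for this small sample.
pointers = [int(v) for line in sections["POINTERS"] for v in line.split()]
atom_names = sections["ATOM_NAME"][0].split()

# The first POINTERS entry is the number of atoms (NATOM).
n_atoms = pointers[0]
print(n_atoms, atom_names)
```

A full reader would go on to map each section onto the datalayer's atoms, terms, and parameters.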

In [12]:
# Preview an AMBER prmtop (parameter-topology) file.
butane_file = os.path.join("..", "examples", "amber", "alkanes", "trappe_butane_single_molecule.prmtop")

# A context manager closes the file even if reading fails
with open(butane_file) as f:
    print(f.read())

%VERSION  VERSION_STAMP = V0001.000  DATE = 11/13/17  15:35:01                  
%FLAG TITLE                                                                     
%FORMAT(20a4)                                                                   
Cpptraj                                                                         
%FLAG POINTERS                                                                  
%FORMAT(10I8)                                                                   
       4       2       0       3       0       2       0       4       0       0
       7       1       3       2       4       2       1       4       2       0
       0       0       0       0       0       0       0       0       4       0
       0
%FLAG ATOM_NAME                                                                 
%FORMAT(20a4)                                                                   
C1  C2  C3  C4  
%FLAG CHARGE                                                                    
%F

In [13]:
# Create new datalayer and populate using amber reader

dl_amber = eex.datalayer.DataLayer("butane_amber")
eex.translators.amber.read_amber_file(dl_amber, butane_file)

dl_amber.summary()

EEX DataLayer Object

System name: butane_amber
----------------------------------------------
Atom Count:                 4
Bond Count:                 3
Angle Count:                2
Dihedral Count:             4
----------------------------------------------
Number of bond parameters:     2
Number of angle parameters:    1
Number of dihedral parameters: 4
----------------------------------------------


## Comparing the two datalayers
The summary shows a system with 4 atoms, 3 bonds, 2 angles, and 4 dihedrals. This differs from the first datalayer in the number of dihedrals and dihedral parameters (4 instead of 1 each) and in the number of bond parameters (2 instead of 1). However, the evaluated system energies are equivalent.

The dihedral counts differ because AMBER stores dihedrals using a different functional form. Instead of a single multi-term equation per dihedral, AMBER builds a dihedral with multiple terms from several single-cosine terms, each stored as its own dihedral entry. The two forms are equivalent when evaluated.

\begin{equation*} 
\sum_{dihedrals}{V_{0}[1 + \cos(0\phi)]} + \sum_{dihedrals}{V_{1}[1 + \cos(\phi)]} + \sum_{dihedrals}{V_{2}[1 + \cos(2\phi - \pi)]} + \sum_{dihedrals}{V_{3}[1 + \cos(3\phi)]} = \sum_{dihedrals}{\left\{ c_{0} + c_{1}[1 + \cos(\phi)] + c_{2}[1 - \cos(2\phi)] + c_{3}[1 + \cos(3\phi)] \right\}}
\end{equation*}
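
The identity can be checked numerically. The sketch below is independent of EEX; the function names and coefficient values are illustrative. It relies on $\cos(2\phi - \pi) = -\cos(2\phi)$ and on $1 + \cos(0\phi) = 2$, so it takes $V_0 = c_0/2$ and $V_n = c_n$ for $n = 1, 2, 3$.

```python
import math


def multi_term_form(phi, c0, c1, c2, c3):
    """Single-equation form: c0 + c1[1+cos(phi)] + c2[1-cos(2phi)] + c3[1+cos(3phi)]."""
    return (c0
            + c1 * (1 + math.cos(phi))
            + c2 * (1 - math.cos(2 * phi))
            + c3 * (1 + math.cos(3 * phi)))


def periodic_terms(phi, terms):
    """Sum of V * (1 + cos(n * phi - gamma)) over (V, n, gamma) tuples."""
    return sum(v * (1 + math.cos(n * phi - gamma)) for v, n, gamma in terms)


# Illustrative coefficients, not force field values.
c0, c1, c2, c3 = 0.5, 1.0, -0.25, 3.0

# One single-cosine term per periodicity; the n = 2 term carries a phase of pi.
terms = [(c0 / 2, 0, 0.0), (c1, 1, 0.0), (c2, 2, math.pi), (c3, 3, 0.0)]

# Compare the two forms on a grid of dihedral angles from -pi to pi.
grid = [-math.pi + i * (2 * math.pi / 72) for i in range(73)]
max_difference = max(
    abs(multi_term_form(phi, c0, c1, c2, c3) - periodic_terms(phi, terms))
    for phi in grid
)
print(max_difference)
```

The maximum difference should be at the level of floating-point rounding, which is why one dihedral in the first datalayer can appear as four entries in the AMBER one without changing the energy.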

In [14]:
energy_system2 = dl_amber.evaluate(utype="kcal * mol ** -1")

In [15]:
for k in energy_system1:
    energy_difference = energy_system1[k] - energy_system2[k]
    print(k," difference:\t %.3f" % energy_difference)

two-body  difference:	 -0.000
three-body  difference:	 -0.000
four-body  difference:	 0.000
total  difference:	 -0.000
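
Rather than reading the differences by eye, the comparison can be made explicit with a tolerance. This is a small sketch using only the standard library: the helper name `energies_match` is hypothetical, the dictionaries below are made-up stand-ins with the same keys as the printed output, and the tolerance is an arbitrary choice.

```python
import math


def energies_match(energies_a, energies_b, abs_tol=1e-3):
    """Return True if both dicts have the same keys and all values agree within abs_tol."""
    if energies_a.keys() != energies_b.keys():
        return False
    return all(
        math.isclose(energies_a[key], energies_b[key], abs_tol=abs_tol)
        for key in energies_a
    )


# Made-up numbers for illustration; they are not the energies computed above.
example_a = {"two-body": 3.0, "three-body": 1.5, "four-body": 0.25, "total": 4.75}
example_b = {"two-body": 3.0002, "three-body": 1.5, "four-body": 0.2499, "total": 4.7501}

print(energies_match(example_a, example_b))
```

In the notebook, the same helper could be called on `energy_system1` and `energy_system2`, assuming `evaluate` returns dictionaries with matching keys as the loop above suggests.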
