# Anatomy of an MD simulation script file
Recall that many packages are available for performing MD simulations, often with different features and designed to support different system types. A few of the common codes include:
* [LAMMPS](http://LAMMPS.sandia.gov)
* [GROMACS](http://gromacs.org)
* [HOOMD-Blue](http://glotzerlab.engin.umich.edu/hoomd-blue/)
* [NAMD](https://www-s.ks.uiuc.edu/Research/namd/)
* [DL_POLY](https://www.scd.stfc.ac.uk/Pages/DL_POLY.aspx)
* [CHARMM](https://www.charmm.org)
* [AMBER](https://ambermd.org)
* [OpenMM](https://openmm.org)
* [NWChem](https://www.nwchem-sw.org)
* [BOSS](http://zarbi.chem.yale.edu/software.html)
* [Tinker](https://tinkertools.org)
     
### Basic components of most script/control files
In general, MD simulation engines require the same basic information to be passed to the code, even if the format varies.  We can roughly break these inputs up into four main categories.

#### system initialization 
>definition of the box size, the number of particles, particle types, initial particle positions, which particles are bonded to each other, the periodicity, and other system context information. Typically, this involves providing the name of a data file that defines the system configuration.

#### interaction definition 
>the functional form and associated parameters describing how different species interact with each other, as well as cell/neighbor-list information.

#### integrator setup
>the algorithm used to advance particles in time, the integration time step, and the thermodynamic state point (e.g., T and/or P).

#### runtime parameters 
>total simulation time, frequency of writing thermodynamic quantities and/or system configuration data, etc.
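To make the four categories concrete, the sketch below groups them into plain Python dataclasses, filled in with the values used in the HOOMD-Blue LJ example in this notebook. This is a hypothetical, engine-agnostic container: the class and field names are our own and are not part of any MD engine's API.

```python
from dataclasses import dataclass

# Hypothetical, engine-agnostic containers for the four input categories.
# Field names are illustrative only; no MD engine uses this exact layout.

@dataclass
class SystemInit:
    data_file: str                        # configuration: box, positions, types, bonds
    periodic: tuple = (True, True, True)  # periodicity in x, y, z

@dataclass
class Interactions:
    pair_style: str       # functional form, e.g. "lj"
    epsilon: float        # LJ energy scale
    sigma: float          # LJ length scale
    r_cut: float          # pair cutoff distance
    nlist_buffer: float   # neighbor-list buffer (skin) distance

@dataclass
class IntegratorSetup:
    method: str           # e.g. "nvt"
    dt: float             # integration time step
    kT: float             # thermostat set point
    tau: float            # thermostat coupling time constant

@dataclass
class RuntimeParams:
    n_steps: int          # total number of time steps
    thermo_period: int    # steps between thermodynamic outputs
    traj_period: int      # steps between configuration outputs

@dataclass
class SimulationInputs:
    system: SystemInit
    interactions: Interactions
    integrator: IntegratorSetup
    runtime: RuntimeParams

# Values mirror the HOOMD-Blue LJ example in this notebook.
lj_inputs = SimulationInputs(
    system=SystemInit(data_file="datafiles/lj.gsd"),
    interactions=Interactions("lj", epsilon=1.0, sigma=1.0, r_cut=2.5, nlist_buffer=1.0),
    integrator=IntegratorSetup("nvt", dt=0.005, kT=1.5, tau=1.0),
    runtime=RuntimeParams(n_steps=50000, thermo_period=100, traj_period=5000),
)
print(lj_inputs.integrator)
```

Every engine needs essentially this same information; what differs is the syntax used to express it.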



### Challenges
In general, each simulation engine employs its own syntax and file format for defining the inputs to the code. While there is a growing trend toward providing Python interfaces, there is still no uniform syntax or API for initializing simulations.

For example, LAMMPS uses its own custom scripting language to define the simulation inputs, as well as its own file format for defining the configuration of the system.

HOOMD-Blue and OpenMM, on the other hand, allow users to define the inputs and interact with the software using Python. However, despite both codes using Python, their APIs are not interchangeable; that is, a HOOMD-Blue Python script cannot be used to set up and perform an OpenMM simulation. The two codes also use different file formats for defining the system configuration.

Tools such as GROMACS and NAMD take a different approach altogether, using control files (files that define keys and their associated values) rather than scripts to specify simulation run parameters and methods.
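As an illustration of the key/value style, the sketch below parses text in the style of a GROMACS `.mdp` file, where each line has the form `key = value` and `;` starts a comment. The function `parse_mdp` is hypothetical and deliberately minimal: it does no type conversion or validation, and the example fragment is only illustrative, not a complete or valid GROMACS run setup.

```python
def parse_mdp(text: str) -> dict:
    """Parse .mdp-style 'key = value' lines into a dict of strings.

    Minimal sketch: anything after ';' is treated as a comment, blank lines
    are skipped, and values are kept as strings (no type conversion).
    """
    params = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a 'key = value' line: {raw!r}")
        params[key.strip()] = value.strip()
    return params

example = """
; run control
integrator = md
dt         = 0.002   ; ps
nsteps     = 50000
; temperature coupling
tcoupl     = v-rescale
ref_t      = 300
"""
print(parse_mdp(example))
```

Unlike a script, such a file has no control flow: it is simply a set of settings that the engine reads and interprets.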

OpenMM is somewhat unique in that it provides routines to read not only its own file format, but also AMBER, GROMACS, and CHARMM data files that contain configurations and force field information. However, a user must still write their own script (there is no support for automatically converting, e.g., a GROMACS .mdp control file).

Overall, this means that switching between simulation engines is often non-trivial, requiring users to generate new data files and scripts for each code they wish to use. As will be discussed later, the [MoSDeF](https://mosdef.org) toolkit has been developed to help address some of these issues, making it easier to generate parameterized data files for different simulation engines and helping to ensure that such files are accurate.
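To see why a translation step is needed, the hypothetical sketch below takes a plain dictionary of engine-agnostic parameters and renders it as LAMMPS commands (the same commands used in the LAMMPS example in this notebook). The dictionary keys and the `to_lammps` function are our own invention for illustration; this is not how MoSDeF or any engine works internally. Note that in LAMMPS `lj` units the Boltzmann constant is 1, so the temperature passed to `fix nvt` equals kT.

```python
def to_lammps(p: dict) -> str:
    """Render an engine-agnostic parameter dict as LAMMPS input commands.

    Hypothetical helper: dict keys are illustrative; the emitted commands
    mirror the LAMMPS LJ example (lj units, so T == kT).
    """
    lines = [
        "units lj",
        "atom_style full",
        "boundary p p p",
        f"read_data {p['data_file']}",
        f"pair_style lj/cut {p['r_cut']}",
        f"pair_coeff * * {p['epsilon']} {p['sigma']}",
        f"neighbor {p['nlist_buffer']} bin",
        f"fix 1 all nvt temp {p['kT']} {p['kT']} {p['tau']}",
        f"timestep {p['dt']}",
        f"thermo {p['thermo_period']}",
        f"run {p['n_steps']}",
    ]
    return "\n".join(lines)

params = dict(data_file="datafiles/data.lj.txt", r_cut=2.5, epsilon=1.0, sigma=1.0,
              nlist_buffer=1.0, kT=1.5, tau=1.0, dt=0.005,
              thermo_period=100, n_steps=50000)
print(to_lammps(params))
```

Supporting a second engine would require a second renderer with entirely different syntax (and, for HOOMD-Blue, Python API calls rather than text), which is exactly the duplication described above.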



## Example HOOMD-Blue script file
The code below is an example Python script for HOOMD-Blue that performs a basic LJ simulation. This example is adapted from the tutorial in the [HOOMD-Blue documentation](https://hoomd-blue.readthedocs.io/en/v3.2.0/tutorial/01-Introducing-Molecular-Dynamics/01-Molecular-Dynamics-Simulations.html), which provides a detailed examination of the script's structure. Note that the simulation configuration is defined in the file 'lj.gsd', which was created using the [mBuild library](https://mbuild.mosdef.org); mBuild will be discussed in more detail later in the cybercamp.

Because HOOMD-Blue uses Python and is readily available via conda, we can run this script directly from the notebook.

In [None]:
import hoomd
import gsd.hoomd


cpu = hoomd.device.CPU()
sim = hoomd.Simulation(device=cpu, seed=1)

#########################
# system initialization
#########################

sim.create_state_from_gsd(filename='datafiles/lj.gsd')

#########################
# interaction definition
#########################

cell = hoomd.md.nlist.Cell(buffer=1.0)
lj = hoomd.md.pair.LJ(nlist=cell)
lj.params[('LJ', 'LJ')] = dict(epsilon=1, sigma=1)
lj.r_cut[('LJ', 'LJ')] = 2.5

#########################
# integrator setup
#########################

nvt = hoomd.md.methods.NVT(kT=1.5, filter=hoomd.filter.All(), tau=1.0)
integrator = hoomd.md.Integrator(dt=0.005)
integrator.forces.append(lj)
integrator.methods.append(nvt)
sim.operations.integrator = integrator

#########################
# runtime parameters
#########################

log_traj = hoomd.logging.Logger()
gsd_traj = hoomd.write.GSD(filename='trajectory.gsd',
                             trigger=hoomd.trigger.Periodic(5000),
                             mode='wb',
                             filter=hoomd.filter.All(),
                             log=log_traj)
sim.operations.writers.append(gsd_traj)

thermodynamic_properties = hoomd.md.compute.ThermodynamicQuantities(filter=hoomd.filter.All())
sim.operations.computes.append(thermodynamic_properties)

log_thermo = hoomd.logging.Logger()
log_thermo.add(thermodynamic_properties)
gsd_thermo = hoomd.write.GSD(filename='thermo.gsd',
                             trigger=hoomd.trigger.Periodic(100),
                             mode='wb',
                             filter=hoomd.filter.Null(),
                             log=log_thermo)
sim.operations.writers.append(gsd_thermo)


sim.run(50000)

hoomd.write.GSD.write(state=sim.state, filename='final.gsd', mode='wb')
print("simulation complete")

# Delete the objects we created to ensure the GSD writers are flushed and closed
del sim, gsd_thermo, gsd_traj, thermodynamic_properties, log_thermo, log_traj
del integrator, nvt, lj, cell, cpu

With the simulation complete, we can examine the thermodynamic properties that were saved to the thermo.gsd file. Below, we loop over each frame of this file to create two lists:
* `time`, which contains the simulation timestep
* `pe`, which contains the system potential energy

In [None]:
thermo_log = gsd.hoomd.open('thermo.gsd', 'rb')

time = []
pe = []
for frame in thermo_log:
    time.append(frame.configuration.step)
    pe.append(frame.log['md/compute/ThermodynamicQuantities/potential_energy'][0])


In [None]:
import matplotlib
import matplotlib.pyplot as plt
matplotlib.style.use('default')


In [None]:
plt.plot(time, pe)
plt.ylabel('potential energy')
plt.xlabel('timestep')

plt.show()

We can quickly render a short movie of the trajectory using the Fresnel package (via the `render_movie` helper imported from `fresnel_render`).

In [None]:
from fresnel_render import render
from fresnel_render import render_movie

traj_log = gsd.hoomd.open('trajectory.gsd', 'rb')

render_movie(traj_log)

## Example LAMMPS input files

Below is an example LAMMPS input script for performing a simple Lennard-Jones simulation. The commands associated with each of the four main categories are labeled. This script performs effectively the same simulation as the HOOMD-Blue example. For more information on LAMMPS input scripts, see the official [documentation](https://docs.lammps.org/Commands_structure.html).


```
#########################
# system initialization
#########################

units lj
atom_style full
dimension 3
boundary p p p
read_data datafiles/data.lj.txt

#########################
# interaction definition
#########################

pair_style lj/cut 2.5
pair_coeff * * 1.0 1.0
neighbor 1.0 bin
neigh_modify  check yes

#########################
# integrator setup
#########################

fix 1 all nvt temp 1.5 1.5 1.0
timestep 0.005

#########################
# runtime parameters
#########################

thermo 100
dump dump_traj all custom 5000 trajectory.lammpstrj id type  x  y  z
run 50000

write_restart final.restart
```

The script above loads the system configuration (particle coordinates, box size, bonded connections, etc.) from an external file, here 'data.lj.txt'. Rules for formatting the data file can be found in the [LAMMPS documentation for the read_data command](https://docs.lammps.org/read_data.html). The first 30 lines of the configuration file are shown below. Note that this file was generated using the [mBuild library](http://mbuild.mosdef.org), which will be discussed in detail later in the cybercamp.

In [None]:
!cat datafiles/data.lj.txt | head -n 30
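To illustrate the data-file layout described in the `read_data` documentation, the sketch below writes a minimal file for LJ particles on a simple cubic lattice. It assumes `atom_style full` (columns: atom-ID, molecule-ID, atom-type, charge, x, y, z), a single atom type of mass 1.0, no bonds, and zero charges. The function `lj_lattice_data` is hypothetical and is not how the mBuild-generated file above was produced.

```python
def lj_lattice_data(n_per_side: int, spacing: float) -> str:
    """Return LAMMPS data-file text (atom_style full) for n_per_side**3
    particles on a simple cubic lattice inside a cubic periodic box."""
    n_atoms = n_per_side ** 3
    box = n_per_side * spacing
    lines = [
        "Simple cubic LJ lattice (hypothetical example)",  # first line is a comment
        "",
        f"{n_atoms} atoms",
        "1 atom types",
        "",
        f"0.0 {box:.4f} xlo xhi",
        f"0.0 {box:.4f} ylo yhi",
        f"0.0 {box:.4f} zlo zhi",
        "",
        "Masses",
        "",
        "1 1.0",
        "",
        "Atoms # full",
        "",
    ]
    atom_id = 1
    for i in range(n_per_side):
        for j in range(n_per_side):
            for k in range(n_per_side):
                x, y, z = ((n + 0.5) * spacing for n in (i, j, k))
                # atom-ID molecule-ID atom-type charge x y z (atom_style full order)
                lines.append(f"{atom_id} {atom_id} 1 0.0 {x:.4f} {y:.4f} {z:.4f}")
                atom_id += 1
    return "\n".join(lines) + "\n"

text = lj_lattice_data(n_per_side=3, spacing=1.2)
print("\n".join(text.splitlines()[:20]))
```

Writing `text` to a file (e.g., with `open(..., "w")`) would produce something that `read_data` can load, although a real system would normally also need velocities and an equilibrated configuration.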

# Exercises
As we saw in the example of the falling ball, if we use too large a timestep (dt), we miss important details. In the falling-ball example, we were not able to precisely identify when the ball contacts the ground. The same type of issue can occur in an MD simulation if the timestep is too large; typically, particles move too close together, resulting in extremely large forces that cause the simulation to fail (i.e., the system "blows up"). The code below plots the LJ interaction as a function of distance (showing both the energy, `U`, and the force, `F`); the force grows very rapidly as the separation decreases below sigma (here, $\sigma = 1$).
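For reference, the force plotted below is the negative derivative of the LJ potential. Differentiating term by term gives the expression used in the code:

$$U(r) = 4\epsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]$$

$$F(r) = -\frac{dU}{dr} = \frac{24\epsilon}{r}\left[2\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right] = \frac{48\epsilon}{\sigma}\left[\left(\frac{\sigma}{r}\right)^{13} - \frac{1}{2}\left(\frac{\sigma}{r}\right)^{7}\right]$$

The force vanishes at the potential minimum, $r = 2^{1/6}\sigma \approx 1.12\sigma$. With $\epsilon = \sigma = 1$, $F(\sigma) = 24$, while $F(0.8\sigma) = 48\left(1.25^{13} - \tfrac{1}{2}\,1.25^{7}\right) \approx 759$: reducing the separation by only 20% increases the repulsive force roughly 30-fold.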

In [None]:
import numpy as np
import matplotlib.pyplot as plt

epsilon = 1.0
sigma = 1.0

r_min = 0.6
r_max = 3.0
steps = 1000

# Evenly spaced separations from r_min up to (but not including) r_max
r = np.linspace(r_min, r_max, steps, endpoint=False)

# LJ potential energy U(r) and the corresponding force F(r) = -dU/dr
U = 4*epsilon*((sigma/r)**12 - (sigma/r)**6)
F = 48*(epsilon/sigma)*((sigma/r)**13 - 0.5*(sigma/r)**7)

plt.plot(r, U, c='blue', label='U(r)')
plt.plot(r, F, c='red', label='F(r)')

plt.ylabel('U(r) or F(r)')
plt.xlabel('r')
plt.ylim(-3, 10)
plt.legend(loc='upper right')

plt.show()

## Exercise 1

Modify the timestep in the code below (e.g., increase it from 0.005 to 0.0075, 0.01, 0.0125, 0.015, etc.). At what value do you observe a failure? While there are other factors to consider when setting the timestep, ensuring that the simulation doesn't blow up is the minimum criterion that should be applied.


In [None]:
import hoomd
import gsd.hoomd


cpu = hoomd.device.CPU()
sim = hoomd.Simulation(device=cpu, seed=1)

#########################
# system initialization
#########################

sim.create_state_from_gsd(filename='datafiles/lj.gsd')

#########################
# interaction definition
#########################

cell = hoomd.md.nlist.Cell(buffer=1.0)
lj = hoomd.md.pair.LJ(nlist=cell)
lj.params[('LJ', 'LJ')] = dict(epsilon=1, sigma=1)
lj.r_cut[('LJ', 'LJ')] = 2.5

#########################
# integrator setup
#########################

nvt = hoomd.md.methods.NVT(kT=1.5, filter=hoomd.filter.All(), tau=1.0)
integrator = hoomd.md.Integrator(dt=0.005)
integrator.forces.append(lj)
integrator.methods.append(nvt)
sim.operations.integrator = integrator

#########################
# runtime parameters
#########################

log_traj = hoomd.logging.Logger()
gsd_traj = hoomd.write.GSD(filename='trajectory.gsd',
                             trigger=hoomd.trigger.Periodic(5000),
                             mode='wb',
                             filter=hoomd.filter.All(),
                             log=log_traj)
sim.operations.writers.append(gsd_traj)

thermodynamic_properties = hoomd.md.compute.ThermodynamicQuantities(filter=hoomd.filter.All())
sim.operations.computes.append(thermodynamic_properties)

log_thermo = hoomd.logging.Logger()
log_thermo.add(thermodynamic_properties)
gsd_thermo = hoomd.write.GSD(filename='thermo.gsd',
                             trigger=hoomd.trigger.Periodic(100),
                             mode='wb',
                             filter=hoomd.filter.Null(),
                             log=log_thermo)
sim.operations.writers.append(gsd_thermo)


sim.run(10000)
print("simulation complete")
# Delete the objects we created to ensure the GSD writers are flushed and closed
del sim, gsd_thermo, gsd_traj, thermodynamic_properties, log_thermo, log_traj
del integrator, nvt, lj, cell, cpu
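To see why a too-large timestep leads to a blow-up rather than just a small loss of accuracy, consider a toy model instead of the LJ fluid: a single 1D harmonic oscillator (mass 1, angular frequency ω) integrated with velocity Verlet. Velocity Verlet belongs to the same family of integrators used in MD codes, but this is not HOOMD-Blue's actual implementation. For this model the scheme is stable only when ωΔt < 2; beyond that, the energy grows exponentially. The function `verlet_energy_drift` and its parameters are hypothetical, chosen for illustration.

```python
def verlet_energy_drift(dt: float, n_steps: int = 1000, omega: float = 1.0) -> float:
    """Integrate a 1D harmonic oscillator (m = 1, k = omega**2) with velocity
    Verlet and return the largest relative deviation of the total energy."""
    k = omega * omega
    x, v = 1.0, 0.0
    e0 = 0.5 * v * v + 0.5 * k * x * x
    a = -k * x
    max_dev = 0.0
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * a      # half-step velocity update
        x = x + dt * v_half            # full-step position update
        a = -k * x                     # force at the new position
        v = v_half + 0.5 * dt * a      # second half-step velocity update
        e = 0.5 * v * v + 0.5 * k * x * x
        max_dev = max(max_dev, abs(e - e0) / e0)
        if max_dev > 1e12:             # treat this as "blown up" and stop early
            break
    return max_dev

for dt in [0.1, 0.5, 1.0, 1.9, 2.1]:
    print(f"dt = {dt:4.1f}  max relative energy error = {verlet_energy_drift(dt):.3e}")
```

Below the threshold the energy error stays bounded (and shrinks roughly as Δt²); just above it, the error explodes within a few dozen steps. In an LJ system the relevant "frequency" is set by the steep repulsive wall, which is why the stable timestep in LJ units is far smaller than this toy threshold suggests.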

## Exercise 2

Using the same code you used to adjust the timestep, reset the timestep to 0.005. Change the temperature of the simulation to 0.8 and plot the potential energy (using the code provided below). Based on the potential energy, was the simulation run time sufficient to reach a steady state?


Compare the mean potential energy at the end of the simulation to that of the T = 1.5 simulation. What do you think causes the difference?

Render the movie of the trajectory below. How does it visually compare to the T = 1.5 simulation examined above?

In [None]:
thermo_log = gsd.hoomd.open('thermo.gsd', 'rb')

time = []
pe = []
for frame in thermo_log:
    time.append(frame.configuration.step)
    pe.append(frame.log['md/compute/ThermodynamicQuantities/potential_energy'][0])


In [None]:
plt.plot(time, pe)
plt.ylabel('potential energy')
plt.xlabel('timestep')

plt.show()
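To put numbers on the steady-state and mean-energy questions, the sketch below computes the mean and standard deviation over the last part of a potential-energy series and applies a crude steady-state check (comparing the means of the last two quarters). The function names and the 1% tolerance are hypothetical choices, and this heuristic is not a rigorous equilibration test. It is demonstrated on synthetic relaxation data; in the notebook you could pass the `pe` list built above instead.

```python
import numpy as np

def tail_statistics(pe, fraction=0.5):
    """Mean and standard deviation of pe over the last `fraction` of the samples."""
    pe = np.asarray(pe, dtype=float)
    start = int(len(pe) * (1.0 - fraction))
    tail = pe[start:]
    return tail.mean(), tail.std()

def looks_steady(pe, tol=0.01):
    """Crude check: do the means of the last two quarters agree to within
    `tol` relative to the mean of the second half? (Heuristic only.)"""
    pe = np.asarray(pe, dtype=float)
    n = len(pe)
    q3 = pe[n // 2: 3 * n // 4]
    q4 = pe[3 * n // 4:]
    ref = abs(pe[n // 2:].mean())
    return bool(abs(q4.mean() - q3.mean()) < tol * ref)

# Synthetic data: exponential relaxation toward -5 plus small noise.
rng = np.random.default_rng(0)
steps = np.arange(0, 50000, 100)
pe_synthetic = -5.0 + 3.0 * np.exp(-steps / 2000) + 0.01 * rng.standard_normal(len(steps))

mean, std = tail_statistics(pe_synthetic)
print(f"tail mean = {mean:.3f}, tail std = {std:.3f}")
print("full run steady?", looks_steady(pe_synthetic))
print("first 40 samples steady?", looks_steady(pe_synthetic[:40]))
```

Truncating the synthetic series while it is still relaxing makes the check fail, which mirrors what you would see if a simulation were too short to reach steady state.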

In [None]:
traj_log = gsd.hoomd.open('trajectory.gsd', 'rb')

render_movie(traj_log)