# Performing a molecular dynamics simulation with OpenMM

In this lab, we will perform a molecular dynamics simulation of a protein-ligand complex prepared in a previous lab.

This lab was adapted from the end of [talktorial 19 of teachopencadd](https://github.com/volkamerlab/teachopencadd/tree/master/teachopencadd/talktorials/T019_md_simulation) to simulate MPro instead of EGFR.

### References

- Review on the impact of MD simulations in drug discovery ([_J Med Chem_ (2016), **59**(9), 4035‐4061](https://doi.org/10.1021/acs.jmedchem.5b01684))
- Review on the physics behind MD simulations and best practices ([_Living J Comp Mol Sci_ (2019), **1**(1), 5957](https://doi.org/10.33011/livecoms.1.1.5957))
- Review on force fields ([_J Chem Inf Model_ (2018), **58**(3), 565-578](https://doi.org/10.1021/acs.jcim.8b00042))
- Review on EGFR in cancer ([_Cancers (Basel)_ (2017), **9**(5), 52](https://dx.doi.org/10.3390%2Fcancers9050052))
- Summarized statistical knowledge from Pierre-Simon Laplace ([Théorie Analytique des Probabilités, _Gauthier-Villars_ (1820), **3**](https://archive.org/details/uvrescompltesde31fragoog/page/n15/mode/2up))
- Inspired by a notebook from Jaime Rodríguez-Guerra ([github](https://github.com/jaimergp/uab-msc-bioinf/blob/master/MD%20Simulation%20and%20Analysis%20in%20a%20Notebook.ipynb))
- Repositories of [OpenMM](https://github.com/openmm/openmm) and [OpenMM Forcefields](https://github.com/openmm/openmmforcefields), [RDKit](https://github.com/rdkit/rdkit), [PyPDB](https://github.com/williamgilpin/pypdb), [MDTraj](https://github.com/mdtraj/mdtraj), [PDBFixer](https://github.com/openmm/pdbfixer)
- Wikipedia articles about [MD simulations](https://en.wikipedia.org/wiki/Molecular_dynamics), [AMBER](https://en.wikipedia.org/wiki/AMBER) and [force fields](https://en.wikipedia.org/wiki/Force_field_(chemistry)) in general

## Theory

### Molecular dynamics

Molecular dynamics is a computational method analyzing the movements and interactions of atoms and molecules of a defined system. The method stems from theoretical physics, where it was developed in the 1950s (Alder and Wainwright in [_J Chem Phys_ (1959), **31**(2), 459](https://doi.org/10.1063/1.1730376)), although the ideas behind it can be dated much earlier:

> An intelligence which could, at any moment, comprehend all the forces by which nature is animated and the respective positions of the beings of which it is composed, and moreover, if this intelligence were far-reaching enough to subject these data to analysis, it would encompass in that formula both the movements of the largest bodies in the universe and those of the lightest atom: to it nothing would be uncertain, and the future, as well as the past, would be present to its eyes. The human mind offers us, in the perfection which it has given to astronomy, a faint sketch of this intelligence. (Pierre-Simon Laplace, 1820)


Let us take this statement by Laplace as the ideological substrate underneath molecular dynamics simulations. In other terms, we can approximate the behavior of a physical system by knowing the characteristics of its components and applying Newton's laws of motion. By solving the equations of motion, we can obtain a molecular trajectory of the system, which is a series of snapshots with the positions and velocities of all its particles, as well as its potential energy. To do so, we define functions, called force fields, which provide an approximate description of all the forces applied to each particle in the system. We then use numerical integrators to solve the initial value problem for the system and obtain the trajectory. As this suggests, the process requires considerable processing power, and it was only a few years ago that MD started seeing more widespread use, especially in the fields of computational chemistry and biology, as well as in drug discovery ([_J Med Chem_ (2016), **59**(9), 4035‐4061](https://doi.org/10.1021/acs.jmedchem.5b01684)).
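
To make the idea of numerically integrating Newton's equations concrete, here is a minimal sketch that applies the velocity Verlet scheme to a one-dimensional harmonic oscillator. The toy system, the function names, and all parameter values are illustrative assumptions. OpenMM's integrators act on full force fields, and the Langevin integrator used later in this lab additionally applies friction and random forces.

```python
import math

def harmonic_force(x, k=1.0):
    """Force of a 1D harmonic spring, F = -k * x."""
    return -k * x

def velocity_verlet(x, v, mass, dt, n_steps, force=harmonic_force):
    """Integrate Newton's equations of motion with the velocity Verlet scheme.

    Returns the trajectory as a list of (position, velocity) snapshots.
    """
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # forces at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory

def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the oscillator."""
    return 0.5 * mass * v**2 + 0.5 * k * x**2

# Toy run: unit mass and spring constant, starting at x = 1 at rest
trajectory = velocity_verlet(x=1.0, v=0.0, mass=1.0, dt=0.01, n_steps=1000)
e_start = total_energy(*trajectory[0])
e_end = total_energy(*trajectory[-1])
print(f"energy drift over {len(trajectory) - 1} steps: {abs(e_end - e_start):.2e}")
```

For this toy system the exact solution is x(t) = cos(t), so the final position can be compared with `math.cos(10.0)`. The small energy drift reflects the good energy conservation of this family of integrators when the time step is small relative to the fastest motion.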

![MD_rotor_250K_1ns.gif](https://github.com/volkamerlab/teachopencadd/raw/d1ded86bb2c82ef088cc5145d0bcb997f6eab7dd/teachopencadd/talktorials/018_md_simulation/images/MD_rotor_250K_1ns.gif)

**Figure 1**: Molecular dynamics simulation of the rotation of a supramolecule composed of three molecules in a confined nanoscopic pore (Palma et al. via [Wikimedia](https://commons.wikimedia.org/w/index.php?curid=34866205)).

### MD simulations and drug design

MD simulations give valuable insights into the highly dynamic process of a ligand binding to its target. When a ligand (or a drug) approaches a macromolecule (protein) in solution, it encounters a structure in constant motion. In addition, ligands may induce conformational changes in the macromolecule that better accommodate the small molecule. Such conformations may not be discovered with static methods. Accordingly, binding sites that are not observed in static ligand-free structures, but can be discovered with MD simulations, are sometimes called *cryptic binding sites* ([_J Med Chem_ (2016), **59**(9), 4035‐4061](https://doi.org/10.1021/acs.jmedchem.5b01684)). The identification of such binding sites with MD simulations can kickstart new drug discovery campaigns. Later in the drug discovery process, MD simulations can also be used to assess computationally identified small molecules before performing more costly and time-intensive *in vitro* tests. Altogether, MD simulations are a valuable asset in computational drug design.

In this lab, we use the PDB structure **7VH8** of MPro, which is in complex with the small molecule inhibitor **nirmatrelvir**, to perform an MD simulation ([PDB: 7VH8](https://www.rcsb.org/structure/7vh8)).

## Practical

We will now perform an MD simulation using the molecular dynamics engine [OpenMM](https://github.com/openmm/openmm), a high-performance toolkit for molecular simulation. It is open source and can be used as an application or as a library. We will use it as a Python library.

### Installation on Google Colab

The following code cells will install all required packages if you are working on [Google Colab](https://colab.research.google.com/notebooks/intro.ipynb). Installing the [condacolab](https://github.com/jaimergp/condacolab) package will restart the kernel, which is intended. This notebook can also be run on a local computer, but the full simulation requires considerable computing power.

In [1]:
try:
    import google.colab
    !pip install condacolab
    import condacolab
    condacolab.install()
except ModuleNotFoundError:
    pass

Collecting condacolab
  Downloading condacolab-0.1.3-py3-none-any.whl (6.8 kB)
Installing collected packages: condacolab
Successfully installed condacolab-0.1.3
⏬ Downloading https://github.com/jaimergp/miniforge/releases/latest/download/Mambaforge-colab-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:28
🔁 Restarting kernel...


In [11]:
try:
    import condacolab
    from google.colab import files
    from IPython.display import clear_output
    condacolab.check()
    !conda install -q -y -c conda-forge openmm mdtraj
    # !conda install -q -y -c conda-forge mdtraj openmm openmmforcefields openff-toolkit pdbfixer pypdb rdkit
except ModuleNotFoundError:
    on_colab = False
else:
    on_colab = True
    # check if the installation was successful
    try:
        import openmm
        clear_output()  # clear the excessive installation outputs
        print("Dependencies successfully installed!")
    except ModuleNotFoundError:
        print("Error while installing dependencies!")

Dependencies successfully installed!


Next, let's make sure the workshop repository is on your Google Drive and up to date.

In [3]:
from google.colab import drive
drive.mount('/content/drive')

GitHub_dir = '/content/drive/MyDrive/GitHub'
import os
if not os.path.isdir(GitHub_dir):
  !mkdir -p {GitHub_dir}

os.chdir(GitHub_dir)
if not os.path.isdir(os.path.join(GitHub_dir,'modelingworkshop')):
  !git clone https://github.com/CCBatIIT/modelingworkshop
else:
  os.chdir(os.path.join(GitHub_dir,'modelingworkshop'))
  !git pull origin main

Mounted at /content/drive
From https://github.com/CCBatIIT/modelingworkshop
 * branch            main       -> FETCH_HEAD
Already up to date.


### Import dependencies

In [4]:
import os
import time

labs_dir = '/content/drive/MyDrive/GitHub/modelingworkshop/labs-complete'
lab_dir = os.path.join(labs_dir, '4-1')
if not os.path.isdir(lab_dir):
  os.mkdir(lab_dir)
os.chdir(lab_dir)


In [2]:
import openmm as mm
import openmm.app as app
from openmm import unit

#### System

In a previous lab, we created an [OpenMM System](http://docs.openmm.org/development/api-python/generated/openmm.openmm.System.html#openmm.openmm.System) and set up the simulation. Here we will load it and perform a molecular dynamics simulation.

In [8]:
xml_FN = os.path.join(labs_dir, '3-1', 'MPro-nirmatrelvir-solv.xml')
pdb_FN = os.path.join(labs_dir, '3-1', 'MPro-nirmatrelvir-solv.pdb')

# Deserialize the OpenMM System saved in the previous lab
with open(xml_FN, 'r') as xml_file:
    system = mm.XmlSerializer.deserialize(xml_file.read())
pdb_reader = app.PDBFile(pdb_FN)

integrator = mm.LangevinIntegrator(
    300 * unit.kelvin, 1.0 / unit.picoseconds, 2.0 * unit.femtoseconds
)
simulation = app.Simulation(pdb_reader.getTopology(), system, integrator)
simulation.context.setPositions(pdb_reader.getPositions())

### Perform the MD simulation
Now that everything is set up, we can perform the simulation. The starting positions were set when the simulation was created. We first minimize the energy of the system to obtain a low-energy starting configuration, which reduces the chance of simulation failures due to severe atom clashes. The energy-minimized system is then saved as a PDB file.

In [9]:
simulation.minimizeEnergy()
with open(os.path.join(lab_dir, 'MPro-nirmatrelvir-solv.pdb'), "w") as pdb_file:
    app.PDBFile.writeFile(
        simulation.topology,
        simulation.context.getState(getPositions=True, enforcePeriodicBox=True).getPositions(),
        file=pdb_file,
        keepIds=True,
    )

Once the minimization has finished, we can perform the MD simulation. In this lab, we will run a short simulation for illustration. Simulations for research purposes typically span nanoseconds to microseconds. We will simulate only 100 ps of molecular dynamics, corresponding to 50,000 steps of 2 fs each. We save molecular "snapshots" every 10 ps (5000 steps), for a total of 10 frames. The results are saved in an .xtc file, which contains the coordinates of all the atoms at each saved time point. Together with the PDB file of the energy-minimized system written before, it gives us all the information needed for later analysis.

**Note**: If you are not on Google Colab, this lab will only generate a 20 fs trajectory. However, if you have a good GPU available, you can increase the simulation time.
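
The relationship between step count, time step, and saved frames can be checked with a few lines of arithmetic. The helper name `trajectory_summary` is a hypothetical convenience for this illustration. The numbers are the ones used in this lab.

```python
def trajectory_summary(steps, write_interval, timestep_fs=2.0):
    """Return (simulated time in ps, frame spacing in ps, number of frames)."""
    simulated_ps = steps * timestep_fs / 1000          # 1 ps = 1000 fs
    frame_spacing_ps = write_interval * timestep_fs / 1000
    n_frames = steps // write_interval
    return simulated_ps, frame_spacing_ps, n_frames

colab_run = trajectory_summary(steps=50_000, write_interval=5_000)
local_run = trajectory_summary(steps=10, write_interval=1)
print(colab_run)  # (100.0, 10.0, 10)
print(local_run)  # (0.02, 0.002, 10)
```

Both settings produce 10 frames. Only the time between frames, and therefore the total simulated time, differs.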

In [14]:
# output settings
if on_colab:
    steps = 50000  # corresponds to 100 ps
    write_interval = 5000  # write every 10 ps
    log_interval = 2500  # log progress to stdout every 5 ps
else:
    steps = 10  # corresponds to 20 fs
    write_interval = 1  # write every 2 fs
    log_interval = 1  # log progress to stdout every 2 fs

import sys  # needed for sys.stdout in the StateDataReporter below

import mdtraj as md
simulation.reporters.append(
    md.reporters.XTCReporter(os.path.join(lab_dir, 'MPro-nirmatrelvir-solv.xtc'), 
        reportInterval=write_interval)
)

simulation.reporters.append(
    app.StateDataReporter(
        sys.stdout,
        log_interval,
        step=True,
        potentialEnergy=True,
        temperature=True,
        progress=True,
        remainingTime=True,
        speed=True,
        totalSteps=steps,
        separator="\t",
    )
)

The velocities of all particles in the system are drawn randomly from a distribution at the given temperature. We chose 300 K, which is slightly above typical room temperature.
A random seed is generated automatically, but it can be given explicitly to reproduce results.
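
As a rough illustration of what drawing velocities "at a given temperature" means, the sketch below samples each Cartesian velocity component from a normal distribution with standard deviation sqrt(kB·T/m) and shows the effect of fixing the seed. It uses only the Python standard library. The function name `sample_velocities` and the particle masses are assumptions for this example, and this is not OpenMM's implementation.

```python
import math
import random

K_B = 1.380649e-23        # Boltzmann constant in J/K
AMU = 1.66053906660e-27   # atomic mass unit in kg

def sample_velocities(masses_amu, temperature_k, seed=None):
    """Draw one (vx, vy, vz) tuple in m/s per particle from a Maxwell-Boltzmann distribution."""
    rng = random.Random(seed)
    velocities = []
    for mass in masses_amu:
        sigma = math.sqrt(K_B * temperature_k / (mass * AMU))  # per-component std in m/s
        velocities.append(tuple(rng.gauss(0.0, sigma) for _ in range(3)))
    return velocities

masses = [15.999] * 20_000   # hypothetical system of oxygen-mass particles
v_a = sample_velocities(masses, 300.0, seed=42)
v_b = sample_velocities(masses, 300.0, seed=42)
v_c = sample_velocities(masses, 300.0, seed=7)

# Instantaneous temperature from equipartition: KE = 3/2 * N * kB * T
kinetic = sum(0.5 * m * AMU * (vx**2 + vy**2 + vz**2)
              for m, (vx, vy, vz) in zip(masses, v_a))
t_inst = 2.0 * kinetic / (3 * len(masses) * K_B)

print(f"same seed gives identical velocities: {v_a == v_b}")
print(f"different seed gives identical velocities: {v_a == v_c}")
print(f"instantaneous temperature: {t_inst:.1f} K")
```

The instantaneous temperature fluctuates around the target value, which is also why the temperatures logged during the simulation are close to, but not exactly, 300 K. In OpenMM itself, `Context.setVelocitiesToTemperature` accepts an optional random seed argument and `LangevinIntegrator` provides `setRandomNumberSeed`. Whether a run is reproducible bit for bit may additionally depend on the platform used.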

Then the simulation is performed by taking the steps defined before.

In [15]:
simulation.context.setVelocitiesToTemperature(300 * unit.kelvin)
simulation.step(steps)  # perform the simulation

#"Progress (%)"	"Step"	"Potential Energy (kJ/mole)"	"Temperature (K)"	"Speed (ns/day)"	"Time Remaining"
5.0%	2500	-1134755.89887778	294.71176468645916	0	--
#"Progress (%)"	"Step"	"Potential Energy (kJ/mole)"	"Temperature (K)"	"Speed (ns/day)"	"Time Remaining"
5.0%	2500	-1134755.89887778	294.71176468645916	0	--
10.0%	5000	-1129535.89887778	300.72597761536247	31.9	4:03
10.0%	5000	-1129535.89887778	300.72597761536247	35	3:42
15.0%	7500	-1129459.64887778	298.9291367067543	33.4	3:39
15.0%	7500	-1129459.64887778	298.9291367067543	35	3:29
20.0%	10000	-1129001.14887778	300.29503632683947	33.9	3:23
20.0%	10000	-1129001.14887778	300.29503632683947	35	3:17
25.0%	12500	-1129119.89887778	299.8323383351143	34.2	3:09
25.0%	12500	-1129119.89887778	299.8323383351143	35	3:05
30.0%	15000	-1129698.14887778	300.3379182238201	34.4	2:55
30.0%	15000	-1129698.14887778	300.3379182238201	35	2:52
35.0%	17500	-1129752.89887778	299.5036290560823	34.5	2:42
35.0%	17500	-1129752.89887778	299.5036290560823	35	2:40
40.0

In [19]:
# Check the trajectory exists and is not empty
xtc_FN = os.path.join(lab_dir, 'MPro-nirmatrelvir-solv.xtc')
os.path.isfile(xtc_FN) and os.path.getsize(xtc_FN)>0

True

### Download results

You can execute the following cell if you are working on Google Colab to download the MD simulation results.

In [None]:
if on_colab:
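    # download the minimized structure and the trajectory written above
    files.download(os.path.join(lab_dir, 'MPro-nirmatrelvir-solv.pdb'))
    files.download(os.path.join(lab_dir, 'MPro-nirmatrelvir-solv.xtc'))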

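The output files were written to `lab_dir`, which is on your Google Drive, so they are already saved there. The commented-out code block below is only relevant if the results were written to a different directory (`DATA`, which is not defined in this lab).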

In [None]:
# # Mounts your Google Drive
# from google.colab import drive
# import os
# if not os.path.isdir('/content/drive'):
#   drive.mount('/content/drive')

# !cp {DATA}/topology.pdb /content/drive/MyDrive/GitHub/modelingworkshop/labs-complete/4-1/MPro-nirmatrelvir-solv.pdb
# !cp {DATA}/trajectory.xtc /content/drive/MyDrive/GitHub/modelingworkshop/labs-complete/4-1/MPro-nirmatrelvir-solv.xtc

Mounted at /content/drive


## Discussion

We have successfully performed an MD simulation of a protein-ligand complex. However, we simulated only a very short time to keep the execution time of the lab short. To address critical questions in drug design, longer simulations are often required.  
Unbiased MD simulations are often still too computationally costly to reach these timescales. Thus, so-called enhanced sampling methods were developed that aim to accelerate the conformational sampling. Some of the most common methods are discussed in the **Further reading** section below.  
Furthermore, we did not include an equilibration step, which is commonly used to slowly heat the system from 0 to 300 K before starting the production simulation and might be important when simulating more sensitive systems, such as those including lipid bilayers. The protonation of ligand and protein was done separately, which is suboptimal, since the protonation states of protein residues and ligand affect each other. However, we did not find a free and open-source solution meeting all requirements.

## Quiz

* Which inter- and intramolecular forces are being considered in the AMBER force field? Can you think of any forces not taken into account?
* Would you expect to see the exact same simulation results when running the notebook twice with the same parameters?
* Try doing a short (10 ps, snapshot every 1 ps) simulation of a protein without a ligand. You can find a broad variety of structures on [PDB](https://www.rcsb.org/) or you can use the complex and remove the ligand.

## Further reading

### Enhanced sampling methods

In theory, unbiased MD simulations should be capable of simulating binding and unbinding events of a drug molecule and its macromolecular target. However, the timescale of binding and unbinding events lies in the millisecond to second range, many orders of magnitude beyond the 100 ps simulated in this lab. Enhanced sampling methods aim to accelerate the conformational sampling ([_J Med Chem._ 2016, **59(9)**, 4035-61](https://doi.org/10.1021/acs.jmedchem.5b01684)).

One of these is **free energy perturbation (FEP)** (also called alchemical free energy calculation), which computes the free energy difference when going from a state A to another state B. It is often employed in lead optimization to evaluate small modifications of the ligand that may boost the binding affinity for the desired target. The ligand from state A is thereby gradually transformed into the ligand of state B by simulating several intermediate ("alchemical") states ([alchemistry](http://www.alchemistry.org/wiki/Main_Page)).
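
The exponential-averaging (Zwanzig) relation commonly associated with FEP, ΔF = -kT ln⟨exp(-ΔU/kT)⟩ sampled in state A, can be tried out on a toy problem. The sketch below assumes two one-dimensional harmonic wells as states A and B, for which the exact answer is known. All names and parameters are illustrative, and real alchemical calculations use many intermediate states and more robust estimators.

```python
import math
import random

def fep_free_energy(delta_u_samples, kT=1.0):
    """Zwanzig estimator: dF = -kT * ln < exp(-dU / kT) >, averaged over state A."""
    boltzmann_avg = sum(math.exp(-du / kT) for du in delta_u_samples) / len(delta_u_samples)
    return -kT * math.log(boltzmann_avg)

# Toy states: harmonic wells U(x) = 0.5 * k * x**2 with different spring constants
kT, k_a, k_b = 1.0, 1.0, 2.0
rng = random.Random(0)

# Equilibrium samples of state A are Gaussian with variance kT / k_a
samples_a = [rng.gauss(0.0, math.sqrt(kT / k_a)) for _ in range(100_000)]

# Energy difference U_B(x) - U_A(x) evaluated on configurations sampled from A
delta_u = [0.5 * (k_b - k_a) * x**2 for x in samples_a]

df_estimate = fep_free_energy(delta_u, kT)
df_exact = 0.5 * kT * math.log(k_b / k_a)   # analytic result for harmonic wells
print(f"FEP estimate: {df_estimate:.3f} kT, exact: {df_exact:.3f} kT")
```

The estimate converges well here because the configurations of A and B overlap strongly. When the two states differ more, the average is dominated by rarely sampled configurations, which is the motivation for inserting intermediate states.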

Another technique for free-energy calculations is **umbrella sampling (US)**. US enforces sampling along a collective variable (CV) by performing staged simulations with an energetic bias. The bias usually takes the form of a harmonic potential, hence the term "umbrella". Its goal is to sample high-energy regions along the CV. However, its use in drug design is limited by the high computational cost.
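
The staged harmonic bias can be sketched on a toy double-well profile. The profile, the number of windows, and the force constant below are hypothetical values chosen for illustration. In a real study, each window would be a separate biased MD simulation whose results are later combined into an unbiased free energy profile.

```python
def harmonic_bias(cv, center, k):
    """Harmonic umbrella restraint on a collective variable (CV)."""
    return 0.5 * k * (cv - center) ** 2

def double_well(cv):
    """Toy profile with minima at cv = -1 and cv = +1 and a barrier at cv = 0."""
    return 5.0 * (cv**2 - 1.0) ** 2

# Evenly spaced window centers spanning the CV range from -1.5 to 1.5
n_windows = 7
centers = [-1.5 + i * 3.0 / (n_windows - 1) for i in range(n_windows)]
k_bias = 50.0   # hypothetical force constant (energy per squared CV unit)

# Biased potential for the window centered on the barrier top (center = 0)
grid = [i / 100 for i in range(-200, 201)]
biased = [double_well(x) + harmonic_bias(x, 0.0, k_bias) for x in grid]
x_min = grid[biased.index(min(biased))]

print(f"window centers: {centers}")
print(f"minimum of the biased potential in the central window: cv = {x_min}")
```

Without the bias, the barrier top at cv = 0 is the least populated region. With the bias, it becomes the minimum of the effective potential in that window, so the simulation samples it readily.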

In contrast, **steered MD (SMD)** follows a different approach: it applies external forces to the system. Those forces are time-dependent and facilitate the unbinding of the ligand from the target. SMD calculates the final force exerted on the system. The unbinding force profile can then be used to filter hits from docking calculations and to discriminate active from inactive molecules.