### Workflow for simulating a protein-ligand complex in water using OpenMM with Open Force Field/AMBER
OpenMM is a toolkit for high-performance molecular simulation. It is not an application in the traditional sense: simulations are run from Python scripts that use the application layer, which is a set of Python libraries.

This tutorial demonstrates how to create and run a simple MD simulation of a protein-ligand complex in water: a potential drug molecule in the binding site of the main protease of SARS-CoV-2, the virus that causes COVID-19. The drug molecule is an arbitrary choice from a set of linked fragment-screen hits generated with DeLinker [1].

To do this, both the protein and the ligand must be protonated and parameterised.

In [1]:
import time

import numpy as np
import parmed

from pdbfixer import PDBFixer
from openff.toolkit.topology import Molecule, Topology
from openff.toolkit.typing.engines.smirnoff import ForceField
import openmm
from openmm import app, unit



### Ligand preparation
The ligand was generated using DeLinker without hydrogens, so the first step is to add them. 

Once hydrogens are added, the ligand can be parameterised with Open Force Field.

-------------------------

First, hydrogens are added using obabel. (! at the beginning of a cell runs the command in bash.) 

Open Babel (`obabel`) is a tool that converts between chemical file formats.

.pdb files do not reliably record bond connectivity or bond orders, which are needed to parameterise the ligand. The .sdf file format carries this information.
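
The obabel step described above can be sketched from Python as follows. This is a minimal sketch, not the notebook's original cell: the input file name `jorgensen.pdb` is hypothetical, and the helper names are introduced only for illustration. It relies on Open Babel's `-h` (add hydrogens) and `-O` (output file) options, and defaults to a dry run so nothing is executed unless requested.

```python
import shlex
import subprocess

def build_obabel_command(input_path, output_path, add_hydrogens=False):
    """Build an obabel argument list; formats are inferred from the file extensions."""
    cmd = ["obabel", input_path, "-O", output_path]
    if add_hydrogens:
        cmd.append("-h")  # Open Babel flag that adds hydrogens
    return cmd

# 'jorgensen.pdb' is a hypothetical name for the unprotonated DeLinker output.
commands = [
    # protonate and write an .sdf, which keeps the bond information
    build_obabel_command("jorgensen.pdb", "jorgensenh.sdf", add_hydrogens=True),
    # convert the protonated ligand back to .pdb for OpenMM
    build_obabel_command("jorgensenh.sdf", "jorgensenh.pdb"),
]

def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    shown = []
    for cmd in commands:
        line = shlex.join(cmd)
        shown.append(line)
        if dry_run:
            print("would run:", line)
        else:
            subprocess.run(cmd, check=True)
    return shown

run_commands(commands)
```

Calling `run_commands(commands, dry_run=False)` would execute the two conversions, assuming `obabel` is on the path and the input file exists.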

The protonated ligand can now be loaded into OpenMM.

In [2]:
# load the protonated ligand with OpenMM's PDBFile (provides topology and positions)
mol = app.PDBFile('jorgensenh.pdb')

### Protein preparation


Before the protein can be used in OpenMM it must first be loaded and, if necessary, fixed. A tool called PDBFixer finds missing residues and atoms, replaces nonstandard residues, and so on.

In [3]:
# fix PDB
fixer = PDBFixer('mpro.pdb')
fixer.findMissingResidues()
fixer.findNonstandardResidues()
fixer.findMissingAtoms()

fixer.replaceNonstandardResidues()
fixer.addMissingAtoms()

# write the fixed structure, closing the file when done
with open('mproh.pdb', 'w') as out:
    app.PDBFile.writeFile(fixer.topology, fixer.positions, out)

### Parametrisation and system assembly


To generate the system, a system generator is needed, along with some force field settings.

It parameterises both the ligand, using Open Force Field, and the protein, using the AMBER force field ('amber99sbildn'). In the code below this role is played by an OpenMM `ForceField` object built from the protein, ligand and water XML files.

In [4]:
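# force field configuration
forcefield_kwargs = {
    'rigidWater': True,
    'flexibleConstraints': True,
    'constraints': app.HBonds,
    'hydrogenMass': 4*unit.amu,
    'removeCMMotion': True,
    # the solvated box is periodic, so use PME instead of the NoCutoff default
    'nonbondedMethod': app.PME,
}

# protein (AMBER ff99SB-ILDN), ligand and water (TIP3P-FB) parameters
forcefields = app.ForceField('amber99sbildn.xml', 'jorgensenh.xml', 'tip3pfb.xml')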
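
The `hydrogenMass` setting requests hydrogen mass repartitioning: each hydrogen bonded to a heavy atom is made heavier, and the added mass is subtracted from that heavy atom so the total mass is unchanged. The arithmetic can be sketched as below. This is an illustration only, not OpenMM's code; the function name and the methyl-group example masses are assumptions chosen for the sketch.

```python
def repartition_hydrogen_mass(heavy_mass, hydrogen_masses, target_h_mass):
    """Return (new_heavy_mass, new_hydrogen_masses) with the total mass unchanged."""
    # mass moved onto the hydrogens is taken from the bonded heavy atom
    transferred = sum(target_h_mass - m for m in hydrogen_masses)
    return heavy_mass - transferred, [target_h_mass] * len(hydrogen_masses)

# illustrative methyl group: one carbon bonded to three hydrogens (masses in amu)
carbon, hydrogens = 12.011, [1.008, 1.008, 1.008]
new_carbon, new_hydrogens = repartition_hydrogen_mass(carbon, hydrogens, 4.0)

print("carbon:", round(new_carbon, 3), "hydrogens:", new_hydrogens)
print("total before:", round(carbon + sum(hydrogens), 3))
print("total after: ", round(new_carbon + sum(new_hydrogens), 3))
```

Heavier hydrogens slow the fastest motions in the system, which is why this setting is typically combined with bond constraints when a longer time step is wanted.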

The next task is to merge the protein and the ligand, and solvate the complex using OpenMM's `Modeller` class.

In [5]:
# load the fixed protein (hydrogens are added in the next step)
pdbfile = app.PDBFile('mproh.pdb')
modeller = app.Modeller(pdbfile.getTopology(), pdbfile.getPositions())

# protonate the protein
modeller.addHydrogens()

# add the ligand to the system
modeller.add(mol.topology, mol.positions)

# add the solvent (and box)
modeller.addSolvent(forcefields, 
                    model="tip3p", 
                    ionicStrength=0*unit.molar, 
                    neutralize=True, 
                    padding=1.4*unit.nanometers)

# write a pdb with the solvated system
with open("sys_sol.pdb", 'w') as out:
    app.PDBFile.writeFile(modeller.topology, modeller.positions, out)

Next, parameterise the full system (protein, ligand and water).

In [6]:
# parameterize and create a system 
system = forcefields.createSystem(topology=modeller.topology, **forcefield_kwargs)

>TIP: The parameterised system can be converted to a ParmEd structure object, for example with `parmed.openmm.load_topology`.

### Simulating the protein-ligand complex

The system can be saved by serialising it to an XML file containing all system information and parameters.

In [7]:
xml = openmm.XmlSerializer.serializeSystem(system)
with open("complex_system.xml", "w+") as out:
    out.write(xml)

A simulation combines a system, an integrator (which defines how the equations of motion are advanced) and a topology (the atoms, residues and bonds). It can now be created using the system that was just made.

The simulation creates a context, which stores the complete state of a simulation and contains information such as the positions and velocities of particles. 

Parameters such as the temperature of the simulation can be specified.
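
To make the role of the integrator concrete, here is a toy Langevin step for a single particle on a 1D harmonic spring, in reduced (dimensionless) units. It is a sketch under stated assumptions: it uses the BAOAB splitting, which is not necessarily the discretisation OpenMM's `LangevinIntegrator` uses, and all names (`baoab_step`, `energy`) are hypothetical. The temperature enters through `kT` and the collision rate through `gamma`.

```python
import math
import random

def baoab_step(x, v, dt, gamma, kT, k=1.0, m=1.0, rng=random):
    """Advance a 1D harmonic oscillator by one BAOAB Langevin step."""
    v += 0.5 * dt * (-k * x) / m          # B: half kick from the spring force
    x += 0.5 * dt * v                     # A: half drift
    c1 = math.exp(-gamma * dt)            # O: friction plus thermal noise
    v = c1 * v + math.sqrt((1.0 - c1 * c1) * kT / m) * rng.gauss(0.0, 1.0)
    x += 0.5 * dt * v                     # A: half drift
    v += 0.5 * dt * (-k * x) / m          # B: half kick with the updated force
    return x, v

def energy(x, v, k=1.0, m=1.0):
    """Total energy of the oscillator (potential plus kinetic)."""
    return 0.5 * k * x * x + 0.5 * m * v * v

# thermostatted run: the mean of v*v should approach kT/m over a long trajectory
random.seed(0)
x, v, v2_sum, n_steps = 1.0, 0.0, 0.0, 20000
for _ in range(n_steps):
    x, v = baoab_step(x, v, dt=0.05, gamma=1.0, kT=1.0)
    v2_sum += v * v
print("mean v^2:", v2_sum / n_steps)
```

With `gamma=0.0` the noise and friction terms vanish and the step reduces to velocity Verlet, so the energy stays close to its initial value; that makes a convenient sanity check.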

In [8]:
# propagate the system with Langevin dynamics.
time_step = 1*unit.femtoseconds  # simulation timestep
temperature = 300*unit.kelvin  # simulation temperature
friction = 1/unit.picosecond  # collision rate
integrator_min = openmm.LangevinIntegrator(temperature, friction, time_step)

# set up an openmm simulation
simulation = openmm.app.Simulation(modeller.topology, system, integrator_min)

# set the initial positions
simulation.context.setPositions(modeller.positions)

If the simulation were run as is, it would likely blow up because of extreme forces on poorly placed atoms, so the system must first be energy-minimised. The minimised structure can then be used as a starting point for multiple runs.
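
The scale of these forces can be illustrated with a Lennard-Jones pair in reduced units. This is a sketch with a hypothetical helper `lj_force`, not part of the workflow: it compares the force at the energy minimum with the force when two atoms overlap at half the particle diameter.

```python
def lj_force(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones force along the pair separation r (positive = repulsive)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 ** 2 - sr6) / r

r_min = 2.0 ** (1.0 / 6.0)  # separation at the energy minimum, in units of sigma

print("force at the minimum:", lj_force(r_min))  # essentially zero
print("force at r = 0.5:    ", lj_force(0.5))    # 390144.0
```

A force this large displaces an atom far beyond a sensible distance in a single time step, which is why minimisation comes before dynamics.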

In [9]:
simulation.minimizeEnergy()
# get state of minimised simulation
state = simulation.context.getState(getPositions=True)
# get positions of minimised simulation
minimised_positions = state.getPositions()

After the system has been minimised, a 'production run' can be performed. Output is configured by appending reporter objects to the simulation; the `trj_freq` and `data_freq` variables set how often (in steps) each reporter writes.

The results of the simulation will be written to trajectory_prod.pdb, which can be loaded into visualisation software e.g. VMD.

In [10]:
# propagate the system with Langevin dynamics
time_step = 1*unit.femtoseconds  # simulation timestep
temperature = 300*unit.kelvin  # simulation temperature
friction = 1/unit.picosecond  # collision rate
integrator_prod = openmm.LangevinIntegrator(temperature, friction, time_step)

# length of the simulation.
num_steps = 200  # number of integration steps to run

# Logging options.
trj_freq = 50  # how often to save a trajectory frame
data_freq = 50  # how often to output the simulation statistics

# set up an OpenMM simulation using minimised structure positions
simulation = openmm.app.Simulation(modeller.topology, system, integrator_prod)

# set the initial positions
simulation.context.setPositions(minimised_positions)

# randomise the velocities from a Boltzmann distribution at the given temperature
simulation.context.setVelocitiesToTemperature(temperature)

# configure the information in the output files
pdb_reporter = openmm.app.PDBReporter('trajectory_prod.pdb', trj_freq)
state_data_reporter = openmm.app.StateDataReporter('data_prod.csv', data_freq, step=True,
                                                   potentialEnergy=True, temperature=True,
                                                   density=True)
simulation.reporters.append(pdb_reporter)
simulation.reporters.append(state_data_reporter)
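
With these settings it is easy to work out how much output to expect. The sketch below uses the values from the cell above and assumes a freshly created simulation starting at step 0, so that a reporter writes whenever the step count is a multiple of its interval; `report_steps` is a hypothetical helper.

```python
def report_steps(num_steps, interval):
    """Steps at which a reporter with the given interval writes output."""
    return [step for step in range(1, num_steps + 1) if step % interval == 0]

num_steps = 200       # integration steps, as in the cell above
trj_freq = 50         # trajectory reporting interval
time_step_fs = 1.0    # time step in femtoseconds

frames = report_steps(num_steps, trj_freq)
simulated_ps = num_steps * time_step_fs / 1000.0

print(frames)        # [50, 100, 150, 200]
print(simulated_ps)  # 0.2
```

So this short demonstration run covers 0.2 ps and writes four trajectory frames and four rows of statistics.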

Finally, run the simulation.

In [11]:
print("Starting simulation")
start = time.process_time()

#run the simulation
simulation.step(num_steps)

end = time.process_time()
print("Finished in %.2f seconds" % (end-start))

Starting simulation
Finished in 18.54 seconds
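
The reported time can be turned into a rough throughput figure. The sketch below assumes the 18.54 s above approximates the elapsed wall-clock time (`time.process_time()` actually measures CPU time, so this is only an estimate) and uses the 200 steps of 1 fs from the production cell; `ns_per_day` is a hypothetical helper.

```python
def ns_per_day(num_steps, time_step_fs, elapsed_seconds):
    """Simulated nanoseconds per day of run time."""
    simulated_ns = num_steps * time_step_fs * 1e-6  # 1 fs = 1e-6 ns
    return simulated_ns * 86400.0 / elapsed_seconds

rate = ns_per_day(num_steps=200, time_step_fs=1.0, elapsed_seconds=18.54)
print(f"{rate:.2f} ns/day")  # 0.93 ns/day
```

Such a short run is dominated by start-up costs, so the figure is best read as a lower bound rather than a benchmark.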


[1] - https://pubs.acs.org/doi/10.1021/acs.jcim.9b01120