# Molecular dynamics of an explicitly solvated small protein

For the simulations in this notebook, we will use the crystal structure of the Villin Headpiece subdomain [1YRF](https://www.rcsb.org/structure/1YRF). This is a relatively small protein (35 residues) consisting of three alpha helices. It is a prototypical fast-folding protein (see [10.1016/j.jmb.2006.03.034](https://doi.org/10.1016/j.jmb.2006.03.034) and [10.1073/pnas.0502495102](https://doi.org/10.1073/pnas.0502495102)) and is therefore a popular benchmark for protein-folding molecular dynamics simulations, e.g. [10.1073/pnas.1800690115](https://doi.org/10.1073/pnas.1800690115) (force field accuracy) and [10.1109/SC.2014.9](https://doi.org/10.1109/SC.2014.9) (computational performance).



In [1]:
from sys import stdout
from simtk.openmm.app import *
from simtk.openmm import *
from simtk.unit import *

In [2]:
%matplotlib inline
import numpy as np
import mdtraj
import pandas
import matplotlib.pyplot as plt
import nglview


The PDB file for 1YRF contains more information than we need. It holds several possible states (alternate locations) of some residues mixed into one file, and OpenMM cannot handle this. Most atom positions are the same for all these states, but a few have an extra `A`, `B`, `C`, ... just after the atom name. The following function splits such a PDB file into multiple files, one per state. Each of these files is suitable for starting an OpenMM simulation.

In [3]:
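def split_pdb(fn_pdb):
    """Write one PDB file per alternate-location state found in fn_pdb."""
    pos = 16  # 0-based index of the alternate location indicator (PDB column 17)
    groups = {}  # maps each state character to a list of (index, line) pairs
    counter = 0  # running index of ATOM lines, used to restore the original order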
    
    with open(fn_pdb) as f:
        
        for line in f:
            if line.startswith("ATOM"):
                state = line[pos]
                line = line[:pos] + " " + line[pos+1:]
                groups.setdefault(state, []).append((counter, line))
                counter += 1
                
    for key, lines_group in groups.items():
        
        if key == " ":
            continue
        lines_both = lines_group + groups[" "]
        lines_both.sort()
        with open("{}_{}.pdb".format(fn_pdb[:-4], key.lower()), "w") as f:
            for counter, line in lines_both:
                f.write(line)
                
split_pdb("1yrf.pdb")
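To make the column convention used by `split_pdb` explicit: in the fixed-width PDB format, the alternate location indicator is column 17 of an `ATOM` record, which is index 16 of the corresponding Python string. The following minimal sketch illustrates this with a made-up `ATOM` record (hypothetical, not copied from 1YRF); the name `ALTLOC_POS` is only used here for illustration.

```python
# Hypothetical ATOM record with alternate location "A" (not taken from 1YRF).
line = "ATOM    100  CB ASER A  43      10.000  20.000  30.000  0.50 15.00           C"

ALTLOC_POS = 16  # 0-based index of PDB column 17 (alternate location indicator)

# Read the state label that follows the atom name.
altloc = line[ALTLOC_POS]

# Blank out the label, as split_pdb does, so the written file holds a single state.
cleaned = line[:ALTLOC_POS] + " " + line[ALTLOC_POS + 1:]

print(repr(altloc))
print(cleaned)
```

Atoms without alternative states have a space at this position, which is why `split_pdb` merges the `" "` group into every output file.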

In [4]:
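# Load the state-A structure written by split_pdb.
pdb = PDBFile('1yrf_a.pdb')
modeller = Modeller(pdb.topology, pdb.positions)
forcefield = ForceField('amber14-all.xml', 'amber14/tip3pfb.xml')
# Add missing hydrogens and a water box with 1 nm padding around the protein.
modeller.addHydrogens(forcefield)
modeller.addSolvent(forcefield, model='tip3p', padding=1*nanometer)
# Particle mesh Ewald electrostatics; bonds involving hydrogen are constrained.
system = forcefield.createSystem(modeller.topology, nonbondedMethod=PME, constraints=HBonds)
temperature = 300 * kelvin
pressure = 1 * bar
# Langevin thermostat (friction 1/ps, time step 2 fs) plus a Monte Carlo barostat.
integrator = LangevinIntegrator(temperature, 1/picosecond, 2*femtoseconds)
system.addForce(MonteCarloBarostat(pressure, temperature))
simulation = Simulation(modeller.topology, system, integrator)
simulation.context.setPositions(modeller.positions)
# Short energy minimization to relax close contacts before running dynamics.
simulation.minimizeEnergy(maxIterations=100)
positions = simulation.context.getState(getPositions=True).getPositions()

# Save the minimized, solvated structure; it is reused later as the topology file.
with open('init.pdb', 'w') as f:
    PDBFile.writeFile(simulation.topology, positions, f)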

**<span style="color:#A03;font-size:14pt">
&#x270B; HANDS-ON! &#x1F528;
</span>**

> Which changes can be made to the input structure such that the force field can still be applied? Try the following:
>
> - Remove an atom from the PDB file.
> - Remove an entire residue from the PDB file.
> - Use an engineered form of the Villin Headpiece, e.g. the Lys24Nle mutant (PDB 1WY3) or the Lys24Nle/Lys29Nle double mutant (PDB 2F4K).
>
> Finally, restore the input to the original one and rerun the above code cell, to have a good starting point for the next cell.
>
> With the methodology shown here, it is not possible to define custom mutations.

In [None]:
simulation.reporters = []
simulation.reporters.append(DCDReporter('traj.dcd', 10))
simulation.reporters.append(StateDataReporter(stdout, 100, step=True,
        temperature=True, elapsedTime=True))
simulation.reporters.append(StateDataReporter("scalars.csv", 10, step=True, time=True,
    potentialEnergy=True, totalEnergy=True, temperature=True))
simulation.step(30000)

# The last line is only needed for Windows users,
# to close the DCD file before it can be opened by nglview.
del simulation

#"Step","Temperature (K)","Elapsed Time (s)"
100,91.31850584979749,0.0001010894775390625
200,136.40505701875645,1.0228722095489502
300,173.77564908046043,2.047692060470581
400,196.87661792009473,3.0793709754943848
500,212.6573897318262,4.111469030380249
600,230.1984523361407,5.120949029922485
700,241.93299696819696,6.146464109420776
800,250.2080591900686,7.231529951095581
900,257.6704763902764,8.26680302619934
1000,269.7582531362026,9.304004192352295
1100,270.5073501273368,10.334190130233765
1200,276.0685797262969,11.370741128921509
1300,277.5554219446434,12.418067932128906
1400,281.01895488431296,13.451228141784668
1500,285.27009400330377,14.486724138259888
1600,285.36114487513623,15.516728162765503
1700,288.6229467736835,16.55263614654541
1800,291.7649112613786,17.569833993911743
1900,291.3620631654766,18.60663914680481
2000,291.044095716856,19.641930103302002
2100,293.84914734540155,20.6754732131958
2200,291.9308816546421,21.719101905822754
2300,297.12727672291805,22.7396821975708
2

You will need the output files of this simulation for the following notebook. Copy over the files `init.pdb`, `scalars.csv` and `traj.dcd` to the directory `../05`. This can be done with the following code, or with any file manager that comes with the operating system.

In [None]:
import shutil
shutil.copy("init.pdb", "../05")
shutil.copy("scalars.csv", "../05")
shutil.copy("traj.dcd", "../05")
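As a sanity check on the files copied here, the reporter intervals in the simulation cell fix how many trajectory frames and CSV rows to expect. The sketch below simply redoes that arithmetic with the values from the simulation cell; it assumes the run completed all 30000 steps, and the variable names are chosen for illustration only.

```python
# Values taken from the simulation cell of this notebook.
n_steps = 30000        # argument of simulation.step
timestep_fs = 2.0      # integrator time step in femtoseconds
dcd_interval = 10      # DCDReporter writes a frame every 10 steps
csv_interval = 10      # StateDataReporter writes a row to scalars.csv every 10 steps
stdout_interval = 100  # StateDataReporter prints a line every 100 steps

# Total simulated time in picoseconds (1 ps = 1000 fs).
simulated_ps = n_steps * timestep_fs / 1000.0

# Number of records each reporter produces over the full run.
n_frames = n_steps // dcd_interval
n_csv_rows = n_steps // csv_interval
n_stdout_lines = n_steps // stdout_interval

print(f"Simulated time: {simulated_ps} ps")
print(f"Frames in traj.dcd: {n_frames}")
print(f"Rows in scalars.csv: {n_csv_rows}")
print(f"Lines printed to stdout: {n_stdout_lines}")
```

After loading the trajectory with `mdtraj`, `traj.n_frames` should agree with the frame count computed here if the run was not interrupted.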

**<span style="color:#A03;font-size:14pt">
&#x270B; HANDS-ON! &#x1F528;
</span>**

> Estimate the computational cost (wall time) on your current hardware of a simulation long enough to observe a spontaneous protein folding event for this fast folder. The required simulation time is approximately $4\,\mathrm{\mu s}$. Would such a calculation be feasible?
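One possible way to set up this estimate is sketched below. The function `estimate_wall_time_days` and the throughput of 100 steps per second are illustrative assumptions, not part of the original notebook; replace the throughput with the value you measure from the `Elapsed Time (s)` column printed during your own run.

```python
def estimate_wall_time_days(target_time_s, timestep_s, steps_per_second):
    """Return the wall time in days needed to simulate target_time_s."""
    n_steps = target_time_s / timestep_s         # number of integration steps
    wall_seconds = n_steps / steps_per_second    # wall time at constant throughput
    return wall_seconds / 86400.0                # 86400 seconds per day


# Hypothetical throughput (assumption): measure this on your own hardware.
steps_per_second = 100.0

days = estimate_wall_time_days(
    target_time_s=4e-6,   # 4 microseconds of simulated time
    timestep_s=2e-15,     # 2 fs, the time step of the LangevinIntegrator above
    steps_per_second=steps_per_second,
)
print(f"Estimated wall time: {days:.0f} days")
```

The estimate assumes that the throughput stays constant over the whole run and ignores the cost of writing output, so it is best read as an order-of-magnitude figure.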

In [None]:
df = pandas.read_csv("scalars.csv")
df.plot(kind='line', x='Time (ps)', y='Potential Energy (kJ/mole)')

In [None]:
traj = mdtraj.load('traj.dcd', top='init.pdb')
view = nglview.show_mdtraj(traj)
view