# Simulating an explicitly solvated small protein

For the simulations in this notebook, we will use the crystal structure of the Villin Headpiece subdomain [1YRF](https://www.rcsb.org/structure/1YRF). It is a relatively small protein (35 residues) consisting of three alpha helices. It is also a prototypical fast-folding protein (see [10.1016/j.jmb.2006.03.034](https://doi.org/10.1016/j.jmb.2006.03.034) and [10.1073/pnas.0502495102](https://doi.org/10.1073/pnas.0502495102)), and it is therefore a popular benchmark for protein-folding molecular dynamics simulations, e.g. [10.1073/pnas.1800690115](https://doi.org/10.1073/pnas.1800690115) (force-field accuracy) and [10.1109/SC.2014.9](https://doi.org/10.1109/SC.2014.9) (computational performance).

In [None]:
!pip install -q condacolab
import condacolab
condacolab.install()

In [None]:
%%capture
!conda install -y -c conda-forge openmm mdtraj parmed
!pip install py3dmol nglview

In [None]:
import shutil
from sys import stdout
import matplotlib.pyplot as plt
import mdtraj
import numpy as np
import pandas

from openmm import *
from openmm.app import *
from openmm.unit import *

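### Obtain 1YRF from PDB

The cell below reads the protein structure from the local file `1yrf_a.pdb`, which must be present in the working directory. It then adds hydrogens, solvates the protein in a box of TIP3P-FB water, sets up an NPT simulation at 300 K and 1 bar, minimizes the energy and stores the minimized system in `init.pdb`.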
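
If `1yrf_a.pdb` is not at hand, a file along the same lines can be produced from the PDB entry itself. The following is a minimal sketch, assuming that `1yrf_a.pdb` is meant to contain only the protein atoms of chain A of 1YRF (the file name suggests this, but the original preparation steps are not given). It downloads the entry from the RCSB file server (`https://files.rcsb.org/download/1YRF.pdb`) with the standard-library `urllib`; the helper names `fetch_pdb_text` and `keep_chain_atoms` are hypothetical. Only `ATOM` records of the selected chain are kept, so crystallographic waters, ligands and the `CRYST1` unit-cell record are dropped, and `Modeller.addSolvent` adds fresh solvent afterwards.

```python
import urllib.request


def fetch_pdb_text(pdb_id: str) -> str:
    """Download an entry in PDB format from the RCSB file server (needs internet access)."""
    url = f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def keep_chain_atoms(pdb_text: str, chain_id: str = "A") -> str:
    """Keep only the ATOM records of one chain (column 22 holds the chain identifier)."""
    kept = [
        line
        for line in pdb_text.splitlines()
        if line.startswith("ATOM") and len(line) > 21 and line[21] == chain_id
    ]
    # Terminate the chain and the file so PDB readers see a complete structure
    return "\n".join(kept + ["TER", "END"]) + "\n"


# Example usage (downloads the entry, so it requires network access):
# with open("1yrf_a.pdb", "w") as f:
#     f.write(keep_chain_atoms(fetch_pdb_text("1YRF")))
```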

In [None]:
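# Load the protein structure (expects 1yrf_a.pdb in the working directory)
pdb = PDBFile("1yrf_a.pdb")
modeller = Modeller(pdb.topology, pdb.positions)
# Amber14 protein force field combined with the TIP3P-FB water model
forcefield = ForceField("amber14-all.xml", "amber14/tip3pfb.xml")
modeller.addHydrogens(forcefield)
# Solvate with at least 1 nm of water between the protein and the box edges
modeller.addSolvent(forcefield, model="tip3p", padding=1 * nanometer)
# PME electrostatics; constraining bonds to hydrogen permits a 2 fs time step
system = forcefield.createSystem(modeller.topology, nonbondedMethod=PME, constraints=HBonds)
temperature = 300 * kelvin
pressure = 1 * bar
# Langevin thermostat (friction 1/ps) plus Monte Carlo barostat: NPT ensemble
integrator = LangevinIntegrator(temperature, 1 / picosecond, 2 * femtoseconds)
system.addForce(MonteCarloBarostat(pressure, temperature))
simulation = Simulation(modeller.topology, system, integrator)
simulation.context.setPositions(modeller.positions)
# Short minimization to remove clashes introduced by adding hydrogens and water
simulation.minimizeEnergy(maxIterations=100)
# Start from Maxwell-Boltzmann velocities at the target temperature
simulation.context.setVelocitiesToTemperature(temperature)
# Save the minimized system; it also serves as topology for the trajectory
positions = simulation.context.getState(getPositions=True).getPositions()
with open("init.pdb", "w") as f:
    PDBFile.writeFile(simulation.topology, positions, f)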

In [None]:
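simulation.reporters = []
# Write coordinates every 10 steps (20 fs) to a DCD trajectory
simulation.reporters.append(DCDReporter("traj.dcd", 10))
# Print progress every 100 steps; speed=True also reports the performance in ns/day
simulation.reporters.append(
    StateDataReporter(stdout, 100, step=True, temperature=True, elapsedTime=True, speed=True)
)
# Log energies and temperature every 10 steps to a CSV file for later analysis
simulation.reporters.append(
    StateDataReporter(
        "scalars.csv",
        10,
        step=True,
        time=True,
        potentialEnergy=True,
        totalEnergy=True,
        temperature=True,
    )
)
# 30000 steps x 2 fs = 60 ps of simulated time (3000 trajectory frames)
simulation.step(30000)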

You will need the output files of this simulation for the following notebook. Copy over the files `init.pdb`, `scalars.csv` and `traj.dcd` to the directory `../05_analysis`. This can be done with the following code, or with any file manager that comes with the operating system.

In [None]:
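import os
# Create the target directory first; otherwise shutil.copy would write a *file* named 05_analysis
os.makedirs("../05_analysis", exist_ok=True)
for filename in ["init.pdb", "scalars.csv", "traj.dcd"]:
    shutil.copy(filename, "../05_analysis")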

> Make an estimate of the computational cost (wall time) on your current hardware of a simulation long enough to observe a spontaneous folding event of this fast folder. The required simulated time is approximately $4\,\mu\mathrm{s}$. Would such a calculation be feasible?
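
The estimate amounts to scaling the measured performance of the short run above linearly to the target length. A minimal sketch, assuming you read off the wall time of the 30000-step run yourself (for example from the elapsed-time column printed during the simulation); the 60 s benchmark used in the example call is a made-up placeholder, not a measured value, and the function names are hypothetical:

```python
SECONDS_PER_DAY = 86400.0


def wall_time_estimate(target_ns, timestep_fs, benchmark_steps, benchmark_wall_s):
    """Extrapolate the wall time (s) needed for `target_ns` of MD from a short benchmark."""
    steps_needed = target_ns * 1e6 / timestep_fs  # 1 ns = 1e6 fs
    seconds_per_step = benchmark_wall_s / benchmark_steps
    return steps_needed * seconds_per_step


def ns_per_day(timestep_fs, benchmark_steps, benchmark_wall_s):
    """Throughput in ns/day, the usual performance metric for MD engines."""
    simulated_ns = benchmark_steps * timestep_fs * 1e-6  # fs -> ns
    return simulated_ns * SECONDS_PER_DAY / benchmark_wall_s


# Hypothetical benchmark: 30000 steps of 2 fs took 60 s of wall time.
# Replace 60.0 with the elapsed time measured on your own hardware.
wall_s = wall_time_estimate(4000.0, 2.0, 30000, 60.0)  # 4 us = 4000 ns
print(f"{ns_per_day(2.0, 30000, 60.0):.1f} ns/day")
print(f"4 us would take about {wall_s / SECONDS_PER_DAY:.1f} days")
```

With these placeholder numbers the sketch reports 86.4 ns/day and about 46.3 days for 4 µs, since 4 µs at 2 fs per step requires $2 \times 10^9$ steps. The linear extrapolation assumes the per-step cost stays constant, which is reasonable for a fixed system size but ignores I/O overhead from the reporters.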

In [None]:
df = pandas.read_csv("scalars.csv")
df.plot(kind="line", x="Time (ps)", y="Potential Energy (kJ/mole)")

In [None]:
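import nglview
# Load the trajectory with the minimized structure as topology and show it interactively
traj = mdtraj.load("traj.dcd", top="init.pdb")
view = nglview.show_mdtraj(traj)
view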