# Folding simulations of FoxP1 - Production

### Introduction

This tutorial shows how to run and analyze equilibrium folding simulations of a small protein: the DNA-binding domain of the forkhead box P1 (FoxP1) transcription factor. The tutorial is divided into two parts: production and analysis.

This notebook covers the first part, the production of the folding simulation data. The analysis is covered in the second notebook:

[02-Analysis Notebook](02-Analysis.ipynb)

To sample the folding conformational landscape with Structure-Based Models (SBMs), we need two files: the protein structure and the contact file. Both are provided in the input folder. For further details on how to obtain these files, please refer to our basic tutorials:

[Basic tutorials](https://github.com/CompBiochBiophLab/sbm-openmm/tree/master/tutorials/basic)

We start by importing the necessary libraries.

In [1]:
import sbmOpenMM
from simtk.openmm.app import *
from simtk.openmm import *
from simtk.unit import *

Using the structure and contact files, we create an all-atom SBM (AA-SBM).

In [2]:
pdb_file = 'input/FoxP_monomer.pdb'
contact_file = 'input/FoxP_monomer.contacts'

sbmAA = sbmOpenMM.models.getAllAtomModel(pdb_file, contact_file)

Generating AA SBM for structure file input/FoxP_monomer.pdb

Setting up geometrical parameters:
_________________________________
Removing hydrogens from topology
Added 747 atoms
Added 767 bonds
Added 1038 angles
Added 855 torsions
Added 166 impropers
Added 357 planars
Reading contacts from contact file: input/FoxP_monomer.contacts
Added 822 native contacts

Setting up default forcefield parameters:
________________________________________
Adding default bond parameters:
Adding default angle parameters:
grouping torsions in backbone and side-chain groups:
Adding default torsion parameters:
Adding default improper parameters:
Adding default planar parameters:
Adding default contact parameters:
Adding default excluded volume parameters:

Adding Forces:
_____________
Adding Harmonic Bond Forces
Adding Harmonic Angle Forces
Adding Periodic Torsion Forces
Adding Harmonic Improper Forces
Adding Periodic Planar Forces
Adding Lennard Jones 12-6 Forces to native contacts
Adding Lennard Jones 12

We next set up the OpenMM integrator and simulation. As before, we give them the topology, system, and positions directly from our sbmAA object. We will select the folding temperature as our integrator temperature. To estimate this parameter, please refer to the "basic" tutorial notebook on this subject:

[Tutorial on folding temperature](https://github.com/CompBiochBiophLab/sbm-openmm/blob/master/tutorials/basic/03-FoldingTemperature/foldingTemperature.ipynb)

Note that the temperature is given in kelvin only to stay consistent with OpenMM's unit system. Since the SBM is a simplified force field that has not been calibrated to match any unit system, we will refer to the temperature units as reduced temperature units (RTU).

In [3]:
temperature = 105.35 # Folding temperature for the FoxP1 system using the AA-SBM.
integrator = LangevinIntegrator(temperature*kelvin, 1.0/picosecond, 0.002*picoseconds)
simulation = Simulation(sbmAA.topology, sbmAA.system, integrator)
simulation.context.setPositions(sbmAA.positions)

Additionally, we would like to report energies and trajectory data into files. For this, we set up a DCDReporter (trajectory in DCD format) and a special StateDataReporter included inside the sbmOpenMM library. If we pass the sbmAA to the reporter with the option sbmObject, it will automatically report all the SBM energies. Since we are interested only in the trajectory information for the analysis, we will print the energies in this notebook only to observe the simulation's progression.

In [4]:
# Import stdout to print simulation output in the notebook
from sys import stdout

In [5]:
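# Define report frequencies
report_frequency = 1 # trajectory frames, in ns
print_report_frequency = 20 # energies printed to the notebook, in ns
time_step = 0.002 # in ps; must match the integrator time step
# Convert ns to integration steps (1 ns = 1000 ps)
report_steps = int(report_frequency*1000/time_step)
print_report_steps = int(print_report_frequency*1000/time_step)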

# Add trajectory reporter
simulation.reporters.append(DCDReporter('output/FoxP1_folding_traj.dcd', report_steps))

# Add energy reporter
sbmReporter = sbmOpenMM.reporter.sbmReporter(stdout, print_report_steps, 
                                             step=True, potentialEnergy=True, 
                                             temperature=True, sbmObject=sbmAA)
simulation.reporters.append(sbmReporter)
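
The conversion from a reporting interval in ns to a number of integration steps can be checked in isolation. The sketch below is only an illustration: `ns_to_steps` is a hypothetical helper, not part of sbmOpenMM or OpenMM, and it uses `round()` instead of `int()` because a floating-point division can land just below the intended integer.

```python
def ns_to_steps(time_ns, time_step_ps):
    """Number of integration steps that cover time_ns nanoseconds (hypothetical helper)."""
    # 1 ns = 1000 ps; round() guards against results that fall just below an integer
    return round(time_ns * 1000 / time_step_ps)

report_steps = ns_to_steps(1, 0.002)         # one trajectory frame every 1 ns
print_report_steps = ns_to_steps(20, 0.002)  # energies printed every 20 ns
print(report_steps, print_report_steps)
```

With a 0.002 ps time step, 1 ns corresponds to 500000 steps and 20 ns to 10000000 steps.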

To run the simulation, we need to define the length of our folding simulation. It should be long enough to observe several transitions, so that we can explore the least probable regions (i.e., the transition-state region) of the FoxP1 folding process. In the publication presenting the SBMOpenMM library ([Link to our article]()), we ran 15 replicas of 10 $\mu s$ each. Here we will run a short trajectory to check that the simulation behaves correctly. However, we recommend writing a Python script for the full production runs.
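
A production script for several replicas could look like the following minimal sketch. It reuses the calls shown in this notebook plus `setRandomNumberSeed` from OpenMM's `LangevinIntegrator`. The helper names `replica_plan`, `run_replica` and `main`, the output file names, and the choice of seeds are illustrative assumptions, not the setup used in the publication.

```python
# Hypothetical production script; file names and seeds are assumptions.
import os

def replica_plan(n_replicas, output_dir='output', prefix='FoxP1_folding'):
    """Return one (seed, trajectory path) pair per replica (illustrative naming scheme)."""
    return [(i + 1, os.path.join(output_dir, f'{prefix}_rep{i + 1:02d}.dcd'))
            for i in range(n_replicas)]

def run_replica(seed, dcd_path, simulation_time_us, temperature=105.35,
                time_step_ps=0.002, report_frequency_ns=1):
    """Build the AA-SBM and run one replica, writing a DCD trajectory."""
    # Imported here so that the plan can be inspected without OpenMM installed
    import sbmOpenMM
    from simtk.openmm import LangevinIntegrator
    from simtk.openmm.app import Simulation, DCDReporter
    from simtk.unit import kelvin, picosecond, picoseconds

    sbmAA = sbmOpenMM.models.getAllAtomModel('input/FoxP_monomer.pdb',
                                             'input/FoxP_monomer.contacts')
    integrator = LangevinIntegrator(temperature*kelvin, 1.0/picosecond,
                                    time_step_ps*picoseconds)
    integrator.setRandomNumberSeed(seed)  # a different seed for each replica
    simulation = Simulation(sbmAA.topology, sbmAA.system, integrator)
    simulation.context.setPositions(sbmAA.positions)

    report_steps = round(report_frequency_ns*1000/time_step_ps)  # 1 ns = 1000 ps
    simulation.reporters.append(DCDReporter(dcd_path, report_steps))
    simulation.step(round(simulation_time_us*1e6/time_step_ps))  # 1 us = 1e6 ps

def main(n_replicas=15, simulation_time_us=10):
    """Run all replicas one after another; call main() to start the production."""
    os.makedirs('output', exist_ok=True)
    for seed, dcd_path in replica_plan(n_replicas):
        run_replica(seed, dcd_path, simulation_time_us)

# Inspect the plan without starting any simulation
for seed, dcd_path in replica_plan(3):
    print(seed, dcd_path)
```

Running the replicas sequentially keeps the sketch short; on a cluster, each replica would more likely be submitted as a separate job that calls `run_replica` with its own seed.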

We estimated that the mean first passage time of the FoxP1 AA-SBM system is about 0.5 $\mu s$. Therefore, to observe (un)folding transitions, we will run a trajectory of 10 $\mu s$, storing data every nanosecond. Note that this can take several hours on a computer with a GPU card.
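
The relation between trajectory length, mean first passage time and stored frames can be sketched as simple arithmetic. The function name `trajectory_summary` is hypothetical, and the example call uses the value assigned to `simulation_time` in the cell that follows. Dividing the trajectory length by the mean first passage time gives only a crude estimate of how many transitions to expect.

```python
def trajectory_summary(simulation_time_us, report_frequency_ns, mfpt_us):
    """Return (stored frames, trajectory length in units of the mean first passage time)."""
    n_frames = round(simulation_time_us * 1000 / report_frequency_ns)  # 1 us = 1000 ns
    n_mfpt = simulation_time_us / mfpt_us
    return n_frames, n_mfpt

n_frames, n_mfpt = trajectory_summary(simulation_time_us=10,
                                      report_frequency_ns=1,
                                      mfpt_us=0.5)
print(f'{n_frames} frames, about {n_mfpt:.0f} mean first passage times')
```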

In [6]:
simulation_time = 10 # in microseconds
simulation_steps = int(simulation_time*10**6/time_step)
simulation.step(simulation_steps)

#"Step","Potential Energy (kJ/mole)","Temperature (K)","Harmonic Bond Energy (kJ/mol)","Harmonic Angle Energy (kJ/mol)","Periodic Torsion Energy (kJ/mol)","Harmonic Improper Energy (kJ/mol)","Harmonic Planar Energy (kJ/mol)","LJ 12-6 Contact Energy (kJ/mol)","LJ 12 Repulsion Energy (kJ/mol)"
10000000,704.50927734375,105.91975198271487,321.4348449707031,370.83038330078125,138.8471221923828,43.38890838623047,99.09027099609375,-281.53662109375,12.454378128051758
20000000,689.5614013671875,109.04479182407756,341.6539306640625,342.840087890625,135.9015655517578,38.7005615234375,91.41425323486328,-276.3681640625,15.419159889221191
30000000,666.6947021484375,105.72978375180955,327.3387756347656,348.45654296875,123.92581176757812,45.102821350097656,89.2771224975586,-282.1001281738281,14.693770408630371
40000000,618.5307006835938,100.99039423768312,322.78076171875,326.7628479003906,119.70452880859375,28.615297317504883,94.22433471679688,-284.91094970703125,11.353912353515625
50000000,637.777954

470000000,735.4652099609375,101.39179719334318,349.4844970703125,374.66558837890625,136.86322021484375,35.269371032714844,87.47291564941406,-260.23590087890625,11.945472717285156
480000000,731.1075439453125,96.10681400540362,334.06805419921875,370.76605224609375,137.9110107421875,35.14996337890625,90.73611450195312,-256.0172424316406,18.493574142456055
490000000,674.511474609375,108.46853674042579,353.58575439453125,364.319580078125,125.49100494384766,33.95237731933594,85.13180541992188,-302.5509948730469,14.581952095031738
500000000,683.4622802734375,109.91984437508351,344.6604309082031,354.3466796875,127.36418151855469,42.29878616333008,86.1268081665039,-286.04046630859375,14.705879211425781
510000000,689.7012939453125,109.384612559037,349.81201171875,356.9176025390625,130.52349853515625,34.13750457763672,86.47361755371094,-283.9063415527344,15.743390083312988
520000000,673.4761352539062,111.20270166290524,334.74969482421875,334.46331787109375,145.59835815429688,36.035343170166016,78

Exception: Error downloading array energySum: clEnqueueReadBuffer (-5)

We will use mdtraj to visualize the progression of the trajectory. We will plot the radius of gyration to see the folded and unfolded states of the system clearly.

In [None]:
import mdtraj as md
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Define input for analysis
topology = 'input/FoxP_monomer.pdb'
trajectory_file = 'output/FoxP1_folding_traj.dcd'

# Load trajectory into MDtraj
traj = md.load(trajectory_file, top=topology)

In [None]:
rg = md.compute_rg(traj) # Radius of gyration of each frame (in nm)
time = np.arange(1, traj.n_frames+1)*report_frequency # Frame times in ns
plt.plot(time, rg)
plt.xlabel('Time (ns)')
plt.ylabel('Radius of gyration (nm)')
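
For reference, the unweighted radius of gyration of a frame with $N$ atoms is $R_g = \sqrt{\frac{1}{N}\sum_i |\mathbf{r}_i - \mathbf{r}_c|^2}$, where $\mathbf{r}_c$ is the geometric center. The NumPy sketch below implements this definition on a toy array. The function `radius_of_gyration` is illustrative; it should agree with `md.compute_rg(traj)` with default (equal) masses when applied to `traj.xyz`, but treat that equivalence as an assumption to verify on your own data.

```python
import numpy as np

def radius_of_gyration(xyz):
    """Unweighted radius of gyration per frame; xyz has shape (n_frames, n_atoms, 3)."""
    center = xyz.mean(axis=1, keepdims=True)     # geometric center of each frame
    sq_dist = ((xyz - center)**2).sum(axis=2)    # squared distance of each atom to the center
    return np.sqrt(sq_dist.mean(axis=1))

# Toy frame: two atoms placed one unit on either side of the origin
toy = np.array([[[-1.0, 0.0, 0.0],
                 [ 1.0, 0.0, 0.0]]])
print(radius_of_gyration(toy))
```

A compact (folded) conformation gives a small value and an extended (unfolded) one a larger value, which is why this quantity separates the two states in the plot.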

We observe...