Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [1]:
NAME = "Damir"
COLLABORATORS = ""

---

# Introduction to Folding

Any folding algorithm requires…


- …a search strategy, an algorithm to generate many candidate structures (or decoys) and…


- …a scoring function to discriminate near-native structures from all the others.


In this workshop you will write your own Monte Carlo protein folding algorithm from scratch, and we will explore a couple of the tricks used by Simons et al. (1997, 1999) to speed up the folding search.


## Suggested Readings
1. K. T. Simons et al., “Assembly of Protein Structures from Fragments,” *J. Mol. Biol.*
268, 209-225 (1997).
2. K. T. Simons et al., “Improved recognition of protein structures,” *Proteins* 34, 82-95
(1999).
3. Chapter 4 (Monte Carlo methods) of M. P. Allen & D. J. Tildesley, *Computer
Simulation of Liquids*, Oxford University Press, 1989.

**Chapter contributors:**

- Kathy Le (Johns Hopkins University); this chapter was adapted from the [PyRosetta book](https://www.amazon.com/PyRosetta-Interactive-Platform-Structure-Prediction-ebook/dp/B01N21DRY8) (J. J. Gray, S. Chaudhury, S. Lyskov, J. Labonte).

# Basic Folding Algorithm
Keywords: pose_from_sequence(), random move, scoring move, Metropolis, assign(), Pose()

In [2]:
# You have to rerun this each time you start a new notebook or do a "factory reset".
import sys
if 'google.colab' in sys.modules:
    !pip install pyrosettacolabsetup
    import pyrosettacolabsetup
    pyrosettacolabsetup.mount_pyrosetta_install() #Instead of pyrosettacolabsetup.setup
    print ("Notebook is set for PyRosetta use in Colab.  Have fun!")



Drive already mounted at /content/google_drive; to attempt to forcibly remount, call drive.mount("/content/google_drive", force_remount=True).
Notebook is set for PyRosetta use in Colab.  Have fun!


In [3]:
from pyrosetta import *
from pyrosetta.teaching import *
init()

PyRosetta-4 2021 [Rosetta PyRosetta4.MinSizeRel.python37.ubuntu 2021.21+release.882e5c1ab85c8c251fce4eb3e1e0504af590786a 2021-05-26T14:40:53] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
core.init: Checking for fconfig files in pwd and ./rosetta/flags
core.init: Rosetta version: PyRosetta4.MinSizeRel.python37.ubuntu r284 2021.21+release.882e5c1ab85 882e5c1ab85c8c251fce4eb3e1e0504af590786a http://www.pyrosetta.org 2021-05-26T14:40:53
core.init: command: PyRosetta -ex1 -ex2aro -database /content/google_drive/MyDrive/prefix/lib/python3.7/site-packages/pyrosetta/database
basic.random.init_random_generator: 'RNG device' seed mode, using '/dev/urandom', seed=-752181500 seed_offset=0 real_seed=-752181500
basic.random.init_random_generator: RandomGenerator:init: Normal mode, seed=-752181500 RG_type=mt19937


In [4]:
%cd google_drive/MyDrive/temp_pyrbc_202103_notebooks/

/content/google_drive/MyDrive/temp_pyrbc_202103_notebooks


## Building the Pose

In this workshop, you will be folding a 10 residue protein by building a simple de novo folding algorithm. Start by initializing PyRosetta as usual.

Create a simple poly-alanine `pose` with 10 residues for testing your folding algorithm. Store the pose in a variable called "polyA."

In [5]:
polyA = pose_from_sequence("A"*10)

polyA.pdb_info().name("polyA")

core.chemical.GlobalResidueTypeSet: Finished initializing fa_standard residue type set.  Created 984 residue types
core.chemical.GlobalResidueTypeSet: Total time to initialize 1.67199 seconds.


__Question:__
Check the backbone dihedrals of a few residues (except the first and last) using the `.phi()` and `.psi()` methods in `Pose`. What are the values of $\phi$ and $\psi$ dihedrals? You should see ideal bond lengths and angles, but the dihedrals may not be as realistic.

In [6]:
print("PSI: ", polyA.psi(3))
print("PHI: ", polyA.phi(3))

PSI:  180.0
PHI:  180.0


OPTIONAL:
We may want to visualize folding as it happens. Before starting with the folding protocol, instantiate a PyMOL mover and setup the udp-bridge daemon with your secret phrase. In the PyMOL mover, we will want to retain history so we can view the entire folding process; to retain the histroy run the mover's `.keep_history()` method.

In [7]:
import os
if os.getenv("DEBUG"): sys.exit(0)
import pyrosetta.network

# make sure to supply unique secret here and use _the same_ secrete when running PyMOL-Rosetta-relay-client.python3.py in PyMOL on your desktop machine
pyrosetta.network.start_udp_to_tcp_bridge_daemon(secret='secret_code')



UDP server started: localhost:65000...


In [8]:
pmm = PyMOLMover()
pmm.keep_history(True)


Use the PyMOL mover to view the `polyA` `Pose`. You should see a long thread-like structure in PyMOL.

In [9]:
pmm.apply(polyA)

Connected to relay.graylab.jhu.edu 128.220.208.35:9989...


## Building A Basic *de Novo* Folding Algorithm

Now, write a program that implements a Monte Carlo algorithm to optimize the protein conformation. 

Our main program will include 100 iterations of making a random trial move, scoring the protein, and accepting/rejecting the move. Therefore, we can break this algorithm down into three smaller subroutines: **random, score, and decision.**

### Step 1: Random Move

For the **random** trial move, write a subroutine to choose one residue at random using `random.randint()` and then randomly perturb either the φ or ψ angles by a random number chosen from a Gaussian distribution. Use the Python built-in function `random.gauss()` from the `random` library with a mean of the current angle and a standard deviation of 25°. After changing the torsion angle, use `pmm.apply(polyA)` to update the structure in PyMOL.

In [10]:
import math
import random

def randTrial(your_pose):

    random_num = random.randint(2, your_pose.total_residue())
    curr_psi = your_pose.psi(random_num)
    curr_phi = your_pose.phi(random_num)

    new_psi = random.gauss(curr_psi, 25)
    new_phi = random.gauss(curr_phi, 25)

    your_pose.set_psi(random_num, new_psi)
    your_pose.set_phi(random_num, new_phi)

    pmm.apply(your_pose)

    return your_pose


### Step 2: Scoring Move

For the **scoring** step, we need to create a scoring function and make a subroutine that simply returns the numerical energy score of the pose.

In [11]:
sfxn = get_fa_scorefxn()

def score(your_pose):
    return sfxn(your_pose)

core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015
core.scoring.etable: Starting energy table calculation
core.scoring.etable: smooth_etable: changing atr/rep split to bottom of energy well
core.scoring.etable: smooth_etable: spline smoothing lj etables (maxdis = 6)
core.scoring.etable: smooth_etable: spline smoothing solvation etables (max_dis = 6)
core.scoring.etable: Finished calculating energy tables.
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBPoly1D.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBFadeIntervals.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBEval.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/DonStrength.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/AccStrength.csv
basic.io.database: Database file opened: scoring/score_functions/rama/fd/

### Step 3: Accepting/Rejecting Move
For the **decision** step, we need to make a subroutine that either accepts or rejects the new conformatuon based on the Metropolis criterion. The Metropolis criterion has a probability of accepting a move as $P = \exp( -\Delta G / kT )$. When $ΔE ≥ 0$, the Metropolis criterion probability of accepting the move is $P = \exp( -\Delta G / kT )$. When $ΔE < 0$, the Metropolis criterion probability of accepting the move is $P = 1$. Use $kT = 1$ Rosetta Energy Unit (REU).

In [12]:
def decision(before_pose, after_pose):
    score_before = score(before_pose)
    score_after = score(after_pose)
    delta_E = score_after - score_before
    P = math.exp(-delta_E / 1)

    if delta_E < 0:
      return after_pose
    elif random.uniform(0, 1) > P:
      return before_pose
    else:
      return after_pose

### Step 4: Execution
Now we can put these three subroutines together in our main program! Write a loop in the main program so that it performs 100 iterations of: making a random trial move, scoring the protein, and accepting/rejecting the move. 

After each iteration of the search, output the current pose energy and the lowest energy ever observed. **The final output of this program should be the lowest energy conformation that is achieved at *any* point during the simulation.** Be sure to use `low_pose.assign(pose)` rather than `low_pose = pose`, since the latter will only copy a pointer to the original pose.

In [13]:
def basic_folding(your_pose):
    """Your basic folding algorithm that completes 100 Monte-Carlo iterations on a given pose"""
    
    lowest_pose = Pose() # create an empty pose for tracking the lowest energy pose.
    lowest_pose.assign(your_pose) # assign the first move to the pose

    for i in range(100):
      # keep track of the before pose
      before_pose = Pose().assign(your_pose) 

      # applying the random move to the pose
      # deciding whether to keep the change or not
      after_pose = Pose().assign(randTrial(your_pose))
      your_pose.assign(decision(before_pose, after_pose))

      if score(your_pose) < score(lowest_pose):
        lowest_pose.assign(your_pose)

      print("Iteration #{} | score of before: {} | score of after: {}".format(i, score(before_pose), score(after_pose)))
    
    return lowest_pose


Finally, output the last pose and the lowest-scoring pose observed and view them in PyMOL. Plot the energy and lowest-energy observed vs. cycle number. What are the energies of the initial, last, and lowest-scoring pose? Is your program working? Has it converged to a good solution?


In [14]:
%%timeit

lowest_pose = basic_folding(polyA)
pmm.apply(lowest_pose)

basic.io.database: Database file opened: scoring/score_functions/elec_cp_reps.dat
core.scoring.elec.util: Read 40 countpair representative atoms
core.pack.dunbrack.RotamerLibrary: shapovalov_lib_fixes_enable option is true.
core.pack.dunbrack.RotamerLibrary: shapovalov_lib::shap_dun10_smooth_level of 1( aka lowest_smooth ) got activated.
core.pack.dunbrack.RotamerLibrary: Binary rotamer library selected: /content/google_drive/MyDrive/prefix/lib/python3.7/site-packages/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin
core.pack.dunbrack.RotamerLibrary: Using Dunbrack library binary file '/content/google_drive/MyDrive/prefix/lib/python3.7/site-packages/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin'.
core.pack.dunbrack.RotamerLibrary: Dunbrack 2010 library took 0.289863 seconds to load from binary
Iteration #0 | score of before: 29.96724428561499 | score of after: 27.936081503798377
Iteration #1 | score of before: 27.936081503798377 | score of a

Here's an example of the PyMOL view:

In [15]:
# note that it seems that colab cannot display gifs; just open this gif directly
# from IPython.display import Image
# Image('./Media/folding.gif',width='300')

### Exercise 1: Comparing to Alpha Helices
Using the program you wrote for Workshop #2, force the $A_{10}$ sequence into an ideal α-helix.

**Questions:** Does this helical structure have a lower score than that produced by your folding algorithm above? What does this mean about your sampling or discrimination?

In [16]:
polyB = Pose().assign(polyA)

def convert_a_helix(your_pose):

  for i in range(1, your_pose.total_residue() + 1):
    your_pose.set_psi(i, -60)
    your_pose.set_phi(i, -160)
  
  return your_pose

polyB = convert_a_helix(polyB)

pmm.apply(polyB)

### Exercise 2: Optimizing Algorithm 
Since your program is a stochastic search algorithm, it may not produce an ideal structure consistently, so try running the simulation multiple times or with a different number of cycles (if necessary). Using a kT of 1, your program may need to make up to 500,000 iterations.

# Low-Res Scoring and Fragments
Keywords: centroid, SwitchResidueTypeSetMover(), create_score_function(), score3, fa_standard, ScoreFunction(), set_weight(), read_fragment_file(), ClassicFragmentMover()

## Low-Resolution (Centroid) Scoring


Following the treatment of Simons *et al.* (1999), Rosetta can score a protein conformation using a low-resolution representation. This will make the energy calculation faster.

Load chain A of Ras, a protein from a the previous workshop 3. Also calculate the full-atom energy of the pose.

```
pose = pyrosetta.pose_from_pdb("6Q21_A.pdb")
sfxn = pyrosetta.get_score_function()
sfxn(pose)
```

In [17]:
pose = pyrosetta.pose_from_pdb("inputs/6Q21_A.pdb")
sfxn = pyrosetta.get_score_function()
sfxn(pose)

core.import_pose.import_pose: File 'inputs/6Q21_A.pdb' automatically determined to be of type PDB
core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015


1215.729070042399

**Question:** Print residue 5. Note the number of atoms and coordinates of residue 5.

```
print(pose.residue(5))
```

In [18]:
print(pose.residue(5))

Residue 5: LYS (LYS, K):
Base: LYS
 Properties: POLYMER PROTEIN CANONICAL_AA POLAR CHARGED POSITIVE_CHARGE METALBINDING SIDECHAIN_AMINE ALPHA_AA L_AA
 Variant types:
 Main-chain atoms:  N    CA   C  
 Backbone atoms:    N    CA   C    O    H    HA 
 Side-chain atoms:  CB   CG   CD   CE   NZ  1HB  2HB  1HG  2HG  1HD  2HD  1HE  2HE  1HZ  2HZ  3HZ 
Atom Coordinates:
   N  : 20.315, 43.835, 78.015
   CA : 20.418, 42.863, 79.118
   C  : 19.697, 43.46, 80.329
   O  : 20.096, 44.486, 80.897
   CB : 21.858, 42.487, 79.491
   CG : 22.791, 42.176, 78.316
   CD : 22.406, 40.943, 77.485
   CE : 23.009, 40.932, 76.075
   NZ : 22.748, 42.169, 75.307
   H  : 21.0493, 44.5172, 77.8902
   HA : 19.9193, 41.9417, 78.815
  1HB : 22.3125, 43.3019, 80.0551
  2HB : 21.8492, 41.6078, 80.1356
  1HG : 22.8124, 43.0262, 77.6332
  2HG : 23.8008, 42.0064, 78.6884
  1HD : 22.7418, 40.0399, 77.9965
  2HD : 21.3219, 40.8985, 77.3807
  1HE : 24.088, 40.801, 76.1421
  2HE : 22.5982, 40.0953, 75.5101
  1HZ : 23.1708, 42

### SwitchResidueTypeSetMover

Now, convert the `pose` to the centroid form by using a `SwitchResidueTypeSetMover` object and the apply method:

```
switch = SwitchResidueTypeSetMover("centroid")
switch.apply(pose)
print(pose.residue(5))
```

**Question:** How many atoms are now in residue 5? How is this different than before switching it into centroid mode?

In [19]:
switch = SwitchResidueTypeSetMover("centroid")
switch.apply(pose)
print(pose.residue(5))

core.chemical.GlobalResidueTypeSet: Finished initializing centroid residue type set.  Created 69 residue types
core.chemical.GlobalResidueTypeSet: Total time to initialize 0.092189 seconds.
Residue 5: LYS (LYS, K):
Base: LYS
 Properties: POLYMER PROTEIN CANONICAL_AA POLAR CHARGED POSITIVE_CHARGE SIDECHAIN_AMINE ALPHA_AA L_AA
 Variant types:
 Main-chain atoms:  N    CA   C  
 Backbone atoms:    N    CA   C    O    H  
 Side-chain atoms:  CB   CEN
Atom Coordinates:
   N  : 20.315, 43.835, 78.015
   CA : 20.418, 42.863, 79.118
   C  : 19.697, 43.46, 80.329
   O  : 20.096, 44.486, 80.897
   CB : 21.8754, 42.543, 79.454
   CEN: 23.4957, 41.1851, 79.3707
   H  : 21.0493, 44.5172, 77.8902
Mirrored relative to coordinates in ResidueType: FALSE



Score the new, centroid-based pose by creating and using the standard centroid score function "score3".

```
cen_sfxn = pyrosetta.create_score_function("score3")
cen_sfxn(pose)
```

**Question:** What is the new total score? What scoring terms are included in "score3" (`print` the `cen_sfxn`)? Do these match Simons?

In [20]:
cen_sfxn = pyrosetta.create_score_function("score3")
cen_sfxn(pose)

basic.io.database: Database file opened: scoring/score_functions/EnvPairPotential/env_log.txt
basic.io.database: Database file opened: scoring/score_functions/EnvPairPotential/cbeta_den.txt
basic.io.database: Database file opened: scoring/score_functions/EnvPairPotential/pair_log.txt
basic.io.database: Database file opened: scoring/score_functions/EnvPairPotential/cenpack_log.txt
basic.io.database: Database file opened: scoring/score_functions/SecondaryStructurePotential/phi.theta.36.HS.resmooth
basic.io.database: Database file opened: scoring/score_functions/SecondaryStructurePotential/phi.theta.36.SS.resmooth


-4.509358254597608

Convert the `pose` back to all-atom form by using another switch object, `SwitchResidueTypeSetMover("fa_standard")`.

```
fa_switch = SwitchResidueTypeSetMover("fa_standard")
fa_switch.apply(pose)
print(pose.residue(5))
```

**Question:** Confirm that you have all the atoms back. Are the atoms in the same coordinate position as before?

In [21]:
fa_switch = SwitchResidueTypeSetMover("fa_standard")
fa_switch.apply(pose)
print(pose.residue(5))

Residue 5: LYS (LYS, K):
Base: LYS
 Properties: POLYMER PROTEIN CANONICAL_AA POLAR CHARGED POSITIVE_CHARGE METALBINDING SIDECHAIN_AMINE ALPHA_AA L_AA
 Variant types:
 Main-chain atoms:  N    CA   C  
 Backbone atoms:    N    CA   C    O    H    HA 
 Side-chain atoms:  CB   CG   CD   CE   NZ  1HB  2HB  1HG  2HG  1HD  2HD  1HE  2HE  1HZ  2HZ  3HZ 
Atom Coordinates:
   N  : 20.315, 43.835, 78.015
   CA : 20.418, 42.863, 79.118
   C  : 19.697, 43.46, 80.329
   O  : 20.096, 44.486, 80.897
   CB : 21.8754, 42.5429, 79.4539
   CG : 22.8944, 43.244, 78.5655
   CD : 22.2113, 44.1202, 77.5262
   CE : 20.6967, 44.0574, 77.6573
   NZ : 20.2706, 43.1587, 78.7642
   H  : 21.0493, 44.5172, 77.8902
   HA : 19.9306, 41.9373, 78.8101
  1HB : 22.0814, 42.8255, 80.4867
  2HB : 22.0409, 41.4686, 79.3693
  1HG : 23.5468, 43.8654, 79.1801
  2HG : 23.5056, 42.4999, 78.0558
  1HD : 22.5361, 45.154, 77.6512
  2HD : 22.4937, 43.7874, 76.5274
  1HE : 20.3065, 45.0564, 77.8461
  2HE : 20.2655, 43.6934, 76.7248
  1

### Exercise 1: Centroid Folding Algorithm
Go back and adjust your folding algorithm to use centroid mode. Create a `ScoreFunction` that uses only van der Waals (`fa_atr` and `fa_rep`) and `hbond_sr_bb` energy score terms. 

**Question:** How much faster does your program run?

In [22]:
polyA = pyrosetta.pose_from_sequence('A' * 10)
polyA.pdb_info().name("polyA")

# Apply the SwitchResidueTypeSetMover to the pose polyA
switch = SwitchResidueTypeSetMover("centroid")
switch.apply(polyA)

In [23]:
# Create new score function with only VDW and hbond_sr_bb energy score terms.
sfxn2 = pyrosetta.create_score_function("score3")
sfxn2.set_weight(vdw, 1.0)
sfxn2.set_weight(hbond_sr_bb, 1.0)

print(sfxn2.weights())

sfxn2(polyA)

( fa_atr; 0) ( fa_rep; 0) ( fa_sol; 0) ( fa_intra_atr; 0) ( fa_intra_rep; 0) ( fa_intra_sol; 0) ( fa_intra_atr_xover4; 0) ( fa_intra_rep_xover4; 0) ( fa_intra_sol_xover4; 0) ( fa_intra_atr_nonprotein; 0) ( fa_intra_rep_nonprotein; 0) ( fa_intra_sol_nonprotein; 0) ( fa_intra_RNA_base_phos_atr; 0) ( fa_intra_RNA_base_phos_rep; 0) ( fa_intra_RNA_base_phos_sol; 0) ( fa_atr_dummy; 0) ( fa_rep_dummy; 0) ( fa_sol_dummy; 0) ( fa_vdw_tinker; 0) ( lk_hack; 0) ( lk_ball; 0) ( lk_ball_wtd; 0) ( lk_ball_iso; 0) ( lk_ball_bridge; 0) ( lk_ball_bridge_uncpl; 0) ( coarse_fa_atr; 0) ( coarse_fa_rep; 0) ( coarse_fa_sol; 0) ( coarse_beadlj; 0) ( mm_lj_intra_rep; 0) ( mm_lj_intra_atr; 0) ( mm_lj_inter_rep; 0) ( mm_lj_inter_atr; 0) ( mm_twist; 0) ( mm_bend; 0) ( mm_stretch; 0) ( lk_costheta; 0) ( lk_polar; 0) ( lk_nonpolar; 0) ( lk_polar_intra_RNA; 0) ( lk_nonpolar_intra_RNA; 0) ( fa_elec; 0) ( fa_elec_bb_bb; 0) ( fa_elec_bb_sc; 0) ( fa_elec_sc_sc; 0) ( fa_intra_elec; 0) ( h2o_hbond; 0) ( dna_dr; 0) ( dna_b

53.95686106083425

In [26]:
%%timeit

# Use the basic_folding function in the previous chapter,
# overwrite your scoring subroutine, and run the program.

def score_ca(your_pose):
  return sfxn2(your_pose)

def decision(before_pose, after_pose):
    score_before = score_ca(before_pose)
    score_after = score_ca(after_pose)
    delta_E = score_after - score_before
    P = math.exp(-delta_E / 1)

    if delta_E < 0:
      return after_pose
    elif random.uniform(0, 1) > P:
      return before_pose
    else:
      return after_pose

def basic_folding(your_pose):
    """Your basic folding algorithm that completes 100 Monte-Carlo iterations on a given pose"""
    
    lowest_pose = Pose() # create an empty pose for tracking the lowest energy pose.
    lowest_pose.assign(your_pose) # assign the first move to the pose

    for i in range(100):
      # keep track of the before pose
      before_pose = Pose().assign(your_pose) 

      # applying the random move to the pose
      # deciding whether to keep the change or not
      after_pose = Pose().assign(randTrial(your_pose))
      your_pose.assign(decision(before_pose, after_pose))

      if score_ca(your_pose) < score_ca(lowest_pose):
        lowest_pose.assign(your_pose)

      print("Iteration #{} | score of before: {} | score of after: {}".format(i, score_ca(before_pose), score_ca(after_pose)))
    
    return lowest_pose

pose_after = basic_folding(polyA)
pmm.apply(pose_after)

Iteration #0 | score of before: 52.94752900763305 | score of after: 52.18930255311353
Iteration #1 | score of before: 52.18930255311353 | score of after: 52.21480930509638
Iteration #2 | score of before: 52.21480930509638 | score of after: 51.42326984702617
Iteration #3 | score of before: 51.42326984702617 | score of after: 49.68138853711923
Iteration #4 | score of before: 49.68138853711923 | score of after: 49.715040555143176
Iteration #5 | score of before: 49.715040555143176 | score of after: 49.86463079047528
Iteration #6 | score of before: 49.86463079047528 | score of after: 50.185403391238424
Iteration #7 | score of before: 50.185403391238424 | score of after: 51.06102241463212
Iteration #8 | score of before: 51.06102241463212 | score of after: 51.68450873851033
Iteration #9 | score of before: 51.06102241463212 | score of after: 49.732631533156834
Iteration #10 | score of before: 49.732631533156834 | score of after: 49.70606585315153
Iteration #11 | score of before: 49.70606585315

### Note about `Movers`

Not counting the `PyMOLMover`, which is a special case, `SwitchResidueTypeSetMover` is the first example we have seen of a `Mover` class in PyRosetta. Every `Mover` object in PyRosetta has been designed to apply specific and complex changes (or “moves”) to a `pose`. Every `Mover` must be “constructed” and have any options set before being applied to a `pose` with the `apply()` method. `SwitchResidueTypeSetMover` has a relatively simple construction with only the single option `"centroid"`. (Some `Movers`, as we shall see, require no options and are programmed to operate with default values).

## Protein Fragments


Look at the provided `3mer.frags` fragments. These fragments are generated from the Robetta server (http://robetta.bakerlab.org/fragmentsubmit.jsp) for a given sequence. You should see sets of three-lines describing each fragment.

**Questions:** For the first fragment, which PDB file does it come from? Is this fragment helical, sheet, in a loop, or a combination? What are the φ, ψ, and ω angles of the middle residue of the first fragment window?

Create a new subroutine in your folding code for an alternate random move based upon a “fragment insertion”. A fragment insertion is the replacement of the torsion angles for a set of consecutive residues with new torsion angles pulled at random from a fragment library file. Prior to calling the subroutine, load the set of fragments from the fragment file:

```
from pyrosetta.rosetta.core.fragment import *
fragset = ConstantLengthFragSet(3)
fragset.read_fragment_file("3mer.frags")
```

In [30]:
from pyrosetta.rosetta.core.fragment import *
fragset = ConstantLengthFragSet(3)
fragset.read_fragment_file("inputs/3mer.frags")


core.fragments.ConstantLengthFragSet: finished reading top 200 3mer fragments from file inputs/3mer.frags


### Using FragmentMover and MoveMap

Next, we will construct another `Mover` object — this time a `FragmentMover` — using the above fragment set and a `MoveMap` object as options. A `MoveMap` specifies which degrees of freedom are allowed to change in the `pose` when the `Mover` is applied (in this case, all backbone torsion angles):

```
from pyrosetta.rosetta.protocols.simple_moves import ClassicFragmentMover
movemap = MoveMap()
movemap.set_bb(True)
mover_3mer = ClassicFragmentMover(fragset, movemap)
```

In [31]:
from pyrosetta.rosetta.protocols.simple_moves import ClassicFragmentMover
movemap = MoveMap()
movemap.set_bb(True)
mover_3mer = ClassicFragmentMover(fragset, movemap)

Note that when a MoveMap is constructed, all degrees of freedom are set to False initially. If you still have a *PyMOL_Mover* instantiated, you can quickly visualize which degrees of freedom will be allowed by sending your move map to PyMOL with 

```
test_pose = pyrosetta.pose_from_sequence("RFPMMSTFKVLLCGAVLSRIDAG")
pmm.apply(test_pose)
pmm.send_movemap(test_pose, movemap)
```

In [32]:
test_pose = pyrosetta.pose_from_sequence("RFPMMSTFKVLLCGAVLSRIDAG")
pmm.apply(test_pose)
pmm.send_movemap(test_pose, movemap)

Each time this mover is applied, it will select a random 3-mer window and insert only the backbone torsion angles from a random matching fragment in the fragment set. Here is an example using the above `test_pose`:

```
mover_3mer.apply(test_pose)
pmm.apply(test_pose)
```

In [33]:
mover_3mer.apply(test_pose)
pmm.apply(test_pose)

### Exercise 2: Fragment Folding Algorithm
**Question:** When you change your random move in your poly-alanine folding algorithm to a fragment insertion, how much faster is your protocol? Does it converge to a protein-like conformation more quickly?

In [36]:
polyA = pose_from_sequence("A"*10)

polyA.pdb_info().name("polyA")

def randTrial(your_pose):

    mover_3mer.apply(test_pose)
    pmm.apply(your_pose)

    return your_pose

sfxn = get_fa_scorefxn()

def score(your_pose):
    return sfxn(your_pose)

def decision(before_pose, after_pose):
    score_before = score(before_pose)
    score_after = score(after_pose)
    delta_E = score_after - score_before
    P = math.exp(-delta_E / 1)

    if delta_E < 0:
      return after_pose
    elif random.uniform(0, 1) > P:
      return before_pose
    else:
      return after_pose

def basic_folding(your_pose):
    """Your basic folding algorithm that completes 100 Monte-Carlo iterations on a given pose"""
    
    lowest_pose = Pose() # create an empty pose for tracking the lowest energy pose.
    lowest_pose.assign(your_pose) # assign the first move to the pose

    for i in range(100):
      # keep track of the before pose
      before_pose = Pose().assign(your_pose) 

      # applying the random move to the pose
      # deciding whether to keep the change or not
      after_pose = Pose().assign(randTrial(your_pose))
      your_pose.assign(decision(before_pose, after_pose))

      if score(your_pose) < score(lowest_pose):
        lowest_pose.assign(your_pose)

      print("Iteration #{} | score of before: {} | score of after: {}".format(i, score(before_pose), score(after_pose)))
    
    return lowest_pose

after_pose = basic_folding(polyA)
pmm.apply(after_pose)

core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015
Iteration #0 | score of before: 29.96724428561499 | score of after: 29.96724428561499
Iteration #1 | score of before: 29.96724428561499 | score of after: 29.96724428561499
Iteration #2 | score of before: 29.96724428561499 | score of after: 29.96724428561499
Iteration #3 | score of before: 29.96724428561499 | score of after: 29.96724428561499
Iteration #4 | score of before: 29.96724428561499 | score of after: 29.96724428561499
Iteration #5 | score of before: 29.96724428561499 | score of after: 29.96724428561499
Iteration #6 | score of before: 29.96724428561499 | score of after: 29.96724428561499
Iteration #7 | score of before: 29.96724428561499 | score of after: 29.96724428561499
Iteration #8 | score of before: 29.96724428561499 | score of after: 29.96724428561499
Iteration #9 | score of before: 29.96724428561499 | score of after: 29.96724428561499
Iteration #10 | score of before: 29.96724428561499 | score of after: 29.9672442856

### Programming Exercises

- Fold a 10-mer poly-alanine using 100 independent trajectories, using any variant of the folding algorithm that you like. (A trajectory is a path through the conformation space traveled during the calculation. The end result of each independent trajectory is called a “decoy”. Given enough sampling, the lowest energy decoy may correspond to the global minimum.) Create a Ramachandran plot using the lowest-scoring conformations (decoys) from all 100 independent trajectories. Repeat this for a 10-mer poly-glycine. How do the plots differ? Compare with the plots in Richardson’s article.


- Test your folding program’s ability to predict a real fold from scratch. Choose a small protein to keep the computation time down, such as Hox-B1 homeobox protein (1B72) or RecA (2REB). How many iterations and how many independent trajectories do you need to run to find a good structure?


- Modify your folding program to include a simulated annealing temperature schedule, decaying exponentially from kT = 100 to kT = 0.1 over the course of the search. Again, fold a test protein. Does this approach work better?
Modify your folding program to remove the Metropolis criterion and instead accept trial moves only when the energy decreases. Plot energy vs. iteration and examine the final output structures from multiple runs. How is the convergence and performance affected? Why?


### Thought Questions

- **[Introductory]** What are the limitations of these types of folding algorithms?


- **[Advanced]** How might you design an intermediate-resolution representation of side chains that has more detail than the centroid approach yet is faster than the full-atom approach? Which types of residues would most benefit from this type of representation?