<!--NOTEBOOK_HEADER-->
*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta.notebooks);
content is available [on Github](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*

# Protein folding
Keywords: pose_from_sequence(), random move, scoring move, Metropolis, assign(), Pose()

Any folding algorithm requires…


- …a search strategy, an algorithm to generate many candidate structures (or decoys) and…


- …a scoring function to discriminate near-native structures from all the others.


In this workshop you will write your own Monte Carlo protein folding algorithm from scratch, and we will explore a couple of the tricks used by Simons et al. (1997, 1999) to speed up the folding search.


### Suggested Reading
1. A. V. Finkelstein _et al._, "Protein folding problem: enigma, paradox, solution," **Biophys. Rev.** 14, 1255-1272 (2022).

### Other Suggested Readings
2. K. T. Simons _et al._, "Assembly of Protein Structures from Fragments," **J. Mol. Biol.** 268, 209-225 (1997).
3. K. T. Simons _et al._, "Improved recognition of protein structures," **Proteins** 34, 82-95 (1999).
4. Chapter 4 (Monte Carlo methods) of M. P. Allen & D. J. Tildesley, **Computer Simulation of Liquids**, Oxford University Press, 1989.

## Basic Folding Algorithm

In [1]:
# To run this notebook in colab
!pip install pyrosettacolabsetup
import pyrosettacolabsetup; pyrosettacolabsetup.install_pyrosetta()
import pyrosetta; pyrosetta.init()
from pyrosetta import *

Collecting pyrosettacolabsetup
  Downloading pyrosettacolabsetup-1.0.9-py3-none-any.whl.metadata (294 bytes)
Downloading pyrosettacolabsetup-1.0.9-py3-none-any.whl (4.9 kB)
Installing collected packages: pyrosettacolabsetup
Successfully installed pyrosettacolabsetup-1.0.9
Mounted at /content/google_drive

Note that USE OF PyRosetta FOR COMMERCIAL PURPOSES REQUIRE PURCHASE OF A LICENSE.
See https://github.com/RosettaCommons/rosetta/blob/main/LICENSE.md or email license@uw.edu for details.

Looking for compatible PyRosetta wheel file at google-drive/PyRosetta/colab.bin//wheels...
Found compatible wheel: /content/google_drive/MyDrive/PyRosetta/colab.bin/wheels//content/google_drive/MyDrive/PyRosetta/colab.bin/wheels/pyrosetta-2024.19+release.a34b73c40f-cp310-cp310-linux_x86_64.whl



In [None]:
# do not run if using colab
from pyrosetta import *
from pyrosetta.teaching import *
init()
import os
notebook_path = os.path.abspath("clase3-protein_folding.ipynb")

### Building the pose
In this workshop, you will be folding a 10 residue protein by building a simple de novo folding algorithm. Start by initializing PyRosetta as usual.

Create a simple poly-alanine `pose` with 10 residues for testing your folding algorithm. Store the pose in a variable called "polyA."

In [2]:
polyA = pyrosetta.pose_from_sequence('A' * 10)

polyA.pdb_info().name("polyA")

core.chemical.GlobalResidueTypeSet: Finished initializing fa_standard residue type set.  Created 985 residue types
core.chemical.GlobalResidueTypeSet: Total time to initialize 0.961967 seconds.


__Question:__
Check the backbone dihedrals of a few residues (except the first and last) using the `.phi()` and `.psi()` methods in `Pose`. What are the values of $\phi$ and $\psi$ dihedrals? You should see ideal bond lengths and angles, but the dihedrals may not be as realistic.

In [3]:
print("phi: %i" %polyA.phi(9))
print("psi: %i" %polyA.psi(9))

phi: 180
psi: 180
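To inspect several residues at once instead of one, a small loop helper can be handy. The following is a minimal sketch: `dihedral_table` and `print_dihedrals` are hypothetical names, and the code assumes only that the object passed in exposes `.phi(i)`, `.psi(i)`, and `.total_residue()`, the `Pose` methods already used in this notebook.

```python
def dihedral_table(pose, first=2, last=None):
    """Return [(residue, phi, psi), ...] for residues first..last (1-indexed, inclusive).

    Hypothetical helper: relies only on pose.phi(i), pose.psi(i), and
    pose.total_residue().
    """
    if last is None:
        last = pose.total_residue() - 1  # skip the last residue by default
    return [(i, pose.phi(i), pose.psi(i)) for i in range(first, last + 1)]


def print_dihedrals(pose, first=2, last=None):
    """Print one line per residue with its phi and psi in degrees."""
    for i, phi, psi in dihedral_table(pose, first, last):
        print("residue %2i  phi: %7.1f  psi: %7.1f" % (i, phi, psi))
```

Calling `print_dihedrals(polyA)` lists every interior residue, which makes it easy to check whether they all share the values printed for residue 9 above.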


We may want to visualize folding as it happens. Before starting the folding protocol, instantiate a PyMOL mover and use a UNIQUE port number between 10,000 and 65,535. We will retain the history in order to view the entire folding process by calling the `.keep_history()` method. Make sure PyMOL says `PyMOL <---> PyRosetta link started!` on its command line.

In [4]:
pmm = PyMOLMover()
pmm.keep_history(True)

Use the PyMOL mover to view the `polyA` `Pose`. You should see a long thread-like structure in PyMOL.

In [5]:
pmm.apply(polyA)

### Building A Basic *de Novo* Folding Algorithm

Now, write a program that implements a Monte Carlo algorithm to optimize the protein conformation. You can do this here in the notebook, or you may use a code editor to write a `.py` file and execute in a Python or iPython shell.  

Our main program will include 100 iterations of making a random trial move, scoring the protein, and accepting/rejecting the move. Therefore, we can break this algorithm down into three smaller subroutines: **random, score, and decision.**

#### Step 1: Random Move

For the **random** trial move, write a subroutine to choose one residue at random using `random.randint()` and then randomly perturb either the φ or ψ angles by a random number chosen from a Gaussian distribution. Use the Python built-in function `random.gauss()` from the `random` library with a mean of the current angle and a standard deviation of 25°. After changing the torsion angle, use `pmm.apply(polyA)` to update the structure in PyMOL.

Hint: Ask GPT-4

``` python
import random

def perturb_angle(angles):
    """Perturb φ or ψ of one random residue.

    `angles` is a list with one dict per residue, e.g. {'φ': 180.0, 'ψ': 180.0}.
    """
    # Choose a random residue (a 0-based list index here)
    residue_index = random.randint(0, len(angles) - 1)

    # Choose whether to perturb the φ or the ψ angle
    angle_choice = random.choice(['φ', 'ψ'])

    # Get the current angle
    current_angle = angles[residue_index][angle_choice]

    # Draw the perturbed angle from a Gaussian centered on the current angle
    mean = current_angle
    stddev = 25
    perturbed_angle = random.gauss(mean, stddev)

    # Update the angle in the angles list
    angles[residue_index][angle_choice] = perturbed_angle

    return angles

```

In [None]:
import math
import random

def randTrial(your_pose):
    randNum = random.randint(2, your_pose.total_residue())
    currPhi = your_pose.phi(randNum)
    currPsi = your_pose.psi(randNum)
    newPhi = random.gauss(currPhi, 25)
    newPsi = random.gauss(currPsi, 25)
    your_pose.set_phi(randNum,newPhi)
    your_pose.set_psi(randNum,newPsi)
    #pmm.apply(your_pose)
    return your_pose
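The instructions for Step 1 ask for a move that perturbs either φ or ψ, while the cell above perturbs both angles of the chosen residue. The sketch below shows one way to perturb a single torsion per move. `rand_trial_single` and `wrap_degrees` are hypothetical names; the code assumes only the `Pose` methods used above (`phi`, `psi`, `set_phi`, `set_psi`, `total_residue`). Wrapping the new angle into [-180, 180) is done here purely to keep printed values readable and is not presented as a PyRosetta requirement.

```python
import random


def wrap_degrees(angle):
    """Map any angle in degrees onto the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def rand_trial_single(pose, stddev=25.0, rng=random):
    """Perturb one randomly chosen torsion (phi OR psi) of one random residue.

    Hypothetical variant of randTrial: modifies `pose` in place and returns
    (residue, torsion_name, new_angle) so the move can be logged.
    """
    # Same residue range as randTrial above; randint includes both end points.
    res = rng.randint(2, pose.total_residue())
    if rng.random() < 0.5:
        new_angle = wrap_degrees(rng.gauss(pose.phi(res), stddev))
        pose.set_phi(res, new_angle)
        return res, "phi", new_angle
    new_angle = wrap_degrees(rng.gauss(pose.psi(res), stddev))
    pose.set_psi(res, new_angle)
    return res, "psi", new_angle
```

Passing a seeded generator, for example `rand_trial_single(polyA, rng=random.Random(0))`, makes a run reproducible, which helps when debugging the acceptance step.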

In [None]:
# Define phi/psi angles close to those of an alpha helix

import math
import random

def randTrial_alpha(your_pose):
    randNum = random.randint(2, your_pose.total_residue())
    currPhi = your_pose.phi(randNum)
    currPsi = your_pose.psi(randNum)
    newPhi = random.gauss(-60, 25)
    newPsi = random.gauss(-50, 25)
    #newPhi = random.gauss(currPhi, 25)
    #newPsi = random.gauss(currPsi, 25)
    your_pose.set_phi(randNum,newPhi)
    your_pose.set_psi(randNum,newPsi)
    #pmm.apply(your_pose)
    return your_pose

#### Step 2: Scoring Move

For the **scoring** step, we need to create a scoring function and make a subroutine that simply returns the numerical energy score of the pose.

In [None]:
sfxn = get_fa_scorefxn()

def score(your_pose):
    return sfxn(your_pose)

#### Step 3: Accepting/Rejecting Move
For the **decision** step, we need to make a subroutine that either accepts or rejects the new conformation based on the Metropolis criterion. Let $\Delta E$ be the change in score caused by the move. When $\Delta E \geq 0$, the probability of accepting the move is $P = \exp( -\Delta E / kT )$. When $\Delta E < 0$, the probability of accepting the move is $P = 1$. Use $kT = 1$ Rosetta Energy Unit (REU).

In [None]:
def decision(before_pose, after_pose):
    E = score(after_pose) - score(before_pose)
    if E < 0:
        return after_pose
    elif random.uniform(0, 1) >= math.exp(-E/1):
        return before_pose
    else:
        return after_pose
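To get a feel for the numbers, the acceptance rule can also be written on its own, without any pose. This is a minimal sketch in plain Python; `metropolis_probability` and `metropolis_accept` are hypothetical names, and `kT = 1` follows the text above.

```python
import math
import random


def metropolis_probability(delta_e, kT=1.0):
    """Acceptance probability: 1 for downhill moves, exp(-dE/kT) otherwise."""
    if delta_e < 0:
        return 1.0
    return math.exp(-delta_e / kT)


def metropolis_accept(delta_e, kT=1.0, rng=random):
    """Return True if a move with energy change delta_e is accepted."""
    return rng.random() < metropolis_probability(delta_e, kT)


# Tabulate the acceptance probability for a few energy changes (in REU).
for delta_e in (-1.0, 0.0, 1.0, 2.0, 5.0):
    print("dE = %4.1f REU  ->  P(accept) = %.4f"
          % (delta_e, metropolis_probability(delta_e)))
```

The table shows how quickly uphill moves become unlikely at $kT = 1$: a move that raises the score by a few REU is already accepted only rarely.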

#### Step 4: Execution
Now we can put these three subroutines together in our main program! Write a loop in the main program so that it performs 100 iterations of: making a random trial move, scoring the protein, and accepting/rejecting the move.

After each iteration of the search, output the current pose energy and the lowest energy ever observed. **The final output of this program should be the lowest energy conformation that is achieved at *any* point during the simulation.** Be sure to use `low_pose.assign(pose)` rather than `low_pose = pose`, since the latter will only copy a pointer to the original pose.
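The difference between copying data and copying a reference can be seen with a toy object. The sketch below does not use PyRosetta: `ToyPose` is a hypothetical stand-in that stores only a list of angles, and its `assign` method merely imitates the idea of `Pose.assign` described above.

```python
class ToyPose:
    """Hypothetical stand-in for Pose that stores only a list of torsion angles."""

    def __init__(self, angles=None):
        self.angles = list(angles or [])

    def assign(self, other):
        """Copy the other object's data into this one."""
        self.angles = list(other.angles)


pose = ToyPose([180.0, 180.0])

alias = pose            # plain assignment: a second name for the SAME object
snapshot = ToyPose()
snapshot.assign(pose)   # independent copy of the data

pose.angles[0] = -60.0  # modify the original afterwards

print(alias.angles[0])     # follows the change made to pose
print(snapshot.angles[0])  # keeps the value from the time of the copy
```

In the folding loop, a `low_pose` created by plain assignment would behave like `alias` and silently follow every later move, which is why the copy is needed.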

In [None]:
def basic_folding(your_pose, outputfile):
    """Basic folding algorithm: runs 100,000 Monte Carlo iterations on a given pose and writes the lowest-energy pose to outputfile."""

    lowest_pose = Pose() # Create an empty pose for tracking the lowest energy pose.

    for i in range(100000):
        if i == 0:
            lowest_pose.assign(your_pose)

        before_pose = Pose()
        before_pose.assign(your_pose) # keep track of pose before random move

        after_pose = Pose()
        after_pose.assign(randTrial(your_pose)) # do random move and store the pose

        your_pose.assign(decision(before_pose, after_pose)) # keep the new pose or old pose

        if score(your_pose) < score(lowest_pose): # updating lowest pose
            lowest_pose.assign(your_pose)
        if (i % 100==0):
            pmm.apply(your_pose)
            print("Iteration # %i" %i) # output
            print("Current pose score: %1.3f" %score(your_pose)) # output
            print("Lowest pose score: %1.3f" %score(lowest_pose)) # output
    dump_pdb(lowest_pose, outputfile)
    return lowest_pose

In [None]:
basic_folding(polyA, "polyA-freefold.pdb")

## Exercise 1: Comparing to Alpha Helices
Write a new program that nudges the $A_{10}$ sequence to fold into an ideal α-helix. Compare the final poses (the one from your previous program and the new one "nudged" into an α-helix) in PyMOL, along with their scores. Is your initial program working? Has it converged to a good solution?

In [None]:
from pyrosetta import *
from pyrosetta.teaching import *
init()

# create the initial pose for A_10
polyA = pyrosetta.pose_from_sequence('A' * 10)
polyA.pdb_info().name("polyA")

# start the PyMOL mover and send the initial pose
pmm = PyMOLMover()
pmm.keep_history(True)
pmm.apply(polyA)



# Define phi/psi angles close to those of an alpha helix (-60/-50)

import math
import random

def randTrial_alpha(your_pose):
    randNum = random.randint(2, your_pose.total_residue())
    currPhi = your_pose.phi(randNum)
    currPsi = your_pose.psi(randNum)
    newPhi = random.gauss(-60, 25)
    newPsi = random.gauss(-50, 25)
    #newPhi = random.gauss(currPhi, 25)
    #newPsi = random.gauss(currPsi, 25)
    your_pose.set_phi(randNum,newPhi)
    your_pose.set_psi(randNum,newPsi)
    #pmm.apply(your_pose)
    return your_pose

# Scoring step
sfxn = get_fa_scorefxn()

def score(your_pose):
    return sfxn(your_pose)

# Accepting/Rejecting Move
def decision(before_pose, after_pose):
    E = score(after_pose) - score(before_pose)
    if E < 0:
        return after_pose
    elif random.uniform(0, 1) >= math.exp(-E/1):
        return before_pose
    else:
        return after_pose

# Execution

def alpha_folding(your_pose, outputfile):
    """Helix-biased folding: runs 100,000 Monte Carlo iterations with moves drawn near alpha-helical phi/psi and writes the lowest-energy pose to outputfile."""

    lowest_pose = Pose() # Create an empty pose for tracking the lowest energy pose.

    for i in range(100000):
        if i == 0:
            lowest_pose.assign(your_pose)

        before_pose = Pose()
        before_pose.assign(your_pose) # keep track of pose before random move

        after_pose = Pose()

        after_pose.assign(randTrial_alpha(your_pose)) # do random move close to alpha helix and store the pose

        your_pose.assign(decision(before_pose, after_pose)) # keep the new pose or old pose

        if score(your_pose) < score(lowest_pose): # updating lowest pose
            lowest_pose.assign(your_pose)
        if (i % 100==0):
            pmm.apply(your_pose)
            print("Iteration # %i" %i) # output
            print("Current pose score: %1.3f" %score(your_pose)) # output
            print("Lowest pose score: %1.3f" %score(lowest_pose)) # output
    dump_pdb(lowest_pose, outputfile)
    return lowest_pose

In [None]:
alpha_folding(polyA, "polyA-alphafold.pdb")
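Besides looking at the structures in PyMOL and comparing scores, one rough way to judge how helical a result is would be to measure how far its dihedrals lie from the target values. The sketch below is plain Python; `angular_difference` and `helix_deviation` are hypothetical names, and the targets of -60° and -50° are simply the centers used in `randTrial_alpha`, not a claim about the one true ideal helix.

```python
def angular_difference(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return 360.0 - d if d > 180.0 else d


def helix_deviation(dihedrals, target_phi=-60.0, target_psi=-50.0):
    """Mean angular distance (degrees) of (phi, psi) pairs from the target values.

    `dihedrals` is a non-empty list of (phi, psi) tuples in degrees.
    """
    total = 0.0
    for phi, psi in dihedrals:
        total += angular_difference(phi, target_phi)
        total += angular_difference(psi, target_psi)
    return total / (2 * len(dihedrals))
```

For a pose `p`, the input list could be built as `[(p.phi(i), p.psi(i)) for i in range(2, p.total_residue())]`. A fully extended chain with all angles at 180° gives a deviation of 125°, so values much closer to zero indicate a more helical backbone.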

This script implements the Monte Carlo algorithm to fold the polyA peptide using PyRosetta's `MonteCarlo` and `SmallMover` objects. It takes about 10 minutes to run on a laptop.

```python
# import rosetta libraries and initialise
from pyrosetta import *
from pyrosetta.rosetta import *
from pyrosetta.teaching import *
init()


p = pose_from_pdb("mc_initial.pdb")

# set up score function
scorefxn = get_score_function(True)

#scorefxn.set_weight(hbond_lr_bb, 1.0)
#scorefxn.set_weight(vdw, 1.0)


# Set up simulation parameters. The MonteCarlo object is initialized with a pose
# to serve as the reference structure, a score function to calculate the energy,
# and the temperature used in the Metropolis criterion.
# When mc.boltzmann(pose) is called, the energy of the input pose is calculated
# with the score function and compared to the energy of the last accepted pose.
# If the move is accepted, the input pose remains unchanged and the last accepted
# pose stored in the MonteCarlo object is updated. If the move is rejected, the
# input pose is reset to the last accepted pose, which remains unchanged.

ncycles = 5000000
kT = 1.0
mc = MonteCarlo(p, scorefxn, kT)

# set up conformational search space
movemap = MoveMap()
movemap.set_bb(True)

#set up movers
small_mover = SmallMover(movemap, kT, 5)

# run simulation
for i in range(ncycles):
    small_mover.apply(p)
    mc.boltzmann(p)

# Output the lowest-energy structure. The MonteCarlo object keeps track of the
# lowest-energy structure it has assessed. It is usually recovered at the end of
# the simulation, and often intermittently during the simulation as well.
mc.recover_low(p)
dump_pdb(p, "mc_final.pdb")
```
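The bookkeeping described in the comments of the script (a last accepted state plus a lowest-energy state) can be imitated in a few lines of plain Python. This is only a sketch under stated assumptions: `ToyMonteCarlo` is a hypothetical class, not the PyRosetta one, the "pose" is a single number, and the energy is a made-up parabola with its minimum at x = 3.

```python
import math
import random


class ToyMonteCarlo:
    """Hypothetical stand-in for the bookkeeping described above (not the PyRosetta class)."""

    def __init__(self, state, energy_fn, kT, rng=random):
        self.energy_fn = energy_fn
        self.kT = kT
        self.rng = rng
        self.last_accepted = state
        self.last_energy = energy_fn(state)
        self.lowest = state
        self.lowest_energy = self.last_energy

    def boltzmann(self, trial):
        """Apply the Metropolis criterion and return the state to continue from."""
        e_trial = self.energy_fn(trial)
        delta = e_trial - self.last_energy
        if delta < 0 or self.rng.random() < math.exp(-delta / self.kT):
            # Accepted: the trial becomes the last accepted state.
            self.last_accepted, self.last_energy = trial, e_trial
            if e_trial < self.lowest_energy:
                self.lowest, self.lowest_energy = trial, e_trial
        # Rejected moves fall through and hand back the previous accepted state.
        return self.last_accepted

    def recover_low(self):
        """Return the lowest-energy state seen so far."""
        return self.lowest


def energy(x):
    """Made-up energy landscape with a single minimum at x = 3."""
    return (x - 3.0) ** 2


rng = random.Random(0)
mc = ToyMonteCarlo(0.0, energy, kT=1.0, rng=rng)
x = 0.0
for _ in range(2000):
    x = mc.boltzmann(x + rng.gauss(0.0, 0.5))  # small random move, then accept/reject

best = mc.recover_low()
print("lowest energy found: %.4f at x = %.3f" % (energy(best), best))
```

Unlike the PyRosetta object, which resets the input pose in place on rejection, this toy version returns the state to continue from, since plain numbers cannot be modified in place.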