# Protein and Genetic Engineering

### P3 - Protein folding and design

#### Introduction

Protein folding and design are two sides of the same coin. Folding is the problem of finding the path to the native structure for a fixed sequence; design is the problem of finding a sequence that adopts a specific fold. Although the two problems are conceptually related, they are tackled very differently in practice. Even so, they share the same basic recipe: an optimization algorithm that perturbs a protein structure (or sequence) and evaluates each perturbation with a score function.

In this practice session, we will write two algorithms to generate trajectories for the protein folding problem and a protein design method that optimizes a given structure in the protein sequence space. 


#### Importing and initializing Rosetta

First, we start by importing the library's content in our Jupyter notebook:

In [None]:
from pyrosetta import *
from pyrosetta.teaching import *
init()

### Create an extended pose from sequence

We start by creating a 10-residue alanine-only Pose: 

In [None]:
# Create a 10 residue poly A Pose
polyA = pyrosetta.pose_from_sequence('A' * 10)

# Give the name of polyA to our Pose
polyA.pdb_info().name("polyA")

We are going to check the values of ϕ and ψ dihedrals in this newly created Pose (from a sequence), for all the residues in it:

In [None]:
for i in range( 1 , polyA.total_residue()  + 1 ):
    print("Residue %s, phi: %.1f" % (i, polyA.phi(i)))
    print("Residue %s, psi: %.1f" % (i, polyA.psi(i)))
    print()

### Create the Pymol mover 

We are going to create a Pymol mover to be able to visualize our folding algorithm:

In [None]:
# Create Pymol Mover instance
pymol_mover = PyMOLMover()
pymol_mover.keep_history(True) # Keep the history of all we send as different frames in Pymol

We now send our Pose to Pymol:

In [None]:
pymol_mover.apply(polyA)

### Building a simple folding algorithm

In previous practical sessions, we wrote a random mover to perturb the backbone torsions of our Pose; we are going to use this method together with the MonteCarlo sampler from our previous session:

In [None]:
import numpy as np

In [None]:
def perturb_random_angle(pose, max_rot=25):
    
    # Define the perturbation magnitude
    magnitude = np.random.uniform(low=-max_rot, high=max_rot)
    
    # Choose at random whether to perturb phi or psi
    angle = np.random.choice(['phi', 'psi'])
    
    # Choose a random residue to perturb
    residues = range( 1 , pose.total_residue()  + 1 )
    residue = np.random.choice(residues)
    
    # Perturb the selected angle by the defined magnitude
    if angle == 'phi':
        orig_phi = pose.phi(residue)
        new_phi = orig_phi+magnitude
        
        # Keep phi value in the -180 to 180 range
        if new_phi > 180:
            new_phi -= 360
        elif new_phi <= -180:
            new_phi += 360
        pose.set_phi(residue, new_phi)
        
    elif angle == 'psi':
        orig_psi = pose.psi(residue)
        new_psi = orig_psi+magnitude
        
        # Keep psi value in the -180 to 180 range
        if new_psi > 180:
            new_psi -= 360
        elif new_psi <= -180:
            new_psi += 360
        pose.set_psi(residue, new_psi)
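
The two range checks above assume that a single perturbation never moves an angle by more than one full turn, which holds for `max_rot=25`. As a minimal sketch, a hypothetical helper `wrap_angle` (our own name, not part of PyRosetta) can fold any angle into the (-180, 180] interval with modular arithmetic:

```python
def wrap_angle(angle):
    """Fold an angle in degrees into the (-180, 180] interval."""
    wrapped = (angle + 180.0) % 360.0 - 180.0
    # The modulo maps +180 to -180; move it back so the interval is open at -180
    return 180.0 if wrapped == -180.0 else wrapped

for value in (190.0, -190.0, 180.0, -180.0, 545.0):
    print(value, '->', wrap_angle(value))
```

Such a helper could replace the duplicated `if`/`elif` branches for phi and psi, at the cost of one extra function call per move.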

In [None]:
def monteCarlo(pose, score_function, temperature=0.5):
    
    # Get the current energy of the pose
    E0 = score_function(pose)
    
    # Create a copy of the pose
    clone_pose = Pose()
    clone_pose.assign(pose)
    
    # Apply perturbation to cloned pose
    perturb_random_angle(clone_pose)
    
    # Evaluate energy of the perturbed pose
    E1 = score_function(clone_pose)
    
    # Calculate the acceptance probability
    P = np.min([1, np.exp(-(E1-E0)/temperature)])
    
    if P >= np.random.uniform(low=0, high=1.0):
        pose.assign(clone_pose)
        
        return 1
        
    return 0
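
To get a feeling for the acceptance rule used above, the following sketch evaluates the Metropolis probability for a few energy changes without needing PyRosetta. The helper name `metropolis_probability` and the example energies are illustrative assumptions; the early return for downhill moves also avoids evaluating the exponential of a large positive number.

```python
import math

def metropolis_probability(e_old, e_new, temperature=0.5):
    """Acceptance probability min(1, exp(-(e_new - e_old) / temperature))."""
    delta = e_new - e_old
    if delta <= 0:
        # Downhill (or neutral) moves are always accepted
        return 1.0
    return math.exp(-delta / temperature)

for delta in (-1.0, 0.1, 0.5, 2.0):
    p = metropolis_probability(0.0, delta)
    print('dE = %5.1f  ->  P(accept) = %.3f' % (delta, p))
```

Raising the temperature makes uphill moves more likely to be accepted, which widens the exploration at the cost of slower convergence.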

Let's define our all-atom score function and declare our folding method: 

In [None]:
sfxn = get_score_function(True)

In [None]:
# Create a 10 residue poly A Pose
polyA = pyrosetta.pose_from_sequence('A' * 10)

# Give the name of polyA to our Pose
polyA.pdb_info().name("polyA")

# Store energies into a list
energies = []

n_steps = 250000
accepted_steps = 0

# Create pose to store best sampled result
best = Pose()
best.assign(polyA)
Eb = sfxn(best)

for i in range(n_steps): 
    
    # Apply the mover with the MC criterion
    accepted = monteCarlo(polyA, sfxn)
    
    # Save structure every 1000 steps
    if i % 1000 == 0:
        # Send structure to Pymol
        pymol_mover.apply(polyA)
    
    # Get the pose energy 
    E = sfxn(polyA)
    energies.append(E)
    
    if accepted:
        
        # Add one to the number of accepted steps
        accepted_steps += 1

        # Save pose if it is lower in energy than the best stored result
        if E < Eb:
            best.assign(polyA)
            Eb = E
    
print('Accepted fraction %s' % (accepted_steps/n_steps))

Besides visualizing our results, we can plot the energy progression of our protocol.

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.plot(energies)
plt.title('Random Sampling')
plt.xlabel('Step')
plt.ylabel('Energy [kcal/mol]')

We print the best energy explored by our random folder: 

In [None]:
print('The energy of the best conformation is: %.4f kcal/mol' % Eb)

### Folding from an alpha helix

We are going to repeat our folding procedure, but now starting from a helical conformation. For that, before folding, we create our poly-alanine pose and change all the backbone torsional values to alpha-helix ideal values (ϕ=-57°, ψ=-47°):

In [None]:
# Create a 10 residue poly A Pose
polyA = pyrosetta.pose_from_sequence('A' * 10)

# Give the name of polyA to our Pose
polyA.pdb_info().name("polyA")

# Set phi and psi angles to ideal alpha-helix values
for i in range( 1 , polyA.total_residue()  + 1 ):
    polyA.set_phi(i, -57)
    polyA.set_psi(i, -47)
    
# We send our pose to Pymol
pymol_mover.apply(polyA)

# Store energies into a list
energies = []

n_steps = 250000
accepted_steps = 0

# Create pose to store best sampled result
best = Pose()
best.assign(polyA)
Eb = sfxn(best)

for i in range(n_steps): 
    
    # Apply the mover with the MC criterion
    accepted = monteCarlo(polyA, sfxn)
    
    # Save structure every 1000 steps
    if i % 1000 == 0:
        # Send structure to Pymol
        pymol_mover.apply(polyA)
    
    # Get the pose energy 
    E = sfxn(polyA)
    energies.append(E)
    
    if accepted:
        
        # Add one to the number of accepted steps
        accepted_steps += 1

        # Save pose if it is lower in energy than the best stored result
        if E < Eb:
            best.assign(polyA)
            Eb = E
    
print('Accepted fraction %s' % (accepted_steps/n_steps))

In [None]:
plt.plot(energies)
plt.title('Random Sampling from a helical conformation')
plt.xlabel('Step')
plt.ylabel('Energy [kcal/mol]')

In [None]:
print('The energy of the best conformation is: %.4f kcal/mol' % Eb)

How does this energy compare to our previously sampled energy?

### Low resolution representation

The ϕ and ψ angles are not the only torsional values that need to be optimized; for most amino acids, the sidechains have one or several χ torsions, which are also essential degrees of freedom (DOF). It is common to use a coarse-grained representation of the protein's residues to simplify the earlier stages of protein folding. A popular choice represents the sidechain of each residue as a sphere centered on the CB atom (or CA for Gly). Combined with a specialized score function that captures the protein behavior at this [granularity](https://en.wikipedia.org/wiki/Granularity), this simplifies the protein representation significantly and, therefore, reduces the number of DOF to optimize.

Rosetta's centroid (coarse-grained) representation is employed in the earlier stages of optimization to accelerate the exploration of conformational space. Let's create an extended version of the human insulin B-chain peptide:

In [None]:
# Create an extended Pose for the insulin peptide
insulin = pyrosetta.pose_from_sequence('FVNQHLCGSHLVEALYLVCGERGFFYTPKT')

# Give the name "insulin" to our Pose
insulin.pdb_info().name("insulin")

In [None]:
pymol_mover.apply(insulin)

We can convert this full-atom Pose into a centroid representation by using the SwitchResidueTypeSetMover:

In [None]:
# Declare an instance of the SwitchResidueTypeSetMover mover:
switchToCentroid = SwitchResidueTypeSetMover("centroid")

# Switch insulin representation to centroid mode
switchToCentroid.apply(insulin)

In [None]:
# Send the centroid Pose to Pymol
pymol_mover.apply(insulin)

### Low resolution score function

To score the low-resolution Pose, we need to use a specific score function to treat the system's interactions correctly. In our case, we are going to use the default Rosetta centroid score function ('score3'):

In [None]:
cen_sfxn = pyrosetta.create_score_function("score3")
cen_sfxn(insulin)

Before we proceed, we are going to calculate the alpha-carbon (CA) RMSD of the pose (target) to the native (reference) structure. The RMSD formula is as follows:

$ RMSD_{CA} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}{\delta_i^2}}$

$N$ is the number of residues, $i$ is the index of the ith CA atom, and $\delta_i$ is the distance between the coordinates of the ith CA atom in the reference and target structures.

The RMSD is a structural distance that measures the degree of similarity between two conformations of related proteins (usually the same protein, but not necessarily). 
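
As a minimal sketch of the formula, the function below computes the RMSD between two equally long lists of CA coordinates using only the standard library. It assumes the two structures are already superimposed (no fitting is performed), and both the function name `ca_rmsd_no_fit` and the coordinates are made up for illustration.

```python
import math

def ca_rmsd_no_fit(coords_a, coords_b):
    """RMSD between paired (x, y, z) coordinates, without superposition."""
    if len(coords_a) != len(coords_b):
        raise ValueError('Both structures need the same number of CA atoms')
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # delta_i^2: squared distance between the i-th pair of CA atoms
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))

reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
target = [(0.0, 0.0, 0.0), (3.8, 1.0, 0.0), (7.6, 2.0, 0.0)]
print('RMSD: %.3f' % ca_rmsd_no_fit(reference, target))
```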

Luckily, PyRosetta already has a function to calculate this value, and we use it to estimate the RMSD of our insulin extended conformation to the native structure.

In [None]:
# Load the native insulin structure
insulin_native = pose_from_pdb('input/insulin_native.pdb')

# Calculate RMSD for the alpha-carbon atoms for the extended and the native Poses
CA_rmsd(insulin, insulin_native)

### Insulin simple folding

Let's sample the insulin peptide folding landscape with this score function:

In [None]:
# Create an extended Pose for the insulin peptide
insulin = pyrosetta.pose_from_sequence('FVNQHLCGSHLVEALYLVCGERGFFYTPKT')

# Give the name "insulin" to our Pose
insulin.pdb_info().name("insulin")

# Declare an instance of the SwitchResidueTypeSetMover mover:
switchToCentroid = SwitchResidueTypeSetMover("centroid")

# Switch insulin representation to centroid mode
switchToCentroid.apply(insulin)

# Score insulin with the centroid score function
cen_sfxn(insulin)

# Send the centroid Pose to Pymol
pymol_mover.apply(insulin)

# Store energies into a list
energies = []

# Store RMSD into a list
rmsd = []

n_steps = 250000
accepted_steps = 0

# Create pose to store best sampled result
best = Pose()
best.assign(insulin)
Eb = cen_sfxn(best)

for i in range(n_steps): 
    
    # Apply the mover with the MC criterion
    accepted = monteCarlo(insulin, cen_sfxn)
    
    # Save structure every 1000 steps
    if i % 1000 == 0:
        # Send structure to Pymol
        pymol_mover.apply(insulin)
        
        # Save RMSD 
        rmsd.append(CA_rmsd(insulin, insulin_native))
    
    # Get the pose energy 
    E = cen_sfxn(insulin)
    energies.append(E)
    
    if accepted:
        
        # Add one to the number of accepted steps
        accepted_steps += 1

        # Save pose if it is lower in energy than the best stored result
        if E < Eb:
            best.assign(insulin)
            Eb = E
    
print('Accepted fraction %s' % (accepted_steps/n_steps))

You can use the following command in PyMol to align all the frames into the same reference structure:

```
intra_fit insulin
```

We now plot the energy:

In [None]:
plt.plot(energies)
plt.title('Insulin peptide centroid sampling')
plt.xlabel('Step')
plt.ylabel('Energy [kcal/mol]')

Now, besides plotting the energies, we plot the RMSD progression of our simulation:

In [None]:
plt.plot(rmsd)
plt.title('Insulin peptide centroid sampling')
plt.xlabel('Step (x1000)')  # the RMSD is stored only every 1000 steps
plt.ylabel(r'RMSD [$\AA$]')

In [None]:
print('The lowest explored CA RMSD is: %s ' % np.min(rmsd))

We observe that, within the number of steps of our simulation, the energy has not converged to a minimum (i.e., the energy is still decreasing). Let's also send our native structure to Pymol to compare the results:

In [None]:
pymol_mover.apply(insulin_native)

Now we execute the following command in PyMol to align the two structures:
```
align insulin, input_insulin_native
```

### Using fragment insertions

To sample the conformational space of our peptide more efficiently, we are going to employ a fragment-insertion approach. Fragments are the possible conformations that small portions of our target protein can adopt. They are obtained by searching the Protein Data Bank (PDB) for protein segments with sequences similar to ours. There is a web server that can generate these fragments:

[RoBetta fragment server](http://old.robetta.org/)

You can find details about the fragment file format at:

http://new.rosettacommons.org/docs/latest/rosetta_basics/file_types/fragment-file

For our test case, we already have the fragment file in our input folder. Let's have a look at the first lines of our file:

In [None]:
with open('input/insulin_03_05.200_v1_3.txt') as ff:
    for i,l in enumerate(ff):
        print(l, end='')  # each line already ends with a newline
        if i == 8:
            break

We now create a mover capable of inserting a fragment into our Pose. A fragment insertion is simply the setting of the ϕ, ψ and ω torsional angles of a segment of our Pose to the values of a random fragment in the library. We start by reading our fragment file into PyRosetta:

In [None]:
from pyrosetta.rosetta.core.fragment import *

In [None]:
fragset = ConstantLengthFragSet(3)
fragset.read_fragment_file('input/insulin_03_05.200_v1_3.txt')

Now we need to define the mover that will perform the insertion. For that, we need to create a "MoveMap", which is a special kind of object telling the mover which degrees of freedom it can change:

In [None]:
from pyrosetta.rosetta.protocols.simple_moves import ClassicFragmentMover

In [None]:
movemap = MoveMap()
movemap.set_bb(True)
mover_3mer = ClassicFragmentMover(fragset, movemap)

First, let's recreate and send our extended insulin structure into PyMol to observe how the fragment insertion mover works:

In [None]:
# Create an extended Pose for the insulin peptide
insulin = pyrosetta.pose_from_sequence('FVNQHLCGSHLVEALYLVCGERGFFYTPKT')

# Give the name "insulin" to our Pose
insulin.pdb_info().name("insulin")

# Declare an instance of the SwitchResidueTypeSetMover mover:
switchToCentroid = SwitchResidueTypeSetMover("centroid")

# Switch insulin representation to centroid mode
switchToCentroid.apply(insulin)

# Score insulin with the centroid score function
cen_sfxn(insulin)

# Send the centroid Pose to Pymol
pymol_mover.apply(insulin)


Now we will perform five fragment insertion moves and send them into PyMol:

In [None]:
for i in range(5):
    
    # Insert random fragment 
    mover_3mer.apply(insulin)
    
    # Send the changed Pose to Pymol
    pymol_mover.apply(insulin)
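
Conceptually, each insertion overwrites the backbone torsions of a short window with those stored in a library fragment. The sketch below mimics this with plain Python lists; the function `insert_fragment`, the toy library, and its torsion values are hypothetical and are neither the PyRosetta implementation nor the content of the fragment file.

```python
import random

def insert_fragment(torsions, library, rng):
    """Replace a random window of (phi, psi, omega) triplets with a library fragment.

    `library` maps a 0-based start index to a list of candidate fragments,
    each fragment being a list of (phi, psi, omega) tuples.
    """
    start = rng.choice(sorted(library))
    fragment = rng.choice(library[start])
    new_torsions = list(torsions)
    new_torsions[start:start + len(fragment)] = fragment
    return start, new_torsions

# Toy extended chain of 6 residues (all torsions set to 180 degrees)
extended = [(180.0, 180.0, 180.0)] * 6

# Toy 3-mer library: one helical and one strand-like fragment per window
helix = [(-57.0, -47.0, 180.0)] * 3
strand = [(-120.0, 130.0, 180.0)] * 3
library = {start: [helix, strand] for start in range(len(extended) - 2)}

rng = random.Random(0)
start, perturbed = insert_fragment(extended, library, rng)
print('Inserted at window starting at index', start)
for residue, angles in enumerate(perturbed, start=1):
    print(residue, angles)
```

Because a fragment changes several residues at once with torsions seen in real proteins, a single insertion can move the chain much further than one small random angle perturbation.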

### Use MC to explore with a mixed random perturbation and a fragment insertion mover

We are going to sample the folding landscape again, but now using fragment insertions. We will limit the insertions to only 10% of the steps; the remaining steps will be random angle perturbations as before. Therefore, we create a new mover that randomly selects which move to apply (a random perturbation or a fragment insertion), biased so that the insertion mover is applied only 10% of the time:

In [None]:
def randomPerturbationWithInsertion(pose, insertion_probability=0.1):
    if np.random.uniform(low=0, high=1.0) < insertion_probability:
        mover_3mer.apply(pose)
    else:
        perturb_random_angle(pose)
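
A quick way to check that a mixed mover applies each move with the intended frequency is to count the choices over many draws. The sketch below does this with the standard library only; `choose_move` and the move labels are illustrative names, not PyRosetta objects.

```python
import random

def choose_move(rng, insertion_probability=0.1):
    """Return 'fragment' with the given probability, otherwise 'angle'."""
    if rng.random() < insertion_probability:
        return 'fragment'
    return 'angle'

rng = random.Random(42)
n_draws = 100000
counts = {'fragment': 0, 'angle': 0}
for _ in range(n_draws):
    counts[choose_move(rng)] += 1

print('Fraction of fragment insertions: %.3f' % (counts['fragment'] / n_draws))
```

The fraction should be close to the requested 10%; a value near 90% would indicate that the comparison in the selection rule is inverted.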

We also redefine the MC mover to apply the previously defined mover:

In [None]:
def monteCarloWithInsertion(pose, score_function, temperature=0.5):
    
    # Get the current energy of the pose
    E0 = score_function(pose)
    
    # Create a copy of the pose
    clone_pose = Pose()
    clone_pose.assign(pose)
    
    # Apply perturbation with fragment insertion to cloned pose
    randomPerturbationWithInsertion(clone_pose)
    
    # Evaluate energy of the perturbed pose
    E1 = score_function(clone_pose)
    
    # Calculate the acceptance probability
    P = np.min([1, np.exp(-(E1-E0)/temperature)])
    
    if P >= np.random.uniform(low=0, high=1.0):
        pose.assign(clone_pose)
        
        return 1
        
    return 0

Let's now fold the insulin peptide from the extended conformation:

In [None]:
# Create an extended Pose for the insulin peptide
insulin = pyrosetta.pose_from_sequence('FVNQHLCGSHLVEALYLVCGERGFFYTPKT')

# Give the name "insulin" to our Pose
insulin.pdb_info().name("insulin")

# Declare an instance of the SwitchResidueTypeSetMover mover:
switchToCentroid = SwitchResidueTypeSetMover("centroid")

# Switch insulin representation to centroid mode
switchToCentroid.apply(insulin)

# Score insulin with the centroid score function
cen_sfxn(insulin)

# Send the centroid Pose to Pymol
pymol_mover.apply(insulin)

# Store energies into a list
energies = []

# Store RMSD into a list
rmsd = []

n_steps = 250000
accepted_steps = 0

# Create pose to store best sampled result
best = Pose()
best.assign(insulin)
Eb = cen_sfxn(best)

# Create pose to store best rmsd result
best_rmsd = Pose()
best_rmsd.assign(insulin)
RMSDb = CA_rmsd(insulin, insulin_native)

for i in range(n_steps): 
    
    # Apply the mover with insertion accepting with the MC criterion
    accepted = monteCarloWithInsertion(insulin, cen_sfxn)
    
    # Save structure every 1000 steps
    if i % 1000 == 0:
        # Send structure to Pymol
        pymol_mover.apply(insulin)
        
        # Save RMSD 
        rmsd.append(CA_rmsd(insulin, insulin_native))
        
        # Save best RMSD Pose
        if rmsd[-1] < RMSDb:
            best_rmsd.assign(insulin)
            RMSDb = rmsd[-1]
            
    # Get the pose energy 
    E = cen_sfxn(insulin)
    energies.append(E)
    
    if accepted:
        
        # Add one to the number of accepted steps
        accepted_steps += 1

        # Save pose if it is lower in energy than the best stored result
        if E < Eb:
            best.assign(insulin)
            Eb = E
    
print('Accepted fraction %s' % (accepted_steps/n_steps))

In [None]:
plt.plot(energies)
plt.title('Insulin peptide centroid sampling')
plt.xlabel('Step')
plt.ylabel('Energy [kcal/mol]')

In [None]:
plt.plot(rmsd)
plt.title('Insulin peptide centroid sampling')
plt.xlabel('Step (x1000)')  # the RMSD is stored only every 1000 steps
plt.ylabel(r'RMSD [$\AA$]')

In [None]:
print('The lowest explored CA RMSD is: %s ' % np.min(rmsd))

In [None]:
pymol_mover.apply(insulin_native)

How does this run compare with the exploration without fragment insertion?

Finally, let's send our best RMSD pose to Pymol:

In [None]:
pymol_mover.apply(best_rmsd)

## Protein design

We have explored some techniques for folding proteins. We now move to optimizing the sequence for a specific protein structure (protein design). We will design a peptide that binds to an SH3 domain, starting from a pre-existing structure in the PDB, [2DRK](https://www.rcsb.org/structure/2DRK). We have deleted the original sequence information by mutating the side chains to alanine. 

Let's load this PDB file into a Pose object:

In [None]:
sh3_peptide = pose_from_pdb('input/SH3_polyA.pdb')

# We score our pose 
sfxn(sh3_peptide)

Note that the pose has been minimized before being designed. Why should we do that?

In [None]:
pymol_mover.apply(sh3_peptide)

The algorithm for the design stage will be a MonteCarlo optimization of a combined perturbation mover:

- Small rigid body perturbation
- Random backbone perturbation
- Packing rotamers with design options for the peptide sequence
- Side-chain gradient-based minimization

This combined perturbation should sample different side-chain rotamers at each backbone conformation that will help us select sequences that stabilize our given structure.


We start by defining the respective movers that we will later combine into our MC sampling method. 

### Small rigid body perturbations

We call the RigidBodyPerturbNoCenterMover mover to define small random translations of our peptide. We will limit the peptide translation to a magnitude of 0.2 angstroms. The mover will be applied to the first jump between the two chains, and it will respect the fold tree (see below).

In [None]:
# Define an instance of the RigidBodyPerturbNoCenterMover
rb_mover = rosetta.protocols.rigid.RigidBodyPerturbNoCenterMover()

# Set the rotational magnitude to zero (no rotation)
rb_mover.rot_magnitude(0)

# Set the translation magnitude to 0.2 Angstroms
rb_mover.trans_magnitude(0.2)

### Random backbone perturbation

Random backbone perturbations will be carried out with the SmallMover. The mover needs to be restricted to the peptide, as we would like to modify only its backbone (BB) DOFs. To restrict the perturbations to specific residues, we need to define a MoveMap (MM). The MM defines which residues can be affected by a mover that perturbs the backbone DOFs.

We start by defining the MM:

In [None]:
# This creates a movemap instance. All DOFs are disabled by default.
small_mm = pyrosetta.rosetta.core.kinematics.MoveMap()

# This tells the MM that the peptide residues (60 to 69) can move their backbone DOFs. 
small_mm.set_bb_true_range(60, 69)

Now we can set up the SmallMover to perturb only those residues defined in the MM:

In [None]:
# Create a new instance of the small mover
small_mover = pyrosetta.rosetta.protocols.simple_moves.SmallMover()

# Define the magnitude of the random perturbation
small_mover.angle_max(10)

# Define the temperature (the SmallMover has an inner MC that selects torsions according to a Ramachandran score)
small_mover.temperature(0.5)

# Define the number of small mover attempts that will be carried out
small_mover.nmoves(500)

# Pass the MM to the small mover
small_mover.movemap(small_mm)

### Packing rotamers with design options for the peptide sequence

Having defined the small rigid body translation and the small perturbations of the BB torsions, we now define the rotamer packer mover. This mover is responsible for the design stage of the peptide; however, we need to be very careful about how we optimize the side-chain positions at each residue. We can select which positions will be designed, repacked (treated as flexible but not changing their identity), or kept fixed (won't move at all).

Since we are designing a peptide that can bind the SH3 domain, we will define the peptide positions as designable, the neighboring receptor positions (those close to the peptide) as repackable, and the remainder as fixed. To define this in PyRosetta, we use TaskOperations (TO), which define the behavior of the packer mover. We also employ residue selectors to specify dynamically the set of residues affected by each TO.

Let's define first the residue selectors:

In [None]:
# Import all needed objects from PyRosetta
from pyrosetta.rosetta.core.select import residue_selector as selections

from pyrosetta.rosetta.core.pack.task import TaskFactory
from pyrosetta.rosetta.core.pack.task import operation

from pyrosetta.rosetta.core import select

In [None]:
# This selects all the receptor residues (first chain)
receptor_selector = selections.ChainSelector(1)

# This selects all the peptide residues (second chain)
peptide_selector = selections.ChainSelector(2)

# This creates an instance of the NeighborhoodResidueSelector
nbr_selector = selections.NeighborhoodResidueSelector()

# This tells the NeighborhoodResidueSelector to select residues close to the peptide
nbr_selector.set_focus_selector(peptide_selector)

# This includes the peptide residues in the selection
nbr_selector.set_include_focus_in_subset(True)

We can explore which residues are affected in our Pose by the defined residue selectors:

In [None]:
for i in select.get_residue_set_from_subset(peptide_selector.apply(sh3_peptide)):
    print(i)

In [None]:
for i in select.get_residue_set_from_subset(nbr_selector.apply(sh3_peptide)):
    print(i)

Residue selectors let us refer to specific subsets of residues; we set the packer behavior of those subsets by applying TOs to them. One generic class of TO is the Residue Level Task (RLT) operations, which define specific behaviors for residues. Let's define the ones we are going to need:

In [None]:
# Define RLT to prevent repacking of residues (fixed side chains)
prevent_repacking_rlt = operation.PreventRepackingRLT()

# Define RLT to only repack residues (movable side chains but fixed sequence)
restrict_repacking_rlt = operation.RestrictToRepackingRLT()

# ExtraRotamersGenericRLT only controls extra rotamer sampling; positions are
# designable by default, so this RLT is defined here but not used below
restrict_design_rlt = operation.ExtraRotamersGenericRLT()

Now we need to assign these behaviors to specific residues in the complex. For that, we use the TO OperateOnResidueSubset, which associates an RLT with a residue selector. We create one for each instruction we need:

In [None]:
# This prevents the repacking of (i.e., fixes) the selected residues. 
# The True given to the function inverts the residue selection.
# Therefore, everything that is not the peptide or its neighbours will not be moved.
prevent_subset_repacking = operation.OperateOnResidueSubset(prevent_repacking_rlt, nbr_selector, True)

# This restricts the receptor residues to repacking only (no design).
restrict_subset_to_repacking = operation.OperateOnResidueSubset(restrict_repacking_rlt, receptor_selector)

How do we give these TO definitions to the packer mover?

We do so by using a TaskFactory (TF). The TF re-evaluates the residue selections and the TOs every time the mover is called. It can therefore adapt dynamically to changes in the Pose that affect the residue selectors, and thus to changes in which residues are affected by the TOs paired with them. 

We start by creating the TF and then give it the specific TOs one by one. After each addition, we query our Pose to print how the current TF will affect the packing of its residues:

In [None]:
tf = TaskFactory()

In [None]:
print(tf.create_task_and_apply_taskoperations(sh3_peptide))

All positions are defined as designable by a newly created TF. We now fix residues not covered by the nbr_selector:

In [None]:
tf.push_back(prevent_subset_repacking)

In [None]:
print(tf.create_task_and_apply_taskoperations(sh3_peptide))

We can see how some residues won't move when the packer is called upon the Pose. Finally, we set the receptor residues not affected by the first TO to only be repacked. 

In [None]:
tf.push_back(restrict_subset_to_repacking)

In [None]:
print(tf.create_task_and_apply_taskoperations(sh3_peptide))
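
The combined effect of the two task operations can be reasoned about with plain sets. The sketch below is only a mental model: it assumes a receptor spanning residues 1 to 59 and a peptide spanning residues 60 to 69, and it uses a made-up list of neighbor residues rather than the output of the NeighborhoodResidueSelector.

```python
receptor = set(range(1, 60))
peptide = set(range(60, 70))

# Hypothetical receptor residues close to the peptide, plus the peptide itself
neighborhood = {8, 10, 14, 31, 33, 47, 49, 50} | peptide

def packer_behavior(residue):
    """Classify a residue as 'fixed', 'repack' or 'design'."""
    # Start from the TaskFactory default: every position is designable
    behavior = 'design'
    # RestrictToRepacking applied to the receptor selection
    if residue in receptor:
        behavior = 'repack'
    # PreventRepacking applied to the inverted neighborhood selection
    if residue not in neighborhood:
        behavior = 'fixed'
    return behavior

summary = {'fixed': [], 'repack': [], 'design': []}
for residue in sorted(receptor | peptide):
    summary[packer_behavior(residue)].append(residue)

for behavior, residues in summary.items():
    print(behavior, residues)
```

In this toy model only the peptide stays designable, the receptor residues near it are repacked, and everything else is frozen, which is the behavior the printed packer task should show for the real complex.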

The TF push_back() method appends each TO to the TF's list, and the TOs are applied in that order every time a task is created. Since the TOs used here can only restrict what the packer is allowed to do (from design, to repacking, to fixed), think about whether the final behavior of each position depends on the order in which the TOs are applied.

We observe that our TF now has the expected behavior, so we are ready to pass it to the packer mover:

In [None]:
# Creates an instance of the pack rotamer mover
pack_rotamers = pyrosetta.rosetta.protocols.minimization_packing.PackRotamersMover()

# Pass the TF to the packer mover
pack_rotamers.task_factory(tf)

### Side-chain gradient-based minimization

The last mover will be a gradient-based minimization of the placed side chains. This minimization is carried out to move the side chains to the nearest local energy minimum. The MinMover can also receive a MM to define which DOFs are allowed to move during minimization, including the chi torsions. We will tell the mover that only chi values are allowed to be minimized:

In [None]:
# Create an instance of the MM class
min_mm = pyrosetta.rosetta.core.kinematics.MoveMap()

# Define BB torsion as fixed
min_mm.set_bb(False)

# Define chi torsions as movable. 
min_mm.set_chi(True)

We now pass this MM to the MinMover:

In [None]:
# Creates an instance of the minMover
minMover = rosetta.protocols.minimization_packing.MinMover()

# Define the change in energy upon which the minimization will be stopped
minMover.tolerance(0.01)

# Pass the MM to the mover
minMover.movemap(min_mm)

### Define the FoldTree

The fold tree (FT) is an advanced topic we have not discussed yet. In summary, it defines the direction that perturbations will propagate into the coordinates of the Pose. You can find an introductory tutorial [here](https://www.rosettacommons.org/demos/latest/tutorials/fold_tree/fold_tree). We discuss it further in the practice session:

In [None]:
# Define a FT instance
fold_tree = FoldTree()

# Add the different edges to the new FT instance
fold_tree.add_edge(52, 1, -1)
fold_tree.add_edge(52, 59, -1)
fold_tree.add_edge(52, 65, 1)
fold_tree.add_edge(65, 60, -1)
fold_tree.add_edge(65, 69, -1)

We pass this FT definition to our Pose:

In [None]:
sh3_peptide.fold_tree(fold_tree)
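
Each `add_edge(start, stop, label)` call above describes one edge: a label of -1 marks a peptide edge (a stretch of chemically connected residues), while a positive label is the number of a jump connecting two residues that need not be covalently bonded. The sketch below re-reads the same edges as plain tuples and checks that together they reach residues 1 to 69; the helper name `residues_on_edge` is our own.

```python
# (start, stop, label) triplets copied from the fold tree defined above
edges = [(52, 1, -1), (52, 59, -1), (52, 65, 1), (65, 60, -1), (65, 69, -1)]

def residues_on_edge(start, stop):
    """Residues visited when walking along a peptide edge, both ends included."""
    step = 1 if stop >= start else -1
    return list(range(start, stop + step, step))

visited = []
jumps = []
for start, stop, label in edges:
    if label == -1:
        # Add the residues along the edge, skipping any already placed by an earlier edge
        visited += [r for r in residues_on_edge(start, stop) if r not in visited]
    else:
        jumps.append((start, stop))
        # A jump places its stop residue without walking the residues in between
        if stop not in visited:
            visited.append(stop)

print('Residues covered:', len(visited))
print('Jumps:', jumps)
```

The single jump from residue 52 to residue 65 is what lets the rigid body mover translate the peptide relative to the receptor, while the peptide edges propagate backbone changes outward from those two anchor residues.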

### Protein design using MC 

We are going to apply the combined mover inside an MC algorithm:

- Small rigid body perturbation
- Random backbone perturbation
- Packing rotamers with design options for the peptide sequence
- Side-chain gradient-based minimization

During the MC, we will store the energies and the best Pose explored so far. Let's define the MC function:

In [None]:
def monteCarloPeptideDesign(pose, score_function, temperature=0.8):
    
    # Get the current energy of the pose
    E0 = score_function(pose)
    
    # Create a copy of the pose
    clone_pose = Pose()
    clone_pose.assign(pose)
    
    # Apply design moves
    rb_mover.apply(clone_pose)
    small_mover.apply(clone_pose)
    pack_rotamers.apply(clone_pose)
    minMover.apply(clone_pose)
    
    # Evaluate energy of the perturbed pose
    E1 = score_function(clone_pose)
    
    # Calculate the acceptance probability
    P = np.min([1, np.exp(-(E1-E0)/temperature)])
    
    if P >= np.random.uniform(low=0, high=1.0):
        pose.assign(clone_pose)
        
        return 1
        
    return 0

We run our MC design method for only 100 steps, since the pack rotamers mover is very time-consuming:

In [None]:
# Store energies into a list
energies = []

n_steps = 100
accepted_steps = 0

# Create pose to store best sampled result
best = Pose()
best.assign(sh3_peptide)
Eb = sfxn(best)

for i in range(n_steps): 
    
    # Apply the combined design mover, accepting with the MC criterion
    accepted = monteCarloPeptideDesign(sh3_peptide, sfxn)
    
    # Send structure to Pymol at each step
    pymol_mover.apply(sh3_peptide)
        
    # Get the pose energy 
    E = sfxn(sh3_peptide)
    energies.append(E)
    
    if accepted:
        
        # Add one to the number of accepted steps
        accepted_steps += 1

        # Save pose if it is lower in energy than the best stored result
        if E < Eb:
            best.assign(sh3_peptide)
            Eb = E
    
print('Accepted fraction %s' % (accepted_steps/n_steps))

We now plot the energies explored by the method:

In [None]:
plt.plot(energies)
plt.title('SH3 peptide design')
plt.xlabel('Step')
plt.ylabel('Energy [kcal/mol]')

In [None]:
print('The energy of the best design is: %s kcal/mol' % Eb)

### Compare designed sequence to the original sequence

Now we are going to compare the best-designed sequence to the original peptide sequence in the PDB:

In [None]:
sh3_acan125 = pose_from_pdb('input/SH3_Acan125.pdb')

In [None]:
# The last 10 residues belong to the peptide sequence
print(best.sequence()[-10:])
print(sh3_acan125.sequence()[-10:])

How do they compare?
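
One simple way to quantify the comparison is the fraction of identical positions. The sketch below assumes two sequences of equal length; the function name `sequence_identity` and the two 10-residue sequences are made up for illustration and are not the actual designed or native peptides.

```python
def sequence_identity(seq_a, seq_b):
    """Fraction of positions with identical residues (equal-length sequences)."""
    if len(seq_a) != len(seq_b):
        raise ValueError('Sequences must have the same length')
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return matches / len(seq_a)

# Made-up 10-residue sequences, used only to illustrate the calculation
designed = 'APELPPRNRW'
reference = 'APPLPPRNRP'
print('Identity: %.0f%%' % (100 * sequence_identity(designed, reference)))
for position, (a, b) in enumerate(zip(designed, reference), start=1):
    if a != b:
        print('Position %d: %s -> %s' % (position, b, a))
```

With the real Poses, the two arguments would be `best.sequence()[-10:]` and `sh3_acan125.sequence()[-10:]`.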

Finally, we write our best-designed structure to a file:

In [None]:
rosetta.core.io.pdb.dump_pdb(best, 'best_design.pdb')