<!--NOTEBOOK_HEADER-->
*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta.notebooks);
content is available [on Github](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*

# Protein design
Keywords: pose_from_sequence(), random move, scoring move, Metropolis, assign(), Pose(), FastDesign, FastRelax, ResidueSelector, TaskFactory, TaskOperation

## Overview

In these classes we hopefully achieved the following goals:

* Understanding Protein Structures: Before designing a protein, one must understand the current structure. PyRosetta allows for the exploration and manipulation of protein structures, providing insights into how they fold and function.
* Predicting Protein Folding: Proteins are made up of long chains of amino acids that fold into complex three-dimensional structures. PyRosetta can predict how changes in the sequence of amino acids will affect the way a protein folds.
* Designing Proteins: Once you understand the principles of protein folding and structure, you can begin to design new proteins. PyRosetta allows you to make specific changes to amino acid sequences and predict the structural and functional implications of those changes.

We will focus on the design part by experimenting with the helical structure of the polyA peptide from earlier. In terms of PyRosetta, we will again work with `ResidueSelectors` and `TaskOperations`.

`ResidueSelectors` in PyRosetta are a powerful feature used to select specific residues from a protein structure based on certain criteria. These selectors provide a flexible and efficient way to identify subsets of residues that you might want to focus on for further analysis, manipulation, or scoring. Whether you are interested in a particular chain, residues with certain properties, or residues in a specific spatial region, `ResidueSelectors` offer the tools you need to make these selections and integrate them into your computational protein modeling workflows.

`TaskOperations` in PyRosetta configure the behavior of the Rosetta packing algorithm (the packer), which optimizes side-chain conformations and, when design is enabled, amino acid identities. `TaskOperations` let you customize the packer for specific modeling tasks, such as protein design, mutation, or refinement.

We can once again use FastRelax or FastDesign to do design. `FastRelax` and `FastDesign` are both protocols in PyRosetta used for optimizing protein structures, but they serve different purposes and have some key differences in their functionalities and applications.

### FastRelax:

- **Purpose**: `FastRelax` is primarily used for refining and relaxing protein structures. It aims to relieve any steric clashes, optimize side-chain conformations, and improve the overall energy of the protein structure.
- **Process**: It alternates side-chain repacking and minimization while ramping the weight of the repulsive term in the energy function. Each cycle starts with a low repulsive weight, which allows larger movements, and raises it step by step towards its full value for finer adjustments.
- **Applications**: Commonly used for refining structures after homology modeling, loop modeling, or other structural perturbations.
- **Flexibility**: While it can make significant adjustments to side-chain conformations and small adjustments to backbone conformations, it does not perform extensive redesign of the protein sequence.

### FastDesign:

- **Purpose**: `FastDesign` extends the capabilities of `FastRelax` by incorporating the ability to redesign the protein sequence in addition to optimizing the structure.
- **Process**: It integrates the packing and design algorithms, allowing for simultaneous optimization of side-chain conformations and amino acid identities. Like `FastRelax`, it alternates packing and minimization steps while ramping the repulsive weight up from a low starting value.
- **Applications**: Used for protein design tasks, such as designing new protein-protein interfaces, creating novel protein folds, or improving protein stability.
- **Flexibility**: Provides extensive capabilities for both structural optimization and sequence redesign, making it a powerful tool for protein engineering.

### Main Differences:

1. **Sequence Design**: The most significant difference between `FastRelax` and `FastDesign` is the ability of `FastDesign` to redesign the protein sequence. `FastRelax` focuses solely on optimizing the structure.
2. **Applications**: `FastRelax` is typically used for structure refinement, while `FastDesign` is used for more extensive protein design and engineering tasks.
3. **Flexibility**: `FastDesign` offers more flexibility and capabilities for manipulating both the structure and sequence of a protein.

## Basic $\alpha$-helix redesign

In [1]:

!pip install pyrosettacolabsetup
import pyrosettacolabsetup; pyrosettacolabsetup.install_pyrosetta()
from pyrosetta import *
from pyrosetta.rosetta import *
from pyrosetta.teaching import *
import pyrosetta.toolbox
#pyrosetta.init()

pyrosetta.init("-ex1 -ex2aro")
!pip install py3Dmol
import py3Dmol


Collecting pyrosettacolabsetup
  Downloading pyrosettacolabsetup-1.0.9-py3-none-any.whl.metadata (294 bytes)
Downloading pyrosettacolabsetup-1.0.9-py3-none-any.whl (4.9 kB)
Installing collected packages: pyrosettacolabsetup
Successfully installed pyrosettacolabsetup-1.0.9
Mounted at /content/google_drive

Note that USE OF PyRosetta FOR COMMERCIAL PURPOSES REQUIRE PURCHASE OF A LICENSE.
See https://github.com/RosettaCommons/rosetta/blob/main/LICENSE.md or email license@uw.edu for details.

Looking for compatible PyRosetta wheel file at google-drive/PyRosetta/colab.bin//wheels...
Found compatible wheel: /content/google_drive/MyDrive/PyRosetta/colab.bin/wheels//content/google_drive/MyDrive/PyRosetta/colab.bin/wheels/pyrosetta-2024.19+release.a34b73c40f-cp310-cp310-linux_x86_64.whl


┌──────────────────────────────────────────────────────────────────────────────┐
│                                 PyRosetta-4                                  │
│              Created in JHU by Sergey Lyskov 

In [None]:
import os
import logging
logging.basicConfig(level=logging.INFO)
from pyrosetta import *
from pyrosetta.rosetta import *
from pyrosetta.teaching import *
import pyrosetta.toolbox
pyrosetta.init("-ex1 -ex2aro")

Initialize pyrosetta.
- The `-ex1` option in PyRosetta is used to increase the accuracy of side-chain modeling by enabling additional rotamer sampling for chi1 angles of amino acid side chains during simulations.
- The `-ex2aro` option enhances the sampling of aromatic side chains. When it is enabled, PyRosetta samples additional chi2 dihedral angles for aromatic residues. This explores the conformational space of these residues more thoroughly, which can lead to more accurate models at the cost of additional computation.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

### Function to visualise poses

In [None]:
import tempfile


# Display a pose in an interactive py3Dmol viewer
def display_protein(pose):
    # Write the pose to a temporary PDB file and read it back as a string
    with tempfile.NamedTemporaryFile(suffix='.pdb') as tmp:
        pose.dump_pdb(tmp.name)
        with open(tmp.name, 'r') as pdb_file:
            pdb_string = pdb_file.read()
    viewer = py3Dmol.view(width=800, height=600)
    viewer.addModel(pdb_string, 'pdb')
    # Alternative style: viewer.setStyle({'cartoon': {'color': 'spectrum'}})
    viewer.setStyle({'stick': {'colorscheme': 'greenCarbon'}})
    viewer.zoomTo()
    viewer.show()






### Building the pose
We start from the folded polyA peptide stored in a variable called "start_pose"

In [None]:
#start_pose = pyrosetta.pose_from_sequence('A' * 10)
# Better to start from the folded peptide we obtained earlier

scorefxn = pyrosetta.create_score_function("ref2015_cart.wts")
start_pose = pose_from_pdb("/content/google_drive/MyDrive/BIP_24-25/5-Protein_design/mc_final.pdb")
pose = start_pose.clone()  # work on a copy so start_pose stays untouched


core.import_pose.import_pose: File '/content/google_drive/MyDrive/BIP_24-25/5-Protein_design/mc_final.pdb' automatically determined to be of type PDB


In [None]:
# measure energy level
scorefxn(pose)

core.energy_methods.CartesianBondedEnergy: Creating new peptide-bonded energy container (10)
basic.io.database: Database file opened: scoring/score_functions/elec_cp_reps.dat
core.scoring.elec.util: Read 40 countpair representative atoms
core.pack.dunbrack.RotamerLibrary: shapovalov_lib_fixes_enable option is true.
core.pack.dunbrack.RotamerLibrary: shapovalov_lib::shap_dun10_smooth_level of 1( aka lowest_smooth ) got activated.
core.pack.dunbrack.RotamerLibrary: Binary rotamer library selected: /usr/local/lib/python3.10/dist-packages/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin
core.pack.dunbrack.RotamerLibrary: Using Dunbrack library binary file '/usr/local/lib/python3.10/dist-packages/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin'.
core.pack.dunbrack.RotamerLibrary: Dunbrack 2010 library took 0.445813 seconds to load from binary


-5.371127246023434

We will use the function above to inspect the initial conformation of the polyA peptide.

In [None]:
display_protein(pose)

### Insert mutations

Define the mutations you want to introduce. We need to write them in a resfile.

In PyRosetta, a resfile is used to specify which residues should be mutated and to what amino acid they should be changed. The resfile format allows for a lot of flexibility and control over the design process.

Here is an example of a resfile that would mutate positions 3 and 7 of a 10 amino acid polyalanine peptide. In this example, we will mutate position 3 to aspartate (D) and position 7 to leucine (L).

```
NATRO
start

3 A PIKAA D
7 A PIKAA L
```

* NATRO: The default for every residue not listed explicitly. NATRO stands for "native rotamer": these residues keep both their amino acid identity and their current rotamer, so they are neither repacked nor mutated.
* start: Marks the end of the header and the beginning of the residue-specific commands.
* 3 A PIKAA D: Restricts residue 3 in chain A to aspartate, which amounts to mutating it to D. PIKAA stands for "pick amino acids": only the listed amino acids are allowed at that position.
* 7 A PIKAA L: Likewise restricts residue 7 in chain A to leucine.
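
The resfile format described above is plain text, so it can also be generated from Python. The following is a minimal sketch, not part of PyRosetta: `build_resfile` is a hypothetical helper, and it only covers the `PIKAA` command with plain integer residue numbers.

```python
def build_resfile(mutations, default="NATRO"):
    """Return resfile text for a set of point mutations.

    mutations: dict mapping (residue_number, chain) -> one-letter amino acid code(s)
    default:   header command applied to every residue not listed explicitly
    """
    lines = [default, "start", ""]
    for (resnum, chain), amino_acids in sorted(mutations.items()):
        lines.append(f"{resnum} {chain} PIKAA {amino_acids}")
    return "\n".join(lines) + "\n"


# Reproduce the example resfile: position 3 -> D, position 7 -> L
resfile_text = build_resfile({(3, "A"): "D", (7, "A"): "L"})
print(resfile_text)
```

The resulting string can be written to disk with `open(path, "w").write(resfile_text)` and passed to `ReadResfile` like any hand-written resfile.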

Check out the design proposed in the `design.resfile`.

In [None]:
resfile = "/content/google_drive/MyDrive/BIP_24-25/5-Protein_design/design.resfile"
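
Before handing a resfile to PyRosetta, it can help to read it back and list what it asks for. The sketch below is an illustration under simplifying assumptions: `parse_resfile` is a hypothetical helper, it expects plain integer residue numbers (no ranges or insertion codes), and it treats `#` as the comment character.

```python
def parse_resfile(text):
    """Split resfile text into header commands and per-residue commands."""
    header, body = [], []
    in_body = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if not in_body:
            if line.lower() == "start":
                in_body = True  # everything after 'start' is residue-specific
            else:
                header.append(line)
            continue
        fields = line.split()
        resnum, chain, command = fields[0], fields[1], fields[2]
        body.append((int(resnum), chain, command, fields[3:]))
    return header, body


example = """NATRO
start

3 A PIKAA D
7 A PIKAA L
"""
header, body = parse_resfile(example)
print(header)
print(body)
```

To inspect the file used in this notebook, one could call `parse_resfile(open(resfile).read())`.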

Now we can set up the TaskOperations for the `FastRelax` mover. These tell `FastRelax` which residues to design or repack during its packer steps.


In PyRosetta, **TaskOperations** and **FastRelax** are crucial for controlling how the protein structure is modified, especially during tasks like structure optimization and repacking.



**TaskOperations** define specific rules and operations that guide how residues in a protein are manipulated during structure refinement, repacking, and design processes. They specify which residues are allowed to be mutated, which rotamers to sample, and how packing should proceed. In your code:

- `TaskFactory`: This object collects all the **TaskOperations**. It's like a manager that bundles all the operations you want to apply to your structure.
  
- `InitializeFromCommandline()`: This operation sets up the packing task based on any command-line options provided during PyRosetta initialization. It helps initialize the task with default settings.

- `IncludeCurrent()`: This operation ensures that the current rotamers (the conformations of side chains) of residues are included in the packing process. It ensures that the current structure is among the options when PyRosetta evaluates side-chain conformations.

- `ReadResfile(resfile)`: This reads a **resfile**, a text file that specifies custom rules about which residues should be repacked or redesigned. You can use it to fine-tune how specific regions of the protein are handled.

By using these **TaskOperations**, you can define which residues in your protein should be flexible, which ones should remain fixed, or even guide specific mutation and repacking strategies. The `packer_task` created from this allows you to inspect the task and see how it's being applied to your protein (`pose`).



**FastRelax** is a powerful mover in PyRosetta designed for fast and efficient protein structure refinement. It combines multiple cycles of energy minimization and side-chain repacking to produce low-energy protein models. It’s a faster variant of the standard **Relax** protocol, but it still provides high-quality results.

- **Minimization**: FastRelax performs gradient-based energy minimization, adjusting the protein's backbone and side chains to lower the overall energy.
  
- **Repacking**: During FastRelax, side chains are repacked to explore different rotamer conformations. This is where **TaskOperations** come into play: they control which residues get repacked, how they are sampled, and what restrictions (e.g., from a resfile) should be applied.

FastRelax typically works in several cycles, alternating between repacking and minimizing to drive the protein structure to a local energy minimum.
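
The alternation of repacking and minimization can be read directly from the `CMD:` lines that FastRelax logs while it runs. The sketch below parses a few such lines, copied from the run further down in this notebook. `parse_fastrelax_log` is a hypothetical helper, and it assumes that the last column is the current `fa_rep` weight (consistent with the value 0.55 on the `repeat` line, the `fa_rep` weight in `ref2015`).

```python
SAMPLE_LOG = """\
protocols.relax.FastRelax: CMD: repeat  -5.37113  0  0  0.55
protocols.relax.FastRelax: CMD: coord_cst_weight  -5.37113  0  0  0.55
protocols.relax.FastRelax: CMD: scale:fa_rep  -7.87213  0  0  0.03245
protocols.relax.FastRelax: CMD: repack  5.69158  0  0  0.03245
protocols.relax.FastRelax: CMD: scale:fa_rep  12.7067  0  0  0.0506
protocols.relax.FastRelax: CMD: min  -13.5998  0.447376  0.447376  0.0506
protocols.relax.FastRelax: CMD: coord_cst_weight  -13.5998  0.447376  0.447376  0.0506
protocols.relax.FastRelax: CMD: scale:fa_rep  -10.351  0.447376  0.447376  0.154
"""


def parse_fastrelax_log(log_text):
    """Return (command, score, fa_rep_weight) for every complete CMD line."""
    steps = []
    for line in log_text.splitlines():
        if "CMD:" not in line:
            continue
        fields = line.split("CMD:", 1)[1].split()
        if len(fields) != 5:
            continue  # skip truncated or unexpected lines
        command = fields[0]
        score = float(fields[1])
        fa_rep_weight = float(fields[-1])  # assumed meaning of the last column
        steps.append((command, score, fa_rep_weight))
    return steps


steps = parse_fastrelax_log(SAMPLE_LOG)
ramp = [weight for command, _, weight in steps if command == "scale:fa_rep"]
moves = [command for command, _, _ in steps if command in ("repack", "min")]
print("fa_rep ramp:", ramp)
print("moves:", moves)
```

In this excerpt the `fa_rep` weight rises from 0.03245 to 0.154, and a `repack` step is followed by a `min` step.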

In the code below, the **TaskFactory** you set up defines the rules for how the residues will be repacked during the **FastRelax** protocol, ensuring that the current rotamers are considered and that any custom residue rules from the resfile are respected.


The **PackerTask** in PyRosetta defines the specific set of operations and rules used during **repacking** or **design** of side chains in a protein structure. It's essentially a blueprint that controls how the side chains (rotamers) of residues in a protein will be sampled, repacked, or redesigned.

##### Key aspects of a **PackerTask**:

1. **Repacking**: The PackerTask dictates which residues' side chains can be repacked, meaning their rotamer conformations are optimized while keeping the amino acid identity fixed. This is useful for refining the structure without changing its overall sequence.

2. **Design**: In addition to repacking, a PackerTask can also be used to redesign certain residues, meaning the side chain can change to a different amino acid entirely. The rules for this can be specified by the resfile or other task operations.

3. **TaskOperations**: A PackerTask is generated by applying **TaskOperations**, which control:
   - Which residues are allowed to be modified (packed, repacked, or designed).
   - Whether the current side chain conformation is included in the repacking process.
   - Any restrictions on amino acid substitutions or packing regions.

4. **Residue-level control**: For each residue in the protein, the PackerTask can specify whether it is allowed to move, which rotamers can be sampled, and whether the side chain should be repacked, designed, or remain fixed.


In summary, the **PackerTask** is a detailed set of instructions that defines how side chains are handled during repacking or design, ensuring that the specified residues follow the rules set by the task operations and resfile during processes like FastRelax.


In [None]:
# The task factory accepts all the task operations
tf = pyrosetta.rosetta.core.pack.task.TaskFactory()

# These are pretty standard
tf.push_back(pyrosetta.rosetta.core.pack.task.operation.InitializeFromCommandline())
tf.push_back(pyrosetta.rosetta.core.pack.task.operation.IncludeCurrent())


# Include the resfile
tf.push_back(pyrosetta.rosetta.core.pack.task.operation.ReadResfile(resfile))

# Convert the task factory into a PackerTask to take a look at it
packer_task = tf.create_task_and_apply_taskoperations(pose)
# View the PackerTask
print(packer_task)

core.pack.task: Packer task: initialize from command line()
#Packer_Task

Threads to request: ALL AVAILABLE

resid	pack?	design?	allowed_aas
1	TRUE	FALSE	ALA:NtermProteinFull
2	TRUE	FALSE	ALA
3	TRUE	FALSE	ALA
4	TRUE	FALSE	ALA
5	TRUE	TRUE	PRO
6	TRUE	FALSE	ALA
7	TRUE	TRUE	LYS
8	TRUE	FALSE	ALA
9	TRUE	FALSE	ALA
10	TRUE	FALSE	ALA:CtermProteinFull



 The PackerTask looks as intended!
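
For longer proteins, reading the printed table by eye becomes tedious. As a rough sketch, the printed text can be parsed to list only the positions where design is enabled. `designed_positions` is a hypothetical helper that works on the text shown above; it assumes whitespace-separated columns and comma-separated entries in `allowed_aas`.

```python
SAMPLE_TASK = """\
resid pack? design? allowed_aas
1 TRUE FALSE ALA:NtermProteinFull
2 TRUE FALSE ALA
3 TRUE FALSE ALA
4 TRUE FALSE ALA
5 TRUE TRUE PRO
6 TRUE FALSE ALA
7 TRUE TRUE LYS
8 TRUE FALSE ALA
9 TRUE FALSE ALA
10 TRUE FALSE ALA:CtermProteinFull
"""


def designed_positions(task_text):
    """Map residue index -> allowed amino acids for rows with design? == TRUE."""
    designed = {}
    for line in task_text.splitlines():
        fields = line.split()
        # Data rows have four fields and start with an integer residue index
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        resid, _pack, design, allowed = fields
        if design == "TRUE":
            designed[int(resid)] = allowed.split(",")
    return designed


print(designed_positions(SAMPLE_TASK))
```

For the table above this reports positions 5 (PRO) and 7 (LYS). In the notebook, the same helper could be applied to `str(packer_task)`.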

Now we can set up a `MoveMap` to specify which torsions are free to move during the minimization steps of the design run.

 A `MoveMap` in PyRosetta is a data structure that defines which degrees of freedom (backbone, side chains, or rigid-body jumps) in a protein structure can be moved or optimized during refinement or minimization processes. It's used to control which parts of the protein are flexible and which are fixed during various protocols such as **FastRelax**, **Minimization**, and **CartesianMinimization**.

##### Components of a **MoveMap**:

1. **Backbone flexibility (bb)**: The protein's backbone is described by the φ (phi), ψ (psi), and ω (omega) torsion angles of each residue. Setting the backbone to be flexible means these torsion angles can change during the optimization.
   
2. **Side-chain flexibility (chi)**: The side-chain dihedral angles (χ angles) can also be adjusted. Allowing chi flexibility means that side-chain conformations (rotamers) can be optimized during refinement.

3. **Jump flexibility (jump)**: Proteins or complexes that contain multiple domains or chains connected by rigid-body jumps (rigid-body transformations between two parts of a structure) can have their relative orientations changed if jumps are flexible.

### Breakdown of the code:

```python
mm = pyrosetta.rosetta.core.kinematics.MoveMap()
mm.set_bb(True)   # Allow backbone torsions (phi, psi) to move
mm.set_chi(True)  # Allow side-chain torsions (chi) to move
mm.set_jump(True) # Allow rigid-body jumps to move
```

- `mm.set_bb(True)`: This command allows the **backbone** torsions of the protein to move during optimization. If set to `False`, the backbone remains fixed.
  
- `mm.set_chi(True)`: This enables movement in the **side-chain dihedral angles**. If set to `False`, the side chains remain fixed.

- `mm.set_jump(True)`: This allows **rigid-body jumps** (which define the relative positioning of chains or domains) to be altered.

##### How the **MoveMap** is used:


In the code below, you are allowing **all parts of the protein**—the backbone, side chains, and jumps—to move, making the entire structure flexible during whatever optimization protocol you're running.


In [None]:
mm = pyrosetta.rosetta.core.kinematics.MoveMap()
mm.set_bb(True)
mm.set_chi(True)
mm.set_jump(True)

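Set up the design mover. The code below uses `FastRelax` with the `MonomerDesign2019` script; because the task factory enables design at some positions, the packing steps can change amino acid identities as well as rotamers.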

In [None]:
rel_design = pyrosetta.rosetta.protocols.relax.FastRelax(scorefxn_in=scorefxn, standard_repeats=1, script_file="MonomerDesign2019")
rel_design.cartesian(True)
rel_design.set_task_factory(tf)
rel_design.set_movemap(mm)
rel_design.minimize_bond_angles(True)
rel_design.minimize_bond_lengths(True)

Run the design mover. This is fast for a peptide of this size.

In [None]:
d3l7_pose = pose.clone()  # Create a copy of the original pose
%time rel_design.apply(d3l7_pose)

core.energy_methods.CartesianBondedEnergy: Creating new peptide-bonded energy container (10)
protocols.relax.FastRelax: CMD: repeat  -5.37113  0  0  0.55
protocols.relax.FastRelax: CMD: coord_cst_weight  -5.37113  0  0  0.55
protocols.relax.FastRelax: CMD: scale:fa_rep  -7.87213  0  0  0.03245
core.pack.task: Packer task: initialize from command line()
core.pack.pack_rotamers: built 30 rotamers at 10 positions.
core.pack.interaction_graph.interaction_graph_factory: Instantiating PDInteractionGraph
protocols.relax.FastRelax: CMD: repack  5.69158  0  0  0.03245
protocols.relax.FastRelax: CMD: scale:fa_rep  12.7067  0  0  0.0506
protocols.relax.FastRelax: CMD: min  -13.5998  0.447376  0.447376  0.0506
protocols.relax.FastRelax: CMD: coord_cst_weight  -13.5998  0.447376  0.447376  0.0506
protocols.relax.FastRelax: CMD: scale:fa_rep  -10.351  0.447376  0.447376  0.154
core.pack.task: Packer task: initialize from command line()
core.pack.pack_rotamers: built 33 rotamers at 10 positions.
core

We can see the result of the design.

In [None]:
display_protein(d3l7_pose)

Since the peptide was already folded, we don't see major changes in its conformation. However, we can compare the relative stability of the design by measuring the Rosetta energy before and after design.


In [None]:
before_design = scorefxn(pose)
print(before_design)
energy_design = scorefxn(d3l7_pose)
print(energy_design)

core.energy_methods.CartesianBondedEnergy: Creating new peptide-bonded energy container (10)
-5.371127246023434
-7.343814043773602


**Exercise:** Apparently, we have stabilised the helix with our design. How is this possible?

To properly measure the energy level of the no-design peptide, we need to subject it to the same energy minimization.


In [None]:
# Resfile that allows repacking but no design
nodesign_resfile = "/content/google_drive/MyDrive/BIP_24-25/5-Protein_design/nodesign.resfile"

# Reset the pose to the initial conformation
pose = start_pose.clone()


# The task factory accepts all the task operations
tf = pyrosetta.rosetta.core.pack.task.TaskFactory()

# These are pretty standard
tf.push_back(pyrosetta.rosetta.core.pack.task.operation.InitializeFromCommandline())
tf.push_back(pyrosetta.rosetta.core.pack.task.operation.IncludeCurrent())


# Include the resfile
tf.push_back(pyrosetta.rosetta.core.pack.task.operation.ReadResfile(nodesign_resfile))

# Convert the task factory into a PackerTask to take a look at it
packer_task = tf.create_task_and_apply_taskoperations(pose)
# View the PackerTask
print(packer_task)


mm = pyrosetta.rosetta.core.kinematics.MoveMap()
mm.set_bb(True)
mm.set_chi(True)
mm.set_jump(True)

rel_design = pyrosetta.rosetta.protocols.relax.FastRelax(scorefxn_in=scorefxn, standard_repeats=1, script_file="MonomerDesign2019")
rel_design.cartesian(True)
rel_design.set_task_factory(tf)
rel_design.set_movemap(mm)
rel_design.minimize_bond_angles(True)
rel_design.minimize_bond_lengths(True)

%time rel_design.apply(pose)



core.pack.task: Packer task: initialize from command line()
#Packer_Task

Threads to request: ALL AVAILABLE

resid	pack?	design?	allowed_aas
1	TRUE	FALSE	ALA:NtermProteinFull
2	TRUE	FALSE	ALA
3	TRUE	FALSE	ALA
4	TRUE	FALSE	ALA
5	TRUE	FALSE	ALA
6	TRUE	FALSE	ALA
7	TRUE	FALSE	ALA
8	TRUE	FALSE	ALA
9	TRUE	FALSE	ALA
10	TRUE	FALSE	ALA:CtermProteinFull

core.energy_methods.CartesianBondedEnergy: Creating new peptide-bonded energy container (10)
protocols.relax.FastRelax: CMD: repeat  -5.37113  0  0  0.55
protocols.relax.FastRelax: CMD: coord_cst_weight  -5.37113  0  0  0.55
protocols.relax.FastRelax: CMD: scale:fa_rep  -7.87213  0  0  0.03245
core.pack.task: Packer task: initialize from command line()
core.pack.pack_rotamers: built 10 rotamers at 10 positions.
core.pack.interaction_graph.interaction_graph_factory: Instantiating DensePDInteractionGraph
protocols.relax.FastRelax: CMD: repack  -7.87213  0  0  0.03245
protocols.relax.FastRelax: CMD: scale:fa_rep  -7.78442  0  0  0.0506
protocols.re

In [None]:
energy_nodesign = scorefxn(pose)
print(energy_nodesign)

-11.688240255929747


In [None]:
print("deltaE =", energy_design - energy_nodesign)

deltaE = 4.344426212156145
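
Rosetta energies are more favorable the lower they are, so the sign of `deltaE` tells us which peptide scores better. The short sketch below repeats the comparison with the two values printed above; `compare_energies` is a hypothetical helper. Keep in mind that each value comes from a single run of a stochastic protocol, so repeated runs would give a spread of energies rather than one number.

```python
# Values copied from the outputs above (Rosetta energy units)
energy_design = -7.343814043773602     # designed peptide after the protocol
energy_nodesign = -11.688240255929747  # polyA after the same protocol


def compare_energies(e_design, e_reference):
    """Return (deltaE, verdict) where deltaE = e_design - e_reference."""
    delta = e_design - e_reference
    if delta < 0:
        verdict = "design is more favorable than the reference"
    elif delta > 0:
        verdict = "design is less favorable than the reference"
    else:
        verdict = "design and reference score the same"
    return delta, verdict


delta_e, verdict = compare_energies(energy_design, energy_nodesign)
print(f"deltaE = {delta_e:.3f} REU: {verdict}")
```

With a positive `deltaE`, the designed peptide scores worse than polyA once both have gone through the same minimization.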


**Exercise. Mess with the polyA peptide. Explore the mutations that destabilize the $\alpha$-helix.**