Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [1]:
NAME = "Damir"
COLLABORATORS = ""

---

# Mover Lab
In this lab, you will learn to use Movers to manipulate poses. 

In [2]:
# This takes ~1.5 minutes each time you start a new notebook or do a "factory reset".
import sys
if 'google.colab' in sys.modules:
    !pip install pyrosettacolabsetup
    import pyrosettacolabsetup
    pyrosettacolabsetup.mount_pyrosetta_install() #Instead of pyrosettacolabsetup.setup
    print ("Notebook is set for PyRosetta use in Colab.  Have fun!")



Drive already mounted at /content/google_drive; to attempt to forcibly remount, call drive.mount("/content/google_drive", force_remount=True).
Notebook is set for PyRosetta use in Colab.  Have fun!


In [3]:
%cd google_drive/My\ Drive/temp_pyrbc_202103_notebooks/

/content/google_drive/My Drive/temp_pyrbc_202103_notebooks


In [4]:
import pyrosetta

pyrosetta.init()

PyRosetta-4 2021 [Rosetta PyRosetta4.MinSizeRel.python37.ubuntu 2021.21+release.882e5c1ab85c8c251fce4eb3e1e0504af590786a 2021-05-26T14:40:53] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
core.init: Checking for fconfig files in pwd and ./rosetta/flags
core.init: Rosetta version: PyRosetta4.MinSizeRel.python37.ubuntu r284 2021.21+release.882e5c1ab85 882e5c1ab85c8c251fce4eb3e1e0504af590786a http://www.pyrosetta.org 2021-05-26T14:40:53
core.init: command: PyRosetta -ex1 -ex2aro -database /content/google_drive/MyDrive/prefix/lib/python3.7/site-packages/pyrosetta/database
basic.random.init_random_generator: 'RNG device' seed mode, using '/dev/urandom', seed=2037310180 seed_offset=0 real_seed=2037310180
basic.random.init_random_generator: RandomGenerator:init: Normal mode, seed=2037310180 RG_type=mt19937


Let's load the structure 2WPT, which is a complex of the protein Im2 and the colicin E9 DNase. Researchers have introduced various mutations at the interface to study how they change the binding free energy.

In [5]:
pose = pyrosetta.rosetta.core.import_pose.pose_from_file('inputs/2wpt.pdb')

core.chemical.GlobalResidueTypeSet: Finished initializing fa_standard residue type set.  Created 984 residue types
core.chemical.GlobalResidueTypeSet: Total time to initialize 2.62624 seconds.
core.import_pose.import_pose: File 'inputs/2wpt.pdb' automatically determined to be of type PDB
core.chemical.GlobalResidueTypeSet: Loading (but possibly not actually using) 'GOL' from the PDB components dictionary for residue type 'pdb_GOL'
core.chemical.GlobalResidueTypeSet: Loading (but possibly not actually using) 'NO3' from the PDB components dictionary for residue type 'pdb_NO3'


Open a PyMol window. Initialize a PyMOLMover and let it send the pose to a PyMol session. As its name suggests, the PyMOLMover is a mover because it is derived from the Mover class. However, it is a special one: it does not change the pose, but sends the pose to PyMol. In PyMol, if you color the structure by chain, you can see that there are two proteins.

In [6]:
import pyrosetta.network
 
# Make sure to supply a unique secret here and use _the same_ secret when running PyMOL-Rosetta-relay-client.python3.py in PyMOL on your desktop machine
pyrosetta.network.start_udp_to_tcp_bridge_daemon(secret='none')

pmm = pyrosetta.PyMOLMover()
pmm.apply(pose)

UDP server started: localhost:65000...


## Backbone movers
Let's try to modify the protein backbone. The simplest way to sample backbone conformations is to introduce random perturbations. The SmallMover makes small, independent random perturbations to the phi and psi torsion angles of randomly chosen residues. It uses the rama score to ensure that only favorable backbone torsion angles are selected. Let's initialize a SmallMover and let it introduce 10 random perturbations.

In [7]:
small_mover = pyrosetta.rosetta.protocols.simple_moves.SmallMover()
small_mover.nmoves(10)
small_mover.apply(pose)

pmm.pymol_name('small_moved')
pmm.apply(pose)

core.scoring.ramachandran: shapovalov_lib::shap_rama_smooth_level of 4( aka highest_smooth ) got activated.
Connected to relay.graylab.jhu.edu 128.220.208.35:9989...
basic.io.database: Database file opened: scoring/score_functions/rama/shapovalov/kappa25/all.ramaProb


In PyMol, compare the structures before and after perturbation. Do you notice anything odd? Yes, the C-terminus changes much more than the N-terminus. This is called the lever-arm effect in backbone sampling: a change at one residue propagates to all of its downstream residues. Because of the lever-arm effect, backbone perturbations are not local and bad contacts are easily introduced.
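The lever-arm effect can be illustrated without PyRosetta. The following is a minimal sketch on a hypothetical planar chain of unit-length bonds, not a real backbone; `build_chain` and `displacements` are illustrative helpers, not PyRosetta functions. Turning a single joint by 10 degrees leaves the upstream atoms in place and moves each downstream atom by an amount that grows with its distance from the joint.

```python
import math


def build_chain(joint_angles, bond_length=1.0):
    """Place the atoms of a planar toy chain.

    joint_angles[j] is the turn (in radians) applied before drawing bond j,
    so changing it affects atom j + 1 and everything after it.
    """
    x, y, heading = 0.0, 0.0, 0.0
    atoms = [(x, y)]
    for turn in joint_angles:
        heading += turn
        x += bond_length * math.cos(heading)
        y += bond_length * math.sin(heading)
        atoms.append((x, y))
    return atoms


def displacements(before, after):
    """Distance moved by each atom between two conformations."""
    return [math.hypot(q[0] - p[0], q[1] - p[1]) for p, q in zip(before, after)]


angles = [0.0] * 20                 # straight toy chain with 21 atoms
perturbed = list(angles)
perturbed[5] += math.radians(10)    # perturb one joint near the start

moved = displacements(build_chain(angles), build_chain(perturbed))
for index in (0, 5, 6, 10, 20):
    print(f"atom {index:2d} moved {moved[index]:.3f}")
```

In this toy model the last atom moves furthest, which mirrors the larger changes seen at the C-terminus after applying the SmallMover.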

The ShearMover addresses the lever-arm effect. Instead of sampling backbone torsions independently, it changes the torsions of two consecutive residues together so that the downstream lever-arm effect is reduced. Let's import a fresh pose, initialize a ShearMover and let it introduce 100 perturbations.
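Before applying the mover, the idea of compensating rotations can be sketched on a planar toy chain. This is only an analogy under simplifying assumptions (unit bonds, in-plane joint angles instead of phi and psi torsions), and `build_chain` and `end_displacement` are hypothetical helpers rather than anything from PyRosetta. A rotation at one joint is paired with the opposite rotation at the next joint, so the downstream chain keeps its orientation and is only shifted slightly.

```python
import math


def build_chain(joint_angles, bond_length=1.0):
    """Place the atoms of a planar toy chain from per-bond turning angles."""
    x, y, heading = 0.0, 0.0, 0.0
    atoms = [(x, y)]
    for turn in joint_angles:
        heading += turn
        x += bond_length * math.cos(heading)
        y += bond_length * math.sin(heading)
        atoms.append((x, y))
    return atoms


def end_displacement(reference, perturbed):
    """Distance moved by the last atom of the chain."""
    (x0, y0), (x1, y1) = reference[-1], perturbed[-1]
    return math.hypot(x1 - x0, y1 - y0)


angles = [0.0] * 20
delta = math.radians(10)

# Independent perturbation of a single joint.
single = list(angles)
single[5] += delta

# Paired perturbation: the next joint gets the opposite rotation.
paired = list(angles)
paired[5] += delta
paired[6] -= delta

reference = build_chain(angles)
single_shift = end_displacement(reference, build_chain(single))
paired_shift = end_displacement(reference, build_chain(paired))

print(f"single rotation moves the chain end by {single_shift:.3f}")
print(f"paired rotation moves the chain end by {paired_shift:.3f}")
```

The paired move still displaces the chain end, only by much less, which matches the observation that the lever-arm effect is reduced but not eliminated.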

In [8]:
pose = pyrosetta.rosetta.core.import_pose.pose_from_file('inputs/2wpt.pdb')
shear_mover = pyrosetta.rosetta.protocols.simple_moves.ShearMover()
shear_mover.nmoves(100)
shear_mover.apply(pose)

pmm.pymol_name('shear_moved')
pmm.apply(pose)

core.import_pose.import_pose: File 'inputs/2wpt.pdb' automatically determined to be of type PDB


Now you should see that the lever-arm effect is reduced, but not completely gone.

"Backrub" is one method to achieve truly local sampling. The trade-off is that backbone bond angles change slightly. Initialize a BackrubMover and apply it 100 times.

In [9]:
pose = pyrosetta.rosetta.core.import_pose.pose_from_file('inputs/2wpt.pdb')
br_mover = pyrosetta.rosetta.protocols.backrub.BackrubMover()
for i in range(100):
    br_mover.apply(pose)

pmm.pymol_name('backrub_moved')
pmm.apply(pose)

core.import_pose.import_pose: File 'inputs/2wpt.pdb' automatically determined to be of type PDB
core.mm.MMBondAngleLibrary: MM bond angle sets added fully assigned: 603; wildcard: 0 and 1 virtual parameter.
basic.io.database: Database file opened: sampling/branch_angle/branch_angle_1.txt
basic.io.database: Database file opened: sampling/branch_angle/branch_angle_2.txt
protocols.backrub.BackrubMover: Segment lengths: 3-34 atoms
protocols.backrub.BackrubMover: Main chain pivot atoms: CA
protocols.backrub.BackrubMover: Adding backrub segments for residues 1-200
protocols.backrub.BackrubMover: Total Segments Added: 1778


Now you can see that the perturbations are evenly distributed throughout the structure.
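The geometry behind a backrub-style move can be sketched in plain Python. This is a toy illustration under stated assumptions (a made-up zigzag of points, a fixed 20 degree rotation, and hypothetical helper names), not the BackrubMover implementation. The atoms between two pivot atoms are rotated rigidly about the axis connecting the pivots, so everything outside the segment stays fixed and bond lengths are preserved, while the bond angles at the pivots change.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def rotate_about_axis(point, origin, axis_end, angle):
    """Rotate a point about the line origin -> axis_end (Rodrigues' formula)."""
    axis = sub(axis_end, origin)
    length = norm(axis)
    k = tuple(c / length for c in axis)
    v = sub(point, origin)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    k_cross_v = cross(k, v)
    k_dot_v = dot(k, v)
    rotated = tuple(v[i] * cos_a + k_cross_v[i] * sin_a + k[i] * k_dot_v * (1 - cos_a)
                    for i in range(3))
    return add(origin, rotated)


def backrub_like_move(atoms, start, end, angle):
    """Rotate the atoms strictly between two pivots about the pivot-pivot axis."""
    moved = list(atoms)
    for i in range(start + 1, end):
        moved[i] = rotate_about_axis(atoms[i], atoms[start], atoms[end], angle)
    return moved


def bond_angle(a, b, c):
    """Angle a-b-c in degrees."""
    u, w = sub(a, b), sub(c, b)
    return math.degrees(math.acos(dot(u, w) / (norm(u) * norm(w))))


# Hypothetical zigzag chain of 10 points standing in for main-chain atoms.
atoms = [(float(i), 0.8 * (i % 2), 0.0) for i in range(10)]
moved = backrub_like_move(atoms, start=3, end=6, angle=math.radians(20))

angle_before = bond_angle(atoms[2], atoms[3], atoms[4])
angle_after = bond_angle(moved[2], moved[3], moved[4])
print(f"bond angle at the first pivot: {angle_before:.2f} -> {angle_after:.2f} degrees")
```

Only the two atoms between the pivots move in this sketch, which is why such a move is local; the price is the small change in the pivot bond angle printed at the end.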

## Mutate residues
Protein designers constantly explore the conformation and sequence spaces of proteins. You have already learned methods to sample the backbone conformation space. Now it's time to consider introducing mutations.

A previous study showed that the N34V and R38T mutations on chain A change the binding free energy by -2.60 kcal/mol. Let's introduce these two mutations into our structure. Again, import a fresh pose.

In [10]:
pose = pyrosetta.rosetta.core.import_pose.pose_from_file('inputs/2wpt.pdb')

core.import_pose.import_pose: File 'inputs/2wpt.pdb' automatically determined to be of type PDB


In Rosetta, residues in a pose are numbered from 1 to N, where N is the total number of residues. This indexing system is different from the numbering you see in a PDB file. For example, the first lysine in our structure has Rosetta index 1, but its PDB index is A4. To introduce mutations, we first need to figure out the Rosetta indices of the residues of interest. As we have done before, we will turn to the PDBInfo object attached to a pose.

In [11]:
print(pose.pdb_info().pdb2pose('A', 34))
print(pose.pdb_info().pdb2pose('A', 38))

31
35
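The relationship between the two numbering schemes can be sketched in plain Python. This is a simplified model, assuming chains are listed in file order with no gaps or insertion codes; `build_pdb_to_pose_map` is a hypothetical helper, not part of PyRosetta, and the last residue number used for chain A below is made up for the illustration.

```python
def build_pdb_to_pose_map(chain_ranges):
    """Map (chain, PDB residue number) to a 1-based pose index.

    chain_ranges is a list of (chain_id, first_resnum, last_resnum) tuples in
    file order. Assumes contiguous numbering within each chain.
    """
    mapping = {}
    pose_index = 1
    for chain, first, last in chain_ranges:
        for resnum in range(first, last + 1):
            mapping[(chain, resnum)] = pose_index
            pose_index += 1
    return mapping


# Chain A starts at PDB residue 4 in this lab; the end of the range (40) is
# a placeholder chosen only so that residues 34 and 38 are included.
toy_map = build_pdb_to_pose_map([("A", 4, 40)])

for resnum in (4, 34, 38):
    print(f"A{resnum} -> pose index {toy_map[('A', resnum)]}")
```

Under these assumptions the offset is simply the first PDB residue number minus one, which agrees with the indices reported by `pdb2pose` above. For real structures with gaps or multiple chains, rely on `pose.pdb_info().pdb2pose` rather than a fixed offset.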


Use the MutateResidue mover to introduce mutations N34V R38T.

In [12]:
mutater = pyrosetta.rosetta.protocols.simple_moves.MutateResidue()

mutater.set_target(31)
mutater.set_res_name('VAL')
mutater.apply(pose)

mutater.set_target(35)
mutater.set_res_name('THR')
mutater.apply(pose)

pmm.pymol_name('mutated')
pmm.apply(pose)

Now you should be able to see these mutations in PyMol. You have now learned movers that can help you explore the backbone and sequence spaces. You may have realized that the side chain conformations, which are very important, are not sampled. Side chain sampling will be covered in later labs.

## Exercises
1. Use the functions you learned from the previous lecture to score the poses before and after mutation. What is the change of the score value? Does it match the experimentally measured -2.60 kcal/mol? What score terms change significantly? What 10 residues' scores change the most? Do their changes make sense?

(Hint: it is possible to take the difference between two `EMapVector`s, but currently the functionality is half broken. If you have `emap1` and `emap2`, you can calculate the difference as follows:
```
diff_emap = EnergyMap(emap1)
temp_emap = diff_emap # create a reference to the same object
temp_emap -= emap2
print(temp_emap) # temp_emap is now None. This is the "half broken" part
print(diff_emap) # diff_emap has been modified
```

As of 5/27/2019, I have fixed this code in Rosetta's C++ version, but the fix will not make it out to your pyrosetta download in time for the 6/6/2019 code school.)

2. Redo the mutagenesis and ddG calculation on backbone perturbed structures. How much do the results change? Why?

3. Generate a backbone ensemble made of 20 structures with your favorite backbone sampling method. Redo the mutagenesis and ddG calculation on each structure and take the mean/median/minimal score. How much do the results change? Why?

4. The above ddG analysis is very crude and inaccurate. What improvements should be introduced to make it better?

In [13]:
pose = pyrosetta.rosetta.core.import_pose.pose_from_file('inputs/2wpt.pdb')

from pyrosetta.teaching import *
sfxn = get_score_function(True)

sfxn(pose)

core.import_pose.import_pose: File 'inputs/2wpt.pdb' automatically determined to be of type PDB
core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015
core.scoring.etable: Starting energy table calculation
core.scoring.etable: smooth_etable: changing atr/rep split to bottom of energy well
core.scoring.etable: smooth_etable: spline smoothing lj etables (maxdis = 6)
core.scoring.etable: smooth_etable: spline smoothing solvation etables (max_dis = 6)
core.scoring.etable: Finished calculating energy tables.
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBPoly1D.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBFadeIntervals.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBEval.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/DonStrength.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_

78.00506039085164

In [14]:
mutater = pyrosetta.rosetta.protocols.simple_moves.MutateResidue()

mutater.set_target(31)
mutater.set_res_name('VAL')
mutater.apply(pose)

mutater.set_target(35)
mutater.set_res_name('THR')
mutater.apply(pose)

sfxn(pose)

102.23306806552328
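The change in score asked for in exercise 1 follows directly from the two values printed above. The short calculation below copies those two scores and the experimental value from the text; the variable names are illustrative only.

```python
# Total scores printed by the two cells above.
wild_type_score = 78.00506039085164
mutant_score = 102.23306806552328

# Experimentally measured change in binding free energy (kcal/mol).
experimental_ddg = -2.60

# Mutant minus wild type: a positive value means the score got worse.
delta_score = mutant_score - wild_type_score

print(f"score change on mutation: {delta_score:+.2f}")
print(f"experimental value:       {experimental_ddg:+.2f}")
print("same sign as experiment:", (delta_score < 0) == (experimental_ddg < 0))
```

The score rises on mutation while the experiment reports a decrease, so the crude estimate does not match. Note also that this is a difference in the total score of the whole complex, which is not the same quantity as a binding free energy.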

2. Redo the mutagenesis and ddG calculation on backbone perturbed structures. How much do the results change? Why?

In [15]:
pose = pyrosetta.rosetta.core.import_pose.pose_from_file('inputs/2wpt.pdb')
br_mover = pyrosetta.rosetta.protocols.backrub.BackrubMover()
for i in range(100):
    br_mover.apply(pose)

mutater = pyrosetta.rosetta.protocols.simple_moves.MutateResidue()

mutater.set_target(31)
mutater.set_res_name('VAL')
mutater.apply(pose)

mutater.set_target(35)
mutater.set_res_name('THR')
mutater.apply(pose)

sfxn.show(pose)

core.import_pose.import_pose: File 'inputs/2wpt.pdb' automatically determined to be of type PDB
basic.io.database: Database file opened: sampling/branch_angle/branch_angle_1.txt
basic.io.database: Database file opened: sampling/branch_angle/branch_angle_2.txt
protocols.backrub.BackrubMover: Segment lengths: 3-34 atoms
protocols.backrub.BackrubMover: Main chain pivot atoms: CA
protocols.backrub.BackrubMover: Adding backrub segments for residues 1-200
protocols.backrub.BackrubMover: Total Segments Added: 1778
core.scoring.ScoreFunction: 
------------------------------------------------------------
 Scores                       Weight   Raw Score Wghtd.Score
------------------------------------------------------------
 fa_atr                       1.000   -1223.938   -1223.938
 fa_rep                       0.550   20711.198   11391.159
 fa_sol                       1.000     906.542     906.542
 fa_intra_rep                 0.005     466.694       2.333
 fa_intra_sol_xover4          1.000

3. Generate a backbone ensemble made of 20 structures with your favorite backbone sampling method. Redo the mutagenesis and ddG calculation on each structure and take the mean/median/minimal score. How much do the results change? Why?

In [18]:
from statistics import mean, median

pose = pyrosetta.rosetta.core.import_pose.pose_from_file('inputs/2wpt.pdb')
score_pose = sfxn(pose)

data = []
data2 = []

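for i in range(20):
    # Start every ensemble member from a fresh copy of the input pose.
    new_pose = pyrosetta.Pose()
    new_pose.assign(pose)
    br_mover = pyrosetta.rosetta.protocols.backrub.BackrubMover()
    for _ in range(100):  # separate loop variable so the outer counter is not shadowed
        br_mover.apply(new_pose)

    mutater = pyrosetta.rosetta.protocols.simple_moves.MutateResidue()

    mutater.set_target(31)
    mutater.set_res_name('VAL')
    mutater.apply(new_pose)

    mutater.set_target(35)
    mutater.set_res_name('THR')
    mutater.apply(new_pose)

    # Score the mutant once. ddG follows the mutant-minus-wild-type convention,
    # with the unperturbed wild-type score as the reference.
    mutant_score = sfxn(new_pose)
    ddG = mutant_score - score_pose

    data.append(mutant_score)
    data2.append(ddG)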

print("MEAN | MEDIAN | MIN")
print(mean(data), median(data), min(data))
print(mean(data2), median(data2), min(data2))

core.import_pose.import_pose: File 'inputs/2wpt.pdb' automatically determined to be of type PDB
basic.io.database: Database file opened: sampling/branch_angle/branch_angle_1.txt
basic.io.database: Database file opened: sampling/branch_angle/branch_angle_2.txt
protocols.backrub.BackrubMover: Segment lengths: 3-34 atoms
protocols.backrub.BackrubMover: Main chain pivot atoms: CA
protocols.backrub.BackrubMover: Adding backrub segments for residues 1-200
protocols.backrub.BackrubMover: Total Segments Added: 1778
basic.io.database: Database file opened: sampling/branch_angle/branch_angle_1.txt
basic.io.database: Database file opened: sampling/branch_angle/branch_angle_2.txt
protocols.backrub.BackrubMover: Segment lengths: 3-34 atoms
protocols.backrub.BackrubMover: Main chain pivot atoms: CA
protocols.backrub.BackrubMover: Adding backrub segments for residues 1-200
protocols.backrub.BackrubMover: Total Segments Added: 1778
basic.io.database: Database file opened: sampling/branch_angle/branch_

4. The above ddG analysis is very crude and inaccurate. What improvements should be introduced to make it better?

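Since we mutate residues directly and do not let the rest of the protein structure adjust to the mutation, the mutated side chains may clash with their neighbors, so our ddG calculation is not a reliable estimate of the energetic effect of the mutations.

It could be improved by letting the rest of the protein adjust to the mutation, for example by repacking nearby side chains and minimizing the structure before scoring.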
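Another point to keep in mind is that the cells above compare total scores of the whole complex, whereas the experimental value is a change in binding free energy. A minimal sketch of how a binding ddG could be assembled from scores follows, assuming you have scored the complex and each separated partner for both the wild type and the mutant. The function names and all numbers are hypothetical placeholders, not results from 2WPT.

```python
def binding_energy(complex_score, partner_a_score, partner_b_score):
    """Score of the complex minus the scores of the separated partners."""
    return complex_score - (partner_a_score + partner_b_score)


def binding_ddg(wild_type_scores, mutant_scores):
    """Change in binding energy on mutation (mutant minus wild type).

    Each argument is a (complex, partner_a, partner_b) tuple of scores.
    """
    return binding_energy(*mutant_scores) - binding_energy(*wild_type_scores)


# Placeholder scores chosen only to show the arithmetic.
wild_type_scores = (-500.0, -300.0, -180.0)
mutant_scores = (-503.0, -301.0, -180.0)

ddg = binding_ddg(wild_type_scores, mutant_scores)
print(f"binding ddG: {ddg:+.2f}")
```

With this bookkeeping, score changes that affect the isolated partner and the complex equally cancel out, so only the interface contribution remains.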