# **References** 

[1] Le, K.; Adolf-Bryfogle, J.; Klima, J.; Lyskov, S.; Labonte, J.; Bertolani, S.; Roy Burman, S.; Leaver-Fay, A.; Weitzner, B.; Maguire, J.; Rangan, R.; Adrianowycz, M.; Alford, R.; Adal, A.; Nance, M.; Das, R.; Dunbrack, R.; Schief, W.; Kuhlman, B.; Siegel, J.; Gray, J. PyRosetta Jupyter Notebooks Teach Biomolecular Structure Prediction and Design. Preprints 2020, 2020020097 (doi: 10.20944/preprints202002.0097.v1). 

[2] https://rosettacommons.github.io/PyRosetta.notebooks/

# **Installing necessary python libraries**

In [1]:
!pip install pyrosettacolabsetup
!pip install py3Dmol
!pip install nglview

import sys
sys.path.append("/usr/local/lib/python3.9/site-packages")

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pyrosettacolabsetup
  Downloading pyrosettacolabsetup-1.0.6-py3-none-any.whl (4.7 kB)
Installing collected packages: pyrosettacolabsetup
Successfully installed pyrosettacolabsetup-1.0.6
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting py3Dmol
  Downloading py3Dmol-1.8.1-py2.py3-none-any.whl (6.5 kB)
Installing collected packages: py3Dmol
Successfully installed py3Dmol-1.8.1
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting nglview
  Downloading nglview-3.0.3.tar.gz (5.7 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Collecting jedi>=0.10
  Downloading jedi-0.18.1-py2.py3-none-a

# **Import pyrosetta**

pyrosetta.init() must be run before any other PyRosetta function is used. Don't forget this step!

In [3]:
import pyrosettacolabsetup; pyrosettacolabsetup.install_pyrosetta()
import pyrosetta
from pyrosetta import *
pyrosetta.init()  # initialize once; required before any other PyRosetta call

Mounted at /content/google_drive
Looking for compatible PyRosetta wheel file at google-drive/PyRosetta/colab.bin/wheels...
Found compatible wheel: /content/google_drive/MyDrive/PyRosetta/colab.bin/wheels//content/google_drive/MyDrive/PyRosetta/colab.bin/wheels/pyrosetta-2022.34+release.9ec33c9fe00-cp37-cp37m-linux_x86_64.whl
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


PyRosetta-4 2022 [Rosetta PyRosetta4.MinSizeRel.python37.ubuntu 2022.34+release.9ec33c9fe00427e5b1b0393e84cd8062c07f104c 2022-08-24T09:49:27] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
core.init: Checking for fconfig files in pwd and ./rosetta/flags
core.init: Rosetta version: PyRosetta4.MinSizeRel.python37.ubuntu r328 2022.34+release.9ec33c9fe00 9ec33c9fe00427e5b1b0393e84cd8062c07f104c http://www.pyrosetta.org 2022-08-24T09:49:27
core.init: command: PyRosetta -ex1 -

# **Loading a PDB into a Pose**

A protein structure (PDB) file holds information at several scales: sequence, residues, side chains, and geometry such as atomic positions. A Pose is the data structure that holds all of these properties, each of which you can extract with a function call. Let's start by loading a PDB file and initializing a *pose* object.

In [None]:
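from pyrosetta.toolbox.rcsb import pose_from_rcsb
# Load the structure from a local file (the name matches the output below)
pose = pose_from_file("7l0j.pdb")
# Alternatively, fetch a structure from the RCSB by its PDB ID
# pose_pdb = pose_from_rcsb("6m0j")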

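Print a summary of the pose: the source file name, the total number of residues, the sequence, and the fold tree. (To get the number of chains, use pose.num_chains().)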


In [None]:
print(pose)

PDB file name: 7l0j.pdb
Total residues: 213
Sequence: DGPCALRELSVDLRAERSVLIPETYQANNCQGVCGWPQSDRNPRYGNHVVLLLKMQARGAALARPPCCVPTAYAGKLLISLSEERISAHHVPNMVATECGCRHMPPNRRTCVFFEAPGVRGSTKTLGELLDTGTELPRAIRCLYSRCCFGIWNLTQDRAQVEMQGCRDSDEPGCESLHCDPSPRAHPSPGSTLFTCSCGTDFCNANYSHLPZZ
Fold tree:
FOLD_TREE  EDGE 1 102 -1  EDGE 1 103 1  EDGE 103 211 -1  EDGE 1 212 2  EDGE 1 213 3 


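To find out what a function does, use help(). Note that help(pose.get_hbonds()) documents the object that the call returns (here an HBondSet); use help(pose.get_hbonds), without the trailing parentheses, to document the function itself. You can also go to https://www.pyrosetta.org/documentation/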

In [None]:
help(pose.get_hbonds())

Help on HBondSet in module pyrosetta.rosetta.core.scoring.hbonds object:

class HBondSet(pyrosetta.rosetta.basic.datacache.CacheableData)
 |  A class that holds Hbond objects and helps setup Hbonds for scoring
 |  
 |  
 |  For general hydrogen bond information, either use the default or option constructor,
 |  then use the fill methods in hbonds.hh OR use the convenience constructors to detect all Hbonds.
 |  Use the copy constructors to fill HBondSets with the Hydrogen bonds you are interested in.
 |  
 |  Method resolution order:
 |      HBondSet
 |      pyrosetta.rosetta.basic.datacache.CacheableData
 |      pybind11_builtins.pybind11_object
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __init__(...)
 |      __init__(*args, **kwargs)
 |      Overloaded function.
 |      
 |      1. __init__(self: pyrosetta.rosetta.core.scoring.hbonds.HBondSet) -> None
 |      
 |      2. __init__(self: pyrosetta.rosetta.core.scoring.hbonds.HBondSet, nres: int) -> None
 |      
 | 
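The difference between asking for help on a function and on its return value can be tried without PyRosetta. The sketch below uses a hypothetical `Inventory` class (not part of PyRosetta) and the standard-library `pydoc` module, which is what `help()` uses under the hood, to render both kinds of documentation as plain text.

```python
import pydoc


class Inventory:
    """Hypothetical stand-in for an object such as a pose."""

    def get_items(self):
        """Return the stored items as a list."""
        return ["a", "b"]


inv = Inventory()

# No parentheses: document the method itself, including its docstring.
method_doc = pydoc.render_doc(inv.get_items, renderer=pydoc.plaintext)

# With parentheses: the method runs first, so the documentation describes
# the returned object (a list), just as help(pose.get_hbonds()) describes
# an HBondSet.
result_doc = pydoc.render_doc(inv.get_items(), renderer=pydoc.plaintext)

print(method_doc.splitlines()[0])
print(result_doc.splitlines()[0])
```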

# **Protein geometry**

- Get position of the C-alpha carbon of any residue in the protein 

In [None]:
CA_position = pose.residue(21).atom("CA").xyz()
print(CA_position)

      13.58100000000000      -22.03800000000000       10.98500000000000


You can also load a protein sequence into a pose 

In [None]:
sequence_pose = pose_from_sequence("SAGATAADGPCALRELSVDLRAERSVLIPETYQANNCQGVCGWPQSDRNPRYGNHVVLLLKMQARGAALARPPCCVPTAYAGKLLISLSEERISAHHVPNMVATECGCR")

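If the pose is built from a sequence, it won't have any experimental geometric information. PyRosetta still assigns coordinates by building an idealized, unfolded chain, so the position printed below does not correspond to a folded structure.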

In [None]:
CA_position = sequence_pose.residue(21).atom("CA").xyz()
print(CA_position)

      63.47821915334983       38.22963513528444   6.424264384584369E-15


# **Numbering in Pose vs PDB**

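You will notice a difference between the numbering of residues in the PDB file and in the pose: pose numbering runs consecutively from 1 across all chains, while PDB numbering follows the deposited file and is paired with a chain ID.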

In [None]:
# Residue objects: select a residue by its pose index with pose.residue()
residue_21 = pose.residue(21)
print(residue_21.name())
print(residue_21.is_aromatic())

ILE
False


In [None]:
print(pose.pdb_info().chain(21))
print(pose.pdb_info().number(21))

A
479


You can convert between a residue's index in the pose and its index in the PDB file using the following functions

In [None]:
print(pose.pdb_info().pose2pdb(21))

479 A 


In [None]:
print(pose.pdb_info().pdb2pose('A',479))

21
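To make the two numbering schemes concrete, here is a minimal pure-Python sketch of the bookkeeping behind pose2pdb and pdb2pose. It is not how PyRosetta stores this information. The chain layout is an assumption: chain A starting at 459 is inferred from the output above (pose residue 21 is A 479) assuming gap-free numbering, the chain lengths are read off the fold tree printed earlier, and the chain B start number is invented for illustration.

```python
# Hypothetical chain layout: (chain ID, first PDB residue number, residue count).
CHAINS = [("A", 459, 102), ("B", 22, 109)]


def build_maps(chains):
    """Return (pose_to_pdb, pdb_to_pose) lookup dictionaries."""
    pose_to_pdb = {}
    pdb_to_pose = {}
    pose_index = 1  # pose numbering starts at 1 and never restarts
    for chain_id, first_number, count in chains:
        for offset in range(count):
            key = (chain_id, first_number + offset)
            pose_to_pdb[pose_index] = key
            pdb_to_pose[key] = pose_index
            pose_index += 1
    return pose_to_pdb, pdb_to_pose


pose_to_pdb, pdb_to_pose = build_maps(CHAINS)
print(pose_to_pdb[21])            # PDB identity of pose residue 21
print(pdb_to_pose[("A", 479)])    # pose index of PDB residue A 479
```

Real PDB files can have gaps and insertion codes, which is why the lookup should go through pose.pdb_info() rather than arithmetic on residue numbers.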


Phi/psi backbone torsion angles of a residue, in degrees

In [None]:
print(pose.phi(21))
print(pose.psi(21))
#pose.chi(1,3)

-143.77929710436814
131.58317355644212
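Phi and psi are dihedral angles: phi is defined by the atoms C(i-1), N, CA, C and psi by N, CA, C, N(i+1). The sketch below shows one common way to compute a dihedral from four points using only the standard library. The helper names and the test coordinates are made up for illustration, and this is not necessarily how PyRosetta computes the angle internally.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    # Normalize the central bond so the projections below are correct.
    length = math.sqrt(dot(b1, b1))
    b1 = tuple(c / length for c in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = sub(b0, tuple(c * dot(b0, b1) for c in b1))
    w = sub(b2, tuple(c * dot(b2, b1) for c in b1))
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up points: the outer bonds are perpendicular when viewed down the axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```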


In [None]:
residue_21.atom_is_backbone(residue_21.atom_index("CA"))

True

Calculate the bond length between the nitrogen and the alpha carbon of residue 21

In [None]:

nitrogen_residue_21 = AtomID(residue_21.atom_index("N"), 21)
alpha_carbon = AtomID(residue_21.atom_index("CA"), 21)
pose.conformation().bond_length(nitrogen_residue_21, alpha_carbon)

1.4576978424899996
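A bond length is just the Euclidean distance between two atomic positions, so the same quantity can be computed by hand from xyz coordinates. In the sketch below the CA position is the one printed earlier for residue 21, while the N position is a hypothetical value chosen only for illustration, so the result will not reproduce the number above.

```python
import math


def bond_length(atom_a, atom_b):
    """Euclidean distance between two (x, y, z) positions, in the same units."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(atom_a, atom_b)))


ca_xyz = (13.581, -22.038, 10.985)  # CA of residue 21, from the output above
n_xyz = (12.5, -21.2, 10.5)         # hypothetical N position, for illustration

print(bond_length(n_xyz, ca_xyz))
```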

In [None]:
residue_21.atom_index("CA")

2

**Exercises**

- Make a list of other information that you can get from the residue object
- Pick 4 residues at the interface of the complex and list out their properties 
- Set the phi and psi angles of residues 20, 21, and 24 to random values, then visualize the protein in PyMOL. Example: pose.set_psi(20, 45) sets psi of residue 20 to 45 degrees.

# **Clean the Protein**

PDB files sometimes contain non-protein information such as small molecules or DNA. If you want to keep only the protein, you can clean the PDB file. This cell creates a new file in your folder called 7l0j.clean.pdb.

In [6]:
from pyrosetta.toolbox import cleanATOM
cleanATOM("7l0j.pdb")
clean_pose = pose_from_pdb("7l0j.clean.pdb")

core.import_pose.import_pose: File '7l0j.clean.pdb' automatically determined to be of type PDB
core.conformation.Conformation: Found disulfide between residues 4 68
core.conformation.Conformation: current variant for 4 CYS
core.conformation.Conformation: current variant for 68 CYS
core.conformation.Conformation: current variant for 4 CYD
core.conformation.Conformation: current variant for 68 CYD
core.conformation.Conformation: Found disulfide between residues 30 99
core.conformation.Conformation: current variant for 30 CYS
core.conformation.Conformation: current variant for 99 CYS
core.conformation.Conformation: current variant for 30 CYD
core.conformation.Conformation: current variant for 99 CYD
core.conformation.Conformation: Found disulfide between residues 34 101
core.conformation.Conformation: current variant for 34 CYS
core.conformation.Conformation: current variant for 101 CYS
core.conformation.Conformation: current variant for 34 CYD
core.conformation.Conformation: current vari
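Conceptually, cleaning amounts to filtering the PDB file by record type. The sketch below assumes that cleaning keeps only lines beginning with ATOM or TER, which is how the cleanATOM docstring describes it as far as I know. The function name and the sample records are hypothetical and heavily abbreviated.

```python
def keep_atom_records(pdb_text):
    """Return only the ATOM and TER lines of a PDB-format string."""
    kept = [line for line in pdb_text.splitlines()
            if line.startswith(("ATOM", "TER"))]
    return "\n".join(kept) + "\n"


# Hypothetical, shortened records for illustration only.
sample = "\n".join([
    "HEADER    EXAMPLE",
    "ATOM      1  N   ASP A 459      10.000  10.000  10.000",
    "TER",
    "HETATM    2  S   SO4 A 601      20.000  20.000  20.000",
    "END",
])

print(keep_atom_records(sample))
```

This is why the NAG and SO4 residues that appear as Z in pose.sequence() are absent from the cleaned pose.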

# **Display Protein Sequence**

In [None]:
pose.annotated_sequence()


'D[ASP:NtermProteinFull]GPC[CYS:disulfide]ALRELSVDLRAERSVLIPETYQANNC[CYS:disulfide]QGVC[CYS:disulfide]GWPQSDRNPRYGNHVVLLLKMQARGAALARPPCC[CYS:disulfide]VPTAYAGKLLISLSEERISAHHVPNMVATEC[CYS:disulfide]GC[CYS:disulfide]R[ARG:CtermProteinFull]H[HIS:NtermProteinFull]MPPNRRTC[CYS:disulfide]VFFEAPGVRGSTKTLGELLDTGTELPRAIRC[CYS:disulfide]LYSRC[CYS:disulfide]C[CYS:disulfide]FGIWNLTQDRAQVEMQGC[CYS:disulfide]RDSDEPGC[CYS:disulfide]ESLHC[CYS:disulfide]DPSPRAHPSPGSTLFTC[CYS:disulfide]SC[CYS:disulfide]GTDFC[CYS:disulfide]NANYSHLP[PRO:CtermProteinFull]Z[pdb_NAG]Z[pdb_SO4]'

In [None]:
pose.sequence()

'DGPCALRELSVDLRAERSVLIPETYQANNCQGVCGWPQSDRNPRYGNHVVLLLKMQARGAALARPPCCVPTAYAGKLLISLSEERISAHHVPNMVATECGCRHMPPNRRTCVFFEAPGVRGSTKTLGELLDTGTELPRAIRCLYSRCCFGIWNLTQDRAQVEMQGCRDSDEPGCESLHCDPSPRAHPSPGSTLFTCSCGTDFCNANYSHLPZZ'

In [None]:
clean_pose.annotated_sequence()


'D[ASP:NtermProteinFull]GPC[CYS:disulfide]ALRELSVDLRAERSVLIPETYQANNC[CYS:disulfide]QGVC[CYS:disulfide]GWPQSDRNPRYGNHVVLLLKMQARGAALARPPCC[CYS:disulfide]VPTAYAGKLLISLSEERISAHHVPNMVATEC[CYS:disulfide]GC[CYS:disulfide]R[ARG:CtermProteinFull]H[HIS:NtermProteinFull]MPPNRRTC[CYS:disulfide]VFFEAPGVRGSTKTLGELLDTGTELPRAIRC[CYS:disulfide]LYSRC[CYS:disulfide]C[CYS:disulfide]FGIWNLTQDRAQVEMQGC[CYS:disulfide]RDSDEPGC[CYS:disulfide]ESLHC[CYS:disulfide]DPSPRAHPSPGSTLFTC[CYS:disulfide]SC[CYS:disulfide]GTDFC[CYS:disulfide]NANYSHLP[PRO:CtermProteinFull]'

In [None]:
clean_pose.sequence()

'DGPCALRELSVDLRAERSVLIPETYQANNCQGVCGWPQSDRNPRYGNHVVLLLKMQARGAALARPPCCVPTAYAGKLLISLSEERISAHHVPNMVATECGCRHMPPNRRTCVFFEAPGVRGSTKTLGELLDTGTELPRAIRCLYSRCCFGIWNLTQDRAQVEMQGCRDSDEPGCESLHCDPSPRAHPSPGSTLFTCSCGTDFCNANYSHLP'

**Exercises**

*   Exercise 1 - What is the difference between sequence() and annotated_sequence()?

*   Exercise 2 - What is the difference between clean_pose.sequence() and pose.sequence()?



# **Score a pose**

The default energy function is called ref2015. It is a weighted sum of the various energy terms that we learned about in the lecture.

In [None]:
sfxn = get_score_function()
print(sfxn)

core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015
ScoreFunction::show():
weights: (fa_atr 1) (fa_rep 0.55) (fa_sol 1) (fa_intra_rep 0.005) (fa_intra_sol_xover4 1) (lk_ball_wtd 1) (fa_elec 1) (pro_close 1.25) (hbond_sr_bb 1) (hbond_lr_bb 1) (hbond_bb_sc 1) (hbond_sc 1) (dslf_fa13 1.25) (omega 0.4) (fa_dun 0.7) (p_aa_pp 0.6) (yhh_planarity 0.625) (ref 1) (rama_prepro 0.45)
energy_method_options: EnergyMethodOptions::show: aa_composition_setup_files: 
EnergyMethodOptions::show: mhc_epitope_setup_files: 
EnergyMethodOptions::show: netcharge_setup_files: 
EnergyMethodOptions::show: aspartimide_penalty_value: 25
EnergyMethodOptions::show: etable_type: FA_STANDARD_DEFAULT
analytic_etable_evaluation: 1
EnergyMethodOptions::show: method_weights: ref 1.32468 3.25479 -2.14574 -2.72453 1.21829 0.79816 -0.30065 2.30374 -0.71458 1.66147 1.65735 -1.34026 -1.64321 -1.45095 -0.09474 -0.28969 1.15175 2.64269 2.26099 0.58223
EnergyMethodOptions::show: method_weights: free_res
EnergyMethodOptions:

In [None]:
print(sfxn(pose))

973.1901999727738


In [None]:
sfxn.show(pose)

core.scoring.ScoreFunction: 
------------------------------------------------------------
 Scores                       Weight   Raw Score Wghtd.Score
------------------------------------------------------------
 fa_atr                       1.000   -1095.142   -1095.142
 fa_rep                       0.550    2055.665    1130.616
 fa_sol                       1.000     681.879     681.879
 fa_intra_rep                 0.005     411.280       2.056
 fa_intra_sol_xover4          1.000      43.121      43.121
 lk_ball_wtd                  1.000     -24.416     -24.416
 fa_elec                      1.000    -238.732    -238.732
 pro_close                    1.250      30.706      38.382
 hbond_sr_bb                  1.000     -19.656     -19.656
 hbond_lr_bb                  1.000     -73.486     -73.486
 hbond_bb_sc                  1.000     -22.669     -22.669
 hbond_sc                     1.000      -5.105      -5.105
 dslf_fa13                    1.250      -8.943     -11.179
 omega  
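Each weighted score in the table is the weight multiplied by the raw score, and the total is the sum of the weighted scores. A minimal sketch using only the rows visible above; the table is cut off after dslf_fa13, so the sum here is partial and will not match the printed total. The names `VISIBLE_TERMS` and `weighted_scores` are illustrative.

```python
# (weight, raw score) pairs copied from the rows visible in the table above.
VISIBLE_TERMS = {
    "fa_atr": (1.000, -1095.142),
    "fa_rep": (0.550, 2055.665),
    "fa_sol": (1.000, 681.879),
    "fa_intra_rep": (0.005, 411.280),
    "fa_intra_sol_xover4": (1.000, 43.121),
    "lk_ball_wtd": (1.000, -24.416),
    "fa_elec": (1.000, -238.732),
    "pro_close": (1.250, 30.706),
    "hbond_sr_bb": (1.000, -19.656),
    "hbond_lr_bb": (1.000, -73.486),
    "hbond_bb_sc": (1.000, -22.669),
    "hbond_sc": (1.000, -5.105),
    "dslf_fa13": (1.250, -8.943),
}


def weighted_scores(terms):
    """Multiply each raw score by its weight."""
    return {name: weight * raw for name, (weight, raw) in terms.items()}


weighted = weighted_scores(VISIBLE_TERMS)
partial_total = sum(weighted.values())

for name, value in weighted.items():
    print(f"{name:22s}{value:12.3f}")
print(f"{'partial total':22s}{partial_total:12.3f}")
```

The large positive fa_rep contribution is what dominates the unrelaxed score.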

Break down the energy between a pair of atoms. The call below takes two residues, an atom index on each, and the score function, and returns the Lennard-Jones, solvation, and electrostatic terms for that atom pair.

In [None]:
residue21_atomN = residue_21.atom_index("N")
residue21_atomO = residue_21.atom_index("O")
pyrosetta.etable_atom_pair_energies(residue_21, residue21_atomN, residue_21, residue21_atomO, sfxn)

(-0.15176425575543143,
 0.02212383175326532,
 0.7072056183066056,
 2.6673026379709674)

Reweighting energy terms in the score function

In [None]:
sfxn_reweighted = get_score_function()
sfxn_reweighted.set_weight(rosetta.core.scoring.fa_sol, 2.0)

core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015


In [None]:
sfxn_reweighted.show(pose)
print(sfxn_reweighted)

core.scoring.ScoreFunction: 
------------------------------------------------------------
 Scores                       Weight   Raw Score Wghtd.Score
------------------------------------------------------------
 fa_atr                       1.000   -1095.142   -1095.142
 fa_rep                       0.550    2055.665    1130.616
 fa_sol                       2.000     681.879    1363.758
 fa_intra_rep                 0.005     411.280       2.056
 fa_intra_sol_xover4          1.000      43.121      43.121
 lk_ball_wtd                  1.000     -24.416     -24.416
 fa_elec                      1.000    -238.732    -238.732
 pro_close                    1.250      30.706      38.382
 hbond_sr_bb                  1.000     -19.656     -19.656
 hbond_lr_bb                  1.000     -73.486     -73.486
 hbond_bb_sc                  1.000     -22.669     -22.669
 hbond_sc                     1.000      -5.105      -5.105
 dslf_fa13                    1.250      -8.943     -11.179
 omega  
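Because the total is a weighted sum, changing one weight shifts the total by the change in weight times that term's raw score. A small sketch using the numbers printed above; the raw score is rounded in the table, so the result is approximate, and the function name is illustrative.

```python
def rescored_total(total, raw_score, old_weight, new_weight):
    """Total score after changing the weight of a single term."""
    return total + (new_weight - old_weight) * raw_score


original_total = 973.1901999727738  # sfxn(pose), from the output above
fa_sol_raw = 681.879                # raw fa_sol score, from the table above

new_total = rescored_total(original_total, fa_sol_raw, 1.0, 2.0)
print(round(new_total, 3))
```

Comparing this value with sfxn_reweighted(pose) is a quick consistency check on a reweighted score function.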

**Exercises**

- Change the weight of the Lennard-Jones potential terms and score the PDB
- Score the chains in the PDB separately

# **Relaxing a structure in PyRosetta**

In [None]:
# FastRelax Mover
fr = pyrosetta.rosetta.protocols.relax.FastRelax(scorefxn_in=sfxn, standard_repeats=1)

In [None]:
fr.apply(pose)

protocols.relax.FastRelax: CMD: repeat  973.19  0  0  0.55
protocols.relax.FastRelax: CMD: coord_cst_weight  973.19  0  0  0.55
protocols.relax.FastRelax: CMD: scale:fa_rep  -112.201  0  0  0.022
core.pack.task: Packer task: initialize from command line()
core.pack.rotamer_set.RotamerSet_: Using simple Rotamer generation logic for pdb_NAG
core.pack.rotamer_set.RotamerSet_: Using simple Rotamer generation logic for pdb_SO4
core.pack.pack_rotamers: built 5455 rotamers at 213 positions.
core.pack.interaction_graph.interaction_graph_factory: Instantiating DensePDInteractionGraph
protocols.relax.FastRelax: CMD: repack  -444.663  0  0  0.022
protocols.relax.FastRelax: CMD: scale:fa_rep  -437.001  0  0  0.02805
protocols.relax.FastRelax: CMD: min  -758.964  1.54444  1.54444  0.02805
protocols.relax.FastRelax: CMD: coord_cst_weight  -758.964  1.54444  1.54444  0.02805
protocols.relax.FastRelax: CMD: scale:fa_rep  -453.583  1.54444  1.54444  0.14575
core.pack.task: Packer task: initialize from 

In [None]:
sfxn.show(pose)

core.scoring.ScoreFunction: 
------------------------------------------------------------
 Scores                       Weight   Raw Score Wghtd.Score
------------------------------------------------------------
 fa_atr                       1.000   -1090.561   -1090.561
 fa_rep                       0.550     238.079     130.943
 fa_sol                       1.000     644.068     644.068
 fa_intra_rep                 0.005     391.388       1.957
 fa_intra_sol_xover4          1.000      40.163      40.163
 lk_ball_wtd                  1.000     -25.202     -25.202
 fa_elec                      1.000    -317.424    -317.424
 pro_close                    1.250       1.065       1.332
 hbond_sr_bb                  1.000     -23.697     -23.697
 hbond_lr_bb                  1.000     -80.558     -80.558
 hbond_bb_sc                  1.000     -52.480     -52.480
 hbond_sc                     1.000     -24.128     -24.128
 dslf_fa13                    1.250      -7.493      -9.366
 omega  

In [None]:
sfxn_reweighted.show(pose)

core.scoring.ScoreFunction: 
------------------------------------------------------------
 Scores                       Weight   Raw Score Wghtd.Score
------------------------------------------------------------
 fa_atr                       1.000   -1090.561   -1090.561
 fa_rep                       0.550     238.079     130.943
 fa_sol                       2.000     644.068    1288.137
 fa_intra_rep                 0.005     391.388       1.957
 fa_intra_sol_xover4          1.000      40.163      40.163
 lk_ball_wtd                  1.000     -25.202     -25.202
 fa_elec                      1.000    -317.424    -317.424
 pro_close                    1.250       1.065       1.332
 hbond_sr_bb                  1.000     -23.697     -23.697
 hbond_lr_bb                  1.000     -80.558     -80.558
 hbond_bb_sc                  1.000     -52.480     -52.480
 hbond_sc                     1.000     -24.128     -24.128
 dslf_fa13                    1.250      -7.493      -9.366
 omega  

**Exercises**

- Explore pairwise residue energy terms after relaxing

# **Visualization** 

Save the pose as a PDB file after relaxing. We will visualize it using PyMOL on our desktop.

In [None]:
pose.dump_pdb("7l0j_relaxed.pdb")

True

# **Optional: Move a Protein Randomly**

In [None]:
import random

pmm = PyMOLMover()
pmm.keep_history(True)

def randomly_move_residues(struct_pose):
    """Perturb phi and psi of one randomly chosen residue and send the pose to PyMOL."""
    residue_idx = random.randint(1, struct_pose.total_residue())
    curr_phi = struct_pose.phi(residue_idx)
    curr_psi = struct_pose.psi(residue_idx)
    # random.gauss(mean, standard deviation)
    new_phi = random.gauss(curr_phi, 25)
    new_psi = random.gauss(curr_psi, 25)
    struct_pose.set_phi(residue_idx, new_phi)
    struct_pose.set_psi(residue_idx, new_psi)
    pmm.apply(struct_pose)
    return struct_pose

# Apply five random perturbations to the pose
for i in range(5):
    randomly_move_residues(pose)



<pyrosetta.rosetta.core.pose.Pose at 0x7f183960c730>

<pyrosetta.rosetta.core.pose.Pose at 0x7f183960c730>

<pyrosetta.rosetta.core.pose.Pose at 0x7f183960c730>

<pyrosetta.rosetta.core.pose.Pose at 0x7f183960c730>

<pyrosetta.rosetta.core.pose.Pose at 0x7f183960c730>

In [None]:
pose.dump_pdb("7l0j_random.pdb")

True
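The perturbation step in the random move above can be tried in isolation, without PyRosetta or PyMOL. The sketch below draws a new angle from a Gaussian centered on the current one and wraps the result into the range [-180, 180). The wrapping is an addition for illustration and is not part of the original cell; whether set_phi needs it is not something this notebook establishes. The starting angle is the phi of residue 21 printed earlier.

```python
import random


def wrap_degrees(angle):
    """Map any angle in degrees into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def perturb_angle(angle, sigma, rng):
    """Draw a new angle from a Gaussian centered on the current one."""
    return wrap_degrees(rng.gauss(angle, sigma))


rng = random.Random(0)       # fixed seed so the run is repeatable
phi = -143.77929710436814    # phi of residue 21, from the output above
new_phi = perturb_angle(phi, 25, rng)
print(new_phi)
```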

# **Optional: Combine two PDBs**

In [None]:
pose1 = pose_from_file("7l0j_relaxed.pdb")
pose2 = pose_from_file("7l0j.clean.pdb")
pyrosetta.rosetta.core.pose.append_pose_to_pose(pose1,pose2)
pose1.dump_pdb("7l0j_combined.pdb")

core.import_pose.import_pose: File '7l0j_relaxed.pdb' automatically determined to be of type PDB
core.conformation.Conformation: Found disulfide between residues 111 148
core.conformation.Conformation: current variant for 111 CYS
core.conformation.Conformation: current variant for 148 CYS
core.conformation.Conformation: current variant for 111 CYD
core.conformation.Conformation: current variant for 148 CYD
core.conformation.Conformation: Found disulfide between residues 142 166
core.conformation.Conformation: current variant for 142 CYS
core.conformation.Conformation: current variant for 166 CYS
core.conformation.Conformation: current variant for 142 CYD
core.conformation.Conformation: current variant for 166 CYD
core.conformation.Conformation: Found disulfide between residues 147 174
core.conformation.Conformation: current variant for 147 CYS
core.conformation.Conformation: current variant for 174 CYS
core.conformation.Conformation: current variant for 147 CYD
core.conformation.Confor

True

# **Optional: Using py3Dmol to visualize and interact with PDB files in Colab notebooks**

In [None]:
import py3Dmol

# Read the whole PDB file into a single string for py3Dmol
with open("7l0j.pdb") as pdb_file:
  pdb_struct = pdb_file.read()

In [None]:
sequence_pose.dump_pdb("seq_only.pdb")

True

In [None]:
# Read in PDB File
view = py3Dmol.view(width=300, height=300)
view.addModelsAsFrames(pdb_struct)
view.setStyle({'model': -1}, {"cartoon": {'colorscheme': 'greenCarbon'}})
view.zoomTo()

<py3Dmol.view at 0x7f3313975510>

In [None]:
import nglview as nv
from google.colab import output
output.enable_custom_widget_manager()
nv.show_pdbid("6m0j",gui=True)

NGLWidget()
