# Workshop #2: PyRosetta

Rosetta is a suite of algorithms for biomolecular structure prediction and design. Rosetta is
written in C++ and is available from www.rosettacommons.org. PyRosetta is a toolkit in the
programming language Python, which encapsulates the Rosetta functionality by using the
compiled C++ libraries. Python is an easy language to learn and includes modern programming
approaches such as objects. It can be used via scripts and interactively as a command-line
program, similar to MATLAB®.

The goals of this first workshop are (1) to have you learn to use PyRosetta both interactively
and by writing programs and (2) to have you learn the PyRosetta functions to access and
manipulate properties of protein structure.

# Pose Lab

In this lab, we will get practice working with the `Pose` class in PyRosetta. We will load in a protein from a PDB file, use the `Pose` class to learn about the geometry of the protein, make changes to this geometry, and visualize the changes easily with PyMOL and PyRosetta's `PyMOLMover`.

On the corresponding `Pose` lab found on the PyRosetta website, you will find various useful commands to interrogate poses; these may come in handy for the exercises.

**PyRosetta Installation:**
The following two lines will load in the PyRosetta library and load in database files. If this does not work, please notify the professor or the TA.

In [2]:
from pyrosetta import *
init()

[0mcore.init: [0mChecking for fconfig files in pwd and ./rosetta/flags
[0mcore.init: [0mRosetta version: PyRosetta4.Release.python36.mac r208 2019.04+release.fd666910a5e fd666910a5edac957383b32b3b4c9d10020f34c1 http://www.pyrosetta.org 2019-01-22T15:55:37
[0mcore.init: [0mcommand: PyRosetta -ex1 -ex2aro -database /Users/kathyle/Computational Protein Prediction and Design/PyRosetta4.Release.python36.mac.release-208/pyrosetta/database
[0mcore.init: [0m'RNG device' seed mode, using '/dev/urandom', seed=369733516 seed_offset=0 real_seed=369733516
[0mcore.init.random: [0mRandomGenerator:init: Normal mode, seed=369733516 RG_type=mt19937


## Loading in a PDB File ##

We will spend some time today looking at the crystal structure of the protein **PafA** (PDB ID: 5tj3) using PyRosetta and PyMOL. PafA is an alkaline phosphatase, which removes a phosphate group from a phosphate monoester. In this structure, a modified amino acid, phosphothreonine, is used to mimic the substrate in the active site. Let's load this structure with PyRosetta (make sure that the PDB file is in your current directory):

`pose = pose_from_pdb("5tj3.pdb")`

In [2]:
pose = pose_from_pdb("5tj3.pdb") #d

core.chemical.GlobalResidueTypeSet: Finished initializing fa_standard residue type set.  Created 696 residue types
core.chemical.GlobalResidueTypeSet: Total time to initialize 1.16246 seconds.
core.import_pose.import_pose: File '5tj3.pdb' automatically determined to be of type PDB
core.pack.pack_missing_sidechains: packing residue number 233 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: packing residue number 350 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: packing residue number 353 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: packing residue number 354 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: packing residue number 382 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: packing residue number 454 because of missing atom number 6 atom n

Use `pose.sequence()` to look at the protein's sequence:

In [3]:
# print out the sequence of the pose
pose.sequence() #d

'NAVPRPKLVVGLVVDQMRWDYLYRYYSKYGEGGFKRMLNTGYSLNNVHIDYVPTVTAIGHTSIFTGSVPSIHGIAGNDWYDKELGKSVYCTSDETVQPVGTTSNSVGQHSPRNLWSTTVTDQLGLATNFTSKVVGVSLKDRASILPAGHNPTGAFWFDDTTGKFITSTYYTKELPKWVNDFNNKNVPAQLVANGWNTLLPINQYTESSEDNVEWEGLLGSKKTPTFPYTDLAKDYEAKKGLIRTTPFGNTLTLQMADAAIDGNQMGVDDITDFLTVNLASTDYVGHNFGPNSIEVEDTYLRLDRDLADFFNNLDKKVGKGNYLVFLSADHGAAHSVGFMQAHKMPTGFFDMKKEMNAKLKQKFGADNIIAAAMNYQVYFDRKVLADSKLELDDVRDYVMTELKKEPSVLYVLSTDEIWESSIPEPIKSRVINGYNWKRSGDIQIISKDGYLSAYSKKGTTHSVWNSYDSHIPLLFMGWGIKQGESNQPYHMTDIAPTVSSLLKIQFPSGAVGKPITEVIGZZZZ'

Sometimes PDB files do not conform to the format standard and need to be cleaned before PyRosetta can load them. One way to do this is to keep only the ATOM lines of the PDB file. Alternatively, you can use the `cleanATOM` function in `pyrosetta.toolbox` to achieve the same result:

In [4]:
from pyrosetta.toolbox import cleanATOM
cleanATOM("5tj3.pdb")

This function writes a cleaned file, `5tj3.clean.pdb`. Let's load it into PyRosetta as well:

In [5]:
pose_clean = pose_from_pdb("5tj3.clean.pdb")

core.import_pose.import_pose: File '5tj3.clean.pdb' automatically determined to be of type PDB
core.pack.pack_missing_sidechains: packing residue number 232 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: packing residue number 349 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: packing residue number 352 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: packing residue number 353 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: packing residue number 381 because of missing atom number 6 atom name  CG
core.pack.pack_missing_sidechains: packing residue number 453 because of missing atom number 6 atom name  CG
core.pack.task: Packer task: initialize from command line()
core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015
core.pack.pack_rotamers: built 90 rotamers a

In our case, we could load the PDB file for 5tj3 without cleaning it. In fact, we lost some residues when cleaning the PDB file with `cleanATOM`. How does the `sequence` of `pose_clean` differ from that of `pose`?

In [6]:
# print out the sequence of the pose_clean
pose_clean.sequence() #d

'NAVPRPKLVVGLVVDQMRWDYLYRYYSKYGEGGFKRMLNTGYSLNNVHIDYVPTVAIGHTSIFTGSVPSIHGIAGNDWYDKELGKSVYCTSDETVQPVGTTSNSVGQHSPRNLWSTTVTDQLGLATNFTSKVVGVSLKDRASILPAGHNPTGAFWFDDTTGKFITSTYYTKELPKWVNDFNNKNVPAQLVANGWNTLLPINQYTESSEDNVEWEGLLGSKKTPTFPYTDLAKDYEAKKGLIRTTPFGNTLTLQMADAAIDGNQMGVDDITDFLTVNLASTDYVGHNFGPNSIEVEDTYLRLDRDLADFFNNLDKKVGKGNYLVFLSADHGAAHSVGFMQAHKMPTGFFDMKKEMNAKLKQKFGADNIIAAAMNYQVYFDRKVLADSKLELDDVRDYVMTELKKEPSVLYVLSTDEIWESSIPEPIKSRVINGYNWKRSGDIQIISKDGYLSAYSKKGTTHSVWNSYDSHIPLLFMGWGIKQGESNQPYHMTDIAPTVSSLLKIQFPSGAVGKPITEVIG'

With the `annotated_sequence` method below, we can see the differences in more detail. Note that non-canonical amino acids and HETATM residues are now spelled out explicitly.

In [7]:
pose.annotated_sequence()

'N[ASN:NtermProteinFull]AVPRPKLVVGLVVDQMRWDYLYRYYSKYGEGGFKRMLNTGYSLNNVHIDYVPTVT[THR:phosphorylated]AIGHTSIFTGSVPSIHGIAGNDWYDKELGKSVYCTSDETVQPVGTTSNSVGQHSPRNLWSTTVTDQLGLATNFTSKVVGVSLKDRASILPAGHNPTGAFWFDDTTGKFITSTYYTKELPKWVNDFNNKNVPAQLVANGWNTLLPINQYTESSEDNVEWEGLLGSKKTPTFPYTDLAKDYEAKKGLIRTTPFGNTLTLQMADAAIDGNQMGVDDITDFLTVNLASTDYVGHNFGPNSIEVEDTYLRLDRDLADFFNNLDKKVGKGNYLVFLSADHGAAHSVGFMQAHKMPTGFFDMKKEMNAKLKQKFGADNIIAAAMNYQVYFDRKVLADSKLELDDVRDYVMTELKKEPSVLYVLSTDEIWESSIPEPIKSRVINGYNWKRSGDIQIISKDGYLSAYSKKGTTHSVWNSYDSHIPLLFMGWGIKQGESNQPYHMTDIAPTVSSLLKIQFPSGAVGKPITEVIG[GLY:CtermProteinFull]Z[ZN]Z[ZN]Z[ZN]Z[ZN]'

In [8]:
pose_clean.annotated_sequence()

'N[ASN:NtermProteinFull]AVPRPKLVVGLVVDQMRWDYLYRYYSKYGEGGFKRMLNTGYSLNNVHIDYVPTVAIGHTSIFTGSVPSIHGIAGNDWYDKELGKSVYCTSDETVQPVGTTSNSVGQHSPRNLWSTTVTDQLGLATNFTSKVVGVSLKDRASILPAGHNPTGAFWFDDTTGKFITSTYYTKELPKWVNDFNNKNVPAQLVANGWNTLLPINQYTESSEDNVEWEGLLGSKKTPTFPYTDLAKDYEAKKGLIRTTPFGNTLTLQMADAAIDGNQMGVDDITDFLTVNLASTDYVGHNFGPNSIEVEDTYLRLDRDLADFFNNLDKKVGKGNYLVFLSADHGAAHSVGFMQAHKMPTGFFDMKKEMNAKLKQKFGADNIIAAAMNYQVYFDRKVLADSKLELDDVRDYVMTELKKEPSVLYVLSTDEIWESSIPEPIKSRVINGYNWKRSGDIQIISKDGYLSAYSKKGTTHSVWNSYDSHIPLLFMGWGIKQGESNQPYHMTDIAPTVSSLLKIQFPSGAVGKPITEVIG[GLY:CtermProteinFull]'
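The bracketed annotations can also be pulled out programmatically. Below is a minimal sketch using Python's `re` module. It assumes every residue in the annotated string is a single uppercase letter optionally followed by a `[...]` annotation, which holds for the two outputs above but is an assumption rather than a documented PyRosetta guarantee. `annotated_positions` is a hypothetical helper, not a PyRosetta function, and the example string is a shortened stand-in stitched together from pieces of the output above.

```python
import re

# One residue = a one-letter code, optionally followed by a [bracketed] annotation.
RESIDUE_TOKEN = re.compile(r"([A-Z])(\[[^\]]*\])?")


def annotated_positions(annotated):
    """Return (pose_index, one_letter_code, annotation) for every annotated residue.

    pose_index is 1-based, like PyRosetta's pose numbering, and counts every
    residue token in the string, annotated or not.
    """
    hits = []
    for index, match in enumerate(RESIDUE_TOKEN.finditer(annotated), start=1):
        letter, bracket = match.groups()
        if bracket:
            hits.append((index, letter, bracket[1:-1]))  # drop the [ and ]
    return hits


# Shortened stand-in built from fragments of the annotated sequence above.
example = "N[ASN:NtermProteinFull]AVPTVT[THR:phosphorylated]AIG[GLY:CtermProteinFull]Z[ZN]"
for hit in annotated_positions(example):
    print(hit)
```

With PyRosetta loaded, `annotated_positions(pose.annotated_sequence())` would list the termini, the phosphorylated threonine, and the Zn ions together with their pose numbers.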

### Exercise 1: Inspecting pose sequences

Visually inspect the sequences to find the difference(s) between the `pose_clean.sequence()` and `pose.sequence()`. Were residues removed? Which ones?

### Bonus Exercise 1: Identifying differences in sequences

(Optional) Write a program to automatically find the differences between these two sequences.
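One way to approach the bonus exercise is the standard-library `difflib.SequenceMatcher`, which aligns two strings and reports the edit operations between them. It is a general text-diffing tool, not a biological sequence aligner, and `sequence_differences` is a hypothetical helper name, so treat this as a sketch.

```python
import difflib


def sequence_differences(seq_a, seq_b):
    """List the non-matching stretches between two sequences.

    Each entry is (operation, start_in_a, residues_in_a, start_in_b, residues_in_b)
    with 1-based start positions; operation is 'delete', 'insert' or 'replace'.
    """
    # autojunk=False stops difflib from treating frequent characters as "junk"
    # in strings longer than 200 characters (protein sequences easily are).
    matcher = difflib.SequenceMatcher(None, seq_a, seq_b, autojunk=False)
    diffs = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            diffs.append((op, i1 + 1, seq_a[i1:i2], j1 + 1, seq_b[j1:j2]))
    return diffs


# Toy example: one residue and a trailing "ZZ" are missing from the second string.
print(sequence_differences("NAVPTVTAIGZZ", "NAVPTVAIG"))
# → [('delete', 7, 'T', 7, ''), ('delete', 11, 'ZZ', 10, '')]
```

You could call it as `sequence_differences(pose.sequence(), pose_clean.sequence())`. When a deleted residue sits next to an identical one (for example `GAAG` versus `GAG`), the reported position is just one of several equally valid alignments.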

Because this PDB file loaded into PyRosetta successfully without `cleanATOM`, we're going to stick with the slightly larger `pose` through the rest of this lab.

## Working with Pose residues ##

We can use methods of `Pose` to count residues and pick out individual residues from the pose. Remember that `Pose` is a Python class: to call a method it implements, you need an instance of the class (here `pose` or `pose_clean`), followed by a dot and the method name, e.g. `pose.total_residue()`.

In [9]:
print(pose.total_residue())
print(pose_clean.total_residue())
# Did you catch all the missing residues before?

524
519


Store the `Residue` object for residue 20 of the pose by using the `pose.residue(20)` method.

In [10]:
# residue20 = type here
residue20 =  pose.residue(20) #d
print(residue20.name())

ASP


### Exercise 2: Residue objects

Use the `pose`'s `.residue()` method to get the 24th residue of the protein pose. What is the 24th residue in the PDB file (look in the PDB file)? Are they the same residue?

In [11]:
# store the 24th residue in the pose into a variable (see residue20 example above)
residue24 = pose.residue(24) #d

In [12]:
# what other methods are attached to that Residue object? (type "residue24." and hit Tab to see a list of commands)


We can immediately see that the numbering PyRosetta internally uses for pose residues is different from the PDB file. The information corresponding to the PDB file can be accessed through the `pose.pdb_info()` object.

In [13]:
print(pose.pdb_info().chain(24))
print(pose.pdb_info().number(24))

A
47


By using the `pdb2pose` method in `pdb_info()`, we can convert PDB numbering (which requires a chain ID and a residue number) into Pose numbering:

In [14]:
# PDB numbering to Pose numbering
print(pose.pdb_info().pdb2pose('A', 24))

1


Use the `pose2pdb` method in `pdb_info()` to find the PDB residue number and chain that correspond to pose residue 24. The example below does this for pose residue 1; `pose2pdb` returns a string of the form `number chain`.

In [15]:
# Pose numbering to PDB numbering

In [16]:
print(pose.pdb_info().pose2pdb(1))

24 A 


Now we can see how to examine the identity of a residue by PDB chain and residue number.

Once we have a residue, there are various methods in the `Residue` class that can be useful for analysis. We can get instances of the `Residue` class from `Pose`. For instance, we can do the following:

In [17]:
res_24 = pose.residue(24)
print(res_24.name())
print(res_24.is_charged())

ARG
True


## Accessing PyRosetta Documentation ##

One benefit of working within Jupyter notebooks is that we can make use of its autocomplete features. To see an example, try typing `res_24.is_` and pressing `tab` to find other features of residues you can examine. Note that you can scroll down to see more features.

Now that we've looked through those functions, we know how to confirm that PyRosetta has loaded in the zinc ions as metal ions. 

In [18]:
zn_resid = pose.pdb_info().pdb2pose('A', 601)
res_zn = pose.residue(zn_resid) 
res_zn.is_metal() 

True

### Exercise 3: Python Object Help
We can also explore documentation for objects and methods from Jupyter notebooks. Say you wanted to find out more about the Pose object. Try typing in `Pose?`, `?Pose` or `help(Pose)`.

In [None]:
Pose? #d

If you ever develop your own PyRosetta functions, this shows why docstrings matter: they are what `?` and `help()` display.

This works for PyRosetta methods as well:

In [None]:
res_24 = pose.residue(24)
res_24.atom_index?

### Exercise 4: Some residue commands

Now use the `atom_index` method together with `atom_is_backbone` (below) to find out whether the "CA" atom in `res_24` is a backbone atom.

In [19]:
ca_index = res_24.atom_index("CA") #d
res_24.atom_is_backbone(ca_index) #d

True

In [None]:
?res_24.atom_is_backbone

## Getting spatial features from a Pose ## 

<img src="../../Media/Workshop2/dihedral-final.png" width="500">

`Pose` objects make it easy to access angles, distances, and torsions for analysis. Let's take a look at how to get backbone torsions first.

In [1]:
#resid = "get the pose residue number for chain A:res 28 using the pdb2pose function"

In [20]:
resid = pose.pdb_info().pdb2pose('A', 28)

In [21]:
print("phi:", pose.phi(resid))
print("psi:", pose.psi(resid))
print("chi1:", pose.chi(1, resid))

phi: -149.17513487055064
psi: 151.30037995499168
chi1: -82.850785668982
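To see what `pose.phi` and `pose.psi` measure, here is a plain-Python sketch of a signed dihedral angle computed from four points. By definition, $\phi$ of residue $i$ is the dihedral $C_{i-1}$-$N_i$-$C_{\alpha,i}$-$C_i$ and $\psi$ is $N_i$-$C_{\alpha,i}$-$C_i$-$N_{i+1}$. The sign convention below is the standard IUPAC one, which we assume matches Rosetta's; the helper names are hypothetical, and points are plain `(x, y, z)` tuples so the code runs without PyRosetta.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond p1 -> p2.
    length = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / length, b1[1] / length, b1[2] / length)
    # Project both outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    # atan2 of the sine-like and cosine-like components gives the signed angle.
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))   # cis, about 0
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))  # trans, about +/-180
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)))   # about -90
```

To compare against the values printed above, you could pass `(v.x, v.y, v.z)` tuples built from `pose.residue(resid - 1).xyz("C")` and `pose.residue(resid).xyz(...)` for `"N"`, `"CA"` and `"C"` (assuming `Vector` exposes `x`, `y`, `z` attributes); if the conventions match, the result should agree with `pose.phi(resid)` up to rounding.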


Say we want to find the length of the $N$-$C_\alpha$ and $C_\alpha$-$C$ bonds for residue A:28 from the PDB file. There are a couple of approaches. The first uses the `bond_length` method of the `Conformation` class, which stores some information on protein geometry. Take a look at some of the methods in the `Conformation` class using tab completion.

In [22]:
conformation = pose.conformation()
# do some tab completion here to explore the Conformation class methods
#conformation.

Look at the documentation for the method `conformation.bond_length` below. Remember to use `?`.

In [None]:
?conformation.bond_length #d

To use the `bond_length` method in the `Conformation` class, it looks like we'll need to make `AtomID` objects. We can do this using an atom index and residue ID as follows:

In [23]:
# Double Check: does resid contain the Pose numbering or PDB numbering?
res_28 = pose.residue(resid)
N28 = AtomID(res_28.atom_index("N"), resid)
CA28 = AtomID(res_28.atom_index("CA"), resid)
C28 = AtomID(res_28.atom_index("C"), resid)

# try printing out an AtomID object!

In [24]:
print(N28) #d

 atomno= 1 rsd= 5 


As usual, if you did not know how to construct an `AtomID`, you could check the documentation using `?AtomID`.

Now we can compute the bond lengths:

In [25]:
print(pose.conformation().bond_length(N28, CA28))
print(pose.conformation().bond_length(CA28, C28))

1.456100614655453
1.5184027792387658


Alternatively, we can compute bond lengths ourselves starting from the xyz coordinates of the atoms. 

The `xyz` method of `Residue` returns a `Vector` object. The `Vector` class has various useful built-in methods, including dot products, cross products, and norms. Because `Vector` overloads the arithmetic operators, you can simply add and subtract `Vector` objects, and the result is the corresponding vector sum or difference.
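To see what operator overloading means in practice, here is a toy vector class. `Vec3` is hypothetical and is not PyRosetta's `Vector`; it only defines `__add__` and `__sub__` so that `+` and `-` work on vector objects, plus `dot` and `norm` methods named to mirror the ones described above. The coordinates are made up.

```python
import math


class Vec3:
    """A toy 3D vector for illustration only (not PyRosetta's Vector class)."""

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

    def __add__(self, other):  # called for v1 + v2
        return Vec3(self.x + other.x, self.y + other.y, self.z + other.z)

    def __sub__(self, other):  # called for v1 - v2
        return Vec3(self.x - other.x, self.y - other.y, self.z - other.z)

    def dot(self, other):
        return self.x * other.x + self.y * other.y + self.z * other.z

    def norm(self):
        return math.sqrt(self.dot(self))


N = Vec3(0.0, 0.0, 0.0)
CA = Vec3(3.0, 4.0, 0.0)
print((CA - N).norm())  # → 5.0, the N-to-CA distance for these made-up points
```

Subtracting two position vectors gives the vector between the atoms, and its norm is the interatomic distance, which is exactly what the next cell does with real coordinates.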

In [26]:
N_xyz = res_28.xyz("N")
CA_xyz = res_28.xyz("CA")
C_xyz = res_28.xyz("C")
N_CA_vector = CA_xyz - N_xyz
CA_C_vector = C_xyz - CA_xyz  # vector pointing from CA to C
print(N_CA_vector.norm())
print(CA_C_vector.norm())

1.456100614655453
1.5184027792387658


Thankfully, the two approaches for computing distances check out!

**Note**: Not all bond lengths, angles, and torsions will be accessible using the `Conformation` object. That is because the `Conformation` object stores only the subset it needs to generate xyz locations for the atoms in the pose. The most reliable way to get this information is to compute it from the xyz Cartesian coordinate vectors.

# Rosetta Database Files

Let's take a look at Rosetta's ideal values for this amino acid's bond lengths and see how they compare. First find PyRosetta's database directory on your computer (hint: it is shown after `-database` in the output of `init()` at the beginning of this Jupyter notebook). Head to the subdirectory `chemical/residue_type_sets/fa_standard/` to find the residue you're looking at. Residue A:28 (pose residue 5, as the printed `AtomID` shows) is actually an arginine; as an example, let's look at valine, which can be found in the `l-caa` folder, since it is a standard amino acid. The `ICOOR_INTERNAL` lines give, for each atom, a torsion angle, a bond angle, and a bond length relative to previously defined atoms in the residue. From these you should be able to deduce Rosetta's ideal $N$-$C_\alpha$ and $C_\alpha$-$C$ bond lengths.
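If you would rather not read the `ICOOR_INTERNAL` lines by eye, the sketch below collects them into a dictionary. It assumes each line has exactly eight whitespace-separated fields (keyword, atom name, torsion, bond angle, distance, and three stub atoms) and that the distance is measured from the atom to its first stub atom; both are assumptions to check against the file itself. `read_icoor` is a hypothetical helper, and the sample lines are placeholders, not values copied from the Rosetta database.

```python
def read_icoor(lines):
    """Parse ICOOR_INTERNAL lines from a Rosetta .params file.

    Assumes the layout: ICOOR_INTERNAL <atom> <phi> <theta> <d> <stub1> <stub2> <stub3>
    Returns {atom: (phi, theta, d, (stub1, stub2, stub3))}.
    """
    icoor = {}
    for line in lines:
        fields = line.split()
        if len(fields) == 8 and fields[0] == "ICOOR_INTERNAL":
            phi, theta, d = (float(value) for value in fields[2:5])
            icoor[fields[1]] = (phi, theta, d, tuple(fields[5:8]))
    return icoor


# Placeholder lines in the ICOOR_INTERNAL layout (NOT real database values).
sample = [
    "NAME VAL",
    "ICOOR_INTERNAL    N      0.000000    0.000000    0.000000   N     CA    C",
    "ICOOR_INTERNAL    CA     0.000000  180.000000    1.458000   N     CA    C",
    "ICOOR_INTERNAL    C      0.000000   68.800000    1.523000   CA    N     C",
]
icoor = read_icoor(sample)
# Under the stated assumption, d is the distance to the first stub atom.
print("N-CA:", icoor["CA"][2], "CA-C:", icoor["C"][2])
```

To use it on the real file, open the valine params file you found in the `l-caa` folder and pass the open file object to `read_icoor`; the `CA` entry then holds the $N$-$C_\alpha$ length and the `C` entry the $C_\alpha$-$C$ length, under the assumption above.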

These ideal values would, for instance, be used if we generated a new pose from an amino acid sequence. In fact, let's try that here:

In [27]:
one_res_seq = "V"
pose_one_res = pose_from_sequence(one_res_seq)
print(pose_one_res.sequence())

V


In [28]:
N_xyz = pose_one_res.residue(1).xyz("N")
CA_xyz = pose_one_res.residue(1).xyz("CA")
C_xyz = pose_one_res.residue(1).xyz("C")
print((CA_xyz - N_xyz).norm())
print((CA_xyz - C_xyz).norm())

1.458004
1.52326


Now let's figure out how to get angles in the protein. If the `Conformation` class stores the angle we're looking for, we can use the `AtomID` objects we've already created:

In [29]:
angle = pose.conformation().bond_angle(N28, CA28, C28)
print(angle)

1.913188766577386


Notice that `.bond_angle()` gives us the angle in radians. We can compute the above angle in degrees:

In [30]:
import math
angle*180/math.pi

109.61764173672383

Compare this with the ideal tetrahedral angle of about 109.5° expected for the $sp^3$-hybridized $C_\alpha$ carbon.

### Exercise 5: Calculating the N-CA-C bond angle

Try to calculate this angle using the xyz atom positions of N, CA, and C of residue A:28 in the protein. You can use the `Vector` methods `v1.dot(v2)` (which returns a number) and `v1.norm()`. The angle between two vectors $\vec{BA}$ and $\vec{BC}$ is $\theta = \cos^{-1}\left(\frac{\vec{BA} \cdot \vec{BC}}{|\vec{BA}|\,|\vec{BC}|}\right)$.
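The formula above translates directly into code. Here is a minimal plain-Python sketch (hypothetical helper `bond_angle`, points as `(x, y, z)` tuples so it runs without PyRosetta) that also evaluates the ideal tetrahedral angle, $\cos^{-1}(-1/3)$, for comparison with the $C_\alpha$ angle computed above.

```python
import math


def bond_angle(a, b, c):
    """Angle a-b-c in degrees, with b as the vertex; a, b, c are (x, y, z) tuples."""
    ba = (a[0] - b[0], a[1] - b[1], a[2] - b[2])
    bc = (c[0] - b[0], c[1] - b[1], c[2] - b[2])
    dot = sum(p * q for p, q in zip(ba, bc))
    norm_ba = math.sqrt(sum(p * p for p in ba))
    norm_bc = math.sqrt(sum(q * q for q in bc))
    cosine = dot / (norm_ba * norm_bc)
    # Clamp so rounding error cannot push acos outside its domain [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


print(bond_angle((1, 0, 0), (0, 0, 0), (0, 1, 0)))    # about 90
# Two vertices of a regular tetrahedron centred on the origin:
print(bond_angle((1, 1, 1), (0, 0, 0), (1, -1, -1)))  # about 109.47
print(math.degrees(math.acos(-1 / 3)))                # same ideal value
```

With PyRosetta, convert each `Vector` to a tuple, for example `(v.x, v.y, v.z)` (assuming those attribute names), before calling `bond_angle`.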

## Manipulating Protein Geometry

We can also alter the geometry of the protein, with particular interest in manipulating the protein backbone and $\chi$ dihedrals.

### Exercise 6: Changing phi/psi angles

Perform each of the following manipulations, and give the coordinates of the CB atom of Pose residue 2 afterward.
- Set the $\phi$ of residue 2 to -60
- Set the $\psi$ of residue 2 to -43

In [31]:
# three alanines
tripeptide = pose_from_sequence("AAA")

orig_phi = tripeptide.phi(2)
orig_psi = tripeptide.psi(2)
print("original phi:", orig_phi)
print("original psi:", orig_psi)

# print the xyz coordinates of the CB atom of residue 2 here BEFORE setting
print("xyz coordinates:", tripeptide.residue(2).xyz("CB")) #d

# set the phi and psi here
tripeptide.set_phi(2, -60) #d
tripeptide.set_psi(2, -43) #d

print("new phi:", tripeptide.phi(2))
print("new psi:", tripeptide.psi(2))

# print the xyz coordinates of the CB atom of residue 2 here AFTER setting
# did changing the phi and psi angle change the xyz coordinates of the CB atom of alanine 2?


original phi: 180.0
original psi: 180.0
xyz coordinates:       3.535270304899897       3.659035776744378       1.199094204197625
new phi: -60.0
new psi: -43.0


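By printing the pose (see the command below), we can see its fold tree. The protein chain is a single peptide edge from residue 1 to 520 (label `-1`), and the four Zn ions, residues 521 to 524, are attached to residue 1 by jumps (labels `1` to `4`). For the cleaned pose, which has no Zn ions, there are 519 residues in total.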

The `FOLD_TREE` controls how changes to residue geometry propagate through the protein (from the start toward the end of each FoldTree edge). We will go over the FoldTree in another lecture, but based on how you think perturbing the backbone of a protein structure affects the overall protein conformation, consider this question: if you changed a torsion angle of residue 5, would the Cartesian coordinates of residue 7 change? What about the coordinates of residue 3?

Try looking at the pose in PyMOL before and after you set the backbone $\phi$ and $\psi$ for a chosen residue.

In [32]:
print(pose)

PDB file name: 5tj3.pdb
Total residues:524
Sequence: NAVPRPKLVVGLVVDQMRWDYLYRYYSKYGEGGFKRMLNTGYSLNNVHIDYVPTVTAIGHTSIFTGSVPSIHGIAGNDWYDKELGKSVYCTSDETVQPVGTTSNSVGQHSPRNLWSTTVTDQLGLATNFTSKVVGVSLKDRASILPAGHNPTGAFWFDDTTGKFITSTYYTKELPKWVNDFNNKNVPAQLVANGWNTLLPINQYTESSEDNVEWEGLLGSKKTPTFPYTDLAKDYEAKKGLIRTTPFGNTLTLQMADAAIDGNQMGVDDITDFLTVNLASTDYVGHNFGPNSIEVEDTYLRLDRDLADFFNNLDKKVGKGNYLVFLSADHGAAHSVGFMQAHKMPTGFFDMKKEMNAKLKQKFGADNIIAAAMNYQVYFDRKVLADSKLELDDVRDYVMTELKKEPSVLYVLSTDEIWESSIPEPIKSRVINGYNWKRSGDIQIISKDGYLSAYSKKGTTHSVWNSYDSHIPLLFMGWGIKQGESNQPYHMTDIAPTVSSLLKIQFPSGAVGKPITEVIGZZZZ
Fold tree:
FOLD_TREE  EDGE 1 520 -1  EDGE 1 521 1  EDGE 1 522 2  EDGE 1 523 3  EDGE 1 524 4 
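The printed fold tree is just a list of edges, so it can be read programmatically. The sketch below (hypothetical helper `parse_fold_tree`) handles only the simple `EDGE start stop label` form shown above. It assumes, as in this output, that a label of `-1` marks a peptide edge and a positive label is a jump number, and it does not handle other edge types or the extra atom-name fields that more complex fold trees may contain.

```python
def parse_fold_tree(fold_tree):
    """Split a printed FOLD_TREE string into (start, stop, label) tuples.

    Only plain 'EDGE start stop label' entries are supported.
    """
    tokens = fold_tree.split()
    if not tokens or tokens[0] != "FOLD_TREE":
        raise ValueError("expected a string starting with FOLD_TREE")
    edges = []
    i = 1
    while i < len(tokens):
        if tokens[i] != "EDGE" or i + 3 >= len(tokens):
            raise ValueError("unsupported fold tree entry at token %d" % i)
        start, stop, label = (int(t) for t in tokens[i + 1:i + 4])
        edges.append((start, stop, label))
        i += 4
    return edges


tree = "FOLD_TREE  EDGE 1 520 -1  EDGE 1 521 1  EDGE 1 522 2  EDGE 1 523 3  EDGE 1 524 4"
for start, stop, label in parse_fold_tree(tree):
    kind = "peptide edge" if label == -1 else "jump %d" % label
    print(start, "->", stop, kind)
```

In a live session you could pass `pose.fold_tree()` converted with `str(...)`, assuming its string form matches the printed output above.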


## Visualization and the PyMOL Mover

To make sure PyMOL can receive commands from PyRosetta, open PyMOL on Polander and check for a message like `PyMOL <--> PyRosetta link started!` in the dialog box. By default, PyMOL then listens for updates from PyRosetta at the address 127.0.0.1 (localhost).

**Note:** this may not work if many people are trying to do this at the same time, so you may need to specify a different port number by (1) typing `pmm = PyMOLMover('127.0.0.1', some number between 10000 and 65535)` in PyRosetta, (2) running `run PyMOL-RosettaServer.py` in the PyMOL command line, and (3) running `start_rosetta_server('127.0.0.1', the number you used in step 1)` in the PyMOL command line.

**If you are using your own computer (not Polander):** either use the PyMOL command line to run the PyMOL-RosettaServer.py file or drag and drop the PyMOL-RosettaServer.py file onto the PyMOL window to start the PyMOL-PyRosetta link.

The `PyMOLMover` class will let us send information from PyRosetta to PyMOL for quick visualization. We are creating an instance of PyMOLMover called `pmm`.

In [37]:
from pyrosetta import PyMOLMover

(Skip this if you already initialized pmm.)

In [38]:
pmm = PyMOLMover() #go here for additional help: http://www.pyrosetta.org/pymol_mover-tutorial


To view a pose, pass it to the `apply` method of `pmm`:

In [39]:
clone_pose = Pose()
clone_pose.assign(pose)
pmm.apply(clone_pose)

The PyMOLMover has useful helper functions. For example, you can visualize all the hydrogen bonds in your protein with the following:

In [36]:
pmm.send_hbonds(clone_pose)

Just deselect the hydrogen bonds in PyMOL if you want to hide them temporarily.

What other `send_` methods does the `PyMOLMover` have? (Try tab completion on `pmm.send_`.)

The method `keep_history`, if set to True, allows you to load in structures with the same name into states of the same object in PyMOL. This is the starting point for creating a PyMOL movie of your structure, and allows you to loop through structures in different geometries efficiently (try clicking the arrows that are shown below in the red box).

In [40]:
pmm.keep_history(True) 
pmm.apply(clone_pose)
clone_pose.set_phi(5, -90)
pmm.apply(clone_pose)

This is what it should look like (assuming you are able to establish the PyMOL <--> PyRosetta link):

![SegmentLocal](../../Media/Workshop2/PyMOL-tutorial.gif "PyMOL")

### Exercise 7: Visualizing changes in backbone angles

Use a `for` loop to change some backbone torsions (phi and psi) of `test_pose`. Be sure to `keep_history` and send to PyMOL. Try printing the $\phi$ and $\psi$ before and after you set it to make sure it is working as you expect.

In [None]:
test_pose = Pose()
test_pose.assign(pose)

# use a for loop here
# set some phi and psi values
# send the structure to PyMOL

## Additional Exercises ##

The following exercises are meant to get you more comfortable with `Pose` methods and python coding. Many will require looping through the residues in a pose. As you find residues that answer these questions, view them in the PyMOL structure to check your work.

**PyMOL Instructions:** View the original protein (5tj3) in PyMOL as a cartoon, show the Zn2+ ions as spheres, and color the substrate-mimic residue TPO distinctly (in PyMOL, try `select resn TPO`).

### Make a Ramachandran plot

- Create the Ramachandran plot for the protein and compare it with the [Ramachandran plot](http://kinemage.biochem.duke.edu/teaching/anatax/html/anatax.1b.html) from [Richardson's Anatomy and Taxonomy of Protein Structure](http://kinemage.biochem.duke.edu/teaching/anatax/).

Don't forget to label your axes!

In [None]:
import matplotlib
# this inline command gets plots to appear within the notebook
%matplotlib inline
import matplotlib.pyplot as plt

# example of how to make a scatter plot from a list
# uncomment to see how it works and pops up in the notebook
#x_coords = list(range(10))
#y_coords = list(range(10))
#plt.scatter(x_coords, y_coords)
#plt.xlabel("X axis")
#plt.ylabel("Y axis")

# A Ramachandran plot is psi vs phi. Collect these values from the pose and plot them

### Analyzing Amino Acid Patterns

- Find all the polar amino acids in the protein. Using PyMOL, figure out where they are located in the protein. Are there any patterns here?

Hint: don't type in residue numbers one by one. Try `select resn XXX` in PyMOL, replacing XXX with the polar residue names.

### Active Site Residues

- Find all residues that coordinate the Zn2+ ions around TPO (i.e., have any side-chain atom within 2.3 Angstroms of a Zn ion). These residues may have a role in catalysis.

Consider how you could loop through every atom index in a residue.

- Get all residue types within 8 Angstroms of the active site. Are there any patterns in terms of residue types here?

Perhaps residues with backbone atoms within 8-9 Angstroms of the Zn atoms can be considered part of the active site.
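For the active-site questions, the core step is a distance filter over atoms. Below is a plain-Python sketch with a hypothetical helper, `residues_near`, that takes `(residue_number, atom_name, (x, y, z))` tuples; the toy coordinates are made up. With PyRosetta you could build that list by looping over `range(1, pose.total_residue() + 1)` and, within each residue `res`, over `range(1, res.natoms() + 1)` using `res.atom_name(j)` and `res.xyz(j)`, skipping backbone atoms with `res.atom_is_backbone(j)` when only side chains should count.

```python
import math


def residues_near(atoms, center, cutoff, exclude=()):
    """Return sorted residue numbers with at least one atom closer than `cutoff`.

    atoms:   iterable of (residue_number, atom_name, (x, y, z)) tuples
    center:  (x, y, z) reference point, e.g. a Zn ion
    cutoff:  distance threshold in Angstroms
    exclude: residue numbers to skip, e.g. the Zn residue itself
    """
    near = set()
    for resnum, _atom_name, xyz in atoms:
        if resnum in exclude:
            continue
        distance = math.sqrt(sum((p - q) ** 2 for p, q in zip(xyz, center)))
        if distance < cutoff:
            near.add(resnum)
    return sorted(near)


# Made-up toy coordinates, with a "Zn" at the origin as residue 4.
atoms = [
    (1, "CA", (0.0, 0.0, 5.0)),
    (1, "OD1", (0.0, 0.0, 2.0)),
    (2, "NE2", (0.0, 3.0, 0.0)),
    (3, "CA", (10.0, 0.0, 0.0)),
    (4, "ZN", (0.0, 0.0, 0.0)),
]
print(residues_near(atoms, (0.0, 0.0, 0.0), 2.3, exclude={4}))  # [1]
print(residues_near(atoms, (0.0, 0.0, 0.0), 8.0, exclude={4}))  # [1, 2]
```

Remember to exclude the Zn residue itself, since it sits at distance zero from its own position; that is what the `exclude` argument is for.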

## Answers

### Exercise 6

In [None]:
# three alanines
tripeptide = pose_from_sequence("AAA")

orig_phi = tripeptide.phi(2)
orig_psi = tripeptide.psi(2)
print("original phi:", orig_phi)
print("original psi:", orig_psi)

# print the xyz coordinates of the CB atom of residue 2 here BEFORE setting
print("xyz coordinates:", tripeptide.residue(2).xyz("CB"))

# set the phi and psi here
tripeptide.set_phi(2, -60)
tripeptide.set_psi(2, -43)

print("new phi:", tripeptide.phi(2))
print("new psi:", tripeptide.psi(2))

# print the xyz coordinates of the CB atom of residue 2 here AFTER setting
print("xyz coordinates:", tripeptide.residue(2).xyz("CB"))
# did changing the phi and psi angle change the xyz coordinates of the CB atom of alanine 2?


## References
This notebook includes some concepts and exercises from:

"Workshop #2: PyRosetta" in the PyRosetta workbook: https://graylab.jhu.edu/pyrosetta/downloads/documentation/pyrosetta4_online_format/PyRosetta4_Workshop2_PyRosetta.pdf

"Workshop #4.1: PyMOL_Mover" in the PyRosetta workbook: 
http://www.pyrosetta.org/pymol_mover-tutorial