## Side Chain Conformations, the Rotamer Library, and Dunbrack Energies

Begin by importing and initializing PyRosetta, then load cetuximab from the `1YY8.clean.pdb` file used in previous workshops. Keep a copy of the starting pose so that you can reset to it later.

```
from pyrosetta import *
from pyrosetta.teaching import *
pyrosetta.init()
pose = pose_from_pdb("1YY8.clean.pdb")
start_pose = Pose()
start_pose.assign(pose)
```

core.init: Checking for fconfig files in pwd and ./rosetta/flags
core.init: Rosetta version: PyRosetta4.Release.python36.mac r208 2019.04+release.fd666910a5e fd666910a5edac957383b32b3b4c9d10020f34c1 http://www.pyrosetta.org 2019-01-22T15:55:37
core.init: command: PyRosetta -ex1 -ex2aro -database /Users/kathyle/Computational Protein Prediction and Design/PyRosetta4.Release.python36.mac.release-208/pyrosetta/database
core.init: 'RNG device' seed mode, using '/dev/urandom', seed=1425732740 seed_offset=0 real_seed=1425732740
core.init.random: RandomGenerator:init: Normal mode, seed=1425732740 RG_type=mt19937
core.chemical.GlobalResidueTypeSet: Finished initializing fa_standard residue type set.  Created 696 residue types
core.chemical.GlobalResidueTypeSet: Total time to initialize 1.09089 seconds.
core.import_pose.import_pose: File '1YY8.clean.pdb' automatically determined to be of type PDB
core.conformation.Conformation: Found disulfide between residues 23 88
core.conformation.Conformation: current variant for 23 CYS
core.conformation.Conformation: current variant for 88 CYS
core.conformation.Conformation: current variant for 23 CYD
core.conformation.Conformation: current variant for 88 CYD
core.conformation.Conformation: Found disulfide between residues 134 194
core.conformation.Conformation: current variant for 134 CYS
core.conformation.Conformation: current variant for 194 CYS
core.conformation.Conformation: current variant for 134 CYD
core.conformation.Conformation: current variant

<pyrosetta.rosetta.core.pose.Pose at 0x1065edd50>

__Question:__ What are the φ, ψ, and χ angles of residue K49?

In [3]:
print(pose.residue(49).name())
print("Phi: %.5f\nPsi: %.5f\n" %(pose.phi(49), pose.psi(49)))
print("Chi 1: %.5f\nChi 2: %.5f\nChi 3: %.5f\nChi 4: %.5f" %(pose.chi(1, 49), pose.chi(2, 49), pose.chi(3, 49), pose.chi(4, 49)))

LYS
Phi: -122.41752
Psi: 153.08267

Chi 1: 68.12458
Chi 2: -169.18576
Chi 3: -175.37028
Chi 4: -169.58806
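
To see what `pose.chi()` reports, the χ angles can be recomputed by hand as dihedral angles over four consecutive side-chain atoms. The sketch below is plain Python and independent of PyRosetta. It uses the heavy-atom coordinates of residue 49 that are printed later in this workshop; the helper names (`dihedral`, `sub`, `dot`, `cross`) are our own. Because the printed coordinates are rounded to three decimals, expect small differences from the values above.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle (degrees) defined by four points."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Heavy-atom coordinates of LYS 49 as printed for the starting pose
atoms = {
    "N":  (32.097, 27.128, 7.923),
    "CA": (31.663, 28.176, 7.007),
    "CB": (32.852, 28.971, 6.449),
    "CG": (33.743, 28.165, 5.512),
    "CD": (35.058, 28.866, 5.167),
    "CE": (35.795, 28.002, 4.134),
    "NZ": (37.148, 28.450, 3.923),
}

# chi1 = N-CA-CB-CG, chi2 = CA-CB-CG-CD, chi3 = CB-CG-CD-CE, chi4 = CG-CD-CE-NZ
chain = ["N", "CA", "CB", "CG", "CD", "CE", "NZ"]
chis = [dihedral(*(atoms[name] for name in chain[i:i + 4])) for i in range(4)]
for i, chi in enumerate(chis, start=1):
    print("Chi %d: %.2f" % (i, chi))
```
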


Open `lys.bbdep.rotamers.lib` by unpacking `lys.bbdep.rotamers.lib.gz` from the directory `rosetta_database/rotamer/ExtendedOpt1-5`. Find the φ/ψ bin for lysine at residue 49 and, within that bin, the rotamer nearest to the observed χ angles.

__Question:__ What are the χ angles and standard deviations of this rotamer? What is its probability?

Score your pose with the standard full-atom score function. What is the energy of K49? Note the Dunbrack energy component (`fa_dun`), which represents the side-chain conformational probability. Does it match what you found in the table? (You will need to convert between probability and energy; use `kT = 1`.) If not, why not?

```
scorefxn = get_fa_scorefxn()
scorefxn(pose)
energies = pose.energies()
print(energies.residue_total_energies(49))
print(energies.residue_total_energies(49)[pyrosetta.rosetta.core.scoring.fa_dun])
```
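
The conversion mentioned above is the Boltzmann relation, E = -kT ln(p). A minimal sketch follows, in plain Python; the function names and the example probability are illustrative and are not values taken from the rotamer library.

```python
import math

def probability_to_energy(p, kT=1.0):
    """Energy of a state with probability p, in units where kT is given."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -kT * math.log(p)

def energy_to_probability(energy, kT=1.0):
    """Boltzmann factor for an energy (relative, not normalized)."""
    return math.exp(-energy / kT)

p_example = 0.25  # hypothetical rotamer probability
e_example = probability_to_energy(p_example)
print("p = %.3f  ->  E = %.3f kT" % (p_example, e_example))
print("E = %.3f kT  ->  p = %.3f" % (e_example, energy_to_probability(e_example)))
```
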

core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015
( fa_atr; -8.83354) ( fa_rep; 2.78019) ( fa_sol; 5.92847) ( fa_intra_atr; -1.68168) ( fa_intra_rep; 2.47533) ( fa_intra_sol; 1.20218) ( fa_intra_atr_xover4; -0.52776) ( fa_intra_rep_xover4; 0.191981) ( fa_intra_sol_xover4; 0.292855) ( fa_intra_atr_nonprotein; 0) ( fa_intra_rep_nonprotein; 0) ( fa_intra_sol_nonprotein; 0) ( fa_intra_RNA_base_phos_atr; 0) ( fa_intra_RNA_base_phos_rep; 0) ( fa_intra_RNA_base_phos_sol; 0) ( fa_atr_dummy; 0) ( fa_rep_dummy; 0) ( fa_sol_dummy; 0) ( fa_vdw_tinker; 0) ( lk_hack; 0) ( lk_ball; 0) ( lk_ball_wtd; 0.321463) ( lk_ball_iso; 0) ( lk_ball_bridge; 0) ( lk_ball_bridge_uncpl; 0) ( coarse_fa_atr; 0) ( coarse_fa_rep; 0) ( coarse_fa_sol; 0) ( coarse_beadlj; 0) ( mm_lj_intra_rep; 0) ( mm_lj_intra_atr; 0) ( mm_lj_inter_rep; 0) ( mm_lj_inter_atr; 0) ( mm_twist; 0) ( mm_bend; 0) ( mm_stretch; 0) ( lk_costheta; 0) ( lk_polar; 0) ( lk_nonpolar; 0) ( lk_polar_intra_RNA; 0) ( lk_nonpolar_int

Use `pose.set_chi(<i>, <res_num>, <chi>)` to set the side chain of residue 49 to the all-anti conformation. (Here, `i` is the χ index, and `chi` is the new torsion angle in degrees.) Re-score the pose and note the Dunbrack energy.

__Question:__ Does it match the probability in the table? Is this conformation valid for cetuximab (i.e., is the total score of this residue acceptable)?

```
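for i in range(1, 5):
    pose.set_chi(i, 49, 180)  # all four lysine chi angles set to anti (180 degrees)
scorefxn(pose)  # re-score so the stored energies reflect the new side chain
print(pose.energies().residue_total_energies(49)[pyrosetta.rosetta.core.scoring.fa_dun])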
```


## Monte Carlo Side-Chain Packing

Side-chain packing can be done in a Monte Carlo search routine that iteratively swaps rotamers of a random residue and tests each move using the Metropolis criterion. Rosetta has such a routine pre-packaged as a `Mover` that carries out a simulated annealing search each time it is applied. The scope of the packing is specified in a `PackerTask` object, which is similar to a `MoveMap` in that it specifies degrees of freedom. Settings for a `PackerTask` can be given through method calls or read from an input file.
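
The search described above can be illustrated with a toy model in plain Python. This is only a sketch of Metropolis rotamer swapping with a decreasing temperature, not Rosetta's packer: the energy tables, the number of rotamers, the cooling schedule, and all names (`total_energy`, `metropolis_pack`, `one_body`, `two_body`) are made up for illustration.

```python
import math
import random

def total_energy(state, one_body, two_body):
    """Sum of per-residue rotamer energies plus nearest-neighbor pair energies."""
    energy = sum(one_body[i][rot] for i, rot in enumerate(state))
    for i in range(len(state) - 1):
        energy += two_body[state[i]][state[i + 1]]
    return energy

def metropolis_pack(one_body, two_body, n_steps=2000, t_start=3.0, t_end=0.1, seed=0):
    """Simulated annealing over rotamer assignments; returns the best state seen."""
    rng = random.Random(seed)
    n_res, n_rot = len(one_body), len(one_body[0])
    state = [rng.randrange(n_rot) for _ in range(n_res)]
    energy = total_energy(state, one_body, two_body)
    best_state, best_energy = list(state), energy
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end
        kT = t_start * (t_end / t_start) ** (step / max(n_steps - 1, 1))
        pos = rng.randrange(n_res)          # pick a random residue
        old_rot = state[pos]
        state[pos] = rng.randrange(n_rot)   # swap in a random rotamer
        new_energy = total_energy(state, one_body, two_body)
        delta = new_energy - energy
        # Metropolis criterion: always accept downhill, sometimes accept uphill
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            energy = new_energy
            if energy < best_energy:
                best_state, best_energy = list(state), energy
        else:
            state[pos] = old_rot            # reject: restore the old rotamer
    return best_state, best_energy

# Hypothetical energies: 4 residues with 3 rotamers each
one_body = [[0.0, 1.5, 0.7], [2.0, 0.3, 1.1], [0.9, 0.4, 1.8], [1.2, 2.2, 0.1]]
two_body = [[0.5, -1.0, 0.2], [-1.0, 0.8, 0.0], [0.2, 0.0, 1.5]]

best_state, best_energy = metropolis_pack(one_body, two_body)
print("best rotamers:", best_state, "energy: %.2f" % best_energy)
```
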

Create a `PackerTask` as follows. This will set the task to allow packing only of residue 49:

```
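task_pack = standard_packer_task(pose)
task_pack.restrict_to_repacking()                 # keep residue identities; only rotamers may change
task_pack.temporarily_fix_everything()            # freeze every residue...
task_pack.temporarily_set_pack_residue(49, True)  # ...then re-enable packing for residue 49 only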
```

core.pack.task: Packer task: initialize from command line()


The default task allows any amino acid residue to be swapped in for another; that is, it would simulate a protein variant as a result of mutation. This would be useful for protein design but not for side-chain packing. `restrict_to_repacking()` only allows rotamers from the current residue at that position to be used.

We can confirm our settings by printing the task. Note that only the native amino acid type is allowed at each position:
```
print(task_pack)
```


#Packer_Task

resid	pack?	design?	allowed_aas
1	FALSE	FALSE	ASP:NtermProteinFull
2	FALSE	FALSE	ILE
3	FALSE	FALSE	LEU
4	FALSE	FALSE	LEU
5	FALSE	FALSE	THR
6	FALSE	FALSE	GLN
7	FALSE	FALSE	SER
8	FALSE	FALSE	PRO
9	FALSE	FALSE	VAL
10	FALSE	FALSE	ILE
11	FALSE	FALSE	LEU
12	FALSE	FALSE	SER
13	FALSE	FALSE	VAL
14	FALSE	FALSE	SER
15	FALSE	FALSE	PRO
16	FALSE	FALSE	GLY
17	FALSE	FALSE	GLU
18	FALSE	FALSE	ARG
19	FALSE	FALSE	VAL
20	FALSE	FALSE	SER
21	FALSE	FALSE	PHE
22	FALSE	FALSE	SER
23	FALSE	FALSE	
24	FALSE	FALSE	ARG
25	FALSE	FALSE	ALA
26	FALSE	FALSE	SER
27	FALSE	FALSE	GLN
28	FALSE	FALSE	SER
29	FALSE	FALSE	ILE
30	FALSE	FALSE	GLY
31	FALSE	FALSE	THR
32	FALSE	FALSE	ASN
33	FALSE	FALSE	ILE
34	FALSE	FALSE	HIS,HIS_D
35	FALSE	FALSE	TRP
36	FALSE	FALSE	TYR
37	FALSE	FALSE	GLN
38	FALSE	FALSE	GLN
39	FALSE	FALSE	ARG
40	FALSE	FALSE	THR
41	FALSE	FALSE	ASN
42	FALSE	FALSE	GLY
43	FALSE	FALSE	SER
44	FALSE	FALSE	PRO
45	FALSE	FALSE	ARG
46	FALSE	FALSE	LEU
47	FALSE	FALSE	LEU
48	FALSE	FALSE	ILE
49	TRUE	FALSE	LYS
50	FALSE	FALS

We now can construct a `PackRotamersMover`:

```
pack_mover = PackRotamersMover(scorefxn, task_pack)
```


Apply the `PackRotamersMover` above to your pose with the `.apply()` method.

```
pack_mover.apply(pose)
```

__Question:__ Now what are the χ angles of K49? Which rotamer is this? What is the Dunbrack energy?

core.pack.pack_rotamers: built 9 rotamers at 1 positions.
core.pack.interaction_graph.interaction_graph_factory: Instantiating DensePDInteractionGraph


In [9]:
### BEGIN SOLUTION
for i in range(1, 5):
    print(pose.chi(i, 49))
### END SOLUTION    

65.29322908237052
-178.50000000000003
175.57824315863283
65.80745535047775


In [10]:
### BEGIN SOLUTION
print(energies.residue_total_energies(49))
print(energies.residue_total_energies(49)[pyrosetta.rosetta.core.scoring.fa_dun])
### END SOLUTION

( fa_atr; -9.0236) ( fa_rep; 2.72124) ( fa_sol; 6.88515) ( fa_intra_atr; -1.78222) ( fa_intra_rep; 2.48344) ( fa_intra_sol; 1.51257) ( fa_intra_atr_xover4; -0.566597) ( fa_intra_rep_xover4; 0.228627) ( fa_intra_sol_xover4; 0.463576) ( fa_intra_atr_nonprotein; 0) ( fa_intra_rep_nonprotein; 0) ( fa_intra_sol_nonprotein; 0) ( fa_intra_RNA_base_phos_atr; 0) ( fa_intra_RNA_base_phos_rep; 0) ( fa_intra_RNA_base_phos_sol; 0) ( fa_atr_dummy; 0) ( fa_rep_dummy; 0) ( fa_sol_dummy; 0) ( fa_vdw_tinker; 0) ( lk_hack; 0) ( lk_ball; 0) ( lk_ball_wtd; 0.580299) ( lk_ball_iso; 0) ( lk_ball_bridge; 0) ( lk_ball_bridge_uncpl; 0) ( coarse_fa_atr; 0) ( coarse_fa_rep; 0) ( coarse_fa_sol; 0) ( coarse_beadlj; 0) ( mm_lj_intra_rep; 0) ( mm_lj_intra_atr; 0) ( mm_lj_inter_rep; 0) ( mm_lj_inter_atr; 0) ( mm_twist; 0) ( mm_bend; 0) ( mm_stretch; 0) ( lk_costheta; 0) ( lk_polar; 0) ( lk_nonpolar; 0) ( lk_polar_intra_RNA; 0) ( lk_nonpolar_intra_RNA; 0) ( fa_elec; -4.69306) ( fa_elec_bb_bb; 0) ( fa_elec_bb_sc; 0) ( f

__Question:__ What is the new total energy of K49? Why did Rosetta pick this rotamer? Answer this in terms of the components of the score function and in terms of the residues with which K49 interacts.
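
One way to compare score components between two conformations is to parse the printed per-residue energy strings. The sketch below does this in plain Python on a few terms copied from the two K49 outputs above (the starting conformation and the packed one). `parse_energy_terms` is a hypothetical helper, and the numbers are the term values exactly as printed, which to our understanding are not yet multiplied by the score function weights.

```python
import re

# Matches entries of the form "( term_name; value)"
TERM_PATTERN = re.compile(r"\(\s*([A-Za-z0-9_]+);\s*([-+0-9.eE]+)\)")

def parse_energy_terms(text):
    """Turn '( fa_atr; -8.8) ( fa_rep; 2.7) ...' into {'fa_atr': -8.8, ...}."""
    return {name: float(value) for name, value in TERM_PATTERN.findall(text)}

# A few terms copied from the printed output for residue 49
start_text = "( fa_atr; -8.83354) ( fa_rep; 2.78019) ( fa_sol; 5.92847) ( lk_ball_wtd; 0.321463)"
packed_text = "( fa_atr; -9.0236) ( fa_rep; 2.72124) ( fa_sol; 6.88515) ( lk_ball_wtd; 0.580299)"

start_terms = parse_energy_terms(start_text)
packed_terms = parse_energy_terms(packed_text)

# Change in each term that appears in both outputs, largest magnitude first
changes = {name: packed_terms[name] - start_terms[name]
           for name in start_terms if name in packed_terms}
for name, delta in sorted(changes.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print("%-12s %+.5f" % (name, delta))
```
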

## Packing for Refinement


Side-chain packing can be used when converting a pose from centroid to full-atom mode, and it is used extensively in full-atom refinement calculations. Let’s examine how packing improves scores.

Use your code from Workshop #4 to create a centroid-representation model of Ras (PDB ID: 6Q21, chain A) using the `SwitchResidueTypeSetMover`. Save that centroid “decoy” so that we can compare several basic refinement steps.

```
cen_ras = pose_from_file("6Q21_A.pdb")
switch = SwitchResidueTypeSetMover("centroid")
switch.apply(cen_ras)
```

core.import_pose.import_pose: File '6Q21_A.pdb' automatically determined to be of type PDB
core.chemical.GlobalResidueTypeSet: Finished initializing centroid residue type set.  Created 62 residue types
core.chemical.GlobalResidueTypeSet: Total time to initialize 0.038653 seconds.


Load another `ras` and keep it in full-atom representation. Save this starting configuration for future use. Score the pose with the standard full-atom score function.

__Question:__ Why is the score so high?

```
ras = pose_from_file("6Q21_A.pdb")
start_ras = Pose()
start_ras.assign(ras)
scorefxn(ras)
```

core.import_pose.import_pose: File '6Q21_A.pdb' automatically determined to be of type PDB


1215.729069796814

Create a default `PackRotamersMover` with a `PackerTask` that allows all residues to vary χ angles. Create a test pose from your start pose and pack the side chains.

__Question:__ What is the new pose score?

```
test_ras = Pose()
test_ras.assign(start_ras)

# Repack every residue: sequence stays fixed, all side chains may change rotamer
task_pack = standard_packer_task(test_ras)
task_pack.restrict_to_repacking()

pack_mover = PackRotamersMover(scorefxn, task_pack)
pack_mover.apply(test_ras)
print(scorefxn(test_ras))
```



Reset the test pose to the start configuration. Create a `MoveMap` that allows χ angles but not φ/ψ/ω angles to vary. Confirm the `MoveMap` by printing it. Create a `MinMover` using the Davidson-Fletcher-Powell minimization scheme by applying the method `min_type("dfpmin")` to your mover. Apply the `MinMover` and rescore the pose.

__Question:__ How does this energy compare?

```
test_ras.assign(start_ras)
mm = MoveMap()
mm.set_chi(True)
mm.set_bb(False)
print(mm)

min_mover = MinMover()
min_mover.set_movemap(mm)
min_mover.score_function(scorefxn)
min_mover.min_type("dfpmin")
print(min_mover)

print(scorefxn(test_ras))
min_mover.apply(test_ras)
print(scorefxn(test_ras))
```



-------------------------------
  resnum     Type  TRUE/FALSE 
-------------------------------
 DEFAULT      BB     FALSE
 DEFAULT      SC      TRUE
 DEFAULT      NU     FALSE
 DEFAULT  BRANCH     FALSE
-------------------------------
 jumpnum     Type  TRUE/FALSE 
-------------------------------
 DEFAULT     JUMP    FALSE
-------------------------------
  resnum  atomnum     Type  TRUE/FALSE 
-------------------------------
 DEFAULT               PHI    FALSE
 DEFAULT             THETA    FALSE
 DEFAULT                 D    FALSE
 DEFAULT               RB1    FALSE
 DEFAULT               RB2    FALSE
 DEFAULT               RB3    FALSE
 DEFAULT               RB4    FALSE
 DEFAULT               RB5    FALSE
 DEFAULT               RB6    FALSE
-------------------------------


Mover name: MinMover, Mover type: MinMover, Mover current tag:NoTag
Minimization type:	dfpmin
Scorefunction:		ref2015
Score tolerance:	0.01
Nb list:		True
Deriv check:		False
Movemap:

---------------------------

Again, reset the test pose to the starting configuration. Apply the packer and then minimize on the χ angles.

__Question:__ Now what is the final score?

```
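test_ras.assign(start_ras)      # reset the test pose to the starting configuration
print(scorefxn(test_ras))
pack_mover.apply(test_ras)      # pack the side chains first
print(scorefxn(test_ras))
min_mover.apply(test_ras)       # then minimize the chi angles
print(scorefxn(test_ras))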
```


For fun, you might examine the individual residue energies to find the residues most responsible for the score changes. Typically, a small number of residues may make clashes that can be resolved using the χ angle minimization, which allows off-rotamer side-chain conformations.
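
A small helper can rank residues by how much their energy changed between two poses. This is a sketch in plain Python; the function name and the toy numbers are hypothetical. In a PyRosetta session the two lists could be filled with `pose.energies().residue_total_energy(i)` for `i` from 1 to `pose.total_residue()`, after scoring each pose.

```python
def largest_energy_changes(before, after, top_n=5):
    """Return (residue_number, delta) pairs with the largest absolute change.

    `before` and `after` are per-residue energies in sequence order;
    residue numbers are 1-based to match pose numbering.
    """
    if len(before) != len(after):
        raise ValueError("energy lists must have the same length")
    changes = [(i, a - b) for i, (b, a) in enumerate(zip(before, after), start=1)]
    changes.sort(key=lambda item: abs(item[1]), reverse=True)
    return changes[:top_n]

# Hypothetical per-residue energies before and after refinement
before = [1.2, 35.0, -0.8, 4.1, 12.5]
after = [1.0, 2.5, -0.9, 3.9, 1.5]

for resnum, delta in largest_energy_changes(before, after, top_n=2):
    print("residue %d changed by %+.1f" % (resnum, delta))
```
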

## Design


Design calculations can be accomplished simply by packing side chains with a rotamer set that includes all amino acid types. That is, when the Monte Carlo routine swaps rotamers, it could replace the existing side chain with another conformation of the same residue or some conformation of a different residue type. Trial mutations are accepted or rejected with the Metropolis criterion, and the standard full-atom energy function is supplemented by a reference energy term, `ref`, which represents the relative energies of each residue type in an unfolded peptide.

Design operations are easiest to specify through a data file called a “resfile.” You can create a resfile for a given pdb file or pose using the following toolbox methods:


```
from pyrosetta.toolbox import generate_resfile_from_pdb
generate_resfile_from_pdb("1YY8.clean.pdb", "1YY8.resfile")
from pyrosetta.toolbox import generate_resfile_from_pose
generate_resfile_from_pose(pose, "1YY8.resfile")
```

core.import_pose.import_pose: File '1YY8.clean.pdb' automatically determined to be of type PDB
core.conformation.Conformation: Found disulfide between residues 23 88
core.conformation.Conformation: current variant for 23 CYS
core.conformation.Conformation: current variant for 88 CYS
core.conformation.Conformation: current variant for 23 CYD
core.conformation.Conformation: current variant for 88 CYD
core.conformation.Conformation: Found disulfide between residues 134 194
core.conformation.Conformation: current variant for 134 CYS
core.conformation.Conformation: current variant for 194 CYS
core.conformation.Conformation: current variant for 134 CYD
core.conformation.Conformation: current variant for 194 CYD
core.conformation.Conformation: Found disulfide between residues 235 308
core.conformation.Conformation: current variant for 235 CYS
core.conformation.Conformation: current

Inside the resfile you will see a list of all residues and their chains, each followed by a keyword such as `NATAA` or `NATRO` that controls how the packer treats that position. The keyword can be changed to any of the following with a text editor:

```
NATRO		use native amino acid and native rotamer (does not repack)
NATAA		use native amino acid but allow repacking to other rotamers
PIKAA ILV	use only the following amino acids (ILV) and allow repacking between them
ALLAA		use all amino acids and all repacking
```

### In your terminal:


Edit the resfile to force residue 49 to be glutamic acid (`49 A PIKAA E`) and ensure all other residues cannot be redesigned (change `NATAA` to `NATRO`). Save the file as `1YY8-K49E.resfile`.
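
If you would rather generate the resfile from Python than edit it by hand, a few lines suffice. The sketch below assumes the usual resfile layout: a default command, a `start` line, then one `<residue> <chain> <command>` line per exception. `build_resfile` is a hypothetical helper and is not part of PyRosetta.

```python
def build_resfile(default_command, overrides):
    """Build resfile text.

    default_command: command applied to every residue not listed (e.g. "NATRO")
    overrides: {(residue_number, chain): command} for specific positions
    """
    lines = [default_command, "start"]
    for (resnum, chain), command in sorted(overrides.items()):
        lines.append("%d %s %s" % (resnum, chain, command))
    return "\n".join(lines) + "\n"

# Freeze everything except residue 49 on chain A, which is forced to glutamic acid
text = build_resfile("NATRO", {(49, "A"): "PIKAA E"})
print(text)
```

Writing `text` to `1YY8-K49E.resfile`, for example with `pathlib.Path("1YY8-K49E.resfile").write_text(text)`, produces the file that `parse_resfile` expects in the next step.
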

### Back here in the notebook:

Create a new task for design from the resfile:
```
from pyrosetta.rosetta.core.pack.task import TaskFactory, parse_resfile
task_design = TaskFactory.create_packer_task(pose)
parse_resfile(pose, task_design, "1YY8-K49E.resfile")
```




Score the original `start_pose` conformation from the 1YY8 pdb for reference. Create a new `PackRotamersMover` with the `task_design` and use it to mutate residue 49 to glutamic acid. Confirm you mutated the residue by printing residue 49 before and after applying the mover.

__Question:__ What is the predicted Δ*G* of the mutation? Is this a stabilizing mutation?

```
pose.assign(start_pose)
pack_mover = PackRotamersMover(scorefxn, task_design)
print(pose.residue(49))
pack_mover.apply(pose)
print(pose.residue(49))
print(scorefxn(pose) - scorefxn(start_pose))
```


Residue 49: LYS (LYS, K):
Base: LYS
 Properties: POLYMER PROTEIN CANONICAL_AA POLAR CHARGED POSITIVE_CHARGE METALBINDING SIDECHAIN_AMINE ALPHA_AA L_AA
 Variant types:
 Main-chain atoms:  N    CA   C  
 Backbone atoms:    N    CA   C    O    H    HA 
 Side-chain atoms:  CB   CG   CD   CE   NZ  1HB  2HB  1HG  2HG  1HD  2HD  1HE  2HE  1HZ  2HZ  3HZ 
Atom Coordinates:
   N  : 32.097, 27.128, 7.923
   CA : 31.663, 28.176, 7.007
   C  : 30.939, 27.471, 5.878
   O  : 31.191, 26.303, 5.597
   CB : 32.852, 28.971, 6.449
   CG : 33.743, 28.165, 5.512
   CD : 35.058, 28.866, 5.167
   CE : 35.795, 28.002, 4.134
   NZ : 37.148, 28.45, 3.923
   H  : 32.6902, 26.3872, 7.57742
   HA : 31.0202, 28.8679, 7.55234
  1HB : 32.4846, 29.8416, 5.90508
  2HB : 33.4654, 29.335, 7.27346
  1HG : 33.9859, 27.2076, 5.97453
  2HG : 33.2119, 27.9736, 4.58014
  1HD : 34.8472, 29.8571, 4.76287
  2HD : 35.6568, 28.9812, 6.07042
  1HE : 35.8169, 26.9678, 4.47438
  2HE : 35.2623, 28.0373, 3.18377
  1HZ : 37.5965, 27.8576,

Note the residue reference energy term (`ref`) in the scoring function.

__Question:__ What is this value before and after you mutated the residue? What does this energy represent?

### In your terminal:

Create a new resfile that allows residue 49 to be designed freely (`49 A ALLAA`) and call it `1YY8-K49All.resfile`.

### Back here in the notebook:

Create a new `PackerTask` and `PackRotamersMover` and apply it.

__Question:__ What residue does Rosetta choose? Why?

```
pose.assign(start_pose)
task_design_all = TaskFactory.create_packer_task(pose)
parse_resfile(pose, task_design_all, "1YY8-K49All.resfile")
pack_mover_all = PackRotamersMover(scorefxn, task_design_all)
pack_mover_all.apply(pose)
print(pose.residue(49))
```

Create your own resfile that will restrict residue 49 to only negatively charged residues using the resfile line `49 A PIKAA DE` and re-apply the design mover.

__Question:__ Now what residue is chosen? What is the new residue energy, and why (physically) is it less favorable than the last design?

Let’s try to make this design more favorable. Select several surrounding residues for design, and set them also to enable mutations to any residue. Call the design mover again.

__Question:__ Now what do you find?

Note that PyRosetta includes a handy toolbox method, `mutate_residue()`, that changes a specified residue in a given pose into another. However, the rotamer of this new residue will not be optimized. For example:

```
from pyrosetta.toolbox import mutate_residue
pose.assign(start_pose)
print(pose.residue(49))
mutate_residue(pose, 49, 'E')
print(pose.residue(49))
```

### Programming Exercises


- *Refinement and discrimination*. Download the “single misfold” decoy set from the Decoys ’R Us repository at dd.compbio.washington.edu/ddownload.cgi?misfold. (Documentation for this project is at dd.compbio.washington.edu.) This repository has a single “correct” and “incorrect” predicted structure for several proteins. For this exercise, analyze pdbs 2CI2 and 2CRO; each has two “incorrect” structures offered. (Technical note: These decoys have an empty occupancy field in the PDB *ATOM* records; a value of 1 needs to be added before Rosetta will load these structures.)

    Write a program that will calculate and output the score for each decoy (i) as is from the PDB file, (ii) after packing only, (iii) after minimization only, and (iv) after packing and minimizing. For each of the four cases, compare the scores of the “correct” structure with those of the “incorrect” structure. Which schemes successfully discriminate the correct structures?


- Write a refinement protocol that will iterate between side-chain packing, small and shear moves, and minimization. Where is the best place to position the Monte Carlo acceptance test? Test your protocol by making 10 independently refined structures for the correct and incorrect decoys of 2CRO from the Decoys ’R Us single misfold set. Is this protocol able to discriminate the correct decoy? Submit your code.


- HIV-1 protease is a major drug target for antiretroviral therapies. Protease inhibitors are designed from substrate peptide mimics. We will attempt to take a natural substrate peptide of HIV-1 protease and design it for improved binding — potentially to serve as a good template for drug design. Use PDB file 1KJG for the following analysis.
    
    
    - (a) Turn on side-chain packing for the protease active site (residues 8, 23, 25, 29, 30, 32, 45, 47, 50, 53, 82, and 84 of both chains A and B) and for the peptide (residues 2–9 on chain P; all of these numbers follow the PDB numbering).


    - (b) Repack the above side chains and then energy minimize those same side chains with the backbone fixed. Generate 10 decoys and record the energies for each decoy. This will represent the reference state: the wild-type peptide bound to the protease.


    - (c) For residue 2 of the peptide (chain P), allow repacking to any of the 20 amino acid residues, while leaving the packing and side-chain minimization the same as in step (b). Generate 10 decoys and record the energies. These will represent single mutants at that residue position.


    - (d) Repeat step (c) for each of the other seven residues in the substrate peptide.


    - (e) Take the lowest energy for each mutation position and compare it to the lowest energy for the wild type. Do single mutants at any of these positions improve the energy over the wild type? Which ones? By how much? Which energy components are mostly responsible?


    - (f) Which peptide residue positions are easiest to improve? Which positions are the hardest?


    - (g) Are there any other trends? Hydrophobic vs. polar, bulky residues vs. small residues, etc.?


    - (h) Altman et al. (Proteins 2008) found, using their own computational design algorithm, that the most favorable sequences were a triple mutant E3D/T4I/V6L, a single mutant T4V, and a single mutant E3Q. How do their results compare with yours?


    - (i) Natural substrates are often sub-optimal binders. Why would this be advantageous?


- Effect of backbone conformation on design. HIV-1 protease is promiscuous, meaning it can cleave a wide range of peptides beyond the ten natural substrates of the virus. Let’s examine the preferences of the enzyme through Rosetta design calculations.

    - (a) Download HIV-1 protease in complex with CA-P2 peptide (1F7A). Select the eight peptide residues for unrestricted design and let Rosetta redesign the substrate sequence. What is the new sequence and how does it compare to the original? What percent of the original sequence was optimal for its structure?


    - (b) Download HIV-1 protease in complex with RT-RH peptide (1KJG). (Note that the enzyme is the same here, but it is crystallized with a different substrate.) Again, design the eight substrate residues with Rosetta. What percent of this substrate sequence is optimal for this crystal structure? ____%


    - (c) How do the designed sequences of (a) and (b) compare? Why should they be the same? Why would they not be the same? What are the implications for the field of computational protein design?


- Write a program which iterates between design of all residues of a protein and refinement via small, shear, and minimization moves.
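
For the technical note in the first exercise, the empty occupancy field can be filled with a short script. The sketch below assumes standard fixed-column PDB records, in which occupancy occupies columns 55 to 60; `fill_occupancy` is a hypothetical helper and the demo atom record is made up.

```python
def fill_occupancy(line, value=1.0):
    """Return the PDB line with occupancy (columns 55-60) set if it was blank."""
    if not line.startswith(("ATOM", "HETATM")):
        return line                      # leave non-coordinate records alone
    newline = "\n" if line.endswith("\n") else ""
    record = line.rstrip("\n").ljust(60)  # pad short records out to column 60
    if record[54:60].strip():
        return line                      # occupancy already present
    return record[:54] + "%6.2f" % value + record[60:] + newline

# Demo ATOM record assembled field by field, with no occupancy or B-factor
demo = (
    "ATOM  " + "    1" + " " + " N  " + " " + "MET" + " " + "A" + "   1" + "    "
    + "  27.340" + "  24.430" + "   2.614"
)
fixed = fill_occupancy(demo)
print(repr(fixed[54:60]))
```

To repair a whole decoy file, apply `fill_occupancy` to each line read from the original file and write the results to a new file, leaving the original untouched.
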


### Thought Question

What is the thermodynamic meaning of the ref energy term, and what does it correspond to physically?
During evolution, the genome sequence may mutate to cause protein sequence changes. Alternately, one could consider the difference in evolutionary propensities for each residue type. How could you derive reference energies from sequence data, and what would that mean? 


How do Kuhlman & Baker fit the reference energies in their 2000 PNAS paper?


### References


- S. C. Lovell et al., “The penultimate rotamer library,” Proteins 40, 389-408 (2000).


- R. L. Dunbrack & F. E. Cohen, “Bayesian statistical analysis of protein side-chain rotamer preferences,” Protein Sci. 6, 1661-1681 (1997).