<!--NOTEBOOK_HEADER-->
*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta);
content is available [on Github](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*

<!--NAVIGATION-->
< [Side-Chain Packing](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/06.02-Side-chain-packing.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [Docking](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/07.00-Docking.ipynb) ><p><a href="https://colab.research.google.com/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/06.03-Design.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open in Google Colaboratory"></a>

# Design
Keywords: generate_resfile_from_pdb(), generate_resfile_from_pose(), create_packer_task(), mutate_residue()

In [25]:
# Mount Google Drive and add it to the Python sys path

##google_drive_mount_point = '/content/google_drive'

import os, sys, time
##from google.colab import drive
##drive.mount(google_drive_mount_point)

##google_drive = google_drive_mount_point + '/My Drive'
##google_drive_prefix = google_drive + '/prefix'

##if not os.path.isdir(google_drive_prefix): os.mkdir(google_drive_prefix)

##pyrosetta_install_prefix_path = '/content/prefix'
##if os.path.islink(pyrosetta_install_prefix_path): os.unlink(pyrosetta_install_prefix_path)
##os.symlink(google_drive_prefix, pyrosetta_install_prefix_path)


##for e in os.listdir(pyrosetta_install_prefix_path): sys.path.append(pyrosetta_install_prefix_path + '/' + e)

In [26]:
# From previous section:

from pyrosetta import *
from pyrosetta.teaching import *
pyrosetta.init()



PyRosetta-4 2019 [Rosetta PyRosetta4.Release.python36.mac 2019.31+release.9a323bc72ca18d3abdc8b1a730b37e52197e4ceb 2019-07-29T16:16:04] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
core.init: Checking for fconfig files in pwd and ./rosetta/flags
core.init: Rosetta version: PyRosetta4.Release.python36.mac r229 2019.31+release.9a323bc72ca 9a323bc72ca18d3abdc8b1a730b37e52197e4ceb http://www.pyrosetta.org 2019-07-29T16:16:04
core.init: command: PyRosetta -ex1 -ex2aro -database /Users/jack/PyRosetta/pyrosetta/database
basic.random.init_random_generator: 'RNG device' seed mode, using '/dev/urandom', seed=896538787 seed_offset=0 real_seed=896538787
basic.random.init_random_generator: RandomGenerator:init: Normal mode, seed=896538787 RG_type=mt19937


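HBNet is a Rosetta protocol for designing hydrogen-bond networks: sets of polar side chains whose hydrogen bonds connect to one another. Here we use it to place such a network at a protein-protein interface before designing the remaining residues.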

We prepare for HBNet the same way that we prepare for packing: we set up the pose and score function as before.

In [27]:
pose = pose_from_pdb("hbnet_example.pdb")
start_pose = Pose()
start_pose.assign(pose)
scorefxn = get_fa_scorefxn()

core.import_pose.import_pose: File 'hbnet_example.pdb' automatically determined to be of type PDB
core.conformation.Conformation: Found disulfide between residues 21 90
core.conformation.Conformation: current variant for 21 CYS
core.conformation.Conformation: current variant for 90 CYS
core.conformation.Conformation: current variant for 21 CYD
core.conformation.Conformation: current variant for 90 CYD
core.conformation.Conformation: Found disulfide between residues 137 187
core.conformation.Conformation: current variant for 137 CYS
core.conformation.Conformation: current variant for 187 CYS
core.conformation.Conformation: current variant for 137 CYD
core.conformation.Conformation: current variant for 187 CYD
core.conformation.Conformation: Found disulfide between residues 162 374
core.conformation.Conformation: current variant for 162 CYS
core.conformation.Conformation: curre

Just like before, you can edit the resfile to your own personal specifications. Alternatively, you can use task operations to automate the process. Let's use task operations to fix all residues not at the interface.
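
For readers who prefer the resfile route, here is a minimal sketch that builds resfile text in plain Python. It assumes the usual resfile layout (default commands, a `start` line, then one `<residue> <chain> <command>` line per position). The helper name `make_resfile` and the positions in the example are hypothetical and are not part of PyRosetta.

```python
def make_resfile(design_positions, repack_positions=(), default="NATRO"):
    """Return resfile text as a string.

    design_positions / repack_positions: iterables of (pdb_resnum, chain).
    default: command for every residue that is not listed
             (NATRO keeps the input rotamer, so the residue is not repacked).
    """
    lines = [default, "start"]
    for resnum, chain in design_positions:
        lines.append(f"{resnum} {chain} ALLAA")  # allow all 20 amino acids
    for resnum, chain in repack_positions:
        lines.append(f"{resnum} {chain} NATAA")  # repack, keep native identity
    return "\n".join(lines) + "\n"


# Hypothetical positions, for illustration only
resfile_text = make_resfile([(49, "A"), (52, "A")], repack_positions=[(101, "B")])
print(resfile_text)
```

The resulting string can be saved to disk and edited by hand like any other resfile.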

## Setting Designable Residues:

Create a new task for design.


In [28]:
from pyrosetta.rosetta.core.select.residue_selector import InterGroupInterfaceByVectorSelector, ChainSelector, NotResidueSelector

chain1 = ChainSelector( "1" ) #selects the first chain
chain2 = ChainSelector( "2" ) #selects the second chain

interface_selector = InterGroupInterfaceByVectorSelector( chain1, chain2 ) #selects residues at the interface
not_interface_selector = NotResidueSelector( interface_selector ) #selects residues not at the interface

from pyrosetta.rosetta.core.pack.task.operation import PreventRepackingRLT, RestrictToRepackingRLT, OperateOnResidueSubset

#prevent non interface residues from repacking/designing
fix_non_interface = OperateOnResidueSubset( PreventRepackingRLT(), not_interface_selector )

#perhaps we are performing one-sided design and do not want to make mutations on chain 2:
no_mutation_chain2 = OperateOnResidueSubset( RestrictToRepackingRLT(), chain2 )

from pyrosetta.rosetta.core.pack.task import TaskFactory
task_factory = TaskFactory()
task_factory.push_back( fix_non_interface )
task_factory.push_back( no_mutation_chain2 )

task_design = task_factory.create_task_and_apply_taskoperations( pose )
print( "Num residues: ", pose.size() )
print( "Num packable residues: ", task_design.num_to_be_packed() ) # this includes the ones being designed

num_designable = 0
for i in range( 1, pose.size() + 1 ):
    if task_design.design_residue( i ):
        num_designable += 1
print( "Num designable residues: ", num_designable )

Num residues:  454
Num packable residues:  116
Num designable residues:  53
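
The counts above tell us how many residues fall into each category, but not which ones. The sketch below sorts positions into designable, repack-only, and fixed lists. It assumes the task object exposes `design_residue(i)` (used in the cell above) and `being_packed(i)`; the latter is an assumption about the PackerTask interface, and `summarize_task` and `FakeTask` are hypothetical names. `FakeTask` is a stand-in so the example runs without PyRosetta.

```python
def summarize_task(task, n_residues):
    """Split 1-indexed positions into designable / repack-only / fixed lists.

    `task` only needs two methods that take a residue index:
    design_residue(i) and being_packed(i).
    """
    designable, repack_only, fixed = [], [], []
    for i in range(1, n_residues + 1):
        if task.design_residue(i):
            designable.append(i)
        elif task.being_packed(i):
            repack_only.append(i)
        else:
            fixed.append(i)
    return designable, repack_only, fixed


class FakeTask:
    """Stand-in for a PackerTask so the sketch runs without PyRosetta."""

    def __init__(self, design, pack):
        self._design, self._pack = set(design), set(pack)

    def design_residue(self, i):
        return i in self._design

    def being_packed(self, i):
        # designable residues are also packable
        return i in self._design or i in self._pack


designable, repack_only, fixed = summarize_task(FakeTask({2, 5}, {3}), 6)
print(designable, repack_only, fixed)
```

With the objects from this notebook, the call would be `summarize_task(task_design, pose.size())`.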


## Running HBNet

This is an interface case so we will use HBNetStapleInterface.

In [31]:
from pyrosetta.rosetta.protocols.hbnet import HBNetStapleInterface

pose.assign(start_pose)
hbnet = HBNetStapleInterface()
hbnet.task_factory( task_factory )
#alternatively:
#hbnet.set_task( task_design )
hbnet.set_score_function( scorefxn )

#This is highly recommended, especially for large systems like asymmetric interfaces
#see PMID: 29652499
hbnet.set_monte_carlo_branch( True )

#We can normally leave this at the default;
#we make it smaller here so the example runs faster
hbnet.set_total_num_mc_runs( 1000 )

#This does two things:
#(1) speeds us up by decreasing the sample space
#(2) ensures that our final hbond network will be at least partially buried
hbnet.set_monte_carlo_seed_must_be_buried( True )

hbnet.apply(pose)
print( "Change in score", scorefxn(pose) - scorefxn(start_pose) )


core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015
protocols.hbnet.HBNet: Creating packer task based on specified task operations...
core.select.residue_selector.LayerSelector: Setting LayerSelector to use sidechain neighbors to determine burial.
core.select.residue_selector.LayerSelector: Set cutoffs for core and surface to 5.2 and 2, respectively, in LayerSelector.
core.select.residue_selector.LayerSelector: Setting core=true boundary=false surface=false in LayerSelector.
core.select.residue_selector.LayerSelector: Set cutoffs for core and surface to 4.4 and 2, respectively, in LayerSelector.
core.select.residue_selector.LayerSelector: Setting LayerSelector to use sidechain neighbors to determine burial.
core.select.residue_selector.LayerSelector: Set cutoffs for core and surface to 5.2 and 2, respectively, in LayerSelector.
core.select.residue_selector.LayerSelector: Setting core=false boundary=

Wait, my score is terrible.

__Question:__ Why?

## Finishing Design:

Of course the score is terrible: the pose is full of clashes. We had 116 packable residues and only assigned states to 3 of them. The other 113 residues are still in their input conformations and likely clash with the 3 we just assigned.

We need to run the packer (either using PackRotamersMover or FastDesign) but we don't want to overwrite the residues we just assigned with HBNet. The trick here is to select the residues with "HBNet" labels and fix them.

In [32]:
from pyrosetta.rosetta.core.select.residue_selector import ResiduePDBInfoHasLabelSelector

#prevent hbnet residues from repacking/designing
hbnet_selector = ResiduePDBInfoHasLabelSelector( "HBNet" )
fix_hbnet = OperateOnResidueSubset( PreventRepackingRLT(), hbnet_selector )
task_factory.push_back( fix_hbnet ) #recycling the same factory as before, just adding a new operation
task_design2 = task_factory.create_task_and_apply_taskoperations( pose )

#sanity check
num_hbnet_residues = 0
for x in hbnet_selector.apply( pose ):
    if x:
        num_hbnet_residues += 1
print( "Num HBNet Residues", num_hbnet_residues )

#Unrelated to the narrative, but I highly recommend using the linear memory interaction graph whenever you perform design; it's a huge speedup.
#It does not seem to matter at this scale, but it will once you start using extra chi sampling (-ex1, -ex2).
task_design2.or_linmem_ig( True )

from pyrosetta.rosetta.protocols.minimization_packing import PackRotamersMover
pack_mover = PackRotamersMover( scorefxn, task_design2 )
pack_mover.apply( pose )
print( "Change in score", scorefxn(pose) - scorefxn(start_pose) )

Num HBNet Residues 3
core.pack.pack_rotamers: built 12892 rotamers at 113 positions.
core.pack.interaction_graph.interaction_graph_factory: Instantiating LinearMemoryInteractionGraph
Change in score -57.437722369426865
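
The sanity check above counts the labeled residues; it is often just as useful to know where they are. A minimal sketch, assuming only that the selector output can be iterated as a sequence of booleans (as the loop above already does). The helper name `selected_positions` and the example mask are hypothetical.

```python
def selected_positions(mask):
    """Return the 1-indexed positions whose entry in `mask` is true."""
    return [i for i, flag in enumerate(mask, start=1) if flag]


# Stand-in for the output of a residue selector on a six-residue pose
mask = [False, True, False, False, True, True]
hbnet_positions = selected_positions(mask)
print(hbnet_positions)
```

In this notebook the call would be `selected_positions(hbnet_selector.apply(pose))`, which gives pose numbering rather than PDB numbering.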


## We made it!
Whew! The change in score is finally negative. The main score function isn't the only way to evaluate these networks. HBNet also adds its own score terms:

In [36]:
from pyrosetta.rosetta.core.pose import hasPoseExtraScore, getPoseExtraScore
#TODO this doesn't work now, set store_network_scores_in_pose_ to true by default
if hasPoseExtraScore( pose, "HBNet_NumUnsatHpol" ):
    print( "HBNet_NumUnsatHpol", getPoseExtraScore( pose, "HBNet_NumUnsatHpol" ) )
    print( "HBNet_Saturation", getPoseExtraScore( pose, "HBNet_Saturation" ) )
    print( "HBNet_Score", getPoseExtraScore( pose, "HBNet_Score" ) )
else:
    print( "Somebody go bug Jack Maguire to enable this feature for PyRosetta" )

Somebody go bug Jack Maguire to enable this feature for PyRosetta


Let’s try to make this design more favorable. Select several residues surrounding the network, allow them to mutate to any residue type, and call the design mover again.

__Question:__ Now what do you find?

It should be noted that PyRosetta includes a handy toolbox method mutate_residue() that will change a specified residue in a given pose into another. However, the rotamer of this new residue will not be optimized. For example:

```
from pyrosetta.toolbox import mutate_residue
pose.assign(start_pose)
print(pose.residue(49))
mutate_residue(pose, 49, 'E')
print(pose.residue(49))
```

## Programming Exercises


- *Refinement and discrimination*. Download the “single misfold” decoy set from the Decoys ’R Us repository at dd.compbio.washington.edu/ddownload.cgi?misfold. (Documentation for this project is at dd.compbio.washington.edu.) This repository has a single “correct” and “incorrect” predicted structure for several proteins. For this exercise, analyze pdbs 2CI2 and 2CRO; each has two “incorrect” structures offered. (Technical note: These decoys have an empty occupancy field in the PDB *ATOM* records; a value of 1 needs to be added before Rosetta will load these structures.)

    Write a program that will calculate and output the score for each decoy (i) as is from the PDB file, (ii) after packing only, (iii) after minimization only, and (iv) after packing and minimizing. For each of the four cases, compare the scores of the “correct” structure with those of the “incorrect” structure. Which schemes successfully discriminate the correct structures?


- Write a refinement protocol that will iterate between side-chain packing, small and shear moves, and minimization. Where is the best place to position the Monte Carlo acceptance test? Test your protocol by making 10 independently-refined structures for the correct and incorrect decoys of 2CRO from the Decoys ’R Us single misfold set. Is this protocol able to discriminate the correct decoy? Submit your code.


- HIV-1 protease is a major drug target for antiretroviral therapies. Protease inhibitors are designed from substrate peptide mimics. We will attempt to take a natural substrate peptide of HIV-1 protease and design it for improved binding — potentially to serve as a good template for drug design. Use PDB file 1KJG for the following analysis.
    
    
    - Turn on side-chain packing for the protease active site (residues 8, 23, 25, 29, 30, 32, 45, 47, 50, 53, 82, and 84 of both chains A and B) and for the peptide (residues 2–9 on chain P; all of these numbers follow the PDB numbering).


    - Repack the above side chains and then energy minimize those same side chains with the backbone fixed. Generate 10 decoys and record the energies for each decoy. This will represent the reference state: the wild-type peptide bound to the protease.


    - For residue 2 of the peptide (chain P), allow repacking to any of the 20 amino acid residues, while leaving the packing and side-chain minimization the same as in the previous step. Generate 10 decoys and record the energies. These will represent single mutants at that residue position.


    - Repeat the previous step for each of the other 7 residues in the substrate peptide.
    
    
    - Take the lowest energy for each mutation position and compare it to the lowest energy for the wild type. Do single mutants at any of these positions improve the energy over the wild type? Which ones? By how much? Which energy components are mostly responsible?
    

    - Which peptide residue positions are easiest to improve? Which positions are the hardest?


    - Are there any other trends? Hydrophobic vs. polar, bulky residues vs. small residues, etc.?


    - Altman et al. (Proteins 2008) found, using their own computational design algorithm, that the most favorable sequences were a triple mutant E3D/T4I/V6L, a single mutant T4V, and a single mutant E3Q. How do their results compare with yours?


    - Natural substrates are often sub-optimal binders. Why would this be advantageous?


- Effect of backbone conformation on design. HIV-1 protease is promiscuous, meaning it can cleave a wide range of peptides beyond the ten natural substrates of the virus. Let’s examine the preferences of the enzyme through Rosetta design calculations.

    - Download HIV-1 protease in complex with CA-P2 peptide (1F7A). Select the eight peptide residues for unrestricted design and let Rosetta redesign the substrate sequence. What is the new sequence and how does it compare to the original? What percent of the original sequence was optimal for its structure?


    - Download HIV-1 protease in complex with RT-RH peptide (1KJG). (Note that the enzyme is the same here, but it is crystallized with a different substrate.) Again, design the eight substrate residues with Rosetta. What percent of this substrate sequence is optimal for this crystal structure? ____%


    - How do the designed sequences from the two structures above compare? Why should they be the same? Why would they not be the same? What are the implications for the field of computational protein design?


- Write a program which iterates between design of all residues of a protein and refinement via small, shear, and minimization moves.
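
The technical note in the first exercise says the decoy files have an empty occupancy field. The sketch below is one way to fill it in, assuming the standard fixed-column PDB layout in which occupancy occupies columns 55-60 of each ATOM/HETATM record. The function names and the example atom line are hypothetical.

```python
def fill_occupancy(pdb_line, occupancy=1.0):
    """Set the occupancy of an ATOM/HETATM line if the field is blank."""
    if not pdb_line.startswith(("ATOM", "HETATM")):
        return pdb_line
    newline = "\n" if pdb_line.endswith("\n") else ""
    padded = pdb_line.rstrip("\n").ljust(66)  # make sure columns 55-66 exist
    if padded[54:60].strip():
        return pdb_line  # occupancy already present, leave untouched
    fixed = padded[:54] + f"{occupancy:6.2f}" + padded[60:]
    return fixed.rstrip() + newline


def fill_occupancy_text(pdb_text, occupancy=1.0):
    """Apply fill_occupancy to every line of a PDB file held as a string."""
    return "".join(
        fill_occupancy(line, occupancy)
        for line in pdb_text.splitlines(keepends=True)
    )


# Build a 54-column ATOM line (ends after the z coordinate, no occupancy)
example = "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   {:8.3f}{:8.3f}{:8.3f}".format(
    "ATOM", 1, " N", "", "MET", "A", 1, "", 27.340, 24.430, 2.614
)
fixed_line = fill_occupancy(example)
print(fixed_line)
```

Lines that already carry an occupancy value, and all non-atom records, pass through unchanged, so the function is safe to run on every file in the set.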


## Thought Question

What is the thermodynamic meaning of the ref energy term, and what does it correspond to physically?
During evolution, the genome sequence may mutate to cause protein sequence changes. Alternatively, one could consider the difference in evolutionary propensities for each residue type. How could you derive reference energies from sequence data, and what would that mean?


How do Kuhlman & Baker fit the reference energies in their 2000 PNAS paper?


