# Computational SSM

The code below uses the software Rosetta (through PyRosetta) to sample every single-residue substitution at a given protein-protein interface, calculate interface metrics for each mutant, and plot the results as heatmaps.

<font color='grey'>Created by Parisa Hosseinzadeh for *Protein Engineering and Design*, Winter 2022</font>

### Preparing PyRosetta on your computer



1.   Create a folder in your Google Drive and name it **PyRosetta**.
2.   Download `pyrosetta-2020.50.post0.dev0+970.commits.3700df14560-cp37-cp37m-linux_x86_64.whl` from your course files and put it in the PyRosetta folder.



In [None]:
#@title Mounting Google Drive
#@markdown Please execute this cell by pressing the _Play_ button 
#@markdown on the left. 


google_drive_mount_point = '/content/google_drive'

import os, sys, time

if 'google.colab' in sys.modules:
    from google.colab import drive
    drive.mount(google_drive_mount_point)

if not os.getenv("DEBUG"):
    google_drive = google_drive_mount_point + '/My Drive' 

Mounted at /content/google_drive


In [None]:
#@title Installing PyRosetta
#@markdown Install PyRosetta from the wheel file in your Google Drive.
#@markdown This cell takes a few minutes to run.

%%time
if not os.getenv("DEBUG"):
    # installing PyRosetta
    if sys.version_info.major != 3 or sys.version_info.minor != 7:
        print('Need Python-3.7 to run!')
        sys.exit(1)

    # upload PyRosetta Linux WHEEL package into your google drive and put it into /PyRosetta dir
    # or alternatively you can download PyRosetta directly from GrayLab web site (but this might take some time!)
    #!mkdir $notebook_path/PyRosetta
    #!cd $notebook_path/PyRosetta && wget --user USERNAME --password PASSWORD https://graylab.jhu.edu/download/PyRosetta4/archive/release/PyRosetta4.Release.python37.ubuntu.wheel/latest.html   

    pyrosetta_distr_path = google_drive + '/PyRosetta' 
    
    # find the path to the wheel package; if several packages are found, take the first one
    # replace this with `wheel_path = pyrosetta_distr_path + '/<wheel-file-name>.whl'` if you want to use a particular .whl file
    wheel_path = pyrosetta_distr_path + '/' + [ f for f in os.listdir(pyrosetta_distr_path) if f.endswith('.whl')][0]
    
    print(f'Using PyRosetta wheel package: {wheel_path}')

    !pip3 install '{wheel_path}' 

Using PyRosetta wheel package: /content/google_drive/My Drive/PyRosetta/pyrosetta-2020.50.post0.dev0+970.commits.3700df14560-cp37-cp37m-linux_x86_64.whl
Processing ./google_drive/My Drive/PyRosetta/pyrosetta-2020.50.post0.dev0+970.commits.3700df14560-cp37-cp37m-linux_x86_64.whl
Installing collected packages: pyrosetta
Successfully installed pyrosetta-2020.50.post0.dev0+970.commits.3700df14560
CPU times: user 668 ms, sys: 98.8 ms, total: 767 ms
Wall time: 1min 19s
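
The wheel lookup in the cell above takes whichever `.whl` file `os.listdir` returns first and fails with an `IndexError` if the folder contains none. A minimal sketch of a more defensive lookup follows. `find_wheel` is a hypothetical helper name (not part of PyRosetta or Colab), the file names are placeholders, and the sketch assumes all wheels sit directly in one folder.

```python
import os
import tempfile


def find_wheel(directory):
    """Return the path of the first .whl file in `directory`, in sorted order.

    Hypothetical helper for illustration; raises FileNotFoundError with a
    readable message when no wheel is present.
    """
    wheels = sorted(f for f in os.listdir(directory) if f.endswith('.whl'))
    if not wheels:
        raise FileNotFoundError(f'No .whl file found in {directory}')
    return os.path.join(directory, wheels[0])


# Demonstration on a throwaway folder with placeholder file names.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ('b_package.whl', 'a_package.whl', 'notes.txt'):
        open(os.path.join(demo_dir, name), 'w').close()
    chosen = os.path.basename(find_wheel(demo_dir))
    print(chosen)
```

Sorting makes the choice reproducible when several wheels are present, and the explicit error points at the missing file instead of at a list index.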


In [None]:
#@title Importing necessary modules
#@markdown Run this cell to import the modules
#@markdown needed to run the code and to install py3Dmol.

#importing modules necessary for plotting
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

#importing the initialization module
from pyrosetta import init
#importing pose and score function getters
from pyrosetta import (
    pose_from_file,
    get_score_function,
)
#residue selectors for selecting interface
from pyrosetta.rosetta.core.select.residue_selector import (
    NeighborhoodResidueSelector,
    ChainSelector,
)
#importing interface filters
from pyrosetta.rosetta.protocols.simple_filters import ShapeComplementarityFilter
from pyrosetta.rosetta.protocols.simple_ddg import DdgFilter
#importing mover for mutation
from pyrosetta.rosetta.protocols.simple_moves import MutateResidue
#importing necessary packages for packing

#required for visualization
!pip install -q py3Dmol
import py3Dmol

In [None]:
#@title Visualizing the interface 
#@markdown By running this cell, you will see the interface
#@markdown we're optimizing.

#@markdown Please make sure you added the file comp_ssm.pdb
#@markdown to your colab. You can do this by clicking on the
#@markdown folder icon on the left bar and drag-drop the file.

#@markdown Open the same PDB file in PyMOL. Do they look the same?

p = py3Dmol.view(js='https://3dmol.org/build/3Dmol.js')
p.addModel(open("comp_ssm.pdb",'r').read(),'pdb')
p.setStyle({'cartoon': {'color':'spectrum'}})
p.zoomTo()
p.show()

In [None]:
#@title Selecting the interface
#@markdown Running this cell will select the interface.
#@markdown We define the interface as all residues in chain B
#@markdown that are within 5 Å of chain A.

#@markdown In your PyMOL command line, make the selection
#@markdown by typing `sel interface, byres chain B within 5 of chain A`

#@markdown Check whether the residues printed after running
#@markdown this cell are the same as those you find in PyMOL.

#initializing Rosetta. Necessary to start
init(extra_options='-mute all')
#reading in the pdb file
p = pose_from_file('comp_ssm.pdb')
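#selecting chain A, the focus of the neighborhood search
chain = ChainSelector('A')
#selecting residues within 5 Å of chain A
neighbors = NeighborhoodResidueSelector()
neighbors.set_distance(5)
neighbors.set_focus_selector(chain)
#excluding chain A itself so that only chain B residues remain
neighbors.set_include_focus_in_subset(False)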
#making the selection
#subset is a vector of 0,1 where the selected residues are 1
subset = neighbors.apply(p)
#getting the residue IDs for the neighbors
neighbor_ids = []
#looping through all residues
#Rosetta pose numbering starts at 1
for resi in range(1,p.size()+1):
  #finding places where the subset is 1 (or true)
  if (subset[resi]):
    #adding that to the list
    neighbor_ids.append(resi)

print(
    '\n',
    '\n',
    'The selected residues at the interface are:', 
    neighbor_ids
    )

PyRosetta-4 2021 [Rosetta PyRosetta4.MinSizeRel.python37.ubuntu 2020.50.post.dev+970.commits.3700df145608444753aabbec9c4681ec9b21f74b 2021-02-24T13:24:53] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.

 
 The selected residues at the interface are: [119, 134, 135, 138, 142]
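
To make the selection logic concrete, here is a minimal sketch of a distance-based neighborhood search in plain Python. It is not the Rosetta implementation: the coordinates are hypothetical toy values, each residue is reduced to a single representative point, and `neighbors_of_chain` is an illustrative name. Rosetta and PyMOL may measure distances between different atoms, so the two residue lists need not match exactly.

```python
import math

# Hypothetical toy data: (chain, residue number) -> one representative
# coordinate per residue, in Å. Real structures have many atoms per residue.
coords = {
    ('A', 1): (0.0, 0.0, 0.0),
    ('A', 2): (3.8, 0.0, 0.0),
    ('B', 101): (0.0, 4.0, 0.0),
    ('B', 102): (3.8, 9.0, 0.0),
    ('B', 103): (7.0, 3.0, 0.0),
}


def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def neighbors_of_chain(coords, focus_chain, cutoff):
    """Return residues outside `focus_chain` within `cutoff` Å of any focus residue."""
    focus = [xyz for (chain, _), xyz in coords.items() if chain == focus_chain]
    selected = []
    for (chain, resi), xyz in coords.items():
        if chain == focus_chain:
            continue  # plays the role of set_include_focus_in_subset(False)
        if any(distance(xyz, f) <= cutoff for f in focus):
            selected.append((chain, resi))
    return sorted(selected)


interface = neighbors_of_chain(coords, focus_chain='A', cutoff=5.0)
print(interface)
```

Skipping the focus chain is what leaves only residues from the partner chain in the result.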


In [None]:
#@title Mutagenesis
#@markdown By running this cell, you will mutate
#@markdown all the residues at the interface to all 20 AAs
#@markdown and calculate the shape complementarity and 
#@markdown ∆∆G of binding for each mutant. 

#@markdown **Shape complementarity (sc)** shows how well the shape of the binder
#@markdown matches the surface of the target.

#@markdown **∆∆G (ddg)** is an approximate energy of binding.

#@markdown <font color='red'> This cell takes ~10 minutes to run.


#defining some functions
def get_sc(p):
  '''calculates shape complementarity of pose p.'''
  sc=ShapeComplementarityFilter()
  sc.jump_id(1)
  sc_value = sc.report_sm(p)

  return sc_value

def get_ddg(p, sfn):
  '''calculates ddg of binding for pose p given score function sfn.'''
  #using the score function passed in, not the global scorefxn
  ddg = DdgFilter(0,sfn,1)
  ddg.repack(1)
  ddg_value = ddg.report_sm(p)

  return ddg_value

# def pack(p, sfn, subset):
#   '''
#   Allow sidechain packing for residues in subset in pose p
#   given scorefxn sfn.
#   '''
#   task_pack = standard_packer_task(p)
#   task_pack.restrict_to_repacking()
#   task_pack.temporarily_fix_everything()
#   task_pack.temporarily_set_pack_residue(subset)
#   pack_mover = PackRotamersMover(sfn, task_pack)
#   pack_mover.apply(p)


#creating a list of amino acids
aa_list = [
           'ASP',
           'GLU',
           'LYS',
           'ARG',
           'HIS',
           'ASN',
           'GLN',
           'SER',
           'THR',
           'ALA',
           'VAL',
           'LEU',
           'ILE',
           'MET',
           'PHE',
           'TYR',
           'TRP',
           'CYS',
           'PRO',
           'GLY',
]
#creating a dictionary to store values
#for easy access, residue IDs are the keys to the dictionary
sc_dict = {i:[] for i in neighbor_ids}
ddg_dict = {i:[] for i in neighbor_ids}

#calculating the original metrics
##getting the score
scorefxn = get_score_function()
sc_value = get_sc(p)
ddg_value = get_ddg(p, scorefxn)

print (
    'The values of shape complementarity and ∆∆G for starting scaffold are:',
    '\n',
    sc_value,
    ' and ',
    ddg_value,
    ' , respectively.'
    )

#looping through all the neighborhood residues
for resi in neighbor_ids:
  print("\n", "\n", "I'm mutating residue ", resi)
  #creating a new pose to make sure the original is retained
  new_p = p.clone()
  #looping through all 20 options
  for resn in aa_list:
    #applying mutation
    mut=MutateResidue()
    mut.set_res_name(resn)
    mut.set_target(resi)
    mut.set_preserve_atom_coords(True)
    mut.apply(new_p)
    #repacking the pose after making the mutations
    # pack(new_p, scorefxn, resi)
    #calculating new sc and ddg
    sc_value = get_sc(new_p)
    ddg_value = get_ddg(new_p, scorefxn)    
    #adding to the dictionaries
    sc_dict[resi].append(sc_value)
    ddg_dict[resi].append(ddg_value)
  print('sc values for residue',
        resi,
        'are:',
        '\n',
        sc_dict[resi])
  print('ddg values for residue',
        resi,
        'are:',
        '\n',
        ddg_dict[resi])

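#calculating best ddg and sc
#the closer sc is to 1 the better
#the lower the ddg is the better
#flattening first, because min/max on lists compares them element by element
all_ddgs = [v for values in ddg_dict.values() for v in values]
best_ddg = min(all_ddgs)
all_scs = [v for values in sc_dict.values() for v in values]
best_sc = max(all_scs)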

print(
    '\n',
    'The best shape complementarity and ∆∆G values are:',
    '\n',
    best_sc,
    '\n',
    best_ddg,
    '\n',
    ' , respectively.'
    )

The values of shape complementarity and ∆∆G for starting scaffold are: 
 0.5738674104213715  and  -12.066414183336455  , respectively.

 
 I'm mutating residue  119
sc values for residue 119 are: 
 [0.5676352679729462, 0.5653285682201385, 0.567544549703598, 0.5674268305301666, 0.5666573345661163, 0.567043125629425, 0.5650521516799927, 0.5722849667072296, 0.5713453888893127, 0.5705859959125519, 0.5833808183670044, 0.574176162481308, 0.5718203783035278, 0.573744922876358, 0.5665565729141235, 0.5665565729141235, 0.5545288026332855, 0.5731486082077026, 0.5749220252037048, 0.5748777091503143]
ddg values for residue 119 are: 
 [-9.688405540533147, -9.217893819428479, -12.072807843787917, -11.144778024982202, -10.648242240384398, -10.874639457626408, -9.849752915995705, -10.359238502156, -5.954686203749635, -10.80727902389506, 0.5745287570174664, -9.690467653113714, 2.07710209827256, -12.281803708282204, -10.027366007131725, -10.499511995472595, -9.53049564090529, -10.442094748611414, -9.2325
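
Finding the best value across all positions requires a search over every individual number. Python compares lists element by element, so `min` applied to a collection of lists returns the list with the smallest first entry, not the list that holds the smallest number. The sketch below uses hypothetical toy values (not results from this run) to show the difference and to report which residue and amino acid gave the best ∆∆G.

```python
# Hypothetical toy values for two positions and three amino acids.
aa_names = ['ASP', 'GLU', 'LYS']
toy_ddg = {
    119: [-9.0, -3.0, -4.0],
    134: [-5.0, -12.5, -1.0],
}

# Comparing whole lists picks the list whose first entry is smallest,
# so the true minimum in the other list is never seen.
lexicographic = min(min(toy_ddg.values()))

# Flatten to (value, residue, amino acid) records and take the smallest value.
records = [
    (value, resi, aa_names[i])
    for resi, values in toy_ddg.items()
    for i, value in enumerate(values)
]
best_value, best_resi, best_aa = min(records)

print(lexicographic)
print(best_value, best_resi, best_aa)
```

Keeping the residue number and amino acid next to each value also answers which mutation is best, not only how good it is.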

In [1]:
#@title Plotting the results
#@markdown By running this cell, you will plot the results of
#@markdown your computational mutagenesis studies.

#@markdown The first plot shows shape complementarity values,
#@markdown in other words how well the shape of the binder
#@markdown matches the surface of the target.
#@markdown It's a number between 0 and 1, and
#@markdown the closer the number is to 1, the better.
#@markdown You can see the mutants with better shape complementarity
#@markdown with darker colors in the plot.

#@markdown The second plot shows ∆∆G values,
#@markdown in other words it shows whether the binding
#@markdown is energetically favored.
#@markdown As with all energy terms, 
#@markdown the more negative the number, the better.
#@markdown You can see the mutants with better ∆∆G
#@markdown with darker colors in the plot.

#@markdown Look at your PyMOL screen. Does this
#@markdown correlate with what you expected?
#@markdown How many of these variants did you identify rationally?

#creating the dataframe for generating plots
sc_df=pd.DataFrame.from_dict(sc_dict, 
                             orient='index',
                             columns=aa_list)
ddg_df=pd.DataFrame.from_dict(ddg_dict,
                              orient='index',
                              columns=aa_list)
#plotting heatmaps
fig, ax =plt.subplots(2,1)
fig.tight_layout(pad=3.0)
sns.heatmap(ddg_df, ax=ax[1], vmax=10)
cmap = sns.cm.rocket_r
sns.heatmap(sc_df, ax=ax[0], vmin=0.55, cmap=cmap)
fig.show()

NameError: ignored
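
The heatmaps show absolute values, so it can help to compare each mutant with the starting scaffold. The sketch below uses the starting ∆∆G and the first four residue-119 values printed in the mutagenesis output. The subtraction is an illustrative convention assumed here, not a step of the notebook, and `relative_ddg` is a hypothetical name.

```python
# Starting ∆∆G and the first four residue-119 values, copied from the
# mutagenesis output (amino acids follow the order of aa_list).
wild_type_ddg = -12.066414183336455
aa_names = ['ASP', 'GLU', 'LYS', 'ARG']
ddg_119 = [-9.688405540533147, -9.217893819428479,
           -12.072807843787917, -11.144778024982202]

# Negative differences mean the mutant scores better than the starting scaffold.
relative_ddg = {aa: value - wild_type_ddg for aa, value in zip(aa_names, ddg_119)}
improved = [aa for aa, diff in relative_ddg.items() if diff < 0]

for aa, diff in relative_ddg.items():
    print(f'{aa}: {diff:+.3f}')
print('Improved over start:', improved)
```

Differences this small should be read with caution, since the scores are approximate.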