# Visualization of electrostatics

In this lab, we will visualize the electrostatic potential from APBS.

# Part 0 - Downloading and installing the required software

Before we start, you must first **remember to start the hosted runtime in Google Colab**.

Then, we must install **py3Dmol**.

In [1]:
# Install py3Dmol only if it is not already available in this runtime
try:
  import py3Dmol
except ImportError:
  !pip install py3Dmol
  import py3Dmol

Collecting py3Dmol
  Downloading py3Dmol-1.8.0-py2.py3-none-any.whl (6.4 kB)
Installing collected packages: py3Dmol
Successfully installed py3Dmol-1.8.0


# Part I - Visualizing protonation

In [None]:
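# Shell commands need a leading "!" in Colab.
# The cells below expect 1wofa.pdb, 1wof.pdb and 5rh2_aligned.pdb in the working directory.
!wget https://raw.githubusercontent.com/Career-Karma-Tutorials/ck-git/master/main/app.py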

In [None]:
view = py3Dmol.view()
view.setBackgroundColor('white')
view.addModel(open('1wofa.pdb', 'r').read(),'pdb')
view.addModel(open('5rh2_aligned.pdb', 'r').read(),'pdb')
view.setStyle({'model':0}, {'cartoon': {'color':'purple'}})
view.setStyle({'model':1}, {'cartoon': {'color':'yellow'}})
view.zoomTo()
view.show()

Next, compare the side chain positions in the two structures. How do they differ?

In [None]:
view = py3Dmol.view()
view.setBackgroundColor('white')
view.addModel(open('1wofa.pdb', 'r').read(),'pdb')
view.addModel(open('5rh2_aligned.pdb', 'r').read(),'pdb')
view.setStyle({'and':[{'model':0},{'chain':'A'}]}, {'cartoon': {'color':'purple'}})
view.addStyle({'and':[{'model':0},{'chain':'A'}]}, {'stick': {'colorscheme':'purpleCarbon'}})
view.setStyle({'model':1}, {'cartoon': {'color':'yellow'}})
view.addStyle({'model':1}, {'stick': {'colorscheme':'yellowCarbon'}})
view.zoomTo()
view.show()

MPro is actually a dimer. For simplicity, we have only shown one of the monomers so far. Now let's look more closely at the side chain positions. Do you notice any patterns in the side chain variations:

* on the surface 
* in the interior of the protein
* at the interface between subunits?

In [None]:
view = py3Dmol.view()
view.setBackgroundColor('white')
view.addModel(open('1wof.pdb', 'r').read(),'pdb')
view.addModel(open('5rh2_aligned.pdb', 'r').read(),'pdb')
view.setStyle({'and':[{'model':0},{'chain':'A'}]}, {'cartoon': {'color':'purple'}})
view.addStyle({'and':[{'model':0},{'chain':'A'}]}, {'stick': {'colorscheme':'purpleCarbon'}})
view.setStyle({'and':[{'model':0},{'chain':'B'}]}, {'cartoon': {'color':'grey'}})
view.setStyle({'model':1}, {'cartoon': {'color':'yellow'}})
view.addStyle({'model':1}, {'stick': {'colorscheme':'yellowCarbon'}})
view.zoomTo()
view.show()
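Before styling a structure by chain, it can help to check which chain identifiers the file actually contains. The following is a minimal sketch using only the standard library. `chains_in_pdb` and `atom_line` are hypothetical helper names, and the code assumes standard fixed-column PDB records, where the chain ID sits in column 22.

```python
def chains_in_pdb(pdb_text):
    """Map each chain ID to its number of distinct residues (ATOM records only)."""
    residues = {}
    for line in pdb_text.splitlines():
        if line.startswith('ATOM') and len(line) >= 27:
            chain = line[21]       # column 22: chain identifier
            resid = line[22:27]    # columns 23-27: residue number + insertion code
            residues.setdefault(chain, set()).add(resid)
    return {chain: len(ids) for chain, ids in residues.items()}


def atom_line(serial, name, resname, chain, resseq):
    """Build a fixed-column ATOM record with dummy coordinates (illustration only)."""
    return "ATOM  %5d %-4s %3s %s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, resname, chain, resseq, 0.0, 0.0, 0.0)


# Made-up example: two residues in chain A, one residue in chain B
example = "\n".join([
    atom_line(1, " N", "SER", "A", 1),
    atom_line(2, " CA", "SER", "A", 1),
    atom_line(3, " N", "GLY", "A", 2),
    atom_line(4, " N", "SER", "B", 1),
])
chain_counts = chains_in_pdb(example)
print(chain_counts)
```

On a real file, the same function could be called as `chains_in_pdb(open('1wof.pdb').read())`; a dimer would be expected to report two chains of similar size.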

# Part II - Template selection and sequence alignment

1. Let's move on to modeling the bat coronavirus MPro. The first step in structure prediction is to obtain the sequence that you want to model. We have obtained the amino acid sequence from [UniProt](https://www.uniprot.org/blast/?about=P0C6W6[3239-3544]&key=Chain&id=PRO_0000043082). Now let's store the sequence in a file.

In [None]:
%cd /content/
%mkdir modeller
%cd modeller

In [None]:
# sp|P0C6W6|3239-3544
# Write the target sequence to a FASTA file; the file is closed automatically
with open('target.fasta', 'w') as F:
  F.write(""">target
SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDTVYCPRHVICTAEDMLNPNYEDLLIR
KSNHSFLVQAGNVQLRVIGHSMQNCLLRLKVDTSNPKTPKYKFVRIQPGQTFSVLACYNG
SPSGVYQCAMRPNHTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTEVHAGTDLEGK
FYGPFVDRQTAQAAGTDTTITLNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYE
PLTQDHVDILGPLSAQTGIAVLDMCAALKELLQNGMNGRTILGSTILEDEFTPFDVVRQC
SGVTFQ""")
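A quick sanity check on the stored sequence can catch copy-and-paste problems before they propagate into the alignment. The sketch below is one possible way to do this with the standard library only; `read_single_fasta` and `unexpected_residues` are hypothetical helper names, and the demonstration uses a short made-up record rather than the real target.

```python
import os
import tempfile

VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def read_single_fasta(path):
    """Return (header, sequence) for a FASTA file holding exactly one record."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith('>'):
                if header is not None:
                    raise ValueError('more than one record found')
                header = line[1:]
            else:
                chunks.append(line)
    if header is None:
        raise ValueError('no FASTA header found')
    return header, ''.join(chunks)


def unexpected_residues(sequence):
    """Return characters that are not one of the 20 standard amino acids."""
    return set(sequence.upper()) - VALID_RESIDUES


# Demonstration on a short made-up record written to a temporary file
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'demo.fasta')
    with open(demo_path, 'w') as out:
        out.write('>demo\nSGFRKMAF\nPSGKVEGC\n')
    demo_header, demo_seq = read_single_fasta(demo_path)

print(demo_header, len(demo_seq), unexpected_residues(demo_seq))
```

Applied to `target.fasta`, the sequence length should be 306, since the UniProt range 3239-3544 spans 306 residues, and the set of unexpected characters should be empty.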

Selecting an appropriate template is as crucial for comparative modeling as obtaining an accurate alignment, since together they determine where each residue of the target is placed.

**QUESTION:** What features of a crystal structure do you think are important for choosing the best template?

2. We will find the potential best templates from the whole PDB database to model the structure of our target protein using the **profile.build()** command from MODELLER.

  For this purpose, we need a text file containing a list of non-redundant PDB sequences at 95% sequence identity and an appropriate script for running MODELLER.

In [None]:
#Downloading pdb_95.pir
!wget https://salilab.org/modeller/downloads/pdb_95.pir.gz
!gunzip pdb_95.pir.gz
#Downloading the build_profile.py script from GitHub
!wget https://raw.githubusercontent.com/pb3lab/ibm3202/master/scripts/build_profile.py

In [None]:
#Running the build_profile script
!mod10.1 build_profile.py
#Printing only the list of potential templates
!sed -n '/HITS FOUND IN ITERATION:     1/,/Weight Matrix/p;/Weight Matrix/q' build_profile.log

In this particular example, a BLOSUM62 similarity matrix is being used for determining the sequence identity between target and potential templates. Also, we are employing only one search iteration and the parameter max_aln_evalue is set to 0.01, indicating that only sequences with e-values smaller than or equal to 0.01 will be included in the final profile.

For simplicity, we just printed out the PDB table from the resulting log file generated during this analysis.

As you can see, several PDB files are indicated. The important columns to determine the best templates from this analysis are the fifth, seventh and eighth columns, which correspond to the sequence length of the target protein, the sequence identity and the e-value, respectively.
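The selection logic described above (discard hits above the e-value cutoff, then prefer high sequence identity) can be sketched as follows. This is only an illustration: `rank_templates` is a hypothetical helper, the hit codes and numbers are made-up placeholders rather than real search results, and it assumes the table has already been parsed into dictionaries. In practice, other features of the crystal structure also matter when choosing a template.

```python
def rank_templates(hits, max_evalue=0.01):
    """Keep hits with e-value <= max_evalue; sort by identity (high first), then e-value (low first)."""
    kept = [h for h in hits if h['evalue'] <= max_evalue]
    return sorted(kept, key=lambda h: (-h['identity'], h['evalue']))


# Made-up placeholder hits, NOT real output from build_profile.py
hits = [
    {'code': 'hitA', 'identity': 45.0, 'evalue': 1e-30},
    {'code': 'hitB', 'identity': 96.0, 'evalue': 0.0},
    {'code': 'hitC', 'identity': 30.0, 'evalue': 0.5},
    {'code': 'hitD', 'identity': 96.0, 'evalue': 1e-50},
]
ranked = rank_templates(hits)
print([h['code'] for h in ranked])
```

Here `hitC` is dropped because its e-value exceeds the cutoff, and the two hits with equal identity are ordered by e-value.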

**QUESTION:** From this analysis, which template would be better for modeling the structure of our target sequence?

In [None]:
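# ProDy must be installed in the runtime (e.g. with !pip install prody)
import prody

# --> Enter information for the template below
template_pdb = ''
template_chain = ''

# Downloads the template (as a .pdb.gz file) and decompresses it
prody.parsePDB(template_pdb)
!gunzip {template_pdb}.pdb.gz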

3. Now, we will **align the sequence of our template protein with the sequence of our target protein**, such that we can model the structure.

In [None]:
F = open('align2D.py', 'w')
F.write(f"""from modeller import *

env = environ()
aln = alignment(env)
mdl = model(env, file='{template_pdb}', model_segment=('FIRST:{template_chain}','LAST:{template_chain}'))
aln.append_model(mdl, align_codes='{template_pdb}{template_chain}', atom_files='{template_pdb}.pdb')
aln.append(file='target.fasta', align_codes='target', alignment_format='FASTA')
aln.align2d()
aln.write(file='aligned.fasta', alignment_format='FASTA')
aln.write(file='aligned.ali', alignment_format='PIR')
aln.write(file='aligned.pap', alignment_format='PAP')
""")
F.close()

In [None]:
!mod10.1 align2D.py

4. You will end up with three new files (aligned.fasta, aligned.ali and aligned.pap) that contain the pairwise alignment of the target and template sequences in different formats. Load the FASTA file into [Alignment Viewer 2.0](https://fast.alignmentviewer.org/). You can also use our Colab-mounted MSA viewer below:

In [None]:
#@title Protein MSA Viewer in Google Colab
#The following code is modified from the wonderful viewer developed by Damien Farrell
#https://dmnfarrell.github.io/bioinformatics/bokeh-sequence-aligner

#Importing all modules first
import os, io, random
import string
import numpy as np

from Bio.Seq import Seq
from Bio.Align import MultipleSeqAlignment
from Bio import AlignIO, SeqIO

import panel as pn
import panel.widgets as pnw
pn.extension()

from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, Plot, Grid, Range1d
from bokeh.models.glyphs import Text, Rect
from bokeh.layouts import gridplot

#Setting up the amino color code according to Zappo color scheme
def get_colors(seqs):
    #make colors for bases in sequence
    text = [i for s in list(seqs) for i in s]
    #Use Zappo color scheme
    clrs =  {'K':'red',
             'R':'red',
             'H':'red',             
             'D':'green',
             'E':'green',
             'Q':'blue',
             'N':'blue',
             'S':'blue',
             'T':'blue',
             'A':'blue',
             'I':'blue',
             'L':'blue',
             'M':'blue',
             'V':'blue',
             'F':'orange',
             'Y':'orange',
             'W':'orange',
             'C':'blue',
             'P':'yellow',
             'G':'orange',
             '-':'white'}
    #fall back to white for any character missing from the table (e.g. 'X')
    colors = [clrs.get(i, 'white') for i in text]
    return colors

#Setting up the MSA viewer
def view_alignment(aln, fontsize="9pt", plot_width=800):
    """Bokeh sequence alignment view"""

    #make sequence and id lists from the aln object
    seqs = [rec.seq for rec in (aln)]
    ids = [rec.id for rec in aln]    
    text = [i for s in list(seqs) for i in s]
    colors = get_colors(seqs)    
    N = len(seqs[0])
    S = len(seqs)    
    width = .4

    x = np.arange(1,N+1)
    y = np.arange(0,S,1)
    #creates a 2D grid of coords from the 1D arrays
    xx, yy = np.meshgrid(x, y)
    #flattens the arrays
    gx = xx.ravel()
    gy = yy.flatten()
    #use recty for rect coords with an offset
    recty = gy+.5
    h= 1/S
    #now we can create the ColumnDataSource with all the arrays
    source = ColumnDataSource(dict(x=gx, y=gy, recty=recty, text=text, colors=colors))
    plot_height = len(seqs)*15+50
    x_range = Range1d(0,N+1, bounds='auto')
    if N>100:
        viewlen=100
    else:
        viewlen=N
    #view_range is for the close up view
    view_range = (0,viewlen)
    tools="xpan, xwheel_zoom, reset, save"

    #entire sequence view (no text, with zoom)
    p = figure(title=None, plot_width= plot_width, plot_height=50,
               x_range=x_range, y_range=(0,S), tools=tools,
               min_border=0, toolbar_location='below')
    rects = Rect(x="x", y="recty",  width=1, height=1, fill_color="colors",
                 line_color=None, fill_alpha=0.6)
    p.add_glyph(source, rects)
    p.yaxis.visible = False
    p.grid.visible = False  

    #sequence text view with ability to scroll along x axis
    p1 = figure(title=None, plot_width=plot_width, plot_height=plot_height,
                x_range=view_range, y_range=ids, tools="xpan,reset",
                min_border=0, toolbar_location='below')#, lod_factor=1)          
    glyph = Text(x="x", y="y", text="text", text_align='center',text_color="black",
                text_font="monospace",text_font_size=fontsize)
    rects = Rect(x="x", y="recty",  width=1, height=1, fill_color="colors",
                line_color=None, fill_alpha=0.4)
    p1.add_glyph(source, glyph)
    p1.add_glyph(source, rects)

    p1.grid.visible = False
    p1.xaxis.major_label_text_font_style = "bold"
    p1.yaxis.minor_tick_line_width = 0
    p1.yaxis.major_tick_line_width = 0

    p = gridplot([[p],[p1]], toolbar_location='below')
    return p

#Loading the viewer by indicating the MSA file and format to read
#@markdown Name of the MSA file (including the filetype)
MSAfile = 'aligned.fasta' #@param {type:"string"}
MSAformat = 'fasta' #@param {type:"string"}
aln = AlignIO.read(MSAfile,MSAformat)
p = view_alignment(aln, plot_width=900)
pn.pane.Bokeh(p)
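Beyond inspecting the alignment visually, the agreement between target and template can be summarized as a percent identity. The sketch below uses only the standard library; `read_fasta_records` and `pairwise_identity` are hypothetical helper names, and the two aligned sequences are a short made-up example. Note that several definitions of identity exist; this one counts only columns where neither sequence has a gap, which is an assumption and may differ from the value MODELLER reports.

```python
def read_fasta_records(text):
    """Parse FASTA-formatted text into a list of (id, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith('>'):
            if header is not None:
                records.append((header, ''.join(chunks)))
            header, chunks = line[1:].strip(), []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, ''.join(chunks)))
    return records


def pairwise_identity(seq_a, seq_b):
    """Percent identity over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError('aligned sequences must have equal length')
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == '-' or b == '-':
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Made-up aligned pair (10 columns, 2 of them containing a gap)
demo = ">template\nSGFRK-MAFP\n>target\nSGLRKAMA-P\n"
(id_a, seq_a), (id_b, seq_b) = read_fasta_records(demo)
identity = pairwise_identity(seq_a, seq_b)
print(id_a, id_b, identity)
```

For the tutorial files, the text passed to `read_fasta_records` would be the contents of `aligned.fasta`.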

# Part III - Generate and visualize a comparative model using MODELLER

1. Once your target and template sequences are aligned, use the **model-single.py** script to obtain a structure of your target through comparative modeling. Again, read the script and check how the sequences and structures are referenced in MODELLER. In this case, we are also performing this step in a separate folder.

  Please note that a single model is not enough: each model results from optimizing an energy function over the atomic coordinates, so different models end up with different energies. Generally, 50 to 100 models are generated so that the best-scoring one can be selected.

**💡 HINT:** For our example, the generation of 50 models takes around 15 min on Google Colab, whereas 10 models are generated in about 3 min. You can edit the number of models to generate on the `model-single.py` script.

In [None]:
%cd /content/
# -p avoids an error if the folder already exists from Part II
%mkdir -p modeller
%cd modeller

In [None]:
F = open('model-single.py', 'w')

F.write(f"""from modeller import *
from modeller.automodel import *
#from modeller import soap_protein_od

env = environ()
a = automodel(env, alnfile='aligned.ali',
              knowns='{template_pdb}{template_chain}', sequence='target',
              assess_methods=(assess.DOPE,
                              #soap_protein_od.Scorer(),
                              assess.GA341))
a.starting_model = 1
a.ending_model = 50
a.make()

# Get a list of all successfully built models from a.outputs
ok_models = [x for x in a.outputs if x['failure'] is None]

# Rank the models by DOPE score (lowest, i.e. best, first)
key = 'DOPE score'
ok_models.sort(key=lambda x: x[key])

# Get top model
m = ok_models[0]
print("Top model: %s (DOPE score %.3f)" % (m['name'], m[key]))
""")
F.close()

In [None]:
#Running the model-single script
!mod10.1 model-single.py

In [None]:
# Shows the end of the log file, which includes the best model
!tail model-single.log

2. The output from this process is a set of PDB files, each corresponding to a comparative model of our target protein, numbered from 1 up to the total number of models requested during comparative modeling.

  Also, the **model-single.log** output reports a score for each structure according to MODELLER's DOPE (discrete optimized protein energy) potential, where lower (more negative) values are better. For simplicity, this script was modified to indicate the model with the best DOPE score. We will be working only with the model with the best score for the remainder of the session.
  
  As an example, our best model during preparation of this tutorial showed the following DOPE score:

```
Top model: target.B99990021.pdb (DOPE score -35317.168)
Total CPU time [seconds]                                 :     885.31
```
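Rather than reading the file name off the log by eye, the "Top model" line can be parsed programmatically. This is a minimal sketch that assumes the log line has exactly the format shown in the example output above; `parse_top_model` is a hypothetical helper name.

```python
import re

# Matches lines such as: Top model: target.B99990021.pdb (DOPE score -35317.168)
TOP_MODEL_RE = re.compile(
    r'^Top model:\s+(\S+)\s+\(DOPE score\s+(-?\d+(?:\.\d+)?)\)')


def parse_top_model(log_text):
    """Return (file name, DOPE score) from the last 'Top model' line, or None if absent."""
    found = None
    for line in log_text.splitlines():
        match = TOP_MODEL_RE.match(line.strip())
        if match:
            found = (match.group(1), float(match.group(2)))
    return found


example_log = """Top model: target.B99990021.pdb (DOPE score -35317.168)
Total CPU time [seconds]                                 :     885.31
"""
best_name, best_score = parse_top_model(example_log)
print(best_name, best_score)
```

In the notebook, the same function could be applied to the contents of `model-single.log`, and the returned file name used in the copy step that follows.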

3. Before we check the quality of our model, we will take a look at it on **py3Dmol**.

**💡 HINT:** We are creating a copy of our model and changing the chain id, to make it easier to load both structures into py3Dmol.

In [None]:
# --> Modify this for your best model and chain
# Copying our best model with a new chain id
!sed "s/ A / B /g" target.B99990003.pdb > bestmodel.pdb
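The `sed` command above replaces every occurrence of ` A ` on a line, wherever it appears. A column-based alternative touches only the chain identifier field. The sketch below is one way to do that with the standard library; `set_chain_id` is a hypothetical helper name, and it assumes fixed-column PDB records with the chain ID in column 22.

```python
def set_chain_id(pdb_text, new_chain):
    """Return PDB text with column 22 (chain ID) replaced on coordinate records."""
    if len(new_chain) != 1:
        raise ValueError('chain ID must be a single character')
    out = []
    for line in pdb_text.splitlines():
        is_coordinate_record = line.startswith(('ATOM', 'HETATM', 'TER', 'ANISOU'))
        if is_coordinate_record and len(line) > 21:
            # index 21 is column 22 in the 1-based PDB column numbering
            line = line[:21] + new_chain + line[22:]
        out.append(line)
    return '\n'.join(out) + '\n'


# Made-up single-atom example built with fixed-width fields
demo = "ATOM  %5d %-4s %3s %s%4d    %8.3f%8.3f%8.3f\n" % (
    1, " CA", "ALA", "A", 1, 1.0, 2.0, 3.0)
changed = set_chain_id(demo, 'B')
print(changed, end='')
```

To apply it to a model file, read the file contents, pass them through `set_chain_id(..., 'B')`, and write the result to `bestmodel.pdb`.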

In [None]:
#Setting up py3Dmol for visualization
view=py3Dmol.view()
#Loading template
view.addModel(open(f'{template_pdb}.pdb', 'r').read(),'pdb')
#Loading best DOPE score model
view.addModel(open('bestmodel.pdb', 'r').read(),'pdb')
#Coloring the structures by model id
view.setStyle({'model':0}, {'cartoon': {'color':'purple'}})
view.setStyle({'model':1}, {'cartoon': {'color':'yellow'}})
view.zoomTo()
view.setBackgroundColor('white')
view.show()

4. Finally, to check the stereochemical quality of the model and its comparison to experimentally solved structures, we will use the [SAVES server](https://saves.mbi.ucla.edu), which employs several structure-based scoring strategies:

* **VERIFY3D** (i.e. compatibility of an atomic 3D model to its 1D sequence when compared to the energetics of good structures from the PDB).
* **ERRAT** (i.e. quality of non-bonded interactions of a region when compared to similar regions from highly refined structures).
* **PROCHECK** (stereochemical and geometrical quality of the model, via Ramachandran plots, sidechain rotamers, etc).

5. Download your best model, upload it to SAVES and wait for the results. Briefly:
- **Check the VERIFY3D results:** >80% of the residues should have an average score ≥ 0.2, whereas the score profile allows you to identify conflicting regions.
- **Check the Ramachandran plot:** Are there any residues outside the allowed regions? What types of residues are found within those regions? (Check it by clicking on each dot in the plot)
- **Check the errors in PROCHECK:** are the errors located within the loop regions?



<figure>
<center>
<img src='https://raw.githubusercontent.com/pb3lab/ibm3202/master/images/cm_04.png'/>
</center>
</figure>
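The VERIFY3D criterion mentioned above (more than 80% of residues with an averaged score of at least 0.2) is simple to check once the per-residue scores are available. The following sketch assumes the scores have already been extracted into a list; `fraction_passing` is a hypothetical helper name and the numbers are made-up placeholders, not SAVES output.

```python
def fraction_passing(scores, threshold=0.2):
    """Fraction of residues whose averaged 3D-1D score is >= threshold."""
    if not scores:
        raise ValueError('no scores given')
    return sum(1 for s in scores if s >= threshold) / len(scores)


# Made-up placeholder scores for ten residues
demo_scores = [0.45, 0.31, 0.28, 0.22, 0.05, 0.60, 0.27, 0.33, 0.20, 0.41]
frac = fraction_passing(demo_scores)
criterion_met = frac > 0.8
print(f"{100 * frac:.0f}% of residues pass; criterion met: {criterion_met}")
```

Residues that fall below the threshold are the ones worth locating in the score profile, since they mark the regions of the model that may need attention.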

[This article](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007449) contains more recommendations on comparative modelling of protein structures.

# Part IV - Modeling a complete dimer

In Part III, we built a model of a single subunit of MPro. In this part, we will build a model of the dimer by aligning two copies of the monomer to the template structure.

In [None]:
%cd /content/
# -p avoids an error if the folder already exists from the previous parts
%mkdir -p modeller
%cd modeller

First, we align the model to each chain from the template.

In [None]:
import prody

PDB_model = prody.parsePDB('bestmodel.pdb')
PDB_template = prody.parsePDB(f'{template_pdb}.pdb')

PDB_model_prot = PDB_model.select('protein')
PDB_template_prot = PDB_template.select('protein')

# Each match pairs the model with one chain of the template
matches = prody.matchChains(PDB_model_prot, PDB_template_prot)

for match in matches:
  map_model_template, map_template_model, seqid, overlap = match
  chain = map_template_model.getChids()[0]
  # Transformation that superposes the model onto this template chain, and its reverse
  transformation = prody.calcTransformation(map_model_template, map_template_model)
  inv_transformation = prody.calcTransformation(map_template_model, map_model_template)
  # Relabel the whole model with the chain ID of the matched template chain
  PDB_model.setChids([chain] * PDB_model.numAtoms())
  prody.applyTransformation(transformation, PDB_model)
  prody.writePDB(f'bestmodel_aligned_to_{template_pdb}{chain}.pdb', PDB_model)
  # Move the model back to its starting position before the next match
  prody.applyTransformation(inv_transformation, PDB_model)

Next, we join the files together.

In [None]:
# Concatenate the two aligned copies (template chains A and B) into one file
dat = ''
for chain in ['A','B']:
  with open(f'bestmodel_aligned_to_{template_pdb}{chain}.pdb', 'r') as F:
    dat += F.read()
with open('bestmodel_aligned.pdb', 'w') as F:
  F.write(dat)
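Plain concatenation keeps every record from both files, including any `END` record that closes the first file. Whether an `END` record in the middle of the merged file causes trouble depends on the program reading it, so the variant below, which is only a sketch under that assumption, drops the per-file `END` records and writes a single one at the close. `merge_pdb_files` is a hypothetical helper name and the demonstration uses tiny placeholder files.

```python
import os
import tempfile


def merge_pdb_files(paths, out_path):
    """Concatenate PDB files, skipping per-file END records and writing one END last."""
    with open(out_path, 'w') as out:
        for path in paths:
            with open(path) as handle:
                for line in handle:
                    if line.strip() == 'END':
                        continue
                    out.write(line if line.endswith('\n') else line + '\n')
        out.write('END\n')


# Demonstration with two tiny placeholder files
with tempfile.TemporaryDirectory() as tmp:
    parts = []
    for chain in ['A', 'B']:
        path = os.path.join(tmp, f'part_{chain}.pdb')
        with open(path, 'w') as handle:
            handle.write(f'REMARK chain {chain}\nEND\n')
        parts.append(path)
    merged_path = os.path.join(tmp, 'merged.pdb')
    merge_pdb_files(parts, merged_path)
    with open(merged_path) as handle:
        merged = handle.read()

print(merged, end='')
```

Records such as `ENDMDL` are left untouched, because only lines consisting solely of `END` are skipped.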

Finally, we visualize the model and the template.

In [None]:
#Setting up py3Dmol for visualization
view=py3Dmol.view()
#Loading template
view.addModel(open(f'{template_pdb}.pdb', 'r').read(),'pdb')
view.addModel(open(f'bestmodel_aligned.pdb', 'r').read(),'pdb')
#Coloring the structures by model id
view.setStyle({'model':0}, {'cartoon': {'color':'purple'}})
view.setStyle({'model':1}, {'cartoon': {'color':'yellow'}})
view.zoomTo()
view.setBackgroundColor('white')
view.show()

Be sure to download `bestmodel_aligned.pdb`. We will use it in future computer labs.

**This is the end of the fourth tutorial! Good Science!!**