# Structural alignment of proteins by sequence

In bioinformatics, it is common to run a BLAST search to find
proteins similar to the one you are interested in. If structures are available for some of the hits,
you may want to superimpose them to see how they differ.
These structures may not have the same number of residues or the same residue identities,
and even the atom labeling can differ, which makes a typical alignment function fail.
HTMD provides the function ```htmd.molecule.util.sequenceStructureAlignment```, which takes two proteins and
aligns their structures using their longest **sequence** alignment.

In this example, we will use the dopamine receptor (PDB code: '3PBL') and the beta adrenergic receptor (PDB code: '3NYA').
Both are GPCRs, so they share a large fraction of their sequence. We will use this similarity to align their structures.
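
To make the idea concrete, here is a minimal sketch of how a shared stretch of sequence gives a residue-to-residue pairing even when two chains differ in length. It uses only Python's standard `difflib` module and two made-up one-letter sequences; it is a toy stand-in for a real pairwise sequence alignment and is not how HTMD computes its alignment internally. The helper name `longest_shared_stretch` is hypothetical.

```python
from difflib import SequenceMatcher


def longest_shared_stretch(seq_a, seq_b):
    """Return (start_a, start_b, length) of the longest identical contiguous block."""
    matcher = SequenceMatcher(None, seq_a, seq_b, autojunk=False)
    match = matcher.find_longest_match(0, len(seq_a), 0, len(seq_b))
    return match.a, match.b, match.size


# Hypothetical sequences of different lengths (not taken from 3PBL or 3NYA)
seq_a = "MKTLLVAGWYRDPS"
seq_b = "GTLLVAGWHRD"

start_a, start_b, length = longest_shared_stretch(seq_a, seq_b)

# Residues start_a..start_a+length-1 of A pair with start_b..start_b+length-1 of B
print(seq_a[start_a:start_a + length])  # TLLVAGW
print(start_a, start_b, length)         # 2 1 7
```

Once such a pairing is known, the matched residues can be superimposed regardless of how the rest of each chain is numbered or labeled.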

## Quick Example

The adrenaline (beta adrenergic) receptor will be used as the reference.

In [1]:
from htmd.ui import *
from htmd.molecule.util import sequenceStructureAlignment
config(viewer='ngl')

#We will use adrenaline receptor as the reference/template

#Load dopamine receptor
dop_receptor = Molecule('3PBL.pdb')
dop_receptor.filter('protein and chain A',_logger=False) #the crystal is a dimer, discard one of the units
dop_receptor.set('segid','DOP',sel='all') #identify the protein with a segid to visualize it 

#Load adrenaline receptor
adr_receptor = Molecule('3NYA.pdb')
adr_receptor.filter('protein')
adr_receptor.filter('resid 0 to 342',_logger=False) # keep only the receptor residues, dropping the fused T4 lysozyme
adr_receptor.set('segid','ADR',sel='all') #identify the protein with a segid

#adr_receptor acts as the template
dop_receptor_results = sequenceStructureAlignment(dop_receptor,adr_receptor) 

aligned_proteins = dop_receptor_results[0].copy() #pick the best result

#see the result, dopamine receptor is displayed in blue
aligned_proteins.append(adr_receptor)
aligned_proteins.view(style='NewCartoon',sel='segid ADR',hold=True)
aligned_proteins.view(style='NewCartoon',sel='segid DOP',color='blue')

Please cite HTMD: Doerr et al.(2016)JCTC,12,1845. 
https://dx.doi.org/10.1021/acs.jctc.6b00049
Documentation: http://software.acellera.com/

2018-07-25 12:33:00,336 - htmd.molecule.molecule - INFO - Removed 81 atoms. 3527 atoms remaining in the molecule.
2018-07-25 12:33:01,053 - htmd.molecule.util - INFO - Alignment #0 was done on 7 residues: mol segid DOP resid 41-47
2018-07-25 12:33:01,081 - htmd.molecule.util - INFO - Alignment #1 was done on 7 residues: mol segid DOP resid 41-47
2018-07-25 12:33:01,107 - htmd.molecule.util - INFO - Alignment #2 was done on 7 residues: mol segid DOP resid 41-47
2018-07-25 12:33:01,133 - htmd.molecule.util - INFO - Alignment #3 was done on 7 residues: mol segid DOP resid 41-47
2018-07-25 12:33:01,159 - htmd.molecule.util - INFO - Alignment #4 was done on 7 residues: mol segid DOP resid 41-47
2018-07-25 12:33:01,185 - htmd.molecule.util - INFO - Alignment #5 was done on 7 residues: mol segid DOP resid 41-47
2018-07-25 12:33:01,212 - htmd.molecule.util - INFO - Alignment #6 was done on 7 residues: mol segid DOP resid 41-47
2018-07-25 12:33:01,238 - htmd.molecule.util - INFO - Alignment #7 

## Detailed Explanation

First, we load the dopamine receptor. The crystal is a dimer of two receptors, so we discard one of the units.

In [2]:
from htmd.ui import *
from htmd.molecule.util import sequenceStructureAlignment
config(viewer='ngl')

dop_receptor = Molecule('3PBL.pdb')
dop_receptor.filter('protein and chain A') #discard one of the units
dop_receptor.set('segid','DOP',sel='all') #identify the protein with a segid

2018-07-25 12:33:52,934 - htmd.molecule.molecule - INFO - Removed 3398 atoms. 3389 atoms remaining in the molecule.


Next, we load the beta adrenergic receptor. The receptor was crystallized as a fusion with T4 lysozyme; keeping only residues 0 to 342
discards that fused domain, which ensures that this region is not used to align the two proteins.

In [3]:
adr_receptor = Molecule('3NYA.pdb')
adr_receptor.filter('protein')
adr_receptor.filter('resid 0 to 342') # keep only the receptor residues, dropping the fused T4 lysozyme
adr_receptor.set('segid','ADR',sel='all') #identify the protein with a segid

2018-07-25 12:33:58,683 - htmd.molecule.molecule - INFO - Removed 81 atoms. 3527 atoms remaining in the molecule.
2018-07-25 12:33:58,730 - htmd.molecule.molecule - INFO - Removed 1275 atoms. 2252 atoms remaining in the molecule.


Let's see the two proteins together before the alignment. The dopamine receptor is displayed in blue.

In [4]:
both_proteins = adr_receptor.copy()
both_proteins.append(dop_receptor)
both_proteins.view(sel='segid ADR',style='NewCartoon',hold=True)
both_proteins.view(sel='segid DOP',style='NewCartoon',color='blue')

Now both proteins are ready to be aligned, so we call the ```sequenceStructureAlignment``` function. The second molecule passed to the function acts as the reference; in our case, this is the adrenaline receptor.

In [5]:
dop_receptor_results = sequenceStructureAlignment(dop_receptor,adr_receptor)

2018-07-25 12:34:43,585 - htmd.molecule.util - INFO - Alignment #0 was done on 7 residues: mol segid DOP resid 41-47
2018-07-25 12:34:43,611 - htmd.molecule.util - INFO - Alignment #1 was done on 7 residues: mol segid DOP resid 41-47
2018-07-25 12:34:43,638 - htmd.molecule.util - INFO - Alignment #2 was done on 7 residues: mol segid DOP resid 41-47
2018-07-25 12:34:43,664 - htmd.molecule.util - INFO - Alignment #3 was done on 7 residues: mol segid DOP resid 41-47
2018-07-25 12:34:43,690 - htmd.molecule.util - INFO - Alignment #4 was done on 7 residues: mol segid DOP resid 41-47
2018-07-25 12:34:43,717 - htmd.molecule.util - INFO - Alignment #5 was done on 7 residues: mol segid DOP resid 41-47
2018-07-25 12:34:43,743 - htmd.molecule.util - INFO - Alignment #6 was done on 7 residues: mol segid DOP resid 41-47
2018-07-25 12:34:43,770 - htmd.molecule.util - INFO - Alignment #7 was done on 7 residues: mol segid DOP resid 41-47
2018-07-25 12:34:43,796 - htmd.molecule.util - INFO - Alignment 

```dop_receptor_results``` stores several structural alignments, each based on a different portion of the sequence alignment.
By default, only the best 10 alignments are stored, but you can change this by setting the ```maxalignments``` parameter to a different number.
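
The idea of "different portions of the sequence alignment" can be sketched in plain Python. The snippet below takes a gapped pairwise alignment (two equal-length strings with `-` for gaps), extracts every gap-free run, sorts the runs from longest to shortest, and keeps the first few. This is an illustrative assumption about how such portions could be ranked, not HTMD's actual implementation; the function `aligned_segments`, its `max_alignments` argument, and the example sequences are all hypothetical.

```python
def aligned_segments(aligned_mol, aligned_ref, max_alignments=10):
    """Find gap-free runs in a gapped pairwise alignment, longest first.

    Returns (mol_start, ref_start, length) tuples whose start values are
    0-based indices into the ungapped sequences. Any column without a gap
    counts as aligned, even if the two residues differ.
    """
    if len(aligned_mol) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")

    segments = []
    mol_idx = ref_idx = 0   # positions in the ungapped sequences
    run_start = None        # (mol_idx, ref_idx) where the current run began
    run_len = 0

    for m, r in zip(aligned_mol, aligned_ref):
        if m != "-" and r != "-":
            if run_len == 0:
                run_start = (mol_idx, ref_idx)
            run_len += 1
        else:
            if run_len:
                segments.append((run_start[0], run_start[1], run_len))
            run_len = 0
        if m != "-":
            mol_idx += 1
        if r != "-":
            ref_idx += 1

    if run_len:  # close a run that reaches the end of the alignment
        segments.append((run_start[0], run_start[1], run_len))

    segments.sort(key=lambda seg: seg[2], reverse=True)
    return segments[:max_alignments]


# Hypothetical gapped alignment (not taken from 3PBL or 3NYA)
aligned_mol = "MKT--LLVAGWY-RDPS"
aligned_ref = "-KTAALLVAGWYQRD--"

print(aligned_segments(aligned_mol, aligned_ref))
# [(3, 4, 7), (1, 0, 2), (10, 12, 2)]
print(aligned_segments(aligned_mol, aligned_ref, max_alignments=1))
# [(3, 4, 7)]
```

Each tuple identifies a stretch of paired residues that could serve as the basis for one structural superposition, which is why a longer stretch is usually the more trustworthy choice.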

Let's look at the best result by choosing the first item in ```dop_receptor_results```. The dopamine receptor is shown in blue.

In [6]:
aligned_proteins = dop_receptor_results[0].copy() #pick the best result 
aligned_proteins.append(adr_receptor)
aligned_proteins.view(style='NewCartoon',sel='segid ADR',hold=True)
aligned_proteins.view(style='NewCartoon',sel='segid DOP',color='blue')
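
Besides inspecting the result visually, one common way to compare candidate alignments is the root-mean-square deviation (RMSD) between matched atoms after superposition. The sketch below computes it for two equally long lists of (x, y, z) tuples. The coordinates are made up for illustration, and extracting matched C-alpha coordinates from the two molecules is left out; the helper name `rmsd` is hypothetical and is not an HTMD function.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical matched atom positions, in angstroms
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x, y, z + 2.0) for x, y, z in reference]  # same shape, moved 2 A along z

print(rmsd(reference, reference))  # 0.0
print(rmsd(reference, shifted))    # 2.0
```

A lower value over the same set of matched atoms indicates a tighter superposition, which gives a number to set beside the visual comparison.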
