## PV - the javascript PDB viewer

https://biasmv.github.io/pv/

- Here is how you load PV within a IPython notebook
- Steps
    - Use Biopython to download the file
    - Get the ligand names from the file for display in the structure
    - Load the structure using PV

In [1]:
# load Biopython PDB packages

# PDBList to download PDBs
from Bio.PDB.PDBList import PDBList
pdbl = PDBList()

# PDBParser to load and work with files
from Bio.PDB.PDBParser import PDBParser
parser = PDBParser()

import urllib2
import uuid

In [41]:
# download pdb
pdb_file_path = pdbl.retrieve_pdb_file('1A8O')

Downloading PDB structure '1A8O'...


In [42]:
# open the downloaded file
# COMT is just the name of this structure - can be arbitrary
structure = parser.get_structure('someprotein', pdb_file_path)

In [43]:
# get the ligands within this file for display
# from: http://stackoverflow.com/questions/25718201/remove-heteroatoms-from-pdb
ligands = []

for residue in structure.get_residues():
    tags = residue.get_full_id()
    # tags contains a tuple with (Structure ID, Model ID, Chain ID, (Residue ID))
    # Residue ID is a tuple with (*Hetero Field*, Residue ID, Insertion Code)

    # Thus you're interested in the Hetero Field, that is empty if the residue
    # is not a hetero atom or have some flag if it is (W for waters, H, etc.)
    if tags[3][0] != " " and tags[3][0] != "W":
        ligands.append(tags[3][0].split('_')[1].strip())
    else:
        continue
        
print(ligands)

['MSE', 'MSE', 'MSE', 'MSE']


In [44]:
class PDBViewer(object):
    '''
    Contributed by: Ali Ebrahim
    '''
    
    def __init__(self, f):
        self.pdb = open(f).read()

    def _repr_html_(self):
        div_id = str(uuid.uuid4())
        
        return """<div id="%s" style="width: 800px; height: 600px"><div>
        <!--script src="//biasmv.github.io/pv/js/pv.min.js"></script-->
        <script>
        require.config({paths: {"pv": "//biasmv.github.io/pv/js/pv.min"}});
        require(["pv"], function (pv) {
            pdb = "%s";
            structure = pv.io.pdb(pdb);
            viewer = pv.Viewer(document.getElementById('%s'),
                               {quality : 'medium', width: 'auto', height : 'auto',
                                antialias : true, outline : true});
            viewer.fitParent();
            var ligand = structure.select({rnames : %s});
            viewer.ballsAndSticks('ligand', ligand);
            viewer.cartoon('molecule', structure);
            viewer.centerOn(structure);
            
        });
        </script>
        """ % (div_id, self.pdb.replace("\n", "\\n"), div_id, ligands)

In [45]:
PDBViewer(pdb_file_path)

### Getting the PDB structure sequence

#### Method 1: use built-in Polypeptide builder function

The way this one works is it tries to build the structure based on atom-atom distances in the file. It just returns the sequence as a string.

Pros:
- easy to use

Cons:
- missing parts of the structure are not reflected within the sequence

In [48]:
from Bio.PDB.Polypeptide import PPBuilder
ppb=PPBuilder()
for pp in ppb.build_peptides(structure, aa_only=False):
    print('polypeptide:')
    print(pp.get_sequence())

polypeptide:
MDIRQGPKEPFRDYVDRFYKTLRAEQASQEVKNWMTETLLVQNANPDCKTILKALGPGATLEEMMTACQG


#### Problem:

The 3D structure may not have the whole sequence resolved - the method above does not really reflect that case. For example:

If we have a sequence:
  
    MGASGDLAKKKIYPTIWWLFRDGLLPENTFIV

and the structure is missing residues 3-10, the method above splits them into two "polypeptides":

    MG
    KIYPTIWWLFRDGLLPENTFIV
    
This makes alignments confusing later on.

#### Solution:

The next piece of code will return this instead, with X's in the "missing" residues:

    MGXXXXXXXXKIYPTIWWLFRDGLLPENTFIV
    
Also, it builds it by just iterating over the residues in the structure and returns a dictionary of the sequences with the chain IDs as keys.

In [49]:
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Alphabet import IUPAC
from Bio.PDB import Polypeptide

In [63]:
def accept(residue, standard_aa_only): 
    """Check if the residue is an amino acid (PRIVATE).""" 
    if Polypeptide.is_aa(residue, standard=standard_aa_only): 
        return True 
    elif not standard_aa_only and "CA" in residue.child_dict: 
        # It has an alpha carbon... 
        # We probably need to update the hard coded list of 
        # non-standard residues, see function is_aa for details. 
        warnings.warn("Assuming residue %s is an unknown modified " 
                      "amino acid" % residue.get_resname()) 
        return True 
    else: 
        # not a standard AA so skip 
        return False 

In [66]:
def get_pdb_seq(structure):
    '''
    Takes in a Biopython structure object and returns a list of the structure's sequences
    :param structure: Biopython structure object
    :return: Dictionary of sequence strings with chain IDs as the key
    '''
    
    structure_seqs = {}
    
    # loop over each chain of the PDB
    for chain in structure[0]:
        
        chain_it = iter(chain) 
        
        chain_seq = ''
        first = True
        tracker = 0
        
        # loop over the residues
        for res in chain.get_residues():
            
            # double check if the residue name is a standard residue - if not
            if Polypeptide.is_aa(res, standard=False):
                full_id = res.get_full_id()
                end_tracker = full_id[3][1]
                i_code = full_id[3][2]
                aa = Polypeptide.three_to_one(res.get_resname())
                if end_tracker != (tracker + 1) and first == False:
                    if i_code != ' ':
                        chain_seq += aa
                        tracker = end_tracker + 1
                        continue

                    else:
                        chain_seq += 'X'*(end_tracker - tracker - 1)
                        
                chain_seq += aa
                first = False
                tracker = end_tracker
            else:
                print res

        structure_seqs[chain.get_id()] = chain_seq

    return structure_seqs

In [67]:
 get_pdb_seq(structure)

NameError: global name 'is_connected' is not defined