# Argininosuccinate Synthetase (ASS1 or ARGSS) Case Study

Last edited by: Anna, to add Gene ID--> Accession number

Key Information:
    * UniProt ID: P00966
    * Entrez ID: 445
    * Perturbation: Low levels of ARGSS expression observed (Bowles, Int J Cancer 2008)
    

   

## BioEntrez

In [1]:
from Bio import Entrez
Entrez.email="asleung@ucsd.edu"
gene_entrez_id=raw_input('Enter Gene Entrez ID:')
handle = Entrez.efetch(db="gene", id=gene_entrez_id ,rettype="fasta", retmode="xml")
records=Entrez.read(handle)
#records[0].keys()
gene_entrez_name = records[0]["Entrezgene_gene"]["Gene-ref"]["Gene-ref_locus"]
gene_entrez_location = records[0]["Entrezgene_gene"]["Gene-ref"]["Gene-ref_maploc"]
gene_entrez_syn = records[0]["Entrezgene_gene"]["Gene-ref"]["Gene-ref_syn"]
gene_entrez_summary=records[0]["Entrezgene_summary"]
print "\nName: %s\n\nLocation: %s\n\nSynonyms: %s \n\nSummary: %s" %(gene_entrez_name, gene_entrez_location, gene_entrez_syn, gene_entrez_summary)

Enter Gene Entrez ID:445

Name: ASS1

Location: 9q34.1

Synonyms: ['ASS', 'CTLN1'] 

Summary: The protein encoded by this gene catalyzes the penultimate step of the arginine biosynthetic pathway. There are approximately 10 to 14 copies of this gene including the pseudogenes scattered across the human genome, among which the one located on chromosome 9 appears to be the only functional gene for argininosuccinate synthetase. Mutations in the chromosome 9 copy of this gene cause citrullinemia. Two transcript variants encoding the same protein have been found for this gene. [provided by RefSeq, Aug 2012]
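
The chained dictionary lookups in the cell above raise a `KeyError` as soon as one field is absent. As an assumption (not verified here), some gene records may lack optional fields such as `Gene-ref_syn`. A minimal sketch of a defensive lookup follows; `get_nested` is a hypothetical helper and `example_record` is a hand-written stand-in for `records[0]`, not real Entrez output.

```python
def get_nested(record, keys, default=None):
    """Walk a chain of keys, returning `default` if any of them is missing."""
    current = record
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Hypothetical stand-in for records[0]; only two fields are filled in.
example_record = {
    "Entrezgene_gene": {
        "Gene-ref": {"Gene-ref_locus": "ASS1", "Gene-ref_maploc": "9q34.1"}
    }
}

gene_ref = ["Entrezgene_gene", "Gene-ref"]
name = get_nested(example_record, gene_ref + ["Gene-ref_locus"])
# The synonym field is missing in this stand-in, so the default is returned.
synonyms = get_nested(example_record, gene_ref + ["Gene-ref_syn"], default=[])
```

The same call pattern would work on the real `records[0]` object, since it is also accessed by key.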


#### Entrez Gene ID to Accession Numbers

In [2]:
# pip install mygene         <-- on terminal
import mygene

In [3]:
mg = mygene.MyGeneInfo()

In [5]:
mg.getgene(gene_entrez_id, 'name,symbol,refseq')

{u'_id': u'445',
 u'name': u'argininosuccinate synthase 1',
 u'refseq': {u'genomic': [u'NC_000009',
   u'NC_018920',
   u'NG_011542',
   u'NT_008470',
   u'NW_004929367'],
  u'protein': [u'NP_000041', u'NP_446464', u'XP_005272257', u'XP_011517007'],
  u'rna': [u'NM_000050', u'NM_054012', u'XM_005272200', u'XM_011518705']},
 u'symbol': u'ASS1'}
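
The `protein` list mixes two kinds of RefSeq accessions: `NP_` records are curated, while `XP_` records are computationally predicted models. A minimal sketch of separating them is shown here. `split_refseq_proteins` is a hypothetical helper, not part of mygene, and the input is copied from the output above.

```python
def split_refseq_proteins(refseq):
    """Separate curated (NP_) from predicted (XP_) RefSeq protein accessions."""
    proteins = refseq.get('protein', [])
    curated = [acc for acc in proteins if acc.startswith('NP_')]
    predicted = [acc for acc in proteins if acc.startswith('XP_')]
    return curated, predicted


# Values copied from the mygene output above.
refseq = {'protein': ['NP_000041', 'NP_446464', 'XP_005272257', 'XP_011517007']}
curated, predicted = split_refseq_proteins(refseq)
```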

In [6]:
import pandas as pd
GP = pd.read_csv('DF_GEMPRO.csv', index_col=0)
# forcing gene IDs to be read as strings
GP['m_gene_original'] = GP['m_gene_original'].astype(str)
GP['m_gene_entrez'] = GP['m_gene_entrez'].astype(str)
GP['m_gene_isoform'] = GP['m_gene_isoform'].astype(str)
GEM_PRO_available_refseq = GP[GP.m_gene_entrez == gene_entrez_id]['u_refseq']
print "These are the refseq IDs compatible with our workflow"
pd.DataFrame(GEM_PRO_available_refseq)

These are the refseq IDs compatible with our workflow


| | u_refseq |
|---|---|
| 3049 | NP_446464 |
| 3050 | |


In [7]:
acc=raw_input('Enter the accession number of the sequence you would like to download:')

Enter the accession number of the sequence you would like to download:NP_446464


In [8]:
from Bio import SeqIO
from Bio import Entrez
Entrez.email='asleung@ucsd.edu'
# NP_ accessions are protein records, so query the protein database
temp = Entrez.efetch(db="protein", rettype="gb", retmode="text", id=acc)
out = open(acc+".fasta",'w')
gbseq = SeqIO.read(temp, "genbank")
SeqIO.write(gbseq,out,"fasta")
temp.close()
out.close()
print(gbseq)

ID: NP_446464.1
Name: NP_446464
Description: argininosuccinate synthase [Homo sapiens].
Number of features: 10
/comment=REVIEWED REFSEQ: This record has been curated by NCBI staff. The
reference sequence was derived from DB496935.1 and BC013224.1.
Summary: The protein encoded by this gene catalyzes the penultimate
step of the arginine biosynthetic pathway. There are approximately
10 to 14 copies of this gene including the pseudogenes scattered
across the human genome, among which the one located on chromosome
9 appears to be the only functional gene for argininosuccinate
synthetase. Mutations in the chromosome 9 copy of this gene cause
citrullinemia. Two transcript variants encoding the same protein
have been found for this gene. [provided by RefSeq, Aug 2012].
Transcript Variant: This variant (2) lacks an exon in the 5' UTR,
compared to variant 1. Variants 1 and 2 encode the same protein.
Publication Note:  This RefSeq record includes a subset of the
publications that are available fo

## Is there an associated Mutation to be studied?
#### User provides own mutation. For instance, we can analyze a mutation obtained from COSMIC:

Mutation S6F http://cancer.sanger.ac.uk/cosmic/gene/samples?coords=AA%3AAA&src=gene&end=413&mut=substitution_missense&ln=ASS1&all_data=&id=60232&seqlen=413&start=1

Two mutation assessors can be used to assess the effect of the mutation:
    * PROVEAN http://provean.jcvi.org/seq_submit.php
    * SIFT http://sift.bii.a-star.edu.sg/www/SIFT_seq_submit2.html

In [10]:
assessor_choice = raw_input("Which mutation assessor would you like to use (answer with PROVEAN or SIFT)?   ")
if assessor_choice.upper() == 'SIFT':
    mut_loc_SAA = raw_input("Which amino acid position is it located in? ")
    mut_init = raw_input("Which amino acid was it initially? ")
    mut_out = raw_input("Which amino acid was it changed to? ")
    answer = "this is your input for Step 2: %s%s%s" %(mut_init, mut_loc_SAA, mut_out)
elif assessor_choice.upper() == 'PROVEAN':
    mut_type = raw_input("What type of mutation is it (Single Amino Acid [SAA], Del, In, or Indel)?   ")
    if mut_type.upper() == 'SAA':
        mut_loc_SAA = raw_input("Which amino acid position is it located in? ")
        mut_init = raw_input("Which amino acid was it initially? ")
        mut_out = raw_input("Which amino acid was it changed to? ")
        answer = "this is your input for Step 2: %s%s%s" %(mut_init, mut_loc_SAA, mut_out)
    elif mut_type.upper() == 'DEL':
        mut_loc = raw_input("Which amino acid position was deleted? ")
        mut_init = raw_input("Which amino acid was it initially? ")
        answer = "this is your input for Step 2: %s%sdel" %(mut_init, mut_loc)
    elif mut_type.upper() == 'IN':
        mut_insert = raw_input("Which amino acid(s) does the insertion consist of? ")
        mut_aa1 = raw_input("Which amino acid immediately precedes the insertion? ")
        mut_aa1_pos = raw_input("What is the position of that preceding amino acid? ")
        mut_aa2 = raw_input("Which amino acid immediately follows the insertion? ")
        mut_aa2_pos = raw_input("What is the position of that following amino acid? ")
        answer = "this is your input for Step 2: %s%s_%s%sins%s" %(mut_aa1, mut_aa1_pos, mut_aa2, mut_aa2_pos, mut_insert)
    elif mut_type.upper() == 'INDEL':
        mut_num_t = raw_input("Is there only one deletion or a range of deletions (answer with 'one' or 'range')   ")
        if mut_num_t.upper() == 'ONE':
            mut_del1 = raw_input("What is the AA that is being deleted?   ")
            mut_del1_pos = raw_input("What position is that AA in?   ")
            mut_indel = raw_input("What AAs are being inserted (can insert more than one, ie MKSS)   ")
            answer = "this is your input for Step 2: %s%sdelins%s" %(mut_del1, mut_del1_pos, mut_indel)
        elif mut_num_t.upper() == 'RANGE':
            mut_del1 = raw_input("What is the first AA that is being deleted?   ")
            mut_del1_pos = raw_input("What position is that AA in?   ")
            mut_del2 = raw_input("What is the last AA that is being deleted?   ")
            mut_del2_pos = raw_input("What position is that AA in?   ")
            mut_indel = raw_input("What AAs are being inserted (can insert more than one, ie MKSS)   ")
            answer = "this is your input for Step 2: %s%s_%s%sdelins%s" %(mut_del1, mut_del1_pos, mut_del2, mut_del2_pos, mut_indel)
        else:
            print "tbd"	
    else:
        redo = raw_input("please input SAA, Del, In, or Indel")
else:
    print "Please redo"
print '\n\n\nFirst open the mutation assessor website of choice'
print "copy and paste the amino acid FASTA file"
print "Submit the following for step 2: %s" %(answer)

Which mutation assessor would you like to use (answer with PROVEAN or SIFT)?   SIFT
Which amino acid position is it located in? 6
Which amino acid was it initially? S
Which amino acid was it changed to? F



First open the mutation assessor website of choice
copy and paste the amino acid FASTA file
Submit the following for step 2: this is your input for Step 2: S6F
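
The string-building logic in the cell above can also be sketched as a plain function, which is easier to test than a chain of prompts. `format_variant` and its parameter names are hypothetical; the output formats are the same ones the prompts assemble.

```python
def format_variant(kind, first_aa, first_pos, last_aa=None, last_pos=None, new_aas=''):
    """Build a variant string such as S6F, S6del, K2_L3insQS or S6_G8delinsMK."""
    kind = kind.upper()
    if kind == 'SAA':
        # single amino acid substitution: original residue, position, new residue
        return '%s%s%s' % (first_aa, first_pos, new_aas)
    if kind == 'DEL':
        return '%s%sdel' % (first_aa, first_pos)
    if kind == 'IN':
        # first/last are the residues flanking the insertion
        return '%s%s_%s%sins%s' % (first_aa, first_pos, last_aa, last_pos, new_aas)
    if kind == 'INDEL':
        if last_aa is None:
            # a single deleted residue replaced by new_aas
            return '%s%sdelins%s' % (first_aa, first_pos, new_aas)
        return '%s%s_%s%sdelins%s' % (first_aa, first_pos, last_aa, last_pos, new_aas)
    raise ValueError('unknown mutation type: %r' % kind)


cosmic_example = format_variant('SAA', 'S', 6, new_aas='F')
```

Raising an error for an unknown type avoids the case where no variant string is defined at all.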


## Visualizing the protein and its mutation

*Note: Need to have DF_GEMPRO.csv in local directory*

In [14]:
GP = pd.read_csv('DF_GEMPRO.csv', index_col=0)
# forcing gene IDs to be read as strings
GP['m_gene_original'] = GP['m_gene_original'].astype(str)
GP['m_gene_entrez'] = GP['m_gene_entrez'].astype(str)
GP['m_gene_isoform'] = GP['m_gene_isoform'].astype(str)
# this searches for an ID and prints out which row it is in

GP[GP.u_refseq == acc]

| | m_gene_original | m_gene_entrez | m_gene_isoform | u_uniprot_acc | u_isoform_id | u_refseq | u_ensp | u_seq_len | u_seq | u_reviewed | ... | ssb_p_aln_coverage | ssb_p_percent_seq_ident | ssb_p_no_deletions_in_pdb | ssb_p_aln_coverage_sim | ssb_si_score | ssb_rez_score | ssb_raw_score | ssb_above_cutoffs | ssb_rank | ssb_best_file |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3049 | 445.1 | 445 | 1 | P00966 | P00966-1 | NP_446464 | ENSP00000253004 | 412 | MSSKGSVVLAYSGGLDTSCILVWLKEQGYDVIAYLANIGQKEDFEE... | True | ... | 402 | 0.975728 | True | 402 | 1.565458 | 1.066667 | 2.632125 | True | 1 | 2nz2.pdb |


In [None]:
gene_original_id = raw_input("What is the gene original ID?   ")

In [None]:
# This extracts all chains present
#print "These chains are present in the pdb structure: %s" %(chain_strings)
#pdb_chain_choose = raw_input("Which chain are you interested in?   ")
chains_avail = GP[GP.m_gene_original == gene_original_id].p_chains
chains_present = ""
for a in chains_avail:
    chains_present = a

# This automatically displays/chooses which chain to align as it is the "best"; a string of A,B,C is returned
best_pdb_chain = GP[GP.m_gene_original == gene_original_id].p_chain_uniprot_map.values[0][2]

In [None]:
# load Biopython PDB packages

# PDBList to download PDBs
from Bio.PDB.PDBList import PDBList
pdbl = PDBList()

# PDBParser to load and work with files
from Bio.PDB.PDBParser import PDBParser
parser = PDBParser()

import urllib2
import uuid

pdb_name = raw_input("What is the pdb ID?   ")

# download pdb
pdb_file_path = pdbl.retrieve_pdb_file(pdb_name)

In [None]:
# we can put a raw input to name the structure as well
pdb_structure = parser.get_structure('ARGSS', pdb_file_path)

In [None]:
# get the ligands within this file for display
# from: http://stackoverflow.com/questions/25718201/remove-heteroatoms-from-pdb
ligands = []

for residue in pdb_structure.get_residues():
    tags = residue.get_full_id()
    # tags contains a tuple with (Structure ID, Model ID, Chain ID, (Residue ID))
    # Residue ID is a tuple with (*Hetero Field*, Residue ID, Insertion Code)

    # Thus you're interested in the Hetero Field, that is empty if the residue
    # is not a hetero atom or have some flag if it is (W for waters, H, etc.)
    if tags[3][0] != " " and tags[3][0] != "W":
        ligands.append(tags[3][0].split('_')[1].strip())
    else:
        continue
        
print(ligands)

In [None]:
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Alphabet import IUPAC
from Bio.PDB import Polypeptide

In [None]:
def get_pdb_seq(structure):
    '''
    Takes in a Biopython structure object and returns a dictionary of the structure's sequences
    :param structure: Biopython structure object
    :return: Dictionary of sequence strings with chain IDs as the key
    '''
    
    structure_seqs = {}
    
    # loop over each chain of the PDB
    for chain in structure[0]:
        
        chain_it = iter(chain) 
        
        chain_seq = ''
        tracker = 0
        
        # loop over the residues
        for res in chain.get_residues():
            # NOTE: you can get the residue number too
            res_num = res.id[1]
            
            # double check if the residue name is a standard residue
            # if it is not a standard residue (ie. selenomethionine),
            # it will be filled in with an X on the next iteration)
            if Polypeptide.is_aa(res, standard=True):
                full_id = res.get_full_id()
                end_tracker = full_id[3][1]
                i_code = full_id[3][2]
                aa = Polypeptide.three_to_one(res.get_resname())
                
                # tracker to fill in X's
                if end_tracker != (tracker + 1):# and first == False:
                    if i_code != ' ':
                        chain_seq += aa
                        tracker = end_tracker + 1
                        continue
                    else:
                        chain_seq += 'X'*(end_tracker - tracker - 1)
                        
                chain_seq += aa
                tracker = end_tracker
                
            else:
                continue

        structure_seqs[chain.get_id()] = chain_seq

    return structure_seqs
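
The gap-filling idea in `get_pdb_seq` can be sketched without Biopython. The sketch takes a list of `(residue_number, one_letter_code)` pairs and pads every jump in numbering with `X`, counting from 0 as the function above does, so missing leading residues are padded too. `fill_gaps` is a hypothetical name, the sketch ignores insertion codes, and it assumes the residues are sorted by number.

```python
def fill_gaps(residues):
    """Join one-letter codes, inserting 'X' for every residue number skipped.

    residues: list of (residue_number, one_letter_code), sorted by number.
    """
    pieces = []
    tracker = 0  # number of the last residue seen; 0 means "none yet"
    for number, aa in residues:
        if number > tracker + 1:
            # residues tracker+1 .. number-1 are missing from the structure
            pieces.append('X' * (number - tracker - 1))
        pieces.append(aa)
        tracker = number
    return ''.join(pieces)


# Toy input: residues 3 and 4 are unresolved.
toy_sequence = fill_gaps([(1, 'M'), (2, 'S'), (5, 'K'), (6, 'G')])
```

Because of the padding, the index of a letter in the returned string (counting from 1) equals its residue number, which is what makes the later alignment step line up.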

In [None]:
# represented in a single string
pdb_sequence = get_pdb_seq(pdb_structure)
string_pdb_seq = pdb_sequence[best_pdb_chain]

In [None]:
# outputs a fasta file format
# no space after '>': FASTA readers take the text right after it as the ID
faa_out1 = '>'
faa_out2 = '%s pdb sequence fasta' %(pdb_name)
faa_out3 = '\n%s' %(string_pdb_seq)
faa_out = faa_out1 + faa_out2 + faa_out3
print faa_out

In [None]:
file = open(faa_out2 + '.faa', "w")
# note the fasta file name is named faa_out2
file.write(faa_out)
file.close()
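
Building the FASTA text by hand, as in the two cells above, can be sketched as a small function that also wraps long sequences. `format_fasta` and the 60-character default width are assumptions for illustration, not something the workflow requires.

```python
def format_fasta(identifier, sequence, description='', width=60):
    """Return a FASTA-formatted string with the sequence wrapped at `width`."""
    header = '>' + identifier
    if description:
        header += ' ' + description
    # slice the sequence into chunks of at most `width` characters
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return '\n'.join([header] + lines) + '\n'


# Toy sequence; the real one comes from the PDB chain.
toy_fasta = format_fasta('2nz2', 'MSSKGSVVLAYSGG', 'pdb sequence fasta')
```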

In [None]:
import os.path
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Alphabet import IUPAC

def write_fasta_file(sequence, fileout):
    '''
    This writes a fasta file for a SeqRecord object. If the file already exists, it is not rewritten. Returns the filename.
    
    Input: sequence - Biopython SeqRecord object, fileout - name of the fasta file to write.
    Output: Filename of fasta file
    '''
    
    outfile = "%s" % fileout
    if os.path.isfile(outfile):
        print 'FASTA file already exists %s' % outfile
        return outfile
    else:
        SeqIO.write(sequence, outfile, "fasta")
        return outfile

In [None]:
# example: for gene argss

# getting the IDs and making output file name 
seq_id = GP[GP.m_gene_original == gene_original_id].u_isoform_id.values[0]
# the /tmp/ in '/tmp/' + seq_id + '.faa' puts it in a temporary folder; I will remove the temp saving for now
seq_output = seq_id + '.faa'

# getting the sequence and making it into a Biopython SeqRecord object
seq = GP[GP.m_gene_original == gene_original_id].u_seq.values[0]
seq_biop = SeqRecord(Seq(seq, IUPAC.protein),id=seq_id,description='uniprot sequence')

# writing the SeqRecord object (formats it as FASTA file)
out_file = write_fasta_file(seq_biop, seq_output)

In [None]:
import os.path
from Bio.Emboss.Applications import NeedleCommandline

def run_alignment(fasta1_id, fasta1, fasta2_id, fasta2):
    '''
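    Runs the needle alignment program and writes the result to a file. Returns the filename. Gap open (10) and gap extend (0.5) penalties are fixed.
    
    Input:  fasta1_id - ID of the "reference" sequence (used in the output file name)
            fasta1 - fasta file name ("reference" sequence)
            fasta2_id - ID of the sequence being aligned (used in the output file name)
            fasta2 - fasta file name (what you're interested in aligning)
    Output: alignment_file - file name of alignment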
    '''

    alignment_file = "%s_%s_align.txt" % (fasta1_id, fasta2_id)
    
    if os.path.isfile(alignment_file):
        print 'Alignment %s file already exists' % alignment_file
        return alignment_file

    else:
        print '**RUNNING ALIGNMENT FOR %s AND %s**' % (fasta1_id, fasta2_id)
        needle_cline = NeedleCommandline(asequence=fasta1, bsequence=fasta2, gapopen=10, gapextend=0.5, outfile=alignment_file)
        stdout, stderr = needle_cline()
        return alignment_file

In [None]:
SEQUENCE_FILES = '/Users/LAURENCE/Desktop/Senior Design/Untitled Folder'
UNIPROT_FILES = '/Users/LAURENCE/Desktop/Senior Design/Untitled Folder'
PDB_SEQ_FILES = '/Users/LAURENCE/Desktop/Senior Design/Untitled Folder'

# 1. get the uniprot sequence file
seq_id = GP[GP.m_gene_original == gene_original_id].u_isoform_id.values[0]
seq_fasta = os.path.join(UNIPROT_FILES, seq_id + '.faa')

if os.path.exists(seq_fasta):
    print('found uniprot fasta file {}'.format(seq_fasta))
    
# 2. get the pdb sequence file
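# splitext removes only the extension; str.strip('.pdb') would also eat
# leading/trailing 'p', 'd', 'b' and '.' characters of the ID itself
pdb_id = os.path.splitext(GP[GP.m_gene_original == gene_original_id].ssb_best_file.values[0])[0]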
pdb_fasta = os.path.join(PDB_SEQ_FILES, faa_out2 + '.faa')

if os.path.exists(pdb_fasta):
    print('found pdb fasta file {}'.format(pdb_fasta))
    
# 3. run the alignment using the function above
os.chdir('/tmp/')
alignment_filename = run_alignment(seq_id, seq_fasta, pdb_id, pdb_fasta)

In [None]:
!cat $alignment_filename

In [None]:
import numpy as np
from Bio import AlignIO
from collections import defaultdict

def get_alignment_allpos_df(alignment_file, a_seq_id=None, b_seq_id=None):
    '''
    Takes in a needle alignment file and returns a pandas dataframe of the results
    Input: alignment_file - the path to the alignment file, 
            a_seq_id - optional ID of the reference sequence, 
            b_seq_id - optional ID of the second sequence
    Output: alignment_df - a pandas dataframe of the alignment results
    '''
    alignments = list(AlignIO.parse(alignment_file, "emboss"))

    appender = defaultdict(dict)
    idx = 0
    for alignment in alignments:
    #         if not switch:
        if not a_seq_id:
            a_seq_id = list(alignment)[0].id
        a_seq = str(list(alignment)[0].seq)
        if not b_seq_id:
            b_seq_id = list(alignment)[1].id
        b_seq = str(list(alignment)[1].seq)

        a_idx = 1
        b_idx = 1

        for i, (a,b) in enumerate(zip(a_seq,b_seq)):
            # the branches are mutually exclusive; 'X' is checked before the
            # generic mismatch so unresolved residues are not flagged as mutations
            if a == '-':
                aa_flag = 'insertion'
            elif b == '-':
                aa_flag = 'deletion'
            elif a == b:
                aa_flag = 'match'
            elif a == 'X' or b == 'X':
                aa_flag = 'unresolved'
            else:
                aa_flag = 'mutation'
                
            appender[idx]['Uniprot_ID'] = a_seq_id
            appender[idx]['Structure'] = b_seq_id
            appender[idx]['type'] = aa_flag
            
            if aa_flag == 'match' or aa_flag == 'unresolved' or aa_flag == 'mutation':
                appender[idx]['Uniprot_sequence'] = a
                appender[idx]['Uniprot_sequence_position'] = a_idx
                appender[idx]['PDB_sequence'] = b
                appender[idx]['PDB_sequence_position'] = b_idx
                a_idx += 1
                b_idx += 1

            if aa_flag == 'deletion':
                appender[idx]['Uniprot_sequence'] = a
                appender[idx]['Uniprot_sequence_position'] = a_idx
                a_idx += 1

            if aa_flag == 'insertion':
                appender[idx]['PDB_sequence'] = b
                appender[idx]['PDB_sequence_position'] = b_idx
                b_idx += 1
            
            idx += 1

    alignment_df = pd.DataFrame.from_dict(appender, orient='index')
    alignment_df = alignment_df[['Uniprot_ID', 'Structure', 'type', 'Uniprot_sequence', 'Uniprot_sequence_position', 'PDB_sequence', 'PDB_sequence_position']].fillna(value=np.nan)
    
    return alignment_df

In [None]:
structure = parser.get_structure('someprotein', pdb_file_path)

In [None]:
def get_pdb_seq2(structure):
    '''
    Takes in a Biopython structure object and returns a dictionary of (residue, residue number) lists
    :param structure: Biopython structure object
    :return: Dictionary of sequence strings with chain IDs as the key
    '''
    
    structure_seqs = {}
    
    # loop over each chain of the PDB
    for chain in structure[0]:
        
        chain_it = iter(chain) 
        
        chain_seq = []
        tracker = 0
        
        # loop over the residues
        for res in chain.get_residues():
            # NOTE: you can get the residue number too
            res_num = res.id[1]
            
            # double check if the residue name is a standard residue
            # if it is not a standard residue (ie. selenomethionine),
            # it will be filled in with an X on the next iteration)
            # TODO: except when it's at the beginning or end...
            if Polypeptide.is_aa(res, standard=True):
                full_id = res.get_full_id()
                end_tracker = full_id[3][1]
                i_code = full_id[3][2]
                aa = Polypeptide.three_to_one(res.get_resname())
                
                # tracker to fill in X's
                if end_tracker != (tracker + 1):
                    if i_code != ' ':
                        chain_seq.append((aa,end_tracker))
                        tracker = end_tracker + 1
                        continue
                    else:
                        xes = 'X'*(end_tracker - tracker - 1)
                        for x in xes:
                            chain_seq.append((x,end_tracker))
                        
                chain_seq.append((aa,end_tracker))
                tracker = end_tracker
                
            else:
                continue

        structure_seqs[chain.get_id()] = chain_seq

    return structure_seqs

In [None]:
my_structure_sequence = get_pdb_seq2(structure)
from Bio.PDB import Selection

In [None]:
get_alignment_allpos_df(alignment_filename).head(30) #How many rows does the user want to see?

In [None]:
# again, open debate as to how this can be called
my_mutation_resnum = int(raw_input("What is the corresponding mutation on the PDB structure?   "))
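
The lookup that the prompt above asks the user to do by eye can be sketched as a position map built from the two aligned strings. `map_positions` is a hypothetical helper and the aligned strings below are toy data, not the real needle output. The result is an index into the PDB sequence string, which matches the PDB residue number only under the assumption that the structure's numbering starts at 1 and has no insertion codes.

```python
def map_positions(aligned_a, aligned_b):
    """Map 1-based positions in sequence A to 1-based positions in sequence B.

    Both inputs are rows of a pairwise alignment, with '-' marking gaps.
    Positions of A that face a gap in B are left out of the mapping.
    """
    mapping = {}
    a_pos = 0
    b_pos = 0
    for a, b in zip(aligned_a, aligned_b):
        if a != '-':
            a_pos += 1
        if b != '-':
            b_pos += 1
        if a != '-' and b != '-':
            mapping[a_pos] = b_pos
    return mapping


# Toy alignment: the first two UniProt residues are absent from the structure.
aligned_uniprot = 'MSSKGSV'
aligned_pdb = '--SKGSV'
mapping = map_positions(aligned_uniprot, aligned_pdb)
pdb_index_of_residue_6 = mapping.get(6)
```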

In [None]:
# let's get the info from the structure
my_mutation_residue = structure[0][best_pdb_chain][my_mutation_resnum]
#print my_mutation_residue
# we can use the Selection class to select all atoms of this residue
# 'A' here stands for ATOM (http://biopython.org/DIST/docs/api/Bio.PDB.Selection-module.html)
atom_list = Selection.unfold_entities(my_mutation_residue, 'A')

# then you can format this information for PV:
for a in atom_list:
    print('{}.{}.{}'.format(best_pdb_chain, my_mutation_resnum, a.id))

In [None]:
# how to guide user through this??
label_input = raw_input("Copy and paste the desired label  ")

In [None]:
print "These are the chains present in the structure:   " + chains_present
chain_display = raw_input("Would you like to display all chains (answer with yes or no)?   ")
if chain_display.upper() == 'YES':
    cnames_pv_var = ''
    structure_var = 'structure'
    chain_iso = ''
elif chain_display.upper() == 'NO':
    cnames = raw_input("Type in the chain you would like to display:   ")
    cnames_pv = "cname: '" + cnames + "'"
    cnames_pv_var = "var chain = structure.select({cname: '" + cnames + "'})"
    structure_var = 'chain'
    # defined here as well so that later cells can always read chain_iso
    chain_iso = ''
else:
    print "Please answer with yes or no and rerun this cell"

In [None]:
bind_site_avail = raw_input("Is there a binding or active site you would like to display (answer with yes or no)?   ")
# rnums_script is filled in by the residue-list cell; start with an empty value
rnums_script = ''
if bind_site_avail.upper() == 'YES':
    chain_res_disp_choice = raw_input("Would you like to display the site on both chains (answer with yes or no)?   ")
    if chain_res_disp_choice.upper() == 'YES':
        chain_iso = ''
        is_res = ''
    elif chain_res_disp_choice.upper() == 'NO':
        chain_res_disp_chain_choice = raw_input("Which chain would you like to display the site on?   ")
        # restrict the residue selection in the viewer to the chosen chain
        chain_iso = "cname: '" + chain_res_disp_chain_choice + "', "
        is_res = ''
    else:
        print "tbd"
elif bind_site_avail.upper() == 'NO':
    site_start = 0
    site_end = 0
    # '//' comments out the residue lines in the viewer script
    is_res = '//'
else:
    print "tbd"
    
print chain_iso
print rnums_script

In [None]:
from IPython.display import Image
binding_url = ('http://www.rcsb.org/pdb/explore/remediatedChain.do?structureId=' + '%s' + '&chainId=A') %(pdb_name)
Image(url = binding_url)

In [None]:
binding_site_list = raw_input("List the residue numbers of the active site (ie 6, 7, 120-134):   ")
residue_numbers = []
for item in binding_site_list.replace(' ', '').split(','):
    if not item:
        # skip empty entries such as a trailing comma
        continue
    if '-' in item:
        # expand a range like 120-134 into every residue number it covers
        range_start, range_end = item.split('-')
        residue_numbers.extend(range(int(range_start), int(range_end) + 1))
    else:
        residue_numbers.append(int(item))
rnums_script = "rnums : [" + ', '.join(str(n) for n in residue_numbers) + "]"
print rnums_script

In [None]:
class PDBViewer_options(object):
    '''
    Contributed by: Ali Ebrahim
    '''
    
    def __init__(self, f):
        self.pdb = open(f).read()

    def _repr_html_(self):
        div_id = str(uuid.uuid4())
        
        return """<div id="%s" style="width: 800px; height: 600px">
    <div>
    
        <!--testing static label-->
        <style>
            .static-label {
                position: absolute;
                background: #0000;
                text-align: right;
                z-index: 1;
                font-weight: bold;
                width: 800px;
            }
        </style>
        
        <script>
            require.config({
                paths: {
                    "pv": "//biasmv.github.io/pv/js/pv.min"
                }
            });
            
            require(["pv"], function(pv) {
                pdb = "%s";
                
                <!--append the static label to the parent-->
                var parent = document.getElementById('%s');
                var staticLabel = document.createElement('div');
                staticLabel.innerHTML = 'myProtein';
                staticLabel.className = 'static-label';
                parent.appendChild(staticLabel);
                
                <!--load the structure-->
                structure = pv.io.pdb(pdb);
                
                // select a chain to display and see if user wants to only display one
                %s
                
                <!--choose atom to label-->
                var carbonAlpha = structure.atom("A.6.CA");
                
                // choose a ligand to color (later on), if want to see on both chains remove cname
                %s var residues = structure.select({%s %s});
                
                viewer = pv.Viewer(parent, {
                    width: '800',
                    height: '600',
                    antialias: true,
                    outline: true,
                    quality: 'medium',
                    style: 'hemilight',
                    background: 'white',
                    animateTime: 500,
                    selectionColor: '#f00'
                });
            
                
                <!--misc viewer functions-->
                viewer.fitParent();
                
                // add cartoon visualization
                viewer.cartoon('molecule', %s);
                
                // color the selected residues in red, and display as red lines
                %s viewer.spheres('residues', residues,  { color: pv.color.uniform('red') });
                
                // center on the structure
                viewer.centerOn(%s);
                
                <!--atom label options-->
                var options = {
                 fontSize : 16, fontColor: '#f22', backgroundAlpha : 0.4
                };
                
                <!--display the label-->
                viewer.label('label', carbonAlpha.qualifiedName(), carbonAlpha.pos(), options);
                
                <!--not sure how the auto zoom works-->
                viewer.autoZoom();
            });
        </script>
        """ % (div_id, self.pdb.replace("\n", "\\n"), div_id, cnames_pv_var, is_res, chain_iso, rnums_script, structure_var, is_res, structure_var)

In [None]:
PDBViewer_options(pdb_file_path)

## Cobrapy and ESCHER Map

In [None]:
import escher
import escher.urls
import cobra
import cobra.test
import json
import os
from IPython.display import HTML

# Make this local
b = escher.Builder(map_json='/Users/LAURENCE/Desktop/Senior Design/RECON1.Central.json')
b.display_in_browser(scroll_behavior='pan')

In [None]:
model = cobra.io.load_matlab_model('/Users/LAURENCE/Desktop/Senior Design/modelRecon2', 'modelRecon2')

### Default is set to optimize growth? 

In [None]:
solution = model.optimize()
print('Growth rate: %.2f' % solution.f)

In [None]:
b = escher.Builder(map_json='/Users/LAURENCE/Desktop/Senior Design/RECON1.Central.json',
                   reaction_data=solution.x_dict,
                   # color and size according to the absolute value
                   reaction_styles=['color', 'size', 'abs', 'text'],
                   # change the default colors
                   reaction_scale=[{'type': 'min', 'color': 'red', 'size': 4},
                                   {'type': 'mean', 'color': 'green', 'size': 20},
                                   {'type': 'max', 'color': 'blue', 'size': 40}],
                   # only show the primary metabolites
                   hide_secondary_metabolites=True)
b.display_in_browser(scroll_behavior='pan')

#### Normalizing reaction fluxes for better ESCHER visualization

In [None]:
### Normalizing the values in solution.x_dict (overwritten in place)
import math
for rxn_id, flux in solution.x_dict.items():
    # zero fluxes stay at 0 because log10(0) is undefined
    if flux != 0:
        solution.x_dict[rxn_id] = math.log10(abs(flux)*1000000)
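
The scaling used here is log10(|flux| * 10^6), with zero fluxes left at 0. A minimal sketch that returns a new dictionary instead of overwriting the solution is shown below. `log_scale_fluxes` and the toy flux values are hypothetical. Note that the sign of the flux is dropped, and that any non-zero flux smaller than 10^-6 in magnitude would map to a negative number.

```python
import math


def log_scale_fluxes(fluxes, scale=1000000):
    """Return a new dict with log10(|flux| * scale); zero fluxes stay at 0."""
    scaled = {}
    for reaction_id, flux in fluxes.items():
        if flux == 0:
            scaled[reaction_id] = 0.0
        else:
            scaled[reaction_id] = math.log10(abs(flux) * scale)
    return scaled


# Toy fluxes, not taken from the Recon 2 solution.
toy_fluxes = {'ARGSS': 0.0, 'R1': 0.001, 'R2': -10.0}
scaled = log_scale_fluxes(toy_fluxes)
```

Keeping the original dictionary untouched makes it possible to draw the raw and the scaled map from the same solution.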

In [None]:
### Remapping
b = escher.Builder(map_json='/Users/LAURENCE/Desktop/Senior Design/RECON1.Central.json',
                   reaction_data=solution.x_dict,
                   # color and size according to the absolute value
                   reaction_styles=['color', 'size', 'abs', 'text'],
                   # change the default colors
                   reaction_scale=[{'type': 'min', 'color': 'red', 'size': 4},
                                   {'type': 'mean', 'color': 'green', 'size': 20},
                                   {'type': 'max', 'color': 'blue', 'size': 40}],
                   # only show the primary metabolites
                   hide_secondary_metabolites=True)
b.display_in_browser(scroll_behavior='pan')

### Deleting ARGSS (SIFT/PROVEAN also indicate the mutation is deleterious, so treat it as a gene knockout)

In [None]:
model_modified = model.copy()
# for example, delete a reaction
model_modified.reactions.ARGSS.delete()

In [None]:
model_modified.reactions.get_by_id("ARGSS")
# A KeyError here indicates that the reaction has been successfully removed from the model

In [None]:
b_m = escher.Builder(map_json='/Users/LAURENCE/Desktop/Senior Design/RECON1.Central.json',
                   model=model_modified,
                   # in the map, highlight all reactions that are missing from the model
                   highlight_missing=True)
b_m.display_in_browser()

### Newly optimized growth rate with gene knockout

In [None]:
solution_m = model_modified.optimize()
# report the objective of the knockout model, not of the original solution
print('Growth rate: %.2f' % solution_m.f)
b_m.display_in_browser()
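
Beyond the growth rate, the two solutions can be compared reaction by reaction. The sketch below works on plain dictionaries of the kind `solution.x_dict` and `solution_m.x_dict` provide. `changed_fluxes`, the tolerance and the toy numbers are assumptions for illustration, not results from the model.

```python
def changed_fluxes(before, after, tolerance=1e-6):
    """List reactions whose flux differs between two flux dictionaries.

    Returns {reaction_id: (old_flux, new_flux)}; new_flux is None when the
    reaction no longer exists in `after` (for example, after a deletion).
    """
    changed = {}
    for reaction_id, old_flux in before.items():
        new_flux = after.get(reaction_id)
        if new_flux is None:
            changed[reaction_id] = (old_flux, None)
        elif abs(new_flux - old_flux) > tolerance:
            changed[reaction_id] = (old_flux, new_flux)
    return changed


# Toy flux values, made up for this example.
wild_type = {'ARGSS': 0.5, 'R1': 1.0, 'R2': 2.0}
knockout = {'R1': 1.0, 'R2': 1.5}
differences = changed_fluxes(wild_type, knockout)
```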