DAVID Web Service
-------------------

For more information, please see: http://david.abcc.ncifcrf.gov/content.jsp?file=WS.html

Available functions:

 * `authenticate()`
    * authenticate user by email address
    * return true if user has registered email with DAVID knowledge base
 * `addList(inputIds, idType, listName, listType)`
    * add a gene list or background list to current session 
 * `getAllAnnotationCategoryNames()`
    * return all available annotation category names
 * `getAllListNames()`
    * return all list names
 * `getAllPopulationNames()`
    * return background names
 * `getChartReport(threshold, count)`
    * generate chart report
 * `getConversionTypes()`
    * return all acceptable idTypes
 * `getCurrentList()`
    * return the position of current list
 * `getCurrentSpecies()`
    * return current species of the current list
 * `getCurrentPopulation()`
    * return the position of current background list
 * `getDefaultCategoryNames()`
    * return all default category names
 * `getGeneClusterReport(overlap, initialSeed, finalSeed, linkage, kappa)`
    * generate gene cluster report
 * `getGeneReportCategories()`
    * return gene report categories
    * no argument needed
 * `getListName(pos)`
    * get the name of a list
    * argument is the position of the list
 * `getListReport()`
    * generate list report
 * `getSpecies()`
    * return species of the current list
 * `getSummaryReport()`
    * return a summary report
 * `getTableReport()`
    * generate table report
 * `getTermClusterReport(overlap, initialSeed, finalSeed, linkage, kappa)`
    * generate term cluster report
 * `setCurrentList(pos)`
    * switch between gene lists
    * argument is the position of the list
 * `setCurrentPopulation(pos)`
    * switch between background lists
    * argument is the position of the list
 * `setCurrentSpecies(string)`
    * select the species to use; argument is a string of integers delimited by commas
 * `setCategories()`
    * let the user select categories
    * argument is a string with category names delimited by commas
    * return a list of validated category names
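
Several of the calls above (`addList`, `setCurrentSpecies`, `setCategories`) take one comma-delimited string rather than a list. A minimal sketch of building such arguments follows; the helper name `to_csv_arg` is hypothetical and not part of the DAVID API, and the species positions are made-up values for illustration.

```python
def to_csv_arg(values):
    """Join values into a single comma-delimited string.

    Hypothetical helper, not part of the DAVID API.
    """
    return ','.join(str(v).strip() for v in values)


# category names as they would be passed to setCategories
categories = to_csv_arg(['GOTERM_CC_FAT', 'KEGG_PATHWAY'])

# integers as they would be passed to setCurrentSpecies (made-up values)
species = to_csv_arg([0, 2])

print(categories)
print(species)
```

Stripping whitespace from each value is a design choice of this sketch; it guards against stray spaces in identifiers read from a file.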

In [26]:
# we need suds to talk to the DAVID web service and nvd3 to display our results
!pip install --user --quiet suds
!pip install --user --quiet python-nvd3

In [27]:
# set your registered email address here
email = ''

In [28]:
import pandas
from suds.client import Client

# WSDL description of the DAVID web service
david_wsdl_url = 'http://david.abcc.ncifcrf.gov/webservice/services/DAVIDWebService?wsdl'
client = Client(david_wsdl_url)
# authenticate with the registered email address set above
registered = client.service.authenticate(email)
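
The reply from `authenticate()` is stored in `registered` but not inspected. A small sketch of a check follows. It assumes the reply arrives either as a boolean or as the string `true`; the exact type depends on the SOAP client, so this is an assumption and not documented behaviour. `is_registered` is a hypothetical helper.

```python
def is_registered(reply):
    """Return True if the authenticate() reply signals a registered email.

    Assumption: a successful reply is the boolean True or the string 'true';
    anything else (for example an error message) counts as not registered.
    """
    return str(reply).strip().lower() == 'true'


# stand-in replies, so the sketch runs without contacting the service
for reply in (True, 'true', 'some error message', False):
    print('%r -> %s' % (reply, is_registered(reply)))
```

In the notebook this could be used as `if not is_registered(registered): raise ValueError('email is not registered with DAVID')`.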

In [29]:
# set your input data here; for example, in Galaxy you could enter `get(4)`
uniprot = pandas.read_csv('/home/bag/Downloads/uniprot-cytochrome.tab', sep='\t')

In [30]:
%%javascript
require.config({paths: {d3: "//d3js.org/d3.v3.min"}});

In [31]:
from IPython.display import HTML
from nvd3 import pieChart
import nvd3
nvd3.ipynb.initialize_javascript(use_remote=True)

In [32]:
# define a plotting function based on d3.js and nvd3
def pie_chart(x, y, name='piechart'):
    """
    x is a list of labels and y is a list of the corresponding values
    name needs to be different between different plots, otherwise one plot overwrites the other
    """
    chart = pieChart(name=name, color_category='category20c', height=650, width=650)
    chart.set_containerheader("\n\n<h2>PieChart</h2>\n\n")
    # append " score" to the value shown in the tooltip
    extra_serie = {"tooltip": {"y_start": "", "y_end": " score"}}
    chart.add_serie(y=y, x=x, extra=extra_serie)
    chart.buildcontent()
    return chart.htmlcontent
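
The docstring of `pie_chart` notes that every plot needs its own `name`. One way to avoid accidental reuse is to generate the names from a counter. This is only a sketch; `next_chart_name` is a hypothetical helper and not part of python-nvd3.

```python
import itertools

# module-level counter shared by all calls
_chart_ids = itertools.count(1)


def next_chart_name(prefix='piechart'):
    """Return a fresh name such as 'piechart_1', 'piechart_2', ..."""
    return '%s_%d' % (prefix, next(_chart_ids))


first = next_chart_name()
second = next_chart_name()
print(first)
print(second)
```

A call would then look like `pie_chart(labels, values, name=next_chart_name())`.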

In [33]:
def david_setup(input_ids, id_type='UNIPROT_ACCESSION',
                bg_ids=None, bg_name='IPython_bg_name',
                list_name='IPython_example_list', category=''):
    """
    Upload a gene list (and optionally a background list) to DAVID and
    select the annotation categories to use.

    possible categories:
        * BBID,GOTERM_CC_FAT,BIOCARTA,GOTERM_MF_FAT,SMART,COG_ONTOLOGY,SP_PIR_KEYWORDS,
        KEGG_PATHWAY,INTERPRO,UP_SEQ_FEATURE,OMIM_DISEASE,GOTERM_BP_FAT,PIR_SUPERFAMILY

    """
    david = client.service
    # the service expects the identifiers as one comma-delimited string
    input_ids = ','.join(input_ids)

    # list type 0 is used for the gene list
    list_type = 0
    print('Percentage mapped: %s' % david.addList(input_ids, id_type, list_name, list_type))
    if bg_ids:
        # list type 1 is used for the background list
        bg_ids = ','.join(bg_ids)
        list_type = 1
        print('Percentage mapped (background): %s' % david.addList(bg_ids, id_type, bg_name, list_type))

    david.setCategories(category)
    return david

In [34]:
def report_to_table(request):
    """
    Converts a DAVID report to a pandas DataFrame.
    """
    # turn every report row into a dict of field name -> value
    results = [dict(row) for row in request]
    return pandas.DataFrame(results)
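
`report_to_table` relies on `dict(row)` working for every report row, which requires each row to iterate as (field, value) pairs. The sketch below illustrates that conversion with a stand-in class, so it runs without suds or a network connection. `FakeRow` is hypothetical and only mimics the iteration behaviour assumed here.

```python
class FakeRow(object):
    """Stand-in for one report row; iterates as (field, value) pairs."""

    def __init__(self, **fields):
        self._fields = fields

    def __iter__(self):
        return iter(sorted(self._fields.items()))


rows = [
    FakeRow(name='Gene Cluster 2', score=33.474752),
    FakeRow(name='Gene Cluster 5', score=33.474752),
]

# the same per-row conversion that report_to_table applies
records = [dict(row) for row in rows]
print(records)
```

A list of such dicts is a shape that `pandas.DataFrame` accepts directly, with one column per field.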

In [35]:
david = david_setup(uniprot['Entry'][:100], 'UNIPROT_ACCESSION', category='GOTERM_CC_FAT')

Percentage mapped: 0.99


In [36]:
ct = 2     # count argument of getChartReport
thd = 0.1  # threshold argument of getChartReport
request = david.getChartReport(thd, ct)
table = report_to_table(request)
table[['categoryName','termName', 'listHits', 'percent', 'ease', 'foldEnrichment', 'benjamini']]

    categoryName                                 termName  listHits    percent          ease  foldEnrichment     benjamini
0  GOTERM_CC_FAT                 GO:0005739~mitochondrion        59  60.204082  1.250867e-40        7.708658  1.676161e-38
1  GOTERM_CC_FAT        GO:0005740~mitochondrial envelope        44  44.897959  2.134157e-40       14.914028  1.429885e-38
2  GOTERM_CC_FAT        GO:0031966~mitochondrial membrane        42  42.857143  1.408322e-38       15.139425  6.290504e-37
3  GOTERM_CC_FAT            GO:0031090~organelle membrane        56  57.142857  1.448194e-36        7.256610  4.851449e-35
4  GOTERM_CC_FAT            GO:0044429~mitochondrial part        45  45.918367  3.648952e-35       10.741176  9.779191e-34
5  GOTERM_CC_FAT            GO:0031967~organelle envelope        45  45.918367  2.176204e-34       10.308065  4.860189e-33
6  GOTERM_CC_FAT                      GO:0031975~envelope        45  45.918367  2.501831e-34       10.274920  4.789220e-33
7  GOTERM_CC_FAT  GO:0005743~mitochondrial inner membrane        29  29.591837  3.294510e-24       13.459622  5.518305e-23
8  GOTERM_CC_FAT      GO:0019866~organelle inner membrane        29  29.591837  2.454137e-23       12.518676  3.653938e-22
9  GOTERM_CC_FAT             GO:0070469~respiratory chain        14  14.285714  3.276517e-15       26.510815  4.463097e-14
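
A chart report is often narrowed down to the most strongly enriched terms. The sketch below does this in plain Python on three rows copied (rounded) from the table above. The cut-offs `min_fold` and `max_benjamini` are arbitrary illustration values and not DAVID recommendations, and `strongly_enriched` is a hypothetical helper.

```python
# three rows copied (rounded) from the chart report above
chart_rows = [
    {'termName': 'GO:0005739~mitochondrion',
     'foldEnrichment': 7.708658, 'benjamini': 1.676161e-38},
    {'termName': 'GO:0005740~mitochondrial envelope',
     'foldEnrichment': 14.914028, 'benjamini': 1.429885e-38},
    {'termName': 'GO:0070469~respiratory chain',
     'foldEnrichment': 26.510815, 'benjamini': 4.463097e-14},
]


def strongly_enriched(rows, min_fold=10.0, max_benjamini=0.05):
    """Return the term names that pass both (arbitrary) cut-offs."""
    return [row['termName'] for row in rows
            if row['foldEnrichment'] >= min_fold
            and row['benjamini'] <= max_benjamini]


selected = strongly_enriched(chart_rows)
print(selected)
```

With the DataFrame from `report_to_table`, the same filter can be written as a boolean mask on the `foldEnrichment` and `benjamini` columns.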


In [37]:
overlap = 2
initialSeed = 2
finalSeed = 1
linkage = 1
kappa = 1
request = david.getGeneClusterReport(overlap, initialSeed, finalSeed, linkage, kappa)
table = report_to_table(request)
table[['name', 'score']]

              name      score
0   Gene Cluster 2  33.474752
1   Gene Cluster 5  33.474752
2  Gene Cluster 25  33.385599
3  Gene Cluster 12  31.105503
4  Gene Cluster 24  28.330592
5  Gene Cluster 21  27.484442
6   Gene Cluster 6  25.763687
7  Gene Cluster 16  25.763687
8  Gene Cluster 10  24.133724
9   Gene Cluster 9  23.591595


In [42]:
overlap = 3
initialSeed = 3
finalSeed = 3
linkage = 0.5
kappa = 50
request = david.getTermClusterReport(overlap, initialSeed, finalSeed, linkage, kappa)
table = report_to_table(request)
table[['name', 'score']]

                                         name      score
0                    GO:0005739~mitochondrion  33.450913
1                GO:0070469~respiratory chain  11.341551
2  GO:0005746~mitochondrial respiratory chain   6.558009
3   GO:0005789~endoplasmic reticulum membrane   6.528910
4     GO:0005741~mitochondrial outer membrane   6.149982
5                        GO:0043025~cell soma   0.597256
6              GO:0031410~cytoplasmic vesicle   0.058873
7                      GO:0005654~nucleoplasm   0.007770
8              GO:0005615~extracellular space   0.007698
9      GO:0005887~integral to plasma membrane   0.000484


In [43]:
HTML(pie_chart(table['name'], table['score'], name="relaxed"))

In [44]:
overlap = 5
initialSeed = 5
finalSeed = 5
linkage = 0.5
kappa = 50
request = david.getTermClusterReport(overlap, initialSeed, finalSeed, linkage, kappa)
table = report_to_table(request)
print(table[['name', 'score']])

                                        name      score
0                   GO:0005739~mitochondrion  33.450913
1  GO:0005789~endoplasmic reticulum membrane   6.528910


In [45]:
HTML(pie_chart(table['name'], table['score']))
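
A pie chart draws each slice in proportion to its value, so the chart above shows every cluster's share of the summed scores. The sketch below computes those shares for the two clusters returned by the stricter settings, using the names and scores printed above. It is a plain-Python illustration and not part of the nvd3 API.

```python
# names and scores copied from the printed report above
names = ['GO:0005739~mitochondrion',
         'GO:0005789~endoplasmic reticulum membrane']
scores = [33.450913, 6.528910]

# share of each cluster in percent of the summed scores
total = sum(scores)
shares = [100.0 * score / total for score in scores]

for label, share in zip(names, shares):
    print('%-45s %5.1f%%' % (label, share))
```

The shares only describe how the slices are sized relative to each other; they are not a statistical quantity reported by DAVID.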