# Datasets2Tools API Manual
**Denis Torre**

*September 20, 2017*

## 1. Overview
This notebook explains how to extract data from the Datasets2Tools API using Python. The notebook can be downloaded from the following GitHub page: [`https://github.com/denis-torre/datasets2tools/tree/master/api`](https://github.com/denis-torre/datasets2tools/tree/master/api).
   
##### Basics
- The Datasets2Tools search API can be accessed at the following URL: [`http://amp.pharm.mssm.edu/datasets2tools-dev/api/search`](http://amp.pharm.mssm.edu/datasets2tools-dev/api/search).
- Searches are refined by adding query parameters to this URL; the parameters are explained in more detail below.
- The API returns a list of JSON objects containing information about the search results.

##### Object Types
The Datasets2Tools API can be used to search three types of objects:
- **Canned Analyses** ([`http://amp.pharm.mssm.edu/datasets2tools-dev/api/search?object_type=canned_analysis`](http://amp.pharm.mssm.edu/datasets2tools-dev/api/search?object_type=canned_analysis))
- **Datasets** ([`http://amp.pharm.mssm.edu/datasets2tools-dev/api/search?object_type=dataset`](http://amp.pharm.mssm.edu/datasets2tools-dev/api/search?object_type=dataset))
- **Tools** ([`http://amp.pharm.mssm.edu/datasets2tools-dev/api/search?object_type=tool`](http://amp.pharm.mssm.edu/datasets2tools-dev/api/search?object_type=tool))

Searching each of these object types is explained in more detail below.
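
The three URLs above differ only in their query string, so they can be assembled programmatically. The following is a minimal sketch, assuming Python 3 and only the standard library; `build_search_url` is a hypothetical helper introduced here for illustration, not part of Datasets2Tools. It does not contact the API.

```python
from urllib.parse import urlencode

# Search endpoint quoted in the Basics list above
API_URL = 'http://amp.pharm.mssm.edu/datasets2tools-dev/api/search'

def build_search_url(**params):
    """Return the search URL with the given parameters as a query string.

    Hypothetical helper: it only formats a URL and sends no request.
    """
    # Sort the parameters so the resulting URL is deterministic
    query = urlencode(sorted(params.items()))
    return API_URL + '?' + query if query else API_URL

# One URL per object type listed above
for object_type in ('canned_analysis', 'dataset', 'tool'):
    print(build_search_url(object_type=object_type))
```

Values are URL-encoded by `urlencode`, so a keyword containing a space can be passed as-is.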

##### Demo
Here is an example that retrieves five canned analyses from the search API.

In [1]:
# Import modules
import json
import requests
import pandas as pd

# Get API URL
url = 'http://amp.pharm.mssm.edu/datasets2tools-dev/api/search'

# Search 5 analyses
data = {
    'object_type': 'canned_analysis',
    'page_size': 5
}

# Get response
response = requests.post(url, params=data)

# Read response
results = json.loads(response.text)

# Convert to dataframe
results_dataframe = pd.DataFrame(results)
results_dataframe

,canned_analysis_accession,canned_analysis_description,canned_analysis_preview_url,canned_analysis_title,canned_analysis_url,contribution_fk,dataset_accession,dataset_description,dataset_landing_url,dataset_title,...,repository_fk,repository_homepage_url,repository_icon_url,repository_name,tool_accession,tool_description,tool_homepage_url,tool_icon_url,tool_name,username
0,DCA00000024,Highly interactive web-based heatmap visualiza...,https://github.com/denis-torre/images/blob/mas...,Interactive heatmap visualization of RNA-seq d...,http://amp.pharm.mssm.edu/datasets2tools/analy...,5,GSE16256,The human embryonic stem cells (hESCs) are a u...,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,UCSD Human Reference Epigenome Mapping Project,...,1,https://www.ncbi.nlm.nih.gov/geo/,https://www.ncbi.nlm.nih.gov/geo/img/geo_main.gif,Gene Expression Omnibus,DCT00010052,ARCHS4 provides access to gene counts from HiS...,http://amp.pharm.mssm.edu/archs4/,http://amp.pharm.mssm.edu/datasets2tools-dev/s...,ARCHS4,denis
1,DCA00000025,Highly interactive web-based heatmap visualiza...,https://github.com/denis-torre/images/blob/mas...,Interactive heatmap visualization of RNA-seq d...,http://amp.pharm.mssm.edu/datasets2tools/analy...,5,GSE17312,The NIH Roadmap Epigenomics Mapping Consortium...,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,BI Human Reference Epigenome Mapping Project,...,1,https://www.ncbi.nlm.nih.gov/geo/,https://www.ncbi.nlm.nih.gov/geo/img/geo_main.gif,Gene Expression Omnibus,DCT00010052,ARCHS4 provides access to gene counts from HiS...,http://amp.pharm.mssm.edu/archs4/,http://amp.pharm.mssm.edu/datasets2tools-dev/s...,ARCHS4,denis
2,DCA00000026,Highly interactive web-based heatmap visualiza...,https://github.com/denis-torre/images/blob/mas...,Interactive heatmap visualization of RNA-seq d...,http://amp.pharm.mssm.edu/datasets2tools/analy...,5,GSE18927,The NIH Roadmap Epigenomics Mapping Consortium...,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,University of Washington Human Reference Epige...,...,1,https://www.ncbi.nlm.nih.gov/geo/,https://www.ncbi.nlm.nih.gov/geo/img/geo_main.gif,Gene Expression Omnibus,DCT00010052,ARCHS4 provides access to gene counts from HiS...,http://amp.pharm.mssm.edu/archs4/,http://amp.pharm.mssm.edu/datasets2tools-dev/s...,ARCHS4,denis
3,DCA00000027,Highly interactive web-based heatmap visualiza...,https://github.com/denis-torre/images/blob/mas...,Interactive heatmap visualization of RNA-seq d...,http://amp.pharm.mssm.edu/datasets2tools/analy...,5,GSE22959,Deep Sequencing of protein-coding and non-prot...,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,Sequencing of the non-ribosomal transcriptome ...,...,1,https://www.ncbi.nlm.nih.gov/geo/,https://www.ncbi.nlm.nih.gov/geo/img/geo_main.gif,Gene Expression Omnibus,DCT00010052,ARCHS4 provides access to gene counts from HiS...,http://amp.pharm.mssm.edu/archs4/,http://amp.pharm.mssm.edu/datasets2tools-dev/s...,ARCHS4,denis
4,DCA00000028,Highly interactive web-based heatmap visualiza...,https://github.com/denis-torre/images/blob/mas...,Interactive heatmap visualization of RNA-seq d...,http://amp.pharm.mssm.edu/datasets2tools/analy...,5,GSE24565,This data was generated by ENCODE. If you have...,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,Small RNA-seq from ENCODE/Cold Spring Harbor Lab,...,1,https://www.ncbi.nlm.nih.gov/geo/,https://www.ncbi.nlm.nih.gov/geo/img/geo_main.gif,Gene Expression Omnibus,DCT00010052,ARCHS4 provides access to gene counts from HiS...,http://amp.pharm.mssm.edu/archs4/,http://amp.pharm.mssm.edu/datasets2tools-dev/s...,ARCHS4,denis


## 2. Search Examples
For convenience, we define a function to search the API and return a pandas DataFrame.

In [2]:
# Import modules
import json
import requests
import pandas as pd

def search_datasets2tools(search_options):
    
    # Get API URL
    url = 'http://amp.pharm.mssm.edu/datasets2tools-dev/api/search'

    # Get response
    response = requests.post(url, params=search_options)

    try:
        # Read response
        results_dict = json.loads(response.text)

        # Convert to dataframe
        results_dataframe = pd.DataFrame(results_dict)
        
        # Set index
        results_dataframe.set_index(search_options['object_type']+'_accession', inplace=True)
        
        return results_dataframe
        
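    # Invalid JSON raises ValueError; a missing accession column or object_type raises KeyError
    except (ValueError, KeyError):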
        
        return 'Sorry, there has been an error.'
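
A failed search is easier to diagnose when parsing is kept separate from the HTTP call. The sketch below is one possible way to do this using only the standard library. `parse_search_results` is a hypothetical helper, not part of Datasets2Tools, and the sample text is a hand-written stand-in for a response body that reuses the ARCHS4 accession shown in the demo output. It assumes, as in the function above, that each returned object carries an `<object_type>_accession` field.

```python
import json

def parse_search_results(response_text, object_type):
    """Parse the API's JSON text into a dict keyed by accession.

    Hypothetical helper: raises ValueError instead of returning a message,
    so the caller can tell a failed search from a successful one.
    """
    try:
        records = json.loads(response_text)
    except ValueError as error:
        # json.JSONDecodeError is a subclass of ValueError
        raise ValueError('Response is not valid JSON: {}'.format(error))
    accession_key = object_type + '_accession'
    return {record[accession_key]: record for record in records}

# Hand-written stand-in for a response body, not real API output
sample_text = '[{"tool_accession": "DCT00010052", "tool_name": "ARCHS4"}]'
parsed = parse_search_results(sample_text, 'tool')
print(parsed['DCT00010052']['tool_name'])  # prints ARCHS4
```

In a live call, `response.text` from `requests` would take the place of `sample_text`.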

### 2.1 Canned Analyses
We can search canned analyses by text, dataset, tool, or metadata tags.

##### 2.1.1 By Text
Search all canned analyses that contain the keyword *prostate cancer*.

In [3]:
results = search_datasets2tools({'object_type': 'canned_analysis',
                                 'q': 'prostate cancer'})
results.head()

canned_analysis_accession,canned_analysis_description,canned_analysis_title,canned_analysis_url,datasets,date,metadata,tools
DCA00000060,Highly interactive web-based heatmap visualiza...,Interactive heatmap visualization of RNA-seq d...,http://amp.pharm.mssm.edu/datasets2tools/analy...,[GSE35126],"September 20, 2017",{},[ARCHS4]
DCA00000123,Highly interactive web-based heatmap visualiza...,Interactive heatmap visualization of RNA-seq d...,http://amp.pharm.mssm.edu/datasets2tools/analy...,[GSE39509],"September 20, 2017",{},[ARCHS4]
DCA00000139,Highly interactive web-based heatmap visualiza...,Interactive heatmap visualization of RNA-seq d...,http://amp.pharm.mssm.edu/datasets2tools/analy...,[GSE40050],"September 20, 2017",{},[ARCHS4]
DCA00000262,Highly interactive web-based heatmap visualiza...,Interactive heatmap visualization of RNA-seq d...,http://amp.pharm.mssm.edu/datasets2tools/analy...,[GSE43986],"September 20, 2017",{},[ARCHS4]
DCA00000448,Highly interactive web-based heatmap visualiza...,Interactive heatmap visualization of RNA-seq d...,http://amp.pharm.mssm.edu/datasets2tools/analy...,[GSE48403],"September 20, 2017",{},[ARCHS4]


##### 2.1.2 By Dataset
Search all canned analyses associated with GEO dataset GSE775.

In [4]:
results = search_datasets2tools({'object_type': 'canned_analysis',
                                 'dataset_accession': 'GSE775'})
results.head()

canned_analysis_accession,canned_analysis_description,canned_analysis_title,canned_analysis_url,datasets,date,metadata,tools
DCA00000002,An enrichment analysis was performed on the to...,Enrichment analysis of genes downregulated in ...,http://amp.pharm.mssm.edu/Enrichr/enrich?datas...,[GSE775],"September 19, 2017","{u'do_id': u'DOID:9408', u'cell_type': u'Heart...",[Enrichr]
DCA00000003,An enrichment analysis was performed on the to...,Enrichment analysis of genes upregulated in ac...,http://amp.pharm.mssm.edu/Enrichr/enrich?datas...,[GSE775],"September 19, 2017","{u'do_id': u'DOID:9408', u'cell_type': u'Heart...",[Enrichr]
DCA00000004,An enrichment analysis was performed on the to...,Enrichment analysis of genes downregulated in ...,http://amp.pharm.mssm.edu/Enrichr/enrich?datas...,[GSE775],"September 19, 2017","{u'do_id': u'DOID:9408', u'cell_type': u'Heart...",[Enrichr]
DCA00000005,An enrichment analysis was performed on the to...,Enrichment analysis of genes upregulated in ac...,http://amp.pharm.mssm.edu/Enrichr/enrich?datas...,[GSE775],"September 19, 2017","{u'do_id': u'DOID:9408', u'cell_type': u'Heart...",[Enrichr]
DCA00000006,The L1000 database was queried in order to ide...,Small molecules which mimic acute myocardial i...,http://amp.pharm.mssm.edu/L1000CDS2/#/result/5...,[GSE775],"September 19, 2017","{u'do_id': u'DOID:9408', u'direction': u'mimic...",[L1000CDS2]


##### 2.1.3 By Tool
Search all canned analyses generated by Enrichr.

In [5]:
results = search_datasets2tools({'object_type': 'canned_analysis',
                                 'tool_name': 'Enrichr'})
results.head()

canned_analysis_accession,canned_analysis_description,canned_analysis_title,canned_analysis_url,datasets,date,metadata,tools
DCA00000002,An enrichment analysis was performed on the to...,Enrichment analysis of genes downregulated in ...,http://amp.pharm.mssm.edu/Enrichr/enrich?datas...,[GSE775],"September 19, 2017","{u'do_id': u'DOID:9408', u'cell_type': u'Heart...",[Enrichr]
DCA00000003,An enrichment analysis was performed on the to...,Enrichment analysis of genes upregulated in ac...,http://amp.pharm.mssm.edu/Enrichr/enrich?datas...,[GSE775],"September 19, 2017","{u'do_id': u'DOID:9408', u'cell_type': u'Heart...",[Enrichr]
DCA00000004,An enrichment analysis was performed on the to...,Enrichment analysis of genes downregulated in ...,http://amp.pharm.mssm.edu/Enrichr/enrich?datas...,[GSE775],"September 19, 2017","{u'do_id': u'DOID:9408', u'cell_type': u'Heart...",[Enrichr]
DCA00000005,An enrichment analysis was performed on the to...,Enrichment analysis of genes upregulated in ac...,http://amp.pharm.mssm.edu/Enrichr/enrich?datas...,[GSE775],"September 19, 2017","{u'do_id': u'DOID:9408', u'cell_type': u'Heart...",[Enrichr]
DCA00059407,An enrichment analysis was performed on the to...,Enrichment analysis of genes downregulated in ...,http://amp.pharm.mssm.edu/Enrichr/enrich?datas...,[GSE775],"September 20, 2017","{u'do_id': u'DOID:9408', u'cell_type': u'Heart...",[Enrichr]


##### 2.1.4 By Metadata
Search all canned analyses with the *colon cancer* disease name.

In [6]:
results = search_datasets2tools({'object_type': 'canned_analysis',
                                 'disease_name': 'colon cancer'})
results.head()

canned_analysis_accession,canned_analysis_description,canned_analysis_title,canned_analysis_url,datasets,date,metadata,tools
DCA00032919,The analysis explores the gene interaction net...,Interaction network and enrichment analysis of...,http://genemania.org/#/search/mouse/Lgals6|Guc...,[GSE2178],"September 20, 2017","{u'do_id': u'DOID:219', u'cell_type': u'Intest...",[GeneMANIA]
DCA00032920,The analysis explores the gene interaction net...,Interaction network and enrichment analysis of...,http://genemania.org/#/search/mouse/Slpi|Gcnt2...,[GSE2178],"September 20, 2017","{u'do_id': u'DOID:219', u'cell_type': u'Intest...",[GeneMANIA]
DCA00033223,The analysis explores the gene interaction net...,Interaction network and enrichment analysis of...,http://genemania.org/#/search/human/RPS4Y1|NDR...,[GSE4107],"September 20, 2017","{u'do_id': u'DOID:219', u'cell_type': u'Intest...",[GeneMANIA]
DCA00033224,The analysis explores the gene interaction net...,Interaction network and enrichment analysis of...,http://genemania.org/#/search/human/FOS|SH3KBP...,[GSE4107],"September 20, 2017","{u'do_id': u'DOID:219', u'cell_type': u'Intest...",[GeneMANIA]
DCA00033763,The analysis explores the gene interaction net...,Interaction network and enrichment analysis of...,http://genemania.org/#/search/human/RPS26|RPL1...,[GSE34299],"September 20, 2017","{u'do_id': u'DOID:219', u'cell_type': u'HT29 C...",[GeneMANIA]
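
The `metadata` column holds one dictionary of tags per canned analysis, so results that are already downloaded can also be narrowed locally. The following is a rough sketch under stated assumptions: `filter_by_metadata` is a hypothetical helper, and the records are hand-written stand-ins that reuse only the accessions and `do_id` values visible in the outputs of this notebook.

```python
def filter_by_metadata(records, **tags):
    """Keep the records whose metadata dict matches every given tag.

    Hypothetical helper operating on a list of plain dicts.
    """
    return [
        record for record in records
        if all(record.get('metadata', {}).get(key) == value
               for key, value in tags.items())
    ]

# Hand-written stand-ins for parsed search results, not real API output
records = [
    {'canned_analysis_accession': 'DCA00000002', 'metadata': {'do_id': 'DOID:9408'}},
    {'canned_analysis_accession': 'DCA00032919', 'metadata': {'do_id': 'DOID:219'}},
    {'canned_analysis_accession': 'DCA00000060', 'metadata': {}},
]

matches = filter_by_metadata(records, do_id='DOID:219')
print([record['canned_analysis_accession'] for record in matches])  # prints ['DCA00032919']
```

Records with an empty `metadata` dictionary, such as the ARCHS4 analyses above, never match a tag filter.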


##### 2.1.5 Combined Search
Search all analyses generated by Enrichr on dataset GSE31106, where the geneset is upregulated.

In [7]:
results = search_datasets2tools({'object_type': 'canned_analysis',
                                 'tool_name': 'Enrichr',
                                 'dataset_accession': 'GSE31106',
                                 'geneset': 'upregulated'})
results.head()

canned_analysis_accession,canned_analysis_description,canned_analysis_title,canned_analysis_url,datasets,date,metadata,tools
DCA00059528,An enrichment analysis was performed on the to...,Enrichment analysis of genes upregulated in co...,http://amp.pharm.mssm.edu/Enrichr/enrich?datas...,[GSE31106],"September 20, 2017","{u'do_id': u'DOID:0050861', u'cell_type': u'Co...",[Enrichr]
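
When several filters are combined, it can help to assemble the options dictionary in one place and to catch a mistyped `object_type` before any request is sent. This is a minimal sketch, not the author's method: `build_search_options` and `VALID_OBJECT_TYPES` are hypothetical names, and the filter names are the ones used in the examples of this section.

```python
# The three object types listed in the Overview
VALID_OBJECT_TYPES = ('canned_analysis', 'dataset', 'tool')

def build_search_options(object_type, **filters):
    """Return a search options dict, skipping filters set to None.

    Hypothetical helper: it only builds the dict and sends no request.
    """
    if object_type not in VALID_OBJECT_TYPES:
        raise ValueError('Unknown object_type: {!r}'.format(object_type))
    options = {'object_type': object_type}
    # Drop filters the caller left unset
    options.update({key: value for key, value in filters.items()
                    if value is not None})
    return options

# Same combined search as above; the unset text query is dropped
options = build_search_options('canned_analysis',
                               tool_name='Enrichr',
                               dataset_accession='GSE31106',
                               geneset='upregulated',
                               q=None)
print(options)
```

The resulting dictionary has the same shape as the one passed to `search_datasets2tools` in the example above.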


### 2.2 Datasets
We can search datasets by accession, by text, by the name of a tool that has analyzed them, or by the accession of a canned analysis generated from them.

##### 2.2.1 By Accession
Search dataset GSE775.

In [8]:
results = search_datasets2tools({'object_type': 'dataset',
                                 'dataset_accession': 'GSE775'})
results.head()

dataset_accession,analyses,dataset_description,dataset_landing_url,dataset_title,repository_name
GSE775,42,Temporal analysis of acute myocardial infarcti...,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,Myocardial infarction time course,Gene Expression Omnibus


##### 2.2.2 By Text
Search all datasets which contain the keyword *asthma*.

In [9]:
results = search_datasets2tools({'object_type': 'dataset',
                                 'q': 'asthma'})
results.head()

dataset_accession,analyses,dataset_description,dataset_landing_url,dataset_title,repository_name
GSE43696,49,Analysis of bronchial epithelial cells from pa...,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,Severe asthma: bronchial epithelial cell,Gene Expression Omnibus
GSE31773,33,Analysis of circulating CD4+ and CD8+ T-cells ...,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,Severe asthma: circulating CD4+ and CD8+ T-cells,Gene Expression Omnibus
GSE27011,28,Analysis of white blood cells from children wi...,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,Asthma: white blood cells,Gene Expression Omnibus
GSE6858,7,Comparison of whole lungs of wild-type and rec...,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,Asthma model: lungs,Gene Expression Omnibus
GSE18965,7,Analysis of airway epithelial cells (AEC) from...,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,Asthmatic atopic epithelium,Gene Expression Omnibus


##### 2.2.3 By Tool
Search all datasets which have been analyzed by Enrichr.

In [10]:
results = search_datasets2tools({'object_type': 'dataset',
                                 'tool_name': 'Enrichr'})
results.head()

dataset_accession,analyses,dataset_description,dataset_landing_url,dataset_title,repository_name
GSE50588,294,One goal of human genetics is to understand ho...,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,The Functional Consequences of Variation in Tr...,Gene Expression Omnibus
GSE6930,119,Analysis of Ewings sarcoma A673 cells for up t...,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,Cytosine arabinoside effect on Ewing's sarcoma...,Gene Expression Omnibus
GSE7002,119,Analysis of nasal epithelia exposed to various...,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,Formaldehyde effect on nasal epithelium: dose ...,Gene Expression Omnibus
GSE47856,119,Chemo-resistance to platinum such as cisplatin...,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,Expression data from cultured human ovarian ca...,Gene Expression Omnibus
GSE47150,112,Analysis of E16 primary cortical neuron cultur...,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,Embryonic primary cortical neuron response to ...,Gene Expression Omnibus


##### 2.2.4 By Canned Analysis
Search all datasets which have been used to generate canned analysis DCA00000002.

In [11]:
results = search_datasets2tools({'object_type': 'dataset',
                                 'canned_analysis_accession': 'DCA00000002'})
results.head()

dataset_accession,analyses,dataset_description,dataset_landing_url,dataset_title,repository_name
GSE775,42,Temporal analysis of acute myocardial infarcti...,https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi...,Myocardial infarction time course,Gene Expression Omnibus


### 2.3 Tools
We can search tools by name, by text, by the accession of a dataset they have analyzed, or by the accession of a canned analysis generated with them.

##### 2.3.1 By Name
Search ARCHS4.

In [12]:
results = search_datasets2tools({'object_type': 'tool',
                                 'tool_name': 'ARCHS4'})
results.head()

tool_accession,analyses,articles,tool_description,tool_name
DCT00010052,4645,[https://doi.org/10.1101/189092],ARCHS4 provides access to gene counts from HiS...,ARCHS4


##### 2.3.2 By Text
Search all tools which contain the keyword *enrichment*.

In [13]:
results = search_datasets2tools({'object_type': 'tool',
                                 'q': 'enrichment'})
results.head()

tool_accession,analyses,articles,tool_description,tool_name
DCT00004702,7758,[https://doi.org/10.1093/nar/gkw377],A comprehensive gene set enrichment analysis w...,Enrichr
DCT00000149,0,[https://doi.org/10.1093/bioinformatics/btq503],An R/C++ package to identify patterns and biol...,CoGAPS
DCT00004852,0,[https://doi.org/10.1093/nar/gkx295],A web-based tool for comprehensive statistical...,MicrobiomeAnalyst
DCT00002174,0,[https://doi.org/10.1093/bioinformatics/btw511],Translating PubMed and PMC texts to networks f...,HiPub
DCT00004565,0,[https://doi.org/10.1093/nar/gkv1216],The database of human microbial communities fr...,HPMCD


##### 2.3.3 By Dataset
Search all tools which have analyzed GEO dataset GSE775.

In [14]:
results = search_datasets2tools({'object_type': 'tool',
                                 'dataset_accession': 'GSE775'})
results.head()

tool_accession,analyses,articles,tool_description,tool_name
DCT00004702,7758,[https://doi.org/10.1093/nar/gkw377],A comprehensive gene set enrichment analysis w...,Enrichr
DCT00010043,7756,[],,L1000CDS2
DCT00003348,7435,[https://doi.org/10.1093/nar/gkq537],Biological network integration for gene priori...,GeneMANIA
DCT00010044,3879,[],,PAEA
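
Each tool record carries an `analyses` count, which can be used to order results locally. The sketch below assumes the results have been parsed into a list of plain dictionaries; `rank_by_analyses` is a hypothetical helper, and the records reuse the names and counts from the output above in shuffled order.

```python
def rank_by_analyses(records):
    """Return the records sorted by their `analyses` count, largest first.

    Hypothetical helper; the input list is left unchanged.
    """
    return sorted(records, key=lambda record: record['analyses'], reverse=True)

# Names and counts copied from the output above, deliberately shuffled
tools = [
    {'tool_name': 'PAEA', 'analyses': 3879},
    {'tool_name': 'Enrichr', 'analyses': 7758},
    {'tool_name': 'GeneMANIA', 'analyses': 7435},
    {'tool_name': 'L1000CDS2', 'analyses': 7756},
]

ranked = rank_by_analyses(tools)
print([tool['tool_name'] for tool in ranked])
# prints ['Enrichr', 'L1000CDS2', 'GeneMANIA', 'PAEA']
```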


##### 2.3.4 By Canned Analysis
Search all tools which have been used to generate canned analysis DCA00000002.

In [15]:
results = search_datasets2tools({'object_type': 'tool',
                                 'canned_analysis_accession': 'DCA00000002'})
results.head()

tool_accession,analyses,articles,tool_description,tool_name
DCT00004702,7758,[https://doi.org/10.1093/nar/gkw377],A comprehensive gene set enrichment analysis w...,Enrichr
