# L1000CDS<sup>2</sup> API - Python script
This notebook briefly walks through a Python script for querying the L1000CDS<sup>2</sup> API. For more information, contact Denis (denis.torre@mssm.edu).

### 1. Function overview

##### Inputs:
* *Required*:
    1. **inputSignatureDataframe**: a pandas DataFrame containing gene symbols on rows, differential gene expression signatures on columns, and signature values in each cell.
    2. **column**: a string containing the column name of the signature to query using L1000CDS<sup>2</sup>.

* *Optional*:
    1. **searchType**: a string specifying which type of search to perform. One of *reverse*, *mimic*, or *both* (default).
    2. **nGenes**: an integer giving the number of genes to include in the query (the API does not always support querying ~20,000+ genes). Genes are ranked by absolute signature value and the top *nGenes* are kept. Default: 5000.
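
The gene selection controlled by **nGenes** can be illustrated on a small table. This is a minimal sketch, not part of the original script: the helper name `select_top_genes` is hypothetical, and the toy values are the five rows shown in the example output of section 2.

```python
import pandas as pd

# Toy input in the expected layout: gene symbols on rows, signatures on columns.
# Values are the five rows displayed in the example of section 2.
toy = pd.DataFrame(
    {
        "12-24h": [-0.015604, -0.005026, -0.014543, 0.006655, 0.005586],
        "48h": [0.006413, 0.00667, -0.011545, 0.005796, 0.000972],
        "6h": [-0.013468, -0.00344, -0.009034, -0.006928, -0.000915],
    },
    index=pd.Index(["A1BG", "A1CF", "A2M", "A2M-AS1", "A2ML1"], name="gene_symbol"),
)


def select_top_genes(signature_df, column, n_genes):
    """Hypothetical helper: gene symbols with the largest absolute value in `column`."""
    ranked = signature_df[column].abs().sort_values(ascending=False)
    return ranked.index.tolist()[:n_genes]


top_genes = select_top_genes(toy, "48h", n_genes=3)
print(top_genes)  # ['A2M', 'A1CF', 'A1BG']
```

Ranking uses the absolute value, so strongly down-regulated genes such as A2M are kept alongside strongly up-regulated ones.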

##### Outputs:
* A **Python dict** containing two elements:
    1. **links**, a pandas DataFrame containing links to the pages of the L1000CDS<sup>2</sup> results.  It contains two columns:
        * *URL*, with the links to the results pages
        * *aggravate*, a boolean column specifying whether the search has been performed to mimic (aggravate=True) or reverse (aggravate=False) the input signature.
    
    2. **signatures**, a pandas DataFrame containing information about the top signatures identified by the query.
       * The *aggravate* column denotes whether the signature mimics (*aggravate=True*) or reverses (*aggravate=False*) the input signature.
       * Other columns contain information about each signature, as provided by L1000CDS<sup>2</sup>.
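
The relationship between **searchType** and the *aggravate* flag described above can be sketched as a lookup table. This is an illustrative alternative to the `if`/`elif` chain used in the function below; the names `AGGRAVATE_OPTIONS` and `aggravate_options` are hypothetical and do not appear in the original script.

```python
# Hypothetical lookup table: each search type maps to the aggravate values to query.
# aggravate=True mimics the input signature, aggravate=False reverses it.
AGGRAVATE_OPTIONS = {
    "both": [True, False],
    "mimic": [True],
    "reverse": [False],
}


def aggravate_options(search_type):
    """Return the list of aggravate flags for a search type, or raise ValueError."""
    try:
        return AGGRAVATE_OPTIONS[search_type]
    except KeyError:
        raise ValueError('Option searchType must be one of "reverse", "mimic", or "both"')


print(aggravate_options("both"))  # [True, False]
```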

In [110]:
# Import modules
import requests, json
import pandas as pd

# Define function
def getL1000CDS2Results(inputSignatureDataframe, column, searchType='both', nGenes=5000):

    # Define result dataframe and list
    resultSignatureDataframe = pd.DataFrame()
    linkList = []
    
    # Get search type
    if searchType == 'both':
        aggravateOpts = [True, False]
    elif searchType == 'mimic':
        aggravateOpts = [True]
    elif searchType == 'reverse':
        aggravateOpts = [False]
    else:
        raise ValueError('Option searchType must be one of "reverse", "mimic", or "both"')
    
    # Loop through search types
    for aggravate in aggravateOpts:
        
        # Get top genes, sorting by absolute signature level
        topGenes = abs(inputSignatureDataframe[column]).sort_values(ascending=False).index.tolist()[:nGenes]

        # Set data
        data = {"genes": topGenes, "vals":inputSignatureDataframe.loc[topGenes, column].tolist()}
        data['genes'] = [x.upper() for x in data['genes']]

        # Set configuration
        config = {"aggravate":aggravate, "searchMethod":"CD", "share":True, "combination":False, "db-version":"latest"}
        payload = {"data":data, "config":config}
        headers = {'content-type':'application/json'}

        # Perform request and stop early if the server returns an HTTP error
        r = requests.post('http://amp.pharm.mssm.edu/L1000CDS2/query', data=json.dumps(payload), headers=headers)
        r.raise_for_status()
        resCD = r.json()

        # Add URL
        resCD['URL'] = 'http://amp.pharm.mssm.edu/L1000CDS2/#/result/' + resCD['shareId']

        # Get signature dataframe
        signatureDataframe = pd.DataFrame(resCD['topMeta']).drop('overlap', axis=1).replace('-666', '')

        # Add aggravate column
        signatureDataframe['aggravate'] = aggravate

        # Concatenate
        resultSignatureDataframe = pd.concat([resultSignatureDataframe, signatureDataframe])

        # Add link
        linkList.append({'URL': resCD['URL'], 'aggravate': aggravate})

    # Convert link list to dataframe
    linkDataframe = pd.DataFrame(linkList)

    # Create result dict
    resultDict = {'signatures': resultSignatureDataframe, 'links': linkDataframe}

    # Return dictionary
    return resultDict
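
To inspect what the function above sends without contacting the server, the payload construction can be pulled out on its own. This is a minimal sketch under that assumption: `build_l1000cds2_payload` is a hypothetical helper, the toy signature values are made up for illustration, and the configuration keys are copied from the function above.

```python
import json

import pandas as pd


def build_l1000cds2_payload(signature, aggravate, n_genes=5000):
    """Hypothetical helper: build the JSON-serializable payload for one query.

    `signature` is a pandas Series of signature values indexed by gene symbol.
    """
    # Keep the n_genes genes with the largest absolute signature value
    top_genes = signature.abs().sort_values(ascending=False).index.tolist()[:n_genes]
    data = {
        "genes": [gene.upper() for gene in top_genes],  # upper-cased, as in the function above
        "vals": signature.loc[top_genes].tolist(),
    }
    config = {
        "aggravate": aggravate,
        "searchMethod": "CD",
        "share": True,
        "combination": False,
        "db-version": "latest",
    }
    return {"data": data, "config": config}


# Made-up toy signature, for illustration only
toy_signature = pd.Series({"a2m": -0.5, "A1BG": 0.2, "A1CF": 0.1})
payload = build_l1000cds2_payload(toy_signature, aggravate=False, n_genes=2)
print(json.dumps(payload, indent=2))
```

Printing the payload before posting it makes it easy to confirm that `genes` and `vals` are aligned and that gene symbols are upper-cased.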

### 2. Example Usage

In [111]:
# Read signature dataframe
signatureDataframe = pd.read_table('../../hiv-analysis/hiv-cell-line-analysis/f5-characteristic_direction.dir/hiv_cell_line-combat_cd.txt', index_col='gene_symbol')
signatureDataframe.head()

| gene_symbol | 12-24h | 48h | 6h |
|---|---|---|---|
| A1BG | -0.015604 | 0.006413 | -0.013468 |
| A1CF | -0.005026 | 0.00667 | -0.00344 |
| A2M | -0.014543 | -0.011545 | -0.009034 |
| A2M-AS1 | 0.006655 | 0.005796 | -0.006928 |
| A2ML1 | 0.005586 | 0.000972 | -0.000915 |


In [114]:
# Run L1000CDS2
resultDict = getL1000CDS2Results(signatureDataframe, column='48h')
print(list(resultDict.keys()))

['signatures', 'links']


In [115]:
# Links dataframe
pd.set_option('display.max_colwidth', None) # show full results (on pandas < 1.0, use -1 instead of None)
resultDict['links']

| | URL | aggravate |
|---|---|---|
| 0 | http://amp.pharm.mssm.edu/L1000CDS2/#/result/590b8454b09d47a600bc9504 | True |
| 1 | http://amp.pharm.mssm.edu/L1000CDS2/#/result/590b8456b09d47a600bc9506 | False |


In [125]:
# Get signature dataframe
signatureDataframe = resultDict['signatures']

# Split in two: reverse and mimic
mimicSignatures = signatureDataframe[signatureDataframe['aggravate'] == True].sort_values(by='score', ascending=False)
reverseSignatures = signatureDataframe[signatureDataframe['aggravate'] == False].sort_values(by='score', ascending=False)

In [126]:
# Mimic
mimicSignatures.head()

| | aggravate | cell_id | drugbank_id | pert_desc | pert_dose | pert_dose_unit | pert_id | pert_time | pert_time_unit | pubchem_id | score | sig_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 49 | True | HA1E | | AT-7519 | 3.33 | um | BRD-K13390322 | 24 | h | 11338033 | 0.6867 | LJP006_HA1E_24H:BRD-K13390322:3.33 |
| 48 | True | HA1E | | JNK-9L | 3.33 | um | BRD-K19220233 | 24 | h | 59588070 | 0.6865 | LJP006_HA1E_24H:BRD-K19220233:3.33 |
| 47 | True | MDAMB231 | | CGP-60474 | 10.0 | um | BRD-K79090631 | 24 | h | 644215 | 0.686 | LJP006_MDAMB231_24H:BRD-K79090631:10 |
| 46 | True | MCF10A | | mitoxantrone | 0.37 | um | BRD-K21680192 | 24 | h | 5458171 | 0.6856 | LJP005_MCF10A_24H:BRD-K21680192:0.37 |
| 45 | True | LNCAP | | CGP-60474 | 10.0 | um | BRD-K79090631 | 24 | h | 644215 | 0.6856 | LJP006_LNCAP_24H:BRD-K79090631:10 |


In [127]:
# Reverse
reverseSignatures.head()

| | aggravate | cell_id | drugbank_id | pert_desc | pert_dose | pert_dose_unit | pert_id | pert_time | pert_time_unit | pubchem_id | score | sig_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | ASC | | | 10.0 | um | BRD-K91370081 | 24.0 | h | 253602 | 1.2519 | CPC016_ASC_24H:BRD-K91370081:10.0 |
| 1 | False | HA1E | | Emetine Dihydrochloride Hydrate (74) | 0.63 | um | BRD-K01976263 | 24.0 | h | 11957493 | 1.225 | CPC006_HA1E_24H:BRD-K01976263:0.63 |
| 2 | False | PC3 | | Emetine Dihydrochloride Hydrate (74) | 0.63 | um | BRD-K01976263 | 24.0 | h | 11957493 | 1.2246 | CPC006_PC3_24H:BRD-K01976263:0.63 |
| 3 | False | A549 | | Emetine Dihydrochloride Hydrate (74) | 0.63 | um | BRD-K01976263 | 24.0 | h | 11957493 | 1.216 | CPC006_A549_24H:BRD-K01976263:0.63 |
| 4 | False | ASC | | | 10.0 | um | BRD-K36055864 | 24.0 | h | 6197 | 1.2142 | CPC018_ASC_24H:BRD-K36055864:10.0 |
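
Because the same perturbation can appear in several cell lines, it may help to summarize the top signatures per compound. The following is a rough sketch, not part of the original notebook: it rebuilds three columns of the five reverse signatures shown above by hand, and the names `reverse_top` and `summary` are hypothetical. It assumes pandas 0.25 or later for named aggregation.

```python
import pandas as pd

# Three columns copied from the five reverse signatures shown above
reverse_top = pd.DataFrame(
    {
        "cell_id": ["ASC", "HA1E", "PC3", "A549", "ASC"],
        "pert_id": [
            "BRD-K91370081",
            "BRD-K01976263",
            "BRD-K01976263",
            "BRD-K01976263",
            "BRD-K36055864",
        ],
        "score": [1.2519, 1.225, 1.2246, 1.216, 1.2142],
    }
)

# Count distinct cell lines and keep the best score for each perturbation
summary = (
    reverse_top.groupby("pert_id")
    .agg(n_cell_lines=("cell_id", "nunique"), best_score=("score", "max"))
    .sort_values(["n_cell_lines", "best_score"], ascending=False)
)
print(summary)
```

With the full results, the same grouping could be applied directly to `reverseSignatures` or `mimicSignatures`.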
