# MELODI Presto API Example Usage

In [1]:
import json
import pandas as pd
import requests
import time
from random import randint
import scipy.stats as stats
from utils import enrich, overlap, sentence

### Configure parameters

In [2]:
API_URL = "https://melodi-presto.mrcieu.ac.uk/api/"

requests.get(f"{API_URL}status").json()

True
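As a minimal sketch, the status check can be wrapped so that a timeout or a network failure yields `False` instead of an exception. The helper names `endpoint` and `api_is_up` are hypothetical, the sketch assumes the `status` endpoint returns the JSON value `true` as in the output above, and it uses only the standard library rather than `requests`.

```python
import json
from urllib.request import urlopen

API_URL = "https://melodi-presto.mrcieu.ac.uk/api/"


def endpoint(path, base=API_URL):
    """Join a base URL and a path without producing a double slash."""
    return f"{base.rstrip('/')}/{path.lstrip('/')}"


def api_is_up(timeout=10):
    """Return True only if the status endpoint answers with JSON true (assumed behaviour)."""
    try:
        with urlopen(endpoint("status"), timeout=timeout) as response:
            return json.load(response) is True
    except (OSError, ValueError):
        # OSError covers URL, HTTP and timeout errors; ValueError covers malformed JSON
        return False


print(endpoint("status"))
```

Calling `api_is_up()` before a long batch of queries avoids waiting on a service that is unreachable.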

### How the enrichment is performed

Enrichment is assessed with a basic Fisher's exact test, using the `scipy.stats.fisher_exact` function.

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.fisher_exact.html

In [3]:
import scipy.stats as stats

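# Counts and totals for the query set and for the global set
queryTripleCount, queryTripleTotal, globalTripleCount, globalTripleTotal = 10, 3505, 147, 6533824
# Fisher's exact test on the 2x2 table of these values
oddsratio, pvalue = stats.fisher_exact([[queryTripleCount, queryTripleTotal], [globalTripleCount, globalTripleTotal]])
oddsratio, pvalue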

(126.81250303259678, 3.4724305806153405e-18)
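To make the two returned values less opaque, the following sketch recomputes them with the standard library only. It is an illustration under assumptions, not the code behind `fisher_exact`: the function names are hypothetical, the odds ratio is the plain sample odds ratio of the 2x2 table, and the two-sided p-value sums the hypergeometric probabilities of all tables that are no more likely than the observed one.

```python
from fractions import Fraction
from math import comb


def sample_odds_ratio(table):
    """Cross-product ratio (a*d)/(b*c) of a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


def fisher_two_sided(table):
    """Two-sided Fisher's exact p-value from exact hypergeometric probabilities."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Probability that the top-left cell equals x with all margins fixed
        return Fraction(comb(col1, x) * comb(n - col1, row1 - x), comb(n, row1))

    observed = prob(a)
    lowest, highest = max(0, row1 + col1 - n), min(row1, col1)
    # Sum over every table that is at most as probable as the observed one
    total = sum(p for p in map(prob, range(lowest, highest + 1)) if p <= observed)
    return float(total)


table = [[10, 3505], [147, 6533824]]
print(sample_odds_ratio(table))  # about 126.81
print(fisher_two_sided(table))
```

Exact fractions keep the comparison `p <= observed` free of floating-point rounding, at the cost of some speed on large tables.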

### Performance 

We can compare the performance before and after the initial query by using a query that has not been run before, e.g. `physical activity or 123`. In a PubMed search this returns over 550,000 articles. The first run takes over 20 seconds and creates a data set of around 10,000 triples; the second run takes only a few seconds.

In [5]:
r=randint(0, 1000000)
q='physical activity or '+str(r)
print(q)

def run_enrich(query_term):
    # Time a single enrichment call for the given query
    start = time.time()
    enrich_df = enrich(query_term)
    print(enrich_df.shape)
    end = time.time()
    t = "{:.4f}".format(end-start)
    return t
    
t1 = run_enrich(q)
t2 = run_enrich(q)
print('t1:',t1,'\nt2:',t2)

physical activity or 504382
(10381, 16)
(10381, 16)
t1: 32.0775 
t2: 5.3735
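The timing pattern used above can be factored into one reusable helper. This is a sketch: `timed` and `slow_square` are hypothetical names, and `slow_square` only stands in for a slow call such as `enrich`, which needs network access.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def slow_square(x):
    # Stand-in for a slow remote call
    time.sleep(0.05)
    return x * x


value, seconds = timed(slow_square, 7)
print(value, f"{seconds:.4f}")
```

With the notebook's helpers this would be called as `timed(enrich, q)` or `timed(overlap, q1, q2)`. `time.perf_counter` is used instead of `time.time` because it is not affected by system clock adjustments.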


Likewise, we can run the overlap query with two new queries, and then run it again with the same queries.

In [12]:
r=randint(0, 1000000)
q1=['vitamin d or '+str(r)]
q2=['prostate cancer or '+str(r)]
print(q1,':',q2)

def run_overlap(q1,q2):
    start = time.time()
    overlap_df = overlap(q1,q2)
    print(overlap_df.shape)
    end = time.time()
    t = "{:.4f}".format(end-start)
    return t
    
t1 = run_overlap(q1,q2)
t2 = run_overlap(q1,q2)
print('t1:',t1,'\nt2:',t2)

['vitamin d or 680522'] : ['prostate cancer or 680522']
(530, 32)
(530, 32)
t1: 28.2129 
t2: 1.0195
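The size of the effect is easier to see as a ratio of first-run to second-run time. The short sketch below only does arithmetic on the timings printed in the two outputs above; the variable names are illustrative.

```python
# Timings in seconds, copied from the outputs above: (first run, second run)
timings = {
    "enrich": (32.0775, 5.3735),
    "overlap": (28.2129, 1.0195),
}

# Speed-up factor of the second run relative to the first
speedups = {name: first / second for name, (first, second) in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x faster on the second run")
```

On these single measurements the repeated enrichment query is roughly 6 times faster and the repeated overlap query roughly 28 times faster; timings from one run will vary with network and server load.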


##### Comparing performance of similar tools

To our knowledge, the only methods providing this kind of overlap analysis are Arrowsmith (http://arrowsmith.psych.uic.edu/) and MELODI (http://melodi.biocompute.org.uk/). A query of `vitamin d` and `prostate cancer` takes over 30 minutes on both platforms. 

### Comparing output

We can attempt to compare the output of the same overlap query across the three platforms. In this case, MELODI Presto data will be derived in real time, whereas data from Arrowsmith and MELODI have been pre-calculated.

MELODI - http://melodi.biocompute.org.uk/results/b1741206-90ae-490b-8580-4ad7c50f7f45/
Arrowsmith - 

In [13]:
#load the MELODI data
melodi_df=pd.read_csv('melodi_result_4534.csv')
melodi_df.head()

| Unnamed: 0 | name1 | name2 | name3 | name4 | name5 | mean_cp | mean_odds | uniq_a | uniq_b | shared | score | treeLevel |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Human Cell Line (cell) | LOCATION_OF | Vitamin D3 Receptor (gngm) | ASSOCIATED_WITH | Malignant neoplasm of prostate (neop) | 3.27e-22 | 266.11 | 14 | 14 | 0 | 28.0 | 2.0 |
| 1 | Human Cell Line (cell) | LOCATION_OF | Vitamin D3 Receptor (gngm) | PREDISPOSES | Malignant neoplasm of prostate (neop) | 1.47e-20 | 265.34 | 14 | 13 | 0 | 25.07 | 2.0 |
| 2 | Human Cell Line (cell) | LOCATION_OF | Vitamin D3 Receptor (gngm) | ASSOCIATED_WITH | Malignant neoplasm of prostate (neop) | 1.5e-09 | 277.61 | 14 | 6 | 0 | 8.57 | 2.0 |
| 3 | Diagnosis (hlca) | TREATS | Vitamin D Deficiency (dsyn) | PREDISPOSES | Malignant neoplasm of prostate (neop) | 3.65e-07 | 215.75 | 4 | 10 | 0 | 5.6 | 4.0 |
| 4 | Vitamin D (phsu) | PREDISPOSES | Vitamin D Deficiency (dsyn) | PREDISPOSES | Malignant neoplasm of prostate (neop) | 3.77e-16 | 277.63 | 18 | 10 | 0 | 15.56 | 21.0 |
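Comparing these rows with output from another platform will probably require normalising the names first. The sketch below is one possible approach, not part of MELODI: `split_name` and `row_to_triples` are hypothetical helpers, the four-letter code in parentheses is assumed to be a semantic type, and each row is assumed to chain two triples through the shared middle term `name3`.

```python
import re

# Label followed by a four-letter lowercase code in parentheses, e.g. "Vitamin D (phsu)"
NAME_PATTERN = re.compile(r"^(?P<label>.+?)\s*\((?P<semtype>[a-z]{4})\)$")


def split_name(name):
    """Split 'Label (code)' into (label, code); return (name, None) if no code is found."""
    cleaned = name.strip()
    match = NAME_PATTERN.match(cleaned)
    if match is None:
        return cleaned, None
    return match.group("label"), match.group("semtype")


def row_to_triples(row):
    """Turn one result row (a mapping with name1..name5) into two (subject, predicate, object) triples."""
    first, middle, last = (split_name(row[key])[0] for key in ("name1", "name3", "name5"))
    return [(first, row["name2"], middle), (middle, row["name4"], last)]


example_row = {
    "name1": "Human Cell Line (cell)",
    "name2": "LOCATION_OF",
    "name3": "Vitamin D3 Receptor (gngm)",
    "name4": "ASSOCIATED_WITH",
    "name5": "Malignant neoplasm of prostate (neop)",
}
print(row_to_triples(example_row))
```

With pandas, the same function could be applied row by row, for example with `melodi_df.apply(row_to_triples, axis=1)`, to obtain triples that can be matched against another result set.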
