The original Jupyter Notebook version can be found [here](https://github.com/biothings/JSON-LD_BioThings_API_DEMO/blob/master/src/Crawling%20API%20with%20JSON-LD.ipynb).

## Task Description

**Input**: Variant HGVS ID (e.g. chr6:g.26093141G>A)

1. Retrieve the Entrez Gene ID(s) **set1g** related to the HGVS ID(s)
2. Retrieve the WikiPathways ID(s) **set1p** in which the Entrez Gene ID(s) set1g are involved
3. Retrieve the other Entrez Gene ID(s) **set2g** which are included in the WikiPathways ID(s) set1p
4. Retrieve the UniProt ID(s) **set1u** which correspond to the Entrez Gene ID(s) set2g
5. Retrieve the drug InChI Key(s) **set1d** which target the UniProt ID(s) set1u

**Output**: Available drugs targeting genes/pathways related to the input HGVS ID

### Requirements

1. Clone the demo repo and run the code from its **'src'** folder. The **JSON-LD_BioThings_API_DEMO** repo, available on [GitHub](https://github.com/biothings/JSON-LD_BioThings_API_DEMO), stores all the code used for the paper. This demo uses the Python modules **'biothings_helper'**, **'biothings'** and **'jsonld_processor'**.

### Input

In [1]:
# input a list of variants
variant_list = ['chr6:g.26093141G>A', 'chr12:g.111351981C>T']

### Import biothings library

In [2]:
# The IdListHandler class converts a list of IDs of a given input type into a list of IDs of a given output type
from biothings import IdListHandler
ih = IdListHandler()
# dict2list helps convert the dictionary output to a list
from biothings import dict2list

### Step 1: HGVS ID -> Entrez Gene ID

In [3]:
gene_result = ih.list_handler(input_id_list=variant_list, 
                              _input_type='http://identifiers.org/hgvs/', 
                              _output_type='http://identifiers.org/ncbigene/', 
                              uri=True)
print(gene_result)
set1g = dict2list(gene_result)

{'chr12:g.111351981C>T': [{'source': 'myvariant.info', 'entrez_gene_id': '4633'}], 'chr6:g.26093141G>A': [{'source': 'myvariant.info', 'entrez_gene_id': '3077'}]}


### Step 2: Entrez Gene ID -> WikiPathways ID

In [4]:
wikipathway_result = ih.list_handler(input_id_list=set1g, 
                                     _input_type='http://identifiers.org/ncbigene/', 
                                     _output_type='http://identifiers.org/wikipathways/', 
                                     uri=True)
print(wikipathway_result)
set1p = dict2list(wikipathway_result)

{'3077': [{'wikipathway_id': 'WP3924', 'source': 'mygene.info'}], '4633': [{'wikipathway_id': ['WP3888', 'WP2406', 'WP383', 'WP289'], 'source': 'mygene.info'}]}
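
The printed results in Steps 1 and 2 share one shape: each input ID maps to a list of hits, and each hit carries a `source` plus one ID field that holds either a single string or a list of strings. The sketch below shows one way such a result could be flattened into a plain list of IDs. `flatten_results` is a hypothetical stand-in written for illustration; it is not the repo's `dict2list`, whose actual implementation and ordering may differ.

```python
def flatten_results(result):
    """Collect every ID from a list_handler-style result into a sorted list.

    Hypothetical helper for illustration; the real dict2list may differ.
    """
    ids = set()
    for hits in result.values():
        for hit in hits:
            for key, value in hit.items():
                if key == 'source':
                    continue  # provenance of the hit, not an ID
                # an ID field may hold a single ID or a list of IDs
                if isinstance(value, list):
                    ids.update(value)
                else:
                    ids.add(value)
    return sorted(ids)


# the Step 2 output, copied from the notebook
wikipathway_result = {
    '3077': [{'wikipathway_id': 'WP3924', 'source': 'mygene.info'}],
    '4633': [{'wikipathway_id': ['WP3888', 'WP2406', 'WP383', 'WP289'],
              'source': 'mygene.info'}],
}
print(flatten_results(wikipathway_result))
```

Using a set removes duplicates, which matters once several genes share a pathway; sorting only makes the output order reproducible.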


### Step 3: WikiPathways ID -> Entrez Gene ID

In [5]:
gene_result2 = ih.list_handler(input_id_list=set1p, 
                               _input_type='http://identifiers.org/wikipathways/', 
                               _output_type='http://identifiers.org/ncbigene/', 
                               uri=True)
set2g = dict2list(gene_result2)
print(len(set2g))

453


### Step 4: Entrez Gene ID -> UniProt ID

In [6]:
uniprot_result = ih.list_handler(input_id_list=set2g, 
                                 _input_type='http://identifiers.org/ncbigene/', 
                                 _output_type='http://identifiers.org/uniprot/', 
                                 uri=True)
set1u = dict2list(uniprot_result)
print(len(set1u))

458


### Step 5: UniProt ID -> InChI Key

In [7]:
inchi_result = ih.list_handler(input_id_list=set1u,
                              _input_type='http://identifiers.org/uniprot/',
                              _output_type='http://identifiers.org/inchikey/',
                              uri=True)
inchi_list = dict2list(inchi_result)
print(len(inchi_list))
print(inchi_list[0:5])

466
['ACPOUJIDANTYHO-UHFFFAOYSA-N', 'AECPTICWHONWNW-UHFFFAOYSA-N', 'AFSDNFLWKVMVRB-UHFFFAOYSA-N', 'AMNKRBRQQAMACZ-UHFFFAOYSA-N', 'APYXQTXFRIDSGE-UHFFFAOYSA-N']
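
All five steps follow the same pattern: convert a list of IDs from one URI type to the next, flatten the result, and pass it on. That pattern can be sketched as a loop over the chain of URIs. `run_chain` and `fake_convert` below are hypothetical names introduced for illustration; `fake_convert` performs no lookup and only tags each ID so that the sketch runs offline.

```python
def run_chain(start_ids, uri_chain, convert):
    """Feed each step's output IDs into the next conversion.

    `convert(ids, input_type, output_type)` is a hypothetical callable
    that returns a flat list of IDs.
    """
    ids = list(start_ids)
    # pair every URI with its successor: (hgvs, ncbigene), (ncbigene, wikipathways), ...
    for input_type, output_type in zip(uri_chain, uri_chain[1:]):
        ids = convert(ids, input_type, output_type)
    return ids


# the identifier types visited by Steps 1 to 5, in order
URI_CHAIN = [
    'http://identifiers.org/hgvs/',
    'http://identifiers.org/ncbigene/',
    'http://identifiers.org/wikipathways/',
    'http://identifiers.org/ncbigene/',
    'http://identifiers.org/uniprot/',
    'http://identifiers.org/inchikey/',
]


def fake_convert(ids, input_type, output_type):
    # Stand-in for a real lookup: prefix each ID with the last path
    # segment of the target URI so the order of the hops stays visible.
    label = output_type.rstrip('/').rsplit('/', 1)[-1]
    return ['{}:{}'.format(label, i) for i in ids]


print(run_chain(['chr6:g.26093141G>A'], URI_CHAIN, fake_convert))
```

With the notebook's objects, `convert` could presumably be `lambda ids, i, o: dict2list(ih.list_handler(input_id_list=ids, _input_type=i, _output_type=o, uri=True))`, which would reproduce the cells above in one call.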


### Breakdown of the list_handler function

The following code walks through each step that the list_handler function demonstrated above performs, using the Entrez Gene ID -> WikiPathways ID conversion as an example.

[Metadata information about the BioThings APIs (config)](https://github.com/biothings/JSON-LD_BioThings_API_DEMO/blob/master/src/config.py)

#### Step 1: Specify input and output

In [8]:
input_value = '3077'
input_type='http://identifiers.org/ncbigene/'
output_type='http://identifiers.org/wikipathways/'

#### Step 2: Iterate through the API metadata and find the API matching the input and output

In [9]:
from config import AVAILABLE_API_ENDPOINTS
from jsonld_processor import flatten_doc
from biothings_helper import construct_url, find_id_from_uri, find_value_from_output_type

In [10]:
# convert the URIs to the internal input and output names
input_name = find_id_from_uri(input_type)
output_name = find_id_from_uri(output_type)
# collect the indices of all APIs whose metadata lists both names
endpoint_list = []
for i in range(len(AVAILABLE_API_ENDPOINTS)):
    endpoint = AVAILABLE_API_ENDPOINTS[i]
    if input_name in endpoint["input"] and output_name in endpoint["output"]:
        endpoint_list.append(i)
# report the first matching API
print(AVAILABLE_API_ENDPOINTS[endpoint_list[0]]["api"])

mygene.info
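
The lookup above depends on the contents of `config.py`, so it cannot be run without the repo. The sketch below repeats the same matching logic against a toy registry. The registry entries and the internal ID names (`hgvs_id`, `entrez_gene_id`, and so on) are assumptions made up for illustration, not the real contents of `AVAILABLE_API_ENDPOINTS`. It also guards against the case where no endpoint matches, which would raise an `IndexError` on `endpoint_list[0]` in the cell above.

```python
# Toy registry in the shape the notebook indexes: a list of dicts with
# "api", "input" and "output" keys. All entries and ID names are made up.
TOY_ENDPOINTS = [
    {"api": "myvariant.info",
     "input": ["hgvs_id"],
     "output": ["entrez_gene_id"]},
    {"api": "mygene.info",
     "input": ["entrez_gene_id"],
     "output": ["wikipathway_id", "uniprot_id"]},
]


def find_endpoints(endpoints, input_name, output_name):
    """Return the indices of endpoints that accept input_name and emit output_name."""
    return [
        index
        for index, endpoint in enumerate(endpoints)
        if input_name in endpoint["input"] and output_name in endpoint["output"]
    ]


matches = find_endpoints(TOY_ENDPOINTS, "entrez_gene_id", "wikipathway_id")
if matches:
    print(TOY_ENDPOINTS[matches[0]]["api"])
else:
    print("no endpoint supports this conversion")
```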


#### Step 3: Make API call

In [11]:
# construct url based on metadata info
url = construct_url(endpoint_list[0], input_value, input_name)
# make API call
import requests
doc = requests.get(url).json()
# For better display in the notebook, the whole JSON doc is not printed here;
# uncomment the following line to display the doc returned by the API call
# print(doc)

#### Step 4: Transform the JSON doc into a JSON-LD doc and N-Quads format

In [12]:
import json
from pyld import jsonld
# flatten the JSON doc returned by the API call
json_doc = flatten_doc(doc)
# load the JSON-LD context file of the selected API
with open(AVAILABLE_API_ENDPOINTS[endpoint_list[0]]["jsonld"]) as f:
    context = json.load(f)
# construct the JSON-LD doc by merging the context into the flattened doc
json_doc.update(context)
# transform the JSON-LD doc to N-Quads format
t = jsonld.JsonLdProcessor()
nquads = t.parse_nquads(jsonld.to_rdf(json_doc, {'format': 'application/nquads'}))['@default']
print(nquads[1])
# For better display in the notebook, the whole N-Quads doc is not printed here;
# uncomment the following line to display it
# print(nquads)

{'object': {'type': 'literal', 'datatype': 'http://www.w3.org/2001/XMLSchema#string', 'value': 'ENSP00000259699'}, 'subject': {'type': 'blank node', 'value': '_:b0'}, 'predicate': {'type': 'IRI', 'value': 'http://identifiers.org/ensembl.protein/'}}
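
The printed quad shows the effect of the context: a field name from the API response has been replaced by a URI in the `predicate`. The sketch below imitates that mapping in plain Python, without pyld, to make the idea concrete. It is a deliberate simplification of JSON-LD processing, and the field names in `toy_doc` and `toy_context` are assumptions for illustration, not the contents of the repo's context files.

```python
def to_statements(flat_doc, context):
    """Pair each value in flat_doc with the URI its key maps to.

    Simplified stand-in for JSON-LD expansion: keys that the context
    does not map to a URI are dropped.
    """
    statements = []
    for key, value in flat_doc.items():
        uri = context.get(key)
        if uri is None:
            continue  # unmapped field, no statement is produced
        values = value if isinstance(value, list) else [value]
        for item in values:
            statements.append({
                'predicate': {'type': 'IRI', 'value': uri},
                'object': {'type': 'literal', 'value': item},
            })
    return statements


# hypothetical flattened API response and context
toy_doc = {
    'pathway.wikipathways.id': 'WP3924',
    'ensembl.protein': 'ENSP00000259699',
    'note': 'field without a URI mapping',
}
toy_context = {
    'pathway.wikipathways.id': 'http://identifiers.org/wikipathways/',
    'ensembl.protein': 'http://identifiers.org/ensembl.protein/',
}

statements = to_statements(toy_doc, toy_context)
# select values by URI instead of by API-specific field name
wanted = 'http://identifiers.org/wikipathways/'
print([s['object']['value'] for s in statements
       if s['predicate']['value'] == wanted])
```

Because values are selected by URI, the same query works for any API whose context maps some field to that URI, whatever the field is called in the raw response.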


#### Step 5: Fetch the values by URI from the N-Quads format

In [13]:
value_list = []
for item in nquads:
    if item['predicate']['value'] == output_type:
        value_list.append(item['object']['value'])
value = list(set(value_list))
print(value)

['WP3924']
