# Notebook to explore treatments for mTORopathies during NCATS Hackathon 2019-09-15

This is one of the domain areas that we seek to explore during the Sep 2019 Hackathon in Seattle.  The Subject Matter Expert (SME) graph that we are trying to reconstruct using BioThings Explorer (BTE) is shown below:

![title](img/Smith-Kingsmore.jpeg)

## BTE configuration

### Important prerequisite
The package doesn't work with the newest version of Jupyter Notebook. Run the following commands in your terminal before starting the Notebook:
1. `pip install notebook==5.7.5`
2. `pip install tornado==4.5.3`

### Install dependencies

Error messages may appear during installation; don't worry about them.

In [1]:
# uncomment the following line if you haven't installed bte_schema
# !pip install git+https://github.com/kevinxin90/bte_schema#egg=bte_schema

In [2]:
# uncomment the following line if you haven't installed biothings_schema
# !pip install git+https://github.com/biothings/biothings_schema.py#egg=biothings_schema.py

### Initiating the package

In [3]:
# import the query module
from biothings_explorer.user_query_dispatcher import SingleEdgeQueryDispatcher
# import the hint module (suggest hits based on your input)
from biothings_explorer.hint import Hint
# import the registry module
from biothings_explorer.registry import Registry
reg = Registry()
ht = Hint()

## Find disease nodes

In [4]:
# use the hint module to let BioThings Explorer suggest the inputs for you
a = ht.query('smith kingsmore')
# the output of the hint module is grouped by semantic types
a

{'Gene': [],
 'SequenceVariant': [],
 'ChemicalSubstance': [],
 'DiseaseOrPhenotypicFeature': [{'mondo': 'MONDO:0014716',
   'umls': 'C4225259',
   'name': 'macrocephaly-intellectual disability-neurodevelopmental disorder-small thorax syndrome',
   'display': 'mondo(MONDO:0014716) umls(C4225259) name(macrocephaly-intellectual disability-neurodevelopmental disorder-small thorax syndrome) ',
   'type': 'DiseaseOrPhenotypicFeature',
   'primary': {'identifier': 'mondo',
    'cls': 'DiseaseOrPhenotypicFeature',
    'value': 'MONDO:0014716'}}],
 'Pathway': [],
 'MolecularFunction': [],
 'CellularComponent': [],
 'BiologicalProcess': [],
 'Anatomy': [],
 'PhenotypicFeature': []}

In [5]:
node_sks = a['DiseaseOrPhenotypicFeature'][0]
node_sks

{'mondo': 'MONDO:0014716',
 'umls': 'C4225259',
 'name': 'macrocephaly-intellectual disability-neurodevelopmental disorder-small thorax syndrome',
 'display': 'mondo(MONDO:0014716) umls(C4225259) name(macrocephaly-intellectual disability-neurodevelopmental disorder-small thorax syndrome) ',
 'type': 'DiseaseOrPhenotypicFeature',
 'primary': {'identifier': 'mondo',
  'cls': 'DiseaseOrPhenotypicFeature',
  'value': 'MONDO:0014716'}}
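
Indexing with `[0]` raises an `IndexError` when a semantic type has no candidates, as with the empty `'Gene'` list above. A minimal sketch of a safer lookup follows; `first_hit` is a hypothetical helper and not part of BioThings Explorer, and `sample_hint` is an abbreviated copy of the output shown above.

```python
from typing import Optional


def first_hit(hint_result: dict, semantic_type: str) -> Optional[dict]:
    """Return the first candidate for a semantic type, or None if there is none.

    Hypothetical helper: it only inspects the dict returned by the hint module.
    """
    candidates = hint_result.get(semantic_type, [])
    return candidates[0] if candidates else None


# Abbreviated copy of the hint output for 'smith kingsmore'.
sample_hint = {
    "Gene": [],
    "DiseaseOrPhenotypicFeature": [
        {
            "mondo": "MONDO:0014716",
            "umls": "C4225259",
            "type": "DiseaseOrPhenotypicFeature",
            "primary": {
                "identifier": "mondo",
                "cls": "DiseaseOrPhenotypicFeature",
                "value": "MONDO:0014716",
            },
        }
    ],
}

node = first_hit(sample_hint, "DiseaseOrPhenotypicFeature")
print(node["primary"]["value"])        # the MONDO id of the first disease hit
print(first_hit(sample_hint, "Gene"))  # None, because no gene matched
```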

In [6]:
# use the hint module to let BioThings Explorer suggest the inputs for you
a = ht.query('familial focal epilepsy')
# the output of the hint module is grouped by semantic types
a

{'Gene': [],
 'SequenceVariant': [],
 'ChemicalSubstance': [],
 'DiseaseOrPhenotypicFeature': [{'mondo': 'MONDO:0000215',
   'mesh': 'C565785',
   'name': 'epilepsy, familial focal, with variable foci',
   'display': 'mondo(MONDO:0000215) mesh(C565785) name(epilepsy, familial focal, with variable foci) ',
   'type': 'DiseaseOrPhenotypicFeature',
   'primary': {'identifier': 'mondo',
    'cls': 'DiseaseOrPhenotypicFeature',
    'value': 'MONDO:0000215'}},
  {'mondo': 'MONDO:0020310',
   'name': 'familial focal epilepsy with variable foci',
   'display': 'mondo(MONDO:0020310) name(familial focal epilepsy with variable foci) ',
   'type': 'DiseaseOrPhenotypicFeature',
   'primary': {'identifier': 'mondo',
    'cls': 'DiseaseOrPhenotypicFeature',
    'value': 'MONDO:0020310'}},
  {'mondo': 'MONDO:0024556',
   'umls': 'C1858477',
   'name': 'epilepsy, familial focal, with variable foci 1',
   'display': 'mondo(MONDO:0024556) umls(C1858477) name(epilepsy, familial focal, with variable foci 1

#### NOTE: the second result is also a reasonable node for FFEVF

In [7]:
node_ffevf = a['DiseaseOrPhenotypicFeature'][0]
node_ffevf

{'mondo': 'MONDO:0000215',
 'mesh': 'C565785',
 'name': 'epilepsy, familial focal, with variable foci',
 'display': 'mondo(MONDO:0000215) mesh(C565785) name(epilepsy, familial focal, with variable foci) ',
 'type': 'DiseaseOrPhenotypicFeature',
 'primary': {'identifier': 'mondo',
  'cls': 'DiseaseOrPhenotypicFeature',
  'value': 'MONDO:0000215'}}

## Are there any compounds directly connected to either disease?

### First test SKS

In [8]:
seqd = SingleEdgeQueryDispatcher(input_obj=node_sks,
                                 output_cls='ChemicalSubstance',
                                 output_id='bts:chembl',
                                 registry=reg)
seqd.query()

In [9]:
seqd.output_ids

{}

### Next test FFEVF

In [14]:
seqd = SingleEdgeQueryDispatcher(input_obj=node_ffevf,
                                 output_cls='ChemicalSubstance',
                                 output_id='bts:chembl',
                                 registry=reg)
seqd.query()

In [22]:
seqd.to_json()

{'directed': True,
 'multigraph': True,
 'graph': {},
 'nodes': [{'type': 'DiseaseOrPhenotypicFeature',
   'identifier': 'bts:mondo',
   'level': 1,
   'equivalent_ids': {'bts:mondo': ['MONDO:0000215'],
    'bts:doid': [],
    'bts:bfo': [],
    'bts:cohd': [],
    'bts:hp': [],
    'bts:kegg': [],
    'bts:meddra': [],
    'bts:medgen': [],
    'bts:mesh': ['C565785'],
    'bts:omim': [],
    'bts:umls': []},
   'id': 'MONDO:0000215'},
  {'identifier': 'bts:chembl',
   'type': 'ChemicalSubstance',
   'level': 2,
   'equivalent_ids': {'bts:inchi': ['InChI=1S/C12H9Cl2NO3/c1-3-12(2)10(16)15(11(17)18-12)9-5-7(13)4-8(14)6-9/h3-6H,1H2,2H3'],
    'bts:inchikey': ['FSCWZHGZWWDELK-UHFFFAOYSA-N'],
    'bts:rxcui': [],
    'bts:smiles': ['CC1(OC(=O)N(C1=O)c2cc(Cl)cc(Cl)c2)C=C'],
    'bts:pubchem': [39676],
    'bts:chembl': ['CHEMBL513221'],
    'bts:drugbank': [],
    'bts:unii': [],
    'bts:mesh': ['C025643']},
   'id': 'CHEMBL513221'},
  {'identifier': 'bts:chembl',
   'type': 'ChemicalSubst

There are three compounds listed as "associatedWith" FFEVF.  All list MONDO as a source, but I don't see the relationships at https://monarchinitiative.org/disease/MONDO:0000215. ***Ask Kevin***

* CHEMBL513221: Vinclozolin https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL513221/
* CHEMBL590: MENADIONE https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL590/
* CHEMBL98: Vorinostat https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL98/


Regardless, none of these three seem to have any relevance to the SKS/FFEVF graph.
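
The compound IDs can also be pulled out of the `to_json()` dictionary instead of being read off by eye. The sketch below assumes only the `nodes` layout visible in the output above (`type` and `id` keys). `node_ids_of_type` is a hypothetical helper. The output above is truncated after the first compound, so the entries for CHEMBL590 and CHEMBL98 are assumed to mirror the first one.

```python
def node_ids_of_type(graph_json: dict, node_type: str) -> list:
    """Collect the 'id' of every node whose 'type' matches node_type."""
    return [
        node["id"]
        for node in graph_json.get("nodes", [])
        if node.get("type") == node_type
    ]


# Abbreviated stand-in for seqd.to_json(); only the first two nodes are copied
# from the real output, the last two are assumed to have the same shape.
sample_graph = {
    "directed": True,
    "multigraph": True,
    "graph": {},
    "nodes": [
        {"type": "DiseaseOrPhenotypicFeature", "identifier": "bts:mondo",
         "level": 1, "id": "MONDO:0000215"},
        {"type": "ChemicalSubstance", "identifier": "bts:chembl",
         "level": 2, "id": "CHEMBL513221"},
        {"type": "ChemicalSubstance", "identifier": "bts:chembl",
         "level": 2, "id": "CHEMBL590"},
        {"type": "ChemicalSubstance", "identifier": "bts:chembl",
         "level": 2, "id": "CHEMBL98"},
    ],
}

compound_ids = node_ids_of_type(sample_graph, "ChemicalSubstance")
print(compound_ids)
```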

## Multi-hop query through a gene

In [27]:
from biothings_explorer.user_query_dispatcher import MultiEdgeQueryDispatcher

### This is a weird example below -- I start the edge template with a chemical substance, but the input node was a disease...  ***Ask Kevin***

In [25]:
edges = [('ChemicalSubstance', None, 'Gene'), ('Gene', None, 'DiseaseOrPhenotypicFeature')]
meqd = MultiEdgeQueryDispatcher(input_obj=node_sks,
                                edges=edges,
                                registry=reg)
meqd.query()

start to query for associations between ChemicalSubstance and Gene...
finished! Find 2 hits.
start to query for associations between Gene and DiseaseOrPhenotypicFeature...
finished! Find 143 hits.


In [26]:
meqd.show_all_edges()

[('MONDO:0014716', '2475'),
 ('MONDO:0014716', '2475'),
 ('2475', 'MONDO:0024462'),
 ('2475', 'MONDO:0017884'),
 ('2475', 'MONDO:0017884'),
 ('2475', 'MONDO:0011818'),
 ('2475', 'MONDO:0011818'),
 ('2475', 'MONDO:0014716'),
 ('2475', 'MONDO:0014716'),
 ('2475', 'MONDO:0017101'),
 ('2475', 'MONDO:0017102'),
 ('2475', 'C0345967'),
 ('2475', 'C0149721'),
 ('2475', 'C0036341'),
 ('2475', 'C0334634'),
 ('2475', 'C1955869'),
 ('2475', 'C0011573'),
 ('2475', 'C0085159'),
 ('2475', 'C0345904'),
 ('2475', 'C2239176'),
 ('2475', 'C1336839'),
 ('2475', 'C2931899'),
 ('2475', 'C0687133'),
 ('2475', 'C0687133'),
 ('2475', 'C0020443'),
 ('2475', 'C0027055'),
 ('2475', 'C0699748'),
 ('2475', 'C0699748'),
 ('2475', 'C0346429'),
 ('2475', 'C0425424'),
 ('2475', 'C0153676'),
 ('2475', 'C0242184'),
 ('2475', 'C0242184'),
 ('2475', 'C1273937'),
 ('2475', 'C0027651'),
 ('2475', 'C0178874'),
 ('2475', 'C0233794'),
 ('2475', 'C0596263'),
 ('2475', 'C0029456'),
 ('2475', 'C0038002'),
 ('2475', 'C0038435'),
 (

### Testing SKS

In [28]:
edges = [('DiseaseOrPhenotypicFeature', None, 'Gene'), ('Gene', None, 'ChemicalSubstance')]
meqd = MultiEdgeQueryDispatcher(input_obj=node_sks,
                                edges=edges,
                                registry=reg)
meqd.query()

start to query for associations between DiseaseOrPhenotypicFeature and Gene...
finished! Find 2 hits.
start to query for associations between Gene and ChemicalSubstance...
finished! Find 235 hits.


In [29]:
meqd.show_all_edges()

[('MONDO:0014716', '2475'),
 ('MONDO:0014716', '2475'),
 ('2475', 'CHEMBL1200686'),
 ('2475', 'CHEMBL1200686'),
 ('2475', 'CHEMBL413'),
 ('2475', 'CHEMBL413'),
 ('2475', 'CHEMBL413'),
 ('2475', 'CHEMBL1801204'),
 ('2475', 'CHEMBL1908360'),
 ('2475', 'CHEMBL1908360'),
 ('2475', 'CHEMBL1908360'),
 ('2475', 'CHEMBL269259'),
 ('2475', 'CHEMBL2326966'),
 ('2475', 'CHEMBL3545366'),
 ('2475', 'CHEMBL1201182'),
 ('2475', 'CHEMBL1201182'),
 ('2475', 'CHEMBL1201182'),
 ('2475', 'CHEMBL1922094'),
 ('2475', 'CHEMBL3545151'),
 ('2475', 'CHEMBL1684984'),
 ('2475', 'CHEMBL1879463'),
 ('2475', 'CHEMBL592445'),
 ('2475', 'CHEMBL3545056'),
 ('2475', 'CHEMBL1431'),
 ('2475', 'CHEMBL3544999'),
 ('2475', 'CHEMBL1236962'),
 ('2475', 'CHEMBL1234354'),
 ('2475', 'CHEMBL573339'),
 ('2475', 'CHEMBL1081312'),
 ('2475', 'CHEMBL850'),
 ('2475', 'CHEMBL2103839'),
 ('2475', 'CHEMBL2103839'),
 ('2475', 'CHEMBL2103839'),
 ('2475', 'CHEMBL15245'),
 ('2475', 'CHEMBL1256459'),
 ('2475', 'CHEMBL1374379'),
 ('2475', 'CHEMB

Gene `2475` is mTOR, great! This association comes from both DisGeNET and MONDO. BTE reports that both resources use the relatively generic predicate `associatedWith`. MONDO actually reports the gene as causative (https://monarchinitiative.org/disease/MONDO:0014716#gene-causal), while DisGeNET reports a "Gene-Disease association".
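
The edge list above repeats many pairs, presumably because the result is a multigraph with one parallel edge per supporting record. A short sketch for collapsing the list and counting the repeats, using only the first few tuples copied from the output:

```python
from collections import Counter

# First few tuples from meqd.show_all_edges(), copied from the output above.
edges = [
    ("MONDO:0014716", "2475"),
    ("MONDO:0014716", "2475"),
    ("2475", "CHEMBL1200686"),
    ("2475", "CHEMBL1200686"),
    ("2475", "CHEMBL413"),
    ("2475", "CHEMBL413"),
    ("2475", "CHEMBL413"),
    ("2475", "CHEMBL1801204"),
]

# Counter keeps one key per distinct (source, target) pair and counts repeats.
edge_counts = Counter(edges)

for (source, target), n_records in edge_counts.items():
    print(f"{source} -> {target}: {n_records} record(s)")
```

With the full list, `len(edge_counts)` gives the number of distinct connections, which is likely to be smaller than the hit count printed by `meqd.query()`.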

In [31]:
meqd.display_edge_info('MONDO:0014716', '2475')

{0: {'info': {'bts:entrez': [2475],
   '@type': 'Gene',
   '$input': 'bts:umls',
   '$source': 'disgenet'},
  'label': 'bts:associatedWith',
  'source': 'disgenet'},
 1: {'info': {'bts:hgnc': ['3942'],
   'bts:source': ['https://data.monarchinitiative.org/ttl/omim.ttl',
    'https://data.monarchinitiative.org/ttl/orphanet.ttl',
    'https://data.monarchinitiative.org/ttl/clinvar.nt'],
   'bts:publication': [{'id': 'PMID:25851998', 'label': None},
    {'id': 'PMID:27159400', 'label': None},
    {'id': 'PMID:27753196', 'label': None},
    {'id': 'PMID:26542245', 'label': None}],
   'bts:taxid': ['NCBITaxon:9606'],
   '@type': 'Gene',
   '$input': 'bts:mondo',
   '$source': 'biolink'},
  'label': 'bts:associatedWith',
  'source': 'biolink'}}
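
To compare what each resource reports for the same edge, the nested dictionary can be flattened into one row per record. This is a sketch over the structure shown above; `summarize_edge_info` is a hypothetical helper, and `sample_edge_info` is an abbreviated copy of the output.

```python
def summarize_edge_info(edge_info: dict) -> list:
    """Return (source, predicate label, input id type) for each parallel edge."""
    return [
        (record["source"], record["label"], record["info"].get("$input"))
        for _, record in sorted(edge_info.items())
    ]


# Abbreviated copy of meqd.display_edge_info('MONDO:0014716', '2475').
sample_edge_info = {
    0: {
        "info": {"bts:entrez": [2475], "@type": "Gene",
                 "$input": "bts:umls", "$source": "disgenet"},
        "label": "bts:associatedWith",
        "source": "disgenet",
    },
    1: {
        "info": {"bts:hgnc": ["3942"], "@type": "Gene",
                 "$input": "bts:mondo", "$source": "biolink"},
        "label": "bts:associatedWith",
        "source": "biolink",
    },
}

summary = summarize_edge_info(sample_edge_info)
for source, label, input_type in summary:
    print(f"{source}: {label} (queried by {input_type})")
```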

----------
# boilerplate below
----------

## Feature 1: Single Hop Query
Query from a specific biological entity (e.g. entrezgene 1017) to other semantic types (e.g. ChemicalSubstance)

### The example below queries all ChemicalSubstances related to Entrez gene 207 (AKT1)

### Step 1: Get your input

In [3]:
# use the hint module to let BioThings Explorer suggest the inputs for you
a = ht.query('smith kingsmore')
# the output of the hint module is grouped by semantic types
a

{'Gene': [],
 'SequenceVariant': [],
 'ChemicalSubstance': [],
 'DiseaseOrPhenotypicFeature': [{'mondo': 'MONDO:0014716',
   'umls': 'C4225259',
   'name': 'macrocephaly-intellectual disability-neurodevelopmental disorder-small thorax syndrome',
   'display': 'mondo(MONDO:0014716) umls(C4225259) name(macrocephaly-intellectual disability-neurodevelopmental disorder-small thorax syndrome) ',
   'type': 'DiseaseOrPhenotypicFeature',
   'primary': {'identifier': 'mondo',
    'cls': 'DiseaseOrPhenotypicFeature',
    'value': 'MONDO:0014716'}}],
 'Pathway': [],
 'MolecularFunction': [],
 'CellularComponent': [],
 'BiologicalProcess': [],
 'Anatomy': [],
 'PhenotypicFeature': []}

In [7]:
# use the hint module to let BioThings Explorer suggest the inputs for you
a = ht.query('familial focal epilepsy')
# the output of the hint module is grouped by semantic types
a

{'Gene': [],
 'SequenceVariant': [],
 'ChemicalSubstance': [],
 'DiseaseOrPhenotypicFeature': [{'mondo': 'MONDO:0000215',
   'mesh': 'C565785',
   'name': 'epilepsy, familial focal, with variable foci',
   'display': 'mondo(MONDO:0000215) mesh(C565785) name(epilepsy, familial focal, with variable foci) ',
   'type': 'DiseaseOrPhenotypicFeature',
   'primary': {'identifier': 'mondo',
    'cls': 'DiseaseOrPhenotypicFeature',
    'value': 'MONDO:0000215'}},
  {'mondo': 'MONDO:0020310',
   'name': 'familial focal epilepsy with variable foci',
   'display': 'mondo(MONDO:0020310) name(familial focal epilepsy with variable foci) ',
   'type': 'DiseaseOrPhenotypicFeature',
   'primary': {'identifier': 'mondo',
    'cls': 'DiseaseOrPhenotypicFeature',
    'value': 'MONDO:0020310'}},
  {'mondo': 'MONDO:0024556',
   'umls': 'C1858477',
   'name': 'epilepsy, familial focal, with variable foci 1',
   'display': 'mondo(MONDO:0024556) umls(C1858477) name(epilepsy, familial focal, with variable foci 1

In [4]:
# select your input object
input_obj = a['Gene'][0]
input_obj

{'entrez': '207',
 'name': 'AKT serine/threonine kinase 1',
 'symbol': 'AKT1',
 'taxonomy': 9606,
 'umls': 'C0812228',
 'display': 'entrez(207) name(AKT serine/threonine kinase 1) symbol(AKT1) taxonomy(9606) umls(C0812228) ',
 'type': 'Gene',
 'primary': {'identifier': 'entrez', 'cls': 'Gene', 'value': '207'}}

### Step 2: Make Single Hop Query

The query takes the following parameters:
1. input_obj: required. The object that serves as the starting point of your query.
2. output_cls: The output class that serves as the end point of your query. If you don't specify this parameter, the tool automatically searches all possible output classes for you.
3. output_id: optional. The identifier type you want the output to be.
4. pred: optional. The predicate linking your input and output. If you don't specify it, the tool automatically searches all possible linkages.
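
One way to keep the optional parameters tidy is to assemble the keyword arguments first and drop anything left unset. This is only a sketch: `build_query_kwargs` is a hypothetical helper, and it assumes the dispatcher accepts the parameter names exactly as listed above (in particular `pred`, which is not exercised anywhere in this notebook).

```python
def build_query_kwargs(input_obj, output_cls=None, output_id=None, pred=None):
    """Collect query parameters, omitting optional ones that were not given."""
    kwargs = {
        "input_obj": input_obj,
        "output_cls": output_cls,
        "output_id": output_id,
        "pred": pred,
    }
    # Leaving a key out lets the tool fall back to searching all options.
    return {key: value for key, value in kwargs.items() if value is not None}


# Stand-in for an object returned by the hint module.
example_input = {"type": "Gene",
                 "primary": {"identifier": "entrez", "cls": "Gene", "value": "207"}}

query_kwargs = build_query_kwargs(example_input,
                                  output_cls="ChemicalSubstance",
                                  output_id="bts:chembl")
print(sorted(query_kwargs))

# Assumed usage inside the notebook (not executed here):
# seqd = SingleEdgeQueryDispatcher(registry=reg, **query_kwargs)
```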

In [5]:
seqd = SingleEdgeQueryDispatcher(input_obj=input_obj,
                                 output_cls='ChemicalSubstance',
                                 output_id='bts:chembl',
                                 registry=reg)
seqd.query()

### Step 3: Understand the results

The result of the query is a networkx MultiDiGraph connecting your input to all possible outputs.

#### Show results in JSON format

nodes: All nodes in the graph

links: All edges in the graph

In [6]:
seqd.to_json()

{'directed': True,
 'multigraph': True,
 'graph': {},
 'nodes': [{'type': 'Gene',
   'identifier': 'bts:entrez',
   'level': 1,
   'equivalent_ids': {'bts:ensembl': ['ENSG00000142208'],
    'bts:hgnc': ['391'],
    'bts:omim': ['164730'],
    'bts:entrez': ['207'],
    'bts:pharos': [],
    'bts:umls': ['C0812228'],
    'bts:unigene': ['Hs.525622'],
    'bts:pharmgkb': ['PA24684'],
    'bts:symbol': ['AKT1']},
   'id': '207'},
  {'identifier': 'bts:chembl',
   'type': 'ChemicalSubstance',
   'level': 2,
   'equivalent_ids': {'bts:inchi': ['InChI=1S/C10H16N5O13P3/c11-8-5-9(13-2-12-8)15(3-14-5)10-7(17)6(16)4(26-10)1-25-30(21,22)28-31(23,24)27-29(18,19)20/h2-4,6-7,10,16-17H,1H2,(H,21,22)(H,23,24)(H2,11,12,13)(H2,18,19,20)/t4-,6-,7-,10-/m1/s1'],
    'bts:inchikey': ['ZKHQWZAMYRWXGA-KQYNXXCUSA-N'],
    'bts:rxcui': ['318'],
    'bts:smiles': ['Nc1ncnc2c1ncn2[C@@H]3O[C@H](COP(=O)(O)OP(=O)(O)OP(=O)(O)O)[C@@H](O)[C@H]3O'],
    'bts:pubchem': [5957],
    'bts:chembl': ['CHEMBL14249'],
    'bts:

#### List all output ids

In [7]:
seqd.output_ids

{'ChemicalSubstance': {'chembl:CHEMBL14249': {'bts:inchi': ['InChI=1S/C10H16N5O13P3/c11-8-5-9(13-2-12-8)15(3-14-5)10-7(17)6(16)4(26-10)1-25-30(21,22)28-31(23,24)27-29(18,19)20/h2-4,6-7,10,16-17H,1H2,(H,21,22)(H,23,24)(H2,11,12,13)(H2,18,19,20)/t4-,6-,7-,10-/m1/s1'],
   'bts:inchikey': ['ZKHQWZAMYRWXGA-KQYNXXCUSA-N'],
   'bts:rxcui': ['318'],
   'bts:smiles': ['Nc1ncnc2c1ncn2[C@@H]3O[C@H](COP(=O)(O)OP(=O)(O)OP(=O)(O)O)[C@@H](O)[C@H]3O'],
   'bts:pubchem': [5957],
   'bts:chembl': ['CHEMBL14249'],
   'bts:drugbank': ['DB00171'],
   'bts:unii': ['8L70Q75FXE'],
   'bts:mesh': ['D000255']},
  'chembl:CHEMBL1200978': {'bts:chembl': ['CHEMBL1200978']},
  'chembl:CHEMBL23552': {'bts:inchi': ['InChI=1S/C6H16O18P4/c7-1-3(21-25(9,10)11)2(8)5(23-27(15,16)17)6(24-28(18,19)20)4(1)22-26(12,13)14/h1-8H,(H2,9,10,11)(H2,12,13,14)(H2,15,16,17)(H2,18,19,20)/t1-,2-,3-,4+,5-,6-/m0/s1'],
   'bts:inchikey': ['CIPFCGZLFXVXBG-CNWJWELYSA-N'],
   'bts:rxcui': [],
   'bts:smiles': ['O[C@H]1[C@H](OP(=O)(O)O)[C@H](O

#### Dig deep into the network

In [8]:
# List all nodes
seqd.show_all_nodes()

['207',
 'CHEMBL14249',
 'CHEMBL1200978',
 'CHEMBL23552',
 'CHEMBL428963',
 'CHEMBL259833',
 'CHEMBL428496',
 'CHEMBL1079175',
 'CHEMBL2178577',
 'CHEMBL3544935',
 'CHEMBL372764',
 'CHEMBL2177390',
 'CHEMBL85',
 'CHEMBL331237',
 'CHEMBL2219423',
 'CHEMBL462018',
 'CHEMBL584',
 'CHEMBL3545422',
 'CHEMBL3184679',
 'CHEMBL1908360',
 'CHEMBL300138',
 'CHEMBL1236962',
 'CHEMBL1229517',
 'CHEMBL3137336',
 'CHEMBL53463',
 'CHEMBL413',
 'CHEMBL3545049',
 'CHEMBL1201577',
 'CHEMBL481',
 'CHEMBL2396661',
 'CHEMBL521851',
 'CHEMBL1201182',
 'CHEMBL428647',
 'CHEMBL1922094',
 'CHEMBL888',
 'CHEMBL1336',
 'CHEMBL379300',
 'CHEMBL379218',
 'CHEMBL2219422',
 'CHEMBL494089',
 'CHEMBL3545000',
 'CHEMBL3545143',
 'CHEMBL3545134',
 'CHEMBL3545003',
 'DB05971',
 'DB01169',
 'CHEMBL125',
 'CHEMBL1448',
 'CHEMBL50',
 'C0000379',
 'C0000641',
 'C0001056',
 'C0001128',
 'C0001443',
 'C0001455',
 'C0001617',
 'C0001771',
 'C0001962',
 'C0002335',
 'C0002475',
 'C0002607',
 'C0002844',
 'C0002932',
 'C0002934',
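
The node list mixes several identifier styles (`CHEMBL…`, `DB…`, `C…`, plain integers). A rough way to split them is to match each ID against a pattern. The rules below are heuristics assumed from the IDs visible in this notebook, not something BTE exposes; `seqd.display_node_info(...)` reports the authoritative `identifier` for any single node.

```python
import re
from collections import defaultdict

# Heuristic patterns (assumptions based on the IDs seen in this notebook).
NAMESPACE_PATTERNS = [
    ("chembl", re.compile(r"^CHEMBL\d+$")),
    ("drugbank", re.compile(r"^DB\d+$")),
    ("mondo", re.compile(r"^MONDO:\d+$")),
    ("umls", re.compile(r"^C\d{7}$")),
    ("entrez", re.compile(r"^\d+$")),
]


def guess_namespace(node_id: str) -> str:
    """Return the first namespace whose pattern matches, else 'unknown'."""
    for namespace, pattern in NAMESPACE_PATTERNS:
        if pattern.match(node_id):
            return namespace
    return "unknown"


def group_by_namespace(node_ids) -> dict:
    """Group node ids by their guessed namespace, preserving input order."""
    groups = defaultdict(list)
    for node_id in node_ids:
        groups[guess_namespace(node_id)].append(node_id)
    return dict(groups)


# A handful of ids copied from the output of seqd.show_all_nodes().
sample_nodes = ["207", "CHEMBL14249", "CHEMBL85", "DB05971",
                "DB01169", "C0000379", "C0000641"]

groups = group_by_namespace(sample_nodes)
for namespace, ids in groups.items():
    print(namespace, ids)
```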

In [9]:
# list all edges
seqd.show_all_edges()

[('207', 'CHEMBL14249'),
 ('207', 'CHEMBL14249'),
 ('207', 'CHEMBL1200978'),
 ('207', 'CHEMBL23552'),
 ('207', 'CHEMBL23552'),
 ('207', 'CHEMBL428963'),
 ('207', 'CHEMBL428963'),
 ('207', 'CHEMBL259833'),
 ('207', 'CHEMBL259833'),
 ('207', 'CHEMBL428496'),
 ('207', 'CHEMBL1079175'),
 ('207', 'CHEMBL2178577'),
 ('207', 'CHEMBL3544935'),
 ('207', 'CHEMBL372764'),
 ('207', 'CHEMBL372764'),
 ('207', 'CHEMBL2177390'),
 ('207', 'CHEMBL85'),
 ('207', 'CHEMBL331237'),
 ('207', 'CHEMBL2219423'),
 ('207', 'CHEMBL462018'),
 ('207', 'CHEMBL584'),
 ('207', 'CHEMBL3545422'),
 ('207', 'CHEMBL3184679'),
 ('207', 'CHEMBL1908360'),
 ('207', 'CHEMBL300138'),
 ('207', 'CHEMBL300138'),
 ('207', 'CHEMBL1236962'),
 ('207', 'CHEMBL1229517'),
 ('207', 'CHEMBL3137336'),
 ('207', 'CHEMBL53463'),
 ('207', 'CHEMBL413'),
 ('207', 'CHEMBL3545049'),
 ('207', 'CHEMBL1201577'),
 ('207', 'CHEMBL481'),
 ('207', 'CHEMBL2396661'),
 ('207', 'CHEMBL521851'),
 ('207', 'CHEMBL1201182'),
 ('207', 'CHEMBL428647'),
 ('207', 'CHEM

In [11]:
# see details of a specific edge
seqd.display_edge_info('207', 'C0009014')

{0: {'info': {'bts:umls': ['C0009014'],
   'bts:pubmed': ['9117117'],
   '@type': 'ChemicalSubstance',
   '$input': 'bts:umls',
   '$source': 'semmed'},
  'label': 'bts:molecularlyInteractsWith',
  'source': 'semmed'},
 1: {'info': {'bts:umls': ['C0009014'],
   'bts:pubmed': ['26632178'],
   '@type': 'ChemicalSubstance',
   '$input': 'bts:umls',
   '$source': 'semmed'},
  'label': 'bts:molecularlyInteractsWith',
  'source': 'semmed'}}

In [13]:
# see details of a specific node
seqd.display_node_info('C0009014')

{'identifier': 'bts:umls',
 'type': 'ChemicalSubstance',
 'level': 2,
 'equivalent_ids': {'bts:umls': ['C0009014']}}

### (Optional) Do it the hard way

You can still use this tool without the hint module. In this case, specify your parameters as follows:
1. input_cls: required. The semantic type of your input.
2. input_id: required. The identifier type of your input.
3. values: required. The input value.
4. output_cls: required. The semantic type of your output.
5. output_id: optional. The identifier type of your output.

In [14]:
seqd = SingleEdgeQueryDispatcher(input_cls="Gene",
                                 input_id="bts:entrez",
                                 values="1019",
                                 output_cls="ChemicalSubstance",
                                 output_id="bts:chembl",
                                 registry=reg)
seqd.query()

In [15]:
seqd.G.nodes()

NodeView(('1019', 'CHEMBL3301610', 'CHEMBL189963', 'CHEMBL564829', 'CHEMBL3544942', 'CHEMBL574737', 'CHEMBL428690', 'CHEMBL445813', 'CHEMBL1802728', 'CHEMBL602937', 'CHEMBL448', 'CHEMBL23327', 'CHEMBL23254', 'CHEMBL3545083', 'CHEMBL3545110', 'CHEMBL3545283', 'CHEMBL1956070', 'CHEMBL258805', 'CHEMBL126955', 'CHEMBL65', 'CHEMBL384467', 'CHEMBL488436', 'CHEMBL1230607', 'CHEMBL384304', 'CHEMBL3544940', 'CHEMBL3545218', 'CHEMBL514800', 'CHEMBL52885', 'CHEMBL3545420', 'CHEMBL3707266', 'CHEMBL3', 'CHEMBL103', 'CHEMBL196', 'CHEMBL2403108', 'CHEMBL91829', 'CHEMBL50', 'CHEMBL502835', 'CHEMBL535', 'C0005456', 'C0007090', 'C0013227', 'C0016360', 'C0017725', 'C0018282', 'C0034243', 'C0036681', 'C0038317', 'C0040845', 'C0073096', 'C0144576', 'C0243077', 'C0608663', 'C0661318', 'C0662253', 'C0675974', 'C0729218', 'C0962559', 'C1101610', 'C1518434', 'C1522485', 'C1568660', 'C0001455', 'C0011777', 'C0012854', 'C0013879', 'C0034760', 'C0042866', 'C0061202', 'C0061275', 'C0062565', 'C0074554', 'C0085170'

## Feature 2: Multi Hop Query

Query from a specific biological entity (e.g. entrezgene 1017) to other semantic types (e.g. ChemicalSubstance) through multiple hops, e.g. ChemicalSubstance -> Gene -> Disease

In [16]:
# Initiate the multi hop module
from biothings_explorer.user_query_dispatcher import MultiEdgeQueryDispatcher

### The tutorial below creates edges from Riluzole (ChemicalSubstance) to Gene, then to Disease: it first finds genes related to riluzole, then finds diseases related to those genes

### Step 1: Decide on your input

In [17]:
a = ht.query('riluzole')
a

{'Gene': [],
 'SequenceVariant': [],
 'ChemicalSubstance': [{'chembl': 'CHEMBL744',
   'drugbank': 'DB00740',
   'name': 'RILUZOLE',
   'pubchem': 5070,
   'display': 'chembl(CHEMBL744) drugbank(DB00740) name(RILUZOLE) pubchem(5070) ',
   'type': 'ChemicalSubstance',
   'primary': {'identifier': 'chembl',
    'cls': 'ChemicalSubstance',
    'value': 'CHEMBL744'}},
  {'name': 'Riluzole',
   'umls': 'C0073379',
   'display': 'name(Riluzole) umls(C0073379) ',
   'type': 'ChemicalSubstance',
   'primary': {'identifier': 'umls',
    'cls': 'ChemicalSubstance',
    'value': 'C0073379'}}],
 'DiseaseOrPhenotypicFeature': [],
 'Pathway': [],
 'MolecularFunction': [],
 'CellularComponent': [],
 'BiologicalProcess': [],
 'Anatomy': [],
 'PhenotypicFeature': []}

In [18]:
input_obj = a['ChemicalSubstance'][0]
input_obj

{'chembl': 'CHEMBL744',
 'drugbank': 'DB00740',
 'name': 'RILUZOLE',
 'pubchem': 5070,
 'display': 'chembl(CHEMBL744) drugbank(DB00740) name(RILUZOLE) pubchem(5070) ',
 'type': 'ChemicalSubstance',
 'primary': {'identifier': 'chembl',
  'cls': 'ChemicalSubstance',
  'value': 'CHEMBL744'}}

### Step 2: Construct your edges

The format of edges should be [(subject1, pred1, object1), (subject2, pred2, object2), ...]

The object of an edge should always be the same as the subject of the edge that follows it.

Note: You can leave pred as None. In this case, the tool will search for all potential edges.
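
The chaining rule can be checked before a query is sent. The sketch below uses `validate_edge_chain`, a hypothetical helper that is not part of BTE. The optional `input_type` check is an extra assumption: it flags templates whose first subject differs from the input node's type, which is the situation in the "weird example" earlier in this notebook that ran without complaint.

```python
def validate_edge_chain(edges, input_type=None):
    """Check that each edge's object equals the next edge's subject.

    edges: list of (subject, predicate, object) tuples; predicate may be None.
    input_type: optional semantic type of the input node, e.g. input_obj['type'].
    Returns True when the chain is consistent, raises ValueError otherwise.
    """
    if not edges:
        raise ValueError("edges must contain at least one tuple")
    for i, edge in enumerate(edges):
        if len(edge) != 3:
            raise ValueError(f"edge {i} must have 3 elements, got {len(edge)}")
    if input_type is not None and edges[0][0] != input_type:
        raise ValueError(
            f"first subject {edges[0][0]!r} does not match input type {input_type!r}"
        )
    for i in range(len(edges) - 1):
        obj, next_subj = edges[i][2], edges[i + 1][0]
        if obj != next_subj:
            raise ValueError(
                f"object of edge {i} ({obj!r}) differs from "
                f"subject of edge {i + 1} ({next_subj!r})"
            )
    return True


good_edges = [("ChemicalSubstance", None, "Gene"),
              ("Gene", None, "DiseaseOrPhenotypicFeature")]
print(validate_edge_chain(good_edges, input_type="ChemicalSubstance"))

# The same template with a disease as input is rejected by the extra check.
try:
    validate_edge_chain(good_edges, input_type="DiseaseOrPhenotypicFeature")
except ValueError as err:
    print("rejected:", err)
```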

In [19]:
edges = [('ChemicalSubstance', None, 'Gene'), ('Gene', None, 'DiseaseOrPhenotypicFeature')]

### Step 3: Execute the query

In [20]:
meqd = MultiEdgeQueryDispatcher(input_obj=input_obj,
                                edges=edges,
                                registry=reg)

In [21]:
meqd.query()

start to query for associations between ChemicalSubstance and Gene...
finished! Find 23 hits.
start to query for associations between Gene and DiseaseOrPhenotypicFeature...
finished! Find 741 hits.


### Step 4: Explore the results

In [22]:
# show all nodes in the graph
meqd.show_all_nodes()

['CHEMBL744',
 '6331',
 '23657',
 '50801',
 '3780',
 '3783',
 '6329',
 '6334',
 '6335',
 '6336',
 '6332',
 '6326',
 '6323',
 '11280',
 '6328',
 '1083',
 'PTR1',
 '3782',
 '6530',
 '116443',
 '3781',
 '1544',
 '1543',
 'MONDO:0001823',
 'MONDO:0004981',
 'MONDO:0016333',
 'MONDO:0010086',
 'MONDO:0019171',
 'MONDO:0008685',
 'MONDO:0015470',
 'MONDO:0015263',
 'MONDO:0015281',
 'MONDO:0018054',
 'MONDO:0024562',
 'MONDO:0011376',
 'MONDO:0011377',
 'MONDO:0011003',
 'MONDO:0011001',
 'MONDO:0012061',
 'MONDO:0013530',
 'MONDO:0019490',
 'MONDO:0007240',
 'MONDO:0008646',
 'C0340493',
 'C0264913',
 'C1842820',
 'C1861983',
 'C1861984',
 'C0085610',
 'C0151636',
 'C0232216',
 'C2752013',
 'C0151878',
 'C1832603',
 'C0011071',
 'C1861987',
 'C0600228',
 'C2748542',
 'C0007820',
 'C0264912',
 'C0085615',
 'C0030252',
 'C0264886',
 'C0235480',
 'C0039070',
 'C0013404',
 'C0023211',
 'C1843738',
 'C2931401',
 'C0003811',
 'C1859062',
 'C3276240',
 'C1838527',
 'C0855329',
 'C1841659',
 'C0522

In [23]:
# Display the path connecting two nodes in the graph
meqd.show_path('CHEMBL744', 'MONDO:0011376')

[['CHEMBL744', '6331', 'MONDO:0011376'],
 ['CHEMBL744', '6331', 'MONDO:0011376'],
 ['CHEMBL744', '6331', 'MONDO:0011376'],
 ['CHEMBL744', '6331', 'MONDO:0011376']]

In [24]:
# Display detailed edge information
meqd.display_edge_info('CHEMBL744', '6331')

{0: {'info': {'bts:entrez': [6331],
   'bts:name': ['SODIUM VOLTAGE-GATED CHANNEL ALPHA SUBUNIT 5'],
   'bts:source': ['TdgClinicalTrial', 'ChemblInteractions', 'DrugBank'],
   'bts:publication': [20590601, 17139284, 9262334, 17016423, 12440368],
   '@type': 'Gene',
   '$input': 'bts:chembl',
   '$source': 'dgidb'},
  'label': 'bts:target',
  'source': 'dgidb'},
 1: {'info': {'bts:name': ['Sodium channel protein type 5 subunit alpha'],
   'bts:symbol': ['SCN5A'],
   'bts:action': ['BLOCKER'],
   '@type': 'Gene',
   '$input': 'bts:chembl',
   '$source': 'drugcentral'},
  'label': 'bts:target',
  'source': 'drugcentral'}}

In [25]:
# display detailed edge information
meqd.display_edge_info('6331', 'MONDO:0011376')

{0: {'info': {'bts:mondo': ['MONDO:0011376'],
   'bts:source': ['https://data.monarchinitiative.org/ttl/omim.ttl',
    'https://data.monarchinitiative.org/ttl/orphanet.ttl',
    'https://data.monarchinitiative.org/ttl/clinvar.nt'],
   '@type': 'DiseaseOrPhenotypicFeature',
   '$input': 'bts:entrez',
   '$source': 'biolink'},
  'label': 'bts:associatedWith',
  'source': 'biolink'},
 1: {'info': {'@type': 'DiseaseOrPhenotypicFeature',
   'bts:umls': ['C2751898'],
   '$source': 'mydisease.info'},
  'label': 'bts:umls',
  'source': 'mydisease.info'}}
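
A likely explanation for the four identical rows returned by `show_path` (not confirmed against the BTE source) is that each hop has two parallel edges, so there are 2 × 2 = 4 distinct edge combinations over the same three nodes. The sketch below enumerates them from the sources shown in the two edge-info outputs above; `enumerate_paths` is a hypothetical helper.

```python
from itertools import product

# (start, end, sources of the parallel edges), copied from display_edge_info.
hops = [
    ("CHEMBL744", "6331", ["dgidb", "drugcentral"]),
    ("6331", "MONDO:0011376", ["biolink", "mydisease.info"]),
]


def enumerate_paths(hops):
    """Pair the node sequence with every combination of parallel edges."""
    nodes = [hops[0][0]] + [end for _, end, _ in hops]
    source_lists = [sources for _, _, sources in hops]
    return [(nodes, combo) for combo in product(*source_lists)]


paths = enumerate_paths(hops)
for nodes, sources in paths:
    print(nodes, "via", sources)
print(len(paths), "edge combinations over the same node sequence")
```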

In [26]:
meqd.display_node_info('MONDO:0011376')

{'identifier': 'bts:mondo',
 'type': 'DiseaseOrPhenotypicFeature',
 'level': 2,
 'equivalent_ids': {'bts:mondo': ['MONDO:0011376'],
  'bts:doid': [],
  'bts:bfo': [],
  'bts:cohd': [],
  'bts:hp': [],
  'bts:kegg': [],
  'bts:meddra': [],
  'bts:medgen': [],
  'bts:mesh': ['C567851'],
  'bts:omim': ['603829'],
  'bts:umls': ['C2751898']}}

## Feature 3: Discover connections between two bio-entities

Find connections between two bio-entities through one or more intermediate nodes

In [27]:
# initialize the connect module
from biothings_explorer.user_query_dispatcher import Connect

### Step 1: Decide on your input and output

In [28]:
# search for riluzole
a = ht.query("riluzole")
a

{'Gene': [],
 'SequenceVariant': [],
 'ChemicalSubstance': [{'chembl': 'CHEMBL744',
   'drugbank': 'DB00740',
   'name': 'RILUZOLE',
   'pubchem': 5070,
   'display': 'chembl(CHEMBL744) drugbank(DB00740) name(RILUZOLE) pubchem(5070) ',
   'type': 'ChemicalSubstance',
   'primary': {'identifier': 'chembl',
    'cls': 'ChemicalSubstance',
    'value': 'CHEMBL744'}},
  {'name': 'Riluzole',
   'umls': 'C0073379',
   'display': 'name(Riluzole) umls(C0073379) ',
   'type': 'ChemicalSubstance',
   'primary': {'identifier': 'umls',
    'cls': 'ChemicalSubstance',
    'value': 'C0073379'}}],
 'DiseaseOrPhenotypicFeature': [],
 'Pathway': [],
 'MolecularFunction': [],
 'CellularComponent': [],
 'BiologicalProcess': [],
 'Anatomy': [],
 'PhenotypicFeature': []}

In [29]:
# select the input object from the hint results
input_obj = a['ChemicalSubstance'][0]
input_obj

{'chembl': 'CHEMBL744',
 'drugbank': 'DB00740',
 'name': 'RILUZOLE',
 'pubchem': 5070,
 'display': 'chembl(CHEMBL744) drugbank(DB00740) name(RILUZOLE) pubchem(5070) ',
 'type': 'ChemicalSubstance',
 'primary': {'identifier': 'chembl',
  'cls': 'ChemicalSubstance',
  'value': 'CHEMBL744'}}

In [30]:
# search for "Amyotrophic Lateral Sclerosis"
b = ht.query("Amyotrophic Lateral Sclerosis")
b

{'Gene': [{'entrez': '406238',
   'name': 'amyotrophic lateral sclerosis 7',
   'symbol': 'ALS7',
   'taxonomy': 9606,
   'umls': 'C2681918',
   'display': 'entrez(406238) name(amyotrophic lateral sclerosis 7) symbol(ALS7) taxonomy(9606) umls(C2681918) ',
   'type': 'Gene',
   'primary': {'identifier': 'entrez', 'cls': 'Gene', 'value': '406238'}},
  {'entrez': '253',
   'name': 'amyotrophic lateral sclerosis 3 (autosomal dominant)',
   'symbol': 'ALS3',
   'taxonomy': 9606,
   'umls': 'C1412368',
   'display': 'entrez(253) name(amyotrophic lateral sclerosis 3 (autosomal dominant)) symbol(ALS3) taxonomy(9606) umls(C1412368) ',
   'type': 'Gene',
   'primary': {'identifier': 'entrez', 'cls': 'Gene', 'value': '253'}},
  {'entrez': '40410',
   'name': 'Amyotrophic lateral sclerosis 2',
   'symbol': 'Als2',
   'taxonomy': 7227,
   'display': 'entrez(40410) name(Amyotrophic lateral sclerosis 2) symbol(Als2) taxonomy(7227) ',
   'type': 'Gene',
   'primary': {'identifier': 'entrez', 'cls': 'G

In [31]:
# select the output object from the hint results
output_obj = b['DiseaseOrPhenotypicFeature'][0]
output_obj

{'mondo': 'MONDO:0004976',
 'doid': 'DOID:332',
 'umls': 'C0002736',
 'mesh': 'D000690',
 'name': 'amyotrophic lateral sclerosis',
 'display': 'mondo(MONDO:0004976) doid(DOID:332) umls(C0002736) mesh(D000690) name(amyotrophic lateral sclerosis) ',
 'type': 'DiseaseOrPhenotypicFeature',
 'primary': {'identifier': 'mondo',
  'cls': 'DiseaseOrPhenotypicFeature',
  'value': 'MONDO:0004976'}}

### Step 2: Find connections between your input and output

In [32]:
cc = Connect(input_obj=input_obj, output_obj=output_obj, registry=reg)

In [33]:
cc.connect()

processing step 1 ...
processing step 2 ...
query completed
Find connection


### Step 3: Find how your input and output are connected

In [34]:
# find the path connecting from your input to output
cc.show_path()

[['CHEMBL744', '6332', 'MONDO:0004976']]
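
Conceptually, a path like this is a shared intermediate node: something reachable from the input that also reaches the output. The sketch below shows that idea on a plain edge list. It is an illustration only and not how `Connect` is implemented; `two_hop_paths` is a hypothetical helper, and the edges are pairs that appear elsewhere in this notebook.

```python
def two_hop_paths(edges, start, end):
    """Return [start, mid, end] for every mid with start -> mid and mid -> end."""
    successors = {}
    for source, target in edges:
        successors.setdefault(source, set()).add(target)
    mids = sorted(
        mid for mid in successors.get(start, ())
        if end in successors.get(mid, ())
    )
    return [[start, mid, end] for mid in mids]


# Directed (source, target) pairs taken from outputs in this notebook.
sample_edges = [
    ("CHEMBL744", "6332"),
    ("CHEMBL744", "6331"),
    ("6332", "MONDO:0004976"),
    ("6331", "MONDO:0011376"),
]

found = two_hop_paths(sample_edges, "CHEMBL744", "MONDO:0004976")
print(found)
```

Gene `6331` is dropped because none of its listed disease edges reach `MONDO:0004976`, so only the path through `6332` remains.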

In [35]:
# show detailed edge information
cc.display_edge_info(start_node='CHEMBL744', end_node="6332")

{0: {'info': {'bts:entrez': [6332],
   'bts:name': ['SODIUM VOLTAGE-GATED CHANNEL ALPHA SUBUNIT 7'],
   'bts:source': ['ChemblInteractions'],
   '@type': 'Gene',
   '$input': 'bts:chembl',
   '$source': 'dgidb'},
  'label': 'bts:target',
  'source': 'dgidb'}}

In [36]:
# show detailed edge information
cc.display_edge_info(start_node="6332", end_node="MONDO:0004976")

{0: {'info': {'bts:mondo': ['MONDO:0004976'],
   'bts:source': ['https://data.monarchinitiative.org/ttl/gwascatalog.ttl'],
   '@type': 'DiseaseOrPhenotypicFeature',
   '$input': 'bts:entrez',
   '$source': 'biolink'},
  'label': 'bts:associatedWith',
  'source': 'biolink'}}

In [37]:
# return the graph connecting input to output as JSON
cc.to_json()

{'directed': True,
 'multigraph': True,
 'graph': {},
 'nodes': [{'type': 'ChemicalSubstance',
   'identifier': 'bts:chembl',
   'level': 1,
   'equivalent_ids': {'bts:inchi': ['InChI=1S/C8H5F3N2OS/c9-8(10,11)14-4-1-2-5-6(3-4)15-7(12)13-5/h1-3H,(H2,12,13)'],
    'bts:inchikey': ['FTALBRSUTCGOEG-UHFFFAOYSA-N'],
    'bts:rxcui': ['35623'],
    'bts:smiles': ['Nc1nc2ccc(OC(F)(F)F)cc2s1'],
    'bts:pubchem': [5070],
    'bts:chembl': ['CHEMBL744'],
    'bts:drugbank': ['DB00740'],
    'bts:unii': ['7LJ087RS6F'],
    'bts:mesh': ['D019782']},
   'id': 'CHEMBL744'},
  {'identifier': 'bts:entrez',
   'type': 'Gene',
   'level': 2,
   'equivalent_ids': {'bts:ensembl': ['ENSG00000136546'],
    'bts:hgnc': ['10594'],
    'bts:omim': ['182392'],
    'bts:entrez': ['6332'],
    'bts:pharos': [],
    'bts:umls': ['C1419865'],
    'bts:unigene': ['Hs.644853', 'Hs.596087'],
    'bts:pharmgkb': ['PA35008'],
    'bts:symbol': ['SCN7A']},
   'id': '6332'},
  {'identifier': 'bts:mondo',
   'type': 'Disea