# MAVS graph
## NCATS Hackathon 2019-09-15



Trying to reproduce the edges in this graph and retrieve the predicates: 

<img src="img/MAVS.png" width="1200">

**IMPORTANT**: BTE currently caps every single query at 100 results, so the outputs below may not include all results from each source.
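One possible way to reduce the impact of the cap is to send fewer input IDs per query. The helper below is only a sketch: `chunked` is a hypothetical name, the batch size is arbitrary, and whether smaller batches actually recover more results depends on how BTE applies the limit, which these notes do not establish.

```python
from typing import Iterator, List, Sequence


def chunked(values: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `values`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(values), size):
        yield list(values[start:start + size])


# Entrez gene IDs used later in this notebook, split into batches of three.
example_ids = ["4790", "3553", "834", "114548", "57506", "29108", "23586", "3661"]
batches = list(chunked(example_ids, 3))
print(batches)
```

Each batch could then be passed as `values` to its own query, at the cost of more round trips.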

### Initiating the package

In [1]:
# import the query module
from biothings_explorer.user_query_dispatcher import SingleEdgeQueryDispatcher
# import the hint module (suggest hits based on your input)
from biothings_explorer.hint import Hint
# import the registry module
from biothings_explorer.registry import Registry
reg = Registry()
ht = Hint()
# import the connect module
from biothings_explorer.user_query_dispatcher import Connect

## Find inflammasome node

In [2]:
# use the hint module to let BioThings Explorer suggest the inputs for you
a = ht.query('inflammasome')
# the output of the hint module is grouped by semantic types
a['CellularComponent'][:3]

[{'name': 'inflammasome complex',
  'go': 'GO:0061702',
  'display': 'name(inflammasome complex) go(GO:0061702) ',
  'type': 'CellularComponent',
  'primary': {'identifier': 'go',
   'cls': 'CellularComponent',
   'value': 'GO:0061702'}},
 {'name': 'NLRP3 inflammasome complex',
  'go': 'GO:0072559',
  'display': 'name(NLRP3 inflammasome complex) go(GO:0072559) ',
  'type': 'CellularComponent',
  'primary': {'identifier': 'go',
   'cls': 'CellularComponent',
   'value': 'GO:0072559'}},
 {'name': 'NLRP1 inflammasome complex',
  'go': 'GO:0072558',
  'display': 'name(NLRP1 inflammasome complex) go(GO:0072558) ',
  'type': 'CellularComponent',
  'primary': {'identifier': 'go',
   'cls': 'CellularComponent',
   'value': 'GO:0072558'}}]

In [3]:
node_inflammasome = a['CellularComponent'][0]
node_inflammasome

{'name': 'inflammasome complex',
 'go': 'GO:0061702',
 'display': 'name(inflammasome complex) go(GO:0061702) ',
 'type': 'CellularComponent',
 'primary': {'identifier': 'go',
  'cls': 'CellularComponent',
  'value': 'GO:0061702'}}
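Each hint record carries its preferred identifier under the `primary` key. A small accessor makes that explicit; this is a sketch that assumes every record has the same shape as the one printed above, and `primary_id` is a hypothetical helper name, not part of BioThings Explorer.

```python
def primary_id(hit: dict) -> str:
    """Return the identifier value stored under the record's 'primary' key."""
    return hit["primary"]["value"]


# Record copied from the hint output above.
example_hit = {
    "name": "inflammasome complex",
    "go": "GO:0061702",
    "display": "name(inflammasome complex) go(GO:0061702) ",
    "type": "CellularComponent",
    "primary": {
        "identifier": "go",
        "cls": "CellularComponent",
        "value": "GO:0061702",
    },
}

print(example_hit["name"], "->", primary_id(example_hit))
```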

## What genes are directly connected to the inflammasome?

In [6]:
a = ht.query('inflammasome')
node_inflammasome = a['CellularComponent'][0]
seqd = SingleEdgeQueryDispatcher(input_obj=node_inflammasome,
                                 output_cls='Gene',
                                 output_id='bts:entrez',
                                 registry=reg)
seqd.query()
seqd.show_all_nodes()

['GO:0061702', '338321', '22900', '837', '171389', '22861']

In [4]:
seqd = SingleEdgeQueryDispatcher(input_obj=node_inflammasome,
                                 output_cls='Gene',
                                 output_id='bts:entrez',
                                 registry=reg)
seqd.query()

In [5]:
seqd.show_all_nodes()

['GO:0061702', '338321', '22900', '837', '171389', '22861']

### No results -- switch to biothings client

In [21]:
from biothings_client import get_client
mg = get_client("gene")
g = mg.query("GO:0061702 OR GO:0097169 OR GO:0072557 OR GO:0072558 OR GO:0072559",size=1000,species="human")
len(g['hits'])

14
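The query string above is just the GO identifiers joined with `OR`. Building it from a list keeps the set of terms easy to edit; `or_query` is a hypothetical helper introduced here for illustration only.

```python
def or_query(terms):
    """Join search terms into a single 'A OR B OR C' query string."""
    return " OR ".join(terms)


# GO identifiers taken from the query in the cell above.
go_terms = ["GO:0061702", "GO:0097169", "GO:0072557", "GO:0072558", "GO:0072559"]
query_string = or_query(go_terms)
print(query_string)
```

The resulting string can be passed to `mg.query(query_string, size=1000, species="human")` exactly as in the cell above.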

## Look for relationships between all pairs of genes

In [7]:
genes = ['4790',    #NFKB1
         '3553',    #IL1-beta
         '834',     #CASP1
         '114548',  #NLRP3
         '57506',   #MAVS
         '29108',   #PYCARD
         '23586',   #DDX58
         '3661'     #IRF3
        ]

seqd = SingleEdgeQueryDispatcher(input_cls="Gene",
                                 input_id="bts:entrez",
                                 values=genes,
                                 output_cls="Gene",
                                 output_id="bts:entrez",
                                 registry=reg)
seqd.query()

In [8]:
len(seqd.show_all_nodes())

897

In [9]:
seqd.display_edge_info("3661","57506")

{0: {'info': {'@type': 'Gene',
   'bts:umls': ['C1864770'],
   '$source': 'semmedgene'},
  'label': 'bts:molecularlyInteractsWith',
  'source': 'semmedgene'}}

In [10]:
seqd.display_edge_info("57506","3661")

{0: {'info': {'@type': 'Gene',
   'bts:umls': ['C1334139'],
   '$source': 'semmedgene'},
  'label': 'bts:molecularlyInteractsWith',
  'source': 'semmedgene'}}
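The edge info comes back as a dict keyed by edge index, with the predicate under `label` and the data source under `source`. To compare many pairs it can help to reduce each dict to its (predicate, source) pairs. This is a sketch that assumes the structure shown in the two outputs above; `summarize_edges` is a hypothetical helper.

```python
def summarize_edges(edge_info: dict) -> list:
    """Return sorted, de-duplicated (label, source) pairs from an edge-info dict."""
    return sorted({(edge["label"], edge["source"]) for edge in edge_info.values()})


# Edge info copied from the output of display_edge_info("3661", "57506") above.
example_edge_info = {
    0: {
        "info": {
            "@type": "Gene",
            "bts:umls": ["C1864770"],
            "$source": "semmedgene",
        },
        "label": "bts:molecularlyInteractsWith",
        "source": "semmedgene",
    }
}

print(summarize_edges(example_edge_info))
```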

In [11]:
for gene1 in genes:
    for gene2 in genes:
        if gene1==gene2:
            continue
        else:
            print("genes: "+gene1+","+gene2)
            try:
                print(seqd.display_edge_info(gene1,gene2))
            except Exception:
                # skip pairs for which the lookup fails (e.g. no edge between them)
                pass

genes: 4790,3553
genes: 4790,834
genes: 4790,114548
genes: 4790,57506
genes: 4790,29108
genes: 4790,23586
{0: {'info': {'bts:hgnc': ['19102'], 'bts:taxid': ['NCBITaxon:9606'], 'bts:source': ['https://data.monarchinitiative.org/ttl/string.ttl'], '@type': 'Gene', '$input': 'bts:entrez', '$source': 'biolink'}, 'label': 'bts:molecularlyInteractsWith', 'source': 'biolink'}}
genes: 4790,3661
genes: 3553,4790
{0: {'info': {'bts:hgnc': ['7794'], 'bts:taxid': ['NCBITaxon:9606'], 'bts:source': ['https://data.monarchinitiative.org/ttl/string.ttl'], '@type': 'Gene', '$input': 'bts:entrez', '$source': 'biolink'}, 'label': 'bts:molecularlyInteractsWith', 'source': 'biolink'}}
genes: 3553,834
genes: 3553,114548
{0: {'info': {'bts:hgnc': ['16400'], 'bts:taxid': ['NCBITaxon:9606'], 'bts:source': ['https://data.monarchinitiative.org/ttl/string.ttl'], '@type': 'Gene', '$input': 'bts:entrez', '$source': 'biolink'}, 'label': 'bts:molecularlyInteractsWith', 'source': 'biolink'}}
genes: 3553,57506
genes: 355
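The nested loop above visits every ordered pair of distinct genes. The same enumeration can be written with `itertools.permutations`, which also makes it easy to check how many lookups are attempted. This is an alternative formulation for illustration, not the code that produced the output above.

```python
from itertools import permutations

# Entrez gene IDs from the cell above.
genes = ["4790", "3553", "834", "114548", "57506", "29108", "23586", "3661"]

# All ordered pairs (gene1, gene2) with gene1 != gene2, in the same order
# as the nested loop.
ordered_pairs = list(permutations(genes, 2))

print(len(ordered_pairs))   # 8 genes * 7 partners = 56 ordered pairs
print(ordered_pairs[:3])
```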

#### NOTE: batch mode currently has a bug in which the predicate is not shown. Kevin is fixing it...
Therefore, try the one-by-one method below...

## Check relationship between MAVS and other genes
Trying this because of the above-mentioned bug...

First, get gene neighbors to MAVS

In [24]:
gene = "57506"
seqd = SingleEdgeQueryDispatcher(input_cls="Gene",
                                 input_id="bts:entrez",
                                 values=gene,
                                 output_cls="Gene",
                                 output_id="bts:entrez",
                                 registry=reg)
seqd.query()

In [25]:
for gene in genes:
    print("A: MAVS -> "+gene)
    try:
        print(seqd.display_edge_info("57506",gene))
    except Exception:
        # skip genes for which the lookup fails (e.g. no edge from MAVS)
        pass

A: MAVS -> 4790
{0: {'info': {'bts:hgnc': ['7794'], 'bts:taxid': ['NCBITaxon:9606'], 'bts:source': ['https://data.monarchinitiative.org/ttl/string.ttl'], '@type': 'Gene', '$input': 'bts:entrez', '$source': 'biolink'}, 'label': 'bts:molecularlyInteractsWith', 'source': 'biolink'}}
A: MAVS -> 3553
A: MAVS -> 834
A: MAVS -> 114548
{0: {'info': {'bts:hgnc': ['16400'], 'bts:taxid': ['NCBITaxon:9606'], 'bts:source': ['https://data.monarchinitiative.org/ttl/string.ttl'], '@type': 'Gene', '$input': 'bts:entrez', '$source': 'biolink'}, 'label': 'bts:molecularlyInteractsWith', 'source': 'biolink'}}
A: MAVS -> 57506
A: MAVS -> 29108
A: MAVS -> 23586
{0: {'info': {'bts:hgnc': ['19102'], 'bts:taxid': ['NCBITaxon:9606'], 'bts:source': ['https://data.monarchinitiative.org/ttl/biogrid.ttl', 'https://data.monarchinitiative.org/ttl/string.ttl'], '@type': 'Gene', '$input': 'bts:entrez', '$source': 'biolink'}, 'label': 'bts:molecularlyInteractsWith', 'source': 'biolink'}}
A: MAVS -> 3661
{0: {'info': {'@t

Next, do the reverse query, i.e., iterate through all genes and look for MAVS

In [26]:
for gene in genes:
    seqd = SingleEdgeQueryDispatcher(input_cls="Gene",
                                     input_id="bts:entrez",
                                     values=gene,
                                     output_cls="Gene",
                                     output_id="bts:entrez",
                                     registry=reg)
    seqd.query()
    print("A: "+gene+" -> MAVS")
    try:
        print(seqd.display_edge_info(gene,"57506"))
    except Exception:
        # skip genes for which the lookup fails (e.g. no edge to MAVS)
        pass

A: 4790 -> MAVS
A: 3553 -> MAVS
A: 834 -> MAVS
A: 114548 -> MAVS
{0: {'info': {'bts:hgnc': ['29233'], 'bts:taxid': ['NCBITaxon:9606'], 'bts:source': ['https://data.monarchinitiative.org/ttl/string.ttl'], '@type': 'Gene', '$input': 'bts:entrez', '$source': 'biolink'}, 'label': 'bts:molecularlyInteractsWith', 'source': 'biolink'}}
A: 57506 -> MAVS
A: 29108 -> MAVS
A: 23586 -> MAVS
{0: {'info': {'bts:hgnc': ['29233'], 'bts:taxid': ['NCBITaxon:9606'], 'bts:source': ['https://data.monarchinitiative.org/ttl/biogrid.ttl', 'https://data.monarchinitiative.org/ttl/string.ttl'], '@type': 'Gene', '$input': 'bts:entrez', '$source': 'biolink'}, 'label': 'bts:molecularlyInteractsWith', 'source': 'biolink'}}
A: 3661 -> MAVS
{0: {'info': {'bts:hgnc': ['29233'], 'bts:taxid': ['NCBITaxon:9606'], 'bts:source': ['https://data.monarchinitiative.org/ttl/biogrid.ttl', 'https://data.monarchinitiative.org/ttl/string.ttl'], '@type': 'Gene', '$input': 'bts:entrez', '$source': 'biolink'}, 'label': 'bts:molecularly
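To see where the two directions disagree, the genes that printed an edge in each loop can be compared as sets. The sets below are transcribed by hand from the two outputs above (a gene counts as a hit if an edge dict was printed after its line, including the truncated ones); this is a bookkeeping sketch, not a BTE feature.

```python
# Genes for which the "MAVS -> gene" loop printed an edge dict.
forward = {"4790", "114548", "23586", "3661"}
# Genes for which the "gene -> MAVS" loop printed an edge dict.
reverse = {"114548", "23586", "3661"}

# Sort numerically so the output order is stable and easy to scan.
both_directions = sorted(forward & reverse, key=int)
forward_only = sorted(forward - reverse, key=int)
reverse_only = sorted(reverse - forward, key=int)

print("both directions:", both_directions)
print("MAVS -> gene only:", forward_only)
print("gene -> MAVS only:", reverse_only)
```

With the outputs above, only 4790 (NFKB1) shows an edge in one direction but not the other.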

**Ask Kevin**: What is the API call that retrieves the biolink data, and how could I have found that out? ("semmedgene" is a key in the config.py file, but "biolink" is not.) Should that information be added directly to the results?