# Using ROBOKOP's Quick Question Answering Service

The expand, enrich, and similarity services each offer bite-sized approaches to interacting with ROBOKOP.  The ROBOKOP "quick" service offers a slightly more complex approach to answering questions.  We'll use the following function to call the quick service.  Calling the service requires posting a question as JSON; the format of the question is discussed below.

In [6]:
import requests

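def quick(question):
    """Post a question to the ROBOKOP quick service and return the parsed JSON answer."""
    # To run against a local instance, use this URL instead:
    #url='http://127.0.0.1:80/api/simple/quick/'
    url='http://robokop.renci.org:80/api/simple/quick/'
    response = requests.post(url,json=question)
    print( f"Return Status: {response.status_code}" )
    if response.status_code == 200:
        return response.json()
    # On failure, return the raw response so the caller can inspect it
    return response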

## Question Format and Basic Usage

The question is a Python dictionary.  It takes a key `machine_question`, whose value is a dictionary containing a list of `nodes` and a list of `edges`.  Each node object needs an integer `id`.  Any node can also have a `curie` that pins that node to a specific entity.  The edges define connections between the nodes, using the node identifiers as `source_id` and `target_id`.  The following function shows how to construct a single-hop question.  It starts from a specified node of `type1` (identified by the curie `id1`) and looks for any connected node of `type2`.

In [92]:
def make_one_step_question(type1, id1, type2):
    question = {
                'machine_question': {
                    'nodes': [
                        {
                            'id': 0,
                            'curie': id1,
                            'type': type1
                        },
                        {
                            'id': 1,
                            'type': type2
                        }
                    ],
                    'edges': [
                        {
                            'source_id': 0,
                            'target_id': 1
                        }
                    ]
                }
            }
    return question
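
Because the question is a plain dictionary, it is easy to mistype an identifier.  The sketch below checks the structural rules described above before a question is posted.  The function name `validate_question` is hypothetical and not part of ROBOKOP, and the requirement that node ids be unique is an assumption rather than something the service documents here.

```python
def validate_question(question):
    """Return a list of problems found in a quick-service question (empty if none).

    Hypothetical helper: it only checks the rules described in this notebook.
    """
    problems = []
    mq = question.get('machine_question')
    if not isinstance(mq, dict):
        return ["missing 'machine_question' dictionary"]
    node_ids = set()
    for node in mq.get('nodes', []):
        node_id = node.get('id')
        if not isinstance(node_id, int):
            problems.append(f"node id {node_id!r} is not an integer")
        elif node_id in node_ids:
            # Assumption: ids must be unique so that edges are unambiguous
            problems.append(f"node id {node_id} is used more than once")
        else:
            node_ids.add(node_id)
    for edge in mq.get('edges', []):
        for key in ('source_id', 'target_id'):
            if edge.get(key) not in node_ids:
                problems.append(f"edge {key} {edge.get(key)!r} does not match any node id")
    return problems


# A one-hop question with the same shape as the one built by make_one_step_question
example = {
    'machine_question': {
        'nodes': [{'id': 0, 'curie': 'MONDO:0019391', 'type': 'disease'},
                  {'id': 1, 'type': 'phenotypic_feature'}],
        'edges': [{'source_id': 0, 'target_id': 1}]
    }
}
print(validate_question(example))
```

An empty list means no structural problems were found; it says nothing about whether the types or curies are known to the service.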

Here, we will specify a one-hop question asking for the phenotypes associated with Fanconi Anemia (MONDO:0019391).  We first construct the question then use it to call the quick service.  In fact, this is how the expand function is implemented internally.

In [7]:
q = make_one_step_question('disease','MONDO:0019391','phenotypic_feature')
r = quick(q)

Return Status: 200


In [8]:
import json
print( json.dumps(r, indent=4))

{
    "answers": [
        {
            "id": null,
            "answerset": null,
            "natural_answer": null,
            "nodes": [
                {
                    "id": "MONDO:0019391",
                    "name": "Fanconi anemia",
                    "equivalent_identifiers": [
                        "MONDO:0019391",
                        "MEDDRA:10055206",
                        "ORPHANET:84",
                        "MEDDRA:10016218",
                        "DOID:13636",
                        "UMLS:C0015625",
                        "MESH:D005199"
                    ],
                    "type": "disease",
                    "omnicorp_article_count": 4009
                },
                {
                    "id": "HP:0005528",
                    "equivalent_identifiers": [
                        "HP:0005528",
                        "UMLS:C1855710",
                        "MEDDRA:10065553"
                    ],
                    "name": "Bone ma

The output can be simplified with a function like this:

In [11]:
import pandas as pd

def parse_answer(returnanswer):
    nodes = [answer['nodes'][1] for answer in returnanswer['answers']]
    edges = [answer['edges'][0] for answer in returnanswer['answers']]
    answers = [ {"result_id": node["id"], 
                 "result_name": node["name"], 
                 "relation": edge["relation_label"],
                 "source": edge['edge_source']}
              for node,edge in zip(nodes,edges)]
    return pd.DataFrame(answers)

In [12]:
parse_answer(r)

Unnamed: 0,relation,result_id,result_name,source
0,has phenotype,HP:0005528,Bone marrow hypocellularity,biolink.disease_get_phenotype
1,has phenotype,HP:0004810,Congenital hypoplastic anemia,biolink.disease_get_phenotype
2,has phenotype,HP:0001908,Hypoplastic anemia,biolink.disease_get_phenotype
3,has phenotype,HP:0010972,Anemia of inadequate production,biolink.disease_get_phenotype
4,has phenotype,HP:0003974,Absent radius,biolink.disease_get_phenotype
5,has phenotype,HP:0004820,Acute myelomonocytic leukemia,biolink.disease_get_phenotype
6,has phenotype,HP:0000953,Hyperpigmentation of the skin,biolink.disease_get_phenotype
7,has phenotype,HP:0001972,Macrocytic anemia,biolink.disease_get_phenotype
8,has phenotype,HP:0001876,Pancytopenia,biolink.disease_get_phenotype
9,has phenotype,HP:0001000,Abnormality of skin pigmentation,biolink.disease_get_phenotype


## Multi-step questions

It's straightforward to generalize this linear query to an N-item path.  This function constructs a question from a list of types and a list of identifiers.  The types are the types of nodes that will be traversed along the path, and the identifiers represent fixed elements in the path.  The `types` and `curies` lists should be the same length, and free nodes should be given a curie of `None`:

In [33]:
def make_N_step_question(types,curies):
    question = {
                'machine_question': {
                    'nodes': [],
                    'edges': []
                }
            }
    for i,t in enumerate(types):
        newnode = {'id': i, 'type': t}
        if curies[i] is not None:
            newnode['curie'] = curies[i]
        question['machine_question']['nodes'].append(newnode)
        if i > 0:
            question['machine_question']['edges'].append( {'source_id': i-1, 'target_id': i})
    return question
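
The function above trusts its inputs: a `curies` list that is too short raises an `IndexError` partway through, and one that is too long is silently truncated.  As a minimal sketch, the variant below checks the lengths first.  The name `make_checked_N_step_question` is hypothetical; the question it builds has the same shape as the one above.

```python
def make_checked_N_step_question(types, curies):
    """Build a linear question, refusing mismatched inputs (hypothetical helper)."""
    if len(types) != len(curies):
        raise ValueError(
            f"types has {len(types)} entries but curies has {len(curies)}; "
            "use None for free nodes"
        )
    nodes, edges = [], []
    for i, (node_type, curie) in enumerate(zip(types, curies)):
        node = {'id': i, 'type': node_type}
        if curie is not None:
            node['curie'] = curie
        nodes.append(node)
        # Link each node to the one before it to form a path
        if i > 0:
            edges.append({'source_id': i - 1, 'target_id': i})
    return {'machine_question': {'nodes': nodes, 'edges': edges}}


checked_q = make_checked_N_step_question(
    ['disease', 'gene', 'biological_process_or_activity'],
    ['MONDO:0019391', None, None])
print(len(checked_q['machine_question']['edges']))
```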

We can recapitulate our previous question with this new function like this:

In [16]:
newq = make_N_step_question(['disease','phenotypic_feature'],['MONDO:0019391',None])
q == newq

True

Now we could expand to a longer query.  The following question will start at the disease MONDO:0019391, go to a gene, and from there to a biological process or activity.

In [19]:
two_step_question = make_N_step_question(['disease','gene','biological_process_or_activity'],['MONDO:0019391',None,None])

In [20]:
two_step_answer = quick(two_step_question)

Return Status: 200


We can extract the node names along the paths with a simple function:

In [21]:
def extract_node_names(returnanswer):
    nodes = [{f'node_{i}': node['name'] for i,node in enumerate(answer['nodes'])} for answer in returnanswer['answers']]
    return pd.DataFrame(nodes)

In [23]:
two_step_nodes = extract_node_names(two_step_answer)
two_step_nodes

Unnamed: 0,node_0,node_1,node_2
0,Fanconi anemia,FANCD2,interstrand cross-link repair
1,Fanconi anemia,FANCD2,interstrand cross-link repair
2,Fanconi anemia,FANCD2,interstrand cross-link repair
3,Fanconi anemia,FANCD2,interstrand cross-link repair
4,Fanconi anemia,FANCD2,interstrand cross-link repair
5,Fanconi anemia,FANCD2,interstrand cross-link repair
6,Fanconi anemia,FANCM,interstrand cross-link repair
7,Fanconi anemia,FANCM,interstrand cross-link repair
8,Fanconi anemia,FANCA,interstrand cross-link repair
9,Fanconi anemia,FANCA,interstrand cross-link repair


At the node-name level, several paths appear to be repeated.  In fact, these paths differ in their edges, such as the predicate or the source the edge came from, which we are not showing:

In [29]:
print(f"The first answer's second edge comes from {two_step_answer['answers'][0]['edges'][1]['edge_source']}")
print(f"The second answer's second edge comes from {two_step_answer['answers'][1]['edges'][1]['edge_source']}")

The first answer's second edge comes from biolink.gene_get_process_or_function
The second answer's second edge comes from quickgo.go_term_to_gene_annotation
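
To see this for every path at once, the answers can be grouped by their node names, collecting the edge sources behind each group.  This is a sketch that assumes every answer has `nodes` with a `name` and `edges` with an `edge_source`, as in the responses shown above.  The function name is hypothetical, and the small `toy_answer` is a hand-made stand-in for illustration, not real service output.

```python
from collections import defaultdict

def group_paths_by_node_names(returnanswer):
    """Map each tuple of node names to the list of edge-source tuples behind it."""
    grouped = defaultdict(list)
    for answer in returnanswer['answers']:
        names = tuple(node['name'] for node in answer['nodes'])
        sources = tuple(edge['edge_source'] for edge in answer['edges'])
        grouped[names].append(sources)
    return dict(grouped)


# Hand-made stand-in with the same shape as a quick-service answer;
# 'placeholder.disease_to_gene' is a made-up edge source used only for illustration.
toy_answer = {'answers': [
    {'nodes': [{'name': 'Fanconi anemia'}, {'name': 'FANCD2'},
               {'name': 'interstrand cross-link repair'}],
     'edges': [{'edge_source': 'placeholder.disease_to_gene'},
               {'edge_source': 'biolink.gene_get_process_or_function'}]},
    {'nodes': [{'name': 'Fanconi anemia'}, {'name': 'FANCD2'},
               {'name': 'interstrand cross-link repair'}],
     'edges': [{'edge_source': 'placeholder.disease_to_gene'},
               {'edge_source': 'quickgo.go_term_to_gene_annotation'}]},
]}

for names, source_list in group_paths_by_node_names(toy_answer).items():
    print(names, len(source_list))
```

Calling `group_paths_by_node_names(two_step_answer)` on the real response would give one entry per distinct node-name path, which makes the apparent duplicates easier to interpret.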


## Specifying multiple fixed nodes

The examples above have all started from one known node and expanded from it, sometimes in multiple steps.  It's also possible to have more than one node specified.  For instance, if we wanted to look for genes that link Fanconi Anemia (MONDO:0019391) and DNA repair (GO:0006281), we could rerun our previous query with the final curie set, like this:

In [36]:
two_step_question_fixed_ends = \
   make_N_step_question(['disease','gene','biological_process_or_activity'],['MONDO:0019391',None,'GO:0006281'])
two_step_answer_fixed_ends = quick(two_step_question_fixed_ends)

Return Status: 200


In [37]:
extract_node_names(two_step_answer_fixed_ends)

Unnamed: 0,node_0,node_1,node_2
0,Fanconi anemia,FANCD2,DNA repair
1,Fanconi anemia,FANCD2,DNA repair
2,Fanconi anemia,FANCD2,DNA repair
3,Fanconi anemia,FANCD2,DNA repair
4,Fanconi anemia,FANCD2,DNA repair
5,Fanconi anemia,FANCD2,DNA repair
6,Fanconi anemia,FANCA,DNA repair
7,Fanconi anemia,FANCA,DNA repair
8,Fanconi anemia,FANCA,DNA repair
9,Fanconi anemia,FANCA,DNA repair


## Non-path queries

So far, we've only looked at linear paths.  But the question format is more general than that: the nodes and edges can describe an arbitrary graph pattern.  For instance, the query above finds all genes that are linked to both FA and DNA repair.  But what if we wanted to find entities that are connected to more than two specified entities?  Here is a query-generation function for such a star query:

In [68]:
def make_star_question(types,curies,shared_type):
    """Create a question to find entities of shared_type that are linked to all of the nodes specified in the
    types and curies arrays."""
    question = {
                'rebuild': True,
                'machine_question': {
                    'nodes': [],
                    'edges': []
                }
            }
    question['machine_question']['nodes'].append( {'id': 0, 'type': shared_type})
    for i,t in enumerate(types):
        newnode = {'id': i+1, 'type': t}
        if curies[i] is not None:
            newnode['curie'] = curies[i]
        question['machine_question']['nodes'].append(newnode)
        question['machine_question']['edges'].append( {'source_id': 0, 'target_id': i+1})
    return question

Suppose we have this set of GO terms and would like to find the genes that they all have in common:

* voltage-gated sodium channel activity (GO:0005248)
* muscle contraction (GO:0006936)
* voltage-gated ion channel activity (GO:0005244)
* regulation of ion transmembrane transport (GO:0034765)
* sodium ion transmembrane transport (GO:0035725)
* neuronal action potential (GO:0019228)
* membrane depolarization during action potential (GO:0086010)
* sodium ion transport (GO:0006814)

In [76]:
go_terms=['GO:0005248','GO:0006936','GO:0005244','GO:0034765','GO:0035725','GO:0019228','GO:0086010','GO:0006814']
types = ['biological_process_or_activity' for g in go_terms]
star_q = make_star_question(types,go_terms,'gene')
star_q

{'machine_question': {'edges': [{'source_id': 0, 'target_id': 1},
   {'source_id': 0, 'target_id': 2},
   {'source_id': 0, 'target_id': 3},
   {'source_id': 0, 'target_id': 4},
   {'source_id': 0, 'target_id': 5},
   {'source_id': 0, 'target_id': 6},
   {'source_id': 0, 'target_id': 7},
   {'source_id': 0, 'target_id': 8}],
  'nodes': [{'id': 0, 'type': 'gene'},
   {'curie': 'GO:0005248', 'id': 1, 'type': 'biological_process_or_activity'},
   {'curie': 'GO:0006936', 'id': 2, 'type': 'biological_process_or_activity'},
   {'curie': 'GO:0005244', 'id': 3, 'type': 'biological_process_or_activity'},
   {'curie': 'GO:0034765', 'id': 4, 'type': 'biological_process_or_activity'},
   {'curie': 'GO:0035725', 'id': 5, 'type': 'biological_process_or_activity'},
   {'curie': 'GO:0019228', 'id': 6, 'type': 'biological_process_or_activity'},
   {'curie': 'GO:0086010', 'id': 7, 'type': 'biological_process_or_activity'},
   {'curie': 'GO:0006814',
    'id': 8,
    'type': 'biological_process_or_activit

In [77]:
common_gene_answer = quick(star_q)

Return Status: 200


In [79]:
extract_node_names(common_gene_answer)

Unnamed: 0,node_0,node_1,node_2,node_3,node_4,node_5,node_6,node_7,node_8
0,SCN4A,voltage-gated sodium channel activity,muscle contraction,voltage-gated ion channel activity,regulation of ion transmembrane transport,sodium ion transmembrane transport,neuronal action potential,membrane depolarization during action potential,sodium ion transport
1,SCN4A,voltage-gated sodium channel activity,muscle contraction,voltage-gated ion channel activity,regulation of ion transmembrane transport,sodium ion transmembrane transport,neuronal action potential,membrane depolarization during action potential,sodium ion transport
2,SCN4A,voltage-gated sodium channel activity,muscle contraction,voltage-gated ion channel activity,regulation of ion transmembrane transport,sodium ion transmembrane transport,neuronal action potential,membrane depolarization during action potential,sodium ion transport
3,SCN4A,voltage-gated sodium channel activity,muscle contraction,voltage-gated ion channel activity,regulation of ion transmembrane transport,sodium ion transmembrane transport,neuronal action potential,membrane depolarization during action potential,sodium ion transport
4,SCN4A,voltage-gated sodium channel activity,muscle contraction,voltage-gated ion channel activity,regulation of ion transmembrane transport,sodium ion transmembrane transport,neuronal action potential,membrane depolarization during action potential,sodium ion transport
5,SCN4A,voltage-gated sodium channel activity,muscle contraction,voltage-gated ion channel activity,regulation of ion transmembrane transport,sodium ion transmembrane transport,neuronal action potential,membrane depolarization during action potential,sodium ion transport
6,SCN4A,voltage-gated sodium channel activity,muscle contraction,voltage-gated ion channel activity,regulation of ion transmembrane transport,sodium ion transmembrane transport,neuronal action potential,membrane depolarization during action potential,sodium ion transport
7,SCN4A,voltage-gated sodium channel activity,muscle contraction,voltage-gated ion channel activity,regulation of ion transmembrane transport,sodium ion transmembrane transport,neuronal action potential,membrane depolarization during action potential,sodium ion transport
8,SCN4A,voltage-gated sodium channel activity,muscle contraction,voltage-gated ion channel activity,regulation of ion transmembrane transport,sodium ion transmembrane transport,neuronal action potential,membrane depolarization during action potential,sodium ion transport
9,SCN4A,voltage-gated sodium channel activity,muscle contraction,voltage-gated ion channel activity,regulation of ion transmembrane transport,sodium ion transmembrane transport,neuronal action potential,membrane depolarization during action potential,sodium ion transport


Again, we get back multiple paths with the same answer node (node_0) due to differences in edge sources.  The rows shown here all contain SCN4A, but across the full result there are two genes that share this set of GO terms: SCN4A and SCN7A.

We can also have more than one unspecified node.  Suppose, for instance, that we wanted to run the previous query but also wanted to know which chemicals interact with the genes that we find.  We can do another star query in which one of the spokes is unspecified:

In [81]:
go_terms=['GO:0005248','GO:0006936','GO:0005244','GO:0034765','GO:0035725','GO:0019228','GO:0086010','GO:0006814',None]
types = ['biological_process_or_activity' for i in range(8)]+['chemical_substance']
star_q_compound = make_star_question(types,go_terms,'gene')
star_q_compound

{'machine_question': {'edges': [{'source_id': 0, 'target_id': 1},
   {'source_id': 0, 'target_id': 2},
   {'source_id': 0, 'target_id': 3},
   {'source_id': 0, 'target_id': 4},
   {'source_id': 0, 'target_id': 5},
   {'source_id': 0, 'target_id': 6},
   {'source_id': 0, 'target_id': 7},
   {'source_id': 0, 'target_id': 8},
   {'source_id': 0, 'target_id': 9}],
  'nodes': [{'id': 0, 'type': 'gene'},
   {'curie': 'GO:0005248', 'id': 1, 'type': 'biological_process_or_activity'},
   {'curie': 'GO:0006936', 'id': 2, 'type': 'biological_process_or_activity'},
   {'curie': 'GO:0005244', 'id': 3, 'type': 'biological_process_or_activity'},
   {'curie': 'GO:0034765', 'id': 4, 'type': 'biological_process_or_activity'},
   {'curie': 'GO:0035725', 'id': 5, 'type': 'biological_process_or_activity'},
   {'curie': 'GO:0019228', 'id': 6, 'type': 'biological_process_or_activity'},
   {'curie': 'GO:0086010', 'id': 7, 'type': 'biological_process_or_activity'},
   {'curie': 'GO:0006814', 'id': 8, 'type': '

In [82]:
common_gene_compound_answer = quick(star_q_compound)

Return Status: 200


In [88]:
cgc_nodes = extract_node_names(common_gene_compound_answer)
cgc_nodes[['node_0','node_9']].drop_duplicates()

Unnamed: 0,node_0,node_9
0,SCN4A,lamotrigine
32,SCN4A,carbamazepine
64,SCN4A,benzocaine
96,SCN4A,phenytoin
128,SCN4A,lidocaine
160,SCN4A,tocainide
192,SCN4A,mexiletine
224,SCN7A,benzocaine


Many other graph shapes are possible; these are just a couple of examples.  For instance, we could extend this query with an edge from node 0 to a new disease node (node 10) and an edge from the chemical node (node 9) to that same disease node.  We would then be finding genes that share a set of GO terms and looking for (disease, drug) pairs such that the disease, the drug, and the gene are all interconnected.
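
As a minimal sketch of that extension, the hypothetical helper below copies a question and adds an unspecified disease node connected to both the gene node and the chemical node.  The default ids (0, 9, and 10) follow the numbering used in the paragraph above.  This only builds the dictionary; it has not been run against the service here, so treat it as an illustration of the format rather than a tested query.

```python
import copy

def add_shared_disease_node(question, gene_id=0, chemical_id=9, disease_id=10):
    """Return a copy of question with a free disease node linked to two existing nodes."""
    extended = copy.deepcopy(question)  # leave the caller's question untouched
    mq = extended['machine_question']
    mq['nodes'].append({'id': disease_id, 'type': 'disease'})
    mq['edges'].append({'source_id': gene_id, 'target_id': disease_id})
    mq['edges'].append({'source_id': chemical_id, 'target_id': disease_id})
    return extended


# Trimmed stand-in for star_q_compound: only the gene and chemical nodes are kept
trimmed_star = {
    'machine_question': {
        'nodes': [{'id': 0, 'type': 'gene'},
                  {'id': 9, 'type': 'chemical_substance'}],
        'edges': [{'source_id': 0, 'target_id': 9}]
    }
}
triangle_q = add_shared_disease_node(trimmed_star)
print(triangle_q['machine_question']['edges'])
```

With the full question from the cells above, the call would be `add_shared_disease_node(star_q_compound)`.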

## Rebuild and caching

For details on ROBOKOP caching, see the notebook on the expand service.  Briefly, the quick service will only run against the cache unless the `rebuild` parameter is passed, in which case services are re-queried.  The `rebuild` parameter is passed as part of the JSON query, as seen in the `make_star_question` function above.  Another example is seen below in the Edge Properties section.
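
Since `rebuild` is a top-level key alongside `machine_question`, it can be added to any question built by the functions in this notebook.  The helper name `with_rebuild` below is hypothetical; it is a small sketch that returns a modified copy rather than changing the original question.

```python
import copy

def with_rebuild(question, rebuild=True):
    """Return a copy of question with the top-level 'rebuild' flag set."""
    updated = copy.deepcopy(question)
    updated['rebuild'] = rebuild
    return updated


cached_q = {
    'machine_question': {
        'nodes': [{'id': 0, 'curie': 'MONDO:0019391', 'type': 'disease'},
                  {'id': 1, 'type': 'phenotypic_feature'}],
        'edges': [{'source_id': 0, 'target_id': 1}]
    }
}
fresh_q = with_rebuild(cached_q)
print(sorted(fresh_q.keys()))
```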

## Edge Properties

TODO: come up with a nice example.