In [1]:
import pandas as pd
import json
import fornax

# Tutorial 2 - Executing a Query

## Introduction

In this tutorial we will query the graph for:
* a node whose label contains hulk
* joined to a node whose label contains Lady
* joined to a node whose label contains storm
    
Critically, there is no such subgraph in the dataset.
However, fornax will correctly deduce that She-Hulk and the Invisible Woman are both members of the Lady Liberators, and that this is the best match because the Invisible Woman is also known as Sue Storm.

## Database Initialisation

In [2]:
# load the node and edge tables used to populate the database in the previous tutorial
nodes_df = pd.read_csv('./nodes.csv')
edges_df = pd.read_csv('./edges.csv')

## Preliminaries

Since the database only contains the ids of nodes, let's build a lookup table to convert node ids into labels.

In [3]:
nodes_df = pd.read_csv('./nodes.csv')
nodes_df.head()

Unnamed: 0,label,type,uid
0,Selene,0,87770955
1,Doctor Doom,0,2073821878
2,Viper,0,396175249
3,Rhino,0,279892555
4,Sin,0,2062678112
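The table above pairs each `uid` with its `label`, so a plain dictionary is enough to serve as the lookup table. The following is a minimal sketch: `build_lookup` is a hypothetical helper, not part of fornax, and the rows are copied by hand from the output above so that the example runs on its own.

```python
def build_lookup(uids, labels):
    """Map each node uid to its label (hypothetical helper, not a fornax API)."""
    return {int(uid): str(label) for uid, label in zip(uids, labels)}


# Rows copied from the nodes_df.head() output above.
uids = [87770955, 2073821878, 396175249, 279892555, 2062678112]
labels = ['Selene', 'Doctor Doom', 'Viper', 'Rhino', 'Sin']

uid_to_label = build_lookup(uids, labels)
print(uid_to_label[2073821878])
```

With the tutorial's DataFrame the same helper could be called as `build_lookup(nodes_df['uid'], nodes_df['label'])`, since it accepts any pair of iterables.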


## Building a Query

The first step is to create a query graph.

We need three nodes and two edges. 
Add them to the database just like target nodes.

In [4]:
nodes = list(range(3))
nodes

[0, 1, 2]

In [5]:
edges = [(0, 1), (1, 2)]
edges

[(0, 1), (1, 2)]
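Before handing the query graph to fornax, it can be worth checking that every edge refers to a node that actually exists. The sketch below is only an illustration: `dangling_edges` is a hypothetical helper and makes no assumption about what validation fornax performs itself.

```python
nodes = list(range(3))
edges = [(0, 1), (1, 2)]


def dangling_edges(nodes, edges):
    """Return the edges whose start or end is not a known node id."""
    known = set(nodes)
    return [(start, end) for start, end in edges
            if start not in known or end not in known]


# The query graph from this tutorial has no dangling edges.
print(dangling_edges(nodes, edges))

# An edge pointing at a non-existent node 3 is reported.
bad = dangling_edges(nodes, edges + [(2, 3)])
print(bad)
```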

Now we add matching edges between:
* query node 0 and all target nodes containing the substring hulk
* query node 1 and all target nodes containing the substring lady
* query node 2 and all target nodes containing the substring storm

The weight of each match is 1. Had we used a non-binary matching function, the weight could take any value in the range $0 < weight \le 1$.

In [6]:
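# each match is a (query node id, target node uid, weight) tuple
matches = []
# query node 0: labels containing "hulk" (the (?i) flag makes the search case insensitive)
for uid in nodes_df[nodes_df['label'].str.contains('(?i)hulk')]['uid']:
    matches.append((0, uid, 1))
# query node 1: labels containing "lady"
for uid in nodes_df[nodes_df['label'].str.contains('(?i)lady')]['uid']:
    matches.append((1, uid, 1))
# query node 2: labels containing "storm"
for uid in nodes_df[nodes_df['label'].str.contains('(?i)storm')]['uid']:
    matches.append((2, uid, 1))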
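To illustrate what a non-binary matching function might look like, the sketch below scores labels with `difflib.SequenceMatcher` from the standard library and keeps those above a threshold. This is an assumption made for illustration, not the matching function used in this tutorial: the helper names, the threshold of 0.5 and the small integer uids are all hypothetical.

```python
from difflib import SequenceMatcher


def match_weight(term, label):
    """Case-insensitive similarity between a query term and a label, in [0, 1]."""
    return SequenceMatcher(None, term.lower(), label.lower()).ratio()


def weighted_matches(query_node, term, labelled_uids, threshold=0.5):
    """Build (query node, target uid, weight) tuples for labels scoring >= threshold."""
    found = []
    for uid, label in labelled_uids:
        weight = match_weight(term, label)
        if weight >= threshold:
            found.append((query_node, uid, weight))
    return found


# Made-up uids paired with labels, purely for demonstration.
candidates = [(1, 'Hulk'), (2, 'She-Hulk'), (3, 'Doctor Doom')]
example_matches = weighted_matches(0, 'hulk', candidates)
for match in example_matches:
    print(match)
```

Because the threshold is positive, every weight that is kept satisfies $0 < weight \le 1$, which is the range described above.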

In [7]:
target_graph = fornax.GraphHandle.create(
    nodes_df['uid'], 
    zip(edges_df['start'], edges_df['end']),
    nodes_df[['label', 'type']].to_dict(orient='records')
)

query_graph = fornax.GraphHandle.create(
    nodes, 
    edges, 
    metadata=[{'label': 'Hulk'}, {'label': 'Lady'}, {'label': 'Storm'}]
)

query = fornax.QueryHandle.create(query_graph, target_graph, matches)

In [11]:
results = query.execute(n=10, edges=False)




In [12]:
print(json.dumps(results, indent=4))

{
    "iterations": 2,
    "subgraph_matches": [
        {
            "subgraph_match": [
                {
                    "query_node_offset": 0,
                    "target_node_offset": 0
                },
                {
                    "query_node_offset": 1,
                    "target_node_offset": 6
                },
                {
                    "query_node_offset": 2,
                    "target_node_offset": 11
                }
            ],
            "total_score": 0.024416640711327393,
            "individual_scores": [
                0.013037520460784435,
                0.025322148576378822,
                0.034890253096818924
            ]
        },
        {
            "subgraph_match": [
                {
                    "query_node_offset": 0,
                    "target_node_offset": 0
                },
                {
                    "query_node_offset": 1,
                    "target_node_offset": 6
                },
     

In [13]:
query_nodes = results['query_nodes']
target_nodes = results['target_nodes']
subgraphs = results['subgraph_matches']
for i, sub in enumerate(subgraphs):
    print('match number: {}. Score: {}'.format(i+1, sub['total_score']))
    for offsets in sub['subgraph_match']: 
        query_node = query_nodes[offsets['query_node_offset']]['meta']['label']
        target_node = target_nodes[offsets['target_node_offset']]['meta']['label']
        print(
            query_node,
            '--->',
            target_node
        )
    print('\n\n')

match number: 1. Score: 0.024416640711327393
Hulk ---> She-Hulk
Lady ---> Lady Liberators
Storm --->  Susan Storm



match number: 2. Score: 0.024416640711327393
Hulk ---> She-Hulk
Lady ---> Lady Liberators
Storm ---> Sue Storm



match number: 3. Score: 0.024416640711327393
Hulk ---> She-Hulk
Lady ---> Lady Liberators
Storm --->  Susan Storm-Richards



match number: 4. Score: 0.6801945505042871
Hulk --->  The Hulk



match number: 5. Score: 0.6801945505042871
Hulk --->  Hulk



match number: 6. Score: 0.6801945505042871
Hulk --->  Red She-Hulk



match number: 7. Score: 0.6801945505042871
Hulk ---> The Incredible Hulk



match number: 8. Score: 0.6801945505042871
Hulk --->  Red Hulk



match number: 9. Score: 0.6801945505042871
Hulk ---> She-Hulk



match number: 10. Score: 0.6801945505042871
Hulk ---> Hulk
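In the output above only the first three matches assign a target node to all three query nodes; the rest match the Hulk node alone. A small filter can separate the two cases. The sketch below assumes the `results` structure printed earlier; `complete_matches` is a hypothetical helper, and the example dictionary is a hand-written stand-in whose second target offset is made up.

```python
def complete_matches(results, n_query_nodes):
    """Keep only the matches that assign a target node to every query node."""
    kept = []
    for match in results['subgraph_matches']:
        matched = {pair['query_node_offset'] for pair in match['subgraph_match']}
        if len(matched) == n_query_nodes:
            kept.append(match)
    return kept


# Hand-written stand-in mirroring the structure of the JSON printed above.
example_results = {
    'subgraph_matches': [
        {
            'subgraph_match': [
                {'query_node_offset': 0, 'target_node_offset': 0},
                {'query_node_offset': 1, 'target_node_offset': 6},
                {'query_node_offset': 2, 'target_node_offset': 11},
            ],
            'total_score': 0.024416640711327393,
        },
        {
            # Partial match: only query node 0 is assigned (target offset is made up).
            'subgraph_match': [
                {'query_node_offset': 0, 'target_node_offset': 1},
            ],
            'total_score': 0.6801945505042871,
        },
    ]
}

full = complete_matches(example_results, n_query_nodes=3)
print(len(full))
```

With the real output this could be called as `complete_matches(results, len(nodes))`.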



