#### Copyright IBM All Rights Reserved.
#### SPDX-License-Identifier: Apache-2.0

## Real Time Data

In this demo we will:
1. Run a graph query and visualize the results
2. Insert new data into Db2
3. Re-run the previous query and see the new data reflected live with no need to:
    - export the data
    - transform the data
    - load the data
    
### Before proceeding

Please update the `connect_info` notebook with your Db2 and graph server information.

Once the notebook has been updated, please run its cell and save it.

## Review

In the previous notebook we used Db2 Graph to review a suspicious health insurance claim and determine whether it was fraudulent. In this notebook we will build on the last example, which showed the policy holder's social connections. We will add new social connections for the policy holder and see how changes to the underlying data are reflected in real time in Db2 Graph.

In [None]:
# For using notebooks as modules
import nbfinder

# These imports are for connecting, querying, traversing and returning gremlin result sets
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.traversal import T

# Make sure you have edited and run the "connect_info" notebook, then restarted this notebook
from connect_info import graph_connect_info, db2_connect_info

# When making a secure connection to the gremlin server the SSL certificate verification
# needs to be disabled when using a self signed certificate
from tornado import httpclient

# Db2 imports
import ibm_db as db
import pandas as pd

# These imports are required for working with the gremlin result set
# to transform it into something the visualization tool can work with
import json
from itertools import tee, islice, chain

In [None]:
# This helper function allows us to get the previous and next results when iterating a list
# We use this to determine how the edges connect different vertices when parsing a gremlin result set
def previous_and_next(some_iterable):
    prevs, items, nexts = tee(some_iterable, 3)
    prevs = chain([None], prevs)
    nexts = chain(islice(nexts, 1, None), [None])
    return zip(prevs, items, nexts)
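
As a quick illustration of what the helper yields, the sketch below runs it on a short list of hypothetical vertex names (placeholders, not values from the demo data). The first tuple has no previous element and the last has no next element, which is how the parsing code later recognises the end of a path.

```python
from itertools import tee, islice, chain

def previous_and_next(some_iterable):
    # Same helper as above, repeated so this example runs on its own
    prevs, items, nexts = tee(some_iterable, 3)
    prevs = chain([None], prevs)
    nexts = chain(islice(nexts, 1, None), [None])
    return zip(prevs, items, nexts)

# Hypothetical path of three vertex names
sample_path = ["claim", "holder", "associate"]
triples = list(previous_and_next(sample_path))
for previous, item, nxt in triples:
    print(previous, item, nxt)
```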

In [None]:
# We will create a connection to our graph server
# This `g` object will be used to send all query requests to the graph server
gremlin_connect = httpclient.HTTPRequest(graph_connect_info["graph_url"], validate_cert=False)
g = traversal().withRemote(
    DriverRemoteConnection(
        gremlin_connect,
        graph_connect_info["graph_name"],
        username=graph_connect_info["graph_username"],
        password=graph_connect_info["graph_password"]
    )
)

In [None]:
# We also create a connection to Db2
# str() guards against the port being stored as a number in connect_info
conn_str = "database=" + db2_connect_info["db2_database_name"] + \
    ";hostname=" + db2_connect_info["db2_hostname"] + ";port=" + str(db2_connect_info["db2_port"]) + \
    ";protocol=tcpip;uid=" + db2_connect_info["db2_username"] + ";pwd=" + db2_connect_info["db2_password"]
conn = db.connect(conn_str, '', '')
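
The connection string above is a list of `key=value` pairs separated by semicolons. A minimal sketch of the same assembly as a small helper is shown below. `build_conn_str` is a hypothetical name, and the sample values are placeholders rather than real credentials. Converting the port with `str()` is an assumption that keeps the helper working whether `connect_info` stores the port as a string or a number.

```python
def build_conn_str(info):
    """Assemble a Db2 connection string from a dict shaped like db2_connect_info."""
    parts = {
        "database": info["db2_database_name"],
        "hostname": info["db2_hostname"],
        "port": str(info["db2_port"]),  # accept either "50000" or 50000
        "protocol": "tcpip",
        "uid": info["db2_username"],
        "pwd": info["db2_password"],
    }
    # Dicts keep insertion order, so the keys appear in the order listed above
    return ";".join(f"{key}={value}" for key, value in parts.items())

# Placeholder values for illustration only
sample_info = {
    "db2_database_name": "DEMODB",
    "db2_hostname": "db2.example.com",
    "db2_port": 50000,
    "db2_username": "demo_user",
    "db2_password": "demo_password",
}
conn_str_example = build_conn_str(sample_info)
print(conn_str_example)
```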

Let's run the social connections query and visualization again.

In [None]:
# We'll start by classifying our risk scores and risk score colours for a better visualization output
# Plain comparisons are used instead of range() so that boundary values (such as 20)
# and non-integer scores fall into a band instead of dropping through to the last branch
def risk_factor(risk_score):
    if risk_score < 0:
        return "no_risk"
    elif risk_score <= 20:
        return "low_risk"
    elif risk_score < 70:
        return "medium_risk"
    else:
        return "high_risk"

def risk_color(risk_score):
    if risk_score <= 20:
        return "green"
    elif risk_score < 70:
        return "#FFAD73"
    else:
        return "red"

In [None]:
"""
This query uses our `g` object to perform the following traversal:
1. Start with the claim in question
2. Find out who the insured person is
3. Find all their social connections
4. Emit each connection found
5. Return the complete path from start to end
6. Return all properties for each hop in the path
7. Convert the Gremlin result set into a Python list
"""

policy_holder_connections = g.V() \
.hasLabel('DEMO.CLAIM') \
.has('CLAIM_ID', 'C4377') \
.out('DEMO.POLICYHOLDER_OF_CLAIM') \
.repeat(__.out('DEMO.POLICYHOLDER_CONNECTION')) \
.emit() \
.path() \
.by(__.valueMap(True)) \
.toList()
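
To make the `repeat(...).emit().path()` steps concrete, here is a small pure-Python sketch of the same idea on a toy adjacency list. It does not call the graph server. The `toy_edges` data and the `repeat_emit_paths` helper are hypothetical and only mimic how each hop along the connection edges is emitted together with the full path that led to it. The sketch assumes the toy graph has no cycles, and the order of its results is not meant to match the order the graph server returns.

```python
# Hypothetical social connections: each key points to the holders it is connected to
toy_edges = {
    "PH1": ["PH2", "PH3"],
    "PH2": ["PH4"],
    "PH3": [],
    "PH4": [],
}

def repeat_emit_paths(start, edges):
    """Return one path per vertex reached by repeatedly following outgoing edges."""
    results = []
    frontier = [[start]]
    while frontier:
        next_frontier = []
        for path in frontier:
            for neighbour in edges.get(path[-1], []):
                extended = path + [neighbour]
                results.append(extended)       # emitted after every hop
                next_frontier.append(extended)  # and traversed further
        frontier = next_frontier
    return results

paths = repeat_emit_paths("PH1", toy_edges)
for path in paths:
    print(" -> ".join(path))
```

As in the query above, the starting vertex on its own is not emitted; only paths that include at least one hop are returned.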

In [None]:
# You can view the raw output by uncommenting the print statements below
#print(policy_holder_connections)
#print(policy_holder_connections[0])
#for i in range(len(policy_holder_connections[0])):
#    print("policy_holder_connections[" + str(i) + "] = " + str(policy_holder_connections[0][i]))
#    print("")


In [None]:
"""
Note: vis-network uses the terms nodes and edges instead of vertices and edges; they are interchangeable.
They are defined as nodes here for vis-network

This function to parse the result set is very similar to the previous one. The only difference is how
we are getting the labels for each vertex
"""

policy_holder_connection_nodes = []
policy_holder_connection_edges = []

# Loop over the result set
for val in policy_holder_connections:
    risk_score = 0
    # for each nested list iterate over it
    for previous, item, nxt in previous_and_next(val):
        itemId = item[T.id]["prefix"] + "::" + item[T.id]["idCols"][0]
        label = item[T.label]
        # if we are on the claim label then skip this iteration
        if label == "DEMO.CLAIM":
            continue
        # if we are on the policy holder claim label then skip this iteration
        if label == "DEMO.POLICYHOLDER_OF_CLAIM":
            continue
        if label == "DEMO.POLICYHOLDER":
            # for the label POLICYHOLDER set a clean label with the policy holder id
            label = "Policyholder " + itemId.split("::")[1]
        if "RISK_SCORE" in item:
            # if risk score is available then set it
            risk_score = item["RISK_SCORE"][0]
        if nxt is not None:
            # create our edge links
            nxtId = nxt[T.id]["prefix"] + "::" + nxt[T.id]["idCols"][0]
            link = {"from": itemId, "to": nxtId, "title": label, "color": "blue"}
            if link not in policy_holder_connection_edges:
                policy_holder_connection_edges.append(link)
        # get the risk classification and colour for this vertex based on its risk score
        color = risk_color(risk_score)
        risk_group = risk_factor(risk_score)
        # if our item is the policy holder then set the colour to aqua
        if itemId == "DEMO.POLICYHOLDER::PH3759":
            color = "aqua"
            risk_score = 100
            risk_group = "high_risk"
        # create our vertex
        node = {
            "title": risk_group,
            "color": color,
            "id": itemId,
            "label": label,
            "group": risk_group,
            "value": risk_score
            # we are setting each vertex value to its risk score. vis-network will increase the vertex size
            # to correspond to the value
        }
        # and append it if the vertex is not in our list
        if node not in policy_holder_connection_nodes:
            policy_holder_connection_nodes.append(node)

with open('policy_holder_connections.json', 'w') as f:
    json.dump(
        {
            'nodes': policy_holder_connection_nodes,
            'edges': policy_holder_connection_edges
        },
        f,
        indent=4
    )
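
The parsing loop above boils down to two things: linking each vertex to the next one in its path, and skipping nodes and edges that were already collected from an earlier path. The sketch below shows that pattern on simplified, hypothetical paths made of plain ID strings (the real result set holds property maps keyed by `T.id` and `T.label`). `paths_to_graph` is an illustrative helper, not part of the notebook.

```python
import json

def paths_to_graph(paths):
    """Collect de-duplicated nodes and edges from a list of vertex-ID paths."""
    nodes, edges = [], []
    for path in paths:
        for index, item in enumerate(path):
            node = {"id": item, "label": "Policyholder " + item}
            if node not in nodes:
                nodes.append(node)
            # Link this vertex to the next one, unless it is the last in the path
            if index + 1 < len(path):
                link = {"from": item, "to": path[index + 1]}
                if link not in edges:
                    edges.append(link)
    return {"nodes": nodes, "edges": edges}

# Two hypothetical paths that share their first hop
sample_paths = [["PH1", "PH2"], ["PH1", "PH2", "PH3"]]
graph = paths_to_graph(sample_paths)
print(json.dumps(graph, indent=4))
```

Because the shared hop from `PH1` to `PH2` appears in both paths, the de-duplication check is what keeps it from being drawn twice.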

In [None]:
%%html
<!-- Create a div that will contain the visualization -->
<div id="policy_holder_connections">Visualization is loading...</div>
<script type="text/javascript">
// load the visualization library
require.config({
  paths: {
    Vis: "https://unpkg.com/vis-network@7.6.2/standalone/umd/vis-network.min"
  }
});
require(["Vis"], function(vis) {
  // now we will fetch the json from the previous cell
  fetch('policy_holder_connections.json').then(r => r.json()).then(graph => {
    // get a reference to the container we created to hold the visualization
    var container = document.getElementById('policy_holder_connections');
    // set our visualization data
    var data = {
      nodes: graph.nodes,
      edges: graph.edges
    };
    // define some default options for the visualization
    // See https://visjs.github.io/vis-network/docs/network/ for all available options
    var options = {
      width: '968px',
      height: '800px',
      nodes: {
        shape: 'dot',
      },
      interaction: {
        hover: true,
      },
    };
    new vis.Network(container, data, options);
  })
})
</script>

To review, the larger the circle, the higher the policy holder's risk.

We can see that the policy holder in question (aqua-coloured node) is directly connected to two other high risk policy holders.

They are also connected to a third high risk policy holder by 3 degrees of separation.

Now we will add some additional social connections for policy holder PH3759 and re-run the query.

In [None]:
"""
This will add two fake rows into our policy holder social connections table
"""

insert_statement = """
insert
    into DEMO.POLICYHOLDER_CONNECTION(POLICYHOLDER_ID, POLICYHOLDER_ASSOCIATE_ID, LEVEL)
    values(?, ?, ?),(?, ?, ?)
"""
stmt = db.prepare(conn, insert_statement)
db.bind_param(stmt, 1, "PH11292")
db.bind_param(stmt, 2, "PH71118")
db.bind_param(stmt, 3, 80)
db.bind_param(stmt, 4, "PH3759")
db.bind_param(stmt, 5, "PH11292")
db.bind_param(stmt, 6, 84)
db.execute(stmt)

Now we will re-run the previous Gremlin query and visualization.

In [None]:
"""
This query uses our `g` object to perform the following traversal:
1. Start with the claim in question
2. Find out who the insured person is
3. Find all their social connections
4. Emit each connection found
5. Return the complete path from start to end
6. Return all properties for each hop in the path
7. Convert the Gremlin result set into a Python list
"""

policy_holder_connections_v2 = g.V() \
.hasLabel('DEMO.CLAIM') \
.has('CLAIM_ID', 'C4377') \
.out('DEMO.POLICYHOLDER_OF_CLAIM') \
.repeat(__.out('DEMO.POLICYHOLDER_CONNECTION')) \
.emit() \
.path() \
.by(__.valueMap(True)) \
.toList()

In [None]:
# You can view the raw output by uncommenting the print statements below
#print(policy_holder_connections_v2)
#print(policy_holder_connections_v2[0])
#for i in range(len(policy_holder_connections_v2[0])):
#    print("policy_holder_connections_v2[" + str(i) + "] = " + str(policy_holder_connections_v2[0][i]))
#    print("")


In [None]:
"""
The same parsing code as before, with different variable and file names to keep the data separate

Note: vis-network uses the terms nodes and edges instead of vertices and edges; they are interchangeable.
They are defined as nodes here for vis-network

This function to parse the result set is very similar to the previous one. The only difference is how
we are getting the labels for each vertex
"""

policy_holder_connection_nodes_v2 = []
policy_holder_connection_edges_v2 = []

# Loop over the result set
for val in policy_holder_connections_v2:
    risk_score = 0
    # for each nested list iterate over it
    for previous, item, nxt in previous_and_next(val):
        itemId = item[T.id]["prefix"] + "::" + item[T.id]["idCols"][0]
        label = item[T.label]
        # if we are on the claim label then skip this iteration
        if label == "DEMO.CLAIM":
            continue
        # if we are on the policy holder claim label then skip this iteration
        if label == "DEMO.POLICYHOLDER_OF_CLAIM":
            continue
        if label == "DEMO.POLICYHOLDER":
            # for the label POLICYHOLDER set a clean label with the policy holder id
            label = "Policyholder " + itemId.split("::")[1]
        if "RISK_SCORE" in item:
            # if risk score is available then set it
            risk_score = item["RISK_SCORE"][0]
        if nxt is not None:
            # create our edge links
            nxtId = nxt[T.id]["prefix"] + "::" + nxt[T.id]["idCols"][0]
            link = {"from": itemId, "to": nxtId, "title": label, "color": "blue"}
            if link not in policy_holder_connection_edges_v2:
                policy_holder_connection_edges_v2.append(link)
        # get the risk classification and colour for this vertex based on its risk score
        color = risk_color(risk_score)
        risk_group = risk_factor(risk_score)
        # if our item is the policy holder then set the colour to aqua
        if itemId == "DEMO.POLICYHOLDER::PH3759":
            color = "aqua"
            risk_score = 100
            risk_group = "high_risk"
        # create our vertex
        node = {
            "title": risk_group,
            "color": color,
            "id": itemId,
            "label": label,
            "group": risk_group,
            "value": risk_score
            # we are setting each vertex value to its risk score. vis-network will increase the vertex size
            # to correspond to the value
        }
        # and append it if the node is not in our list
        if node not in policy_holder_connection_nodes_v2:
            policy_holder_connection_nodes_v2.append(node)

with open('policy_holder_connections_v2.json', 'w') as f:
    json.dump(
        {
            'nodes': policy_holder_connection_nodes_v2,
            'edges': policy_holder_connection_edges_v2
        },
        f,
        indent=4
    )

In [None]:
%%html
<!-- Create a div that will contain the visualization -->
<div id="policy_holder_connections_v2">Visualization is loading...</div>
<script type="text/javascript">
// load the visualization library
require.config({
  paths: {
    Vis: "https://unpkg.com/vis-network@7.6.2/standalone/umd/vis-network.min"
  }
});
require(["Vis"], function(vis) {
  // now we will fetch the json from the previous cell
  fetch('policy_holder_connections_v2.json').then(r => r.json()).then(graph => {
    // get a reference to the container we created to hold the visualization
    var container = document.getElementById('policy_holder_connections_v2');
    // set our visualization data
    var data = {
      nodes: graph.nodes,
      edges: graph.edges
    };
    // define some default options for the visualization
    // See https://visjs.github.io/vis-network/docs/network/ for all available options
    var options = {
      width: '968px',
      height: '800px',
      nodes: {
        shape: 'dot',
      },
      interaction: {
        hover: true,
      },
    };
    new vis.Network(container, data, options);
  })
})
</script>

We now see that the policy holder in question (aqua-coloured vertex) is indirectly connected to another high risk policy holder, PH71118, through policy holder PH11292.
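
One way to confirm exactly what changed is to compare the two sets of edges written out earlier. The sketch below does this on small, hypothetical stand-in data rather than the real JSON files, and `new_edges` is an illustrative helper name. With the notebook's files, the same comparison could be run on the `edges` lists loaded from `policy_holder_connections.json` and `policy_holder_connections_v2.json`.

```python
def new_edges(before, after):
    """Return the edges present in `after` whose (from, to) pair is not in `before`."""
    seen = {(edge["from"], edge["to"]) for edge in before}
    return [edge for edge in after if (edge["from"], edge["to"]) not in seen]

# Hypothetical edge lists standing in for the two JSON outputs
edges_before = [{"from": "A", "to": "B"}]
edges_after = [
    {"from": "A", "to": "B"},
    {"from": "A", "to": "C"},
    {"from": "C", "to": "D"},
]

added = new_edges(edges_before, edges_after)
for edge in added:
    print(edge["from"], "->", edge["to"])
```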

Changes to the underlying data are reflected in real time in Db2 Graph with no additional steps required. Graph analytics can be run as often as the data changes, with no export, transform, or load step in between.

## Don't forget to clean up the added data

In [None]:
delete_statement = "DELETE FROM DEMO.POLICYHOLDER_CONNECTION WHERE POLICYHOLDER_ID = ? OR POLICYHOLDER_ASSOCIATE_ID = ?"
stmt = db.prepare(conn, delete_statement)
db.bind_param(stmt, 1, "PH11292")
db.bind_param(stmt, 2, "PH11292")
db.execute(stmt)
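
The single `DELETE` above removes both inserted rows because `PH11292` appears as the policy holder in one row and as the associate in the other. The sketch below illustrates this with Python's built-in `sqlite3` module as a stand-in for Db2, so it runs without a database server. The in-memory table only mimics the three columns used in this notebook and is not the demo schema.

```python
import sqlite3

# In-memory stand-in table; not the real DEMO.POLICYHOLDER_CONNECTION
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE POLICYHOLDER_CONNECTION ("
    "POLICYHOLDER_ID TEXT, POLICYHOLDER_ASSOCIATE_ID TEXT, LEVEL INTEGER)"
)

# The same two rows that were inserted earlier in this notebook
demo.executemany(
    "INSERT INTO POLICYHOLDER_CONNECTION VALUES (?, ?, ?)",
    [("PH11292", "PH71118", 80), ("PH3759", "PH11292", 84)],
)
rows_before = demo.execute("SELECT COUNT(*) FROM POLICYHOLDER_CONNECTION").fetchone()[0]

# Same predicate as the cleanup statement: match PH11292 in either column
demo.execute(
    "DELETE FROM POLICYHOLDER_CONNECTION "
    "WHERE POLICYHOLDER_ID = ? OR POLICYHOLDER_ASSOCIATE_ID = ?",
    ("PH11292", "PH11292"),
)
rows_after = demo.execute("SELECT COUNT(*) FROM POLICYHOLDER_CONNECTION").fetchone()[0]
demo.close()

print(rows_before, rows_after)
```

Note that a predicate like this removes every row that references `PH11292` in either column, not only the rows added by this notebook.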

In [None]:
db.close(conn)