#### Copyright IBM All Rights Reserved.
#### SPDX-License-Identifier: Apache-2.0

## Using Database Views

In this demo we will:
1. Review an existing query
2. Modify our graph overlay file to include a new view
3. Create a new query using our view
    
### Before proceeding

Please update the `connect_info` notebook with your Db2 and graph server information.

Once the notebook has been updated, run its cell and save the notebook.

### Graph Overlay Changes

Before continuing, update the graph overlay file to include the edge definition for the view:

Using your text editor of choice, open the JSON file that was created when you added the database connection to the Gremlin server. This file is inside the directory where you asked Docker to persist the configuration files. If you used the recommended Gremlin database name of `graph` when adding the connection, the file is called `db2_graph.json`.

Scroll to the `e_tables` array and add the following JSON to the end of the array (keep the leading comma):

```json
    ,{
      "src_v": {
        "prefix": "DEMO.CLAIM",
        "id_cols": [
          "CLAIM_ID"
        ]
      },
      "dst_v": {
        "prefix": "DEMO.SERVICE",
        "id_cols": [
          "SERVICE_ID"
        ]
      },
      "src_v_tables": [
        {
          "schema_name": "DEMO",
          "table_name": "CLAIM"
        }
      ],
      "dst_v_tables": [
        {
          "schema_name": "DEMO",
          "table_name": "SERVICE"
        }
      ],
      "eid": {
        "implicit_id": true
      },
      "table": {
        "schema_name": "DEMO",
        "table_name": "SERVICE_OF_CLAIM"
      },
      "label": {
        "fixed_label": true,
        "label": "DEMO.SERVICE_OF_CLAIM"
      }
    }
```
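The snippet above is a fragment rather than standalone JSON, so a missing or doubled comma is an easy mistake. If you would rather script the change, here is a minimal sketch. It assumes `e_tables` is a top-level key of the overlay file; `add_edge_table` and `SERVICE_OF_CLAIM_EDGE` are hypothetical names, not part of Db2 Graph.

```python
import json
from pathlib import Path

# The edge definition from the snippet above, as a Python dict (no leading comma needed)
SERVICE_OF_CLAIM_EDGE = {
    "src_v": {"prefix": "DEMO.CLAIM", "id_cols": ["CLAIM_ID"]},
    "dst_v": {"prefix": "DEMO.SERVICE", "id_cols": ["SERVICE_ID"]},
    "src_v_tables": [{"schema_name": "DEMO", "table_name": "CLAIM"}],
    "dst_v_tables": [{"schema_name": "DEMO", "table_name": "SERVICE"}],
    "eid": {"implicit_id": True},
    "table": {"schema_name": "DEMO", "table_name": "SERVICE_OF_CLAIM"},
    "label": {"fixed_label": True, "label": "DEMO.SERVICE_OF_CLAIM"},
}


def add_edge_table(overlay_path, edge_def):
    """Append edge_def to the overlay's e_tables array.

    Assumes e_tables is a top-level key of the overlay JSON. Returns False
    (and leaves the file untouched) if an edge backed by the same table is
    already listed, True after writing the updated file.
    """
    path = Path(overlay_path)
    original = path.read_text()
    overlay = json.loads(original)
    e_tables = overlay.setdefault("e_tables", [])
    if any(e.get("table") == edge_def["table"] for e in e_tables):
        return False
    e_tables.append(edge_def)
    # Keep a copy of the original file next to it, e.g. db2_graph.json.bak
    path.with_name(path.name + ".bak").write_text(original)
    path.write_text(json.dumps(overlay, indent=2))
    return True
```

For example, call `add_edge_table("<config dir>/db2_graph.json", SERVICE_OF_CLAIM_EDGE)`, where `<config dir>` is the directory you asked Docker to persist. Note that `json.dumps` rewrites the whole file with its own indentation.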

Then restart the Gremlin server:

`docker exec -it db2graph manage restart`

## Review

In the first notebook, one of the visualizations we looked at connected claims to doctors, and doctors to service providers.

What if we had a scenario where we wanted to directly connect claims to service providers?

With many other graph databases this is a complex and time-consuming task, because it requires custom logic to create all the additional edges between claims and service providers. That logic becomes hard to maintain as new data arrives, and if we later want to drop the requirement, cleaning up all the added edges can be problematic.

With Db2 Graph, the process is as simple as defining a view that links claims to service providers and then referencing the view as an edge table in the graph overlay file.

In [None]:
# For using notebooks as modules
import nbfinder

# These imports are for connecting, querying, traversing and returning gremlin result sets
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.traversal import T

# Make sure you have edited and run the "connect_info" notebook, then restarted this notebook
from connect_info import graph_connect_info, db2_connect_info

# When making a secure connection to the gremlin server, SSL certificate verification
# needs to be disabled when using a self-signed certificate
from tornado import httpclient

# Db2 imports
import ibm_db as db
import pandas as pd

# These imports are required for working with the gremlin result set
# to transform it into something the visualization tool can work with
import json
from itertools import tee, islice, chain

In [None]:
# This helper function allows us to get the previous and next results when iterating a list
# We use this to determine how the edges connect different vertices when parsing a gremlin result set
def previous_and_next(some_iterable):
    prevs, items, nexts = tee(some_iterable, 3)
    prevs = chain([None], prevs)
    nexts = chain(islice(nexts, 1, None), [None])
    return zip(prevs, items, nexts)
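
For a concrete picture of what `previous_and_next` yields, here is a quick standalone check. It repeats the helper so it runs on its own and feeds it a made-up three-element list in place of a gremlin path. The first item has no previous value and the last has no next value, which is why the parsing cells below test for `None`.

```python
from itertools import chain, islice, tee


def previous_and_next(some_iterable):
    # Three independent iterators over the same input
    prevs, items, nexts = tee(some_iterable, 3)
    prevs = chain([None], prevs)                  # shifted right by one
    nexts = chain(islice(nexts, 1, None), [None]) # shifted left by one
    return zip(prevs, items, nexts)


# A made-up path of three vertex names, standing in for one gremlin path
for previous, item, nxt in previous_and_next(["claim", "policyholder", "other claim"]):
    print(previous, item, nxt)
# None claim policyholder
# claim policyholder other claim
# policyholder other claim None
```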

The view in Db2 is defined as follows:
```sql
create view demo.service_of_claim (claim_id, service_id) as (
    select
        claim_id,
        service_id 
    from
        demo.incharge ic,
        demo.incharge_of_claim link
    where 
        link.PERSON_INCHARGE_ID = ic.INCHARGE_ID
)
```
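To make the join concrete, here is a pure-Python sketch of what the view computes. The rows and IDs are invented, and the placement of the columns (`SERVICE_ID` on `DEMO.INCHARGE`, `CLAIM_ID` on `DEMO.INCHARGE_OF_CLAIM`) is an assumption inferred from the query and from the vertex properties used later in this notebook. `service_of_claim` is a hypothetical helper, not part of Db2 Graph.

```python
# Hypothetical rows; the real tables have more columns
incharge = [
    {"INCHARGE_ID": "D1", "SERVICE_ID": "S10"},
    {"INCHARGE_ID": "D2", "SERVICE_ID": "S20"},
]
incharge_of_claim = [
    {"CLAIM_ID": "C1", "PERSON_INCHARGE_ID": "D1"},
    {"CLAIM_ID": "C1", "PERSON_INCHARGE_ID": "D2"},
    {"CLAIM_ID": "C2", "PERSON_INCHARGE_ID": "D1"},
]


def service_of_claim(incharge_rows, link_rows):
    """Inner join on link.PERSON_INCHARGE_ID = ic.INCHARGE_ID, keeping (claim_id, service_id)."""
    # Index the doctors by id so each link row can be matched directly
    services_by_doctor = {}
    for ic in incharge_rows:
        services_by_doctor.setdefault(ic["INCHARGE_ID"], []).append(ic["SERVICE_ID"])
    return [
        (link["CLAIM_ID"], service_id)
        for link in link_rows
        for service_id in services_by_doctor.get(link["PERSON_INCHARGE_ID"], [])
    ]


print(service_of_claim(incharge, incharge_of_claim))
# [('C1', 'S10'), ('C1', 'S20'), ('C2', 'S10')]
```

Like the SQL, the sketch does not remove duplicates: if two doctors on the same claim belong to the same service provider, that (claim, service) pair appears twice.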

This view directly connects claims to service providers.

Let's take a look at the data:

In [None]:
# Build the Db2 connection string from the values in the connect_info notebook
conn_str = (
    "database=" + db2_connect_info["db2_database_name"] +
    ";hostname=" + db2_connect_info["db2_hostname"] +
    ";port=" + str(db2_connect_info["db2_port"]) +
    ";protocol=tcpip;uid=" + db2_connect_info["db2_username"] +
    ";pwd=" + db2_connect_info["db2_password"]
)
conn = db.connect(conn_str, '', '')
select = """
select * from demo.service_of_claim fetch first 10 rows only;
"""
data = []
try:
    # Run the query and collect each row as a dictionary keyed by column name
    stmt = db.exec_immediate(conn, select)
    result = db.fetch_assoc(stmt)
    # fetch_assoc returns False once there are no more rows
    while result is not False:
        data.append(result)
        result = db.fetch_assoc(stmt)
finally:
    # Close the connection even if the query fails
    db.close(conn)
pd.DataFrame(data)

Each row of the view pairs a `CLAIM_ID` with the `SERVICE_ID` of a service provider linked to that claim through a doctor in charge of it.

In [None]:
# Create a connection to the Gremlin server
# This `g` object will be used to send all query requests to the graph server
gremlin_connect = httpclient.HTTPRequest(graph_connect_info["graph_url"], validate_cert=False)
g = traversal().withRemote(
    DriverRemoteConnection(
        gremlin_connect,
        graph_connect_info["graph_name"],
        username=graph_connect_info["graph_username"],
        password=graph_connect_info["graph_password"]
    )
)

Let's review the previous visualization, which links claims to doctors and doctors to service providers.

In [None]:
"""
This query uses our `g` object to perform the following traversal:
1. Get the vertices with the label 'DEMO.CLAIM'
2. Filter to the vertex with the 'CLAIM_ID' we are interested in
3. Find out who the policyholder is
4. Find out what other claims they have filed
5. For each claim, find the doctors who handled it and the service providers those doctors work for,
   as well as the insured person on the claim
6. Return the complete path from start to end
7. Return all properties for each hop in the path
8. Convert the gremlin result set into a python list
"""
 
other_claims_for_policy_holder = g.V() \
.hasLabel('DEMO.CLAIM') \
.has('CLAIM_ID', 'C4377') \
.out('DEMO.POLICYHOLDER_OF_CLAIM') \
.in_('DEMO.POLICYHOLDER_OF_CLAIM') \
.union(__.out('DEMO.INCHARGE_OF_CLAIM').out('DEMO.INCHARGE_DEMO.SERVICE'), __.out('DEMO.INSURED_OF_CLAIM')) \
.path() \
.by(__.valueMap(True)) \
.toList()

In [None]:
# You can view the raw output by uncommenting the print statements below
#print(other_claims_for_policy_holder)
#print(other_claims_for_policy_holder[0])
#for i in range(len(other_claims_for_policy_holder[0])):
#    print("other_claims_for_policy_holder[0][" + str(i) + "] = " + str(other_claims_for_policy_holder[0][i]))
#    print("")

In [None]:
"""
Note: vis-network uses the terms nodes and edges instead of vertices and edges; they are interchangeable.
They are defined as nodes here for vis-network.

This code parses the result set in much the same way as in the previous notebook. The only difference is how
we are getting the labels for each vertex.
"""
other_claims_nodes = []
other_claims_edges = []
# start looping through the list containing the results
for val in other_claims_for_policy_holder:
    # for every value in the results get the previous, item and next item
    for previous, item, nxt in previous_and_next(val):
        # If there is no previous value available then skip the iteration
        if previous is None:
            continue
        # grab the id and label for the vertex
        itemId = item[T.id]["prefix"] + "::" + item[T.id]["idCols"][0]
        label = item[T.label]
        colour = "blue"
        # if we are on the patient vertex then skip the iteration
        if label == "DEMO.PATIENT":
            continue
        # if we are on the policy holder then set a label
        # and set the colour of the vertex to green
        if label == "DEMO.POLICYHOLDER":
            label = "Policyholder " + itemId.split("::")[1]
            colour = "green"
        # if we are on a service vertex then set the label value to the
        # name of the service and the colour to orange
        if label == "DEMO.SERVICE":
            label = item["SERVICE_NAME"][0]
            colour = "orange"
        # if we are on an incharge vertex then set the label value to the doctors name and id
        # and the colour to grey
        if label == "DEMO.INCHARGE":
            label = "Dr. " + item["LNAME"][0] + " - " + item["SERVICE_ID"][0]
            colour = "grey"
        # if we are on the claim vertex then set the label to be the claim id
        if label == "DEMO.CLAIM":
            label = "Claim " + itemId.split("::")[1]
            # and if we are on the claim we are investigating set the colour to red
            if label == "Claim C4377":
                colour = "red"
        # add our edges
        if nxt is not None:
            nxtId = nxt[T.id]["prefix"] + "::" + nxt[T.id]["idCols"][0]
            link = {"from": itemId, "to": nxtId, "title": label}
            if link not in other_claims_edges:
                other_claims_edges.append(link)
        # add our vertices
        node = {"id": itemId, "label": label, "group": item[T.label], "color": colour}
        if node not in other_claims_nodes:
            other_claims_nodes.append(node)
# dump the edges and nodes to json
with open('other_claims_for_policy_holder.json', 'w') as f:
    json.dump(
        {
            'nodes': other_claims_nodes,
            'edges': other_claims_edges
        },
        f,
        indent=4
    )
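
The two parsing cells in this notebook repeat the same loop. As a minimal sketch, assuming each path element is a `valueMap(True)` dictionary like the ones returned above, the logic can be factored into one function. `paths_to_vis`, `vertex_id` and `vertex_style` are hypothetical names. `id_key` and `label_key` default to plain strings so the sketch runs without a Gremlin server; in the notebook you would pass `T.id` and `T.label`.

```python
from itertools import chain, islice, tee


def previous_and_next(some_iterable):
    # Same helper as earlier in the notebook: yields (previous, item, next) triples
    prevs, items, nexts = tee(some_iterable, 3)
    prevs = chain([None], prevs)
    nexts = chain(islice(nexts, 1, None), [None])
    return zip(prevs, items, nexts)


def vertex_id(element, id_key):
    # Build "<table prefix>::<first id column value>", as the parsing cells do
    return element[id_key]["prefix"] + "::" + element[id_key]["idCols"][0]


def vertex_style(element, vid, vertex_label, highlight_claim):
    """Return (display label, colour) using the same rules as the parsing cells."""
    key = vid.split("::")[1]
    if vertex_label == "DEMO.POLICYHOLDER":
        return "Policyholder " + key, "green"
    if vertex_label == "DEMO.SERVICE":
        return element["SERVICE_NAME"][0], "orange"
    if vertex_label == "DEMO.INCHARGE":
        return "Dr. " + element["LNAME"][0] + " - " + element["SERVICE_ID"][0], "grey"
    if vertex_label == "DEMO.CLAIM":
        return "Claim " + key, ("red" if key == highlight_claim else "blue")
    return vertex_label, "blue"


def paths_to_vis(paths, id_key="id", label_key="label", highlight_claim="C4377"):
    """Convert gremlin paths into vis-network {"nodes": [...], "edges": [...]} data."""
    nodes, edges = [], []
    seen_nodes, seen_edges = set(), set()
    for path in paths:
        for previous, item, nxt in previous_and_next(path):
            # Skip the starting claim, as the parsing cells do
            if previous is None:
                continue
            vertex_label = item[label_key]
            # Patient vertices are not drawn
            if vertex_label == "DEMO.PATIENT":
                continue
            vid = vertex_id(item, id_key)
            label, colour = vertex_style(item, vid, vertex_label, highlight_claim)
            # Edge to the next hop, titled with this vertex's display label
            if nxt is not None:
                edge = (vid, vertex_id(nxt, id_key), label)
                if edge not in seen_edges:
                    seen_edges.add(edge)
                    edges.append({"from": edge[0], "to": edge[1], "title": edge[2]})
            if vid not in seen_nodes:
                seen_nodes.add(vid)
                nodes.append({"id": vid, "label": label, "group": vertex_label, "color": colour})
    return {"nodes": nodes, "edges": edges}
```

In the notebook, `paths_to_vis(other_claims_for_policy_holder, id_key=T.id, label_key=T.label)` should return the same `nodes` and `edges` lists that the cell above writes to JSON. Deduplicating through sets of IDs avoids rescanning the output lists with `not in` for every path element.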

In [None]:
%%html
<!-- Create a div that will contain the visualization -->
<div id="other_claims_for_policy_holder">Visualization is loading...</div>
<script type="text/javascript">
// load the visualization library
require.config({
  paths: {
    Vis: "https://unpkg.com/vis-network@7.6.2/standalone/umd/vis-network.min"
  }
});
require(["Vis"], function(vis) {
  // now we will fetch the json from the previous cell
  fetch('other_claims_for_policy_holder.json').then(r => r.json()).then(graph => {
    // get a reference to the container we created to hold the visualization
    var container = document.getElementById('other_claims_for_policy_holder');
    // set our visualization data
    var data = {
      nodes: graph.nodes,
      edges: graph.edges
    };
    // define some default options for the visualization
    // See https://visjs.github.io/vis-network/docs/network/ for all available options
    var options = {
      width: '968px',
      height: '800px',
      nodes: {
        shape: 'dot',
      },
      interaction: {
        hover: true,
      },
    };
    new vis.Network(container, data, options);
  })
})
</script>

In this graph our policyholder, in green, is connected to the claims they have filed, in blue (claim C4377, the one we are investigating, is in red). The claims are connected to doctors, in grey, and the doctors are connected to service providers, in orange.

Now let's run a similar query that uses our view.

**Note:** Make sure you have modified the graph overlay file to reference the view and restarted the gremlin server before continuing

In [None]:
"""
This query uses our `g` object to perform the following traversal:
1. Get the vertices with the label 'DEMO.CLAIM'
2. Filter to the vertex with the 'CLAIM_ID' we are interested in
3. Find out who the policyholder is
4. Find out what other claims they have filed
5. For each claim, find the service providers that worked on it (via the DEMO.SERVICE_OF_CLAIM edge backed by our view),
   as well as the insured person on the claim
6. Return the complete path from start to end
7. Return all properties for each hop in the path
8. Convert the gremlin result set into a python list
"""
 
custom_view_query = g.V() \
.hasLabel('DEMO.CLAIM') \
.has('CLAIM_ID', 'C4377') \
.out('DEMO.POLICYHOLDER_OF_CLAIM') \
.in_('DEMO.POLICYHOLDER_OF_CLAIM') \
.union(__.out('DEMO.SERVICE_OF_CLAIM'), __.out('DEMO.INSURED_OF_CLAIM')) \
.path() \
.by(__.valueMap(True)) \
.toList()


In [None]:
# You can view the raw output by uncommenting the print statements below
#print(custom_view_query)
#print(custom_view_query[0])
#for i in range(len(custom_view_query[0])):
#    print("custom_view_query[0][" + str(i) + "] = " + str(custom_view_query[0][i]))
#    print("")

In [None]:
"""
This code is the same as the previous parsing cell, except that it no longer handles doctor
(DEMO.INCHARGE) vertices, because the view connects claims directly to service providers.

Note: vis-network uses the terms nodes and edges instead of vertices and edges; they are interchangeable.
They are defined as nodes here for vis-network.
"""

custom_view_claims_nodes = []
custom_view_claims_edges = []
# start looping through the list containing the results
for val in custom_view_query:
    # for every value in the results get the previous, item and next item
    for previous, item, nxt in previous_and_next(val):
        # If there is no previous value available then skip the iteration
        if previous is None:
            continue
        # grab the id and label for the vertex
        itemId = item[T.id]["prefix"] + "::" + item[T.id]["idCols"][0]
        label = item[T.label]
        colour = "blue"
        # if we are on the patient vertex then skip the iteration
        if label == "DEMO.PATIENT":
            continue
        # if we are on the policy holder then set a label
        # and set the colour of the vertex to green
        if label == "DEMO.POLICYHOLDER":
            label = "Policyholder " + itemId.split("::")[1]
            colour = "green"
        # if we are on a service vertex then set the label value to the
        # name of the service and the colour to orange
        if label == "DEMO.SERVICE":
            label = item["SERVICE_NAME"][0]
            colour = "orange"
        # if we are on the claim vertex then set the label to be the claim id
        if label == "DEMO.CLAIM":
            label = "Claim " + itemId.split("::")[1]
            # and if we are on the claim we are investigating set the colour to red
            if label == "Claim C4377":
                colour = "red"
        # add our edges
        if nxt is not None:
            nxtId = nxt[T.id]["prefix"] + "::" + nxt[T.id]["idCols"][0]
            link = {"from": itemId, "to": nxtId, "title": label}
            if link not in custom_view_claims_edges:
                custom_view_claims_edges.append(link)
        # add our vertices
        node = {"id": itemId, "label": label, "group": item[T.label], "color": colour}
        if node not in custom_view_claims_nodes:
            custom_view_claims_nodes.append(node)
# dump the edges and nodes to json
with open('custom_view_query.json', 'w') as f:
    json.dump(
        {
            'nodes': custom_view_claims_nodes,
            'edges': custom_view_claims_edges
        },
        f,
        indent=4
    )

In [None]:
%%html
<!-- Create a div that will contain the visualization -->
<div id="custom_view_query">Visualization is loading...</div>
<script type="text/javascript">
// load the visualization library
require.config({
  paths: {
    Vis: "https://unpkg.com/vis-network@7.6.2/standalone/umd/vis-network.min"
  }
});
require(["Vis"], function(vis) {
  // now we will fetch the json from the previous cell
  fetch('custom_view_query.json').then(r => r.json()).then(graph => {
    // get a reference to the container we created to hold the visualization
    var container = document.getElementById('custom_view_query');
    // set our visualization data
    var data = {
      nodes: graph.nodes,
      edges: graph.edges
    };
    // define some default options for the visualization
    // See https://visjs.github.io/vis-network/docs/network/ for all available options
    var options = {
      width: '968px',
      height: '800px',
      nodes: {
        shape: 'dot',
      },
      interaction: {
        hover: true,
      },
    };
    new vis.Network(container, data, options);
  })
})
</script>

Our claims, in blue, are now directly connected to our service providers, in orange, without needing to go through the doctors.
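To check this without eyeballing the two visualizations, you could compare the node groups (the vertex labels) in the two JSON files this notebook writes. A minimal sketch; `count_groups` and `compare_graphs` are hypothetical helpers.

```python
import json
from collections import Counter


def count_groups(path):
    """Count the nodes in a vis-network JSON file by their "group" (the vertex label)."""
    with open(path) as f:
        graph = json.load(f)
    return Counter(node["group"] for node in graph["nodes"])


def compare_graphs(before_path, after_path):
    """Return {group: (count_before, count_after)} for every group in either file."""
    before, after = count_groups(before_path), count_groups(after_path)
    # Counter returns 0 for groups missing from one of the files
    return {grp: (before[grp], after[grp]) for grp in sorted(set(before) | set(after))}
```

For example, `compare_graphs('other_claims_for_policy_holder.json', 'custom_view_query.json')` returns a `(before, after)` count per label; the `DEMO.INCHARGE` count for the second file should be zero.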

We were able to achieve this in two simple steps:
1. Create a view in Db2 that contains the links we want
2. Update the graph overlay file to add a new edge table reference using the view

Neither step requires any complex logic to create or maintain.

Because the edge table is a regular Db2 view, its rows always reflect the current contents of the underlying tables, so the new edges stay up to date as data is added without any additional maintenance.