# Using Dgraph to Explore the Offshore Leaks Dataset
05.02.14:53

## Analyzing the Raw Data

#### [Overview of the Raw Data](https://github.com/dgraph-io/vlg/blob/main/notes/1.%20Raw%20Data%20Analysis.md)

#### Data Normalization and Sanitizing

* [CSVKit to the rescue, an example](https://github.com/dgraph-io/vlg/blob/main/notes/csvstat/entities.txt)
* [50 ~~First~~ Bad Dates](https://github.com/dgraph-io/vlg/blob/ba577e1b65ae4dfa93a002ccd2009fea1113578b/tools/model/date_time.go#L19)
* [Non-null nulls](https://github.com/dgraph-io/vlg/blob/ba577e1b65ae4dfa93a002ccd2009fea1113578b/tools/model/entity.go#LL29C2-L29C2)

#### Storing Clean Data Transiently

* [Use Badger to temporarily store results (aka the export before the import)](https://github.com/dgraph-io/vlg/blob/ba577e1b65ae4dfa93a002ccd2009fea1113578b/tools/preload/main.go#L22)

## Dgraph Schemas Overview

Dgraph supports two types of schemas: DQL and GraphQL.

* DQL is a predicate-first schema language. It supports features not yet available in GraphQL schemas, such as multilingual predicates and facets (data stored on the edges between nodes)
* GraphQL is a type-first schema language that supports only spec-compliant elements such as types, unions, and interfaces
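By default, Dgraph stores each GraphQL field as a DQL predicate named `Type.field` (for fields declared on an interface such as `Record`, the interface name is used), which is why the DQL queries later in this notebook refer to predicates like `Record.name` and `Record.nodeID`. A tiny sketch of that naming convention; the helper names are hypothetical, and fields remapped with `@dgraph(pred: ...)` do not follow it:

```python
def dql_predicate(type_name: str, field: str) -> str:
    """Default DQL predicate name for a GraphQL field: Type.field."""
    return f"{type_name}.{field}"


def dql_selection(type_name: str, aliases: dict) -> str:
    """Build DQL selection lines that alias Type.field predicates back to short names.

    `aliases` maps the alias to emit -> GraphQL field name.
    """
    lines = [f"{alias}: {dql_predicate(type_name, field)}" for alias, field in aliases.items()]
    return "\n".join(lines)


print(dql_selection("Record", {"id": "nodeID", "name": "name"}))
```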

## Building a GraphQL Schema for the Offshore Leaks Dataset

Questions first, schema second, [description](https://github.com/dgraph-io/vlg/blob/main/notes/2.%20Schema%20Design.md#schema-design-1)

* full term, text, trigram search for names, [schema ref](https://github.com/dgraph-io/vlg/blob/ba577e1b65ae4dfa93a002ccd2009fea1113578b/schema/schema.graphql#L28)
* restrict by source (Panama Papers, Paradise Papers), [schema ref](https://github.com/dgraph-io/vlg/blob/ba577e1b65ae4dfa93a002ccd2009fea1113578b/schema/schema.graphql#L4)
* query by geo-coordinates, [schema ref](https://github.com/dgraph-io/vlg/blob/ba577e1b65ae4dfa93a002ccd2009fea1113578b/schema/schema.graphql#L162)

[The Schema](vlg/schema/schema.graphql)

In [None]:
# Issue command to apply the Offshore Leaks schema
!curl --data-binary '@./vlg/schema/schema.graphql' http://localhost:8080/admin/schema

## Importing RDF Data into Dgraph

1. Geocode US address lines using the US Census Address REST API (about 17,000 addresses, or ~71%, were geocoded successfully)
2. Export the transient Badger data to RDF format using tools written in Go. A subset was exported for today's workshop.

In [None]:
# Invoke the Dgraph Live Loader to populate the graph
!dgraph live -f ./vlg/rdf-subset/data.rdf.gz

Dgraph supports two import formats, JSON and RDF, and two loading mechanisms: the Live Loader, which loads data into a running cluster, and the Bulk Loader, which builds the initial data set offline for a new cluster.
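For reference, RDF input is a sequence of N-Quad lines of the form `<subject> <predicate> <object> .`, where blank nodes such as `_:n123` let the loader assign UIDs. The sketch below renders one record. It assumes the `Record.*` predicate names used in the DQL queries in this notebook, and it assumes that a node's `dgraph.type` should list both its concrete type and the `Record` interface. `record_to_nquads` is a hypothetical helper, not the Go export tooling linked above.

```python
import json
from typing import List, Sequence


def rdf_literal(value: str) -> str:
    # json.dumps yields a double-quoted string using \" \\ \n and \uXXXX escapes,
    # which are also legal inside N-Quad string literals
    return json.dumps(value)


def record_to_nquads(node_id: str, name: str, types: Sequence[str]) -> List[str]:
    """Render one record as N-Quad lines; the blank node lets the loader assign a UID."""
    subject = f"_:n{node_id}"
    lines = [f"{subject} <dgraph.type> {rdf_literal(t)} ." for t in types]
    lines.append(f"{subject} <Record.nodeID> {rdf_literal(node_id)} .")
    lines.append(f"{subject} <Record.name> {rdf_literal(name)} .")
    return lines


# hypothetical example record
for line in record_to_nquads("236724", "Example Holdings Ltd", ["Entity", "Record"]):
    print(line)
```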

----- time check t+25m ------

### Imports and Other Housekeeping

In [None]:
import os
import json
import pandas as pd
import multiprocessing

from utils import *

# the host or IP addr where your Dgraph alpha service is running
dgraph_addr = "localhost"

# load API keys, etc from .env file
from dotenv import load_dotenv

if not load_dotenv():
    display(warning("No .env file found, some cells may not render correctly"))
    

## Dgraph Clients, Queries and Mutations Overview

### GraphQL
* Dgraph autogenerates a GraphQL API whenever the schema is updated
* Standards-based: works with dozens of IDEs, online tools, and clients

For example, [Apollo Studio](https://studio.apollographql.com/sandbox/explorer?endpoint=http://localhost:8080/graphql)
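Because the endpoint is plain GraphQL over HTTP, any client that can POST JSON will work. The following is a minimal standard-library sketch. `build_graphql_request` is a hypothetical helper that only builds the request. Passing the result to `urllib.request.urlopen` would send it to the `/graphql` endpoint of a running Dgraph alpha.

```python
import json
import urllib.request
from typing import Optional


def build_graphql_request(endpoint: str, query: str,
                          variables: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a standard GraphQL-over-HTTP POST request."""
    body = {"query": query}
    if variables:
        body["variables"] = variables
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_graphql_request("http://localhost:8080/graphql", "{ aggregateRecord { count } }")
# with urllib.request.urlopen(req) as resp:   # requires a running Dgraph alpha
#     print(json.load(resp))
```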

### DQL
* Dgraph's native query language
* Clients for most programming languages
* Ratel: [https://play.dgraph.io](https://play.dgraph.io?latest)

In [None]:
# Create and connect a Dgraph DQL Client
import pydgraph # official Dgraph python client

client_stub = pydgraph.DgraphClientStub(addr='{}:9080'.format(dgraph_addr), options=[('grpc.max_receive_message_length', 1024*1024*1024)])
client = pydgraph.DgraphClient(client_stub)
print("pydgraph client, check version:", client.check_version())

# GraphQL client and admin client
from python_graphql_client import GraphqlClient #  popular python GraphQL client

gql_client = GraphqlClient(endpoint="http://{}:8080/graphql".format(dgraph_addr))
gql_admin_client = GraphqlClient(endpoint="http://{}:8080/admin".format(dgraph_addr))
data = gql_admin_client.execute(query="{health {status}}")
print("generic graphql client, check cluster health:", data['data']['health'][0])

### Issue a GraphQL Query

In [None]:
%%time

# Issue a GraphQL query to get record counts per source and in total
query = """
query {
    paradisePapers: aggregateRecord(filter: { sourceID: { eq: ParadisePapers } }) {count}
    panamaPapers: aggregateRecord(filter: { sourceID: { eq: PanamaPapers } }) {count}
    bahamasLeaks: aggregateRecord(filter: { sourceID: { eq: BahamasLeaks } }) {count}
    offshoreLeaks: aggregateRecord(filter: { sourceID: { eq: OffshoreLeaks } }) {count}
    pandoraPapers: aggregateRecord(filter: { sourceID: { eq: PandoraPapers } }) {count}
    total: aggregateRecord {count}
}
"""
data = gql_client.execute(query=query)
data.pop('extensions', None)  # drop the Dgraph metrics attributes, if present
print(json.dumps(data, indent=2))
print("----------")

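### Full-text Query
Full-text search applies stemming and removes stop words, so a search term can also match other forms of the same word (for example, "live" and "lives").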
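Conceptually, full-text matching tokenizes both the stored value and the search string, drops stop words, and reduces each word to a stem before comparing. The toy sketch below illustrates the idea with a tiny, hypothetical stop-word list and a naive suffix stripper. Dgraph's actual fulltext tokenizer uses proper language-specific stemmers, so real results will differ.

```python
import re
from typing import Set

STOP_WORDS = {"a", "an", "and", "of", "the"}  # tiny illustrative list, not Dgraph's


def naive_stem(word: str) -> str:
    # strip a trailing plural "s", then a trailing "ing"; real stemmers are far more careful
    if word.endswith("s") and len(word) > 3:
        word = word[:-1]
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
    return word


def tokens(text: str) -> Set[str]:
    """Lowercase, split on non-alphanumerics, drop stop words, and stem."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {naive_stem(w) for w in words if w not in STOP_WORDS}


def any_of_text(value: str, search: str) -> bool:
    """True if any stem of `search` also appears in `value` (in the spirit of anyoftext)."""
    return bool(tokens(value) & tokens(search))


print(any_of_text("ACME Holdings Ltd", "holding"))  # both sides reduce to the stem "hold"
```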

In [None]:
%%time

ft_query = """
query ($filter: EntityFilter) {
  queryEntity(filter: $filter) {
    name
  }
}
"""
variables = {
    "filter": {
        "name": {
            "anyoftext": "live"
        }
    }
}
data = gql_client.execute(query=ft_query, variables=variables)
for res in data['data']['queryEntity']:
    print(res['name'])
print("-----------")        

### Regular Expression Query

In [None]:
%%time

# query for names containing either "ltd" or "limited", ignoring case (first 10 matches)
regex_query = """
query {
  queryRecord(filter: { name: { regexp: "/ltd|limited/i" } }, first: 10) {
    id: nodeID
    type: __typename
    name
  }
}
"""
data = gql_client.execute(query=regex_query)
for res in data['data']['queryRecord']:
    print(res['name'])
print("-------------")
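Dgraph's `regexp` function uses Go (RE2) regular-expression syntax. For simple patterns like these, Python's `re` module behaves the same way, which makes it easy to sanity-check a pattern locally before sending it. For example, a pattern such as `.LTD|Limited*` requires some character before "LTD" and applies `*` only to the final "d", so it misses a name that begins with "LTD", while a plain alternation does not. The sample names below are made up.

```python
import re

loose = re.compile(r".LTD|Limited*", re.IGNORECASE)
strict = re.compile(r"ltd|limited", re.IGNORECASE)

for name in ["LTD Holdings", "Acme Ltd", "Example Limited"]:
    print(f"{name!r}: loose={bool(loose.search(name))}, strict={bool(strict.search(name))}")
```

Both patterns still match substrings inside longer words (the "ltd" in "Saltdean", for instance). RE2 supports `\b` word boundaries if that matters, though the backslash then has to survive both Python and GraphQL string escaping.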


### Helper Functions for Extracting Nodes and Edges from Query Results

In [None]:
def update_node(nodes: dict, key: str, value: dict):
    """Create or merge the node `key`, copying over its scalar (non-list) fields."""
    if key not in nodes:
        nodes[key] = {}
    for k, v in value.items():
        if not isinstance(v, list):
            nodes[key][k] = v

def extract_dict(nodes: dict, edges: list, data: dict, parent: dict = None, name: str = None):
    """Recursively extract nodes and edges from a dict created from the result of a Dgraph query.

    Nodes (vertices) from the query must have an ``id`` field in order to be recognized
    as a node. Optionally, if a ``type`` field is present (either as a list or a string),
    the type will be applied to the node. Nodes encountered in more than one place in the
    query result will be merged.

    Edges are automatically extracted from the query result. If a node has an id and a parent,
    a relationship is made. The relationship predicate name is assigned as the edge type.
    """
    if isinstance(data, dict):
        # ignore the Dgraph 'extensions' field
        if name == "extensions":
            return
        # id is a special field, we use it to identify nodes
        if "id" in data:
            update_node(nodes, data['id'], data)
            # if we have a parent, add an edge
            if parent and "id" in parent:
                edges.append(
                    {"src": parent["id"], "dst": data["id"], "type": name})
        # recurse into the dict
        for key, value in data.items():
            if isinstance(value, dict):
                extract_dict(nodes, edges, value, data, key)
            elif isinstance(value, list) and len(value) > 0:
                # if the list is named 'type' (e.g. DQL's dgraph.type), assign its first entry to the node
                if key == "type":
                    if "id" in data:
                        update_node(nodes, data["id"], {"type": value[0]})
                    continue
                # else, recurse into the list if it contains dicts
                if isinstance(value[0], dict):
                    for v in value:
                        extract_dict(nodes, edges, v, data, key)
                # if the list is of scalars, assign it to the node (only when this dict is a node)
                elif "id" in data:
                    nodes[data['id']][key] = value



## Load Nodes and Edges from the Graph

In [None]:
import threading
import concurrent.futures
import time

# Query to extract all records and edges from the graph
recordQuery = """
query ($queryRecordOffset: Int, $queryRecordFirst: Int) {
  queryRecord(offset: $queryRecordOffset, first: $queryRecordFirst) {
    id: nodeID
    type: __typename
    name
    sourceID
    hasAddress {
      id: nodeID
    }
    hasOfficer {
      id: nodeID
    }
    hasIntermediary {
      id: nodeID
    }
    connectedTo {
      id: nodeID
    }
  }
}
"""

def query(offset, first):
    variables = {
        "queryRecordOffset": offset,
        "queryRecordFirst": first
    }
    data = gql_client.execute(query=recordQuery, variables=variables)
    return data, offset

def load_all_nodes_and_edges(nodes: dict, edges: list):
    count_query = """
    query {
      total: aggregateRecord {
        count
      }
    }
    """
    data = gql_client.execute(query=count_query)
    totalRecords = data['data']['total']['count']
    start = time.time()

    # use half the available cores, but always at least one worker
    workers = max(1, multiprocessing.cpu_count() // 2)
    print("Loading nodes and edges using", workers, "threads...")
    step = 25000
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor:
        # the range must run to totalRecords (not totalRecords-1) so the last page is not skipped
        f = [executor.submit(query, i, step) for i in range(0, totalRecords, step)]
        # as_completed yields results in this (main) thread, so no lock is needed here
        for r in concurrent.futures.as_completed(f):
            data, offset = r.result()
            print("retrieved", offset+1, "thru", offset+step, "record count:", len(data['data']['queryRecord']))
            extract_dict(nodes, edges, data)

    end = time.time()
    print('nodes and edges loaded in', end - start, 'seconds')
    print('node count', len(nodes))
    print('edges count', len(edges))

In [None]:
%%time

nodes = {}
edges = []
load_all_nodes_and_edges(nodes, edges)
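The batching above comes down to splitting `[0, total)` into `(offset, first)` pages. The small sketch below makes the boundary explicit; `page_bounds` is a hypothetical helper. The range has to run to `total`, not `total - 1`, or a final page holding a single record is skipped.

```python
from typing import List, Tuple


def page_bounds(total: int, step: int) -> List[Tuple[int, int]]:
    """Return (offset, first) pairs that cover records 0..total-1 exactly once."""
    if step <= 0:
        raise ValueError("step must be positive")
    return [(offset, min(step, total - offset)) for offset in range(0, total, step)]


print(page_bounds(50001, 25000))
```

Trimming `first` on the last page is not required by Dgraph (asking for more records than remain is fine), but it keeps the per-page counts exact.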

---- time check T+45m ----

## Visualize the Graph

In [None]:
nodes_df = pd.DataFrame.from_dict(nodes, orient = 'index')
nodes_df.sample(5)

In [None]:
edges_df = pd.DataFrame(edges)
edges_df.sample(5)

In [None]:
import graphistry

graphistry_login()

g = graphistry.nodes(nodes_df, 'id').edges(edges_df, 'src', 'dst').bind(point_title='name')
g2 = g.encode_point_color('type', categorical_mapping={'Entity': '#3bdbdb', 'Intermediary': '#E99233', 'Officer': '#6DB364', 'Address': '#F7D82F'}, default_mapping='gray')
g3 = g2.encode_point_icon('type', shape="circle", #clip excess
  categorical_mapping={
      'Entity': 'fa-building',
      'Intermediary': 'fa-handshake-o',
      'Address': 'fa-map-marker',
      'Officer': 'fa-user'
  },
  default_mapping="question")

g3.plot()

## Graph Analysis with NetworkX

In [None]:
import networkx as nx

G = nx.from_pandas_edgelist(
    edges_df,
    source="src",
    target="dst",
    edge_attr="type",  # keep the edge type as an attribute (edge_key only applies to MultiGraphs)
    create_using=nx.DiGraph()
)
print(G)
print("Network density:", "%.8f" % nx.density(G))
try:
    print("Diameter:", nx.diameter(G))
except nx.NetworkXError as e:
    print("Error getting diameter:", e)

In [None]:
#find top 10 nodes by degree
sorted_deg = sorted(G.degree, key=lambda x: x[1], reverse=True)
for n in range(10):
    nodeID = sorted_deg[n][0]
    print(n+1, nodeID, nodes[nodeID]['name'], ', type:', nodes[nodeID]['type'], ", degrees:", sorted_deg[n][1])

In [None]:
%%time

# find top 10 nodes by PageRank
pageranks = nx.pagerank(G)
sorted_pr = sorted(pageranks.items(), key=lambda x: x[1], reverse=True)
for n in range(10):
    nodeID = sorted_pr[n][0]
    print(n+1, nodeID, nodes[nodeID]['name'], ', type:', nodes[nodeID]['type'], ", pagerank:", '{:.8f}'.format(pageranks[nodeID]))
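`nx.pagerank` computes PageRank by power iteration. For intuition, here is a minimal pure-Python sketch of the same idea. It assumes NetworkX's defaults of a 0.85 damping factor and uniform redistribution of rank from nodes with no outgoing edges. It is not NetworkX's implementation: it skips weights and personalization, and it counts repeated edges as separate links.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def pagerank(edges: Iterable[Tuple[Hashable, Hashable]], alpha: float = 0.85,
             max_iter: int = 100, tol: float = 1e-10) -> Dict[Hashable, float]:
    """Power-iteration PageRank over a directed edge list."""
    out_links: Dict[Hashable, List[Hashable]] = {}
    for src, dst in edges:
        out_links.setdefault(src, []).append(dst)
        out_links.setdefault(dst, [])
    nodes = list(out_links)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # rank held by dangling nodes (no out-links) is spread evenly over all nodes
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - alpha) / n + alpha * dangling / n for v in nodes}
        # each node passes a damped share of its rank along every out-link
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                new_rank[dst] += alpha * rank[src] / len(targets)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < n * tol:
            break
    return rank


print(pagerank([("a", "b"), ("b", "c"), ("c", "a")]))  # a symmetric cycle: equal ranks
```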

In [None]:
# distribution of record PageRank scores
sorted_pr_df = pd.DataFrame(sorted_pr)
ax = sorted_pr_df[1].hist(bins=8, range=[0.0, 0.005])
ax.set_ylabel("Records")
ax.set_xlabel("PageRank")
ax.set_title("Distribution of Node PageRank")

In [None]:
%%time

# extract the ids of the top 150 records from the PageRank-sorted list
top_nodes_by_pagerank = [node_id for node_id, _ in sorted_pr[:150]]

recurse_query_for_pr = """
{
    q(func: eq(Record.nodeID, {LIST})) @recurse(depth: 8) {
        # predicates to return for each recurse
        id: Record.nodeID
        name: Record.name
        type: <dgraph.type>
        # predicates to loop through
        hasAddress: Record.hasAddress
        addressFor: Record.addressFor
        hasOfficer: Record.hasOfficer
        officerFor: Record.officerFor
        hasIntermediary: Record.hasIntermediary
        intermediaryFor: Record.intermediaryFor
        connectedTo: RecordRecord.connectedTo  
  }
}
"""
recurse_query_for_pr = recurse_query_for_pr.replace("{LIST}", json.dumps(top_nodes_by_pagerank))
res = client.txn(read_only=True).query(recurse_query_for_pr)
data = json.loads(res.json)


In [None]:
nodes = {}
edges = []
extract_dict(nodes, edges, data)

edges_df = pd.DataFrame(edges)
nodes_df = pd.DataFrame.from_dict(nodes, orient = 'index')
print("nodes count:", nodes_df.shape[0])
print("edges count:", edges_df.shape[0])

In [None]:
g4 = graphistry.nodes(nodes_df, 'id').edges(edges_df, 'src', 'dst').bind(point_title='name')
g5 = g4.encode_point_color('type', categorical_mapping={'Entity': '#3bdbdb', 'Intermediary': '#E99233', 'Officer': '#6DB364', 'Address': '#F7D82F'}, default_mapping='gray')
g6 = g5.encode_point_icon('type', shape="circle", #clip excess
  categorical_mapping={
      'Entity': 'fa-building',
      'Intermediary': 'fa-handshake-o',
      'Address': 'fa-map-marker',
      'Officer': 'fa-user'
  },
  default_mapping="question")
g6.plot()

## Path Discovery

For instance, is there any connection between [The Duchy of Lancaster](https://www.google.com/maps/place/The+Duchy+of+Lancaster/@51.5103577,-0.1185369,3a,75y,90t/data=!3m8!1e2!3m6!1sAF1QipN1qKb8oPQi8V2B0p41M2VEw_87O3e9kBQDY3_4!2e10!3e12!6shttps:%2F%2Flh5.googleusercontent.com%2Fp%2FAF1QipN1qKb8oPQi8V2B0p41M2VEw_87O3e9kBQDY3_4%3Dw124-h86-k-no!7i2750!8i1894!4m7!3m6!1s0x487604ca1a382113:0x9abd9f1000dcce88!8m2!3d51.5103081!4d-0.1186846!10e5!16s%2Fg%2F1hc90lp5r) and [this address](https://goo.gl/maps/p34SchektxYGDJuCA) in Bermuda?
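Dgraph's `shortest` block follows only the predicates listed inside it, in the direction given, and without facet weights every hop costs 1. Conceptually, that makes it a breadth-first search. The sketch below shows the idea over an in-memory adjacency list. The graph, node names, and `shortest_path` helper are made up for illustration.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, List[Tuple[str, str]]]  # node -> [(predicate, neighbor), ...]


def shortest_path(graph: Graph, start: str, goal: str,
                  predicates: Set[str]) -> Optional[List[str]]:
    """Breadth-first search that only follows edges whose predicate is in `predicates`."""
    parents: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # walk the parent links back to the start, then reverse
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for predicate, neighbor in graph.get(node, []):
            if predicate in predicates and neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


toy = {
    "officer1": [("officerFor", "entity1")],
    "entity1": [("hasAddress", "address1"), ("connectedTo", "entity2")],
    "entity2": [("hasAddress", "address1")],
}
print(shortest_path(toy, "officer1", "address1", {"officerFor", "hasAddress", "connectedTo"}))
```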


In [None]:
%%time

shortest_dql_query = """
query shortest($from: string, $to: string) {

  FROM as var(func: eq(Record.name, $from))
  TO as var(func: eq(Record.name, $to))
    
  P as shortest(from: uid(FROM), to: uid(TO)) {
    Record.hasAddress
    Record.addressFor
    Record.hasIntermediary
    Record.intermediaryFor
    Record.hasOfficer
    Record.officerFor
    Record.connectedTo
  }
    
  path(func: uid(P)) {
   uid
   Record.nodeID
   Record.name
   <dgraph.type>
  }
}
"""

from_node = 'The Duchy of Lancaster'
to_node = 'Suite 1090; 48 Par La Ville Road; Hamilton HM 11; Bermuda'

variables = {'$from': from_node, '$to': to_node}

res = client.txn(read_only=True).query(query=shortest_dql_query, variables=variables)
paths = json.loads(res.json)
print(json.dumps(paths, indent=2))
    
for path in paths['path']:
    print(path)


In [None]:
import ipycytoscape

graph_data = {"nodes": [], "edges": []}
# find the nodes
for idx, path in enumerate(paths['path']):
    entity_type = path['dgraph.type'][0]
    #graph_data['nodes'].append({"data": {"id": path['Record.nodeID'], "label": path['Record.name'], "tooltip": "<div style='background-color:white'>foo</div>"}, "classes": entity_type})
    graph_data['nodes'].append({"data": {"id": path['Record.nodeID'], "label": path['Record.name'], "type": entity_type}, "classes": entity_type})
    if idx < len(paths['path'])-1:
        graph_data['edges'].append({"data": {"uid": path['uid'], "source": path['Record.nodeID'], "target": paths['path'][idx+1]['Record.nodeID']}})

def find_edge_type(d: dict):
    uid = d['uid']
    for key, entry in d.items():
        if isinstance(entry, dict):
            for edge in graph_data['edges']:
                if edge['data']['uid'] == uid:
                    edge['data']['label'] = key[len('Record.'):]  # strip the "Record." predicate prefix
            find_edge_type(entry)
 
        
# recursively find the edge types
find_edge_type(paths['_path_'][0])
                           
print(graph_data)

In [None]:
cyto_styles = [
    {'selector': 'node[type = "Address"]', 'style': {
        'font-family': 'helvetica',
        'font-size': '10px',
        'label': 'data(label)',
        'background-color': '#F7D82F',
        'background-image': 'https://raw.githubusercontent.com/FortAwesome/Font-Awesome/master/svgs/solid/map-marker.svg',
        "background-width": "50%",
        "background-height": "50%"}},
    {'selector': 'node[type = "Entity"]', 'style': {
        'font-family': 'helvetica',
        'font-size': '10px',
        'label': 'data(label)',
        'background-color': '#3bdbdb',
        'background-image': 'https://raw.githubusercontent.com/FortAwesome/Font-Awesome/master/svgs/solid/building.svg',
        "background-width": "50%",
        "background-height": "50%"}},
    {'selector': 'node[type = "Intermediary"]', 'style': {
        'font-family': 'helvetica',
        'font-size': '10px',
        'label': 'data(label)',
        'background-color': '#E99233',
        'background-image': 'https://raw.githubusercontent.com/FortAwesome/Font-Awesome/master/svgs/solid/handshake.svg',
        "background-width": "50%",
        "background-height": "50%"}},
    {'selector': 'node[type = "Officer"]', 'style': {
        'font-family': 'helvetica',
        'font-size': '10px',
        'label': 'data(label)',
        'background-color': '#6DB364',
        'background-image': 'https://raw.githubusercontent.com/FortAwesome/Font-Awesome/master/svgs/solid/user.svg',
        "background-width": "50%",
        "background-height": "50%"}},
    {'selector': 'node[type = "Other"]', 'style': {
        'font-family': 'helvetica',
        'font-size': '10px',
        'label': 'data(label)',
        'background-color': '#999999'}},
    {'selector': 'node.flagged','style': {
        'border-color': 'red',
        'border-width': '4px'}},    
    {'selector': 'node:parent',
        'css': {
            'background-opacity': 0.333
        }
    },
    {'selector': 'edge', 'style': {
        'width': 3,
        'font-size': '9px',
        'line-color': '#9dbaea',
        'target-arrow-shape': 'triangle',
        'target-arrow-color': '#9dbaea',
        'curve-style': 'bezier',
        'label': 'data(label)'
    }
}]

cytoscapeobj = ipycytoscape.CytoscapeWidget()
cytoscapeobj.graph.add_graph_from_json(graph_data)
cytoscapeobj.set_layout(name='cola', nodeSpacing=20, edgeLengthVal=10)
cytoscapeobj.set_style(cyto_styles)
#display
cytoscapeobj

---- time check T+65m ----

### Search via Geo-coordinates

* Dgraph supports a `geo` type for storing Points, Polygons, and MultiPolygons (OGC-style geometries expressed in GeoJSON notation)
* Dgraph implements a `geo` index that allows fast execution of the `near`, `within`, `contains`, and `intersects` functions in DQL queries
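A `near` filter keeps points within a given distance, in meters, of a coordinate. To sanity-check results locally, the sketch below uses the haversine great-circle formula on a spherical Earth (mean radius 6,371 km). Dgraph's geo index works differently internally, so points right at the boundary may not agree exactly. The helper names are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; the spherical model is an approximation
METERS_PER_MILE = 1609.344


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def near(points: Iterable[Tuple[float, float]], lat: float, lon: float,
         meters: float) -> List[Tuple[float, float]]:
    """Keep the (lat, lon) points that lie within `meters` of (lat, lon)."""
    return [p for p in points if haversine_m(lat, lon, p[0], p[1]) <= meters]


syracuse = (43.088947, -76.154480)
los_angeles = (34.098907, -118.327759)
print(near([syracuse, los_angeles], syracuse[0], syracuse[1], 50 * METERS_PER_MILE))
```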

In [None]:
geo_query = """
query ($filter: AddressFilter) {
  queryAddress(filter: $filter) {
    nodeID
    name
    location {
      latitude
      longitude
    }
    addressFor {
      nodeID
      __typename
      name
    }
  }
}"""
variables = {
  "filter": {
    "has": "location"
  }
}

data = gql_client.execute(query=geo_query, variables=variables)

addresses_df = pd.json_normalize(data['data']['queryAddress'])

def extract_names(l):
    # join "Type: name" for every record that uses this address
    return ", ".join(entry['__typename'] + ": " + entry['name'] for entry in l)

addresses_df = addresses_df.rename(columns={"location.latitude": "lat", "location.longitude": "lon"})
addresses_df['addressFor'] = addresses_df['addressFor'].apply(lambda val: extract_names(val))
addresses_df.sample(5)


In [None]:
import bokeh.io
from bokeh.plotting import gmap
from bokeh.models import ColumnDataSource, GMapOptions
from bokeh.io import output_file, show
from bokeh.models import HoverTool
from bokeh.resources import INLINE
bokeh.io.output_notebook(INLINE)

lat = 39.116386
lng = -99.299591
google_map_options = GMapOptions(lat = lat, lng = lng, map_type = "hybrid", zoom = 4)

hover = HoverTool(
        tooltips = [
            ('address', '@name'),
            ('addressFor', '@addressFor'), 
        ]
    )

google_maps_key = os.getenv("GOOGLE_MAPS_KEY")
google_map = gmap(google_maps_key, google_map_options, title="US Addresses", 
                  tools=[hover, 'reset', 'wheel_zoom', 'pan'], width=1200, height=640)
source = ColumnDataSource(addresses_df)
google_map.square(x="lon", y="lat", size=8, fill_color="red", fill_alpha=0.7, source=source)
show(google_map)


In [None]:
%%time

# Query for addresses within 50 miles of a point

# syracuse ny
lat = 43.088947
lng = -76.154480
# los angeles
#lat = 34.098907
#lng = -118.327759


miles = 50
meters = miles * 1609.344  # 1 mile = 1609.344 meters
variables = {
  "filter": {
    "location": {
      "near": {
        "coordinate": {
          "latitude": lat,
          "longitude": lng
        },
        "distance": meters
      }
    }
  }
}

data = gql_client.execute(query=geo_query, variables=variables)

addresses_df = pd.json_normalize(data['data']['queryAddress'])

addresses_df = addresses_df.rename(columns={"location.latitude": "lat", "location.longitude": "lon"})
addresses_df['addressFor'] = addresses_df['addressFor'].apply(lambda val: extract_names(val))

google_map_options = GMapOptions(lat = lat, lng = lng, map_type = "hybrid", zoom = 10)

hover = HoverTool(
        tooltips = [
            ('address', '@name'),
            ('addressFor', '@addressFor'), 
        ]
    )

google_map = gmap(google_maps_key, google_map_options, title="Addresses near Syracuse NY", 
                  tools=[hover, 'reset', 'wheel_zoom', 'pan'], width=1200, height=640)
source = ColumnDataSource(addresses_df)
google_map.square(x="lon", y="lat", size=12, fill_color="red", fill_alpha=0.7, source=source)
show(google_map)

## Recursive Queries

In [None]:
def is_flagged(node):
    return 'flagged' in node and len(node['flagged']) > 0
                                 
def convert_to_cyto_objs(nodes, edges):
    graph_data = {"nodes": [], "edges": []}
    # find the nodes
    for node in nodes.values():
        entity_type = node['type']
        classes = ''
        if is_flagged(node):
            classes = 'flagged'
        graph_data['nodes'].append({"data": {"id": node['id'], "label": node['name'], "type": entity_type, "flagged": is_flagged(node)}, "classes": classes})
    for edge in edges:
        graph_data['edges'].append({"data": {"source": edge['src'], "target": edge['dst'], "label": edge['type']}})
    return graph_data


In [None]:
%%time

# find record by nodeID, recursively iterate five levels deep
recurse_query = """
{
    q(func: eq(Record.nodeID, ["236724"])) @recurse(depth: 5) {
        # predicates to return for each recurse
        id: Record.nodeID
        name: Record.name
        type: <dgraph.type>
        
        # predicates to loop through
        addressFor: Record.addressFor(first: 30)
        hasOfficer: Record.hasOfficer
        hasIntermediary: Record.hasIntermediary
        connectedTo: RecordRecord.connectedTo  
    }
}
"""

nodes = {}
edges = []

txn = client.txn(read_only=True)
try:
    res = txn.query(query=recurse_query)
    results = json.loads(res.json)
    extract_dict(nodes, edges, results)
finally:
    txn.discard()

recurse_viz = ipycytoscape.CytoscapeWidget()
recurse_viz.set_layout(name='cola', nodeSpacing=40, edgeLengthVal=10)
recurse_viz.set_style(cyto_styles)
cyto_obj = convert_to_cyto_objs(nodes, edges)
recurse_viz.graph.add_graph_from_json(cyto_obj)
#display
recurse_viz

## Mutating Dgraph

In [None]:
# Update the GraphQL schema with 'flagged' predicate
!curl --data-binary '@./schema-flagged.graphql' http://localhost:8080/admin/schema

In [None]:
email = "matthew.mcneely@gmail.com"

flagged_mutation = """
mutation ($input: UpdateRecordInput!) {
  updateRecord(input: $input) {
    numUids
    record {
      flagged
    }
  }
}
"""

def record_click(node):
    """ updates the Record with a flagged entry """
    nodeID = node['data']['id']
    variables = {
      "input": {
        "filter": {
          "nodeID": {
            "eq": nodeID
          }
        },
        "set": {
          "flagged": [email]
        }
      }
    }
    for n in recurse_viz.graph.nodes:
        if n.data['id'] == nodeID:
            n.classes = 'flagged'
    data = gql_client.execute(query=flagged_mutation, variables=variables)
    print(data)
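The `UpdateRecordInput` used above combines a `filter` with a `set` patch. Dgraph's generated update inputs also accept a `remove` patch, so un-flagging a record takes the same shape of request. `flag_patch` below is a hypothetical helper that only builds the variables dict. You would pass the result to `gql_client.execute` together with `flagged_mutation`.

```python
from typing import Any, Dict


def flag_patch(node_id: str, email: str, flag: bool = True) -> Dict[str, Any]:
    """Variables for updateRecord: add (set) or remove `email` in the flagged list."""
    op = "set" if flag else "remove"
    return {
        "input": {
            "filter": {"nodeID": {"eq": node_id}},
            op: {"flagged": [email]},
        }
    }


print(flag_patch("236724", "analyst@example.com", flag=False))
```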


In [None]:
recurse_query = """
{
	q(func: eq(Record.nodeID, "236724")) @recurse(depth: 5) {
        # predicates to return for each recurse
        id: Record.nodeID
        name: Record.name
        type: <dgraph.type>
        flagged: Record.flagged
        
        # predicates to loop through
        addressFor: Record.addressFor(first: 30)
        hasOfficer: Record.hasOfficer
        hasIntermediary: Record.hasIntermediary
        connectedTo: RecordRecord.connectedTo  
    }
}
"""

nodes = {}
edges = []

txn = client.txn(read_only=True)
try:
    res = txn.query(query=recurse_query)
    results = json.loads(res.json)
    extract_dict(nodes, edges, results)
finally:
    txn.discard()

recurse_viz = ipycytoscape.CytoscapeWidget()
recurse_viz.set_layout(name='cola', nodeSpacing=40, edgeLengthVal=10)
recurse_viz.set_style(cyto_styles)
cyto_obj = convert_to_cyto_objs(nodes, edges)
recurse_viz.graph.add_graph_from_json(cyto_obj)
recurse_viz.on('node', 'click', record_click)
#display
recurse_viz

In [None]:
# Locate all the flagged records in the graph and recurse their edges to a depth of 3

recurse_flagged_query = """
{
	q(func: has(Record.flagged)) @recurse(depth: 3) {
        # predicates to return for each recurse
        id: Record.nodeID
        name: Record.name
        type: <dgraph.type>
        flagged: Record.flagged
        
        # predicates to loop through
        addressFor: Record.addressFor(first:10)
        hasAddress: Record.hasAddress(first:10)
        officerFor: Record.officerFor(first:10)
        hasOfficer: Record.hasOfficer(first:10)
        intermediaryFor: Record.intermediaryFor(first:10)
        hasIntermediary: Record.hasIntermediary(first:10)
        connectedTo: RecordRecord.connectedTo(first:10)  
    }
}
"""

nodes = {}
edges = []

txn = client.txn(read_only=True)
try:
    res = txn.query(query=recurse_flagged_query)
    results = json.loads(res.json)
    extract_dict(nodes, edges, results)
finally:
    txn.discard()

flagged_recurse_viz = ipycytoscape.CytoscapeWidget()
flagged_recurse_viz.set_layout(name='cola', nodeSpacing=40, edgeLengthVal=10)
flagged_recurse_viz.set_style(cyto_styles)
cyto_obj = convert_to_cyto_objs(nodes, edges)
flagged_recurse_viz.graph.add_graph_from_json(cyto_obj)
flagged_recurse_viz.on('node', 'click', record_click)
#display
flagged_recurse_viz

---- time check T+80m ----