![arangodb](https://github.com/arangodb/interactive_tutorials/blob/master/notebooks/img/ArangoDB_logo.png?raw=1)

<a href="https://colab.research.google.com/github/arangodb/interactive_tutorials/blob/master/notebooks/ArangoDB_Graphistry_Visualization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**ArangoDB with Graphistry**

This notebook explores the fraud detection dataset in ArangoDB to show how quickly ArangoDB's graph support can be combined with Graphistry.

This tutorial shares two sample transforms:

* Visualize the full graph
* Visualize the result of a traversal query

Each fetches data from ArangoDB via python-arango, converts the result to pandas DataFrames, and plots it with Graphistry.

**Setup**

In [None]:
%%capture
!git clone -b oasis_connector --single-branch https://github.com/arangodb/interactive_tutorials.git
!git clone -b 2.0.0 --single-branch https://github.com/arangoml/networkx-adapter.git
!rsync -av networkx-adapter/examples/ ./ --exclude=.git
!rsync -av interactive_tutorials/ ./ --exclude=.git
!pip3 install adbnx_adapter==2.0.0
!pip3 install matplotlib
!pip3 install pyArango
!pip3 install --user graphistry
!pip install jsonlines

In [None]:
## The runtime must restart on Colab for graphistry to work
print("The runtime must restart due to graphistry")
exit()

In [None]:
## NOTE: Notebook will intentionally exit, continue running from this point.
## On Colab: Click this code block and then CTRL/CMD + F10 or Runtime > Run After
print("NOTE: Notebook will intentionally exit, continue running from this point.")

import json
import oasis
import pandas as pd
import graphistry

**Create a Temporary ArangoDB Instance**

In [None]:
# Request temporary instance from the managed ArangoDB Cloud Oasis.
con = oasis.getTempCredentials(tutorialName="graphistry")

# Connect to the db via the python-arango driver
python_arango_db_driver = oasis.connect_python_arango(con)

# (Alternative) Connect to the db via the pyArango driver
# pyarango_db_driver = oasis.connect(con)[con['dbName']]

print()
print("https://{}:{}".format(con["hostname"], con["port"]))
print("Username: " + con["username"])
print("Password: " + con["password"])
print("Database: " + con["dbName"])

The following cells remove the hyphens from the `Ssn` field of the customer data, because hyphens cause problems when the data is displayed with Graphistry later on.

In [None]:
!gunzip /content/networkx-adapter/examples/data/fraud_dump/customer_91ec1f9324753048c0096d036a694f86.data.json.gz

In [None]:
import jsonlines

data = []

with jsonlines.open("/content/networkx-adapter/examples/data/fraud_dump/customer_91ec1f9324753048c0096d036a694f86.data.json", "r") as reader:
  for obj in reader:
    obj['data']['Ssn'] = str(obj['data']['Ssn']).replace('-', '')
    data.append(obj)

with jsonlines.open("/content/networkx-adapter/examples/data/fraud_dump/customer_91ec1f9324753048c0096d036a694f86.data.json", "w") as writer:
  for obj in data:
    writer.write(obj)

In [None]:
!gzip /content/networkx-adapter/examples/data/fraud_dump/customer_91ec1f9324753048c0096d036a694f86.data.json 
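
To see what the rewrite above does to a single record, here is a small in-memory sketch that uses only the standard library. The sample record and its values are made up for illustration; only the `data` / `Ssn` layout is taken from the cells above, and `strip_ssn_hyphens` is a hypothetical helper name.

```python
import io
import json


def strip_ssn_hyphens(lines):
    """Yield each JSON-lines record with hyphens removed from data.Ssn."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        obj = json.loads(line)
        obj["data"]["Ssn"] = str(obj["data"]["Ssn"]).replace("-", "")
        yield obj


# Hypothetical record; the real dump carries more fields per customer.
sample = io.StringIO('{"type": 2300, "data": {"_key": "c1", "Ssn": "123-45-6789"}}\n')
cleaned = list(strip_ssn_hyphens(sample))
print(cleaned[0]["data"]["Ssn"])
```

Every other field in the record passes through unchanged, which is why the cells above can write the objects straight back to the same file.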

**Load the fraud detection dataset**

In [None]:
!chmod -R 755 ./tools
!./tools/arangorestore -c none --server.endpoint http+ssl://{con["hostname"]}:{con["port"]} --server.username {con["username"]} --server.database {con["dbName"]} --server.password {con["password"]} --default-replication-factor 3  --input-directory "/content/networkx-adapter/examples/data/fraud_dump"
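
Outside a notebook, the same restore command can be assembled as an argument list, which avoids shell quoting problems when the password contains spaces or special characters. This is only a sketch: `arangorestore_args` is a hypothetical helper, the credentials are placeholders, and the flags are copied from the cell above rather than taken from any other source.

```python
import shlex


def arangorestore_args(con, input_directory, replication_factor=3):
    """Build the arangorestore argument list from the credentials dict."""
    endpoint = "http+ssl://{}:{}".format(con["hostname"], con["port"])
    return [
        "./tools/arangorestore",
        "-c", "none",
        "--server.endpoint", endpoint,
        "--server.username", con["username"],
        "--server.database", con["dbName"],
        "--server.password", con["password"],
        "--default-replication-factor", str(replication_factor),
        "--input-directory", input_directory,
    ]


# Placeholder credentials for illustration only.
demo_con = {
    "hostname": "example.invalid",
    "port": 8529,
    "username": "user",
    "password": "p w",
    "dbName": "db",
}
args = arangorestore_args(demo_con, "/content/networkx-adapter/examples/data/fraud_dump")
print(shlex.join(args))  # shell-safe rendering, useful for logging
```

The list can then be handed to `subprocess.run(args, check=True)` so that a failed restore raises instead of passing silently.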


**Define the ArangoDB Named Graph**

In [None]:
edge_definitions = [
    {
        "edge_collection": "accountHolder",
        "from_vertex_collections": ["customer"],
        "to_vertex_collections": ["account"],
    },
    {
        "edge_collection": "transaction",
        "from_vertex_collections": ["account"],
        "to_vertex_collections": ["account"],
    },
]

name = "fraud-detection"
python_arango_db_driver.delete_graph(name, ignore_missing=True)
fraud_graph = python_arango_db_driver.create_graph(name, edge_definitions=edge_definitions)

print("Graph Setup done.")
print(fraud_graph)
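
The edge definitions also determine which vertex collections the graph spans. As a quick check, the following sketch derives that set from the definitions alone, without touching the database; `vertex_collections_of` is a hypothetical helper, not part of python-arango.

```python
def vertex_collections_of(edge_definitions):
    """Return the sorted names of all vertex collections the edges refer to."""
    names = set()
    for definition in edge_definitions:
        names.update(definition["from_vertex_collections"])
        names.update(definition["to_vertex_collections"])
    return sorted(names)


# Same definitions as in the cell above.
edge_definitions = [
    {
        "edge_collection": "accountHolder",
        "from_vertex_collections": ["customer"],
        "to_vertex_collections": ["account"],
    },
    {
        "edge_collection": "transaction",
        "from_vertex_collections": ["account"],
        "to_vertex_collections": ["account"],
    },
]

print(vertex_collections_of(edge_definitions))
```

For this graph the result is `customer` and `account`: customers hold accounts, and transactions connect accounts to each other.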

**Define Graphistry Transformation Functions**

In [None]:
def paths_to_graph(paths, source='_from', destination='_to', node='_id'):
    # Each traversal path is a dict with 'vertices' and 'edges' lists.
    nodes_df = pd.DataFrame()
    edges_df = pd.DataFrame()
    for graph in paths:
        nodes_df = pd.concat([nodes_df, pd.DataFrame(graph['vertices'])], ignore_index=True)
        edges_df = pd.concat([edges_df, pd.DataFrame(graph['edges'])], ignore_index=True)
    # Overlapping paths repeat vertices and edges; keep one row per document.
    nodes_df = nodes_df.drop_duplicates([node])
    # Edges are deduplicated on their own ArangoDB '_id', independent of the node column.
    edges_df = edges_df.drop_duplicates(['_id'])
    return graphistry.bind(source=source, destination=destination, node=node).nodes(nodes_df).edges(edges_df)

def graph_to_graphistry(graph, source='_from', destination='_to', node='_id'):
    nodes_df = pd.DataFrame()
    for vc_name in graph.vertex_collections():
        nodes_df = pd.concat([nodes_df, pd.DataFrame([x for x in graph.vertex_collection(vc_name)])], ignore_index=True)
    edges_df = pd.DataFrame()
    for edge_def in graph.edge_definitions():
        edges_df = pd.concat([edges_df, pd.DataFrame([x for x in graph.edge_collection(edge_def['edge_collection'])])], ignore_index=True)
    return graphistry.bind(source=source, destination=destination, node=node).nodes(nodes_df).edges(edges_df)
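
The deduplication step in `paths_to_graph` matters because traversal paths overlap: every path starts at the same vertex and longer paths repeat the edges of shorter ones. The sketch below shows the same idea without pandas, on a made-up pair of paths. The `vertices` / `edges` layout follows what the function above expects, while `flatten_paths` and the sample documents are hypothetical.

```python
def flatten_paths(paths, key="_id"):
    """Collect unique vertices and edges (by `key`) across traversal paths."""
    vertices, edges = {}, {}
    for path in paths:
        for vertex in path["vertices"]:
            vertices.setdefault(vertex[key], vertex)  # keep first occurrence
        for edge in path["edges"]:
            edges.setdefault(edge[key], edge)
    return list(vertices.values()), list(edges.values())


# Two overlapping paths that share the start vertex account/1.
paths = [
    {"vertices": [{"_id": "account/1"}], "edges": []},
    {
        "vertices": [{"_id": "account/1"}, {"_id": "account/2"}],
        "edges": [{"_id": "transaction/9", "_from": "account/1", "_to": "account/2"}],
    },
]

nodes, edges = flatten_paths(paths)
print(len(nodes), len(edges))
```

Three vertex entries collapse into two unique nodes, which is the effect `drop_duplicates` has on the DataFrames.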

**Connect to the Graphistry server at hub.graphistry.com**

You need to [set up an account on graphistry.com](https://hub.graphistry.com/?ref=_ptnr_graphistry_ste_core) and log in to generate a temporary API key.

In [None]:
# Log in to hub.graphistry.com
graphistry.register(api=3, protocol="https", server="hub.graphistry.com", username="<yourGraphistryUserName>", password="<yourGraphistryPassword>")
# Register your temporary API key
graphistry.register(key='<yourGraphistryAPI_Key>', server='hub.graphistry.com') #https://www.graphistry.com/api-request
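
If the notebook is shared, it may be preferable to keep credentials out of the cell. One possible approach, sketched here, reads them from environment variables and builds the keyword arguments for the username/password call shown above. The variable names `GRAPHISTRY_USERNAME`, `GRAPHISTRY_PASSWORD`, and `GRAPHISTRY_SERVER` and the helper itself are assumptions for illustration, not a Graphistry convention.

```python
import os


def graphistry_register_kwargs(env=os.environ):
    """Build keyword arguments for graphistry.register from environment variables."""
    required = ("GRAPHISTRY_USERNAME", "GRAPHISTRY_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "api": 3,
        "protocol": "https",
        "server": env.get("GRAPHISTRY_SERVER", "hub.graphistry.com"),
        "username": env["GRAPHISTRY_USERNAME"],
        "password": env["GRAPHISTRY_PASSWORD"],
    }


# Demonstration with a plain dict standing in for os.environ.
kwargs = graphistry_register_kwargs(
    {"GRAPHISTRY_USERNAME": "alice", "GRAPHISTRY_PASSWORD": "secret"}
)
print(sorted(kwargs))
```

In the notebook this would be used as `graphistry.register(**graphistry_register_kwargs())`.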


**Demo 1: Traversal visualization**

* Use python-arango's traverse() call to find the accounts and users connected to Betty Blue's account (account/10000016)
* Convert the result paths to pandas and Graphistry
* Plot, and instead of using raw Arango vertex IDs, use Name

In [None]:
paths = python_arango_db_driver.graph('fraud-detection').traverse(start_vertex='account/10000016')['paths']


In [None]:
g = paths_to_graph(paths)
g.bind(point_title='Name').plot()
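
The same kind of path result can also be requested through AQL. The sketch below only builds the query text and bind variables, so it runs without a database. The direction, the depth of 2, and the helper name `traversal_query` are assumptions for illustration; they are not the settings that `traverse()` uses by default.

```python
def traversal_query(start_vertex, graph_name, max_depth=2, direction="ANY"):
    """Return (query, bind_vars) for an AQL traversal that yields paths."""
    if not isinstance(max_depth, int) or max_depth < 1:
        raise ValueError("max_depth must be a positive integer")
    if direction not in ("OUTBOUND", "INBOUND", "ANY"):
        raise ValueError("direction must be OUTBOUND, INBOUND or ANY")
    # Depth and direction are validated above before being placed in the text;
    # the start vertex and graph name are passed as bind variables.
    query = (
        "FOR v, e, p IN 1..{depth} {direction} @start GRAPH @graph "
        "RETURN {{vertices: p.vertices, edges: p.edges}}"
    ).format(depth=max_depth, direction=direction)
    bind_vars = {"start": start_vertex, "graph": graph_name}
    return query, bind_vars


query, bind_vars = traversal_query("account/10000016", "fraud-detection")
print(query)
print(bind_vars)

# With a live connection (not executed here), the result could feed the
# same transform as above:
# paths = list(python_arango_db_driver.aql.execute(query, bind_vars=bind_vars))
# g = paths_to_graph(paths)
```

Each returned object carries `vertices` and `edges` lists, which is the shape `paths_to_graph` expects.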

**Demo 2: Full graph visualization**

* Use python-arango on a graph to identify and download the involved vertex/edge collections
* Convert the results to pandas and Graphistry
* Plot, labeling each point with its raw Arango vertex ID (`_id`)



In [None]:
g = graph_to_graphistry( python_arango_db_driver.graph('fraud-detection') )
g.bind(point_title='_id').plot()