# ArangoDB NetworkX Adapter Getting Started Guide  

<a href="https://colab.research.google.com/github/arangoml/networkx-adapter/blob/master/examples/ArangoDB_NetworkX_Adapter.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![arangodb](https://github.com/arangoml/networkx-adapter/blob/master/examples/assets/logos/ArangoDB_logo.png?raw=1)
![networkX](https://github.com/arangoml/networkx-adapter/blob/master/examples/assets/logos/networkx_logo.svg?raw=1) 

Version: 2.0.0

Objective: Export graphs from [ArangoDB](https://www.arangodb.com/), a multi-model graph database, to [NetworkX](https://networkx.github.io/), the Swiss Army knife for graph analysis in Python, and vice versa.

# Setup

In [None]:
%%capture
!git clone -b oasis_connector --single-branch https://github.com/arangodb/interactive_tutorials.git
!git clone -b 2.0.0 --single-branch https://github.com/arangoml/networkx-adapter.git
!rsync -av networkx-adapter/examples/ ./ --exclude=.git
!rsync -av interactive_tutorials/ ./ --exclude=.git
!pip3 install adbnx_adapter==2.0.0
!pip3 install matplotlib
!pip3 install pyArango

In [None]:
import json
import oasis
import networkx as nx
import matplotlib.pyplot as plt


from adbnx_adapter.adbnx_adapter import ArangoDB_Networkx_Adapter
from adbnx_adapter.adbnx_controller import Base_ADBNX_Controller

# Create a Temporary ArangoDB Instance

In [None]:
# Request temporary instance from the managed ArangoDB Cloud Oasis.
con = oasis.getTempCredentials()

# Connect to the db via the python-arango driver
python_arango_db_driver = oasis.connect_python_arango(con)

# (Alternative) Connect to the db via the pyArango driver
# pyarango_db_driver = oasis.connect(con)[con['dbName']]

print()
print("https://{}:{}".format(con["hostname"], con["port"]))
print("Username: " + con["username"])
print("Password: " + con["password"])
print("Database: " + con["dbName"])

Feel free to use the above URL to check out the UI!

# Data Import

We will use a Fraud Detection example graph, explained in more detail in this [interactive notebook](https://colab.research.google.com/github/joerg84/Graph_Powered_ML_Workshop/blob/master/Fraud_Detection.ipynb).

*Note: the included arangorestore binary only works on Linux. To run this notebook on a different OS, use the appropriate arangorestore from the [Download area](https://www.arangodb.com/download-major/).*

In [None]:
!chmod -R 755 ./tools
!./tools/arangorestore -c none --server.endpoint http+ssl://{con["hostname"]}:{con["port"]} --server.username {con["username"]} --server.database {con["dbName"]} --server.password {con["password"]} --default-replication-factor 3  --input-directory "data/fraud_dump"
!./tools/arangorestore -c none --server.endpoint http+ssl://{con["hostname"]}:{con["port"]} --server.username {con["username"]} --server.database {con["dbName"]} --server.password {con["password"]} --default-replication-factor 3  --input-directory "data/imdb_dump"
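
The two `arangorestore` calls above differ only in the input directory. As a minimal sketch, the same command line can be assembled in Python as an argument list, which sidesteps shell quoting problems if the password contains special characters. The helper name `build_restore_cmd` and the sample credential values are hypothetical; the flags are the ones used in the cell above.

```python
def build_restore_cmd(con: dict, input_dir: str, tool: str = "./tools/arangorestore") -> list:
    """Build the arangorestore argument list for one dump directory."""
    endpoint = f"http+ssl://{con['hostname']}:{con['port']}"
    return [
        tool,
        "-c", "none",
        "--server.endpoint", endpoint,
        "--server.username", con["username"],
        "--server.database", con["dbName"],
        "--server.password", con["password"],
        "--default-replication-factor", "3",
        "--input-directory", input_dir,
    ]


# Hypothetical credentials, using the same keys this notebook reads from `con`
sample_con = {
    "hostname": "example.invalid",
    "port": 8529,
    "username": "user",
    "password": "p@ss word",
    "dbName": "db",
}

cmds = [build_restore_cmd(sample_con, d) for d in ("data/fraud_dump", "data/imdb_dump")]
for cmd in cmds:
    print(cmd)
```

Each list could then be passed to `subprocess.run(cmd, check=True)` in place of the `!` shell lines.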

# Create Graph

The graph used in the following sections looks like this:

![networkX](https://github.com/arangoml/networkx-adapter/blob/master/examples/assets/fraud_graph.jpeg?raw=1) 

In [None]:
edge_definitions = [
    {
        "edge_collection": "accountHolder",
        "from_vertex_collections": ["customer"],
        "to_vertex_collections": ["account"],
    },
    {
        "edge_collection": "transaction",
        "from_vertex_collections": ["account"],
        "to_vertex_collections": ["account"],
    },
]

name = "fraud-detection"
python_arango_db_driver.delete_graph(name, ignore_missing=True)
fraud_graph = python_arango_db_driver.create_graph(name, edge_definitions=edge_definitions)

print("Graph Setup done.")
print(fraud_graph)

Feel free to open the ArangoDB UI with the link and login data printed above and inspect the graph!
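
An edge definition ties one edge collection to the vertex collections it may connect. As a small sketch, the collections a graph will contain can be read back out of the edge definitions list. The helper name `collections_from` is hypothetical and is not part of python-arango or the adapter.

```python
def collections_from(edge_definitions: list) -> tuple:
    """Return (vertex_collections, edge_collections) named by the edge definitions."""
    vertex_collections = set()
    edge_collections = set()
    for definition in edge_definitions:
        edge_collections.add(definition["edge_collection"])
        vertex_collections.update(definition["from_vertex_collections"])
        vertex_collections.update(definition["to_vertex_collections"])
    return vertex_collections, edge_collections


# Same definitions as in the cell above, under a separate name
fraud_edge_definitions = [
    {
        "edge_collection": "accountHolder",
        "from_vertex_collections": ["customer"],
        "to_vertex_collections": ["account"],
    },
    {
        "edge_collection": "transaction",
        "from_vertex_collections": ["account"],
        "to_vertex_collections": ["account"],
    },
]

v_cols, e_cols = collections_from(fraud_edge_definitions)
print(sorted(v_cols))  # ['account', 'customer']
print(sorted(e_cols))  # ['accountHolder', 'transaction']
```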

# Create Adapter

Connect the ArangoDB_Networkx_Adapter to our temp ArangoDB cluster:

In [None]:
adbnx_adapter = ArangoDB_Networkx_Adapter(con)

# ArangoDB to NetworkX



## Via ArangoDB Attributes

In [None]:
# Define attributes
fraud_detection_attributes = {
    "vertexCollections": {
        "account": {"Balance", "account_type", "customer_id", "rank"},
        "bank": {"Country", "Id", "bank_id", "bank_name"},
        "branch": {"City", "Country", "Id", "bank_id", "branch_id", "branch_name"},
        "Class": {"concrete", "label", "name"},
        "customer": {"Name", "Sex", "Ssn", "rank"},
    },
    "edgeCollections": {
        "accountHolder": {"_from", "_to"},
        "Relationship": {"_from", "_to", "label", "name", "relationshipType"},
        "transaction": {"_from", "_to"},
    },
}

# Create NetworkX Graph from attributes
nx_g = adbnx_adapter.arangodb_to_networkx('FraudDetection',  fraud_detection_attributes)

# You can also provide valid Python-Arango AQL query options to the command above, like so:
# nx_g = adbnx_adapter.arangodb_to_networkx('FraudDetection', fraud_detection_attributes, ttl=1000, stream=True)
# See more here: https://docs.python-arango.com/en/main/specs.html#arango.aql.AQL.execute

# Show graph data
print(nx_g.nodes(data=True))
print(nx_g.edges(data=True))
nx.draw(nx_g, with_labels=True)
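
The attributes dictionary maps each collection to the set of document attributes to carry over. In the example above every edge collection lists `_from` and `_to`, the endpoints of each edge. A quick check along the following lines can catch an edge collection that leaves them out. This is only a sketch: `check_metagraph` is a hypothetical helper, not an adapter API.

```python
REQUIRED_EDGE_ATTRIBUTES = {"_from", "_to"}


def check_metagraph(metagraph: dict) -> list:
    """Return the names of edge collections that do not list both '_from' and '_to'."""
    problems = []
    for name, attributes in metagraph["edgeCollections"].items():
        if not REQUIRED_EDGE_ATTRIBUTES <= set(attributes):
            problems.append(name)
    return sorted(problems)


good = {
    "vertexCollections": {"account": {"Balance", "rank"}},
    "edgeCollections": {"transaction": {"_from", "_to"}},
}
bad = {
    "vertexCollections": {"account": {"Balance", "rank"}},
    "edgeCollections": {"transaction": {"_from"}},
}

print(check_metagraph(good))  # []
print(check_metagraph(bad))   # ['transaction']
```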

## Via ArangoDB Collections

In [None]:
# Define collections
vertex_collections = {"account", "bank", "branch", "Class", "customer"}
edge_collections = {"accountHolder", "Relationship", "transaction"}

# Create NetworkX graph from ArangoDB collections
nx_g = adbnx_adapter.arangodb_collections_to_networkx("fraud-detection", vertex_collections, edge_collections)

# You can also provide valid Python-Arango AQL query options to the command above, like so:
# nx_g = adbnx_adapter.arangodb_collections_to_networkx("fraud-detection", vertex_collections, edge_collections, ttl=1000, stream=True)
# See more here: https://docs.python-arango.com/en/main/specs.html#arango.aql.AQL.execute

# Show graph data
print(nx_g.nodes(data=True))
print(nx_g.edges(data=True))
nx.draw(nx_g, with_labels=True)

## Via ArangoDB Graph

In [None]:
# Define graph name
graph_name = "fraud-detection"

# Create NetworkX graph from ArangoDB graph
nx_g = adbnx_adapter.arangodb_graph_to_networkx(graph_name)

# You can also provide valid Python-Arango AQL query options to the command above, like so:
# nx_g = adbnx_adapter.arangodb_graph_to_networkx(graph_name, ttl=1000, stream=True)
# See more here: https://docs.python-arango.com/en/main/specs.html#arango.aql.AQL.execute

# Show graph data
print(nx_g.nodes(data=True))
print(nx_g.edges(data=True))
nx.draw(nx_g, with_labels=True)

## Via ArangoDB Attributes with a customized controller

In [None]:
# Introduce the new controller class
class IMDB_ADBNX_Controller(Base_ADBNX_Controller):
    # We re-define how vertex pre-insertion should be treated, specifically for the IMDB dataset.
    def _prepare_arangodb_vertex(self, vertex: dict, collection: str):
        """Prepare an ArangoDB vertex before it gets inserted into the NetworkX graph.

        Given an ArangoDB vertex, you can modify it before it gets inserted
        into the NetworkX graph, and/or derive a custom node id for networkx to use.

        In most cases, it is only required to return the ArangoDB _id of the vertex.

        :param vertex: The ArangoDB vertex object to (optionally) modify.
        :type vertex: dict
        :param collection: The ArangoDB collection the vertex belongs to.
        :type collection: str
        :return: The ArangoDB _id attribute of the vertex.
        :rtype: str
        """
        vertex["bipartite"] = 0 if collection == "Users" else 1  # The new change
        return vertex["_id"]  # This is standard

    # We're not interested in re-defining pre-insertion handling for edges, so we leave it be
    # def _prepare_arangodb_edge(self, edge: dict, collection: str):
    #   return super()._prepare_arangodb_edge(edge, collection)

# Instantiate the adapter
imdb_adbnx_adapter = ArangoDB_Networkx_Adapter(con, IMDB_ADBNX_Controller)

# Define attributes
imdb_attributes = {
    "vertexCollections": {"Users": {}, "Movies": {}},
    "edgeCollections": {"Ratings": {"_from", "_to", "ratings"}},
}

# Create NetworkX Graph from attributes using the adapter with the custom IMDB_ADBNX_Controller
nx_g = imdb_adbnx_adapter.arangodb_to_networkx("IMDBGraph", imdb_attributes)

# You can also provide valid Python-Arango AQL query options to the command above, like so:
# nx_g = imdb_adbnx_adapter.arangodb_to_networkx("IMDBGraph", imdb_attributes, ttl=1000, stream=True)
# See more here: https://docs.python-arango.com/en/main/specs.html#arango.aql.AQL.execute

# Show graph data
print(nx_g.nodes(data=True))
# print(nx_g.edges(data=True)) # (will exceed IOPub data rate)
# nx.draw(nx_g, with_labels=True) # (will exceed IOPub data rate)
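
The controller hook above tags every vertex with a `bipartite` value: 0 for `Users` and 1 for every other collection. That is the node-attribute convention NetworkX's bipartite functions use to tell the two node sets apart. The sketch below applies the same logic to plain dictionaries, without a database; the function name `prepare_vertex` and the sample documents are made up for illustration.

```python
def prepare_vertex(vertex: dict, collection: str) -> str:
    """Same logic as the controller hook: tag the vertex, then return its _id."""
    vertex["bipartite"] = 0 if collection == "Users" else 1
    return vertex["_id"]


# Hypothetical documents shaped like ArangoDB vertices
docs = [
    ({"_id": "Users/1"}, "Users"),
    ({"_id": "Movies/10"}, "Movies"),
    ({"_id": "Movies/11"}, "Movies"),
]

# Node ID -> node data, as the resulting NetworkX graph would hold it
nodes = {prepare_vertex(doc, col): doc for doc, col in docs}

users = {n for n, data in nodes.items() if data["bipartite"] == 0}
movies = {n for n, data in nodes.items() if data["bipartite"] == 1}
print(sorted(users))   # ['Users/1']
print(sorted(movies))  # ['Movies/10', 'Movies/11']
```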

# NetworkX to ArangoDB

## Example 1: NetworkX Grid Graph

In [None]:
# Load the nx graph & draw
grid_nx_g = nx.grid_2d_graph(5, 5)
nx.draw(grid_nx_g, with_labels=True)

# Define edge definitions for the ArangoDB graph to understand
edge_definitions = [
    {
        "edge_collection": "to",
        "from_vertex_collections": ["Grid_Node"],
        "to_vertex_collections": ["Grid_Node"],
    }
]

# Introduce the new controller class
class Grid_ADBNX_Controller(Base_ADBNX_Controller):
  def _identify_networkx_node(self, id, node: dict) -> str:
    """Given a NetworkX node, identify what ArangoDB collection it should belong to.

    NOTE: If your NetworkX graph does not comply with ArangoDB standards
    (i.e. a node's ID is not "collection/key"), then you must override this function.

    :param id: The NetworkX ID of the node.
    :type id: Any
    :param node: The NetworkX node object.
    :type node: dict
    :return: The ArangoDB collection name
    :rtype: str
    """
    return "Grid_Node"  # Only one node collection in this dataset

  def _identify_networkx_edge(self, edge: dict, from_node: dict, to_node: dict) -> str:
    """Given a NetworkX edge and its pair of nodes, identify which ArangoDB collection it should belong to.

    NOTE: If your NetworkX graph does not comply with ArangoDB standards
    (i.e. a node's ID is not "collection/key"), then you must override this function.

    :param edge: The NetworkX edge object.
    :type edge: dict
    :param from_node: The NetworkX node object representing the edge source.
    :type from_node: dict
    :param to_node: The NetworkX node object representing the edge destination.
    :type to_node: dict
    :return: The ArangoDB collection name
    :rtype: str
    """
    from_collection = self.adb_map.get(from_node["id"])["collection"]
    to_collection = self.adb_map.get(to_node["id"])["collection"]

    if from_collection == to_collection == "Grid_Node":
        return "to"

    return "Unknown_Edge"
    
  def _keyify_networkx_node(self, id, node: dict, collection: str) -> str:
    """Given a NetworkX node, derive its valid ArangoDB key.

    NOTE: If your NetworkX graph does not comply with ArangoDB standards
    (i.e. a node's ID is not "collection/key"), then you must override this function.

    :param node: The NetworkX node object.
    :type node: dict
    :param collection: The ArangoDB collection the node belongs to.
    :type collection: str
    :return: A valid ArangoDB _key value.
    :rtype: str
    """
    # Since our NetworkX nodes have an id of type tuple, we can use the existing helper function.
    return self._tuple_to_arangodb_key_helper(id)


# Instantiate the adapter
grid_adbnx_adapter = ArangoDB_Networkx_Adapter(con, Grid_ADBNX_Controller)

# Create the ArangoDB graph
name = "Grid"
python_arango_db_driver.delete_graph(name, drop_collections=True, ignore_missing=True)
grid_adb_g = grid_adbnx_adapter.networkx_to_arangodb(name, grid_nx_g, edge_definitions)


print(f"Inspect the graph here: https://tutorials.arangodb.cloud:8529/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{name}")
print("View the original graph here: https://networkx.org/documentation/stable/auto_examples/basic/plot_read_write.html#sphx-glr-auto-examples-basic-plot-read-write-py")
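
Grid nodes are identified by `(row, column)` tuples, so `_keyify_networkx_node` has to turn each tuple into a string that can serve as an ArangoDB `_key`. The adapter's `_tuple_to_arangodb_key_helper` is not reproduced here. The function below is a hypothetical stand-in that assumes the key is just the tuple elements concatenated, purely to illustrate the idea.

```python
def tuple_to_key(node_id: tuple) -> str:
    """Hypothetical stand-in: join the tuple elements into one string."""
    return "".join(str(part) for part in node_id)


# Node IDs of a 5x5 grid, in the (row, column) form nx.grid_2d_graph(5, 5) uses
grid_ids = [(r, c) for r in range(5) for c in range(5)]
keys = [tuple_to_key(node_id) for node_id in grid_ids]

print(keys[:6])  # ['00', '01', '02', '03', '04', '10']
print(len(set(keys)) == len(grid_ids))  # True: every node gets a distinct key
```

Under this assumption the keys stay distinct only while each coordinate is a single digit, which is the case for a 5x5 grid.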

## Example 2: NetworkX Football Graph

In [None]:
import io
import zipfile
import urllib.request as urllib

# Load the nx graph & draw
url = "http://www-personal.umich.edu/~mejn/netdata/football.zip"
sock = urllib.urlopen(url)
s = io.BytesIO(sock.read())
sock.close()
zf = zipfile.ZipFile(s)
gml = zf.read("football.gml").decode()
gml = gml.split("\n")[1:]

football_nx_g = nx.parse_gml(gml)
nx.draw(football_nx_g, with_labels=True)

# Define edge definitions for the ArangoDB graph to understand
edge_definitions = [
    {
        "edge_collection": "played",
        "from_vertex_collections": ["Football_Team"],
        "to_vertex_collections": ["Football_Team"],
    }
]

# Introduce the new controller class
class Football_ADBNX_Controller(Base_ADBNX_Controller):
    def _identify_networkx_node(self, id, node: dict) -> str:
        return "Football_Team"  # Only one node collection in this dataset

    def _keyify_networkx_node(self, id, node: dict, collection: str) -> str:
        return self._string_to_arangodb_key_helper(id)

    def _identify_networkx_edge(
        self, edge: dict, from_node: dict, to_node: dict
    ) -> str:
        from_collection = self.adb_map.get(from_node["id"])["collection"]
        to_collection = self.adb_map.get(to_node["id"])["collection"]

        if from_collection == to_collection == "Football_Team":
            return "played"

        return "Unknown_Edge"


# Instantiate the adapter
football_adbnx_adapter = ArangoDB_Networkx_Adapter(con, Football_ADBNX_Controller)

# Create the ArangoDB graph
name = "Football"
python_arango_db_driver.delete_graph(name, drop_collections=True, ignore_missing=True)
football_adb_g = football_adbnx_adapter.networkx_to_arangodb(name, football_nx_g, edge_definitions)


print(f"Inspect the graph here: https://tutorials.arangodb.cloud:8529/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{name}")
print("View the original graph here: https://networkx.org/documentation/stable/auto_examples/graph/plot_football.html#sphx-glr-auto-examples-graph-plot-football-py")
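
Node IDs in this graph are strings. ArangoDB document keys accept only letters, digits, and a limited set of punctuation, so a string ID may need cleaning before it can serve as a `_key`. The function below is a hypothetical stand-in for that step and is not the adapter's `_string_to_arangodb_key_helper`. The allowed-character set is an assumption based on ArangoDB's key naming rules and should be checked against the documentation for your server version.

```python
import string

# Assumed set of characters permitted in an ArangoDB _key
ALLOWED_KEY_CHARS = set(string.ascii_letters + string.digits + "_-:.@()+,=;$!*'%")


def string_to_key(label: str) -> str:
    """Drop every character that is not in the assumed allowed set."""
    return "".join(ch for ch in label if ch in ALLOWED_KEY_CHARS)


print(string_to_key("Team A/B #1"))  # TeamAB1
print(string_to_key("Iowa"))         # Iowa
```

Dropping characters can make two different labels collapse to the same key, so a real implementation would need to check for collisions.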

# Full Cycles

## From ArangoDB (ArangoDB to NetworkX to ArangoDB) with existing collections

In [None]:
name = "fraud-detection"

# Start from ArangoDB graph
original_fraud_adb_g = adbnx_adapter.db.graph(name)
edge_definitions = original_fraud_adb_g.edge_definitions()

# Create NetworkX graph from ArangoDB graph
fraud_nx_g = adbnx_adapter.arangodb_graph_to_networkx(name)
nx.draw(fraud_nx_g, with_labels=True)

# Modify the NetworkX graph
for _, node in fraud_nx_g.nodes(data=True):
    node["new_vertex_data"] = ["new", "vertex", "data", "here"]

for _, _, edge in fraud_nx_g.edges(data=True):
    edge["new_edge_data"] = ["new", "edge", "data", "here"]

# Re-use the existing graph's edge definitions to overwrite the existing graph.
# Keyify edges to keep the same key values as the original (this is optional).
updated_fraud_adb_g = adbnx_adapter.networkx_to_arangodb(
    name,
    fraud_nx_g,
    edge_definitions,
    keyify_edges=True,
)

print(f"Inspect the overwritten graph here: https://tutorials.arangodb.cloud:8529/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{name}")
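
The modification loops above work because `nodes(data=True)` and `edges(data=True)` yield each element's attribute dictionary itself, so assigning into it updates the graph in place. A minimal NetworkX-only sketch, with made-up IDs in ArangoDB's `collection/key` form:

```python
import networkx as nx

g = nx.Graph()
g.add_node("customer/1", Name="Alice")
g.add_node("account/7", Balance=100)
g.add_edge("customer/1", "account/7", _id="accountHolder/42")

# Same pattern as the cell above: mutate the attribute dicts directly
for _, node in g.nodes(data=True):
    node["new_vertex_data"] = ["new", "vertex", "data", "here"]

for _, _, edge in g.edges(data=True):
    edge["new_edge_data"] = ["new", "edge", "data", "here"]

print(g.nodes["customer/1"])
print(g.edges["customer/1", "account/7"])
```

The original attributes (`Name`, `Balance`, `_id`) remain next to the new ones.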

## From ArangoDB (ArangoDB to NetworkX to ArangoDB) with new collections

In [None]:
name = "fraud-detection"

# Start from ArangoDB graph
original_fraud_adb_g = adbnx_adapter.db.graph(name) 

# Create NetworkX graph from ArangoDB graph
fraud_nx_g = adbnx_adapter.arangodb_graph_to_networkx(name)
nx.draw(fraud_nx_g, with_labels=True)

# Provide edge_definitions (we are preparing to re-translate back to ArangoDB)
edge_definitions = [
    {
        "edge_collection": "accountHolder_nx",
        "from_vertex_collections": ["customer_nx"],
        "to_vertex_collections": ["account_nx"],
    },
    {
        "edge_collection": "transaction_nx",
        "from_vertex_collections": ["account_nx"],
        "to_vertex_collections": ["account_nx"],
    },
]

# Since we want to port existing fraud data to new collections, we must introduce a new Controller Class
class Fraud_ADBNX_Controller(Base_ADBNX_Controller):
  def _identify_networkx_node(self, id, node: dict) -> str:
      adb_id: str = id
      return adb_id.split("/")[0] + "_nx"

  def _identify_networkx_edge(self, edge: dict, from_node: dict, to_node: dict) -> str:
      edge_id: str = edge["_id"]
      return edge_id.split("/")[0] + "_nx"

fraud_adbnx_adapter = ArangoDB_Networkx_Adapter(con, Fraud_ADBNX_Controller)

# Create a new ArangoDB graph from NetworkX graph
new_name = name + "-nx"
python_arango_db_driver.delete_graph(new_name, drop_collections=True, ignore_missing=True)
new_fraud_adb_g = fraud_adbnx_adapter.networkx_to_arangodb(new_name, fraud_nx_g, edge_definitions, keyify_edges=True) # Keyify edges to keep the same key values as the original (this is optional)

print(f"Inspect the new graph here: https://tutorials.arangodb.cloud:8529/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{new_name}")
print(f"View the original graph here: https://tutorials.arangodb.cloud:8529/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{name}")
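
Both controller methods above rely on the same step: take the collection part of an ArangoDB `_id` (the text before the `/`) and append `_nx`. Shown in isolation, with a hypothetical helper name and made-up IDs:

```python
def to_nx_collection(adb_id: str) -> str:
    """Map 'collection/key' to 'collection_nx'."""
    collection, _, _key = adb_id.partition("/")
    return collection + "_nx"


for adb_id in ("customer/1001", "account/2002", "accountHolder/3003", "transaction/4004"):
    print(adb_id, "->", to_nx_collection(adb_id))
```

The resulting names (`customer_nx`, `account_nx`, `accountHolder_nx`, `transaction_nx`) are the collections named in the new edge definitions.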


## From NetworkX (NetworkX to ArangoDB to NetworkX)

In [None]:
# Load the nx graph
original_grid_nx_g = nx.grid_2d_graph(5, 5)
print(original_grid_nx_g.nodes(data=True))
print(original_grid_nx_g.edges(data=True))

# Re-introduce the Grid controller class
class Grid_ADBNX_Controller(Base_ADBNX_Controller):
    def _prepare_arangodb_vertex(self, vertex: dict, collection: str):
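        # Rebuild the tuple node ID from the key, one integer per character
        # (e.g. "12" -> (1, 2)); this relies on single-digit coordinates.
        nx_id = tuple(int(n) for n in vertex["_key"])
        return nx_id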

    def _identify_networkx_node(self, id: tuple, node: dict) -> str:
        return "Grid_Node"  # Only one node collection in this dataset

    def _keyify_networkx_node(self, id: tuple, node: dict, collection: str) -> str:
        return self._tuple_to_arangodb_key_helper(id)

    def _identify_networkx_edge(
        self, edge: dict, from_node: dict, to_node: dict
    ) -> str:
        from_collection = self.adb_map.get(from_node["id"])["collection"]
        to_collection = self.adb_map.get(to_node["id"])["collection"]

        if from_collection == to_collection == "Grid_Node":
            return "to"

        return "Unknown_Edge"

# Re-instantiate the Grid adapter class
grid_adbnx_adapter = ArangoDB_Networkx_Adapter(con, Grid_ADBNX_Controller)

# Delete the Grid graph if it already exists in ArangoDB
name = "Grid"
python_arango_db_driver.delete_graph(name, drop_collections=True, ignore_missing=True)

# Define edge definitions for the ArangoDB graph to understand
edge_definitions = [
    {
        "edge_collection": "to",
        "from_vertex_collections": ["Grid_Node"],
        "to_vertex_collections": ["Grid_Node"],
    }
]

# Create the ArangoDB graph
grid_adbnx_adapter.networkx_to_arangodb(name, original_grid_nx_g, edge_definitions)

# Create the NetworkX graph from the ArangoDB graph
new_grid_nx_g = grid_adbnx_adapter.arangodb_graph_to_networkx(name)

# Draw the new graph
nx.draw(new_grid_nx_g, with_labels=True)
print(new_grid_nx_g.nodes(data=True))
print(new_grid_nx_g.edges(data=True))
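
The `_prepare_arangodb_vertex` override rebuilds the tuple ID by converting each character of `_key` to an integer. That inverts a concatenated key only while every coordinate is a single digit, which holds for the 5x5 grid. Below is a sketch of the round trip; `tuple_to_key` is a hypothetical stand-in that assumes the key is the tuple elements concatenated.

```python
def tuple_to_key(node_id: tuple) -> str:
    """Hypothetical stand-in for the key helper: concatenate the elements."""
    return "".join(str(part) for part in node_id)


def key_to_tuple(key: str) -> tuple:
    """Same conversion as the override: one integer per character."""
    return tuple(int(ch) for ch in key)


# Every node ID of the 5x5 grid survives the round trip
round_trip_ok = all(
    key_to_tuple(tuple_to_key((r, c))) == (r, c)
    for r in range(5)
    for c in range(5)
)
print(round_trip_ok)  # True

# A two-digit coordinate does not: (10, 2) becomes "102", then (1, 0, 2)
print(key_to_tuple(tuple_to_key((10, 2))))  # (1, 0, 2)
```

For larger grids, a key format with a separator between coordinates would be needed to keep the conversion reversible.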