# ArangoDB cuGraph Adapter Getting Started Guide  

<a href="https://colab.research.google.com/github/arangoml/cugraph-adapter/blob/master/examples/ArangoDB_cuGraph_Adapter.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![arangodb](https://github.com/arangoml/cugraph-adapter/blob/master/examples/assets/logos/ArangoDB_logo.png?raw=1)
<a href="https://github.com/rapidsai/cugraph" rel="github.com/rapidsai/cugraph"><img src="https://github.com/arangoml/cugraph-adapter/blob/master/examples/assets/logos/rapids_logo.png?raw=1" width=30% height=30%></a>

Export graphs from [ArangoDB](https://www.arangodb.com/), a multi-model graph database, to [cuGraph](https://github.com/rapidsai/cugraph), a collection of GPU-accelerated graph algorithms.

# Environment Sanity Check



This notebook requires a Tesla T4, P4, or P100 GPU.
1. Open the <u>Runtime</u> dropdown
2. Click on <u>Change Runtime Type</u>
3. Set <u>Hardware accelerator</u> to GPU
4. Re-connect to runtime 

Check the output of `!nvidia-smi -L` to make sure you've been allocated a Tesla T4, P4, or P100. If not, use the _Disconnect and delete runtime_ option and repeat the process (unfortunately, this is the only option).

In [None]:
!nvidia-smi -L
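
If you would rather check this programmatically, the sketch below parses the text printed by `nvidia-smi -L`. The helper name `has_supported_gpu` and the sample string are illustrative assumptions, not part of RAPIDS or Colab.

```python
import re

# GPU models named in this guide as supported.
SUPPORTED_GPUS = ("T4", "P4", "P100")

def has_supported_gpu(nvidia_smi_output: str) -> bool:
    """Return True if any listed GPU matches a supported model."""
    for line in nvidia_smi_output.splitlines():
        # Only look at the model name, not the UUID that follows it.
        model_part = line.split("(UUID")[0]
        # Match the model as a whole word so "P4" does not match "P40".
        if any(re.search(rf"\b{model}\b", model_part) for model in SUPPORTED_GPUS):
            return True
    return False

# Illustrative output line; in Colab you could capture the real text with
# `output = !nvidia-smi -L` and pass "\n".join(output).
sample = "GPU 0: Tesla T4 (UUID: GPU-00000000-0000-0000-0000-000000000000)"
print(has_supported_gpu(sample))
```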

# Setup

Estimated Time: 20 minutes :( 

Itinerary:
1. Update gcc in Colab
2. Install Conda
3. Install dependencies
4. Copy RAPIDS `.so` files into the current working directory, a necessary workaround for RAPIDS+Colab integration.

In [None]:
# This gets the RAPIDS-Colab install files and checks your GPU. Run this and the next cell only.
# Please read the output of this cell. If your Colab instance is not RAPIDS-compatible, it will warn you and give you remediation steps.
!git clone https://github.com/rapidsai/rapidsai-csp-utils.git
!python rapidsai-csp-utils/colab/env-check.py

In [None]:
# This will update the Colab environment and restart the kernel. 
!bash rapidsai-csp-utils/colab/update_gcc.sh
import os
os._exit(00)
# NOTE: Don't run the next cell until you see this session crash.

In [None]:
%%capture
# This will install CondaColab and restart your kernel one last time.
# The %%capture cell magic must be the first line of the cell.
import condacolab
condacolab.install()
# NOTE: Don't run the next cell until you see this session crash.

In [None]:
# You can now run the rest of the cells as normal
import condacolab
condacolab.check()

In [None]:
# Run CFFI Colab Pip Fix
!pip uninstall --yes cffi
!pip uninstall --yes cryptography
!pip install cffi==1.15.0

In [None]:
# Est time: 15 minutes
# Install CUDA 11.2, along with a specific version of cuGraph
!conda install --yes -c rapidsai -c nvidia -c numba -c conda-forge cugraph=21.12 cudatoolkit=11.2

In [None]:
# Update Colab's libraries
import sys, os, shutil
sys.path.append('/usr/local/lib/python3.7/site-packages/')
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'
os.environ["CONDA_PREFIX"] = "/usr/local"
for so in ['cudf', 'rmm', 'nccl', 'cuml', 'cugraph', 'xgboost', 'cuspatial', 'cupy', 'geos','geos_c']:
  fn = 'lib'+so+'.so'
  source_fn = '/usr/local/lib/'+fn
  dest_fn = '/usr/lib/'+fn
  if os.path.exists(source_fn):
    print(f'Copying {source_fn} to {dest_fn}')
    shutil.copyfile(source_fn, dest_fn)

In [None]:
# Finally! Last step 
!git clone -b master --single-branch https://github.com/arangoml/cugraph-adapter.git
!pip install git+https://github.com/arangoml/cugraph-adapter.git
!pip install git+https://github.com/arangoml/utilities.git

# Understanding cuGraph

(referenced from [docs.rapids.ai/api/cugraph](https://docs.rapids.ai/api/cugraph/stable/))

RAPIDS cuGraph is a library of graph algorithms that integrates into the RAPIDS data science ecosystem and lets data scientists call graph algorithms on data stored in GPU DataFrames, NetworkX Graphs, or even CuPy or SciPy sparse matrices.

Built on the [Apache Arrow](http://arrow.apache.org/) columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.

cuDF provides a pandas-like API that will be familiar to data engineers & data scientists, so they can use it to easily accelerate their workflows without going into the details of CUDA programming.

For example, the following snippet downloads a CSV, then uses the GPU to parse it into rows and columns and run calculations:

In [None]:
# Load a dataset into a GPU memory resident DataFrame and perform a basic calculation.
# Everything from CSV parsing to calculating tip percentage and computing a grouped average is done on the GPU.

import cudf
import cugraph
import io, requests

# download CSV file from GitHub
url="https://github.com/plotly/datasets/raw/master/tips.csv"
content = requests.get(url).content.decode('utf-8')

# read CSV from memory
tips_df = cudf.read_csv(io.StringIO(content))
tips_df['tip_percentage'] = tips_df['tip']/tips_df['total_bill']*100

# display average tip by dining party size
print(tips_df.groupby('size').tip_percentage.mean())

As a cuGraph example, the following Python snippet builds a small edge list and computes PageRank:

In [None]:
import cudf
import cugraph

# `content` from the previous cell holds the tips CSV, which is not an edge list,
# so build a small illustrative edge list of (src, dst) vertex pairs instead
gdf = cudf.DataFrame({"src": [0, 1, 2, 2, 3], "dst": [1, 2, 0, 3, 0]})

# We now have data as edge pairs
# create a Graph using the source (src) and destination (dst) vertex pairs
G = cugraph.Graph()
G.from_cudf_edgelist(gdf, source='src', destination='dst')

# Let's now get the PageRank score of each vertex by calling cugraph.pagerank
df_page = cugraph.pagerank(G)

# Let's look at the PageRank Score (only do this on small graphs)
for i in range(len(df_page)):
	print("vertex " + str(df_page['vertex'].iloc[i]) +
		" PageRank is " + str(df_page['pagerank'].iloc[i]))

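For intuition about what a PageRank score represents, here is a CPU-only power-iteration sketch in plain Python. It is not how cuGraph implements `cugraph.pagerank`, and it treats edges as directed; the function name, the damping factor of 0.85, and the toy edge list are assumptions chosen for illustration.

```python
def pagerank_sketch(edges, damping=0.85, iterations=50):
    """Approximate PageRank on a directed edge list via power iteration."""
    nodes = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every vertex starts each round with the "teleport" share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # Vertices without outgoing edges spread their rank evenly.
            targets = out_links[v] or nodes
            share = damping * rank[v] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank

# Toy directed graph: 0 -> 1, 1 -> 2, 2 -> 0, 3 -> 0
scores = pagerank_sketch([(0, 1), (1, 2), (2, 0), (3, 0)])
for vertex, score in scores.items():
    print(f"vertex {vertex} PageRank is {score:.4f}")
```

Vertex 3 has no incoming edges, so it only ever receives the teleport share and ends up with the lowest score.
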
# Create a Temporary ArangoDB Oasis Instance

In [None]:
import json
from arango import ArangoClient
from arango_utils import utils

# Request a temporary instance from the managed ArangoDB Cloud Oasis.
con = utils.get_oasis_credentials()
print(json.dumps(con, indent=2))

# Connect to the db via the python-arango driver
db = ArangoClient(hosts=con["url"]).db(con["dbName"], con["username"], con["password"], verify=True)

Feel free to use the above URL to check out the UI!

# Import Sample Data

For demo purposes, we will be using the [ArangoDB Fraud Detection example graph](https://colab.research.google.com/github/joerg84/Graph_Powered_ML_Workshop/blob/master/Fraud_Detection.ipynb), and the [ArangoDB IMDB Dataset](https://github.com/arangodb/example-datasets/tree/master/Graphs/IMDB).

In [None]:
!chmod -R 755 cugraph-adapter/
!./cugraph-adapter/tests/assets/arangorestore -c none --server.endpoint http+ssl://{con["hostname"]}:{con["port"]} --server.username {con["username"]} --server.database {con["dbName"]} --server.password {con["password"]} --replication-factor 3  --input-directory "cugraph-adapter/examples/data/fraud_dump" --include-system-collections true
!./cugraph-adapter/tests/assets/arangorestore -c none --server.endpoint http+ssl://{con["hostname"]}:{con["port"]} --server.username {con["username"]} --server.database {con["dbName"]} --server.password {con["password"]} --replication-factor 3  --input-directory "cugraph-adapter/examples/data/imdb_dump" --include-system-collections true

# Instantiate the Adapter

Connect the ArangoDB-cuGraph Adapter to our database client:

In [None]:
from adbcug_adapter import ADBCUG_Adapter
adbcug_adapter = ADBCUG_Adapter(db)

# ArangoDB to cuGraph



## Via ArangoDB Graph Name

Data source
* ArangoDB Fraud-Detection Graph

Package methods used
* [`adbcug_adapter.adapter.arangodb_graph_to_cugraph()`](https://github.com/arangoml/cugraph-adapter/blob/master/adbcug_adapter/adapter.py)

Important notes
* The `name` parameter in this case must point to an existing ArangoDB graph in your ArangoDB instance. 


In [None]:
# Define graph name
graph_name = "fraud-detection"

# Create cuGraph graph from ArangoDB graph name
cu_g = adbcug_adapter.arangodb_graph_to_cugraph(graph_name)

# You can also provide valid Python-Arango AQL query options to the command above, like so:
# cu_g = adbcug_adapter.arangodb_graph_to_cugraph(graph_name, ttl=1000, stream=True)
# See more here: https://docs.python-arango.com/en/main/specs.html#arango.aql.AQL.execute

# Show graph data
print('\n--------------------')
print(cu_g.nodes())
print('\n--------------------')
print(cu_g.edges())

## Via ArangoDB Collection Names

Data source
* ArangoDB Fraud-Detection Collections

Package methods used
* [`adbcug_adapter.adapter.arangodb_collections_to_cugraph()`](https://github.com/arangoml/cugraph-adapter/blob/master/adbcug_adapter/adapter.py)

Important notes
* The `name` parameter in this case is simply for naming your cuGraph graph.
* The `vertex_collections` & `edge_collections` parameters must point to existing ArangoDB collections within your ArangoDB instance.

In [None]:
# Define collections
vertex_collections = {"account", "bank", "branch", "Class", "customer"}
edge_collections = {"accountHolder", "Relationship", "transaction"}

# Create cuGraph graph from ArangoDB collections
cu_g = adbcug_adapter.arangodb_collections_to_cugraph("fraud-detection", vertex_collections, edge_collections)

# You can also provide valid Python-Arango AQL query options to the command above, like so:
# cu_g = adbcug_adapter.arangodb_collections_to_cugraph("fraud-detection", vertex_collections, edge_collections, ttl=1000, stream=True)
# See more here: https://docs.python-arango.com/en/main/specs.html#arango.aql.AQL.execute

# Show graph data
print('\n--------------------')
print(cu_g.nodes())
print('\n--------------------')
print(cu_g.edges())

## Via ArangoDB Graph Name with a custom controller and verbose logging

Data source
* ArangoDB Fraud-Detection Graph

Package methods used
* [`adbcug_adapter.adapter.arangodb_graph_to_cugraph()`](https://github.com/arangoml/cugraph-adapter/blob/master/adbcug_adapter/adapter.py)
* [`adbcug_adapter.controller._prepare_arangodb_vertex()`](https://github.com/arangoml/cugraph-adapter/blob/master/adbcug_adapter/controller.py)

Important notes
* The `name` parameter in this case must point to an existing ArangoDB graph in your ArangoDB instance.
* We are creating a custom `ADBCUG_Controller` to specify *how* to convert our ArangoDB vertex IDs into cuGraph node IDs. View the default `ADBCUG_Controller` [here](https://github.com/arangoml/cugraph-adapter/blob/master/adbcug_adapter/controller.py).

In [None]:
import logging
from adbcug_adapter import ADBCUG_Controller
from adbcug_adapter.typings import CuGId, Json

# Define graph name
graph_name = "fraud-detection"

class Custom_ADBCUG_Controller(ADBCUG_Controller):
    """ArangoDB-cuGraph controller.

    Responsible for controlling how nodes & edges are handled when
    transitioning from ArangoDB to cuGraph.

    You can derive your own custom ADBCUG_Controller, but it is not
    necessary.
    """

    def _prepare_arangodb_vertex(self, adb_vertex: Json, col: str) -> CuGId:
        """Prepare an ArangoDB vertex before it gets inserted into the cuGraph
        graph.

        Given an ArangoDB vertex, you can modify it before it gets inserted
        into the cuGraph graph, and/or derive a custom node id for cuGraph to use.
        In most cases, it is only required to return the ArangoDB _id of the vertex.

        :param adb_vertex: The ArangoDB vertex object to (optionally) modify.
        :type adb_vertex: adbcug_adapter.typings.Json
        :param col: The ArangoDB collection the vertex belongs to.
        :type col: str
        :return: The ArangoDB _id attribute of the vertex.
        :rtype: str
        """
        adb_vertex_id: str = adb_vertex["_id"] # ArangoDB _id example: 'collection_name/_key'
        return 'new_' + adb_vertex_id # Custom behaviour: Add a "new_" prefix to every vertex ID

# Instantiate a new adapter with the custom controller
custom_adbcug_adapter = ADBCUG_Adapter(db, controller=Custom_ADBCUG_Controller())

# You can also change the adapter's logging level for access to 
# silent, regular, or verbose logging
custom_adbcug_adapter.set_logging(logging.WARNING) # silent logging
custom_adbcug_adapter.set_logging(logging.INFO) # regular logging (default)
custom_adbcug_adapter.set_logging(logging.DEBUG) # verbose logging

# Create a cuGraph graph from an ArangoDB graph using the custom adapter
cu_g = custom_adbcug_adapter.arangodb_graph_to_cugraph("fraud-detection")

# Show graph data
print('\n--------------------')
print(cu_g.nodes())
print('\n--------------------')
print(cu_g.edges())
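
To see what the custom controller's ID rewriting does without a database or GPU, the same transformation can be applied to a plain dictionary. The function `prefixed_vertex_id` and the sample document are illustrative assumptions; only the `'new_' + _id` logic mirrors the cell above.

```python
def prefixed_vertex_id(adb_vertex: dict, prefix: str = "new_") -> str:
    """Mirror the custom controller: prepend a prefix to the ArangoDB _id."""
    return prefix + adb_vertex["_id"]  # _id has the form 'collection_name/_key'

# Hypothetical vertex document, shaped like one from the `account` collection.
sample_vertex = {"_id": "account/10000001", "_key": "10000001"}
print(prefixed_vertex_id(sample_vertex))  # prints new_account/10000001
```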