# ArangoDB cuGraph Adapter Getting Started Guide  

<a href="https://colab.research.google.com/github/arangoml/cugraph-adapter/blob/master/examples/ArangoDB_cuGraph_Adapter.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![arangodb](https://github.com/arangoml/cugraph-adapter/blob/master/examples/assets/logos/ArangoDB_logo.png?raw=1)
<a href="https://github.com/rapidsai/cugraph" rel="github.com/rapidsai/cugraph"><img src="https://github.com/arangoml/cugraph-adapter/blob/master/examples/assets/logos/rapids_logo.png?raw=1" width=30% height=30%></a>

Export Graphs from [ArangoDB](https://www.arangodb.com/), a multi-model Graph Database, to [cuGraph](https://github.com/rapidsai/cugraph), a library of collective GPU-accelerated graph algorithms.


⚠️ The `Run all` option will not work in this notebook. ⚠️

# Environment Sanity Check



This notebook requires a Tesla T4, P4, or P100 GPU.
1. Open the <u>Runtime</u> dropdown
2. Click on <u>Change Runtime Type</u>
3. Set <u>Hardware accelerator</u> to GPU
4. Re-connect to runtime 

Check the output of `!nvidia-smi -L` to make sure you've been allocated a Tesla T4, P4, or P100. If not, you can rely on the _Disconnect and delete runtime_ option to repeat the process & try again (unfortunately this is the only option).

In [1]:
!nvidia-smi -L

GPU 0: Tesla T4 (UUID: GPU-d29fdd1e-0900-c42d-55b4-6b695acb0a1f)


# Setup
Est Time: 20 minutes 

Itinerary:
1. Update gcc in Colab
2. Install Conda
3. Install dependencies
4. Copy RAPIDS `.so` files into current working directory, a neccessary workaround for RAPIDS+Colab integration.

In [2]:
# This get the RAPIDS-Colab install files and test check your GPU.  Run this and the next cell only.
# Please read the output of this cell.  If your Colab Instance is not RAPIDS compatible, it will warn you and give you remediation steps.
!git clone https://github.com/rapidsai/rapidsai-csp-utils.git
!python rapidsai-csp-utils/colab/env-check.py

Cloning into 'rapidsai-csp-utils'...
remote: Enumerating objects: 300, done.[K
remote: Counting objects: 100% (129/129), done.[K
remote: Compressing objects: 100% (74/74), done.[K
remote: Total 300 (delta 74), reused 99 (delta 55), pack-reused 171[K
Receiving objects: 100% (300/300), 87.58 KiB | 425.00 KiB/s, done.
Resolving deltas: 100% (136/136), done.
***********************************************************************
Woo! Your instance has the right kind of GPU, a Tesla T4!
***********************************************************************



In [None]:
# This will update the Colab environment and restart the kernel. 
!bash rapidsai-csp-utils/colab/update_gcc.sh
import os
os._exit(00)

# NOTE: Don't run the next cell until you see this session crash.

Updating your Colab environment.  This will restart your kernel.  Don't Panic!
Get:1 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
Get:2 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ InRelease [3,626 B]
Hit:3 http://archive.ubuntu.com/ubuntu bionic InRelease
Get:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease [1,581 B]
Get:5 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
Ign:6 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
Hit:7 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release
Get:8 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
Get:9 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  Packages [787 kB]
Hit:10 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu bionic InRelease
Get:12 http://security.ubuntu.com/ubuntu bionic-security/universe am

In [1]:
# This will install CondaColab.  This will restart your kernel one last time.
!pip install -q condacolab
import condacolab
condacolab.install()
# condacolab.install_miniconda()

# NOTE: Don't run the next cell until you see this session crash.

⏬ Downloading https://github.com/jaimergp/miniforge/releases/latest/download/Mambaforge-colab-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:22
🔁 Restarting kernel...


In [1]:
# You can now run the rest of the cells as normal
import condacolab
condacolab.check()

✨🍰✨ Everything looks OK!


In [2]:
# Run CFFI Colab Pip Fix
!pip uninstall --yes cffi
!pip uninstall --yes cryptography
!pip install cffi==1.15.0

Found existing installation: cffi 1.14.5
Uninstalling cffi-1.14.5:
  Successfully uninstalled cffi-1.14.5
Found existing installation: cryptography 3.4.5
Uninstalling cryptography-3.4.5:
  Successfully uninstalled cryptography-3.4.5
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting cffi==1.15.0
  Downloading cffi-1.15.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (427 kB)
[K     |████████████████████████████████| 427 kB 5.3 MB/s 
Installing collected packages: cffi
Successfully installed cffi-1.15.0


In [3]:
# Est time: 15 minutes
# Install CUDA 11.2, along with a specific version of cuGraph
!conda install -c rapidsai -c nvidia -c numba -c conda-forge cugraph=21.12 cudatoolkit=11.2

Collecting package metadata (current_repodata.json): - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ done
- \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - 

In [4]:
# Update Colab's libraries
import sys, os, shutil
sys.path.append('/usr/local/lib/python3.7/site-packages/')
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'
os.environ["CONDA_PREFIX"] = "/usr/local"
for so in ['cudf', 'rmm', 'nccl', 'cuml', 'cugraph', 'xgboost', 'cuspatial', 'cupy', 'geos','geos_c']:
  fn = 'lib'+so+'.so'
  source_fn = '/usr/local/lib/'+fn
  dest_fn = '/usr/lib/'+fn
  if os.path.exists(source_fn):
    print(f'Copying {source_fn} to {dest_fn}')
    shutil.copyfile(source_fn, dest_fn)

Copying /usr/local/lib/libcudf.so to /usr/lib/libcudf.so
Copying /usr/local/lib/libnccl.so to /usr/lib/libnccl.so
Copying /usr/local/lib/libcugraph.so to /usr/lib/libcugraph.so


In [5]:
# Finally! Last step
!pip install git+https://github.com/arangoml/cugraph-adapter.git
!pip install adb-cloud-connector
!git clone -b master --single-branch https://github.com/arangoml/cugraph-adapter.git

# Unfortunately the following does not work in colab (hence the steps above):
# !conda install -c arangodb adbcug_adapter

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/arangoml/cugraph-adapter.git
  Cloning https://github.com/arangoml/cugraph-adapter.git to /tmp/pip-req-build-dbgjaw6a
  Running command git clone -q https://github.com/arangoml/cugraph-adapter.git /tmp/pip-req-build-dbgjaw6a
  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
    Preparing wheel metadata ... [?25l[?25hdone
Collecting python-arango>=7.3.3
  Downloading python_arango-7.3.4-py3-none-any.whl (96 kB)
[K     |████████████████████████████████| 96 kB 3.6 MB/s 
Collecting requests-toolbelt
  Downloading requests_toolbelt-0.9.1-py2.py3-none-any.whl (54 kB)
[K     |████████████████████████████████| 54 kB 3.4 MB/s 
Collecting PyJWT
  Downloading PyJWT-2.4.0-py3-none-any.whl (18 kB)
Building wheels for collected packages: adbcug-adapter
  Building wheel for adbcug-adapter (PEP 517) ... [

In [42]:
# All imports

import cudf
import cugraph

from adbcug_adapter import ADBCUG_Adapter, ADBCUG_Controller
from adbcug_adapter.typings import CUGId, Json

from arango import ArangoClient
from adb_cloud_connector import get_temp_credentials

import json
import logging
import io, requests
from typing import List

# Understanding cuGraph & cuDF

(referenced from [docs.rapids.ai](https://docs.rapids.ai/))

RAPIDS cuGraph is a library of graph algorithms that seamlessly integrates into the RAPIDS data science ecosystem and allows the data scientist to easily call graph algorithms using data stored in GPU DataFrames, NetworkX Graphs, or even CuPy or SciPy sparse Matrices.


Here is an example of creating a simple weighted graph:

In [8]:
cug_graph = cugraph.Graph()

df = cudf.DataFrame(
  [('a', 'b', 5), ('a', 'c', 1), ('a', 'd', 4), ('b', 'c', 3), ('c', 'd', 2)],
  columns=['src', 'dst', 'weight']
)

cug_graph.from_cudf_edgelist(
    df,
    source='src',
    destination='dst',
    edge_attr='weight'
)

print('\n--------------------')
print(cug_graph.nodes())
print('\n--------------------')
print(cug_graph.edges())


--------------------
0    c
1    b
2    d
3    a
Name: 0, dtype: object

--------------------
  src dst
0   c   d
1   a   b
2   a   c
3   a   d
4   b   c


RAPIDS cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data. It provides a pandas-like API that will be familiar to data engineers & data scientists, so they can use it to easily accelerate their workflows without going into the details of CUDA programming.

For example, the following snippet downloads a CSV, then uses the GPU to parse it into rows and columns and run calculations:

In [36]:
# Load a dataset into a GPU memory resident DataFrame and perform a basic calculation.
# Everything from CSV parsing to calculating tip percentage and computing a grouped average is done on the GPU.

# download CSV file from GitHub
url="https://github.com/plotly/datasets/raw/master/tips.csv"
content = requests.get(url).content.decode('utf-8')

# read CSV from memory
tips_df = cudf.read_csv(io.StringIO(content))
tips_df['tip_percentage'] = tips_df['tip']/tips_df['total_bill']*100

# display average tip by dining party size
print(tips_df.groupby('size').tip_percentage.mean())

size
6    15.622920
1    21.729202
4    14.594901
3    15.215685
2    16.571919
5    14.149549
Name: tip_percentage, dtype: float64


The following snippet loads data into a cuGraph graph and computes PageRank:

In [38]:
# read data into a cuDF DataFrame using read_csv
gdf = cudf.read_csv(io.StringIO(content), names=["src", "dst"], dtype=["int32", "int32"])

# We now have data as edge pairs
# create a Graph using the source (src) and destination (dst) vertex pairs
G = cugraph.Graph()
G.from_cudf_edgelist(gdf, source='src', destination='dst')

# Let's now get the PageRank score of each vertex by calling cugraph.pagerank
df_page = cugraph.pagerank(G)

# Let's look at the PageRank Score (only do this on small graphs)
for i in range(len(df_page)):
	print("vertex " + str(df_page['vertex'].iloc[i]) +
		" PageRank is " + str(df_page['pagerank'].iloc[i]))

     src  dst
0      0    0
1     16    1
2     10    1
3     21    3
4     23    3
..   ...  ...
240   29    5
241   27    2
242   22    2
243   17    1
244   18    3

[245 rows x 2 columns]
vertex 8 PageRank is 0.011164907
vertex 9 PageRank is 0.01570843
vertex 23 PageRank is 0.02324468
vertex 28 PageRank is 0.015321213
vertex 34 PageRank is 0.01548651
vertex 26 PageRank is 0.011081555
vertex 27 PageRank is 0.015224752
vertex 29 PageRank is 0.011332559
vertex 30 PageRank is 0.015167612
vertex 32 PageRank is 0.019350136
vertex 31 PageRank is 0.015141399
vertex 38 PageRank is 0.011083909
vertex 48 PageRank is 0.01612841
vertex 35 PageRank is 0.011332559
vertex 40 PageRank is 0.011083909
vertex 0 PageRank is 0.022727273
vertex 16 PageRank is 0.018839726
vertex 13 PageRank is 0.014973748
vertex 15 PageRank is 0.014973748
vertex 10 PageRank is 0.01852377
vertex 12 PageRank is 0.011164907
vertex 17 PageRank is 0.018839726
vertex 20 PageRank is 0.01903359
vertex 18 PageRank is 0.018839726
v

# Create a Temporary ArangoDB Cloud Instance

In [12]:
# Request temporary instance from the managed ArangoDB Cloud Service.
con = get_temp_credentials()
print(json.dumps(con, indent=2))

# Connect to the instance via the python-arango driver
db = ArangoClient(hosts=con["url"]).db(con["dbName"], con["username"], con["password"], verify=True)

Log: requesting new credentials...
Succcess: new credentials acquired
{
  "dbName": "TUT4mnzcrc61phw4we9wpp7tg",
  "username": "TUTka66dmg6q2eb3vgyb1ywo",
  "password": "TUTxoj1trkqv5pzhfwjn8h2mh",
  "hostname": "tutorials.arangodb.cloud",
  "port": 8529,
  "url": "https://tutorials.arangodb.cloud:8529"
}


Feel free to use the above URL to check out the UI!

# Import Sample Data

For demo purposes, we will be using the [ArangoDB Fraud Detection example graph](https://colab.research.google.com/github/joerg84/Graph_Powered_ML_Workshop/blob/master/Fraud_Detection.ipynb), and the [ArangoDB IMDB Dataset](https://github.com/arangodb/example-datasets/tree/master/Graphs/IMDB).

In [13]:
!chmod -R 755 cugraph-adapter/
!./cugraph-adapter/tests/assets/arangorestore -c none --server.endpoint http+ssl://{con["hostname"]}:{con["port"]} --server.username {con["username"]} --server.database {con["dbName"]} --server.password {con["password"]} --replication-factor 3  --input-directory "cugraph-adapter/examples/data/fraud_dump" --include-system-collections true
!./cugraph-adapter/tests/assets/arangorestore -c none --server.endpoint http+ssl://{con["hostname"]}:{con["port"]} --server.username {con["username"]} --server.database {con["dbName"]} --server.password {con["password"]} --replication-factor 3  --input-directory "cugraph-adapter/examples/data/imdb_dump" --include-system-collections true

[0m2022-05-25T16:40:19Z [12055] INFO [05c30] {restore} Connected to ArangoDB 'http+ssl://tutorials.arangodb.cloud:8529'
[0m[0m2022-05-25T16:40:19Z [12055] INFO [abeb4] {restore} Database name in source dump is 'fraud-detection'
[0m[0m2022-05-25T16:40:19Z [12055] INFO [9b414] {restore} # Re-creating document collection '_analyzers'...
[0m[0m2022-05-25T16:40:19Z [12055] INFO [9b414] {restore} # Re-creating document collection '_appbundles'...
[0m[0m2022-05-25T16:40:24Z [12055] INFO [9b414] {restore} # Re-creating document collection '_apps'...
[0m[0m2022-05-25T16:40:24Z [12055] INFO [9b414] {restore} # Re-creating document collection '_aqlfunctions'...
[0m[0m2022-05-25T16:40:24Z [12055] INFO [9b414] {restore} # Re-creating document collection '_graphs'...
[0m[0m2022-05-25T16:40:24Z [12055] INFO [9b414] {restore} # Re-creating document collection '_modules'...
[0m[0m2022-05-25T16:40:26Z [12055] INFO [9b414] {restore} # Re-creating document collection 'account'...
[0m[0m

# Instantiate the Adapter

Connect the ArangoDB-cuGraph Adapter to our database client:

In [14]:
adbcug_adapter = ADBCUG_Adapter(db)

[2022/05/25 16:40:48 +0000] [2488] [INFO] - adbcug_adapter: Instantiated ADBCUG_Adapter with database 'TUT4mnzcrc61phw4we9wpp7tg'


# <u>ArangoDB to cuGraph</u>



#### Via ArangoDB Graph Name

Data source
* ArangoDB Fraud-Detection Graph

Package methods used
* [`adbcug_adapter.adapter.arangodb_graph_to_cugraph()`](https://github.com/arangoml/cugraph-adapter/blob/master/adbcug_adapter/adapter.py)

Important notes
* The graph `name` must point to an existing ArangoDB graph
* cuGraph does not support node or edge attributes (apart from edge weight)
* If an ArangoDB edge has an attribute named `weight`, its value will be transferred over to the cuGraph graph. Otherwise, the cuGraph edge weight will default to `0`.

In [63]:
# Define graph name
graph_name = "fraud-detection"

# Create cuGraph graph from ArangoDB graph name
cug_graph = adbcug_adapter.arangodb_graph_to_cugraph(graph_name)

# You can also provide valid Python-Arango AQL query options to the command above, like such:
# cug_graph = adbcug_adapter.arangodb_graph_to_cugraph(graph_name, ttl=1000, stream=True)
# See more here: https://docs.python-arango.com/en/main/specs.html#arango.aql.AQL.execute

# Show graph data
print('\n--------------------')
print(cug_graph.nodes())
print('\n--------------------')
print(cug_graph.edges())

[2022/05/25 17:05:57 +0000] [2488] [DEBUG] - adbcug_adapter: Starting arangodb_to_cugraph(fraud-detection, ...):
[2022/05/25 17:05:57 +0000] [2488] [DEBUG] - adbcug_adapter: Preparing 'account' vertices
[2022/05/25 17:05:57 +0000] [2488] [DEBUG] - adbcug_adapter: Preparing 'customer' vertices
[2022/05/25 17:05:57 +0000] [2488] [DEBUG] - adbcug_adapter: Preparing 'accountHolder' edges
[2022/05/25 17:05:57 +0000] [2488] [DEBUG] - adbcug_adapter: Preparing 'transaction' edges
[2022/05/25 17:05:57 +0000] [2488] [DEBUG] - adbcug_adapter: Inserting 116 edges
[2022/05/25 17:05:57 +0000] [2488] [INFO] - adbcug_adapter: Created cuGraph 'fraud-detection' Graph



--------------------
0      account/10000001
1      account/10000002
2      account/10000003
3      account/10000004
4      account/10000005
            ...        
66    customer/10000013
67    customer/10000014
68    customer/10000015
69    customer/10000016
70       customer/10810
Length: 71, dtype: object

--------------------
                  src                dst
0    account/10000022  customer/10000006
1    account/10000001  customer/10000008
2    account/10000034  customer/10000012
3    account/10000032  customer/10000011
4    account/10000027  customer/10000002
..                ...                ...
111  account/10000006   account/10000003
112  account/10000022   account/10000021
113  account/10000032   account/10000035
114  account/10000040   account/10000043
115  account/10000016   account/10000015

[116 rows x 2 columns]


#### Via ArangoDB Collection Names

Data source
* ArangoDB Fraud-Detection Collections

Package methods used
* [`adbcug_adapter.adapter.arangodb_collections_to_cugraph()`](https://github.com/arangoml/cugraph-adapter/blob/master/adbcug_adapter/adapter.py)

Important notes
* The `vertex_collections` & `edge_collections` parameters must point to existing ArangoDB collections within your ArangoDB instance.
* cuGraph does not support node or edge attributes (apart from edge weight)
* If an ArangoDB edge has an attribute named `weight`, its value will be transferred over to the cuGraph graph. Otherwise, the cuGraph edge weight will default to `0`.

In [17]:
# Define collection
vertex_collections = {"account", "bank", "branch", "Class", "customer"}
edge_collections = {"accountHolder", "Relationship", "transaction"}

# Create NetworkX graph from ArangoDB collections
cug_graph = adbcug_adapter.arangodb_collections_to_cugraph("fraud-detection", vertex_collections, edge_collections)

# You can also provide valid Python-Arango AQL query options to the command above, like such:
# cug_graph = adbcug_adapter.arangodb_collections_to_cugraph, ttl=1000, stream=True)
# See more here: https://docs.python-arango.com/en/main/specs.html#arango.aql.AQL.execute

# Show graph data
print('\n--------------------')
print(cug_graph.nodes())
print('\n--------------------')
print(cug_graph.edges())

[2022/05/25 16:43:45 +0000] [2488] [INFO] - adbcug_adapter: Created cuGraph 'fraud-detection' Graph



--------------------
0         Class/account
1            Class/bank
2          Class/branch
3        Class/customer
4      account/10000001
            ...        
70    customer/10000013
71    customer/10000014
72    customer/10000015
73    customer/10000016
74       customer/10810
Length: 75, dtype: object

--------------------
                  src                dst
0    account/10000022  customer/10000006
1    account/10000001  customer/10000008
2    account/10000034  customer/10000012
3    account/10000032  customer/10000011
4    account/10000027  customer/10000002
..                ...                ...
115  account/10000016   account/10000015
116     Class/account      Class/account
117     Class/account     Class/customer
118    Class/customer       Class/branch
119      Class/branch         Class/bank

[120 rows x 2 columns]


#### Via ArangoDB Graph Name with a custom ADBCUG_Controller & verbose logging

Data source
* ArangoDB Fraud-Detection Collections

Package methods used
* [`adbcug_adapter.adapter.arangodb_graph_to_cugraph()`](https://github.com/arangoml/cugraph-adapter/blob/master/adbcug_adapter/adapter.py)
* [`adbcug_adapter.controller._prepare_arangodb_vertex()`](https://github.com/arangoml/cugraph-adapter/blob/master/adbcug_adapter/controller.py)

Important notes
* We are creating a custom `ADBCUG_Controller` to specify *how* to convert our ArangoDB vertex IDs into cuGraph node IDs. View the default `ADBCUG_Controller` [here](https://github.com/arangoml/cugraph-adapter/blob/master/adbcug_adapter/controller.py).
* Using a custom ADBCUG Controller for `ArangoDB --> cuGraph` is optional. However, a custom ADBCUG Controller for `cuGraph --> ArangoDB` functionality is almost always needed, at the exception of Homogeneous graphs, and graphs where the node IDs are already formatted to the ArangoDB vertex ID standard (i.e `collection/_key`)

In [19]:
# Define metagraph
graph_name = "fraud-detection"

class Custom_ADBCUG_Controller(ADBCUG_Controller):
    """ArangoDB-cuGraph controller.

    Responsible for controlling how nodes & edges are handled when
    transitioning from ArangoDB to cuGraph.

    You can derive your own custom ADBCUG_Controller.
    """

    def _prepare_arangodb_vertex(self, adb_vertex: Json, col: str) -> None:
        """Prepare an ArangoDB vertex before it gets inserted into the cuGraph
        graph.

        Given an ArangoDB vertex, you can modify it before it gets inserted
        into the cuGraph graph, and/or derive a custom node id for cuGraph
        to use by updating the "_id" attribute of the vertex (otherwise the
        vertex's current "_id" value will be used)

        :param adb_vertex: The ArangoDB vertex object to (optionally) modify.
        :type adb_vertex: adbcug_adapter.typings.Json
        :param col: The ArangoDB collection the vertex belongs to.
        :type col: str
        """
        # Custom behaviour: Add a "_new" prefix to every vertex ID
        adb_vertex["_id"] = "new_" + adb_vertex["_id"]

# Instantiate a new adapter with the custom controller
custom_adbcug_adapter = ADBCUG_Adapter(db, controller=Custom_ADBCUG_Controller())

# You can also change the adapter's logging level for access to 
# silent, regular, or verbose logging (logging.WARNING, logging.INFO, logging.DEBUG)
custom_adbcug_adapter.set_logging(logging.DEBUG) # verbose logging

# Create cuGraph Graph an ArangoDB graph using the custom adapter
cug_graph = custom_adbcug_adapter.arangodb_graph_to_cugraph("fraud-detection")

# Show graph data
print('\n--------------------')
print(cug_graph.nodes())
print('\n--------------------')
print(cug_graph.edges())

[2022/05/25 16:44:10 +0000] [2488] [INFO] - adbcug_adapter: Instantiated ADBCUG_Adapter with database 'TUT4mnzcrc61phw4we9wpp7tg'
[2022/05/25 16:44:10 +0000] [2488] [DEBUG] - adbcug_adapter: Starting arangodb_to_cugraph(fraud-detection, ...):
[2022/05/25 16:44:10 +0000] [2488] [DEBUG] - adbcug_adapter: Preparing 'account' vertices
[2022/05/25 16:44:10 +0000] [2488] [DEBUG] - adbcug_adapter: Preparing 'customer' vertices
[2022/05/25 16:44:10 +0000] [2488] [DEBUG] - adbcug_adapter: Preparing 'accountHolder' edges
[2022/05/25 16:44:10 +0000] [2488] [DEBUG] - adbcug_adapter: Preparing 'transaction' edges
[2022/05/25 16:44:10 +0000] [2488] [DEBUG] - adbcug_adapter: Inserting 116 edges
[2022/05/25 16:44:10 +0000] [2488] [INFO] - adbcug_adapter: Created cuGraph 'fraud-detection' Graph



--------------------
0      new_account/10000001
1      new_account/10000002
2      new_account/10000003
3      new_account/10000004
4      new_account/10000005
              ...          
66    new_customer/10000013
67    new_customer/10000014
68    new_customer/10000015
69    new_customer/10000016
70       new_customer/10810
Length: 71, dtype: object

--------------------
                      src                    dst
0    new_account/10000022  new_customer/10000006
1    new_account/10000001  new_customer/10000008
2    new_account/10000034  new_customer/10000012
3    new_account/10000032  new_customer/10000011
4    new_account/10000027  new_customer/10000002
..                    ...                    ...
111  new_account/10000006   new_account/10000003
112  new_account/10000022   new_account/10000021
113  new_account/10000032   new_account/10000035
114  new_account/10000040   new_account/10000043
115  new_account/10000016   new_account/10000015

[116 rows x 2 columns]


# <u>cuGraph to ArangoDB</u>

#### Karate Graph

Data source
* [cuGraph 22.06 Datasets](https://github.com/rapidsai/cugraph/blob/branch-22.06/datasets/karate.csv)

Package methods used
* [`adbcug_adapter.adapter.cugraph_to_arangodb()`](https://github.com/arangoml/cugraph-adapter/blob/master/adbcug_adapter/adapter.py)

Important notes
* A custom `ADBCUG Controller` is **not** required here. This is because the karate graph only has 1 vertex collection (`karateka`), and 1 edge collection (`knows`). See the edge definitions below 

In [60]:
# Fetch Karate Club data
!wget https://raw.githubusercontent.com/rapidsai/cugraph/branch-22.06/datasets/karate.csv
dataframe = cudf.read_csv("karate-data.csv", delimiter='\t', names=['src', 'dst'], dtype=['int32', 'int32'] )

# Create the cuGraph graph
cug_graph = cugraph.Graph()
cug_graph.from_cudf_edgelist(dataframe, source='src', destination='dst')

# Specify ArangoDB edge definitions
edge_definitions = [
    {
        "edge_collection": "knows",
        "from_vertex_collections": ["karateka"],
        "to_vertex_collections": ["karateka"],
    }
]

# Create ArangoDB graph from cuGraph
name = "KarateClubGraph"
db.delete_graph(name, drop_collections=True, ignore_missing=True)
adb_graph = adbcug_adapter.cugraph_to_arangodb(name, cug_graph, edge_definitions)

print('\n--------------------')
print("URL: " + con["url"])
print("Username: " + con["username"])
print("Password: " + con["password"])
print("Database: " + con["dbName"])
print('--------------------\n')
print(f"View the created graph here: {con['url']}/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{name}")

--2022-05-25 17:05:34--  https://raw.githubusercontent.com/rapidsai/cugraph/branch-22.06/datasets/karate.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1434 (1.4K) [text/plain]
Saving to: ‘karate.csv.4’


2022-05-25 17:05:34 (23.8 MB/s) - ‘karate.csv.4’ saved [1434/1434]



[2022/05/25 17:05:34 +0000] [2488] [DEBUG] - adbcug_adapter: Starting cugraph_to_arangodb('KarateClubGraph', ...):
[2022/05/25 17:05:34 +0000] [2488] [DEBUG] - adbcug_adapter: Is graph 'KarateClubGraph' homogeneous? True
[2022/05/25 17:05:34 +0000] [2488] [DEBUG] - adbcug_adapter: Preparing 34 cugraph nodes
[2022/05/25 17:05:34 +0000] [2488] [DEBUG] - adbcug_adapter: Preparing 78 cugraph edges
[2022/05/25 17:05:34 +0000] [2488] [DEBUG] - adbcug_adapter: Inserting last 34 documents into 'karateka'
[2022/05/25 17:05:34 +0000] [2488] [DEBUG] - adbcug_adapter: Inserting last 78 documents into 'knows'
[2022/05/25 17:05:34 +0000] [2488] [INFO] - adbcug_adapter: Created ArangoDB 'KarateClubGraph' Graph



--------------------
URL: https://tutorials.arangodb.cloud:8529
Username: TUTka66dmg6q2eb3vgyb1ywo
Password: TUTxoj1trkqv5pzhfwjn8h2mh
Database: TUT4mnzcrc61phw4we9wpp7tg
--------------------

View the created graph here: https://tutorials.arangodb.cloud:8529/_db/TUT4mnzcrc61phw4we9wpp7tg/_admin/aardvark/index.html#graph/KarateClubGraph


#### Divisibility Graph

Data source
* No source

Package methods used
* [`adbcug_adapter.adapter.cugraph_to_arangodb()`](https://github.com/arangoml/cugraph-adapter/blob/master/adbcug_adapter/adapter.py)

Important notes
* Even if this graph has more than 1 vertex collection, a custom `ADBCUG Controller` is still **not** required here. This is because the cuGraph Node IDs are already formatted to ArangoDB standard, so the default ADBCUG Controller will take care of node identification (see [`_identify_cugraph_node()`](https://github.com/arangoml/cugraph-adapter/blob/master/adbcug_adapter/controller.py))

In [61]:
# Create the cuGraph graph
cug_graph = cugraph.MultiGraph(directed=True)
cug_graph.from_cudf_edgelist(
    cudf.DataFrame(
        [
            (f"numbers_j/{j}", f"numbers_i/{i}", j / i)
            for i in range(1, 101)
            for j in range(1, 101)
            if j % i == 0
        ],
        columns=["src", "dst", "weight"],
    ),
    source="src",
    destination="dst",
    edge_attr="weight",
    renumber=False,
)

# Specify ArangoDB edge definitions
edge_definitions = [
    {
        "edge_collection": "is_divisible_by",
        "from_vertex_collections": ["numbers_j"],
        "to_vertex_collections": ["numbers_i"],
    }
]

# Create ArangoDB graph from cuGraph
name = "DivisibilityGraph"
db.delete_graph(name, drop_collections=True, ignore_missing=True)
adb_graph = adbcug_adapter.cugraph_to_arangodb(name, cug_graph, edge_definitions, keyify_nodes=True)


print('\n--------------------')
print("URL: " + con["url"])
print("Username: " + con["username"])
print("Password: " + con["password"])
print("Database: " + con["dbName"])
print('--------------------\n')
print(f"View the created graph here: {con['url']}/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{name}")

[2022/05/25 17:05:44 +0000] [2488] [DEBUG] - adbcug_adapter: Starting cugraph_to_arangodb('DivisibilityGraph', ...):
[2022/05/25 17:05:45 +0000] [2488] [DEBUG] - adbcug_adapter: Is graph 'DivisibilityGraph' homogeneous? False
[2022/05/25 17:05:45 +0000] [2488] [DEBUG] - adbcug_adapter: Preparing 200 cugraph nodes
[2022/05/25 17:05:45 +0000] [2488] [DEBUG] - adbcug_adapter: Preparing 482 cugraph edges
[2022/05/25 17:05:45 +0000] [2488] [DEBUG] - adbcug_adapter: Inserting last 100 documents into 'numbers_i'
[2022/05/25 17:05:45 +0000] [2488] [DEBUG] - adbcug_adapter: Inserting last 100 documents into 'numbers_j'
[2022/05/25 17:05:45 +0000] [2488] [DEBUG] - adbcug_adapter: Inserting last 482 documents into 'is_divisible_by'
[2022/05/25 17:05:45 +0000] [2488] [INFO] - adbcug_adapter: Created ArangoDB 'DivisibilityGraph' Graph



--------------------
URL: https://tutorials.arangodb.cloud:8529
Username: TUTka66dmg6q2eb3vgyb1ywo
Password: TUTxoj1trkqv5pzhfwjn8h2mh
Database: TUT4mnzcrc61phw4we9wpp7tg
--------------------

View the created graph here: https://tutorials.arangodb.cloud:8529/_db/TUT4mnzcrc61phw4we9wpp7tg/_admin/aardvark/index.html#graph/DivisibilityGraph


#### School Graph with a custom ADBCUG_Controller

Data source
* No source, the graph data is arbitrary

Package methods used
* [`adbcug_adapter.adapter.cugraph_to_arangodb()`](https://github.com/arangoml/cugraph-adapter/blob/master/adbcug_adapter/adapter.py)

Important notes
* Here we demonstrate the functionality of having a custom `ADBCUG_Controller`, that overrides the [default ADBCUG_Controller](https://github.com/arangoml/cugraph-adapter/blob/master/adbcug_adapter/controller.py).
* Recall that a custom ADBCUG Controller for `cuGraph --> ArangoDB` functionality is almost always needed, at the exception of Homogeneous graphs, and graphs where the node IDs are already formatted to the ArangoDB vertex ID standard (i.e `collection/_key`)

In [62]:
# Load some arbitary data
df = cudf.DataFrame(
  [
   ('student:101', 'lecture:101'), 
   ('student:102', 'lecture:102'), 
   ('student:103', 'lecture:103'), 
   ('student:103', 'student:101'), 
   ('student:103', 'student:102'),
   ('teacher:101', 'lecture:101'),
   ('teacher:102', 'lecture:102'),
   ('teacher:103', 'lecture:103'),
   ('teacher:101', 'teacher:102'),
   ('teacher:102', 'teacher:103')
  ],
  columns=['src', 'dst']
)

# Create the cuGraph graph
cug_graph = cugraph.MultiGraph(directed=True)
cug_graph.from_cudf_edgelist(df, source='src', destination='dst')

# Specify ArangoDB edge definitions
edge_definitions = [
    {
        "edge_collection": "attends",
        "from_vertex_collections": ["student"],
        "to_vertex_collections": ["lecture"],
    },
    {
        "edge_collection": "classmate",
        "from_vertex_collections": ["student"],
        "to_vertex_collections": ["student"],
    },
    {
        "edge_collection": "teaches",
        "from_vertex_collections": ["teacher"],
        "to_vertex_collections": ["lecture"],
    },
    {
        "edge_collection": "colleague",
        "from_vertex_collections": ["teacher"],
        "to_vertex_collections": ["teacher"],
    }
]


# Given our graph is heterogeneous, and has a non-ArangoDB way of
# formatting its Node IDs, we must derive a custom ABCCUG Controller
# to handle this behavior.
class Custom_ADBCUG_Controller(ADBCUG_Controller):
  """ArangoDB-cuGraph controller.

  Responsible for controlling how nodes & edges are handled when
  transitioning from ArangoDB to cuGraph.

  You can derive your own custom ADBCUG_Controller.
  """

  def _identify_cugraph_node(self, cug_node_id: CUGId, adb_v_cols: List[str]) -> str:
    """Given a cuGraph node, and a list of ArangoDB vertex collections defined,
    identify which ArangoDB vertex collection it should belong to.

    NOTE: You must override this function if len(**adb_v_cols**) > 1
    OR **cug_node_id* does NOT comply to ArangoDB standards
    (i.e "{collection}/{key}").

    :param cug_node_id: The cuGraph ID of the vertex.
    :type cug_node_id: adbcug_adapter.typings.CUGId
    :param adb_v_cols: All ArangoDB vertex collections specified
        by the **edge_definitions** parameter of cugraph_to_arangodb()
    :type adb_v_cols: List[str]
    :return: The ArangoDB collection name
    :rtype: str
    """
    return str(cug_node_id).split(":")[0] # Identify node based on ':' split

  def _identify_cugraph_edge(
      self,
      from_cug_node: Json,
      to_cug_node: Json,
      adb_e_cols: List[str],
  ) -> str:
    """Given a pair of connected cuGraph nodes, and a list of ArangoDB
    edge collections defined, identify which ArangoDB edge collection it
    should belong to.

    NOTE: You must override this function if len(**adb_e_cols**) > 1.

    NOTE #2: The pair of associated cuGraph nodes can be accessed
    by the **from_cug_node** & **to_cug_node** parameters, and are guaranteed
    to have the following attributes: `{"cug_id", "adb_id", "adb_col", "adb_key"}`

    :param from_cug_node: The cuGraph node representing the edge source.
    :type from_cug_node: adbcug_adapter.typings.Json
    :param to_cug_node: The cuGraph node representing the edge destination.
    :type to_cug_node: adbcug_adapter.typings.Json
    :param adb_e_cols: All ArangoDB edge collections specified
        by the **edge_definitions** parameter of
        ADBCUG_Adapter.cugraph_to_arangodb()
    :type adb_e_cols: List[str]
    :return: The ArangoDB collection name
    :rtype: str
    """
    from_col = from_cug_node["adb_col"] # From node collection
    to_col = to_cug_node["adb_col"] # To node collection

    if from_col == "student" and to_col == "lecture":
      return "attends"
    elif from_col == to_col == "student":
      return "classmate"
    elif from_col == "teacher" and to_col == "lecture":
      return "teaches"
    elif from_col == to_col == "teacher":
      return "colleague"
    else:
      raise ValueError(f"Unknown edge relationship between {from_cug_node} and {to_cug_node}")

  def _keyify_cugraph_node(self, cug_node_id: CUGId, col: str) -> str:
    """Given a cuGraph node, derive its valid ArangoDB key.

    NOTE: You can override this function if you want to create custom ArangoDB _key
    values from your cuGraph nodes. To enable the use of this method, enable the
    **keyify_nodes** parameter in ADBCUG_Adapter.cugraph_to_arangodb().

    :param cug_node_id: The cuGraph node id.
    :type cug_node_id: adbcug_adapter.typings.CUGId
    :param col: The ArangoDB collection the vertex belongs to.
    :type col: str
    :return: A valid ArangoDB _key value.
    :rtype: str
    """
    return str(cug_node_id).split(":")[1] # Keyify node based on ':' split


# Instantiate the adapter
custom_adbcug_adapter = ADBCUG_Adapter(db, Custom_ADBCUG_Controller())
custom_adbcug_adapter.set_logging(logging.DEBUG) # Update logging to verbose

# Create the ArangoDB graph
name = "SchoolGraph"
db.delete_graph(name, drop_collections=True, ignore_missing=True)
adb_g = custom_adbcug_adapter.cugraph_to_arangodb(name, cug_graph, edge_definitions, keyify_nodes=True)

print('\n--------------------')
print("URL: " + con["url"])
print("Username: " + con["username"])
print("Password: " + con["password"])
print("Database: " + con["dbName"])
print('--------------------\n')
print(f"View the created graph here: {con['url']}/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{name}")


[2022/05/25 17:05:50 +0000] [2488] [INFO] - adbcug_adapter: Instantiated ADBCUG_Adapter with database 'TUT4mnzcrc61phw4we9wpp7tg'
[2022/05/25 17:05:51 +0000] [2488] [DEBUG] - adbcug_adapter: Starting cugraph_to_arangodb('SchoolGraph', ...):
[2022/05/25 17:05:51 +0000] [2488] [DEBUG] - adbcug_adapter: Is graph 'SchoolGraph' homogeneous? False
[2022/05/25 17:05:51 +0000] [2488] [DEBUG] - adbcug_adapter: Preparing 9 cugraph nodes
[2022/05/25 17:05:51 +0000] [2488] [DEBUG] - adbcug_adapter: Preparing 10 cugraph edges
[2022/05/25 17:05:51 +0000] [2488] [DEBUG] - adbcug_adapter: Inserting last 3 documents into 'student'
[2022/05/25 17:05:51 +0000] [2488] [DEBUG] - adbcug_adapter: Inserting last 3 documents into 'teacher'
[2022/05/25 17:05:51 +0000] [2488] [DEBUG] - adbcug_adapter: Inserting last 3 documents into 'lecture'
[2022/05/25 17:05:51 +0000] [2488] [DEBUG] - adbcug_adapter: Inserting last 3 documents into 'attends'
[2022/05/25 17:05:51 +0000] [2488] [DEBUG] - adbcug_adapter: Insertin


--------------------
URL: https://tutorials.arangodb.cloud:8529
Username: TUTka66dmg6q2eb3vgyb1ywo
Password: TUTxoj1trkqv5pzhfwjn8h2mh
Database: TUT4mnzcrc61phw4we9wpp7tg
--------------------

View the created graph here: https://tutorials.arangodb.cloud:8529/_db/TUT4mnzcrc61phw4we9wpp7tg/_admin/aardvark/index.html#graph/SchoolGraph
