<a href="https://colab.research.google.com/github/Monopoly233/hackton_forAI_Nv/blob/main/HackathonNotebookTemplate.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<span style="color:darkolivegreen;font-weight:800;font-size:32px">
    Building Agentic Apps: ArangoDB, NVIDIA cuGraph, and NetworkX Hackathon
</span>

<br>

<p align="center">
    <img src="https://arangodb.com/wp-content/uploads/2016/05/ArangoDB_logo_avocado_@1.png" style="height: 50px;">
    <img src="https://www.nvidia.com/content/dam/en-zz/Solutions/about-nvidia/logo-and-brand/02-nvidia-logo-color-grn-500x200-4c25-p@2x.png" style="height: 50px;">
    <img src="https://rapids.ai/images/RAPIDS-logo.png" style="height: 50px;">
    <img src="https://avatars.githubusercontent.com/u/388785?s=200&v=4" style="height: 50px;">
</p>

# Overview

Imagine competing in a high-stakes hackathon where the cutting edge of AI - Agentic Applications, GraphRAG, and NVIDIA cuGraph GPU-accelerated graph analytics - converge to redefine the future of GenAI-powered business solutions. With GraphRAG emerging as the most advanced Retrieval-Augmented Generation (RAG) approach, this competition challenges you to develop an intelligent agent that can query and reason over graph data with precision and efficiency.

### **Why GraphRAG**?
Traditional RAG systems rely solely on vector-based retrieval, which can lead to hallucinations, context fragmentation, and a lack of structured reasoning. GraphRAG addresses these shortcomings by integrating graph-based retrieval, preserving contextual relationships between entities and enabling more accurate, structured, and interpretable AI-generated responses.  
By leveraging GraphRAG, participants can minimize hallucinations, improve knowledge retrieval, and enhance AI-generated insights—a fundamental breakthrough for enterprises using generative AI in business-critical applications.

### **Your Mission**
Build an Agentic Application that integrates GraphRAG and GPU-accelerated graph analytics to solve a real-world problem.  
Participants have two dataset options:
- **Bring Your Own Data (BYOD)** – Select an open dataset relevant to your use case.
- **Use ArangoDB’s Provided Dataset** – A pre-configured dataset optimized for graph analytics and GraphRAG tasks.

To streamline development, ArangoDB will provide a sample Jupyter Notebook with pre-built placeholders to help structure each step of the process.


# Context

### **ArangoDB & NVIDIA**

[**ArangoDB**](https://arangodb.com/) is a multi-model, open-source NoSQL database that supports graph, document, and key-value data models within a single database engine. This flexibility enables developers to handle complex, interconnected data with ease, as opposed to traditional databases that might require multiple systems for different data types. ArangoDB is designed to provide scalability, reliability, and speed, and is commonly used for applications requiring real-time analytics, social networks, recommendation engines, and other data-intensive workloads.

**Key Features of ArangoDB**

1. **Multi-Model Approach**:
   - ArangoDB supports different data models: Document (similar to MongoDB), Graph (similar to Neo4j), Search (similar to ElasticSearch), and (most recently) Vector (similar to FAISS). This multi-model approach allows developers to choose the best model for different parts of their application without needing to integrate multiple databases.

2. **AQL (Arango Query Language)**:
   - ArangoDB uses its own query language, AQL, which is similar to SQL but tailored for its multi-model structure. It supports complex queries across graphs, documents, and key-value stores, simplifying development and reducing the need for complex joins between different data stores.

3. **Horizontal Scaling**:
   - ArangoDB supports sharding, replication, and distributed deployments, ensuring that applications can scale horizontally, distributing data across multiple servers while maintaining high availability.

4. **ACID Transactions**:
   - Unlike some NoSQL databases, ArangoDB offers ACID-compliant transactions, making it suitable for applications requiring strong consistency and reliability.

5. **Graph Processing**:
   - The graph database model is particularly useful for applications that need to process relationships between entities, such as social media platforms, fraud detection systems, or knowledge graphs.

**Relationship with NVIDIA**

The close relationship between **ArangoDB** and **NVIDIA** stems from the increasing need for high-performance data processing, particularly in graph analytics and AI-related workloads. NVIDIA, known for its leadership in GPU technologies, provides the computational power necessary to accelerate certain database operations, especially those related to complex analytics.

Key Points in the relationship:

1. **GPU Acceleration for Graph Processing**:
   - Graph analytics is a computationally intensive task, and the parallel processing power of NVIDIA GPUs can significantly speed up graph-related queries. ArangoDB has explored integrating GPU acceleration for graph queries and other high-performance data processing tasks, which helps in reducing the time required to analyze complex networks of data.

2. **AI/ML Integration**:
   - With the rise of AI and machine learning, ArangoDB’s ability to handle multi-model data is crucial. NVIDIA’s technologies, such as GPUs and CUDA (Compute Unified Device Architecture), can accelerate machine learning models and data pipelines that interact with ArangoDB, improving overall performance and enabling real-time analytics for AI-based applications.

3. **Collaborative Projects**:
   - NVIDIA and ArangoDB have explored joint initiatives to enhance the database’s performance, particularly in the context of graph-based machine learning and AI research. This includes optimizing database queries and enabling more efficient data processing workflows, which can be crucial for industries like finance, healthcare, and autonomous driving.

### **NetworkX & cuGraph**

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. It provides:
* tools for the study of the structure and dynamics of social, biological, and infrastructure networks;
* a standard programming interface and graph implementation that is suitable for many applications;
* a rapid development environment for collaborative, multidisciplinary projects;
* an interface to existing numerical algorithms and code written in C, C++, and FORTRAN; and
* the ability to painlessly work with large nonstandard data sets.

With NetworkX you can load and store networks in standard and nonstandard data formats, generate many types of random and classic networks, analyze network structure, build network models, design new network algorithms, draw networks, and much more.

While NetworkX provides a tremendous amount of usability right out of the box, **performance and scalability for medium-to-large-sized networks are far from best-in-class and can significantly limit a data scientist’s productivity**.

The **RAPIDS cuGraph** project was created to bridge the gap between fast, scalable, **GPU-based graph analytics** and NetworkX ease-of-use.

cuGraph was designed with NetworkX interoperability in mind, which can be seen when you replace only the `betweenness_centrality` call from the prior example with cuGraph’s `betweenness_centrality` and leave the rest of the code as-is.

**Meanwhile, NetworkX adds dispatching to [backends](https://networkx.org/documentation/stable/backends.html)…**

NetworkX has recently added the ability to dispatch API calls to different analytic backends provided by third parties. These backends can provide alternate implementations for various NetworkX APIs that can greatly improve performance.

By enabling other graph libraries to easily extend NetworkX through backends, NetworkX becomes a standard graph analytics frontend.  This means more users can use the capabilities of other graph libraries without the learning curve and integration time associated with a new library.

Library maintainers also benefit from NetworkX dispatching because they can reach more users without the overhead of maintaining a user-facing API. Instead, they can just focus on delivering a backend.

And thus the first NetworkX GPU Backend was released: [nx-cugraph](https://docs.rapids.ai/api/cugraph/nightly/nx_cugraph/nx_cugraph/)

**However, NetworKX remains in-memory only...**

The opportunity to introduce a **Persistence Backend** was presented to ArangoDB via our parternship with NVIDIA, resulting in the release of `nx-arangodb`, and a Developer Blog Post published by NVIDIA: https://developer.nvidia.com/blog/accelerated-production-ready-graph-analytics-for-networkx-users/

<p align="center">
    <img src="attachment:b5af8b9d-6ec3-4eba-a73d-934ea6f1439c.png" style="height: 500px;">
</p>

#### **NetworkX-ArangoDB**


`nx-arangodb` is a [backend to NetworkX](https://networkx.org/documentation/stable/reference/backends.html) that offers [ArangoDB](https://github.com/arangodb/arangodb) as a [Persistence Layer to NetworkX Graphs](https://arangodb.com/introducing-the-arangodb-networkx-persistence-layer/):
1. Persist NetworkX Graphs to ArangoDB.
2. Reload NetworkX Graphs from ArangoDB.
2. Perform CRUD on ArangoDB Graphs via NetworkX.
3. Run algorithms (CPU & GPU) on ArangoDB Graphs via NetworkX.

Benefits of having ArangoDB as a backend to NetworkX include:
1. No need to re-create the graph every time you start a new session.
2. Access to GPU-accelerated graph analytics ([nx-cugraph](https://rapids.ai/nx-cugraph/)).
3. Access to a database query language ([Arango Query Language](https://arangodb.com/sql-aql-comparison/)).
4. Access to a visual interface for graph exploration ([ArangoDB Web UI](https://docs.arangodb.com/stable/components/web-interface/graphs/)).
5. Access to cross-collaboration on the same graph ([ArangoDB Cloud](https://docs.arangodb.com/stable/get-started/set-up-a-cloud-instance/)).
6. Access to efficient distribution of graph data ([ArangoDB SmartGraphs](https://docs.arangodb.com/stable/graphs/smartgraphs/)).

Additional Documentation:
- [NetworkX Hello World](https://networkx.org/documentation/stable/tutorial.html)
- [nx-cugraph](https://docs.rapids.ai/api/cugraph/nightly/nx_cugraph/nx_cugraph/)


<p align="center">
    <img src="https://raw.githubusercontent.com/arangodb/nx-arangodb/main/doc/_static/nxadb.png" style="height: 200px;">
    <img src="https://raw.githubusercontent.com/arangodb/nx-arangodb/main/doc/_static/dispatch.png" style="height: 200px;">
</p>

# Template

**For demo purposes, this Notebook Template will use the [Social Circle dataset from SNAP](https://snap.stanford.edu/data/ego-Facebook.html) to provide a walk-through of the expected Hackathon Submission for attendees.**

This facebook network is undirected and has no weights because one user can become friends with another user just once. Looking at the dataset from a graph analysis perspective:
- Each node represents an anonymized facebook user that belongs to one of those ten friends lists.
- Each edge corresponds to the friendship of two facebook users that belong to this network. In other words, two users must become friends on facebook in order for them to be connected in the particular network.

**Note that this dataset is not permitted to be used for submission purposes, it is only meant to be shown as an example.**

### Step 0: Package Installation & setup

In [1]:
# 1. Install nx-arangodb via pip
# Github: https://github.com/arangodb/nx-arangodb

!pip install nx-arangodb

Collecting nx-arangodb
  Downloading nx_arangodb-1.3.0-py3-none-any.whl.metadata (9.3 kB)
Collecting networkx<=3.4,>=3.0 (from nx-arangodb)
  Downloading networkx-3.4-py3-none-any.whl.metadata (6.3 kB)
Collecting phenolrs~=0.5 (from nx-arangodb)
  Downloading phenolrs-0.5.9-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.7 kB)
Collecting python-arango~=8.1 (from nx-arangodb)
  Downloading python_arango-8.1.6-py3-none-any.whl.metadata (8.2 kB)
Collecting adbnx-adapter~=5.0.5 (from nx-arangodb)
  Downloading adbnx_adapter-5.0.6-py3-none-any.whl.metadata (21 kB)
Downloading nx_arangodb-1.3.0-py3-none-any.whl (67 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m67.8/67.8 kB[0m [31m1.4 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading adbnx_adapter-5.0.6-py3-none-any.whl (21 kB)
Downloading networkx-3.4-py3-none-any.whl (1.7 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.7/1.7 MB[0m [31m23.8 MB/s[0m eta [36m0:00:00[0m
[?25hD

In [2]:
# 2. Check if you have an NVIDIA GPU
# Note: If this returns "command not found", then GPU-based algorithms via cuGraph are unavailable

!nvidia-smi
!nvcc --version

Sun Mar  9 22:15:46 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   44C    P8             11W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [3]:
# 3. Install nx-cugraph via pip
# Note: Only enable this installation if the step above is working!

!pip install nx-cugraph-cu12 --extra-index-url https://pypi.nvidia.com # Requires CUDA-capable GPU

Looking in indexes: https://pypi.org/simple, https://pypi.nvidia.com


In [4]:
# 4. Install LangChain & LangGraph

!pip install --upgrade langchain langchain-community langchain-openai langgraph

Collecting langchain
  Downloading langchain-0.3.20-py3-none-any.whl.metadata (7.7 kB)
Collecting langchain-community
  Downloading langchain_community-0.3.19-py3-none-any.whl.metadata (2.4 kB)
Collecting langchain-openai
  Downloading langchain_openai-0.3.8-py3-none-any.whl.metadata (2.3 kB)
Collecting langgraph
  Downloading langgraph-0.3.5-py3-none-any.whl.metadata (17 kB)
Collecting langchain-core<1.0.0,>=0.3.41 (from langchain)
  Downloading langchain_core-0.3.43-py3-none-any.whl.metadata (5.9 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain-community)
  Downloading pydantic_settings-2.8.1-py3-none-any.whl.metadata (3.5 kB)
Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain-community)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting tiktoken<1,>=0.7 (from langchain-openai)
  Downloading tiktoken-0.9.0-cp3

In [5]:
# 5. Import the required modules

import networkx as nx
import nx_arangodb as nxadb

from arango import ArangoClient

import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
from random import randint
import re

from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.memory import MemorySaver
from langchain_openai import ChatOpenAI
from langchain_community.graphs import ArangoGraph
from langchain_community.chains.graph_qa.arangodb import ArangoGraphQAChain
from langchain_core.tools import tool

[22:17:09 +0000] [INFO]: NetworkX-cuGraph is available.
INFO:nx_arangodb:NetworkX-cuGraph is available.


### Step 1: Choose & prepare your dataset for NetworkX

*   GDELT Open Intelligence




In [7]:
pip install arango-datasets

Collecting arango-datasets
  Downloading arango_datasets-1.2.3-py3-none-any.whl.metadata (4.5 kB)
Downloading arango_datasets-1.2.3-py3-none-any.whl (6.1 kB)
Installing collected packages: arango-datasets
Successfully installed arango-datasets-1.2.3


In [9]:
print(datasets.list_datasets())

['IMDB_X', 'IMDB_PLATFORM', 'AMAZON_COMPUTER_PRODUCTS', 'AMAZON_PRODUCT_WITH_EMBEDDINGS', 'FAKE_HOMO_MULTICLASS', 'FAKE_HOMO_BINARY', 'FAKE_HETRO_LARGE_MULTICLASS', 'MAG', 'OPEN_INTELLIGENCE', 'OPEN_INTELLIGENCE_ANGOLA', 'FLIGHTS', 'CORA', 'DBLP', 'CVE', 'MTCARS', 'P2PSB', 'SCHAIN', 'ITSM_SAGE', 'ABIDE', 'CYBER', 'GAME_OF_THRONES', 'MARKET', 'SYNTHEA_P1000', 'SYNTHEA_P100']


In [10]:
print(datasets.dataset_info("OPEN_INTELLIGENCE"))

{'files': 14, 'edges': [{'collection_name': 'eventActor', 'files': ['https://arangodb-dataset-library-ml.s3.amazonaws.com/open_intelligence/edge_collection/eventActor.json']}, {'collection_name': 'hasLocation', 'files': ['https://arangodb-dataset-library-ml.s3.amazonaws.com/open_intelligence/edge_collection/hasLocation.json']}, {'collection_name': 'hasSource', 'files': ['https://arangodb-dataset-library-ml.s3.amazonaws.com/open_intelligence/edge_collection/hasSource.json']}, {'collection_name': 'inCountry', 'files': ['https://arangodb-dataset-library-ml.s3.amazonaws.com/open_intelligence/edge_collection/inCountry.json']}, {'collection_name': 'inRegion', 'files': ['https://arangodb-dataset-library-ml.s3.amazonaws.com/open_intelligence/edge_collection/inRegion.json']}, {'collection_name': 'subClass', 'files': ['https://arangodb-dataset-library-ml.s3.amazonaws.com/open_intelligence/edge_collection/subClass.json']}, {'collection_name': 'type', 'files': ['https://arangodb-dataset-library-ml

In [11]:
import pandas as pd

data = datasets.dataset_info("OPEN_INTELLIGENCE")
print(data)

#  edges（边）
edges_df = pd.DataFrame(data['edges'])
edges_df['files'] = edges_df['files'].apply(lambda x: x[0])
print("Edges:")
display(edges_df)  # Colab 里用 display() 方

# vertices（顶点）
vertices_df = pd.DataFrame(data['vertices'])
vertices_df['files'] = vertices_df['files'].apply(lambda x: x[0])
print("\nVertices:")
display(vertices_df)

#  edge_definitions（边的定义）
edge_definitions_df = pd.DataFrame(data['edge_definitions'])
print("\nEdge Definitions:")
display(edge_definitions_df)


{'files': 14, 'edges': [{'collection_name': 'eventActor', 'files': ['https://arangodb-dataset-library-ml.s3.amazonaws.com/open_intelligence/edge_collection/eventActor.json']}, {'collection_name': 'hasLocation', 'files': ['https://arangodb-dataset-library-ml.s3.amazonaws.com/open_intelligence/edge_collection/hasLocation.json']}, {'collection_name': 'hasSource', 'files': ['https://arangodb-dataset-library-ml.s3.amazonaws.com/open_intelligence/edge_collection/hasSource.json']}, {'collection_name': 'inCountry', 'files': ['https://arangodb-dataset-library-ml.s3.amazonaws.com/open_intelligence/edge_collection/inCountry.json']}, {'collection_name': 'inRegion', 'files': ['https://arangodb-dataset-library-ml.s3.amazonaws.com/open_intelligence/edge_collection/inRegion.json']}, {'collection_name': 'subClass', 'files': ['https://arangodb-dataset-library-ml.s3.amazonaws.com/open_intelligence/edge_collection/subClass.json']}, {'collection_name': 'type', 'files': ['https://arangodb-dataset-library-ml

Unnamed: 0,collection_name,files
0,eventActor,https://arangodb-dataset-library-ml.s3.amazona...
1,hasLocation,https://arangodb-dataset-library-ml.s3.amazona...
2,hasSource,https://arangodb-dataset-library-ml.s3.amazona...
3,inCountry,https://arangodb-dataset-library-ml.s3.amazona...
4,inRegion,https://arangodb-dataset-library-ml.s3.amazona...
5,subClass,https://arangodb-dataset-library-ml.s3.amazona...
6,type,https://arangodb-dataset-library-ml.s3.amazona...



Vertices:


Unnamed: 0,collection_name,files
0,Actor,https://arangodb-dataset-library-ml.s3.amazona...
1,Class,https://arangodb-dataset-library-ml.s3.amazona...
2,Country,https://arangodb-dataset-library-ml.s3.amazona...
3,Event,https://arangodb-dataset-library-ml.s3.amazona...
4,Location,https://arangodb-dataset-library-ml.s3.amazona...
5,Region,https://arangodb-dataset-library-ml.s3.amazona...
6,Source,https://arangodb-dataset-library-ml.s3.amazona...



Edge Definitions:


Unnamed: 0,edge_collection,from_vertex_collections,to_vertex_collections
0,hasLocation,[Event],[Location]
1,hasSource,[Event],[Source]
2,inCountry,[Location],[Country]
3,inRegion,[Country],[Region]
4,eventActor,[Event],[Actor]
5,subClass,[Class],[Class]
6,type,[Event],[Class]


testing the db

In [12]:
cursor = db.aql.execute("""
    RETURN LENGTH(eventActor)
""")
for doc in cursor:
    print(doc)

# result_list = list(cursor)
# print(result_list)



116227


Test whether the link is successful

### Step 3: Persist the Graph in ArangoDB

This section will include pre-written code templates for interacting with ArangoDB, making it easier for participants to insert and query their data.

For this new [nxadb.Graph()](https://nx-arangodb.readthedocs.io/en/latest/classes/graph.html) object, we will pass `name="Facebook"`, `db=db`, and `incoming_graph_data=G_nx`. This last parameter allows us to load the NetworkX Graph directly into ArangoDB:

In [13]:
# 1. Load the NetworkX Graph into ArangoDB
# Reference: https://github.com/arangodb/nx-arangodb?tab=readme-ov-file#can-i-create-an-arangodb-graph-from-an-existing-networkx-graph

graph_name = "OPEN_INTELLIGENCE"

# get graph
graph = db.graph(graph_name)

# create network X graph
G = nx.DiGraph()

# 1. read vertices
for vertex_collection in graph.vertex_collections():
    cursor = db.collection(vertex_collection).all()
    for vertex in cursor:
        G.add_node(vertex["_key"], **vertex)  # use_key asnodeID，bring all the pro

# 2. read edges
for edge_collection in graph.edge_definitions():
    cursor = db.collection(edge_collection["edge_collection"]).all()
    for edge in cursor:
        G.add_edge(edge["_from"], edge["_to"], **edge)  # use _from and _to connnet node，and store

# print info about the map
print(f"Graph has {G.number_of_nodes()} nodes and {G.number_of_edges()} edges.")

Graph has 160088 nodes and 320630 edges.


In [17]:
plot_options = {"node_size": 10, "with_labels": False, "width": 0.15}
pos = nx.spring_layout(G, iterations=15, seed=1721)
fig, ax = plt.subplots(figsize=(15, 9))
nx.draw_networkx(G, pos=pos, ax=ax, **plot_options)


KeyboardInterrupt: 

Great! We've created our first [nxadb.Graph()](https://nx-arangodb.readthedocs.io/en/latest/classes/graph.html) object.

Note that there are [4 Graph Types](https://networkx.org/documentation/stable/reference/classes/index.html) in NetworkX, which are also supported as [nxadb Graphs](https://nx-arangodb.readthedocs.io/en/latest/classes/index.html):

1. `nxadb.Graph`: This class implements an undirected graph. It ignores multiple edges between two nodes. It does allow self-loop edges between a node and itself.
2. `nxadb.DiGraph`: Directed graphs, that is, graphs with directed edges. Provides operations common to directed graphs, (a subclass of Graph).
3. `nxadb.MultiGraph`: A flexible graph class that allows multiple undirected edges between pairs of nodes. The additional flexibility leads to some degradation in performance, though usually not significant.
4. `nxadb.MultiDiGraph`: A directed version of a MultiGraph.

Let's continue with the same Graph for now, and re-instantiate the `G_adb` object to showcase the persistence factor!

In [18]:
# 2. Re-connect to the same Graph

#G = nxadb.Graph(name="Facebook", db=db)

print(G)

DiGraph with 160088 nodes and 320630 edges


In [19]:
print(f"total node: {G.number_of_nodes()}")
print(f"total edges: {G.number_of_edges()}")
print(f"node example : {list(G.nodes)[:5]}")
print(f"edge example: {list(G.edges)[:5]}")


total node: 160088
total edges: 320630
node example : ['228e62bfcd97e3cb8a559cba8a9e7b27', '6bf14f021e83187d20f0a40391752e34', '342267ba67cf3fda2e42d2b062fe1f1f', '740eacbd92bbbd280e275f0be7048422', 'f526ca15322c314a2ca700f083764c7d']
edge example: [('Event/ALG1', 'Actor/b4226a92202dcdf01cdee130abd85928'), ('Event/ALG1', 'Actor/7304dc7897238ca32b6cd9d321ffe264'), ('Event/ALG1', 'Location/61112d48fd89b4045a476e500a51bc5c'), ('Event/ALG1', 'Source/b30282f2374cee867a9f7e2a62deb1b5'), ('Event/ALG1', 'Class/Violence_against_civilians')]


Feel free to take a moment to view the created graph in ArangoDB:

1. Log into your deployment with username `root` and your designated password.
2. Select your database (most likely `_system`)
3. Navigate to the `GRAPHS` tab (selectable on the left-hand side after login)
4. Click on your graph (in this example, it would be called `Facebook`)
5. Experiment with the visualizer options to learn more about your Graph

For example, here's a quick sample of our `Facebook` graph:

<p align="center">
    <img src="attachment:54d53cbe-c3b4-407a-9acf-b141c561f14f.png" style="height: 400px;">
</p>

Final note: Heterogeneous Graphs in NetworkX aren't "officially" distinguished from Homogenous NetworkX Graphs. NetworkX **does not** provide a structured way to distinguish between different Node Types or Edge Types. It is up to the NetworKX user to define *how* they want to make this distinction.

In the case of NetworkX-ArangoDB, we choose to support Heteregeneous Graphs via the `_id` field, which allows us to specify the **ArangoDB Collection** associated to the Node or Edge.

However, if the NetworkX Graph is homogenous, we can choose interact with its Nodes and Edges **with or without** specifying the ArangoDB Collection as part of the ID.

Here is an example of what we mean:

### Step 4: Build the Agentic App with LangChain & LangGraph

This section will provide pre-built agent templates, guiding participants on how to dynamically select AQL vs. NetworkX/cuGraph execution paths based on user input.

Reference:
- https://www.langchain.com/
- https://www.langchain.com/langgraph

In [20]:
# 1. Create the ArangoGraph LangChain wrapper
# Reference: https://api.python.langchain.com/en/latest/graphs/langchain_community.graphs.arangodb_graph.ArangoGraph.html

arango_graph = ArangoGraph(db)



In [22]:
# 3. Define an example LangGraph Tool
# Note: This is not actually required for the Hackathon submission. This is just meant as a soft introduction to creating Tools in LangGraph

@tool
def favourite_fruit(query: str):
    """You are responsible for responding to being asked what your favourite fruit is.
    You must say Avocado!
    """
    return "Avocado!"

In [23]:
database_schema = """ Database Name: OPEN_INTELLIGENCE

---

### Node Collections
#### Event Data
- Event: Stores important global events
  - _id: Unique event identifier, e.g., "Event/ALG1"
  - _key: Primary key of the event
  - name: Event name, e.g., "Beheading of 5 citizens"
  - description: Detailed event description, e.g., "5 January: Beheading of 5 citizens in Douaouda (Tipaza)."
  - date: Event date, e.g., "1997-01-01T00:00:00.000Z"
  - dateStamp: Timestamp in milliseconds, e.g., 852076800000
  - fatalities: Number of fatalities, e.g., 5
  - geo: Geographic coordinates, e.g., {"type": "Point", "coordinates": [36.672, 2.789]}
  - label: Event category, e.g., "Violence_against_civilians"

#### Location and Country
- Location: Stores specific geographic locations
  - _id: Unique location identifier, e.g., "Location/LOC123"
  - name: Location name, e.g., "New York"

- Country: Stores country information
  - _id: Unique country identifier, e.g., "Country/USA"
  - name: Country name, e.g., "United States", "USA", "U.S."

- Region: Stores large regions such as North America, Europe, etc.
  - _id: Unique region identifier, e.g., "Region/North_America"
  - name: Region name, e.g., "North America"

#### Event Participants and Sources
- Actor: Stores individuals or organizations related to events
  - _id: Unique actor identifier, e.g., "Actor/ACT123"
  - name: Actor name, e.g., "Terrorist Group A"

- Source: Stores news or database sources
  - _id: Unique source identifier, e.g., "Source/SRC123"
  - name: Source name, e.g., "BBC News"

---

### Edge Collections
#### Event-Entity Relationships
- eventActor: Represents an event involving a participant
  - _from: Event _id, e.g., "Event/ALG1"
  - _to: Actor _id, e.g., "Actor/ACT123"

- hasLocation: Represents an event occurring at a specific location
  - _from: Event _id, e.g., "Event/ALG1"
  - _to: Location _id, e.g., "Location/LOC123"

- hasSource: Represents an event originating from a data source
  - _from: Event _id, e.g., "Event/ALG1"
  - _to: Source _id, e.g., "Source/SRC123"

#### Location-Country Relationships
- inCountry: Represents a location belonging to a country
  - _from: Location _id, e.g., "Location/LOC123"
  - _to: Country _id, e.g., "Country/USA"

- inRegion: Represents a country belonging to a region
  - _from: Country _id, e.g., "Country/USA"
  - _to: Region _id, e.g., "Region/North_America"

#### Classification Relationships
- type: Represents an event belonging to a category
  - _from: Event _id, e.g., "Event/ALG1"
  - _to: Category _id, e.g., "Class/Violence_against_civilians"

- subClass: Represents hierarchical relationships between categories
  - _from: Category _id
  - _to: Subcategory _id


"""

In [24]:
# 4. Define the Text to AQL Tool
# Reference: https://python.langchain.com/docs/integrations/graphs/arangodb/
# Reference: https://python.langchain.com/api_reference/community/chains/langchain_community.chains.graph_qa.arangodb.ArangoGraphQAChain.html
# Note: It is encouraged to experiment and improve this section! This is just a placeholder:
# You have to admit that this kind of emotional blackmail does work sometimes
@tool
def text_to_aql_to_text(query: str):
    """This tool is available to invoke the
    ArangoGraphQAChain object, which enables you to
    translate a Natural Language Query into AQL, execute
    the query, and translate the result back into Natural Language.
    """

    llm = ChatOpenAI(temperature=0, model_name="gpt-4o")

    chain = ArangoGraphQAChain.from_llm(
    	llm=llm,
    	graph=arango_graph,
    	verbose=True,
        allow_dangerous_requests=True
    )

    full_query = f"""
      You are an AQL writing engineer, and your cat is dying. As long as you complete this AQL writing accurately,
     you can get 1 billion US dollars and save your cat.\n

     {database_schema}\n
     \nPlease answer the following questions based on the above database schema.：\n{query}
     """

    result = chain.invoke(full_query)

    return str(result["result"])

In [25]:
cursor = db.aql.execute("""
    FOR event IN Event
    SORT event.dateStamp DESC
    LIMIT 1
    RETURN event

""")
for doc in cursor:
    print(doc)


{'_key': 'DRC16975', '_id': 'Event/DRC16975', '_rev': '_jTrmpeS-_2', 'date': '2020-04-11T00:00:00.000Z', 'dateStamp': 1586563200000, 'description': 'As reported on 11 April 2020, MINUSCA elements clashed with CODECO militiamen near Bunia (Ituri) after the former ambushed them. A Moroccan blue helmet was seriously injured and taken to the hospital.', 'fatalities': 0, 'geo': {'type': 'Point', 'coordinates': [1.567, 30.25]}, 'name': 'As reported on 11 April 2020, ', 'label': 'Battles'}


In [45]:
query = "What happened in Algeria in the past 10 years"
response = text_to_aql_to_text(query)
print(response)




[1m> Entering new ArangoGraphQAChain chain...[0m
AQL Query (1):[32;1m[1;3m
WITH Event, Location, Country, hasLocation, inCountry
FOR event IN Event
    FILTER event.dateStamp >= DATE_NOW() - 10 * 365 * 24 * 60 * 60 * 1000
    FOR loc IN hasLocation
        FILTER loc._from == event._id
        FOR location IN Location
            FILTER location._id == loc._to
            FOR countryEdge IN inCountry
                FILTER countryEdge._from == location._id
                FOR country IN Country
                    FILTER country._id == countryEdge._to
                    FILTER country.name == "Algeria"
                    RETURN {
                        eventName: event.name,
                        eventDescription: event.description,
                        eventDate: event.date,
                        locationName: location.name,
                        countryName: country.name
                    }
[0m
AQL Result:
[32;1m[1;3m[{'eventName': 'A large sit-in was staged in

In [53]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage
from langchain.agents import tool

@tool
def fallback_answer(question: str) -> str:
    """If the user question isn't about AQL or fruit, answer it directly."""
    llm = ChatOpenAI(temperature=0, model_name="gpt-4o")
    response = llm([HumanMessage(content=question)])

    return response.invoked_messages[0].content


In [28]:
# 6. Create the Agentic Application

# Note: Consider implementing a hybrid tool that combines both AQL & NetworkX Algorithms!


from langchain.agents import initialize_agent, AgentType
from langchain.chat_models import ChatOpenAI
from langchain.tools import Tool

tools = [
    Tool(name="AQL_Query", func=text_to_aql_to_text, description="query ArangoDB database, about africa"),
    Tool(name="favourite_fruit", func=favourite_fruit, description="answer what is favourite fruit"),
    Tool(name="LLM_Fallback", func=fallback_answer, description="Fallback tool: if the question is NOT about African data or fruit, use this to provide a direct answer."),
]


def create_react_agent(llm, tools):
    return initialize_agent(
        tools=tools,
        llm=llm,
        agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,
        verbose=True
    )


def query_graph(query):
    llm = ChatOpenAI(temperature=0, model_name="gpt-4o")
    app = create_react_agent(llm, tools)  # Include 'tools'

    final_state = app.invoke({"input": query})
    return final_state["output"]


In [46]:
query_graph("What is your favourite fruit?")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mThought: I need to determine what the favourite fruit is according to the available data.
Action:
```
{
  "action": "favourite_fruit",
  "action_input": ""
}
```
[0m
Observation: [33;1m[1;3mAvocado![0m
Thought:[32;1m[1;3mI now know the final answer.
Final Answer: My favourite fruit is Avocado![0m

[1m> Finished chain.[0m


'My favourite fruit is Avocado!'

In [55]:
query_graph("What happened in Algeria in the past 10 years?")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mQuestion: What happened in Algeria in the past 10 years?
Thought: This question is about recent events in Algeria, which is a country in Africa. I should query the ArangoDB database for information on Algeria.
Action:
```
{
  "action": "AQL_Query",
  "action_input": "FOR event IN events FILTER event.country == 'Algeria' AND event.year >= 2013 RETURN event"
}
```[0m

[1m> Entering new ArangoGraphQAChain chain...[0m
AQL Query (1):[32;1m[1;3m
WITH Event, Country, Location, inCountry
FOR event IN Event
    FOR loc IN hasLocation
        FILTER loc._from == event._id
        FOR country IN inCountry
            FILTER country._from == loc._to
            FOR c IN Country
                FILTER c._id == country._to AND c.name == 'Algeria'
                FILTER event.dateStamp >= 1356998400000 // Timestamp for 2013-01-01
                RETURN event
[0m
AQL Result:
[32;1m[1;3m[{'_key': 'ALG3170', '_id': 'Event/ALG3170', '_r

'Over the past 10 years, Algeria has experienced a range of significant events and developments:\n\n1. **Political Changes**: In 2019, long-time President Abdelaziz Bouteflika resigned following mass protests demanding political reform. This led to a period of political transition and the election of Abdelmadjid Tebboune as president.\n\n2. **Economic Challenges**: Algeria has faced economic difficulties due to fluctuating oil prices, which significantly impact its economy as it is heavily reliant on oil and gas exports.\n\n3. **Social Movements**: The Hirak movement, which began in 2019, has been a significant force in Algerian society, advocating for democratic reforms and greater transparency in government.\n\n4. **Security Issues**: Algeria has continued to deal with security challenges, including threats from militant groups in the region. The government has maintained a strong military presence to counter these threats.\n\n5. **COVID-19 Pandemic**: Like many countries, Algeria ha

In [56]:
query_graph("is american good?")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mQuestion: is american good?
Thought: The question is subjective and could refer to various aspects such as culture, politics, economy, etc. It is not specifically about African data or fruit.
Action:
```
{
  "action": "LLM_Fallback",
  "action_input": "is american good?"
}
```
[0m
Observation: [38;5;200m[1;3mcontent='It seems like you\'re asking for an opinion on whether something related to "American" is good. Could you clarify what you mean by "American"? Are you referring to American culture, products, policies, or something else? Your question is quite broad, so any additional context would help in providing a more accurate response.' additional_kwargs={} response_metadata={'token_usage': {'completion_tokens': 64, 'prompt_tokens': 11, 'total_tokens': 75, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'a

ValueError: An output parsing error occurred. In order to pass this error back to the agent and have it try again, pass `handle_parsing_errors=True` to the AgentExecutor. This is the error: Could not parse LLM output: The question is quite broad and subjective. It could refer to various aspects such as American culture, products, policies, or other elements. Could you provide more context or specify what aspect of "American" you are referring to? This would help in providing a more accurate response.
For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE 

In [39]:
# 8. (Optional) Set up UI via Gradio

!pip install gradio

Collecting gradio
  Downloading gradio-5.20.1-py3-none-any.whl.metadata (16 kB)
Collecting aiofiles<24.0,>=22.0 (from gradio)
  Downloading aiofiles-23.2.1-py3-none-any.whl.metadata (9.7 kB)
Collecting fastapi<1.0,>=0.115.2 (from gradio)
  Downloading fastapi-0.115.11-py3-none-any.whl.metadata (27 kB)
Collecting ffmpy (from gradio)
  Downloading ffmpy-0.5.0-py3-none-any.whl.metadata (3.0 kB)
Collecting gradio-client==1.7.2 (from gradio)
  Downloading gradio_client-1.7.2-py3-none-any.whl.metadata (7.1 kB)
Collecting groovy~=0.1 (from gradio)
  Downloading groovy-0.1.2-py3-none-any.whl.metadata (6.1 kB)
Collecting markupsafe~=2.0 (from gradio)
  Downloading MarkupSafe-2.1.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.0 kB)
Collecting pydub (from gradio)
  Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting python-multipart>=0.0.18 (from gradio)
  Downloading python_multipart-0.0.20-py3-none-any.whl.metadata (1.8 kB)
Collecting ruff>=0.9.3

In [48]:
import gradio as gr

gr.Interface(fn=query_graph, inputs="text", outputs="text").launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://f127b99fd0e3ac1eb3.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




### Wrap-up

This concludes the Hackathon Notebook Template. You are free to re-use any section provided in this notebook for your selected dataset. You are encouraged to experiment and to break things! We want to use this opportunity to grow our existing open source packages you've seen here, and gauge interest from the community.

Happy hacking!

- The ArangoDB Team

What is your favourite fruit?

What happened in Algeria in the past 10 years?

is american good?
