<div style="color:darkblue;font-weight:800;font-size:48px; text-align:center">
Voyage Mind
</div>
<br>
<div style="color:darkolivegreen;font-weight:800;font-size:32px; text-align:center">
    Building Agentic Apps: ArangoDB, NVIDIA cuGraph, and NetworkX Hackathon
</div>
<br>

<p align="center">
    <img src="https://arangodb.com/wp-content/uploads/2016/05/ArangoDB_logo_avocado_@1.png" style="height: 50px;">
    <img src="https://www.nvidia.com/content/dam/en-zz/Solutions/about-nvidia/logo-and-brand/02-nvidia-logo-color-grn-500x200-4c25-p@2x.png" style="height: 50px;">
    <img src="https://rapids.ai/images/RAPIDS-logo.png" style="height: 50px;">
    <img src="https://avatars.githubusercontent.com/u/388785?s=200&v=4" style="height: 50px;">
</p>


# Key Technologies Used

- **ArangoDB** - stores the graph (nodes and edges)
- **NVIDIA cuGraph** - runs NetworkX algorithms efficiently at scale on the GPU
- **NetworkX** - models the graph and runs graph algorithms
- **OpenAI gpt-4o-mini** - the LLM
- **LangChain & LangGraph** - the agentic framework used to build the app
- **nx-cugraph** - the backend that lets NetworkX run on cuGraph
- **LangChain LLM Graph Transformer** - extracts graph entities from unstructured data

The **Agentic App** uses the following agents:

1. **Data Augmentation Agent** - uses Wikipedia to extract information about cities as a graph, via the LangChain WikipediaRetriever and LLM Graph Transformer.
2. **Master ReAct Agent** - plans which tools to call. Because it can chain the tools below and pass the result of one to another, it can answer the following kinds of queries:

* **SIMPLE QUERIES** → Dynamic AQL (via LangChain)
* **COMPLEX QUERIES** → GPU-Accelerated Graph Analytics (via NetworkX/cuGraph)
* **HYBRID QUERY EXECUTION** → Combining AQL and cuGraph for Contextual Responses

Available tools for the Master ReAct Agent:

- **Text to AQL** - converts natural language to an ArangoDB AQL query
- **Text to AQL to Text** - converts natural language to an AQL query, runs it, and converts the results to a natural-language answer
- **Text to NetworkX Algorithm** - converts natural language to a NetworkX algorithm, executes it, and responds in natural language, with a robust retry mechanism
- **NetworkX Algorithm to AQL to Text** - converts natural language to a NetworkX algorithm, executes it, passes the results to an AQL query, and responds in natural language
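
To make the tool chaining concrete, here is a minimal sketch of running a planned sequence of tools, with each tool's output passed to the next. The tool functions are stand-in stubs with hypothetical names, and the plan is hard-coded; in the app itself the LLM-driven ReAct agent decides which tools to call.

```python
from typing import Callable

# Hypothetical stand-ins for the real tools; each takes and returns plain text.
def text_to_networkx(question: str) -> str:
    return f"[graph-analytics result for: {question}]"

def text_to_aql_to_text(question: str) -> str:
    return f"[AQL answer for: {question}]"

TOOLS: dict[str, Callable[[str], str]] = {
    "text_to_networkx": text_to_networkx,
    "text_to_aql_to_text": text_to_aql_to_text,
}

def run_plan(question: str, plan: list[str]) -> str:
    """Run the planned tools in order, feeding each result into the next call."""
    context = question
    for tool_name in plan:
        context = TOOLS[tool_name](context)
    return context

# A hybrid query: graph analytics first, then an AQL lookup on its result.
hybrid_answer = run_plan(
    "Which hub airports are in cities with museums?",
    ["text_to_networkx", "text_to_aql_to_text"],
)
print(hybrid_answer)
```

A simple query would use a one-step plan such as `["text_to_aql_to_text"]`.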


# Key Use Cases:

## Simple Queries - Use Cases Answerable by AQL Queries

### Connectivity and Basic Path Finding
- Which cities are directly connected by flights?
- What is the shortest flight path between two specific cities?
- How many airports does each city have?

### Tourist Attraction Analysis
- Which cities have the most tourist attractions?
- What are all the tourist attractions in a specific city?
- Which cities have both airports and tourist attractions?

### Airport and Flight Statistics
- Which airports have the most incoming and outgoing flights?
- What is the total number of flights between two specific airports?
- Which cities have airports with no outgoing flights?

### Airport Connectivity Analysis
- Which airports serve as major hubs with the most connections?
- What is the average number of direct flight connections per airport?
- Which cities have airports with direct flights to the most other cities?

### Tourist Attraction Distribution
- How many tourist attractions are there per city on average?
- Which cities have the highest ratio of tourist attractions to airports?
- Are there any cities with tourist attractions but no airports?
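
As an illustration of a simple query, the sketch below answers "Which airports have the most outgoing flights?" with a single AQL aggregation. It assumes a `flights` edge collection as in the FLIGHTS dataset and a python-arango database handle `db`. The function and constant names are hypothetical, and the Text to AQL tool may well generate a different query.

```python
# AQL for "Which airports have the most outgoing flights?"
# Assumes an edge collection named `flights`, as in the FLIGHTS dataset.
TOP_DEPARTURES_AQL = """
FOR f IN flights
    COLLECT airport = f._from WITH COUNT INTO departures
    SORT departures DESC
    LIMIT @limit
    RETURN { airport: airport, departures: departures }
"""

def top_departure_airports(db, limit=5):
    """Run the query through python-arango's AQL interface and return the rows as a list."""
    cursor = db.aql.execute(TOP_DEPARTURES_AQL, bind_vars={"limit": limit})
    return list(cursor)
```

Passing `limit` as a bind variable keeps user-supplied values out of the query string.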

## Complex Queries - Use Cases Requiring NetworkX Algorithms

### Complex Network Analysis
- What is the average clustering coefficient of the airport network?
- Which airports are the most central in the network based on betweenness centrality?
- What is the diameter of the flight network?

### Community Detection
- Can we identify clusters of closely connected airports?
- Are there distinct communities of cities based on their flight connections?
- How many strongly connected components exist in the flight network?

### Network Efficiency
- What is the global efficiency of the flight network?
- How does the removal of the top 5% most connected airports affect the network's efficiency?
- Which cities, if their airports were upgraded, would most improve the overall network efficiency?
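
The questions above map onto standard NetworkX functions. A minimal sketch on a toy directed flight network follows; the edges are illustrative and not taken from the dataset.

```python
import networkx as nx

# Toy directed flight network (illustrative edges only)
G = nx.DiGraph()
G.add_edges_from([
    ("ATL", "JFK"), ("JFK", "ATL"),
    ("ATL", "LAX"), ("LAX", "ATL"),
    ("JFK", "LAX"), ("LAX", "JFK"),
    ("ATL", "CHS"),  # one-way route, so CHS cannot reach the other airports
])

# Betweenness centrality: how often an airport lies on shortest paths between others
betweenness = nx.betweenness_centrality(G)
most_central = max(betweenness, key=betweenness.get)

# Strongly connected components: groups of airports mutually reachable by flights
n_components = nx.number_strongly_connected_components(G)

# Average clustering coefficient of the network
avg_clustering = nx.average_clustering(G)

print(f"Most central airport: {most_central}")
print(f"Strongly connected components: {n_components}")
print(f"Average clustering coefficient: {avg_clustering:.3f}")
```

Diameter-style questions need extra care on directed graphs: `nx.diameter` raises an error unless the graph is strongly connected, so it would typically be computed on the largest strongly connected component.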

## Hybrid Queries - Use Cases Requiring Both AQL Queries and NetworkX

### Advanced Route Planning
- What is the most efficient multi-city tour covering the top 10 tourist attractions?
- Which route maximizes the number of unique tourist attractions visited within a given number of flights?
- What is the optimal flight path between two cities that includes a stopover at a city with specific tourist attractions?
- Design a 5-city tour that covers museums

### Comparative City Analysis
- How do cities compare in terms of their importance in the flight network versus their tourist attractions?
- Which cities serve as the best hubs for both air travel and tourism?
- What is the correlation between a city's connectivity in the flight network and its number of tourist attractions?

### Network Growth Prediction
- Based on current network structure and tourist attraction distribution, which city pairs are most likely to establish new flight routes?
- How would the addition of a new major airport in a central location affect the overall network topology and efficiency?
- Can we predict future tourist attraction development based on a city's position and growth in the flight network?
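
A hybrid answer combines a graph-analytics score with a database lookup. The sketch below shows one possible way to rank "best hubs for both air travel and tourism": normalize a centrality score and an attraction count per city, then add them. The scoring rule, function names, and input numbers are all assumptions made for illustration, not the app's method.

```python
def min_max_normalize(scores):
    """Scale values to [0, 1]; a constant input maps to all zeros."""
    low, high = min(scores.values()), max(scores.values())
    span = high - low
    return {k: (v - low) / span if span else 0.0 for k, v in scores.items()}

def rank_hubs(centrality, attraction_counts):
    """Rank cities by the sum of normalized centrality and normalized attraction count."""
    c_norm = min_max_normalize(centrality)
    a_norm = min_max_normalize(attraction_counts)
    combined = {city: c_norm[city] + a_norm.get(city, 0.0) for city in c_norm}
    return sorted(combined, key=combined.get, reverse=True)

# Made-up inputs for illustration only:
# centrality would come from NetworkX/cuGraph, attraction counts from an AQL query.
centrality = {"Atlanta": 0.30, "Wichita": 0.05, "San Francisco": 0.20}
attractions = {"Atlanta": 4, "Wichita": 2, "San Francisco": 10}
print(rank_hubs(centrality, attractions))
```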

# 1. Setup Environment

In [5]:
# 1. Install nx-arangodb via pip
# Github: https://github.com/arangodb/nx-arangodb

!pip install nx-arangodb

Collecting nx-arangodb
  Using cached nx_arangodb-1.3.0-py3-none-any.whl.metadata (9.3 kB)
Using cached nx_arangodb-1.3.0-py3-none-any.whl (67 kB)
Installing collected packages: nx-arangodb
Successfully installed nx-arangodb-1.3.0


In [1]:
# 2. Check if you have an NVIDIA GPU
# Note: If this returns "command not found", then GPU-based algorithms via cuGraph are unavailable

!nvidia-smi
!nvcc --version

Sat Mar  8 18:19:00 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.28.03              Driver Version: 560.76         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA GeForce RTX 4050 ...    On  |   00000000:01:00.0 Off |                  N/A |
| N/A   46C    P8              5W /   35W |       0MiB /   6141MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
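
The same check can be done from Python so that later cells can branch on the result. This is a minimal sketch using only the standard library; `gpu_available` and `USE_GPU` are hypothetical names, not part of the notebook.

```python
import shutil
import subprocess

def gpu_available() -> bool:
    """Return True if nvidia-smi is on PATH and exits successfully."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(["nvidia-smi"], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0

USE_GPU = gpu_available()
print(f"GPU detected: {USE_GPU}")
```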
                                                

In [None]:
# 3. Install nx-cugraph via pip
# Note: Only enable this installation if the step above is working!
!pip install cudf-cu12 cugraph-cu12 dask-cudf-cu12 --extra-index-url=https://pypi.nvidia.com
!pip install nx-cugraph-cu12 --extra-index-url=https://pypi.nvidia.com

# Restart the runtime (required after installation)
import os
os._exit(0)


Looking in indexes: https://pypi.org/simple, https://pypi.nvidia.com
Looking in indexes: https://pypi.org/simple, https://pypi.nvidia.com


In [1]:
# 4. Install LangChain & LangGraph

!pip install --upgrade langchain langchain-community langchain-openai langgraph faiss-gpu-cu12 nltk langchain-ollama
!pip install --upgrade matplotlib seaborn arango_datasets wikipedia langchain_community langchain-experimental



In [3]:
# 5. Checking if GPU Acceleration is available

import networkx as nx
import time
import os

# Create a large graph
G = nx.erdos_renyi_graph(10000, 0.01, seed=42)
# Run on the CPU with the default NetworkX implementation
start_time = time.time()
pr_cpu = nx.pagerank(G)
cpu_time = time.time() - start_time
print(f"CPU Time: {cpu_time:.4f} seconds")

# Run on the GPU by dispatching explicitly to the cuGraph backend.
# NX_CUGRAPH_AUTOCONFIG is read when networkx is imported, so changing it
# here, after the import, would not switch backends.
start_time = time.time()
pr_gpu = nx.pagerank(G, backend="cugraph")
gpu_time = time.time() - start_time
print(f"GPU Time: {gpu_time:.4f} seconds")

# Compare results
print(f"Speedup: {cpu_time / gpu_time:.2f}x")


CPU Time: 1.8082 seconds
GPU Time: 0.7383 seconds
Speedup: 2.45x
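
A single timed run is noisy, and the first call on a backend may include one-time overhead. A small helper that repeats the call and keeps the best time gives a steadier comparison. This is a sketch only: `best_of` is a hypothetical helper, and the workload here is a stand-in for a call such as `nx.pagerank(G)`.

```python
import time

def best_of(func, repeats=3):
    """Call func() several times and return (best_seconds, last_result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func()
        best = min(best, time.perf_counter() - start)
    return best, result

# Stand-in workload; in the notebook this would be e.g. lambda: nx.pagerank(G)
elapsed, total = best_of(lambda: sum(range(100_000)))
print(f"Best of 3: {elapsed:.6f} seconds")
```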


In [1]:
# 6. Import the required modules

import networkx as nx
import nx_arangodb as nxadb

from arango import ArangoClient

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from random import randint
import re

from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent
from langgraph.checkpoint.memory import MemorySaver
from langchain_openai import ChatOpenAI
from langchain_community.graphs import ArangoGraph
from langchain_community.chains.graph_qa.arangodb import ArangoGraphQAChain
from langchain_core.tools import tool

import faiss
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS

from nltk.corpus import stopwords
import nltk
from nltk.tokenize import word_tokenize

nltk.download('stopwords')
stopwords = stopwords.words('english')
nltk.download('punkt_tab')

[05:45:24 +0000] [INFO]: NetworkX-cuGraph is available.
[nltk_data] Downloading package stopwords to /home/ninad/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt_tab to /home/ninad/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


True

In [2]:
# 7. Connect to the ArangoDB database
from arango.http import DefaultHTTPClient
import os
from dotenv import load_dotenv

load_dotenv(dotenv_path="./.env", override=True)

# Build one client that carries the longer request timeout, then open the database from it
client = ArangoClient(
    hosts=os.environ.get("ARANGO_URL"),
    http_client=DefaultHTTPClient(request_timeout=300),
)
db = client.db(
    username=os.environ.get("ARANGO_USER_NAME"),
    password=os.environ.get("ARANGO_PASSWORD"),
    verify=True,
)

print(db)

<StandardDatabase _system>
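
The connection reads `ARANGO_URL`, `ARANGO_USER_NAME`, and `ARANGO_PASSWORD` from the environment. A quick check that they are set makes a missing `.env` entry easier to spot than a failed connection does. This is a minimal sketch; `missing_settings` is a hypothetical helper.

```python
import os

REQUIRED_VARS = ("ARANGO_URL", "ARANGO_USER_NAME", "ARANGO_PASSWORD")

def missing_settings(env=None):
    """Return the names of required connection variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

missing = missing_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
```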


# 2. Import Data
We import data from the ArangoDB Datasets library.

**Important** - Run this section only once! If the data has already been imported, move directly to Section 3.
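
One way to guard against re-running the import by accident is to check whether the target collections already exist. The sketch below assumes a python-arango database handle and uses its `has_collection` method; the helper names are hypothetical. An existing but empty collection would still count as imported here.

```python
def collections_missing(db, names):
    """Return the collection names from `names` that do not exist yet in `db`."""
    return [name for name in names if not db.has_collection(name)]

def should_import(db, names=("airports", "flights")):
    """True when at least one expected collection is absent, i.e. the import has not run."""
    return bool(collections_missing(db, names))
```

The import cells could then be wrapped in `if should_import(db):`.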

## 2.1 Import Data from ArangoDB Datasets

In [None]:
# 1. Download the dataset
# We do not call datasets.load() directly, because we add a couple of properties
# to the vertices and edges before uploading them to the database
from arango import ArangoClient
from arango_datasets import Datasets
# Connect to datasets
datasets = Datasets(db)
info = (datasets.dataset_info("FLIGHTS"))
for e in info['edges']:
    for f in e['files']:
        print(f)

for e in  info['vertices']:
    for f in e['files']:
        print(f)
# datasets.load("FLIGHTS")

https://arangodb-dataset-library.s3.amazonaws.com/flights_dataset/edges/flights.json
https://arangodb-dataset-library.s3.amazonaws.com/flights_dataset/vertices/airports.json


In [63]:
!wget https://arangodb-dataset-library.s3.amazonaws.com/flights_dataset/edges/flights.json
!wget https://arangodb-dataset-library.s3.amazonaws.com/flights_dataset/vertices/airports.json


--2025-03-08 19:23:05--  https://arangodb-dataset-library.s3.amazonaws.com/flights_dataset/edges/flights.json
Resolving arangodb-dataset-library.s3.amazonaws.com (arangodb-dataset-library.s3.amazonaws.com)... 54.231.203.1, 54.231.140.145, 16.182.37.17, ...
Connecting to arangodb-dataset-library.s3.amazonaws.com (arangodb-dataset-library.s3.amazonaws.com)|54.231.203.1|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 96740607 (92M) [application/json]
Saving to: ‘flights.json’


2025-03-08 19:26:50 (429 KB/s) - ‘flights.json’ saved [96740607/96740607]

--2025-03-08 19:26:50--  https://arangodb-dataset-library.s3.amazonaws.com/flights_dataset/vertices/airports.json
Resolving arangodb-dataset-library.s3.amazonaws.com (arangodb-dataset-library.s3.amazonaws.com)... 3.5.28.27, 52.217.232.249, 3.5.28.235, ...
Connecting to arangodb-dataset-library.s3.amazonaws.com (arangodb-dataset-library.s3.amazonaws.com)|3.5.28.27|:443... connected.
HTTP request sent, awaiting respon

In [64]:
import json

with open('airports.json', 'r') as fp:
    all_vertices = json.load(fp)
with open('flights.json', 'r') as fp:
    all_edges = json.load(fp)

In [67]:
for r in all_vertices:
    r['collection'] = 'airports'
for r in all_edges:
    r['edge_collection'] = 'flights'
all_vertices[0], all_edges[0]


({'_key': '00M',
  '_id': 'airports/00M',
  '_rev': '_ezXpx-y---',
  'name': 'Thigpen ',
  'city': 'Bay Springs',
  'state': 'MS',
  'country': 'USA',
  'lat': 31.95376472,
  'long': -89.23450472,
  'vip': False,
  'collection': 'airports'},
 {'_key': '306520629',
  '_id': 'flights/306520629',
  '_from': 'airports/ATL',
  '_to': 'airports/CHS',
  '_rev': '_ezXqoN----',
  'Year': 2008,
  'Month': 1,
  'Day': 1,
  'DayOfWeek': 2,
  'DepTime': 2,
  'ArrTime': 57,
  'DepTimeUTC': '2008-01-01T05:02:00.000Z',
  'ArrTimeUTC': '2008-01-01T05:57:00.000Z',
  'UniqueCarrier': 'FL',
  'FlightNum': 579,
  'TailNum': 'N937AT',
  'Distance': 259,
  'edge_collection': 'flights'})

In [70]:
# Import to Arango DB
# Create Collections, Delete if exists
import pandas as pd
from tqdm import tqdm

vertices_df = pd.DataFrame(all_vertices)

edges_df = pd.DataFrame(all_edges)

vertice_names = list(vertices_df['collection'].unique())
edge_names  = list(edges_df['edge_collection'].unique())
print(vertice_names, edge_names)

for collection_name in vertice_names:
    db.delete_collection(collection_name, ignore_missing=True)
    db.create_collection(collection_name, edge=False )
for collection_name in edge_names:
    db.delete_collection(collection_name, ignore_missing=True)
    db.create_collection(collection_name, edge=True)

edges_df = edges_df.fillna('')
vertices_df = vertices_df.fillna('')
vertices_df['_key'].unique(), vertices_df['_id'].unique()


['airports'] ['flights']


(array(['00M', '00R', '00V', ..., 'ZPH', 'ZUN', 'ZZV'], dtype=object),
 array(['airports/00M', 'airports/00R', 'airports/00V', ...,
        'airports/ZPH', 'airports/ZUN', 'airports/ZZV'], dtype=object))

In [71]:
# Import the Document Collection
for collection_name in tqdm(vertice_names):
    data = pd.DataFrame(vertices_df[vertices_df['collection'] == collection_name])
    docs = data.to_dict(orient='records')
    print(docs[0])
    db.collection(collection_name).import_bulk(docs, batch_size=5000, on_duplicate="ignore", halt_on_error=False)


  0%|          | 0/1 [00:00<?, ?it/s]

{'_key': '00M', '_id': 'airports/00M', '_rev': '_ezXpx-y---', 'name': 'Thigpen ', 'city': 'Bay Springs', 'state': 'MS', 'country': 'USA', 'lat': 31.95376472, 'long': -89.23450472, 'vip': False, 'collection': 'airports'}


100%|██████████| 1/1 [00:03<00:00,  3.70s/it]


In [72]:
# Import the Edges
for collection_name in tqdm(edge_names):
    data = pd.DataFrame(edges_df[edges_df['edge_collection'] == collection_name])
    docs = data.to_dict(orient='records')
    print(collection_name, docs[0])
    db.collection(collection_name).import_bulk(docs, batch_size=5000, on_duplicate="ignore", halt_on_error=False)


  0%|          | 0/1 [00:00<?, ?it/s]

flights {'_key': '306520629', '_id': 'flights/306520629', '_from': 'airports/ATL', '_to': 'airports/CHS', '_rev': '_ezXqoN----', 'Year': 2008, 'Month': 1, 'Day': 1, 'DayOfWeek': 2, 'DepTime': 2, 'ArrTime': 57, 'DepTimeUTC': '2008-01-01T05:02:00.000Z', 'ArrTimeUTC': '2008-01-01T05:57:00.000Z', 'UniqueCarrier': 'FL', 'FlightNum': 579, 'TailNum': 'N937AT', 'Distance': 259, 'edge_collection': 'flights'}


100%|██████████| 1/1 [06:18<00:00, 378.57s/it]


In [73]:
edge_definitions = [
            {
                "edge_collection": "flights",
                "from_vertex_collections": ["airports"],
                "to_vertex_collections": ["airports"]
            }
]
db.delete_graph('FLIGHTS', ignore_missing=True, drop_collections=False)
db.create_graph('FLIGHTS', edge_definitions)

<Graph FLIGHTS>

In [74]:
# Fetch the data into a NetworkX Graph and pre-fetch the nodes and edges
G_adb = nxadb.MultiDiGraph(name="FLIGHTS", db=db)
print(G_adb)
# This will load all nodes
nodes = list(G_adb.nodes(data=True))
print(f"Fetched {len(nodes)} Nodes")
# This will load all edges
edges = list(G_adb.edges(data=True))
print(f"Fetched {len(edges)} Edges")


[19:38:06 +0000] [INFO]: Graph 'FLIGHTS' exists.
[19:38:07 +0000] [INFO]: Default node type set to 'airports'


MultiDiGraph named 'FLIGHTS' with 3375 nodes and 286463 edges
{'vertex_collections': [{'name': 'airports', 'fields': []}], 'edge_collections': [], 'database_config': {'endpoints': ['https://a02eeb6a1295.arangodb.cloud:8529'], 'database': '_system', 'username': 'root', 'password': '<redacted>'}, 'load_config': {'parallelism': 10, 'batch_size': 100000, 'prefetch_count': 5, 'load_all_vertex_attributes': True, 'load_all_edge_attributes': False}}
{'load_adj_dict': False, 'load_coo': False, 'is_directed': False, 'is_multigraph': False, 'symmetrize_edges_if_directed': False}
Fetched 3375 Nodes
{'vertex_collections': [{'name': 'airports', 'fields': []}], 'edge_collections': [{'name': 'flights', 'fields': []}], 'database_config': {'endpoints': ['https://a02eeb6a1295.arangodb.cloud:8529'], 'database': '_system', 'username': 'root', 'password': '<redacted>'}, 'load_config': {'parallelism': 10, 'batch_size': 100000, 'prefetch_count': 5, 'load_all_vertex_attributes': False, 'loa

## 2.2 Augment Data with Information of Cities
The graph currently models Airport -> Flight -> Airport.

We connect to Wikipedia and use a Data Extraction Agent with GraphRAG to add the relationship Airport -> City -> Tourist Attraction, and we populate the relevant properties on each node.



In [None]:
import os

from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import ChatOpenAI
from langchain_ollama import ChatOllama

llm = ChatOpenAI(temperature=0, model_name="gpt-4o-mini")
# Alternatively use local LLM
ollama_llm =  ChatOllama(
    model="phi4-mini",
)


llm_transformer_filtered = LLMGraphTransformer(
    llm=llm,
    allowed_nodes=["City", "Attraction", "Event"],
    allowed_relationships=["CITY_HAS_ATTRACTION", "CITY_HAS_EVENT"],
    node_properties=["city_description", "attraction_description", "city_population",
                     "attraction_type", "climate_type", "event_date", "event_description"],
    additional_instructions="""
You are an expert in extracting entities about a city that can help boost tourism.
All entities should be in Title Case.

`Attraction` should be the name of a tourist spot.
`Event` should be a cultural event that takes place in the city. Capture its description and date.
`city_population` should be numeric: the latest population of the city.

`attraction_type` - Examples: Museum, Zoo, etc.
`climate_type` - Possible values: Tropical, Dry, Temperate, Continental, Polar
**Important** - Extract only the city named in the user query. Do not extract any other city.
"""
)


In [None]:
import os
os.chdir("/mnt/c/Users/ABC/ArangoGraph/air")
os.listdir()


In [None]:
# Nodes are the Airports - Iterate
from langchain_community.retrievers import WikipediaRetriever
from tqdm import tqdm
import json
import random
cntr = 0
retriever = WikipediaRetriever(top_k_results = 1)
all_vertices = []
all_edges = []
all_nodes = []

In [None]:
top_airport = ['ATL', 'M52', 'JFK', 'LAX', 'DFW', 'DEN', 'ORD', 'MCO', 'LAS', 'MIA', 'SEA', 'EWR', 'SFO', 'IAH', 'DCA', 'BWI', 'SAN', 'MDW', 'AUS' ]
for airport in tqdm(nodes):

    try:
        search_term = (f"{airport[1]['city']}, {airport[1]['state']}")
        # print(all_nodes, search_term, airport[0])
        if search_term not in all_nodes:

            city_id = ''



            all_nodes.append(search_term)
            inv = f"City of {search_term.strip()}"
            # print(inv)
            documents = retriever.invoke(inv)
            # print(documents[0].page_content)
            if len(documents) > 0:
                graph_documents_filtered = llm_transformer_filtered.convert_to_graph_documents(
                [documents[0]]
                )
                # Use the Wikipedia summary when the retriever provides one
                summary = documents[0].metadata.get('summary', '')
                # print(graph_documents_filtered)
                city_name = re.sub('[^a-zA-Z0-9]+', '', str(search_term)).upper()
                for n in graph_documents_filtered[0].nodes:
                    # print(n)
                    n_id = re.sub('[^a-zA-Z0-9]+', '', str(n.id)).upper()
                    if n.type == 'City':

                        n.properties['description'] = summary
                        n.properties['city_name'] = search_term
                        n_id = city_name
                        all_edges.append({
                        "name": "CITY_HAS_AIRPORT",
                        "edge_collection": "CITY_HAS_AIRPORT",
                        "_from": n.type.upper() + "/" + city_name,
                        "_to":  airport[0],
                        })
                    elif n.type == 'Attraction':
                        n.properties['attraction_name'] = str(n.id)
                    elif n.type == 'Event':
                        n.properties['event_name'] = str(n.id)


                    all_vertices.append({
                        "name": n.type.upper(),
                        "collection": n.type.upper(),
                        "_id": n.type.upper() + "/" + n_id,
                        "_key": n_id,
                        **n.properties
                    })

                for n in graph_documents_filtered[0].relationships:
                    # print(n.source, n.target, n.type)
                    n_s_id = re.sub('[^a-zA-Z0-9]+', '', str(n.source.id)).upper()
                    n_t_id = re.sub('[^a-zA-Z0-9]+', '', str(n.target.id)).upper()
                    if n.source.type.upper() == 'CITY':
                        n_s_id = city_name
                    all_edges.append({
                        "name": n.type,
                        "edge_collection": n.type,
                        "_from": n.source.type.upper() + "/" + n_s_id,
                        "_to":  n.target.type.upper() + "/" + n_t_id,
                        **n.properties
                    })

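    except Exception as exc:
        # Skip this airport if retrieval or extraction fails, but report why
        tqdm.write(f"Skipping {airport[0]}: {exc}")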


    if cntr % 20 == 0:
        with open('all_nodes.json', 'w') as fp:
            json.dump(all_nodes, fp)
        with open('all_vertices.json', 'w') as fp:
            json.dump(all_vertices, fp)
        with open('all_edges.json', 'w') as fp:
            json.dump(all_edges, fp)


    cntr += 1







100%|██████████| 3375/3375 [4:59:13<00:00,  5.32s/it]  
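
The loop above derives each document key by stripping every non-alphanumeric character from the name and upper-casing the result, then prefixes the collection name to form the `_id`. Isolated as two small functions (the names `make_key` and `make_id` are hypothetical), the rule looks like this:

```python
import re

def make_key(name: str) -> str:
    """Strip every non-alphanumeric character and upper-case the rest."""
    return re.sub(r"[^a-zA-Z0-9]+", "", name).upper()

def make_id(node_type: str, name: str) -> str:
    """Build an ArangoDB document _id of the form COLLECTION/KEY."""
    return f"{node_type.upper()}/{make_key(name)}"

print(make_id("City", "Wichita, KS"))               # CITY/WICHITAKS
print(make_id("Attraction", "Old Cowtown Museum"))  # ATTRACTION/OLDCOWTOWNMUSEUM
```

One consequence of this rule is that names differing only in punctuation or spacing collapse to the same key, and non-ASCII letters are dropped.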


In [None]:
import json
# with open('all_nodes.json', 'r') as fp:
#             all_nodes = json.load(fp)
# with open('all_vertices.json', 'r') as fp:
#     all_vertices = json.load( fp)
# with open('all_edges.json', 'r') as fp:
#     all_edges = json.load(fp)
len(all_vertices), len(all_edges)

(2881, 2802)

In [None]:
# Import to Arango DB
# Create Collections, Delete if exists
import pandas as pd

vertices_df = pd.DataFrame(all_vertices)

edges_df = pd.DataFrame(all_edges)

vertice_names = list(vertices_df['name'].unique())
edge_names  = list(edges_df['name'].unique())
for collection_name in vertice_names:
    db.delete_collection(collection_name, ignore_missing=True)
    db.create_collection(collection_name, edge=False )
for collection_name in edge_names:
    db.delete_collection(collection_name, ignore_missing=True)
    db.create_collection(collection_name, edge=True)

edges_df = edges_df.fillna('')
vertices_df = vertices_df.fillna('')
vertices_df['_key'].unique(), vertices_df['_id'].unique()

(array(['WICHITAKS', 'OLDCOWTOWNMUSEUM', 'COLUMBUSMT', ...,
        'SPANISHFORKUT', 'HOLLYWOODFL', 'HOLLYWOODBEACHHOTEL'],
       dtype=object),
 array(['CITY/WICHITAKS', 'ATTRACTION/OLDCOWTOWNMUSEUM', 'CITY/COLUMBUSMT',
        ..., 'CITY/SPANISHFORKUT', 'CITY/HOLLYWOODFL',
        'ATTRACTION/HOLLYWOODBEACHHOTEL'], dtype=object))

In [None]:
# Import the Document Collection
for collection_name in tqdm(vertice_names):
    data = pd.DataFrame(vertices_df[vertices_df['name'] == collection_name]).drop('name', axis = 1)
    # print(data.head())
    if collection_name == 'CITY':
        # Select each column once; a repeated name yields non-unique DataFrame columns
        data = data[['_id', '_key', 'collection', 'city_name', 'description', 'climate_type']]
    elif collection_name == 'ATTRACTION':
        data = data[['_id', '_key', 'collection','attraction_name', 'attraction_type', 'attraction_description']]
    elif collection_name == 'EVENT':
        data = data[['_id', '_key', 'collection','event_name', 'event_date', 'event_description']]


    docs = data.to_dict(orient='records')
    print(docs[0])
    db.collection(collection_name).import_bulk(docs, batch_size=5000, on_duplicate="ignore", halt_on_error=False)




{'_id': 'CITY/WICHITAKS', '_key': 'WICHITAKS', 'collection': 'CITY', 'city_name': 'Wichita, KS', 'description': 'Wichita (  WITCH-ih-taw) is the most populous city in the U.S. state of Kansas and the county seat of Sedgwick County.  As of the 2020 census, the population of the city was 397,532, and the Wichita metro area had a population of 647,610.  It is located in south-central Kansas along the Arkansas River.\nWichita began as a trading post on the Chisholm Trail in the 1860s and was incorporated as a city in 1870. It became a destination for cattle drives traveling north from Texas to Kansas railroads, earning it the nickname "Cowtown".  In 1875, Wyatt Earp served as a police officer in Wichita for about one year before going to Dodge City.\nIn the 1920s and 1930s, businessmen and aeronautical engineers established aircraft manufacturing companies in Wichita, including Beechcraft, Cessna, and Stearman Aircraft. The city became an aircraft production hub known as "The Air Capital o

 33%|███▎      | 1/3 [00:05<00:10,  5.00s/it]

{'_id': 'ATTRACTION/OLDCOWTOWNMUSEUM', '_key': 'OLDCOWTOWNMUSEUM', 'collection': 'ATTRACTION', 'attraction_name': 'Old Cowtown Museum', 'attraction_type': 'Museum', 'attraction_description': "Maintains historical artifacts and exhibits the city's early history."}


 67%|██████▋   | 2/3 [00:06<00:02,  2.99s/it]

{'_id': 'EVENT/BATTLEOFYAZOOCITY', '_key': 'BATTLEOFYAZOOCITY', 'collection': 'EVENT', 'event_name': 'Battle Of Yazo City', 'event_date': '1864-03-05', 'event_description': 'A battle between Union troops and Confederates.'}


100%|██████████| 3/3 [00:06<00:00,  2.33s/it]


In [None]:
# Import the Edges
for collection_name in tqdm(edge_names):
    data = pd.DataFrame(edges_df[edges_df['name'] == collection_name]).drop('name', axis = 1)
    data = data[["_from", "_to", "edge_collection"]]
    docs = data.to_dict(orient='records')
    print(collection_name, docs[0])
    db.collection(collection_name).import_bulk(docs, batch_size=5000, on_duplicate="ignore", halt_on_error=False)


  0%|          | 0/3 [00:00<?, ?it/s]

CITY_HAS_AIRPORT {'_from': 'CITY/WICHITAKS', '_to': 'airports/ICT', 'edge_collection': 'CITY_HAS_AIRPORT'}


 33%|███▎      | 1/3 [00:00<00:01,  1.15it/s]

CITY_HAS_ATTRACTION {'_from': 'CITY/WICHITAKS', '_to': 'ATTRACTION/OLDCOWTOWNMUSEUM', 'edge_collection': 'CITY_HAS_ATTRACTION'}


100%|██████████| 3/3 [00:01<00:00,  1.64it/s]

CITY_HAS_EVENT {'_from': 'CITY/YAZOOCITYMS', '_to': 'EVENT/BATTLEOFYAZOOCITY', 'edge_collection': 'CITY_HAS_EVENT'}





In [75]:
edge_definitions = [
            {
                "edge_collection": "flights",
                "from_vertex_collections": ["airports"],
                "to_vertex_collections": ["airports"]
            },
            {
                "edge_collection": "CITY_HAS_ATTRACTION",
                "from_vertex_collections": ["CITY"],
                "to_vertex_collections": ["ATTRACTION"]
            },
            {
                "edge_collection": "CITY_HAS_AIRPORT",
                "from_vertex_collections": ["CITY"],
                "to_vertex_collections": ["airports"]
            }
]
db.delete_graph('Tourism', ignore_missing=True, drop_collections=False)
db.create_graph('Tourism', edge_definitions)

<Graph Tourism>

## 2.3 Generating Vector Embeddings for Semantic Search


In [74]:
from langchain_openai import OpenAIEmbeddings
from langchain_ollama import OllamaEmbeddings

# Default: OpenAI embeddings
embeddings = OpenAIEmbeddings(
    model="text-embedding-ada-002"
)

# Alternatively, use Ollama locally by uncommenting the lines below
# embeddings = OllamaEmbeddings(
#     model="snowflake-arctic-embed:110m",
# )

In [75]:
index_document = faiss.IndexFlatL2(len(embeddings.embed_query("hello world")))

vector_store_document = FAISS(
    embedding_function=embeddings,
    index=index_document,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)
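
`faiss.IndexFlatL2` is an exact index: it compares the query against every stored vector using squared Euclidean (L2) distance. The pure-Python sketch below shows the same idea on tiny made-up vectors; it illustrates the search rule only and is not how FAISS is implemented.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(query, vectors, k=2):
    """Return the k (index, distance) pairs closest to `query`, by exhaustive scan."""
    scored = [(i, squared_l2(query, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1])[:k]

# Tiny made-up 2-D "embeddings" for illustration
vectors = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
print(nearest([1.0, 0.0], vectors))
```

Exhaustive search is a reasonable fit for a few thousand documents, as here; an approximate index mainly pays off at much larger scale.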

In [76]:
from uuid import uuid4

from langchain_core.documents import Document

documents = []
for n in all_vertices:
    if n['name'] == 'CITY':
        print(n)
        # Prefer the LLM-extracted city_description; otherwise use the Wikipedia summary
        des = n['city_name'] + ' - ' + n.get('city_description', n.get('description', ''))
    elif n['name'] == 'ATTRACTION':
        des = n['attraction_name']
        if 'attraction_description' in n:
            des += ' - ' + n['attraction_description']
    elif n['name'] == 'EVENT':
        des = n['event_name']
        if 'event_description' in n:
            des += ' - ' + n['event_description']
    else:
        # Skip node types that have no text to embed
        continue
    # Truncate the text to 512 characters before embedding
    document_1 = Document(
        page_content=des[0:512],
        metadata={"source": n['_id'], 'type': n['name']},
    )
    documents.append(document_1)

uuids = [str(uuid4()) for _ in range(len(documents))]
vector_store_document.add_documents(documents=documents, ids=uuids)

{'name': 'CITY', 'collection': 'CITY', '_id': 'CITY/WICHITAKS', '_key': 'WICHITAKS', 'city_population': '397532', 'climate_type': 'Temperate', 'description': 'Wichita (  WITCH-ih-taw) is the most populous city in the U.S. state of Kansas and the county seat of Sedgwick County.  As of the 2020 census, the population of the city was 397,532, and the Wichita metro area had a population of 647,610.  It is located in south-central Kansas along the Arkansas River.\nWichita began as a trading post on the Chisholm Trail in the 1860s and was incorporated as a city in 1870. It became a destination for cattle drives traveling north from Texas to Kansas railroads, earning it the nickname "Cowtown".  In 1875, Wyatt Earp served as a police officer in Wichita for about one year before going to Dodge City.\nIn the 1920s and 1930s, businessmen and aeronautical engineers established aircraft manufacturing companies in Wichita, including Beechcraft, Cessna, and Stearman Aircraft. The city became an aircr

['2da74a6b-e07a-4d60-9635-100c696bdc29',
 'af282568-dce6-43bc-8c4f-6f293c5f26a8',
 '7953bc29-c61b-4ee8-bedb-520dae00ae88',
 '036c5d52-ecf5-49c0-866e-543880de8057',
 '80611138-5b77-4b76-845b-b7ca4a9f2b75',
 '991e4ba6-747e-4022-9ac2-5a4aae9b0148',
 'a6474de5-4b7d-40ae-9764-b07a8e1dd9da',
 '07cfc718-01ca-4651-8a30-c57e179d3295',
 '7fe4630a-0f85-4803-a97d-04419e339a6a',
 'f5b42612-b2da-4f50-951a-7b852a70aba8',
 '7477b2f4-4aaa-46eb-83d2-d4d0d796caa6',
 'df161fc7-de6d-455b-9905-c5e53819854a',
 'ea893d9d-01be-4462-b031-ddb43dd8d3ef',
 '2b0139fa-5597-4e05-914c-1b71ea3ad885',
 '4a54e23e-eb37-4d13-8305-81f27d724176',
 'a17852d1-951e-4a1a-a174-f82472ce38b4',
 '77524813-aadd-4019-9276-5a5706f0db01',
 '2819efc3-4dc3-4e14-9fa6-9b7455df0332',
 '04fadde5-9629-4139-8d46-42116f848fdb',
 'f3c0d1ed-91e4-4175-9ad4-46f0f7a206ea',
 '1ef688a9-2757-4619-8953-e859ae0192b7',
 '5541d9f6-6613-4101-8c84-f6df0e3a6e47',
 'bf8ac45b-c6bf-4350-9af0-1d695b59dd4a',
 'd1cde1a3-78c9-444c-ac12-491ba8379426',
 '16ba6b79-59da-

In [78]:
# vector_store.get_by_ids([ 'e4fec14d-867b-4cae-bcc8-1733fe7d268d',
#                          ]),
vector_store_document.search('San Francisco, CA', search_type='similarity')#, len(documents), len(vector_store_doc.index_to_docstore_id)


[Document(id='13a95ce3-e5f9-4426-a81c-db358beb8596', metadata={'source': 'ATTRACTION/SANFRANCISCOINTERNATIONALAIRPORT', 'type': 'ATTRACTION'}, page_content='San Francisco International Airport'),
 Document(id='e8152499-a753-4044-bdeb-b2bb4111216a', metadata={'source': 'CITY/SANFRANCISCOCA', 'type': 'CITY'}, page_content='San Francisco, CA - San Francisco, officially the City and County of San Francisco, is a commercial, financial, and cultural center within Northern California. With a population of 808,988 residents as of 2023, San Francisco is the fourth-most populous city in the state of California and the 17th-most populous in the United States. It covers a land area of 46.9 square miles (121 square kilometers) at the upper end of the San Francisco Peninsula, making it the second-most densely populated major U.S. city an'),
 Document(id='ae0735a0-437b-4456-8fb5-0bfececf6cf6', metadata={'source': 'ATTRACTION/CALIFORNIASTATEUNIVERSITYSANBERNARDINO', 'type': 'ATTRACTION'}, page_content

In [78]:
index_nodes = faiss.IndexFlatL2(len(embeddings.embed_query("hello world")))

vector_store_nodes = FAISS(
    embedding_function=embeddings,
    index=index_nodes,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)

from uuid import uuid4

from langchain_core.documents import Document

documents = []
for n in all_vertices:
    if n['name'] == 'CITY':
        des = n['city_name'] + ' - ' + n['_id']
    elif n['name'] == 'ATTRACTION':
        des = n['attraction_name'] + " - " + n['_id']
    elif n['name'] == 'EVENT':
        des = n['event_name'] + ' - ' + n['_id']
    else:
        # Skip other vertex types so `des` is never stale or undefined
        continue
    document_1 = Document(
        page_content=des[0:512],
        metadata={"source": n['_id'], 'type': n['name']},
    )
    documents.append(document_1)

uuids = [str(uuid4()) for _ in range(len(documents))]

vector_store_nodes.add_documents(documents=documents, ids=uuids)

['1767f174-42c9-4bb5-a0f3-62938aa2aa9f',
 'e8bcca17-26f3-4484-b49a-d70e979685fe',
 'e86fcf6d-34d5-40b2-9150-bac0d6382e80',
 'e24a7b35-a8e2-4584-8cd3-afd86ca9515a',
 '8f3cce84-ee20-445e-b108-fa68a711fc49',
 '5536ff48-10f4-470e-8898-8e456867eb41',
 'e1e67856-06cb-45fe-831c-5892f6b1194a',
 '0ff65e92-9406-4509-9c91-7106124fca85',
 '4456614c-370d-42dc-85ef-40b6fb8f42a2',
 'fa352708-b5b1-4df2-b5d2-77441c27a89b',
 '4f21d3ab-411e-45dc-971c-1dca3ba131de',
 '35615e28-8bf1-4854-8aa9-c88104ccddc2',
 '86bc87ba-3e5a-49f2-9395-9ee635ee5e1a',
 '3c25eb23-cb8e-4584-80f9-e4c6fd71ddd0',
 '06250a68-351a-4282-afb3-02bf215e067f',
 'dd6e5f56-5bfa-4586-b106-233fd8c61ffb',
 'b885760a-8269-4ad6-9458-be1fa4259b30',
 'f18fcaf3-f95c-43c5-8509-bcc6f207110e',
 'd568019e-9b1a-4a48-8983-54d835ff85fb',
 '28fd87df-9bc8-4079-a91b-6cf2596e2205',
 '9964ce9c-f696-4df6-905f-063cccd18b87',
 '0722c77a-e128-4e6c-95b4-425a2fabfcb7',
 'c7fc2b64-dafa-402b-a45d-43bf666574a6',
 '32f3c22c-f23c-4c3e-a4d9-c62110502e45',
 'e8d82039-060d-
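
`faiss.IndexFlatL2` performs exact (brute-force) search by Euclidean distance, and the retrievers in this notebook narrow results with a metadata `filter`. The sketch below mimics that idea in plain Python on made-up 2-D vectors. The `l2_search` helper and the toy data are illustrative assumptions, not FAISS or LangChain APIs.

```python
from math import dist

# Toy "index": (vector, metadata) pairs with made-up coordinates.
TOY_INDEX = [
    ((0.0, 0.0), {"source": "CITY/A", "type": "CITY"}),
    ((1.0, 0.0), {"source": "ATTRACTION/B", "type": "ATTRACTION"}),
    ((0.0, 2.0), {"source": "CITY/C", "type": "CITY"}),
    ((3.0, 3.0), {"source": "CITY/D", "type": "CITY"}),
]


def l2_search(query, k, metadata_filter=None):
    """Return the sources of the k nearest entries that match the filter."""
    metadata_filter = metadata_filter or {}
    candidates = [
        (dist(query, vector), meta["source"])
        for vector, meta in TOY_INDEX
        if all(meta.get(key) == value for key, value in metadata_filter.items())
    ]
    # Sort by distance and keep the k closest matches
    return [source for _, source in sorted(candidates)[:k]]


nearest_any = l2_search((0.9, 0.1), k=2)
nearest_cities = l2_search((0.9, 0.1), k=2, metadata_filter={"type": "CITY"})
```

Without a filter the closest entry is the attraction; with `{"type": "CITY"}` only city entries compete, which is the effect the `search_kwargs` filter is used for in the cells that follow.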

In [86]:
def stop_word_remover(text):
    word_tokens = word_tokenize(text)
    word_list = [re.sub('[^A-Za-z0-9]+', '', word) for word in word_tokens if word.lower() not in stopwords]
    return  " ".join(word_list).strip()

retriever = vector_store_nodes.as_retriever(search_type="similarity", search_kwargs={"k": 6,'filter': {'type':'CITY'}})
# retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 3,'filter': {'type':'ATTRACTION'}})
# retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 3,'filter': {'type':'EVENT'}})
q = stop_word_remover('  New York, NY')
print(q)
retriever.invoke(q)

New York  NY


[Document(id='6484eb08-561d-4691-a4b5-03adcace82bb', metadata={'source': 'CITY/NEWYORKNY', 'type': 'CITY'}, page_content='New York, NY - CITY/NEWYORKNY'),
 Document(id='248f6c52-ba82-4f2f-a07d-d831ca3007b6', metadata={'source': 'CITY/ALBANYNY', 'type': 'CITY'}, page_content='Albany, NY - CITY/ALBANYNY'),
 Document(id='b0456ab1-6070-4b8a-b38c-d5607c9988db', metadata={'source': 'CITY/NEWBURGHNY', 'type': 'CITY'}, page_content='Newburgh, NY - CITY/NEWBURGHNY'),
 Document(id='631b1917-b873-4f5b-ae42-1da5fc373ab0', metadata={'source': 'CITY/SHIRLEYNY', 'type': 'CITY'}, page_content='Shirley, NY - CITY/SHIRLEYNY'),
 Document(id='fca037ad-4a12-4e65-b5f3-3dca50d9ae7d', metadata={'source': 'CITY/HAMBURGNY', 'type': 'CITY'}, page_content='Hamburg, NY - CITY/HAMBURGNY'),
 Document(id='794d4899-91d6-4973-937e-7216e22abdf9', metadata={'source': 'CITY/HAMILTONNY', 'type': 'CITY'}, page_content='Hamilton, NY - CITY/HAMILTONNY')]
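
The printed query `New York  NY` contains a double space because the comma token is reduced to an empty string before the tokens are joined. Below is a minimal sketch of a variant that drops empty tokens. It assumes a simple regex tokenizer in place of `word_tokenize` and a small hypothetical stop-word set, so it is not a drop-in replacement for the function above.

```python
import re

# Hypothetical stop-word set for illustration; the notebook's own list may differ.
STOP_WORDS = {"what", "place", "visit", "where"}


def clean_query(text):
    # Regex tokenizer standing in for nltk's word_tokenize:
    # runs of word characters, or single punctuation marks
    tokens = re.findall(r"\w+|[^\w\s]", text)
    cleaned = [
        re.sub(r"[^A-Za-z0-9]+", "", token)
        for token in tokens
        if token.lower() not in STOP_WORDS
    ]
    # Drop tokens that became empty (pure punctuation) to avoid double spaces
    return " ".join(token for token in cleaned if token)


cleaned_query = clean_query("  New York, NY")
```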

In [83]:
vector_store_document.save_local("voyage_faiss_index")
vector_store_nodes.save_local("voyage_faiss_index_nodes")

# 3. Connect to Graph


In [3]:
# 1. Re-connect to the same Graph

G_adb = nxadb.MultiDiGraph(name="Tourism", db=db,
                            read_parallelism=20, read_batch_size=500000)
# Pre-Fetch all data
# This will load all nodes
nodes = list(G_adb.nodes(data=True))

# This will load all edges
edges = list(G_adb.edges(data=True))
print('Nodes: ', len(nodes), '; Edges: ', len(edges))

[05:45:29 +0000] [INFO]: Graph 'Tourism' exists.
[05:45:29 +0000] [INFO]: Default node type set to 'ATTRACTION'


{'vertex_collections': [{'name': 'ATTRACTION', 'fields': []}, {'name': 'CITY', 'fields': []}, {'name': 'airports', 'fields': []}], 'edge_collections': [], 'database_config': {'endpoints': ['https://a02eeb6a1295.arangodb.cloud:8529'], 'database': '_system', 'username': 'root', 'password': '<redacted>'}, 'load_config': {'parallelism': 20, 'batch_size': 500000, 'prefetch_count': 5, 'load_all_vertex_attributes': True, 'load_all_edge_attributes': False}}
{'load_adj_dict': False, 'load_coo': False, 'is_directed': False, 'is_multigraph': False, 'symmetrize_edges_if_directed': False}
{'vertex_collections': [{'name': 'ATTRACTION', 'fields': []}, {'name': 'CITY', 'fields': []}, {'name': 'airports', 'fields': []}], 'edge_collections': [{'name': 'flights', 'fields': []}, {'name': 'CITY_HAS_AIRPORT', 'fields': []}, {'name': 'CITY_HAS_ATTRACTION', 'fields': []}], 'database_config': {'endpoints': ['https://a02eeb6a1295.arangodb.cloud:8529'], 'database': '_system', 'username': 'root', 'passw

In [4]:
# 2. Define the llm object
# We use GPT-4o (matching `model_name` below)
import os

llm = ChatOpenAI(temperature=0, model_name="gpt-4o")
arango_graph = ArangoGraph(db)


### Define a Semantic Search Graph

In [7]:
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict
from uuid import uuid4

from langchain_core.documents import Document

from langchain_openai import OpenAIEmbeddings

# from langchain_ollama import OllamaEmbeddings

# embeddings = OllamaEmbeddings(
#     model="snowflake-arctic-embed:110m",
# )

embeddings = OpenAIEmbeddings(
    model="text-embedding-ada-002"
)

vector_store = FAISS.load_local(
    "voyage_faiss_index", embeddings, allow_dangerous_deserialization=True
)
vector_store_document = vector_store

vector_store_nodes = FAISS.load_local(
    "voyage_faiss_index_nodes", embeddings, allow_dangerous_deserialization=True
)

retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 6})
retriever_nodes = vector_store_nodes.as_retriever(search_type="similarity", search_kwargs={"k": 6})


stop_words = ['what', 'place', 'visit', 'event', 'city', 'airport', 'where']
rag_prompt = hub.pull("rlm/rag-prompt")

def stop_word_remover(text):
    word_tokens = word_tokenize(text)
    word_list = [re.sub('[^A-Za-z0-9]+', '', word) for word in word_tokens if word.lower() not in stopwords]
    return  " ".join(word_list).strip()
# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    nearby_nodes: str
    answer: str



# Define application steps
def retrieve(state: State):
    query = stop_word_remover(state["question"])

    retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 10,'filter': {'type':'CITY'}})
    retrieved_docs = retriever.invoke(query)
    retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 3,'filter': {'type':'ATTRACTION'}})
    retrieved_docs.extend(retriever.invoke(query))
    retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 3,'filter': {'type':'EVENT'}})
    retrieved_docs.extend(retriever.invoke(query))


    # print(len(retrieved_docs))
    nearby_nodes = []
    for r in retrieved_docs:
        try:
            node_id = (r.metadata['source'])
            print(node_id)
            if node_id[0:4] == 'CITY':
                aql = f""" FOR city IN CITY
                    FILTER city._id == "{node_id}"

                    LET airports = (
                        FOR v, e IN 1..1 OUTBOUND city._id GRAPH 'Tourism'
                        FILTER e.edge_collection == 'CITY_HAS_AIRPORT'
                        RETURN {{
                            format: CONCAT(city._id, " -> ", e.edge_collection, " -> ", v._id, '\ncity_name of ', city._id, ' is ', city.city_name)
                        }}
                    )

                    LET attractions = (
                        FOR v, e IN 1..1 OUTBOUND city._id GRAPH 'Tourism'
                        FILTER e.edge_collection == 'CITY_HAS_ATTRACTION'
                        RETURN {{
                            format: CONCAT(city._id, " -> ", e.edge_collection, " -> ", v._id, '\nNote: attraction_name of ', v._id, ' is ', v.attraction_name, '; attraction_description is ', v.attraction_description)
                        }}
                    )
                    LET events = (
                        FOR v, e IN 1..1 OUTBOUND city._id GRAPH 'Tourism'
                        FILTER e.edge_collection == 'CITY_HAS_EVENT'
                        RETURN {{
                            format: CONCAT(city._id, " -> ", e.edge_collection, " -> ", v._id, '\nNote: event_name of ', v._id, ' is ', v.event_name, '; event_description is ', v.event_description)
                        }}
                    )

                    RETURN {{
                        format: APPEND(APPEND(airports, attractions), events)
                    }}
                """
                cursor = db.aql.execute(aql)
                # Print the results
                for doc in cursor:
                    for f in (doc['format']):
                        nearby_nodes.extend(f['format'].split('\n'))
        except Exception as e:
            print(str(e))
            pass
    nearby_nodes = sorted(nearby_nodes)

    return {"context": retrieved_docs, 'nearby_nodes': '\n'.join(nearby_nodes)}


def generate(state: State):
    docs_content = "\n\n".join(doc.page_content.strip() for doc in state["context"])
    rag_content = f""" **Information from Documents:
{docs_content}
** Information from Nearby Nodes of Data:
{state['nearby_nodes']}"""
    print(rag_content)
    messages = rag_prompt.invoke({"question": state["question"], "context": rag_content })
    response = llm.invoke(messages)
    return {"answer": response.content}


# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
response = graph.invoke({"question": "Find places to visit in San Francisco?"})
response['answer']



CITY/SANFRANCISCOCA
CITY/ANGWINCA
CITY/SANJOSECA
ATTRACTION/SANFRANCISCOINTERNATIONALAIRPORT
ATTRACTION/YOSEMITENATIONALPARK
ATTRACTION/PARKSANDTRAILS
 **Information from Documents:
San Francisco, CA - San Francisco, officially the City and County of San Francisco, is a commercial, financial, and cultural center within Northern California. With a population of 808,988 residents as of 2023, San Francisco is the fourth-most populous city in the state of California and the 17th-most populous in the United States. It covers a land area of 46.9 square miles (121 square kilometers) at the upper end of the San Francisco Peninsula, making it the second-most densely populated major U.S. city an

Angwin, CA - The following airports are in the area around the San Francisco Bay, including the cities of San Jose, San Francisco, and Oakland.  The list includes only public-use and/or government-owned airports in the eleven counties (the nine counties that border the bay, plus Santa Cruz and San Benit

'In San Francisco, you can visit iconic attractions such as Alcatraz Island, the Golden Gate Bridge, and catch a game with the Golden State Warriors or the San Francisco Giants.'
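
The `retrieve` step interpolates `node_id` into the AQL text with an f-string. A hedged alternative is to pass it as a bind parameter, which python-arango supports through the `bind_vars` argument of `db.aql.execute`. The helper name `build_city_neighbors_query` and the simplified query are illustrative assumptions, not the notebook's code.

```python
def build_city_neighbors_query(node_id):
    """Return an AQL query and its bind variables for one city's neighbours."""
    aql = """
    FOR city IN CITY
        FILTER city._id == @node_id
        FOR v, e IN 1..1 OUTBOUND city._id GRAPH 'Tourism'
            RETURN CONCAT(city._id, " -> ", e.edge_collection, " -> ", v._id)
    """
    # The value travels separately from the query text, so it is never
    # parsed as AQL and the braces no longer need f-string escaping.
    return aql, {"node_id": node_id}


aql, bind_vars = build_city_neighbors_query("CITY/SANFRANCISCOCA")
# With a live connection (assumed to be the notebook's `db` object):
# cursor = db.aql.execute(aql, bind_vars=bind_vars)
```

Besides guarding against malformed ids, this keeps the query text constant across cities.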

In [8]:
graph_schema = {
    'Graph Schema': [g for g in arango_graph.schema['Graph Schema'] if g['graph_name'] == 'Tourism'],
    'Collection Schema': [g for g in arango_graph.schema['Collection Schema']
                          if g['collection_name'] in ['CITY_HAS_ATTRACTION', 'CITY_HAS_AIRPORT', 'CITY_HAS_EVENT',
                                                      'ATTRACTION', 'CITY', 'EVENT', 'airports', 'flights']],
}
graph_schema

{'Graph Schema': [{'graph_name': 'Tourism',
   'edge_definitions': [{'edge_collection': 'CITY_HAS_AIRPORT',
     'from_vertex_collections': ['CITY'],
     'to_vertex_collections': ['airports']},
    {'edge_collection': 'CITY_HAS_ATTRACTION',
     'from_vertex_collections': ['CITY'],
     'to_vertex_collections': ['ATTRACTION']},
    {'edge_collection': 'flights',
     'from_vertex_collections': ['airports'],
     'to_vertex_collections': ['airports']}]}],
 'Collection Schema': [{'collection_name': 'CITY',
   'collection_type': 'document',
   'document_properties': [{'name': '_key', 'type': 'str'},
    {'name': '_id', 'type': 'str'},
    {'name': '_rev', 'type': 'str'},
    {'name': 'collection', 'type': 'str'},
    {'name': 'city_name', 'type': 'str'},
    {'name': 'description', 'type': 'str'},
    {'name': 'climate_type', 'type': 'str'}],
   'example_document': {'_key': 'WICHITAKS',
    '_id': 'CITY/WICHITAKS',
    '_rev': '_jVU9FFq---',
    'collection': 'CITY',
    'city_name': 'Wi

In [9]:
# Context Manager
# To Store Outputs across Function calls
class ContextManager:
    def __init__(self):
        self.context = {}

    def clear(self):
        self.context = {}

    def store_context(self, key, value):
        self.context[key] = value

    def get_context(self, key):
        return self.context.get(key, "No context available")

context_manager = ContextManager()
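
A short usage sketch of the class above, redefined here so the snippet runs on its own. Note that `get_context` returns the string `"No context available"` for a missing key rather than `None`, so callers that expect graph data should check for that sentinel. The stored value is made-up sample data.

```python
class ContextManager:
    def __init__(self):
        self.context = {}

    def clear(self):
        self.context = {}

    def store_context(self, key, value):
        self.context[key] = value

    def get_context(self, key):
        return self.context.get(key, "No context available")


cm = ContextManager()
cm.store_context("aql_result", [{"_id": "CITY/NEWYORKNY"}])  # sample data

stored = cm.get_context("aql_result")         # the list stored above
missing = cm.get_context("networkx_result")   # sentinel string, not None

cm.clear()
after_clear = cm.get_context("aql_result")    # sentinel again after clear()
```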


In [10]:
# Modified AQL Template, to be used after NetworkX Call
AQL_GENERATION_TEMPLATE = """Task: Generate an ArangoDB Query Language (AQL) query from a User Input and Results of a NetworkX Algorithm.

You are an ArangoDB Query Language (AQL) expert responsible for translating a `User Input` and NetworkX Algorithm Results into an ArangoDB Query Language (AQL) query.

You are given an `ArangoDB Schema`. It is a JSON Object containing:
1. `Graph Schema`: Lists all Graphs within the ArangoDB Database Instance, along with their Edge Relationships.
2. `Collection Schema`: Lists all Collections within the ArangoDB Database Instance, along with their document/edge properties and a document/edge example.

You are also given `NetworkX Algorithm Results`: a dictionary that was calculated previously. Use it to create the AQL query.
You may also be given a set of `AQL Query Examples` to help you create the `AQL Query`. If provided, the `AQL Query Examples` should be used as a reference, similar to how `ArangoDB Schema` should be used.

Things you should do:
- Think step by step.
- Rely on `ArangoDB Schema`, `NetworkX Algorithm Results` and `AQL Query Examples` (if provided) to generate the query.
- Begin the `AQL Query` by the `WITH` AQL keyword to specify all of the ArangoDB Collections required.
- Return the `AQL Query` wrapped in 3 backticks (```).
- Use only the provided relationship types and properties in the `ArangoDB Schema` and any `AQL Query Examples` queries.
- Only answer to requests related to generating an AQL Query.
- If a request is unrelated to generating AQL Query, say that you cannot help the user.

Things you should not do:
- Do not use any properties/relationships that can't be inferred from the `ArangoDB Schema` or `NetworkX Algorithm Results` or the `AQL Query Examples`.
- Do not include any text except the generated AQL Query.
- Do not provide explanations or apologies in your responses.
- Do not generate an AQL Query that removes or deletes any data.

Under no circumstance should you generate an AQL Query that deletes any data whatsoever.

ArangoDB Schema:
{adb_schema}


AQL Query Examples (Optional):
{aql_examples}

User Input:
{user_input}

AQL Query:
"""

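The template is filled by name, so only the `{adb_schema}`, `{aql_examples}` and `{user_input}` placeholders receive values. The NetworkX results have no placeholder of their own and therefore have to travel inside `user_input`. A minimal sketch with a shortened, hypothetical template shows how to list the placeholders with the standard library; the sample result dictionary is made up.

```python
from string import Formatter

# Shortened stand-in for AQL_GENERATION_TEMPLATE (illustrative only)
TEMPLATE = """ArangoDB Schema:
{adb_schema}

AQL Query Examples (Optional):
{aql_examples}

User Input:
{user_input}

AQL Query:
"""


def placeholders(template):
    """Return the named fields that str.format would fill, in order."""
    return [field for _, field, _, _ in Formatter().parse(template) if field]


fields = placeholders(TEMPLATE)

# Made-up values; the NetworkX result is appended to the user input.
filled = TEMPLATE.format(
    adb_schema="{}",
    aql_examples="",
    user_input=(
        "Which city has the most airports?\n\n"
        "NetworkX Algorithm Results:\n{'airports/SFO': 0.12}"
    ),
)
```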
In [11]:
def get_context(query: str):

    retrieved_docs = ["# Example Nodes:\n*Important*: Only use Examples for Filtering the nodes. Do not use the data to create new Graphs\n\n## Here are few examples of related City Nodes. Format: City Name - _id  \n"]
    retriever = vector_store_nodes.as_retriever(search_type="similarity", search_kwargs={"k": 6,'filter': {'type':'CITY'}})
    city_nodes = retriever.invoke(query)
    retrieved_docs.extend([r.page_content for r in city_nodes])
    # print(retrieved_docs)
    retrieved_docs.extend(["\n## Here are few examples of related Attraction Nodes. Format: Attraction Name - _id  \n"])
    retriever = vector_store_nodes.as_retriever(search_type="similarity", search_kwargs={"k": 3,'filter': {'type':'ATTRACTION'}})
    retrieved_docs.extend([r.page_content for r in retriever.invoke(query)])
    retrieved_docs.extend(["\n## Here are few examples of related Event Nodes. Format: Event Name - _id  \n"])
    retriever = vector_store_nodes.as_retriever(search_type="similarity", search_kwargs={"k": 3,'filter': {'type':'EVENT'}})
    retrieved_docs.extend([r.page_content for r in retriever.invoke(query)])
    # print(len(retrieved_docs))
    nearby_nodes = []
    for r in city_nodes:
        try:
            node_id = (r.metadata['source'])
            # print(node_id)
            if node_id[0:4] == 'CITY':
                aql = f""" FOR city IN CITY
                    FILTER city._id == "{node_id}"

                    LET airports = (
                        FOR v, e IN 1..1 OUTBOUND city._id GRAPH 'Tourism'
                        FILTER e.edge_collection == 'CITY_HAS_AIRPORT'
                        RETURN {{
                            format: CONCAT(city._id, " -> ", e.edge_collection, " -> ", v._id)
                        }}
                    )

                    LET attractions = (
                        FOR v, e IN 1..1 OUTBOUND city._id GRAPH 'Tourism'
                        FILTER e.edge_collection == 'CITY_HAS_ATTRACTION'
                        RETURN {{
                            format: CONCAT(city._id, " -> ", e.edge_collection, " -> ", v._id)
                        }}
                    )
                    LET events = (
                        FOR v, e IN 1..1 OUTBOUND city._id GRAPH 'Tourism'
                        FILTER e.edge_collection == 'CITY_HAS_EVENT'
                        RETURN {{
                            format: CONCAT(city._id, " -> ", e.edge_collection, " -> ", v._id)
                        }}
                    )

                    RETURN {{
                        format: APPEND(APPEND(airports, attractions), events)
                    }}
                """
                cursor = db.aql.execute(aql)
                # Print the results
                for doc in cursor:
                    for f in (doc['format']):
                        nearby_nodes.extend(f['format'].split('\n'))
        except Exception as e:
            # print(str(e))
            pass
    nearby_nodes = sorted(nearby_nodes)
    retrieved_docs.append('\n# Here are some nearby Nodes for reference:\n')
    retrieved_docs.extend(nearby_nodes)
    context = '\n'.join(retrieved_docs)
    print(context)
    return context
get_context('Shortest distance between New york and San Francisco ')

# Example Nodes:
*Important*: Only use Examples for Filtering the nodes. Do not use the data to create new Graphs

## Here are few examples of related City Nodes. Format: City Name - _id  

San Francisco, CA - CITY/SANFRANCISCOCA
New York, NY - CITY/NEWYORKNY
Albany, NY - CITY/ALBANYNY
San Carlos, CA - CITY/SANCARLOSCA
San Jose, CA - CITY/SANJOSECA
Shirley, NY - CITY/SHIRLEYNY

## Here are few examples of related Attraction Nodes. Format: Attraction Name - _id  

San Francisco International Airport - ATTRACTION/SANFRANCISCOINTERNATIONALAIRPORT
San Francisco Giants - ATTRACTION/SANFRANCISCOGIANTS
New York Times - ATTRACTION/NEWYORKTIMES

## Here are few examples of related Event Nodes. Format: Event Name - _id  


# Here are some nearby Nodes for reference:

CITY/ALBANYNY -> CITY_HAS_AIRPORT -> airports/ALB
CITY/ALBANYNY -> CITY_HAS_ATTRACTION -> ATTRACTION/FORTORANGE
CITY/NEWYORKNY -> CITY_HAS_AIRPORT -> airports/JRB
CITY/SANCARLOSCA -> CITY_HAS_AIRPORT -> airports/SQL
CITY/SANFRANCISC

'# Example Nodes:\n*Important*: Only use Examples for Filtering the nodes. Do not use the data to create new Graphs\n\n## Here are few examples of related City Nodes. Format: City Name - _id  \n\nSan Francisco, CA - CITY/SANFRANCISCOCA\nNew York, NY - CITY/NEWYORKNY\nAlbany, NY - CITY/ALBANYNY\nSan Carlos, CA - CITY/SANCARLOSCA\nSan Jose, CA - CITY/SANJOSECA\nShirley, NY - CITY/SHIRLEYNY\n\n## Here are few examples of related Attraction Nodes. Format: Attraction Name - _id  \n\nSan Francisco International Airport - ATTRACTION/SANFRANCISCOINTERNATIONALAIRPORT\nSan Francisco Giants - ATTRACTION/SANFRANCISCOGIANTS\nNew York Times - ATTRACTION/NEWYORKTIMES\n\n## Here are few examples of related Event Nodes. Format: Event Name - _id  \n\n\n# Here are some nearby Nodes for reference:\n\nCITY/ALBANYNY -> CITY_HAS_AIRPORT -> airports/ALB\nCITY/ALBANYNY -> CITY_HAS_ATTRACTION -> ATTRACTION/FORTORANGE\nCITY/NEWYORKNY -> CITY_HAS_AIRPORT -> airports/JRB\nCITY/SANCARLOSCA -> CITY_HAS_AIRPORT -> ai

In [12]:
# 4. Define all the Tools, including Re-try Mechanism
# Reference: https://python.langchain.com/docs/integrations/graphs/arangodb/
# Reference: https://python.langchain.com/api_reference/community/chains/langchain_community.chains.graph_qa.arangodb.ArangoGraphQAChain.html
from langchain_core.prompts.prompt import PromptTemplate

@tool
def semantic_search(query: str):
    """Use this tool to search the text descriptions of cities, tourist attractions, and events.
    Do not use this tool if you need AQL or a NetworkX graph algorithm.

    **Args:**
    - query: Query asked by the user
    """
    response = graph.invoke({"question": query})
    return response['answer']

@tool
def text_to_aql(query: str):
    """This tool is available to invoke the
    ArangoGraphQAChain object, which enables you to
    translate a Natural Language Query into AQL and execute the AQL.

    Use this tool only when you need to execute an AQL query to retrieve data from the graph database.

    **Args:**
    - query: Query Asked by the User
    """
    try:
        context = get_context(query)
        llm = ChatOpenAI(temperature=0, model_name="gpt-4o")


        chain = ArangoGraphQAChain.from_llm(
            llm=llm,
            graph=arango_graph,
            verbose=True,
            allow_dangerous_requests=True,
            return_aql_query=True,
            return_aql_result = True
        )
        chain.return_aql_result = True



        result = chain.invoke(context + "\n\nQuery:\n" + query)

        context_manager.store_context("aql_result", result["aql_result"])
        return 'Data Stored in Memory. Ensure to set `use_aql_result` to True for further processing'
    except Exception as e:
        return f'An error occurred - {str(e)}. You can re-try the tool with more detailed input'

@tool
def text_to_aql_to_text(query: str):
    """This tool is available to invoke the
    ArangoGraphQAChain object, which enables you to
    translate a Natural Language Query into AQL, execute
    the query, and translate the result back into Natural Language.

    Use this tool only when a single AQL query is enough to answer the user.

    **Args:**
    - query: Query Asked by the User
    """
    try:
        llm = ChatOpenAI(temperature=0, model_name="gpt-4o")
        context = get_context(query)

        chain = ArangoGraphQAChain.from_llm(
            llm=llm,
            graph=arango_graph,
            verbose=True,
            allow_dangerous_requests=True,
            return_aql_query=True,
            return_aql_result = True
        )
        chain.return_aql_result = True



        result = chain.invoke(context + "\n\nQuery:\n" + query)

        context_manager.store_context("aql_text_result", result["result"])
        return str(result["result"])
    except Exception as e:
        return f'An error occurred - {str(e)}. You can re-try the tool with more detailed input'

# 5. Define the Text to NetworkX/cuGraph Tool
# Note: It is encouraged to experiment and improve this section! This is just a placeholder:

@tool
def text_to_nx_algorithm_to_text(query: str, use_aql_result: bool = False):
    """This tool is available to invoke a NetworkX Algorithm on
    the ArangoDB Graph. You are responsible for accepting the natural
    language query, determining the appropriate NetworkX algorithm to
    execute, running the algorithm on the provided `G_adb` graph, and
    translating the results back into natural language as a concise answer.

    If the query (e.g., traversals, shortest path, etc.) can be solved using
    Arango Query Language, then do not use this tool.

    **Args:**
    - query: Query Asked by the User
    - use_aql_result: True if previously the text_to_aql Tool had been called for retrieving data

    """

    try:
        llm = ChatOpenAI(temperature=0, model_name="gpt-4o")
        G_query = G_adb
        context = get_context(query)


        base_query = context + f"""I have a NetworkX Graph called `G_query` with the following schema: {graph_schema}
        """

        if use_aql_result:
            # Build the working graph from the data stored by `text_to_aql`
            graph_data = context_manager.get_context("aql_result")
            G_query = nx.MultiDiGraph(graph_data)

        ######################
        print("\n\n1) Generating NetworkX code - use_aql_result: " , use_aql_result)

        text_to_nx = llm.invoke(f"""{base_query}

        I need to answer the following graph analysis query: {query}.

        Determine the most precise NetworkX algorithm to answer this query and generate the Python code to do so.
        Think step by step. Only assume that networkx and standard Python libraries are available.
        You must prefer the algorithms supported by nx-cugraph. Use the most suitable backend.
        For community detection, prefer leiden_communities or louvain_communities
        Only provide python code that I can directly execute via `exec()`. Do not provide any instructions.

        Note:
        - Use the `flights` edge collection for connections between two airports
        - Use the `CITY_HAS_AIRPORT` edge collection for identifying airport for a city
        - Use the `CITY_HAS_ATTRACTION` edge collection for connections between a city and its attractions
        - Always execute the code in try except block.

        Ensure your code:
        - Executes directly with exec().
        - Sets the final answer in a variable called FINAL_RESULT.
        - Assigns FINAL_RESULT a short, concise answer.

        Your code:
        """).content

        text_to_nx_cleaned = re.sub(r"^```python\n|```$", "", text_to_nx, flags=re.MULTILINE).strip()

        print('-'*10)
        print(text_to_nx_cleaned)
        print('-'*10)

        ######################
        print("\n2) Executing NetworkX code")

        global_vars = {"G_query": G_query, "nx": nx}
        local_vars = {}

        attempt = 1
        MAX_ATTEMPTS = 3
        last_exception = None

        while attempt <= MAX_ATTEMPTS:
            try:
                exec(text_to_nx_cleaned, global_vars, local_vars)
                # On success, capture the final generated code for later use.
                text_to_nx_final = text_to_nx
                break
            except Exception as e:
                print(f"EXEC ERROR on attempt {attempt}: {e}")
                last_exception = e
                # Ask the LLM to correct the generated code based on the error.
                text_to_nx = llm.invoke(f"""{base_query}
                    I need to answer the following graph analysis query: {query}.

                    I generated the following Python code to answer a NetworkX analysis query:
                    ---
                    {text_to_nx_cleaned}
                    ---
                    When executing this code on the graph `G_query`, it raised the following exception: {e}.
                    Please provide a corrected version of the code that resolves the error.
                    Only provide python code that I can directly execute via `exec()`. Do not provide any instructions or explanation.
                    **Important:** 
                    - Always Enclose the corrected code in a code block ```
                    - Always execute the code in try except block.

                    Your corrected code should execute directly via exec() and assign a concise answer to FINAL_RESULT.
                """).content

                text_to_nx_cleaned = re.sub(r"^```python\n|```$", "", text_to_nx, flags=re.MULTILINE).strip()
                print('-'*10)
                print(text_to_nx_cleaned)
                print('-'*10)

                attempt += 1

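        # Stop if every attempt failed, or if the code ran without producing a result
        if attempt > MAX_ATTEMPTS:
            return f"EXEC ERROR after {MAX_ATTEMPTS} attempts: {last_exception}"
        if "FINAL_RESULT" not in local_vars:
            return "EXEC ERROR: the generated code did not set FINAL_RESULT"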

        print('-'*10)
        FINAL_RESULT = local_vars["FINAL_RESULT"]
        print(f"FINAL_RESULT: {FINAL_RESULT}")
        print('-'*10)
        context_manager.store_context("networkx_result", FINAL_RESULT)

        ######################
        print("3) Formulating final answer")

        nx_to_text = llm.invoke(f"""
            I have a NetworkX Graph called `G_adb` with the following schema: {graph_schema}

            I received the following graph analysis query: {query}.

            I executed the following Python code:
            ---
            {text_to_nx_final}
            ---
            and obtained `FINAL_RESULT` with the value: {FINAL_RESULT}.

            Based on this information, generate a short and concise natural language answer to the query.

            Your response:
        """).content

        return nx_to_text
    except Exception as e:
        return f'An error occurred - {str(e)}. You can re-try the tool with more detailed input'


@tool
def use_networkx_result_in_aql(text_query: str):
    """This tool is available to invoke an AQL Query using the
    ArangoGraphQAChain object after a NetworkX Algorithm has been executed
    using the tool `text_to_nx_algorithm_to_text` on the ArangoDB Graph.

    If the query can be answered with Arango Query Language alone, without
    a previous NetworkX result, then do not use this tool.

    **Args:**
    - text_query: Query which needs an answer

    """

    try:
        context = get_context(text_query)
        networkx_result = context_manager.get_context("networkx_result")
        print('networkx_result: ', networkx_result)

        llm = ChatOpenAI(temperature=0, model_name="gpt-4o")



        # Only the placeholders that actually appear in the template are declared
        aql_generation_prompt = PromptTemplate(
            input_variables=["adb_schema", "aql_examples", "user_input"],
            template=AQL_GENERATION_TEMPLATE,
        )

        chain =  ArangoGraphQAChain.from_llm(
            llm=llm,
            graph=arango_graph,
            aql_generation_prompt=aql_generation_prompt,
            verbose=True,
            allow_dangerous_requests=True
        )
        chain.return_aql_result = True

        # Invoke the chain once, passing the stored NetworkX result alongside the query.
        result = chain.invoke({
            "user_input": f"None\n\nNetworkX Algorithm Results: \n{networkx_result}",
            "query": context + "\nQuery: \n" + text_query,
        })

        context_manager.store_context("networkx_aql_result", result["result"])
        return str(result["result"])
    except Exception as e:
        return f'An error occurred - {str(e)}. You can re-try the tool with more detailed input'
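
The execute-and-retry step inside `text_to_nx_algorithm_to_text` can be illustrated in isolation. The sketch below is a minimal stand-in, not the notebook's implementation: `run_with_repair` and its `repair` callback are hypothetical names, and in the notebook the repair step would be an LLM call that rewrites the failing code. It only shows the idea of running generated code, reading `FINAL_RESULT` from the local namespace, and giving up after a bounded number of attempts.

```python
def run_with_repair(code, repair, max_attempts=3, global_vars=None):
    """Execute `code` until it defines FINAL_RESULT or attempts run out.

    `repair(code, error)` is a hypothetical callback returning a corrected
    code string (an LLM call in the notebook; any callable here).
    """
    last_exception = None
    for _ in range(max_attempts):
        local_vars = {}
        try:
            # Generated code runs in its own namespace so that results
            # can be read back from `local_vars` afterwards.
            exec(code, dict(global_vars or {}), local_vars)
            if "FINAL_RESULT" in local_vars:
                return local_vars["FINAL_RESULT"]
            last_exception = NameError("generated code did not set FINAL_RESULT")
        except Exception as e:
            last_exception = e
        # Ask the repair callback for a new version before the next attempt.
        code = repair(code, last_exception)
    return f"EXEC ERROR after {max_attempts} attempts: {last_exception}"


# Toy usage: the first version fails, the "repaired" version succeeds.
result = run_with_repair("FINAL_RESULT = 1 / 0", lambda code, err: "FINAL_RESULT = 6 * 7")
print(result)
```

Running untrusted generated code with `exec` is risky outside a sandbox, so a real deployment would likely need stricter isolation than this sketch provides.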



In [13]:
# 6. Create the Agentic Application

tools = [semantic_search, text_to_aql_to_text, text_to_nx_algorithm_to_text, text_to_aql, use_networkx_result_in_aql]
prompt = hub.pull("hwchase17/react")

print(prompt)


def query_graph(query):
    llm = ChatOpenAI(temperature=0, model_name="gpt-4o")
    agent = create_react_agent(llm, tools, prompt)
    agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

    context_manager.clear()
    final_state = agent_executor.invoke({"input": query})
    return final_state['output']



input_variables=['agent_scratchpad', 'input', 'tool_names', 'tools'] input_types={} partial_variables={} metadata={'lc_hub_owner': 'hwchase17', 'lc_hub_repo': 'react', 'lc_hub_commit_hash': 'd15fe3c426f1c4b3f37c9198853e4a86e20c425ca7f4752ec0c9b0e97ca7ea4d'} template='Answer the following questions as best you can. You have access to the following tools:\n\n{tools}\n\nUse the following format:\n\nQuestion: the input question you must answer\nThought: you should always think about what to do\nAction: the action to take, should be one of [{tool_names}]\nAction Input: the input to the action\nObservation: the result of the action\n... (this Thought/Action/Action Input/Observation can repeat N times)\nThought: I now know the final answer\nFinal Answer: the final answer to the original input question\n\nBegin!\n\nQuestion: {input}\nThought:{agent_scratchpad}'
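
The prompt template printed above asks the model to answer in a fixed `Thought / Action / Action Input / Final Answer` layout. The sketch below is a simplified illustration of how such output could be parsed; it is not LangChain's own parser, and `parse_react_step` is a hypothetical name. It also shows why free-form text that follows neither pattern leads to an output parsing error of the kind that appears in the hybrid run later in this notebook.

```python
import re


def parse_react_step(text):
    """Parse one ReAct-style model response.

    Returns ("final", answer) or ("action", tool_name, tool_input).
    Raises ValueError when the text follows neither pattern.
    """
    final = re.search(r"Final Answer:\s*(.*)", text, re.DOTALL)
    if final:
        return ("final", final.group(1).strip())

    action = re.search(r"Action:\s*(.*?)\s*\n\s*Action Input:\s*(.*)", text, re.DOTALL)
    if action:
        tool_name = action.group(1).strip()
        tool_input = action.group(2).strip().strip('"')
        return ("action", tool_name, tool_input)

    raise ValueError(f"Could not parse LLM output: `{text}`")


step = parse_react_step(
    'Thought: look it up\nAction: semantic_search\nAction Input: "San Francisco airport name"'
)
print(step)
```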


In [80]:
query = "What is name of San Francisco airport and what all places can i visit?."
resp = query_graph(query)




> Entering new AgentExecutor chain...
To answer the question, I need to find the name of the San Francisco airport and the tourist attractions or places to visit in San Francisco. I will start by searching for the name of the San Francisco airport.

Action: semantic_search
Action Input: "San Francisco airport name"
**Information from Documents:
Angwin, CA - The following airports are in the area around the San Francisco Bay, including the cities of San Jose, San Francisco, and Oakland.  The list includes only public-use and/or government-owned airports in the eleven counties (the nine counties that border the bay, plus Santa Cruz and San Benito Counties) that make up the Census Bureau's San Jose–San Francisco–Oakland, CA Combined Statistical Area.

San Francisco, CA - San Francisco, officially the City and County of San Francisco, is a commercial, financial, and cultural center within Northern California. With a population of 808,988 residents as of 2023, San

## Simple Queries - Use Cases Answerable by AQL Queries



In [None]:
simple_queries = []
### Connectivity and Basic Path Finding
simple_queries.append("Which cities are directly connected by flights to San Francisco?")
simple_queries.append("What is the shortest flight path between San Francisco and New York?")
### Tourist Attraction Analysis
simple_queries.append("Which cities have the most tourist attractions?")
simple_queries.append("What are all the tourist attractions in a San Francisco?")
simple_queries.append("Which Top 5 cities have both airports and tourist attractions?")

### Airport and Flight Statistics
simple_queries.append("Which airports have the most incoming and outgoing flights?")
simple_queries.append("What is the average number of direct flight connections per airport?")

### Airport Connectivity Analysis
simple_queries.append("Which airports serve as major hubs with the most connections?")
simple_queries.append("Which cities have airports with direct flights to the most other cities?")

### Tourist Attraction Distribution
simple_queries.append("How many tourist attractions are there per city on average?")
simple_queries.append("Which cities have the highest ratio of tourist attractions to airports?")


In [101]:
simple_answers = []
for q in simple_queries:
  try:
    ans = query_graph(q)
  except Exception:
    # The first attempt failed; retry once before recording the error.
    try:
      ans = query_graph(q)
    except Exception as e:
      ans = 'An Error Occurred while executing the LLM Chain. Please re-try. Error: ' + str(e)
  simple_answers.append(ans)




> Entering new AgentExecutor chain...
To determine which cities are directly connected by flights to San Francisco, I need to query the graph database to find the direct flight connections. This can be done using an AQL query to retrieve the relevant data.

Action: text_to_aql
Action Input: "Find cities directly connected by flights to San Francisco."
# Example Nodes:
*Important*: Only use Examples for Filtering the nodes. Do not use the data to create new Graphs

## Here are few examples of related City Nodes. Format: City Name - _id  

San Francisco, CA - CITY/SANFRANCISCOCA
San Jose, CA - CITY/SANJOSECA
Sacramento, CA - CITY/SACRAMENTOCA
Fresno, CA - CITY/FRESNOCA
San Carlos, CA - CITY/SANCARLOSCA
Santa Rosa, CA - CITY/SANTAROSACA

## Here are few examples of related Attraction Nodes. Format: Attraction Name - _id  

San Francisco International Airport - ATTRACTION/SANFRANCISCOINTERNATIONALAIRPORT
San Francisco Giants - ATTRACTION/SANFRANCISCOGIANTS
San Die

In [103]:
for q, a in zip(simple_queries, simple_answers):
  print('_' * 50)
  print('Question: ', q)
  print('Answer: ', a)


__________________________________________________
Question:  Which cities are directly connected by flights to San Francisco?
Answer:  There are currently no cities in the database that have direct flights to San Francisco.
__________________________________________________
Question:  What is the shortest flight path between San Francisco and New York?
Answer:  The shortest flight path from San Francisco to New York is a direct flight from San Francisco International Airport (SFO) to New York's Downtown Manhattan Heliport (JRB).
__________________________________________________
Question:  Which cities have the most tourist attractions?
Answer:  The city with the most tourist attractions is Corinth, MS, with 18 attractions. It is followed by Riverside, CA with 10 attractions. Millbrook, NY and Cross Keys, NJ each have 9 attractions. Several other cities, including Ironwood, MI, Hanksville, UT, Independence, KS, Gardner, KS, Ottawa, KS, and Olathe, KS, each have 8 attractions.
________
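
The retry-once logic in the answer-collection loop above is repeated for the hybrid queries later in the notebook. A small helper could factor it out. The sketch below is one possible shape, assuming a hypothetical name `ask_with_retry`; in the notebook the `ask` argument would be `query_graph`, while here a toy function stands in so the example runs without an LLM.

```python
def ask_with_retry(ask, question, retries=1):
    """Call `ask(question)`, retrying up to `retries` times on any exception.

    Returns the answer, or an error string in the notebook's format if
    every attempt fails.
    """
    last_error = None
    for _ in range(retries + 1):
        try:
            return ask(question)
        except Exception as e:
            last_error = e
    return 'An Error Occurred while executing the LLM Chain. Please re-try. Error: ' + str(last_error)


# Toy stand-in for query_graph: fails on the first call, succeeds on the second.
calls = {"count": 0}

def flaky_ask(question):
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("transient failure")
    return f"answer to: {question}"

print(ask_with_retry(flaky_ask, "Which cities have the most tourist attractions?"))
```

With such a helper, each collection loop reduces to a single list comprehension over the query list.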

## Complex Queries - Use Cases Requiring NetworkX Algorithms

In [None]:
complex_queries = []
### Complex Network Analysis
complex_queries.append("What is the average clustering coefficient of the airport network?")
complex_queries.append("Which airports are the most central in the network based on betweenness centrality?")
complex_queries.append("Which airports are the most connected (highest degree centrality)?")
complex_queries.append("What is the most influential airport in the network based on passenger traffic?")

### Community Detection
complex_queries.append("Which airports naturally form regional airline clusters?")

complex_queries.append("Can we identify clusters of closely connected airports?")
complex_queries.append("Are there distinct communities of cities based on their flight connections?")

### Minimum Spanning Tree (Prim’s, Kruskal’s)
complex_queries.append("What is the minimal set of flight routes that can connect all airports efficiently?")

### Network Flow Analysis (Max Flow, Min Cut)
complex_queries.append("Where are the bottlenecks in the global airline network?")


In [93]:
complex_answers = []
for q in complex_queries:
  complex_answers.append(query_graph(q))
  



> Entering new AgentExecutor chain...
To find the average clustering coefficient of the airport network, I need to use a NetworkX algorithm that can calculate this metric. The clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. I will use the `text_to_nx_algorithm_to_text` tool to perform this calculation.

Action: text_to_nx_algorithm_to_text
Action Input: Calculate the average clustering coefficient of the airport network.
# Example Nodes:
*Important*: Only use Examples for Filtering the nodes. Do not use the data to create new Graphs

## Here are few examples of related City Nodes. Format: City Name - _id  


## Here are few examples of related Attraction Nodes. Format: Attraction Name - _id  

Boise Airport - ATTRACTION/BOISEAIRPORT
Tri-Cities Airport - ATTRACTION/TRICITIESAIRPORT
Eagle Lake Regional Airport - ATTRACTION/EAGLELAKEREGIONALAIRPORT

## Here are few examples of related Event Nodes. Format: Even

In [94]:
for q, a in zip(complex_queries, complex_answers):
  print('_' * 50)
  print('Question: ', q)
  print('Answer: ', a)


__________________________________________________
Question:  What is the average clustering coefficient of the airport network?
Answer:  The average clustering coefficient of the airport network is 0.6450, indicating a relatively high level of interconnectedness among airports, with many being part of tightly-knit groups where direct flights are common.
__________________________________________________
Question:  Which airports are the most central in the network based on betweenness centrality?
Answer:  The most central airport in the network, based on betweenness centrality, is located in Fayetteville.
__________________________________________________
Question:  Which airports are the most connected (highest degree centrality)?
Answer:  The airport with the highest degree centrality, indicating it is the most connected, is Hartsfield-Jackson Atlanta International Airport (ATL).
__________________________________________________
Question:  What is the most influential airport in th
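
To make the first metric above concrete, the sketch below computes the average clustering coefficient by hand on a toy undirected graph. The airport codes and edges are invented for illustration and are not taken from the dataset. For each node with degree k of at least 2, the local coefficient is the number of links among its neighbours divided by k(k-1)/2; nodes with fewer than two neighbours count as zero, which is the standard definition for unweighted, undirected graphs.

```python
from itertools import combinations


def average_clustering(adj):
    """Average local clustering coefficient of an undirected graph.

    `adj` maps each node to the set of its neighbours.
    """
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # coefficient is 0 for nodes with fewer than 2 neighbours
        # Count neighbour pairs that are themselves connected.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


# Toy network: a triangle SFO-LAX-SEA plus BOI attached only to SFO.
toy_airports = {
    "SFO": {"LAX", "SEA", "BOI"},
    "LAX": {"SFO", "SEA"},
    "SEA": {"SFO", "LAX"},
    "BOI": {"SFO"},
}
print(round(average_clustering(toy_airports), 4))
```

Here SFO contributes 1/3, LAX and SEA contribute 1 each, and BOI contributes 0, so the average is 7/12.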

## Hybrid Queries - Use Cases Requiring Both AQL Queries and NetworkX


In [107]:
hybrid_queries = []

### Advanced Route Planning
hybrid_queries.append("What is the most efficient multi-city tour covering the top 10 tourist attractions?")
hybrid_queries.append("Design a 5 City Tour which covers Museums")

### Comparative City Analysis
hybrid_queries.append("Which cities serve as the best hubs for both air travel and tourism?")

### Network Growth Prediction
hybrid_queries.append("Based on current network structure and tourist attraction distribution, which city pairs are most likely to establish new flight routes?")


In [104]:
hybrid_answers = []
for q in hybrid_queries:
  try:
    ans = query_graph(q)
  except Exception:
    # The first attempt failed; retry once before recording the error.
    try:
      ans = query_graph(q)
    except Exception as e:
      ans = 'An Error Occurred while executing the LLM Chain. Please re-try. Error: ' + str(e)
  hybrid_answers.append(ans)




> Entering new AgentExecutor chain...
To determine the most efficient multi-city tour covering the top 10 tourist attractions, I need to identify the top 10 tourist attractions and then find the optimal route connecting these locations. This involves two steps: first, retrieving the top 10 tourist attractions, and second, calculating the most efficient route.

1. Retrieve the top 10 tourist attractions.
2. Calculate the most efficient route connecting these attractions.

I will start by retrieving the top 10 tourist attractions.

Action: semantic_search
Action Input: "top 10 tourist attractions"
ATTRACTION/MEXICOCITYATTRACTIONS
ATTRACTION/LOUVRE
ATTRACTION/ROYALGORGE
 **Information from Documents:
Mexico City Attractions

Louvre - The most-visited art museum in the world.

Royal Gorge - A popular tourist destination known for its scenic views and outdoor activities.
** Information from Nearby Nodes of Data:

[36;1m[1;3mI don't know.[0m[32;1m[1;3mIt seems 



----------
FINAL_RESULT: `louvain_communities' is unable to convert graph from backend 'arangodb' to 'cugraph' backend, which was specified with the `backend='cugraph'` keyword argument. No other backends will be attempted, because the backend was specified with the `backend='cugraph'` keyword argument.
----------
3) Formulating final answer
[38;5;200m[1;3mThe analysis to identify city pairs most likely to establish new flight routes encountered an issue due to an incompatibility with the specified backend for community detection. The Louvain method could not convert the graph from the 'arangodb' backend to the 'cugraph' backend as specified. To resolve this, consider using a compatible backend or removing the backend specification to allow the algorithm to choose a suitable one automatically.[0m[32;1m[1;3mIt seems there was an issue with the backend compatibility for the community detection algorithm. To resolve this, I will attempt to analyze the network structure again without 

In [106]:
for q, a in zip(hybrid_queries, hybrid_answers):
  print('_' * 50)
  print('Question: ', q)
  print('Answer: ', a)


__________________________________________________
Question:  What is the most efficient multi-city tour covering the top 10 tourist attractions?
Answer:  An Error Occurred while executing the LLM Chain. Please re-try. Error: An output parsing error occurred. In order to pass this error back to the agent and have it try again, pass `handle_parsing_errors=True` to the AgentExecutor. This is the error: Could not parse LLM output: `I am unable to calculate the most efficient multi-city tour covering the top 10 tourist attractions due to limitations in the current implementation. The necessary subgraph creation from the ArangoDB Graph is not supported, which is essential for this analysis.`
For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE 
__________________________________________________
Question:  Design a 5 City Tour which covers Museums
Answer:  The 5-city tour covering museums includes Denver (Denver Art Museum), Hartford (Wa

In [None]:
!pip install gradio

In [None]:
import gradio as gr

gr.Interface(fn=query_graph, inputs="text", outputs="text").launch(share=True)



* Running on local URL:  http://127.0.0.1:7862
* Running on public URL: https://e9bad1cad629b27bee.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)






> Entering new AgentExecutor chain...
To determine the top 5 cities that have both airports and tourist attractions, I need to query the graph database to find cities with both these features. I will use an AQL query to retrieve this information.

Action: text_to_aql
Action Input: "Find the top 5 cities that have both airports and tourist attractions."
# Example Nodes:
*Important*: Only use Examples for Filtering the nodes. Do not use the data to create new Graphs

## Here are few examples of related City Nodes. Format: City Name - _id  


## Here are few examples of related Attraction Nodes. Format: Attraction Name - _id  

Tri-Cities Airport - ATTRACTION/TRICITIESAIRPORT
San Francisco International Airport - ATTRACTION/SANFRANCISCOINTERNATIONALAIRPORT
Boise Airport - ATTRACTION/BOISEAIRPORT

## Here are few examples of related Event Nodes. Format: Event Name - _id  


# Here are some nearby Nodes for reference:



> Entering new ArangoGraphQAChain chain.

Traceback (most recent call last):
  File "/home/ninad/.venvs/rapids24.12/lib/python3.12/site-packages/langchain/agents/agent.py", line 1358, in _iter_next_step
    output = self._action_agent.plan(
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ninad/.venvs/rapids24.12/lib/python3.12/site-packages/langchain/agents/agent.py", line 465, in plan
    for chunk in self.runnable.stream(inputs, config={"callbacks": callbacks}):
  File "/home/ninad/.venvs/rapids24.12/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 3414, in stream
    yield from self.transform(iter([input]), config, **kwargs)
  File "/home/ninad/.venvs/rapids24.12/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 3401, in transform
    yield from self._transform_stream_with_config(
  File "/home/ninad/.venvs/rapids24.12/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 2201, in _transform_stream_with_config
    chunk: Output = context.run(next, iterator)  # type: 



> Entering new AgentExecutor chain...
To determine which airports serve as major hubs with the most connections, I need to analyze the graph data to identify airports with the highest number of connections. This can be achieved by using a NetworkX algorithm to find nodes (airports) with the highest degree centrality, which indicates the number of direct connections each node has.

Action: text_to_nx_algorithm_to_text
Action Input: "Find airports with the highest degree centrality to identify major hubs with the most connections."
# Example Nodes:
*Important*: Only use Examples for Filtering the nodes. Do not use the data to create new Graphs

## Here are few examples of related City Nodes. Format: City Name - _id  


## Here are few examples of related Attraction Nodes. Format: Attraction Name - _id  

Cyber Hub - ATTRACTION/CYBERHUB
Mankato Regional Airport - ATTRACTION/MANKATOREGIONALAIRPORT
Tri-Cities Airport - ATTRACTION/TRICITIESAIRPORT

## Here are few e