# ArangoDB + LangChain

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/arangodb/interactive_tutorials/blob/master/notebooks/Langchain.ipynb)

Large language models (LLMs) are emerging as a transformative technology, enabling developers to build applications that they previously could not. However, an LLM used in isolation is often not enough to build a truly powerful app; the real power comes from combining it with other sources of computation or knowledge.

[LangChain](https://www.langchain.com/) is a framework for developing applications powered by language models. It enables applications that are:
- Data-aware: connect a language model to other sources of data
- Agentic: allow a language model to interact with its environment

On July 25, 2023, ArangoDB introduced the first release of the [ArangoGraphQAChain](https://langchain-langchain.vercel.app/docs/integrations/providers/arangodb) to the LangChain community. It lets you use LLMs as a natural language interface to your ArangoDB data.

Please note: This notebook uses the LangChain `ChatOpenAI` wrapper, which requires you to have a **paid** [OpenAI API Key](https://help.openai.com/en/articles/4936850-where-do-i-find-my-secret-api-key). However, other Chat Models are available as well: https://github.com/langchain-ai/langchain/tree/master/libs/langchain/langchain/chat_models 

You can get a local ArangoDB instance running via the [ArangoDB Docker image](https://hub.docker.com/_/arangodb):  

```
docker run -p 8529:8529 -e ARANGO_ROOT_PASSWORD= arangodb/arangodb
```
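
If you go with the local Docker instance, the rest of this notebook only needs a `db` handle pointing at it. A minimal sketch, assuming the container above is reachable on `localhost:8529`, the root password is empty, and the default `_system` database is used. `local_credentials` and `connect` are illustrative helper names, not part of any library; the dictionary simply mirrors the keys returned by the cloud connector so the later cells work unchanged.

```python
def local_credentials(host="localhost", port=8529, password=""):
    """Build a credentials dict for a local ArangoDB container.

    Assumption: the root user and the default `_system` database are used.
    """
    return {
        "dbName": "_system",
        "username": "root",
        "password": password,
        "hostname": host,
        "port": port,
        "url": f"http://{host}:{port}",
    }


def connect(con):
    """Open a database handle with python-arango (imported lazily)."""
    from arango import ArangoClient

    return ArangoClient(hosts=con["url"]).db(
        con["dbName"], con["username"], con["password"], verify=True
    )


con = local_credentials()
print(con["url"])
# db = connect(con)  # uncomment once the container is running
```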

An alternative is to use the [ArangoDB Cloud Connector package](https://github.com/arangodb/adb-cloud-connector#readme) to get a temporary cloud instance running:

In [1]:
%%capture
!pip install python-arango # The ArangoDB Python Driver
!pip install adb-cloud-connector # The ArangoDB Cloud Instance provisioner
!pip install openai
!pip install langchain==0.0.242

In [2]:
# Instantiate ArangoDB Database
import json
from arango import ArangoClient
from adb_cloud_connector import get_temp_credentials

con = get_temp_credentials(tutorialName="LangChain")

db = ArangoClient(hosts=con["url"]).db(
    con["dbName"], con["username"], con["password"], verify=True
)

print(json.dumps(con, indent=2))

Log: requesting new credentials...
Succcess: new credentials acquired
{
  "dbName": "TUTn5g8q1l6xwbxpo4v8tjpr9",
  "username": "TUTil17b1x7h8uyp4q99qign",
  "password": "TUTsix7e9lh57pg2mton2hhns",
  "hostname": "tutorials.arangodb.cloud",
  "port": 8529,
  "url": "https://tutorials.arangodb.cloud:8529"
}


In [3]:
# Instantiate the ArangoDB-LangChain Graph
from langchain.graphs import ArangoGraph

graph = ArangoGraph(db)

## Populating the Database

We will rely on the Python Driver to import our [GameOfThrones](https://github.com/arangodb/example-datasets/tree/master/GameOfThrones) data into our database.

In [4]:
if db.has_graph("GameOfThrones"):
    db.delete_graph("GameOfThrones", drop_collections=True)

db.create_graph(
    "GameOfThrones",
    edge_definitions=[
        {
            "edge_collection": "ChildOf",
            "from_vertex_collections": ["Characters"],
            "to_vertex_collections": ["Characters"],
        },
    ],
)

documents = [
    {
        "_key": "NedStark",
        "name": "Ned",
        "surname": "Stark",
        "alive": True,
        "age": 41,
        "gender": "male",
    },
    {
        "_key": "CatelynStark",
        "name": "Catelyn",
        "surname": "Stark",
        "alive": False,
        "age": 40,
        "gender": "female",
    },
    {
        "_key": "AryaStark",
        "name": "Arya",
        "surname": "Stark",
        "alive": True,
        "age": 11,
        "gender": "female",
    },
    {
        "_key": "BranStark",
        "name": "Bran",
        "surname": "Stark",
        "alive": True,
        "age": 10,
        "gender": "male",
    },
]

edges = [
    {"_to": "Characters/NedStark", "_from": "Characters/AryaStark"},
    {"_to": "Characters/NedStark", "_from": "Characters/BranStark"},
    {"_to": "Characters/CatelynStark", "_from": "Characters/AryaStark"},
    {"_to": "Characters/CatelynStark", "_from": "Characters/BranStark"},
]

db.collection("Characters").import_bulk(documents)
db.collection("ChildOf").import_bulk(edges)

{'error': False,
 'created': 4,
 'errors': 0,
 'empty': 0,
 'updated': 0,
 'ignored': 0,
 'details': []}
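
Each edge points from the child (`_from`) to the parent (`_to`). That convention decides the traversal direction in the queries further down: `OUTBOUND` from a character reaches its parents, while `INBOUND` reaches its children. A plain-Python sketch of the same idea, with the edge list repeated so the block runs on its own (`neighbors` is an illustrative helper, not an ArangoDB API):

```python
edges = [
    {"_to": "Characters/NedStark", "_from": "Characters/AryaStark"},
    {"_to": "Characters/NedStark", "_from": "Characters/BranStark"},
    {"_to": "Characters/CatelynStark", "_from": "Characters/AryaStark"},
    {"_to": "Characters/CatelynStark", "_from": "Characters/BranStark"},
]


def neighbors(edges, vertex_id, direction):
    """Return the vertices one hop away from `vertex_id`.

    OUTBOUND follows edges that start at the vertex (child -> parent here);
    INBOUND follows edges that end at the vertex (parent <- child here).
    """
    if direction == "OUTBOUND":
        return [e["_to"] for e in edges if e["_from"] == vertex_id]
    if direction == "INBOUND":
        return [e["_from"] for e in edges if e["_to"] == vertex_id]
    raise ValueError(f"unknown direction: {direction}")


print(neighbors(edges, "Characters/AryaStark", "OUTBOUND"))  # Arya's parents
print(neighbors(edges, "Characters/NedStark", "INBOUND"))    # Ned's children
```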

## Getting & Setting the ArangoDB Schema

An initial ArangoDB schema is generated when the `ArangoGraph` object is instantiated. The getter and setter below let you view or modify that schema:

In [5]:
# The schema should be empty here,
# since `graph` was initialized prior to ArangoDB Data ingestion (see above).

import json

print(json.dumps(graph.schema, indent=4))

{
    "Graph Schema": [],
    "Collection Schema": []
}


In [6]:
graph.set_schema()
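
`graph.schema` is a plain dictionary, so once `set_schema()` has refreshed it you can inspect it with ordinary Python. A small sketch that lists each collection with its type. The sample dictionary is hand-written in the shape this notebook prints (a `Collection Schema` list whose entries carry `collection_name` and `collection_type`), and `summarize_schema` is a hypothetical helper:

```python
# Hand-written sample in the shape printed by this notebook (not real output).
sample_schema = {
    "Graph Schema": [{"graph_name": "GameOfThrones", "edge_definitions": []}],
    "Collection Schema": [
        {"collection_name": "ChildOf", "collection_type": "edge"},
    ],
}


def summarize_schema(schema):
    """Return one 'name (type)' string per collection in the schema."""
    return [
        f"{col['collection_name']} ({col['collection_type']})"
        for col in schema.get("Collection Schema", [])
    ]


print(summarize_schema(sample_schema))
# With a live graph: print(summarize_schema(graph.schema))
```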

In [7]:
# We can now view the generated schema

import json

print(json.dumps(graph.schema, indent=4))

{
    "Graph Schema": [
        {
            "graph_name": "GameOfThrones",
            "edge_definitions": [
                {
                    "edge_collection": "ChildOf",
                    "from_vertex_collections": [
                        "Characters"
                    ],
                    "to_vertex_collections": [
                        "Characters"
                    ]
                }
            ]
        }
    ],
    "Collection Schema": [
        {
            "collection_name": "ChildOf",
            "collection_type": "edge",
            "edge_properties": [
                {
                    "name": "_key",
                    "type": "str"
                },
                {
                    "name": "_id",
                    "type": "str"
                },
                {
                    "name": "_from",
                    "type": "str"
                },
                {
                    "name": "_to",
                    "type": "str

## Querying the ArangoDB Database

We can now use the ArangoDB Graph QA Chain to ask questions about our data.

In [8]:
import os

os.environ["OPENAI_API_KEY"] = "your-key-here"
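
Hard-coding the key is fine for a quick experiment, but it is easy to leak that way. One hedged alternative is to read the key from the environment and fail early when it is missing or still set to the placeholder. `require_api_key` is an illustrative helper, not part of LangChain or the OpenAI client:

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from `env`, or raise if it is missing."""
    key = env.get(name, "").strip()
    if not key or key == "your-key-here":
        raise RuntimeError(f"{name} is not set; export it before starting the notebook")
    return key


# Demonstration with an explicit mapping instead of the real environment.
print(require_api_key({"OPENAI_API_KEY": "sk-example"}))
```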

In [9]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import ArangoGraphQAChain

chain = ArangoGraphQAChain.from_llm(
    ChatOpenAI(temperature=0), graph=graph, verbose=True
)

In [10]:
chain.run("Is Ned Stark alive?")



> Entering new ArangoGraphQAChain chain...
AQL Query (1):
WITH Characters
FOR character IN Characters
FILTER character.name == "Ned" && character.surname == "Stark"
RETURN character.alive

AQL Result:
[True]

> Finished chain.


'Ned Stark is alive.'

In [11]:
chain.run("Who is the oldest character")



> Entering new ArangoGraphQAChain chain...
AQL Query (1):
WITH Characters
FOR character IN Characters
SORT character.age DESC
LIMIT 1
RETURN character

AQL Result:
[{'_key': 'NedStark', '_id': 'Characters/NedStark', '_rev': '_gd7CuMK---', 'name': 'Ned', 'surname': 'Stark', 'alive': True, 'age': 41, 'gender': 'male'}]

> Finished chain.


'The oldest character in the database is Ned Stark. He is a male character who is currently alive and is 41 years old.'

In [12]:
chain.run("Does Arya Stark have a dead parent?")



> Entering new ArangoGraphQAChain chain...
AQL Query (1):
WITH Characters, ChildOf
FOR v, e IN 1..1 OUTBOUND 'Characters/AryaStark' ChildOf
FILTER v.alive == false
RETURN e

AQL Result:
[{'_key': '63349999', '_id': 'ChildOf/63349999', '_from': 'Characters/AryaStark', '_to': 'Characters/CatelynStark', '_rev': '_gd7CuNi--A'}]

> Finished chain.


'Yes, Arya Stark does have a dead parent. According to the information in the database, Arya Stark is connected to Catelyn Stark through the "ChildOf" relationship. The result of the query shows that there is an edge between Arya Stark and Catelyn Stark, indicating that Catelyn Stark is Arya\'s parent.'
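
Note that the generated query returns the matching edge rather than the parent document. For comparison, a hand-written variant can be run directly through python-arango's `db.aql.execute`, passing the start vertex as a bind parameter. This is a sketch: `dead_parents` is an illustrative helper, and the stand-in database object exists only so the block runs without a server.

```python
from types import SimpleNamespace

# Same traversal as above, but returning the parent's name and using
# a bind parameter (@start) instead of a hard-coded vertex id.
DEAD_PARENTS_AQL = """
WITH Characters
FOR v IN 1..1 OUTBOUND @start ChildOf
    FILTER v.alive == false
    RETURN v.name
"""


def dead_parents(db, character_id):
    """Return the names of the character's parents that are not alive."""
    cursor = db.aql.execute(DEAD_PARENTS_AQL, bind_vars={"start": character_id})
    return list(cursor)


# Stand-in for a python-arango database handle, returning a canned result.
def _fake_execute(query, bind_vars=None):
    return iter(["Catelyn"])


fake_db = SimpleNamespace(aql=SimpleNamespace(execute=_fake_execute))
print(dead_parents(fake_db, "Characters/AryaStark"))
# With the real handle: dead_parents(db, "Characters/AryaStark")
```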

In [13]:
chain.run("Are Arya Stark and Ned Stark related?")



> Entering new ArangoGraphQAChain chain...
AQL Query (1):
WITH Characters, ChildOf
FOR v, e, p IN 1..1 OUTBOUND 'Characters/AryaStark' ChildOf
    FILTER p.vertices[1]._key == 'NedStark'
    RETURN p

AQL Result:
[{'vertices': [{'_key': 'AryaStark', '_id': 'Characters/AryaStark', '_rev': '_gd7CuMK--A', 'name': 'Arya', 'surname': 'Stark', 'alive': True, 'age': 11, 'gender': 'female'}, {'_key': 'NedStark', '_id': 'Characters/NedStark', '_rev': '_gd7CuMK---', 'name': 'Ned', 'surname': 'Stark', 'alive': True, 'age': 41, 'gender': 'male'}], 'edges': [{'_key': '63349997', '_id': 'ChildOf/63349997', '_from': 'Characters/AryaStark', '_to': 'Characters/NedStark', '_rev': '_gd7CuNi---'}], 'weights': [0, 1]}]

> Finished chain.


'Yes, Arya Stark and Ned Stark are related. They are connected through the "ChildOf" relationship. Arya Stark is the child of Ned Stark.'

In [14]:
chain.run("Who is the youngest child of Ned Stark? Use INBOUND")



> Entering new ArangoGraphQAChain chain...
AQL Query (1):
WITH Characters, ChildOf
FOR v, e, p IN 1..1 INBOUND 'Characters/NedStark' ChildOf
SORT v.age ASC
LIMIT 1
RETURN v

AQL Result:
[{'_key': 'BranStark', '_id': 'Characters/BranStark', '_rev': '_gd7CuMK--C', 'name': 'Bran', 'surname': 'Stark', 'alive': True, 'age': 10, 'gender': 'male'}]

> Finished chain.


'The youngest child of Ned Stark is Bran Stark. He is 10 years old and is alive.'

In [15]:
chain.run("Add John Snow as a new male character (age 31)")



> Entering new ArangoGraphQAChain chain...
AQL Query (1):
WITH Characters
INSERT {
  "_key": "JohnSnow",
  "name": "John",
  "surname": "Snow",
  "alive": true,
  "age": 31,
  "gender": "male"
} INTO Characters

AQL Result:
[]

> Finished chain.


'Summary: \nJohn Snow has been successfully added as a new male character with an age of 31.'

In [16]:
chain.run("Add Eddard Stark, a 60-year old male")



> Entering new ArangoGraphQAChain chain...
AQL Query (1):
WITH Characters
INSERT {
  "_key": "EddardStark",
  "name": "Eddard",
  "surname": "Stark",
  "alive": true,
  "age": 60,
  "gender": "male"
} INTO Characters

AQL Result:
[]

> Finished chain.


'Based on your request, I have successfully added a new character named Eddard Stark to the database. Eddard Stark is a 60-year-old male.'

In [17]:
chain.run("Create a ChildOf edge from Characters/JohnSnow to Characters/EddardStark")



> Entering new ArangoGraphQAChain chain...
AQL Query (1):
WITH Characters, ChildOf
INSERT {
  "_from": "Characters/JohnSnow",
  "_to": "Characters/EddardStark"
} INTO ChildOf

AQL Result:
[]

> Finished chain.


'A ChildOf edge has been successfully created from the character John Snow to Eddard Stark.'
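
As the last three calls show, the chain executes whatever AQL the model generates, including writes. If that is not what you want, one option is to connect with a read-only database user; another is to screen queries before they run. The following is a rough sketch of such a screen, not a feature of `ArangoGraphQAChain`. The keyword list covers AQL's data-modification operations, and plain keyword matching can misfire when one of these words appears inside a string literal.

```python
import re

# AQL data-modification keywords; matched as whole words, case-insensitively.
WRITE_KEYWORDS = ("INSERT", "UPDATE", "REPLACE", "REMOVE", "UPSERT")
_WRITE_RE = re.compile(r"\b(" + "|".join(WRITE_KEYWORDS) + r")\b", re.IGNORECASE)


def is_read_only(aql):
    """Return True if the query contains no data-modification keyword."""
    return _WRITE_RE.search(aql) is None


print(is_read_only("FOR c IN Characters RETURN c"))
print(is_read_only('INSERT {"_key": "JohnSnow"} INTO Characters'))
```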

In [18]:
chain.run("What can you tell me about the characters?")



> Entering new ArangoGraphQAChain chain...
AQL Query (1):
WITH Characters
RETURN Characters

AQL Query Execution Error: 
AQL: collection 'Characters' used as expression operand (while instantiating plan)

AQL Query (2):
FOR character IN Characters
RETURN character

AQL Result:
[{'_key': 'NedStark', '_id': 'Characters/NedStark', '_rev': '_gd7CuMK---', 'name': 'Ned', 'surname': 'Stark', 'alive': True, 'age': 41, 'gender': 'male'}, {'_key': 'CatelynStark', '_id': 'Characters/CatelynStark', '_rev': '_gd7CuMK--_', 'name': 'Catelyn', 'surname': 'Stark', 'alive': False, 'age': 40, 'gender': 'female'}, {'_key': 'AryaStark', '_id': 'Characters/AryaStark', '_rev': '_gd7CuMK--A', 'name': 'Arya', 'surname': 'Stark', 'alive': True, 'age': 11, 'gender': 'female'}, {'_key': 'BranStark', '_id': 'Characters/BranStark', '_rev': '_gd7CuMK--C', 'name': 'Bran', 'surname': 'Stark', 'alive': True, 'age': 10, 'gender': 'male'}, {'_key':

'The characters in the database include Ned Stark, Catelyn Stark, Arya Stark, Bran Stark, Jon Snow, and Eddard Stark. Ned Stark is a male character who is alive and 41 years old. Catelyn Stark is a female character who is deceased and 40 years old. Arya Stark is a female character who is alive and 11 years old. Bran Stark is a male character who is alive and 10 years old. Jon Snow is a male character who is alive and 31 years old. Eddard Stark is a male character who is alive and 60 years old.'

In [19]:
chain.run("What can you tell me about the edges?")



> Entering new ArangoGraphQAChain chain...
AQL Query (1):
WITH ChildOf
RETURN ChildOf

AQL Query Execution Error: 
AQL: collection 'ChildOf' used as expression operand (while instantiating plan)

AQL Query (2):
FOR c IN ChildOf
RETURN c

AQL Result:
[{'_key': '63349997', '_id': 'ChildOf/63349997', '_from': 'Characters/AryaStark', '_to': 'Characters/NedStark', '_rev': '_gd7CuNi---'}, {'_key': '63349998', '_id': 'ChildOf/63349998', '_from': 'Characters/BranStark', '_to': 'Characters/NedStark', '_rev': '_gd7CuNi--_'}, {'_key': '63349999', '_id': 'ChildOf/63349999', '_from': 'Characters/AryaStark', '_to': 'Characters/CatelynStark', '_rev': '_gd7CuNi--A'}, {'_key': '63350000', '_id': 'ChildOf/63350000', '_from': 'Characters/BranStark', '_to': 'Characters/CatelynStark', '_rev': '_gd7CuNi--C'}, {'_key': '63350022', '_id': 'ChildOf/63350022', '_from': 'Characters/JohnSnow', '_to': 'Characters/EddardStark', '_rev': '_gd7D

'The edges in the database represent relationships between different characters. Based on the AQL query, the result shows the details of the edges. For example, we can see that Arya Stark is a child of Ned Stark, Bran Stark is also a child of Ned Stark, and so on. The AQL result provides the key, ID, from_vertex, to_vertex, and revision information for each edge.'
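
The two runs above show the chain recovering from a failed query: the first AQL attempt raises an execution error, and a second, corrected attempt succeeds. That behaviour can be pictured as a bounded retry loop in which the error message informs the next attempt. The sketch below is a simplification for illustration, not LangChain's source; `generate` and `execute` are hypothetical callables standing in for the LLM call and the database call.

```python
def run_with_retries(question, generate, execute, max_attempts=3):
    """Generate a query, run it, and retry with the error on failure.

    Returns (attempt_number, query, result) for the first query that runs.
    """
    error = None
    for attempt in range(1, max_attempts + 1):
        query = generate(question, error)
        try:
            return attempt, query, execute(query)
        except Exception as exc:  # keep the message for the next attempt
            error = str(exc)
    raise RuntimeError(f"no valid AQL after {max_attempts} attempts: {error}")


# Fakes that mimic the run above: the first query fails, the second works.
def fake_generate(question, previous_error):
    if previous_error is None:
        return "RETURN Characters"
    return "FOR c IN Characters RETURN c"


def fake_execute(query):
    if query.startswith("RETURN Characters"):
        raise ValueError("collection 'Characters' used as expression operand")
    return [{"name": "Ned"}]


attempt, query, result = run_with_retries(
    "What can you tell me about the characters?", fake_generate, fake_execute
)
print(attempt, query)
```

The `max_attempts` argument plays the same role as a cap on generation attempts: without it, a model that keeps producing invalid AQL would loop indefinitely.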

## Chain Modifiers

You can alter the values of the following `ArangoGraphQAChain` class variables to modify the behaviour of your chain and the results it returns:


In [22]:
# Specify the maximum number of AQL Query Results to return
chain.top_k = 10

# Specify whether or not to return the AQL Query in the output dictionary
chain.return_aql_query = True

# Specify whether or not to return the AQL JSON Result in the output dictionary
chain.return_aql_result = True

# Specify the maximum number of AQL generation attempts that should be made
chain.max_aql_generation_attempts = 5

# Specify a set of AQL query examples, which are passed to
# the AQL generation prompt template to promote few-shot learning.
# Defaults to an empty string.
chain.aql_examples = """
# Is Ned Stark alive?
RETURN DOCUMENT('Characters/NedStark').alive

# Is Arya Stark the child of Ned Stark?
FOR e IN ChildOf
    FILTER e._from == "Characters/AryaStark" AND e._to == "Characters/NedStark"
    RETURN e
"""

In [23]:
chain.run("Is Ned Stark alive?")

# chain("Is Ned Stark alive?") # Returns a dictionary with the AQL Query & AQL Result



> Entering new ArangoGraphQAChain chain...
AQL Query (1):
WITH Characters
RETURN DOCUMENT('Characters/NedStark').alive

AQL Result:
[True]

> Finished chain.


'Yes, Ned Stark is alive.'

In [24]:
chain.run("Is Bran Stark the child of Ned Stark?")



> Entering new ArangoGraphQAChain chain...
AQL Query (1):
WITH ChildOf, Characters
FOR e IN ChildOf
    FILTER e._from == "Characters/BranStark" AND e._to == "Characters/NedStark"
    RETURN e

AQL Result:
[{'_key': '63349998', '_id': 'ChildOf/63349998', '_from': 'Characters/BranStark', '_to': 'Characters/NedStark', '_rev': '_gd7CuNi--_'}]

> Finished chain.


'Yes, Bran Stark is indeed the child of Ned Stark.'
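
Both answers reuse the query patterns supplied in `chain.aql_examples`. When the example set grows, it can be easier to keep the question/query pairs as data and render the string from them. A small sketch, assuming the `# question` followed by query layout used in this notebook; `format_aql_examples` is a hypothetical helper:

```python
def format_aql_examples(pairs):
    """Render (question, query) pairs as '# question' + query blocks."""
    blocks = [f"# {question}\n{query.strip()}" for question, query in pairs]
    return "\n" + "\n\n".join(blocks) + "\n"


pairs = [
    ("Is Ned Stark alive?", "RETURN DOCUMENT('Characters/NedStark').alive"),
    (
        "Is Arya Stark the child of Ned Stark?",
        "FOR e IN ChildOf\n"
        '    FILTER e._from == "Characters/AryaStark" AND e._to == "Characters/NedStark"\n'
        "    RETURN e",
    ),
]

aql_examples = format_aql_examples(pairs)
print(aql_examples)
# chain.aql_examples = aql_examples
```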