# ArangoDB 🥑 + LangChain 🦜🔗 (Basics)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/arangodb/interactive_tutorials/blob/master/notebooks/Langchain.ipynb)

**Looking for the full notebook? Check out [LangChain_Full.ipynb](https://colab.research.google.com/github/arangodb/interactive_tutorials/blob/master/notebooks/Langchain_Full.ipynb).**

Large language models (LLMs) are emerging as a transformative technology, enabling developers to build applications that they previously could not. However, using these LLMs in isolation is often insufficient for creating a truly powerful app - the real power comes when you can combine them with other sources of computation or knowledge.

[LangChain](https://www.langchain.com/) is a framework for developing applications powered by language models. It enables applications that are:
- Data-aware: connect a language model to other sources of data
- Agentic: allow a language model to interact with its environment

On July 25, 2023, ArangoDB introduced the first release of the [ArangoGraphQAChain](https://langchain-langchain.vercel.app/docs/integrations/providers/arangodb) to the LangChain community, allowing you to leverage LLMs to provide a natural language interface for your ArangoDB data.

**Please note**: This notebook uses the LangChain `ChatOpenAI` wrapper, which requires you to have a **paid** [OpenAI API Key](https://help.openai.com/en/articles/4936850-where-do-i-find-my-secret-api-key). However, other Chat Models are available as well: https://github.com/langchain-ai/langchain/tree/master/libs/langchain/langchain/chat_models

## Setup

In [None]:
%%capture

# 1: Install the dependencies

!pip install python-arango # The ArangoDB Python Driver
!pip install adb-cloud-connector # The ArangoDB Cloud Instance provisioner
!pip install arango-datasets # Datasets package
!pip install openai==1.6.1
!pip install langchain==0.1.0

In [None]:
# 2: Provision a temporary ArangoDB Cloud instance

from adb_cloud_connector import get_temp_credentials

connection = get_temp_credentials(tutorialName="LangChain")

connection

In [None]:
# 3: Instantiate the ArangoDB-Python driver

from arango import ArangoClient

client = ArangoClient(hosts=connection["url"])

db = client.db(
    connection["dbName"],
    connection["username"],
    connection["password"],
    verify=True
)

db

In [None]:
# 4: Load sample data
# We'll be relying on our Game Of Thrones dataset, representing the parent-child
# relationships of certain characters from the GoT universe

from arango_datasets import Datasets

Datasets(db).load("GAME_OF_THRONES")

In [None]:
# 5: Instantiate the ArangoDB-LangChain Graph wrapper

from langchain.graphs import ArangoGraph

graph = ArangoGraph(db)

graph.schema

In [None]:
# 6: Set your OpenAI API Key
# https://help.openai.com/en/articles/4936850-where-do-i-find-my-secret-api-key

import os

os.environ["OPENAI_API_KEY"] = "sk-..."
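Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. Below is a minimal sketch of one alternative, assuming you are willing to type the key at a hidden prompt. The helper name `ensure_api_key` is hypothetical and is not part of LangChain or the OpenAI package.

```python
import os
from getpass import getpass
from typing import Callable, MutableMapping


def ensure_api_key(
    env: MutableMapping[str, str] = os.environ,
    ask: Callable[[str], str] = getpass,
    name: str = "OPENAI_API_KEY",
) -> str:
    """Return the key stored under `name`, prompting for it only if it is missing."""
    if not env.get(name):
        # Nothing set yet: ask the user (hidden input by default) and store it.
        env[name] = ask(f"Enter {name}: ")
    return env[name]


# Demonstration with a plain dict and a stand-in prompt so nothing blocks on input.
demo_env = {}
demo_key = ensure_api_key(demo_env, ask=lambda _prompt: "sk-placeholder")
print(demo_key == demo_env["OPENAI_API_KEY"])
```

In the notebook itself, calling `ensure_api_key()` with no arguments would read from `os.environ` and fall back to a hidden prompt.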

In [None]:
# 7: Instantiate the OpenAI Chat model
# Note that other models can be used as well
# Ref: https://github.com/langchain-ai/langchain/tree/master/libs/langchain/langchain/chat_models

from langchain.chat_models import ChatOpenAI

model = ChatOpenAI(temperature=0, model_name='gpt-4')

In [None]:
# 8: Instantiate the LangChain Question-Answering Chain with
# our **model** and **graph**

from langchain.chains import ArangoGraphQAChain

chain = ArangoGraphQAChain.from_llm(model, graph=graph, verbose=True)

## Prompting

"Prompting" is the process of providing a language model with a set of text-based instructions to achieve some arbitrary output. The text can be as simple as a single word, or as complex as a full paragraph. The model is responsible for generating a response based on the content of the prompt.

This section will utilize the [ChatOpenAI](https://python.langchain.com/docs/integrations/chat/openai) wrapper under the hood to translate Natural Language into ArangoDB Query Language (AQL) queries. In fact, the `chain` object we've created is responsible for the following steps:
1. Translate the Natural Language Prompt into an AQL Query
2. Execute the AQL Query against the ArangoDB database to retrieve a JSON Result
3. If a query error occurs, go back to step 1 with a modified prompt to include the error message
4. Translate the JSON Result to a Natural Language answer
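
The translate, execute, retry, and summarize loop listed above can be sketched in plain Python. This is an illustration only, not the `ArangoGraphQAChain` implementation: `answer_question`, `generate_aql`, `run_aql`, and `summarize` are hypothetical names, and the stand-in functions below replace the LLM and the database with made-up values so the sketch runs offline.

```python
from typing import Any, Callable, List


def answer_question(
    question: str,
    generate_aql: Callable[[str], str],
    run_aql: Callable[[str], List[Any]],
    summarize: Callable[[str, List[Any]], str],
    max_attempts: int = 3,
) -> str:
    """Generate a query, run it, retry with the error on failure, then summarize."""
    prompt = question
    for _ in range(max_attempts):
        aql = generate_aql(prompt)  # natural language -> AQL
        try:
            rows = run_aql(aql)  # execute against the database
        except Exception as error:
            # Feed the failed query and its error back into the next prompt.
            prompt = f"{question}\nPrevious query: {aql}\nError: {error}"
            continue
        return summarize(question, rows)  # JSON result -> natural language
    raise RuntimeError(f"No valid AQL query after {max_attempts} attempts")


# Stand-ins (hypothetical): the first generated query is deliberately broken.
def fake_generate_aql(prompt: str) -> str:
    if "Error:" in prompt:
        return "RETURN LENGTH(Characters)"
    return "RETURN LENGHT(Characters)"


def fake_run_aql(aql: str) -> List[Any]:
    if "LENGHT" in aql:
        raise ValueError("unknown function 'LENGHT()'")
    return [42]  # made-up value, not the real dataset size


def fake_summarize(question: str, rows: List[Any]) -> str:
    return f"Result for {question!r}: {rows}"


answer = answer_question(
    "How many characters are there?",
    fake_generate_aql,
    fake_run_aql,
    fake_summarize,
)
print(answer)
```

The first attempt fails, the error text is appended to the prompt, and the second attempt succeeds. Bounding the loop keeps a persistently invalid query from retrying forever.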

Let's take a look at how this works in practice.

In [None]:
chain.invoke("Who are the 2 youngest characters?")

In [None]:
chain.invoke("How are Bran Stark and Arya Stark related?")

In [None]:
chain.invoke("Who are Bran Stark’s grandparents?")

In [None]:
chain.invoke("Fetch me the character count for each family")

In [None]:
chain.invoke("What is the age difference between Rickard Stark and Arya Stark?")

In [None]:
chain.invoke("Wie alt ist Rickard Stark?") # (German: "How old is Rickard Stark?")

In [None]:
chain.invoke("What is the average age within the Stark family?")

In [None]:
chain.invoke("Does Bran Stark have a dead parent?")

In [None]:
chain.invoke("Who are Catelyn Stark's children?")

In [None]:
chain.invoke("Add Jon Snow, 31, a male character")

In [None]:
chain.invoke("Create a ChildOf edge from Jon Snow to Ned Stark")

In [None]:
chain.invoke("Who is related to Ned Stark?")

In [None]:
chain.invoke("What can you tell me about the characters?")

In [None]:
chain.invoke("What is the shortest path from Bran Stark to Rickard Stark?")

In [None]:
chain.invoke("What is the family tree of Joffrey Baratheon?")

In [None]:
chain.invoke("What is the relationship between Bran Stark and Rickard Stark?")

In [None]:
chain.invoke("Are Arya Stark and Ned Stark related?")

In [None]:
chain.invoke("Is Ned Stark alive?")

In [None]:
chain.invoke("Ned Stark has died. Update the data")

In [None]:
chain.invoke("How many characters are alive? How many characters are dead?")

In [None]:
chain.invoke("Is Arya Stark an orphan?")

## Prompt Modifiers

[Prompt Engineering](https://en.wikipedia.org/wiki/Prompt_engineering) can be defined as the process of improving a prompt to achieve a better result from a language model.

The `chain` object comes with a set of built-in prompt modifiers that can be used to improve the quality of the results. These modifiers are:
- `top_k`: Limit the maximum number of results returned by the AQL Query execution
- `max_aql_generation_attempts`: Limit the maximum number of times the AQL Query is generated before giving up (i.e., if the query is invalid)
- `return_aql_query`: Return the AQL Query as part of the output dictionary (useful for debugging)
- `return_aql_result`: Return the AQL Query Result as part of the output dictionary (useful for debugging)
- `aql_examples`: A list of AQL Examples for the model to learn from when generating the next AQL Query. This is a powerful tool for teaching the model how to generate AQL Queries for your specific dataset.

Let's start by looking at the `aql_examples` modifier.

In [None]:
# Notice how the following prompt returns nothing;
chain.invoke("Who are the grandchildren of Rickard Stark?")

In [None]:
# This is because the wrong AQL Traversal direction is used! LLMs can hallucinate.
# A simple reminder to use INBOUND (instead of OUTBOUND) returns the correct result;
chain.invoke("Who are the grandchildren of Rickard Stark? Remember to use INBOUND")
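
Why the direction matters can be shown without a database. The sketch below assumes that each `ChildOf` edge points from child to parent, which is what the INBOUND fix above implies. The three edges are a tiny hand-written sample rather than the real dataset, and `traverse` is a hypothetical helper that ignores AQL's path-uniqueness options.

```python
from typing import List, Set, Tuple

# Hand-written sample edges (child, parent); not loaded from the real dataset.
CHILD_OF: List[Tuple[str, str]] = [
    ("BranStark", "NedStark"),
    ("AryaStark", "NedStark"),
    ("NedStark", "RickardStark"),
]


def traverse(start: str, depth: int, direction: str) -> Set[str]:
    """Return the vertices exactly `depth` hops from `start` in the given direction."""
    frontier = {start}
    for _ in range(depth):
        if direction == "OUTBOUND":
            # Follow each edge along its arrow: child -> parent.
            frontier = {parent for child, parent in CHILD_OF if child in frontier}
        elif direction == "INBOUND":
            # Follow each edge against its arrow: parent -> child.
            frontier = {child for child, parent in CHILD_OF if parent in frontier}
        else:
            raise ValueError(f"unsupported direction: {direction}")
    return frontier


outbound = traverse("RickardStark", 2, "OUTBOUND")
inbound = traverse("RickardStark", 2, "INBOUND")
print(sorted(outbound))  # []
print(sorted(inbound))   # ['AryaStark', 'BranStark']
```

Starting from a grandparent, OUTBOUND looks for that character's own parents and finds none in this sample, while INBOUND walks down to children and then grandchildren.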

In [None]:
# We can solidify this pattern by making use of **chain.aql_examples**

# The AQL Examples modifier instructs the LLM to adapt its AQL-completion style
# to the user’s examples. These examples are passed to the AQL Generation Prompt
# Template to promote few-shot learning.

chain.aql_examples = """
# Who are the grandchildren of Rickard Stark?
WITH Characters, ChildOf
FOR v, e IN 2..2 INBOUND 'Characters/RickardStark' ChildOf
  RETURN v

# Is Ned Stark alive?
RETURN DOCUMENT('Characters/NedStark').alive

...
"""

# Note how we are no longer specifying the use of INBOUND
chain.invoke("Who are the grandchildren of Tywin Lannister?")
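
To make the few-shot idea concrete, here is a rough sketch of how examples could be spliced into a generation prompt. The template text, the `build_prompt` helper, and the one-line schema summary are all hypothetical; the chain's actual AQL Generation Prompt Template is worded differently.

```python
AQL_EXAMPLES = """
# Who are the grandchildren of Rickard Stark?
WITH Characters, ChildOf
FOR v, e IN 2..2 INBOUND 'Characters/RickardStark' ChildOf
  RETURN v
"""

# Hypothetical template, used only to illustrate few-shot prompting.
TEMPLATE = (
    "Translate the question into an AQL query.\n"
    "Schema:\n{schema}\n"
    "Example questions and queries:\n{examples}\n"
    "Question: {question}\n"
    "AQL query:"
)


def build_prompt(schema: str, question: str, examples: str = "") -> str:
    """Fill the template, marking the examples section as empty when none are given."""
    return TEMPLATE.format(
        schema=schema,
        examples=examples.strip() or "(none)",
        question=question,
    )


prompt = build_prompt(
    schema="Characters (vertex collection), ChildOf (edge collection)",  # placeholder summary
    question="Who are the grandchildren of Tywin Lannister?",
    examples=AQL_EXAMPLES,
)
print(prompt)
```

Because the worked example already uses INBOUND for a grandchildren question, the model sees the pattern next to the new question and does not need the reminder spelled out in the prompt.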

In [None]:
# Other modifiers include:

# Specify the maximum number of AQL Query Results to return
chain.top_k = 5

# Specify the maximum number of AQL Generation attempts that should be made
# before returning an error
chain.max_aql_generation_attempts = 5

# Specify whether or not to return the AQL Query in the output dictionary
# Use `chain("...")` instead of `chain.invoke("...")` to see this change
chain.return_aql_query = True

# Specify whether or not to return the AQL JSON Result in the output dictionary
# Use `chain("...")` instead of `chain.invoke("...")` to see this change
chain.return_aql_result = True
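
With both flags enabled, the output dictionary is expected to carry the generated query and its raw result alongside the answer. The sketch below assumes the keys are named `result`, `aql_query`, and `aql_result`; verify this against your installed LangChain version. `describe_output` is a hypothetical helper, and `sample_output` is a made-up dictionary rather than real chain output.

```python
from typing import Any, Dict, List


def describe_output(output: Dict[str, Any]) -> List[str]:
    """Build a short debugging report from a chain output dictionary."""
    lines = [f"answer: {output.get('result')}"]
    if "aql_query" in output:  # assumed key, present when return_aql_query is True
        lines.append(f"query: {output['aql_query']}")
    if "aql_result" in output:  # assumed key, present when return_aql_result is True
        lines.append(f"rows returned: {len(output['aql_result'])}")
    return lines


# Made-up example shaped like the assumed output; not produced by a real run.
sample_output = {
    "query": "Is Ned Stark alive?",
    "result": "(natural language answer)",
    "aql_query": "RETURN DOCUMENT('Characters/NedStark').alive",
    "aql_result": [True],
}

report = describe_output(sample_output)
print("\n".join(report))
```

Seeing the generated AQL next to the row count makes it easier to tell a bad query apart from a correct query that matched no data.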