# Multi-step research agent example

The research agent is inspired by [this project](https://github.com/rahulnyk/research_agent) and [BabyAGI](https://github.com/yoheinakajima/babyagi/tree/main).

The idea is as follows. We start with a research question and a data source to retrieve from, and retrieve the data relevant to the original question. A conventional RAG pipeline would feed that context into the LLM prompt to answer the question. Instead, we ask an LLM which further questions, given the retrieved context, would be most useful for answering the original question. We then pick one of these to run retrieval on. By repeating that process we build a tree of questions, each with its retrieved context attached, and store it as a knowledge graph.
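
A minimal, self-contained sketch of this expansion loop might look like the following. It is not motleycrew's implementation. `QuestionNode`, `expand_question_tree` and the three callables (`retrieve`, `suggest_questions`, `pick_most_pertinent`) are hypothetical stand-ins for the retriever and the LLM calls, and the stopping rule is a simple cap on the number of nodes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class QuestionNode:
    """A question, the context retrieved for it, and its follow-up questions."""
    question: str
    context: List[str] = field(default_factory=list)
    answer: Optional[str] = None
    children: List["QuestionNode"] = field(default_factory=list)


def expand_question_tree(
    root_question: str,
    retrieve: Callable[[str], List[str]],
    suggest_questions: Callable[[str, List[str]], List[str]],
    pick_most_pertinent: Callable[[List[QuestionNode]], QuestionNode],
    max_nodes: int,
) -> QuestionNode:
    """Grow a tree of questions until it holds `max_nodes` nodes (hypothetical helper)."""
    root = QuestionNode(root_question)
    frontier = [root]  # questions not yet expanded
    n_nodes = 1
    while frontier and n_nodes < max_nodes:
        # Choose which open question to explore next (an LLM call in practice)
        node = pick_most_pertinent(frontier)
        frontier.remove(node)
        # Retrieve context for it, but do not answer it yet...
        node.context = retrieve(node.question)
        # ...instead, ask which follow-up questions would help most
        for q in suggest_questions(node.question, node.context):
            if n_nodes >= max_nodes:
                break
            child = QuestionNode(q)
            node.children.append(child)
            frontier.append(child)
            n_nodes += 1
    return root


# Toy stand-ins for the retriever and the two LLM calls
def retrieve(q: str) -> List[str]:
    return [f"context for: {q}"]


def suggest(q: str, ctx: List[str]) -> List[str]:
    return [f"{q} / sub {i}" for i in range(2)]


def pick_first(frontier: List[QuestionNode]) -> QuestionNode:
    return frontier[0]


tree = expand_question_tree("Why?", retrieve, suggest, pick_first, max_nodes=5)
print([c.question for c in tree.children])  # ['Why? / sub 0', 'Why? / sub 1']
```

In this sketch, questions still on the frontier when the cap is reached remain unexpanded leaves with no retrieved context.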

When we decide we've done this for long enough (currently just a limit on the number of nodes), we walk back up the graph. First we answer the leaf questions. Then we use those answers, along with the context retrieved for their parent question, to answer the parent question, and so on until we reach the original question.
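
This roll-up is a post-order traversal: every child is answered before its parent. Below is a sketch under the same assumptions. `QuestionNode` and `roll_up_answers` are hypothetical names, and `answer_fn` stands in for the LLM call that writes an answer from a question, its context and its children's answers.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)
class QuestionNode:
    question: str
    context: List[str] = field(default_factory=list)
    answer: Optional[str] = None
    children: List["QuestionNode"] = field(default_factory=list)


def roll_up_answers(
    node: QuestionNode,
    answer_fn: Callable[[str, List[str], List[str]], str],
) -> str:
    """Answer the subtree rooted at `node` bottom-up (post-order)."""
    # Leaves have no children, so they are answered from their own context only
    child_answers = [roll_up_answers(child, answer_fn) for child in node.children]
    # A parent combines its children's answers with its own retrieved context
    node.answer = answer_fn(node.question, node.context, child_answers)
    return node.answer


def toy_answer(question: str, context: List[str], child_answers: List[str]) -> str:
    # Stand-in for an LLM call: just report what it was given
    return f"{question} [context={len(context)}, sub-answers={len(child_answers)}]"


leaf = QuestionNode("Who cursed Karna?", context=["passage A"])
root = QuestionNode(
    "Why did Arjuna kill Karna?", context=["passage B", "passage C"], children=[leaf]
)
print(roll_up_answers(root, toy_answer))
# Why did Arjuna kill Karna? [context=2, sub-answers=1]
```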

```python
from pathlib import Path
import shutil
import os
import sys
import platform

import kuzu
from dotenv import load_dotenv

# This assumes you have a .env file in the examples folder, containing your OpenAI key
load_dotenv()

WORKING_DIR = Path(os.path.realpath("."))

try:
    from motleycrew import MotleyCrew
except ImportError:
    # If we are running this from source, make the package importable
    motleycrew_location = os.path.realpath(WORKING_DIR / "..")
    sys.path.append(motleycrew_location)

if "Dropbox" in WORKING_DIR.parts and platform.system() == "Windows":
    # On Windows, kuzu has file locking issues with Dropbox
    DB_PATH = os.path.realpath(os.path.expanduser("~") + "/Documents/research_db")
else:
    DB_PATH = os.path.realpath(WORKING_DIR / "research_db")

# Start from a clean database; ignore_errors avoids a crash on the first run,
# when the database directory does not exist yet
shutil.rmtree(DB_PATH, ignore_errors=True)

from motleycrew import MotleyCrew
from motleycrew.storage import MotleyKuzuGraphStore
from motleycrew.common.utils import configure_logging
from motleycrew.applications.research_agent.question_task_recipe import QuestionTaskRecipe
from motleycrew.applications.research_agent.answer_task_recipe import AnswerTaskRecipe
from motleycrew.tool.simple_retriever_tool import SimpleRetrieverTool

configure_logging(verbose=True)
```

```python
DATA_DIR = os.path.realpath(os.path.join(WORKING_DIR, "mahabharata/text/TinyTales"))
PERSIST_DIR = WORKING_DIR / "storage"
```

```python
# Only run this the first time you run the notebook, to get the data
!git clone https://github.com/rahulnyk/mahabharata.git
```



```python
db = kuzu.Database(DB_PATH)
graph_store = MotleyKuzuGraphStore(db)
crew = MotleyCrew(graph_store=graph_store)
```

```
2024-05-15 13:02:03,872 - INFO - Node table MotleyGraphNode does not exist in the database, creating
2024-05-15 13:02:03,887 - INFO - Relation table dummy from MotleyGraphNode to MotleyGraphNode does not exist in the database, creating
```


```python
QUESTION = "Why did Arjuna kill Karna, his half-brother?"
MAX_ITER = 3
ANSWER_LENGTH = 200

query_tool = SimpleRetrieverTool(DATA_DIR, PERSIST_DIR, return_strings_only=True)

# We need to pass the crew to the TaskRecipes so they have access to the graph store
# and the crew is aware of them

# The question recipe is responsible for new question generation
question_recipe = QuestionTaskRecipe(
    crew=crew, question=QUESTION, query_tool=query_tool, max_iter=MAX_ITER
)

# The answer recipe is responsible for rolling the answers up the tree
answer_recipe = AnswerTaskRecipe(answer_length=ANSWER_LENGTH, crew=crew)

# Only kick off the answer recipe once the question recipe is done
question_recipe >> answer_recipe
```

```
2024-05-15 13:02:05,096 - INFO - Loading all indices.
2024-05-15 13:02:05,696 - INFO - Node table TaskRecipeNode does not exist in the database, creating
2024-05-15 13:02:05,724 - INFO - Property name not present in table for label TaskRecipeNode, creating
2024-05-15 13:02:05,752 - INFO - Property done not present in table for label TaskRecipeNode, creating
2024-05-15 13:02:05,779 - INFO - Inserting new node with label TaskRecipeNode: name='QuestionTaskRecipe' done=False
2024-05-15 13:02:05,813 - INFO - Node created OK
2024-05-15 13:02:05,816 - INFO - Relation table task_recipe_is_upstream from TaskRecipeNode to TaskRecipeNode does not exist in the database, creating
2024-05-15 13:02:05,838 - INFO - Node table Question does not exist in the database, creating
2024-05-15 13:02:05,862 - INFO - Property question not present in table for label Question, creating
2024-05-15 13:02:05,888 - INFO - Property answer not present in table for label Question, creating
2024-05-15 13:02:05,913 - INFO

QuestionTaskRecipe(name=QuestionTaskRecipe, done=False)
```
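
The `>>` line declares that the answer recipe depends on the question recipe. This is consistent with the `crew.run()` log, which at first lists only `QuestionTaskRecipe` as available. The cell output, a `QuestionTaskRecipe` repr, suggests that `>>` returns its left-hand operand. Below is a rough illustration of how such dependency tracking could work. It uses a hypothetical `Recipe` class and `available` helper, not motleycrew's actual `TaskRecipe` code.

```python
from typing import List


class Recipe:
    """Hypothetical stand-in for a task recipe; not motleycrew's TaskRecipe."""

    def __init__(self, name: str):
        self.name = name
        self.done = False
        self.upstream: List["Recipe"] = []

    def __rshift__(self, other: "Recipe") -> "Recipe":
        # `a >> b` records that `b` may only start once `a` is done.
        # Returning `self` mirrors the notebook, where evaluating
        # `question_recipe >> answer_recipe` displayed the question recipe.
        other.upstream.append(self)
        return self

    def is_ready(self) -> bool:
        return not self.done and all(r.done for r in self.upstream)


def available(recipes: List[Recipe]) -> List[str]:
    """Names of unfinished recipes whose upstream recipes have all finished."""
    return [r.name for r in recipes if r.is_ready()]


question = Recipe("QuestionTaskRecipe")
answer = Recipe("AnswerTaskRecipe")
question >> answer
print(available([question, answer]))  # ['QuestionTaskRecipe']
question.done = True
print(available([question, answer]))  # ['AnswerTaskRecipe']
```

In this sketch, returning `self` means a chain such as `a >> b >> c` makes both `b` and `c` depend on `a`. Whether motleycrew chains the same way is not shown in this notebook.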

```python
# And now run the recipes
done_tasks = crew.run()
```

```
2024-05-15 13:02:07,858 - INFO - Available task recipes: [QuestionTaskRecipe(name=QuestionTaskRecipe, done=False)]
2024-05-15 13:02:07,858 - INFO - Processing recipe: QuestionTaskRecipe(name=QuestionTaskRecipe, done=False)
2024-05-15 13:02:07,870 - INFO - Loaded unanswered questions: [Question(id=0, question=Why did Arjuna kill Karna, his half-brother?, answer=None, context=None)]
2024-05-15 13:02:08,620 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-05-15 13:02:08,629 - INFO - Most pertinent question according to the tool: question='Why did Arjuna kill Karna, his half-brother?' answer=None context=None
2024-05-15 13:02:08,630 - INFO - Got 1 matching tasks for recipe QuestionTaskRecipe(name=QuestionTaskRecipe, done=False)
2024-05-15 13:02:08,630 - INFO - Processing task: Task(status=pending)
2024-05-15 13:02:08,631 - INFO - Assigned task Task(status=pending) to agent <motleycrew.applications.research_agent.question_generator.QuestionGenera
```

```python
final_answer = done_tasks[-1].question

print("Question: ", final_answer.question)
print("Answer: ", final_answer.answer)
print("To explore the graph:")
print(f"docker run -p 8000:8000  -v {DB_PATH}:/database --rm kuzudb/explorer:latest")
print("And in the kuzu explorer at http://localhost:8000 enter")
print("MATCH (A)-[r]->(B) RETURN *;")
```

```
Question:  Why did Arjuna kill Karna, his half-brother?
Answer:  Arjuna killed Karna during their duel in the Mahabharata under complex circumstances influenced by divine interventions and curses. During the duel, Karna's chariot wheel got stuck in the mud, and as he attempted to free it, he was vulnerable. Karna, bound by a curse from his teacher Parashurama, forgot the mantra to invoke the powerful Brahmastra weapon at this critical moment. Additionally, Krishna, serving as Arjuna's charioteer, played a strategic role by lowering their chariot at a crucial moment earlier in the duel, causing a serpent-arrow aimed at Arjuna's head to miss its fatal mark. These factors, combined with the psychological impact of Krishna reminding Karna of his past dishonorable acts, left Karna disheartened and distracted. Consequently, Arjuna, abiding by the rules of warfare and with Krishna's guidance, seized the opportunity to strike Karna while he was defenseless, leading to Karna's death. This act w
```

![This is what you will see in Kuzu explorer](img/kuzu_explorer.png)