# Multi-step research agent

The research agent is inspired by [this project](https://github.com/rahulnyk/research_agent) and [BabyAGI](https://github.com/yoheinakajima/babyagi/tree/main).

The idea is as follows: we start with a research question and a data source we can retrieve from. We retrieve the data relevant to the original question, but instead of feeding it into the LLM prompt to answer the question, as conventional RAG would, we ask an LLM which further questions, given the retrieved context, would be most useful for answering the original question. We then pick one of these questions, run retrieval on it, and repeat the process. This builds a tree of questions, each with its retrieved context attached, which we store as a knowledge graph.

When we decide we have done this for long enough (currently just a limit on the number of nodes), we walk back up the graph: we first answer the leaf questions, then use those answers, along with the context retrieved for their parent question, to answer the parent question, and so on up to the original question.
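
The two phases described above can be sketched in plain Python. This is a minimal illustration, not motleycrew's implementation: `QuestionNode`, `build_tree`, `answer_tree`, and the stub callables `retrieve`, `propose`, `pick`, and `answer` are all hypothetical names standing in for the retriever, the question-generating LLM call, the question-selection step, and the answering LLM call. The sketch also assumes that leaves which were never expanded are answered with empty context.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional, Tuple


@dataclass
class QuestionNode:
    """Hypothetical tree node: a question, its retrieved context, its sub-questions."""
    question: str
    context: List[str] = field(default_factory=list)
    children: List["QuestionNode"] = field(default_factory=list)
    explored: bool = False
    answer: Optional[str] = None


def iter_nodes(node: QuestionNode) -> Iterator[QuestionNode]:
    """Yield the node and all of its descendants, parents first."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def build_tree(
    root_question: str,
    retrieve: Callable[[str], List[str]],
    propose: Callable[[str, List[str]], List[str]],
    pick: Callable[[List[QuestionNode]], QuestionNode],
    max_nodes: int,
) -> QuestionNode:
    """Phase 1: grow the question tree until the node limit is reached."""
    root = QuestionNode(root_question)
    count = 1
    while count < max_nodes:
        unexplored = [n for n in iter_nodes(root) if not n.explored]
        if not unexplored:
            break
        node = pick(unexplored)                # choose which question to expand
        node.context = retrieve(node.question)
        node.explored = True
        for sub_question in propose(node.question, node.context):
            if count >= max_nodes:
                break
            node.children.append(QuestionNode(sub_question))
            count += 1
    return root


def answer_tree(
    node: QuestionNode,
    answer: Callable[[str, List[str], List[Tuple[str, str]]], str],
) -> str:
    """Phase 2: answer the leaves first, then roll the answers up to the parents."""
    sub_answers = [(c.question, answer_tree(c, answer)) for c in node.children]
    node.answer = answer(node.question, node.context, sub_answers)
    return node.answer


# Stubs standing in for the retriever and the LLM calls (assumptions for the demo).
def retrieve(question: str) -> List[str]:
    return [f"context for: {question}"]


def propose(question: str, context: List[str]) -> List[str]:
    return [f"{question} (sub-question {i})" for i in range(2)]


def pick(candidates: List[QuestionNode]) -> QuestionNode:
    return candidates[0]


def answer(question: str, context: List[str], sub_answers: List[Tuple[str, str]]) -> str:
    return f"{question} [{len(context)} context chunks, {len(sub_answers)} sub-answers]"


root = build_tree("Why did Arjuna kill Karna?", retrieve, propose, pick, max_nodes=4)
final = answer_tree(root, answer)
print(final)
```

In the notebook below, the same two phases are handled by `QuestionTask` and `AnswerTask`, with the tree stored in a Kuzu graph rather than in Python objects.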

In [1]:
from pathlib import Path
import shutil
import os
import sys
import platform

import kuzu
from dotenv import load_dotenv

# This assumes you have a .env file in the examples folder, containing your OpenAI key
load_dotenv()

WORKING_DIR = Path(os.path.realpath("."))


try:
    from motleycrew import MotleyCrew
except ImportError:
    # if we are running this from source
    motleycrew_location = os.path.realpath(WORKING_DIR / "..")
    sys.path.append(motleycrew_location)

if "Dropbox" in WORKING_DIR.parts and platform.system() == "Windows":
    # On Windows, kuzu has file locking issues with Dropbox
    DB_PATH = os.path.realpath(os.path.expanduser("~") + "/Documents/research_db")
else:
    DB_PATH = os.path.realpath(WORKING_DIR / "research_db")

if os.path.isdir(DB_PATH):
    shutil.rmtree(DB_PATH)

from motleycrew import MotleyCrew
from motleycrew.storage import MotleyKuzuGraphStore
from motleycrew.common.utils import configure_logging
from motleycrew.applications.research_agent.question_task import QuestionTask
from motleycrew.applications.research_agent.answer_task import AnswerTask
from motleycrew.tools.simple_retriever_tool import SimpleRetrieverTool

configure_logging(verbose=True)

In [2]:
DATA_DIR = os.path.realpath(os.path.join(WORKING_DIR, "mahabharata/text/TinyTales"))
PERSIST_DIR = WORKING_DIR / "storage"

In [3]:
# Only run this the first time you run the notebook, to get the raw data
!git clone https://github.com/rahulnyk/mahabharata.git

fatal: destination path 'mahabharata' already exists and is not an empty directory.


In [4]:
db = kuzu.Database(DB_PATH)
graph_store = MotleyKuzuGraphStore(db)
crew = MotleyCrew(graph_store=graph_store)

2024-05-25 16:18:36,853 - INFO - Node table MotleyGraphNode does not exist in the database, creating
2024-05-25 16:18:36,860 - INFO - Relation table dummy from MotleyGraphNode to MotleyGraphNode does not exist in the database, creating


In [5]:
QUESTION = "Why did Arjuna kill Karna, his half-brother?"
MAX_ITER = 3
ANSWER_LENGTH = 200

query_tool = SimpleRetrieverTool(DATA_DIR, PERSIST_DIR, return_strings_only=True)

# We need to pass the crew to the Tasks so they have access to the graph store
# and the crew is aware of them

# The question task is responsible for new question generation
question_recipe = QuestionTask(
    crew=crew, question=QUESTION, query_tool=query_tool, max_iter=MAX_ITER
)

# The answer task is responsible for rolling the answers up the tree
answer_recipe = AnswerTask(answer_length=ANSWER_LENGTH, crew=crew)

# Only kick off the answer task once the question task is done
question_recipe >> answer_recipe

2024-05-25 16:18:37,591 - INFO - Loading all indices.
2024-05-25 16:18:37,708 - INFO - Node table TaskNode does not exist in the database, creating
2024-05-25 16:18:37,710 - INFO - Property name not present in table for label TaskNode, creating
2024-05-25 16:18:37,712 - INFO - Property done not present in table for label TaskNode, creating
2024-05-25 16:18:37,714 - INFO - Node table QuestionGenerationTaskUnit does not exist in the database, creating
2024-05-25 16:18:37,716 - INFO - Property status not present in table for label QuestionGenerationTaskUnit, creating
2024-05-25 16:18:37,717 - INFO - Property output not present in table for label QuestionGenerationTaskUnit, creating
2024-05-25 16:18:37,719 - INFO - Property question not present in table for label QuestionGenerationTaskUnit, creating
2024-05-25 16:18:37,722 - INFO - Relation table QuestionGenerationTaskUnit_belongs from QuestionGenerationTaskUnit to TaskNode does not exist in the database, creating
2024-05-25 16:18:37,723 -

QuestionTask(name=QuestionTask, done=False)
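
The `>>` in the cell above declares that the answer task should only start once the question task is done. As a rough illustration of how such an operator can be built, the sketch below overloads `__rshift__` on a hypothetical `SketchTask` class and runs tasks in dependency order. It is an assumption-laden toy, not motleycrew's actual code; the only detail taken from the notebook is that the expression evaluates to the left-hand task, as the printed `QuestionTask(...)` repr suggests.

```python
from typing import List


class SketchTask:
    """Hypothetical stand-in for a task; not motleycrew's actual class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.done = False
        self.upstream: List["SketchTask"] = []

    def __rshift__(self, other: "SketchTask") -> "SketchTask":
        # `a >> b` records that b depends on a.
        other.upstream.append(self)
        return self

    def is_ready(self) -> bool:
        # A task can run once every task it depends on has finished.
        return not self.done and all(task.done for task in self.upstream)


def run_all(tasks: List[SketchTask]) -> List[str]:
    """Run tasks in an order that respects the declared dependencies."""
    order: List[str] = []
    pending = list(tasks)
    while pending:
        ready = [task for task in pending if task.is_ready()]
        if not ready:
            raise RuntimeError("no runnable task left: dependency cycle?")
        for task in ready:
            task.done = True  # a real task would do its work here
            order.append(task.name)
            pending.remove(task)
    return order


question_task = SketchTask("question")
answer_task = SketchTask("answer")
question_task >> answer_task

# Even if the answer task is listed first, it runs second.
order = run_all([answer_task, question_task])
print(order)
```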

In [6]:
# And now run the recipes
done_items = crew.run()

2024-05-25 16:18:37,813 - INFO - Available tasks: [QuestionTask(name=QuestionTask, done=False)]
2024-05-25 16:18:37,814 - INFO - Processing task: QuestionTask(name=QuestionTask, done=False)
2024-05-25 16:18:37,820 - INFO - Loaded unanswered questions: [Question(id=0, question=Why did Arjuna kill Karna, his half-brother?, answer=None, context=None)]
2024-05-25 16:18:38,882 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-05-25 16:18:38,890 - INFO - Most pertinent question according to the tool: question='Why did Arjuna kill Karna, his half-brother?' answer=None context=None
2024-05-25 16:18:38,891 - INFO - Got a matching unit for task QuestionTask(name=QuestionTask, done=False)
2024-05-25 16:18:38,891 - INFO - Processing task: TaskUnit(status=pending)
2024-05-25 16:18:38,892 - INFO - Assigned unit TaskUnit(status=pending) to agent <motleycrew.applications.research_agent.question_generator.QuestionGeneratorTool object at 0x1345b7ec0>, dispatch

In [7]:
final_answer = done_items[-1].question

print("Question: ", final_answer.question)
print("Answer: ", final_answer.answer)
print("To explore the graph:")
print(f"docker run -p 8000:8000  -v {DB_PATH}:/database --rm kuzudb/explorer:latest")
print("And in the kuzu explorer at http://localhost:8000 enter")
print("MATCH (A)-[r]->(B) RETURN *;")

Question:  Why did Arjuna kill Karna, his half-brother?
Answer:  Arjuna killed Karna during their intense duel in the Kurukshetra war, a pivotal and tragic moment driven by a combination of strategy, fate, and divine intervention. As the duel progressed, Karna faced a critical disadvantage when his chariot wheel got stuck in the mud. Despite his plea for a pause based on the warrior code, Krishna pointed out Karna's past dishonorable actions, undermining his request for fairness. At this vulnerable moment, Karna, unable to use his celestial weapon due to a curse that caused him to forget the necessary invocation, was defenseless. Krishna seized this opportunity to instruct Arjuna to take action. Despite Arjuna's initial hesitation to strike an unarmed opponent, he followed Krishna's guidance. Arjuna's arrow fatally wounded Karna, leading to his death. This act was not just a tactical move in the war but was also influenced by the overarching narrative of destiny and moral conflict, whe

![This is what you will see in Kuzu explorer](img/kuzu_explorer.png)