# Lesson 1: Router Engine

Welcome to Lesson 1.

To access the `requirements.txt` file, the PDF data file required for this lesson, and the `helper` and `utils` modules, go to the `File` menu and select `Open...`.

I hope you enjoy this course!

## Setup

In [1]:
from helper import get_openai_api_key

OPENAI_API_KEY = get_openai_api_key()

In [2]:
import nest_asyncio

nest_asyncio.apply()

## Load Data

To download the MetaGPT paper yourself, uncomment and run the following command in a notebook cell:

#!wget "https://openreview.net/pdf?id=VtmBAGCN7o" -O metagpt.pdf

**Note**: The PDF file is included with this lesson. To access it, go to the `File` menu and select `Open...`.

In [3]:
from llama_index.core import SimpleDirectoryReader

# load documents
documents = SimpleDirectoryReader(input_files=["metagpt.pdf"]).load_data()

## Define LLM and Embedding model

In [4]:
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

In [5]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

## Define Summary Index and Vector Index over the Same Data

In [6]:
from llama_index.core import SummaryIndex, VectorStoreIndex

summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)

## Define Query Engines and Set Metadata

In [7]:
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
vector_query_engine = vector_index.as_query_engine()

In [8]:
from llama_index.core.tools import QueryEngineTool


summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to MetaGPT"
    ),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context from the MetaGPT paper."
    ),
)

## Define Router Query Engine

In [9]:
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector


query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
    verbose=True
)
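
The routing step above can be pictured with a toy, LLM-free sketch. Everything in it (`Tool`, `keyword_selector`, `ToyRouter`) is hypothetical and not part of LlamaIndex. The keyword-overlap score only stands in for `LLMSingleSelector`, which in the real engine asks the LLM to choose among the tool descriptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tool:
    """Hypothetical stand-in for a QueryEngineTool: a callable plus a description."""
    name: str
    description: str
    run: Callable[[str], str]


def keyword_selector(query: str, tools: List[Tool]) -> int:
    """Pick the tool whose description shares the most words with the query.

    Assumption: this crude overlap score only imitates the role of the
    LLM-based selector; it is not how LlamaIndex scores choices.
    """
    query_words = set(query.lower().replace("?", "").split())
    scores = [
        len(query_words & set(tool.description.lower().split()))
        for tool in tools
    ]
    # index() returns the first maximum, so ties go to the earlier tool.
    return scores.index(max(scores))


class ToyRouter:
    """Select exactly one tool per query, then delegate the query to it."""

    def __init__(self, selector, tools):
        self.selector = selector
        self.tools = tools

    def query(self, text: str) -> str:
        choice = self.selector(text, self.tools)
        return self.tools[choice].run(text)


tools = [
    Tool("summary", "summary or overview of the whole document",
         lambda q: "summary engine answered"),
    Tool("vector", "retrieving specific facts and details from the paper",
         lambda q: "vector engine answered"),
]
router = ToyRouter(keyword_selector, tools)
```

The point of the sketch is the shape of the control flow: one selection per query, based only on the tool descriptions, followed by a plain delegation to the chosen engine. This is why the `description` strings passed to `QueryEngineTool.from_defaults` matter.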

In [10]:
response = query_engine.query("What is the summary of the document?")
print(str(response))

Selecting query engine 0: This choice indicates that the document is useful for summarization questions related to MetaGPT..
The document introduces MetaGPT, a meta-programming framework that enhances multi-agent systems based on Large Language Models (LLMs) by incorporating human-like Standardized Operating Procedures (SOPs). MetaGPT assigns specific roles to agents, streamlines workflows, and improves task decomposition through structured communication and efficient information sharing mechanisms. It includes an executable feedback mechanism to iteratively improve code quality and has demonstrated superior performance in code generation tasks. The document also emphasizes the importance of human-inspired SOPs in artificial multi-agent systems and suggests future research directions in this area.


In [11]:
print(len(response.source_nodes))

34
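
The count above is large because the two indexes retrieve differently. By default, a `SummaryIndex` query engine passes every node to the response synthesizer, while a `VectorStoreIndex` query engine keeps only the most similar nodes (its default `similarity_top_k` is 2). The 34 source nodes are therefore presumably all the nodes produced by the splitter. A toy sketch of the contrast, using made-up nodes and a hypothetical word-overlap score in place of embedding similarity:

```python
from typing import List

# Made-up stand-ins for the nodes produced by the splitter.
nodes = [
    "MetaGPT assigns roles to agents",
    "agents publish messages to a shared pool",
    "agents subscribe to messages by role",
    "the ablation study removes roles one at a time",
]


def summary_retrieve(query: str, nodes: List[str]) -> List[str]:
    """Summary-style retrieval: ignore the query and return every node."""
    return list(nodes)


def vector_retrieve(query: str, nodes: List[str], top_k: int = 2) -> List[str]:
    """Vector-style retrieval: keep only the top_k best-scoring nodes.

    Assumption: word overlap is a placeholder for embedding similarity.
    """
    query_words = set(query.lower().split())

    def score(node: str) -> int:
        return len(query_words & set(node.lower().split()))

    # sorted() is stable, so equally scored nodes keep their original order.
    return sorted(nodes, key=score, reverse=True)[:top_k]


query = "how do agents share messages"
print(len(summary_retrieve(query, nodes)))  # every node
print(len(vector_retrieve(query, nodes)))   # only top_k nodes
```

In the notebook, comparing `len(response.source_nodes)` with `len(nodes)` would confirm whether the summary engine used the whole document.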


In [12]:
response = query_engine.query(
    "How do agents share information with other agents?"
)
print(str(response))

Selecting query engine 1: This choice is more relevant as it focuses on retrieving specific context from the MetaGPT paper, which may contain information on how agents share information with other agents..
Agents share information with other agents by utilizing a shared message pool where they can publish structured messages. This shared message pool allows all agents to exchange messages directly, enabling them to both publish their own messages and access messages from other agents transparently. Additionally, agents can subscribe to relevant messages based on their role profiles, allowing them to extract the information they need for their specific tasks and responsibilities.


## Let's put everything together

In [13]:
from utils import get_router_query_engine

query_engine = get_router_query_engine("metagpt.pdf")
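
The source of `utils.get_router_query_engine` is not shown in this lesson. A minimal sketch of what such a helper might look like, assembled only from the cells above, is given below. The name `build_router_query_engine` is hypothetical, and the sketch assumes `Settings.llm` and `Settings.embed_model` have already been set as in the earlier cell.

```python
def build_router_query_engine(file_path: str):
    """Hypothetical helper: build a summary/vector router over one file.

    Assumption: this mirrors the notebook cells above, not the actual
    code in utils.py. It uses whatever LLM and embedding model are
    configured in llama_index.core.Settings.
    """
    # Imported inside the function so that merely defining the helper
    # does not require llama_index to be importable.
    from llama_index.core import (
        SimpleDirectoryReader,
        SummaryIndex,
        VectorStoreIndex,
    )
    from llama_index.core.node_parser import SentenceSplitter
    from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
    from llama_index.core.selectors import LLMSingleSelector
    from llama_index.core.tools import QueryEngineTool

    # Load the file and split it into nodes.
    documents = SimpleDirectoryReader(input_files=[file_path]).load_data()
    nodes = SentenceSplitter(chunk_size=1024).get_nodes_from_documents(documents)

    # Build two indexes over the same nodes.
    summary_index = SummaryIndex(nodes)
    vector_index = VectorStoreIndex(nodes)

    # Wrap each index's query engine as a tool with a routing description.
    summary_tool = QueryEngineTool.from_defaults(
        query_engine=summary_index.as_query_engine(
            response_mode="tree_summarize",
            use_async=True,
        ),
        description="Useful for summarization questions related to MetaGPT",
    )
    vector_tool = QueryEngineTool.from_defaults(
        query_engine=vector_index.as_query_engine(),
        description="Useful for retrieving specific context from the MetaGPT paper.",
    )

    # Let the LLM pick exactly one tool per query.
    return RouterQueryEngine(
        selector=LLMSingleSelector.from_defaults(),
        query_engine_tools=[summary_tool, vector_tool],
        verbose=True,
    )
```

Under those assumptions, it would be called the same way as the course helper: `query_engine = build_router_query_engine("metagpt.pdf")`.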

In [14]:
response = query_engine.query("Tell me about the ablation study results?")
print(str(response))

Selecting query engine 1: Ablation study results typically involve specific context from the MetaGPT paper, making choice 2 more relevant..
The ablation study results show that MetaGPT effectively addresses challenges related to context utilization, code hallucinations, and information overload in software development. By accurately unfolding natural language descriptions and maintaining information validity, MetaGPT eliminates ambiguity and enables LLMs to focus on relevant data. Additionally, the system reduces code hallucinations by guiding LLMs through granular tasks like requirement analysis and package selection. To tackle information overload, MetaGPT utilizes a global message pool and a subscription mechanism to streamline communication and filter out irrelevant contexts, enhancing the efficiency and relevance of information exchange.


### Additional tasks

Now that I understand what a router engine is and how it operates, I want to see how `SimpleDirectoryReader` handles formats other than PDF. I will try it with a Word document and a Markdown file, following the steps in the official documentation: https://docs.llamaindex.ai/en/stable/understanding/loading/loading/

In [15]:
# Needed to read .docx files:
# pip install docx2txt

In [37]:
word_documents = SimpleDirectoryReader(input_files=["ant_colony.docx"]).load_data()

In [38]:
word_documents

[Document(id_='9a6dc5d4-0855-4d71-a3c4-c87814b02164', embedding=None, metadata={'file_name': 'ant_colony.docx', 'file_path': 'ant_colony.docx', 'file_size': 5606, 'creation_date': '2024-07-24', 'last_modified_date': '2024-07-24'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='In computer science and operations research, the ant colony optimization algorithm (ACO) is a probabilistic technique for solving computational problems which can be reduced to finding good paths through graphs. Artificial ants stand for multi-agent methods inspired by the behavior of real ants. The pheromone-based communication of biological ants is often the predominant paradigm used.[2] Combinations of artificial ants and local search algorithms have become a method of choice f

In [44]:
markdown_documents = SimpleDirectoryReader(input_files=["README_beetroot.md"]).load_data()

In [45]:
markdown_documents

[Document(id_='426ee6ff-fbb5-4fa5-acee-3407b95cb201', embedding=None, metadata={'file_path': 'README_beetroot.md', 'file_name': 'README_beetroot.md', 'file_size': 2602, 'creation_date': '2024-07-24', 'last_modified_date': '2024-07-24'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='\n\nTheMatrix\n\nThis web application was created using Django 4.1.5 as final project for Beetroot Academy\'s "Python Development" Course. It represents a taxi app which allows users to place orders, track their status, chat with both current order\'s driver and administrators of the company and also rate the drivers that they\'ve interacted with. The data for our orders is retrieved from Google Maps using Google APIs.\n\n\n', start_char_idx=None, end_char_idx=None, text_tem

In [50]:
md_vector_index = VectorStoreIndex.from_documents(markdown_documents)
md_vector_query_engine = md_vector_index.as_query_engine()

In [51]:
md_vector_tool = QueryEngineTool.from_defaults(
    query_engine=md_vector_query_engine,
    description=(
        "Useful for retrieving specific context from the readme file."
    ),
)

In [52]:
md_query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        md_vector_tool,
    ],
    verbose=True
)

In [54]:
md_response = md_query_engine.query(
    "How many authors are listed in the readme?"
)
print(str(md_response))

Selecting query engine 0: The readme file is likely to list the authors of the project, making it useful for retrieving this specific context..
Three


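`SimpleDirectoryReader` loaded both the Word document and the Markdown file without extra configuration, and the query engine built over the Markdown file answered the test question. Because this router has only one tool, it has nothing to choose between, so this run exercises the loader and the vector engine rather than the routing itself.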