# Router Query Engine

Routers are modules that take a user's query and select one or more options from a set of predefined "choices," each described by its metadata.

There are two primary types of core router modules:

1. **LLM Selectors:** These selectors present the available choices as a text prompt and use the LLM text completion endpoint to make the selection.

2. **Pydantic Selectors:** Here, choices are passed in the form of Pydantic schemas to a function-calling endpoint. The results are then returned as Pydantic objects.
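
To make the first flow concrete, here is a minimal, library-free sketch of what an LLM selector does conceptually: it renders the choices into a numbered text prompt and parses the model's reply as JSON. The names `build_selector_prompt`, `fake_llm_complete`, and `parse_selection` are hypothetical and invented for this illustration; they are not part of the LlamaIndex API, and the stubbed completion stands in for a real LLM call.

```python
import json


def build_selector_prompt(choices, query):
    """Render the choice descriptions as a numbered list inside a text prompt."""
    numbered = "\n".join(f"({i + 1}) {desc}" for i, desc in enumerate(choices))
    return (
        "Some choices are given below as a numbered list.\n"
        f"{numbered}\n"
        "Return a JSON object with keys 'choice' and 'reason' "
        f"for the question: {query}"
    )


def fake_llm_complete(prompt):
    """Hypothetical stand-in for a text-completion call; always picks choice 1."""
    return '{"choice": 1, "reason": "The question asks for a summary."}'


def parse_selection(raw_output, num_choices):
    """Parse the JSON reply and convert the 1-based choice to a 0-based index."""
    data = json.loads(raw_output)
    index = int(data["choice"]) - 1
    if not 0 <= index < num_choices:
        raise ValueError(f"choice {data['choice']} is out of range")
    return index, data["reason"]


choices = [
    "Useful for summarization questions.",
    "Useful for retrieving specific context.",
]
prompt = build_selector_prompt(choices, "What is the summary of the document?")
selected_index, reason = parse_selection(fake_llm_complete(prompt), len(choices))
print(selected_index, reason)
```

Because the reply is free-form text, the parsing step has to guard against malformed or out-of-range answers, which is the part a function-calling endpoint handles more strictly.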

## Setup

Install `llama-index`

In [None]:
!pip install llama-index


In [None]:
# NOTE: This is ONLY necessary in jupyter notebook.
# Details: Jupyter runs an event-loop behind the scenes.
#          This results in nested event-loops when we start an event-loop to make async queries.
#          This is normally not allowed, so we use nest_asyncio to allow it for convenience.
import nest_asyncio

nest_asyncio.apply()
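
The restriction described in the comments above can be reproduced with the standard library alone. This is a minimal sketch meant to be run as a standalone Python script, outside Jupyter and without `nest_asyncio` applied; `inner` and `outer` are hypothetical names used only for illustration.

```python
import asyncio


async def inner():
    """A trivial coroutine standing in for an async query."""
    return "done"


async def outer():
    """Try to start a second event loop while this one is already running."""
    coro = inner()
    try:
        asyncio.run(coro)  # unpatched asyncio refuses to nest event loops
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"
    return "nested loop allowed"


message = asyncio.run(outer())
print(message)
```

In a notebook the outer loop is the one Jupyter already runs, which is why the patch is only needed there.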

In [None]:
import logging
import sys

# Set up the root logger
logger = logging.getLogger()
logger.setLevel(logging.INFO)  # Set logger level to INFO

# Clear out any existing handlers
logger.handlers = []

# Set up the StreamHandler to output to sys.stdout (Colab's output)
handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.INFO)  # Set handler level to INFO

# Add the handler to the logger
logger.addHandler(handler)

from llama_index import (
    VectorStoreIndex,
    SummaryIndex,
    SimpleDirectoryReader,
    ServiceContext,
    StorageContext,
)

import openai
from IPython.display import display, HTML


# Set up the OpenAI API key
openai.api_key = 'YOUR OPENAI API KEY'

NumExpr defaulting to 2 threads.


## Download Data

In [None]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/jerryjliu/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

--2023-10-26 15:27:46--  https://raw.githubusercontent.com/jerryjliu/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’


2023-10-26 15:27:46 (5.32 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]



## Load data

In [None]:
# load documents
documents = SimpleDirectoryReader("data/paul_graham").load_data()

# initialize service context (set chunk size)
service_context = ServiceContext.from_defaults(chunk_size=1024)
nodes = service_context.node_parser.get_nodes_from_documents(documents)

[nltk_data] Downloading package punkt to /tmp/llama_index...
[nltk_data]   Unzipping tokenizers/punkt.zip.


## Define Summary Index and Vector Index over Same Data

In [None]:
# Summary Index for summarization questions
summary_index = SummaryIndex(nodes)

# Vector Index for answering specific context questions
vector_index = VectorStoreIndex(nodes)
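
The reason for building two indexes over the same nodes is that they retrieve differently: a summary index hands every node to the response synthesizer, while a vector index returns only the nodes most similar to the query. The toy sketch below illustrates that contrast with a bag-of-words similarity. It is an assumption-laden illustration, not how LlamaIndex is implemented: real vector indexes use learned embeddings, and `embed`, `cosine`, `summary_retrieve`, and `vector_retrieve` are hypothetical names.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summary_retrieve(chunks, query):
    """Summary-style retrieval: every chunk is returned, whatever the query."""
    return list(chunks)


def vector_retrieve(chunks, query, top_k=1):
    """Vector-style retrieval: only the top_k most similar chunks are returned."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), query_vec), reverse=True)
    return ranked[:top_k]


chunks = [
    "I wrote short stories and programmed on the IBM 1401.",
    "We started Viaweb to build online stores.",
    "Y Combinator funded its first batch of startups.",
]
print(len(summary_retrieve(chunks, "online stores")))
print(vector_retrieve(chunks, "online stores"))
```

This is why the summary index suits "summarize the whole document" questions and the vector index suits questions about a specific detail.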

## Define Query Engines.

1. Summary Index Query Engine.
2. Vector Index Query Engine.

In [None]:
# Summary Index Query Engine
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
    service_context=service_context,
)

# Vector Index Query Engine
vector_query_engine = vector_index.as_query_engine(
    service_context=service_context
)

## Build summary index and vector index tools

In [None]:
from llama_index.tools.query_engine import QueryEngineTool

# Summary Index tool
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description="Useful for summarization questions related to Paul Graham essay on What I Worked On.",
)

# Vector Index tool
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Useful for retrieving specific context from Paul Graham essay on What I Worked On.",
)

## Define Router Query Engine

Various selectors are at your disposal, each offering unique characteristics.

Pydantic selectors are supported only by `gpt-4-0613` and `gpt-3.5-turbo-0613` (the default). They use the OpenAI Function Calling API and return Pydantic selection objects, so no raw JSON has to be parsed.

LLM selectors, on the other hand, have the LLM generate JSON output, which is then parsed and used to query the relevant indexes.

For both selector types, you can opt to route to either a single index or multiple indexes.
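
The sketch below shows the kind of structured result a function-calling selector yields, for both the single and the multi case. To stay dependency-free it uses standard-library dataclasses as a stand-in for Pydantic models, which is an assumption of this example; `SingleSelection`, `MultiSelection`, and `parse_function_call_arguments` are hypothetical names, not LlamaIndex classes.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class SingleSelection:
    """One selected choice: its 0-based index and the model's stated reason."""
    index: int
    reason: str


@dataclass
class MultiSelection:
    """A list of selections; single-selector routing uses exactly one."""
    selections: List[SingleSelection]


def parse_function_call_arguments(arguments_json, num_choices):
    """Turn function-call arguments into validated selection objects."""
    payload = json.loads(arguments_json)
    selections = []
    for item in payload["selections"]:
        selection = SingleSelection(index=int(item["index"]), reason=str(item["reason"]))
        if not 0 <= selection.index < num_choices:
            raise ValueError(f"index {selection.index} is out of range")
        selections.append(selection)
    return MultiSelection(selections=selections)


# Hypothetical arguments a function-calling endpoint might return.
arguments = json.dumps(
    {
        "selections": [
            {"index": 0, "reason": "Needs semantic lookup."},
            {"index": 1, "reason": "Mentions specific keywords."},
        ]
    }
)
result = parse_function_call_arguments(arguments, num_choices=3)
print([s.index for s in result.selections])
```

Downstream code can then work with typed attributes such as `selection.index` rather than with dictionary keys from a parsed string.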

## PydanticSingleSelector

Use the OpenAI Function Calling API to generate and parse Pydantic objects under the hood for the router selector.

In [None]:
from llama_index.query_engine.router_query_engine import RouterQueryEngine
from llama_index.selectors.llm_selectors import LLMSingleSelector, LLMMultiSelector
from llama_index.selectors.pydantic_selectors import (
    PydanticMultiSelector,
    PydanticSingleSelector,
)

# Create Router Query Engine
query_engine = RouterQueryEngine(
    selector=PydanticSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
)

In [None]:
response = query_engine.query("What is the summary of the document?")

Selecting query engine 0: This choice is specifically mentioned as useful for summarization questions..
message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1663 request_id=05f4bc40c48d57cee65224592f507ffb response_code=200
message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=3454 request_id=1ab0df88737640a055cf07a645327e0f response_code=200
message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=3200 request_id=e71b7608c1ab09d064c07948f3b2cb3b response_code=200
message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=3640 request_id=196158cac541e87a52869bf64436d4b2 response_code=200
message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=3501 request_id=0328626e5dfa525a1ca0270556b87427 response_code=200
message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=4414 requ

In [None]:
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))

## LLMSingleSelector

Use OpenAI (or another LLM) to generate JSON, which is parsed under the hood to select a sub-index for routing.

In [None]:
# Create Router Query Engine
query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
)

In [None]:
response = query_engine.query("What is the summary of the document?")

Selecting query engine 0: The summary of the document is related to Paul Graham's essay on What I Worked On..
message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1750 request_id=200c1a6ac1dd410b2949e20280a3ffa7 response_code=200
message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=2227 request_id=f38a68434e62ee7bf5fce7ef0a6abda4 response_code=200
message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=3185 request_id=ce2c3240d255a15883201d5f24b666d0 response_code=200
message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=3271 request_id=927af0471aaeab67bfec171162b1e858 response_code=200
message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=3644 request_id=271b1a920ceeae812fb5e17b14a8d5b6 response_code=200
message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=423

In [None]:
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))

In [None]:
response = query_engine.query("What did Paul Graham do after RICS?")

Selecting query engine 1: The question is asking for specific context about what Paul Graham did after RICS, which is better suited for retrieving specific context from the essay..


In [None]:
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))

## PydanticMultiSelector

If you anticipate queries being directed to multiple indexes, it's advisable to use a multi-selector. This selector dispatches the query to various sub-indexes and subsequently aggregates the responses through a summary index to deliver a comprehensive answer.
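
The dispatch-then-aggregate behavior can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the `RouterQueryEngine` implementation: the three engine functions are stubs, `route_query` is a hypothetical helper, and `combine_answers` simply joins strings where the real engine would summarize the responses.

```python
def vector_engine(query):
    """Stub query engine standing in for a vector index."""
    return "vector answer"


def keyword_engine(query):
    """Stub query engine standing in for a keyword index."""
    return "keyword answer"


def summary_engine(query):
    """Stub query engine standing in for a summary index."""
    return "summary answer"


def combine_answers(answers):
    """Stand-in for the summarization step: here the answers are just joined."""
    return " | ".join(answers)


def route_query(query, engines, selected_indexes):
    """Send the query to every selected engine and merge the results."""
    if not selected_indexes:
        raise ValueError("the selector returned no choices")
    answers = [engines[i](query) for i in selected_indexes]
    if len(answers) == 1:
        return answers[0]  # nothing to aggregate
    return combine_answers(answers)


engines = [vector_engine, keyword_engine, summary_engine]
combined = route_query("example question", engines, selected_indexes=[0, 1])
single = route_query("example question", engines, selected_indexes=[2])
print(combined)
print(single)
```

The extra aggregation call is the cost of multi-selection, so a single selector remains the cheaper choice when each query clearly belongs to one index.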

## Create a SimpleKeywordTableIndex and a corresponding tool

In [None]:
from llama_index import SimpleKeywordTableIndex

keyword_index = SimpleKeywordTableIndex(nodes)

keyword_query_engine = keyword_index.as_query_engine(service_context=service_context)

keyword_tool = QueryEngineTool.from_defaults(
    query_engine=keyword_query_engine,
    description="Useful for retrieving specific context using keywords from Paul Graham essay on What I Worked On.",
)

## Build a router query engine.

In [None]:
query_engine = RouterQueryEngine(
    selector=PydanticMultiSelector.from_defaults(),
    query_engine_tools=[
        vector_tool,
        keyword_tool,
        summary_tool,
    ],
)

In [None]:
# This query could use either a keyword or vector query engine, so it will combine responses from both
response = query_engine.query(
    "What were notable events and people from the author's time at Interleaf and YC?"
)

In [None]:
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))