In this notebook we look at `RouterQueryEngine`, which routes each user query to one of several available query engine tools. The tools can wrap different indices or query engines built over the same documents, or over different documents.

### Installation

In [None]:
!pip install llama-index
!pip install llama-index-llms-anthropic
!pip install llama-index-embeddings-huggingface

### Set Logging

In [None]:
# NOTE: This is ONLY necessary in a Jupyter notebook.
# Details: Jupyter runs an event loop behind the scenes.
#          Starting another event loop to make async queries therefore creates a nested event loop.
#          Nesting is normally not allowed, so we use nest_asyncio to permit it for convenience.
import nest_asyncio

nest_asyncio.apply()

import logging
import sys

# Set up the root logger
logger = logging.getLogger()
logger.setLevel(logging.INFO)  # Set logger level to INFO

# Clear out any existing handlers
logger.handlers = []

# Set up the StreamHandler to output to sys.stdout (Colab's output)
handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.INFO)  # Set handler level to INFO

# Add the handler to the logger
logger.addHandler(handler)

from IPython.display import display, HTML
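
The reason `nest_asyncio` is needed can be reproduced without Jupyter. The minimal standard-library sketch below (the coroutine names `inner` and `outer` are illustrative only) calls `asyncio.run()` from inside a coroutine, that is, while an event loop is already running, which is the situation a notebook cell is in. Plain `asyncio` refuses and raises a `RuntimeError`; `nest_asyncio.apply()` patches asyncio so that this nesting is tolerated.

```python
import asyncio


async def inner() -> int:
    return 42


async def outer() -> str:
    # An event loop is already running here, just as in a Jupyter cell.
    coro = inner()
    try:
        asyncio.run(coro)  # plain asyncio refuses to start a nested loop
        return "no error"
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)


message = asyncio.run(outer())
print(message)
```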

### Set Anthropic API Key

In [None]:
import os
os.environ['ANTHROPIC_API_KEY'] = 'YOUR ANTHROPIC API KEY'
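
Hard-coding the key works, but it is easy to forget to replace the placeholder. As an optional safeguard, a small helper such as the hypothetical `require_env` below (not part of LlamaIndex or the Anthropic SDK) can fail early with a clear message when the variable is unset or still holds the placeholder string, before any API call is attempted.

```python
import os

PLACEHOLDER = "YOUR ANTHROPIC API KEY"


def require_env(name: str, placeholder: str = PLACEHOLDER) -> str:
    """Return the environment variable `name`, raising if it is unset or still the placeholder."""
    value = os.environ.get(name, "")
    if not value or value == placeholder:
        raise RuntimeError(f"Set {name} to a real API key before running the notebook.")
    return value


# Hypothetical usage in the notebook, right after setting the variable:
# require_env("ANTHROPIC_API_KEY")
```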

### Set LLM and Embedding model

In [None]:
from llama_index.llms.anthropic import Anthropic
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

In [None]:
llm = Anthropic(temperature=0.0, model='claude-2.1')
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")

In [None]:
from llama_index.core import Settings
Settings.llm = llm
Settings.embed_model = embed_model
Settings.chunk_size = 512
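
`Settings.chunk_size = 512` sets how large each chunk (node) is when the documents are split before indexing. LlamaIndex counts tokens rather than words and, by default, prefers to split at sentence boundaries; the simplified sketch below ignores both and counts whitespace-separated words, only to illustrate the fixed-size windowing idea. The function name `chunk_words` and its `overlap` parameter are assumptions for illustration.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 512, overlap: int = 20) -> List[str]:
    """Split text into windows of at most `chunk_size` words; neighbours share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


print(chunk_words("a b c d e f g", chunk_size=3, overlap=1))
```

Smaller chunks make retrieval more targeted but give the LLM less surrounding context per chunk.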

### Download Document

In [None]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/jerryjliu/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

--2024-02-29 12:32:03--  https://raw.githubusercontent.com/jerryjliu/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’


2024-02-29 12:32:03 (5.65 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]



### Load Document

In [None]:
# load documents
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("data/paul_graham").load_data()

### Create Indices and Query Engines

In [None]:
from llama_index.core import SummaryIndex, VectorStoreIndex
# Summary Index for summarization questions
summary_index = SummaryIndex.from_documents(documents)

# Vector Index for answering specific context questions
vector_index = VectorStoreIndex.from_documents(documents)
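
The two indices answer questions differently. At query time the `SummaryIndex` passes every node to the LLM, which suits whole-document summaries, while the `VectorStoreIndex` embeds the query and retrieves only the chunks whose embeddings are most similar to it. The toy sketch below shows that retrieval step with cosine similarity over made-up 3-dimensional vectors; real `BAAI/bge-base-en-v1.5` embeddings have 768 dimensions, and the chunk labels and the `top_k` helper are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], chunks: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Rank chunks by similarity to the query embedding and keep the best k."""
    scored = [(label, cosine(query, vec)) for label, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up embeddings for three hypothetical chunks.
chunks = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.0, 1.0, 0.0],
    "chunk_c": [0.0, 0.6, 0.8],
}
results = top_k([0.0, 1.0, 0.0], chunks, k=2)
print(results)
```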

In [None]:
# Summary Index Query Engine
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)

# Vector Index Query Engine
vector_query_engine = vector_index.as_query_engine()
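
`response_mode="tree_summarize"` roughly works bottom-up: the chunks are grouped into prompts, each group is summarized, and the summaries are summarized again until a single answer remains; `use_async=True` lets the calls within one level run concurrently. The sketch below mimics that shape with `asyncio.gather`. Two parts are assumptions: `summarize` is a hypothetical stand-in for an LLM call (it just keeps the first word of each input), and groups have a fixed size `fan_in`, whereas LlamaIndex packs chunks according to the LLM's context window.

```python
import asyncio
from typing import List


async def summarize(texts: List[str]) -> str:
    """Hypothetical stand-in for one LLM summarization call."""
    await asyncio.sleep(0)  # a real implementation would await the LLM request here
    return " ".join(t.split()[0] for t in texts if t.split())


async def tree_summarize(chunks: List[str], fan_in: int = 2) -> str:
    """Summarize groups of `fan_in` texts concurrently, level by level, until one text remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so each level shrinks")
    if not chunks:
        return ""
    level = list(chunks)
    while True:
        groups = [level[i:i + fan_in] for i in range(0, len(level), fan_in)]
        # All groups on this level are summarized concurrently.
        level = list(await asyncio.gather(*(summarize(g) for g in groups)))
        if len(level) == 1:
            return level[0]


print(asyncio.run(tree_summarize(["alpha one", "beta two", "gamma three", "delta four"])))
```

With four chunks and `fan_in=2`, the first level makes two concurrent calls and the second level one call.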

### Create Tools for the Summary and Vector Query Engines

In [None]:
from llama_index.core.tools.query_engine import QueryEngineTool

# Summary Index tool
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description="Useful for summarization questions related to the Paul Graham essay on What I Worked On.",
)

# Vector Index tool
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Useful for retrieving specific context from Paul Graham essay on What I Worked On.",
)
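
The router's selector decides between tools using their descriptions, which is why each description states what kind of question the tool is good for. The logged selections in the Test Queries section refer to the tools as "Choice 1" and "Choice 2", so the descriptions reach the LLM as a numbered list. The hypothetical `format_choices` helper below sketches that rendering; the exact prompt text LlamaIndex uses may differ.

```python
from typing import List


def format_choices(descriptions: List[str]) -> str:
    """Render tool descriptions as a 1-based numbered list, one choice per line."""
    lines = []
    for number, description in enumerate(descriptions, start=1):
        # Collapse internal line breaks so each choice stays on a single line.
        flat = " ".join(description.splitlines())
        lines.append(f"({number}) {flat}")
    return "\n".join(lines)


print(format_choices([
    "Useful for summarization questions related to the Paul Graham essay on What I Worked On.",
    "Useful for retrieving specific context from the Paul Graham essay on What I Worked On.",
]))
```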

### Create Router Query Engine

In [None]:
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors.llm_selectors import LLMSingleSelector, LLMMultiSelector

In [None]:
# Create Router Query Engine
query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
)
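
Conceptually, `LLMSingleSelector` asks the LLM for one choice number and the router forwards the query to the matching engine; the logs in the Test Queries section show "Choice 1" becoming query engine 0, i.e. a 1-based choice mapped to a 0-based index. The notebook also imports `LLMMultiSelector`, which can pick several engines whose answers are then combined. The plain-Python sketch below imitates both flows; `keyword_choose` is a hypothetical keyword rule standing in for the LLM, and the lambdas stand in for the real query engines.

```python
from typing import Callable, List, Sequence

Engine = Callable[[str], str]


def route_single(query: str, engines: Sequence[Engine], choose: Callable[[str], int]) -> str:
    """Send the query to the one engine picked by `choose` (a 1-based choice number)."""
    choice = choose(query)
    index = choice - 1  # "Choice 1" -> engine index 0
    if not 0 <= index < len(engines):
        raise ValueError(f"selector returned invalid choice {choice}")
    return engines[index](query)


def route_multi(query: str, engines: Sequence[Engine],
                choose_many: Callable[[str], List[int]],
                combine: Callable[[List[str]], str]) -> str:
    """Query every selected engine and merge the answers with `combine`."""
    answers = [engines[choice - 1](query) for choice in choose_many(query)]
    return combine(answers)


def keyword_choose(query: str) -> int:
    """Hypothetical stand-in for the LLM selector: summaries go to choice 1, everything else to 2."""
    return 1 if "summar" in query.lower() else 2


engines = [
    lambda q: f"[summary engine] {q}",
    lambda q: f"[vector engine] {q}",
]
print(route_single("What is the summary of the document?", engines, keyword_choose))
print(route_single("Which language did he use?", engines, keyword_choose))
```

A keyword rule is brittle; the point of the LLM selector is to make this decision from the tool descriptions instead.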

### Test Queries

In [None]:
response = query_engine.query("What is the summary of the document?")

HTTP Request: POST https://api.anthropic.com/v1/complete "HTTP/1.1 200 OK"
Selecting query engine 0: Choice 1 states it is useful for summarization questions related to the Paul Graham essay on What I Worked On. Since the question asks for a summary of the document, choice 1 is most relevant..
HTTP Request: POST https://api.anthropic.com/v1/complete "HTTP/1.1 200 OK"


In [None]:
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
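
`response.response` is LLM-generated text inserted directly into HTML, so characters such as `<` or `&` in an answer would be interpreted as markup. A small hypothetical helper that escapes the text with the standard library's `html.escape` before building the same `<p>` element avoids that; in the notebook it could be used as `display(HTML(response_html(response.response)))`.

```python
import html


def response_html(text: str, font_size_px: int = 20) -> str:
    """Wrap text in the same <p> element as the cell above, escaping HTML special characters."""
    return f'<p style="font-size:{font_size_px}px">{html.escape(text)}</p>'


print(response_html("Lisp < Arc & friends"))
```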

In [None]:
response = query_engine.query("What did Paul Graham do after RICS?")

HTTP Request: POST https://api.anthropic.com/v1/complete "HTTP/1.1 200 OK"
Selecting query engine 1: The question is asking what Paul Graham did after RICS, which requires retrieving specific context from his essay rather than just a broad summarization. Choice 2 mentions retrieving specific context from the essay, so it is the most relevant choice..
HTTP Request: POST https://api.anthropic.com/v1/complete "HTTP/1.1 200 OK"


In [None]:
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))