# Ensemble Query Engine

When building a retrieval-augmented generation (RAG) application, experimenting with various query pipelines—such as top-k retrieval, keyword search, and knowledge graphs—is often necessary.

**Concept**: What if we could simultaneously use multiple strategies and have a language model (LLM):

1. Rate the relevance of each query result.

2. Synthesize the results into a coherent answer.

The Ensemble Query Engine allows you to do just that. This guide explains how to experiment with different query pipelines and strategies, have the LLM evaluate the relevance of each result, and synthesize the final response.

## Key Purposes

1. **Multi-Strategy Retrieval**: Try multiple retrieval strategies at once, such as top-k retrieval, keyword search, and knowledge graphs. This helps compare the effectiveness of various approaches.

2. **Relevance Evaluation**: Have the LLM rate how pertinent each retrieved result is to the original query, ensuring only the most relevant information is considered.

3. **Result Synthesis**: Let the LLM combine the most relevant information from different retrieval methods, leveraging its language understanding to create a comprehensive final answer.
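
To make the relevance step concrete, here is a minimal sketch that assumes each strategy's answer ends with a self-reported trailer such as `Relevance score: 9/10`, which is the format the QA prompt in this notebook asks for. The helper names (`parse_relevance`, `rank_answers`), the 0.5 threshold, and the sample answers are illustrative assumptions, not part of LlamaIndex.

```python
import re
from typing import Optional

# Matches trailers such as "Relevance score: 9/10" (case-insensitive).
SCORE_PATTERN = re.compile(
    r"relevance\s+score:\s*(\d+(?:\.\d+)?)\s*/\s*10", re.IGNORECASE
)


def parse_relevance(answer: str) -> Optional[float]:
    """Return the self-reported score scaled to [0, 1], or None if absent."""
    match = SCORE_PATTERN.search(answer)
    if match is None:
        return None
    return float(match.group(1)) / 10.0


def rank_answers(answers: dict[str, str], min_score: float = 0.5) -> list[tuple[str, float]]:
    """Keep answers whose score meets the threshold, best first."""
    scored = []
    for name, text in answers.items():
        score = parse_relevance(text)
        if score is not None and score >= min_score:
            scored.append((name, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up answers from three hypothetical strategies.
answers = {
    "vector": "Choose projects carefully.\n\nRelevance score: 9/10",
    "keyword": "Not found in the context.\n\nRelevance Score: 3/10",
    "graph": "No score given.",
}
print(rank_answers(answers))
```

A self-reported score is only as reliable as the model producing it, so a threshold like this is best treated as a coarse filter rather than a calibrated measure.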

## How to Use the Ensemble Query Engine

1. **Set Up Retrieval Tools**: Configure different retrieval tools, such as a keyword search tool and a vector search tool.

2. **Configure the Router Query Engine**: Set up a router query engine with a selector to choose the relevant retrievals and a summarizer to synthesize the final answer.

3. **Run Queries**: Use the router query engine to process queries and return synthesized responses that leverage multiple retrieval strategies evaluated by the LLM.
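
The three steps above can be sketched in plain Python, with no LlamaIndex dependency, to show the shape of the flow: the query is sent to several strategies concurrently and the partial answers are then combined. The engine functions and `summarize` are hypothetical stand-ins; a real summarizer would call an LLM instead of joining strings.

```python
import asyncio


async def keyword_engine(query: str) -> str:
    # Hypothetical stand-in for a keyword-based query engine.
    await asyncio.sleep(0)
    return f"[keyword] results for: {query}"


async def vector_engine(query: str) -> str:
    # Hypothetical stand-in for an embedding-based query engine.
    await asyncio.sleep(0)
    return f"[vector] results for: {query}"


def summarize(query: str, partial_answers: list[str]) -> str:
    # Placeholder for the LLM summarizer: simply joins the partial answers.
    return " | ".join(partial_answers)


async def ensemble_query(query: str) -> str:
    # Fan the query out to every engine concurrently; gather preserves order.
    partial_answers = await asyncio.gather(
        keyword_engine(query), vector_engine(query)
    )
    return summarize(query, list(partial_answers))


print(asyncio.run(ensemble_query("specific knowledge")))
```

Inside a notebook, where an event loop is already running, `await ensemble_query(...)` is the usual form; `asyncio.run` is used here so the block also runs as a plain script.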

## Benefits

The Ensemble Query Engine enables you to harness the strengths of different retrieval methods and the reasoning capabilities of LLMs in a unified querying interface. It simplifies experimentation with various approaches, helping you find the optimal configuration for your RAG application.

In [None]:
%%capture
!pip install llama-index==0.10.37 llama-index-embeddings-openai==0.1.9 qdrant-client==1.9.1 llama-index-vector-stores-qdrant==0.2.8 llama-index-llms-openai==0.1.19

In [1]:
import os
import sys
from getpass import getpass
import nest_asyncio

from IPython.display import Markdown, display

from dotenv import load_dotenv

nest_asyncio.apply()

load_dotenv("")

sys.path.append('../helpers')

from utils import setup_llm, setup_embed_model, setup_vector_store

In [2]:
# Use .get() so a missing variable falls through to the prompt instead of raising KeyError.
OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY') or getpass("Enter your OpenAI API key: ")

In [3]:
# Use .get() so a missing variable falls through to the prompt instead of raising KeyError.
CO_API_KEY = os.environ.get('CO_API_KEY') or getpass("Enter your Cohere API key: ")

In [4]:
# QDRANT_URL = os.environ['QDRANT_URL'] or getpass("Enter your Qdrant URL:")

QDRANT_URL=":memory:"

In [5]:
# Use .get() so a missing variable falls through to the prompt instead of raising KeyError.
QDRANT_API_KEY = os.environ.get('QDRANT_API_KEY') or getpass("Enter your Qdrant API Key: ")

In [6]:
from llama_index.core.settings import Settings
from utils import setup_llm, setup_embed_model

setup_llm(
    provider="openai",
    api_key=OPENAI_API_KEY, 
    model="gpt-4o", 
    temperature=0.75, 
    system_prompt="""Use ONLY the provided context and generate a complete, coherent answer to the user's query. 
    Your response must be grounded in the provided context and relevant to the essence of the user's query.
    """
    )

setup_embed_model(
    provider="openai", 
    model="text-embedding-3-small",
    api_key=OPENAI_API_KEY
    )

In [7]:
import random
from llama_index.core.storage.docstore import SimpleDocumentStore
from utils import get_documents_from_docstore, group_documents_by_author, sample_documents

documents = get_documents_from_docstore("../data/words-of-the-senpais")

random.seed(42)

documents_by_author = group_documents_by_author(documents)

senpai_documents = sample_documents(documents_by_author, num_samples=10)

In [8]:
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=128, chunk_overlap=8)

nodes = splitter.get_nodes_from_documents(senpai_documents)

In [9]:
from llama_index.core import StorageContext

storage_context = StorageContext.from_defaults()

storage_context.docstore.add_documents(nodes)

### [`SimpleKeywordTableIndex`](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/indices/keyword_table/simple_base.py) Class Overview

`SimpleKeywordTableIndex` is a simplified variant of the keyword-based `KeywordTableIndex`.

During index construction, `KeywordTableIndex` takes a set of text documents, chunks them, and uses an LLM to extract relevant keywords from each chunk. These keywords are stored in a table that maps each keyword to the chunks containing it.

During a query, `KeywordTableIndex` extracts keywords from the query and uses them to retrieve a set of candidate text chunk IDs. The initial answer is constructed from the first text chunk and then refined with subsequent chunks.

- **Index Construction**: 
    - Splits text documents into chunks.
    - Extracts keywords for each chunk.
    - Stores keywords in a table referencing the text chunks.

- **Query Modes**:
    - **Default**: Uses an LLM for keyword extraction and constructs answers by refining text chunks.
    - **Simple**: Uses regex for keyword extraction (implemented by `SimpleKeywordTableIndex`).
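
The construction and lookup steps described above can be illustrated with a toy keyword table. This is a simplified sketch, not the LlamaIndex implementation: the stop-word list, the ranking by number of shared keywords, and the sample chunks are all assumptions made for the example.

```python
import re
from collections import defaultdict

# A tiny stop-word list for illustration only; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "on", "what", "how"}


def extract_keywords(text: str) -> set[str]:
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return {tok for tok in re.findall(r"[a-z]+", text.lower()) if tok not in STOP_WORDS}


def build_keyword_table(chunks: dict[str, str]) -> dict[str, set[str]]:
    """Map each keyword to the IDs of the chunks that contain it."""
    table: dict[str, set[str]] = defaultdict(set)
    for chunk_id, text in chunks.items():
        for keyword in extract_keywords(text):
            table[keyword].add(chunk_id)
    return table


def retrieve(table: dict[str, set[str]], query: str, top_k: int = 2) -> list[str]:
    """Rank chunk IDs by how many query keywords they share."""
    hits: dict[str, int] = defaultdict(int)
    for keyword in extract_keywords(query):
        for chunk_id in table.get(keyword, ()):
            hits[chunk_id] += 1
    # Sort by descending hit count, breaking ties by chunk ID for determinism.
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return [chunk_id for chunk_id, _ in ranked[:top_k]]


# Made-up chunks used only to exercise the functions.
chunks = {
    "c1": "Specific knowledge is found by pursuing your curiosity.",
    "c2": "Wealth is built with leverage and specific knowledge.",
    "c3": "A calm mind is the foundation of happiness.",
}
table = build_keyword_table(chunks)
print(retrieve(table, "How to build wealth with specific knowledge?"))
```

Because matching is purely lexical, "build" in the query does not match "built" in the chunk; this is one reason keyword retrieval pairs well with a vector-based strategy.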


In [None]:
from llama_index.core import SimpleKeywordTableIndex, VectorStoreIndex

keyword_index = SimpleKeywordTableIndex(
    nodes,
    storage_context=storage_context,
    show_progress=True,
)
vector_index = VectorStoreIndex(
    nodes,
    storage_context=storage_context,
    show_progress=True,
)

In [11]:
from llama_index.core import PromptTemplate

QA_PROMPT_TMPL = """
Context:
---------------------
{context_str}
---------------------
Based on the context above, answer the question below. If the answer is not in the context, 
inform the user without making up an answer. Additionally, provide a relevance score for the answer.

Question: {query_str}
Answer (with relevance score):
"""


QA_PROMPT = PromptTemplate(QA_PROMPT_TMPL)

keyword_query_engine = keyword_index.as_query_engine(
    text_qa_template=QA_PROMPT
)

vector_query_engine = vector_index.as_query_engine(text_qa_template=QA_PROMPT)

In [12]:
response = vector_query_engine.query(
    "What is the importance of focusing on who you work with and what you work on, rather than just how hard you work?"
)

In [13]:
print(response)

Focusing on who you work with and what you work on is important because these factors have a greater impact on your success than merely how hard you work. By choosing the right people and projects, you position yourself to utilize your unique skills and knowledge most effectively, potentially becoming the best in your field. This approach ensures that your efforts are not just about the quantity of work but about the quality and strategic value of the work you do.

Relevance score: 9/10


In [14]:
response = keyword_query_engine.query(
    "What is the importance of focusing on who you work with and what you work on, rather than just how hard you work?"
)

In [15]:
print(response)

The importance of focusing on who you work with and what you work on, rather than just how hard you work, is highlighted in "The Almanack of Naval Ravikant." Naval emphasizes that while working hard is important, the people you collaborate with and the projects you undertake have a greater impact on your success. This perspective suggests that the quality and alignment of your work and relationships can lead to more meaningful and sustainable outcomes than merely the effort you put in.

Relevance score: 10/10


### [QueryEngineTool](https://github.com/run-llama/llama_index/blob/7849b1a851d88ee28e1bfd05d19f18e40d5b8e10/llama-index-core/llama_index/core/tools/query_engine.py#L17)

Tools are abstractions designed to be used by data agents or LLMs, and they provide a structured way for them to perform tasks.

A `QueryEngineTool` is a specific type of tool designed to interface with and wrap existing query engines. It enables agents to perform complex queries by leveraging the capabilities of the underlying query engine.

#### Use Cases

- **Integrating Query Engines**: Allows agents to interact with query engines and other agents.

- **Complex Query Handling**: Helps execute sophisticated queries and data retrieval operations.
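
The wrapping idea can be shown with a small, library-free sketch: an object that pairs any engine exposing a `query` method with a name and a description that a router could read. `SimpleTool` and `EchoEngine` are hypothetical names invented for this example and do not mirror the `QueryEngineTool` internals.

```python
from dataclasses import dataclass
from typing import Protocol


class QueryEngine(Protocol):
    """Anything with a query(str) -> str method counts as an engine here."""

    def query(self, query_str: str) -> str: ...


@dataclass(frozen=True)
class SimpleTool:
    """Hypothetical wrapper pairing an engine with a routing description."""

    name: str
    description: str
    engine: QueryEngine

    def __call__(self, query_str: str) -> str:
        # Delegate to the wrapped engine; the description is only metadata.
        return self.engine.query(query_str)


class EchoEngine:
    """Toy engine used only to make the example runnable."""

    def query(self, query_str: str) -> str:
        return f"echo: {query_str}"


tool = SimpleTool(
    name="echo_tool",
    description="Repeats the query back.",
    engine=EchoEngine(),
)
print(tool.description, "->", tool("hello"))
```

The description matters more than it looks: it is the only information a selector has when deciding which tool fits a query.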

In [23]:
from llama_index.core.tools import QueryEngineTool

keyword_tool = QueryEngineTool.from_defaults(
    query_engine=keyword_query_engine,
    description="Useful for finding documents based on keywords and incomplete thoughts from a user.",
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Useful for answering fully-formed questions from a user.",
)

## Define a Router Query Engine

The [`LLMMultiSelector`](https://github.com/run-llama/llama_index/blob/7849b1a851d88ee28e1bfd05d19f18e40d5b8e10/llama-index-core/llama_index/core/selectors/llm_selectors.py#L141) uses an LLM to make routing decisions. It presents the available choices to the LLM in a prompt, and the LLM selects the most relevant options for the query. It can be used on its own or integrated into query engines and retrievers.

#### Key Functions:

1. **Data Source Selection**: Chooses the best data source from multiple options.

2. **Operational Decisions**: Decides whether to perform summarization or semantic search.

3. **Multi-Routing**: Evaluates multiple choices simultaneously and combines the results.


#### Use Cases

- Selecting the right data source.
- Choosing between summarization and semantic search.
- Combining results from multiple choices.
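
As a rough illustration of multi-selection, the sketch below numbers the tool descriptions in a prompt and parses a JSON reply that may name several choices. The prompt wording, the JSON shape, and the canned reply are assumptions for the example; the actual prompt and parser used by `LLMMultiSelector` may differ.

```python
import json
from dataclasses import dataclass


@dataclass
class Selection:
    index: int   # zero-based position of the chosen tool
    reason: str  # the model's stated justification


def build_selection_prompt(descriptions: list[str], query: str) -> str:
    """Number the choices and ask for a JSON list of picks (illustrative wording)."""
    choices = "\n".join(f"({i + 1}) {desc}" for i, desc in enumerate(descriptions))
    return (
        "Choices:\n"
        f"{choices}\n\n"
        'Reply with JSON like [{"choice": 1, "reason": "..."}] listing every '
        f"choice relevant to the question: {query}"
    )


def parse_selections(reply: str, num_choices: int) -> list[Selection]:
    """Parse the JSON reply, dropping picks that point outside the choice list."""
    selections = []
    for item in json.loads(reply):
        index = int(item["choice"]) - 1  # prompt numbering is one-based
        if 0 <= index < num_choices:
            selections.append(Selection(index=index, reason=item.get("reason", "")))
    return selections


descriptions = [
    "Useful for finding documents based on keywords.",
    "Useful for answering fully-formed questions.",
]
prompt = build_selection_prompt(descriptions, "How do I build wealth?")

# A canned reply standing in for a real LLM call; choice 5 is deliberately invalid.
fake_reply = (
    '[{"choice": 2, "reason": "The query is a full question."}, '
    '{"choice": 5, "reason": "bogus"}]'
)
for sel in parse_selections(fake_reply, len(descriptions)):
    print(f"Selecting query engine {sel.index}: {sel.reason}")
```

Validating the returned indices before dispatching guards against a model that names a choice that does not exist.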


In [24]:
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMMultiSelector

from llama_index.core.response_synthesizers import TreeSummarize

TREE_SUMMARIZE_PROMPT_TMPL = """
Context from multiple sources is below. Each source may have a relevance score.

---------------------
{context_str}
---------------------

Based on the information from the sources above, answer the question below. 

If the answer is not in the context, inform the user without making up an answer.

Question: {query_str}
Answer:
"""

tree_summarize = TreeSummarize(
    summary_template=PromptTemplate(TREE_SUMMARIZE_PROMPT_TMPL)
)

query_engine = RouterQueryEngine(
    selector=LLMMultiSelector.from_defaults(),
    query_engine_tools=[
        keyword_tool,
        vector_tool,
    ],
    summarizer=tree_summarize,
    verbose=True,
)

In [25]:
response = await query_engine.aquery(
    "How can I develop specific knowledge that will help me build wealth and achieve happiness?"
)
print(response)

Selecting query engine 1: The question is a fully-formed query asking for specific knowledge on building wealth and achieving happiness..
Based on the provided context, the answer to your question is not explicitly stated. However, the context does suggest that making better decisions through accurate knowledge is crucial for achieving wealth and happiness. 

Relevance Score: 7/10


In [None]:
from llama_index.core.response.notebook_utils import display_response

display_response(
    response, show_source=True, source_length=500, show_source_metadata=True
)

In [26]:
response.source_nodes

[NodeWithScore(node=TextNode(id_='07a70a38-393f-486d-a167-f2f0445bbb6c', embedding=None, metadata={'page_number': 76, 'file_name': '../data/almanack_of_naval_ravikant.pdf', 'title': 'The Almanack of Naval Ravikant', 'author': 'Naval Ravikant'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='2b988748-3b49-47e9-bdc9-a991a881b36b', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_number': 76, 'file_name': '../data/almanack_of_naval_ravikant.pdf', 'title': 'The Almanack of Naval Ravikant', 'author': 'Naval Ravikant'}, hash='3a49e6f7304356b26fde4a4bdd5ca2747beef954f087319476965d27832e485c'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='5251b7d7-2c3f-4f9d-8613-e90d3c608936', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='2c7fb28a5b690ce69c33842bf7c7f83eb7164010e9a14f92d362072b45aff070')}, text='technology and large workforces and capital, our decisions are leveraged more and more. If

In [27]:
response = await query_engine.aquery(
    "specific knowledge, build wealth, achieve happiness?"
)
print(response)

Selecting query engine 0: The query 'specific knowledge, build wealth, achieve happiness?' consists of keywords and incomplete thoughts rather than a fully-formed question..
To build wealth and achieve happiness, you should apply specific knowledge with leverage and accountability. Naval Ravikant emphasizes that you must put in the time and effort, and position yourself to be the best at what you do with your unique skill set. This involves enjoying the process and continuously working at it. Success takes time, but with persistence and the right approach, you will eventually get what you deserve.

Relevance score: 9/10


In [28]:
response.source_nodes

[NodeWithScore(node=TextNode(id_='962509bf-f9ec-46bb-838b-47f2a92bd1f3', embedding=None, metadata={'page_number': 58, 'file_name': '../data/almanack_of_naval_ravikant.pdf', 'title': 'The Almanack of Naval Ravikant', 'author': 'Naval Ravikant'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='c39bf708-c7c5-4dfa-bb82-7b0dd8200809', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_number': 58, 'file_name': '../data/almanack_of_naval_ravikant.pdf', 'title': 'The Almanack of Naval Ravikant', 'author': 'Naval Ravikant'}, hash='5c1815ed6e9e704ff80faac48a69863e93bd78cd88f248176a4a28ac5db2b993'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='ed7d4818-0d2f-439a-8313-58c56676832f', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='879f09e151844c814f1b2e9ea14684ac92e91938a15afadf984cf6c51ba8b627')}, text='became extremely successful. You just had to give them a long enough timescale. It never h

In [29]:
response = await query_engine.aquery(
    "Calm mind, clear schedule, clear mind. What do I need to do to achieve these?"
)
print(response)

Selecting query engine 1: The user's query is a fully-formed question seeking advice on achieving a calm mind and clear schedule..
To achieve a calm mind, clear schedule, and clear mind, you need to focus on emancipating your mind from old habits, prejudices, and restrictive thought processes. Cultivate an alert mind by being sincere to yourself, which will lead you to a deeper truth and clarity. Additionally, be aware of regurgitated emotional responses and preconceived notions that might cloud your reality, especially in contexts like politics and business.

Relevance score: 9/10


In [30]:
response.source_nodes

[NodeWithScore(node=TextNode(id_='1acaf638-e9ce-4a34-9a35-c3a9bc9d928e', embedding=None, metadata={'page_number': 43, 'file_name': '../data/striking-thoughts.pdf', 'title': 'Striking Thoughts', 'author': 'Bruce Lee'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='67165402-0cd1-4609-9caa-71c256e63422', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_number': 43, 'file_name': '../data/striking-thoughts.pdf', 'title': 'Striking Thoughts', 'author': 'Bruce Lee'}, hash='c12ac78f3cfe07a229d059a34e23da85f9be6ff77cd680790029cfb171927a0d'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='9c7c99ed-d767-4640-885e-6619146e4a4c', node_type=<ObjectType.TEXT: '1'>, metadata={'page_number': 43, 'file_name': '../data/striking-thoughts.pdf', 'title': 'Striking Thoughts', 'author': 'Bruce Lee'}, hash='6139ebbf8cbdaceb2662105da3e0ba24a01d2d6822277fdeb9f45bc62e7a09a5'), <NodeRelationship.NEXT: '3'>: R