<a href="https://colab.research.google.com/github/SamurAIGPT/LlamaIndex-course/blob/main/query_engines/Query_Engines.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Query Engines

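A query engine is built on top of a retriever and a response synthesizer, both of which we discussed in previous lessons: the retriever fetches the relevant nodes, and the response synthesizer turns them into an answer.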
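To make that composition concrete, here is a toy sketch in plain Python. It is not LlamaIndex's API: a word-overlap retriever stands in for a real retriever, and a string-joining function stands in for the LLM-backed response synthesizer. All names (ToyQueryEngine, make_keyword_retriever, concat_synthesizer) are hypothetical.

```python
import re
from typing import Callable, List

# Hypothetical type aliases for this sketch (not LlamaIndex types).
Retriever = Callable[[str], List[str]]
Synthesizer = Callable[[str, List[str]], str]


def _words(text: str) -> set:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def make_keyword_retriever(corpus: List[str], top_k: int = 2) -> Retriever:
    """Return a retriever that ranks passages by words shared with the query."""

    def retrieve(query: str) -> List[str]:
        q = _words(query)
        scored = [(len(q & _words(p)), p) for p in corpus]
        # Highest overlap first; Python's sort is stable, so ties keep corpus order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [p for score, p in scored[:top_k] if score > 0]

    return retrieve


def concat_synthesizer(query: str, passages: List[str]) -> str:
    """Stand-in for the LLM call: join the retrieved passages into one answer."""
    if not passages:
        return "No relevant context found."
    return " ".join(passages)


class ToyQueryEngine:
    """A query engine reduced to its two steps: retrieve, then synthesize."""

    def __init__(self, retriever: Retriever, synthesizer: Synthesizer) -> None:
        self._retriever = retriever
        self._synthesizer = synthesizer

    def query(self, query: str) -> str:
        nodes = self._retriever(query)          # step 1: fetch relevant context
        return self._synthesizer(query, nodes)  # step 2: turn it into an answer


corpus = [
    "Paul Graham wrote essays.",
    "He co-founded Viaweb.",
    "Lisp is a programming language.",
]
engine = ToyQueryEngine(make_keyword_retriever(corpus), concat_synthesizer)
print(engine.query("What essays did Paul Graham write?"))
```

Swapping either callable changes the engine's behaviour without touching the other, which is the same separation of concerns the LlamaIndex engines below rely on.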

A query engine lets you ask questions over your data. LlamaIndex supports many query engines; in this lesson we cover the following:

1. Router Query Engine
2. Retriever Router Query Engine
3. Sub Question Query Engine
4. Joint QA Summary Query Engine
5. Custom Retriever with Hybrid Search

### Router Query Engine

Router Query Engine lets you build a single query engine over multiple indices. For each question, a selector (an LLM call) picks the index whose description best matches the question and routes the query to it.

For example, if you have one index built over maths material and another over physics material, you can combine both into a Router Query Engine that sends each question to the more relevant index.
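The routing idea can be sketched without an LLM. In this toy version, which assumes that word overlap between the question and each description is a reasonable stand-in for the selector's LLM call, the router sends the query to the best-matching engine. ToolChoice and route are hypothetical names, and the engines are dummy lambdas.

```python
import re
from typing import Callable, List, NamedTuple


class ToolChoice(NamedTuple):
    """Hypothetical stand-in for a QueryEngineTool: a description plus an engine."""
    description: str
    engine: Callable[[str], str]


def _words(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


def route(query: str, choices: List[ToolChoice]) -> str:
    """Send the query to the choice whose description best matches it.

    A real selector asks an LLM to choose; word overlap is only a stand-in.
    On a tie, max() keeps the first choice in the list.
    """
    best = max(choices, key=lambda c: len(_words(query) & _words(c.description)))
    return best.engine(query)


choices = [
    ToolChoice("Useful for maths questions: algebra, calculus, geometry.",
               lambda q: "answered by the maths index"),
    ToolChoice("Useful for physics questions: forces, energy, motion.",
               lambda q: "answered by the physics index"),
]
print(route("What is a derivative in calculus?", choices))
```

The quality of routing depends entirely on how well the descriptions separate the indices, which is why the tool descriptions in the example below are written carefully.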

Let's try to understand with the help of an example

In [None]:
!pip install llama-index

In [2]:
import nest_asyncio

nest_asyncio.apply()

In [None]:
import openai
openai.api_key = "your-openai-key"

In [3]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().handlers = []
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
from llama_index import (
    VectorStoreIndex,
    ListIndex,
    SimpleDirectoryReader,
    ServiceContext,
    StorageContext,
)

### Download relevant data

In [4]:
!mkdir data
!wget https://github.com/jerryjliu/llama_index/blob/main/examples/paul_graham_essay/data/paul_graham_essay.txt
!mv paul_graham_essay.txt data/

--2023-07-22 16:56:24--  https://github.com/jerryjliu/llama_index/blob/main/examples/paul_graham_essay/data/paul_graham_essay.txt
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 84944 (83K) [text/plain]
Saving to: ‘paul_graham_essay.txt’


2023-07-22 16:56:25 (5.35 MB/s) - ‘paul_graham_essay.txt’ saved [84944/84944]



In [4]:
documents = SimpleDirectoryReader("./data/").load_data()

In [5]:
service_context = ServiceContext.from_defaults(chunk_size=1024)
nodes = service_context.node_parser.get_nodes_from_documents(documents)
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)

In [7]:
list_index = ListIndex(nodes, storage_context=storage_context)
vector_index = VectorStoreIndex(nodes, storage_context=storage_context)

We create two query engines: one for summarization and one for question answering (QA).

For summarization we use a list index with the tree_summarize response mode. It summarizes groups of chunks, then summarizes those summaries, building a tree bottom-up until a single summary of the data remains.
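The bottom-up process can be sketched as follows. This is a simplified illustration, not LlamaIndex's implementation: it assumes a fixed fan_in (how many texts are summarized together), whereas the real response mode groups chunks by what fits in the prompt, and fake_summarize is a hypothetical stand-in for the LLM that only records the tree shape.

```python
from typing import Callable, List


def tree_summarize(
    chunks: List[str],
    summarize: Callable[[List[str]], str],
    fan_in: int = 2,
) -> str:
    """Bottom-up summarization sketch.

    Summarize each group of `fan_in` items, then summarize those summaries,
    level by level, until a single summary remains.
    """
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be >= 2 so each level shrinks")
    level = list(chunks)
    while True:
        level = [summarize(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
        if len(level) == 1:
            return level[0]


# Stand-in "LLM": records each call and shows the tree shape with parentheses.
calls: List[List[str]] = []


def fake_summarize(parts: List[str]) -> str:
    calls.append(parts)
    return "(" + "+".join(parts) + ")"


print(tree_summarize(["a", "b", "c", "d", "e"], fake_summarize))
# (((a+b)+(c+d))+((e)))
```

Five chunks take three levels (5 to 3 to 2 to 1 texts) and six summarize calls, which is why use_async=True is passed below: the calls within a level are independent.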

For QA we use a VectorStoreIndex, which retrieves the nodes most similar to the query (the top-k by embedding similarity) and uses them to create the response.
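As a rough illustration of top-k retrieval, the sketch below ranks nodes by cosine similarity between a query vector and node vectors. The two-dimensional vectors are made up for the example (real embeddings come from an embedding model and have far more dimensions), and k=2 mirrors the similarity_top_k=2 used later in this notebook.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    node_vecs: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k (node_id, score) pairs most similar to the query."""
    scored = [(node_id, cosine(query_vec, vec)) for node_id, vec in node_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


node_vecs = {"n1": [1.0, 0.0], "n2": [0.0, 1.0], "n3": [0.7, 0.7]}
print([node_id for node_id, _ in top_k([1.0, 0.1], node_vecs)])
# ['n1', 'n3']
```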

In [8]:
from llama_index.tools.query_engine import QueryEngineTool
list_query_engine = list_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
vector_query_engine = vector_index.as_query_engine()
list_tool = QueryEngineTool.from_defaults(
    query_engine=list_query_engine,
    description="Useful for summarization questions related to Paul Graham essay on What I Worked On.",
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Useful for retrieving specific context from Paul Graham essay on What I Worked On.",
)

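Now we build a RouterQueryEngine on top of these two tools. It uses a selector to pick one of the engines based on the input query; here PydanticSingleSelector asks the LLM, via OpenAI function calling, to choose a single tool from the tool descriptions.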

In [9]:
from llama_index.query_engine.router_query_engine import RouterQueryEngine
from llama_index.selectors.llm_selectors import LLMSingleSelector, LLMMultiSelector
from llama_index.selectors.pydantic_selectors import (
    PydanticMultiSelector,
    PydanticSingleSelector,
)

query_engine = RouterQueryEngine(
    selector=PydanticSingleSelector.from_defaults(),
    query_engine_tools=[
        list_tool,
        vector_tool,
    ],
)

Because this is a summarization question, the router picks the list index tool and answers with a summary of the document.

In [12]:
print(query_engine.query("What is the summary of the document?").response)


The document is a collection of essays written by Paul Graham, reflecting on his journey from working on Viaweb to starting Y Combinator, painting, writing essays, working on Lisp, and writing Bel. He discusses topics such as the difficulty of carrying heavy items, the problems with running a forum and writing essays, leaving Y Combinator, and the concept of invented versus discovered. He also thanks several people for reading drafts of the essays.


Here we have a specific QA question, so the vector index tool is chosen instead. In this way, the router selects the appropriate query engine for each input query.

In [13]:
print(query_engine.query("What did Paul Graham do after RICS?").response)


Paul Graham decided to paint. He wanted to see how good he could get if he dedicated himself to painting and left his job at Y Combinator. He recruited Dan Giffin and two undergrads to help him build a web app for making web apps, and he moved to Cambridge to start the company.


### Retriever Router Query Engine

Retriever Router Query Engine works like the Router Query Engine above. The only difference is that the router is powered by a retriever: the tool descriptions are stored in an index, and the tools most similar to the query are retrieved instead of all being listed in a selector prompt.

The advantage of a vector index powered retriever is that the number of query engines the router can hold is no longer limited by the model's context length, since only the retrieved tools are used for each query. In principle, any number of query engines can sit behind the retriever.
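The sketch below illustrates why retrieval removes the context-length limit: however many tools are registered, only the top-k matching descriptions are passed on. It is a toy that assumes word overlap as a stand-in for embedding similarity; retrieve_tools and the filler tools are hypothetical, while the two real descriptions are the ones used in this section.

```python
import re
from typing import Dict, List


def _words(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


def retrieve_tools(query: str, registry: Dict[str, str], k: int = 1) -> List[str]:
    """Return the names of the k tools whose descriptions best match the query.

    Only these k descriptions would be handed to the next step, so the prompt
    size stays fixed no matter how many tools the registry holds.
    """
    q = _words(query)
    ranked = sorted(registry, key=lambda name: len(q & _words(registry[name])), reverse=True)
    return ranked[:k]


# 100 hypothetical filler tools plus the two descriptions used in this section.
registry = {f"tool_{i}": f"Filler tool number {i}" for i in range(100)}
registry["list_tool"] = "Useful for questions asking for a biography of the author."
registry["vector_tool"] = (
    "Useful for retrieving specific snippets from the author's life, "
    "like his time in college, his time in YC, or more."
)
print(retrieve_tools("What is a biography of the author's life?", registry))
print(retrieve_tools("What did Paul Graham do during his time in college?", registry))
```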

In [14]:
import nest_asyncio

nest_asyncio.apply()

In [15]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index import (
    VectorStoreIndex,
    ListIndex,
    SimpleDirectoryReader,
    ServiceContext,
    StorageContext,
)

In [16]:
documents = SimpleDirectoryReader("./data/").load_data()

In [17]:
service_context = ServiceContext.from_defaults(chunk_size=1024)
nodes = service_context.node_parser.get_nodes_from_documents(documents)
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)

In [18]:
list_index = ListIndex(nodes, storage_context=storage_context)
vector_index = VectorStoreIndex(nodes, storage_context=storage_context)

In [19]:
from llama_index.tools.query_engine import QueryEngineTool

list_query_engine = list_index.as_query_engine(
    response_mode="tree_summarize", use_async=True
)
vector_query_engine = vector_index.as_query_engine(
    response_mode="tree_summarize", use_async=True
)

list_tool = QueryEngineTool.from_defaults(
    query_engine=list_query_engine,
    description="Useful for questions asking for a biography of the author.",
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Useful for retrieving specific snippets from the author's life, like his time in college, his time in YC, or more.",
)

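Up to this point the procedure is the same as for the Router Query Engine: we create two query engines wrapped as tools. The difference is in routing. Instead of an LLM selector, we index the tools themselves in an ObjectIndex backed by a VectorStoreIndex, and use its retriever as the router.

ObjectIndex wraps an underlying index data structure. With the help of a node mapping (here SimpleToolNodeMapping), it converts QueryEngineTool objects into nodes that the index can store and retrieve, and converts retrieved nodes back into tools.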
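A minimal sketch of what an object-to-node mapping does, loosely modeled on the description above and not on the library's source: each tool becomes a text node keyed by its name, and a retrieved node is mapped back to the original tool object. Tool, TextNode and ToolNodeMapping are hypothetical stand-ins for QueryEngineTool, LlamaIndex's nodes and SimpleToolNodeMapping.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    """Hypothetical stand-in for QueryEngineTool."""
    name: str
    description: str
    engine: Callable[[str], str]


@dataclass
class TextNode:
    """Hypothetical stand-in for an indexable text node."""
    node_id: str
    text: str


class ToolNodeMapping:
    """Sketch of an object <-> node mapping.

    to_node() turns a tool into text that an index can embed and store;
    from_node() turns a retrieved node back into the original tool object.
    """

    def __init__(self, tools: List[Tool]) -> None:
        self._tools: Dict[str, Tool] = {t.name: t for t in tools}

    def to_node(self, tool: Tool) -> TextNode:
        text = f"Tool name: {tool.name}\nTool description: {tool.description}"
        return TextNode(node_id=tool.name, text=text)

    def from_node(self, node: TextNode) -> Tool:
        return self._tools[node.node_id]


bio = Tool(
    "list_tool",
    "Useful for questions asking for a biography of the author.",
    lambda q: "a biography",  # dummy engine
)
mapping = ToolNodeMapping([bio])
node = mapping.to_node(bio)
print(node.text)
```

Only the text of each node needs to be embedded; the tool object itself never enters the index, which is what lets arbitrary Python objects be retrieved by similarity.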

In [20]:
from llama_index import VectorStoreIndex
from llama_index.objects import ObjectIndex, SimpleToolNodeMapping

tool_mapping = SimpleToolNodeMapping.from_objects([list_tool, vector_tool])
obj_index = ObjectIndex.from_objects(
    [list_tool, vector_tool],
    tool_mapping,
    VectorStoreIndex,
)

In [21]:
from llama_index.query_engine import ToolRetrieverRouterQueryEngine

query_engine = ToolRetrieverRouterQueryEngine(obj_index.as_retriever())

In [22]:
response = query_engine.query("What is a biography of the author's life?")

In [23]:
print(response)


Paul Graham is a computer scientist, programmer, and entrepreneur who was born in England. He moved to the United States in the 1980s to pursue a PhD in computer science at Harvard University, where he wrote the book On Lisp and worked on a project called Bel. He then moved to Italy to study art at the Accademia di Belli Arti in Florence. After returning to the United States, he wrote essays and eventually moved back to England with his family. In 2019, he finished Bel and wrote a series of essays. In 2020, he wrote an essay about how he chooses what to work on.


In [24]:
response = query_engine.query("What did Paul Graham do during his time in college?")

In [25]:
print(str(response))


Paul Graham studied computer science and took art classes at Harvard University. He applied to two art schools, RISD and the Accademia di Belli Arti in Florence, and was accepted to RISD. He wrote a dissertation on applications of continuations in order to graduate from Harvard. He then attended RISD, where he took the foundation classes in drawing, color, and design. He also took the entrance exam for the Accademia di Belli Arti in Florence and passed.


### Sub Question Query Engine

A sub question query engine tackles the problem of answering a complex query. It breaks the complex query down into sub questions, each targeted at a relevant data source, then gathers all the intermediate responses and synthesizes a final response.
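The decompose, answer, synthesize loop can be sketched in plain Python. The three callables are hypothetical stand-ins for the LLM-driven steps; the sub questions are copied from the trace later in this notebook, while the tool answers and the synthesizer are dummies.

```python
from typing import Callable, Dict, List, Tuple

SubQuestion = Tuple[str, str]  # (tool name, sub question)


def sub_question_query(
    query: str,
    decompose: Callable[[str], List[SubQuestion]],
    tools: Dict[str, Callable[[str], str]],
    synthesize: Callable[[str, List[Tuple[str, str]]], str],
) -> str:
    """Decompose the query, answer each sub question with its tool, then synthesize."""
    qa_pairs = []
    for tool_name, sub_q in decompose(query):
        qa_pairs.append((sub_q, tools[tool_name](sub_q)))
    return synthesize(query, qa_pairs)


# Dummy stand-ins for the LLM steps.
def fake_decompose(query: str) -> List[SubQuestion]:
    return [
        ("pg_essay", "What did Paul Graham work on before YC?"),
        ("pg_essay", "What did Paul Graham work on after YC?"),
    ]


def fake_pg_essay(question: str) -> str:
    return "essays and Arc" if "before" in question else "Hacker News"


def fake_synthesize(query: str, qa_pairs: List[Tuple[str, str]]) -> str:
    return "; ".join(f"{q} -> {a}" for q, a in qa_pairs)


print(sub_question_query(
    "How was Paul Graham's life different before and after YC?",
    fake_decompose,
    {"pg_essay": fake_pg_essay},
    fake_synthesize,
))
```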

In [26]:
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.callbacks import CallbackManager, LlamaDebugHandler
from llama_index import ServiceContext

In [27]:
# load data
pg_essay = SimpleDirectoryReader(input_dir="./data/").load_data()

# build index and query engine
query_engine = VectorStoreIndex.from_documents(pg_essay).as_query_engine()

In [28]:
query_engine_tools = [
    QueryEngineTool(
        query_engine=query_engine,
        metadata=ToolMetadata(
            name="pg_essay", description="Paul Graham essay on What I Worked On"
        ),
    )
]

# Using the LlamaDebugHandler to print the trace of the sub questions
# captured by the SUB_QUESTION callback event type
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])
service_context = ServiceContext.from_defaults(callback_manager=callback_manager)
query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    service_context=service_context,
    use_async=False,
)

In [29]:
response = await query_engine.aquery(
    "How was Paul Grahams life different before and after YC?"
)

Generated 2 sub questions.
[pg_essay] Q: What did Paul Graham work on before YC?
[pg_essay] Q: What did Paul Graham work on after YC?
[pg_essay] A: 
Paul Graham continued to work on writing essays and working on YC. He also worked on Hacker News, which was originally meant to be a news aggregator for startup founders and was called Startup News. He wrote all of YC's internal software in Arc, but gradually stopped working on Arc due to lack of time and the infrastructure depending on it.
[pg_essay] A: 
Before YC, Paul Graham worked on hacking, writing essays, and Arc, a programming language. He also ran a weekly dinner at his building in Cambridge and created the Summer Founders Program, which invited undergrads to apply for startup funding.
**********
Trace: query
    |_llm ->  3.388304 seconds
    |_sub_questions ->  3.653011 seconds
    |_synthesize ->  3.346864 seconds
      |_llm ->  3.342244 seconds
**********


In [30]:
print(response)


Paul Graham's life changed significantly after YC. Before YC, he was mainly focused on hacking, writing essays, and developing Arc, a programming language. He also ran a weekly dinner at his building in Cambridge and created the Summer Founders Program. After YC, he continued to write essays and work on YC, but he also worked on Hacker News and wrote all of YC's internal software in Arc. He gradually stopped working on Arc due to lack of time and the infrastructure depending on it.


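### Joint QA Summary Query Engine

QASummaryQueryEngineBuilder builds both a vector index (for QA) and a list index (for summarization) from the same documents and wraps them in a router query engine, so a single query engine can answer both specific questions and summary questions.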

In [31]:
import nest_asyncio

nest_asyncio.apply()

In [32]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

In [33]:
from llama_index.composability.joint_qa_summary import QASummaryQueryEngineBuilder
from llama_index import SimpleDirectoryReader, ServiceContext, LLMPredictor
from llama_index.response.notebook_utils import display_response
from llama_index.llms import OpenAI

In [34]:
reader = SimpleDirectoryReader("./data/")
documents = reader.load_data()

In [35]:
service_context = ServiceContext.from_defaults(chunk_size=1024)

In [36]:
query_engine_builder = QASummaryQueryEngineBuilder(service_context=service_context)
query_engine = query_engine_builder.build_from_documents(documents)

In [37]:
response = query_engine.query(
    "Can you give me a summary of the author's life?",
)

In [38]:
print(response)


The author, Paul Graham, is a computer scientist and artist. He was born in England and moved to the US to pursue a PhD in computer science. While in grad school, he worked on On Lisp and wrote a dissertation on applications of continuations. He then applied to art schools and was accepted to RISD and the Accademia di Belli Arti in Florence. He moved to Florence and passed the entrance exam, and then spent 3 months writing essays. He then worked on Bel, an interpreter written in itself, for years, while living in England. In 2019, Bel was finished and he wrote essays about topics he had stacked up. He now lives in England and is thinking about what to work on next.


In [39]:
response = query_engine.query(
    "What did the author do during his time in art school?",
)

In [40]:
print(response)


The author took art classes at Harvard, applied to two art schools (RISD and the Accademia di Belli Arti in Florence), took the entrance exam for the Accademia di Belli Arti in Florence, attended the RISD foundation program, and painted still lives in his bedroom at night while attending the Accademia di Belli Arti in Florence. He also learned Italian and studied under professor Ulivi.


### Custom Retriever with Hybrid Search

Keyword-based search was the earliest form of search used in information retrieval systems. More recently, vector database search, which works on semantic similarity, has become popular.

A vector-backed search does not always perform better than a keyword-based search on a particular query; sometimes the reverse is true.

To get the best of both worlds, we can use hybrid search, which combines the two. Let's see how to achieve this with a custom retriever in LlamaIndex.
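Before looking at the LlamaIndex version, here is the core combination logic as a standalone sketch, assuming each search has already returned a list of node ids. AND keeps only nodes found by both searches; OR keeps nodes found by either. hybrid_ids and the ids are hypothetical.

```python
from typing import Iterable, List


def hybrid_ids(
    vector_ids: Iterable[str],
    keyword_ids: Iterable[str],
    mode: str = "AND",
) -> List[str]:
    """Combine two result sets of node ids.

    AND keeps nodes found by both searches (stricter, fewer results);
    OR keeps nodes found by either search (broader, more results).
    """
    v, k = set(vector_ids), set(keyword_ids)
    if mode == "AND":
        merged = v & k
    elif mode == "OR":
        merged = v | k
    else:
        raise ValueError(f"Invalid mode: {mode!r}")
    return sorted(merged)  # sorted only to make the output deterministic


vector_hits = ["B", "C"]        # e.g. top-2 by embedding similarity
keyword_hits = ["C", "D", "E"]  # e.g. nodes sharing a keyword with the query
print(hybrid_ids(vector_hits, keyword_hits, "AND"))  # ['C']
print(hybrid_ids(vector_hits, keyword_hits, "OR"))   # ['B', 'C', 'D', 'E']
```

AND tends to favor precision and OR tends to favor recall; the CustomRetriever later in this notebook defaults to AND.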



In [41]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index import (
    VectorStoreIndex,
    SimpleKeywordTableIndex,
    SimpleDirectoryReader,
    ServiceContext,
    StorageContext,
)
from IPython.display import Markdown, display

In [42]:
documents = SimpleDirectoryReader("./data/").load_data()

In [43]:
service_context = ServiceContext.from_defaults(chunk_size=1024)
node_parser = service_context.node_parser
nodes = node_parser.get_nodes_from_documents(documents)
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)

In [44]:
vector_index = VectorStoreIndex(nodes, storage_context=storage_context)
keyword_index = SimpleKeywordTableIndex(nodes, storage_context=storage_context)

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


In [45]:
from llama_index import QueryBundle

# import NodeWithScore
from llama_index.schema import NodeWithScore

# Retrievers
from llama_index.retrievers import (
    BaseRetriever,
    VectorIndexRetriever,
    KeywordTableSimpleRetriever,
)

from typing import List

In [46]:
class CustomRetriever(BaseRetriever):
    """Custom retriever that combines semantic (vector) search and keyword search into a hybrid search."""

    def __init__(
        self,
        vector_retriever: VectorIndexRetriever,
        keyword_retriever: KeywordTableSimpleRetriever,
        mode: str = "AND",
    ) -> None:
        """Init params."""

        self._vector_retriever = vector_retriever
        self._keyword_retriever = keyword_retriever
        if mode not in ("AND", "OR"):
            raise ValueError("Invalid mode.")
        self._mode = mode

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        """Retrieve nodes given query."""

        vector_nodes = self._vector_retriever.retrieve(query_bundle)
        keyword_nodes = self._keyword_retriever.retrieve(query_bundle)

        vector_ids = {n.node.node_id for n in vector_nodes}
        keyword_ids = {n.node.node_id for n in keyword_nodes}

        combined_dict = {n.node.node_id: n for n in vector_nodes}
        combined_dict.update({n.node.node_id: n for n in keyword_nodes})

        if self._mode == "AND":
            retrieve_ids = vector_ids.intersection(keyword_ids)
        else:
            retrieve_ids = vector_ids.union(keyword_ids)

        retrieve_nodes = [combined_dict[rid] for rid in retrieve_ids]
        return retrieve_nodes

In [47]:
from llama_index import get_response_synthesizer
from llama_index.query_engine import RetrieverQueryEngine

# define custom retriever
vector_retriever = VectorIndexRetriever(index=vector_index, similarity_top_k=2)
keyword_retriever = KeywordTableSimpleRetriever(index=keyword_index)
custom_retriever = CustomRetriever(vector_retriever, keyword_retriever)

# define response synthesizer
response_synthesizer = get_response_synthesizer()

# assemble query engine
custom_query_engine = RetrieverQueryEngine(
    retriever=custom_retriever,
    response_synthesizer=response_synthesizer,
)

# vector query engine
vector_query_engine = RetrieverQueryEngine(
    retriever=vector_retriever,
    response_synthesizer=response_synthesizer,
)
# keyword query engine
keyword_query_engine = RetrieverQueryEngine(
    retriever=keyword_retriever,
    response_synthesizer=response_synthesizer,
)

In [48]:
response = custom_query_engine.query("What did the author do during his time at YC?")

In [50]:
print(response)


The author worked on YC, writing essays, developing internal software in Arc, and creating Hacker News. He also helped select and support founders, resolve disputes between cofounders, and fight with people who maltreated the startups. He worked hard, even at the parts he didn't like, and eventually handed YC over to someone else. After his mother's death, he checked out of YC and decided to pursue painting.
