
### Using a Query Engine to Answer Queries

• Find the [Notebook](https://colab.research.google.com/github/towardsai/ragbook-notebooks/blob/main/notebooks/Chapter%2008%20-%20Mastering_Advanced_RAG.ipynb) for this section at [towardsai.net/book](http://towardsai.net/book).

A [Query Engine](https://docs.llamaindex.ai/en/stable/module_guides/deploying/query_engine/root.html) is a high-level interface for interacting with data through natural language queries. It wraps the steps needed to process a query and generate a response. Multiple query engines can be combined to handle more complex queries.

A [Chat Engine](https://docs.llamaindex.ai/en/stable/module_guides/deploying/chat_engines/root.html), by contrast, is suited to conversational interaction, where a series of queries and responses build on one another. This offers a more dynamic and engaging way to interact with data.

To create a query engine, you typically call the `.as_query_engine()` method on an index. The steps below create an index from a text file and use a query engine to interact with the dataset.

Install the necessary packages with the command `!pip install -q llama-index==0.9.14.post3 deeplake==3.8.8 openai==1.3.8 cohere==4.37`.

Then set up the API key environment variables:

In [None]:
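import os
from advanced_rag_custom_utils.helper import get_openai_api_key, get_activeloop_api_key

OPENAI_API_KEY = get_openai_api_key()
ACTIVELOOP_API_KEY = get_activeloop_api_key()

# Expose the keys as environment variables so the OpenAI and Deep Lake clients can read them
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
os.environ["ACTIVELOOP_TOKEN"] = ACTIVELOOP_API_KEY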
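
Before running the rest of the notebook, it can help to confirm that the keys are actually available. The following is a minimal sketch, assuming the variable names `OPENAI_API_KEY` and `ACTIVELOOP_TOKEN`; `missing_env_vars` is a hypothetical helper written for this illustration, not part of any library.

```python
import os

# Assumed variable names; adjust them if your setup uses different ones.
REQUIRED_VARS = ("OPENAI_API_KEY", "ACTIVELOOP_TOKEN")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required keys are set.")
```

Passing a plain dictionary as `environ` makes the check easy to try without touching the real environment.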

Download a text file as your source document. We used a file containing a collection of essays by Paul Graham consolidated into a single text file. You can download this file directly from [towardsai.net/book](http://towardsai.net/book). Alternatively, you can run the following commands in your notebook to create a directory and download the file into it:

In [None]:
!mkdir -p './paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O './paul_graham/paul_graham_essay.txt'

Now, use the `SimpleDirectoryReader` in the LlamaIndex framework to read all files from the designated directory. This class is designed to automatically navigate through the files, converting them into Document objects.

In [None]:
from llama_index import SimpleDirectoryReader

# load documents
documents = SimpleDirectoryReader("./paul_graham").load_data()

Next, use the `ServiceContext` to configure a node parser that splits the single long document into smaller chunks of 512 tokens with a 64-token overlap. Then create nodes from the loaded documents.

In [None]:
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(chunk_size=512, chunk_overlap=64)
node_parser = service_context.node_parser

nodes = node_parser.get_nodes_from_documents(documents)
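
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window sketch. It is an illustration only: it treats whitespace-separated words as tokens, whereas the LlamaIndex node parser counts tokens with a tokenizer and tries to respect sentence boundaries. The function name `chunk_tokens` is hypothetical.

```python
def chunk_tokens(tokens, chunk_size=512, chunk_overlap=64):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# Toy input: 1,000 placeholder words stand in for real tokens.
words = [f"w{i}" for i in range(1000)]
chunks = chunk_tokens(words, chunk_size=512, chunk_overlap=64)
print(len(chunks), [len(c) for c in chunks])
```

Each chunk starts 448 tokens after the previous one, so the last 64 tokens of one chunk reappear at the start of the next. The overlap keeps a sentence that straddles a boundary available in both neighboring chunks.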

The nodes should be stored in a vector database for convenient access. The `DeepLakeVectorStore` class creates an empty dataset at the path you specify. You can access the already-processed dataset under the `genai360` organization ID, or change the ID to your own *Activeloop* username to store the data in your account.

In [None]:
from llama_index.vector_stores import DeepLakeVectorStore

# Organization ID (not the API key); replace with your own Activeloop username if needed
my_activeloop_org_id = "genai360"
my_activeloop_dataset_name = "LlamaIndex_paulgraham_essays"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"

# Create the vector store that will hold the node embeddings
vector_store = DeepLakeVectorStore(dataset_path=dataset_path, overwrite=False)

The new database is wrapped in a `StorageContext` object, which allows nodes to be processed and their relationships established as required. Finally, the `VectorStoreIndex` receives the nodes and the storage context that links them to the database, and uploads the data to the cloud. It builds the index and creates embeddings for each segment.

In [None]:
from llama_index.storage.storage_context import StorageContext
from llama_index import VectorStoreIndex

storage_context = StorageContext.from_defaults(vector_store=vector_store)
storage_context.docstore.add_documents(nodes)
vector_index = VectorStoreIndex(nodes, storage_context=storage_context)

The generated index is the basis for the query engine. To create one, call the `.as_query_engine()` method on the index object. The code snippet below sets the `streaming` flag, which improves the end user’s experience by reducing idle waiting time (further information will be provided on this topic). It also sets `similarity_top_k`, which determines how many nodes are retrieved from the index to answer the query.

In [None]:
query_engine = vector_index.as_query_engine(streaming=True, similarity_top_k=10)
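
The idea behind `similarity_top_k` can be sketched in plain Python: score every stored embedding against the query embedding and keep the best `k`. This is an illustrative toy, assuming cosine similarity and made-up two-dimensional vectors; it is not how the vector store performs the search internally, and `top_k` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, node_vecs, k):
    """Return the ids of the k nodes most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_vec, vec), node_id)
        for node_id, vec in node_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [node_id for _, node_id in scored[:k]]


# Made-up 2-D embeddings for illustration only.
node_vecs = {
    "node_a": [1.0, 0.0],
    "node_b": [0.8, 0.6],
    "node_c": [0.0, 1.0],
    "node_d": [-1.0, 0.0],
}
print(top_k([1.0, 0.1], node_vecs, k=2))
```

A larger `k` gives the response synthesizer more context to work with, at the cost of more tokens sent to the model per query.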

The final step is interacting with the source data using the `.query()` method. We can now ask questions, and the query engine generates answers using retrievers and a response synthesizer.

In [None]:
streaming_response = query_engine.query(
    "What does Paul Graham do?",
)
streaming_response.print_response_stream()

    Paul Graham is an artist and entrepreneur. He is passionate about creating paintings that can stand the test of time. He has also co-founded Y Combinator, a startup accelerator, and is actively involved in the startup ecosystem. While he has a background in computer science and has worked on software development projects, his primary focus is on his artistic pursuits and supporting startups.

The query engine can be set up to operate in streaming mode, delivering the response in real time as it is generated. This reduces the time end users spend waiting: they can read each word as it is produced instead of waiting for the model to finish the complete response.
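
The streaming pattern itself is easy to picture with a generator. The sketch below is a stand-in under stated assumptions: `fake_token_stream` simulates a model that emits one word at a time, and `print_stream` is a hypothetical consumer that prints each piece as soon as it arrives. Neither is part of LlamaIndex.

```python
import time


def fake_token_stream(text, delay=0.0):
    """Yield one word at a time, standing in for a model that generates incrementally."""
    for word in text.split():
        time.sleep(delay)  # simulated generation latency
        yield word + " "


def print_stream(token_iter):
    """Print tokens as they arrive and return the full text once the stream ends."""
    pieces = []
    for token in token_iter:
        print(token, end="", flush=True)
        pieces.append(token)
    print()
    return "".join(pieces).strip()


full_text = print_stream(fake_token_stream("Paul Graham co-founded Y Combinator."))
```

Setting `delay` to a value such as `0.2` makes the word-by-word effect visible when the cell is run interactively.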

### Splitting Complex Queries into Subqueries

The `SubQuestionQueryEngine` is a class designed to handle complex queries. It breaks a user’s primary question into multiple sub-questions, answers each individually, and then synthesizes the answers into a comprehensive response. To use it, change the earlier query engine configuration by turning off the streaming flag, which is incompatible with this method.

In [None]:
query_engine = vector_index.as_query_engine(similarity_top_k=10)

The `query_engine` can be registered as a tool using the `QueryEngineTool` class, along with descriptive metadata. The description tells the framework what the tool does, which helps it select the most appropriate tool for a given task, particularly when several tools are available. The list of tools and the service context defined earlier are then used to initialize the `SubQuestionQueryEngine` object.

In [None]:
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.query_engine import SubQuestionQueryEngine

query_engine_tools = [
    QueryEngineTool(
        query_engine=query_engine,
        metadata=ToolMetadata(
            name="pg_essay",
            description="Paul Graham essay on What I Worked On",
        ),
    ),
]

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    service_context=service_context,
    use_async=True,
)

The pipeline is now ready to split a question into simpler sub-questions. For the query below, it formulates three sub-questions, each covering one part of the query, and attempts to answer them separately. A response synthesizer then combines the responses to produce the final output.

In [None]:
response = query_engine.query(
    "How was Paul Grahams life different before, during, and after YC?"
)
print( ">>> The final response:\n", response )

 >>> The final response:  
Paul Graham's life was different before, during, and after YC. Before YC, he worked on a variety of projects including writing essays, developing YC's internal software in Arc, and creating Hacker News. During YC, his focus shifted to writing essays and working on YC itself. After YC, he continued writing essays but also worked on various projects such as developing the programming language Arc and later its new version called Bel. He also explored other potential projects and engaged in painting for a period of time. Overall, his work and interests evolved throughout these different phases of his life.
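
The decompose, answer, and synthesize flow can be pictured with a toy version in plain Python. This is only a sketch of the control flow: in the real engine an LLM writes the sub-questions and picks a tool based on its metadata description, whereas here the sub-questions are hard-coded and the single tool is a stub. All function names below are hypothetical.

```python
def decompose(question):
    """Hypothetical stand-in for the LLM step that writes sub-questions."""
    phases = ["before", "during", "after"]
    return [f"What did Paul Graham work on {phase} YC?" for phase in phases]


def answer_sub_question(sub_question, tools):
    """Send the sub-question to a tool (here simply the first one) and keep its answer."""
    tool_name, tool_fn = next(iter(tools.items()))
    return {"tool": tool_name, "question": sub_question, "answer": tool_fn(sub_question)}


def synthesize(question, sub_answers):
    """Hypothetical stand-in for the response synthesizer: list the partial answers."""
    lines = [f"- {item['question']} {item['answer']}" for item in sub_answers]
    return f"{question}\n" + "\n".join(lines)


def sub_question_query(question, tools):
    sub_questions = decompose(question)
    sub_answers = [answer_sub_question(q, tools) for q in sub_questions]
    return synthesize(question, sub_answers)


# A stub tool that echoes the question instead of querying an index.
tools = {"pg_essay": lambda q: f"[answer retrieved for: {q}]"}
result = sub_question_query(
    "How was Paul Graham's life different before, during, and after YC?", tools
)
print(result)
```

Because each sub-question is answered independently, the calls can run concurrently, which is what the `use_async=True` argument in the earlier cell enables.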