# RAG Retrieval Optimization - Query Rewriting using Amazon Bedrock and LlamaIndex

Query rewrite in RAG is a technique that reformulates or transforms the original user query to improve the retrieval process and ultimately enhance the quality of generated responses. This strategy involves modifying the input query in various ways, such as expanding it with related terms, simplifying complex queries, or breaking them down into sub-questions. The goal is to bridge the gap between the user's natural language input and the system's ability to find relevant information in the knowledge base. By rewriting queries, RAG systems can increase both recall (retrieving more relevant documents) and precision (improving the relevance of retrieved information), leading to more accurate and comprehensive answers.
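
To make the recall and precision terms concrete, here is a minimal sketch of how the two metrics are computed for a single query. The chunk IDs and the retrieved lists are hypothetical and chosen only for illustration; they are not results from this lab.

```python
from typing import Iterable, Set, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one query.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical chunk IDs that a human judged relevant to the question.
relevant_chunks = {"c1", "c2", "c3", "c4"}

# Hypothetical retrieval results before and after rewriting the query.
original_hits = ["c1", "x9"]
rewritten_hits = ["c1", "c2", "c3", "x9"]

print(precision_recall(original_hits, relevant_chunks))   # (0.5, 0.25)
print(precision_recall(rewritten_hits, relevant_chunks))  # (0.75, 0.75)
```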

In this lab, we build a query engine with LlamaIndex to answer a complex query about Amazon's 10-K SEC filings for 2022 and 2023. The lab uses LlamaIndex's `SubQuestionQueryEngine` module, which first breaks the complex query into sub-questions for each relevant data source, then gathers the intermediate responses and synthesizes them into a better final response grounded in Amazon's 10-K documents.

Here are the components we use:

- Vector Database (Faiss / local)
- LLM (Amazon Bedrock - Nova Pro)
- Embeddings Model (Bedrock Titan Text Embeddings v2.0)
- Datasets (Amazon's 10-K SEC filings for 2022 and 2023)
- LlamaIndex (this example builds on the reference LlamaIndex documentation available at https://docs.llamaindex.ai/en/stable/examples/query_engine/sub_question_query_engine/)


## Pre-req
You must run the [`workshop_setup.ipynb`](../lab00-setup/workshop_setup.ipynb) notebook in `lab00-setup` before starting this lab.

In [None]:
import warnings
warnings.warn("Warning: if you did not run lab00-setup, please go back and run the lab00 notebook") 

In [None]:
import os
from pathlib import Path

# Define the config content
config_content = """[profile default]
region = us-west-2
output = json
"""

# Get the path to the .aws directory and config file
home_dir = "/home/sagemaker-user"  # SageMaker specific path
aws_dir = os.path.join(home_dir, ".aws")
config_path = os.path.join(aws_dir, "config")

# Check if config file already exists
if os.path.exists(config_path):
    print(f"AWS config file already exists at {config_path}. No changes made.")
else:
    # Create the .aws directory if it doesn't exist
    os.makedirs(aws_dir, exist_ok=True)
    
    # Create the config file with the content
    with open(config_path, "w") as f:
        f.write(config_content + "\n")
    
    print(f"Created directory {aws_dir} and created AWS config file at {config_path}")
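
As a quick sanity check, the region can be read back from the config file with Python's standard `configparser`. This is a minimal sketch, not part of the original lab: the helper name `read_region` is hypothetical, and the demo writes to a temporary file so that it does not touch the real `~/.aws/config`. It accepts both the `[default]` and `[profile default]` section spellings.

```python
import configparser
import os
import tempfile
from typing import Optional


def read_region(config_path: str, profile: str = "default") -> Optional[str]:
    """Return the region stored for `profile`, or None if it is not set."""
    parser = configparser.ConfigParser()
    parser.read(config_path)  # a missing file is silently skipped
    # Accept both section spellings: "[default]" and "[profile default]".
    for section in (profile, f"profile {profile}"):
        if parser.has_option(section, "region"):
            return parser.get(section, "region")
    return None


# Demo on a throwaway file holding the same content the cell above writes.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "config")
    with open(demo_path, "w") as f:
        f.write("[profile default]\nregion = us-west-2\noutput = json\n")
    demo_region = read_region(demo_path)
    missing_region = read_region(os.path.join(tmp_dir, "does-not-exist"))

print(demo_region)     # us-west-2
print(missing_region)  # None
```

In the lab environment you would call `read_region("/home/sagemaker-user/.aws/config")` instead of using the temporary file.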

### > Setup
We start by importing the necessary LlamaIndex libraries.

In [None]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
from llama_index.core import Settings
from termcolor import colored

We select Amazon Nova Pro as our LLM and Amazon Titan Text Embeddings v2.0 as the embedding model. The chunk size is set to 512 for this example.

In [None]:
import json
from typing import Sequence, List
from llama_index.core.settings import Settings
from llama_index.llms.bedrock_converse import BedrockConverse
from llama_index.embeddings.bedrock import BedrockEmbedding, Models

profile_name = "default"

# define the LLM
llm = BedrockConverse(
    model="us.amazon.nova-pro-v1:0",
    profile_name=profile_name,
)

# define the embedding model (BedrockEmbedding takes the model ID as `model_name`)
embed_model = BedrockEmbedding(model_name="amazon.titan-embed-text-v2:0")

Settings.llm = llm
Settings.embed_model = embed_model
Settings.chunk_size = 512

from llama_index.core.llms import ChatMessage
from llama_index.core.tools import BaseTool, FunctionTool
import nest_asyncio
nest_asyncio.apply()
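
To give a feel for what `Settings.chunk_size = 512` controls, the sketch below splits text into fixed-size chunks. It is a simplification under stated assumptions: it counts whitespace-separated words, whereas LlamaIndex's splitter works on tokens and sentence boundaries rather than raw words. The function name `chunk_words` and its `overlap` parameter are illustrative and not part of the LlamaIndex API.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 512, overlap: int = 0) -> List[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# A synthetic 1,200-word document stands in for a filing.
document = " ".join(f"w{i}" for i in range(1200))

chunks = chunk_words(document, chunk_size=512)
print([len(c.split()) for c in chunks])  # [512, 512, 176]
```

Smaller chunks give more focused matches at retrieval time, at the cost of less context per retrieved chunk.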

### > Document Ingestion
We ingest the documents and use the [Titan Text Embeddings v2.0 model](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html) to create an embedding for each document chunk. The `amazon` folder contains the SEC 10-K filings for 2022 and 2023.

In [None]:
# load data
amazon_secfiles = SimpleDirectoryReader(input_dir="../data/lab03/amazon/").load_data()

vector_index = VectorStoreIndex.from_documents(
    amazon_secfiles,
    use_async=True,
)

Define the query engine from the index.

In [None]:
# build a query engine on top of the index, retrieving only the top-1 chunk
vector_query_engine = vector_index.as_query_engine(
    similarity_top_k=1
)
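
The retrieval step behind the query engine can be pictured as a nearest-neighbour search: embed the query, score every chunk by similarity, and keep the top k. The sketch below illustrates this with a toy word-count "embedding" and cosine similarity. It is only a stand-in for the Titan embeddings and the vector index, and the three chunks are made-up placeholder sentences, not text from the filings.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (not a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[str], top_k: int = 1) -> List[str]:
    """Return the `top_k` chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


# Placeholder chunks (hypothetical, not taken from the 10-K documents).
chunks = [
    "Filing 2022: placeholder paragraph about the main challenges in 2022",
    "Filing 2023: placeholder paragraph about challenges in 2023",
    "Placeholder paragraph about an unrelated topic",
]

query = "What were the challenges in 2022 and 2023?"
print(retrieve(query, chunks, top_k=1))
# ['Filing 2022: placeholder paragraph about the main challenges in 2022']
```

With `top_k=1`, the two-part question surfaces only the 2022 chunk, so the 2023 half of the question has no supporting context. This is the limitation that query rewriting is meant to address.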

### > Test Using Naive RAG
Let's try a common multi-part question on the 10-K documents using the naive RAG approach.

Example questions:
- What are Amazon's key priorities before, during and after COVID?
- What were the key challenges faced by Amazon in 2022 and 2023?

In [None]:
query = "What are Amazon's key priorities before, during and after COVID?"

In [None]:
raw_response = vector_query_engine.query(query)
print(colored(raw_response, "green"))

### > Test Using SubQuestionQueryEngine to rewrite the query

`SubQuestionQueryEngine` is a LlamaIndex module designed to answer a complex query using multiple data sources.
It first breaks the complex query into sub-questions for each relevant data source, then gathers the intermediate responses and synthesizes a final response.

In this example, the `SubQuestionQueryEngine` uses the configured LLM (Amazon Nova Pro) to break complex queries down into sub-queries.

In [None]:
# setup base query engine as tool
query_engine_tools = [
    QueryEngineTool(
        query_engine=vector_query_engine,
        metadata=ToolMetadata(
            name="Amazon-10k",
            description="Amazon SEC 10-k filings for years 2022 and 2023",
        ),
    ),
]

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    use_async=True,
)
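
Conceptually, the engine follows a decompose, answer, synthesize flow, and `use_async=True` lets the sub-questions be answered concurrently. The sketch below mimics that flow in plain Python under loud assumptions: `decompose` is a hypothetical rule-based stand-in for the LLM call that generates sub-questions, `fake_tool` returns placeholder strings instead of querying an index, and `synthesize` simply lists the answers where the real engine asks the LLM to write the final response. None of these helper names come from LlamaIndex.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SubQuestion:
    tool_name: str  # which data source should answer this sub-question
    text: str


def decompose(topic: str, years: List[str], tool_name: str) -> List[SubQuestion]:
    """Hypothetical stand-in for the LLM: one sub-question per year."""
    return [SubQuestion(tool_name, f"What were {topic} in {year}?") for year in years]


def fake_tool(question: str) -> str:
    """Placeholder for a query engine; returns a canned string, not real data."""
    return f"[retrieved answer to: {question}]"


async def answer(sub_q: SubQuestion, tools: Dict[str, Callable[[str], str]]) -> str:
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    return tools[sub_q.tool_name](sub_q.text)


def synthesize(sub_qs: List[SubQuestion], answers: List[str]) -> str:
    """Placeholder synthesis: list each sub-question with its answer."""
    return "\n".join(f"{q.text} -> {a}" for q, a in zip(sub_qs, answers))


async def answer_complex_query(topic: str, years: List[str],
                               tools: Dict[str, Callable[[str], str]]) -> str:
    sub_qs = decompose(topic, years, tool_name="Amazon-10k")
    # gather() runs the sub-questions concurrently and keeps their order.
    answers = await asyncio.gather(*(answer(q, tools) for q in sub_qs))
    return synthesize(sub_qs, list(answers))


tools = {"Amazon-10k": fake_tool}
final = asyncio.run(answer_complex_query("the key challenges", ["2022", "2023"], tools))
print(final)
```

`asyncio.run` works in a plain script. Inside a notebook an event loop is already running, which is presumably why the lab applies `nest_asyncio` during setup.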

Run the same query using `SubQuestionQueryEngine`:

In [None]:
transformed_response = query_engine.query(query)
print("\n")
print(colored(transformed_response, "green"))

### > Display the results side-by-side 

Notice that splitting the question into sub-questions increases the chance of retrieving the right, complete information and produces a more comprehensive final answer.

In [None]:
import pandas as pd
from IPython.display import display, HTML

# Build a two-column table: row 1 holds the query, row 2 the response text
df = pd.DataFrame({
    'Naive RAG': [query, str(raw_response)],
    'RAG w/ Query Rewrite': [query, str(transformed_response)]
})

output=""
output += df.style.hide().set_table_attributes("style='display:inline'")._repr_html_()
output += "&nbsp;"

display(HTML(output))