- user queries are a challenge: if a user provides an ambiguous query, they'll get ambiguous matches

- query transformation means modifying and expanding queries to improve retrieval effectiveness

- when dealing with complex or ambiguous queries, RAG systems face challenges in retrieving the most relevant information.

- query transformation techniques address this issue by trying to reformulate queries to better match relevant documents or to retrieve more comprehensive information.

---
- **query rewriting**: reformulating queries to be more specific and detailed, improving the likelihood of retrieving relevant information
---
- **step-back prompting**: generates broader queries to retrieve better context and background information
---
- **sub-query decomposition**: breaks down complex queries into simpler sub-queries, so that retrieval covers the different aspects of a complex query more comprehensively
---

- each technique can be used independently or in combination, depending on the specific use case

- query transformation techniques offer powerful ways to enhance the retrieval capabilities of RAG systems. They can improve the relevance, context, and comprehensiveness of retrieved information. These methods are particularly valuable in domains where queries can be complex or multifaceted.
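
As a rough illustration of using the techniques independently or in combination, the sketch below applies one or more prompt templates to the same query. The names `TEMPLATES`, `transform_query`, and `echo_llm` are hypothetical and not part of this notebook; `llm` stands for any callable that maps a prompt string to a completion string, such as the `invoke_bedrock` helper defined later.

```python
from typing import Callable, Dict, Sequence

# Hypothetical, shortened prompt templates; the notebook defines fuller ones below.
TEMPLATES = {
    "rewrite": "Rewrite this query to be more specific and detailed:\n{query}",
    "step_back": "Give a broader, more general version of this query:\n{query}",
    "decompose": "Break this query into 2-4 simpler sub-queries:\n{query}",
}


def transform_query(
    query: str,
    llm: Callable[[str], str],
    techniques: Sequence[str] = ("rewrite",),
) -> Dict[str, str]:
    """Apply each requested technique to the query and collect the raw LLM outputs."""
    results = {}
    for name in techniques:
        if name not in TEMPLATES:
            raise ValueError(f"Unknown technique: {name}")
        prompt = TEMPLATES[name].format(query=query)
        results[name] = llm(prompt)
    return results


def echo_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return "LLM saw: " + prompt
```

Passing `techniques=("rewrite", "step_back")` would run two transformations on the same query; which combination helps is use-case dependent.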

### AWS setup

In [2]:
import boto3
from dotenv import load_dotenv, find_dotenv
import os
import json

In [3]:
load_dotenv(find_dotenv())

True

In [4]:
# === AWS Configuration === #
COGNITO_REGION = os.getenv("COGNITO_REGION")
BEDROCK_REGION = os.getenv("BEDROCK_REGION")
MODEL_ID = os.getenv("MODEL_ID")
IDENTITY_POOL_ID = os.getenv("IDENTITY_POOL_ID")
USER_POOL_ID = os.getenv("USER_POOL_ID")
APP_CLIENT_ID = os.getenv("APP_CLIENT_ID")
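# Note: Windows already defines a USERNAME environment variable, and load_dotenv()
# does not override existing variables by default, so a distinct name may be safer there.
USERNAME = os.getenv("USERNAME")
PASSWORD = os.getenv("PASSWORD")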

In [6]:
# === Helper: Get AWS Credentials === #
def get_credentials(username, password):
    idp_client = boto3.client("cognito-idp", region_name=COGNITO_REGION)
    response = idp_client.initiate_auth(
        AuthFlow="USER_PASSWORD_AUTH",
        AuthParameters={"USERNAME": username, "PASSWORD": password},
        ClientId=APP_CLIENT_ID,
    )
    id_token = response["AuthenticationResult"]["IdToken"]

    identity_client = boto3.client("cognito-identity", region_name=COGNITO_REGION)
    identity_response = identity_client.get_id(
        IdentityPoolId=IDENTITY_POOL_ID,
        Logins={f"cognito-idp.{COGNITO_REGION}.amazonaws.com/{USER_POOL_ID}": id_token},
    )

    creds_response = identity_client.get_credentials_for_identity(
        IdentityId=identity_response["IdentityId"],
        Logins={f"cognito-idp.{COGNITO_REGION}.amazonaws.com/{USER_POOL_ID}": id_token},
    )

    return creds_response["Credentials"]

In [7]:
# === Helper: Invoke Claude via Bedrock === #
def invoke_bedrock(prompt_text):
    credentials = get_credentials(USERNAME, PASSWORD)

    bedrock_runtime = boto3.client(
        "bedrock-runtime",
        region_name=BEDROCK_REGION,
        aws_access_key_id=credentials["AccessKeyId"],
        aws_secret_access_key=credentials["SecretKey"],
        aws_session_token=credentials["SessionToken"],
    )

    payload = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 4000,
        "temperature": 0,
        "messages": [{"role": "user", "content": prompt_text}]
    }

    response = bedrock_runtime.invoke_model(
        body=json.dumps(payload),
        modelId=MODEL_ID,
        contentType="application/json",
        accept="application/json"
    )

    result = json.loads(response["body"].read())
    return result["content"][0]["text"]

In [12]:
invoke_bedrock("Hi")

'Hello! How can I assist you today?'
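
Note that `invoke_bedrock` calls `get_credentials` on every invocation, which means several Cognito round trips per prompt. A minimal sketch of caching the temporary credentials until shortly before they expire is shown below. `CachedCredentials` and `refresh_margin` are hypothetical names, and the sketch assumes the returned dictionary carries a timezone-aware `Expiration` datetime, as the `Credentials` structure from `get_credentials_for_identity` does.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


class CachedCredentials:
    """Reuse temporary credentials until they are close to expiring."""

    def __init__(self, fetch: Callable[[], dict],
                 refresh_margin: timedelta = timedelta(minutes=5)):
        self._fetch = fetch              # e.g. lambda: get_credentials(USERNAME, PASSWORD)
        self._margin = refresh_margin    # refresh this long before expiry
        self._creds: Optional[dict] = None

    def get(self) -> dict:
        now = datetime.now(timezone.utc)
        # Fetch on first use, or when the remaining lifetime is within the margin.
        if self._creds is None or self._creds["Expiration"] - now <= self._margin:
            self._creds = self._fetch()
        return self._creds
```

With this in place, `invoke_bedrock` could call `cache.get()` on a shared instance instead of `get_credentials(...)`; whether that is worthwhile depends on how many prompts are sent per session.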

### Query rewriting

In [14]:
from langchain_core.prompts import PromptTemplate

In [15]:
query_rewrite_template = """You are an AI assistant tasked with reformulating user queries to improve retrieval in a RAG system. 
Given the original query, rewrite it to be more specific, detailed, and likely to retrieve relevant information.

Original query: 
{original_query}

Rewritten query:"""

In [16]:
query_rewrite_prompt = PromptTemplate(
    template = query_rewrite_template,
    input_variables=["original_query"],
)

In [17]:
from langchain_core.runnables import RunnableLambda
from langchain_core.output_parsers import StrOutputParser

In [18]:
query_rewriter = query_rewrite_prompt | RunnableLambda(lambda x: x.text) | RunnableLambda(invoke_bedrock)

In [35]:
original_query = """During my recent exam, I accidentally clicked on a pop-up notification, and the activity log recorded that I was away from the exam page for about 20–30 seconds. However, after that click, I did not interact with anything outside the exam page. I simply did not click or type within the exam during that short period.

May I ask if this kind of inactivity (not interacting on the exam page after clicking an outside notification) is also recorded as being “out of the exam,” or does the log only reflect actual navigation away from the page?

Thank you for clarifying."""

In [36]:
rewritten_query = query_rewriter.invoke({"original_query":original_query})

In [37]:
# import textwrap
# print(textwrap.TextWrapper(width=100).fill(rewritten_query))

In [38]:
print(rewritten_query)

Here is a rewritten version of the query that is more specific, detailed, and likely to retrieve relevant information:

I recently took an online exam and accidentally clicked on a pop-up notification during the exam. This caused the activity log to record that I was away from the exam page for 20-30 seconds. However, after clicking the notification, I did not interact with anything outside the exam page - I simply did not click or type within the exam during that short period.

I would like to know if this type of brief inactivity on the exam page, after accidentally clicking an outside notification, is also recorded as being "out of the exam" in the activity log. Or does the log only reflect actual navigation and interaction away from the exam page itself?

Could you please clarify how the activity log records and categorizes this type of brief, unintentional inactivity on the exam page, versus actual navigation away from the exam? I want to understand the distinction in how this typ

In [1]:
"""
You are an AI language model assistant. Your task is to generate 5 different versions of the given user question to retrieve relevant documents from a vector database. By generating multiple perspectives on the user question, your goal is to help the user overcome some of the limitations of the distance-based similarity search.
Provide alternative questions.

Original Question:
{questions}
"""
print()




In [2]:
"""
You are a helpful assistant that generates multiple search queries based on a single query.
Generate multiple search queries related to: {query}
Output (4 queries)
"""
print()
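
The two multi-query prompts above return several queries in one completion, so the text has to be split before each query can be sent to the retriever. The helper below is one possible sketch; `parse_queries` is a hypothetical name, and it assumes the model formats its answer as a numbered or bulleted list, which is not guaranteed.

```python
import re
from typing import List

# Matches lines such as "1. question", "2) question", "- question", "* question".
_LIST_ITEM = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+(.*\S)\s*$")


def parse_queries(llm_output: str, max_queries: int = 5) -> List[str]:
    """Extract list items from the completion, dropping case-insensitive duplicates."""
    queries: List[str] = []
    seen = set()
    for line in llm_output.splitlines():
        match = _LIST_ITEM.match(line)
        if not match:
            continue  # skip preambles, blank lines, and trailing commentary
        query = match.group(1)
        key = query.lower()
        if key not in seen:
            seen.add(key)
            queries.append(query)
        if len(queries) == max_queries:
            break
    return queries
```

Lines without a list marker are ignored on purpose, since the outputs in this notebook often start with an explanatory preamble.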




### Step-back prompting

In [23]:
step_back_template = """You are an AI assistant tasked with generating broader, more general queries to improve context retrieval in a RAG system.
Given the original query, generate a step-back query that is more general and can help retrieve relevant background information.

Original query: 
{original_query}

Step-back query:"""

In [24]:
step_back_prompt = PromptTemplate(
    template=step_back_template,
    input_variables=["original_query"],
)

In [25]:
step_back_chain = step_back_prompt | RunnableLambda(lambda x: x.text) | RunnableLambda(invoke_bedrock)

In [39]:
original_query = """During my recent exam, I accidentally clicked on a pop-up notification, and the activity log recorded that I was away from the exam page for about 20–30 seconds. However, after that click, I did not interact with anything outside the exam page. I simply did not click or type within the exam during that short period.

May I ask if this kind of inactivity (not interacting on the exam page after clicking an outside notification) is also recorded as being “out of the exam,” or does the log only reflect actual navigation away from the page?

Thank you for clarifying."""

In [40]:
step_back_query = step_back_chain.invoke({"original_query":original_query})

In [41]:
print(step_back_query)

The step-back query that can help retrieve relevant background information for the original query is:

"How do online exam systems monitor and record student activity during an exam?"

This query is more general and covers the broader context of how online exam systems track and log student interactions during an exam. It can help retrieve information about the typical functionality and behavior of such systems, which can then be used to better understand the specific scenario described in the original query.

The step-back query focuses on the general principles and mechanisms of online exam monitoring, rather than the specific details of the original situation. This broader perspective can provide the necessary context to better evaluate and interpret the original query, and potentially uncover any relevant policies, guidelines, or technical details that may apply to the described scenario.
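
The completion above wraps the actual step-back query in explanatory text, which would add noise if the whole string were embedded for retrieval. A small sketch for pulling out just the query follows. `extract_step_back_query` is a hypothetical helper, and it assumes the model puts the query in double quotes, as it did here; when no quoted span is found, it falls back to the full text.

```python
import re

# First span enclosed in straight or curly double quotes.
_QUOTED = re.compile(r'["\u201c]([^"\u201d]+)["\u201d]')


def extract_step_back_query(llm_output: str) -> str:
    """Return the first double-quoted span, or the stripped output if none exists."""
    match = _QUOTED.search(llm_output)
    if match:
        return match.group(1).strip()
    return llm_output.strip()
```

An alternative is to tighten the prompt so the model returns only the query, which removes the need for post-processing.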


### Sub-query decomposition

In [42]:
subquery_decomposition_template = """You are an AI assistant tasked with breaking down complex queries into simpler sub-queries for a RAG system.
Given the original query, decompose it into 2-4 simpler sub-queries that, when answered together, would provide a comprehensive response to the original query.

Original query: 
{original_query}

Sub-queries:"""

In [43]:
subquery_decomposition_prompt = PromptTemplate(
    template=subquery_decomposition_template,
    input_variables=["original_query"],
)

In [44]:
subquery_decomposer_chain = subquery_decomposition_prompt | RunnableLambda(lambda x: x.text) | RunnableLambda(invoke_bedrock)

In [45]:
original_query = """During my recent exam, I accidentally clicked on a pop-up notification, and the activity log recorded that I was away from the exam page for about 20–30 seconds. However, after that click, I did not interact with anything outside the exam page. I simply did not click or type within the exam during that short period.

May I ask if this kind of inactivity (not interacting on the exam page after clicking an outside notification) is also recorded as being “out of the exam,” or does the log only reflect actual navigation away from the page?

Thank you for clarifying."""

In [46]:
sub_queries = subquery_decomposer_chain.invoke({"original_query":original_query})

In [47]:
print(sub_queries)

To provide a comprehensive response to the original query, we can break it down into the following sub-queries:

1. What is the purpose of the activity log in the exam system?
   - This sub-query aims to understand the purpose and functionality of the activity log, which is crucial for interpreting the recorded events.

2. How does the activity log typically record user interactions during the exam?
   - This sub-query focuses on understanding the specific mechanisms and criteria used by the activity log to record user actions and navigation within the exam.

3. Does the activity log distinguish between clicking on a notification and actively navigating away from the exam page?
   - This sub-query directly addresses the concern raised in the original query, seeking to understand if the activity log differentiates between these two scenarios.

4. What are the potential implications or consequences of the recorded 20-30 seconds of inactivity on the exam page?
   - This sub-query explores
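
To use the decomposition for retrieval, each sub-query is typically run separately and the results are merged. The sketch below shows one way this might look. `extract_sub_queries` and `retrieve_for_sub_queries` are hypothetical helpers, `retrieve` stands for whatever retriever the RAG system provides (none is defined in this notebook), and the parser assumes the top-level `1.`, `2.` numbering seen in the output above.

```python
import re
from typing import Callable, List


def extract_sub_queries(llm_output: str) -> List[str]:
    """Collect top-level numbered items; indented explanation lines are skipped."""
    return re.findall(r"^\d+\.\s+(.+?)\s*$", llm_output, flags=re.MULTILINE)


def retrieve_for_sub_queries(
    sub_queries: List[str],
    retrieve: Callable[[str], List[str]],
    k: int = 3,
) -> List[str]:
    """Run each sub-query and merge the top-k results, keeping first-seen order."""
    merged: List[str] = []
    seen = set()
    for sub_query in sub_queries:
        for doc in retrieve(sub_query)[:k]:
            if doc not in seen:  # the same document may match several sub-queries
                seen.add(doc)
                merged.append(doc)
    return merged
```

Deduplicating by document keeps the context passed to the generator compact when sub-queries overlap.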