# Query Transformations for Improved Retrieval in RAG Systems
1. Query Rewriting
2. Step-back Prompting
3. Sub-query Decomposition

In [None]:
!pip install langchain langchain-openai python-dotenv langchain-nvidia-ai-endpoints

In [2]:
import getpass
import os


def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")

_set_env("NVIDIA_API_KEY")

NVIDIA_API_KEY: ··········


In [4]:
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain_core.prompts import PromptTemplate


llm = ChatNVIDIA(model="meta/llama-3.1-8b-instruct")

llm.invoke("what is capital of India").content

'The capital of India is **New Delhi**.'

## 1. Query Rewriting
Query rewriting reformulates a query to be more specific and detailed, which improves the likelihood of retrieving relevant information.

In [11]:
re_write_llm = ChatNVIDIA(model="meta/llama-3.1-8b-instruct", temperature=0, max_tokens=4000)

query_rewrite_template = """You are an AI assistant tasked with reformulating user queries to improve retrieval in a RAG system.
Given the original query, rewrite it to be more specific, detailed, and likely to retrieve relevant information.

Original query: {original_query}

Rewritten query:"""


query_rewrite_prompt = PromptTemplate(
    input_variables=["original_query"],
    template=query_rewrite_template,
)

query_rewrite = query_rewrite_prompt | re_write_llm

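def rewrite_query(original_query):
    """Return the LLM's rewritten version of the query as a string."""
    # Pass the input as a dict keyed by the prompt's input variable.
    response = query_rewrite.invoke({"original_query": original_query})
    return response.content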


In [12]:
original_query = "What are the impacts of using AI on humans?"
rewritten_query = rewrite_query(original_query)
print("Original query:", original_query)
print("Rewritten query:", rewritten_query)

Original query: What are the impacts of using AI on humans?
Rewritten query: To improve retrieval in a RAG (Retrieval-Augmented Generation) system, I'll reformulate the original query to make it more specific, detailed, and likely to retrieve relevant information. Here's the rewritten query:

"What are the cognitive, emotional, and social effects of integrating artificial intelligence systems into human work environments, including changes in job satisfaction, productivity, and mental health, as well as potential benefits such as improved decision-making and enhanced collaboration?"

This rewritten query is more specific and detailed because it:

1. Specifies the context: "integrating artificial intelligence systems into human work environments"
2. Identifies the effects: "cognitive, emotional, and social"
3. Provides more specific outcomes: "changes in job satisfaction, productivity, and mental health"
4. Includes potential benefits: "improved decision-making and enhanced collaboratio
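
The response above wraps the rewritten query in extra commentary, so passing `rewritten_query` straight to a retriever would also send the explanation. A minimal sketch of one way to pull out just the query, assuming the model puts it in straight double quotes and ends it with a question mark (as in the output above); `extract_rewritten_query` is a hypothetical helper, not part of LangChain:

```python
import re


def extract_rewritten_query(response_text: str) -> str:
    """Return the longest double-quoted question in the text, else the stripped text."""
    # Collect every span enclosed in straight double quotes on a single line.
    candidates = re.findall(r'"([^"\n]+)"', response_text)
    # Keep only spans that look like questions.
    questions = [c.strip() for c in candidates if c.strip().endswith("?")]
    if questions:
        return max(questions, key=len)
    # Fallback: no quoted question found, so return the whole response.
    return response_text.strip()
```

Tightening the prompt so that the model returns only the query is an alternative; this post-processing step is simply a guard for responses shaped like the one shown.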

## 2. Step-back Prompting
Step-back prompting generates a broader, more general query that can help retrieve relevant background information.

In [13]:
step_back_llm = ChatNVIDIA(model="meta/llama-3.1-8b-instruct", temperature=0, max_tokens=4000)

# Create a prompt template for step-back prompting
step_back_template = """You are an AI assistant tasked with generating broader, more general queries to improve context retrieval in a RAG system.
Given the original query, generate a step-back query that is more general and can help retrieve relevant background information.

Original query: {original_query}

Step-back query:"""

step_back_prompt = PromptTemplate(
    input_variables=["original_query"],
    template=step_back_template,
)

step_back_chain = step_back_prompt | step_back_llm

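def generate_step_back_query(original_query):
    """Return the LLM's broader, step-back version of the query as a string."""
    # Pass the input as a dict keyed by the prompt's input variable.
    response = step_back_chain.invoke({"original_query": original_query})
    return response.content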

In [14]:
original_query = "What are the impacts of using AI on humans?"
step_back_query = generate_step_back_query(original_query)
print("Original query:", original_query)
print("Step-back query:", step_back_query)

Original query: What are the impacts of using AI on humans?
Step-back query: To generate a step-back query that is more general and can help retrieve relevant background information, I would ask:

"What are the effects of technological advancements on human society?"

This step-back query is more general because it:

1. Removes the specific focus on AI, allowing the retrieval of information on various technological advancements.
2. Expands the scope to include the broader impact on human society, rather than just the individual.
3. Encourages the retrieval of background information on the historical, social, and cultural context of technological advancements.

This query can help retrieve relevant background information on topics such as:

* The history of technological advancements and their impact on society
* The social and cultural implications of technological change
* The effects of technological advancements on human relationships, work, and daily life

By asking this step-back 

## 3. Sub-query Decomposition
Sub-query decomposition breaks a complex query into simpler sub-queries so that retrieval covers the topic more comprehensively.

In [16]:
sub_query_llm = ChatNVIDIA(model="meta/llama-3.1-8b-instruct", temperature=0, max_tokens=4000)


# Create a prompt template for sub-query decomposition
subquery_decomposition_template = """You are an AI assistant tasked with breaking down complex queries into simpler sub-queries for a RAG system.
Given the original query, decompose it into 2-4 simpler sub-queries that, when answered together, would provide a comprehensive response to the original query.

Original query: {original_query}

example: What are the impacts of climate change on the environment?

Sub-queries:
1. What are the impacts of climate change on biodiversity?
2. How does climate change affect the oceans?
3. What are the effects of climate change on agriculture?
4. What are the impacts of climate change on human health?"""

subquery_decomposition_prompt = PromptTemplate(
    input_variables=["original_query"],
    template=subquery_decomposition_template,
)

subquery_decomposer_chain = subquery_decomposition_prompt | sub_query_llm

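def decompose_query(original_query):
    """Return the non-empty lines of the LLM response as a list of strings."""
    response = subquery_decomposer_chain.invoke({"original_query": original_query}).content
    # Drop blank lines and any line that starts with the "Sub-queries:" header.
    sub_queries = [q.strip() for q in response.split('\n') if q.strip() and not q.strip().startswith('Sub-queries:')]
    return sub_queries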

In [17]:
original_query = "What are the impacts of using AI on humans?"
sub_queries = decompose_query(original_query)
print("Original query:", original_query)
print("Sub-queries:", sub_queries)

Original query: What are the impacts of using AI on humans?
Sub-queries: ["Here's the breakdown of the original query into 4 simpler sub-queries for a RAG (Red, Amber, Green) system:", '**Original Query:** What are the impacts of using AI on humans?', '**Sub-queries:**', '1. **What are the positive impacts of AI on human productivity and efficiency?**', '* RAG Status: Green (if AI has significantly improved productivity and efficiency)', '* RAG Status: Amber (if AI has had some positive impact, but with limitations or trade-offs)', '* RAG Status: Red (if AI has had no significant positive impact on human productivity and efficiency)', '2. **What are the potential risks of AI on human employment and job displacement?**', '* RAG Status: Red (if AI has led to significant job displacement and unemployment)', '* RAG Status: Amber (if AI has led to some job displacement, but with opportunities for upskilling and reskilling)', '* RAG Status: Green (if AI has not led to significant job displac
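
The output above shows two problems. The line-based parser keeps the preamble, the bold `**Sub-queries:**` header, and the bullet lines as if they were sub-queries. The model also read "RAG" as "Red, Amber, Green", which spelling out "Retrieval-Augmented Generation" in the prompt might avoid. For the first problem, a stricter parser could keep only the numbered lines. The sketch below is one possible approach, assuming sub-queries appear as `1. ...` or `1) ...` lines; `parse_sub_queries` is a hypothetical helper, not part of the original notebook.

```python
import re
from typing import List


def parse_sub_queries(response_text: str) -> List[str]:
    """Keep only numbered lines ('1. ...' or '1) ...') and strip markdown bold markers."""
    sub_queries = []
    for line in response_text.splitlines():
        # A sub-query line starts with digits followed by '.' or ')'.
        match = re.match(r"^\s*\d+[.)]\s+(.*\S)\s*$", line)
        if match:
            # Remove surrounding '**' that the model may add for emphasis.
            sub_queries.append(match.group(1).strip("*").strip())
    return sub_queries
```

Applied to a response shaped like the one above, this would return only the numbered questions and drop the `* RAG Status: ...` bullets.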