# 4. Query Rewriting in RAG Systems

In a RAG system, retrieval quality depends not only on the vector index and the embeddings but also on **how the query is phrased**. Queries that are ambiguous, overly long, or worded in vocabulary that does not match the corpus can cause irrelevant chunks to be retrieved.

**Query rewriting** reformulates the original query *before* retrieval, using a language model to generate one or more versions optimized for semantic search. It is applied right before the retriever and requires no changes to the index or the embeddings.
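
As a rough sketch of where this step sits in the pipeline, the snippet below places a rewriter in front of a retriever and merges the results. `toy_rewrite`, `toy_retrieve`, and the three-sentence `CORPUS` are hypothetical stand-ins invented for illustration; a real pipeline would call the language model and a vector store instead.

```python
import re
from typing import Callable, List

def retrieve_with_rewrite(
    question: str,
    rewrite: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    """Rewrite the question, retrieve for each variant, and return unique documents in first-seen order."""
    seen = set()
    results = []
    for query in rewrite(question):
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                results.append(doc)
    return results

# Hypothetical toy corpus (made up for this sketch).
CORPUS = [
    "Ingredients: flour, eggs, and milk.",
    "Bake for 20 minutes at 180 C.",
    "Groceries needed for the pancakes: butter and sugar.",
]

def _tokens(text: str) -> set:
    # Lowercase alphanumeric tokens, punctuation dropped.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def toy_rewrite(question: str) -> List[str]:
    # A real rewriter would call an LLM; these variants are hard-coded.
    return [question, "recipe ingredients", "groceries needed"]

def toy_retrieve(query: str) -> List[str]:
    # Keyword match standing in for vector search: keep documents sharing a token with the query.
    return [doc for doc in CORPUS if _tokens(query) & _tokens(doc)]

docs = retrieve_with_rewrite("Which food items does this recipe need?", toy_rewrite, toy_retrieve)
print(docs)
```

In this toy setup the original question shares no vocabulary with the corpus, so only the rewritten variants retrieve anything. The index itself is never touched.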

This notebook implements five techniques: **zero-shot**, **few-shot**, **sub-queries**, **step-back**, and **HyDE**.

In [None]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.prompts import FewShotChatMessagePromptTemplate
from dotenv import load_dotenv
import os

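load_dotenv()
# The Google clients read GOOGLE_API_KEY from the environment; fail early with a clear message if it is missing.
if not os.getenv("GOOGLE_API_KEY"):
    raise RuntimeError("GOOGLE_API_KEY is not set; add it to your .env file or environment.")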

embeddings_model = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")
language_model = ChatGoogleGenerativeAI(model="gemini-2.5-flash")

## Zero-shot Query Rewriting

The model receives the query and generates **multiple reformulations** without any prior examples, acting as a generator of synonyms and variations that preserve the original intent.

In [None]:
system_rewrite_prompt = """You are a helpful assistant that generates multiple search queries based on a single input query.

Perform query expansion. If there are multiple common ways of phrasing a user query
or common synonyms for key words in the query, make sure to return multiple versions
of the query with the different phrasings.

If there are acronyms or words you are not familiar with, do not try to rephrase them.

Return exactly 3 different rewritten versions of the query.
Do not include explanations, commentary, or any other text besides the numbered rewritten queries."""

zero_shot_prompt = ChatPromptTemplate.from_messages([
    ("system", system_rewrite_prompt),
    ("human", "{question}")
])

chain = zero_shot_prompt | language_model
response = chain.invoke({"question": "Which food items does this recipe need?"})
response
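
The prompt asks for numbered rewrites, so the model's text still has to be split into individual queries before each can be sent to the retriever. A minimal sketch of that parsing step follows. `parse_numbered_queries` is a hypothetical helper, and `sample_output` is a hand-written string that only imitates the expected format, not real model output.

```python
import re
from typing import List

# Matches lines such as "1. text" or "2) text" and captures the text.
_NUMBERED_LINE = re.compile(r"^\s*\d+[.)]\s*(.+?)\s*$")

def parse_numbered_queries(text: str) -> List[str]:
    """Extract the query from every numbered line; lines without a number are ignored."""
    queries = []
    for line in text.splitlines():
        match = _NUMBERED_LINE.match(line)
        if match:
            queries.append(match.group(1))
    return queries

# Hand-written example in the format the system prompt requests.
sample_output = """1. What ingredients does this recipe require?
2) Which foods are needed for this recipe?

3. List of ingredients for this recipe
"""

rewritten_queries = parse_numbered_queries(sample_output)
print(rewritten_queries)
```

With the chain above, the same helper would be applied to `response.content`, assuming the model follows the numbered format.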

## Few-shot Query Rewriting

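**Examples** (question → rewrite) are added to steer the model toward a more consistent and concise style. Each example below shows a single rewrite, while the reused `system_rewrite_prompt` still asks for exactly three, so the examples and the instructions pull in different directions.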

In [None]:
few_shot_examples = [
    {
        "question": "How tall is the Eiffel Tower? It looked so high when I was there last year",
        "answer": "What is the height of the Eiffel Tower?"
    },
    {
        "question": "1 oz is 28 grams, how many cm is 1 inch?",
        "answer": "Convert 1 inch to cm."
    },
    {
        "question": "What's the main point of the article? What did the author try to convey?",
        "answer": "What is the main point of this article?"
    }
]

example_prompt = ChatPromptTemplate.from_messages([
    ("human", "{question}"),
    ("ai", "{answer}"),
])

few_shots = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=few_shot_examples
)

few_shot_prompt = ChatPromptTemplate.from_messages([
    ("system", system_rewrite_prompt),
    few_shots,
    ("human", "{question}"),
])

chain = few_shot_prompt | language_model
response = chain.invoke({"question": "Which food items does this recipe need?"})
response

## Sub-queries

A complex query is **decomposed** into several simpler questions. This is useful when the question covers multiple aspects or conditions.

In [None]:
system_decompose_prompt = """You are a helpful assistant that generates search queries based on a single input query.

Perform query decomposition. Given a user query, break it down into distinct sub-queries that
must be answered in order to answer the original query.

If there are acronyms or words you are not familiar with, do not try to rephrase them.

Return only the decomposed sub-queries.
Do not include explanations, commentary, or any other text besides the numbered sub-queries."""

subqueries_prompt = ChatPromptTemplate.from_messages([
    ("system", system_decompose_prompt),
    ("human", "{question}"),
])

chain = subqueries_prompt | language_model
response = chain.invoke({
    "question": "Which is the most popular programming language for machine learning and is it the most popular programming language overall?"
})
response
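
One possible way to use the decomposed sub-queries is to retrieve for each one separately and keep the retrieved text grouped by sub-query when assembling the final context. The sketch below assumes the sub-queries have already been parsed into a list. `build_grouped_context`, `lookup_retrieve`, and the placeholder documents in `TOY_INDEX` are all hypothetical and exist only for this illustration.

```python
from typing import Callable, Dict, List

def build_grouped_context(
    sub_queries: List[str],
    retrieve: Callable[[str], List[str]],
    k: int = 2,
) -> str:
    """Retrieve up to k documents per sub-query and format them as labeled sections."""
    sections = []
    for number, sub_query in enumerate(sub_queries, start=1):
        docs = retrieve(sub_query)[:k]
        body = "\n".join(f"- {doc}" for doc in docs) if docs else "- (no documents found)"
        sections.append(f"Sub-query {number}: {sub_query}\n{body}")
    return "\n\n".join(sections)

# Hypothetical index mapping each sub-query to placeholder documents.
TOY_INDEX: Dict[str, List[str]] = {
    "What is the most popular programming language for machine learning?": [
        "Doc A (placeholder text about machine learning languages)",
    ],
    "What is the most popular programming language overall?": [
        "Doc B (placeholder text about general language rankings)",
    ],
}

def lookup_retrieve(query: str) -> List[str]:
    # Exact-match lookup standing in for a real retriever.
    return TOY_INDEX.get(query, [])

context = build_grouped_context(list(TOY_INDEX.keys()), lookup_retrieve)
print(context)
```

Keeping the sections labeled lets the answering prompt address each part of the original question with its own evidence.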

## Step-back

The **specific** question is rewritten as a more **general** one that captures the principles needed to answer it. This is useful when the query is very narrow.

In [None]:
system_step_back_prompt = """You are an expert at taking a specific query and extracting a more generic query that captures
the underlying principles needed to answer the specific query.

Given a specific user query, write a more generic query that must be answered in order to answer the specific query.

If you don't recognize a word or acronym, do not try to rewrite it.

Write a concise generic query.
Return only the rewritten generic query.
Do not include explanations, commentary, or any other text."""

step_back_prompt = ChatPromptTemplate.from_messages([
    ("system", system_step_back_prompt),
    ("human", "{question}"),
])

chain = step_back_prompt | language_model
response = chain.invoke({
    "question": "Which is the most popular programming language for machine learning?"
})
response

## HyDE (Hypothetical Document Embeddings)

Questions and their answers are not always semantically similar. **HyDE** generates a **hypothetical document** (a tentative answer) and uses its embedding for retrieval, since similar answers tend to lie closer together in the vector space than a question and its answer do.

In [None]:
from sklearn.metrics.pairwise import cosine_similarity

actual_document = """
Berkson's paradox describes a form of selection bias in which observing a relationship between two variables within a selected
subpopulation can create a misleading statistical association that does not exist—or is reversed—in the overall population.

This paradox typically arises when inclusion in the observed sample depends on both variables being studied. Because the
selection criterion is influenced by the variables themselves, conditioning on that criterion distorts their apparent relationship.
"""

actual_document_embedding = embeddings_model.embed_documents([actual_document])

system_hyde_prompt = """You are an expert at using a query to generate a document useful for answering the query.

Given a query, generate a paragraph of text that answers the query.
"""

hyde_prompt = ChatPromptTemplate.from_messages([
    ("system", system_hyde_prompt),
    ("human", "{question}"),
])

chain = hyde_prompt | language_model
hypothetical_document = chain.invoke({"question": "What does Berkson's paradox consist of?"})
hypothetical_document

In [None]:
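# Embed the raw question the way a retriever would (embed_query), and wrap the single
# vector in a list so cosine_similarity receives a 2D input.
question_embedding = [embeddings_model.embed_query("What does Berkson's paradox consist of?")]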
hypothetical_document_embedding = embeddings_model.embed_documents([hypothetical_document.content])

# cosine_similarity returns a 2D array; [0][0] extracts the single score.
print(f"Similarity without HyDE: {cosine_similarity(question_embedding, actual_document_embedding)[0][0]:.4f}")
print(f"Similarity with HyDE: {cosine_similarity(hypothetical_document_embedding, actual_document_embedding)[0][0]:.4f}")
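
For reference, the score compared above is the cosine of the angle between two vectors: their dot product divided by the product of their norms. The sketch below computes it by hand on made-up three-dimensional vectors. The numbers are invented to illustrate the arithmetic only and are not real embeddings, so they say nothing about how HyDE performs in practice.

```python
import math
from typing import Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up vectors standing in for the three embeddings.
question_vec = [1.0, 0.0, 0.0]
hypothetical_vec = [1.0, 1.0, 0.0]
document_vec = [1.0, 2.0, 0.0]

print(f"question vs document:     {cosine(question_vec, document_vec):.4f}")      # 0.4472
print(f"hypothetical vs document: {cosine(hypothetical_vec, document_vec):.4f}")  # 0.9487
```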