# Chapter 3: RAG Part II: Chatting with your Data
## Query transformation

One of the major problems with a basic RAG system is that it relies too heavily on the quality of a user’s query to generate an accurate output. In a production setting, a user is likely to construct their query in an incomplete, ambiguous, or poorly worded manner that leads to model hallucination.

_Query transformation_ is a set of strategies designed to modify the user’s input in order to
answer the first RAG problem question: _How do we handle the variability in the
quality of a user’s input?_

### Multi-query Retrieval

A user’s single query can be insufficient to capture the full scope of information required to answer it comprehensively. The multi-query retrieval strategy addresses this problem in three steps: it instructs an LLM to generate multiple queries based on the user’s initial query, retrieves documents from the data source for each generated query in parallel, and then inserts the retrieved results into the prompt as context for the final model output.
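To make the flow concrete without any external services, here is a minimal, library-free sketch. The toy corpus, the word-overlap scoring, and the hard-coded query variants are all illustrative assumptions; in a real pipeline an LLM would generate the variants and a vector store would perform the retrieval.

```python
# Toy corpus standing in for a vector store (illustrative data only).
CORPUS = [
    "The albatross followed the ship through the fog.",
    "The mariner shot the albatross with his crossbow.",
    "The crew blamed the mariner when the wind dropped.",
    "The wedding guest could not choose but hear.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k corpus entries sharing the most words with the query."""
    query_words = set(query.lower().split())

    def overlap(text: str) -> int:
        return len(query_words & set(text.lower().rstrip(".").split()))

    # sorted() is stable, so ties keep their original corpus order.
    return sorted(CORPUS, key=overlap, reverse=True)[:k]

def multi_query_retrieve(queries: list[str], k: int = 1) -> list[str]:
    """Retrieve for every query variant and keep each document once."""
    seen: dict[str, None] = {}
    for query in queries:
        for doc in retrieve(query, k):
            seen.setdefault(doc, None)
    return list(seen)

# Hard-coded stand-ins for the variants an LLM would generate.
variants = [
    "what happened to the albatross",
    "how did the crew react to the mariner",
    "who listened to the story at the wedding",
]
results = multi_query_retrieve(variants)
```

With this toy data, the first variant alone surfaces only one document, while the three variants together surface three distinct ones. That wider coverage is the effect the strategy aims for.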

This strategy is particularly useful for use cases where a single question may rely on multiple perspectives to provide a comprehensive answer. Here’s a code example of multi-query retrieval in action:

1. Set up the vector store

**NOTE**: Do not forget to launch a new pgvector Docker container before using this notebook. Execute ```docker compose up -d``` in the terminal.


In [1]:
from langchain_community.document_loaders import TextLoader
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_postgres.vectorstores import PGVector
from dotenv import load_dotenv
import os

load_dotenv()

# load the document, split it into chunks
raw_documents = TextLoader("./rime.txt").load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(raw_documents)

# define embedding model
hf_embedding = HuggingFaceEmbeddings(
    model="sentence-transformers/all-mpnet-base-v2", # use this model to perform the embedding
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": False},
)

# vector store credentials
connection_credentials = f"postgresql+psycopg://{os.getenv('POSTGRES_USER')}:{os.getenv('POSTGRES_PASSWORD')}@localhost:8888/{os.getenv('POSTGRES_DB')}"

# embed each chunk and insert it into the vector store
db = PGVector.from_documents(documents=documents, embedding=hf_embedding, connection=connection_credentials)
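
The `chunk_size` and `chunk_overlap` arguments are easier to reason about with a simplified model. The sketch below is a plain fixed-width sliding window, which is an assumption made for illustration and not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraph and line breaks). The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-width chunks whose neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
```

Here the last four characters of each chunk reappear at the start of the next one, so a sentence that straddles a boundary is still fully contained in at least one chunk, provided it is no longer than the overlap.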

2. Set up the retriever and LLM

In [2]:
from langchain_deepseek import ChatDeepSeek
from langchain_core.prompts import ChatPromptTemplate

retriever = db.as_retriever(search_kwargs={"k": 2})

prompt = ChatPromptTemplate.from_template(
    template=
    """
    Answer the question based only on the following context:
    {context}

    Question: {question}
    """
)
llm = ChatDeepSeek(model="deepseek-chat", temperature=0.0)

3. Set up the multi-query runnable

In [3]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.messages import BaseMessage

perspectives_prompt = ChatPromptTemplate.from_template("""You are an AI language model assistant. Your task is to generate five different versions of the given user question to retrieve relevant documents from a vector database. By generating multiple perspectives on the user question, your goal is to help the user overcome some of the limitations of distance-based similarity search. Provide these alternative questions separated by newlines. Original question: {question}""")

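def parse_queries_output(message: BaseMessage) -> list[str]:
    # split on newlines and drop blank lines so that no empty query reaches the retriever
    return [line.strip() for line in message.content.split("\n") if line.strip()]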

query_gen = perspectives_prompt | llm | parse_queries_output
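
Models often pad such lists with blank lines, numbering, or bullet markers, which would otherwise be sent to the retriever as part of the query. A more defensive parser might look like the sketch below. It operates on a plain string so that it runs without LangChain; `clean_query_lines` is a hypothetical helper, and the sample text is invented for illustration.

```python
import re

# Matches a leading list marker such as "1.", "2)", "-", or "*".
_LIST_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*])\s+")

def clean_query_lines(text: str) -> list[str]:
    """Split model output into queries, dropping blank lines and list markers."""
    queries = []
    for line in text.splitlines():
        cleaned = _LIST_MARKER.sub("", line).strip()
        if cleaned:
            queries.append(cleaned)
    return queries

sample_output = (
    "1. What happens to the mariner?\n"
    "\n"
    "2) Why is the albatross shot?\n"
    "- Who is the wedding guest?\n"
)
queries = clean_query_lines(sample_output)
```

If this behaviour is wanted in the chain, one option is to call `clean_query_lines(message.content)` from inside `parse_queries_output`.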

Next, we take the list of generated queries, retrieve the most relevant docs for each of them in parallel, and then combine the results into the unique union of all retrieved documents:

In [4]:
from langchain_core.documents import Document

def get_unique_union(document_list: list[list[Document]]) -> list[Document]:
    # flatten list of lists, and dedupe them
    deduped_docs = {
        doc.page_content: doc for sublist in document_list for doc in sublist
    }
    # return a flat list of unique docs
    return list(deduped_docs.values())

retrieval_chain = query_gen | retriever.batch | get_unique_union

Because we’re retrieving documents from the same retriever with multiple (related) queries, it’s likely that at least some of the retrieved documents are repeated. Before using them as context to answer the question, we need to deduplicate them _(dedupe)_ so that we end up with a single instance of each.
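
The dictionary-based dedupe can be checked in isolation. The sketch below uses a small dataclass named `Doc` as a stand-in for LangChain's `Document` (an assumption, so that the snippet runs without dependencies) together with invented sample data.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Minimal stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def unique_union(document_lists: list[list[Doc]]) -> list[Doc]:
    # Keying on page_content collapses documents with identical text.
    deduped = {doc.page_content: doc for sublist in document_lists for doc in sublist}
    return list(deduped.values())

results_per_query = [
    [Doc("chunk about the wedding guest", {"query": 0}), Doc("chunk about the albatross", {"query": 0})],
    [Doc("chunk about the albatross", {"query": 1}), Doc("chunk about the storm", {"query": 1})],
]
unique_docs = unique_union(results_per_query)
```

Two properties of this approach are worth knowing. The output order follows the first appearance of each text, because dictionaries preserve insertion order. When duplicates differ only in metadata, the last one seen is the object that is kept.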

Notice also the use of ```.batch```, which runs all generated queries in parallel and returns a list of the results. In this case, that is a list of lists of ```Document``` objects, which we then flatten and dedupe as described earlier.
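
For intuition about what a parallel batch call returns, the following sketch imitates the pattern with the standard library's `ThreadPoolExecutor`. It illustrates the idea only and is not LangChain's implementation; `fake_retrieve` and `batch_retrieve` are hypothetical names, and the sleep merely simulates I/O latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_retrieve(query: str) -> list[str]:
    """Pretend retriever: waits briefly, then returns two labelled results."""
    time.sleep(0.1)  # simulated network or database latency
    return [f"{query} :: result 1", f"{query} :: result 2"]

def batch_retrieve(queries: list[str]) -> list[list[str]]:
    """Run fake_retrieve on every query concurrently, preserving input order."""
    if not queries:
        return []
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which call finishes first.
        return list(pool.map(fake_retrieve, queries))

queries = ["q1", "q2", "q3"]
batched = batch_retrieve(queries)
```

The result is one inner list per query, in the same order as the input queries, which is the shape the flatten-and-dedupe step expects.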

The final step is to build a prompt that combines the user’s question with the retrieved documents, and to pass that prompt to the model to generate the answer:

In [5]:
from langchain_core.runnables import chain
from typing import Any

@chain
def multi_query_qa(input: str) -> dict[str, Any]:
    # fetch relevant documents
    docs = retrieval_chain.invoke(input=input)
    formatted_prompt = prompt.invoke(input={"context": docs, "question": input})
    # generate answer
    answer = llm.invoke(input=formatted_prompt)

    return {"answer": answer, "docs": docs, "question": input}
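
One design point worth noting: passing `docs` straight into the prompt appears to rely on the string representation of each document object, which includes metadata as well as text. A common alternative is to join only the text of each document before filling the template. The sketch below shows the idea with a stand-in `Doc` dataclass, a plain-string `TEMPLATE`, and a hypothetical `format_docs` helper; none of these names come from the code above.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Minimal stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs(docs: list[Doc], separator: str = "\n\n") -> str:
    """Join the text of each document, leaving metadata out of the prompt."""
    return separator.join(doc.page_content.strip() for doc in docs)

# Plain-string version of the prompt template, for illustration only.
TEMPLATE = (
    "Answer the question based only on the following context:\n"
    "{context}\n\n"
    "Question: {question}"
)

docs = [
    Doc("First retrieved chunk.", {"source": "rime.txt"}),
    Doc("  Second retrieved chunk.\n"),
]
prompt_text = TEMPLATE.format(context=format_docs(docs), question="What happens first?")
```

Whether this is worthwhile depends on the use case: metadata in the context can help the model cite sources, but it also consumes tokens.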

4. Run the model

In [6]:
response = multi_query_qa.invoke(input="what are the main events described in the story of the ancyent marinere?")

print(f"question: {response['question']}\n\nanswer: {response['answer'].content}\n\ndocs: {response['docs']}")

question: what are the main events described in the story of the ancyent marinere?

answer: Based solely on the provided context, the main events described are:

1.  **The Mariner detains a Wedding-Guest:** The ancient mariner stops a guest on his way to a wedding feast. He holds the guest with his "glittering eye," compelling him to listen to his story against his will.

2.  **The Ship's Journey Begins:** The mariner's ship leaves the harbor cheerfully, sailing past the church, hill, and lighthouse. The sun rises from the sea on the left and sets on the right, indicating a southward journey.

3.  **Supernatural Reanimation of the Dead:** A strong wind roars and then suddenly drops. Under the lightning and moon, the dead sailors groan, rise up, and begin to work the ropes of the ship without speaking or moving their eyes. They are described as a "ghastly crew" working like "lifeless tools."

4.  **The Mariner's Terror:** The mariner is forced to work alongside the reanimated corpse of 

**NOTE:** Do not forget to remove the pgvector container when done using this notebook. Execute ```docker compose down --volumes``` in the terminal.