# Chapter 3: RAG Part II: Chatting with your Data
## Query transformation

One of the major problems with a basic RAG system is that it relies too heavily on the quality of a user’s query to generate an accurate output. In a production setting, users often write queries that are incomplete, ambiguous, or poorly worded, which can lead to irrelevant retrieval and hallucinated answers.

_Query transformation_ is a set of strategies that modify the user’s input to address the first RAG problem: _How do we handle the variability in the quality of a user’s input?_

### RAG-Fusion

The RAG-Fusion strategy is similar to the multi-query retrieval strategy, except that we apply a final reranking step to all the retrieved documents. This reranking step uses the _reciprocal rank fusion_ (RRF) algorithm, which combines the ranks of different search results to produce a single, unified ranking. By combining ranks from different queries, we pull the most relevant documents to the top of the final list. Because RRF uses only each document’s position in a result list and not its raw similarity score, it is well suited to combining results from queries whose scores might have different scales or distributions.
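
As a compact restatement of the scoring rule, written to match the implementation later in this section, the fused score of a document $d$ can be sketched as follows. The symbols $Q$ (the set of generated queries) and $r_q(d)$ (the position of $d$ in the results for query $q$) are notation introduced here for illustration:

$$
\mathrm{RRF}(d) = \sum_{q \in Q,\; d \in \text{results}(q)} \frac{1}{k + r_q(d)}
$$

Here $k$ is a smoothing constant (the code in this section defaults to 60). Note that the code counts positions from 0, whereas RRF is commonly stated with ranks starting at 1; the resulting ordering differs only marginally for large $k$.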

1. Set up the vector store

**NOTE**: Do not forget to launch a new pgvector docker container before using this notebook. Execute ```docker compose up -d``` in the terminal.

In [1]:
from langchain_community.document_loaders import TextLoader
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_postgres.vectorstores import PGVector
from dotenv import load_dotenv
import os

load_dotenv()

# load the document, split it into chunks
raw_documents = TextLoader("./rime.txt").load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(raw_documents)

# define embedding model
hf_embedding = HuggingFaceEmbeddings(
    model="sentence-transformers/all-mpnet-base-v2", # use this model to perform the embedding
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": False},
)

# vector store credentials
connection_credentials = f"postgresql+psycopg://{os.getenv('POSTGRES_USER')}:{os.getenv('POSTGRES_PASSWORD')}@localhost:8888/{os.getenv('POSTGRES_DB')}"

# embed each chunk and insert it into the vector store
db = PGVector.from_documents(documents=documents, embedding=hf_embedding, connection=connection_credentials)
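
One caveat with the f-string above: `os.getenv` returns `None` for an unset variable, so a missing `.env` entry silently produces a URL containing the text `None`. The sketch below is one possible way to fail fast instead. The helper name `build_connection_url` is hypothetical, and the host and port defaults simply mirror the values used above.

```python
import os
from urllib.parse import quote_plus

def build_connection_url(host: str = "localhost", port: int = 8888) -> str:
    """Build the psycopg connection URL, failing fast on missing variables."""
    required = ("POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DB")
    # collect every variable that is unset or empty
    missing = [name for name in required if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    user, password, database = (os.environ[name] for name in required)
    # quote the credentials so characters such as "@" or ":" do not break the URL
    return (
        f"postgresql+psycopg://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )
```

With this helper, the credentials line could become `connection_credentials = build_connection_url()`, called after `load_dotenv()`.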

2. Set up the retriever and LLM

In [14]:
from langchain_deepseek import ChatDeepSeek
from langchain_core.prompts import ChatPromptTemplate

retriever = db.as_retriever(search_kwargs={"k": 4})

prompt = ChatPromptTemplate.from_template(
    template=
    """
    Answer the question based only on the following context:
    {context}

    Question: {question}
    """
)
llm = ChatDeepSeek(model="deepseek-chat", temperature=0.0)

3. Set up RAG-Fusion

Let’s demonstrate RAG-Fusion in code. First, we craft a prompt similar to the multi-query retrieval strategy to generate a list of queries based on the user query:

In [15]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.messages import BaseMessage

prompt_rag_fusion = ChatPromptTemplate.from_template("""You are a helpful assistant that generates multiple search queries based on a single input query. Generate 4 search queries related to: {question}. Put each query in a new line.""")

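def parse_queries_output(message: BaseMessage) -> list[str]:
    # split on newlines and drop blank lines so that no empty query reaches the retriever
    return [line.strip() for line in message.content.split("\n") if line.strip()]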

query_generator = prompt_rag_fusion | llm | parse_queries_output
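
Models often number or bullet their output even when asked not to, and those markers end up embedded in the search queries. As a rough sketch of a more defensive parser that works on plain strings, the hypothetical helper `clean_queries` below (not part of LangChain) strips common list markers and blank lines:

```python
import re

def clean_queries(raw: str) -> list[str]:
    """Split raw LLM text into queries, dropping blank lines and list markers."""
    queries = []
    for line in raw.splitlines():
        # remove leading numbering or bullets such as "1. ", "2) ", "- " or "* "
        text = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", line).strip()
        if text:
            queries.append(text)
    return queries

# made-up model output used only for illustration
sample = (
    "1. What happens to the albatross?\n"
    "\n"
    "2) Who is the wedding guest?\n"
    "- Where does the ship sail?"
)
print(clean_queries(sample))
```

A helper like this could be applied to `message.content` inside `parse_queries_output`.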

Once we’ve generated our queries, we fetch relevant documents for each query and pass them into a function that _reranks_ (that is, reorders by relevance) the combined results.

The function ```reciprocal_rank_fusion``` takes the search results of every query as a list of lists of ```Document``` objects, where each inner list is sorted by relevance to its query. The RRF algorithm calculates a new score for each document based on its rank (position) in each list. The function then sorts the documents in descending order of these fused scores and returns the final reranked list:

In [16]:
from langchain_core.documents import Document

def reciprocal_rank_fusion(results: list[list[Document]], k:int=60) -> list[Document]:
    """
    Apply Reciprocal Rank Fusion (RRF) to multiple lists of ranked documents.
    The optional parameter k is the smoothing constant used in the RRF formula.
    """

    # initialize a dict to hold fused scores for each document
    # documents will be keyed by their contents to ensure uniqueness
    fused_scores = {}
    documents = {}

    # iterate each list of ranked documents
    for doc_list in results:
        # iterate each document in the list
        for rank, doc in enumerate(doc_list):
            # use the document content as the key to ensure uniqueness
            doc_str = doc.page_content
            # if the document hasn't been seen yet, initialize its score to 0 and save it
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
                documents[doc_str] = doc
            # update the score using the RRF formula 1 / (rank + k); rank is zero-based here
            fused_scores[doc_str] += 1 / (rank + k)

    # sort the documents based on their fused scores in descending order to get the final reranked results
    reranked_docs_strs = sorted(fused_scores.keys(), key=lambda d: fused_scores[d], reverse=True)

    # return the corresponding doc for each doc_str
    return [documents[doc_str] for doc_str in reranked_docs_strs]

rf_chain = query_generator | retriever.batch | reciprocal_rank_fusion

Notice that the function also takes a ```k``` parameter, a smoothing constant that controls how quickly a document’s contribution falls off with its rank in each query’s result list. A higher value flattens the differences between ranks, so lower-ranked documents have relatively more influence on the final list.
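
To make the scoring and the effect of ```k``` concrete, here is a small self-contained sketch that applies the same formula to plain string IDs instead of ```Document``` objects. The function `rrf_ids` and the toy result lists are made up for illustration only.

```python
from collections import defaultdict

def rrf_ids(results: list[list[str]], k: int = 60) -> dict[str, float]:
    """Return fused RRF scores for document IDs, using zero-based positions."""
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in results:
        for rank, doc_id in enumerate(ranked_ids):
            scores[doc_id] += 1 / (rank + k)
    return dict(scores)

# three hypothetical result lists, one per generated query
toy_results = [["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]]

scores = rrf_ids(toy_results)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['b', 'a', 'c', 'd']

# how much more a first-place hit counts than a fourth-place hit
ratios = {k: round((1 / (0 + k)) / (1 / (3 + k)), 2) for k in (1, 60)}
print(ratios)  # {1: 4.0, 60: 1.05}
```

Document `b` wins because it appears near the top of all three lists, while `d` appears only once and lands last. With `k=1` a first-place hit is worth four times a fourth-place hit; with `k=60` the two are nearly equal, so appearing in many lists matters more than ranking first in one.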

Finally, we combine our new retrieval chain (now using RRF) with the full chain we’ve seen before:

In [17]:
from langchain_core.runnables import chain
from typing import Any

@chain
def rf_qa(input: str) -> dict[str, Any]:
    # fetch relevant documents
    docs = rf_chain.invoke(input=input)
    # format prompt
    formatted_prompt = prompt.invoke(input={"context": docs, "question": input})
    # generate answer
    answer = llm.invoke(input=formatted_prompt)

    return {"answer": answer, "question": input, "docs": docs}
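
In `rf_qa`, the list of `Document` objects is passed directly as `context`, so the prompt receives the list's string representation, metadata included. One optional refinement is to join only the chunk texts before formatting the prompt. The sketch below uses a hypothetical `Doc` dataclass as a stand-in for `Document` so that it runs without LangChain; `format_docs` is likewise an illustrative name.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs(docs: list[Doc]) -> str:
    """Join chunk texts with blank lines so the prompt sees only the content."""
    return "\n\n".join(doc.page_content for doc in docs)

# placeholder chunks used only for illustration
example_docs = [
    Doc("first chunk", {"source": "./rime.txt"}),
    Doc("second chunk", {"source": "./rime.txt"}),
]
print(format_docs(example_docs))
```

In the chain, this would correspond to passing `format_docs(docs)` as the `context` value, since the real `Document` class also exposes `page_content`.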


4. Run the model

In [18]:
response = rf_qa.invoke(input="what are the main events described in the story of the ancyent marinere?")
print(f"question: {response['question']}\n\nanswer: {response['answer'].content}\n\ndocs: {response['docs']}")

question: what are the main events described in the story of the ancyent marinere?

answer: Based solely on the provided context, the main events described are:

1.  **The Mariner stops a wedding guest:** The ancient Mariner detains a guest on his way to a wedding feast and begins to tell his story.
2.  **The ship's journey south:** The ship is driven by storms to a cold, icy country near the South Pole, surrounded by mist, snow, and emerald-green ice.
3.  **The arrival of the Albatross:** An Albatross appears through the fog, is hailed as a good omen, and is fed by the crew. The ice splits, and a south wind springs up, with the bird following the ship.
4.  **The Mariner shoots the Albatross:** For an unexplained reason, the Mariner shoots the Albatross with his crossbow.
5.  **Consequences of the killing:** The good south wind continues to blow, but the Albatross no longer follows the ship. The crew condemns the Mariner for killing the bird that brought the favorable wind.
6.  **The d

RAG-Fusion’s strength lies in its ability to capture the user’s intent, handle complex queries, and broaden the scope of retrieved documents, which enables serendipitous discovery.

**NOTE:** Do not forget to remove the pgvector container when done using this notebook. Execute ```docker compose down --volumes``` in the terminal.