# Query Translation: RAG-Fusion

![rag-fusion](../images/images-rag-fusion.png)

**RAG-Fusion** is a technique that focuses on improving the quality of retrieved information by combining the results of multiple retrieval attempts into a single, optimized list. Let’s break it down step by step:

### The Problem RAG-Fusion Solves

When we ask a RAG system a question, it retrieves documents based on the query to help generate a response. Sometimes, one retrieval attempt isn’t enough because:

- Different ways of phrasing the query might retrieve different relevant documents.
- A single query might miss important information.

RAG-Fusion addresses this by running **multiple queries** and then merging their ranked results into a single list.

### How RAG-Fusion Works

1. **Generate Multiple Queries**:

   - The system (typically an LLM) creates several versions of the original query (e.g., rephrased, summarized, or expanded).
   - Example:
     - Original Query: "How to secure APIs?"
     - Alternate Queries:
       - "API security guidelines"
       - "Best practices for API authentication"
       - "Protecting APIs from attacks"

2. **Retrieve Documents for Each Query**:

   - For each query version, the system retrieves relevant documents from the knowledge base.

3. **Fuse the Results**:

   - The per-query result lists are merged into a single ranked list.
   - Fusion gives priority to documents that appear in the results of several queries or rank highly in individual retrievals.
   - Duplicates are collapsed into one entry, so the final list draws on every query variant without repeating documents.

4. **Feed into the LLM**:

   - The fused list of documents is passed to the LLM, along with the user’s original query, for generating a high-quality response.
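The four steps above can be sketched in plain Python. This is a toy illustration, not the notebook's implementation: the corpus, the hard-coded `generate_queries` stand-in for the LLM, and the word-overlap `retrieve` function are all hypothetical. Only the fusion step follows the reciprocal rank fusion approach used later in this notebook (with k = 60).

```python
import re
from collections import defaultdict

# Hypothetical toy knowledge base: document id -> text.
CORPUS = {
    "A": "API security guidelines: use HTTPS and validate all input",
    "B": "API authentication with OAuth 2.0 and API keys",
    "C": "Rate limiting protects APIs from abuse and attacks",
    "D": "Logging and monitoring for REST services",
}


def tokens(text: str) -> set[str]:
    # Lowercased alphanumeric words, punctuation removed.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def generate_queries(question: str) -> list[str]:
    # Step 1: stand-in for the LLM call, returning fixed rephrasings.
    return [question, "API security guidelines", "API authentication",
            "protecting APIs from attacks"]


def retrieve(query: str, top_k: int = 2) -> list[str]:
    # Step 2: toy retriever that ranks documents by word overlap with the query.
    query_words = tokens(query)
    scored = [(len(query_words & tokens(text)), doc_id) for doc_id, text in CORPUS.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for overlap, doc_id in scored[:top_k] if overlap > 0]


def fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    # Step 3: reciprocal rank fusion. Each list adds 1 / (rank + k) to a
    # document's score (rank is 0-based), and duplicates share one key.
    scores: dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] += 1 / (rank + k)
    return sorted(scores, key=scores.get, reverse=True)


def rag_fusion_context(question: str) -> str:
    # Step 4: build the prompt that would be sent to the LLM:
    # the fused documents plus the original question.
    fused_ids = fuse([retrieve(q) for q in generate_queries(question)])
    context = "\n".join(CORPUS[doc_id] for doc_id in fused_ids)
    return f"Context:\n{context}\n\nQuestion: {question}"


print(rag_fusion_context("How to secure APIs?"))
```

In this toy run, document C comes first because two different phrasings put it at the top of their lists, which outweighs A and B each appearing twice at mixed ranks. Document D matches none of the queries, so it never reaches the context.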


### Why RAG-Fusion Works

- **Increases Coverage**: By exploring different ways to ask the question, it captures more relevant information.
- **Improves Quality**: Fusion moves documents that several query variants agree on toward the top of the list.
- **Reduces Bias**: It avoids over-reliance on a single phrasing of the query.

### Simple Example

#### Query:

*"What are the best ways to secure an API?"*

#### Without RAG-Fusion:

- The system retrieves only the top documents based on the exact query. It might miss some important resources.

#### With RAG-Fusion:

1. Multiple Queries:
   - "Best API security practices"
   - "API authentication methods"
   - "Securing REST APIs"

2. Retrieved Documents:
   - From Query 1: Document A, Document B.
   - From Query 2: Document B, Document C.
   - From Query 3: Document D, Document A.

3. Fused List:
   - Prioritize Document A and Document B (they appear multiple times).
   - Include Document C and Document D for broader coverage.

4. Final Output:
   - A more accurate and comprehensive response because the system considered diverse perspectives.
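To see why A and B rise to the top, here is the arithmetic under one common fusion rule, Reciprocal Rank Fusion with k = 60 and ranks counted from 1. The rule and the constant are assumptions for this illustration; the notebook code later uses the same constant but counts ranks from 0, which does not change the order in this example.

$$
\begin{aligned}
\text{RRF}(A) &= \frac{1}{60+1} + \frac{1}{60+2} \approx 0.0325 && \text{(rank 1 in Query 1, rank 2 in Query 3)}\\
\text{RRF}(B) &= \frac{1}{60+2} + \frac{1}{60+1} \approx 0.0325 && \text{(rank 2 in Query 1, rank 1 in Query 2)}\\
\text{RRF}(D) &= \frac{1}{60+1} \approx 0.0164 && \text{(rank 1 in Query 3)}\\
\text{RRF}(C) &= \frac{1}{60+2} \approx 0.0161 && \text{(rank 2 in Query 2)}
\end{aligned}
$$

A and B tie because each appears in two lists; D edges ahead of C only because it was the top hit for its query.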

### Key Benefits of RAG-Fusion

- **More Reliable Answers**: By merging results, it makes it less likely that crucial information is overlooked.
- **Flexible Querying**: Works well even if the user’s original query is vague or incomplete.
- **Enhanced Relevance**: The fused list prioritizes the best sources for generating a response.

In short, RAG-Fusion is like asking the same question in different ways to gather more complete and accurate information, then using the best of those results to provide a great answer!

![rag fusion](../images/rag_fusion.png)

## Setup

```python
%run "../Z - Common/setup.ipynb"
```

```python
docs = load_sample_data()
split_docs = split_sample_data(docs)
retriever = seed_sample_data(split_docs)
```

Let's start by defining the prompt and the chain that ask the LLM to generate several search queries from a single question:

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

template = """You are a helpful assistant that generates multiple search queries based on a single input query. \n
Generate multiple search queries related to: {question} \n
Output (4 queries):"""

prompt_generate_queries = ChatPromptTemplate.from_template(template)

chain_generate_queries = (
    prompt_generate_queries
    | llm
    | StrOutputParser()
    # Split the model output into one query per non-empty line
    | (lambda x: [q.strip() for q in x.split("\n") if q.strip()])
)
```
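Chat models often number or bullet their suggestions (for example `1. API security`), and the split in the chain above would pass those markers straight to the retriever. A slightly more defensive parser is sketched below. `parse_queries` is a hypothetical helper, not part of LangChain, and the list-marker patterns it strips are assumptions about typical model output.

```python
import re


def parse_queries(llm_output: str, max_queries: int = 4) -> list[str]:
    """Split raw LLM text into individual search queries (hypothetical helper).

    Drops blank lines and strips leading list markers such as "1.", "2)",
    "-" or "*" plus surrounding double quotes.
    """
    queries = []
    for line in llm_output.splitlines():
        # Remove a leading number or bullet, then surrounding whitespace.
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        # Remove surrounding double quotes, if any.
        cleaned = cleaned.strip('"').strip()
        if cleaned:
            queries.append(cleaned)
    # Keep at most as many queries as the prompt asked for.
    return queries[:max_queries]


print(parse_queries('1. "API security"\n\n2) OAuth for APIs\n- rate limiting\n'))
```

It could replace the lambda at the end of `chain_generate_queries` as `| parse_queries`, since LCEL wraps plain functions used in a pipe as runnables.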

RAG-Fusion uses Reciprocal Rank Fusion (RRF) to merge and rerank the per-query results. Let's define the function and build a chain that calls it after the documents for each version of the query have been retrieved.
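For reference, the score that the function below computes for a document $d$ across $n$ query variants is:

$$
\mathrm{RRF}(d) = \sum_{i=1}^{n} \frac{1}{k + r_i(d)}
$$

Here $r_i(d)$ is the position of $d$ in the $i$-th ranked list, lists that do not contain $d$ contribute nothing, and $k = 60$ is the function's default, a commonly used value that damps the advantage of the very top ranks. Because the code uses `enumerate`, $r_i$ starts at 0, so a top-ranked document contributes $1/k$; the usual formulation counts ranks from 1, so this is equivalent to 1-based ranks with a constant of $k - 1$.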

```python
from langchain_core.load import dumps, loads

def reciprocal_rank_fusion(results: list[list], k: int = 60):
    """Fuse several ranked lists of documents with Reciprocal Rank Fusion (RRF).

    `results` holds one ranked list per query variant; `k` is the RRF constant
    that damps the influence of the top ranks.
    """
    # Accumulated RRF score for each unique document
    fused_scores = {}

    # Iterate through each ranked list (one per query variant)
    for docs in results:
        # Iterate through each document with its 0-based rank (position in the list)
        for rank, doc in enumerate(docs):
            # Serialize the document to a JSON string so it can be used as a dictionary key
            doc_str = dumps(doc)
            # First time we see this document: start its score at 0
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            # Add this list's RRF contribution: 1 / (rank + k)
            fused_scores[doc_str] += 1 / (rank + k)

    # Sort documents by fused score, highest first, and deserialize them back
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]

    # A list of (document, fused score) tuples
    return reranked_results
```
```python
chain_retrieval = (
    chain_generate_queries
    # Run the retriever once per generated query, producing one ranked list each
    | retriever.map()
    | reciprocal_rank_fusion
)
```

```python
question = "What are the main components of an LLM-powered autonomous agent system?"

docs = chain_retrieval.invoke({"question": question})
print(len(docs))
docs
```

```text
4

[(Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='}\n]\nChallenges#\nAfter going through key ideas and demos of building LLM-centered agents, I start to see a couple common limitations:'),
  0.05),
 (Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomp
```

We can now build the final chain: it retrieves documents for each variation of the query, fuses and reranks them, and then uses the fused documents as context to answer the original question.

```python
from operator import itemgetter
from langchain import hub

prompt = hub.pull("rlm/rag-prompt")

chain_rag = (
    {"context": chain_retrieval,
     "question": itemgetter("question")}
    | prompt
    | llm
    | StrOutputParser()
)
```
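As written, `chain_retrieval` hands the prompt a list of `(Document, score)` tuples, so the context the LLM sees includes the `Document(...)` reprs, metadata, and RRF scores. One option is to format the fused list before it reaches the prompt. The helper below is a hypothetical sketch: `Doc` is a minimal stand-in for LangChain's `Document` so the snippet runs on its own, and the `top_n=4` cut-off is an arbitrary assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Minimal stand-in for LangChain's Document (only the fields used here).
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_fused_docs(fused: list[tuple], top_n: int = 4) -> str:
    """Hypothetical helper: keep the top_n fused documents, drop the RRF
    scores, and join the document texts into one context string."""
    return "\n\n".join(doc.page_content for doc, _score in fused[:top_n])


fused = [(Doc("first chunk"), 0.05), (Doc("second chunk"), 0.03)]
print(format_fused_docs(fused))
```

In the chain above it could be wired in as `"context": chain_retrieval | format_fused_docs`, since it only relies on the `page_content` attribute that LangChain documents also expose.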




```python
result = chain_rag.invoke({"question": question})
result
```

```text
'Based on the context, a LLM-powered autonomous agent system consists of three main components: (1) LLM as the core controller or "brain," (2) Planning capabilities, which include subgoal decomposition and reflection/refinement mechanisms, and (3) Memory systems. The LLM functions as the primary decision-maker while planning helps break down complex tasks and memory enables the system to retain and learn from past experiences.'
```

Inspect the trace in [LangSmith](https://smith.langchain.com/) to understand the flow better.

Save the results so we can compare later.

```python
write_results("rag-fusion.txt", result)
```