# Fusion

The following is an example of RAG-Fusion based on the GitHub repo [RAG-Fusion by Raudaschl](https://github.com/Raudaschl/rag-fusion) and this example from the LlamaIndex documentation: [Building an Advanced Fusion Retriever from Scratch](https://docs.llamaindex.ai/en/stable/examples/low_level/fusion_retriever/?h=fusion).

The example performs the following steps:

- Query expansion
- Retrieval using multiple retrievers (vector search and BM25)
- Fusion of the results using *Reciprocal Rank Fusion*

### Setup

- Import the necessary libraries
- Load environment variables
- Fetch knowledge base from Malazan Wiki
- Create `VectorStoreIndex` using the documents

In [None]:
%pip install llama-index-retrievers-bm25

In [1]:
# NOTE: This is ONLY necessary in jupyter notebook.
# Details: Jupyter runs an event-loop behind the scenes.
#          This results in nested event-loops when we start an event-loop to make async queries.
#          This is normally not allowed, so we use nest_asyncio to allow it for convenience.
import nest_asyncio
nest_asyncio.apply()

In [2]:
import os
from dotenv import load_dotenv
from util.helpers import get_malazan_pages, generate_vector_index, create_and_save_md_files
from llama_index.retrievers.bm25 import BM25Retriever

Add the following to a `.env` file in the root of the project if not already there.

```
OPENAI_API_KEY=<YOUR_KEY_HERE>
```

In [3]:
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

In [4]:
pages = get_malazan_pages(articles=["Anomander Rake", "Tayschrenn", "Kurald Galain", "Warrens", "Tattersail", "Whiskeyjack", "Kruppe"])
create_and_save_md_files(pages)

In [5]:
index = generate_vector_index()

In [6]:
query = "What are the titles of Anomander Rake?"

In [None]:
query = "What type of warren does Tayschrenn use?"

### Query Expansion

Query expansion is part of the **Pre-retrieval** phase of Advanced RAG.

Query expansion generates multiple queries from the original query and uses all of them to search for relevant documents. Several differently worded queries capture more of the user's intent than a single one, so the combined search is more likely to retrieve the relevant documents.

Two techniques to generate multiple queries are:

- **Multi-Query**
    - Generate multiple questions from the original query using an LLM. Search the vector database using these questions and then fuse the results.
- **Generated Answer**
    - Generate a hypothetical answer from the original query using an LLM without context. Search the vector database using both the answer and the original query and then fuse the results.
    - The idea behind using answers is that the documents that are similar to the answer are likely to be relevant to the query.
    - The answer does not have to be correct; it only needs to resemble the kind of text a relevant document would contain.

We use **Multi-Query** in this example.
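For comparison, the Generated Answer technique described above could be sketched roughly as follows. This is a minimal illustration, not part of the notebook: `build_generated_answer_queries`, `complete`, and `fake_complete` are hypothetical names, the prompt wording is an assumption, and the stub returns placeholder text instead of calling a real LLM.

```python
from typing import Callable, List


def build_generated_answer_queries(
    complete: Callable[[str], str], query_str: str
) -> List[str]:
    """Return the search strings for the Generated Answer technique.

    `complete` is a hypothetical stand-in for any function that sends a
    prompt to an LLM and returns the completion text.
    """
    prompt = (
        "Write a short passage that plausibly answers the question below.\n"
        f"Question: {query_str}\n"
        "Passage:"
    )
    hypothetical_answer = complete(prompt).strip()
    # Search with both the original query and the generated answer;
    # the two result lists are fused afterwards.
    return [query_str, hypothetical_answer]


# Stub LLM so the sketch runs without an API key (placeholder text only).
def fake_complete(prompt: str) -> str:
    return "  Tayschrenn draws on a warren to power his sorcery.  "


search_strings = build_generated_answer_queries(
    fake_complete, "What type of warren does Tayschrenn use?"
)
print(search_strings)
```

In a real pipeline, `complete` would wrap an actual LLM call, and each string in `search_strings` would be sent to the retrievers before fusing the results.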

In [None]:
from llama_index.llms.openai import OpenAI as LlamaOpenAI
from llama_index.core import PromptTemplate

llm = LlamaOpenAI(api_key=OPENAI_API_KEY, model="gpt-3.5-turbo", temperature=0.1)

generate_queries_prompt = PromptTemplate(
    """You are a helpful assistant that generates multiple search queries based on a single input query. 
Generate {num_queries} search queries, one on each line, related to the following input query:
Query: {query}
Queries
"""
)


def generate_queries(llm, query_str: str, num_queries: int = 4):
    fmt_prompt = generate_queries_prompt.format(
        num_queries=num_queries - 1, query=query_str
    )
    response = llm.complete(fmt_prompt)
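    # Use the function argument (not the notebook-level `query`) and drop blank lines.
    generated = [line.strip() for line in response.text.split("\n") if line.strip()]
    queries = [query_str] + generated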
    return queries

In [None]:
queries = generate_queries(llm, query, num_queries=4)
queries
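LLM completions sometimes come back with blank lines or list markers such as `1.` or `-` in front of each query. The helper below is a small sketch of how such output might be cleaned before it is sent to the retrievers; `clean_generated_queries` is a hypothetical name and the sample completion is made up for illustration.

```python
import re
from typing import List


def clean_generated_queries(raw_text: str) -> List[str]:
    """Split a completion into queries, dropping blanks and list markers."""
    cleaned = []
    for line in raw_text.splitlines():
        # Remove leading markers such as "1.", "2)", "-" or "*".
        line = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if line:
            cleaned.append(line)
    return cleaned


# Made-up completion text, only to exercise the helper.
raw = "1. Anomander Rake titles\n\n2) Names of Anomander Rake\n- Anomander Rake epithets\n"
print(clean_generated_queries(raw))
```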

### Run queries against multiple retrievers

Now we run the generated queries against two different retrievers and get the results.

- Default Vector Search Retriever with cosine similarity
- BM25 Retriever using the Okapi BM25 ranking function

#### BM25 Retriever

The second retriever is a BM25 retriever. BM25 is a ranking function used in information retrieval systems to estimate the relevance of documents to a given search query. It is based on the probabilistic information retrieval model. The BM25 function is a bag-of-words retrieval function that ranks a set of documents based on the query terms appearing in each document, regardless of the inter-relationship between the query terms within a document (e.g., their relative proximity).
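To make the bag-of-words scoring concrete, here is a minimal pure-Python sketch of the Okapi BM25 formula. It is not the implementation used by `BM25Retriever`: the whitespace tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, the function name `bm25_scores`, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: str, docs: List[str], k1: float = 1.5, b: float = 0.75
) -> List[float]:
    """Score every document against the query with the Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    scores = []
    for tokens in tokenized:
        term_freqs = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Number of documents that contain the term at least once.
            doc_freq = sum(1 for other in tokenized if term in other)
            idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
            tf = term_freqs[term]
            # k1 saturates the term frequency; b normalizes for document length.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Toy documents, made up for this sketch.
docs = [
    "the warren of darkness",
    "a soldier of the empire",
    "the warren the mage opened",
]
scores = bm25_scores("warren mage", docs)
print(scores)
```

The third document matches both query terms and scores highest, while the second matches none and scores zero. Word order and proximity play no role, which is the bag-of-words property described above.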

In [None]:
## vector retriever
vector_retriever = index.as_retriever(similarity_top_k=2)

In [None]:
bm25_retriever = BM25Retriever.from_defaults(
    docstore=index.docstore, similarity_top_k=2
)

In [None]:
def run_queries(queries, retrievers):
    """Run queries against retrievers."""

    results_dict = {}
    for query in queries:
        for i, retriever in enumerate(retrievers):
            query_result = retriever.retrieve(query)            
            results_dict[(query, i)] = query_result

    return results_dict

In [None]:
results_dict = run_queries(queries, [vector_retriever, bm25_retriever])
results_dict

### Fuse the results using Reciprocal Rank Fusion

Reciprocal Rank Fusion is a simple fusion method that combines the results of multiple retrievers based on the reciprocal rank of the documents. The idea is to give more weight to the documents that appear higher in the ranking of the individual retrievers.

Reciprocal Rank Fusion works by calculating the reciprocal rank of each document in the ranking of each retriever and then summing the reciprocal ranks for each document. The documents with higher cumulative reciprocal ranks are considered more relevant and are ranked higher in the final fusion result.

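The reciprocal rank score of a document in a single ranking is calculated as:
`1/(rank+k)`
where `rank` is the document's position in that ranking (starting at 0 in the code below) and `k` is a smoothing constant. A higher value of `k` flattens the differences between ranks, so a top position in any single ranking counts for less. A lower value of `k` makes the top positions dominate the fused score.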
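The calculation can be worked through on a toy example. The sketch below uses plain document IDs instead of LlamaIndex nodes and follows the notebook's convention of counting ranks from 0; the function name `reciprocal_rank_fusion` and the two rankings are hypothetical. The loop at the end prints how much more a first place is worth than a second place for two values of `k`.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(
    rankings: List[List[str]], k: float = 60.0
) -> Dict[str, float]:
    """Sum 1 / (rank + k) for each document over all rankings (rank starts at 0)."""
    fused: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            fused[doc_id] += 1.0 / (rank + k)
    # Highest fused score first.
    return dict(sorted(fused.items(), key=lambda item: item[1], reverse=True))


# Hypothetical rankings from two retrievers over documents "a" to "d".
vector_ranking = ["a", "b", "c"]
bm25_ranking = ["b", "d", "a"]

fused = reciprocal_rank_fusion([vector_ranking, bm25_ranking])
print(list(fused))  # → ['b', 'a', 'd', 'c']

# Compare the weight of first place against second place for two values of k.
for k in (1.0, 60.0):
    ratio = (1.0 / (0 + k)) / (1.0 / (1 + k))
    print(f"k={k}: first place is worth {ratio:.3f}x second place")
```

Document `b` wins because it ranks near the top in both lists, even though `a` is first in one of them.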

In [None]:
from typing import List
from llama_index.core.schema import NodeWithScore


def fuse_results(results_dict, similarity_top_k: int = 2) -> List[NodeWithScore]:
    """Fuse results."""
    k = 60.0  # `k` is a parameter used to control the impact of outlier rankings.
    fused_scores = {}
    text_to_node = {}

    # compute reciprocal rank scores
    for nodes_with_scores in results_dict.values():
        for rank, node_with_score in enumerate(
            sorted(
                nodes_with_scores, key=lambda x: x.score or 0.0, reverse=True
            )
        ):
            text = node_with_score.node.get_content()
            text_to_node[text] = node_with_score
            if text not in fused_scores:
                fused_scores[text] = 0.0
            fused_scores[text] += 1.0 / (rank + k)

    # sort results
    reranked_results = dict(
        sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    )

    # adjust node scores
    reranked_nodes: List[NodeWithScore] = []
    for text, score in reranked_results.items():
        reranked_nodes.append(text_to_node[text])
        reranked_nodes[-1].score = score

    return reranked_nodes[:similarity_top_k]

### Fusion Retriever 

Finally we create a FusionRetriever using the functions we've defined above and run a query against it.

In [None]:
from typing import List
from llama_index.core import QueryBundle
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore


class FusionRetriever(BaseRetriever):
    """Ensemble retriever with fusion."""

    def __init__(
        self,
        llm,
        retrievers: List[BaseRetriever],
        similarity_top_k: int = 2,
    ) -> None:
        """Init params."""
        self._retrievers = retrievers
        self._similarity_top_k = similarity_top_k
        self._llm = llm
        super().__init__()
        
    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        """Retrieve."""
        queries = generate_queries(
            self._llm, query_bundle.query_str, num_queries=4
        )
        results = run_queries(queries, self._retrievers)
        print("Queries:")
        for query in queries:
            print(query)
        final_results = fuse_results(
            results, similarity_top_k=self._similarity_top_k
        )

        print("Final Results:")
        for node_with_score in final_results:
            print(node_with_score)

        return final_results

In [None]:
fusion_retriever = FusionRetriever(
    llm, [vector_retriever, bm25_retriever], similarity_top_k=2
)

Alternatively, we can use the `QueryFusionRetriever` from LlamaIndex, which does the same thing behind the scenes.

In [None]:

from llama_index.core.retrievers import QueryFusionRetriever

fusion_retriever = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    similarity_top_k=2,
    num_queries=4,  # set this to 1 to disable query generation
    mode="reciprocal_rerank",
    use_async=True,
    verbose=True,
    # query_gen_prompt="...",  # we could override the query generation prompt here
)

Create `RetrieverQueryEngine` and run a query against it.

In [None]:
from llama_index.core.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine(fusion_retriever)

In [None]:
query_engine.query(query).response