### **Implementing Retrieval Ranking in Your RAG System (LangChain + ChromaDB)**  
To improve retrieval quality in your **RAG pipeline**, you can add a **retrieval ranking** step before passing the retrieved documents to the LLM, so that the **most relevant** documents are prioritized.

---

## **🔹 Why is Retrieval Ranking Important?**
When using a **vector database like ChromaDB**, similarity search returns the `k` documents whose embeddings are closest to the query embedding. Embedding distance is only a proxy for relevance, so these documents:
1. **May not be in the best order** for the query.
2. **May contain redundant or less useful information**.
3. **Benefit from re-ranking** so that the most relevant content comes first.

---

## **🔹 Methods for Retrieval Ranking**
There are multiple ways to rank retrieved documents:
1. **Re-rank using an LLM** (best for complex queries).
2. **Use BM25 (Lexical Ranking)** alongside embeddings.
3. **Apply a trained re-ranking model** (e.g., `ColBERT`, `BGE-reranker`).
4. **Score documents using a similarity function** (e.g., cosine similarity between the query embedding and each document embedding).
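
Method 4 is the only one not covered by the approaches below, so here is a minimal sketch of it. It assumes you already have the query and document embeddings as plain lists of floats; the names `cosine_similarity` and `rank_by_similarity` are illustrative helpers, not part of LangChain or ChromaDB.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_by_similarity(query_vec, doc_vecs, top_n=3):
    """Return the indices of the top_n document vectors, most similar first."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:top_n]]

# Toy 2-dimensional "embeddings" purely for illustration
query_vec = [1.0, 0.0]
doc_vecs = [[0.0, 1.0], [1.0, 0.1], [0.7, 0.7]]
order = rank_by_similarity(query_vec, doc_vecs)
print(order)  # [1, 2, 0]
```

In practice the vector store already orders results this way, so this method mostly helps when you want to re-score with a different embedding model than the one used for indexing.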

---

## **✅ Approach 1: LLM-Based Re-Ranking (Recommended)**
We can use an LLM to **re-rank** retrieved documents based on how well they match the query.

### **🔹 Steps:**
1. Retrieve `k` documents from ChromaDB.
2. Ask the LLM to rank them by relevance.
3. Use the **top `n` ranked documents** in the final response generation.

### **🔹 Implementation**
Modify your `query_index` function to **re-rank documents using an LLM**:

```python
from langchain.prompts import PromptTemplate
from langchain.schema import AIMessage
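from langchain.schema.runnable import RunnablePassthrough
from langchain_google_genai import ChatGoogleGenerativeAI  # requires the langchain-google-genai package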
import datetime
import logging

logger = logging.getLogger("logger")

def query_index(vectordb, query, chain_type="stuff", k=25, model_name="gemini-2.0-flash", 
                date_filter=None, rerank_top_n=10):
    """
    Queries the vectorstore with retrieval ranking.
    
    Args:
        vectordb: The vector database.
        query (str): The query string.
        chain_type (str): Unused in this version; kept so existing callers do not break.
        k (int): The number of documents to retrieve.
        model_name (str): The name of the language model.
        date_filter (str, optional): A date string (YYYY-MM-DD) to filter documents by. Defaults to None.
        rerank_top_n (int): Number of top documents to keep after re-ranking.
    
    Returns:
        str: The answer from the language model.
    """
    llm = ChatGoogleGenerativeAI(model=model_name, temperature=0.3)  

    # Create the retriever with filter if date is provided
    if date_filter:
        try:
            filter_date = datetime.datetime.strptime(date_filter, '%Y-%m-%d')
            formatted_date = filter_date.strftime('%Y-%m-%d')
            logger.info(f"Date filter with query: {formatted_date}")
        except ValueError:
            raise ValueError("Invalid date format. Please use YYYY-MM-DD.")

        filter_criteria = {"date": { "$eq": formatted_date}}
        retriever = vectordb.as_retriever(search_type="similarity", search_kwargs={"k": k, "filter": filter_criteria})
    else:
        retriever = vectordb.as_retriever(search_kwargs={"k": k})

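    # Retrieve initial `k` documents
    retrieved_docs = retriever.invoke(query)
    logger.info(f"Retrieved {len(retrieved_docs)} documents for query: {query}")

    # Number each document so the LLM can refer to it by index
    numbered_docs = "\n\n".join(
        f"[{i}] {doc.page_content}" for i, doc in enumerate(retrieved_docs, start=1)
    )

    # Define re-ranking prompt (every placeholder in the template must be declared)
    rerank_prompt = PromptTemplate(
        input_variables=["query", "documents", "rerank_top_n"],
        template=(
            "You are an AI assistant ranking documents based on their relevance to the query.\n\n"
            "Query: {query}\n\n"
            "Documents:\n{documents}\n\n"
            "Rank the documents from most relevant to least relevant. Return only the numbers "
            "of the top {rerank_top_n} documents as a comma-separated list, e.g. 3, 1, 7."
        ),
    )

    # Rank retrieved documents using the LLM
    rank_chain = rerank_prompt | llm | RunnablePassthrough()
    ranked_response = rank_chain.invoke(
        {"query": query, "documents": numbered_docs, "rerank_top_n": rerank_top_n}
    )

    # Parse the returned numbers; ignore duplicates and out-of-range values
    ranked_indices = []
    for token in ranked_response.content.replace(",", " ").split():
        token = token.strip("[]().")
        if token.isdigit():
            idx = int(token)
            if 1 <= idx <= len(retrieved_docs) and idx not in ranked_indices:
                ranked_indices.append(idx)

    # Fall back to the original retrieval order if the reply could not be parsed
    if not ranked_indices:
        ranked_indices = list(range(1, len(retrieved_docs) + 1))

    # Map the ranked numbers back to the actual documents and keep the top `n`
    ranked_docs = [retrieved_docs[i - 1] for i in ranked_indices[:rerank_top_n]]
    final_context = "\n\n".join(doc.page_content for doc in ranked_docs)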

    # Define final QA prompt
    qa_prompt = PromptTemplate(
        input_variables=["context", "question"],
        template=(
            "You are an AI assistant retrieving factual and structured information.\n"
            "Use the following retrieved documents to answer the question accurately.\n\n"
            "Context: {context}\n\n"
            "Question: {question}"
        ),
    )

    # Use RunnableSequence instead of deprecated LLMChain
    final_chain = qa_prompt | llm | RunnablePassthrough()
    response = final_chain.invoke({"context": final_context, "question": query})

    # Ensure response is extracted correctly
    if isinstance(response, AIMessage):
        return response.content  
    else:
        return str(response)  
```

---

### **🔹 What’s New in This Implementation?**
1. **Retrieve more documents (`k=25`) from ChromaDB.**
2. **Re-rank the documents using an LLM** to select the `rerank_top_n=10` most relevant.
3. **Use only top-ranked documents for the final response** to the user.

---

## **✅ Approach 2: Use BM25 for Hybrid Ranking**
If you want **lexical matching + embeddings**, use **BM25 + ChromaDB embeddings**:

### **Steps**
1. Retrieve top `k` documents using **ChromaDB embeddings**.
2. Use **BM25 (bag-of-words ranking)** on these documents.
3. Take **top-ranked** documents and pass them to the LLM.

### **Implementation**
- Install `rank_bm25`:
  ```bash
  pip install rank-bm25
  ```
- Add BM25 re-ranking to your pipeline:
  ```python
  from rank_bm25 import BM25Okapi
  from langchain.schema import Document

  def bm25_rerank(query, retrieved_docs, top_n=10):
      """Re-rank retrieved documents using BM25."""
      if not retrieved_docs:
          return []  # an empty corpus cannot be scored

      # Lowercase before splitting so that matching is case-insensitive
      tokenized_docs = [doc.page_content.lower().split() for doc in retrieved_docs]
      bm25 = BM25Okapi(tokenized_docs)
      scores = bm25.get_scores(query.lower().split())

      # Sort document indices by BM25 score, highest first
      ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
      return [retrieved_docs[i] for i in ranked[:top_n]]
  ```

Then, **modify `query_index` to use BM25 ranking** before passing docs to the LLM.
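
One way to wire this in is to replace the LLM ranking step with a single call that re-ranks and then builds the context string. The sketch below is an assumption about how you might structure it: `rerank_then_join` is a hypothetical helper, `Doc` is a stand-in for LangChain's `Document`, and `overlap_rerank` is a toy ranker used only so the example runs without `rank_bm25` installed.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    """Stand-in for a LangChain Document; only page_content is needed here."""
    page_content: str

def rerank_then_join(query, retrieved_docs, rerank_fn, top_n=10):
    """Re-rank the retrieved documents, then join the survivors into one context string."""
    ranked_docs = rerank_fn(query, retrieved_docs, top_n=top_n)
    return "\n\n".join(doc.page_content for doc in ranked_docs)

def overlap_rerank(query, docs, top_n=10):
    """Toy ranker (shared-word count). In the real pipeline, pass bm25_rerank instead."""
    terms = set(query.lower().split())
    by_overlap = sorted(
        docs,
        key=lambda d: len(terms & set(d.page_content.lower().split())),
        reverse=True,
    )
    return by_overlap[:top_n]

docs = [
    Doc("Chroma stores embeddings"),
    Doc("BM25 ranks by keyword overlap"),
    Doc("Unrelated text"),
]
context = rerank_then_join("bm25 keyword ranking", docs, overlap_rerank, top_n=2)
print(context)
```

Inside `query_index`, the equivalent change would be something like `final_context = rerank_then_join(query, retrieved_docs, bm25_rerank, rerank_top_n)` right after the retrieval step.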

---

## **🔹 Which Approach Should You Use?**
| **Method**       | **Pros** | **Cons** |
|------------------|---------|----------|
| **LLM-based Re-Ranking (Approach 1)** | Best for complex queries, uses language understanding | Slower, needs extra API calls |
| **BM25 + Embeddings (Approach 2)** | Improves lexical matching, fast & efficient | Doesn't handle semantics well |
| **ColBERT / BGE-reranker** | State-of-the-art accuracy, deep semantic ranking | Requires hosting an additional model; fine-tuning is optional |

To get the benefits of both, you can **combine the two methods**:  
1. Use **BM25 ranking first** to boost keyword relevance.  
2. Then **use LLM-based ranking** on the BM25-ranked results.  
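
The two steps above can be sketched as a small two-stage function: the cheap ranker trims the list so the expensive one sees fewer documents. Everything here is illustrative. `two_stage_rerank` is a hypothetical helper, the documents are plain strings for brevity, and both stage functions are stand-ins (a keyword counter in place of BM25, a length sort in place of the LLM ranker).

```python
def two_stage_rerank(query, docs, first_stage, second_stage, keep_first=15, keep_final=5):
    """Run a cheap ranker to build a shortlist, then an expensive ranker on the shortlist only."""
    shortlist = first_stage(query, docs, top_n=keep_first)
    return second_stage(query, shortlist, top_n=keep_final)

def keyword_stage(query, docs, top_n):
    """Stand-in for BM25: rank by how many query words each text shares."""
    terms = set(query.lower().split())
    return sorted(docs, key=lambda d: len(terms & set(d.lower().split())), reverse=True)[:top_n]

def shortest_first_stage(query, docs, top_n):
    """Placeholder for the LLM ranker: prefers shorter texts, purely for demonstration."""
    return sorted(docs, key=len)[:top_n]

docs = [
    "bm25 ranks documents by keyword overlap",
    "bm25 keyword",
    "embeddings capture meaning",
    "cooking recipes",
]
result = two_stage_rerank(
    "bm25 keyword", docs, keyword_stage, shortest_first_stage, keep_first=2, keep_final=1
)
print(result)  # ['bm25 keyword']
```

With real rankers, `keep_first` controls how many documents end up in the LLM prompt, so it is the main knob for trading cost against recall.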

---

## **✅ Final Takeaways**
- **Use LLM re-ranking** if you want **best semantic ranking**.
- **Use BM25 + embeddings** for **fast hybrid search**.
- **Try ColBERT/BGE-reranker** if you need **state-of-the-art retrieval ranking**.
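
For the third option, a rough sketch of cross-encoder re-ranking might look like the following. It assumes the `sentence-transformers` package and the `BAAI/bge-reranker-base` checkpoint; `rerank_with_scorer` and `load_bge_scorer` are hypothetical helper names. The scorer is passed in as a function so the ranking logic can be tried without downloading a model.

```python
def rerank_with_scorer(query, texts, score_fn, top_n=10):
    """Score each (query, text) pair and return the texts sorted by score, best first."""
    if not texts:
        return []
    scores = score_fn([(query, text) for text in texts])
    order = sorted(range(len(texts)), key=lambda i: scores[i], reverse=True)
    return [texts[i] for i in order[:top_n]]

def load_bge_scorer(model_name="BAAI/bge-reranker-base"):
    """Return a scoring function backed by a cross-encoder.

    Assumption: sentence-transformers is installed and the model weights can be downloaded.
    """
    from sentence_transformers import CrossEncoder  # imported lazily: optional dependency
    return CrossEncoder(model_name).predict

# Usage with the real model (downloads weights on first call):
#   texts = [doc.page_content for doc in retrieved_docs]
#   top_texts = rerank_with_scorer(query, texts, load_bge_scorer(), top_n=10)
```

Unlike the LLM approach, the cross-encoder returns one numeric score per document, so there is no free-text reply to parse.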

Would you like help setting up **ColBERT/BGE-reranker** in LangChain? 🚀