## Using MMR

Considering the performance of the naive baseline RAG and the baseline RAG, we can conclude that the RAG system is already performing well.

There is no need to perform **metadata filtering.**  
But we can do **one last thing** before finalizing: try **maximum marginal relevance (MMR) search**, which acts as a sort of weak re-ranker. We won't change anything except the search type, and then we will evaluate whether the contextual relevancy goes up by a considerable amount.

So we will use the same vector database that we used in the baseline RAG, with the same chunk size, the same overlap, and the same number of top-k documents, but with MMR search.
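
To make the idea concrete, here is a minimal sketch of greedy MMR selection in plain Python. It only illustrates the usual formulation and is not the code that Chroma or LangChain runs; the names `cosine` and `mmr_select` and the toy vectors are made up for this example. At each step it picks the candidate that maximizes `lambda_mult * relevance - (1 - lambda_mult) * redundancy`, where redundancy is the highest similarity to anything already picked.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, cand_vecs, k=3, lambda_mult=0.6):
    """Greedy MMR: return the indices of the chosen candidates in pick order."""
    relevance = [cosine(query_vec, c) for c in cand_vecs]
    selected = []
    remaining = list(range(len(cand_vecs)))

    while remaining and len(selected) < k:
        def score(i):
            # Redundancy = similarity to the closest already-selected candidate.
            redundancy = max(
                (cosine(cand_vecs[i], cand_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)

    return selected


# Toy data (assumption): candidates 0 and 1 are near-duplicates, 2 is different.
query = [1.0, 1.0, 0.0]
candidates = [[1.0, 0.2, 0.0], [1.0, 0.1, 0.0], [0.0, 1.0, 0.3]]

print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # [0, 1]
print(mmr_select(query, candidates, k=2, lambda_mult=0.6))  # [0, 2]
```

With `lambda_mult=1.0` the redundancy term vanishes and the result matches plain similarity ranking; lowering it trades some relevance for diversity, which is why the near-duplicate candidate is dropped in the second call.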

**Instead of deepeval we will use a custom evaluation**, because deepeval takes a lot of time. We only need to find out which retrieval technique is better, and for that we can use our custom RAG evaluation pipeline.
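
The `custom_eval` module itself is not shown in this notebook, so the following is only a guess at the general shape such a pairwise comparison could take. For every query a judge decides which of two contexts is more relevant, and the wins are reported as percentages. The function names and the keyword-overlap judge are hypothetical stand-ins; in practice the judge would be an LLM call.

```python
from typing import Callable, Sequence


def keyword_overlap_judge(query: str, ctx_1: str, ctx_2: str) -> int:
    """Toy stand-in for an LLM judge: prefer the context sharing more words with the query."""
    words = set(query.lower().split())
    score_1 = len(words & set(ctx_1.lower().split()))
    score_2 = len(words & set(ctx_2.lower().split()))
    return 1 if score_1 >= score_2 else 2


def compare_retrievals(
    queries: Sequence[str],
    contexts_1: Sequence[str],
    contexts_2: Sequence[str],
    judge: Callable[[str, str, str], int],
) -> dict:
    """Return the percentage of queries for which each context type was preferred."""
    if not (len(queries) == len(contexts_1) == len(contexts_2)):
        raise ValueError("queries and both context lists must have the same length")
    if not queries:
        raise ValueError("at least one query is required")

    wins = {1: 0, 2: 0}
    for query, ctx_1, ctx_2 in zip(queries, contexts_1, contexts_2):
        verdict = judge(query, ctx_1, ctx_2)
        if verdict not in wins:
            raise ValueError(f"judge must return 1 or 2, got {verdict!r}")
        wins[verdict] += 1

    total = len(queries)
    return {
        "retrieval_type_1": round(100 * wins[1] / total, 1),
        "retrieval_type_2": round(100 * wins[2] / total, 1),
    }


# Made-up toy inputs, only to show the call.
toy_queries = ["red apple price", "blue sky colour"]
toy_ctx_1 = ["green pear", "the sky is blue in colour"]
toy_ctx_2 = ["the red apple price is low", "nothing relevant"]
print(compare_retrievals(toy_queries, toy_ctx_1, toy_ctx_2, keyword_overlap_judge))
```

Because the two percentages come from the same set of verdicts, they always sum to 100, so a small question set can swing the result by several points per question.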

### 1. RAG with MMR and Similarity Search

In [1]:
from langchain_ollama import OllamaEmbeddings
from langchain_chroma import Chroma
from langchain_ollama import ChatOllama
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

In [2]:
db_dir = 'chroma_dbs/baseline_db_v2'
embeddings = OllamaEmbeddings(model='nomic-embed-text:v1.5')

In [3]:
# Retriever for similarity search
def retrieve_docs_similarity(query, db_dir, embed_func=embeddings):
    """Retrieves chunks and combines them into single string!"""

    info_fetched = ""
    db = Chroma(
        persist_directory=db_dir, embedding_function=embed_func
    )
    # Plain similarity search: the top-k chunks closest to the query
    docs = db.similarity_search(query=query, k=3)
    for doc in docs:
        info_fetched += doc.page_content

    return info_fetched

In [4]:
model = ChatOllama(model='gemma3:4b', temperature=0.8)

In [5]:
# Function for augmenting and generating with similarity search
def augment_generate_similarity(query, model):
    context = retrieve_docs_similarity(query=query, db_dir=db_dir, embed_func=embeddings)
    prompt_t = ChatPromptTemplate.from_messages(
        messages=[
            ("system", "You are a helpful assistant who uses BNS(Bhartiya Nyaay Sanhita). Use the provided context to answer user query. BNS Context: {context}"),
            ("system", "You can reply within 150 words"),
            ("human", "{query}")
        ]
    )
    chain = prompt_t | model | StrOutputParser()
    res = chain.invoke({'query':query, 'context':context})
    return res

In [7]:
fetched_docs = retrieve_docs_similarity(
    db_dir=db_dir,
    embed_func=embeddings,
    query='What is considered counterfeiting of coins, Government stamps, or currency notes, and what is the punishment?'
)

len(fetched_docs)

1492

In [None]:
fetched_docs

'coin, Government stamps, currency -notes or bank - notes. 178. Whoever counterfeits, or knowingly performs any part of the process of counterfeiting, any coin, stamp issued by Government for the purpose of revenue, currency -note or bank -note, shall be punished with imprisonment for life, or with imprisonment of either description for a term which may extend to ten years, and shall also be liable to fine. Explanation.—For the purposes of this Chapter,— (1) the expression “bank -note” means aUsing as genuine, forged or counte rfeit coin, Government stamp, currency-notes or bank-notes. 179. Whoever imports or exports, or sells or delivers to, or buys or receives from, any other person, or otherwise traffics or uses as genuine, any forged or counterfeit coin, stamp, currency-note or bank-note, knowing or having reason to believe the same to be forged or counterfeit, shall be punished with imprisonment for life, or with imprisonment of either description for a term which may extend towit
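
Note that the fetched text above runs chunks together at their boundaries (for example `means aUsing as genuine`), because the chunks are concatenated with no separator. A possible variation, sketched here under the assumption that each retrieved document exposes a `page_content` string, joins the chunks with a blank line instead. `FakeDoc` and `join_chunks` are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a retrieved document; only page_content is used."""
    page_content: str


def join_chunks(docs, separator="\n\n"):
    """Combine chunk texts into one string, keeping a visible boundary between them."""
    return separator.join(doc.page_content.strip() for doc in docs)


docs = [FakeDoc("first chunk ends here"), FakeDoc("Second chunk starts here")]
print(join_chunks(docs))
```

Whether a separator helps the model is something we would have to measure; it also changes the length of the combined string reported above.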

In [None]:
print("\nGenerating...\n")
res = augment_generate_similarity(
    query='What is considered counterfeiting of coins, Government stamps, or currency notes, and what is the punishment?',
    model=model
)

print(res)


Generating...

According to the Bharatiya Nyay Sanhita (BNS), counterfeiting encompasses several actions related to coins, Government stamps, and currency notes (including bank notes).

**Counterfeiting Defined:**

*   **Coin:** Counterfeiting a coin means creating a fake coin or performing any part of the process to do so.
*   **Government Stamps/Currency Notes/Bank Notes:** This includes forging, creating fake versions of, or knowingly using any Government stamp, currency note, or bank note as if it were genuine.

**Punishment:**

The punishment varies depending on the offense:

*   **178 (Counterfeiting):** Imprisonment for life, or up to 10 years imprisonment and a fine.
*   **179 (Trading in Forged/Counterfeit Items):** Imprisonment for life, or up to 7 years imprisonment, a fine, or both.
*   **181 (Making/Mending Instruments):** Making or mending instruments for forging or counterfeiting carries a punishment, though the specific details aren't provided in the excerpt.

**Import

In [None]:
# Retriever for MMR search
def retrieve_docs_mmr(query, db_dir, embed_func=embeddings):
    """Retrieves chunks and combines them into single string!"""

    info_fetched = ""
    db = Chroma(
        persist_directory=db_dir, embedding_function=embed_func
    )
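    # Maximum marginal relevance search (MMR):
    # fetch_k candidates are fetched first, then k of them are selected.
    # lambda_mult near 1 favours relevance, near 0 favours diversity.
    docs = db.max_marginal_relevance_search(query=query, lambda_mult=0.6, fetch_k=12, k=3)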
    for doc in docs:
        info_fetched += doc.page_content

    return info_fetched

In [None]:
# Function for augmenting and generating with maximum marginal relevance search
def augment_generate_mmr(query, model):
    context = retrieve_docs_mmr(query=query, db_dir=db_dir, embed_func=embeddings)
    prompt_t = ChatPromptTemplate.from_messages(
        messages=[
            ("system", "You are a helpful assistant who uses BNS(Bhartiya Nyaay Sanhita). Use the provided context to answer user query. BNS Context: {context}"),
            ("system", "You can reply within 150 words"),
            ("human", "{query}")
        ]
    )
    chain = prompt_t | model | StrOutputParser()
    res = chain.invoke({'query':query, 'context':context})
    return res

### 2. Using custom evaluation to find the better retrieval technique.

In [None]:
from custom_eval import RetrievalSearchComparision
from evaluation_dataset import questions

In [None]:
context_1 = []  # contexts from similarity search
context_2 = []  # contexts from MMR search
for question in questions:
    cnt_1 = retrieve_docs_similarity(query=question, db_dir=db_dir, embed_func=embeddings)
    cnt_2 = retrieve_docs_mmr(query=question, db_dir=db_dir, embed_func=embeddings)
    context_1.append(cnt_1)
    context_2.append(cnt_2)

In [None]:
dataset = RetrievalSearchComparision(
    queries=questions,
    context_type_1=context_1,
    context_type_2=context_2
)

In [None]:
judge_llm = ChatOllama(model='mistral', temperature=0.0)
res = dataset.compare_contexts(
    model=judge_llm
)
res

Evaluating question number - 1
Evaluating question number - 2
Evaluating question number - 3
Evaluating question number - 4
Evaluating question number - 5
Evaluating question number - 6
Evaluating question number - 7
Evaluating question number - 8
Evaluating question number - 9
Evaluating question number - 10
Evaluating question number - 11
Evaluating question number - 12
Evaluating question number - 13
Evaluating question number - 14
Evaluating question number - 15
Evaluating question number - 16
Evaluating question number - 17
Evaluating question number - 18
Evaluating question number - 19
Evaluating question number - 20


{'retrieval_type_1': 90.0, 'retrieval_type_2': 10.0}

### 3. Log the results.

In [None]:
from helpers import Utility

In [None]:
Utility.log_custom_experiment(
    id='baseline-vs-reranking',
    path='../logs/log.json',
    log_data={
        "eval-score":res,
        "eval-score-description":"Percentage of relevant context from similarity search is 90 percent. And it's 10 percent for mmr. So even weak re-ranking is making it worse.",
        "log-commit-message":"Comparing similarity search and mmr",
        "log-description":"chunk-size:500, chunk-overlap:50, splitter:recursive char text, search-type:similarity/mmr(fetch_k=12) with k=3, reranker:True(mmr), metadata-filtering:false, number of queries and context:50, rag-llm:Ollama-gemma3:4b, judge-llm:Ollama-mistral-7b, eval_tool:custom_eval",
        "date": {
            "day": 30,
            "month": 1,
            "year": 2026
            }
    }
)

Added succesfully!


**Conclusion:** Similarity search scored 90% against 10% for MMR in the custom evaluation, so we won't use a re-ranker and will go with simple similarity search.