# RAG with Multiple Search Modes
This notebook compares retrieval strategies in ChromaDB and shows how each one affects a simple RAG pipeline.

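We will experiment with top-k similarity search (cosine similarity, provided the collection was created with that metric), maximal marginal relevance (MMR), and a naive hybrid approach that merges the two.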
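
To make the difference between similarity search and MMR concrete, here is a minimal NumPy sketch of greedy MMR selection. It is an illustration under stated assumptions, not the code ChromaDB or LangChain runs internally: the function name `mmr_select`, the toy vectors, and the `lambda_mult` value are all hypothetical. At each step it picks the candidate with the best balance of relevance to the query and dissimilarity to what has already been picked.

```python
import numpy as np

def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return indices of up to k documents chosen by greedy MMR (illustrative)."""
    # Normalise so that dot products are cosine similarities.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    relevance = d @ q
    selected = [int(np.argmax(relevance))]  # start with the most relevant document
    while len(selected) < min(k, len(d)):
        # Redundancy: highest similarity to any document already selected.
        redundancy = (d @ d[selected].T).max(axis=1)
        score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
        score[selected] = -np.inf  # never pick the same document twice
        selected.append(int(np.argmax(score)))
    return selected

# Toy data (hypothetical): documents 0 and 1 are near-duplicates, document 2 differs.
docs = np.array([[1.0, 0.0, 0.0], [1.0, 0.05, 0.0], [0.0, 1.0, 0.5]])
query = np.array([1.0, 1.0, 0.0])

cosine = (docs / np.linalg.norm(docs, axis=1, keepdims=True)) @ (query / np.linalg.norm(query))
top2_similarity = np.argsort(-cosine)[:2].tolist()
top2_mmr = mmr_select(query, docs, k=2)
print(top2_similarity, top2_mmr)
```

On this toy data, plain similarity returns the two near-duplicates, while MMR keeps the top hit and swaps the duplicate for the dissimilar document.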

In [None]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA

DB_DIR = 'rag_db'
embeddings = HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2')

# Open the persisted vector store built in the basic notebook (same documents)
vectordb = Chroma(persist_directory=DB_DIR, embedding_function=embeddings)

In [None]:
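from langchain.retrievers import EnsembleRetriever

def ask(query, mode='similarity'):
    """Answer `query` using the retrieval strategy named by `mode`."""
    if mode == 'mmr':
        # MMR trades some relevance for diversity among the returned chunks
        retriever = vectordb.as_retriever(search_type='mmr', search_kwargs={'k': 3})
    elif mode == 'hybrid':
        # Naive hybrid: fuse plain similarity hits with MMR hits
        similarity = vectordb.as_retriever(search_kwargs={'k': 2})
        # With k=1, MMR would only repeat the top similarity hit, so request more
        diverse = vectordb.as_retriever(search_type='mmr', search_kwargs={'k': 3})
        # EnsembleRetriever (assumed available in the installed LangChain version)
        # merges both lists with weighted reciprocal rank fusion and drops duplicates
        retriever = EnsembleRetriever(retrievers=[similarity, diverse], weights=[0.5, 0.5])
    else:
        # Default: top-k similarity search
        retriever = vectordb.as_retriever(search_kwargs={'k': 3})
    qa = RetrievalQA.from_chain_type(llm=OpenAI(temperature=0), chain_type='stuff', retriever=retriever)
    return qa({'query': query})['result']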

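The hybrid mode needs some rule for merging two ranked result lists into one. One common choice is reciprocal rank fusion (RRF), sketched below in plain Python. This is an illustrative sketch only: the function name, the constant `c=60`, and the document IDs are assumptions and do not come from the notebook's data.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, c=60):
    """Merge several ranked lists of document IDs into one fused ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents ranked highly in any list earn a larger contribution.
            scores[doc_id] += 1.0 / (c + rank)
    # Highest fused score first; each document appears once.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical result lists from a similarity retriever and an MMR retriever.
similarity_hits = ["cats", "hamsters", "dogs"]
mmr_hits = ["cats", "fish", "hamsters"]
fused = reciprocal_rank_fusion([similarity_hits, mmr_hits])
print(fused)
```

Documents that appear in both lists rise to the top, and duplicates collapse into a single entry, so the prompt does not receive the same chunk twice.
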
In [None]:
for mode in ['similarity','mmr','hybrid']:
    answer = ask('Which pets are good for apartments?', mode=mode)
    print(f'--- {mode} ---')
    print(answer)


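Similarity search usually retrieves the most relevant chunks, but near-duplicate passages can crowd out other useful context. MMR penalizes candidates that resemble chunks already selected, trading a little relevance for diversity, while the hybrid approach merges the results of both. Timings and token counts are not measured here; they will vary with the dataset size and with how many chunks each mode passes to the LLM.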
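
To put rough numbers on the timing differences, a small wall-clock helper can wrap the `ask` function. The sketch below is a hypothetical helper, not part of the original notebook. It uses a stand-in `fake_ask` so it runs without a vector store or an API key; swap in the real `ask` to measure the actual pipeline.

```python
import time

def time_modes(ask_fn, query, modes=("similarity", "mmr", "hybrid")):
    """Call ask_fn(query, mode=...) once per mode and record wall-clock seconds."""
    results = {}
    for mode in modes:
        start = time.perf_counter()
        answer = ask_fn(query, mode=mode)
        results[mode] = {"seconds": time.perf_counter() - start, "answer": answer}
    return results

# Stand-in for the notebook's ask(); replace with the real function to measure it.
def fake_ask(query, mode="similarity"):
    return f"[{mode}] {query}"

timings = time_modes(fake_ask, "Which pets are good for apartments?")
for mode, info in timings.items():
    print(f"{mode}: {info['seconds']:.4f}s")
```

A single call per mode is noisy, since LLM latency dominates and fluctuates between requests, so averaging over several queries gives a fairer comparison.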