# MultiQueryRetriever

Distance-based vector database retrieval embeds a query in a high-dimensional space and returns the documents whose embeddings lie closest to it. Small changes in query wording, or embeddings that capture the semantics poorly, can therefore change the results. The MultiQueryRetriever automates prompt tuning: it uses an LLM to generate several variants of the user's query, retrieves relevant documents for each variant, and takes the unique union of the results. This can offset some limitations of distance-based retrieval and yield a more comprehensive set of results.
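
The generate, retrieve, and merge steps can be sketched without any LangChain dependency. In the minimal sketch below, `generate_queries` and `toy_retrieve` are hypothetical stand-ins for the LLM and the vector store (the toy retriever ranks by shared words, not by embedding distance), and `CORPUS` is made-up data. Only the merge logic, a de-duplicated union in first-seen order, is meant to mirror the idea described above.

```python
from typing import Callable, List

# Hypothetical toy corpus standing in for a vector database.
CORPUS = [
    "Linear regression predicts a continuous value.",
    "Logistic regression predicts a class probability.",
    "Gradient descent minimizes a cost function.",
]

def generate_queries(question: str) -> List[str]:
    """Stand-in for the LLM: returns the question plus hand-written rephrasings."""
    return [
        question,
        "linear regression continuous output",
        "logistic regression class probability",
    ]

def toy_retrieve(query: str, k: int = 1) -> List[str]:
    """Stand-in retriever: ranks documents by the number of words shared with the query."""
    words = set(query.lower().split())
    scored = [
        (len(words & set(doc.lower().rstrip(".").split())), doc) for doc in CORPUS
    ]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked[:k] if score > 0]

def multi_query_retrieve(
    question: str,
    generate: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    """Retrieve for every generated query and return the unique union of documents."""
    seen = set()
    merged = []
    for query in generate(question):
        for doc in retrieve(query):
            if doc not in seen:  # skip documents already returned by another query
                seen.add(doc)
                merged.append(doc)
    return merged

question = "difference between linear and logistic regression"
single = toy_retrieve(question)
docs = multi_query_retrieve(question, generate_queries, toy_retrieve)
```

With one result per query, the original question alone surfaces only the linear regression document, while the merged result also contains the logistic regression one, because a rephrased query ranks it first.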

In [None]:
!pip install langchain
!pip install langchain-groq
!pip install langchain-openai
!pip install langchain-core
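!pip install langchain-community
# Needed by HuggingFaceBgeEmbeddings and the FAISS vector store used below
!pip install sentence-transformers faiss-cpu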

In [None]:
# Set logging for the queries
import logging

logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

In [None]:
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.embeddings import HuggingFaceBgeEmbeddings

llm = ChatGroq(model="meta-llama/llama-4-scout-17b-16e-instruct")

embedding_model = HuggingFaceBgeEmbeddings(
    model_name="BAAI/bge-small-en-v1.5",
    model_kwargs={'device': 'cpu'},
    encode_kwargs={'normalize_embeddings': True},
)
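
Setting `normalize_embeddings` to `True` scales every embedding to unit length, so the inner product of two embeddings equals their cosine similarity. The sketch below illustrates this with two made-up 2-D vectors rather than real BGE embeddings; the helper names are hypothetical and use only the standard library.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(vec):
    """Scale a vector to unit (L2) length."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]

def cosine(a, b):
    """Cosine similarity computed from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Made-up vectors standing in for two embeddings.
a = [3.0, 4.0]
b = [4.0, 3.0]

unit_a, unit_b = normalize(a), normalize(b)

cosine_similarity = cosine(a, b)            # computed on the raw vectors
inner_product = dot(unit_a, unit_b)         # computed on the normalized vectors
```

Both values agree up to floating-point error, which is why a similarity search over normalized embeddings ranks documents by the angle between vectors rather than by their magnitude.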

## From Scratch: Generating Alternate Queries with a Prompt

In [None]:
multi_query_template = """
You are a helpful assistant that generates multiple alternate search queries from a user's input query.
These alternate queries will be used to run a semantic search against a vector database using similarity metrics.
Generate 5 alternate queries that capture the intent of the user's input query given below.

{user_query}

Strictly return only the alternate queries, separated by new lines.
"""

multi_query_prompt = ChatPromptTemplate.from_template(multi_query_template)

# Creating a chain

multi_query_chain = (
    multi_query_prompt
    | llm
    | StrOutputParser()
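    # Keep only non-empty lines; the filter must test each line, not the whole string
    | (lambda text: [line.strip() for line in text.split("\n") if line.strip()])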
)

# Generating Alternate Queries
multi_query_chain.invoke("Suggest me items to sell in my new outdoor adventure and expedition utilities shop.")

# ['Outdoor adventure equipment shop inventory ideas',
#  'Outdoor expedition gear store product suggestions',
#  'Best selling items for an outdoor adventure shop',
#  'Essential equipment for outdoor expeditions',
#  'Products to stock in an outdoor adventure and expedition store']
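
Models do not always follow the "one query per line" instruction exactly: the raw output may contain blank lines or list markers such as `1.` or `-`. A small parser can absorb that. The sketch below is a hypothetical helper (`parse_queries` is not part of LangChain), and `raw_output` is a made-up model response used only for illustration.

```python
import re
from typing import List

def parse_queries(raw: str) -> List[str]:
    """Split raw LLM output into clean queries, one per non-empty line."""
    queries = []
    for line in raw.splitlines():
        # Drop a leading list marker ("1.", "2)", "-", "*") followed by whitespace.
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s+", "", line).strip()
        if cleaned:  # skip blank lines
            queries.append(cleaned)
    return queries

# Made-up model response with blank lines and mixed list markers.
raw_output = (
    "\n1. Outdoor gear ideas\n\n"
    "2) Best selling camping items\n"
    "- Expedition equipment list\n"
)
queries = parse_queries(raw_output)
```

Because LangChain's pipe operator coerces plain callables into runnables, a function like this could take the place of the final lambda in the chain (`... | StrOutputParser() | parse_queries`), assuming the same chain layout as above.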

## From LangChain: MultiQueryRetriever (Combine Query & Retrieve)

In [None]:
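from langchain_community.vectorstores import FAISS

db_file_name = './vectordb_path/ml-andrew-ng/'
# Recent langchain-community releases require opting in to pickle loading; do so only for indexes you built yourself.
vectordb = FAISS.load_local(
    folder_path=db_file_name,
    embeddings=embedding_model,
    allow_dangerous_deserialization=True,
)
retriever = vectordb.as_retriever()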

In [None]:
from langchain.retrievers.multi_query import MultiQueryRetriever

# Wrap the base retriever so the LLM generates query variants before each search
retriever_from_llm = MultiQueryRetriever.from_llm(retriever=vectordb.as_retriever(), llm=llm)

In [None]:
question = "What is the difference between Linear Regression and Logistic Regression?"
# invoke() supersedes the deprecated get_relevant_documents()
unique_docs = retriever_from_llm.invoke(question)
len(unique_docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. How do Linear Regression and Logistic Regression differ from each other?', '2. In what ways do Linear Regression and Logistic Regression vary?', '3. Can you explain the distinctions between Linear Regression and Logistic Regression?']


7

In [None]:
# The retriever rephrases the question in several ways and uses each rephrased query to retrieve documents from the vector database.
# This captures different aspects of the question and returns a more diverse set of relevant documents.
# The queries generated for the question above were:
# 1. How do Linear Regression and Logistic Regression differ from each other?
# 2. In what ways do Linear Regression and Logistic Regression vary?
# 3. Can you explain the distinctions between Linear Regression and Logistic Regression?