# Multi Query Strategy Evaluation
The simple pipeline can be improved with a Multi Query strategy. The goal is to improve the performance of the Retrieval Augmented Generation (RAG) pipeline, particularly the context recall metric. The key idea is to retrieve the documents most relevant to answering the user's query, under the hypothesis that the question the user intends to ask may differ from the query as written.

To address this, the Multi Query strategy rewrites the user's query 5 times from different perspectives, so that relevant documents missed by the original wording can still be found. Text chunks are then retrieved for each rewritten query. Finally, the unique union of all retrieved documents is used as the context for answering the original user query.
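
The retrieve-and-union step described above can be sketched in plain Python. This is a minimal illustration, not the code in `pipelines.multi_query_pipeline` (which is not shown here): `multi_query_retrieve`, `toy_rewrite`, `toy_retrieve`, and `TOY_CHUNKS` are hypothetical names, the rewriter stands in for an LLM call, and the keyword retriever stands in for the vector store.

```python
from typing import Callable, List

# Toy corpus standing in for the vector database (illustrative text only).
TOY_CHUNKS = [
    "termination requires written notice",
    "notice must be sent by email",
    "fees are due on termination",
    "governing law is stated in section 9",
]


def toy_rewrite(question: str, n: int) -> List[str]:
    """Stand-in for an LLM that rewrites the question n times."""
    suffixes = ["", " notice", " fees", " email", " written"]
    return [(question + s).strip() for s in suffixes[:n]]


def toy_retrieve(query: str) -> List[str]:
    """Stand-in retriever: return chunks sharing at least one word with the query."""
    words = set(query.lower().split())
    return [c for c in TOY_CHUNKS if words & set(c.lower().split())]


def multi_query_retrieve(
    question: str,
    rewrite: Callable[[str, int], List[str]],
    retrieve: Callable[[str], List[str]],
    n_rewrites: int = 5,
) -> List[str]:
    """Retrieve chunks for each rewritten query and return their unique union."""
    seen = set()
    union: List[str] = []
    for query in rewrite(question, n_rewrites):
        for chunk in retrieve(query):
            if chunk not in seen:  # keep only the first occurrence of each chunk
                seen.add(chunk)
                union.append(chunk)
    return union


context = multi_query_retrieve("termination", toy_rewrite, toy_retrieve)
```

Keeping first-seen order while de-duplicating means chunks matched by the earliest rewrites come first in the context. A real pipeline may order or rank the union differently.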

The motivation behind this approach is to better capture the user's underlying informational need, even if it is not fully reflected in the initial query formulation. By diversifying the queries and aggregating the retrieved contexts, the system aims to improve the overall performance and relevance of the responses.

In [5]:
# Importing libraries
import sys
from dotenv import load_dotenv
import json
import pandas as pd
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
sys.path.insert(1, '/home/jabez/week_11/Contract-Advisor-RAG')
load_dotenv()
sys.path.insert(1, '/home/jabez/rizzbuzz with poetry/RAG-Optimization-System/scripts')
import file_loader 
import pipelines 
import evaluation

In [3]:
# Load JSON from file
json_path = '../filepath.json'

with open(json_path, 'r') as json_file:
    file_paths = json.load(json_file)
data_file_path = file_paths['data_file_path']
synthetic_test_data_path = file_paths['synthetic_test_data_path']

# loading data
data = file_loader.load_csv(data_file_path)

# loading synthetic test data
synthetic_test_data = pd.read_csv(synthetic_test_data_path)

# loading the persist directory for the smaller-chunk vector db
persist_directory_for_smaller_chunck_vector_db = file_paths['persist_directory_for_smaller_chunck_vector_db']


In [6]:
# Load a Chroma database
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
smaller_chunck_vector_db = Chroma(persist_directory=persist_directory_for_smaller_chunck_vector_db, embedding_function=embeddings)

# Setting the retriever
retriver = smaller_chunck_vector_db.as_retriever(search_type="similarity", search_kwargs={"k": 6})

# Adding answers to the test data using the multi query pipeline
multiquery_answer = evaluation.adding_answer_to_testdata(synthetic_test_data,pipelines.multi_query_pipeline,smaller_chunck_vector_db, retriver)


In [7]:
# Evaluating the multi query pipeline answers
multiquery_rag_evaluation_result = evaluation.ragas_evaluator(multiquery_answer)

Evaluating: 100%|██████████| 80/80 [00:50<00:00,  1.58it/s]


In [8]:
# Mean evaluation scores for the multi query pipeline (chunk size 500)
result = evaluation.evaluation_mean(multiquery_rag_evaluation_result)

context_precision: 88.74%, faithfulness: 79.81%, answer_relevancy: 95.55%, context_recall: 86.67%


context_precision: 92.99%, faithfulness: 85.84%, answer_relevancy: 95.41%, context_recall: 89.5%
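
The percentages printed above are presumably per-metric averages over all evaluated rows. `evaluation.evaluation_mean` is not shown here, so the following is only a sketch of that computation under that assumption. `metric_means_percent` is a hypothetical name, and the two rows are toy values, not results from this run.

```python
from statistics import fmean
from typing import Dict, Iterable, List

METRICS = ("context_precision", "faithfulness", "answer_relevancy", "context_recall")


def metric_means_percent(
    rows: List[Dict[str, float]], metrics: Iterable[str] = METRICS
) -> Dict[str, float]:
    """Average each metric over all rows and express it as a percentage."""
    return {m: round(100 * fmean(row[m] for row in rows), 2) for m in metrics}


# Toy per-question scores (assumed values for illustration only).
toy_rows = [
    {"context_precision": 1.0, "faithfulness": 0.5, "answer_relevancy": 0.9, "context_recall": 1.0},
    {"context_precision": 0.5, "faithfulness": 1.0, "answer_relevancy": 0.7, "context_recall": 0.5},
]

means = metric_means_percent(toy_rows)
print(", ".join(f"{name}: {value}%" for name, value in means.items()))
```

With a pandas DataFrame of scores, the same averages could be taken with `DataFrame.mean()` over the four metric columns.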


In [10]:
multiquery_rag_evaluation_result[['context_precision','faithfulness','answer_relevancy','context_recall']]

index,context_precision,faithfulness,answer_relevancy,context_recall
0,1.0,0.8,0.882616,1.0
1,1.0,1.0,0.895578,1.0
2,1.0,0.727273,0.997624,1.0
3,1.0,1.0,0.995994,0.5
4,0.755556,1.0,0.987858,1.0
5,1.0,0.75,0.980358,1.0
6,1.0,1.0,0.955961,1.0
7,0.926667,0.666667,0.9946,0.833333
8,1.0,1.0,0.983625,1.0
9,0.916667,1.0,1.0,1.0
