# Multi Query Strategy Evaluation
The simple pipeline approach can be improved by using a Multi Query strategy. The goal of this strategy is to improve the Retrieval Augmented Generation (RAG) pipeline, particularly its context recall. The key idea is to retrieve the documents that are most relevant for answering the user's query. The hypothesis is that the question the user intends to ask may differ from the query as they wrote it.

To address this, the Multi Query strategy rewrites the user's query five times from different perspectives, so that relevant documents missed by the original wording can still be found. Chunks of text are then retrieved for each rewritten query. Finally, the unique union of all retrieved chunks is used as the context for answering the original user query.
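
The rewrite, retrieve, and union steps can be sketched in a few lines. This is a minimal illustration, not the code behind `pipelines.multi_query_pipeline`: the function `multi_query_retrieve`, the fixed `REWRITES` list, and the `RETRIEVED` lookup table are hypothetical stand-ins for an LLM rewriter and a vector store retriever.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the LLM: five fixed rewrites of one question.
REWRITES: List[str] = [
    "What caused the crash?",
    "Was pilot error involved?",
    "What was the weather like?",
    "Was the weather briefing adequate?",
    "Why did the plane go down?",
]

# Hypothetical stand-in for the vector store: chunks returned per rewrite.
RETRIEVED: Dict[str, List[str]] = {
    REWRITES[0]: ["chunk A", "chunk B"],
    REWRITES[1]: ["chunk A"],
    REWRITES[2]: ["chunk C"],
    REWRITES[3]: ["chunk C", "chunk D"],
    REWRITES[4]: ["chunk B", "chunk A"],
}


def multi_query_retrieve(
    question: str,
    rewrite: Callable[[str, int], List[str]],
    retrieve: Callable[[str], List[str]],
    n_rewrites: int = 5,
) -> List[str]:
    """Retrieve chunks for every rewritten query and return their unique union."""
    seen = set()
    union: List[str] = []
    for query in rewrite(question, n_rewrites):
        for chunk in retrieve(query):
            if chunk not in seen:  # keep the first occurrence, preserve order
                seen.add(chunk)
                union.append(chunk)
    return union


context = multi_query_retrieve(
    "What factors caused the plane crash?",
    rewrite=lambda question, n: REWRITES[:n],
    retrieve=lambda query: RETRIEVED[query],
)
print(context)
```

Chunks returned by several rewrites appear only once in `context`, so the context passed to the generator grows only when a rewrite surfaces something new.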

The motivation behind this approach is to better capture the user's underlying information need, even when the initial query does not fully express it. By diversifying the queries and aggregating the retrieved contexts, the system aims to improve the relevance of the retrieved context and, as a result, the quality of the responses.

In [1]:
# Importing libraries
import sys

import pandas as pd
from dotenv import load_dotenv

# Make the project modules importable
sys.path.insert(1, '/home/jabez/week_11/Contract-Advisor-RAG')
sys.path.insert(1, '/home/jabez/rizzbuzz with poetry/RAG-Optimization-System/scripts')
load_dotenv()  # load environment variables from a .env file

import file_loader
import pipelines
import evaluation

In [2]:
# Loading data
file_path = '/home/jabez/rizzbuzz with poetry/RAG-Optimization-System/data/cnn_dailymail_3.0.0.csv'
data = file_loader.load_csv(file_path)

In [3]:
# Setting character text splitter
chunk_size = 1000
chunk_overlap = 250
vectorstore_character = file_loader.character_text_splitter(data, chunk_size, chunk_overlap)
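
To show what `chunk_size` and `chunk_overlap` control, here is a minimal sliding-window sketch. It is an assumption about the general idea only: the internals of `file_loader.character_text_splitter` are not shown in this notebook, and library splitters usually also split on separators before merging. The function name `split_by_characters` is hypothetical.

```python
from typing import List


def split_by_characters(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # the last window reached the end
            break
    return chunks


chunks = split_by_characters("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
print(chunks)
```

With the notebook's settings (1000 and 250), each window in this sketch would advance by 750 characters, so text near a chunk boundary appears in two neighbouring chunks.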

In [3]:
# Setting semantic text splitter
vectorstore_semantic = file_loader.semantic_text_splitter(data)

In [8]:
# Loading synthetic test data
syntetic_test_data = pd.read_csv('/home/jabez/rizzbuzz with poetry/RAG-Optimization-System/test_data/syntetic_test_data.csv')

In [6]:
# Setting retriever for character-based chunking
retriver = vectorstore_character.as_retriever(search_type="similarity", search_kwargs={"k": 6})

In [6]:
# Setting retriever for semantic-based chunking
retriver_semantic = vectorstore_semantic.as_retriever(search_type="similarity", search_kwargs={"k": 6})
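
A similarity retriever with `k = 6` returns the six stored chunks whose embeddings are closest to the query embedding. The sketch below illustrates that ranking on toy two-dimensional vectors. It assumes cosine similarity, while the metric actually used depends on the vector store backend, which this notebook does not show. The names `cosine_similarity`, `top_k`, and `index` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    indexed: Sequence[Tuple[str, Sequence[float]]],
    k: int = 6,
) -> List[str]:
    """Return the texts of the k indexed chunks most similar to the query vector."""
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy index: (chunk text, made-up embedding)
index = [
    ("chunk about weather", [0.9, 0.1]),
    ("chunk about pilots", [0.1, 0.9]),
    ("chunk about both", [0.7, 0.7]),
]
nearest = top_k([1.0, 0.0], index, k=2)
print(nearest)
```

In the Multi Query setting this ranking runs once per rewritten query, so a larger `k` widens the union of retrieved chunks and can lower context precision.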

In [9]:
# Adding answers to test data for character-based chunking
file_path = '/home/jabez/rizzbuzz with poetry/RAG-Optimization-System/test_data/multiquery_answer.csv'
multiquery_answer = evaluation.adding_answer_to_testdata(syntetic_test_data, pipelines.multi_query_pipeline, vectorstore_character, retriver, file_path)

Creating CSV from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 101.22ba/s]


In [7]:
# Adding answers to test data for semantic-based chunking
file_path = '/home/jabez/rizzbuzz with poetry/RAG-Optimization-System/test_data/multiquery_semantic_answer.csv'
multiquery_semantic_answer = evaluation.adding_answer_to_testdata(syntetic_test_data, pipelines.multi_query_pipeline, vectorstore_semantic, retriver_semantic, file_path)

Creating CSV from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 88.00ba/s]


In [10]:
# Evaluating character-based chunking
multiquery_rag_evaluation_result = evaluation.ragas_evaluator(multiquery_answer)

Evaluating: 100%|██████████| 40/40 [00:23<00:00,  1.69it/s]


In [8]:
# Evaluating semantic-based chunking
multiquery_rag_semantic_evaluation_result = evaluation.ragas_evaluator(multiquery_semantic_answer)

Evaluating: 100%|██████████| 40/40 [00:27<00:00,  1.48it/s]


In [14]:
# Evaluation mean for character-based chunking with chunk size 800 (presumably from a separate run; chunk_size is set to 1000 above)
result = evaluation.evaluation_mean(multiquery_rag_evaluation_result)

context_precision: 90.0%, faithfulness: 90.0%, answer_relevancy: 84.28%, context_recall: 91.33%


In [11]:
# Evaluation mean for character-based chunking with chunk size 1000
result = evaluation.evaluation_mean(multiquery_rag_evaluation_result)

context_precision: 87.5%, faithfulness: 80.0%, answer_relevancy: 83.12%, context_recall: 84.17%


In [9]:
# Evaluation mean for semantic chunking
result = evaluation.evaluation_mean(multiquery_rag_semantic_evaluation_result)

context_precision: 87.77%, faithfulness: 81.33%, answer_relevancy: 83.56%, context_recall: 88.33%


In [25]:
multiquery_rag_evaluation_result

| Unnamed: 0 | question | answer | contexts | ground_truth | context_precision | faithfulness | answer_relevancy | context_recall |
|---|---|---|---|---|---|---|---|---|
| 0 | What was the connection between Ibragim Todash... | Ibragim Todashev was an associate of Tamerlan ... | [She acknowledged the theft and gave back the ... | Ibragim Todashev was an associate of Boston Ma... | 1.0 | 0.8 | 0.867289 | 1.0 |
| 1 | What criminal violations is Abid Naseer charge... | Abid Naseer is charged with three criminal vio... | [article: New York (CNN)A federal jury Wednesd... | Abid Naseer is charged with three criminal vio... | 1.0 | 1.0 | 0.93904 | 1.0 |
| 2 | What are some reasons why churches do not carb... | There are a few reasons why churches do not ca... | [fragments or splinters of the true cross. The... | Carbon dating is expensive and most churches d... | 1.0 | 1.0 | 0.0 | 1.0 |
| 3 | What is Parisa Tabriz's role as the "Google Se... | Parisa Tabriz's role as the "Google Security P... | [article: (CNN)In fairy tales, it's usually th... | Parisa Tabriz's role as the "Google Security P... | 1.0 | 1.0 | 0.940155 | 1.0 |
| 4 | How has Justice Ruth Bader Ginsburg's stance o... | Justice Ruth Bader Ginsburg's stance on women'... | [family issue, virtually by definition, and an... | Justice Ruth Bader Ginsburg's stance on women'... | 0.416667 | 0.0 | 0.94219 | 0.0 |
| 5 | What role do Shia militias play in the fight a... | Shia militias also play a significant role in ... | [and Iranian interests converge in more immedi... | Shia militias play a significant role in the f... | 1.0 | 1.0 | 0.89278 | 0.5 |
| 6 | What factors caused the Buddy Holly plane cras... | The letter from the NTSB mentioned that the cr... | [highlights: Pilot error and snow were reasons... | Pilot error and inadequate weather briefing we... | 1.0 | 1.0 | 0.87495 | 1.0 |
| 7 | What unique items in Granada's boutiques may b... | The text does not provide information on what ... | [boutiques are full of local art and street ca... | The answer to given question is not present in... | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | What evidence in the co-pilot's apartment sugg... | The documents do not provide specific evidence... | [article: (CNN)The tragic news that Germanwing... | Documents revealing he'd been declared unfit f... | 1.0 | 0.0 | 0.984567 | 1.0 |
| 9 | How does the NBA collaborate with Sandberg's "... | The NBA is collaborating with Sheryl Sandberg'... | [highlights: Sandberg's "lean in" movement is ... | The NBA collaborates with Sandberg's "lean in"... | 1.0 | 1.0 | 0.928345 | 1.0 |


In [10]:
multiquery_rag_semantic_evaluation_result

| Unnamed: 0 | question | answer | contexts | ground_truth | context_precision | faithfulness | answer_relevancy | context_recall |
|---|---|---|---|---|---|---|---|---|
| 0 | What was the connection between Ibragim Todash... | Ibragim Todashev was an associate of Tamerlan ... | [She acknowledged the theft and gave back the ... | Ibragim Todashev was an associate of Boston Ma... | 1.0 | 1.0 | 0.886229 | 1.0 |
| 1 | What criminal violations is Abid Naseer charge... | Abid Naseer is charged with three criminal vio... | [Mangled bodies and dead families," said Assis... | Abid Naseer is charged with three criminal vio... | 0.95 | 1.0 | 0.972751 | 1.0 |
| 2 | What are some reasons why churches do not carb... | There are a few reasons why churches do not ca... | [They have been edited for style and clarity f... | Carbon dating is expensive and most churches d... | 1.0 | 1.0 | 0.98609 | 1.0 |
| 3 | What is Parisa Tabriz's role as the "Google Se... | Parisa Tabriz's role as the "Google Security P... | [article: (CNN)In fairy tales, it's usually th... | Parisa Tabriz's role as the "Google Security P... | 1.0 | 1.0 | 0.959763 | 1.0 |
| 4 | How has Justice Ruth Bader Ginsburg's stance o... | Justice Ruth Bader Ginsburg's stances on issue... | [article: (CNN)It's no secret that Supreme Cou... | Justice Ruth Bader Ginsburg's stance on women'... | 0.876667 | 0.8 | 0.899134 | 1.0 |
| 5 | What role do Shia militias play in the fight a... | Shia militias, some backed by Iran, play a cru... | [and Iranian interests converge in more immedi... | Shia militias play a significant role in the f... | 1.0 | 1.0 | 0.880661 | 0.833333 |
| 6 | What factors caused the Buddy Holly plane cras... | The Buddy Holly plane crash was initially attr... | [article: (CNN)It's the day the music died. In... | Pilot error and inadequate weather briefing we... | 0.95 | 1.0 | 0.942145 | 1.0 |
| 7 | What unique items in Granada's boutiques may b... | The document does not provide specific informa... | [article: (CNN)The sun cuts across Lake Nicara... | The answer to given question is not present in... | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | What evidence in the co-pilot's apartment sugg... | The documents found in co-pilot Andreas Lubitz... | [article: (CNN)The tragic news that Germanwing... | Documents revealing he'd been declared unfit f... | 1.0 | 0.333333 | 0.907244 | 1.0 |
| 9 | How does the NBA collaborate with Sandberg's "... | The NBA collaborates with Sandberg's "lean in"... | [article: (CNN)In the two years since "Lean In... | The NBA collaborates with Sandberg's "lean in"... | 1.0 | 1.0 | 0.922445 | 1.0 |
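
The per-question scores in the semantic-chunking table can be averaged by hand as a sanity check. The sketch below copies the four metric columns from that table and takes a plain column mean expressed as a percentage. Whether `evaluation.evaluation_mean` is implemented this way is an assumption; the names `semantic_scores` and `metric_means` are hypothetical.

```python
from statistics import mean
from typing import Dict, List

# Metric columns copied from the semantic-chunking result table (rows 0 to 9).
semantic_scores: Dict[str, List[float]] = {
    "context_precision": [1.0, 0.95, 1.0, 1.0, 0.876667, 1.0, 0.95, 0.0, 1.0, 1.0],
    "faithfulness": [1.0, 1.0, 1.0, 1.0, 0.8, 1.0, 1.0, 0.0, 0.333333, 1.0],
    "answer_relevancy": [0.886229, 0.972751, 0.98609, 0.959763, 0.899134,
                         0.880661, 0.942145, 0.0, 0.907244, 0.922445],
    "context_recall": [1.0, 1.0, 1.0, 1.0, 1.0, 0.833333, 1.0, 0.0, 1.0, 1.0],
}


def metric_means(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Average each metric column and express it as a percentage with two decimals."""
    return {name: round(mean(values) * 100, 2) for name, values in scores.items()}


summary = metric_means(semantic_scores)
print(", ".join(f"{name}: {value}%" for name, value in summary.items()))
```

Under this assumption the averages come out at 87.77, 81.33, 83.56, and 88.33 percent, which agrees with the summary printed for semantic chunking. Question 7, whose ground truth says the answer is not present in the data, scores zero on every metric and lowers each mean by up to ten points on a ten-question test set.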
