## Simple RAG Pipeline Evaluation
In this notebook we build a simple RAG pipeline with basic strategies to establish a baseline result.
The pipeline uses the following strategies:
- Recursive character text splitter with a chunk size of 1000 and a chunk overlap of 250
- OpenAI default embedding model: text-embedding-ada-002
- Vector store as the retriever
- Generator model: gpt-4o

We use RAGAS for evaluation. The chosen RAGAS metrics are:
- Faithfulness
- Context Recall
- Answer Relevancy
- Context Precision
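
As a rough illustration of one of these metrics: RAGAS documents context precision as the mean of precision@k, taken at each rank k where the retrieved chunk is judged relevant. The sketch below assumes the relevance verdicts are already available as a 0/1 list (in RAGAS they come from an LLM judge). The function name and the sample verdicts are hypothetical and are not taken from this notebook's run.

```python
def context_precision(relevance):
    """Rank-weighted precision over a list of 0/1 relevance verdicts.

    relevance[i] is 1 if the chunk at rank i + 1 was judged relevant, else 0.
    """
    hits = 0
    weighted_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # precision@rank, accumulated only at ranks holding a relevant chunk
            weighted_sum += hits / rank
    return weighted_sum / hits if hits else 0.0


# Hypothetical verdicts for k = 6 retrieved chunks
verdicts = [1, 1, 1, 1, 0, 1]
print(round(context_precision(verdicts), 6))
```

With these verdicts, a single irrelevant chunk at rank 5 lowers the score to about 0.967, while the same miss at rank 1 would cost considerably more. The metric rewards placing relevant chunks early.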

In [1]:
# Importing libraries
import sys
from dotenv import load_dotenv
import pandas as pd

# Make the project's helper modules importable
sys.path.insert(1, '/home/jabez/week_11/Contract-Advisor-RAG')
# Load environment variables (e.g. API keys) from a .env file
load_dotenv()
sys.path.insert(1, '/home/jabez/rizzbuzz with poetry/RAG-Optimization-System/scripts')
import file_loader
import pipelines
import evaluation



In [2]:
# loading data
file_path = '/home/jabez/rizzbuzz with poetry/RAG-Optimization-System/data/cnn_dailymail_3.0.0.csv'
data = file_loader.load_csv(file_path)

In [3]:
# Split the documents with RecursiveCharacterTextSplitter and index the chunks in a vector store
chunk_size = 1000
chunk_overlap = 250
vectorstore_character = file_loader.character_text_splitter(data, chunk_size, chunk_overlap)
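
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-size sliding-window split. This is a simplification and not what `RecursiveCharacterTextSplitter` does internally: the real splitter breaks on separators such as paragraphs and sentences, so its chunks are usually shorter than the limit. `split_with_overlap` is a hypothetical helper, not part of `file_loader`.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=250):
    """Fixed-size sliding-window split (a simplification of recursive splitting)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    stride = chunk_size - chunk_overlap  # each chunk starts 750 characters after the last
    chunks = []
    for start in range(0, len(text), stride):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
    return chunks


# Dummy 2500-character document, used only to show the chunk boundaries
sample_text = "x" * 2500
chunks = split_with_overlap(sample_text)
print([len(chunk) for chunk in chunks])
```

Each chunk repeats the last 250 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk, provided it is shorter than the overlap.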

In [4]:
# Generate synthetic test data
file_path = '/home/jabez/rizzbuzz with poetry/RAG-Optimization-System/test_data/syntetic_test_data.csv'
syntetic_test_data = evaluation.generate_syntetic_testdata(data, file_path)

Filename and doc_id are the same for all nodes.                     
Generating: 100%|██████████| 10/10 [00:46<00:00,  4.61s/it]


In [4]:
# Load the synthetic test data
syntetic_test_data = pd.read_csv('/home/jabez/rizzbuzz with poetry/RAG-Optimization-System/test_data/syntetic_test_data.csv')

In [5]:
# Set up the retriever: similarity search returning the top 6 chunks
retriver = vectorstore_character.as_retriever(search_type="similarity", search_kwargs={"k": 6})
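
For intuition, similarity search with `k = 6` amounts to ranking every stored chunk embedding against the query embedding and keeping the six best. The sketch below assumes cosine similarity and uses toy 2-dimensional vectors. The actual vector store, its distance metric, and its index structure are not shown in this notebook, and `top_k` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, chunk_vecs, k=6):
    """Return the indices of the k chunks most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy embeddings standing in for real 1536-dimensional vectors
toy_chunks = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
print(top_k([1.0, 0.0], toy_chunks, k=2))
```

A larger `k` gives the generator more context at the cost of more irrelevant chunks, which is what the context precision and context recall metrics trade off against each other.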

In [6]:
# Add answers generated by the simple pipeline to the test data
file_path_with_answer = '/home/jabez/rizzbuzz with poetry/RAG-Optimization-System/test_data/syntetic_test_data_with_answer.csv'
syntetic_test_data_with_answer = evaluation.adding_answer_to_testdata(syntetic_test_data, pipelines.simple_pipeline, vectorstore_character, retriver, file_path_with_answer)

Creating CSV from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 57.48ba/s]


In [7]:
# Evaluate the simple pipeline's answers with RAGAS
simple_rag_evaluation_result = evaluation.ragas_evaluator(syntetic_test_data_with_answer)

Evaluating: 100%|██████████| 40/40 [00:29<00:00,  1.37it/s]


In [8]:
# Mean of each metric across all test questions
result = evaluation.evaluation_mean(simple_rag_evaluation_result)

context_precision: 86.83%, faithfulness: 86.75%, answer_relevancy: 85.52%, context_recall: 88.0%


In [9]:
simple_rag_evaluation_result

Unnamed: 0,question,answer,contexts,ground_truth,context_precision,faithfulness,answer_relevancy,context_recall
0,What was the connection between Ibragim Todash...,Ibragim Todashev was an associate of Tamerlan ...,[highlights: Ibragim Todashev was killed durin...,Ibragim Todashev was an associate of Boston Ma...,1.0,1.0,0.889481,1.0
1,What criminal violations is Abid Naseer charge...,Abid Naseer is charged with three criminal vio...,[article: New York (CNN)A federal jury Wednesd...,Abid Naseer is charged with three criminal vio...,1.0,1.0,0.972751,1.0
2,What are some reasons why churches do not carb...,Churches do not carbon date relics claimed to ...,[fragments or splinters of the true cross. The...,Carbon dating is expensive and most churches d...,0.966667,1.0,0.0,1.0
3,"What is Parisa Tabriz's role as the ""Google Se...","Parisa Tabriz's role as the ""Google Security P...","[article: (CNN)In fairy tales, it's usually th...","Parisa Tabriz's role as the ""Google Security P...",1.0,1.0,0.940155,1.0
4,How has Justice Ruth Bader Ginsburg's stance o...,Justice Ruth Bader Ginsburg's stance on women'...,[article: (CNN)It's no secret that Supreme Cou...,Justice Ruth Bader Ginsburg's stance on women'...,0.75,1.0,0.962445,1.0
5,What role do Shia militias play in the fight a...,Shia militias play a significant role in the f...,[world. Were the nuclear talks to leave Iran a...,Shia militias play a significant role in the f...,1.0,1.0,0.944196,0.8
6,What factors caused the Buddy Holly plane cras...,"According to the NTSB letter, the factors that...",[highlights: Pilot error and snow were reasons...,Pilot error and inadequate weather briefing we...,1.0,1.0,0.996974,1.0
7,What unique items in Granada's boutiques may b...,The unique items in Granada's boutiques that m...,[boutiques are full of local art and street ca...,The answer to given question is not present in...,0.0,0.0,0.950042,0.0
8,What evidence in the co-pilot's apartment sugg...,Evidence found in the co-pilot Andreas Lubitz'...,[no evidence suggesting Lubitz was suicidal or...,Documents revealing he'd been declared unfit f...,0.966667,0.8,0.964843,1.0
9,"How does the NBA collaborate with Sandberg's ""...","The NBA collaborates with Sheryl Sandberg's ""l...","[highlights: Sandberg's ""lean in"" movement is ...","The NBA collaborates with Sandberg's ""lean in""...",1.0,0.875,0.93094,1.0
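
The reported averages can be cross-checked against the per-question scores in this table. A minimal sketch, assuming `evaluation.evaluation_mean` takes a plain column mean and formats it as a percentage (its implementation is not shown here); `metric_means` is a hypothetical name.

```python
# Per-question scores copied from the result table above
scores = {
    "context_precision": [1.0, 1.0, 0.966667, 1.0, 0.75, 1.0, 1.0, 0.0, 0.966667, 1.0],
    "faithfulness": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.8, 0.875],
    "answer_relevancy": [0.889481, 0.972751, 0.0, 0.940155, 0.962445,
                         0.944196, 0.996974, 0.950042, 0.964843, 0.93094],
    "context_recall": [1.0, 1.0, 1.0, 1.0, 1.0, 0.8, 1.0, 0.0, 1.0, 1.0],
}


def metric_means(scores):
    """Average each metric and express it as a percentage with two decimals."""
    return {name: round(100 * sum(values) / len(values), 2) for name, values in scores.items()}


means = metric_means(scores)
print(", ".join(f"{name}: {value}%" for name, value in means.items()))
```

This reproduces the figures printed by `evaluation_mean`. With only 10 questions, single rows move the averages noticeably: question 2 has an answer relevancy of 0.0, and question 7 scores 0.0 on three of the four metrics.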


In [None]:
# Save the per-question evaluation results
simple_rag_evaluation_result.to_csv('/home/jabez/rizzbuzz with poetry/RAG-Optimization-System/test_data/simple_rag_evaluation_result.csv', index=False)