## Simple RAG Pipeline Evaluation
In this notebook we build a simple RAG pipeline with basic strategies to establish a baseline result.
The pipeline uses the following strategies:
- Recursive character splitter with a chunk size of 1000 and a chunk overlap of 250
- OpenAI default embedding model: text-embedding-ada-002
- Vector store as the retriever
- Generator model: GPT-4o

We use RAGAS for evaluation. The chosen RAGAS metrics are:
- Faithfulness
- Context Recall
- Answer Relevancy
- Context Precision
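
As a rough illustration of what these scores measure, the sketch below computes simplified versions of three of them from hand-labelled judgements. In RAGAS itself the judgements come from an LLM, and answer relevancy additionally relies on embedding similarity, so it is left out here. The function names and example labels are illustrative assumptions, not part of the RAGAS API.

```python
def faithfulness(claim_supported):
    """Share of claims in the answer that the retrieved context supports."""
    return sum(claim_supported) / len(claim_supported)


def context_recall(statement_attributed):
    """Share of ground-truth statements attributable to the retrieved context."""
    return sum(statement_attributed) / len(statement_attributed)


def context_precision(chunk_relevant):
    """Mean of precision@k over the ranks k that hold a relevant chunk."""
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(chunk_relevant, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hand-labelled example: 3 of 4 answer claims are supported by the context
print(faithfulness([True, True, True, False]))          # 0.75
# Six retrieved chunks, only the fifth one judged irrelevant
print(round(context_precision([1, 1, 1, 1, 0, 1]), 6))  # 0.966667
```

Because context precision weights each relevant chunk by its rank, an irrelevant chunk near the top of the list lowers the score more than one near the bottom.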

In [1]:
# Importing libraries
import json
import sys

import pandas as pd
from dotenv import load_dotenv

# Load environment variables from the .env file
load_dotenv()

# Make the project's helper modules importable
sys.path.insert(1, '/home/jabez/rizzbuzz with poetry/RAG-Optimization-System/scripts')
import file_loader
import pipelines
import evaluation

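
The absolute path in the import cell only resolves on the original machine. A minimal sketch of a more portable setup follows, assuming the notebook sits one directory below the project root and the helper modules live in a `scripts` folder at that root. Both the layout and the `scripts_dir` helper are assumptions, not something the notebook states.

```python
import sys
from pathlib import Path


def scripts_dir(notebook_dir):
    """Return the assumed location of the project's scripts folder."""
    return Path(notebook_dir).resolve().parent / "scripts"


# Register the folder once so that `import file_loader` etc. can succeed
path = str(scripts_dir(Path.cwd()))
if path not in sys.path:
    sys.path.insert(1, path)
```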


In [9]:
# Load JSON from file
json_path = '../filepath.json'

with open(json_path, 'r') as json_file:
    file_paths = json.load(json_file)
data_file_path = file_paths['data_file_path']
synthetic_test_data_path = file_paths['synthetic_test_data_path']

# loading data
data = file_loader.load_csv(data_file_path)

# loading synthetic test data
synthetic_test_data = pd.read_csv(synthetic_test_data_path)
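
The cell above expects `filepath.json` to contain two keys. A small check like the following sketch can turn a missing key into a clearer error message; `validate_file_paths` is a hypothetical helper, not part of the project's scripts.

```python
REQUIRED_KEYS = ("data_file_path", "synthetic_test_data_path")


def validate_file_paths(file_paths):
    """Return the mapping unchanged, or raise if a required key is missing."""
    missing = [key for key in REQUIRED_KEYS if key not in file_paths]
    if missing:
        raise KeyError(f"Missing keys in path configuration: {missing}")
    return file_paths


# Usage with a dictionary shaped like the parsed JSON file (values are placeholders)
file_paths_example = validate_file_paths(
    {"data_file_path": "data.csv", "synthetic_test_data_path": "test_data.csv"}
)
```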

In [5]:
# Split the data with RecursiveCharacterTextSplitter and build the vector store
chunk_size = 1000
chunk_overlap = 250
vectorstore_character = file_loader.character_text_splitter(data, chunk_size, chunk_overlap)
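
To make the two parameters concrete, here is a simplified fixed-window chunker. It is not LangChain's `RecursiveCharacterTextSplitter`, which prefers to break on separators such as paragraph and line boundaries. It only illustrates how consecutive chunks of at most `chunk_size` characters share `chunk_overlap` characters. The function name is hypothetical.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=250):
    """Cut text into windows of chunk_size that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts 750 characters after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


sample_text = "".join(str(i % 10) for i in range(2500))
sample_chunks = sliding_window_chunks(sample_text)
print([len(chunk) for chunk in sample_chunks])  # [1000, 1000, 1000]
```

With these settings a quarter of every chunk repeats the end of the previous one, which helps keep sentences that straddle a boundary retrievable at the cost of a larger index.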

In [3]:
# Run this cell only once to generate the test data; afterwards, load the saved CSV file instead
# Generate synthetic test data
file_path_test_data = '/home/jabez/rizzbuzz with poetry/RAG-Optimization-System/test_data/syntetic_test_data.csv'
syntetic_test_data = evaluation.generate_syntetic_testdata(data, file_path_test_data)

embedding nodes:  45%|████▍     | 1200/2676 [01:33<01:16, 19.17it/s]Failed to batch ingest runs: LangSmithRateLimitError('Rate limit exceeded for https://api.smith.langchain.com/runs/batch. HTTPError(\'429 Client Error: Too Many Requests for url: https://api.smith.langchain.com/runs/batch\', \'{"detail":"Monthly unique traces usage limit exceeded"}\')')
embedding nodes:  48%|████▊     | 1273/2676 [01:38<01:42, 13.69it/s]Failed to batch ingest runs: LangSmithConnectionError('Connection error caused failure to POST https://api.smith.langchain.com/runs/batch  in LangSmith API. Please confirm your internet connection.. SSLError(MaxRetryError("HTTPSConnectionPool(host=\'api.smith.langchain.com\', port=443): Max retries exceeded with url: /runs/batch (Caused by SSLError(SSLEOFError(8, \'EOF occurred in violation of protocol (_ssl.c:2426)\')))"))')
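
Since test-set generation is slow and should run only once, a "generate or load" guard avoids re-running it by accident. The sketch below uses only the standard library; `load_or_generate` and `fake_generator` are hypothetical stand-ins and do not reflect how `evaluation.generate_syntetic_testdata` is implemented.

```python
import csv
import tempfile
from pathlib import Path


def load_or_generate(csv_path, generate_rows):
    """Return cached rows if csv_path exists; otherwise generate and save them."""
    csv_path = Path(csv_path)
    if csv_path.exists():
        with csv_path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    rows = generate_rows()  # the expensive step, e.g. LLM-backed generation
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return rows


# Demonstration with a stand-in generator that records how often it is called
calls = []


def fake_generator():
    calls.append(1)
    return [{"question": "q", "ground_truth": "a"}]


with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / "test_data.csv"
    first = load_or_generate(cache_path, fake_generator)   # generates and saves
    second = load_or_generate(cache_path, fake_generator)  # reads the cached file

print(len(calls))  # 1
```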

In [10]:
synthetic_test_data

Unnamed: 0,question,contexts,ground_truth,evolution_type,metadata,episode_done
0,What is the significance of the U.S. and Iran ...,[' that the deal being discussed is a bad one ...,The answer to given question is not present in...,simple,[{'source': 'Longtime foes U.S. and Iran now f...,True
1,What is the concept behind Surf Simply?,['article: (CNN)Riding the crisp green waves t...,The concept behind Surf Simply is to create a ...,simple,[{'source': 'Ru Hill trained in fine art in Lo...,True
2,What led to T.J. Maxx pulling the offensive T-...,['article: (CNN)T.J. Maxx has pulled a T-shirt...,The offensive T-shirt featuring the phrase 'Ha...,simple,[{'source': 'T.J. Maxx pulled a T-shirt many f...,True
3,What is the significance of the 14-karat-gold-...,"['article: (CNN)""Dancing With the Stars"" seaso...",The significance of the 14-karat-gold-plated M...,simple,"[{'source': ""The show celebrated its 10-year a...",True
4,What advantages do mass timber panels offer in...,['article: (CNN)If the 20th century was the ce...,"Mass timber panels, particularly cross-laminat...",simple,[{'source': 'Wood is making a comeback as a co...,True
5,How did Leonard Nimoy inspire scientists and c...,"[', Nimoy challenged us to understand our huma...",Nimoy challenged us to understand our human na...,simple,"[{'source': ""Star Trek actor Leonard Nimoy die...",True
6,What is the Egyptian army battling in North Si...,"[""article: (CNN)An explosive-laden truck blew ...",The Egyptian army is battling an Islamist insu...,simple,[{'source': 'The injured include 42 Egyptian s...,True
7,How does player safety relate to the decision ...,[' to his family asking that his brain be sent...,Player safety is a significant factor in the d...,simple,"[{'source': 'Chris Borland tells ESPN that ""I ...",True
8,How many World Cup wins does Lindsey Vonn have...,['article: (CNN)Lindsey Vonn may have missed o...,Lindsey Vonn has a record-extending 65 World C...,simple,[{'source': 'Lindsey Vonn notches the 65th Wor...,True
9,How can super-fast memory chips revolutionize ...,"['article: (CNN)In 1971, a physicist conceptua...",Super-fast memory chips can revolutionize comp...,simple,"[{'source': ""The memristor is a new type of el...",True


In [6]:
# Setting retriever
retriver = vectorstore_character.as_retriever(search_type="similarity", search_kwargs={"k": 6})
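
Conceptually, a similarity retriever with `k=6` ranks the stored chunk embeddings by their similarity to the query embedding and returns the six best matches. The sketch below shows that idea with cosine similarity in plain Python. The metric actually used depends on the vector store, which the notebook does not show, so cosine similarity and the helper names are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=6):
    """Return the indices of the k document vectors most similar to the query."""
    ranked = sorted(
        enumerate(doc_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [index for index, _ in ranked[:k]]


# Toy 2-dimensional "embeddings"
print(top_k([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], k=2))  # [0, 2]
```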

In [7]:
# Add the answers generated by the simple pipeline to the test data
# Note: file_path_with_answer is defined here but not passed to the call below
file_path_with_answer = '/home/jabez/rizzbuzz with poetry/RAG-Optimization-System/test_data/syntetic_test_data_with_answer.csv'
syntetic_test_data_with_answer = evaluation.adding_answer_to_testdata(synthetic_test_data, pipelines.simple_pipeline, vectorstore_character, retriver)

Creating CSV from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 69.04ba/s]
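
The step above pairs each test question with a generated answer and the contexts that were retrieved for it, which is the input RAGAS needs. A minimal sketch of that loop follows. `add_answers`, `fake_retrieve`, and `fake_generate` are hypothetical stand-ins, and the real `evaluation.adding_answer_to_testdata` may work differently.

```python
def add_answers(test_rows, retrieve, generate):
    """Attach retrieved contexts and a generated answer to every test row."""
    results = []
    for row in test_rows:
        contexts = retrieve(row["question"])
        answer = generate(row["question"], contexts)
        results.append({**row, "contexts": contexts, "answer": answer})
    return results


# Stand-ins for the retriever and the generator model
def fake_retrieve(question):
    return [f"context for: {question}"]


def fake_generate(question, contexts):
    return f"answer based on {len(contexts)} context(s)"


rows_with_answers = add_answers(
    [{"question": "What is X?", "ground_truth": "X is Y."}],
    fake_retrieve,
    fake_generate,
)
print(rows_with_answers[0]["answer"])  # answer based on 1 context(s)
```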


In [8]:
# Evaluating the test data from simple pipeline
simple_rag_evaluation_result = evaluation.ragas_evaluator(syntetic_test_data_with_answer)

Evaluating: 100%|██████████| 80/80 [00:49<00:00,  1.63it/s]


In [9]:
# Evaluation mean
result = evaluation.evaluation_mean(simple_rag_evaluation_result)

context_precision: 95.47%, faithfulness: 87.48%, answer_relevancy: 86.66%, context_recall: 88.92%


In [10]:
simple_rag_evaluation_result

Unnamed: 0,question,answer,contexts,ground_truth,context_precision,faithfulness,answer_relevancy,context_recall
0,What role did Actor Benedict Cumberbatch play ...,"Actor Benedict Cumberbatch, who is a distant c...","[for the long-dead King, swinging incense and ...",Actor Benedict Cumberbatch read a poem at the ...,0.966667,0.75,0.882254,1.0
1,What was the connection between Ibragim Todash...,Ibragim Todashev was an associate of Tamerlan ...,[highlights: Ibragim Todashev was killed durin...,Ibragim Todashev was an associate of Boston Ma...,1.0,1.0,0.913261,1.0
2,"What are some pros and cons of the movie ""Pret...","The movie ""Pretty Woman"" has several pros and ...",[article: (CNN)For a generation of moviegoers ...,"The movie ""Pretty Woman"" has some pros such as...",1.0,1.0,0.999299,1.0
3,How does Parisa Tabriz contribute to cybersecu...,Parisa Tabriz contributes to cybersecurity at ...,[highlights: Parisa Tabriz is the 31-year-old ...,Parisa Tabriz contributes to cybersecurity at ...,1.0,0.666667,0.990183,0.75
4,What is Maurizio Cattelan known for in the art...,Maurizio Cattelan is known for his humorous an...,"[Known for his humorous and satirical art, Cat...",Maurizio Cattelan is known for his humorous an...,0.833333,0.666667,0.991169,1.0
5,How have foreign tourists been affected by Isl...,Foreign tourists in parts of North and sub-Sah...,[article: (CNN)The attack on Tunisia's famed B...,Foreign tourists have been affected by Islamis...,1.0,1.0,0.989343,0.8
6,What are potential buyers asking about regardi...,"Potential buyers are asking ""lots of serious q...","[only claim to fame, Sells says. The former ma...","Potential buyers are asking ""lots of serious q...",0.916667,1.0,0.955961,1.0
7,What are the responsibilities and goals of Aym...,"As the leader of the opposition in Israel, Aym...",[highlights: Ayman Odeh is a new face in polit...,Ayman Odeh's responsibilities as the leader of...,1.0,0.833333,0.994708,0.833333
8,How could the memristor revolutionize electron...,The memristor could revolutionize electronics ...,"[At the moment, manufacturing costs are still ...",The memristor could revolutionize electronics ...,1.0,1.0,0.961102,1.0
9,What are some features of the city of Granada ...,Some features of the city of Granada that cont...,[boutiques are full of local art and street ca...,"Colonial buildings, narrow alleyways, garden c...",0.95,0.5,0.998726,1.0
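
Per-row scores like these can be averaged per metric and scanned for the weakest rows. The sketch below does this for two of the four metrics, using only the ten rows displayed above, so its means differ from the overall figures reported earlier, which presumably cover rows not shown here. `metric_means` is a hypothetical helper, not the project's `evaluation.evaluation_mean`.

```python
from statistics import mean

# Scores copied from the ten rows displayed above
rows = [
    {"context_precision": 0.966667, "faithfulness": 0.75},
    {"context_precision": 1.0, "faithfulness": 1.0},
    {"context_precision": 1.0, "faithfulness": 1.0},
    {"context_precision": 1.0, "faithfulness": 0.666667},
    {"context_precision": 0.833333, "faithfulness": 0.666667},
    {"context_precision": 1.0, "faithfulness": 1.0},
    {"context_precision": 0.916667, "faithfulness": 1.0},
    {"context_precision": 1.0, "faithfulness": 0.833333},
    {"context_precision": 1.0, "faithfulness": 1.0},
    {"context_precision": 0.95, "faithfulness": 0.5},
]


def metric_means(score_rows):
    """Average every metric column across the given rows."""
    return {name: mean(row[name] for row in score_rows) for name in score_rows[0]}


means = metric_means(rows)
print(", ".join(f"{name}: {value:.2%}" for name, value in means.items()))
# context_precision: 96.67%, faithfulness: 84.17%

# Index of the row with the lowest faithfulness, a candidate for manual inspection
weakest = min(range(len(rows)), key=lambda i: rows[i]["faithfulness"])
print(weakest)  # 9
```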


In [None]:
simple_rag_evaluation_result.to_csv('/home/jabez/rizzbuzz with poetry/RAG-Optimization-System/test_data/simple_rag_evaluation_result.csv', index=False)