# GOAL

- Evaluar el rendimiento del pipeline RAG **baseline**
- Evaluar el rendimiento del pipeline RAG **con reranker**

Utilizaremos la librería [`ragas`](https://docs.ragas.io/en/latest/getstarted/) para evaluar el rendimiento del pipeline RAG.
y la librería [`Reranker`](https://github.com/AnswerDotAI/rerankers) para reordenar los documentos relevantes.


In [2]:
from dotenv import load_dotenv
load_dotenv()

True

In [3]:
from langchain_community.document_loaders import PyPDFLoader
from dotenv import load_dotenv

load_dotenv()

file_path = (
    "../practicos-rag/data/benchmark_data/Reglamento 1333 2008.pdf"
)
loader = PyPDFLoader(file_path)
docs = loader.load()

In [4]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=250, chunk_overlap=50)
splits = text_splitter.split_documents(docs)

In [5]:
from langchain_ollama import OllamaEmbeddings
from langchain_openai import OpenAIEmbeddings
import os
#embeddings = OllamaEmbeddings(model="llama3")
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

In [6]:
from langchain_qdrant import QdrantVectorStore
from langchain_qdrant import RetrievalMode

qdrant = QdrantVectorStore.from_documents(
    splits,
    embedding=embeddings,
    location=":memory:",
    collection_name="my_documents",
    retrieval_mode=RetrievalMode.DENSE,
)

In [7]:
retriever = qdrant.as_retriever()

In [8]:
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser
from langchain_ollama import OllamaLLM
from langchain_openai import ChatOpenAI


llm = ChatOpenAI(model="gpt-4-turbo-preview") 

# Define prompt template
template = """Utilize the retrieved context below to answer the question.
If you're unsure of the answer, simply state you don't know and apologies
Keep your response concise, limited to two sentences.
Question: {question}
Context: {context}
"""

prompt = ChatPromptTemplate.from_template(template)

# Setup RAG pipeline
rag_chain = (
    {"context": retriever,  "question": RunnablePassthrough()} 
    | prompt 
    | llm
    | StrOutputParser() 
)

In [9]:
rag_chain.invoke("What is the purpose of the regulation?")

'The purpose of Regulation (EC) No 1333/2008 is to lay down rules on food additives used in foods with the aim of ensuring food safety and protection for consumers.'

## Creación del dataset de preguntas y respuestas

In [10]:
QA_generation_prompt = ChatPromptTemplate.from_template("""
Your task is to write a factoid question and an answer given a context.
Your factoid question should be answerable with a specific, concise piece of factual information from the context.
Your factoid question should be formulated in the same style as questions users could ask in a search engine.
This means that your factoid question MUST NOT mention something like "according to the passage" or "context".

Provide your answer as follows:

Output:::
Factoid question: (your factoid question)
Answer: (your answer to the factoid question)

Now here is the context.

Context: {context}
Output:::""")

# Crear una cadena para crear una pregunta
question_chain = (
    {"context": RunnablePassthrough()}
    | QA_generation_prompt
    | llm
    | StrOutputParser()
)

### Obtener una muestra aleatoria de documentos

In [11]:
import random
from tqdm import tqdm

sampled_docs = random.sample(docs, 15)
sampled_docs_processed = [doc.page_content for doc in sampled_docs]

#### Generar preguntas y respuestas

In [12]:

questions = [question_chain.invoke({"context": sampled_context}) for sampled_context in tqdm(sampled_docs_processed)]

100%|██████████| 15/15 [00:28<00:00,  1.88s/it]


In [13]:
questions

['Factoid question: What document number is associated with the regulation of the European Parliament and of the Council mentioned in the context?\nAnswer: EC No 1333/2008',
 'Factoid question: What is the code for Sodium metabisulphite according to Regulation (EC) No 1333/2008?\nAnswer: E 223',
 'Factoid question: What is the maximum level of E 459 Beta-cyclodextrin allowed in flavoured teas?\nAnswer: 500 mg/l in final food',
 'Factoid question: What is the maximum level of propane-1, 2-diol (propylene glycol) allowed in final food products according to Regulation (EC) No 1333/2008?\nAnswer: 1 000 mg/kg',
 'Factoid question: What is the E number for Calcium ascorbate?\nAnswer: E 302',
 'Factoid question: What is the main purpose of Regulation (EC) No 1333/2008?\nAnswer: The main purpose of Regulation (EC) No 1333/2008 is to lay down rules on food additives used in foods to ensure the effective functioning of the market while ensuring a high level of protection of human health and cons

In [14]:
questions_processed = []
ground_truth = []
for question in questions:
    questions_processed.append(question.split("Factoid question: ")[-1].split("Answer: ")[0])
    ground_truth.append(question.split("Factoid question: ")[-1].split("Answer: ")[1])

In [15]:
questions_processed

['What document number is associated with the regulation of the European Parliament and of the Council mentioned in the context?\n',
 'What is the code for Sodium metabisulphite according to Regulation (EC) No 1333/2008?\n',
 'What is the maximum level of E 459 Beta-cyclodextrin allowed in flavoured teas?\n',
 'What is the maximum level of propane-1, 2-diol (propylene glycol) allowed in final food products according to Regulation (EC) No 1333/2008?\n',
 'What is the E number for Calcium ascorbate?\n',
 'What is the main purpose of Regulation (EC) No 1333/2008?\n',
 'What is the deadline for foods labelled before 20 January 2010 that do not comply with Article 22(1)(i) and (4) of Regulation (EC) No 1333/2008 to be marketed until?\n',
 'What is the E-number for Xylitol according to Regulation (EC) No 1333/2008?\n',
 'When were the words in Articles 12 and 13 of Regulation (EC) No 1333/2008 substituted according to The Food Additives, Flavourings, Enzymes and Extraction Solvents (Amendmen

In [16]:
ground_truth

['EC No 1333/2008',
 'E 223',
 '500 mg/l in final food',
 '1 000 mg/kg',
 'E 302',
 'The main purpose of Regulation (EC) No 1333/2008 is to lay down rules on food additives used in foods to ensure the effective functioning of the market while ensuring a high level of protection of human health and consumer protection.',
 'Their date of minimum durability or use-by-date.',
 'E 967',
 '31.12.2020',
 'Commission Regulation (EU) No 380/2012 of 3 May 2012.',
 '2023-12-07',
 '31 July 2014',
 '9 March 2012',
 'E 200',
 'Sunset yellow (E 110), Quinoline yellow (E 104), Carmoisine (E 122), Allura red (E 129), Tartrazine (E 102), Ponceau 4R (E 124)']

In [17]:
# Parsear preguntas y respuestas
contexts = []
answers = []
# Inferencia
for query in questions_processed:
    answers.append(rag_chain.invoke(query))
    contexts.append([docs.page_content for docs in retriever.get_relevant_documents(query)])


  contexts.append([docs.page_content for docs in retriever.get_relevant_documents(query)])


In [18]:
data = {
    "question": questions,
    "answer": answers,
    "reference": ground_truth,
    "retrieved_contexts": contexts
}

In [19]:
from datasets import Dataset

# Convert dict to dataset
dataset = Dataset.from_dict(data)

In [20]:
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
)

In [21]:
result = evaluate(
    dataset = dataset,
    llm=llm,
    embeddings=embeddings,
    metrics=[
        context_recall,
        faithfulness,
        answer_relevancy,
        context_precision,
    ],)

df = result.to_pandas()
df

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

Exception raised in Job[36]: RateLimitError(Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4-turbo-preview in organization org-1IccE2reLGyjkhxuj0CwkN90 on tokens per min (TPM): Limit 30000, Used 29902, Requested 1216. Please try again in 2.236s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}})
Exception raised in Job[32]: TimeoutError()


Unnamed: 0,user_input,retrieved_contexts,response,reference,context_recall,faithfulness,answer_relevancy,context_precision
0,Factoid question: What document number is asso...,[Regulation (EC) No 1333/2008 of the European ...,The document number associated with the regula...,EC No 1333/2008,1.0,1.0,0.752113,1.0
1,Factoid question: What is the code for Sodium ...,[European Parliament and of the Council. Any c...,The code for Sodium metabisulphite according t...,E 223,1.0,1.0,0.885014,1.0
2,Factoid question: What is the maximum level of...,[foodE 459Beta-cyclodextrinEncapsulated flavou...,The maximum level of E 459 Beta-cyclodextrin a...,500 mg/l in final food,1.0,1.0,0.903697,0.833333
3,Factoid question: What is the maximum level of...,[E 1520 shall be 1 000 mg/l from all sources.E...,"The maximum level of propane-1, 2-diol (propyl...",1 000 mg/kg,1.0,0.5,0.870294,1.0
4,Factoid question: What is the E number for Cal...,[E 302 Calcium ascorbate\nE 304 Fatty acid est...,The E number for Calcium ascorbate is E 302.,E 302,1.0,1.0,0.863201,0.805556
5,Factoid question: What is the main purpose of ...,[Regulation (EC) No 1333/2008 of the European ...,"I'm sorry, but based on the provided context, ...",The main purpose of Regulation (EC) No 1333/20...,0.0,0.0,0.0,0.0
6,Factoid question: What is the deadline for foo...,[with Article 22(1)(i) and (4) may be marketed...,Foods labelled before 20 January 2010 that do ...,Their date of minimum durability or use-by-date.,1.0,0.666667,0.763349,0.75
7,Factoid question: What is the E-number for Xyl...,[E 965 Maltitols\nE 966 Lactitol\nE 967 Xylito...,The E-number for Xylitol according to Regulati...,E 967,1.0,1.0,0.849066,1.0
8,Factoid question: When were the words in Artic...,[Directive 89/398/EEC.\nTextual Amendments\nF1...,I don't know.,31.12.2020,,0.0,0.0,1.0
9,Factoid question: What Commission Regulation a...,[Annexes II and III to Regulation (EC) No 1333...,Commission Regulation (EU) No 1130/2011 of 11 ...,Commission Regulation (EU) No 380/2012 of 3 Ma...,,0.0,0.543782,0.0


In [22]:
df.to_csv("baseline_ragas_results.csv", index=False)

In [23]:
# get mean of the metrics column by column
print("Mean Faithfulness: ", round(df["faithfulness"].mean(), 4))
print("Mean Answer relevancy: ", round(df["answer_relevancy"].mean(), 4))
print("Mean Context recall: ", round(df["context_recall"].mean(), 4))
print("Mean Context precision: ", round(df["context_precision"].mean(), 4))


Mean Faithfulness:  0.5444
Mean Answer relevancy:  0.5897
Mean Context recall:  0.7692
Mean Context precision:  0.7593


## Adding a reranker step

In [None]:
query = "What is the purpose of the regulation?"

retrieved_docs = retriever.get_relevant_documents(query, kwargs={"k": 10})


In [34]:
qdrant.similarity_search_with_score(query, k=10)

[(Document(metadata={'source': '../practicos-rag/data/benchmark_data/Reglamento 1333 2008.pdf', 'page': 3, '_id': 'da42de520e5e4c6ba84f2f9795589abd', '_collection_name': 'my_documents'}, page_content='Regulation and to adopt appropriate transitional measures. Since those measures are\nof general scope and are designed to amend non-essential elements of this Regulation,'),
  0.46665629111211765),
 (Document(metadata={'source': '../practicos-rag/data/benchmark_data/Reglamento 1333 2008.pdf', 'page': 5, '_id': '67b487e0039d4aaa8e56eda9d6dbb2a5', '_collection_name': 'my_documents'}, page_content='HAVE ADOPTED THIS REGULATION:\nCHAPTER I\nSUBJECT MATTER, SCOPE AND DEFINITIONS\nArticle 1\nSubject matter\nThis Regulation lays down rules on food additives used in foods with a view to ensuring'),
  0.4592167729175051),
 (Document(metadata={'source': '../practicos-rag/data/benchmark_data/Reglamento 1333 2008.pdf', 'page': 6, '_id': 'b01e219628454a078aa6a971839b0aae', '_collection_name': 'my_docu

In [25]:
retrieved_docs

[Document(metadata={'source': '../practicos-rag/data/benchmark_data/Reglamento 1333 2008.pdf', 'page': 3, '_id': 'da42de520e5e4c6ba84f2f9795589abd', '_collection_name': 'my_documents'}, page_content='Regulation and to adopt appropriate transitional measures. Since those measures are\nof general scope and are designed to amend non-essential elements of this Regulation,'),
 Document(metadata={'source': '../practicos-rag/data/benchmark_data/Reglamento 1333 2008.pdf', 'page': 5, '_id': '67b487e0039d4aaa8e56eda9d6dbb2a5', '_collection_name': 'my_documents'}, page_content='HAVE ADOPTED THIS REGULATION:\nCHAPTER I\nSUBJECT MATTER, SCOPE AND DEFINITIONS\nArticle 1\nSubject matter\nThis Regulation lays down rules on food additives used in foods with a view to ensuring'),
 Document(metadata={'source': '../practicos-rag/data/benchmark_data/Reglamento 1333 2008.pdf', 'page': 6, '_id': 'b01e219628454a078aa6a971839b0aae', '_collection_name': 'my_documents'}, page_content='Regulation (EC) No 1333/200

In [26]:
import cohere as co
cohere_client = co.Client(os.getenv("COHERE_API_KEY"))
def rerank_docs(query, retrieved_docs):
    reranked_docs = cohere_client.rerank(
        model="rerank-english-v3.0",
        query=query,
        documents=retrieved_docs,
        rank_fields=["page_content"],
        return_documents=True
    )
    return reranked_docs

from rerankers import Reranker

def open_source_reranker(query, retrieved_docs):
    #reranker = Reranker('cross-encoder', verbose=0,model_type='cross-encoder')
    reranker = Reranker("colbert", verbose=0)
    retrieved_docs = [doc.page_content for doc in retrieved_docs]
    reranked_docs = reranker.rank(query, retrieved_docs)
    return reranked_docs


In [27]:
reranked_docs = open_source_reranker(query, retrieved_docs)

Loading default colbert model for language en
Default Model: colbert-ir/colbertv2.0
Loading ColBERTRanker model colbert-ir/colbertv2.0 (this message can be suppressed by setting verbose=0)
No device set
Using device mps
No dtype set
Using dtype torch.float32
Loading model colbert-ir/colbertv2.0, this might take a while...
Linear Dim set to: 128 for downcasting


In [28]:
reranked_docs.results

[Result(document=Document(document_type='text', text='Regulation and to adopt appropriate transitional measures. Since those measures are\nof general scope and are designed to amend non-essential elements of this Regulation,', base64=None, image_path=None, doc_id=0, metadata={}), score=0.9069003462791443, rank=1),
 Result(document=Document(document_type='text', text='HAVE ADOPTED THIS REGULATION:\nCHAPTER I\nSUBJECT MATTER, SCOPE AND DEFINITIONS\nArticle 1\nSubject matter\nThis Regulation lays down rules on food additives used in foods with a view to ensuring', base64=None, image_path=None, doc_id=1, metadata={}), score=0.7887976169586182, rank=2),
 Result(document=Document(document_type='text', text='Regulation (EC) No 1333/2008 of the European Parliament and of the Council of...\nDocument Generated: 2023-12-07\n87\nChanges to legislation: \nThere are outstanding changes not yet made to Regulation (EC) No 1333/2008 of the European', base64=None, image_path=None, doc_id=3, metadata={})

In [29]:
contexts = []
answers = []
# Inference
for query in questions:
    answers.append(rag_chain.invoke(query))
    retrieved_docs = retriever.get_relevant_documents(query)
    reranked_docs = open_source_reranker(query, retrieved_docs)
    if reranked_docs.results:  # Check if there are any results
        contexts.append([reranked_docs.results[0].document.text])

data = {
    "question": questions,
    "answer": answers,
    "reference": ground_truth,
    "retrieved_contexts": contexts
}

Loading default colbert model for language en
Default Model: colbert-ir/colbertv2.0
Loading ColBERTRanker model colbert-ir/colbertv2.0 (this message can be suppressed by setting verbose=0)
No device set
Using device mps
No dtype set
Using dtype torch.float32
Loading model colbert-ir/colbertv2.0, this might take a while...
Linear Dim set to: 128 for downcasting
Loading default colbert model for language en
Default Model: colbert-ir/colbertv2.0
Loading ColBERTRanker model colbert-ir/colbertv2.0 (this message can be suppressed by setting verbose=0)
No device set
Using device mps
No dtype set
Using dtype torch.float32
Loading model colbert-ir/colbertv2.0, this might take a while...
Linear Dim set to: 128 for downcasting
Loading default colbert model for language en
Default Model: colbert-ir/colbertv2.0
Loading ColBERTRanker model colbert-ir/colbertv2.0 (this message can be suppressed by setting verbose=0)
No device set
Using device mps
No dtype set
Using dtype torch.float32
Loading model c

In [30]:
reranked_dataset = Dataset.from_dict(data)
result = evaluate(
    dataset = reranked_dataset,
    llm=llm,
    embeddings=embeddings,
    metrics=[
        context_recall,
        faithfulness,
        answer_relevancy,
        context_precision,
    ],)
reranked_df = result.to_pandas()
reranked_df.to_csv("reranked_ragas_results.csv", index=False)


Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

In [31]:
# get mean of the metrics column by column
print("Mean Faithfulness: ", round(reranked_df["faithfulness"].mean(), 4))
print("Mean Answer relevancy: ", round(reranked_df["answer_relevancy"].mean(), 4))
print("Mean Context recall: ", round(reranked_df["context_recall"].mean(), 4))
print("Mean Context precision: ", round(reranked_df["context_precision"].mean(), 4))

Mean Faithfulness:  0.66
Mean Answer relevancy:  0.6593
Mean Context recall:  0.8667
Mean Context precision:  0.8667
