# Testing the RAG agent

The objective of this notebook is to evaluate how well the RAG agent identifies product prices.

The previous notebook is [`experiments/agent.ipynb`](agent.ipynb).

This notebook measures:
* Context:
    * Precision;
    * Recall.
* Faithfulness;
* Mean latency.

---

Importing packages:

In [1]:
!pip install ragas rapidfuzz pandas



In [None]:
from langchain_core.messages import SystemMessage, HumanMessage
from langchain_ollama import ChatOllama
from ollama import Client

from pandas import DataFrame

from ragas.llms import LangchainLLMWrapper
from ragas.dataset_schema import SingleTurnSample, EvaluationDataset
from ragas import evaluate
from ragas.metrics import (
    Faithfulness, 
    AnswerCorrectness,
    AnswerRelevancy,
    ContextRecall,
    ContextPrecision,
    NonLLMContextPrecisionWithReference,
    NonLLMContextRecall
    )

from time import time
from typing import TypedDict
from json import load

In [3]:
%cd ..

/home/gabael/projetos/precificador


In [37]:
from src.ollama.llm import chat
from src.ollama.embedding import embeddings

from src.graph.rag import generate_price
from src.rag.rag import generate_vector_store, generate_documents_from_web

---

Prepare the test environment:

In [None]:
OLLAMA_URL = "http://localhost:11434"
EVALUATE_MODEL = "llama3.2:1b"  # Ollama tag for Llama 3.2 1B

In [6]:
client = Client(OLLAMA_URL)

In [7]:
client.pull(EVALUATE_MODEL)

ProgressResponse(status='success', completed=None, total=None, digest=None)

In [8]:
evaluate_chat = ChatOllama(base_url=OLLAMA_URL, model=EVALUATE_MODEL)

In [9]:
class Question(TypedDict):
    query: str
    product: str

In [10]:
with open("dataset/questions.json", 'r') as f:
    questions: list[Question] = load(f)
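
`load` returns whatever the JSON file contains, and the `list[Question]` annotation is not checked at runtime. A minimal sketch of a sanity check follows, assuming the file is a JSON array of objects with non-empty `query` and `product` strings. `validate_questions` is a hypothetical helper, not part of the project.

```python
from typing import TypedDict


class Question(TypedDict):
    query: str
    product: str


def validate_questions(raw: object) -> list[Question]:
    """Return the questions unchanged, or raise ValueError on the first bad entry."""
    if not isinstance(raw, list):
        raise ValueError("expected a JSON array of questions")
    for index, item in enumerate(raw):
        if not isinstance(item, dict):
            raise ValueError(f"question {index} is not an object")
        for key in ("query", "product"):
            value = item.get(key)
            # Reject missing keys, non-string values and blank strings alike.
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"question {index} has no usable {key!r}")
    return raw
```

Calling `validate_questions(questions)` right after loading would surface a malformed entry before any scraping or embedding work starts.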

In [11]:
products = {q['product'] for q in questions}

In [12]:
documents = {
    product: generate_documents_from_web(product)
    for product in products
    }

In [13]:
vector_stores = {
    product: generate_vector_store(documents[product])
    for product in products
}

In [14]:
# Retrieve the 10 most similar chunks for each question from its product's store.
retrievals = [
    vector_stores[question['product']].similarity_search(question['query'], k=10)
    for question in questions
]

In [15]:
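# System prompt (Portuguese): "You are an assistant specialized in price research.
# Answer only with the price and where to buy. Use this knowledge base: ..."
def build_system_prompt(retrieval) -> str:
    # Join the retrieved chunks outside the f-string: backslashes are not
    # allowed inside f-string expressions before Python 3.12.
    knowledge_base = "\n".join(doc.page_content for doc in retrieval)
    return (
        "Você é um assistente especializado em realizar pesquisas de preço. "
        "Responda apenas com o preço e onde comprar.\n\n"
        f"Utilize esta base de conhecimento: {knowledge_base}."
    )

answers = [
    chat.invoke([
        SystemMessage(build_system_prompt(retrieval)),
        HumanMessage(question['query']),
    ])
    for retrieval, question in zip(retrievals, questions)
]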

In [16]:
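single_samples = [
    SingleTurnSample(
        user_input=question['query'],
        response=str(answer.content),
        # NOTE: the dataset has no ground-truth answer, so the query is reused as
        # the reference; treat reference-based scores (e.g. AnswerCorrectness) with caution.
        reference=question['query'],
        retrieved_contexts=[r.page_content for r in retrieval],
        # Every chunk scraped for the product counts as a reference context.
        reference_contexts=[doc.page_content for doc in documents[question['product']]]
    )
    for (question, retrieval, answer)
    in zip(questions, retrievals, answers)
]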

In [33]:
dataset = EvaluationDataset(single_samples)

In [34]:
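# The model that generated the answers (`chat`) also acts as the judge here.
# Uncomment the next line to judge with the separate evaluation model instead.
# llm = LangchainLLMWrapper(evaluate_chat)
llm = LangchainLLMWrapper(chat)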

---

Evaluate:

In [40]:
metrics = [
    Faithfulness(), 
    AnswerCorrectness(),
    AnswerRelevancy(),
    ContextRecall(),
    ContextPrecision(),
    NonLLMContextPrecisionWithReference(),
    NonLLMContextRecall()
]

In [None]:
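from ragas.run_config import RunConfig

# The recorded run (output below) used the default run configuration and hit
# many TimeoutErrors. A longer per-job timeout and fewer parallel workers may
# help with a local Ollama server; the values here are illustrative, not tuned.
score = evaluate(
    dataset=dataset,
    metrics=metrics,
    llm=llm,
    embeddings=embeddings,
    run_config=RunConfig(timeout=600, max_workers=2),
    raise_exceptions=False,
)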

Evaluating:   5%|▌         | 7/140 [04:23<1:50:08, 49.68s/it]Exception raised in Job[14]: TimeoutError()
Exception raised in Job[16]: TimeoutError()
Exception raised in Job[17]: TimeoutError()
Exception raised in Job[18]: TimeoutError()
Exception raised in Job[7]: TimeoutError()
Exception raised in Job[1]: TimeoutError()
Exception raised in Job[10]: TimeoutError()
Exception raised in Job[0]: TimeoutError()
Exception raised in Job[11]: TimeoutError()
Exception raised in Job[3]: TimeoutError()
Exception raised in Job[15]: TimeoutError()
Exception raised in Job[4]: TimeoutError()
Exception raised in Job[21]: TimeoutError()
Exception raised in Job[8]: TimeoutError()
Exception raised in Job[9]: TimeoutError()
Evaluating:  21%|██▏       | 30/140 [09:01<49:55, 27.23s/it] Exception raised in Job[25]: TimeoutError()
Exception raised in Job[23]: TimeoutError()
Exception raised in Job[30]: TimeoutError()
Exception raised in Job[36]: TimeoutError()
Exception raised in Job[29]: TimeoutError()
Excep

In [None]:
df = score.to_pandas()

In [None]:
df

In [None]:
df.to_csv("dataset/evaluation.csv")
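
With `raise_exceptions=False`, jobs that fail (such as the timeouts above) produce missing scores, and pandas writes those as empty cells by default. The sketch below shows one way to average each metric from the saved CSV while skipping missing values. The column names in the sample are illustrative assumptions, and `mean_per_metric` is a hypothetical helper.

```python
import csv
import io
import math


def mean_per_metric(csv_text: str, skip: tuple[str, ...] = ("",)) -> dict[str, float]:
    """Average every numeric column, ignoring empty and NaN cells.

    The default `skip` drops the unnamed index column that `to_csv` writes.
    """
    values_by_column: dict[str, list[float]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, cell in row.items():
            if column in skip or cell is None:
                continue
            try:
                value = float(cell)
            except (TypeError, ValueError):
                continue  # text columns (question, answer) and empty cells
            if not math.isnan(value):
                values_by_column.setdefault(column, []).append(value)
    return {
        column: sum(values) / len(values)
        for column, values in values_by_column.items()
    }


# Illustrative data: one missing faithfulness score and one missing recall score.
sample = (
    ",user_input,faithfulness,context_recall\n"
    "0,q1,1.0,0.5\n"
    "1,q2,,0.25\n"
    "2,q3,nan,\n"
)
print(mean_per_metric(sample))  # {'faithfulness': 1.0, 'context_recall': 0.375}
```

Reporting how many rows contributed to each mean alongside the mean itself would make it clearer how much of the evaluation actually completed.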

---

Context:

In [19]:
non_llm_context_recall = NonLLMContextRecall() 
non_llm_context_precision_with_reference = NonLLMContextPrecisionWithReference()

In [20]:
non_llm_context_recall_list = [non_llm_context_recall.single_turn_score(sample) for sample in dataset]

In [21]:
non_llm_context_precision_with_reference_list = [non_llm_context_precision_with_reference.single_turn_score(sample) for sample in dataset]

In [22]:
context_evaluation_table = DataFrame(
    data=[list(pair) for pair in zip(non_llm_context_recall_list, non_llm_context_precision_with_reference_list)],
    columns=["context recall", "context precision"]
    )

context_evaluation_table

| | context recall | context precision |
|---|---|---|
| 0 | 0.097826 | 1.0 |
| 1 | 0.373288 | 1.0 |
| 2 | 0.33871 | 1.0 |
| 3 | 0.47619 | 1.0 |
| 4 | 0.087248 | 1.0 |
| 5 | 0.102326 | 1.0 |
| 6 | 0.842105 | 1.0 |
| 7 | 0.55 | 1.0 |
| 8 | 0.777778 | 1.0 |
| 9 | 0.785714 | 1.0 |


In [23]:
context_evaluation_table['context recall'].mean()

0.4489125374182918

In [24]:
context_evaluation_table['context precision'].mean()

0.9999999999896666
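
In this notebook the reference contexts are all chunks scraped for a product, and the retrieved contexts are the top 10 chunks drawn from that same set. Under that setup every retrieved chunk is relevant by construction, which would explain a precision of about 1.0, while recall is limited by how much of the full set ten chunks can cover. The sketch below illustrates the effect with simplified exact-match definitions. These are not the ragas implementations, which compare strings by similarity, so the numbers differ in detail. The chunk data is hypothetical.

```python
def context_recall(retrieved: list[str], reference: list[str]) -> float:
    """Share of reference chunks that appear among the retrieved chunks."""
    if not reference:
        return 0.0
    found = set(retrieved)
    return sum(chunk in found for chunk in reference) / len(reference)


def context_precision(retrieved: list[str], reference: list[str]) -> float:
    """Share of retrieved chunks that appear among the reference chunks."""
    if not retrieved:
        return 0.0
    relevant = set(reference)
    return sum(chunk in relevant for chunk in retrieved) / len(retrieved)


# Hypothetical product page split into 40 chunks; the retriever returns 10 of them.
all_chunks = [f"chunk {i}" for i in range(40)]
top_k = all_chunks[:10]

print(context_precision(top_k, all_chunks))  # 1.0
print(context_recall(top_k, all_chunks))     # 0.25
```

A hand-labelled set of relevant chunks per question, rather than the whole scraped set, would make both numbers more informative.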

---

Mean latency:

In [27]:
times = []

for question in questions:
    begin = time()
    generate_price({
        'category': '',
        'product': question['product'],
        'query': question['query']
    })
    # Wall-clock seconds for one full agent run.
    times.append(time() - begin)

In [28]:
print("Mean latency (s):", sum(times) / len(times))

Mean latency (s): 63.65097672939301
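
A mean alone can hide slow outliers, so it may be worth reporting the median and a high percentile as well. The following is a minimal sketch using only the standard library. `measure_latencies` and `summarize` are hypothetical helpers, and the `sleep` call stands in for `generate_price`.

```python
from statistics import mean, median, quantiles
from time import perf_counter, sleep
from typing import Callable


def measure_latencies(call: Callable[[], object], runs: int) -> list[float]:
    """Time `call` `runs` times and return the elapsed seconds of each run."""
    latencies = []
    for _ in range(runs):
        start = perf_counter()  # monotonic clock intended for measuring intervals
        call()
        latencies.append(perf_counter() - start)
    return latencies


def summarize(latencies: list[float]) -> dict[str, float]:
    """Mean, median and 95th percentile; needs at least two measurements."""
    return {
        "mean": mean(latencies),
        "median": median(latencies),
        # n=20 yields 19 cut points; the last one is the 95th percentile.
        "p95": quantiles(latencies, n=20)[-1],
    }


# Stand-in for the agent call: each run sleeps for about 10 ms.
latencies = measure_latencies(lambda: sleep(0.01), runs=5)
print(summarize(latencies))
```

With only a handful of questions the 95th percentile is an extrapolation, so it is best read as a rough indicator.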
