# Testing the RAG agent

The objective of this notebook is to evaluate the RAG agent used to identify prices.

The previous notebook can be found at [`experiments/agent.ipynb`](agent.ipynb).

This notebook will test:
* Context:
    * Precision;
    * Recall.
* Faithfulness;
* Mean latency;
* Footprint:
    * RAM;
    * CPU.

---

Importing packages:

In [3]:
!pip install ragas rapidfuzz pandas



In [30]:
from langchain_core.messages import SystemMessage, HumanMessage

from pandas import DataFrame

from ragas.llms import LangchainLLMWrapper
from ragas.dataset_schema import SingleTurnSample, EvaluationDataset
from ragas.metrics import (
    NonLLMContextRecall, 
    NonLLMContextPrecisionWithReference,
    Faithfulness
    )

from time import time
from typing import TypedDict
from json import load

In [5]:
%cd ..

/home/gabael/projetos/precificador


In [6]:
from src.ollama.llm import chat

from src.graph.rag import generate_price
from src.rag.rag import generate_vector_store, generate_documents_from_web

USER_AGENT environment variable not set, consider setting it to identify your requests.


---

Prepare test environment:

In [7]:
class Question(TypedDict):
    query: str
    product: str

In [8]:
with open("dataset/questions.json", 'r') as f:
    questions: list[Question] = load(f)
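
Every later cell indexes `question['query']` and `question['product']`, so a quick sanity check right after loading can catch malformed entries early. The sketch below is only an illustration: `validate_questions` is a hypothetical helper, not part of the project, and the sample entries are made up.

```python
from typing import TypedDict


class Question(TypedDict):
    query: str
    product: str


def validate_questions(raw: list[dict]) -> list[Question]:
    """Hypothetical helper: raise if any entry lacks a non-empty string field."""
    valid: list[Question] = []
    for index, item in enumerate(raw):
        query = item.get("query")
        product = item.get("product")
        if not (isinstance(query, str) and query.strip()):
            raise ValueError(f"entry {index}: missing or empty 'query'")
        if not (isinstance(product, str) and product.strip()):
            raise ValueError(f"entry {index}: missing or empty 'product'")
        valid.append({"query": query, "product": product})
    return valid


# Made-up sample entries, only to show the expected shape.
sample = [
    {"query": "How much does a 1 kg bag of rice cost?", "product": "rice"},
    {"query": "Where can I buy a cheap blender?", "product": "blender"},
]
checked = validate_questions(sample)
```

In the notebook, the list returned by `load(f)` would be passed in place of `sample`.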

In [9]:
products = set(q['product'] for q in questions)

In [10]:
documents = {
    product: generate_documents_from_web(product)
    for product in products
    }

In [11]:
vector_stores = {
    product: generate_vector_store(documents[product])
    for product in products
}

In [12]:
retrievals = [
    vector_stores[question['product']].similarity_search(question['query'], 10)
    for question in questions
]

In [13]:
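def break_line() -> str:
    # Helper so the f-string below can join on a newline: backslashes are not
    # allowed inside f-string expressions before Python 3.12.
    return '\n'

# System prompt (Portuguese). In English: "You are an assistant specialized in
# price research. Answer only with the price and where to buy.
# Use this knowledge base: <retrieved documents>."
answers = [
    chat.invoke([
        SystemMessage(f"""Você é um assistente especializado em realizar pesquisas de preço. Responda apenas com o preço e onde comprar.
                                  
                        Utilize esta base de conhecimento: {
                            break_line().join([
                                doc.page_content for doc in retrieval
                            ])}."""),
        HumanMessage(question['query'])])
    for retrieval, question in zip(retrievals, questions)
]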
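
The prompt assembly above can also be factored into a small function, which makes it checkable without calling the model. This is a sketch under assumptions: `SYSTEM_TEMPLATE` and `build_system_prompt` are hypothetical names, the prompt wording is the one used in the cell above (without the indentation whitespace), and the sample strings are made up.

```python
# Same wording as the notebook's system prompt, with a placeholder for the context.
SYSTEM_TEMPLATE = (
    "Você é um assistente especializado em realizar pesquisas de preço. "
    "Responda apenas com o preço e onde comprar.\n\n"
    "Utilize esta base de conhecimento: {context}."
)


def build_system_prompt(contexts: list[str]) -> str:
    """Join the retrieved chunks with newlines and place them in the template."""
    return SYSTEM_TEMPLATE.format(context="\n".join(contexts))


# Made-up chunks, for illustration only.
prompt = build_system_prompt([
    "Arroz 1 kg: R$ 6,99 na Loja A",
    "Arroz 5 kg: R$ 29,90 na Loja B",
])
```

The result could then be wrapped as `SystemMessage(prompt)` in the same way as above.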

In [14]:
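single_samples = [
    SingleTurnSample(
        user_input=question['query'],
        response=str(answer.content),
        # Question has no ground-truth answer field, so the query is reused here;
        # the metrics below read the contexts and the response, not `reference`.
        reference=question['query'],
        retrieved_contexts=[r.page_content for r in retrieval],
        # Every document scraped for the product counts as a reference context.
        reference_contexts=[doc.page_content for doc in documents[question['product']]]
        )
    for (question, retrieval, answer)
    in zip(questions, retrievals, answers)
]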

In [15]:
dataset = EvaluationDataset(single_samples)

In [16]:
llm = LangchainLLMWrapper(chat)

---

Context:

In [17]:
non_llm_context_recall = NonLLMContextRecall() 
non_llm_context_precision_with_reference = NonLLMContextPrecisionWithReference()

In [18]:
non_llm_context_recall_list = [non_llm_context_recall.single_turn_score(sample) for sample in dataset]

In [19]:
non_llm_context_precision_with_reference_list = [non_llm_context_precision_with_reference.single_turn_score(sample) for sample in dataset]

In [22]:
context_evaluation_table = DataFrame(
    data=[list(pair) for pair in zip(non_llm_context_recall_list, non_llm_context_precision_with_reference_list)],
    columns=["context recall", "context precision"]
    )

context_evaluation_table

| | context recall | context precision |
|---|---|---|
| 0 | 0.070423 | 1.0 |
| 1 | 0.091255 | 1.0 |
| 2 | 0.161017 | 1.0 |
| 3 | 0.458333 | 1.0 |
| 4 | 0.096 | 1.0 |
| 5 | 0.095238 | 1.0 |
| 6 | 1.0 | 1.0 |
| 7 | 0.55 | 1.0 |
| 8 | 0.764706 | 1.0 |
| 9 | 0.916667 | 1.0 |


In [33]:
context_evaluation_table['context recall'].mean()

0.4203638214806557

In [34]:
context_evaluation_table['context precision'].mean()

0.999999999989889
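
One possible reading of these numbers: `reference_contexts` holds every document scraped for the product, and the retrieved chunks come from that same set, so precision is close to 1.0 by construction, while recall is limited by how many reference chunks the 10 retrieved ones can cover. The sketch below illustrates the effect with a simplified, exact-match version of the two ideas. It is not the ragas implementation (which relies on string similarity rather than exact equality), the helper names are hypothetical, and the corpus size of 142 is a toy value.

```python
def exact_match_recall(retrieved: list[str], reference: list[str]) -> float:
    """Fraction of reference chunks that appear among the retrieved chunks."""
    if not reference:
        return float("nan")
    retrieved_set = set(retrieved)
    return sum(chunk in retrieved_set for chunk in reference) / len(reference)


def exact_match_precision(retrieved: list[str], reference: list[str]) -> float:
    """Fraction of retrieved chunks that appear among the reference chunks."""
    if not retrieved:
        return float("nan")
    reference_set = set(reference)
    return sum(chunk in reference_set for chunk in retrieved) / len(retrieved)


# Toy corpus: 142 distinct reference chunks, of which the retriever returns 10.
reference = [f"chunk {i}" for i in range(142)]
retrieved = reference[:10]

recall = exact_match_recall(retrieved, reference)
precision = exact_match_precision(retrieved, reference)
print(f"recall={recall:.6f} precision={precision:.1f}")
# prints: recall=0.070423 precision=1.0
```

Under this reading, a low recall says more about the size of the reference set than about retrieval quality; a smaller, hand-picked set of reference contexts per question would make both metrics more informative.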

---

Faithfulness:

In [31]:
faithfulness = Faithfulness(llm=llm)

In [32]:
faithfulness_list = [faithfulness.single_turn_score(sample) for sample in dataset]

Prompt fix_output_format failed to parse output: The output parser failed to parse the output including retries.
Prompt fix_output_format failed to parse output: The output parser failed to parse the output including retries.
Prompt fix_output_format failed to parse output: The output parser failed to parse the output including retries.
Prompt n_l_i_statement_prompt failed to parse output: The output parser failed to parse the output including retries.


RagasOutputParserException: The output parser failed to parse the output including retries.
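
Because a single parsing failure aborts the whole list comprehension, one option is to score samples one at a time and record `NaN` for the ones that fail. The sketch below shows the pattern with a stand-in scorer: `safe_scores` and `flaky_scorer` are hypothetical names, and in this notebook the callable would be `faithfulness.single_turn_score`.

```python
import math
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def safe_scores(score: Callable[[T], float], samples: Iterable[T]) -> list[float]:
    """Score each sample, storing NaN instead of stopping when scoring raises."""
    results: list[float] = []
    for index, sample in enumerate(samples):
        try:
            results.append(score(sample))
        except Exception as error:  # broad on purpose: the error type is library-specific
            print(f"sample {index} failed: {error}")
            results.append(math.nan)
    return results


def flaky_scorer(text: str) -> float:
    """Stand-in scorer that fails on empty input, mimicking a parser error."""
    if not text:
        raise ValueError("could not parse output")
    return 1.0


scores = safe_scores(flaky_scorer, ["ok", "", "ok"])

# Average only the samples that were actually scored.
valid = [s for s in scores if not math.isnan(s)]
mean_score = sum(valid) / len(valid) if valid else math.nan
```

This does not fix the underlying parsing problem, which may come from the model not returning the structured output the metric expects, but it keeps the scores that did succeed.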

---

Mean latency:

In [35]:
times = []

for question in questions:
    begin = time()
    price_state = generate_price({
        'category': '',
        'product': question['product'],
        'query': question['query']
        })
    elapsed = time() - begin  # seconds spent on the full agent run
    times.append(elapsed)

In [36]:
print("Mean latency (s):", sum(times) / len(times))

Mean latency (s): 51.04858076572418
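
A mean alone hides the spread, so it can help to report the median and the extremes as well. The sketch below uses only the standard library; `summarize_latencies` is a hypothetical helper and the sample timings are made up, not measurements from this notebook.

```python
from statistics import mean, median


def summarize_latencies(times: list[float]) -> dict[str, float]:
    """Return basic latency statistics, in the same unit as the input (seconds)."""
    if not times:
        raise ValueError("no timings recorded")
    return {
        "mean": mean(times),
        "median": median(times),
        "min": min(times),
        "max": max(times),
    }


# Made-up timings in seconds, for illustration only.
summary = summarize_latencies([40.0, 50.0, 60.0, 90.0])
```

For measuring intervals, `time.perf_counter` is generally a better clock than `time.time`, since it is monotonic and has higher resolution.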
