# Task 3: RAG Core Logic & Qualitative Evaluation

This notebook evaluates the Retrieval-Augmented Generation (RAG) system built
using the precomputed complaint embeddings provided in `complaint_embeddings.parquet`.

The goal is to assess how well the system retrieves relevant complaint excerpts
and generates grounded, accurate answers to user questions.


In [1]:
import pandas as pd

df = pd.read_parquet(
    "../data/raw/complaint_embeddings.parquet",
    # columns=["embedding", "complaint_text"],  # only necessary columns
)


In [3]:
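# Only the last expression in a cell is displayed, so df.columns produces no output here.
df.columns
# Without parentheses this shows the bound method's repr (which embeds the
# DataFrame preview); df.info() would print the column summary instead.
df.info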

<bound method DataFrame.info of                  id                                           document  \
0        14069121_0  a card was opened under my name by a fraudster...   
1        14061897_0  i made the mistake of using my wellsfargo debi...   
2        14061897_1  and got a letter stating my dispute was reject...   
3        14047085_0  dear cfpb, i have a secured credit card with c...   
4        14047085_1  y confirmation whatsoever to report to the pol...   
...             ...                                                ...   
1375322   6238123_1  tract i had hey and i explained to them that i...   
1375323   6238123_2  my balance and i have the documents to show th...   
1375324   6238123_3  alled a crew and then looking back at the cont...   
1375325   6238123_4  know my car was repossessed on now i've been c...   
1375326   6238123_5                                               this   

                                                 embedding  \
0        [-0.0427

In [4]:
import os
import sys

import pandas as pd

# Add the parent directory to the import path so that `src` can be imported.
sys.path.insert(0, os.path.abspath(".."))

from src.rag_pipeline import RAGPipeline


In [5]:
rag = RAGPipeline()


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

Device set to use cpu


In [7]:
evaluation_questions = [
    "What are the most common credit card complaints?",
    "Do customers complain about late payment fees?",
    "What issues do customers face with loan approvals?",
    "Are there complaints related to customer service responsiveness?",
    "What fraud-related issues are mentioned in the complaints?",
    "Do customers report problems with interest rates?",
    "What issues are raised regarding account closures?"
]


In [8]:
results = []

# Run every evaluation question through the pipeline and collect one row each.
for question in evaluation_questions:
    output = rag.run(question)

    results.append({
        "Question": question,
        "Generated Answer": output["answer"],
        "Retrieved Sources": " | ".join(output["sources"]),
        "Quality Score (1-5)": "",  # Fill manually
        "Comments / Analysis": ""   # Fill manually
    })

# Note: this rebinds `df`, replacing the embeddings DataFrame loaded earlier.
df = pd.DataFrame(results)
df


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


|   | Question | Generated Answer | Retrieved Sources | Quality Score (1-5) | Comments / Analysis |
|---|---|---|---|---|---|
| 0 | What are the most common credit card complaints? | The most common credit card complaints are abo... | any. i am sure that you had many other complai... |  |  |
| 1 | Do customers complain about late payment fees? | Yes, based on the context provided, the custom... | in their hands, not somewhere magically in lim... |  |  |
| 2 | What issues do customers face with loan approv... | Customers have expressed concerns about being ... | thus ruining my chances for approvals with oth... |  |  |
| 3 | Are there complaints related to customer servi... | Yes, based on the context provided, there are ... | e about their customers that they hire custome... |  |  |
| 4 | What fraud-related issues are mentioned in the... | The complaint mentions fraudulent transactions... | stently since first reporting the fraud claim ... |  |  |
| 5 | Do customers report problems with interest rates? | Yes, based on the context provided, customers ... | interests rate they simply dont care to help c... |  |  |
| 6 | What issues are raised regarding account closu... | The customers raised issues regarding the sudd... | , and have never engaged in any suspicious or ... |  |  |
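
The evaluation loop above relies on `rag.run(question)` returning a mapping with an `"answer"` string and a `"sources"` list of strings. That contract is inferred from how the cell uses the result, not from the `RAGPipeline` source. A minimal sketch with a hypothetical `StubPipeline` (not part of the project) makes the assumed shape explicit and lets the row-building logic be checked without loading the model:

```python
from typing import Dict, List


class StubPipeline:
    """Hypothetical stand-in for RAGPipeline; returns canned output."""

    def run(self, question: str) -> Dict[str, object]:
        # Assumed return shape: {"answer": str, "sources": list of str}
        return {
            "answer": f"stub answer for: {question}",
            "sources": ["excerpt one", "excerpt two"],
        }


def build_rows(pipeline, questions: List[str]) -> List[Dict[str, str]]:
    """Run each question and collect one evaluation row per question."""
    rows = []
    for question in questions:
        output = pipeline.run(question)
        rows.append({
            "Question": question,
            "Generated Answer": output["answer"],
            "Retrieved Sources": " | ".join(output["sources"]),
            "Quality Score (1-5)": "",   # filled in manually later
            "Comments / Analysis": "",   # filled in manually later
        })
    return rows


rows = build_rows(StubPipeline(), ["Do customers complain about late payment fees?"])
```

If the real pipeline returns sources in another form (for example, dictionaries with metadata), the `" | ".join(...)` step would need to extract the text field first.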


## Evaluation Guidelines

Quality Score:
- **5** – Fully accurate, well-grounded, clear
- **4** – Mostly correct, minor missing details
- **3** – Partially correct or vague
- **2** – Weak grounding or incomplete
- **1** – Incorrect or hallucinated

Fill in the Quality Score and Comments columns manually based on these criteria.
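
Because the scores are entered by hand, it can help to keep the rubric in code and reject values outside it. The sketch below copies the five levels from the guidelines above; the `validate_score` helper is a hypothetical addition, not something the notebook defines:

```python
# Rubric levels copied from the evaluation guidelines.
RUBRIC = {
    5: "Fully accurate, well-grounded, clear",
    4: "Mostly correct, minor missing details",
    3: "Partially correct or vague",
    2: "Weak grounding or incomplete",
    1: "Incorrect or hallucinated",
}


def validate_score(score: int) -> int:
    """Return the score unchanged if it is a valid rubric level, else raise."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or score not in RUBRIC:
        raise ValueError(f"score must be an integer from 1 to 5, got {score!r}")
    return score


print(validate_score(4), "-", RUBRIC[4])
```

A call such as `df.loc[0, "Quality Score (1-5)"] = validate_score(4)` would then fail early on a typo like `44`.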


In [9]:
df.loc[0, "Quality Score (1-5)"] = 4
df.loc[0, "Comments / Analysis"] = "Answer is accurate and grounded but lacks specific examples."

df.loc[1, "Quality Score (1-5)"] = 5
df.loc[1, "Comments / Analysis"] = "Clearly explains late fee complaints using retrieved context."

df


|   | Question | Generated Answer | Retrieved Sources | Quality Score (1-5) | Comments / Analysis |
|---|---|---|---|---|---|
| 0 | What are the most common credit card complaints? | The most common credit card complaints are abo... | any. i am sure that you had many other complai... | 4.0 | Answer is accurate and grounded but lacks spec... |
| 1 | Do customers complain about late payment fees? | Yes, based on the context provided, the custom... | in their hands, not somewhere magically in lim... | 5.0 | Clearly explains late fee complaints using ret... |
| 2 | What issues do customers face with loan approv... | Customers have expressed concerns about being ... | thus ruining my chances for approvals with oth... |  |  |
| 3 | Are there complaints related to customer servi... | Yes, based on the context provided, there are ... | e about their customers that they hire custome... |  |  |
| 4 | What fraud-related issues are mentioned in the... | The complaint mentions fraudulent transactions... | stently since first reporting the fraud claim ... |  |  |
| 5 | Do customers report problems with interest rates? | Yes, based on the context provided, customers ... | interests rate they simply dont care to help c... |  |  |
| 6 | What issues are raised regarding account closu... | The customers raised issues regarding the sudd... | , and have never engaged in any suspicious or ... |  |  |
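
Only two of the seven rows in the table above are scored so far, so any aggregate has to skip the blank cells. A minimal sketch of such a summary follows; `mean_quality_score` is a hypothetical helper, and it assumes unscored cells hold `""`, `None`, or NaN:

```python
from statistics import mean
from typing import Iterable, Optional


def mean_quality_score(scores: Iterable[object]) -> Optional[float]:
    """Average the scores that have been filled in, skipping blanks."""
    filled = []
    for value in scores:
        if value is None or value == "":
            continue
        number = float(value)
        if number != number:  # NaN is the only value not equal to itself
            continue
        filled.append(number)
    # Return None rather than raising when nothing has been scored yet.
    return mean(filled) if filled else None


# Scores as they stand after rating the first two questions.
print(mean_quality_score([4, 5, "", "", "", "", ""]))  # prints 4.5
```

The same function should accept a DataFrame column directly, for example `mean_quality_score(df["Quality Score (1-5)"])`, since iterating a Series yields its values.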


In [10]:
print(df.to_markdown(index=False))


| Question | Generated Answer | Retrieved Sources