# KIAM8

## Task 3: Building the RAG Core Logic and Evaluation

Objective: Build the retrieval and generation pipeline using the pre-built full-scale vector store, and evaluate its effectiveness.

In [1]:
import os
import ssl
import logging
import warnings

import requests
from urllib3.exceptions import InsecureRequestWarning

# 1. SET ENVIRONMENT VARIABLES FIRST (before importing any Hugging Face / transformer libraries)
os.environ['HF_HUB_DISABLE_SSL_VERIFY'] = '1'
os.environ['CURL_CA_BUNDLE'] = ''
os.environ['REQUESTS_CA_BUNDLE'] = ''

# 2. SUPPRESS INSECURE-REQUEST WARNINGS
# This must happen before you import transformers or sentence_transformers
requests.packages.urllib3.disable_warnings(category=InsecureRequestWarning)
warnings.filterwarnings("ignore", category=InsecureRequestWarning)

# 3. IMPORT LLM LIBRARIES ONLY AFTER THE SETTINGS ABOVE
# from transformers import ...
# from sentence_transformers import ...

# 4. LOGGING SETUP
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def disable_ssl_verification():
    """Disable certificate verification for the stdlib ssl module, process-wide."""
    logger.warning("SSL Verification is DISABLED globally.")
    ssl._create_default_https_context = ssl._create_unverified_context
    os.environ['PYTHONHTTPSVERIFY'] = '0'

disable_ssl_verification()



In [2]:
# Import essential libraries
import os
import sys

import numpy as np
import pandas as pd

# Add the project's src/ directory to sys.path so the pre-defined scripts can be imported
scripts_abs_path = os.path.abspath(os.path.join(os.getcwd(), '..', 'src'))

if not os.path.isdir(scripts_abs_path):
    print(f'Invalid scripts path: {scripts_abs_path}')
elif scripts_abs_path in sys.path:
    print('Scripts path already added to environment')
else:
    sys.path.append(scripts_abs_path)
    print('Scripts Path Added to Environment')


Scripts Path Added to Environment


### Sub Tasks

#### ●	Load the pre-built vector store provided in the dataset resources. This contains embeddings for the complete filtered dataset.

In [None]:
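# Import the vector store class from src/text_vectorization.py
from text_vectorization import FAISSVectorStore

# Load the pre-built embeddings for the full filtered dataset
df_full_embeddings = pd.read_parquet('../data/raw/complaint_embeddings-001.parquet')

print(f"Loaded {len(df_full_embeddings)} chunks")
print(f"Columns: {df_full_embeddings.columns.tolist()}")

# Stack the per-row vectors into an (n_chunks, 384) float32 matrix; FAISS works with float32
full_embeddings = np.array(df_full_embeddings['embedding'].tolist(), dtype=np.float32)
assert full_embeddings.shape[1] == 384, "all-MiniLM-L6-v2 produces 384-dimensional embeddings"

# Create the full-scale vector store
full_vector_store = FAISSVectorStore(embedding_dim=384)

# Everything except the raw vector becomes metadata (id, document, metadata)
full_metadata = df_full_embeddings.drop('embedding', axis=1).to_dict('records')

# Add embeddings; row i of the matrix is paired with full_metadata[i]
full_vector_store.add_embeddings(full_embeddings, full_metadata)

# Persist the index and metadata so later sessions can reload them
full_vector_store.save(
    index_path='../vector_store/FAISS/faiss_full_git index.bin',
    metadata_path='../vector_store/FAISS/full_metadata.pkl'
)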

INFO:faiss.loader:Loading faiss with AVX2 support.
INFO:faiss.loader:Successfully loaded faiss with AVX2 support.


Loaded 1375327 chunks
Columns: ['id', 'document', 'embedding', 'metadata']


INFO:text_vectorization:Added 1375327 vectors. Total store size: 1375327
INFO:text_vectorization:Successfully persisted FAISS index and metadata.
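The cell above rebuilds the FAISS index from the parquet file and writes two artifacts: the index and a metadata pickle. Assuming `FAISSVectorStore` wraps a plain FAISS index (which identifies vectors by insertion order), row order is what links vector i to `full_metadata[i]`. The hypothetical, pure-Python `TinyVectorStore` below is not the project's class; it only illustrates that invariant. It rejects rows whose dimension differs from `embedding_dim` (384 for all-MiniLM-L6-v2) or whose metadata count does not match, and it round-trips everything through a single pickle file.

```python
import os
import pickle
import tempfile
from typing import Dict, List, Sequence


class TinyVectorStore:
    """Hypothetical stand-in for FAISSVectorStore: vectors[i] belongs to metadata[i]."""

    def __init__(self, embedding_dim: int) -> None:
        self.embedding_dim = embedding_dim
        self.vectors: List[List[float]] = []
        self.metadata: List[Dict] = []

    def add_embeddings(self, vectors: Sequence[Sequence[float]], metadata: Sequence[Dict]) -> None:
        # Validate the whole batch before mutating state, so a bad batch adds nothing
        if len(vectors) != len(metadata):
            raise ValueError("each embedding needs exactly one metadata record")
        for vec in vectors:
            if len(vec) != self.embedding_dim:
                raise ValueError(f"expected dim {self.embedding_dim}, got {len(vec)}")
        self.vectors.extend([float(x) for x in vec] for vec in vectors)
        self.metadata.extend(metadata)

    def save(self, path: str) -> None:
        state = {"dim": self.embedding_dim, "vectors": self.vectors, "metadata": self.metadata}
        with open(path, "wb") as f:
            pickle.dump(state, f)

    @classmethod
    def load(cls, path: str) -> "TinyVectorStore":
        with open(path, "rb") as f:
            state = pickle.load(f)
        store = cls(state["dim"])
        store.vectors, store.metadata = state["vectors"], state["metadata"]
        return store


# Usage: a 3-D toy store saved to a temporary directory and reloaded
store = TinyVectorStore(embedding_dim=3)
store.add_embeddings([[0.1, 0.2, 0.3]], [{"id": "c1", "document": "late fee"}])
path = os.path.join(tempfile.mkdtemp(), "store.pkl")
store.save(path)
restored = TinyVectorStore.load(path)
```

Validating before extending keeps the two lists the same length even when a batch is rejected, which is the property a positional index depends on.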


#### ●	Retriever Implementation:
        ○	Create a function that takes a user's question (string) as input.
        ○	Embeds the question using the same model from Task 2 (all-MiniLM-L6-v2).
        ○	Performs a similarity search against the vector store to retrieve the top-k most relevant text chunks. k=5 is a good starting point.
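The project's `ComplaintRetriever` (imported from `src/retriver.py` below) is not shown in this notebook. As a rough sketch of the three steps listed above, the hypothetical `retrieve_top_k` embeds the question with a caller-supplied function, scores every stored vector by cosine similarity in a brute-force scan, and keeps the top k. A FAISS index performs the same kind of ranking much faster over 1.37M chunks; whether the scores printed below are cosine similarities depends on how `FAISSVectorStore` builds its index, which is an assumption here. The returned dict keys mirror the ones the test cell reads (`score`, `document`, `metadata`).

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(question: str,
                   embed: Callable[[str], Sequence[float]],
                   vectors: List[Sequence[float]],
                   records: List[Dict],
                   k: int = 5) -> List[Dict]:
    """Embed the question, score every stored chunk, and return the k best (highest first)."""
    query = embed(question)
    scored = ((cosine(query, vec), i) for i, vec in enumerate(vectors))
    return [
        {"score": score, "document": records[i]["document"], "metadata": records[i]["metadata"]}
        for score, i in heapq.nlargest(k, scored)
    ]


def fake_embed(text: str) -> List[float]:
    """Hypothetical stand-in for the sentence encoder; ignores its input."""
    return [1.0, 0.1]


# Toy 2-D "embeddings"; a real run would use all-MiniLM-L6-v2 (384-D) vectors instead
vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
records = [
    {"document": "late fee charged twice", "metadata": {"product_category": "Credit Card"}},
    {"document": "transfer never arrived", "metadata": {"product_category": "Money Transfer"}},
    {"document": "fees on my transfer", "metadata": {"product_category": "Money Transfer"}},
]
hits = retrieve_top_k("credit card fees", fake_embed, vectors, records, k=2)
```

`heapq.nlargest` avoids sorting every score when only k results are needed; the brute-force scan itself is still linear in the number of chunks, which is why an approximate or optimized index is used at full scale.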


In [4]:
from retriver import ComplaintRetriever

from sentence_transformers import SentenceTransformer

# Initialize retriever

embed_model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
retriever = ComplaintRetriever(full_vector_store, embed_model)

# Test retrieval
test_query = "Why are customers complaining about credit card fees?"
results = retriever.retrieve(test_query, k=5)

print(f"Query: {test_query}\n")
for idx, result in enumerate(results, 1):
    print(f"Result {idx} (score: {result['score']:.3f}):")
    print(f"  Product: {result['metadata']['product_category']}")
    print(f"  Text: {result['document'][:200]}...")
    print()

INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: cuda:0
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2
Batches: 100%|██████████| 1/1 [00:00<00:00,  3.50it/s]

Query: Why are customers complaining about credit card fees?

Result 1 (score: 0.663):
  Product: Credit Card
  Text: this is what is wrong and many in our country. credit card companies provide credit and credit limits with no fees and low interest rates. once people actually use the credit and have balances they do...

Result 2 (score: 0.658):
  Product: Credit Card
  Text: sted we complain to them to reduce their fees to . it feels like we, the consumer, is simply being used to as a lobbying tool in a political game. the convenience of paying by credit card benefits bot...

Result 3 (score: 0.655):
  Product: Credit Card
  Text: they are charging so much in fees can not get the card paid down due to fees....

Result 4 (score: 0.653):
  Product: Credit Card
  Text: fees 25.00 i consider this to be a very deceptive practice by the credit card company. therefore i am lodging this complaint....

Result 5 (score: 0.650):
  Product: Credit Card
  Text: i open credit cards and then i dont 




In [5]:
from pipeline_RAG import ComplaintGenerator, RAGPipeline

# Initialize the RAG pipeline with the retriever from the previous cell
generator = ComplaintGenerator()  # or OpenAIGenerator()
rag_pipeline = RAGPipeline(retriever, generator)

# Test the pipeline end to end
result = rag_pipeline.answer_question(
    "What are the main issues customers have with credit cards?"
)

print(f"Question: {result['question']}\n")
print(f"Answer: {result['answer']}\n")
print(f"\nBased on {result['num_sources']} sources")

INFO:pipeline_RAG:Using device: cuda
`torch_dtype` is deprecated! Use `dtype` instead!
INFO:accelerate.utils.modeling:We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
Loading checkpoint shards: 100%|██████████| 3/3 [00:07<00:00,  2.51s/it]
Device set to use cuda:0
Batches: 100%|██████████| 1/1 [00:00<00:00, 63.37it/s]


Question: What are the main issues customers have with credit cards?

Answer: Based on the provided complaint excerpts, customers express concerns about the lack of service and clarity during times of stress or issues with their credit cards. Specific complaints include problems making critical purchases due to credit card issues, frustration with the credit card application process, and an excessive number of problems encountered in a short period of time. Some customers also mention dissatisfaction with the customer service response to their issues and feeling that their concerns were not adequately addressed. Additionally, there is a theme of feeling let down by the financial institution, leading some customers to consider discontinuing their use of credit cards altogether. However, it should be noted that the complaint excerpts do not provide enough context to determine the specific products or issues causing these concerns.


Based on 5 sources
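`RAGPipeline` and `ComplaintGenerator` live in `src/pipeline_RAG.py`, which is not shown here. The generated answers cite chunks as "Source 1" ... "Source 5", which suggests the retrieved excerpts are numbered inside the prompt. A minimal sketch of that retrieve, prompt, generate flow follows; `PROMPT_TEMPLATE`, `build_prompt`, and `answer_question` are hypothetical names, and the retriever and LLM are passed in as plain callables so the example runs without a model. The returned keys match the ones this notebook reads from `result`.

```python
from typing import Callable, Dict, List

# Hypothetical template; the real prompt in pipeline_RAG is not shown in this notebook.
PROMPT_TEMPLATE = (
    "Answer the question using only the complaint excerpts below. "
    "If the excerpts do not contain enough information, say so.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(question: str, sources: List[Dict], max_chars: int = 500) -> str:
    """Number each retrieved chunk so the model can cite it as 'Source N'."""
    blocks = []
    for i, src in enumerate(sources, 1):
        product = src["metadata"].get("product_category", "Unknown")
        blocks.append(f"[Source {i}] ({product}) {src['document'][:max_chars]}")
    return PROMPT_TEMPLATE.format(context="\n\n".join(blocks), question=question)


def answer_question(question: str,
                    retrieve: Callable[[str, int], List[Dict]],
                    generate: Callable[[str], str],
                    k: int = 5) -> Dict:
    """Retrieve top-k chunks, build a grounded prompt, and generate an answer."""
    sources = retrieve(question, k)
    answer = generate(build_prompt(question, sources))
    return {"question": question, "answer": answer, "sources": sources, "num_sources": len(sources)}


# Usage with stub components (no model required)
docs = [{"document": "late fees on top of late fees", "metadata": {"product_category": "Credit Card"}}]
result = answer_question("Why are fees high?", lambda q, k: docs[:k], lambda prompt: "stub answer")
```

Truncating each chunk (`max_chars`) keeps five sources within a small model's context window; the instruction to say when the excerpts are insufficient matches the hedging visible in the answers above.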


#### ●	Qualitative Evaluation: This is the most important step.
        ○	Create a list of 5-10 representative questions you want your system to answer well.
        ○	For each question, run your RAG pipeline and analyze the results.
        ○	Create an evaluation table in your report (Markdown format is fine) with columns: Question, Generated Answer, Retrieved Sources (show 1-2), Quality Score (1-5), and Comments/Analysis.
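The last step asks for a Markdown evaluation table. A minimal, hypothetical helper for turning review notes into such a table is sketched below; each row is a dict keyed by the column names from the task. It collapses newlines and escapes `|` so multi-line LLM answers do not break the table layout. pandas' `DataFrame.to_markdown()` is an alternative, but it requires the optional `tabulate` package.

```python
from typing import Dict, List

# Column names copied from the task description above
COLUMNS = ["Question", "Generated Answer", "Retrieved Sources", "Quality Score", "Comments/Analysis"]


def _cell(value: object, max_len: int = 200) -> str:
    """Flatten a value into a single Markdown table cell."""
    text = " ".join(str(value).split())  # collapse newlines so the row stays on one line
    if len(text) > max_len:
        text = text[: max_len - 3] + "..."
    return text.replace("|", "\\|")  # escape pipes so they don't split the cell


def to_markdown_table(rows: List[Dict]) -> str:
    header = "| " + " | ".join(COLUMNS) + " |"
    divider = "|" + "---|" * len(COLUMNS)
    body = ["| " + " | ".join(_cell(row.get(col, "")) for col in COLUMNS) + " |" for row in rows]
    return "\n".join([header, divider] + body)


# Placeholder row: fill in real answers, sources, scores, and comments after review
table = to_markdown_table([{
    "Question": "What issues do customers face with money transfers?",
    "Generated Answer": "Funds held |\nwithout explanation",
    "Retrieved Sources": "Money Transfer - Other transaction problem",
    "Quality Score": "TBD",
    "Comments/Analysis": "placeholder",
}])
print(table)
```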


In [9]:
# Create evaluation test cases
evaluation_questions = [
    {
        "question": "What are the most common complaints about credit card fees?",
        "expected_topics": ["annual fees", "late fees", "interest rates"]
    },
    {
        "question": "Why are customers unhappy with personal loan approval processes?",
        "expected_topics": ["denied applications", "credit score", "documentation"]
    },
    {
        "question": "What issues do customers face with money transfers?",
        "expected_topics": ["transfer delays", "fees", "failed transactions"]
    },
    {
        "question": "What are the main problems with savings accounts?",
        "expected_topics": ["fees", "account access", "interest rates"]
    },
    {
        "question": "Are there complaints about fraudulent credit card charges?",
        "expected_topics": ["unauthorized charges", "fraud detection", "disputes"]
    }
]

def ask_score(prompt="\nQuality Score (1-5): "):
    """Prompt until the reviewer enters a whole number from 1 to 5."""
    while True:
        raw = input(prompt).strip()
        if raw.isdigit() and 1 <= int(raw) <= 5:
            return int(raw)
        print("Please enter a whole number from 1 to 5.")

# Run evaluation
evaluation_results = []

for test_case in evaluation_questions:
    question = test_case["question"]
    result = rag_pipeline.answer_question(question, k=5)

    # Show the answer and the top-2 retrieved sources for manual review
    print(f"\n{'='*80}")
    print(f"Question: {question}")
    print(f"\nAnswer:\n{result['answer']}")
    print("\nTop 2 Sources:")
    top_sources = []
    for idx, source in enumerate(result['sources'][:2], 1):
        label = f"{source['metadata']['product_category']} - {source['metadata']['issue']}"
        print(f"\n[Source {idx}] {label}")
        print(f"{source['document'][:150]}...")
        top_sources.append(label)

    # Manual quality scoring (1-5); invalid input is re-prompted instead of crashing int()
    quality_score = ask_score()
    comments = input("Comments: ")

    evaluation_results.append({
        'question': question,
        'answer': result['answer'],
        'top_sources': ' | '.join(top_sources),
        'num_sources': result['num_sources'],
        'quality_score': quality_score,
        'comments': comments
    })

# Save evaluation results
eval_df = pd.DataFrame(evaluation_results)
eval_df.to_csv('../data/processed/evaluation_results.csv', index=False)

Batches: 100%|██████████| 1/1 [00:00<00:00, 57.01it/s]



Question: What are the most common complaints about credit card fees?

Answer:
The most common complaints about credit card fees include unexpected annual fees (Source 4), deceptive practices regarding the assessment of fees (Source 2), and excessive late fees that accumulate rapidly (Source 1 and Source 5). These complaints suggest that customers feel that the fees are unfair and can put them in a precarious financial position. In some cases, the fees are perceived as deceptive due to a lack of clear communication about changes to the terms of the account.

Top 2 Sources:

[Source 1] Credit Card - Fees or interest
rges ... other than late fees on top of late fees. i'm aware it's legal for credit card companies to assess these fees and that i was informed of such...

[Source 2] Credit Card - Fees or interest
fees 25.00 i consider this to be a very deceptive practice by the credit card company. therefore i am lodging this complaint....


Batches: 100%|██████████| 1/1 [00:00<00:00, 60.82it/s]



Question: Why are customers unhappy with personal loan approval processes?

Answer:
The complaint excerpts suggest that customers are unhappy with the unclear and inconsistent reasons for personal loan approvals or denials. Some customers have felt misled or pressured into applying for loans when they were not yet eligible. Others have experienced invasions of privacy, such as unauthorized discussions of their finances. The lack of clear communication and explanation regarding the approval process has led to frustration and mistrust among customers. Additionally, some customers have expressed concern that previous loan history, even if paid in full, may negatively impact their future loan applications.

Top 2 Sources:

[Source 1] Personal Loan - Getting the loan
to approved for personal loan when the person help me applied for my loan said it ok even i'm still new customer less than 1 month with ! if i'm a new...

[Source 2] Personal Loan - Getting the loan
their people are very nice 

Batches: 100%|██████████| 1/1 [00:00<00:00, 61.02it/s]



Question: What issues do customers face with money transfers?

Answer:
Customers have reported issues with their money being transferred to incorrect recipients or accounts, funds being held without explanation, and unauthorized account closures. These issues have caused financial and emotional hardship, as well as the need to maintain multiple accounts to avoid hassles. Some customers believe that banks and money transfer companies are in collusion and are not adequately investigating or protecting their clients from scams. Additionally, there are concerns about predatory practices, such as holding funds without cause and long wait times for resolutions.

Top 2 Sources:

[Source 1] Money Transfer - Other transaction problem
les and money transfers rather than passing the entire burden to the customers to do that. moreover they negatively impacted all the customers. this i...

[Source 2] Money Transfer - Managing, opening, or closing your mobile wallet account
s causing severe financi

Batches: 100%|██████████| 1/1 [00:00<00:00, 62.08it/s]



Question: What are the main problems with savings accounts?

Answer:
Based on the provided context, the main issues mentioned with savings accounts include:

1. Declining interest rates: Some customers mentioned that the interest rates on their savings accounts dropped significantly after the financial crisis in 2008, making it unattractive to keep a large balance in the savings account.
2. Account closure: A customer mentioned that their savings account was closed without their consent, which may have been due to a consolidation or rollover of older accounts.
3. Hidden fees: No specific mention of hidden fees was made in the context, but one customer expressed feeling trapped with an account they couldn't pay off due to additional costs.

It is important to note that the context does not provide enough information to make a definitive conclusion about the prevalence or cause of these issues. However, these themes appear in the complaints, and further investigation may be necessary to

Batches: 100%|██████████| 1/1 [00:00<00:00, 58.75it/s]



Question: Are there complaints about fraudulent credit card charges?

Answer:
Yes, based on the provided complaint excerpts, there are complaints about fraudulent credit card charges. Specifically, in sources 2, 3, 4, and 5, customers have reported finding fraudulent charges on their credit cards. In source 4, the customer even mentions that they have disputed these charges but were denied and never told why. In source 5, the customer states that they have been experiencing fraudulent charges for over two and a half years since their credit card was reported stolen.

Top 2 Sources:

[Source 1] Credit Card - Closing/Cancelling account
re n't any fraudulent charges? are there any credit card issuers whose fraud departments are more reasonable to deal with?...

[Source 2] Credit Card - Billing disputes
there were fraudulent charges on my credit card....
