# RAG Complaint Analysis Notebook

- This notebook runs the full RAG pipeline with the Mistral-7B-Instruct model: it loads the FAISS vector store, retrieves complaint chunks, generates answers, and collects the results in a table for qualitative evaluation.

#### Import Libraries

In [1]:
import sys
import os
import pandas as pd

# -------------------------
#  FIX PYTHON PATH FOR NOTEBOOK
# -------------------------

# Path to project root (one level above 'notebooks')
project_root = os.path.abspath(os.path.join(os.getcwd(), ".."))

# Path to `src` directory
scripts_path = os.path.join(project_root, "src")

# Add both project root and scripts folder to sys.path
if project_root not in sys.path:
    sys.path.append(project_root)

if scripts_path not in sys.path:
    sys.path.append(scripts_path)

print("Project root:", project_root)
print("Scripts path:", scripts_path)

# -------------------------
#  IMPORTS
# -------------------------

try:
    from src.rag_pipeline import build_rag_pipeline
    from src.evaluation import run_evaluation, results_to_dataframe, results_to_markdown_table, EVALUATION_QUESTIONS
    from src.vector_store import ComplaintVectorStore
    from src.retriever import build_retriever, ComplaintRetriever
    from src.generator import build_generator, RAGGenerator
    print("Imports successful!")
except ModuleNotFoundError as e:
    print("Import failed:", e)
    raise

Project root: /Users/elbethelzewdie/Downloads/rag-complaint-chatbot/rag-complaint-chatbot
Scripts path: /Users/elbethelzewdie/Downloads/rag-complaint-chatbot/rag-complaint-chatbot/src




Imports successful!
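
The path setup above assumes the notebook's working directory sits exactly one level below the project root. A slightly more defensive variant, sketched here, walks upward from a starting directory until it finds one that contains a `src` folder. `find_project_root` and `add_to_sys_path` are hypothetical helpers for illustration, not part of the project, and the `src` marker name is an assumption taken from the layout printed above.

```python
import sys
from pathlib import Path


def find_project_root(start, marker="src"):
    """Walk upward from `start` until a directory containing `marker` is found.

    Hypothetical helper; the marker directory name is an assumption.
    """
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No '{marker}' directory found at or above {start}")


def add_to_sys_path(path):
    """Append `path` to sys.path only if it is not already present."""
    path = str(path)
    if path not in sys.path:
        sys.path.append(path)
    return path
```

In the notebook this could be used as `project_root = find_project_root(Path.cwd())` followed by `add_to_sys_path(project_root)`, which keeps the cell working even if the notebook is moved into a deeper subfolder.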


In [3]:
vectorstore = ComplaintVectorStore.from_parquet(
    parquet_path="/Users/elbethelzewdie/Downloads/rag-complaint-chatbot/rag-complaint-chatbot/data/raw/complaint_embeddings.parquet",
    index_path="/Users/elbethelzewdie/Downloads/rag-complaint-chatbot/rag-complaint-chatbot/faiss.index",
    meta_path="/Users/elbethelzewdie/Downloads/rag-complaint-chatbot/rag-complaint-chatbot/metadata.json",
    batch_size=5000
)

Building FAISS index: 100%|██████████| 276/276 [18:06<00:00,  3.94s/it]

✅ FAISS index built and saved to /Users/elbethelzewdie/Downloads/rag-complaint-chatbot/rag-complaint-chatbot/faiss.index
✅ Metadata saved to /Users/elbethelzewdie/Downloads/rag-complaint-chatbot/rag-complaint-chatbot/metadata.json





In [3]:
vectorstore = ComplaintVectorStore.load(
    index_path="/Users/elbethelzewdie/Downloads/rag-complaint-chatbot/rag-complaint-chatbot/faiss.index",
    meta_path="/Users/elbethelzewdie/Downloads/rag-complaint-chatbot/rag-complaint-chatbot/metadata.json"
)
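
The two cells above first build the index (about 18 minutes in the logged run) and then load it from disk. One way to avoid rebuilding on every run is to check whether both files already exist and only build when they are missing. `load_or_build` below is a hypothetical helper; it takes the build and load steps as callables, so it assumes nothing about `ComplaintVectorStore` beyond the two calls already shown.

```python
from pathlib import Path


def load_or_build(index_path, meta_path, build_fn, load_fn):
    """Return load_fn() if both files exist on disk, otherwise build_fn()."""
    if Path(index_path).is_file() and Path(meta_path).is_file():
        return load_fn()
    return build_fn()


# Possible use in this notebook (paths are whatever the cells above use):
# vectorstore = load_or_build(
#     index_path,
#     meta_path,
#     build_fn=lambda: ComplaintVectorStore.from_parquet(
#         parquet_path=parquet_path,
#         index_path=index_path,
#         meta_path=meta_path,
#         batch_size=5000,
#     ),
#     load_fn=lambda: ComplaintVectorStore.load(
#         index_path=index_path, meta_path=meta_path
#     ),
# )
```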

#### Initialize RAG Pipeline
Build the retriever and the generator with the Mistral model.

In [8]:
# Pass the actual file paths as strings
retriever = build_retriever(
    index_path="/Users/elbethelzewdie/Downloads/rag-complaint-chatbot/rag-complaint-chatbot/faiss.index",
    meta_path="/Users/elbethelzewdie/Downloads/rag-complaint-chatbot/rag-complaint-chatbot/metadata.json"
)
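
For orientation, retrieval of this kind generally ranks stored embeddings by similarity to the query embedding and keeps the top `k`. The sketch below shows that idea with plain Python lists and cosine similarity. It is not the project's `ComplaintRetriever`, whose internals are not shown in this notebook, and a FAISS index performs the same kind of search far more efficiently.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=5):
    """Return up to k (index, score) pairs, most similar first."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:k]]
```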



In [9]:
generator = build_generator(
    llm_model_path="/Users/elbethelzewdie/Downloads/rag-complaint-chatbot/rag-complaint-chatbot/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf"
)

llama_context: n_ctx_per_seq (4096) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
ggml_metal_init: skipping kernel_get_rows_bf16                     (not supported)
ggml_metal_init: skipping kernel_set_rows_bf16                     (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_c4                (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_1row              (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_l4                (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_bf16                  (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_bf16_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_bf16_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_bf16_f16                (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h64 
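
The first log line reports a 4096-token context per sequence, so the question, the retrieved chunks, and the generated answer all have to fit in that window. The sketch below keeps chunks in rank order until a rough token budget is used up. The four-characters-per-token ratio is a crude assumption rather than the model's tokenizer, and the `text` key is a hypothetical field name for the chunk body.

```python
def fit_chunks_to_budget(chunks, max_tokens, chars_per_token=4):
    """Keep leading chunks whose estimated token cost fits within max_tokens.

    Token cost is estimated from character length; this is a heuristic only.
    """
    kept, used = [], 0
    for chunk in chunks:
        cost = len(chunk["text"]) // chars_per_token + 1
        if used + cost > max_tokens:
            break  # stop at the first chunk that would overflow the budget
        kept.append(chunk)
        used += cost
    return kept
```

Leaving part of the window unused for the question and the answer is the main design point; the budget passed in should therefore be well below 4096.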

In [10]:
questions = [
    # Credit Card
    "Why do customers report being charged unexpected fees on their credit cards?",
    "What are the common complaints regarding credit card customer service or dispute handling?",
    # Personal Loan
    "Why are some personal loan applications denied, according to customer complaints?",
    "What issues do customers report about personal loan repayment or interest rates?",
    # Savings Account
    "What problems do customers report with opening or managing savings accounts?",
    "Are there complaints about delayed transactions or account access issues in savings accounts?",
    # Money Transfers
    "What are the most frequent complaints customers have about domestic or international money transfers?",
    "Why do customers report delays or failures when making money transfers?"
]


In [11]:
results = []

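# Iterate over EVALUATION_QUESTIONS, imported from src.evaluation in the first cell
# (the local `questions` list defined in the previous cell is not used by this loop)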
for q in EVALUATION_QUESTIONS:
    # 1. Retrieve the chunks using your ComplaintRetriever
    # 'retriever' must be an instance created by build_retriever
    retrieved_chunks = retriever.retrieve(q, k=5)

    # 2. Generate the answer using your RAGGenerator
    # 'generator' must be an instance created by build_generator
    answer = generator.generate(q, retrieved_chunks)

    # 3. Format sources from the retrieved metadata
    # We use .get() because your metadata is a dictionary
    sources = [
        f"{item['metadata'].get('company', 'N/A')} - {item['metadata'].get('issue', 'N/A')}"
        for item in retrieved_chunks
    ]

    results.append({
        "Question": q,
        "Generated Answer": answer,
        "Retrieved Sources": " | ".join(sources[:2]), # Showing top 2 for the table
        "Quality Score (1-5)": "", # Placeholder for your manual evaluation
        "Comments": ""             # Placeholder for your analysis
    })

# 4. Display results in a clean DataFrame for the report
# (pandas was already imported as pd in the first cell)
df_evaluation = pd.DataFrame(results)

# If in a Jupyter notebook, this will display the table nicely
df_evaluation




| Index | Question | Generated Answer | Retrieved Sources | Quality Score (1-5) | Comments |
|---|---|---|---|---|---|
| 0 | Why do customers report being charged unexpect... | Customers report being charged unexpected fees... | Bread Financial Holdings, Inc. - Fees or inter... | | |
| 1 | What are the common complaints regarding credi... | Based on the provided context, common complain... | BANK OF AMERICA, NATIONAL ASSOCIATION - Proble... | | |
| 2 | Why are some personal loan applications denied... | Personal loan applications can be denied for s... | CITIBANK, N.A. - Getting the loan \| ENOVA INTE... | | |
| 3 | What issues do customers report about personal... | Customers report issues related to personal lo... | JPMORGAN CHASE & CO. - Managing an account \| O... | | |
| 4 | What problems do customers report with opening... | Based on the provided context, the issues cust... | BMO Bank, N.A. - Closing an account \| BARCLAYS... | | |
| 5 | Are there complaints about delayed transaction... | Based on the provided context, there is no dir... | ALLY FINANCIAL INC. - Managing an account \| BM... | | |
| 6 | What are the most frequent complaints customer... | Based on the provided context, the most freque... | U.S. BANCORP - Managing an account \| Revolut T... | | |
| 7 | Why do customers report delays or failures whe... | Customers report delays or failures when makin... | BANK OF AMERICA, NATIONAL ASSOCIATION - Money ... | | |
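
To paste the evaluation results into a report, the rows can be rendered as a Markdown table. The project imports `results_to_markdown_table` for this purpose, but its signature is not shown in this notebook, so the sketch below is a standalone stand-in that works on the list of dicts built in the loop above. It escapes `|` because the `Retrieved Sources` column uses that character as a separator.

```python
def rows_to_markdown(rows, columns):
    """Render a list of dicts as a Markdown pipe table with the given column order."""

    def cell(value):
        # Escape pipes and flatten newlines so each record stays on one table row.
        return str(value).replace("|", "\\|").replace("\n", " ")

    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = [
        "| " + " | ".join(cell(row.get(col, "")) for col in columns) + " |"
        for row in rows
    ]
    return "\n".join([header, divider, *body])
```

With the notebook's variables this would be called as `print(rows_to_markdown(results, list(results[0].keys())))`, assuming `results` is non-empty.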
