
AI RAG evaluation project using Ragas. Includes RAG metrics (precision, recall, faithfulness), retrieval diagnostics, and prompt testing examples for fintech/banking LLM systems. Designed as an AI QA Specialist portfolio project.


alinaleo27/ai-rag-eval-qa


# 📘 AI RAG Evaluation – QA Test Suite

This project demonstrates how an AI QA Specialist can evaluate a RAG (Retrieval-Augmented Generation) system using Ragas.
The repository covers:

- LLM / RAG quality evaluation
- retrieval error analysis (missing / wrong / irrelevant context)
- automated RAG metrics (precision, recall, faithfulness)
- basic prompt testing for a banking / fintech chatbot

## 🧩 Project structure

```
ai-rag-eval-qa/
├── README.md
├── requirements.txt
├── .gitignore
├── data/
│   └── rag_eval_dataset.jsonl
├── notebooks/
│   └── ragas_evaluation.py
└── prompts/
    └── prompt_tests.md
```

βš™οΈ Installation
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

## 🔒 API Key Configuration (.env)

This project uses OpenAI for semantic evaluation.
Create a `.env` file in the project root:

```
OPENAI_API_KEY=your_api_key_here
```

The `.env` file is listed in `.gitignore`, so your API key stays on your machine only.
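Projects like this often load the key with `python-dotenv`; as an illustration, here is a dependency-free sketch of reading a `.env` file with only the standard library (the parsing rules are an assumption: simple `KEY=VALUE` lines, `#` comments ignored):

```python
import os

def load_env(path=".env"):
    """Minimal .env parser: KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values = {}
    try:
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
    except FileNotFoundError:
        pass  # no .env file: fall back to the process environment
    return values

# Prefer the .env value, fall back to an already-exported variable.
api_key = load_env().get("OPENAI_API_KEY") or os.environ.get("OPENAI_API_KEY")
```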

## 🧪 Running the evaluation

```bash
python3 notebooks/ragas_evaluation.py
```

The script loads the dataset and evaluates:

- context_precision
- context_recall
- faithfulness

(You can also run in offline mode by disabling LLM usage.)
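To build intuition for what the retrieval metrics measure, here is a deliberately simplified set-overlap sketch. This is *not* the Ragas implementation (Ragas judges relevance with an LLM, not exact matching); the function names are hypothetical:

```python
def toy_context_precision(retrieved, relevant):
    """Fraction of retrieved chunks that are actually relevant to the question."""
    if not retrieved:
        return 0.0
    return sum(1 for c in retrieved if c in relevant) / len(retrieved)

def toy_context_recall(retrieved, relevant):
    """Fraction of the relevant chunks that the retriever managed to return."""
    if not relevant:
        return 0.0
    return sum(1 for c in relevant if c in retrieved) / len(relevant)

retrieved = ["fee policy", "card limits", "weather report"]   # one irrelevant hit
relevant = ["fee policy", "card limits", "kyc rules"]         # one missed chunk
precision = toy_context_precision(retrieved, relevant)  # 2/3: "weather report" is noise
recall = toy_context_recall(retrieved, relevant)        # 2/3: "kyc rules" was missed
```

Low precision points at noisy retrieval (wrong/irrelevant context); low recall points at missing context, which matches the error categories listed above.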

## ⚠️ Note on OpenAI quota

The full evaluation requires an active OpenAI quota.
If the account has no credits or quota, you will see:

```
openai.RateLimitError: insufficient_quota
```

This is expected behavior, not an error in the project.
To run offline (no API calls):

```python
results = evaluate(dataset, metrics=metrics, llm=None)
```

Note: `faithfulness` requires an LLM and will not run in offline mode.

## 📂 Dataset

`data/rag_eval_dataset.jsonl` contains 10 fintech/banking examples for RAG evaluation.
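A small loader with schema validation can catch malformed records before they reach the evaluator. The field names below (`question`, `answer`, `contexts`, `ground_truth`) follow the common Ragas dataset schema and are an assumption; adjust them to the actual file:

```python
import json

# Assumed Ragas-style schema; change to match rag_eval_dataset.jsonl.
REQUIRED_FIELDS = {"question", "answer", "contexts", "ground_truth"}

def load_jsonl(path):
    """Load a JSONL file, failing fast on records that miss required fields."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            row = json.loads(line)
            missing = REQUIRED_FIELDS - row.keys()
            if missing:
                raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
            rows.append(row)
    return rows
```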

## 💬 Prompt tests

Located in `prompts/prompt_tests.md`. The suite includes:

- JSON output validation
- jailbreak attempts
- safety tests
- consistency checks
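The JSON output validation check can be sketched as a small helper: parse the model reply and verify it is an object with the expected keys. The schema (`intent`, `confidence`) is hypothetical, chosen only to illustrate the pattern:

```python
import json

def check_json_reply(reply, required_keys=("intent", "confidence")):
    """Return (ok, reason) for a chatbot reply that should be a JSON object."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON: {exc}"
    if not isinstance(data, dict):
        return False, "top-level value is not an object"
    missing = [k for k in required_keys if k not in data]
    if missing:
        return False, f"missing keys: {missing}"
    return True, "ok"
```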

## 🎯 Purpose

This repository serves as a compact example of:

- RAG evaluation
- LLM QA
- retrieval diagnostics
- prompt testing

Designed for AI QA / LLM QA roles.
