ashwinirajm/ai-quality-framework

AI Quality Evaluation Framework

An enterprise-style AI quality assurance framework for validating generative AI / LLM applications with automated evaluation techniques.

Built using:

  • Python
  • Pytest
  • Gemini API
  • Sentence Transformers
  • GitHub Actions

Features

  • AI response relevancy validation
  • Hallucination detection
  • Prompt regression testing
  • Latency testing
  • Semantic similarity scoring
  • Mock fallback mode
  • CI/CD quality gates
  • Automated AI evaluations

Evaluation Types

| Evaluation | Purpose | Metric |
| --- | --- | --- |
| Relevancy Testing | Validate semantic correctness | Cosine Similarity |
| Hallucination Testing | Detect unsupported responses | Context Similarity |
| Prompt Regression | Detect prompt quality degradation | Semantic Score |
| Latency Testing | Validate response performance | Response Time |
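
To make the similarity metrics above concrete, here is a minimal sketch of cosine similarity with a pass/fail threshold. It is not the repository's evaluator code: the function names, the 0.7 threshold, and the toy three-dimensional vectors are assumptions for illustration. In the real framework the vectors would presumably be embeddings produced by Sentence Transformers.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a zero vector as having no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


def passes_threshold(score: float, threshold: float = 0.7) -> bool:
    """Hypothetical gate: the 0.7 default is illustrative, not from the repo."""
    return score >= threshold


# Toy stand-ins for sentence embeddings (hypothetical values).
expected = [1.0, 0.0, 1.0]
on_topic = [2.0, 0.0, 2.0]   # same direction as `expected`
off_topic = [0.0, 1.0, 0.0]  # orthogonal to `expected`

relevant = passes_threshold(cosine_similarity(expected, on_topic))
unsupported = not passes_threshold(cosine_similarity(expected, off_topic))
```

The same comparison could serve hallucination testing by scoring the response against the supplied context instead of an expected answer; a low score would then suggest the response is not supported by that context.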

Project Structure

ai-quality-framework/
│
├── datasets/
│   ├── hallucination_dataset.json
│   ├── latency_dataset.json
│   ├── prompt_regression_dataset.json
│   └── relevancy_dataset.json
│
├── evaluators/
│   ├── hallucination_evaluator.py
│   └── relevancy_evaluator.py
│
├── prompts/
│   ├── prompt_v1.txt
│   └── prompt_v2.txt
│
├── services/
│   └── gemini_service.py
│
├── tests/
│   ├── test_hallucination.py
│   ├── test_latency.py
│   ├── test_prompt_regression.py
│   └── test_relevancy.py
│
├── .github/workflows/
│   └── ai-evals.yml
│
├── requirements.txt
└── README.md


Setup

git clone <repo_url>
cd ai-quality-framework

python3 -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt

Create a .env file with your Gemini API key:

GOOGLE_API_KEY=your_api_key

Run the tests (-s disables pytest's output capture, so print output is shown):

pytest tests -s
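
As a rough illustration of what a test under tests/ could look like, the sketch below times a model call and asserts it stays within a budget. The names `measure_latency` and `fake_model` and the 2-second budget are hypothetical; the actual tests in this repository may be structured differently and would call the Gemini service instead of a stub.

```python
import time
from typing import Callable, Tuple


def measure_latency(call: Callable[[str], str], prompt: str) -> Tuple[str, float]:
    """Invoke `call(prompt)` and return (response, elapsed seconds)."""
    start = time.perf_counter()
    response = call(prompt)
    elapsed = time.perf_counter() - start
    return response, elapsed


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for the real service call."""
    return f"echo: {prompt}"


def test_latency_under_budget() -> None:
    response, elapsed = measure_latency(fake_model, "What is pytest?")
    assert response           # the model returned something
    assert elapsed < 2.0      # hypothetical latency budget in seconds
```

pytest collects any function whose name starts with `test_`, so a file like this would be picked up by `pytest tests -s` without extra configuration.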

CI/CD Pipeline

GitHub Actions automatically:

  • installs dependencies
  • executes AI evaluation tests
  • validates AI quality gates

Pipeline file:

.github/workflows/ai-evals.yml
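
A quality gate in CI usually comes down to a process exit code: a failing pytest run exits non-zero, which fails the GitHub Actions step. The sketch below shows one way an aggregate gate could be expressed. The `gate` function and the 0.75 minimum mean are assumptions for illustration, not taken from the workflow file.

```python
import statistics
from typing import Sequence


def gate(scores: Sequence[float], min_mean: float = 0.75) -> int:
    """Return an exit code: 0 if the mean score meets the bar, 1 otherwise.

    An empty score list fails the gate, so a run that evaluated nothing
    cannot pass silently.
    """
    if not scores:
        return 1
    return 0 if statistics.mean(scores) >= min_mean else 1
```

A script could pass the result to `sys.exit(...)` so that the pipeline step fails whenever the evaluation scores drop below the chosen bar.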

Sample Test Execution Result

[Screenshot: sample test execution result, captured 2026-05-10 at 6:05:14 PM]

Mock Mode

The framework supports two modes:

  • Live LLM testing
  • Mock fallback testing

Benefits:

  • deterministic test execution
  • stable CI pipelines
  • reduced API cost
  • offline testing support
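
One plausible shape for a mock fallback is sketched below. It assumes the mock is used when GOOGLE_API_KEY is unset or the live call raises; the repository's services/gemini_service.py may decide differently. `generate`, `MOCK_RESPONSES`, and `DEFAULT_MOCK` are hypothetical names, and the live call is passed in as a plain callable so the sketch does not depend on any particular Gemini client API.

```python
import os
from typing import Callable, Mapping, Optional

# Hypothetical canned answers used when no live model is available.
MOCK_RESPONSES = {
    "What is the capital of France?": "Paris is the capital of France.",
}
DEFAULT_MOCK = "This is a mock response."


def generate(
    prompt: str,
    live_call: Optional[Callable[[str], str]] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return a live response when possible, otherwise a deterministic mock."""
    if env.get("GOOGLE_API_KEY") and live_call is not None:
        try:
            return live_call(prompt)
        except Exception:
            # Assumption: any live failure falls back to the mock path.
            pass
    return MOCK_RESPONSES.get(prompt, DEFAULT_MOCK)
```

Because the mock path returns fixed strings, tests that run against it produce the same result on every run, which is what keeps CI stable and avoids API cost.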

