An enterprise-style AI quality assurance framework that validates generative AI / LLM applications with automated evaluation techniques.
Built using:
- Python
- Pytest
- Gemini API
- Sentence Transformers
- GitHub Actions

Features:
- AI response relevancy validation
- Hallucination detection
- Prompt regression testing
- Latency testing
- Semantic similarity scoring
- Mock fallback mode
- CI/CD quality gates
- Automated AI evaluations

Evaluation types:

| Evaluation | Purpose | Metric |
|---|---|---|
| Relevancy Testing | Validate semantic correctness | Cosine Similarity |
| Hallucination Testing | Detect unsupported responses | Context Similarity |
| Prompt Regression | Detect prompt quality degradation | Semantic Score |
| Latency Testing | Validate response performance | Response Time |
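
The similarity-based metrics in the table above come down to comparing embedding vectors. The following is a minimal sketch of that comparison, assuming the embeddings are already available as plain lists of floats (in this framework they would presumably be produced by Sentence Transformers). The names `cosine_similarity` and `passes_threshold`, the toy vectors, and the 0.7 threshold are hypothetical and not taken from the repository.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a zero vector is treated as having no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


def passes_threshold(score: float, threshold: float = 0.7) -> bool:
    """Hypothetical gate: a response passes when its score reaches the threshold."""
    return score >= threshold


# Toy vectors standing in for real sentence embeddings.
expected = [1.0, 0.0, 1.0]
matching = [1.0, 0.0, 1.0]
unrelated = [0.0, 1.0, 0.0]

print(passes_threshold(cosine_similarity(expected, matching)))
print(passes_threshold(cosine_similarity(expected, unrelated)))
```

The same function could serve relevancy testing (response versus expected answer) and hallucination testing (response versus supplied context). Only the pair of texts being embedded would differ.
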

Project structure:

ai-quality-framework/
│
├── datasets/
│ ├── hallucination_dataset.json
│ ├── latency_dataset.json
│ ├── prompt_regression_dataset.json
│ └── relevancy_dataset.json
│
├── evaluators/
│ ├── hallucination_evaluator.py
│ └── relevancy_evaluator.py
│
├── prompts/
│ ├── prompt_v1.txt
│ └── prompt_v2.txt
│
├── services/
│ └── gemini_service.py
│
├── tests/
│ ├── test_hallucination.py
│ ├── test_latency.py
│ ├── test_prompt_regression.py
│ └── test_relevancy.py
│
├── .github/workflows/
│ └── ai-evals.yml
│
├── requirements.txt
└── README.md
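
The tree above lists two prompt versions (`prompt_v1.txt`, `prompt_v2.txt`) alongside a prompt regression test. One way such a check could work is to score both prompts over the same dataset and flag the newer one if its average score drops. This is a sketch under that assumption, not the repository's implementation. `has_regressed`, the example scores, and the 0.05 tolerance are all hypothetical.

```python
from statistics import mean
from typing import Sequence


def has_regressed(
    baseline_scores: Sequence[float],
    candidate_scores: Sequence[float],
    tolerance: float = 0.05,
) -> bool:
    """Flag a regression when the candidate's mean score falls more than
    `tolerance` below the baseline's mean score."""
    if not baseline_scores or not candidate_scores:
        raise ValueError("both score lists must be non-empty")
    return mean(candidate_scores) < mean(baseline_scores) - tolerance


# Hypothetical semantic scores for the same test cases under each prompt.
v1_scores = [0.90, 0.80]
v2_scores = [0.60, 0.70]

print(has_regressed(v1_scores, v2_scores))
```

The tolerance keeps small run-to-run score fluctuations from failing the test. Its value would need tuning against real data.
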

Setup:

    git clone <repo_url>
    cd ai-quality-framework
    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt

Create .env:

    GOOGLE_API_KEY=your_api_key

Run tests:

    pytest tests -s

GitHub Actions automatically:
- installs dependencies
- executes AI evaluation tests
- validates AI quality gates
Pipeline file:
.github/workflows/ai-evals.yml
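
Because pytest exits with a non-zero status when any test fails, a failing evaluation fails the workflow step that runs it. A quality gate can also be expressed as an explicit threshold check over collected metrics, as in the sketch below. The metric names, the threshold values, and the `gate_failures` helper are hypothetical illustrations rather than the contents of the pipeline.

```python
from typing import Dict, List

# Hypothetical thresholds; real values would be tuned per project.
MIN_SCORES = {"relevancy": 0.70, "hallucination": 0.60}
MAX_LATENCY_SECONDS = 3.0


def gate_failures(metrics: Dict[str, float]) -> List[str]:
    """Return a human-readable message for every violated quality gate."""
    failures = []
    for name, minimum in MIN_SCORES.items():
        score = metrics.get(name)
        if score is None:
            failures.append(f"{name}: missing")
        elif score < minimum:
            failures.append(f"{name}: {score:.2f} is below {minimum:.2f}")
    latency = metrics.get("latency_seconds")
    if latency is None:
        failures.append("latency_seconds: missing")
    elif latency > MAX_LATENCY_SECONDS:
        failures.append(
            f"latency_seconds: {latency:.2f} exceeds {MAX_LATENCY_SECONDS:.2f}"
        )
    return failures


# Example run with made-up metrics: relevancy is too low, the rest pass.
for message in gate_failures(
    {"relevancy": 0.55, "hallucination": 0.80, "latency_seconds": 1.2}
):
    print(message)
```

A CI script built on this could exit with a non-zero status whenever the returned list is non-empty.
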
The framework supports two execution modes:
- Live LLM testing
- Mock fallback testing
Benefits of mock fallback mode:
- deterministic test execution
- stable CI pipelines
- reduced API cost
- offline testing support
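
A mock fallback of this kind could look like the sketch below. It assumes the live model call is passed in as a plain callable and that the fallback triggers when no API key is set or the live call raises. `get_response`, `MOCK_RESPONSES`, and `DEFAULT_MOCK` are hypothetical names, and how `gemini_service.py` actually implements the fallback is not shown in this document.

```python
import os
from typing import Callable, Dict, Optional

# Hypothetical canned responses used when no live call is possible.
MOCK_RESPONSES: Dict[str, str] = {
    "What is CI/CD?": "CI/CD automates building, testing, and deploying software.",
}
DEFAULT_MOCK = "mock response"


def get_response(
    prompt: str,
    live_call: Optional[Callable[[str], str]] = None,
    api_key: Optional[str] = None,
) -> str:
    """Use the live call when a key and a client are available;
    otherwise, or if the live call fails, return a canned response."""
    if api_key and live_call is not None:
        try:
            return live_call(prompt)
        except Exception:
            # Assumption: any live failure falls through to the mock.
            pass
    return MOCK_RESPONSES.get(prompt, DEFAULT_MOCK)


# No live client is supplied here, so the canned response is returned
# whether or not GOOGLE_API_KEY is set.
print(get_response("What is CI/CD?", api_key=os.environ.get("GOOGLE_API_KEY")))
```

Passing the live call in as a parameter keeps the fallback logic testable without network access, which is what makes the mock path deterministic.
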