The LLM Evaluation Framework
Updated May 24, 2025 - Python
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
Agentic LLM Vulnerability Scanner / AI red teaming kit 🧪
The official evaluation suite and dynamic data release for MixEval.
LangFair is a Python library for conducting use-case-level LLM bias and fairness assessments
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
MIT-licensed framework for testing LLMs, RAGs, and chatbots. Configurable via YAML and integrable into CI pipelines for automated testing.
Develop reliable AI apps
[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
Benchmarking Large Language Models for FHIR
FM-Leaderboard-er lets you create a leaderboard to find the best LLM/prompt for your own business use case, based on your data, task, and prompts
Realign is a testing and simulation framework for AI applications.
Code for "Prediction-Powered Ranking of Large Language Models", NeurIPS 2024.
Create an evaluation framework for your LLM based app. Incorporate it into your test suite. Lay the monitoring foundation.
TypeScript SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
Multilingual Evaluation Toolkits
Community plugin for using Promptfoo with Genkit
Shin Rakuda is a comprehensive framework for evaluating and benchmarking Japanese large language models, offering researchers and developers a flexible toolkit for assessing LLM performance across diverse datasets.
An open-source evaluation framework for measuring LLM steerability.
Hackable, simple LLM evals on preference datasets