CEval is a production-grade continuous evaluation platform for Large Language Models (LLMs). Monitor and evaluate LLM outputs in real-time across multiple critical dimensions.
- Multi-Metric Evaluation: Hallucination detection, toxicity scoring, compliance checks, latency tracking, and domain-specific accuracy.
- Drift Detection: Statistical anomaly detection with configurable alerts.
- Multi-Provider Support: Works with OpenAI, Anthropic, Google Gemini via LiteLLM.
- Multiple SDKs: Native support for both Python and TypeScript/JavaScript.
- Real-Time Alerts: Slack, email, and webhook notifications.
- REST API: Easy integration with any programming language.
- CLI Tool: Rich terminal interface for administration and local testing.
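To make the drift-detection idea concrete, here is an illustrative sketch (not CEval's actual implementation): flag a metric when its recent mean shifts by more than a configurable number of standard deviations from a baseline window. The function name and threshold default are assumptions for illustration only.

```python
# Illustrative drift check: compare a recent window of metric scores
# against a baseline window using a simple z-score on the means.
from statistics import mean, stdev


def score_drifted(baseline: list[float], recent: list[float], threshold: float = 3.0) -> bool:
    """Return True when the recent mean deviates from the baseline mean
    by more than `threshold` baseline standard deviations."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return mean(recent) != mu
    z = abs(mean(recent) - mu) / sigma
    return z > threshold


baseline = [0.90, 0.91, 0.89, 0.92, 0.90, 0.91]
print(score_drifted(baseline, [0.55, 0.52, 0.50]))  # sharp drop in scores; prints: True
```

A production system would also smooth the windows and debounce alerts, but the core signal is the same.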
To guarantee 100% evaluation coverage and simplify development, CEval now includes a powerful LLM Gateway.
The gateway acts as a central proxy for all LLM calls. Instead of your apps calling OpenAI or other providers directly, they call the gateway. The gateway forwards the request, gets the response, and automatically triggers a CEval evaluation in the background before returning the result to your app.
This provides:
- Automatic, guaranteed evaluation for all traffic.
- Centralized API key management.
- Unified API for multiple LLM backends.
For a detailed explanation of the architecture and benefits, see the LLM Gateway Documentation.
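As a rough sketch of what "calling the gateway instead of the provider" looks like, the snippet below builds an OpenAI-style chat request aimed at the gateway. The gateway address, the `/v1/chat/completions` path, and the `build_gateway_request` helper are assumptions for illustration, not the documented gateway API.

```python
# Sketch: an app routes its chat completion through the CEval gateway,
# which forwards it to the real provider and triggers an evaluation
# in the background. URL and path below are assumed, not documented.
GATEWAY_URL = "http://localhost:9000"  # assumed gateway address


def build_gateway_request(model: str, prompt: str) -> dict:
    """Build the payload an app would POST to the gateway instead of
    posting it to OpenAI/Anthropic/Gemini directly."""
    return {
        "url": f"{GATEWAY_URL}/v1/chat/completions",
        "json": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }


req = build_gateway_request("gpt-4o", "What is compound interest?")
print(req["url"])  # prints: http://localhost:9000/v1/chat/completions
```

Because the request shape stays OpenAI-compatible, switching an app over is typically just a base-URL change.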
The project is now organized into a backend application, a gateway, and two separate SDKs.
```
ceval/
├── backend/          # Main evaluation application (FastAPI server, CLI)
│   ├── pyproject.toml
│   └── main.py
├── gateway/          # Centralized LLM Gateway (FastAPI proxy)
│   ├── pyproject.toml
│   └── main.py
├── sdk/              # Core Python SDK
│   ├── ceval/
│   └── pyproject.toml
├── typescript-sdk/   # TypeScript SDK for Node.js/browsers
│   ├── src/
│   ├── package.json
│   └── tsconfig.json
├── docs/             # Documentation
└── README.md
```
- Python 3.10+ and Poetry
- Node.js and npm (for the TypeScript SDK)
- PostgreSQL (with TimescaleDB extension recommended)
- Redis (for async job queue)
The backend is configured to consume the `sdk` directory as a local, editable package.
```bash
# This installs the backend's dependencies AND the local Python SDK
cd backend
poetry install
```

From the backend directory:
```bash
cp .env.example .env
# Edit .env with your database and API key configuration
```

Ensure you are in the `backend` directory.
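For reference, a minimal `.env` for this stack might look like the following. The variable names are assumptions inferred from the prerequisites (PostgreSQL, Redis, provider API keys), not the documented schema — check `.env.example` for the authoritative list.

```
DATABASE_URL=postgresql://user:password@localhost:5432/ceval
REDIS_URL=redis://localhost:6379/0
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=...
```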
```bash
# Initialize the database schema
poetry run ceval init

# Start the API server
poetry run ceval serve
```

The API will be available at http://localhost:8000.
You can interact with CEval through the REST API or by using one of the dedicated SDKs.
The ceval Python SDK provides a direct and efficient way to integrate the evaluation logic into your Python applications.
```python
import asyncio

from ceval.orchestrator import EvaluationOrchestrator
from ceval.db import SessionLocal


async def main() -> None:
    db = SessionLocal()
    try:
        orchestrator = EvaluationOrchestrator(db)
        eval_run = await orchestrator.evaluate(
            prompt="What is compound interest?",
            response="Compound interest is interest calculated on principal and accumulated interest.",
            model_name="gpt-4o",
            domain="finance",
        )
        for metric in eval_run.metrics:
            print(f"{metric.metric_name}: {metric.score:.3f}")
    finally:
        db.close()


asyncio.run(main())
```

For JavaScript and TypeScript environments, you can use the ceval-sdk-ts package.
Installation & Build
```bash
cd typescript-sdk
npm install
npm run build
```

Usage

The client provides a clean, type-safe interface to the REST API.
```typescript
import { CevalClient, EvaluationParams } from './typescript-sdk/dist';

// Initialize the client with the base URL of your CEval server
const ceval = new CevalClient('http://localhost:8000');

async function runExample() {
  const params: EvaluationParams = {
    prompt: "What is the capital of France?",
    response: "The capital of France is Paris.",
    model_name: "gpt-4o",
    domain: "general"
  };

  const submission = await ceval.evaluate(params);
  const results = await ceval.getEvaluationResult(submission.run_id);

  console.log("--- Evaluation Complete ---");
  for (const metric of results.metrics) {
    console.log(`  - ${metric.metric_name}: ${metric.score.toFixed(2)} (passed: ${metric.passed})`);
  }
}

runExample().catch(console.error);
```

You can also call the REST API directly from any language.
```bash
curl -X POST http://localhost:8000/api/v1/evaluate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What is ML?",
    "response": "Machine learning is...",
    "model_name": "gpt-4o",
    "domain": "general"
  }'
```

- Core evaluation engine
- Python SDK
- TypeScript SDK
- REST API & CLI
- React dashboard
- Celery integration for async jobs
- Multi-tenancy support
Contributions are welcome! Please feel free to submit a Pull Request.
MIT License - see LICENSE file for details.