CEval is a production-grade continuous evaluation platform for Large Language Models (LLMs). Monitor and evaluate LLM outputs in real-time across multiple critical dimensions.
- Multi-Metric Evaluation: Hallucination detection, toxicity scoring, compliance checks, latency tracking, and domain-specific accuracy.
- Drift Detection: Statistical anomaly detection with configurable alerts.
- Multi-Provider Support: Works with OpenAI, Anthropic, Google Gemini via LiteLLM.
- Multiple SDKs: Native support for both Python and TypeScript/JavaScript.
- Real-Time Alerts: Slack, email, and webhook notifications.
- REST API: Easy integration with any programming language.
- CLI Tool: Rich terminal interface for administration and local testing.
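To make the drift-detection idea concrete, here is an illustrative sketch (not CEval's actual implementation): flag a metric when its recent mean shifts by more than a configurable number of standard deviations from a baseline window. The function name and threshold default are assumptions for illustration only.

```python
# Illustrative drift check: compare a recent window of metric scores
# against a baseline window using a simple z-score on the means.
from statistics import mean, stdev


def score_drifted(baseline: list[float], recent: list[float], threshold: float = 3.0) -> bool:
    """Return True when the recent mean deviates from the baseline mean
    by more than `threshold` baseline standard deviations."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return mean(recent) != mu
    z = abs(mean(recent) - mu) / sigma
    return z > threshold


baseline = [0.90, 0.91, 0.89, 0.92, 0.90, 0.91]
print(score_drifted(baseline, [0.55, 0.52, 0.50]))  # sharp drop in scores; prints: True
```

A production system would also smooth the windows and debounce alerts, but the core signal is the same.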
To guarantee 100% evaluation coverage and simplify development, CEval now includes a powerful LLM Gateway.
The gateway acts as a central proxy for all LLM calls. Instead of your apps calling OpenAI or other providers directly, they call the gateway. The gateway forwards the request, gets the response, and automatically triggers a CEval evaluation in the background before returning the result to your app.
This provides:
- Automatic, guaranteed evaluation for all traffic.
- Centralized API key management.
- Unified API for multiple LLM backends.
For a detailed explanation of the architecture and benefits, see the LLM Gateway Documentation.
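As a rough sketch of what "calling the gateway instead of the provider" looks like, the snippet below builds an OpenAI-style chat request aimed at the gateway. The gateway address, the `/v1/chat/completions` path, and the `build_gateway_request` helper are assumptions for illustration, not the documented gateway API.

```python
# Sketch: an app routes its chat completion through the CEval gateway,
# which forwards it to the real provider and triggers an evaluation
# in the background. URL and path below are assumed, not documented.
GATEWAY_URL = "http://localhost:9000"  # assumed gateway address


def build_gateway_request(model: str, prompt: str) -> dict:
    """Build the payload an app would POST to the gateway instead of
    posting it to OpenAI/Anthropic/Gemini directly."""
    return {
        "url": f"{GATEWAY_URL}/v1/chat/completions",
        "json": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }


req = build_gateway_request("gpt-4o", "What is compound interest?")
print(req["url"])  # prints: http://localhost:9000/v1/chat/completions
```

Because the request shape stays OpenAI-compatible, switching an app over is typically just a base-URL change.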
The project is now organized into a backend application, a gateway, and two separate SDKs.
```
ceval/
├── backend/          # Main evaluation application (FastAPI server, CLI)
│   ├── pyproject.toml
│   └── main.py
├── gateway/          # Centralized LLM Gateway (FastAPI proxy)
│   ├── pyproject.toml
│   └── main.py
├── sdk/              # Core Python SDK
│   ├── ceval/
│   └── pyproject.toml
├── typescript-sdk/   # TypeScript SDK for Node.js/browsers
│   ├── src/
│   ├── package.json
│   └── tsconfig.json
├── docs/             # Documentation
└── README.md
```
- Python 3.10+ and Poetry
- Node.js and npm (for the TypeScript SDK)
- PostgreSQL (with TimescaleDB extension recommended)
- Redis (for async job queue)
The backend is configured to consume the `sdk` directory as a local, editable package.
```bash
# This installs the backend's dependencies AND the local Python SDK
cd backend
poetry install
```

From the backend directory:
```bash
cp .env.example .env
# Edit .env with your database and API key configuration
```

Ensure you are in the `backend` directory.
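For reference, a minimal `.env` for this stack might look like the following. The variable names are assumptions inferred from the prerequisites (PostgreSQL, Redis, provider API keys), not the documented schema — check `.env.example` for the authoritative list.

```
DATABASE_URL=postgresql://user:password@localhost:5432/ceval
REDIS_URL=redis://localhost:6379/0
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=...
```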
```bash
# Initialize the database schema
poetry run ceval init

# Start the API server
poetry run ceval serve
```

The API will be available at http://localhost:8000.
You can interact with CEval through the REST API or by using one of the dedicated SDKs.
The ceval Python SDK provides a direct and efficient way to integrate the evaluation logic into your Python applications.
```python
import asyncio

from ceval.orchestrator import EvaluationOrchestrator
from ceval.db import SessionLocal


async def main() -> None:
    db = SessionLocal()
    try:
        orchestrator = EvaluationOrchestrator(db)
        eval_run = await orchestrator.evaluate(
            prompt="What is compound interest?",
            response="Compound interest is interest calculated on principal and accumulated interest.",
            model_name="gpt-4o",
            domain="finance",
        )
        for metric in eval_run.metrics:
            print(f"{metric.metric_name}: {metric.score:.3f}")
    finally:
        db.close()


asyncio.run(main())
```

For JavaScript and TypeScript environments, you can use the ceval-sdk-ts package.
Installation & Build
```bash
cd typescript-sdk
npm install
npm run build
```

Usage

The client provides a clean, type-safe interface to the REST API.
```typescript
import { CevalClient, EvaluationParams } from './typescript-sdk/dist';

// Initialize the client with the base URL of your CEval server
const ceval = new CevalClient('http://localhost:8000');

async function runExample() {
  const params: EvaluationParams = {
    prompt: "What is the capital of France?",
    response: "The capital of France is Paris.",
    model_name: "gpt-4o",
    domain: "general"
  };

  const submission = await ceval.evaluate(params);
  const results = await ceval.getEvaluationResult(submission.run_id);

  console.log("--- Evaluation Complete ---");
  for (const metric of results.metrics) {
    console.log(`  - ${metric.metric_name}: ${metric.score.toFixed(2)} (passed: ${metric.passed})`);
  }
}

runExample().catch(console.error);
```

You can also call the REST API directly from any language.
```bash
curl -X POST http://localhost:8000/api/v1/evaluate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What is ML?",
    "response": "Machine learning is...",
    "model_name": "gpt-4o",
    "domain": "general"
  }'
```

- Core evaluation engine
- Python SDK
- TypeScript SDK
- REST API & CLI
- React dashboard
- Celery integration for async jobs
- Multi-tenancy support
Contributions are welcome! Please feel free to submit a Pull Request.
MIT License - see LICENSE file for details.