TheCoder2010-create/CEval

CEval - Continuous Evaluation Platform for LLMs

Python 3.10+ License: MIT

CEval is a production-grade continuous evaluation platform for Large Language Models (LLMs). Monitor and evaluate LLM outputs in real-time across multiple critical dimensions.

🚀 Features

  • Multi-Metric Evaluation: Hallucination detection, toxicity scoring, compliance checks, latency tracking, and domain-specific accuracy.
  • Drift Detection: Statistical anomaly detection with configurable alerts.
  • Multi-Provider Support: Works with OpenAI, Anthropic, Google Gemini via LiteLLM.
  • Multiple SDKs: Native support for both Python and TypeScript/JavaScript.
  • Real-Time Alerts: Slack, email, and webhook notifications.
  • REST API: Easy integration with any programming language.
  • CLI Tool: Rich terminal interface for administration and local testing.
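
As a rough illustration of what statistical drift detection on a metric can look like, the sketch below compares the mean of a recent window of scores against a baseline window using a z-score. The function name `detect_drift`, the window contents, and the default threshold of 3.0 are assumptions made for this example; they are not taken from CEval's implementation, which may use a different statistic.

```python
from statistics import mean, stdev


def detect_drift(baseline, recent, z_threshold=3.0):
    """Return (is_drift, z) for a recent window of scores versus a baseline.

    Hypothetical helper: z is the absolute difference between the two window
    means, measured in standard errors of the recent window's mean.
    """
    if len(baseline) < 2 or not recent:
        raise ValueError("need at least 2 baseline scores and 1 recent score")

    baseline_mean = mean(baseline)
    # Standard error of the recent mean, estimated from the baseline spread
    standard_error = stdev(baseline) / len(recent) ** 0.5
    difference = abs(mean(recent) - baseline_mean)

    if standard_error == 0:
        # A constant baseline: any deviation at all counts as infinite drift
        z = 0.0 if difference == 0 else float("inf")
    else:
        z = difference / standard_error
    return z > z_threshold, z


baseline_scores = [0.90, 0.91, 0.89, 0.92, 0.90, 0.88, 0.91, 0.90]
drifted, z = detect_drift(baseline_scores, [0.70, 0.72, 0.68, 0.71])
print(f"drift={drifted} z={z:.1f}")
```

A configurable alert would then fire whenever `is_drift` is true, with `z_threshold` controlling the sensitivity.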

✨ New: The LLM Gateway

To ensure that every LLM call routed through it is evaluated, and to simplify development, CEval now includes an LLM Gateway.

The gateway acts as a central proxy for all LLM calls. Instead of your apps calling OpenAI or other providers directly, they call the gateway. The gateway forwards the request, gets the response, and automatically triggers a CEval evaluation in the background before returning the result to your app.

This provides:

  • Automatic evaluation of all traffic that passes through the gateway.
  • Centralized API key management.
  • Unified API for multiple LLM backends.

For a detailed explanation of the architecture and benefits, see the LLM Gateway Documentation.

📁 Project Structure

The project is now organized into a backend application, a gateway, and two separate SDKs.

ceval/
├── backend/           # Main evaluation application (FastAPI Server, CLI)
│   ├── pyproject.toml
│   └── main.py
├── gateway/           # Centralized LLM Gateway (FastAPI Proxy)
│   ├── pyproject.toml
│   └── main.py
├── sdk/               # Core Python SDK
│   ├── ceval/
│   └── pyproject.toml
├── typescript-sdk/    # TypeScript SDK for Node.js/Browsers
│   ├── src/
│   ├── package.json
│   └── tsconfig.json
├── docs/              # Documentation
└── README.md

🛠️ Quick Start

Prerequisites

  • Python 3.10+ and Poetry
  • Node.js and npm (for the TypeScript SDK)
  • PostgreSQL (with TimescaleDB extension recommended)
  • Redis (for async job queue)

Installation

The backend is configured to use the sdk as a local, editable package.

# This installs the backend's dependencies AND the local Python SDK
cd backend
poetry install

Configuration

From the backend directory:

cp .env.example .env
# Edit .env with your database and API key configuration

Initialize Database & Start Server

Ensure you are in the backend directory.

# Initialize the database schema
poetry run ceval init

# Start the API server
poetry run ceval serve

The API will be available at http://localhost:8000.

📖 Usage

You can interact with CEval through the REST API or by using one of the dedicated SDKs.

Python SDK

The ceval Python SDK provides a direct and efficient way to integrate the evaluation logic into your Python applications.

import asyncio

from ceval.orchestrator import EvaluationOrchestrator
from ceval.db import SessionLocal


async def main() -> None:
    db = SessionLocal()
    orchestrator = EvaluationOrchestrator(db)

    # evaluate() is awaited, so it has to run inside an async function
    eval_run = await orchestrator.evaluate(
        prompt="What is compound interest?",
        response="Compound interest is interest calculated on principal and accumulated interest.",
        model_name="gpt-4o",
        domain="finance",
    )

    # Print each metric's name and score
    for metric in eval_run.metrics:
        print(f"{metric.metric_name}: {metric.score:.3f}")


asyncio.run(main())

TypeScript SDK

For JavaScript and TypeScript environments, you can use the ceval-sdk-ts package.

Installation & Build

cd typescript-sdk
npm install
npm run build

Usage

The client provides a clean, type-safe interface to the REST API.

import { CevalClient, EvaluationParams } from './typescript-sdk/dist';

// Initialize the client with the base URL of your CEval server
const ceval = new CevalClient('http://localhost:8000');

async function runExample() {
  const params: EvaluationParams = {
    prompt: "What is the capital of France?",
    response: "The capital of France is Paris.",
    model_name: "gpt-4o",
    domain: "general"
  };

  const submission = await ceval.evaluate(params);
  const results = await ceval.getEvaluationResult(submission.run_id);

  console.log("--- Evaluation Complete ---");
  for (const metric of results.metrics) {
    console.log(`  - ${metric.metric_name}: ${metric.score.toFixed(2)} (Passed: ${metric.passed})`);
  }
}

// Surface request or network errors instead of leaving the promise unhandled
runExample().catch(console.error);

REST API

You can also call the REST API directly from any language.

curl -X POST http://localhost:8000/api/v1/evaluate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What is ML?",
    "response": "Machine learning is...",
    "model_name": "gpt-4o",
    "domain": "general"
  }'
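
For callers without an SDK, the same request can be built with Python's standard library. This is a minimal sketch: the helper names `build_evaluate_request` and `submit` are made up for this example, and the assumption that the server replies with a JSON body is inferred from the TypeScript example rather than confirmed by the API reference.

```python
import json
import urllib.request


def build_evaluate_request(base_url, prompt, response, model_name, domain):
    """Build the POST request for /api/v1/evaluate (hypothetical helper)."""
    payload = {
        "prompt": prompt,
        "response": response,
        "model_name": model_name,
        "domain": domain,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/evaluate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(request, timeout=10.0):
    """Send the request and decode the reply, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))


request = build_evaluate_request(
    "http://localhost:8000",
    prompt="What is ML?",
    response="Machine learning is...",
    model_name="gpt-4o",
    domain="general",
)
print(request.get_method(), request.full_url)
```

Calling `submit(request)` sends it to a running CEval server; building the request separately makes the payload easy to inspect or log first.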

🚦 Roadmap

  • Core evaluation engine
  • Python SDK
  • TypeScript SDK
  • REST API & CLI
  • React dashboard
  • Celery integration for async jobs
  • Multi-tenancy support

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

MIT License - see LICENSE file for details.
