Built an open source LLM evaluation framework on top of LiteLLM #29902

vignesh2027 · 2026-06-08T05:39:03Z

vignesh2027
Jun 8, 2026

Hey LiteLLM community!

LiteLLM's unified API is the backbone of this project, so wanted to share it here first.

I built an open source LLM Evaluation Framework that uses LiteLLM's acompletion to benchmark any model in parallel across 5 metrics:

What it does:

Accuracy (4-strategy cascade: exact, normalized, MC letter, fuzzy)
Latency p50/p95/p99 via time.perf_counter() around each acompletion call
Cost per 1K tokens from real token counts in LiteLLM responses
Hallucination Rate (linguistic signal analysis, runs locally)
Reasoning Quality (chain-of-thought depth score 1-10)

How it uses LiteLLM:
Since LiteLLM gives a unified interface, one command benchmarks any provider:

from litellm import acompletion
resp = await acompletion(model="gpt-4o-mini", messages=[...])
# cost = resp.usage.prompt_tokens * price_in + resp.usage.completion_tokens * price_out

One benchmark run compares GPT-4o-mini vs Gemini Flash vs Claude Haiku with zero config changes.

Results from 100 prompts:
GPT-4o scored 88.2% at $0.008/1K. Gemini Flash scored 76.8% at $0.0001/1K. 80x cost difference for 11% accuracy gap.

Live demo (no API key): https://huggingface.co/spaces/vigneshwar234/llm-eval-demo
GitHub: https://github.com/vignesh2027/LLM-Evaluation-Framework

71 tests, 82% coverage, full CI/CD. Feedback welcome, especially on LiteLLM integration patterns!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Built an open source LLM evaluation framework on top of LiteLLM #29902

Uh oh!

{{title}}

Uh oh!

Replies: 0 comments

Select a reply

Uh oh!

Uh oh!

Built an open source LLM evaluation framework on top of LiteLLM #29902

Uh oh!

vignesh2027 Jun 8, 2026

Replies: 0 comments

vignesh2027
Jun 8, 2026