Paste any LLM output alongside its source text. Get instant quality scores across 5 dimensions, visualized on a radar chart with claim-level hallucination analysis.
Shipping LLMs without evaluation guardrails is flying blind. LLM Eval Studio gives you multi-dimensional quality scoring for any LLM output -- single evaluations, side-by-side comparisons, or batch CSV processing.
- Two side-by-side text areas: Source / Ground Truth and LLM Output
- Click "Evaluate" to score the output across 5 quality dimensions
- Results panel with:
- Radar chart showing all 5 metrics at a glance
- Overall score (large number, color-coded: green ≥ 80, yellow 60--79, red < 60)
- Per-metric cards with score, explanation, and tooltip
- Claim-level breakdown: each claim in the output marked as Supported or Unsupported
- Add a third text area for LLM Output B
- Side-by-side radar charts comparing Output A vs. Output B
- Winner highlighted for each metric
- "Which output is better?" summary with explanation
- Upload a CSV file with `source` and `output` columns
- Progress bar during evaluation
- Results include:
- Summary statistics (mean, min, max per metric)
- Sortable table of all evaluations
- Score distribution bar charts
- Average metrics chart
- Download Report button -- exports CSV with all scores + JSON with detailed breakdowns
Three built-in examples to try instantly:
| Example | Demonstrates |
|---|---|
| Good Output | High scores across all dimensions |
| Hallucinated Output | Low faithfulness score -- catches fabricated claims |
| Incomplete Output | Low completeness score -- detects missing information |
- All evaluations stored in `localStorage`
- Sliding history panel to revisit past evaluations
- Click any entry to re-view its full results
- "Clear history" button
| Metric | Method | What It Measures |
|---|---|---|
| ROUGE-L | Algorithmic (LCS F1) | Longest common subsequence overlap between source and output |
| Semantic Similarity | Gemini Embeddings + Cosine Similarity | How closely the output's meaning matches the source |
| Accuracy | LLM-as-Judge (Gemini 2.5 Flash, 1--10) | Factual correctness relative to the source |
| Completeness | LLM-as-Judge (Gemini 2.5 Flash, 1--10) | Coverage of key information from the source |
| Faithfulness | LLM-as-Judge (Gemini 2.5 Flash, 1--10) | Absence of hallucinated or fabricated claims |
All scores are normalized to 0--100 for the radar chart and overall score.
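The ROUGE-L row is the one fully local metric: an F1 over the longest common subsequence of source and output tokens, then scaled to 0--100. A minimal token-level sketch (illustrative, not the repo's actual implementation):

```typescript
// Length of the longest common subsequence, using a rolling 1-D DP row.
function lcsLength(a: string[], b: string[]): number {
  const dp: number[] = new Array(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    let prev = 0; // holds dp[i-1][j-1]
    for (let j = 1; j <= b.length; j++) {
      const tmp = dp[j];
      dp[j] = a[i - 1] === b[j - 1] ? prev + 1 : Math.max(dp[j], dp[j - 1]);
      prev = tmp;
    }
  }
  return dp[b.length];
}

// ROUGE-L F1 between source and output, normalized to 0--100.
function rougeL(source: string, output: string): number {
  const ref = source.toLowerCase().split(/\s+/).filter(Boolean);
  const hyp = output.toLowerCase().split(/\s+/).filter(Boolean);
  if (ref.length === 0 || hyp.length === 0) return 0;
  const lcs = lcsLength(ref, hyp);
  const precision = lcs / hyp.length;
  const recall = lcs / ref.length;
  if (precision + recall === 0) return 0;
  const f1 = (2 * precision * recall) / (precision + recall);
  return Math.round(f1 * 100);
}
```

The LLM-as-Judge metrics follow the same convention: a 1--10 rating maps linearly onto the shared 0--100 scale.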
+---------------------+
| LLM Eval Studio |
| Frontend |
+----------+----------+
|
+------------------+------------------+
| | |
v v v
/api/evaluate /api/batch /api/embeddings
(Single eval) (CSV batch) (Vector embeddings)
| | |
+--------+---------+------------------+
|
+----------+----------+
| |
v v
ROUGE-L (local)          Google Gemini API
- LCS algorithm          - Embeddings (cosine similarity)
                         - LLM-as-Judge (accuracy,
                           completeness, faithfulness)
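Once the Gemini API returns two embedding vectors, the semantic-similarity score is just their cosine similarity scaled to 0--100. A sketch of that final local step (the short vectors used in practice would be real embeddings; the function names are illustrative):

```typescript
// Cosine similarity between two equal-length vectors, in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and the same length");
  }
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Map similarity onto the app's 0--100 scale, clamping negatives to 0.
const toScore = (sim: number): number => Math.round(Math.max(0, sim) * 100);
```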
| Layer | Technology |
|---|---|
| Framework | Next.js 16 (App Router, Server Components) |
| Language | TypeScript 5 |
| Styling | Tailwind CSS 4 (light theme, dashboard aesthetic) |
| LLM | Google Gemini 2.5 Flash (LLM-as-Judge) |
| Embeddings | Gemini Embedding Model (cosine similarity) |
| Charts | Recharts (radar, bar, distribution) |
| CSV Parsing | PapaParse |
| Animations | Framer Motion |
| Icons | Lucide React |
| Storage | localStorage (evaluation history) |
- Node.js 18 or later
- A Google Gemini API key (free tier works)
git clone https://github.com/Samarth0211/LLMEvalStudio.git
cd LLMEvalStudio
npm install

Create a `.env.local` file in the project root:

GOOGLE_API_KEY=your_gemini_api_key_here

npm run dev

Open http://localhost:3000 in your browser.
npm run build
npm start

LLMEvalStudio/
src/
app/
page.tsx # Main evaluation page (single + comparison)
batch/page.tsx # Batch evaluation page
layout.tsx # Root layout with navbar
api/
evaluate/route.ts # Single evaluation (ROUGE-L + embeddings + LLM judge)
batch/route.ts # Batch CSV evaluation
embeddings/route.ts # Gemini embeddings endpoint
components/
Navbar.tsx # Navigation with mode switching
RadarChart.tsx # Recharts radar visualization
ScoreCard.tsx # Individual metric card with tooltip
OverallScore.tsx # Circular progress score display
ClaimsList.tsx # Claim-level Supported/Unsupported analysis
HistoryPanel.tsx # Sliding evaluation history sidebar
ExampleButton.tsx # Pre-loaded examples dropdown
lib/
types.ts # TypeScript type definitions
examples.ts # Pre-loaded example data
history.ts # localStorage history management
utils.ts # Utility functions
.env.local # Environment variables (not committed)
package.json
| Endpoint | Method | Description |
|---|---|---|
| /api/evaluate | POST | Evaluate a single source/output pair across all 5 metrics |
| /api/batch | POST | Evaluate an array of source/output pairs with summary stats |
| /api/embeddings | POST | Generate Gemini embeddings for an array of texts |
curl -X POST http://localhost:3000/api/evaluate \
-H "Content-Type: application/json" \
-d '{
"source": "The capital of France is Paris.",
"output": "Paris is the capital city of France, located in Europe."
}'

Deployed on Vercel. To deploy your own instance:
- Fork this repo
- Import it into Vercel
- Add `GOOGLE_API_KEY` as an environment variable in Vercel project settings
- Deploy
Samarth Bhamare -- AI/ML Engineer
MIT