Evaluation Review

A browser-only review workbench for LLM information-extraction evaluation results.

The app lets a reviewer upload one evaluation .zip, inspect summary metrics and dialogue-level gold/prediction comparisons, annotate each dialogue, and export the annotations as JSONL. The zip is parsed in the browser; no files are uploaded to a server.

Expected Artifact

The uploaded zip should include these files, either at the zip root or inside one common directory:

event_eval_summary.json
row_audit_report.jsonl
event_eval_details.jsonl
one prediction .jsonl
optional *.failures.jsonl

The prediction JSONL supplies dialogue text and model-predicted events. Gold, prediction, and match details are read from the audit reports.
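As a rough illustration of the layout rule above, the sketch below checks a list of zip entry names for the required files, accepting them either at the root or under one shared directory. The helper name `checkArtifactEntries`, its return shape, and the rule for spotting the prediction file (any `.jsonl` that is neither a named report nor a failures file) are assumptions made for this example, not the app's actual code.

```typescript
// Hypothetical helper: names and matching rules are assumptions for illustration.
const REQUIRED_FILES = [
  "event_eval_summary.json",
  "row_audit_report.jsonl",
  "event_eval_details.jsonl",
];

interface ArtifactCheck {
  missing: string[];         // required files that were not found
  predictionFiles: string[]; // candidate prediction .jsonl files
  failureFiles: string[];    // optional *.failures.jsonl files
}

function checkArtifactEntries(entryNames: string[]): ArtifactCheck {
  // Ignore directory entries, which zip listings mark with a trailing slash.
  const files = entryNames.filter((name) => !name.endsWith("/"));

  // If every file sits under the same top-level directory, strip that prefix.
  const first = files.length > 0 ? files[0].split("/")[0] : "";
  const shared = files.length > 0 && files.every((name) => name.startsWith(`${first}/`));
  const prefix = shared ? `${first}/` : "";
  const relative = files.map((name) => name.slice(prefix.length));

  const missing = REQUIRED_FILES.filter((required) => !relative.includes(required));
  const failureFiles = relative.filter((name) => name.endsWith(".failures.jsonl"));
  // Assumption: the prediction file is any other .jsonl in the artifact.
  const predictionFiles = relative.filter(
    (name) =>
      name.endsWith(".jsonl") &&
      !name.endsWith(".failures.jsonl") &&
      !REQUIRED_FILES.includes(name),
  );
  return { missing, predictionFiles, failureFiles };
}
```

A caller could treat the artifact as valid when `missing` is empty and `predictionFiles` holds exactly one name.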

To keep the browser responsive, the app rejects artifacts larger than 100 MB, archives with more than 500 entries, and any parsed text file larger than 50 MB.
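The three limits above could be enforced with a check along these lines. This is a minimal sketch: the function name, the `EntryInfo` shape, and the choice of 1 MB = 1024 × 1024 bytes are assumptions, and the app may measure sizes differently.

```typescript
// Assumption: 1 MB = 1024 * 1024 bytes; the app may count megabytes differently.
const MB = 1024 * 1024;
const MAX_ARTIFACT_BYTES = 100 * MB;
const MAX_ENTRIES = 500;
const MAX_TEXT_FILE_BYTES = 50 * MB;

// Hypothetical description of one archive entry that will be parsed as text.
interface EntryInfo {
  name: string;
  uncompressedBytes: number;
}

// Returns a human-readable message for each limit that is exceeded.
function findLimitViolations(artifactBytes: number, entries: EntryInfo[]): string[] {
  const problems: string[] = [];
  if (artifactBytes > MAX_ARTIFACT_BYTES) {
    problems.push(`Artifact is larger than ${MAX_ARTIFACT_BYTES / MB} MB`);
  }
  if (entries.length > MAX_ENTRIES) {
    problems.push(`Archive has more than ${MAX_ENTRIES} entries`);
  }
  for (const entry of entries) {
    if (entry.uncompressedBytes > MAX_TEXT_FILE_BYTES) {
      problems.push(`${entry.name} is larger than ${MAX_TEXT_FILE_BYTES / MB} MB`);
    }
  }
  return problems;
}
```

Values exactly at a limit pass, since the limits are described as rejecting only larger inputs.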

Local Development

Use Bun for local commands:

bun install
bun run dev

Run checks before shipping changes:

bun run test
bun run build

Vercel

This is a static Vite app and can be deployed on Vercel with the default Vite settings:

Install command: bun install
Build command: bun run build
Output directory: dist

No backend, database, or environment variables are required.

Annotation Export

Reviewer annotations are saved in the browser's localStorage, separately for each artifact name. The export button downloads a JSONL file containing one record per non-empty annotation, ordered by row_index and then by dialogue_id.

Each exported record includes:

artifact and dialogue identifiers
review status and review note
row-level gold/prediction/match counts when available
export timestamp
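The export described above can be sketched as a filter, a sort, and one `JSON.stringify` call per record. The field names (`artifact_name`, `status`, `note`, `counts`, `exported_at`) and the definition of "non-empty" (a status or a non-blank note) are assumptions for illustration; only row_index and dialogue_id are named in this README.

```typescript
// Hypothetical annotation shape; field names other than row_index and
// dialogue_id are assumptions for this example.
interface Annotation {
  dialogue_id: string;
  row_index: number;
  status: string; // "" when the reviewer has not set a status
  note: string;
  counts?: { gold: number; predicted: number; matched: number };
}

function exportAnnotationsJsonl(
  artifactName: string,
  annotations: Annotation[],
  exportedAt: Date = new Date(),
): string {
  const timestamp = exportedAt.toISOString();
  return annotations
    // Assumption: an annotation is non-empty if it has a status or a non-blank note.
    .filter((a) => a.status !== "" || a.note.trim() !== "")
    // Order by row_index first, then by dialogue_id as a tie-breaker.
    .sort(
      (a, b) =>
        a.row_index - b.row_index ||
        (a.dialogue_id < b.dialogue_id ? -1 : a.dialogue_id > b.dialogue_id ? 1 : 0),
    )
    .map((a) =>
      JSON.stringify({
        artifact_name: artifactName,
        dialogue_id: a.dialogue_id,
        row_index: a.row_index,
        status: a.status,
        note: a.note,
        // Counts are included only when available.
        ...(a.counts ? { counts: a.counts } : {}),
        exported_at: timestamp,
      }),
    )
    .join("\n");
}
```

Because `filter` returns a new array, sorting does not reorder the caller's annotation list.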
