feat(bench): add multi-dimensional RAG evaluation metrics to ragcsv by starpit · Pull Request #882 · IBM/spnl

starpit · 2026-02-18T20:59:11Z

Summary

Add non-LLM string metrics (token F1, exact match, BLEU-1) computed at zero cost on every row
Add --metrics flag / RAGCSV_METRICS env var to selectively enable LLM-judge metrics (accuracy, faithfulness, relevancy, all)
Extract print_quantile_report helper and add a summary table to the output
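
For readers unfamiliar with the string metrics named above, here is a minimal sketch of how token F1, exact match, and BLEU-1 are commonly computed. It is not the code from this PR: the function names, the lowercase/whitespace tokenizer, and the empty-input handling are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Lowercase and split on whitespace. A deliberately simple tokenizer
/// (assumption; the PR's normalization may differ).
fn tokenize(s: &str) -> Vec<String> {
    s.to_lowercase().split_whitespace().map(str::to_string).collect()
}

/// Number of candidate tokens that also occur in the reference,
/// clipped so each reference token can be matched at most once.
fn overlap(candidate: &[String], reference: &[String]) -> usize {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for t in reference {
        *counts.entry(t.as_str()).or_insert(0) += 1;
    }
    let mut shared = 0;
    for t in candidate {
        if let Some(c) = counts.get_mut(t.as_str()) {
            if *c > 0 {
                *c -= 1;
                shared += 1;
            }
        }
    }
    shared
}

/// 1.0 when the normalized token sequences are identical, else 0.0.
fn exact_match(candidate: &str, reference: &str) -> f64 {
    if tokenize(candidate) == tokenize(reference) { 1.0 } else { 0.0 }
}

/// Harmonic mean of token-level precision and recall.
fn token_f1(candidate: &str, reference: &str) -> f64 {
    let (c, r) = (tokenize(candidate), tokenize(reference));
    if c.is_empty() || r.is_empty() {
        // Assumption: two empty strings count as a perfect match.
        return if c.is_empty() && r.is_empty() { 1.0 } else { 0.0 };
    }
    let shared = overlap(&c, &r) as f64;
    if shared == 0.0 {
        return 0.0;
    }
    let precision = shared / c.len() as f64;
    let recall = shared / r.len() as f64;
    2.0 * precision * recall / (precision + recall)
}

/// Clipped unigram precision times the standard BLEU brevity penalty.
fn bleu1(candidate: &str, reference: &str) -> f64 {
    let (c, r) = (tokenize(candidate), tokenize(reference));
    if c.is_empty() {
        return 0.0;
    }
    let precision = overlap(&c, &r) as f64 / c.len() as f64;
    let brevity = if c.len() >= r.len() {
        1.0
    } else {
        (1.0 - r.len() as f64 / c.len() as f64).exp()
    };
    brevity * precision
}
```

All three functions are pure string operations, which is why metrics of this kind can run on every row without any model calls.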

Test plan

cargo check -p spnl-cli compiles cleanly
spnl bench ragcsv --help shows the new --metrics flag
Run with --metrics accuracy and verify only accuracy LLM-judge runs
Run with --metrics all and verify all 3 LLM-judge metrics + string metrics report
Run with --metrics faithfulness,relevancy and verify accuracy is skipped
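
The selection behaviour exercised by the test plan above can be sketched roughly as follows. This is an illustrative sketch, not the PR's implementation: `JudgeMetric`, `parse_metrics`, and `metrics_spec` are hypothetical names, and the trimming, case-insensitivity, error handling, and flag-over-environment precedence are assumptions. Only the metric names, the comma-separated form, and the `RAGCSV_METRICS` variable come from the description.

```rust
use std::collections::BTreeSet;

/// LLM-judge metrics named in the PR description (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum JudgeMetric {
    Accuracy,
    Faithfulness,
    Relevancy,
}

/// Parse a comma-separated spec such as "faithfulness,relevancy" or "all".
/// Assumptions: names are trimmed and case-insensitive; unknown names are errors.
fn parse_metrics(spec: &str) -> Result<BTreeSet<JudgeMetric>, String> {
    let mut selected = BTreeSet::new();
    for name in spec.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        match name.to_ascii_lowercase().as_str() {
            "accuracy" => {
                selected.insert(JudgeMetric::Accuracy);
            }
            "faithfulness" => {
                selected.insert(JudgeMetric::Faithfulness);
            }
            "relevancy" => {
                selected.insert(JudgeMetric::Relevancy);
            }
            "all" => {
                selected.extend([
                    JudgeMetric::Accuracy,
                    JudgeMetric::Faithfulness,
                    JudgeMetric::Relevancy,
                ]);
            }
            other => return Err(format!("unknown metric: {other}")),
        }
    }
    Ok(selected)
}

/// Pick the spec string: the CLI flag value if present, otherwise the
/// RAGCSV_METRICS environment variable (precedence is an assumption).
fn metrics_spec(cli_flag: Option<&str>) -> Option<String> {
    cli_flag
        .map(str::to_string)
        .or_else(|| std::env::var("RAGCSV_METRICS").ok())
}
```

With this shape, `parse_metrics("faithfulness,relevancy")` yields a set without `Accuracy`, which corresponds to the last test-plan item. What happens when neither the flag nor the variable is set is not stated in the description, so the sketch leaves that case to the caller.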

🤖 Generated with Claude Code

Align ragcsv with RAG-Workbench methodology by adding non-LLM string metrics (token F1, exact match, BLEU-1) and LLM-judge faithfulness & relevancy evaluators alongside the existing accuracy metric. Metrics are selectable via --metrics/RAGCSV_METRICS and LLM-judge calls run

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Nick Mitchell <nickm@us.ibm.com>

starpit added the made with opus4.6 label Feb 18, 2026

starpit merged commit fb8bdb3 into IBM:main Feb 18, 2026
55 of 61 checks passed

starpit deleted the ragcsv-metrics branch February 18, 2026 21:49

starpit merged 1 commit into IBM:main from starpit:ragcsv-metrics
