
feat(bench): add multi-dimensional RAG evaluation metrics to ragcsv #882

Merged
starpit merged 1 commit into IBM:main from starpit:ragcsv-metrics
Feb 18, 2026


Member

@starpit commented Feb 18, 2026

Summary

  • Add non-LLM string metrics (token F1, exact match, BLEU-1), computed on every row without any LLM calls
  • Add --metrics flag / RAGCSV_METRICS env var to selectively enable LLM-judge metrics (accuracy, faithfulness, relevancy, all)
  • Extract print_quantile_report helper and add a summary table to the output
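
For readers unfamiliar with the three string metrics, here is a minimal sketch of how they are commonly defined. It is not the code in this PR: the function names (`tokenize`, `overlap`, `exact_match`, `token_f1`, `bleu1`) and the normalization (lowercasing, splitting on non-alphanumeric characters) are assumptions made for illustration, and the actual ragcsv implementation may differ.

```rust
use std::collections::HashMap;

/// Assumed normalization: lowercase, then split on non-alphanumeric characters.
fn tokenize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

/// Count how many times each token occurs.
fn counts(tokens: &[String]) -> HashMap<&str, usize> {
    let mut m = HashMap::new();
    for t in tokens {
        *m.entry(t.as_str()).or_insert(0) += 1;
    }
    m
}

/// Number of predicted tokens also found in the reference,
/// with each token's count clipped to its count in the reference.
fn overlap(pred: &[String], reference: &[String]) -> usize {
    let ref_counts = counts(reference);
    let pred_counts = counts(pred);
    pred_counts
        .iter()
        .map(|(tok, &n)| n.min(ref_counts.get(tok).copied().unwrap_or(0)))
        .sum()
}

/// 1.0 if the normalized token sequences are identical, else 0.0.
fn exact_match(pred: &str, reference: &str) -> f64 {
    if tokenize(pred) == tokenize(reference) { 1.0 } else { 0.0 }
}

/// Harmonic mean of token-level precision and recall.
fn token_f1(pred: &str, reference: &str) -> f64 {
    let (p, r) = (tokenize(pred), tokenize(reference));
    let common = overlap(&p, &r);
    if common == 0 {
        return 0.0;
    }
    let precision = common as f64 / p.len() as f64;
    let recall = common as f64 / r.len() as f64;
    2.0 * precision * recall / (precision + recall)
}

/// Clipped unigram precision multiplied by the standard brevity penalty.
fn bleu1(pred: &str, reference: &str) -> f64 {
    let (p, r) = (tokenize(pred), tokenize(reference));
    if p.is_empty() {
        return 0.0;
    }
    let precision = overlap(&p, &r) as f64 / p.len() as f64;
    let brevity = if p.len() >= r.len() {
        1.0
    } else {
        (1.0 - r.len() as f64 / p.len() as f64).exp()
    };
    brevity * precision
}
```

Under these assumptions, a prediction that reorders the reference's words scores 1.0 on token F1 and BLEU-1 but 0.0 on exact match, which is one reason to report all three side by side.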

Test plan

  • cargo check -p spnl-cli compiles cleanly
  • spnl bench ragcsv --help shows the new --metrics flag
  • Run with --metrics accuracy and verify only accuracy LLM-judge runs
  • Run with --metrics all and verify all 3 LLM-judge metrics + string metrics report
  • Run with --metrics faithfulness,relevancy and verify accuracy is skipped
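
The selection behaviour exercised by the last three steps could be sketched as below. This is a hypothetical illustration, not the PR's code: the `JudgeMetric` enum and `parse_metrics` function are invented names, and the sketch assumes the flag value is a comma-separated list in which `all` expands to every LLM-judge metric. How the real CLI wires this into `--metrics` and `RAGCSV_METRICS` is not shown here.

```rust
use std::collections::BTreeSet;

/// Hypothetical enum of the LLM-judge metrics named in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum JudgeMetric {
    Accuracy,
    Faithfulness,
    Relevancy,
}

/// Parse a comma-separated spec such as "faithfulness,relevancy" or "all".
/// Unknown names are rejected (an assumption; the real CLI may behave differently).
fn parse_metrics(spec: &str) -> Result<BTreeSet<JudgeMetric>, String> {
    let mut selected = BTreeSet::new();
    for name in spec.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        match name.to_ascii_lowercase().as_str() {
            "accuracy" => {
                selected.insert(JudgeMetric::Accuracy);
            }
            "faithfulness" => {
                selected.insert(JudgeMetric::Faithfulness);
            }
            "relevancy" => {
                selected.insert(JudgeMetric::Relevancy);
            }
            "all" => selected.extend([
                JudgeMetric::Accuracy,
                JudgeMetric::Faithfulness,
                JudgeMetric::Relevancy,
            ]),
            other => return Err(format!("unknown metric: {other}")),
        }
    }
    Ok(selected)
}
```

With this shape, a caller would check `selected.contains(&JudgeMetric::Accuracy)` before issuing the accuracy judge call, so `faithfulness,relevancy` skips it entirely.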

🤖 Generated with Claude Code

Align ragcsv with RAG-Workbench methodology by adding non-LLM string
metrics (token F1, exact match, BLEU-1) and LLM-judge faithfulness &
relevancy evaluators alongside the existing accuracy metric. Metrics
are selectable via --metrics/RAGCSV_METRICS and LLM-judge calls run

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Nick Mitchell <nickm@us.ibm.com>
@starpit merged commit fb8bdb3 into IBM:main Feb 18, 2026
55 of 61 checks passed
@starpit deleted the ragcsv-metrics branch February 18, 2026 21:49