Closed
Description
When the script is given 1 prompt and 4 models under test (MUTs), the current file, evals/readme.md, doesn't correctly report the compliance rate for the MUTs. Every model column shows `--` instead of a percentage:
# Eval summary
## Test Results
- % represent compliance rate
|prompt|rules|rules grounded|tests|gpt-4o-mini-2024-07-18|gemma2:9b|qwen2.5:3b|llama3.2:1b|
|-|-|-|-|-|-|-|-|
|speech\-tag|8|7|24|\-\-|\-\-|\-\-|\-\-|
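
For reference, here is a minimal sketch of what I would expect each model cell to contain, assuming every test result records the model name and a boolean compliance flag. The field names `model` and `compliant`, the helper names, and the sample results are all hypothetical and not taken from the script; this only illustrates the expected calculation.

```python
from collections import defaultdict


def compliance_rates(results):
    """Return {model: fraction of compliant results} for models with at least one result."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for r in results:
        totals[r["model"]] += 1
        if r["compliant"]:
            passed[r["model"]] += 1
    return {model: passed[model] / totals[model] for model in totals}


def format_cell(rates, model):
    """Render one table cell; models with no evaluated results fall back to '--'."""
    if model not in rates:
        return "--"
    return f"{round(rates[model] * 100)}%"


# Made-up sample results, for illustration only (not real eval data).
results = [
    {"model": "gemma2:9b", "compliant": True},
    {"model": "gemma2:9b", "compliant": False},
    {"model": "qwen2.5:3b", "compliant": True},
]

rates = compliance_rates(results)
row = [format_cell(rates, m) for m in ["gemma2:9b", "qwen2.5:3b", "llama3.2:1b"]]
```

Under these assumptions, a `--` cell should only appear for a model that has no evaluated test results at all, which is not what the table above suggests for a run with 24 tests.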