Update evals/readme.md correctly when processing multiple prompt inputs with the paper script

When the script is given 1 prompt and 4 models under test (MUTs), the generated file at evals/readme.md doesn't correctly report the compliance rate for the MUTs. Every model column shows `\-\-` instead of a percentage:

```
# Eval summary
  
## Test Results

- % represent compliance rate

|prompt|rules|rules grounded|tests|gpt-4o-mini-2024-07-18|gemma2:9b|qwen2.5:3b|llama3.2:1b|
|-|-|-|-|-|-|-|-|
|speech\-tag|8|7|24|\-\-|\-\-|\-\-|\-\-|
```
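
For reference, here is a minimal sketch of how the per-model compliance rate could be computed and written into the summary row. It is not the script's actual code: the function names `compliance_rates` and `render_row`, and the `(model, compliant)` shape of the test results, are assumptions made for illustration. A model with no results falls back to the `--` placeholder, which is what all four columns currently show.

```python
from collections import defaultdict


def compliance_rates(results):
    """Return {model: percent compliant} from (model, compliant) pairs.

    The (model, compliant) input shape is hypothetical, not the
    script's real data format.
    """
    totals = defaultdict(int)
    passed = defaultdict(int)
    for model, compliant in results:
        totals[model] += 1
        if compliant:
            passed[model] += 1
    return {m: 100.0 * passed[m] / totals[m] for m in totals}


def render_row(prompt, rules, grounded, tests, models, rates):
    """Build one markdown table row; models without results get '--'."""
    cells = [prompt, str(rules), str(grounded), str(tests)]
    for model in models:
        cells.append(f"{rates[model]:.0f}%" if model in rates else "--")
    return "|" + "|".join(cells) + "|"
```

With this shape, the expected row for `speech-tag` would carry one percentage per MUT column rather than four placeholders.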