Using paper script: overview.csv and readme.md contain a couple of issues #120

Closed
@bzorn

Description

When baseline tests are requested, the overview row for each model does not report the compliance rate of the baseline tests and shows "--" instead.
Also, when no scenarios are specified, two rows are reported per model, and the second row has 0 tests.

With the speech-tag sample, overview.csv contains:

model,scenario,errors,tests,tests compliant,tests compliance unknown,baseline compliant,tests positive,tests positive compliant,tests negative,tests negative compliant,baseline,tests valid,tests valid compliant
gpt-4o-mini-2024-07-18,,0,24,100%,0%,--,24,24,0,24,0,24,24
gpt-4o-mini-2024-07-18,,0,0,--,--,100%,0,0,0,0,24,0,0
gemma2:9b,,0,24,96%,0%,--,24,23,0,23,0,24,23
gemma2:9b,,0,0,--,--,100%,0,0,0,0,24,0,0
qwen2.5:3b,,0,24,92%,0%,--,24,22,0,22,0,24,22
qwen2.5:3b,,0,0,--,--,92%,0,0,0,0,24,0,0
llama3.2:1b,,0,24,25%,0%,--,24,6,0,6,0,24,6
llama3.2:1b,,0,0,--,--,46%,0,0,0,0,24,0,0
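
To illustrate the expected shape (one row per model, with the baseline compliance filled in), here is a minimal Python sketch that post-processes a file like the one above. It is a workaround illustration only, not the script's actual code: `merge_rows`, the `EMPTY` set, and the rule "a later row's value replaces `--`, `0`, or an empty cell" are all assumptions made for this example.

```python
import csv
import io

# Two models taken from the overview.csv excerpt above.
SAMPLE = """\
model,scenario,errors,tests,tests compliant,tests compliance unknown,baseline compliant,tests positive,tests positive compliant,tests negative,tests negative compliant,baseline,tests valid,tests valid compliant
gpt-4o-mini-2024-07-18,,0,24,100%,0%,--,24,24,0,24,0,24,24
gpt-4o-mini-2024-07-18,,0,0,--,--,100%,0,0,0,0,24,0,0
llama3.2:1b,,0,24,25%,0%,--,24,6,0,6,0,24,6
llama3.2:1b,,0,0,--,--,46%,0,0,0,0,24,0,0
"""

# Assumption: these cell values carry no information and may be overwritten.
EMPTY = {"", "--", "0"}


def merge_rows(rows):
    """Collapse rows that share (model, scenario) into a single row.

    Hypothetical helper: the first row seen for a key is kept, and any
    uninformative cell in it is filled from later rows with the same key.
    """
    merged = {}
    for row in rows:
        key = (row["model"], row["scenario"])
        if key not in merged:
            merged[key] = dict(row)
            continue
        target = merged[key]
        for column, value in row.items():
            if target[column] in EMPTY and value not in EMPTY:
                target[column] = value
    return list(merged.values())


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
merged = merge_rows(rows)

for row in merged:
    print(row["model"], row["tests"], row["tests compliant"],
          row["baseline"], row["baseline compliant"])
```

Treating `0` as uninformative is a heuristic that happens to fit this sample; a proper fix would presumably emit the baseline columns on the same row when the overview is generated.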

Labels: bug
