Skip to content

feat: extend code-grader to accept plain-text and exit-code output #1210

@christso

Description

@christso

Background

Issue #1207 was implemented with a new `shell` grader type. On review, this violates the design principle of auditing existing primitives first — `code-grader` already runs shell commands. The `shell` type is just `code-grader` with a simpler output protocol.

Promptfoo's equivalent (`javascript`/`python` assertions) solves this by having a graceful fallback: if a script returns a plain boolean, number, or string instead of the full structured result, it's accepted. They do not have a `shell` type.

Problem with current code-grader

`code-grader` currently requires stdout to be a JSON object with `{ score, assertions }`. Two behaviors make simple one-liners unusable:

  1. Empty stdout + exit 0 → score 0 (wrong; should be score 1)
  2. Non-zero exit code → throws as execution error (should be score 0 / fail)

Desired Behavior

Extend `code-grader` with a plain-text fallback: when stdout is not valid JSON, interpret it as:

stdout (trimmed, case-insensitive) score
empty 1 (exit-code convention: 0 = pass)
`"true"`, `"pass"`, `"1"` 1
`"false"`, `"fail"`, `"0"` 0
numeric string clamped float score
anything else 0

Non-zero exit with non-JSON stdout → score 0 (fail), not execution error.

Outcome

Simple workspace checks become concise without a new grader type:

```yaml

  • type: code-grader
    command: ["bash", "-c", "[ $(pdfinfo report.pdf | grep Pages | awk '{print $2}') -ge 5 ]"]

  • type: code-grader
    command: ["bash", "-c", "echo $(wc -l < output.txt)"] # stdout numeric → score
    ```

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions