Background
Issue #1207 was implemented with a new `shell` grader type. On review, this violates the design principle of auditing existing primitives first — `code-grader` already runs shell commands. The `shell` type is just `code-grader` with a simpler output protocol.
Promptfoo's equivalent (`javascript`/`python` assertions) solves this by having a graceful fallback: if a script returns a plain boolean, number, or string instead of the full structured result, it's accepted. They do not have a `shell` type.
Problem with current code-grader
`code-grader` currently requires stdout to be a JSON object with `{ score, assertions }`. Two behaviors make simple one-liners unusable:
- Empty stdout + exit 0 → score 0 (wrong; should be score 1)
- Non-zero exit code → throws as execution error (should be score 0 / fail)
Desired Behavior
Extend `code-grader` with a plain-text fallback: when stdout is not valid JSON, interpret it as:
| stdout (trimmed, case-insensitive) |
score |
| empty |
1 (exit-code convention: 0 = pass) |
| `"true"`, `"pass"`, `"1"` |
1 |
| `"false"`, `"fail"`, `"0"` |
0 |
| numeric string |
clamped float score |
| anything else |
0 |
Non-zero exit with non-JSON stdout → score 0 (fail), not execution error.
Outcome
Simple workspace checks become concise without a new grader type:
```yaml
-
type: code-grader
command: ["bash", "-c", "[ $(pdfinfo report.pdf | grep Pages | awk '{print $2}') -ge 5 ]"]
-
type: code-grader
command: ["bash", "-c", "echo $(wc -l < output.txt)"] # stdout numeric → score
```
Related
Background
Issue #1207 was implemented with a new `shell` grader type. On review, this violates the design principle of auditing existing primitives first — `code-grader` already runs shell commands. The `shell` type is just `code-grader` with a simpler output protocol.
Promptfoo's equivalent (`javascript`/`python` assertions) solves this by having a graceful fallback: if a script returns a plain boolean, number, or string instead of the full structured result, it's accepted. They do not have a `shell` type.
Problem with current code-grader
`code-grader` currently requires stdout to be a JSON object with `{ score, assertions }`. Two behaviors make simple one-liners unusable:
Desired Behavior
Extend `code-grader` with a plain-text fallback: when stdout is not valid JSON, interpret it as:
Non-zero exit with non-JSON stdout → score 0 (fail), not execution error.
Outcome
Simple workspace checks become concise without a new grader type:
```yaml
type: code-grader
command: ["bash", "-c", "[ $(pdfinfo report.pdf | grep Pages | awk '{print $2}') -ge 5 ]"]
type: code-grader
command: ["bash", "-c", "echo $(wc -l < output.txt)"] # stdout numeric → score
```
Related