Commit 41f4662

Add docs to readme and fixup pass rate logic
1 parent b2cb120 · commit 41f4662

2 files changed: +20 −3 lines changed

README.md

Lines changed: 14 additions & 0 deletions

````diff
@@ -66,6 +66,20 @@ Run the extension with output from a command. This uses single-shot mode.
 cat README.md | gh models run openai/gpt-4o-mini "summarize this text"
 ```
 
+#### Evaluating prompts
+
+Run evaluation tests against a model using a `.prompt.yml` file:
+```shell
+gh models eval my_prompt.prompt.yml
+```
+
+The evaluation will run test cases defined in the prompt file and display results in a human-readable format. For programmatic use, you can output results in JSON format:
+```shell
+gh models eval my_prompt.prompt.yml --json
+```
+
+The JSON output includes detailed test results, evaluation scores, and summary statistics that can be processed by other tools or CI/CD pipelines.
+
 ## Notice
 
 Remember when interacting with a model you are experimenting with AI, so content mistakes are possible. The feature is
````

cmd/eval/eval.go

Lines changed: 6 additions & 3 deletions
```diff
@@ -76,6 +76,9 @@ func NewEvalCommand(cfg *command.Config) *cobra.Command {
 		string:
 		  contains: "hello"
 
+		By default, results are displayed in a human-readable format. Use the --json flag
+		to output structured JSON data for programmatic use or integration with CI/CD pipelines.
+
 		See https://docs.github.com/github-models/use-github-models/storing-prompts-in-github-repositories#supported-file-format for more information.
 		`),
 		Example: "gh models eval my_prompt.prompt.yml",
```
```diff
@@ -172,7 +175,7 @@ func (h *evalCommandHandler) runEvaluation(ctx context.Context) error {
 	}
 
 	// Calculate pass rate
-	passRate := 0.0
+	passRate := 100.0
 	if totalTests > 0 {
 		passRate = float64(passedTests) / float64(totalTests) * 100
 	}
```
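The hunk above is the "fixup pass rate logic" part of the commit: with zero tests nothing has failed, so the default is now 100 rather than 0. A standalone sketch of the fixed calculation, extracted for illustration (not the handler's actual method):

```go
package main

import "fmt"

// passRate mirrors the fixed logic: when totalTests is zero the default
// of 100.0 is returned, since an empty test set has no failures.
func passRate(passedTests, totalTests int) float64 {
	rate := 100.0
	if totalTests > 0 {
		rate = float64(passedTests) / float64(totalTests) * 100
	}
	return rate
}

func main() {
	fmt.Printf("%.2f\n", passRate(3, 4)) // 75.00
	fmt.Printf("%.2f\n", passRate(0, 0)) // 100.00
}
```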
```diff
@@ -238,9 +241,9 @@ func (h *evalCommandHandler) printSummary(passedTests, totalTests int, passRate
 	// Summary
 	h.cfg.WriteToOut("Evaluation Summary:\n")
 	if totalTests == 0 {
-		h.cfg.WriteToOut("Passed: 0/0 (0.0%)\n")
+		h.cfg.WriteToOut("Passed: 0/0 (0.00%)\n")
 	} else {
-		h.cfg.WriteToOut(fmt.Sprintf("Passed: %d/%d (%.1f%%)\n",
+		h.cfg.WriteToOut(fmt.Sprintf("Passed: %d/%d (%.2f%%)\n",
 			passedTests, totalTests, passRate))
 	}
```
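This last hunk is purely presentational, widening the pass rate from one to two decimal places. A standalone illustration of the changed summary line, using `fmt` directly instead of the handler's `cfg` writer:

```go
package main

import "fmt"

// summaryLine formats the pass count the way the changed printSummary
// code does, including the fixed 0/0 string and the %.2f pass rate.
func summaryLine(passedTests, totalTests int, passRate float64) string {
	if totalTests == 0 {
		return "Passed: 0/0 (0.00%)\n"
	}
	return fmt.Sprintf("Passed: %d/%d (%.2f%%)\n", passedTests, totalTests, passRate)
}

func main() {
	fmt.Print(summaryLine(3, 4, 75.0)) // Passed: 3/4 (75.00%)
	fmt.Print(summaryLine(0, 0, 100.0))
}
```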