
chore: improve reward eval harness auditing#79

Merged
jbarnes850 merged 1 commit into main from chore/reward-eval-audit-config on Oct 24, 2025
Conversation

@jbarnes850
Contributor

Summary

  • add configs/eval/reward_system.yaml so judge presets/combos can be edited without touching code, falling back to baked-in defaults if missing
  • extend scripts/eval_reward_models.py with a --collect-audit flag, safe audit payload serialization, and automatic Markdown report emission alongside JSON output
  • document the new flags and config-driven workflow in docs/reward_eval.md
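The PR description does not include the contents of the new config file, so the sketch below is illustrative only: the key names (`judges`, `combos`) and preset values are assumptions about how a preset/combo file like `configs/eval/reward_system.yaml` might be shaped, given that the script falls back to baked-in defaults when the file is missing.

```yaml
# Hypothetical shape of configs/eval/reward_system.yaml -- real keys may differ.
judges:
  gemini_pair:
    model: gemini-1.5-pro   # assumed model id, for illustration
    temperature: 0.0
combos:
  gemini_pair:
    - gemini_pair
```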

Testing

```shell
python -m scripts.eval_reward_models \
  --dataset atlas/data/reward_eval_trajectories.jsonl \
  --judge-combos gemini_pair --baseline gemini_pair \
  --repeats 1 --concurrency 1 --collect-audit \
  --output results/reward/latest_gemini.json
```

Copilot AI review requested due to automatic review settings October 24, 2025 02:16
Contributor

Copilot AI left a comment


Pull Request Overview

This PR introduces configuration-driven judge presets and audit logging for the reward evaluation harness. It moves judge configuration from hardcoded constants into an external YAML file, adds a --collect-audit flag to capture LLM interactions for debugging, and automatically generates Markdown summary reports alongside JSON output.
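The exact filename pattern the script uses for the auto-generated Markdown report is not shown here, so the helper below is a hedged sketch of "timestamp-based naming alongside the JSON output"; the function name and `stem_timestamp.md` pattern are assumptions.

```python
# Sketch: derive a sibling Markdown report path from the --output JSON path,
# with a UTC timestamp baked into the filename. Pattern is an assumption.
from datetime import datetime, timezone
from pathlib import Path


def markdown_report_path(json_output: str) -> Path:
    """Return a Markdown report path next to the JSON output file."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    json_path = Path(json_output)
    return json_path.with_name(f"{json_path.stem}_{stamp}.md")
```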

Key Changes:

  • Externalized judge presets and combos to configs/eval/reward_system.yaml with fallback to defaults
  • Added audit collection capability to track LLM prompts/responses during evaluation
  • Implemented automatic Markdown report generation with timestamp-based naming
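The first key change, externalized presets with a fallback to defaults, can be sketched as follows. The function name, the `judges` key, and the default preset contents are hypothetical; the PR only states that the script reads `configs/eval/reward_system.yaml` when present and otherwise uses baked-in defaults.

```python
# Sketch: config-first preset loading with a baked-in fallback.
# Names and default values here are illustrative, not the script's API.
from pathlib import Path

DEFAULT_PRESETS = {"gemini_pair": {"model": "gemini-1.5-pro"}}


def load_judge_presets(path: str = "configs/eval/reward_system.yaml") -> dict:
    """Return judge presets from the YAML config, or defaults if absent."""
    config_path = Path(path)
    if not config_path.exists():
        return DEFAULT_PRESETS
    import yaml  # PyYAML; only needed when a config file actually exists
    with config_path.open() as fh:
        data = yaml.safe_load(fh) or {}
    return data.get("judges", DEFAULT_PRESETS)
```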

Reviewed Changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| scripts/eval_reward_models.py | Added YAML config loading, audit collection in evaluator, JSON sanitization helper, and Markdown report generation |
| configs/eval/reward_system.yaml | New configuration file defining judge presets and combos for reward evaluation |
| docs/reward_eval.md | Updated documentation to reference config-first workflow and new CLI flags |
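The "JSON sanitization helper" referenced above is not reproduced in this page, so here is a minimal sketch of what safe audit-payload serialization typically looks like: recursively coerce arbitrary objects (LLM client responses, tuples, etc.) into JSON-friendly primitives. The helper name is hypothetical.

```python
# Sketch: make an audit payload safe for json.dumps by recursively
# converting containers and stringifying anything non-serializable.
import json


def sanitize_for_json(value):
    """Recursively coerce a payload into JSON-serializable primitives."""
    if isinstance(value, dict):
        return {str(k): sanitize_for_json(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [sanitize_for_json(v) for v in value]
    if isinstance(value, (str, int, float, bool)) or value is None:
        return value
    return repr(value)  # fall back to a string for exotic objects
```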


@jbarnes850 jbarnes850 self-assigned this Oct 24, 2025
@jbarnes850 jbarnes850 added the enhancement New feature or request label Oct 24, 2025
@jbarnes850 jbarnes850 force-pushed the chore/reward-eval-audit-config branch from 060f994 to 6801278 Compare October 24, 2025 02:35
@jbarnes850 jbarnes850 merged commit c4b7bae into main Oct 24, 2025
1 check passed
@jbarnes850 jbarnes850 deleted the chore/reward-eval-audit-config branch October 24, 2025 02:37
