
Add LLM-as-judge task output scorer (#94/#59 foundation) #226

Open
neoneye wants to merge 1 commit into main from feature/94-task-output-scorer

neoneye (Member) commented Mar 9, 2026

Summary

Test plan

  • ast.parse() all new files
  • Verify imports succeed in project venv
  • derive_task_name() correctly parses task filenames
  • DEFAULT_WEIGHTS sum to 1.0
  • Manual test with a real LLM and run directory
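
The first and fourth checks in the test plan could be automated roughly as follows. This is a minimal sketch, not the PR's actual test code: `find_syntax_errors`, `weights_sum_to_one`, and `EXAMPLE_WEIGHTS` are hypothetical names, and the equal weights are placeholders because the real `DEFAULT_WEIGHTS` values are not shown in this PR description.

```python
import ast
import math


def find_syntax_errors(sources: dict[str, str]) -> dict[str, str]:
    """Return {filename: message} for every source string that fails ast.parse()."""
    errors: dict[str, str] = {}
    for filename, code in sources.items():
        try:
            ast.parse(code, filename=filename)
        except SyntaxError as exc:
            errors[filename] = f"line {exc.lineno}: {exc.msg}"
    return errors


def weights_sum_to_one(weights: dict[str, float]) -> bool:
    """Check that the weights sum to 1.0, tolerating floating-point rounding."""
    return math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9)


# Placeholder values only (assumption): the PR does not state the real weights.
EXAMPLE_WEIGHTS = {
    "specificity": 0.2,
    "actionability": 0.2,
    "completeness": 0.2,
    "internal_consistency": 0.2,
    "conciseness": 0.2,
}

if __name__ == "__main__":
    print(find_syntax_errors({"ok.py": "x = 1\n", "bad.py": "def f(:\n"}))
    print(weights_sum_to_one(EXAMPLE_WEIGHTS))
```

Comparing with `math.isclose` rather than `== 1.0` keeps the check from failing on rounding error if the weights are not exactly representable as floats.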

🤖 Generated with Claude Code

Foundation for autonomous prompt optimization (#94) and A/B testing
promotion (#59). Scores pipeline task outputs against a 5-dimension
rubric (Specificity, Actionability, Completeness, Internal Consistency,
Conciseness) using structured LLM output.
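
To illustrate how a five-dimension rubric might collapse into a single score, here is a hedged sketch of a weighted sum. The dimension names come from the description above; the function name `weighted_score`, the equal `ASSUMED_WEIGHTS`, and the numeric score scale are assumptions for illustration and may differ from what the PR implements.

```python
import math

# Dimension names taken from the PR description.
DIMENSIONS = (
    "specificity",
    "actionability",
    "completeness",
    "internal_consistency",
    "conciseness",
)

# Assumption: equal weights. The PR's DEFAULT_WEIGHTS values are not shown.
ASSUMED_WEIGHTS = {name: 0.2 for name in DIMENSIONS}


def weighted_score(scores: dict[str, float],
                   weights: dict[str, float] = ASSUMED_WEIGHTS) -> float:
    """Combine per-dimension scores into one number using a weighted sum."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


if __name__ == "__main__":
    # Hypothetical judge output; the scale is an assumption.
    judged = {
        "specificity": 4,
        "actionability": 3,
        "completeness": 5,
        "internal_consistency": 4,
        "conciseness": 2,
    }
    print(weighted_score(judged))
```

Because the weights sum to 1.0, the combined score stays on the same scale as the per-dimension scores, which makes results comparable across tasks and prompt variants.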

Includes a CLI helper for scoring tasks from completed run directories.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
neoneye deleted the branch main on March 10, 2026 00:48
neoneye closed this on Mar 10, 2026
neoneye reopened this on Mar 10, 2026
neoneye changed the base branch from feature/plan-resume-tool to main on March 10, 2026 01:44