Eval score gate: block normal tasks until eval score reaches 80/100 #100

@fazxes

Description

Problem

Eval fix tasks (#97-0102) exist but never get picked, because lower-numbered tasks always win in the queue. The E2E eval score is 66/100, which means nightshift doesn't reliably work against real repos. This is the same deprioritization pattern as the release problem.

Fix

Add an eval score gate to docs/prompt/evolve-auto.md:

EVAL SCORE GATE: After running Step 0 evaluation, check the score. If the latest evaluation in docs/evaluations/ scored below 80/100, you MUST pick an eval-related task (any task created by an evaluation report, or any task that would improve eval dimensions) before any other normal-priority task. The product doesn't work in production until the eval score proves it does. This gate overrides the lowest-number-first rule for normal tasks.
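
As a rough illustration of how the gate changes task selection, here is a minimal Python sketch. It is not nightshift's implementation: the `Task` type, the `eval_related` flag, and `pick_normal_task` are hypothetical names introduced for this example. It assumes the queue passed in contains only normal-priority tasks, and that eval-related tasks are themselves picked lowest-number-first, which the rule above does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional

# Threshold taken from the gate rule: a score below 80/100 activates the gate.
EVAL_SCORE_THRESHOLD = 80


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a queued normal-priority task."""
    number: int
    eval_related: bool  # assumed flag: created by, or improves, an evaluation


def pick_normal_task(tasks: List[Task], latest_eval_score: int) -> Optional[Task]:
    """Pick the next normal-priority task, applying the eval score gate.

    Returns None when the queue is empty.
    """
    if not tasks:
        return None

    if latest_eval_score < EVAL_SCORE_THRESHOLD:
        # Gate active: eval-related tasks override lowest-number-first.
        eval_tasks = [t for t in tasks if t.eval_related]
        if eval_tasks:
            # Assumption: among eval-related tasks, lowest number still wins.
            return min(eval_tasks, key=lambda t: t.number)

    # Gate inactive, or no eval-related task exists: default ordering.
    return min(tasks, key=lambda t: t.number)
```

With a queue holding tasks 12 (not eval-related), 97, and 101 (both eval-related), a score of 66 selects task 97, while a score of 80 or higher falls back to task 12. If no eval-related task is queued, the sketch falls back to lowest-number-first rather than blocking, which is one possible reading of the rule.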

Acceptance Criteria

  • Rule added to evolve-auto.md
  • Agent picks eval-related tasks when score < 80/100
  • Eval score trends upward across sessions
  • All docs updated

Metadata

Assignees

No one assigned

Labels

task (Human task for daemon to pick up), urgent (Urgent priority)
